Possible SYN flooding on port 3306 (MySQL)

The system is set up so that the MySQL servers logging "Possible SYN flooding on port 3306" are exposed only to internal backend services. Those services are in turn not exposed to the wild wild web. In front of them sit the servers that publish services to the internet.

So I was rather stunned when the log messages started to appear. Even though we had just done a release of the system, I found it far-fetched that we would start to DoS ourselves. So why did the messages appear?

Two different error messages could be identified in the log files and they seem to be related, especially since the Java servers with link failure do communicate with the MySQL servers.

Possible SYN flooding on port 3306 @ MySQL server
Communication link failure @ Java backend servers


After some tcpdumping, head scratching and googling I think I have tracked down the root cause. It is hidden in how TCP works in general and in the OS configuration, in combination with MySQL being a "greeting" protocol. Well, I had not paid any attention to that :-(

Let's go through the explanation while also explaining how a TCP connection is set up.

When establishing a TCP connection we have a client C and a server S:
C sends SYN to S
S sends SYN-ACK to C

At this point C considers the connection established and acknowledges this:
C sends ACK to S

If the ACK never reaches S, or if S "forgets" about the connection, C will have a connection but S will not. With most protocols this isn't a problem, since C sends something to S almost immediately and thereby exposes the mismatched state.

However, the MySQL protocol doesn't work like this!!
In the MySQL protocol the server is expected to send a "greeting" to C first (as in SMTP, IMAP and POP). Thus if S "forgets" about C, C is left dangling.....
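
To make the dangling state concrete, here is a minimal Python sketch of a client for a server-speaks-first protocol. The function name `wait_for_greeting` and the timeout values are my own choices for illustration, not part of any MySQL driver. The point is only that the connect succeeds and the read then waits, so a read timeout is what turns a silent hang into a visible error.

```python
import socket


def wait_for_greeting(host, port, timeout=5.0):
    """Connect, then wait for the server to speak first.

    Returns the first bytes received, or None if nothing arrives
    within `timeout` seconds. (Hypothetical helper for illustration.)
    """
    # create_connection() returns once the client side of the
    # handshake is complete; the same timeout then applies to recv().
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            # A greeting protocol client sends nothing here, it only reads.
            data = sock.recv(4096)
        except socket.timeout:
            # The client believes it is connected, but no greeting came.
            return None
        # An empty bytes object means the server closed the connection.
        return data
```

Calling something like `wait_for_greeting("db.internal", 3306, timeout=3.0)` (the host name is made up) against a server that has forgotten the connection would be expected to return `None` after three seconds instead of blocking indefinitely.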

How can this happen then?
S can be prompted to "forget" about it if a lot of connections are established in a short time frame, a.k.a. SYN flooding. The burst of connections causes the kernel queue of pending connections to overflow, and this should trigger syncookies (if enabled).

When the kernel goes into "syncookie mode" it:
  • prints "Possible SYN flooding on port 3306" to the log
  • has S send the SYN-ACK to C AND immediately forget all about the connection!!

So all that is needed for this to occur is:
  1. The protocol requires the client to wait for the server to send the first message.
  2. The server gets too many connections at once, so it "forgets" some of them once syncookies kick in.
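
One way to check whether the server's kernel really has been sending syncookies or overflowing its listen queue is to look at the TcpExt counters in /proc/net/netstat on Linux. The sketch below is only an illustration: the function names are mine, and it assumes the usual layout where a `TcpExt:` line of counter names is followed by a `TcpExt:` line of values. Counters that a given kernel does not expose are simply left out.

```python
def parse_tcpext(text):
    """Parse the TcpExt name/value line pair from /proc/net/netstat text."""
    rows = [ln.split() for ln in text.splitlines() if ln.startswith("TcpExt:")]
    if len(rows) < 2:
        return {}
    # First row holds the counter names, second row the matching values.
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def syn_pressure_counters(path="/proc/net/netstat"):
    """Return the syncookie and listen-queue counters, if present."""
    with open(path) as fh:
        stats = parse_tcpext(fh.read())
    wanted = ("SyncookiesSent", "SyncookiesRecv", "SyncookiesFailed",
              "ListenOverflows", "ListenDrops")
    return {key: stats[key] for key in wanted if key in stats}
```

Sampling `syn_pressure_counters()` on the MySQL server before and after a burst of connections should show whether the counters move at the same time as the log message appears.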
How to prove?
If the theory is true, we should be able to detect it by running lsof and netstat on both C and S. Comparing the output, we should find the rare TCP connections that are established only on C.
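
The comparison can be sketched in Python roughly as follows. This assumes you have captured the text output of `netstat -tn` on both machines at about the same time, that the output has the usual Linux columns (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and that there is no NAT between C and S so both sides see the same addresses. The function names and the sample addresses are made up for illustration.

```python
def established(netstat_text):
    """Return {(local, remote)} for ESTABLISHED TCP rows of `netstat -tn` output."""
    conns = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 <local> <remote> ESTABLISHED
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "ESTABLISHED":
            conns.add((fields[3], fields[4]))
    return conns


def only_on_client(client_text, server_text, server_port=3306):
    """List connections to server_port that C has established but S does not know."""
    server_view = established(server_text)
    dangling = []
    for local, remote in sorted(established(client_text)):
        if not remote.endswith(f":{server_port}"):
            continue
        # S lists the same connection with the two endpoints swapped.
        if (remote, local) not in server_view:
            dangling.append((local, remote))
    return dangling


# Hypothetical captured output, for illustration only.
client_out = """Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.0.0.5:40001 10.0.0.9:3306 ESTABLISHED
tcp 0 0 10.0.0.5:40002 10.0.0.9:3306 ESTABLISHED
"""
server_out = """Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.0.0.9:3306 10.0.0.5:40001 ESTABLISHED
"""
print(only_on_client(client_out, server_out))
```

Connections that are opened or closed between the two snapshots will show up as false positives, so a pair that stays in the result over several runs is the interesting one.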

How to fix:
The fix is twofold:
  • First, we should review and most likely change the kernel parameters of the server.
    • net.ipv4.tcp_syncookies: when enabled, the kernel fires the SYN-ACK and forgets the connection once the queue overflows
    • net.ipv4.tcp_max_syn_backlog (default 1024) might need to be increased
  • Second, implement proper error handling and timeout configuration in the application, taking the "greeting" protocol into account.
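
Before changing anything it helps to know what the server is currently running with. On Linux the two parameters above can be read from procfs, where dots in the sysctl name map to directories under /proc/sys. The helper names below are my own and the sketch assumes the values are plain integers, which is the case for these two settings.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Read an integer sysctl such as net.ipv4.tcp_syncookies via procfs."""
    # net.ipv4.tcp_syncookies -> /proc/sys/net/ipv4/tcp_syncookies
    path = Path(root).joinpath(*name.split("."))
    return int(path.read_text().split()[0])


def syn_settings(root="/proc/sys"):
    """Return the current values of the two parameters discussed above."""
    names = ("net.ipv4.tcp_syncookies", "net.ipv4.tcp_max_syn_backlog")
    return {name: read_sysctl(name, root) for name in names}
```

Running `syn_settings()` on the MySQL server gives a baseline to compare against after any tuning. The `root` argument only exists so the function can be pointed at a copy of the files for testing.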
