In Release It!, I talk about users and the harm they do to our systems. One of the toughest types of user to deal with is the flash mob. A flash mob often results from Attacks of Self-Denial, like when you suddenly offer a $3,000 laptop for $300 by mistake.
When a flash mob starts to arrive, you will suddenly see a surge of TCP/IP connection requests at your load-distribution layer. If the mob arrives slowly enough (fewer than 1,000 connections per second), then the app servers will be hurt the most. A really fast mob, like when your site hits the top spot on digg.com, can bring far more than 1,000 connections per second. This puts the hurt on your web servers.
As the TCP/IP connection requests arrive, the OS queues them for servicing by the application. On most modern TCP/IP stacks, the kernel answers each SYN with a SYN/ACK and completes the handshake on its own (the client's final ACK is the third step), then holds the established connection in a queue until the application gets around to calling "accept" on the server socket. At that point, the server hands the established connection off to a worker thread to process the request. Meanwhile, the thread that accepted the connection goes back to accept the next one.
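The accept-and-dispatch cycle can be sketched in a few lines of Python. This is only an illustration of the pattern, not how any particular server implements it: the names handle, serve, and demo_accept_loop are made up, and the max_connections parameter exists only so the demo terminates (a real acceptor loops forever).

```python
import socket
import threading


def handle(conn: socket.socket) -> None:
    """Worker: read one request and send a fixed reply (placeholder logic)."""
    with conn:
        conn.recv(1024)
        conn.sendall(b"ok\n")


def serve(listener: socket.socket, max_connections: int) -> None:
    """Acceptor: accept, hand off to a worker thread, go straight back to accept."""
    for _ in range(max_connections):  # hypothetical bound; a real server loops forever
        conn, _addr = listener.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()


def demo_accept_loop() -> bytes:
    """Start the acceptor on a loopback port, send one request, return the reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    port = listener.getsockname()[1]

    acceptor = threading.Thread(target=serve, args=(listener, 1), daemon=True)
    acceptor.start()

    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(b"hello\n")
        reply = client.recv(1024)

    acceptor.join(timeout=5)
    listener.close()
    return reply


if __name__ == "__main__":
    print(demo_accept_loop())
```

The point to notice is how little the acceptor thread does per connection: one accept call and one handoff. Anything slower than that in the accept path limits how fast connections can be drained from the queue.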
When a flash mob arrives, the connection requests come in faster than the application can accept and dispatch them. The TCP/IP stack protects itself by limiting the number of pending connection requests, so the queue grows until it hits that limit and the stack has to start refusing new requests. At that point, your server is returning intermittent errors and you're already failing.
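To put rough numbers on that, here is a back-of-the-envelope model in Python. It assumes constant arrival and accept rates and an empty queue at the start, which real traffic never gives you. The function names and the sample figures (1,500 arrivals per second, 1,000 accepts per second, a limit of 128 pending connections) are invented for illustration, not measurements.

```python
from typing import Optional


def seconds_until_refusals(
    arrival_rate: float, accept_rate: float, backlog: int
) -> Optional[float]:
    """Time for the pending-connection queue to fill, or None if it never does.

    Simplified model: constant rates, queue starts empty.
    """
    surplus = arrival_rate - accept_rate  # connections/second the app falls behind
    if surplus <= 0:
        return None  # the application keeps up, so the queue never fills
    return backlog / surplus


def refused_per_second(arrival_rate: float, accept_rate: float) -> float:
    """Once the queue is full, everything beyond the accept rate is turned away."""
    return max(0.0, arrival_rate - accept_rate)


if __name__ == "__main__":
    # Hypothetical flash mob: 1,500 arrivals/s against 1,000 accepts/s, limit of 128.
    print(seconds_until_refusals(1500, 1000, 128))
    print(refused_per_second(1500, 1000))
```

With those made-up figures the queue fills in roughly a quarter of a second, after which about 500 connection attempts per second are turned away. A larger queue only buys time. It does not change the outcome as long as arrivals outpace accepts.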
The solution is much easier said than done: accept and dispatch connections faster than they arrive.
Filip Hanik compares some popular open-source servlet containers to see how well they stand up to floods of connection requests. In particular, he demonstrates the value of Tomcat 6's new NIO connector. Thanks to some very careful coding, this connector can accept 4,000 connections in 4 seconds on one server. Ultimately, he gets it to accept 16,000 concurrent connections on a single server. (Not surprisingly, RAM becomes the limiting factor.)
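The general idea behind a non-blocking connector can be sketched with Python's selectors module. To be clear, this is not Tomcat's code and not Filip's benchmark. It is a minimal sketch of readiness-based accepting, where one thread drains every pending connection each time the listening socket becomes readable. The names accept_burst and demo_burst, and the burst size of 20, are assumptions for the example.

```python
import selectors
import socket
from typing import List


def accept_burst(
    listener: socket.socket, expected: int, timeout: float = 5.0
) -> List[socket.socket]:
    """Accept up to `expected` connections using readiness notification."""
    listener.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ)
    accepted: List[socket.socket] = []
    try:
        while len(accepted) < expected:
            if not sel.select(timeout):
                break  # nothing became ready in time
            # Drain everything already queued before going back to select().
            while len(accepted) < expected:
                try:
                    conn, _addr = listener.accept()
                except BlockingIOError:
                    break  # queue is empty for now
                conn.setblocking(False)
                accepted.append(conn)
    finally:
        sel.unregister(listener)
        sel.close()
    return accepted


def demo_burst(n: int = 20) -> int:
    """Open n loopback connections, then accept them all from a single thread."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(n)
    port = listener.getsockname()[1]

    clients = [
        socket.create_connection(("127.0.0.1", port), timeout=5) for _ in range(n)
    ]
    conns = accept_burst(listener, n)
    count = len(conns)

    for s in conns + clients:
        s.close()
    listener.close()
    return count


if __name__ == "__main__":
    print(demo_burst())
```

Accepted sockets stay non-blocking, so in a fuller version the same selector could watch them for incoming data instead of parking a thread on each one. That is what lets the connection count grow well beyond the thread count, with memory per connection as the remaining cost.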
It's not clear that these connections can actually be serviced at that point, but that's a story for another day.