Felix von Leitner posted a benchmark in which OpenBSD did not scale that well. While there was a massive flame fest on the OpenBSD mailing list, I used my time to remeasure and optimize OpenBSD step by step. First, with the help of Markus Friedl, I sped up in_pcblookup(), the PCB lookup routine that the bindbench benchmark uses heavily. We now use a second hash list to speed up these lookups. Some first results can be found in bind.png. I use a logarithmic scale so that the curve of the new version does not stick to the x-axis of the plot. As you can see, even the optimized version rises as more connections are open. This rise comes from overloaded hash lists. I think that all other systems suffer from the same effect.

Also don't forget to raise the kern.maxfiles limit. Felix von Leitner's benchmark does no error checking, so once you hit that limit you will see a sudden drop in the time spent. This also explains the strange FreeBSD plot in Felix von Leitner's paper.

The bindbench tested more or less the connect(2) behavior of a machine, which is not that important, mainly because the SYN, SYN/ACK delay is often an order of magnitude larger. So I extended Markus Friedl's benchmark with an accept/connect benchmark to test the equally important accept behavior. The plot of a local run can be found in accept.png, and a version without logarithmic scale in accpet-nonlog.png.

The new in_pcblookup() is massively faster but still slow: a run with 20000 connections took 3 minutes. Where do we lose the time? Again in in_pcblookup(). tcp_input() calls in_pcblookup() to find the correct listening socket, if any. Our problem is that for this case in_pcblookup() still scales as O(N). I changed the way tcp_input() searches for listening sockets, and the run with 20000 connections dropped to 35 seconds execution time. I also tried some other optimizations, but had to realize that they were not that effective.
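To make the difference between the O(N) list walk and the hashed lookup more concrete, here is a minimal sketch in C. It is not the OpenBSD code: the names (struct pcb, pcb_table, pcb_lookup_hash(), pcb_lookup_listen()), the hash function and the bucket count of 128 are all assumptions made up for illustration. It only shows the general idea of hashing on the address/port 4-tuple and of finding a listening socket with two hash probes instead of a walk over all PCBs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCB_HASH_SIZE 128u  /* hypothetical bucket count, power of two */
#define ADDR_ANY      0u    /* wildcard address used by listening sockets */

/* Hypothetical, heavily simplified protocol control block. */
struct pcb {
    uint32_t laddr, faddr;   /* local and foreign address */
    uint16_t lport, fport;   /* local and foreign port */
    struct pcb *list_next;   /* link in the global list (O(N) walk) */
    struct pcb *hash_next;   /* link in the hash chain */
};

struct pcb_table {
    struct pcb *list;
    struct pcb *hash[PCB_HASH_SIZE];
};

/* Toy hash over the 4-tuple; a real kernel would use something better. */
static unsigned pcb_hash(uint32_t laddr, uint16_t lport,
                         uint32_t faddr, uint16_t fport)
{
    uint32_t h = laddr ^ faddr ^ ((uint32_t)lport << 16) ^ fport;
    h ^= h >> 16;
    h ^= h >> 8;
    return h & (PCB_HASH_SIZE - 1);
}

static int pcb_match(const struct pcb *p, uint32_t laddr, uint16_t lport,
                     uint32_t faddr, uint16_t fport)
{
    return p->laddr == laddr && p->lport == lport &&
           p->faddr == faddr && p->fport == fport;
}

/* Put a PCB on both the global list and its hash chain. */
void pcb_insert(struct pcb_table *t, struct pcb *p)
{
    assert(t != NULL && p != NULL);
    unsigned h = pcb_hash(p->laddr, p->lport, p->faddr, p->fport);
    p->list_next = t->list;
    t->list = p;
    p->hash_next = t->hash[h];
    t->hash[h] = p;
}

/* Old style: walk every PCB, O(N) in the number of sockets. */
struct pcb *pcb_lookup_list(const struct pcb_table *t, uint32_t laddr,
                            uint16_t lport, uint32_t faddr, uint16_t fport)
{
    for (struct pcb *p = t->list; p != NULL; p = p->list_next)
        if (pcb_match(p, laddr, lport, faddr, fport))
            return p;
    return NULL;
}

/* Hashed: walk only one chain, so cost depends on the chain length. */
struct pcb *pcb_lookup_hash(const struct pcb_table *t, uint32_t laddr,
                            uint16_t lport, uint32_t faddr, uint16_t fport)
{
    unsigned h = pcb_hash(laddr, lport, faddr, fport);
    for (struct pcb *p = t->hash[h]; p != NULL; p = p->hash_next)
        if (pcb_match(p, laddr, lport, faddr, fport))
            return p;
    return NULL;
}

/* Listener lookup: try the specific local address, then the wildcard. */
struct pcb *pcb_lookup_listen(const struct pcb_table *t, uint32_t laddr,
                              uint16_t lport)
{
    struct pcb *p = pcb_lookup_hash(t, laddr, lport, 0, 0);
    if (p == NULL)
        p = pcb_lookup_hash(t, ADDR_ANY, lport, 0, 0);
    return p;
}

/* Length of one chain; long chains are the "overloaded" case. */
size_t pcb_chain_len(const struct pcb_table *t, unsigned bucket)
{
    size_t n = 0;
    for (struct pcb *p = t->hash[bucket % PCB_HASH_SIZE]; p != NULL;
         p = p->hash_next)
        n++;
    return n;
}
```

With a fixed number of buckets the chains still grow linearly with the number of connections: in this sketch 20000 connections spread over the assumed 128 buckets give about 156 entries per chain on average, which is the kind of overload that makes the curve rise again.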
Those other optimizations gained little mainly because the cases they cover did not occur that often, at least with the benchmarks I was running. The machine running these tests was a Netra T1 440MHz with 1GB of RAM and kern.maxfiles set to 120000. I also decreased the send and receive space of TCP sockets so that the kernel did not run out of memory. This is only needed if you are trying to do more than 40000 connections. If you hit the limit, the machine becomes unusable and will probably panic because of resource starvation.
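A benchmark that opens tens of thousands of sockets should check for errors, otherwise hitting the descriptor limit goes unnoticed and distorts the timings. The following is a rough sketch of what such a check could look like; it is not taken from any of the benchmarks mentioned here. The helper names open_small_tcp_socket() and open_many() are made up, and shrinking the buffers per socket with SO_SNDBUF and SO_RCVBUF is only one possible way to reduce the send and receive space, chosen here as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open one TCP socket and request small send and
 * receive buffers. Returns the descriptor, or -1 with errno set.
 */
int open_small_tcp_socket(int bufsize)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        /* EMFILE: per-process limit, ENFILE: system-wide file table full */
        if (errno == EMFILE || errno == ENFILE)
            fprintf(stderr, "descriptor limit reached: %s\n",
                    strerror(errno));
        return -1;
    }
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize)) == -1 ||
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) == -1) {
        int saved = errno;      /* keep the setsockopt error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: open up to `want` sockets and stop at the first
 * failure instead of silently continuing. Returns how many were opened.
 */
int open_many(int *fds, int want, int bufsize)
{
    assert(fds != NULL);
    int n = 0;
    while (n < want) {
        int fd = open_small_tcp_socket(bufsize);
        if (fd == -1)
            break;              /* report a short count to the caller */
        fds[n++] = fd;
    }
    return n;
}
```

A caller would compare the return value of open_many() with the requested count and abort the measurement if it is short, rather than timing a run in which most calls failed immediately.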