Wednesday, September 16, 2015

Recipe for building a 10Mpps FreeBSD based router

First you need a server with at least 8 cores and a good NIC.
Here is my setup:

  • HP ProLiant DL360p Gen8: 8 cores Intel Xeon E5-2650 @ 2.60GHz
  • Quad port 10 Gigabit Chelsio T540-CR


The first step is to enter the UEFI setup and disable Hyper-Threading.
(Note to myself: I need to generate a benchmark comparing the impact of HT for router use.)

Once done, you can install FreeBSD on it… but not a stock 10.2!

By default, FreeBSD multi-queue NIC drivers create as many queues as there are cores (with a maximum of 16 for Chelsio).
This means that on an 8-core server, 8 queues are created:
  • Each queue gets its own IRQ
  • The NIC load-balances incoming/outgoing frames between these queues
By default, the NIC's load-balancing algorithm keeps each flow on the same queue, so you need lots of different flows (different src/dst IP addresses or TCP/UDP ports) for traffic to be distributed correctly among all these queues. Don't bench your setup with a single FTP flow, for example.
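
To illustrate why a single flow can't exercise all the queues, here is a minimal Python sketch of flow-based queue selection. It is only an illustration: real NICs compute their receive-side hash in hardware with their own algorithm, while this sketch assumes a SHA-256 hash of the 5-tuple taken modulo the queue count. The function name queue_for_flow and the addresses and ports are made up for the example.

```python
import hashlib
from collections import Counter

def queue_for_flow(src_ip, dst_ip, proto, src_port, dst_port, num_queues=8):
    """Map a flow (5-tuple) to a queue index.

    Illustrative only: SHA-256 stands in for whatever hash the NIC
    really implements in hardware.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_queues

# One FTP-like flow: every packet carries the same 5-tuple,
# so every packet lands on the same queue (and the same core).
single_flow_queues = {
    queue_for_flow("198.18.0.1", "198.19.0.1", "tcp", 40000, 21)
    for _ in range(1000)
}

# 2000 flows that differ only by source port: they spread over the queues.
many_flows = Counter(
    queue_for_flow("198.18.0.1", "198.19.0.1", "udp", 1024 + i, 9)
    for i in range(2000)
)

print("queues used by one flow:", len(single_flow_queues))
print("queues used by 2000 flows:", len(many_flows))
print("packets per queue:", sorted(many_flows.items()))
```

With one flow, seven of the eight queues stay idle whatever the packet rate, which is why a packet generator used for benchmarking has to vary addresses or ports.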

And FreeBSD runs into a problem here, because forwarding performance doesn't scale well beyond 4 queues/cores:



=> On this 8-core setup, you need to reduce the number of NIC queues to 4 for the best performance.

But this problem was recently resolved by Alexander V. Chernikov (melifaro) in his experimental projects/routing branch (lots of cleanup regarding fine-grained locking, and testing of new ideas):




Well… almost resolved: we see a big improvement and perfectly linear scaling up to 4 cores, but scaling is still not linear up to 8 cores.
Surprisingly, this non-linearity isn't related to the improved forwarding code, but to the new random entropy harvester recently brought into head, which collects the first 2 bytes of each frame under a single mutex.
Disabling the INTERRUPT and NET_ETHER entropy sources (by adding harvest_mask="351" to /etc/rc.conf) solves the problem:
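
For the curious, here is where the value 351 comes from. The mask is a bit field with one bit per entropy source, and 351 is simply "all sources" (511) with two bits cleared. The bit order in this Python sketch is an assumption on my part, based on the source list of the random harvester in head at that time; it may differ on other FreeBSD versions, so check your own system before reusing it.

```python
# Assumed bit order of the entropy sources (bit 0 first).
# This ordering is an assumption; verify it on your FreeBSD version.
SOURCES = [
    "CACHED", "ATTACH", "KEYBOARD", "MOUSE", "NET_TUN",
    "NET_ETHER", "NET_NG", "INTERRUPT", "SWI",
]

def harvest_mask(disabled):
    """Return the mask with every source enabled except those in `disabled`."""
    mask = (1 << len(SOURCES)) - 1  # all 9 bits set: 511
    for name in disabled:
        mask &= ~(1 << SOURCES.index(name))  # clear that source's bit
    return mask

mask = harvest_mask(["INTERRUPT", "NET_ETHER"])
print(mask, format(mask, "09b"))  # 351 101011111
```

In other words: 511 - 128 (INTERRUPT) - 32 (NET_ETHER) = 351.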


=> 9.5Mpps on this setup!
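
To put a packet-per-second figure in perspective, here is a small Python sketch converting between packet rate and bandwidth on the wire. It assumes minimum-size 64-byte Ethernet frames (the usual worst case for router benchmarks; the frame size is my assumption here) and adds the 20 bytes that each frame costs on the wire on top of its own length (8 bytes of preamble and 12 bytes of inter-frame gap). The function names are made up for the example.

```python
# Per-frame overhead on the wire: 8 bytes preamble/SFD + 12 bytes inter-frame gap.
WIRE_OVERHEAD_BYTES = 20

def line_rate_pps(link_bps, frame_bytes):
    """Maximum frames per second a link can carry for a given frame size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)

def wire_bps(pps, frame_bytes):
    """Bandwidth used on the wire by `pps` frames of `frame_bytes` each."""
    return pps * (frame_bytes + WIRE_OVERHEAD_BYTES) * 8

# A 10 Gigabit port saturated with 64-byte frames carries about 14.88 Mpps.
print(f"{line_rate_pps(10e9, 64) / 1e6:.2f} Mpps")   # 14.88 Mpps

# 9.5 Mpps of 64-byte frames is about 6.4 Gbit/s on the wire.
print(f"{wire_bps(9.5e6, 64) / 1e9:.3f} Gbit/s")      # 6.384 Gbit/s
```

With bigger frames, the same packet rate means far more bandwidth, which is why packet rate rather than Gbit/s is the meaningful figure for a router.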

And for non-network people who don't relate to a value in "packets per second", here is a different graph showing the impact on forwarding performance of enabling ipfw or pf, with the equivalent IMIX throughput on the right side:




Now, how to reach 10Mpps? Just use a slightly more powerful CPU ;-)

And if you want to test this new performance level on your own hardware, you just need a USB flash disk on which to install these BSD Router Project (nanobsd) EXPERIMENTAL images, the ones used for these benchmarks. It's just a dd to the USB flash disk, and more installation instructions are on the BSDRP web site.