10GbE
So Where Did 10GbE Go?
15/05/11 08:05
First, some background. During 2009 this site saw over 20,000 page views, and at one point The Register even used 10GbE.net as an expert reference, mentioning and linking to us to validate a story. This sent 10GbE.net rocketing into the top four results when searching for the string “10GbE”. 10GbE.net and its sister site, 40gbe.net, were started in December 2007 as a one-man stealth marketing effort to help get 10GbE off the ground more quickly. Let’s face it: my job is to sell 10GbE network adapters, so I created the site to drive traffic and improve sales. It did both.
Initially 10GbE.net hosted several pages listing all the currently available network adapters by type, interface, performance & price: a sort of bleeding-edge Consumer Reports, first for adapters and later for switches. As the traffic and attention grew, it became more uncomfortable to operate in stealth, and the site had also become a huge sink for my spare time. After I spoke with one of my mentors, we decided to pull the plug on both sites.
Today the market is different, and more perspective is needed. I’ve recast this site and cross-linked my 10GbE.net domain to 40GbE.net to form this new Extreme Performance Networking blog. From this vantage point I can share more information and my unique perspective while remaining in full view.
9.9856Gbps It's the Law
17/03/09 07:52
Someone said yesterday that we were wrong and that the speed limit for 10GbE was really 10.3125Gbps. Dumbstruck, I set out to validate our claim of 9.95Gbps.
Most NICs talk XAUI to a PHY driver chip that then controls the media. XAUI has four pairs of receive/transmit lanes, each operating at 3.125Gbps and using 8b/10b encoding, so after decoding you have 4 × 3.125Gbps × 8/10 = 10.000Gbps of actually usable media bandwidth, hence 10G. (The 10.3125Gbps figure is the serial line rate of 10GBASE-R, which uses 64b/66b encoding; after decoding, it also yields 10.000Gbps.) Ethernet adds its own shipping and handling fee, though, so people who measure bandwidth using the operating system (OS) on their server will never see the full capacity of the pipe.
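Both line rates reduce to the same usable bandwidth once the encoding overhead is removed. A small sketch of that arithmetic, assuming the 10.3125Gbps figure refers to the 10GBASE-R serial line rate (64b/66b) while XAUI uses four 8b/10b lanes; the helper name `usable_gbps` is hypothetical.

```python
# Usable data rate = lanes x per-lane signalling rate x (data bits / coded bits).
# Exact fractions avoid floating-point noise in the comparison.
from fractions import Fraction


def usable_gbps(lanes: int, lane_gbaud: Fraction, data_bits: int, coded_bits: int) -> Fraction:
    """Data rate left after the line code's overhead is stripped."""
    return lanes * lane_gbaud * Fraction(data_bits, coded_bits)


# XAUI: four lanes at 3.125 Gbaud each, 8b/10b encoded.
xaui = usable_gbps(4, Fraction("3.125"), 8, 10)
# 10GBASE-R serial: one lane at 10.3125 Gbaud, 64b/66b encoded.
serial = usable_gbps(1, Fraction("10.3125"), 64, 66)

print(float(xaui), float(serial))  # 10.0 10.0
```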
Ethernet traffic then travels over XAUI, and Ethernet requires an interframe gap (IFG) of 96 bits; more precisely, the time it would take to transmit 96 bits (12 octets), which on 10GbE is 9.6ns. It turns out, though, that the 10GbE spec actually redefined the gap to 40 bits (5 octets), or 4.0ns. So between every packet there is 4ns of air.
Ethernet also requires a seven-byte preamble and a one-byte start-of-frame delimiter (SFD); that's 64 more bits, or another 6.4ns of air. So between any two actual "packets" on a 10GbE wire there is a total of 10.4ns of dead air. How does that affect the bandwidth the OS can actually measure?
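The gap timings above all come from one fact: at 10Gbps a bit lasts 0.1ns, so an octet lasts 0.8ns. A minimal sketch (the helper name `wire_time_ns` is hypothetical) that converts the 5-octet gap and the 8 octets of preamble plus SFD into nanoseconds, using exact fractions.

```python
from fractions import Fraction

BITS_PER_NS = 10  # 10Gbps is 10 bits per nanosecond


def wire_time_ns(octets: int) -> Fraction:
    """Time a run of octets occupies on a 10Gbps wire, in nanoseconds."""
    return Fraction(octets * 8, BITS_PER_NS)


ifg_ns = wire_time_ns(5)           # 40-bit gap used in the post
preamble_sfd_ns = wire_time_ns(8)  # 7-octet preamble + 1-octet SFD
print(float(ifg_ns), float(preamble_sfd_ns), float(ifg_ns + preamble_sfd_ns))  # 4.0 6.4 10.4
```

Passing 12 octets instead gives the 9.6ns of the classic 96-bit gap.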
Let's look at the best case in detail: a jumbo frame. First comes the 7-octet preamble and the 1-octet start-of-frame delimiter, then what would be called the "jumbo frame" itself, which is 9,018 octets (really a 9,000-octet payload with a 14-octet header in front and a 4-octet CRC at the end), then the IFG, 5 octets in the case of 10GbE. When people measure NIC bandwidth they typically count only the "packet", the 9,018-byte part, and often don't know about the other 13 required bytes. So a 10GbE NIC running jumbo frames can never achieve more than (9018/9031)*10Gbps, or 9.9856Gbps, and a good wire-rate NIC will demonstrate this.
Now for the worst case: the dreaded "64-byte" frame, which has the same 13 bytes of overhead between frames, so the calculation is (64/77)*10Gbps, or 8.3117Gbps. So NEVER expect your 10GbE NIC to deliver 10Gbps using 64-byte frames (which really contain only 46 bytes of actual data), because Ethernet will shackle it to 8.3117Gbps. If your application measures only the actual payload, then it's (46/77)*10Gbps, or 5.97Gbps.
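The same arithmetic works for any frame size. A sketch whose constants mirror the numbers above (8 octets of preamble plus SFD, a 5-octet IFG, 18 octets of header plus CRC); the function names are illustrative, and the `ifg` parameter is an addition so you can try the classic 12-octet gap instead.

```python
PREAMBLE_SFD = 8   # 7-octet preamble + 1-octet start-of-frame delimiter
IFG = 5            # inter-frame gap in octets, as used in the post
HEADER_CRC = 18    # 14-octet Ethernet header + 4-octet CRC
LINE_GBPS = 10.0


def max_frame_gbps(frame_octets: int, ifg: int = IFG) -> float:
    """Ceiling seen by a tool that counts whole frames (header + payload + CRC)."""
    return LINE_GBPS * frame_octets / (frame_octets + PREAMBLE_SFD + ifg)


def max_payload_gbps(frame_octets: int, ifg: int = IFG) -> float:
    """Ceiling seen by an application that counts only the payload."""
    return LINE_GBPS * (frame_octets - HEADER_CRC) / (frame_octets + PREAMBLE_SFD + ifg)


for size in (64, 9018):
    print(size, round(max_frame_gbps(size), 4), round(max_payload_gbps(size), 4))
```

Run as written, it reproduces the 8.3117Gbps and 5.97Gbps figures for 64-byte frames and 9.9856Gbps for jumbo frames.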
So if you drive a 9,000-byte tractor-trailer you can speed along at 9.9856Gbps, but if you're tooling along on your little 64-byte rice rocket you'll never get above 5.97Gbps.
Dualies Aren't Just for Trucks
04/02/09 07:41
One would think that after 30 years our industry would have developed a NIC naming convention for "dual-port." Does a dual-port NIC mean your OS sees one interface or two? Does it mean that one port is active and the other is for failover? Can a dual-port NIC run traffic through both ports simultaneously? It all depends on whom you talk to and the product they're selling.
With 10GbE we've seen three main approaches for building dual-port NICs:
Active/Active: this is what most people expect, namely a single OS interface whose driver sprays traffic fairly evenly across both network ports; if one port fails, the other picks up the slack until it can handle no more:
- Chelsio's N320E for $790 is an example of this type of card.
- Intel's AF DA card for $799 appears to be another example of this class of card.
Dual-NIC: two interfaces are presented to the OS, and both run independently. This typically affords the best performance and the most flexibility:
- Myricom's 10G-PCIE2-8B2-2S+E for $995 appears to be the only example of this approach. Myricom puts two separate 10GbE controllers on the same PCI Express Gen2 NIC and uses a PCI Express bridge chip to split the slot into two distinct NIC devices.
Active/Passive or Active/Fail-over: a single OS interface with a driver that monitors connectivity on the active port and if the connection fails the driver migrates traffic rapidly over to the second port:
- Myricom's 10G-PCIE-8B-2S+E for $795 is an example of this type of card. Its failover time is under 10 microseconds.
- Chelsio's B320E Bypass adapter for $3,483 is similar, but it can detect an OS, BIOS, or system failure and make a hard switchover to the second port.
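To make the two single-interface modes concrete, here is a minimal sketch of how a driver might choose an egress port in each. It is not any vendor's actual driver logic: `Port`, `active_active` and `active_passive` are hypothetical names, hashing each flow to a port (rather than spraying individual packets) is an assumption made here to keep a flow's packets in order, and the "until it can handle no more" overload behavior is not modeled. Dual-NIC mode needs no selection logic at all, since the OS sees two independent interfaces.

```python
# Hypothetical sketch of egress-port selection for two dual-port modes.
from dataclasses import dataclass
from typing import List, Optional
from zlib import crc32


@dataclass
class Port:
    name: str
    link_up: bool = True


def active_active(ports: List[Port], flow_id: bytes) -> Optional[Port]:
    """Spread flows across all live ports by hashing a flow identifier.
    If one port is down, every flow lands on the survivor."""
    live = [p for p in ports if p.link_up]
    if not live:
        return None
    return live[crc32(flow_id) % len(live)]


def active_passive(ports: List[Port]) -> Optional[Port]:
    """Use the primary port while its link is up; otherwise fail over."""
    primary, standby = ports
    if primary.link_up:
        return primary
    return standby if standby.link_up else None


# Usage: fail the primary and watch both modes move traffic to port1.
ports = [Port("port0"), Port("port1")]
print(active_passive(ports).name)  # port0
ports[0].link_up = False
print(active_passive(ports).name)  # port1
print(active_active(ports, b"10.0.0.1:5001->10.0.0.2:80").name)  # port1
```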
Do the above categories cover it, or do we need more lingo? When looking for a dual-port NIC, what features do you require, and what do you expect? Please let us know.
P.S. As I brought this page back online I left off the links as most no longer apply, but from a historical perspective it is interesting to see how things have progressed.
Thinning the 10GbE Herd
10/01/09 07:33
In 2007 over one million 10GbE network ports were purchased. Many of those were for switch-to-switch interconnects, but some connected servers to networks via 10GbE. Natural selection is now taking effect in the 10GbE NIC market as the big dogs, Intel & Broadcom, start thrashing around to secure market share as 10GbE matures. Both want to dominate the 10GbE LAN on Motherboard (LoM) market. In the NIC market, four companies likely supply over 80% of the 10GbE NICs purchased: Chelsio, Intel, Myricom and Neterion. The remaining 20% of NIC sales fall to companies like Broadcom, SMC, NetXen, ServerEngines, Tehuti, AdvancedIO, Endace, Napatech, etc. If you're wondering why Broadcom is in the second group, it's because Broadcom focuses on selling 10GbE silicon to OEMs like IBM and HP for LoM designs on high-end server motherboards, not on retailing NIC cards.
Officially, the first documented victim is NetEffect, the leader in iWARP (InfiniBand-style RDMA over 10GbE) NICs. NetEffect rose from the ashes of a failed InfiniBand company, Banderacom, earlier this decade to apply its silicon development skills and InfiniBand algorithms to the more stable Ethernet market as a new feature called iWARP. NetEffect in fact led the iWARP charge; it was the self-proclaimed leader in low-latency iWARP 10GbE NICs. In August NetEffect filed for reorganization in US Bankruptcy Court. With the failure of NetEffect, the market has cast its vote and driven a stake through the heart of iWARP, hopefully terminating this feature.
Rumors have been swirling around Teak Technologies, a maker of 10GbE NICs and a switch, for some time. It appears that Teak has not weathered the storm and has since faded away; its domain name no longer resolves to an IP address. The domain was never transferred from the founder, and this spring the founder announced on LinkedIn that he had moved on some time ago. Is this conclusive evidence? No, but would you buy technology from a tech company whose URL won't resolve to a server?
It is a tough economic climate for startup NIC companies, particularly those in the bottom 20%, as they have likely never had a quarter in the black. Now is a challenging time to be out seeking another round of capital from one's VCs. Several have gone without an injection of new funding for over two years and lack the sales volume required to sustain themselves much beyond year end. As such, we've directly questioned one firm to see if it is still alive, and another that is widely rumored in the industry to be in trouble, but their marketing departments are still bailing.
Shake & Bake - Conduct a Bake-off
20/09/08 07:06
Have you ever held a bake-off to select a core technology for a project? Not an RFI, but an actual, honest-to-god series of "real world" tests. Few things are as exciting as setting up a technology obstacle course that is roughly indicative of your business environment, then having various vendors run through it. Several times in my past I've conducted these, when technologies like server UPS systems and VoIP telephony were new, to shake out the posers from the players, evaluate "real-world" performance, and then determine value.
Few vendors post actual price and performance data on the web, let alone the methodology they used to arrive at those performance numbers. If only there were an independent third party that ran Netperf, Iperf, ntttcps, ntttcpr and other tools on all the available 10GbE NICs using the same test systems, then posted the results for everyone to see. Some companies would never recover. The vendors won't do it themselves, for legal reasons and because in most cases they don't want to: the results would help only one or two companies, and likely not theirs. Today all most consumers have to go on is the cost of the adapter. Wouldn't it be great to know an adapter's cost/Mbps before buying it, so you could easily compare adapters? Some would argue that features like iWARP and TOE should be factored in, but today they are just marketing fluff and rarely deliver any significant end-user value.
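Once a bake-off has produced throughput numbers, cost/Mbps is simple to compute and rank. A minimal sketch; `cost_per_mbps` is a hypothetical helper, and the adapter names, prices and throughputs are placeholders, not measured results.

```python
def cost_per_mbps(price_usd: float, measured_gbps: float) -> float:
    """Dollars paid per Mbps of measured throughput."""
    return price_usd / (measured_gbps * 1000)


# Placeholder bake-off results: (price in USD, measured Gbps).
candidates = {
    "adapter_a": (800.0, 9.4),
    "adapter_b": (1200.0, 9.9),
}

# Cheapest bandwidth first.
ranked = sorted(candidates, key=lambda name: cost_per_mbps(*candidates[name]))
for name in ranked:
    price, gbps = candidates[name]
    print(f"{name}: ${cost_per_mbps(price, gbps):.4f}/Mbps")
```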
So how do you determine which NIC will perform best and deliver the most value for your company? Do a bake-off! If you can make the time and the project is big enough, the cost of conducting the bake-off should easily be offset by the savings, education, and performance gains you reap over time. A well-constructed and well-executed bake-off will also demonstrate, not only to you but to your management, that you're an effective individual and a good steward of the company's resources.
Finally, share the full set of results with the vendors that participated. Some will moan and groan, while others will kindly thank you for the opportunity to compete and move on. If the race was close, their reactions at this point might be your deciding factor. So pull on your oven mitts and start baking...
Hidden Costs & Benefits of 10GbE
13/08/08 18:58
When embarking on a new IT project one rarely considers the network, unless of course the network is the project. Data networks in many cases get the same level of attention as AC power: you expect plenty to be available, all the time, without interruption. Rarely is the network considered a performance bottleneck.
One time I assumed responsibility for improving the performance of an MS SQL server that was vital to our business. The primary job this server ran took 75 minutes, and it was scheduled to run (how many of you see this coming?) every hour! This server was tracking and reporting on tens of millions of dollars in new business every month. At first glance I noticed several back-to-back-to-back bottlenecks: the system was memory starved, the drives were in a near-constant state of thrashing, and all SQL I/O from the system went through a $10 NIC. Although the NIC functioned, it was forcing the switch to drop far too many packets. At lunch that day we picked up a newer server-class NIC for $40 and immediately recorded a substantial performance improvement: the job now finished in just under the 60 minutes allowed. We could have spent the next week chasing performance curves; instead we installed a new server, a dual-processor, single-core box, and the job then completed in well under a minute. So a $40 NIC improved performance by 20%, while replacing the whole server for roughly $5,000 improved performance by 98%. Clearly the NIC delivered the biggest bang for the buck, but it merely brought the network performance curve in line with that of the CPU, memory & disk.
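The two percentages follow from the job times. A small check, treating "just under 60 minutes" as 60 and "well under a minute" as 1 minute (both assumptions), so the results are lower bounds on the real improvement; `pct_reduction` is a hypothetical helper.

```python
def pct_reduction(before_min: float, after_min: float) -> float:
    """Percentage reduction in job run time."""
    return 100 * (before_min - after_min) / before_min


print(pct_reduction(75, 60))           # 20.0  ($40 NIC)
print(round(pct_reduction(75, 1), 1))  # 98.7  (new $5,000 server)
```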
How many dual-socket quad-core servers were installed today, August 13th 2008, with GbE? These servers have 4X the horsepower of my $5,000 server from 2002, yet both share the same GbE. Furthermore, today we use VMware and Xen to pack several logical servers into a single physical server in an effort to utilize our hardware resources more efficiently. We don't hesitate to add more memory or disk, but adding a 10GbE board requires substantially more effort and planning.
When making the jump from GbE to 10GbE, one needs to select not only a NIC but also the media (CX4 or fiber) and a new switch infrastructure. High-performance NICs run $700-$2,000 each, depending on the media and vendor. If you go with fiber, the optics run $500-$3,000 each, and you need one on each end of the cable. Finally there's the switch: stackable layer-2 switches run in the $400-$1,200/port range, while enterprise layer-3 switches often run several thousand dollars per port.
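Totaling those ranges gives a rough per-server figure. A sketch assuming each server needs one NIC, one stackable layer-2 switch port and, for fiber, two optics (one at each end of its cable, both charged to that server); the layer-3 case is left out because no per-port figure is given, and `upgrade_cost` is a hypothetical helper.

```python
from typing import Tuple

# (low, high) price ranges in USD, from the paragraph above.
NIC = (700, 2000)
OPTIC = (500, 3000)          # fiber only
L2_SWITCH_PORT = (400, 1200)


def upgrade_cost(fiber: bool) -> Tuple[int, int]:
    """Low and high per-server cost of moving one server to 10GbE."""
    optics = 2 if fiber else 0  # one optic at each end of the cable
    low = NIC[0] + optics * OPTIC[0] + L2_SWITCH_PORT[0]
    high = NIC[1] + optics * OPTIC[1] + L2_SWITCH_PORT[1]
    return low, high


print(upgrade_cost(fiber=False))  # (1100, 3200)  CX4 copper
print(upgrade_cost(fiber=True))   # (2100, 9200)  fiber
```

The CX4 low end sits close to the roughly $1,200 per-server upgrade used in the return-on-investment example.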
If your server is I/O bound, a good 10GbE NIC and switch can enable 5-10X the output of the "free" GbE port that comes with your server. Suppose you purchase a new server for $5,000, then add a high-performance 10GbE CX4 copper NIC and use a low-cost layer-2 switch, so the upgrade to 10GbE costs roughly $1,200 for this server. You need to measure only about a 25% gain in overall performance to realize a positive return on your investment! A new breed of hybrid switches now offers 24 GbE ports and four 10GbE ports, so one can easily shift servers from GbE to 10GbE. Consider giving 10GbE a try.
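One way to arrive at the 25% figure, assuming "return" means work delivered per dollar: if a server costing S dollars does work W, the upgraded server does (1+g)W for S+U dollars, and it breaks even when (1+g)/(S+U) = 1/S, that is, g = U/S. With S = $5,000 and U = $1,200 this gives 24%, roughly 25%. A tiny sketch (`break_even_gain` is a hypothetical name):

```python
def break_even_gain(server_cost: float, upgrade_cost: float) -> float:
    """Fractional performance gain at which the upgraded server delivers
    the same work per dollar as the original server."""
    return upgrade_cost / server_cost


print(break_even_gain(5000, 1200))  # 0.24
```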