GZIP compression plays a critical role in smashing monstrous energy costs and increasing the performance of big data systems. With compressed data, systems require less energy and fewer computing resources to store, transmit, and process information. Consequently, many applications have adopted GZIP compression, including web page servers, WAN optimization, financial trading systems, data storage, genomics, and even Hadoop. However, some users avoid GZIP because CPUs can become heavily loaded, decreasing network performance and energy efficiency. Fortunately, GZIP hardware can offload data compression from CPUs, creating many benefits in big data systems:
- Increases System Performance:
Offloading compression widens I/O bottlenecks by increasing effective data rates; it can remove CPU-limited bottlenecks, optimize workloads, and create better end-user experiences.
- Reduces OpEx Costs:
Compared to CPUs performing compression, the energy break-even point of GZIP hardware is on the order of weeks.
- Reduces CapEx Costs:
As a result of increased performance, systems will require fewer compute nodes since CPUs operate more efficiently.
By using GZIP hardware, CPU offloading increases network I/O efficiency, optimizes workloads, and lowers both operational and capital costs.
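To make the compression ratios discussed in the experiment concrete, here is a minimal Python sketch (our illustration, not part of AHA's setup) that performs CPU-based GZIP compression with the standard library and reports the resulting ratio; the sample page content is hypothetical:

```python
import gzip

# Hypothetical payload: repetitive HTML, which compresses well.
page = b"<html><body>" + b"<p>Hello, big data!</p>" * 500 + b"</body></html>"

# CPU-based GZIP compression at the default level.
compressed = gzip.compress(page)

ratio = len(page) / len(compressed)
print(f"{len(page)} bytes -> {len(compressed)} bytes ({ratio:.1f}:1)")
```

Real-world ratios depend heavily on the data; text-like content such as HTML often compresses in the range the experiment observed.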
GZIP Hardware Offloading Experiment
To demonstrate the benefits of GZIP, AHA created a hardware offloading experiment using a web page server. A hardware configuration diagram of the experiment is shown below. In this experiment, the client emulator floods the Apache web server with page requests. To provide performance comparisons, the web server changes its page responses using no GZIP, CPU GZIP, and hardware GZIP. Hardware GZIP compression is provided by two AHA372 accelerators. Additionally, a power meter measures the watts used by the system. Through this configuration, we can demonstrate how GZIP compression affects bottlenecks, optimizes workloads, and reduces operational and capital costs.
For performance comparisons, we show a strip chart with rolling averages of the following metrics:
- Effective Throughput of the 10 Gig-E link
- CPU Load of the Apache Web Server
- Energy Efficiency, in Joules per Gigabit transmitted, of the Apache Web Server
The strip chart below shows these metrics captured in the bars labeled: Throughput, CPU Load, and Energy Efficiency using no GZIP, CPU GZIP, and AHA GZIP.
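The energy efficiency metric follows directly from the power meter and throughput readings. A small Python sketch of the conversion (the function name and the sample numbers are illustrative, not the experiment's measured values):

```python
def joules_per_gigabit(power_watts: float, throughput_gbps: float) -> float:
    # 1 watt = 1 joule/second, so dividing watts by gigabits/second
    # yields joules per gigabit transmitted.
    return power_watts / throughput_gbps

# Illustrative readings: a 350 W server moving 35 Gbps of effective traffic.
print(joules_per_gigabit(350.0, 35.0))  # 10.0 J/Gbit
```

Lower values are better: the same power budget moves more data.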
Increases System Performance
Before we begin analyzing performance, notice the drop in throughput and increase in CPU Load between the no GZIP and CPU GZIP cases. It’s easy to see why some users are turned off from using GZIP compression. The CPU cannot simultaneously serve pages and perform compression efficiently.
Hardware GZIP has a huge 18x throughput and 17x energy efficiency advantage over CPU GZIP. Even with an eight-core CPU at 100% load, the web server is only capable of 6% of the throughput achieved with the GZIP hardware.
Let’s look at the throughput bar to see how system performance increases by reducing I/O bottlenecks. On the 10 Gig-E link, there is over a 4x increase (from 9 Gbps to 35 Gbps) in effective throughput from the no GZIP case to the hardware GZIP case. In the demo, we used two AHA372s, which have an aggregate compression rate of 40 Gbps. However, the hardware GZIP case reaches an effective throughput of 35 Gbps because the data has a compression ratio on the order of 3.5:1. Since our maximum link throughput is 10 Gbps, we reach the maximum effective data rate possible for the given compression ratio (10 Gbps link * 3.5:1 compression ratio = 35 Gbps effective throughput).
When offloading compression to widen bottlenecks, knowing the compression ratio of the data is important in order to select the compression rate of the hardware for the application. In this experiment, we used two AHA372s for a maximum compression rate of 40 Gbps and an effective throughput of 35 Gbps. (Alternatively, we could have used a single AHA374 for 40 Gbps.) If the data had a higher compression ratio, such as 6:1 or 8:1, we would require faster hardware, such as the AHA378, to reach the maximum effective data rate possible for those ratios (60 Gbps and 80 Gbps, respectively).
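The effective-throughput arithmetic above can be sketched in a few lines of Python (the function name is ours, for illustration):

```python
def effective_throughput_gbps(link_gbps: float, ratio: float) -> float:
    # A link carrying compressed data delivers (link rate x compression
    # ratio) of uncompressed, i.e., effective, data.
    return link_gbps * ratio

# Values from the experiment: 10 Gig-E link, 3.5:1 compression ratio.
print(effective_throughput_gbps(10, 3.5))  # 35.0 Gbps

# Higher ratios demand faster compression hardware to keep the link full.
for r in (6.0, 8.0):
    print(effective_throughput_gbps(10, r))  # 60.0, then 80.0 Gbps
```

The compression hardware must sustain at least this effective rate, which is why higher-ratio data calls for faster accelerators.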
With GZIP offloading, systems create better end-customer experiences (i.e., more pages served more quickly). By examining the CPU load bar, we can see how widening the I/O bottleneck, i.e., increasing effective throughput, allows the CPU to serve more pages. In this case, the CPU load increased (from 15% to 59%) from no GZIP to hardware GZIP. As a result of this widening, the web server is able to serve almost 4x more pages with twice the energy efficiency. GZIP offloading frees the CPU to perform other tasks, creating better end-customer experiences.
Reduces OpEx Costs
Using the data from the experiment, we can calculate when the cost of the energy saved equals the cost of the GZIP hardware. Let’s call this the energy break-even point (eBEP). The eBEP is calculated in the table below, where the savings are calculated against the No GZIP and CPU GZIP cases. In areas where the cost of electricity is high (e.g., Japan), compression hardware can pay for itself many times over.
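As a sketch of how such a break-even point can be computed (all input numbers below are hypothetical placeholders, not the measured values from the table):

```python
def energy_break_even_days(hardware_cost_usd: float,
                           watts_saved: float,
                           usd_per_kwh: float) -> float:
    # Energy saved per day (kWh) times the electricity price gives the
    # daily savings; break-even is when savings equal the hardware cost.
    kwh_per_day = watts_saved * 24 / 1000
    return hardware_cost_usd / (kwh_per_day * usd_per_kwh)

# Hypothetical inputs: $1000 of hardware saving 2 kW at $0.30/kWh.
print(f"{energy_break_even_days(1000, 2000, 0.30):.0f} days")
```

Note how the break-even point scales inversely with the electricity price, which is why high-cost regions see the fastest payback.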
Reduces CapEx Costs
In addition to saving energy costs, compression hardware reduces the number of compute nodes required, which means lower operational and capital costs. Capital cost savings can be determined by matching the performance of the no GZIP and hardware GZIP cases. To exceed the throughput of hardware compression (35 Gbps), four CPUs using no GZIP are required (9 Gbps * 4 CPUs = 36 Gbps). Likewise, matching the hardware GZIP case with CPU GZIP would require 18 CPUs (18 CPUs * 2 Gbps = 36 Gbps). Additional CPUs mean additional support hardware, i.e., motherboards, enclosures, and network equipment. Systems with hardware GZIP are more efficient, requiring lower capital and operational expenses.
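The node-count comparison can be sketched as follows, using the per-node throughputs measured in the experiment (the helper function itself is our illustration):

```python
import math

def nodes_required(target_gbps: float, per_node_gbps: float) -> int:
    # Round up, since fractional compute nodes are not possible.
    return math.ceil(target_gbps / per_node_gbps)

# Per-node throughputs from the experiment: 35 Gbps (hardware GZIP),
# 9 Gbps (no GZIP), 2 Gbps (CPU GZIP).
print(nodes_required(35, 9))  # 4 nodes needed without GZIP
print(nodes_required(35, 2))  # 18 nodes needed with CPU GZIP
```

One hardware-accelerated node thus stands in for several unaccelerated ones, and every avoided node also avoids its supporting hardware.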
Summary and Conclusion
GZIP compression hardware offloading is a clear winner when it comes to energy cost reduction and performance improvement. Hardware compression creates more efficient network I/O, which optimizes workloads on computing systems. More efficient network I/O reduces energy costs, and with optimized workloads, fewer computing nodes are required, resulting in lower capital and operating costs. GZIP hardware offloading increases system performance and reduces OpEx and CapEx. Although big data applications can consume a monstrous amount of resources, GZIP hardware increases performance and smashes costs.