The myriad electronic devices and “things” connected to the Internet create a digital universe comprising data about anything and everything that is happening: in a factory, throughout a restaurant chain, across a smart city, or within a research project. Extracting value from the data has the potential to improve business performance, accelerate medical advancement, enhance security and law enforcement, and improve the daily lives and social-media experiences of millions of digital consumers around the world, to name a few.
Traditional databases and querying tools are not able to handle the sheer volume of data, the diversity of data types, and the complex ways of combining and manipulating the data to gain desired insights. Accordingly, new types of data sources, log tools (such as Splunk) and visualization tools (like Tableau) are emerging that are better suited to analytic workloads.
Adapting to Data-Analytics Workloads
Big Data analytics tools access and use data differently from the transactional applications that have historically queried databases. Whereas transactional applications work on small quantities of data in a sequential, indexed fashion, applications such as SAP predictive analytics retrieve data in bulk and can apply a wide variety of analysis techniques. In addition, the rapidly increasing number of devices in users’ hands and things connected to the IoT is generating data at an exponentially accelerating rate.
Combined, the exploding volume of data and the complexity of the workloads are outstripping the performance of established data center compute platforms. Customers are demanding the insights generated by analytics applications quickly, sometimes even in real time for business or financial purposes, and traditional architectures cannot keep pace. Conventional approaches to boosting performance are falling short, as both the CPU advancement associated with Moore’s Law and the operating-frequency increases once enabled by Dennard scaling have slowed in recent years. At the same time, simply adding extra blades or cabinets is increasingly difficult from the point of view of cost, space, and power.
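The contrast drawn above between transactional and analytic access can be sketched with Python's built-in sqlite3 module. This is only an illustration on a small in-memory table: the table name, columns, and row counts are assumptions made for the example, and SQLite stands in for the databases discussed in the article.

```python
import sqlite3

def build_demo_db(n_rows=10_000):
    """Create an in-memory table of hypothetical device readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (id INTEGER PRIMARY KEY, device INTEGER, value REAL)"
    )
    rows = ((i, i % 50, float(i % 97)) for i in range(n_rows))
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan for a query (the 'detail' column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

conn = build_demo_db()

# Transactional style: fetch one row through the primary-key index.
POINT_LOOKUP = "SELECT value FROM readings WHERE id = ?"
# Analytic style: read every row and aggregate per device.
BULK_AGGREGATE = "SELECT device, AVG(value) FROM readings GROUP BY device"

point_plan = query_plan(conn, POINT_LOOKUP, (1234,))
bulk_plan = query_plan(conn, BULK_AGGREGATE)

point_value = conn.execute(POINT_LOOKUP, (1234,)).fetchone()[0]
per_device = conn.execute(BULK_AGGREGATE).fetchall()

print("point lookup plan:", point_plan)
print("bulk aggregate plan:", bulk_plan)
```

The point lookup touches a single row through the index, while the aggregate has to scan the whole table. It is the second pattern, repeated over far larger data sets, that stresses CPU-only platforms.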
View from the Inside
Experts in high-performance computing are driving the development of new platforms that leverage hardware acceleration within heterogeneous architectures comprising flexible combinations of conventional CPUs, Graphics Processing Units (GPUs), and Field-Programmable Gate Arrays (FPGAs). A new generation of compute accelerators is emerging, which takes advantage of the individual strengths of each type of processor to deliver significant improvements in performance, power, and space efficiency.
Chris Kachris, CEO of InAccel, explains that he sees heterogeneous computing based on hardware accelerators like FPGAs becoming the main platform for the efficient execution of data analytics. “Typical processors cannot sustain the increased computation complexity of analytics applications,” he says, “while an FPGA can provide the flexibility of the processors through the utilization of ready-to-use modules and the performance of specialized chips.”
InAccel creates high-performance accelerators for use as IP in machine learning, financial, or other applications, including accelerators for Apache Spark-based applications. These accelerators can be deployed on FPGA IaaS platforms. They enhance the performance of mission-critical applications and significantly reduce customers’ ownership costs.
“FPGAs can provide much higher performance than other platforms and also are adaptable enough to meet future requirements and future algorithms,” Kachris adds. “We can deploy FPGAs in a library-based scheme without the need to change the original applications.”
Heterogeneous compute platform
Brad Kashani of Bigstream provides more detail on the types of workloads that suit GPUs and FPGAs. “Operations like dense matrix multiply, or where memory needs are very high, might be appropriate for GPUs,” he says. “On the other hand, sparse matrix operations, SQL operations such as filter, join, or sort, and applications that involve manipulating data formats, among others, are better supported by FPGAs. FPGAs provide the flexibility needed to capture a large number of data analytics operations efficiently.
“In addition, we have successfully increased acceleration by using FPGAs in inline mode, processing data directly from the Network Interface Card (NIC) without CPU intervention. This mode provides the highest performance we have seen, while reducing the variance of latency through the system.”
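For readers less familiar with the SQL operations named above, the following is a minimal CPU reference sketch of filter, join, and sort in plain Python. It is not FPGA code and does not represent any vendor's implementation. The sample rows and helper names are hypothetical, and the sketch only shows the shape of the kernels that accelerators of this kind typically target.

```python
from operator import itemgetter

# Hypothetical sample data for illustration only.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 250.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
    {"order_id": 3, "customer_id": 10, "amount": 120.0},
    {"order_id": 4, "customer_id": 12, "amount": 980.0},
]
customers = [
    {"customer_id": 10, "region": "EMEA"},
    {"customer_id": 11, "region": "APAC"},
    {"customer_id": 12, "region": "AMER"},
]

def filter_rows(rows, predicate):
    """Filter: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def hash_join(left, right, key):
    """Join: build a hash index on the right side, then probe it with the left."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def sort_rows(rows, key, descending=False):
    """Sort: order rows by a single column."""
    return sorted(rows, key=itemgetter(key), reverse=descending)

large_orders = filter_rows(orders, lambda row: row["amount"] >= 100.0)
joined = hash_join(large_orders, customers, "customer_id")
result = sort_rows(joined, "amount", descending=True)

for row in result:
    print(row["order_id"], row["region"], row["amount"])
```

Each step applies the same simple operation to every row independently or streams through the data once, which is one reason such operations lend themselves to pipelined, parallel hardware.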
Where large data requirements call for a solution that runs on a cluster of machines, a heterogeneous architecture enables compute-intensive and repetitive tasks to be offloaded from the main CPU. A configurable accelerator, like an FPGA, can be optimized accordingly. CEO and co-founder of Vitesse Data, CK Tan, explains how his company has put this approach into action to further accelerate the performance of its Deepgreen DB, a variant of Greenplum that runs many times faster than the open-source offering.
“Many analytics applications require running SQL on MPP databases, and customers want the fastest possible performance. There is a limit to the density of CPUs in a cluster, and it is quite clear that CPUs alone cannot satisfy the need for ever faster aggregates for business intelligence and ad-hoc OLAP queries.” Using FPGAs as accelerators frees up CPUs to work on other concurrent queries, improving the overall responsiveness of the system. “This has allowed us to quadruple the compute resources available on the cluster without expanding its footprint. This dovetails nicely with the call to derive more intelligence and insight from corporate data.”
He goes on to explain how acceleration is effectively transparent for end users. “Greenplum customers can switch over to Deepgreen by simply swapping the database software, and immediately see their applications run faster on the same hardware. In addition, the customer can make the system run even faster by adding FPGA cards into the machines; Deepgreen DB will make use of the extra resources to push the performance needle further to the right.”
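The offload pattern described above, in which an accelerator takes over a heavy aggregate while the CPU serves other queries, can be sketched in outline. The code below is a simulation under stated assumptions: `SimulatedAccelerator`, `sum_kernel`, and `run_aggregate` are hypothetical names invented for this example, a worker thread stands in for the FPGA card, and nothing here reflects how Deepgreen DB is implemented. It illustrates control flow only, not real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

class SimulatedAccelerator:
    """Hypothetical stand-in for an accelerator card: runs kernels on a worker thread."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)

    def submit(self, kernel, data):
        # Returns immediately with a future; the caller is free to do other work.
        return self._pool.submit(kernel, data)

    def close(self):
        self._pool.shutdown(wait=True)

def sum_kernel(values):
    """The compute-intensive, repetitive task we want to offload."""
    return sum(values)

def run_aggregate(values, accelerator=None):
    """Same call for the application whether or not an accelerator is fitted."""
    if accelerator is None:
        return sum_kernel(values)  # CPU-only path
    return accelerator.submit(sum_kernel, values).result()  # offloaded path

accel = SimulatedAccelerator()

# Offload a large aggregate, then let the CPU handle another query meanwhile.
pending = accel.submit(sum_kernel, range(1_000_000))
small_query = max([3, 9, 4])
offloaded_total = pending.result()

# The CPU-only path gives the same answer through the same function.
cpu_total = run_aggregate(range(1_000_000))
accel.close()

print(offloaded_total, cpu_total, small_query)
```

Because the application calls `run_aggregate` either way, adding or removing the accelerator changes where the work runs rather than how the application is written, which mirrors the transparency described in the quotes above.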
Prediction and Learning
One area where companies expect to derive significant extra value as the science of big data analytics continues to move forward is in predictive analysis, to help anticipate trends and events in various spheres from machine maintenance to financial services. “Large-scale analytics, such as building a predictive model of the data at hand, is often run as a process overnight,” comments Xelera CEO Felix Winterstein. “Increasingly, we are seeing that customers either want more accurate models, or they want to reduce turnaround time to hours, or even minutes, to get data models instantaneously.”
To meet these demands, Xelera is using FPGAs to achieve up to two orders of magnitude acceleration in targeted applications, compared with multi-core CPUs. “The GPU is a strong candidate as a vehicle for acceleration in some applications that are by nature well suited for the GPU architecture,” he comments. “On the other hand, the FPGA is an extremely versatile accelerator that can speed up a much broader range of applications.”
In addition to faster execution times for analytics workloads, power savings are a further advantage gained by hardware acceleration. Ryft has achieved acceleration factors of between 10 and 100 by using FPGAs for workloads such as fuzzy text search, signal processing, and machine-learning image or video analytics. At the same time, power consumption is also much lower. “We have seen power consumption reduced as much as ten-fold in many real-world scenarios,” explains Pat McGarry, VP of Engineering. “The massive performance gains with smaller power requirements are real eye-openers for customers interested in analytics at the edge, where power is often at a premium, and racks-upon-racks of equipment are simply not an option.”
He goes on to discuss the role of hardware acceleration in on-premises and hybrid Edge/Cloud deployment models. “Although most of our customers run on-premise with local FPGA acceleration resources connected via PCIe to standard, racked x86 servers, we also offer an ability to leverage FPGAs running on the AWS FPGA-accelerated EC2 F1 instance, which allows for cloud-based workload acceleration or even hybrid deployment models that combine on-premise and Cloud computing.”
He expects hybrid deployment to become more and more important over time. “It has become abundantly clear that it is not possible to ship ‘all’ edge data to the cloud in a timely fashion, and certainly not cost-effectively. We will see forward-looking analytics platforms leverage some amount of meaningful data analytics processing at the edge, which will thin the data being sent to the cloud for potentially more extensive processing.”
Agreeing that FPGAs have a place on both sides of the hybrid equation, he adds, “Clearly the acceleration capabilities that FPGAs afford at the edge in a variety of machine-learning capacities will expand from today’s niche-plays to become even more widely interesting to the data analytics community.”
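The idea of thinning data at the edge before it reaches the cloud can be illustrated with a small sketch. The sensor names, values, and threshold below are invented for the example, and the summarization rule (one summary per sensor plus any out-of-range readings) is an assumption rather than a description of any vendor's product.

```python
from statistics import mean

def thin_at_edge(readings, threshold):
    """Reduce raw (sensor, value) readings to per-sensor summaries plus alerts."""
    by_sensor = {}
    for sensor, value in readings:
        by_sensor.setdefault(sensor, []).append(value)

    # One compact summary record per sensor instead of every raw reading.
    summaries = [
        {"sensor": s, "count": len(v), "mean": mean(v), "max": max(v)}
        for s, v in sorted(by_sensor.items())
    ]
    # Out-of-range readings are forwarded in full for deeper analysis.
    alerts = [(s, v) for s, v in readings if v > threshold]
    return summaries, alerts

# Hypothetical raw data collected at the edge: 1,000 readings from two sensors.
raw = (
    [("pump-1", 20.0 + (i % 5)) for i in range(500)]
    + [("pump-2", 30.0)] * 499
    + [("pump-2", 95.0)]
)

summaries, alerts = thin_at_edge(raw, threshold=90.0)
records_sent = len(summaries) + len(alerts)

print(f"raw readings: {len(raw)}, records sent to cloud: {records_sent}")
```

In this toy case a thousand raw readings shrink to a handful of records. How much thinning is achievable in practice depends on how much detail the cloud-side analysis needs.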
Looking ahead, machine learning will become ever more critical to effective data analytics, as the rate of data generation continues to increase and real-time algorithmic capability, storage, network bandwidth, compute performance, and human analysts are unable to keep pace.
Conclusion
Enterprise and scientific users are becoming increasingly aware of the value contained within the vast quantities of data now collected from the physical and virtual worlds. Demand for data analytics is typically proportional to the amount of data being generated. As the quantity of data continues to expand exponentially, the need for data analytics will grow at a similar rate.
Contemporary hardware and software architectures cannot cost-effectively meet these data generation, storage, and analytics needs. Hence, there is an urgent need for novel approaches based on heterogeneous compute platforms that appropriately couple CPU and FPGA resources. FPGAs, with their configurability, flexibility, scope for parallelism, and power efficiency, are well suited to accelerating data-processing workloads effectively and efficiently.