GPUs Data Analytics Book
Introduction to GPUs for Data Analytics
Advances and Applications for
Accelerated Computing
978-1-491-99801-4
Table of Contents
Introduction
3. New Possibilities
Designed for Interoperability and Integration
8. Getting Started
Introduction
The ebook is organized into eight chapters:
CHAPTER 1
The Evolution of Data Analytics
effective for batch-oriented data analytics applications, it lacks the
performance needed to process data streams in real time.
By 2010, the in-memory database had become affordable, owing to the
ability to configure servers with terabytes of low-cost random-access
memory (RAM). Because read/write access to RAM is so much faster
(100 nanoseconds versus 10 milliseconds for DAS), the improvement
in performance was dramatic. But as with virtually all advances in
performance, the bottleneck shifted, this time from I/O to compute
for a growing number of applications.
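To put the two latency figures quoted above side by side, the short Python sketch below works out the ratio. The workload size and the assumption that every access happens strictly one after another are illustrative only; real systems overlap and batch I/O, so treat the totals as rough orders of magnitude.

```python
# Access latencies quoted in the text, expressed in nanoseconds.
RAM_ACCESS_NS = 100            # 100 nanoseconds
DAS_ACCESS_NS = 10_000_000     # 10 milliseconds

# Hypothetical workload size, chosen only for illustration.
ACCESSES = 1_000_000


def speedup(slow_ns: int, fast_ns: int) -> float:
    """Return how many times faster the fast medium is per access."""
    return slow_ns / fast_ns


def total_seconds(accesses: int, latency_ns: int) -> float:
    """Total time if every access is strictly serial (an assumption)."""
    return accesses * latency_ns / 1_000_000_000


print(f"Per-access speedup: {speedup(DAS_ACCESS_NS, RAM_ACCESS_NS):,.0f}x")
print(f"RAM total: {total_seconds(ACCESSES, RAM_ACCESS_NS):,.1f} s")
print(f"DAS total: {total_seconds(ACCESSES, DAS_ACCESS_NS):,.1f} s")
```

Under these assumptions, RAM is 100,000 times faster per access: one million serial accesses take about 0.1 seconds in RAM versus 10,000 seconds (nearly three hours) on DAS. Once the data path is that fast, the compute stage is what is left to wait on.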
This performance bottleneck has been overcome with the recent
advent of GPU-accelerated compute. As is explained in Chapter 2,
GPUs provide massively parallel processing power that we can scale
both up and out to achieve unprecedented levels of performance
and major improvements in price and performance in most database
and data analytics applications.
ating the processing-intensive workloads common in today’s data
analysis applications.
• Interactive location-based intelligence (Chapter 6)
• Exact phrases
• AND/OR
• Wildcards
• Grouping
• Fuzzy search
• Proximity search
• Ranges of numbers
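The query features listed above can be illustrated with a small, CPU-only Python sketch that uses only the standard library. It is not the API of any GPU database. The helper names (`has_phrase`, `has_wildcard`, `has_fuzzy`, `within`, `has_number_in_range`), the sample sentence, and the 0.8 similarity threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from fnmatch import fnmatchcase


def tokens(text):
    """Lowercase word and number tokens; decimals such as 98.6 stay whole."""
    return re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", text.lower())


def has_phrase(text, phrase):
    """Exact phrase: the phrase tokens appear consecutively."""
    t, p = tokens(text), tokens(phrase)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


def has_wildcard(text, pattern):
    """Wildcard: shell-style pattern (e.g. 'temp*') matches some token."""
    return any(fnmatchcase(tok, pattern.lower()) for tok in tokens(text))


def has_fuzzy(text, term, threshold=0.8):
    """Fuzzy search: some token is at least `threshold` similar to term."""
    term = term.lower()
    return any(SequenceMatcher(None, tok, term).ratio() >= threshold
               for tok in tokens(text))


def within(text, a, b, max_gap):
    """Proximity: tokens a and b occur at most max_gap positions apart."""
    toks = tokens(text)
    pos_a = [i for i, tok in enumerate(toks) if tok == a.lower()]
    pos_b = [i for i, tok in enumerate(toks) if tok == b.lower()]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


def has_number_in_range(text, low, high):
    """Range of numbers: some number in the text lies in [low, high]."""
    return any(low <= float(n) <= high
               for n in re.findall(r"\d+(?:\.\d+)?", text))


doc = "Sensor 17 reported a temperature of 98.6 degrees near the loading dock"

# AND/OR with grouping maps directly onto Python's boolean operators.
hit = ((has_phrase(doc, "loading dock") or has_phrase(doc, "warehouse"))
       and has_wildcard(doc, "sens*")
       and has_fuzzy(doc, "temprature")          # misspelled on purpose
       and within(doc, "temperature", "degrees", 3)
       and has_number_in_range(doc, 90, 100))
print(hit)
```

Each helper scans one document independently of all the others, so the same check could in principle be applied to many documents at once. That independence is what makes this kind of search a natural fit for parallel hardware.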
Figure 4-1. GPU databases accelerate the ML pipeline for faster model
development and deployment
Live data can have enormous value, but only if it can be processed as
it streams in. Without the processing power required to ingest and
analyze these streams in real time, organizations risk missing out
in two ways: applications will be limited to a relatively low volume
and velocity of data, and results will come too late to have real
value.
This need for speed is particularly true for the Internet of Things
(IoT). The IoT offers tremendous opportunities to derive actionable
insights from connected devices, both stationary and mobile, and to
make these devices operate more intelligently and, therefore, more
effectively.
Even before the advent of the IoT, the need to analyze live data in
real time, often coupled with data at rest, had become almost
universal. Although some organizations have industry-specific sources
of streaming data, nearly every organization has a data network, a
website, inbound and outbound phone calls, heating and lighting
controls, machine logs, a building security system, and other
infrastructure, all of which continuously generate data that holds
potential, and perishable, value.
Today, with the IoT, or as some pundits call it, the Internet of
Everything, the number of devices streaming data is expected to grow
to 30 billion or more by 2020, according to various estimates.
Only the GPU database has the processing power and other capabilities
needed to take full advantage of the IoT. In particular, the ability
to perform repeated, similar instructions in parallel across a
massive number of small, efficient cores makes the GPU ideal for
IoT applications. Because many “Things” generate both time- and
location-dependent data, the GPU’s geospatial functionality enables
support for even the most demanding IoT applications.
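As a rough illustration of what "repeated, similar instructions" look like for time- and location-dependent data, the Python sketch below applies one identical predicate (a time window plus a great-circle distance check) to every reading. It runs sequentially on a CPU and is not GPU code. The `Reading` type, the field names, and the sample coordinates are hypothetical. The point is only that each reading is tested independently, which is the shape of work a GPU can spread across many cores.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Reading:
    """Hypothetical IoT reading with a time and a location."""
    device_id: str
    timestamp: float  # seconds since some epoch
    lat: float        # degrees
    lon: float        # degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_scope(r, center_lat, center_lon, radius_km, t_start, t_end):
    """The same test for every reading: inside the window and the radius."""
    return (t_start <= r.timestamp <= t_end
            and haversine_km(r.lat, r.lon, center_lat, center_lon) <= radius_km)


readings = [
    Reading("a", 100.0, 40.00, -75.0),  # at the center, in the window
    Reading("b", 150.0, 40.01, -75.0),  # about 1.1 km north, in the window
    Reading("c", 120.0, 41.00, -75.0),  # about 111 km north, too far
    Reading("d", 500.0, 40.00, -75.0),  # at the center, outside the window
]

# No reading depends on any other, so the loop is trivially parallelizable.
nearby = [r.device_id for r in readings
          if in_scope(r, 40.0, -75.0, radius_km=5.0, t_start=0.0, t_end=200.0)]
print(nearby)
```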
The IoT era is here and growing relentlessly, and only a GPU database
can enable organizations to take full advantage of the many
possibilities. For those online analytical processing and other
business intelligence (BI) applications that stand to benefit from IoT
insights, some GPU-accelerated databases now support standards
like SQL-92 and BI tools, as well as the high availability and robust
security often required in such applications.
Figure 6-1. The GPU-accelerated database is ideally suited for the
interactive location-based analytics that are becoming increasingly
desirable
Cognitive computing applications will need to utilize the full
spectrum of analytical processes: business intelligence, AI, machine
learning, deep learning, natural-language processing, text search
and analytics, pattern recognition, and more. Every one of these
processes can be accelerated using GPUs. In fact, their thousands of
small, efficient cores make GPUs particularly well suited to parallel
processing of the repeated, similar instructions found in virtually
all of these compute-intensive workloads.
Cognitive computing servers and clusters can be scaled up or out as
needed to deliver whatever real-time performance might be required,
from subsecond to a few minutes. We can further improve performance
by using algorithms and libraries optimized for GPUs.
By breaking through the cost and other barriers to achieving
performance on the scale of a Watson supercomputer, GPU acceleration
will indeed usher in the Cognitive Era of computing.
Open designs make it easy to incorporate GPU-based solutions into
virtually any existing data architecture, where they can integrate
with both open source and commercial data analytics frameworks.
With purpose-built GPU solutions, the potential gain can quite
literally come without the pain normally associated with the
techniques traditionally used to achieve satisfactory performance.
This means no more indexing, redefining schemas, or tuning and
tweaking algorithms, and no more need to predetermine queries in
order to ingest and analyze data in real time, regardless of
how the organization’s data analytics requirements might change
over time.
As with anything new, of course, it is best to research your options
and choose a solution that can meet all of your analytical needs,
scale as you require, and, most important, be purpose-built to take
full advantage of the GPU. So start with a pilot project to gain
familiarity with the technology, because you will not be able to fully
appreciate the raw power and potential of a GPU-accelerated database
until you experience it for yourself.