
Big Data Analytics: Turning Big Data into Big Money

By Frank Ohlhorst
Copyright 2013 by John Wiley & Sons, Inc.

CHAPTER

Why Big Data Matters

Knowing what Big Data is and knowing its value are two different
things. Even with an understanding of Big Data analytics, the value of
the information can still be difficult to visualize. At first glance,
the well of structured, unstructured, and semistructured data seems
almost unfathomable, with each bucket drawn being little more than a
mishmash of unrelated data elements.

Finding what matters and why it matters is one of the first steps in
drinking from the well of Big Data and the key to avoiding drowning in
information. However, this question still remains: Why does Big Data
matter? It seems difficult to answer for small and medium businesses,
especially those that have shunned business intelligence solutions in
the past and have come to rely on other methods to develop their
markets and meet their goals.

For the enterprise market, Big Data analytics has proven its value,
and examples abound. Companies such as Facebook, Amazon, and Google
have come to rely on Big Data analytics as part of their primary
marketing schemes as well as a means of servicing their customers
better.

For example, Amazon has leveraged its Big Data well to create an
extremely accurate representation of what products a customer should
buy. Amazon accomplishes that by storing each customer's searches
and purchases and almost any other piece of information available,
and then applying algorithms to that information to compare one
customer's information with all of the other customers' information.
Amazon has learned the key trick of extracting value from a large
data well and has applied performance and depth to a massive amount
of data to determine what is important and what is extraneous. The
company has successfully captured the data exhaust that any customer
or potential customer has left behind to build an innovative
recommendation and marketing data element.

The results are real and measurable, and they offer a practical
advantage for a customer. Take, for example, a customer buying a
jacket in a snowy region. Why not suggest purchasing gloves to match,
or boots, as well as a snow shovel, ice melt, and tire chains? For an
in-store salesperson, those recommendations may come naturally; for
Amazon, Big Data analytics is able to interpret trends and bring
understanding to the purchasing process by simply looking at what
customers are buying, where they are buying it, and what they have
purchased in the past. Those data, combined with other public data
such as census, meteorological, and even social networking data,
create a unique capability that services the customer and Amazon
as well.
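One way to picture the comparison of one customer's purchases with
everyone else's is a simple co-purchase count. The Python sketch below
is illustrative only: it is not Amazon's algorithm, and the function
names (build_co_purchase_counts, recommend) and the sample purchase
histories are assumptions made up for this example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(baskets):
    """For every item, count how often each other item shares a customer's history."""
    counts = defaultdict(Counter)
    for items in baskets.values():
        # Each ordered pair (a, b) of distinct items records "b was bought alongside a".
        for a, b in permutations(set(items), 2):
            counts[a][b] += 1
    return counts


def recommend(counts, purchased, top_n=3):
    """Rank items that co-occur with the customer's purchases, skipping items already bought."""
    scores = Counter()
    for item in purchased:
        scores.update(counts.get(item, {}))
    for item in purchased:
        scores.pop(item, None)
    return [item for item, _ in scores.most_common(top_n)]


# Hypothetical purchase histories keyed by customer ID (illustrative data only).
baskets = {
    "c1": ["jacket", "gloves", "boots"],
    "c2": ["jacket", "gloves", "snow shovel"],
    "c3": ["jacket", "gloves"],
    "c4": ["sandals", "sunscreen"],
}

counts = build_co_purchase_counts(baskets)
print(recommend(counts, ["jacket"]))
```

With this toy data, gloves rank first for a jacket buyer because three
other jacket buyers also bought them. A production system would weigh
far more signals, such as searches, location, and the public data
sources mentioned above.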
Much the same can be said for Facebook, where Big Data comes
into play for critical features such as friend suggestions, targeted ads,
and other member-focused offerings. Facebook is able to accumulate
information by using analytics that leverage pattern recognition, data
mash-ups, and several other data sources, such as a user's preferences,
history, and current activity. Those data are mined, along with the data
from all of the other users, to create focused recommendations, which
are reported to be quite accurate for the majority of users.

BIG DATA REACHES DEEP

Google leverages the Big Data model as well, and it is one of the
originators of the software elements that make Big Data possible.
However, Google's approach and focus are somewhat different from those
of companies like Facebook and Amazon. Google aims to use Big Data to
its fullest extent, to judge search results, predict Internet traffic
usage, and service customers with Google's own applications. From the
advertising perspective, Web searches can be tied to products that fit
into the criteria of the search by delving into a vast mine of Web search
information, user preferences, cookies, histories, and so on.

Of course, Amazon, Google, and Facebook are huge enterprises
and have access to petabytes of data for analytics. However, they are
not the only examples of how Big Data has affected business processes.
Examples abound from the scientific, medical, and engineering
communities, where huge amounts of data are gathered through
experimentation, observation, and case studies. For example, the Large
Hadron Collider at CERN can generate one petabyte of data per second,
giving new meaning to the concept of Big Data. CERN relies on
those data to determine the results of experiments using complex
algorithms and analytics that can take significant amounts of time and
processing power to complete.

Many pharmaceutical and medical research firms are in the same
category as CERN, as well as organizations that research earthquakes,
weather, and global climates. All benefit from the concept of Big Data.
However, where does that leave small and medium businesses? How
can these entities benefit from Big Data analytics? These businesses do
not typically generate petabytes of data or deal with tremendous
volumes of uncategorized data, or do they?

For small and medium businesses (SMB), Big Data analytics can
deliver value for multiple business segments. That is a relatively recent
development within the Big Data analytics market. Small and medium
businesses have access to scores of publicly available data, including
most of the Web and social networking sites. Several hosted services
have also come into being that can offer the computing power, storage,
and platforms for analytics, changing the Big Data analytics market into
a pay-as-you-go, consume-what-you-need entity. This proves to be
very affordable for the SMB market and allows those businesses to take
it slow and experiment with what Big Data analytics can deliver.

OBSTACLES REMAIN

With the barriers of data volume and costs somewhat eliminated, there
are still significant obstacles for SMB entities to leverage Big Data.
Those obstacles include the purity of the data, analytical knowledge,
an understanding of statistics, and several other philosophical and
educational challenges. It all comes down to analyzing the data not just
because they are there but for a specific business purpose.

For SMBs looking to gain experience in analytics, the first place to
turn to is the Web, namely, for analyzing web site traffic. Here an
SMB can use a tool like Blekko (http://www.blekko.com) to look at
traffic distribution to a web site. This information can be very valuable
for SMBs that rely on a company web site to disseminate marketing
information, sell items, or communicate with current and potential
customers. Blekko fits the Big Data paradigm because it looks at
multiple large data sets and creates visual results that have meaningful,
actionable information. Using Blekko, a small business can quickly
gather statistics about its web site and compare it with a competitor's
web site.

Although Blekko may be one of the simplest examples of Big Data
analytics, it does illustrate the point that even in its simplest form, Big
Data analytics can benefit SMBs, just as it can benefit large enterprises.
Of course, other tools exist, and new ones are coming to market all of
the time. As those tools mature and become accessible to the SMB
market, more opportunities will arise for SMBs to leverage the Big
Data concept.

Gathering the data is usually half the battle in the analytics game.
SMBs can search the Web with tools like 80Legs, Extractiv, and
Needlebase, all of which offer capabilities for gathering data from the
Web. The data can include social networking information, sales lists,
real estate listings, product lists, and product reviews and can be
gathered into structured storage and then analyzed. The gathered data
prove to be a valuable resource for businesses that look to analytics to
enhance their market standings.

Big Data, whether done in-house or on a hosted offering, provides
value to businesses of any size, from the smallest business looking to
find its place in its market to the largest enterprise looking to identify
the next worldwide trend. It all comes down to discovering and
leveraging the data in an intelligent fashion.

The amount of data in our world has been exploding, and
analyzing large data sets is already becoming a key basis of competition,
underpinning new waves of productivity growth, innovation, and
consumer surplus. Business leaders in every sector are going to have to
deal with the implications of Big Data, either directly or indirectly.
Furthermore, the increasing volume and detail of information
acquired by businesses and government agencies, paired with the rise
of multimedia, social media, instant messaging, e-mail, and other
Internet-enabled technologies, will fuel exponential growth in data
for the foreseeable future. Some of that growth can be attributed to
increased compliance requirements, but a key factor in the increase in
data volumes is the increasingly sensor-enabled and instrumented
world. Examples include RFID tags, vehicles equipped with GPS
sensors, low-cost remote sensing devices, instrumented business
processes, and instrumented web site interactions.

The question may soon arise of whether Big Data is too big, leading
to a situation in which determining value may prove more difficult.
This will evolve into an argument for the quality of the data over the
quantity. Nevertheless, it will be almost impossible to deal with
ever-growing data sources if businesses don't prepare to deal with the
management of data head-on.

DATA CONTINUE TO EVOLVE

Before 2010, managing data was a relatively simple chore: Online
transaction processing systems supported the enterprise's business
processes, operational data stores accumulated the business
transactions to support operational reporting, and enterprise data
warehouses accumulated and transformed business transactions to support
both operational and strategic decision making.

The typical enterprise now experiences a data growth rate of 40 to
60 percent annually, which in turn increases financial burdens and
data management complexity. This situation implies that the data
themselves are becoming less valuable and more of a liability for many
businesses, or a low-commodity element.

Nothing could be further from the truth. More data mean more
value, and countless companies have proved that axiom with Big Data
analytics. To exemplify that value, one needs to look no further than
how vertical markets are leveraging Big Data analytics, which leads to
a disruptive change.

For example, smaller retailers are collecting click-stream data from
web site interactions and loyalty card data from traditional retailing
operations. This point-of-sale information has traditionally been used
by retailers for shopping basket analysis and stock replenishment, but
many retailers are now going one step further and mining the data for
a customer buying analysis. Those retailers are then sharing those data
(after normalization and identity scrubbing) with suppliers and
warehouses to bring added efficiency to the supply chain.

Another example of finding value comes from the world of
science, where large-scale experiments create massive amounts of data
for analysis. Big science is now paired with Big Data. There are
far-reaching implications in how big science is working with Big Data;
it is helping to redefine how data are stored, mined, and analyzed.
Large-scale experiments are generating more data than can be held at a
lab's data center (e.g., the Large Hadron Collider at CERN generates
over 15 petabytes of data per year), which in turn requires that the
data be immediately transferred to other laboratories for processing, a
true model of distributed analysis and processing.

Other scientific quests are prime examples of Big Data in action,
fueling a disruptive change in how experiments are performed and
data interpreted. Thanks to Big Data methodologies, continental-scale
experiments have become both politically and technologically feasible
(e.g., the Ocean Observatories Initiative, the National Ecological
Observatory Network, and USArray, a continental-scale seismic
observatory). Much of the disruption is fed by improved instrument and
sensor technology; for instance, the Large Synoptic Survey Telescope
has a 3.2-gigapixel camera and generates over 6 petabytes of image
data per year. It is the platform of Big Data that is making such lofty
goals attainable.

The validation of Big Data analytics can be illustrated by advances
in science. The biomedical corporation Bioinformatics recently
announced that it has reduced the time it takes to sequence a genome
from years to days, and it has also reduced the cost, so it will be feasible
to sequence an individual's genome for $1,000, paving the way for
improved diagnostics and personalized medicine.

The financial sector has seen how Big Data and its associated
analytics can have a disruptive impact on business. Financial services
firms are seeing larger volumes through smaller trading sizes,
increased market volatility, and technological improvements in
automated and algorithmic trading.

DATA AND DATA ANALYSIS ARE GETTING MORE COMPLEX

One of the surprising outcomes of the Big Data paradigm is the shift of
where the value can be found in the data. In the past, there was an
inherent hypothesis that the bulk of value could be found in structured
data, which usually constitute about 20 percent of the total data stored.
The other 80 percent of data is unstructured in nature and was often
viewed as having limited or little value.

That perception began to change once the successes of search
engine providers and e-retailers showed otherwise. It was the analysis
of that unstructured data that led to click-stream analytics (for
e-retailers) and search engine predictions that launched much of the Big
Data movement. The first examples of the successful processing of large
volumes of unstructured data led other industries to take note, which in
turn has led to enterprises mining and analyzing structured and
unstructured data in conjunction to look for competitive advantages.

Unstructured data bring complexity to the analytics process.
Technologies such as image processing for face recognition, search
engine classification of videos, and complex data integration during
geospatial processing are becoming the norm in processing
unstructured data. Add to that the need to support traditional
transaction-based analysis (e.g., financial performance), and it becomes
easy to see complexity growing exponentially. Moreover, other
capabilities are becoming a requirement, such as web click-stream data
driving behavioral analysis.

Behavioral analytics is a process that determines patterns of
behavior from human-to-human and human-to-system interaction
data. It requires large volumes of data to build an accurate model. The
behavioral patterns can provide insight into which series of actions led
to an event (e.g., a customer sale or a product switch). Once these
patterns have been determined, they can be used in transaction
processing to influence a customer's decision.
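The idea of finding which series of actions led to an event can be
sketched with a small counting routine. The Python example below is a
minimal illustration under stated assumptions: the function name
paths_before_event, the action labels, and the sample sessions are all
hypothetical, and a real behavioral model would need far more data and
more careful statistics.

```python
from collections import Counter


def paths_before_event(sessions, event, window=2):
    """Count the sequences of `window` actions that immediately precede `event`."""
    patterns = Counter()
    for actions in sessions:
        for i, action in enumerate(actions):
            # Only count occurrences that have a full window of prior actions.
            if action == event and i >= window:
                patterns[tuple(actions[i - window:i])] += 1
    return patterns


# Hypothetical click-stream sessions; each list is one visitor's ordered actions.
sessions = [
    ["search", "view_item", "read_reviews", "purchase"],
    ["view_item", "read_reviews", "purchase"],
    ["search", "view_item", "leave"],
    ["view_item", "add_to_cart", "purchase"],
]

patterns = paths_before_event(sessions, "purchase")
for path, count in patterns.most_common():
    print(" -> ".join(path), count)
```

In this toy data, viewing an item and then reading reviews precedes a
purchase more often than any other two-step path, which is the kind of
pattern that could then be used to shape what a customer is shown next.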

While models of transactional data analytics are well understood
and much of the value is realized from structured data, it is the value
found in behavioral analytics that allows the creation of a more
predictive model. Behavioral interactions are less understood, and they
require large volumes of data to build accurate models. This is another
case where more data equal more value; this is backed by research that
suggests that a sophisticated algorithm with little data is less accurate
than a simple algorithm with a large amount of data. Evidence of this
can be found in the algorithms used for voice and handwriting
recognition and crowd sourcing.

THE FUTURE IS NOW

New developments for processing unstructured data are arriving on
the scene almost daily, with one of the latest and most significant
coming from the social networking site Twitter. Making sense of its
massive database of unstructured data was a huge problem, so huge,
in fact, that it purchased another company just to help it find the value
in its massive data store. The success of Twitter revolves around how
well the company can leverage the data that its users generate. This
amounts to a great deal of unstructured information from the more
than 200 million accounts the site hosts, which generates 230 million
Twitter messages a day.

To address the problem, the social networking giant purchased
BackType, the developer of Storm, a software product that can parse
live data streams such as those created by the millions of Twitter feeds.
Twitter has released the source code of Storm, making it available to
others who want to pursue the technology. Twitter is not interested in
commercializing Storm.

Storm has proved its value for Twitter, which can now perform
analytics in real time and identify trends and emerging topics as they
develop. For example, Twitter uses the software to calculate how
widely Web addresses are shared by multiple Twitter users in real time.
With the capabilities offered by Storm, a company can process Big
Data in real time and garner knowledge that leads to a competitive
advantage. For example, calculating the reach of a Web address could
take up to 10 minutes using a single machine. However, with a Storm
cluster, that workload can be spread out to dozens of machines, and a
result can be discovered in just seconds. For companies that make
money from emerging trends (e.g., ad agencies, financial services, and
Internet marketers), that faster processing can be crucial.
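The speedup from spreading a reach calculation across machines can be
sketched in a few lines of Python. This is not Storm's API: threads
stand in for cluster machines, and the definition of reach used here
(the number of distinct followers of everyone who shared the address),
the sample data, and the names TWEETERS, FOLLOWERS, followers_of, and
reach are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory data; a real system would query distributed stores.
TWEETERS = {"http://example.com/a": ["ann", "bob", "cy"]}
FOLLOWERS = {
    "ann": ["dee", "eli", "fay"],
    "bob": ["eli", "gus"],
    "cy": ["dee", "hal"],
}


def followers_of(users):
    """Return the set of distinct followers for one partition of users."""
    found = set()
    for user in users:
        found.update(FOLLOWERS.get(user, []))
    return found


def reach(url, workers=3):
    """Count distinct accounts exposed to `url`, splitting the lookup across workers."""
    tweeters = TWEETERS.get(url, [])
    # Deal the users out round-robin so each worker gets its own partition.
    partitions = [tweeters[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sets = pool.map(followers_of, partitions)
        # Merging the partial sets removes followers counted by more than one worker.
        return len(set().union(*partial_sets))


print(reach("http://example.com/a"))
```

The design point is that each partition can be processed independently
and only the final merge needs to see every partial result, which is
why adding machines shortens the wait.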
Like Twitter, many organizations are discovering that they have
access to a great deal of data, and those data, in all forms, could
be transformed into information that can improve efficiencies,
maximize profits, and unveil new trends. The trick is to organize and
analyze the data quickly enough, a process that can now be
accomplished using open source technologies and lumped under the heading
of Big Data.

Other examples abound of how unstructured, semistructured, and
structured Big Data stores are providing value to business segments.
Take, for example, the online shopping service LivingSocial, which
leverages technologies such as the Apache Hadoop data processing
platform to garner information about what its users want.

The process has allowed LivingSocial to offer predictive analysis in
real time, which better services its customer base. The company is not
alone in its quest for squeezing the most value out of its unstructured
data. Other major shopping sites, shopping comparison sites, and
online versions of brick-and-mortar stores have also implemented
technologies to bring real-time analytics to the forefront of customer
interaction.

However, in that highly competitive market, finding new ways to
interpret the data and process them faster is proving to be the critical
competitive advantage and is driving Big Data analytics forward with
new innovations and processes. Those enterprises and many others
learned that data in all forms cannot be considered a commodity item,
and just as with gold, it is through mining that one finds the nuggets of
value that can affect the bottom line.
