
Overview of Big Data: Saidatul Rahah Hamidi

Big data refers to the large volumes of structured and unstructured data that are so large that traditional data processing applications are inadequate. This data can be analyzed to provide better decisions, strategic business moves, and more accurate predictions of customer behavior. Companies use big data analytics to efficiently run their operations and understand specific customer segments. Distributed systems are well-suited for big data because they provide scalability, enhanced performance, fault tolerance, and the ability to move computations to where the data is located. Big data brings value to enterprises through analytics that support major business decisions.

Uploaded by

syahmina

OVERVIEW of BIG DATA

Saidatul Rahah Hamidi


Big data era
Big data is a term that describes the large volume of data, both structured and unstructured, that overwhelms a business on a day-to-day basis.
• Big data can be analyzed to provide
• Better decisions
• Strategic business moves

Companies can
• Accurately predict what specific segments of customers will want to buy
• Run their operations in a much more efficient way

https://www.youtube.com/watch?v=eVSfJhssXUA
https://www.youtube.com/watch?v=TzxmjbL-i4Y
The Brief History of Big Data
1989
Early use of the term Big Data in a magazine article by author Erik Larson, commenting on advertisers’ use of data to target customers.

1991
The birth of the World Wide Web. Anyone can now go online and upload their own data, or analyze data uploaded by other people.

1997
Google launches its search engine, which will quickly become the most popular in the world.
Michael Lesk estimates the digital universe is increasing tenfold in size every year.

1999
First use of the term Big Data in an academic paper: Visually Exploring Gigabyte Datasets in Realtime (ACM).
First use of the term Internet of Things, in a business presentation by Kevin Ashton to Procter and Gamble.

The Brief History of Big Data
2001
Three “Vs” of Big Data (Volume, Velocity, Variety) defined by Doug Laney.

2005
Hadoop, an open source Big Data framework now developed by Apache, is developed.
The birth of “Web 2.0”, the user-generated web.

2014
Mobile internet use overtakes desktop for the first time.
88% of executives responding to an international survey by GE say that big data analysis is a top priority.

2015
Data volumes are exploding: more data has been created in the past two years than in the entire previous history of the human race.
Big Data Analytics
Big data analytics is the process of collecting, organizing and analyzing large sets of data (called Big Data) to discover patterns and other useful information.
Characteristics of big data

Velocity
The speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available in real-time.

Volume
The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can be considered big data or not.

Variety
Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, sensor data, stock ticker data and financial transactions.

Veracity
The quality of captured data can vary greatly, affecting the accuracy of analysis.

Sensor Data
We are increasingly surrounded by sensors that collect and share data. Take your smart phone: it contains a global positioning sensor to track exactly where you are every second of the day, and it includes an accelerometer to track the speed and direction at which you are travelling. We now have sensors in many devices and products.
Characteristics

Volume: Data at Scale
Terabytes to petabytes of data

Variety: Data in Many Forms
Structured, unstructured, text, multimedia

Velocity: Data in Motion
Analysis of streaming data to enable decisions within fractions of a second.

Veracity: Data Uncertainty
Managing the reliability and predictability of inherently imprecise data types.
Turning Big Data into Value:

The ‘Datafication’ of our World:
• Activities
• Conversations
• Words
• Voice
• Social Media
• Browser logs
• Photos
• Videos
• Sensors
• Etc.

Volume, Velocity, Variety, Veracity

Analysing Big Data:
• Text analytics
• Sentiment analysis
• Face recognition
• Voice analytics
• Movement analytics
• Etc.

Value
Role of distributed system in big data

Scalability
Distributed systems function across multiple machines, owing to which these systems are inherently scalable. This implies that a distributed system can make optimal use of system resources in the light of the demand it is under.

Active Systems Management
It helps leverage flexibility from sources connected to the distribution network for managing congestion, primarily through financial incentives.

Enhanced Performance
Distributed systems make way for better service performance than centralized systems.
Role of distributed system in big data (continued)
• Write-once, read-many: a file is created, written to the file system and then closed. Once the file is closed, changes cannot be made to its contents.
• Streaming access: Big data applications typically process entire files. Instead of optimizing the file system to randomly access individual data elements, the distributed system is optimized for batch processing of entire files as a continuous stream of data.
• Move computations to the data: instead of the computational program retrieving the data for processing in a central location, copies of the program are “pushed” to the nodes containing the data to be processed. Each copy of the program produces results that are then aggregated across nodes and sent back to the client.
• Fault-tolerance: a distributed system replicates data across many different devices so that when one device fails, the data is still available from another device.
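
The last two ideas, moving computations to the data and fault tolerance through replication, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node names, the block contents, the replication factor of 2 and the helper functions `count_words` and `run_job` are all hypothetical, and the "cluster" is just a dictionary in one process rather than a real distributed system.

```python
from collections import Counter

# Hypothetical cluster: every block is stored on two nodes
# (replication factor 2 is an assumed value for this sketch).
NODES = {
    "node-a": {"block-1": "big data big value", "block-2": "data in motion"},
    "node-b": {"block-2": "data in motion", "block-3": "data at scale"},
    "node-c": {"block-3": "data at scale", "block-1": "big data big value"},
}


def count_words(text):
    """The 'program' that is pushed to a node: counts words in one block."""
    return Counter(text.split())


def run_job(nodes, failed=()):
    """Run count_words next to the data and aggregate the partial results.

    Each block is processed once, on the first live node that holds a replica.
    """
    partials = {}
    for name, blocks in nodes.items():
        if name in failed:
            continue  # a failed node contributes nothing
        for block_id, text in blocks.items():
            if block_id not in partials:
                partials[block_id] = count_words(text)  # computed where the block lives
    total = Counter()
    for partial in partials.values():
        total += partial  # only the small partial results are combined centrally
    return total


all_up = run_job(NODES)
one_down = run_job(NODES, failed={"node-a"})
print(all_up["data"], all_up["big"])
print(one_down == all_up)
```

With one node down, every block still has a live replica, so the aggregated result is unchanged. If two nodes fail in this toy layout, one block becomes unreachable and its words drop out of the totals.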
Big data value for the enterprise

Big data can provide analytics to support major business decisions such as production planning, sales management and capital investments.

Sentiment analysis can be done using big data. Therefore, you can get more feedback about who is saying what about your company and you can plan on how to improve your company.
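
As a rough illustration of what sentiment analysis means at its simplest, the sketch below scores a few messages by counting positive and negative words. The word lists, the example messages and the `sentiment_score` helper are all made up for this example; real systems typically rely on much larger lexicons or trained models.

```python
import re

# Tiny hand-made word lists (assumptions, not a real sentiment lexicon).
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "rude", "hate"}


def sentiment_score(text):
    """Positive-word count minus negative-word count for one message."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Made-up customer mentions.
mentions = [
    "Love the new app, support was helpful",
    "Delivery was slow and the box arrived broken",
    "It is okay",
]
scores = [sentiment_score(m) for m in mentions]
print(scores)
```

A score above zero suggests a favourable mention, below zero an unfavourable one, and zero a neutral or unrecognized one.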

Storing large amounts of data is much easier using big data technologies and, more importantly, the data you store will be accurate because big data tools have greatly reduced the risk of inaccurate data.
What’s Driving Big Data

- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time

- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
Issues and challenges of big data

Issues
• Data Privacy
• Data Security
• Data Discrimination

Challenges
• Dealing with data growth
• Getting Data into Big Data Structure
• Data Discrimination
Issues

Data Privacy
• Privacy breaches can occur as soon as an institution adopts weak protective measures.
• Although the systems software specialist remains chiefly responsible, such breaches may be prevented by skilled staff using stricter tools and protocols that shield vulnerable data.

Data Security
• Data breaches expose customer information to people who would otherwise have no access to such sensitive information. With each new breach, your private information is in danger.

Data Discrimination
• Access to digital information about a person (such as usual online activity and online preferences) could negatively affect that individual's prospects, for example of securing a loan. This is considered decidedly unfair when the person has no ability to verify the information.

Storage and Transport Issues
• Big data needs big storage. Current disk technology limits are about 4 terabytes per disk, so 1 exabyte would require 250,000 disks.
• To handle this issue, the data should be processed “in place” and only the resulting information transmitted.
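
The disk count in the storage point can be checked with a short calculation. This sketch assumes decimal (SI) units, 1 TB = 10^12 bytes and 1 EB = 10^18 bytes, and ignores replication and file system overhead; the helper `disks_needed` is a name introduced here for illustration. Under those assumptions, one exabyte divided by 4 TB per disk comes to 250,000 disks.

```python
# Assumed decimal (SI) units: 1 TB = 10**12 bytes, 1 EB = 10**18 bytes.
TB = 10**12
EB = 10**18

disk_capacity = 4 * TB  # the slide's figure of about 4 terabytes per disk


def disks_needed(total_bytes, per_disk_bytes):
    """Smallest number of disks whose combined capacity covers total_bytes."""
    return -(-total_bytes // per_disk_bytes)  # ceiling division on integers


print(disks_needed(1 * EB, disk_capacity))
```

Any replication for fault tolerance multiplies this figure, which is one reason processing data in place is preferred over moving it.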
Challenges of Big Data

Dealing with data growth
• In order to deal with data growth, organizations are turning to a number of different technologies. When it comes to storage, converged and hyperconverged infrastructure and software-defined storage can make it easier for companies to scale their hardware. And technologies like compression, deduplication and tiering can reduce the amount of space and the costs associated with big data storage.
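
To make two of these space-saving techniques concrete, here is a minimal sketch of deduplication followed by compression, using only the Python standard library. It assumes fixed, pre-split blocks held in memory and made-up sample data; the `store` and `load` helpers are hypothetical names, and tiering is not shown.

```python
import hashlib
import zlib


def store(blocks):
    """Deduplicate blocks by content hash, then compress each unique block."""
    unique = {}  # SHA-256 digest -> compressed bytes
    refs = []    # one digest per input block, in order, so the input can be rebuilt
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in unique:
            unique[digest] = zlib.compress(block)
        refs.append(digest)
    return unique, refs


def load(unique, refs):
    """Rebuild the original sequence of blocks."""
    return [zlib.decompress(unique[d]) for d in refs]


# Made-up sample data: the first and third blocks are identical.
blocks = [
    b"sensor reading 21.5C " * 50,
    b"click /home " * 80,
    b"sensor reading 21.5C " * 50,
]
unique, refs = store(blocks)
raw_size = sum(len(b) for b in blocks)
stored_size = sum(len(c) for c in unique.values())
print(raw_size, stored_size, len(unique))
```

The duplicate block is stored once and referenced twice, and the repetitive content compresses well, so the stored size is far smaller than the raw size while `load` still returns the original blocks.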

Getting Data into Big Data Structure
• It is clear that one of the goals of big data management is to analyze and process large amounts of data. The need to handle transformation and extraction is not limited to conventional relational data sets.

Analytical Challenge
• Big data brings along with it some huge analytical challenges. The type of analysis to be done on this huge amount of data, which can be unstructured, semi-structured or structured, requires a large number of advanced skills. This can be done by using one of two techniques: either incorporate massive data volumes in the analysis or determine upfront which big data is relevant.
Miscellaneous Challenges

Other challenges may occur while integrating big data. Some of the challenges include
• integration of data
• skill availability
• solution cost
• the volume of data
• the rate of transformation of data
• veracity of data
• validity of data

• It is also a challenge to process a large amount of data at a reasonable speed so that information is available for data consumers when they need it. The data set must also be validated while it is transferred from one source to another or to consumers.

Scalability:
The scalability issue of big data has led towards cloud computing, which now aggregates multiple disparate workloads with varying performance goals.
This requires a high level of resource sharing, which is expensive and brings various challenges, such as how to run and execute jobs so that the goal of each workload is met cost-effectively.
Big Data Application
• Retail/Customer
• Finances/Fraud Services
• Web & Digital Media
• E-Commerce & Customer Services
• Telecommunication
• Health & Life Services
Retail/Customer
• Merchandizing and market basket analysis
• Campaign management and customer loyalty programs
• Supply-chain management and analytics
• Event- and behavior-based targeting
• Market and consumer segmentations
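
Market basket analysis, the first item above, looks for products that are frequently bought together. The following is a minimal sketch that computes pair "support" (the fraction of baskets containing both items) on made-up transactions; the data and the `pair_support` helper are assumptions for illustration, and production systems would normally use a dedicated frequent-itemset algorithm on far larger data.

```python
from collections import Counter
from itertools import combinations

# Made-up transactions for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]


def pair_support(baskets):
    """Fraction of baskets in which each pair of items appears together."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    n = len(baskets)
    return {pair: count / n for pair, count in pair_counts.items()}


support = pair_support(baskets)
best_pair = max(support, key=support.get)
print(best_pair, support[best_pair])
```

Here bread and milk appear together in three of the four baskets, so a retailer might place or promote them together.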
Finances/Fraud Services
• Compliance and regulatory reporting
• Risk analysis and management
• Fraud detection and security analytics
• Credit risk, scoring and analysis
• High speed arbitrage trading
• Trade surveillance
• Abnormal trading pattern analysis

KUALA LUMPUR (Feb 15): Industrial engineering and marine product supplier Pansar Bhd was slapped with an unusual market activity (UMA) query by Bursa Malaysia Securities Bhd, after its share price hit a 15-year high of 79 sen today.
Web & Digital Media
• Large-scale clickstream analytics
• Ad targeting, analysis, forecasting and optimization
• Abuse and click-fraud prevention
• Social graph analysis and profile segmentation
• Campaign management and loyalty programs

https://www.youtube.com/watch?v=XjmldAL9RQs
E-Commerce & Customer Services
• Cross-channel analytics
• Event analytics
• Recommendation engines using predictive analytics
• Right offer at the right time
• Next best offer or next best action
Telecommunication
• Revenue assurance and price optimization
• Customer churn prevention
• Campaign management and customer loyalty
• Call detail record (CDR) analysis
• Network performance and optimization
• Mobile user location analysis
Health & Life Services
• Clinical trials data analysis
• Disease pattern analysis
• Campaign and sales program optimization
• Patient care quality and program analysis
• Medical device and pharmacy supply-chain management
• Drug discovery and development analysis
