Unit 1 Data Science and Big Data
Introduction
Data that keeps growing and cannot be stored or processed on a single machine is
termed Big Data.
Big Data
Big Data has given organizations a new way to analyze and visualize their data effectively.
For example:
Health: Health care organizations are leveraging big data technology to capture all the information
about a patient and get a more complete view for insight into care coordination, health management,
and outcomes.
How Big Data Works
Making big data work requires three main actions:
• Integration: Big data collects terabytes, and sometimes even petabytes, of raw data from
many sources that must be received, processed, and transformed into the format that
business users and analysts need to start analyzing it.
• Management: Big data needs big storage, whether in the cloud, on-premises, or both. Data
must also be stored in whatever form is required. It also needs to be processed and made
available in real time. Increasingly, companies are turning to cloud solutions to take
advantage of elastic compute capacity and scalability.
• Analysis: The final step is analyzing and acting on big data; otherwise, the investment
won’t be worth it. Beyond exploring the data itself, it’s also critical to communicate and share
insights across the business in a way that everyone can understand. This includes using
tools to create data visualizations like charts, graphs, and dashboards.
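The three actions above can be illustrated with a toy pipeline. This is only a minimal sketch on a few in-memory records, not a real big data system: the sample rows, the field names (customer, category, amount), and the function names are all assumptions made up for illustration.

```python
import json
from collections import defaultdict

# Hypothetical raw inputs arriving from two sources in different formats.
CSV_ROWS = ["1001,groceries,25.50", "1002,electronics,199.99"]
JSON_ROWS = ['{"customer": 1001, "category": "groceries", "amount": 10.0}']


def integrate(csv_rows, json_rows):
    """Integration: convert every source into one common record format."""
    records = []
    for row in csv_rows:
        customer, category, amount = row.split(",")
        records.append(
            {"customer": int(customer), "category": category, "amount": float(amount)}
        )
    for row in json_rows:
        records.append(json.loads(row))
    return records


def store(records):
    """Management: keep the records in a store (here, a dict keyed by customer)."""
    storage = defaultdict(list)
    for record in records:
        storage[record["customer"]].append(record)
    return storage


def analyze(storage):
    """Analysis: compute the total spend per customer."""
    return {
        customer: round(sum(r["amount"] for r in rows), 2)
        for customer, rows in storage.items()
    }


totals = analyze(store(integrate(CSV_ROWS, JSON_ROWS)))
print(totals)
```

In a real deployment each stage would be handled by dedicated tooling (ingestion services, distributed storage, analytics and dashboard tools); the sketch only shows how the stages feed into one another.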
Applications
Tracking Customer Spending Habits and Shopping Behavior: In big retail stores (like Amazon,
Walmart, Big Bazaar, etc.), the management team has to keep data on customers’ spending habits,
shopping behavior, and most-liked products (so that they can keep those products in the store).
Recommendation: By tracking customer spending habits and shopping behavior, big retail stores
provide recommendations to the customer. E-commerce sites like Amazon, Walmart, and Flipkart do
product recommendation.
Education Sector: Organizations that conduct online courses use big data to find candidates
interested in a course. If someone searches for a YouTube tutorial video on a subject, then
online or offline course providers for that subject send that person online ads about their
courses.
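The recommendation application described above can be sketched with a simple "bought together" count. This is a minimal illustration under stated assumptions, not how any particular retailer does it: the baskets are invented sample data, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: each set is one customer's basket.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "jam"},
]


def co_purchase_counts(baskets):
    """Count how often each ordered pair of products appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(product, baskets, top_n=2):
    """Return the products most often bought together with `product`."""
    counts = co_purchase_counts(baskets)
    scored = [(other, n) for (p, other), n in counts.items() if p == product]
    # Most frequent first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [other for other, _ in scored[:top_n]]


print(recommend("bread", BASKETS))
```

Production recommenders work on far larger histories and use richer signals, but the underlying idea of mining past shopping behavior is the same.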
Examples
• Tracking consumer behavior and shopping habits to deliver hyper-personalized retail
product recommendations tailored to individual customers
• Monitoring payment patterns and analyzing them against historical customer activity
to detect fraud in real time
• Combining data and information from every stage of an order’s shipment journey with
hyperlocal traffic insights to help fleet operators optimize last-mile delivery
• Using AI-powered technologies like natural language processing to analyze unstructured
medical data (such as research reports, clinical notes, and lab results) to gain new
insights for improved treatment development and enhanced patient care
• Using image data from cameras and sensors, as well as GPS data, to detect potholes
and improve road maintenance in cities
• Analyzing public datasets of satellite imagery and geospatial datasets to visualize,
monitor, measure, and predict the social and environmental impacts of supply chain
operations
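As an illustration of the fraud-detection example in the list above, the sketch below compares a new payment against a customer's historical activity using a z-score. This is one simple technique chosen for illustration; the threshold of 3.0, the sample amounts, and the function name are assumptions, and real systems use far more elaborate models.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, threshold=3.0):
    """Flag a payment that deviates strongly from the customer's past payments.

    `history` is a list of past payment amounts; `threshold` is the number of
    standard deviations treated as unusual (an assumed value).
    """
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu  # every past payment was identical
    return abs(amount - mu) / sigma > threshold


# Hypothetical past payments for one customer.
history = [20.0, 25.0, 22.0, 30.0, 24.0]
print(is_suspicious(history, 26.0))
print(is_suspicious(history, 500.0))
```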
Challenges
Securing Data
Tools and techniques
Data Science
▪ Data science is a field that deals with structured, semi-structured, and unstructured
data. It involves practices like data cleansing, data preparation, data analysis, and much
more.
▪ Data science is the combination of statistics, mathematics, programming, and
problem-solving; capturing data in ingenious ways; the ability to look at things differently;
and the activity of cleansing, preparing, and aligning data. This umbrella term includes
various techniques that are used when extracting insights and information from data.
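The practices named above (cleansing, preparation, and analysis) can be shown on a tiny example. This is a minimal sketch, assuming made-up records with name and age fields; the cleaning rules are illustrative choices, not a standard procedure.

```python
# Hypothetical raw records with stray whitespace, a missing value, and a duplicate.
RAW = [
    {"name": " Alice ", "age": "34"},
    {"name": "bob", "age": ""},
    {"name": "Alice", "age": "34"},
    {"name": "Carol", "age": "29"},
]


def cleanse(rows):
    """Normalize names, drop rows without a valid age, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        age_text = row["age"].strip()
        if not age_text.isdigit():
            continue  # cleansing: skip rows with a missing or invalid age
        record = (name, int(age_text))
        if record in seen:
            continue  # cleansing: skip exact duplicates
        seen.add(record)
        cleaned.append({"name": name, "age": int(age_text)})  # preparation: typed fields
    return cleaned


cleaned = cleanse(RAW)
# Analysis: a simple summary statistic over the prepared data.
average_age = sum(r["age"] for r in cleaned) / len(cleaned)
print(cleaned, average_age)
```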
Need of Data Science
▪ From business to the health industry, science to our everyday lives, marketing to research,
data is required in every field to drive progress. Computer science and information
technology have taken over our lives, and they are advancing each day with such velocity
and variety that the operational techniques used a few years back have now become
obsolete.
▪ The same is the case with challenges and problems. The problems and concerns of the past
for a specific theme, illness, or shortfall may not be the same today, as they have grown
in complexity.
▪ Every field of science and study, and every organization, therefore needs an updated set
of operational systems and technology to keep up with the challenges of today and tomorrow,
as well as to derive solutions for unanswered questions.
Employ and utilize
Organizations can use big data analytics systems and software to make data-driven decisions that
can improve business-related outcomes.
The benefits may include more effective marketing, new revenue opportunities, customer
personalization and improved operational efficiency.
With an effective strategy, these benefits can provide competitive advantages over rivals.
Current trends and opportunities
Apache Hadoop
• Apache Hadoop is an open-source, Java-based software platform that manages data
processing and storage for big data applications.
• The platform works by distributing big data and analytics jobs across nodes in
a computing cluster, breaking them down into smaller workloads that can be run in
parallel. Some key benefits of Hadoop are scalability, resilience, and flexibility.
• The Hadoop Distributed File System (HDFS) provides reliability and resiliency by
replicating blocks of data across multiple nodes of the cluster to protect against
hardware or software failures.
• Hadoop is flexible: it can store data in any format, including structured and
unstructured data.
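The idea of breaking a job into smaller workloads that run in parallel can be sketched as a map-and-reduce word count. This is plain Python, not the Hadoop API: local threads stand in for cluster nodes, and the function names and sample lines are assumptions for illustration only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, n_chunks):
    """Divide the input into n_chunks smaller workloads."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_chunk(lines):
    """Map step: count the words within one chunk."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the partial counts from every worker."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, n_workers=2):
    """Run the map step on each chunk in parallel, then reduce the results."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


# Hypothetical input lines.
LINES = ["big data needs big storage", "hadoop stores big data"]
counts = word_count(LINES)
print(counts.most_common(2))
```

On a real cluster the chunks would be data blocks stored on different machines and the workers would be separate nodes, but the split, process in parallel, and merge pattern is the same.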
Tableau
• Tableau is a powerful tool used for data analysis and visualization. It allows the
creation of interactive visualizations without coding.
• Tableau is widely used because it can take in data and produce the required
visualization output in a very short time.
• In short, it can turn your data into insights that can be used to drive future
actions.
Tableau Features