
Unit 1 Notes Bda

Big Data refers to large, complex data sets that require advanced processing techniques for analysis and insights, characterized by high volume, velocity, and variety. Its importance lies in how effectively organizations utilize this data to drive decision-making, cost savings, and innovation. The document also distinguishes between structured, unstructured, and semi-structured data, and outlines the architecture and components necessary for managing Big Data.

Uploaded by

anki85631
Copyright
© © All Rights Reserved

UNIT – I

Introduction to Big Data:


What is Big Data?
According to Gartner, the definition of Big Data is:
“Big data is high-volume, high-velocity, and high-variety information assets that demand
cost-effective, innovative forms of information processing for enhanced insight and decision
making.”
This definition clearly answers the “What is Big Data?” question – Big Data refers to complex
and large data sets that have to be processed and analyzed to uncover valuable information that
can benefit businesses and organizations.

However, there are certain basic tenets of Big Data that will make it even simpler to answer what
is Big Data:

•It refers to a massive amount of data that keeps on growing exponentially with time.
•It is so voluminous that it cannot be processed or analyzed using conventional data processing
techniques.

•It includes data mining, data storage, data analysis, data sharing, and data visualization.
•The term is an all-encompassing one that covers data, data frameworks, and the
tools and techniques used to process and analyze the data.

History of Big Data


Although the concept of big data itself is relatively new, the origins of large data sets go back
to the 1960s and 70s when the world of data was just getting started with the first data centers
and the development of the relational database.

Around 2005, people began to realize just how much data users generated through Facebook,
YouTube, and other online services. Hadoop (an open-source framework created specifically to
store and analyze big data sets) was developed that same year. NoSQL also began to gain
popularity during this time.

The development of open-source frameworks, such as Hadoop (and more recently, Spark), was
essential for the growth of big data because they make big data easier to work with and cheaper
to store. In the years since then, the volume of big data has skyrocketed. Users are still
generating huge amounts of data, but it’s not just humans who are doing it.

With the advent of the Internet of Things (IoT), more objects and devices are connected to the
internet, gathering data on customer usage patterns and product performance. The emergence of
machine learning has produced still more data.
While big data has come far, its usefulness is only just beginning. Cloud computing has
expanded big data possibilities even further. The cloud offers truly elastic scalability, where
developers can simply spin up ad hoc clusters to test a subset of data.

Benefits of Big Data and Data Analytics:


•Big data makes it possible for you to gain more complete answers because you have more
information.
•More complete answers mean more confidence in the data, which in turn means a completely
different approach to tackling problems.

Types of Big Data:


Now that we are on track with what is big data, let’s have a look at the types of big data:
(a)Structured:

By structured data, we mean data that can be processed, stored, and retrieved in a fixed
format. It refers to highly organized information that can be readily stored in and accessed
from a database using simple search algorithms. For instance, the employee table in a company
database is structured: the employee details, job positions, salaries, etc., are all present in
an organized manner.

(b)Unstructured:
Unstructured data refers to data that lacks any specific form or structure whatsoever. This
makes it very difficult and time-consuming to process and analyze. The body of an email is an
example of unstructured data.

Structured and unstructured are two important types of big data.


(c)Semi-structured:
Semi-structured data is the third type of big data. It combines aspects of both formats
mentioned above, that is, structured and unstructured data. To be precise, it refers to data
that has not been classified under a particular repository (database), yet contains vital
information or tags that segregate individual elements within the data. XML documents are a
typical example.

Characteristics of Big Data:


Back in 2001, analyst Doug Laney (then at META Group, which Gartner later acquired) listed the
3 ‘V’s of Big Data: Variety, Velocity, and Volume. Taken together, these characteristics are
enough to recognize what big data is. Let’s look at them in depth:

(a)Variety:
Variety refers to the structured, unstructured, and semi-structured data that is gathered
from multiple sources. While in the past data was collected mainly from spreadsheets and
databases, today it comes in an array of forms such as emails, PDFs, photos, videos, audio,
social media posts, and much more.

(b)Velocity:
Velocity essentially refers to the speed at which data is being created in real time. In a
broader sense, it comprises the rate of change, the linking of incoming data sets that arrive
at varying speeds, and activity bursts.

(c)Volume:
We already know that Big Data indicates huge ‘volumes’ of data generated on a daily basis
from various sources such as social media platforms, business processes, machines, networks,
and human interactions. Such large amounts of data are stored in data warehouses.

Why is Big Data Important?


The importance of big data does not revolve around how much data a company has but how
a company utilizes the collected data. Every company uses data in its own way; the more
efficiently a company uses its data, the more potential it has to grow. The company can take data
from any source and analyze it to find answers which will enable:
1.Cost Savings: Some Big Data tools, such as Hadoop and cloud-based analytics, can bring cost
advantages to a business when large amounts of data are to be stored. These tools also help in
identifying more efficient ways of doing business.

2.Time Reductions: The high speed of tools like Hadoop and in-memory analytics makes it easy
to identify new sources of data, which helps businesses analyze data immediately and make quick
decisions based on what they learn.

3.Understand the market conditions: By analyzing big data you can get a better understanding
of current market conditions. For example, by analyzing customers’ purchasing behaviors, a
company can find out the products that are sold the most and produce products according to this
trend. By this, it can get ahead of its competitors.

4.Control online reputation: Big data tools can do sentiment analysis, so you can get
feedback about who is saying what about your company. If you want to monitor and improve the
online presence of your business, big data tools can help.

5.Using Big Data Analytics to Boost Customer Acquisition and Retention:


The customer is the most important asset any business depends on. There is no single business
that can claim success without first having to establish a solid customer base. However, even
with a customer base, a business cannot afford to disregard the high competition it faces. If a
business is slow to learn what customers are looking for, then it is very easy to begin offering
poor quality products. In the end, loss of clientele will result, and this creates an adverse overall
effect on business success. The use of big data allows businesses to observe various
customer-related patterns and trends. Observing customer behavior is important for building
loyalty.

6.Using Big Data Analytics to Solve Advertisers’ Problems and Offer Marketing Insights:
Big data analytics can help change all business operations. This includes the ability to match
customer expectations, change the company’s product line, and of course ensure that
marketing campaigns are powerful.

7.Big Data Analytics As a Driver of Innovations and Product Development:


Another huge advantage of big data is the ability to help companies innovate and redevelop their
products.
Business Intelligence vs Big Data:
Although Big Data and Business Intelligence are two technologies used to analyze data to
help companies in the decision-making process, there are differences between both of them.
They differ in the way they work as much as in the type of data they analyze.

Traditional BI methodology is based on the principle of grouping all business data into a
central server. Typically, this data is analyzed in offline mode, after storing the information in an
environment called Data Warehouse. The data is structured in a conventional relational database
with an additional set of indexes and forms of access to the tables (multidimensional cubes).

A Big Data solution differs from BI in many respects. These are the main differences between
Big Data and Business Intelligence:

1.In a Big Data environment, information is stored on a distributed file system, rather
than on a central server. This is a much safer and more flexible space.

2.Big Data solutions carry the processing functions to the data, rather than the data to the
functions. As the analysis is centered on the information, it’s easier to handle larger
amounts of information in a more agile way.

3.Big Data can analyze data in different formats, both structured and unstructured. The
volume of unstructured data (data not stored in a traditional database) is growing much
faster than that of structured data, and its analysis carries different challenges. Big Data
solutions address them by allowing a global analysis of various sources of information.

4.Data processed by Big Data solutions can be historical or come from real-time sources.
Thus, companies can make decisions that affect their business in an agile and efficient
way.

5.Big Data technology uses massively parallel processing (MPP) concepts, which improve
the speed of analysis. With MPP, a job is divided into several parts that are executed
simultaneously, and at the end the partial results are combined and presented. This allows
you to analyze large volumes of information quickly.
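
The divide, process-in-parallel, and recombine pattern described in point 5 can be sketched in a few lines of Python. This is only an illustration on a single machine using the standard library's thread pool; a real MPP system spreads the parts across many nodes. The function names and the sample log lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """Process one partition: count the words in a list of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, parts=4):
    """Split the input into parts, count each part concurrently, then merge."""
    # Divide the job into roughly equal partitions (round-robin slicing).
    partitions = [lines[i::parts] for i in range(parts)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial_results = list(pool.map(count_words, partitions))
    # Reunite the partial results into one overall result.
    total = Counter()
    for partial in partial_results:
        total.update(partial)
    return total


logs = ["error disk full", "info job started", "error network down", "info job finished"]
totals = parallel_word_count(logs, parts=2)
print(totals["error"], totals["job"])
```

The merge step is what the notes call reuniting the overall results: each worker only ever sees its own partition.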

Big Data vs Data Warehouse:


Big Data has become the reality of doing business for organizations today. There is a boom in
the amount of structured as well as raw data that floods every organization daily. If this data is
managed well, it can lead to powerful insights and quality decision making.

Big data analytics is the process of examining large data sets containing a variety of data
types to discover knowledge: to identify interesting patterns, establish relationships, and
uncover market trends, customer preferences, and other useful information that helps solve
problems. Companies implement Big Data Analytics because they want to make more informed
business decisions, and those that do often reap several business benefits.

A data warehouse (DW) is a collection of corporate information and data derived from
operational systems and external data sources. A data warehouse is designed to support business
decisions by allowing data consolidation, analysis and reporting at different aggregate levels.
Data is populated into the Data Warehouse through the processes of extraction, transformation
and loading (ETL tools). Data analysis tools, such as business intelligence software, access the
data within the warehouse.

Types of digital data:

➢ Digital Data:
Digital data is information stored on a computer system as a series of 0’s and 1’s in a binary
language. Digital data jumps from one value to the next in a step by step sequence.

Example: Whenever we send an email, read a social media post, or take pictures with our
digital camera, we are working with digital data.

➢ Digital data can be classified into three forms:


a. Unstructured Data: Data which does not conform to a data model or is not in a form
that can be used easily by a computer program is categorized as unstructured data. About 80 to
90% of an organization’s data is in this format.
Example: Memos, chat rooms, PowerPoint presentations, images, videos, letters, researches,
white papers, the body of an email, etc.

b. Semi-Structured Data: The data which does not conform to a data model but has some
structure is categorized as semi-structured data. However, it is not in a form that can be used
easily by a computer program.
Example: Emails, XML, markup languages like HTML, etc. Metadata for this data is available
but is not sufficient.

c. Structured Data: Data which is in an organized form (i.e., in rows and columns) and
can be easily used by a computer program is categorized as structured data. Relationships
exist between entities of data, such as classes and their objects.
Example: Data stored in databases.
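
As a rough illustration of the three forms, the Python sketch below handles one small sample of each. The sample values (employee rows, the message fields, the free text) are invented for this example and are not taken from any real system.

```python
import csv
import io
import json

# Structured: fixed rows and columns, as in a database table or CSV export.
structured = "emp_id,name,salary\n1,Asha,52000\n2,Ravi,48000\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total_salary = sum(int(row["salary"]) for row in rows)

# Semi-structured: no fixed table schema, but keys (tags) mark each element.
semi_structured = '{"from": "asha@example.com", "subject": "Report", "tags": ["q1", "sales"]}'
message = json.loads(semi_structured)
subject = message["subject"]

# Unstructured: free text with no tags; a program has to search or guess.
unstructured = "Hi team, please find the Q1 sales report attached. Thanks, Asha"
mentions_sales = "sales" in unstructured.lower()

print(total_salary, subject, mentions_sales)
```

The structured and semi-structured samples can be queried by column or key, while the unstructured text can only be searched.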
Introduction to Big Data platform
A big data platform is a type of IT solution that combines the features and capabilities of
several big data applications and utilities within a single solution, which is then used for
managing as well as analyzing Big Data. It focuses on providing its users with efficient
analytics tools for massive datasets. Users of such platforms can custom-build applications
according to their use case, for example to calculate customer loyalty (an e-commerce use case).

Goal: The main goal of a Big Data Platform is to achieve: Scalability, Availability, Performance,
and Security.
Example: Some of the most commonly used Big Data Platforms are:
• Hadoop Delta Lake Migration Platform
• Data Catalog Platform
• Data Ingestion Platform
• IoT Analytics Platform
Drivers for Big Data
Big Data has quickly risen to become one of the most desired topics in the industry. The main
business drivers for such rising demand for Big Data Analytics are:
1. The digitization of society
2. The drop in technology costs
3. Connectivity through cloud computing
4. Increased knowledge about data science
5. Social media applications
6. The rise of Internet-of-Things(IoT)
Example: A number of companies that have Big Data at the core of their strategy, such as
Apple, Amazon, Facebook, and Netflix, became very successful at the beginning of the
21st century.

Big Data Architecture:

Big data architecture is designed to handle the ingestion, processing, and analysis of data that
is too large or complex for traditional database systems.

➢ The big data architectures include the following components:


Data sources: All big data solutions start with one or more data sources.
Example,
• Application data stores, such as relational databases.
• Static files produced by applications, such as web server log files.
• Real-time data sources, such as IoT devices.
Data storage: Data for batch processing operations is stored in a distributed file store that can
hold high volumes of large files in various formats (also called data lake).

Example,
• Azure Data Lake Store or blob containers in Azure Storage.
Batch processing: Because the data sets are so large, a big data solution must process
data files using long-running batch jobs to filter, aggregate, and prepare the data for analysis.

Real-time message ingestion: If a solution includes real-time sources, the architecture must
include a way to capture and store real-time messages for stream processing.

Stream processing: After capturing real-time messages, the solution must process them by
filtering, aggregating, and preparing the data for analysis. The processed stream data is then
written to an output sink. We can use open-source Apache streaming technologies like Storm and
Spark Streaming for this.

Analytical data store: Many big data solutions prepare data for analysis and then serve the
processed data in a structured format that can be queried using analytical tools. Example: Azure
Synapse Analytics provides a managed service for large-scale, cloud-based data warehousing.

Analysis and reporting: The goal of most big data solutions is to provide insights into the data
through analysis and reporting. To empower users to analyze the data, the architecture may
include a data modelling layer. Analysis and reporting can also take the form of interactive data
exploration by data scientists or data analysts.

Orchestration: Most big data solutions consist of repeated data processing operations that
transform source data, move data between multiple sources and sinks, load the processed data
into an analytical data store, or push the results straight to a report. To automate these
workflows, we can use an orchestration technology such as Azure Data Factory.
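
To make the flow between these components concrete, here is a minimal single-machine sketch in Python. It is an assumption-laden toy, not how Azure or Hadoop implement the architecture: the log lines are invented, an in-memory SQLite database stands in for the analytical data store, and a plain function plays the role of the orchestrator.

```python
import sqlite3
from collections import defaultdict

# Data source: static web-server log lines (hypothetical sample).
LOG_LINES = [
    "2024-01-01 GET /home 200",
    "2024-01-01 GET /cart 500",
    "2024-01-02 GET /home 200",
    "2024-01-02 GET /home 404",
    "2024-01-02 GET /cart 200",
]


def batch_process(lines):
    """Batch step: filter out failed requests and aggregate hits per page."""
    hits = defaultdict(int)
    for line in lines:
        _date, _method, path, status = line.split()
        if status == "200":   # filter
            hits[path] += 1   # aggregate
    return dict(hits)


def load(store, hits):
    """Load the prepared data into a structured, queryable store."""
    store.execute("CREATE TABLE IF NOT EXISTS page_hits (path TEXT PRIMARY KEY, hits INTEGER)")
    store.executemany("INSERT OR REPLACE INTO page_hits VALUES (?, ?)", list(hits.items()))


def report(store):
    """Analysis and reporting step: query the analytical store."""
    return store.execute("SELECT path, hits FROM page_hits ORDER BY hits DESC").fetchall()


def run_pipeline(lines):
    """Orchestration: run the steps in order, end to end."""
    store = sqlite3.connect(":memory:")
    load(store, batch_process(lines))
    return report(store)


print(run_pipeline(LOG_LINES))
```

Each function maps to one component in the list above, so swapping the log list for a real file store or the SQLite table for a warehouse changes one step without touching the others.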
Big Data Characteristics:
Big data can be described by the following characteristics:
1. Volume
2. Variety
3. Velocity
➢ 5 Vs of Big Data, Big Data technology components:
1.Volume: Big Data involves vast volumes of data generated daily from many sources, such as
business processes, machines, social media platforms, networks, human interactions, and so on.
Example: Facebook generates approximately a billion messages, records about 4.5 billion clicks
of the “Like” button, and receives more than 350 million new posts each day.
• Big data technologies can handle large amounts of data.
2.Variety: Big Data can be structured, unstructured, or semi-structured and is collected
from different sources. In the past, data was collected only from databases and spreadsheets,
but these days it comes in an array of forms, e.g. PDFs, emails, audio, social media posts,
photos, videos, etc.

3.Velocity: Velocity refers to the speed with which data is generated in real time.
• Velocity plays an important role compared to the other characteristics.
• It covers the speed of incoming data sets and how they are linked, the rate of change, and
activity bursts.
• A primary aim of Big Data is to provide data on demand rapidly.
Example of data that is generated with high velocity: Twitter messages or Facebook posts.
4.Veracity: Veracity refers to the quality and trustworthiness of the data that is being
analyzed. It also covers being able to handle and manage data efficiently.

Example: Facebook posts with hashtags.


5.Value: Value is an essential characteristic of big data. What matters is not simply the data
that we process or store, but the valuable and reliable data that we store, process, and analyze.

Big Data Technology Components:

1.Ingestion: The ingestion layer is the very first step of pulling in raw data. The data comes
from internal sources, relational databases, non-relational databases, social media, emails,
phone calls, etc.

➢ There are two kinds of ingestion:


• Batch, in which large groups of data are gathered and delivered together.
• Streaming, which is a continuous flow of data. This is necessary for real-time data
analytics.
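
The difference between the two kinds of ingestion can be sketched with Python generators. The sensor source, the batch size, and the alert threshold below are hypothetical values chosen only for illustration.

```python
from itertools import islice


def sensor_readings():
    """Hypothetical source: yields one temperature reading at a time."""
    for value in [21.0, 21.5, 35.2, 22.1, 21.8, 36.0]:
        yield value


def ingest_batch(source, batch_size):
    """Batch ingestion: gather readings into groups and deliver each group together."""
    iterator = iter(source)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def ingest_stream(source, threshold):
    """Streaming ingestion: handle each reading as it arrives and react at once."""
    for value in source:
        if value > threshold:
            yield f"alert: {value}"


batches = list(ingest_batch(sensor_readings(), batch_size=4))
alerts = list(ingest_stream(sensor_readings(), threshold=30.0))
print(batches)
print(alerts)
```

In the batch version nothing is delivered until a group is complete, whereas the streaming version can raise an alert as soon as a single reading arrives.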

2.Storage: Storage is where the converted data is kept in a data lake or warehouse and
eventually processed. The data lake/warehouse is the most essential component of a big data
ecosystem. It needs to contain only thorough, relevant data to make insights as valuable as
possible, and it must be efficient, with as little redundancy as possible, to allow for quicker
processing.

3.Analysis: In the analysis layer, data is passed through several tools that shape it into
actionable insights.
➢ There are four types of analytics on big data:
1. Diagnostic: Explains why a problem is happening.
2. Descriptive: Describes the current state of a business through historical data.
3. Predictive: Projects future results based on historical data.
4. Prescriptive: Takes predictive analytics a step further by projecting the best future actions.
4.Consumption: The final big data component is presenting the information in a format
digestible to the end user. This can be in the form of tables, advanced visualizations, and even
single numbers if requested. The most important thing in this layer is making sure the intent
and meaning of the output are understandable.
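
The four types of analytics named under the analysis layer can be illustrated on a tiny data set. The sales figures and the recommendation rule below are made up for this sketch, and the forecast is an ordinary least-squares trend line, chosen here only because it is simple.

```python
from statistics import mean

# Hypothetical monthly sales (units) for two regions.
sales = {"north": [100, 110, 120, 130], "south": [90, 85, 70, 60]}
totals = [n + s for n, s in zip(sales["north"], sales["south"])]

# Descriptive: what is the current state, based on historical data?
average_total = mean(totals)

# Diagnostic: why is overall growth flat? Compare each region's change.
change = {region: values[-1] - values[0] for region, values in sales.items()}
weakest_region = min(change, key=change.get)


def linear_forecast(values):
    """Predictive: fit a least-squares line and project the next period."""
    n = len(values)
    x_mean, y_mean = (n - 1) / 2, mean(values)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values)) / sum(
        (x - x_mean) ** 2 for x in range(n)
    )
    return y_mean + slope * (n - x_mean)


forecast = {region: linear_forecast(values) for region, values in sales.items()}

# Prescriptive: recommend an action based on the forecast (toy rule).
actions = {
    region: "investigate and support" if forecast[region] < values[-1] else "maintain"
    for region, values in sales.items()
}
print(average_total, weakest_region, forecast, actions)
```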

Big Data importance and applications:


Big Data Importance:
The importance of Big Data doesn’t revolve around the amount of data a company has but lies
in how the company utilizes the gathered data. Every company uses its collected data in its own
way. The more effectively a company uses its data, the more rapidly it grows. By analyzing big
data pools effectively, companies can achieve the following:

Cost Savings: Some Big Data tools, such as Hadoop, can bring cost advantages to a business when
large amounts of data are to be stored.
• These tools help in identifying more efficient ways of doing business.
Time Reductions: The high speed of tools like Hadoop and in-memory analytics makes it easy to
identify new sources of data, which helps businesses analyze data immediately.

• This helps us to make quick decisions based on the learnings.


Understand the market conditions: By analyzing big data we can get a better understanding of
current market conditions.

• For example: By analyzing customers’ purchasing behaviours, a company can find out the
products that are sold the most and produce products according to this trend. By this, it can
get ahead of its competitors.
Control online reputation:
• Big data tools can do sentiment analysis.
• Therefore, you can get feedback about who is saying what about your company.
• If you want to monitor and improve the online presence of your business, then big data tools
can help in all this.
Using Big Data Analytics to Boost Customer Acquisition (purchase) and Retention:
• The customer is the most important asset any business depends on.
• No business can claim success without first establishing a solid customer base.
• If a business is slow to learn what customers are looking for, then it is very likely to deliver
poor quality products.
• The use of big data allows businesses to observe various customer-related patterns and
trends.
Using Big Data Analytics to Solve Advertisers’ Problems and Offer Marketing Insights:
• Big data analytics can help change all business operations.
• This includes the ability to match customer expectations, change the company’s product line, etc.
• It also helps ensure that marketing campaigns are powerful.
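
The sentiment analysis mentioned under "Control online reputation" can be sketched with a simple word-list approach. The word lists and sample reviews below are hypothetical and far too small for real use; production tools typically rely on much larger lexicons or trained models.

```python
# Tiny hand-made word lists (assumptions for illustration only).
POSITIVE = {"good", "great", "love", "fast", "excellent"}
NEGATIVE = {"bad", "slow", "broken", "hate", "poor"}


def sentiment_score(text):
    """Count positive words minus negative words in a piece of text."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_label(text):
    """Turn the score into a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "Great phone, fast delivery!",
    "Screen arrived broken and support was slow.",
    "It is a phone.",
]
labels = [sentiment_label(review) for review in reviews]
print(labels)
```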

Big Data Applications:


In today’s world big data has several applications; some of them are listed below:
Tracking Customer Spending Habits and Shopping Behavior: In big retail stores, the
management team has to keep data on customers’ spending habits, shopping behaviour, most
liked products, and which products are searched for or sold the most. Based on that data, the
production/collection rate of each product is fixed.

Recommendation: By tracking customer spending habits and shopping behaviour, big retail stores
provide recommendations to customers.

Smart Traffic System: Data about traffic conditions on different roads is collected
through cameras and GPS devices placed in vehicles.

• All such data is analyzed, and jam-free or less congested, less time-consuming routes are
recommended.
• One more benefit is that fuel consumption can be reduced.
Secure Air Traffic System: Sensors are present at various places in an aircraft.
• These sensors capture data like the speed of flight, moisture, temperature, and other
environmental conditions.
• Based on analysis of such data, environmental parameters within the aircraft are set up and
varied.
• By analyzing the aircraft’s machine-generated data, it can be estimated how long a machine
can operate flawlessly and when it should be replaced or repaired.
Auto Driving Car: Cameras and sensors are placed at various spots of the car to gather data
like the size of surrounding cars, obstacles, the distance from those, etc.
• These data are being analyzed, then various calculations are carried out.
• These calculations help to take action automatically.
Virtual Personal Assistant Tool: Big data analysis helps virtual personal assistant tools like
Siri, Cortana and Google Assistant to answer the various questions asked by users.
• The tool tracks the location of the user, their local time, the season, other data related to
the question asked, etc.
• Analyzing all such data provides an answer.
Example: Suppose a user asks “Do I need to take an umbrella?” The tool collects data like the
location of the user and the season and weather conditions at that location, then analyzes this
data to conclude whether there is a chance of rain, and provides the answer.
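
The umbrella example can be written as a toy decision function. Everything here is assumed for illustration: the `FORECASTS` table stands in for a weather service lookup, and the probability thresholds are arbitrary.

```python
# Hypothetical forecast data keyed by location; a real assistant would query a weather service.
FORECASTS = {
    "Mumbai": {"season": "monsoon", "rain_probability": 0.8},
    "Jaipur": {"season": "summer", "rain_probability": 0.1},
}


def answer_umbrella_question(location):
    """Combine location, season, and weather data into a yes/no answer."""
    forecast = FORECASTS.get(location)
    if forecast is None:
        return "I don't have weather data for that location."
    # Toy rule: be more cautious during the rainy season.
    threshold = 0.3 if forecast["season"] == "monsoon" else 0.5
    if forecast["rain_probability"] >= threshold:
        return "Yes, take an umbrella."
    return "No umbrella needed."


print(answer_umbrella_question("Mumbai"))
print(answer_umbrella_question("Jaipur"))
```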

IoT: Manufacturing companies install IoT sensors in machines to collect operational data.
• By analyzing such data, it can be predicted how long a machine will work without any problem
and when it will require repair.
• Thus, the cost of replacing the whole machine can be saved.

Education Sector: Organizations that conduct online educational courses utilize
big data to find candidates interested in those courses.
• If someone searches for a YouTube tutorial video on a subject, then an online or offline
course provider on that subject sends that person an online ad about its
course.
Media and Entertainment Sector: Media and entertainment service providers like
Netflix, Amazon Prime, and Spotify analyze data collected from their users.

• Data such as what type of video or music users watch or listen to most, how long users
spend on the site, etc. is collected and analyzed to set the next business strategy.

BIG DATA SECURITY:


Big data security is the collective term for all the measures and tools used to guard both the
data and analytics processes from attacks, theft, or other malicious activities that could harm or
negatively affect them.

For companies that operate on the cloud, big data security challenges are multi-faceted. When
customers give their personal information to companies, they trust them with personal data
which can be used against them if it falls into the wrong hands.

BIG DATA COMPLIANCE:


Data compliance is the practice of ensuring that sensitive data is organized and managed in
such a way as to enable organizations to meet enterprise business rules along with legal and
governmental regulations.

Organizations that don’t implement these regulations can be fined up to tens of millions of
dollars and even receive a 20-year penalty.

BIG DATA AUDITING:


Auditors can use big data to expand the scope of their projects and draw comparisons over larger
populations of data. Big data also helps financial auditors to streamline the reporting process and
detect fraud.

These professionals can identify business risks in time and conduct more relevant and accurate
audits.
BIG DATA PROTECTION:
Big data security is the collective term for all the measures and tools used to guard both the data
and analytics processes from attacks, theft, or other malicious activities that could harm or
negatively affect them.

When customers give their personal information to companies, they trust them with personal
data which can be used against them if it falls into the wrong hands. That’s why data privacy
is there to protect those customers, but also companies and their employees, from security
breaches.
Data protection is also important because organizations that don’t implement data protection
regulations can be fined up to tens of millions of dollars and even receive a 20-year penalty.
Big Data privacy and ethics:
Most data is collected through surveys, interviews, or observation. When customers give their
personal information to companies, they trust them with personal data which can be used against
them if it falls into the wrong hands. That’s why data privacy is there to protect those customers
but also companies and their employees from security breaches.

One of the main reasons why companies comply with data privacy regulations is to avoid fines.
Organizations that don’t implement these regulations can be fined up to tens of millions of
dollars and even receive a 20-year penalty. Reasons why we need to take data privacy seriously
are:
• Data breaches could hurt your business.
• Protecting your customers’ privacy.
• Maintaining and improving brand value.
• It gives you a competitive advantage.
• It supports the code of ethics.

Big Data Analytics:


Big data analytics is a complex process of examining big data to uncover information such as
hidden patterns, correlations, market trends and customer preferences.
• This can help organizations make informed business decisions.
• Data Analytics technologies and techniques give organizations a way to analyze data sets
and gather new information.
• Big Data Analytics enables enterprises to analyze their data in full context quickly, and
some tools also offer real-time analysis.
Importance of Big Data Analytics:
Organizations use big data analytics systems and software to make data-driven decisions that can
improve business-related outcomes.

• The benefits include more effective marketing, new revenue opportunities, customer
personalization and improved operational efficiency.
• With an effective strategy, these benefits can provide competitive advantages over rivals.
• Big Data Analytics tools also help businesses save time and money and aid in gaining
insights to inform data-driven decisions.
• Big Data Analytics enables enterprises to narrow their Big Data to the most relevant
information and analyze it to inform critical business decisions.
Challenges of conventional systems:
• Big data is the storage and analysis of large data sets.
• These are complex data sets that can be structured or unstructured.
• They are so large that it is not possible to work on them with traditional analytical tools.
• One of the major challenges for conventional systems is the uncertainty of the data
management landscape.
• Big data is continuously expanding, and new companies and technologies are
being developed every day.
• A big challenge for companies is to find out which technology works best for them
without introducing new risks and problems.
• These days, organizations are realising the value they get out of big data analytics and
hence they are deploying big data tools and processes to bring more efficiency to their
work environment.

Intelligent data analysis, nature of data:


Intelligent Data Analysis (IDA) is one of the most important approaches in the field of data
mining. Based on the basic principles of IDA and the features of datasets that IDA handles, the
development of IDA is briefly summarized from three aspects:

• Algorithm principle
• The scale
• Type of the dataset
Intelligent Data Analysis (IDA) is also one of the major issues in artificial intelligence and
information. It discloses hidden facts that were not previously known and provides
potentially important information from large quantities of data. It also helps in
making decisions.
Based on machine learning, artificial intelligence, pattern recognition, records, and
visualization technology, IDA helps to obtain useful information, necessary data, and interesting
models from the large amount of data available online in order to make the right choices.

➢ IDA includes three stages:


(1) Preparation of data.
(2) Data mining.
(3) Data validation and Explanation.
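
The three stages can be walked through on a small, invented data set. In this sketch "data mining" is reduced to fitting a straight line by least squares, which is an assumption made for brevity; real IDA work would use whatever model suits the data.

```python
from statistics import mean

# Hypothetical records: (hours of machine use, maintenance cost); None marks a missing value.
raw = [(10, 52), (20, 98), (None, 75), (30, 151), (40, None), (50, 249), (60, 302)]

# Stage 1, preparation of data: drop incomplete records.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# Hold back the last two records so the model can be validated on unseen data.
train, holdout = clean[:-2], clean[-2:]

# Stage 2, data mining: fit a least-squares line, cost = slope * hours + intercept.
x_mean = mean([x for x, _ in train])
y_mean = mean([y for _, y in train])
slope = sum((x - x_mean) * (y - y_mean) for x, y in train) / sum(
    (x - x_mean) ** 2 for x, _ in train
)
intercept = y_mean - slope * x_mean

# Stage 3, data validation and explanation: measure error on the held-out records.
errors = [abs((slope * x + intercept) - y) for x, y in holdout]
mean_abs_error = mean(errors)
print(f"cost rises about {slope:.2f} per hour; mean absolute error {mean_abs_error:.2f}")
```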

Analytic processes and tools:


Big Data Analytics is the process of collecting large chunks of structured/unstructured data,
segregating and analyzing it, and discovering patterns and other useful business insights from
it. These days, organizations are realizing the value they get out of big data analytics and hence
they are deploying big data tools and processes to bring more efficiency to their work
environment.
Many big data tools and processes are being utilized by companies these days in the processes
of discovering insights and supporting decision making. Big data processing is a set of
techniques or programming models for accessing large-scale data to extract useful information
for supporting decisions.

➢ Below is the list of some of the data analytics tools used most in the industry:
• R Programming (Leading Analytics Tool in the industry)
• Python
• Excel
• SAS
• Apache Spark
• Splunk
• RapidMiner
• Tableau Public
• KNIME

Analysis vs Reporting:
Reporting: Once data is collected, it is organized using tools such as graphs and tables.
• The process of organizing this data is called reporting.
• Reporting translates raw data into information.
• Reporting helps companies to monitor their online business and be alerted when data falls
outside of expected ranges.
• Good reporting should raise questions about the business from its end users.
Analysis: Analysis is the process of taking the organized data and examining it.
• This helps users to gain valuable insights on how businesses can improve their
performance.
• Analysis transforms data and information into insights.
• The goal of the analysis is to answer questions by interpreting the data at a deeper level
and providing actionable recommendations.

Conclusion:
• Reporting shows us “what is happening”.
• The analysis focuses on explaining “why it is happening” and “what we can do about
it”.
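
The contrast can be shown on one small data set: the report summarizes what happened, while the analysis flags what looks abnormal and suggests a follow-up. The order counts, the 50% tolerance, and the suggested causes are hypothetical.

```python
from statistics import mean

# Hypothetical daily order counts for one working week.
orders = {"Mon": 120, "Tue": 118, "Wed": 125, "Thu": 40, "Fri": 122}


def build_report(data):
    """Reporting: organize raw data into a summary (what is happening)."""
    values = list(data.values())
    return {
        "total": sum(values),
        "average": mean(values),
        "lowest_day": min(data, key=data.get),
    }


def analyze(data, tolerance=0.5):
    """Analysis: flag days far from the typical level and suggest a follow-up."""
    typical = mean(list(data.values()))
    findings = []
    for day, count in data.items():
        if abs(count - typical) > tolerance * typical:
            findings.append(
                f"{day}: {count} orders is far from the average of {typical:.0f}; "
                "check for an outage or a failed campaign"
            )
    return findings


summary = build_report(orders)
findings = analyze(orders)
print(summary)
print(findings)
```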

Modern data analytic tools:


• These days, organizations are realising the value they get out of big data analytics and
hence they are deploying big data tools and processes to bring more efficiency to their
work environment.
• Many big data tools and processes are being utilised by companies these days in the
processes of discovering insights and supporting decision making.
• Data Analytics tools are types of application software that retrieve data from one or more
systems and combine it in a repository, such as a data warehouse, to be reviewed and
analysed.
• Most organizations use more than one analytics tool including spreadsheets with
statistical functions, statistical software packages, data mining tools, and predictive
modelling tools.
• Together, these Data Analytics Tools give the organization a complete overview of the
company to provide key insights and understanding of the market/business so smarter
decisions may be made.
• Data analytics tools not only report the results of the data but also explain why the results
occurred to help identify weaknesses, fix potential problem areas, alert decision-makers
to unforeseen events and even forecast future results based on decisions the company
might make.
➢ Below is a list of some data analytics tools:
• R Programming (Leading Analytics Tool in the industry)
• Python
• Excel
• SAS
• Apache Spark
• Splunk
• RapidMiner
• Tableau Public
• KNIME

*****
