BCE Report

The report discusses the significance of Big Data as a transformative force in various sectors, emphasizing its role in decision-making and efficiency improvements. It outlines the complexities of Big Data, including its volume, variety, velocity, and veracity, and highlights the need for advanced analytics to extract value from large datasets. The document also details the architecture, components, technologies, and applications of Big Data, underscoring its relevance in a digital economy like Hong Kong's.

Uploaded by

Biya Rahul

A Report on

Big Data

By

59 Steve Correia

60 Latesh Billava

62 Nathen Vaz

63 Nathen Carneiro

64 Justin Madhri

A report submitted in partial fulfilment of the requirements for


TE V semester Business Communication and Ethics Course

Under the guidance of


Ms. Eden Fernandes

Department of Information Technology

St. Francis Institute of Technology

September 2022

Abstract

Big data is a new driver of global economic and societal change. The world’s data
collection is reaching a tipping point for major technological changes that can bring new
approaches to decision making and to managing our health, cities, finance and education. While
data complexity is increasing, including data’s volume, variety, velocity and veracity, the real
impact hinges on our ability to uncover the ‘value’ in the data through Big Data Analytics
technologies. Big Data Analytics poses a grand challenge for the design of highly scalable
algorithms and systems that integrate the data and uncover large hidden values from datasets
that are diverse, complex, and of a massive scale. Potential breakthroughs include new
algorithms, methodologies, systems and applications in Big Data Analytics that discover useful
and hidden knowledge from Big Data efficiently and effectively. Big Data Analytics is
relevant to Hong Kong as it moves towards a digital economy and society. Hong Kong is already
among the best in the world in Big Data Analytics. Big Data Analytics must also be a team effort
cutting across academic institutions, government, society and industry, and involving researchers
from multiple disciplines including computer science and engineering, health, data science, and
social and policy areas.

Acknowledgement

A project is always a coordinated, guided and scheduled team effort aimed at realizing a common
goal. We are grateful to all those who have helped and guided us through this
project and made this experience worthwhile. We wish to sincerely thank our Director, Brother
Shantilal Kujur, our Principal, Dr. Sincy George, and our HOD of Information Technology, Dr.
Prachi Raut, for giving us this opportunity to prepare a project in the Third Year of Information
Technology. We are highly indebted to our institute, St. Francis Institute of Technology, and the
Department of Information Technology for providing us with this learning opportunity and the
resources required to accomplish our task so far. We are truly grateful to our mentor, Ms. Eden
Fernandes, who persistently guided us for the betterment of our project, report and presentation.
This work would not have been possible without her insights and suggestions,
which have helped us achieve so much. We also take this opportunity to thank all teaching and
non-teaching staff for their support and cooperation.

Table of Contents

Sr. No.   Title
1         Introduction
2         Problem Definition
3         Architecture
4         Components of Big Data
5         Technology
6         Applications
7         Conclusion
8         References

List of Illustrations

Fig No.   Title
Fig 3.1   Big Data Architecture
Fig 4.1   Components of Big Data

Chapter 1: Introduction

Big data is a broad term for data sets so large or complex that traditional data
processing applications are inadequate. Challenges include analysis, capture, data curation,
search, sharing, storage, transfer, visualization, and information privacy. The term often refers
simply to the use of predictive analytics or certain other advanced methods to extract
value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more
confident decision making, and better decisions can mean greater operational efficiency, cost
reductions and reduced risk.
Data sets grow in size in part because they are increasingly being gathered by cheap
and numerous information-sensing mobile devices, aerial (remote sensing), software logs,
cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor
networks. The world's technological per-capita capacity to store information has roughly
doubled every 40 months since the 1980s; as of 2012, 2.5 exabytes (2.5×10^18 bytes) of
data were created every day. The challenge for large enterprises is determining who should own big data
initiatives that straddle the entire organization.
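
The two figures in the paragraph above can be turned into simple arithmetic. The sketch below assumes steady exponential growth with a 40-month doubling period and a constant creation rate over the day; the function and constant names are illustrative only.

```python
DOUBLING_PERIOD_MONTHS = 40      # doubling period quoted in the text
BYTES_CREATED_PER_DAY = 2.5e18   # 2.5 exabytes per day, as of 2012
SECONDS_PER_DAY = 24 * 60 * 60


def growth_factor(months, doubling_period=DOUBLING_PERIOD_MONTHS):
    """Multiplicative growth after `months`, assuming steady doubling."""
    return 2 ** (months / doubling_period)


def annual_growth_rate(doubling_period=DOUBLING_PERIOD_MONTHS):
    """Fractional growth over 12 months implied by the doubling period."""
    return growth_factor(12, doubling_period) - 1


# Average creation rate if the daily volume were spread evenly over the day.
bytes_per_second = BYTES_CREATED_PER_DAY / SECONDS_PER_DAY

print(growth_factor(120))                    # growth over ten years
print(round(annual_growth_rate() * 100, 1))  # percent per year
print(f"{bytes_per_second:.2e} bytes per second")
```

Under these assumptions, storage capacity grows eightfold in ten years, or roughly 23% per year.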
Work with big data is necessarily uncommon; most analysis is of "PC size" data, on a
desktop PC or notebook that can handle the available data set.
Relational database management systems and desktop statistics and visualization
packages often have difficulty handling big data. The work instead requires "massively parallel
software running on tens, hundreds, or even thousands of servers". What is considered
"big data" varies depending on the capabilities of the users and their tools, and expanding
capabilities make Big Data a moving target.

Chapter 2: Problem Definition

Problem definition is probably one of the most complex and most heavily neglected stages
in the big data analytics pipeline. Defining the problem that a data product will solve
requires experience, and most aspiring data scientists have little or none at this stage.
Most big data problems can be categorized in the following ways:
• Supervised Regression
In a supervised problem, a model is learned from examples for which the response is already
known. In a regression problem the response y ∈ ℝ, which means the response is
real-valued. For example, we can develop a model to predict the hourly salary of individuals
given the text of their CVs.
• Unsupervised Learning
Management is often thirsty for new insights. Segmentation models can provide such insight, so
that the marketing department can develop products for different segments. A good
approach to developing a segmentation model, rather than thinking first about algorithms, is to
select features that are relevant to the desired segmentation.
• Learning to Rank
This problem can be considered a regression problem, but it has particular characteristics
and deserves separate treatment. Given a collection of documents and a query, we
seek the most relevant ordering of the documents. To develop a supervised learning
algorithm, we need labels that say how relevant an ordering is for a given query.
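
As an illustration of the regression case, the following minimal sketch fits a straight line by ordinary least squares. The data set is hypothetical, and a single numeric feature (years of experience) stands in for whatever features would actually be extracted from a CV.

```python
# Hypothetical training data: (years_of_experience, hourly_salary).
# In the CV example, the feature would be derived from the CV text.
DATA = [(1, 20.0), (2, 25.0), (3, 30.0), (4, 35.0), (5, 40.0)]


def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Return the real-valued response predicted for feature value x."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(DATA)
print(predict(model, 6))  # predicted hourly salary for 6 years of experience
```

The prediction is a real number, which is what distinguishes regression from problems with categorical responses.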

Chapter 3: Architecture

Fig 3.1 Big Data Architecture


• Data sources: All big data solutions start with one or more data sources. Examples
include:
o Application data stores, such as relational databases.
o Static files produced by applications, such as web server log files.
o Real-time data sources, such as IoT devices.

• Data storage: Data for batch processing operations is typically stored in a distributed file
store that can hold high volumes of large files in various formats. This kind of store is
often called a data lake. Options for implementing this storage include Azure Data Lake
Store or blob containers in Azure Storage.

• Batch processing: Because the data sets are so large, often a big data solution must
process data files using long-running batch jobs to filter, aggregate, and otherwise
prepare the data for analysis. Usually these jobs involve reading source files, processing
them, and writing the output to new files.
• Real-time message ingestion: If the solution includes real-time sources, the
architecture must include a way to capture and store real-time messages for stream
processing. This might be a simple data store, where incoming messages are dropped into
a folder for processing.

• Stream processing: After capturing real-time messages, the solution must process
them by filtering, aggregating, and otherwise preparing the data for analysis. The
processed stream data is then written to an output sink.
• Analytical data store: Many big data solutions prepare data for analysis and then
serve the processed data in a structured format that can be queried using analytical tools.
The analytical data store used to serve these queries can be a Kimball-style relational
data warehouse.
• Analysis and reporting: The goal of most big data solutions is to provide insights
into the data through analysis and reporting. To empower users to analyze the data,
the architecture may include a data modeling layer, such as a multidimensional OLAP
cube or tabular data model in Azure Analysis Services. It might also support self-
service BI, using the modeling and visualization technologies in Microsoft Power BI or
Microsoft Excel.
• Orchestration: Most big data solutions consist of repeated data processing operations,
encapsulated in workflows, that transform source data, move data between multiple
sources and sinks, load the processed data into an analytical data store, or push the results
straight to a report or dashboard.
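
The batch processing and stream processing steps in the list above can be sketched in a few lines. This is a single-process illustration under simplifying assumptions, not a distributed implementation; the log records and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical web server log records: (epoch_seconds, status_code, path).
LOG = [
    (0, 200, "/home"),
    (12, 404, "/missing"),
    (61, 200, "/home"),
    (75, 500, "/checkout"),
    (130, 200, "/cart"),
]


def batch_error_counts(records):
    """Batch step: filter to error responses, then aggregate by path."""
    errors = (r for r in records if r[1] >= 400)
    return Counter(path for _, _, path in errors)


def tumbling_window_counts(records, window_seconds=60):
    """Stream step: count events per fixed, non-overlapping time window."""
    counts = defaultdict(int)
    for timestamp, _, _ in records:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)


batch_result = batch_error_counts(LOG)
stream_result = tumbling_window_counts(LOG)
print(batch_result)
print(stream_result)
```

In a real solution the batch job would read files from the data lake and write its output to new files, while the stream job would consume messages as they arrive and write to an output sink.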

Chapter 4: Components of Big Data

Big data projects have a number of layers of abstraction, from abstraction of the
data through to running analytics against the abstracted data. Fig 4.1 shows the
basic elements of analytical big data and their interrelationships. The higher-level
components help make big data projects easier and more dynamic. Hadoop is often at the
center of big data projects, but it is not a prerequisite.

Fig 4.1 Components of Big Data

The components of analytical big data are given below:

• Hadoop distributions from packaging and support organizations such as Cloudera, which
include MapReduce, essentially the compute layer of big data.
• A file system such as the Hadoop Distributed File System (HDFS), which manages the storage
and retrieval of the data and metadata required for computation. Databases such as HBase can
also be used.
• A higher-level language such as Pig (part of the Hadoop ecosystem), which can be used
instead of Java to simplify the writing of computations.
• A data warehouse layer named Hive, which is built on top of Hadoop.
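
To make the MapReduce compute layer more concrete, here is a minimal in-memory sketch of the map, shuffle and reduce phases applied to a word count. It runs in a single process and does not use the Hadoop API; the function names are illustrative, and a real cluster would distribute the map and reduce tasks across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


# Hypothetical input documents.
DOCS = ["big data needs big tools", "data tools scale"]
counts = word_count(DOCS)
print(counts)
```

Higher-level tools such as Pig and Hive let users express this kind of computation without writing the map and reduce functions by hand.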

Chapter 5: Technology

Big data requires exceptional technologies to efficiently process large quantities of
data within tolerable elapsed times.
Multidimensional big data can also be represented as tensors, which can be
more efficiently handled by tensor-based computation, such as multilinear subspace
learning. Additional technologies being applied to big data include massively parallel-
processing (MPP) databases, search-based applications, data mining, distributed file
systems, distributed databases, cloud based infrastructure (applications, storage and
computing resources) and the Internet.
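
As a small illustration of representing multidimensional data as a tensor, the sketch below builds a three-way array (store × product × day) from flat records and sums it along one axis. The records and dimension names are hypothetical, and the example shows only the representation, not multilinear subspace learning itself.

```python
# Hypothetical sales records: (store_index, product_index, day_index, units).
RECORDS = [
    (0, 0, 0, 5),
    (0, 1, 0, 2),
    (1, 0, 1, 7),
    (1, 1, 1, 1),
    (0, 0, 1, 3),
]
N_STORES, N_PRODUCTS, N_DAYS = 2, 2, 2


def build_tensor(records):
    """Accumulate flat records into a store x product x day tensor."""
    tensor = [[[0] * N_DAYS for _ in range(N_PRODUCTS)] for _ in range(N_STORES)]
    for store, product, day, units in records:
        tensor[store][product][day] += units
    return tensor


def sum_over_days(tensor):
    """Collapse the day axis, leaving a store x product matrix."""
    return [[sum(days) for days in products] for products in tensor]


tensor = build_tensor(RECORDS)
per_store_product = sum_over_days(tensor)
print(per_store_product)
```

In practice a numerical library would be used for such arrays; nested lists are used here only to keep the sketch dependency-free.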
Some but not all MPP relational databases have the ability to store and manage
petabytes of data. Implicit is the ability to load, monitor, back up, and optimize the use of
the large data tables in the RDBMS.
Real or near-real-time information delivery is one of the defining characteristics
of big data analytics. Latency is therefore avoided whenever and wherever possible. Data
in memory is good; data on spinning disk at the other end of a Fibre Channel (FC) SAN
connection is not. The cost of a SAN at the scale needed for analytics applications is much
higher than that of other storage techniques.

Chapter 6: Applications

Big data is applied in the following fields:


1. Government: In the United States, for example, the Obama administration
announced the Big Data Research and Development Initiative in 2012, because
big data can be used to address many of the issues faced by the
government. Big data is also used by the Indian government.
2. International development: Advances in big data analysis offer
cost-effective opportunities to improve decision making in critical development
areas such as health care, employment, crime, security and natural
disasters. In this way, big data supports international
development.
3. Manufacturing: Big data provides an infrastructure for
transparency in the manufacturing industry.
4. Cyber-physical models: Current prognostics and health management (PHM)
implementations mostly use data from actual usage, whereas analytical algorithms
can perform more accurately when more data is included. This is the role of big
data in cyber-physical models.
5. Media: Media companies use big data together with the Internet of Things for
activities such as targeting of consumers and data capture.
6. Technology: Companies such as eBay, Amazon, Facebook and Google use big data
on their websites.
7. Private sector: Applications of big data in the private sector include
retail, retail banking and real estate.
8. Science and research: A well-known example is the Large
Hadron Collider, whose experiments involve about 150 million sensors delivering data
40 million times per second.
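
The scale of the Large Hadron Collider figures quoted in the list above can be checked with one multiplication. The sketch assumes, for illustration only, that every sensor delivers one reading at every tick; the constant names are hypothetical.

```python
SENSORS = 150_000_000                        # sensors, as quoted in the text
READINGS_PER_SENSOR_PER_SECOND = 40_000_000  # deliveries per second, as quoted

# Upper bound on raw readings per second if every sensor reports every time.
readings_per_second = SENSORS * READINGS_PER_SENSOR_PER_SECOND
print(f"{readings_per_second:.1e} readings per second")
```

Under that assumption the raw rate is on the order of 10^15 readings per second, which is why such experiments cannot rely on conventional data processing.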

Chapter 7: Conclusion

The availability of Big Data, low-cost commodity hardware, and new information
management and analytic software have produced a unique moment in the history of data
analysis. The convergence of these trends means that we have the capabilities required to
analyze astonishing data sets quickly and cost-effectively for the first time in history.
As more and more data is generated and collected, data analysis requires scalable, flexible,
and high performing tools to provide insights in a timely fashion. However, organizations are
facing a growing big data ecosystem where new tools emerge and “die” very quickly.
Therefore, it can be very difficult to keep pace and choose the right tools.
The Age of Big Data is here, and these are truly revolutionary times if both business and
technology professionals continue to work together and deliver on the promise.

Chapter 8: References

[1] https://www.oracle.com/in/big-data/what-is-big-data/

[2] https://learn.microsoft.com/en-us/azure/architecture/guide/architecture-styles/big-data

[3] https://subscription.packtpub.com/book/big-data-and-business-intelligence/9781784391409/1/ch01lvl1sec12/components-of-the-big-data-ecosystem

[4] https://www.techtarget.com/searchdatamanagement/definition/big-data

[5] https://www.javatpoint.com/what-is-big-data

[6] https://www.sap.com/india/insights/what-is-big-data.html
