Data Science_notes_X
Data Science
the goal of our project. Its main use is to help structure a system and communicate
the result to others.
5. How is the relationship between the elements shown in a system map?
Ans : In a system map, the cause-and-effect relationships between elements are shown
with arrows. The arrowhead depicts the direction of the effect and the sign (+ or –)
shows the type of relationship. If the arrow goes from X to Y with a + sign, the two
elements are directly related: if X increases, Y also increases, and if X decreases, Y
also decreases. If the arrow goes from X to Y with a – sign, the two elements are
inversely related: if X increases, Y decreases, and if X decreases, Y increases.
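The sign rule above can be sketched in a few lines of Python. This is only an illustration: the element names (study_time, marks, screen_time) and the helper functions are hypothetical and are not part of these notes.

```python
# Hypothetical system map: each arrow is stored as (source, target) -> sign.
# +1 means a direct relationship, -1 means an inverse relationship.
system_map = {
    ("study_time", "marks"): +1,
    ("screen_time", "study_time"): -1,
}


def effect_on(source, target, change):
    """Return the direction of change in `target` when `source` changes.

    `change` is +1 (increase) or -1 (decrease); the result uses the same coding.
    """
    sign = system_map[(source, target)]
    return sign * change


def describe(direction):
    """Turn +1 / -1 into a readable word."""
    return "increases" if direction > 0 else "decreases"


# If study_time increases, marks move in the same direction (+ arrow).
print("marks", describe(effect_on("study_time", "marks", +1)))
# If screen_time increases, study_time moves in the opposite direction (- arrow).
print("study_time", describe(effect_on("screen_time", "study_time", +1)))
```

Multiplying the sign of the arrow by the direction of the change gives the direction of the effect, which is exactly the rule stated in the answer.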
6. What are the sources of data?
Ans : Data may be collected offline or online. Offline sources of data include surveys,
interviews, observations, records and sensors. Online sources of data include
open-sourced government portals, reliable websites and open-sourced statistical
websites.
7. List out the points to be considered while collecting data from any data source.
Ans : The following points should be kept in mind while accessing data from any data
source:
1. Only data that is available for public use should be taken.
2. Personal datasets should only be used with the consent of the owner.
3. One should never breach someone’s privacy to collect data.
4. Data should only be taken from reliable sources, as data collected from random
sources can be wrong or unusable.
5. Reliable sources of data ensure the authenticity of the data, which helps in proper
training of the AI model.
8. List out the commonly used formats to store tabular data.
Ans : CSV (Comma-Separated Values) files, spreadsheets and SQL databases are some
commonly used formats for storing tabular data.
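As a small illustration, the same table can be stored as CSV text and in an SQL database using only Python's standard library. The student names and marks below are made-up sample data, and the in-memory SQLite database is just one convenient choice for a sketch.

```python
import csv
import io
import sqlite3

# Hypothetical sample table: student names and marks.
rows = [("Asha", 82), ("Ravi", 74), ("Meena", 91)]

# CSV: write the table as comma-separated text, then read it back.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "marks"])  # header row
writer.writerows(rows)

buffer.seek(0)
csv_records = list(csv.DictReader(buffer))  # every value comes back as a string

# SQL: store the same table in an in-memory SQLite database and query it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)", rows)
top_scorer = conn.execute(
    "SELECT name, marks FROM students ORDER BY marks DESC LIMIT 1"
).fetchone()
conn.close()

print(csv_records[0])
print(top_scorer)
```

Note that CSV keeps everything as text, while the SQL table remembers that marks is a number.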
9. What are a module and a package in Python?
Ans : A module is a single Python file, saved with a .py extension, that contains a
collection of functions, classes and global variables. It can be imported by other code
or run directly as a script.
A Python package is a collection of modules. Modules that are related to each other
are usually put in the same package.
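The difference can be seen with modules that ship with Python. In the sketch below, math is a single module, while json is a package that contains modules such as json.decoder. The helper is_package is a hypothetical name introduced here; it relies on the fact that packages carry a __path__ attribute.

```python
import math          # `math` is a single module
import json          # `json` is a package (a collection of modules)
import json.decoder  # `json.decoder` is one module inside the `json` package


def is_package(module):
    """Packages have a __path__ attribute that says where their modules live."""
    return hasattr(module, "__path__")


print(is_package(math))          # False
print(is_package(json))          # True
print(is_package(json.decoder))  # False
```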
10. List out the packages used in data science projects.
Ans : NumPy, Pandas, Matplotlib and NLTK are some of the packages used in data
science projects.
11. List out any four types of graphs that can be plotted using Matplotlib.
Ans : Scatter plot, Bar chart, Histogram, Box plot
12. List out the kinds of errors that may come with data while collecting it.
Ans : While collecting data, it is possible that the data might come with some errors.
They are:
a) Incorrect values (erroneous data): values that do not resemble the kind of data
expected in that position.
b) Invalid or null values (erroneous data): corrupted data values.
c) Missing data: the values of these cells are missing and hence the cells remain empty.
d) Outliers: data that does not fall within the expected range of a certain element.
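A minimal sketch of how such errors might be spotted in one column of data is given below. The ages are made-up sample values, None stands for an empty cell, and the 1.5 × IQR rule used to flag outliers is an assumption made for this example; the notes do not name a particular method.

```python
import statistics

# Hypothetical column of ages as collected. None marks a missing cell and
# "abc" is an incorrect value (text where a number is expected).
raw_ages = [15, 16, None, 14, "abc", 15, 17, 16, 95, 15]

# Positions of missing cells
missing = [i for i, v in enumerate(raw_ages) if v is None]

# Positions of values that are present but are not numbers
incorrect = [
    i for i, v in enumerate(raw_ages)
    if v is not None and not isinstance(v, (int, float))
]

# Keep only the usable numeric values
valid = [v for v in raw_ages if isinstance(v, (int, float))]

# Flag outliers with the 1.5 * IQR rule (an assumption, not from the notes)
q1, _, q3 = statistics.quantiles(valid, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in valid if v < low or v > high]

print("missing at positions:", missing)
print("incorrect at positions:", incorrect)
print("outliers:", outliers)
```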
13. Why is data visualization important?
Ans : Analysing the data collected can be difficult as it is all about tables and numbers.
While machines work efficiently on numbers, humans need visual aids to understand
the information presented. Hence, data visualisation is important for interpreting the
collected data and identifying patterns and trends in it.
14. Explain the following applications of Data science.
a) Targeted Advertisement b) Internet search
c) Recommender Systems d) Price Comparison
Ans : a) Targeted Advertisement : Targeted advertising is a form of online advertising
that focuses on the specific traits, interests and preferences of a consumer. It
allows brands to send different messages to different consumers based on what the
brand knows about them.
b) Internet search : Internet search is the process of exploring the Internet for
information using a search engine such as Google or Microsoft Bing. AI search
engines work by first crawling and indexing web pages across the Internet, extracting
useful data such as text, images and links.
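The indexing step can be illustrated with a toy inverted index, which maps each word to the pages that contain it. This is only a sketch: the page addresses and text below are made up, and real search engines do far more (ranking, images, links and so on).

```python
from collections import defaultdict

# Hypothetical "crawled" pages: address -> extracted text.
pages = {
    "example.org/a": "data science uses data to find patterns",
    "example.org/b": "python packages for data visualisation",
    "example.org/c": "search engines index web pages",
}


def build_index(pages):
    """Map every word to the set of pages that contain it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word of the query, sorted by address."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


index = build_index(pages)
print(search(index, "data"))
print(search(index, "data python"))
```

Because the index is built once in advance, answering a query only needs a lookup per word rather than a fresh scan of every page.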
c) Recommender Systems : A recommender system is an AI algorithm that
uses Big Data to suggest or recommend additional products to consumers. The
suggestions can be based on various criteria, including past purchases, search history,
demographic information and other factors. Recommender systems are highly useful
because they help users discover products and services they might not otherwise have
found on their own.
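A very small recommender based on past purchases could look like the sketch below. The customers, products and scoring rule (count how many purchases two customers share) are all assumptions made for illustration; real recommender systems use much larger data and more refined methods.

```python
from collections import Counter

# Hypothetical purchase history: customer -> set of products bought.
purchases = {
    "asha": {"laptop", "mouse", "keyboard"},
    "ravi": {"laptop", "mouse"},
    "meena": {"laptop", "headphones"},
    "john": {"mouse", "keyboard"},
}


def recommend(customer, purchases, top_n=2):
    """Suggest products bought by customers with similar purchases."""
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)  # similarity = number of shared products
        for item in items - owned:    # only products the customer does not own
            scores[item] += overlap
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, score in ranked if score > 0][:top_n]


print(recommend("ravi", purchases))
```

Here ravi is recommended a keyboard first, because the customers most similar to ravi also bought one.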
d) Price Comparison : Price comparison websites compare the price of a particular
product or service across different stores or companies. They help users find the
best price, the latest products and online shopping deals.
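At its core, the comparison is a search for the lowest price among several offers. The store names and prices in this sketch are hypothetical.

```python
# Hypothetical prices of one product in different stores.
offers = {"StoreA": 1499.0, "StoreB": 1399.0, "StoreC": 1450.0}


def best_offer(offers):
    """Return the (store, price) pair with the lowest price."""
    store = min(offers, key=offers.get)
    return store, offers[store]


print(best_offer(offers))
```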