Data Literacy

AMITY INTERNATIONAL SCHOOL, NOIDA

CLASS XI SUB: ARTIFICIAL INTELLIGENCE SESSION 2024-25

CHAPTER-5 DATA LITERACY/REVISION NOTES

1. Data Literacy

 Definition:
Data literacy refers to the ability to locate, understand, analyze, and
use data effectively. It includes a range of skills from collecting and
organizing data to interpreting results and applying insights
ethically. It is essential in the age of AI, where raw data is
transformed into actionable insights for various purposes.

 Importance of Data Literacy:

 Critical Thinking: It enhances critical thinking by allowing
students to distinguish between different types of data and
judge their reliability.

 Decision Making: Data-driven decision-making is key in
fields like education, healthcare, business, and government.
Being data literate enables individuals to make informed
decisions based on solid evidence.

 AI and Machine Learning: As AI and machine learning (ML)
are heavily data-driven, understanding data is crucial for
anyone involved in these fields. AI depends on converting
large datasets into usable knowledge.

 Skills in Data Literacy:

 Data Collection: Gathering relevant and accurate data.

 Data Organization: Structuring data in meaningful ways
(tables, charts, etc.).

 Data Analysis: Using statistical methods or AI tools to
interpret the data.

 Data Ethics: Ensuring that data is used responsibly and
ethically.

 Example Question: “Can you categorize the information you see
online, in books, and from friends? Is all of this information the
same?”
 This primes students to think about data as information in
different forms and how it can be used for various purposes.

2. Data Collection

 Definition:
Data collection refers to the process of gathering information from
multiple sources for analysis, prediction, or further use. It is the
foundational step in any AI or machine learning project.

 Importance:
Collecting accurate and relevant data is critical for creating
predictive models in AI. High volumes of data are often required to
develop reliable algorithms, especially in complex projects such as
medical AI.

 Primary vs. Secondary Data:

 Primary Data: Collected specifically for a particular purpose.
This can be gathered through:

 Surveys: Collecting opinions or feedback through
questionnaires.
Example: A researcher uses a questionnaire to
understand customer preferences for a new product.

 Interviews: Direct communication with individuals or
groups to gather information.
Example: An organization conducts interviews to collect
employee feedback on job satisfaction.

 Observation: Watching and recording behaviors as
they occur naturally.
Example: Observing children’s play patterns in a
schoolyard to understand social dynamics.

 Experiments: Manipulating variables to observe
outcomes and establish cause-effect relationships.
Example: Testing the effectiveness of two different
advertising campaigns on consumers.

 Secondary Data: Data that has already been collected by
others and is available for reuse. This includes:

 Books, Journals, News Articles: Information already
compiled and analyzed.
 Web Scraping: Using automated tools to extract data
from websites (e.g., scraping product prices from an
e-commerce site).

 Social Media Tracking: Gathering and analyzing user
behavior on social platforms.
Example: Analyzing social media comments to
understand public opinion on a new product.

 Precompiled Datasets: Databases like Kaggle that
offer ready-made datasets for analysis.
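
The web scraping idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the HTML fragment, the product names, and the `price` class name are invented here, and a real scraper would download the page from a website (and should follow that site's terms of use) instead of reading a hard-coded string.

```python
from html.parser import HTMLParser

# Hypothetical page fragment (assumption); a real scraper would fetch this.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Notebook</span> <span class="price">45.00</span></li>
  <li><span class="name">Pen</span> <span class="price">12.50</span></li>
  <li><span class="name">Backpack</span> <span class="price">899.99</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collects the text of every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.in_price = False  # True while we are inside a price span
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(float(data.strip()))


parser = PriceParser()
parser.feed(SAMPLE_HTML)
print(parser.prices)  # [45.0, 12.5, 899.99]
```

The extracted list is secondary data: the prices were published by someone else and are simply being gathered for reuse.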

 Key Considerations in Data Collection:

 Diversity: Ensure the data collected is diverse enough to
cover various scenarios the AI model might encounter. For
example, in training a robot to sort recyclable materials, the
data should include many types of materials to improve the
robot’s performance.

 Volume of Data: The amount of data needed depends on the
complexity of the model. Simpler tasks such as license plate
detection require less data, whereas advanced AI systems in
healthcare demand vast amounts of data.

 Question for Students: “Think about your favorite movie
recommendation platform. How do you think they use data to
suggest movies you might like?”

 This helps students connect the concept of data collection
with a familiar real-world application, such as Netflix or
YouTube.

3. Exploring Data

 Definition:
Data exploration is the process of understanding the data,
identifying patterns, and cleaning it before detailed analysis. This
involves getting familiar with the values in the data and
understanding whether they are typical, extreme, or require
correction.

 Levels of Measurement:

 Nominal: Categories with no inherent order, such as colors or
car brands.

 Example: The color of a student’s eyes or the model of a
smartphone.
 Ordinal: Categories with a specific order, but the differences
between them cannot be measured.

 Example: Restaurant ratings like “unpalatable,” “just
okay,” “tasty,” and “delicious.”

 Interval: Ordered data where the differences between values
are meaningful, but there is no true zero.

 Example: Temperature measured in Celsius or
Fahrenheit. A 20-degree difference is meaningful, but 0
degrees does not represent “no temperature.”

 Ratio: Ordered data with a true zero point, allowing for
meaningful ratios between values.

 Example: Weight or exam scores. A score of 80 is four
times as high as a score of 20.
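
One way to remember the four levels is to ask which calculations make sense at each one. The sketch below is illustrative only: all sample values (eye colours, ratings, temperatures, scores) are made up, and the rating scale reuses the restaurant labels from the ordinal example.

```python
from statistics import mode

# Nominal: labels only, so the mode is the only sensible "average".
eye_colors = ["brown", "blue", "brown", "green", "brown"]
most_common_color = mode(eye_colors)

# Ordinal: categories can be ranked, so a median makes sense,
# but the gaps between ranks cannot be measured.
scale = ["unpalatable", "just okay", "tasty", "delicious"]
ratings = ["tasty", "just okay", "delicious", "tasty", "unpalatable"]
ranks = sorted(scale.index(r) for r in ratings)
median_rating = scale[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, ratios are not,
# because 0 degrees Celsius is not "no temperature".
morning_c, noon_c = 10.0, 30.0
difference_c = noon_c - morning_c                     # a real 20-degree gap
ratio_c = noon_c / morning_c                          # 3.0, but NOT "3x as hot"
ratio_k = (noon_c + 273.15) / (morning_c + 273.15)    # ratio on the kelvin scale

# Ratio: a true zero makes ratios meaningful.
score_a, score_b = 80, 20
times_as_high = score_a / score_b

print(most_common_color)   # brown
print(median_rating)       # tasty
print(difference_c)        # 20.0
print(round(ratio_k, 2))   # 1.07
print(times_as_high)       # 4.0
```

The Celsius ratio of 3.0 shrinks to about 1.07 once the same readings are expressed in kelvin, which has a true zero. That is why ratios are reserved for ratio-level data.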

 Key Statistical Concepts:

 Mean: The average of a dataset.

 Median: The middle value of a dataset, useful for skewed
data.

 Mode: The most frequently occurring value in a dataset.

 Variance and Standard Deviation: Measures of how spread
out the data is around the mean.

 Question: “Imagine you’re collecting data on students’
favorite movie genres. Could you rank the genres from most
to least popular (ordinal), or would you just say which genre is
the favorite (nominal)?”

 This exercise helps students understand the difference
between nominal and ordinal data and how different types of
data are analyzed.

4. Statistical Analysis of Data

 Definition:
Statistical analysis involves using mathematical techniques to
summarize and interpret data. In AI, statistics help transform raw
data into insights that can guide decisions.

 Measures of Central Tendency:

 Mean: The arithmetic average of a dataset.


 Median: The value that divides a dataset into two equal
halves.

 Mode: The most common value in a dataset.

 Example:

 Mean: For the dataset {5, 10, 15, 20, 30}, the mean is
(5+10+15+20+30)/5 = 16.

 Median: In the dataset {10, 11, 15, 17, 20, 21, 27, 28, 30, 32,
32, 35, 40}, the median is 27.

 Mode: In the dataset {22, 24, 17, 18, 17, 19, 18, 21, 20, 21,
22, 22}, the mode is 22.
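
The three answers above can be checked with Python's built-in `statistics` module. This is a quick verification sketch; the variable names are chosen here for readability and are not part of the notes.

```python
from statistics import mean, median, mode

mean_data = [5, 10, 15, 20, 30]
median_data = [10, 11, 15, 17, 20, 21, 27, 28, 30, 32, 32, 35, 40]
mode_data = [22, 24, 17, 18, 17, 19, 18, 21, 20, 21, 22, 22]

print(mean(mean_data))      # 16  (80 divided by 5)
print(median(median_data))  # 27  (7th of 13 sorted values)
print(mode(mode_data))      # 22  (appears three times)
```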

 Variance and Standard Deviation: These measures tell us how
spread out data points are from the average. A small variance
indicates that data points are close to the mean, while a large
variance shows that data points are more spread out.

 Example: If the heights of five dogs are 600mm, 470mm,
170mm, 430mm, and 300mm, the variance and standard
deviation can help us understand how much the heights
deviate from the mean height.
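
The dog-height example can be worked through step by step. The notes do not say whether the population or the sample formula is intended; the sketch below assumes the population formula (divide by the number of values, n), so treat the results as one reasonable reading. Dividing by n - 1 instead would give the larger sample variance.

```python
from math import sqrt
from statistics import pvariance

heights_mm = [600, 470, 170, 430, 300]

# Step 1: the mean height
mean_height = sum(heights_mm) / len(heights_mm)

# Step 2: squared distance of each height from the mean
squared_devs = [(h - mean_height) ** 2 for h in heights_mm]

# Step 3: population variance = average of the squared distances
variance = sum(squared_devs) / len(heights_mm)

# Step 4: standard deviation = square root of the variance
std_dev = sqrt(variance)

print(mean_height)         # 394.0
print(variance)            # 21704.0
print(round(std_dev, 2))   # 147.32

# Cross-check against the standard library's population variance
print(pvariance(heights_mm) == variance)  # True
```

A standard deviation of about 147 mm means a typical dog in this group is roughly 147 mm taller or shorter than the 394 mm average.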

5. Representation of Data

 Definition:
Data representation involves visualizing data to make it easier to
interpret. This can include graphs, charts, and diagrams, which help
simplify complex data into a more understandable format.

 Types of Graphical Representations:

 Line Graphs: Show trends over time.

 Bar Charts: Compare categories or groups.

 Pie Charts: Show proportions of a whole.

 Scatter Plots: Show the relationship between two variables.

 Histograms: Represent the frequency distribution of data.

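
The frequency distribution that a histogram draws can be computed before any chart is made. The sketch below reuses the twelve values from the mode example in Section 4 and prints a rough text histogram with one bin per whole number. The text layout is an assumption made for illustration; a charting tool would draw proper bars.

```python
from collections import Counter

values = [22, 24, 17, 18, 17, 19, 18, 21, 20, 21, 22, 22]
frequency = Counter(values)  # how many times each value occurs

# One bin per whole number from the smallest to the largest value,
# so a value that never occurs still shows up as an empty bar.
for value in range(min(values), max(values) + 1):
    bar = "#" * frequency[value]
    print(f"{value:>3} | {bar}")
```

The longest bar belongs to 22 (three marks), which matches the mode, and 23 has an empty bar because it never occurs.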