
Data visualization (3)

The document discusses data visualization, emphasizing the importance of organizing and presenting data through methods like tabulation and graphical representation. It outlines various types of data, such as raw, organized, and frequency distributions, and introduces visualization techniques like histograms and scatter plots. Additionally, it explains the concepts of cross-section and time series data, along with practice questions for applying these concepts.

Uploaded by shivagahoi2004
Copyright © All Rights Reserved

Data visualization

• Data refers to raw facts, figures, or measurements collected about events, objects, or phenomena. It forms the foundation for gaining insights and making informed decisions. Data can be qualitative, such as names or categories, or quantitative, like numerical values or measurements. However, raw data in its initial form is often vast, unstructured, and challenging to interpret.
• Examples of data:
  • Names of students in a class.
  • Heights of individuals in centimeters.
  • Monthly sales of a product.
  • Customer satisfaction ratings.
• Tabulating and visualizing data are crucial processes that simplify and organize complex datasets, making them
easier to understand and analyze.
• Tabulation systematically arranges data into rows and columns, highlighting key features and relationships, while
visualization uses charts, graphs, and diagrams to provide a quick, engaging overview of the information.
• Importance of data visualization: Data visualization uses graphical methods to represent data. It plays a critical role in data analysis by enabling:
  • Clarity: Simplifies complex datasets, making them easier to understand.
  • Trend identification: Highlights patterns and trends that may not be immediately apparent in raw data.
  • Quick decision-making: Delivers insights faster, which speeds up decision-making.
  • Engagement: Visual representations are more engaging and intuitive than raw data or tables.

Organizing data
The process of organizing data related to a quantitative phenomenon typically involves the following stages:

• Raw data: A collection of individual observations in their original, unorganized form. For example, a list of exam scores such as 45, 67, 89, and 56.
• Organized (arrayed) data: Raw data sorted into ascending or descending order to make patterns more apparent. For instance, the scores arranged as 45, 56, 67, 89.
• Discrete (ungrouped) frequency distribution: A table showing how often each individual value occurs. For example, a table listing each exam score alongside the number of students who obtained it.
• Grouped frequency distribution: Data values combined into intervals (e.g., 40–49, 50–59), with the number of observations recorded for each interval. This method is useful for summarizing large datasets.
• Continuous frequency distribution: Similar to a grouped frequency distribution, but used for continuous data. The intervals have no gaps between them (e.g., 39.5–49.5, 49.5–59.5). This is particularly relevant for measurements such as height or weight.
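The stages above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a prescribed procedure. It assumes integer scores, classes of width 10 starting at 40, and a 0.5 correction for the continuous classes. The function name `organize` is hypothetical.

```python
from collections import Counter

def organize(raw, start, width):
    """Walk raw scores through the stages: arrayed, discrete, grouped, continuous."""
    # Organized (arrayed) data: sort in ascending order.
    arrayed = sorted(raw)
    # Discrete frequency distribution: how often each individual value occurs.
    discrete = dict(sorted(Counter(raw).items()))
    # Grouped frequency distribution: count values in each class [lower, lower + width - 1].
    grouped = {}
    lower = start
    while lower <= arrayed[-1]:
        upper = lower + width - 1
        grouped[(lower, upper)] = sum(lower <= x <= upper for x in raw)
        lower += width
    # Continuous classes: widen each interval by 0.5 on both sides so no gaps remain
    # (assumes integer data, where consecutive classes are 1 apart).
    continuous = {(lo - 0.5, hi + 0.5): f for (lo, hi), f in grouped.items()}
    return arrayed, discrete, grouped, continuous

arrayed, discrete, grouped, continuous = organize([45, 67, 89, 56], start=40, width=10)
print(arrayed)     # [45, 56, 67, 89]
print(grouped)     # {(40, 49): 1, (50, 59): 1, (60, 69): 1, (70, 79): 0, (80, 89): 1}
print(continuous)  # {(39.5, 49.5): 1, (49.5, 59.5): 1, (59.5, 69.5): 1, (69.5, 79.5): 0, (79.5, 89.5): 1}
```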

Frequency distribution table


• A frequency distribution is a tabular arrangement of data that pairs the values of a variable with their corresponding frequencies, i.e., the number of times each value occurs.
• Example: The following table shows the ages of 50 individuals in a ward:
36 30 39 44 40 49 51 41 43 35
30 38 53 32 35 38 32 45 44 41
50 45 42 31 36 50 34 33 32 50
46 39 37 33 34 34 35 48 50 47
35 50 48 45 47 47 43 49 52 35

• Let's express it in the form of a discrete (ungrouped) frequency distribution:
Age   Tally marks   Frequency      Age   Tally marks   Frequency
30    ||            2              42    |             1
31    |             1              43    ||            2
32    |||           3              44    ||            2
33    ||            2              45    |||           3
34    |||           3              46    |             1
35    ||||/         5              47    |||           3
36    ||            2              48    ||            2
37    |             1              49    ||            2
38    ||            2              50    ||||/         5
39    ||            2              51    |             1
40    |             1              52    |             1
41    ||            2              53    |             1
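Counting by hand is error-prone, so a quick programmatic check is useful. Below is a minimal Python sketch. It assumes the 50 ages are typed in row by row from the table above. `collections.Counter` does the counting, and `tally` is a hypothetical helper that writes each group of five as `||||/`.

```python
from collections import Counter

# Ages of the 50 individuals, copied row by row from the table above.
ages = [
    36, 30, 39, 44, 40, 49, 51, 41, 43, 35,
    30, 38, 53, 32, 35, 38, 32, 45, 44, 41,
    50, 45, 42, 31, 36, 50, 34, 33, 32, 50,
    46, 39, 37, 33, 34, 34, 35, 48, 50, 47,
    35, 50, 48, 45, 47, 47, 43, 49, 52, 35,
]

def tally(n):
    """Tally marks for n: one '||||/' gate per full group of five, then single strokes."""
    groups = ["||||/"] * (n // 5)
    if n % 5:
        groups.append("|" * (n % 5))
    return " ".join(groups)

freq = Counter(ages)
for age in sorted(freq):
    print(f"{age:<4} {tally(freq[age]):<8} {freq[age]}")
```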

Grouped Frequency Distribution


• An ungrouped frequency distribution is particularly useful in two scenarios:
  • When the values of a variable are frequently repeated, so patterns are easy to identify without further grouping.
  • When the variable takes only a small number of distinct values, so the table stays short and can be read directly.
• However, when the values are scattered or numerous, an ungrouped frequency distribution may not condense the data effectively. In such cases, the first meaningful step in summarizing the data is dividing it into classes (or class intervals).
• This involves dividing the entire range of the variable's values into suitable intervals and recording the number of observations in each. Grouping data into classes simplifies the dataset and makes it easier to interpret and analyze.
• The following table shows the grouped frequency distribution of the age data above:
Age (years)   Number of individuals
30–34         11
35–39         12
40–44         8
45–49         11
50–54         8
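The grouping step can be sketched in Python as follows. It assumes integer data and inclusive class limits of equal width (here width 5, starting at 30). `group_counts` is a hypothetical helper name.

```python
# Ages of the 50 individuals from the frequency distribution example.
ages = [
    36, 30, 39, 44, 40, 49, 51, 41, 43, 35,
    30, 38, 53, 32, 35, 38, 32, 45, 44, 41,
    50, 45, 42, 31, 36, 50, 34, 33, 32, 50,
    46, 39, 37, 33, 34, 34, 35, 48, 50, 47,
    35, 50, 48, 45, 47, 47, 43, 49, 52, 35,
]

def group_counts(values, start, width, n_classes):
    """Count observations in consecutive classes with inclusive integer limits."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 1  # e.g. 30-34 when start=30 and width=5
        count = sum(lower <= v <= upper for v in values)
        table.append((f"{lower}-{upper}", count))
    return table

for label, count in group_counts(ages, start=30, width=5, n_classes=5):
    print(f"{label:<6} {count}")
```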

Continuous grouped frequency distribution


• A grouped frequency distribution can be converted into a continuous grouped frequency distribution by
ensuring that the intervals are continuous, with no gaps between the upper limit of one class and the lower limit of
the next class. This involves adjusting the class boundaries slightly.

• Identify class intervals: Start with the class intervals from the grouped frequency distribution, for example 30–34, 35–39, 40–44, 45–49, 50–54.
• Adjust the limits: Subtract half the gap between consecutive classes from each lower limit and add the same amount to each upper limit. Here the gap between 34 and the next lower limit, 35, is 1, so the correction is 0.5.
  • For 30–34: new lower boundary = 30 - 0.5 = 29.5; new upper boundary = 34 + 0.5 = 34.5. Repeat this for all intervals.
• Create continuous classes: The adjusted intervals are 29.5–34.5, 34.5–39.5, 39.5–44.5, 44.5–49.5, 49.5–54.5.

Age (years)   Number of individuals
29.5–34.5     11
34.5–39.5     12
39.5–44.5     8
44.5–49.5     11
49.5–54.5     8
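The boundary adjustment can be written as a small function. This sketch assumes at least two equally spaced classes, so the gap between the first two classes applies throughout. `to_boundaries` is a hypothetical name.

```python
def to_boundaries(limits):
    """Turn inclusive class limits into continuous class boundaries.

    The correction is half the gap between one class's upper limit and the
    next class's lower limit; assumes at least two equally spaced classes.
    """
    half_gap = (limits[1][0] - limits[0][1]) / 2
    return [(lower - half_gap, upper + half_gap) for lower, upper in limits]

limits = [(30, 34), (35, 39), (40, 44), (45, 49), (50, 54)]
print(to_boundaries(limits))
# [(29.5, 34.5), (34.5, 39.5), (39.5, 44.5), (44.5, 49.5), (49.5, 54.5)]
```

Each upper boundary now equals the next lower boundary, which is exactly the "no gaps" property a histogram needs.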

• Choosing a suitable number of classes ensures the data are represented in a balanced manner. Too few classes oversimplify the data and obscure important details, while too many classes make the distribution difficult to interpret.
• A common guideline for the number of classes is Sturges' rule, which can be written in two equivalent forms:
  • $k = 1 + 3.322\log_{10} N$
  • $k = 1 + \log_2 N$
Here, k is the approximate number of classes and N is the total number of observations. The two forms agree because $3.322 \approx 1/\log_{10} 2$.
• Given data: the total number of observations is N = 50 (the age data in the previous example).
Using the second formula: $k = 1 + \log_2 50 \approx 1 + 5.64 = 6.64$. Rounding to the nearest whole number gives k ≈ 7. Researchers may adjust the number of classes based on specific dataset characteristics and the objectives of their analysis.
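Both forms of the rule can be evaluated directly with Python's `math` module. This sketch simply repeats the arithmetic above for N = 50 and shows that the two forms agree to two decimal places.

```python
import math

N = 50  # number of observations in the age data

k_log2 = 1 + math.log2(N)            # k = 1 + log2 N
k_log10 = 1 + 3.322 * math.log10(N)  # k = 1 + 3.322 log10 N (same rule, base 10)

print(round(k_log2, 2), round(k_log10, 2))  # 6.64 6.64
print(round(k_log2))                        # 7
```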
Histogram
• A histogram is a graphical representation of a grouped frequency distribution with continuous classes. It is an area diagram: a set of adjacent rectangles whose bases span the class boundaries and whose areas are proportional to the frequencies of the corresponding classes.
• Example: The following table represents the variable and its frequency distribution:
Variable range Frequency
10–20 15
21–30 23
31–40 9
41–50 36
51–60 53
61–70 48
71–80 60

• The histogram for the above frequency distribution:
[Figure: histogram of the frequency distribution above]
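A plotting library would normally draw the histogram. As a dependency-free sketch, the code below prints one text bar per class for the continuous age classes from the earlier example. Dividing each frequency by its class width (frequency density) keeps bar area proportional to frequency, which matters when class widths differ. With the equal widths used here, the bars are simply proportional to frequency. The helper names are hypothetical.

```python
def histogram_heights(boundaries, freqs):
    """Frequency density: frequency / class width, so bar AREA is proportional to frequency."""
    return [f / (upper - lower) for (lower, upper), f in zip(boundaries, freqs)]

def text_histogram(boundaries, freqs, scale=1.0):
    """One text bar per class; the number of '#' is proportional to the bar height."""
    lines = []
    for (lower, upper), h in zip(boundaries, histogram_heights(boundaries, freqs)):
        lines.append(f"{lower:>5}-{upper:<5} " + "#" * round(h * scale))
    return lines

# Continuous age classes and their frequencies from the earlier example.
boundaries = [(29.5, 34.5), (34.5, 39.5), (39.5, 44.5), (44.5, 49.5), (49.5, 54.5)]
freqs = [11, 12, 8, 11, 8]

# scale=5 equals the common class width, so each '#' stands for one observation.
for line in text_histogram(boundaries, freqs, scale=5):
    print(line)
```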

Ogive
• An ogive is a graph of the cumulative frequency distribution of a dataset. It is used to determine the number of observations below a particular value and helps in understanding how the data are distributed. For a "less than" ogive, each cumulative frequency is plotted against the upper boundary of its class, and the points are joined by straight lines or a smooth curve. Because cumulative frequencies never decrease, the curve never falls.
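The points of a "less than" ogive for the continuous age classes can be computed with `itertools.accumulate`. The `count_below` helper (a hypothetical name) reads the ogive by linear interpolation between neighbouring points. This assumes observations are spread evenly within each class, so its result is an estimate rather than an exact count.

```python
from itertools import accumulate

# Continuous classes and frequencies of the age data from the tables above.
boundaries = [(29.5, 34.5), (34.5, 39.5), (39.5, 44.5), (44.5, 49.5), (49.5, 54.5)]
freqs = [11, 12, 8, 11, 8]

# Less-than ogive points: start at (first lower boundary, 0), then pair each
# upper boundary with the cumulative frequency up to and including that class.
points = [(boundaries[0][0], 0)] + [
    (upper, cum) for (_, upper), cum in zip(boundaries, accumulate(freqs))
]
print(points)
# [(29.5, 0), (34.5, 11), (39.5, 23), (44.5, 31), (49.5, 42), (54.5, 50)]

def count_below(x, points):
    """Estimate how many observations lie below x by interpolating along the ogive."""
    if x <= points[0][0]:
        return 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return float(points[-1][1])

print(count_below(37.0, points))  # 17.0
```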
Scatter plot
• A scatter plot is a graph that shows the relationship between two variables in a dataset. Each observation is plotted as a point on a two-dimensional Cartesian plane: the independent variable on the x-axis and the dependent variable on the y-axis. Scatter plots are also called scatter graphs or scatter diagrams.
• Here, a scatter plot is depicted where the x-axis shows age and the y-axis shows the weights of 50 students.
[Figure: scatter plot of age (x-axis) against weight (y-axis) for 50 students]
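A scatter plot can be sketched without a plotting library by placing points on a character grid. This example uses the age and height pairs from practice question 3, with age (independent) on the x-axis and height (dependent) on the y-axis. `ascii_scatter` is a hypothetical helper. It assumes neither variable is constant; otherwise the scaling would divide by zero.

```python
# Age (years) and height (cm) of the 20 participants in practice question 3.
ages = list(range(12, 32))
heights = [140, 145, 150, 152, 158, 160, 165, 168, 170, 172,
           174, 175, 176, 178, 179, 180, 181, 182, 183, 184]

def ascii_scatter(xs, ys, width=40, height=12):
    """Return text rows with a '*' for each (x, y) pair.

    x increases to the right and y increases upwards; assumes xs and ys each
    contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip so larger y is printed higher
    # Left edge acts as the y-axis, bottom line as the x-axis.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]

for line in ascii_scatter(ages, heights):
    print(line)
```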

Cross-section data

• Cross-section data refers to data collected at a single point in time across multiple subjects or entities. It provides
a snapshot of different entities or variables at a particular moment.
• Examples: Income levels of households in a city in 2025, GDP of various countries for the year 2023, test scores of
students in a particular class on a single exam day.
Time series data

• Time series data refers to data collected over multiple time periods for a single subject or entity. It captures how a
variable changes over time.
• Examples: Monthly sales of a product from January to December 2024, temperature readings recorded daily over a year, stock prices of a company observed over a week.
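A time series is naturally stored as (period, value) pairs in chronological order. This sketch uses the monthly sales figures from practice question 4 and computes the month-to-month change, which is one simple way to see how the variable moves over time.

```python
# Monthly unit sales of one product (practice question 4), in chronological order.
monthly_sales = [
    ("January", 120), ("February", 130), ("March", 140), ("April", 150),
    ("May", 160), ("June", 170), ("July", 170), ("August", 160),
    ("September", 150), ("October", 140), ("November", 180), ("December", 190),
]

# Change from each month to the next; pairing each entry with its successor
# only makes sense because the list is ordered by time.
changes = [
    (month, value - prev_value)
    for (_, prev_value), (month, value) in zip(monthly_sales, monthly_sales[1:])
]

print(changes[:3])                       # [('February', 10), ('March', 10), ('April', 10)]
print(max(changes, key=lambda c: c[1]))  # ('November', 40)
```

Shuffling a cross-section (such as the 50 ages) loses nothing, whereas shuffling this list would destroy the information it carries.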

Practice questions
1. A fitness center has conducted a survey to record the number of push-ups completed by 50 participants in a single
session. The following data represents the recorded push-ups for each participant: 45, 50, 38, 52, 47, 60, 43, 55, 49,
41, 62, 51, 48, 58, 40, 44, 46, 54, 53, 57, 39, 42, 61, 59, 56, 45, 50, 37, 63, 64, 41, 47, 52, 48, 46, 58, 44, 60, 49, 53,
55, 43, 62, 57, 45, 50, 40, 48, 54, 56. Create an ungrouped frequency distribution table. Organize the data into a
grouped frequency distribution table. Use the following intervals for the class ranges: 37–40, 41–44, 45–48, 49–52,
53–56, 57–60, 61–64. Draw a histogram to represent the grouped frequency distribution.
2. A teacher conducted a math exam for 40 students in the class. The teacher wants to analyze the overall performance
of the students to identify trends and areas for improvement. The following marks (out of 100) were obtained by
the students: 48, 72, 65, 89, 54, 77, 61, 92, 68, 74, 59, 81, 66, 90, 55, 73, 50, 85, 78, 69, 64, 70, 58, 88, 82, 60, 67, 76,
71, 53, 62, 84, 57, 75, 49, 86, 80, 63, 56, 79. Group the marks into class intervals of width 10 and draw a histogram
and an ogive.
3. A sports academy conducted a survey to examine the relationship between the age and height of 20 participants.
The following data represents the observations:

Age (Years) Height (cm) Age (Years) Height (cm)
12 140 22 174
13 145 23 175
14 150 24 176
15 152 25 178
16 158 26 179
17 160 27 180
18 165 28 181
19 168 29 182
20 170 30 183
21 172 31 184
Using the given data, draw a scatter plot.
4. A retail store wants to analyze the monthly sales trend for one of its popular products over the past year. The data
below shows the sales (in units) for each month:
Month Sales (Units) Month Sales (Units)
January 120 July 170
February 130 August 160
March 140 September 150
April 150 October 140
May 160 November 180
June 170 December 190
Using the data provided, plot a time series graph with months on the x-axis and sales (in units) on the y-axis.
