FTA-Module 1-Notes (1)
Risk Mitigation: Proper understanding of data helps in identifying potential risks and
uncertainties, allowing proactive measures to be taken to mitigate them.
Efficient Problem Solving: Data understanding facilitates the identification of relevant
information and patterns, making problem-solving processes more efficient and effective.
Quality Assurance: Understanding data ensures data quality and integrity, minimizing errors
and inaccuracies that could lead to flawed conclusions.
Real-world Applications of Different Data Types
Tabular Data: In finance, tabular data is used for analyzing stock prices, predicting market
trends, and managing investment portfolios. In healthcare, it is utilized for patient record
management, clinical trials analysis, and drug development.
Graphical Data: In marketing, graphical data is used for visualizing sales trends, customer
demographics, and advertising performance. In academia, it is employed for presenting
research findings, illustrating scientific concepts, and summarizing data analysis results.
Image Data: In agriculture, image data is used for monitoring crop health, assessing soil
conditions, and predicting yields. In security, it is utilized for facial recognition, object detection,
and surveillance.
Audio Data: In telecommunications, audio data is used for voice communication, speech
recognition, and voice-controlled devices. In entertainment, it is employed for music streaming,
audio editing, and sound production.
Understanding these data types empowers individuals and organizations to leverage their data
effectively, driving innovation and creating value across various industries and domains.
Data Understanding and Interpretation
Data understanding and interpretation are fundamental steps in the data analysis process,
facilitating the extraction of meaningful insights and patterns from raw data. Exploratory Data
Analysis (EDA) techniques play a crucial role in this process by providing an initial overview of
the dataset's characteristics, including its structure, distributions, and relationships between
variables. Descriptive statistics, such as mean, median, mode, variance, and standard deviation,
offer quantitative summaries of the dataset's central tendency, dispersion, and shape. These
statistics help analysts gain a deeper understanding of the data's distribution and variability.
Additionally, data visualization techniques, such as those offered by Matplotlib, Seaborn, and
Plotly, enable the graphical representation of data, making it easier to identify trends, outliers,
and patterns visually. Understanding data distributions and patterns is essential for uncovering
underlying relationships and making informed decisions based on the data's insights, ultimately
driving effective problem-solving and decision-making processes.
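As a small illustration of this first pass over a dataset, a `describe()`-style numeric summary can be computed with nothing but the standard library. The sales figures below are made up for the sketch:

```python
import statistics

# Made-up example: daily sales figures for a quick first look
sales = [12.0, 15.5, 14.2, 13.8, 55.0, 14.9, 13.1]

summary = {
    "count": len(sales),
    "mean": statistics.fmean(sales),
    "median": statistics.median(sales),
    "stdev": statistics.stdev(sales),  # sample standard deviation
    "min": min(sales),
    "max": max(sales),
}

# A mean well above the median hints at a high outlier (here, 55.0)
print(summary)
```

Comparing the mean against the median like this is exactly the kind of distribution check EDA starts with; a plotting library such as Matplotlib would then make the outlier visible in a box plot or histogram.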
Data understanding forms the foundation of effective analytics. Before diving into complex
analyses or building sophisticated models, it's crucial to thoroughly understand the data you're
working with. This guide will walk you through the essential steps and considerations for
gaining a deep understanding of data in analytics.
1. Data Profiling:
Start by profiling your dataset to get an overview of its characteristics.
Examine the size, structure, and data types to understand the scope of your dataset.
Identify any missing values, outliers, or inconsistencies that may impact your analysis.
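In Python this step is usually a pandas `df.info()`/`df.describe()` call; the same idea can be sketched over a plain list of records. The records below are invented for illustration:

```python
# Profile a small "table" held as a list of dicts: size, columns,
# inferred types, and missing-value counts per column.
rows = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},
    {"id": 3, "age": 29, "city": "Pune"},
]

n_rows = len(rows)
columns = list(rows[0])
missing = {c: sum(1 for r in rows if r[c] is None) for c in columns}
dtypes = {c: type(next(r[c] for r in rows if r[c] is not None)).__name__
          for c in columns}

print(n_rows, missing, dtypes)
```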
2. Data Documentation:
Include descriptions of data sources, data dictionaries, and any data quality issues encountered.
Document your assumptions, methodologies, and initial insights to guide your analysis.
Handling Missing Values, Outliers, and Duplicates:
Missing Values: Data may contain missing values, which can adversely affect analysis and
modeling. Common techniques for handling missing values include:
Removing rows or columns with missing values if they are negligible in quantity.
Imputing missing values using statistical measures such as mean, median, mode, or predictive
imputation.
Outliers: Outliers are data points that significantly deviate from the rest of the dataset.
Techniques for handling outliers include:
Identifying outliers using statistical methods (e.g., Z-score, IQR) and visual inspection.
Treating outliers by winsorizing (capping or flooring extreme values) or transforming them.
Duplicates: Duplicates are identical rows or observations within the dataset. Removing
duplicates ensures data integrity and avoids biases in analysis.
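A minimal sketch of all three steps on a made-up numeric column — median imputation, the 1.5 × IQR outlier rule, and order-preserving de-duplication:

```python
import statistics

data = [12, 14, 15, None, 14, 99, 15, 13]  # None marks a missing entry

# 1) Impute missing values with the median of the observed values
observed = [x for x in data if x is not None]
med = statistics.median(observed)
imputed = [med if x is None else x for x in data]

# 2) Flag outliers with the 1.5 * IQR rule
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
outliers = [x for x in imputed
            if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

# 3) Remove duplicates while keeping first occurrences
deduped = list(dict.fromkeys(imputed))
```

In practice these would be pandas calls (`fillna`, `drop_duplicates`), but the logic is the same.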
Data Transformation:
Normalization: Normalization scales numerical features to a standard range, typically between
0 and 1, making them comparable. It prevents features with larger scales from dominating the
model.
Standardization: Standardization transforms numerical features to have a mean of 0 and a
standard deviation of 1. It maintains the shape of the distribution and is useful when features
have different scales.
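The two scalings differ only in the reference points they use; a sketch on a made-up feature:

```python
import statistics

values = [10.0, 20.0, 30.0, 40.0, 50.0]  # made-up feature values

# Min-max normalization: rescale to the range [0, 1]
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# Standardization (z-score): subtract the mean, divide by the std deviation
mu = statistics.fmean(values)
sigma = statistics.pstdev(values)  # population standard deviation
standardized = [(v - mu) / sigma for v in values]
```

Libraries such as scikit-learn package the same arithmetic as `MinMaxScaler` and `StandardScaler`.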
Feature Engineering:
Feature engineering involves creating new features or transforming existing ones to improve
model performance or capture additional information from the data.
Techniques include:
Creating interaction terms or polynomial features to capture nonlinear relationships.
Encoding categorical variables into numerical representations using techniques like one-hot
encoding or label encoding.
Extracting information from date/time variables, text data, or spatial data to create meaningful
features.
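For example, one-hot encoding (normally done with `pandas.get_dummies` or scikit-learn's `OneHotEncoder`) amounts to the following; the categories are made up:

```python
# One-hot encode a made-up categorical column by hand
colors = ["red", "green", "red", "blue"]

categories = sorted(set(colors))  # fixed column order for the encoding
one_hot = [[1 if value == cat else 0 for cat in categories]
           for value in colors]
```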
Data Imputation Methods:
Mean, Median, Mode Imputation: Imputing missing values with the mean, median, or mode of
the respective feature.
Predictive Imputation: Using machine learning algorithms to predict missing values based on
the values of other features. Techniques such as K-nearest neighbors (KNN), linear regression,
or decision trees can be employed for predictive imputation.
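The simple statistical imputations translate directly into code; predictive imputation would replace the constant fill value with a model's prediction. The column below is invented:

```python
import statistics

ages = [25, 30, None, 35, 30, None, 40]  # made-up column with gaps
observed = [a for a in ages if a is not None]

mean_fill = statistics.fmean(observed)
median_fill = statistics.median(observed)
mode_fill = statistics.mode(observed)

mean_imputed = [mean_fill if a is None else a for a in ages]
```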
Data engineering
Data engineering forms the backbone of any analytics endeavor, laying the groundwork for
collecting, processing, and preparing data for analysis. Here's a concise guide to the basics of
data engineering for students embarking on their analytics journey.
1. Data Collection:
Data engineering starts with collecting data from various sources like databases, APIs, files, or
sensors.
Designing efficient data pipelines to extract, ingest, and aggregate data is key.
2. Data Storage:
Once collected, data needs storage in systems like relational databases, NoSQL databases, data
lakes, or cloud storage services.
3. Data Processing:
Transforming raw data into a usable format for analysis is done through data processing.
Distributed computing frameworks like Apache Spark or Apache Flink are used for efficient
processing of large datasets.
4. Data Integration:
Integrating data from different sources into a centralized repository ensures a unified view.
ETL (Extract, Transform, Load) processes are designed for this purpose, ensuring data
consistency and quality.
STATISTICS
Statistics is a science of facts and figures which may be readily available or obtained through
the process of direct enquiry or enumeration. It deals with the methods of collecting,
classifying and analyzing the data so as to draw some valid conclusions. Statistics, as a branch of
mathematics, serves as a foundational framework for collecting, organizing, analyzing,
interpreting, and presenting data. Its applications span across diverse fields, including science,
business, economics, medicine, and social sciences, where data-driven decision-making is
paramount.
Need for Statistics and Exploratory Data Analysis (EDA):
Understanding Variation: Statistics helps us understand and quantify the variation
inherent in data. By analyzing variability, we can identify patterns, trends, and
relationships that provide insights into real-world phenomena.
Making Informed Decisions: Statistics provides tools and techniques for making
informed decisions based on data evidence rather than intuition or anecdotal evidence.
It enables us to draw reliable conclusions and predictions from empirical observations.
Quality Improvement: In fields such as manufacturing and healthcare, statistics is used
for quality improvement initiatives. It helps identify areas for improvement, monitor
processes, and make data-driven decisions to enhance quality and efficiency.
Risk Assessment and Management: Statistics plays a vital role in risk assessment and
management by quantifying uncertainty and identifying potential risks. It helps
businesses and organizations make informed decisions to mitigate risks and maximize
opportunities.
Exploratory Data Analysis (EDA): EDA is an essential step in the data analysis process
that involves summarizing the main characteristics of a dataset, often through visual
and numerical techniques. It helps analysts understand the structure of the data, detect
anomalies, and generate hypotheses for further investigation.
Descriptive statistics
Descriptive statistics serve as a fundamental tool for summarizing and conveying the main
characteristics of a dataset. By analyzing descriptive statistics, researchers gain valuable insights
into the central tendency, variability, and distribution of the data. Measures such as the mean,
median, and mode offer insights into the typical or central value of the dataset, while measures
of dispersion like variance and standard deviation quantify the spread or variability of the data
points around the central tendency. Additionally, descriptive statistics help to characterize the
distribution of the data, including its shape, skewness, and kurtosis, providing further
understanding of the dataset's underlying structure. Overall, descriptive statistics play a vital
role in providing a concise and informative summary of the dataset, facilitating data
interpretation and decision-making processes.
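Shape measures such as skewness follow the same moment pattern as the variance; a sketch of population skewness on an invented, right-skewed sample:

```python
import statistics

data = [2, 3, 3, 4, 4, 4, 5, 5, 9]  # made-up, right-skewed sample

mu = statistics.fmean(data)
sigma = statistics.pstdev(data)
n = len(data)

# Third standardized moment; positive values indicate a right (long upper) tail
skewness = sum((x - mu) ** 3 for x in data) / (n * sigma ** 3)
```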
Inferential Statistics
Inferential statistics involve making inferences or generalizations about a population based on
sample data. It allows researchers to draw conclusions, make predictions, and test hypotheses
about population parameters using sample statistics. Common inferential techniques include:
Hypothesis Testing: A statistical method used to assess whether observed differences or
effects are statistically significant or occurred by chance. It involves formulating null and
alternative hypotheses, selecting an appropriate test statistic, and determining the
probability of observing the test statistic under the null hypothesis (p-value).
Confidence Intervals: Confidence intervals provide a range of values within which the
true population parameter is likely to lie with a certain level of confidence. They are
constructed based on sample statistics and provide a measure of uncertainty around the
point estimate.
Regression Analysis: Regression analysis is used to model the relationship between one
or more independent variables (predictors) and a dependent variable (outcome). It
helps in predicting the value of the dependent variable based on the values of the
independent variables.
Analysis of Variance (ANOVA): ANOVA is used to compare means across multiple
groups or treatments to determine whether there are statistically significant differences
between them. It assesses whether the variability between groups is greater than the
variability within groups.
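As one concrete instance of these techniques, a normal-approximation 95% confidence interval for a mean can be computed as below. For a sample this small a t critical value would be more accurate than z = 1.96; the measurements are invented:

```python
import math
import statistics

sample = [14.1, 13.8, 15.2, 14.7, 13.9, 14.4, 15.0, 14.2]  # made-up data
n = len(sample)
mean = statistics.fmean(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# 95% CI using the normal critical value z = 1.96
lower, upper = mean - 1.96 * se, mean + 1.96 * se
```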
Classification
The significance of a large mass of statistical data, known as raw data, cannot be understood
unless it is arranged in some definite manner. The process of arranging data into different
groups is called classification.
Frequency table
It is a tabular arrangement consisting of various classes of uniform size known as class intervals
and the number in each class known as frequency.
The difference between the lower limits of two consecutive classes is known as the width of
the class, usually denoted by h. The average of the left and right end points of a class interval
is known as the midpoint of the class, usually denoted by x.
Mean, Variance and Standard Deviation
For individual observations x₁, x₂, …, xₙ:
Mean = x̄ = Σx / n
Variance = V = Σ(x − x̄)² / n   (or)   V = Σx² / n − x̄²
Standard Deviation = σ = √V

For a frequency distribution with values x and frequencies f:
Mean = x̄ = Σfx / Σf
Variance = V = Σf(x − x̄)² / Σf   (or)   V = Σx²f / Σf − x̄²
Standard Deviation = σ = √V
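These frequency-distribution formulas translate directly into code; using the mid values and frequencies from Problem 2 below reproduces its results:

```python
import math

x = [5, 10, 15, 20, 25, 30, 35]  # class midpoints
f = [2, 5, 8, 10, 7, 4, 1]       # frequencies

N = sum(f)
mean = sum(fi * xi for fi, xi in zip(f, x)) / N                    # Σfx / Σf
variance = sum(fi * xi**2 for fi, xi in zip(f, x)) / N - mean**2   # Σx²f/Σf − x̄²
sd = math.sqrt(variance)
```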
If m₁, σ₁ are the mean and standard deviation of a sample of size n₁ and m₂, σ₂ are the mean
and standard deviation of a sample of size n₂, then the standard deviation σ of the combined
sample of size n₁ + n₂ is given by

σ² = (n₁σ₁² + n₂σ₂² + n₁D₁² + n₂D₂²) / (n₁ + n₂)

where Dᵢ = mᵢ − M, M being the mean of the combined sample.
Median
(i) If the values of a variable are arranged in ascending or descending order of magnitude,
the median is the mid-value, i.e. the value which divides the total frequency into two equal
parts.
(ii) For a grouped frequency distribution it is given by
Median = L + (H / F)(N/2 − C)
where L is the lower boundary of the median class, H the class width, F the frequency of the
median class, N the total frequency and C the cumulative frequency of the class preceding the
median class.
Mode
(i) The mode is defined as that value of the variable which occurs most frequently, i.e. the
value with the maximum frequency.
(ii) For a grouped distribution it is given by the formula
Mode = L + H · f₁ / (f₁ + f₂)
PROBLEMS:
1. Find the (i) mean, (ii) median, (iii) mode and (iv) standard deviation of a set of
observations: 6, 8, 7, 5, 4, 9, 3, 3, 3
Solution:
(i) Mean = x̄ = Σx / n = (6 + 8 + 7 + 5 + 4 + 9 + 3 + 3 + 3) / 9 = 48 / 9 = 5.333
(ii) Arranged in ascending order: 3, 3, 3, 4, 5, 6, 7, 8, 9. The middle (5th) value is 5, so the
median is 5.
(iii) The value 3 occurs most often (three times), so the mode is 3.
(iv) Variance = V = Σx² / n − x̄²
V = (1/9){6² + 8² + 7² + 5² + 4² + 9² + 3² + 3² + 3²} − 5.333²
V = 4.6667
Standard Deviation = σ = √V = √4.6667 = 2.1602
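The hand computation can be cross-checked with Python's `statistics` module; `pvariance` and `pstdev` use the same divide-by-n definitions as the formulas above:

```python
import statistics

obs = [6, 8, 7, 5, 4, 9, 3, 3, 3]

mean = statistics.fmean(obs)
median = statistics.median(obs)
mode = statistics.mode(obs)
variance = statistics.pvariance(obs)  # population variance, V = Σx²/n − x̄²
sd = statistics.pstdev(obs)
```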
2. Find the (i) mean, (ii) median, (iii) mode and (iv) standard deviation for the following
grouped data
Mid value 5 10 15 20 25 30 35
Frequency 2 5 8 10 7 4 1
Solution:

x      f        xf         x²f           C.F
5      2        10         50            2
10     5        50         500           7
15     8        120        1800          15
20     10       200        4000          25
25     7        175        4375          32
30     4        120        3600          36
35     1        35         1225          37
       Σf = 37  Σfx = 710  Σx²f = 15550

(i) Mean = x̄ = Σfx / Σf = 710 / 37 = 19.1892
(ii) Here N/2 = 37/2 = 18.5. The first cumulative frequency to reach 18.5 is 25, which occurs
at x = 20, hence the median is 20.
(iii) The value of x corresponding to the maximum frequency 10 is 20. Hence the mode is 20.
(iv) Variance = V = Σx²f / Σf − x̄² = 15550 / 37 − 19.1892² = 52.0453
Standard Deviation = σ = √V = √52.0453 = 7.2143
3. Find the (i) mean, (ii) median and (iii) standard deviation for the following grouped data
Solution:

Class    f         x    fx          x²f             C.F
0 - 10   3         5    15          75              3
10 - 20  16        15   240         3600            19
20 - 30  26        25   650         16250           45
30 - 40  31        35   1085        37975           76
40 - 50  16        45   720         32400           92
50 - 60  8         55   440         24200           100
         Σf = 100       Σfx = 3150  Σx²f = 114500

(i) Mean = x̄ = Σfx / Σf = 3150 / 100 = 31.5
(ii) Here N/2 = 100/2 = 50, which falls in the interval 30 – 40.
Median = L + (H / F)(N/2 − C) = 30 + (10/31)(50 − 45) = 31.6129
(iii) Variance = V = Σx²f / Σf − x̄² = 114500 / 100 − 31.5² = 152.75
Standard Deviation = σ = √V = √152.75 = 12.3592
4. The following are the scores of two players X and Y in ten games. Determine which player
is the better score getter and which is the more consistent.
X 30 44 66 62 60 34 80 46 20 38
Y 34 46 70 38 55 48 60 34 45 30
Solution:

X         Y         X²           Y²
30        34        900          1156
44        46        1936         2116
66        70        4356         4900
62        38        3844         1444
60        55        3600         3025
34        48        1156         2304
80        60        6400         3600
46        34        2116         1156
20        45        400          2025
38        30        1444         900
ΣX = 480  ΣY = 460  ΣX² = 26152  ΣY² = 22626

Mean of X = X̄ = ΣX / n = 480 / 10 = 48
Mean of Y = Ȳ = ΣY / n = 460 / 10 = 46
Variance of X = σ₁² = ΣX² / n − X̄² = 26152 / 10 − 48² = 311.2
Variance of Y = σ₂² = ΣY² / n − Ȳ² = 22626 / 10 − 46² = 146.6
Coefficient of variation of X = (σ₁ / X̄) × 100 = (17.64 / 48) × 100 = 36.75
Coefficient of variation of Y = (σ₂ / Ȳ) × 100 = (12.11 / 46) × 100 = 26.32
Since the mean of X is greater than the mean of Y, X is the better score getter. Since the
coefficient of variation of Y is less than that of X, Y is the more consistent player.
5. An analysis of monthly wages paid to the workers of two companies A and B belonging to
the same industry gives the following results:

                       Company A   Company B
No. of workers         500         600
Average monthly wage   Rs. 186     Rs. 175
Variance of wages      81          100

Solution:
No. of workers in company A = n₁ = 500 ; no. of workers in company B = n₂ = 600
Mean of X = X̄ = 186, so ΣX = n₁X̄ = 500(186) = 93000
Variance of X = σ₁² = 81 ; Variance of Y = σ₂² = 100
Standard Deviation of X = σ₁ = √81 = 9
Standard Deviation of Y = σ₂ = √100 = 10
Coefficient of variation of X = (σ₁ / X̄) × 100 = (9 / 186) × 100 = 4.84
Coefficient of variation of Y = (σ₂ / Ȳ) × 100 = (10 / 175) × 100 = 5.71
Combined mean = M = (n₁X̄ + n₂Ȳ) / (n₁ + n₂) = (500(186) + 600(175)) / 1100 = 180
Combined S.D.:
σ² = (1 / (n₁ + n₂))(n₁σ₁² + n₂σ₂² + n₁D₁² + n₂D₂²)
σ² = (1 / 1100)(500(81) + 600(100) + 500(186 − 180)² + 600(175 − 180)²)
σ² = 121.3636
∴ σ = √121.3636 = 11.0165
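The combined-sample formula used above can be packaged as a small helper; the figures passed in are those of Problem 5:

```python
import math

def combined_sd(n1, m1, s1, n2, m2, s2):
    """SD of two pooled samples from their sizes, means and SDs."""
    M = (n1 * m1 + n2 * m2) / (n1 + n2)  # combined mean
    d1, d2 = m1 - M, m2 - M
    var = (n1 * s1**2 + n2 * s2**2 + n1 * d1**2 + n2 * d2**2) / (n1 + n2)
    return math.sqrt(var)

sigma = combined_sd(500, 186, 9, 600, 175, 10)
```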
Correlation
Co-variation between two variables is known as correlation. If two variables x and y are
related in such a way that an increase or decrease in one of them corresponds to an increase
or decrease in the other, we say that the variables are positively correlated. If an increase or
decrease in one corresponds to a decrease or increase in the other, the variables are said to be
negatively correlated.
The coefficient of correlation between X and Y is defined by r = Cov(X, Y) / (σx σy), where σx
and σy are the standard deviations of X and Y.
Covariance:
Let the corresponding values of two variables X and Y be given as ordered pairs. Then the
covariance between X and Y is denoted by Cov(X, Y) and is defined as
Cov(X, Y) = Σ(x − x̄)(y − ȳ) / n
Properties
The coefficient of correlation numerically does not exceed unity
If X and Y are independent then 𝐶𝑜𝑣(𝑋, 𝑌) = 0
Note
If r = ±1, we say that x and y are perfectly correlated, and if r = 0, we say that x and y are
uncorrelated.
PROBLEMS:
Q. Compute the coefficient of correlation and Covariance
x 1 2 3 4 5 6 7
y 9 8 10 12 11 13 14
Solution:
(i) We have r = (σx² + σy² − σz²) / (2σxσy), where z = x − y.

x        y        z = x − y  x²        y²        z²
1        9        -8         1         81        64
2        8        -6         4         64        36
3        10       -7         9         100       49
4        12       -8         16        144       64
5        11       -6         25        121       36
6        13       -7         36        169       49
7        14       -7         49        196       49
Σx = 28  Σy = 77  Σz = -49   Σx² = 140 Σy² = 875 Σz² = 347

x̄ = Σx / n = 28 / 7 = 4 ;  σx² = Σx² / n − x̄² = 140 / 7 − 4² = 4
ȳ = Σy / n = 77 / 7 = 11 ;  σy² = Σy² / n − ȳ² = 875 / 7 − 11² = 4
z̄ = Σz / n = −49 / 7 = −7 ;  σz² = Σz² / n − z̄² = 347 / 7 − (−7)² = 0.57
Therefore r = (4 + 4 − 0.57) / (2(2)(2)) = 0.93
(ii) Cov(X, Y) = r σx σy = (0.93)(2)(2) = 3.72
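The same answer comes out of the direct covariance route, Cov(X, Y) = Σ(x − x̄)(y − ȳ)/n followed by r = Cov(X, Y)/(σxσy):

```python
import statistics

x = [1, 2, 3, 4, 5, 6, 7]
y = [9, 8, 10, 12, 11, 13, 14]

mx, my = statistics.fmean(x), statistics.fmean(y)

# Population covariance: average product of deviations from the means
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
r = cov / (statistics.pstdev(x) * statistics.pstdev(y))
```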
Q. Compute the coefficient of correlation for the following data:
x: 1, 3, 4, 2, 5, 8, 9, 10, 13, 15
y: 8, 6, 10, 8, 12, 16, 16, 10, 32, 32
Solution:
Here x̄ = Σx / n = 70 / 10 = 7 and ȳ = Σy / n = 150 / 10 = 15.

x        y         X = x − x̄  Y = y − ȳ  X²        Y²        XY
1        8         -6          -7          36        49        42
3        6         -4          -9          16        81        36
4        10        -3          -5          9         25        15
2        8         -5          -7          25        49        35
5        12        -2          -3          4         9         6
8        16        1           1           1         1         1
9        16        2           1           4         1         2
10       10        3           -5          9         25        -15
13       32        6           17          36        289       102
15       32        8           17          64        289       136
Σx = 70  Σy = 150                          ΣX² = 204 ΣY² = 818 ΣXY = 360

r = ΣXY / √(ΣX² ΣY²) = 360 / √(204 × 818) = 360 / 408.5 = 0.8813
2. The following is the frequency distribution of a random sample of weekly earnings of the
employees. Calculate the average weekly earnings.
Weekly earnings 10 12 14 16 18 20 22 24 26 28 30 32
No. of employees 3 6 10 15 24 42 75 90 79 55 36 26
Class 0 – 10 10 – 20 20 – 30 30 – 40 40 - 50
Frequency 7 8 20 10 5
Class 0–8 8 – 16 16 – 24 24 – 32 32 - 40 40 - 48
Frequency 8 7 16 24 15 7
5. The total sales (in thousands) of a particular item in a shop on 10 consecutive days were
reported by a clerk as 35, 29.6, 38, 30, 40, 41, 42, 45, 3.6 and 3.8. Calculate the average.
Later it was found that a 10 was stuck in the adding machine, so the reports for the 4th to 8th
days were 10 more than the true values, and on the last 2 days the clerk put the decimal point
in the wrong place (for example, 3.6 was really 36). Calculate the true mean value.
6. For the two frequency distributions given below, the mean calculated from the first was 25.4
and that from the second was 32.5. Find the values of x and y.
Class 10 – 20 20 – 30 30 – 40 40 - 50 50 – 60
Frequency - 1 20 15 10 𝑥 𝑦
Frequency - 2 4 8 4 2𝑥 𝑦
Number 1 2 3 4 5 6 7 8 9
Frequency 8 10 11 16 20 25 15 9 6
Number 5 10 15 20 25 30 35 40 45
Frequency 29 224 465 582 634 644 650 653 655
Class 20 – 30 30 – 40 40 - 50 50 – 60 60 - 70
Frequency 3 5 20 10 5
10. A number of particular articles have been classified according to their weight. After drying
for two weeks the same articles have again been weighed and similarly classified. It is known
that the median weight in the first weighing was 20.8302 while in the second weighing it was
17.3502. Some frequencies a and b in the first weighing and x and y in the second are
missing. It is given that a = x/3 and b = y/2. Find out the missing frequencies.
Class 0 – 5 5 – 10 10 – 15 15 - 20 20 – 25 25 - 30
Frequency - 1 a b 11 52 75 22
Frequency - 2 x y 40 50 30 28
11. In a factory employing 3000 persons, 5% earn less than Rs. 3 per hour, 580 earn from Rs.
3.01 to Rs. 4.50 per hour, 30% earn from Rs. 4.51 to Rs. 6 per hour, 500 earn from Rs. 6.01 to
Rs. 7.50 per hour, 20% earn from Rs. 7.51 to Rs. 9 per hour, and the rest earn Rs. 9.01 or more
per hour. What is the median wage?
12. According to the census of 2021, the following are the population figures in thousands of 20
cities: 2000, 1180, 1785, 1500, 560, 782, 1200, 385, 1123, 222, 2001, 1178, 1780, 1550, 559,
780, 1250, 390, 1120, 225. Find the median.
13. Find the mode of the following data
Number 1 2 3 4 5 6 7 8
Frequency 4 9 16 25 22 15 7 3
14. The median and mode are given to be Rs. 25 and Rs. 24 respectively. Calculate the missing
frequencies.
Class 0 – 10 10 – 20 20 – 30 30 – 40 40 - 50
Frequency 14 𝑥 27 𝑦 15
Class 0 – 10 10 – 20 20 – 30 30 – 40 40 - 50 50 - 60 60 - 70
Frequency 5 8 7 12 28 20 10
16. The median and mode of the following wages are known to be Rs. 33.5 and Rs. 34
respectively. Find the values of x, y and z, given that the total frequency is 230.
Class 0 – 10 10 – 20 20 – 30 30 – 40 40 - 50 50 - 60 60 - 70
Frequency 4 16 𝑥 𝑦 𝑧 6 4
17. Calculate the mode from the following frequency distribution by the method of grouping
Number 4 5 6 7 8 9 10 11 12 13
Frequency 2 5 8 9 12 14 14 15 11 13
18. Calculate the standard deviation from the following frequency distribution
Number 6 7 8 9 10 11 12
Frequency 3 6 9 13 8 5 4
19. For a group of 200 candidates, the mean and standard deviation of scores were found to be
40 and 15 respectively. Later on it was discovered that the scores 43 and 35 were misread as
34 and 53 respectively. Find the corrected standard deviation corresponding to the corrected
figures.
21. The first group of two samples has 100 items with mean 15 and standard deviation 3. The
whole group has 250 items with mean 15.6 and standard deviation √13.44. Find the
standard deviation of the second group.
22. The number examined, the mean weight and the standard deviation in each group of an
examination by three medical examiners are given below. Find the mean weight and
standard deviation of the entire data when grouped together.
23. Find the correlation co-efficient between 𝑥 and 𝑦 from the given data:
X 21 23 30 54 57 58 72 78 87 90
Y 60 71 72 83 110 84 100 92 113 135
24. Find the correlation co-efficient between 𝑥 and 𝑦 from the given data:
𝑥 78 89 97 69 59 79 68 57
𝑦 125 137 156 112 107 138 123 108
25. Calculate the covariance of the following pairs of observation of two variables:
(10,35), (15,20), (20,30), (25,30), (30,35), (35,38), (40,42), (45,30), (50,40)
26. Find the Covariance by using co-efficient of correlation between industrial production and
export using the following data and comment on the result.
𝑥 98 87 90 85 95 75
𝑦 15 12 10 10 16 7
28. Calculate the Covariance by using correlation co-efficient for the following heights in inches
of fathers (x) and their sons (y).
x 65 66 67 67 68 69 70 72
y 67 68 65 68 72 72 69 71
*********