r File Finall

Uploaded by

aditi modi

Affiliated to Dr. A. P. J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh

PRACTICAL FILE

PROGRAM: -MBA (BUSINESS ANALYTICS)

SEMESTER-1A

ACADEMIC YEAR: - 2023-2024

SUBJECT: - BASICS OF DATA MANAGEMENT WITH “R”

SUBJECT CODE: - KMBA152

SCHOLAR NUMBER: - 2301008

SUBMITTED BY: - ADITI MODI

SUBMITTED TO: - MS. NEETU SINGH (Assistant Professor)

INDEX

Intro. How to Install R and RStudio

1. Learn the Basic Syntax of R
1.1 R Script for Arithmetic Operators
1.2 R Script for Logical Operators
1.3 R Script for Relational Operators
1.4 R Script for Assignment Operators
1.5 R Script for Miscellaneous Operators
1.6 R Script for Conditional Statements
1.7 R Scripts for Looping
1.8 R Scripts for User-Defined Functions
1.9 R Scripts for Data Frames

2. Learn how to organize and modify data in R using data frames and dplyr: R Script for Data Manipulation with the help of the dplyr package
2.1 Filter Function
2.2 Distinct Function
2.3 Arrange Function
2.4 Select Function
2.5 Rename Function
2.6 Mutate Function
2.7 Transmute Function
2.8 Summarize Function

3. Learn how to prepare data for analysis in R using dplyr and tidyr: R Script for Data Manipulation with the help of the tidyr package
3.1 Gather Function
3.2 Separate Function
3.3 Unite Function
3.4 Spread Function

4. Learn the basics of how to create visualizations using the popular R package ggplot2
4.1 R Script for Summary of Data Set
4.2 R Script for Data Layers
4.3 R Script for Aesthetic Layer
4.4 R Script for Geometric Layer
4.5 R Script for Adding Size, Colour and Shape
4.6 R Script for Histogram Plot
4.7 R Script for Facet Layer
4.8 R Script for Statistics Layer
4.9 R Script for Coordinates Layer
4.10 R Script for coord_cartesian()
4.11 R Script for Theme Layer

5. Learn the basics of aggregate functions in R with dplyr, which let us calculate quantities that describe groups of data
5.1 R Script to create a data frame with 4 columns, group the rows, and get aggregates such as the minimum, sum, and maximum
5.2 R Script to create a data frame with 4 columns, group the rows, and get the average (mean)

6. Learn the basics of joining tables together in R with dplyr
6.1 R Script for Inner Join
6.2 R Script for Left Join
6.3 R Script for Right Join
6.4 R Script for Full Join
6.5 R Script for Semi Join
6.6 R Script for Anti Join

7. Learn to use R or manually calculate the mean, median, and mode of real-world datasets
7.1 R Script for importing data using read.csv and finding the mean, median and mode

8. Learn how to quantify the spread of the dataset by calculating the variance and standard deviation in R

9. Learn how to calculate three important descriptive statistics (Quartiles, Quantiles, and Interquartile range) that describe the spread of the data

10. Learn about the statistics used to run hypothesis tests and use R to run different t-tests that compare distributions

HOW TO INSTALL R AND RSTUDIO

Steps for Downloading R.

Step – 1: Go to the CRAN R website.

Step – 2: Click on the "Download R for Windows" link.

Step – 3: Click on the "base" subdirectory link or the "install R for the first time" link.

Step – 4: Click "Download R X.X.X for Windows" (X.X.X stands for the latest version of R, e.g. 4.3.2) and save the executable .exe file.

Step – 5: Run the .exe file and follow the installation instructions.

Steps for Downloading RStudio.

Step – 1: With base R installed, move on to installing RStudio. Go to the RStudio download page and click on the download button for RStudio Desktop.

Step – 2: Click on the link for the Windows version of RStudio and save the .exe file.

Step – 3: Run the .exe file and follow the installation instructions.

R and RStudio are now installed on your computer.

R SCRIPT
Module 1- Basics of R syntax:

Program 1

1.1 Arithmetic operators: These operators perform mathematical operations such as addition, subtraction, multiplication and division. There are 6 types:
a. Addition operator (+): The values at the corresponding positions of both operands are added.
Code:
a = c (1,4.9)
b = c (6, 4)
print (a+b)
Output:

b. Subtraction operator: The second operand values are subtracted from the first.
Code:
a = c (1,0.1)
b = c (2.33, 4)
print (a-b)
Output:

c. Multiplication operator (*): The corresponding elements of the two operands are multiplied.
Code:
a = c (1,8)
b = c (2, 4)
print (a*b)
Output:

d. Division operator (/): The first operand is divided by the second operand.
Code:
a = c (1,0.1)
b = c (2.33, 4)
print (a/b)
Output:

e. Power operator (^): The first operand is raised to the power of the second
operand.
Code:
a = c (1,2)
b = c (2, 4)
print (a^b)
Output:

f. Modulo operator (%%): The remainder of the first operand divided by the
second operand is returned.
Code:
a = c (1,6)
b = c (8 , 4)
print (a%%b)
Output:
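
As a supplementary sketch of the six arithmetic operators described above, the block below applies each of them to one pair of vectors. The vectors x and y are illustrative values chosen here, not taken from the original programs. It also shows that a single number is recycled across a longer vector.

```r
# Illustrative vectors (assumed for this sketch, not from the original file)
x <- c(10, 7)
y <- c(3, 2)

sums       <- x + y    # 13 9
diffs      <- x - y    # 7 5
products   <- x * y    # 30 14
quotients  <- x / y    # 3.333333 3.5
powers     <- x ^ y    # 1000 49
remainders <- x %% y   # 1 1

# A length-one operand is recycled across the longer vector
doubled <- x * 2       # 20 14

print(sums)
print(remainders)
print(doubled)
```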

1.2 Logical operators: Logical operators combine or negate logical values. Each operand is treated as TRUE or FALSE (zero counts as FALSE and any other number as TRUE), and the result is a TRUE or FALSE value. There are 5 types:
a. Element-wise logical AND operator (&): Returns TRUE where both corresponding elements of the operands are TRUE.
Code:
a = c (7,5)
b = c (6, 7)
print (a&b)
Output:

b. Element-wise logical OR operator (|): Returns TRUE where at least one of the corresponding elements of the operands is TRUE.
Code:
a = c (7,10)
b = c (14, 5)
print (a|b)
Output:

c. NOT Operator(!): A unary operator that negates the status of the elements of
the operand.
Code:
a=7
print (!a)
Output:

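d. Logical AND operator (&&): Compares two single (length-one) values and returns TRUE if both are TRUE. Unlike &, it is not element-wise; since R 4.3.0, passing a vector longer than one element is an error.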
Code:
a=2
b=5
print (a&&b)
Output:

e. Logical OR operator (||): Compares two single (length-one) values and returns TRUE if either of them is TRUE. Like &&, it is not element-wise.
Code:
a=7
b=7
print (a||b)
Output:
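
To make the difference between the element-wise operators (&, |) and the single-value operators (&&, ||) concrete, here is a small sketch. The vectors p and q are illustrative assumptions, not part of the original programs.

```r
# Illustrative logical vectors (assumed for this sketch)
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)

elementwise_and <- p & q   # TRUE FALSE FALSE
elementwise_or  <- p | q   # TRUE TRUE TRUE
negated         <- !p      # FALSE TRUE FALSE

# && and || expect single values, so a vector is first reduced with all() or any()
both_all   <- all(p) && all(q)   # FALSE, neither vector is all TRUE
either_any <- any(p) || any(q)   # TRUE

# Numbers are coerced to logical: 0 is FALSE, any other number is TRUE
zero_and_five <- 0 & 5           # FALSE

print(elementwise_and)
print(both_all)
```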

1.3 Relational operators: The relational operators carry out comparison operations
between the corresponding elements of the operands. They are of 5 types:
a. Less than (<): Returns TRUE if the corresponding element of the first operand is
less than that of the second operand.
Code:
#less than
a <- c(5,3)
b <- c(4,7)
# Performing operations on Operands
print(a<b)
Output:

b. Less than or equal to (<=): Returns TRUE if the corresponding element of the first operand is less than or equal to that of the second operand.
Code:
#less than equal to
a <- c(5,3)
b <- c(4,3)
# Performing operations on Operands
print(a<=b)
Output:

c. Greater than (>): Returns TRUE if the corresponding element of the first operand
is greater than that of the second operand.
Code:
#greater than
a <- c(5,3)
b <- c(4,7)
# Performing operations on Operands
print(a>b)
Output:

d. Greater than or equal to (>=): Returns TRUE if the corresponding element of the first operand is greater than or equal to that of the second operand.
Code:
#greater than or equal to
a <- c(5,3)
b <- c(4,7)
# Performing operations on Operands
print(a>=b)
Output:

e. Not equal to (!=): Returns TRUE if the corresponding element of the first operand
is not equal to the second operand.
Code:
#not equal to
a <- c(5,7)
b <- c(4,7)
# Performing operations on Operands
print(a!=b)
Output:

1.4 Assignment operators: Assignment operators are used to assign values to data objects in R. The objects may be numbers, vectors, or functions. There are 2 types:
a. Left assignment (<-, <<- or =): Assigns the value on the right to the name on the left.
Code:
#left assignment
a <- c(7:15)
# Performing operations on Operands
print(a)
Output:

b. Right assignment (-> or ->>): Assigns the value on the left to the name on the right.
Code:
#right assignment
c(12:25) -> b
# Performing operations on Operands
print(b)
Output:

1.5 Miscellaneous operators: These operators do not belong to the categories above; they test membership, generate sequences and multiply matrices. There are 3 types:
a. %in% operator: Checks whether an element is present in a vector and returns TRUE if it is.
Code:
#%in% operator
a = 0.2
list1 = c(TRUE, 0.1, "apple")
print (a %in% list1)
Output:

b. Colon operator (:): Generates the sequence of integers from the value before the colon to the value after it.
Code:
#colon operator
a = (1:25)
print(a)
Output:

c. %*% operator: Performs matrix multiplication. In the example below, a matrix is multiplied by its transpose.

Code:
#%*% operator
m = matrix(c(10,9,3,9,5,6,4,6,7), nrow=3, ncol=3)
print(m)
print(t(m))
p = m %*% t(m)
print(p)

Output:
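
A short sketch of how %*% differs from the element-wise * operator may help here. The 2 x 2 matrix is an illustrative assumption, chosen so that the results can be checked by hand.

```r
# Illustrative 2 x 2 matrix, filled column by column: columns are (1, 2) and (3, 4)
m <- matrix(c(1, 2, 3, 4), nrow = 2)

# Element-wise product: each entry is multiplied by itself
elementwise <- m * m      # entries 1, 4, 9, 16

# Matrix product: rows of the first matrix times columns of the second
matrix_prod <- m %*% m    # first row 7 15, second row 10 22

# A matrix multiplied by its transpose is always symmetric
gram <- m %*% t(m)        # first row 10 14, second row 14 20

print(elementwise)
print(matrix_prod)
print(isSymmetric(gram))
```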

1.6 Decision making in R programming (Conditional statements): The decision-making statements in R are as follows:
a. If statement: The keyword if tells the interpreter that this is a decision control instruction; the condition following if is always enclosed in a pair of parentheses. If the condition is TRUE, the statements in the block are executed; if it is FALSE, they are skipped.
Code:
#IF statement
a = 58
b = 70
#FALSE condition: 58 > 70 does not hold, so this block is skipped
if (a>b)
{
c = a-b
print("condition a>b is TRUE")
print(paste("Difference between a,b is:", c))
}
#TRUE condition: 58 < 70 holds, so this block is executed
if (a<b)
{
c = a-b
print("condition a<b is TRUE")
print(paste("Difference between a,b is:", c))
}
Output:

b. If-else statement: Provides an optional else block that is executed when the condition of the if block is FALSE. If the condition is TRUE, the statements within the if block are executed; otherwise the statements within the else block are executed.
Code:
#Ifelse statement
a = 58
b = 70
if (a>b)
{
c = a-b
print("condition a>b is TRUE")
print(paste("Difference between a,b is:", c))
} else
{
c = a-b
print("condition a<b is TRUE")
print(paste("Difference between a,b is:", c))
}
Output:

c. Nested if else statement: When an if-else block appears as a statement inside an if block, or inside an else block, it is called a nested if-else statement. The example below nests calls to the vectorised ifelse() function.
Code:
#Nested IF Statement
ifelse(test = 15>16,
yes = ifelse(test = 15>14,
yes = 'TRUE TWICE',
no = "YES, & NO"),
no = "No")
Output:
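
The nested ifelse() call above works on whole vectors, while if / else if / else handles one value at a time. The sketch below contrasts the two; the function name grade and the mark thresholds are hypothetical, chosen only for illustration.

```r
# Hypothetical grading rule (thresholds are assumptions for this sketch)
grade <- function(marks) {
  if (marks >= 75) {
    "Distinction"
  } else if (marks >= 40) {
    "Pass"
  } else {
    "Fail"
  }
}

# if / else if / else handles a single value
single <- grade(58)   # "Pass"

# nested ifelse() applies the same rule to every element of a vector
marks <- c(82, 58, 31)
vectorised <- ifelse(marks >= 75, "Distinction",
                     ifelse(marks >= 40, "Pass", "Fail"))

print(single)
print(vectorised)
```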

1.7 R scripts for looping: There are 3 types of loops in R programming:

a. For loop: Repeats a statement or group of statements a fixed number of times.
Code:
#For loop
for(value in seq (10,20,2))
{
print(value)
}
Output:

b. While loop: Tests a condition and repeats a statement or group of statements for as long as the condition is TRUE.
Code:
#while loop
a=2
while(a<=4)
{
print(a)
a=a+1
}
Output:

c. Repeat loop: Executes a sequence of statements repeatedly until a break statement is reached.
Code:
#repeat loop
a=3
repeat
{
print(a)
a=a+1
#checking stop condition
if(a>7)
{
#break statement to terminate loop
break
}
}

Output:
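
As a rough comparison of the three loop types, the sketch below computes the same total (1 + 2 + 3 + 4 + 5) with each of them. The variable names are assumptions made for this illustration.

```r
# for loop: the number of iterations is known in advance
total_for <- 0
for (i in 1:5) {
  total_for <- total_for + i
}

# while loop: the condition is tested before every iteration
total_while <- 0
i <- 1
while (i <= 5) {
  total_while <- total_while + i
  i <- i + 1
}

# repeat loop: runs until break is reached, so the stop test is written by hand
total_repeat <- 0
i <- 1
repeat {
  total_repeat <- total_repeat + i
  i <- i + 1
  if (i > 5) {
    break
  }
}

print(c(total_for, total_while, total_repeat))   # 15 15 15
```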

1.8 User-defined functions: In R we can create our own functions; such functions are known as user-defined functions.

Code:

#CODE 1
# A simple R function to check
# whether x is a multiple of 2
multipleof2 = function(x){
  if(x %% 2 == 0)
    return("It is a multiple of 2")
  else
    return("It is not a multiple of 2")
}

print(multipleof2(4))
print(multipleof2(3))

Output:

Code:

#CODE 2
# A simple R program to demonstrate
# passing arguments to a function
Rectangle = function(length=4, width=7){
  area = length * width
  return(area)
}

# Case 1: arguments matched by position
print(Rectangle(12, 13))

# Case 2: arguments matched by name
print(Rectangle(width = 80, length = 45))

# Case 3: default arguments
print(Rectangle())

Output:

1.9 R scripts for data frames: Data frames are general-purpose R objects used to store tabular data. A data frame resembles a matrix, except that each column may hold a different data type. A data frame has three principal components: the data, the rows, and the columns.
a. Creating a data frame:
Code:

#creating a data frame
df1 <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)
print(df1)
Output:

b. Getting structure of R data frame:

Code:

#using str()
df1 <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)
# str() prints the structure itself, so it is not wrapped in print()
str(df1)

Output:

c. Summary of data in data frame:

Code:

#getting the summary
df1 <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)
summary(df1)

Output:

d. Extract data from data frame:

Code:

#extract data
df1 <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)
print(df1$Pulse)

Output:

e. Expand data frame:

Code 1(Adding Rows):

#expanding the dataframe
#adding rows
df1 <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)
# the new row is given as a one-row data frame so Pulse and Duration stay numeric
New_row_DF = rbind(df1, data.frame(Training = "Training", Pulse = 110, Duration = 110))
print(New_row_DF)

Output:

Code 2(Adding Columns):

#expanding the dataframe
#adding columns
df1 <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)
New_col_DF = cbind(df1, Steps = c(3000, 6000, 4000))
print(New_col_DF)

Output:

f. Getting Dimensions of the dataframe:

Code:

df1 <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)
print(dim(df1))

Output:

g. Count of Rows and Columns in the dataframe:

Code:

#count of rows and columns in dataframe
df1 <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)
print(nrow(df1))
print(ncol(df1))

Output:
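
Beyond extracting a single column with $, data frames are often subset by rows and columns at once. The following base R sketch reuses the df1 data from this section; the derived names (high_pulse, two_cols, Intensity) are assumptions introduced for illustration.

```r
df1 <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)

# Rows where Pulse is above 100 (all columns kept)
high_pulse <- df1[df1$Pulse > 100, ]

# All rows, but only two named columns
two_cols <- df1[, c("Training", "Duration")]

# A new column computed from existing ones
df1$Intensity <- df1$Pulse / df1$Duration

print(high_pulse)
print(two_cols)
print(df1)
```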

Module 2- Learn how to organize and modify data in R using data frames and dplyr
Program 2: The data manipulation functions provided by dplyr are:

2.1 Filter: Returns the subset of rows of a data frame that satisfy a condition.


Code:

#Filter

library(dplyr)

d=data.frame(name=c("Abhinav", "Bharay",

"Cameron", "Devon"),

age=c(17, 15, 19, 16),

ht=c(46, NA, NA, 69),

school=c("yes", "yes", "no", "no"))

print(d)

d%>%filter(is.na(ht))

d%>%filter(!is.na(ht))

Output:

2.2 Distinct: Removes duplicate rows in a data frame.

Code:

library(dplyr)

d=data.frame(name=c("Abhinav", "Bharay",

"Cameron", "Devon"),

age=c(17, 15, 19, 16),

ht=c(46, NA, NA, 69),

school=c("yes", "yes", "no", "no"))

print(distinct(d))

Output:

2.3 Arrange: Reorders the rows of a data frame by the values of one or more columns.

Code:
library(dplyr)

d=data.frame(name=c("Abhinav", "Bharay",

"Cameron", "Devon"),

age=c(17, 15, 19, 16),

ht=c(46, NA, NA, 69),

school=c("yes", "yes", "no", "no"))

d.name<- arrange(d, school)

print(d.name)

Output:

2.4 Select: select() extracts the required columns as a table; the columns are specified by name, by position, or with helper functions such as starts_with().

Code:

#Select

library(dplyr)

d=data.frame(name=c("Abhinav", "Bharay",

"Cameron", "Devon"),

age=c(17, 15, 19, 16),

ht=c(46, NA, NA, 69),

school=c("yes", "yes", "no", "no"))

select(d, starts_with("ht"))

select(d, -starts_with("ht"))

select(d, 1:2)

select(d, contains("n"))

select(d, matches("na"))

Output:

2.5 Rename: Renames the columns (variables) of a data frame.

Code:

#Rename

library(dplyr)

d=data.frame(name=c("Abhinav", "Bharay",

"Cameron", "Devon"),

age=c(17, 15, 19, 16),

ht=c(46, NA, NA, 69),

school=c("yes", "yes", "no", "no"))

rename(d, height=ht)

Output:

2.6-7 Mutate and Transmute: mutate() creates new variables while keeping the existing ones; transmute() creates new variables and drops the existing ones.

Code:

#Mutate & Transmute
library(dplyr)
d=data.frame(name=c("Abhinav", "Bharay", "Cameron", "Devon"),
             age=c(17, 15, 19, 16),
             ht=c(46, NA, NA, 69),
             school=c("yes", "yes", "no", "no"))

mutate(d, x3=ht-age)
transmute(d, x3=ht+age)

Output:

2.7 Summarize: Reduces the data to summary values such as the mean, minimum, maximum or standard deviation.

Code:

#Summarize

library(dplyr)

d=data.frame(name=c("Abhinav", "Bharay",

"Cameron", "Devon"),

age=c(17, 15, 19, 16),

ht=c(46, NA, NA, 69),

school=c("yes", "yes", "no", "no"))

# each result column is named after the statistic it holds
summarise(d, mean=mean(age))

summarise(d, min=min(age))

summarise(d, max=max(age))

summarise(d, sd=sd(age))

Output:

2.8 Sample: Returns a random sample of rows from the data frame, either a fixed number of rows or a fraction of them.

Code:

#Getting part of the dataframe

library(dplyr)

d=data.frame(name=c("Abhinav", "Bharay",

"Cameron", "Devon"),

age=c(17, 15, 19, 16),

ht=c(46, NA, NA, 69),

school=c("yes", "yes", "no", "no"))

sample_n(d,3)

sample_frac(d,0.50)

Output:
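
The dplyr functions in this module are usually chained with the pipe operator %>% rather than called one at a time. The sketch below, which assumes the dplyr package is installed, combines several of them on the same data frame d; the column name ht_per_year and the summary name mean_ht are illustrative.

```r
library(dplyr)

d <- data.frame(name = c("Abhinav", "Bharay", "Cameron", "Devon"),
                age = c(17, 15, 19, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Keep rows with a known height, add a derived column,
# sort by age (oldest first) and keep three columns
result <- d %>%
  filter(!is.na(ht)) %>%
  mutate(ht_per_year = ht / age) %>%
  arrange(desc(age)) %>%
  select(name, age, ht_per_year)

# Summaries skip missing values only when na.rm = TRUE is given
height_summary <- d %>%
  summarise(mean_ht = mean(ht, na.rm = TRUE))

print(result)
print(height_summary)
```

Without na.rm = TRUE, mean(ht) would return NA because two heights are missing.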

Module 3- Data Manipulation in R with TIDYR package
Program 3

Step 1: Creation of data:

Code:

#creating the dataframe

library(tidyr)

n = 10

tidydf = data.frame(

S.No = c(1:n),

Group.1 = c(23, 345, 76, 212, 88,

199, 72, 35, 90, 265),

Group.2 = c(117, 89, 66, 334, 90,

101, 178, 233, 45, 200),

Group.3 = c(29, 101, 239, 289, 176,

320, 89, 109, 199, 56))

print(tidydf)

Output:

Data manipulation functions present in TIDYR are:

3.1 Gather: Collapses several columns into key-value pairs, converting wide data to long format.

Code:

#Gather

library(tidyr)

n = 10

tidydf = data.frame(

S.No = c(1:n),

Group.1 = c(23, 345, 76, 212, 88,

199, 72, 35, 90, 265),

Group.2 = c(117, 89, 66, 334, 90,

101, 178, 233, 45, 200),

Group.3 = c(29, 101, 239, 289, 176,

320, 89, 109, 199, 56))

tall=tidydf %>%
  gather(Group, Frequency,
         Group.1:Group.3)

print(tall)

Output:

3.2 Separate: Splits one column into several columns at a separator character.

Code:

#Separate

library(tidyr)

n = 10

tidydf = data.frame(

S.No = c(1:n),

Group.1 = c(23, 345, 76, 212, 88,

199, 72, 35, 90, 265),

Group.2 = c(117, 89, 66, 334, 90,

101, 178, 233, 45, 200),

Group.3 = c(29, 101, 239, 289, 176,

320, 89, 109, 199, 56))

tall=tidydf %>%

gather(Group, Frequency,

Group.1:Group.3)

sep=tall %>%

separate(Group, c("Allotment",

"Number"))

print(sep)

Output:

3.3 Unite: Combines several columns into a single column; it is the inverse of separate.

Code:

library(tidyr)

n = 10

tidydf = data.frame(

S.No = c(1:n),

Group.1 = c(23, 345, 76, 212, 88,


199, 72, 35, 90, 265),

Group.2 = c(117, 89, 66, 334, 90,

101, 178, 233, 45, 200),

Group.3 = c(29, 101, 239, 289, 176,

320, 89, 109, 199, 56))

sep=tall %>%

separate(Group, c("Allotment",

"Number"))

uni=sep %>%

unite(Group, Allotment,

Number, sep = ".")

print(uni)

Output:

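3.4 Spread: Spreads key-value pairs across multiple columns, converting long data back to wide format; it is the inverse of gather.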

Code:

#Spread

library(tidyr)

n = 10

tidydf = data.frame(

S.No = c(1:n),

Group.1 = c(23, 345, 76, 212, 88,

199, 72, 35, 90, 265),

Group.2 = c(117, 89, 66, 334, 90,

101, 178, 233, 45, 200),

Group.3 = c(29, 101, 239, 289, 176,

320, 89, 109, 199, 56))

uni=sep %>%

unite(Group, Allotment,

Number, sep = ".")

sp=uni %>%

spread(Group, Frequency)

print(sp)

Output:
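
In current versions of tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below, assuming tidyr 1.0.0 or later is installed, repeats the wide-to-long-to-wide round trip with a three-row subset of the data above; the object names wide, long and back are illustrative.

```r
library(tidyr)

# First three rows of the Group.1 and Group.2 columns used in this module
wide <- data.frame(S.No = 1:3,
                   Group.1 = c(23, 345, 76),
                   Group.2 = c(117, 89, 66))

# Wide to long: the counterpart of gather()
long <- pivot_longer(wide,
                     cols = c(Group.1, Group.2),
                     names_to = "Group",
                     values_to = "Frequency")

# Long back to wide: the counterpart of spread()
back <- pivot_wider(long,
                    names_from = Group,
                    values_from = Frequency)

print(long)
print(back)
```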

Module 4- Basics of how to create visualisations using the popular R package ggplot2
Program 4

4.1 Summary of dataset:

Code:

#installing and loading packages

install.packages("dplyr")

library(dplyr)

#summary of the dataset

summary(iris)

Output:

4.2 R script for data layers:

Code:

#Rscript for datalayers

library(ggplot2)

library(dplyr)

ggplot(data = iris)

Output:

4.3 R Script for Aesthetic Layer:

Code:

#aesthetic layer

ggplot(data = iris, aes(x = Sepal.Width, y=Petal.Length, col=Sepal.Length))

Output:

4.4 R Script for Geometric Layer:

Code:

#geometric layer

ggplot(data = iris, aes(x = Sepal.Width, y = Petal.Length))+geom_point()

Output:

4.5 R Script for Adding Size, Colour and Shape:

Code:

#Adding color & size

ggplot(data = iris,

aes(x = Sepal.Width, y = Petal.Length, col = Species)) + geom_point(size = 2)

Output:

Code:

# Adding colour and shape

ggplot(data = iris,

aes(x = Sepal.Width, y = Petal.Length, col = factor(Species),

shape = factor(Species))) +

geom_point()

Output:

4.6 R script for Histogram Plot

Code:

# Histogram

ggplot(data = iris, aes(x = Sepal.Length)) +

geom_histogram(binwidth = 0.5, fill = "red", color = "green", alpha = 0.7) +

labs(title = "Histogram of Sepal Length in Iris Dataset",

x = "Sepal Length",

y = "Frequency")

Output:

4.7 R script for Facet Layer

Code:

# Scatter plot with facet grid
# shape is mapped to Species: the default shape palette handles at most
# 6 discrete values, and Sepal.Width has more distinct values than that
ggplot(data = iris,
       aes(x = Sepal.Length, y = Petal.Length, shape = Species)) +
  geom_point() +
  facet_grid(Species ~ .) +
  labs(title = "Scatter Plot of Sepal Length vs. Petal Length",
       x = "Sepal Length",
       y = "Petal Length")

Output:

4.8 R script Statistics layer

Code:

#statistics layer

ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length)) +

geom_point() +

stat_smooth(method = lm, col = "yellow")

Output:

4.9 R script for Coordinates layer

Code:

#coordinates layer
# axis titles follow the mapped variables: x is Sepal.Length, y is Petal.Length
ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point() +
  stat_smooth(method = lm, col = "green") +
  scale_y_continuous("petal", limits = c(2, 10),
                     expand = c(0, 0)) +
  scale_x_continuous("sepal", limits = c(0, 10),
                     expand = c(0, 0)) + coord_equal()

Output:

4.10 R script for coord_cartesian()

Code:

#coord_cartesian
# a fixed colour is set outside aes(); inside aes() "pink" would be treated as data
ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(colour = "pink") + geom_smooth(colour = "pink") +
  coord_cartesian(xlim = c(3, 6))

Output:

4.11 R script for Theme layer

Code:

#theme layer

ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length)) +

geom_point() + facet_grid(. ~ Species) +

theme(plot.background = element_rect(

fill = "pink", colour = "purple"))

Output:

Module 5- Learn the basics of aggregate functions in R with dplyr, which let us
calculate quantities that describe groups of data.
Program 5

a. Display data:

Code:

library(dplyr)

employee_data = data.frame(
  emp_id = c(101,201,301,401,501,601,701),
  name = c("Chhavi", "Vilay", "Yuvraj", "Udit", "Yukta", "Nivi", "Kshama"),
  department = c("Finance","HR","Marketing","HR","Sales","Marketing","HR"),
  salary = c(34000,23000,41000,20000,35000,67000,87000))

print("Original Data frame")

print(employee_data)

Output:

5.1 R script to create a data frame with 4 columns, group it by department, and get aggregates such as the minimum, sum, and maximum.

Code:

#R script with aggregates
library(dplyr)

employee_data = data.frame(
  emp_id = c(101,201,301,401,501,601,701),
  name = c("Chhavi", "Gaurav", "geet", "deepika", "aastha", "Nivi", "Kshama"),
  department = c("Finance","HR","Marketing","HR","Sales","Marketing","HR"),
  salary = c(34000,23000,41000,20000,35000,67000,87000))

# total, highest and lowest salary per department
print(aggregate(employee_data$salary, list(employee_data$department), FUN = sum))

print(aggregate(employee_data$salary, list(employee_data$department), FUN = max))

print(aggregate(employee_data$salary, list(employee_data$department), FUN = min))

Output:

5.2 R Script to create a data frame with 4 columns, group it by department, and get the average (mean).

Code:

#R script with aggregates
library(dplyr)

employee_data = data.frame(
  emp_id = c(101,201,301,401,501,601,701),
  name = c("Chhavi", "Vikas", "bharat", "kiran", "srishti", "garima", "ritit"),
  department = c("Finance","HR","Marketing","HR","Sales","Marketing","HR"),
  salary = c(34000,23000,41000,20000,35000,67000,87000))

# mean salary per department
print(aggregate(employee_data$salary, list(employee_data$department), FUN = mean))

Output:
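
The scripts above load dplyr but compute the aggregates with the base R function aggregate(). As a hedged sketch of the dplyr way, assuming the dplyr package is installed, the same departmental figures can be obtained with group_by() and summarise(); the result column names (total, highest, lowest, average) are illustrative.

```r
library(dplyr)

employee_data <- data.frame(
  department = c("Finance", "HR", "Marketing", "HR", "Sales", "Marketing", "HR"),
  salary = c(34000, 23000, 41000, 20000, 35000, 67000, 87000))

# One row per department with four aggregates computed together
by_dept <- employee_data %>%
  group_by(department) %>%
  summarise(total = sum(salary),
            highest = max(salary),
            lowest = min(salary),
            average = mean(salary),
            .groups = "drop")

print(by_dept)
```

Computing all aggregates in one summarise() call keeps them side by side in a single table instead of three separate outputs.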

Module 6- Basics of joining tables together in R with DPLYR
Program 6: dplyr provides several functions for joining data frames in R.

6.1 Using Inner join:

Code:

# load dplyr, which provides the join functions used in this module
library(dplyr)

# create dataframe with 5 integers

a = data.frame(ID=c(10,11,12,13,14))

# create dataframe with 5 integers

b = data.frame(ID=c(13,14,15,16,17))

# perform inner join

inner_join(a, b, by="ID")

Output:

6.2 Using left join:

Code:

# create dataframe with 5 integers

a = data.frame(ID=c(10,11,12,13,14))

# create dataframe with 5 integers

b = data.frame(ID=c(13,14,15,16,17))

#performing left join

left_join(a,b, by = "ID")

Output:

6.3 Using Right join:

Code:

# create dataframe with 5 integers

a = data.frame(ID=c(10,11,12,13,14))

# create dataframe with 5 integers

b = data.frame(ID=c(13,14,15,16,17))

# perform right join

right_join(a,b, by = "ID")

Output:

6.4 Using Full join:

Code:

# create dataframe with 5 integers

a = data.frame(ID=c(10,11,12,13,14))

# create dataframe with 5 integers

b = data.frame(ID=c(13,14,15,16,17))

# perform full join

full_join(a,b, by = "ID")

Output:

6.5 Using Semi join:

Code:

# create dataframe with 5 integers

a = data.frame(ID=c(10,11,12,13,14))

# create dataframe with 5 integers

b = data.frame(ID=c(13,14,15,16,17))

#perform semi join

semi_join(a,b, by = "ID")

Output:

6.6 Using Anti join:

Code:

# create dataframe with 5 integers

a = data.frame(ID=c(10,11,12,13,14))

# create dataframe with 5 integers

b = data.frame(ID=c(13,14,15,16,17))

#perform anti join, stands for difference

anti_join(b,a, by = "ID")

Output:
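
The data frames a and b above contain only the key column, so the effect of each join on the other columns is not visible. The sketch below, assuming dplyr is installed, uses two small hypothetical tables (students and marks, invented for this illustration) to show where NA values appear.

```r
library(dplyr)

# Hypothetical tables sharing the key column ID
students <- data.frame(ID = c(10, 11, 12), name = c("A", "B", "C"))
marks    <- data.frame(ID = c(11, 12, 13), marks = c(70, 85, 90))

inner <- inner_join(students, marks, by = "ID")  # IDs present in both: 11, 12
left  <- left_join(students, marks, by = "ID")   # all students; marks is NA for ID 10
full  <- full_join(students, marks, by = "ID")   # all IDs: 10, 11, 12, 13
semi  <- semi_join(students, marks, by = "ID")   # students with a match, student columns only
anti  <- anti_join(students, marks, by = "ID")   # students without a match: ID 10

print(left)
print(full)
print(anti)
```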

Module 7- Calculating mean, median and mode of real world data set using R.

Program 7

1. Selecting data- The data is collected from Kaggle.com; the data set used is CardioGoodFitness.

2. Importing the data into R:

Code:

# R program to import data into R

# Import the data using read.csv()

myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
                  stringsAsFactors = FALSE)

#printing first 6 rows

print(head(myData))

Output:

3. Calculating mean:

Code:

# R program to import data into R

# Import the data using read.csv()

myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
                  stringsAsFactors = FALSE)

#calculating mean (stored under a name that does not mask the mean() function)
mean_age = mean(myData$Age)

print(mean_age)

Output:

4. Calculating median:

Code:

# R program to import data into R

# Import the data using read.csv()

myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
                  stringsAsFactors = FALSE)

#calculating median (stored under a name that does not mask the median() function)
median_age = median(myData$Age)

print(median_age)

Output:

5. Calculating mode:

Code:

# R program to import data into R

# Import the data using read.csv()

myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
                  stringsAsFactors = FALSE)

#Calculate mode: the most frequent Age (shown as the name) and its count
mode = function(){
  return(sort(table(myData$Age), decreasing = TRUE)[1])
}

mode()

Output:

6. Printing the last 6 rows and their mode

Code:

# R program to import data into R

# Import the data using read.csv()

myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
                  stringsAsFactors = FALSE)

#printing the last 6 rows
print(tail(myData))

#mode of Age within those last 6 rows
mode = function(){
  return(sort(table(tail(myData)$Age), decreasing = TRUE)[1])
}

mode()
Output:

7. Calculating MFV (Most frequent value):

Code:

# R program to import data into R

# Import the data using read.csv()

myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
                  stringsAsFactors = FALSE)

#Most Frequent Value

library(modeest)

mode_a = mfv(myData$Age)

print(mode_a)

Output:
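
Base R has no built-in function for the statistical mode, which is why the scripts above use table() or the modeest package. As a minimal sketch, a reusable helper can be written in base R; the function name stat_mode and the sample ages are assumptions made for illustration, not values from the CardioGoodFitness file.

```r
# Hypothetical helper: returns every value tied for the highest frequency
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# Illustrative ages (not taken from the data set)
ages <- c(23, 25, 25, 26, 23, 25, 28)

single_mode <- stat_mode(ages)               # 25 occurs three times
tied_modes  <- stat_mode(c(1, 1, 2, 2, 3))   # 1 and 2 both occur twice

print(single_mode)
print(tied_modes)
```

Returning all tied values avoids silently reporting only one mode when the data is multimodal.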

Module 8- Calculating the Variance and Standard deviation in R

Program 8

a. Calculating variance:

Code:

#R program to get variance of a vector
# (named values so that the built-in list() function is not masked)

values=c(212,231,234,564,235)

#Calculating variance using var()

print(var(values))

Output:

b. Calculating standard deviation:

Code:

#R program to get standard deviation of a vector
# (named values so that the built-in list() function is not masked)

values=c(212,231,234,564,235)

#Calculating standard deviation using sd()

print(sd(values))

Output:

c. Calculating Variance and Standard Deviation using iris dataset

Code:

#printing the first 6 rows of iris data set

head(iris)

sepal = iris$Sepal.Length

#printing variance and sd

print(var(sepal))

print(sd(sepal))

variance_value <- var(sepal)

std_dev_value <- sd(sepal)

# Print the results

cat("Sepal Length Variance:", variance_value, "\n")

cat("Sepal Length Standard Deviation:", std_dev_value,"\n")

Output:
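
To connect var() and sd() with the manual calculation, recall that R computes the sample variance, dividing by n - 1 rather than n. The sketch below reproduces both results by hand for the five numbers used earlier in this module; the names manual_var and manual_sd are illustrative.

```r
values <- c(212, 231, 234, 564, 235)
n <- length(values)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((values - mean(values))^2) / (n - 1)

# Standard deviation is the square root of the variance
manual_sd <- sqrt(manual_var)

print(all.equal(manual_var, var(values)))
print(all.equal(manual_sd, sd(values)))
```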

Module 9- Learn how to calculate three important descriptive statistics-
Quartiles, Quantiles, and Interquartile range that describe the spread of the
data
Program 9
Quartile (0.25, 0.5, 0.75)
Code:
prob= iris$Sepal.Length
res1= quantile(prob, probs=c(0,0.25,0.5,0.75,1))
res1
Output:

OR
Code:
df<-data.frame(x=c(2,13,5,36,12,50),
y=c('a','b','c','c','c','b'))
res4<-quantile(df$x, probs=c(0,0.25,0.5,0.75,1))
res4
Output:

IQR = Interquartile Range


Code:
prob= iris$Sepal.Length
IQR(prob)
Output:

Showing Quartiles and Interquartile Range using the CardioGoodFitness data set
Code:

myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
                  stringsAsFactors = FALSE)

# the Age column is extracted once and reused
values<-myData$Age

quantile(values,0.25)

quantile(values,0.5)

quantile(values,0.75)

IQR(values)

Output:
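
The interquartile range is the third quartile minus the first quartile. As a worked sketch using the small vector x from the second example in this module, the block below checks that relation and applies the common 1.5 x IQR rule for flagging outliers; the 1.5 multiplier is a widely used convention and an assumption here, not something the original scripts specify.

```r
x <- c(2, 13, 5, 36, 12, 50)

# First and third quartiles with R's default quantile method
q <- quantile(x, probs = c(0.25, 0.75))
q1 <- unname(q[1])        # 6.75
q3 <- unname(q[2])        # 30.25

iqr_manual <- q3 - q1     # 23.5, the same value IQR(x) returns

# Conventional outlier fences at 1.5 times the IQR beyond the quartiles
lower_fence <- q1 - 1.5 * iqr_manual   # -28.5
upper_fence <- q3 + 1.5 * iqr_manual   # 65.5
outliers <- x[x < lower_fence | x > upper_fence]   # none for this vector

print(iqr_manual)
print(outliers)
```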

Module 10- Learn about the statistics used to run hypothesis tests and use
R to run different t-tests that compare distributions

Program 10

CODE:
library(ggplot2)
library(dplyr)
library(tidyr)
library(magrittr)
library(gridExtra)
library(e1071)
midwest
head(midwest)
tail(midwest)
summary(midwest)
skewness(midwest$area)
kurtosis(midwest$area)
x = (midwest$popdensity)
# one-sample t-test of the mean population density against mu = 0 (the default)
t.test(x, y = NULL, alternative = "two.sided",
       paired = FALSE, var.equal = FALSE, conf.level = 0.95)

# the same test with mu stated explicitly
t.test(x, y = NULL, alternative = "two.sided", mu = 0,
       paired = FALSE, var.equal = FALSE, conf.level = 0.95)

OUTPUT:
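
The calls above are one-sample tests. To illustrate the other common variants, the sketch below runs a Welch two-sample test and a paired test on R's built-in sleep data set, which records the extra hours of sleep of the same ten subjects under two drugs. Using sleep instead of midwest is an assumption made so that the example is self-contained.

```r
# Extra sleep under drug 1 and drug 2 (same 10 subjects, in the same order)
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# One-sample test: is the mean of g1 different from 0?
one_sample <- t.test(g1, mu = 0)

# Welch two-sample test: treats the groups as independent, unequal variances allowed
welch <- t.test(g1, g2, var.equal = FALSE)

# Paired test: uses the within-subject differences
paired <- t.test(g1, g2, paired = TRUE)

print(welch$p.value)
print(paired$p.value)
```

Because the measurements come from the same subjects, the paired test is the appropriate one here, and it yields a much smaller p-value than the Welch test on the same numbers.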

#Creating boxplot

Code:
#view first 6 rows of "airquality"dataset

head(airquality)

#create boxplot for the variable"ozone"

boxplot(airquality$Ozone)

#boxplot of temperature by month (base R formula interface)

boxplot(Temp ~ Month,
        data = airquality,
        main = "temperature distribution by month",
        xlab = "month",
        ylab = "degrees(f)",
        col = "steelblue",
        border = "black")
Output:
