r File Finall
PRACTICAL FILE
SEMESTER-1A
INDEX
S.No. Module
Intro How to Install R and RStudio
1. Learn the Basic Syntax of R
1.1 R Script for Arithmetic Operators
1.2 R Script for Logical Operators
1.3 R Script for Relational Operators
1.4 R Script for Assignment Operators
1.5 R Script for Miscellaneous Operators
1.6 R Script for Conditional Statements
1.7 R Scripts for Looping
1.8 R Scripts for User-Defined Functions
1.9 R Scripts for Data Frames
4. Learn the basics of how to create visualizations using the popular R package ggplot2
4.1 R Script for Summary of Data Set
4.2 R Script for Data Layers
4.3 R Script for Aesthetic Layer
4.4 R Script for Geometric Layer
4.5 R Script for Adding Size, Colour and Shape
4.6 R Script for Histogram Plot
4.7 R Script for Facet Layer
4.8 R Script for Statistics Layer
4.9 R Script for Coordinates Layer
4.10 R Script for coord_cartesian()
4.11 R Script for Theme Layer
8. Learn how to quantify the spread of the dataset by calculating the variance and standard deviation in R
Step – 3: Click on the "base" subdirectory link or the "install R for the first time" link.
Step – 4: Click "Download R X.X.X for Windows" (X.X.X stands for the latest version of R).
Step – 5: Run the .exe file and follow the installation instructions.
Step – 2: Click on the link for the Windows version of RStudio and save the .exe file.
R SCRIPT
Module 1- Basics of R syntax:
Program 1
1.1 Arithmetic operators: These operators perform mathematical operations such as
addition, subtraction, multiplication and division. There are 6 types:
a. Addition operator: The values at the corresponding positions of both the
operands are added.
Code:
a = c (1,4.9)
b = c (6, 4)
print (a+b)
Output:
b. Subtraction operator: The second operand values are subtracted from the first.
Code:
a = c (1,0.1)
b = c (2.33, 4)
print (a-b)
Output:
c. Multiplication operator (*): The corresponding elements of the two operands
are multiplied.
Code:
a = c (1,8)
b = c (2, 4)
print (a*b)
Output:
d. Division operator (/): The first operand is divided by the second operand.
Code:
a = c (1,0.1)
b = c (2.33, 4)
print (a/b)
Output:
e. Power operator (^): The first operand is raised to the power of the second
operand.
Code:
a = c (1,2)
b = c (2, 4)
print (a^b)
Output:
f. Modulo operator (%%): The remainder of the first operand divided by the
second operand is returned.
Code:
a = c (1,6)
b = c (8 , 4)
print (a%%b)
Output:
b. Element-wise Logical OR operator (|): Returns TRUE if either of the
corresponding elements of the operands is TRUE.
Code:
a = c (7,10)
b = c (14, 5)
print (a|b)
Output:
c. NOT operator (!): A unary operator that negates the logical value of each
element of the operand.
Code:
a=7
print (!a)
Output:
d. Logical AND operator (&&): Evaluates only the first element of each operand and
returns TRUE if both are TRUE. From R 4.3.0 onward the operands must have length one.
Code:
a=2
b=5
print (a&&b)
Output:
e. Logical OR operator (||): Evaluates only the first element of each operand and
returns TRUE if either is TRUE. From R 4.3.0 onward the operands must have length one.
Code:
a=7
b=7
print (a||b)
Output:
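The difference between the element-wise operators (& and |) and the scalar operators (&& and ||) described above can be sketched as follows. The vectors x and y are illustrative values, not taken from the original programs.

```r
# Illustrative logical vectors (assumed values, not from the original file)
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# Element-wise operators compare every pair of corresponding elements
elementwise_and <- x & y
elementwise_or  <- x | y

# Scalar operators work on single values; indexing explicitly keeps the
# code valid in R 4.3.0 and later, where longer operands raise an error
scalar_and <- x[1] && y[1]
scalar_or  <- x[2] || y[3]

# Non-zero numbers count as TRUE and zero counts as FALSE
numeric_not <- !c(7, 0)

print(elementwise_and)
print(elementwise_or)
print(scalar_and)
print(scalar_or)
print(numeric_not)
```

The element-wise forms are the usual choice for filtering vectors, while the scalar forms suit the single condition of an if statement.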
1.3 Relational operators: The relational operators carry out comparison operations
between the corresponding elements of the operands. They are of 5 types:
a. Less than (<): Returns TRUE if the corresponding element of the first operand is
less than that of the second operand.
Code:
#less than
a <- c(5,3)
b <- c(4,7)
# Performing operations on Operands
print(a<b)
Output:
b. Less than or equal to (<=): Returns TRUE if the corresponding element of the first
operand is less than or equal to that of the second operand.
Code:
#less than equal to
a <- c(5,3)
b <- c(4,3)
# Performing operations on Operands
print(a<=b)
Output:
c. Greater than (>): Returns TRUE if the corresponding element of the first operand
is greater than that of the second operand.
Code:
#greater than
a <- c(5,3)
b <- c(4,7)
# Performing operations on Operands
print(a>b)
Output:
d. Greater than or equal to (>=): Returns TRUE if the corresponding element of the
first operand is greater than or equal to that of the second operand.
Code:
#greater than equal to
a <- c(5,3)
b <- c(4,7)
# Performing operations on Operands
print(a>=b)
Output:
e. Not equal to (!=): Returns TRUE if the corresponding element of the first operand
is not equal to that of the second operand.
Code:
#not equal to
a <- c(5,7)
b <- c(4,7)
# Performing operations on Operands
print(a!=b)
Output:
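Each relational operator returns a logical vector, which can in turn be used to select or count elements. The sketch below also shows the equality operator (==), which is not among the five listed above. The vectors are illustrative values only.

```r
# Illustrative operands (assumed values)
a <- c(5, 3, 7)
b <- c(4, 7, 7)

# Element-wise comparisons produce logical vectors
is_equal   <- a == b
is_greater <- a > b

# A logical vector can index the original vector ...
larger_in_a <- a[a > b]

# ... and summing it counts the TRUE values
count_not_less <- sum(a >= b)

print(is_equal)
print(is_greater)
print(larger_in_a)
print(count_not_less)
```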
b. Right assignment (-> or ->>): Assigns the value on the left to the variable named on the right.
Code:
#right assignment
c(12:25) -> b
# Performing operations on Operands
print(b)
Output:
1.5 Miscellaneous operators: These operators serve special purposes, such as testing
membership in a vector and generating sequences. They are of 3 types:
a. %in% operator: Checks whether an element belongs to a vector and returns the boolean value TRUE
if the value is present.
Code:
#%in% operator
a = 0.2
list1 = c(TRUE, 0.1, "apple")
print (a %in% list1)
Output:
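One detail behind the %in% example above is that c() stores a single type, so mixing a logical, a number and a string produces a character vector. A minimal sketch of that coercion, reusing the same list1:

```r
# c() coerces mixed inputs to one common type, here character
list1 <- c(TRUE, 0.1, "apple")
list1_class <- class(list1)

# %in% converts the left-hand value to character before matching
has_0.2 <- 0.2 %in% list1
has_0.1 <- 0.1 %in% list1

# The left operand may itself be a vector; one result per element
fruit_found <- c("apple", "pear") %in% list1

print(list1_class)
print(list1)
print(has_0.2)
print(has_0.1)
print(fruit_found)
```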
b. Colon operator (:): Generates a sequence of elements from the value before the colon to
the value after it.
Code:
#colon operator
a = (1:25)
print(a)
Output:
1.6 Decision making in R programming (conditional statements): The decision-making
statements in R are as follows:
a. If statement: The keyword if tells the R interpreter that this is a decision control
instruction. The condition following the keyword is always enclosed within a pair of
parentheses. If the condition is TRUE, the statement is executed; if the condition is
FALSE, it is skipped.
Code:
#IF statement
a = 58
b = 70
#FALSE condition: 58 > 70 does not hold, so this block is skipped
if (a>b)
{
c = a-b
print("condition a>b is TRUE")
print(paste("Difference between a,b is:", c))
}
#TRUE condition: 58 < 70 holds, so this block runs
if (a<b)
{
c = a-b
print("condition a<b is TRUE")
print(paste("Difference between a,b is:", c))
}
Output:
b. If-else statement: It provides an optional else block, which is executed when the
condition of the if block is FALSE. If the condition is TRUE, the statement within the
if block is executed; otherwise the statement within the else block is executed.
Code:
#Ifelse statement
a = 58
b = 70
if (a>b)
{
c = a-b
print("condition a>b is TRUE")
print(paste("Difference between a,b is:", c))
} else
{
c = a-b
print("condition a<b is TRUE")
print(paste("Difference between a,b is:", c))
}
Output:
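c. Nested if-else statement: When an if-else block appears as a statement within an
if block, or optionally within an else block, it is called a nested if-else statement.
The code below expresses the same idea with the vectorised ifelse() function, nesting
one call inside another.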
Code:
#Nested IF Statement
ifelse(test = 15 > 16,
       yes = ifelse(test = 15 > 14,
                    yes = "TRUE TWICE",
                    no = "YES, & NO"),
       no = "No")
Output:
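For comparison with the ifelse() call above, a nested if-else written with the if and else keywords might look like the following sketch. The function name classify_number and its test values are hypothetical and do not come from the original file.

```r
# Hypothetical helper: labels a number using an if-else nested in an if block
classify_number <- function(x) {
  if (x >= 0) {
    # Inner if-else runs only when the outer condition is TRUE
    if (x == 0) {
      result <- "zero"
    } else {
      result <- "positive"
    }
  } else {
    result <- "negative"
  }
  return(result)
}

print(classify_number(15))
print(classify_number(0))
print(classify_number(-4))
```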
c. Repeat loop: It executes a sequence of statements repeatedly until a break statement is reached:
Code:
#repeat loop
a=3
repeat
{
print(a)
a=a+1
#checking stop condition
if(a>7)
{
#break statement to terminate loop
break
}
}
Output:
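The same count from 3 to 7 can also be written with a for loop or a while loop. The sketch below is an illustrative comparison, and the variable names for_values and while_values are assumptions introduced here.

```r
# for loop: the range 3:7 fixes the number of iterations in advance
for_values <- c()
for (a in 3:7) {
  for_values <- c(for_values, a)
}

# while loop: the condition is checked before every iteration
while_values <- c()
a <- 3
while (a <= 7) {
  while_values <- c(while_values, a)
  a <- a + 1
}

print(for_values)
print(while_values)
```

Unlike repeat, neither form needs an explicit break, because the stopping rule is part of the loop header.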
1.8 User-defined functions: In R, we can create our own functions. Such
functions are known as user-defined functions.
Code:
#CODE 1
# whether x is a multiple of 2
multipleof2 = function(x){
  if(x %% 2 == 0)
    # return values are assumed; the original lines were lost
    return(TRUE)
  else
    return(FALSE)
}
print(multipleof2(4))
print(multipleof2(3))
Output:
Code:
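#CODE 2
# Area of a rectangle. The parameter names and default values are assumed,
# since the original function header was lost.
Rectangle = function(length = 5, width = 4){
  area = length * width
  return(area)
}
# Case 1: both arguments passed by position
print(Rectangle(12, 13))
# Case 2: (call missing in the source)
# Case 3: no arguments, so the default values are used
print(Rectangle())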
Output:
1.9 R scripts for data frames: Data frames are generic data objects in R that store
tabular data. A data frame can be thought of as a matrix in which each column may
hold a different data type. It has three principal components: the data, the rows,
and the columns.
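As a minimal sketch of these ideas, the block below builds a small data frame and applies the inspection functions used in this section. Only the column name Pulse appears in the programs that follow; the other column names and all values are assumptions made for illustration.

```r
# Hypothetical data frame (values assumed, not the original df1)
df_example <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse    = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE
)

str(df_example)              # structure: column names and types
print(summary(df_example))   # per-column summary statistics

# Extract one column as a vector
pulse_values <- df_example$Pulse

# Add a row with rbind() and a column with cbind()
new_row  <- data.frame(Training = "Yoga", Pulse = 90, Duration = 50,
                       stringsAsFactors = FALSE)
with_row <- rbind(df_example, new_row)
with_col <- cbind(df_example, Steps = c(1000, 6000, 2000))

print(dim(df_example))       # number of rows, then number of columns
print(nrow(with_row))
print(ncol(with_col))
```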
a. Creating a data frame:
Code:
stringsAsFactors = FALSE
print(df1)
Output:
#using str()
stringsAsFactors = FALSE
print(str(df1))
Output:
c. Summary of data in data frame:
Code:
stringsAsFactors = FALSE
summary(df1)
Output:
#extract data
stringsAsFactors = FALSE
print(df1$Pulse)
Output:
#adding rows
stringsAsFactors = FALSE
print(New_row_DF)
Output:
Code 2(Adding Columns):
#adding columns
stringsAsFactors = FALSE
print(New_col_DF)
Output:
stringsAsFactors = FALSE
print(dim(df1))
Output:
stringsAsFactors = FALSE
print(nrow(df1))
print(ncol(df1))
Output:
Module 2- Learn how to organize and modify data in R using data
frames and dplyr
Program 2: Data manipulation functions present in DPLYR are:
#Filter
library(dplyr)
d=data.frame(name=c("Abhinav", "Bharay",
"Cameron", "Devon"),
print(d)
d%>%filter(is.na(ht))
d%>%filter(!is.na(ht))
Output:
2.2 Distinct: Removes duplicate rows in a data frame.
Code:
library(dplyr)
d=data.frame(name=c("Abhinav", "Bharay",
"Cameron", "Devon"),
print(distinct(d))
Output:
Code:
library(dplyr)
d=data.frame(name=c("Abhinav", "Bharay",
"Cameron", "Devon"),
print(d.name)
Output:
2.4 Select: The select method is used to extract the required columns as a table by
specifying the columns to keep.
Code:
#Select
library(dplyr)
d=data.frame(name=c("Abhinav", "Bharay",
"Cameron", "Devon"),
select(d, starts_with("ht"))
select(d, -starts_with("ht"))
select(d, 1:2)
select(d, contains("n"))
select(d, matches("na"))
Output:
Code:
#Rename
library(dplyr)
d=data.frame(name=c("Abhinav", "Bharay",
"Cameron", "Devon"),
rename(d, height=ht)
Output:
2.6 - 2.7 Mutate and Transmute: mutate() creates new variables without dropping the old
ones, whereas transmute() keeps only the new variables.
Code:
library(dplyr)
d=data.frame(name=c("Abhinav", "Bharay",
mutate(d, x3=ht-age)
transmute(d, x3=ht+age)
Output:
Code:
#Summarize
library(dplyr)
d=data.frame(name=c("Abhinav", "Bharay",
"Cameron", "Devon"),
summarise(d, mean=mean(age))
summarise(d, min=min(age))
summarise(d, max=max(age))
summarise(d, sd=sd(age))
Output:
2.8 Sample: Draws a random sample of rows from the data frame.
Code:
library(dplyr)
d=data.frame(name=c("Abhinav", "Bharay",
"Cameron", "Devon"),
sample_n(d,3)
sample_frac(d,0.50)
Output:
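Because the data frame definitions in the programs above are incomplete, the following sketch collects the main dplyr verbs of this module on one small data frame. The names and the column names ht and age come from the programs above; every value is an assumption made for illustration.

```r
library(dplyr)

# Hypothetical data: all values are assumed
d <- data.frame(
  name = c("Abhinav", "Bharay", "Cameron", "Devon"),
  age  = c(7, 5, 9, 16),
  ht   = c(46, NA, NA, 69)
)

complete_rows <- d %>% filter(!is.na(ht))           # rows with a known height
by_age        <- d %>% arrange(age)                 # sort rows by age
name_cols     <- d %>% select(starts_with("na"))    # columns starting with "na"
renamed       <- d %>% rename(height = ht)          # new name = old name
with_diff     <- d %>% mutate(x3 = ht - age)        # keeps the old columns
only_diff     <- d %>% transmute(x3 = ht - age)     # keeps only the new column
age_stats     <- d %>% summarise(mean = mean(age), min = min(age),
                                 max = max(age), sd = sd(age))

print(complete_rows)
print(with_diff)
print(age_stats)
```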
Module 3- Data Manipulation in R with TIDYR package
Program 3
Code:
library(tidyr)
n = 10
tidydf = data.frame(
S.No = c(1:n),
print(tidydf)
Output:
3.1 Gather:
Code:
#Gather
library(tidyr)
n = 10
tidydf = data.frame(
S.No = c(1:n),
tall=tidydf %>%
gather(Group, Frequency,
Group.1:Group.3)
print(tall)
Output:
3.2 Separate:
Code:
#Separate
library(tidyr)
n = 10
tidydf = data.frame(
S.No = c(1:n),
tall=tidydf %>%
gather(Group, Frequency,
Group.1:Group.3)
sep=tall %>%
separate(Group, c("Allotment",
"Number"))
print(sep)
Output:
3.3 Unite:
Code:
library(tidyr)
n = 10
tidydf = data.frame(
S.No = c(1:n),
sep=tall %>%
separate(Group, c("Allotment",
"Number"))
uni=sep %>%
unite(Group, Allotment,
print(uni)
Output:
3.4 Spread:
Code:
#Spread
library(tidyr)
n = 10
tidydf = data.frame(
S.No = c(1:n),
uni=sep %>%
unite(Group, Allotment,
sp=uni %>%
spread(Group, Frequency)
print(sp)
Output:
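The four steps of this module (gather, separate, unite, spread) form a round trip from wide data to long data and back. Since the data frame definitions above are truncated, the sketch below uses a small hypothetical tidydf: the column names S.No and Group.1 to Group.3 come from the programs above, while the values and the sep = "." argument are assumptions.

```r
library(tidyr)

# Hypothetical wide data (values assumed)
tidydf <- data.frame(
  S.No    = 1:3,
  Group.1 = c(23, 45, 12),
  Group.2 = c(30, 18, 27),
  Group.3 = c(15, 22, 40)
)

# Wide to long: one row per (S.No, Group) pair
tall <- tidydf %>% gather(Group, Frequency, Group.1:Group.3)

# Split "Group.1" into "Group" and "1" at the non-alphanumeric character
sep <- tall %>% separate(Group, c("Allotment", "Number"))

# Join the two parts back into a single column
uni <- sep %>% unite(Group, Allotment, Number, sep = ".")

# Long to wide: one column per Group value
sp <- uni %>% spread(Group, Frequency)

print(tall)
print(sp)
```

In current tidyr releases gather() and spread() are superseded by pivot_longer() and pivot_wider(), although the older functions still work.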
Module 4- Basics of how to create visualisations using the popular R package
ggplot2
Program 4
Code:
install.packages("dplyr")
library(dplyr)
summary(iris)
Output:
4.2 R script for data layers:
Code:
library(ggplot2)
library(dplyr)
ggplot(data = iris)
Output:
Code:
#aesthetic layer
Output:
4.4 R Script for Geometric Layer:
Code:
#geometric layer
Output:
4.5 R Script for Adding Size, Colour and Shape:
Code:
ggplot(data = iris,
Output:
Code:
ggplot(data = iris,
shape = factor(Species))) +
geom_point()
Output:
4.6 R script for Histogram Plot
Code:
# Histogram
x = "Sepal Length",
y = "Frequency")
Output:
4.7 R script for Facet Layer
Code:
ggplot(data = iris,
geom_point() +
facet_grid(Species ~ .) +
x = "Sepal Length",
y = "Petal Length")
Output:
Code:
#statistics layer
geom_point() +
Output:
Code:
#coordinates layer
geom_point() +
Output:
Code:
#coord_cartesian
geom_point() + geom_smooth() +
Output:
Code:
#theme layer
theme(plot.background = element_rect(
Output:
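Several of the programs in this module survive only as fragments, so the sketch below shows how the layers named in the headings can be stacked into one plot. The iris data, geom_point(), geom_smooth(), facet_grid(Species ~ .), the axis labels and the theme(plot.background = element_rect(...)) call appear in the fragments above; the aesthetic mapping, the smoothing method, the axis limits, the title and the fill colour are assumptions.

```r
library(ggplot2)

p <- ggplot(data = iris,                                    # data layer
            aes(x = Sepal.Length, y = Petal.Length,
                colour = Species)) +                        # aesthetic layer
  geom_point() +                                            # geometric layer
  geom_smooth(method = "lm", formula = y ~ x) +             # statistics layer
  facet_grid(Species ~ .) +                                 # facet layer
  coord_cartesian(xlim = c(4, 8)) +                         # coordinates layer
  labs(title = "Sepal length against petal length",
       x = "Sepal Length",
       y = "Petal Length") +
  theme(plot.background = element_rect(fill = "grey95"))    # theme layer

# print(p) draws the plot on the active graphics device
```

coord_cartesian() zooms into the given range without discarding points, so the fitted lines are still computed from all of the data.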
Module 5- Learn the basics of aggregate functions in R with dplyr, which let us
calculate quantities that describe groups of data.
Program 5
a. Display data:
Code:
library(dplyr)
"Marketing","HR"),
salary = c(34000,23000,41000,20000,35000,67000,87000))
print("Original Data frame")
print(employee_data)
Output:
5.1 R script to create a data frame with 4 columns, group it by subjects, and get aggregates such as
the minimum, sum, and maximum.
Code:
library(dplyr)
"Marketing","HR"),
salary = c(34000,23000,41000,20000,35000,67000,87000))
Output:
5.2 R Script to create a data frame with 4 columns, group it by subjects, and get the average (mean).
Code:
library(dplyr)
"Marketing","HR"),
salary = c(34000,23000,41000,20000,35000,67000,87000))
Output:
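The grouping code itself is missing from the programs above, so the following is only a sketch of how group_by() and summarise() might be combined. The salary values and the department labels "Marketing" and "HR" come from the fragments above; the column name department, the label "Sales" and the assignment of employees to departments are assumptions, and the data frame is reduced to two columns for brevity.

```r
library(dplyr)

# Partly hypothetical data: the department vector is assumed
employee_data <- data.frame(
  department = c("Sales", "HR", "Marketing", "Sales",
                 "HR", "Marketing", "HR"),
  salary     = c(34000, 23000, 41000, 20000, 35000, 67000, 87000)
)

# One output row per department, one column per aggregate
salary_summary <- employee_data %>%
  group_by(department) %>%
  summarise(
    min_salary   = min(salary),
    total_salary = sum(salary),
    max_salary   = max(salary),
    mean_salary  = mean(salary)
  )

print(salary_summary)
```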
Module 6- Basics of joining tables together in R with DPLYR
Program 6: dplyr provides several methods for joining data frames in R.
6.1 Using Inner join:
Code:
library(dplyr)
a = data.frame(ID=c(10,11,12,13,14))
b = data.frame(ID=c(13,14,15,16,17))
inner_join(a, b, by="ID")
Output:
6.2 Using Left join:
Code:
a = data.frame(ID=c(10,11,12,13,14))
b = data.frame(ID=c(13,14,15,16,17))
#performing left join
left_join(a,b, by = "ID")
Output:
6.3 Using Right join:
Code:
a = data.frame(ID=c(10,11,12,13,14))
b = data.frame(ID=c(13,14,15,16,17))
right_join(a,b, by = "ID")
Output:
6.4 Using Full join:
Code:
a = data.frame(ID=c(10,11,12,13,14))
b = data.frame(ID=c(13,14,15,16,17))
full_join(a,b, by = "ID")
Output:
6.5 Using Semi join:
Code:
a = data.frame(ID=c(10,11,12,13,14))
b = data.frame(ID=c(13,14,15,16,17))
semi_join(a,b, by = "ID")
Output:
6.6 Using Anti join:
Code:
a = data.frame(ID=c(10,11,12,13,14))
b = data.frame(ID=c(13,14,15,16,17))
anti_join(b,a, by = "ID")
Output:
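With only an ID column, the joins above differ just in which IDs they return. The sketch below adds one value column to each table to show where unmatched rows are filled with NA. The ID values are those used above; the columns left_value and right_value are hypothetical.

```r
library(dplyr)

# Same IDs as above, plus one assumed value column per table
a <- data.frame(ID = c(10, 11, 12, 13, 14),
                left_value = c("a10", "a11", "a12", "a13", "a14"))
b <- data.frame(ID = c(13, 14, 15, 16, 17),
                right_value = c("b13", "b14", "b15", "b16", "b17"))

inner <- inner_join(a, b, by = "ID")  # IDs present in both tables
left  <- left_join(a, b, by = "ID")   # every row of a; NA where b has no match
full  <- full_join(a, b, by = "ID")   # every ID from either table
semi  <- semi_join(a, b, by = "ID")   # rows of a with a match; columns of a only
anti  <- anti_join(b, a, by = "ID")   # rows of b without a match in a

print(inner)
print(left)
print(anti)
```

The filtering joins (semi and anti) never add columns from the second table, which is why their results keep the shape of the first argument.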
Module 7- Calculating mean, median and mode of real world data set using R.
Program 7
Code:
myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
stringsAsFactors = FALSE)
print(head(myData))
Output:
3. Calculating mean:
Code:
myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
stringsAsFactors = FALSE)
#calculating mean
mean_age = mean(myData$Age)
print(mean_age)
Output:
4. Calculating median:
Code:
myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
stringsAsFactors = FALSE)
#calculating median
median_age = median(myData$Age)
print(median_age)
Output:
5. Calculating mode:
Code:
myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
stringsAsFactors = FALSE)
#Calculate mode
mode = function(){
# sort the negated counts so that the most frequent age comes first
return(sort(-table(myData$Age))[1])
}
mode()
Output:
Code:
myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
stringsAsFactors = FALSE)
print(tail(myData))
mode = function(){
return(sort(table(myData$Age))[2])
}
mode()
Output:
Code:
myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
stringsAsFactors = FALSE)
library(modeest)
mode_a = mfv(myData$Age)
print(mode_a)
Output:
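R has no built-in function for the statistical mode, which is why the programs above use table() or the modeest package. A base-R alternative might look like the sketch below. The function name find_mode and the sample ages are hypothetical, since the CardioGoodFitness file is not available here.

```r
# Hypothetical helper: returns every value that occurs most often
find_mode <- function(x) {
  counts <- table(x)                                   # frequency of each value
  is_top <- as.vector(counts) == max(counts)           # most frequent value(s)
  as.numeric(names(counts)[is_top])
}

# Assumed sample values standing in for myData$Age
ages <- c(23, 25, 25, 31, 23, 25, 40)

single_mode <- find_mode(ages)
tied_modes  <- find_mode(c(1, 1, 2, 2, 3))

print(single_mode)
print(tied_modes)
```

Returning all tied values avoids silently picking one of them, which is the same behaviour mfv() from modeest provides.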
Module 8- Calculating the Variance and Standard deviation in R
Program 8
a. Calculating variance:
Code:
# named "values" rather than "list" to avoid masking the base function list()
values=c(212,231,234,564,235)
print(var(values))
Output:
b. Calculating standard deviation:
Code:
values=c(212,231,234,564,235)
print(sd(values))
Output:
head(iris)
sepal = iris$Sepal.Length
print(var(sepal))
print(sd(sepal))
Output:
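To show what var() and sd() compute, the sketch below reproduces both by hand for the five values used in this module. It relies on the fact that R uses the sample formulas, which divide by n - 1 rather than n.

```r
values <- c(212, 231, 234, 564, 235)
n <- length(values)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((values - mean(values))^2) / (n - 1)

# Standard deviation: square root of the variance
manual_sd <- sqrt(manual_var)

print(manual_var)
print(var(values))   # matches manual_var
print(manual_sd)
print(sd(values))    # matches manual_sd
```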
Module 9- Learn how to calculate three important descriptive statistics-
Quartiles, Quantiles, and Interquartile range that describe the spread of the
data
Program 9
Quartile (0.25, 0.5, 0.75)
Code:
prob= iris$Sepal.Length
res1= quantile(prob, probs=c(0,0.25,0.5,0.75,1))
res1
Output:
OR
Code:
df<-data.frame(x=c(2,13,5,36,12,50),
y=c('a','b','c','c','c','b'))
res4<-quantile(df$x, probs=c(0,0.25,0.5,0.75,1))
res4
Output:
Showing quartiles and the interquartile range using the data set CardioGoodFitness
Code:
myData = read.csv("C:\\Users\\91858\\Downloads\\CardioGoodFitness.csv",
stringsAsFactors = FALSE)
# assign the Age column once and reuse it for every statistic
values <- myData$Age
quantile(values, 0.25)
quantile(values, 0.5)
quantile(values, 0.75)
IQR(values)
Output:
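The interquartile range is the distance between the third and first quartiles. A minimal sketch of that relationship, reusing the x values from the small data frame earlier in this module:

```r
# Values taken from the df$x column defined above
x <- c(2, 13, 5, 36, 12, 50)

# First quartile, median and third quartile (R's default method, type 7)
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# IQR computed by hand as Q3 - Q1
iqr_manual <- unname(q["75%"] - q["25%"])

print(q)
print(iqr_manual)
print(IQR(x))   # same value as iqr_manual
```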
Module 10- Learn about the statistics used to run hypothesis tests and use
R to run different t-tests that compare distributions
Program 10
CODE:
library(ggplot2)
library(dplyr)
library(tidyr)
library(magrittr)
library(gridExtra)
library(e1071)
midwest
head(midwest)
tail(midwest)
summary(midwest)
skewness(midwest$area)
kurtosis(midwest$area)
x = (midwest$popdensity)
# alternative takes a single choice: "two.sided", "less" or "greater"
t.test(x, y = NULL, alternative = "two.sided",
paired = FALSE, var.equal = FALSE, conf.level = 0.95)
OUTPUT:
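The call above is a one-sample test, because y is NULL. As a sketch of how a one-sample and a two-sample t-test differ, the block below uses the built-in sleep data set in place of midwest, purely so that it runs without extra packages. That substitution is an assumption of this example, not part of the original program.

```r
# sleep ships with base R: extra hours of sleep for two drug groups
one_sample <- t.test(sleep$extra, mu = 0,
                     alternative = "two.sided", conf.level = 0.95)

# Two-sample (Welch) test: compares the mean of extra between the groups
# without assuming equal variances
two_sample <- t.test(extra ~ group, data = sleep, var.equal = FALSE)

print(one_sample$p.value)
print(two_sample$statistic)
print(two_sample$conf.int)
```

Both calls return an object of class htest, so the p-value, test statistic and confidence interval can be read from the same fields.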
#Creating boxplot
Code:
#view first 6 rows of "airquality" dataset
head(airquality)
boxplot(airquality$Ozone)
# formula assumed from the axis labels: temperature grouped by month
boxplot(Temp ~ Month,
data=airquality,
xlab = "month",
ylab = "degrees(f)",
col="steelblue",
border="black")
Output: