
Data Science Using R - Lab Manual-Complete Ver 2.0 - Nov 2024

The document is a lab manual for a Data Science course using R, authored by Dr. P. Rajasekar. It includes a list of experiments covering basic mathematical functions, vector and matrix operations, data manipulation, and data visualization in R. Each experiment outlines aims, theoretical background, and practical assignments to enhance understanding of R programming.


Lab Manual

Data Science using R

Dr.P.RAJASEKAR
Associate Professor,
School of Computing
SRMIST
List of Experiments

1. Basic Mathematical functions in R

2. Implementation of vector data objects operations

3. Implementation of matrix, array and factors in R

4. Implementation and use of data frames in R

5. Create Sample (Dummy) Data in R and perform data manipulation with R

6. Write an R program to take input from the user (name and age) and display the values. Also print the version of the R installation.

7. Write an R program to create a sequence of numbers from 20 to 50 and find the mean of numbers from 20 to 60 and sum of numbers from 51 to 91.

8. Write an R program to create three vectors a, b, c with 3 integers. Combine the three vectors to become a 3×3 matrix where each column represents a vector. Print the content of the matrix.

9. Write an R program to concatenate two given matrices with the same number of columns but different rows.

10. Write an R program to create a data frame from four given vectors.

11. Write an R program to sort a given data frame by multiple column(s).

12. Write an R program to count the number of NA values in a data frame column.

13. Write an R program to create a simple bar plot of four subjects’ marks.

14. Write an R program to create a simple bar plot for ozone concentration in air with the “airquality” dataset.

15. Write an R program to create a histogram for maximum daily temperature with the “airquality” dataset.

16. Write an R program to create a boxplot for the variable “wind” with the “airquality” dataset.
Experiment No: 1
Aim: To perform the basic mathematical operations in R programming

Theory:

In R, the fundamental unit of shareable code is the package. A package bundles
together code, data, documentation, and tests and provides an easy method to share with
others. As of May 2017 there were over 10,000 packages available on CRAN. This huge
variety of packages is one of the reasons that R is so successful: chances are that someone has
already solved a problem that you’re working on, and you can benefit from their work by
downloading their package.

Installing Packages
The most common place to get packages from is CRAN. To install packages from CRAN you
use install.packages("packagename"). For instance, if you want to install the ggplot2
package, which is a very popular visualization package you would type the following in the
console:
# install package from CRAN
install.packages("ggplot2")

Loading Packages
Once the package is downloaded to your computer you can access the functions and
resources provided by the package in two different ways:
# load the package to use in the current R session
library(packagename)
# use a particular function without loading the whole package
packagename::functionname

Getting Help on Packages


For more direct help on packages that are installed on your computer you can use the help
and vignette functions. Here we can get help on the ggplot2 package with the following:
help(package = "ggplot2") # provides details regarding contents of a package
vignette(package = "ggplot2") # list vignettes available for a specific package
vignette("ggplot2-specs") # view specific vignette
vignette() # view all vignettes on your computer

Assignment
The first operator you’ll run into is the assignment operator. The assignment operator is used
to assign a value. For instance we can assign the value 3 to the variable x using the <-
assignment operator.
# assignment
x <- 3
Interestingly, R actually allows for five assignment operators:
# leftward assignment
x <- value
x = value
x <<- value
# rightward assignment
value -> x
value ->> x
The original assignment operator in R was <- and has continued to be the preferred among R
users. The = assignment operator was added in 2001 primarily because it is the accepted
assignment operator in many other languages and beginners to R coming from other
languages were so prone to use it.
The operators <<- and ->> are normally only used inside functions, and we will not go into their details here.
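As a brief illustration only, the sketch below contrasts <- and <<- inside a function. The names counter, add_local and add_global are made up for this example and are not part of the manual.

```r
counter <- 0

# <- inside a function creates a local variable; the outer counter is untouched
add_local <- function() {
  counter <- counter + 1
  counter
}

# <<- assigns to the variable found in the enclosing environment
add_global <- function() {
  counter <<- counter + 1
  counter
}

local_result  <- add_local()   # value seen inside the function
after_local   <- counter       # outer counter after the local assignment
global_result <- add_global()
after_global  <- counter       # outer counter after the <<- assignment

print(c(after_local = after_local, after_global = after_global))
```

Only the call that uses <<- changes the outer counter, which is why this operator is usually reserved for functions that deliberately keep state.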

Evaluation
We can then evaluate the variable by simply typing x at the command line, which will return
the value of x. Note that prior to the value returned you’ll see ## [1] in the command line.
The [1] indicates that the value next to it is the first element of the output. Note that you can add
comments to your code by preceding the comment with the hash (#) symbol. Any values,
symbols, and text following # will not be evaluated.
# evaluation
x
## [1] 3

Case Sensitivity
Lastly, note that R is a case-sensitive programming language, meaning all variables,
functions, and objects must be called by their exact spelling:

x <- 1
y <- 3
z <- 4
x*y*z
## [1] 12
x*Y*z
## Error in eval(expr, envir, enclos): object 'Y' not found

Basic Arithmetic
At its most basic level, R can be used as a calculator. When applying basic arithmetic, the
PEMDAS order of operations applies: parentheses first, followed by exponentiation,
multiplication and division, and finally addition and subtraction.

8+9/5^2
## [1] 8.36

8 + 9 / (5 ^ 2)
## [1] 8.36
8 + (9 / 5) ^ 2
## [1] 11.24
(8 + 9) / 5 ^ 2
## [1] 0.68
By default R will display seven digits but this can be changed using options().
1/7
## [1] 0.1428571
options(digits = 3)
1/7
## [1] 0.143
pi
## [1] 3.14
options(digits = 22)
pi
## [1] 3.141592653589793115998
We can also perform integer divide (%/%) and modulo (%%) functions. The integer divide
function will give the integer part of a fraction while the modulo will provide the remainder.
42 / 4 # regular division
## [1] 10.5
42 %/% 4 # integer division
## [1] 10
42 %% 4 # modulo (remainder)
## [1] 2
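As a small worked example of the two operators together, the sketch below splits a duration in minutes into whole hours and leftover minutes. The variable names and the value 135 are chosen only for illustration.

```r
total_minutes <- 135

hours   <- total_minutes %/% 60   # whole hours: integer part of 135 / 60
minutes <- total_minutes %% 60    # leftover minutes: remainder of 135 / 60

# the two parts always recombine to the original value
recombined <- hours * 60 + minutes

cat(hours, "h", minutes, "min\n")
```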

Miscellaneous Mathematical Functions


There are many built-in functions to be aware of. These include but are not limited to the
following. Go ahead and run this code in your console.
x <- 10
abs(x) # absolute value
sqrt(x) # square root
exp(x) # exponential transformation
log(x) # logarithmic transformation
cos(x) # cosine and other trigonometric functions

Infinite and NaN Numbers:


When performing undefined calculations, R will produce Inf (infinity) and NaN (not a
number) outputs.
1/0 # infinity
## [1] Inf
Inf - Inf # infinity minus infinity
## [1] NaN
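To detect such values in a result, base R provides is.finite(), is.infinite() and is.nan(). A minimal sketch, using a made-up vector named results:

```r
results <- c(1 / 0, -1 / 0, 0 / 0, 42)

finite_flags   <- is.finite(results)    # TRUE only for ordinary numbers
infinite_flags <- is.infinite(results)  # TRUE for Inf and -Inf
nan_flags      <- is.nan(results)       # TRUE only for NaN

print(rbind(finite_flags, infinite_flags, nan_flags))
```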

The workspace environment will also list your user defined objects such as vectors, matrices,
data frames, lists, and functions. For example, if you type the following in your console:
x <- 2
y <- 3
You will now see x and y listed in your workspace environment. To identify or remove the
objects (i.e. vectors, data frames, user defined functions, etc.) in your current R environment:

# list all objects
ls()

# identify if an R object with a given name is present
exists("x")

# remove defined object from the environment
rm(x)

# you can remove multiple objects
rm(x, y)

# basically removes everything in the working environment -- use with caution!
rm(list = ls())

Result:

Thus, we have understood the basics of R programming.


Experiment No: 2

Aim: Implementation of vector and List data objects operations

Theory:

With R, it’s important to understand that there is a difference between the actual
R object and the manner in which that R object is printed to the console. Often, the printed
output may have additional bells and whistles to make the output more friendly to the users.
However, these bells and whistles are not inherently part of the object.
R has five basic or “atomic” classes of objects:
• character
• numeric (real numbers)
• integer
• complex
• logical (True/False)
The most basic type of R object is a vector. Empty vectors can be created with the
vector() function. There is really only one rule about vectors in R, which is that a vector can
only contain objects of the same class. But of course, like any good rule, there is an
exception, which is a list, which we will get to a bit later. A list is represented as a vector but
can contain objects of different classes. Indeed, that’s usually why we use them.
There is also a class for “raw” objects, but they are not commonly used directly in data
analysis.
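The "same class" rule can be seen when values of different classes are passed to c(): R silently converts them to one common class. The sketch below is only an illustration, with made-up variable names.

```r
mixed_num_chr <- c(1.5, "a")    # numeric + character
mixed_lgl_int <- c(TRUE, 2L)    # logical + integer

class_num_chr <- class(mixed_num_chr)   # numbers are converted to text
class_lgl_int <- class(mixed_lgl_int)   # TRUE is converted to 1L

# a list keeps every element in its own class
as_list      <- list(1.5, "a", TRUE)
list_classes <- sapply(as_list, class)

print(class_num_chr)
print(class_lgl_int)
print(list_classes)
```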

Creating Vectors

The c() function can be used to create vectors of objects by concatenating things together.

> x <- c(0.5, 0.6) ## numeric

> x <- c(TRUE, FALSE) ## logical

> x <- c(T, F) ## logical

> x <- c("a", "b", "c") ## character

> x <- 9:29 ## integer

> x <- c(1+0i, 2+4i) ## complex


Note that in the above example, T and F are short-hand ways to specify TRUE and FALSE.
However, in general one should try to use the explicit TRUE and FALSE values when
indicating logical values. The T and F values are primarily there for when you’re feeling lazy.

You can also use the vector() function to initialize vectors.

> x <- vector("numeric", length = 10)

>x

[1] 0 0 0 0 0 0 0 0 0 0

A vector is an object that contains a set of values called its elements.

Numeric vector

x <- c(1,2,3,4,5,6)

The operator <- is equivalent to the "=" sign.

Character vector

State <- c("DL", "MU", "NY", "DL", "NY", "MU")

To calculate the frequency of each value in the State vector, you can use the table function.

To calculate the mean of a vector, you can use the mean function.

If a vector contains an NA (not available) value, the mean function returns NA.

To calculate the mean of a vector excluding NA values, include the na.rm = TRUE
parameter in the mean function.

You can use subscripts to refer to elements of a vector.
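The operations described above can be sketched as follows. The State vector is the one defined earlier; the scores vector is a hypothetical example containing one missing value.

```r
State <- c("DL", "MU", "NY", "DL", "NY", "MU")
state_counts <- table(State)        # frequency of each distinct value

scores <- c(10, 20, NA, 40)         # hypothetical vector with one NA
mean_with_na <- mean(scores)        # NA, because one value is missing
mean_no_na   <- mean(scores, na.rm = TRUE)   # mean of 10, 20 and 40

first_score <- scores[1]            # subscript: first element
some_scores <- scores[c(2, 4)]      # subscript: second and fourth elements

print(state_counts)
print(mean_no_na)
```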

Convert a column "x" to numeric

data$x = as.numeric(data$x)

Some useful vectors can be created quickly with R. The colon operator is used to generate integer sequences:


> 1:10

[1] 1 2 3 4 5 6 7 8 9 10

> -3:4

[1] -3 -2 -1 0 1 2 3 4

> 9:5

[1] 9 8 7 6 5

More generally, the function seq() can generate any arithmetic progression.

> seq(from=2, to=6, by=0.4)

[1] 2.0 2.4 2.8 3.2 3.6 4.0 4.4 4.8 5.2 5.6 6.0

> seq(from=-1, to=1, length=6)

[1] -1.0 -0.6 -0.2 0.2 0.6 1.0

Sometimes it’s necessary to have repeated values, for which we use rep()

> rep(5,3)

[1] 5 5 5

> rep(2:5,each=3)

[1] 2 2 2 3 3 3 4 4 4 5 5 5

> rep(-1:3, length.out=10)

[1] -1 0 1 2 3 -1 0 1 2 3
We can also use R’s vectorization to create more interesting sequences:

> 2^(0:10)

[1] 1 2 4 8 16 32 64 128 256 512 1024

> 1:3 + rep(seq(from=0,by=10,to=30), each=3)

[1] 1 2 3 11 12 13 21 22 23 31 32 33

Lists:

A list allows you to store a variety of objects.

You can use subscripts to select the specific component of the list.
> x <- list(1:3, TRUE, "Hello", list(1:2, 5))

Here x has 4 elements: a numeric vector, a logical, a string and another list.

We can select an entry of x with double square brackets:

> x[[3]]

[1] "Hello"

To get a sub-list, use single brackets:

> x[c(1,3)]

[[1]]

[1] 1 2 3

[[2]]

[1] "Hello"

Notice the difference between x[[3]] and x[3].
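One way to see the difference is to inspect the class of each result, as in this small sketch using the same list x:

```r
x <- list(1:3, TRUE, "Hello", list(1:2, 5))

element_class <- class(x[[3]])   # the element itself: a character vector
sublist_class <- class(x[3])     # a list of length one that holds the element

# functions that expect a string work on x[[3]] directly
hello_length <- nchar(x[[3]])

print(element_class)
print(sublist_class)
```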

We can also name some or all of the entries in our list, by supplying argument names to list():

> x <- list(y=1:3, TRUE, z="Hello")

>x

$y

[1] 1 2 3

[[2]]

[1] TRUE

$z

[1] "Hello"
Notice that the [[1]] has been replaced by $y, which gives us a clue as to how we can recover the entries by their name. We can still use the numeric position if we prefer:

> x$y

[1] 1 2 3

> x[[1]]
[1] 1 2 3

The function names() can be used to obtain a character vector of all the names of objects in a list.

> names(x)

[1] "y" "" "z"

Result:

Thus, we have implemented vector and list data object operations using R.
Experiment No. 3

Aim: Implementation of various operations on matrix, array and factors in R

Theory:
Matrices are much used in statistics, and so play an important role in R. To create a matrix
use the function matrix(), specifying elements by column first:

> matrix(1:12, nrow=3, ncol=4)

[,1] [,2] [,3] [,4]

[1,] 1 4 7 10

[2,] 2 5 8 11

[3,] 3 6 9 12

This is called column-major order. Of course, we need only give one of the dimensions:

> matrix(1:12, nrow=3)

unless we want vector recycling to help us:

> matrix(1:3, nrow=3, ncol=4)

[,1] [,2] [,3] [,4]

[1,] 1 1 1 1

[2,] 2 2 2 2

[3,] 3 3 3 3

Sometimes it’s useful to specify the elements by row first

> matrix(1:12, nrow=3, byrow=TRUE)
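To compare the two fill orders side by side, the following sketch (variable names are illustrative) looks at the first row of each matrix and checks that filling by row gives the transpose of a column-filled matrix with the dimensions swapped.

```r
by_col <- matrix(1:12, nrow = 3)                 # default: fill down the columns
by_row <- matrix(1:12, nrow = 3, byrow = TRUE)   # fill along the rows

first_row_by_col <- by_col[1, ]
first_row_by_row <- by_row[1, ]

# filling by row equals transposing a 4-row matrix filled by column
same_as_transpose <- identical(by_row, t(matrix(1:12, nrow = 4)))

print(by_row)
```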

There are special functions for constructing certain matrices:

> diag(3)

[,1] [,2] [,3]

[1,] 1 0 0

[2,] 0 1 0

[3,] 0 0 1
> diag(1:3)

[,1] [,2] [,3]

[1,] 1 0 0

[2,] 0 2 0

[3,] 0 0 3

> 1:5 %o% 1:5

[,1] [,2] [,3] [,4] [,5]

[1,] 1 2 3 4 5

[2,] 2 4 6 8 10

[3,] 3 6 9 12 15

[4,] 4 8 12 16 20

[5,] 5 10 15 20 25

The last operator performs an outer product, so it creates a matrix whose (i, j)-th entry is x_i * y_j.
The function outer() generalizes this to any function f of two arguments, to create a matrix
with entries f(x_i, y_j). (More on functions later.)

> outer(1:3, 1:4, "+")

[,1] [,2] [,3] [,4]

[1,] 2 3 4 5

[2,] 3 4 5 6

[3,] 4 5 6 7

Matrix multiplication is performed using the operator %*%, which is quite distinct from element-wise multiplication *.

> A <- matrix(c(1:8,10), 3, 3)

> x <- c(1,2,3)

> A %*% x # matrix multiplication

[,1]
[1,] 30

[2,] 36

[3,] 45

> A*x # NOT matrix multiplication

[,1] [,2] [,3]

[1,] 1 4 7

[2,] 4 10 16

[3,] 9 18 30

Standard functions exist for common mathematical operations on matrices

> t(A) # transpose

[,1] [,2] [,3]

[1,] 1 2 3

[2,] 4 5 6

[3,] 7 8 10

> det(A) # determinant

[1] -3

> diag(A) # diagonal

[1] 1 5 10

> solve(A) # inverse

[,1] [,2] [,3]

[1,] -0.6667 -0.6667 1

[2,] -1.3333 3.6667 -2

[3,] 1.0000 -2.0000 1

Array:

Of course, if we have a data set consisting of more than two pieces of categorical information
about each subject, then a matrix is not sufficient. The generalization of matrices to higher
dimensions is the array. Arrays are defined much like matrices, with a call to the array()
command. Here is a 2 × 3 × 3 array:

> arr = array(1:18, dim=c(2,3,3))

> arr

,,1

[,1] [,2] [,3]

[1,] 1 3 5

[2,] 2 4 6

,,2

[,1] [,2] [,3]

[1,] 7 9 11

[2,] 8 10 12

,,3

[,1] [,2] [,3]

[1,] 13 15 17

[2,] 14 16 18

Each 2-dimensional slice defined by the last co-ordinate of the array is shown as a 2 × 3 matrix.
Note that we no longer specify the number of rows and columns separately, but use a single
vector dim whose length is the number of dimensions. You can recover this vector with the
dim() function.

> dim(arr)

[1] 2 3 3

Note that a 2-dimensional array is identical to a matrix. Arrays can be subsetted and modified in exactly the same way as a matrix, only using the appropriate number of co-ordinates:

> arr[1,2,3]

[1] 15

> arr[,2,]
[,1] [,2] [,3]

[1,] 3 9 15

[2,] 4 10 16

> arr[1,1,] = c(0,-1,-2) # change some values

> arr[,,1,drop=FALSE]

,,1

[,1] [,2] [,3]

[1,] 0 3 5

[2,] 2 4 6

Factors

R has a special data structure to store categorical variables. It tells R that a variable is
nominal or ordinal by making it a factor.

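In its simplest form, the factor function is given only the vector to convert. In its
fuller form, it is also given the values to treat as levels and the labels to show for them.

The factor function has three parameters:

1. Vector name
2. Values (optional)
3. Value labels (optional)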
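The three parameters listed above correspond to the x, levels and labels arguments of factor(). The sketch below uses a hypothetical coded variable gender_codes and a hypothetical ordinal variable size; neither comes from the manual.

```r
gender_codes <- c(1, 2, 2, 1, 2)   # hypothetical coded data

# simplest form: only the vector is supplied
simple <- factor(gender_codes)

# fuller form: vector, values (levels) and value labels
labelled <- factor(gender_codes,
                   levels = c(1, 2),
                   labels = c("Male", "Female"))

# an ordinal variable: the order of the levels is meaningful
size <- factor(c("low", "high", "medium", "low"),
               levels = c("low", "medium", "high"),
               ordered = TRUE)
low_before_high <- size[1] < size[2]   # comparisons work for ordered factors

print(levels(simple))
print(table(labelled))
```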
Convert a column "x" to factor

data$x = as.factor(data$x)

Result:

Thus, we have implemented various operations on matrices, arrays and factors in R.
Experiment No. 4

Aim: To implement data frames and perform various operations on them in R


Theory:

A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.

• Data frames are tabular data objects.

• A Data frame is a list of vectors of equal length.

• Data frame in R is used for storing data tables.

Characteristics of a data frame:

1. The column names should be non-empty.

2. The row names should be unique.

3. The data stored in a data frame can be of numeric, factor or character type.

Create Data Frame

# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)

When we execute the above code, it produces the following result –


emp_id emp_name salary start_date
1 1 Rick 623.30 2012-01-01
2 2 Dan 515.20 2013-09-23
3 3 Michelle 611.00 2014-11-15
4 4 Ryan 729.00 2014-05-11
5 5 Gary 843.25 2015-03-27
Get the Structure of the Data Frame

The structure of the data frame can be seen by using str() function.

# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Get the structure of the data frame.
str(emp.data)

When we execute the above code, it produces the following result –

'data.frame': 5 obs. of 4 variables:
 $ emp_id    : int 1 2 3 4 5
 $ emp_name  : chr "Rick" "Dan" "Michelle" "Ryan" ...
 $ salary    : num 623 515 611 729 843
 $ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...

Summary of Data in Data Frame

The statistical summary and nature of the data can be obtained by applying summary()
function.

# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the summary.
print(summary(emp.data))
When we execute the above code, it produces the following result −

     emp_id    emp_name             salary        start_date
 Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01
 1st Qu.:2   Class :character   1st Qu.:611.0   1st Qu.:2013-09-23
 Median :3   Mode  :character   Median :623.3   Median :2014-05-11
 Mean   :3                      Mean   :664.4   Mean   :2014-01-14
 3rd Qu.:4                      3rd Qu.:729.0   3rd Qu.:2014-11-15
 Max.   :5                      Max.   :843.2   Max.   :2015-03-27

Extract Data from Data Frame:

# Extract Specific columns.

result <- data.frame(emp.data$emp_name,emp.data$salary)

print(result)

When we execute the above code, it produces the following result −


emp.data.emp_name emp.data.salary
1 Rick 623.30
2 Dan 515.20
3 Michelle 611.00
4 Ryan 729.00
5 Gary 843.25

# Extract first two rows.

result <- emp.data[1:2,]

print(result)
When we execute the above code, it produces the following result −

emp_id emp_name salary start_date
1 1 Rick 623.3 2012-01-01
2 2 Dan 515.2 2013-09-23

# Extract 3rd and 5th row with 2nd and 4th column.

result <- emp.data[c(3,5),c(2,4)]

print(result)

When we execute the above code, it produces the following result −

emp_name start_date
3 Michelle 2014-11-15
5 Gary 2015-03-27

Expand Data Frame

A data frame can be expanded by adding columns and rows.

1. Add Column

Just add the column vector using a new column name.


# Add the "dept" column.

emp.data$dept <- c("IT","Operations","IT","HR","Finance")

v <- emp.data

print(v)

When we execute the above code, it produces the following result –

emp_id emp_name salary start_date dept
1 1 Rick 623.30 2012-01-01 IT
2 2 Dan 515.20 2013-09-23 Operations
3 3 Michelle 611.00 2014-11-15 IT
4 4 Ryan 729.00 2014-05-11 HR
5 5 Gary 843.25 2015-03-27 Finance

2. Add Row
To add more rows permanently to an existing data frame, we need to bring in the new rows
in the same structure as the existing data frame and use the rbind() function.

In the example below we create a data frame with new rows and merge it with the existing
data frame to create the final data frame.

# Create the second data frame


emp.newdata <- data.frame(
emp_id = c (6:8),
emp_name = c("Rasmi","Pranab","Tusar"),
salary = c(578.0,722.5,632.8),
start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
dept = c("IT","Operations","Finance"),
stringsAsFactors = FALSE
)

# Bind the two data frames.


emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)
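The requirement that new rows have "the same structure" can be checked on a small scale. The sketch below uses three tiny hypothetical data frames (old_rows, new_rows, bad_rows) rather than the employee data, and shows that rbind() stops with an error when the column names do not match.

```r
old_rows <- data.frame(id = 1:2, name = c("A", "B"), stringsAsFactors = FALSE)
new_rows <- data.frame(id = 3L, name = "C", stringsAsFactors = FALSE)

# same column names and types: the rows are appended
combined <- rbind(old_rows, new_rows)

# different column names: rbind() signals an error, caught here for illustration
bad_rows <- data.frame(id = 4L, label = "D", stringsAsFactors = FALSE)
mismatch <- tryCatch(rbind(old_rows, bad_rows),
                     error = function(e) "error")

print(combined)
print(mismatch)
```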

Conclusion:

Thus, the Implementation and various operations on data frames are performed in R.
Experiment No. 5

Aim: To Create Sample (Dummy) Data in R and perform data manipulation with R

Theory:

This covers how to execute most frequently used data manipulation tasks with R. It includes
various examples with datasets and code. It gives you a quick look at several functions used
in R.

Drop data frame columns by name:

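DF <- data.frame(x = 1:10, y = 10:1, z = rep(5, 10), a = 11:20)

# drop several columns by name
drops <- c("x", "z")
DF[ , !(names(DF) %in% drops)]

# OR keep only the columns you want
keeps <- c("y", "a")
DF[keeps]

# DF itself is unchanged unless the result is assigned back
DF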

Order function for sort:

d3 = data.frame(roll = c(2,4,6,3,1,5),
                name = c('a','b','c','d','e','e'),
                marks = c(44,55,22,33,66,77))
d3

d3[order(d3$roll), ]

# OR

d3[with(d3, order(roll)), ]

Subsets:
roll = c(1:5)
names = c(letters[1:5])
marks = c(12,33,44,55,66)
d4 = data.frame(roll, names, marks)
sub1 = subset(d4, marks > 33 & roll > 4)
sub1
sub1 = subset(d4, marks > 33 & roll > 4, select = c(roll, names))
sub1

Drop factor levels in a subsetted data frame:

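# stringsAsFactors = TRUE is needed in R 4.0 and later for letters to be a factor
df <- data.frame(letters = letters[1:5], numbers = 1:5, stringsAsFactors = TRUE)
df
levels(df$letters)
sub2 = subset(df, numbers > 3)
sub2
levels(sub2$letters)                  # unused levels are still present
sub2$letters = factor(sub2$letters)   # re-create the factor to drop them
levels(sub2$letters)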

Rename Columns in R
colnames(d3)[colnames(d3) == "roll"] = "ID"

Sorting a vector
x= sample(1:50)
x = sort(x, decreasing = TRUE)
The function sort() is used for sorting a one-dimensional vector. It is not meant for objects with more than
one dimension, such as data frames; use order() for those.
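The difference between sort() and order() can be sketched as follows. The vector values and the students data frame are hypothetical examples.

```r
values <- c(44, 55, 22)

sorted_values <- sort(values)    # the values themselves, in increasing order
positions     <- order(values)   # the positions that would sort the vector

# order() is what makes it possible to sort the rows of a data frame
students <- data.frame(name = c("a", "b", "c"), marks = values,
                       stringsAsFactors = FALSE)
sorted_students <- students[order(students$marks), ]

print(sorted_students)
```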

Dealing with missing data

We assume mydata is a data frame that is already available.


Number of missing values in a variable
colSums(is.na(mydata))
Number of missing values in a row
rowSums(is.na(mydata))
List rows of data that have missing values
mydata[!complete.cases(mydata),]
Creating a new dataset without missing data
mydata1 <- na.omit(mydata)
Convert a value to missing
mydata[mydata$Q1==999,"Q1"] <- NA
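To try these commands, a small dummy data frame can be created first. The sketch below assumes two made-up columns, Q1 and Q2, where 999 is a code for "missing". It wraps the condition in which() because, as far as I know, a logical row index that itself contains NA can make the data frame assignment fail.

```r
# hypothetical dummy data: 999 is a sentinel code, NA is already missing
mydata <- data.frame(
  Q1 = c(3, 999, 5, NA),
  Q2 = c(1, 2, NA, 4)
)

# convert the sentinel value to NA; which() ignores rows where Q1 is NA
mydata[which(mydata$Q1 == 999), "Q1"] <- NA

na_per_column <- colSums(is.na(mydata))          # missing values per variable
na_per_row    <- rowSums(is.na(mydata))          # missing values per row
incomplete    <- mydata[!complete.cases(mydata), ]   # rows with any NA
mydata1       <- na.omit(mydata)                 # rows with no NA

print(na_per_column)
print(mydata1)
```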
Experiment No. 6

Write an R program to take input from the user (name and age) and display the values. Also
print the version of the R installation.

R Programming Code :

name = readline(prompt="Input your name: ")


age = readline(prompt="Input your age: ")
print(paste("My name is",name, "and I am",age ,"years old."))
print(R.version.string)

Sample Output:

Input your name:
Input your age:
[1] "My name is  and I am  years old."
[1] "R version 3.4.4 (2018-03-15)"
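The sample output above shows empty values because readline() returns an empty string when no input is typed, for example when the script is not run interactively. readline() also always returns text, so the age is a character string. A minimal sketch of converting it to a number follows; parse_age is a hypothetical helper, not part of the original program.

```r
# hypothetical helper: turn the text returned by readline() into an integer age
parse_age <- function(text) {
  age <- suppressWarnings(as.integer(text))   # non-numeric text becomes NA
  if (is.na(age) || age < 0) NA_integer_ else age
}

valid_age   <- parse_age("21")
invalid_age <- parse_age("abc")
empty_age   <- parse_age("")     # what readline() gives with no input

print(valid_age)
```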

Result:
Thus, the program is executed successfully.
Experiment No. 7

Write an R program to create a sequence of numbers from 20 to 50 and find the mean of
numbers from 20 to 60 and sum of numbers from 51 to 91.

R Programming Code :

print("Sequence of numbers from 20 to 50:")


print(seq(20,50))
print("Mean of numbers from 20 to 60:")
print(mean(20:60))
print("Sum of numbers from 51 to 91:")
print(sum(51:91))

Sample Output:

[1] "Sequence of numbers from 20 to 50:"
[1] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
[26] 45 46 47 48 49 50
[1] "Mean of numbers from 20 to 60:"
[1] 40
[1] "Sum of numbers from 51 to 91:"
[1] 2911
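These results can be cross-checked with the arithmetic series formulas: the sum of consecutive integers is n * (first + last) / 2 and their mean is (first + last) / 2. A short sketch, with illustrative variable names:

```r
n_terms <- length(51:91)                 # number of terms from 51 to 91

formula_sum  <- n_terms * (51 + 91) / 2  # n * (first + last) / 2
formula_mean <- (20 + 60) / 2            # (first + last) / 2

print(c(n_terms = n_terms, formula_sum = formula_sum, formula_mean = formula_mean))
```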

Result:
Thus, the program is executed successfully.
Experiment No. 8

Write an R program to create three vectors a, b, c with 3 integers. Combine the three vectors to
become a 3×3 matrix where each column represents a vector. Print the content of the matrix.

R Programming Code :

a <- c(1,2,3)
b <- c(4,5,6)
c <- c(7,8,9)
m<-cbind(a,b,c)
print("Content of the said matrix:")
print(m)

Sample Output:

[1] "Content of the said matrix:"
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9

Result:
Thus, the program is executed successfully.
Experiment No. 9

Write an R program to concatenate two given matrices with the same number of columns but different rows.

R Programming Code :

x = matrix(1:12, ncol=3)
y = matrix(13:24, ncol=3)
print("Matrix-1")
print(x)
print("Matrix-2")
print(y)
# rbind() stacks the rows of y below x; dim() reports the size of the result
result = dim(rbind(x,y))
print("After concatenating two given matrices:")
print(result)

Sample Output:

[1] "Matrix-1"
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
[1] "Matrix-2"
     [,1] [,2] [,3]
[1,]   13   17   21
[2,]   14   18   22
[3,]   15   19   23
[4,]   16   20   24
[1] "After concatenating two given matrices:"
[1] 8 3

Result:
Thus, the program is executed successfully.
Experiment No. 10

Write an R program to create a data frame from four given vectors.

R Programming Code :

name = c('Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
'Jonas')
score = c(12.5, 9, 16.5, 12, 9, 20, 14.5, 13.5, 8, 19)
attempts = c(1, 3, 2, 3, 2, 3, 1, 1, 2, 1)
qualify = c('yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes')
print("Original data frame:")
print(name)
print(score)
print(attempts)
print(qualify)
df = data.frame(name, score, attempts, qualify)
print(df)

Sample Output:

[1] "Original data frame:"
[1] "Anastasia" "Dima" "Katherine" "James" "Emily" "Michael"
[7] "Matthew" "Laura" "Kevin" "Jonas"
[1] 12.5 9.0 16.5 12.0 9.0 20.0 14.5 13.5 8.0 19.0
[1] 1 3 2 3 2 3 1 1 2 1
[1] "yes" "no" "yes" "no" "no" "yes" "yes" "no" "no" "yes"
name score attempts qualify
1 Anastasia 12.5 1 yes
2 Dima 9.0 3 no
3 Katherine 16.5 2 yes
4 James 12.0 3 no
5 Emily 9.0 2 no
6 Michael 20.0 3 yes
7 Matthew 14.5 1 yes
8 Laura 13.5 1 no
9 Kevin 8.0 2 no
10 Jonas 19.0 1 yes
Result:
Thus, the program is executed successfully.
Experiment No. 11

Write an R program to sort a given data frame by multiple column(s).

R Programming Code :

exam_data = data.frame(
name = c('Anastasia', 'Amsa', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew'),
score = c(12.5, 9, 16.5, 12, 9, 20, 14.5),
attempts = c(1, 3, 2, 3, 2, 3, 1),
qualify = c('yes', 'no', 'yes', 'no', 'no', 'yes', 'yes')
)
print("Original dataframe:")
print(exam_data)
print("dataframe after sorting 'name' and 'score' columns:")
exam_data = exam_data[with(exam_data, order(name, score)), ]
print(exam_data)

Sample Output:
[1] "Original dataframe:"
name score attempts qualify
1 Anastasia 12.5 1 yes
2 Amsa 9.0 3 no
3 Katherine 16.5 2 yes
4 James 12.0 3 no
5 Emily 9.0 2 no
6 Michael 20.0 3 yes
7 Matthew 14.5 1 yes
[1] "dataframe after sorting 'name' and 'score' columns:"
name score attempts qualify
2 Amsa 9.0 3 no
1 Anastasia 12.5 1 yes
5 Emily 9.0 2 no
4 James 12.0 3 no
3 Katherine 16.5 2 yes
7 Matthew 14.5 1 yes
6 Michael 20.0 3 yes
Result:
Thus, the program is executed successfully.
Experiment No. 12

Write an R program to count the number of NA values in a data frame column.

R Programming Code :

exam_data = data.frame(
name = c('Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
'Jonas'),
score = c(12.5, 9, 16.5, 12, 9, 20, 14.5, 13.5, 8, 19),
attempts = c(1, NA, 2, NA, 2, NA, 1, NA, 2, 1),
qualify = c('yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes')
)
print("Original dataframe:")
print(exam_data)
print("The number of NA values in attempts column:")
print(sum(is.na(exam_data$attempts)))

Sample Output:
[1] "Original dataframe:"
name score attempts qualify
1 Anastasia 12.5 1 yes
2 Dima 9.0 NA no
3 Katherine 16.5 2 yes
4 James 12.0 NA no
5 Emily 9.0 2 no
6 Michael 20.0 NA yes
7 Matthew 14.5 1 yes
8 Laura 13.5 NA no
9 Kevin 8.0 2 no
10 Jonas 19.0 1 yes
[1] "The number of NA values in attempts column:"
[1] 4

Result:
Thus, the program is executed successfully.
Experiment No. 13

Bar Plot
Bar plots come in two types, horizontal and vertical, which represent data points as horizontal
or vertical bars whose lengths are proportional to the value of the data item. They are generally
used to plot a continuous variable against a categorical variable. Setting the horiz parameter to
TRUE or FALSE gives a horizontal or vertical bar plot respectively.

13. Write an R program to create a simple bar plot of four subjects’ marks.
marks = c(70, 95, 80, 74)
barplot(marks, main = "Comparing marks of 4 subjects",
xlab = "Subject", ylab = "Marks",
names.arg = c("English", "Science", "Math.", "Hist."),
col = "darkred",
horiz = FALSE)
Output:
> marks = c(70, 95, 80, 74)
> barplot(marks, main = "Comparing marks of 4 subjects",
+ xlab = "Subject",
+ ylab = "Marks",
+ names.arg = c("English", "Science", "Math.", "Hist."),
+ col = "darkred",
+ horiz = FALSE)

Result:

Thus, the bar chart is created successfully.


Experiment No. 14

14. Write an R program to create a simple bar plot for ozone concentration in air with the “airquality”
dataset.
# Horizontal Bar Plot for
# Ozone concentration in air
barplot(airquality$Ozone,
main = 'Ozone Concentration in air',
xlab = 'ozone levels', horiz = TRUE)
# Vertical Bar Plot for
# Ozone concentration in air
barplot(airquality$Ozone, main = 'Ozone Concentration in air',
ylab = 'ozone levels', col = 'blue', horiz = FALSE)

Result:

Thus, the bar chart is created successfully.


Experiment No. 15

A histogram is like a bar chart in that it uses bars of varying height to represent the distribution of data.
However, in a histogram, values are grouped into consecutive intervals called bins. Continuous
values are grouped and displayed in these bins, whose size can be varied.
For a histogram, the parameter xlim can be used to specify the interval within which all values
are to be displayed.
Another parameter, freq, when set to TRUE shows the frequency (count) of values in each bin on
the y-axis; when set to FALSE, probability densities are shown on the y-axis, such
that the total area of the histogram adds up to one.
Histograms are used in the following scenarios:
• To verify an equal and symmetric distribution of the data.
• To identify deviations from expected values.
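The relationship between counts and densities can be inspected without drawing anything, by calling hist() with plot = FALSE and looking at the returned object. This is only a sketch using the default bins; the variable names are illustrative.

```r
# compute the bins for the airquality temperatures without plotting
h <- hist(airquality$Temp, plot = FALSE)

total_count <- sum(h$counts)                    # what freq = TRUE displays, summed
area        <- sum(h$density * diff(h$breaks))  # density times bin width, summed

print(h$breaks)
print(h$counts)
print(area)
```

With freq = TRUE the bar heights are the counts, which add up to the number of observations; with freq = FALSE the heights are the densities, and it is the bar areas that add up to one.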
15. Write an R program to create a histogram for maximum daily temperature with the “airquality”
dataset.

hist(airquality$Temp,
main = "La Guardia Airport's\nMaximum Temperature (Daily)",
xlab = "Temperature (Fahrenheit)",
xlim = c(50, 125), col = "yellow",
freq = TRUE)

Result:

Thus, the histogram is created successfully.


Experiment No. 16

Box Plot
The statistical summary of the given data is presented graphically using a boxplot. A boxplot
depicts information like the minimum and maximum data point, the median value, first and third
quartile, and interquartile range.

Box plots are used:

• To give a comprehensive statistical description of the data through a visual cue.
• To identify outlier points that lie far outside the inter-quartile range of the data.
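The numbers that a box plot draws can also be obtained directly with boxplot.stats(), which returns the five summary values and the points treated as outliers. A minimal sketch on the Wind variable, with illustrative variable names:

```r
wind_stats <- boxplot.stats(airquality$Wind)

# lower whisker, lower hinge, median, upper hinge, upper whisker
five_values <- wind_stats$stats

# observations drawn as individual points beyond the whiskers
outliers <- wind_stats$out

print(five_values)
print(outliers)
```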

16. Write an R program to create a boxplot for the variable “wind” with the “airquality” dataset.

# Box plot for average wind speed
data(airquality)

boxplot(airquality$Wind,
main = "Average wind speed\nat La Guardia Airport",
xlab = "Miles per hour", ylab = "Wind",
col = "orange", border = "brown",
horizontal = TRUE, notch = TRUE)
# Multiple Box plots, each representing
# an Air Quality Parameter (columns 1 to 4)
boxplot(airquality[, 1:4],
main = 'Box Plots for Air Quality Parameters')

Result:

Thus, the boxplot is created successfully.
