R-Programming Unit 1 & 2 Compiled

The document provides an introduction to R programming and its features, emphasizing its capabilities for statistical analysis and data visualization. It covers the basics of the R environment, including the RStudio interface, data types, data structures, and basic operations. Additionally, it explains how to create and manipulate variables, vectors, and functions within R.

Uploaded by

farawey903

Introduction to Analytics and R-Programming
Introduction to R

 R is a programming language and software environment for statistical analysis, graphical representation and reporting.
 R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Core Team.
 The core of R is an interpreted computer language that allows branching and looping as well as modular programming using functions.
 For efficiency, R can integrate with procedures written in C, C++, .NET, Python or Fortran.
Features of R

R is a programming language and software environment for statistical analysis, graphical representation and reporting. The following are the important features of R:
 A well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
 An effective data handling and storage facility.
 A suite of operators for calculations on arrays, lists, vectors and matrices.
 A large, coherent and integrated collection of tools for data analysis.
 Graphical facilities for data analysis and display, either on screen or printed on paper.
 Before you can ask your computer to save some numbers, you’ll need to know how to talk to it.
 That’s where R and RStudio come in. RStudio gives you a way to talk to your computer. R gives you a language to speak in.
 To get started, open RStudio just as you would open any other application on your computer.
 When you do, the RStudio window should appear on your screen.
[Figure: the RStudio screen]
The RStudio interface is simple. You type R code into the bottom line of the RStudio console pane and then press Enter to run it.
The code you type is called a command, because it will command your computer to do something for you. The line you type it into is called the command line.
When you type a command at the prompt and hit Enter, your computer executes the command and shows you the results. Then RStudio displays a fresh prompt for your next command. For example, if you type 2 + 3 and hit Enter, RStudio will display:
> 2 + 3
[1] 5
Did you notice that a [1] appears next to your result?
R is just letting you know that this line begins with the first value in your result. Some commands return more than one value, and their results may fill up multiple lines. For example, the command 100:150 returns 51 values; it creates a sequence of integers from 100 to 150. When the output wraps, new bracketed numbers appear at the start of the second and third lines. If each line holds 24 values, for instance, these numbers mean that the second line begins with the 25th value in the result, and the third line begins with the 49th value. You can mostly ignore the numbers that appear in brackets.
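As a quick check of the description above, the following sketch confirms that 100:150 holds 51 values and looks up the values at positions 25 and 49. The variable names are only illustrative. Where the printed output wraps depends on the width of your console, so the bracketed numbers you see may differ.

```r
# Build the sequence once and inspect it by position
x <- 100:150

n_values <- length(x)   # number of values in the sequence
value_1  <- x[1]        # value that a [1] label would point at
value_25 <- x[25]       # value that a [25] label would point at
value_49 <- x[49]       # value that a [49] label would point at

cat("values:", n_values, "\n")
cat("1st, 25th, 49th:", value_1, value_25, value_49, "\n")
```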
 If you type an incomplete command and press Enter, R will display a + prompt, which means it is waiting for you to type the rest of your command. Either finish the command or hit Escape to start over:
> 5 -
+ 1
[1] 4
 If you type a command that R doesn’t recognize, R will return an error message. If you ever see an error message, don’t panic. R is just telling you that your computer couldn’t understand or do what you asked it to do. You can then try a different command at the next prompt:
> 3 % 5
Error: unexpected input in "3 % 5"
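Errors can also be caught inside a script rather than only read at the prompt. The sketch below is one way to do that with tryCatch(); the name no_such_object is a hypothetical, deliberately undefined variable chosen for illustration.

```r
# 'no_such_object' is a hypothetical name that has never been assigned,
# so evaluating it raises an error that tryCatch() intercepts.
msg <- tryCatch(
  no_such_object + 1,
  error = function(e) conditionMessage(e)  # return the error text instead of stopping
)

cat("R reported:", msg, "\n")
```

The script keeps running after the error, and msg holds the text R would otherwise have printed.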
The R Console

 The R console is the most important tool for using R. The R console is a tool that allows you to type commands into R and see how the R system responds. The commands that you type into the console are called expressions. A part of the R system called the interpreter will read the expressions and respond with a result or an error message. Sometimes, you can also enter an expression into R through the menus.
 By default, R will display a greater-than sign (“>”) in the console (at the beginning of a line, when nothing else is shown) when R is waiting for you to enter a command into the console. R is prompting you to type something, so this is called a prompt.
Everything in R is an object.
R has 6 basic data types. (In addition to the five listed below, there
is also raw which will not be discussed in this workshop.)
 character
 numeric (real or decimal)
 integer
 logical
 complex
Data structures in R

R has many data structures. These include:
 vector
 list
 matrix
 data frame
 factor
Data Type    Example                        Syntax
Logical      TRUE, FALSE                    a <- TRUE;    class(a)
Numeric      12.3, 5, 999                   b <- 13.5;    class(b)
Integer      2L, 34L, 0L                    c <- 3L;      class(c)
Complex      3 + 2i                         d <- 2 + 5i;  class(d)
Character    'a', "good", "TRUE", '23.4'    e <- "23";    class(e)

 Atomic vectors come in six data types, also called the six classes of vectors. The other R objects are built upon the atomic vectors.
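The table above can be checked directly with class(). This is a minimal sketch; the variable names are illustrative, and the raw value is included only to show the sixth type that the slides mention but do not discuss.

```r
# One value of each atomic type
a <- TRUE          # logical
b <- 13.5          # numeric (double)
c_int <- 3L        # integer (the L suffix marks an integer literal)
d <- 2 + 5i        # complex
e <- "23"          # character, even though it looks like a number
f <- as.raw(255)   # raw (bytes)

classes <- c(class(a), class(b), class(c_int), class(d), class(e), class(f))
print(classes)
```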
Basic Functions in R console

 Creating an R file
 Saving an R file
 Clearing the Console: The console can be cleared using the shortcut key “Ctrl + L”.
Execution of an R file:

The commands in an R file can be executed in several ways.
 Using the run command: The “run” command can be executed from the GUI by pressing the Run button, or with the shortcut key Ctrl + Enter.
 What does it do? It executes the line on which the cursor is placed.
 Using the source with echo command: The “source with echo” command can be executed from the GUI by pressing the Source with Echo button, or with the shortcut key Ctrl + Shift + Enter.
 What does it do? It prints the commands as well as the output they produce.
 Clearing the Environment: Variables in the R environment can be cleared in two ways:
 Using the rm() command: To clear a single variable from the R environment, call rm() with the name of the variable you want to remove. Typing rm(variable) deletes that variable. To delete all the variables in the environment, pass the argument list = ls(), that is, rm(list = ls()).
 Using the GUI: We can also clear all the variables in the environment from the environment pane by using the brush button.
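The two uses of rm() described above can be sketched as follows. To avoid wiping your own workspace, this example works in a separate scratch environment (an assumption made for safety, not something the slides do); at the console you would normally just type rm(a) and rm(list = ls()).

```r
# A scratch environment stands in for the global environment
scratch <- new.env()
assign("a", 1, envir = scratch)
assign("b", 2, envir = scratch)
assign("c", 3, envir = scratch)

# Remove a single variable by name
rm("a", envir = scratch)
remaining <- ls(scratch)

# Remove everything that is left: list = ls(...) supplies all the names
rm(list = ls(scratch), envir = scratch)
left_over <- ls(scratch)

print(remaining)
print(left_over)
```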
Run command over Source command:

 Run can be used to execute the selected lines of R code.
 Source with echo can be used to run the whole file.
 The advantage of using Run is that you can troubleshoot or debug the program when something is not behaving according to your expectations.
 The disadvantage of using the Run command is that it fills the console with output and makes it unnecessarily messy.
Syntax of R program

 A program in R is made up of three things: Variables, Comments, and Keywords. Variables are used to store the data, Comments are used to improve code readability, and Keywords are reserved words that hold a specific meaning to the interpreter.
Variables

 Like most other languages, R lets you assign values to variables and refer to them by name.
 In R, the assignment operator is <-. Usually, this is pronounced as “gets.” For example, the statement:
x <- 1
is usually read as “x gets 1.”
 After you assign a value to a variable, the R interpreter will substitute that value in place of the variable name when it evaluates an expression.
Comments in R

 Comments are a way to improve your code’s readability. They are meant only for the reader, so the interpreter ignores them. Only single-line comments are available in R; they are written by placing # at the beginning of the comment.
Keywords in R

Keywords are words reserved by the language because they have a special meaning, so a keyword can’t be used as a variable name, function name, etc. We can view these keywords by using either help(reserved) or ?reserved.
 Here’s a simple example of variable assignment:
> x <- 1
> y <- 2
> z <- c(x,y)
> z
[1] 1 2
Notice that the substitution is done at the time that the value is assigned to z, not at the time that z is evaluated. Suppose that you were to type in the preceding three expressions and then change the value of y. The value of z would not change:
> y <- 4
> z
[1] 1 2
Now try assigning a new value to z and check whether the previous value still shows up.
 R provides several different ways to refer to a member (or set of members) of a vector.
 You can refer to elements by location in a vector:
> b <- c(1,2,3,4,5,6,7,8,9,10,11,12)
> b
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> b[7]
[1] 7
> b[1:6]
[1] 1 2 3 4 5 6
> b[b %% 3 == 0]
[1] 3 6 9 12
Exercise

That’s the basic interface for executing R code in RStudio. Try doing these
simple tasks. If you execute everything correctly, you should end up with
the same number that you started with:
 Choose any number and add 2 to it.
 Multiply the result by 3.
 Subtract 6 from the answer.
 Divide what you get by 3.
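One way to check your answer is to wrap the four steps in a small function and try it on several starting numbers. The function name number_trick is hypothetical, chosen only for this sketch.

```r
# Apply the four steps of the exercise to a starting number x
number_trick <- function(x) {
  step1 <- x + 2        # add 2
  step2 <- step1 * 3    # multiply the result by 3
  step3 <- step2 - 6    # subtract 6
  step3 / 3             # divide by 3
}

starts  <- c(1, 7, 42)
results <- sapply(starts, number_trick)
print(results)
```

Algebraically, ((x + 2) * 3 - 6) / 3 simplifies to x, which is why each result matches its starting number.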
Variables can be assigned values using the leftward (<-), rightward (->) and equals (=) operators. The values of variables can be printed using the print() or cat() function.

# Assignment using equal operator.
var.1 = c(0,1,2,3)
# Assignment using leftward operator.
var.2 <- c("learn","R")
# Assignment using rightward operator.
c(TRUE,1) -> var.3
print(var.1)
cat ("var.1 is ", var.1 ,"\n")
cat ("var.2 is ", var.2 ,"\n")
cat ("var.3 is ", var.3 ,"\n")
Deleting Variables

 Variables can be deleted by using the rm() function. Let us delete the variable var.3.
 Printing the variable after it has been deleted throws an error, because the object no longer exists.
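The deletion described above can be sketched like this. The tryCatch() wrapper is an addition for illustration so that the script continues after the error; at the console you would simply see the error message.

```r
# Recreate var.3 as in the assignment example, then delete it
var.3 <- c(TRUE, 1)
rm(var.3)

# exists() reports whether the name is still defined
still_there <- exists("var.3")

# Printing the deleted variable raises an error; capture its message
msg <- tryCatch(
  print(var.3),
  error = function(e) conditionMessage(e)
)

cat("var.3 exists:", still_there, "\n")
cat("R reported:", msg, "\n")
```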
Basic Operations in R
 When you perform an operation on two vectors, R will match the elements of the two vectors pairwise and return a vector. For example:
> c(1, 2, 3, 4) + c(10, 20, 30, 40)
[1] 11 22 33 44
> c(1, 2, 3, 4) * c(10, 20, 30, 40)
[1] 10 40 90 160
> c(1, 2, 3, 4) - c(1, 1, 1, 1)
[1] 0 1 2 3
If the two vectors aren’t the same size, R will repeat the smaller sequence multiple times:
> c(1, 2, 3, 4) + 1
[1] 2 3 4 5
> 1 / c(1, 2, 3, 4, 5)
[1] 1.0000000 0.5000000 0.3333333 0.2500000 0.2000000
> c(1, 2, 3, 4) + c(10, 100)
[1] 11 102 13 104
> c(1, 2, 3, 4, 5) + c(10, 100)
[1] 11 102 13 104 15
Warning message:
In c(1, 2, 3, 4, 5) + c(10, 100) :
  longer object length is not a multiple of shorter object length
Note the warning when the length of the longer vector isn’t a multiple of the length of the shorter one.
 In R, the operations that do all of the work are called functions. We’ve already used a few functions above (you can’t do anything interesting in R without them).
 Functions are just like what you remember from math class. Most functions are in the following form: f(argument1, argument2, ...), where f is the name of the function, and argument1, argument2, ... are the arguments to the function.
 Here are a few more examples:
> exp(1)
[1] 2.718282
> cos(3.141593)
[1] -1
> log2(1)
[1] 0
 Many functions require more than one argument. You can specify the arguments by name:
> log(x=64, base=4)
[1] 3
Or, if you give the arguments in the default order, you can omit the names:
> log(64,4)
[1] 3
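A consequence of naming arguments is that their order no longer matters. The sketch below shows this with log(), and adds a small user-defined function with a default argument; power and exponent are hypothetical names introduced only for illustration.

```r
# Three equivalent calls to log()
by_position <- log(64, 4)
by_name     <- log(x = 64, base = 4)
swapped     <- log(base = 4, x = 64)   # named arguments may come in any order

# A hypothetical helper with a default argument value
power <- function(x, exponent = 2) {
  x^exponent
}

squared <- power(5)                 # exponent falls back to its default of 2
cubed   <- power(5, exponent = 3)   # default overridden by name

print(c(by_position, by_name, swapped))
print(c(squared, cubed))
```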
R operators
Vectors

 When you want to create a vector with more than one element, you should use the c() function, which combines the elements into a vector.
apple <- c('red','green',"yellow")
apple
class(apple)
 When we execute the above code, it produces the following result:
[1] "red" "green" "yellow"
[1] "character"
Types of vectors

 R uses several types of vectors. Following are some of them:
 Numeric vectors: Numeric vectors contain numeric values, stored either as integers or as doubles (floating-point numbers).
 Character vectors: Character vectors contain text strings, which may include letters, digits and special characters.
 Logical vectors: Logical vectors contain the boolean values TRUE and FALSE, with NA marking missing values.
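The vector types listed above can be inspected with typeof(). This is a small sketch with illustrative variable names; the last line also shows what happens when types are mixed in one vector.

```r
num_vec <- c(1.5, 2, 10)        # numeric, stored as double
int_vec <- c(1L, 2L, 3L)        # numeric, stored as integer
chr_vec <- c("a", "R", "#1")    # character
lgl_vec <- c(TRUE, FALSE, NA)   # logical with a missing value

types <- c(typeof(num_vec), typeof(int_vec), typeof(chr_vec), typeof(lgl_vec))
print(types)

# is.na() flags the missing entry
missing_flags <- is.na(lgl_vec)
print(missing_flags)

# A vector holds one type only, so mixed input is coerced to character
mixed <- c(1, "a", TRUE)
print(mixed)
```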
Creating a vector

 There are different ways of creating vectors. Generally, we use c() to combine different elements together.

# we can use the c function
# to combine the values as a vector.
# By default the type will be double
X <- c(61, 4, 21, 67, 89, 2)
cat('using c function', X, '\n')

# seq() function for creating
# a sequence of evenly spaced values.
# length.out defines the length of the vector.
Y <- seq(1, 10, length.out = 5)
cat('using seq() function', Y, '\n')

# use ':' to create a vector
# of consecutive integers.
Z <- 2:7
cat('using colon', Z)
Accessing vector elements

 Accessing elements in a vector is the process of performing an operation on an individual element of a vector. There are many ways through which we can access the elements of the vector. The most common is using the ‘[]’ symbol.

# R program to access elements of a Vector

# accessing elements with an index number.
X <- c(2, 5, 18, 1, 12)
cat('Using Subscript operator', X[2], '\n')

# by passing a vector of positions
# inside the vector index.
Y <- c(4, 8, 2, 1, 17)
cat('Using combine() function', Y[c(4, 1)], '\n')

# using logical expressions
Z <- c(5, 2, 1, 4, 4, 3)
cat('Using Logical indexing', Z[Z>4])
Modifying a vector

 Modification of a vector is the process of applying some operation on an individual element of a vector to change its value in the vector. There are different ways through which we can modify a vector:

# Creating a vector
X <- c(2, 7, 9, 7, 8, 2)

# modify a specific element
X[3] <- 1
X[2] <- 9
cat('subscript operator', X, '\n')

# Modify using a logical condition.
X[X>5] <- 0
cat('Logical indexing', X, '\n')
Deleting a vector

 Deletion of a vector is the process of deleting all of the elements of the vector. This can be done by assigning NULL to it, which leaves the name defined but empty. (The vector can also be deleted with the rm() function, which removes the name itself.)

# Creating Vector
M <- c(8, 10, 2, 5)

# set NULL to the vector
M <- NULL
cat('Output vector', M)
Sorting elements of a Vector

The sort() function is used to sort the values in ascending or descending order.

# Creation of Vector
X <- c(8, 2, 7, 1, 11, 2)

# Sort in ascending order
A <- sort(X)
cat('ascending order', A, '\n')

# sort in descending order
# by setting decreasing as TRUE
B <- sort(X, decreasing = TRUE)
cat('descending order', B)
Lists

 A list in R is a generic object consisting of an ordered collection of objects. Lists are one-dimensional, heterogeneous data structures. The list can be a list of vectors, a list of matrices, a list of characters, a list of functions, and so on.
 A list can contain many different types of elements, such as vectors, functions and even another list.
list1 <- list(c(2,5,3),21.3,sin)
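To see that list1 really holds three different kinds of object, the following sketch inspects each component. The variable names other than list1 are illustrative.

```r
list1 <- list(c(2, 5, 3), 21.3, sin)

n_components <- length(list1)      # how many components the list has
classes <- sapply(list1, class)    # class of each component

# [[ ]] extracts a component; a further [ ] indexes inside it
second_of_first <- list1[[1]][2]

# The third component is the function sin, so it can be called directly
sin_of_zero <- list1[[3]](0)

print(classes)
print(c(n_components, second_of_first, sin_of_zero))
```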
Creating a List

 To create a list in R you need to use the function called “list()”. In other words, a list is a generic vector containing other objects. To illustrate how a list looks, we take an example here. We want to build a list of employees with their details. So for this, we want attributes such as ID, employee name, and the number of employees.

# The first attribute is a numeric vector
# containing the employee IDs
empId = c(1, 2, 3, 4)

# The second attribute is the employee name,
# which is a character vector
empName = c("Debi", "Sandeep", "Subham", "Shiba")

# The third attribute is the number of employees,
# which is a single numeric variable.
numberOfEmp = 4

# We can combine all these three different
# data types into a list
# containing the details of employees
# which can be done using a list command
empList = list(empId, empName, numberOfEmp)
Accessing components of a list

We can access components of a list in two ways.

 Access components by names: All the components of a list can be named and we can use those names to access the components of the list using the dollar ($) operator.

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
print(empList)

# Accessing components by names
cat("Accessing name components using $ command\n")
print(empList$Names)
Access components by indices:

 We can also access the components of the list using indices. To access the top-level components of a list we have to use the double bracket operator “[[ ]]”, and to access the lower or inner level components of a list we have to use another square bracket “[ ]” along with the double bracket operator “[[ ]]”.

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
print(empList)

# Accessing a top level component by index
cat("Accessing name components using indices\n")
print(empList[[2]])

# Accessing an inner level component by indices
cat("Accessing Sandeep from name using indices\n")
print(empList[[2]][2])

# Accessing another inner level component by indices
cat("Accessing 4 from ID using indices\n")
print(empList[[1]][4])
Modifying components of a list

 A list can also be modified by accessing the components and replacing them with the ones which you want.

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
cat("Before modifying the list\n")
print(empList)

# Modifying the top-level component
empList$`Total Staff` = 5

# Modifying inner level component
empList[[1]][5] = 5
empList[[2]][5] = "Kamala"

cat("After modifying the list\n")
print(empList)
Concatenation of lists

 Two lists can be concatenated using the concatenation function c().

Syntax:
list = c(list, list1)
list = the original list
list1 = the new list

empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
cat("Before concatenation of the new list\n")
print(empList)

# Creating another list
empAge = c(34, 23, 18, 45)
empAgeList = list(
  "Age" = empAge
)

# Concatenation of lists using the c() function
empList = c(empList, empAgeList)

cat("After concatenation of the new list\n")
print(empList)
Deleting components of a list

 To delete components of a list, first access those components and then put a negative sign before their index. The negative sign indicates that the component is to be dropped.

empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
cat("Before deletion the list is\n")
print(empList)

# Deleting a top level component
cat("After Deleting Total staff components\n")
print(empList[-3])

# Deleting an inner level component
cat("After Deleting sandeep from name\n")
print(empList[[2]][-2])
Merging list

 We can merge lists by placing all of them into a single list.

# Create two lists.
lst1 <- list(1,2,3)
lst2 <- list("Sun","Mon","Tue")

# Merge the two lists.
new_list <- c(lst1,lst2)

# Print the merged list.
print(new_list)
Converting List to Vector

 Here we are going to convert a list to a vector. For this we will create a list first and then unlist the list into a vector.

# Create lists.
lst <- list(1:5)
print(lst)

# Convert the lists to vectors.
vec <- unlist(lst)
print(vec)
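Because a vector can hold only one data type, unlist() coerces mixed components to a common type, and it flattens nested lists as well. A minimal sketch of both behaviours, using illustrative names:

```r
# Mixed types are coerced to the most general one, here character
mixed_list <- list(1, "a", TRUE)
mixed_vec  <- unlist(mixed_list)
print(mixed_vec)

# Nested lists are flattened into a single vector
nested <- list(a = 1:2, b = list(c = 3, d = 4:5))
flat   <- unlist(nested)
print(flat)
```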
R List to matrix

 We will create matrices using the matrix() function in R programming. Another function that will be used is unlist(), to convert the lists into a vector.

# Defining list
lst1 <- list(list(1, 2, 3),
             list(4, 5, 6))

# Print list
cat("The list is:\n")
print(lst1)
cat("Class:", class(lst1), "\n")

# Convert list to matrix
mat <- matrix(unlist(lst1), nrow = 2, byrow = TRUE)

# Print matrix
cat("\nAfter conversion to matrix:\n")
print(mat)
cat("Class:", class(mat), "\n")
Matrices

 A matrix is a rectangular arrangement of numbers in rows and columns. In a matrix, rows are the ones that run horizontally and columns are the ones that run vertically. In R programming, matrices are two-dimensional, homogeneous data structures.
 To create a matrix in R you need to use the function called matrix(). Its first argument is the vector of elements. You also pass the number of rows and the number of columns you want to have in your matrix.
M = matrix( c('a','a','b','c','b','a'), nrow=2,ncol=3,byrow = TRUE)
print(M)
# R program to create a matrix
A = matrix(
  # Taking sequence of elements
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),

  # No of rows
  nrow = 3,

  # No of columns
  ncol = 3,

  # By default matrices are in column-wise order
  # So this parameter decides how to arrange the matrix
  byrow = TRUE
)

# Naming rows
rownames(A) = c("a", "b", "c")

# Naming columns
colnames(A) = c("c", "d", "e")

cat("The 3x3 matrix:\n")
print(A)
Creating special matrices

R allows creation of various different types of matrices with the use of arguments passed to the matrix() function.

 Matrix where all rows and columns are filled by a single constant ‘k’:
To create such a matrix the syntax is given below:
Syntax: matrix(k, m, n)
Parameters:
k: the constant
m: no of rows
n: no of columns

# R program to illustrate
# special matrices

# Matrix having 3 rows and 3 columns
# filled by a single constant 5
print(matrix(5, 3, 3))
Diagonal matrix:

A diagonal matrix is a matrix in which the entries outside the main diagonal are all zero. To create such a matrix the syntax is given below:
Syntax: diag(k, m, n)
Parameters:
k: the constants/array
m: no of rows
n: no of columns

# R program to illustrate
# special matrices

# Diagonal matrix having 3 rows and 3 columns
# filled by array of elements (5, 3, 3)
print(diag(c(5, 3, 3), 3, 3))
Identity matrix:

 A square matrix in which all the elements of the principal diagonal are ones and all other elements are zeros. To create such a matrix the syntax is given below:
Syntax: diag(k, m, n)
Parameters:
k: 1
m: no of rows
n: no of columns

# R program to illustrate
# special matrices

# Identity matrix having
# 3 rows and 3 columns
print(diag(1, 3, 3))
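As a brief aside, diag() called with a single whole number n also returns the n x n identity matrix, and multiplying any conformable matrix by the identity (using the matrix product operator %*%, which these slides do not otherwise cover) leaves it unchanged. A sketch with illustrative names:

```r
# Shorthand: diag(3) builds the same 3x3 identity as diag(1, 3, 3)
I3 <- diag(3)
same_as_long_form <- isTRUE(all.equal(I3, diag(1, 3, 3)))

# Multiplying by the identity leaves a matrix unchanged
A <- matrix(as.numeric(1:9), nrow = 3, ncol = 3, byrow = TRUE)
unchanged <- isTRUE(all.equal(A %*% I3, A))

print(I3)
print(c(same_as_long_form, unchanged))
```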
Matrix metrics

Matrix metrics answer the following questions once a matrix has been created:
 How can you know the dimension of the matrix?
 How can you know how many rows are there in the matrix?
 How many columns are in the matrix?
 How many elements are there in the matrix?

# R program to illustrate
# matrix metrics

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

cat("Dimension of the matrix:\n")
print(dim(A))

cat("Number of rows:\n")
print(nrow(A))

cat("Number of columns:\n")
print(ncol(A))

cat("Number of elements:\n")
print(length(A))
# OR
print(prod(dim(A)))
Accessing elements of a Matrix

 We can access elements of a matrix using the matrix name followed by square brackets containing two indices separated by a comma. The value before the comma selects rows and the value after the comma selects columns. Let’s illustrate this with some simple R code.
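One detail worth knowing about the [row, column] syntax described above: selecting a single row or column returns a plain vector by default, and the drop = FALSE argument keeps the result as a matrix. A minimal sketch with illustrative variable names:

```r
A <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

single_entry  <- A[2, 3]                 # row 2, column 3
row_as_vector <- A[1, ]                  # first row, dimensions dropped
row_as_matrix <- A[1, , drop = FALSE]    # first row, kept as a 1x3 matrix

print(single_entry)
print(dim(row_as_vector))   # NULL, because a vector has no dim attribute
print(dim(row_as_matrix))
```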
Accessing rows:

# R program to illustrate
# access rows in matrices

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Accessing first and second row
cat("Accessing first and second row\n")
print(A[1:2, ])
Accessing columns:

# R program to illustrate
# access columns in matrices

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Accessing first and second column
cat("Accessing first and second column\n")
print(A[, 1:2])
Accessing elements of a matrix:

# R program to illustrate
# access an entry in matrices

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Accessing 2
print(A[1, 2])

# Accessing 6
print(A[2, 3])
Accessing Submatrices:

 We can access a submatrix in a matrix using the colon (:) operator.

# R program to illustrate
# access submatrices in a matrix

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

cat("Accessing the first three rows and the first two columns\n")
print(A[1:3, 1:2])
Modifying elements of a Matrix

 In R you can modify the elements of a matrix by direct assignment.

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Editing the 3rd row and 3rd column element
# from 9 to 30
# by direct assignment
A[3, 3] = 30

cat("After editing the matrix\n")
print(A)
Matrix Concatenation

Matrix concatenation refers to the merging of rows or columns of an existing matrix.
 Concatenation of a row: The concatenation of a row to a matrix is done using rbind().

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Creating another 1x3 matrix
B = matrix(
  c(10, 11, 12),
  nrow = 1,
  ncol = 3
)
cat("The 1x3 matrix:\n")
print(B)

# Add a new row using rbind()
C = rbind(A, B)

cat("After concatenation of a row:\n")
print(C)
Concatenation of a column:

 The concatenation of a column to a matrix is done using cbind().

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Creating another 3x1 matrix
B = matrix(
  c(10, 11, 12),
  nrow = 3,
  ncol = 1,
  byrow = TRUE
)
cat("The 3x1 matrix:\n")
print(B)

# Add a new column using cbind()
C = cbind(A, B)

cat("After concatenation of a column:\n")
print(C)
 Dimension inconsistency: Note that you have to make sure the dimensions of the matrices are consistent before you concatenate them.

# Create a 3x3 matrix
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
cat("The 3x3 matrix:\n")
print(A)

# Creating another 1x3 matrix
B = matrix(
  c(10, 11, 12),
  nrow = 1,
  ncol = 3
)
cat("The 1x3 matrix:\n")
print(B)

# This will give an error
# because of dimension inconsistency:
# A has 3 rows but B has only 1
C = cbind(A, B)

cat("After concatenation of a column:\n")
print(C)
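The failure above can be observed without stopping the script by wrapping the call in tryCatch(), and the 1x3 matrix can still be attached either as a row or, after transposing it with t(), as a column. This is a sketch; C_row and C_col are illustrative names.

```r
A <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
B <- matrix(c(10, 11, 12), nrow = 1, ncol = 3)

# cbind() needs equal row counts: 3 vs 1 raises an error
msg <- tryCatch(
  {
    cbind(A, B)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
cat("cbind(A, B):", msg, "\n")

# Two consistent alternatives
C_row <- rbind(A, B)      # B has 3 columns, so it fits as a new row
C_col <- cbind(A, t(B))   # t(B) is 3x1, so it fits as a new column

print(dim(C_row))
print(dim(C_col))
```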
Deleting rows and columns of a Matrix

 To delete a row or a column, first access that row or column and then put a negative sign before its index. The negative sign indicates that the row or column is to be dropped.
 Row deletion:

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("Before deleting the 2nd row\n")
print(A)

# 2nd-row deletion
A = A[-2, ]

cat("After deleting the 2nd row\n")
print(A)
Column deletion:

A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("Before deleting the 2nd column\n")
print(A)

# 2nd-column deletion
A = A[, -2]

cat("After deleting the 2nd column\n")
print(A)
Exercise

 Make a 3×3 matrix of letters running from a to i.
 Check its dimensions, number of rows, number of columns and length.
 Create a diagonal matrix with 5 as the constant.
 Create a diagonal matrix with 5 and 3 as the constants.
 Create a 1×3 matrix of letters.
 Create a 3×1 matrix of the next letters.
 Combine them.
Operations on Matrices

 There are four basic operations, i.e. DMAS (Division, Multiplication, Addition, Subtraction), that can be done with matrices. In R these operators work element by element, so both matrices involved in the operation should have the same number of rows and columns.
 Order of a Matrix: The order of a matrix is defined in terms of its number of rows and columns.
Order of a matrix = No. of rows × No. of columns
For example, a matrix M with 3 rows and 3 columns is a matrix of order 3 × 3.
Matrices Addition

 The addition of two same ordered matrices Mr*c and Nr*c yields a matrix Rr*c where every element is the sum of corresponding elements of the input matrices.
 In the code, nrow(B) gives the number of rows in B and ncol(B) gives the number of columns. Here, sum is an empty matrix of the same size as B and C. The elements of sum are the addition of the corresponding elements of B and C through nested for loops.

# Creating 1st Matrix
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

# Getting number of rows and columns
num_of_rows = nrow(B)
num_of_cols = ncol(B)

# Creating matrix to store results (filled with NA for now)
sum = matrix(NA, nrow = num_of_rows, ncol = num_of_cols)

# Printing Original matrices
print(B)
print(C)

# Adding corresponding elements through nested for loops
for (i in 1:num_of_rows) {
  for (j in 1:num_of_cols) {
    sum[i, j] = B[i, j] + C[i, j]
  }
}
print(sum)
Using ‘+’ operator for matrix addition:

# R program for matrix addition
# using '+' operator

# Creating 1st Matrix
B = matrix(c(1, 2 + 3i, 5.4, 3, 4, 5), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(2, 0i, 0.1, 3, 4, 5), nrow = 2, ncol = 3)

# Printing the resultant matrix
print(B + C)
Properties of Matrix Addition:

 Commutative: B + C = C + B
 Associative: For any matrices A, B and C of the same order, A + (B + C) = (A + B) + C
 The order of the matrices involved must be the same.
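The three properties above can be checked on small integer matrices. This is an illustrative sketch: the matrices A, B, C and D are made up for the check, and tryCatch() is used only so the script continues after the expected error.

```r
A <- matrix(1:6,   nrow = 2)   # 2x3
B <- matrix(7:12,  nrow = 2)   # 2x3
C <- matrix(13:18, nrow = 2)   # 2x3

commutative <- identical(B + C, C + B)
associative <- identical(A + (B + C), (A + B) + C)

# A 3x2 matrix has a different order, so adding it to A fails
D <- matrix(1:6, nrow = 3)
msg <- tryCatch(
  {
    A + D
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(c(commutative, associative))
cat("A + D:", msg, "\n")
```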
Matrix Subtraction

 The subtraction of two same ordered matrices Mr*c and Nr*c yields a matrix Rr*c where every element is the difference between the corresponding element of the first input matrix and that of the second.

# R program to subtract two matrices

# Creating 1st Matrix
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

# Getting number of rows and columns
num_of_rows = nrow(B)
num_of_cols = ncol(B)

# Creating matrix to store results
diff = matrix(B-C, nrow = num_of_rows, ncol = num_of_cols)

# Printing Original matrices
print(B)
print(C)

# Printing the difference
print(diff)
Using ‘-’ operator for matrix subtraction:

# R program for matrix subtraction
# using '-' operator

# Creating 1st Matrix
B = matrix(c(1, 2 + 3i, 5.4, 3, 4, 5), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(2, 0i, 0.1, 3, 4, 5), nrow = 2, ncol = 3)

# Printing the resultant matrix
print(B - C)
Properties of Matrix Subtraction:

 Non-Commutative: B – C != C – B
 Non-Associative: In general, A – (B – C) != (A – B) – C
 The order of the matrices involved must be the same.
Matrices Multiplication

 The element-wise multiplication of two same ordered matrices Mr*c and Nr*c yields a matrix Rr*c where every element is the product of corresponding elements of the input matrices.

# R program to multiply two matrices element by element

# Creating 1st Matrix
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

# Getting number of rows and columns
num_of_rows = nrow(B)
num_of_cols = ncol(B)

# Creating matrix to store results (filled with NA for now)
prod = matrix(NA, nrow = num_of_rows, ncol = num_of_cols)

# Printing Original matrices
print(B)
print(C)

# Multiplying corresponding elements through nested for loops
for (i in 1:num_of_rows) {
  for (j in 1:num_of_cols) {
    prod[i, j] = B[i, j] * C[i, j]
  }
}
print(prod)
Using ‘*’ operator for matrix multiplication:

# R program for matrix multiplication
# using '*' operator

# Creating 1st Matrix
B = matrix(c(1, 2 + 3i, 5.4), nrow = 1, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(2, 1i, 0.1), nrow = 1, ncol = 3)

# Printing the resultant matrix
print (B * C)
Properties of Matrix
Multiplication:

 Commutative: B * C = C * B
 Associative: For n number of matrices A * (B * C) =
(A * B) * C
 Order of the matrices involved must be same.
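The properties above describe element-wise multiplication with the * operator. As a minimal sketch (the matrices A2 and B2 are made up for illustration and do not come from the slides), the following contrasts it with base R's matrix product operator %*%, which follows the row-by-column rule of linear algebra and is generally not commutative:

```r
# Illustrative 2x2 matrices; R fills a matrix column by column
A2 <- matrix(1:4, nrow = 2)   # rows: (1, 3) and (2, 4)
B2 <- matrix(5:8, nrow = 2)   # rows: (5, 7) and (6, 8)

elementwise <- A2 * B2        # multiplies matching positions
matprod     <- A2 %*% B2      # row-by-column matrix product

print(elementwise)
print(matprod)

# Element-wise multiplication commutes; the matrix product generally does not
print(identical(A2 * B2, B2 * A2))
print(identical(A2 %*% B2, B2 %*% A2))
```

For %*% the number of columns of the first matrix must equal the number of rows of the second, which is a different requirement from the "same order" rule that applies to *.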
Matrices Division

 The division of two same ordered matrices M and N of order r × c yields a
matrix R of order r × c where every element is the quotient of the
corresponding element of the first matrix divided by that of the second.

# R program to divide two matrices

# Creating 1st Matrix
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

# Getting number of rows and columns
num_of_rows = nrow(B)
num_of_cols = ncol(B)

# Creating matrix to store results
div = matrix(B / C, nrow = num_of_rows, ncol = num_of_cols)

# Printing Original matrices
print(B)
print(C)

# Printing the quotient
print(div)
Using ‘/’ operator for matrix
division:
# R program for matrix division
# using '/' operator

# Creating 1st Matrix


B = matrix(c(4, 6i, -1), nrow = 1, ncol =
3)

# Creating 2nd Matrix


C = matrix(c(2, 2i, 0), nrow = 1, ncol =
3)

# Printing the resultant matrix


print (B / C)
Properties of Matrix Division:

 Non-Commutative: B / C != C / B
 Non-Associative: For n number of matrices A / (B /
C) != (A / B) / C
 Order of the matrices involved must be same.
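All four element-wise operators require matrices of the same order. A small sketch of what happens otherwise, using two illustrative matrices P and Q (names chosen here, not taken from the slides); tryCatch() from base R captures the error message so the script keeps running:

```r
# Two matrices with the same number of elements but different shapes
P <- matrix(1:6, nrow = 2, ncol = 3)
Q <- matrix(1:6, nrow = 3, ncol = 2)

# Compare the dimensions before operating
same_order <- identical(dim(P), dim(Q))
print(same_order)

# Attempt the division and keep the error message instead of stopping
msg <- tryCatch({
  P / Q
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```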
Arrays

 Arrays are essential data storage structures defined by a fixed number of dimensions.
Arrays are stored in contiguous memory locations. One-dimensional arrays are
called vectors, with the length being their only dimension. Two-dimensional
arrays are called matrices, consisting of fixed numbers of rows and columns.
All elements of an array have the same data type. Vectors are supplied as input
to the array() function, which arranges their values according to the requested
dimensions.
 While matrices are confined to two dimensions, arrays can have any number of
dimensions. The array() function takes a dim argument that sets the size of
each dimension. In the example below we create an array made of two 3x3
matrices, filled by recycling the values 'green' and 'yellow'.

a <- array(c('green','yellow'),dim=c(3,3,2))
print(a)
Creating an Array

An array in R can be created with the array() function. The elements are
passed to array() along with the required dimensions.
Syntax:
array(data, dim = c(nrow, ncol, nmat), dimnames = names)
where, nrow : Number of rows; ncol : Number of columns
nmat : Number of matrices of dimensions nrow * ncol
dimnames : List of names for each dimension. Default value = NULL.
Uni-Dimensional Array
A vector is a uni-dimensional array, which is specified by
a single dimension, length. A vector can be created using the
c() function, which combines the values passed to it.

vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print (vec1)

# cat() concatenates its arguments and prints them
cat ("Length of vector : ", length(vec1))
Multi-Dimensional Array

 A two-dimensional matrix is an array specified by a fixed number of rows
and columns, with every element of the same data type. With a third
dimension, an array can be viewed as a stack of such matrices. It is created
by passing the values and the dimensions to the array() function.
# arranges data from 2 to 13
# in two matrices of dimensions 2x3
arr = array(2:13, dim = c(2, 3, 2))
print(arr)
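As a quick check of how the values 2 to 13 are laid out, the sketch below rebuilds the same array and inspects it with dim() and length() from base R. R fills arrays column by column, so the second matrix starts with the seventh value.

```r
# Same array as above: values 2..13 in two 2x3 matrices
arr <- array(2:13, dim = c(2, 3, 2))

print(dim(arr))        # size of each dimension
print(length(arr))     # total number of elements

# Column-major filling: matrix 1 holds 2..7, matrix 2 holds 8..13
print(arr[1, 1, 2])    # first element of the second matrix
print(arr[2, 3, 2])    # last element of the second matrix
```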
Vectors of different lengths can also be fed as input into the array()
function. The total number of elements in all the vectors combined should
equal the number of elements in the array; if there are too few values R
recycles them, and if there are too many the extra values are ignored. The
elements are arranged in the order in which they are specified in the function.

vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vec2 <- c(10, 11, 12)

# elements are combined into a single vector,
# vec1 elements followed by vec2 elements.
arr = array(c(vec1, vec2), dim = c(2, 3, 2))
print (arr)
Naming of Arrays

 The row names, column names and matrix names are specified as vectors
whose lengths equal the number of rows, number of columns and number of
matrices respectively. By default, the rows, columns and matrices are
referred to by their index values.

row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")

# the names for the dimensions are collected
# in a list and passed as dimnames
arr = array(2:13, dim = c(2, 3, 2),
            dimnames = list(row_names, col_names, mat_names))
print (arr)
Accessing arrays

 The arrays can be accessed by using indices for different
dimensions separated by commas. Different components can be
specified by any combination of elements’ names or positions.
 Accessing Uni-Dimensional Array: The elements can be
accessed by using indexes of the corresponding elements.
vec <- c(1:10)

# accessing entire vector


cat ("Vector is : ", vec)

# accessing elements
cat ("Third element of vector is : ",
vec[3])
Accessing entire matrices
vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vec2 <- c(10, 11, 12)
row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")
arr = array(c(vec1, vec2), dim = c(2, 3, 2),
dimnames = list(row_names,
col_names,
mat_names))

# accessing matrix 1 by index value


print ("Matrix 1")
print (arr[,,1])

# accessing matrix 2 by its name


print ("Matrix 2")
print(arr[,,"Mat2"])
Accessing specific rows and columns of matrices

 Rows and columns can also be accessed by both names as well as indices.

vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vec2 <- c(10, 11, 12)
row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")
arr = array(c(vec1, vec2), dim = c(2, 3, 2),
            dimnames = list(row_names, col_names, mat_names))

# accessing the 1st column of matrix 1 by index values
print ("1st column of matrix 1")
print (arr[, 1, 1])

# accessing the 2nd row of matrix 2 by names
print ("2nd row of matrix 2")
print(arr["row2",,"Mat2"])
Accessing elements individually

 Elements can be accessed by using both the row and column numbers or names.

vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vec2 <- c(10, 11, 12)
row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")
arr = array(c(vec1, vec2), dim = c(2, 3, 2),
            dimnames = list(row_names, col_names, mat_names))

# accessing an element of matrix 1 by a mix of index values and names
print ("2nd row 3rd column matrix 1 element")
print (arr[2, "col3", 1])

# accessing an element of matrix 2 by names only
print ("2nd row 1st column element of matrix 2")
print(arr["row2", "col1", "Mat2"])
Accessing subset of array elements

 A smaller subset of the array elements can be accessed by defining a
range of row or column limits.

row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3", "col4")
mat_names <- c("Mat1", "Mat2")
# 2 x 4 x 2 = 16 values are needed to fill the array
arr = array(1:16, dim = c(2, 4, 2),
            dimnames = list(row_names, col_names, mat_names))

# print elements of both the rows and columns 2 and 3 of matrix 1
print (arr[, c(2, 3), 1])
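When a subset has only one row or column, R drops the unused dimensions and returns a plain vector. The sketch below, on an unnamed array built only for illustration, shows the standard drop = FALSE argument of the [ operator, which keeps the array structure:

```r
# Illustrative 2 x 4 x 2 array without names
arr <- array(1:16, dim = c(2, 4, 2))

col_vec <- arr[, 2, 1]                 # dimensions dropped: plain vector
col_arr <- arr[, 2, 1, drop = FALSE]   # dimensions kept: 2 x 1 x 1 array

print(col_vec)
print(dim(col_vec))   # NULL, because a vector has no dim attribute
print(dim(col_arr))
```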
Example

 Make a vector from range 2 to 10


 Add 11 in the starting and 12 in the end using
c() fn
 Add 13 using append
 Add 14,15,16 using length
 Add 17 after 5th value using append
Adding elements to array

 Elements can be appended at different positions in the array. The sequence
of elements is retained in order of their addition to the array. The time
complexity required to add new elements is O(n), where n is the length of the
array. The length of the array increases by the number of elements added.
There are various built-in ways in R to add new values:
 c(vector, values): the c() function appends values to the end of the
array. Multiple values can also be added together.
 append(vector, values): appends the values to the vector. By default,
the values are added at the end.
 append(vector, values, after = k): inserts the new values after position
k. The default is after = length(vector). In the exercise above, 17 is added
after the 5th value this way.
 Using the length of the array: elements can be assigned at index
length + x, where x > 0. In the exercise above, 14, 15 and 16 are added this way.
# creating a uni-dimensional array
x <- c(1, 2, 3, 4, 5)

# addition of element using c() function
x <- c(x, 6)
print ("Array after 1st modification ")
print (x)

# addition of element using append function
x <- append(x, 7)
print ("Array after 2nd modification ")
print (x)

# adding elements after computing the length
len <- length(x)
x[len + 1] <- 8
print ("Array after 3rd modification ")
print (x)

# adding on length + 3 index
x[len + 3]<-9
print ("Array after 4th modification ")
print (x)

# append a vector of values to the array after length + 3 of array
print ("Array after 5th modification")
x <- append(x, c(10, 11, 12), after = length(x)+3)
print (x)

# adds new elements after 3rd index
print ("Array after 6th modification")
x <- append(x, c(-1, -1), after = 3)
print (x)
 Before the third modification the array had length 7 (stored in len), and
after it elements are present up to the 8th index. At the fourth
modification, element 9 is assigned at the 10th index (len + 3), so R
automatically fills the skipped 9th position with NA.
 At 5th modification, the elements 10, 11, 12 are added beginning from the
11th index: the after value lies beyond the end of the array, so append()
places them at the end.
 At 6th modification, the values -1, -1 are inserted after the third
position in the array.
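The two behaviours described above, NA padding and insertion at a position, can be reproduced on a smaller vector. This is a minimal sketch with made-up values, using only index assignment and append() from base R:

```r
v <- c(1, 2, 3)

# Assigning beyond the current length pads the gap with NA
v[6] <- 9
print(v)

# append() with 'after' inserts at a position instead of at the end
w <- append(v, 0, after = 1)
print(w)
print(length(w))
```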
Removing Elements from Array

 Elements can be removed from arrays in R, either one at a time or several
together. A logical condition is used as the index to the array: the values
for which the condition is TRUE are retained and the rest are removed. The
comparison for removal is based on array values. Multiple conditions can
also be combined to remove a range of elements. Another way to remove
elements is the %in% operator, which returns TRUE for every element that
belongs to a given set of values; negating it with ! keeps only the
elements outside that set.
# creating an array of length 9
m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print ("Original Array")
print (m)

# remove a single value element:3 from array
m <- m[m != 3]
print ("After 1st modification")
print (m)

# removing elements based on condition
# where the element should be
# greater than 2 and less than or equal to 8
m <- m[m>2 & m<= 8]
print ("After 2nd modification")
print (m)

# remove sequence of elements using another array
remove <- c(4, 6, 8)

# check which element satisfies the remove property
print (m %in% remove)
print ("After 3rd modification")
print (m [! m %in% remove])

At 1st modification, all the element values that are not equal to 3 are retained. At 2nd
modification, the elements greater than 2 and less than or equal to 8 are retained, and
the rest are removed. At 3rd modification, the elements for which %in% gives FALSE are
printed, since the condition involves the NOT operator.
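The examples above remove elements by value. Elements can also be removed by position with negative indices, a standard feature of R indexing. A short sketch with an illustrative vector m2:

```r
m2 <- c(10, 20, 30, 40, 50)

# Negative indices drop the elements at those positions
by_position <- m2[-c(1, 2)]

# For comparison: dropping by value with a negated %in%
by_value <- m2[!(m2 %in% c(30, 50))]

print(by_position)
print(by_value)
```

Neither expression changes m2 itself; the result has to be assigned back (m2 <- m2[-c(1, 2)]) for the removal to persist.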
Updating Existing Elements of
Array

 The elements of the array can be updated with new values by assigning the
modified value to the desired index of the array. The changes are retained in
the original array. If the index value to be updated is within the length of
the array, then the value is changed; otherwise, the new element is added at
the specified index. Multiple elements can also be updated at once, either
with the same value or with multiple values when the new values are specified
as a vector.
# creating an array of length 9
m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print ("Original Array")
print (m)

# updating single element
m[1] <- 0
print ("After 1st modification")
print (m)

# updating sequence of elements
m[7:9] <- -1
print ("After 2nd modification")
print (m)

# updating two indices with two different values
m[c(2, 5)] <- c(-1, -2)
print ("After 3rd modification")
print (m)

# this adds a new element to the array
m[10] <- 10
print ("After 4th modification")
print (m)

At 2nd modification, the elements at indexes 7 to 9 are updated with -1 each. At 3rd
modification, the second element is replaced by -1 and the fifth element by -2. At
4th modification, a new element is added since the 10th index is greater than the
length of the array.
Factors

 Factors are the R-objects which are created using a vector. It stores the vector
along with the distinct values of the elements in the vector as labels.
 The labels are always character irrespective of whether it is numeric or
character or Boolean etc. in the input vector. They are useful in statistical
modeling.
 Factors are created using the factor() function. The nlevels() function gives the
count of levels.
apple_colors <- c('green','green','yellow','red','red','red','green')
factor_apple <- factor(apple_colors)
factor_apple
nlevels(factor_apple)
 Factors in R Programming Language are data structures
that are implemented to categorize the data or represent
categorical data and store it on multiple levels.
 They can be stored as integers with a corresponding label to
every unique integer. Though factors may look similar to
character vectors, they are integers and care must be taken
while using them as strings. The factor accepts only a
restricted number of distinct values. For example, a data
field such as gender may contain values only from female,
male, or transgender.
 In the above example, all the possible cases are
known beforehand and are predefined. These distinct
values are known as levels. After a factor is created it
only consists of levels that are by default sorted
alphabetically.
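Because a factor stores integer codes plus labels, converting it needs some care. The sketch below uses made-up factors (sizes and num_f are illustrative names) with base R conversion functions to show the difference between the codes and the labels:

```r
sizes <- factor(c("low", "high", "low"))

print(levels(sizes))        # levels are sorted alphabetically
print(as.integer(sizes))    # the underlying integer codes
print(as.character(sizes))  # the labels as strings

# Pitfall with numeric-looking labels
num_f <- factor(c("10", "20", "10"))
codes  <- as.numeric(num_f)                # codes, not the values
values <- as.numeric(as.character(num_f))  # the intended numbers
print(codes)
print(values)
```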
Creating a Factor in R
Programming Language

 The command used to create or modify a factor in R is factor(), with a
vector as input.
The two steps to creating a factor are:
 Creating a vector
 Converting the vector created into a factor using
function factor()
Let us create a factor gender. Since the input vector below contains only
female and male, these become its levels; a level that does not occur in
the data, such as transgender, has to be predefined.

# Creating a vector
x <- c("female", "male", "male", "female")
print(x)

# Converting the vector x into a factor
# named gender
gender <- factor(x)
print(gender)
Levels can also be predefined by the
programmer.

# Creating a factor with levels defined by programmer
gender <- factor(c("female", "male", "male", "female"),
                 levels = c("female", "transgender", "male"))
gender
Checking for a Factor in R

 The function is.factor() is used to check whether the


variable is a factor and returns “TRUE” if it is a
factor.
gender <- factor(c("female", "male", "male", "female"));
print(is.factor(gender))
 Function class() is also used to check whether the
variable is a factor and if true returns “factor”.

gender <- factor(c("female", "male", "male", "female"));


class(gender)
Accessing elements of a Factor
in R

 The elements of a factor are accessed the same way as the elements of a
vector. If gender is a factor, then gender[i] accesses the ith element of
the factor.
gender <- factor(c("female", "male", "male", "female"));
gender[3]

 More than one element can be accessed at a time.


gender <- factor(c("female", "male", "male",
"female"));
gender[c(2,4)]
Modification of a Factor in R

 After a factor is formed, its components can be modified, but the new values
must be among the predefined levels.
gender <- factor(c("female", "male", "male", "female" ));
gender[2]<-"female"
gender
 To select all the elements of the factor gender except the ith element,
use gender[-i]. To assign a value outside the predefined levels, first add
the value to the levels.
gender <- factor(c("female", "male", "male",
"female" ));
# add new level
levels(gender) <- c(levels(gender), "other")
gender[3] <- "other"
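To see why the level has to be added first, the following sketch (g, g_bad and g_ok are illustrative names) assigns a value that is not a level. R issues a warning and stores NA; suppressWarnings() is used here only to keep the output quiet.

```r
g <- factor(c("female", "male"))

# Assigning a value that is not a level produces NA (with a warning)
g_bad <- g
suppressWarnings(g_bad[1] <- "other")
print(is.na(g_bad[1]))

# Extending the levels first makes the assignment valid
g_ok <- g
levels(g_ok) <- c(levels(g_ok), "other")
g_ok[1] <- "other"
print(g_ok)
```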
Factors in Data Frame

A data frame is similar to a 2D array, with each column containing all the
values of one variable and each row holding one set of values from every
column. There are four things to remember about data frames:
 Column names are compulsory and cannot be empty.
 Unique names should be assigned to each row.
 Columns commonly hold numeric, character, or factor data.
 The same number of data items must be present in each
column.
 In versions of R before 4.0.0, data.frame() converted character columns
to factors automatically. Since R 4.0.0 the default is
stringsAsFactors = FALSE, so a character column becomes a factor only if
stringsAsFactors = TRUE is passed or the column is converted with factor().
We can create a data frame and check if its column is a factor.
age <- c(40, 49, 48, 40, 67, 52, 53)
salary <- c(103200, 106200, 150200,
            10606, 10390, 14070, 10220)
gender <- c("male", "male", "transgender",
            "female", "male", "female", "transgender")
employee <- data.frame(age, salary, gender, stringsAsFactors = TRUE)
print(employee)
print(is.factor(employee$gender))
Data frames

 R Programming Language is an open-source


programming language that is widely used as a
statistical software and data analysis tool. Data
Frames in R Language are generic data objects of
R which are used to store the tabular data. Data
frames can also be interpreted as matrices where
each column of a matrix can be of the different data
types. DataFrame is made up of three principal
components, the data, rows, and columns.
Data Frames

 Data frames are tabular data objects. Unlike a matrix in data frame each column
can contain different modes of data. The first column can be numeric while the
second column can be character and third column can be logical. It is a list of
vectors of equal length. Data Frames are created using the data.frame() function.
BMI <- data.frame( gender = c("Male", "Male","Female"),
height = c(152, 171.5, 165),
weight = c(81,93, 78),
Age =c(42,38,26)
)
print(BMI)
Create Dataframe in R Programming Language

 To create a data frame in R, use the data.frame() command and pass each
of the vectors you have created as arguments to the function.

# R program to create dataframe

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# print the data frame
print(friend.data)
Get the Structure of the R Data Frame

 One can get the structure of the data frame using the str() function in R.
It can display even the internal structure of large lists which are nested.
It provides one-line output for the basic R objects, letting the user know
about the object and its constituents.

# R program to get the
# structure of the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# using str(); it prints the structure itself and returns NULL,
# so it is not wrapped in print()
str(friend.data)
Summary of data in the data frame

 For an R data frame, the statistical summary and nature of the data can be
obtained by applying the summary() function. It is a generic function used
to produce summaries of data and of the results of various model fitting
functions. The function invokes particular methods which depend on the
class of the first argument.

# R program to get the
# summary of the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# using summary()
print(summary(friend.data))
Extract Data from Data Frame in R Language

 Extracting data from a data frame means accessing its rows or columns.
One can extract a specific column from a data frame using its column name.

# R program to extract
# data from the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# Extracting friend_name column
result <- data.frame(friend.data$friend_name)
print(result)
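Besides the $ operator, base R offers [[ ]] for a single column and [rows, columns] for rows, columns, or both. A minimal sketch on a copy of the same data (the object name friends and the result names are chosen here for illustration):

```r
friends <- data.frame(
  friend_id = 1:5,
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

# One column as a plain vector
names_vec <- friends[["friend_name"]]

# Rows 1 and 2, all columns
first_two <- friends[1:2, ]

# Rows that satisfy a condition, one column only
late_names <- friends[friends$friend_id > 3, "friend_name"]

print(names_vec)
print(first_two)
print(late_names)
```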
Expand Data Frame

 A data frame in R can be expanded by adding new columns and rows to the
already existing data frame.

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)

# Expanding data frame
friend.data$location <- c("Kolkata", "Delhi",
                          "Bangalore", "Hyderabad",
                          "Chennai")
resultant <- friend.data
# print the modified data frame
print(resultant)
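The example above adds a column. Rows can be added with base R's rbind(), provided the new row has the same column names. A small sketch with an illustrative two-row data frame:

```r
friends <- data.frame(
  friend_id = 1:2,
  friend_name = c("Sachin", "Sourav"),
  stringsAsFactors = FALSE
)

# The new row is itself a one-row data frame with matching columns
new_row <- data.frame(
  friend_id = 3L,
  friend_name = "Dravid",
  stringsAsFactors = FALSE
)

friends <- rbind(friends, new_row)
print(friends)
```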
 In R, data frames are generic data objects which are used to store
tabular data. Data frames are considered to be the most popular data objects
in R programming because it is more comfortable to analyze data in tabular
form. Data frames can also be thought of as matrices where each column can
be of a different data type. A data frame is made up of three principal
components: the data, rows, and columns.
Merging Data frames
In R, the merge() function is used to merge two data frames. merge() is part
of base R; the dplyr package separately provides join functions such as
inner_join(), semi_join() and anti_join(). The most important condition for
joining two data frames is that the columns on which the merging happens
should have the same type. merge() works similarly to a join in a DBMS.
Types of merging available in R are:
1. Natural Join or Inner Join
2. Left Outer Join
3. Right Outer Join
4. Full Outer Join
5. Cross Join
6. Semi Join
7. Anti Join
Basic syntax of merge() function in
R:

 Syntax:
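merge(x, y, by, by.x, by.y, all, all.x, all.y, sort = TRUE)
 Parameters:
x: one dataframe
y: another dataframe
by, by.x, by.y: The names of the columns used for merging. by is used when
the key columns have the same name in both data frames; by.x and by.y give
the key column names in x and in y when they differ.
all, all.x, all.y: Logical values that specify the type of merge, that is,
whether unmatched rows of both, of x, or of y are kept.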
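As an illustration of the key-column parameters, the sketch below merges two small made-up data frames whose key columns have different names, using the by.x and by.y arguments of base R's merge():

```r
# Illustrative data: the key is called StudentId in one frame and Id in the other
students <- data.frame(StudentId = c(101, 102, 103),
                       Subject = c("Hindi", "English", "Maths"))
homes <- data.frame(Id = c(102, 103, 104),
                    State = c("Mysore", "Pune", "Delhi"))

# Inner join on StudentId == Id; the result keeps the name from x
joined <- merge(students, homes, by.x = "StudentId", by.y = "Id")
print(joined)
```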
 First of all, we will create two dataframes that will
help us to understand each join easily.

df1 = data.frame(StudentId = c(101:106),
                 Product = c("Hindi", "English",
                             "Maths", "Science",
                             "Political Science",
                             "Physics"))
df1

df2 = data.frame(StudentId = c(102, 104, 106,
                               107, 108),
                 State = c("Manglore", "Mysore",
                           "Pune", "Dehradun",
                           "Delhi"))
df2
Natural Join or Inner Join

 Inner join keeps only those rows that match in both data frames; for
this, the argument all is left at its default, all = FALSE. In terms of
set theory, this is the intersection operation.
For example:
 A = [1, 2, 3, 4, 5] B = [2, 3, 5, 6] Then the output of natural join
will be (2, 3, 5)
 It is the simplest and most common type of join available in R.
Now let us try to understand this using R program:
# Joining of dataframes
df = merge(x = df1, y = df2, by = "StudentId")
df
Left Outer Join

 Left Outer Join includes all the rows of dataframe x and only those
from y that match; for this, we specify the argument all.x = TRUE. Rows of
x without a match get NA in the columns that come from y. In terms of set
theory, we are displaying the complete set x. Now let us try to understand
this using R program:
# R program to illustrate
# Joining of dataframes

df = merge(x = df1, y = df2, by = "StudentId",
           all.x = TRUE)
df
Right Outer Join

 Right Outer Join includes all the rows of dataframe y and only those
from x that match; for this, we specify the argument all.y = TRUE. In terms
of set theory, we are displaying the complete set y. Now let us try to
understand this using R program:
# Joining of dataframes

df = merge(x = df1, y = df2, by = "StudentId",
           all.y = TRUE)
df
Full Outer Join

 Full Outer Join keeps all rows from both dataframes; for this, we specify
the argument all = TRUE. In terms of set theory, we are performing the union
operation. Now let us try to understand this using R program:

# Joining of dataframes

df = merge(x = df1, y = df2, by = "StudentId",
           all = TRUE)
df
Cross Join

 A Cross Join, also known as a cartesian join, joins every row of one
dataframe to every row of the other dataframe. In set theory, this type of
join is known as the cartesian product between two sets. Now let us try to
understand this using R program:
df = merge(x = df1, y = df2, by = NULL)
df
Semi Join

 This join is somewhat like an inner join, but only the columns of the left
data frame are kept: it returns the rows of df1 that have a match in df2.
Now let us try to understand this using R program:
# R program to illustrate
# Joining of dataframes

# Import required library
library(dplyr)

df = df1 %>% semi_join(df2, by = "StudentId")
df
Anti Join

 In terms of set theory, an anti-join is the set difference operation. For
example, if A = (1, 2, 3, 4) and B = (2, 3, 5), then the output of A - B
will be the set (1, 4). This join is somewhat like df1 - df2, as it selects
all rows from df1 that are not present in df2. Now let us try to understand
this using R program:
# Import required library
library(dplyr)

df = df1 %>% anti_join(df2, by = "StudentId")
df
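If dplyr is not installed, the same row selections can be sketched in base R with %in%. This is an equivalent for single-key joins under the assumption that the key is one column; the data frames left and right are reduced, illustrative versions of df1 and df2.

```r
left  <- data.frame(StudentId = 101:106)
right <- data.frame(StudentId = c(102, 104, 106, 107, 108))

# Semi join: rows of left that have a match in right
semi <- left[left$StudentId %in% right$StudentId, , drop = FALSE]

# Anti join: rows of left that have no match in right
anti <- left[!(left$StudentId %in% right$StudentId), , drop = FALSE]

print(semi)
print(anti)
```

drop = FALSE keeps the result as a data frame even though only one column is present.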
Matrix v/s Dataframe
Any Function
The any function in R tells you whether ANY of the values in your vector
meet a given condition. It returns either TRUE or FALSE. To demonstrate this
function, let's create a quick vector that goes from -3 to 5, incrementing by
1.
y <- seq(-3, 5, by = 1)
Now we can use the any function. There are several ways to use this; the
simplest is to enter the function and provide the condition. In our case, we'll
check for any negative numbers (e.g., y < 0):
any(y < 0)
Since you only have to provide the name of the vector and the condition,
the iteration over the elements is implicit. After entering the previous code
and hitting Enter, R will display the following:
[1] TRUE
This means that the vector y contains negative values.
Another option is to create an if statement to check for any
negative values in the vector.
if(any(y < 0)) cat("Negative Values Found")
Instead of TRUE, the result will now display the message
defined in the if statement.
So far we have determined if any value is negative; next we
can check if ALL values meet the condition.
All function

We can run a test to see if all values meet a condition. Since
R is a tool for statistics and data science, you may not know
what values you have in a given vector.
The following code is a bit advanced: it draws 15 random values from a
normal distribution, rounds and sorts them, stores them in q, and shows
their range. It is displayed here to demonstrate the generation of a
sequence in which you don't know the end result.
range(q <- sort(round(stats::rnorm(15), 1)))
Now we can check to see if ALL the numbers are
negative. Here we can guess the answer, but as part of
advanced R data analysis, we may not know. This is
where the all function comes in handy: We can use it
just as we used the any function.
if(all(q < 0)) cat("All Negative Values Found")
In our case, the result of all(q < 0) is FALSE, and our little
message won't display.
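Real data often contains missing values, which affect any() and all(). The sketch below uses an illustrative vector vals and the na.rm argument that both base R functions accept:

```r
vals <- c(3, NA, 7)

# One known value already satisfies the condition, so the result is TRUE
r1 <- any(vals > 5)

# The missing value could break the claim, so the result is NA
r2 <- all(vals > 0)

# With na.rm = TRUE the missing value is ignored
r3 <- all(vals > 0, na.rm = TRUE)

print(r1)
print(r2)
print(r3)
```

An NA result inside if() stops with an error, so na.rm = TRUE or an explicit check with is.na() is usually needed before branching on such a test.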
Example

 employee data frame


 structure and summary
 access age & salary from employee
 add basic salary package 10000,12000,14000,16000,18000
 second data frame need to be created emp2 combination of employee
id, location, cab
 Merge the two dfs
 both matching keep employee df first & emp2 dataframe first
Taking Input from User in R
Programming
Developers often need to interact with users, either to
get data or to provide some sort of result. Most programs today
use a dialog box as a way of asking the user to provide some
type of input. As in other programming languages, in R it is also
possible to take input from the user. There are two methods for
doing so in R.

 Using readline() method
 Using scan() method
Using readline() method
 In R, the readline() method takes input in string format. If one inputs
an integer, it is read as a string: for example, 255 is read as "255". So
one needs to convert the input value to the required format. In this case,
the string "255" is converted to the integer 255. To convert the input value
to the desired data type, there are some functions in R:
 as.integer(n) -> convert to integer
 as.numeric(n) -> convert to numeric type (float, double etc)
 as.complex(n) -> convert to complex number (i.e 3+2i)
 as.Date(n) -> convert to date …, etc
 One can also show a message in the console window to tell the user
what to input. To do this, use the argument named prompt inside the
readline() function. The prompt argument is optional, and because it is
the first argument of readline() its name can be omitted.
 var1 = readline(prompt = "Enter any number : ");
or,
var1 = readline("Enter any number : ");

var1 = readline("Enter any Number: ")
var1 <- as.integer(var1)
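The conversion step can fail when the user types something that is not a number: as.integer() then returns NA with a warning. A minimal sketch of a defensive conversion follows; to_int() is a hypothetical helper written for this example, and fixed strings stand in for the text that readline() would return.

```r
# Hypothetical helper: convert text to integer, giving NA for invalid input
to_int <- function(text) {
  suppressWarnings(as.integer(text))
}

ok  <- to_int("255")   # valid input
bad <- to_int("abc")   # invalid input becomes NA

print(ok)
print(bad)

# Branch on the result before using it
if (is.na(bad)) {
  print("Please enter a whole number")
}
```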
Taking multiple inputs in R

 Taking multiple inputs in R is the same as taking a single input: just
define multiple readline() calls. One can wrap the calls in braces so
that they run as one block.
 var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
or,
{
var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
}

# converting each value
var1 = as.integer(var1);
var2 = as.integer(var2);
var3 = as.integer(var3);
var4 = as.integer(var4);

# print the sum of the 4 numbers
print(var1 + var2 + var3 + var4)
 string:
var1 = readline(prompt = "Enter your name : ");

 character:
var1 = readline(prompt = "Enter any character : ");
var1 = as.character(var1)

# string input
var1 = readline(prompt = "Enter your name : ");

# character input
var2 = readline(prompt = "Enter any character : ");
# convert to character (readline() already returns a character
# string, so this step is optional)
var2 = as.character(var2)

# printing values
print(var1)
print(var2)
Using scan() method

 Another way to take user input in R is the scan() method. This method
takes input from the console. It is very handy when inputs need to be taken
quickly for a mathematical calculation or for a dataset. This method reads
data in the form of a vector or list. It can also read input from a file.
 x = scan()
scan() keeps reading input until it is terminated; to end the input
process, press the Enter key twice on the console (that is, enter a
blank line).
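scan() can be tried without typing at the console by handing it a string through its text argument. The sketch below uses the text, what and quiet arguments of base R's scan(); the input strings are made up.

```r
# Numbers separated by white space are read into a numeric vector
nums <- scan(text = "3 5 7", quiet = TRUE)
print(nums)
print(sum(nums))

# 'what' sets the type of the values to read
words <- scan(text = "red green", what = character(), quiet = TRUE)
print(words)
```

With x = scan() at the console, the same values would be typed one per line or separated by spaces.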
Decision making is an important part of programming. This can be
achieved in R programming using the conditional if…else statement.

R if statement
The syntax of if statement is:
if (test_expression) {
statement
}

If the test_expression is TRUE, the statement gets executed. But if
it’s FALSE, nothing happens.

Here, test_expression should be a single logical or numeric value. Older
versions of R accepted a longer vector and used only the first element,
with a warning; since R 4.2.0 a condition of length greater than one is
an error.

In the case of a numeric value, zero is taken as FALSE, rest as TRUE.
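When the test involves a whole vector, it can be reduced to a single value with any() or all() before it reaches if. A short sketch with illustrative values, which also shows zero acting as FALSE:

```r
z <- c(5, -1)

# Reduce the vector comparison to one TRUE/FALSE value
any_pos <- any(z > 0)
all_pos <- all(z > 0)

if (any_pos) print("At least one positive number")
if (all_pos) print("All positive") else print("Not all positive")

# A numeric zero is treated as FALSE
count <- 0
zero_branch <- if (count) "non-zero" else "zero"
print(zero_branch)
```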


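Examples
x <- 5
if(x > 0){ print("Positive number") }

y <- -1
if(y > 0){ print("Positive number") }

# z > 0 has length 2, which if() rejects in R 4.2.0 and later;
# the first element is selected explicitly instead
z <- c(x,y)
if(z[1] > 0){ print("Positive number") }

m <- c(y,x)
if(m[1] > 0){ print("Positive number") }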
if…else statement

 The syntax of if…else statement is:
if (test_expression) {
statement1
} else {
statement2 }

 The else part is optional and is only evaluated if
test_expression is FALSE.

 It is important to note that else must be on the same
line as the closing brace of the if statement.

x <- -5
if(x > 0){
print("Non-negative number")
} else {
print("Negative number") }

x <- c("what","is","truth")
if("Truth" %in% x) {
print("Truth is found")
} else {
print("Truth is not found") }

x <- -5
y <- if(x > 0) 5 else 6
y
if…else Ladder

 The if…else ladder (if…else…if) statement allows you to execute a block
of code among more than 2 alternatives.
 The syntax of the if…else ladder is:
if ( test_expression1) {
statement1
} else if ( test_expression2) {
statement2
} else if ( test_expression3) {
statement3
} else {
statement4
}
 Only one statement will get executed
depending upon the test_expressions.

x <- 0
if (x < 0) {
print("Negative number")
} else if (x > 0) {
print("Positive number")
} else
print("Zero")

x <- c("what","is","truth")
if("Truth" %in% x) {
print("Truth is found the first time")
} else if ("truth" %in% x) {
print("truth is found the second time")
} else {
print("No truth found") }
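# if_else() comes from the dplyr package
library(dplyr)
x <- c(-5:5, NA)
if_else(x < 0, NA_integer_, x)
#> [1] NA NA NA NA NA 0 1 2 3 4 5 NA
if_else(x < 0, "negative", "positive", "missing")
# Unlike ifelse, if_else preserves types
# (sample() is random, so the values printed below vary from run to run)
x <- factor(sample(letters[1:5], 10, replace = TRUE))
ifelse(x %in% c("a", "b", "c"), x, factor(NA))
#> [1] 1 NA 1 NA 2 2 2 NA 1 1
if_else(x %in% c("a", "b", "c"), x, factor(NA))
#> [1] b <NA> b <NA> c c c <NA> b b
#> Levels: b c d e
# Attributes are taken from the `true` vector

# if() is not vectorised: this condition has length 100, so R 4.2.0 and later
# stop with an error (older versions used only the first element, with a warning)
x <- seq(0.1,10,0.1)
y <- if (x < 5) 1 else 2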
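The same contrast can be explored with base R alone, without installing dplyr. The sketch below shows that ifelse() tests every element of a vector and that, given a factor, it hands back the integer codes rather than the labels. The names f, codes and kept are illustrative.

```r
x <- seq(0.1, 10, 0.1)

# ifelse() is vectorised: one result per element of the condition
y <- ifelse(x < 5, 1, 2)
length(y) == length(x)

# With a factor, ifelse() returns the underlying integer codes, not the labels
f <- factor(c("a", "b", "c", "a"))
codes <- ifelse(f %in% c("a", "b"), f, NA)
print(codes)

# Converting to character first keeps the labels
kept <- ifelse(f %in% c("a", "b"), as.character(f), NA)
print(kept)
```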
R Factors and Tables

• One often has to deal with categorical variables in statistics
(i.e., variables at the nominal or ordinal level of
measurement). In R, these are best dealt with through the use
of factors.
• For example, fertilizers typically have three main ingredients,
nitrogen (N), phosphorus (P), and potassium (K). Perhaps one
is conducting an experiment to determine which of these
ingredients best promotes root development, and has four
treatment groups (one for each ingredient, and a control group
that receives none of the ingredients).
Plants numbered 1 through 12 are randomly assigned to one of the
four treatment groups so that each group ends up with 3 members. We
could represent this process with the vector named f, as shown below
-- where the treatment given to plant i corresponds to the i-th element
of the vector:
f = c("K","K","none","N","P","P","N","N","none","P","K","none")
To make R aware that the values listed are values associated with a
categorical variable (which are called levels in R), we convert this
vector into a factor with the factor() function:
fertilizer = factor(f)
Asking R to show the contents of f and fertilizer suggests
there is a subtle difference between the two variables, as
shown below:
f
[1] "K" "K" "none" "N" "P" "P" "N" "N" "none" "P" "K" "none"

fertilizer
[1] K K none N P P N N none P K none
Levels: K N none P
First, it is clear that R is no longer considering the
elements of the factor as strings of characters, given
the absence of double-quotes. Second (and more
importantly), additional information in the form of
"Levels: K N none P" is given. The levels shown
correspond to the unique values seen in the
vector f (i.e., the categories that represent the
treatment groups).
There are other differences between a vector and a factor, which we
can see if we use the str(x) function. This function in R displays a
compact representation of the internal structure of any R variable. Let's
see what happens when we apply it to both f and fertilizer:

str(f)
# chr [1:12] "K" "K" "none" "N" "P" "P" "N" "N" "none" "P" "K" "none"
str(fertilizer)
# Factor w/ 4 levels "K","N","none",..: 1 1 3 2 4 4 2 2 3 4 ...

Note how in the factor fertilizer, the levels "K", "N", "none", and "P" are
replaced by numbers 1, 2, 3, and 4, respectively. So internally, R only
stores the numbers (indicating the level of each vector element) and
(separately) the names of each unique level. (Interestingly, even if the
vector's elements had been numerical, the levels are stored as strings
of text.)
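The two parts of a factor described above can be inspected directly. This is a small sketch using the fertilizer example; num.fac is an illustrative name for a factor built from numbers. The order in which levels are listed can depend on the locale's sorting rules.

```r
f <- c("K","K","none","N","P","P","N","N","none","P","K","none")
fertilizer <- factor(f)

levels(fertilizer)       # the unique level names, stored once
as.integer(fertilizer)   # the integer code stored for each plant
table(fertilizer)        # how many plants fall in each group

# Numeric input: the levels are still stored as text
num.fac <- factor(c(10, 20, 10, 30))
levels(num.fac)                      # "10" "20" "30"
as.integer(num.fac)                  # the codes 1 2 1 3, not the original values
as.numeric(as.character(num.fac))    # recovers 10 20 10 30
```

Going through as.character() before as.numeric() is what recovers the original numbers; calling as.numeric() on the factor itself would return the codes.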
The way R internally stores factors is important when we want
to combine them. Consider the following attempt to combine
factors a.fac and b.fac, which fails in versions of R before 4.1.0:

a.fac = factor(c("X","Y","Z","X"))
b.fac = factor(c("X","X","Y","Y","Z"))
factor(c(a.fac,b.fac))
# [1] 1 2 3 1 1 1 2 2 3
# Levels: 1 2 3

Notice how we lost the names associated with the different
levels. (From R 4.1.0 onward, c() combines factors directly and
keeps the level names.)
There is a way to restore them -- but it would be better not to lose
them in the first place! The as.character() function can help here.
This function can be used to force a factor back into a vector
whose elements are the corresponding strings of text associated
with its levels. For
example, as.character(factor(c("X","Y"))) returns a vector
equivalent to c("X","Y").
To combine two factors (with the same levels), we force them both
back to vectors in the way just described, combine the vectors
with c(), and then convert the result back into a factor -- as shown
below:
a.fac = factor(c("X","Y","Z","X"))
b.fac = factor(c("X","X","Y","Y","Z"))
factor(c(as.character(a.fac),as.character(b.fac)))
One can also change the levels associated with a factor,
using levels() as the following suggests.
a.fac = factor(c("X","Y","Z","X"))
a.fac
# [1] X Y Z X Levels: X Y Z
levels(a.fac) = c("A","B","C")
a.fac
# [1] A B C A Levels: A B C
Tables
Factors can also be used to create tables in R, another important data
type in terms of its relationship to statistics.
As an example, suppose that a sample of 7 people is asked the
following questions in a study of workplace risk of tetanus infections:
•Q1: "Does your work put you in contact with soil, dust, or manure?"
(Always, Never, or Sometimes)
•Q2: "On any given day, is there a risk that you might be cut at work?"
(Yes, No, or Maybe)
The answers to each question for subjects 1 through 7 are given by the
following factors:
Q1 = factor(c("Sometimes","Sometimes","Never","Always","Always","Sometimes","Sometimes"))
Q2 = factor(c("Maybe","Maybe","Yes","Maybe","No","Yes","No"))
Thinking that there might be a relationship between these two
variables, we wish to construct a contingency table -- where the
levels of one variable form the column headers and the levels of
the other variable form the row headers, with the body of the
table indicating how many subjects were associated with each
possible pair of levels.
To create such a table in R, we simply use the table() command,
as shown below:
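A short sketch of the call, using the Q1 and Q2 factors defined above. The name tab is illustrative; it is chosen so as not to mask R's built-in t() function.

```r
Q1 <- factor(c("Sometimes","Sometimes","Never","Always","Always","Sometimes","Sometimes"))
Q2 <- factor(c("Maybe","Maybe","Yes","Maybe","No","Yes","No"))

# Levels of the first argument label the rows, levels of the second the columns
tab <- table(Q1, Q2)
print(tab)

dim(tab)   # 3 rows by 3 columns
sum(tab)   # 7, one count per subject
```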
Accessing the elements of the table
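Individual cells, rows and columns of a table can be picked out with the same bracket indexing used for matrices. A short sketch, rebuilding the table from the Q1 and Q2 factors given earlier (tab, cell, always_row and yes_col are illustrative names):

```r
Q1 <- factor(c("Sometimes","Sometimes","Never","Always","Always","Sometimes","Sometimes"))
Q2 <- factor(c("Maybe","Maybe","Yes","Maybe","No","Yes","No"))
tab <- table(Q1, Q2)

# A single cell, by level names or by position
cell <- tab["Sometimes", "Maybe"]
tab[3, 1]                      # same cell: 3rd row, 1st column

# A whole row or a whole column
always_row <- tab["Always", ]
yes_col <- tab[, "Yes"]

# Marginal totals
rowSums(tab)
colSums(tab)
```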
 One can also produce new tables from existing ones.
For example, suppose we wanted to see a table of
relative frequencies instead of counts. Much like one
might do with a vector, we simply divide the table by
the sum of its elements:
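A sketch of that division, again starting from the Q1 and Q2 factors given earlier. The names tab and rel are illustrative; prop.table() is base R's built-in way to do the same calculation and is shown for comparison.

```r
Q1 <- factor(c("Sometimes","Sometimes","Never","Always","Always","Sometimes","Sometimes"))
Q2 <- factor(c("Maybe","Maybe","Yes","Maybe","No","Yes","No"))
tab <- table(Q1, Q2)

# Relative frequencies: each count divided by the total number of subjects
rel <- tab / sum(tab)
round(rel, 3)
sum(rel)                      # the proportions add up to 1

# prop.table() gives the same result; margin = 1 gives proportions within each row
prop.table(tab)
prop.table(tab, margin = 1)
```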
Practice exercises
 Create an empty vector “a” and then add a sequence of 1 to 20 to it.
Print the vector before and after the addition has been made and
also check the class of vector a.
 Create three empty vectors named “a”, “b” and “d”. Print the
vectors and display their types. Include (10,20,14.5, 89.000) in “a”;
(TRUE, FALSE, FALSE) in “b” and (“Garima”, “Nishant”, “Tanaya”) in
“d” respectively using the c function. Check for the type and print
the vectors. Repeat the exercise using the index[].
 Using the above vectors, access first element of “a”, second
element of “b” and first and third elements of “d” separately. Modify
the third element of “b” vector as TRUE. Delete the vector “d” and
sort the vector a in a decreasing order and store the results in “A”.
Finding sum, mean and product of vector in R: practice
 Create a vector named vec with values 1:5. Calculate
sum(vec), mean(vec), prod(vec).
 sum(), mean(), and prod() methods are available in R which are
used to compute the specified operation over the arguments
specified in the method. In case a single vector is specified,
the operation is performed over its individual elements.
vec = c(1, 2, 3, 4, 5)
print("Sum of the vector:")
# inbuilt sum method
print(sum(vec))
# using inbuilt mean method
print("Mean of the vector:")
print(mean(vec))
# using inbuilt product method
print("Product of the vector:")
print(prod(vec))
Vector with NA values
 If the vector contains NA (missing) values, sum(), mean() and prod()
return NA unless na.rm = TRUE is supplied.
# declaring a vector
vec = c(1.1, NA, 2, 3.0, NA)
print("Sum of the vector:")
# inbuilt sum method
print(sum(vec))
# using inbuilt mean method
print("Mean of the vector with NA values:")
# not ignoring NA values
print(mean(vec))
# ignoring missing values
print("Mean of the vector without NA values:")
print(mean(vec, na.rm = TRUE))
# using inbuilt product method
print("Product of the vector:")
print(prod(vec))
# ignoring missing values
print("Sum of the vector without NA values:")
print(sum(vec, na.rm = TRUE))
# ignoring missing values
print("Product of the vector without NA values:")
print(prod(vec, na.rm = TRUE))
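Missing values can also be counted or removed by hand with is.na(), and it helps to keep NA (missing) apart from NaN (not a number). A brief sketch; n_missing, clean and nan_value are illustrative names.

```r
vec <- c(1.1, NA, 2, 3.0, NA)

is.na(vec)                      # TRUE at the missing positions
n_missing <- sum(is.na(vec))    # number of missing values
clean <- vec[!is.na(vec)]       # the vector with missing values dropped
mean(clean)                     # same as mean(vec, na.rm = TRUE)

# NaN arises from undefined arithmetic and is a different value from NA
nan_value <- 0 / 0
is.nan(nan_value)   # TRUE
is.na(nan_value)    # TRUE: is.na() also flags NaN
is.nan(NA)          # FALSE: NA is missing, not NaN
```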
Practice

 Create a Vector X with these values: 1.1, 2, 3.0, 5.7 and repeat the
exercise.
 Create a Vector Y with these values: 7,NA,9,8,NA,75,NA,65 and
repeat the exercise.
 Make two vectors A and B, A being a sequence of range 1:10 and B
in the range of 10:15 sorted in decreasing order. Calculate A+B, A-B,
B-A, A*B, A/B, B/A, A^B, B^A, and the remainder and quotient of A/B
and B/A.
 Make two vectors C and D, C being a sequence of range 11:20 and
D in the range of 5:14 sorted in decreasing order. Repeat the
Matrix exercise

 Create a matrix data1 having 5 rows with a sequence of 1 to 10.
Create a matrix data2 having 5 rows with a sequence of 11 to
20. Print the two matrices and calculate
A+B, A-B, B-A, A*B, A/B, B/A, where A is data1 and B corresponds
to data2.
 Create data3 as 21:30 in the similar manner. Calculate product
of all three matrices.
 Create a vector “a” using 3, 4, 5, 6, 7, 8 and vector “b” using 1,
3, 0, 7, 8, 5. Make a matrix M using vector a having 3 columns
and allocate the values row wise. Make a matrix N using vector
 Create a 4*4 matrix X using numbers from 1 to 16. Name the
rows as a, b, c and d respectively. Name the columns as m, n, o and
p. Access the 3rd row and the 2nd column separately. Access the 3rd row
2nd column element. Access the 1st and 3rd rows together.
Change the 2nd row 2nd element to 50.
 Make another 4*1 matrix S using values 17,18,19,20. Combine S
matrix into the matrix X. Make another 1*4 matrix R using values
21,22,23,24. Combine R matrix into the matrix X. Delete the 2nd
row and 2nd column of matrix X.
 Create a 4*4 matrix Y using 33:48. Name the rows as e, f, g and h
respectively. Name the columns as q, r, s and t. Access the 2nd row and
the 1st column separately. Access the 2nd row 3rd column element.
Access the 1st and 3rd columns together. Change the 2nd row 1st
element to 1.
 Calculate X+Y,X-Y,Y-X, X*Y,X/Y, Y/X
List examples

 Create a random list of 10 elements using a sequence of
numbers from 1 to 10 without replacement. Repeat the same
exercise with replacement.
 Create a random list of 15 elements using a sequence of
numbers from 1 to 5 with associated probabilities as
0.02,0.2,0.25,0.5,0.9.
 Create a list of 10 elements using a sequence of numbers from 1
to 10. Convert this list into an array of dimensions 3*3*3.
 Create a list using 3 names: Nitin, Priyanshu, Harshal. Convert
this list to array of dimensions 1*3*2
List2 examples

 Create a list of 3 components, element 1 as a sequence
of 1 to 5; False and True as the second element and
letters from d to i as third element of the list.
 Now access second element components; TRUE; third
element third value; G.
 Delete 3 from the list and print the results.
 Add an element 4 “Sun”, “Mon” “Tue” in the previous
list and print the results.
 Create a dataframe with id 1 to 5 and names
“Pulkita”, “Ritesh”, “Ashish”, “Rahul”, “Akansha”.
 Add the programme name PGDDA to the existing
data frame and print the results. Add the course code
101,103,105,107 and 109 to the dataframe. Get the
structure and summary of the dataframe.
 Construct a data frame named authors and books like this.
Merge the two by author id
