Unit 1 Notes: R Programming
Evolution of R
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New
Zealand, and is currently developed by the R Development Core Team. R is freely available
under the GNU General Public License, and pre-compiled binary versions are provided for
various operating systems such as Linux, Windows and macOS. The language was named R
partly after the first letter of its two authors' first names (Robert Gentleman and Ross
Ihaka) and partly as a play on the name of the Bell Labs language S. R combines the
syntax of S with lexical scoping semantics inspired by Scheme. The project was conceived in
1992, with an initial version released in 1995 and a stable version (1.0.0) in 2000.
Why R Programming Language?
R is a leading tool for machine learning, statistics, and data
analysis. Objects, functions, and packages can easily be created in R.
It is a platform-independent language, which means it runs on all major operating
systems.
It is a free, open-source language, so anyone can install it in any organization
without purchasing a license.
R is not only a statistics package; it can also be integrated
with other languages (C, C++), so you can easily interact with many data sources and
statistical packages.
R has a vast community of users, and it is growing day by day.
R is currently one of the most requested programming languages in the data science job
market.
Features of R
R is a domain-specific programming language designed for data analysis. It has some
unique features that make it very powerful, the most important arguably being its
vectors, which allow us to perform a complex operation on a whole set of values in a
single command. The main features of R are:
R is a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions, and input and output facilities.
R has effective data handling and storage facilities.
R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
R provides a large, coherent and integrated collection of tools for data analysis.
R provides highly extensible graphical facilities for data analysis and display, either
directly on screen or on paper.
R is open-source, powerful, and highly extensible software.
R allows us to perform multiple calculations at once using vectors.
R is an interpreted language.
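The point about vectors can be illustrated with a minimal sketch. The data and the variable names (prices, doubled, total, average) are made up for illustration only.

```r
# Hypothetical sample data: prices of four items
prices <- c(10, 20, 30, 40)

# One command applies the operation to every element of the vector
doubled <- prices * 2

# Functions such as sum() and mean() also work on the whole vector at once
total <- sum(prices)
average <- mean(prices)

print(doubled)   # [1] 20 40 60 80
print(total)     # [1] 100
print(average)   # [1] 25
```

No explicit loop is needed; the arithmetic operator is applied element by element.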
Applications of R:
R is widely used for data science. It offers a broad variety of libraries related to
statistics and provides an environment for statistical computing and design.
Many quantitative analysts use R as their programming tool; it helps with data
importing and cleaning.
R is popular among data analysts and research programmers,
and it is also used as a fundamental tool in finance.
Companies such as Google, Facebook, Bing, Twitter, Accenture and Wipro are reported to
use R.
Variables in R
Previously, we wrote all our values directly inside print(), so we had no way to refer to
them again to perform further operations. This problem is solved by variables, which, as in
any other programming language, are names given to reserved memory locations that can
store any type of data. In R, assignment can be written in three ways:
1. = (Simple Assignment)
2. <- (Leftward Assignment)
3. -> (Rightward Assignment)
Example output:
"Simple Assignment"
"Leftward Assignment!"
"Rightward Assignment"
The rightward assignment is less common and can be confusing for some programmers,
so it is generally recommended to use the <- or = operator for assigning values in R.
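The three assignment forms can be sketched as follows. The variable names var1, var2 and var3 are hypothetical; the strings are taken from the output listed above.

```r
# Hypothetical variable names, chosen for illustration
var1 = "Simple Assignment"        # simple assignment
var2 <- "Leftward Assignment!"    # leftward assignment
"Rightward Assignment" -> var3    # rightward assignment

print(var1)
print(var2)
print(var3)
```

All three forms bind a value to a name; they differ only in the direction in which the statement is written.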
R Script File
Usually, you will write your programs in script files and
then execute those scripts at the command prompt with the help of the R interpreter
called Rscript. So let's start by writing the following code in a text file called test.R:
# myString must be defined before it is printed; the value is an example
myString <- "Hello, World!"
print(myString)
Save the above code in the file test.R and execute it at the Linux command prompt as given
below. The syntax remains the same on Windows and other systems.
$ Rscript test.R
When we run the above program, it produces the following result.
[1] "Hello, World!"
Comments in R
Comments are programmer-readable explanations in the source code of an R program. Their
purpose is to make the code easier to understand; they are meant only for the human reader,
so the interpreter ignores them. R supports only single-line comments, which are written by
placing # at the beginning of the statement. R has no multi-line comment syntax, but a
block of text can be skipped by wrapping it in a block whose condition is false, such as
if (FALSE) { ... }. The contents of such a block must still be valid R, for example a quoted
string.
Output of an example program containing both kinds of comment and a single print() call:
[1] "This is fun!"
From the above output, we can see that both comments were ignored by the interpreter.
Single-line comment
# My first program in R programming
string <- "Hello World!"  # example value
print(string)
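The false-block trick mentioned above might look like the sketch below. The variable name message_text is hypothetical; the printed string matches the output shown earlier.

```r
# A single-line comment starts with '#'

if (FALSE) {
  "Everything inside this block is skipped at run time,
   so it behaves like a multi-line comment."
}

message_text <- "This is fun!"
print(message_text)   # [1] "This is fun!"
```

Because the condition is always FALSE, the quoted text is parsed but never evaluated.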
Keywords in R
Keywords are words reserved by the language because they have a special meaning, so a
keyword can't be used as a variable name, function name, etc. We can view these keywords
by using either help(reserved) or ?reserved.
if, else, repeat, while, function, for, in, next and break are used for control-flow
statements and for declaring user-defined functions.
The remaining ones are constants: TRUE and FALSE are the Boolean constants.
NaN represents a Not a Number value, and NULL represents an undefined (empty) value.
Inf is used for infinite values.
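As a small illustration of the constant keywords, the sketch below produces each special value and checks it with the matching base R test function. The variable names are hypothetical.

```r
is_nan  <- is.nan(0 / 0)        # 0/0 is not a number
is_inf  <- is.infinite(1 / 0)   # 1/0 is infinite
is_null <- is.null(NULL)        # NULL is the empty/undefined object

print(c(is_nan, is_inf, is_null))   # [1] TRUE TRUE TRUE
```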
Variables in R Programming
Variables are used to store information to be manipulated and referenced in an R
program. An R variable can store an atomic vector, a group of atomic vectors, or a
combination of many R objects.
Name of variable      Validity   Reason
_var_name             Invalid    A variable name can't start with an underscore (_).
.var_name, var.name   Valid      A name can start with a dot, but the dot must not be followed by a digit.
var_name%             Invalid    No special characters other than dot and underscore are allowed.
2var_name             Invalid    A variable name can't start with a digit.
.2var_name            Invalid    A variable name can't start with a dot followed by a digit.
var_name2             Valid      The name contains letters, a digit and an underscore, and starts with a letter.
Languages like C++ are statically typed, but R is dynamically typed, which means the type
of a value is checked when the statement is run. A valid variable name consists of letters,
digits, dots and underscores. It should start with a letter, or with a dot not followed
by a digit.
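One way to check these naming rules programmatically is sketched below. It assumes that a name is syntactically valid exactly when base R's make.names() returns it unchanged; the helper is_valid_name is hypothetical. Note that this check also rejects reserved keywords such as if.

```r
# Hypothetical helper: a name is valid when make.names() does not need to alter it
is_valid_name <- function(name) make.names(name) == name

candidates <- c("_var_name", ".var_name", "var.name", "var_name%",
                "2var_name", ".2var_name", "var_name2")
validity <- vapply(candidates, is_valid_name, logical(1))
print(validity)

# Dynamic typing: the same name can hold values of different types over time
x <- 124
first_class <- class(x)     # "numeric"
x <- "Learn R Programming"
second_class <- class(x)    # "character"
```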
Assignment of variable
In R programming, there are three operators we can use to assign values to a
variable: the leftward (<-), rightward (->), and equals (=) operators. Two
functions are used to print the value of a variable: print() and cat(). The cat()
function combines multiple values into one continuous line of output.
# Assignment using equal operator.
variable.1 = 124
# Assignment using leftward operator.
variable.2 <- "Learn R Programming"
# Assignment using rightward operator.
133L -> variable.3
print(variable.1)
cat ("variable.1 is ", variable.1 ,"\n")
cat ("variable.2 is ", variable.2 ,"\n")
cat ("variable.3 is ", variable.3 ,"\n")
Operators in R
In computer programming, an operator is a symbol that represents an action: it tells
the interpreter to perform a specific logical or mathematical manipulation.
R is very rich in built-in operators, and each type of operator performs a different
task. For data manipulation, there are also some advanced operators, such as the model
formula and list indexing operators. The following
types of operators are used in R:
1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators
5. Miscellaneous Operators
1) Arithmetic Operators:
Arithmetic operators perform math operations such as addition, subtraction,
multiplication, division, and modulo between operands, which
may be scalar values, complex numbers, or vectors. On vectors, the operators work
element-wise at the corresponding positions.
The examples below all use the following two vectors:
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)
1. + (Addition): Adds two vectors element by element.
print(a+b)
Output:
[1] 13.0 8.3 7.0
2. - (Subtraction): Subtracts the second vector from the first.
print(a-b)
Output:
[1] -9.0 -1.7 1.0
3. * (Multiplication): Multiplies two vectors element by element.
print(a*b)
Output:
[1] 22.0 16.5 12.0
4. / (Division): Divides the first vector by the second.
print(a/b)
Output:
[1] 0.1818182 0.6600000 1.3333333
5. %% (Modulus): Gives the remainder of dividing the first vector by the second.
print(a%%b)
Output:
[1] 2.0 3.3 1.0
6. %/% (Integer Division): Gives the quotient of dividing the first vector by the second.
print(a%/%b)
Output:
[1] 0 0 1
7. ^ (Exponent): Raises the first vector to the power of the second.
print(a^b)
Output:
[1] 2048.0000 391.3539 64.0000
2) Relational Operators
A relational operator is a symbol which defines some kind of relation between two entities.
These include numerical equalities and inequalities. A relational operator compares each
element of the first vector with the corresponding element of the second vector. The result of
the comparison will be a Boolean value. There are the following relational operators which
are supported by R:
Each operator below works element by element and returns one logical value per position.
1. > (Greater than): Returns TRUE at each position where the element of the first
vector is greater than the corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 4, 6)
print(a>b)
Output:
[1] FALSE FALSE FALSE
2. < (Less than): Returns TRUE at each position where the element of the first
vector is less than the corresponding element of the second vector.
a <- c(1, 9, 5)
b <- c(2, 4, 6)
print(a<b)
Output:
[1] TRUE FALSE TRUE
3. <= (Less than or equal to): Returns TRUE at each position where the element of the
first vector is less than or equal to the corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a<=b)
Output:
[1] TRUE TRUE TRUE
4. >= (Greater than or equal to): Returns TRUE at each position where the element of the
first vector is greater than or equal to the corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a>=b)
Output:
[1] FALSE TRUE FALSE
5. == (Equal to): Returns TRUE at each position where the element of the first
vector is equal to the corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a==b)
Output:
[1] FALSE TRUE FALSE
6. != (Not equal to): Returns TRUE at each position where the element of the first
vector is not equal to the corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a!=b)
Output:
[1] TRUE FALSE TRUE
3) Logical Operators
The logical operators allow a program to make a decision on the basis of multiple conditions.
In the program, each operand is considered as a condition which can be evaluated to a false or
true value. The value of the conditions is used to determine the overall value of the op1
operator op2. Logical operators are applicable to those vectors whose type is logical,
numeric, or complex. The logical operator compares each element of the first vector with the
corresponding element of the second vector. There are the following types of operators which
are supported by R:
1. & (Logical AND operator): Compares the vectors element by element and returns TRUE
at each position where both elements are TRUE (non-zero).
a <- c(3, 0, TRUE, 2+2i)
b <- c(2, 4, TRUE, 2+3i)
print(a&b)
Output:
[1] TRUE FALSE TRUE TRUE
2. | (Logical OR operator): Compares the vectors element by element and returns TRUE
at each position where at least one of the elements is TRUE.
a <- c(3, 0, TRUE, 2+2i)
b <- c(2, 4, TRUE, 2+3i)
print(a|b)
Output:
[1] TRUE TRUE TRUE TRUE
3. ! (Logical NOT operator): Takes each element of the vector and gives the opposite
logical value as a result.
a <- c(3, 0, TRUE, 2+2i)
print(!a)
Output:
[1] FALSE TRUE FALSE FALSE
4. && (Logical AND operator): Works on single values rather than whole vectors and gives
TRUE only if both are TRUE. Older versions of R silently used only the first element of
longer vectors; since R 4.3.0, operands of length greater than one are an error, so the
first elements are selected explicitly here.
a <- c(3, 0, TRUE, 2+2i)
b <- c(2, 4, TRUE, 2+3i)
print(a[1]&&b[1])
Output:
[1] TRUE
5. || (Logical OR operator): Works on single values and gives TRUE if at least one of
them is TRUE. The same length restriction applies as for &&.
a <- c(3, 0, TRUE, 2+2i)
b <- c(2, 4, TRUE, 2+3i)
print(a[1]||b[1])
Output:
[1] TRUE
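The scalar operators && and || are mostly used in conditions, where they evaluate the right-hand side only when needed. The sketch below illustrates that behaviour; the helper safe_ratio is hypothetical.

```r
# Hypothetical helper: divide x by y only when y is a non-zero number
safe_ratio <- function(x, y) {
  # && stops at the first FALSE, so y != 0 is only evaluated for numeric y
  if (is.numeric(y) && y != 0) x / y else NA
}

r1 <- safe_ratio(10, 2)     # 5
r2 <- safe_ratio(10, 0)     # NA, because y is zero
r3 <- safe_ratio(10, "a")   # NA, because y is not numeric
```

The element-wise operators & and | always evaluate both operands, which is why they suit vector filtering rather than if conditions.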
4) Assignment Operators
Assignment operators in R are used to assign values to various data objects. The
objects may be integers, vectors, or functions. The values are then stored under the assigned
variable names. There are two kinds of assignment operators: left and right.
1. <-, = and <<- (Left assignment): Assign the value on the right to the name on the left.
The <<- form assigns in an enclosing environment (the global one if no enclosing
definition exists) rather than in the current one.
a <- c(3, 0, TRUE, 2+2i)
b <<- c(2, 4, TRUE, 2+3i)
d = c(1, 2, TRUE, 2+3i)
print(a)
print(b)
print(d)
Output:
[1] 3+0i 0+0i 1+0i 2+2i
[1] 2+0i 4+0i 1+0i 2+3i
[1] 1+0i 2+0i 1+0i 2+3i
2. -> and ->> (Right assignment): Assign the value on the left to the name on the right.
c(3, 0, TRUE, 2+2i) -> a
c(2, 4, TRUE, 2+3i) ->> b
print(a)
print(b)
Output:
[1] 3+0i 0+0i 1+0i 2+2i
[1] 2+0i 4+0i 1+0i 2+3i
5) Miscellaneous Operators
These operators serve special purposes such as generating sequences, testing membership,
and multiplying matrices.
1. : (Colon operator): Creates a series of numbers in sequence for a vector.
v <- 2:8
print(v)
Output:
[1] 2 3 4 5 6 7 8
2. %in% (%in% operator): Identifies whether an element belongs to a vector.
v1 <- 8
v2 <- 12
t <- 1:10
print(v1 %in% t)
print(v2 %in% t)
Output:
[1] TRUE
[1] FALSE
3. %*% (%*% operator): Performs matrix multiplication. The number of columns of the
first matrix must be equal to the number of rows of the second matrix. The example
multiplies a matrix by its transpose, which is obtained by interchanging rows and
columns; the product of a matrix and its transpose is a square matrix.
M = matrix(c(2,6,5,1,10,4), nrow = 2, ncol = 3, byrow = TRUE)
t = M %*% t(M)
print(t)
Output:
     [,1] [,2]
[1,]   65   82
[2,]   82  117
2. List
In R, a list is a container. Unlike an atomic vector, a list is not restricted to a single
mode.
A list can contain a mixture of data types. Lists are also known as generic vectors because
the elements of a list can be any type of R object. "A list is a special type of vector in
which each element can be a different type." We can create a list with the help of list() or
as.list(). We can use vector() to create an empty list of a required length.
3. Arrays
Arrays are another type of data object, which can store data in more than two dimensions.
"An array is a collection of a similar data type with contiguous memory
allocation." For example, if we create an array of dimension (2, 3, 4), it consists of four
rectangular matrices, each with two rows and three columns. In R, an array is created with
the array() function. This function takes a vector as input and uses the values in the dim
parameter to create the array.
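The (2, 3, 4) case described above can be sketched as follows. The data 1:24 and the name arr are chosen only for illustration.

```r
# 24 values arranged as four 2 x 3 matrices
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))      # [1] 2 3 4
print(arr[, , 1])    # the first 2 x 3 matrix
last_value <- arr[2, 3, 4]   # last row, last column, last matrix
print(last_value)    # [1] 24
```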
4. Matrices
A matrix is an R object in which the elements are arranged in a two-dimensional rectangular
layout. All elements of a matrix are of the same atomic type. For mathematical
calculations, we use a matrix containing numeric elements. A matrix is created with the
help of the matrix() function in R.
Syntax
The basic syntax of creating a matrix is as follows:
matrix(data, nrow, ncol, byrow, dimnames)
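A minimal sketch of the matrix() arguments follows. The data and the row and column names are made up for illustration.

```r
# Fill a 2 x 3 matrix row by row and name its rows and columns
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))
print(m)

# Elements can be accessed by position or by name
by_position <- m[2, 1]
by_name <- m["r2", "c1"]
```

With byrow = TRUE the first row is 1 2 3 and the second is 4 5 6; with the default byrow = FALSE the values would fill column by column instead.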
5. Data Frames
A data frame is a two-dimensional array-like structure, or table, in which each
column contains the values of one variable and each row contains one set of values, one from
each column.
A data frame has the following characteristics:
1. The column names should be non-empty.
2. The row names should be unique.
3. The data stored in a data frame can be of numeric, factor or character type.
4. Each column should contain the same number of data items.
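The characteristics above can be seen in a small sketch. The column names and values are hypothetical sample data.

```r
# Each column is one variable; every column has the same number of items
students <- data.frame(
  name  = c("Shubham", "Arpita", "Nishka"),
  marks = c(40, 80, 60),
  stringsAsFactors = FALSE
)

str(students)                      # structure: 3 observations of 2 variables
n_rows <- nrow(students)           # 3
n_cols <- ncol(students)           # 2
avg_marks <- mean(students$marks)  # 60
```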
6. Factors
Factors are data objects used to categorize data and store it as levels. Factors
can store both strings and integers. They are useful for columns that have a limited number
of unique values, and are widely used in data analysis for statistical modelling.
Factors are created with the help of the factor() function by taking a vector as an input
parameter.
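A short sketch of creating a factor from a character vector follows. The data and names are made up for illustration.

```r
sizes <- c("small", "large", "medium", "small", "large")
size_factor <- factor(sizes)

# Levels are the distinct values, sorted alphabetically by default
print(levels(size_factor))   # [1] "large"  "medium" "small"

# Internally each value is stored as the integer code of its level
codes <- as.integer(size_factor)
print(codes)                 # [1] 3 1 2 3 1

# table() counts how many values fall in each level
print(table(size_factor))
```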
Vectors
R vectors are similar to arrays in the C language: they hold multiple data
values of the same type. One key difference is that in R, vector indexing starts
from 1 and not from 0. We can create numeric vectors and character vectors as well.
The length is an important property of a vector. A vector's length is the number of
elements in the vector, and it is obtained with the length() function. Vectors are
classified into two kinds: atomic vectors and lists. They have three common properties:
a type (typeof()), a length (length()), and attributes (attributes()). The main difference
between atomic vectors and lists is that in an atomic vector all the elements are of the same
type, whereas in a list the elements can be of different data types. In this section, we
discuss only atomic vectors. Lists are discussed in a later topic.
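The three common properties can be inspected as in the sketch below. The vector v and the list l are hypothetical examples.

```r
v <- c(a = 1.5, b = 2.5, c = 3.5)   # atomic vector with names
l <- list(1, "two", TRUE)           # list with mixed element types

print(typeof(v))       # [1] "double"
print(typeof(l))       # [1] "list"
print(length(v))       # [1] 3
print(attributes(v))   # the names attribute

# Indexing starts at 1
first <- v[1]
```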
Atomic vectors in R
In R, there are four commonly used types of atomic vectors: numeric, integer, character,
and logical. Atomic vectors play an important role in data science and are created with the
help of the c() function. The numeric, integer, and character types are described below.
Numeric vector
The decimal values are known as numeric data types in R. If we assign a decimal value to
any variable d, then this d variable will become a numeric type. A vector which contains
numeric elements is known as a numeric vector.
Example:
d <- 45.5
num_vec <- c(10.1, 10.2, 33.2)
d
num_vec
class(d)
class(num_vec)
Output
[1] 45.5
[1] 10.1 10.2 33.2
[1] "numeric"
[1] "numeric"
Integer vector
A non-fractional numeric value is known as integer data. R stores each integer in 32 bits
(4 bytes). There are two ways to assign an integer
value to a variable: by using the as.integer() function, or by appending L to the value. A
vector which contains integer elements is known as an integer vector.
Example:
d <- as.integer(5)
e <- 5L
int_vec <- c(1,2,3,4,5)
int_vec <- as.integer(int_vec)
int_vec1 <- c(1L,2L,3L,4L,5L)
class(d)
class(e)
class(int_vec)
class(int_vec1)
Output
[1] "integer"
[1] "integer"
[1] "integer"
[1] "integer"
Character vector
Character data holds text (strings). In R, there are two different ways to
create a character value: by using the as.character() function, or by typing a string
between double quotes ("") or single quotes (''). A vector which contains character elements
is known as a character vector.
Example:
d <- 'shubham'
e <- "Arpita"
f <- 65
f <- as.character(f)
d
e
f
char_vec <- c(1,2,3,4,5)
char_vec <- as.character(char_vec)
char_vec1 <- c("shubham","arpita","nishka","vaishali")
char_vec
char_vec1
class(d)
class(e)
class(f)
class(char_vec)
class(char_vec1)
Output
[1] "shubham"
[1] "Arpita"
[1] "65"
[1] "1" "2" "3" "4" "5"
[1] "shubham" "arpita" "nishka" "vaishali"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
Repeating Values:
You can create a vector by repeating values using the rep() function.
Ex:
# Creates a vector with five 1s
r <- rep(1, times=5)
r
Output:
[1] 1 1 1 1 1
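Besides times, rep() also accepts an each argument. The sketch below contrasts the two; the variable names are hypothetical.

```r
# times repeats the whole vector
whole <- rep(c(1, 2), times = 3)
print(whole)      # [1] 1 2 1 2 1 2

# each repeats every element before moving to the next one
per_item <- rep(c(1, 2), each = 3)
print(per_item)   # [1] 1 1 1 2 2 2
```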
Vector Function:
Vector Length
To find out how many items a vector has, use the length() function
Ex:
fruits <- c("banana", "apple", "orange")
length(fruits)
Output: [1] 3
Sort a Vector:
To sort items in a vector alphabetically or numerically, use the sort() function:
Ex:
fruits <- c("banana", "apple", "orange")
n <- c(5, 6, 1, 8)
sort(fruits)
sort(n)
Output:
[1] "apple" "banana" "orange"
[1] 1 5 6 8
Change an item:
To change the value of a specific item, refer to the index number
Ex:
fruits <- c("banana", "apple", "orange")
fruits[2] <- "Mango"
fruits
Output:
[1] "banana" "Mango" "orange"
Modifying a R vector
Modifying a vector means applying some operation to individual elements of the vector to
change their values. There are different ways to modify a vector:
# R program to modify elements of a Vector
# Creating a vector
X <- c(2, 7, 9, 7, 8, 2)
# Modify specific elements using the subscript operator
X[3] <- 1
X[2] <- 9
cat('subscript operator', X, '\n')
# Modify a range of positions at once
X[1:5] <- 0
cat('index range', X, '\n')
# Select and reorder elements by specifying
# their positions.
X <- X[c(3, 2, 1)]
cat('combine() function', X)
Output:
subscript operator 2 9 1 7 8 2
index range 0 0 0 0 0 2
combine() function 0 0 0
Deleting a R vector
Deleting a vector means removing all of its elements. This can be
done by assigning NULL to it.
# R program to delete a Vector
# Creating a Vector
M <- c(8, 10, 2, 5)
# set NULL to the vector
M <- NULL
# print() shows NULL; cat() would print nothing for a NULL value
print(M)
Output:
NULL
Sorting elements of a R Vector
The sort() function sorts the values of a vector in ascending or descending order.
# R program to sort elements of a Vector
# Creation of Vector
X<- c(8, 2, 7, 1, 11, 2)
# Sort in ascending order
A<- sort(X)
cat('ascending order', A, '\n')
# sort in descending order
# by setting decreasing as TRUE
B<- sort(X, decreasing = TRUE)
cat('descending order', B)
Output:
ascending order 1 2 2 7 8 11
descending order 11 8 7 2 2 1
Lists
A list in R is a generic object consisting of an ordered collection of objects. Lists are
one-dimensional, heterogeneous data structures: a list is a vector whose elements can be of
different types, such as numbers, strings, vectors, matrices, functions, and even other
lists. A list is created with the list() function. R allows accessing
the elements of a list by index value. In R, list indexing starts at 1,
not 0 as in many other programming languages.
Example
vec <- c(3,4,5,6)
char_vec <- c("shubham","nishka","gunjan","sumit")
logic_vec <- c(TRUE,FALSE,FALSE,TRUE)
out_list <- list(vec,char_vec,logic_vec)
out_list
Output:
[[1]]
[1] 3 4 5 6
[[2]]
[1] "shubham" "nishka" "gunjan" "sumit"
[[3]]
[1] TRUE FALSE FALSE TRUE
List Functions:
R provides various functions for working with lists, including:
length(): Returns the number of elements in a list.
names(): Returns or sets the names of the elements in a list.
str(): Displays the structure of a list, showing its elements and data types.
unlist():Converts a list to a vector by flattening it.
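The four functions listed above can be tried on a small list, as in this sketch. The list person and its element names are hypothetical.

```r
person <- list("Shubham", 22.5, TRUE)

n <- length(person)                               # number of elements: 3
names(person) <- c("name", "score", "passed")     # set element names
str(person)                                       # show structure and element types

# Flattening mixed types coerces everything to a common type (character here)
flat <- unlist(person)
print(flat)
```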
Lists creation
The process of creating a list is similar to creating a vector. In R, a vector is created
with the c() function; similarly, the list() function is used to create a
list. A list avoids the main restriction of a vector, namely that all elements must share one
data type: we can add elements of different data types to a list.
Syntax
list()
Example 1: Creating list with same data type
list_1 <- list(1,2,3)
list_2 <- list("Shubham","Arpita","Vaishali")
list_3 <- list(c(1,2,3))
list_4 <- list(TRUE,FALSE,TRUE)
list_1
list_2
list_3
list_4
Output:
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] "Vaishali"
[[1]]
[1] 1 2 3
[[1]]
[1] TRUE
[[2]]
[1] FALSE
[[3]]
[1] TRUE
Example 2: Creating the list with different data type
list_data <- list("Shubham","Arpita",c(1,2,3,4,5),TRUE,FALSE,22.5,12L)
print(list_data)
In the above example, the list() function creates a list with character, logical, numeric,
integer, and vector elements. It gives the following output.
Output:
[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] 1 2 3 4 5
[[4]]
[1] TRUE
[[5]]
[1] FALSE
[[6]]
[1] 22.5
[[7]]
[1] 12
Giving a name to list elements
R provides a very easy way to access elements: giving a name to each element
of a list. Once names are assigned, we can access the elements easily. There are
only three steps to print the list data corresponding to the names:
1. Create a list.
2. Assign names to the list elements with the help of the names() function.
3. Print the list data.
Let us see an example to understand how we can give names to the list elements.
Example
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Nishka","Gunjan"), matrix(c(40,80,60,70,90,80), nrow = 2),
list("BCA","MCA","B.tech"))
# Giving names to the elements in the list.
names(list_data) <- c("Students", "Marks", "Course")
# Show the list.
print(list_data)
Output:
$Students
[1] "Shubham" "Nishka" "Gunjan"
$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B.tech"
Accessing List Elements
R provides two ways to access the elements of a list. The first is the
indexing method, performed in the same way as for a vector. The second is to access the
elements by name. This is possible only with a named list; we
cannot access elements by name if the list has no names.
Let us see an example of both methods to understand how they are used to access list
elements.
Example 1: Accessing elements using index
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80), nrow = 2),
list("BCA","MCA","B.tech"))
# Accessing the first element of the list.
print(list_data[1])
# Accessing the third element. The third element is also a list, so all its elements will be printed.
print(list_data[3])
Output:
[[1]]
[1] "Shubham" "Arpita" "Nishka"
[[1]]
[[1]][[1]]
[1] "BCA"
[[1]][[2]]
[1] "MCA"
[[1]][[3]]
[1] "B.tech"
Example 2: Accessing elements using names
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80), nrow = 2),
list("BCA","MCA","B.tech"))
# Giving names to the elements in the list.
names(list_data) <- c("Student", "Marks", "Course")
# Accessing the first element of the list by name.
print(list_data["Student"])
# Accessing the Marks element with the $ operator.
print(list_data$Marks)
print(list_data)
Output:
$Student
[1] "Shubham" "Arpita" "Nishka"
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
$Student
[1] "Shubham" "Arpita" "Nishka"
$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B.tech"
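When accessing list elements it helps to distinguish single brackets from double brackets. The sketch below is an illustration with a shortened, hypothetical version of the list used above.

```r
list_data <- list(Student = c("Shubham", "Arpita", "Nishka"),
                  Course  = list("BCA", "MCA", "B.tech"))

# Single brackets return a sub-list (still a list, here of length 1)
sub_list <- list_data["Student"]

# Double brackets (or $) return the element itself, here a character vector
students <- list_data[["Student"]]

print(class(sub_list))   # [1] "list"
print(class(students))   # [1] "character"

second_student <- students[2]           # "Arpita"
second_course  <- list_data$Course[[2]] # "MCA"
```

Use double brackets or $ when the element itself is needed for further computation.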
Check if Item Exists:
To find out if a specified item is present in a list, use the %in% operator.
Ex:
lst <- list("apple", "banana", "cherry")
"apple" %in% lst
Output: [1] TRUE
Remove List Items:
You can also remove list items. The following example creates a new, updated list without
the "apple" item:
Ex:
lst <- list("apple", "banana", "cherry")
nl <- lst[-1]
nl
Output:
[[1]]
[1] "banana"
[[2]]
[1] "cherry"
Converting list to vector
Lists have a drawback: we cannot perform arithmetic operations directly on list
elements. To overcome this drawback, R provides the unlist() function, which converts a
list into a vector. In some cases, it is required to convert a list into a vector so that
we can use the elements of the vector for further manipulation. The unlist() function takes
the list as a parameter and changes it into a vector. Let us see an example to understand
how the unlist() function is used in R.
Example
# Creating lists.
list1 <- list(1:5)
print(list1)
list2 <- list(10:14)
print(list2)
# Converting the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)
print(v1)
print(v2)
# Adding the vectors.
result <- v1+v2
print(result)
Output:
[[1]]
[1] 1 2 3 4 5
[[1]]
[1] 10 11 12 13 14
[1] 1 2 3 4 5
[1] 10 11 12 13 14
[1] 11 13 15 17 19
Array
Arrays are data storage structures defined by a fixed number of dimensions, with
elements stored at contiguous memory locations. One-dimensional arrays
are called vectors, with the length being their only dimension. Two-dimensional arrays are
called matrices, consisting of fixed numbers of rows and columns. All elements of an array
have the same data type. The array() function takes vectors as input and creates
an array based on the given dimensions.
R arrays have the following syntax:
array_name <- array(data, dim = c(row_size, column_size, matrices), dimnames = dim_names)
data: The first argument of the array() function. It is the input vector which is
given to the array.
row_size: The number of rows which each matrix of the array can store.
column_size: The number of columns which each matrix of the array can store.
matrices: The number of matrices (layers) in the array. In R, an array consists of
multi-dimensional matrices.
dim_names: A list used to change the default names of the rows, columns,
and layers.
How to create?
In R, array creation is quite simple. We can easily create an array using vectors and the
array() function. In an array, data is stored in the form of matrices. There are only two
steps to create an array:
1. In the first step, we create two vectors of different lengths.
2. Once the vectors are created, we pass them as input to the array() function.
Let us see an example to understand how we can build an array with the help of vectors
and the array() function.
Example
# Creating two vectors of different lengths
vec1 <- c(1,3,5)
vec2 <- c(10,11,12,13,14,15)
# Taking these vectors as input to the array
res <- array(c(vec1,vec2),dim=c(3,3,2))
print(res)
Output
, , 1

     [,1] [,2] [,3]
[1,]    1   10   13
[2,]    3   11   14
[3,]    5   12   15

, , 2

     [,1] [,2] [,3]
[1,]    1   10   13
[2,]    3   11   14
[3,]    5   12   15
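Note that the two vectors supply only 9 values while a 3 x 3 x 2 array has 18 cells, which is why both matrices in the output are identical: array() reuses (recycles) the data to fill the remaining cells. A minimal sketch that checks this, with variable names that simply mirror the example above:

```r
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
res <- array(c(vec1, vec2), dim = c(3, 3, 2))

# Dimensions and total number of cells of the array
print(dim(res))      # [1] 3 3 2
print(length(res))   # [1] 18

# Only 9 values were supplied, so they are reused for the second matrix
print(identical(res[, , 1], res[, , 2]))   # [1] TRUE
```

If the two matrices should differ, supply 18 values instead of 9.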
Naming rows and columns
In R, we can give names to the rows, columns, and matrices of an array. This is done with
the dimnames argument of the array() function. Naming is optional; it only makes the rows
and columns easier to tell apart. Below is an example in which we create an array and give
names to its rows, columns, and matrices.
Example
# Creating two vectors of different lengths
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
# Initializing names for rows, columns and matrices
col_names <- c("Col1", "Col2", "Col3")
row_names <- c("Row1", "Row2", "Row3")
matrix_names <- c("Matrix1", "Matrix2")
# Taking the vectors as input to the array
res <- array(c(vec1, vec2), dim = c(3, 3, 2),
             dimnames = list(row_names, col_names, matrix_names))
print(res)
Output
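, , Matrix1

     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15

, , Matrix2

     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15

Elements of the array can then be accessed by index. For example, res[3,,2] (the third row
of the second matrix), res[1,3,1] (row 1, column 3 of the first matrix), and res[,,2] (the
whole second matrix) print as follows:
Col1 Col2 Col3
   5   12   15
[1] 13
     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15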
Manipulation of elements
An array is made up of matrices, so operations on the elements of an array are carried out
by accessing the matrices it contains. In the example below, we extract the second matrix
from each of two arrays and add them element by element.
Example
# Creating two vectors of different lengths
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
# Taking the vectors as input to the first array
res1 <- array(c(vec1, vec2), dim = c(3, 3, 2))
print(res1)
# Creating two vectors of different lengths
vec1 <- c(8, 4, 7)
vec2 <- c(16, 73, 48, 46, 36, 73)
# Taking the vectors as input to the second array
res2 <- array(c(vec1, vec2), dim = c(3, 3, 2))
print(res2)
# Creating matrices from these arrays
mat1 <- res1[, , 2]
mat2 <- res2[, , 2]
# Adding the matrices element by element
res3 <- mat1 + mat2
print(res3)
Output
, , 1

     [,1] [,2] [,3]
[1,]    1   10   13
[2,]    3   11   14
[3,]    5   12   15

, , 2

     [,1] [,2] [,3]
[1,]    1   10   13
[2,]    3   11   14
[3,]    5   12   15

, , 1

     [,1] [,2] [,3]
[1,]    8   16   46
[2,]    4   73   36
[3,]    7   48   73

, , 2

     [,1] [,2] [,3]
[1,]    8   16   46
[2,]    4   73   36
[3,]    7   48   73

     [,1] [,2] [,3]
[1,]    9   26   59
[2,]    7   84   50
[3,]   12   60   88
Data Frame
A data frame is a two-dimensional, table-like structure in which each column contains the
values of one variable and each row contains one set of values from each column. A data
frame is a special case of a list in which every component has the same length; put simply,
it is a list of equal-length vectors. A matrix can contain only one type of data, but a data
frame can contain different data types such as numeric, character, and factor, one per
column. A data frame has the following characteristics:
o Rectangular structure: Data frames are two-dimensional structures with rows and
columns forming a rectangular shape. All columns must have the same number of rows,
which makes them suitable for structured datasets.
o The column names should be non-empty.
o The row names should be unique.
o The data stored in a data frame can be of factor, numeric, or character type.
o Each column contains the same number of data items.
How to create Data Frame?
In R, data frames are created with the data.frame() function. This function takes vectors
of any type, such as numeric, character, or integer. In the example below, we create a data
frame that contains employee id (integer vector), employee name (character vector),
salary (numeric vector), and starting date (Date vector).
Example
# Creating the data frame.
emp.data <- data.frame(
  employee_id = c(1:5),
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.3, 915.2, 611.0, 729.0, 843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                            "2015-03-27")),
  stringsAsFactors = FALSE
)
# Printing the data frame.
print(emp.data)
Output
  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 915.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
Getting the structure of R Data Frame
In R, we can inspect the structure of a data frame. R provides a built-in function called
str(), which displays each column's name, type, and first few values. In the example below,
we create a data frame from vectors of different data types and display its structure.
Example
# Creating the data frame.
emp.data <- data.frame(
  employee_id = c(1:5),
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.3, 515.2, 611.0, 729.0, 843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                            "2015-03-27")),
  stringsAsFactors = FALSE
)
# Printing the structure of the data frame.
str(emp.data)
Output
'data.frame': 5 obs. of 4 variables:
$ employee_id : int 1 2 3 4 5
$ employee_name: chr "Shubham" "Arpita" "Nishka" "Gunjan" ...
$ sal : num 623 515 611 729 843
$ starting_date: Date, format: "2012-01-01" "2013-09-23" ...
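Because a data frame is a list of equal-length vectors, the usual list tools work on it. The following is a minimal sketch using a small illustrative data frame; the name df and its values are assumptions made for this example, not taken from the notes above:

```r
# A small illustrative data frame
df <- data.frame(
  id = 1:3,
  name = c("A", "B", "C"),
  score = c(90.5, 72.0, 88.25),
  stringsAsFactors = FALSE
)

# A data frame is a list whose components are the columns
print(is.list(df))         # [1] TRUE
print(sapply(df, class))   # type of each column

# Every column has the same length, equal to the number of rows
print(sapply(df, length))
print(nrow(df))            # [1] 3

# A single column can be extracted as a vector with $
print(df$score)
```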
Factors
A factor is a data structure used for fields that take only a predefined, finite number of
values, that is, categorical variables. Factors are data objects used to categorize data and
store it as levels. They can be built from both integer and string values, and are useful for
columns that have a limited number of unique values. Internally, a factor stores integer
codes, and each code is associated with a label. The set of predefined values is known as
the levels, and by default R sorts the levels in alphabetical order.
Attributes of a factor
The factor() function takes the following arguments, which determine the attributes of the
resulting factor:
a. x
It is the input vector which is to be transformed into a factor.
b. levels
It is an optional vector of the unique values that x can take.
c. labels
It is a character vector of labels for the levels, given in the same order as the levels.
d. exclude
It is used to specify the values which we want to exclude when the levels are formed.
e. ordered
It is a logical value which determines whether the levels are ordered.
f. nmax
It is used to specify an upper bound on the number of levels.
R provides the factor() function to convert a vector into a factor. The syntax of the factor()
function is as follows:
factor_data <- factor(vector)
Let's see an example to understand how factor function is used.
Example
# Creating a vector as input.
data <- c("Shubham", "Nishka", "Arpita", "Nishka", "Shubham", "Sumit", "Nishka", "Shubham",
          "Sumit", "Arpita", "Sumit")
print(data)
print(is.factor(data))

# Applying the factor function.
factor_data <- factor(data)

print(factor_data)
print(is.factor(factor_data))
Output
[1] "Shubham" "Nishka" "Arpita" "Nishka" "Shubham" "Sumit" "Nishka"
[8] "Shubham" "Sumit" "Arpita" "Sumit"
[1] FALSE
[1] Shubham Nishka Arpita Nishka Shubham Sumit Nishka Shubham Sumit
[10] Arpita Sumit
Levels: Arpita Nishka Shubham Sumit
[1] TRUE
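Individual elements of a factor can be accessed by index. For example, factor_data[4]
returns the fourth element, and factor_data[-4] returns every element except the fourth:
[1] Nishka
Levels: Arpita Nishka Shubham Sumit
[1] Shubham Nishka Arpita Shubham Sumit Nishka Shubham Sumit Arpita
[10] Sumit
Levels: Arpita Nishka Shubham Sumit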
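The levels, labels, and ordered arguments described earlier can be sketched as follows. The vector sizes and the labels S, M, and L are illustrative assumptions, not part of the examples in these notes:

```r
# Illustrative input vector of categorical values
sizes <- c("medium", "small", "large", "small", "medium")

# Default: levels are sorted alphabetically
f_default <- factor(sizes)
print(levels(f_default))   # [1] "large"  "medium" "small"

# Explicit level order, custom labels, and an ordered factor
f_ordered <- factor(sizes,
                    levels = c("small", "medium", "large"),
                    labels = c("S", "M", "L"),
                    ordered = TRUE)
print(f_ordered)

# Internally the factor stores integer codes that index into the levels
print(as.integer(f_ordered))   # [1] 2 1 3 1 2

# Ordered factors support comparisons between levels
print(f_ordered[2] < f_ordered[3])   # [1] TRUE  (S < L)
```

Setting the levels explicitly is useful whenever alphabetical order is not the natural order of the categories.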
Modification of factor
Like data frames, factors can be modified. We can change the value of an element simply
by re-assigning it. However, we cannot assign a value that is not one of the factor's
predefined levels. To do so, we first have to add the value as a new level, and then we can
assign it. Let's see an example to understand how factors are modified.
Example
# Creating a vector as input.
data <- c("Shubham", "Nishka", "Arpita", "Nishka", "Shubham")
# Applying the factor function.
factor_data <- factor(data)
# Printing all elements of the factor
print(factor_data)
# Change the 4th element of the factor to "Arpita"
factor_data[4] <- "Arpita"
print(factor_data)
# Change the 4th element of the factor to "Gunjan"
factor_data[4] <- "Gunjan"  # cannot assign values outside the levels
print(factor_data)

# Adding the value to the levels
levels(factor_data) <- c(levels(factor_data), "Gunjan")  # Adding new level
factor_data[4] <- "Gunjan"
print(factor_data)
Output
[1] Shubham Nishka Arpita Nishka Shubham
Levels: Arpita Nishka Shubham
[1] Shubham Nishka Arpita Arpita Shubham
Levels: Arpita Nishka Shubham
Warning message:
In `[<-.factor`(`*tmp*`, 4, value = "Gunjan") :
invalid factor level, NA generated
[1] Shubham Nishka Arpita <NA> Shubham
Levels: Arpita Nishka Shubham
[1] Shubham Nishka Arpita Gunjan Shubham
Levels: Arpita Nishka Shubham Gunjan
Special Values:
NA (Not Available): NA represents a missing or undefined value. It is used to indicate the
absence of a value, and is often used in data analysis to handle missing data points.
Ex:
v <- c(1, 2, 3)
v
length(v) <- 4
v
Output:
[1] 1 2 3
[1]  1  2  3 NA
NaN (Not a Number): NaN represents an undefined or unrepresentable value in numerical
calculations. It results when a mathematical operation does not produce a valid numeric
value.
Ex:
0/0
Output:
[1] NaN
Inf and -Inf (Positive and Negative Infinity): Inf represents positive infinity and -Inf
represents negative infinity. They stand for values that are beyond the representable range
of numbers.
Ex:
2^1024
Output:
[1] Inf
-2^1024
Output:
[1] -Inf
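The special values can be detected with the base R functions is.na(), is.nan(), and is.infinite(). A minimal sketch, using an illustrative vector x that is not part of the examples above:

```r
x <- c(4, NA, 0/0, 1/0, -1/0)

# is.na() is TRUE for both NA and NaN
print(is.na(x))         # [1] FALSE  TRUE  TRUE FALSE FALSE

# is.nan() is TRUE only for NaN
print(is.nan(x))        # [1] FALSE FALSE  TRUE FALSE FALSE

# is.infinite() is TRUE for Inf and -Inf
print(is.infinite(x))   # [1] FALSE FALSE FALSE  TRUE  TRUE

# Most summary functions return NA unless missing values are removed
print(mean(c(1, 2, NA)))                 # [1] NA
print(mean(c(1, 2, NA), na.rm = TRUE))   # [1] 1.5
```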
Classes in R Programming
A class is a blueprint, or template, for objects. It describes the set of properties and
methods that are common to all objects of one type. Unlike most other programming
languages, R has three class systems: S3, S4, and Reference Classes.
S3 Class
S3 is the simplest and most widely used OOP system in R. It lacks a formal definition and
structure: an object of an S3 class can be created just by setting its class attribute. The
following example makes this clearer:
Example:
# create a list with the required components
movieList <- list(name = "Iron man", leadActor = "Robert Downey Jr")
# give a name to your class
class(movieList) <- "movie"
movieList
Output
$name
[1] "Iron man"

$leadActor
[1] "Robert Downey Jr"

attr(,"class")
[1] "movie"
In the S3 system, methods don't belong to the class; they belong to generic functions. This
means that we do not define methods inside the class, as we would in other programming
languages like C++ or Java. But we can define what a generic function (for example, print)
does when it is applied to our objects. Until we define such a method, print() falls back to
its default behaviour:
print(movieList)
Output:
$name
[1] "Iron man"

$leadActor
[1] "Robert Downey Jr"

attr(,"class")
[1] "movie"
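As a minimal sketch of defining what a generic does for our class, we can write a function named print.movie; S3 finds it through the naming pattern generic.class. The wording of the printed lines is an illustrative choice:

```r
# Create an S3 object of class "movie"
movieList <- list(name = "Iron man", leadActor = "Robert Downey Jr")
class(movieList) <- "movie"

# Define what the generic print() does for objects of class "movie".
# The function name follows the pattern generic.class.
print.movie <- function(x, ...) {
  cat("Movie:", x$name, "\n")
  cat("Lead actor:", x$leadActor, "\n")
  invisible(x)
}

# print() now dispatches to print.movie()
print(movieList)
```

Typing movieList at the console uses the same method, because auto-printing also calls print().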
S4 Class
Programmers coming from languages like C++ or Java might find S3 very different from
their usual idea of classes, as it lacks the structure that classes are supposed to provide. S4
improves on S3: its classes have a formal definition, which gives a proper structure to their
objects.
Example:
library(methods)
# definition of S4 class
setClass("movies", slots = list(name = "character", leadActor = "character"))
# creating an object using new() by passing class name and slot values
movieList <- new("movies", name = "Iron man", leadActor = "Robert Downey Jr")
movieList
Reference Class
Reference Class is an improvement over S4 Class. Here the methods belong to the classes,
which makes them very similar to the object-oriented classes of other languages. Defining
a Reference class is similar to defining an S4 class. We use setRefClass() instead of
setClass() and “fields” instead of “slots”.
Example:
library(methods)
# setRefClass returns a generator
movies <- setRefClass("movies", fields = list(name = "character",
                      leadActor = "character", rating = "numeric"))
# now we can use the generator to create objects (rating is left unset here)
movieList <- movies$new(name = "Iron man", leadActor = "Robert Downey Jr")
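To illustrate that methods belong to the class in this system, here is a minimal sketch. The class name Movie, the methods rate() and describe(), and the rating values are illustrative assumptions for this example:

```r
library(methods)

# setRefClass() returns a generator; methods are defined inside the class
Movie <- setRefClass("Movie",
  fields = list(name = "character", leadActor = "character", rating = "numeric"),
  methods = list(
    # Update the rating in place (fields are modified with the <<- operator)
    rate = function(value) {
      rating <<- value
      invisible(.self)
    },
    # Return a one-line description of the object
    describe = function() {
      paste0(name, " (", leadActor, "), rating ", rating)
    }
  )
)

m <- Movie$new(name = "Iron man", leadActor = "Robert Downey Jr", rating = 7)

# Reference objects are mutable: the method changes the object itself
m$rate(8)
print(m$rating)       # [1] 8
print(m$describe())   # [1] "Iron man (Robert Downey Jr), rating 8"
```

Unlike S3 and S4 objects, m is modified in place; no re-assignment of the result is needed.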
Coercion
When you call a function with an argument of the wrong type, R will try to coerce values to a
different type so that the function will work. There are two types of coercion that occur
automatically in R: coercion with formal objects and coercion with built-in types.
Coercion involves type conversion, which means changing data of one type into another
type. There are two kinds of coercion:
1. Implicit Coercion
2. Explicit Coercion
Implicit Coercion: Here the type conversion is performed by R itself. For example, if we put
numeric and character data in the same vector, R converts the numeric data to character
data on its own. Implicit coercion also occurs when we operate on a vector in a way that is
not intended for its type. For example, if we add 1 to a logical vector, the logical values
are implicitly converted to 0s and 1s, and 1 is added to each element.
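Both cases of implicit coercion can be sketched in a few lines. The vector names mixed and flags are illustrative:

```r
# Mixing numeric and character data: numbers become character strings
mixed <- c(1, "a", 2.5)
print(class(mixed))       # [1] "character"
print(mixed)              # [1] "1"   "a"   "2.5"

# Arithmetic on a logical vector: TRUE becomes 1 and FALSE becomes 0
flags <- c(TRUE, FALSE, TRUE)
print(flags + 1)          # [1] 2 1 2
print(class(flags + 1))   # [1] "numeric"

# Mixing logical and numeric data: logical values become numbers
print(c(TRUE, 3))         # [1] 1 3
```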
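Explicit Coercion: In explicit coercion, we change one data type to another ourselves by
applying a conversion function such as as.character() or as.numeric(). We create an object
“x” which stores integer values from 0 to 6.
x <- 0:6
We can check the data type of the “x” object.
class(x)
We can then convert “x” to another type, for example character, and check the data type of
the result.
z <- as.character(x)
class(z)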
Basic plots
R has a number of built-in tools for basic graph types such as histograms, scatter plots,
bar charts, boxplots and much more. Rather than going through all of the different types,
we will first focus on plot(), a generic function for plotting x-y data.
Plot
The plot() function is used to draw points (markers) in a diagram. The function takes
parameters for specifying the points in the diagram.
At its simplest, you can use the plot() function to plot two numbers against each other. For
example, plot(1, 3) draws a single point at x = 1, y = 3.
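A minimal sketch of plot() with one point and then with two points; the coordinates are arbitrary illustrative values:

```r
# One point at position (1, 3)
plot(1, 3)

# Two points: (1, 3) and (8, 10).
# The first vector holds the x values, the second the y values.
x <- c(1, 8)
y <- c(3, 10)
plot(x, y)
```

The two vectors must have the same length, since each x value is paired with the y value at the same position.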
Graph plotting in R is broadly of two types, two-dimensional and three-dimensional. These
notes cover two-dimensional plotting.
Two-dimensional Plotting
In two-dimensional plotting, we visualize and compare one variable with respect to another.
For example, in a dataset of air quality measures, we might want to see how the AQI varies
with the temperature at a particular place. Temperature and AQI are two different
variables, and we wish to see how one changes with respect to the other. The following
kinds of graphs are commonly used for this kind of analysis:
R - Bar Charts
A bar chart represents data as rectangular bars, with the length of each bar proportional to
the value of the variable. R uses the function barplot() to create bar charts. R can draw
both vertical and horizontal bars in a bar chart, and each of the bars can be given a
different color.
Syntax
The basic syntax to create a bar chart in R is:
barplot(H, xlab, ylab, main, names.arg, col)
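A minimal sketch of a bar chart follows. Here H is the vector of bar heights, names.arg gives the label under each bar, xlab and ylab label the axes, main sets the title, and col sets the bar color. The revenue figures and month labels are made-up values for illustration:

```r
# Illustrative data: heights of the bars and their labels
H <- c(7, 12, 28, 3, 41)
M <- c("Mar", "Apr", "May", "Jun", "Jul")

# Vertical bar chart; barplot() returns the midpoints of the bars
mids <- barplot(H, names.arg = M, xlab = "Month", ylab = "Revenue",
                main = "Revenue chart", col = "blue")

# horiz = TRUE draws the same bars horizontally
barplot(H, names.arg = M, horiz = TRUE, col = "blue")
```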
Histograms in R language
A histogram is a graphical representation that groups data points into specified ranges
(bins). It displays each range as a rectangle whose height is proportional to the frequency
of the values that fall in that interval. It looks similar to a vertical bar graph, but unlike a
bar graph it shows no gaps between the bars.
R – Histograms
We can create histograms in the R programming language using the hist() function.
Syntax: hist(v, main, xlab, xlim, ylim, breaks, col, border)
Parameters:
v: This parameter contains the numerical values used in the histogram.
main: This parameter is the title of the chart.
col: This parameter is used to set the color of the bars.
xlab: This parameter is the label for the horizontal axis.
border: This parameter is used to set the border color of each bar.
xlim: This parameter specifies the range of values on the x-axis.
ylim: This parameter specifies the range of values on the y-axis.
breaks: This parameter controls how the data is divided into bars, for example the number
of bars or the break points between them.
Creating a simple Histogram in R
We create a simple histogram using the above parameters. The vector v is plotted using
hist().
Example:
# Create data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)
# Create the histogram.
hist(v, xlab = "No. of Articles", col = "green", border = "black")
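The xlim, ylim, and breaks parameters can be sketched on the same vector. The particular limits and the value breaks = 5 are illustrative choices, and R treats a single number given for breaks only as a suggestion:

```r
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# breaks = 5 suggests the number of bars; R picks "pretty" break points
h <- hist(v, xlab = "No. of Articles", col = "green", border = "black",
          xlim = c(0, 40), ylim = c(0, 5), breaks = 5)

# hist() also returns the bin boundaries and the count in each bin
print(h$breaks)
print(h$counts)
```

Inspecting h$breaks and h$counts is a quick way to check which intervals R actually used.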
Boxplots in R Language
A boxplot is a chart that displays the distribution of the data in one or more groups by
drawing a box for each of them. The distribution is summarised by five values (minimum,
first quartile, median, third quartile, and maximum).
Boxplots in R Programming Language
Boxplots are created in R by using the boxplot() function.
Syntax: boxplot(x, data, notch, varwidth, names, main)
Parameters:
x: This parameter is a vector or a formula.
data: This parameter is the data frame that contains the variables.
notch: This parameter is a logical value. Set it to TRUE to draw a notch in each box.
varwidth: This parameter is a logical value. Set it to TRUE to draw the width of each box
proportionate to the sample size.
main: This parameter is the title of the chart.
names: This parameter gives the group labels that will be shown under each boxplot.
Creating a Dataset
We use the data set "mtcars".
Let's look at the columns "mpg" and "cyl" in mtcars.
input <- mtcars[, c('mpg', 'cyl')]
print(head(input))
Output:
                   mpg cyl
Mazda RX4         21.0   6
Mazda RX4 Wag     21.0   6
Datsun 710        22.8   4
Hornet 4 Drive    21.4   6
Hornet Sportabout 18.7   8
Valiant           18.1   6
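With this data, a boxplot of mileage grouped by number of cylinders can be sketched as follows. The formula mpg ~ cyl, the axis labels, and the title are choices made for this illustration:

```r
# mtcars ships with R (datasets package)
input <- mtcars[, c("mpg", "cyl")]

# One box per cylinder count; varwidth = TRUE scales box widths by group size
bp <- boxplot(mpg ~ cyl, data = input,
              xlab = "Number of Cylinders", ylab = "Miles Per Gallon",
              main = "Mileage Data", varwidth = TRUE)

# boxplot() returns the five summary values for each group (one column per box)
print(bp$names)
print(bp$stats)
```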
Scatter Plot:
Scatterplots are useful for visualizing the relationship between two variables and the
distribution of data points, and for identifying patterns, clusters or outliers.
x <- c(1, 2, 3, 4, 5)
y <- c(10, 8, 15, 7, 12)
plot(x, y, type="p", pch=19, col="blue", main="Scatter Plot", xlab="X-Axis", ylab="Y-Axis")
Line Plot:
A line plot in R is used to display data points connected by lines. It is a useful visualization
for showing trends and changes in data over time or across a continuous variable.
plot(x, y, type="l", lwd=2, col="red", main="Line Plot", xlab="X-Axis", ylab="Y-Axis")
Pie Chart
A pie chart is a circular statistical graphic, which is divided into slices to illustrate numerical
proportion. Pie charts represent data visually as fractional parts of a whole, which can be an
effective communication tool.
expenditure <- c(600, 300, 150, 100, 200)
pie(expenditure, main = "Monthly Expenditure Breakdown",
    labels = c("Housing", "Food", "Clothes", "Entertainment", "Other"))