unit -1 notes R programming

The document provides an introduction to R programming, detailing its history, features, applications, advantages, and disadvantages. R is a powerful, open-source language used for statistical analysis and data visualization, developed by Ross Ihaka and Robert Gentleman. It supports various data types and has a rich set of packages, making it a popular choice among data scientists and statisticians.

Uploaded by

Chaya Anu

UNIT - 1

Introduction of the language


Introduction of the language, numeric, arithmetic, assignment, and vectors, Matrices and Arrays,
Non-numeric Values, Lists and Data Frames, Special Values, Classes, and Coercion, Basic Plotting.
Introduction to R programming
R is a programming language and software environment for statistical analysis, graphical
representation, and reporting. R was created by Ross Ihaka and Robert Gentleman at the
University of Auckland, New Zealand, and is currently developed by the R Development
Core Team. The core of R is an interpreted computer language which allows branching and
looping as well as modular programming using functions. For efficiency, R allows integration
with procedures written in the C, C++, .NET, Python, or FORTRAN languages. R is
freely available under the GNU General Public License, and pre-compiled binary versions are
provided for various operating systems like Linux, Windows, and Mac. R is free software
distributed under a GNU-style copyleft, and it is an official part of the GNU project, where
it is called GNU S.
What is R Programming?
"R is an interpreted computer programming language which was created by Ross Ihaka and
Robert Gentleman at the University of Auckland, New Zealand." The R Development Core
Team currently develops R. It is also a software environment used for statistical
analysis, graphical representation, reporting, and data modelling. R is an implementation
of the S programming language, combined with lexical scoping semantics. R
not only allows branching and looping but also supports modular programming
using functions. R allows integration with procedures written in the C, C++, .NET, Python,
and FORTRAN languages to improve efficiency. Today, R is one of the most
important tools used by researchers, data analysts, statisticians, and marketers for
retrieving, cleaning, analysing, visualizing, and presenting data.

Evolution of R
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New
Zealand, and is currently developed by the R Development Core Team. R is freely available
under the GNU General Public License, and pre-compiled binary versions are provided for
various operating systems like Linux, Windows, and Mac. The language was
named R partly after the first letter of the first names of its two authors (Robert Gentleman and
Ross Ihaka) and partly as a play on the name of the Bell Labs language S. It combines S
with lexical scoping semantics inspired by Scheme. The project was conceived in 1992,
with an initial version released in 1995 and a stable version (1.0.0) in 2000.
Why R Programming Language?

• R is a leading tool for machine learning, statistics, and data analysis. Objects,
functions, and packages can be created easily in R.
• It is a platform-independent language, which means it runs on all major operating
systems.
• It is a free, open-source language, so anyone can install it in any organization
without purchasing a license.
• R is not only a statistics package; it also integrates with other languages (C, C++),
so you can interact with many data sources and statistical packages.
• R has a vast community of users, and it is growing day by day.
• R is currently one of the most requested programming languages in the data science job
market.

Features of R
R is a domain-specific programming language designed for data analysis. It has some
unique features which make it very powerful, the most important arguably being its vector
notation: vectors allow us to perform a complex operation on a set of values in a
single command. R has the following features:
• R is a well-developed, simple, and effective programming language which includes
conditionals, loops, user-defined recursive functions, and input and output facilities.
• R has effective data handling and storage facilities.
• R provides a suite of operators for calculations on arrays, lists, vectors, and matrices.
• R provides a large, coherent, and integrated collection of tools for data analysis.
• R provides graphical facilities for data analysis and display, either directly on screen
or in print, and its graphical techniques are highly extensible.
• R is open-source, powerful, and highly extensible software.
• R allows us to perform multiple calculations at once using vectors.
• R is an interpreted language.

Applications of R:
• R is used for data science. It offers a broad variety of libraries related to statistics
and provides an environment for statistical computing and design.
• R is used by many quantitative analysts as a programming tool, where it helps with data
importing and cleaning.
• R is widely used by data analysts and research programmers, and it serves as a
fundamental tool in finance.
• Tech companies such as Google, Facebook, Bing, Twitter, Accenture, Wipro, and many more
use R.

R Advantages and Disadvantages


R is one of the most popular programming languages for statistical modeling and analysis. Like
other programming languages, R has advantages and disadvantages. It is a continuously
evolving language, which means that many of its drawbacks may fade away with future updates.
The pros and cons of R are as follows.
Advantages
1. Open Source
An open-source language is a language on which we can work without any need for a
license or a fee. R is an open-source language. We can contribute to the development of R
by optimizing our packages, developing new ones, and resolving issues.
2. Platform Independent
R is a platform-independent language or cross-platform programming language which
means its code can run on all operating systems. R enables programmers to develop
software for several competing platforms by writing a program only once. R can run quite
easily on Windows, Linux, and Mac.
3. Machine Learning Operations
R allows us to perform various machine learning operations such as classification and
regression, and it provides packages and features for developing artificial neural
networks. R is widely used by data scientists.
4. Exemplary support for data wrangling
R allows us to perform data wrangling. R provides packages such as dplyr, readr which
are capable of transforming messy data into a structured form.
5. Quality plotting and graphing
R simplifies quality plotting and graphing. Libraries such as ggplot2 and plotly
produce visually appealing graphs, which sets R apart from many other
programming languages.
6. The array of packages
R has a rich set of packages. R has over 10,000 packages in the CRAN repository which
are constantly growing. R provides packages for data science and machine learning
operations.
7. Statistics
R is mainly known as the language of statistics. This is the main reason why R is more
widely used than other programming languages for the development of statistical tools.
8. Continuously Growing
R is a constantly evolving programming language: it changes and develops over time, and
updates are released whenever new features are added.
Disadvantages
1. Data Handling
In R, objects are stored in physical memory, so the entire data set must fit in memory
in one place. R can therefore use more memory than some alternatives, and it is not an
ideal option for Big Data unless additional tools are used.
2. Basic Security
R lacks basic security features, which are an essential part of many programming
languages such as Python. Because of this, there are restrictions on embedding R in a
web application.
3. Complicated Language
R has a steep learning curve. People who don't have prior programming experience may
find it difficult to learn R.
4. Weak Origin
Base R has limited support for dynamic or 3D graphics. The reason behind this is its
origin: it shares its origin with a much older programming language, "S".
5. Lesser Speed
R code is often slower than equivalent code in other programming languages such as
MATLAB and Python. In addition, algorithms in R are spread across different packages, so
programmers who have no prior knowledge of the packages may find it difficult to
implement algorithms.
Fundamentals of R
Basic Syntax in R Programming
R is one of the most popular languages used for statistical computing and data analysis, with
the support of over 10,000 free packages in the CRAN repository. Like any other
programming language, R has a specific syntax which is important to understand if you
want to make use of its powerful features. This article assumes R is already installed on
your machine. We will be using RStudio, but we can also use the R command prompt by typing
the following command in the command line.
$ R
This will launch the interpreter. Now let's write a basic Hello World program to get
started.
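A minimal Hello World sketch is shown here. The variable name greeting is only an illustrative choice, not something prescribed by R.

```r
# Store a string in a variable (the name `greeting` is illustrative)
greeting <- "Hello, World!"

# print() displays the value at the console
print(greeting)
```

Typed at the R prompt, each line is evaluated as soon as Enter is pressed.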
A program in R is made up of three things: variables, comments, and keywords. Variables
are used to store data, comments are used to improve code readability, and keywords
are reserved words that hold a specific meaning to the interpreter.

Variables in R
Previously, we wrote our values directly inside print(), which leaves no way to refer to
them again for further operations. Variables solve this problem: as in any other
programming language, a variable is a name given to a memory location that can
store any type of data. In R, assignment can be written in three ways:
1. = (Simple Assignment)
2. <- (Leftward Assignment)
3. -> (Rightward Assignment)

Example output:
"Simple Assignment"
"Leftward Assignment!"
"Rightward Assignment"
The rightward assignment is less common and can be confusing for some programmers,
so it is generally recommended to use the <- or = operator for assigning values in R.
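The three assignment forms can be sketched as follows. The variable names var1, var2, and var3 are assumptions made for illustration; the strings match the output listed above.

```r
# Simple assignment with =
var1 = "Simple Assignment"

# Leftward assignment with <-
var2 <- "Leftward Assignment!"

# Rightward assignment with -> (value on the left, name on the right)
"Rightward Assignment" -> var3

print(var1)
print(var2)
print(var3)
```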
R Script File

Usually, you will write your programs in script files and then execute those scripts at
the command prompt with the help of the R interpreter called Rscript. Let's start by
writing the following code in a text file called test.R:

# My first program in R Programming

myString <- "Hello, World!"

print (myString)

Save the above code in a file test.R and execute it at the Linux command prompt as given
below. Even if you are using Windows or another system, the syntax remains the same.

$ Rscript test.R
When we run the above program, it produces the following result.
[1] "Hello, World!"
Comments in R
Comments improve your code's readability. They are meant only for the reader, so the
interpreter ignores them. Only single-line comments are available in R, but multi-line
comments can be imitated with a simple trick which is shown below. A single-line
comment is written by placing # at the beginning of the statement.
Example:

Output:
[1] "This is fun!"
From the above output, we can see that both comments were ignored by the interpreter.
Comments are programmer-readable explanations in the source code of an R program, added
to make the code easier to understand. R has only single-line comments and doesn't
support multi-line comments. If we want a multi-line comment, we can place the text
inside a block that is never executed, such as if(FALSE).

Single-line comment
#My First program in R programming

string <-"Hello World!"

print(string)

The trick for multi-line comment

#Trick for multi-line comment


if(FALSE) {
"R is an interpreted computer programming language which was created by
Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand "
}
#My First program in R programming
string <-"Hello World!"
print(string)

Data Types in R Programming


In programming, we use variables to store information. A variable is a name for a
reserved memory location: when we create a variable, some space is reserved in memory.
In R, there are several data types such as integer, character, etc. The data type of a
value determines how much memory is allocated and what can be stored in it. The
following data types are used in R programming:
Data type   Example                        Description
Logical     TRUE, FALSE                    A special data type with only two possible values, true and false.
Numeric     12, 32, 112, 5432              Decimal values are called numeric in R; this is the default computational data type.
Integer     3L, 66L, 2346L                 The L suffix tells R to store the value as an integer.
Complex     Z = 1+2i, t = 7+3i             A complex value in R is written using the pure imaginary value i.
Character   'a', "good", "TRUE", '35.4'    Represents string values. Objects are converted into character values with as.character().
Raw         as.raw()                       Holds raw bytes.
Let's see an example for better understanding of data types:
Logical Data type
A logical value is often created via comparison between variables.
variable_logical<- TRUE
cat(variable_logical,"\n")
cat ("The data type of variable_logical is ",class(variable_logical),"\n\n")
Numeric Data type
Decimal values are called numeric in R. Numeric is the default data type for numbers and the
default computational data type. Real numbers with a decimal point are represented using
this type, which stores values as double-precision floating-point numbers. If we
assign a decimal value to a variable x, x will be of numeric type.
Sample Program:
variable_numeric<- 3532
cat(variable_numeric,"\n")
cat ("The data type of variable_numeric is ",class(variable_numeric),"\n\n")

Integer Data type


R supports the integer data type, which covers whole numbers. We can create an integer, or
convert a value into one, using the as.integer() function. We can also use the capital
‘L’ suffix to denote that a particular value is of the integer data type.
Sample Program:
variable_integer<- 133L
cat(variable_integer,"\n")
cat("The data type of variable_integer is ",class(variable_integer),"\n\n")
OR
y = as.integer(3)
y # print the value of y
[1] 3
class(y) # print the class name of y
[1] "integer"
is.integer(y) # is y an integer?
[1] TRUE

Complex Data type


R supports complex data types that are set of all the complex numbers. The complex data
type is to store numbers with an imaginary component.
Sample Program:
variable_complex<- 3+2i
cat(variable_complex,"\n")
cat("The data type of variable_complex is ",class(variable_complex),"\n\n")

Character Data type


R supports the character data type, which stores character values or strings. Strings in R
can contain letters, numbers, and symbols. The easiest way to denote that a value is of
character type is to wrap the value inside single or double quotation marks.
Sample Program
x = as.character(3.14)
x # print the character string
[1] "3.14"
class(x) # print the class name of x
[1] "character"
OR
variable_char<- "Learning r programming"
cat(variable_char,"\n")
cat("The data type of variable_char is ",class(variable_char),"\n\n")

Raw Data type


To save and work with data at the byte level in R, use the raw data type. By exposing a
series of unprocessed bytes, it enables low-level operations on binary data. The
following program shows R's raw data type:
Sample Program
variable_raw<- charToRaw("Learning r programming")
cat(variable_raw,"\n")
cat("The data type of variable_raw is ",class(variable_raw),"\n\n")

Keywords in R
Keywords are words reserved by the language because they have a special meaning; a
keyword therefore can't be used as a variable name, function name, etc. We can view
these keywords by using either help(reserved) or ?reserved.

• if, else, repeat, while, function, for, in, next and break are used for control-flow
statements and for declaring user-defined functions.
• The remaining keywords are constants; for example, TRUE and FALSE are the Boolean constants.
• NaN denotes a Not a Number value, and NULL denotes an undefined (empty) value.
• Inf is used for infinite values.
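
The constant keywords above can be explored with a short sketch. The variable names are illustrative assumptions; the functions is.nan(), is.infinite(), and is.null() are part of base R.

```r
not_a_number <- 0 / 0    # dividing zero by zero yields NaN
infinite     <- 1 / 0    # dividing by zero yields Inf
nothing      <- NULL     # NULL represents an empty, undefined object

print(is.nan(not_a_number))
print(is.infinite(infinite))
print(is.null(nothing))
print(length(nothing))   # a NULL object has length 0

# Reserved words cannot be assigned to. The attempt is wrapped in tryCatch()
# so that the error is captured instead of stopping the script.
attempt <- tryCatch(
  eval(parse(text = "TRUE <- 1")),
  error = function(e) "assignment rejected"
)
print(attempt)
```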

Variables in R Programming
Variables are used to store information to be manipulated and referenced in an R
program. An R variable can store an atomic vector, a group of atomic vectors, or a
combination of many R objects.
Name of variable      Validity   Reason
_var_name             Invalid    A variable name can't start with an underscore (_).
var_name, var.name    Valid      A variable name can contain letters, underscores, and dots. It can also start with a dot, provided the dot is not followed by a digit.
var_name%             Invalid    No special character other than dot and underscore is allowed in a variable name.
2var_name             Invalid    A variable name can't start with a digit.
.2var_name            Invalid    A variable name can't start with a dot which is followed by a digit.
var_name2             Valid      The name contains letters, a digit, and an underscore, and it starts with a letter.
Languages like C++ are statically typed, but R is dynamically typed, which means it checks
the type of the data when the statement is run. A valid variable name consists of letters,
digits, dots, and underscores. A variable name should start with a letter, or with a dot
not followed by a digit.
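The naming rules can be checked with a small sketch. It relies on the base R function make.names(), which rewrites a string into a syntactically valid name; a candidate is treated here as valid when make.names() leaves it unchanged. This is one convenient check, not the only possible one, and the variable names are illustrative.

```r
candidates <- c("_var_name", "var_name", "var.name", "var_name%",
                "2var_name", ".2var_name", "var_name2")

# A name is syntactically valid if make.names() does not need to alter it
is_valid <- make.names(candidates) == candidates
print(data.frame(name = candidates, valid = is_valid))

# Dynamic typing: the same variable can hold values of different types
v <- 42L
first_class <- class(v)     # "integer"
v <- "now a string"
second_class <- class(v)    # "character"
```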
Assignment of variable
In R programming, there are three operators which we can use to assign values to a
variable: the leftward (<-), rightward (->), and equals (=) operators. There are
two functions which are used to print the value of a variable, i.e., print() and cat(). The cat()
function combines multiple values into one continuous print output.
# Assignment using equal operator.
variable.1 = 124
# Assignment using leftward operator.
variable.2 <- "Learn R Programming"
# Assignment using rightward operator.
133L -> variable.3
print(variable.1)
cat ("variable.1 is ", variable.1 ,"\n")
cat ("variable.2 is ", variable.2 ,"\n")
cat ("variable.3 is ", variable.3 ,"\n")
Operators in R
In computer programming, an operator is a symbol which represents an action: it tells the
interpreter to perform a specific logical or mathematical manipulation.
R is very rich in built-in operators, and each type of operator performs a different
task. For data manipulation, there are also some advanced operators such as the model
formula and list indexing. The following types of operators are used in R:
1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators
5. Miscellaneous Operators
1) Arithmetic Operators:
Arithmetic operators in R perform math operations such as addition, subtraction,
multiplication, division, and modulo on their operands, which
may be scalar values, complex numbers, or vectors. The operations are performed
element-wise at the corresponding positions of the vectors.
All examples below use a <- c(2, 3.3, 4) and b <- c(11, 5, 3).
1. + (Addition): adds two vectors.
   Example: print(a+b)
   Output: [1] 13.0  8.3  7.0
2. - (Subtraction): subtracts one vector from another.
   Example: print(a-b)
   Output: [1] -9.0 -1.7  1.0
3. * (Multiplication): multiplies two vectors with each other.
   Example: print(a*b)
   Output: [1] 22.0 16.5 12.0
4. / (Division): divides the first vector by the second.
   Example: print(a/b)
   Output: [1] 0.1818182 0.6600000 1.3333333
5. %% (Modulus): gives the remainder of dividing the first vector by the second.
   Example: print(a%%b)
   Output: [1] 2.0 3.3 1.0
6. %/% (Integer Division): gives the quotient of dividing the first vector by the second.
   Example: print(a%/%b)
   Output: [1] 0 0 1
7. ^ (Exponent): raises the first vector to the power of the second vector.
   Example: print(a^b)
   Output: [1] 2048.0000  391.3539   64.0000
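
As a further sketch under the same assumptions (vectors a and b as above; the other variable names are illustrative), the code below shows a scalar being recycled across a vector and checks how %/% and %% relate to each other.

```r
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)

# A single number is recycled to the length of the vector
scaled <- a * 10

# Integer quotient and remainder, both computed element-wise
quotient  <- a %/% b
remainder <- a %% b

# quotient * divisor + remainder gives back the original values
rebuilt <- quotient * b + remainder
print(scaled)
print(all.equal(rebuilt, a))
```

all.equal() is used instead of == because the values are floating-point numbers.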

2) Relational Operators
A relational operator is a symbol which defines some kind of relation between two entities.
These include numerical equalities and inequalities. A relational operator compares each
element of the first vector with the corresponding element of the second vector. The result
is a logical vector with one TRUE or FALSE value per pair of elements. R supports the
following relational operators:
1. > (Greater than): TRUE where the element of the first vector is greater than the
   corresponding element of the second vector.
   Example:
   a <- c(1, 3, 5)
   b <- c(2, 4, 6)
   print(a>b)
   Output: [1] FALSE FALSE FALSE
2. < (Less than): TRUE where the element of the first vector is less than the
   corresponding element of the second vector.
   Example:
   a <- c(1, 9, 5)
   b <- c(2, 4, 6)
   print(a<b)
   Output: [1]  TRUE FALSE  TRUE
3. <= (Less than or equal to): TRUE where the element of the first vector is less than or
   equal to the corresponding element of the second vector.
   Example:
   a <- c(1, 3, 5)
   b <- c(2, 3, 6)
   print(a<=b)
   Output: [1] TRUE TRUE TRUE
4. >= (Greater than or equal to): TRUE where the element of the first vector is greater
   than or equal to the corresponding element of the second vector.
   Example:
   a <- c(1, 3, 5)
   b <- c(2, 3, 6)
   print(a>=b)
   Output: [1] FALSE  TRUE FALSE
5. == (Equal to): TRUE where the element of the first vector is equal to the
   corresponding element of the second vector.
   Example:
   a <- c(1, 3, 5)
   b <- c(2, 3, 6)
   print(a==b)
   Output: [1] FALSE  TRUE FALSE
6. != (Not equal to): TRUE where the element of the first vector is not equal to the
   corresponding element of the second vector.
   Example:
   a <- c(1, 3, 5)
   b <- c(2, 3, 6)
   print(a!=b)
   Output: [1]  TRUE FALSE  TRUE
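
Because a comparison returns one value per element, a single yes/no answer needs a reducing function. The sketch below uses the base R functions all(), any(), and which(); the variable names are illustrative.

```r
a <- c(1, 3, 5)
b <- c(2, 3, 6)

every_le  <- all(a <= b)     # is every element of a <= its partner in b?
some_eq   <- any(a == b)     # is at least one pair equal?
every_eq  <- all(a == b)     # are all pairs equal?
differing <- which(a != b)   # positions where the vectors differ

print(every_le)
print(some_eq)
print(every_eq)
print(differing)
```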

3) Logical Operators
The logical operators allow a program to make a decision on the basis of multiple conditions.
Each operand is treated as a condition which evaluates to TRUE or FALSE, and the values of
the conditions determine the overall value of op1 operator op2. Logical operators apply to
vectors whose type is logical, numeric, or complex; any non-zero number counts as TRUE.
The operators &, | and ! work element-wise, while && and || work on single values.
R supports the following logical operators. All examples below use
a <- c(3, 0, TRUE, 2+2i) and b <- c(2, 4, TRUE, 2+3i).
1. & (Logical AND operator): compares each element of the first vector with the
   corresponding element of the second and returns TRUE where both are TRUE.
   Example: print(a&b)
   Output: [1]  TRUE FALSE  TRUE  TRUE
2. | (Logical OR operator): compares each element of the first vector with the
   corresponding element of the second and returns TRUE where at least one is TRUE.
   Example: print(a|b)
   Output: [1] TRUE TRUE TRUE TRUE
3. ! (Logical NOT operator): gives the opposite logical value of each element of the vector.
   Example: print(!a)
   Output: [1] FALSE  TRUE FALSE FALSE
4. && (Logical AND operator for single values): returns TRUE only if both operands are
   TRUE. Since R 4.3.0, operands longer than one element raise an error, so the first
   elements are selected explicitly.
   Example: print(a[1] && b[1])
   Output: [1] TRUE
5. || (Logical OR operator for single values): returns TRUE if at least one operand is
   TRUE. The same length restriction applies.
   Example: print(a[1] || b[1])
   Output: [1] TRUE
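
The difference between the element-wise and single-value forms can be sketched as follows. The vectors x and y and the other names are illustrative assumptions; plain logical vectors are used to keep the example simple.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# & works on every pair of elements and returns a vector
elementwise <- x & y

# && works on single values, so the first elements are picked explicitly
first_pair <- x[1] && y[1]

# && also short-circuits: the right-hand side is not evaluated when the
# left-hand side is already FALSE, so stop() is never reached here
short_circuit <- FALSE && stop("never evaluated")

print(elementwise)
print(first_pair)
print(short_circuit)
```

The single-value forms are the ones normally used inside if() conditions.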

4) Assignment Operators
Assignment operators in R are used to assign values to data objects. The
objects may be integers, vectors, or functions. The values are then stored under the
assigned variable names. There are two kinds of assignment operators: left and right.
1. <- or = or <<- (Left Assignment): the name is on the left and the value is on the right.
   Example:
   a <- c(3, 0, TRUE, 2+2i)
   b <<- c(2, 4, TRUE, 2+3i)
   d = c(1, 2, TRUE, 2+3i)
   print(a)
   print(b)
   print(d)
   Output:
   [1] 3+0i 0+0i 1+0i 2+2i
   [1] 2+0i 4+0i 1+0i 2+3i
   [1] 1+0i 2+0i 1+0i 2+3i
2. -> or ->> (Right Assignment): the value is on the left and the name is on the right.
   Example:
   c(3, 0, TRUE, 2+2i) -> a
   c(2, 4, TRUE, 2+3i) ->> b
   print(a)
   print(b)
   Output:
   [1] 3+0i 0+0i 1+0i 2+2i
   [1] 2+0i 4+0i 1+0i 2+3i
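
At the top level <- and <<- behave alike; the difference appears inside functions, where <<- assigns to a variable in an enclosing environment. The sketch below illustrates this with two hypothetical helper functions, bump_local() and bump_outer().

```r
counter <- 0

# <- inside a function creates a local copy; the outer counter is untouched
bump_local <- function() {
  counter <- counter + 1
  counter
}

# <<- modifies the variable found in the enclosing environment
bump_outer <- function() {
  counter <<- counter + 1
  counter
}

local_result <- bump_local()
after_local  <- counter      # outer counter after the local bump
outer_result <- bump_outer()
after_outer  <- counter      # outer counter after the <<- bump

print(c(local_result, after_local, outer_result, after_outer))
```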
5) Miscellaneous Operators
These are operators for special purposes such as generating sequences, testing
membership, and multiplying matrices.
1. : (Colon operator): creates a series of consecutive numbers as a vector.
   Example:
   v <- 2:8
   print(v)
   Output: [1] 2 3 4 5 6 7 8
2. %in% (%in% Operator): identifies whether an element belongs to a vector.
   Example:
   v1 <- 8
   v2 <- 12
   t <- 1:10
   print(v1 %in% t)
   print(v2 %in% t)
   Output:
   [1] TRUE
   [1] FALSE
3. %*% (%*% Operator): performs matrix multiplication. The number of columns of the first
   matrix must be equal to the number of rows of the second matrix. The example multiplies
   a matrix with its transpose, which is obtained by interchanging rows and columns;
   multiplying a matrix by its transpose produces a square matrix.
   Example:
   M = matrix(c(2,6,5,1,10,4), nrow = 2, ncol = 3, byrow = TRUE)
   t = M %*% t(M)
   print(t)
   Output:
        [,1] [,2]
   [1,]   65   82
   [2,]   82  117
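
The dimension rule for %*% can be sketched with two small matrices. The names A, B, and P and the values 1:6 are assumptions chosen for illustration.

```r
# A is 2 x 3 and B is 3 x 2, so the product A %*% B is 2 x 2
A <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
B <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)

P <- A %*% B
print(dim(P))
print(P)

# Note: * is element-wise and needs equal dimensions, unlike %*%
squares <- A * A
```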

Data Structures in R Programming


A data structure is a particular way of organizing data in a computer so that it can be used
effectively. The idea is to reduce the space and time complexities of different tasks. Data
structures in R programming are tools for holding multiple values. R's base data structures
are often organized by their dimensionality (1D, 2D, or nD) and whether they're
homogeneous (all elements must be of the same type) or heterogeneous (the elements can
be of different types). This gives rise to the six data structures which are most
frequently used in data analysis:
1. Atomic vector
2. List
3. Array
4. Matrices
5. Data Frame
6. Factors
1. Vectors
A vector is the most basic data structure (data object) in R.
There are six types of atomic vectors: logical, integer, double, complex, character, and raw.
"A vector is a collection of elements which is most commonly of mode character, integer,
logical or numeric". Vectors can be created using the c() function.
nv <- c(1, 2, 3, 4, 5)
cv <- c("apple", "banana", "cherry")
A vector can be one of the following two types:
1. Atomic vector
2. Lists

2. List
In R, a list is a container. Unlike an atomic vector, a list is not restricted to a single
mode: it can contain a mixture of data types. Lists are also known as generic vectors
because the elements of a list can be any type of R object. "A list is a special type of
vector in which each element can be a different type." We can create a list with the help
of list() or as.list(). We can use vector() to create an empty list of a required length.
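
A short sketch of these functions follows. The list contents and the names person and empty are made up for illustration.

```r
# A list can mix a string, an integer, and a numeric vector
person <- list(name = "Asha", age = 30L, scores = c(88.5, 92))

print(person$name)               # access an element by name
print(person[["scores"]][2])     # second value of the scores element
print(sapply(person, class))     # each element keeps its own type

# vector("list", n) creates an empty list of length n
empty <- vector("list", 3)
print(length(empty))
```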

3. Arrays
Arrays are data objects which can store data in more than two dimensions.
"An array is a collection of a similar data type with contiguous memory
allocation." For example, if we create an array of dimension (2, 3, 4), it contains four
rectangular matrices of two rows and three columns each. In R, an array is created with
the array() function. This function takes a vector as input and uses the values in the dim
parameter to create the array.
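
The (2, 3, 4) example can be sketched as below. Filling the array with the values 1:24 is an assumption made for illustration.

```r
# 24 values arranged as four 2 x 3 matrices
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))
print(arr[, , 1])       # the first 2 x 3 matrix

# Elements are addressed as [row, column, matrix]
last_value  <- arr[2, 3, 4]
first_of_m2 <- arr[1, 1, 2]
```

Values are filled column by column, matrix by matrix, which is why the first element of the second matrix is 7.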

4. Matrices
A matrix is an R object in which the elements are arranged in a two-dimensional rectangular
layout. All elements of a matrix are of the same atomic type. For mathematical
calculations, we use matrices containing numeric elements. A matrix is created with the
help of the matrix() function in R.
Syntax
The basic syntax of creating a matrix is as follows:
matrix(data, nrow, ncol, byrow, dimnames)
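A minimal sketch of a matrix() call is given below, assuming the argument names of the base R function (nrow, ncol, byrow, dimnames). The data and the row and column labels are illustrative.

```r
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
print(m)

# Elements can be selected by position or by name
by_position <- m[2, 2]
by_name     <- m["r2", "b"]

# t() swaps rows and columns
transposed_dim <- dim(t(m))
```

With byrow = TRUE the values fill the matrix row by row; the default fills it column by column.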

5. Data Frames
A data frame is a two-dimensional array-like structure, or we can say it is a table in which
each column contains the values of one variable, and each row contains one set of values
from each column.
A data frame has the following characteristics:
1. The column names must be non-empty.
2. The row names must be unique.
3. A data frame can store numeric, factor, or character data.
4. Each column must contain the same number of data items.
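
These characteristics can be seen in a small sketch. The data frame students and its values are invented for illustration.

```r
students <- data.frame(
  name  = c("Asha", "Ravi", "Meena"),
  marks = c(82, 67, 91),
  stringsAsFactors = FALSE    # keep names as character, not factor
)

str(students)                 # one line per column with its type

# Rows can be filtered with a logical condition on a column
top <- students[students$marks > 80, ]
print(top)
```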

6. Factors
Factors are data objects that are used to categorize data and store it as levels. Factors
can be built from both strings and integers. They are most useful for columns that have a
limited number of unique values, and they are widely used in data analysis for statistical
modelling. Factors are created with the help of the factor() function by taking a vector as an
input parameter.
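A minimal sketch of creating a factor from a character vector is shown below. The data and variable names are made up for illustration.

```r
# A character vector with a limited number of unique values
sizes <- c("small", "large", "medium", "small", "large")

# factor() stores the values as levels; the level order is set explicitly here
size_factor <- factor(sizes, levels = c("small", "medium", "large"))

levels(size_factor)       # "small" "medium" "large"
table(size_factor)        # how many times each level occurs
as.integer(size_factor)   # internal integer codes: 1 3 2 1 3
```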

Vectors
R vectors are similar to arrays in the C language in that they hold multiple data values of the
same type. One key difference is that in R the indexing of a vector starts from '1' and not
from '0'. We can create numeric vectors and character vectors as well.

The length is an important property of a vector. The length of a vector is the number of
elements in it, and it is obtained with the help of the length() function. Vectors are
classified into two kinds, i.e., atomic vectors and lists. They have three common properties:
the type (typeof()), the length (length()), and the attributes (attributes()). There is only one
difference between atomic vectors and lists. In an atomic vector, all the elements are of the
same type, but in a list, the elements can be of different data types. In this section, we will
discuss only atomic vectors. Lists are discussed in a later topic.

A vector is a basic data structure which plays an important role in R programming. In R, a
sequence of elements which share the same data type is known as a vector. A vector supports
the logical, integer, double, character, complex, or raw data type. The elements contained in
a vector are known as the components of the vector. We can check the type of a vector with
the help of the typeof() function.
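The six vector types listed above can be checked with typeof(), as in the following sketch. The names types and mixed are illustrative assumptions.

```r
# typeof() reports the storage type of a vector
types <- c(
  typeof(c(TRUE, FALSE)),      # "logical"
  typeof(c(1L, 2L)),           # "integer"
  typeof(c(1.5, 2)),           # "double"
  typeof(c("a", "b")),         # "character"
  typeof(c(1+2i, 3i)),         # "complex"
  typeof(as.raw(c(1, 255)))    # "raw"
)
print(types)

# Mixing types in c() coerces everything to one common type
mixed <- c(1L, 2.5, "x")
typeof(mixed)                  # "character"
```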

How to create a vector in R?
In R, we use the c() function to create a vector. This function returns a one-dimensional
array, or simply a vector. The c() function is a generic function which combines its
arguments. All arguments are coerced to a common data type, which is the type of the
returned value. There are various other ways to create a vector in R, which are as follows:
1) Using the colon(:) operator
We can create a vector with the help of the colon operator. The syntax of the colon operator
is as follows:
z <- x:y
This operator creates a vector with elements from x to y and assigns it to z.
Example:
a <- 4:-10
a
Output
[1] 4 3 2 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10

2) Using the seq() function
In R, we can create a vector with the help of the seq() function. The seq() function creates a
sequence of elements as a vector. It is used in two ways, i.e., by setting the step size with the
'by' parameter or by specifying the length of the vector with the 'length.out' parameter.
Example:
seq_vec <- seq(1, 4, by = 0.5)
seq_vec
class(seq_vec)
Output
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0
[1] "numeric"
Example:
seq_vec <- seq(1, 4, length.out = 6)
seq_vec
class(seq_vec)
Output
[1] 1.0 1.6 2.2 2.8 3.4 4.0
[1] "numeric"

Atomic vectors in R
In R, there are four commonly used types of atomic vectors: numeric, integer, character, and
logical. Atomic vectors play an important role in Data Science. They are created with the
help of the c() function. These atomic vectors are as follows:
o Numeric vector
Decimal values are known as the numeric data type in R. If we assign a decimal value to
a variable d, then d becomes a numeric type. A vector which contains numeric elements is
known as a numeric vector.
Example:
d <- 45.5
num_vec <- c(10.1, 10.2, 33.2)
d
num_vec
class(d)
class(num_vec)
Output
[1] 45.5
[1] 10.1 10.2 33.2
[1] "numeric"
[1] "numeric"
o Integer vector
A numeric value without a fractional part can be stored as integer data. In R, an integer is
stored as a 32-bit (4-byte) signed value. There are two ways to assign an integer value to a
variable, i.e., by using the as.integer() function or by appending L to the value. A vector
which contains integer elements is known as an integer vector.
Example:
d <- as.integer(5)
e <- 5L
int_vec <- c(1, 2, 3, 4, 5)
int_vec <- as.integer(int_vec)
int_vec1 <- c(1L, 2L, 3L, 4L, 5L)
class(d)
class(e)
class(int_vec)
class(int_vec1)
Output
[1] "integer"
[1] "integer"
[1] "integer"
[1] "integer"
o Character vector
A character value (string) is a sequence of characters. In R, there are two different ways to
create a character data type value, i.e., by using the as.character() function or by typing a
string between double quotes ("") or single quotes (''). A vector which contains character
elements is known as a character vector.
Example:
d <- 'shubham'
e <- "Arpita"
f <- 65
f <- as.character(f)
d
e
f
char_vec <- c(1, 2, 3, 4, 5)
char_vec <- as.character(char_vec)
char_vec1 <- c("shubham", "arpita", "nishka", "vaishali")
char_vec
char_vec1
class(d)
class(e)
class(f)
class(char_vec)
class(char_vec1)
Output
[1] "shubham"
[1] "Arpita"
[1] "65"
[1] "1" "2" "3" "4" "5"
[1] "shubham" "arpita" "nishka" "vaishali"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
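Logical vectors, which hold TRUE and FALSE values, are another atomic vector type that a vector supports. A minimal sketch, with illustrative variable names that are not part of the notes:

```r
# A logical vector created directly with c()
log_vec <- c(TRUE, FALSE, TRUE, TRUE)
class(log_vec)    # "logical"

# TRUE counts as 1 and FALSE as 0 in arithmetic
sum(log_vec)      # 3

# Comparisons also produce logical vectors
cmp <- c(3, 8, 1) > 2
cmp               # TRUE TRUE FALSE
```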
Repeating Values:
You can create a vector by repeating values using the rep() function. The following example
creates a vector with five 1s.
Ex:
r <- rep(1, times = 5)
r
Output:
[1] 1 1 1 1 1
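The rep() function can also repeat a whole vector or each of its elements. The sketch below compares the times and each arguments; the variable names are assumptions for illustration.

```r
ones        <- rep(1, times = 5)        # 1 1 1 1 1
alternating <- rep(c(1, 2), times = 3)  # the whole vector three times: 1 2 1 2 1 2
grouped     <- rep(c(1, 2), each = 3)   # each element three times:    1 1 1 2 2 2

print(ones)
print(alternating)
print(grouped)
```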
Vector Functions:
Vector Length
To find out how many items a vector has, use the length() function.
Ex:
fruits <- c("banana", "apple", "orange")
length(fruits)
Output: [1] 3
Sort a Vector:
To sort items in a vector alphabetically or numerically, use the sort() function:
Ex:
fruits <- c("banana", "apple", "orange")
n <- c(5, 6, 1, 8)
sort(fruits)
sort(n)

Output:
[1] "apple" "banana" "orange"
[1] 1 5 6 8
Change an item:
To change the value of a specific item, refer to its index number.
Ex:
fruits <- c("banana", "apple", "orange")
fruits[2] <- "Mango"
fruits
Output:
[1] "banana" "Mango" "orange"

Accessing elements of vectors
We can access the elements of a vector with the help of vector indexing. An index denotes the
position at which a value is stored in the vector. Indexing can be performed with an integer,
character, or logical vector.
1) Indexing with integer vector
With an integer vector, indexing is performed in the same way as in C, C++, and Java. There
is only one difference, i.e., in C, C++, and Java the indexing starts from 0, but in R the
indexing starts from 1. As in other programming languages, we perform indexing by
specifying an integer value in square brackets [] next to our vector.
Example:
seq_vec <- seq(1, 4, length.out = 6)
seq_vec
seq_vec[2]
Output
[1] 1.0 1.6 2.2 2.8 3.4 4.0
[1] 1.6
2) Indexing with a character vector
In character vector indexing, we assign a unique key (name) to each element of the vector.
Each key identifies exactly one element, so the element can be accessed easily by its name.
Let's see an example to understand how it is performed.
Example:
char_vec <- c("shubham" = 22, "arpita" = 23, "vaishali" = 25)
char_vec
char_vec["arpita"]
Output
 shubham   arpita vaishali
      22       23       25
  arpita
      23
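Indexing with a logical vector is mentioned above but not illustrated. The following sketch, using a made-up marks vector, shows logical indexing alongside negative and multi-position integer indexing.

```r
marks <- c(55, 72, 38, 91, 64)

# Logical indexing: keep the positions where the condition is TRUE
above_60 <- marks[marks > 60]                            # 72 91 64

# An explicit logical vector works the same way
picked <- marks[c(TRUE, FALSE, TRUE, FALSE, TRUE)]       # 55 38 64

# A negative index drops that position instead of selecting it
without_first <- marks[-1]                               # 72 38 91 64

# Several positions can be requested at once
first_and_fourth <- marks[c(1, 4)]                       # 55 91
```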

Modifying an R vector
Modification of a vector is the process of applying some operation to individual elements of
the vector to change their values. There are different ways through which we can modify a
vector:
# R program to modify elements of a vector
# Creating a vector
X <- c(2, 7, 9, 7, 8, 2)
# Modify specific elements using the subscript operator
X[3] <- 1
X[2] <- 9
cat('subscript operator', X, '\n')

# Modify a range of positions in one assignment
X[1:5] <- 0
cat('range indexing', X, '\n')

# Keep only the elements at the given positions
X <- X[c(3, 2, 1)]
cat('position vector', X)

Output:
subscript operator 2 9 1 7 8 2
range indexing 0 0 0 0 0 2
position vector 0 0 0
Deleting an R vector
Deletion of a vector is the process of removing all of the elements of the vector. This can be
done by assigning NULL to it.
# R program to delete a vector
# Creating a vector
M <- c(8, 10, 2, 5)
# Assign NULL to the vector
M <- NULL
print(M)
Output:
NULL
Sorting elements of an R vector
The sort() function sorts the values of a vector in ascending or descending order.
# R program to sort elements of a Vector
# Creation of Vector
X<- c(8, 2, 7, 1, 11, 2)
# Sort in ascending order
A<- sort(X)
cat('ascending order', A, '\n')
# sort in descending order
# by setting decreasing as TRUE
B<- sort(X, decreasing = TRUE)
cat('descending order', B)
Output:
ascending order 1 2 2 7 8 11
descending order 11 8 7 2 2 1

Lists
A list in R is a generic object consisting of an ordered collection of objects. Lists are
one-dimensional, heterogeneous data structures: a list is a vector whose elements can be of
different types, such as numbers, strings, vectors, matrices, functions, and even other lists.
A list in R is created with the list() function. R allows accessing the elements of a list by
their index value. In R, the indexing of a list starts with 1, not with 0 as in many other
programming languages.
Example
vec <- c(3, 4, 5, 6)
char_vec <- c("shubham", "nishka", "gunjan", "sumit")
logic_vec <- c(TRUE, FALSE, FALSE, TRUE)
out_list <- list(vec, char_vec, logic_vec)
out_list
Output:
[[1]]
[1] 3 4 5 6
[[2]]
[1] "shubham" "nishka" "gunjan" "sumit"
[[3]]
[1] TRUE FALSE FALSE TRUE

List Functions:
R provides various functions for working with lists, including:
length(): Returns the number of elements in a list.
names(): Returns or sets the names of the elements in a list.
str(): Displays the structure of a list, showing its elements and data types.
unlist():Converts a list to a vector by flattening it.
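The four functions listed above can be tried on a small named list. This is only a sketch; the list student and its contents are assumptions made for illustration.

```r
student <- list(name = "Asha", marks = c(80, 75, 92), passed = TRUE)

length(student)   # 3 elements
names(student)    # "name" "marks" "passed"
str(student)      # compact view of each element and its type

# unlist() flattens the list; mixed types are coerced to character here
flat <- unlist(student)
length(flat)      # 5 values: one name, three marks, one flag
```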
Lists creation
The process of creating a list is similar to creating a vector. In R, a vector is created with
the help of the c() function. In the same way, the list() function is used to create a list in R.
A list avoids the main limitation of a vector, which is that all elements must share one data
type. We can add elements of different data types to a list.

Syntax
list()
Example 1: Creating list with same data type
list_1 <- list(1, 2, 3)
list_2 <- list("Shubham", "Arpita", "Vaishali")
list_3 <- list(c(1, 2, 3))
list_4 <- list(TRUE, FALSE, TRUE)
list_1
list_2
list_3
list_4
Output:
[[1]]
[1] 1
[[2]]
[1] 2

[[3]]
[1] 3

[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] "Vaishali"

[[1]]
[1] 1 2 3

[[1]]
[1] TRUE
[[2]]
[1] FALSE
[[3]]
[1] TRUE
Example 2: Creating the list with different data type
list_data <- list("Shubham", "Arpita", c(1, 2, 3, 4, 5), TRUE, FALSE, 22.5, 12L)
print(list_data)
In the above example, the list() function creates a list with character, vector, logical, numeric,
and integer elements. It gives the following output.
Output:
[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] 1 2 3 4 5
[[4]]
[1] TRUE
[[5]]
[1] FALSE
[[6]]
[1] 22.5
[[7]]
[1] 12
Giving a name to list elements
R provides a very easy way of accessing elements, i.e., by giving a name to each element
of a list. Once names are assigned, we can access each element by its name. There are
only three steps to print the list data corresponding to the names:
1. Create a list.
2. Assign names to the list elements with the help of the names() function.
3. Print the list data.
Let's see an example to understand how we can give names to the list elements.
Example
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham", "Nishka", "Gunjan"),
                  matrix(c(40, 80, 60, 70, 90, 80), nrow = 2),
                  list("BCA", "MCA", "B.tech"))
# Giving names to the elements in the list.
names(list_data) <- c("Students", "Marks", "Course")
# Show the list.
print(list_data)
Output:
$Students
[1] "Shubham" "Nishka" "Gunjan"
$Marks
     [,1] [,2] [,3]
[1,]   40   60   90
[2,]   80   70   80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B.tech"
Accessing List Elements
R provides two ways through which we can access the elements of a list. The first one is the
indexing method, performed in the same way as for a vector. In the second one, we access the
elements of a list with the help of names. This is possible only with a named list; we cannot
access the elements of a list using names if the list has no names.
Let's see an example of both methods to understand how they are used to access the elements
of a list.
Example 1: Accessing elements using index
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham", "Arpita", "Nishka"),
                  matrix(c(40, 80, 60, 70, 90, 80), nrow = 2),
                  list("BCA", "MCA", "B.tech"))
# Accessing the first element of the list.
print(list_data[1])
# Accessing the third element. The third element is also a list, so all its elements will be printed.
print(list_data[3])
Output:
[[1]]
[1] "Shubham" "Arpita" "Nishka"

[[1]]
[[1]][[1]]
[1] "BCA"
[[1]][[2]]
[1] "MCA"
[[1]][[3]]
[1] "B.tech"
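The output above still shows the [[1]] wrapper because single brackets return a sub-list. As a hedged sketch (the shortened list below is an assumption for illustration), double brackets return the element itself:

```r
list_data <- list(c("Shubham", "Arpita", "Nishka"),
                  list("BCA", "MCA", "B.tech"))

sub_list <- list_data[1]     # single brackets: a list of length 1
element  <- list_data[[1]]   # double brackets: the character vector itself

class(sub_list)              # "list"
class(element)               # "character"

# Double brackets can be chained to reach nested values
second_student <- list_data[[1]][2]     # "Arpita"
third_course   <- list_data[[2]][[3]]   # "B.tech"
```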
Example 2: Accessing elements using names
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham", "Arpita", "Nishka"),
                  matrix(c(40, 80, 60, 70, 90, 80), nrow = 2),
                  list("BCA", "MCA", "B.tech"))
# Giving names to the elements in the list.
names(list_data) <- c("Student", "Marks", "Course")
# Accessing the element named "Student" with single brackets.
print(list_data["Student"])
# Accessing the element named "Marks" with the $ operator.
print(list_data$Marks)
# Printing the whole list.
print(list_data)
Output:
$Student
[1] "Shubham" "Arpita" "Nishka"

     [,1] [,2] [,3]
[1,]   40   60   90
[2,]   80   70   80

$Student
[1] "Shubham" "Arpita" "Nishka"
$Marks
     [,1] [,2] [,3]
[1,]   40   60   90
[2,]   80   70   80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B.tech"
Manipulation of list elements
R allows us to add, delete, or update elements in a list. We can update or delete an element
at any position; the simplest way to add an element is to assign to the index just past the
end of the list. To remove an element at a specified index, we assign NULL to it. We can
update an element of a list by overwriting it with a new value. Let's see an example to
understand how we can add, delete, or update the elements in a list.
Example
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham", "Arpita", "Nishka"),
                  matrix(c(40, 80, 60, 70, 90, 80), nrow = 2),
                  list("BCA", "MCA", "B.tech"))
# Giving names to the elements in the list.
names(list_data) <- c("Student", "Marks", "Course")
# Adding an element at the end of the list.
list_data[4] <- "Moradabad"
print(list_data[4])
# Removing the last element.
list_data[4] <- NULL
# Printing the 4th element (it no longer exists).
print(list_data[4])
# Updating the 3rd element.
list_data[3] <- "Masters of computer applications"
print(list_data[3])
Output:
[[1]]
[1] "Moradabad"

$<NA>
NULL
$Course
[1] "Masters of computer applications"
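Elements can also be inserted or removed in the middle of a list. The following is a minimal sketch using base R's append() with its after argument; the list lst is made up for illustration.

```r
lst <- list("a", "b", "d")

# Insert "c" after the second element
lst <- append(lst, list("c"), after = 2)
after_insert <- unlist(lst)    # "a" "b" "c" "d"

# Remove the second element by assigning NULL to it
lst[[2]] <- NULL
after_delete <- unlist(lst)    # "a" "c" "d"

print(after_insert)
print(after_delete)
```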
Check if Item Exists:
To find out if a specified item is present in a list, use the %in% operator.
Ex:
lst <- list("apple", "banana", "cherry")
"apple" %in% lst
Output: [1] TRUE
Remove List Items:
You can also remove list items. The following example creates a new, updated list without the
"apple" item:
Ex:
lst <- list("apple", "banana", "cherry")
nl <- lst[-1]
nl
Output:
[[1]]
[1] "banana"
[[2]]
[1] "cherry"
Converting list to vector
There is a drawback with lists, i.e., we cannot perform arithmetic operations directly on list
elements. To overcome this drawback, R provides the unlist() function, which converts a list
into a vector. In some cases, it is required to convert a list into a vector so that we can use
the elements of the vector for further manipulation. The unlist() function takes the list as a
parameter and returns a vector. Let's see an example to understand how the unlist() function
is used in R.
Example
# Creating lists.
list1 <- list(1:5)
print(list1)
list2 <- list(10:14)
print(list2)
# Converting the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)
print(v1)
print(v2)
# Adding the vectors.
result <- v1 + v2
print(result)
Output:
[[1]]
[1] 1 2 3 4 5

[[1]]
[1] 10 11 12 13 14

[1] 1 2 3 4 5
[1] 10 11 12 13 14
[1] 11 13 15 17 19

Array
Arrays are essential data storage structures defined by a fixed number of dimensions. Their
elements are stored at contiguous memory locations. One-dimensional arrays behave like
vectors, with the length being their only dimension. Two-dimensional arrays are called
matrices, consisting of fixed numbers of rows and columns. All elements of an array are of
the same data type. A vector is supplied as input to the array() function, which then creates
an array based on the given dimensions.
The syntax of R arrays is as follows:
array_name <- array(data, dim = c(row_size, column_size, matrices), dimnames = dim_names)
data : The data is the first argument in the array() function. It is an input vector which is
given to the array.
matrices : In R, the array consists of multi-dimensional matrices. This value is the number of
matrices.
row_size : This parameter defines the number of rows which an array can store.
column_size : This parameter defines the number of columns which an array can store.
dim_names : This parameter is used to change the default names of rows, columns,
layers and blocks.
How to create?
In R, array creation is quite simple. We can easily create an array using vectors and the
array() function. In an array, data is stored in the form of matrices. There are only two steps
to create an array, which are as follows:
1. In the first step, we create two vectors of different lengths.
2. Once our vectors are created, we pass these vectors as input to the array() function.
Let's see an example to understand how we can implement an array with the help of vectors
and the array() function.
Example
# Creating two vectors of different lengths
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
# Taking these vectors as input to the array
res <- array(c(vec1, vec2), dim = c(3, 3, 2))
print(res)
Output
, , 1

     [,1] [,2] [,3]
[1,]    1   10   13
[2,]    3   11   14
[3,]    5   12   15

, , 2

     [,1] [,2] [,3]
[1,]    1   10   13
[2,]    3   11   14
[3,]    5   12   15
Naming rows and columns
In R, we can give names to the rows, columns, and matrices of an array. This is done
with the help of the dimnames parameter of the array() function. It is not necessary to name
the rows and columns; names are only used to tell the rows and columns apart for better
understanding. Below is an example in which we create an array and give names to its
rows, columns, and matrices.
Example
# Creating two vectors of different lengths
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
# Initializing names for rows, columns and matrices
col_names <- c("Col1", "Col2", "Col3")
row_names <- c("Row1", "Row2", "Row3")
matrix_names <- c("Matrix1", "Matrix2")
# Taking the vectors as input to the array
res <- array(c(vec1, vec2), dim = c(3, 3, 2),
             dimnames = list(row_names, col_names, matrix_names))
print(res)
Output
, , Matrix1

     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15

, , Matrix2

     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15
Accessing array elements
As in C or C++, we can access the elements of an array with the help of the indexing method,
by giving one index for each dimension inside square brackets. The output below shows the
named array res from the previous example, followed by the third row of the first matrix, a
single element, and the whole second matrix.
Output
, , Matrix1

     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15

, , Matrix2

     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15

Col1 Col2 Col3
   5   12   15

[1] 13

     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15
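The notes do not include the code that produced the output above. The following is a reconstruction, offered as a sketch: the indexing calls are assumptions chosen to be consistent with that output, not the original listing.

```r
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
res <- array(c(vec1, vec2), dim = c(3, 3, 2),
             dimnames = list(c("Row1", "Row2", "Row3"),
                             c("Col1", "Col2", "Col3"),
                             c("Matrix1", "Matrix2")))
print(res)

# Third row of the first matrix (assumed call)
print(res[3, , 1])

# Element in row 1, column 3 of the first matrix (assumed call)
print(res[1, 3, 1])

# The whole second matrix (assumed call)
print(res[, , 2])
```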
Manipulation of elements
An array is made up of matrices in multiple dimensions, so operations on the elements of an
array are carried out by accessing the elements of those matrices.
Example
# Creating two vectors of different lengths
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
# Taking the vectors as input to the first array
res1 <- array(c(vec1, vec2), dim = c(3, 3, 2))
print(res1)
# Creating two more vectors of different lengths
vec1 <- c(8, 4, 7)
vec2 <- c(16, 73, 48, 46, 36, 73)
# Taking the vectors as input to the second array
res2 <- array(c(vec1, vec2), dim = c(3, 3, 2))
print(res2)
# Extracting the second matrix from each array and adding them
mat1 <- res1[, , 2]
mat2 <- res2[, , 2]
res3 <- mat1 + mat2
print(res3)
Output
,,1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15

,,2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15

,,1
[,1] [,2] [,3]
[1,] 8 16 46
[2,] 4 73 36
[3,] 7 48 73

,,2
[,1] [,2] [,3]
[1,] 8 16 46
[2,] 4 73 36
[3,] 7 48 73

     [,1] [,2] [,3]
[1,]    9   26   59
[2,]    7   84   50
[3,]   12   60   88
Matrix
In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created by
passing a vector as input to the matrix() function. On R matrices, we can perform addition,
subtraction, multiplication, and division operations. In an R matrix, elements are arranged in
a fixed number of rows and columns. The elements are usually numbers, but all the elements
of a matrix must share a common basic type.
Example
matrix1 <- matrix(c(11, 13, 15, 12, 14, 16), nrow = 2, ncol = 3, byrow = TRUE)
matrix1
Output
[,1] [,2] [,3]
[1,] 11 13 15
[2,] 12 14 16

How to create a matrix in R?
Like vectors and lists, matrices have their own creation function: R provides the matrix()
function to create a matrix. This function plays an important role in data analysis. The
syntax of the matrix() function in R is as follows:
matrix(data, nrow, ncol, byrow, dimnames)
data : The first argument of the matrix() function is data. It is the input vector which holds
the data elements of the matrix.
nrow : The second argument is the number of rows which we want to create in the matrix.
ncol : The third argument is the number of columns which we want to create in the matrix.
byrow : The byrow parameter is a logical flag. If its value is TRUE, then the input vector
elements are arranged by row.
dimnames : The dimnames parameter is a list of the names assigned to the rows and columns.
Let's see an example to understand how the matrix() function is used to create a matrix and
arrange the elements sequentially by row or column.
Example
# Arranging elements sequentially by row.
P <- matrix(c(5:16), nrow = 4, byrow = TRUE)
print(P)
# Arranging elements sequentially by column.
Q <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(Q)
# Defining the column and row names.
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")
R <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)
Output
     [,1] [,2] [,3]
[1,]    5    6    7
[2,]    8    9   10
[3,]   11   12   13
[4,]   14   15   16

     [,1] [,2] [,3]
[1,]    3    7   11
[2,]    4    8   12
[3,]    5    9   13
[4,]    6   10   14

     col1 col2 col3
row1    3    4    5
row2    6    7    8
row3    9   10   11
row4   12   13   14
Creating special matrices
R allows the creation of various types of matrices through the arguments passed
to the matrix() function.
Matrix where all rows and columns are filled by a single constant 'k':
To create such an R matrix, the syntax is given below:
Syntax: matrix(k, m, n)
Parameters:
k: the constant
m: number of rows
n: number of columns
Example:
print(matrix(5, 3, 3))
Output:
[,1] [,2] [,3]
[1,] 5 5 5
[2,] 5 5 5
[3,] 5 5 5
Diagonal matrix:
A diagonal matrix is a matrix in which the entries outside the main diagonal are all zero. To
create such an R matrix, the syntax is given below:
Syntax: diag(k, m, n)
Parameters:
k: the constant or vector placed on the diagonal
m: number of rows
n: number of columns
Example:
print(diag(c(5, 3, 3), 3, 3))
Output:
     [,1] [,2] [,3]
[1,]    5    0    0
[2,]    0    3    0
[3,]    0    0    3
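The diag() function behaves differently depending on its argument. The sketch below, with illustrative variable names, shows an identity matrix, a diagonal matrix built from a vector, and the extraction of a diagonal from an existing matrix.

```r
# A single number n gives the n x n identity matrix
identity3 <- diag(3)
print(identity3)

# A vector is placed on the main diagonal
d <- diag(c(5, 3, 3))
print(d)

# Given a matrix, diag() returns its main diagonal as a vector
diag_values <- diag(d)   # 5 3 3
print(diag_values)
```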
Accessing matrix elements in R
As in C and C++, we can easily access the elements of a matrix by using the index of the
element. There are three ways to access the elements of a matrix.
1. We can access the element which is present in the nth row and mth column.
2. We can access all the elements of the matrix which are present in the nth row.
3. We can also access all the elements of the matrix which are present in the mth column.
Let's see an example to understand how to access a single element, a whole row, and a whole
column of a matrix.
Example
# Defining the column and row names.
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")
# Creating the matrix
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)
# Accessing the element present in the 3rd row and 2nd column
print(R[3, 2])
# Accessing the elements present in the 3rd row
print(R[3, ])
# Accessing the elements present in the 2nd column
print(R[, 2])
Output
     col1 col2 col3
row1    5    6    7
row2    8    9   10
row3   11   12   13
row4   14   15   16

[1] 12

col1 col2 col3
  11   12   13

row1 row2 row3 row4
   6    9   12   15
Modification of the matrix
R allows us to modify a matrix. There are several ways to do this, which are as follows:
Assign a single element
The first method is to assign a single element to the matrix at a particular position. By
assigning a new value to that position, the old value is replaced with the new one. The basic
syntax for it is as follows:
matrix[n, m] <- y
Here, n and m are the row and column of the element, respectively, and y is the value which
we assign to modify our matrix. Let's see an example to understand how the modification is
done:
Example
# Defining the column and row names.
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)
# Assigning the value 20 to the element at the 3rd row and 2nd column
R[3, 2] <- 20
print(R)
Output
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16

     col1 col2 col3
row1    5    6    7
row2    8    9   10
row3   11   20   13
row4   14   15   16

Addition of Rows and Columns
Another method of matrix modification is the addition of rows and columns using the
cbind() and rbind() functions. The cbind() and rbind() functions are used to add a column and
a row respectively. Let's see an example to understand the working of the cbind() and rbind()
functions.
Example 1
# Defining the column and row names.
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)
# Adding a row
rbind(R, c(17, 18, 19))
# Adding a column
cbind(R, c(17, 18, 19, 20))
# Transpose of the matrix using the t() function
t(R)
# Modifying the dimensions of the matrix using the dim() function
dim(R) <- c(1, 12)
print(R)
Output
     col1 col2 col3
row1    5    6    7
row2    8    9   10
row3   11   12   13
row4   14   15   16

     col1 col2 col3
row1    5    6    7
row2    8    9   10
row3   11   12   13
row4   14   15   16
       17   18   19

     col1 col2 col3
row1    5    6    7 17
row2    8    9   10 18
row3   11   12   13 19
row4   14   15   16 20

     row1 row2 row3 row4
col1    5    8   11   14
col2    6    9   12   15
col3    7   10   13   16

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,]    5    8   11   14    6    9   12   15    7    10    13    16
Matrix operations
In R, we can perform mathematical operations on matrices, such as addition, subtraction,
multiplication, etc. These operators work element by element, so it is required that both
matrices have the same dimensions. Let's see an example to understand how mathematical
operations are performed on matrices.
Example 1
R <- matrix(c(5:16), nrow = 4, ncol = 3)
S <- matrix(c(1:12), nrow = 4, ncol = 3)

# Addition
sum <- R + S
print(sum)

# Subtraction
sub <- R - S
print(sub)

# Element-wise multiplication
mul <- R * S
print(mul)

# Multiplication by a constant
mul1 <- R * 12
print(mul1)

# Element-wise division
div <- R / S
print(div)
Output
[,1] [,2] [,3]
[1,] 6 14 22
[2,] 8 16 24
[3,] 10 18 26
[4,] 12 20 28
[,1] [,2] [,3]
[1,] 4 4 4
[2,] 4 4 4
[3,] 4 4 4
[4,] 4 4 4
[,1] [,2] [,3]
[1,] 5 45 117
[2,] 12 60 140
[3,] 21 77 165
[4,] 32 96 192
[,1] [,2] [,3]
[1,] 60 108 156
[2,] 72 120 168
[3,] 84 132 180
[4,] 96 144 192
[,1] [,2] [,3]
[1,] 5.000000 1.800000 1.444444
[2,] 3.000000 1.666667 1.400000
[3,] 2.333333 1.571429 1.363636
[4,] 2.000000 1.500000 1.333333
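The * operator used above multiplies matching elements. For the matrix product of linear algebra, R provides the %*% operator, which needs the number of columns of the first matrix to equal the number of rows of the second. A minimal sketch with made-up matrices A and B:

```r
A <- matrix(1:6, nrow = 2)   # 2 x 3 matrix
B <- matrix(1:6, nrow = 3)   # 3 x 2 matrix

# Matrix product: (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
P <- A %*% B
print(P)

# Element-wise product needs equal dimensions, e.g. A with itself
E <- A * A
print(E)
```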

Data Frame
A data frame is a two-dimensional array-like structure or a table in which each column
contains the values of one variable, and each row contains one set of values, one from each
column. A data frame is a special case of a list in which each component has equal length. In
simple terms, it is a list of equal-length vectors. A matrix can contain only one type of data,
but a data frame can contain different data types such as numeric, character, factor, etc. A
data frame has the following characteristics.
o Rectangular structure: Data frames are two-dimensional structures with rows and
columns forming a rectangular shape. All columns must have the same number of rows,
making them suitable for structured datasets.
o The column names should be non-empty.
o The row names should be unique.
o The data stored in a data frame can be of factor, numeric, or character type.
o Each column contains the same number of data items.
How to create Data Frame?
In R, data frames are created with the help of the data.frame() function. This function
accepts vectors of any type, such as numeric, character, or integer. In the example below, we
create a data frame that contains employee id (integer vector), employee name (character
vector), salary (numeric vector), and starting date (Date vector).
Example
# Creating the data frame.
emp.data <- data.frame(
  employee_id = c(1:5),
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.3, 915.2, 611.0, 729.0, 843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                            "2015-03-27")),
  stringsAsFactors = FALSE
)
# Printing the data frame.
print(emp.data)
Output
  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 915.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
Getting the structure of R Data Frame
In R, we can inspect the structure of a data frame. R provides a built-in function called str()
which displays the data along with its complete structure. In the example below, we create a
data frame from vectors of different data types and display its structure.
Example
# Creating the data frame.
emp.data <- data.frame(
  employee_id = c(1:5),
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.3, 515.2, 611.0, 729.0, 843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                            "2015-03-27")),
  stringsAsFactors = FALSE
)
# Printing the structure of the data frame.
str(emp.data)
Output
'data.frame': 5 obs. of 4 variables:
$ employee_id : int 1 2 3 4 5
$ employee_name: chr "Shubham" "Arpita" "Nishka" "Gunjan" ...
$ sal : num 623 515 611 729 843
$ starting_date: Date, format: "2012-01-01" "2013-09-23" ...

Extracting data from Data Frame
To work with the data held in a data frame, we first need to extract it. We can extract the
data in three ways, which are as follows:
1. We can extract specific columns from a data frame using the column names.
2. We can also extract specific rows from a data frame.
3. We can extract specific rows corresponding to specific columns.
Let's see an example of each one to understand how data is extracted from a data frame in
these ways.
Extracting the specific columns from a data frame
Example
# Creating the data frame.
emp.data <- data.frame(
  employee_id = c(1:5),
  employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
  sal = c(623.3,515.2,611.0,729.0,843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                            "2015-03-27")),
  stringsAsFactors = FALSE
)
# Extracting specific columns from a data frame
final <- data.frame(emp.data$employee_id,emp.data$sal)
print(final)
Output
  emp.data.employee_id emp.data.sal
1                    1       623.30
2                    2       515.20
3                    3       611.00
4                    4       729.00
5                    5       843.25
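Columns can also be selected by name with square brackets, which keeps the original column names instead of prefixing them with emp.data. A small sketch, assuming a reduced emp.data frame like the one above:

```r
# A reduced version of the emp.data example
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

# Select two columns by name; the result is still a data frame
final <- emp.data[, c("employee_id", "sal")]
print(final)

# A single column extracted with $ is a plain vector, not a data frame
sal_vector <- emp.data$sal
print(sal_vector)
```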
Extracting the specific rows from a data frame
Example
# Creating the data frame.
emp.data <- data.frame(
  employee_id = c(1:5),
  employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
  sal = c(623.3,515.2,611.0,729.0,843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                            "2015-03-27")),
  stringsAsFactors = FALSE
)
# Extracting first row from a data frame
final <- emp.data[1,]
print(final)
# Extracting last two rows from a data frame
final <- emp.data[4:5,]
print(final)
Output
  employee_id employee_name   sal starting_date
1           1       Shubham 623.3    2012-01-01
  employee_id employee_name    sal starting_date
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
Extracting specific rows corresponding to specific columns
Example
# Creating the data frame.
emp.data <- data.frame(
  employee_id = c(1:5),
  employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
  sal = c(623.3,515.2,611.0,729.0,843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                            "2015-03-27")),
  stringsAsFactors = FALSE
)
# Extracting 2nd and 3rd row corresponding to the 1st and 4th column
final <- emp.data[c(2,3),c(1,4)]
print(final)
Output
employee_id starting_date
2 2 2013-09-23
3 3 2014-11-15
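Rows are often selected by a condition rather than by position. The sketch below assumes a reduced emp.data frame and an arbitrary salary threshold of 620; it shows the same selection written with a logical index and with the base R subset() function.

```r
# A reduced version of the emp.data example
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

# Rows where the salary exceeds 620, keeping only two columns
high <- emp.data[emp.data$sal > 620, c("employee_name", "sal")]
print(high)

# The same selection written with subset()
high2 <- subset(emp.data, sal > 620, select = c(employee_name, sal))
print(high2)
```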
Modification in Data Frame
R allows us to modify a data frame. As with matrices, we modify a data frame through
re-assignment. We can not only add rows and columns, but also delete them. The data frame
is expanded by adding rows and columns. We can
1. Add a column by passing a column vector under a new column name to the cbind() function.
2. Add rows by passing new rows with the same structure as the existing data frame to the
rbind() function.
3. Delete columns by assigning a NULL value to them.
4. Delete rows by re-assigning the data frame with negative row indices.
Let's see an example to understand how the rbind() and cbind() functions work and how the
modification is done in our data frame.
Example: Adding rows and columns
# Creating the data frame.
emp.data <- data.frame(
  employee_id = c(1:5),
  employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
  sal = c(623.3,515.2,611.0,729.0,843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                            "2015-03-27")),
  stringsAsFactors = FALSE
)
print(emp.data)

# Adding row in the data frame
x <- list(6,"Vaishali",547,"2015-09-01")
rbind(emp.data,x)
# Adding column in the data frame
y <- c("Moradabad","Lucknow","Etah","Sambhal","Khurja")
cbind(emp.data,Address=y)
Output
employee_id employee_name sal starting_date
1 1 Shubham 623.30 2012-01-01
2 2 Arpita 515.20 2013-09-23
3 3 Nishka 611.00 2014-11-15
4 4 Gunjan 729.00 2014-05-11
5 5 Sumit 843.25 2015-03-27
employee_id employee_name sal starting_date
1 1 Shubham 623.30 2012-01-01
2 2 Arpita 515.20 2013-09-23
3 3 Nishka 611.00 2014-11-15
4 4 Gunjan 729.00 2014-05-11
5 5 Sumit 843.25 2015-03-27
6 6 Vaishali 547.00 2015-09-01
employee_id employee_name sal starting_date Address
1 1 Shubham 623.30 2012-01-01 Moradabad
2 2 Arpita 515.20 2013-09-23 Lucknow
3 3 Nishka 611.00 2014-11-15 Etah
4 4 Gunjan 729.00 2014-05-11 Sambhal
5 5 Sumit 843.25 2015-03-27 Khurja
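Note that in the output above the new row does not appear in the last table: rbind() and cbind() return a new data frame and leave the original untouched. To keep a change, the result has to be assigned back. A short sketch, assuming a two-row version of emp.data; new_row and rows_before are illustrative names.

```r
# A two-row version of the emp.data example
emp.data <- data.frame(
  employee_id = 1:2,
  employee_name = c("Shubham", "Arpita"),
  sal = c(623.3, 515.2),
  stringsAsFactors = FALSE
)

# A new row with the same columns as emp.data
new_row <- data.frame(employee_id = 3L, employee_name = "Nishka",
                      sal = 611.0, stringsAsFactors = FALSE)

# rbind() alone does not change emp.data
rbind(emp.data, new_row)
rows_before <- nrow(emp.data)
print(rows_before)

# Assign the result back to keep the new row
emp.data <- rbind(emp.data, new_row)

# A column can also be added directly with $
emp.data$Address <- c("Moradabad", "Lucknow", "Etah")
print(emp.data)
```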
Example: Delete rows and columns
# Creating the data frame.
emp.data <- data.frame(
  employee_id = c(1:5),
  employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
  sal = c(623.3,515.2,611.0,729.0,843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                            "2015-03-27")),
  stringsAsFactors = FALSE
)
print(emp.data)
# Delete rows from data frame
emp.data <- emp.data[-1,]
print(emp.data)
# Delete column from the data frame
emp.data$starting_date <- NULL
print(emp.data)
Output
  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
  employee_id employee_name    sal starting_date
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
  employee_id employee_name    sal
2           2        Arpita 515.20
3           3        Nishka 611.00
4           4        Gunjan 729.00
5           5         Sumit 843.25

Factors
A factor is a data structure used for fields that take only a predefined, finite number of
values, that is, for categorical variables. Factors are data objects used to categorize data
and store it as levels. They can be created from both integer and string values, and are
useful for columns that have a limited number of unique values. Internally, a factor stores
integer codes, and each unique integer is associated with a label. The predefined set of
values is known as the levels, and by default R sorts the levels in alphabetical order.
Attributes of a factor
The factor() function accepts the following arguments:
a. x
It is the input vector which is to be transformed into a factor.
b. levels
It is an optional vector of the unique values that x might take.
c. labels
It is an optional character vector of labels for the levels, in the same order as the levels.
d. exclude
It is used to specify the values which we want to be excluded when the levels are formed.
e. ordered
It is a logical value which determines whether the levels are treated as ordered.
f. nmax
It is used to specify an upper bound on the number of levels.
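The arguments listed above can be combined in a single call. The sketch below uses a made-up vector of clothing sizes (not from the original notes) to show levels, labels, ordered, and exclude in action.

```r
# Hypothetical input data
sizes <- c("medium", "small", "large", "small", "medium")

# levels fixes the order, labels renames the levels,
# ordered = TRUE makes the levels comparable with < and >
size_factor <- factor(sizes,
                      levels = c("small", "medium", "large"),
                      labels = c("S", "M", "L"),
                      ordered = TRUE)
print(size_factor)

# The integer codes stored inside the factor
print(as.integer(size_factor))

# Comparison works because the factor is ordered
print(size_factor < "L")

# exclude drops a value from the levels; matching elements become NA
without_large <- factor(sizes, exclude = "large")
print(without_large)
```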

How to create a factor?


In R, it is quite simple to create a factor. A factor is created in two steps:
1. In the first step, we create a vector.
2. The next step is to convert the vector into a factor.

R provides the factor() function to convert a vector into a factor. The syntax of the
factor() function is:
factor_data <- factor(vector)
Let's see an example to understand how the factor() function is used.
Example
# Creating a vector as input.
data <- c("Shubham","Nishka","Arpita","Nishka","Shubham","Sumit","Nishka","Shubham",
          "Sumit","Arpita","Sumit")
print(data)
print(is.factor(data))

# Applying the factor function.
factor_data <- factor(data)

print(factor_data)
print(is.factor(factor_data))
Output
[1] "Shubham" "Nishka" "Arpita" "Nishka" "Shubham" "Sumit" "Nishka"
[8] "Shubham" "Sumit" "Arpita" "Sumit"
[1] FALSE
[1] Shubham Nishka Arpita Nishka Shubham Sumit Nishka Shubham Sumit
[10] Arpita Sumit
Levels: Arpita Nishka Shubham Sumit
[1] TRUE

Accessing components of factor


As with vectors, we can access the components of a factor, and the process is very similar.
We can access elements by index or by using a logical vector. Let's see an example that
shows the different ways of accessing the components.
Example
# Creating a vector as input.
data <- c("Shubham","Nishka","Arpita","Nishka","Shubham","Sumit","Nishka","Shubham",
          "Sumit","Arpita","Sumit")
# Applying the factor function.
factor_data <- factor(data)
# Printing all elements of factor
print(factor_data)
# Accessing 4th element of factor
print(factor_data[4])
# Accessing 5th and 7th element
print(factor_data[c(5,7)])
# Accessing all elements except 4th one
print(factor_data[-4])
# Accessing elements using logical vector
print(factor_data[c(TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE)])
Output
[1] Shubham Nishka Arpita Nishka Shubham Sumit Nishka Shubham Sumit
[10] Arpita Sumit
Levels: Arpita Nishka Shubham Sumit
[1] Nishka
Levels: Arpita Nishka Shubham Sumit
[1] Shubham Nishka
Levels: Arpita Nishka Shubham Sumit
[1] Shubham Nishka Arpita Shubham Sumit Nishka Shubham Sumit Arpita
[10] Sumit
Levels: Arpita Nishka Shubham Sumit
[1] Shubham Shubham Sumit Nishka Sumit
Levels: Arpita Nishka Shubham Sumit

Modification of factor
As with data frames, R allows us to modify a factor. We can modify a value of a factor by
simply re-assigning it. However, we cannot assign a value that lies outside the factor's
predefined levels: if the level is not present, the assignment produces NA with a warning.
To insert such a value, we first have to add it as a level, and then we can assign it. Let's
see an example to understand how the modification is done in factors.
Example
# Creating a vector as input.
data <- c("Shubham","Nishka","Arpita","Nishka","Shubham")
# Applying the factor function.
factor_data <- factor(data)
# Printing all elements of factor
print(factor_data)
# Change 4th element of factor to "Arpita"
factor_data[4] <- "Arpita"
print(factor_data)
# Change 4th element of factor to "Gunjan"
factor_data[4] <- "Gunjan" # cannot assign values outside levels
print(factor_data)

# Adding the value to the level
levels(factor_data) <- c(levels(factor_data),"Gunjan") # Adding new level
factor_data[4] <- "Gunjan"
print(factor_data)
Output
[1] Shubham Nishka Arpita Nishka Shubham
Levels: Arpita Nishka Shubham
[1] Shubham Nishka Arpita Arpita Shubham
Levels: Arpita Nishka Shubham
Warning message:
In `[<-.factor`(`*tmp*`, 4, value = "Gunjan") :
invalid factor level, NA generated
[1] Shubham Nishka Arpita <NA> Shubham
Levels: Arpita Nishka Shubham
[1] Shubham Nishka Arpita Gunjan Shubham
Levels: Arpita Nishka Shubham Gunjan

Factor in Data Frame


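When we create a data frame with a column of text data, R can treat this text column as
categorical data and create a factor from it. In versions of R before 4.0.0 this happened by
default; from R 4.0.0 onwards, text columns stay as character vectors unless we pass
stringsAsFactors = TRUE to data.frame().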
Example
# Creating the vectors for data frame.
height <- c(132,162,152,166,139,147,122)
weight <- c(40,49,48,40,67,52,53)
gender <- c("male","male","female","female","male","female","male")

# Creating the data frame.
# stringsAsFactors = TRUE converts the text column into a factor
# (this was the default before R 4.0.0).
input_data <- data.frame(height,weight,gender, stringsAsFactors = TRUE)
print(input_data)

# Testing if the gender column is a factor.
print(is.factor(input_data$gender))

# Printing the gender column to see the levels.
print(input_data$gender)
Output
height weight gender
1 132 40 male
2 162 49 male
3 152 48 female
4 166 40 female
5 139 67 male
6 147 52 female
7 122 53 male
[1] TRUE
[1] male male female female male female male
Levels: female male
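A character column can also be converted to a factor explicitly after the data frame has been created, which behaves the same way in every R version. A minimal sketch, assuming a shortened version of the input_data frame above:

```r
# A shortened version of the input_data example, kept as character data
input_data <- data.frame(
  height = c(132, 162, 152),
  gender = c("male", "male", "female"),
  stringsAsFactors = FALSE
)
was_factor <- is.factor(input_data$gender)
print(was_factor)

# Convert the column explicitly
input_data$gender <- factor(input_data$gender)
print(is.factor(input_data$gender))
print(levels(input_data$gender))

# Count how often each level occurs
gender_counts <- table(input_data$gender)
print(gender_counts)
```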
Changing order of the levels
In R, we can change the order of the levels in the factor with the help of the factor function.
Example
data <- c("Nishka","Gunjan","Shubham","Arpita","Arpita","Sumit","Gunjan","Shubham")
# Creating the factors
factor_data <- factor(data)
print(factor_data)
# Apply the factor function with the required order of the level.
new_order_factor <- factor(factor_data,
                           levels = c("Gunjan","Nishka","Arpita","Shubham","Sumit"))
print(new_order_factor)
Output
[1] Nishka Gunjan Shubham Arpita Arpita Sumit Gunjan Shubham
Levels: Arpita Gunjan Nishka Shubham Sumit
[1] Nishka Gunjan Shubham Arpita Arpita Sumit Gunjan Shubham
Levels: Gunjan Nishka Arpita Shubham Sumit
Generating Factor Levels
R provides the gl() function to generate factor levels. This function takes three main
arguments: n, k, and labels. Here, n and k are integers which indicate how many levels we
want and how many times each level is repeated. The syntax of the gl() function is as
follows:
gl(n, k, labels)
1. n indicates the number of levels.
2. k indicates the number of replications.
3. labels is a vector of labels for the resulting factor levels.
Example
gen_factor <- gl(3,5,labels=c("BCA","MCA","B.Tech"))
gen_factor
Output
[1] BCA BCA BCA BCA BCA MCA MCA MCA MCA MCA
[11] B.Tech B.Tech B.Tech B.Tech B.Tech
Levels: BCA MCA B.Tech

Non-numeric values

A non-numeric value is any form of data that is recorded in non-number (word) form. It is
made up of letters and symbols and can only be identified as text. Examples are an employee's
name or address.
Example
# Create a data frame
dataframe <- data.frame(students=c('Bhuwanesh', 'Anil', 'Suraj', 'Piyush', 'Dheeraj'),
                        section=c('A', 'A', 'C', 'C', 'B'),
                        minor=c(87, 98, 71, 89, 82),
                        major=c(80, 88, 84, 74, 70))
# Print the dataframe
print(dataframe)
# Try to calculate the mean of the values in the
# section column, which holds non-numeric data
mean(dataframe$section)
Output
[1] NA
Warning message:
In mean.default(dataframe$section) :
  argument is not numeric or logical: returning NA
How to avoid this warning?
The way to avoid this warning is to call the mean() function only on vectors that hold
numeric values. For example, in the data frame above we can calculate the mean of the minor
column, as it contains numerical values only:
# Create a data frame
dataframe <- data.frame(students=c('Bhuwanesh', 'Anil', 'Suraj', 'Piyush', 'Dheeraj'),
                        section=c('A', 'A', 'C', 'C', 'B'),
                        minor=c(87, 98, 71, 89, 82),
                        major=c(80, 88, 84, 74, 70))
print(mean(dataframe$minor))
Output
[1] 85.4
How to Fix: non-numeric argument to binary operator in R
The "non-numeric argument to binary operator" error occurs when we perform an arithmetic
operation on a non-numeric element.
How to produce this error
Here we take a string element and try to add it to a numeric element, which raises the
error.
num <- "2"
res <- num + 4
print(res)
Output:
Error in num + 4 : non-numeric argument to binary operator
How to solve it?
To solve this error, we convert the non-numeric data into numeric data using the
as.numeric() function.
Example 1: Converting a character value
We convert the character value "2" into a number with as.numeric() before adding to it.
num <- "2"
res <- as.numeric(num) + 3
print(res)
Output:
[1] 5
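as.numeric() can only convert text that actually looks like a number. The sketch below, using a made-up vector, shows that anything else becomes NA (R also raises a warning, which is suppressed here on purpose), and that na.rm = TRUE lets later calculations skip those entries.

```r
# Hypothetical input: two valid numbers and one non-numeric string
values <- c("10", "3.5", "abc")

# "abc" cannot be converted, so it becomes NA
nums <- suppressWarnings(as.numeric(values))
print(nums)

# Skip the NA when summing
total <- sum(nums, na.rm = TRUE)
print(total)
```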
Check if an Object is of Type Numeric in R Programming – is.numeric() Function
The is.numeric() function in R is used to check whether the object passed to it as an
argument is of numeric type.
Syntax: is.numeric(x)
Parameters:
x: Object to be checked
Example 1:
# R program to check if
# object is of numeric type
# Calling is.numeric() function
is.numeric(1)
is.numeric("Program")
is.numeric(-1.5)
Output:
[1] TRUE
[1] FALSE
[1] TRUE
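is.numeric() is handy for picking out the numeric columns of a data frame before calculating means, which avoids the warning shown earlier. A small sketch, assuming a two-row version of the students data frame; numeric_cols and col_means are illustrative names.

```r
# A two-row version of the students data frame
dataframe <- data.frame(
  students = c("Anil", "Suraj"),
  minor = c(98, 71),
  major = c(88, 84),
  stringsAsFactors = FALSE
)

# TRUE for each column that holds numeric data
numeric_cols <- sapply(dataframe, is.numeric)
print(numeric_cols)

# Column means of the numeric columns only
col_means <- colMeans(dataframe[, numeric_cols])
print(col_means)
```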

Special Values:
NA (Not Available): NA represents a missing value. It indicates the absence of a value and
is often used in data analysis to handle missing data points.
Ex:
v <- c(1, 2, 3)
v
length(v) <- 4
v
Output:
[1] 1 2 3
[1]  1  2  3 NA
NaN (Not a Number): NaN represents an undefined or unrepresentable result of a numerical
calculation. It is produced when a mathematical operation doesn't result in a valid numeric
value.
Ex:
0/0
Output:
[1] NaN
Inf and -Inf (Positive and Negative Infinity): Inf represents positive infinity and -Inf
represents negative infinity. They stand for values that are beyond the representable range.
Ex:
2^1024
-2^1024
Output:
[1] Inf
[1] -Inf
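Base R provides test functions for each of these special values. The sketch below applies them to one made-up vector; note that is.na() is also TRUE for NaN, while is.nan() is TRUE only for NaN.

```r
# A vector mixing ordinary and special values
v <- c(1, NA, 3, NaN, Inf)

print(is.na(v))        # TRUE for NA and for NaN
print(is.nan(v))       # TRUE only for NaN
print(is.finite(v))    # TRUE only for ordinary numbers
print(is.infinite(v))  # TRUE only for Inf and -Inf

# Most summary functions return NA if any value is missing,
# unless na.rm = TRUE is given
print(mean(c(1, NA, 3)))
clean_mean <- mean(c(1, NA, 3), na.rm = TRUE)
print(clean_mean)
```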

Classes in R Programming
A class is a blueprint or a sketch for objects. It represents the set of properties or
methods that are common to all objects of one type. Unlike most other programming
languages, R has three class systems. These are S3, S4, and Reference Classes.

Classes and Objects in R


A class is a user-defined data type from which objects are created. Classes are sometimes
referred to as blueprints, because they define the structure of objects. A class is a passive
entity whereas an object is an active entity. R treats everything as an object. An object is a
single instance of a class. R provides the class() function, which can be used either to set
the class of an object or to get the class of an already defined object. The class of an
object in R is stored as a character vector. Because of this, an object can inherit from
several classes, and the order of the names in the vector specifies the order of inheritance,
even for complex classes.
There are 3 types of Class:
S3 Class
S4 Class
Reference Class

S3 Class
S3 is the simplest yet the most popular OOP system in R. It lacks a formal definition and
structure: an object of an S3 class is created simply by adding a class attribute to it. The
following example makes this clearer:
Example:
# create a list with required components
movieList <- list(name = "Iron man", leadActor = "Robert Downey Jr")
# give a name to your class
class(movieList) <- "movie"
movieList
Output
$name
[1] "Iron man"

$leadActor
[1] "Robert Downey Jr"

attr(,"class")
[1] "movie"

In the S3 system, methods don't belong to the class. They belong to generic functions. This
means that we don't define methods inside the class, as we do in other programming languages
like C++ or Java. Instead, we define what a generic function (for example print()) does when
it is applied to our objects. Until we define such a method, print() uses its default
behaviour:

print(movieList)
Output:
$name
[1] "Iron man"

$leadActor
[1] "Robert Downey Jr"

attr(,"class")
[1] "movie"

Example: Creating a user-defined print function


# now let us write our method
print.movie <- function(obj, ...)
{
  cat("The name of the movie is", obj$name,".\n")
  cat(obj$leadActor, "is the lead actor.\n")
}
# print() now dispatches to print.movie for objects of class "movie"
print(movieList)
Output:
The name of the movie is Iron man .
Robert Downey Jr is the lead actor.
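Besides adding methods to existing generics such as print(), we can define our own generic function with UseMethod(). The sketch below is only an illustration: the constructor movie() and the generic describe() are hypothetical names, not part of base R or of the original notes.

```r
# Hypothetical constructor that builds an S3 object of class "movie"
movie <- function(name, leadActor) {
  structure(list(name = name, leadActor = leadActor), class = "movie")
}

# Hypothetical generic: dispatches on the class of its first argument
describe <- function(x, ...) UseMethod("describe")

# Method used for objects of class "movie"
describe.movie <- function(x, ...) {
  paste(x$name, "stars", x$leadActor)
}

# Fallback method used for every other class
describe.default <- function(x, ...) {
  "No description available"
}

m <- movie("Iron man", "Robert Downey Jr")
print(describe(m))
print(describe(42))
print(inherits(m, "movie"))
```

Using a constructor function keeps the class attribute and the list components in one place, which compensates a little for S3's lack of a formal class definition.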

S4 Class
Programmers coming from languages like C++ or Java might find S3 very different from their
usual idea of classes, as it lacks the structure that classes are supposed to provide. S4
improves on S3 in this respect: its classes have a formal definition, which gives a proper
structure to its objects.

Example:
library(methods)
# definition of S4 class
setClass("movies", slots=list(name="character", leadActor = "character"))
# creating an object using new() by passing class name and slot values
movieList <- new("movies", name="Iron man", leadActor = "Robert Downey Jr")
movieList
Output:
An object of class "movies"
Slot "name":
[1] "Iron man"
Slot "leadActor":
[1] "Robert Downey Jr"
As shown in the above example, setClass() is used to define a class and new() is used to
create the objects.
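The formal definition has practical consequences: slots are read with the @ operator, slot types are checked when an object is created, and methods are attached with setGeneric() and setMethod(). The sketch below reuses the "movies" class from the example; the generic name summaryLine and the variable bad are hypothetical names chosen for illustration.

```r
library(methods)

# Same class definition as in the example above
setClass("movies", slots = list(name = "character", leadActor = "character"))
movieList <- new("movies", name = "Iron man", leadActor = "Robert Downey Jr")

# Slots are accessed with @
print(movieList@name)

# Hypothetical S4 generic and a method for class "movies"
setGeneric("summaryLine", function(object) standardGeneric("summaryLine"))
setMethod("summaryLine", "movies", function(object) {
  paste(object@name, "stars", object@leadActor)
})
print(summaryLine(movieList))

# A slot value of the wrong type is rejected when the object is created
bad <- tryCatch(
  new("movies", name = 123, leadActor = "Nobody"),
  error = function(e) "type error"
)
print(bad)
```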

Reference Class
Reference Classes go a step further than S4 classes. Here the methods belong to the classes,
which makes them very similar to the object-oriented classes of other languages. Defining a
Reference class is similar to defining an S4 class. We use setRefClass() instead of
setClass() and "fields" instead of "slots".
Example:
library(methods)
# setRefClass returns a generator
movies <- setRefClass("movies", fields = list(name = "character",
                      leadActor = "character", rating = "numeric"))
# now we can use the generator to create objects
movieList <- movies(name = "Iron Man",
                    leadActor = "Robert downey Jr", rating = 7)
movieList
Output:
Reference class object of class "movies"
Field "name":
[1] "Iron Man"
Field "leadActor":
[1] "Robert downey Jr"
Field "rating":
[1] 7
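Two things set Reference Classes apart: methods are defined inside the class, and objects are modified in place rather than copied. The sketch below illustrates both under assumed names: the class "Movie" and its methods setRating() and describe() are hypothetical and not part of the original notes.

```r
library(methods)

# Hypothetical Reference Class with two fields and two methods
Movie <- setRefClass("Movie",
  fields = list(name = "character", rating = "numeric"),
  methods = list(
    # <<- assigns to the field of the object itself
    setRating = function(value) {
      rating <<- value
      invisible(.self)
    },
    describe = function() {
      paste(name, "has rating", rating)
    }
  )
)

m1 <- Movie$new(name = "Iron Man", rating = 7)

# m2 refers to the same object as m1, so a change through m2 is visible in m1
m2 <- m1
m2$setRating(8)
print(m1$rating)
print(m1$describe())

# copy() creates an independent object
m3 <- m1$copy()
m3$setRating(9)
print(m1$rating)
```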

Coercion
When you call a function with an argument of the wrong type, R will try to coerce values to a
different type so that the function will work. There are two types of coercion that occur
automatically in R: coercion with formal objects and coercion with built-in types.
Coercion includes type conversion. Type conversion means changing one type of data into
another type of data. Two types of coercion occur:
1. Implicit Coercion
2. Explicit Coercion
Implicit Coercion: Here the type conversion is done by R itself. For example, if we store
numeric and character data in the same vector, R converts the numeric data to character data
automatically. Implicit coercion also occurs when we operate on a vector in a way that is not
intended for its type. For example, if we add 1 to a logical vector, the logical values are
implicitly converted to 0s and 1s (FALSE becomes 0 and TRUE becomes 1), and 1 is added to
each element.
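The implicit coercion cases described above can be tried out directly. A short sketch with made-up vectors:

```r
# Logical values are coerced to 0 and 1 in arithmetic
flags <- c(TRUE, FALSE, TRUE)
print(flags + 1)
print(sum(flags))   # counts the TRUE values

# Mixing numbers, text, and logicals in one vector gives a character vector
mixed <- c(1, "a", TRUE)
print(mixed)
print(class(mixed))

# Mixing integer and double values gives a numeric (double) vector
print(class(c(1L, 2.5)))
```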

Explicit Coercion: In explicit coercion, we change one data type to another by applying a
conversion function. We create an object "x" which stores the integer values from 0 to 6.

x <- 0:6
We can check the data type of the "x" object. It shows the "integer" data type.
class(x)

We use as.numeric() to change the integer data type to the numeric data type.

z <- as.numeric(x)
We check the data type of z. It shows the "numeric" data type.

class(z)
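as.numeric() is one member of a family of as.*() conversion functions in base R. The sketch below shows a few others on made-up values; conversions that cannot succeed give NA.

```r
# Integer to character
chars <- as.character(0:2)
print(chars)

# Character to integer
n <- as.integer("42")
print(n)

# Double to integer drops the fractional part
truncated <- as.integer(3.9)
print(truncated)

# Character to logical; unrecognised text such as "yes" becomes NA
logicals <- as.logical(c("TRUE", "false", "yes"))
print(logicals)
```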

Basic plots
R has a number of built-in tools for basic graph types such as histograms, scatter plots,
bar charts, boxplots and much more. Rather than going through all of the different types at
once, we start with plot(), a generic function for plotting x-y data.
Plot
The plot() function is used to draw points (markers) in a diagram. The function takes
parameters for specifying points in the diagram.
Parameter 1 specifies points on the x-axis.
Parameter 2 specifies points on the y-axis.
At its simplest, you can use the plot() function to plot two numbers against each other. For
example, plot(1, 3) draws a single point at position 1 on the x-axis and position 3 on the
y-axis.
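The simplest uses of plot() described above can be sketched as follows; the coordinates are arbitrary example values.

```r
# A single point at x = 1, y = 3
plot(1, 3)

# Two points, (1, 3) and (8, 10): the first vector holds the
# x-coordinates and the second vector holds the y-coordinates
x <- c(1, 8)
y <- c(3, 10)
plot(x, y)
```

Both vectors must have the same length, since each point needs one x-value and one y-value.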
Graph plotting in R is of two types:

One-dimensional Plotting
In one-dimensional plotting, we plot one variable at a time. For example, we may plot a
variable against the number of times each of its values occurs in the entire dataset (its
frequency). The variable is not compared to any other variable of the dataset.

Two-dimensional Plotting
In two-dimensional plotting, we visualize and compare one variable with respect to the other.
For example, in a dataset of Air Quality measures, we would like to compare how the AQI
varies with the temperature at a particular place. So, temperature and AQI are two different
variables and we wish to see how one changes with respect to the other. The major kinds of
graphs used in R are described in the following sections.
R - Bar Charts
A bar chart represents data as rectangular bars, with the length of each bar proportional to
the value of the variable. R uses the function barplot() to create bar charts. R can draw both
vertical and horizontal bars, and each bar can be given a different color.
Syntax
The basic syntax to create a bar chart in R is –

barplot(H, xlab, ylab, main, names.arg, col)

Following is the description of the parameters used –

H is a vector or matrix containing the numeric values used in the bar chart.
xlab is the label for the x axis.
ylab is the label for the y axis.
main is the title of the bar chart.
names.arg is a vector of names appearing under each bar.
col is used to give colors to the bars in the graph.
Example
A simple bar chart is created using just the input vector and the name of each bar.
# Create the data for the chart
A <- c(17, 2, 8, 13, 1, 22)
B <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun")
# Plot the bar chart
barplot(A, names.arg = B, xlab ="Month",
        ylab ="Articles", col ="green",
        main ="GeeksforGeeks-Article chart")
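The text above mentions horizontal bars and a different color for each bar. A possible way to get both, reusing the same data, is sketched below; horiz, las, and rainbow() are standard base R features, while the title text is made up.

```r
# Same data as in the example above
A <- c(17, 2, 8, 13, 1, 22)
B <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun")

# horiz = TRUE draws horizontal bars, so the values run along the x axis;
# col takes one color per bar; las = 1 keeps the axis labels horizontal
mids <- barplot(A, names.arg = B, horiz = TRUE,
                col = rainbow(length(A)), las = 1,
                xlab = "Articles", ylab = "Month",
                main = "Articles per month")

# barplot() returns the midpoint of each bar, useful for adding text later
print(mids)
```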

Histograms in R language
A histogram is a graphical representation that groups data points into specified ranges
(bins) over successive numerical intervals. Each bin is drawn as a rectangle whose height is
proportional to the frequency of the values that fall into it. A histogram is similar to a
vertical bar graph, but it shows no gaps between the bars.
R – Histograms
We can create histograms in R Programming Language using the hist() function.
Syntax: hist(v, main, xlab, xlim, ylim, breaks, col, border)
Parameters:
v: This parameter contains the numerical values used in the histogram.
main: This parameter is the title of the chart.
col: This parameter is used to set the color of the bars.
xlab: This parameter is the label for the horizontal axis.
border: This parameter is used to set the border color of each bar.
xlim: This parameter specifies the range of values on the x-axis.
ylim: This parameter specifies the range of values on the y-axis.
breaks: This parameter controls how the values are divided into bins, and therefore the
width of each bar.
Creating a simple Histogram in R
We create a simple histogram using the above parameters. The vector v is plotted using
hist().
Example:
# Create data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)
# Create the histogram.
hist(v, xlab = "No. of Articles", col = "green", border = "black")
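To see what the breaks parameter does, we can pass explicit bin boundaries and inspect the object that hist() returns. In the sketch below the boundaries 0, 10, 20, 30, 40 are an arbitrary choice for this data.

```r
# Same data as in the example above
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# Explicit bin boundaries: (0,10], (10,20], (20,30], (30,40]
h <- hist(v, breaks = seq(0, 40, by = 10),
          xlab = "No. of Articles", col = "green", border = "black",
          main = "Histogram with explicit breaks")

# The returned object holds the boundaries and the count in each bin
print(h$breaks)
print(h$counts)
```

The counts always add up to the number of values in v, whichever breaks are chosen (as long as they cover the whole range of the data).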

Boxplots in R Language
A boxplot is a chart that displays the distribution of a data set; for grouped data, one box
is drawn for each group. The distribution is summarised by five values (minimum, first
quartile, median, third quartile, and maximum).
Boxplots in R Programming Language
Boxplots are created in R by using the boxplot() function.
Syntax: boxplot(x, data, notch, varwidth, names, main)
Parameters:
x: This parameter is a vector or a formula.
data: This parameter is the data frame from which the variables in the formula are taken.
notch: This parameter is a logical value. Set it to TRUE to draw a notch in each side of the
boxes.
varwidth: This parameter is a logical value. Set it to TRUE to draw the widths of the boxes
proportional to the square roots of the sample sizes.
main: This parameter is the title of the chart.
names: This parameter gives the group labels that are shown under each boxplot.
Creating a Dataset
We use the data set "mtcars".
Let's look at the columns "mpg" and "cyl" in mtcars.
input <- mtcars[, c('mpg', 'cyl')]
print(head(input))
Output:
                   mpg cyl
Mazda RX4         21.0   6
Mazda RX4 Wag     21.0   6
Datsun 710        22.8   4
Hornet 4 Drive    21.4   6
Hornet Sportabout 18.7   8
Valiant           18.1   6

Creating the Boxplot


We take the parameters which are required to make a boxplot. Now we draw a graph for the
relation between "mpg" and "cyl".
# Load the dataset
data(mtcars)
# Create the box plot of mpg grouped by cyl
boxplot(mpg ~ cyl, data = mtcars,
        main = "Mileage by Number of Cylinders",
        xlab = "Number of Cylinders",
        ylab = "Miles Per Gallon")
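The five values that a boxplot is built from can also be computed without drawing anything. The sketch below uses fivenum() for the whole mpg column and boxplot() with plot = FALSE for mpg grouped by cyl; the variable names five and stats are illustrative.

```r
# Load the dataset
data(mtcars)

# Minimum, lower hinge, median, upper hinge, maximum of mpg
five <- fivenum(mtcars$mpg)
print(five)

# plot = FALSE returns the statistics for each group instead of drawing
stats <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
print(stats$names)   # group labels
print(stats$n)       # number of observations per group
print(stats$stats)   # one column of five values per group
```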

Scatter Plot:
Scatterplots are useful for visualizing the relationship between two variables and the
distribution of data points, and for identifying patterns, clusters or outliers.

x<-c(1, 2, 3, 4, 5)
y<-c(10, 8, 15, 7, 12)
plot(x, y, type="p", pch=19, col="blue", main="Scatter Plot", xlab="X-Axis", ylab="Y-Axis")

Line Plot:
A line plot in R is used to display data points connected by lines. It's a useful visualization
for showing trends and changes in data over time or across a continuous variable.
plot(x, y, type="l", lwd=2, col="red", main="Line Plot", xlab="X-Axis", ylab="Y-Axis")

Pie Chart
A pie chart is a circular statistical graphic, which is divided into slices to illustrate
numerical proportion. Pie charts represent data visually as fractional parts of a whole, which
can be an effective communication tool.
expenditure <- c(600, 300, 150, 100, 200)
pie(expenditure, main = "Monthly Expenditure Breakdown",
    labels = c("Housing", "Food", "Clothes", "Entertainment", "Other"))
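Since a pie chart is about proportions, it often helps to show each slice's share in its label. A possible sketch, reusing the expenditure data; categories, percent, and slice_labels are illustrative names.

```r
expenditure <- c(600, 300, 150, 100, 200)
categories <- c("Housing", "Food", "Clothes", "Entertainment", "Other")

# Share of each category in percent, rounded to one decimal place
percent <- round(100 * expenditure / sum(expenditure), 1)

# Labels such as "Housing (44.4%)"
slice_labels <- paste0(categories, " (", percent, "%)")
print(slice_labels)

pie(expenditure, labels = slice_labels,
    col = rainbow(length(expenditure)),
    main = "Monthly Expenditure Breakdown")
```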
