R-Programming Unit 1 & 2 Compiled
Introduction to R
R has six atomic vector types, also called the six classes of vectors:
logical, integer, double (numeric), complex, character, and raw. All other
R objects are built upon these atomic vectors.
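As a quick illustration, the sketch below builds one value of each atomic type and asks R for its storage type with typeof(). The variable names examples and types are made up for this example.

```r
# One example value for each of the six atomic vector types
examples <- list(
  logical   = TRUE,
  integer   = 5L,
  double    = 3.14,
  complex   = 2 + 3i,
  character = "hello",
  raw       = charToRaw("A")
)

# typeof() reports the underlying storage type of each value
types <- sapply(examples, typeof)
print(types)
```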
Basic Functions in R console
Creating an R file
Saving an R file
Clearing the Console: The console can be cleared using the shortcut key “Ctrl + L”.
Execution of an R file:
There are several ways to execute the commands in an R file.
Using the run command: The “run” command can be executed from the GUI by
pressing the Run button, or with the shortcut key Ctrl + Enter.
What does it do? It executes the line on which the cursor is placed.
Using the source with echo command: The “source with echo” command can be
executed from the GUI by pressing the Source with Echo button, or with the
shortcut key Ctrl + Shift + Enter.
What does it do? It runs the whole file and prints each command along with
the output it produces.
Clearing the Environment: Variables in the R environment can be
cleared in two ways:
Using rm() command: To clear a single variable from the R environment,
use the rm() command with the name of the variable you want to remove.
Typing rm(variable) deletes that variable.
To delete all the variables in the environment, call rm() with the
argument list set to ls(), that is, rm(list = ls()).
Using the GUI: All the variables in the environment can also be cleared
from the environment pane by clicking the brush button.
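A minimal sketch of both uses of rm(). A scratch environment (the name scratch and the variables height and weight are made up) stands in for the global workspace, so running the example does not wipe your own variables.

```r
# A scratch environment stands in for the global workspace here
scratch <- new.env()
assign("height", 170, envir = scratch)
assign("weight", 65, envir = scratch)

# Remove a single variable by name
rm("height", envir = scratch)
remaining <- ls(scratch)
print(remaining)

# Remove everything that is left;
# at the console the equivalent call is rm(list = ls())
rm(list = ls(scratch), envir = scratch)
after_clear <- ls(scratch)
print(after_clear)
```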
Run command over Source command:
Like most other languages, R lets you assign values to variables and refer to
them by name.
In R, the assignment operator is <-. Usually, this is pronounced as “gets.” For
example, the statement:
x <- 1
is usually read as “x gets 1.”
After you assign a value to a variable, the R interpreter will substitute that
value in place of the variable name when it evaluates an expression.
Comments in R
Keywords are words reserved by the language because they have a special
meaning, so a keyword can’t be used as a variable name, function name, etc.
We can view these keywords by using either help(reserved) or ?reserved.
Here’s a simple example:
> x <- 1
> y <- 2
> z <- c(x,y)
> z
[1] 1 2
Notice that the substitution is done at the time that the value is
assigned to z, not at the time that z is evaluated.
Suppose that you were to type in the preceding three expressions
and then change the value of y. The value of z would not change:
> y <- 4
> z
[1] 1 2
Now try it yourself: if you assign a new value to z, does the previous
one still show up?
R provides several different ways to refer to a member (or set of
members) of a vector.
You can refer to elements by location in a vector:
> b <- c(1,2,3,4,5,6,7,8,9,10,11,12)
> b
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> b[7]
[1] 7
> b[1:6]
[1] 1 2 3 4 5 6
> b[b %% 3 == 0]
[1] 3 6 9 12
Exercise
That’s the basic interface for executing R code in RStudio. Try doing these
simple tasks. If you execute everything correctly, you should end up with
the same number that you started with:
Choose any number and add 2 to it.
Multiply the result by 3.
Subtract 6 from the answer.
Divide what you get by 3.
Variables can be assigned values using the leftward (<-), rightward (->)
and equal to (=) operators. The values of the variables can be printed
using the print() or cat() function.
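A short sketch of the three assignment forms and the two printing functions. The names var.1 and var.2 and all the values are chosen only for illustration.

```r
# Three ways to assign a value
var.1 = c(0, 1, 2, 3)        # equal to operator
var.2 <- c("learn", "R")     # leftward operator
c(TRUE, 1) -> var.3          # rightward operator (TRUE is coerced to 1)

# print() shows the value with its index prefix
print(var.1)

# cat() concatenates its arguments and writes them as plain text
cat("var.1 is", var.1, "\n")
cat("var.2 is", var.2, "\n")
cat("var.3 is", var.3, "\n")
```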
Variables can be deleted by using the rm() function. Let us delete the
variable var.3.
Printing the value of the variable afterwards throws an error, because the
variable no longer exists.
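The sketch below reproduces this behaviour. The value given to var.3 is an assumption, and tryCatch() is used only so that the script captures the error message instead of stopping.

```r
var.3 <- c(1, 1)

# Delete the variable
rm(var.3)

# Accessing it now fails; tryCatch() captures the error message
msg <- tryCatch(
  {
    print(var.3)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```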
Basic Operations in R
When you perform an operation on two vectors, R will match the elements of
the two vectors pairwise and return a vector. For example:
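A minimal sketch of pairwise operations, with made-up vectors a and b:

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)

# Each operation works element by element
sum_ab  <- a + b    # 11 22 33
prod_ab <- a * b    # 10 40 90
gt      <- b > a    # TRUE TRUE TRUE

print(sum_ab)
print(prod_ab)
print(gt)
```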
When you want to create a vector with more than one element,
use the c() function, which combines the elements into a vector.
apple <- c('red','green',"yellow")
apple
class(apple)
When we execute the above code, it produces the following result:
[1] "red"    "green"  "yellow"
[1] "character"
Types of vectors
Vectors of several different types are used in R. Following are some
of the types of vectors:
Numeric vectors: Numeric vectors contain numeric values, stored either
as integer or as double (floating point).
Character vectors: Character vectors contain text values, which may
include letters, digits and special characters.
Logical vectors: Logical vectors contain the boolean values TRUE and
FALSE, plus NA for missing values.
Creating a vector
There are different ways of creating vectors. Generally, we use ‘c’ to
combine different elements together.
# we can use the c function
# to combine the values as a vector.
# By default the type will be double
X <- c(61, 4, 21, 67, 89, 2)
cat('using c function', X, '\n')

# seq() function for creating
# a sequence of continuous values.
# length.out defines the length of vector.
Y <- seq(1, 10, length.out = 5)
cat('using seq() function', Y, '\n')

# use ':' to create a vector
# of continuous values.
Z <- 2:7
cat('using colon', Z)
Accessing vector elements
# R program to access elements of a Vector
print(vec)
R List to matrix
We will create matrices using matrix() function in R programming.
Another function that will be used is unlist() function to convert the
lists into a vector.
# Defining list
lst1 <- list(list(1, 2, 3),
             list(4, 5, 6))

# Print list
cat("The list is:\n")
print(lst1)
cat("Class:", class(lst1), "\n")

# Convert list to matrix
mat <- matrix(unlist(lst1), nrow = 2, byrow = TRUE)

# Print matrix
cat("\nAfter conversion to matrix:\n")
print(mat)
cat("Class:", class(mat), "\n")
Matrices
A diagonal matrix is a matrix in which the entries outside the main
diagonal are all zero. To create such a matrix the syntax is given below:
Syntax: diag(k, m, n)
Parameters:
k: the constants/array
m: no of rows
n: no of columns
# R program to illustrate
# special matrices

# Diagonal matrix having 3 rows and 3 columns
# filled by array of elements (5, 3, 3)
print(diag(c(5, 3, 3), 3, 3))
Identity matrix:
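An identity matrix is a diagonal matrix whose diagonal entries are all 1. A minimal sketch, assuming it is built with the same diag() function; the names I3 and M are made up for illustration.

```r
# Identity matrix with 3 rows and 3 columns:
# ones on the main diagonal, zeros elsewhere
I3 <- diag(1, 3, 3)
print(I3)

# Multiplying any 3x3 matrix by the identity leaves it unchanged
M <- matrix(1:9, nrow = 3, byrow = TRUE)
same <- all(M %*% I3 == M)
print(same)
```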
# Accessing 2
print(A[1, 2])
# Accessing 6
print(A[2, 3])
Accessing Submatrices:
# R program to illustrate
# access submatrices in a matrix
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)
In R you can modify the elements of the matrices by a direct assignment.
Concatenation of a column:
# A is a 3x3 matrix filled by row, as above
cat("The 3x3 matrix:\n")
print(A)
# 2nd-column deletion
A = A[, -2]
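The operations named in this part (taking a submatrix, modifying an element, concatenating a column, deleting a row or column) can be sketched as follows. The matrix values follow the 3x3 example; the added column and the replacement value 30 are assumptions.

```r
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9),
            nrow = 3, ncol = 3, byrow = TRUE)

# Submatrix: rows 1-2, columns 2-3
sub <- A[1:2, 2:3]
print(sub)

# Modify one element by direct assignment
A[3, 3] <- 30

# Concatenate a new column with cbind()
B <- cbind(A, c(10, 11, 12))
print(B)

# A negative index drops that column or row
no_col2 <- A[, -2]   # 2nd column deleted
no_row2 <- A[-2, ]   # 2nd row deleted
```

Note that the position of the negative index matters: before the comma it removes a row, after the comma a column.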
Matrix addition (+):
Commutative: B + C = C + B
Associative: A + (B + C) = (A + B) + C
The matrices involved must have the same order (dimensions).
Matrix subtraction (-):
Non-Commutative: B - C != C - B
Non-Associative: A - (B - C) != (A - B) - C
The matrices involved must have the same order.
Matrix multiplication (*, element by element):
Commutative: B * C = C * B
Associative: A * (B * C) = (A * B) * C
The matrices involved must have the same order.
Matrix division (/, element by element):
Non-Commutative: B / C != C / B
Non-Associative: A / (B / C) != (A / B) / C
The matrices involved must have the same order.
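A minimal sketch of the four operations on two matrices of the same order, with the commutativity claims checked directly. The matrices B and C hold made-up values; note that * and / work element by element, which is different from the matrix product %*%.

```r
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
C <- matrix(c(6, 5, 4, 3, 2, 1), nrow = 2, ncol = 3)

# Element-wise arithmetic on two matrices of the same order
print(B + C)
print(B - C)
print(B * C)
print(B / C)

# Check the commutativity properties
add_commutes <- all(B + C == C + B)   # TRUE
mul_commutes <- all(B * C == C * B)   # TRUE
sub_commutes <- all(B - C == C - B)   # FALSE
div_commutes <- all(B / C == C / B)   # FALSE
```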
Arrays
Arrays are data storage structures defined by a fixed number of dimensions.
Their elements are stored at contiguous memory locations. One-dimensional
arrays are called vectors, with the length being their only dimension.
Two-dimensional arrays are called matrices, consisting of fixed numbers of
rows and columns. All elements of an array have the same data type. Vectors
are supplied as input to the array() function, which creates an array based
on the number of dimensions.
While matrices are confined to two dimensions, arrays can have any number of
dimensions. The array() function takes a dim argument that sets the size of
each dimension. In the example below we create an array with two elements,
each of which is a 3x3 matrix.
a <- array(c('green','yellow'),dim=c(3,3,2))
print(a)
Creating an Array
# accessing elements
cat("Third element of vector is : ", vec[3])
Accessing entire matrices
vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vec2 <- c(10, 11, 12)
row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")
arr = array(c(vec1, vec2), dim = c(2, 3, 2),
dimnames = list(row_names,
col_names,
mat_names))
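With the array defined this way, a whole matrix can be pulled out by leaving the row and column positions empty and giving only the third index. A minimal sketch that rebuilds the same array; the result names mat1, mat2 and elem are made up.

```r
vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vec2 <- c(10, 11, 12)
arr <- array(c(vec1, vec2), dim = c(2, 3, 2),
             dimnames = list(c("row1", "row2"),
                             c("col1", "col2", "col3"),
                             c("Mat1", "Mat2")))

# Whole matrix by position (the third index selects the matrix)
mat1 <- arr[, , 1]

# Whole matrix by name
mat2 <- arr[, , "Mat2"]

# A single element: row, column, matrix
elem <- arr["row1", "col2", "Mat2"]

print(mat1)
print(mat2)
print(elem)
```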
Factors are R objects created from a vector. A factor stores the vector
along with the distinct values of its elements as labels.
The labels are always character, irrespective of whether the input vector
is numeric, character, Boolean, etc. Factors are useful in statistical
modeling.
Factors are created using the factor() function. The nlevels() function gives
the count of levels.
apple_colors <- c('green','green','yellow','red','red','red','green')
factor_apple <- factor(apple_colors)
factor_apple
nlevels(factor_apple)
Factors in R Programming Language are data structures used to
represent categorical data and store it as levels.
They are stored as integers, with a corresponding label for
every unique integer. Though factors may look similar to
character vectors, they are integers, and care must be taken
while using them as strings. A factor accepts only a
restricted number of distinct values. For example, a data
field such as gender may contain values only from female,
male, or transgender.
In the above example, all the possible cases are
known beforehand and are predefined. These distinct
values are known as levels. After a factor is created, it
consists only of these levels, which are sorted
alphabetically by default.
Creating a Factor in R Programming Language
# Creating a vector
x <- c("female", "male", "male", "female")
print(x)
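The vector above is still a plain character vector. A minimal sketch of the conversion step, assuming the factor is created with factor() as described earlier; the name gender is chosen for illustration.

```r
x <- c("female", "male", "male", "female")

# Convert the character vector into a factor
gender <- factor(x)
print(gender)

# The distinct values become the levels
print(levels(gender))

# is.factor() confirms the type
print(is.factor(gender))
```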
After a factor is formed, its components can be modified, but the new values
that are assigned must be among the predefined levels.
gender <- factor(c("female", "male", "male", "female"))
gender[2] <- "female"
gender
To select all the elements of the factor gender except the ith element,
use gender[-i]. If you want to modify a factor and add a value outside
the predefined levels, first modify the levels.
gender <- factor(c("female", "male", "male", "female"))
# add new level
levels(gender) <- c(levels(gender), "other")
gender[3] <- "other"
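To see why the levels have to be extended first, the sketch below tries the assignment before and after adding the level. Assigning a value that is not a level stores NA and raises a warning; suppressWarnings() is used here only to keep the output quiet. The names third_before and third_after are made up.

```r
gender <- factor(c("female", "male", "male", "female"))

# "other" is not a level yet, so R stores NA (and would warn)
suppressWarnings(gender[3] <- "other")
third_before <- is.na(gender[3])
print(third_before)

# After extending the levels, the same assignment succeeds
levels(gender) <- c(levels(gender), "other")
gender[3] <- "other"
third_after <- as.character(gender[3])
print(third_after)
```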
Factors in Data Frame
Data frames are tabular data objects. Unlike in a matrix, each column of a
data frame can contain a different mode of data: the first column can be
numeric while the second column is character and the third is logical. A data
frame is a list of vectors of equal length. Data frames are created using the
data.frame() function.
BMI <- data.frame( gender = c("Male", "Male","Female"),
height = c(152, 171.5, 165),
weight = c(81,93, 78),
Age =c(42,38,26)
)
print(BMI)
Create Dataframe in R Programming Language
To create a data frame in R, use the data.frame() command and then pass each
of the vectors you have created as arguments to the function.
# R program to create dataframe

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# print the data frame
print(friend.data)
Get the Structure of the R – Data Frame
One can get the structure of the data frame using the str() function in R. It
can display even the internal structure of large lists which are nested. It
provides one-line output for the basic R objects, letting the user know
about the object and its constituents.
# R program to get the
# structure of the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# using str(); it prints the structure itself and returns NULL,
# so it is not wrapped in print()
str(friend.data)
Summary of data in the data frame
In R data frame, the statistical summary and nature of the data can be
obtained by applying summary() function. It is a generic function used to
produce result summaries of the results of various model fitting functions.
The function invokes particular methods which depend on the class of the
first argument.
# R program to get the
# summary of the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# using summary()
print(summary(friend.data))
Extract Data from Data Frame in R Language
# R program to extract
# data from the data frame
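The rest of this program did not survive, so the following is only a sketch of common ways to extract data, reusing the friend.data frame from the earlier examples. The result names are made up.

```r
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

# Extract one column as a vector with $
names_vec <- friend.data$friend_name

# Extract a column by name and keep the data frame shape
name_col <- friend.data[, "friend_name", drop = FALSE]

# Extract rows 1 to 3
first_three <- friend.data[1:3, ]

# Extract a single cell: row 2, column "friend_name"
second_name <- friend.data[2, "friend_name"]

print(names_vec)
print(first_three)
print(second_name)
```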
Syntax:
merge(df1, df2, by.x, by.y, all.x, all.y, sort = TRUE)
Parameters:
df1: one dataframe
df2: another dataframe
by.x, by.y: The names of the key columns in df1 and df2 respectively. Use
by instead when the key column has the same name in both.
all, all.x, all.y: Logical values that specify the type of merge, that is,
whether unmatched rows of df1 (all.x), of df2 (all.y), or of both (all) are kept.
First of all, we will create two dataframes that will
help us to understand each join easily.
An inner join keeps only those rows that are matched in both
data frames; for this we specify the argument all = FALSE, which is
the default. In terms of set theory, we are performing the
intersection operation.
For example:
A = [1, 2, 3, 4, 5], B = [2, 3, 5, 6]. Then the output of the inner
(natural) join will be (2, 3, 5).
It is the simplest and most common type of join available in R.
Now let us try to understand this using R program:
# Joining of dataframes
df = merge(x = df1, y = df2, by = "StudentId")
df
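The two data frames used by this call are not shown, so here is a runnable sketch with made-up ones. Only the key column name StudentId comes from the text; the Subject and City columns and all values are assumptions, chosen so that the keys mirror the sets A and B from the set-theory example.

```r
# Two small data frames sharing the key column StudentId
df1 <- data.frame(StudentId = c(1, 2, 3, 4, 5),
                  Subject = c("Maths", "Physics", "Biology", "History", "Art"))
df2 <- data.frame(StudentId = c(2, 3, 5, 6),
                  City = c("Pune", "Delhi", "Mumbai", "Chennai"))

# Inner join: all = FALSE is the default, so only matching keys are kept
inner <- merge(x = df1, y = df2, by = "StudentId")
print(inner)
```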
Left Outer Join
This join is like an inner join, except that every row of the left
data frame is kept; where the right data frame has no matching row, its
columns are filled with NA. Now let us try to understand this using R program:
# R program to illustrate
# Joining of dataframes
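A hedged sketch of a left outer join with merge(), using the all.x argument. The data frames and their Subject and City columns are made up for illustration; only the key name StudentId follows the earlier example.

```r
df1 <- data.frame(StudentId = c(1, 2, 3, 4, 5),
                  Subject = c("Maths", "Physics", "Biology", "History", "Art"))
df2 <- data.frame(StudentId = c(2, 3, 5, 6),
                  City = c("Pune", "Delhi", "Mumbai", "Chennai"))

# Left outer join: all.x = TRUE keeps every row of df1
left <- merge(x = df1, y = df2, by = "StudentId", all.x = TRUE)
print(left)

# Students 1 and 4 have no match in df2, so their City is NA
unmatched <- left$StudentId[is.na(left$City)]
print(unmatched)
```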
Taking multiple inputs in R language is the same as taking a single input; you
just need to define multiple readline() calls, one per input. You can also use
braces to group multiple readline() calls.
var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
or,
{
var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
}
character:
var1 = readline(prompt = "Enter any character : ");
var1 = as.character(var1)
# string input
var1 = readline(prompt = "Enter your name : ");
# character input
var2 = readline(prompt = "Enter any character : ");
# convert to character
var2 = as.character(var2)
# printing values
print(var1)
print(var2)
Using scan() method
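A minimal sketch of scan(). At the console, scan() with no arguments reads values typed by the user until a blank line is entered; here the text argument stands in for the typed input so that the example runs without a keyboard. The input strings are made up.

```r
# Read numbers; text = "..." replaces what the user would type
nums <- scan(text = "10 20 30", quiet = TRUE)
print(nums)

# what = character() reads strings instead of numbers
words <- scan(text = "red green blue", what = character(), quiet = TRUE)
print(words)
```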
y <- -1
if(y > 0){ print("Positive number") }
z <- c(x,y)
if(z > 0){ print("Positive number") }
m <- c(y,x)
if(m > 0){ print("Positive number") }
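An if() condition must be a single TRUE or FALSE. With a vector such as z or m, older versions of R used only the first element and gave a warning, while R 4.2.0 and later stop with an error. A sketch of the usual workaround with all() and any(); the value of x is hypothetical, since x is not defined in the fragment above.

```r
x <- 5      # hypothetical value; x is not defined in the text above
y <- -1
z <- c(x, y)

# all(): TRUE only when every element is positive
if (all(z > 0)) {
  result_all <- "All positive"
} else {
  result_all <- "Not all positive"
}

# any(): TRUE when at least one element is positive
if (any(z > 0)) {
  result_any <- "At least one positive"
} else {
  result_any <- "None positive"
}

print(result_all)
print(result_any)
```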
if…else statement
The syntax of if…else statement is:
if (test_expression) {
statement1
} else {
statement2
}
x <- -5
y <- if(x > 0) 5 else 6
y
if…else Ladder
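An if…else ladder chains several conditions with else if. A minimal sketch with a made-up variable x and made-up messages:

```r
x <- 0

# Conditions are tested from top to bottom;
# the first one that is TRUE decides which branch runs
if (x > 0) {
  sign_label <- "Positive number"
} else if (x < 0) {
  sign_label <- "Negative number"
} else {
  sign_label <- "Zero"
}
print(sign_label)
```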
fertilizer
[1] K K none N P P N N none P K none
Levels: K N none P
First, it is clear that R is no longer treating the
elements of the factor as strings of characters, given
the absence of double quotes. Second (and more
importantly), additional information in the form of
"Levels: K N none P" is given. The levels shown
correspond to the unique values seen in the
vector f (i.e., the categories that represent the
treatment groups).
There are other differences between a vector and a factor, which we
can see if we use the str(x) function. This function in R displays a
compact representation of the internal structure of any R variable. Let's
see what happens when we apply it to both f and fertilizer:
str(f)
# chr [1:12] "K" "K" "none" "N" "P" "P" "N" "N" "none" "P" "K" "none"
str(fertilizer)
# Factor w/ 4 levels "K","N","none",..: 1 1 3 2 4 4 2 2 3 4 ...
Note how in the factor fertilizer, the levels "K", "N", "none", and "P" are
replaced by numbers 1, 2, 3, and 4, respectively. So internally, R only
stores the numbers (indicating the level of each vector element) and
(separately) the names of each unique level. (Interestingly, even if the
vector's elements had been numerical, the levels are stored as strings
of text.)
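The internal storage can be inspected directly. The sketch below rebuilds f and fertilizer from the values printed earlier, extracts the integer codes with as.integer(), and shows that looking the codes up in the levels recovers the original strings. The names codes and recovered are made up.

```r
f <- c("K", "K", "none", "N", "P", "P", "N", "N", "none", "P", "K", "none")
fertilizer <- factor(f)

# The integer codes stored for each element
codes <- as.integer(fertilizer)
print(codes)

# The level names, stored once as character strings
print(levels(fertilizer))

# Looking the codes up in the levels recovers the original strings
recovered <- levels(fertilizer)[codes]
print(identical(recovered, f))
```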
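The way R internally stores factors is important when we want
to combine them. Consider the following failed attempt to
combine factors a.fac and b.fac. (The output shown is from R versions
before 4.1.0, where c() returns the integer codes; from R 4.1.0 onward,
c() combines factors into a factor with the union of their levels.)
a.fac = factor(c("X","Y","Z","X"))
b.fac = factor(c("X","X","Y","Y","Z"))
factor(c(a.fac,b.fac))
# [1] 1 2 3 1 1 1 2 2 3
# Levels: 1 2 3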
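One way to combine the two factors that does not depend on the R version is to convert each back to character first. This is a sketch of that approach, not necessarily the fix the original notes went on to show; the name combined is made up.

```r
a.fac <- factor(c("X", "Y", "Z", "X"))
b.fac <- factor(c("X", "X", "Y", "Y", "Z"))

# Convert each factor to character, combine, then re-create the factor
combined <- factor(c(as.character(a.fac), as.character(b.fac)))
print(combined)
```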
Create a Vector X with these values: 1.1, 2, 3.0, 5.7 and repeat the
exercise.
Create a Vector Y with these values: 7,NA,9,8,NA,75,NA,65 and
repeat the exercise.
Make two vectors A and B, A being a sequence of range 1:10 and B
in the range of 10:15 sorted in decreasing order. Calculate A+B, A-B,
B-A, A*B, A/B, B/A, A^B, B^A, and the remainder and quotient for A/B
and B/A.
Make two vectors C and D, C being a sequence of range 11:20 and
D in the range of 5:14 sorted in decreasing order. Repeat the
Matrix exercise