R Command Cheatsheet
by Tom Short, EPRI PEAC, 2004-11-07
Granted to the public domain. See www.Rpad.org for the source and latest version. Includes material from R for Beginners by Emmanuel Paradis (with permission).

Getting help
Most R functions have online documentation.
help(topic) documentation on topic
?topic id.
help.search("topic") search the help system
apropos("topic") the names of all objects in the search list matching the regular expression "topic"
help.start() start the HTML version of help
str(a) display the internal *str*ucture of an R object
summary(a) gives a "summary" of a, usually a statistical summary, but it is generic, meaning it has different operations for different classes of a
ls() show objects in the search path; specify pat="pat" to search on a pattern
ls.str() str() for each variable in the search path
dir() show files in the current directory
methods(a) shows S3 methods of a
methods(class=class(a)) lists all the methods to handle objects of class a

Input and output
load() load the datasets written with save
data(x) loads specified data sets
library(x) load add-on packages
read.table(file) reads a file in table format and creates a data frame from it; the default separator sep="" is any whitespace; use header=TRUE to read the first line as a header of column names; use as.is=TRUE to prevent character vectors from being converted to factors; use comment.char="" to prevent "#" from being interpreted as a comment; use skip=n to skip n lines before reading data; see the help for options on row naming, NA treatment, and others
read.csv("filename",header=TRUE) id. but with defaults set for reading comma-delimited files
read.delim("filename",header=TRUE) id. but with defaults set for reading tab-delimited files
read.fwf(file,widths,header=FALSE,sep="\t",as.is=FALSE) read a table of fixed-width formatted data into a data frame; widths is an integer vector giving the widths of the fixed-width fields
save(file,...) saves the specified objects (...) in the XDR platform-independent binary format
save.image(file) saves all objects
cat(..., file="", sep=" ") prints the arguments after coercing to character; sep is the character separator between arguments
print(a, ...) prints its arguments; generic, meaning it can have different methods for different objects
format(x,...) format an R object for pretty printing
write.table(x,file="",row.names=TRUE,col.names=TRUE,sep=" ") prints x after converting to a data frame; if quote is TRUE, character or factor columns are surrounded by quotes ("); sep is the field separator; eol is the end-of-line separator; na is the string for missing values; use col.names=NA to add a blank column header to get the column headers aligned correctly for spreadsheet input
sink(file) output to file, until sink()
Most of the I/O functions have a file argument. This can often be a character string naming a file or a connection. file="" means the standard input or output. Connections can include files, pipes, zipped files, and R variables.
On Windows, the file connection can also be used with description = "clipboard". To read a table copied from Excel, use
x <- read.delim("clipboard")
To write a table to the clipboard for Excel, use
write.table(x,"clipboard",sep="\t",col.names=NA)
For database interaction, see packages RODBC, DBI, RMySQL, RPgSQL, and ROracle. See packages XML, hdf5, netCDF for reading other file formats.

Data creation
c(...) generic function to combine arguments with the default forming a vector; with recursive=TRUE descends through lists combining all elements into one vector
from:to generates a sequence; ":" has operator priority; 1:4 + 1 is "2,3,4,5"
seq(from,to) generates a sequence; by= specifies increment; length= specifies desired length
seq(along=x) generates 1, 2, ..., length(x); useful for for loops
rep(x,times) replicate x times; use each= to repeat "each" element of x each times; rep(c(1,2,3),2) is 1 2 3 1 2 3; rep(c(1,2,3),each=2) is 1 1 2 2 3 3
data.frame(...) create a data frame of the named or unnamed arguments; data.frame(v=1:4,ch=c("a","B","c","d"),n=10); shorter vectors are recycled to the length of the longest
list(...) create a list of the named or unnamed arguments; list(a=c(1,2),b="hi",c=3i);
array(x,dim=) array with data x; specify dimensions like dim=c(3,4,2); elements of x recycle if x is not long enough
matrix(x,nrow=,ncol=) matrix; elements of x recycle
factor(x,levels=) encodes a vector x as a factor
gl(n,k,length=n*k,labels=1:n) generate levels (factors) by specifying the pattern of their levels; n is the number of levels, and k is the number of replications
expand.grid() a data frame from all combinations of the supplied vectors or factors
rbind(...) combine arguments by rows for matrices, data frames, and others
cbind(...) id. by columns

Slicing and extracting data
Indexing vectors
x[n] nth element
x[-n] all but the nth element
x[1:n] first n elements
x[-(1:n)] elements from n+1 to the end
x[c(1,4,2)] specific elements
x["name"] element named "name"
x[x > 3] all elements greater than 3
x[x > 3 & x < 5] all elements between 3 and 5
x[x %in% c("a","and","the")] elements in the given set
Indexing lists
x[n] list with elements n
x[[n]] nth element of the list
x[["name"]] element of the list named "name"
x$name id.
Indexing matrices
x[i,j] element at row i, column j
x[i,] row i
x[,j] column j
x[,c(1,3)] columns 1 and 3
x["name",] row named "name"
Indexing data frames (matrix indexing plus the following)
x[["name"]] column named "name"
x$name id.

Variable conversion
as.array(x), as.data.frame(x), as.numeric(x), as.logical(x), as.complex(x), as.character(x), ... convert type; for a complete list, use methods(as)

Variable information
is.na(x), is.null(x), is.array(x), is.data.frame(x), is.numeric(x), is.complex(x), is.character(x), ... test for type; for a complete list, use methods(is)
length(x) number of elements in x
dim(x) retrieve or set the dimension of an object; dim(x) <- c(3,2)
dimnames(x) retrieve or set the dimension names of an object
nrow(x) number of rows; NROW(x) is the same but treats a vector as a one-column matrix
ncol(x) and NCOL(x) id. for columns
class(x) get or set the class of x; class(x) <- "myclass"
unclass(x) remove the class attribute of x
attr(x,which) get or set the attribute which of x
attributes(obj) get or set the list of attributes of obj

Data selection and manipulation
which.max(x) returns the index of the greatest element of x
which.min(x) returns the index of the smallest element of x
rev(x) reverses the elements of x
sort(x) sorts the elements of x in increasing order; to sort in decreasing order: rev(sort(x)) or sort(x, decreasing=TRUE)
cut(x,breaks) divides x into intervals (factors); breaks is the number of cut intervals or a vector of cut points
match(x, y) returns a vector of the same length as x giving, for each element of x, the position of its first match in y (NA otherwise)
which(x == a) returns a vector of the indices of x for which the comparison is TRUE; in this example, the values of i for which x[i] == a (the argument of this function must be of mode logical)
choose(n, k) computes the number of combinations of k events among n repetitions = n!/[(n − k)!k!]
na.omit(x) suppresses the observations with missing data (NA) (suppresses the corresponding line if x is a matrix or a data frame)
na.fail(x) returns an error message if x contains at least one NA
unique(x) if x is a vector or a data frame, returns a similar object but with the duplicate elements suppressed
table(x) returns a table with the counts of the different values of x (typically for integers or factors)
subset(x, ...) returns a selection of x with respect to criteria (..., typically comparisons: x$V1 < 10); if x is a data frame, the option select gives the variables to be kept or dropped using a minus sign
sample(x, size) resample randomly and without replacement size elements in the vector x; the option replace=TRUE allows resampling with replacement
prop.table(x,margin=) table entries as fraction of marginal table

Math
sin,cos,tan,asin,acos,atan,atan2,log,log10,exp
max(x) maximum of the elements of x
min(x) minimum of the elements of x
range(x) same as c(min(x), max(x))
sum(x) sum of the elements of x
diff(x) lagged and iterated differences of vector x
prod(x) product of the elements of x
mean(x) mean of the elements of x
median(x) median of the elements of x
quantile(x,probs=) sample quantiles corresponding to the given probabilities (defaults to 0,.25,.5,.75,1)
weighted.mean(x, w) mean of x with weights w
rank(x) ranks of the elements of x
var(x) or cov(x) variance of the elements of x (calculated on n − 1); if x is a matrix or a data frame, the variance-covariance matrix is calculated
sd(x) standard deviation of x
cor(x) correlation matrix of x if it is a matrix or a data frame (1 if x is a vector)
var(x, y) or cov(x, y) covariance between x and y, or between the columns of x and those of y if they are matrices or data frames
cor(x, y) linear correlation between x and y, or correlation matrix if they are matrices or data frames
round(x, n) rounds the elements of x to n decimals
log(x, base) computes the logarithm of x with base base
scale(x) if x is a matrix, centers and scales the data; to center only, use the option scale=FALSE; to scale only, use center=FALSE (by default center=TRUE, scale=TRUE)
pmin(x,y,...) a vector whose ith element is the minimum of x[i], y[i], ...
pmax(x,y,...) id. for the maximum
cumsum(x) a vector whose ith element is the sum from x[1] to x[i]
cumprod(x) id. for the product
cummin(x) id. for the minimum
cummax(x) id. for the maximum
union(x,y), intersect(x,y), setdiff(x,y), setequal(x,y), is.element(el,set) "set" functions
Re(x) real part of a complex number
Im(x) imaginary part
Mod(x) modulus; abs(x) is the same
Arg(x) angle in radians of the complex number
Conj(x) complex conjugate of x
convolve(x,y) computes several kinds of convolutions of two sequences
fft(x) Fast Fourier Transform of an array
mvfft(x) FFT of each column of a matrix
filter(x,filter) applies linear filtering to a univariate time series or to each series separately of a multivariate time series
Many math functions have a logical parameter na.rm=FALSE to specify missing data (NA) removal.

Matrices
t(x) transpose
diag(x) diagonal
%*% matrix multiplication
solve(a,b) solves a %*% x = b for x
solve(a) matrix inverse of a
rowSums(x) sums of rows for a matrix-like object; rowsum(x, group) instead sums the rows within each level of group
colSums(x) id. for columns
rowMeans(x) fast version of row means
colMeans(x) id. for columns

Advanced data processing
apply(X,MARGIN,FUN=) a vector or array or list of values obtained by applying a function FUN to margins (MARGIN) of X
lapply(X,FUN) apply FUN to each element of the list X
tapply(X,INDEX,FUN=) apply FUN to each cell of a ragged array given by X with indexes INDEX
by(data,INDEX,FUN) apply FUN to data frame data subsetted by INDEX
merge(a,b) merge two data frames by common columns or row names
xtabs(a ~ b,data=x) a contingency table from cross-classifying factors
aggregate(x,by,FUN) splits the data frame x into subsets, computes summary statistics for each, and returns the result in a convenient form; by is a list of grouping elements, each as long as the variables in x
stack(x, ...) transform data available as separate columns in a data frame or list into a single column
unstack(x, ...) inverse of stack()
reshape(x, ...) reshapes a data frame between 'wide' format with repeated measurements in separate columns of the same record and 'long' format with the repeated measurements in separate records; use (direction="wide") or (direction="long")

Strings
paste(...) concatenate vectors after converting to character; sep= is the string to separate terms (a single space is the default); collapse= is an optional string to separate "collapsed" results
substr(x,start,stop) substrings in a character vector; can also assign, as substr(x, start, stop) <- value
strsplit(x,split) split x according to the substring split
grep(pattern,x) searches for matches to pattern within x; see ?regex
gsub(pattern,replacement,x) replacement of matches determined by regular expression matching; sub() is the same but only replaces the first occurrence
tolower(x) convert to lowercase
toupper(x) convert to uppercase
match(x,table) a vector of the positions of first matches for the elements of x among table
x %in% table id. but returns a logical vector
pmatch(x,table) partial matches for the elements of x among table
nchar(x) number of characters

Dates and Times
The class Date has dates without times. POSIXct has dates and times, including time zones. Comparisons (e.g. >), seq(), and difftime() are useful. Date also allows + and −. ?DateTimeClasses gives more information. See also package chron.
as.Date(s) and as.POSIXct(s) convert to the respective class; format(dt) converts to a string representation. The default string format is "2001-02-21". These accept a second argument to specify a format for conversion. Some common formats are:
%a, %A Abbreviated and full weekday name.
%b, %B Abbreviated and full month name.
%d Day of the month (01–31).
%H Hours (00–23).
%I Hours (01–12).
%j Day of year (001–366).
%m Month (01–12).
%M Minute (00–59).
%p AM/PM indicator.
%S Second as decimal number (00–61).
%U Week (00–53); the first Sunday as day 1 of week 1.
%w Weekday (0–6, Sunday is 0).
%W Week (00–53); the first Monday as day 1 of week 1.
%y Year without century (00–99). Don't use.
%Y Year with century.
%z (output only.) Offset from Greenwich; -0800 is 8 hours west of Greenwich.
%Z (output only.) Time zone as a character string (empty if not available).
Where leading zeros are shown they will be used on output but are optional on input. See ?strftime.

Plotting
plot(x) plot of the values of x (on the y-axis) ordered on the x-axis
plot(x, y) bivariate plot of x (on the x-axis) and y (on the y-axis)
hist(x) histogram of the frequencies of x
barplot(x) bar plot of the values of x; use horiz=TRUE for horizontal bars
dotchart(x) if x is a matrix, plots a Cleveland dot plot (stacked plots line-by-line and column-by-column)
pie(x) circular pie-chart
boxplot(x) "box-and-whiskers" plot
sunflowerplot(x, y) same as plot() but points with similar coordinates are drawn as flowers whose number of petals represents the number of points
stripchart(x) plot of the values of x on a line (an alternative to boxplot() for small sample sizes)
coplot(x ~ y | z) bivariate plot of x and y for each value or interval of values of z
interaction.plot(f1, f2, y) if f1 and f2 are factors, plots the means of y (on the y-axis) with respect to the values of f1 (on the x-axis) and of f2 (different curves); the option fun lets you choose the summary statistic of y (by default fun=mean)
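A minimal sketch of the Data creation entries above. The object names (s1, r1, g, df, grid) are illustrative. The commented results follow from the rules on the card: ":" binding tighter than "+", times versus each= in rep(), n levels of k replications in gl(), and recycling in data.frame().

```r
# ":" has priority over "+", so this is (1:4) + 1
s1 <- 1:4 + 1                          # 2 3 4 5
s2 <- seq(0, 1, by = 0.25)             # 0.00 0.25 0.50 0.75 1.00
s3 <- seq(10, 20, length = 3)          # 10 15 20 (length= is partial for length.out)
r1 <- rep(c(1, 2, 3), 2)               # 1 2 3 1 2 3
r2 <- rep(c(1, 2, 3), each = 2)        # 1 1 2 2 3 3

# gl(n, k): n = 2 levels, each repeated k = 3 times
g <- gl(2, 3, labels = c("lo", "hi"))  # lo lo lo hi hi hi

# n = 10 is recycled to the length of the longest argument (4)
df <- data.frame(v = 1:4, ch = c("a", "B", "c", "d"), n = 10)

# every combination of the supplied vectors: 2 x 2 = 4 rows
grid <- expand.grid(x = 1:2, y = c("a", "b"))
```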
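The Slicing and extracting data entries above can be exercised on small illustrative objects (x, lst, m, d are arbitrary names). The sketch mainly contrasts negative and logical indexing on vectors, and [ ] versus [[ ]] on lists, where the first returns a sub-list and the second returns the element itself.

```r
x <- c(5, 1, 4, 2, 3)
x[-2]                  # all but the 2nd element: 5 4 2 3
x[-(1:3)]              # from the 4th element to the end: 2 3
x[x > 3]               # 5 4
x[x > 1 & x < 5]       # 4 2 3

lst <- list(a = c(1, 2), b = "hi", c = 3i)
lst["a"]               # a list of length 1 containing a
lst[["a"]]             # the element itself: 1 2
lst$b                  # same as lst[["b"]]: "hi"

m <- matrix(1:6, nrow = 2)  # filled column by column
m[2, ]                 # row 2: 2 4 6
m[, c(1, 3)]           # columns 1 and 3

d <- data.frame(name = c("x", "y"), val = c(10, 20))
d[["val"]]             # column named "val": 10 20
```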
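A sketch of a round trip through the Input and output functions listed above, assuming the session may write temporary files (tempfile() keeps them out of the working directory). as.is=TRUE is passed so character columns come back as character regardless of the R version's stringsAsFactors default. The file and object names are illustrative.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# write.table() with sep = "," writes a comma-delimited file;
# row.names = FALSE omits the row-name column
csv_file <- tempfile(fileext = ".csv")
write.table(df, csv_file, sep = ",", row.names = FALSE)

# read.csv() defaults to header = TRUE and sep = ","
back <- read.csv(csv_file, as.is = TRUE)

# save() / load() round trip in R's binary format
rdata_file <- tempfile(fileext = ".RData")
save(df, file = rdata_file)
rm(df)
load(rdata_file)       # restores the object named df

# sink() diverts output to a file until sink() is called again
log_file <- tempfile(fileext = ".txt")
sink(log_file)
cat("rows:", nrow(back), "\n")
sink()
```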
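The Variable conversion and Variable information entries above are easiest to see side by side. This sketch uses illustrative names (v, w, obj) and a hypothetical attribute called "note". It shows that setting dim() turns a vector into a matrix, and that nrow() returns NULL for a plain vector while NROW() treats it as one column.

```r
v <- 1:6
dim(v) <- c(3, 2)      # the vector becomes a 3 x 2 matrix
is.matrix(v)           # TRUE
nrow(v); ncol(v)       # 3 2

w <- 1:5
nrow(w)                # NULL: a plain vector has no dim attribute
NROW(w)                # 5: w is treated as a one-column matrix
NCOL(w)                # 1

as.numeric("3.5")                # 3.5
as.logical(c("TRUE", "no"))      # TRUE NA

obj <- list(a = 1)
class(obj) <- "myclass"          # set the class, as on the card
class(unclass(obj))              # "list"
attr(obj, "note") <- "hypothetical attribute"
names(attributes(obj))           # "names" "class" "note"
```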
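A minimal sketch of several Data selection and manipulation entries above, applied to a small illustrative vector with a missing value. It highlights how NA is handled: which() and sort() drop it, unique() keeps it, and match() reports positions (not values) in its second argument.

```r
x <- c(3, 8, NA, 1, 8)
which.max(x)                # 2: first maximum, NA ignored
which(x == 8)               # 2 5: NA comparisons are dropped
sort(x, decreasing = TRUE)  # 8 8 3 1: sort() drops NA by default
match(c(8, 4), x)           # 2 NA: position of the first match in x
c(8, 4) %in% x              # TRUE FALSE
as.vector(na.omit(x))       # 3 8 1 8
unique(x)                   # 3 8 NA 1

bins <- cut(c(1, 5, 9), breaks = c(0, 3, 6, 10))  # (0,3] (3,6] (6,10]
tab <- table(c("a", "b", "a"))                    # a: 2, b: 1
choose(5, 2)                                      # 10

d <- data.frame(V1 = c(5, 12, 7), V2 = c("p", "q", "r"))
sub_d <- subset(d, V1 < 10, select = -V2)         # V1 for rows 1 and 3
```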
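For the Math entries above, here is a short worked sketch on an illustrative sample. The variance uses the n − 1 denominator noted on the card: the deviations from the mean 5 square to 9, 1, 1, 1, 0, 0, 4, 16, summing to 32, so var(x) is 32/7. The last lines show the scale() options: scale=FALSE centers only.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
mean(x)                         # 5
var(x)                          # 32/7, i.e. divided by n - 1 = 7
range(x)                        # 2 9
quantile(1:9)                   # 0%=1 25%=3 50%=5 75%=7 100%=9
cumsum(1:4)                     # 1 3 6 10
pmin(c(1, 5, 3), c(4, 2, 6))    # 1 2 3
mean(c(1, NA, 3), na.rm = TRUE) # 2

m <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
centered <- scale(m, scale = FALSE)  # subtract column means only
standardized <- scale(m)             # center, then divide by column sd
```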
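The Matrices entries above can be checked on a small system. For a = [2 1; 1 3] and b = (1, 2), the equations 2x + y = 1 and x + 3y = 2 give x = 0.2 and y = 0.6. The sketch also contrasts rowSums() with the grouped rowsum(). The names a, b, and group labels g1/g2 are illustrative.

```r
a <- matrix(c(2, 1, 1, 3), nrow = 2)  # [2 1; 1 3], filled by column
b <- c(1, 2)

x <- solve(a, b)            # solves a %*% x = b: 0.2 0.6
a %*% x                     # recovers b as a 2 x 1 matrix
a_inv <- solve(a)           # inverse of a
a_inv %*% a                 # identity matrix, up to rounding

t(matrix(1:6, nrow = 2))    # 3 x 2 transpose
diag(a)                     # 2 3
rowSums(a); colSums(a)      # 3 4 and 3 4

# rowsum() sums rows within groups: g1 = 1 + 3, g2 = 2
rowsum(c(1, 2, 3), group = c("g1", "g2", "g1"))
```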
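A sketch of the Advanced data processing entries above on illustrative data. It shows apply() over rows (MARGIN = 1) and columns (MARGIN = 2), group-wise summaries with tapply() and aggregate(), a one-way xtabs() count, and merge() joining on the column the two data frames share.

```r
m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)            # rows: 9 12
apply(m, 2, max)            # columns: 2 4 6

lapply(list(a = 1:3, b = 4:6), mean)                   # list(a = 2, b = 5)
tp <- tapply(c(10, 20, 30, 40), c("x", "y", "x", "y"), sum)  # x = 40, y = 60

df <- data.frame(g = c("a", "b", "a"), v = c(1, 2, 3))
agg <- aggregate(df$v, by = list(g = df$g), FUN = sum)  # a: 4, b: 2
xt <- xtabs(~ g, data = df)                            # a: 2, b: 1

left  <- data.frame(key = c(1, 2, 3), l = c("p", "q", "r"))
right <- data.frame(key = c(2, 3, 4), r = c(TRUE, FALSE, TRUE))
merged <- merge(left, right)   # joins on the common column "key": keys 2, 3
```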
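The Strings entries above, sketched on literal inputs. Note the difference between sep= (between terms of each element) and collapse= (between the resulting elements), and between gsub() (every match) and sub() (first match only).

```r
paste("a", "b", sep = "-")                 # "a-b"
paste(c("x", "y"), 1:2, collapse = "+")    # "x 1+y 2"
substr("abcdef", 2, 4)                     # "bcd"
s <- "abcdef"
substr(s, 1, 2) <- "ZZ"                    # replacement form: "ZZcdef"
strsplit("a,b,c", ",")[[1]]                # "a" "b" "c"
grep("b", c("abc", "xyz", "b"))            # 1 3: indices of matches
gsub("o", "0", "foo boo")                  # "f00 b00": every match
sub("o", "0", "foo boo")                   # "f0o boo": first match only
toupper("r"); nchar("hello")               # "R" 5
```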
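A sketch of the Dates and Times entries above, using the card's example date. 2001 is not a leap year, so February 21 plus 10 days is March 3, and February 21 is day 31 + 21 = 52 of the year. Only locale-independent format codes are used. The POSIXct time is given an explicit UTC zone so the output does not depend on the machine's time zone.

```r
d <- as.Date("2001-02-21")                   # default format
format(d, "%d/%m/%Y")                        # "21/02/2001"
format(d, "%j")                              # "052": day of the year
as.Date("21/02/2001", format = "%d/%m/%Y")   # parse a non-default format
d + 10                                       # "2001-03-03"
d2 <- as.Date("2001-03-01")
d2 > d                                       # TRUE
difftime(d2, d, units = "days")              # 8 days
seq(d, by = "month", length.out = 3)         # Feb 21, Mar 21, Apr 21
t1 <- as.POSIXct("2001-02-21 13:45:00", tz = "UTC")
format(t1, "%H:%M")                          # "13:45"
```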
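A minimal sketch of a few Plotting entries above. It assumes no screen device is available, so it draws into a temporary PDF file. hist() and boxplot() also return their computed values: for x = 1, 2, 2, 3, 3, 3 with breaks 0:3, the bins [0,1], (1,2], (2,3] hold 1, 2, and 3 values, and the median is 2.5.

```r
pdf(tempfile(fileext = ".pdf"))   # draw off-screen into a temporary file
x <- c(1, 2, 2, 3, 3, 3)
h <- hist(x, breaks = 0:3)        # returned object holds counts per bin
barplot(table(x), horiz = TRUE)   # horizontal bars
b <- boxplot(x)                   # b$stats holds whiskers, hinges, median
plot(x, type = "b")               # values of x against their index
invisible(dev.off())
```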