Week 1-PART 1-Review SAS Program Basics
SAS Programming
PHEB 631: SAS PROGRAMMING FOR
EPIDEMIOLOGICAL RESEARCH
Xiaohui Xu, Ph.D.
Department of Epidemiology and Biostatistics
Part 1. Review SAS
program basics
Lecture Outline
• Review SAS programming basics
• SAS environment
• SAS Syntax
• SAS Datasets (Temporary vs. Permanent)
• Creating SAS datasets
1.1 SAS Environment
When you start SAS, there are five basic windows (panels):
Editor: where you type programs for analyzing data
Log: where executed SAS statements, notes, warnings, and error messages are printed
Output: where the results of SAS programs are printed. The Results window
is linked to this window
Results: where you can navigate to each section of your output
Explorer: where you can browse your SAS libraries, in which SAS data
sets are stored
1.2. SAS syntax-Structure And
Components Of SAS Programs
• A SAS program consists of DATA steps, PROC steps,
or any combination of the two
1.2.1 DATA steps
• DATA steps typically create or modify SAS data sets.
They can also be used to produce custom-designed
reports. For example, you can use DATA steps to
• Put your data into a SAS data set
• Compute values
• Check for and correct errors in your data
• Produce new SAS data sets by subsetting, merging, and
updating existing data sets.
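The uses listed above can be sketched in two short DATA steps. This is only an illustration: the data set names WORK.SCORES and WORK.SCORES_CLEAN, the variables, and the values are hypothetical and not taken from the lecture.

```sas
/* Put data into a SAS data set (hypothetical values entered inline) */
data work.scores;
  input subject exam1 exam2;
  datalines;
1 80 84
2 85 .
3 90 86
;
run;

/* Create a new data set from an existing one */
data work.scores_clean;
  set work.scores;
  if missing(exam2) then delete;   /* check for a data problem and subset */
  average = mean(exam1, exam2);    /* compute a new value */
run;
```

The second step reads WORK.SCORES with SET, so the original data set is left unchanged.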
1.2.2 PROC (procedure) steps
• PROC (procedure) steps are pre-written routines that enable you
to analyze and process the data in a SAS data set and to present the
data in the form of a report. PROC steps sometimes create new SAS
data sets that contain the results of the procedure. PROC steps can
list, sort, and summarize data. For example, you can use PROC steps
to
• Create a report that lists the data
• Produce descriptive statistics
• Create a summary report
• Produce plots and charts.
Example: PROC step (PROC MEANS)
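A minimal sketch of a PROC MEANS step, assuming the SASHELP.CLASS sample data set that ships with SAS (not necessarily the data set used in the lecture):

```sas
/* Descriptive statistics for two numeric variables */
proc means data=sashelp.class n mean std min max;
  var height weight;   /* variables to summarize */
run;
```

The statistic keywords on the PROC MEANS statement (N, MEAN, STD, MIN, MAX) choose which summaries are printed. The VAR statement limits the analysis to the listed variables.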
1.2.3 Boundary of SAS steps
• The beginning of a new step (DATA or PROC) implies the
end of the previous step.
• The RUN statement and the QUIT statement mark step
boundaries.
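A short sketch of these boundary rules, using a hypothetical data set WORK.DEMO:

```sas
data work.demo;              /* DATA step begins */
  x = 1;
proc print data=work.demo;   /* a new step begins, which ends the DATA step */
run;                         /* RUN ends the PROC PRINT step */

proc sql;                    /* PROC SQL keeps running until QUIT */
  select x from work.demo;
quit;                        /* QUIT ends the PROC SQL step */
```

Ending every step with an explicit RUN or QUIT is usually easier to read than relying on the implied boundary.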
1.3 SAS statements
• SAS programs consist of SAS statements.
• Characteristics of SAS Programs
• All SAS statements must end with a semicolon (;).
• SAS statements typically begin with a SAS keyword (for example, OPTIONS,
TITLE, DATA, INPUT, DATALINES, RUN, PROC, and VAR).
• SAS programs can be freely formatted:
• SAS keywords and names are not case sensitive; uppercase and lowercase
letters are treated as the same, even in variable names. Character data
values remain case sensitive.
• The words in SAS statements are separated by blanks or special characters
(e.g. =, +, or *).
1.3.1 Free format of SAS statements
• Statements can begin and end anywhere on a line
data clinic.admit2;
set clinic.admit;
run;
proc print data=clinic.admit2;
run;
• Example:
Libname clinic 'c:\users\name\sasuser';
Data clinic.admit2;
Set clinic.admit;
Weight=round(weight);
Run;
1.4.3 Rules for SAS names
• SAS names:
• Library name
• Data set name
• Variables
• Array name
• ……
• Rules:
• The length of a SAS name depends on the element it names
• The first character must be an English letter (A, a, B, b, C, c, . . . ,
Z, z) or an underscore (_).
• Blanks are not allowed.
• Special characters, except for the underscore, are not allowed.
Element names              Maximum length in bytes
Librefs (library names)    8
Data set names             32
Variable names             32
Array names                32
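The naming rules can be illustrated with a small sketch. All names below are hypothetical, and the invalid names appear only in a comment so that the program still runs.

```sas
/* Valid names: start with a letter or underscore, no blanks or special characters */
data work._scores_2024;
  exam_1 = 90;
  _bonus = 5;
  Total  = exam_1 + _bonus;
run;

/* Invalid under the rules above:
   1exam       starts with a digit
   exam score  contains a blank
   exam-1      contains a special character */
```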
1.5 Structure of SAS dataset
• A SAS data set is a file that consists of two parts
• a descriptor portion
• a data portion.
1.5.1 descriptor portion
• The descriptor portion of a SAS data set contains information
about the data set, including
• the name of the data set
• the date and time that the data set was created
• the number of observations
• the number of variables
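One way to inspect the descriptor portion is PROC CONTENTS. This sketch assumes the SASHELP.CLASS sample data set that ships with SAS:

```sas
/* Print descriptor information: data set name, creation date and time,
   number of observations, number of variables, and variable attributes */
proc contents data=sashelp.class;
run;
```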
1.5.2 Data portion
• The data portion of a SAS data set is a collection
of data values that are arranged in a rectangular
table.
• Rows: rows (called observations) in the data set
are collections of data values that usually relate
to a single object, such as an individual record.
• Columns: columns (called variables) in the data set
are collections of values that describe a particular
characteristic; each column is identified by a variable name.
1.5.2.1 Variable attributes
• Name
• Follows the same rules as data set names above
• Type
• A variable's type is either character or numeric.
• Character variables, such as Name, can contain any values.
• Numeric variables, such as a total score, can contain only numeric values (the digits 0 through
9, +, -, ., and E for scientific notation).
• Missing value
• For character variables such as Name, a blank represents a missing value.
• For numeric variables such as Age, a period represents a missing value.
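A minimal sketch of how the two kinds of missing values are written in a program. The data set WORK.MISSING_DEMO and its values are hypothetical.

```sas
data work.missing_demo;
  length name $ 10;
  name = 'Ann'; age = 34; output;
  name = ' ';   age = .;  output;   /* blank = missing character, period = missing numeric */
run;

proc print data=work.missing_demo;
run;
```

The MISSING function works for both types, so a condition such as `if missing(age)` avoids having to remember which representation applies.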
• Length
• A variable's length (the number of bytes used to store it) is
related to its type:
Character variables can be up to 32,767 bytes long.
All numeric variables have a default length of 8.
• Format
• Formats are variable attributes that affect the way data values
are written
• Example: to display the value 1234 as $1234.00 in a report,
you can use the DOLLAR8.2 format
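The length and format attributes can be set in a DATA step. In this sketch the data set WORK.FMT_DEMO and the variable names are hypothetical:

```sas
data work.fmt_demo;
  length name $ 20;          /* character length set explicitly, in bytes */
  name   = 'Example';
  amount = 1234;             /* numeric variable, default length 8 */
  format amount dollar8.2;   /* changes how the value is displayed, not what is stored */
run;

proc print data=work.fmt_demo;
run;
```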
• Informat
• Informats read data values in certain forms into standard SAS values.
• For example, the numeric value $1,234.00 contains two special characters, a dollar
sign ($) and a comma (,). You can use an informat to read the value while removing
the dollar sign and comma, and then store the resulting value as a standard numeric
value.
• Label
• A variable can have a label, which consists of descriptive text up to 256 characters
long
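A minimal sketch combining an informat and a label. It assumes the COMMAw.d informat, which strips dollar signs and commas while reading; the data set name and label text are hypothetical.

```sas
data work.informat_demo;
  input amount comma10.;                  /* reads $1,234.00 as the number 1234 */
  label amount = 'Amount billed (USD)';   /* descriptive text attached to the variable */
  datalines;
$1,234.00
;
run;

proc print data=work.informat_demo label;  /* LABEL option prints labels as column headings */
run;
```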
1.6 Creating SAS datasets
1. Direct entry
2. Reading external data into SAS: text files (txt, csv) and Excel files
(xls, xlsx)
3. Reading a SAS transport file (xpt)
1.6.1 Direct entry
2.1 Create a temporary data set "TEST"
SAS Program:
DATA TEST;
INPUT SUBJECT 1-2 GENDER $ 4 EXAM1 6-8 EXAM2 10-12
HW_GRADE $ 14;
DATALINES;
10 M  80  84 A
7  M  85  89 A
4  F  90  86 B
20 M  82  85 B
25 F  94  94 A
14 F  88  84 C
;
RUN;
• The INPUT statement shows what to call the variables and where to find
them on the data line; GENDER and HW_GRADE are character variables.
Each value must sit in the columns named in the INPUT statement.
• The DATALINES statement shows that the DATA step statements end here
and that the lines below are the actual data.
2.2 Example: Enhancing the DATA step program
SAS Program:
RUN;
RANGE=
SHEET="Sheet1$";
GETNAMES=YES;
DATAROW=2;
Run;
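GETNAMES= and DATAROW= are statements of PROC IMPORT. Below is a minimal sketch, assuming a small comma-delimited file written to a temporary fileref so that the example is self-contained. The fileref demo, the column names, and WORK.IMPORTED are hypothetical. SHEET= and RANGE= apply only when importing Excel workbooks and are omitted here.

```sas
/* Write a tiny CSV file to a temporary location */
filename demo temp;
data _null_;
  file demo;
  put 'subject,gender,exam1';
  put '10,M,80';
  put '7,M,85';
run;

/* Import it: variable names come from row 1, data start in row 2 */
proc import datafile=demo out=work.imported dbms=csv replace;
  getnames=yes;
  datarow=2;
run;

proc print data=work.imported;
run;
```

For an Excel workbook, the DATAFILE= value would be a quoted path and DBMS= would name an Excel engine. Which statements are accepted depends on that engine and on the SAS/ACCESS products installed.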
1.6.3. Creating Permanent Dataset in SAS
xpt file (e.g., NHANES datasets)
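A transport file can be read through the XPORT LIBNAME engine. The sketch below first writes a small transport file into the WORK folder so that it is self-contained. The librefs xout and xin and the file name class.xpt are hypothetical; a downloaded file such as an NHANES xpt would be read the same way from its own path.

```sas
/* Create a small transport file from a sample data set */
libname xout xport "%sysfunc(pathname(work))/class.xpt";
data xout.class;
  set sashelp.class;
run;
libname xout clear;

/* Read the transport file back into a regular SAS data set */
libname xin xport "%sysfunc(pathname(work))/class.xpt";
data work.class_copy;   /* use a permanent libref instead of WORK to keep the data set */
  set xin.class;
run;
libname xin clear;
```

WORK is a temporary library that is cleared when the SAS session ends. To make the result permanent, assign a libref to a folder with a LIBNAME statement and use it in place of WORK.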