Sas 1
Manuals
Var1 = 123
Var2 = 356
Var3 = 923
5. Create a data set PROFILE-1 that contains the following variables: (Hint: Length Statement)
Subsetting
1. From the last exercise, create a new data set called NEW_PROFILE from PROFILE using
the SET statement.
Solution: Data New_Profile;
set Profile;
run;
proc print data=New_Profile;
title "New Profile";
run;
2. Create a new data set called ENROL based on the PROFILE data set. ENROL should
contain only the patients enrolled in the study (ENROL = YES).
Solution: Data Enrol;
Set Profile;
where Enrol="Yes";
run;
proc Print data=enrol;
title "Enrol";
run;
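Character comparisons in SAS are case-sensitive, so where Enrol="Yes" skips values stored as YES or yes. If PROFILE does not capitalize ENROL consistently (an assumption, since the data are not shown here), a sketch that normalizes the case first:

```sas
/* Keep enrolled patients regardless of how YES is capitalized in ENROL */
data Enrol;
set Profile;
where upcase(Enrol) = "YES";
run;
proc print data=Enrol;
title "Enrol";
run;
```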
3. Locate the HOLIDAY data set in SASHELP. Create a subset of the HOLIDAY data set that
contains only the holidays that fall in January. Name the new data set JanHol and create it
in the WORK library. How many observations are there in the subset?
Solution: data janhol;
set sashelp.holiday;
where month=1;   /* keep January holidays only */
run;
proc print data=janhol;
run;
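To answer "how many observations" without counting printed rows, the row count can be queried directly. A minimal sketch using PROC SQL (the log NOTE written after the DATA step reports the same number):

```sas
/* Report the number of observations in WORK.JANHOL */
proc sql;
select count(*) as N_Obs
from work.janhol;
quit;
```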
Exporting Data from SAS
5 Steps to Export Data:
Step 1: Right-click the data set that you'd like to export.
Step 3: Select the shared folder where the data set should be exported to.
Step 5: Select the type of file to be exported (Excel, Text, CSV ...etc.)
1. Locate the CP951 data set from the SASHELP library. Save the CP951 data set into the
shared folder myfolders.
2. Locate the ELECTRIC data set in the SASHELP library. Export ELECTRIC to an
Excel spreadsheet. Ensure the Excel spreadsheet contains the same rows and columns as
the SAS data set.
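Both tasks can also be done in code instead of through the point-and-click steps. A sketch, assuming the shared folder is mounted at /folders/myfolders (the path used elsewhere in these exercises); the libref myfold and the file name electric.xlsx are hypothetical choices. DBMS=XLSX relies on SAS/ACCESS Interface to PC Files being available.

```sas
/* Task 1: save CP951 as a SAS data set in the shared folder */
libname myfold "/folders/myfolders";   /* hypothetical libref */
data myfold.cp951;
set sashelp.cp951;
run;

/* Task 2: export ELECTRIC to an Excel workbook with the same rows and columns */
proc export data=sashelp.electric
outfile="/folders/myfolders/electric.xlsx"   /* hypothetical file name */
dbms=xlsx
replace;
run;
```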
Reading Data into SAS from .TXT or .XLSX
1. Consider the following data stored in a .txt file:
Store Data
Store Revenue Staff Salary Operation Profit Complaint Turnover
STORE101 128000 18 29200 15200 83600 5 2
STORE102 158000 17 19000 12000 127000 11 2
STORE103 138000 18 26300 10500 101200 7 1
STORE104 101000 17 19700 19700 61600 5 2
STORE105 123000 15 29500 10400 83100 7 1
STORE106 189000 13 24400 12600 152000 5 2
STORE107 135000 10 24800 11900 98300 5 2
STORE108 130000 14 19400 11000 99600 3 1
STORE109 191000 12 28300 10500 152200 8 2
STORE110 176000 10 23500 15900 136600 9 1
Your boss needs a SAS data set that contains only the stores with revenue per staff member higher than
$10,000. Write SAS code to extract this information.
Solution: data Store;
infile "/folders/myfolders/Store.txt" firstobs=2;   /* skip the header row */
input Store $ Revenue Staff Salary Operation Profit Complaint Turnover;
Rev_per_Staff = Revenue / Staff;   /* revenue per staff member */
if Rev_per_Staff gt 10000;   /* keep only stores above $10,000 */
run;
proc print data=Store;
title "Stores with Revenue per Staff above $10,000";
run;
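To check the filter without the external file, the ten rows listed above can be read in-stream with DATALINES (the data set name Store_Check is chosen for this sketch). With these rows it should keep STORE106, STORE107, STORE109 and STORE110, whose revenue per staff is about $14,538, $13,500, $15,917 and $17,600.

```sas
/* Read the store table in-stream and keep stores with Revenue/Staff > 10,000 */
data Store_Check;
input Store $ Revenue Staff Salary Operation Profit Complaint Turnover;
Rev_per_Staff = Revenue / Staff;
if Rev_per_Staff gt 10000;
datalines;
STORE101 128000 18 29200 15200 83600 5 2
STORE102 158000 17 19000 12000 127000 11 2
STORE103 138000 18 26300 10500 101200 7 1
STORE104 101000 17 19700 19700 61600 5 2
STORE105 123000 15 29500 10400 83100 7 1
STORE106 189000 13 24400 12600 152000 5 2
STORE107 135000 10 24800 11900 98300 5 2
STORE108 130000 14 19400 11000 99600 3 1
STORE109 191000 12 28300 10500 152200 8 2
STORE110 176000 10 23500 15900 136600 9 1
;
run;
proc print data=Store_Check;
var Store Revenue Staff Rev_per_Staff;
run;
```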
2. Create a text file Temperature containing temperatures in Celsius on specific dates. Read it
into SAS and display the temperatures in Fahrenheit.
Solution: data Convert_Temp;
infile "/folders/myfolders/Temperature.txt";
input Date ddmmyy10. Temp_C;
format Date ddmmyy10.;
Temp_F = 1.8*Temp_C + 32;   /* Celsius to Fahrenheit */
run;
proc print data=Convert_Temp;
run;
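The contents of the Temperature file are not given, so this sketch uses in-stream data; the two dates and Celsius readings are hypothetical. The colon modifier lets list input apply the DDMMYY10. informat to a space-delimited field.

```sas
data Convert_Temp_Demo;
input Date :ddmmyy10. Temp_C;
format Date ddmmyy10.;
Temp_F = 1.8*Temp_C + 32;   /* 0 C -> 32 F, 25 C -> 77 F */
datalines;
01/01/2021 0
15/07/2021 25
;
run;
proc print data=Convert_Temp_Demo;
run;
```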
3. Create an Excel file, Grades.xlsx, that contains student grades. Use PROC IMPORT to read
the data from this file into a SAS data set.
Solution: proc import datafile="/folders/myfolders/Students_grade.xlsx"
out=work.students
dbms=xlsx
replace;
run;
proc print data=work.students;
run;
out=work.employee2
dbms=xlsx
replace;
run;
Conditional & Iterative Constructs
1. Create an Excel file with the fields EMPID, NAME and AGE. Import the file into SAS and display the data
with one additional field, Age_Group, calculated according to the categories stated below. (Note: leave the Age
field blank for at least 2 records to exercise the missing option.)
out=work.EMP
dbms=xlsx
replace;
sheet="Employee";
run;
Data employee_group;
set work.emp;
run;
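The age categories referred to above are not listed, so the cut-offs in this sketch are hypothetical placeholders, and it assumes AGE arrives as a numeric variable. It shows the IF-THEN/ELSE pattern with the missing-value test placed first: SAS treats a missing numeric value as smaller than any number, so a test such as Age lt 30 would otherwise put the blank ages in the lowest group.

```sas
data employee_group;
set work.emp;
length Age_Group $ 8;
if missing(Age) then Age_Group = "Unknown";    /* records with blank Age */
else if Age lt 30 then Age_Group = "Young";    /* hypothetical cut-off */
else if Age lt 50 then Age_Group = "Middle";   /* hypothetical cut-off */
else Age_Group = "Senior";
run;
proc print data=employee_group;
run;
```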
2. Consider the SASHELP data set RETAIL. Write a program to create a new data set (Sales_Status) with the
help of the following variables:
If sales are greater than or equal to 300, set Bonus equal to ‘Yes’ and Level to ‘High’. Otherwise, if sales are
not missing, set Bonus to ‘No’ and Level to ‘Low’. List the observations in this data set.
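Solution: data Sales_Status;
set sashelp.retail;
if Sales ge 300 then do;
Bonus="Yes";
Level="High";
end;
else if not missing(Sales) then do;   /* missing Sales gets neither value */
Bonus="No";
Level="Low";
end;
run;
proc print data=Sales_Status;
run;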
3. Create a conversion table for pounds and kilograms. The table should have one column showing
pounds from 0 to 100 in units of 10. The second column should show the kilogram equivalents.
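Note: 1 kg = 2.2 lb.
Solution: data Conversion;   /* data set name chosen for this example */
do W_Pound=0 to 100 by 10;
W_Kg = W_Pound / 2.2;   /* 1 kg = 2.2 lb, so kg = lb / 2.2 */
output;
end;
run;
proc print data=Conversion;
run;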
4. You have a variable called Money initialized at 100. Write a DO WHILE loop that compounds this
amount by 3 percent each year and computes the amount of money plus interest for each year.
Stop when the total amount exceeds 200.
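Solution: data Compound;   /* data set name chosen for this example */
Money=100;
Interest=0.03;
Amount=200;   /* stop once Money exceeds this amount */
Year=0;
do while (Money le Amount);
Year+1;
Money=Money+Interest*Money;
output;
end;
run;
proc print data=Compound;
run;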
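As a quick check on where the loop ends: with 3 percent yearly compounding the balance after n years is 100(1.03)^n, so the first year in which it exceeds 200 follows from

```latex
100\,(1.03)^{n} > 200
\;\Longleftrightarrow\;
n > \frac{\ln 2}{\ln 1.03} \approx \frac{0.6931}{0.02956} \approx 23.45
\;\Longrightarrow\; n = 24
```

A loop that stops once the amount exceeds 200 should therefore run for 24 years and end with about 203.28 (year 23 gives about 197.36).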
Handling Date & Subsetting
1. Consider the Employee Excel Sheet created in Section “Reading Data into SAS from .TXT or .XLSX”,
Q.4. Add fields DOB and DOJ referring to Date of Birth and Date of Joining of Employees. Import this file
and calculate the ages and years of experience of all employees as two new fields in your SAS datasets.
INFILE "/folders/myfolders/EMP.txt";
AGE=yrdif(DOB,TODAY());
Experience=yrdif(DOJ,TODAY());
RUN;
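The fragment above does not show how DOB and DOJ are read. Here is a self-contained sketch with hypothetical in-stream records (the IDs, names, dates and the data set name EMP_DATES are made up for illustration). It reads both dates with the DATE9. informat and uses YRDIF with the 'AGE' basis, which is intended for computing a person's age; INT() keeps completed years only.

```sas
data emp_dates;   /* hypothetical data set name */
input EmpID Name $ DOB :date9. DOJ :date9.;
format DOB DOJ date9.;
Age        = int(yrdif(DOB, today(), 'AGE'));   /* completed years of age */
Experience = int(yrdif(DOJ, today(), 'AGE'));   /* completed years of service */
datalines;
101 Asha 15MAR1985 01JUN2010
102 Ravi 02NOV1992 15JAN2018
;
run;
proc print data=emp_dates;
run;
```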
2. From the dataset created in above question display the records of employees who have experience
greater than or equal to 10 years.
run;
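A minimal sketch, assuming the data set from the previous question is called EMP_DATES (a hypothetical name) and holds EXPERIENCE in years:

```sas
proc print data=emp_dates;
where Experience ge 10;   /* 10 or more years of experience */
title "Employees with at least 10 years of experience";
run;
```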
3. Consider the SASHELP data set CARS and create two temporary data sets. The first, named CHEAP, should
include all observations from CARS where the MSRP (manufacturer’s suggested retail price) is
less than or equal to $11,000. The other, EXPENSIVE, should include all observations from CARS
where MSRP is greater than or equal to $100,000. Include only the fields Make, Type, Origin and
MSRP. List the observations from both data sets. The program should ensure that observations with
missing values for MSRP are not written to CHEAP.
Solution: data Cheap Expensive;
SET SASHELP.CARS (keep=Make Type Origin MSRP);
if not missing(MSRP) and MSRP le 11000 then output Cheap;   /* a missing MSRP would otherwise pass le 11000 */
else if MSRP ge 100000 then output Expensive;
RUN;
proc print data=Cheap; title "CHEAP"; run;
proc print data=Expensive; title "EXPENSIVE"; run;
4. Using the CARS permanent SAS dataset, write SAS code to do the following:
a) Create a subset (SMALL) consisting of all vehicles whose engine size is less than 2.0 L. On the basis of
this dataset, find the average city and highway miles per gallon for these vehicles.
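Solution: data Small;
set sashelp.cars;
WHERE EngineSize lt 2;   /* engine size in litres */
RUN;
proc means data=Small mean;
var MPG_City MPG_Highway;
Title "The average city and highway miles per gallon for vehicles with engine size less than 2.0 L";
run;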
b) Create a subset (HYBRID) of all hybrid vehicles in the dataset. For these vehicles:
Solution: data Hybrid;
SET SASHELP.CARS;
WHERE TYPE="Hybrid";
run;
proc means data=Hybrid mean;
var MPG_City MPG_Highway;
Title "The average city and highway miles per gallon (Hybrid cars)";
run;
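c) Create a subset (AMDSUV) consisting of all vehicles that are both SUVs and have all-wheel drive. Sort
the data by highway miles per gallon. List the brand (MAKE), MODEL and highway miles per gallon for this
sorted data.
Solution: data AMDSUV;
set sashelp.cars;
where Type="SUV" and DriveTrain="All";   /* all-wheel drive */
run;
proc sort data=AMDSUV;
BY MPG_Highway;
run;
proc print data=AMDSUV;
var Make Model MPG_Highway;
run;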
Data Analytics Using SAS Statistical Functions
Q.1. Consider the PRDSALE data set. It is available in the SASHELP library. Answer these questions:
set sashelp.prdsale;
run;
b) Print the first 20 observations of Prdsale data and write your observations.
proc print data=sashelp.prdsale (obs=20);
run;
c) What is the size of the population?
proc contents data=sashelp.prdsale;   /* the Observations entry gives the population size */
run;
where Country="CANADA";
run;
e) Take a random sample of size 30.
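proc surveyselect data=sashelp.prdsale out=Sample30 method=srs   /* simple random sample */
sampsize=30 seed=12345;   /* arbitrary SEED value makes the sample reproducible */
run;
proc print data=Sample30;
run;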
f) Identify the continuous, discrete, and categorical variables.
g) What are cause variables (independent)? What are effect variables (dependent)?
Solution: ACTUAL and PREDICT (actual and predicted sales) are the effect (dependent) variables, while the
remaining variables are cause (independent) variables.
var actual;
run;
var actual;
run;
Correlation
1. Use the dataset CARS1 and get the result showing the correlation coefficients between
horsepower and weight.
Solution: data cars1;
set sashelp.cars;
run;
proc corr data=cars1;
var Horsepower Weight;
run;
2. Use Fisher’s iris data from SASHELP. Run a correlation analysis of all variables and
explain the results. Then produce the various plots and explain what they show.
Solution: data iris;
set sashelp.iris;
title "Iris";
run;
proc corr data=iris plots=matrix(histogram);   /* correlations of all numeric variables */
run;
proc univariate data=iris;
ID Species;
Histogram;
qqplot/normal(mu=est sigma=est);
run;
3. Consider the following Fitness Data with fields Age, Weight, Runtime, Oxygen. The data is stored in a
.txt file and values are separated by spaces. Compute the correlation analysis of all variables with plots
and explain the results.
50 70.87 8.92 .
49 76.32 . 48.673
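Solution: data fitness;
infile "/folders/myfolders/Fitness.txt";
input Age Weight Runtime Oxygen;   /* a "." in the file is read as a missing value */
run;
proc corr data=fitness plots=matrix(histogram);
run;
proc univariate data=fitness;
Histogram;
qqplot/normal(mu=est sigma=est);
run;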
Regression
Consider the Gallup data set sent to you and answer the following questions:
1. Bring the gallup.txt data into SAS and save the data as a permanent SAS data set.
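Solution: libname mylib "/folders/myfolders";   /* hypothetical libref for the permanent library */
data gallup mylib.gallup;   /* WORK copy for the later steps plus a permanent copy */
infile "/folders/myfolders/gallup.txt";
input location age race gender education emp wage hours weeks salary income
disloc train monthu rate;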
run;
2. Display the contents of your data file.
Solution: proc contents data=gallup;
Title "Contents of the Dataset";
run;
6. Create a new temporary data set that contains only the variables age, race, gender, and
education for Pittsburgh.
Solution: data temp;
set gallup;
/* Add a WHERE statement on LOCATION to keep Pittsburgh; the code used for Pittsburgh is not given here. */
keep age race gender education;
run;
proc print data=temp;
Title " Temp Dataset";
run;
7. Display the cross tabulation of race and gender for Pittsburgh observations.
Solution: proc freq data=temp;
tables race*gender;
Title "Cross Tabulation Table";
run;
Exercise 2.
Write one SAS program to do all of the following:
1. Bring in SAS data gallup.txt into a new temporary data set. Drop the observations that have
a salary of 0.
Solution: data temp_new;
set gallup;
if salary=0 then delete;
run;
Title "***New Gallup Dataset***";
proc print data=temp_new;
run;
2. Create a dummy variable that takes on the value 1 if an individual’s salary is greater than
$20,000 and equals 0 otherwise.
Solution: data temp2;
set gallup;
if salary gt 20000 then var=1;
else var=0;
run;
title "****Temp2****";
proc print data=temp2;
run;
3. Display the mean age for high and low income individuals. To do this, you must first sort by
your salary dummy variable.
Solution: proc sort data=temp2
out=Sorted_Temp;
by descending var;
run;
proc means data=sorted_Temp mean;
class var;
var age;
run;
4. Display a frequency distribution of your dummy variable.
Solution: proc freq data=temp2;
tables var;
run;
5. Estimate a simple and a multiple regression where salary is the dependent variable. Use the
explanatory variables of your choice.
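The explanatory variables are left to the analyst. A sketch using PROC REG on TEMP_NEW (the data set without zero salaries from question 1); choosing EDUCATION for the simple model and EDUCATION, AGE, HOURS and WEEKS for the multiple model is only an illustration.

```sas
proc reg data=temp_new;
Simple:   model salary = education;
Multiple: model salary = education age hours weeks;
run;
quit;
```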
out=work.baseball
DBMS=xlsx
replace;
run;
b) Generate descriptive statistics for the entire data set.
proc means data=work.baseball;
run;
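proc sort data=work.baseball out=baseball1;   /* output data set name chosen here */
by descending nHome;
run;
data top_5H;
set baseball1 (obs=5);   /* five players with the most home runs */
run;
proc print data=top_5H;
run;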
out=baseball2;
by descending Salary;
run;
data Top_paid;
run;
run;
e) Find the impact of Home Runs on Salary using Linear Regression.
proc reg data=work.baseball;
Model Salary=nHome;
run;
quit;
f) Add more explanatory variables: nAtBat, nHits, nHome, nRuns, nRBI, nBB, nOuts, nError.
proc reg data=work.baseball;
model Salary=nAtBat nHits nHome nRuns nRBI nBB nOuts nError;
run;
quit;
g) Identify from the results, which factors have high impact on Salary in comparison to Home Runs.
Solution: From the above results, nHits, nBB, nOuts and nAtBat are significant factors that affect
salary, as their p-values are less than 0.05, while the p-value for nHome is 0.7838 (> 0.05), so nHome
is not significant and shows no measurable effect on Salary. nRuns, nRBI and nError also have p-values
> 0.05, so these factors are not significant either. Hence nHits, nBB, nOuts and nAtBat have a greater
impact on Salary than nHome.
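P-values show which coefficients differ significantly from zero, but on their own they do not rank how strongly each factor moves Salary, because the predictors are on different scales. One way to compare them is the STB option of the MODEL statement, which adds standardized estimates to the parameter table. A sketch on the same WORK.BASEBALL data:

```sas
proc reg data=work.baseball;
model Salary = nAtBat nHits nHome nRuns nRBI nBB nOuts nError / stb;   /* standardized estimates */
run;
quit;
```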
h) Calculate performance scores (ps) by applying the following formula:
ps= 3*nHome + 0.5*nHits + 1*nRuns +1* nAtBat - 1*nRBI + 0.3*nBB + 2*nOuts - 1*nError
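Solution: data baseball_ps;   /* data set name chosen for this example */
set work.baseball;
ps = 3*nHome + 0.5*nHits + 1*nRuns + 1*nAtBat - 1*nRBI + 0.3*nBB + 2*nOuts - 1*nError;
run;
proc reg data=baseball_ps;   /* regress Salary on the performance score */
model Salary=ps;
run;
quit;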
j) Explain the results.
Solution: From the above results, ps is a significant predictor of salary, since its p-value
(< 0.0001) is less than 0.05. However, the adjusted R-square is only 0.1573, so ps explains only about
16% of the variability in salary. Salary is related to ps, but the model has low explanatory power.