
Sas 1

Here are the steps to add DOB and DOJ fields to the Employee Excel sheet from question 4 and import into SAS: 1. Add two new columns - DOB and DOJ to the Excel sheet with sample date values in MM/DD/YYYY format 2. Import the updated Excel sheet into SAS using proc import 3. Extract the date fields and convert to SAS date format 4. Print the dataset Solution: proc import datafile="/folders/myfolders/Emply_updated.xlsx" out=work.employee dbms=xlsx replace; run; data employee; set employee; DOB=input(DOB,MMDDYY

Uploaded by

pratyusa

SAS Project and Lab Manuals

Submitted to: Prof. Rinku Dixit


Submitted by: Pratyusa Goswami (338)
Lab Manual

Creating Datasets Using Datalines


1. Create a data set Number that contains the following 3 variables:

Var1 = 123
Var2 = 356
Var3 = 923

Solution: Data NUMBER;
Input Var1 Var2 Var3;
Datalines;
123 356 923
;
run;
proc print data=number;
title "NUMBER DATASET";
run;

2. Create a data set Food that contains the following variables:

Restaurant: Burger King


NumEmploy: 5
Location: Toronto
Solution: data Food;
input Restaurant $ 1-11 NumEmploy Location $; /* column input keeps the blank inside "Burger King" */
datalines;
Burger King 5 Toronto
;
run;
proc print data=food;
title "Food Dataset";
run;
3. Create a data set SCORE that contains the following variables:

Solution: Data Score;
Input Score1 Score2 Score3;
Datalines;
77 88 35
93 57 74
67 85 71
;
run;
proc print data=score;
title "SCORE DATASET";
run;

4. Create a data set PROFILE that contains the following variables:

Solution: data profile;


input patid $ enrol $ bscore ;
datalines;
P001 Yes 99
P002 Yes 101
P003 No 125
;
PROC PRINT DATA= profile;
run;

5. Create a data set PROFILE-1 that contains the following variables: (Hint: Length Statement)

Solution: data Profile1;


length PAT_ID $11 Enrol $4;
Input PAT_ID $ Enrol $ Bscore;
datalines;
PAT3000001 Yes 99
PAT3000002 Yes 101
PAT3000003 No 125
;
proc print data=profile1;
run;

Subsetting
1. From the last exercise, create a new data set called NEW_PROFILE from PROFILE using
the SET statement.
Solution: Data New_Profile;
set Profile;
run;
proc print data=New_Profile;
title "New Profile";
run;

2. Create a new data set called ENROL based on the PROFILE data set. ENROL should
contain only the patients enrolled in the study (ENROL = Yes).
Solution: Data Enrol;
Set Profile;
where Enrol="Yes";
run;
proc Print data=enrol;
title "Enrol";
run;

3. Locate the HOLIDAY data set from SASHELP. Create a subset of the HOLIDAY data set that
contains only the holidays that fall in January. Name the new data set as JanHol and have it created
in the WORK library. How many observations are there in the subset?
Solution: data janhol;
set sashelp.holiday;
where month= 1;
proc print data=janhol;
run;
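
The exercise also asks how many observations the subset holds. PROC PRINT shows this indirectly through its Obs column; a minimal sketch that reports the count directly, assuming WORK.JANHOL was created by the DATA step above (the column alias n_obs is only illustrative):

```sas
/* Count the rows in WORK.JANHOL, the January subset built above */
proc sql;
  select count(*) as n_obs
  from work.janhol;
quit;
```

The SAS log note written after the DATA step ("The data set WORK.JANHOL has ... observations") gives the same number.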
Exporting Data from SAS
5 Steps to Export Data:
Step 1: Right-click the data set that you'd like to export.

Step 2: Click Export from the list.

Step 3: Select the shared folder where the data set should be exported to.

Step 4: Name the file to be exported (from Filename).

Step 5: Select the type of file to be exported (Excel, Text, CSV ...etc.)

1. Locate the CP951 data set from the SASHELP library. Save the CP951 data set into the
shared folder myfolders.
2. Locate the ELECTRIC data set from the SASHELP library. Export ELECTRIC into an
Excel spreadsheet. Ensure the Excel spreadsheet contains the same rows and columns as
the SAS data set.
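
The two export exercises above can also be done in code instead of through the point-and-click steps. The following is a sketch only: the libref outlib and the output file name electric.xlsx are assumptions chosen for illustration, and the folder path is the shared folder used elsewhere in this manual.

```sas
/* Exercise 1: save a copy of SASHELP.CP951 in the shared folder.
   The libref OUTLIB is a hypothetical name. */
libname outlib "/folders/myfolders";

data outlib.cp951;
  set sashelp.cp951;
run;

/* Exercise 2: export SASHELP.ELECTRIC to an Excel workbook.
   The file name electric.xlsx is an assumption. */
proc export data=sashelp.electric
    outfile="/folders/myfolders/electric.xlsx"
    dbms=xlsx
    replace;
run;

/* Check the row and column counts of the source data set so they can be
   compared with the spreadsheet */
proc contents data=sashelp.electric;
run;
```

Opening the exported workbook and comparing its row and column counts with the PROC CONTENTS output is one way to confirm that nothing was lost in the export.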
Reading Data into SAS from .TXT or .XLSX
1. Consider the following data stored in a TXT File
Store Data
Store Revenue Staff Salary Operation Profit Complaint Turnover
STORE101 128000 18 29200 15200 83600 5 2
STORE102 158000 17 19000 12000 127000 11 2
STORE103 138000 18 26300 10500 101200 7 1
STORE104 101000 17 19700 19700 61600 5 2
STORE105 123000 15 29500 10400 83100 7 1
STORE106 189000 13 24400 12600 152000 5 2
STORE107 135000 10 24800 11900 98300 5 2
STORE108 130000 14 19400 11000 99600 3 1
STORE109 191000 12 28300 10500 152200 8 2
STORE110 176000 10 23500 15900 136600 9 1

Your boss needs a SAS data set that contains only the stores with Revenue per Staff higher than
$10,000. Write a SAS code to extract this information.
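Solution: data Store;
infile "/folders/myfolders/Store.txt" firstobs=2;
Input Store $ Revenue Staff Salary Operation Profit Complaint Turnover;
Rev_per_Staff = Revenue / Staff; /* revenue per staff member */
if Rev_per_Staff gt 10000; /* keep only stores above $10,000 per staff member */
run;
Proc print data=Store;
title "Store Data";
run;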
2. Create a text file Temperature containing Temperature in Celsius on specific dates. Read it
into SAS and display the temperature in Fahrenheit.
Solution: data Convert_Temp;
infile "/folders/myfolders/Temperature.txt";
Input Date Ddmmyy10. Temp_c;
Format Date Ddmmyy10.;
Temp_F=1.8*Temp_c+32; /* Celsius to Fahrenheit */
run;
proc print data=convert_temp;
run;

3. Create a file in Excel Grades.xlsx which contains data on Student Grades. Use PROC
IMPORT to read data from this file into a SAS dataset.
Solution: proc import datafile="/folders/myfolders/Students_grade.xlsx"
out=work.students
dbms=xlsx
replace;
run;
proc print data=work.students;
run;

4. Create the following data in an Excel Sheet and import it in SAS.

EmpID  Lastname       Firstname   JobCode  Annual Salary
31     GOLDENBERG     DESIREE     PLT      50221.62
40     WILLIAMS       ARLENE M.   FLTAT    23666.12
71     PERRY          ROBERT A.   FLTAT    21957.71
82     MCGWIER-WATTS  CHRISTINA   PLT      96387.39
91     SCOTT          HARVEY F.   FLTAT    32278.4
106    THACKER        DAVID S.    FLTAT    24161.14
355    BELL           THOMAS B.   PLT      59803.16
366    GLENN          MARTHA S.   PLT      120202.38

Solution: proc import datafile="/folders/myfolders/Emply.xlsx"

out=work.employee2

dbms=xlsx

replace;

run;

proc print data=work.employee2;

run;
Conditional & Iterative Constructs
1. Create an Excel File with fields as: EMPID, NAME & AGE. Import the file in SAS and display the data
with one additional field Age_Group calculated as per the below stated categories. (Note- Leave Age
field blank for at least 2 records to exercise the missing option.)

If missing (Age) then Age_Group= . ;

Else if Age le 20 then Age_Group= 1;

Else if Age le 40 then Age_Group= 2;

Else if Age le 60 then Age_Group= 3;

Else if Age le 80 then Age_Group= 4;

Else if Age gt 80 then Age_Group= 5;

Solution: proc import datafile="/folders/myfolders/Emply.xlsx"

out=work.EMP

dbms=xlsx

replace;

sheet=Employee;

run;

Data employee_group;

set work.emp;

If missing(AGE) then Age_Group= . ;

Else if AGE le 20 then Age_Group= 1;

Else if AGE le 40 then Age_Group= 2;

Else if AGE le 60 then Age_Group= 3;

Else if AGE le 80 then Age_Group= 4;

Else if AGE gt 80 then Age_Group= 5;

proc print data=employee_group;

run;
2. Consider SAShelp data set Retail, write a program to create a new data set (Sales_Status) with the
help of following variables:

If sales greater than or equal to 300 set Bonus equal to ‘Yes’ and Level to ‘High’. Otherwise, if sales is
not missing, set Bonus to ‘NO’ and Level to ‘Low’. List the observations in this data set.

Solution: Data Sales_status;

set sashelp.retail;

If Sales ge 300 Then DO;

Bonus ="Yes";

Level="High";

END;

Else if not missing(Sales) then Do;

Bonus="No";

Level="Low";

End;

Proc print data=sales_status;

run;
3. Create a conversion table for pounds and kilograms. The table should have one column showing
pounds from 0 to 100 in units of 10. The second column should show the kilogram equivalents.
Note: 1KG =2.2 Lbs.

Solution: DATA Weight_conv;

Do W_Pound=0 to 100 by 10;

W_Kg= W_Pound/2.2; /* 1 kg = 2.2 lbs, so divide pounds by 2.2 */

Output;

End;

proc print data=weight_conv;

run;
4. You have a variable called Money initialized at 100. Write a DO WHILE loop that compounds this
amount by 3 percent each year and computes the amount of money plus interest for each year.
Stop when the total amount exceeds 200.

Solution: DATA LOAN;

MONEY=100;

INTEREST=0.03;

AMOUNT=200;

YEAR=0;

DO while (MONEY le AMOUNT); /* keep compounding until MONEY exceeds 200 */

YEAR+1;

MONEY=MONEY+INTEREST*MONEY;

OUTPUT; /* one row per year, including the year the total passes 200 */

END;

RUN;

Title " Loan data";

PROC PRINT DATA=LOAN;

RUN;
Handling Date & Subsetting
1. Consider the Employee Excel Sheet created in Section “Reading Data into SAS from .TXT or .XLSX”,
Q.4. Add fields DOB and DOJ referring to Date of Birth and Date of Joining of Employees. Import this file
and calculate the ages and years of experience of all employees as two new fields in your SAS datasets.

Solution: DATA EMPLOYEE;

INFILE "/folders/myfolders/EMP.txt";

INPUT EMPID Gender $ Name $ DOB :Mmddyy10. Location $ Salary ManagerEmpID

DOJ :Mmddyy10.;

FORMAT DOB DOJ Mmddyy10.;

AGE=yrdif(DOB,TODAY(),'AGE');

Experience=yrdif(DOJ,TODAY(),'AGE');

Title "EMPLOYEES TABLE 1";

proc print data= EMPLOYEE;

RUN;

2. From the dataset created in above question display the records of employees who have experience
greater than or equal to 10 years.

Solution: proc print data=employee;

where Experience ge 10;

title "Employees with experience>=10 years";

run;

3. Consider SAS help data set CARS, create two temporary data sets. The first named CHEAP should
include all observations from Cars where the MSRP (manufacturer’s suggested retail price) is
less than or equal to $11,000. The other EXPENSIVE should include all observations from Cars
where MSRP is greater than or equal to $100,000. Include only the fields Make, Type, Origin and
MSRP. List observations from both data sets. The program should take care that if there are
missing values for MSRP, then those observations must not be written to CHEAP.

Solution: DATA CHEAP EXPENSIVE;

SET SASHELP.CARS;

IF NOT MISSING(MSRP) AND MSRP LE 11000 THEN OUTPUT CHEAP; /* missing values sort below any number, so exclude them explicitly */


ELSE IF MSRP GE 100000 THEN OUTPUT EXPENSIVE ;

RUN;

title "Cheap Dataset";

PROC PRINT DATA= CHEAP;

var Model Type Origin MSRP;

run;

title "Expensive Dataset";

PROC PRINT DATA= Expensive;

var Model Type Origin MSRP;

run;

4. Using the CARS permanent SAS dataset, write SAS code to do the following:

a) Create a subset (SMALL) consisting of all vehicles whose engine size is less than 2.0 L. On the basis of
this dataset, find the average city and highway miles per gallon for these vehicles.

Solution: DATA SMALL;


SET SASHELP.CARS;

WHERE EngineSize lt 2;

PROC PRINT DATA=SMALL;

RUN;

Title "The average city and highway miles per gallon for vehicles with engine size less than 2.0L";

proc means data=small mean;

Var MPG_City MPG_Highway;

run;

b) Create a subset (HYBRID) of all hybrid vehicles in the dataset. For these vehicles:

List the brand and Model Name.

Find the average city and highway miles per gallon.

Solution: DATA HYBRID;

SET SASHELP.CARS;

WHERE TYPE="Hybrid";

run;

title "Hybrid Cars ";

proc print data=hybrid;

var Make Model;

run;

Title "The average city and highway miles per gallon(Hybrid cars)";

proc means data=Hybrid mean;

Var MPG_City MPG_Highway;


run;

c) Create a subset (AMDSUV) consisting of all vehicles that are both SUVs and have all-wheel drive. Sort
the data by highway miles per gallon. List the BRAND, MODEL and highway miles per gallon for this
sorted data.

Solution: Data AMDSUV;

set sashelp.cars;

where Type="SUV" and DriveTrain="All";

run;

Title "AMDSUV DATASET";

PROC SORT DATA=AMDSUV;

BY MPG_Highway;

run;

proc print data=amdsuv;

var Make Model MPG_Highway;

run;
Data Analytics Using SAS Statistical Functions
Q.1. Consider the prdsale data set. It is available in the SAS help library. Answer these questions:

a) Print the contents of Prdsale data and write your observations.

Solution: data Sales;

set sashelp.prdsale;

Title "SALES DATASET";

proc Contents data=sales;

run;

b) Print the first 20 observations of Prdsale data and write your observations.

Solution: proc print data=sales (obs=20);

TITLE "First 20 Observations";

run;
c) What is the size of population?

Solution: proc means data=sales n;

run;

d) Filter the data and take a sample (where country=Canada).

Solution: proc print data=sales;

where Country="CANADA";

Title "Canada Data";

run;
e) Take a random sample of size 30.

Solution: proc surveyselect data=SASHELP.PRDSALE out=work.RandomSample method=srs

sampsize=30;

run;

proc print data=work.RandomSample(obs=30);

title "Subset of work.RandomSample";

run;
f) Identify the continuous, discrete, and categorical variables.

Solution: Continuous variables - Actual and Predicted Sales

Discrete variables - Quarter and Year

Categorical variables - Country, Region, Division, ProdType and Product

g) What are cause variables (independent)? What are effect variables (dependent)?

Solution: Actual and Predicted Sales are the effect (dependent) variables, while the remaining
variables are the cause (independent) variables.

h) Calculate a parameter (mean actual sales of the population).

Solution: proc means data=Sales;

var actual;

Title "Mean Actual Sales of Population";

run;

i) Calculate a statistic (mean actual sales of the sample).

Solution: proc means data=work.randomsample;

var actual;

Title "Mean Actual Sales of Sample";

run;

j) How close is the statistic to a parameter? Is it a good estimate?
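
One way to approach part (j) is to put the two means side by side. The sketch below assumes WORK.RandomSample exists from part (e); the data set names pop_stats, samp_stats and compare, and the variable names in them, are hypothetical and introduced only for this illustration.

```sas
/* Population mean of ACTUAL (the parameter) */
proc means data=sashelp.prdsale noprint;
  var actual;
  output out=pop_stats mean=pop_mean;
run;

/* Sample mean of ACTUAL (the statistic), from the sample drawn in part (e) */
proc means data=work.RandomSample noprint;
  var actual;
  output out=samp_stats mean=samp_mean;
run;

/* One-row comparison: absolute and percentage difference between the means */
data compare;
  merge pop_stats(keep=pop_mean) samp_stats(keep=samp_mean);
  abs_diff = abs(samp_mean - pop_mean);
  pct_diff = 100 * abs_diff / pop_mean;
run;

proc print data=compare noobs;
  title "Sample mean vs. population mean of Actual";
run;
```

Because the sample is random, the difference changes from run to run unless a SEED= value is set in PROC SURVEYSELECT. A small percentage difference suggests the sample mean is a reasonable estimate of the population mean.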

Correlation
1. Use the dataset CARS1 and get the result showing the correlation coefficients between
horsepower and weight.

Solution: data car;

set sashelp.cars;
proc corr data=car;

var Horsepower Weight;

run;

2. Use Fisher’s iris data from SAS help. Compute SAS correlation analysis of all variables and
explain the results. Then depict the various plots and explain the observations.

Solution: data iris_data;

set sashelp.iris;

proc print data=iris_data;

title "Iris";

proc corr data=iris_data;

run;

proc univariate data=iris_data;

ID Species;

Histogram;

qqplot/normal(mu=est sigma=est);

run;
3. Consider the following Fitness Data with fields Age, Weight, Runtime, Oxygen. The data is stored in a
.txt file and values are separated by spaces. Compute the correlation analysis of all variables with plots
and explain the results.

57 73.37 12.63 39.407
54 79.38 11.17 46.080
52 76.32 9.63 45.441
50 70.87 8.92 .
51 67.25 11.08 45.118
54 91.63 12.88 39.203
51 73.71 10.47 45.790
57 59.08 9.93 50.545
49 76.32 . 48.673
48 61.24 11.5 47.920
52 82.78 10.5 47.467
44 73.03 10.13 50.541
45 87.66 14.03 37.388
45 66.45 11.12 44.754
47 79.15 10.6 47.273
54 83.12 10.33 51.855
49 81.42 8.95 40.836
51 77.91 10.00 46.672
48 91.63 10.25 46.774
49 73.37 10.08 50.388
44 89.47 11.37 44.609
40 75.07 10.07 45.313
44 85.84 8.65 54.297
42 68.15 8.17 59.571
38 89.02 9.22 49.874
47 77.45 11.63 44.811
40 75.98 11.95 45.681
43 81.19 10.85 49.091
44 81.42 13.08 39.442
38 81.87 8.63 60.055

Solution: Data Fit;

infile "/folders/myfolders/Fitness.txt";

input Age Weight Runtime Oxygen;


run;

proc print data=Fit;

run;

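proc corr data=fit plots=matrix(histogram); /* correlations of all four variables; the plot needs ODS Graphics */

var Age Weight Runtime Oxygen;

run;

proc univariate data=fit;

Histogram;

qqplot/normal(mu=est sigma=est);

run;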
Regression
Consider the Gallup Dataset sent to you. Do the following questions:

1. Bring the gallup.txt data into SAS and save the data as a permanent SAS data set.
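Solution: libname mylib "/folders/myfolders"; /* permanent library; the libref mylib is an assumed name */
data gallup mylib.gallup; /* WORK copy for the later steps plus a permanent copy */
infile "/folders/myfolders/gallup.txt";
input location age race gender education emp wage hours weeks salary income
disloc train monthu rate;
run;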
2. Display the contents of your data file.
Solution: proc contents data=gallup;
Title "Contents of the Dataset";
run;

3. Display the descriptive statistics of all of the variables.


Solution: proc means data=gallup;
Title "Descriptive Statistics";
run;

4. Display the descriptive statistics of age, employment status, and wage.


Solution: proc means data=gallup;
var emp age wage;
run;
5. Display a frequency table of education.
Solution: proc freq data=gallup;
tables education;
Title " Frequency Table Education";
run;

6. Create a new temporary data set that contains only the variables age, race, gender, and
education for Pittsburgh.
Solution: data temp;
set gallup;
where location=1; /* assumption: 1 is the location code for Pittsburgh; substitute the actual code */
keep age race gender education;
run;
proc print data=temp;
Title " Temp Dataset";
run;

7. Display the cross tabulation of race and gender for Pittsburgh observations.
Solution: proc freq data=temp;
tables race*gender;
Title "Cross Tabulation Table";
run;
Exercise 2.
Write one SAS program to do all of the following:
1. Bring in SAS data gallup.txt into a new temporary data set. Drop the observations that have
a salary of 0.
Solution: data temp_new;
set gallup;
if salary=0 then delete;
run;
Title "***New Gallup Dataset***";
proc print data=temp_new;
run;

2. Create a dummy variable that takes on the value 1 if an individual’s salary is greater than
$20,000 and equals 0 otherwise.
Solution: data temp2;
set gallup;
if salary gt 20000 then var=1;
else var=0;
run;
title "****Temp2****";
proc print data=temp2;
run;
3. Display the mean age for high and low income individuals. To do this, you must first sort by
your salary dummy variable.
Solution: proc sort data=temp2
out=Sorted_Temp;
by descending var;
run;
proc means data=sorted_Temp mean;
class var;
var age;
run;
4. Display a frequency distribution of your dummy variable.
Solution: proc freq data=temp2;
tables var;
run;

5. Estimate a simple and a multiple regression where salary is the dependent variable. Use the
explanatory variables of your choice.

Solution: proc reg data=temp_new;


model salary=education;
output out=SLR PREDICTED=PRED_SALARY;
Title " Simple Linear regression";
run;
proc reg data=temp_new;
model salary=education age gender ;
output out=MLR PREDICTED=PRED_SALARY;
Title " Multiple Linear Regression";
run;
PROJECT: Baseball Player Performance
The Baseball dataset contains details of baseball players in the year 1986. The data also has parameters
depicting performance of the players and their career records.

Do the following using SAS:

a) Import the data in SAS.

Solution: proc import datafile="/folders/myfolders/baseball.xlsx"

out=work.baseball

DBMS=xlsx

replace;

run;

proc print data=work.baseball;

run;
b) Generate Descriptive Statistics of the entire data.

Solution: proc means data=work.baseball;

run;

c) Generate a list of the top 5 Home Run Players.

Solution: proc sort data=work.baseball


out=baseball_data;

by descending nHome;

run;

data top_5H;

set baseball_data (obs=5);

run;

Title "Top 5 Home Run Scorer";

proc print data=top_5H;

run;

d) Generate a list of the top 5 paid Players.

Solution: proc sort data=work.baseball

out=baseball2;

by descending Salary;

run;

data Top_paid;

set baseball2 (obs=5);

run;

title "Top 5 paid Player";

proc print data=top_paid;

run;
e) Find the impact of Home Runs on Salary using Linear Regression.

Solution: proc reg data=work.baseball;

Model Salary=nHome;

output out= Predicted predicted=Pred_Salary;

title "Regression analysis(Salary~nHome)";

run;
f) Add more explanatory variables: nAtBat, nHits, nHome, nRuns, nRBI, nBB, nOuts, nError.

Solution: proc reg data=work.baseball;

Model Salary=nHome nAtBat nHits nRuns nRBI nBB nOuts nError;

output out=Pred_Salary residual=resid Predicted=Pred;

title "Regression analysis 2";

run;
g) Identify from the results, which factors have high impact on Salary in comparison to Home Runs.

Solution: From the above results, nHits, nBB, nOuts and nAtBat are significant factors for
Salary, as their p-values are less than 0.05. The p-value for nHome is 0.7838 (> 0.05), so
nHome is not a significant predictor of Salary once the other variables are included. nRuns,
nRBI and nError also have p-values above 0.05 and are not significant. So nHits, nBB,
nOuts and nAtBat have a higher impact on Salary than nHome.
h) Calculate performance scores (ps) by applying the following formula:
ps= 3*nHome + 0.5*nHits + 1*nRuns +1* nAtBat - 1*nRBI + 0.3*nBB + 2*nOuts - 1*nError

Solution: data Performance_score;

set work.baseball;

ps=3*nHome + 0.5*nHits + 1*nRuns +1* nAtBat - 1*nRBI + 0.3*nBB +

2*nOuts - 1*nError; /* plain assignment; no DO loop is needed */

run;

proc print data=Performance_score;

run;

i) Calculate the impact of Performance Scores (ps) on Salary.

Solution: proc reg data=performance_score;

model Salary=ps;

output out=performance_score Predicted=Pred;

run;
j) Explain the results.

Solution: From the above results, ps is statistically significant, since its p-value
(<0.0001) is less than 0.05. However, the adjusted R-square is only 0.1573, so ps explains
only about 16% of the variability in salary. Salary is related to ps, but the model has
weak explanatory power.
