Chapter 2: Introduction to the research process using SAS
Figure 2.1 Portion of a dataset showing data collected for participants in a screening survey (example)
Figure 2.1 Code
/* Create a library called "source". After the LIBNAME statement runs, you can
   refer to SAS datasets in that folder with the word source instead of typing
   the full path to the folder. Because of access = readonly, SAS can read data
   from that folder but cannot accidentally change it. You need to submit the
   LIBNAME statement only once per SAS session; it is repeated in each example
   for your convenience. */

/* Save the data into a folder and type an appropriate path to tell SAS where
   to find the data. Run only the one LIBNAME statement that matches your
   system; the others are commented out. */

/* Use a path like this for Windows. */
libname source "C:\Projects\Books\Presenting\data\sasData" access = readonly;

/* Use a path like this for SAS University Edition. */
* libname source "/folders/myfolders/Presenting/data/sasData" access = readonly;

/* For people who hate to type, this is the same SAS University Edition path. */
* libname source "~/Presenting/data/sasData" access = readonly;

/* Print the first 14 records without observation numbers, using variable
   labels and rounding numeric values. */
proc print data = source.rhinitis (obs = 14) noobs label round;
  /* id studyID; * If you had a study ID, it would go here to use it as a row label.; */
  /* No VAR statement is added, so all variables are listed. */
  /* var age yob; * You can list specific variables like this.; */
run;
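The comments above say that access = readonly lets SAS read a folder but not change it. The sketch below is a hypothetical illustration, not code from the book: so that it runs without the rhinitis data, it copies sashelp.class (a sample dataset that ships with SAS) into WORK and then points an assumed libref called demo at the WORK folder with access = readonly.

```sas
/* A minimal sketch of what access = readonly does. The libref "demo" and the
   dataset work.class_copy are hypothetical names used only for illustration. */

/* Make a small dataset from sashelp.class, a sample dataset that ships with SAS. */
data work.class_copy;
  set sashelp.class;
run;

/* Point a read-only libref at the folder that holds the WORK library. */
libname demo "%sysfunc(pathname(work))" access = readonly;

/* Reading through the read-only libref works. */
proc print data = demo.class_copy (obs = 5) noobs;
run;

/* Writing through the read-only libref stops with an error, so this step is
   left commented out:
data demo.class_changed;
  set demo.class_copy;
run;
*/
```

In the book's examples the same idea protects the folder of imported SAS datasets: procedures read from source, and any new or modified datasets go to WORK or another writable library.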
Figure 2.3 Checking for errors in the data (example)
Figure 2.3 Code
/* Create a library called "source". After the LIBNAME statement runs, you can
   refer to SAS datasets in that folder with the word source instead of typing
   the full path to the folder. Because of access = readonly, SAS can read data
   from that folder but cannot accidentally change it. You need to submit the
   LIBNAME statement only once per SAS session; it is repeated in each example
   for your convenience. */

/* Save the data into a folder and type an appropriate path to tell SAS where
   to find the data. Run only the one LIBNAME statement that matches your
   system; the others are commented out. */

/* Use a path like this for Windows. */
libname source "C:\Projects\Books\Presenting\data\sasData" access = readonly;

/* Use a path like this for SAS University Edition. */
* libname source "/folders/myfolders/Presenting/data/sasData" access = readonly;

/* For people who hate to type, this is the same SAS University Edition path. */
* libname source "~/Presenting/data/sasData" access = readonly;

/* Print the first 14 records without observation numbers, using variable
   labels and rounding numeric values. */
proc print data = source.rhinitis (obs = 14) noobs label round;
  /* id studyID; * If you had a study ID, it would go here to use it as a row label.; */
  /* No VAR statement is added, so all variables are listed. */
  /* var age yob; * You can list specific variables like this.; */
run;
Figure 2.4 SAS output to check data
Figure 2.4 Code
libname source "C:\Projects\Books\Presenting\data\SASData" access = readonly;

/* SAS datasets contain only two types of data: character and numeric. You
   cannot do math on character data. So, good programmers store "secret code
   numbers" (such as 1, 2, 3 for the season) as character strings, which
   prevents mistakes like accidentally calculating an average season. In SAS
   programs, character values are written in quotes; numeric values appear as
   unquoted numbers.

   SAS uses formats to change how data appear without changing the stored
   values. For example, the DOLLAR format, which is built into SAS, can make
   1234.5 appear as $1,234.50. SAS programmers can also define their own
   formats. A character format displays letters/words/phrases as different
   letters/words/phrases. The $ in the user-defined $season format below marks
   it as a character format: it displays words instead of the digit characters
   that are actually stored in the data. The rhinitis format below is a numeric
   format that displays the numbers 0 and 1 as words.

   Because these formats are stored in the WORK library, they can be used
   repeatedly during the current session, but they are erased when you quit
   SAS. The FORMAT statements in the PROC FREQ step below use the formats
   created here. */
proc format library = work;
  /* $ marks a character format: character codes are displayed as other text. */
  value $season
    "1" = "Dry season"
    "2" = "Wet season"
    "3" = "Anytime"
  ;
  /* Display numeric values with words. */
  value rhinitis
    0 = "No"
    1 = "Yes"
  ;
run;

/* Make a contingency table. */
proc freq data = source.rhinitis;
  label whenrhin = "When get rhinitis";  /* Add a descriptive label. */
  format whenrhin $season.;              /* Apply the format created above. */
  label rhinitis = "Rhinitis with a cold in last 12 months";
  format rhinitis rhinitis.;
  /* Make a two-way table of counts, but don't show row, column, or total
     percentages. */
  tables whenrhin * rhinitis / norow nocol nopercent;
run;

/* Show the default numeric summary (n, mean, sd, min, max) with at most
   2 decimal places. */
proc means data = source.rhinitis maxdec = 2;
  /* Use all variables whose names start with sy, d, or pu. */
  var sy: d: pu:;
run;

/* Make a statistical graphics plot. */
proc sgplot data = source.rhinitis;
  /* Make a histogram that shows counts instead of the default percentages,
     using 20 bins (bars). */
  histogram diast2 / scale = count nbins = 20;
run;
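The comments above say that formats change only how values appear, not what is stored. As a hypothetical check that is not part of the book's code, the short step below redefines the $season format so that it runs on its own, then writes a number with the built-in DOLLAR10.2 format and a season code with $season. to the log. The variable names amount, code, shown_amount, and shown_season are illustrative assumptions.

```sas
/* A minimal sketch: formats change the displayed text, not the stored value. */
proc format library = work;
  value $season "1" = "Dry season" "2" = "Wet season" "3" = "Anytime";
run;

data _null_;
  amount = 1234.5;                                /* stored numeric value     */
  code   = "2";                                   /* stored character code    */
  shown_amount = strip(put(amount, dollar10.2));  /* displays as $1,234.50    */
  shown_season = put(code, $season.);             /* displays as Wet season   */
  put amount= shown_amount= code= shown_season=;  /* write all four to the log */
run;
```

Because amount and code keep their original values, PROC MEANS would still average 1234.5 and PROC FREQ would still count the code "2"; the format only controls the text shown in output.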