Chapter 5: Introduction to presenting statistical analyses using R

Box 5.3 Assessing non-response bias

Box 5.3 Code

# Tell R where you will be working.  R will look here for your data. 
# You could also add the path to the load statements below.
setwd("~/Documents/Books/Presenting/data/rData/")
 
# Load the nonResponse dataset from the current folder/directory into R's 
# interactive working memory/environment. See chapter 2 for more details.
load("nonResponse.RData")
 
 
# The factor() function below replaces the character string variable 
# called race in the nonResponse dataset with a variable, also called race.  R 
# will treat the new variable as an ordered set of categories when it does 
# statistics and graphics.
 
# The factor() function uses a comma-delimited set of details. Programmers call 
# the details that a function understands "arguments". Notice the ( following
# the word factor and the commas ending the lines with the arguments until 
# the closing ) .
nonResponse$race <- factor(nonResponse$race, 
                           # The c() function glues together a set of similar 
                           # things, for example, a bunch of numbers, to make a 
                           # vector.  The levels argument expects a vector. 
                           # In this case, the vector stores the character 
                           # strings that are the actual values of the race 
                           # variable.
                           levels = c("White", "African origin", "Asian"), 
                           ordered = TRUE
                           )
nonResponse$sex <- factor(nonResponse$sex, 
                          levels = c("Male", "Female"), 
                          ordered = TRUE
                          )
 
# The with() function, used below, allows you to work with the variables within 
# a data frame  without having to repeatedly mention the data frame.  The code 
# below could have been written as:
#      table(nonResponse$race, nonResponse$sex)
#
# Another option is to attach the data frame  at the "top" of the search list with
# code like this:
#       attach(nonResponse)
#       table(race, sex)
#       detach(nonResponse)
# That strategy can work if you are only working with a single data frame, but 
# problems will ensue if you forget to detach the data and you have variables 
# with the same name in multiple data frames.  For example, if you are working 
# to predict who has a sexually transmitted infection and you have a variable 
# called sex in a data frame called coitus and another variable called sex in 
# a data frame called demographics, you can easily use a person's gender by 
# accident when you intended to use an indicator for sexual activity. Look 
# here for a good brief introduction:
#     http://www.r-bloggers.com/to-attach-or-not-attach-that-is-the-question/
 
# Make a summary table of the race and sex variables found in the nonResponse 
# data frame.
with(nonResponse,  # Use the nonResponse dataset. 
     # The table() function makes frequency count tables. Here it will make a table
     # for the 3 levels of race and the 2 levels of sex.
     table(race, sex)
     )

Color coding created by Pretty R at inside-R.org
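
The factor(), with() and table() pattern used in Box 5.3 can be tried without the book's data. The sketch below is a minimal illustration on a toy data frame; the data frame name toy and all of its values are hypothetical, not taken from nonResponse.RData.

```r
# Toy data: hypothetical values, not the nonResponse dataset.
toy <- data.frame(
  race = c("Asian", "White", "White", "African origin", "White"),
  sex  = c("Female", "Male", "Female", "Male", "Male"),
  stringsAsFactors = FALSE
)

# The levels argument fixes the order in which categories appear in tables
# and graphs (the default would be alphabetical order).
toy$race <- factor(toy$race, levels = c("White", "African origin", "Asian"))
toy$sex  <- factor(toy$sex, levels = c("Male", "Female"))

# Cross-tabulate the two factors without repeating the data frame name.
counts <- with(toy, table(race, sex))
counts
```

The levels argument alone controls the display order; adding ordered = TRUE, as the box does, additionally marks the variable as ordinal.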

Box 5.4 Population profile from a report

Box 5.4 Code

setwd("~/Documents/Books/Presenting/data/rData/")
 
load("lbp.RData")
 
lbp$time <- factor(lbp$time, 
                   levels = 1:5,  # shorthand for c(1, 2, 3, 4, 5)
                   labels = c("Less than 1 week", 
                              "1 week to < 4 weeks",
                              "4 weeks to < 8 weeks",
                              "8 weeks to < 6 months",
                              "6 months and over"
                              ),
                   ordered = TRUE
                   )
 
lbp$pain <- factor(lbp$pain, 
                   levels = 1:6, 
                   labels = c("No pain at all", 
                              "Little pain",
                              "Moderate pain",
                              "Quite bad pain",
                              "Very bad pain",
                              "Almost unbearable pain"
                              ),
                   ordered = TRUE
                   )
 
with(lbp,  # Use the lbp dataset.      
     # The na.omit() function returns the non-missing data.  To learn about how
     # it handles missing data look here:
     # http://www.ats.ucla.edu/stat/r/faq/missing.htm 
     table(na.omit(time)) # Make a frequency table after removing missing data.
     )  
with(lbp, 
     # Calculate percentages by dividing the counts in a frequency table by
     # the number of non-missing elements and multiplying by 100.
     # The round() function takes two arguments.  The first is a number and 
     # the second is the number of decimal places.  Here, round() returns a 
     # number with 0 decimal places. The length() function says how many items 
     # were found.  Here it counts the time assessments after the missing times
     # are removed. 
     round(100 * table(na.omit(time))/length(na.omit(time)), 
           0
           )
     )
 
with(lbp, 
     table(na.omit(pain)) 
     )  
with(lbp, 
     round(100 * table(na.omit(pain))/length(na.omit(pain)), 
           0
           )
     )
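
The percentage calculation above can also be written with prop.table(), which divides each count by the table total. The sketch below is a minimal illustration on a hypothetical toy factor (not the lbp data) and checks that both approaches agree.

```r
# Toy factor with one missing value (hypothetical data, not the lbp dataset).
pain <- factor(c(1, 2, 2, 3, NA, 3, 3, 1),
               levels = 1:3,
               labels = c("No pain", "Little pain", "Moderate pain"))

# table() leaves out NA by default, so prop.table() divides each count by the
# number of non-missing values.
pct <- round(100 * prop.table(table(pain)), 0)

# The same calculation written the way the box above does it.
manual <- round(100 * table(na.omit(pain)) / length(na.omit(pain)), 0)

pct
```

Because each percentage is rounded separately, the rounded values do not always add up to exactly 100.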

Table 5.1 Single Concise table from a paper with adjusted and unadjusted estimates

Table 5.2 Table suitable for an oral presentation

Table 5.1 Early Pregnancy Study

Table 5.1 and 5.2 Code

Figure 5.2 Several binary variables on one graph suitable for a poster or talk

Figure 5.2 Several binary variables on one graph suitable for a poster or talk (example)

Figure 5.2 Code

setwd("~/Documents/Books/Presenting/data/rData/")
 
load("vaginosis.RData")
 
# The function() function is used to make a new function called rnd() that will 
# be used below.  To start to learn about writing R functions look here:
#    http://www.r-bloggers.com/how-to-write-and-debug-an-r-function/
rnd <- function(x) {
  # To understand how the function works, remove the leading # in the four 
  # lines of code below and run the code one line at a time:
  #
  # table(vaginosis$bvyes)  # Make a frequency count.
  # prop.table(table(vaginosis$bvyes))  # Convert the counts to proportions.
  # prop.table(table(vaginosis$bvyes))[2]  # Only keep the 2nd number.
  # round(prop.table(table(vaginosis$bvyes))[2], 2)  # Round it to 2 decimals.
  #
  # That is, use the table() function to make a frequency table on x; turn that 
  # into a table of proportions by using the prop.table() function. Type 
  # ?prop.table and run the example code to see what it does.  Take the 
  # 2nd proportion (the yes category), multiply that by 100 to get a percentage 
  # and use the round() function to round it to two decimal places. 
  round(100 * prop.table(table(x))[2],  
        2
        )
}
 
# Apply the rnd() function to the bvyes variable for everyone in the vaginosis 
# dataset.
allScore <- with(vaginosis, rnd(bvyes))
allScore  # Print the rounded percentage.
 
# Apply the rnd() function to the bvyes variable for everyone who has ageunder25
# equal to 1 in the vaginosis dataset. The subset() function takes a data frame  
# as its first argument and a "logic check" argument, which is actually called 
# subset, which is used to determine which records/rows to include.  
ageunder25Score <- with(subset(vaginosis, ageunder25 == 1), rnd(bvyes))
blackethnicScore <- with(subset(vaginosis, blackethnic == 1), rnd(bvyes))
socialclass3to5Score <- with(subset(vaginosis, socialclass3to5 == 1), rnd(bvyes))
singleScore <- with(subset(vaginosis, single == 1), rnd(bvyes))
topScore <- with(subset(vaginosis, top == 2), rnd(bvyes))
 
# The data.frame() function can be used to convert (technically, coerce) its 
# arguments into a new data frame.  Here, two vectors are created. The first, 
# called what, holds character strings.  The second vector, called per, holds 
# numbers. They are glued together to form a data frame called theFrame that 
# has two columns and six rows.
theFrame <- data.frame(what = c("All women (1201)", 
                                "Under 25  (150)", 
                                "Afro-Caribbean (116)",
                                "Social Class 3-5 (415)",
                                "Single (94)",
                                "Previous termination (207)"
                                ),
                       per = c(allScore, ageunder25Score, blackethnicScore, 
                               socialclass3to5Score, singleScore, topScore
                               )
                       )
# The barplot() function draws bar charts.  It can use many arguments, some of
# which are shown below, to control the details of what is shown.  To learn 
# more, type ?barplot and look at the Usage and Examples section.  Also look
# here: http://www.statmethods.net/graphs/bar.html
barplot(theFrame$per, 
        names.arg = theFrame$what
        )
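
Picking the second cell of a frequency table only works when both categories occur in the group being summarized. The sketch below is a minimal illustration, assuming the outcome is coded 0/1 (an assumption; the coding of bvyes is not shown here). The toy data and the helper names pctSecond() and pctOne() are hypothetical.

```r
# Toy data (hypothetical): bvyes is a 0/1 outcome, single is a 0/1 indicator.
toy <- data.frame(bvyes  = c(0, 1, 1, 0, 1, 0, 0, 1),
                  single = c(1, 1, 0, 0, 1, 0, 1, 1))

# Table-based approach: percentage in the 2nd category of the table.
pctSecond <- function(x) {
  round(100 * prop.table(table(x))[2], 2)
}

# Alternative that assumes 0/1 coding: the mean of (x == 1) is the proportion
# of ones, even when a subgroup contains no ones at all.
pctOne <- function(x) {
  round(100 * mean(x == 1, na.rm = TRUE), 2)
}

allPct    <- pctOne(toy$bvyes)
singlePct <- with(subset(toy, single == 1), pctOne(bvyes))

# A group with no "yes" answers shows the difference between the two helpers.
noYes <- c(0, 0, 0)
pctSecond(noYes)  # The table has only one cell, so the 2nd cell is missing.
pctOne(noYes)
```

With real data it is worth checking table(x) for each subgroup before trusting the second cell.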

Figure 5.3 Graph for an ordered categorical variable suitable for poster or talk

Figure 5.3 Graph for an ordered categorical variable suitable for poster or talk (example)

Figure 5.3 Code

Figure 5.4 Graph comparing two groups of patients suitable for poster or talk

Figure 5.4 Graph comparing two groups of patients suitable for poster or talk (example)

Figure 5.4 Code

setwd("~/Documents/Books/Presenting/data/rData/")
 
load("xray.RData")
xray 
# Notice there is one record per observation.  That is, the data is in 
# long/narrow format.  The bar plot will want the data reorganized to have 
# one record per result.  That is, the data should be in wide format.  This  
# can be done with many functions.  The unstack() function is perhaps the
# simplest.
 
x <- unstack(xray, 
             percent ~ when # The percents are shown in the body of the new 
             # data frame .  The new column variables are based on
             # the "when" variable.
             )
 
# The row.names() function can give labels for the rows in a data frame.
row.names(x) <- c("Referred", "Not Referred")
 
# The reshape package has a function called rename() to easily rename variables.
# That function uses two arguments, the data frame and the details of the 
# change.
library(reshape)  
x <- rename(x,  # In the x data frame, rename the X1 variable to a readable label.
            c(X1 = "Less than 1 week")
            )
x <- rename(x, c(X2 = "1 week to < 8 weeks"))
x <- rename(x, c(X3 = "8 weeks to < 6 months"))
x <- rename(x, c(X4 = "6 months and over"))
 
# You can run into warnings/issues if two R packages have the same function.  
# A function called rename() is defined in several commonly used packages.
# So, it is good hygiene to unload reshape when you are done.
detach("package:reshape", unload = TRUE)  # Prevent conflicts with rename function.
 
 
barplot(as.matrix(x),  # Convert the x data frame  to be a matrix.  
        legend.text = row.names(x), 
        beside = TRUE,                  # Do a side by side bar chart. 
        col = c("grey", "red"),         # The bar colors 
        args.legend = list(bty = "n"),  # Put no box around the legend/key. 
        ylab = "Percentage of patients"
        )
title(main = "Length of present episode for patients referred for X-ray in observational study (N=427)")
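
The long-to-wide step can be tried on a small example, and the columns can also be renamed with base R's names() function, which avoids loading and unloading a package. The sketch below is a minimal illustration; the data frame long, its values and the column labels are hypothetical, not the xray data.

```r
# Toy long-format data (hypothetical percentages, not the xray dataset).
long <- data.frame(percent = c(10, 20, 30, 40),
                   when    = c(1, 1, 2, 2))

# unstack() makes one column per value of "when".  Because the values are
# numbers, R prefixes the new column names with "X".
wide <- unstack(long, percent ~ when)
originalNames <- names(wide)

# Rename the columns and label the rows using base R only.
names(wide)     <- c("Less than 1 week", "1 week or more")
row.names(wide) <- c("Referred", "Not referred")

# barplot() wants a matrix, so convert the data frame.
wideMatrix <- as.matrix(wide)
wideMatrix
```

Assigning to names() replaces all column names at once, so the new labels must be listed in the same order as the existing columns.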

Figure 5.5 Calculating the confidence interval of a geometric mean

Figure 5.5 Calculating the confidence interval of a geometric mean in R

Figure 5.5 Code
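
The code for this figure is not shown here. A common way to get a confidence interval for a geometric mean, offered as a sketch under that assumption rather than as the book's own method, is to work on the log scale: take logs, compute a t-based interval for the mean of the logs, and exponentiate the results. The measurements below are hypothetical.

```r
# Hypothetical positive, right-skewed measurements.
x <- c(1.2, 2.5, 3.1, 4.8, 7.9, 12.4)

logX <- log(x)
n    <- length(logX)

# 95% confidence interval for the mean of the logs, using the t distribution.
se    <- sd(logX) / sqrt(n)
ciLog <- mean(logX) + c(-1, 1) * qt(0.975, df = n - 1) * se

# Back-transform to the original scale.
geoMean <- exp(mean(logX))
geoCI   <- exp(ciLog)

round(c(geometricMean = geoMean, lower = geoCI[1], upper = geoCI[2]), 2)
```

The same interval can be obtained with exp(t.test(log(x))$conf.int). The back-transformed interval is not symmetric around the geometric mean, which is expected.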

Table 5.3 Presenting several types of variables with a common theme in one table

Table 5.3 Code

Figure 5.6 Using a graph to compare a distribution with a cut-off

Figure 5.6

Figure 5.6 Code
