Chapter 9: Analysing matched or paired data using R
Figure 9.1 Scatterplot for two variables
Figure 9.2 Output for Pearson’s correlation
Box 9.3 Presenting the results for Pearson’s correlation
Figure 9.1 Code
Click here to show code with comments# Tell R where you will be working. R will look here for your data: setwd("~/Documents/Books/Presenting/data/rData/") # See chapter 2 for more details on the setwd() function. # Use the load() function to load the dataset screening.RData out of the current # folder/directory into R's interactive working memory/environment. See chapter # 2 for more details. load("screening.RData") # Use the library() function to load the psych package for the describe() # function. See chapter 2 for more details on library(). library(psych) # The describe() funcion in the psych package does descriptive statistics # including a standard deviation. See chapter 2 for more details. Chapter 2 # also covers the with() function. with(screening, describe(parent) ) with(screening, describe(mdi) ) with(screening, # The cor.test() function calculates many correlation coefficients # including the Pearson correlation coefficient. cor.test(parent, mdi) ) with(screening, # Use the plot() function to make a scatterplot for parent and mdi. Plot() # makes a scatterplot by default for two integer variables. See chapter 7 # for more details. plot(parent, mdi, main = "Scatter plot of MDI by parent questionnaire score", # title xlab = "Parent score", ylab = "Mental development index") ) # The text() function can be used to add annotation to plots produced by the # plot() function. The first argument gives the x-axis location, the second # is used for the y-axis and the third is the text that is added to the plot. text(120, 52, "r = .68 (N = 64, p < .001)") # Center extra text at (120, 52).
Figure 9.3 Scatterplots for several variables
Figure 9.4 Calculating several correlations
Box 9.5 Presenting correlations between several variables


Figure 9.3, Figure 9.4 and Box 9.5 Code
Click here to show code with commentssetwd("~/Documents/Books/Presenting/data/rData/") load("babyanth.RData") with(babyanth, # The pairs() function produces a scatterplot matrix. Type ?pairs and look # in the Examples section to see the syntax. pairs(~Weight+Head+Arm+Length) # Scatterplot matrix with all 4 variables ) # Load the Hmisc package to use the rcorr() function. By default, it returns # a set of matricies holding Pearson's r, the sample size, and p values. library(Hmisc) # The rcorr() function expects a matrix for its first argument. The as.matrix() # function coerces four select columns of the babyanth data frame to be a matrix # object. The [ ] notation is used to filter a data frame. Here the [, # indicates that there are no filters on the rows and 1:4] indicates that only # the first 4 columns in the data frame are being used. rcorr(as.matrix(babyanth[, 1:4]) # Use columns 1 to 4 in the correlation matrix. )
Figure 9.5 Presenting scatterplots (a) with skewed data and (b) where data are transformed



Figure 9.5 Code
Click here to show code with commentssetwd("~/Documents/Books/Presenting/data/rData/") load("tbmeals.RData") tbmeals$logged <- log(tbmeals$tb) with(tbmeals, cor.test(meals, tb) # Do a Pearson correlation. ) with(tbmeals, # Plot parent by mdi. The plot() function makes a scatterplot by default # for two numeric variables. plot(meals, tb, main = "Raw data", # title xlab = "Free school meals (% children)", ylab = "TB rate / 100000 popn.") ) text(18, 33, "r = .45, df = 31, p < 0.0078") # Center extra text at (18, 33). with(tbmeals, cor.test(meals, logged) ) with(tbmeals, # Make a scatterplot of the parent by mdi variables. plot(meals, logged, main = "Log-transformed data", xlab = "Free school meals (% children)", ylab = "TB rate / 100000 popn.") ) text(18, 3.4, "r = .46, df = 31, p < .0068") # Center extra text at (18, 3.4).
Figure 9.6 Output for a rank test

Figure 9.6 Code
Click here to show code with comments
Figure 9.7 Scatterplot of two variables with linear regression line


Figure 9.7 Code
Click here to show code with commentssetwd("~/Documents/Books/Presenting/data/rData/") load("pefr.RData") with(pefr, # Make a scatterplot. plot(age, pefr, xlab = "Age in years", ylab = "PEER (1/min)" ) ) # Predict pefr with age and save the results as an object called theModel with # the lm() function. That object will be of class lm and will have many useful # attributes including the details on the modeling, the coefficients, the # residuals, etc. You can plot an object that is a lm with a statement like # plot(theModel) to get diagnostic plots. Other useful functions like # summary() and confint() are shown below. theModel <- with(pefr, # The lm() function can be used to fit a linear regression # model. The first argument is a formula which is written as # the outcome/dependent variable/predicted, then a tilde, then # the independent variable(s)/predictor(s). The syntax for the # models can become complex. lm(pefr ~ age) ) # The abline() function adds lines to a previous plot. Here it is passed the # lm object called theModel. When you pass abline a lm model, which is a simple # linear regression, it will use the regression formula to set the slope and # intercept for the line. abline(theModel) # Add a regression line based on the model.
Figure 9.8 Output for simple regression

Figure 9.8 Code
Click here to show code with commentssetwd("~/Documents/Books/Presenting/data/rData/") load("pefr.RData") # Load the psych package for the describe() function. library(psych) with(pefr, describe(pefr) # Describe the outcome variable. ) with(pefr, describe(age) # Describe the predictor variable. ) # Predict pefr with age and save the results as an object called theModel. theModel <- with(pefr, lm(pefr ~ age) ) # The summary() function pays attention to the type of object it is summarizing. # Recall from chapter 2 that summary() does five number summaries on a data # frame. Here the object is a linear model. You can see that by typing: # class(theModel) # Because the class is "lm", summary gives different details including the model # code that was submitted, the residuals, the coefficients and a few statistics. summary(theModel) # Print a summary of the model. confint(theModel, level = .95) # Print 95% CLs on the model coefficients.
Box 9.11 Presenting the results for several predictor variables


Box 9.11 Code
Click here to show code with commentssetwd("~/Documents/Books/Presenting/data/rData/") load("smokingbwt.RData") # Select five columns where the rows are not missing tar. momSmokes <- smokingbwt[! is.na(smokingbwt$tar1), c("birthwt", "cigsday1", "nic1", "tar1", "co1") ] describe(momSmokes) # Use the mothers with smoking data to predict birth weight with cigarettes. modelCigsday1 <- with(momSmokes, lm(birthwt ~ cigsday1) ) summary(modelCigsday1) # Print a summary of the model. # The confint() function prints confidence limits on models. Here it produces 95% # confidence limits on the modelCigsday1 model that was just created. confint(modelCigsday1, level = .95) # Print 95% CLs on the model coefficients. # Use the mothers with smoking data to predict birth weight with nicotine. modelNic1 <- with(momSmokes, lm(birthwt ~ nic1) ) summary(modelNic1) # Print a summary of the model. confint(modelNic1, level = .95) # Print 95% CLs on the model coefficients. # Use the mothers with smoking data to predict birth weight with tar. modelTar1 <- with(momSmokes, lm(birthwt ~ tar1) ) summary(modelTar1) # Print a summary of the model. confint(modelTar1, level = .95) # Print 95% CLs on the model coefficients. # Use the mothers with smoking data to predict birth weight with carbon monoxide. modelCo1 <- with(momSmokes, lm(birthwt ~ co1) ) summary(modelCo1) # Print a summary of the model. confint(modelCo1, level = .95) # Print 95% CLs on the model coefficients.
Box 9.14 Presentation of a regression line used for prediction
Figure 9.9 Scatterplot of two variables with linear regression line and 95% confidence intervals


Box 9.14 and Figure 9.9 Code
Click here to show code with commentssetwd("~/Documents/Books/Presenting/data/rData/") load("pefr.RData") # Load the ggplot package to get access to the ggplot() function and many # supporting functions including aes(), geom_point(), stat_smooth(), theme_bw(), # xlab(), ylab() and geom_text(). # To learn more about ggplot2 look here: # http://ggplot2.org/ # http://www.statmethods.net/advgraphs/ggplot2.html library(ggplot2) ggplot(pefr, # Use the pefr datasets. aes(age, pefr) # Plot x = age, y = perf. ) + geom_point() + # Draw a scatterplot. stat_smooth(method = "lm") + # Add a regresssion line with CLs. theme_bw() + # Use a white background with grey details. xlab("Age (years)") + # Add a label for the x axis. ylab ("PERR(1/min)") + # Add a label for the y axis. geom_text(aes(x, y), # Add text at the position in the provided data # frame. label = "PEER = 89.4 + 21.8 * Age", # The data argument expects a data frame. The data.frame() function # coerces the tiny vectors named x and y to form a data frame. data = data.frame( x = 8.5, y = 410) )