Chapter 11: Survival analysis using R
Figure 11.1, Kaplan–Meier curve
Figure 11.2 Kaplan–Meier curve with logrank test
Figure 11.4 Output for Kaplan–Meier estimates of survival and logrank test
Figure 11.1, 11.2 and 11.4 Code
Click here to show code with comments# Tell R where you will be working. R will look here for your data: setwd("~/Documents/Books/Presenting/data/rData/") # See chapter 2 for more details on the setwd() function. # Use the load() function to load the dataset pvd.RData out of the current # folder/directory into R's interactive working memory/environment. See chapter # 2 for more details. load("pvd.RData") # The code shows how to make a data frame that keeps the 4 variables of interest # for the records/observations where the month variable is not missing and the # outcome is not blank. The logic checks inside of the [ ] generate # values indicating which rows and columns to include in the new data frame. # Read the syntax as: [ row check , column check ]. Remember that # R uses == to check for equality and ! means "not". ! changes true into false # and it changes false into true. The is.na() function checks for missing # values and returns a vector of true and false values. usefulPvd<- pvd[! is.na(pvd$months) & ! pvd$outcome == "", # rows to keep , c("months", "outcome", "diabetes", "age70")] # columns to keep # The c() function used above can glue together a set of similar things, # in this case, a bunch of character stings, to make a vector. # The factor() function can be used to remake a categorical factor variable # without the blank level. See chapters 2 and 5 for more information on the # factor() function. usefulPvd$outcome <- factor(usefulPvd$outcome) usefulPvd$diabetes <- factor(usefulPvd$diabetes, levels = 0:1, labels = c("Healthy", "Diabetic") ) # The table() function does simple cross tabulation tables. table(usefulPvd$outcome) # Count the alive and the dead people. # Load the survival package for survfit() and survdiff() functions. library(survival) pvdSurvival <- with(usefulPvd, # Predict number of months of survival with diabetes. # The as.numeric() function coerces the outcome variable to be # numeric if it is not already numeric(outcome). # The survfit() function makes a survfit object with details # on a Cox model. It can be used to show median survival # information by typing the object name or it can print a # survival curve when used with plot, for example, # plot(pvdSurvival) # The Surv() function makes an object that has information # on the outcome/survival. It is typically used like this # with the survfit() function. survfit(Surv(months, as.numeric(outcome)) ~ diabetes) ) with(usefulPvd, # Predict number of months of survival with diabetes. The survdiff() function # with rho = 0 does a log-rank test to compare the survival in the # predictor factor diabetes. survdiff(Surv(months, as.numeric(outcome)) ~ diabetes, rho = 0) ) # Use the plot() function to draw the KM curve. Because the object it is drawing # is a "survfit" object, it draws the KM curve. plot(pvdSurvival, # survival plot using model from above lty=c(2,1), # solid line, dotted line col = c("black", "red"), # first line red xlab = "Time in months", # label x-axis ylab = "Probability of survival", main = "Kaplan-Meier survival estimates by diabetes" ) # The legend() function adds a legend to an existing plot created using the plot() # function. You can set many options. Type ?par to see options like the line # type (lty) or the line colors (col). legend(10, # legend at x-axis position 10 .3, # legend at y-axis position .3 # The levels() function can be used to get the factor labels for a # variable. levels(usefulPvd$diabetes), # the values that are in the key lty = c(2, 1), # Specify a dashed and solid line for the # key. col = c("black", "red") ) # A user-written function that produces a graphic, including the number left at # risk, can be found here: # http://biostat.mc.vanderbilt.edu/wiki/Main/TatsukiRcode # If you save it as an r file called TatsukiRcodeKMplot.r in your working # directory, then the source() function will include it when you run your code. source("TatsukiRcodeKMplot.r") kmplot(pvdSurvival, lty.surv=c(2,1), # dashed, solid lines lwd.surv=1, # width of curves col.surv=c("black", "red"), # color of curves group.names=c("Normal", "Diabetic"), # labels group.order = c(2,1), # normal above diabetic extra.left.margin=3, # leave room for labels xlab = "Time in months", # label x-axis ylab = "Probability of survival", # label y-axis legend = FALSE) # no legend in body of picture # The title() function can add titles to many common plots including graphics # created by the plot(). To learn more look here: # http://www.statmethods.net/advgraphs/axes.html title("Kaplan-Meier survival estimates by diabetes showing number at risk by time", adj=.5, # centered (default) font.main=1, # which font line=0.5, # distance from plot cex.main=1) # size of font # The text() function can be used to add annotation to plots produced by the # plot() function. The first argument gives the x-axis location, the second # is used for the y-axis and the third is the text that is added to the plot. text(90, .95, "P (logrank) = 0.19") # Center extra text at (90, .95)
Figure 11.3
The data for this graphic are not available.
Figure 11.5 Output for Cox regression with one predictor available
Figure 11.5 Code
Click here to show code with commentssetwd("~/Documents/Books/Presenting/data/rData/") load("pvd.RData") usefulPvd$diabetes <- factor(usefulPvd$diabetes, levels = 0:1, labels = c("Healthy", "Diabetic") ) pvdSurvivalCox <- with(usefulPvd, # The coxph() function fits a cox proportional hazards model. coxph(Surv(months, as.numeric(outcome)) ~ hasDiabetes) ) # The summary() function pays attention to the type of object it is summarizing. # Here, because the object class is coxph, it gives the model, the coefficients, # and several statistics. See chapters 2 and 9 for more details. summary(pvdSurvivalCox) # Display the Cox model.
Figure 11.6 Output for Cox regression with more than one predictor variable
Box 11.8 Presenting Cox regression analysis
Figure 11.6 and Box 11.8 Code
Click here to show code with commentssetwd("~/Documents/Books/Presenting/data/rData/") load("pvd.RData") pvdSurvCox2 <- with(usefulPvd, # Predict number of months of survival with diabetes and # age > 70 indicator. Notice that the Surv() function # call is on the left side of the ~ which indicates that # it is the outcome of the model. coxph(Surv(months, as.numeric(outcome)) ~ diabetes + age70) ) summary(pvdSurvCox2) # Display the Cox model.