*************************************************************;
* Example of survival-time data with two groups
*   using `proc lifetest' in SAS:
*   (i) Kaplan-Meier plots for both groups
*   (ii) The Cox-Mantel and a Gehan-Wilcoxon-like test for a
*      difference in survival between the two groups.
*
* The example below has survival times in months for two groups of
*   5 patients. The `Treated' group had a series of three
*   chemotherapy agents in rotation after cancer surgery. The
*   `Control' group had the same surgery but no chemotherapy.
*
* In this example, SAS needs a dataset with values for the following
*   three fields for each of the 10 patients:
*      Group   (whether Treated or Control)
*      Months  (Lifetime or Censored time in months)
*      Status  (0 if death observed, 1 if censored)
*
* It is easiest to think of a SAS dataset as a matrix with records
*   for individuals as rows and columns for variables. Here the SAS
*   dataset has 10 rows and 3 columns, with `group, months, status'
*   as the 3 columns. In general, a `variable' in SAS is the same as
*   a column in a SAS dataset.
*
* The data step below begins with the statement `data cmf', where
*   `cmf' is the name that we give the dataset. We need to write one
*   record for each of 10 patients with values for three SAS
*   variables. The data set could have lots of other variables, but
*   we only need these three. The `input' statement reads one row at
*   a time from the lines strictly between `datalines' and the next
*   line that contains a semicolon. One record with values for group,
*   months, and status is written for each row read, so that the
*   resulting SAS dataset looks very much like the data in the
*   datalines block.
*
* The data set is displayed by the `proc print' step. In general, it
*   is very important to make sure that SAS's version of the data
*   is the same as you think it is. (In this case, there is
*   relatively little that could go wrong.)
*
* The `title' statements below say to put these two lines at the top
*   of every page of output, so that we know what problem the output
*   is talking about.
*
* The `options' statement makes the printed output more readable.
*   The default SAS page size is (or used to be in some versions)
*   150x30, not 80x60. The `options' statement specifies a page size
*   of 75x60.
*
* The data is from E.T.Lee and J.W.Wang,
*   ``Statistical Methods for Survival Data Analysis'', p109
*************************************************************;

title 'Survival times in months for two groups - YOUR NAME';
title2 'Two groups of patients after surgery';
options ps=60 ls=75 pageno=1 nocenter;

data cmf;
input group$ months status;
datalines;
Treated 16 1
Treated 18 1
Treated 20 1
Treated 23 0
Treated 24 1
Control 15 0
Control 18 0
Control 19 0
Control 19 0
Control 20 0
;

*************************************************************;
* Display the SAS dataset to make absolutely sure that it is
*   what we think it is.
*************************************************************;

proc print data=cmf;
title3 'The data as SAS sees it';
run;

*************************************************************;
* We now use SAS's `proc lifetest' to plot the survival curves S(t)
*   for the two groups and to compare the two groups using three
*   different statistical tests.
*
* The command `options ps=40' restricts output page height to
*   40 rows in order to get nicer-looking text plots.
*
* The option `lineprinter' below generates text plots as opposed to
*   plots on a higher-resolution graphics device. You will get
*   nicer-looking plots in Windows PC SAS if you leave this option
*   out. On the other hand, UNIX SAS on the ArtSci computer will
*   crash if you leave out `lineprinter', since it will not know
*   where to send your higher-resolution output.
*
* The syntax `strata group' tells SAS that `group' is a variable
*   that indicates different groups in `cmf'.
*   The statement `time months*status(1)' says that
*      (i) `months' is the survival or censoring time variable,
*      (ii) some of those times may be censored, depending on the
*         value of the variable `status',
*      (iii) the value(s) of `status' that indicate a CENSORED TIME
*         (as opposed to an observed death) are listed inside the
*         parentheses. That is, status=1 means a censored event and
*         any other value of status (such as status=0) means an
*         observed death.
*************************************************************;

options ps=40;
proc lifetest data=cmf plots=(s) lineprinter;
title3 'Lifetest plots and tests';
strata group;
time months*status(1);
run;

*************************************************************;
* The output of `proc lifetest' begins with a spreadsheet- or
*   life-table-like table for each group that is used to create
*   two Kaplan-Meier plots. You can suppress these fussy life tables
*   by adding the word `notable' after `plots=(s)' above and before
*   the semicolon on that line.
*
* The principal plotting options for the plots to include in
*   `proc lifetest' output are
*
*      s     for S(t)  (Kaplan-Meier plot(s))
*      ls    for -log S(t)  (linearity suggests an exponential
*               distribution)
*      lls   for log(-log S(t))  (linearity in log(t) suggests a
*               Weibull distribution)
*      h     for an estimate of the hazard function, h(t)
*      p     for an estimate of the probability density, f(t)
*
*   Here we only ask for S(t). If we wanted all five, we would
*   replace `plots=(s)' by `plots=(s,ls,lls,h,p)'.
*
* This is followed by output for two statistical tests that are
*   called the Log-Rank and Wilcoxon tests. The first table after
*   the Kaplan-Meier plots has the numerator of the corresponding
*   test statistics in the row marked `Treated'. Note that these
*   numerators are exactly the same as the numerators of the
*   Cox-Mantel and Gehan-Wilcoxon test statistics for this dataset
*   in the text and that we also computed in class.
*
* The next two tables in the output give the Variances of the
*   numerators of the Log-Rank and Wilcoxon statistics. Note that
*   the variance of the Log-Rank test is exactly the same as that of
*   the Cox-Mantel test in the text. The variance of the Wilcoxon
*   test statistic is very similar to that for the Gehan-Wilcoxon
*   test in the text, but not identical. Specifically, the text
*   has 57.78 for the variance, while the SAS output has 58.40.
*
* In fact, the Log-Rank test in `proc lifetest' is identical to
*   what the text calls the Cox-Mantel test. The Wilcoxon test in
*   `proc lifetest' is the same as the Gehan-Wilcoxon test, except
*   that the variance is calculated the same way as in the Cox-Mantel
*   test, as opposed to the permutation-test variance. That is, the
*   variance is that of a weighted Mantel-Haenszel test at distinct
*   observed death times.
*
* The P-value for the `-2Log(LR)' test at the end of the `proc
*   lifetest' output is only valid if the distribution of true
*   survival times is exponential, and is not valid otherwise. This
*   P-value should be ignored unless you know for some reason that
*   the true lifetimes are really exponentially distributed.
*************************************************************;

*************************************************************;
* ENTERING SURVIVAL DATA IN A MORE INTUITIVE WAY:
*   We would like to enter the data as
*
*      Treated 16+ 18+ 20+ 23 24+
*      Control 15 18 19 19 20
*
*   rather than the somewhat clumsy way that we entered the data in
*   the datalines block above.
*
* Clumsier data entry often means more typographical errors, as well
*   as possibly lots of work with a text editor to get the data in that
*   form. Also, data in published SAS programs are generally entered
*   in an appealing form and then transformed into the format that SAS
*   procedures need by commands in a data step. For those of you who
*   want to learn more about SAS programming, we show how to do this
*   here.
*
* First, we note that there are only two types of variables in SAS,
*   text and numerical, and it is generally not easy to switch between
*   them. If we entered the data as above, the `trailing +'s would
*   either flag those variables as text with unpleasant consequences,
*   or else the trailing +s would be ignored altogether. The same
*   would happen if we entered data as
*
*      Treated +16 +18 +20 23 +24
*      Control 15 18 19 19 20
*
*   However, we can enter the data as
*
*      Treated -16 -18 -20 23 -24
*      Control 15 18 19 19 20
*
*   The negative values cannot be taken literally, since survival times
*   must always be positive (or zero). Hence the only possible meaning
*   of `-16' is that this is a censored time whose true value is +16.
*   With these conventions in mind, we can now enter the data as
*   follows.
*************************************************************;

data cmf2;
retain group;
input zz$ @@;
if zz='Treated' or zz='Control' then group=zz;
else do;
  val=input(zz,12.0);
  status=(val<0);
  months=abs(val);
  output;
end;
drop zz val;
datalines;
Treated -16 -18 -20 23 -24
Control 15 18 19 19 20
;

*************************************************************;
* By default, `input' reads an entire line from a datalines block,
*   does with it what it will, and then reads the next line. Any
*   information in the line that is not immediately used is lost.
*   The `trailing @@' option in `input zz$ @@' tells SAS to read
*   one word at a time from the datalines block and not one line
*   at a time.
*
* `Retain group' tells SAS to retain the value of `group' over
*   successive input commands. Otherwise, SAS re-initializes all
*   variables to `missing'. This is normally the most reasonable
*   thing for SAS to do, but here we want to set the value of `group'
*   and keep it until the next group is encountered.
*
* `input zz$ @@' reads one word at a time from the datalines block,
*   treating the word that is read as text.
*   If this text word is `Treated' or `Control', we use it to set
*   `group', and then go on to read the next word.
*
* If zz is NOT `Treated' or `Control', the statement
*   `val=input(zz,12.0)' assumes that the text word `zz' is a number
*   in disguise, and reads its numerical value into the numerical
*   variable `val'. This is necessary since SAS is a strongly typed
*   language with regard to text and numerical variables. The
*   commands do--end act as parentheses (or grouping commands)
*   for the `else' statement. In particular, `end' does not mean
*   to literally end the program or data step.
*
* Like the language C, SAS converts any logical expression
*   (here that val<0) into the value 1 if it is true or 0 if it is
*   false. Thus status=(val<0) sets status=1 for a censored value
*   (val<0) and status=0 for an observed value (val>=0). The
*   statement `months=abs(val)' retrieves the absolute value of val,
*   which is the true survival or censoring time whether the time
*   is censored or not. Finally, `output' would tell SAS to write a
*   record with the available values of zz, group, val, status, and
*   months. However, the final command `drop zz val' in the data step
*   says to ignore the values of zz and val, so that the record that
*   we write only has fields for group, status, and months.
*
* There is a subtlety here. Note that the data step `data cmf' did
*   NOT have an `output' statement. The convention is that if you
*   ever say `output' in a data step, then SAS will write a record
*   in that data step ONLY when you say `output'. Otherwise, SAS
*   writes exactly one record for each pass through the data step.
*   All that is gained by the use of `output' here is that records
*   are not written for zz=`Treated' and zz=`Control'. These records
*   would have missing values for months and status. SAS would
*   ignore these records anyway most of the time.
*   We only exclude them here so that we can say that we got exactly
*   the same records as in the first dataset (other than the order of
*   the columns).
*
* Finally, we display the second SAS dataset to show that it is
*   (except for the order of the columns) exactly the same as the
*   first dataset.
*************************************************************;

proc print data=cmf2;
title3 'THE SECOND DATASET IS THE SAME AS THE FIRST DATASET !';
run;
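
*************************************************************;
* A POSSIBLE FURTHER CHECK (a sketch only, not part of the
*   original example):
*
* Rather than comparing the two printouts by eye, we could ask SAS
*   to compare the datasets with `proc compare'. Variables are
*   matched by name, so the different column order should not
*   matter. Observations are matched by position, which assumes
*   that both datasets list the patients in the same order, as
*   they do here.
*
* As a second sketch, we rerun `proc lifetest' on `cmf2' with the
*   `notable' option described earlier to suppress the life tables.
*   If the two datasets really agree, the Log-Rank and Wilcoxon
*   results should match those obtained from `cmf'. The title text
*   below is our own choice.
*************************************************************;

proc compare base=cmf compare=cmf2;
title3 'Comparing the first and second datasets';
run;

options ps=40;
proc lifetest data=cmf2 plots=(s) notable lineprinter;
title3 'Lifetest on the second dataset with life tables suppressed';
strata group;
time months*status(1);
run;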