**********************************************************;
* Based on data from an actual study:
*
* A group of 432 former inmates were released from Maryland state
* prisons and followed for one year. The event of interest is the
* first rearrest after release (if any) and how it depends on a
* number of covariates. Former inmates who were not rearrested
* within the year are treated as censored observations. The purpose
* of the study was to see the effect on rearrest of financial aid
* given at release. The variable `fin' below was randomized with
* respect to the other covariates.
*
* Response variables:
*   week   - week of first arrest after release (range 1-52)
*   arrest = 1 if arrested during the follow-up year, otherwise
*            arrest=0 and week=52 (treated as censored).
*   114 were arrested during the year and 318 were not arrested.
*
* The covariates are
*   fin    - 1 if given financial aid on release, otherwise 0
*   age    - age in years on release (range 17-44)
*   race   - 1 if Afro-American, otherwise 0
*   wexp   - 1 if has ever had full-time work experience, else 0
*   mar    - 1 if married at time of release, otherwise 0
*   parole - 1 if released on parole, otherwise 0
*   priors - number of prior convictions (range 0-18)
*   educ   - a code for educational level (range 2-6)
*   emp1-emp52 - 52 variables, 1 if employed during that week,
*            otherwise 0
*
* Data originally from Rossi et al., ``Money, Work, and Crime:
* Some Experimental Results", Academic Press, 1980.
*
* The programs below were modified (some only slightly) from those in
* Paul D. Allison, ``Survival Analysis Using the SAS System:
* A Practical Guide", SAS Institute Press, 1995, Chapters 3-5.
**********************************************************;
title "RECIDIVISM FOR RELEASED PRISON INMATES";
options ls=75 ps=60 pageno=1 nocenter;

data recid;
  * Data downloaded from the SAS Web site;
  * 432 records, 62 variables;
  infile "alrecid.dat";
  input week arrest fin age race wexp mar parole priors educ emp1-emp52;
run;

**********************************************************;
* Are the covariates correlated? This might cause loss of
* significance due to some covariates `shadowing' others.
**********************************************************;
proc corr data=recid nosimple;
  * NOSIMPLE suppresses the table of simple statistics (mean, min, max, etc.);
  title2 "CORRELATIONS OF COVARIATES";
  * fin--educ means every variable from fin through educ, in data set order;
  var fin--educ;
run;

**********************************************************;
* The statistically significant correlations are:
*   Age    with Wexp    rho= 0.35, P<10^{-4}
*   Age    with Mar     rho= 0.17, P=0.0003
*   Age    with Priors  rho=-0.10, P=0.0361
*   Wexp   with Mar     rho= 0.25, P<10^{-4}
*   Wexp   with Priors  rho=-0.26, P<10^{-4}
*   Wexp   with Educ    rho= 0.18, P=0.0002
*   Race   with Educ    rho= 0.10, P=0.0473
*   Parole with Priors  rho=-0.13, P=0.0086
*   Priors with Educ    rho=-0.18, P=0.0002
*
* Recall that Wexp=0,1, with Wexp=1 if the person has ever had
* any full-time work experience.
*
* The following displays a horizontal bar chart of `week';
*
* This shows that the times of arrest of those who were rearrested
* were more or less uniformly distributed over the 52 weeks.
**********************************************************;
proc chart data=recid;
  title2 "A HORIZONTAL BAR CHART";
  hbar week / discrete;
run;

**********************************************************;
* How do the covariates affect the time to rearrest?
* Try a Cox regression using the first 8 covariates.
* All released inmates were apparently followed for 52 weeks. If
* a former inmate was not rearrested during the study, then
* arrest=0 and week=52. Thus records with arrest=0 should be
* treated as censored, which is what week*arrest(0) in the MODEL
* statements below specifies.
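*
* A sketch of what this means, in standard notation (x_i is the
* covariate vector of inmate i and b the coefficient vector): with
* no tied rearrest times, Cox regression maximizes the partial
* likelihood
*
*   L(b) = \prod_{i:\, arrest_i = 1}
*          \frac{\exp(b^T x_i)}{\sum_{j:\, week_j \ge week_i} \exp(b^T x_j)}
*
* Each rearrest contributes one factor, while censored records
* (arrest=0, week=52) enter only through the risk sets in the
* denominators. When several rearrests share a week, this formula
* needs adjusting.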
* The bar chart shows that there are a fair number of ties among
* the 114 ``observed death times'' (rearrests), so a
* tie-correction method is appropriate.
**********************************************************;
proc phreg data=recid;
  title2 "COX REGRESSION (DEFAULT BRESLOW TIE CORRECTION)";
  title3 "IGNORE TIES AND USE DEFAULT LIKELIHOOD";
  model week*arrest(0) = fin age race wexp mar parole priors educ;
run;

proc phreg data=recid;
  title2 "COX REGRESSION: `EXACT' MODEL TIE CORRECTION";
  title3 "SUM LIKELIHOOD OVER ALL POSSIBLE ORDERINGS OF TIE GROUPS";
  model week*arrest(0) = fin age race wexp mar parole priors educ
        / ties=exact;
run;

proc phreg data=recid;
  title2 "COX REGRESSION: EFRON TIE CORRECTION";
  title3 "EFRON'S LINEAR APPROXIMATION TO `EXACT' MODEL";
  model week*arrest(0) = fin age race wexp mar parole priors educ
        / ties=efron;
run;

proc phreg data=recid;
  title2 "COX REGRESSION: `DISCRETE' MODEL TIE CORRECTION";
  title3 "CONDITIONAL LOGISTIC REGRESSION AT EACH REARREST TIME";
  model week*arrest(0) = fin age race wexp mar parole priors educ
        / ties=discrete;
run;

**********************************************************;
* Results were nearly identical for all four tie-correction methods.
*
* In all cases, the only significant covariates are
*   Age (P=0.008)   Priors (P=0.004)
*
* Fin is close to being significant (P=0.06), with Educ (P=0.16)
* the next closest. The hazard ratios (exp(betahat)) are
*   Age (0.94)   Priors (1.09)   Fin (0.70)
*
* These are hazard ratios per unit of each covariate, so they indicate
* a drop in the hazard of rearrest of about 5-6% for each additional
* year of age at release, an increase of about 9-10% for each prior
* conviction, and a drop of about 30% if financial aid is given at
* release.
*
* Perhaps correlated covariates are interfering with one another?
* Try model selection to find out:
*
* Stepwise selection begins with 0 covariates and repeatedly adds the
* most significant remaining covariate (with significance computed
* given the covariates already in the model) until all remaining
* covariates are nonsignificant.
*
* Backward elimination begins with all covariates and removes the
* least significant covariate (again computed given the others in the
* model) until all remaining covariates are significant.
*
* Score selection lists the best models for each number of
* covariates, using the score-test chi-square as the measure of fit.
*   `Best=3' says to list the best 3 models for each number of covariates.
*   `Stop=5' says not to list models with more than 5 covariates.
**********************************************************;
proc phreg data=recid;
  title2 "COX REGRESSION: STEPWISE MODEL SELECTION (EXACT TIE CORRECTION)";
  model week*arrest(0) = fin age race wexp mar parole priors educ
        / ties=exact selection=stepwise;
run;

proc phreg data=recid;
  title2 "COX REGRESSION: BACKWARD MODEL SELECTION (EXACT TIE CORRECTION)";
  model week*arrest(0) = fin age race wexp mar parole priors educ
        / ties=exact selection=backward;
run;

proc phreg data=recid;
  title2 "COX REGRESSION: SCORE MODEL SELECTION (EXACT TIE CORRECTION)";
  model week*arrest(0) = fin age race wexp mar parole priors educ
        / ties=exact selection=score best=3 stop=5;
run;

**********************************************************;
* Stepwise selection adds priors, then adds age, then stops, exactly
* as suggested by the highly significant covariates in the model
* with all 8 covariates.
*
* Backward elimination removes, one at a time, every variable EXCEPT
* age and priors, with fin (P=0.0682) the last variable removed.
*
* Both results argue against the existence of groups of
* highly correlated covariates that are disguising one another's
* significant effects.
*
* The best models determined by score selection are
*   2 variables: age priors
*   3 variables: age priors fin
*   4 variables: age priors fin mar
*
* How about including the CURRENT (weekly) employment status as
* a covariate?
*
* This will be a TIME-DEPENDENT effect, since employment status
* varies from week to week.
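*
* A sketch of the resulting model in standard notation: the hazard of
* rearrest for inmate i in week t is
*
*   h_i(t) = h_0(t) \exp( b^T x_i + g \cdot emp_{i,t} )
*
* where h_0(t) is an unspecified baseline hazard, x_i holds the eight
* fixed covariates, emp_{i,t} is the employment indicator of inmate i
* for week t, and exp(g) is the hazard ratio for being employed
* versus unemployed in that week.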
*
* Recall that empk=1 iff the released inmate was employed during
* the k-th week, presumably as measured by a single spot check.
*
**********************************************************;
proc phreg data=recid;
  title2 "COX REGRESSION:";
  title3 "ADDING A HAZARD FOR EMPLOYMENT DURING THE WEEK OF ARREST";
  model week*arrest(0)=fin age race wexp mar parole priors educ employed
        / ties=exact;
  array emplx{*} emp1-emp52;
  employed=emplx{week};
  * NOTES: We want to say `employed = emp(week)', but the notation
  * emp1-emp52 does not allow indexing. SAS lets you define arrays
  * that are essentially alternative names for existing variables,
  * so that those variables can be indexed (as above). Unlike arrays
  * in many other programming languages, SAS arrays are essentially
  * arrays of pointers to variables, not arrays of values.
  *
  * At each rearrest time, PHREG evaluates these statements for every
  * inmate still at risk, with week set to that rearrest time, so
  * emplx{week} is the employment status of each at-risk inmate in
  * the week of that rearrest.
  *
  * SAS arrays are mostly used in DATA steps, where the usual syntax
  * is round parentheses (). However, in PROC PHREG programming
  * statements, you must use curly braces {} instead;
run;

**********************************************************;
* The time-dependent variable `employed' is highly significant
* (P<0.0001). Age and priors are still significant. Its hazard
* ratio is around 0.26, i.e. a relative hazard of about 1/0.26=3.8
* for NOT being employed.
* HOWEVER, this model is potentially circular: someone who is
* arrested early in a week might be scored as NOT working later in
* that week simply because he is in jail. This would artificially
* increase the correlation between being unemployed and being
* rearrested.
* Next, try regression on the employment status during the
* PREVIOUS week:
**********************************************************;
proc phreg data=recid;
  title2 "ADDING A HAZARD FOR EMPLOYMENT THE WEEK BEFORE THE ARREST";
  model week*arrest(0)=fin age race wexp mar parole priors educ emp_prev
        / ties=exact;
  array emplx{*} emp1-emp52;
  emp_prev=emplx{week-1};
  where week>1;   * Skip records with week=1;
run;

**********************************************************;
* `Emp_prev' is highly significant (P=0.0003), but not as significant
* as employment in the same week was. Its hazard ratio rises to 0.45,
* a relative hazard of about 2.2 for being unemployed the previous week.
* Priors (P=0.0014) and age (P=0.0230) are still significant. The
* larger P-value for age is probably due to a positive correlation
* between age and employment status.
*
* How about including variables for employment status during the
* TWO previous weeks?
**********************************************************;
proc phreg data=recid;
  title2 "EMPLOYMENT HAZARDS FOR THE TWO PREVIOUS WEEKS";
  model week*arrest(0)=fin age race wexp mar parole priors educ
        employ1 employ2 / ties=exact;
  array emplx{*} emp1-emp52;
  employ1=emplx{week-1};
  employ2=emplx{week-2};
  where week>2;   * Ignore records with week <= 2;
run;

**********************************************************;
* Neither employ1 nor employ2 is significant, with employ1 the
* closer to being significant (P=0.17), even though employment in
* the previous week was highly significant on its own. This is an
* example of two correlated covariates causing neither to be
* significant.
*
* As an alternative, perhaps try modeling the CUMULATIVE weekly
* employment over all previous weeks as a risk factor in any
* given week?
* Here by `cumulative' weekly employment we mean
* the AVERAGE of empl for k1;
* Again, skip the first week;
run;
**********************************************************;
* `Cumeploy' is significant with P=0.0227 and hazard ratio 0.50,
* but it is much less significant than employment in the previous
* week, and less significant in this model than priors (P=0.0042)
* and age (P=0.0182).
**********************************************************;
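
**********************************************************;
* A minimal sketch of one way to construct the cumulative employment
* covariate described above, assuming it is the average of
* emp1, ..., emp(week-1) evaluated at each rearrest time. The data
* set recidcum and the variables cum1-cum52, k and cumemp are
* hypothetical names, and this is not necessarily the program that
* produced the P-values quoted above.
*
* A DATA step first stores the running averages, cumk = average of
* emp1 to empk, using cumk = (cum(k-1)*(k-1) + empk)/k. PROC PHREG
* then picks out cum(week-1) at each rearrest time, just as
* emplx{week-1} was used for the previous-week model.
**********************************************************;
data recidcum;
  set recid;
  array emplx{*} emp1-emp52;
  array cumx{*} cum1-cum52;
  * cumx{k} holds the fraction of weeks 1 to k spent employed;
  cumx{1}=emplx{1};
  do k=2 to 52;
    cumx{k}=(cumx{k-1}*(k-1)+emplx{k})/k;
  end;
  drop k;
run;

proc phreg data=recidcum;
  title2 "SKETCH: AVERAGE EMPLOYMENT OVER ALL PREVIOUS WEEKS";
  model week*arrest(0)=fin age race wexp mar parole priors educ cumemp
        / ties=exact;
  array cumx{*} cum1-cum52;
  * At each rearrest time week is set to that time, as noted above;
  cumemp=cumx{week-1};
  where week>1;   * cum(week-1) is undefined when week=1;
run;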