Vanderbilt Biostatistics Wiki

REGRESSION MODELING STRATEGIES with Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis by FE Harrell

E-book available as of 2015-08-17 here

Print version available 2015-09-04 Flyer

You may order the hardcover or e-book here or from Amazon

R code for all examples in the book's 2nd edition. Numbers in file names are chapter numbers.

vol. 72 no. 3 September 2016, p. 1006-7. doi:10.1111/biom.12569

REGRESSION MODELING STRATEGIES with Applications to Linear Models, Logistic Regression, and Survival Analysis by FE Harrell. The book was published June 5 2001 by Springer New York, ISBN 0-387-95232-2 (also available at DirectTextBook). Click here to see the text from the book's back cover. Click here to see the preface and table of contents for the book manuscript in .pdf format. Click here to obtain a partial index to the book in .pdf format, and here to see a sample chapter from the book (this material is Copyright 2001-2004 Springer-Verlag and may not be reproduced).

Changes and additions for the second edition (projected publication July 2015)

Statistical Methods in Biomedical Research

Bulletin of the Swiss Statistical Society

Statistical Methods in Medical Research

International Journal of Epidemiology

31(3):699-700, June 2002. Note: This otherwise excellent review states that the book recommends selecting variables to include in the model on the basis of their frequency of selection by a bootstrap procedure. This is definitely not the case.

Journal of the American Statistical Association

Errata for the first and second and later printings. The book had its third printing in December 2002 and its fourth printing in December 2003. The sixth printing was in December 2005.

New versions of R code that makes some examples in the book relying on the

One-semester course using part of the text, for students who have not had a course in linear regression.

Interactive Overview of many of the methods in the book

Syllabus for a more advanced course using the text up until survival analysis, for students already versed in ordinary linear regression.

Vanderbilt University Campus: 14-18 May 2018; details here

Click here for a detailed course description

Click here for supplements to handouts

Offered for the first time in the Vanderbilt University Department of Biostatistics graduate program Spring 2013 (Jan-Apr). It is taught yearly by Prof. Harrell

See CourseBios330 for up-to-date material


Chapter 7 from the first edition: Case Study in Least Squares Fitting and Interpretation of a Linear Model (analysis of the 1992 US presidential election)

Survey of new approaches to regression and tree-based modeling (referred to in Chapter 4 of the second edition)

Syllabus for a 1-day short course based on the text

Syllabus for a 3-day short course and S workshop based on the text

Syllabus for a 1-day short course Modern Approaches to Predictive Modeling and Covariable Adjustment in Randomized Clinical Trials

Scripts developed in class during the May 2000 or August 2000 3-day courses or the June 2001 or June 2002 3-day course for Insightful Corporation

An older discussion board for readers and the author to discuss questions, issues, controversies, and new research related to the text

Homework assignments not in the book. Some of the early assignments are for basic regression, a prerequisite for the book. Some of the problems use data in Rosner B: Fundamentals of Biostatistics, 5th Edition. Belmont CA: Duxbury Press; 1999. Solutions to these problems as well as solutions to many of the problems given in the book are available to instructors by E-mailing the author

Quizzes (with answer sheets) on concepts in the text and on prerequisites are available to instructors by E-mailing the author

Interactive S scripts demonstrating various curve fitting criteria and showing the flexibility of restricted cubic splines (see also here)

An Introduction to S and the Hmisc and Design Libraries by CF Alzola and FE Harrell

Statistical computing course material

Miscellaneous S software available on the Internet that is related to some of the methods covered in this book such as data reduction, censored data analysis, imputation, recursive partitioning

Unsupported SAS macros for restricted cubic splines, displaying survival estimates, and checking proportional hazards and other model assumptions, etc. (FE Harrell, 1991). Click here for information, examples, and brief information on SAS procedures useful for multivariable modeling and on obtaining predicted values with SAS.

SAS macros for various censored data calculations such as AUC, from Chambless LE, Diao G. Estimation of time-dependent area under the ROC curve for long-term risk prediction.

Warren Sarle's SAS macros and examples for bootstrapping and jackknifing. See Warren's cautionary note on bootstrap confidence intervals, with a good example related to R^2 in multiple regression. The example shows that when the estimate of R^2 is badly biased, bootstrap confidence limits are badly displaced to the right. Included in the notes is the standard error of R^2 and information about adjusted R^2.

StatLib statistical computing repository
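
The point made above about bootstrap confidence limits for R^2 can be illustrated with a small simulation. The sketch below is not taken from the SAS macros or notes listed above; it is a minimal Python illustration under assumed settings (30 observations, 10 pure-noise predictors, 500 bootstrap resamples, all chosen arbitrarily), and the helper names solve and r_squared are hypothetical. Because the response is generated independently of the predictors, the true R^2 is 0, so any positive apparent R^2 is pure overfitting.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(rows, y):
    """Apparent R^2 from an ordinary least squares fit with an intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(xi[u] * xi[v] for xi in X) for v in range(k)] for u in range(k)]
    xty = [sum(xi[u] * yi for xi, yi in zip(X, y)) for u in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, xi)) for xi in X]
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst

rng = random.Random(1)          # fixed seed so the run is reproducible
n, p = 30, 10                   # assumed sample size and number of noise predictors
rows = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]   # independent of the predictors: true R^2 = 0

r2 = r_squared(rows, y)
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)   # usual adjusted R^2

# Percentile bootstrap: resample rows with replacement and refit each time.
boot = []
for _ in range(500):
    idx = [rng.randrange(n) for _ in range(n)]
    try:
        boot.append(r_squared([rows[i] for i in idx], [y[i] for i in idx]))
    except ValueError:
        continue                # skip the (very unlikely) singular resample
boot.sort()
lower = boot[int(0.025 * len(boot))]
upper = boot[int(0.975 * len(boot)) - 1]

print("apparent R^2:", round(r2, 3))
print("adjusted R^2:", round(adj_r2, 3))
print("bootstrap percentile limits:", round(lower, 3), round(upper, 3))
```

In a run like this the percentile limits sit entirely above the true value of 0, because each bootstrap refit overfits at least as much as the original fit, while adjusted R^2 is always below the apparent value. The settings are illustrative only; the size of the displacement depends on the ratio of predictors to observations.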

Recent simulation experiments conducted by Carl Moons and Frank Harrell indicate that the performance of transcan for multiple imputation is about halfway between single conditional mean imputation and MICE (see below), consistent with the findings from Faris PD, Ghali WA, et al. (2002): Multiple imputation versus data enhancement for dealing with missing data in observational health care outcome analyses. :184-191. Suboptimal performance of transcan for multiple imputation is probably due to the fact that transcan fits the flexible additive imputation models and then draws all multiple imputations from the fitted models. A new function in the Hmisc package, aregImpute, uses the bootstrap to re-fit additive nonparametric imputation models for each of the multiple imputations. Results for

Validation of binary logistic models

Steyerberg EW, Harrell FE, Borsboom GJJM, Eijkemans MJC, Vergouwe Y, Habbema JDF (2001): Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis.

(2003): Internal and external validation of predictive models: A simulation study of bias and precision in small samples.

Vergouwe Y, Steyerberg EW, Eijkemans MJC, Habbema JDF (2005): Substantial effective sample sizes were required for external validation studies of predictive logistic regression models.

Studying the degrees of freedom spending strategy that uses generalized Spearman rho^2, in terms of preserving type I error and sigma^2 in ordinary least squares

Prediction Error in Cox Models Varying Number of Predictors

Shrinkage and problems with stepwise variable selection: See Steyerberg EW, Eijkemans MJC, Harrell FE, Habbema JDF (2001): Prognostic modeling with logistic regression analysis: In search of a sensible strategy in small data sets.

Model simplification and stepwise variable selection: See Ambler G, Brady AR, Royston P (2002): Simplifying a prognostic model: a simulation study based on clinical data.

:3803-3822. The authors studied the performance of the model simplification strategy discussed in the book, and compared it with more traditional variable selection methods, finding that standard variable selection can work well when there is a large proportion of irrelevant variables.

Case study on penalized maximum likelihood estimation for binary logistic modeling: Moons KGM, Donders ART, Steyerberg EW, Harrell FE (2004): Penalized maximum likelihood estimation to directly adjust diagnostic and prognostic prediction models for overoptimism: a clinical example.

Example of a spline interaction surface

Interactive demonstrations of curve fitting, effects of categorization, etc.

Peter Ellis's blog article about overanalysis of time series data

To subscribe, click here or access it from a news server.

Joseph Schafer's Multiple Imputation FAQ

Multiple Imputation Online and R MICE Software by Stef van Buuren and Karin Oudshoorn

To subscribe to the Impute E-mail discussion group led by Juned Siddique of Northwestern University, click here.

A paper containing a good overview of multiple imputation and a comparison of some software packages is Horton NJ, Lipsitz SR,

An excellent recent survey of missing data methods is Schafer, JL and Graham JW,

See also Biases in SPSS 12.0 Missing Value Analysis by Paul von Hippel,

Notes from Tim Hesterberg on why the response variable must be used when doing multiple imputation. Tim's notes include code to do several simulations illustrating his points.

A nice study and review of multiple imputation

Moons KGM, Donders RART, Stijnen T, Harrell FE,

General information on regression modeling, including prerequisite material for Regression Modeling Strategies

Problems with categorizing continuous variables

Information for biomedical researchers and medical citations for statistical issues

Notes on regression modeling in randomized clinical trials

Julian Faraway's free book Practical Regression and Anova using R

Bender and Benner's paper Calculating ordinal regression models in SAS and S-Plus

Stephan Rudolfer's presentation Diagnosis of Carpal Tunnel Syndrome using Logistic Regression, an excellent presentation on various types of ordinal logistic models. Includes a nice discussion of accuracy indexes.

John Fox's Applications of Quantitative Methods in Sociology course material, including information on polytomous logistic regression

John Fox's excellent article on Bootstrapping Regression Models

Brian Ripley's terrific presentation Selecting Amongst Large Classes of Models, containing highly useful thoughts about AIC, cross-validation, and other concepts

Paul Allison's excellent discussion about R^2 measures

Lindsay Smith's nice tutorial on principal components analysis

Annotated bibliography with emphasis on predictive methods, survival analysis, logistic regression, prognosis, diagnosis, modeling strategies, model validation, practical Bayesian methods, clinical trials, graphical methods, papers for teaching statistical methods, bootstrap, etc; FE Harrell

Miscellaneous information on methodology, much of it culled from electronic discussion groups

Bob Obenchain's Regression Shrinkage Web page

Patrick Burns' bootstrap and resampling page

Jan de Leeuw's excellent working paper on splines, including monotone splines

– FrankHarrell - 30 Jan 2004; updated 29 Feb, 4, 27 May, 4, 11 Jul, 29 Aug, 2 Sep 2004, 20, 22 Jan, 24 Mar, 14 Aug 2005, 13 Jan, 11 Nov 2006, 28 Feb, 29 Jun, 12 Jul, 16 Jul, 7 Sep, 22 Oct 2007, 11 Jan, 2 Sep, 18, 23 Dec 2008, 6 Jan, 3 Feb, 13, 17 Apr, 9 Sep 2009, 9 Feb 2010, 21 Feb 2011, 21 Apr 2011, 1 May 2011, 29 May 2011, 6 Aug, 5 Sep 2011, 22 Dec 2012, 1 Mar 2014, 12 Apr 2014, 2014-07-15, 2014-09-23, 2014-10-04, 2014-11-26, 2015-01-03, 2015-01-23, 2015-03-22, 2015-06-08, 2015-08-27, 2015-09-06, 2015-09-23, 2016-11-10, 2017-04-16, 2018-01-18

Syllabus for Advanced Data Analysis Course

Simulation study of logistic model validation

New approaches to modeling (referred to in Chapter 4 of 2nd edition)

Detailed Description of RMS Short Course

Topic revision: r215 – 18 Jan 2018, FrankHarrell
