Field Epidemiology Manual

A set of training materials for professionals working in intervention epidemiology, public health microbiology and infection control and hospital hygiene.





Fitting logistic regression models

In linear regression we mentioned that the straight line fitting the data can be obtained by minimising the distance between each point of the plot and the regression line. In fact, we minimise the sum of the squares of the distances between the points and the regression line (squared in order to avoid negative differences). This is called the least squares method. We identify the values of b0 and b1 that minimise the sum of squares.
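
For a simple linear regression of y on a single variable x, for example, the quantity being minimised can be written as:

\[ \mathrm{SS}(b_0, b_1) = \sum_{i=1}^{n} \left( y_i - (b_0 + b_1 x_i) \right)^2 \]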

In logistic regression the method is more complicated. It is called the maximum likelihood method. Maximum likelihood provides the values of β0 and β1 that maximise the probability of obtaining the observed data set. It requires iterative computing and is easily done with most statistical software.

We use thelikelihood functionto estimate the probability of observing the data, given the unknown parameters (0and b1). A likelihood is a probability, specifically the probability that the observed values of the dependent variable may be predicted from the observed values of the independent variables. Like any probability, the likelihood varies from 0 to 1.
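
For example, with a single independent variable x, the logistic model gives each subject i a predicted probability of disease pi, and the likelihood of n independent observations yi (coded 0/1) is:

\[ p_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}, \qquad L(\beta_0, \beta_1) = \prod_{i=1}^{n} p_i^{\,y_i} \left(1 - p_i\right)^{1 - y_i} \]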

Practically, it is easier to work with the logarithm of the likelihood function. This function is known as the log-likelihood, and it will be used for inference testing when comparing several models. The log-likelihood varies from 0 to minus infinity (it is negative because the natural log of any number less than 1 is negative).
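
Taking the natural logarithm of the likelihood above gives:

\[ \ln L(\beta_0, \beta_1) = \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln (1 - p_i) \right] \]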

Estimating the parameters β0 and β1 is done by taking the first derivatives of the log-likelihood with respect to β0 and β1 (these are called the likelihood equations) and solving them. Iterative computing is used: an arbitrary value for each coefficient (usually 0) is first chosen, the log-likelihood is computed, and the coefficient values are adjusted. Reiteration continues until the log-likelihood is maximised (reaches a plateau). The results are the maximum likelihood estimates of β0 and β1.
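
As a minimal sketch of what the software does, the same kind of fit can be obtained in Python with the statsmodels package (the data below are made up purely for illustration; they are not from the study discussed later):

# Minimal sketch: maximum likelihood fit of a logistic regression with statsmodels.
import numpy as np
import statsmodels.api as sm

# Exposure (0/1) and disease outcome (0/1) for 10 hypothetical subjects
exposure = np.array([1, 1, 1, 0, 0, 1, 0, 0, 1, 0])
disease  = np.array([1, 1, 0, 0, 0, 1, 0, 1, 1, 0])

X = sm.add_constant(exposure)   # adds the intercept term (beta0)
result = sm.Logit(disease, X).fit()   # iterative maximisation of the log-likelihood

print(result.params)   # maximum likelihood estimates of beta0 and beta1
print(result.llf)      # maximised log-likelihood of the fitted model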

Now that we have estimates for β0 and β1, the next step is inference testing.

It responds to the question: does the model including a given independent variable provide more information about the occurrence of disease than the model without this variable? The answer is obtained by comparing the observed values of the dependent variable to the values predicted by two models, one with the independent variable of interest and one without. If the predicted values of the model including the independent variable are better, then this variable contributes significantly to the outcome. To assess this we use a statistical test.

The likelihood ratio statistic (LRS) can be directly computed from the likelihood functions of both models.

Probabilities are always less than one, so log likelihoods are always negative; we then work with negative log likelihoods for convenience.

The likelihood ratio statistic (LRS) tests the significance of the difference between the log likelihood for the researcher's model and the log likelihood for a reduced model (the models with and without a given variable); this difference of log likelihoods corresponds to a ratio of the likelihoods themselves.
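
A standard way of writing the statistic, with L+ the likelihood of the model including the variable of interest and L− the likelihood of the reduced model, is:

\[ \mathrm{LRS} = -2 \ln \left( \frac{L_{-}}{L_{+}} \right) = -2 \left( \ln L_{-} - \ln L_{+} \right) \]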

The LRS can be used to test the significance of a full model (several independent variables in the model versus no variables, i.e. only the constant). In that situation it tests the null hypothesis that all the β coefficients are equal to 0 (all the slopes corresponding to each variable are equal to 0). This would imply that none of the independent variables is linearly related to the log odds of the dependent variable.

The LRS does not tell us if a particular independent variable is more important than others. This can be done, however, by comparing the likelihood of the overall model with a reduced model which drops one of the independent variables.

In that case the LRS tests whether the logistic regression coefficient for the dropped variable equals 0. If so, it would justify dropping the variable from the model. A non-significant LRS indicates no difference between the full and the reduced models.
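
As a minimal sketch of such a comparison (assuming a pandas data frame named data with hypothetical columns mi, oc and age; none of this is from the original study), the LRS and its p-value could be computed as follows with statsmodels and scipy:

# Sketch: likelihood ratio test for one variable, using hypothetical data.
import statsmodels.api as sm
from scipy.stats import chi2

X_full    = sm.add_constant(data[["oc", "age"]])   # model with OC and AGE
X_reduced = sm.add_constant(data[["oc"]])          # model without AGE

ll_full    = sm.Logit(data["mi"], X_full).fit(disp=0).llf
ll_reduced = sm.Logit(data["mi"], X_reduced).fit(disp=0).llf

lrs = -2 * (ll_reduced - ll_full)   # likelihood ratio statistic
p_value = chi2.sf(lrs, df=1)        # 1 degree of freedom (one dropped variable)
print(lrs, p_value)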

Alternatively, the LRS can be computed from deviances:

\[ \mathrm{LRS} = D_{-} - D_{+} \]

in which D− and D+ are respectively the deviances of the models without and with the variable of interest.

The deviance can be computed as follows:

\[ D = -2 \ln \left( \frac{\text{likelihood of the fitted model}}{\text{likelihood of the saturated model}} \right) \]

(A saturated model being a model in which there are as many parameters as data points.)
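
Substituting this definition into the formula above shows that the two formulations of the LRS agree, because the likelihood of the saturated model cancels out:

\[ D_{-} - D_{+} = -2\left(\ln L_{-} - \ln L_{\text{sat}}\right) + 2\left(\ln L_{+} - \ln L_{\text{sat}}\right) = -2\left(\ln L_{-} - \ln L_{+}\right) = \mathrm{LRS} \]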

Under the null hypothesis that β1 = 0, the LRS follows a chi-square distribution with 1 degree of freedom, from which the p-value can be derived.

The following table illustrates the result of the analysis (using a logistic regression package) of a study assessing risk factors for myocardial infarction. The LRS equals 138.7821 (p < 0.001), suggesting that oral contraceptive (OC) use is a significant predictor of the outcome.

Table 1: Risk factors for myocardial infarction. Logistic regression model including a single independent variable (OC)

In model 2, model 1 was expanded and another variable was added (age in years). Here again, the addition of the second variable contributes significantly to the model. The LRS (LRS = 16.7253, p < 0.001) expresses the difference in likelihood between the two models.

Table 2: Risk factors for myocardial infarction. Logistic regression model including two independent variables (OC and AGE)

