Measuring the Impact of Collinearity in Epidemiological Research


• Andrew Woolston
• Yu-Kang Tu
• Mark Gilthorpe
• Paul Baxter

Abstract

Collinearity amongst covariates in linear regression models has long been recognised as a potential source of bias. Various 'solutions' have been proposed, though one issue almost entirely omitted from the current literature is the importance of the relationship between the outcome and the correlated covariates. Using vector geometry, it can be shown that the impact of collinearity on the model, such as changes in regression coefficients, cannot be judged by the correlation structure of the covariates alone; their relationship with the outcome is crucial. Traditional collinearity diagnostics are therefore insufficient for evaluating adverse effects or model instability. Collinearity diagnostics should play an important role in assessing this impact on model parameters, both adverse and beneficial. The objective of this study was to build a new index that measures the impact of collinearity within the model environment, rather than merely describing the feature. Vector geometry was used to design a measure, labelled the D-index, that accounts for the relationship between the outcome and the correlated covariates. The D-index was applied in a regression study to develop a parsimonious model for body fat using easily obtainable body circumference measurements. Covariates were selected based on the degree of collinearity amongst the predictors in the model and the variance explained in the response. Such a model could allow fewer body size measurements to be taken, reducing study length and cost, whilst retaining the measurements that most accurately represent total body fat.
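The central claim above is that the same covariate correlation can have very different consequences depending on how the covariates relate to the outcome. A small numerical sketch makes this concrete. It does not implement the D-index, whose formula is not given here. Instead, it uses the textbook closed-form coefficients for a standardised two-covariate linear regression, with hypothetical correlation values, and compares the variance inflation factor (VIF), a traditional diagnostic that depends only on the covariate correlation r_12, with the change in the x1 coefficient when x2 is added. The function name `standardised_fit` and both scenarios are illustrative assumptions.

```python
"""Sketch: identical covariate collinearity (same r_12, same VIF) can have
very different effects on a regression coefficient, depending on how each
covariate correlates with the outcome. All variables are assumed to be
standardised, so every quantity follows from the correlations alone.
This is NOT the D-index; it only illustrates the motivating claim."""


def standardised_fit(r_y1: float, r_y2: float, r_12: float) -> dict:
    """Hypothetical helper: compare y ~ x1 with y ~ x1 + x2.

    r_y1, r_y2: correlations of x1 and x2 with the outcome y.
    r_12: correlation between the two covariates.
    """
    denom = 1.0 - r_12 ** 2
    if denom <= 0:
        raise ValueError("covariates are perfectly collinear")
    # Standardised OLS coefficients for the two-covariate model.
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    # For standardised variables, R^2 = sum of b_j * r_yj.
    r_squared = b1 * r_y1 + b2 * r_y2
    if r_squared > 1.0 + 1e-12:
        raise ValueError("inputs do not form a valid correlation matrix")
    return {
        "vif": 1.0 / denom,          # depends on r_12 only
        "b1_alone": r_y1,            # slope of y on x1 by itself
        "b1_joint": b1,
        "b2_joint": b2,
        "change_in_b1": b1 - r_y1,   # effect of adding x2 to the model
        "r_squared": r_squared,
    }


if __name__ == "__main__":
    r_12 = 0.8  # same covariate correlation in both scenarios
    scenarios = {
        "A: x2 linked to y only via x1": (0.6, 0.48),
        "B: x2 weakly linked to y": (0.6, 0.2),
    }
    for label, (r_y1, r_y2) in scenarios.items():
        fit = standardised_fit(r_y1, r_y2, r_12)
        print(label)
        for key, value in fit.items():
            print(f"  {key:>13}: {value:.3f}")
```

Both scenarios share r_12 = 0.8 and hence the same VIF (about 2.78). In scenario A, adding x2 leaves the x1 coefficient at 0.6 and x2 receives a coefficient of zero; in scenario B the x1 coefficient rises to about 1.22 and x2 takes a negative coefficient of about -0.78. A diagnostic built from the covariate correlations alone cannot distinguish the two cases, which is the gap the D-index is described as addressing.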


This work is licensed under a Creative Commons Attribution 4.0 License.