A New Algorithm for Detecting Outliers in Linear Regression


Mehmet Satman

Abstract

In this paper, we present a new algorithm for detecting multiple outliers in linear regression. The algorithm is based on a non-iterative robust covariance matrix and on the concentration steps used in LTS estimation. A robust covariance matrix is constructed to calculate the Mahalanobis distances of the independent variables, which are then used as weights in a weighted least squares estimation. A few concentration steps are then performed using the observations that have the smallest residuals. We generate random data sets for $n=10^3, 10^4, 10^5$ and $p=5,10$ to demonstrate the capabilities of the algorithm. Our Monte Carlo simulations show that the algorithm has very low masking and swamping ratios when the number of observations is up to $10^4$ in the case of maximum contamination in X-space. They also show that the algorithm is successful in the case of Y-space outliers when the contamination level, sample size, and number of parameters are up to $30\%$, $n=10^5$, and $p=10$, respectively. Bias, variance, and MSE statistics are calculated for different scenarios. The reported computation time of our implementation is quite short. We conclude that the presented algorithm is suitable for detecting multiple outliers in regression analysis, given its small masking and swamping ratios, its accurate estimates of the regression parameters other than the intercept, and its short computation time on large data sets with high levels of contamination. Future work is required to reduce the bias and variance of the intercept estimator in the model.
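The pipeline summarized above (robust distances, weighted least squares, then concentration steps) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not specify the robust covariance estimator, the weight function, the subset size, or the flagging rule, so the following are assumptions made here for illustration only: the covariance of the half of the observations closest to the coordinatewise median stands in for the non-iterative robust covariance matrix, the weights are inverse Mahalanobis distances, the subset size is `h = (n + p + 2) // 2`, and observations are flagged with a 2.5 × MAD cutoff. The function name `detect_outliers` is hypothetical.

```python
import numpy as np


def detect_outliers(X, y, c_steps=5, cutoff=2.5):
    """Illustrative sketch: robust distances -> WLS -> concentration steps.

    X is an (n, p) array of predictors without an intercept column and
    y is an (n,) response. Returns (beta, flags), where beta[0] is the
    intercept and flags marks suspected outliers.
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    h = (n + p + 2) // 2                  # subset size (assumed here)

    # Stand-in for the non-iterative robust covariance matrix (assumption):
    # covariance of the h points closest to the coordinatewise median,
    # measured in MAD-standardized units. Assumes continuous predictors,
    # so that no MAD is zero.
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    z = np.sum(((X - med) / mad) ** 2, axis=1)
    core = np.argsort(z)[:h]
    S = np.atleast_2d(np.cov(X[core], rowvar=False))

    # Mahalanobis distances of the predictors under the robust scatter.
    diff = X - med
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff)

    # Weighted least squares with inverse-distance weights (assumed form).
    w = 1.0 / (np.sqrt(d2) + 1e-12)
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]

    # Concentration steps: refit OLS on the h smallest absolute residuals.
    for _ in range(c_steps):
        r = np.abs(y - A @ beta)
        keep = np.argsort(r)[:h]
        beta = np.linalg.lstsq(A[keep], y[keep], rcond=None)[0]

    # Flag observations with large residuals relative to a MAD scale
    # (the cutoff rule is an assumption, not taken from the paper).
    r = y - A @ beta
    scale = 1.4826 * np.median(np.abs(r - np.median(r)))
    flags = np.abs(r) > cutoff * scale
    return beta, flags
```

Calling `detect_outliers(X, y)` on an `(n, p)` predictor array and a length-`n` response returns the coefficient vector and a boolean mask of suspected outliers. The concentration loop mirrors the C-step idea from LTS estimation: each refit uses only the observations that the current fit explains best, so gross outliers drop out of the fitting subset after a few iterations.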


This work is licensed under a Creative Commons Attribution 4.0 License.