Optimal Algorithm for Metabolomics Classification and Feature Selection varies by Dataset

Charles Jr

doi:10.5539/ijb.v7n1p100

Abstract

Metabolomics, the systematic identification and quantification of all metabolites in a biological system, is increasingly applied to the identification of biomarkers for disease diagnosis, prognosis, and risk prediction. Applications of metabolomics extend across the health spectrum, including Alzheimer's disease, cancer, diabetes, and trauma. Despite the continued interest in metabolomics, numerous competing techniques exist for analyzing metabolomics datasets with the intent of classifying group membership (e.g., Control or Treated). These include Partial Least Squares Discriminant Analysis, Support Vector Machines, Random Forest, Regularized Generalized Linear Models, and Prediction Analysis for Microarrays. Each classification algorithm depends on different assumptions and can lead to different conclusions. This project conducts an in-depth comparison of algorithm performance on both simulated and real datasets to determine which algorithms perform best under different dataset structures. Three simulated datasets were generated to validate algorithm performance and mimic 'real' metabolomics data: (Han et al., 2011) independent null dataset (no correlation, no discriminatory variables), (Davis, Schiller, Eurich, & Sawyer, 2012) correlated null (no discriminatory variables), (Guan et al., 2009) correlated discriminatory. The comparison is also applied to three open-access datasets: two Nuclear Magnetic Resonance (NMR) datasets and one Mass Spectrometry (MS) dataset. Performance was evaluated using the Robustness-Performance Trade-off (RPT), which balances model classification accuracy against feature selection stability. We also provide a free, open-source R Bioconductor package (OmicsMarkeR) that conducts the analyses described herein.
The proposed work provides an important advancement in metabolomics analysis and helps alleviate the confusion caused by potentially paradoxical analyses, thereby leading to improved exploration of disease states and identification of clinically important biomarkers.
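
The abstract does not state how the RPT is computed. As an illustration only, the sketch below assumes the weighted harmonic-mean form commonly used for this trade-off, RPT = (beta^2 + 1) * stability * accuracy / (beta^2 * stability + accuracy), with both inputs scaled to [0, 1]. The function names `rpt` and `jaccard_stability`, the `beta` parameter, and the use of mean pairwise Jaccard overlap as the stability measure are assumptions made for this example; they are not taken from the paper or from the OmicsMarkeR package.

```python
from itertools import combinations


def jaccard_stability(feature_sets):
    """Mean pairwise Jaccard overlap between selected-feature sets.

    Assumption: stability is measured as the average Jaccard index over
    all pairs of feature sets (e.g. one set per resampling run).
    """
    sets = [set(s) for s in feature_sets]
    if len(sets) < 2:
        raise ValueError("need at least two feature sets")
    scores = []
    for a, b in combinations(sets, 2):
        union = a | b
        # Two empty selections are treated as identical.
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def rpt(stability, accuracy, beta=1.0):
    """Weighted harmonic mean of stability and accuracy.

    Assumption: beta = 1 weights both terms equally; other beta values
    shift the weighting between the two terms.
    """
    if not (0.0 <= stability <= 1.0 and 0.0 <= accuracy <= 1.0):
        raise ValueError("stability and accuracy must lie in [0, 1]")
    if beta <= 0:
        raise ValueError("beta must be positive")
    denominator = beta ** 2 * stability + accuracy
    if denominator == 0.0:
        return 0.0  # both inputs are zero
    return (beta ** 2 + 1) * stability * accuracy / denominator


if __name__ == "__main__":
    # Hypothetical feature selections from two resampling runs.
    runs = [{"m1", "m2", "m3"}, {"m2", "m3", "m4"}]
    stability = jaccard_stability(runs)
    print(stability, rpt(stability, accuracy=0.9))
```

Under this form, a model scores well only when accuracy and stability are both high: a highly accurate classifier whose selected features change with every resample is penalized, which matches the balance the abstract describes.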

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN(Print): 1916-9671
ISSN(Online): 1916-968X
Started: 2009
Frequency: annual

Contact

Ryan Jones, Editorial Assistant
ijb@ccsenet.org
