Evaluation of Efficiency of Linear Techniques to Optimize Attribute Space in Machine Learning: Relevant Results for Extractive Methods of Summarizing


  •  Jesus Motta    
  •  Laurence Capus    
  •  Nicole Tourigny    

Abstract

One major challenge in machine learning, especially in classification problems, is to optimize the attribute space in order to obtain a classification function that can discriminate future items. Several approaches to optimizing the attribute space can be used: some select the most relevant attributes, while others extract new attributes to create a smaller set of variables. These approaches have recently been applied to automatic summarization with promising results. This paper extends those first results with a new experiment. Five well-known linear methods were applied in an original manner to optimize the attribute space on a corpus of 1250 text documents. These methods, used in data clustering and unsupervised machine learning, allow either attribute selection (Singular Value Decomposition, K-Means, Kohonen Neural Networks) or new attribute extraction (Principal Component Analysis, Factor Analysis). After these methods were applied to optimize the attribute space, the validation phase focused on the discriminative power of the resulting classification function. To that end, six machine learning techniques were used to derive the classification function, and its performance was evaluated with the F-measure and ROC curves. The results show that applying the five chosen linear methods to optimize the attribute space is relevant for automatic summarization by extraction. They also show which machine learning technique is preferable with each linear method to obtain better efficiency.
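As an illustration of the two evaluation measures named in the abstract, the following minimal Python sketch computes an F-measure and the area under a ROC curve for a binary sentence classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical, chosen only to show how the measures are defined.

```python
def f_measure(y_true, y_pred, beta=1.0):
    """F-measure for binary labels (1 = sentence kept in the summary)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall (beta=1 gives F1).
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and classifier scores for six sentences.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.7, 0.3, 0.6, 0.1]
# Assumed decision threshold of 0.5 (not taken from the paper).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("F-measure:", f_measure(y_true, y_pred))
print("ROC AUC:", roc_auc(y_true, scores))
```

The F-measure depends on a fixed decision threshold, whereas the ROC-based score summarizes ranking quality across all thresholds, which is one reason the two measures are often reported together.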



This work is licensed under a Creative Commons Attribution 4.0 License.
  • ISSN(Print): 1913-8989
  • ISSN(Online): 1913-8997
  • Started: 2008
  • Frequency: semiannual
