Evaluation of Efficiency of Linear Techniques to Optimize Attribute Space in Machine Learning: Relevant Results for Extractive Methods of Summarizing
- Jesus Motta
- Laurence Capus
- Nicole Tourigny
Abstract
A major challenge in machine learning, especially in classification problems, is optimizing the attribute space in order to obtain a classification function that can discriminate future items. Several approaches can be used to optimize the attribute space: some select the most relevant attributes, while others extract attributes to create a new, smaller set of variables. These approaches have recently been applied to automatic summarization with promising results. This paper extends those first results with a new experiment. Five well-known linear methods were used in an original manner to optimize the attribute space on a corpus of 1250 text documents. These methods, used in data clustering and unsupervised machine learning, perform either attribute selection (Singular Value Decomposition, K-Means, Kohonen Neural Networks) or extraction of new attributes (Principal Component Analysis, Factor Analysis). After these methods were applied to optimize the attribute space, the validation phase focused on the discriminative power of the resulting classification function. To that end, six machine learning techniques were used to derive the classification function, and its performance was evaluated with the F-measure and ROC curves. The results show that applying the five chosen linear methods to optimize the attribute space is relevant for automatic extractive summarization. They also show which machine learning technique is preferable with each linear method to obtain better efficiency.
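The abstract names the F-measure and ROC curves as evaluation metrics but does not say how they were computed. The following is a minimal sketch, assuming a binary labelling (1 = sentence belongs in the extract, 0 = it does not) and a classifier that outputs a score per sentence. The function names, the 0.5 threshold, and the toy data are illustrative assumptions, not the authors' setup.

```python
def f_measure(y_true, y_pred, beta=1.0):
    """F-measure for binary labels (1 = sentence kept in the summary)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six sentences with made-up classifier scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

print(f_measure(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The pairwise AUC is quadratic in the number of examples, which is fine for illustration; a rank-based computation would scale better on a corpus of the size mentioned above.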
- DOI:10.5539/cis.v5n6p58