Addressing the Problem of Coherence in Automatic Text Summarization: A Latent Semantic Analysis Approach


Abdulfattah Omar

Abstract

This article addresses the problem of coherence in the automatic summarization of prose fiction. Despite growing advances in summarization theory, applications and industry, many problems remain unresolved in applying summarization theory to literature. This can be partly attributed to the peculiar nature of literary texts, to which standard summarization processes are not well suited. This study, therefore, aims to bridge the gap between literature and summarization theory by proposing a summarization system built on semantic approaches for extracting more meaningful and coherent summaries. Because a lack of coherence in summaries hinders the reader's understanding of the original texts, more effective methods are needed for extracting coherent summaries. To this end, a hybrid of statistical (TF-IDF) and semantic (Latent Semantic Analysis, LSA) methods was used to derive the most distinctive features and extract summaries from 10 English novellas. For evaluation, both intrinsic and extrinsic methods were used to determine the quality of the extracted summaries. Results indicate that integrating LSA into feature extraction methods improves summarization performance with respect to the coherence of the extracted summaries.
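The abstract names the two components, TF-IDF weighting and Latent Semantic Analysis, but not how they are combined. The following is a minimal sketch, not the author's system, of one common way to pair them for extractive summarization. It assumes that sentences are the extraction unit, that IDF is computed over sentences, and that sentences are ranked by their vector length in a rank-k SVD space (the scoring scheme of Steinberger and Ježek, substituted here for illustration). The function names and the parameters `n_sentences` and `k` are hypothetical.

```python
import math
import re
from collections import Counter

import numpy as np


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens; no stemming or stop-word removal."""
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_matrix(sentences):
    """Term-by-sentence TF-IDF matrix, treating each sentence as a document."""
    token_lists = [tokenize(s) for s in sentences]
    vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    n = len(sentences)
    # Document frequency: in how many sentences each term occurs.
    df = Counter(t for toks in token_lists for t in set(toks))
    A = np.zeros((len(vocab), n))
    for j, toks in enumerate(token_lists):
        for term, tf in Counter(toks).items():
            idf = math.log(n / df[term]) + 1.0  # +1 keeps terms found in every sentence
            A[index[term], j] = tf * idf
    return A


def lsa_summarize(text, n_sentences=2, k=2):
    """Hypothetical helper: pick sentences by their length in a rank-k LSA space."""
    sentences = split_sentences(text)
    if len(sentences) <= n_sentences:
        return sentences
    A = tfidf_matrix(sentences)
    # Thin SVD: A = U diag(S) Vt; column j of Vt places sentence j in concept space.
    _, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(S))
    # Weight each of the top-k concepts by its singular value, then take the norm.
    scores = np.sqrt(((S[:k, None] * Vt[:k, :]) ** 2).sum(axis=0))
    # Keep the best-scoring sentences but restore their original order.
    top = sorted(np.argsort(-scores)[:n_sentences])
    return [sentences[i] for i in top]
```

Returning the selected sentences in their original order is one cheap way to preserve narrative flow; the LSA step favours sentences that load strongly on the dominant latent topics rather than those that merely repeat frequent words.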



This work is licensed under a Creative Commons Attribution 4.0 License.
  • ISSN(Print): 1923-869X
  • ISSN(Online): 1923-8703
  • Started: 2011
  • Frequency: bimonthly
