A Text and Data Analytics Approach to Enrich the Quality of Unstructured Research Information


  • Otmane Azeroual

Abstract

With the increased accessibility of research information, demands are growing on research information systems (RIS), which are expected to generate and process knowledge automatically. At the same time, the quality of the RIS data entries drawn from the individual information sources causes problems. When the data in a RIS is structured, users can read it and filter it for the information and knowledge they need without difficulty. For unstructured data, this is not the case. The technique that nevertheless allows text databases and text sources to be analyzed, and knowledge to be extracted from unknown texts, is referred to as text mining or text data mining, and it builds on the principles of data mining. Text mining makes it possible to classify large, heterogeneous sources of research information automatically and to assign them to specific topics. Research information has always played a major role in higher education and academic institutions, although it is usually available in RIS in unstructured form and grows faster than structured data. Searching it can cost RIS staff at universities considerable time and can lead to poor decision-making. For this reason, the present paper proposes a new approach to obtaining structured research information from heterogeneous information systems. It is a subset of an approach to the semantic integration of unstructured data, using a RIS as the example. The purpose of this paper is to investigate text and data mining methods in the context of RIS and to develop a quality improvement model that helps universities and academic institutions using RIS to enrich unstructured research information.
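As a rough illustration of what assigning unstructured research information to topics can look like, the following Python sketch compares a free-text entry against simple word-count profiles per topic. It is not the method developed in the paper: the topic names, the example texts, and the helper functions (`tokenize`, `build_profile`, `cosine`, `classify`) are all hypothetical and chosen only for illustration, and a bag-of-words cosine similarity stands in for whatever text mining method a real RIS would use.

```python
import math
import re
from collections import Counter

# Hypothetical labeled example texts per topic (not data from the paper).
TRAINING = {
    "publications": [
        "journal article published in peer reviewed conference proceedings",
        "book chapter and journal paper with doi",
    ],
    "projects": [
        "third party funded project with grant and funding agency",
        "project budget funding period and grant partners",
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_profile(texts):
    """Sum the word counts of all example texts for one topic."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, profiles):
    """Return the topic whose profile is most similar to the text.

    If no topic shares any word with the text, the first topic is returned.
    """
    vector = Counter(tokenize(text))
    return max(profiles, key=lambda topic: cosine(vector, profiles[topic]))


PROFILES = {topic: build_profile(texts) for topic, texts in TRAINING.items()}

if __name__ == "__main__":
    print(classify("article accepted by a journal", PROFILES))
    print(classify("grant funding for a new project", PROFILES))
```

In practice the profiles would be built from a much larger labeled collection, and the raw word counts would typically be replaced by weighted features, but the structure stays the same: turn each unstructured entry into a vector, then assign it to the closest topic.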



This work is licensed under a Creative Commons Attribution 4.0 License.
  • ISSN(Print): 1913-8989
  • ISSN(Online): 1913-8997
  • Started: 2008
  • Frequency: semiannual
