AI-Based Data Quality Management and Cleaning in Intelligent Databases
- Nzenwata Uchenna Jeremiah
- OMONIYI Emmanuel Oluwapelumi
- Victor Chiemere Ezeokechukwu
- Ikonne Austin
- Obebe Ifeoluwa Micah
- Atoro Toluwani Daniel
- Bright Joses Ayomide
- Akinola Oluwaseun Samuel
Abstract
Predictive modeling on large datasets in intelligent databases is frequently hindered by structural deficiencies, most notably missing records. Conventional imputation methods, such as mean or median substitution, distort statistical variance and fail to capture multivariate relationships among features. This paper presents a dynamic, AI-based data quality management framework that autonomously identifies and corrects structural anomalies prior to predictive modeling. Using Classification and Regression Trees (CART), the system imputes missing categorical and numerical values by learning localized patterns from the observed portion of the dataset. To validate the integrity of the repaired database, the framework was evaluated on a credit risk dataset using a Tri-Ensemble of gradient boosting algorithms (XGBoost, LightGBM, and CatBoost), achieving an Area Under the ROC Curve (AUC) of 0.9393. These results demonstrate that predictive, tree-based data imputation preserves the statistical distribution of the original data and substantially enhances the accuracy of downstream machine learning tasks. The proposed framework offers a scalable, automated solution for data preprocessing in distributed and intelligent database environments.
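The tree-based imputation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper `cart_impute`, its parameters, the toy data, and the use of scikit-learn's `DecisionTreeRegressor` and `DecisionTreeClassifier` as the CART models are all assumptions made for illustration. The sketch fits a tree on the rows where a column is observed and predicts that column for the rows where it is missing.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor


def cart_impute(X, target_col, predictor_cols, categorical=False, max_depth=3):
    """Fill NaNs in one column with predictions from a CART model.

    Hypothetical helper: assumes the predictor columns are fully observed
    and that categorical values are already encoded as numbers.
    """
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X[:, target_col])
    if not missing.any() or missing.all():
        return X  # nothing to fill, or nothing to learn from
    tree_cls = DecisionTreeClassifier if categorical else DecisionTreeRegressor
    model = tree_cls(max_depth=max_depth, random_state=0)
    # Learn the target column from the rows where it is observed.
    model.fit(X[~missing][:, predictor_cols], X[~missing, target_col])
    # Predict it for the rows where it is missing.
    X[missing, target_col] = model.predict(X[missing][:, predictor_cols])
    return X


# Toy data (invented): column 0 = age group code, column 1 = income,
# column 2 = default flag. One income and one flag are missing.
data = np.array([
    [1.0, 20.0, 0.0],
    [1.0, 22.0, 0.0],
    [1.0, 21.0, 0.0],
    [2.0, 60.0, 1.0],
    [2.0, 62.0, 1.0],
    [2.0, 61.0, 1.0],
    [1.0, np.nan, 0.0],
    [2.0, 63.0, np.nan],
])

# Numerical column first (regression tree), then the categorical flag
# (classification tree), which can now use the filled income as a predictor.
step1 = cart_impute(data, target_col=1, predictor_cols=[0])
filled = cart_impute(step1, target_col=2, predictor_cols=[0, 1], categorical=True)
print(filled)
```

Unlike mean substitution, which would give the missing income the global average, the tree assigns the average of the matching age group, which is the kind of localized pattern the abstract refers to. Imputing columns in sequence, as done here, is one possible design choice among several.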
- DOI: 10.5539/cis.v19n2p30