Fabuyi, Jumai Adedoja (2024) Leveraging Synthetic Data as a Tool to Combat Bias in Artificial Intelligence (AI) Model Training. Journal of Engineering Research and Reports, 26 (12). pp. 24-46. ISSN 2582-2926
Abstract
This study investigates the efficacy of synthetic data in mitigating bias in artificial intelligence (AI) model training, focusing on demographic inclusivity and fairness. Using Generative Adversarial Networks (GANs), synthetic datasets were generated from the UCI Adult Dataset, COMPAS Recidivism Dataset, and MIMIC-III Clinical Database. Logistic regression models were trained on both synthetic and original datasets to evaluate fairness metrics and predictive accuracy. Fairness was assessed through demographic parity and equality of opportunity, which measure balanced prediction rates and equitable outcomes across demographic groups. Fidelity and data diversity were evaluated using statistical measures such as the Kolmogorov-Smirnov (KS) test and Kullback-Leibler (KL) divergence, along with the Inception Score, which quantifies diversity in synthetic data. The results revealed significant fairness improvements for models trained on synthetic datasets. For the COMPAS dataset, demographic parity increased from 0.72 to 0.89, and equality of opportunity rose from 0.65 to 0.83, without compromising predictive accuracy (AUC-ROC of 0.82, compared to 0.83 for the original data). Based on these findings, this research recommends employing GANs to generate synthetic data in bias-sensitive domains to enhance demographic inclusivity and ensure equitable outcomes in AI models. Furthermore, integrating human-in-the-loop (HITL) systems is critical to monitor and address residual biases during data generation. Standardized validation frameworks, including fairness metrics and fidelity tests, should be adopted to ensure transparency and consistency across applications. These practices can enable organizations to leverage synthetic data effectively while maintaining ethical standards in AI development and deployment.
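The two fairness metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the study reports them as ratios or as differences; the ratio form below (1.0 means perfect parity) is an assumption, chosen because the reported values lie between 0 and 1 and improve as they rise. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def _rate(values: Sequence[int]) -> float:
    """Share of 1s in a sequence of 0/1 values (0.0 if the sequence is empty)."""
    return sum(values) / len(values) if values else 0.0


def demographic_parity_ratio(y_pred: Sequence[int], group: Sequence[str]) -> float:
    """Min/max ratio of positive-prediction rates across groups (assumed ratio form)."""
    rates = []
    for g in sorted(set(group)):
        preds = [p for p, gi in zip(y_pred, group) if gi == g]
        rates.append(_rate(preds))
    # If no group receives any positive prediction, all rates are equal.
    return min(rates) / max(rates) if max(rates) > 0 else 1.0


def equal_opportunity_ratio(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[str]
) -> float:
    """Min/max ratio of true-positive rates across groups (assumed ratio form)."""
    tprs = []
    for g in sorted(set(group)):
        # Keep predictions only for members of group g whose true label is 1.
        preds = [p for t, p, gi in zip(y_true, y_pred, group) if gi == g and t == 1]
        tprs.append(_rate(preds))
    return min(tprs) / max(tprs) if max(tprs) > 0 else 1.0


# Hypothetical toy data: two groups of four individuals each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp = demographic_parity_ratio(y_pred, group)
eo = equal_opportunity_ratio(y_true, y_pred, group)
print(f"demographic parity ratio: {dp:.2f}")
print(f"equal opportunity ratio:  {eo:.2f}")
```

In this toy example group "a" receives positive predictions three times as often as group "b", so both ratios fall well below 1.0. Under this reading, a model trained on more balanced data would push both values toward 1.0.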
| Item Type: | Article |
|---|---|
| Subjects: | Research Scholar Guardian > Engineering |
| Depositing User: | Unnamed user with email support@scholarguardian.com |
| Date Deposited: | 02 Dec 2024 06:41 |
| Last Modified: | 02 Dec 2024 06:41 |
| URI: | http://science.sdpublishers.org/id/eprint/2959 |