Leveraging Synthetic Data as a Tool to Combat Bias in Artificial Intelligence (AI) Model Training

Fabuyi, Jumai Adedoja (2024) Leveraging Synthetic Data as a Tool to Combat Bias in Artificial Intelligence (AI) Model Training. Journal of Engineering Research and Reports, 26 (12). pp. 24-46. ISSN 2582-2926


Abstract

This study investigates the efficacy of synthetic data in mitigating bias in artificial intelligence (AI) model training, with a focus on demographic inclusivity and fairness. Synthetic datasets were generated with Generative Adversarial Networks (GANs) from the UCI Adult Dataset, the COMPAS Recidivism Dataset, and the MIMIC-III Clinical Database. Logistic regression models were trained on both the synthetic and the original datasets and compared on fairness metrics and predictive performance. Fairness was assessed through demographic parity, which measures whether positive predictions occur at balanced rates across demographic groups, and equality of opportunity, which measures whether individuals with a positive true outcome receive positive predictions at equal rates across those groups. Fidelity and data diversity were evaluated with the Kolmogorov-Smirnov (KS) test and Kullback-Leibler (KL) divergence, together with the Inception Score, which quantifies the diversity of the synthetic data. Models trained on synthetic datasets showed significant fairness improvements. For the COMPAS dataset, demographic parity increased from 0.72 to 0.89 and equality of opportunity rose from 0.65 to 0.83, with minimal loss of predictive performance (AUC-ROC of 0.82, compared to 0.83 on the original data). Based on these findings, this research recommends using GANs to generate synthetic data in bias-sensitive domains to enhance demographic inclusivity and promote equitable outcomes in AI models. Integrating human-in-the-loop (HITL) systems is also critical for monitoring and addressing residual biases during data generation. Standardized validation frameworks, including fairness metrics and fidelity tests, should be adopted to ensure transparency and consistency across applications. These practices can enable organizations to leverage synthetic data effectively while maintaining ethical standards in AI development and deployment.
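The abstract reports demographic parity and equality of opportunity as single numbers without giving their formulas. The minimal sketch below assumes both are expressed as the ratio of the lowest to the highest group rate, so that 1.0 means perfect parity; this is an assumption chosen because higher values are better in the reported figures, and the paper may define the metrics differently (for example as differences). The function names and the toy data are hypothetical.

```python
from typing import Hashable, Sequence


def _ratio(rates: list[float]) -> float:
    """Lowest-to-highest rate ratio; 1.0 means every group has the same rate."""
    highest = max(rates)
    return 1.0 if highest == 0 else min(rates) / highest


def demographic_parity_ratio(y_pred: Sequence[int], group: Sequence[Hashable]) -> float:
    """Compare P(prediction = 1 | group) across demographic groups."""
    rates = []
    for g in set(group):
        preds = [p for p, gg in zip(y_pred, group) if gg == g]
        rates.append(sum(preds) / len(preds))
    return _ratio(rates)


def equal_opportunity_ratio(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[Hashable]
) -> float:
    """Compare true-positive rates, P(prediction = 1 | label = 1, group), across groups."""
    tprs = []
    for g in set(group):
        preds_on_positives = [
            p for t, p, gg in zip(y_true, y_pred, group) if gg == g and t == 1
        ]
        if not preds_on_positives:
            raise ValueError(f"group {g!r} has no positive labels; TPR is undefined")
        tprs.append(sum(preds_on_positives) / len(preds_on_positives))
    return _ratio(tprs)


# Toy example: binary predictions for two hypothetical groups "a" and "b".
y_true = [1, 1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_ratio(y_pred, group))  # 0.25 / 0.75, about 0.33
print(equal_opportunity_ratio(y_true, y_pred, group))  # (1/3) / (2/3), about 0.5
```

Under this convention, comparing the two ratios for a model trained on original data and one trained on synthetic data, scored on the same test set, yields the kind of before/after comparison the abstract describes.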
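The abstract evaluates fidelity with the Kolmogorov-Smirnov (KS) test and Kullback-Leibler (KL) divergence but does not say how they were applied column by column. A minimal standard-library sketch follows, assuming KS is applied to a numeric column (for example age in the UCI Adult Dataset) and KL to the category frequencies of a categorical column (for example sex). The function names and the epsilon smoothing constant are assumptions, and only the KS statistic is computed here; scipy's `scipy.stats.ks_2samp` also returns a p-value.

```python
import bisect
import math
from collections import Counter
from typing import Hashable, Sequence


def ks_statistic(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs."""
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those points suffices.
    for x in set(a) | set(b):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def kl_divergence(
    real: Sequence[Hashable], synthetic: Sequence[Hashable], epsilon: float = 1e-9
) -> float:
    """KL(P_real || Q_synthetic) in nats for one categorical column."""
    p_counts, q_counts = Counter(real), Counter(synthetic)
    total = 0.0
    for category, count in p_counts.items():
        p = count / len(real)
        # epsilon (an assumed smoothing constant) avoids log(p / 0) for unseen categories
        q = max(q_counts[category] / len(synthetic), epsilon)
        total += p * math.log(p / q)
    return total


# Hypothetical real and synthetic values for one numeric and one categorical column.
real_age = [25, 32, 47, 51, 38]
synthetic_age = [27, 33, 45, 52, 60]
print(ks_statistic(real_age, synthetic_age))
print(kl_divergence(["Male", "Male", "Female", "Female"],
                    ["Male", "Female", "Female", "Female"]))  # about 0.144
```

Both measures are 0 when the synthetic column reproduces the real distribution exactly, and grow as the two distributions drift apart.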
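The abstract compares predictive performance with AUC-ROC (0.82 for the model trained on synthetic COMPAS data versus 0.83 on the original data). As an illustration of what that number measures, here is a minimal sketch of its rank-based definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name is hypothetical, and this O(n*m) pairwise version is for clarity only; scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.

```python
from typing import Sequence


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out labels and predicted probabilities from a logistic regression model.
print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Scoring a model trained on original data and one trained on synthetic data against the same held-out labels gives a like-for-like comparison of the kind reported in the abstract.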

Item Type: Article
Subjects: Research Scholar Guardian > Engineering
Depositing User: Unnamed user with email support@scholarguardian.com
Date Deposited: 02 Dec 2024 06:41
Last Modified: 02 Dec 2024 06:41
URI: http://science.sdpublishers.org/id/eprint/2959
