From principles to practice: Why ethical AI starts with data


Huge datasets are the cornerstone of artificial intelligence (AI) systems, including the large language models (LLMs) that power chatbots and other generative AI applications. These datasets are used to train AI systems to perform diverse tasks, such as analyzing text or images, or providing agricultural advice. However, AI developers often pay little attention to where the data originate, which in most cases is people and communities. Those data are frequently collected and used to train AI models without the permission or knowledge of the original sources. A related problem is that some datasets underrepresent or exclude certain groups, such as women, and AI models trained on such data can produce biased, discriminatory outcomes.

That is why there should be no AI without data ethics. The way data are collected, labeled, and used in AI apps should reflect values such as fairness, transparency, accountability, and respect for privacy. When these values are overlooked in AI development and deployment, people and communities may be harmed in various ways.
