California Assembly Bill 2013 (AB 2013), Generative Artificial Intelligence: Training Data Transparency, was signed into law on September 28, 2024, after approval by both the State Assembly and the State Senate.
The law introduces transparency requirements for generative AI (GenAI) system developers. It mandates that developers publicly disclose information about the data used to train their GenAI systems and services. GenAI systems and services developed for national security, military, or defense purposes are exempt from these requirements.
The law addresses growing regulatory and public concerns around model bias, privacy, and other questions of ethical accountability. It is a first step toward requiring developers to be more transparent about how their systems are built, helping Californians better understand how AI systems work while promoting responsible innovation.
Read on to learn about the law in greater detail.
Who Does the Law Apply To?
The law applies to developers of generative artificial intelligence (AI) systems or services, including entities that substantially modify such systems. The term "developer" includes any person, partnership, state or local government agency, or corporation that designs, codes, produces, or substantially modifies an AI system or service for use by members of the public. The term "members of the public" does not include:
- Affiliates: entities that, directly or indirectly, through one or more intermediaries, control, are controlled by, or are under common control with, another entity. This means the requirement to post public documentation under AB 2013 applies only when AI systems are made available outside an organization's internal or affiliated network.
- Members of a hospital's medical staff.
The phrase “substantially modifies” means creating a new version, new release, or other update to a generative artificial intelligence system or service that materially changes its functionality or performance, including the results of retraining or fine-tuning.
What Does It Regulate?
The law regulates “generative artificial intelligence,” defined as “artificial intelligence that can generate derived synthetic content, such as text, images, video, and audio, that emulates the structure and characteristics of the artificial intelligence’s training data.” The regulation applies to systems or services released on or after January 1, 2022.
Obligations on Developers
Developers are required to post specific documentation about the training data on their public websites by January 1, 2026 (or before making a substantially modified system or service publicly available). The documentation must include the following, illustrated in the sketch after this list:
- Sources or owners of the datasets.
- A description of how the datasets align with the intended purpose of the AI system.
- Number and types of data points in the datasets.
- Whether the datasets contain copyrighted, trademarked, patented, or public domain information.
- Whether the developer purchased or licensed the datasets.
- Whether the datasets include ‘personal information’ or ‘aggregate consumer information’.
- Whether the developer cleaned, processed, or modified the datasets and the intended purpose of those efforts in relation to the AI system or service.
- The time period during which the data in the datasets was collected, including a notice if the data collection is ongoing.
- Information about synthetic data generation, if used.
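For teams building compliance tooling, a disclosure like this can be captured as a structured record. The following is a minimal sketch in Python; the TrainingDataDisclosure class, its field names, and the example dataset are illustrative assumptions, not a format defined by AB 2013.

```python
from __future__ import annotations

from dataclasses import dataclass, asdict
from datetime import date
import json


@dataclass
class TrainingDataDisclosure:
    """Hypothetical record of the dataset details AB 2013 asks developers to post.

    The statute specifies what must be disclosed, not how the documentation
    is structured; this layout is an assumption for illustration.
    """
    dataset_name: str
    sources_or_owners: list[str]
    purpose_alignment: str                      # how the dataset serves the system's intended purpose
    num_data_points: int
    data_point_types: list[str]                 # e.g. ["text", "image"]
    contains_copyrighted: bool
    contains_trademarked: bool
    contains_patented: bool
    public_domain_only: bool
    purchased_or_licensed: bool
    contains_personal_information: bool
    contains_aggregate_consumer_information: bool
    cleaning_or_processing: str                 # description of cleaning/processing and its purpose
    collection_start: date
    collection_end: date | None                 # None when collection is ongoing
    collection_ongoing: bool
    synthetic_data_used: bool
    synthetic_data_description: str = ""


def to_public_json(disclosures: list[TrainingDataDisclosure]) -> str:
    """Serialize the disclosures for posting on the developer's public website."""
    return json.dumps([asdict(d) for d in disclosures], default=str, indent=2)


# Example entry for a hypothetical dataset.
example = TrainingDataDisclosure(
    dataset_name="example-web-text-corpus",
    sources_or_owners=["Example Data Co."],
    purpose_alignment="General-purpose text used to train the model's base capabilities.",
    num_data_points=1_200_000,
    data_point_types=["text"],
    contains_copyrighted=True,
    contains_trademarked=False,
    contains_patented=False,
    public_domain_only=False,
    purchased_or_licensed=True,
    contains_personal_information=False,
    contains_aggregate_consumer_information=False,
    cleaning_or_processing="Deduplicated and filtered for quality prior to training.",
    collection_start=date(2023, 1, 1),
    collection_end=date(2024, 6, 30),
    collection_ongoing=False,
    synthetic_data_used=False,
)

print(to_public_json([example]))
```

Serializing the record to JSON, as shown, is one way a developer could generate the public-facing documentation page from internal dataset metadata.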
Exemptions
Certain AI systems or services are exempt from the training data transparency requirements:
- AI systems or services solely used for security and integrity purposes.
- AI systems or services used for the operation of aircraft in the national airspace.
- AI systems or services developed for national security, military, or defense purposes and made available only to a federal entity.
Key Takeaway
Maintaining a data provenance record is crucial for compliance with Assembly Bill 2013, which mandates transparency regarding the datasets used to train generative AI systems. By accurately tracking each dataset's origin, ownership, modifications, and usage, businesses can meet the law’s requirements to disclose how the data supports the AI system's functionality, whether it contains personal or aggregate consumer information, and whether any synthetic data was used.
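As a rough illustration of what such a provenance record might look like in practice, the sketch below logs each action taken on a dataset so the history can later be summarized for the public documentation. The ProvenanceLog and ProvenanceEvent classes and their methods are hypothetical, not part of any mandated format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceEvent:
    """One recorded action taken on a dataset (hypothetical structure)."""
    timestamp: datetime
    action: str        # e.g. "acquired", "cleaned", "filtered", "used-for-fine-tuning"
    description: str


@dataclass
class ProvenanceLog:
    """Running history of a dataset's origin and modifications."""
    dataset_name: str
    source: str
    events: list = field(default_factory=list)

    def record(self, action: str, description: str) -> None:
        """Append a timestamped event to the dataset's history."""
        self.events.append(
            ProvenanceEvent(datetime.now(timezone.utc), action, description)
        )

    def summary(self) -> str:
        """Human-readable history that could feed the public AB 2013 documentation."""
        lines = [f"{self.dataset_name} (source: {self.source})"]
        lines += [f"  {e.timestamp:%Y-%m-%d} {e.action}: {e.description}" for e in self.events]
        return "\n".join(lines)


# Example usage with a hypothetical dataset.
log = ProvenanceLog("example-web-text-corpus", "Example Data Co.")
log.record("acquired", "Licensed from Example Data Co.")
log.record("cleaned", "Removed duplicates and low-quality records.")
print(log.summary())
```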