Introduction
The EU AI Act is the first comprehensive regulation of its kind, laying down harmonised rules on artificial intelligence. It entered into force on August 1, 2024. Chapter V of the Act sets out the obligations of providers of general-purpose artificial intelligence (GPAI) models, which took effect on August 2, 2025.
Article 53(1)(d) of the EU AI Act obligates providers of GPAI models to create and publish a detailed summary of the GPAI model’s training data. This summary must follow a template provided by the AI Office. All providers of GPAI models, including those that release their models under free and open-source licenses, are required to fulfill this obligation.
To support this obligation, the European Commission released the Explanatory Notice and Template for the Public Summary of Training Content for General-Purpose AI (GPAI) Models on July 24, 2025.
What Is the Objective of the Summary?
The summary aims to make the data used to train GPAI models transparent. With the template in place, GPAI model providers are required to publish a summary of their training data in a consistent, comparable format.
The summary will help various parties, especially copyright holders and data subjects, exercise and enforce their rights. Moreover, it will assist downstream providers in assessing data diversity to prevent bias, allow researchers to evaluate risks, and promote a more competitive market.
The summary must be comprehensive, covering data from all training stages, but it is not required to be overly technical. Providers are encouraged to voluntarily disclose more details to help copyright holders verify if their content was used for training.
What Must Be Disclosed
The European Commission’s template outlines three main sections:
1. General Information
This section contains basic information such as the name and contact details of the provider and its authorised representative, the versioned model name(s), model dependencies, and the date of placement on the EU market.
It also covers the modalities present in the training data (text, image, video, and audio), the size of each modality reported within broad ranges, and a description of the types of content included in the training data (e.g., fiction, press publications, photography, audiobooks, music videos).
2. List of Data Sources
To ensure the summary covers all the content used to train the model, this section requires disclosure of the main data sources, including:
- The large private or public datasets used,
- A detailed narrative description of the data scraped online by the provider or on its behalf (including a summary of the most relevant domain names scraped), and
- A narrative description of all other data sources used (such as user data or synthetic data).
3. Relevant Data Processing Aspects
This section requires disclosure of the methods and steps used to process the data before model training. This is particularly important for complying with EU law on copyright and related rights (including respect for opt-outs under the text and data mining rules), as well as for removing illegal content, which reduces the risk that the GPAI model reproduces and distributes it widely.
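As an illustration only, the three sections described above could be mirrored in a provider's internal tooling along the following lines. This is a minimal sketch: the class and field names are hypothetical and are not the official field names of the Commission's template, and the completeness check is just one possible way to spot blank entries before publication.

```python
from dataclasses import dataclass, field


@dataclass
class GeneralInformation:
    provider_name: str
    model_names: list[str]
    modalities: list[str]      # e.g. "text", "image", "video", "audio"
    eu_market_date: str = ""   # ISO date; empty if not yet placed on the market


@dataclass
class DataSources:
    large_datasets: list[str] = field(default_factory=list)
    scraped_data_description: str = ""
    other_sources_description: str = ""


@dataclass
class DataProcessing:
    copyright_opt_out_measures: str = ""
    illegal_content_removal: str = ""


@dataclass
class TrainingSummary:
    """Hypothetical internal record mirroring the template's three sections."""
    general: GeneralInformation
    sources: DataSources
    processing: DataProcessing

    def empty_fields(self) -> list[str]:
        """Return dotted names of fields that are still blank."""
        missing = []
        for section_name in ("general", "sources", "processing"):
            section = getattr(self, section_name)
            for key, value in vars(section).items():
                if not value:
                    missing.append(f"{section_name}.{key}")
        return missing
```

A record like this can be checked before publication, so that any blank entry is either filled in or explicitly explained in the public summary.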
Who is Affected and When
All GPAI model providers, including providers of GPAI models with systemic risk, must publish their summary no later than when the model is placed on the EU market. Providers of models made available under free and open-source licenses are also subject to this obligation.
The requirement to publish the training data summary applies from August 2, 2025. Providers of GPAI models placed on the market before that date must ensure the summary is published no later than August 2, 2027. Providers must also explicitly identify and explain any information missing from the public summary. This applies where, despite their best efforts, the information is unavailable or would be unreasonably burdensome to retrieve.
The summary must be posted on the provider's official website in an easily readable format, explicitly identifying the model or models (and, where applicable, the model version or versions) that the summary covers. It is recommended that the summary also be made publicly accessible alongside the model through all of its public distribution channels, including online platforms.
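The timing rule above reduces to a simple date comparison. The sketch below is a simplified reading of the dates stated in this article, not legal advice; the function and constant names are hypothetical.

```python
from datetime import date

# Dates as stated in the article.
OBLIGATION_START = date(2025, 8, 2)   # publication requirement applies from here
LEGACY_DEADLINE = date(2027, 8, 2)    # latest date for models placed earlier


def summary_deadline(placed_on_market: date) -> date:
    """Return the latest date by which the summary should be public."""
    if placed_on_market < OBLIGATION_START:
        # Model was already on the market: transition period applies.
        return LEGACY_DEADLINE
    # Otherwise the summary is due by the time the model is placed on the market.
    return placed_on_market
```

For example, a model placed on the market in March 2024 would have until August 2, 2027, while a model placed in September 2025 would need its summary published by its release date.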
Enforcement
Beginning August 2, 2026, the AI Office will supervise and enforce the obligations for GPAI models. GPAI model providers are required to update their summaries at least every six months, or sooner whenever fine-tuning or additional training materially changes the training data. Non-compliance can result in substantial penalties, with fines of up to 3% of a provider's total global annual turnover or €15 million, whichever is higher.
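The "whichever is higher" rule for the fine ceiling can be worked through with simple arithmetic. The sketch below uses whole euros and integer arithmetic to avoid rounding surprises; the function name is hypothetical, and the result is only the upper limit stated above, not a prediction of any actual fine.

```python
FIXED_CAP_EUR = 15_000_000   # flat ceiling stated in the article
TURNOVER_PERCENT = 3         # percentage of total global annual turnover


def max_fine_eur(annual_turnover_eur: int) -> int:
    """Return the upper limit of the fine: the higher of 3% of turnover or EUR 15 million."""
    turnover_based = annual_turnover_eur * TURNOVER_PERCENT // 100
    return max(turnover_based, FIXED_CAP_EUR)
```

Under this rule the €15 million figure sets the ceiling for any provider with a turnover below €500 million, since 3% of €500 million is exactly €15 million. A provider with a turnover of €2 billion would face a ceiling of €60 million.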
The Broader Outlook
Transparency is at the core of the EU AI Act, which places obligations on AI model providers, developers, deployers, distributors, and other applicable stakeholders.
The Explanatory Notice is a step in the right direction. It gives the general public and copyright holders deeper insight into AI training data without requiring providers to disclose the raw data itself, business-critical trade secrets, or sensitive data.
With the template in place, copyright holders can better assess whether their copyrighted material has been used to train an AI model. The template also establishes a common standard for internal documentation, providing greater transparency across teams and for the relevant authorities.