EDPB Report on ChatGPT Taskforce: Navigating GDPR Compliance for LLMs

Contributors

Syed Tatheer Kazmi

Associate Data Privacy Analyst, Securiti

CIPP/Europe

Maria Khan

Data Privacy Legal Manager at Securiti

FIP, CIPT, CIPM, CIPP/E

The widespread adoption of large language models (LLMs) has been a notable trend in recent years, driven by rapid advances in artificial intelligence, with applications across multiple domains and industries. These models rely heavily on web scraping, the automated collection and extraction of publicly available data from the web, which is then used for training purposes.

Since these models are trained on vast publicly available datasets that often contain personal data, their development and deployment must adhere to the requirements of the General Data Protection Regulation (GDPR) and other applicable data protection regulations. Over the last few months, data protection authorities across Europe, including those of the Netherlands, Italy, and the UK, have released guidance on web scraping for LLMs to help developers comply with data protection standards. Notably, Meta recently halted the use of public content shared by adults on Facebook and Instagram across the EU/EEA for training its AI model, following directions from the Irish Data Protection Authority (DPA).

While the EU AI Act aims to establish a legal framework for the deployment and use of AI systems, it should be read together with the GDPR if the processing of personal data is involved. On May 23, 2024, the European Data Protection Board (EDPB) published a report outlining the key takeaways from the ChatGPT Task Force's work (EDPB Report). While not formally designated as guidance, this Report provides valuable insights that will likely influence the evaluation of AI systems' compliance with the GDPR.

Although the EDPB Report is specific to ChatGPT, it sets out several major takeaways and actionable insights for LLMs in general, which are as follows:

(1) Ensure data protection and security safeguards:

  • When processing personal data, LLMs must be designed and deployed to ensure accountability and data protection by design, prioritizing compliance with GDPR requirements at all processing stages. Controllers should proactively implement the necessary safeguards and measures to protect personal data and cannot invoke technical impossibility to justify non-compliance with the requirements of the GDPR.
  • LLM providers must adopt technical measures that define precise data collection criteria and ensure that certain data categories are not collected, or that certain sources (such as public social media profiles) are excluded from collection. Appropriate technical measures should also be adopted to delete or anonymize personal data collected via web scraping before the training stage (a minimal filtering sketch follows this list).
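As a practical illustration of these measures, below is a minimal Python sketch of collection-time source exclusion and pre-training anonymization. The excluded domains, the record shape ({"url": ..., "text": ...}), and the redaction patterns are illustrative assumptions only, not criteria prescribed by the EDPB Report; a production pipeline would rely on controller-specific criteria and far more robust personal data detection.

```python
import re
from urllib.parse import urlparse

# Illustrative exclusion list and redaction patterns -- each controller must
# define its own collection criteria; these are placeholders, not EDPB-mandated values.
EXCLUDED_DOMAINS = {"facebook.com", "instagram.com"}  # e.g. public social media profiles
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def is_excluded_source(url: str) -> bool:
    """Return True if the record comes from a source excluded from collection."""
    host = urlparse(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)


def anonymize(text: str) -> str:
    """Redact common direct identifiers before the training stage."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


def prepare_training_corpus(scraped_records: list[dict]) -> list[str]:
    """Apply collection criteria, then anonymize the records that remain."""
    corpus = []
    for record in scraped_records:  # each record: {"url": ..., "text": ...}
        if is_excluded_source(record["url"]):
            continue  # drop records from excluded sources entirely
        corpus.append(anonymize(record["text"]))
    return corpus
```

Filtering at collection time and anonymizing before training reflects the Report's emphasis on applying safeguards at every processing stage rather than after the fact.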

(2) Ensure lawful basis for collection and processing of personal data:

  • Although the EDPB Report does not state whether consent is an appropriate lawful basis in the context of web scraping, it does note that each processing of personal data must rest on at least one of the lawful bases specified in Article 6(1) of the GDPR or, in the case of sensitive personal data, one of the conditions in Article 9(2) of the GDPR. In practice, the data subject's consent is unlikely to be an appropriate lawful basis for web scraping, given the large scale of data collection and the difficulty of identifying the data subjects whose data is scraped. Similarly, performance of a contract is unlikely to be an appropriate legal basis, as there is no contractual relationship between the data subject and the data controller that requires the data subject to provide his or her personal data.
  • The EDPB Report recognizes that OpenAI relies on legitimate interests as the legal basis for the collection and processing of personal data to train ChatGPT. The EDPB Report highlights that controllers of LLMs must establish a valid legal basis for the collection and processing of personal data, including data scraped from the web and user inputs. To rely on legitimate interests as a legal basis for data processing, the following three tests must be met:
    • The purpose test, which requires the data controller (or a third party to whom the data is disclosed) to pursue a legitimate interest and to identify the specific purpose of the data processing,
    • The necessity test, which requires that the processing of personal data is necessary for the legitimate interest pursued, and
    • The balancing test, which requires the data controller to balance its legitimate interests against the fundamental rights and freedoms of data subjects.

In this Report, the EDPB highlighted the need to balance the data controller's interests with users' privacy rights and to implement safeguards that ensure compliance with the GDPR.

(3) Ensure the protection of sensitive personal data:

  • Article 9 of the GDPR lays down the conditions under which the processing of sensitive personal data can take place. One such ground applies where the processing relates to sensitive personal data that has been manifestly made public by the data subject. Even though LLMs rely on publicly available data, the EDPB has cautioned that it is important to ascertain whether the data subject intended, explicitly and by clear affirmative action, to make that sensitive personal data accessible to the general public; the mere fact that personal data is publicly accessible does not imply that it was manifestly made public by the data subject. This view is consistent with the Advocate General's Opinion in Schrems v. Meta, in which the Advocate General noted that a statement made by a person about his or her sexual orientation during a panel discussion open to the public does not in itself permit the aggregation and analysis of data concerning that person's sexual orientation for personalized advertising purposes.
  • To ensure that only appropriate data is collected and retained, sensitive personal data categories should be filtered out, both during data collection (through criteria selecting what data is collected) and immediately after data collection (by deleting such data); a minimal two-stage filter is sketched after this list.
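A minimal sketch of that two-stage approach is shown below, assuming records arrive as dictionaries with a "text" field. The keyword lists are purely illustrative placeholders; in practice a controller would more likely rely on a trained classifier or data discovery tooling to detect special-category (Article 9) data.

```python
# Stage 1: collection criteria -- only collect from pre-approved sources.
# Stage 2: post-collection cleanup -- delete records flagged as containing
# special-category (Article 9) data. Keyword matching here is a placeholder
# for a real classifier or data discovery tool.
SENSITIVE_KEYWORDS = {
    "health": ["diagnosis", "prescription", "medical record"],
    "political_opinions": ["party membership", "voting record"],
    "sexual_orientation": ["sexual orientation"],
}


def meets_collection_criteria(url: str, allowed_url_patterns: list[str]) -> bool:
    """Stage 1: collect only from sources matching pre-defined criteria."""
    return any(pattern in url for pattern in allowed_url_patterns)


def contains_sensitive_data(text: str) -> bool:
    """Flag text that appears to contain special-category data."""
    lowered = text.lower()
    return any(kw in lowered for kws in SENSITIVE_KEYWORDS.values() for kw in kws)


def post_collection_cleanup(records: list[dict]) -> list[dict]:
    """Stage 2: delete flagged records immediately after collection."""
    return [r for r in records if not contains_sensitive_data(r["text"])]
```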

(4) Ensure user transparency and fairness:

  • LLM providers relying on legitimate interests as the legal basis for using inputs and uploaded files from data subjects (referred to as "Content") for training purposes must clearly and demonstrably inform data subjects about this practice. Users must be aware that their Content may be used for such purposes and must be given the option to opt out (a minimal opt-out check is sketched after this list). The EDPB further indicates that if input data becomes part of the data model, OpenAI remains responsible for compliance with the GDPR and should not shift the onus to data subjects by placing a clause in the Terms and Conditions stating that data subjects are responsible for their chat inputs.
  • The controller must provide proper information on the probabilistic mechanisms by which output is created and on their limited level of reliability, including an explicit reference to the fact that the generated text may be biased or made up. Where notification to data subjects is not possible, the controller should take appropriate measures to protect data subjects' rights and freedoms, including making the information publicly available.
  • LLMs must ensure that their data processing practices are fair and do not unfairly shift responsibility to users. It is important to provide clear information to users about how their data is used, especially regarding AI training. Moreover, personal data should not be processed in a detrimental, discriminatory, unjustifiable, unexpected, or misleading manner.
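The opt-out mentioned above can be enforced with a simple gate before any user Content reaches a training set. The sketch below is a hypothetical illustration: the UserContent structure and its training_opt_out flag are assumptions made for this example, not a documented OpenAI or EDPB mechanism.

```python
from dataclasses import dataclass


@dataclass
class UserContent:
    user_id: str
    text: str
    training_opt_out: bool  # hypothetical flag recorded when the user opts out


def select_training_content(contents: list[UserContent]) -> list[str]:
    """Include user Content in training data only if the user has not opted out."""
    return [c.text for c in contents if not c.training_opt_out]
```

Enforcing the opt-out at the point where training data is assembled keeps responsibility with the controller rather than shifting it to users through terms and conditions.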

(5) Ensure data subjects’ rights fulfillment:

  • The EDPB emphasized the importance of making it easy for users to exercise their GDPR rights, such as accessing, correcting, and deleting their data. LLM providers must offer straightforward mechanisms for users to manage their data and make decisions about automated processing, and should continue to improve the modalities and interfaces that facilitate the exercise of data subject rights.

Based on the aforementioned takeaways identified in the EDPB Report on the ChatGPT Taskforce, LLM providers must implement appropriate data protection measures both at the time the means of data processing are determined and at the time of the processing itself, and must integrate the necessary safeguards for the protection of data subjects' rights. The EDPB Report indicates that web scraping on the basis of the data controller's legitimate interests is possible if appropriate technical measures are in place.
