Securiti Tops DSPM ratings by GigaOm

View

AB 1008: California’s Move to Regulate AI and Personal Data

Published September 27, 2024

Listen to the content

As artificial intelligence (AI) continues revolutionizing industries, concerns about data privacy are becoming increasingly critical, especially as AI systems increasingly rely on vast datasets often containing personal information. In a major step toward regulating AI, the California Senate passed Assembly Bill 1008 (AB 1008) on August 30, 2024, which was subsequently signed into law by the Governor on September 28, 2024.

This law expands the definition of personal information under the California Privacy Rights Act (CPRA) to include a wide array of formats, including AI models. Thus, it broadens the scope of privacy protections to data utilized by automated systems and machine learning models or large language models (LLMs).

It will impose new requirements on businesses using AI, significantly altering the governance and management of AI models, particularly those trained on personal information.

In this blog, we will explore the key provisions of AB 1008, what they mean for AI developers and users, their broader implications for data privacy and compliance, and how Securiti Genstack AI automation enables enterprises to ensure swift compliance with AI regulations.

Understanding AB 1008

California's AB 1008 introduces additional privacy law obligations for AI systems trained on personal information. This bill ensures that AI models, particularly LLMs, comply with the CPRA by expanding the scope of privacy law to include personal information processed within these systems. Below are the key changes introduced by AB 1008:

Expanded Definition of “Personal Information”

AB 1008 revises the definition of "personal information" under the CPRA to include "abstract digital formats," such as data processed in AI systems capable of outputting personal information. This includes model weights, tokens, or other outputs, derived from a consumer’s personal information that could produce an output linked to that individual.

This change significantly impacts AI systems, particularly LLMs, trained on personal information, by subjecting them to CPRA obligations as those dealing with conventional forms of personal information.

Biometric Data Protection

AB 1008 clarifies that biometric data- including fingerprints, facial recognition data, and iris scans- collected without a consumer’s knowledge is not considered publicly available information (which is exempted under CPRA) and must be treated as personal information under CPRA.

This is especially important for businesses using AI systems for facial recognition, voice analysis, or other biometric data processing. Even if collected in public, such data remains protected under the CPRA, requiring businesses to comply with privacy regulations, including obtaining consent and respecting consumers' data rights.

Consumer Rights Over AI Models

Following AB 1008, the business obligations regarding CPRA will continue beyond a model’s training phase. Even after their personal information has been used to train a machine-learning model, consumers have the right to access, delete, correct, and restrict the sale or sharing of personal data contained within the trained AI system as tokens or model weights.

Neural Data as Sensitive Personal Information

SB 1223, passed alongside AB 1008, introduces neural data as a category of sensitive personal information. Neural data refers to information generated from measuring a consumer’s central or peripheral nervous system activity. This means that AI models utilizing neural data will be subject to even stricter data protection obligations under the CPRA.

Implications for AI Developers and Companies

AB 1008 poses several challenges for AI developers and businesses who rely on AI models trained on personal information:

Cost of Compliance

It may be costly and time-consuming to retrain AI models after each consumer data request, particularly for enterprises that handle big volumes of data, such as LLMS created by Google, OpenAI, and other digital behemoths, which could need expensive and time-consuming retraining cycles.

Technical Feasibility

Organizations are required to respond to data subject requests within 90 days, which can pose significant operational challenges. While retraining smaller models within this timeframe may be feasible, meeting these requirements for large language models is much more difficult. This presents serious technological hurdles as the retraining process for LLMs requires extensive computing resources, specialized hardware, and time.

Operational Challenges in Data Management

Managing DSR requests to access, delete, or correct personal data in AI systems introduces significant operational complexity. Businesses will need to track the flow of personal information from collection to AI model outputs. Keeping track of which personal information is used in training datasets and AI outputs can be difficult, especially in complex systems involving third-party data providers, vendors, or multiple data sources.

Data Integrity

Another technological challenge is ensuring an AI model retains its performance and integrity while honoring DSR requests. Deleting or correcting individual data points may affect an AI system's general accuracy and behavior, which might reduce the system's overall effectiveness.

Data Privacy as a Design Consideration

AB 1008 will probably require AI developers and companies to include data privacy protections from the very beginning of AI development. The importance of privacy-by-design strategies—which remove or anonymize personal information from training datasets—will only grow, and they might result in non-personalized AI outputs.

Do AI Models Contain Personal Information?

The question of whether AI models contain personal information has already sparked debate among regulators. Currently European authorities are divided about this issue.

For instance, the Hamburg Data Protection Authority (DPA) has adopted a different stance, maintaining that LLMs do not contain personal data and are, therefore, not subject to data subject rights such as deletion or correction.

This position contrasts with California’s stance under AB 1008 which treats AI models as potential repositories of personal information, thereby subjecting them to consumer privacy rights and regulatory obligations. This stance was solidified when the California Privacy Protection Agency (CPPA) voted to support the bill, following a staff position paper that emphasized the need to regulate AI models under existing privacy laws.

This discrepancy between California and European perspectives may make compliance more challenging for international companies. Organizations must implement adaptable and dynamic data management practices that comply with local regulations to successfully navigate these diverse regulatory landscapes.

Possible Solution: Sanitizing the AI Data Pipeline

Following the enactment of AB 1008, businesses must take proactive measures to navigate the compliance complexities it introduces. One effective strategy is sanitizing the AI data pipeline during the training phase, ensuring that personal information is not used to train AI models in the first place. This approach could avoid the need for costly retraining in response to consumer requests.

For example, businesses can adopt data anonymization, de-identification, or synthetic data generation techniques that allow them to train AI models without personal information. This would not only ensure compliance with AB 1008 but also reduce the costs and operational challenges associated with retraining models.

Accelerate AI Compliance with Securiti Genstack AI

Despite the challenges presented by AB 1008, the main challenge in scaling enterprise generative AI systems is securely connecting to diverse data systems while maintaining controls and governance throughout the AI pipeline.

Large enterprises orchestrating GenAI systems face several challenges: securely processing extensive structured and unstructured datasets, safeguarding data privacy, managing sensitive information, protecting GenAI models from threats like data poisoning and prompt injection, and performing these operations at scale.

Securiti’s Genstack AI Suite removes the complexities and risks inherent in the GenAI lifecycle, empowering organizations to swiftly and safely utilize their structured and unstructured data anywhere with any AI and LLMs. It provides features such as secure data ingestion and extraction, data masking, anonymization, redaction, and indexing and retrieval capabilities.

Additionally, it facilitates the configuration of LLMs for Q&A, inline data controls for governance, privacy, and security, and LLM firewalls to enable the safe adoption of GenAI.

Securiti’s Genstack AI enables organizations to:

  • Streamlines data connectivity: Genstack AI simplifies the connection to hundreds of data systems (unstructured and structured data), ensuring seamless integration across diverse data environments, including public, private, SaaS, and data clouds.
  • Accelerates AI pipeline development: Enables faster construction of generative AI pipelines by supporting popular Vector databases (DBs), large language models (LLMs), and AI prompt interfaces.
  • Secure deployment: Facilitates the secure deployment of enterprise-grade generative AI systems by maintaining data governance, security, and compliance controls throughout the AI pipeline.
  • Comprehensive and flexible solution: Genstack AI offers multiple components that can be used collectively for end-to-end enterprise retrieval-augmented generation (RAG) systems or individually for various AI use cases.
  • Enterprise-grade AI: Designed specifically to meet the needs of enterprises, ensuring that generative AI systems are safe, scalable, and compliant with industry regulations.
  • Data Sanitization: Classification and redaction of sensitive data on the fly, ensuring data privacy and compliance policies are properly enforced before data is fed to the AI models.
  • Data Vectorization and Integration: Turn data into custom embeddings with associated metadata and load them to your chosen vector database, making your enterprise data ready for LLM to use.
  • LLM Model Selection: Select from a wide range of vector databases and LLM models to build an AI system that aligns with your business goals and operational requirements for a specific use case.
  • LLM Firewalls: Protect AI interactions, including prompts, responses, and data retrievals with context-aware LLM firewalls. Custom and pre-configured policies block malicious attacks, prevent sensitive data leaks, ensure your AI systems align with corporate policies, and preserve access entitlements to documents and files.

Securiti Genstack AI enables organizations to accelerate their transition from generative AI POCs to production by ensuring the safe use of enterprise data, alignment with corporate policies, compliance with evolving AI laws, and continuous monitoring and enforcement of guardrails.

What’s Next for AB 1008?

With AB 1008 now signed into law, the potential implications on businesses using AI are obvious: compliance will become more complex, and the cost of managing AI systems trained on personal data could rise significantly. Companies operating in California will need to rethink their data strategies, prioritizing privacy-first approaches and adopting technologies that allow for easy removal, correction, and management of personal information in AI models. This law could set a precedent for other states or countries, pushing the global conversation on how AI systems handle personal data.

Join Our Newsletter

Get all the latest information, law updates and more delivered to your inbox


Share


More Stories that May Interest You

What's
New