GenAI Governance: Why Entitlements Matter More Than Ever

Published August 16, 2024

Generative AI (GenAI) is set to unleash unprecedented productivity, promising a massive economic impact. Unstructured data is at the heart of this GenAI revolution, providing rich, diverse inputs to fuel GenAI systems, particularly Large Language Models (LLMs). Unstructured data accounts for 90% of enterprise data today and comes in the form of text, images, audio and video files, and social media posts.

While GenAI leverages both unstructured and structured data for valuable business insights, it comes with its own set of challenges. In earlier blogs, we covered data intelligence, data quality, and data lineage. In this blog, you'll learn why preserving data entitlements, or permissions, is critical for GenAI's success.

Why Preserving Data Entitlements is Essential for GenAI Success

A large insurance company deployed Microsoft Copilot, a GenAI-powered digital assistant, across their organization. The very next day, they had to pull the plug on it, as it exposed that their data access controls in Office 365 were overly permissive. Users started to receive responses from Copilot that contained highly sensitive data they should never have had permission to access. This incident not only highlights the data security risks associated with GenAI systems but also raises the question: how can we address them?

In a recent survey, 71% of IT decision-makers said that GenAI will introduce new security threats to their data. These security risks are new largely because GenAI often behaves like a black box, making it difficult for you to see which data it accesses and exposes. Many large enterprises are now limiting or banning the workplace use of GenAI assistants, primarily due to the lack of visibility into the files, documents, and sensitive data these assistants might access.

Imagine a member of your marketing team prompts a knowledge system for salary data. Without proper access controls, the GenAI system might pull this sensitive data into its response. Similar issues can arise if interns gain access to confidential business strategies or financial forecasts, marketers see detailed financial statements meant only for the finance team, or the finance team accesses sensitive customer data intended only for the sales team. Such exposure can lead to conflicts and misuse of sensitive information.

Access restrictions are not just important between teams but also within teams. For example, detailed financial statements are restricted to financial analysts and senior management. What if GenAI systems expose them to the entire finance team? And what happens when GenAI models learn from them and leak the information later?

Even when the source data entitlements are limited to one team or one role, GenAI systems are very likely to inadvertently expose that data to other teams or roles. In such cases, the primary risk is the violation of confidentiality and privacy. This can disrupt operations, harm employee trust, and lead to significant legal and competitive consequences.

Traditionally, you manage compliance through governance practices: enforcing access policies and defining controls. However, GenAI systems do not use standard queries to access source data, which makes conventional query-level access controls very hard to apply. This is why it is crucial for you to be vigilant in managing data entitlements for GenAI.

Gartner defines Entitlement Management as a technology that grants, resolves, enforces, revokes, and administers fine-grained access entitlements. It covers access policies to structured and unstructured data, services, and devices. For GenAI, it is essential to preserve the entitlements of the source data, ensuring that data access is restricted to the original permissions and accordingly reflected in the GenAI response.
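To make the grant, resolve, enforce, and revoke operations in that definition concrete, here is a minimal in-memory sketch. The `EntitlementStore` class, its method names, and the user, group, and file names are all hypothetical and chosen for illustration; a real deployment would rely on the identity and permission systems of the underlying data sources.

```python
from collections import defaultdict


class EntitlementStore:
    """Toy entitlement manager (hypothetical; not any vendor's API)."""

    def __init__(self):
        self._grants = defaultdict(set)   # resource -> principals (users or groups)
        self._members = defaultdict(set)  # group -> users

    def add_member(self, group, user):
        self._members[group].add(user)

    def grant(self, principal, resource):
        self._grants[resource].add(principal)

    def revoke(self, principal, resource):
        self._grants[resource].discard(principal)

    def resolve(self, user):
        """Return every resource the user reaches directly or through a group."""
        principals = {user} | {g for g, users in self._members.items() if user in users}
        return {r for r, granted in self._grants.items() if granted & principals}

    def enforce(self, user, resource):
        """True only if the user currently holds an entitlement to the resource."""
        return resource in self.resolve(user)


store = EntitlementStore()
store.add_member("finance-analysts", "ana")
store.grant("finance-analysts", "financial-statements.xlsx")

ana_before = store.enforce("ana", "financial-statements.xlsx")
raj_before = store.enforce("raj", "financial-statements.xlsx")

store.revoke("finance-analysts", "financial-statements.xlsx")
ana_after = store.enforce("ana", "financial-statements.xlsx")
```

In this sketch, `ana` can read the file only while her group holds the grant, and `raj`, who belongs to no entitled group, is refused throughout.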

For many organizations, GenAI systems are still at the proof-of-concept (POC) stage. When these POCs move to production, preserving source data entitlements is essential to ensure safe and compliant data use.

Challenges in Understanding and Enforcing Data Access Entitlements

The Copilot case underscores the need to define and follow good access hygiene policies, so that GenAI systems enforce entitlements and keep your data protected from unintentional leakage. However, this process is fraught with challenges, most of which arise because GenAI models use diverse unstructured data that traditional tools and technologies cannot govern.

  1. Lack of Transparency: GenAI models often operate as black boxes, making it difficult to identify and control the data being accessed.
  2. Ambiguous Ownership: GenAI's rapid adoption heightens the urgency for clear data ownership. Unstructured data is often siloed and lacks clear ownership. It's typically created and managed across various departments, leading to ambiguity in accountability. Companies may sequester unstructured data for legitimate reasons (e.g., upcoming commentary on an acquisition) or for less desirable ones (e.g., political boundaries between divisions), further complicating ownership issues.
  3. Diverse Data Sources: The unstructured data used by GenAI systems comes from varied and complex sources. Ensuring consistent enforcement across these sources can be difficult.
  4. Complex Data Transformations: Unstructured data undergoes numerous complex transformations before being fed to GenAI models. These transformations can affect the tracking and enforcement of entitlements.
  5. Dynamic and Real-time Enforcement: GenAI systems access data in dynamic and probabilistic ways, making it difficult to track and manage entitlements. Implementing real-time entitlement enforcement in GenAI pipelines requires advanced monitoring and control mechanisms.
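One way to picture the transformation challenge: when a file is split into chunks for a GenAI pipeline, each chunk needs to carry the permissions of the file it came from, or those permissions are lost downstream. The sketch below assumes a simple fixed-size chunker; the `Document` and `Chunk` types, the `chunk_document` function, and the group name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to read the source file


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # inherited from the parent document


def chunk_document(doc, size=50):
    """Split a document into fixed-size chunks, copying its entitlements onto each."""
    return [
        Chunk(doc.doc_id, doc.text[i:i + size], doc.allowed_groups)
        for i in range(0, len(doc.text), size)
    ]


doc = Document(
    doc_id="q3-forecast",
    text="Q3 revenue forecast: confidential draft for finance analysts only. " * 3,
    allowed_groups=frozenset({"finance-analysts"}),
)
chunks = chunk_document(doc)
```

Every transformation step that creates a new artifact (chunks, embeddings, summaries) would need the same treatment; dropping the `allowed_groups` field at any step is where tracking breaks down.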

Data Entitlements: Structured vs. Unstructured Data

| Structured Data | Unstructured Data |
| --- | --- |
| Entitlements are easily managed with predefined roles and permissions linked to database schemas. | Entitlement management is complex, as the formats vary, and understanding the context is essential. |
| Automated entitlement management is straightforward due to consistent data structure. | Automation is complex, often requiring AI and ML techniques to understand content and context. |
| Database management systems (DBMS) provide robust features for managing data entitlements. | Sophisticated tools are required to assess and enforce permissions at a granular level across diverse data sources. |
| Entitlement policies can be easily scaled across similar data structures. | Scaling entitlement policies is difficult due to the combination of permissions assignment and discretionary file sharing. |

How Securiti Preserves and Updates Entitlements for GenAI Success

A common concern with developing GenAI systems is how to preserve source data entitlements and control data access as information is transferred to GenAI. Securiti enables you to safely use both unstructured and structured data with GenAI while maintaining all enterprise data access controls. It preserves all associated metadata and entitlements when transferring data from source systems to GenAI models. The GenAI systems honor these entitlements by analyzing the prompt, identifying which files they need to access to build the response, matching user entitlements with the source data entitlements, and then either accessing the files if allowed or declining to respond.
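The flow described above (analyze the prompt, identify the files needed, match user entitlements against source entitlements, then answer or decline) can be sketched generically as follows. This is an illustrative assumption, not Securiti's implementation: the `SourceFile` type, the keyword-overlap relevance check, and all file and group names are hypothetical, and a real system would use proper retrieval and pass the permitted context to an LLM.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFile:
    name: str
    text: str
    allowed_groups: frozenset  # entitlements preserved from the source system


def words(text):
    return set(re.findall(r"\w+", text.lower()))


def find_relevant(prompt, files):
    """Toy relevance check: a file is relevant if it shares a word with the prompt."""
    return [f for f in files if words(prompt) & words(f.text)]


def answer(prompt, user_groups, files):
    """Use only files the user is entitled to; decline if none remain."""
    relevant = find_relevant(prompt, files)
    permitted = [f for f in relevant if f.allowed_groups & user_groups]
    if not permitted:
        return "DECLINED: no accessible source for this request."
    # Placeholder for the LLM call: report which sources would be used as context.
    return "Context used: " + ", ".join(f.name for f in permitted)


files = [
    SourceFile("salaries.xlsx", "salary bands by level", frozenset({"hr"})),
    SourceFile("brand-guide.docx", "brand voice and logo usage", frozenset({"marketing", "hr"})),
]

marketing_reply = answer("show salary bands", frozenset({"marketing"}), files)
hr_reply = answer("show salary bands", frozenset({"hr"}), files)
```

The key design choice is that filtering happens before any text reaches the model, so content the user cannot access never enters the prompt context in the first place.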

Generative AI adoption has also exposed inadequacies in existing access controls, highlighting the need for comprehensive audits. Securiti’s Data Access Intelligence and Governance (DAIG) helps organizations ensure proper access control hygiene in existing data systems. By updating and enforcing robust access protocols, Securiti’s DAIG assists organizations in aligning their AI systems with established data governance frameworks, minimizing unauthorized access risks while maximizing AI's potential.

6 Best Practices for Preserving Source Data Entitlements

Preserving entitlements paves the way for safeguarding your data in GenAI systems. By following these best practices, you can prevent data leakage, maintain compliance, and fully harness the transformative potential of GenAI.

  1. Include unstructured data in your enterprise data security strategy: Consider all aspects of the safe use of unstructured data and make sure to include them in your data security strategy. Train your employees to use data safely, especially for GenAI. Gartner notes that 82% of data breaches in 2022 were a result of employee behaviors that were insecure or inadvertent.
  2. Prioritize entitlement management: Gartner recommends that you prioritize visibility and entitlement when building your AI data pipeline. Enhancing your entitlement management and standardizing access to the AI data pipeline will significantly reduce the risk of misuse.
  3. Invest in the right tools: GenAI helps unlock the value of unstructured data, which constitutes the majority of enterprise data today. Select tools that utilize AI and ML to provide dynamic controls across your data sources.
  4. Ensure robust entitlements for source data: Regularly implement, review, and modify data entitlements to reflect changes in roles, responsibilities, and compliance requirements.
  5. Maintain audit records: Track, log, and maintain all changes and access to source data. This approach ensures transparency and supports compliance efforts.
  6. Protect your sensitive data with least privilege reviews: Identify and map user roles and permissions for sensitive data in unstructured data repositories. For GenAI systems, ensure these entitlements are maintained and enforced during data extraction and throughout the pipelines or prompts. Regularly review security policies and compliance to apply suitable data protection measures, such as encryption, tokenization, and masking.
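As a rough illustration of a least privilege review, the sketch below compares the groups that currently have access to each sensitive file against the groups approved for it and reports the excess. The `excess_grants` function, the file paths, and the group names are hypothetical; in practice the access lists would be exported from the source systems. Files with no approved list are treated as fully over-shared, which is an assumption of this sketch.

```python
def excess_grants(acl, approved):
    """Return {path: sorted groups that have access beyond the approved set}."""
    findings = {}
    for path, groups in acl.items():
        # Assumption: a file with no approved entry should be readable by no one.
        extra = set(groups) - set(approved.get(path, ()))
        if extra:
            findings[path] = sorted(extra)
    return findings


# Current access as exported from a (hypothetical) file repository.
acl = {
    "finance/statements-2024.xlsx": {"finance-analysts", "senior-management", "all-finance"},
    "hr/salaries.xlsx": {"hr"},
    "strategy/acquisition-notes.docx": {"everyone"},
}

# Access that data owners have approved.
approved = {
    "finance/statements-2024.xlsx": {"finance-analysts", "senior-management"},
    "hr/salaries.xlsx": {"hr"},
}

findings = excess_grants(acl, approved)
```

Running a review like this on a schedule, and logging each finding and its remediation, also supports the audit-record practice in item 5.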

In Summary

Preserving data entitlements is crucial in the evolving field of GenAI to prevent unauthorized access and ensure compliance. With GenAI systems mostly leveraging unstructured data, traditional access controls often fall short. With Securiti, you can preserve source data entitlements to safeguard data and pave the way for GenAI success.

Learn to leverage unstructured data safely and effectively for GenAI. Download the white paper Harnessing Unstructured Data for GenAI: A Primer for CDOs.
