
Unstructured Data, GenAI, and Regulatory Compliance

Author: Jack Berkowitz, Chief Data Officer at Securiti


For most enterprises, unstructured data is difficult to manage, govern, and secure. The sheer volume and variety of unstructured data sources, from text-heavy emails and documents to photo, video, and social media files, complicate governance and security teams’ efforts to enforce consistent policies. Uncontrolled access and sharing make data provenance hard to track, and inconsistent formats make it difficult to manage unstructured data uniformly.

Partly because the tools and technologies built to manage unstructured data have proven less effective than those for structured data, many enterprises have deprioritized the proper management of unstructured data, and some struggle to even identify where it lives throughout their organization.

The risks and opportunities behind unstructured data

Unstructured data poses increased cybersecurity risks and challenges compared to structured data, primarily due to its lack of organization and the potential presence of sensitive information hidden within its content. A range of threats exist, including data breach and exposure, insider threats, shadow data, unclear or inconsistent data classification, data sprawl, and delivery of ransomware or malware.

At the same time, unstructured data is the primary input that fuels most GenAI systems, whose models are trained on massive amounts of unstructured text. It is this data that enables them to develop (in effect, “learn”) the natural language capabilities and contextual understanding they need to generate human-like output. With the rise of GenAI and the untold opportunity it represents for enterprises, unstructured data, along with the challenges that beset it, is very much in the spotlight.

To mitigate these risks, organizations must implement robust data governance strategies, access controls, encryption, data loss prevention (DLP) tools, and employee training programs tailored to the unique challenges of unstructured data. Continuous monitoring, auditing, and advanced technologies such as machine learning-based data classification can also help organizations identify, protect, and manage their unstructured data assets. Combining risk identification, threat detection, and identity and entitlement management with data classification provides a holistic way to classify, manage, and protect data, and helps prevent data loss or theft.
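To make the data classification idea more concrete, here is a minimal sketch that scans a piece of unstructured text for a few kinds of sensitive data and assigns a coarse sensitivity label. The regular expressions, the tier names (`restricted`, `confidential`, `internal`), and the functions `classify` and `sensitivity` are illustrative assumptions, not Securiti’s method or any product’s API; production classifiers rely on many more detectors, ML models, and surrounding context.

```python
import re

# Illustrative patterns only (an assumption for this sketch); real classifiers
# combine many more detectors and tune them per data source.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits pass the Luhn checksum used by payment card numbers."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    # Double every second digit from the right; subtract 9 when the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def classify(text: str) -> dict[str, list[str]]:
    """Return the sensitive-data matches found in a piece of unstructured text."""
    findings: dict[str, list[str]] = {}
    for label, pattern in PATTERNS.items():
        # strip() removes a trailing separator the card pattern may capture.
        matches = [m.group().strip() for m in pattern.finditer(text)]
        if label == "card_number":
            # Drop digit runs that fail the checksum to reduce false positives.
            matches = [m for m in matches if luhn_valid(m)]
        if matches:
            findings[label] = matches
    return findings


def sensitivity(findings: dict[str, list[str]]) -> str:
    """Map findings to a coarse, hypothetical sensitivity tier."""
    if "us_ssn" in findings or "card_number" in findings:
        return "restricted"
    if findings:
        return "confidential"
    return "internal"
```

The Luhn check is a common way to cut false positives in pattern-based detection: most random 16-digit strings that merely look like card numbers fail it.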

The regulatory landscape around unstructured data and GenAI

Regulatory requirements for GenAI systems are already here, and they are rapidly expanding in number and scope. In March 2024, the European Parliament approved the EU AI Act, the most comprehensive set of rules on the use of AI by businesses, adding to the list of existing regulations that already cover unstructured data.

Existing regulations around unstructured data

Even before the AI Act, which addresses AI systems including generative AI, privacy and security regulations increasingly recognized the importance of protecting unstructured data, because it often contains sensitive and personal information. Key regulations that increase the scrutiny of unstructured data include:

  1. General Data Protection Regulation (GDPR): The GDPR, which governs the processing of personal data of individuals in the European Union, defines personal data broadly as any information relating to an identified or identifiable person. That definition covers unstructured data such as emails, documents, and multimedia files that can directly or indirectly identify an individual.
  2. California Consumer Privacy Act (CCPA): The CCPA’s broad definition of personal information covers unstructured data such as emails, text messages, and photos. Businesses must disclose the categories of personal information they collect, including unstructured data, and give consumers the right to access and delete their personal information and to opt out of its sale.
  3. Health Insurance Portability and Accountability Act (HIPAA): HIPAA in the United States requires covered entities to implement safeguards to protect the confidentiality, integrity, and availability of electronic protected health information (ePHI), which can include unstructured data such as medical records, physician notes, and diagnostic images.
  4. Payment Card Industry Data Security Standard (PCI DSS): PCI DSS requires merchants and service providers to protect cardholder data, which can include unstructured data like customer emails, call recordings, and scanned documents containing payment card information.
  5. Sarbanes-Oxley Act (SOX): SOX in the United States requires public companies to maintain and protect corporate records, including unstructured data such as emails, memos, and financial reports, for auditing purposes.
  6. NIS2 and the Digital Operational Resilience Act (DORA): In the EU, the NIS2 Directive (applicable from October 2024) and DORA (applicable from January 2025) cover a wide range of business and financial information, including unstructured data, and raise the bar for required security techniques (e.g., anomaly detection, identity management, and vulnerability and threat reporting, among others).
  7. SEC cybersecurity incident disclosure rules: In the US, the SEC’s cybersecurity disclosure requirements took effect on December 18, 2023 for most public companies and on June 15, 2024 for smaller reporting companies. A company must determine whether a cyber incident is material “without undue delay” and, if it is, disclose the incident on Form 8-K within four business days of that determination. The rules apply to incidents involving both structured and unstructured data. Meeting these tight timelines requires active threat detection and reporting, because that information is key to determining whether an incident is material and to supporting that determination with evidence.
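The four-business-day window is easy to miscount during an active incident. The sketch below computes the latest filing date from the date materiality is determined. It assumes that business days are weekdays minus any holidays passed in explicitly; it does not include the US federal holiday calendar, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Window after a materiality determination, per the rule described above.
FILING_WINDOW_BUSINESS_DAYS = 4


def add_business_days(start: date, days: int, holidays: frozenset[date] = frozenset()) -> date:
    """Count forward `days` business days from `start`, skipping weekends and supplied holidays."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday(): Monday == 0 ... Sunday == 6, so < 5 means Monday to Friday.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


def filing_deadline(materiality_determined: date, holidays: frozenset[date] = frozenset()) -> date:
    """Latest filing date, counting business days after the materiality determination."""
    return add_business_days(materiality_determined, FILING_WINDOW_BUSINESS_DAYS, holidays)
```

For example, a determination on Thursday, March 7, 2024 yields Wednesday, March 13, 2024 as the latest filing date, because the weekend in between does not count.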

Many regulations emphasize the need for data discovery, classification, and protection measures to identify and secure sensitive unstructured data. Organizations must implement appropriate access controls, encryption, data loss prevention (DLP) tools, and other security measures to protect unstructured data containing personal, financial, or confidential information.

Where the new EU AI Act comes into play

Here are some key points from the EU AI Act that are relevant to unstructured data:

  1. Data governance: The AI Act requires appropriate data governance and data management practices for the data used to develop high-risk AI systems.
  2. Data quality: For high-risk AI systems, the regulation requires training, validation, and testing data sets to be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the system’s intended purpose.
  3. Documentation: Providers of high-risk AI systems must document their systems’ characteristics, capabilities, and limitations, including information about the training data used.
  4. Data protection: The AI Act reinforces the need to comply with existing data protection regulations, such as the GDPR, when processing personal data for AI systems. This is particularly relevant for unstructured data that may contain personal information.
  5. Bias and discrimination: The act aims to mitigate the risks of bias and discrimination in AI systems, which could arise from biases present in the training data, including unstructured data sources.
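As one illustration of how the data quality and data protection points can translate into practice, the following sketch screens a small text corpus before training: it drops empty documents, removes exact duplicates (after normalizing whitespace and case), and holds back documents that appear to contain personal data for review. The resulting counts could feed training-data documentation. The specific checks, the email-only personal-data detector, and all names here are assumptions for illustration; the AI Act does not prescribe particular techniques.

```python
import hashlib
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a real personal-data detector: flags email addresses only.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


@dataclass
class ScreeningReport:
    """Counts that could feed a training-data documentation record."""
    total: int = 0
    empty: int = 0
    duplicates: int = 0
    flagged: list[str] = field(default_factory=list)  # IDs held back for privacy review
    kept: list[str] = field(default_factory=list)     # IDs cleared for training


def screen_corpus(
    docs: dict[str, str],
    has_personal_data: Callable[[str], bool] = lambda text: EMAIL.search(text) is not None,
) -> ScreeningReport:
    """Screen documents for emptiness, exact duplicates, and apparent personal data."""
    report = ScreeningReport(total=len(docs))
    seen: set[str] = set()
    for doc_id, text in docs.items():
        # Normalize whitespace and case so trivially different copies hash the same.
        normalized = " ".join(text.split()).lower()
        if not normalized:
            report.empty += 1
            continue
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            report.duplicates += 1
            continue
        seen.add(digest)
        if has_personal_data(text):
            report.flagged.append(doc_id)
            continue
        report.kept.append(doc_id)
    return report
```

Passing the detector in as a function keeps the screening loop independent of how personal data is actually identified, so a stronger classifier can be swapped in without changing the rest.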

What should companies do to ensure compliance?

To effectively manage their unstructured data, companies should implement the following strategies:

  1. Discover and classify unstructured data: Identify unstructured data assets (documents, emails, multimedia files, and so on) across the organization using machine learning-based data discovery tools, and automatically categorize the data based on sensitivity, content, purpose, and more.
  2. Establish a data governance framework: Manage unstructured data throughout its lifecycle with a comprehensive framework that defines policies, roles, and responsibilities — and includes rules for data creation, storage, access, retention, and disposal.
  3. Implement metadata management practices: Enrich unstructured data with contextual information, such as data owners, access permissions, retention periods, and so on.
  4. Apply access controls and data security: Protect sensitive unstructured data from unauthorized access, data breaches, or accidental exposure by establishing and implementing appropriate measures for access controls, encryption, and data loss prevention (DLP).
  5. Manage the entire data lifecycle: Define and enforce policies for data retention, archiving, and disposal. Ensure regulatory compliance and minimize data storage costs by automating processes for managing data lifecycle stages.
  6. Integrate cloud and on-prem: Manage unstructured data across cloud and on-premises environments, ensuring consistent governance, security, and compliance across hybrid infrastructure.
  7. Enable continuous monitoring and auditing: Implement processes to track data access, usage, and potential data leakage or misuse.
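Metadata management and lifecycle policies work together: once files carry an owner, a category, and a creation date, retention rules can be applied automatically. The sketch below flags files whose retention period has elapsed, while never auto-disposing of files under legal hold or in an unrecognized category. The `FileMetadata` fields and the `RETENTION_DAYS` schedule are hypothetical; actual retention periods come from legal and regulatory requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FileMetadata:
    """Hypothetical metadata record attached to an unstructured file."""
    path: str
    owner: str
    category: str                  # e.g., a label assigned during classification
    created: date
    access: tuple[str, ...] = ()   # groups permitted to read the file
    legal_hold: bool = False


# Hypothetical retention schedule (days per category), for illustration only.
RETENTION_DAYS = {"email": 365 * 3, "financial_report": 365 * 7, "marketing": 365}


def due_for_disposal(files: list[FileMetadata], today: date) -> list[str]:
    """Return paths whose retention period has elapsed and that may be disposed of."""
    due = []
    for f in files:
        days = RETENTION_DAYS.get(f.category)
        if days is None or f.legal_hold:
            # Unknown category or legal hold: never dispose of automatically.
            continue
        if f.created + timedelta(days=days) <= today:
            due.append(f.path)
    return due
```

Defaulting to "keep" for anything the schedule does not recognize is a deliberate choice in this sketch: wrongly deleting a record is usually harder to undo than wrongly retaining one.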

Today, Securiti and Lacework offer a combined solution to give companies the end-to-end visibility into their multicloud and on-prem environments that they need to govern and protect unstructured data at scale — and achieve regulatory compliance now and in the future. With the ability to prioritize risk based on Lacework security findings and determine the sensitivity of data with the Securiti Data Command Center, companies can identify their highest priority risks, focus on high-impact threats, intelligently prioritize and remediate data, protect sensitive information at scale, and properly manage their unstructured data environment. Learn more about the partnership to see what our combined solution can do for your org.

