Unstructured Data, GenAI, and Regulatory Compliance

Published April 24, 2024 / Updated May 21, 2024

For most enterprises, unstructured data is difficult to manage, govern, and secure. The sheer volume and variety of unstructured data sources, from text-heavy emails and documents to photo, video, and social media files, complicate governance and security teams’ efforts to enforce consistent policies. Uncontrolled access and sharing hamper efforts to track data provenance, and inconsistent formats make it difficult to manage unstructured data uniformly.

Partly because the tools and technologies built to manage unstructured data have proven less effective than those for structured data, many enterprises have deprioritized the proper management of unstructured data, and some struggle to even identify where it lives throughout their organization.

The risks and opportunities behind unstructured data

Unstructured data poses greater cybersecurity risks and challenges than structured data, primarily because it lacks organization and may contain sensitive information hidden within its content. The threats include data breaches and exposure, insider threats, shadow data, unclear or inconsistent data classification, data sprawl, and delivery of ransomware or malware.

At the same time, unstructured data is the primary input that fuels most GenAI systems, whose models are trained on massive amounts of unstructured text data. It is this data that enables them to develop (in effect “learn”) the natural language capabilities and contextual understanding they need to generate human-like output. With the rise of GenAI and the untold opportunity it represents for enterprises, unstructured data, along with the challenges that beset it, is very much in the spotlight.

To mitigate these risks, organizations must implement robust data governance strategies, access controls, encryption, data loss prevention (DLP) tools, and employee training programs tailored to the unique challenges of unstructured data. Continuous monitoring, auditing, and the adoption of advanced technologies like machine learning and data classification can also help organizations better identify, protect, and manage their unstructured data assets. Combining risk identification, threat detection, and identity and entitlement management with data classification can provide a holistic way to classify, manage, and protect your data, and help prevent data loss or theft.

The regulatory landscape around unstructured data and GenAI

Regulatory requirements around GenAI systems are already here, and they are rapidly expanding in number and scope. In March 2024, the European Parliament approved the EU AI Act, the most comprehensive set of regulations to date on the use of AI by businesses, adding to the list of existing regulations that already cover unstructured data.

Existing regulations around unstructured data

Even before the AI Act addressed generative artificial intelligence specifically, privacy and security regulations had increasingly recognized the importance of protecting unstructured data, as it often contains sensitive and personal information. Key regulations that increase the scrutiny of unstructured data include:

  1. General Data Protection Regulation (GDPR): The GDPR, which applies to the European Union, defines personal data broadly, including unstructured data such as emails, documents, and multimedia files that can directly or indirectly identify an individual.
  2. California Consumer Privacy Act (CCPA): The CCPA defines personal information to include unstructured data like emails, text messages, and photos. Businesses must disclose the categories of personal information they collect, including unstructured data, and provide consumers with the right to access and delete their personal information and to opt out of its sale.
  3. Health Insurance Portability and Accountability Act (HIPAA): HIPAA in the United States requires covered entities to implement safeguards to protect the confidentiality, integrity, and availability of electronic protected health information (ePHI), which can include unstructured data such as medical records, physician notes, and diagnostic images.
  4. Payment Card Industry Data Security Standard (PCI DSS): PCI DSS requires merchants and service providers to protect cardholder data, which can include unstructured data like customer emails, call recordings, and scanned documents containing payment card information.
  5. Sarbanes-Oxley Act (SOX): SOX in the United States requires public companies to maintain and protect corporate records, including unstructured data such as emails, memos, and financial reports, for auditing purposes.
  6. NIS2 and the Digital Operational Resilience Act (DORA): In the EU, NIS2 (which member states must transpose into national law by October 2024) and DORA (which applies from January 2025) cover a wide range of business and financial information, including unstructured data, and raise the bar for the security techniques required (e.g., anomaly detection, identity management, and vulnerability and threat reporting, among others).
  7. SEC cybersecurity materiality reporting: In the US, the SEC's incident disclosure requirement has applied to most US public companies since December 18, 2023, and applies to smaller reporting companies starting June 2024. It requires a determination of materiality to be made “without unreasonable delay” following a cyber incident, and if the incident is determined to be material, a disclosure must be made to the SEC within four business days. This applies to both unstructured and structured data. Complying with these tight reporting timelines requires active threat detection and reporting, because that information is key to making a determination of materiality (or not) and justifying the determination with evidence.
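
To make the SEC reporting timeline concrete, here is a minimal Python sketch that estimates a disclosure deadline as four business days after the date on which an incident is determined to be material. The function names are hypothetical, and the sketch assumes that only weekends are skipped; US federal holidays are not modeled, so a real calculation would need a holiday calendar and legal review.

```python
from datetime import date, timedelta


def add_business_days(start: date, business_days: int) -> date:
    """Return the date that falls `business_days` weekdays after `start`.

    Assumption: only Saturdays and Sundays are skipped. Federal holidays
    are NOT handled here.
    """
    current = start
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def disclosure_deadline(materiality_determined_on: date) -> date:
    """Hypothetical helper: four business days after the materiality decision."""
    return add_business_days(materiality_determined_on, 4)


if __name__ == "__main__":
    determined = date(2024, 3, 7)  # a Thursday
    print(disclosure_deadline(determined))
```

Note that the clock in this sketch starts at the materiality determination, not at the moment the incident is discovered, which is why evidence from threat detection matters for deciding when that determination can be made.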

Many regulations emphasize the need for data discovery, classification, and protection measures to identify and secure sensitive unstructured data. Organizations must implement appropriate access controls, encryption, data loss prevention (DLP) tools, and other security measures to protect unstructured data containing personal, financial, or confidential information.

Where the new EU AI Act comes into play

Here are some key points from the EU AI Act that are relevant to unstructured data:

  1. Data governance: The AI Act emphasizes the importance of data governance and data management practices for AI systems.
  2. Data quality: The regulation requires that the training, validation, and testing data used for high-risk AI systems be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete.
  3. Documentation: AI system providers are required to document their systems' characteristics, capabilities, and limitations, including information about the training data used.
  4. Data protection: The AI Act reinforces the need to comply with existing data protection regulations, such as the GDPR, when processing personal data for AI systems. This is particularly relevant for unstructured data that may contain personal information.
  5. Bias and discrimination: The AI Act aims to mitigate the risks of bias and discrimination in AI systems, which could arise from biases present in the training data, including unstructured data sources.

What should companies do to ensure compliance?

To effectively manage their unstructured data, companies should implement the following strategies:

  1. Discover and classify unstructured data: Identify unstructured data assets, including documents, emails, multimedia files, and so on, across the organization using machine-learning data discovery tools, and automatically categorize that data based on sensitivity, content, purpose, and more.
  2. Establish a data governance framework: Manage unstructured data throughout its lifecycle with a comprehensive framework that defines policies, roles, and responsibilities — and includes rules for data creation, storage, access, retention, and disposal.
  3. Implement metadata management practices: Enrich unstructured data with contextual information, such as data owners, access permissions, retention periods, and so on.
  4. Apply access controls and data security: Protect sensitive unstructured data from unauthorized access, data breaches, or accidental exposure by establishing and implementing appropriate measures for access controls, encryption, and data loss prevention (DLP).
  5. Manage the entire data lifecycle: Define and enforce policies for data retention, archiving, and disposal. Ensure regulatory compliance and minimize data storage costs by automating processes for managing data lifecycle stages.
  6. Integrate cloud and on-prem: Manage unstructured data across cloud and on-premises environments, ensuring consistent governance, security, and compliance across hybrid infrastructure.
  7. Enable continuous monitoring and auditing: Implement processes to track data access, usage, and potential data leakage or misuse.
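
As a rough illustration of the first and third strategies above (discovery and classification, plus metadata enrichment), the following Python sketch scans a piece of text for a few sensitive patterns and records the result alongside an owner. Everything in it is an assumption made for illustration: the pattern set, the sensitivity labels, and the `DocumentRecord` structure are hypothetical, and simple regular expressions stand in for the machine-learning discovery tools described in the list. It is not how any particular product works.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detectors; real discovery tools use far richer techniques.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to discard digit runs that cannot be card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class DocumentRecord:
    """Metadata attached to a document after scanning (illustrative schema)."""
    path: str
    owner: str
    findings: dict[str, int] = field(default_factory=dict)
    sensitivity: str = "unclassified"


def classify(path: str, owner: str, text: str) -> DocumentRecord:
    """Count pattern hits in `text` and derive a coarse sensitivity label."""
    findings: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if label == "card_number":
            matches = [m for m in matches if luhn_valid(m)]
        if matches:
            findings[label] = len(matches)

    # Assumed policy: government IDs and card data rank above contact details.
    if "us_ssn" in findings or "card_number" in findings:
        sensitivity = "restricted"
    elif findings:
        sensitivity = "confidential"
    else:
        sensitivity = "unclassified"
    return DocumentRecord(path, owner, findings, sensitivity)


if __name__ == "__main__":
    record = classify(
        "shared/notes.txt",
        "finance-team",
        "Contact jane@example.com, card 4111 1111 1111 1111",
    )
    print(record)
```

A record like this could then feed access-control and retention decisions, since the sensitivity label and owner travel with the document rather than living in someone's head.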

Today, Securiti and Lacework offer a combined solution that gives companies the end-to-end visibility into their multicloud and on-prem environments that they need to govern and protect unstructured data at scale and to achieve regulatory compliance now and in the future. With the ability to prioritize risk based on Lacework security findings and determine the sensitivity of data with the Securiti Data Command Center, companies can identify their highest priority risks, focus on high-impact threats, intelligently prioritize and remediate data, protect sensitive information at scale, and properly manage their unstructured data environment. Learn more about the partnership to see what our combined solution can do for your organization.
