To mitigate these risks, organizations must implement robust data governance strategies, access controls, encryption, data loss prevention (DLP) tools, and employee training programs tailored to the unique challenges of unstructured data. Continuous monitoring, auditing, and the adoption of advanced technologies like machine learning and data classification can also help organizations better identify, protect, and manage their unstructured data assets. Combining risk identification, threat detection, and identity and entitlement management with data classification gives organizations a holistic way to classify, manage, and protect their data, and helps prevent data loss or theft.
The regulatory landscape around unstructured data and GenAI
Regulatory requirements around GenAI systems are already here, and they are rapidly expanding in number and scope. In March 2024, the European Parliament approved the EU AI Act, the most comprehensive set of regulations to date on the use of AI by businesses. It adds to a list of existing regulations that already cover unstructured data.
Existing regulations around unstructured data
Even before the AI Act, which specifically addresses generative AI, privacy and security regulations had increasingly recognized the importance of protecting unstructured data, because it often contains sensitive and personal information. Key regulations that increase the scrutiny of unstructured data include:
- General Data Protection Regulation (GDPR): The GDPR, which applies to the European Union, defines personal data broadly, including unstructured data such as emails, documents, and multimedia files that can directly or indirectly identify an individual.
- California Consumer Privacy Act (CCPA): The CCPA defines personal information to include unstructured data like emails, text messages, and photos. Businesses must disclose the categories of personal information they collect, including unstructured data, and give consumers the right to access and delete their personal information and to opt out of its sale.
- Health Insurance Portability and Accountability Act (HIPAA): HIPAA in the United States requires covered entities to implement safeguards to protect the confidentiality, integrity, and availability of electronic protected health information (ePHI), which can include unstructured data such as medical records, physician notes, and diagnostic images.
- Payment Card Industry Data Security Standard (PCI DSS): PCI DSS requires merchants and service providers to protect cardholder data, which can include unstructured data like customer emails, call recordings, and scanned documents containing payment card information.
- Sarbanes-Oxley Act (SOX): SOX in the United States requires public companies to maintain and protect corporate records, including unstructured data such as emails, memos, and financial reports, for auditing purposes.
- In the EU, NIS2 (which member states must transpose into national law by October 2024) and the Digital Operational Resilience Act, or DORA (which entered into force in January 2023 and applies from January 2025), cover a wide range of business and financial information, including unstructured data. Both raise the bar on the security techniques required (e.g., anomaly detection, identity management, and vulnerability and threat reporting, among others).
- In the US, the SEC cybersecurity disclosure rules (which came into effect on December 18, 2023) already apply to most US public companies and extend to smaller reporting companies starting in June 2024. A company must determine whether a cyber incident is material "without undue delay" after discovering it, and if the incident is material, it must file a disclosure with the SEC within four business days of that determination. The rules apply to both unstructured and structured data. Meeting these tight timelines requires active threat detection and reporting, because that information is key to making the materiality determination and supporting it with evidence.
Many regulations emphasize the need for data discovery, classification, and protection measures to identify and secure sensitive unstructured data. Organizations must implement appropriate access controls, encryption, data loss prevention (DLP) tools, and other security measures to protect unstructured data containing personal, financial, or confidential information.
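As a rough illustration of what rule-based discovery and classification can look like, the sketch below scans free text for a few kinds of sensitive content. It is a minimal example under stated assumptions, not how any particular product works: the three patterns, the label names, and the `classify_text` function are hypothetical, and production classifiers typically combine far more rules with machine-learning models.

```python
import re

# Hypothetical patterns for illustration only; real classifiers need far
# broader coverage and validation than these three rules.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Double every second digit, counting from the right.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify_text(text: str) -> set[str]:
    """Return the set of (hypothetical) sensitivity labels found in `text`."""
    labels = set()
    if EMAIL_RE.search(text):
        labels.add("email_address")
    if SSN_RE.search(text):
        labels.add("us_ssn")
    # Only count card-like digit runs that also pass the Luhn check.
    if any(luhn_valid(m.group()) for m in CARD_RE.finditer(text)):
        labels.add("payment_card")
    return labels
```

The Luhn check is there to reduce false positives: a 16-digit order number looks like a card number to a regular expression but usually fails the checksum.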
Where the new EU AI Act comes into play
Here are some key points from the EU AI Act that are relevant to unstructured data:
- Data governance: The AI Act emphasizes the importance of data governance and data management practices for AI systems.
- Data quality: For high-risk AI systems, the regulation requires that training, validation, and testing data be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete.
- Documentation: AI system providers are required to document their systems' characteristics, capabilities, and limitations, including information about the training data used.
- Data protection: The AI Act reinforces the need to comply with existing data protection regulations, such as the GDPR, when processing personal data for AI systems. This is particularly relevant for unstructured data that may contain personal information.
- Bias and discrimination: The act aims to mitigate the risks of bias and discrimination in AI systems, which could arise from biases present in the training data, including unstructured data sources.
What should companies do to ensure compliance?
To effectively manage their unstructured data, companies should implement the following strategies:
- Discover and classify unstructured data: Identify unstructured data assets (including documents, emails, multimedia files, and so on) across the org using machine-learning data discovery tools, and automatically categorize the data based on sensitivity, content, purpose, and more.
- Establish a data governance framework: Manage unstructured data throughout its lifecycle with a comprehensive framework that defines policies, roles, and responsibilities — and includes rules for data creation, storage, access, retention, and disposal.
- Implement metadata management practices: Enrich unstructured data with contextual information, such as data owners, access permissions, retention periods, and so on.
- Apply access controls and data security: Protect sensitive unstructured data from unauthorized access, data breaches, or accidental exposure by establishing and implementing appropriate measures for access controls, encryption, and data loss prevention (DLP).
- Manage the entire data lifecycle: Define and enforce policies for data retention, archiving, and disposal. Ensure regulatory compliance and minimize data storage costs by automating processes for managing data lifecycle stages.
- Integrate cloud and on-prem: Manage unstructured data across cloud and on-premises environments, ensuring consistent governance, security, and compliance across hybrid infrastructure.
- Enable continuous monitoring and auditing: Implement processes to track data access, usage, and potential data leakage or misuse.
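To make the metadata and lifecycle items above more concrete, here is a minimal sketch of a retention check driven by document metadata. Everything in it is an assumption made for illustration: the `DocumentRecord` fields, the classification labels, and the retention periods are hypothetical and would come from your own governance policy and legal requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods in days, keyed by classification label.
RETENTION_DAYS = {"public": 365, "internal": 3 * 365, "financial": 7 * 365}


@dataclass(frozen=True)
class DocumentRecord:
    """Metadata attached to one unstructured data asset."""
    path: str
    owner: str
    classification: str
    created: date


def retention_action(record: DocumentRecord, today: date) -> str:
    """Return 'review' for unknown labels, otherwise 'retain' or 'dispose'."""
    days = RETENTION_DAYS.get(record.classification)
    if days is None:
        # Unknown label: escalate to a human rather than guess.
        return "review"
    expires = record.created + timedelta(days=days)
    return "dispose" if today >= expires else "retain"
```

Returning "review" for unclassified records is a deliberate choice in this sketch: automated disposal only runs on data whose classification and policy are known.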
Today, Securiti and Lacework offer a combined solution to give companies the end-to-end visibility into their multicloud and on-prem environments that they need to govern and protect unstructured data at scale — and achieve regulatory compliance now and in the future. With the ability to prioritize risk based on Lacework security findings and determine the sensitivity of data with the Securiti Data Command Center, companies can identify their highest priority risks, focus on high-impact threats, intelligently prioritize and remediate data, protect sensitive information at scale, and properly manage their unstructured data environment. Learn more about the partnership to see what our combined solution can do for your org.