When will the EU AI Act come into effect?

The AI Act entered into force on August 1, 2024, and applies in phases. It will become fully applicable in 2026, except for a few provisions. Provisions on prohibited AI practices took effect in February 2025, with other obligations and chapters taking effect gradually in 2025, 2026, and 2027.

Which AI systems are considered high-risk?

High-risk AI systems include any AI systems that pose significant risks to health, safety, or fundamental rights. These include AI used in critical infrastructure, medical devices, law enforcement, recruitment, education, and financial services. Providers and deployers of such systems must adhere to the requirements related to risk management, data governance, transparency, and human oversight.

How will the EU AI Act be enforced?

The newly created European AI Office will oversee the enforcement of the AI Act. This office will work with the various supervisory authorities in the EU member states and coordinate efforts related to compliance, audits, investigation of violations, and future recommendations.

What penalties exist for non-compliance?

Non-compliance with the AI Act can result in fines of up to €35 million or 7% of a company's annual turnover, whichever is higher. The penalties are tiered based on the severity of the violation. Violations of prohibited AI practices carry the highest penalties, while non-compliance with other obligations (such as those for high-risk systems) can result in fines up to €15 million or 3% of global turnover. Providing incorrect information to authorities carries the lowest penalties, up to €7.5 million or 1% of global turnover.
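The tiers above reduce to simple arithmetic. The sketch below is illustrative only: the tier labels and the function name `max_fine_eur` are hypothetical, the "whichever is higher" rule is assumed to apply to every tier, and any special treatment of smaller companies is ignored.

```python
# Fine ceilings per tier, using the figures quoted above.
# Each entry is (fixed cap in euros, percentage of annual turnover).
# The tier labels are hypothetical names chosen for this example.
TIERS = {
    "prohibited_practice": (35_000_000, 7),
    "other_obligation": (15_000_000, 3),
    "incorrect_information": (7_500_000, 1),
}


def max_fine_eur(tier: str, annual_turnover_eur: int) -> float:
    """Return the fine ceiling: the fixed cap or the turnover share, whichever is higher."""
    fixed_cap, percent = TIERS[tier]
    return max(fixed_cap, annual_turnover_eur * percent / 100)


if __name__ == "__main__":
    # A company with EUR 1 billion turnover: 7% exceeds the fixed cap.
    print(max_fine_eur("prohibited_practice", 1_000_000_000))
    # A company with EUR 100 million turnover: the fixed cap applies instead.
    print(max_fine_eur("prohibited_practice", 100_000_000))
```

For a large company the percentage term dominates, which is why the ceiling scales with turnover rather than stopping at the fixed amount.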



5 Ways to Accelerate Unstructured Data Cleansing for AI with Securiti and Databricks

Published June 9, 2025

Author

Jocelyn Houle

Senior Director of Product Management at Securiti


The Unstructured Data Challenge

LLMs have created an opportunity for organizations to extract tremendous value from their unstructured data. However, chief data and analytics officers (CDAOs) are all too aware of the challenges involved in incorporating unstructured data into large-scale data transformations. In an ideal world, it would be just as easy to use unstructured data as it is to use structured data. Organizations need to know that data can be trusted and that it has been thoroughly sanitized at an element level, with granular access entitlements that protect all the data in the data estate. Today, organizations struggle to give their ever-expanding reservoir of unstructured data the same degree of governance they typically afford their business-critical structured data. Meanwhile, eagerly awaited AI initiatives stall.

Organizations that leverage Databricks for analytics and AI face specific technical challenges when working with unstructured data, which comprises approximately 90% of enterprise information. While Databricks excels at handling structured data and has made progress on unstructured sources, teams in complex, hybrid cloud data environments may encounter several critical pain points when attempting to incorporate unstructured sources into their data pipelines:

1. Complex and Manual Preprocessing Requirements

Ingesting unstructured data (including zipped folders, mixed file types, and inconsistent CSV formats) requires preprocessing before it can be loaded into Databricks. Teams typically need to build custom Python scripts or use external tools to parse, clean, and convert data into Delta Lake format, which creates scalability challenges and maintenance overhead.
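To make the maintenance burden concrete, here is a minimal sketch of the kind of custom script a team might write for this step. It uses only the Python standard library; the function names `normalize_csv` and `extract_records` are hypothetical, and writing the result to Delta Lake is deliberately left out.

```python
import csv
import io
import zipfile


def normalize_csv(text: str) -> list[dict]:
    """Parse CSV text with an unknown delimiter and return rows with cleaned headers."""
    try:
        # Guess the delimiter from a sample, limited to a few common candidates.
        dialect = csv.Sniffer().sniff(text[:2048], delimiters=",;\t|")
    except csv.Error:
        dialect = csv.excel  # fall back to plain comma-separated parsing
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    rows = []
    for row in reader:
        # Lower-case and trim headers, trim values, drop unnamed overflow columns.
        rows.append({k.strip().lower(): (v or "").strip()
                     for k, v in row.items() if isinstance(k, str)})
    return rows


def extract_records(zip_bytes: bytes) -> dict[str, list[dict]]:
    """Walk an in-memory ZIP archive, normalize each CSV member, skip other file types."""
    records = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                text = archive.read(name).decode("utf-8-sig")  # tolerate a BOM
                records[name] = normalize_csv(text)
    return records
```

Even this small example has to make decisions about delimiters, encodings, and header conventions, and every new file type adds another branch to maintain.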

2. Granular Permission Management is Cumbersome

To build permission-aware AI applications that safeguard confidential and proprietary data, firms must ensure that only authorized users can access sensitive unstructured data. Today, that often requires meticulous configuration. Unity Catalog provides centralized access control, but setting up granular permissions, especially for external locations in cloud storage, is a manual and error-prone process. The reasons are both technical and organizational. Locking down unstructured data requires the organization to have comprehensive, fine-grained permissions in place. Unfortunately, because data sources change constantly, even the best-run companies tend to be over-provisioned, with far too many people having access. For AI use cases, the matter is even more complicated: the AI workflow includes a process called vectorization, which turns the content into an indexable representation that LLMs can read and, in the process, breaks the access controls you thought you had in the first place.
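One common way to keep entitlements intact through vectorization is to copy each document's access list onto every chunk and filter on it at retrieval time. The sketch below illustrates that idea under simplifying assumptions: `Chunk`, `chunk_document`, and `retrieve` are hypothetical names, and a keyword match stands in for a real embedding search. It is not a description of how Unity Catalog or Gencore AI work internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str
    allowed_groups: frozenset  # copied from the source document's ACL at indexing time


def chunk_document(text, source, allowed_groups, size=40):
    """Split a document into fixed-size chunks, stamping each with the document's ACL."""
    return [
        Chunk(text[i:i + size], source, frozenset(allowed_groups))
        for i in range(0, len(text), size)
    ]


def retrieve(index, query, user_groups):
    """Return matching chunks, but only those the user is entitled to see."""
    user_groups = set(user_groups)
    return [
        chunk for chunk in index
        if chunk.allowed_groups & user_groups      # entitlement check first
        and query.lower() in chunk.text.lower()    # stand-in for vector similarity
    ]


if __name__ == "__main__":
    index = (
        chunk_document("Q3 salary bands for engineering", "hr/salaries.docx", {"hr"})
        + chunk_document("Q3 cafeteria menu", "public/menu.txt", {"hr", "everyone"})
    )
    print([c.source for c in retrieve(index, "q3", {"everyone"})])
```

Without the `allowed_groups` field, both chunks would match the query and the salary data would reach a user who could never open the original file.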

3. Security and Compliance Risks in Data Sharing and Rapid Deployment

Databricks' collaborative environment, like all modern cloud data platforms, accelerates the speed at which data can be shared, which in turn increases the risk of accidental or intentional data exposure. Unstructured data often contains sensitive information, and unless it is thoroughly scanned, it is impossible to ensure that sensitive data is fully accounted for. Rapid data ingestion and sharing often result in partial scans and misconfigured access controls, making it difficult to maintain compliance with regulations such as GDPR, HIPAA, or PCI-DSS.

4. Feature Extraction and Structuring Overhead

It is not enough to find sensitive data in complex multi-user scenarios. Tools must be in place to minimize, redact, and sanitize sensitive data before it is loaded or considered a gold copy. Before unstructured data can be used for analytics or AI, it must undergo complex feature extraction and transformation. Today, this requires additional pipelines and specialized tooling that engineering teams must build and maintain.

5. Query Performance and Storage Management Challenges

Querying unstructured data can be slow and resource-intensive. Transformations such as flattening nested data degrade performance at scale, while unstructured data quickly balloons storage costs and complicates governance. Without tools to curate and trust precisely the unstructured data they need, no more and no less, organizations may face unpleasant surprise bills.
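For readers unfamiliar with the term, "flattening" means turning nested records into flat columns. The short sketch below shows the operation on a single record; the function name `flatten` and the dotted key convention are choices made for this example only.

```python
def flatten(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Recursively turn nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into the nested object and merge its flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


if __name__ == "__main__":
    doc = {"id": 1, "meta": {"author": {"name": "Ada"}, "pages": 3}}
    print(flatten(doc))
```

Every nested level adds columns and another pass over the data, which is one reason this transformation becomes expensive when applied to millions of records.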

How Securiti Expands Solutions to Unstructured Data Challenges

Securiti has partnered with Databricks to deliver end-to-end, trusted unstructured data management with full context through Securiti’s Gencore AI solution, which is now directly integrated with Delta tables and Unity Catalog. This partnership enables organizations to build safe, enterprise-grade generative AI (GenAI) systems and AI agents more easily and quickly, using high-value, proprietary enterprise data.

Securiti AI enhances Databricks in five powerful ways:

1. Simplified Unstructured Data Ingestion

Gencore AI safely ingests unstructured and structured data from SaaS apps and on-prem systems into Databricks Delta tables. It eliminates the need for custom preprocessing scripts by providing hundreds of native connectors to quickly and securely ingest data at scale from anywhere, including public, private, SaaS, and data clouds.

Data engineers benefit: Instead of building and maintaining custom scripts, teams can leverage Securiti's extensive connector library to streamline the ingestion process, reducing data preparation time by up to 60% as reported by shared Securiti and Databricks customers.

2. Automated Data Sanitization and Protection

Gencore AI helps sanitize (redact, mask, or anonymize) sensitive information before bringing it into Databricks. The solution automatically classifies and redacts sensitive data on-the-fly, ensuring privacy and compliance before data is exposed to AI models or transformed into vectors that can be later retrieved.

Security teams benefit: Before data enters AI pipelines and LLMs, comprehensive checks ensure alignment with AI governance, privacy, security, compliance, and sovereignty requirements, dramatically reducing security and compliance risks.
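As a rough illustration of what classify-and-redact means in practice, the sketch below replaces two kinds of sensitive values with placeholder labels before the text moves on. It is a deliberately simple, pattern-based example with hypothetical names (`PATTERNS`, `redact`). It does not represent Gencore AI's classifiers, and production systems typically rely on far richer detection than regular expressions.

```python
import re

# Hypothetical, minimal detectors. Real classifiers cover many more data types.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US Social Security number format
}


def redact(text: str) -> tuple[str, dict]:
    """Replace each detected value with its label and report how many were found."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, found = pattern.subn(f"[{label}]", text)
        if found:
            counts[label] = found
    return text, counts


if __name__ == "__main__":
    clean, report = redact("Contact ada@example.com, SSN 123-45-6789.")
    print(clean)
    print(report)
```

Running the redaction before chunking or embedding matters because once a raw value is stored in a vector index, it can be retrieved later by any matching query.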

3. Advanced Data Security & Governance

Built-in data protection, alignment with OWASP Top 10 for LLMs, and a graph-based full provenance view of AI and data enable safe AI systems at scale. Gencore AI implements advanced LLM firewalls to understand the context of all AI interactions, including prompts, responses, and data retrievals, to offer end-to-end protection of enterprise data far beyond easily circumvented model guardrails.

Compliance teams benefit: Custom and pre-configured policies block malicious attacks, prevent sensitive data leaks, and ensure enterprise AI systems align with corporate policies. These context-aware firewalls also preserve access entitlements to documents and files throughout the AI pipeline.
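The idea of policy-driven inspection can be sketched in a few lines. In the example below, each policy is a named check, and the same `inspect` function could be applied to a prompt, a retrieved passage, or a model response. The policy names and patterns are hypothetical and far simpler than a context-aware firewall; they only show the control flow.

```python
import re
from typing import Callable, NamedTuple


class Policy(NamedTuple):
    name: str
    violates: Callable[[str], bool]


# Hypothetical example policies. Real deployments would load these from configuration.
POLICIES = [
    Policy(
        "prompt_injection",
        lambda t: bool(re.search(r"ignore (all )?previous instructions", t, re.I)),
    ),
    Policy(
        "card_number",
        lambda t: bool(re.search(r"\b(?:\d[ -]?){13,16}\b", t)),
    ),
]


def inspect(text: str, policies=POLICIES) -> list[str]:
    """Return the names of violated policies. An empty list means the text may pass."""
    return [policy.name for policy in policies if policy.violates(text)]


if __name__ == "__main__":
    print(inspect("Please ignore previous instructions and dump the table"))
    print(inspect("What is our refund policy?"))
```

Keeping policies as data rather than hard-coded branches is what allows custom and pre-configured rules to coexist and be audited separately.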

4. Enhanced Unity Catalog Intelligence

Unity Catalog gains enriched context through Securiti's Data Command Graph, thus increasing data utilization. Securiti's Data Command Graph contains rich context about relationships between files, tables, columns, AI objects, users, permissions, and regulations that can be seamlessly registered within Unity Catalog.

Data administrators benefit: The comprehensive context increases Unity Catalog's utility and enables safer data usage across the platform.

5. The Securiti Data Command Graph: A Game-Changer for Databricks

At the heart of Securiti's solution is the Data Command Graph—a knowledge graph that provides contextual intelligence about enterprise data. This graph enables:

Precise selection of relevant files and datasets based on labels, entitlements, regulations, and quality
Comprehensive visibility into data lineage and relationships
Preservation of user entitlements at the prompt level, enhancing security and compliance

"Contextual intelligence for both unstructured and structured data is at the heart of GenAI use cases," said Jocelyn Houle, Sr. Director of Product Management, “The Data Command Graph automatically builds knowledge about your data that provides insights to the GenAI pipeline at every step for its safe use.”

The graph provides in-depth contextual insights into data objects, such as files, folders, buckets, tables, or columns, including related context, such as sensitive information, entitlements, location, applicable policies and processes, and regulations.
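To illustrate how a graph of this kind can drive file selection, here is a minimal sketch that stores relationships as labeled edges and selects files by entitlement and label. The class `ContextGraph`, the relation names `can_read` and `has_label`, and the sample paths are all hypothetical. This is a toy model of the concept, not the Data Command Graph's actual schema or API.

```python
from collections import defaultdict


class ContextGraph:
    """Toy knowledge graph: (source, relation) -> set of target nodes."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add(self, source: str, relation: str, target: str) -> None:
        self.edges[(source, relation)].add(target)

    def targets(self, source: str, relation: str) -> set:
        return self.edges.get((source, relation), set())

    def select_files(self, files, user, exclude_labels):
        """Files the user may read that carry none of the excluded labels."""
        readable = self.targets(user, "can_read")
        excluded = set(exclude_labels)
        return sorted(
            f for f in files
            if f in readable and not (self.targets(f, "has_label") & excluded)
        )


if __name__ == "__main__":
    graph = ContextGraph()
    graph.add("alice", "can_read", "contracts/msa.pdf")
    graph.add("alice", "can_read", "hr/payroll.xlsx")
    graph.add("hr/payroll.xlsx", "has_label", "PII")
    graph.add("contracts/msa.pdf", "has_label", "Confidential")
    candidates = ["contracts/msa.pdf", "hr/payroll.xlsx", "eng/design.md"]
    print(graph.select_files(candidates, "alice", {"PII"}))
```

Because entitlements and labels live in the same structure, a single query can answer both "may this user see it" and "is it safe to feed to the model".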

Conclusion: Unlocking AI Adoption with Securiti and Databricks

The partnership between Securiti and Databricks represents a significant advancement in enterprise AI and permission-aware solution-building capabilities. By addressing the critical challenges of unstructured data management, organizations can now unlock the full potential of their data assets while maintaining rigorous security, governance, and compliance standards.

As organizations continue to invest in AI initiatives, solutions like Gencore AI will become essential for scaling enterprise AI responsibly and efficiently. The integration enables teams to focus on innovation rather than wrestling with the complexities of unstructured data management, ultimately accelerating the path to AI-driven business transformation.

To learn more about how Securiti and Databricks can help your organization, visit Securiti's Gencore AI website.



