What to Know About Saudi Arabia’s Generative AI Guidelines

By Anas Baig | Reviewed By Salma Khan
Published March 26, 2024

1. Introduction

Recently, the Saudi Data & Artificial Intelligence Authority (SDAIA) released the Generative Artificial Intelligence Guidelines. The primary purpose of this document is to help organizations that are developing and using GenAI models and systems understand the various risks commonly associated with their development, design, and deployment.  

Additionally, the Guidelines contain an extensive overview of various aspects of the use of Generative Artificial Intelligence (GenAI) tools, including the potential impact they can have on various sectors and services. They aim to raise awareness of the significance of GenAI tools and the role they can play in a sustainable and beneficial future if leveraged responsibly.

Read on to learn more about the Guidelines, including the key risks highlighted as well as the best practices to mitigate each of them.

2. Who Are These Guidelines For?

a. Material Scope

This document provides guidance for both developers and users of Generative AI models and systems.

b. Territorial Scope

This guide applies to all entities involved in designing, developing, deploying, implementing, using, or in any form affected by GenAI models and systems within Saudi Arabia.

3. Definitions of Key Terms

Here are some key definitions provided within the Guidelines:

a. Generative Artificial Intelligence

This refers to a machine learning model capable of producing new examples from its training dataset. These new examples can include text, images, sounds, icons, videos, and various other forms of media. Such models can automate several human tasks that mimic cognitive abilities, such as responding to and formulating verbal or written commands, learning, and problem-solving.

b. Users

This refers to any natural or legal persons that use products or services rendered by Generative AI systems. These include all relevant stakeholders, such as companies, NGOs, and individuals.

c. Developers

This refers to any natural or legal persons involved in developing products or services rendered by Generative AI systems. These include AI developers, data scientists, and researchers.

4. Guidelines for Developers & Users

When interacting with GenAI models and systems, developers and users must adhere to certain guidelines throughout the model or system’s lifecycle. Doing so maximizes potential benefits and mitigates potential risks.

These guidelines build on the AI Ethics Principles released by the SDAIA on September 14, 2023, which apply to all AI systems, not just GenAI:

a. Fairness

All relevant stakeholders are required to undertake appropriate actions to mitigate bias, discrimination, or stigmatization against individuals, communities, or groups in the design, development, deployment, and use of all such GenAI systems. To ensure the creation of systems that are both fair and representative, all GenAI systems should be trained on data that is thoroughly filtered for bias and has adequate representation of minority groups.

To that end, developers and users must:

  • Test all GenAI models to ensure no bias is embedded in the code or algorithm, training only on datasets screened for such bias (a minimal sketch of one such check follows this list);
  • Have a comprehensive understanding of the training data, including its origins, content, selection criteria, and how it was prepared;
  • Deepen their knowledge of bias, diversity, inclusion, anti-racism, values, and ethics to better recognize biased or discriminatory content.
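As one illustration of the first point, here is a minimal sketch of a representation check that could be run over a training dataset before use. The "group" attribute and the 10% threshold are hypothetical choices for illustration, not values drawn from the Guidelines.

```python
# Minimal sketch of a pre-training representation check.
# Assumes records carry a hypothetical "group" attribute; the
# min_share threshold is illustrative, not from the Guidelines.
from collections import Counter

def representation_report(records, group_key="group", min_share=0.10):
    """Flag groups whose share of the dataset falls below min_share."""
    counts = Counter(r[group_key] for r in records)
    total = sum(counts.values())
    return {
        group: {"share": round(n / total, 3), "underrepresented": n / total < min_share}
        for group, n in counts.items()
    }

# Illustrative usage with dummy records
dataset = [{"group": "A"}] * 80 + [{"group": "B"}] * 15 + [{"group": "C"}] * 5
for group, stats in representation_report(dataset).items():
    print(group, stats)
```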

b. Reliability & Safety

All relevant stakeholders must ensure that their GenAI systems follow a set of predetermined specifications and behave in the manner intended. Doing so not only promotes the reliability of the system but also ensures that it does not pose a risk of harm or danger to society and individuals.

To that end, developers and users must:

  • Design and develop systems that can withstand the wide range of uncertainty, instability, and volatility they may face;
  • Develop a corresponding set of standards and protocols that appropriately assess the reliability of the GenAI system;
  • Build internal triggers into the systems that escalate to human oversight in cases of emergency (see the sketch after this list);
  • Be able to identify AI-generated content so users can cross-check all such content against independent sources;
  • Ensure the continuous quality of the training dataset to guarantee the reliability of all content produced;
  • Review all content for factual accuracy and relevance to prevent any spread of misinformation.
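As a minimal illustration of such an internal trigger, the sketch below holds low-confidence outputs for human review. It assumes the GenAI system can report a confidence score per output; the threshold and the escalate_to_human() hook are hypothetical.

```python
# Minimal sketch of an oversight trigger: outputs below a confidence
# threshold are held for human review. Threshold and hook are hypothetical.
CONFIDENCE_THRESHOLD = 0.75

def escalate_to_human(output: str) -> str:
    # Placeholder: in practice this would route to a review queue.
    return f"[PENDING HUMAN REVIEW] {output}"

def release_output(output: str, confidence: float) -> str:
    """Release an output directly, or hold it for human oversight."""
    if confidence < CONFIDENCE_THRESHOLD:
        return escalate_to_human(output)
    return output

print(release_output("The capital of France is Paris.", confidence=0.98))
print(release_output("The policy expires in 2031.", confidence=0.42))
```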

c. Transparency & Explainability

All relevant stakeholders must ensure that GenAI systems are built to be appropriately explainable and offer adequate information related to how automated decisions are made. Doing so would allow for timely interventions if automated decisions are less than optimal.

To that end, developers and users must:

  • Clearly communicate to the public when GenAI capabilities are used;
  • Inform the public when content is AI-generated;
  • Offer appropriate channels for the public to opt for alternatives to automated communication channels;
  • Implement tools that help the public identify AI-generated content, such as watermarks.

d. Accountability & Responsibility

All relevant stakeholders, such as the designers, vendors, procurers, developers, owners, and assessors of GenAI systems, are ethically responsible and liable for the decisions made by that GenAI system that negatively affect individuals and communities.

Furthermore, GenAI adoption carries several legal and ethical implications, such as the risk of infringing intellectual property rights, human rights violations, and potential data privacy violations.

To that end, developers and users must:

  • Ensure all training data is properly acquired, classified, processed, and accessible to facilitate human intervention if needed;
  • Have consistent data quality checks to validate the integrity of data;
  • Ensure all GenAI systems developed appropriately adhere to relevant regulations such as the Personal Data Protection Law and Intellectual Property Law;
  • Regularly consult with legal professionals to assess and eliminate all major risks associated with deploying GenAI systems.

e. Privacy & Security

All relevant stakeholders must ensure that GenAI systems being developed appropriately protect the privacy of the data they collect. This can only be achieved by leveraging adequate data security measures to prevent data breaches that may lead to various types of harm.

To that end, developers and users must:

  • Adopt appropriate privacy and security measures when using sensitive or classified data to train GenAI systems (a minimal redaction sketch follows this list);
  • Deploy data protection mechanisms as required by relevant regulations;
  • Assess all risks presented by the use of GenAI systems per the AI Ethics Principles;
  • Carry out regular privacy impact and risk assessments to re-evaluate social and ethical considerations as required;
  • Deploy a privacy- and security-by-design framework to protect any developed GenAI system from malicious attacks.
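As one minimal privacy-by-design illustration for the first point above, the sketch below redacts obvious personal data from text before it is used as a prompt or added to a training set. The regex patterns are illustrative and far from exhaustive; production systems would use a dedicated sensitive-data classifier.

```python
# Minimal sketch of pre-prompt redaction. The patterns below are
# illustrative stand-ins for a real sensitive-data classifier.
import re

REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace matches of each pattern with a typed placeholder."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact Ahmed at ahmed@example.com or +966 555 0100."))
# -> Contact Ahmed at [EMAIL] or [PHONE].
```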

f. Humanity

All relevant stakeholders must ensure that they deploy ethical methodology when developing GenAI models that generate a positive impact for individuals and communities in both the short and long term. Such GenAI models should adopt a human-centric design that facilitates human choice and determination.

g. Social & Environmental Benefits

All GenAI models must have social and environmental priorities that benefit both individuals and communities. GenAI systems must not cause or accelerate harm to humans but contribute to empowering social and environmental progress.

5. GenAI Risks & Mitigation Measures

GenAI presents both opportunities and challenges for developers and users. Here are some of the most immediate and pressing risks of GenAI, along with the recommended mitigation measures for each:

a. Deepfakes & Misrepresentation

GenAI capabilities can be leveraged for nefarious purposes such as scams, financial fraud, blackmail, and sophisticated identity theft. Furthermore, digitally available data such as videos and pictures of individuals can be used and manipulated to create fake digital representations (i.e., deepfakes). These not only cause serious personal harm to individuals but have the potential to cause political and social unrest in some instances.

The best strategies to counter this threat include:

i. Watermark Implementation

All content generated via GenAI models and systems should carry a watermark identifying it as such. Watermarked content can then be flagged if it is used in other GenAI models and systems and be appropriately dealt with per the relevant regulations.
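A minimal sketch of one such labeling approach follows: it attaches a tamper-evident provenance record to generated content. Robust watermarking embeds the signal in the content itself; this metadata-based version is a simplification, and the field names and model identifier are hypothetical.

```python
# Minimal sketch of AI-provenance labeling via metadata. Field names
# and model identifier are hypothetical, not a standard.
import hashlib
import json
from datetime import datetime, timezone

def label_as_ai_generated(content: str, model_id: str) -> dict:
    """Bundle content with a provenance record that downstream systems can flag."""
    return {
        "content": content,
        "provenance": {
            "ai_generated": True,
            "model_id": model_id,
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "content_sha256": hashlib.sha256(content.encode()).hexdigest(),
        },
    }

print(json.dumps(label_as_ai_generated("Example output.", "demo-model-v1"), indent=2))
```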

ii. KYC Protocols

Most deepfake technology requires intensive computational power. Hence, organizations that provide the underlying hardware, such as Graphics Processing Units (GPUs) and/or Tensor Processing Units (TPUs), should have extensive user verification processes. Doing so makes it easier to identify the potential creators of deepfakes and harder for malicious actors to access these tools in the first place.

iii. Output Verification

All content generated via GenAI models and systems must be thoroughly analyzed for inappropriate audio and visual material. Additional checks, such as screening outputs against public figures' faces and voice samples, can also be performed.

iv. Better Digital Literacy And Online Safety

Both public and private organizations must take proactive measures to inform and educate the general public about the potential risks and dangers of deepfakes, as well as the data protection measures they can take to minimize the chances of their data being used to generate such content. Furthermore, information on how to identify such content online will help limit its spread.

b. Safety Threats

GenAI tools can be used to generate a wide range of content. Without appropriate filtration mechanisms, it can create content that can be used to compromise public safety and security. Some ways to counter such threats include:

i. Content Moderation & Filtration

It is necessary to ensure that the content generated by GenAI models and systems is in line with the content guidelines of the organization providing the service. Mechanisms should be built in that can appropriately detect, flag, and prevent the spread of any malicious content generated by such systems. One example of such a mechanism would be thorough filtration of both the user prompts and the generated outputs. Regular updates should be made to such mechanisms based on customer feedback and rigorous internal testing.
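As one illustration of such a mechanism, the sketch below screens both the user prompt and the generated output against a simple blocklist. The blocklist and the generate callable are illustrative stand-ins for a real moderation model or policy engine.

```python
# Minimal sketch of two-sided filtration: screen the prompt before
# generation and the output after. Blocklist is illustrative only.
BLOCKED_TERMS = {"make a weapon", "bypass security"}

def violates_policy(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

def moderated_generate(prompt: str, generate) -> str:
    """Run generation only if both prompt and output pass the filter."""
    if violates_policy(prompt):
        return "Request declined: prompt violates content policy."
    output = generate(prompt)
    if violates_policy(output):
        return "Response withheld: output violates content policy."
    return output

# Illustrative usage with a dummy generator
print(moderated_generate("Summarize this report.", lambda p: "Summary: ..."))
print(moderated_generate("How do I bypass security?", lambda p: ""))
```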

ii. Dataset Filtration

Like the generated content, all training data should also be thoroughly filtered for any information present that could be used to influence the generated output in any way.

iii. Closed Access Models

While open-access models and channels can be helpful for acquiring datasets, they pose problems of their own, since it is nearly impossible to verify the source of such models and datasets. Hence, wherever possible, developers should use a model only after personally verifying its safety and its stated objectives for use.

c. Misinformation & Hallucination

GenAI models and systems can produce false information that appears factual. This is known as “AI hallucination,” where the model generates plausible but false outputs. The best ways to counter this include:

i. Content Verification & Citation

Similar to academic citation practices, all publicly available GenAI models and systems should have built-in mechanisms to estimate the accuracy of the information they provide. By requiring such systems to display citations for the data used to generate content, users can cross-reference the accuracy of the generated content for themselves. Such a mechanism also encourages a higher level of accuracy and the use of proper sources.
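A minimal sketch of citation-carrying output follows, assuming the retrieval or generation step can return the sources it drew on. The source records and URL shown are illustrative.

```python
# Minimal sketch of citation-carrying output. Assumes sources are
# returned alongside the answer; records here are illustrative.
def answer_with_citations(answer: str, sources: list) -> str:
    """Append numbered citations so users can cross-check the content."""
    lines = [answer, "", "Sources:"]
    for i, src in enumerate(sources, start=1):
        lines.append(f"  [{i}] {src['title']} - {src['url']}")
    return "\n".join(lines)

print(answer_with_citations(
    "SDAIA released its GenAI guidelines in 2024. [1]",
    [{"title": "SDAIA Generative AI Guidelines", "url": "https://sdaia.gov.sa"}],
))
```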

ii. Content Labeling

Watermarks can be leveraged in this instance, too. Each piece of content generated by GenAI models and systems should contain some indication that such models and systems generated it. These can be specified based on the kind of content being generated.

iii. User Vigilance

Users must take a proactive role in verifying the accuracy of content generated by GenAI. Carefully checking content for labels such as watermarks, or cross-referencing it with reliable online sources, is an excellent way to start.

iv. Better Awareness

Users should be aware of the high degree of variance in the accuracy of the outputs generated by such GenAI models and systems. Hence, they should be educated about the standard fact-checking and content-verification processes to review all GenAI-generated content for themselves.

d. Data Breaches

Users often involuntarily expose sensitive information online, and the same holds true when they use GenAI tools. Employees entering sensitive organizational information into such tools can cause severe repercussions for their organizations, as that information may be exposed to third parties, which can be competitors or, worse, malicious actors. To counter this, organizations can take the following steps:

i. Usage Protocols

Organizations must implement a strict and uncompromising stance related to employees entering classified information into any such third-party GenAI models and tools. Ideally, there should be a set of guidelines that instruct all employees on best content generation and use practices, as well as steps they can take to support the ethical use of this technology. Appropriately enforced, this can minimize the chances of an internal lapse of policy leading to a significant data breach.
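As a minimal sketch of how such a protocol could be enforced in tooling, the gate below blocks prompts carrying classification markers before they reach a third-party GenAI tool. The marker strings are hypothetical stand-ins for an organization's own labels.

```python
# Minimal sketch of a usage-protocol gate. Markers are hypothetical
# stand-ins for an organization's own classification labels.
CLASSIFICATION_MARKERS = ("CONFIDENTIAL", "INTERNAL ONLY", "SECRET")

def safe_to_send(prompt: str) -> bool:
    """Return False if the prompt carries a classification marker."""
    upper = prompt.upper()
    return not any(marker in upper for marker in CLASSIFICATION_MARKERS)

prompt = "Summarize this CONFIDENTIAL roadmap for me."
if safe_to_send(prompt):
    print("Prompt forwarded to the external tool.")
else:
    print("Blocked: classified content must not leave the organization.")
```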

ii. Employee Training

Designing and enforcing usage policies is a good start. Still, an organization must take all necessary steps to ensure that its employees understand the critical necessity of these policies. Employee training sessions and workshops that cover several other aspects of the GenAI model and system usage, such as broader legal mandates encompassing data governance, cybersecurity, government data classification, personal data protection, intellectual property (IP) rights, and other pertinent legal or policy areas, would be the best way to approach this.

iii. Access Controls

Users’ or employees’ access to critical information should be limited, with access privileges granted based on their designation or their need to access that information.
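A minimal sketch of such need-to-know access control follows, mapping roles to the data categories they may reach. The roles and categories are illustrative, not prescribed by the Guidelines.

```python
# Minimal sketch of need-to-know access control. Roles and data
# categories are illustrative examples.
ROLE_PERMISSIONS = {
    "analyst": {"public", "internal"},
    "engineer": {"public", "internal", "source_code"},
    "admin": {"public", "internal", "source_code", "classified"},
}

def can_access(role: str, category: str) -> bool:
    """Grant access only if the role's permission set covers the category."""
    return category in ROLE_PERMISSIONS.get(role, set())

print(can_access("analyst", "classified"))  # False
print(can_access("admin", "classified"))    # True
```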

e. Certification Fraud

In academic and professional settings, GenAI systems and models are increasingly being misused in human certification processes, such as professional exams and evaluations. This can severely undermine the reliability and credibility of these exams and poses a societal threat, as such misuse can be harder to detect with the standard anti-plagiarism tools available on the market. Organizations can undertake the following steps to counter this:

i. Better Assessments

Malicious actors can leverage GenAI tools to amplify and refine their preexisting tactics. This underscores the importance of organizations having a thorough review process in place for all their assessments. These assessments and the mechanisms involved in conducting such assessments should be adjusted based on evolving technological capabilities and methodologies.

To that end, organizations must ensure their personnel have the relevant skills, knowledge, and awareness to counter any potential misuse of GenAI tools. Doing so on a regular basis can help organizations maintain the integrity and security of these assessments.

ii. Educate & Train

Thorough internal training sessions must be conducted to promote the responsible use of GenAI tools and to help personnel identify potentially suspicious behaviors that indicate fraud or abuse of the tool.

iii. Query Proficiency

Developers and users should be trained on how to draft prompts that deliver optimal outputs without triggering the content-related issues discussed above.

iv. Clear Guidelines

Each organization must deliberate and develop appropriate guidelines for using GenAI tools that align with its ethical principles and code of conduct.

f. Intellectual Property Infringement & Protection

GenAI models and systems are trained on a wide array of content. Naturally, this has resulted in heightened cases of unauthorized use of copyrighted or IP material. The following steps can help minimize such occurrences:

i. IP Licensing & Due Diligence

AI developers must be required to seek both permission and legal licenses to use any IP in the training datasets for their GenAI models and systems. Doing so can establish a chain of accountability that helps limit the use of protected content without appropriate permissions. Furthermore, users must be able to request confirmation of whether the GenAI tools they use were trained on protected IP.

ii. Creator Permissions

As an extension of IP licensing, GenAI model or system developers must obtain permission from the original IP holder before using such data within their datasets. Where this is not possible, developers can establish a mechanism that fairly compensates the original creators whenever their content is used in training datasets.

g. Variability of Outputs

Unlike traditional programmable services, GenAI models and systems do not solely rely on pre-defined algorithms to generate outputs. Furthermore, these tools can change the way they develop and generate outputs based on updates and internal changes. Hence, to counter any issues arising from such situations, organizations can take the following steps:

i. Clear Indication of AI-Generated Code

Developers responsible for an organization’s internal code stack should include detailed and precise annotations within the source code whenever they use tools such as ChatGPT or GitHub Copilot, indicating that the code was developed with the help of these tools. Doing so not only increases overall transparency but also helps maintain the code’s overall quality.
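One way to implement this, sketched below, is a standard annotation header that travels with AI-assisted code. The header fields are hypothetical; the Guidelines do not mandate a specific format.

```python
# Minimal sketch of an annotation convention for AI-assisted code.
# Header fields are hypothetical; the point is that the tool used,
# the date, and the human reviewer travel with the code itself.
import json

# --- AI-ASSISTED CODE ---
# Tool: a GenAI assistant such as ChatGPT or GitHub Copilot
# Generated: 2024-03-26
# Reviewed-by: <human reviewer>
# ------------------------
def parse_config(path: str) -> dict:
    """Example function whose first draft came from a GenAI tool."""
    with open(path) as f:
        return json.load(f)
```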

ii. Verify & Validate

All AI-generated code and other relevant content must be regularly and thoroughly verified, especially in instances where accuracy is of prime importance.

iii. Stay Informed

Developers must strive to remain updated on the latest developments related to GenAI models and systems. This helps them stay on top of any changes that might occur within these tools that affect the way they use and interact with these tools.

iv. Build Expertise

Developers should develop a thorough understanding and expertise related to the use of GenAI. Such expertise can prove vital when assessing the validity of AI-generated outputs.

6. How Securiti Can Help

Securiti is the pioneer of the Data Command Center, a centralized platform that enables the safe use of data and GenAI. It provides unified data intelligence, controls, and orchestration across hybrid multi-cloud environments. Large global enterprises rely on Securiti's Data Command Center for its various data security, privacy, governance, and compliance solutions.

With the Data Command Center, organizations gain access to several modules and solutions that enable seamless integration and compliance with the policies and measures suggested in this guide.

The assessment automation module automates your records of processing activities (RoPA) reports, privacy impact assessments, and data protection impact assessments. The data quality module provides technical information about the data while maintaining a list of business rules applied to it. Additionally, data classification allows for the detection of sensitive files, such as medical and financial documents, with automated labeling to track and enforce security policies for all such files.

Similarly, several other modules that are a part of the Data Command Center can be leveraged to counter the risks identified in this guide.

Request a demo today and learn more about how Securiti can help you comply with any data and AI-related regulations from Saudi Arabia.
