AI Security Risks, by Model Type

By Cindy Kaplan

Every industry is adopting AI. However, security risks are increasing just as rapidly. Different AI models can fail in dramatically different ways. The type of model you choose to adopt – LLM? Agent? RAG? Something else? – strictly dictates your risk exposure. Get straight to the point with this primer on AI security risks by model type, illustrated with real-world incidents and research, linked to applicable enterprise controls.

Large Language Models (LLMs)

Likelihood: High. ChatGPT, Microsoft Copilot, and similar LLM-based tools have been widely adopted across many enterprises to augment productivity and enable automation.

Why they’re vulnerable: LLMs take requests in natural language, allowing them to do anything a user asks (within reason), making them extremely versatile but also susceptible to manipulation.

Key Risks

Prompt injection/jailbreak attacks
Hallucinated/false outputs
Extraction of sensitive data included in user prompts

Real-world Examples

Engineers at Samsung Electronics inadvertently leaked sensitive code into ChatGPT (Bloomberg, 2023).
Users could have had their chat titles and metadata about billing exposed through a ChatGPT bug (OpenAI, 2023).
Proof of concept for prompt injection attacks via Microsoft Copilot (Microsoft Security Research, 2024).

Mitigations in the Enterprise

Layers that filter out risky output/input
Data loss prevention (DLP) solutions
Prevent sensitive data from entering LLMs in the first place
Manual review of outputs that could be risky/harmful

Fine-Tuned / Domain-Specific LLMs

Likelihood: High.

These are LLMs that have been trained (fine-tuned) on private or regulated data (examples: Legal precedent or regulation, healthcare/medical data, or internal corporate/business knowledge for AI agents).

Key concerns

Data poisoning attacks in the fine-tuning process
Latent backdoors triggered by inputs
Inappropriate levels of confidence in regulated industries

Research findings

The dataset can be poisoned to learn latent behaviors (Stanford research group)
Uncertainty around persistence of unsafe fine-tuning practices (Research from the OpenAI team)

How enterprises can address

Vetting datasets & providing data provenance
Testing against adversarial inputs prior to release
Monitoring for aberrant behaviors

Agent/ Autonomous AI Systems

Likelihood: Very High

Agents combine LLMs with tools(APIs, browser, other internal tools) to take actions—not just create output.

Top Risks

Unauthorized actions (copying sensitive data, running commands)
Indirect prompt injection through web pages/documents
Chain of attacks across multiple tools

Research Demonstrations

Carnegie Mellon Researchers demonstrated indirect prompt injection attacks from the web.
Microsoft Research proved agents could be tricked into exfiltrating sensitive data.

How Enterprises Can Address Risks

Strict permissioning of what tools agents have access to (apply least privilege)
Sandboxed execution environment
Human-in-the-Loop for actions
Logs/audit trails of agent behavior

Retrieval-Augmented Generation (RAG) Applications

Likelihood: Medium Severity: Medium

LLMs paired with company knowledge bases.

Key Risks

Inclusion of sensitive documents
Poisoning company knowledge bases
Injection via stored documents

Research Findings

Researchers from MIT and NVIDIA demonstrate taking control of generated outputs with engineered documents.

Mitigations in Use

Permissions on knowledge base document access
Data sanitization routines
Detection for anomalous retrieval queries

Open-Source LLMs

Risk level: High (if unmanaged)

Allows freedom and control, but places the onus on the organization for security.

Primary concerns

Tampered with or otherwise malicious model weights
Absence of safety guardrails
Absence of patching/updating

Analogous incidents

The SolarWinds supply chain attack propagated compromised components to numerous organizations.

Mitigations for Enterprises

Authenticate model weights (checksum, signatures, etc.)
Use trusted source repositories (Hugging Face, etc.) carefully
Retain internal model registries
Utilize security scanning on ML pipelines

Closed-Source / API-Based Models

Risk level: Medium

Models that are operated by a vendor and accessed through APIs.

Top risks

Sharing data with third parties
Vulnerabilities on the vendor’s side
Limited transparency into the application

Known issue

Leaked via ChatGPT in 2023, prompting concerns about privacy in multi-tenant AI. https://www.theregister.com/2023/08/29/openai_llm_data_found_on_gpt/

Enterprise mitigations

Conduct vendor risk assessments
Employ data minimization practices
Encryption in transit and at rest
Contractual mechanisms (SLAs, data rights, etc.)

Computer Vision Models

Potential Severity: Medium

Applications: Surveillance systems, biometric systems, autonomous vehicles.

Top threats

Adversarial examples
False negatives/positives in security or safety-critical systems
Privacy

Research highlight

Researchers at Google showed how adversarial examples can be used to cause stop signs to be misclassified.

What companies can do

Test for adversarial robustness
Use other sensors to verify the computer vision-based analysis
Use humans in the loop for critical decisions

Reinforcement Learning (RL) Agents

Risk rating: High

RL agents learn to take actions that maximize a reward signal.

Primary concerns

Reward hacking
Uncertain, unsafe emergent behavior
Overfitting to the environment

Research illustration

DeepMind reported agents game-playing rather than solving desired problems.

Business mitigations

Auditing the reward function
Testing via simulation stress-tests
Incorporating constraints and safety layers

Rule-Based / Symbolic AI

Risk Level: Low

The classic deterministic models from way back when.

AI Security Risk Comparison Summary

COMPANY CATEGORY	BREACH	THIRD PARTY
LLMs	High	Prompt injection, data leakage
Fine-tuned LLMs	High	Backdoors, poisoned data
Agentic AI	Very High	Unauthorized actions
RAG Systems	Medium	Data exposure
Open-source LLMs	High	Supply chain risk
Closed APIs	Medium	Vendor exposure
Computer Vision	Medium	Adversarial inputs
RL Agents	High	Reward hacking
Rule-based AI	Low	Predictability

Key Enterprise Takeaways:

The riskiest systems allow autonomy, external access, and minimal controls all at once.
The riskiest systems based on real-world incidents have been LLM data leakage and prompt injection.
Open-source AI also brings software supply chain-style risks.
The safest AI systems are the narrowest, most controlled ones that are siloed from sensitive data.

In Conclusion

AI security doesn’t revolve around one control. It’s a layered discipline that requires companies to assess the right model for the right type of data with the right set of safeguards in place.

The conversation is shifting from “Is AI risky?” to “ Which AI model has what risk? And how are you controlling it?”

Review Your AI Security and Risk Posture

Review Your CoPilot Security Position

Review Your CCPA Privacy Risk

Read more AI (Artificial Intelligence) Risk Insights

References

Bloomberg News. “Samsung Bans Staff Use of ChatGPT After Leak of Sensitive Code.” Bloomberg. April 2023.
OpenAI. “March 20 ChatGPT Outage: Here’s What Happened.” OpenAI Blog. March 2023.
Microsoft Security Research. “Prompt Injection Attacks Against Large Language Models.” Microsoft. 2024.
Stanford University. “Data Poisoning Attacks on NLP Models.” Stanford HAI.
Carnegie Mellon University. “Indirect Prompt Injection Attacks on AI Systems.” CMU Research.
MIT. “Security Risks in Retrieval-Augmented Generation Systems.” MIT CSAIL
NVIDIA. “Securing Retrieval-Augmented Generation Pipelines.” NVIDIA Technical Blog.
U.S. Cybersecurity and Infrastructure Security Agency. “SolarWinds Supply Chain Compromise.” CISA.
Google Research. “Adversarial Examples in Machine Learning.” Google AI Blog.
DeepMind. “Specification Gaming: The Flip Side of AI Ingenuity.” DeepMind Blog.