The recently published Top 25 MCP vulnerabilities put a hard spotlight on protocol-level issues. That's good: they show how easy it is to break agents at the tool and protocol layers.
At the same time, the community is building solid maps and frameworks that help us think about the bigger picture:
- OWASP Top 10 for LLMs: the foundation. Covers prompt injection, insecure output handling, data poisoning. The important point: most of these risks don't disappear when you add autonomy → they multiply.
- OWASP Agentic AI Threats & Mitigations: extends the map for agents: memory manipulation, orchestration risks, alignment faking. It shows how the attack surface grows when models start taking actions.
- Precize Agentic AI Top 10: a practitioner catalog on GitHub. Useful because it frames risks in very operational terms: authorization hijacking, impact chains, dependency attacks.
- CSA MAESTRO: a threat modeling framework for multi-agent systems. Important because it forces teams to map flows and trust boundaries across many agents, not just one.
- SP 800-53 AI Control Overlays: still evolving, but will standardize how enterprises think about risk in autonomous and multi-agent AI. When NIST moves, regulators and auditors usually follow.
These are super valuable materials.
BUT, after talking with many teams recently (CISOs, security teams, AI engineers, devs) and seeing real-world cases (some covered in my previous posts), the conversation keeps coming back to three failure modes that deserve special attention.
They're not always obvious, not easy to detect, and often hidden inside "allowed" or "safe-looking" behavior.
And yet, they're already creating issues in real environments.
Failure #1: Allowed-but-Wrong
The agent does something bad without breaking a rule.
- A tool call that's technically permitted, but pulls the wrong dataset, causes an internal DoS, or overwrites data we never intended to touch.
- A permitted chain of actions stuck in a flow that stays under alerting thresholds (e.g. token spend) while creating unintended outcomes.
- Steps that individually look safe, but when chained, lead to data exfiltration.
This is why CISOs shouldn't trust guardrails alone. Guardrails catch violations. They don't catch misuse inside the rules.
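To make that concrete, here's a minimal sketch of a sequence guard: it blocks a tool call that would complete a chain flagged as dangerous, even though every individual step is permitted. The tool names and banned pairs are illustrative assumptions, not tied to any specific MCP framework.

```python
# Hypothetical sequence guard: each tool call is allowed on its own,
# but certain chains are blocked. Tool names here are illustrative.
from collections import defaultdict

# Chains that are individually permitted but dangerous in combination.
BANNED_SEQUENCES = [
    ("read_customer_db", "send_external_email"),  # exfiltration path
    ("list_shared_drive", "delete_files"),        # destructive "cleanup"
]

session_history = defaultdict(list)  # session_id -> ordered list of tool names

def check_tool_call(session_id: str, tool_name: str) -> bool:
    """Return False if this call would complete a banned sequence."""
    history = session_history[session_id]
    for prior, current in BANNED_SEQUENCES:
        if tool_name == current and prior in history:
            return False  # block and alert, even though each step is "allowed"
    history.append(tool_name)
    return True

# Each call passes a per-step guardrail; only the chain check catches the risk.
assert check_tool_call("s1", "read_customer_db") is True
assert check_tool_call("s1", "send_external_email") is False
```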
Failure #2: Behavioral Drift
Agents don't stand still.
- Models get updated, behavior shifts.
- Third-party MCP servers update silently.
- Memory accumulates (or is slowly poisoned).
We've seen this in practice: an agent that was safe on day one starts acting differently a week later. Nothing "broke", but the system quietly slid into unsafe behavior.
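One way to catch this earlier is to snapshot the agent's observable behavior and diff it against a baseline. The sketch below assumes hypothetical fields (model version, MCP server versions, tool-call frequencies, memory size) and arbitrary thresholds; the point is the comparison loop, not the exact numbers.

```python
# Illustrative drift check: compare a current behavior snapshot against a
# stored baseline. Field names and thresholds are assumptions for the sketch.
from dataclasses import dataclass

@dataclass
class BehaviorSnapshot:
    model_version: str
    mcp_server_versions: dict    # server name -> version string
    tool_call_frequencies: dict  # tool name -> calls per day
    memory_entries: int

def detect_drift(baseline: BehaviorSnapshot, current: BehaviorSnapshot,
                 freq_tolerance: float = 0.5) -> list:
    """Return a list of human-readable drift alerts."""
    alerts = []
    if current.model_version != baseline.model_version:
        alerts.append(f"model changed: {baseline.model_version} -> {current.model_version}")
    for server, version in current.mcp_server_versions.items():
        if baseline.mcp_server_versions.get(server) != version:
            alerts.append(f"MCP server '{server}' updated to {version}")
    for tool, freq in current.tool_call_frequencies.items():
        base = baseline.tool_call_frequencies.get(tool, 0)
        if base and abs(freq - base) / base > freq_tolerance:
            alerts.append(f"tool '{tool}' usage shifted: {base} -> {freq}")
        elif not base and freq:
            alerts.append(f"new tool in use: '{tool}'")
    if current.memory_entries > 2 * max(baseline.memory_entries, 1):
        alerts.append("memory grew unusually fast (possible poisoning)")
    return alerts
```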
Failure #3: Static Risk Blindness
Classic risk programs assume static assets and fixed procedures. Agents don't live there. They change at runtime: new model versions, updated remote MCP servers, new memory, new data paths.
The question isn't "what's the risk right now?" It's:
- Where can this agent drift next?
- How likely is this agent to behave differently?
- What's the blast radius if it does?
- How fast will we see it and stop it?
If your controls only score today's snapshot, you're measuring the wrong thing.
An agent that looks "low risk" on paper can flip after a dependency update or a quiet memory shift.
Example: a research agent rated low-risk gets a tool update that adds write scopes "for convenience." A week later, it "tidies" a shared drive and wipes useful logs. No rule was broken; your risk model was static while the system moved.
Bottom line: risk for agents is a trajectory, not a point-in-time score. You need runtime awareness: drift baselines, sequence guards, and alerts tied to behavior change, not just asset inventories and control checklists.
What CISOs should do
- Map agentic workflows: data, tools, scopes. It's not about a single Python script running one agent. It's about microservices running modular agentic workflows at scale.
- Lock down tools: least privilege, scoped registries.
- Detect drift: baseline behavior, snapshot memory, monitor changes.
- Guard sequences, not just steps: policies that ban dangerous combinations.
- Trace lineage: prompt → memory → tool → output (see the sketch after this list).
- Harden the supply chain: signed updates, trusted registries.
- Red team it: simulate drift, poisoning, and chaining attacks.
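For the lineage point above, a hedged sketch of what a record could look like: it ties an agent output back to the prompt, the memory entries it read, and the tool calls it made. Field names are illustrative, not a standard schema.

```python
# Hypothetical lineage record linking an agent output to the prompt,
# memory reads, and tool calls that produced it.
import hashlib
import json
import time
import uuid
from dataclasses import dataclass, field, asdict

def _h(value) -> str:
    """Short content hash so we can compare artifacts without storing them."""
    return hashlib.sha256(
        json.dumps(value, sort_keys=True, default=str).encode()
    ).hexdigest()[:16]

@dataclass
class LineageRecord:
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)
    prompt_hash: str = ""
    memory_refs: list = field(default_factory=list)  # ids of memory entries read
    tool_calls: list = field(default_factory=list)   # (tool, args_hash, result_hash)
    output_hash: str = ""

def record_step(rec: LineageRecord, tool: str, args, result) -> None:
    """Append one tool call so the final output can be traced back later."""
    rec.tool_calls.append((tool, _h(args), _h(result)))

# Usage sketch
rec = LineageRecord(prompt_hash=_h("summarize Q3 incidents"), memory_refs=["mem-123"])
record_step(rec, "search_tickets", {"query": "Q3"}, {"count": 42})
rec.output_hash = _h("Q3 summary ...")
print(json.dumps(asdict(rec), indent=2))
```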
Final word
The catalogs and frameworks are important, and they're the right starting point.
But what we keep seeing in the field are three failure modes, Allowed-but-Wrong, Behavioral Drift, and Static Risk Blindness, that slip through unnoticed.
They hide inside "safe" behavior. They're already creating incidents. And they don't show up in guardrails or static audits.
If you're running agents today (or plan to), don't forget to design with these three in mind.
Originally published on LinkedIn