Cross‑Session "CIK" Poisoning Exposes Systemic Vulnerability in Personal AI Agents

What the paper found

A multi‑institution team — including UC Santa Cruz, National University of Singapore, Tencent (腾讯), ByteDance (字节跳动), UC Berkeley and UNC Chapel Hill — has reported on arXiv a worrying, practical attack class against deployed personal AI agents. The paper, "Your Agent, Their Asset: A Real‑World Safety Analysis of OpenClaw," tested a live OpenClaw (中文圈戏称“龙虾”) instance tied to real Gmail and Stripe APIs and found that persistent‑state poisoning across sessions can turn even the most “secure” large models into attack vectors. It has been reported that OpenClaw reportedly has over 220,000 public instances and that the official ClawHub plugin repository contained more than 800 malicious skills as of March 2026 — a scale that helps explain why real‑world tests matter.

How the attacks work

The authors formalised the attack surface as CIK — Capability, Identity, Knowledge — the persistent files that define what an agent can do, who it is, and what it remembers. Their two‑stage workflow is simple: Phase 1, inject malicious content into those files; Phase 2, trigger the poisoned behavior in a later session. The team ran 88 test cases across Claude Sonnet 4.5 and Opus 4.6 (Anthropic), Gemini 3.1 Pro (Google) and GPT‑5.4 (OpenAI). Results were stark: model attack success rates jumped dramatically after poisoning — Opus 4.6 rose from a baseline ~10% to ~44.2% on average — and Knowledge poisoning was the most effective (average ASR 74.4%). Capability injection succeeded at installation 100% of the time because executable skills run on the host outside the model’s reasoning chain.

Concrete risks and limited mitigations

The paper illustrates three chilling scenarios: a forged memory that makes an assistant auto‑refund Stripe charges; a falsified backup address in user identity files that convinces an agent to upload API keys to an attacker; and a benign‑looking skill whose script quietly deletes the agent’s workspace. What can defenders do? The team tested mitigations — guarded review skills, confirmation prompts, and file‑protection locks — but none offer a clean win. A heavy‑handed lock blocks malicious writes but also neuters the agent’s ability to learn and adapt. Even with an audit skill in place, Capability attacks still had high success rates (reportedly ~63.8% in some tests).

Why this matters beyond one platform

This is not just an OpenClaw problem. The authors stress the vulnerability is architectural: models from different vendors behave similarly when their persistent state can be modified. Against a backdrop of export controls, geopolitical scrutiny of AI supply chains and growing regulatory attention in both China and the West, these results have cross‑border implications. Policymakers, platform operators and enterprises must grapple with a hard trade‑off: how to preserve the usability and personalization that make local agents valuable, while closing a persistent, cross‑session attack surface that current models — however capable — cannot by themselves eliminate.