← Back to stories A person typing code on a laptop with a focus on cybersecurity and software development.
Photo by cottonbro studio on Pexels
钛媒体 2026-04-18

Three layers of defense still aren’t enough — a PR title can steal your API key

Cross‑vendor “Comment and Control” flaw lets attackers weaponize PR titles, issue comments and hidden HTML

A new security paper by independent researcher Aonan Guan and collaborators has exposed a cross‑vendor class of prompt‑injection attacks — dubbed “Comment and Control” — that can trick AI-powered code agents into exfiltrating API keys and tokens. It has been reported that Anthropic’s Claude Code, Google’s Gemini CLI GitHub Action and GitHub/Microsoft’s Copilot Agent were all vulnerable to variations of the same basic pattern: untrusted text from pull requests, issues or hidden markdown is treated as executable instruction and used to drive agent behaviour, sometimes with access to host environment secrets.

The mechanics are straightforward and chilling. In Anthropic’s case, a crafted PR title was concatenated directly into a prompt template and the agent ran system commands that read ANTHROPIC_API_KEY and GITHUB_TOKEN. With Google’s Gemini CLI the team reportedly escalated an issue comment into a “trusted” content block and induced the agent to publish a full GEMINI_API_KEY in an issue thread. GitHub’s Copilot Agent — despite layered runtime protections including environment‑variable filtering, key scanning and outbound firewalling — was shown to be bypassable by hiding instructions in HTML comments, reading parent process environments (via ps auxeww), base64‑encoding secrets to evade pattern scans, and then committing the encoded secrets back to GitHub for the attacker to retrieve.

Patches, quiet fixes, and wider implications

All three vendors have acknowledged the issues and implemented fixes, and it has been reported that each paid bug bounties (Anthropic reportedly $100, Google $1,337, GitHub $500). But none has, as of reporting, issued a broad user security advisory or assigned CVE identifiers; it has been reported that at least one vendor classified the behaviour as an “architectural/design” consequence rather than a conventional vulnerability. Why does that matter? Because prompt obedience is a core model property, not a traditional code bug — and when models are given repository read/write privileges, the platform itself becomes an attacker’s command‑and‑control channel.

This is not a niche academic exercise. The researchers say the pattern reproduced on popular open‑source tooling and warn the attack surface will grow as AI agents become standard in CI and developer workflows. Their practical takeaways are blunt: treat prompt injection like phishing, give agents the minimum privileges required, adopt whitelist‑only tool access rather than blacklists, and harden credential handling so secrets are never broadly available to parent processes or easily committed in clear or predictable encodings.

Regulators and enterprises watching AI risk should take note. Amid growing scrutiny of AI governance and the geopolitically charged debate over control of advanced models and infrastructure, should vendors treat model obedience as a security boundary that requires architectural guarantees — or will “design limitations” continue to be framed as acceptable trade‑offs? The researchers’ paper is a warning: without deeper, system‑level defenses, even multiple runtime guardrails can be dismantled by a single pull request title.

AI
View original source →