虎嗅 (Huxiu), 2026-03-17

Anthropic says DeepSeek, Moonshot and MiniMax "illicitly extracted" Claude — but is distillation theft?

Allegations and numbers

It has been reported that US AI company Anthropic announced on Feb. 23 that its large language model Claude was the target of "illicit extraction" by three Chinese firms: DeepSeek (深度求索), Moonshot (月之暗面, "Dark Side of the Moon") and MiniMax (稀宇科技). Anthropic reportedly says the firms used about 24,000 fraudulent accounts to carry out more than 16 million interactions with Claude, and that those exchanges were later used as training material to improve their own models. The accused companies have not responded so far; Anthropic alleges they used distillation-style queries to capture Claude's distinctive capabilities in agentic tasks, tool use and coding.

What "distillation" means — common technique, contentious practice

Distillation is a well-known model-compression technique dating back to Geoffrey Hinton’s work in 2015: a large "teacher" model’s outputs are used to train a smaller "student" model so it can mimic behavior with fewer parameters. It is routinely used inside firms — for example, teams will distill smaller, faster versions from their own flagship models — but there is a crucial distinction when the teacher is a competitor’s closed commercial service. Many providers, including Anthropic, explicitly prohibit in their terms using model outputs to train other models, and Anthropic says the pattern of prompts and volume it observed looked nothing like normal user activity and constituted "distillation attacks."
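
To make the mechanics concrete, here is a minimal sketch of Hinton-style distillation in PyTorch. It is purely illustrative: the toy teacher and student networks, their dimensions and the synthetic data are assumptions made for the example, not anyone's actual pipeline. The student is trained to match the teacher's temperature-softened output distribution alongside the ordinary hard-label loss; in the API-based scenario Anthropic describes, an outside party would not see a teacher's internal probabilities at this granularity and would typically collect its generated responses as training text instead.

```python
# Minimal, self-contained sketch of Hinton-style knowledge distillation.
# Nothing here is taken from Anthropic's or any other company's systems;
# the tiny models, dimensions and synthetic data are purely illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 100    # toy output vocabulary
TEMPERATURE = 2.0    # softens the teacher's distribution
ALPHA = 0.5          # weight between soft (distillation) and hard-label loss

# "Teacher": a larger toy network standing in for a big model's output head.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, NUM_CLASSES))
# "Student": a smaller network trained to mimic the teacher's soft outputs.
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, NUM_CLASSES))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def distillation_loss(student_logits, teacher_logits, labels):
    # Soft targets: KL divergence to the teacher's temperature-scaled distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / TEMPERATURE, dim=-1),
        F.softmax(teacher_logits / TEMPERATURE, dim=-1),
        reduction="batchmean",
    ) * (TEMPERATURE ** 2)
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return ALPHA * soft + (1 - ALPHA) * hard

# Synthetic data stands in for whatever corpus the student is trained on.
x = torch.randn(64, 32)
labels = torch.randint(0, NUM_CLASSES, (64,))

for step in range(100):
    with torch.no_grad():                 # the teacher is only queried, never updated
        teacher_logits = teacher(x)
    student_logits = student(x)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The temperature and the T-squared scaling follow the original 2015 formulation: a higher temperature exposes more of the teacher's relative preferences among wrong answers, which is much of what the student learns from.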

Legal, commercial and geopolitical friction

The dispute sits at the intersection of machine-learning technique, contract terms and geopolitics. It has been reported that Anthropic, which has previously restricted access to Claude from China and publicly supported US export controls, believes the firms circumvented those geographic limits via commercial proxies, and that it has urged closer coordination with US cloud providers and policymakers. Past episodes show this is not a new flashpoint: an earlier OpenAI memo reportedly named DeepSeek, DeepSeek published rebuttals, and Moonshot issued denials over its Kimi model. At the same time, many Chinese models are released under permissive licenses such as MIT or Apache 2.0, and it has been reported that Western players sometimes distill from third-party models too (market reports suggest Meta's internal work on a model codenamed "Avocado" used other models as inputs), complicating any narrative of one-way appropriation.

Unresolved questions

Technical detection, contractual remedies and intellectual‑property law have not caught up with the practice. How do you prove distillation at scale? Does using web‑scraped content that may include model outputs amount to unauthorized training data? And how do export controls and commercial terms overlap with widely used research practices? Expect this to unfold as a long, contested mix of technical forensics, litigation risk and geopolitically tinged policy responses — a debate that will shape how AI companies build walls and how others try to climb them.
