A person interacts with a laptop that has a cracked and distorted screen.
Photo by Beyzanur K. on Pexels
arXiv, 2026-03-17

Agent-based filtering aims to personalize harassment moderation, but who decides what stays online?

A new arXiv paper, "Agent-Based User-Adaptive Filtering for Categorized Harassing Communication," proposes replacing one-size-fits-all moderation with lightweight, personalized filtering agents that adapt to individual users' tolerances and preferences. The core claim is simple: instead of platforms applying uniform rules to all users, small adaptive agents learn from feedback and dynamically tune which categories of harassing speech are filtered for each person. That is a direct challenge to the dominant model used by most Western and Chinese platforms today.

Technology and what the authors report

The authors describe an agent-based framework that models per-user tolerance levels and preference profiles, and that updates filtering behaviour from ongoing user feedback. The approach reportedly targets categorized harassment (for example threats, insults, and doxxing) rather than undifferentiated abuse, allowing finer-grained trade-offs between safety and expression. The paper is available on arXiv; its listing page notes arXivLabs, arXiv's framework for experimental features built with community collaborators.
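To make the mechanism concrete, here is a minimal Python sketch of the kind of per-user, per-category adaptive filter the paper describes. Everything here is an illustrative assumption rather than the authors' implementation: the class name, the category list, the severity scores (imagined as coming from an upstream classifier), and the simple threshold-nudging update rule.

```python
# Minimal sketch of a per-user adaptive harassment filter.
# Names, categories, and the update rule are illustrative
# assumptions; the paper's actual agent design may differ.

HARASSMENT_CATEGORIES = ["threat", "insult", "doxxing"]

class UserFilterAgent:
    """Maintains a per-category tolerance threshold for one user
    and nudges each threshold based on explicit user feedback."""

    def __init__(self, learning_rate: float = 0.1):
        # Start with a neutral tolerance of 0.5 per category;
        # messages scoring above the threshold get filtered.
        self.thresholds = {c: 0.5 for c in HARASSMENT_CATEGORIES}
        self.lr = learning_rate

    def should_filter(self, category: str, severity: float) -> bool:
        """Filter a message if its severity score (0-1, assumed to
        come from a shared upstream classifier) exceeds this user's
        threshold for the category."""
        return severity > self.thresholds[category]

    def feedback(self, category: str, too_strict: bool) -> None:
        """Adapt the threshold from user feedback: raise it if the
        user felt over-filtered, lower it if abuse slipped through."""
        delta = self.lr if too_strict else -self.lr
        new_value = self.thresholds[category] + delta
        self.thresholds[category] = min(1.0, max(0.0, new_value))


# Example: a user reports that an "insult" was wrongly hidden,
# so the agent becomes more permissive for that category only.
agent = UserFilterAgent()
print(agent.should_filter("insult", 0.6))   # True at the 0.5 default
agent.feedback("insult", too_strict=True)
print(agent.should_filter("insult", 0.6))   # False after adapting
```

The design choice worth noticing in a sketch like this is that personalization only moves each user's thresholds; the content scoring itself stays shared, which is one way a platform could offer user agency without abandoning a common definition of what counts as harassing.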

Policy, safety and geopolitical questions

The idea raises immediate policy questions. Who should set thresholds for harm — individual users, platform operators, or regulators? Could personalized filters reduce overblocking for some users while creating blind spots for others? In China, where major services such as WeChat (微信) and Weibo (微博) operate under strict content controls and real‑name obligations, a decentralised, user-adaptive model could conflict with regulatory requirements for unified enforcement. In the EU and US, platform liability and content rules (for example the Digital Services Act debates and Section 230 discussions) make any move toward individualized moderation politically sensitive.

Personalization offers potential benefits, such as fewer false positives and more user agency, but it also carries risks: echo chambers, inconsistent protection across populations, and new vectors for adversaries to game systems. The authors reportedly focus on simulation and initial experiments; real-world deployment would force platforms, regulators, and civil-society actors to confront the trade-offs head-on. The paper is a sign that moderation research is moving beyond blunt instruments, but it also asks an uncomfortable question: who gets to decide what counts as harassment online?
