arXiv · 2026-03-18

POaaS: Minimal-Edit Prompt Optimization Aims to Cut Hallucinations on On-Device sLLMs

Small, on-device language models are getting smarter without moving to the cloud. A new paper on arXiv (arXiv:2603.16045) introduces Minimal-Edit Prompt Optimization as a Service (POaaS), a lightweight approach that automatically tweaks user prompts with the fewest possible edits to lift factual accuracy and reduce hallucinations in small language models (sLLMs) running locally. The key claim: long, search-driven prompt rewrites designed for large cloud models break down on-device — they’re too long, too slow, and often exacerbate errors. POaaS instead produces compact, precise edits aimed at the constrained settings of phones, edge devices, and apps.

What POaaS does and how it differs

POaaS reframes automatic prompt optimization (APO) as a minimal-edit problem. Instead of generating long structured instructions through expensive search, the service proposes small, targeted corrections — fixing typos, clarifying intent, or inserting missing context — that fit within tiny model context windows and modest compute budgets. The paper’s experiments show improvements in accuracy and fewer hallucinations across several sLLMs; the authors report gains particularly when prompts are noisy or underspecified. The method is designed to be run as a compact on-device module or a lightweight nearby service, not as another heavyweight cloud LLM call.
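The minimal-edit idea can be sketched in a few lines. The candidate-generation rules, the quality scorer, and the edit-size penalty weight below are all illustrative assumptions, not the paper's actual algorithm; the point is the shape of the objective, which rewards prompt quality while penalizing how much of the prompt is changed:

```python
# Minimal sketch of a POaaS-style minimal-edit loop.
# propose_candidates, score, and the lam penalty are assumptions for
# illustration; the paper's real components are not reproduced here.
import difflib


def edit_size(a: str, b: str) -> float:
    """Fraction of the prompt changed, via difflib's similarity ratio."""
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()


def propose_candidates(prompt: str) -> list[str]:
    """Hypothetical generator of small, targeted edits (typo repair,
    intent clarification), composed so combined fixes are also tried."""
    candidates = {prompt}  # always consider leaving the prompt unchanged
    for c in list(candidates):
        candidates.add(c.replace("capitol", "capital"))  # toy typo fix
    for c in list(candidates):
        if "?" not in c:
            candidates.add(c.rstrip() + "?")  # toy intent clarification
    return sorted(candidates)


def score(prompt: str) -> float:
    """Stand-in for an on-device quality estimator; here it just rewards
    well-formed, typo-free questions."""
    s = 0.0
    if "?" in prompt:
        s += 1.0
    if "capitol" not in prompt:
        s += 1.0
    return s


def minimal_edit_optimize(prompt: str, lam: float = 0.5) -> str:
    """Pick the candidate maximizing quality minus an edit-size penalty,
    so larger rewrites must earn their keep."""
    return max(
        propose_candidates(prompt),
        key=lambda c: score(c) - lam * edit_size(prompt, c),
    )


print(minimal_edit_optimize("what is the capitol of france"))
# → "what is the capital of france?"
```

Because the penalty grows with edit size, a prompt that is already clean passes through unchanged, which is exactly the behavior you want on a constrained device: no tokens spent rewriting what already works.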

Why this matters — for developers, users, and geopolitics

Who benefits? App developers and device makers that rely on sLLMs — including domestic Chinese players such as Baidu (百度), Alibaba (阿里巴巴), and Huawei (华为) that are pushing on-device AI — stand to gain by improving reliability without resorting to cloud inference. On-device techniques also intersect with broader geopolitical trends: export controls, data-residency rules, and privacy concerns have spurred a shift toward local models in many markets. Organizations facing sanctions risk or operating under strict data-control regimes reportedly favor edge-first deployments; POaaS-like approaches reduce the need for cloud fallbacks while keeping local models useful.

The proposal is promising but not a panacea. Real-world robustness across languages, dialects, and adversarial prompts remains to be validated at scale, and performance will vary with sLLM architecture and device constraints. Still, the preprint makes a practical point: sometimes the best fix is the smallest one — a minimal nudge to a prompt can be the difference between a truthful answer and a confident hallucination.
