Tiiny AI’s Pocket Lab storms Kickstarter, pitching a plug‑and‑play local “Jarvis” for privacy‑conscious users
Crowdfunding sprint and product pitch
Tiiny AI (本智激活) has tapped a clear market demand: users want local, easy-to-deploy AI without the cost and privacy tradeoffs of cloud APIs. The Pocket Lab, an external inference box starting at $1,399 that runs large language models locally, has reportedly raised roughly $2.95 million on Kickstarter from about 2,093 backers, crossing $1 million within its first five hours. The venture, incubated at Shanghai Jiao Tong University's Institute for Parallel and Distributed Systems (IPADS, 上海交通大学并行与分布式系统研究所), has reportedly closed a multi-million-RMB seed round led by Guangqi Capital (光启资本), with participation from BV (Baidu Ventures, 百度风投) and the Light Source L2F Entrepreneur Fund (光源L2F创业者基金).
The technical bet: software-first, heterogeneous chips
Tiiny AI's core claim is pragmatic: rather than selling a full AI PC, sell a compact inference box that runs models in the 100-billion-parameter class locally and plugs into any Mac or Windows machine. The company grew out of PowerInfer, an open-source heterogeneous inference engine on GitHub that reportedly has around 9,100 stars. Pocket Lab's architecture, reportedly rated at 190 TOPS in aggregate, pairs an Armv9.2 SoC with a 160 TOPS dNPU ASIC tuned for Transformer inference, and uses a "hot/cold" parameter-scheduling strategy so that only a small fraction of weights is activated for each token (a simplified sketch of the idea follows). Tiiny says the device can download and run mainstream models under 100B parameters with one click, and advertises real-world throughput of reportedly ~300 tokens/s prefill and ~20 tokens/s decoding on a 120B MoE model, figures the company presents to argue that software orchestration can substitute for raw GPU horsepower.
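To make the "hot/cold" idea concrete, here is a minimal, illustrative Python sketch of the general technique popularized by PowerInfer-style engines: profile which neurons fire most often, pin those "hot" weights on the fast accelerator, and spill the rarely used "cold" remainder to host memory. Everything here (the neuron count, `HOT_FRACTION`, the frequency profile) is invented for demonstration and is not Tiiny AI's implementation.

```python
# Illustrative sketch of "hot/cold" weight scheduling, in the spirit of
# PowerInfer-style heterogeneous inference. NOT Tiiny AI's code; all
# names and numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

N_NEURONS = 4096        # hypothetical FFN width
HOT_FRACTION = 0.2      # fraction of neurons pinned on the fast device

# Pretend we profiled per-neuron activation frequency on calibration data.
# A power-law draw gives the skew seen in practice: few neurons fire often.
activation_freq = rng.power(0.3, N_NEURONS)

# Partition: the most frequently firing ("hot") neurons live on the fast
# accelerator (e.g. an NPU); the rest ("cold") stay in slower host memory.
order = np.argsort(activation_freq)[::-1]
hot_ids = set(order[: int(HOT_FRACTION * N_NEURONS)].tolist())

def route(neuron_ids):
    """Split the neurons predicted active for this token by device."""
    on_npu = [i for i in neuron_ids if i in hot_ids]
    on_cpu = [i for i in neuron_ids if i not in hot_ids]
    return on_npu, on_cpu

# For one token, a sparsity predictor says only ~10% of neurons will fire,
# biased toward the frequently active ones.
predicted_active = rng.choice(
    N_NEURONS, size=N_NEURONS // 10, replace=False,
    p=activation_freq / activation_freq.sum())
on_npu, on_cpu = route(predicted_active)
print(f"active neurons: {len(predicted_active)}, "
      f"served by NPU: {len(on_npu)}, spilled to CPU: {len(on_cpu)}")
```

The payoff is that the scarce fast memory only has to hold the weights that earn their keep, which is essentially the argument Tiiny makes for pairing a small dNPU with a commodity SoC instead of a big GPU.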
Questions, caveats and geopolitical backdrop
But overseas observers have raised important caveats. The 120B model Tiiny cites is a mixture-of-experts (MoE) variant, meaning each token may activate only roughly 5.1B parameters rather than the full 120B, so "running 120B" is not the same as running a dense 120-billion-parameter model. Likewise, the 190 TOPS figure has reportedly been questioned as a simple sum of heterogeneous unit peaks, which is not directly comparable across architectures (the back-of-envelope check at the end of this piece makes both points concrete).

There is also a broader context: constrained access to top-end GPUs amid US export controls and US-China tech tensions makes local, efficient inference attractive for privacy-sensitive professionals and "Agent" power users who want Jarvis-like assistants without sending data to the cloud. Is Pocket Lab the start of a new consumer AI appliance category, or a practical stopgap until more affordable high-end accelerators arrive? The device's Kickstarter success suggests many buyers are ready to find out.
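For readers who want to check the arithmetic behind these caveats, here is a rough back-of-envelope script. The inputs are the figures reported above; the 4-bit weight assumption and the ~30 TOPS SoC share are hypothetical fill-ins, not published specs.

```python
# Back-of-envelope check on the reported figures. Inputs are the numbers
# cited in the article; the formulas are generic, not Tiiny AI's method.

# 1) MoE "120B": parameters activated per token vs. total parameters.
total_params_b = 120.0    # headline size, billions of parameters
active_params_b = 5.1     # reportedly activated per token, billions
print(f"active fraction per token: {active_params_b / total_params_b:.1%}")
# -> ~4.2%: per-token compute is closer to a ~5B dense model than to 120B.

# 2) Why summing heterogeneous TOPS peaks is shaky: 160 TOPS (dNPU) plus
# an implied ~30 TOPS (SoC side, hypothetical split) only yields a usable
# 190 TOPS if both units run the same precision and can be saturated
# simultaneously, which rarely holds in practice.

# 3) Bandwidth sanity check on the ~20 tokens/s decoding claim. Decoding
# is typically bandwidth-bound: each token must stream the active weights.
bytes_per_param = 0.5     # ASSUMPTION: ~4-bit quantized weights
tokens_per_s = 20
traffic_gb_s = active_params_b * 1e9 * bytes_per_param * tokens_per_s / 1e9
print(f"implied weight traffic: ~{traffic_gb_s:.0f} GB/s")
# -> ~51 GB/s, within LPDDR-class reach; a dense 120B model at the same
#    precision would need roughly 24x that.
```

The last calculation is the crux: because decoding is usually limited by memory bandwidth, the MoE distinction is exactly what makes ~20 tokens/s plausible on a consumer-class device, and exactly why the "120B" headline deserves the asterisk.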
