“State-of-the-art” models can struggle with basic office work, says AI executive

Sota models shine at hard problems — but stumble on the routine

“State-of-the-art” (Sota) large language models can solve Olympiad‑level maths problems and produce sophisticated code, yet they reportedly falter on everyday enterprise chores such as invoice parsing and data cleaning. David Meyer, senior vice‑president of product at Databricks, told the South China Morning Post that the very behaviour that makes these models impressive, their tendency to optimise for end‑to‑end solutions, can be a liability in tasks that require precise, narrowly defined outputs. For example, Meyer said a Sota model “will oftentimes fix the mistake” on an invoice rather than simply flagging the erroneous number for downstream correction.

Specialisation, not scale, for routine work

Data engineering — transforming datasets at scale, handling nulls and zeros, and reliably preparing inputs for analytics — often benefits from models trained on targeted signals and workflows, Meyer argued. He pointed to smaller open‑source models refined with reinforcement learning as a more efficient solution for these cases, allowing firms to reach required reliability “at a level of training cost orders of magnitude lower” than that of top‑tier models. “A single model, no matter how large, can’t be equally good at all things,” he said.

As enterprises weigh productivity gains against costs and control, many are adopting a mixed strategy: use Sota models for research‑grade reasoning and creativity, and deploy specialised, lightweight models for routine back‑office automation and data pipelines. This shift also plays out against a geopolitical backdrop, including US export controls on advanced AI chips and tighter scrutiny of cross‑border data flows, that is pushing firms, particularly in Asia, to favour locally deployable and auditable solutions.

Can one model ever rule the enterprise stack? Meyer’s assessment suggests not. Instead, businesses appear to be moving toward an ecosystem of complementary models — each chosen for the right task, rather than a single “best” model for all.