Computer-use agents have been restricted to primitives. They click, they type, they scroll. Long action chains amplify grounding errors and waste steps. Apple researchers introduce UltraCUA, a foundation model that builds a hybrid action space, letting an agent interleave low-level GUI actions with high-level programmatic tool calls. The model chooses the cheaper and more reliable move at each step. The approach improves success and reduces steps on OSWorld, and it transfers to WindowsAgentArena without Windows-specific training.
https://arxiv.org/pdf/2510.17790
What does hybrid action change?
Hybrid action treats tools as first-class actions. A tool call encapsulates a multi-step operation as a single function with a clear signature and a docstring. A click or a key press is still available when no programmatic path exists. The agent learns to alternate between both modes. The goal is to reduce cascading errors and to cut step counts. The research team positions this as a bridge between GUI-only CUAs and tool-centric agent frameworks.
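To make the idea concrete, here is a minimal sketch of a hybrid action space, assuming a simple in-memory environment. The class names `GuiAction` and `ToolCall`, the `set_cell_value` tool, and the `execute` dispatcher are all hypothetical illustrations, not UltraCUA's actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class GuiAction:
    """A low-level primitive such as a click, key press, or scroll."""
    kind: str
    args: dict = field(default_factory=dict)


@dataclass
class ToolCall:
    """A high-level call that stands in for a multi-step GUI sequence."""
    name: str
    kwargs: dict = field(default_factory=dict)


# One step of the agent is either kind of action.
Action = Union[GuiAction, ToolCall]


def set_cell_value(sheet: dict, cell: str, value: str) -> str:
    """Hypothetical tool: write a value into a spreadsheet cell in one call."""
    sheet[cell] = value
    return f"{cell}={value}"


# Hypothetical registry mapping tool names to callables.
TOOLS: Dict[str, Callable] = {"set_cell_value": set_cell_value}


def execute(action: Action, sheet: dict, gui_log: List[str]) -> str:
    """Dispatch one step to the tool registry or to the (stubbed) GUI backend."""
    if isinstance(action, ToolCall):
        return TOOLS[action.name](sheet, **action.kwargs)
    # A real GUI backend would move the mouse or send keys; here we only log.
    gui_log.append(action.kind)
    return f"gui:{action.kind}"
```

In this sketch a single `ToolCall` replaces what would otherwise be several clicks and key presses, while `GuiAction` remains the fallback when the registry has no matching tool.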
Scaled tool acquisition
UltraCUA builds its tool library with an automated pipeline that draws on three sources. It extracts keyboard shortcuts and commands from software documentation, integrates open-source implementations from agent toolkits, and uses coding agents to synthesize new tools. Each tool is a callable interface that hides a long GUI sequence. The research team reports coverage across 10 desktop domains with 881 tools. The largest buckets include VS Code with 135 tools and LibreOffice Writer with 123 tools. Thunderbird and GIMP also have deep coverage.
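Because each tool is described to the model by its signature and docstring, one plausible way to expose a tool is to render both from the callable itself. The sketch below uses Python's standard `inspect` module; the `toggle_word_wrap` tool and the `describe_tool` helper are hypothetical and only illustrate the idea.

```python
import inspect


def toggle_word_wrap(enabled: bool = True) -> str:
    """Hypothetical VS Code tool: turn editor word wrap on or off."""
    return f"word_wrap={'on' if enabled else 'off'}"


def describe_tool(fn) -> str:
    """Render a one-line description from the signature and docstring."""
    signature = inspect.signature(fn)
    doc = inspect.getdoc(fn) or ""
    return f"{fn.__name__}{signature}: {doc}"


if __name__ == "__main__":
    # Prints the tool name, its parameters, and its docstring on one line.
    print(describe_tool(toggle_word_wrap))
```

Keeping the description derived from the code means a synthesized tool and its prompt-facing documentation cannot drift apart.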
Verifiable synthetic tasks and trajectories
Training requires grounded supervision and stable rewards. UltraCUA uses a dual synthetic engine. An evaluator-first pipeline composes atomic verifiers for browsers, files, images, and system state, then generates tasks that satisfy those checks. An instruction-first pipeline explores the OS and proposes context-aligned tasks, which are then verified. The result is 17,864 verifiable tasks across 10 domains, including Chrome, LibreOffice, GIMP, VS Code, system, Thunderbird, VLC, and multi-app workflows. Chrome has 2,826 tasks. The LibreOffice suite sums to 5,885 tasks. Multi-app tasks reach 2,113.
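The evaluator-first idea can be sketched as small boolean checks over an environment state that are combined into one task-level verifier. Everything below is an assumption made for illustration: the dictionary-based state, the verifier names `file_exists` and `url_open`, and the `all_of` combinator are not taken from the paper.

```python
from typing import Callable, Dict

# Hypothetical environment snapshot, e.g. {"files": [...], "open_tabs": [...]}.
State = Dict[str, object]
Verifier = Callable[[State], bool]


def file_exists(path: str) -> Verifier:
    """Atomic check: the given path is present in the file listing."""
    return lambda state: path in state.get("files", [])


def url_open(url: str) -> Verifier:
    """Atomic check: the browser has a tab open at the given URL."""
    return lambda state: url in state.get("open_tabs", [])


def all_of(*checks: Verifier) -> Verifier:
    """Compose atomic checks into one verifier that needs all of them to pass."""
    return lambda state: all(check(state) for check in checks)


# A composed verifier for a hypothetical task:
# "open example.com and save a report to ~/report.pdf".
task_verifier = all_of(
    url_open("https://example.com"),
    file_exists("~/report.pdf"),
)
```

A task generated to satisfy `task_verifier` comes with an automatic pass/fail signal, which is what makes it usable as a reward later.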
A multi-agent rollout produces successful hybrid trajectories. The planner uses OpenAI o3 for decision making. The grounder uses GTA1-7B for accurate visual localization. The rollout yields about 26.8K successful trajectories that show when to use a tool and when to act in the GUI. These trajectories are the core of the supervised phase.
Training Approach
Training has two stages. Stage 1 is supervised fine-tuning. The models train for 3 epochs at a learning rate of 2e-5 on the successful trajectories. Loss is applied turn-wise to avoid over-weighting early steps. Stage 2 is online reinforcement learning. The models train for 150 steps at a learning rate of 1e-6 on verified tasks that are sampled by difficulty. The policy optimization follows a GRPO variant with clip-higher, and it removes KL regularization and format rewards. The reward combines a sparse task outcome with a tool-use term. Experiments use NVIDIA H100 GPUs. The context is kept near 32K tokens by controlling the number of exposed tools.
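The reinforcement learning stage can be illustrated with a small pure-Python sketch of a GRPO-style update signal: a reward that adds a tool-use term to a sparse outcome, group-relative advantages, and an asymmetric ("clip-higher") clipped objective with no KL term. The bonus weight `tool_bonus`, the choice to grant it only on success, and the clip bounds `eps_low` and `eps_high` are assumptions chosen for illustration, not values confirmed by the source.

```python
import math
from typing import List


def reward(success: bool, used_tool: bool, tool_bonus: float = 0.3) -> float:
    """Sparse outcome plus a tool-use term (bonus weight is an assumption)."""
    outcome = 1.0 if success else 0.0
    # Assumption: the tool bonus is only granted on successful rollouts.
    bonus = tool_bonus if (success and used_tool) else 0.0
    return outcome + bonus


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of rollouts for the same task."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + eps) for r in rewards]


def clipped_objective(ratio: float, advantage: float,
                      eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """PPO-style surrogate with a wider upper clip bound and no KL penalty."""
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped_ratio * advantage)
```

The wider upper bound lets low-probability but rewarded actions, such as a rarely chosen tool call, gain probability faster than a symmetric clip would allow.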
Results on OSWorld
UltraCUA improves success at both 7B and 32B scales. Under a 15-step budget, UltraCUA-32B reaches 41.0 percent success, while OpenCUA-32B reaches 29.7 percent, an absolute gain of 11.3 points. UltraCUA-7B reaches 28.9 percent, compared with 23.4 percent for UI-TARS-1.5-7B. Gains persist under a 50-step budget. A per-domain breakdown shows consistent lifts across Chrome, Writer, VS Code, and cross-application tasks. Average step counts decrease against baselines. These shifts indicate better action selection rather than only more attempts.
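The reported gaps can be checked with a few lines of arithmetic. The helper names below are hypothetical, and the relative figures are computed here from the quoted success rates against the named baselines; they are not numbers stated in the source.

```python
def absolute_gain(model: float, baseline: float) -> float:
    """Difference in success rate, in percentage points."""
    return round(model - baseline, 1)


def relative_gain(model: float, baseline: float) -> float:
    """Improvement as a percentage of the baseline's success rate."""
    return round((model - baseline) / baseline * 100, 1)


# Success rates under the 15-step budget, as quoted in the article.
print(absolute_gain(41.0, 29.7))  # UltraCUA-32B vs OpenCUA-32B, in points
print(relative_gain(41.0, 29.7))  # same pair, relative to the baseline
print(absolute_gain(28.9, 23.4))  # UltraCUA-7B vs UI-TARS-1.5-7B, in points
print(relative_gain(28.9, 23.4))  # same pair, relative to the baseline
```

Percentage points and relative percentages answer different questions, so it helps to state which one a headline number refers to.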
Cross-platform transfer on WindowsAgentArena
UltraCUA trains only on Ubuntu-based OSWorld data. The model is then evaluated on WindowsAgentArena. UltraCUA-7B reaches 21.7 percent success. This exceeds UI-TARS-1.5-7B at 18.1 percent and a Qwen2 baseline trained with Windows data at 13.5 percent. The result suggests that hybrid action strategies learned on one platform transfer to other platforms. The paper highlights this as zero-shot platform generalization.
Key Takeaways
UltraCUA formalizes a hybrid action space that lets a single agent alternate between GUI primitives and programmatic tool calls, which shortens long, error-prone action chains.
The research team scales a reusable tool library through an automated pipeline and pairs it with a synthetic data engine, yielding more than 17,000 verifiable computer-use tasks for training and evaluation.
Training follows a two-stage recipe, supervised fine-tuning on successful hybrid trajectories followed by online reinforcement learning on verifiable tasks, which optimizes when to call tools versus act in the GUI.
On OSWorld, UltraCUA reports an average 22 percent relative improvement over base models and 11 percent fewer steps, which indicates gains in reliability and efficiency.
The 7B model reaches 21.7 percent success on WindowsAgentArena without Windows-specific training, which shows cross-platform generalization of the hybrid action policy.
UltraCUA moves computer-use agents from brittle primitive action chains to a hybrid action policy that integrates GUI primitives with programmatic tool calls, reducing error propagation and step counts. It scales tools via an automated pipeline and pairs them with a synthetic data engine that yields more than 17,000 verifiable tasks, enabling supervised fine-tuning and online reinforcement learning on grounded signals. Reported results include a 22 percent relative improvement on OSWorld with 11 percent fewer steps, and 21.7 percent success on WindowsAgentArena without Windows-specific training, which indicates cross-platform transfer of the policy.
Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.

