Participants in the challenge tested and debugged robots working on different tasks. | Source: AGIBOT
AGIBOT Innovation Technology Co. last week hosted the AGIBOT World Challenge 2026 alongside ICRA 2026 in Vienna. The company brought together 526 research and enterprise teams from 27 countries to compete across two embodied AI tracks: “Reasoning to Action” and “World Model.”
Shanghai-based AGIBOT said the competition highlighted a key shift in how embodied AI is evaluated. The company said it showed that the industry is moving beyond simulation scores toward closed-loop testing on real robots, real tasks, and standardized benchmarks.
The competition adopted a benchmark-driven format that combined online automated evaluation with an offline real-robot final in Vienna. With AGIBOT’s EWMBench and Genie Sim Benchmark, the consistent framework enabled automated testing, standardized metrics, and reproducible results.
During the offline final, finalist teams completed tasks using the AGIBOT G2 humanoid robot. By incorporating real-robot validation into the evaluation process, the competition placed robot stability, real-world adaptability, and long-horizon task reliability at the center of the scoring system. The company, also known as Zhiyuan Robotics Co., said this more closely aligns technical evaluation with practical deployment needs.
The challenge drew research and industry teams from leading institutions and companies, including the Chinese Academy of Sciences, Tsinghua University, the University of Science and Technology of China, the University of California San Diego, Russia’s Sber Robotics Center, Alibaba, Amap, and vivo. More than 100 teams surpassed the official baseline.
What’s the difference between the R2A and WM tracks?
The two tracks at the AGIBOT World Challenge 2026 reflected the broader evolution of embodied AI from task execution toward understanding, prediction, and decision-making, according to AGIBOT.
The Reasoning to Action (R2A) track evaluated how robots understand tasks, plan actions, and execute them in physical environments. The R2A track, upgraded from the 2025 Manipulation track, expanded the evaluation from action execution to the full process of environment understanding, task planning, and physical execution.
The World Model (WM) track focused on how AI systems predict physical-world changes and model interactions based on robot actions and sensor inputs.
Teams trained reasoning-and-manipulation models using the AGIBOT WORLD open-source dataset and evaluated them through Genie Sim 3.0, with the benchmark covering language understanding, spatial reasoning, atomic skills, disturbance adaptation, and zero-shot transfer.
In the final ranking, PrismBot from vivo won the championship with 43.47 points, followed by Shanghai RoboParty’s RP-VLA with 35.66 points and Russia’s GreenVLA with 33.19 points.
AGIBOT targets supermarket tasks with the challenge
Alongside the competition, AGIBOT and Dexmal launched a supermarket benchmark track focused on end-to-end decision-making and whole-body control. This track incorporated non-ideal physical interactions, including object drops and grasping failures, to better reflect the complexity of real-world interaction and provide a more realistic evaluation framework for world model research.
Set in a realistic retail environment, the track required models to complete the full mobile manipulation process, from autonomous navigation and item picking to item transport and placement, under physical constraints such as shelf height limits and randomized item placement. Through API-based remote control, participants’ algorithms directly controlled real robots, creating a practical benchmark for evaluating embodied intelligence in deployment-oriented scenarios.
In the World Model (WM) track, NeoVerse-ABot, a joint team from the Institute of Automation of the Chinese Academy of Sciences and Amap CV Lab, won first place. The PAI@IAII team from the Institute of Industrial Artificial Intelligence at the Chinese Academy of Sciences ranked second. The Loop team from the University of Science and Technology of China placed third.
With the World Challenge, AGIBOT hoped to contribute to a more practical and reproducible evaluation framework for embodied AI. | Source: AGIBOT
AGIBOT releases full-stack toolchain for robot validation
Beyond the competition itself, AGIBOT opened a full-stack toolchain covering real-world data, simulation evaluation, and real-robot testing. The toolchain included the AGIBOT WORLD open-source dataset, Genie Sim 3.0, and the AGIBOT G2 robot platform, helping developers validate models across the path from training to simulation and physical deployment.
EWMBench and Genie Sim Benchmark supported standardized metrics, automated evaluation, and comparable results across simulation and physical testing. They addressed common challenges such as inconsistent evaluation criteria and the gap between simulated performance and real-world deployment.
AGIBOT said it will integrate the technical and ecosystem resources developed through the competition with its ongoing benchmark development and open-source efforts. The company also plans to launch an online simulation leaderboard, introduce more test tasks and diversified benchmarks, and support more comprehensive quantitative evaluation of model capabilities.
In addition, AGIBOT said it will continue to refine its benchmarks and full-stack toolchain, working with global research institutions, developers, and industry partners. Its stated goal is to help embodied AI move from individual algorithmic advances toward systems that can be deployed and scaled in real-world settings.
In other benchmark news, Fraunhofer IPA last month offered a new test benchmark for humanoid robots, and NIST proposed its own baseline performance benchmark for humanoids.
