Figure AI started a YouTube livestream at 09:00 PT on May 13, aiming a fixed three-camera shot at three identical Figure 03 humanoid robots running its Helix-02 neural network entirely onboard, sorting small barcoded parcels onto a stainless conveyor belt. The public target was 8 hours fully autonomous, zero teleoperation, zero human intervention. The trigger was a public dare from robotics evangelist Scott Walter (@scottyspectacular) on X, who had argued for weeks that humanoid robotics held no commercial value until somebody, anybody, could run an unedited 8-hour shift.
Brett Adcock replied with four words, “We’ll do it live,” and turned the camera on.
Twenty-four hours later, the three robots (nicknamed Bob, Frank, and Gary by the chat) had crossed the 24-hour mark with 28,000+ packages sorted, zero failures, and ~2 million viewers, and Adcock was posting “This is uncharted territory” on X.
What the livestream actually showed
Per Figure’s own livestream description and the Interesting Engineering writeup on May 14:
- 3 × Figure 03 humanoid robots, identical hardware, identical Helix-02 weights.
- Task: detect barcode on a small parcel, pick it up, rotate it barcode-face-down, place it on a conveyor.
- Throughput: ~2.6 seconds per package at peak (Figure’s number); humans average ~3 seconds per package on the same task, per Figure’s internal benchmark.
- Helix-02 runs entirely onboard the robots. No teleop. No remote inference. Adcock’s pinned thread emphasised this twice, presumably because the first comment under every humanoid demo for two years has been “is that teleoperated.”
- The robots displayed “strikingly human-like quirks,” including touching their own heads and momentarily freezing when packages were jumbled. Figure did not explain on stream whether these are emergent behaviours from imitation learning on human-motion video or hand-crafted recovery routines.
- When packages caused a robot to get stuck, the system triggered an autonomous reset and resumed without human intervention. That is itself a non-trivial capability, since the recovery step is the failure mode that has killed every prior “continuous autonomy” demo.
- The original 8-hour target passed without incident around 17:00 PT on May 13. Figure simply kept the camera running. The 24-hour mark passed around 09:00 PT on May 14. Viewership peaked at 2M+ concurrent viewers, according to stream metadata cited in BigGo Finance’s recap.
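The peak cycle time and the 24-hour total measure different things, and a quick calculation makes the gap visible. The sketch below is illustrative arithmetic only: it uses the figures quoted in this article, treats “28,000+” as exactly 28,000, and assumes the work was split evenly across the three robots. The function and variable names are hypothetical.

```python
# Illustrative arithmetic only; every input is a figure quoted in the article.
PEAK_SECONDS_PER_PACKAGE = 2.6    # Figure's stated peak cycle time
HUMAN_SECONDS_PER_PACKAGE = 3.0   # Figure's internal human benchmark
TOTAL_PACKAGES = 28_000           # "28,000+" treated as exactly 28,000 (assumption)
ROBOTS = 3
RUN_HOURS = 24


def packages_per_hour(seconds_per_package: float) -> float:
    """Convert a cycle time in seconds into an hourly rate."""
    return 3600.0 / seconds_per_package


def sustained_seconds_per_package(total: int, robots: int, hours: float) -> float:
    """Average cycle time per robot over the whole run.

    Assumes the packages were split evenly across robots (assumption).
    """
    robot_seconds = robots * hours * 3600.0
    return robot_seconds / total


peak_rate = packages_per_hour(PEAK_SECONDS_PER_PACKAGE)
human_rate = packages_per_hour(HUMAN_SECONDS_PER_PACKAGE)
sustained_cycle = sustained_seconds_per_package(TOTAL_PACKAGES, ROBOTS, RUN_HOURS)
sustained_rate = packages_per_hour(sustained_cycle)

print(f"peak rate per robot:      {peak_rate:7.1f} packages/hour")
print(f"human benchmark rate:     {human_rate:7.1f} packages/hour")
print(f"sustained rate per robot: {sustained_rate:7.1f} packages/hour")
print(f"sustained cycle time:     {sustained_cycle:7.2f} seconds/package")
```

Under these assumptions the sustained average comes out near 9.3 seconds per package per robot, well above the 2.6-second peak. The article does not break down where the remaining time went, so this says nothing about whether the gap came from resets, pauses, or parcel supply.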
The Scott Walter dare matters more than the demo
Most humanoid demos are released as edited 60-second reels and breathlessly headlined. The Helix-02 endurance run is the first one staged as an explicit public bet against a named industry critic, on a fixed task, with a fixed minimum duration, on a fixed unedited camera. Scott Walter’s bet as documented on X was specific: he asked whether any humanoid company could run a single 8-hour shift, fully autonomous, with no cuts. The implicit standard was the BMW Spartanburg shift that Figure has been quoting for two months without producing the timestamped, continuous footage.
Adcock took the bet on the more honest terms: live, public, with the clock running, and stretched the duration to 3× the original ask before stopping the broadcast.
That is the first time the industry has agreed to a falsifiable test in public. It does not prove the technology generalises; it proves that three specific robots, on one specific picking task, in one specific facility, at a fixed conveyor speed, can run for 24 hours straight. But the bar moved.
What the skeptics are saying
TechRadar’s writeup on May 14 collected the skeptical thread:
- The task is narrow. Barcode-face-down package orientation is one of the easiest manipulation tasks in modern industrial robotics because barcodes are flat, rectangular, high-contrast, and consistent. The general capability the demo implies — multi-task home and warehouse autonomy on long horizons — is a strict superset of what was shown. The demo is necessary, not sufficient.
- The kit cost wasn’t disclosed. Helix-02 is running on Figure 03 hardware, which Adcock has previously priced for industrial deployment at “six figures per unit” but never with a published ASP. A 24-hour run at six figures per robot, ×3, is a Sunday afternoon spend for a Series-D-funded company. Whether the unit economics work at warehouse-deployment scale is a different question.
- The “no teleop” claim is unverifiable from the outside. Helix-02 weights and inference latency are not public. The livestream showed the robots’ behaviour; it did not show the network path. A skeptical reading is that the demo is consistent with both fully onboard inference and a low-latency remote inference loop. Figure’s reputation is now load-bearing on the claim.
- The 28,000-package number is impressive only with reference points. A FedEx Ground hub processes ~150,000 small packages per shift across ~80 human sorters. Three Figure 03 robots at 28,000 packages per 24 hours works out to ~9,300 packages per robot-day. At “one human shift” equivalence, Figure 03 is now ~0.7× a single human sorter. Three robots replace ~2 humans for that 24h, before maintenance, charging, downtime, and amortised hardware cost. The economics are not there yet. The capability arc is.
Each of these is correct. None of them changes the structural fact that the bar for fully autonomous humanoid endurance just moved from “demo” to “livestreamed unedited 24 hours.”
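The reference-point arithmetic in the last skeptic bullet can be rechecked from the quoted figures alone. The sketch below is a rough check, not a reconstruction of the original estimate: it assumes an 8-hour shift (taken from the original 8-hour target), treats “28,000+” as exactly 28,000, and assumes an even split across the three robots. The ~0.7× per-sorter equivalence depends on inputs the bullet does not state, so the sketch stops at the directly derivable quantities and makes no attempt to reproduce it. All names are hypothetical.

```python
# Rough check using only figures quoted in the skeptics' list.
HUB_PACKAGES_PER_SHIFT = 150_000  # FedEx Ground hub figure as quoted
HUB_SORTERS = 80                  # human sorters as quoted
ROBOT_PACKAGES = 28_000           # "28,000+" treated as exactly 28,000 (assumption)
ROBOTS = 3
RUN_HOURS = 24
SHIFT_HOURS = 8                   # assumption: one shift equals the 8-hour target

# Packages one human sorter handles in one shift, per the hub figures.
per_sorter_per_shift = HUB_PACKAGES_PER_SHIFT / HUB_SORTERS

# Packages one robot handled across the full 24-hour run (even split assumed).
per_robot_day = ROBOT_PACKAGES / ROBOTS

# The same robot output scaled down to a single shift-length window.
per_robot_per_shift = per_robot_day * SHIFT_HOURS / RUN_HOURS

print(f"human sorter, one shift:  {per_sorter_per_shift:8.0f} packages")
print(f"robot, full 24-hour run:  {per_robot_day:8.0f} packages")
print(f"robot, one 8-hour window: {per_robot_per_shift:8.0f} packages")
```

Any equivalence ratio built on these numbers is sensitive to the shift-length assumption and to how maintenance, charging, and downtime are counted, none of which the stream disclosed.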
What this means for the cohort
| Robot | Latest endurance claim | Public format | Verifiable? |
|---|---|---|---|
| Figure 03 | 24h, 28k pkgs, 3 robots, May 13–14 | Livestream, ~2M viewers | Externally observable end-to-end |
| Atlas 001 | Handstand + L-sit (skill demo) | Edited reel | Skill yes, endurance no |
| Unitree G1 | UniStore 24 apps, $3,949 | Static catalogue + price | Capability yes, endurance no |
| Tesla Optimus | In-factory data collection | No external footage | Not demonstrated |
| Agibot G2 | 8h Longcheer livestream | Livestream | Demonstrated April 2026 |
| 1X NEO | Home pilots | Curated clips | Skill yes, endurance no |
Agibot’s 8-hour Longcheer livestream on April 19 was the prior bar. Figure 03’s 24-hour run on May 13–14 is 3× that. Two of the three serious endurance claims are now Chinese-and-American livestreamed demonstrations on continuous unedited camera. The Japanese-and-European players (Honda, ABB, Schaeffler-deployed Humanoid) have nothing comparable on the public record.
The category just split into two camps:
- Verifiable endurance camp: Figure, Agibot. Public, livestreamed, named-task duration claims with chat logs.
- Edited-reel camp: everybody else.
The verifiable camp is two companies. The first one to repeat the demonstration on a second, different task without re-staging will win the next news cycle.
What to watch
- The Helix-02 weight release / model card. Figure has previously released the Helix-01 paper, but never the weights. A Helix-02 model card with task-coverage tables, failure-mode disclosure, and onboard inference latency would convert the demo from a marketing claim into an evaluable capability. The pressure for this disclosure is now structural.
- The second-task replay. The cleanest falsification of any “we did 24 hours” demo is to run the same hardware and weights on a different task (kitting, bin picking, stretch-wrap, dishwasher loading) for the same duration. Figure has explicitly shown a 4-minute dishwasher unload-reload in pilot footage. A 24-hour kitchen-task livestream is the next, harder, more credible bar.
- BMW Spartanburg pilot timestamped output. Figure’s BMW pilot has been the marquee reference deployment for the better part of a year, but published throughput data has been thin. The May 13–14 livestream re-opens the question of whether Figure can release production-floor continuous-uptime numbers, not just lab-demo livestream numbers.
- Adcock’s reaction to a sanctioned third-party endurance audit. The Helix-02 result will not translate into enterprise sales until somebody other than Figure runs the clock. The first robotics insurance underwriter or third-party lab to publish a Figure 03 endurance audit will set the unit-economics number the industry has been guessing at all year.
The morning-after read is that humanoid robotics now has its first falsifiable, public, unedited 24-hour autonomous endurance result on a real industrial task. Three robots, ~28,000 packages, ~2M viewers, zero failures, started because a guy on X asked for the receipts. Adcock called it “uncharted territory.” Walter has not yet conceded the bet, but he has stopped tweeting about whether the 8-hour bar can be cleared.
That is the only metric that matters this week.