Former NASA Robotics Division Chief, In Fortune May 23: U.S. Humanoids Score “Nearly 90%” In Controlled Simulation And 12% On Real Household Tasks. Figure 02's 1,250 Hours At BMW Were Ten Months Of One Single Task. The Country That Defines “Good Enough To Deploy At Scale” Sets Global Manufacturing For Decades. Per Robert Ambrose, That Country Is Not The United States

Dr. Robert Ambrose, former Chief of NASA's Software, Robotics and Simulation Division, used the Fortune commentary page on May 23 to put a Stanford number on the demo-vs-deployment gap (90% → 12%) and reframe Figure 02's BMW pilot as “one task for ten months.” The argument is a policy ask. The Stanford number is the load-bearing detail.

The single load-bearing number in Dr. Robert Ambrose’s Fortune commentary, which landed at 5:00 AM Eastern on Friday May 23, is from a Stanford report he cites in passing: humanoid robots that score nearly 90% success rates in controlled simulations succeed at just 12% of real household tasks. Ambrose — currently Chairman of Robotics and AI at alliant, previously Chief of NASA’s Software, Robotics and Simulation Division — calls the 78-point gap “not a rounding error” but “the whole problem.”

Every other claim in the piece sits on top of that number. The demo footage that the U.S. humanoid sector (Figure, Tesla, Boston Dynamics) has produced for two years belongs in the simulation column. The deployment column, by the same Stanford methodology, is what a mid-sized U.S. manufacturer sees when it tries to commission a humanoid.

The Figure-02 reframe

The reframe of Figure AI’s BMW Spartanburg pilot — the program that earned Figure its $39B Series C valuation last September and the data point everyone cites — is the second non-obvious move in the piece. The line Figure publishes is 1,250 hours of work at BMW, 90,000+ sheet-metal components moved, eleven months elapsed.

Ambrose’s reading of that same number is, in his words: “the robot did one task: picking up sheet metal parts and placing them on a welding fixture for ten months straight.” On that reading, the 90,000 parts are one kind of part, picked 90,000 times.
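For readers who want to check the arithmetic behind both headline numbers, here is a minimal back-of-envelope sketch in Python. It uses only figures quoted above. The function names are illustrative, "nearly 90%" is rounded to 90, and "90,000+" is treated as a floor, so the outputs are rough bounds rather than measurements.

```python
# Back-of-envelope arithmetic on figures quoted in this article.
# Assumptions: "nearly 90%" is rounded to 90; "90,000+" parts is used as a floor.

SIM_SUCCESS_PCT = 90      # controlled-simulation success rate (approximate)
REAL_SUCCESS_PCT = 12     # real household task success rate (Stanford report)
HOURS_WORKED = 1_250      # hours of work at BMW, per Figure
PARTS_MOVED = 90_000      # sheet-metal components moved, per Figure (lower bound)


def gap_points(sim_pct: float, real_pct: float) -> float:
    """Demo-vs-deployment gap in percentage points."""
    return sim_pct - real_pct


def parts_per_hour(parts: int, hours: int) -> float:
    """Average throughput implied by the published totals."""
    return parts / hours


def seconds_per_part(parts: int, hours: int) -> float:
    """Average cycle time implied by the published totals."""
    return hours * 3600 / parts


if __name__ == "__main__":
    print(gap_points(SIM_SUCCESS_PCT, REAL_SUCCESS_PCT))   # 78
    print(parts_per_hour(PARTS_MOVED, HOURS_WORKED))       # 72.0
    print(seconds_per_part(PARTS_MOVED, HOURS_WORKED))     # 50.0
```

The throughput figures say how steadily the robot repeated one motion. They say nothing about how many different motions it can perform, which is the quantity Ambrose's reframe is about.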

What that reframe implies — and Ambrose says explicitly — is that a single-task, single-fixture, eleven-month pilot at a Fortune-100 OEM with a billion-dollar manufacturing-AI budget is exactly the unit economics a Tier-1 supplier cannot absorb. BMW can run a humanoid as a $1M-line-item R&D experiment. A mid-sized auto-parts maker in Ohio cannot. Until the humanoid does the second task — and the third, and the fourth — without a separate integration project per task, the demo videos are entertainment, not strategy. Per Ambrose, “right now, in most factories, several humans generate better ROI than one humanoid robot.”

The NASA argument

Ambrose’s actual professional credential on this is the part the Fortune editor underplayed. He led the NASA division that designed the Space Shuttle robotic arm — a system that was specified to do one thing (position an astronaut who would then catch and release a satellite) and that ended up doing a different thing better than the spec called for (making the catch itself), which in turn enabled an entirely unforeseen mission (the Hubble Space Telescope repair).

The arms that succeeded at NASA were the ones that could be re-tasked outside their training distribution. The arms that failed were the ones built for one scenario. That is a hard-won institutional memory, and it is the lens through which Ambrose reads the entire U.S. humanoid cohort. Most factories cannot economically host a one-task robot. The factories that can host it are the ones already operating fixed automation that does the same task for less money.

The piece does not say this, but the implication is that Figure 02 at BMW Spartanburg is, by NASA’s classification scheme, a single-scenario deployment. The reason it survived eleven months is that BMW was paying for the data, not for the labor. The reason it cannot replicate at scale is that nobody else is paying for the data.

The policy ask

The proposal in the back half of the piece is where Ambrose stops being a robotics columnist and starts being a former federal R&D executive. The two structural points he names:

  • Existing federal R&D tax credits reward discovery, not deployment. A manufacturer that spends $800,000 integrating a humanoid system gets “essentially the same tax credit as one that buys a new forklift.” The federal research provisions (the Section 41 research credit and the Section 174 treatment of research expenses) were written for prototype and lab work, not for floor-level integration, which is where most of the money goes once the robot ships.

  • The $2.5B already in robotics VC is, on its own, insufficient. Private capital funds the demo column. Federal incentive structures are what move the deployment column. Without that, the demo column keeps growing — and the U.S. keeps optimizing for the wrong metric.

His specific asks: a robotics-focused “manufacturing deployment” tax credit stackable with the existing R&D credit; an expanded Manufacturing Extension Partnership with “humanoid deployment concierges” funded at low federal cost for small and mid-sized manufacturers; and NIST, working with NASA, establishing humanoid interoperability standards so a mid-sized supplier can combine fleets from multiple vendors safely. The interoperability standard is the unglamorous ask and the one with the largest long-run leverage. NIST-defined humanoid interfaces would do for an Ohio plant trying to run a Figure unit alongside a Unitree unit on the same line what USB-C does for consumer electronics.

The China frame

The geopolitical frame Fortune uses — “America is building the wrong kind of robots — and China knows it” — leans on the troupe of humanoid robots that danced for German Chancellor Merz earlier this year in a Beijing demonstration. Ambrose’s reading: that demo is spectacle, deliberately, because the country running the demo also runs the deployment program quietly behind it. The U.S. is building demos and calling them strategy. China is building demos and a deployment program, and the cohort the U.S. is competing against is the deployment program, not the demo.

The line that will get screenshotted most: “the country that defines ‘good enough to deploy at scale’ will set the terms for global manufacturing for decades. Right now, that country is not the United States.” That is the policy claim, and the Stanford 12% number is the data point that turns it from rhetoric into a brief.

What to watch

  • Whether the Stanford simulation-to-deployment study gets a citation in the next House Science Committee humanoid hearing. The number is small enough to fit on a slide and load-bearing enough to drive testimony. Fortune’s commentary readership is the first audience Ambrose has put it in front of; the second is the congressional record.
  • Whether the existing federal R&D credits get amended to cover deployment. Section 174 is the easy lever. The harder lever is a dedicated humanoid-deployment credit. The Inflation Reduction Act already set a precedent for advanced-manufacturing investment credits; the question is whether humanoids get added to that taxonomy before the 2027 reconciliation cycle.
  • Whether Figure, Apptronik, 1X, and Agility publish a second-task demonstration at a real customer in 2026. Ambrose’s Figure-02-as-single-task reframe is now in the public record. The way to retire it is to ship the second task. The way to confirm it is to not ship the second task.
  • Whether Unitree publishes a deployment number to match its 5,500-units-shipped-in-2025 number. Shipped is the demo metric. Deployed, with which customers, at what tasks, is the metric that turns the Stanford gap from rhetoric to reality.

The number to remember is 12%.