The year was 2013. In a state-of-the-art kitchen laboratory in Stanford University’s Robotics Center, surrounded by the whir of servo motors and the aroma of brewing coffee, I observed our latest prototype attempt a command composed of multiple, seemingly simple tasks: make me a cappuccino. The robot could execute each step in sequence: identify and pick up the coffee jar, pour a measured amount into the machine’s cup, screw it onto the espresso machine, press the correct button, measure and add milk to the foaming machine, and so on. But the real challenge wasn’t in these mechanical actions. It was in handling the unpredictable: coffee grounds that poured differently each time, cups that weren’t precisely where expected, and split-second decisions, like when to stop the milk from foaming.
A decade later, as I watch AI agents orchestrate complex enterprise tasks, I’m struck by a powerful realization: whether their labor is digital or physical, AI systems share fundamental building blocks that are reshaping how we think about work. That early robot’s need to remember ingredient locations and adapt to missing items operates on surprisingly similar principles to today’s digital agent planning a customer journey or orchestrating a supply chain.
This convergence of physical and digital AI marks a pivotal moment for enterprise leaders. It’s already happening in ways that they might not recognize; while much ink has been spilled on the advancement of buzzworthy humanoid robots, I’ve been focusing on discoveries that are seemingly mundane. I call them “boring breakthroughs.”
Consider this: humans today spend countless hours on routine physical tasks, from sorting warehouse inventory to folding hospital linens to organizing stock rooms. Just as AI agents now handle email triage and report generation, freeing knowledge workers for strategic thinking, robotics promises similar liberation in the physical realm. When a robot can reliably do the laundry or stock a shelf, it transcends mere convenience: it scales up what a single human can do, augmenting their cognitive as well as physical abilities.
The impact multiplies in enterprise settings. Imagine hotels where robots handle luggage transport and room service delivery, enabling staff to create magical, white-glove experiences at scale. Picture hospitals where automated systems manage medication delivery and supply restocking, allowing healthcare providers to advance patient treatment and bedside manner. In manufacturing facilities, robots already handle repetitive assembly tasks—but next-generation systems will adapt to changing conditions just as flexibly as their human counterparts.
For enterprise leaders, this means today’s investments in digital automation aren’t just about current efficiency; they’re laying the groundwork for a future where agentic and robotic tasks converge in ways that fundamentally reshape how work gets done, in both the digital and the physical realm.
The Shared Brain: Digital and physical AI are more similar than you might think
What if I told you that a warehouse robot and a customer service AI agent share more intelligence DNA than you might expect? While their interfaces differ dramatically (one navigates aggregates of atoms, the other traverses zeros and ones), their core architecture is remarkably similar.
Whether coordinating robotic arms on a factory floor or orchestrating a multi-step customer service response, all AI agents require four fundamental components:
- Memory, for storing and retrieving information
- A Brain, for reasoning and planning
- Actuators, for taking action
- Senses, for perceiving their environment
The key difference? Digital agents operate through APIs and software interfaces, while physical agents interact through motors and sensors. But the intelligence layer—the ability to plan, adapt, and learn—remains consistent.
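To make the parallel concrete, here is a minimal Python sketch of that shared anatomy. The class and method names are my own illustrative choices, not Agentforce APIs; under this assumption, only the Sensors and Actuators implementations would differ between a digital and a physical agent.

```python
from abc import ABC, abstractmethod


class Sensors(ABC):
    """Perception layer: API reads and webhooks for a digital agent;
    cameras and force sensors for a physical one."""

    @abstractmethod
    def perceive(self) -> dict: ...


class Actuators(ABC):
    """Action layer: API calls for a digital agent; motor commands for a robot."""

    @abstractmethod
    def act(self, action: str) -> None: ...


class Agent:
    """The intelligence layer is the same for both embodiments; only the
    Sensors and Actuators implementations differ."""

    def __init__(self, sensors: Sensors, actuators: Actuators):
        self.memory: list[dict] = []  # Memory: store and retrieve observations
        self.sensors = sensors        # Senses: perceive the environment
        self.actuators = actuators    # Actuators: take action

    def plan(self, goal: str, observation: dict) -> str:
        # Brain: reasoning and planning. In practice this would call a
        # reasoning engine or LLM; a stub keeps the sketch self-contained.
        return f"next step toward {goal!r} given {list(observation)}"

    def step(self, goal: str) -> None:
        observation = self.sensors.perceive()
        self.memory.append(observation)
        action = self.plan(goal, observation)
        self.actuators.act(action)
```

Swapping a warehouse robot’s camera-and-motor implementations for a digital agent’s API-based ones leaves the plan-and-step logic untouched, which is the architectural point.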
This architectural parallel isn’t just a scientist’s nifty theory; it’s already driving enterprise innovation. At Salesforce, we’re seeing this convergence in action through Agentforce, where our Atlas Reasoning Engine provides the “brain” that powers digital workflows today and could inform physical operations tomorrow. Our enterprise customers are discovering that success doesn’t hinge on building custom models from scratch or on expensive “do-it-yourself” (DIY) AI. Instead, it’s about creating the right infrastructure and “piping” (the foundational business processes, security protocols, ethical guidelines, and data flows that connect enterprise systems) to interface with the agentic layer provided by Agentforce. True innovation will always combine human ingenuity in reimagining what’s possible with the right technical foundations to make it real; winners will start with leadership vision and scale through expert implementation.
The implications for enterprise strategy are profound: organizations that excel at implementing digital AI today are already building the muscles they’ll need for advanced robotics, or physical AI, tomorrow. The frameworks for data management, process orchestration, and system integration that enable digital agents will also underpin robotic deployments. We’re already seeing compelling evidence of this pattern across industries. ABB has transformed decades of digital process automation expertise in manufacturing into the world’s most sophisticated industrial robots. In healthcare, Intuitive Surgical evolved from digital surgical planning to over 7,000 da Vinci robotic systems performing millions of life-saving procedures. Perhaps most dramatically, Waymo has leveraged digital workflow expertise to deploy advanced robotics, demonstrating a remarkable ~90% reduction in collision incidents compared to human drivers across 39 million real-world miles. These pioneers show how today’s digital AI capabilities can accelerate tomorrow’s physical automation, with increasingly compelling safety and efficiency benefits.
From Large Language Models to World Action Models (and Beyond)
The next frontier of AI isn’t just about understanding and generating language; it’s about understanding and acting in the physical realm. First came world models: AI systems that understand how physical reality works. Think of them as the three-dimensional, physical equivalents of large language models (LLMs). Instead of capturing the relationships among words and text elements, they capture the relationships between physical 3D objects and the environment that surrounds them: how things move, interact, and occupy space.
Building on this foundation, and following our team’s pioneering 2023 work with Large Action Models (LAMs), which connect language with actionable outcomes in digital space, we’re now moving toward a powerful new parallel with world models. I like to call these “World Action Models” (WAMs): systems that don’t just understand physical spaces but can interact with and navigate within them.
Consider a simple command: “Fold these clean linens.” A WAM must not only understand the request but also assess the environment (e.g., the size and location of the laundry pile and the folding surface), determine alternative actions for flat and fitted sheets, and adjust its motions for relative sizes and fabric qualities. Each action changes the physical space, which in turn affects future possibilities.
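A toy sketch can make that feedback loop concrete. Everything below is hypothetical: WorldState, choose_fold_strategy, and the folding “strategies” are illustrative names, not a real WAM implementation. The point is simply that each action mutates the world state that conditions the next decision.

```python
from dataclasses import dataclass, field


@dataclass
class WorldState:
    # A deliberately tiny model of the environment the system must track.
    pile: list = field(default_factory=lambda: ["pillowcase", "fitted_sheet", "flat_sheet"])
    folded: list = field(default_factory=list)


def choose_fold_strategy(item: str) -> str:
    # Flat and fitted sheets call for different motion plans;
    # the model picks an alternative action per item.
    return "corner_tuck_fold" if item == "fitted_sheet" else "half_fold"


def fold_linens(state: WorldState) -> None:
    while state.pile:                    # each action changes the physical space...
        item = state.pile.pop()          # ...which changes what is possible next
        print(f"Folding {item} using {choose_fold_strategy(item)}")
        state.folded.append(item)


fold_linens(WorldState())
```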
These real-world interactions require a deeper understanding of physical dynamics, geometric relationships, and object permanence—capabilities that are already emerging, as NVIDIA CEO Jensen Huang demonstrated at CES last month with the launch of its Cosmos platform. “It’s not about generating creative content,” Huang explained, “but teaching AI to understand the physical world.”
My personal journey from robotics research to enterprise AI reminds me that, just as language models have learned to understand context and nuance in text, WAMs are now learning to understand the laws of physics and real-world interactions. At Salesforce AI Research, we’re exploring what becomes possible when these world models combine physical understanding with business rules and logic, especially when they have access to all of the relevant data and knowledge within an organization.
The potential impact of this convergence on improving human workflows is exciting, to say the least. Tomorrow’s robots, powered by WAMs, will understand the physics of objects and adapt to unpredictable environments—but with humans firmly at the helm. They won’t just follow rules—they will understand principles and know when to seek guidance. Like a hotel robot encountering uncertainty about luggage location, they’ll learn to engage their human partners with specific questions: “Where exactly would you like the bags?” This human-AI partnership, where machines know their limitations and humans provide critical oversight, is essential as we move forward.
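One way to picture that partnership in code: a hedged sketch of a confidence-gated escalation policy. The threshold, function names, and routing below are all assumptions for illustration, not a description of any shipping system.

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed policy value, not from any real system


def ask_human(question: str) -> str:
    # Stub: in a real deployment this would route the question to hotel
    # staff or the guest through whatever channel the operator chooses.
    return input(question + " ")


def resolve_luggage_destination(predicted_location: str, confidence: float) -> str:
    if confidence < CONFIDENCE_THRESHOLD:
        # The machine knows its limitation and asks a specific question
        # rather than guessing; the human provides the oversight.
        return ask_human("Where exactly would you like the bags?")
    return predicted_location
```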
Back in that Stanford lab, watching our robot wrestle with an espresso machine taught me something meaningful about the future of AI. While the technical challenges were fascinating, the real potential extended beyond the “boring breakthrough” of making coffee. The “aha moment” was understanding how machines could operate in ways that complement human capabilities and amplify human potential.
For enterprise leaders watching this convergence unfold today, three key signals deserve attention. First, the rapid advancement of world models that can understand and interact with physical spaces. Second, the growing breadth of enterprise use cases that bridge digital and physical labor; robotics is quickly expanding beyond the factory floor and the inventory stockroom to less predictable environments like hospitals, retail, and eventually our homes. Third, the question of how and where to best scale human capabilities across your company. The convergence of AI and robotics will bring new organizational needs: upskilling teams on AI-robot collaboration, mapping existing business processes for automation potential, and, most importantly, developing new structures that optimize human creativity, innovation, and, of course, oversight of both digital and physical AI systems.
Already well underway, this intersection between digital and physical AI must be met with both excitement and responsibility. As enterprise and technology leaders lay the foundations today—from ethical guidelines to training protocols to testing rigor—we’re shaping more than the next wave of enterprise technology. We’re creating a future where humans and machines work in concert, each playing to their unique strengths, making the impossible possible, together.