I believe large action models (LAMs) represent as big a shift in the development of AI as anything we’ve seen in the previous decade. Just as LLMs made it possible to automate the generation of text, and, in their multi-modal forms, a wide range of media, LAMs may soon make it possible to automate entire processes. And because they’re naturally fluent in language, they’ll intelligently interact with the world — communicating with people, adapting as circumstances change, and even interacting with other LAMs.
Recent months have seen the emergence of a powerful new trend in which large language models are augmented to become “agents” — software entities capable of performing tasks on their own, ultimately in the service of a goal, rather than simply responding to queries from human users. It may seem like a simple change, but it opens up an entire universe of new possibilities — by combining the linguistic fluency of an LLM with the ability to accomplish tasks and make decisions independently, generative AI is elevated from a passive tool, however powerful it may be, to an active partner in getting work done in real time. Here at Salesforce AI, the potential of such powerful agents has been a topic of active research and development for some time.
What is a large action model?
Simply put, a large action model (LAM) is a type of generative AI that can perform specific actions based on user queries. These models not only analyze data, but are designed to take action based on the findings. Think of them as the can-do cousin to a large language model (LLM). An LLM could generate text in response to a query, while an LAM takes action — like helping a shopper process a return.
How do large action models work?
I believe that an important mandate of AI is the pursuit of automation that augments human abilities, rather than attempting to replace them. With that in mind, LAMs should focus on taking the reins on repetitive tasks and other busywork — the kind of thing most of us don’t want to do in the first place — that gets in the way of the kind of meaningful, high-value endeavors that we’re best at. So let’s discuss the incredible potential LAMs promise at two levels: for individuals, and for organizations. In the process, let’s imagine how LAMs can be applied today, and how their role in our work and lives might evolve in the coming years.
LAM example #1: Improve marketing workflows
Personal assistants have been a luxury reserved for the wealthy for generations, although the tech industry has been promising virtual alternatives for the masses for decades as well. LAMs, with their astonishing fluency and ability to generalize naturally across virtually all domains of life, might be the turning point we’ve been waiting for — a technology that can truly assist us, with much of the foresight and acumen we’d expect from a human colleague. Consider the following:
There’s been a lot of hype lately about the impact of LLMs on marketing workflows, with their ability to generate copy, imagery, and even web layouts seen as a step change for the field. The picture is more complicated in practice, however, as a great deal of manual effort is necessary to integrate an LLM’s output into the complete process of, for example, conceiving of a new campaign and rolling out the results. Currently, even the best generative AI tools only truly automate select parts of that process.
We envision that AI agents for marketing, however, will take a broader, more LAM-like approach to delivering results for marketing teams, by using an LLM interface to connect data, tools, and domain-specific agents in the pursuit of a high-level task. Imagine, for instance, a request like the following:
“Send a marketing email to highlight the value in our new Choco Chocolate. Give the first 100 people to purchase a coupon for free shipping. Ensure each recipient gets a personalized message.”
On its own, an LLM would be hard-pressed to fulfill it. A constellation of tools, agents, and data sources, however—access to previous marketing materials, customer data that the organization has chosen to share with the LAM, and, of course, LLMs themselves—could generate the copy (“Send a marketing email”), draw from documentation highlighting the latest product developments (“the value in our new Choco Chocolate”), break down the logic of the request (“Give the first 100 people to purchase a coupon”), and handle customer-specific touches (“Ensure each recipient gets a personalized message”) with ease.
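To make that decomposition concrete, here is a minimal Python sketch of how a LAM-style agent might break the request into discrete tool calls. Everything in it is illustrative: the tool names (fetch_product_notes, draft_copy, create_coupon, personalize_and_send) and the hard-coded plan are hypothetical stand-ins for what an LLM-driven planner and real marketing systems would provide, not an actual Salesforce API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One unit of work the agent has planned from the user's request."""
    tool: str              # name of the (hypothetical) tool to invoke
    args: Dict[str, str]   # arguments the planner filled in from the request


def plan(request: str) -> List[Step]:
    # In a real system, an LLM would produce this plan from the request text;
    # here it is hard-coded to show the shape of the decomposition.
    return [
        Step("fetch_product_notes", {"product": "Choco Chocolate"}),
        Step("draft_copy", {"goal": "highlight the value of the new product"}),
        Step("create_coupon", {"offer": "free shipping", "limit": "100"}),
        Step("personalize_and_send", {"audience": "email subscribers"}),
    ]


def run(request: str, tools: Dict[str, Callable[..., str]]) -> None:
    # Execute each planned step by dispatching to the matching tool.
    for step in plan(request):
        result = tools[step.tool](**step.args)
        print(f"{step.tool}: {result}")


if __name__ == "__main__":
    # Stub tools standing in for real data sources, copy generation, and email channels.
    stubs = {
        "fetch_product_notes": lambda product: f"notes on {product}",
        "draft_copy": lambda goal: f"email copy ({goal})",
        "create_coupon": lambda offer, limit: f"{offer} coupon for the first {limit} buyers",
        "personalize_and_send": lambda audience: f"personalized messages sent to {audience}",
    }
    run("Send a marketing email to highlight our new Choco Chocolate...", stubs)
```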
LAM example #2: Simplify car buying
But personal assistants are meant to help out across one’s entire life, not just work. So let’s imagine how they might help with a significant but personal buying decision, like a car. For many, the process of buying a vehicle can be more hassle than excitement, and the research phase in particular can be overwhelming. With an LAM, though, it may soon require little more than a prompt like the following:
I’m in the market for a sedan with a good safety rating and lots of space, ideally in a dark color. No earlier than 2014, but no more expensive than $28,000. And mileage under 90,000.
The first step, for both a human and an LAM, would be scanning car buying sites to assemble an initial list of options. The powerful text-understanding capabilities of an LLM allow the agent to consume huge amounts of car reviews from professional and user-generated sources alike, quickly identifying candidates that satisfy the user’s parameters. Additionally, the LAM might notice red flags—for instance, that a particular year of an otherwise suitable car model is notorious for faulty transmissions or electrical issues—and remove it from the list (or, at least, annotate it with a disclaimer).
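As a rough illustration, here is a minimal Python sketch of that filtering step. The listing data, the interpretation of “dark color,” and the KNOWN_ISSUES table are all made-up assumptions standing in for real inventory feeds and review mining.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Listing:
    model: str
    year: int
    price: int      # USD
    mileage: int    # miles
    color: str


def matches(listing: Listing) -> bool:
    """Apply the constraints stated in the user's prompt."""
    dark_colors = {"black", "navy", "charcoal", "dark gray"}  # assumed reading of "dark color"
    return (
        listing.year >= 2014
        and listing.price <= 28_000
        and listing.mileage < 90_000
        and listing.color.lower() in dark_colors
    )


# Hypothetical red-flag knowledge gleaned from reviews: (model, year) pairs
# widely reported to have transmission or electrical problems.
KNOWN_ISSUES = {("Sedan X", 2015)}


def shortlist(listings: List[Listing]) -> List[Tuple[Listing, str]]:
    results = []
    for listing in listings:
        if not matches(listing):
            continue
        note = "reported issues for this model year" if (listing.model, listing.year) in KNOWN_ISSUES else ""
        results.append((listing, note))
    return results


if __name__ == "__main__":
    candidates = [
        Listing("Sedan X", 2015, 21_500, 62_000, "black"),
        Listing("Sedan Y", 2018, 26_900, 45_000, "charcoal"),
        Listing("Sedan Z", 2012, 15_000, 80_000, "black"),  # too old; filtered out
    ]
    for listing, note in shortlist(candidates):
        print(listing, note or "(no flags)")
```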
As a next step, the LAM could even initiate conversations with private sellers and local dealers, using channels like email or even SMS to reach out. Although a good LAM would likely announce that it’s an AI, ensuring humans are never misled, it would still communicate in clear, fluid, natural language, with greetings, complete sentences, and a clear request or statement in each message. The user’s bank might even be notified, letting them know a loan may need to be drawn up. When the conversation gets closer to the decision-making moment, the user can be looped in for final approval.
LAM example #3: Help insurance agents understand customers
Ultimately, Salesforce AI is committed to using the power of technology to improve the way businesses operate at every scale, and LAMs are a great example of what that might look like in the years ahead. I’m confident this vision will reach into every aspect of operations, from the back office to the front lines of marketing — including applications we can’t even imagine yet — but there’s probably no better single example than interacting with customers.
Imagine an agent at an insurance company, for instance. A big part of that agent’s day will be meeting with customers, both existing and potential, to discuss their needs and develop the relationship. The core of this process is the human touch that only a flesh-and-blood agent can deliver, but it’s surrounded by repetitive tasks that an LAM could accelerate dramatically. Consider the following flow, augmented from one step to the next by generative AI:
- The agent meets with a customer over Zoom, discussing their needs, ideas, and possible next steps.
- A transcript of the call is automatically recorded and organized with other relevant CRM information.
- After the call, the LAM reviews the transcript, summarizing its most important moments and sending the result to the agent for easy review later on.
- Additionally, the LAM will identify next steps worth taking, such as providing additional information mentioned on the call. This understanding is used to automatically draft a follow-up email, followed by a search through the company’s literature for any relevant documents that may be included as attachments. The agent is then notified that next steps are ready to be carried out, allowing for final confirmation and a quick proofread before doing so.
- Finally, the LAM’s understanding of the agency’s processes allows it to suggest further steps to help keep the agent productive and focused, whether it’s an upsell opportunity based on previous customer decisions or simply a subsequent meeting with an automatically suggested agenda to advance the conversation.
- Along the way, the LAM may keep an eye out for signs that other stakeholders may need to be looped in. For example, a customer showing signs of frustration or hesitation may be deemed an “at risk” account, and be referred to a customer service specialist focusing specifically on satisfaction and retention.
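A rough sketch of what the post-call portion of this flow might look like in code appears below. The function names (summarize, extract_next_steps, draft_follow_up) are hypothetical placeholders for LLM-backed components, not a real product API, and nothing is sent without the human agent’s approval.

```python
from typing import Dict, List


def summarize(transcript: str) -> str:
    # Stub: a real system would call an LLM to summarize the call.
    return "Customer wants to compare term vs. whole life; send the rate sheet."


def extract_next_steps(summary: str) -> List[str]:
    # Stub: a real system would use an LLM to pull action items from the summary.
    return ["email the rate sheet", "schedule a follow-up meeting"]


def draft_follow_up(summary: str, steps: List[str]) -> str:
    return f"Hi, thanks for your time today. Next steps on our side: {', '.join(steps)}."


def run_post_call_flow(transcript: str) -> Dict[str, object]:
    summary = summarize(transcript)
    steps = extract_next_steps(summary)
    draft = draft_follow_up(summary, steps)
    # Nothing is sent automatically: the human agent reviews and approves first.
    return {
        "summary": summary,
        "next_steps": steps,
        "draft_email": draft,
        "status": "awaiting agent approval",
    }


if __name__ == "__main__":
    print(run_post_call_flow("(call transcript text)"))
```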
I think this is a compelling vision of individual empowerment, but the real transformation comes courtesy of the scalability of LAMs. Imagine an entire business augmenting its staff with tools of such sophistication, and how much time and expense can be saved in the aggregate — to say nothing of the way LAM suggestions can help prevent mistakes, recommend successful strategies, and more. This is a technology that can truly deliver value at any scale of deployment.
What’s the future of large action models?
So far we’ve talked about LAMs that serve individual users, but there are many, many more forms that this technology will likely take. It’s equally easy to imagine LAMs that serve groups or even entire organizations. And while all LAMs will benefit from their flexibility, I expect a diverse range of possibilities from the very general — analogous to the “executive assistant” concepts described above — to highly tailored, domain-specific agents that address niche problems. And many LAMs — all, eventually — will be designed to learn from their experiences, whether it’s gathering more and more expertise in solving an organizational problem, or growing increasingly personalized to the needs and preferences of individual users.
And who’s to say LAMs will operate individually? One can just as easily imagine multiple LAMs working together, each optimized for a different set of goals, with another LAM dedicated to the task of orchestrating their efforts and communicating with their user or users, be it an individual, a team, or even an entire organization. In other words, it’d constitute an upgrade from a single personal assistant to an entire team, all unified by a “chief of staff” reporting to the human in charge.
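As a loose illustration of that “chief of staff” idea, the following Python sketch shows a coordinator that routes subtasks to specialist agents and consolidates their reports for the human in charge. The specialist roles and the keyword-based routing rule are purely illustrative assumptions; a real coordinator would rely on an LLM to decide who handles what.

```python
from typing import Callable, Dict, List


class SpecialistLAM:
    """A stand-in for an agent optimized for one kind of work."""

    def __init__(self, name: str, handler: Callable[[str], str]):
        self.name = name
        self.handler = handler

    def handle(self, task: str) -> str:
        return self.handler(task)


class ChiefOfStaff:
    """Routes each subtask to the right specialist, then reports back to the human in charge."""

    def __init__(self, specialists: Dict[str, SpecialistLAM]):
        self.specialists = specialists

    def route(self, task: str) -> str:
        # A real coordinator would use an LLM to decide; here, simple keyword rules.
        if "email" in task:
            return "marketing"
        if "meeting" in task:
            return "scheduling"
        return "research"

    def delegate(self, tasks: List[str]) -> str:
        reports = []
        for task in tasks:
            specialist = self.specialists[self.route(task)]
            reports.append(f"{specialist.name}: {specialist.handle(task)}")
        return "\n".join(reports)  # consolidated report surfaced to the user


if __name__ == "__main__":
    team = {
        "marketing": SpecialistLAM("marketing", lambda t: f"drafted ({t})"),
        "scheduling": SpecialistLAM("scheduling", lambda t: f"booked ({t})"),
        "research": SpecialistLAM("research", lambda t: f"researched ({t})"),
    }
    print(ChiefOfStaff(team).delegate(["send launch email", "set up review meeting"]))
```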
The possibilities become even more interesting when we consider LAMs created for the sole purpose of interacting with other LAMs or teams of LAMs; imagine, for instance, an agent deployed by one of the car dealerships in the example above that specializes in handling inbound requests from the personal LAMs representing potential customers, or coordinating with the LAMs representing the car manufacturers themselves. They’d retain the transparency and general applicability that make all LAMs valuable, especially when evaluating their behavior in hindsight, but operate at the far higher speeds and efficiencies that machine-to-machine communication enables.
How large action models can deliver impact
Although many technical hurdles lie ahead in making the full power of LAMs a reality, the core challenge is a simple one to articulate: the world is not a static place, and any agent meant to interact with it must be flexible enough to adapt gracefully to changing circumstances. In the case of our car buying example, that means keeping tabs on leads and realizing when a desirable car has been sold before the user has had a chance to make an offer, or even updating its suggestions in the event that a recall is issued in the midst of the research process. In the case of our insurance agency example, an awareness of current events — especially those local to the customer — will be essential in providing useful and up-to-date information, ranging from changes in industry regulation to extreme weather events.
In all cases, a good LAM will define itself by its understanding of when to notify its human user or request clarification. Doing so too often will be annoying and disruptive, and might even cancel out the benefits of using an LAM in the first place. Doing so too rarely, however, all but guarantees that potentially serious, unwanted side effects flourish, ranging from deleting an important email to requesting an unwanted loan from the user’s own bank. Like a good personal assistant, an LAM will need good instincts to strike the right balance.
It also means tapping into one of the most powerful features of LAMs, which is their ability to learn. As LAMs are exposed to more and more real-world experience working alongside us, human feedback can be used to further refine their behavior. Additionally, LAMs can extract valuable interpretations of flows and processes by poring over data ranging from customer service transcriptions to event logs, piecing together the ideal steps that connect a given starting point to the most desirable outcome.
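One simplified way to picture that kind of process learning: given event logs of past cases, surface the most common sequence of steps among the cases that ended well. The toy log below is invented purely for illustration; real process mining is far more involved.

```python
from collections import Counter
from typing import List, Tuple

# Each trace: (the ordered steps of one past case, whether it ended in the desired outcome)
Trace = Tuple[Tuple[str, ...], bool]


def most_common_successful_path(log: List[Trace]) -> Tuple[str, ...]:
    """Return the step sequence that most often led to a good outcome."""
    successful = [steps for steps, ended_well in log if ended_well]
    return Counter(successful).most_common(1)[0][0]


if __name__ == "__main__":
    log: List[Trace] = [
        (("receive request", "verify account", "issue refund"), True),
        (("receive request", "issue refund"), False),
        (("receive request", "verify account", "issue refund"), True),
    ]
    print(most_common_successful_path(log))
    # -> ('receive request', 'verify account', 'issue refund')
```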
To be clear, an LAM’s job isn’t just turning a request into a series of steps, but understanding the logic that connects and surrounds them. That means understanding why one step must occur before or after another, and knowing when it’s time to change the plan to accommodate changes in circumstances. It’s a capability we demonstrate all the time in everyday life. For instance, when we don’t have enough eggs to make an omelet, we know the first step has nothing to do with cooking, but with heading to the nearest grocery store. It’s time we built technology that can do the same.
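Here is a minimal sketch of that precondition-aware planning, using the omelet example: each step declares what it requires and what it provides, and the planner inserts a prerequisite step (the grocery run) whenever a precondition isn’t yet satisfied. The step definitions and the greedy ordering are toy assumptions, not a real planning system.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Step:
    name: str
    requires: Set[str] = field(default_factory=set)  # facts that must hold before this step
    provides: Set[str] = field(default_factory=set)  # facts this step makes true


def order_plan(goal_steps: List[Step], state: Set[str], library: List[Step]) -> List[str]:
    """Greedily insert library steps whenever a goal step's preconditions aren't yet met."""
    plan = []
    for step in goal_steps:
        for missing in step.requires - state:
            # Find a step that supplies the missing precondition, e.g. a grocery run.
            supplier = next(s for s in library if missing in s.provides)
            plan.append(supplier.name)
            state |= supplier.provides
        plan.append(step.name)
        state |= step.provides
    return plan


if __name__ == "__main__":
    cook = Step("cook the omelet", requires={"eggs"}, provides={"omelet"})
    buy_eggs = Step("go to the grocery store for eggs", provides={"eggs"})
    # Starting state: no eggs in the fridge.
    print(order_plan([cook], state=set(), library=[buy_eggs]))
    # -> ['go to the grocery store for eggs', 'cook the omelet']
```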
Can you trust a large action model?
There’s no question that LAMs will get uncannily good at the kind of fluency and communication that many of the above examples would require. But it’s still not a given that they can be trusted to behave in predictable, effective ways with the consistency necessary for regular use in the real world.
Of course, if trust is already a challenge when it comes to generating text and images — and it certainly is — it’s an even bigger one when it comes to taking action. And the burden of ensuring safety and reliability only grows when multiple LAMs work together in concert. For this reason, I believe it’s essential that even at their most independent, LAMs are designed to keep humans in the loop before critical actions are taken. No matter how advanced this technology gets, I envision it as a tool — albeit an unusually intelligent one — and one that humans are always free to control as they see fit.
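One simple way to picture that human-in-the-loop principle in code: actions flagged as critical are held for explicit approval rather than executed automatically. The action examples and the criticality rule below are illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    description: str
    critical: bool              # e.g. spends money, contacts a third party, deletes data
    execute: Callable[[], str]


def run_with_oversight(actions: List[Action], approve: Callable[[Action], bool]) -> None:
    """Execute routine actions directly, but gate critical ones on human approval."""
    for action in actions:
        if action.critical and not approve(action):
            print(f"held for review (not approved): {action.description}")
            continue
        print(action.execute())


if __name__ == "__main__":
    actions = [
        Action("summarize call notes", critical=False,
               execute=lambda: "summary written"),
        Action("request loan pre-approval from the user's bank", critical=True,
               execute=lambda: "loan request sent"),
    ]
    # A stub approver that rejects everything critical; a real system would ask the user.
    run_with_oversight(actions, approve=lambda action: False)
```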
Conclusion
After a decade of AI developments that have been nothing short of historic, it’s a testament to the potential of LAMs that so many of us in the research world feel like the biggest transformations are only just coming into view. With the right guidance and a commitment to human empowerment, I believe LAMs can usher in a new era of productivity, ease, and clarity, making us better at the tasks we find most engaging while freeing us from those we don’t. And with its decades of history in the enterprise world, I can’t imagine a better place to pursue this vision than Salesforce.
Special thanks to Alex Michael, Peter Schwartz, and the Salesforce Futures team for their contributions to the writing of this piece.