You’ve seen the ripples of what AI agents can do. Now, get ready for the tidal wave.
The future of AI agents? It’s agents that can hear your commands and see the world alongside you. It’s AI orchestrating AI, creating a symphony of digital labor. It’s agents with digital memories that learn your patterns and foresee your next move. Forget incremental upgrades. The AI agent of tomorrow will be a force multiplier unleashed.
Here are five ways that agents — like those powered by Agentforce, the agentic AI layer of the Salesforce platform — will evolve.
1. Agents will understand much more than text
AI agents today are primarily text-based: They interact with and process information in the form of typed messages, commands, or queries. But they can’t understand or respond to images, audio, or video, and this lack of contextual richness limits their ability to handle complex, real-world situations.
“We can do a lot more than that. The world is full of information that’s not words,” said Juan Carlos Niebles, research director at Salesforce. “An agent will be able to look at an image or listen to an audio clip or a person’s voice. You give it eyes and ears, and you open a door to other types of data that the agent could understand and process.”
In field service, a service technician could use their Agentforce app to record audio of a sputtering car engine, and ask Agentforce to diagnose the problem. In marketing, AI agents could analyze social media trends in real time, like a sudden spike in videos about ’80s-inspired leg warmers. In customer service, they could analyze audio from service calls, picking up on recurring negative sentiment from the tone of a customer’s voice.
To do this, AI agents must be able to see images and videos, and hear sounds. In a nutshell, this is accomplished by adding neural networks that can translate the input data from the new modality (voice, audio, video, visual) into tokens that a large language model (LLM) can understand.
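Under the hood, that bridging step can be sketched in a few lines. The PyTorch module below is purely illustrative (the architecture, dimensions, and names are assumptions, not how xGen-MM-Vid or Agentforce actually implement it): an encoder compresses an audio clip into a fixed number of embeddings, then projects them into the LLM’s token-embedding space so they can sit alongside word tokens.

```python
# Illustrative sketch: turn an audio clip into "soft tokens" an LLM can attend to.
import torch
import torch.nn as nn

class AudioToTokens(nn.Module):
    def __init__(self, n_mels: int = 80, llm_dim: int = 4096, n_tokens: int = 32):
        super().__init__()
        # Encode a mel-spectrogram into a shorter sequence of features.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv1d(256, 256, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
        )
        self.pool = nn.AdaptiveAvgPool1d(n_tokens)   # fixed token budget
        self.project = nn.Linear(256, llm_dim)       # map into LLM embedding space

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, n_mels, time) -> (batch, n_tokens, llm_dim)
        feats = self.pool(self.encoder(mel))          # (batch, 256, n_tokens)
        return self.project(feats.transpose(1, 2))

# These embeddings are concatenated with the text-prompt embeddings, so the
# LLM attends over words and sounds in a single sequence.
tokens = AudioToTokens()(torch.randn(1, 80, 1600))    # a toy audio feature batch
print(tokens.shape)                                   # torch.Size([1, 32, 4096])
```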
To explore the future of multimodal AI, Salesforce AI Research has developed xGen-MM-Vid, a multimodal LLM that helps AI interpret videos.
The future of AI agents is also voice: Soon, they’ll understand and respond to our spoken requests. Imagine verbally asking an agent to do the following:
“Access Q2 sales performance from the shared drive, and analyze KPIs with a focus on revenue growth, churn rate, and customer acquisition cost. Then, summarize the analysis, present your top findings, and identify the most important areas for improvement. Finally, recommend two actionable next steps to remedy the challenges.”
This is an enormous time-saver, but that’s only half the story. Because AI agents can quickly process huge amounts of data, they can help workers uncover trends they might otherwise miss, surfacing subtle but important correlations in the data.
2. Agents will work effortlessly with other agents
Right now, AI agents generally work solo to handle individual tasks, like customer service or inventory management. But soon, systems of multiple agents, each with a specialized role, will work together to carry out more complex tasks to achieve broader goals. This will be a force multiplier in terms of scale, speed, and strategic decision-making, fundamentally altering operational workflows and enabling a level of coordinated efficiency that’s not possible with an all-human workforce. Call it the age of agent-to-agent, or A2A.
For example, in ecommerce, a multi-AI agent system could include one agent handling customer inquiries, another managing inventory, and a third optimizing pricing based on demand, all working together in real time to improve efficiency and maximize sales.
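As a rough illustration of that kind of coordination, here’s a toy message-passing sketch in Python. Nothing in it is a real Agentforce API; the bus, topics, and agents are stand-ins meant to show how a single customer inquiry can ripple through specialist agents in sequence.

```python
# Toy agent-to-agent coordination over a shared message bus.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Message:
    sender: str
    topic: str
    payload: dict

class MessageBus:
    def __init__(self):
        self.subscribers: dict[str, list[Callable[[Message], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[Message], None]) -> None:
        self.subscribers.setdefault(topic, []).append(handler)

    def publish(self, msg: Message) -> None:
        for handler in self.subscribers.get(msg.topic, []):
            handler(msg)

bus = MessageBus()

def inventory_agent(msg: Message):
    # Checks stock, then hands off to the pricing agent.
    in_stock = msg.payload["sku"] != "OUT-OF-STOCK"
    bus.publish(Message("inventory", "price.quote", {**msg.payload, "in_stock": in_stock}))

def pricing_agent(msg: Message):
    # Sets a (stubbed) demand-based price before the service agent replies.
    price = 19.99 if msg.payload["in_stock"] else None
    bus.publish(Message("pricing", "customer.reply", {**msg.payload, "price": price}))

def service_agent(msg: Message):
    print(f"Reply to customer: sku={msg.payload['sku']}, price={msg.payload['price']}")

bus.subscribe("inventory.check", inventory_agent)
bus.subscribe("price.quote", pricing_agent)
bus.subscribe("customer.reply", service_agent)

# One inquiry kicks off the chain: inventory -> pricing -> service.
bus.publish(Message("customer", "inventory.check", {"sku": "LEGWARMER-80S"}))
```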
This collaboration could be taken even further, with AI working across organizations, industries, and entire ecosystems to negotiate, optimize, inform, and maybe even influence decisions on a global scale.
Imagine a global supply chain where AI agents from different companies work together to coordinate logistics, source materials, and manage distribution. They react in real time to delays, demand shifts, and disruptions, keeping everything running smoothly. The result? A self-regulating, ultra-efficient network that practically runs itself.
But even in an increasingly autonomous A2A world, human oversight will remain critical for setting ethical boundaries, addressing unforeseen circumstances, and ensuring alignment with business strategy and human values. The role of humans may evolve to focus on strategic guidance and on ensuring the AI ecosystem serves human needs and goals.
These interdependent, multi-agent systems will require sophisticated coordination, orchestration, and governance (by humans and AI) that transcends traditional oversight models. Experts in Salesforce AI Research and the Global AI Practice recently laid out three methods of managing multi-agent complexity.
3. Orchestrator agents will manage teams of agents
Just like human workers, teams of AI agents need a manager to direct and coordinate different activities, and oversee multistep tasks through to completion. But in the future of AI agents, those managers will be other AI agents.
Silvio Savarese, executive vice president and chief scientist, AI Research, at Salesforce, recently described an orchestrator agent that coordinates the work of multiple specialist agents, similar to how a restaurant’s general manager oversees the work of hosts, servers, managers, chefs, cooks, and expediters.
What does this look like in an enterprise context? A service agent could process a customer’s inquiry while an inventory agent checks product availability. A logistics agent calculates shipping, and a billing agent reviews and processes payment options.

As Savarese noted, “The orchestrator agent coordinates all these inputs into a coherent, effective, on-brand, and contextually relevant response for the human at the helm to review, refine, and share with the customer.”
In other words, instead of juggling multiple individual agents, employees can work with one smart lead AI that coordinates everything behind the scenes.
Finally, orchestrator agents make it easier to scale and adapt. Think of them as smart connectors for your AI: when your business changes or you adopt new AI tools, you can plug them into your system without rebuilding everything from scratch, keeping your setup ready for whatever comes next.
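Here’s a hedged sketch of the pattern in Python. The specialist agents are stubs and the final composition is a simple join standing in for an LLM call, but it shows the shape: the orchestrator fans a request out to a registry of specialists, and adding a new specialist means registering one more entry rather than rebuilding the system.

```python
# Toy orchestrator: fan out to specialist agents, then compose one reply.
from concurrent.futures import ThreadPoolExecutor

def service_agent(order):   return f"Inquiry understood for order {order['id']}"
def inventory_agent(order): return "2 units in stock"
def logistics_agent(order): return "ships in 3 days"
def billing_agent(order):   return "card on file accepted"

# Registry of specialists; extending the system means adding an entry here.
SPECIALISTS = {
    "service": service_agent,
    "inventory": inventory_agent,
    "logistics": logistics_agent,
    "billing": billing_agent,
}

def orchestrate(order: dict) -> str:
    # Run specialists concurrently, then merge their findings into one draft
    # for the human at the helm to review. A production system would have an
    # LLM compose the draft; a join keeps the flow visible here.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, order) for name, fn in SPECIALISTS.items()}
        findings = {name: f.result() for name, f in futures.items()}
    return " | ".join(f"{name}: {result}" for name, result in findings.items())

print(orchestrate({"id": 1042}))
```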
4. Agents will get much better at reasoning
A reasoning engine is an AI system that mimics human-like decision-making and problem-solving based on certain rules, data, and logic. Essentially, it’s how an agent decides what actions to take, and which data is needed to take those actions.
A strong reasoning engine allows an AI to go beyond surface-level data, analyzing complex patterns, inferring user intent, and understanding the nuances of a user’s behavior, preferences, and needs. Instead of just reacting to keywords or simple commands, it can grasp the underlying context of a user’s interactions.
“This is about getting the AI agent to do more complex things. It’s not just a one-step task,” said Niebles. “It’s a chain of maybe 10 different actions to accomplish a goal.”
For example, here’s how an AI reasoning engine could power a multipronged, multistep marketing campaign (a code sketch follows the list). The engine:
- Analyzes data to identify target audiences and their preferences
- Uses abductive (best-guess) reasoning to segment audiences and personalize messages
- Orchestrates a campaign around the best channels and steps for the different segments
- Monitors performance and reallocates resources when necessary
- Reports on performance and strategy, identifying trends for future projects
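Here’s that flow as a toy Python pipeline. Every function is a stub standing in for an LLM- or data-backed action; the point is the chaining, where each step’s output becomes the next step’s input.

```python
# Hedged sketch: each function is a stub for an LLM- or data-backed action.
# The reasoning engine's real job is deciding which step runs next and why.
def analyze_audience(data):  return {"segments": ["retro-fans", "new-parents"]}
def personalize(segments):   return {s: f"Offer tuned for {s}" for s in segments}
def launch(messages):        return {"channels": ["email", "social"], "messages": messages}
def monitor(campaign):       return {"email": 0.42, "social": 0.18}  # stubbed click rates
def report(metrics):         return max(metrics, key=metrics.get)

state = analyze_audience(data={"source": "crm_export"})   # step 1: find audiences
messages = personalize(state["segments"])                 # step 2: tailor messages
campaign = launch(messages)                               # step 3: pick channels
metrics = monitor(campaign)                               # step 4: watch performance
print("Channel to double down on:", report(metrics))      # step 5: report back
```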
Agentforce and Salesforce’s Atlas Reasoning Engine represent significant steps forward, enabling this functionality. Shipra Gupta, senior director of product management at Salesforce, recently described some of these advancements:
Reasoning and Acting (ReAct)-style prompting: the system loops through reasoning, acting, and observing until a user’s goal is realized. The looping lets the system consider new information and ask clarifying questions so the goal is fulfilled accurately.
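In code, the reason-act-observe cycle is compact. The sketch below is a stand-in under stated assumptions: the call_llm helper and the single tool are invented for illustration, and this is not the Atlas Reasoning Engine’s actual interface.

```python
# Minimal ReAct-style loop. call_llm() stands in for a real LLM call that
# returns either a tool invocation or a final answer; here it is hard-coded
# so the example runs on its own.
def call_llm(transcript: str) -> dict:
    if "Observation" not in transcript:                     # nothing observed yet
        return {"type": "action", "tool": "lookup_order", "input": "1042"}
    return {"type": "finish", "answer": "Order 1042 ships Friday."}

TOOLS = {"lookup_order": lambda order_id: f"Order {order_id} status: packed"}

def react(goal: str, max_steps: int = 5) -> str:
    transcript = f"Goal: {goal}"
    for _ in range(max_steps):
        step = call_llm(transcript)                         # Reason
        if step["type"] == "finish":
            return step["answer"]
        observation = TOOLS[step["tool"]](step["input"])    # Act
        transcript += f"\nAction: {step['tool']}\nObservation: {observation}"  # Observe
    return "Step budget exhausted before the goal was met."

print(react("When will order 1042 arrive?"))
```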
Topic classification maps a user’s request to a topic that represents the intent, or job to be done. Each topic contains a set of instructions, business policies, and actions to fulfill the request, which defines the scope of the task and the corresponding solution for the LLM.
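A minimal sketch of that mapping might look like the following; the topic schema and keyword classifier are hypothetical simplifications (a production system would classify with an LLM and enforce the policies), but the core idea holds: one topic bundles the instructions, policies, and permitted actions that scope the LLM’s work.

```python
# Hypothetical topic registry: each topic scopes what the agent may do.
TOPICS = {
    "order_status": {
        "instructions": "Look up the order and summarize its status.",
        "policies": ["Never reveal another customer's data."],
        "actions": ["lookup_order", "draft_reply"],
    },
    "refund_request": {
        "instructions": "Check eligibility before promising anything.",
        "policies": ["Refunds over $500 require human approval."],
        "actions": ["check_refund_policy", "escalate_to_human"],
    },
}

def classify(request: str) -> str:
    # Stand-in classifier: keyword match where a real system would use an LLM.
    return "refund_request" if "refund" in request.lower() else "order_status"

topic = TOPICS[classify("I'd like a refund for my leg warmers")]
print(topic["instructions"], "| allowed actions:", topic["actions"])
```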
Using LLMs for responses dramatically changes how the AI talks to you. Instead of giving only rigid, action-focused answers, it now understands context, just like a colleague. That lets you ask follow-up questions or request more detail, making it much easier to get what you need.
Agents will be so intelligent that they’ll know when such high-level reasoning is overkill. Anthropic’s cofounder and chief scientist Jared Kaplan told MIT Technology Review that agents will know when to apply reasoning “when it’s really useful and important, but also [know not to waste] time when it’s not necessary.”
5. Agents will remember everything that matters
Today’s AI agents suffer from short-term memory, and that’s a problem.
Antonio Ginart, lead applied scientist for AI Research at Salesforce, described it this way: Imagine that you jotted down some notes from your workday on sticky notes, and the next day, the only things you remembered were the few things you wrote down.
“That’s kind of what it’s like now between sessions with AI,” he said.
Long-term coherence (or memory) means agents can recall and understand what happened in past interactions over long periods of time, not just the most recent exchanges. This memory provides the context to carry out current tasks. It’s important not only for handling multistep tasks, but also for building better relationships with customers by recalling past preferences, problems, and interactions.
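Here’s a minimal sketch of the mechanism, assuming a plain JSON file as the memory store (a real platform would use a database or a vector store with semantic retrieval): facts written in one session survive the restart and are fed back into the prompt in the next.

```python
# Toy long-term memory: facts persist across sessions via a JSON file.
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")

def recall(customer_id: str) -> list:
    # Load everything remembered about this customer from prior sessions.
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text()).get(customer_id, [])
    return []

def remember(customer_id: str, fact: dict) -> None:
    store = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    store.setdefault(customer_id, []).append(fact)
    MEMORY_FILE.write_text(json.dumps(store, indent=2))

# Session 1: the agent learns a preference and writes it down.
remember("cust-42", {"type": "preference", "value": "contact by SMS only"})

# Session 2 (a new process, days later): the memory survives the restart
# and can be prepended to the prompt so the agent never re-asks.
print("Known facts:", recall("cust-42"))
```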
True long-term coherence is not yet a fully solved problem in all scenarios, but that’s changing fast.
Imagine an AI agent that could follow a patient’s care over months or even years, tracking symptoms, appointments, test results, and treatments without ever losing context. It could remind the patient to refill a prescription, flag when a symptom changes, alert doctors, and adjust recommendations as the patient’s health evolves, with no repetition and no starting over.
Long-term coherence will also help teams of agents collaborate more effectively by sharing what they know, allowing for better cross-functional work. One example: A legal agent and a logistics agent could work together to onboard a new partner without duplicating requests or missing any steps.
The future of AI agents: Multi-agent, multimodal, massive impact
As impressive as AI agents are today, what lies ahead is far more transformative: intelligent, coordinated AI agents operating across functions to drive efficiency, insight, and growth at scale. Specialized agents will uncover actionable insights from vast datasets. Scalable agents will absorb peak demand without compromise. And orchestrated agents will manage operations seamlessly across the enterprise.
Multimodal AI will further amplify this shift, allowing agents to understand and act on a mix of inputs like text, voice, images, and video, just as humans do. This will result in more natural interactions, richer insights, and broader applications.
The organizations that embrace this shift early will be best positioned to lead: faster to innovate, quicker to respond, and better equipped to deliver value.