The Breakthrough of Agentic Memory in LLM Design


Understanding Agentic Memory in LLMs

How do you create a large language model (LLM) that autonomously decides what to remember long-term, what to use for short-term context, and what to discard? Researchers from Alibaba Group and Wuhan University have introduced a framework called Agentic Memory (AgeMem) that addresses exactly this. The framework allows LLM agents to learn to manage both types of memory through a single policy, eliminating the need for manual heuristics or additional controllers.

In AgeMem, the model uses built-in memory tools integrated into its action space to determine when to store, retrieve, summarize, and forget information.

The Challenges Facing Current LLMs

Most existing frameworks treat long-term and short-term memory management as separate systems, which leads to several limitations. Long-term memory typically holds user profiles, task details, and past interactions, while short-term memory manages the active context, which consists of ongoing dialogue and relevant documents.

Here are some key issues with the traditional approaches:

  • Independent Optimization: Long-term and short-term memories are optimized separately, which prevents the two systems from coordinating.
  • Fragile Heuristics: Manual rules dictate when to write to memory or summarize content, often overlooking rare yet significant events.
  • Increased Complexity: The use of additional controllers or expert models complicates the system and raises costs.

How AgeMem Revolutionizes Memory Management

AgeMem redefines how memory functions within LLMs by integrating memory operations directly into the agent’s policy, eliminating the reliance on external controllers.

Memory Tools in Agentic Memory

In AgeMem, memory operations are treated as tools that the model can use. At any interaction step, the model has the option to generate standard text tokens or invoke these tools. The framework comprises six key tools for managing memory:

  • ADD: Stores new memory items along with their metadata.
  • UPDATE: Modifies existing memory entries.
  • DELETE: Removes items that are outdated or of low value.
  • RETRIEVE: Performs semantic searches on long-term memory, injecting relevant items into the current context.
  • SUMMARY: Condenses segments of dialogue into concise summaries.
  • FILTER: Removes irrelevant context segments that won’t aid future reasoning.
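The six tools above can be sketched as a small memory interface exposed in the agent's action space. This is a minimal, illustrative sketch, not the paper's implementation: the class names and signatures are assumptions, and keyword matching stands in for the semantic search that RETRIEVE would actually perform.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    text: str
    metadata: dict


@dataclass
class AgentMemory:
    # Hypothetical stores: long-term items keyed by id, short-term
    # context as a list of dialogue/document segments.
    long_term: dict = field(default_factory=dict)
    context: list = field(default_factory=list)

    def add(self, item_id: str, text: str, **metadata) -> None:
        """ADD: store a new memory item with its metadata."""
        self.long_term[item_id] = MemoryItem(text, metadata)

    def update(self, item_id: str, text: str) -> None:
        """UPDATE: modify an existing entry."""
        self.long_term[item_id].text = text

    def delete(self, item_id: str) -> None:
        """DELETE: remove outdated or low-value items."""
        self.long_term.pop(item_id, None)

    def retrieve(self, query: str, top_k: int = 3) -> list:
        """RETRIEVE: keyword match as a stand-in for semantic search."""
        hits = [m for m in self.long_term.values()
                if query.lower() in m.text.lower()]
        return hits[:top_k]

    def summarize(self, segment_ids: list, summary: str) -> None:
        """SUMMARY: replace dialogue segments with a condensed summary."""
        self.context = [s for i, s in enumerate(self.context)
                        if i not in segment_ids]
        self.context.append(summary)

    def filter(self, keep) -> None:
        """FILTER: drop context segments judged irrelevant."""
        self.context = [s for s in self.context if keep(s)]
```

In AgeMem the policy itself decides when to call each of these; the sketch only shows what each call would do to the two stores.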

Training with Three-Stage Reinforcement Learning

The AgeMem framework employs a unique three-stage reinforcement learning (RL) approach that integrates long-term and short-term memory behavior.

Details of the Three Stages

  1. Stage 1 – Long-Term Memory Construction: The agent interacts with its environment, observing information that may become relevant later. It uses the ADD, UPDATE, and DELETE tools to accumulate knowledge.
  2. Stage 2 – Short-Term Memory Control: Here, the context resets while long-term memory remains intact. The agent must filter out distractor content while retaining useful information through SUMMARY and FILTER.
  3. Stage 3 – Integrated Reasoning: Finally, the agent retrieves information from long-term memory and controls the short-term context to provide answers.
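The three stages can be captured as a simple schedule. The tool names come from the article; the schedule structure and the `enter_stage` helper are illustrative assumptions, not the paper's training code.

```python
STAGES = [
    # Stage 1: build long-term memory; context accumulates normally.
    {"name": "long_term_construction",
     "tools": ["ADD", "UPDATE", "DELETE"], "reset_context": False},
    # Stage 2: context resets while long-term memory persists;
    # the agent practices SUMMARY/FILTER against distractor content.
    {"name": "short_term_control",
     "tools": ["SUMMARY", "FILTER"], "reset_context": True},
    # Stage 3: retrieval plus context control to produce final answers.
    {"name": "integrated_reasoning",
     "tools": ["RETRIEVE", "SUMMARY", "FILTER"], "reset_context": True},
]


def enter_stage(stage: dict, context: list, long_term: dict):
    """Apply a stage transition: optionally wipe short-term context,
    always keep long-term memory intact."""
    return ([] if stage["reset_context"] else context), long_term
```

The key design point the schedule makes visible: from Stage 2 onward the residual context is gone, so any earlier observation the agent wants must come back through RETRIEVE.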

This design compels the model to rely on retrieval rather than residual context, effectively mimicking realistic long-term dependencies.

Reward Structure and Optimization Techniques

AgeMem utilizes a novel approach to reward design through a step-wise variant of Group Relative Policy Optimization (GRPO). For each task, multiple trajectories are sampled, and a terminal reward is calculated for each.
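The group-relative part of GRPO can be sketched as follows: for each task a group of trajectories is sampled, and each terminal reward is normalized against the group's mean and standard deviation. This shows the standard GRPO baseline idea, not the paper's exact step-wise formulation.

```python
def grpo_advantages(group_rewards: list) -> list:
    """Normalize each trajectory's terminal reward against the group
    statistics, so better-than-group trajectories get positive advantage."""
    n = len(group_rewards)
    mean = sum(group_rewards) / n
    std = (sum((r - mean) ** 2 for r in group_rewards) / n) ** 0.5
    if std == 0.0:
        return [0.0] * n  # identical rewards carry no learning signal
    return [(r - mean) / std for r in group_rewards]
```

Because the baseline is the group mean, no separate value model is needed, which keeps the training stack simple.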

Components of the Total Reward

  • Task Reward: Evaluates answer quality, scored between 0 and 1 by an LLM judge.
  • Context Reward: Measures the quality of short-term memory processes, such as summarization and relevance preservation.
  • Memory Reward: Assesses long-term memory quality, considering the proportion of high-quality stored items and the relevancy of retrieved items.

Each reward component is weighted equally, ensuring balanced contributions to the learning signal. On top of that, penalties are applied for exceeding dialogue length limits or context overflow.
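The equal weighting and the overflow penalties can be written out as a small function. The component structure follows the description above; the specific limits and penalty size are illustrative assumptions, since the article does not give the paper's coefficients.

```python
def total_reward(task_r: float, context_r: float, memory_r: float,
                 dialogue_turns: int, context_tokens: int,
                 max_turns: int = 20, max_tokens: int = 4096,
                 penalty: float = 0.1) -> float:
    """Equal-weight sum of the three reward components, minus penalties
    for exceeding dialogue-length or context-size limits (values assumed)."""
    reward = (task_r + context_r + memory_r) / 3.0  # equal weighting
    if dialogue_turns > max_turns:
        reward -= penalty  # dialogue-length penalty
    if context_tokens > max_tokens:
        reward -= penalty  # context-overflow penalty
    return reward
```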

Experimental Validation and Results

The research team fine-tuned AgeMem using the HotpotQA training dataset and evaluated its performance across five distinct benchmarks, including:

  • ALFWorld – Text-based embodied tasks
  • SciWorld – Science-themed environments
  • BabyAI – Instruction-following tasks
  • PDDL – Planning tasks
  • HotpotQA – Multi-hop question answering

Metrics for evaluation include success rates for various tasks and the Memory Quality metric, which compares stored memories with supporting facts in HotpotQA.
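One plausible reading of the Memory Quality metric is the fraction of HotpotQA supporting facts that the agent actually stored. The exact formula is an assumption for illustration; the article only says stored memories are compared against supporting facts.

```python
def memory_quality(stored_facts: set, supporting_facts: set) -> float:
    """Hypothetical Memory Quality: coverage of the gold supporting
    facts by the agent's stored memory items."""
    if not supporting_facts:
        return 0.0
    return len(stored_facts & supporting_facts) / len(supporting_facts)
```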

When tested, AgeMem outperformed existing baselines significantly, achieving an average score of 41.96 on the Qwen2.5-7B-Instruct model compared to 37.14 for the best baseline. Likewise, it scored higher on Qwen3-4B-Instruct, with AgeMem reaching 54.31 while the best baseline managed only 45.74.

Key Insights for Future Development

AgeMem offers a vital design principle for developing future LLM agents. Memory shouldn’t be treated as two separate systems but as an integral part of the agent’s learning policy. By converting memory operations into explicit tools and training them alongside language generation, AgeMem teaches agents when to remember, when to forget, and how to manage context more effectively throughout their interactions.

Conclusion

The introduction of Agentic Memory in LLM systems marks an important advancement in how AI agents use memory. By allowing models to autonomously manage their memory, we can expect a new era of more efficient, intelligent, and capable AI applications.

FAQs

What is Agentic Memory?

Agentic Memory is a framework that enables large language models to autonomously manage both long-term and short-term memory through a unified policy.

How does AgeMem improve memory management in LLMs?

AgeMem integrates memory operations directly into the model’s action space, allowing it to decide when to store, retrieve, summarize, or discard information without external controls.

What are the benefits of the three-stage reinforcement learning approach?

This method ensures that long-term memory remains consistent while the model refines its short-term memory management under various scenarios, enhancing its ability to retrieve relevant information.

What kind of tasks can AgeMem handle?

AgeMem has been successfully evaluated on various tasks, including instruction following, planning, and multi-hop question answering, among others.

How does AgeMem compare to other memory frameworks?

In tests, AgeMem has consistently outperformed existing memory frameworks, demonstrating higher success rates and improved memory quality metrics.
