MINDSTORES: Memory-Informed Neural Decision Synthesis for Task-Oriented Reinforcement in Embodied Systems

1a37.ai

MINDSTORES agent constructing a sophisticated base in Minecraft, demonstrating complex structural planning.

MINDSTORES agent showcasing advanced hunting mechanics by tracking and harvesting resources from a pig.

MINDSTORES agent efficiently gathering wood resources, displaying tool selection and environmental interaction capabilities.

Abstract

While large language models (LLMs) have shown promising capabilities as zero-shot planners for embodied agents, their inability to learn from experience and build persistent mental models limits their robustness in complex open-world environments like Minecraft. We introduce MINDSTORES, an experience-augmented planning framework that enables embodied agents to build and leverage mental models through natural interaction with their environment.

Drawing inspiration from how humans construct and refine cognitive mental models, our approach extends existing zero-shot LLM planning by maintaining a database of past experiences that informs future planning iterations. The key innovation is representing accumulated experiences as natural language embeddings of (state, task, plan, outcome) tuples, which can then be efficiently retrieved and reasoned over by an LLM planner to generate insights and guide plan refinement for novel states and tasks. Through extensive experiments in MineDojo, a Minecraft-based simulation environment that provides low-level agent controls, we find that MINDSTORES learns and applies its knowledge significantly better than existing memory-based LLM planners while maintaining the flexibility and generalization benefits of zero-shot approaches. This represents an important step toward more capable embodied AI systems that can learn continuously through natural experience.

Introduction

MINDSTORES is a cutting-edge AI framework that enhances large language model (LLM) planning with experiential learning. Inspired by human cognition, MINDSTORES enables embodied agents to build and refine mental models by storing and analyzing past interactions. By leveraging a structured memory system, it allows AI systems to improve decision-making in open-world environments, such as Minecraft, through continuous adaptation. This approach bridges the gap between zero-shot planning and lifelong learning, bringing us closer to more autonomous and intelligent embodied agents.

MINDSTORES agent mining coal in a ravine.

MINDSTORES High Level Tasks

Using the MineCLIP model, MINDSTORES can perform high-level MT5+ tasks such as building a base, mining iron, and more.

Architecture

We introduce MINDSTORES, an experience-driven AI framework that enhances large language model (LLM) planning with memory-informed decision-making, enabling embodied agents to adapt and learn continuously in open-world environments like Minecraft. MINDSTORES is built upon three core components: (1) an experience database that accumulates and retrieves past interactions to refine future plans, (2) a semantic retrieval mechanism that efficiently selects relevant experiences to inform decision-making, and (3) an iterative planning and outcome prediction module that refines strategies based on past successes and failures.
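To make the first two components concrete, the sketch below shows one way an experience database and semantic retrieval could be structured. This is an illustrative simplification, not the paper's implementation: the `Experience` dataclass, the `ExperienceDB` class, and the bag-of-words `embed` function are all hypothetical stand-ins (MINDSTORES uses learned natural-language embeddings of (state, task, plan, outcome) tuples).

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class Experience:
    """One (state, task, plan, outcome) tuple from a past interaction."""
    state: str
    task: str
    plan: str
    outcome: str

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a learned
    # sentence-embedding model over the natural-language tuple.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ExperienceDB:
    """Accumulates experiences and retrieves the k most relevant ones."""
    def __init__(self):
        self.entries = []  # list of (embedding, Experience)

    def add(self, exp: Experience):
        key = f"{exp.state} {exp.task} {exp.plan} {exp.outcome}"
        self.entries.append((embed(key), exp))

    def retrieve(self, state: str, task: str, k: int = 3):
        # Rank stored experiences by similarity to the current (state, task).
        query = embed(f"{state} {task}")
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]),
                        reverse=True)
        return [exp for _, exp in ranked[:k]]
```

The retrieved experiences would then be serialized into the LLM planner's context, letting it reason over past successes and failures when drafting a new plan.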

MINDSTORES architecture diagram showing the interaction between experience database, semantic retrieval, and planning modules
Detailed architectural overview of MINDSTORES showing the integration of memory systems with LLM-based planning

Real-time Architecture Example

MINDSTORES real-time architecture diagram showing task planning for crafting iron boots

The MINDSTORES architecture leverages a large language model (LLM) to enhance decision-making in dynamic environments like Minecraft. It integrates a structured memory system that stores past experiences, allowing the model to refine its plans based on previous successes and failures. This system dynamically updates its strategies by observing the current state and retrieving relevant experiences from its database. The architecture supports iterative planning, where the LLM generates an initial plan, observes the environment, and adjusts its actions accordingly. This approach enables the model to learn continuously, adapt to new challenges, and improve task execution over time, demonstrating significant advancements in long-horizon planning and open-world decision-making.
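The plan-observe-adjust cycle described above can be sketched as a simple loop. All interface names here (`observe`, `act`, `llm_plan`, the dict-based `memory` list, and the word-overlap retrieval) are assumptions for illustration; they stand in for the environment API, the LLM planner, and the embedding-based retrieval of the actual system.

```python
def run_episode(task, observe, act, llm_plan, memory, max_iters=5):
    """Iterative planning loop: plan, execute, record outcome, refine."""
    state = observe()
    for _ in range(max_iters):
        # Retrieve past experiences whose stored state shares words with the
        # current context (a stand-in for embedding-based semantic retrieval).
        ctx = set(f"{state} {task}".split())
        relevant = [m for m in memory if ctx & set(m["state"].split())]
        # The LLM planner conditions on the current state, the task, and the
        # retrieved (state, task, plan, outcome) records.
        plan = llm_plan(state, task, relevant)
        outcome = act(plan)
        # Persist this attempt so future planning iterations can learn from it.
        memory.append({"state": state, "task": task,
                       "plan": plan, "outcome": outcome})
        if outcome == "success":
            return plan
        state = observe()  # re-observe and try again with richer memory
    return None
```

A failed first attempt thus becomes a retrievable experience: on the next iteration the planner sees the recorded failure and can propose a revised plan, which is the mechanism behind the continuous improvement described above.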

Performance Analysis

Success Rate vs. k Value for Crafting Tasks

Graph showing success rates vs k value for different crafting tasks

This analysis examines the relationship between success rate and k-value (the number of experiences retrieved from the knowledge store) across various Minecraft crafting tasks. The results show varying levels of success depending on task complexity: Iron Boots achieves the highest success rate (>30%) at higher k values, while Torch and Iron Pickaxe show moderate improvement (≈15%). Minecart crafting reaches about 10% success at k=20, while Diamond acquisition remains challenging, with success rates below 5%.

Novel Learning Iterations Across Tasks

Bar chart comparing learning iterations across different tasks

Comparing MINDSTORES against Voyager and Reflexion, this analysis reveals the efficiency of learning across various Minecraft tasks. For basic tasks like Mine Wood and Mine Cobblestone, all methods perform similarly with minimal iterations. However, MINDSTORES demonstrates superior efficiency on complex tasks, particularly Mine Iron, where it requires significantly fewer learning iterations than existing approaches.

Comparison with DEPS Architecture

Comparison results with DEPS architecture

Results demonstrate consistent improvements over DEPS, an LLM-based planning architecture for Minecraft, highlighting MINDSTORES' stronger performance across various metrics and task complexities.

Conclusion

In this work, we introduce MINDSTORES, an experience-augmented planning framework that enables embodied agents to build and refine mental models through continuous interaction with their environment. By leveraging a structured memory system and GPT-4 for in-context reasoning, MINDSTORES allows agents to learn from past experiences, adapt strategies dynamically, and generalize insights across novel tasks.

Our approach demonstrates superior performance in long-horizon planning, task completion, and open-world decision-making within Minecraft. MINDSTORES represents a significant step toward developing embodied AI systems that can continuously learn, reason, and act without requiring explicit model finetuning.

BibTeX

@misc{chari2025mindstoresmemoryinformedneuraldecision,
      title={MINDSTORES: Memory-Informed Neural Decision Synthesis for Task-Oriented Reinforcement in Embodied Systems}, 
      author={Anirudh Chari and Suraj Reddy and Aditya Tiwari and Richard Lian and Brian Zhou},
      year={2025},
      eprint={2501.19318},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2501.19318}, 
}