
Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this episode of Latent Space, Lance Martin from LangChain discusses the emerging field of context engineering, which he defines as the challenge of feeding language models just the right context at each step in multi-tool agent workflows. (01:01) The conversation covers five key categories of context engineering: offloading context to external systems, reducing context through summarization, retrieval strategies, context isolation via multi-agent architectures, and caching techniques.
Lance Martin works at LangChain on LangGraph, focusing on agent development and AI engineering best practices. He's the creator of OpenDeepResearch, which, according to benchmarks, is one of the best-performing open-source deep research agents. Lance has been building agents for over a year and regularly shares tutorials and insights on context engineering techniques through courses and blog posts.
Alessio is the founder of Kernel Labs and co-host of the Latent Space podcast. He brings extensive experience in AI engineering and has been closely involved with the LangChain ecosystem, speaking at AI Engineer conferences.
Swyx is the founder of Smol AI and co-host of the Latent Space podcast. He's credited with coining the term "AI Engineer" and is known for his insights into AI industry trends and developer tooling.
While prompt engineering focuses on crafting the right human message for chat models, context engineering addresses the more complex challenge of managing context flow in agent workflows, where information comes from many tool calls over extended trajectories. (02:18) Lance explains that agents typically involve 50-100+ tool calls, with production agents reaching hundreds. This creates rapidly growing context that must be carefully managed to avoid overflowing the context window and degrading performance. The key insight is that naive agents that simply append every tool call result to the message history quickly become expensive and unreliable.
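To make the failure mode concrete, here is a minimal sketch of such a naive agent loop; it is not code from the episode, and `call_llm` and `run_tool` are hypothetical stand-ins for a chat-model call and a tool executor. The point is simply that every raw tool result is appended to the history, so later steps pay for everything that came before:

```python
# Minimal sketch of a naive agent loop (illustrative only).
# call_llm and run_tool are hypothetical stand-ins for a chat-model call
# and a tool executor.

def naive_agent(task: str, max_steps: int = 100) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(messages)            # model decides: answer or call a tool
        if reply["type"] == "final_answer":
            return reply["content"]
        result = run_tool(reply["tool"], reply["args"])
        # Raw tool output (possibly tens of thousands of tokens) is appended
        # verbatim, so context grows with each of the 50-100+ calls.
        messages.append({"role": "assistant", "content": str(reply)})
        messages.append({"role": "tool", "content": result})
    return "step budget exhausted"
```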
One of the most effective techniques is offloading token-heavy tool call results to external systems like file systems or agent state objects, rather than passing full context back to the LLM. (06:24) Instead of sending raw tool output, you provide summaries or URLs that let the agent fetch the full context on demand. Lance demonstrates this with his OpenDeepResearch agent, where he carefully prompts summarization models to produce exhaustive bullet points that preserve recall while compressing content. This approach dramatically reduces token costs: his research agent went from $1-2 per run to far cheaper operation.
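A rough sketch of the offloading pattern might look like the following; `fetch_page` and `summarize` are hypothetical helpers rather than Lance's OpenDeepResearch implementation. The full result is written to disk (or agent state), and only a compact summary plus a pointer goes back into the model's context:

```python
# Hedged sketch of offloading token-heavy tool results (assumed helpers).
import hashlib
from pathlib import Path

SCRATCH = Path("scratch")
SCRATCH.mkdir(exist_ok=True)

def search_and_offload(url: str) -> dict:
    full_text = fetch_page(url)                      # token-heavy raw result (hypothetical helper)
    doc_path = SCRATCH / f"{hashlib.sha1(url.encode()).hexdigest()}.txt"
    doc_path.write_text(full_text)                   # offload full content to the file system
    summary = summarize(full_text)                   # exhaustive bullet points, tuned for recall
    # Only the summary and a handle are returned into the agent's context;
    # a separate read_offloaded tool can pull the full text on demand.
    return {"summary": summary, "path": str(doc_path), "source": url}

def read_offloaded(path: str) -> str:
    return Path(path).read_text()
```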
The debate between single-agent and multi-agent architectures comes down to the nature of the task. (10:04) Multi-agent setups work exceptionally well for "read-only" tasks like research gathering, where sub-agents collect information in parallel and writing happens in a single step at the end. However, for "write-heavy" tasks like coding, where sub-agents must make coordinated decisions, conflicts arise when trying to compile results. Lance emphasizes using multi-agent setups only for easily parallelizable problems and avoiding scenarios where agents must communicate complex state changes.
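For the read-only case, the fan-out can be sketched roughly as below; `run_research_agent` and `write_report` are assumed LLM-backed functions, not any specific framework API. Each sub-agent keeps its own isolated context, and the single write step at the end avoids the coordination conflicts that make multi-agent setups fragile for write-heavy tasks:

```python
# Sketch of the "read-only" multi-agent fan-out (assumed helper functions).
import asyncio

async def research_subtopic(subtopic: str) -> str:
    # Each sub-agent runs with its own isolated context window and tools.
    return await asyncio.to_thread(run_research_agent, subtopic)

async def deep_research(question: str, subtopics: list[str]) -> str:
    # Sub-agents gather information in parallel and never write shared state.
    notes = await asyncio.gather(*(research_subtopic(s) for s in subtopics))
    # Writing happens once, in a single step, over the collected notes.
    return write_report(question, notes)
```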
Traditional vector store indexing isn't always necessary: simple agentic search using basic file tools can be surprisingly effective. (15:47) Lance's benchmarks comparing different retrieval approaches for LangGraph documentation showed that an "llms.txt" file with good descriptions plus simple file loading tools often outperformed complex semantic search pipelines. This aligns with Anthropic's Claude Code approach, which uses no indexing and relies purely on agentic search with tools like grep. The key is providing good metadata descriptions so the agent knows what to retrieve.
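A hedged sketch of this agentic-retrieval setup, assuming an llms.txt index of URLs with one-line descriptions; the helper names and file format details here are illustrative, not LangChain APIs:

```python
# Sketch of agentic retrieval without a vector store (illustrative helpers).
import urllib.request

def load_llms_txt(path: str = "llms.txt") -> str:
    # The whole index fits in context: each line pairs a URL with a short
    # description, so good descriptions are what let the agent pick pages.
    with open(path) as f:
        return f.read()

def fetch_doc(url: str) -> str:
    # A plain HTTP fetch stands in for a "read this page" tool.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

# Given load_llms_txt and fetch_doc as tools, the agent reads the index,
# picks relevant URLs from the descriptions, and pulls only those pages,
# instead of querying a semantic search pipeline.
```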
When building on rapidly improving models, avoid over-structuring systems in ways that may bottleneck future capabilities. (43:02) Lance shares his year-long journey building OpenDeepResearch, where initial structured workflows became obstacles as models improved. He started with rigid, non-tool-calling workflows because "everyone knew tool calling wasn't reliable" in early 2024. As models advanced, this structure prevented him from leveraging better tool calling and MCP integration. The lesson: add the minimal structure today's models need while keeping the flexibility to remove constraints as capabilities grow.