
Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this episode of Latent Space, Lance Martin from LangChain discusses the emerging field of context engineering, which he defines as the challenge of feeding language models just the right context at each step in multi-tool agent workflows. (01:01) The conversation covers five key categories of context engineering: offloading context to external systems, reducing context through summarization, retrieval strategies, context isolation via multi-agent architectures, and caching techniques.
Lance Martin works at LangChain on LangGraph, focusing on agent development and AI engineering best practices. He's the creator of OpenDeepResearch, which, according to benchmarks, is one of the best-performing open-source deep research agents. Lance has been building agents for over a year and regularly shares tutorials and insights on context engineering techniques through courses and blog posts.
Alessio is the founder of Kernel Labs and co-host of the Latent Space podcast. He brings extensive experience in AI engineering and has been closely involved with the LangChain ecosystem, speaking at AI Engineer conferences.
Swyx is the founder of Smol AI and co-host of the Latent Space podcast. He's credited with coining the term "AI Engineer" and is known for his insights into AI industry trends and developer tooling.
While prompt engineering focuses on crafting the right human message for chat models, context engineering addresses the more complex challenge of managing context flow in agent workflows, where information comes from many tool calls over extended trajectories. (02:18) Lance explains that agents typically involve 50-100+ tool calls, with production agents reaching hundreds. This creates rapidly growing context that must be carefully managed to avoid overflowing the context window and degrading performance. The key insight is that naive agents that simply append every tool call result to the message history quickly become expensive and unreliable.
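To make the failure mode concrete, here is a minimal sketch of such a naive agent loop; it is not code from the episode, and `call_llm` and `run_tool` are hypothetical stand-ins for a chat-model call and a tool executor. The point is simply that every raw tool result is appended to the history, so later steps pay for everything that came before:

```python
# Minimal sketch of a naive agent loop (illustrative only).
# call_llm and run_tool are hypothetical stand-ins for a chat-model call
# and a tool executor.

def naive_agent(task: str, max_steps: int = 100) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(messages)            # model decides: answer or call a tool
        if reply["type"] == "final_answer":
            return reply["content"]
        result = run_tool(reply["tool"], reply["args"])
        # Raw tool output (possibly tens of thousands of tokens) is appended
        # verbatim, so context grows with each of the 50-100+ calls.
        messages.append({"role": "assistant", "content": str(reply)})
        messages.append({"role": "tool", "content": result})
    return "step budget exhausted"
```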
One of the most effective techniques is offloading token-heavy tool call results to external systems like file systems or agent state objects, rather than passing full context back to the LLM. (06:24) Instead of sending raw tool output, you provide summaries or URLs that let the agent fetch the full context on demand. Lance demonstrates this with his OpenDeepResearch agent, where he carefully prompts summarization models to produce exhaustive bullet points that preserve recall while compressing content. This approach dramatically reduces token costs: his research agent went from $1-2 per run to far cheaper operation.
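A rough sketch of the offloading pattern might look like the following; `fetch_page` and `summarize` are hypothetical helpers rather than Lance's OpenDeepResearch implementation. The full result is written to disk (or agent state), and only a compact summary plus a pointer goes back into the model's context:

```python
# Hedged sketch of offloading token-heavy tool results (assumed helpers).
import hashlib
from pathlib import Path

SCRATCH = Path("scratch")
SCRATCH.mkdir(exist_ok=True)

def search_and_offload(url: str) -> dict:
    full_text = fetch_page(url)                      # token-heavy raw result (hypothetical helper)
    doc_path = SCRATCH / f"{hashlib.sha1(url.encode()).hexdigest()}.txt"
    doc_path.write_text(full_text)                   # offload full content to the file system
    summary = summarize(full_text)                   # exhaustive bullet points, tuned for recall
    # Only the summary and a handle are returned into the agent's context;
    # a separate read_offloaded tool can pull the full text on demand.
    return {"summary": summary, "path": str(doc_path), "source": url}

def read_offloaded(path: str) -> str:
    return Path(path).read_text()
```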
The debate between single-agent and multi-agent architectures comes down to the nature of the task. (10:04) Multi-agent setups work exceptionally well for "read-only" tasks like research gathering, where sub-agents collect information in parallel and writing happens in a single step at the end. However, for "write-heavy" tasks like coding, where sub-agents must make coordinated decisions, conflicts arise when trying to compile results. Lance emphasizes using multi-agent setups only for easily parallelizable problems and avoiding scenarios where agents must communicate complex state changes.
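For the read-only case, the fan-out can be sketched roughly as below; `run_research_agent` and `write_report` are assumed LLM-backed functions, not any specific framework API. Each sub-agent keeps its own isolated context, and the single write step at the end avoids the coordination conflicts that make multi-agent setups fragile for write-heavy tasks:

```python
# Sketch of the "read-only" multi-agent fan-out (assumed helper functions).
import asyncio

async def research_subtopic(subtopic: str) -> str:
    # Each sub-agent runs with its own isolated context window and tools.
    return await asyncio.to_thread(run_research_agent, subtopic)

async def deep_research(question: str, subtopics: list[str]) -> str:
    # Sub-agents gather information in parallel and never write shared state.
    notes = await asyncio.gather(*(research_subtopic(s) for s in subtopics))
    # Writing happens once, in a single step, over the collected notes.
    return write_report(question, notes)
```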
Traditional vector store indexing isn't always necessary: simple agentic search using basic file tools can be surprisingly effective. (15:47) Lance's benchmarks comparing different retrieval approaches for LangGraph documentation showed that an "llms.txt" file with good descriptions plus simple file loading tools often outperformed complex semantic search pipelines. This aligns with Anthropic's Claude Code approach, which uses no indexing and relies purely on agentic search with tools like grep. The key is providing good metadata descriptions so the agent knows what to retrieve.
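A hedged sketch of this agentic-retrieval setup, assuming an llms.txt index of URLs with one-line descriptions; the helper names and file format details here are illustrative, not LangChain APIs:

```python
# Sketch of agentic retrieval without a vector store (illustrative helpers).
import urllib.request

def load_llms_txt(path: str = "llms.txt") -> str:
    # The whole index fits in context: each line pairs a URL with a short
    # description, so good descriptions are what let the agent pick pages.
    with open(path) as f:
        return f.read()

def fetch_doc(url: str) -> str:
    # A plain HTTP fetch stands in for a "read this page" tool.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

# Given load_llms_txt and fetch_doc as tools, the agent reads the index,
# picks relevant URLs from the descriptions, and pulls only those pages,
# instead of querying a semantic search pipeline.
```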
When building on rapidly improving models, avoid over-structuring systems in ways that may bottleneck future capabilities. (43:02) Lance shares his year-long journey building OpenDeepResearch, where initial structured workflows became obstacles as models improved. He started with rigid, non-tool-calling workflows because "everyone knew tool calling wasn't reliable" in early 2024. As models advanced, this structure prevented him from leveraging better tool calling and MCP integration. The lesson: add the minimal structure today's models need while keeping the flexibility to remove constraints as capabilities grow.