
Timestamps are as accurate as possible but may be slightly off. We encourage you to listen to the full episode for context.
At the AI Engineer Conference, we caught up with Bryan and Bill from OpenAI's Codex team right after their launch of Codex Max, a new long-running coding agent designed to work for 24+ hours straight while managing its own context window. (01:27) The discussion reveals how OpenAI is shifting from traditional model training to building agents with distinct personalities that developers can trust. (03:02) Both researchers shared insights on how they train models to exhibit specific behavioral characteristics like communication, planning, and self-checking—essentially turning software engineering best practices into measurable model behaviors. The conversation also explored how the abstraction layer is moving from individual models to complete agent systems that can spawn sub-agents and work in parallel across entire codebases. (12:03)
A key member of OpenAI's Codex training team who worked closely with the GPT-5 training team, focusing on personality development for coding models. He has launched open source projects written entirely by Codex and represents the bleeding edge of agent-first development at OpenAI, where approximately 50% of employees now use Codex daily.
Part of OpenAI's Codex team working on frontier coding model development and agent optimization. He specializes in the technical implementation of long-running coding agents and collaborates closely with coding partners to develop tool integrations and discover model capabilities that weren't initially anticipated by the training team.
OpenAI discovered that trust between developers and coding agents requires specific behavioral characteristics beyond raw capability. (03:02) They identified communication, planning, context gathering, and self-checking as essential personality traits that mirror best software engineering practices. This approach transforms abstract behaviors into measurable training targets, allowing models to act more like trusted colleagues than mere tools. The practical impact is significant: roughly 50% of OpenAI employees now use Codex daily, a shift that took hold once the model learned to communicate its thought process and planning steps.
Codex has developed specific tool preferences through training, such as strongly preferring "rg" (ripgrep) over "grep" for search operations. (07:48) Rather than fighting these preferences, successful implementations work with them by naming tools to match the model's training patterns. Partners discovered that renaming tools to match Codex's terminal-style expectations dramatically improved tool-call performance, demonstrating that understanding and accommodating model habits can unlock better results than trying to force generalization.
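A minimal sketch of that naming pattern in Python, assuming a generic JSON-style tool schema rather than any specific Codex integration; the run_ripgrep helper and both tool definitions are illustrative, and only the contrast between the tool names reflects the point above.

```python
# Sketch only: exposing the same file-search capability under two tool names.
# The schema format and run_ripgrep helper are hypothetical; the idea is that
# naming the tool after the terminal habit the model already has ("rg")
# improves how reliably it gets called.
import subprocess

def run_ripgrep(pattern: str, path: str = ".") -> str:
    """Run ripgrep and return matching lines (assumes rg is installed)."""
    result = subprocess.run(
        ["rg", "--line-number", pattern, path],
        capture_output=True, text=True,
    )
    return result.stdout

# A generic, descriptive name the model rarely reaches for on its own.
search_tool_generic = {
    "name": "search_files",
    "description": "Search the repository for a regex pattern.",
    "parameters": {"pattern": "regex to search for", "path": "directory to search"},
}

# The same capability named to match the model's terminal-style expectations.
search_tool_rg = {
    "name": "rg",
    "description": "ripgrep-style regex search over the repository.",
    "parameters": {"pattern": "regex to search for", "path": "directory to search"},
}
```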
The future of AI development is shifting from optimizing individual model calls to packaging complete agent systems. (12:03) Rather than constantly adapting to new model releases and API changes, developers can build on top of complete agents like Codex that include their own harness, tooling, and behavioral patterns. This allows teams to focus on higher-level integration work while the agent handles the complexity of optimal model usage, sandboxing, and context management internally.
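A rough sketch of what building on a packaged agent looks like, assuming a locally installed Codex-style CLI with a non-interactive exec subcommand; the exact command and arguments are assumptions. The point is that a single delegation call replaces hand-rolled model selection, tool loops, and context management.

```python
# Sketch only: delegate a whole task to a packaged agent instead of
# orchestrating raw model calls. The "codex exec" invocation here is an
# assumed interface, not a documented one.
import subprocess

def delegate_task(task: str, repo_path: str) -> str:
    """Hand a complete task to the agent; it manages model calls, tooling,
    sandboxing, and context internally and returns its final summary."""
    result = subprocess.run(
        ["codex", "exec", task],
        cwd=repo_path,
        capture_output=True,
        text=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(delegate_task("Add unit tests for the date-parsing helpers.", "."))
```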
Codex Max was specifically designed to manage its own context window and spawn sub-agents for parallel work across different parts of a codebase. (14:45) This architecture lets an agent hand off context to specialized sub-agents, so complex problems can be decomposed and solved simultaneously. The practical application extends beyond coding: agents can create custom tools by spinning up Codex instances to write integrations or plugins for specific APIs, making software self-customizable at runtime.
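A minimal sketch of that fan-out pattern, not Codex Max's internal implementation; run_subagent is a hypothetical placeholder for however a sub-agent is actually launched (an API call, a CLI process, and so on), and the subtasks are invented for illustration.

```python
# Sketch only: a parent agent decomposes a goal into independent slices and
# fans them out to sub-agents that work in parallel, each receiving only the
# context it needs.
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str, context: str) -> str:
    """Placeholder for handing a focused slice of work to a sub-agent."""
    return f"[result for: {task}]"

def solve_in_parallel(goal: str, shared_context: str) -> list[str]:
    # Decompose the goal into independent pieces of the codebase.
    subtasks = [
        "Update the API layer for the new endpoint.",
        "Migrate the database schema.",
        "Write integration tests for the new flow.",
    ]
    # Run the sub-agents simultaneously and collect their results.
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        return list(pool.map(lambda t: run_subagent(t, shared_context), subtasks))
```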
OpenAI shifted focus from academic evaluations to "applied evals" that capture real-world use cases and customer needs. (18:03) This approach treats model development like hiring and mentoring, where models need job descriptions (prompts), mentorship (guardrails), and performance reviews (evals) to improve at specific tasks. Multi-turn evaluations using LLM-as-a-judge can assess entire agent trajectories, enabling models to self-improve by reviewing their own performance and updating their instructions for future tasks.
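A minimal sketch of an LLM-as-a-judge pass over a full agent trajectory, using the OpenAI Python client; the rubric wording, judge model name, and scoring format are assumptions, not OpenAI's internal eval harness. The rubric deliberately mirrors the behavioral traits named earlier: communication, planning, context gathering, and self-checking.

```python
# Sketch only: grade an entire multi-turn trajectory, not a single reply.
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "You are reviewing a coding agent's full trajectory. Score 1-5 on each of: "
    "communication, planning, context gathering, and self-checking. "
    "Return one line per criterion as 'name: score - reason'."
)

def judge_trajectory(trajectory: str) -> str:
    """Ask a judge model to review the whole run and return rubric scores."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model; any capable model works
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": trajectory},
        ],
    )
    return response.choices[0].message.content
```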