
Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this fascinating episode, Alex Shulman discusses his transition from head of AI training at Every to launching GoodStart Labs, a company at the intersection of AI and games. The conversation explores how Alex and his co-founder Tyler built an AI version of Diplomacy that garnered massive attention: 50,000 unique viewers on Twitch and millions of social media impressions. (02:58) Their work revealed distinct personalities across different AI models, with some becoming master schemers while others remained stubbornly honest.
Alex is the head of AI training at Every, where he leads consulting and training for clients ranging from construction firms to Fortune 500 companies. He previously co-founded AI Camp in 2021, teaching people to fine-tune GPT-2, and worked at Salt, a startup that pivoted three times before finding product-market fit in drug discovery. He also has experience at Amazon Robotics and has collaborated with the Ellison Medical Institute on life sciences applications.
Dan is the co-founder of Every, a publication focused on AI and business strategy. He conducts in-depth interviews exploring the intersection of technology, business, and human creativity, with particular expertise in evaluating AI models and their practical applications.
Unlike traditional benchmarks, which can be gamed or trained against, dynamic game environments like Diplomacy expose authentic model behaviors and distinct personalities. (06:07) Alex discovered that o3 and Claude 4 became master schemers, forming coalitions and strategically betraying allies, while Gemini 2.5 Pro excelled at execution and understanding game mechanics. Meanwhile, Claude consistently lost because it was too honest and kept pushing for draws. This reveals that games function as both evaluation tools and training environments, providing richer insights than static evaluations, which models can easily saturate.
The effectiveness of AI models in complex tasks depends heavily on three infinite problem spaces: information representation, tool access, and prompt design. (12:18) Alex found dramatic performance differences when models used optimized prompts versus baseline ones, with GPT-5 showing the biggest improvement of any model. This suggests that working with language models is more like playing an instrument or practicing an art than pure engineering, requiring intuition and experimentation to reach local maxima in performance.
Games provide low-stakes environments where AI can explore, make mistakes, and learn without real-world consequences. (25:49) Research shows that vision models trained on games like Snake, represented as a Cartesian coordinate system, improved more on math problems than models trained directly on math. This suggests that rich, challenging game environments can help models develop transferable skills and generalization capabilities that extend beyond the specific game context.
People who actively use AI tools become less fearful because they understand both capabilities and limitations, while those who don't adopt them remain fearful and angry. (36:04) Alex observed that games make AI more relatable and less scary - viewers could watch models make mistakes, use suboptimal strategies, and occasionally succeed brilliantly. This hands-on exposure through engaging formats like games can bridge the growing knowledge gap and help people develop realistic expectations about AI capabilities.
Rather than replacing junior workers, AI dramatically accelerates learning curves for motivated young professionals. (53:13) Dan observed how Alex progressed from poor writing to exceptional output within months by recording conversations, creating prompts, and never repeating a mistake. This suggests that companies avoiding young hires out of AI fears are making a strategic error: a 23-year-old with AI tools and mentorship can achieve unprecedented results and rapidly develop expertise that would normally take years.