AI and I • October 16, 2025

We Taught AI to Play Games—Now It’s a $3.6 Million Company

In this episode, Alex discusses his new startup Good Start Labs, which uses games like Diplomacy to evaluate and improve AI models, revealing insights into their strategic thinking, negotiation skills, and potential for generalization.
AI & Machine Learning
Tech Policy & Ethics
Developer Culture
Dan Shipper
Alex Shulman
Tyler (co-founder)
Mark Bargava
Ben Vetter

Summary Sections

  • Podcast Summary
  • Speakers
  • Key Takeaways
  • Statistics & Facts
  • Compelling Stories (Premium)
  • Thought-Provoking Quotes (Premium)
  • Strategies & Frameworks (Premium)
  • Similar Strategies (Plus)
  • Additional Context (Premium)
  • Key Takeaways Table (Plus)
  • Critical Analysis (Plus)
  • Books & Articles Mentioned (Plus)
  • Products, Tools & Software Mentioned (Plus)

Timestamps are as accurate as possible but may be slightly off; we encourage you to listen to the episode for full context.

Podcast Summary

In this fascinating episode, Alex Shulman discusses his transition from head of AI training at Every to launching Good Start Labs, a company at the intersection of AI and games. The conversation explores how Alex and his co-founder Tyler built an AI version of Diplomacy that garnered massive attention: 50,000 unique viewers on Twitch and millions of social media impressions. (02:58) Their work revealed distinct personalities across different AI models, with some becoming master schemers while others remained stubbornly honest.

  • Main themes include using games as dynamic evaluation tools for AI models, the potential for games to improve AI training through synthetic data generation, and how play-based learning environments can bridge the gap between human understanding and AI capabilities

Speakers

Alex Shulman

Alex is the head of AI training at Every, where he leads consulting and training for clients ranging from construction firms to Fortune 500 companies. He previously co-founded AI Camp in 2021, teaching people to fine-tune GPT-2, and worked at Salt, a startup that pivoted three times before finding product-market fit in drug discovery. He also has experience at Amazon Robotics and has collaborated with the Ellison Medical Institute on life sciences applications.

Dan Shipper

Dan is the co-founder of Every, a publication focused on AI and business strategy. He conducts in-depth interviews exploring the intersection of technology, business, and human creativity, with particular expertise in evaluating AI models and their practical applications.

Key Takeaways

Games Reveal AI Model Personalities Better Than Static Tests

Unlike traditional benchmarks that can be gamed or taught to the test, dynamic game environments like Diplomacy expose authentic model behaviors and distinct personalities. (06:07) Alex discovered that o3 became a master schemer, forming coalitions and betraying allies strategically, while Gemini 2.5 Pro excelled at execution and understanding game mechanics. Claude, meanwhile, consistently lost because it was too honest and kept pushing for draws. Games thus function as both evaluation tools and training environments, providing richer insights than static evaluations that models can easily saturate.

Prompt Engineering Is an Infinite Art Form With Model-Specific Optimization

The effectiveness of AI models on complex tasks depends heavily on three effectively infinite problem spaces: how information is represented, which tools the model can access, and how the prompt is designed. (12:18) Alex found dramatic performance differences when models used optimized prompts versus baseline ones, with GPT-5 showing the biggest improvement of any model. This suggests that working with language models is more like playing an instrument or practicing an art than pure engineering, requiring intuition and experimentation to reach local maxima in performance.

AI Training Through Play Enables Safer Exploration and Generalization

Games provide low-stakes environments where AI can explore, make mistakes, and learn without real-world consequences. (25:49) Research shows that vision models trained on games like Snake, represented as a Cartesian coordinate system, improved more on math problems than models trained directly on math. This suggests that rich, challenging game environments can help models develop transferable skills and generalization capabilities that extend beyond the specific game context.

The Human-AI Knowledge Gap Requires Hands-On Experience

People who actively use AI tools become less fearful because they understand both capabilities and limitations, while those who don't adopt them remain fearful and angry. (36:04) Alex observed that games make AI more relatable and less scary: viewers could watch models make mistakes, use suboptimal strategies, and occasionally succeed brilliantly. This hands-on exposure through engaging formats like games can bridge the growing knowledge gap and help people develop realistic expectations about AI capabilities.

Young Professionals With AI Are Exponentially More Capable

Rather than replacing junior workers, AI dramatically accelerates learning curves for motivated young professionals. (53:13) Dan observed how Alex progressed from poor writing to exceptional output within months by recording conversations, creating prompts, and never repeating mistakes. This suggests companies avoiding young hires due to AI fears are making a strategic error: a 23-year-old with AI tools and mentorship can achieve unprecedented results and rapidly develop expertise that would normally take years.

Statistics & Facts

  1. Good Start Labs' AI Diplomacy game attracted 50,000 unique viewers on Twitch in one week and generated millions of social media impressions, and the accompanying write-up became the most-read Every article of the year. (00:30)
  2. DeepSeek R1 performed competitively in Diplomacy games while being approximately 100 times cheaper than o3, demonstrating significant cost-performance advantages in certain AI applications. (08:27)
  3. AlphaFold reduced protein folding research from six years of PhD-level work to thirty minutes, representing a dramatic acceleration in life sciences research capabilities. (27:01)
