Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this episode, Anastasios Angelopoulos returns to Latent Space to recap Arena's 2025 journey, from building LMArena in a Berkeley basement to raising $100M and becoming the de facto leaderboard for frontier AI models. (01:36) The conversation covers Arena's origin story as an academic project incubated by Anjney Midha at a16z, the decision to spin out as a company to reach the necessary scale, and how the $100M raise is being spent, primarily on inference costs and on migrating the platform from Gradio to React. (03:38) Anastasios addresses the controversial "The Leaderboard Illusion" paper that critiqued Arena, explaining how Arena's response exposed the paper's factual errors and misrepresentations. (10:18) The discussion also explores Arena's massive scale of 250M+ conversations, its expansion into occupational verticals and multimodal capabilities, and the viral "Nano Banana" moment that shifted Google's market share overnight and validated the economic importance of multimodal AI models.
Co-founder of Arena (formerly LMArena), originally part of the LMSYS research group at Berkeley. Angelopoulos helped grow what became the industry's most trusted AI model leaderboard from a basement project into a company that has raised $100M and handles tens of millions of conversations per month. He has extensive experience in machine learning evaluation and statistical analysis, particularly around response bias correction and benchmark integrity.
Anastasios emphasizes that Arena's public leaderboard operates as a "charity" and loss leader, maintaining strict independence from financial influence. (17:23) Model providers cannot pay to get onto the leaderboard or to get off it, and scores reflect millions of real votes from actual users. This principle keeps the leaderboard an unbiased North Star for the industry rather than a pay-to-play system like those of traditional analyst firms. Prioritizing real-world user feedback over manufactured benchmarks builds lasting trust and credibility in an industry where evaluation standards are constantly questioned.
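For readers curious how "millions of real votes" can become a ranking: Arena has publicly described fitting a Bradley-Terry model to pairwise head-to-head votes. The Python below is a minimal sketch under that assumption, not Arena's production code. The helper name `fit_bt_ratings`, the gradient-ascent fitting, the small L2 penalty, and the Elo-style display constants (a 1000 baseline, 400 points per tenfold change in odds) are all illustrative choices, and the sketch omits refinements such as the response bias correction mentioned in the bio and confidence intervals.

```python
import math


def fit_bt_ratings(votes, lr=0.5, epochs=2000, l2=1e-3, base=1000.0, scale=400.0):
    """Hypothetical Bradley-Terry fit, for illustration only.

    votes: list of (winner, loser) model-name pairs; ties are not handled.
    Model: P(i beats j) = 1 / (1 + exp(-(theta_i - theta_j))).
    Fitted by gradient ascent on the mean log-likelihood plus a small
    L2 penalty, which keeps strengths finite even for undefeated models.
    """
    models = sorted({m for pair in votes for m in pair})
    idx = {m: k for k, m in enumerate(models)}
    theta = [0.0] * len(models)  # log-strengths, all equal at start
    n = len(votes)
    for _ in range(epochs):
        # Gradient of the penalty -(l2 / 2) * sum(theta ** 2)
        grad = [-l2 * t for t in theta]
        for winner, loser in votes:
            i, j = idx[winner], idx[loser]
            p_win = 1.0 / (1.0 + math.exp(theta[j] - theta[i]))
            # Derivative of log P(winner beats loser) w.r.t. each strength
            grad[i] += (1.0 - p_win) / n
            grad[j] -= (1.0 - p_win) / n
        theta = [t + lr * g for t, g in zip(theta, grad)]
    # Updates are symmetric, so sum(theta) stays 0 and the mean rating is `base`.
    # Elo-style display: `scale` points per factor-of-10 change in win odds.
    return {m: base + scale * theta[idx[m]] / math.log(10) for m in models}


# Toy data (made up): A beats B 3-1, B beats C 3-1, A beats C 3-1.
votes = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
ratings = fit_bt_ratings(votes)
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.0f}")
```

Fitting all votes jointly, rather than updating an online Elo score after each battle, makes the result independent of the order in which votes arrive, which matters when battles are collected continuously from many users.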
The decision to spin out from Berkeley's LMSYS group came from recognizing that only a company structure could provide the resources necessary to scale Arena's mission. (02:47) Academic projects and nonprofits lacked the funding and operational capabilities needed to handle tens of millions of monthly conversations and maintain platform quality. This pragmatic approach enabled Arena to secure $100M in funding primarily for inference costs and technical infrastructure, demonstrating that impact-driven missions sometimes require commercial structures to achieve their goals.
Arena's competitive advantage lies in users submitting their own real-world use cases rather than models being evaluated on pre-generated scenarios. (07:07) This organic approach provides a level of realism that distinguishes Arena from competitors who rely on synthetic benchmarks or consultant-generated test cases. The platform captures authentic user intent and shows how models perform on the actual problems people are trying to solve, creating more actionable insights for model developers and users alike.
Despite having tens of millions of users, Anastasios recognizes that consumer loyalty is fragile and temporary. (21:10) Users can leave at any moment, making retention a daily challenge that requires constant value delivery. Simple features like sign-in and persistent conversation history became major retention drivers, showing how basic user experience improvements can have outsized impact. This mindset of earning users every single day prevents complacency and drives continuous improvement in a competitive consumer landscape.
The viral "Nano Banana" moment demonstrated how multimodal capabilities, particularly image generation, drive billions in market value and fundamentally change competitive dynamics. (13:20) What initially seemed like a non-essential AI capability proved economically critical for marketing, design, and content creation use cases. This shift in perspective highlights how consumer-facing AI features can have enterprise implications and how seemingly frivolous capabilities often unlock massive business value when they meet real user needs.