
Timestamps are approximate and may be slightly off. We encourage you to listen to the episode for full context.
In this episode, Alessio and Swyx sit down with Gorkem and Batuhan from FAL to explore how their company became the dominant force in generative media infrastructure. They reveal how their early insight about Stable Diffusion's terrible GPU utilization (04:27) led to a crucial pivot from a general Python runtime to specialized inference, eventually attracting 2 million developers and scaling past $100 million in annual revenue. The conversation covers their aggressive optimization strategy built on custom CUDA kernels, the explosive growth of video models to 50% of their revenue, and their partnerships with major AI labs such as Google DeepMind for video models like Veo-3 (06:06), offering both technical leaders and ambitious builders a blueprint for riding the generative AI wave while staying ahead of commoditization.
Co-founder of FAL (formerly Features and Labels), a generative media platform serving 2 million developers and powering over $100M in annual revenue. Former Amazon engineer who pivoted from building feature stores to becoming a leader in AI inference optimization, specializing in custom CUDA kernels and GPU performance engineering for diffusion models.
Head of Engineering at FAL and a core developer of the Python programming language (CPython). Joined FAL in 2021, before its seed round, bringing deep expertise in developer tools and Python runtime optimization. Now leads the engineering effort behind custom kernels, distributed systems, and the infrastructure serving thousands of H100 GPUs across 24 data centers.
Founder of Kernel Labs and co-host of the Latent Space podcast. Previously worked in venture capital and has extensive experience in AI infrastructure and developer tools.
Founder of Smol AI and co-host of the Latent Space podcast. Former Temporal and Netlify engineer, known for his insights on AI engineering and developer experience.
When faced with the choice between competing against Google, OpenAI, and Anthropic in language models and becoming a leader in a fast-growing niche like generative media, choose the niche. (09:08) Language model hosting is a losing battle when you're ultimately competing against Google, which can offer search-integrated models for free because search is existential to its business. Generative media was a net-new market with no incumbent whose core business model it threatened.
Write custom kernels and optimize inference engines when the open source community hasn't caught up yet. (10:20) At Stable Diffusion 1.5's launch, basic convolutions on A100s achieved only 30% GPU utilization because nobody had bothered to optimize them. That created a 10x performance gap that translated directly into a revenue advantage. The key insight: new architectures and chips constantly open optimization gaps that justify a dedicated performance team.
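To make that utilization number concrete, here is a minimal sketch (our illustration, not FAL's tooling; assumes PyTorch on a CUDA GPU) that times a Stable Diffusion-sized convolution and compares achieved throughput against the A100's dense FP16 peak:

```python
import torch

def conv_utilization(batch=2, channels=320, hw=64, k=3, iters=50):
    # A conv layer roughly shaped like one in the SD 1.5 UNet.
    x = torch.randn(batch, channels, hw, hw, device="cuda", dtype=torch.float16)
    conv = torch.nn.Conv2d(channels, channels, k, padding=1).cuda().half()

    # FLOPs per forward pass: 2 * output_elements * (in_channels * k * k).
    out_elems = batch * channels * hw * hw
    flops = 2 * out_elems * channels * k * k

    for _ in range(10):  # warm-up so timing excludes one-off setup costs
        conv(x)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        conv(x)
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1000 / iters  # elapsed_time is in ms

    achieved = flops / seconds / 1e12
    peak = 312  # A100 dense FP16 tensor-core peak in TFLOPS; adjust per GPU
    print(f"{achieved:.1f} TFLOPS, {achieved / peak:.0%} of peak")

conv_utilization()
```

Numbers far below peak on a stock build are exactly the kind of gap the custom-kernel work closed.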
Every 10% improvement in generation speed correlates directly with user engagement and monetization, mirroring Amazon's page-load-time studies. (16:38) One customer's A/B test showed that artificially slowing down FAL's image generation significantly reduced user engagement and image creation volume. In generative media, users can't stream responses the way they can with LLMs; they need the full artifact, which makes latency optimization critical for retention.
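The slowdown experiment is simple to picture in code. Below is a hypothetical sketch of such an A/B test; `generate_image` and `log_event` are placeholder names for the inference call and analytics pipeline, not real fal APIs:

```python
import hashlib
import time

def bucket(user_id: str) -> str:
    # Deterministic 50/50 split so a user always lands in the same arm.
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return "slow" if digest % 2 == 0 else "control"

def handle_request(user_id: str, prompt: str):
    arm = bucket(user_id)
    if arm == "slow":
        time.sleep(1.0)  # artificial delay layered on top of real inference
    image = generate_image(prompt)              # placeholder inference call
    log_event(user_id, arm, "image_generated")  # feeds the engagement metric
    return image
```

Comparing images generated per user across the two arms is what surfaced the engagement drop described above.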
Recruit directly from the community Discord servers and Twitter Spaces where people are already building obsessively with your technology. (58:50) FAL's best applied ML engineers came from people already training LoRAs on the platform, posting viral Hugging Face Spaces, and creating innovative workflows. These builders have proven product intuition and domain expertise that traditional hiring pipelines miss entirely.
When you dominate inference for one model type, expand into related technical challenges rather than building a general-purpose platform. (22:30) FAL evolved from optimizing Stable Diffusion into building distributed file systems, container runtimes, CDNs, and content moderation, all specialized for generative media workloads across 24 data centers and 6 cloud providers. Each adjacent problem leveraged their core competency while avoiding the generic-platform trap.