
Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
This episode features the team behind FAL, a developer platform powering generative video and image models at scale. Founders Görkem Yurtseven and Burkay Gur, along with Head of Engineering Batuhan Taskaya, explain why generative media was initially overlooked despite video comprising 80% of internet bandwidth. (02:07) They discuss how video models present fundamentally different optimization challenges from LLMs: they are compute-bound rather than memory-bound because they denoise thousands of tokens simultaneously across multiple inference steps. (09:29) The team shares insights on running over 600 models simultaneously, often faster than the labs that trained them, and explores demand from AI-native studios, personalized education, programmatic advertising, and early Hollywood engagement.
Co-founder of FAL, focusing on the infrastructure and business strategy for generative media platforms. He identified the early opportunity in generative video when others were focused on LLMs, leading FAL's positioning as a developer-first platform for accessing hundreds of video and image models.
Co-founder of FAL with expertise in distributed computing and infrastructure. He has been instrumental in building FAL's globally distributed GPU fleet across 35+ data centers and developing the orchestration systems that enable running 600+ models simultaneously.
Head of Engineering at FAL and a Python core maintainer since age 14. At 22, he was one of the youngest core maintainers of the Python language. He leads FAL's inference engine development, including their tracing compiler and custom kernel optimizations that achieve top performance benchmarks.
While others were obsessing over LLM optimizations, FAL dedicated their entire engineering focus to generative media models. (11:36) This laser focus allowed them to build specialized tracing compilers and custom kernels that achieve superior performance on video models. Batuhan emphasized that they typically stay 3-6 months ahead of competitors, and that this advantage comes purely from dedicated focus rather than unique techniques.
Unlike autoregressive LLMs, which are memory-bandwidth constrained because they move large weights for every generated token, video models are compute-bound because they denoise tens of thousands of tokens simultaneously. (09:29) This fundamental difference requires completely different optimization approaches, focusing on kernel efficiency and compute saturation rather than the bandwidth-oriented techniques, such as speculative decoding, used in LLM inference.
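A rough back-of-the-envelope comparison makes the contrast concrete. The parameter counts, token counts, and fp16 assumption below are illustrative choices, not figures from the episode:

```python
# Back-of-the-envelope arithmetic intensity (FLOPs per byte of weights read).
# All model sizes and token counts below are illustrative assumptions.

def arithmetic_intensity(params: float, tokens_per_pass: int) -> float:
    """Approximate FLOPs per weight byte for one forward pass.

    A dense forward pass does roughly 2 * params FLOPs per token, while the
    weights (assumed fp16, 2 bytes each) must be read once per pass.
    """
    flops = 2 * params * tokens_per_pass
    weight_bytes = 2 * params
    return flops / weight_bytes

# Autoregressive LLM decoding: each forward pass re-reads all weights to emit
# a single new token, so the GPU mostly waits on memory bandwidth.
llm = arithmetic_intensity(params=70e9, tokens_per_pass=1)

# Diffusion video model: each denoising step processes the entire latent
# (tens of thousands of tokens) at once, so the GPU is busy doing math.
video = arithmetic_intensity(params=10e9, tokens_per_pass=30_000)

print(f"LLM decode:    ~{llm:.0f} FLOPs per weight byte")
print(f"Video denoise: ~{video:,.0f} FLOPs per weight byte")
```

A modern accelerator needs on the order of a few hundred FLOPs per byte of memory traffic before compute, rather than bandwidth, becomes the bottleneck, which is why the two workloads end up on opposite sides of that line and call for different optimizations.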
FAL's data shows that the half-life of a top-5 video model is just 30 days, creating a constantly shifting competitive landscape. (27:06) This rapid evolution means customers use an average of 14 different models simultaneously, requiring infrastructure that can adapt quickly to new architectures while maintaining consistent performance across the entire model catalog.
Unlike text models where fine-tuning differences are often imperceptible, visual models benefit significantly from customization because aesthetic preferences are subjective and highly specific. (33:32) This creates fertile ground for fine-tuning, LoRAs, and adapters, making open source models more valuable in generative media than in text generation where general-purpose models dominate.
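To illustrate why lightweight adapters fit this use case, here is a minimal LoRA-style linear layer in PyTorch. The rank, scaling, and initialization are generic textbook choices, not details of FAL's or any particular model's implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a low-rank update: y = Wx + (alpha / r) * B(Ax).

    A minimal sketch of the adapter idea; dimensions and hyperparameters are
    illustrative assumptions.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the original weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: training starts at the base model
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Only A and B are trained, so an aesthetic- or style-specific adapter is a
# few megabytes that can be swapped per customer instead of a full model copy.
layer = LoRALinear(nn.Linear(1024, 1024))
print(layer(torch.randn(2, 1024)).shape)  # torch.Size([2, 1024])
```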
Real professional workflows don't use single text-to-video prompts but instead chain multiple specialized models together - text-to-image for storyboarding, upscalers for quality enhancement, and image-to-video for animation. (38:05) This mirrors traditional film production pipelines and explains why having access to diverse model types is more valuable than having access to a single "best" model.
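A chained pipeline of this kind might look like the following sketch using FAL's Python client. The endpoint IDs, argument names, and result fields are assumptions chosen for readability and should be checked against the actual model documentation on fal.ai:

```python
# Sketch of a storyboard -> upscale -> animate pipeline on FAL.
# Endpoint IDs, argument names, and result fields are assumptions for
# illustration; consult each model's documentation for the exact schema.
import fal_client

# 1. Text-to-image: generate a storyboard frame.
frame = fal_client.subscribe(
    "fal-ai/flux/dev",  # assumed text-to-image endpoint
    arguments={"prompt": "a rainy neon street at night, cinematic wide shot"},
)
frame_url = frame["images"][0]["url"]

# 2. Upscaler: enhance the chosen frame before animating it.
upscaled = fal_client.subscribe(
    "fal-ai/esrgan",  # assumed upscaling endpoint
    arguments={"image_url": frame_url},
)
upscaled_url = upscaled["image"]["url"]

# 3. Image-to-video: animate the approved, upscaled frame.
clip = fal_client.subscribe(
    "fal-ai/kling-video/v1/standard/image-to-video",  # assumed endpoint
    arguments={"image_url": upscaled_url, "prompt": "slow push-in, rain falling"},
)
print(clip["video"]["url"])
```

Each stage can be swapped independently as better models appear, which is the practical reason a broad catalog matters more to these workflows than any single model.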