
Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
This episode features the team behind FAL, a developer platform powering generative video and image models at scale. Founders Görkem Yurtseven and Burkay Gur, along with Head of Engineering Batuhan Taskaya, explain why generative media was initially overlooked despite video comprising 80% of internet bandwidth. (02:07) They discuss how video models present fundamentally different optimization challenges from LLMs: they are compute-bound rather than memory-bound because they denoise thousands of tokens simultaneously across multiple inference steps. (09:29) The team shares insights on running over 600 models simultaneously, often faster than the labs that trained them, and explores demand from AI-native studios, personalized education, programmatic advertising, and early Hollywood engagement.
Co-founder of FAL, focusing on the infrastructure and business strategy for generative media platforms. He identified the early opportunity in generative video when others were focused on LLMs, leading FAL's positioning as a developer-first platform for accessing hundreds of video and image models.
Co-founder of FAL with expertise in distributed computing and infrastructure. He has been instrumental in building FAL's globally distributed GPU fleet across 35+ data centers and developing the orchestration systems that enable running 600+ models simultaneously.
Head of Engineering at FAL and a Python core maintainer since age 14. At 22, he was one of the youngest core maintainers of the Python language. He leads FAL's inference engine development, including their tracing compiler and custom kernel optimizations that achieve top performance benchmarks.
While others were obsessing over LLM optimizations, FAL dedicated their entire engineering focus to generative media models. (11:36) This laser focus allowed them to build specialized tracing compilers and custom kernels that achieve superior performance on video models. Batuhan emphasized that they typically stay 3-6 months ahead of competitors, and that this advantage comes purely from dedicated focus rather than unique techniques.
Unlike autoregressive LLMs, which are memory-bandwidth constrained because they move large weights for every generated token, video models are compute-bound because they denoise tens of thousands of tokens simultaneously. (09:29) This fundamental difference requires completely different optimization approaches, focusing on kernel efficiency and compute saturation rather than the bandwidth-oriented techniques, such as speculative decoding, used in LLM inference.
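A rough back-of-the-envelope comparison makes the contrast concrete. The parameter counts, token counts, and fp16 assumption below are illustrative choices, not figures from the episode:

```python
# Back-of-the-envelope arithmetic intensity (FLOPs per byte of weights read).
# All model sizes and token counts below are illustrative assumptions.

def arithmetic_intensity(params: float, tokens_per_pass: int) -> float:
    """Approximate FLOPs per weight byte for one forward pass.

    A dense forward pass does roughly 2 * params FLOPs per token, while the
    weights (assumed fp16, 2 bytes each) must be read once per pass.
    """
    flops = 2 * params * tokens_per_pass
    weight_bytes = 2 * params
    return flops / weight_bytes

# Autoregressive LLM decoding: each forward pass re-reads all weights to emit
# a single new token, so the GPU mostly waits on memory bandwidth.
llm = arithmetic_intensity(params=70e9, tokens_per_pass=1)

# Diffusion video model: each denoising step processes the entire latent
# (tens of thousands of tokens) at once, so the GPU is busy doing math.
video = arithmetic_intensity(params=10e9, tokens_per_pass=30_000)

print(f"LLM decode:    ~{llm:.0f} FLOPs per weight byte")
print(f"Video denoise: ~{video:,.0f} FLOPs per weight byte")
```

A modern accelerator needs on the order of a few hundred FLOPs per byte of memory traffic before compute, rather than bandwidth, becomes the bottleneck, which is why the two workloads end up on opposite sides of that line and call for different optimizations.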
FAL's data shows that the half-life of a top-5 video model is just 30 days, creating a constantly shifting competitive landscape. (27:06) This rapid evolution means customers use an average of 14 different models simultaneously, requiring infrastructure that can adapt quickly to new architectures while maintaining consistent performance across the entire model catalog.
Unlike text models where fine-tuning differences are often imperceptible, visual models benefit significantly from customization because aesthetic preferences are subjective and highly specific. (33:32) This creates fertile ground for fine-tuning, LoRAs, and adapters, making open source models more valuable in generative media than in text generation where general-purpose models dominate.
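To illustrate why lightweight adapters fit this use case, here is a minimal LoRA-style linear layer in PyTorch. The rank, scaling, and initialization are generic textbook choices, not details of FAL's or any particular model's implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a low-rank update: y = Wx + (alpha / r) * B(Ax).

    A minimal sketch of the adapter idea; dimensions and hyperparameters are
    illustrative assumptions.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the original weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: training starts at the base model
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Only A and B are trained, so an aesthetic- or style-specific adapter is a
# few megabytes that can be swapped per customer instead of a full model copy.
layer = LoRALinear(nn.Linear(1024, 1024))
print(layer(torch.randn(2, 1024)).shape)  # torch.Size([2, 1024])
```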
Real professional workflows don't use single text-to-video prompts but instead chain multiple specialized models together - text-to-image for storyboarding, upscalers for quality enhancement, and image-to-video for animation. (38:05) This mirrors traditional film production pipelines and explains why having access to diverse model types is more valuable than having access to a single "best" model.
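A chained pipeline of this kind might look like the following sketch using FAL's Python client. The endpoint IDs, argument names, and result fields are assumptions chosen for readability and should be checked against the actual model documentation on fal.ai:

```python
# Sketch of a storyboard -> upscale -> animate pipeline on FAL.
# Endpoint IDs, argument names, and result fields are assumptions for
# illustration; consult each model's documentation for the exact schema.
import fal_client

# 1. Text-to-image: generate a storyboard frame.
frame = fal_client.subscribe(
    "fal-ai/flux/dev",  # assumed text-to-image endpoint
    arguments={"prompt": "a rainy neon street at night, cinematic wide shot"},
)
frame_url = frame["images"][0]["url"]

# 2. Upscaler: enhance the chosen frame before animating it.
upscaled = fal_client.subscribe(
    "fal-ai/esrgan",  # assumed upscaling endpoint
    arguments={"image_url": frame_url},
)
upscaled_url = upscaled["image"]["url"]

# 3. Image-to-video: animate the approved, upscaled frame.
clip = fal_client.subscribe(
    "fal-ai/kling-video/v1/standard/image-to-video",  # assumed endpoint
    arguments={"image_url": upscaled_url, "prompt": "slow push-in, rain falling"},
)
print(clip["video"]["url"])
```

Each stage can be swapped independently as better models appear, which is the practical reason a broad catalog matters more to these workflows than any single model.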