
Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this inaugural episode of the AI for Science podcast, Andrew White, co-founder of Future House and Edison Scientific, shares his journey from academia to pioneering the automation of scientific discovery. White traces his path from studying molecular dynamics as a PhD student to becoming a red teamer for GPT-4 and creating ChemCrow, the first chemistry LLM agent, which triggered White House briefings and meetings with three-letter agencies. (10:05)
Co-founder of Future House (a focused research organization) and Edison Scientific (a venture-backed startup focused on automating science). Former tenured professor at the University of Rochester who resigned in June 2024 to focus on AI for science full-time. White holds a PhD in molecular dynamics from the University of Washington and was a red teamer for GPT-4, work that led to the creation of ChemCrow and, in turn, to White House briefings and national security discussions.
Co-founder and CTO of MiraOmics, where they build AI models and services for single-cell, spatial transcriptomics, and pathology slide analysis. Co-host of the AI for Science podcast on the Latent Space Network.
Builds AI systems for RNA drug discovery at Atomic AI. Co-host of the AI for Science podcast, focusing on bridging the gap between AI engineering and scientific research communities.
White emphasizes that the constraint in scientific automation isn't the intelligence of AI models, but rather the physical execution of experiments and access to laboratory resources. (18:24) The bottleneck often comes down to practical issues like knowing lead times on reagents, what's available in the lab, and costs - not whether GPT-5 is smarter than GPT-4. This insight led to their Robin system, where agents propose experiments, humans execute them, and agents analyze results to propose the next experiment in an iterative loop.
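The Robin-style loop described above can be sketched in a few lines. Everything here is illustrative: the function names (`propose_experiment`, `execute_in_lab`, `analyze_results`) are hypothetical stubs, not Future House's actual API; the point is only the shape of the cycle, with agents on either side of a human-executed experiment.

```python
def propose_experiment(context, history):
    """Agent step: pick the next experiment given what is known so far (stub)."""
    return f"experiment-{len(history) + 1} given: {context}"

def execute_in_lab(proposal):
    """Human step: physically run the experiment and return raw data (stub)."""
    return f"data from {proposal}"

def analyze_results(proposal, results):
    """Agent step: distill raw results into an updated understanding (stub)."""
    return f"insight from {results}"

def run_cycle(context, max_rounds=3):
    """Agents propose, humans execute, agents analyze -- then repeat."""
    history = []
    for _ in range(max_rounds):
        proposal = propose_experiment(context, history)
        results = execute_in_lab(proposal)
        analysis = analyze_results(proposal, results)
        history.append((proposal, results, analysis))
        context = analysis  # the next proposal builds on what was just learned
    return history
```

Note that the rate-limiting step is `execute_in_lab` (reagent lead times, instrument access, cost), which is exactly why a smarter `propose_experiment` alone doesn't speed up the loop.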
When Future House attempted to train models on human preferences for scientific hypotheses, they discovered humans focus on tone, actionability, and specific facts rather than the fundamental question: "If this hypothesis is true or false, how does it change the world?" (20:26) This led them to pivot from direct human feedback to end-to-end feedback loops: humans click on or download discoveries, and that signal rolls up to hypothesis quality - a more natural way to capture scientific value.
White's approach prioritizes generating many hypotheses quickly and then filtering them through literature search and data analysis, rather than trying to generate perfect hypotheses from the start. (26:47) As he puts it, "if you can't be smarter, you can try more times." This strategy proved successful in their Robin paper on macular degeneration, where the hypothesis that experts thought was best wasn't the one that led to success - but their systematic approach discovered Ripasudil as an effective therapeutic through ROCK inhibitor mechanisms.
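The "try more times" strategy is essentially over-generate, then filter. A minimal sketch, with the caveat that the scoring and both filter functions are placeholder stand-ins (here random scores and thresholds) for what would really be literature search and data analysis:

```python
import random

def generate_hypotheses(n, seed=0):
    """Cheaply generate many candidates; scores are random stand-ins here."""
    rng = random.Random(seed)
    return [{"id": i, "plausibility": rng.random()} for i in range(n)]

def passes_literature_check(h):
    """Placeholder for a literature-search filter."""
    return h["plausibility"] > 0.5

def passes_data_check(h):
    """Placeholder for a data-analysis filter."""
    return h["plausibility"] > 0.8

def best_hypotheses(n=100):
    """'If you can't be smarter, you can try more times': cast a wide net,
    then let cheap-to-run filters do the quality control."""
    candidates = generate_hypotheses(n)
    survivors = [h for h in candidates
                 if passes_literature_check(h) and passes_data_check(h)]
    return sorted(survivors, key=lambda h: h["plausibility"], reverse=True)
```

The design point matches the Robin result: since the expert-favored hypothesis isn't reliably the winner, ranking a large filtered pool beats betting everything on one "perfect" candidate.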
Cosmos uses a "world model" - essentially a distilled memory system like a Git repository for scientific knowledge - that accumulates and synthesizes information over time. (21:57) This breakthrough came from putting data analysis in the feedback loop, not just literature search. The world model allows the system to make predictions, update beliefs based on new evidence, and maintain a practical understanding that evolves - crucial for automating the scientific method's iterative nature.
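The "Git repository for scientific knowledge" idea can be made concrete with a toy belief store. This is a loose sketch of the concept, not Cosmos's implementation: the class name and the simple averaging update rule are both assumptions made for illustration.

```python
class WorldModel:
    """A distilled, versioned memory of beliefs -- loosely analogous to a
    Git repo for scientific knowledge (illustrative sketch only)."""

    def __init__(self):
        self.beliefs = {}   # claim -> confidence in [0, 1]
        self.history = []   # append-only log of updates, like a commit log

    def update(self, claim, evidence_strength):
        """Blend new evidence into the existing belief and record the change.
        (Simple averaging here, not a principled Bayesian update.)"""
        prior = self.beliefs.get(claim, 0.5)
        posterior = 0.5 * prior + 0.5 * evidence_strength
        self.beliefs[claim] = posterior
        self.history.append((claim, prior, posterior))
        return posterior

    def predict(self, claim):
        """Current belief; unknown claims default to maximum uncertainty."""
        return self.beliefs.get(claim, 0.5)
```

The two properties the episode highlights are both visible here: the model makes predictions from accumulated state (`predict`) and revises that state as experiments come back (`update`), which is what lets the loop run iteratively rather than starting from scratch each round.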
White argues that molecular dynamics (MD) and density functional theory (DFT) "have consumed an enormous number of PhDs at the altar of beautiful simulation, but they don't model the world correctly." (40:15) He points to the comparison between AlphaFold and D. E. Shaw Research: D. E. Shaw built custom silicon to run MD at massive scales, while AlphaFold solved protein folding using experimental data on desktop GPUs. Real catalysts have grain boundaries and dopants that are too complex for DFT, demonstrating that machine learning on experimental data often beats first-principles simulation.