NVIDIA AI Podcast • November 18, 2025

How AI Data Platforms Are Shaping the Future of Enterprise Storage - Ep. 281

Jacob Lieberman discusses NVIDIA's AI Data Platform, a GPU-accelerated storage solution that transforms enterprise data management by enabling AI-ready data processing directly in storage systems without copying or moving data.
AI & Machine Learning • Data Science & Analytics • B2B SaaS Business • Noah Kravitz • Jacob Lieberman • NVIDIA • AI Data Platform • Interview

Summary Sections

  • Podcast Summary
  • Speakers
  • Key Takeaways
  • Statistics & Facts
  • Compelling Stories (Premium)
  • Thought-Provoking Quotes (Premium)
  • Strategies & Frameworks (Premium)
  • Similar Strategies (Plus)
  • Additional Context (Premium)
  • Key Takeaways Table (Plus)
  • Critical Analysis (Plus)
  • Books & Articles Mentioned (Plus)
  • Products, Tools & Software Mentioned (Plus)

Timestamps are as accurate as possible but may be slightly off. We encourage you to listen to the episode for full context.

Podcast Summary

In this episode, Jacob Lieberman, Director of Enterprise Product Management at NVIDIA, returns to discuss the evolution of AI agent adoption in enterprises and introduces the AI Data Platform. While AI agents have advanced significantly, with open models now matching the power of earlier commercial models, enterprises still face major challenges moving from proof-of-concept to production deployment. (02:04) The core issue is data accessibility: enterprise systems weren't originally designed for AI agents, and the majority of enterprise data remains unstructured and difficult to process. (04:04) Lieberman unveils NVIDIA's AI Data Platform, a GPU-accelerated storage solution that changes how enterprises prepare data for AI by bringing compute to the data rather than moving data to compute, eliminating the security risks and inefficiencies of traditional data pipelines. (09:13)

  • Main themes focus on enterprise AI adoption challenges, data security concerns, and the revolutionary approach of GPU-accelerated storage that enables continuous, in-place data processing for AI readiness.

Speakers

Jacob Lieberman

Jacob Lieberman serves as Director of Enterprise Product Management at NVIDIA, where he focuses on enterprise AI solutions and agent deployment strategies. He specializes in helping organizations transition AI initiatives from proof-of-concept stages to full production deployment, with particular expertise in data platform architecture and GPU-accelerated enterprise solutions.

Noah Kravitz

Noah Kravitz hosts the NVIDIA AI Podcast, where he explores cutting-edge developments in artificial intelligence and their real-world applications. He brings a journalistic approach to technical topics, making complex AI concepts accessible to business leaders and technology professionals.

Key Takeaways

Enterprise AI Agents Need AI-Ready Data for Production Success

While consumer AI agent adoption has flourished, enterprises struggle to move beyond proof-of-concept deployments because their existing systems weren't built for AI agents. (03:23) The fundamental challenge lies in securing access to accurate, recent data, as all AI applications - whether training models, fine-tuning, or retrieval augmented generation - depend entirely on this foundation. Enterprise data is predominantly unstructured (PowerPoint presentations, PDFs, audio, video files) requiring complex transformation pipelines to become "AI-ready" through processes like text extraction, semantic chunking, metadata enrichment, embedding, and vector database indexing.
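
As a minimal sketch of what such a transformation pipeline looks like, the code below walks a set of files through extract, chunk, embed, and metadata enrichment. The helpers are illustrative stand-ins (a real pipeline would plug in PDF/PPTX parsers, a transcriber, and an embedding model), and the output would normally be written to a vector database rather than returned as a list:

```python
from dataclasses import dataclass
from pathlib import Path
import hashlib
import math


@dataclass
class Chunk:
    doc_id: str
    text: str
    vector: list[float]
    metadata: dict


def extract_text(path: Path) -> str:
    # Placeholder: a real pipeline would dispatch to PDF/PPTX parsers or an audio transcriber here.
    return path.read_text(errors="ignore")


def chunk_text(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; semantic chunking would split on headings or topic shifts instead.
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(text: str, dim: int = 32) -> list[float]:
    # Toy deterministic embedding so the sketch runs; a real pipeline uses an embedding model.
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def ingest(paths: list[Path]) -> list[Chunk]:
    chunks = []
    for path in paths:
        for i, piece in enumerate(chunk_text(extract_text(path))):
            chunks.append(Chunk(
                doc_id=str(path),
                text=piece,
                vector=embed(piece),
                metadata={"source": str(path), "chunk": i},  # metadata enrichment
            ))
    return chunks  # these rows would then be written to a vector database index
```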

Data Velocity Creates Continuous Processing Demands

Enterprises face the challenge of "data velocity" - the combined rate at which new data is created plus the rate at which existing data changes. (06:22) This isn't a one-time transformation but requires continuous reprocessing to maintain data accuracy and relevance. Most enterprises lack governance systems to track which specific data has changed, forcing them to reindex entire datasets repeatedly - like rewashing all dishes when you're unsure which ones are dirty. This creates massive inefficiencies and resource drain on data science teams who spend up to 80% of their time on data wrangling rather than actual analysis.
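
One hedged way to keep up with data velocity without rewashing every dish is to track a content hash per file and reprocess only what changed since the last pass. The manifest file and directory layout below are illustrative assumptions, not part of any specific product:

```python
import hashlib
import json
from pathlib import Path

MANIFEST = Path("index_manifest.json")  # hash of each file as of the last indexing run


def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_needing_reindex(corpus_dir: Path) -> list[Path]:
    seen = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
    changed = []
    for path in sorted(corpus_dir.rglob("*")):
        if not path.is_file():
            continue
        digest = file_hash(path)
        if seen.get(str(path)) != digest:  # new file, or content changed since last pass
            changed.append(path)
            seen[str(path)] = digest
    MANIFEST.write_text(json.dumps(seen, indent=2))
    return changed  # only these files go back through extract/chunk/embed
```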

Data Copying Creates Security Vulnerabilities and Governance Gaps

Traditional AI data preparation requires copying data multiple times through processing pipelines, creating significant security risks and governance challenges. (08:38) Each copy increases the attack surface, and when data is moved away from source systems, it becomes disconnected from permission changes and content updates. If an employee loses access to a document, they can still access all the AI-processed copies scattered across systems. Lieberman notes that enterprises typically end up with 7-13 copies of the same dataset across their data centers, all disconnected from the authoritative source of truth.
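
To illustrate why copies drift from source permissions, the small sketch below re-checks the authoritative access-control list at query time instead of trusting processed copies; the ACL lookup is a hypothetical stand-in for a real identity or permissions system:

```python
def allowed(user: str, doc_id: str, acl: dict[str, set[str]]) -> bool:
    # Ask the authoritative source system, not a copy, whether access is still granted.
    return user in acl.get(doc_id, set())


def filter_hits(query_hits: list[dict], user: str, acl: dict[str, set[str]]) -> list[dict]:
    # query_hits are vector-search results that carry the source doc_id in their metadata.
    return [hit for hit in query_hits if allowed(user, hit["doc_id"], acl)]
```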

GPU-Accelerated Storage Enables In-Place Data Processing

NVIDIA's AI Data Platform reference design revolutionizes enterprise data management by bringing GPUs directly into storage systems rather than sending data to external processing. (11:55) This approach leverages "data gravity" - the principle that large, growing datasets are expensive and difficult to move. By processing data where it lives, enterprises can perform continuous AI preparation as background operations while maintaining security and governance controls. The GPU handles the entire pipeline - data discovery, text extraction, chunking, embedding, vector indexing, and semantic search - without creating vulnerable copies or disconnecting from source permissions.
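
The sketch below gives a rough sense of "bringing compute to the data": an embedding job running on a GPU co-located with the storage volume, reading files in place. The library and model names (sentence-transformers, all-MiniLM-L6-v2) are illustrative assumptions, not the AI Data Platform's actual stack:

```python
from pathlib import Path

from sentence_transformers import SentenceTransformer  # assumed: pip install sentence-transformers


def embed_in_place(mount_point: str, pattern: str = "*.txt") -> dict:
    # The storage volume is mounted locally; files are read where they live, nothing is copied out.
    model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")  # runs on the storage-resident GPU
    paths = sorted(Path(mount_point).rglob(pattern))
    texts = [p.read_text(errors="ignore") for p in paths]
    vectors = model.encode(texts, batch_size=64, show_progress_bar=False)
    # In a full pipeline these vectors would land in a vector index that also lives on the storage system.
    return {str(p): v for p, v in zip(paths, vectors)}
```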

AI Agents Can Work Directly Within Storage Systems

Beyond data preparation, storage-resident GPUs have sufficient compute capacity to run AI agents directly within the storage infrastructure, enabling what Lieberman calls "letting AI agents work from home." (19:41) These agents can perform sophisticated tasks like identifying documents that should be classified but aren't marked as such, or monitoring storage system telemetry to provide optimization recommendations to administrators. This approach provides agents with a controlled, secure environment where they understand the APIs, capabilities, and operating system, similar to how human workers often prefer the controlled environment of working from home.
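
As a toy version of one storage-resident agent task mentioned here, the sketch below flags documents that look sensitive but carry no classification marking; simple keyword heuristics stand in for whatever model a real agent would run:

```python
from pathlib import Path

SENSITIVE_HINTS = ("confidential", "internal only", "do not distribute")  # illustrative heuristics
MARKINGS = ("classified", "restricted")


def unmarked_sensitive_docs(mount_point: str) -> list[str]:
    flagged = []
    for path in Path(mount_point).rglob("*.txt"):
        text = path.read_text(errors="ignore").lower()
        looks_sensitive = any(hint in text for hint in SENSITIVE_HINTS)
        already_marked = any(mark in text for mark in MARKINGS)
        if looks_sensitive and not already_marked:
            flagged.append(str(path))  # candidate for review and reclassification
    return flagged
```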

Statistics & Facts

  1. Data scientists spend up to 80% of their time wrangling and preparing data for AI rather than performing actual data science work. (15:29) This statistic highlights the massive inefficiency in current enterprise AI workflows where highly skilled professionals are bogged down in data preparation tasks.
  2. A typical enterprise generates 7-13 copies of the same dataset scattered across their data centers as AI chatbots and agents proliferate. (09:15) These copies become disconnected from source of truth documents, creating security vulnerabilities and governance gaps.
  3. The vast majority of enterprise data is unstructured, consisting of formats like PowerPoint presentations, PDFs, audio files, and videos that cannot be easily queried with traditional database methods. (04:46) This presents a fundamental challenge for AI systems that need structured, searchable data inputs.

Compelling Stories

Available with a Premium subscription

Thought-Provoking Quotes

Available with a Premium subscription

Strategies & Frameworks

Available with a Premium subscription

Similar Strategies

Available with a Plus subscription

Additional Context

Available with a Premium subscription

Key Takeaways Table

Available with a Plus subscription

Critical Analysis

Available with a Plus subscription

Books & Articles Mentioned

Available with a Plus subscription

Products, Tools & Software Mentioned

Available with a Plus subscription

More episodes like this

  • Figma CEO: From Idea to IPO, Design at Scale and AI’s Impact on Creativity (In Good Company with Nicolai Tangen, January 14, 2026)
  • BTC257: Bitcoin Mastermind Q1 2026 w/ Jeff Ross, Joe Carlasare, and American HODL (Bitcoin Podcast) (We Study Billionaires - The Investor’s Podcast Network, January 14, 2026)
  • Rory Sutherland on why luck beats logic in marketing (Uncensored CMO, January 14, 2026)
  • How to Make Billions from Exposing Fraud | E2234 (This Week in Startups, January 13, 2026)