Timestamps are as accurate as possible but may be slightly off. We encourage you to listen to the episode for full context.
This Stack Overflow podcast episode explores generative AI from a chip-level perspective with Geraint North, a fellow in AI and developer platforms at ARM. The conversation covers how ARM approaches silicon design for AI workloads, the intersection of gaming and generative AI on mobile devices, and the technical challenges of bringing large language models to edge devices. (00:22) North shares insights on ARM's business model of designing CPU architectures and licensing implementations to partners, an approach that gives partners flexibility to customize while ARM works to a five- to six-year development timeline from conception to market. (03:05)
Ryan Donovan is the host of the Stack Overflow podcast and works as an editor for Stack Overflow's blog. He brings curiosity and technical insight to conversations about software and technology trends.
Geraint North is a fellow in AI and developer platforms at ARM, based in Manchester, UK. He has over eleven years of experience at ARM and helped establish its Manchester office, which grew from 9 to 500 employees. Previously, he worked at his startup Transitive, where he contributed to Apple's Rosetta technology for the PowerPC-to-Intel CPU transition, and then at IBM after Transitive was acquired. He specializes in making ARM's technology useful to developers and in solving real-world problems across supercomputing, mobile gaming, and high-performance IoT.
ARM operates on a five- to six-year timeline from technology conception to market deployment, requiring significant foresight into future computing needs. (03:05) North explains that when ARM began working on tools for the Fugaku supercomputer in 2014, the system didn't reach the market until 2020. This extended development cycle means ARM must anticipate technological shifts well in advance, which is particularly challenging in rapidly evolving fields like AI, where the landscape can change dramatically in just three months. The lesson for professionals is to develop strategic thinking that considers long-term trends rather than just immediate market demands, especially when working on foundational technologies or making significant investments.
ARM's business model focuses on creating flexible architectures that partners can customize rather than one-size-fits-all solutions. (04:33) This approach allows silicon partners, device OEMs, and other stakeholders to make design decisions closer to the consumer value point, enabling meaningful differentiation in their products. For professionals, this translates to building systems and strategies that provide optionality rather than rigid solutions, allowing downstream users or customers to adapt offerings to their specific needs while maintaining core functionality.
Large language models are extremely memory-intensive, often making memory bandwidth the limiting factor rather than computational capacity. (11:29) North reveals that on mobile platforms, you typically get similar performance running an LLM on a CPU versus a GPU or NPU because the memory bus and DRAM speed create the bottleneck. This insight challenges the common assumption that more specialized AI processors automatically mean better performance, highlighting the importance of understanding system-level constraints rather than focusing solely on individual component optimization.
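A rough back-of-envelope sketch makes the point concrete. All figures below are illustrative assumptions (not numbers from the episode): if every generated token has to stream the full set of weights from DRAM, memory bandwidth alone caps tokens per second, no matter which processor does the math.

```python
# Back-of-envelope estimate of why LLM token generation is memory-bandwidth-bound.
# All figures here are illustrative assumptions, not numbers from the episode.

def decode_tokens_per_second(params_billions: float,
                             bytes_per_param: float,
                             memory_bw_gb_s: float) -> float:
    """Upper bound on tokens/sec when each generated token must stream the
    full weight set from DRAM (ignores KV-cache traffic and compute time)."""
    model_bytes = params_billions * 1e9 * bytes_per_param
    return memory_bw_gb_s * 1e9 / model_bytes

# Hypothetical phone-class memory bus: ~50 GB/s of LPDDR bandwidth.
for quant, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    ceiling = decode_tokens_per_second(7, bpp, 50)
    print(f"7B model, {quant}: ~{ceiling:.1f} tokens/s ceiling")
```

Under these assumed numbers, a 7B-parameter model tops out at only a handful of tokens per second regardless of whether a CPU, GPU, or NPU runs it, which is why quantization (fewer bytes per parameter) often helps more than extra compute.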
When developing AI applications, particularly for edge devices, the key is determining the minimum model size that can accomplish your specific task rather than starting with the largest available model. (21:31) North describes how small language models with tens of millions of parameters, having the comprehension ability of a small child, can be perfectly adequate for simple game character interactions. This approach of right-sizing AI models for specific use cases can dramatically improve performance and reduce resource requirements, making AI features accessible on a broader range of devices.
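One way to make "right-sizing" tangible is a quick footprint check against a device budget. The candidate model sizes and the 1 GB budget below are illustrative assumptions, not figures from the episode.

```python
# Rough "right-sizing" check: which candidate model fits an edge-device budget?
# Candidate sizes and the 1 GB budget are illustrative assumptions.

def weight_footprint_mb(params_millions: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone (no KV cache, activations, or runtime)."""
    return params_millions * 1e6 * bytes_per_param / 1e6

budget_mb = 1024  # e.g. what a game might be willing to spend on NPC dialogue
candidates = {"tiny (50M params)": 50, "small (500M params)": 500, "7B params": 7000}

for name, params in candidates.items():
    size = weight_footprint_mb(params, bytes_per_param=0.5)  # int4 quantization
    verdict = "fits" if size <= budget_mb else "over budget"
    print(f"{name}: ~{size:.0f} MB of int4 weights -> {verdict}")
```

A tens-of-millions-of-parameters model occupies only a few tens of megabytes at int4, leaving the rest of the budget for the game itself, while a 7B model blows well past it.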
ARM's strategy involves creating "pinch points" where they can carry heavy computational loads for the entire development ecosystem. (25:38) Their KleidiAI library accelerates AI inference engines and is integrated into major frameworks such as ExecuTorch and LiteRT on Android, ensuring developers automatically get optimized performance without additional effort. This demonstrates the value of identifying strategic leverage points where investment in shared infrastructure can amplify benefits across many users and use cases.
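A minimal sketch of that "no additional effort" point, using the TensorFlow Lite / LiteRT Python API with a placeholder model path: the application code is entirely generic, and on Arm hardware the runtime is expected to pick up the optimized kernels (e.g. KleidiAI micro-kernels integrated via XNNPACK) without any changes here.

```python
# Minimal LiteRT / TensorFlow Lite inference sketch (model path is a placeholder).
# Nothing here is Arm-specific: on Arm devices the runtime's optimized kernels
# (e.g. KleidiAI micro-kernels integrated via XNNPACK) are used transparently.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input that matches the model's expected shape and dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

output = interpreter.get_tensor(output_details[0]["index"])
print(output.shape)
```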