Timestamps are as accurate as possible but may be slightly off. We encourage you to listen to the episode for full context.
This Stack Overflow podcast episode explores generative AI from a chip-level perspective with Geraint North, a fellow in AI and developer platforms at ARM. The conversation covers how ARM approaches silicon design for AI workloads, the intersection of gaming and generative AI on mobile devices, and the technical challenges of bringing large language models to edge devices. (00:22) North shares insights on ARM's business model of designing CPU architectures and licensing implementations to partners, an approach that gives partners flexibility to customize while ARM works to a five- to six-year development timeline from conception to market. (03:05)
Ryan Donovan is the host of the Stack Overflow podcast and works as an editor for Stack Overflow's blog. He brings curiosity and technical insight to conversations about software and technology trends.
Geraint North is a fellow in AI and developer platforms at ARM, based in Manchester, UK. He has over eleven years of experience at ARM and helped establish its Manchester office, which grew from 9 to 500 employees. Previously, he worked at his startup Transitive, where he contributed to Apple's Rosetta technology for the PowerPC-to-Intel CPU transition, and then at IBM after Transitive was acquired. He specializes in making ARM's technology useful to developers and in solving real-world problems across supercomputing, mobile gaming, and high-performance IoT.
ARM operates on a five- to six-year timeline from technology conception to market deployment, requiring significant foresight into future computing needs. (03:05) North explains that when ARM began working on tools for the Fugaku supercomputer in 2014, the system didn't reach the market until 2020. This extended development cycle means ARM must anticipate technological shifts well in advance, which is particularly challenging in rapidly evolving fields like AI, where the landscape can change dramatically in just three months. The lesson for professionals is to develop strategic thinking that considers long-term trends rather than just immediate market demands, especially when working on foundational technologies or making significant investments.
ARM's business model focuses on creating flexible architectures that partners can customize rather than one-size-fits-all solutions. (04:33) This approach allows silicon partners, device OEMs, and other stakeholders to make design decisions closer to the consumer value point, enabling meaningful differentiation in their products. For professionals, this translates to building systems and strategies that provide optionality rather than rigid solutions, allowing downstream users or customers to adapt offerings to their specific needs while maintaining core functionality.
Large language models are extremely memory-intensive, often making memory bandwidth the limiting factor rather than computational capacity. (11:29) North reveals that on mobile platforms, you typically get similar performance running an LLM on a CPU versus a GPU or NPU because the memory bus and DRAM speed create the bottleneck. This insight challenges the common assumption that more specialized AI processors automatically mean better performance, highlighting the importance of understanding system-level constraints rather than focusing solely on individual component optimization.
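A rough back-of-envelope sketch makes the point concrete. All figures below are illustrative assumptions (not numbers from the episode): if every generated token has to stream the full set of weights from DRAM, memory bandwidth alone caps tokens per second, no matter which processor does the math.

```python
# Back-of-envelope estimate of why LLM token generation is memory-bandwidth-bound.
# All figures here are illustrative assumptions, not numbers from the episode.

def decode_tokens_per_second(params_billions: float,
                             bytes_per_param: float,
                             memory_bw_gb_s: float) -> float:
    """Upper bound on tokens/sec when each generated token must stream the
    full weight set from DRAM (ignores KV-cache traffic and compute time)."""
    model_bytes = params_billions * 1e9 * bytes_per_param
    return memory_bw_gb_s * 1e9 / model_bytes

# Hypothetical phone-class memory bus: ~50 GB/s of LPDDR bandwidth.
for quant, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    ceiling = decode_tokens_per_second(7, bpp, 50)
    print(f"7B model, {quant}: ~{ceiling:.1f} tokens/s ceiling")
```

Under these assumed numbers, a 7B-parameter model tops out at only a handful of tokens per second regardless of whether a CPU, GPU, or NPU runs it, which is why quantization (fewer bytes per parameter) often helps more than extra compute.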
When developing AI applications, particularly for edge devices, the key is determining the minimum model size that can accomplish your specific task rather than starting with the largest available model. (21:31) North describes how small language models with tens of millions of parameters, having the comprehension ability of a small child, can be perfectly adequate for simple game character interactions. This approach of right-sizing AI models for specific use cases can dramatically improve performance and reduce resource requirements, making AI features accessible on a broader range of devices.
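One way to make "right-sizing" tangible is a quick footprint check against a device budget. The candidate model sizes and the 1 GB budget below are illustrative assumptions, not figures from the episode.

```python
# Rough "right-sizing" check: which candidate model fits an edge-device budget?
# Candidate sizes and the 1 GB budget are illustrative assumptions.

def weight_footprint_mb(params_millions: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone (no KV cache, activations, or runtime)."""
    return params_millions * 1e6 * bytes_per_param / 1e6

budget_mb = 1024  # e.g. what a game might be willing to spend on NPC dialogue
candidates = {"tiny (50M params)": 50, "small (500M params)": 500, "7B params": 7000}

for name, params in candidates.items():
    size = weight_footprint_mb(params, bytes_per_param=0.5)  # int4 quantization
    verdict = "fits" if size <= budget_mb else "over budget"
    print(f"{name}: ~{size:.0f} MB of int4 weights -> {verdict}")
```

A tens-of-millions-of-parameters model occupies only a few tens of megabytes at int4, leaving the rest of the budget for the game itself, while a 7B model blows well past it.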
ARM's strategy involves creating "pinch points" where they can carry heavy computational loads for the entire development ecosystem. (25:38) Their KleidiAI library accelerates AI inference engines and is integrated into major frameworks such as ExecuTorch and LiteRT on Android, ensuring developers automatically get optimized performance without additional effort. This demonstrates the value of identifying strategic leverage points where investment in shared infrastructure can amplify benefits across many users and use cases.
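A minimal sketch of that "no additional effort" point, using the TensorFlow Lite / LiteRT Python API with a placeholder model path: the application code is entirely generic, and on Arm hardware the runtime is expected to pick up the optimized kernels (e.g. KleidiAI micro-kernels integrated via XNNPACK) without any changes here.

```python
# Minimal LiteRT / TensorFlow Lite inference sketch (model path is a placeholder).
# Nothing here is Arm-specific: on Arm devices the runtime's optimized kernels
# (e.g. KleidiAI micro-kernels integrated via XNNPACK) are used transparently.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input that matches the model's expected shape and dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

output = interpreter.get_tensor(output_details[0]["index"])
print(output.shape)
```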