The Road to AI Efficiency: the trend toward smaller, more performant AI and AI at the edge
The creative process…
It’s 4:31 in the morning, a watery dawn is breaking, and I’m wafting down an almost empty M40 motorway heading to Heathrow for an early flight. The advanced autopilot features in the car are helping with the driving (in the UK it amounts to lane keeping and smart cruise control rather than anything nearer to ‘full self driving’), but I am certainly still paying close attention with both hands on the wheel. Using the ChatGPT voice feature with the new GPT-4o model, I’m writing this article on the go. It is like having a phone call with a very bright and cheerful friend piped through the hands-free system in the car. There is something strongly reminiscent of Michael Knight’s chats with KITT in ‘Knight Rider’ about the whole experience. Later this morning I’m flying to Brussels to attend events at our office, focusing on developments in AI regulation and their impact on our clients – only a few years ago, this drive to the airport would have been unproductive time. Now the latest AI tech allows for a creative collaboration while on the road. This experience exemplifies AI in action, showcasing how these tools can enhance productivity. Let me know what you think… the piece has been lightly edited, but is 99% the output from my conversation with GPT-4o.
- The Evolution of AI: Faster, Lighter, and More Performant Models
Artificial Intelligence (AI) has seen remarkable advancements in recent years, with much of that progress coming as models became larger and larger (moving from billions to trillions of parameters, with ever greater demands for training data and compute resources). In recent months, however, the trend has been toward smaller, and therefore more computationally efficient, models that perform at a similar level to last year’s leviathans. This evolution is driven by the need for AI applications that can operate in real time and on a variety of devices with limited computational resources.
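To make the scale concrete, here is a back-of-envelope Python sketch (illustrative figures only, not vendor specifications) of the memory needed simply to store a model’s weights at different numeric precisions; real deployments also need room for activations and caches.

```python
# Back-of-envelope: memory needed just to hold a model's weights.
# Illustrative only; real deployments also need memory for activations
# and the KV cache, so treat these numbers as lower bounds.

def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Gigabytes required to store n_params weights at a given precision."""
    return n_params * bits_per_param / 8 / 1e9

models = [("8B model", 8e9), ("70B model", 70e9), ("1T-class model", 1e12)]
for name, params in models:
    print(f"{name}: ~{weight_memory_gb(params, 16):,.0f} GB at 16-bit, "
          f"~{weight_memory_gb(params, 4):,.0f} GB at 4-bit")

# Output:
# 8B model: ~16 GB at 16-bit, ~4 GB at 4-bit        (laptop / phone territory)
# 70B model: ~140 GB at 16-bit, ~35 GB at 4-bit     (a small GPU server)
# 1T-class model: ~2,000 GB at 16-bit, ~500 GB at 4-bit  (data-centre only)
```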
Two notable recent developments are Meta’s release of Llama 3, currently available as 8 and 70 billion parameter models, and the Technology Innovation Institute in Abu Dhabi’s Falcon 2 11B, an 11 billion parameter model. Both have shown impressive performance despite their relatively small size, making them suitable for a wide range of AI and edge applications. These leaps in performance without proportional increases in size mark significant milestones in AI research.
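As a minimal sketch of what running one of these smaller models locally can look like, the snippet below loads Llama 3 8B Instruct in 4-bit quantised form using the Hugging Face transformers and bitsandbytes libraries; it assumes those packages (plus accelerate) are installed, a GPU with roughly 6 GB of memory, and acceptance of Meta’s licence for the gated weights.

```python
# Minimal sketch: load Llama 3 8B Instruct in 4-bit quantised form so the
# weights fit in roughly 5-6 GB of GPU memory, then generate a short reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit quantisation roughly quarters the footprint versus 16-bit weights.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on whatever accelerator is available
)

prompt = "In one sentence, why do smaller AI models matter for edge devices?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```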
- The Implications of AI at the Edge
The trend towards smaller, more efficient AI models has significant implications for AI at the edge. Recent releases from major phone and technology manufacturers showcase the integration of powerful AI capabilities directly into devices, allowing for more sophisticated processing to occur locally rather than relying on cloud-based solutions. This means faster response times, reduced latency, and enhanced privacy, since data doesn’t need to be sent to external servers.
With models like Llama 3 and Falcon 2, or similarly powerful small models from device vendors themselves, we can expect to see even more advanced AI functionalities in everyday devices. From real-time language translation and advanced camera features to personalized user experiences and enhanced security, the possibilities are vast. These advancements make AI more accessible and practical for a wide range of applications, from consumer electronics to industrial IoT.
- Efficiency and Its Importance for Cloud AI Providers
Efficiency in AI models is crucial not only for edge applications but also for large cloud AI providers like OpenAI and Google. These providers manage vast computational resources to support a wide array of AI services, and by adopting similar techniques even their huge multi-modal models can be made more efficient and performant. Indeed, while OpenAI has not disclosed specific details of model size for the GPT-4 family, the fact that GPT-4o responds even more quickly than GPT-4 Turbo suggests that significant progress in model efficiency has been made. It is not too difficult to imagine that this included making parametrically smaller versions of those models.
Reduced computational requirements translate to lower operational costs. This efficiency enables providers to offer more competitive pricing and scale their services more effectively. Additionally, it allows for the deployment of more advanced features and capabilities within their platforms, enhancing the overall user experience.
- Environmental and Trade Benefits of Efficient AI Models
Efficient AI models are not just beneficial for performance and cost: they also have a positive impact on the environment. AI models with reduced computational requirements consume less energy, leading to lower carbon footprints. This is especially important as data centres, which power many AI applications, are significant energy consumers. By optimizing AI models to be more efficient, we can reduce the overall environmental impact of AI technology, contributing to more sustainable practices in the tech industry and helping to mitigate the effects of climate change. As legislation driven by the ESG agenda grows, efficiency is likely to become as much a legal requirement as a moral choice.
Additionally, demand for the highest-end GPUs and AI accelerators currently outstrips supply. Various world powers are increasingly controlling and banning the sale of the most advanced GPUs outside a tightly controlled list of close allies, making it crucial to develop AI models that are efficient enough to run on more readily available hardware. By reducing reliance on the highest-end GPUs, AI vendors can ensure wider access to advanced AI technologies and alleviate some of the supply constraints in the market.
- Looking Ahead: The Future of AI Accessibility and Performance
As AI models continue to become more efficient and powerful, the implications for everyday users are profound. The ability to have high-quality reasoning assistants locally on devices means users can access advanced AI functionalities even without internet connectivity. This can revolutionize how people interact with technology, providing personalized and instantaneous assistance wherever they are.
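As a sketch of what fully offline inference looks like in practice, the snippet below uses the open-source llama-cpp-python bindings to run a locally stored, 4-bit quantised model on CPU with no network connection at all; the GGUF file name shown is a hypothetical example.

```python
# Sketch: running a quantised model entirely offline on CPU with
# llama-cpp-python. The GGUF file name is a hypothetical example; any
# locally downloaded quantised model file would do.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3-8b-instruct.Q4_K_M.gguf",  # local weights, no network needed
    n_ctx=2048,   # context window size
    n_threads=8,  # CPU threads to use
)

result = llm(
    "Q: Name one benefit of on-device AI. A:",
    max_tokens=48,
    stop=["Q:", "\n\n"],
)
print(result["choices"][0]["text"].strip())
```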
Furthermore, the democratization of AI through more efficient cloud solutions allows high-quality AI systems to be accessible to a broader audience. Even those using older devices can benefit from advanced AI capabilities via cloud services, bridging the digital divide and ensuring more equitable access to technology.
Next Steps
You can find more views from the DLA Piper team on the topics of AI systems, regulation and the related legal issues on our blog, Technology’s Legal Edge.
If your organisation is deploying AI solutions, you can download DLA Piper’s AI Act App and our AI Report, a survey of real-world AI use.
If you’d like to discuss any of the issues raised in this article, get in touch with Gareth Stokes, or your usual DLA Piper contact.