Quick Look:
- Apple uses Google’s TPUs for AI model training, diverging from Nvidia’s GPUs.
- Despite AMD’s history with Apple, Google’s TPUs were chosen for their AI hardware needs.
- Google’s TPUs show comparable effectiveness to Nvidia’s GPUs in Apple’s AI model training.
- Apple values human feedback over benchmarks for model evaluation.
In a recent turn of events that has left the tech world buzzing, Apple has unveiled a detailed research paper outlining how it trained its latest generative AI models. In a surprising twist, Apple used Google’s neural network accelerators, shunning the more widely favoured Nvidia hardware. This intriguing choice has raised eyebrows and sparked discussions across the industry.
Embracing Google’s TPUs Over Nvidia’s GPUs
The paper “Apple Intelligence Foundation Language Models” comprehensively examines Apple’s approach to training and deploying language models. These models are integral to the Apple Intelligence features embedded in its operating systems, enabling functionalities such as text summarisation and suggested wording for messages.
While many AI organizations are fervently chasing after Nvidia GPUs, especially the highly coveted H100, Apple has taken a different route. Historically, Apple and Nvidia have had a rocky relationship, and Apple appears to be in no hurry to mend fences. Instead, Apple has embraced Google’s Tensor Processing Unit (TPU) technology, using TPU v4 and TPU v5 processors to train its Apple Foundation Models (AFMs).
A Surprising Choice Over AMD
Apple’s decision to bypass AMD’s Radeon GPUs adds another layer of surprise, given AMD’s history of supplying chips for Mac devices. The choice to partner with Google is also unexpected, especially after recent critiques over user privacy issues, and it points to a strategic alignment on the hardware front. The scale of the work underscores the partnership’s significance: Apple used 8,192 TPU v4 chips to train AFM-server and 2,048 TPU v5 processors to train AFM-on-device.
Performance Comparisons and Market Dynamics
The company’s research offers insight into the performance of its chosen hardware. While Nvidia claims that training a GPT-4-class AI model requires about 8,000 H100 GPUs, Apple’s experience suggests that TPU v4s are comparably effective in terms of the number of accelerators needed. This move might not solely be about avoiding Nvidia; Google’s TPUs have seen rapid growth since 2021 and have established themselves firmly in the market.
User Preferences and Model Evaluations
Apple’s paper delves into user preferences and model evaluations, claiming that its models are often preferred over those of competitors, including models from Meta, OpenAI, and even Google. Although the paper is light on specific details regarding AFM-server’s specifications, it highlights some notable features of the smaller model. For instance, AFM-on-device has just under three billion parameters and has been optimized for efficiency, with its weights quantized to an average of fewer than four bits each.
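To put the quantization figure in perspective, the following is a rough back-of-the-envelope sketch of how much storage the weights alone would need. The round numbers are assumptions chosen for illustration: 3 billion parameters stands in for "just under three billion", 4 bits is the upper bound implied by "fewer than four bits", and 16 bits is a common unquantized baseline that the article does not mention. The function name is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for model weights in gigabytes (1 GB = 1e9 bytes)."""
    total_bits = num_params * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> gigabytes


# Assumed round figures, not exact values from Apple's paper.
PARAMS = 3e9

fp16_gb = weight_memory_gb(PARAMS, 16)  # assumed 16-bit baseline
q4_gb = weight_memory_gb(PARAMS, 4)     # upper bound for "fewer than four bits"

print(f"16-bit: {fp16_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB")
```

Under these assumptions the quantized weights take roughly a quarter of the space of a 16-bit copy, which matters on a phone or laptop with limited memory. The estimate ignores activations, caches, and any other runtime overhead.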
Interestingly, Apple’s approach to model evaluation emphasizes human feedback over standardized benchmarks. By presenting real users with outputs from different models and asking for their preferences, the company aims to align its evaluations more closely with actual user experiences.
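As a minimal sketch of how such side-by-side preferences could be tallied, the snippet below turns a list of votes into preference rates. It is not Apple's evaluation pipeline; the function name, the vote labels, and the sample votes are all hypothetical and made up for illustration.

```python
from collections import Counter


def preference_rates(votes):
    """Return the share of 'a', 'b', and 'tie' votes from pairwise comparisons.

    votes: list of strings, each 'a', 'b', or 'tie', one per human judgment.
    """
    if not votes:
        raise ValueError("no votes to tally")
    counts = Counter(votes)
    total = len(votes)
    return {label: counts[label] / total for label in ("a", "b", "tie")}


# Made-up votes: each entry records which model's response one rater preferred.
sample_votes = ["a", "a", "b", "tie", "a", "b", "tie", "a"]
rates = preference_rates(sample_votes)
print(rates)
```

Reporting ties separately, rather than splitting them between the two models, keeps it clear how often raters saw no meaningful difference.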
Limitations and Achievements
Despite these claims, Apple’s models generally ranked second or third in overall evaluations. AFM-on-device performed well against small models like Gemma 7B, Phi 3 Mini, and Mistral 7B but fell short against Llama 3 8B. Similarly, AFM-server did not measure up to GPT-4 and Llama 3 70B in direct comparisons. The absence of comparative data for GPT-4o mini leaves some questions unanswered.
Commitment to Safety and Efficiency
One area where Apple touts a clear victory is in generating safe content. The company claims that its AFM-on-device and AFM-server models produce harmful responses less frequently than competitors, with rates of 7.5% and 6.3%, respectively. This is a notable achievement, especially compared to models like Mistral 7B and Mixtral 8x22B, which had significantly higher rates of harmful outputs.
A Forward-Looking Strategy
Apple’s strategic choice to use Google’s TPUs over more conventional Nvidia or AMD GPUs illustrates its willingness to forge its own path in the AI landscape. This decision, coupled with a strong focus on user experience and safety, positions Apple uniquely as it continues to develop and refine its AI capabilities. As the AI arms race intensifies, the company’s unconventional approach may catalyze further innovation and collaboration within the industry.