Microsoft has unveiled Phi-3 Mini, the latest version of its lightweight artificial intelligence (AI) model and the first of three compact models in the series.
Phi-3 Mini has 3.8 billion parameters, is trained on a smaller dataset than large language models such as GPT-4, and is now available on Azure, Hugging Face, and Ollama.
Microsoft plans to follow it with Phi-3 Small (7B parameters) and Phi-3 Medium (14B parameters). Parameter count is a rough indicator of how many complex instructions a model can understand.
Phi-2, released in December, performed on par with larger models such as Llama 2.
According to Microsoft, Phi-3 outperforms its predecessor and can give responses close to those of a model ten times its size.
Eric Boyd, corporate vice president of Microsoft Azure AI Platform, says Phi-3 Mini is as capable as large language models (LLMs) such as GPT-3.5, only in a smaller package.
Compared with their larger counterparts, small AI models are often cheaper to run and perform better on personal devices such as smartphones and laptops.
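To give a rough sense of why parameter count matters on a phone or laptop, the sketch below estimates how much memory the weights alone would occupy at a few common numeric precisions. The parameter counts come from the figures above; the precisions and bytes-per-parameter values are illustrative assumptions, not anything Microsoft has stated, and the estimate ignores activations, caches, and runtime overhead.

```python
# Rough, weight-only memory estimate for the three Phi-3 sizes.
# Assumption: each parameter is stored at a single fixed precision.
# Real memory use is higher (activations, caches, runtime overhead).

PHI3_PARAMS = {
    "Phi-3 Mini": 3.8e9,
    "Phi-3 Small": 7e9,
    "Phi-3 Medium": 14e9,
}

# Bytes per parameter for common storage precisions (illustrative choices).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the approximate size of the weights in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, params in PHI3_PARAMS.items():
        sizes = ", ".join(
            f"{precision}: {weight_memory_gb(params, precision):.1f} GB"
            for precision in BYTES_PER_PARAM
        )
        print(f"{name} -> {sizes}")
```

Under these assumptions, the weights of a 3.8-billion-parameter model take about 7.6 GB at 16-bit precision and about 1.9 GB at 4-bit precision, which is the range where on-device use becomes plausible.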
Microsoft’s other small models include Orca-Math, which focuses on solving math problems.
Competitors such as Google and Anthropic also offer compact AI models, each aimed at specific tasks such as document summarization, coding assistance, or digesting dense research papers.
Boyd explains that developers trained Phi-3 with a “curriculum” inspired by how children learn: starting with simple narratives, such as bedtime stories and beginner-level books, and gradually increasing in complexity.
LLM Crafts Children’s Books to Teach Microsoft’s Phi-3
During training, Phi-3 learned from stories written by other LLMs.
Because there are not enough children’s books to draw on, Boyd says, the team compiled a list of more than 3,000 words and asked an LLM to write children’s books to teach Phi.
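The article does not describe the actual prompts or tooling Microsoft used. As a loose illustration of the idea only, the sketch below samples a few words from a vocabulary list and turns each sample into a story-writing prompt that could be handed to an LLM. The function name, the prompt wording, the sample size, and the tiny stand-in vocabulary are all hypothetical, and no model is called.

```python
import random


def build_story_prompts(vocabulary, num_prompts, words_per_story=3, seed=0):
    """Build one story-writing prompt per random sample of vocabulary words.

    Hypothetical helper: the prompt text and sampling scheme are assumptions
    made for illustration, not Microsoft's method.
    """
    rng = random.Random(seed)  # fixed seed so the prompts are reproducible
    prompts = []
    for _ in range(num_prompts):
        # Pick distinct words that the generated story must include.
        words = rng.sample(vocabulary, words_per_story)
        prompts.append(
            "Write a short children's story using simple sentences. "
            f"It must include the words: {', '.join(words)}."
        )
    return prompts


if __name__ == "__main__":
    # Stand-in for the list of more than 3,000 words mentioned above.
    vocab = ["river", "lantern", "brave", "whisper", "garden", "puzzle"]
    for prompt in build_story_prompts(vocab, num_prompts=2):
        print(prompt)
```

Each prompt would then be sent to a text-generation model, and the resulting stories collected as training text.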
Phi-3 builds on its predecessors. Phi-1 focused on coding and Phi-2 began to learn reasoning; Phi-3 is better at both.
Although the Phi-3 family has some general knowledge, it cannot match the breadth of GPT-4 or other large LLMs.
Boyd notes that smaller models such as Phi-3 are practical for custom applications because they work well with smaller datasets and cost less to run.