Groq’s open-source Llama AI model tops leaderboard, outperforming GPT-4o and Claude in function calling

Groq, an AI hardware startup, has released two open-source language models that outperform models from tech giants on specialized tool-use tasks. The new Llama-3-Groq-70B-Tool-Use model has claimed the top spot on the Berkeley Function Calling Leaderboard (BFCL), surpassing proprietary offerings from OpenAI, Google, and Anthropic.

Rick Lamers, project lead at Groq, announced the result in an X.com post. “I’m proud to announce the Llama 3 Groq Tool Use 8B and 70B models,” he said. “An open source Tool Use full finetune of Llama 3 that reaches the #1 position on BFCL beating all other models, including proprietary ones like Claude Sonnet 3.5, GPT-4 Turbo, GPT-4o and Gemini 1.5 Pro.”

The larger 70B-parameter version achieved 90.76% overall accuracy on the BFCL, while the smaller 8B model scored 89.06%, ranking third overall. These results suggest that open-source models can match, and in specific tasks exceed, the performance of closed-source alternatives.

Groq developed the models in collaboration with AI research company Glaive, using a combination of full fine-tuning and Direct Preference Optimization (DPO) on Meta’s Llama-3 base model. The team emphasized that training used only ethically generated synthetic data, addressing common concerns about data privacy and overfitting.

By reaching the top of the leaderboard with synthetic data alone, Groq challenges the notion that vast amounts of real-world data are necessary for building cutting-edge AI models. The approach could mitigate privacy concerns and reduce the environmental impact associated with training on massive datasets. It could also open up new possibilities for building specialized AI models in domains where real-world data is scarce or sensitive.

Full report: Groq’s open-source Llama AI model tops leaderboard, outperforming GPT-4o and Claude in function calling.