Open-source AI must reveal its training data, per new OSI definition

The Open Source Initiative (OSI) has released its official definition of “open” artificial intelligence, setting the stage for a clash with tech giants like Meta, whose models don’t fit the rules. OSI has long set the industry standard for what constitutes open-source software, but AI systems include elements that conventional licenses don’t cover, such as model training data. Now, for an AI system to be considered truly open source, it must provide:

  • Access to details about the data used to train the AI so others can understand and re-create it
  • The complete code used to build and run the AI
  • The settings and weights from the training, which help the AI produce its results

This definition directly challenges Meta’s Llama, widely promoted as the largest open-source AI model. Llama is publicly available for download and use, but it restricts commercial use (for applications with more than 700 million users) and does not provide access to its training data, so it falls short of OSI’s standards for unrestricted freedom to use, modify, and share.

Meta spokesperson Faith Eischen told The Verge that while “we agree with our partner OSI on many things,” the company disagrees with this definition. “There is no single open source AI definition, and defining it is a challenge because previous open source definitions do not encompass the complexities of today’s rapidly advancing AI models,” Eischen said. “We will continue working with OSI and other industry groups to make AI more accessible and free responsibly, regardless of technical definitions.”

Full report: Open Source Initiative (OSI) released a framework for open-source AI, and Meta’s Llama doesn’t fit in it.