

Opportunities for Advantage: Natural Language Processing – Meta AI Builds a Huge GPT-3 Model and Makes it Available for Free

OODA CTO Bob Gourley recently discussed the potential impacts and use cases of improved natural language processing (NLP), highlighting the major developments in computer language understanding in a way that can help enterprise and government leaders prepare to act on these incredible new capabilities. Major improvements in the ability of computers to understand what humans write, say, and search are being made commercially available. These improvements are significant and will end up changing just about every industry in the world, but at this point they are getting little notice outside a narrow segment of experts.

Background

Major developments of interest to this ‘expert’ class have been reviewed here at OODA Loop:

The Current AI Innovation Hype Cycle: Large Language Models, OpenAI’s GPT-3 and DeepMind’s RETRO:  For better or for worse, Large Language Models (LLMs) – used for natural language processing by commercial AI Platform-as-a-Service (PaaS) subscription offerings – have become one of the first “big data” applied technologies to become a crossover hit in the AI marketplace:  “Large language models—powerful programs that can generate paragraphs of text and mimic human conversation—have become one of the hottest trends in AI in the last couple of years. But they have deep flaws, parroting misinformation, prejudice, and toxic language.” (3)

From a big data perspective, LLMs are built on gigantic datasets. In AI terms, an LLM is a huge neural network whose size is measured by the number of parameters in the model. Parameters are the values a neural network constantly refines while training, and they are what the trained model uses to generate AI-based predictions. The more parameters, the more the training data is distilled into structured information (organized around the parameters of the LLM), enhancing the accuracy of the predictions generated by the model.
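As a toy illustration of what "parameters refined during training" means, consider fitting a model with a single parameter by gradient descent. This is a from-scratch sketch of the principle, not how LLMs are actually implemented; real models do the same thing with billions of parameters:

```python
# Toy illustration: a "model" with one parameter w, trained to fit y = 2x.
# An LLM refines billions of such parameters against its training corpus.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target) pairs

w = 0.0    # the single model parameter, initialized before training
lr = 0.05  # learning rate

for step in range(200):
    # Gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # refine the parameter, as training refines LLM weights

print(round(w, 3))  # converges toward 2.0, the value that best fits the data
```

Each training step nudges every parameter in the direction that reduces prediction error; with more parameters, the model can capture more structure from the data.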

In April of 2020, the bleeding edge of innovation in this space was the Facebook chatbot Blender, which Facebook open-sourced with 9.4 billion parameters and an innovative structure for training on 1.5 billion publicly available Reddit conversations, supplemented by additional conversational language datasets: conversations that contained some kind of emotion, information-dense conversations, and conversations between people with distinct personas. Blender’s 9.4 billion parameters dwarfed Google’s Meena (released in January 2020) by almost 4X. (1)

OpenAI, a San Francisco-based research and deployment company, released GPT-3 in June of 2020, and the results were instantly compelling: natural language processing with a seeming mastery of language that generated sensible sentences and could converse with humans via chatbots. By 2021, the MIT Technology Review was proclaiming OpenAI’s GPT-3 a top 10 breakthrough technology, “a big step toward AI that can understand and interact with the human world.”

Open-Source Natural Language Processing: EleutherAI’s GPT-J: Initially, access to OpenAI’s GPT-3 was a selective process complete with a waiting list. It has since been commercialized in collaboration with Microsoft. In response, EleutherAI – a self-described “grassroots collective of researchers working to open-source AI research” – launched GPT-J in June 2021 as part of a quest to replicate the OpenAI GPT collection of models. The goal is to “break the OpenAI-Microsoft monopoly” by broadening availability and harnessing the collective intelligence of open-source development of a competing class of GPT models.

GPT is an acronym for “generative pre-trained transformer.” The first paper on GPT as a language model was written by Alec Radford and colleagues and published in a preprint on OpenAI’s website on June 11, 2018. It showed how a generative model of language is able to acquire world knowledge and process long-range dependencies by pre-training on a diverse corpus with long stretches of contiguous text. (4)
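The core objective of a generative language model – predict the next token given what came before – can be illustrated with a deliberately tiny sketch. This is a bigram counter over a made-up corpus, not a transformer; GPT models learn the same kind of next-word distribution, but with a neural network over billions of documents:

```python
from collections import Counter, defaultdict
import random

# Toy "generative language model": count which word follows which in a
# tiny corpus, then sample from those counts to generate text.
corpus = "the model reads text and the model predicts the next word".split()

nexts = defaultdict(Counter)
for prev, cur in zip(corpus, corpus[1:]):
    nexts[prev][cur] += 1  # "pre-training": tally next-word statistics

def generate(start, n=5, seed=0):
    """Generate up to n words by repeatedly sampling a likely next word."""
    random.seed(seed)
    out = [start]
    for _ in range(n):
        options = nexts.get(out[-1])
        if not options:  # no observed continuation; stop generating
            break
        words, counts = zip(*options.items())
        out.append(random.choices(words, weights=counts)[0])
    return " ".join(out)

print(generate("the"))
```

Pre-training on long stretches of contiguous text gives the real models far richer statistics than bigrams, which is how they pick up world knowledge and long-range dependencies.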

Meta AI is now in the GPT-3 model game – with the release of its own massive GPT-3-scale model, which the company has made available for free to researchers.

Under the Hood:  “Democratizing access to large-scale language models with OPT-175B”

“With the release of OPT-175B and smaller-scale baselines, we hope to increase the diversity of voices defining the ethical considerations of such technologies.”

The Meta AI model, called Open Pretrained Transformer (OPT), has been made available for non-commercial use. What also sets this release apart is the parallel release of:

  • Meta AI’s code and a logbook that documents the training process;
  • Daily updates from members of the team about the training data: how and when it was added to the model, what worked, and what didn’t; and
  • More than 100 pages of notes in which the researchers log every bug, crash, and reboot in a three-month training process that ran nonstop from October 2021 to January 2022. (2)

Meta AI provided this unprecedented level of detail, along with the guidelines under which it operated and structured its development process, in a blog post entitled “Democratizing access to large-scale language models with OPT-175B.” This blog post (and the resources it makes available) is impressive, encouraging in a variety of ways, and deserves a full read:

“Large language models — natural language processing (NLP) systems with more than 100 billion parameters — have transformed NLP and AI research over the last few years. Trained on a massive and varied volume of text, they show surprising new capabilities to generate creative text, solve basic math problems, answer reading comprehension questions, and more. While in some cases the public can interact with these models through paid APIs, full research access is still limited to only a few highly resourced labs. This restricted access has limited researchers’ ability to understand how and why these large language models work, hindering progress on efforts to improve their robustness and mitigate known issues such as bias and toxicity.

In line with Meta AI’s commitment to open science, we are sharing Open Pretrained Transformer (OPT-175B), a language model with 175 billion parameters trained on publicly available data sets, to allow for more community engagement in understanding this foundational new technology. For the first time for a language technology system of this size, the release includes both the pretrained models and the code needed to train and use them. To maintain integrity and prevent misuse, we are releasing our model under a noncommercial license to focus on research use cases. Access to the model will be granted to academic researchers; those affiliated with organizations in government, civil society, and academia; along with industry research laboratories around the world.

We believe the entire AI community – academic researchers, civil society, policymakers, and industry – must work together to develop clear guidelines around responsible AI in general and responsible large language models in particular, given their centrality in many downstream language applications. A much broader segment of the AI community needs access to these models in order to conduct reproducible research and collectively drive the field forward. With the release of OPT-175B and smaller-scale baselines, we hope to increase the diversity of voices defining the ethical considerations of such technologies.” (2)

“…the commitment by Meta AI to these collaboration and governance guidelines is a major development in the commercial phase of NLP and GPT-3 models.”

What is most impressive about the release is that it dovetails with some of our research and analysis on the ethical use of AI, and with our concern that commercial AI efforts have been operating in a bubble, apart from the work being done by think tanks and industry standardization organizations (whose frameworks and research released in the last year take more of the ethical concerns into account).

The Meta AI OPT release is an operational signal from a huge technology company that publication guidelines and standardization are crucial elements at this early stage of AI capability development; Meta AI partnered with an ethical AI non-profit and a government agency in the architecture of this release:

“Following the publication guidelines for researchers generated by the Partnership on AI, along with the governance guidance outlined by NIST in March 2022 (section 3.4), we are releasing all our notes documenting the development process, including the full logbook detailing the day-to-day training process, so other researchers can more easily build on our work. Furthermore, these details disclose how much compute was used to train OPT-175B and the human overhead required when underlying infrastructure or the training process itself becomes unstable at scale.” (2)

The MIT Technology Review notes that “Google, which is exploring the use of large language models in its search products, has also been criticized for a lack of transparency. The company sparked controversy in 2020 when it forced out leading members of its AI ethics team after they produced a study that highlighted problems with the technology.” (3)

Like some of the positive signals of public/private collaboration we are seeing in cybersecurity and open-source security efforts,  the commitment by Meta AI to these collaboration and governance guidelines is a major development in the commercial phase of NLP and GPT-3 models.

The Meta AI OPT Specifications

Meta AI laid out its development process and released the software, code, and hardware specifications:

  • We are sharing OPT-175B, along with the codebase used to train and deploy the model using only 16 NVIDIA V100 GPUs, in order to increase the accessibility of these models specifically for research purposes and to provide a foundation for analyzing potential harms rooted in quantifiable metrics on a common, shared model.
  • We are also fully releasing a suite of smaller-scale baseline models, trained on the same data set and using similar settings as OPT-175B, to enable researchers to study the effect of scale alone. The parameter counts for these smaller-scale models are 125 million, 350 million, 1.3 billion, 2.7 billion, 6.7 billion, 13 billion, and 30 billion (with 66 billion to be released soon).
  • Recent developments in AI research have consumed an extraordinary amount of compute power. While industry labs have started to report the carbon footprint of these models, most do not include the computational cost associated with the R&D phases of experimentation, which in some cases can be an order of magnitude more resource-intensive than training the final model.
  • We developed OPT-175B with energy efficiency in mind by successfully training a model of this size using only 1/7th the carbon footprint of GPT-3. This was achieved by combining Meta’s open source Fully Sharded Data Parallel (FSDP) API and NVIDIA’s tensor parallel abstraction within Megatron-LM. We achieved ~147 TFLOP/s/GPU utilization on NVIDIA’s 80 GB A100 GPUs, roughly 17 percent higher than that published by NVIDIA researchers on similar hardware.
  • By sharing these baselines along with the codebase to train a 175B model efficiently, we have an opportunity to reduce our collective environmental footprint while also allowing new results and progress in the field to be measurable in a consistent manner.
  • Access the open-source code and small-scale pretrained models here, request access to OPT-175B here, and read the paper here.
  • Pretrained models are all licensed under the OPT-175B License Agreement.
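The Fully Sharded Data Parallel idea mentioned in the efficiency bullet – each worker stores only a fraction of the parameters and the full set is gathered only when needed – can be sketched in a few lines. This is a pure-Python toy of the sharding concept, not PyTorch’s actual FSDP API:

```python
# Toy illustration of the Fully Sharded Data Parallel idea: each of N
# "workers" holds only ~1/N of the parameters, cutting per-GPU memory;
# the full parameter set is reassembled (all-gathered) only when needed.
def shard(params, n_workers):
    """Split a flat parameter list into n_workers roughly equal shards."""
    k, r = divmod(len(params), n_workers)
    shards, start = [], 0
    for i in range(n_workers):
        size = k + (1 if i < r else 0)  # spread the remainder evenly
        shards.append(params[start:start + size])
        start += size
    return shards

def all_gather(shards):
    """Reassemble the full parameter list from every worker's shard."""
    return [p for s in shards for p in s]

params = list(range(10))          # stand-in for billions of model weights
shards = shard(params, 4)
assert all_gather(shards) == params  # sharding loses no information
print([len(s) for s in shards])   # each worker stores only a fraction
```

In the real system, this sharding (combined with tensor parallelism inside Megatron-LM) is what lets a 175-billion-parameter model train efficiently across many GPUs without any single GPU holding the whole model.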

One final point of interest from the release notes from Meta AI:

“While there are many exciting developments in the space of large language models, the limitations and risks these models pose are still not well understood. Without direct access to these models, researchers are also limited in their ability to design detection and mitigation strategies for possible harm, which leaves detection and mitigation in the hands of only those with sufficient capital to access models of this scale. We hope that OPT-175B will bring more voices to the frontier of large language model creation, help the community collectively design responsible release strategies, and add an unprecedented level of transparency and openness to the development of large language models in the field.”

What Next?

“What we call state-of-the-art nowadays can’t just be about performance. It has to be state-of-the-art in terms of responsibility as well.”

Be on the lookout for continued leadership in this space from Joelle Pineau, a longtime advocate for transparency in the development of technology, who is the managing director at Meta AI. Pineau is singlehandedly setting the ethical tone within Meta AI and, because of the company’s size and influence, within the larger AI industry itself. From Pineau:

“We strongly believe that the ability for others to scrutinize your work is an important part of research. We really invite that collaboration.  Many of us have been university researchers.  We know the gap that exists between universities and industry in terms of the ability to build these models. Making this one available to researchers was a no-brainer.” (3)

She adds: “That commitment to open science is why I’m here.  I wouldn’t be here on any other terms.  What we call state-of-the-art nowadays can’t just be about performance. It has to be state-of-the-art in terms of responsibility as well.  I can’t tell you that there’s no risk of this model producing language that we’re not proud of.  It will.   I believe the only way to build trust is extreme transparency.  We have different opinions around the world about what speech is appropriate, and AI is a part of that conversation.  But how do we grapple with that? You need many voices in that discussion.” (3)

Stay Informed

It should go without saying that tracking threats is critical to informing your actions. This includes reading our OODA Daily Pulse, which will give you insights into the nature of the threats and risks to business operations.


About the Author

Daniel Pereira

Daniel Pereira is research director at OODA. He is a foresight strategist, creative technologist, and an information communication technology (ICT) and digital media researcher with 20+ years of experience directing public/private partnerships and strategic innovation initiatives.