OODA CTO Bob Gourley recently discussed the potential impacts and use cases of improved natural language processing (NLP), highlighting the major developments in computer language understanding in a way that can help enterprise and government leaders prepare to act on these incredible new capabilities. Major improvements in the ability of computers to understand what humans write, say, and search are being made commercially available. These improvements are significant and will end up changing just about every industry in the world, but at this point they are getting little notice outside a narrow segment of experts.
Major developments of interest to this ‘expert’ class have been reviewed here at OODA Loop:
The Current AI Innovation Hype Cycle: Large Language Models, OpenAI’s GPT-3 and DeepMind’s RETRO: For better or for worse, Large Language Models (LLMs) – used for natural language processing by commercial AI Platform-as-a-Service (PaaS) subscription offerings – have become one of the first applied “big data” technologies to achieve crossover success in the AI marketplace: “Large language models—powerful programs that can generate paragraphs of text and mimic human conversation—have become one of the hottest trends in AI in the last couple of years. But they have deep flaws, parroting misinformation, prejudice, and toxic language.” (3)
From a big data perspective, LLMs are gigantic data models. In AI terms, an LLM is a huge neural network whose size is measured by the number of parameters in the model. Parameters are the numeric values a network refines continuously during training, and the trained values are what drive the model’s AI-based predictions. Generally, the more parameters a model has, the more structured information it can extract from its training data (organized around the parameters of the LLM) – enhancing the accuracy of the predictions generated by the model.
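As a toy illustration of this idea (not the architecture of any actual LLM), the sketch below trains a model with a single parameter by gradient descent: the parameter is repeatedly refined against the training data, and the trained value then drives the model’s predictions. Real LLMs do the same thing in principle, but with billions of parameters.

```python
# Toy illustration: a "model" with one parameter w, trained to fit y = 3x.
# Training refines the parameter step by step; the trained value is what
# the model then uses to make predictions.

def train(data, steps=200, lr=0.01):
    w = 0.0  # the parameter starts at an arbitrary value
    for _ in range(steps):
        for x, y in data:
            pred = w * x               # model prediction
            grad = 2 * (pred - y) * x  # gradient of squared error w.r.t. w
            w -= lr * grad             # refine the parameter
    return w

data = [(1, 3), (2, 6), (3, 9)]  # training examples of the pattern y = 3x
w = train(data)
print(round(w, 2))  # the trained parameter converges near 3.0
```

An LLM with 175 billion parameters is this same loop scaled up enormously, which is why training compute (disclosed in Meta AI’s release notes, discussed below) becomes the dominant cost.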
In April 2020, the bleeding edge of innovation in this space was Facebook’s chatbot Blender, open-sourced with 9.4 billion parameters and an innovative structure for training on 1.5 billion publicly available Reddit conversations – supplemented with datasets of conversations that contained some kind of emotion, information-dense conversations, and conversations between people with distinct personas. Blender’s 9.4 billion parameters dwarfed Google’s Meena (released in January 2020) by almost 4X. (1)
OpenAI, a San Francisco-based research and deployment company, released GPT-3 in June 2020 – and the results were instantly compelling: natural language processing (NLP) with a seeming mastery of language, generating sensible sentences and conversing with humans via chatbots. By 2021, the MIT Technology Review was proclaiming OpenAI’s GPT-3 a top 10 breakthrough technology, “a big step toward AI that can understand and interact with the human world.”
Open-Source Natural Language Processing: EleutherAI’s GPT-J: Initially, access to OpenAI’s GPT-3 was a selective process complete with a waiting list. It has since been commercialized in collaboration with Microsoft. In response, EleutherAI – a self-described “grassroots collective of researchers working to open-source AI research” formed in July 2020 – launched GPT-J on a quest to replicate the OpenAI GPT collection of models. The goal is to “break the OpenAI-Microsoft monopoly” by broadening availability and harnessing the collective intelligence of open-source development of a competing class of GPT models.
GPT is an acronym for “generative pre-trained transformer.” The first paper on generative pre-training of a language model was written by Alec Radford and colleagues and published in a preprint on OpenAI’s website on June 11, 2018. It showed how a generative model of language is able to acquire world knowledge and process long-range dependencies by pre-training on a diverse corpus with long stretches of contiguous text. (4)
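The core objective behind generative pre-training is next-token prediction: given the text so far, predict what comes next. A minimal sketch – a word-level bigram counter, vastly simpler than a transformer – illustrates the idea:

```python
from collections import Counter, defaultdict

# Minimal sketch of the next-token objective: count which word follows
# which in a training corpus, then predict the most frequent successor.
# GPT models learn this same conditional distribution with a transformer
# over long contexts, instead of a one-word lookup table.

def train_bigram(corpus):
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(model, word):
    # most frequent word seen after `word` in training
    return model[word].most_common(1)[0][0]

corpus = [
    "the model predicts the next word",
    "the model learns from text",
]
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "model" follows "the" most often
```

What the 2018 paper showed is that scaling this objective up – a transformer pre-trained on long stretches of contiguous text – yields world knowledge and long-range dependency handling that a simple counter like this cannot capture.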
Meta AI is now in the GPT-3 model game – with the release of a massive GPT-3-scale model that the company has made available for free to researchers.
“With the release of OPT-175B and smaller-scale baselines, we hope to increase the diversity of voices defining the ethical considerations of such technologies.”
The Meta AI GPT-3-scale model, called Open Pretrained Transformer (OPT), has been made available for non-commercial use. What also sets this release apart is the parallel release of the code and development documentation that accompanied the model.
In a blog post entitled “Democratizing access to large-scale language models with OPT-175B,” Meta AI provided an unprecedented level of detail about the guidelines under which it operated and structured its development process. The blog post (and the resources it makes available) is impressive, encouraging in a variety of ways, and deserves a full read:
“Large language models — natural language processing (NLP) systems with more than 100 billion parameters — have transformed NLP and AI research over the last few years. Trained on a massive and varied volume of text, they show surprising new capabilities to generate creative text, solve basic math problems, answer reading comprehension questions, and more. While in some cases the public can interact with these models through paid APIs, full research access is still limited to only a few highly resourced labs. This restricted access has limited researchers’ ability to understand how and why these large language models work, hindering progress on efforts to improve their robustness and mitigate known issues such as bias and toxicity.
In line with Meta AI’s commitment to open science, we are sharing Open Pretrained Transformer (OPT-175B), a language model with 175 billion parameters trained on publicly available data sets, to allow for more community engagement in understanding this foundational new technology. For the first time for a language technology system of this size, the release includes both the pretrained models and the code needed to train and use them. To maintain integrity and prevent misuse, we are releasing our model under a noncommercial license to focus on research use cases. Access to the model will be granted to academic researchers; those affiliated with organizations in government, civil society, and academia; along with industry research laboratories around the world.
We believe the entire AI community – academic researchers, civil society, policymakers, and industry – must work together to develop clear guidelines around responsible AI in general and responsible large language models in particular, given their centrality in many downstream language applications. A much broader segment of the AI community needs access to these models in order to conduct reproducible research and collectively drive the field forward. With the release of OPT-175B and smaller-scale baselines, we hope to increase the diversity of voices defining the ethical considerations of such technologies.” (2)
“…the commitment by Meta AI to these collaboration and governance guidelines is a major development in the commercial phase of NLP and GPT-3 models.”
What is most impressive about the release is that it dovetails with some of our research and analysis on the ethical use of AI – and with our concern that commercial AI efforts were operating in a bubble, apart from the work being done by think tanks and industry standardization organizations, whose frameworks and research released in the last year take more of these ethical concerns into account.
The Meta AI OPT release is an operational signal from a huge technology company that publication guidelines and standardization are crucial elements at this early stage of AI capability development. Meta AI partnered with an ethical AI non-profit and a government agency on the architecture of this release:
“Following the publication guidelines for researchers generated by the Partnership on AI, along with the governance guidance outlined by NIST in March 2022 (section 3.4), we are releasing all our notes documenting the development process, including the full logbook detailing the day-to-day training process, so other researchers can more easily build on our work. Furthermore, these details disclose how much compute was used to train OPT-175B and the human overhead required when underlying infrastructure or the training process itself becomes unstable at scale.” (2)
The MIT Technology Review notes that “Google, which is exploring the use of large language models in its search products, has also been criticized for a lack of transparency. The company sparked controversy in 2020 when it forced out leading members of its AI ethics team after they produced a study that highlighted problems with the technology.” (3)
Like some of the positive signals of public/private collaboration we are seeing in cybersecurity and open-source security efforts, the commitment by Meta AI to these collaboration and governance guidelines is a major development in the commercial phase of NLP and GPT-3 models.
Meta AI laid out its development process and released the software, code, and hardware specifications.
One final point of interest from the release notes from Meta AI:
“While there are many exciting developments in the space of large language models, the limitations and risks these models pose are still not well understood. Without direct access to these models, researchers are also limited in their ability to design detection and mitigation strategies for possible harm, which leaves detection and mitigation in the hands of only those with sufficient capital to access models of this scale. We hope that OPT-175B will bring more voices to the frontier of large language model creation, help the community collectively design responsible release strategies, and add an unprecedented level of transparency and openness to the development of large language models in the field.”
“What we call state-of-the-art nowadays can’t just be about performance. It has to be state-of-the-art in terms of responsibility as well.”
Be on the lookout for continued leadership in this space from Joelle Pineau, a longtime advocate for transparency in the development of technology and the managing director at Meta AI. Pineau is setting the ethical tone within Meta AI and, because of the company’s size and influence, within the larger AI industry itself:
“We strongly believe that the ability for others to scrutinize your work is an important part of research. We really invite that collaboration. Many of us have been university researchers. We know the gap that exists between universities and industry in terms of the ability to build these models. Making this one available to researchers was a no-brainer.” (3)
She adds: “That commitment to open science is why I’m here. I wouldn’t be here on any other terms. What we call state-of-the-art nowadays can’t just be about performance. It has to be state-of-the-art in terms of responsibility as well. I can’t tell you that there’s no risk of this model producing language that we’re not proud of. It will. I believe the only way to build trust is extreme transparency. We have different opinions around the world about what speech is appropriate, and AI is a part of that conversation. But how do we grapple with that? You need many voices in that discussion.” (3)