While the societal and business impact of AI continues to consume the headlines, the reality is that the future of AI will, to a large extent, be formally litigated in the legal system.
What follows is an overview of recently filed lawsuits with claims related to AI, followed by further resources and recent analyses of copyright, fair use, and generative AI by the Congressional Research Service, the Stanford Institute for Human-Centered AI (HAI), the United States Copyright Office, the United States Patent and Trademark Office, and Recreating Europe.
The emergence of generative AI has raised several legal and ethical questions regarding copyright ownership and infringement. One of the key issues is determining the legal status of content generated by AI systems. Since generative AI creates content autonomously, it can be challenging to identify the primary author or copyright holder. This raises questions about who should be granted copyright protection and who should be liable for any copyright infringements.
According to an article on Built In, the U.S. Copyright Office has long held that there is no copyright protection for works created by non-humans, including machines. Therefore, the product of a generative AI model cannot be copyrighted. However, some argue that AI-generated works should be eligible for copyright protection because they are the product of complex algorithms and programming. In the US, copyright may be possible in cases where the creator can prove there was substantial human input.
Last month, generative AI entered the courtroom in the form of a “nearly 160-page complaint filed…in a San Francisco federal court”: “The generative artificial intelligence company OpenAI LP was hit with a wide-ranging consumer class action lawsuit alleging that the company’s use of web scraping to train its artificial intelligence models misappropriates personal data on ‘an unprecedented scale.’” The complaint alleges that “OpenAI’s popular generative AI programs ChatGPT and DALL-E are trained on ‘stolen private information’ taken from what it described as hundreds of millions of internet users, including children, without proper permission.” (1)
Further details of the lawsuit by way of Bloomberg Law:
“OpenAI illegally accesses private information from individuals’ interactions with its products and from applications that have integrated ChatGPT, the lawsuit claims. Such integrations allow the company to gather image and location data from Snapchat, music preferences on Spotify, financial information from Stripe, and private conversations on Slack and Microsoft Teams, according to the lawsuit.
The tech company, which is at the forefront of the burgeoning AI industry, is accused of conducting an enormous web scraping operation in secret, violating terms of service agreements and state and federal privacy and property laws. One of the laws cited in the suit is the Computer Fraud and Abuse Act, a federal anti-hacking law that has been invoked in scraping disputes before.
“Despite established protocols for the purchase and use of personal information, Defendants took a different approach: theft,” the complaint said.
The complaint named 16 plaintiffs who used various internet services, including ChatGPT, and who believed their personal information had been stolen by OpenAI.” (1)
Also in late June, “another class-action lawsuit claimed ChatGPT’s machine learning was trained on books without permission from its authors. The complaint was filed in a San Francisco federal court and alleged ChatGPT’s machine learning training dataset came from books and other texts that are ‘copied by OpenAI without consent, without credit, and without compensation.’”
Daniel Newman, chief analyst at Futurum Research, says data protection remains a critical concern with AI applications. “We have definitely hit the inflection point where the speed of rolling out generative AI and the potential implications around data rights and privacy are coming to a head,” Newman tells InformationWeek. “While I believe the critical importance of AI will win out in the long run, the rapid deployment has created vulnerabilities in general of how data is ingested and then used. It will be critical that the tech industry takes this issue seriously…” (2)
Soon after AI tools emerged last year, lawsuits began challenging what the tools were trained on and how they could be used.
Photo service Getty Images blocked AI-generated images back in September, and then in February, it sued Stability AI, maker of the art generator Stable Diffusion, for allegedly copying over 12 million images from its database without permission or compensation.
Separately, three artists sued Stability AI, art generator Midjourney, and art hosting site DeviantArt in January for allegedly using their work to train AI models without consent or compensation, claiming that “millions of artists” have been similarly victimized, according to The Verge.
In response, software maker Adobe released Firefly in March, a generative AI toolset that uses the company’s own library of stock images to create images without fear of illegally scraping artists’ works. Adobe is gearing up to integrate Firefly into the other products in its software lineup, like Photoshop.
Creators have hit other speed bumps while integrating AI into the modern publishing process. The US Copyright Office denied copyright protection to the AI-generated art in a graphic novel, though it granted protection to the human-created writing. And short story publications have been swamped with AI-generated submissions, to the point where the celebrated outlet Clarkesworld banned anything even partially created with AI. (3)
This CRS Legal Sidebar explores questions that courts and the U.S. Copyright Office have begun to confront regarding whether the outputs of generative AI programs are entitled to copyright protection, as well as how training and using these programs might infringe copyrights in other works.
Generative AI also raises questions about copyright infringement. Commentators and courts have begun to address whether generative AI programs may infringe copyright in existing works, either by making copies of existing works to train the AI or by generating outputs that resemble those existing works.

Does the AI Training Process Infringe Copyright in Other Works?

AI systems are “trained” to create literary, visual, and other artistic works by exposing the program to large amounts of data, which may consist of existing works such as text and images from the internet.
This training process may involve making digital copies of existing works, carrying a risk of copyright infringement. As the U.S. Patent and Trademark Office has described, this process “will almost by definition involve the reproduction of entire works or substantial portions thereof.” OpenAI, for example, acknowledges that its programs are trained on “large, publicly available datasets that include copyrighted works” and that this process “necessarily involves first making copies of the data to be analyzed.”
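To make those mechanics concrete, the short sketch below is a minimal illustration, not any vendor’s actual pipeline; the URLs, directory name, and structure are hypothetical. It shows why assembling a web-scale training corpus “necessarily involves first making copies”: each document is fetched and written to local storage before it can ever be tokenized or used to train a model.

```python
# Minimal sketch of a training-data ingestion step.
# Illustrative only: the URLs, paths, and layout are hypothetical,
# not any AI vendor's actual pipeline.
import hashlib
import pathlib
import urllib.request

CORPUS_DIR = pathlib.Path("corpus")  # local store of copied documents
CORPUS_DIR.mkdir(exist_ok=True)

# Hypothetical list of pages slated for inclusion in a training set.
seed_urls = [
    "https://example.com/article-1",
    "https://example.com/article-2",
]

for url in seed_urls:
    # Fetching the page already creates a transient copy in memory...
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    # ...and persisting it for later tokenization creates a durable one.
    # This reproduction step is what the analyses above flag as a
    # potential infringement absent permission or fair use.
    name = hashlib.sha256(url.encode()).hexdigest()[:16] + ".txt"
    (CORPUS_DIR / name).write_text(text, encoding="utf-8")
```

Whether those stored copies are permissible then turns on the fair use analysis discussed next.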
Creating such copies, without express or implied permission from the various copyright owners, may infringe the copyright holders’ exclusive right to make reproductions of their work. AI companies may argue that their training processes constitute fair use and are therefore non-infringing. Whether or not copying constitutes fair use depends on four statutory factors under 17 U.S.C. § 107: (1) the purpose and character of the use, including whether it is commercial or for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work.
Some stakeholders argue that the use of copyrighted works to train AI programs should be considered a fair use under these factors. Regarding the first factor, OpenAI argues its purpose is “transformative” as opposed to “expressive” because the training process creates “a useful generative AI system.” OpenAI also contends that the third factor supports fair use because the copies are not made available to the public but are used only to train the program. For support, OpenAI cites The Authors Guild, Inc. v. Google, Inc., in which the U.S. Court of Appeals for the Second Circuit held that Google’s copying of entire books to create a searchable database that displayed excerpts of those books constituted fair use.
Regarding the fourth fair use factor, some generative AI applications have prompted concern that programs trained on copyrighted works can generate outputs that compete with the original works. For example, an AI-generated song called “Heart on My Sleeve,” made to sound like the artists Drake and The Weeknd, was heard millions of times in April 2023 before various streaming services removed it. Universal Music Group, which has deals with both artists, argues that AI companies violate copyright by using these artists’ songs in training data.
These arguments may soon be tested in court, as plaintiffs have recently filed multiple lawsuits alleging copyright infringement via AI training processes. On January 13, 2023, several artists filed a putative class action lawsuit alleging their copyrights were infringed in the training of AI image programs, including Midjourney and Stable Diffusion. The class action lawsuit claims that defendants “downloaded or otherwise acquired copies of billions of copyrighted images without permission” to use as “training images,” making and storing copies of those images without the artists’ consent. Similarly, on February 3, 2023, Getty Images filed a lawsuit alleging that “Stability AI has copied at least 12 million copyrighted images from Getty Images’ websites . . . in order to train its Stable Diffusion model.” Both lawsuits appear to dispute any characterization of fair use, arguing that Stable Diffusion is a commercial product, weighing against fair use under the first statutory factor, and that the program undermines the market for the original works, weighing against fair use under the fourth factor. (4)
“Peter Henderson, a JD/PhD candidate at Stanford University and co-author of the recent paper, Foundation Models and Fair Use, lays out a complicated landscape:
‘People in machine learning aren’t necessarily aware of the nuances of fair use and, at the same time, the courts have ruled that certain high-profile real-world examples are not protected fair use, yet those very same examples look like things AI is putting out,’ Henderson says. ‘There’s uncertainty about how lawsuits will come out in this area.’
The consequences of stepping outside fair use boundaries could be considerable. Not only could there be civil liability, but new precedent set by the courts could dramatically curtail how generative AI is trained and used.” (5)
A description of the paper by Henderson and co-authors:
“Existing foundation models are trained on copyrighted material. Deploying these models can pose both legal and ethical risks when data creators fail to receive appropriate attribution or compensation. In the United States and several other countries, copyrighted content may be used to build foundation models without incurring liability due to the fair use doctrine. However, there is a caveat: If the model produces output that is similar to copyrighted data, particularly in scenarios that affect the market of that data, fair use may no longer apply to the output of the model.
In this work, we emphasize that fair use is not guaranteed, and additional work may be necessary to keep model development and deployment squarely in the realm of fair use. First, we survey the potential risks of developing and deploying foundation models based on copyrighted content. We review relevant U.S. case law, drawing parallels to existing and potential applications for generating text, source code, and visual art. Experiments confirm that popular foundation models can generate content considerably similar to copyrighted material. Second, we discuss technical mitigations that can help foundation models stay in line with fair use. We argue that more research is needed to align mitigation strategies with the current state of the law. Lastly, we suggest that the law and technical mitigations should co-evolve. For example, coupled with other policy mechanisms, the law could more explicitly consider safe harbors when strong technical tools are used to mitigate infringement harms. This co-evolution may help strike a balance between intellectual property and innovation, which speaks to the original goal of fair use. But we emphasize that the strategies we describe here are not a panacea and more work is needed to develop policies that address the potential harms of foundation models.” (6)
The authors of the paper from the Stanford HAI “proposed strategies to deal with the problem, from filters on the input data and the output content that recognize when AI is pushing the boundaries too far, to training models in ways more in line with fair use. ‘There’s also an exciting research agenda in the field to figure out how to make models more transformative,’ Henderson says. ‘For example, might we be able to train models to only copy facts and never exact creative expression?’”
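To make the filtering idea concrete, the sketch below is a simplified illustration of one such output-side check, assuming the deployer retains access to (an index of) its training corpus; the function names, n-gram size, and threshold are hypothetical, and where to draw the line is ultimately a legal judgment, not an engineering constant. It flags a generation when too many of its word n-grams appear verbatim in training data.

```python
# Simplified sketch of an output-side fair-use filter: flag generations
# that reproduce long verbatim runs from the training corpus. The
# n-gram size and threshold here are hypothetical placeholders.
def ngrams(text: str, n: int = 8) -> set[str]:
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(generated: str, corpus_docs: list[str], n: int = 8) -> float:
    """Fraction of the generation's n-grams found verbatim in the corpus."""
    gen = ngrams(generated, n)
    if not gen:
        return 0.0
    corpus: set[str] = set()
    for doc in corpus_docs:
        corpus |= ngrams(doc, n)
    return len(gen & corpus) / len(gen)

def release_or_block(generated: str, corpus_docs: list[str],
                     threshold: float = 0.2) -> str:
    # Block (or route to rewriting/attribution) when copying looks
    # substantial; the threshold is a policy choice, not a constant.
    if verbatim_overlap(generated, corpus_docs) > threshold:
        return "BLOCKED: output too close to training material"
    return generated
```

A production system would replace the brute-force set intersection with an indexed lookup (suffix arrays or Bloom filters are common choices) over far larger corpora; the point here is only the general shape of the mitigation the paper describes.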
Henderson does have some recommendations for coming to grips with this growing concern:
If your organization needs to take a deeper dive on any of these issues, see:
Copyright Law: An Introduction and Issues for Congress (CRS – March 2023): Given the economic and cultural significance of copyright-intensive industries, Congress frequently considers amendments to the Copyright Act, the federal law governing the U.S. copyright system. This In Focus provides an overview of copyright law and highlights areas of current and potential congressional interest.
Copyright Law and Machine Learning for AI: Where Are We and Where Are We Going? (October 2021) – Co-Sponsored by the United States Copyright Office and the United States Patent and Trademark Office: On October 26, 2021, the U.S. Copyright Office and the U.S. Patent and Trademark Office held a conference on machine learning and copyright law. Panelists explored machine learning in practice, how existing copyright laws apply to the training of artificial intelligence, and what the future may hold in this fast-moving policy space.
Legal approaches to Data: Scraping, Mining and Learning (Create UK – July 2021): The mining of big data and machine learning require the compilation of corpora (e.g., literary works, public domain material, data) that are often “available on the internet”. The collection stage is usually followed by processing and annotation of the collected data, depending on the type of learning (supervised/unsupervised) and the purpose of the algorithm. Copyright law has a direct impact on this process, as the corpora could include works protected by copyright, and any digital copy, temporary or permanent, in whole or in part, direct or indirect, has the potential to infringe copyright. Furthermore, the changes made to the collected material can amount to ‘adaptation’, and the relevant exceptions, such as research or text and data mining, might not sufficiently cover the activities of stakeholders in this area. This project analyzes case studies on data scraping, natural language processing, and computer vision to assess whether the current legal framework is well equipped for the development of AI applications, especially in the field of machine learning, or, if not, what kind of measures should be developed (legal reform, policy initiatives, licences and licence compatibility tools, etc.).
https://oodaloop.com/archive/2023/05/28/andy-warhol-and-prince-are-the-future-of-generative-ai-and-copyright-law/
https://oodaloop.com/archive/2023/06/26/is-it-time-for-an-ai-ntsb-the-artificial-intelligence-incident-database-may-already-foot-the-bill/
For next steps in ensuring your business is approaching AI with risk mitigation in mind, see Artificial Intelligence for Business Advantage.
Looking for a primer on what executives need to know about real AI and ML? See A Decision-Maker’s Guide to Artificial Intelligence.
OODAcon 2023
https://oodaloop.com/archive/2023/02/21/ooda-almanac-2023-useful-observations-for-contemplating-the-future/
The OODA C-Suite Report: Operational Intelligence for Business Leaders
https://oodaloop.com/archive/2019/02/27/securing-ai-four-areas-to-focus-on-right-now/
https://oodaloop.com/archive/2023/05/28/when-artificial-intelligence-goes-wrong-2/
https://oodaloop.com/archive/2023/06/14/improving-mission-impact-with-a-culture-that-drives-adoption/
https://oodaloop.com/ooda-original/2023/04/26/the-cybersecurity-implications-of-chatgpt-and-enabling-secure-enterprise-use-of-large-language-models/
https://oodaloop.com/archive/2023/05/08/the-ooda-network-on-the-real-danger-of-ai-innovation-at-exponential-speed-and-scale-and-not-adequately-addressing-ai-governance/
https://oodaloop.com/archive/2023/03/26/march-2023-ooda-network-member-meeting-tackled-strategy-misinforming-regulation-systemic-failure-and-the-emergence-of-new-risks/
https://oodaloop.com/archive/2023/02/01/nist-makes-available-the-voluntary-artificial-intelligence-risk-management-framework-ai-rmf-1-0-and-the-ai-rmf-playbook/
https://oodaloop.com/archive/2022/07/05/ai-ml-enabled-systems-for-strategy-and-judgment-and-the-future-of-human-computer-data-interaction-design/