60% of OpenAI model’s responses contain plagiarism

02/23/2024

A new report from plagiarism detector Copyleaks found that 60% of OpenAI’s GPT-3.5 outputs contained some form of plagiarism.

Why it matters: Content creators, from authors and songwriters to The New York Times, are arguing in court that generative AI trained on copyrighted material ends up spitting out exact copies.

Copyleaks is an AI-based text analysis company that began selling plagiarism-detection tools to businesses and schools long before ChatGPT’s arrival.

GPT-3.5 was the model powering ChatGPT when it debuted, but OpenAI has since moved on to the bigger and more capable GPT-4.

Between the lines: Plagiarism takes many forms beyond simply cutting and pasting full sentences and paragraphs.

Copyleaks attempts to turn detecting plagiarism from “I know it when I see it” into an exact science. The company uses a proprietary scoring method that aggregates the rate of identical text, minor changes, paraphrased text, and other factors, and then assigns content a “similarity score.”

Per the report, for GPT-3.5, “45.7% of all outputs contained identical text, 27.4% contained minor changes, and 46.5% had paraphrased text.” These categories are not mutually exclusive: the three figures sum to more than 100%, so a single output can fall into more than one.

“A score of 0% signifies that all of the content is original, whereas a score of 100% means that none of the content is original,” per the report.

Copyleaks asked GPT-3.5 for around a thousand outputs, each around 400 words, across 26 subjects.

Full report: 60% of OpenAI model’s responses contain plagiarism.

