In certain corners of the tech industry, it’s an article of faith that training artificial intelligence systems on larger amounts of online data will allow these tools to get better over time — possibly to the point of outperforming humans on certain tasks. But a new research paper is casting some doubt on that approach and raising alarms about what might be a fatal flaw in how AI systems are developed.

In the paper, published in Nature in July, researchers find that when AI models are trained on data that includes AI-generated content — as will likely be increasingly common — they eventually end up with deteriorated performance, a phenomenon dubbed “model collapse.”

The findings add to growing skepticism about the long-term trajectory of AI and come at a time when Wall Street is already questioning whether Big Tech’s massive investments in AI development will ultimately pay off.

AI chatbots such as ChatGPT are powered by large language models trained on an almost inconceivable amount of data (trillions of words, in some cases) pulled from web pages, articles, comments sections and more. With these vast datasets, AI companies have been able to build products that can spit out shockingly relevant responses to user queries.

But some AI-watchers have raised concerns that these models will eventually get significantly less accurate and “collapse” if they are trained on content that was generated by AI rather than actual humans. One 2023 paper on model collapse showed that AI images of humans became increasingly distorted after the model re-trained on “even small amounts of their own creation.” The researchers likened the phenomenon to an AI system being “poisoned” by its own work.
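The mechanism behind model collapse can be illustrated with a toy simulation (a sketch of the general idea, not the Nature paper’s actual code or models). Here a “model” is just a categorical distribution over 50 tokens; each generation, it produces a small synthetic corpus and is refit on that corpus alone. Tokens that happen to draw zero samples get probability zero and can never reappear, so the distribution steadily loses diversity — the statistical analogue of a model being poisoned by its own output:

```python
import numpy as np

def one_generation(probs, n_samples, rng):
    """Sample a synthetic 'corpus' from the current model, then refit on it.

    Any token that draws zero samples ends up with estimated probability
    zero and can never reappear in later generations.
    """
    draws = rng.choice(len(probs), size=n_samples, p=probs)
    counts = np.bincount(draws, minlength=len(probs))
    return counts / counts.sum()

rng = np.random.default_rng(0)
vocab = 50
probs = np.full(vocab, 1.0 / vocab)   # generation 0: uniform over 50 "tokens"

support = [int(np.count_nonzero(probs))]
for _ in range(30):                    # 30 rounds of training on own output
    probs = one_generation(probs, n_samples=60, rng=rng)
    support.append(int(np.count_nonzero(probs)))

# The number of tokens the model can still produce only ever shrinks.
print("surviving tokens per generation:", support)
```

The vocabulary size and sample counts here are arbitrary; the point is structural: once the rare “tail” of the data distribution is undersampled and dropped, no amount of further self-training can recover it.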
Full research: Why AI Researchers Are Worried About ‘Model Collapse.’