Start your day with intelligence. Get The OODA Daily Pulse.

OpenAI breach is a reminder that AI companies are treasure troves for hackers

There’s no need to worry that your secret ChatGPT conversations were obtained in a recently reported breach of OpenAI’s systems. The hack itself, while troubling, appears to have been superficial — but it’s reminder that AI companies have in short order made themselves into one of the juiciest targets out there for hackers. The New York Times reported the hack in more detail after former OpenAI employee Leopold Aschenbrenner hinted at it recently in a podcast. He called it a “major security incident,” but unnamed company sources told the Times the hacker only got access to an employee discussion forum. (I reached out to OpenAI for confirmation and comment.) No security breach should really be treated as trivial, and eavesdropping on internal OpenAI development talk certainly has its value. But it’s far from a hacker getting access to internal systems, models in progress, secret roadmaps, and so on. But it should scare us anyway, and not necessarily because of the threat of China or other adversaries overtaking us in the AI arms race. The simple fact is that these AI companies have become gatekeepers to a tremendous amount of very valuable data. Let’s talk about three kinds of data OpenAI and, to a lesser extent, other AI companies created or have access to: high-quality training data, bulk user interactions, and customer data. It’s uncertain what training data exactly they have, because the companies are incredibly secretive about their hoards. But it’s a mistake to think that they are just big piles of scraped web data. Yes, they do use web scrapers or datasets like the Pile, but it’s a gargantuan task shaping that raw data into something that can be used to train a model like GPT-4o. A huge amount of human work hours are required to do this — it can only be partially automated.

Full report : OpenAI breach is a reminder that AI companies are treasure troves for hackers,