The Italian Data Protection Authority last month banned ChatGPT after the OpenAI large language model (LLM)-based XaaS platform experienced a major data breach.

The official statement from the agency, known in English as the Italian SA, pointed out the following specific reasons for the ban: 

  • Personal data is collected unlawfully, no age verification system is in place for children.
  • No way for ChatGPT to continue processing data in breach of privacy laws.
  • A data breach affecting ChatGPT users’ conversations and information on payments by subscribers to the service had been reported on 20 March.
  • ChatGPT is the best-known among relational AI platforms that are capable to emulate and elaborate human conversations.
  • As confirmed by the tests carried out so far, the information made available by ChatGPT does not always match factual circumstances, so that inaccurate personal data are processed.  
  • OpenAI is not established in the EU, however, it has designated a representative in the European Economic Area. (1)

ChatGPT Suffers First Data Breach, Exposes Personal Information

“OpenAI’s ChatGPT…suffered its first major personal data breach.

The breach came during a March 20 outage and exposed payment-related and other personal information of 1.2% of the ChatGPT Plus subscribers who were active during a specific nine-hour window, according to a blog post by OpenAI Friday, March 24.

‘In the hours before we took ChatGPT offline on Monday, it was possible for some users to see another active user’s first and last name, email address, payment address, the last four digits (only) of a credit card number, and credit card expiration date. Full credit card numbers were not exposed at any time,’ OpenAI officials wrote…” (1)

The Italian Personal Data Protection Code and the European Union’s General Data Protection Regulation

Writ large, ChatGPT violates the Italian SA’s Personal Data Protection Code and the European Union’s General Data Protection Regulation (GDPR): Failures to notify users of data collection, as well as to justify the hoovering of information, would run afoul of the European Union’s General Data Protection Regulation, raising the possibility that other countries in the bloc may follow suit in cracking down on the program. (2)

Wired magazine also reports that the Italian ban “may just be the beginning of ChatGPT’s regulatory woes: 

The action is the first taken against ChatGPT by a Western regulator and highlights privacy tensions around the creation of giant generative AI models, which are often trained on vast swathes of internet data. Just as artists and media companies have complained that generative AI developers have used their work without permission, the data regulator is now saying the same for people’s personal information.

Similar decisions could follow all across Europe. In the days since Italy announced its probe, data regulators in France, Germany, and Ireland have contacted the [Italian SA] to ask for more information on its findings. ‘If the business model has just been to scrape the internet for whatever you could find, then there might be a really significant issue here,’ says Tobias Judin, the head of international at Norway’s data protection authority, which is monitoring developments. Judin adds that if a model is built on data that may be unlawfully collected, it raises questions about whether anyone can use the tools legally.

Europe’s GDPR rules, which cover the way organizations collect, store, and use people’s personal data, protect the data of more than 400 million people across the continent. This personal data can be anything from a person’s name to their IP address—if it can be used to identify someone, it can count as their personal information. Unlike the patchwork of state-level privacy rules in the United States, GDPR’s protections apply even if people’s information is freely available online. In short: Just because someone’s information is public doesn’t mean you can vacuum it up and do anything you want with it.

The [Italian SA] believes ChatGPT has four problems under GDPR:

  1. OpenAI doesn’t have age controls to stop people under the age of 13 from using the text generation system.
  2. It can provide information about people that isn’t accurate.
  3. People haven’t been told their data was collected; and
  4. There is ‘no legal basis’ for collecting people’s personal information in the massive swells of data used to train ChatGPT.

‘The Italians have called their bluff,’ says Lilian Edwards, a professor of law, innovation, and society at Newcastle University in the UK. ‘It did seem pretty evident in the EU that this was a breach of data protection law.’” (3)

What Next?  

Details of the Ban: 

  • The Italian SA imposed an immediate temporary limitation on the processing of Italian users’ data by OpenAI, the US-based company developing and managing the platform.
  • An inquiry into the facts of the case was initiated as well.
  • In its order, the Italian SA highlights that no information is provided to users and data subjects whose data are collected by OpenAI; more importantly, there appears to be no legal basis underpinning the massive collection and processing of personal data in order to ‘train’ the algorithms on which the platform relies.
  • Finally, the Italian SA emphasizes in its order that the lack of whatever age verification mechanism exposes children to receiving responses that are absolutely inappropriate to their age and awareness, even though the service is allegedly addressed to users aged above 13 according to OpenAI’s terms of service.
  • [OpenAI] will have to notify the Italian SA within 20 days of the measures implemented to comply with the order, otherwise a fine of up to EUR 20 million or 4% of the total worldwide annual turnover may be imposed. (1)
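The fine ceiling in the last bullet can be worked through as simple arithmetic. The sketch below assumes the GDPR's "whichever is higher" rule for this tier of fines (the statement quoted above says only "or"), and the turnover figures are hypothetical, not OpenAI's actual numbers. The function name is illustrative only.

```python
# Minimal sketch of the fine ceiling described above.
# Assumption: the cap is the higher of EUR 20 million and 4% of
# total worldwide annual turnover (GDPR Article 83(5)).

FIXED_CAP_EUR = 20_000_000      # flat ceiling named in the order
TURNOVER_PERCENT = 4            # percentage of worldwide annual turnover


def max_fine_eur(annual_turnover_eur: int) -> int:
    """Return the maximum possible fine in whole euros."""
    turnover_based_cap = annual_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_based_cap)


# Hypothetical turnover figures, for illustration only.
for turnover in (100_000_000, 1_000_000_000):
    print(f"turnover EUR {turnover:,} -> ceiling EUR {max_fine_eur(turnover):,}")
```

Under that assumption, the flat EUR 20 million figure only matters for companies whose worldwide turnover is below EUR 500 million; above that, the 4% figure sets the ceiling.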

The Future of Model Data:

“OpenAI isn’t alone. Many of the issues raised by the Italian regulator are likely to cut to the core of all development of machine learning and generative AI systems, experts say. The EU is developing AI regulations, but so far there has been comparatively little action taken against the development of machine learning systems when it comes to privacy.

‘There is this rot at the very foundations of the building blocks of this technology—and I think that’s going to be very hard to cure,’ says Elizabeth Renieris, senior research associate at Oxford’s Institute for Ethics in AI and author on data practices. She points out that many data sets used for training machine learning systems have existed for years, and it is likely there were few privacy considerations when they were being put together. 

‘There’s this layering and this complex supply chain of how that data ultimately makes its way into something like GPT-4,’ Renieris says. ‘There’s never really been any type of data protection by design or default.’” (4)

https://oodaloop.com/archive/2019/02/27/securing-ai-four-areas-to-focus-on-right-now/

https://oodaloop.com/archive/2023/04/08/in-an-open-letter-tristan-harris-et-al-call-for-a-pause-on-the-training-of-ai-systems-more-powerful-than-gpt-4/

https://oodaloop.com/ooda-original/2023/04/07/bill-gates-weighs-in-on-the-opportunities-and-responsibilities-of-a-chatgpt-based-ai-future/

https://oodaloop.com/ooda-original/disruptive-technology/2022/12/22/ooda-loop-2022-the-past-present-and-future-of-chatgpt-gpt-3-openai-nlms-and-nlp/

Daniel Pereira

About the Author

Daniel Pereira is research director at OODA. He is a foresight strategist, creative technologist, and an information communication technology (ICT) and digital media researcher with 20+ years of experience directing public/private partnerships and strategic innovation initiatives.