In 2016, an AI named AlphaGo proved to the world that AI could go beyond our capabilities by defeating a human champion at the board game Go: the first superhuman AI. Since then, humans have reached another milestone, the creation of the first general-purpose AI models, with ChatGPT as the best-known example. Fascinatingly, in 2024 we may see both worlds collide with the emergence of the first general-purpose superhuman model.

However, a recent paper by OpenAI argues that we aren't ready to steer such models, which poses a considerable, even potentially catastrophic, risk to humanity, to the point that the company is pouring billions into solving this problem. Luckily, the paper also shows signs of hope thanks to what they call weak-to-strong generalization, and it offers plenty of insight into what this is and how it could potentially save us.

An essential piece in building natural language assistants like ChatGPT is our capacity to align them. But what do we mean by that? A Generative Pretrained Transformer (GPT) is a natural language model trained in an unsupervised manner, sometimes referred to as self-supervised: the Transformer sees trillions of words and, at each step, predicts which word from its vocabulary is most likely to come next.
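To make that last idea concrete, here is a minimal sketch of next-word prediction, not a real Transformer: the model assigns a score (a logit) to every word in a toy vocabulary, and a softmax turns those scores into probabilities. The vocabulary and the logit values are invented for illustration.

```python
import math

# Toy vocabulary; a real GPT has tens of thousands of tokens.
vocab = ["cat", "sat", "mat", "dog"]

def softmax(logits):
    # Subtract the max for numerical stability, then normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores the model might produce for the context "the cat ...":
logits = [0.1, 3.2, 0.5, 0.8]
probs = softmax(logits)

# The predicted next word is the vocabulary entry with the highest probability.
prediction = vocab[probs.index(max(probs))]
print(prediction)  # "sat"
```

Training simply nudges the logits so that, across trillions of examples, the word that actually came next receives a higher probability.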
Full critique: OpenAI Just Proved Superhuman Artificial Intelligence Paradigm and its Growing Risks for Humans.