Would you trust an AI chatbot with family planning? Investing $1 million? How about writing your wedding vows? Human-sounding bots barely existed two years ago. Now they’re everywhere. There’s ChatGPT, which kicked off the whole generative-AI craze, along with big swings from Google and Microsoft, plus countless smaller players, all with their own smooth-talking helpers.

We put five of the leading bots through a series of blind tests to gauge their usefulness. While we hoped to find the Caitlin Clark of chatbots, that isn’t quite what happened. They excel in some areas and fail in others, and they’re all evolving rapidly: during our testing, OpenAI released an upgrade to ChatGPT that improved its speed and current-events knowledge.

We wanted to see the range of responses we’d get by asking real-life questions and ordering up everyday tasks—not a scientific assessment, but one that reflects how we’ll all use these tools. Consider it the chatbot Olympics.

The contenders: ChatGPT by OpenAI, celebrated for its versatility and ability to remember user preferences. (Wall Street Journal owner News Corp has a content-licensing partnership with OpenAI.) Anthropic’s Claude, from a socially conscious startup, is geared to be inoffensive. Microsoft’s Copilot leverages OpenAI’s technology and integrates with services like Bing and Microsoft 365. Google’s Gemini taps the popular search engine for real-time responses. And Perplexity is a research-focused chatbot that cites sources with links and stays up to date.

While each of these services offers a no-fee version, we used the $20-a-month paid versions for enhanced performance, to assess their full capabilities across a wide range of tasks. (We used the latest ChatGPT GPT-4o model and Gemini 1.5 Pro model in our testing.)
Full analysis: how ChatGPT, Claude, Copilot, Gemini, and Perplexity responded to real-life questions and everyday tasks. Perplexity ranked first overall.