Matt Shumer, co-founder and CEO of OthersideAI, which is also known by the name of its signature AI writing assistant product, HyperWrite, has broken nearly two days of silence. He had been accused of fraud after third-party researchers were unable to replicate the supposed top performance of a new large language model (LLM) he released on Thursday, September 5. On his account on the social network X, Shumer apologized, saying he "got ahead of" himself and adding, "I know that many of you are excited about the potential for this and are now skeptical."
However, his latest statements do not fully explain why his model, Reflection 70B, has fallen short of his original performance claims in every subsequent independent test. Shumer had claimed the model was a variant of Meta's Llama 3.1 trained using the synthetic data generation platform Glaive AI. Nor has Shumer clarified precisely what went wrong. Here's a timeline:

Thursday, Sept. 5, 2024: Initial lofty claims of Reflection 70B's superior performance on benchmarks
In case you're just catching up, last week Shumer released Reflection 70B on the open-source AI community Hugging Face. In a post on X, he called it "the world's top open-source model" and posted a chart of what he said were its state-of-the-art results on third-party benchmarks. Shumer claimed the impressive performance was achieved thanks to a technique called "Reflection Tuning," which he said allows the model to assess and refine its responses for correctness before outputting them to users. VentureBeat interviewed Shumer and reported his benchmarks as he presented them, attributing them to him. We do not have the time or resources to run our own independent benchmarking, and most model providers we've covered have so far been forthright.

Friday, Sept. 6 to Monday, Sept. 9: Third-party evaluations fail to reproduce Reflection 70B's impressive results; Shumer accused of fraud