Chinese artificial intelligence (AI) start-up DeepSeek has introduced a novel approach to improving the reasoning capabilities of large language models (LLMs), as the public awaits the release of the company’s next-generation model. In collaboration with researchers from Tsinghua University, DeepSeek developed a technique that combines two methods, generative reward modelling (GRM) and self-principled critique tuning, according to a paper published on Friday.

The dual approach aims to enable LLMs to deliver better and faster answers to general queries. The resulting DeepSeek-GRM models outperformed existing methods and “achieved competitive performance” with strong public reward models, the researchers wrote.

Reward modelling is a process that steers an LLM’s outputs towards human preferences. DeepSeek intends to make the GRM models open source, the researchers said, though they gave no timeline.

The academic paper, published on the online scientific paper repository arXiv, comes amid speculation about the start-up’s next move following the global attention garnered by the firm’s V3 foundation model and R1 reasoning model.
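The core idea behind generative reward modelling, as the term suggests, is that the judge model writes out principles and a critique in text before assigning a numeric score, rather than emitting a bare scalar. The Python sketch below illustrates that general shape only: `call_llm` is a hypothetical stand-in for a real inference backend, and the prompt wording and score format are assumptions for illustration, not the paper’s actual templates.

```python
# Illustrative sketch of generative reward modelling (GRM): the reward
# model generates principles and a critique as text, then a numeric
# score is parsed out of that text and used to rank candidate responses.

import re
from dataclasses import dataclass

@dataclass
class Judgement:
    critique: str  # the model's written reasoning about the response
    score: int     # numeric score parsed from the critique text

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real inference API here."""
    # Stubbed output so the example runs end to end.
    return ("Principle: answers should be factual and concise.\n"
            "Critique: the response is accurate and to the point.\n"
            "Score: 8")

def grade(query: str, response: str) -> Judgement:
    # Ask the model to state principles, critique, then score.
    prompt = (
        "First state the principles a good answer must satisfy, "
        "then critique the response against them, "
        "then end with a line 'Score: <1-10>'.\n"
        f"Query: {query}\nResponse: {response}"
    )
    text = call_llm(prompt)
    match = re.search(r"Score:\s*(\d+)", text)
    score = int(match.group(1)) if match else 0
    return Judgement(critique=text, score=score)

def pick_best(query: str, responses: list[str]) -> str:
    # Rank candidate responses by their generated scores.
    return max(responses, key=lambda r: grade(query, r).score)

if __name__ == "__main__":
    print(pick_best("What is the capital of France?", ["Paris", "Lyon"]))
```

Because the score emerges from generated text rather than a fixed scoring head, this style of reward model can, in principle, explain its judgements and be sampled multiple times at inference to refine them.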