
Meta debuts slimmed-down Llama models for low-powered devices

Meta Platforms Inc. is striving to make its popular open-source large language models more accessible with the release of “quantized” versions of the Llama 3.2 1B and 3B models, designed to run on low-powered devices.

The Llama 3.2 1B and 3B models were announced at Meta’s Connect 2024 event last month. They’re the company’s smallest LLMs so far, built to meet the demand for running generative artificial intelligence on-device and in edge deployments. Now the company is releasing quantized, or lightweight, versions of those models, which come with a reduced memory footprint and support faster on-device inference with greater accuracy, the company said. It’s all in the pursuit of portability, Meta said, enabling the Llama 3.2 1B and 3B models to be deployed on resource-constrained devices while maintaining their strong performance. In a blog post today, Meta’s AI research team explained that, because of the limited runtime memory available on mobile devices, it opted to prioritize “short-context applications up to 8K” for the quantized models.

Quantization is a technique that reduces the size of large language models by lowering the precision of their model weights. Meta’s researchers said they used two different methods to quantize the Llama 3.2 1B and 3B models. The first, known as “Quantization-Aware Training with LoRA adaptors,” or QLoRA, helps to optimize performance in low-precision environments and prioritizes accuracy when quantizing LLMs. In cases where developers would rather emphasize portability at the expense of performance, a second technique, known as SpinQuant, can be used instead. With SpinQuant, Meta said, it can determine the best possible combination for compression, ensuring the model can be ported to the target device while retaining the best possible performance.
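To make the core idea concrete, here is a minimal sketch of weight quantization in plain Python. This is not Meta’s QLoRA or SpinQuant pipeline; it is a simple symmetric int8 scheme, where each 32-bit float weight is mapped to an integer in [-127, 127] via one shared scale factor, cutting storage per weight from 4 bytes to 1 at the cost of a small rounding error.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map float weights to
    integers in [-127, 127] using a single shared scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.03, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding error per weight is at most half a quantization step (scale / 2).
assert all(abs(w, ) <= scale / 2 + 1e-9 for w in
           (abs(a - b) for a, b in zip(weights, restored)))
```

Techniques like quantization-aware training go further by simulating this rounding during fine-tuning, so the model learns weights that survive the precision loss.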

Full report: Meta debuts “quantized” versions of Llama 3.2 1B and 3B models, designed to run on low-powered devices and developed in collaboration with Qualcomm and MediaTek.
