April 28, 2024

Valley Post


Apple wants to store LLM weights in flash memory to bring AI to mobile phones and laptops

Apple has been experimenting for a long time with the large language models (LLMs) that drive most AI applications today.

Now we learn that the company wants to put these large language models at the service of users of its devices and services in the best possible way. That is a difficult task, however, because it demands substantial resources, both computational and memory.

Traditionally, running an LLM requires AI accelerators coupled with enough DRAM to hold the model weights. But Apple recently published a paper revealing that the company plans to bring large language models to devices with limited memory. The method stores the LLM in NAND flash memory and builds an inference cost model that aligns with the behavior of flash, guiding optimization in two key areas: reducing the amount of data transferred from flash and reading data in larger, contiguous chunks. Instead of keeping all the model weights in DRAM, Apple stores them in flash and pulls them "on demand" into DRAM only when they are needed.
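To make the "on demand" idea concrete, here is a minimal sketch of flash-resident weights being pulled into DRAM only when a computation needs them. It uses a memory-mapped file as a stand-in for flash and is purely illustrative; the file name, matrix dimensions, and the 5% sparsity figure are assumptions for the example, not details from Apple's paper.

```python
import numpy as np

# Illustrative sketch, not Apple's implementation.
ROWS, COLS = 4096, 1024

# Create a placeholder weight file once; it stands in for weights stored on flash.
np.memmap("layer0.bin", dtype=np.float16, mode="w+", shape=(ROWS, COLS)).flush()

# Map the file without reading it: nothing enters DRAM until it is accessed.
flash_weights = np.memmap("layer0.bin", dtype=np.float16, mode="r", shape=(ROWS, COLS))

def load_rows_on_demand(row_ids):
    """Copy only the requested weight rows from flash into DRAM."""
    # Fancy indexing on a memmap materializes just these rows in memory;
    # sorting the indices keeps the reads closer to contiguous.
    return np.ascontiguousarray(flash_weights[np.sort(row_ids)])

# Suppose a predictor says only ~5% of neurons fire for the current token:
active = np.random.choice(ROWS, size=ROWS // 20, replace=False)
dram_chunk = load_rows_on_demand(active)   # small transfer instead of the whole matrix
print(dram_chunk.shape)                    # (204, 1024)
```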

Apple's flash-based framework rests on two core techniques, one called "windowing" and the other "row-column bundling." Together, they make it possible to run models up to twice the size of the available DRAM, with a 4-5x increase in inference speed on the CPU and a 20-25x increase on the GPU compared with naive loading approaches.
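A rough sketch of how those two ideas fit together might look like the following; the class name, cache size, and layer dimensions are illustrative choices, not Apple's code. Windowing keeps the weights of recently active neurons cached in DRAM so that only newly activated neurons trigger flash reads, while row-column bundling stores the up- and down-projection weights of each FFN neuron contiguously so a single read fetches both.

```python
import numpy as np
from collections import OrderedDict

HIDDEN, FFN = 1024, 4096

# Row-column bundling: the i-th column of the up-projection and the i-th row of
# the down-projection sit next to each other, so the weights for one FFN neuron
# arrive in a single contiguous read. (An in-memory array stands in for flash here.)
bundled = np.zeros((FFN, 2 * HIDDEN), dtype=np.float16)
# bundled[i, :HIDDEN]  -> up_proj[:, i]
# bundled[i, HIDDEN:]  -> down_proj[i, :]

class NeuronWindow:
    """Windowing: keep weights of recently active neurons resident in DRAM."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()          # neuron id -> bundled weights in DRAM

    def update(self, active_ids, flash):
        reads = 0
        for i in active_ids:
            if i in self.cache:
                self.cache.move_to_end(i)            # already resident, no flash read
            else:
                self.cache[i] = np.array(flash[i])   # one contiguous read from "flash"
                reads += 1
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)       # evict the stalest neuron
        return reads

window = NeuronWindow(capacity=512)
active = np.random.choice(FFN, size=200, replace=False)
reads = window.update(active, bundled)
print(f"flash reads for this token: {reads} of {len(active)} active neurons")
```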

The combination of sparsity awareness, context-adaptive loading, and hardware-oriented design paves the way for practical large language model inference on devices with limited memory, such as SoCs with 8, 16, or 32 GB of available DRAM. Since DRAM is far more expensive per gigabyte than NAND flash, even memory-constrained devices such as smartphones can hold an LLM with several billion parameters in flash, even when the available DRAM alone is not sufficient for the task. If you want to dig deeper into the technique Apple is proposing, you can read the related paper here.
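A quick back-of-the-envelope calculation (with illustrative numbers, not figures from the paper) shows why flash matters here:

```python
# Rough footprint of a "several billion parameter" model in fp16.
params = 7e9
bytes_per_weight = 2                          # fp16
model_gb = params * bytes_per_weight / 2**30
print(f"~{model_gb:.0f} GB of weights")       # ~13 GB: larger than an 8 GB SoC's DRAM,
                                              # but comfortably within typical NAND flash.
```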
