Why large language models struggle with long contexts

Stephen Downes

Knowledge, Learning, Community

If you've done any amount of work with a large language model, you will have encountered this problem: transformer-based models like ChatGPT become less efficient as you give them more and more content to consider. Because the attention mechanism compares every token with every other token, the cost grows quadratically with the length of the input; in short, "transformers have a scaling problem." This article considers that problem. It describes "FlashAttention," which "calculates attention in a way that minimizes the number of these slow memory operations," and discusses "scaling attention across multiple GPUs." It also looks at a Google hybrid between a transformer and a Recurrent Neural Network (RNN), and at Mamba, another RNN-based project. But it remains an open issue, because "we want models that can handle billions of tokens" without forgetting them.
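The scaling problem comes from the attention step itself. Here is a minimal NumPy sketch (my own illustration, not from the article) of standard scaled dot-product attention; the n-by-n score matrix is what makes compute and memory grow quadratically with context length, and it is precisely this intermediate that approaches like FlashAttention avoid fully materializing in slow GPU memory.

    import numpy as np

    def naive_attention(Q, K, V):
        """Standard scaled dot-product attention.

        Q, K, V: (n_tokens, d) arrays. The scores matrix is n x n,
        so memory and compute grow quadratically with context length.
        """
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)   # (n, n) -- the quadratic part
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V              # (n, d)

    # Doubling the context quadruples the number of attention scores.
    for n in (1_000, 2_000, 4_000):
        print(n, "tokens ->", n * n, "attention scores")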



Stephen Downes, Casselman, Canada
stephen@downes.ca

Copyright 2024
Last Updated: Dec 18, 2024 10:43 a.m.

Creative Commons License.
