site:www.usenix.org - Search News

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management

Transformer-based large language models (LLMs) demonstrate impressive performance across various natural language processing tasks. Serving LLM inference for generating long contents, however, poses a ...

USENIX

Asynchronous I/O Stack: A Low-latency Kernel I/O Stack for Ultra-Low Latency SSDs

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are ...
