Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper
SpargeAttention: A training-free sparse attention method that accelerates inference for any model.
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Efficient Triton implementation of Native Sparse Attention.
The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression"
Code for the paper [ICLR 2025 Oral] "FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference"
Demo code for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
[TIP-2025] PyTorch implementation of "Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution"
Building Native Sparse Attention
Text Summarization Modeling with three different Attention Types
Integrating QC techniques into Sparse Attention for Transformers
Dynamic Attention Mask (DAM) generates adaptive sparse attention masks per layer and head for Transformer models, enabling long-context inference with lower compute and memory overhead, without fine-tuning.
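Most of the repositories above implement variations of the same core idea: restrict each query to a subset of keys through a sparsity mask that may differ per layer and per head. The following is a minimal, illustrative PyTorch sketch of that idea; the banded per-head windows are hypothetical, and this is not the API of any repository listed here.

import torch

def sparse_attention(q, k, v, keep_mask):
    # q, k, v: [batch, heads, seq, dim]; keep_mask: [heads, seq, seq] bool,
    # True where a query position is allowed to attend to a key position.
    scale = q.size(-1) ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale            # [b, h, s, s]
    scores = scores.masked_fill(~keep_mask.unsqueeze(0), float("-inf"))
    return torch.matmul(torch.softmax(scores, dim=-1), v)

# Example: a banded (local-window) mask with a different window per head,
# loosely in the spirit of per-head sparsity patterns.
batch, heads, seq, dim = 2, 4, 128, 64
q, k, v = (torch.randn(batch, heads, seq, dim) for _ in range(3))
idx = torch.arange(seq)
windows = [8, 16, 32, 64]  # hypothetical per-head window sizes
keep_mask = torch.stack([(idx[:, None] - idx[None, :]).abs() <= w for w in windows])
out = sparse_attention(q, k, v, keep_mask)                            # [2, 4, 128, 64]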