cutlass

Star

Here are 19 public repositories matching this topic...

bytedance / flux

Star

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

gpu cuda pytorch cutlass

Updated Aug 28, 2025
C++

SmallDoges / flash-dmattn

Star

Trainable fast and memory-efficient sparse attention

transformers pytorch english transformer triton chinese cuda-kernels cutlass attention-mechanism attention-is-all-you-need self-attention pytorch-implementation flash-attention triton-kernels dynamic-mask-attention

Updated Nov 7, 2025
Python

coderonion / awesome-cuda-and-hpc

Star

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

Updated Aug 2, 2025

DD-DuDa / Cute-Learning

Star

Examples of CUDA implementations by Cutlass CuTe

gpu cuda cutlass

Updated Jul 1, 2025
Makefile

leimao / CUTLASS-Examples

Sponsor

Star

CUTLASS and CuTe Examples

docker cuda cutlass

Updated Oct 17, 2025
Cuda

Bruce-Lee-LY / flash_attention_inference

Star

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

gpu cuda inference nvidia cutlass mha multi-head-attention llm tensor-core large-language-model flash-attention flash-attention-2

Updated Feb 27, 2025
C++

bikrammajhi / 100-days-of-GPU

Star

This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA/CUTLASS kernels, Triton spells, and PTX sorcery.

mojo cuda triton cutlass ptx nsight-compute thunderkittens

Updated Nov 2, 2025
HTML

YashasSamaga / ConvolutionBuildingBlocks

Star

GEMM and Winograd based convolutions using CUTLASS

deep-learning cuda convolution cutlass

Updated Jul 15, 2020
Cuda

yester31 / Cutlass_EX

Star

study of cutlass

cmake cuda cpp17 cutlass linux-programming parallel-programming

Updated Nov 10, 2024
Cuda

Bruce-Lee-LY / cutlass_gemm

Star

Multiple GEMM operators are constructed with cutlass to support LLM inference.

gpu cublas nvidia cutlass gemm cublaslt llm matrix-multiply tensor-core

Updated Aug 3, 2025
C++

sgl-project / whl

Star

Kernel Library Wheel for SGLang

cuda cutlass sglang flashinfer

Updated Nov 9, 2025
HTML

qdLMF / LightGlue-with-FlashAttentionV2-TensorRT

Star

A cutlass cute implementation of headdim-64 flashattentionv2 TensorRT plugin for LightGlue. Run on Jetson Orin NX 8GB with TensorRT 8.5.2.

cuda transformer cutlass cute tensorrt feature-matching multihead-attention superpoint lightglue flash-attention flash-attention-2

Updated Mar 3, 2025
Cuda

cjmcv / ai-infra-notes

Star

Reading notes on the open source code of AI infrastructure (sglang, llm, cutlass, hpc, etc.)

hpc gpu cuda inference simd cutlass heterogeneous-computing mlsys llm sglang

Updated Nov 2, 2025

Bruce-Lee-LY / DeepGEMMPerTensor

Star

DeepGEMMPerTensor: clean and efficient FP8 GEMM per tensor kernels without scales

gpu cuda nvidia cutlass tensor-core deep-gemm fp8-gemm

Updated Sep 7, 2025
Python

digital-nomad-cheng / tvm_project_course

Star

neural-network compiler cuda cutlass tensorrt tvm dl-compiler

Updated Nov 2, 2023
Python

Routhleck / blocksparse-pytorch-implement

Star

pytorch implements block sparse

python cuda pytorch matrix-multiplication cutlass blocksparse tilesparse

Updated May 13, 2023
C++

prateekshukla1108 / cutlass3

Star

Docs

cutlass

Updated May 14, 2025
HTML

peterlau123 / Lolly

Star

Lightweight and production level C++ Open source Library

c cpp cuda cutlass

Updated May 7, 2025
C++

jiaau / kernels

Star

This repository showcases common optimization techniques for kernels.

kernel cpp hpc cuda cutlass cute

Updated Aug 7, 2025
Cuda

Improve this page

Add a description, image, and links to the cutlass topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the cutlass topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cutlass

Here are 19 public repositories matching this topic...

bytedance / flux

SmallDoges / flash-dmattn

coderonion / awesome-cuda-and-hpc

DD-DuDa / Cute-Learning

leimao / CUTLASS-Examples

Bruce-Lee-LY / flash_attention_inference

bikrammajhi / 100-days-of-GPU

YashasSamaga / ConvolutionBuildingBlocks

yester31 / Cutlass_EX

Bruce-Lee-LY / cutlass_gemm

sgl-project / whl

qdLMF / LightGlue-with-FlashAttentionV2-TensorRT

cjmcv / ai-infra-notes

Bruce-Lee-LY / DeepGEMMPerTensor

digital-nomad-cheng / tvm_project_course

Routhleck / blocksparse-pytorch-implement

prateekshukla1108 / cutlass3

peterlau123 / Lolly

jiaau / kernels

Improve this page

Add this topic to your repo