DefTruth avatar

DefTruth

AI Infra Engineer & Open-Source Developer
DiT / LLM Inference  ·  CUDA Kernels  ·  High-Performance C++ Toolkits

GitHub Stars
14+ Open-Source Projects
2 SCI Publications

About

Hi! I'm DefTruth, an AI Infra engineer (DiT / LLM) and open-source developer focused on making large-scale AI models run faster and cheaper.

My recent work centres on Diffusion Transformer inference acceleration — I built Cache-DiT, a PyTorch-native inference engine with hybrid cache acceleration (DBCache, TaylorSeer, SCM) and massive parallelism (CP, USP, TP, UAA).

I'm also deeply invested in CUDA kernel engineering: writing HGEMM, FlashAttention, and 200+ kernels from scratch using Tensor Cores, MMA, and CuTe, documented in LeetCUDA.

I also built lite.ai.toolkit, a production C++ inference toolkit supporting 100+ vision models with ORT, MNN, and TensorRT backends, as well as ffpa-attn and torchlm. I was a core contributor to PaddlePaddle's FastDeploy v1.0, which covers 160+ text and vision models, and I have also contributed to SGLang, vLLM, and Diffusers, focusing on DiT caching, parallelism, quantization, and CUDA kernels.

I curate high-quality reading lists: Awesome-LLM-Inference, Awesome-DiT-Inference and lihang-notes.

I maintain the xlite-dev organisation, write on Zhihu (知乎), and am ranked among the top open-source developers on Trendshift.

Featured Works

Publications

Contact

Feel free to reach out for discussions about DiT / LLM inference, CUDA kernel engineering, or open-source collaboration.