AI Infra Engineer & Open-Source Developer
DiT / LLM Inference · CUDA Kernels · High-Performance C++ Toolkits
Hi! I'm DefTruth, an AI Infra engineer (DiT / LLM) and open-source developer focused on making large-scale AI models run faster and cheaper.
My recent work centres on Diffusion Transformer inference acceleration — I built Cache-DiT, a PyTorch-native inference engine with hybrid cache acceleration (DBCache, TaylorSeer, SCM) and massive parallelism (CP, USP, TP, UAA).
I'm also deeply invested in CUDA kernel engineering: writing HGEMM, FlashAttention, and 200+ other kernels from scratch using Tensor Cores, MMA, and CuTe, all documented in LeetCUDA.
I also built lite.ai.toolkit, a production C++ inference toolkit supporting 100+ vision models with ONNXRuntime (ORT), MNN, and TensorRT backends. I created ffpa-attn and torchlm, and was a core contributor to PaddlePaddle's FastDeploy v1.0, which covers 160+ text and vision models. I have also contributed to SGLang, vLLM, and Diffusers, focusing on DiT caching, parallelism, quantization, and CUDA kernels.
I curate high-quality reading lists: Awesome-LLM-Inference, Awesome-DiT-Inference and lihang-notes.
I maintain the xlite-dev organisation, write on Zhihu (知乎), and am ranked among the top open-source developers on Trendshift.
Feel free to reach out for discussions about DiT / LLM inference, CUDA kernel engineering, or open-source collaboration.