AI Infra Engineer & Open-Source Developer
DiT / LLM Inference · CUDA Kernels · High-Performance C++ Toolkits
Hi! I'm DefTruth, an AI Infra engineer (DiT / LLM) and open-source developer focused on making large-scale AI models run faster and cheaper.
My recent work centres on Diffusion Transformer inference acceleration — I built Cache-DiT, a PyTorch-native inference engine with hybrid cache acceleration (DBCache, TaylorSeer, SCM) and massive parallelism (CP, USP, TP, UAA).
I'm also deeply invested in CUDA kernel engineering: writing HGEMM, FlashAttention, and 200+ other kernels from scratch using Tensor Cores, MMA, and CuTe, all documented in LeetCUDA.
I also built lite.ai.toolkit, a production C++ inference toolkit supporting 100+ vision models with ONNXRuntime (ORT), MNN, and TensorRT backends. I created ffpa-attn and torchlm, and was a core contributor to PaddlePaddle's FastDeploy v1.0, which covers 160+ text and vision models. I have also contributed to SGLang, vLLM, and Diffusers, focusing on DiT caching, parallelism, quantization, and CUDA kernels.
I curate high-quality reading lists: Awesome-LLM-Inference, Awesome-DiT-Inference and lihang-notes.
I maintain the xlite-dev organisation, write on Zhihu (知乎), and am ranked among the top open-source developers on Trendshift.
Feel free to reach out for discussions about DiT / LLM inference, CUDA kernel engineering, or open-source collaboration.