Location
Beijing
Job Type
Full-time
Posted
May 30, 2026
Job Description
NVIDIA is a leading company in AI computing. At NVIDIA, our employees are passionate about AI, HPC, visual computing, and gaming. Our Solution Architect (SA) team focuses on bringing NVIDIA's new technologies into different industries. We help design the architecture of AI computing platforms and analyze AI and HPC applications to deliver value to customers, focusing on defining and solving computational challenges in LLM inference and training acceleration, as well as network communication and data transfer optimization.
What You'll Be Doing:
+ Contribute to the development of open-source inference frameworks such as SGLang and vLLM, including feature and operator development, performance optimization, and model support, in collaboration with the community.
+ Develop and optimize KV cache offloading frameworks for LLM workloads, supporting multi-level cache offloading and reuse across CPU, SSD, and remote storage to improve inference efficiency. (Team project: FlexKV)
+ Drive R&D on co...
Ready to Apply?
Submit your application for Senior Deep Learning Solution Architect at NVIDIA