Location
Guerrero
Job Type
Full-time
Posted
July 05, 2026
Job Description
Responsibilities
Develop and optimize LLM serving systems based on the DeepX NPU.
Design and implement runtimes and inference engines for LLMs.
Analyze LLM serving performance and resolve bottlenecks.

Qualifications
Bachelor's degree or higher in Computer Science or a related field.
Experience in software development using C/C++ and Python.
Understanding of LLMs or deep learning frameworks (e.g., TensorFlow, PyTorch).
Experience with Linux-based development environments.
Basic knowledge of computer architecture and parallel processing.

Preferred Qualifications
Experience with AI accelerator hardware such as NPUs or GPUs.
Hands-on experience with model compilers/runtimes like ONNX, TVM, or TensorRT.
Knowledge of model optimization techniques such as quantization and pruning.
Project experience with LLMs such as Mistral, LLaMA, or GPT.
Passion to grow fast in a dynamic startup environment.

Recruitment Process
Application Review
Phone Interview
Technical Interview
Organizational Cu...
Ready to Apply?
Submit your application for [SW] LLM Serving SW Engineer (Kilómetro 20) at Link-Worldwide