Site Reliability Engineer - Machine Learning Systems (Singapore) Technology - Backend Singapore[...]
Location
Singapore
Job Type
Full-time
Posted
June 23, 2026
Job Description
Site Reliability Engineer - Machine Learning Systems (Singapore)
Job Code: A A
Responsibilities
- Ensure our ML systems operate efficiently for large model deployment, training, evaluation, and inference.
- Maintain stability of offline tasks/services across multi‑data center, multi‑region, and multi‑cloud scenarios.
- Manage resource planning, cost, and budget, including computing and storage resources.
- Implement global system disaster recovery and cluster machine governance, and enhance business service stability, resource utilization, and operational efficiency.
- Build software tools, products, and systems to monitor and manage ML infrastructure and services efficiently.
- Participate in the global team roster that provides on‑call support for systems and business services.
Qualifications
- Bachelor’s degree or above in Computer Science, Computer Engineering, or related fields.
- Stro...
Ready to Apply?
Submit your application for Site Reliability Engineer - Machine Learning Systems (Singapore) Technology - Backend Singapore[...] at ByteDance