Cluster Ops
Strong Compute
Operations
San Francisco, CA, USA
Posted on May 27, 2025
We manage thousands of GPUs today and need to grow this with reliability, security and performance in mind.
You’ll be working on ops for multi-provider GPU clusters.
When applying, please speak to:
- GPU type and count you’ve managed
- Providers you’ve worked with, e.g. hyperscalers, neoclouds, on-prem.
- Interconnect you’ve managed.
- What tooling you used, e.g. for provisioning, scheduling, storage, monitoring, cost management.
- What tooling you developed.
Our culture
- 🚀 We move fast. We ship weekly: new features, improvements, and fixes go live quickly. Our infra runs cluster scale-up tests daily.
- 👥 We test big. Every month, we stress test with large groups of users face to face, get real-world feedback, and iterate rapidly.
- 💻 We build together. Weekend hackathons push boundaries, drive innovation, and help us level up as a team.
- 🔄 We iterate relentlessly. Direct user feedback shapes our roadmap—we release, test, refine, and keep moving.
- ✈️ We travel when needed. Engineers may travel between SF and Sydney to run events, attend conferences, and meet with clients.