The platform optimizes cloud resources for stateful Kubernetes, HPC, and AI workloads, including GPUs.
End users are DevOps and CloudOps teams managing complex, distributed workloads across hundreds of clusters and hundreds of thousands of nodes.
Dashboards provide real-time observability, metrics, and alerts at hyperscale.
The Role
Design and implement a modern UI for a multi-tenant, infrastructure-scale platform, including dashboards for real-time observability across 100k+ nodes
Develop backend APIs and integrations with Golang
Lead two software engineers (both full stack, with an emphasis on the UI)
Collaborate with two other dev teams (both full stack, with an emphasis on platform and backend)
Experience & Skills
10+ years of software engineering experience
Expertise in Go, React, Next.js, TypeScript
Proven experience building scalable UIs for multi-cluster, multi-tenant environments, with modern frontend architecture and patterns, UI state management, and tooling
Extensive Golang skills, including API design and data flow optimization
Python, Redis, caching methods
Experience with Kubernetes (configuration, pods, operators, CRDs) is a plus
Prometheus, Grafana, Loki, OpenTelemetry
Authentication patterns and IAM (Keycloak or similar) are a plus
Prior startup or incubator experience (preferred)
Degree(s) in CS or similar (preferred)