Senior Site Reliability Engineer

San Francisco, California

ID: j-5917

Job Type: Direct Hire

Remote Type: On-Site

Compensation: $170,000 - $250,000 / yr

About the Senior Site Reliability Engineer role

We’re seeking a senior infrastructure-focused engineer to help operate and scale a complex production platform where reliability, performance, and visibility are first-class concerns. You’ll work closely with software engineers and data-focused teams to design, automate, and run the systems that support large-scale data processing, machine learning workloads, and low-latency services.

This role is best suited for a hands-on infrastructure or systems engineer with strong SRE experience, rather than a narrowly scoped, cloud-only SRE.

The work spans the full infrastructure lifecycle, from architecture and automation to incident response and continuous improvement. You’ll have broad ownership across the stack, with direct responsibility for infrastructure architecture decisions, operational practices, and how reliability and observability are implemented as systems scale.

As the platform scales to support larger data volumes, more compute-intensive workloads, and lower-latency services, this role plays a critical part in ensuring reliability keeps pace with growth.

Salary/OTE: $170,000 - $250,000
Location: San Francisco, CA (onsite)

What You’ll Do

Architect, operate, and evolve scalable infrastructure supporting data-driven and compute-intensive workloads, including ownership of key architectural tradeoffs
Improve platform stability and performance through automation, observability, and capacity forecasting
Build and maintain deployment workflows, release automation, and configuration management systems
Define, own, and continuously improve operational practices including monitoring, alerting, on-call processes, and incident response workflows
Partner cross-functionally to embed reliability and performance best practices into engineering workflows
Maintain a strong security and operational posture across production environments
Lead incident reviews and drive long-term corrective and preventative improvements

This role is on-site in San Francisco to enable close collaboration with platform, data, and hardware-adjacent engineering teams.

What Will Help You Succeed

8+ years of experience in infrastructure, SRE, or systems engineering roles
5+ years working close to physical infrastructure, systems administration, and/or networking environments supporting production workloads
Deep experience with containerized systems and orchestration platforms (Docker, Kubernetes)
Strong command of Linux internals, networking fundamentals, and performance optimization
Hands-on experience with infrastructure-as-code tools such as Terraform and Ansible
Ability to write and maintain clean, reliable automation using tools such as Bash and Python
Practical experience implementing observability stacks (metrics, logs, tracing)
Familiarity with modern CI/CD pipelines and deployment tooling
Strong troubleshooting skills across distributed systems
Clear communicator who works effectively across engineering disciplines
No sponsorship available

Experience taking systems from earlier-stage architectures to more mature, production-hardened platforms is highly valued.
