Senior Site Reliability Engineer

San Francisco, California
ID: j-5917
Job Type: Direct Hire
Remote Type: On-Site
Compensation: $170,000 - $250,000 / yr

About the Role

We’re seeking a senior infrastructure-focused engineer to help operate and scale a complex production platform where reliability, performance, and visibility are first-class concerns. You’ll work closely with software engineers and data-focused teams to design, automate, and run the systems that support large-scale data processing, machine learning workloads, and low-latency services.

This role is highly hands-on and spans the full infrastructure lifecycle, from architecture and automation to incident response and continuous improvement. You’ll have broad ownership across the stack and meaningful influence over how systems are built and operated as they grow.

Salary/OTE: $170,000 - $250,000
Location: San Francisco, CA (onsite)


What You’ll Do
  • Architect and operate scalable infrastructure supporting data-driven and compute-intensive workloads
  • Improve platform stability and performance through automation, observability, and capacity forecasting
  • Build and maintain deployment workflows, release automation, and configuration management systems
  • Define and run operational practices including monitoring, alerting, on-call processes, and runbooks
  • Partner cross-functionally to embed reliability and performance best practices into engineering workflows
  • Maintain a strong security and operational posture across production environments
  • Lead incident reviews and drive long-term corrective and preventative improvements

What Will Help You Succeed

  • 8+ years of experience in infrastructure, SRE, or systems engineering roles
  • 5+ years working with physical infrastructure, systems administration, and/or networking environments
  • Deep experience with containerized systems and orchestration platforms (Docker, Kubernetes)
  • Strong command of Linux internals, networking fundamentals, and performance optimization
  • Hands-on experience with infrastructure-as-code tools such as Terraform and Ansible
  • Ability to write and maintain clean, reliable automation in languages such as Bash and Python
  • Practical experience implementing observability stacks (metrics, logs, tracing)
  • Familiarity with modern CI/CD pipelines and deployment tooling
  • Strong troubleshooting skills across distributed systems
  • Clear communicator who works effectively across engineering disciplines
  • No sponsorship available
