Location: Hybrid (Alpharetta, GA – 3 days/week in office)
Type: Full-Time

Are you a passionate Site Reliability Engineer with a keen eye for performance, scalability, and automation? Do you thrive in fast-paced environments, ensuring seamless production systems that meet high availability and reliability standards? If so, we want to hear from you!

We are looking for a skilled Site Reliability Engineer (SRE) to join our dynamic team. In this role, you will play a crucial part in enhancing the stability, performance, and reliability of our production systems. You’ll work closely with development, DevOps, and security teams to improve observability, optimize system performance, and ensure production readiness. From monitoring to automation, you’ll make a direct impact on our cloud infrastructure and service reliability.

As an SRE, you will:

1. Monitoring & Observability:

  • Maintain and enhance system observability using tools like New Relic and Graylog (or similar).

  • Develop actionable alerts and dashboards to track service health and performance metrics.

2. Reliability Engineering:

  • Implement and maintain reliable, scalable systems, focusing on capacity planning, performance optimization, and fault tolerance.

  • Collaborate on defining and monitoring SLIs, SLOs, and SLAs for service reliability.

3. Automation & Infrastructure Operations:

  • Automate operational tasks, minimizing manual interventions.

  • Manage Kubernetes workloads on AWS EKS, ensuring security and stability.

  • Leverage HashiCorp Vault to manage secrets and ensure compliance.

4. Incident & Problem Management:

  • Participate in the on-call rotation and resolve production incidents swiftly.

  • Troubleshoot production issues, perform root cause analysis, and implement permanent fixes.

  • Lead post-incident reviews and follow through on remediation actions.

5. Collaboration & Continuous Improvement:

  • Work with DevOps teams to enhance CI/CD pipelines for production readiness.

  • Partner with development teams to embed resilience and observability into applications.

6. Documentation & Knowledge Sharing:

  • Create and maintain operational runbooks, escalation procedures, and production playbooks.

What We’re Looking For

  • 6+ years of experience in SRE, DevOps, or a similar role.

  • Expertise with AWS (EKS, EC2, S3, Route 53, IAM).

  • 6+ years managing production Kubernetes workloads.

  • Hands-on experience with monitoring tools like New Relic or Graylog.

  • Experience with HashiCorp Vault (or similar secrets management tools).

  • Proficiency in automation and CI/CD using GitHub Actions, GitLab, Helm, and ArgoCD.

  • Expertise in Infrastructure as Code (IaC), specifically with Terraform.

  • Strong scripting skills in Python, Bash, or similar languages.

  • Solid troubleshooting, debugging, and root cause analysis skills.

  • A willingness to participate in a 24/7 on-call rotation.

Bonus Skills (Nice to Have)

  • AWS Certification

  • Familiarity with the .NET application stack

  • Exposure to multi-cloud environments

  • Experience with Rancher for managing Kubernetes clusters on-prem

  • Familiarity with Packer for building Golden AMIs

Why Join Us?

  • Impactful Work: You’ll be directly involved in shaping the reliability and scalability of our production systems.

  • Hybrid Flexibility: Work from our Alpharetta office 3 days a week and enjoy flexibility on the other days.

  • Collaborative Culture: Join a passionate team of engineers and problem-solvers who believe in innovation and continuous improvement.

  • Growth & Development: We support your personal and professional growth with opportunities for learning and certifications.

  • Competitive Compensation: We offer an attractive salary, benefits, and perks.

About Us

We’re a growing company in a unique industry, committed to delivering reliable, high-performance solutions for our customers. If you’re a detail-oriented, proactive problem-solver with a passion for reliability and automation, you’ll thrive here.

Ready to take the next step in your SRE career? Apply now and help us build the future of reliable systems!
