Distinguished Engineer - Software Defined GPUs - Startup
ID: j-13410
Job Type: Direct Hire
Remote Type: Full Remote
Compensation: $200,000 - $300,000 / yr
Cloud Native - Live Migration - Run GPUs on Kubernetes
Architect needed for Series A Startup (team of 30)
100% Remote (in North America)
Equity participation
Base Salary: $200k-$300k
Key Responsibilities:
- Drive native Kubernetes architectures.
- Ensure Kubernetes-friendly installation and user experience.
- Extend capabilities to support GPUs, TPUs, FPGAs, and other accelerators.
- Product & Customer Collaboration: Act as the technical translator for Product Management, help prioritize features appropriately, and engage with customers during early evaluations and discussions to gather market insights.
- Communication: Create detailed architecture diagrams, documents, and presentations.
- Open Source Community: Stay actively involved with CNCF and related projects.
- Enterprise-Class Solutions: Drive and deliver solutions for enterprise-class data, ML, and AI applications.
- FinOps & SRE Best Practices: Implement FinOps for cloud financial management and modern SRE practices.
Qualifications:
- Startup experience
- Entrepreneurial mindset
- 10+ years of infrastructure-level software architecture and development
- Expertise in distributed systems
- Proficiency with Linux and virtualization platforms
- Native Kubernetes expertise
- Experience with Kubernetes-based ML/AI systems (Kubeflow, Kueue, KServe, GPU Operators, DRA, Karpenter)
- Deep knowledge of ML/AI use cases and customer stories covering model development, training, inference, and hardware accelerator usage
- Proven track record of delivering complex distributed systems
- Involvement in CNCF or similar communities
- Strong leadership and team collaboration skills
- Excellent communication skills, both verbal and written
Preferred Qualifications:
- Knowledge of additional ML/AI frameworks and tools
- Experience in DevOps practices and tools
- Certification in Kubernetes or related technologies
- Awareness of FinOps and SRE best practices
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
