Candidates must be currently authorized to work in the U.S. without sponsorship; C2C arrangements are not accepted.

Overview

We are seeking a pragmatic and skilled Senior Data Engineer to join a growing team focused on stabilizing and evolving core data infrastructure. This role will play a critical part in enabling machine learning and analytics by improving data quality, reliability, and scalability at the foundational level.

Why This Role Matters

Many strategic initiatives, including those involving AI/ML, are currently blocked by foundational data challenges such as:

  • Inconsistent or poorly structured data

  • Manual, fragile workflows

  • Schema limitations and lack of observability

  • Low-maturity DevOps and data platform tooling

A hands-on, systems-oriented Senior Engineer will bring the right level of expertise and ownership to address these core issues.

Key Responsibilities

  • Refactor and evolve the data schema (e.g., PostgreSQL) to support scalability and data integrity

  • Build, optimize, and maintain batch and streaming pipelines using tools such as Airflow, Kafka, or equivalent (see the pipeline sketch after this list)

  • Develop reliable derived datasets to support analytics and reporting

  • Enhance data validation, observability, and logging across all pipelines

  • Support clean, structured inputs for downstream AI/ML applications

  • Collaborate with backend engineering to integrate data solutions into monolithic or microservice-based architectures

  • Contribute to internal data documentation and enforce engineering best practices
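
By way of illustration, here is a minimal sketch of the kind of batch pipeline work described above, assuming Airflow 2.x; the DAG id, schedule, and task callables are hypothetical placeholders, not a description of the actual stack:

```python
# Illustrative only: a two-step daily ingest DAG, assuming Airflow 2.x.
# The DAG id, tables, and callables are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_events(**context):
    """Pull the day's raw events from the source system (placeholder)."""
    ...


def load_events(**context):
    """Clean the extracted rows and upsert them into Postgres (placeholder)."""
    ...


with DAG(
    dag_id="daily_events_ingest",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_events)
    load = PythonOperator(task_id="load", python_callable=load_events)
    extract >> load  # load runs only after extract succeeds
```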

Required Skills and Experience

  • Strong proficiency in Python (e.g., Pandas, SQLAlchemy, PySpark) and SQL

  • Hands-on experience with PostgreSQL, including schema design, partitioning, and performance tuning (illustrated in the sketch after this list)

  • Practical experience deploying and maintaining data pipelines in production environments

  • Familiarity with ETL/ELT orchestration tools such as Airflow or dbt

  • Experience working with streaming data platforms (e.g., Kafka, Pub/Sub)

  • Comfort working in low-maturity environments lacking CI/CD, GitHub Enterprise, or monitoring setups

  • Exposure to data validation tools (e.g., Great Expectations) and observability stacks (e.g., Grafana, DataDog)

  • Awareness of the constraints of regulated data environments (e.g., HIPAA, GDPR)
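
As one concrete example of the PostgreSQL work listed above, a sketch of declarative range partitioning issued through SQLAlchemy; the connection string, table, and columns are hypothetical:

```python
# Illustrative only: range-partitioning a hypothetical events table by month.
# PostgreSQL requires the partition key to appear in a partitioned table's primary key.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost/appdb")  # placeholder DSN

STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS events (
        id          BIGINT GENERATED ALWAYS AS IDENTITY,
        occurred_at TIMESTAMPTZ NOT NULL,
        payload     JSONB,
        PRIMARY KEY (id, occurred_at)
    ) PARTITION BY RANGE (occurred_at)
    """,
    """
    CREATE TABLE IF NOT EXISTS events_2024_01
        PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
    """,
]

with engine.begin() as conn:  # one transaction; rolls back on failure
    for stmt in STATEMENTS:
        conn.execute(text(stmt))
```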

Preferred Qualifications

  • Experience in healthcare, mental health tech, or other regulated industries

  • Familiarity with Django or experience integrating with monolithic web frameworks

  • Experience supporting data operations in early-stage or startup environments

  • Familiarity with infrastructure as code and experience with tools like Docker

Mindset Traits

  • Thrives in ambiguity and incomplete systems

  • Enjoys untangling messy data and building durable solutions

  • Prioritizes doing things right over using trendy tech

  • Comfortable working independently and growing into broader ownership over time

What You'll Actually Be Doing

  • Re-architecting a basic Postgres-based data layer (not Snowflake-scale)

  • Writing ingestion pipelines and resolving data inconsistencies (see the validation sketch after this list)

  • Introducing CI/CD for data jobs and building foundational monitoring/logging

  • Collaborating cross-functionally with AI, backend, and infrastructure teams

  • Laying the groundwork for scalable, production-ready systems
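
A lightweight sketch of the kind of validation gate this involves, using plain pandas as a stand-in for a fuller Great Expectations suite; the column names and rules are hypothetical:

```python
# Illustrative only: reject a batch before load if it fails basic checks.
# Column names and allowed values are hypothetical placeholders.
import pandas as pd


def validate_events(df: pd.DataFrame) -> list[str]:
    """Return human-readable failures; an empty list means the batch is clean."""
    failures = []
    if df["event_id"].isna().any():
        failures.append("event_id contains nulls")
    if df["event_id"].duplicated().any():
        failures.append("event_id contains duplicates")
    if not df["status"].isin({"queued", "sent", "failed"}).all():
        failures.append("status contains unexpected values")
    return failures


batch = pd.DataFrame({"event_id": [1, 2, 2], "status": ["sent", "sent", "bogus"]})
problems = validate_events(batch)
if problems:
    raise ValueError(f"batch rejected: {problems}")  # fail loudly rather than load bad data
```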

Candidates Not Well-Suited for This Role

This role is likely not a fit for candidates who:

  • Expect a fully built modern data stack (e.g., Snowflake, data mesh, mature abstractions)

  • Have only worked in highly mature data environments and need heavy tooling support

  • Are focused primarily on ML or analytics rather than infrastructure and orchestration

  • Struggle to collaborate with shared DevOps/platform teams or require full ownership of production environments
