MLOps Engineer - ML Infra / MLOps / ML Platform
$250,000 + $700,000 equity (4-year vest + annual refreshers)
San Francisco - hybrid, 3 days in office
Series D tech firm
You'll join our rapidly expanding client as an MLOps Engineer on their MLOps team. In this role, you'll be instrumental in building and maintaining a foundational ML platform that spans the full ML lifecycle. ML engineers and data scientists will use this platform to train, deploy, and manage ML models efficiently.
Key Responsibilities
- Analyze, troubleshoot, coordinate, and resolve complex infrastructure issues.
- Build, operate, and maintain a low-latency, high-volume ML serving layer for online and batch inference use cases.
- Orchestrate Kubernetes and ML training/inference infrastructure as part of an ML platform.
- Manage environments, interfaces, and workflows to support ML engineers in developing, building, and testing ML models and services.
- Expand the feature store implementation, enabling self-service data labeling, feature engineering, and batch inference.
- Reduce latency for model inference to achieve real-time model serving.
- Develop automation workflows to enhance team efficiency and ML stability.
- Improve efficiency, scalability, and stability of various system resources.
- Collaborate with other teams and stakeholders to deliver business initiatives.
- Onboard new team members, providing mentorship and facilitating their ramp-up on the team's code bases.
About You
- 5+ years in MLOps, Platform Engineering, DevOps, or Infrastructure.
- Expertise in building infrastructure for ML platforms and managing CPU/GPU compute.
- Proficient in Infrastructure as Code using Terraform.
- Familiar with CI/CD tools like Jenkins, CircleCI, Argo Workflows, and ArgoCD.
- Strong background in software development and a passion for applying it to ML infrastructure.
- Experienced with observability tools such as Splunk, Nagios, Sensu, Datadog, and New Relic.
- Proficient with containers and container orchestration, with hands-on experience in Docker and Kubernetes.
Work Environment
- Technologies: Python, Snowflake, SQL, dbt.
- Tools: Looker for data visualization.
- Backend: Java and Python microservices with Spark, Kinesis, Airflow, Snowflake, and Postgres on AWS.
- Teams Supported: Client Strategy, Product Management, Sales, Marketing, Finance, Engineering, Design, and Leadership.
Benefits
- Competitive perks and benefits, including health & wellness and equity options.
- Opportunity to join a rapidly growing team and influence the architectural roadmap.