Job Overview:
Ibexlabs is seeking a highly skilled Senior MLOps Engineer to lead and enhance our machine learning operations. This role requires extensive experience with AWS, Terraform, GitLab, and SageMaker, as well as expertise in implementing end-to-end observability solutions. As a Senior MLOps Engineer, you will play a key role in optimizing and automating ML workflows, ensuring the scalability, reliability, and security of our machine learning infrastructure.
Key Responsibilities:
- Design, develop, and manage scalable MLOps pipelines to automate the lifecycle of machine learning models from development to production using AWS services such as SageMaker.
- Build and maintain infrastructure as code using Terraform to support machine learning model training, deployment, and monitoring.
- Collaborate with data scientists, ML engineers, and DevOps teams to implement GitLab CI/CD pipelines for continuous integration and deployment of ML models.
- Implement and manage end-to-end observability (monitoring, logging, and tracing) to ensure the performance, reliability, and security of the ML infrastructure.
- Integrate automated testing, versioning, and rollbacks into ML pipelines so that models deploy seamlessly and remain maintainable.
- Optimize resource utilization and cost-efficiency across AWS services, ensuring compliance with best practices in cloud security and governance.
- Troubleshoot and resolve issues related to ML pipelines, infrastructure, and deployments.
- Stay up to date with the latest trends and technologies in MLOps and DevOps, suggesting improvements and upgrades to the existing setup.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 5+ years of experience in MLOps or DevOps roles, with a strong focus on AWS services and machine learning operations.
- Proven expertise in AWS services (e.g., SageMaker, Lambda, S3, EKS).
- Experience with infrastructure as code tools such as Terraform.
- Strong experience with GitLab for CI/CD pipelines and version control.
- Hands-on experience with machine learning model lifecycle management, including deployment, monitoring, and troubleshooting.
- Expertise in building end-to-end observability, including monitoring, logging, and alerting for ML applications.
- Solid understanding of containerization (Docker, Kubernetes) and microservices architecture.
- Excellent problem-solving skills, with a proactive approach to improving processes and infrastructure.
- Strong communication and collaboration skills to work effectively with cross-functional teams.
Preferred Qualifications:
- AWS certifications (e.g., AWS Certified Solutions Architect, AWS Certified DevOps Engineer).
- Experience with Kubernetes, EKS, and serverless architecture.
- Familiarity with data pipeline tools (Airflow, Glue) and orchestration frameworks.
- Strong understanding of data science and machine learning concepts.
What We Offer:
- Competitive salary and benefits package.
- Flexible remote working environment.
- Opportunities for career growth and professional development.
- A collaborative and innovative work culture where your contributions are valued.