Job Summary
We are seeking a skilled MLOps Data Engineer to join our dynamic team. The ideal candidate will be responsible for designing, implementing, orchestrating, and maintaining data pipelines using Apache Airflow. They will collaborate closely with data scientists, analysts, and other stakeholders to ensure smooth data workflows and reliable data delivery.
Responsibilities
- Design, develop, and maintain data pipelines using Apache Airflow.
- Collaborate with data scientists and analysts to understand data requirements, develop ML model features, and implement efficient data workflows.
- Orchestrate, monitor, and troubleshoot data pipelines to ensure optimal performance and reliability.
- Deploy and manage MLflow for model management, and integrate machine learning models into data pipelines for inference.
- Implement data quality checks and ensure data integrity throughout the pipeline.
- Work closely with DevOps and infrastructure teams to deploy and scale data pipelines.
- Stay updated on the latest trends and best practices in data engineering and implement them as appropriate.
- Document data pipelines, workflows, and processes for knowledge sharing and future reference.
- Provide support to stakeholders in analyzing and interpreting data.
- Support product development activities; advise on architectural choices, environment setup, and optimization.
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven experience in MLOps/DataOps and in building data pipelines using Apache Airflow.
- Strong Python programming skills and solid DevOps experience.
- Experience with SQL, relational databases (e.g., PostgreSQL, MySQL), and cloud data warehouses (e.g., Snowflake, Redshift).
- Familiarity with cloud platforms such as AWS.
- Knowledge of distributed computing frameworks (e.g., Apache Spark) is a plus.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
Preferred Qualifications
- Experience with containerization technologies such as Docker and Kubernetes.
- Experience with or exposure to Snowflake and MLflow.
- Familiarity with data streaming technologies (e.g., Apache Kafka).
- Experience with version control systems (e.g., Git).
- Knowledge of data warehousing concepts and technologies.
- Certification in Apache Airflow or related technologies is a plus.