As a Data Science Engineer, you will bridge the gap between data engineering and data science by developing both the infrastructure for data-driven applications and the models that drive business insights. You’ll collaborate closely with data scientists, data engineers, and software developers to create and deploy machine learning models, optimize data workflows, and ensure data is accessible, clean, and ready for analysis.
What You Will Do
- Design, develop, and deploy machine learning models into production, ensuring scalability, efficiency, and reliability.
- Build and maintain robust ETL pipelines to extract, transform, and load data from various sources for use in machine learning models and analysis.
- Work closely with data scientists to take models from experimentation to deployment, helping refine them and make them production-ready.
- Build and optimize data storage solutions (data lakes, warehouses) to ensure that data is easily accessible and efficiently processed for analysis.
- Implement automated machine learning workflows, managing the lifecycle of models (from training to monitoring) using CI/CD pipelines and tools like Airflow or Jenkins.
- Prepare and transform raw data into formats suitable for machine learning models, including feature engineering and scaling datasets.
- Leverage cloud platforms (AWS, GCP, Azure) to build scalable, distributed systems for data storage and machine learning model deployment.
- Monitor the performance of models in production, implementing processes to ensure continuous learning and retraining with new data.
What You Need To Have
- 4+ years’ experience in data engineering or data science, with a solid understanding of both disciplines.
- Strong programming skills in Python (or R) and SQL.
- Proficiency with machine learning libraries (e.g., Scikit-learn, TensorFlow, PyTorch).
- Experience with big data tools like Hadoop, Spark, or Kafka.
- Familiarity with cloud-based platforms (e.g., AWS, GCP, Azure) and deploying machine learning models in the cloud.
- Experience building ETL/ELT pipelines using tools like Apache Airflow or AWS Glue.
- Experience deploying machine learning models in production environments, using APIs, containers (e.g., Docker), or orchestration tools like Kubernetes.
- Hands-on experience with databases, data warehouses (e.g., Snowflake, Redshift), and distributed computing environments.
- Ability to work cross-functionally, communicate complex technical concepts, and collaborate with teams to align technical solutions with business needs.