About the Role
This is a fantastic opportunity to join a collaborative environment where you will focus on building and maintaining automated data pipelines in our internal cloud platform ("Anthemetrics"). It is ideal for a candidate who wants to test themselves on truly big data within a best-in-class environment.
What You’ll Do
- Build and maintain optimized, highly available data pipelines that support the creation of machine learning models
- Ensure that pipelines run consistently and reliably
- Maintain and develop our Azure DevOps CI/CD pipelines
About You
- You have a degree in mathematics, statistics, computer science, engineering, or a similar field (a Master’s or Ph.D. is an advantage)
- Experience writing production-grade PySpark code
- Experience in using CI/CD tools (e.g. Azure DevOps, Jenkins, etc.)
- Sound knowledge of, and hands-on experience with, modern data lake architectures
- You enjoy, and have experience, working on big data problems in a Spark environment
- Experience with real-time systems such as Kafka, or with Kubernetes, is not required; it is something we can offer you
- You write structured, well-documented code that others can understand and use
- Experience building data pipelines that feed data visualization tools is an advantage
- You are comfortable working through open-ended problems and can adapt as our company grows