SUMMARY
Our client is seeking a skilled software engineer to join their Core Data Team. You will help establish the foundational core platform pipelines and datasets that enable dozens of engineering and analytical teams to unlock the power of data, drive key business decisions, and provide engineering, analytics, and operational teams the critical information necessary to scale the largest streaming service. Expanding, scaling, and standardizing these foundations through consistent observability, lineage, data quality, logging, and alerting across all engineering teams in the Data organization is essential to creating a single pane of glass. The Core Data team is looking to grow its team of world-class Data Engineers who share their charisma and enthusiasm for making a positive impact.
WHAT YOU’LL DO
- Contribute to maintaining, updating, and expanding existing Core Data platform data pipelines in Scala and Python/Spark while meeting strict uptime SLAs
- Extend functionality of current Core Data platform offerings, including metadata parsing, extending the metastore API, and building new integrations with APIs both internal and external to the Data organization
- Implement the Lakehouse architecture, working with customers, partners, and stakeholders to shift towards a Lakehouse centric data platform
- Architect, design, and code shared libraries in Scala and Python that abstract complex business logic, enabling consistent functionality across all data pipelines in the Data organization
- Tech stack includes Airflow, Spark, Databricks, Delta Lake, Snowflake, Scala, Python
- Collaborate with product managers, architects, and other engineers to drive the success of the Core Data platform
- Contribute to developing and documenting both internal and external standards and best practices for pipeline configurations, naming conventions, partitioning strategies, and more
- Ensure high operational efficiency and quality of the Core Data platform datasets so that our solutions meet SLAs and project reliability and accuracy to all our stakeholders (Engineering, Data Science, Operations, and Analytics teams)
- Be an active participant in and advocate for agile/scrum ceremonies to collaborate and improve processes for our team
- Engage with and understand our customers, forming relationships that allow us to understand and prioritize both innovative new offerings and incremental platform improvements
- Maintain detailed documentation of your work and changes to support data quality and data governance requirements
QUALIFICATIONS
- BS or MS degree in Computer Science or a related field
- 3+ years of professional programming and design experience
- Strong algorithmic problem-solving expertise
- Strong fundamental Scala and Python programming skills
- Basic understanding of AWS or other cloud provider resources (e.g., S3)
- Strong SQL skills and ability to create queries to analyze complex datasets
- Hands-on production environment experience with distributed processing systems such as Spark
- Hands-on production experience with data pipeline orchestration systems such as Airflow for creating and maintaining data pipelines
- Experience with one or more scripting languages
- Willingness and ability to learn and pick up new skill sets
- Self-starting problem solver with an eye for detail and excellent analytical and communication skills