SUMMARY
Our client is seeking a skilled software engineer to join their Core Data Team. You will help establish the foundational core platform pipelines and datasets that enable dozens of engineering and analytical teams to unlock the power of data, drive key business decisions, and provide engineering, analytics, and operational teams the critical information necessary to scale the largest streaming service. Expanding, scaling, and standardizing these foundations through consistent observability, lineage, data quality, logging, and alerting across all engineering teams in the Data organization is essential to creating a single pane of glass. The Core Data team is looking to grow its team of world-class Data Engineers who share their charisma and enthusiasm for making a positive impact.
WHAT YOU’LL DO
- Contribute to maintaining, updating, and expanding existing Core Data platform data pipelines in Scala and Python/Spark while meeting strict uptime SLAs
- Extend functionality of current Core Data platform offerings, including metadata parsing, extending the metastore API, and building new integrations with APIs both internal and external to the Data organization
- Implement the Lakehouse architecture, working with customers, partners, and stakeholders to shift towards a Lakehouse centric data platform
- Architect, design, and code shared libraries in Scala and Python that abstract complex business logic, enabling consistent functionality across all data pipelines in the Data organization
- Tech stack includes Airflow, Spark, Databricks, Delta Lake, Snowflake, Scala, Python
- Collaborate with product managers, architects, and other engineers to drive the success of the Core Data platform
- Contribute to developing and documenting both internal and external standards and best practices for pipeline configurations, naming conventions, partitioning strategies, and more
- Ensure high operational efficiency and quality of the Core Data platform datasets so that our solutions meet SLAs and project reliability and accuracy to all our stakeholders (Engineering, Data Science, Operations, and Analytics teams)
- Be an active participant in and advocate for agile/scrum ceremonies to collaborate and improve processes for our team
- Engage with and understand our customers, forming relationships that allow us to understand and prioritize both innovative new offerings and incremental platform improvements
- Maintain detailed documentation of your work and changes to support data quality and data governance requirements
QUALIFICATIONS
- BS or MS degree in Computer Science or a related field
- 3+ years of professional programming and design experience
- Strong algorithmic problem-solving expertise
- Strong fundamental Scala and Python programming skills
- Basic understanding of AWS or other cloud provider resources (e.g., S3)
- Strong SQL skills and ability to create queries to analyze complex datasets
- Hands-on production environment experience with distributed processing systems such as Spark
- Hands-on production experience with data pipeline orchestration systems such as Airflow for creating and maintaining data pipelines
- Experience with one or more scripting languages
- Willingness and ability to learn and pick up new skill sets
- Self-starting problem solver with an eye for detail and excellent analytical and communication skills