Job Title: Lead Data Engineer
Location: Toronto, Canada (hybrid schedule, 2-3 days onsite)
Duration: 3-6 month contract
Must have: GCP ecosystem, including BigQuery, Pub/Sub, Dataflow, and Dataproc
Job Description:
As a Lead Data Engineer, you will be responsible for leading and guiding the data engineering team in designing, developing, and optimizing scalable data pipelines and data infrastructure. Your expertise in cloud platforms, big data technologies, and programming languages will be crucial in building high-performance data systems that meet the needs of our clients and support the growth of our organization.
Key Responsibilities:
Data Architecture & Design:
Lead the design and implementation of scalable and efficient data architectures on the Google Cloud Platform (GCP).
Define and develop data models and data storage solutions using Firestore, BigQuery, and other relevant GCP services.
Data Pipeline Development:
Build, maintain, and optimize complex ETL/ELT pipelines using Apache Spark/PySpark and dbt (data build tool).
Ensure data pipelines are robust, scalable, and capable of handling large volumes of data in real-time and batch processing.
Programming & Scripting:
Write high-quality code in Java, Scala, and Python to process and transform data.
Develop and maintain data processing scripts and applications that integrate with various data sources and systems.
Workflow Orchestration:
Design and implement workflow automation using Apache Airflow to manage and monitor data pipelines.
Ensure data workflows are reliable, efficient, and easily maintainable.
Database Management:
Design and optimize SQL queries for data extraction, transformation, and reporting.
Manage and optimize NoSQL databases like Firestore and relational databases to ensure high performance and reliability.
Team Leadership & Collaboration:
Lead, mentor, and develop a team of data engineers, providing technical guidance and fostering a collaborative team environment.
Work closely with cross-functional teams, including data scientists, software engineers, and product managers, to deliver data-driven solutions.
Performance Optimization:
Continuously monitor and improve the performance of data systems, ensuring they meet the scalability, reliability, and efficiency needs of the business.
Identify and resolve performance bottlenecks in data processing and storage.
Data Governance & Security:
Implement and enforce best practices for data governance, security, and privacy.
Ensure compliance with data regulations and industry standards in all data engineering processes.
Innovation & Continuous Improvement:
Stay current with the latest trends and technologies in data engineering, cloud computing, and big data.
Drive innovation within the team, exploring new tools and methodologies to enhance data solutions.
Qualifications:
Education: Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Technology, or a related field.
Experience:
6+ years of experience in data engineering or related fields.
Proven experience leading and mentoring data engineering teams.
Extensive experience with GCP, including services like BigQuery, Firestore, and Cloud Storage.
Strong expertise in big data technologies like Apache Spark and PySpark.
Proficiency in Java, Scala, and Python programming languages.
Hands-on experience with workflow orchestration tools such as Apache Airflow.
Experience with data transformation tools like dbt.
Strong SQL skills with experience in query optimization and database management.
Skills:
Excellent problem-solving and analytical skills.
Strong communication and leadership abilities.
Ability to work collaboratively in a fast-paced, dynamic environment.
A proactive approach to identifying and addressing potential challenges.
Sr. Data Engineer/Lead: PySpark, Airflow, GCP, BigQuery, Firestore, and SQL expertise; Scala is a bonus.
A reasonable, good-faith estimate of the minimum and maximum pay for this position is $70/hr to $72/hr, with limited benefits.