Job Summary
The Lead Data Engineer will be responsible for designing, developing, and maintaining scalable data pipelines using Python and PySpark on a cloud-native Lakehouse data platform. The role involves working closely with product management and analysts to deliver data solutions, ensuring data quality, optimizing pipelines for performance, and following DevOps principles for deployment. The ideal candidate will bring strong experience in data engineering, SQL, and cloud platforms, preferably with Azure.
Key Responsibilities
- Design, develop, and maintain scalable data pipelines using software development patterns.
- Implement data processing solutions using Python and PySpark on a cloud-native Lakehouse data platform.
- Write efficient SQL queries to extract, transform, and load data.
- Collaborate with product management and analysts to understand data requirements and deliver solutions.
- Optimize and troubleshoot data pipelines for performance and reliability.
- Ensure data quality and integrity through comprehensive testing and validation processes.
- Follow DevOps principles and use CI/CD to deploy and operate data pipelines.
Required Qualifications
- Proficiency in Python and PySpark.
- Strong experience with SQL and database management.
- Knowledge of software development patterns and best practices in data engineering.
- Experience with ETL/ELT processes and data pipeline orchestration.
- Proficiency with version control, automated testing, and deployment workflows using Git-based tools such as GitHub and GitHub Actions.
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 8+ years of experience in data engineering or a related role.
- Strong problem-solving skills and attention to detail.
- Excellent communication and teamwork abilities.
Preferred Qualifications
- Understanding of testing methodologies for data pipelines, including unit testing, integration testing, and end-to-end testing.
- Knowledge of data governance and data security best practices.
- Familiarity with data warehousing concepts and tools.
- Experience with cloud platforms (e.g., Azure, AWS, GCP), with Azure preferred.
- Knowledge of big data technologies (e.g., Microsoft Fabric, Azure Synapse, Lakehouse, Databricks).
- Familiarity with advanced data orchestration tooling and development frameworks like dbt or Airflow.
- Experience working in a healthcare-related industry.
Essential Job Functions
- Specific vision abilities, including close vision, to operate a computer screen for extended periods of time.
- May be required to sit or stand for extended periods of time.
- Ability to read, write, and speak the English language fluently.
Education: Bachelor's Degree