We are looking for a highly skilled Data Engineer to join our team. The ideal candidate will be responsible for designing, developing, and maintaining robust data pipelines, ensuring the quality and integrity of our data, and supporting our data architecture to efficiently handle large volumes of data. This role requires a strong technical background, excellent problem-solving skills, and the ability to work collaboratively with cross-functional teams.
The responsibilities of the Data Engineer may include:
- Assisting in the design and development of data pipelines for ETL processes using Apache Airflow.
- Collaborating with other data engineers to understand data requirements and translate them into technical solutions.
- Implementing data integration and transformation processes, ensuring data quality, accuracy, and consistency.
- Building and managing data lakes and other storage systems.
- Writing and optimizing SQL queries, PySpark code, and AWS Glue jobs for data extraction and analysis.
- Assisting in the deployment, monitoring, and maintenance of data pipelines and databases.
- Working with Amazon QuickSight to create interactive dashboards and reports.
- Collaborating with cross-functional teams to identify and resolve data-related issues or inefficiencies.
- Conducting research and staying updated on the latest data engineering trends and best practices.
- Documenting data workflows, procedures, and technical specifications for future reference.
- Ensuring data security and compliance with relevant regulations and standards.
- Automating data collection and processing tasks to improve efficiency and scalability.
- Participating in code reviews and ensuring adherence to best practices and coding standards.
- Troubleshooting and resolving any issues related to data processing and storage.
The ideal candidate will have:
- A Bachelor’s or Master’s degree in Computer Science, Information Technology, Engineering, or a related field.
- Proficiency in SQL and experience with relational and non-relational databases.
- Strong programming skills in Python, particularly with libraries such as PySpark for big data processing.
- Hands-on experience with ETL tools and frameworks, specifically Apache Airflow.
- Familiarity with AWS services, including AWS Glue, S3, Redshift, DynamoDB, and QuickSight.
- Understanding of data warehousing concepts and experience building data lakes.
- Ability to understand business requirements and translate them into technical specifications.