Nature: Day One On-site
Duration: 24 Months
Candidates Required: 04
Experience: 5 to 8 Years
Following is the job description for the role of Data Engineer:
Mandatory Skill Set: Apache Spark, Hive, Hadoop, BigQuery, BigTable, Cloud Composer, Dataflow, Google Cloud Storage, Python, SQL, Shell Scripting, Git.
Good to have Skill Set: CI/CD, Jenkins, Security and Networking, Scala, GCP Identity and Access Management (IAM).
Responsibilities:
- Data Processing: Design, develop, and maintain scalable and efficient data processing pipelines using technologies such as Apache Spark, Hive, and Hadoop.
- Programming Languages: Use Python, Scala, SQL, and Shell Scripting for data processing, transformation, and automation.
- Cloud Platform Expertise: Work hands-on with Google Cloud Platform (GCP) services, including but not limited to BigQuery, BigTable, Cloud Composer, Dataflow, Google Cloud Storage, and Identity and Access Management (IAM).
- Version Control and CI/CD: Implement and maintain version control using Git and establish continuous integration/continuous deployment (CI/CD) pipelines for data processing workflows.
- Jenkins Integration: Use Jenkins to automate the building, testing, and deployment of data pipelines.
- Data Modeling: Work on data modeling and database design to ensure optimal storage and retrieval of data.
- Performance Optimization: Identify and implement performance optimization techniques for large-scale data processing.
- Collaboration: Collaborate with cross-functional teams, including data scientists, analysts, and other engineers, to understand data requirements and deliver solutions.
- Security and Networking: Possess basic knowledge of GCP Networking and GCP IAM to ensure secure and compliant data processing.
- Documentation: Create and maintain comprehensive documentation for data engineering processes, workflows, and infrastructure.
Qualifications:
- Proven experience with Apache Spark, Hive, and Hadoop.
- Strong programming skills in Python, Scala, SQL, and Shell Scripting.
- Hands-on experience with GCP services, including BigQuery, BigTable, Cloud Composer, Dataflow, Google Cloud Storage, and Identity and Access Management (IAM).
- Familiarity with version control using Git and experience in implementing CI/CD pipelines.
- Experience with Jenkins for automating data pipeline processes.
- Basic understanding of GCP Networking.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.