Nature: Day One On-site
Duration: 24 Months
Candidates Required: 04
Experience: 5 to 8 Years
Following is the job description for the role of Data Engineer:
Mandatory Skill Set: Apache Spark, Hive, Hadoop, BigQuery, BigTable, Cloud Composer, Dataflow, Google Cloud Storage, Python, SQL, Shell Scripting, Git.
Good to have Skill Set: CI/CD, Jenkins, Security and Networking, Scala, GCP Identity and Access Management (IAM).
Responsibilities:
- Data Processing: Design, develop, and maintain scalable and efficient data processing pipelines using technologies such as Apache Spark, Hive, and Hadoop.
- Programming Languages: Use Python, Scala, SQL, and Shell Scripting for data processing, transformation, and automation.
- Cloud Platform Expertise: Work hands-on with Google Cloud Platform (GCP) services, including but not limited to BigQuery, BigTable, Cloud Composer, Dataflow, Google Cloud Storage, and Identity and Access Management (IAM).
- Version Control and CI/CD: Implement and maintain version control using Git and establish continuous integration/continuous deployment (CI/CD) pipelines for data processing workflows.
- Jenkins Integration: Use Jenkins to automate the building, testing, and deployment of data pipelines.
- Data Modeling: Work on data modeling and database design to ensure optimal storage and retrieval of data.
- Performance Optimization: Identify and implement performance optimization techniques for large-scale data processing.
- Collaboration: Collaborate with cross-functional teams, including data scientists, analysts, and other engineers, to understand data requirements and deliver solutions.
- Security and Networking: Possess basic knowledge of GCP Networking and GCP IAM to ensure secure and compliant data processing.
- Documentation: Create and maintain comprehensive documentation for data engineering processes, workflows, and infrastructure.
Qualifications:
- Proven experience with Apache Spark, Hive, and Hadoop.
- Strong programming skills in Python, Scala, SQL, and Shell Scripting.
- Hands-on experience with GCP services, including BigQuery, BigTable, Cloud Composer, Dataflow, Google Cloud Storage, and Identity and Access Management (IAM).
- Familiarity with version control using Git and experience in implementing CI/CD pipelines.
- Experience with Jenkins for automating data pipeline processes.
- Basic understanding of GCP Networking.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.