Senior Data Engineer

Klanik • Contract • Remote (Montreal, QC, CA) • 2w ago

Job Description

A Cloud Software & Data Engineer develops data engineering applications using third-party and in-house frameworks, drawing on a broad set of development skills that covers data engineering and data accessibility. The Cloud Software & Data Engineer is responsible for the complete software lifecycle (analysis, design, development, testing, implementation and support), as well as troubleshooting issues, deploying and upgrading services and their associated data, performance tuning and other maintenance work. This type of full-stack developer also focuses on data engineering (large-scale data transformation and manipulation, ETL, etc.) and on fine-tuning infrastructure for optimization. The position reports to the software project manager.

Responsibilities

• Work with subject matter experts to clarify requirements and use cases

• Turn requirements and user stories into functionality: design, build and maintain efficient, reusable and reliable code and high-quality software, with documentation and traceability

• Develop server-side services that are elastically scalable and secure by design to support high-volume, high-velocity data processing. Services should be backward and forward compatible to ease deployment.

• Ensure the solution is deployable, operable and secure by default.

• Write and maintain provisioning, deployment, CI/CD and maintenance scripts for the services you develop

• Write unit tests, automated tests and data simulations

• Support, maintain, troubleshoot and fine-tune working cloud environments and the software running within them

• Build prototypes, products and systems that meet the project's quality standards and requirements

• Act as an individual contributor, providing technical leadership and documentation to developers and stakeholders

• Provide timely corrective actions on all assigned defects and issues.

• Contribute to the development plan by providing task estimates.

• Fulfil organizational responsibilities (sharing knowledge & experience with other teams/groups)

• Conduct technical training sessions and write whitepapers, case studies, blogs, etc.

Background

Bachelor's degree or higher in Computer Science or a related field with minimum years working experience

Skills and knowledge

Mandatory

• 5+ years of software development experience in Big Data technologies (Spark/, Database & Data Lakes)

• Experience with SQL and NoSQL, and with JSON, CSV and Parquet data formats

• Advanced knowledge of large scale parallel computing engines (Spark) – provisioning, deployment, development of computing pipelines, operation and support, performance tuning (3y+)

• Good experience in building/tuning Spark pipelines in Python

• Experience designing, building and maintaining data processing pipelines in Apache NiFi and Spark jobs

• Extensive knowledge of data structures, patterns and algorithms (5y+)

• Expertise with several back-end development languages and their associated frameworks – Python (3y+)

• In-depth knowledge of application, cloud networking and security as well as related development best-practices and patterns (3y+)

• Cloud platform knowledge – Azure public cloud expertise (3y+)

• Advanced knowledge of DevOps, CI/CD and cloud deployment practices (5y+)

• Advanced knowledge of containerization and virtualization (Kubernetes), as well as of scaling clusters and debugging issues on high-volume, high-velocity data jobs, and related best practices (3y+)

• Advanced skills in setting up and operating databases (relational and non-relational) (3y+)

• Good experience with Databricks and Spark on Kubernetes

• Good programming experience with Python

• Experienced in application profiling, bottleneck analysis and performance tuning

• Good communication and cross-functional skills.

• Problem-solving skills, team player, adaptable and a hustler

• Experience working on highly Agile projects

Nice to have

• Experience building, testing and maintaining tools and infrastructure to support data science initiatives

• Exposure to Power BI, Spotfire and Dataiku

• Knowledge and experience with version control tools (Git preferred but not mandatory)

• In-country cloud providers – Azure Stack (3y+)

• Experience deploying machine learning models into a production environment.

• Experience with ML training/retraining, Model Registry, ML model performance measurement

• Oil and gas industry experience

• Architectural expertise