Day-to-day responsibilities: Cloud Operations team, geared toward Azure. The majority of the team currently has an AWS background, so this person will come into the team and be the SME for the infrastructure portion of Azure. They will be responsible for any changes, upgrades, etc. in the Azure infrastructure, and will also help the team gain a better understanding of Azure (teaching them). Best practices are already in place; the team just needs someone who understands the infrastructure so that when the team starts getting Azure requests, they can take those over. The role also includes setting up infrastructure code within Azure.
What is the next phase of this project? The only thing in Azure right now is VDI. It is not yet known which apps will be folded into Azure in the future, but that is coming, and the goal is to get ahead of it.
Tech stack:
-CI/CD, Terraform, Git
-Scripting: Python and Terraform; open to PowerShell
-DTCC provides online training to the team, but this person will also help out, e.g., they will perform a change while everyone watches, and the team will have the opportunity to ask questions
-Existing Terraform code will already be in place; this person may need to modify it slightly and implement changes
-No integration of third-party services
-Infrastructure-as-code focused role
MUST HAVES (TOP 4):
1. Scripting: Python, Terraform, PowerShell
2. Operational background, e.g., has worked with a ticketing system such as ServiceNow (SNOW), handled day-to-day changes and deployments, integrations with Azure, open code with Azure
3. IaC
4. Relationship with the vendor (Microsoft)
Position Summary:
In this role, a Cloud Operations Engineer is part of a support organization that operates 24x7 across 3 global shifts and is responsible for managing, monitoring, and optimizing cloud-based infrastructure, applications, and services within our organization. The candidate will play a crucial role in ensuring the smooth functioning of cloud environments by handling tasks such as resource provisioning, configuration management, deployment, automation, and incident response. This candidate must have Cloud Operations support experience and training on AWS and Azure. In addition, this candidate must have experience with AWS services such as EC2, S3, RDS/Aurora, DynamoDB, Lambda, CloudWatch, IAM, CloudTrail, VPC Flow Logs, AWS Console, and AWS CLI. The role will also include developing disaster recovery and resiliency validation and verification. Finally, this candidate must be able to work closely with application developers and infrastructure engineers to support existing cloud environments, including incident response, change management, and handling stakeholder requests that come to the team.
Principal Responsibilities:
• Ability to write scripts (Bash, PHP, Python) for automation of solution resiliency validation and verification.
• Experience with provisioning and configuration tools such as AWS CloudFormation or Terraform is a plus.
• Cloud infrastructure management: Deploy and maintain cloud infrastructure, ensuring optimal performance, security and scalability.
• Cloud migration: Responsible for executing and managing cloud migration of applications, workloads and infrastructure from on-premises or other cloud environments to a target cloud platform.
• Resource provisioning: Allocate and manage cloud resources such as compute, storage and networking to meet the needs of applications and services.
• Configuration management: Implement infrastructure configurations and manage changes to maintain consistency across environments.
• Automation: Develop and implement automation scripts and tools to streamline and simplify repetitive tasks, improving efficiency and reducing human error.
• Monitoring: Set up and maintain monitoring tools to track the performance, availability and security of cloud services, proactively identifying and resolving issues.
• Incident response: Troubleshoot and resolve incidents, collaborating with development and IT teams to minimize downtime and maintain service quality.
• Security and compliance: Enforce security policies, best practices and compliance requirements to protect sensitive data and maintain cloud infrastructure integrity.
• Cost optimization: Monitor and analyze cloud resource usage to identify opportunities for cost savings, recommending and implementing cost optimizations.
• Backup and disaster recovery: Develop and maintain backup and disaster recovery plans to ensure data and application availability during unforeseen events.
• Collaboration and communication: Work closely with development, IT and business teams to ensure alignment with project goals and provide guidance on cloud best practices.
• Continuous improvement: Stay up to date with emerging cloud technologies, platforms and trends, continuously improving cloud operations and adapting to changing requirements.
• Documentation: Create and maintain comprehensive documentation of cloud infrastructure, configurations, processes and procedures to ensure knowledge sharing and team collaboration.
Experience
• Minimum of 5 years of experience in IT related to cloud platforms including compute, storage, and network.
• Minimum of 3 years of Cloud Operations experience supporting AWS and Azure cloud environments, including working with services such as EC2, S3, RDS/Aurora, DynamoDB, Lambda, CloudWatch, IAM, CloudTrail, VPC Flow Logs, AWS Console, AWS CLI.
• Experience with Business Continuity and Disaster Recovery support for cloud.
• Practical experience working with cloud automation tools such as Terraform, Terraform Enterprise, GitLab, and CloudFormation.
Knowledge/Skills
• Demonstrates strong customer service awareness and orientation, and is able to partner with Application Development teams and Shared Service owners on operational support activities and SNOW ticketing systems.
• Ability to effectively represent the Cloud Operations team on major incidents or troubleshooting calls.
• Terraform Enterprise and GitLab experience.
• Maintains current knowledge of marketplace changes, technology changes, and client businesses pertinent to AWS and Azure cloud, compute, storage and network updates.
• Ability to succeed in a fast-paced, high-demand environment.
• Excellent oral and written communication skills, along with the ability to communicate at all levels.