Our financial client - REQ-027369
Title: Site Reliability Engineer
Work Location: Charlotte, NC (Hybrid - Prefer candidates local to Charlotte)
Duration: 6+ Months
Top Skills
Hard Skills:
- Experience with DevOps tools
- Observability and monitoring tools
- Experience with the AWS public cloud platform
- Leading incident response for critical issues
- Terraform experience
Soft Skills (Mandatory):
- Ability to work across teams
- Proactive approach to observability and monitoring
- Vision to find issues and automate their resolution
Job Description:
- Run the production environment by monitoring availability and taking a holistic view of system health
- Support the applications through an on-call rotation
- Provide stability to our applications and facilitate rapid feature development by proactively taking active control of the service's direction
- Eliminate manual work through automation and look for further automation opportunities
- Implement and maintain SLOs, including their adoption and automation
- Production Readiness/Health Scoring & Error Budget Tracking
- Runbook standards, maintenance, and updates
Required:
Experience using DevOps tools and technologies such as GitLab, and Infrastructure as Code tools such as Terraform
Strong troubleshooting skills and experience building and enhancing observability using monitoring tools
Proactive approach to observability maturity: identifying problems, performance bottlenecks, and areas for improvement
Leading incident response and supporting application teams.
Blameless postmortems
Developer feedback for enhanced logging, runbooks, and addressing technical debt
Promoting observability best practices
Experience with the monitoring tools Dynatrace and Splunk
Experience in public cloud platforms, preferably AWS, and API gateways
Experience developing APIs, microservices, or frontends is a plus
Experience using source version control (SVC) such as Git