Title: DevOps / SRE / Release Manager
Location: Jersey City, NJ (Day 1 Onsite)
Duration: 6+ months
Experience: 10+ years in DevOps
Skills: Prometheus, Grafana, Terraform, AWS, GCP, Azure, Docker
Position's General Duties and Tasks
In this role you will be responsible for:
Site Reliability Engineering (SRE):
- Monitor, maintain, and improve the reliability, availability, and performance of critical services
- Develop and implement monitoring solutions (e.g., Prometheus, Grafana) to track system health and performance
- Automate repetitive tasks and improve infrastructure efficiency using Terraform or similar tools
- Create and maintain Service Level Objectives (SLOs), Service Level Agreements (SLAs), and Service Level Indicators (SLIs) to drive reliability improvements
- Participate in on-call rotations to handle incident response, root cause analysis, and mitigation
Release Management:
- Manage the end-to-end software release lifecycle, ensuring timely and smooth releases across all environments
- Work with cross-functional teams to coordinate and validate code releases
- Create and maintain a release calendar in collaboration with product and engineering teams to plan upcoming deployments
- Troubleshoot issues during the release process and ensure post-release validation
- Track and report release metrics to identify improvement opportunities and minimize downtime
DevOps:
- Design, implement, and manage CI/CD pipelines (e.g., Jenkins, GitLab CI) to support continuous integration and deployment
- Develop Infrastructure as Code (IaC) practices using tools like Terraform, AWS CloudFormation, or similar to manage infrastructure environments
- Collaborate with development teams to create scalable solutions that meet business and technical requirements
- Support containerization and orchestration efforts (Docker, Kubernetes) for application deployments
- Drive adoption of DevOps best practices across teams, fostering a culture of automation and agility
Requirements for this role include:
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience)
- 5+ years of experience in SRE, DevOps, or related roles
- Strong knowledge of cloud platforms (AWS, GCP, Azure) and cloud-native infrastructure
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, ELK Stack)
- Proficiency in scripting languages (Python, Bash, Go) and automation tools (Ansible, Puppet, Terraform)
- Hands-on experience with CI/CD tools (e.g., Jenkins, GitLab, CircleCI)
- Experience with release management, managing production deployments, and ensuring stable releases
- Familiarity with containerization technologies (Docker) and orchestration tools (Kubernetes)
- Strong problem-solving skills, attention to detail, and ability to work in a fast-paced environment
- Knowledge of GitOps, chaos engineering, and incident management tools (PagerDuty, Opsgenie)