Job Title: Site Reliability Engineer
Job Location: Irving/Dallas – Texas – On Site
Job Description
SoftClouds LLC is seeking a full-time Site Reliability Engineer with a strong background in cloud infrastructure, networking, monitoring, and security. The ideal candidate will have at least 5 years of experience in a similar role on large-scale enterprise projects and be familiar with DevOps models, GitHub workflows, and release pipelines. We are looking for a meticulous problem solver who thrives under pressure and is committed to maintaining the highest standards of reliability and security in our cloud services.
Responsibilities:
- Implement and maintain monitoring solutions to ensure system health, performance, and availability.
- Rapidly diagnose and resolve incidents to minimize downtime and service impact.
- Develop and maintain automation scripts to enhance operational efficiency and reliability.
- Analyze system performance and implement improvements to optimize resource utilization.
- Design and manage scalable infrastructure on Azure using Infrastructure as Code (IaC) tools.
- Work closely with development, QA, and product teams to ensure seamless integration and deployment of new features.
- Create and maintain detailed documentation of systems, processes, and incident reports.
Qualifications:
- Minimum of 7 years of experience in a Site Reliability Engineering or similar role.
- Strong hands-on experience with Azure Cloud services and architecture.
- Proficiency in monitoring, logging, and tracing tools (e.g., Azure Monitor, Prometheus, Grafana, ELK Stack).
- Solid understanding of CI/CD pipelines and tools (e.g., Jenkins, GitLab, Azure DevOps).
- Proficient in scripting languages such as Python, Bash, or PowerShell.
- Strong analytical and troubleshooting skills.
- Excellent verbal and written communication skills.
Preferred/Desired:
- Azure certifications (e.g., Azure Administrator, Azure Solutions Architect).
- Previous experience with Kubernetes, Docker, and container orchestration.
- Familiarity with configuration management and infrastructure provisioning tools (e.g., Ansible, Terraform).
- Ability to work in a fast-paced environment and handle multiple tasks simultaneously.
- Enthusiasm for learning new technologies and sharing knowledge with the team.
Education Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related field.