Role: Site Reliability Engineer (Data Center)
Number of positions: 2
- Location: 5 days on-site at one of these 3 locations:
- Culver City, CA 90230
- Mountain View, CA 94041
- Bellevue, WA 98004
The ideal candidate will have experience in system operations and in running large-scale, massively distributed infrastructure.
Responsibilities:
- Perform data monitoring and alerting, data quality assurance, and anomaly detection.
- Document team processes and policies, including methods of engagement and SLOs.
- Analyze, design, and implement solutions at the system level to remove bottlenecks and improve edge service performance.
- Implement monitoring and alerting to improve issue detection and response.
- Work in a fast-paced environment. Participate in technical operations and rotations in response to performance and reliability issues.
- Participate in on-call rotations, resolving or escalating incoming events.
- Maintain and operate a Linux and Kubernetes environment.
Qualifications:
- Bachelor's degree or above in Computer Science or a related field, with at least 5 years of related work experience.
- 3+ years’ experience working with Unix/Linux systems, from kernel to shell and beyond.
- 3+ years’ experience working with system libraries, file systems, and client-server protocols.
- Experience reading Python scripts for platform operations.
- Experience with networking technologies such as TCP/IP, BGP, and DNS in a carrier-grade environment.
- Experience developing and operating one or more of the following systems: OpenStack, Kubernetes, Nginx, IPVS, ELK stack, Hadoop, etc.