Site Reliability Engineer

V-Soft Consulting Group, Inc. • Full-time • Culver City, CA, US • 1w ago

Role: Site Reliability Engineer (Data Center)

Number of positions: 2

The ideal candidate will have experience with system operations and running large-scale, massively distributed infrastructure.

Responsibilities:

Perform data monitoring and alerting, data quality assurance, and anomaly detection.
Document team processes and policies, including methods of engagement and SLOs.
Analyze, design, and implement solutions at the system level to remove bottlenecks and improve edge service performance.
Implement monitoring and alerting to improve issue detection and response.
Work in a fast-paced environment. Participate in technical operations and rotations in response to performance and reliability issues.
Participate in on-call rotations, with responsibility for resolving or escalating incoming events.
Maintain and operate a Linux and Kubernetes environment.

Qualifications:

Bachelor's degree or above, majoring in Computer Science or related fields, with at least 5 years of related work experience.
3+ years’ experience working with Unix/Linux systems, from kernel to shell and beyond.
3+ years’ experience working with system libraries, file systems, and client-server protocols.
Experience reading Python scripts for platform operations.
Experience with networking technologies such as TCP/IP, BGP, DNS, etc. in a carrier-grade environment.
Experience in developing and operating one or more of the following systems: OpenStack, Kubernetes, Nginx, IPVS, ELK stack, Hadoop, etc.