Required Skills & Experience
- 7+ years of engineering experience
- Grafana
- VMware
- Experience pulling data from raw files and transforming it for use in Grafana dashboards, reporting, and metrics
Nice to Have Skills & Experience
- DevOps background
- Infrastructure as code tools – Morpheus, Terraform, Ansible, or Chef
- Other monitoring tools – Prometheus, Splunk, etc.
Job Description
Our client is a Fortune 100 telecommunications company, and this role sits within their Application Platform Services team. The primary focus of the position is a large VM rightsizing initiative. Candidates should be experienced in DevOps, with specific knowledge of automating data collection, data updates, and tasks within VMware, and should be able to visualize VMware data in Grafana and create tenant- and application-based dashboards.
On a day-to-day basis, candidates will:
- Work within the on-prem Grafana Enterprise instance.
- Build global dashboards that show RAM, CPU, storage, and peak utilization for VMs in rolling 90-day increments, and compare that utilization against each VM's allocation and target in order to rightsize or reallocate the resource.
- Group virtual machines by tenant and application.
- Clean up certain metadata extracted from VMware.
The first goal of this engagement is to reclaim 10% of CPU.
Other tasks may include:
- Design, develop, implement, and maintain Grafana data reporting and visualization, with specific familiarity working with VMware vROps data collection and presentation.
- Design, develop, implement, and maintain a cloud management platform, CI/CD pipeline, automation, and managed services tools.
- Automate deployment, monitoring, and management of PaaS and IaaS services.
- Ensure that our PaaS and IaaS platforms meet the needs of our customers, including internal and external stakeholders.
- Develop and implement automated processes for incident management, problem management, and change management in alignment with Charter Incident and Change Management Policies.
- Continuously monitor and analyze the performance of PaaS and IaaS services to identify and resolve issues proactively.
- Participate in capacity planning and performance optimization efforts.
- Mentor junior engineers and provide technical leadership to the team.
- Participate in an on-call rotation to ensure 24x7 support of Cloud Services.
- Perform other duties as requested.
Exact compensation may vary based on several factors, including skills, experience, and education. Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.