Role: Observability Engineer (Grafana)
Location: 4 days onsite in Englewood, CO (must relocate within 2 weeks of offer)
Note: We are looking for someone to create virtual machine reports in Grafana that track utilization and related metrics. This person will need to know how to model and adjust data in Grafana to track VM utilization at a global scale. They will also need knowledge of DevOps principles.
Skills:
7+ years of engineering experience
Grafana
VMware
Experience pulling from raw data files and manipulating those files into Grafana for dashboards, reporting, and metrics
Nice to Have Skills & Experience:
DevOps background
Infrastructure-as-code tools: Morpheus, Terraform, Ansible, or Chef
Other monitoring tools: Prometheus, Splunk, etc.
Job Description:
Our client is a Fortune 100 telecommunications company, and this role sits within their Application Platform Services team.
The primary focus of the position is a large VM rightsizing initiative. Candidates should be experienced in DevOps, with specific knowledge of automating data collection, data updates, and tasks within VMware. They should be able to visualize VMware data in Grafana and create tenant- and application-based dashboards.
On a day-to-day basis, candidates will:
Work within the on-premises Grafana Enterprise instance
Build global dashboards that display RAM, CPU, storage, and peak utilization for VMs over rolling 90-day windows, and compare that utilization against each VM's allocation and target in order to right-size or reallocate the resource.
The dashboards must also group virtual machines by tenant and application.
Some of the metadata extracted from VMware will also need to be cleaned up.
The first goal of this engagement is to reclaim 10% of CPU.
Other tasks may include:
Design, develop, implement, and maintain Grafana data reporting and visualization, with specific familiarity in VMware vROps data collection and presentation.
Design, develop, implement, and maintain a cloud management platform, CI/CD pipeline, automation, and managed services tools.
Automate deployment, monitoring, and management of PaaS and IaaS services.
Ensure that our PaaS and IaaS platforms meet the needs of our customers, including internal and external stakeholders.
Develop and implement automated processes for incident management, problem management, and change management in alignment with Charter Incident and Change Management Policies.
Continuously monitor and analyze the performance of PaaS and IaaS services to identify and resolve issues proactively.
Participate in capacity planning and performance optimization efforts.
Mentor junior engineers and provide technical leadership to the team.
Participate in an on-call rotation to ensure 24x7 support of Cloud Services.
Perform other duties as requested.