Site Reliability Engineer

Hyperion Industries • Full-time • Remote (New York, NY, US) • $130k - $150k / year

Company Description

Join us on an exhilarating mission at Hyperion, a VC-backed startup working with Tim Hwang, CEO of FiscalNote (NYSE: NOTE). Our co-founders, with their extensive AI and engineering backgrounds from Google, Amazon, Workday, and Instacart, are leading the charge. Our mission is to revolutionize Site Reliability Engineering (SRE) with an AI-powered platform that manages incidents and resolves them autonomously, setting a new standard in uptime and reliability. We are a fast-paced, mission-driven team passionate about leveraging technology to solve complex problems, and we’re looking for similarly talented people to join us on this journey.

We're looking for a world-class Site Reliability Engineer who excels in solving complex technical problems, strongly influences design decisions, and owns all aspects of technical communication with founders.

Why Work With Us?

Join a dynamic team led by industry veterans from top tech companies with a history of successful exits.
Be part of a mission-driven startup that is redefining the SRE landscape with groundbreaking AI technology.
Work in a fast-paced, high-energy environment where innovation and creativity are encouraged.
Opportunity to grow with the company and significantly impact our direction and success.
Competitive salary, equity options, and comprehensive benefits package.

Role Description

As a Site Reliability Engineer (SRE) at our startup, you will be at the forefront of maintaining and improving the reliability, availability, and performance of our AI-driven platform. You’ll work closely with our software engineering team to drive and influence product design based on customers' technical issues. Your role will also involve participating in an on-call rotation, where you will be responsible for incident response, troubleshooting, and continuous improvement of our operational processes.

Key Responsibilities

Contribute to building the technical roadmap to disrupt the SRE industry.
Develop automation tools to streamline operational tasks and improve system reliability.
Monitor and optimize system performance, proactively identifying and addressing potential issues.
Participate in on-call rotations, providing rapid response to incidents and working to resolve them.
Collaborate with software engineering teams to integrate reliability into every stage of the development lifecycle.
Contribute to the continuous improvement of our incident management and response processes.

Qualifications

Proven experience as a Site Reliability Engineer or in a similar role in a high-availability, mission-critical environment.
Strong knowledge of cloud infrastructure (AWS, Google Cloud, or Azure).
Proficiency in scripting and automation using languages like Python, Bash, or Go.
Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, etc.).
Solid understanding of CI/CD pipelines and configuration management tools (Jenkins, GitLab CI/CD, Ansible, etc.).
Experience with on-call rotations and incident management.
Software engineering knowledge/skills are highly preferred.
Excellent problem-solving skills and a proactive approach to identifying and mitigating risks.
Strong communication and collaboration skills.

The Culture

At our core, we prioritize continuous learning. Our collective expertise is built upon the philosophy of investing time in industry knowledge, fostering idea exchange, and developing shared insights within our team.
Our success hinges on attention to detail. We strive for excellence by delving deep into the intricacies of our work and relentlessly pursuing perfection.
We trust in our work, and our customers trust us to deliver our best. We creatively and proactively seek ways to deliver optimal results to both internal and external end-users.
Teamwork drives our business forward. Trust forms the foundation of all team dynamics, and we seek individuals who share this value.

If this sounds like you, we’d love to hear from you! Apply now and help us shape the future of Site Reliability Engineering.