Job Summary:
As the Manager of Site Reliability Engineering (SRE) for our Production Bioinformatics group, you will ensure the stability, scalability, and performance of our production bioinformatics applications and infrastructure. You will lead a team of SREs responsible for the reliability and operational excellence of these systems, which support cutting-edge research and clinical applications. The role also covers release management and production support, ensuring smooth deployments and ongoing operational stability.
Key Responsibilities:
Team Leadership & Management:
Lead and mentor a team of SREs, fostering a culture of collaboration, innovation, and continuous improvement.
Define clear goals and performance metrics for the team, and oversee the team's delivery against them.
Conduct regular one-on-ones, provide constructive feedback, and facilitate professional development opportunities for team members.
Site Reliability Engineering:
Implement and manage monitoring, alerting, and incident response processes to ensure the reliability and uptime of bioinformatics systems.
Drive the resolution of operational issues, perform root cause analysis, and implement preventive measures to stop issues from recurring.
Release Management:
Manage the end-to-end release process for bioinformatics applications, including planning, coordination, and deployment.
Collaborate with development teams to deliver successful releases on schedule while minimizing disruptions.
Develop and enforce best practices for release management, including version control, release notes, and rollback procedures.
Production Support:
Provide ongoing support for production systems, including handling incidents, performing routine maintenance, and addressing user-reported issues.
Implement and manage procedures for system health checks, backups, and disaster recovery.
Ensure that production environments are monitored and that any issues are promptly identified and resolved.
Collaboration & Coordination:
Work closely with bioinformatics scientists, data engineers, and software developers to understand their needs and optimize system performance.
Collaborate with other engineering and IT teams to integrate bioinformatics applications with enterprise-wide operational tracking systems and tools.
Participate in cross-functional projects to enhance overall system architecture and deployment strategies.
Operational Excellence:
Develop and enforce best practices for deployment, configuration management, and system maintenance.
Lead efforts in capacity planning, performance tuning, and infrastructure scaling to accommodate evolving research demands.
Maintain documentation and standard operating procedures for all SRE-related activities.
Innovation & Improvement:
Stay abreast of emerging technologies and trends in site reliability engineering and bioinformatics.
Evaluate and recommend new tools, technologies, and processes to enhance system reliability and operational efficiency.
Qualifications: