We have a 2-year hybrid Disaster Recovery Specialist contract that may be extended by our client for up to an additional 3 years. The anticipated start is July/August 2024.
The Disaster Recovery Specialist must be able to perform work on-site occasionally at our client’s headquarters facility in Sacramento, California. At our client’s discretion, a different approach (fully in-office, fully remote, or different work hours) may be used based on our client’s business needs.
Our client’s Technology Services (TS) Branch supports approximately 1,700 employees and Disaster Recovery Specialists at its headquarters and seven member service centers (MSCs) throughout the State of California. Our client currently works in a hybrid environment where staff are on-site 2–3 days a week. No IT staff work at the MSCs; technology support for the MSCs is provided remotely. No applications are hosted at any of the branch offices.
Our client’s business resiliency program consists of two teams. The Business Continuity (BC) team is under the program area and is responsible for the Business Impact Analysis (BIA), the BC plan, the Emergency Operation Committee (EOC), the EOC tabletop exercise and the business resumption center (BRC). The Disaster Recovery (DR) team is under the Technology Services Branch and is responsible for the DR gap analysis, the DR plan, the Information System Recovery Plans (ISRPs), the DR exercise, and the DR After-Action Report (AAR).
The BC team performs a BIA on an annual basis. Based on the BIA, the DR team performs a gap analysis. The result is used to update the DR plan, the ISRPs and the DR technology solution. The DR team coordinates with the BC team to conduct DR exercises once or twice a year. After each exercise, an AAR is created that documents not only the outcome of the exercise but also an action plan that includes improvement opportunities. Both the BC and DR teams currently use SharePoint Online for storing artifacts and information.
Our client has two major projects that will alter the way DR is performed in the future. One project is to replace the existing mainframe system with modern technology. This project is expected to go live by the end of 2025. The new solution will be hosted in the AWS cloud. The other project is to migrate applications in the on-premises datacenter to the AWS environment. This project is also anticipated to be complete by the end of 2025. After 2025, both the on-premises and DR datacenters will be decommissioned.
The new AWS cloud environment being established is a single region with multiple Availability Zones (AZs). Our client’s applications are hosted on EC2 instances using cloud infrastructure services. In addition to AWS, our client has also established multiple Equinix co-locations to host customer premises equipment (CPE) from third-party vendors. Other than the CPE, there is no application or infrastructure at the Equinix sites. AWS, Equinix, HQ and the MSCs are all connected and dynamically routed using a Software-Defined Wide Area Network (SD-WAN) and Border Gateway Protocol (BGP).
Our client has approximately 130 business systems, about half of which are software-as-a-service (SaaS). The rest are a mix of commercial-off-the-shelf (COTS) products and custom applications. Some of the COTS and custom applications have already been migrated to the cloud, some are in the process of being migrated, and the rest will be migrated by the end of 2025. Our client has about 1,000 servers, all of which run MS Windows Server operating systems; the database servers run MS SQL Server.
Our client develops and maintains a DR plan which follows the policy established in the California Department of Technology’s State Information Management Manual (SIMM) 5325. The SIMM refers to the DR plan as the Technology Recovery (TR) Plan which is required to be maintained and kept up to date. The TR plan includes references to the ISRP for systems that have a DR solution. The ISRPs are maintained by the respective system owners.
Tasks Summary
- Perform Business Impact Analysis (BIA) Assessment
Our client’s Business Continuity (BC) team performs a BIA on an annual basis. The BIA follows industry standards and includes the identification of business functions, processes and the associated maximum acceptable outage (MAO), and the facilities, personnel, information and technology required to support them. The business processes are classified into five tiers. Tier 0 has a recovery time objective (RTO) of 0–48 hours, Tier 1 is 3–5 days, Tier 2 is 6–14 days, Tier 3 is 15–31 days and Tier 4 is more than 32 calendar days.
Deliverable: BIA Assessment Report
The Disaster Recovery Specialist will analyze the BIA and provide a report that includes recommendations for improvement opportunities. These recommendations can cover standards, approach, the information collection process, communication and the artifact repository. Each recommendation must be specific, with a clearly defined outcome, so that it can be acted upon. Each must be backed by supporting information, based on industry standards and best practices, that explains its rationale.
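As an illustration only, the tier boundaries described for the BIA could be expressed as a small lookup. This is a minimal sketch, not our client's actual tooling: the function name `tier_for_recovery_days` and the constant name are hypothetical, and it assumes that a value falling between two stated ranges (for example, between 48 hours and day 3, or exactly day 32) is assigned to the next higher tier.

```python
from bisect import bisect_left

# Upper bounds, in days, for Tiers 0-3 as stated in the posting
# (Tier 0: 0-48 hours = 2 days, Tier 1: up to 5, Tier 2: up to 14, Tier 3: up to 31).
# Anything above the last bound is treated as Tier 4 (an assumption for this sketch).
TIER_UPPER_BOUNDS_DAYS = [2, 5, 14, 31]


def tier_for_recovery_days(days: float) -> int:
    """Return the tier (0-4) whose range covers a recovery time given in days."""
    if days < 0:
        raise ValueError("recovery time cannot be negative")
    # bisect_left returns the index of the first bound that is >= days,
    # which is the tier number; values above every bound yield 4.
    return bisect_left(TIER_UPPER_BOUNDS_DAYS, days)


if __name__ == "__main__":
    for d in (1, 2, 4, 10, 20, 45):
        print(f"{d} day(s) -> Tier {tier_for_recovery_days(d)}")
```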
- Optimize Disaster Recovery (DR) Gap Analysis Process
After the BIA is approved, the DR team performs a DR gap analysis. This process consists of translating the BIA into requirements for technology recovery that include the RTO and recovery point objective (RPO) for each business system and the supporting infrastructure. The information is then compared with the current Critical Systems List (CSL) to determine the gaps, which are then evaluated to develop the appropriate action plan. This process is labor-intensive and completely manual.
Deliverable: DR Gap Analysis Process Document
The Disaster Recovery Specialist will analyze the current process and identify areas for optimization. The Disaster Recovery Specialist will develop a new process that incorporates the optimization recommendations. The new process should aim to leverage technology (e.g., MS Office, SharePoint, Power Platform) to maximize operational efficiency through automation and provide self-service to the stakeholders. The proposed process will be documented with clear roles and responsibilities, and the Disaster Recovery Specialist will perform one DR gap analysis using the new process.
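To make the comparison step of the gap analysis concrete, the sketch below shows one way it might be automated. It is an assumption-laden illustration rather than the client's process: the `RecoveryTarget` type, the `find_gaps` function, the system names and the hour values are all hypothetical. It treats a gap as either a system missing from the CSL or a CSL target that is longer than what the BIA requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTarget:
    rto_hours: float  # recovery time objective
    rpo_hours: float  # recovery point objective


def find_gaps(required: dict[str, RecoveryTarget],
              csl: dict[str, RecoveryTarget]) -> list[tuple[str, str]]:
    """Compare BIA-derived requirements with the CSL and list (system, issue) pairs."""
    gaps = []
    for system, need in sorted(required.items()):
        have = csl.get(system)
        if have is None:
            gaps.append((system, "not on CSL"))
            continue
        if have.rto_hours > need.rto_hours:
            gaps.append((system, "RTO longer than required"))
        if have.rpo_hours > need.rpo_hours:
            gaps.append((system, "RPO longer than required"))
    return gaps


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    required = {
        "payroll": RecoveryTarget(rto_hours=48, rpo_hours=4),
        "email": RecoveryTarget(rto_hours=24, rpo_hours=1),
        "intranet": RecoveryTarget(rto_hours=120, rpo_hours=24),
    }
    csl = {
        "payroll": RecoveryTarget(rto_hours=72, rpo_hours=4),
        "intranet": RecoveryTarget(rto_hours=96, rpo_hours=24),
    }
    for system, issue in find_gaps(required, csl):
        print(f"{system}: {issue}")
```

In practice the two inputs would more likely come from SharePoint lists or spreadsheets than from hard-coded dictionaries; the comparison logic would stay the same.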
- Develop Application Recovery Matrix
Based on the BIA, the DR team creates a tiered application list. This list includes infrastructure services such as network, server, storage, backup and shared services (e.g., email, phone, communication system). The applications within the same tier are further prioritized to determine the order of recovery. The tiers are ranked according to the RTO and RPO of the application, which in turn are based on the MAO of the business processes.
Deliverable: Application Recovery Matrix
The Disaster Recovery Specialist will review the current DR application list and its associated tiers. The Disaster Recovery Specialist will also review the Enterprise Architecture application portfolio to capture all the elements of each system and map the dependencies between the components within a system as well as the dependencies between systems. The Disaster Recovery Specialist will create an application recovery matrix, taking into account the dependency mapping, to determine a recovery order that is aligned with the BIA.
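One possible way to derive a recovery order from tiers and dependencies is a topological sort that prefers lower tiers whenever more than one system is ready to recover. The sketch below is illustrative only: the function `recovery_order`, the system names and the tier assignments are hypothetical, and it assumes every system named in `depends_on` also appears in `tiers`.

```python
import heapq


def recovery_order(tiers: dict[str, int],
                   depends_on: dict[str, set[str]]) -> list[str]:
    """Order systems so prerequisites come first, preferring lower tiers among ready systems."""
    # Number of prerequisites each system is still waiting on.
    remaining = {s: len(depends_on.get(s, ())) for s in tiers}
    # Reverse mapping: prerequisite -> systems that depend on it.
    dependents: dict[str, list[str]] = {s: [] for s in tiers}
    for system, prereqs in depends_on.items():
        for prereq in prereqs:
            dependents[prereq].append(system)

    # Min-heap of (tier, name) for systems whose prerequisites are all recovered.
    ready = [(tiers[s], s) for s, count in remaining.items() if count == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, system = heapq.heappop(ready)
        order.append(system)
        for dependent in dependents[system]:
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                heapq.heappush(ready, (tiers[dependent], dependent))

    if len(order) != len(tiers):
        raise ValueError("circular dependency detected")
    return order


if __name__ == "__main__":
    # Hypothetical systems and tiers for illustration only.
    tiers = {"network": 0, "storage": 0, "database": 1, "payroll": 0, "intranet": 2}
    depends_on = {
        "database": {"network", "storage"},
        "payroll": {"database"},
        "intranet": {"network"},
    }
    print(recovery_order(tiers, depends_on))
```

Note that in this sample a Tier 0 application that depends on a Tier 1 database cannot be recovered until the database is; surfacing that kind of mismatch is one reason to map dependencies before fixing the order.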
- Assess System Implementation Process
Our client would like the DR team to participate actively in every system implementation project and to provide support and guidance to the projects.
Deliverable: System Implementation Assessment Report
The Disaster Recovery Specialist will review up to five (5) representative system implementation processes. The Disaster Recovery Specialist will create an assessment report with recommendations for how the DR team can be engaged effectively in projects. The report will include a set of guidelines for the project team to use as a guide for their system implementation. The report will also contain a list of requirements to ensure the project team understands DR expectations.
- Assess Operation Management Process
Systems are continually enhanced and updated at our client. The DR team should be involved in these operational activities to understand the impact of each change on the DR strategy.
Deliverable: Operation Process Assessment Report
The Disaster Recovery Specialist will review up to five (5) operation management processes, including change, maintenance and release. The Disaster Recovery Specialist will create an assessment report with recommendations for how the DR team can be engaged effectively in these processes. The report will include a set of guidelines for the respective operation teams to use as a guide for their processes. The report will also contain a list of requirements to ensure the operation teams incorporate the DR expectations in their processes.
- Develop Technology Recovery Plan
California State Agencies are required to develop a technology recovery plan (TRP) and keep it up to date. Our client adheres to the State policy and maintains its TRP.
Deliverable: Technology Recovery Plan
The Disaster Recovery Specialist will review the California State policies and standards with regard to technology recovery. The Disaster Recovery Specialist will update the current TRP to adhere to the State requirements. The TRP should be aligned with our client’s BIA and cloud strategy, and structured so that it is easy to maintain. The Disaster Recovery Specialist will develop a process to vet the TRP with relevant stakeholders and to keep it up to date. This process should be automated as much as possible.
- Develop Information System Recovery Plan
Each system that has a DR solution is required to have an Information System Recovery Plan (ISRP) that provides an overview of the system, its architecture and the recovery procedure. The procedure must contain sufficient details so that more than one team member can carry out the tasks. The ISRP is created and maintained by the system owner. The TRP includes references to the ISRP for each system on the Application Recovery Matrix.
Deliverable: Information System Recovery Plan
The Disaster Recovery Specialist will review our client’s ISRP templates and sample documents. The Disaster Recovery Specialist will update the templates to adequately cover the different types of systems (e.g., SaaS, COTS, custom applications, infrastructure and network) used at our client. The Disaster Recovery Specialist will create three (3) ISRPs for each system type using the updated templates. The Disaster Recovery Specialist will develop a process to collect and maintain the ISRPs in a repository. This process should be automated as much as possible.
- Develop DR Exercise Plan
Our client conducts multiple DR exercises a year. Each exercise is designed to meet specific business objectives. The DR Exercise Plan is an overarching document that describes the overall goal of the DR exercises, the testing strategy, the various exercises and the people and systems that are involved in each exercise. The plan also contains the expected outcomes and the metrics used to measure success.
Deliverable: DR Exercise Plan
The Disaster Recovery Specialist will review the current DR Exercise Plan and provide recommendations based on California State policy, our client’s business requirements, our client’s cloud strategy, industry standards and best practices. The Disaster Recovery Specialist will create or update the DR Exercise Plan to incorporate the recommendations approved by our client.
Minimum Qualifications
- 10 years of experience performing business impact analysis, IT risk assessment and DR gap analysis.
- 10 years of experience developing DR assessment, strategy, plan and processes.
- 10 years of experience optimizing DR strategy, plan and processes to improve efficiency and resiliency.
- 10 years of experience working with SaaS, cloud environments (e.g., AWS, Azure), co-locations and on-premises datacenters.
Desirable Qualifications
- Successfully delivered at least 1 project of similar scope and complexity.
- Experience providing consulting services to State and/or Federal agencies in the USA.
- Hold degrees/certifications in the areas of business continuity, IT risk management or disaster recovery.
- Experience developing streamlined IT processes using industry standards and best practices such as Lean Six Sigma or Information Technology Infrastructure Library (ITIL).
- Experience managing IT projects using industry standards and best practices such as Project Management Institute (PMI) PMBOK.
Minimum Application Requirements
Your application will be disqualified if you do not meet all these minimum application requirements.
- Must meet or exceed the Minimum Qualifications.
- Must be a current resident of the United States.
- Must have current work authorization for the United States.
- Must be a direct hire.
- Must be able to accommodate our client’s need for occasional on-site work.
Make sure to check your junk/spam folders as we will use email to reach out to you.