Policy details
Prepared for: NHS Norfolk and Suffolk ICB
Status: Approved
Version: 1.0
Date: 1/04/2026
Document control details:
- 6/05/2025 – Version 0.1, Initial draft
- 1/04/2026 – Current version, Revised to apply to Norfolk and Suffolk ICB, instead of Norfolk and Waveney ICB
Introduction
Overview
This Disaster Recovery (DR) Policy outlines the strategy and approach for ensuring the availability and recoverability of the Norfolk and Suffolk Datahub environment in the event of a regional outage or major service disruption.
The principles within this policy should be regularly reviewed against the NHS EPRR Framework.
Objectives
- Ensure continuity of critical data functions with minimal disruption.
- Protect data through regular backups and regional redundancy.
- Enable recovery using Infrastructure as Code (IaC) for rapid redeployment.
- Maintain data visibility for business users, primarily via Power BI.
Scope
This policy applies to all the components within the Datahub Azure Network platform including:
- Power BI Integration Gateway
- Central Integration Engine
- Master Patient Index
- Student Placement
- sFTP\HTTPs Servers (Public\HSCN)
- Curator
- Palo Alto Firewall
- Express Route Connectivity
- SHIR Gateway
- Logic Monitor Collector
These services depend on several Microsoft-provided Azure components that form part of the Microsoft cloud environment. The availability of those components is a prerequisite for recovering the services listed above.
Recovery Strategy
The Datahub does not operate any business-critical clinical applications, so its DR requirements are less time-critical. Copies of data are secured through traditional backups. Beyond that, the approach is not to invest in duplicating and replicating environments ‘just in case’, with the associated increase in operating costs, but to use modern tooling and practices to support recovery scenarios.
The recovery measures fall into four broad categories:
Backups
All backup operations will be monitored daily to confirm success, with any issues identified and rectified before the next backup run.
Backups are configured to operate with four Azure Recovery Services vaults, one for each environment: DEV, UAT, PRD, and HUB. Backups will use zone-redundant storage (ZRS), which replicates data synchronously across three Azure availability zones in the UK West region.
Each availability zone is a separate physical location with independent power, cooling, and networking.
An Azure Policy initiative will be used to ensure consistent, centralised deployment across the different subscriptions, with a separate policy inside it for each backup scenario. The definition will be applied at the management group level and inherited by all subscriptions.
Access to recovery vaults will be tightly controlled with Azure RBAC.
Backup Policies will be configured as per the following:
- DEV – Snapshot retained for 1 day. Daily backup retained for 14 days.
- UAT – Snapshot retained for 1 day. Daily backup retained for 14 days.
- PRD – Snapshot retained for 2 days. Daily backup retained for 7 days. Weekly backup retained for 4 weeks. Monthly backup retained for 3 months. Yearly backup retained for 5 years.
- HUB – Snapshot retained for 2 days. Daily backup retained for 7 days. Weekly backup retained for 4 weeks. Monthly backup retained for 3 months. Yearly backup retained for 5 years.
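As an illustration only, the retention rules above can be expressed as data and used to work out when a given backup falls out of retention. This is a minimal sketch, not the Azure Backup configuration itself. The names `RETENTION_DAYS`, `expiry_date` and `is_retained` are hypothetical, and the conversion of weeks, months and years into days (7, 30 and 365 days) is an assumption made for the example.

```python
from datetime import date, timedelta

# Retention periods transcribed from the policy list above, expressed in days.
# Assumption: 1 week = 7 days, 1 month = 30 days, 1 year = 365 days.
RETENTION_DAYS = {
    "DEV": {"snapshot": 1, "daily": 14},
    "UAT": {"snapshot": 1, "daily": 14},
    "PRD": {"snapshot": 2, "daily": 7, "weekly": 28, "monthly": 90, "yearly": 1825},
    "HUB": {"snapshot": 2, "daily": 7, "weekly": 28, "monthly": 90, "yearly": 1825},
}


def expiry_date(environment: str, backup_type: str, taken_on: date) -> date:
    """Return the date on which a backup falls out of retention."""
    try:
        days = RETENTION_DAYS[environment][backup_type]
    except KeyError:
        raise ValueError(
            f"No {backup_type!r} retention defined for {environment!r}"
        ) from None
    return taken_on + timedelta(days=days)


def is_retained(environment: str, backup_type: str, taken_on: date, today: date) -> bool:
    """True while the backup is still inside its retention window."""
    return today < expiry_date(environment, backup_type, taken_on)
```

For example, `expiry_date("DEV", "daily", date(2026, 4, 1))` gives 15 April 2026, and asking for a weekly backup in DEV raises a `ValueError` because the policy defines none.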
Azure Resilience
Utilise the cost-effective redundancy options built into Azure, which are designed with high availability and fault tolerance in mind. These include Availability Zones, high-availability components, managed services and auto-recovery. In particular, appropriate use of Availability Zones ensures that functions and services are hosted across multiple physically separate datacentres within the same region. If one datacentre in the region fails completely, services continue to operate from the others.
Paused/Stopped components
Utilise the ability to keep services configured but not running, so that they do not incur cost but can be started quickly when needed.
Infrastructure as Code (IaC)
Components, including infrastructure, are defined in code and deployed using modern development techniques. This allows rapid and repeatable deployment without keeping standby systems on hand.
Recovery Time Objectives (RTO) & Recovery Point Objectives (RPO)
RTO and RPO vary between the functional components held within the Datahub, depending on the frequency of data change and the impact on the organisation of unavailability and data loss.
Detailed targets are included in Appendix A.
Testing & Validation
Backups will be tested at least every 6 months, rotating through different components to ensure that all services are fully recoverable.
DR tests will be simulated using sandbox redeployment in either UK West or UK South.
IaC templates and images will be validated bi-annually for accuracy and completeness.
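One way to picture the six-monthly rotation is a simple schedule that cycles through the in-scope components. The sketch below is illustrative only: the function names, the number of components tested per cycle, and the choice to schedule each cycle on the first day of a month are all assumptions, not requirements of this policy.

```python
from datetime import date
from itertools import cycle, islice

# Components taken from the Scope section of this policy.
COMPONENTS = [
    "Power BI Integration Gateway",
    "Central Integration Engine",
    "Master Patient Index",
    "Student Placement",
    "sFTP\\HTTPs Servers (Public\\HSCN)",
    "Curator",
    "Palo Alto Firewall",
    "Express Route Connectivity",
    "SHIR Gateway",
    "Logic Monitor Collector",
]


def add_months(start: date, months: int) -> date:
    """Return the first day of the month that is `months` after `start`."""
    month_index = start.month - 1 + months
    return date(start.year + month_index // 12, month_index % 12 + 1, 1)


def rotation_schedule(start: date, cycles: int, per_cycle: int):
    """Build (test date, components) pairs, one pair every six months.

    `per_cycle` (hypothetical) is how many components are restored per test.
    The component list wraps around once every component has been covered.
    """
    rotation = cycle(COMPONENTS)
    schedule = []
    for i in range(cycles):
        when = add_months(start, 6 * i)
        schedule.append((when, list(islice(rotation, per_cycle))))
    return schedule
```

With three components per cycle, `rotation_schedule(date(2026, 4, 1), 4, 3)` covers all ten components within four cycles, that is, within two years.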
Monitoring and Alerts
All Datahub Azure components will be actively monitored to ensure that any issues are picked up quickly and the appropriate action taken.
Azure Monitor will be used for this and automated alerts configured for any anomalies.
Documentation and Communication
The Senior Cloud Engineer will be responsible for ensuring that engineers are appropriately trained in recovery procedures, and that SOPs exist to support those recovery processes.
The Senior Cloud Engineer will act as the conduit for all DR status and progress communication with stakeholders.
DR will be considered as part of all new use cases for the Datahub, and the appropriate documentation updated so that it is current and meets the stakeholder’s needs for that use case. This documentation should clearly set out roles and responsibilities so that everyone is aware of the tasks necessary during a recovery scenario.
Appendix A – Prioritisation
Recovery Prioritisation
| Priority | Function | Comments | RTO | RPO |
|---|---|---|---|---|
| Critical | | | | |
| High | Azure Private Networking | | 6hr | 12hr |
| | Express Route | Connectivity to the HSCN Network | 6hr | 12hr |
| | Jumpbox\VDI | | 6hr | 12hr |
| | Palo Alto Firewall | | 6hr | 12hr |
| | Central Integration Engine | | 6hr | 0 |
| Medium | Curator | | 12hr | 24hr |
| | App Services – IPR Commentary | | 12hr | 12hr |
| | Master Patient Index | | 12hr | 0 |
| Low | App Services – Student Placement | | 48hr | 12hr |
| | Power BI Gateway | | 48hr | 12hr |
| | Logic Monitor | | 48hr | 24hr |
| | sFTP\HTTPs Servers (Public\HSCN) | | 48hr | 24hr |
| | SHIR Server | | 48hr | 24hr |
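The targets in the table can also be held as data, for example to check the outcome of a DR test against them or to derive a recovery order. The following is a minimal sketch under stated assumptions: the `Target` class and both function names are hypothetical, and ordering by priority and then by RTO is one possible reading of the prioritisation, not a rule set by this policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    priority: str
    rto_hours: int  # maximum acceptable time to restore the function
    rpo_hours: int  # maximum acceptable window of data loss


# Values transcribed from the Recovery Prioritisation table.
TARGETS = {
    "Azure Private Networking": Target("High", 6, 12),
    "Express Route": Target("High", 6, 12),
    "Jumpbox\\VDI": Target("High", 6, 12),
    "Palo Alto Firewall": Target("High", 6, 12),
    "Central Integration Engine": Target("High", 6, 0),
    "Curator": Target("Medium", 12, 24),
    "App Services – IPR Commentary": Target("Medium", 12, 12),
    "Master Patient Index": Target("Medium", 12, 0),
    "App Services – Student Placement": Target("Low", 48, 12),
    "Power BI Gateway": Target("Low", 48, 12),
    "Logic Monitor": Target("Low", 48, 24),
    "sFTP\\HTTPs Servers (Public\\HSCN)": Target("Low", 48, 24),
    "SHIR Server": Target("Low", 48, 24),
}

PRIORITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def meets_targets(function: str, downtime_hours: float, data_loss_hours: float) -> bool:
    """True if an observed recovery stayed within both RTO and RPO."""
    target = TARGETS[function]
    return downtime_hours <= target.rto_hours and data_loss_hours <= target.rpo_hours


def recovery_order() -> list:
    """Functions sorted by priority, then RTO; ties keep the table order."""
    return sorted(
        TARGETS,
        key=lambda name: (PRIORITY_RANK[TARGETS[name].priority], TARGETS[name].rto_hours),
    )
```

Note that an RPO of 0 means any measured data loss fails the check, so `meets_targets("Central Integration Engine", 5, 1)` is false even though the 6-hour RTO was met.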
Appendix B – Contacts
Emergency contacts for this policy are for internal use and cannot be published as part of this policy.
Appendix C – Notification Network
- Senior Azure Infrastructure Engineer notifies: Technical Solutions Architect, Azure Developer, Cloud Infrastructure Engineer, Head of Data Engineering, and Head of Data Analytics
- Technical Solutions Architect notifies: Associate Director of Insight and Analytics, and Executive Director of Digital and Data
- Azure Developer notifies: Application Consuming Organisations
- Cloud Infrastructure Engineer notifies: Integrated and Managed Service Organisations
- Head of Data Engineering notifies: Engineers
- Head of Data Analytics notifies: Analysts
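The notification network above forms a small tree, and walking it level by level shows who is reached at each step of the cascade. The sketch below is illustrative: `NOTIFIES` and `notification_waves` are hypothetical names, and grouping contacts into "waves" is an assumption about how the cascade might be tracked, not a procedure defined by this policy.

```python
# Who notifies whom, transcribed from the list above.
NOTIFIES = {
    "Senior Azure Infrastructure Engineer": [
        "Technical Solutions Architect",
        "Azure Developer",
        "Cloud Infrastructure Engineer",
        "Head of Data Engineering",
        "Head of Data Analytics",
    ],
    "Technical Solutions Architect": [
        "Associate Director of Insight and Analytics",
        "Executive Director of Digital and Data",
    ],
    "Azure Developer": ["Application Consuming Organisations"],
    "Cloud Infrastructure Engineer": ["Integrated and Managed Service Organisations"],
    "Head of Data Engineering": ["Engineers"],
    "Head of Data Analytics": ["Analysts"],
}


def notification_waves(start: str) -> list:
    """Return the contacts reached at each step of the cascade from `start`.

    Breadth-first traversal: wave 1 is everyone `start` notifies directly,
    wave 2 is everyone those people notify, and so on.
    """
    waves = []
    seen = {start}
    current = [start]
    while current:
        next_wave = []
        for person in current:
            for contact in NOTIFIES.get(person, []):
                if contact not in seen:  # guard against duplicates or loops
                    seen.add(contact)
                    next_wave.append(contact)
        if next_wave:
            waves.append(next_wave)
        current = next_wave
    return waves
```

Starting from the Senior Azure Infrastructure Engineer, this yields two waves: the five direct contacts, then the six groups that those contacts notify in turn.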