Data Hub Disaster Recovery Policy - Norfolk & Suffolk ICB

Policy details

Prepared for: NHS Norfolk and Suffolk ICB
Status: Approved
Version: 1.0
Date: 1/04/2026

Document control details:

6/05/2025 – Version 0.1, Initial draft
1/04/2026 – Current version, Revised to apply to Norfolk and Suffolk ICB, instead of Norfolk and Waveney ICB

Introduction

Overview

This Disaster Recovery (DR) Policy outlines the strategy and approach for ensuring the availability and recoverability of the Norfolk and Suffolk Datahub environment in the event of a regional outage or major service disruption.

The principles within this policy should be regularly reviewed against the NHS EPRR Framework.

Objectives

Ensure continuity of critical data functions with minimal disruption.
Protect data through regular backups and regional redundancy.
Enable recovery using Infrastructure as Code (IaC) for rapid redeployment.
Maintain data visibility for business users, primarily via Power BI.

Scope

This policy applies to all the components within the Datahub Azure Network platform including:

Power BI Integration Gateway
Central Integration Engine
Master Patient Index
Student Placement
sFTP\HTTPs Servers (Public\HSCN)
Curator
Palo Alto Firewall
Express Route Connectivity
SHIR Gateway
Logic Monitor Collector

Underlying these are several Azure components that Microsoft provides as part of its cloud environment. Their availability is a prerequisite for recovering the services above.

Recovery Strategy

The Data Hub does not operate any business-critical clinical applications, so its DR requirements are less time-critical. Alongside traditional backups that secure copies of data, the approach is not to invest in duplicating and replicating environments ‘just in case’, with the associated increase in operating costs, but instead to utilise modern tooling and practices to support recovery scenarios.

The recovery measures fall into four broad categories:

Backups

All backup operations will be monitored daily to confirm success, and any issues will be identified and rectified before the next backup run.

Backups are configured to use four Azure Recovery Services vaults, one for each environment: DEV, UAT, PRD, and HUB. Backups will be carried out using zone-redundant storage (ZRS), replicated synchronously across three Azure availability zones in the UK West region.

Each availability zone is a separate physical location with independent power, cooling, and networking.

An Azure Policy initiative will be used to ensure consistent, centralised deployment across the different subscriptions, with a separate policy within it for each backup scenario. The initiative will be applied at the management group level and inherited by all subscriptions.

Access to recovery vaults will be tightly controlled with Azure RBAC.

Backup Policies will be configured as per the following:

DEV – Snapshot retained for 1 day. Daily backup retained for 14 days.
UAT – Snapshot retained for 1 day. Daily backup retained for 14 days.
PRD – Snapshot retained for 2 days. Daily backup retained for 7 days. Weekly backup retained for 4 weeks. Monthly backup retained for 3 months. Yearly backup retained for 5 years.
HUB – Snapshot retained for 2 days. Daily backup retained for 7 days. Weekly backup retained for 4 weeks. Monthly backup retained for 3 months. Yearly backup retained for 5 years.
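
As an illustration only, the retention schedule above can be written down as data so that it is easy to check or compare against what is deployed. The sketch below is not the deployed Azure backup policy definition. The names RETENTION, longest_retention and tiers_covering are hypothetical, and months and years are approximated as 30 and 365 days.

```python
from datetime import timedelta

# Retention periods transcribed from the policy list above.
# Assumption: a month is treated as 30 days and a year as 365 days.
_LONG_TERM = {
    "snapshot": timedelta(days=2),
    "daily": timedelta(days=7),
    "weekly": timedelta(weeks=4),
    "monthly": timedelta(days=3 * 30),
    "yearly": timedelta(days=5 * 365),
}
_SHORT_TERM = {
    "snapshot": timedelta(days=1),
    "daily": timedelta(days=14),
}

RETENTION = {
    "DEV": dict(_SHORT_TERM),
    "UAT": dict(_SHORT_TERM),
    "PRD": dict(_LONG_TERM),
    "HUB": dict(_LONG_TERM),
}


def longest_retention(env):
    """Return the longest period for which any backup tier is kept."""
    return max(RETENTION[env].values())


def tiers_covering(env, age):
    """List the tiers whose retention period is at least `age`.

    This only compares retention windows. It does not confirm that a
    restore point of exactly that age exists in the vault.
    """
    return [tier for tier, keep in RETENTION[env].items() if age <= keep]


if __name__ == "__main__":
    print(longest_retention("DEV"))
    print(tiers_covering("PRD", timedelta(days=10)))
```

Under these assumptions, a PRD restore point that is 10 days old falls outside the snapshot and daily windows, so it would have to come from a weekly, monthly or yearly backup.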

Azure Resilience

Utilise the cost-effective redundancy options built into Azure, which are designed with high availability and fault tolerance in mind. These include Availability Zones, high-availability components, managed service provisions, and auto-recovery. In particular, appropriate use of Availability Zones ensures that functions and services are hosted across multiple physically separate datacentres within the same region. In the event of a complete failure of a single datacentre within the region, services continue to operate from the other datacentres.

Paused/Stopped components

Utilise the ability to keep services configured but not running, so that they do not incur cost but can be started quickly when needed.

Infrastructure as Code (IaC)

Components, including infrastructure, are defined in code and deployed using modern development techniques. This allows rapid and repeatable deployment of components without keeping standby systems on hand.

Recovery Time Objectives (RTO) & Recovery Point Objectives (RPO)

RTO and RPO vary between the functional components held within the Datahub, depending on the frequency of data change and the impact on the organisation of unavailability and data loss.

Detailed targets are included in Appendix A.

Testing & Validation

Backups will be tested at least every 6 months, rotating through different components to ensure that all services are fully recoverable.
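
One possible way to plan the rotation is a simple round-robin over the in-scope components. The sketch below is an assumption for illustration, not an agreed schedule: the function name rotation and the figure of three components per test cycle are hypothetical, and the component list is taken from the Scope section.

```python
# Components listed in the Scope section of this policy.
COMPONENTS = [
    "Power BI Integration Gateway",
    "Central Integration Engine",
    "Master Patient Index",
    "Student Placement",
    "sFTP\\HTTPs Servers (Public\\HSCN)",
    "Curator",
    "Palo Alto Firewall",
    "Express Route Connectivity",
    "SHIR Gateway",
    "Logic Monitor Collector",
]


def rotation(components, per_cycle, cycles):
    """Pick `per_cycle` components for each test cycle, round-robin.

    Each cycle continues where the previous one stopped and wraps
    around to the start of the list when it reaches the end.
    """
    if not 0 < per_cycle <= len(components):
        raise ValueError("per_cycle must be between 1 and len(components)")
    schedule = []
    for cycle in range(cycles):
        start = (cycle * per_cycle) % len(components)
        schedule.append(
            [components[(start + i) % len(components)] for i in range(per_cycle)]
        )
    return schedule


if __name__ == "__main__":
    # Hypothetical plan: three components per six-monthly test cycle.
    for number, picked in enumerate(rotation(COMPONENTS, 3, 4), start=1):
        print(f"Cycle {number}: {', '.join(picked)}")
```

With three components per cycle, all ten components are covered within four cycles, that is, within two years of six-monthly testing.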

DR tests will be simulated using sandbox redeployment in either UK West or UK South.

IaC templates and images will be validated bi-annually for accuracy and completeness.

Monitoring and Alerts

All Datahub Azure components will be actively monitored to ensure that any issues are picked up quickly and the appropriate action taken.

Azure Monitor will be used for this, with automated alerts configured for any anomalies.

Documentation and Communication

The Senior Cloud Engineer will be responsible for ensuring that engineers are appropriately trained in recovery procedures, and that SOPs exist to support those recovery processes.

The Senior Cloud Engineer will act as the conduit for all DR status and progress communication with stakeholders.

DR will be considered as part of every new Datahub use case, and the appropriate documentation updated so that it is current and meets the stakeholders’ needs for that use case. This documentation should clearly set out roles and responsibilities so that everyone is aware of the tasks necessary during a recovery scenario.

Appendix A – Prioritisation

Recovery Prioritisation

Priority	Function	Comments	RTO	RPO
Critical
High	Azure Private Networking		6hr	12hr
	Express Route	Connectivity to the HSCN Network	6hr	12hr
	Jumpbox\VDI		6hr	12hr
	Palo Alto Firewall		6hr	12hr
	Central Integration Engine		6hr	0
Medium	Curator		12hr	24hr
	App Services – IPR Commentary		12hr	12hr
	Master Patient Index		12hr	0
Low	App Services – Student Placement		48hr	12hr
	Power BI Gateway		48hr	12hr
	Logic Monitor		48hr	24hr
	sFTP\HTTPs Servers (Public\HSCN)		48hr	24hr
	SHIR Server		48hr	24hr
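
For illustration, the targets in the table above can be held as data and sorted into a working recovery order. This is a sketch under stated assumptions, not an agreed runbook: the names Target, recovery_order and within_targets are hypothetical, and breaking ties by RTO and then RPO within a priority band is an assumption that the policy does not specify.

```python
from typing import NamedTuple


class Target(NamedTuple):
    priority: str
    function: str
    rto_hours: int
    rpo_hours: int


# Rows transcribed from the Recovery Prioritisation table.
TARGETS = [
    Target("High", "Azure Private Networking", 6, 12),
    Target("High", "Express Route", 6, 12),
    Target("High", "Jumpbox\\VDI", 6, 12),
    Target("High", "Palo Alto Firewall", 6, 12),
    Target("High", "Central Integration Engine", 6, 0),
    Target("Medium", "Curator", 12, 24),
    Target("Medium", "App Services - IPR Commentary", 12, 12),
    Target("Medium", "Master Patient Index", 12, 0),
    Target("Low", "App Services - Student Placement", 48, 12),
    Target("Low", "Power BI Gateway", 48, 12),
    Target("Low", "Logic Monitor", 48, 24),
    Target("Low", "sFTP\\HTTPs Servers (Public\\HSCN)", 48, 24),
    Target("Low", "SHIR Server", 48, 24),
]

PRIORITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def recovery_order(targets):
    """Sort by priority band, then by tightest RTO, then by tightest RPO.

    The tie-break within a band is an assumption made for this sketch.
    """
    return sorted(
        targets,
        key=lambda t: (PRIORITY_RANK[t.priority], t.rto_hours, t.rpo_hours),
    )


def within_targets(target, outage_hours, data_loss_hours):
    """Return True if an incident stayed within both the RTO and the RPO."""
    return outage_hours <= target.rto_hours and data_loss_hours <= target.rpo_hours


if __name__ == "__main__":
    for t in recovery_order(TARGETS):
        print(f"{t.priority:<7} {t.function:<36} RTO {t.rto_hours}hr  RPO {t.rpo_hours}hr")
```

This ordering only reflects the stated targets. It does not model technical dependencies, such as networking needing to be in place before other services can be restored.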

Appendix B – Contacts

Emergency contacts for this policy are for internal use and cannot be published as part of this policy.

Appendix C – Notification Network

Senior Azure Infrastructure Engineer notifies: Technical Solutions Architect, Azure Developer, Cloud Infrastructure Engineer, Head of Data Engineering, and Head of Data Analytics.
Technical Solutions Architect notifies: Associate Director of Insight and Analytics, and Executive Director of Digital and Data.
Azure Developer notifies: Application Consuming Organisations.
Cloud Infrastructure Engineer notifies: Integrated and Managed Service Organisations.
Head of Data Engineering notifies: Engineers.
Head of Data Analytics notifies: Analysts.
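
The notification network above is a small tree, and it can be walked breadth-first to show who is reached at each hand-off. The sketch below is illustrative only: the names NOTIFIES and notification_waves are hypothetical, and the role titles are transcribed from the list above.

```python
# Who notifies whom, transcribed from the list above.
NOTIFIES = {
    "Senior Azure Infrastructure Engineer": [
        "Technical Solutions Architect",
        "Azure Developer",
        "Cloud Infrastructure Engineer",
        "Head of Data Engineering",
        "Head of Data Analytics",
    ],
    "Technical Solutions Architect": [
        "Associate Director of Insight and Analytics",
        "Executive Director of Digital and Data",
    ],
    "Azure Developer": ["Application Consuming Organisations"],
    "Cloud Infrastructure Engineer": ["Integrated and Managed Service Organisations"],
    "Head of Data Engineering": ["Engineers"],
    "Head of Data Analytics": ["Analysts"],
}


def notification_waves(start):
    """Group recipients by the number of hand-offs from `start`.

    Breadth-first walk: wave 1 is notified directly by `start`,
    wave 2 by the members of wave 1, and so on.
    """
    waves = []
    seen = {start}
    frontier = [start]
    while frontier:
        next_frontier = []
        for sender in frontier:
            for recipient in NOTIFIES.get(sender, []):
                if recipient not in seen:
                    seen.add(recipient)
                    next_frontier.append(recipient)
        if next_frontier:
            waves.append(next_frontier)
        frontier = next_frontier
    return waves


if __name__ == "__main__":
    waves = notification_waves("Senior Azure Infrastructure Engineer")
    for number, wave in enumerate(waves, start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

Starting from the Senior Azure Infrastructure Engineer, every listed role is reached within two hand-offs.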
