Data Hub Data Anonymisation and Pseudonymisation Policy

Policy details

Prepared for: NHS Norfolk and Suffolk ICB
Status: Approved
Version: 4
Date: 1/04/2026

Document control details:

30/06/2021 – Version 0.1, Initial draft
28/07/2021 – Version 0.2, Amendments from Norfolk and Waveney CCG
9/11/2021 – Version 0.3, Review and word corrections
12/01/2022 – Version 1, Amendments and review of the draft
5/06/2024 – Version 2, Amendments to reflect change of host and governance arrangements
14/05/2025 – Version 3, Updates to the majority of sections and move to Data Hub policy templates. Created as version 3 due to the volume of changes.
8/01/2026 – Version 3.1, Revision to simplify the document and ensure alignment with both the NHSE DARS agreement and new Norfolk and Suffolk agreed principles.
20/01/2026 – Version 3.1, Minor changes to wording clarity and link to supporting materials
24/02/2026 – Current version, Revised to apply to Norfolk and Suffolk ICB, instead of Norfolk and Waveney ICB

Introduction

Purpose

The purpose of this policy is to define how pseudonymisation is applied within the Data Hub to support the lawful, secure and proportionate use of data for secondary (non-direct care) purposes.

The policy also sets out the Data Hub's "anonymisation first" requirement and the anonymisation techniques required for reporting outputs.

Specifically, this policy sets out:

how identifiable data is pseudonymised during ingestion into the Data Hub
the level and type of pseudonymisation applied to different data fields
how pseudonymised identifiers are used to support analytical access
when a standard system pseudonym is used and when a project-specific pseudonym may be required

This policy is intended to ensure that identifiable data is not made available for analytical use unless there is a clear and approved legal basis to do so, and that access to data within the Data Hub is restricted to the minimum level of identifiability necessary to achieve the approved purpose.

By applying pseudonymisation as a default control, the Data Hub reduces the risk of inappropriate access to person identifiable data and supports compliance with data protection legislation, including the UK GDPR and the Data Protection Act 2018.

For the avoidance of doubt, this policy applies to data used for secondary purposes, including but not limited to:

population health management
commissioning and contract monitoring
performance and service analysis
system planning and evaluation

Scope

This policy applies to the handling of data within the Data Hub environment, from ingestion through to analytical access.

It covers:

the transformation of identifiable data into pseudonymised data at ingestion
the pseudonymisation standards applied to specific data fields
the use of pseudonymised identifiers within analytical datasets
the application of a national (system) pseudonym and, where approved, project-specific pseudonyms
principles for onward sharing or external disclosure of data

This policy applies to all users accessing data through the Data Hub, including ICB staff, partner analysts, provider analysts, and technical users, where access is for analytical or secondary use purposes.

This policy does not cover:

licensing or sublicensing arrangements
legal bases for data processing
detailed disclosure control, anonymisation for publication, or small number suppression requirements, beyond the core principles for data outputs set out in this policy
use of data outside the Data Hub environment

These matters are addressed through separate governance documents, including but not limited to sublicensing agreements, Data Protection Impact Assessments, Section 251 approvals, and reporting or publication guidance.

Guidance on data presentation and appropriate anonymisation controls can be found within the Analyst Hub. Access to this guidance is limited to analysts who have been onboarded into the Data Hub environment.

This policy should be read alongside the Data Access Management Policy and other relevant Data Hub governance documentation. Where there is any conflict, the overarching governance and legal documentation will take precedence.

Pseudonymisation Approach and Principles

The Data Hub applies the principle of anonymisation first. However, to enable linkage of multiple datasets, the data held in the Data Hub analytical layer is mainly pseudonymised, and users have no access to the pseudonymisation key. Pseudonymisation is therefore applied as a standard control to support the use of data for secondary (non-direct care) purposes.

Within the Data Hub, pseudonymisation means replacing direct identifiers with artificial identifiers so that individuals cannot be readily identified by users of the data. Pseudonymised data remains personal data and continues to be protected in line with data protection legislation.

Pseudonymisation is embedded in the design of the Data Hub and is applied during data ingestion, before data is made available for analytical access. As a result, users accessing data within the Data Hub do not routinely have access to person identifiable data.

Pseudonymisation is used to:

reduce the risk of identification of individuals
enable safe linkage and longitudinal analysis across datasets
support lawful and proportionate secondary use of data
enforce consistent controls across all analytical access routes

Application of Pseudonymisation

The Data Hub applies pseudonymisation in a consistent and controlled manner. Direct identifiers are removed or transformed as part of ingestion or analytical access, and only pseudonymised data is made available for routine analytical use.

This ensures that analytical users cannot directly identify individuals or re-identify data without explicit approval and controlled processes.

Use of Identifiable Data

Identifiable data is not available within the analytical or reporting layers of the Data Hub. Identifiable data is held only in the secure, access-controlled landing area and in the MPI (master patient index), where it is used for data quality matching. All access to identifiable data is exceptional, time-limited and subject to enhanced controls.

Relationship to Other Policies

The overarching ICB Anonymisation and Pseudonymisation Policy sets out general principles for the organisation. This policy applies those principles specifically to data ingestion and analytical access within the Data Hub.

The Data Hub does not routinely create anonymised datasets as part of its core analytical services. Detailed requirements for anonymisation, disclosure control and publication of data are governed by separate policies and sit outside the scope of this document; this policy sets out only the core principles that apply to data outputs.

Data Hub Analytical Roles

For the purposes of this policy, the following analytical roles apply within the Data Hub:

System Users are users who access Data Hub data for system-wide, population-level, commissioning or analytical purposes and who do not have access to source clinical systems containing identifiable data. Examples include the ICB and Public Health intelligence teams.

Provider Users are users who access Data Hub data on behalf of provider organisations and who also have access, outside the Data Hub, to their organisation's source systems, including local identifiers.

Both roles access data for secondary use purposes through the Data Hub. Neither role has routine access to patient identifiable data within the Data Hub environment.

The distinction between these roles informs the treatment of local identifiers, as set out in the field-level pseudonymisation and data treatment section.

Pseudonymisation Models Used in the Data Hub

The Data Hub uses a controlled and standardised approach to pseudonymisation to support analytical use of data while minimising the risk of identification.

As a default, the Data Hub applies a single system pseudonym across datasets. This enables safe linkage and longitudinal analysis while ensuring that users do not have access to direct identifiers.

In limited and approved circumstances, a project-specific pseudonym may be used where a dataset must be analysed independently of the wider Data Hub.

System Pseudonym

The system pseudonym is the standard identifier used within the Data Hub.

It:

replaces shared identifiers, such as NHS Number
is applied consistently across datasets
enables linkage and analysis across the Data Hub
is used for all routine analytical access

The system pseudonym is applied during data ingestion and is the primary identifier available to users for secondary use analysis.

All users accessing analytical data within the Data Hub work with the same system pseudonym. Provider-specific pseudonym keys are not used. The only exception is independently managed project work, as defined in section 4.2 (Project-Specific Pseudonym).
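The linkage benefit of a single system pseudonym can be illustrated with a minimal sketch. It assumes keyed hashing (HMAC-SHA256) with one centrally held key. The key value, the function names `system_pseudonym` and `ingest`, and the example datasets are all hypothetical and do not describe the Data Hub's actual implementation.

```python
import hashlib
import hmac

# Hypothetical system key: in practice this would be held in a managed secret
# store, never in code, and never visible to analytical users.
SYSTEM_KEY = b"example-system-key-not-real"


def system_pseudonym(nhs_number: str, key: bytes = SYSTEM_KEY) -> str:
    """Return a deterministic keyed hash (HMAC-SHA256) of a normalised NHS Number."""
    normalised = nhs_number.replace(" ", "")  # "123 456 7890" -> "1234567890"
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def ingest(rows):
    """Replace the NHS Number with the system pseudonym; the clear value is dropped."""
    out = []
    for row in rows:
        clean = {k: v for k, v in row.items() if k != "nhs_number"}
        clean["system_pseudonym"] = system_pseudonym(row["nhs_number"])
        out.append(clean)
    return out


# Two hypothetical source datasets with dummy (not real) NHS Numbers,
# pseudonymised independently at ingestion.
ae = ingest([{"nhs_number": "123 456 7890", "attendance": "2025-03-02"}])
ip = ingest([{"nhs_number": "1234567890", "admission": "2025-03-03"}])

# Linkage works because the same person receives the same pseudonym in both datasets.
linked = [
    {**a, **i}
    for a in ae
    for i in ip
    if a["system_pseudonym"] == i["system_pseudonym"]
]
print(len(linked))  # 1
```

Analysts only ever see the `system_pseudonym` column, yet records for the same person still join across datasets.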

Project-Specific Pseudonym

In some circumstances, a dataset or subset of data may need to be analysed independently of the wider Data Hub. In these cases, a project-specific pseudonym may be implemented.

A project-specific pseudonym:

is used only where there is a clear and documented requirement
prevents linkage between the project dataset and other Data Hub datasets
does not change the underlying field-level treatment of the data
is applied consistently within the scope of the approved project

The use of a project-specific pseudonym is an exception to the standard model and must be formally approved through the appropriate governance process.
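The effect of a project-specific pseudonym can be sketched by hashing the same identifier under a separate, project-only key. This assumes keyed hashing (HMAC-SHA256). Both keys and the `pseudonym` helper are hypothetical, and the NHS Number shown is a dummy value.

```python
import hashlib
import hmac


def pseudonym(identifier: str, key: bytes) -> str:
    """Deterministic keyed hash (HMAC-SHA256) of a normalised identifier."""
    normalised = identifier.replace(" ", "")
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical keys: one system-wide key and one issued for a single approved project.
SYSTEM_KEY = b"example-system-key-not-real"
PROJECT_KEY = b"example-project-key-not-real"

nhs_number = "123 456 7890"  # dummy value, not a real NHS Number

system_id = pseudonym(nhs_number, SYSTEM_KEY)
project_id = pseudonym(nhs_number, PROJECT_KEY)

# Within the project the pseudonym is consistent, so project records still link together.
assert project_id == pseudonym("1234567890", PROJECT_KEY)

# Across the boundary the values differ, so project data cannot be joined to
# Data Hub datasets keyed on the system pseudonym.
print(system_id == project_id)  # False
```

Only the identifier column changes. The other fields would still follow the standard field-level treatment.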

Governance of Pseudonymisation Models

The choice of pseudonymisation model is determined centrally and is not configurable by individual users or projects.

Any deviation from the standard system pseudonym, including the introduction of a project-specific pseudonym, must be:

explicitly justified
documented as part of the approved use case
subject to appropriate information governance oversight

This ensures that pseudonymisation is applied consistently, transparently, and proportionately across the Data Hub.

Cryptographic Pseudonymisation Standard

Pseudonymisation within the Data Hub is implemented using one-way cryptographic hashing.

The pseudonymisation process:

uses a secure hashing algorithm appropriate for protecting personal data
applies centrally managed cryptographic salts
produces deterministic pseudonyms to support consistent linkage where required

Pseudonymised values cannot be reversed, or linked back to the original identifiers, without access to controlled cryptographic material and services, which are not accessible to analytical users.

Detailed implementation of cryptographic controls is managed as part of the Data Hub technical architecture.
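The following minimal sketch shows why a centrally managed secret matters. It uses HMAC-SHA256 as one common way to apply a secret value to a hash. This policy does not name the algorithm, so that choice, the helper names and the dummy identifiers are assumptions. Identifiers such as NHS Numbers come from a small, enumerable space, so an unkeyed hash can be reversed by trying candidates. A keyed hash cannot be recomputed without the key.

```python
import hashlib
import hmac
import secrets

# Hypothetical: a secret value generated and held centrally (for example in a key vault).
SECRET_KEY = secrets.token_bytes(32)


def unkeyed_hash(identifier: str) -> str:
    """Plain SHA-256: anyone can recompute this from a guessed identifier."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def keyed_pseudonym(identifier: str, key: bytes = SECRET_KEY) -> str:
    """HMAC-SHA256: recomputing it requires the secret key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def dictionary_attack(target: str, candidates, hash_fn):
    """Try each candidate identifier and return the one whose hash matches the target."""
    for candidate in candidates:
        if hash_fn(candidate) == target:
            return candidate
    return None


# Dummy 10-digit identifiers standing in for an enumerable identifier space.
candidates = [f"{n:010d}" for n in range(1000)]
target_id = "0000000427"

# Without a key, an attacker who can enumerate candidates recovers the identifier.
print(dictionary_attack(unkeyed_hash(target_id), candidates, unkeyed_hash))  # 0000000427

# Without the key, the attacker can only compute unkeyed hashes, which never match.
print(dictionary_attack(keyed_pseudonym(target_id), candidates, unkeyed_hash))  # None

# Determinism: the same input and key always give the same pseudonym, supporting linkage.
assert keyed_pseudonym(target_id) == keyed_pseudonym(target_id)
```

Anyone holding the key could run the same attack using `keyed_pseudonym`. For this reason, access to the key is what must be controlled.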

Field-Level Pseudonymisation and Data Treatment

The Data Hub applies a consistent and standardised approach to field-level pseudonymisation and transformation to support analytical use while minimising the risk of identification.

Field-level rules are defined centrally and applied uniformly across the Data Hub. The only standard variation between user groups relates to the treatment of local identifiers, reflecting different analytical requirements for system analysts and provider analysts.

Standard Field Treatment

Data Field	System User View	Provider User View
NHS Number	Replaced with system pseudonym	Replaced with system pseudonym
Local patient identifier (source system IDs)	Unchanged	Dynamically hashed (Snowflake dynamic hashing)
Name	Removed	Removed
Address	Removed	Removed
Postcode	Converted to LSOA	Converted to LSOA
Date of birth	Month and year only	Month and year only
Sex	Unchanged	Unchanged
Ethnicity	Unchanged	Unchanged
Organisation or provider code	Unchanged	Unchanged
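The table can be read as a per-role transformation of each record. The Python sketch below mimics that effect. Within the Data Hub, the provider-view hashing of local identifiers is applied dynamically in Snowflake at query time, so this is not Snowflake code. The keys, the postcode-to-LSOA lookup, the field names and the record values are all hypothetical placeholders.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical keys and lookup; real values are managed centrally, not in code.
SYSTEM_KEY = b"example-system-key-not-real"
LOCAL_ID_KEY = b"example-local-id-key-not-real"
POSTCODE_TO_LSOA = {"ZZ1 1ZZ": "LSOA-EXAMPLE-001"}  # placeholder mapping only

REMOVED_FIELDS = {"name", "address"}


def keyed_hash(value: str, key: bytes) -> str:
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def present_record(record: dict, role: str) -> dict:
    """Apply the standard field treatment for a 'system' or 'provider' user view."""
    if role not in {"system", "provider"}:
        raise ValueError(f"unknown role: {role}")
    out = {}
    for field, value in record.items():
        if field in REMOVED_FIELDS:
            continue  # Name and Address: removed for both views
        elif field == "nhs_number":
            out["system_pseudonym"] = keyed_hash(value.replace(" ", ""), SYSTEM_KEY)
        elif field == "local_patient_id":
            # Unchanged for System Users; hashed for Provider Users.
            out[field] = value if role == "system" else keyed_hash(value, LOCAL_ID_KEY)
        elif field == "postcode":
            out["lsoa"] = POSTCODE_TO_LSOA.get(value.upper().strip())
        elif field == "date_of_birth":
            out["birth_month"] = value.strftime("%Y-%m")  # month and year only
        else:
            out[field] = value  # Sex, Ethnicity, Organisation code: unchanged
    return out


raw = {
    "nhs_number": "123 456 7890",
    "local_patient_id": "MRN000123",
    "name": "Example Person",
    "address": "1 Example Street",
    "postcode": "ZZ1 1ZZ",
    "date_of_birth": date(1980, 6, 15),
    "sex": "F",
    "ethnicity": "A",
    "org_code": "XYZ01",
}

print(present_record(raw, "system")["local_patient_id"])  # MRN000123
print(present_record(raw, "provider")["birth_month"])  # 1980-06
```

Both views carry the same `system_pseudonym`. Only the presentation of `local_patient_id` differs, which matches the table.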

Local identifiers are dynamically hashed for provider analysts to prevent the use of clear local identifiers as direct keys that could enable re-identification when combined with access to source clinical systems. Hashing allows users to continue to link local identifiers, such as pathway IDs, to support analysis.

This approach also aligns with the explicit non-sharing of local identifiers set out in the ICB Data Access Request Service (DARS) agreement associated with sublicensing arrangements.
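Hashed local identifiers can still support linkage because the hash is deterministic. Equal pathway IDs map to equal hashed values, but the hashed value cannot be typed into a source system to look up a patient. The sketch below assumes HMAC-SHA256 with a hypothetical key and uses dummy pathway IDs. It does not reproduce the Snowflake implementation.

```python
import hashlib
import hmac

# Hypothetical key used for the provider view of local identifiers.
LOCAL_ID_KEY = b"example-local-id-key-not-real"


def hash_local_id(local_id: str) -> str:
    """Deterministic keyed hash, so equal local IDs map to equal hashed values."""
    return hmac.new(LOCAL_ID_KEY, local_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Dummy provider-view rows from two tables that share a pathway ID.
referrals = [{"pathway_id": hash_local_id("PW-0001"), "referred": "2025-01-10"}]
appointments = [
    {"pathway_id": hash_local_id("PW-0001"), "seen": "2025-02-01"},
    {"pathway_id": hash_local_id("PW-0002"), "seen": "2025-02-03"},
]

# Index referrals by hashed pathway ID and join matching appointments to them.
by_pathway = {r["pathway_id"]: r for r in referrals}
joined = [
    {**by_pathway[a["pathway_id"]], **a}
    for a in appointments
    if a["pathway_id"] in by_pathway
]

print(len(joined))  # 1
```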

Consistency and Exceptions

All fields other than local identifiers are treated consistently for system analysts and provider analysts. The Data Hub does not operate separate provider-only datasets as standard; differences relate solely to how local identifiers are presented.

Where a project-specific pseudonym is implemented, this affects only the pseudonymised identifier used for linkage within that project. It does not alter the treatment of other data fields.

Changes to Field Treatment

Any change to the standard field-level treatment, including changes to local identifier handling, must be formally approved and documented through the appropriate governance process.

Anonymisation and Suppression rules for data outputs

The Data Hub reporting principles follow an "anonymisation first" approach.

All data outputs generated from the Data Hub must follow a clear hierarchy of disclosure control, applying the least identifiable form of data necessary to meet the agreed purpose. As a default position, analysts must assume that outputs will be anonymised, and only progress to pseudonymised or identifiable data where this is demonstrably required, justified within an approved use case, and supported by an appropriate legal basis.

Analysts must always assess whether the intended purpose can be met using anonymised data. This includes the use of aggregation, suppression of small numbers (5 or less), banding of ages, derivation of geography to LSOA or higher levels, and removal of all direct identifiers. Where anonymised outputs are sufficient, no patient-level or pseudonymised data should be shared or released. Outputs intended for broad circulation, publication or routine reporting must always be anonymised.
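Age banding and small number suppression can be combined in a simple aggregation step. The sketch below applies the threshold stated in this policy (counts of 5 or less are suppressed). The 10-year band width, the `*` marker, the helper names and the dummy ages are assumptions. The sketch does not handle secondary suppression, for example where a published total would allow a suppressed cell to be derived.

```python
from collections import Counter

SUPPRESSION_THRESHOLD = 5  # counts of 5 or less are suppressed, per this policy


def age_band(age: int, width: int = 10) -> str:
    """Return a banded age label, e.g. 37 -> '30-39' (band width is an assumption)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def suppress(counts: dict) -> dict:
    """Replace any count of 5 or less with a suppression marker."""
    return {k: (v if v > SUPPRESSION_THRESHOLD else "*") for k, v in counts.items()}


# Dummy ages standing in for a patient-level extract.
ages = [23, 25, 27, 31, 33, 34, 36, 38, 39, 41, 64]
counts = Counter(age_band(a) for a in ages)

print(suppress(dict(counts)))
# {'20-29': '*', '30-39': 6, '40-49': '*', '60-69': '*'}
```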

Where anonymised data is not sufficient to meet the purpose, for example where patient level analysis, cohort tracking, or longitudinal analysis is required, pseudonymised data may be used. In these circumstances, analysts must ensure that all direct identifiers have been removed and replaced in line with the pseudonymisation standards set out in this policy. Pseudonymised data must only be shared with approved recipients, for a defined purpose, and under a specific use case. Pseudonymised outputs must never be assumed to be anonymous and must be handled as personal data.

The use of identifiable data is the exception, not the norm. Identifiable outputs may only be generated where there is a clear legal basis, such as direct care or an approved Section 251 application, and where this has been explicitly approved through the Data Hub use case and governance process. Analysts must not create or share identifiable outputs unless re-identification has been formally authorised and is undertaken through the approved technical processes, which are governed separately from this policy.

At each stage, analysts are responsible for applying appropriate disclosure control checks before any data is shared. This includes reviewing outputs for small numbers, potential inference risks, and combinations of fields that could lead to re-identification, particularly in small populations or rural geographies. If there is any uncertainty about whether an output is sufficiently anonymised or appropriately pseudonymised, the analyst must escalate for advice before release.
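One way to review combinations of fields is to count how many records share each combination of indirect identifiers and flag small groups. This is a simple group-size check in the spirit of k-anonymity. The sketch assumes the same threshold as the small number rule. The field names, helper name and dummy rows are hypothetical. A check like this supports, but does not replace, the analyst's judgement on inference risk.

```python
from collections import Counter

SMALL_GROUP = 5  # assumed threshold, matching the small number rule in this policy


def risky_combinations(rows, quasi_identifiers):
    """Return combinations of the given fields shared by 5 or fewer rows."""
    groups = Counter(tuple(row[f] for f in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in groups.items() if n <= SMALL_GROUP}


# Dummy output rows: eight people share one profile, one person is unique.
rows = [{"lsoa": "LSOA-A", "age_band": "30-39", "sex": "F"}] * 8 + [
    {"lsoa": "LSOA-B", "age_band": "80-89", "sex": "M"}
]

print(risky_combinations(rows, ["lsoa", "age_band", "sex"]))
# {('LSOA-B', '80-89', 'M'): 1}
```

The flagged combination would need further aggregation (for example a wider geography or age band) or escalation before release.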

This staged approach ensures that data is always processed and shared in a lawful, proportionate, and secure manner, protecting patient confidentiality while enabling effective secondary use of data across the system.

Access, Controls and Governance

Pseudonymisation within the Data Hub is enforced by design and aligned to the Data Hub Access Management Policy.

Users do not select or configure the level of pseudonymisation applied to data. The presentation of data fields and identifiers is determined centrally, based on the approved access role and the standards set out in this policy.

Alignment with Access Controls

Access to data within the Data Hub is granted only through approved access routes and roles. Pseudonymisation is applied before data is made available for analytical access and cannot be bypassed through standard user permissions.

System Users and Provider Users are presented with data in accordance with the field-level rules defined in Section 5 (Field-Level Pseudonymisation and Data Treatment). No user group has routine access to clear patient identifiers through the Data Hub.

Governance of Exceptions

Any deviation from the standard pseudonymisation approach, including the use of a project-specific pseudonym, must be:

explicitly justified as part of an approved use case
documented and approved through the appropriate governance process
limited to the minimum scope and duration necessary

Such exceptions do not permit access to additional identifiable fields and do not override the standard field-level treatment defined in this policy.

Monitoring and Assurance

Use of pseudonymised data within the Data Hub is subject to monitoring and audit. Access to data and use of pseudonymised identifiers may be reviewed to ensure compliance with this policy and with wider information governance requirements.

Any concerns relating to the application or use of pseudonymisation within the Data Hub will be managed in line with established information governance and incident management processes.

This policy governs the application of pseudonymisation for analytical access within the Data Hub and does not provide a mechanism for user-led re-identification, which is governed separately.
