Policy details
Prepared for: NHS Norfolk and Suffolk ICB
Status: Approved
Version: 4
Date: 1/04/2026
Document control details:
- 30/06/2021 – Version 0.1, Initial draft
- 28/07/2021 – Version 0.2, Amendments from Norfolk and Waveney CCG
- 9/11/2021 – Version 0.3, Review and word corrections
- 12/01/2022 – Version 1, Amendments and review of the draft
- 5/06/2024 – Version 2, Amendments to reflect change of host and governance arrangements
- 14/05/2025 – Version 3, Various updates to the majority of sections and move to Data Hub policy templates. Created as version 3 due to volume of changes.
- 8/01/2026 – Version 3.1, Revision to simplify the document and ensure alignment with both the NHSE DARS agreement and new Norfolk and Suffolk agreed principles.
- 20/01/2026 – Version 3.1, Minor changes to wording clarity and link to supporting materials
- 24/02/2026 – Current version, Revised to apply to Norfolk and Suffolk ICB, instead of Norfolk and Waveney ICB
Introduction
Purpose
The purpose of this policy is to define how pseudonymisation is applied within the Data Hub to support the lawful, secure and proportionate use of data for secondary (non-direct care) purposes.
The policy also sets out the Data Hub requirement to anonymise first and the anonymisation techniques required for reporting outputs.
Specifically, this policy sets out:
- how identifiable data is pseudonymised during ingestion into the Data Hub
- the level and type of pseudonymisation applied to different data fields
- how pseudonymised identifiers are used to support analytical access
- when a standard system pseudonym is used and when a project-specific pseudonym may be required
This policy is intended to ensure that identifiable data is not made available for analytical use unless there is a clear and approved legal basis to do so, and that access to data within the Data Hub is restricted to the minimum level of identifiability necessary to achieve the approved purpose.
By applying pseudonymisation as a default control, the Data Hub reduces the risk of inappropriate access to person identifiable data and supports compliance with data protection legislation, including the UK GDPR and the Data Protection Act 2018.
For the avoidance of doubt, this policy applies to data used for secondary purposes, including but not limited to:
- population health management
- commissioning and contract monitoring
- performance and service analysis
- system planning and evaluation
Scope
This policy applies to the handling of data within the Data Hub environment, from ingestion through to analytical access.
It covers:
- the transformation of identifiable data into pseudonymised data at ingestion
- the pseudonymisation standards applied to specific data fields
- the use of pseudonymised identifiers within analytical datasets
- the application of a national (system) pseudonym and, where approved, project-specific pseudonyms
- principles for onward sharing or external disclosure of data
This policy applies to all users accessing data through the Data Hub, including ICB staff, partner analysts, provider analysts, and technical users, where access is for analytical or secondary use purposes.
This policy does not cover:
- licensing or sublicensing arrangements
- legal bases for data processing
- disclosure control, anonymisation for publication, or small number suppression
- use of data outside the Data Hub environment
These matters are addressed through separate governance documents, including but not limited to sublicensing agreements, Data Protection Impact Assessments, Section 251 approvals, and reporting or publication guidance.
Guidance on data presentation and appropriate anonymisation controls can be found within the Analyst Hub. Access to this guidance is limited to analysts who have been onboarded into the Data Hub environment.
This policy should be read alongside the Data Access Management Policy and other relevant Data Hub governance documentation. Where there is any conflict, the overarching governance and legal documentation will take precedence.
Pseudonymisation Approach and Principles
The Data Hub applies the principle of anonymisation first. However, to enable linkage of multiple datasets, the data held in the Data Hub analytical layer is mainly pseudonymised, and users have no access to the pseudonymisation key. The Data Hub applies pseudonymisation as a standard control to support the use of data for secondary (non-direct care) purposes.
Within the Data Hub, pseudonymisation means replacing direct identifiers with artificial identifiers so that individuals cannot be readily identified by users of the data. Pseudonymised data remains personal data and continues to be protected in line with data protection legislation.
Pseudonymisation is embedded in the design of the Data Hub and is applied during data ingestion, before data is made available for analytical access. As a result, users accessing data within the Data Hub do not routinely have access to person identifiable data.
Pseudonymisation is used to:
- reduce the risk of identification of individuals
- enable safe linkage and longitudinal analysis across datasets
- support lawful and proportionate secondary use of data
- enforce consistent controls across all analytical access routes
Application of Pseudonymisation
The Data Hub applies pseudonymisation in a consistent and controlled manner. Direct identifiers are removed or transformed as part of ingestion or analytical access, and only pseudonymised data is made available for routine analytical use.
This ensures that analytical users cannot directly identify individuals or re-identify data without explicit approval and controlled processes.
Use of Identifiable Data
Identifiable data is not available within the analytical or reporting layers of the Data Hub. Identifiable data is held only in the secure, access-controlled landing area and within the MPI for data quality matching. All access to this data is exceptional, time-limited, and subject to enhanced controls.
Relationship to Other Policies
The overarching ICB Anonymisation and Pseudonymisation Policy sets out general principles for the organisation. This policy applies those principles specifically to data ingestion and analytical access within the Data Hub.
The Data Hub does not routinely create anonymised datasets as part of its core analytical services. Anonymisation, disclosure control, and publication of data are governed by separate policies and sit outside the scope of this document.
Data Hub Analytical Roles
For the purposes of this policy, the following analytical roles apply within the Data Hub:
- System Users are users who access Data Hub data for system-wide, population-level, commissioning or analytical purposes and who do not have access to source clinical systems containing identifiable data. Examples include the ICB or Public Health Intelligence Teams.
- Provider Users are users who access Data Hub data on behalf of provider organisations and who also have access, outside of the Data Hub, to their organisation’s source systems, including local IDs.
Both roles access data for secondary use purposes through the Data Hub. Neither role has routine access to patient identifiable data within the Data Hub environment.
The distinction between these roles informs the treatment of local identifiers, as set out in the field-level pseudonymisation and data treatment section.
Pseudonymisation Models Used in the Data Hub
The Data Hub uses a controlled and standardised approach to pseudonymisation to support analytical use of data while minimising the risk of identification.
As a default, the Data Hub applies a single system pseudonym across datasets. This enables safe linkage and longitudinal analysis while ensuring that users do not have access to direct identifiers.
In limited and approved circumstances, a project-specific pseudonym may be used where a dataset must be analysed independently of the wider Data Hub.
System Pseudonym
The system pseudonym is the standard identifier used within the Data Hub.
It:
- replaces shared identifiers, such as NHS Number
- is applied consistently across datasets
- enables linkage and analysis across the Data Hub
- is used for all routine analytical access
The system pseudonym is applied during data ingestion and is the primary identifier available to users for secondary use analysis.
All users accessing analytical data within the Data Hub work with the same system pseudonym. Provider-specific pseudonym keys are not used. The only exception is independently managed project work, as defined in section 4.2 (Project-Specific Pseudonym).
Project-Specific Pseudonym
In some circumstances, a dataset or subset of data may need to be analysed independently of the wider Data Hub. In these cases, a project-specific pseudonym may be implemented.
A project-specific pseudonym:
- is used only where there is a clear and documented requirement
- prevents linkage between the project dataset and other Data Hub datasets
- does not change the underlying field-level treatment of the data
- is applied consistently within the scope of the approved project
The use of a project-specific pseudonym is an exception to the standard model and must be formally approved through the appropriate governance process.
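The linkage-prevention property of a project-specific pseudonym can be illustrated with a short sketch. It assumes a keyed hash (HMAC-SHA256) and two hypothetical keys, `SYSTEM_KEY` and `PROJECT_KEY`; the policy does not specify how project-specific pseudonyms are generated, so this shows the idea only and is not the Data Hub implementation.

```python
import hashlib
import hmac


def keyed_pseudonym(identifier: str, key: bytes) -> str:
    """Return a deterministic one-way pseudonym under a given secret key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical keys for illustration only: one for the standard system
# pseudonym and one issued for a single approved project.
SYSTEM_KEY = b"system-key-example"
PROJECT_KEY = b"project-key-example"

identifier = "9434765919"  # made-up example value, not a real patient

system_pseudonym = keyed_pseudonym(identifier, SYSTEM_KEY)
project_pseudonym = keyed_pseudonym(identifier, PROJECT_KEY)

# The same person receives different pseudonyms under the two keys, so a
# project extract cannot be joined to other datasets on the pseudonym.
linkable_to_hub = system_pseudonym == project_pseudonym

# Within the project the pseudonym is stable, so records still link.
stable_within_project = project_pseudonym == keyed_pseudonym(identifier, PROJECT_KEY)
```

In this sketch only the key changes between the two models, which mirrors the statement that a project-specific pseudonym does not change the underlying field-level treatment of the data.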
Governance of Pseudonymisation Models
The choice of pseudonymisation model is determined centrally and is not configurable by individual users or projects.
Any deviation from the standard system pseudonym, including the introduction of a project-specific pseudonym, must be:
- explicitly justified
- documented as part of the approved use case
- subject to appropriate information governance oversight
This ensures that pseudonymisation is applied consistently, transparently, and proportionately across the Data Hub.
Cryptographic Pseudonymisation Standard
Pseudonymisation within the Data Hub is implemented using one-way cryptographic hashing.
The pseudonymisation process:
- uses a secure hashing algorithm appropriate for protecting personal data
- applies centrally managed cryptographic salts
- produces deterministic pseudonyms to support consistent linkage where required
Pseudonymised values cannot be reversed without access to controlled cryptographic material and services, which are not accessible to analytical users.
Detailed implementation of cryptographic controls is managed as part of the Data Hub technical architecture.
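As an illustration only, the three properties listed above (one-way hashing, a centrally managed salt, and deterministic output) can be sketched with a keyed hash. The sketch assumes HMAC-SHA256 and a hypothetical `SYSTEM_SALT` value. The policy does not name the algorithm or describe how salts are stored, and the function name `pseudonymise` is invented for the example.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; in practice the salt would be
# centrally managed and never visible to analytical users.
SYSTEM_SALT = b"example-salt-not-for-production"


def pseudonymise(identifier: str, salt: bytes = SYSTEM_SALT) -> str:
    """Return a deterministic, one-way pseudonym for an identifier."""
    # Remove spaces so differently formatted copies of the same
    # identifier produce the same pseudonym.
    normalised = identifier.replace(" ", "").strip()
    digest = hmac.new(salt, normalised.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


nhs_number = "9434765919"  # made-up example value, not a real patient
pseudonym_a = pseudonymise(nhs_number)
pseudonym_b = pseudonymise("943 476 5919")
```

Because the output depends on the salt, a user who holds only pseudonymised values cannot rebuild the mapping by hashing candidate identifiers. This is why access to the cryptographic material is restricted.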
Field-Level Pseudonymisation and Data Treatment
The Data Hub applies a consistent and standardised approach to field-level pseudonymisation and transformation to support analytical use while minimising the risk of identification.
Field-level rules are defined centrally and applied uniformly across the Data Hub. The only standard variation between user groups relates to the treatment of local identifiers, reflecting different analytical requirements for system analysts and provider analysts.
Standard Field Treatment
| Data Field | System User View | Provider User View |
|---|---|---|
| NHS Number | Replaced with system pseudonym | Replaced with system pseudonym |
| Local patient identifier (source system IDs) | Unchanged | Dynamically hashed (Snowflake dynamic hashing) |
| Name | Removed | Removed |
| Address | Removed | Removed |
| Postcode | Converted to LSOA | Converted to LSOA |
| Date of birth | Month and year only | Month and year only |
| Sex | Unchanged | Unchanged |
| Ethnicity | Unchanged | Unchanged |
| Organisation or provider code | Unchanged | Unchanged |
Local identifiers are dynamically hashed for provider analysts to prevent the use of clear local identifiers as direct keys that could enable re-identification when combined with access to source clinical systems. Hashing allows users to continue to link local identifiers, such as pathway IDs, to support analysis.
This approach also aligns with the explicit non-sharing of local identifiers set out in the ICB Data Access Request Service (DARS) agreement associated with sublicensing arrangements.
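The standard field treatment table above can be read as a single transformation with one role-dependent branch. The following is a minimal sketch of that reading. All names (`treat_record`, `SYSTEM_KEY`, `LOCAL_ID_KEY`, `POSTCODE_TO_LSOA`) are hypothetical, the postcode lookup is a made-up one-row stand-in, and HMAC-SHA256 is assumed in place of the Snowflake dynamic hashing named in the table, which this Python code only imitates.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical values for illustration only.
SYSTEM_KEY = b"system-key-example"
LOCAL_ID_KEY = b"local-id-key-example"
POSTCODE_TO_LSOA = {"NR1 1AA": "E01000001"}  # made-up mapping


def _hash(value: str, key: bytes) -> str:
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def treat_record(record: dict, role: str) -> dict:
    """Apply the standard field treatment for a 'system' or 'provider' user."""
    if role not in ("system", "provider"):
        raise ValueError(f"unknown role: {role}")
    dob: date = record["date_of_birth"]
    local_id = record["local_patient_id"]
    return {
        "system_pseudonym": _hash(record["nhs_number"], SYSTEM_KEY),
        # The only role-dependent rule: provider users see a hashed local ID.
        "local_patient_id": _hash(local_id, LOCAL_ID_KEY) if role == "provider" else local_id,
        "lsoa": POSTCODE_TO_LSOA.get(record["postcode"]),
        "birth_year_month": f"{dob.year:04d}-{dob.month:02d}",
        "sex": record["sex"],
        "ethnicity": record["ethnicity"],
        "organisation_code": record["organisation_code"],
        # Name and address are deliberately not copied into the output.
    }


record = {
    "nhs_number": "9434765919",  # made-up example value
    "local_patient_id": "LOC-001",
    "name": "Example Person",
    "address": "1 Example Street",
    "postcode": "NR1 1AA",
    "date_of_birth": date(1980, 7, 15),
    "sex": "F",
    "ethnicity": "A",
    "organisation_code": "XYZ",
}

system_view = treat_record(record, "system")
provider_view = treat_record(record, "provider")
```

Both views share the same system pseudonym, so linkage across datasets is unaffected by the role. Only the presentation of the local identifier differs.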
Consistency and Exceptions
All fields other than local identifiers are treated consistently for system analysts and provider analysts. The Data Hub does not operate separate provider-only datasets as standard; differences relate solely to how local identifiers are presented.
Where a project-specific pseudonym is implemented, this affects only the pseudonymised identifier used for linkage within that project. It does not alter the treatment of other data fields.
Changes to Field Treatment
Any change to the standard field-level treatment, including changes to local identifier handling, must be formally approved and documented through the appropriate governance process.
Anonymisation and Suppression Rules for Data Outputs
Data Hub reporting follows the principle of anonymisation first.
All data outputs generated from the Data Hub must follow a clear hierarchy of disclosure control, applying the least identifiable form of data necessary to meet the agreed purpose. As a default position, analysts must assume that outputs will be anonymised, and only progress to pseudonymised or identifiable data where this is demonstrably required, justified within an approved use case, and supported by an appropriate legal basis.
Analysts must always assess whether the intended purpose can be met using anonymised data. This includes the use of aggregation, suppression of small numbers (5 or fewer), banding of ages, derivation of geography to LSOA or higher levels, and removal of all direct identifiers. Where anonymised outputs are sufficient, no patient-level or pseudonymised data should be shared or released. Outputs intended for broad circulation, publication, or routine reporting must always be anonymised.
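Two of the controls named above, small number suppression and age banding, can be sketched as follows. This is an illustrative sketch under stated assumptions, not approved Data Hub tooling: the function names are hypothetical, the `"*"` suppression marker, ten-year band width and `90+` top band are assumptions, and the sketch suppresses every count of 5 or fewer including zero, which the text above does not specify.

```python
from typing import Dict, Union

SUPPRESSION_THRESHOLD = 5  # counts of 5 or fewer are suppressed


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = SUPPRESSION_THRESHOLD
) -> Dict[str, Union[int, str]]:
    """Replace any count at or below the threshold with a marker."""
    return {key: ("*" if n <= threshold else n) for key, n in counts.items()}


def age_band(age: int, width: int = 10, top: int = 90) -> str:
    """Map an age in years to a band such as '30-39', with an open top band."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


counts_by_lsoa = {"LSOA A": 3, "LSOA B": 12, "LSOA C": 5}
safe_counts = suppress_small_counts(counts_by_lsoa)
bands = [age_band(a) for a in (4, 34, 89, 95)]
```

The sketch covers primary suppression only. It does not address cases where a suppressed value could be derived from row or column totals, which still need the disclosure control checks described in this section.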
Where anonymised data is not sufficient to meet the purpose, for example where patient level analysis, cohort tracking, or longitudinal analysis is required, pseudonymised data may be used. In these circumstances, analysts must ensure that all direct identifiers have been removed and replaced in line with the pseudonymisation standards set out in this policy. Pseudonymised data must only be shared with approved recipients, for a defined purpose, and under a specific use case. Pseudonymised outputs must never be assumed to be anonymous and must be handled as personal data.
The use of identifiable data is the exception and not the norm. Identifiable outputs may only be generated where there is a clear legal basis, such as direct care or an approved Section 251 exemption, and where this has been explicitly approved through the Data Hub use case and governance process. Analysts must not create or share identifiable outputs unless re-identification has been formally authorised and is undertaken through the approved technical processes, which are governed separately from this policy.
At each stage, analysts are responsible for applying appropriate disclosure control checks before any data is shared. This includes reviewing outputs for small numbers, potential inference risks, and combinations of fields that could lead to re-identification, particularly in small populations or rural geographies. If there is any uncertainty about whether an output is sufficiently anonymised or appropriately pseudonymised, the analyst must escalate for advice before release.
This staged approach ensures that data is always processed and shared in a lawful, proportionate, and secure manner, protecting patient confidentiality while enabling effective secondary use of data across the system.
Access, Controls and Governance
Pseudonymisation within the Data Hub is enforced by design and aligned to the Data Hub Access Management Policy.
Users do not select or configure the level of pseudonymisation applied to data. The presentation of data fields and identifiers is determined centrally, based on the approved access role and the standards set out in this policy.
Alignment with Access Controls
Access to data within the Data Hub is granted only through approved access routes and roles. Pseudonymisation is applied before data is made available for analytical access and cannot be bypassed through standard user permissions.
System Users and Provider Users are presented with data in accordance with the field-level rules defined in Section 5. No user group has routine access to clear patient identifiers through the Data Hub.
Governance of Exceptions
Any deviation from the standard pseudonymisation approach, including the use of a project-specific pseudonym, must be:
- explicitly justified as part of an approved use case
- documented and approved through the appropriate governance process
- limited to the minimum scope and duration necessary
Such exceptions do not permit access to additional identifiable fields and do not override the standard field-level treatment defined in this policy.
Monitoring and Assurance
Use of pseudonymised data within the Data Hub is subject to monitoring and audit. Access to data and use of pseudonymised identifiers may be reviewed to ensure compliance with this policy and with wider information governance requirements.
Any concerns relating to the application or use of pseudonymisation within the Data Hub will be managed in line with established information governance and incident management processes.
This policy governs the application of pseudonymisation for analytical access within the Data Hub and does not provide a mechanism for user-led re-identification, which is governed separately.