A Guide to Data Linkage

A Practical Guide to Trauma & Orthopaedics Database Linkage in the U.K. 

(The body of this text has been published as an editorial in the Bone & Joint Research Journal (BJR). You can read the full article here)


Trauma and Orthopaedics in the U.K. has long been established as a leader in the collection and analysis of high-quality patient data for the purposes of quality improvement and research. The scale and scope of registry data across several patient groups and procedures is the envy of colleagues around the world. The ability to link these data to other routinely held clinical information (for example, patient health records held within the Hospital Episode Statistics data warehouse) allows us to enrich them with important information that is not present in the original database and would otherwise be a significant burden to collect for every patient.

The main strength of mandatory national registries is the data completeness (or coverage) of the study population. The drawback, however, is that regular changing of the collected data (or minimum dataset) makes any analysis cumbersome and becomes a burden to those submitting data. As a result, if variables are not collected in the registry, we must look to other sources to supplement it when needed for a study. An example of this is if a researcher wants to investigate mortality outcomes of patients undergoing joint replacements (included within the National Joint Registry (NJR)) whilst adjusting for specific co-morbidities (not included within the NJR).

Figure 1 – Example of data linkage


Linkage Pathways

Requests to link datasets to routinely collected information (which is typically held by a national central organisation e.g. NHS Digital) are classically done through a formal application process. This is through the Data Access Request Service (DARS) in England and Wales, the Electronic Data Research and Innovation Service (eDRIS) in Scotland, and the Northern Ireland Statistics and Research Agency (NISRA) in Northern Ireland. Links to these services are detailed at the bottom of the page.

Specific registries may have pre-existing established linkage to routinely collected data (for example the National Joint Registry [NJR] and Hospital Episode Statistics [HES]), which means that applications can be made direct to the registry without having to go through the centralised service. Any application to link registry data to routinely collected healthcare data would typically be undertaken in combination with members of the registry committee or research team (such as the NJR research sub-committee) who will have specific expertise and understanding as to the vagaries of their particular dataset that are important for the linkage process.

As an example, there must be a common feature of both datasets (such as NHS number) that allows us to ensure the data in each source relate to the same patient. People experienced in linking the individual datasets will understand how this is best achieved.
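To make the idea of a common identifier concrete, the following is a minimal sketch of exact-match (deterministic) linkage on NHS number, using only the Python standard library. It is an illustration under stated assumptions rather than the method used by any registry or national data provider: the field names (`nhs_number`, `procedure`, `comorbidity`) and the records are hypothetical, and the identifiers are dummy values chosen to pass or fail the NHS number check digit (Modulus 11) test.

```python
def normalise_nhs_number(raw):
    """Keep digits only, so '943 476 5919' and '9434765919' compare equal."""
    return "".join(ch for ch in raw if ch.isdigit())


def is_valid_nhs_number(number):
    """Modulus 11 check-digit test for a 10-digit NHS number."""
    if len(number) != 10 or not number.isdigit():
        return False
    # Weights 10 down to 2 are applied to the first nine digits.
    total = sum(int(d) * w for d, w in zip(number[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number is invalid.
    return check != 10 and check == int(number[9])


def link_records(registry_rows, hospital_rows, key="nhs_number"):
    """Exact-match linkage: attach hospital rows to registry rows by identifier."""
    index = {}
    for row in hospital_rows:
        nhs = normalise_nhs_number(row[key])
        if is_valid_nhs_number(nhs):
            index.setdefault(nhs, []).append(row)

    linked, unmatched = [], []
    for row in registry_rows:
        nhs = normalise_nhs_number(row[key])
        matches = index.get(nhs, []) if is_valid_nhs_number(nhs) else []
        if matches:
            for match in matches:
                linked.append({**row, **match, key: nhs})
        else:
            unmatched.append(row)
    return linked, unmatched


# Synthetic records for illustration only (not real patients).
registry = [
    {"nhs_number": "943 476 5919", "procedure": "primary hip replacement"},
    {"nhs_number": "4010232137", "procedure": "primary knee replacement"},
    {"nhs_number": "1234567890", "procedure": "primary knee replacement"},  # fails check digit
]
hospital = [
    {"nhs_number": "9434765919", "comorbidity": "diabetes"},
    {"nhs_number": "401-023-2137", "comorbidity": "COPD"},
]

linked, unmatched = link_records(registry, hospital)
print(len(linked), "linked;", len(unmatched), "unmatched")
```

In this toy example two registry records link and one is left unmatched because its identifier fails validation. Reporting the unmatched records matters in practice, since patients who fail to link may differ systematically from those who do.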


For any application to link data there are some key components of the General Data Protection Regulation [GDPR] and Data Protection Act 2018 [DPA] (which govern appropriate use of patient data) to be aware of. 

Firstly, there are the roles of the Data Controller (the organisation(s)/person(s) with ultimate control of, and responsibility for, the data in question) and the Data Processor (the organisation(s)/person(s) performing data processing [for example obtaining, recording, or holding data] on behalf of the Data Controller). Both are integral to any data application and will need to be clearly defined as part of the application process.

Secondly, there is the lawful basis for data processing. This is covered by Article 6(1) of the GDPR, which sets out six potential lawful bases for processing. Most healthcare-related linkage applications from NHS or academic institutions will be covered under Article 6(1)(e): performance of a task carried out in the public interest or in the exercise of official authority vested in the controller. Healthcare data are also considered “special category data”, which requires additional justification under Article 9 of the GDPR. In most cases this will be covered under Article 9(2)(j), where “processing is necessary for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes”, or Article 9(2)(h), where “processing is necessary for the purposes of preventive or occupational medicine, for the assessment of the working capacity of the employee, medical diagnosis, the provision of health or social care or treatment or the management of health or social care systems and services on the basis of Union or Member State law or pursuant to contract with a health professional and subject to the conditions and safeguards referred to in paragraph 3”.

Confidentiality and Consent

Another consideration is the common law duty of confidentiality. This governs the requirements for appropriate informed consent for the use of potentially identifiable confidential information. Because most datasets use “pseudonymised” data (where fictional identifiers are used to categorise individuals and link data together), the common law duty of confidentiality applies. Where data are truly anonymised, consent is not required. For national datasets, the legal obligations specified in the common law duty of confidentiality (the requirement for informed consent) can be set aside through application to the Confidentiality Advisory Group (CAG) to allow application of Section 251 of the NHS Act 2006. A similar process exists in Scotland through application to the Public Benefit and Privacy Panel (PBPP) for Health and Social Care. Local and regional data applications fall under the local Caldicott Guardianship governance structure.
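As an illustration of what replacing a real identifier with a fictional one can look like, the sketch below derives a pseudonym from an identifier with a keyed hash (HMAC-SHA256) from the Python standard library. This is one common technique and is offered as an assumption: it is not a description of how any U.K. data provider pseudonymises its data, and the key and identifiers shown are hypothetical dummy values.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Return a stable pseudonym for an identifier using a keyed hash.

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the original identifier cannot be read back
    from the pseudonym without the key.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical project-specific key; in practice it would be held securely
# by the organisation performing the linkage, not stored alongside the data.
project_key = b"example-project-key"

# Dummy identifiers for illustration only.
pseudonym_a = pseudonymise("9434765919", project_key)
pseudonym_a_again = pseudonymise("9434765919", project_key)
pseudonym_b = pseudonymise("4010232137", project_key)

print(pseudonym_a == pseudonym_a_again)  # same patient, same pseudonym
print(pseudonym_a == pseudonym_b)        # different patients, different pseudonyms
```

Because anyone holding the key can regenerate the pseudonym from a known identifier, data treated in this way remain potentially identifiable, which is why pseudonymised data are handled differently from truly anonymised data.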

Ethical Approvals

Ethical approval and Health Research Authority (HRA) approval are separate considerations and depend on the nature of the study. Applications are typically completed through the Integrated Research Application System (IRAS). Decision tools regarding the requirement for ethical and HRA approval are available on the HRA website. Evidence of appropriate Information Governance training is also required, typically in the form of completion of the Medical Research Council (MRC) Research, GDPR and confidentiality Quiz.


There are several databases available across the U.K. readily accessible for national data linkage projects. Details of those in England and Wales held centrally by NHS Digital can be found here, whereas information regarding those held in Scotland can be found here. For Northern Ireland this is available here. There are also many other bespoke healthcare datasets that may be applicable to Trauma and Orthopaedics that can be identified through the Health Data Research U.K. (HDRUK) Innovation Gateway. The HDRUK website is also an excellent resource for information on access to, and analysis of, health data, and in particular utilisation of “Trusted Research Environments” that provide safe and streamlined access to healthcare information. Table 1 indicates commonly utilised databases in Trauma & Orthopaedics, including the information they contain and some key examples of national data analysis literature.

Research conduct and reporting

Researchers analysing linked routinely collected healthcare data (such as that found in registries or national datasets) are encouraged to ensure that all design and reporting is compliant with the REporting of studies Conducted using Observational Routinely-collected Data (RECORD) statement.1 Where feasible, results, data, and the code utilised for analyses should be published “Open Access” to allow for the widest potential impact whilst maintaining transparency and reproducibility. This allows others the opportunity to reproduce analysis as well as learn from the methods used.

Challenges of data linkage

Once ethical and data access issues have been overcome, there are several practical issues that researchers should be aware of. Large routine datasets (such as HES) are cumbersome: they consist of many millions of rows of data (each representing a hospital spell), take a lot of space to store and, as a result, can take a long time to analyse. A simple calculation on several million rows of data may take hours to run, so it is recommended that analysts create test or “toy” datasets (small excerpts of the larger dataset) to ensure the code works before it is applied to the whole database. It is also important for researchers to be able to visualise the way their data are stored (e.g. a single hospital admission on each row of a database), as this will help them to reorganise and link the data correctly. Another important step is to understand what data are missing, in what proportion, and in what pattern. If data are missing in one group more than another, this can bias the analysis and ultimately affect the results and conclusions.
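The two practical suggestions above (working on a “toy” excerpt first, and checking how missing data are distributed between groups) can be sketched with the Python standard library as follows. This is a minimal illustration, not a prescribed workflow: the rows are generated synthetically, and the field names (`admission_type`, `bmi`) are hypothetical rather than real HES or registry variables.

```python
import random
from collections import defaultdict


def make_toy_sample(rows, n, seed=0):
    """Draw a small, reproducible random excerpt for testing analysis code."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


def missing_proportion_by_group(rows, group_field, value_field):
    """Proportion of rows in each group where value_field is missing."""
    counts = defaultdict(lambda: [0, 0])  # group -> [missing, total]
    for row in rows:
        tally = counts[row[group_field]]
        tally[1] += 1
        if row.get(value_field) in (None, ""):
            tally[0] += 1
    return {group: missing / total for group, (missing, total) in counts.items()}


# Synthetic dataset: one row per admission, with BMI missing more often
# in trauma admissions than in elective admissions.
full_dataset = []
for i in range(1000):
    if i % 2 == 0:
        admission_type = "elective"
        bmi = None if i % 20 == 0 else 27.0
    else:
        admission_type = "trauma"
        bmi = None if i % 4 == 1 else 25.0
    full_dataset.append({"admission_type": admission_type, "bmi": bmi})

toy_dataset = make_toy_sample(full_dataset, n=50)
missingness = missing_proportion_by_group(full_dataset, "admission_type", "bmi")

print(len(toy_dataset), "rows in toy dataset")
for group, proportion in sorted(missingness.items()):
    print(f"{group}: {proportion:.0%} of BMI values missing")
```

Fixing the random seed keeps the toy excerpt identical between runs, which makes code easier to debug. In this synthetic example BMI is missing five times as often in one group as in the other, the kind of pattern that would need to be addressed before drawing conclusions.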


In summary, there are many opportunities available for data linkage projects across the spectrum of Trauma and Orthopaedics, with many excellent examples already published in high-impact literature. As the quantity of electronic healthcare information continues to grow exponentially, it is essential that these data are appropriately leveraged to perform high-quality research that helps inform the future care of our patients, particularly as we move towards a more personalised and precise approach to healthcare.

Table 1 - National data-linkage applications in Trauma and Orthopaedics

| Dataset | Information | Example literature |
| --- | --- | --- |
| National Joint Registry (England, Wales and Northern Ireland) / Scottish Arthroplasty Project (Scotland) | Joint replacement implant data (NJR only) and revision rates | Adverse outcomes after total and unicompartmental knee replacement in 101 330 matched patients: a study of data from the National Joint Registry for England and Wales [2]; The risk of peri-prosthetic fracture after primary and revision total hip and knee replacement [3] |
| Hospital Episode Statistics (England and Wales) / Scottish Morbidity Record 01 (Scotland) / Hospital Inpatient System (Northern Ireland) | Data on inpatient hospital episodes and co-morbidities | Mortality rates at 10 years after metal-on-metal hip resurfacing compared with total hip replacement in England: retrospective cohort analysis of hospital episode statistics [4] |
| Office for National Statistics (England and Wales) / National Records of Scotland (Scotland) / General Register Office (Northern Ireland) | Mortality | The Main Cause of Death Following Primary Total Hip and Knee Replacement for Osteoarthritis: A Cohort Study of 26,766 Deaths Following 332,734 Hip Replacements and 29,802 Deaths Following 384,291 Knee Replacements [5]; The estimated lifetime risk of revision after primary knee arthroplasty is influenced by age, sex, and indication [6] |
| Patient Reported Outcome Measures [PROMs] (England and Wales only with separate applications) | Changes in health following surgical intervention | The effect of surgical factors on early patient-reported outcome measures (PROMS) following total knee replacement [7] |
| Clinical Practice Research Datalink (U.K. wide) | Primary care data | Encounters for foot and ankle pain in UK primary care: a population-based cohort study of CPRD data [8] |
| Public Health England Surgical Site Infection Surveillance Service [SSISS] (England) / Electronic Communication of Surveillance in Scotland [ECOSS] (Scotland) | Microbiology data | Clinician-led surgical site infection surveillance of orthopaedic procedures: a UK multi-centre pilot study [9] |
| National Hip Fracture Database (England, Wales and Northern Ireland) / Scottish Hip Fracture Audit (Scotland) | Hip fracture treatment and care process information | Discharge after hip fracture surgery by mobilisation timing: secondary analysis of the UK National Hip Fracture Database [10]; A nationwide study of blood transfusion in hip fracture patients [11] |
| Trauma Audit Research Network (England and Wales) / Scottish Trauma Audit Group (Scotland) | Major Trauma and some Orthopaedic Trauma e.g. open fracture care | Epidemiology of adult rib fracture and factors associated with surgical fixation: Analysis of a chest wall injury dataset from England and Wales [12] |
| Scottish Medical Imaging Service (Scotland only) | XR, CT, MRI scans and Radiology Reports | |

N.B. Other sub-speciality specific registries are also available, e.g. Bone and Joint Infection Registry, Non-arthroplasty Hip Registry, National Ligament Registry etc.

Associated Websites: Data Access Request Service (DARS) - https://digital.nhs.uk/services/data-access-request-service-dars; Electronic Data Research and Innovation Service (eDRIS) - https://www.isdscotland.org/products-and-services/edris/; Northern Ireland Statistics and Research Agency (NISRA) - https://www.nisra.gov.uk/support/research-support


Authors: Farrow L, Evans J. Future registry research. Bone Joint Res. 2023;12(4):256-258. doi:10.1302/2046-3758.124.BJR-2023-0072