Better, Broader, Safer: Using Health Data for Research and Analysis
The review led by Professor Ben Goldacre into how the efficient and safe use of health data for research and analysis can benefit patients and the healthcare sector has now been published. It is available here.
The review was commissioned by the Secretary of State for Health and Social Care. A response will follow in the forthcoming Data Strategy for Health and Social Care, which sets the direction for the use of data in a post-pandemic healthcare system.
April 7th 2022
This is the PDF version of the full review, containing all 8 chapters, introductions from both the Secretary of State and Prof. Ben Goldacre, an executive summary, and all the acknowledgements.
April 7th 2022
This is the HTML version of the full review, containing all 8 chapters, introductions from both the Secretary of State and Prof. Ben Goldacre, an executive summary, and all the acknowledgements.
April 7th 2022
This is the PDF version of the very short (5-page) executive summary. It provides a high-level overview of the main topics of the review and introduces the 30 top-level recommendations.
April 7th 2022
This is the PDF version of the summary of the review (25 pages). It contains more background on each of the main topics of the review, the 30 top-level recommendations with links to the more detailed recommendations in the review, and the introductions from the Secretary of State and Prof. Ben Goldacre.
Professor Ben Goldacre, Goldacre review chair
Director, Bennett Institute for Applied Data Science; Professorial Fellow, Jesus College; Bennett Professor of Evidence-Based Medicine, Nuffield Department of Primary Care Health Sciences, University of Oxford
Ben Goldacre is a clinical researcher at the University of Oxford where he is Director of the Bennett Institute for Applied Data Science, and Bennett Professor of Evidence-Based Medicine in the Nuffield Department of Primary Care Health Sciences. He advises government on better uses of data and leads an academic team that uses large health datasets to deliver research papers and tools including OpenSAFELY.org (a new model of secure analytics platform that runs across unprecedented volumes of linked NHS patient data); OpenPrescribing.net (an open data explorer for NHS GP prescribing choices with over 20,000 users a month); and TrialsTracker.net (an open tool that monitors clinical trial reporting performance). He is also active in public engagement: his books, including ‘Bad Science’, have sold over 700,000 copies in more than 30 countries and his online lectures have over 5 million views.
Jessica Morley, Goldacre review researcher
Policy Lead, Bennett Institute for Applied Data Science, Nuffield Department of Primary Care Health Sciences, University of Oxford; Wellcome funded DPhil Candidate, Oxford Internet Institute, University of Oxford
Jess is a social science researcher at the University of Oxford where she is the policy lead for the Bennett Institute for Applied Data Science, and a Wellcome funded DPhil candidate at the Oxford Internet Institute. Her health data policy work is supported by the Mohn-Westlake Foundation. Prior to moving into academia full-time, she was a civil servant for the Department of Health and Social Care (DHSC) and latterly, NHSX.
Nicola Hamilton, Goldacre review secretariat
Civil Servant, DHSC
Nicola is a Civil Servant in the UK Health Security Agency (UKHSA), an executive agency sponsored by DHSC. She has worked in the Civil Service for the last 6 years, across a range of projects and programmes, most recently focusing on health data.
Terms of Reference
How do we facilitate access to NHS data by researchers, commissioners, and innovators, while preserving patient privacy?
What types of technical platforms, trusted research environments, and data flows are the most efficient, and safe, for which common analytic tasks?
What are the technical and cultural barriers to achieving this goal, and how can they be rapidly overcome?
Where (with appropriate sensitivity) have current approaches been successful, and where have they struggled?
How do we avoid unhelpful monopolies being asserted over data access for analysis?
What are the right responsibilities and expectations on open and transparent sharing of data and code for arm’s length bodies, clinicians, researchers, research funders, electronic health records and other software vendors, providers of medical services, and innovators? And how do we ensure these are met?
How can we best incentivise and resource practically useful data science by the public and private sectors? What roles must the state perform, and which are best delivered through a mixed economy? How can we ensure true delivery is rewarded?
How significantly do the issues of data quality, completeness, and harmonisation across the system affect the range of research uses of the data available from health and social care? Given the current quality issues, what research is the UK optimally placed to support now, and what changes would be needed to optimise our position in the next 3 years?
If data is made available for secondary research, for example to a company developing new treatments, then how can we prove to patients that privacy is preserved, beyond simple reassurance?
How can data curation best be delivered, cost effectively, to meet these researchers’ needs? We will ensure alignment with Science Research and Evidence (SRE) research priorities and Office for Life Sciences (OLS) (including the data curation programme bid).
What can we take from the successes and best practice in data science, commercial, and open source software development communities?
How do we help the NHS to analyse and use data routinely to improve quality, safety and efficiency?
Platforms and security
1. Build trust by taking concrete action on privacy and transparency: trust cannot be earned through communications and public engagement alone.
2. Ensure all NHS data policies actively acknowledge the shortcomings of ‘pseudonymisation’ and ‘trust’ as techniques to manage patient privacy: these outdated techniques cannot scale to support more users (academics, NHS analysts, and innovators) using ever more comprehensive patient data to save lives.
3. Build a small number of secure analytics platforms – shared ‘Trusted Research Environments’ – then make these the norm for all analysis of NHS patient records data by academics, NHS analysts and innovators, wherever there is any privacy risk to patients, unless those patients have consented to their data flowing elsewhere. Every new TRE brings a risk of duplicated effort, duplicated information governance, duplicated privacy risks, monopolies on access or task, and obstructive divergence around data curation and similar activity: there should be as few TREs as possible, with a strong culture of openness and re-use around all code and platforms.
4. Use the enhanced privacy protections of TREs to create new, faster access rules and processes for safe users of NHS data; ensure all TREs publish logs of all activity, to build public trust.
5. Map all current bulk flows of pseudonymised NHS GP data, and then shut these down, wherever possible, as soon as TREs for GP data meet all reasonable user needs.
6. Use TREs – where all analysts work in a standard environment – as a strategic opportunity to drive modern, efficient, open, collaborative approaches to data science.
Modern, open working methods for NHS data
7. Promote and resource ‘Reproducible Analytical Pipelines’ (RAP, a set of best practices and training created in ONS) as the minimum standard for academic and NHS data analysis: this will produce high quality, shared, reviewable, re-usable, well-documented code for data curation and analysis; minimise inefficient duplication; avoid unverifiable ‘black box’ analyses; and make each new analysis faster.
8. Ensure all code for data curation and analysis paid for by the state through academic funders and NHS procurement is shared openly, with appropriate technical documentation, to all data users. Data preparation, analysis and visualisation is complex technical work, requiring collaboration by many individuals, who may never meet, in a range of organisations, across the NHS and other sectors. The only way to manage this shared complexity is by sharing information, as in other technical fields.
9. Recognise software development as a central feature of all good work with data. UKRI/NIHR should provide open, competitive, high status, standalone funding for software projects and developers working on health data. Universities should embrace research software engineering (RSE) as an intellectually and academically creative collaborative discipline, especially in health, with realistic salaries and recognition.
10. Bridge the gap between health research and software development: train academic researchers and NHS analysts in contemporary computational data science techniques, using RAP where appropriate; offer ‘onboarding’ training for software developers and data scientists who are entering health services research and epidemiology; use in-person and online training; make online resources openly available where possible.
11. Note that ‘open code’ is different to ‘open data’: it is reasonable for the NHS and government to do some analyses discreetly without sharing all results in real time.
Data curation and knowledge management
12. Stop doing data curation differently, to variable and unseen standards, duplicatively in every team, data centre, and project: recognise NHS data curation as a complex, standalone, high status technical challenge of its own.
13. Meet this challenge with systematic curation work, devoted teams, shared working practices, shared code, shared tools, and shared documentation; driven by open competitive funding to develop new shared curation methods and tools, and to manually curate data for individual datasets and fields.
14. Use TREs as an opportunity to impose standards on how commonly used datasets are stored, and curated into analysis-ready tables.
15. Create an open online library for NHS data curation code, validity tests, and technical documentation with dedicated staff who have appropriate skills in data science, curation, and technical documentation; so that new analysts, academics and innovators can arrive to find platforms with well curated data and accessible technical documentation.
NHS data analysts
16. Create an NHS Analyst Service modelled on the Government Economic Service and Statistical Service, with: a head of profession; clear job descriptions tied to technical skills; progression opportunities to become a senior analyst rather than a manager; and realistic salaries where expensive specific skills are needed.
17. Embrace modern, open working methods for NHS data analysis by committing to Reproducible Analytical Pipelines (RAP) as the core working practice that must be supported by all platforms and teams; make this a core focus of NHS analyst training.
18. Create an Open College for NHS Analysts: this should devise (and coordinate delivery of) a curriculum for initial training and ‘continuing professional development’, tied to job descriptions; all training content should be shared openly online to all; and cover a range of skills and roles from deep data science to data communication.
19. Recognise the value of knowledge management: create and maintain a curated national open library of NHS analyst code and methods, with adequate technical documentation, for common and rare analytic tasks, to help spread knowledge and examples of best practice across the community; use this in training.
20. Seek expert help from academia and industry, but ensure all code and technical documentation is openly available to all, procuring newly created ‘intellectual property’ on a ‘buy out’ basis. Commission best practice guidance on outsourcing data analytics to cover: where external collaborations can be most helpful; the role of skilled analysts in guiding procurement; common red flags for delivery; and why RAP builds capacity, quality, and continuity of service.
21. Train senior non-analysts and leaders in how to be good customers of data teams.
Governance
22. Rationalise approvals: create one map of all approval processes; require all relevant organisations to amend it until all agree it is accurate; de-duplicate work by creating a single common application form (or standard components) for all ethics, information governance, and other access permissions; coordinate shared meetings when approval requires multiple organisations; have researchers available to address misunderstandings of their project; build institutions to help users who are blocked; recognise and address the risk of data controllers asserting access monopolies to obstruct competitors; publish data on delays annually; ensure high quality patient and public involvement and engagement (PPIE) is done.
23. Have a frank public conversation about commercial use of NHS data for innovation, but only after privacy issues have been addressed through adoption of TREs; ensure the NHS gets appropriate financial return where marketable innovations are driven by NHS data, which has been collected at great cost over many decades; avoid exclusive commercial arrangements.
24. Develop clear rules around the use of NHS patient records in performance management of NHS organisations, aiming to: ensure reasonable use in improving services; avoid distracting NHS organisations with unhelpful performance measures.
25. Address the problem of 160 trusts and 6,500 GPs all acting as separate data controllers. Do this either through one national organisation acting as Data Controller for a copy of all NHS patients’ records in a TRE, or an ‘approvals pool’ where trusts and GPs can nominate a single entity to review and approve requests on their behalf.
Approaches and strategy
26. Use people with technical skills to manage complex technical problems – create very senior strategic leadership roles for developers, data architects and data scientists; offer leadership training to those in existing technical roles. (Also train senior leaders in the basics of data analysis, software development, and clinical informatics; but recognise the limitations of that approach).
27. Build impatiently, but incrementally, accepting that new ways of working are overdue, but cannot replace old methods overnight. We must build skills, and prove the value of modern approaches to data in parallel to maintaining old services and teams.
28. Identify a range of ‘data pioneer’ groups from each key sector: 3 ICS analyst teams; 3 national quality improvement registry or audit teams; 3 academic birth cohort or electronic health record analysis teams; and 1 to 3 national NHS analytic teams. These should be selected competitively as those with the best current technical skills. Resource them to adopt modern working practices (Reproducible Analytical Pipeline working methods in a TRE alongside research software engineer support) and to develop shared re-usable methods, code, technical documentation and tools; this can be in parallel to ‘business as usual’ in their organisation, but should incrementally subsume it.
29. Build TRE capacity by taking a hands-on approach to the components of work common to all TREs. Avoid commissioning multiple closed, black box data projects from which little can be learned, or framing these as ‘experiments’. Experimentation is only powerful where it delivers openly shared working methods, code, outputs and technical documentation from which all can learn.
30. Focus on platforms by resourcing teams, services and institutions who are focused solely on facilitating great analytic work by other people, working closely with users. Data curation, secure analytics, TREs, libraries, RAP training, and platforms are the key missing link: they will only be delivered if they become high status, independent activities.
Below is a collection of news articles and blogs covering the review.
NHS Confederation: Data and research transformation in the NHS: What must be addressed?
Research Professionals: Health data software projects ‘need more funding’
Computer Weekly: Goldacre review outlines recommendations on safer use of health data
Healthcare IT News: Goldacre Review into health data calls for modern open working methods
National Health Executive: Independent review calls to improve data transparency
Get in Touch
If you would like to discuss the content of the review, how to implement the recommendations, next steps, or an event focused on topics related to the review, please get in touch.