MHRA creates synthetic datasets for research


Mark Say, Managing Editor

The Medicines and Healthcare products Regulatory Agency (MHRA) has announced the creation of two synthetic datasets to support the development of new medical technologies for Covid-19 and cardiovascular disease.

The datasets have been generated to accurately mirror the symptoms, diagnoses and treatments of genuine patients. They are derived from anonymised primary care data, using innovative methods to produce entirely artificial data that contains no original data from 'real' patients.

MHRA said synthetic datasets are valuable for developing and testing the machine learning and artificial intelligence algorithms in medical devices used to diagnose diseases and to monitor and improve health conditions.

They were produced by a collaboration between the Clinical Practice Research Datalink (CPRD), MHRA Medical Devices Division and researchers at Brunel University.

Validation function

CPRD director Janet Valentine said: “These datasets are designed to help researchers and companies validate their innovative new AI and medical devices. This development will support bringing safe products to market sooner, enabling patients to benefit from the latest technical advances.”

Indra Joshi, director of AI at NHSX, said: “Creating synthetic datasets is a novel way to help train machine learning algorithms on a rich and diverse set of data whilst maintaining safety and protecting privacy.”

MHRA owns the datasets and the framework used to generate and evaluate them, and has published a technical description of the methodology.
