Department reports it has passed a milestone towards creating a single repository with extensive processing capabilities for customer data
HM Revenue & Customs’ (HMRC) digital team is moving towards the launch of a central repository for its customer data, designed for large-scale processing and interrogation of the information.
It has reached a milestone in the development of its Enterprise Data Hub, on which the digital team has been working since 2014: data can now be uploaded securely to the hub from anywhere in the department.
The next step is to migrate data and services onto the hub for use in transformation projects.
A digital team blogpost says the hub will save money, partly by allowing HMRC to decommission all 11 of its existing data warehouses, and will provide new ways of interrogating the data and combining it with customer analysis.
“Transformation of our analytical capability will be a game changer in terms of HMRC’s digital ambition and achievement of our revenue generation objective,” the blog says. “There is also potential for cross-government transparency with UK economic benefits and shared use of cloud storage.”
It highlights two elements of the work on the hub. One is the use of Apache Hadoop, the open source software framework for the storage and large-scale processing of datasets on clusters of commodity hardware. Hadoop can handle all types of data (including unstructured data, log files, pictures and audio files) and removes the need for proprietary hardware.
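The blog does not say which Hadoop components HMRC uses, but "large-scale processing on clusters" in Hadoop classically means the MapReduce model: a map step runs independently over each block of data where it is stored, the framework groups the emitted key/value pairs by key, and a reduce step combines each group. The sketch below imitates that flow in plain Python on one machine; it is not Hadoop's API, and the file names are made-up examples of the mixed data types the article mentions.

```python
from collections import defaultdict


def mapper(filename):
    # Emit one (file_type, 1) pair per input record.
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else "unknown"
    return [(ext, 1)]


def shuffle(pairs):
    # Group every emitted value under its key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)


# Hypothetical data: each inner list stands in for a block held on a separate node.
partitions = [
    ["call_0001.wav", "scan_17.jpg", "web.log"],
    ["scan_18.jpg", "app.log", "letter.txt"],
    ["call_0002.wav", "batch.log"],
]

# The map step runs over every partition independently (on a real cluster, in parallel).
mapped = [pair for part in partitions for rec in part for pair in mapper(rec)]
result = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(result)  # {'wav': 2, 'jpg': 2, 'log': 3, 'txt': 1}
```

Because each map call only sees its own record, the work can be spread across as many cheap machines as there are data blocks, which is why commodity hardware suffices.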
The other is the use of tokenisation, a technique that replaces sensitive data with non-sensitive ‘tokens’. This provides security similar to encryption while keeping the data usable for HMRC’s purposes, and allows the department to manage the data from a single point of control.
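HMRC's blog does not describe its tokenisation design, so the following is only a minimal sketch of one common approach, vault-based tokenisation, under that assumption. Each sensitive value is swapped for a random token and the token-to-value mapping is held in one place (the hypothetical `TokenVault` class), which mirrors the "single point of control" idea. Reusing the same token for a repeated value keeps the data usable: records can still be joined and aggregated without exposing the original. The identifiers are invented examples.

```python
import secrets


class TokenVault:
    """Hypothetical vault: maps sensitive values to random tokens and back."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenise(self, value):
        # Same input always gets the same token, so joins and counts still work.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # A random token has no mathematical link to the value, unlike ciphertext.
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenise(self, token):
        # Only code with access to the vault can recover the original value.
        return self._token_to_value[token]


vault = TokenVault()
records = [
    {"taxpayer_id": "QQ123456C", "amount": 120},
    {"taxpayer_id": "QQ654321A", "amount": 80},
    {"taxpayer_id": "QQ123456C", "amount": 45},
]
safe = [{"taxpayer_id": vault.tokenise(r["taxpayer_id"]), "amount": r["amount"]} for r in records]

# Analysts can total amounts per (tokenised) taxpayer without seeing real IDs.
totals = {}
for r in safe:
    totals[r["taxpayer_id"]] = totals.get(r["taxpayer_id"], 0) + r["amount"]
```

In a production system the vault would itself be a hardened, access-controlled service; the in-memory dictionaries here only illustrate the mapping.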
“This is a world first, and we know that a number of banks are very interested in what HMRC are doing and want to use a similar solution themselves,” it says.
Image from GOV.UK, Open Government Licence v3.0