Interview: Calum Inverarity of the Open Data Institute talks about issues identified in its new report on the subject and how it could apply to public services
Federated learning is an unfamiliar concept to many, but there are stirrings of interest in how this technique in machine learning could be used in more applications, including some public services.
This interest has risen to the point where the Open Data Institute (ODI) has published a report on the technology and the considerations for its use, with an emphasis on how it relates to data privacy, and ideas on how it could be applied.
One of the co-authors, ODI senior researcher Calum Inverarity, explains it in terms that highlight the shared characteristics with edge computing, in which much of the work is done on devices or in systems separate from the core system for the application.
This contrasts with the traditional approach to machine learning, in which data is collected from many sources into a central repository and the algorithms are trained there.
“With federated learning, instead of having to have all the data together – which often involves lots of agreements with different organisations – the model itself is sent out to wherever the data is held,” Inverarity says. “That could be on a device like a phone or in a healthcare setting like a hospital’s network where the sensitive data is held, then the model is trained where the data is.
“It means you don’t need to share data. It can stay securely where it is and the people stewarding the data can be safe in the knowledge it hasn’t been shifted around. What gets sent back are updated versions of the trained model.”
So far the best known application has been in enabling the development of shared prediction models for mobile phones, such as the predictive typing engine on Google's Gboard. Inverarity says this is a prime example of the cross-device type of federated learning, in which the global model is sent to the devices, which then train it on the data they hold and send back the resulting updates.
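The round described above, local training on private data followed by the central service averaging the returned updates, can be sketched in a few lines. This is a minimal illustration of federated averaging on a toy linear model; the names (`local_update`, `fed_avg`, the two "silos") are illustrative assumptions, not anything from the ODI report or Google's implementation.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Train the shared model on one participant's private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

def fed_avg(client_weights, client_sizes):
    """Central coordinator averages updates, weighted by dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Two participants hold data the coordinator never sees.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
silos = []
for n in (40, 60):
    X = rng.normal(size=(n, 2))
    silos.append((X, X @ true_w))

global_w = np.zeros(2)
for _ in range(20):                      # 20 federated rounds
    updates = [local_update(global_w, X, y) for X, y in silos]
    global_w = fed_avg(updates, [len(y) for _, y in silos])

print(np.round(global_w, 2))  # converges toward [ 2. -1.]
```

Only the model weights cross the boundary in each round; the raw `(X, y)` pairs stay where they are, which is the property Inverarity describes.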
The other type, cross-silo, is not as widely used but offers scope for large scale applications: the training takes place within organisational systems and the model updates are then sent back to the central authority. This is where there are possibilities for use within public services.
The early initiatives have been in healthcare. Inverarity cites an example of Moorfields Eye Hospital and AI company Bitfount using federated learning to draw on sensitive patient data in developing a model to identify tumours and signs of macular degeneration.
“It allowed them to use thousands of images to develop more accurate models trained on different types of eyes,” he says. “They hope it will be positive in saving a lot of people’s vision.”
The report says the ODI has not seen any applications in e-government, but it could enable services that would encourage citizen participation and incentivise private organisations to collaborate with the public sector on intelligent services.
Inverarity suggests some possibilities: “Getting more speculative, there might be applications once other issues are ironed out. One area getting consideration is the financial sector, to detect patterns of fraud. It might be that a public body like the Financial Conduct Authority could use this kind of technology and get financial institutions onboard to use the datasets to identify where there is fraud, without having to have direct access to the raw data.
“In time it might also be related to regulators running audits on things like tech companies and financial services. They could use federated learning for training a model, or the related technology of federated analytics, which uses the same underlying infrastructure and processes for statistical analysis in a distributed fashion.”
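Federated analytics as described here swaps model training for statistics: each data holder computes an aggregate locally and only that aggregate travels. A minimal sketch, assuming the goal is a simple global mean (the function names and the toy silo data are illustrative):

```python
def local_summary(records):
    """Each data holder returns only an aggregate, never raw rows."""
    return sum(records), len(records)

def federated_mean(summaries):
    """The coordinator combines per-silo aggregates into one statistic."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count

silo_a = [10, 20, 30]   # held by one organisation
silo_b = [40, 50]       # held by another
summaries = [local_summary(silo_a), local_summary(silo_b)]
print(federated_mean(summaries))  # → 30.0
```

In a real audit scenario the aggregates would typically be combined with privacy enhancing technologies, such as noise addition, since even summary statistics can leak information from small datasets.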
Other possibilities are in areas such as encouraging energy efficiency, measures to tackle climate change and public health initiatives.
He says the necessary investment could depend largely on the use case and that there are open source frameworks available to mitigate the costs. In addition, the technology itself should not pose a major challenge as, despite the low profile, it has been in use for a few years. But there are other issues that need serious consideration.
“People know how to do it and that it can work, but as with a lot of other data issues it’s about trust, how confident the decision makers and risk assessors are with it being done on the data they are stewarding. It’s up to them whether they want to take the decision.
“It might depend on who is taking part. It might be they need to buy a lot more equipment for it to be compatible for them to run different analysis across devices. It’s having the staff with expertise to do this stuff. And you have a lot of lawyers getting involved to ask if it’s safe and the risks are manageable.
“There will always be a risk, so it is whether it is manageable or palatable, and when you have a couple of data stewards involved there has to be agreement on how things are set up. That’s where more of the cost and effort needs to be at the moment. It’s in making sure people are technically competent to roll things out and that the people making the decisions have the confidence to allow the technology to be used.”
The skills issue is significant to any future public service initiatives, with the familiar problem of the sector struggling to compete with higher-paying private sector employers for the right people.
Developing the skills among existing staff can also be a challenge: Inverarity says many of the technical papers available are dense and can be inaccessible, while the glossier material is too light-touch. This is what prompted the ODI to produce the report, which he says aims to strike the right balance between an introduction to federated learning and practical guidance.
The guidance covers a number of considerations: scalability, resource savings, model performance and accuracy, confidentiality, legal and regulatory compliance, and security, with descriptions of the benefits (including rankings on the potential to achieve them), drawbacks and trade-offs for each.
There is also a deployment workflow, involving an ethics review and data impact assessment, design decisions, building a machine learning prototype, federating the model, considering the addition of privacy enhancing technologies and adding new types of data. This should be followed up with ongoing testing and validation, using methods such as threat modelling and testing how much incremental value is added through the training process.
It also looks briefly at federated analytics, which uses a similar infrastructure to federated learning and in which, despite it being in its infancy, there is growing interest. It will need much more work to deliver on its promises.
The report’s conclusions highlight a number of points, including that the use of federated learning could pose unexpected risks to privacy and compliance, and more work is needed on governance models. In addition, the technology is at a critical juncture with a risk that it might be deployed primarily by large companies for personalising digital services and increasing profits, rather than dealing with societal challenges.
There has been interest in government – Inverarity says the ODI has had discussions with the Department for Digital, Culture, Media and Sport and the Centre for Data Ethics and Innovation on the issues – with key questions being on the reasons for use.
“It’s not just can we do it, but should we do it?” he says. “Where is it appropriate and responsible to use these technologies? If it is just going to be used for commercial purposes maybe it would not be for the benefit of everyone, and you might see co-operation between actors that already have lots of data, and that might create barriers to others getting involved.
“We’re really keen for the consideration of where it is appropriate to use these technologies. It seems there is a lot of potential behind them, so it’s about harnessing that and encouraging it to be used for the public benefit.”