The national infrastructure is a giant, highly complex beast, and the data it produces matches it in scale and complexity. Making sense of it all is probably the most demanding challenge for the analytics and data science community.
This is where DAFNI comes in – full name Data Analytics Facility for National Infrastructure – 18 months old, backed by £8 million for four years from the UK Collaboratorium for Research on Infrastructure & Cities (UKCRIC), and managed by the Scientific Computing Department of the UK’s Science and Technology Facilities Council.
It has a small team – eight developers with various types of expertise – but has been noticed at the top level of government: Chancellor Philip Hammond pointed to its ability to carry out in-depth analysis when he commissioned a study on infrastructure resilience in October. It looks set to be a key player in the study, but has a remit to work with a wide range of public and private sector bodies looking for insights into the national infrastructure.
Sam Chorlton has been involved for more than a year, initially as the technical architect and now lead for DAFNI. Speaking with UKAuthority, he says it will not go live in its beta form until July of next year and the details of how it will contribute to the study are still to be worked out; but it has already done some work that lays the ground.
“We’ve already run a couple of case studies with universities looking at how DAFNI can support them,” he says. “One has looked at housing demand based on increases in interest rates, another at the impact of the rollout of the 5G network.
“This has started to demonstrate the power of automation and will help to provide non-expert access to the models. That’s going to be increasingly powerful.
“We’re also starting to work with some of the urban observatories running smart sensors in their cities and looking at what role real-time data can play in a system like DAFNI. It is not actively used in research and we’re keen to see where it might be able to secure insights.”
He sees it as a daunting challenge, reflecting the current shortcomings in how organisations are able to carry out data analysis and research.
“Often academics broker their own arrangements for access to small subsets of data or just use what they already have in universities. The problem with this is that research across two institutions may be inconsistent because they are using different reference datasets.”
His experience is that they either shoehorn data into a single format, which has a negative effect on the strength of the results, or store it as native files that users can download, spend time adapting and use for analysis. This can work for individual projects but has little value for re-use.
DAFNI’s prime role is to respond to these problems.
“So we are trying to help by providing facilities to host that data – we have about 300TB of storage that will grow over the coming years – and we have software in place to hold that data in what could be the most optimal format. We also look at how we can provide an interface to simplify how people access the data.”
It has an open source technology stack and a federated database architecture – derived from work by one of its partners at Newcastle University – in which it can act as a broker to interpret requests for data and send them to the most appropriate database. Chorlton describes it as “almost a database of databases with a little logic controller to route requests and work out how to handle the complexity in the results it returns”.
Complexity is the key word. Chorlton says that managing a system with multiple databases is always a problem, and that no solution completely solves it because every data project is slightly different.
The team is planning to address this by using big data frameworks such as Apache Spark and Apache Drill, which can provide a single interface, although he says they raise the technical barrier. It will also take an agile approach to delivering projects, and aim to identify the cases where only a single type of data from a single source is needed so that it can direct the user accordingly.
“Perhaps as we incrementally release more, if someone is trying to pull datasets from two databases that use different technologies, that’s when we push it out to this more complex engine,” he says.
“It might only support the more complex queries, but I think this is the approach to doing it.”
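The routing idea described here can be sketched in a few lines of Python. This is an illustration only, not DAFNI's implementation: the catalogue, the dataset names and the backend labels are all hypothetical, and the "federated" label simply stands in for a heavier engine of the Spark or Drill kind.

```python
from dataclasses import dataclass

# Hypothetical catalogue mapping dataset names to the database technology
# that holds them. None of these names come from DAFNI.
CATALOGUE = {
    "road_network": "postgis",
    "census_2011": "postgis",
    "sensor_readings": "timeseries",
}


@dataclass(frozen=True)
class Route:
    engine: str      # "direct" or "federated"
    backends: tuple  # sorted backend technologies involved


def route_request(datasets):
    """Decide how a request for one or more datasets should be served."""
    if not datasets:
        raise ValueError("no datasets requested")
    unknown = [name for name in datasets if name not in CATALOGUE]
    if unknown:
        raise KeyError(f"datasets not in catalogue: {unknown}")
    backends = tuple(sorted({CATALOGUE[name] for name in datasets}))
    # One backend: send the user straight to it.
    # Several backends: hand off to the heavier cross-database engine.
    engine = "direct" if len(backends) == 1 else "federated"
    return Route(engine=engine, backends=backends)


if __name__ == "__main__":
    print(route_request(["road_network", "census_2011"]))
    print(route_request(["road_network", "sensor_readings"]))
```

Keeping the simple case on a direct path means most users would never touch the more complex engine, which is the trade-off Chorlton points to.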
Open data issue
He also anticipates problems in obtaining some types of data. DAFNI expects to use open data as far as possible, and while much of it is of very good quality he says there is still a need to ensure its provenance is clear.
“There might be cases where someone has done some work on it and published without asserting that provenance. We want to provide a gold standard of data with traceability back to source where possible.”
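One way to picture "traceability back to source" is a chain of records in which each dataset points at the one it was derived from. The sketch below is an assumption made for illustration, not a description of how DAFNI records provenance; the `Dataset` fields and the example names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    name: str
    publisher: str
    derived_from: Optional["Dataset"] = None  # parent dataset, if recorded
    is_original_source: bool = False          # True only for primary data


def trace_to_source(dataset):
    """Walk back through the recorded ancestry of a dataset.

    Returns the list of dataset names visited and a flag saying whether
    the chain ends at a dataset marked as an original source.
    """
    chain = []
    current = dataset
    while current is not None:
        chain.append(current.name)
        last = current
        current = current.derived_from
    return chain, last.is_original_source


if __name__ == "__main__":
    raw = Dataset("raw_counts", "Agency A", is_original_source=True)
    cleaned = Dataset("cleaned_counts", "University B", derived_from=raw)
    orphan = Dataset("reworked_counts", "Unknown")
    print(trace_to_source(cleaned))
    print(trace_to_source(orphan))
```

A dataset whose chain does not end at a marked source, like the `orphan` example, is the case the quote describes: work published without its provenance stated.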
It will also require data from bodies such as utility companies that see it as having commercial value and could be reluctant to make it available. Overcoming this could involve showing them how they could get some value in return, possibly through feedback, access to models and improvements in the quality of their data.
“There’s a lot to do in working out these processes before we can go any further,” Chorlton says.
DAFNI is also likely to use plenty of government data. While some – such as general demographics and features of the national landscape – is not at all sensitive, the more granular data can often have implications for privacy and whether it is appropriate for use in research.
“We are there to facilitate not conduct the research, but want to make sure as many measures are in place as possible to ensure security is ingrained and data is used in a responsible and appropriate fashion,” Chorlton says.
This has not prevented DAFNI from taking early steps to provide paths to the data. He says it already has access to Ordnance Survey datasets through the Public Sector Mapping Agreement and is talking with the Office for National Statistics about the potential.
But he also anticipates plenty of requests that require a more bespoke approach to obtaining the data. This is likely to run into a new range of complexities, but DAFNI is looking at whether it could reduce some of these with a data licensing model or a framework to provide a more formalised approach.
“Hopefully we can find some more forward-looking organisations, work with them first, help to establish that value then demonstrate it to others,” he says.
This reflects what he sees as one of the big future benefits from its creation – in providing a point of collaboration for a research community that is largely disconnected at the moment.
“There are micro-communities in areas such as energy and transport modelling, but a platform that allows them to pull all the knowledge together does not really exist at the moment. That’s an area where we see huge gains.
“Another is re-usability of research. It tends to get done by an individual, a paper written up and then just a few instances where it is matured and used by industry. I think there is the potential for more research to be matured and used by industry or government.
“We are trying to provide standards throughout the process to make the maturation much easier and more accessible.”
He also sees the potential, supported by cloud computing, of helping to scale up research from local institutions to a national scale.
Benefits to emerge
It should all be open to academia by July of next year with a range of datasets available for more direct use and processes in place for bespoke requests, and there are discussions with other public authorities and industry about how it can support them. He thinks the early case studies have shown some benefits, notably in how the process is managed, and that others will soon emerge.
The big win in the foreseeable future could be in contributing to the chancellor’s infrastructure review. Chorlton points out that any investment in infrastructure has to consider a myriad of factors – healthcare, environmental effects, demand for services – and it is often when the second or third order effects come out that the research shows its real value.
But at the moment infrastructure research is quite fragmented, and there is a need for a more cohesive approach.
“There are small inroads to this in academia already, and we are trying to do this on a wider scale so there are benefits for everyone. If we can provide a step towards that over the next four years it would be an incredible benefit to realise.
“The lack of precision is that there is so much opportunity for change, and as a result of that there are so many directions in which it could go and so many stakeholders it could support. It’s navigating a course through that with a thin slice of functionality then adding depth where we see gains being realised.”
Image from the Science and Technology Facilities Council