Industry voice: Public authorities can harness the power of Apache Spark in the cloud for a low-risk approach to experimenting with their data, writes Alex Purkiss, lead for government and life sciences at Databricks
There can be few in the public sector who do not see the immense potential of using data for the public good. And UKAuthority's recent Data4Good event revealed a growing appetite to experiment with data in search of new insights and solutions to the sector's major problems.
But many are deterred by the complexity, the perceived cost and effort involved, and the risk of burning up resources in a historically resource-intensive area.
The advent of cloud computing, however, has opened up new opportunities to explore and exploit big data while avoiding investment in digital infrastructure. It has opened the door to agile, dynamic platforms that can be used 'as needed' without prior investment. The Databricks Unified Analytics Platform, for example, not only runs an optimised version of the lightning-fast Apache Spark unified analytics engine but also offers interactive notebooks, integrated workflows and full enterprise security. By unifying data science, engineering and business it enables multi-disciplinary teams to collaborate, ask unlimited questions, test hypotheses and provide an evidence base for strategic decisions.
It can enable organisations to repeatedly ask 'What if?' and establish the likely answers, which in turn can support better operational decisions and strategic planning. And the fact that it runs in the cloud, on a 'pay as you go' model, makes it much less expensive than investing in new internal infrastructure.
This is a cost-effective approach that encourages a new culture of experimentation and helps to reduce the risk in developing new approaches to public services.
It has already made a mark in the public sector, where it is being used by NHS Digital to help explore the use of big data and unified analytics. Andrew Meyer, director of the organisation’s Digital Delivery Centre, recently confirmed that Apache Spark is used throughout its core data products and that it was exploring how Databricks can be used to further enhance these services.
The potential is limited only by our ability to ask 'how?', 'why?' and 'what if?', and can extend into a wide range of public services. Databricks is already being used to analyse satellite and drone imagery, not only to identify potholes but to predict where they will occur, enabling rapid repair and preventing further damage. It can support social care by combining data from multiple sources to highlight trends, identify individuals at risk and lay the ground for early interventions. Some of those sources provide a messy, confusing picture that can be difficult to combine with more structured datasets, but Databricks provides a foundation for establishing connections and finding previously hidden patterns.
Similarly, there is potential in policing, investigating an array of factors in the social landscape to predict potential flashpoints; in environmental management, in combining IoT data with that from other sources to assess factors such as air quality, flood risks and traffic congestion; and in public health, identifying the influence of demographic and social conditions. The potential list of use cases is endless.
Coping with complexity
The Databricks platform can help organisations deal with the increasing complexity of data as they seek to use and combine new sources, such as sensors in the internet of things (IoT), audio files, geospatial datasets, satellite imagery, data from partner organisations, open data from public sources, and social media.
It can be programmed to access the data in multiple systems, bring it together and convert it into a usable format for a specific purpose. This helps to deal with one of the major challenges in harnessing data – obtaining it from an array of legacy datasets in diverse formats with varying structures, and identifying any duplications and anomalies to ensure the algorithms work effectively.
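The combine-and-clean step described above can be sketched roughly as follows. This is a minimal plain-Python illustration of the logic only (normalising records from different systems into one format, then dropping duplicates and anomalies); on Databricks itself the same step would typically be expressed with PySpark DataFrame operations. All source names, field names and sample records here are hypothetical.

```python
def normalise(record, source):
    """Convert a source-specific record into a common format.
    The two source formats below are invented for illustration."""
    if source == "legacy_csv":
        return {"id": record["ref"].strip().upper(),
                "postcode": record["pc"].replace(" ", "").upper(),
                "value": float(record["val"])}
    if source == "api_json":
        return {"id": record["id"].upper(),
                "postcode": record["postcode"].replace(" ", "").upper(),
                "value": float(record["reading"])}
    raise ValueError(f"unknown source: {source}")

def combine(sources):
    """Merge records from all sources, dropping duplicate IDs and
    obviously anomalous values."""
    seen, combined = set(), []
    for source, records in sources.items():
        for raw in records:
            rec = normalise(raw, source)
            if rec["id"] in seen:   # duplicate across systems
                continue
            if rec["value"] < 0:    # simple anomaly check
                continue
            seen.add(rec["id"])
            combined.append(rec)
    return combined

# Hypothetical records from two systems; "ab1" appears in both.
sources = {
    "legacy_csv": [{"ref": " ab1 ", "pc": "sw1a 1aa", "val": "3.2"}],
    "api_json": [{"id": "ab1", "postcode": "SW1A1AA", "reading": 4.0},
                 {"id": "cd2", "postcode": "E1 6AN", "reading": -1.0}],
}
cleaned = combine(sources)
```

The duplicate "ab1" record and the anomalous negative reading are both discarded, leaving one clean record in a single shared format ready for analysis.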
Databricks goes beyond traditional text and numerical data to unlock the potential in geospatial information, aerial photography, audio and video records, and the patterns that emerge from the unstructured noise of social media.
It includes three features that provide flexibility in how it is used: a unified engine for complete data applications; user-friendly APIs; and a basis for building modular solutions that can be scaled up and remodelled as the demand for services evolves. These enable authorities to respond to changing needs, effectively future-proofing the way they manipulate and interrogate data.
This all encourages the ‘What if?’ approach and provides increasing value as authorities work on harnessing big data and adopting machine learning technologies.
Public services operate in an increasingly complex landscape, but Databricks can help them to reduce that complexity, make sense of the relevant data and answer problem-solving questions. By enabling a culture of data experimentation Databricks can play a significant part in preparing public services for a challenging future.
Learn more about Databricks
- Download Databricks’ white paper: Getting Started with Spark on Databricks
- Try Databricks with a Free Trial on Azure
- Watch the webinar on Streaming Analytics Use Cases on Apache Spark
If you would like to discuss how your organisation can harness Databricks and create a culture of data experimentation, email Alex Purkiss, Databricks' UK government lead, here