Data technologies

Combining the latest data technologies to address Big Data challenges in industry and the wider world

Utilising Big Data

We need to move away from a one-off project approach in Big Data and instead utilise the power of many technologies to create re-usable Big Data workflows.

—Katharina Reusch, Software Engineer

Big Data has been a hot topic for a number of years. Big Data platforms are well established and many data science projects are under way, solving real-world problems. But are we using the existing technology to the best of its capability and capacity? Do different technologies integrate seamlessly and are they flexible enough to be re-used for other scenarios? We believe more can be done to achieve true Big Data workflows, which are re-usable, flexible, powerful, data-centric and modular to give researchers and stakeholders the best possible data experience.

Our team aims to utilise and develop existing technologies (such as the Hadoop Open Data ecosystem, Spark, Python, agent-based modelling, Internet of Things technologies, machine learning and more). In doing so, we bridge the gap between the worlds of Big Data and high-performance computing, integrate on-premises and cloud workflows, create novel visualisations and, ultimately, present powerful insights in a clear and engaging fashion. This is the key to providing different industries with solutions to some of their previously unsolvable problems.

Case study

CoCoA — Cognitive Connectivity Advisor

Smart route planning keeps you connected while you drive

The Cognitive Connectivity Advisor uses Big Data and machine learning to keep us in touch with our world. When trying to model or measure mobile phone signal strength, we are hampered by the fact that, whilst the physics of radio propagation is relatively simple, the environment we live in is very complex. To model the signal across the whole of the UK, we brought together a number of sources of open geospatial data (including elevation, weather and land use), data on the mobile phone network infrastructure, and both crowd-sourced and self-collected signal strength measurements.
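To illustrate why the physics alone is the easy part, the textbook free-space path loss formula fits in a few lines. This is a minimal sketch of that standard formula only, not the CoCoA model: it ignores terrain, buildings, vegetation and weather, which are exactly the environmental effects the geospatial datasets are there to capture. The function name and the example values are illustrative assumptions.

```python
import math


def free_space_path_loss_db(distance_km: float, frequency_mhz: float) -> float:
    """Free-space path loss in dB for a distance in km and a frequency in MHz.

    Standard idealised formula: 20*log10(d) + 20*log10(f) + 32.44.
    It assumes an unobstructed line of sight and nothing else in the way.
    """
    if distance_km <= 0 or frequency_mhz <= 0:
        raise ValueError("distance and frequency must be positive")
    return 20 * math.log10(distance_km) + 20 * math.log10(frequency_mhz) + 32.44


# Illustrative values only: 1 km from a transmitter operating at 1000 MHz.
loss_db = free_space_path_loss_db(1.0, 1000.0)
print(f"{loss_db:.2f} dB")  # 92.44 dB
```

In this idealised setting, doubling the distance always adds about 6 dB of loss. Real measurements deviate from that wherever hills, buildings or foliage intervene, which is why a data-driven model is needed.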

Whilst these predictions can be served through an API, we also wrapped them in a route planner that reports the predicted signal strength along potential routes. This would allow a traveller to schedule a journey or choose a route based on their connectivity requirements, or at least be warned when they are about to lose service. We are currently extending the initial work to incorporate additional datasets and develop more complex deep learning models to improve accuracy. Such a service would find utility across a wide range of applications, including transport, construction, agriculture and the emergency services.
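The idea of choosing a route by connectivity can be sketched as follows. This is a hypothetical illustration, not the CoCoA implementation: the `Route` class, the -100 dBm usability threshold, the 90% coverage requirement and the sample values are all assumptions, and the per-point signal predictions stand in for whatever a prediction service would return.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Route:
    name: str
    duration_min: float
    signal_dbm: List[float]  # predicted signal at sample points along the route


def coverage_fraction(route: Route, threshold_dbm: float = -100.0) -> float:
    """Fraction of sample points whose predicted signal meets the threshold."""
    if not route.signal_dbm:
        return 0.0
    usable = sum(1 for s in route.signal_dbm if s >= threshold_dbm)
    return usable / len(route.signal_dbm)


def dropout_indices(route: Route, threshold_dbm: float = -100.0) -> List[int]:
    """Indices of sample points where the traveller should be warned."""
    return [i for i, s in enumerate(route.signal_dbm) if s < threshold_dbm]


def pick_route(routes: List[Route], min_coverage: float = 0.9,
               threshold_dbm: float = -100.0) -> Route:
    """Fastest route that meets the coverage requirement.

    If no route qualifies, fall back to the best-covered route.
    """
    good = [r for r in routes if coverage_fraction(r, threshold_dbm) >= min_coverage]
    if good:
        return min(good, key=lambda r: r.duration_min)
    return max(routes, key=lambda r: coverage_fraction(r, threshold_dbm))


# Made-up example data: a faster route with a coverage gap and a slower one without.
motorway = Route("motorway", 40, [-80, -95, -110, -115])
a_road = Route("a-road", 55, [-85, -90, -92, -99])

print(pick_route([motorway, a_road]).name)  # a-road
print(dropout_indices(motorway))            # [2, 3]
```

A traveller with looser requirements could lower `min_coverage` and take the faster route, using the dropout indices as advance warning of where service is likely to be lost.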

Case study

Smart Crop Data Platform

Helping agricultural researchers make better models and serve their insights to farmers

The agriculture sector has many different stakeholders, from farmers to suppliers and from researchers to supermarkets. What they all have in common is a large amount of data that contains useful insights and, if utilised efficiently, can benefit everyone. In collaboration with Rothamsted Research, we created a platform and workflows that utilise the group’s expertise and existing strength in infrastructure by leveraging high-performance computing (HPC) and Big Data technology.

The result is built on IBM Research’s geo-spatial data repository PAIRS, which provides a central data store where data is aligned, indexed and retrieved quickly through a powerful spatiotemporal query engine. On top of this, we built additional apps and services, either for researchers to analyse their data or to inform stakeholders about the latest trends and forecasts.

With moth migration data from Rothamsted Research, we can store data efficiently in PAIRS, query the relevant data as input to moth migration prediction models and, depending on weather conditions, tell farmers when moths are expected.

We demonstrated how we can combine Rothamsted’s insect observation data with different models that incorporate weather patterns to predict insect migration routes and provide forecasts about the arrival of insects in any given location. Such forecasts can allow farmers to make informed decisions about the amount and timing of pesticide use, helping to reduce costs and maximise yields. This is just one of the many examples where an easy-to-use geo-spatial data platform can provide real benefits in the real world.

Ask the experts

Blair Edwards

Katharina Reusch

Lan Hoang
