Bring your own data

“The world’s most valuable resource is no longer oil, but data.” While this 2017 headline in The Economist may be a bit of an exaggeration, for many organizations it’s not a million nautical miles from the truth: marine insurers have claims data; governments have data from a myriad of sensors; ship charterers have data on destinations and cargoes. This data is hugely valuable. But like any asset, its true value can only be realized if used effectively. And that might mean sharing it.

Sharing is caring

Sharing proprietary data? Are we delirious? Not quite. To be sure, you’d need to consider issues like security, privacy, and ease of integration – not to mention the quality of the data you’re marrying, and the value it brings to your organization. But when done thoughtfully, bringing your own data to the table can be incredibly beneficial.

Here at Windward, we love marrying different types of data. Take vessel operations. Gaining insights into how vessels are operated over time can be an indispensable component in decision-making across the ecosystem. To support these decision-makers, over the past seven years we’ve incorporated data from independent and complementary sources (including AIS, GIS layers, weather, and nautical charts) to construct a comprehensive model that explains various aspects of vessel operations. Many other sources were considered but not integrated. As the objective was to create consistent insights on the world fleet, we prioritized data sources that met three criteria:

  1. Added value – the new source should be complementary to existing sources.
  2. Global coverage – because every vessel could be relevant.
  3. Historical depth – data should be available for at least several years.

While the first criterion is obvious, the second allows us to provide out-of-the-box value for various parts of the maritime ecosystem around the world. The third is also crucial – it helps us understand trends over time, detect anomalies, and train our machine learning models on past behavior (for example, on accidents that took place during the past six years). Without historical data, it can take years for a new source to accumulate enough records to be used in predictive modelling, e.g. for accidents.

This framework helped us to focus our efforts. Sometimes this meant neglecting very high-quality data (albeit on a more local scale). To put it in arboreal terms, we had to build the trunk first.

Growing the branches

As our tech has evolved over the years, sprouting a number of different branches, we’ve come to appreciate the potential value of combining our data with clients’ proprietary data. For example, we’ve improved the way we incorporate private positional data (e.g. VMS), satellite imagery, and insurance claims. We have also improved the way insights generated by our system are exported and integrated into client systems – but that’s a story for another blog post.

Of course, the world around us doesn’t stand still. Many organizations, including large governmental agencies, are starting to open up to the idea of sharing some of their data with a view to obtaining tangible value they couldn’t get from internal sources (the cloud computing revolution has accelerated this trend). Ultimately, they have concluded that the potential benefits of using certain specialized third-party tools outweigh the risks. One such example is the way governments and Global Fishing Watch combine technology and ML with proprietary VMS data to fight illegal fishing.

Satellite imagery overlaid with AIS data can provide interesting insights, for example in detecting vessels that have gone “dark”. Source: Windward, Planet
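
To make the overlay idea concrete, here is a minimal sketch in Python. It is an illustration only, not a description of Windward's system: the `Position` type, the `find_dark_candidates` function, the sample coordinates, and the 2 km matching threshold are all hypothetical, and the sketch assumes the AIS positions were reported at roughly the same time the image was taken. A vessel spotted in the image with no AIS report nearby is flagged as a possible "dark" vessel.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Position:
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(a: Position, b: Position) -> float:
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def find_dark_candidates(detections, ais_positions, max_km=2.0):
    """Return image detections that have no AIS report within max_km.

    max_km is an assumed threshold; a real system would tune it to the
    image resolution and the time gap between image and AIS report.
    """
    return [
        d for d in detections
        if all(haversine_km(d, p) > max_km for p in ais_positions)
    ]


# Hypothetical sample data: two vessels seen in the image, one AIS report.
detections = [Position(32.80, 34.90), Position(33.50, 33.00)]
ais_positions = [Position(32.81, 34.91)]

dark = find_dark_candidates(detections, ais_positions)
print(dark)
```

In this toy example the first detection sits within 2 km of an AIS report and is treated as a known vessel, while the second has no nearby report and is returned as a candidate for further review. A candidate is only a lead, since a missing AIS signal can also come from poor reception.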

To get on board the data-sharing revolution, there are five main considerations:

  1. Security: Proprietary data must be safe; third-party vendors should adhere to the highest security standards.
  2. Privacy: Data storage should be compliant with regulations (such as GDPR) with accessibility to the data restricted to authorized personnel.
  3. Value: Combining your own data with that of a third party should bring valuable insights that cannot be obtained in other ways (e.g. with in-house tools).
  4. Quality: The data should be of the highest quality with clear definitions as to who is responsible for cleaning and/or enriching the data.
  5. Integration: The final consideration – which should not be underestimated – is the ease of integrating the data. Asking questions about the structure of the data pipeline or the maturity of an existing API could help assess whether the project will be an integration nightmare or a dream.

More and more organizations are getting an edge by deriving insights from combinations of datasets that were previously siloed. Decision-makers need to keep their eyes open for the right opportunity at the right time – and be ready to jump aboard.

Yair Mazor is Head of Data Science at Windward