When chatting with a Brit, there are few better ways to break the ice than talking about the weather. According to one report, almost all Britons admit to having discussed it in the past six hours. For Windward, though, the weather had, until recently, been missing from our conversation.
This might seem strange. Our ability to identify ship behavior relies on machine learning models that classify vessels' actions, and weather is natural context for those actions. If we knew the weather where a ship was operating, we'd know whether it was stopping in calm seas or in high waves, and whether it was drifting with the current or sailing slowly against it. That would improve our ability to understand why a ship acted as it did, and to assess its risk level.
And so, Windward’s Product Management team launched The Weather Project. It began by attempting to answer the questions: Will weather data give us – and therefore our customers – a better understanding of ships’ behavior? Which parameters would have the biggest impact – now and in the future – and provide our system with as much additional data as possible?
At this stage, we decided to look into different weather data sources, to see which would best fit our needs.
The first thing we discovered was that our needs were different from those of most other consumers of weather data. What most users need, and most companies provide, is forecast data, i.e., what the weather at location X will be over the next hours, days or weeks. And not just any forecast: consumers want the most accurate forecasts available. Yet a forecast is only an estimate; by its nature, it cannot be completely accurate.
At Windward, we require accurate (measured, not estimated) weather data for the entire world, both now and in the past, ideally going back years. These requirements make for an unusual challenge. Nevertheless, we found several companies that could help. After some research, we came to the following conclusions:
We decided to build a Proof of Concept (PoC) with the two data sources most promising for our purposes, Meteomatics and NOAA, and compare the results.
NOAA provides a batch of historical data (in NetCDF format) that goes back years and can be downloaded and used offline. However, its scope and granularity are limited.
Meteomatics, on the other hand, provides an online API which allows queries on their data. Each query can span multiple dates, data types and locations.
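To make "one query, many dates, data types and locations" concrete, here is a minimal sketch that only builds a query URL. It assumes the Meteomatics REST path layout of base URL, then time or time range, then comma-separated parameters, then "+"-separated points, then output format. The helper name `build_query_url`, the parameter names and the coordinates are illustrative assumptions; check the Meteomatics documentation for the exact parameters you need. Credentials and the HTTP request itself are left out.

```python
from datetime import datetime, timezone

# Assumed base URL and path layout: <base>/<time range>/<parameters>/<points>/<format>
BASE_URL = "https://api.meteomatics.com"


def iso_utc(dt: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_query_url(start, end, step, parameters, locations, fmt="json"):
    """Hypothetical helper: one query spanning a time range, several parameters and several points."""
    # Time range with a repeat interval, e.g. 2019-01-01T00:00:00Z--2019-01-02T00:00:00Z:PT6H
    time_part = f"{iso_utc(start)}--{iso_utc(end)}:{step}"
    # Several data types in one request, comma-separated
    param_part = ",".join(parameters)
    # Several points in one request, each as "lat,lon", joined with "+"
    loc_part = "+".join(f"{lat},{lon}" for lat, lon in locations)
    return f"{BASE_URL}/{time_part}/{param_part}/{loc_part}/{fmt}"


url = build_query_url(
    start=datetime(2019, 1, 1, tzinfo=timezone.utc),
    end=datetime(2019, 1, 2, tzinfo=timezone.utc),
    step="PT6H",
    parameters=["t_2m:C", "wind_speed_10m:ms"],  # illustrative parameter names
    locations=[(51.5, -0.1), (36.0, 14.5)],      # illustrative points (lat, lon)
)
print(url)
```

Batching dates, parameters and points into one URL like this keeps the number of round trips low when looking up weather along many vessel tracks.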
An interesting point about NOAA data is that it is stored in a scientific format called NetCDF, which is familiar in academia but much less so in the tech world. Working with this format added a layer of complexity: all the data was kept in a four-dimensional array of values:
What's interesting about the NetCDF format is that, to minimize the data footprint, values are stored in a compact, indexed form rather than as-is.
For example, latitude and longitude, which in reality are floating-point values in degrees, were stored as non-negative integer indices from 0 to 719 and from 0 to 1440 respectively, each marking a point on a global grid; a difference of one index corresponds to a step of 0.25 degrees.
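To look up the weather at a ship's position, a real latitude/longitude has to be translated into grid indices and back. The sketch below does this for a 0.25-degree grid. The origins are assumptions for illustration (latitude index 0 at -90 degrees, longitude index 0 at 0 degrees on a 0 to 360 convention); in a real file they should be read from the coordinate variables or metadata, not hard-coded.

```python
STEP_DEG = 0.25     # one index step = 0.25 degrees
LAT_ORIGIN = -90.0  # assumption: latitude of index 0
LON_ORIGIN = 0.0    # assumption: longitude of index 0 (0-360 convention)
N_LAT = 720         # latitude indices 0..719
N_LON = round(360 / STEP_DEG)  # 1440 steps cover the full circle


def index_to_coords(lat_idx: int, lon_idx: int) -> tuple[float, float]:
    """Convert grid indices back to degrees."""
    return LAT_ORIGIN + lat_idx * STEP_DEG, LON_ORIGIN + lon_idx * STEP_DEG


def coords_to_index(lat: float, lon: float) -> tuple[int, int]:
    """Nearest grid indices for a real position, e.g. a vessel's reported location."""
    lon = lon % 360.0  # map -180..180 longitudes onto the 0..360 convention
    lat_idx = round((lat - LAT_ORIGIN) / STEP_DEG)
    lat_idx = min(max(lat_idx, 0), N_LAT - 1)  # clamp to the grid's latitude range
    lon_idx = round((lon - LON_ORIGIN) / STEP_DEG) % N_LON  # 360 degrees wraps back to index 0
    return lat_idx, lon_idx


print(index_to_coords(4, 8))        # (-89.0, 2.0)
print(coords_to_index(-89.0, 2.0))  # (4, 8)
```

Snapping to the nearest grid point is the simplest choice; interpolating between the four surrounding points would give smoother values at the cost of four lookups.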
Another example is time, which is stored as the number of hours since 1978-01-01.
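Converting between these stored hour offsets and ordinary timestamps takes only a few lines of standard-library Python. This sketch takes the 1978-01-01 reference date from the text; treating it as midnight UTC is an assumption, and the actual reference and time zone should be taken from the time variable's units attribute.

```python
from datetime import datetime, timedelta, timezone

# Reference date from the file's time units; midnight UTC is an assumption
EPOCH = datetime(1978, 1, 1, tzinfo=timezone.utc)


def hours_to_datetime(hours: float) -> datetime:
    """Turn a stored 'hours since epoch' value into a timestamp."""
    return EPOCH + timedelta(hours=hours)


def datetime_to_hours(dt: datetime) -> float:
    """Turn a timestamp into the stored 'hours since epoch' value."""
    return (dt - EPOCH) / timedelta(hours=1)


print(hours_to_datetime(24))  # 1978-01-02 00:00:00+00:00
```

Going from timestamp to hours is what you need when matching a vessel's position report to the nearest time step in the file.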
Here is an excerpt from the file’s metadata:
After a PoC that included a comprehensive comparison and analysis of the NOAA and Meteomatics data, we decided to use the Meteomatics API, for several reasons:
Comparing NOAA with Meteomatics, we can see that in some cases Meteomatics gives a stronger signal for extreme weather, thanks to its higher geographical and temporal granularity.
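Why coarser granularity weakens the signal can be shown with a toy calculation: averaging over longer time steps flattens short-lived peaks. The wind speeds below are synthetic values made up for illustration, not NOAA or Meteomatics data, and the helper `block_average` is hypothetical. The same smoothing happens spatially when a grid cell is large enough to average a storm together with calmer surrounding water.

```python
def block_average(values: list[float], window: int) -> list[float]:
    """Average consecutive non-overlapping windows, mimicking a coarser time step."""
    return [
        sum(values[i:i + window]) / len(values[i:i + window])
        for i in range(0, len(values), window)
    ]


# Synthetic hourly wind speeds in m/s (illustrative only): a short squall peaking at hour 8
hourly = [8, 9, 9, 10, 12, 16, 22, 27, 30, 26, 18, 12]

# The same series seen at a 6-hour resolution
six_hourly = block_average(hourly, 6)

print(max(hourly))      # 30
print(max(six_hourly))  # 22.5
```

At hourly resolution the squall peaks at 30 m/s; at 6-hour resolution the strongest value is 22.5 m/s, so a threshold-based extreme-weather check could miss it entirely.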
In conclusion, we see weather data as an important addition to Windward’s capabilities, enabling us to:
We can’t promise to talk about the weather as often as our British friends, but you should expect it to appear a lot more regularly in our future conversations.