MetaWeather Beta Launch
26 February 2006, 18:12
My blog has cooled down a bit recently (as if it was 'hot' beforehand), as I've spent most of my spare time moving house and getting MetaWeather going.
The concept is pretty simple. It's a weather data aggregator that goes out and acquires weather information from sources (primarily XML, or all XML, depending on how you look at it - more later) and then calculates the average weather.
Seemed nice and simple on the face of it, and it was to a degree (weather pun intended), however there are some complexities I encountered...
Datasources - getting XML
The first problem to solve was acquiring the data. This is done by consuming XML over HTTP from various outlets over the web.
One source I particularly wanted to use data from was the BBC, mainly because I work for them, and also because I'm interested in the backstage.bbc.co.uk initiative. Problem is that the BBC don't obviously publish their weather data as XML. I presume this is because of licensing issues with the Met Office - but they continue to assure backstagers that it is coming. Quite a few apps available around the web use BBC weather data, the biggest of which appears to be the Konfabulator widget (mentioned on the backstage.bbc.co.uk site here). A quick shufti through the code (which turns out to be some nice digestible javascript) reveals that they are in fact consuming the BBC's WAP site for data.
Now, I think WAP is crap (so does Jacob - the buying a newspaper bit is hilarious) and I've never used the BBC WAP site before (I have however used the far superior mobile site). It seems WAP is valid XML, and contains quite a bit of weather data. It needs some string manipulation to get the readings out, but most of the legwork can be done via XPaths.
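As a rough illustration of the "XPath plus string manipulation" approach, here is a minimal Python sketch. The sample markup, the field labels and the function name extract_readings are all invented for the example; the real BBC WAP pages will look different.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample markup, NOT the real BBC WAP output.
SAMPLE_WML = """<wml>
  <card id="weather" title="London">
    <p>Max: 8C</p>
    <p>Wind: 12mph SW</p>
  </card>
</wml>"""


def extract_readings(wml_text):
    """Pull readings out of WAP markup (hypothetical layout)."""
    root = ET.fromstring(wml_text)
    readings = {}
    # The XPath does the legwork of finding the paragraphs...
    for p in root.findall(".//card/p"):
        text = (p.text or "").strip()
        # ...and regular expressions do the string manipulation.
        temp = re.match(r"Max:\s*(-?\d+)C", text)
        if temp:
            readings["max_temp_c"] = int(temp.group(1))
        wind = re.match(r"Wind:\s*(\d+)mph\s+([A-Z]+)", text)
        if wind:
            readings["wind_mph"] = int(wind.group(1))
            readings["wind_compass"] = wind.group(2)
    return readings


print(extract_readings(SAMPLE_WML))
```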
Weather.com has a well thought out web service, although the registration on their site is a little hard to find - probably because they aren't making any money from it (or at least, not much). Not much to say there - it just works.
Weather.gov is a site run by NOAA who appear to be the equivalent of the Met Office, on steroids - presumably because they actually have weather (proper weather, we're talking city levelling stuff) in the US. As opposed to perpetual drizzle over London. NOAA offers a variety of XML goodies via their site, including an interesting interpretation of the use of RSS (if not exactly per spec, I'd argue) in this feed system. Weather Underground presents its data in a similar way.
I've not found any other free sources yet, but I'm working on it. Accuweather is another large site, but it appears they have a view on free weather data that isn't exactly commensurate with the aims of this project.
Standardising the data
Once I've got hold of the data sources the problem of storage occurs. We'll have to standardise the data so we can store it, then later compare it.
The data is recorded with a 'fetch date' and an 'applicable date'. Data with the same or similar (within a specified time period) 'fetch date' and 'applicable date' are called 'observations'. Readings with an applicable date in the future are therefore 'forecasts'. I suppose data with an applicable date in the past could be called archived data. As it's not much data we're talking about (and I have a fair amount of database juice to play with) I'm not destroying any data.
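The date rules above could be sketched in Python roughly as follows. The function name classify_reading and the one hour tolerance are assumptions for illustration (the actual "specified time period" isn't given here), and "past" is taken to mean earlier than the fetch date.

```python
from datetime import datetime, timedelta


def classify_reading(fetch_date, applicable_date, tolerance=timedelta(hours=1)):
    """Label a reading by comparing its two dates.

    tolerance is a made-up value standing in for the 'specified time period'.
    """
    delta = applicable_date - fetch_date
    if abs(delta) <= tolerance:
        return "observation"  # fetched at (roughly) the time it applies to
    if delta > tolerance:
        return "forecast"     # applies to a time after it was fetched
    return "archive"          # applies to a time before it was fetched


fetched = datetime(2006, 2, 26, 18, 0)
print(classify_reading(fetched, datetime(2006, 2, 26, 18, 30)))
print(classify_reading(fetched, datetime(2006, 2, 27, 12, 0)))
```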
Most of the measurements are pretty straightforward to standardise. Temperature is held in centigrade, wind speed in MPH, wind direction in degrees - all fairly easily converted from various formats in which the sources are delivered - Fahrenheit, KPH or a compass point as examples. I'll probably post all these conversion functions online at some point.
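For illustration, the three conversions mentioned might look something like this in Python. These aren't the actual MetaWeather functions; the names are made up, and the compass lookup assumes sources use the standard 16 point compass.

```python
# 16 compass points, clockwise from north, 22.5 degrees apart.
COMPASS_POINTS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
                  "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def fahrenheit_to_centigrade(fahrenheit):
    """Convert a temperature from Fahrenheit to centigrade."""
    return (fahrenheit - 32.0) * 5.0 / 9.0


def kph_to_mph(kph):
    """Convert a speed from kilometres per hour to miles per hour."""
    return kph / 1.609344


def compass_to_degrees(point):
    """Convert a compass point such as 'SW' to degrees clockwise from north.

    Raises ValueError for anything that isn't one of the 16 points.
    """
    return COMPASS_POINTS.index(point.strip().upper()) * 22.5


print(fahrenheit_to_centigrade(50))
print(compass_to_degrees("SW"))
```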
However the most important data is the hardest to tie down - what we've called the 'Weather State'. This is often presented to us as a short string such as 'Showers' or 'Sunny', but more often than not it's more complicated. Most weather sources offer an icon to go with this data - a much more machine readable and workable way of dealing with this problem than text designed to be read by a human. It is this data that we use to determine the weather state.
If I was an IA (information architect), I'd be referring to the different sets of sources' data as 'controlled vocabularies'. MetaWeather has its own controlled vocab, and we also hold a big fat thesaurus. This thesaurus is a mapping of the different sources' entire vocab to MetaWeather's. So for example... WeatherUnderground may call a particular weather state "Light Snow Showers" however MetaWeather will simplify this to just "Snow" in our controlled vocab. MetaWeather does sometimes take into account states that may be in between two in its controlled vocab. For instance - a simple example - "Showers" is between "Clear" and "Rain". However, I'm not an IA, and this is a really simple example of controlled vocab with not a lot of data.
Some problems arise with this when you get to the nitty gritty of some sources' data. For instance a source may return "Heavy Volcanic Ash", "Unknown Precipitation" or something equally crazy. In this case, we just disregard the weather state from this source - after all we have other sources to get data from, and these states don't occur very often.
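A thesaurus like this can be pictured as a simple lookup table. The Python below is only a sketch: the "Light Snow Showers" entry comes from the example above, while the other entries, the source keys and the function name are invented.

```python
# (source, source's own term) -> MetaWeather controlled vocab term.
THESAURUS = {
    ("weatherunderground", "light snow showers"): "Snow",  # example from the post
    ("weatherunderground", "rain"): "Rain",                # hypothetical entry
    ("bbc", "sunny"): "Clear",                             # hypothetical entry
}


def to_metaweather_state(source, source_state):
    """Map a source's weather state onto the controlled vocab.

    Returns None when there is no mapping, so the caller can disregard
    that source's weather state and rely on the other sources.
    """
    key = (source.strip().lower(), source_state.strip().lower())
    return THESAURUS.get(key)


print(to_metaweather_state("WeatherUnderground", "Light Snow Showers"))
print(to_metaweather_state("WeatherUnderground", "Heavy Volcanic Ash"))
```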
Average Wind Direction
Wind direction is held in the database in degrees, and as such presents a problem in calculating the average. For example if one forecaster predicts a 10° wind and another a 350° gust then the mathematical average will be 180°, not 0°.
The solution was to combine the wind speed with the wind direction and convert it into a vector. Painful attempts to remember GCSE maths lessons ensued along with roping in of workmates to explain the vagaries of radians and trig functions. I've still to understand this fully in the context of this project, and therefore haven't implemented it yet. The rubbish average wind direction works for the moment, with a note explaining its inaccuracy.
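For what it's worth, the vector approach could be sketched in Python as below. This is not what MetaWeather currently runs (it isn't implemented yet); the function name is made up, and it assumes directions are in degrees clockwise from north and that every reading comes with a speed.

```python
import math


def average_wind_direction(readings):
    """Average wind direction via vectors.

    readings is a list of (speed, direction_in_degrees) pairs. Each one is
    turned into an east/north vector, the vectors are summed, and the angle
    of the total is converted back to degrees. Returns None when the winds
    cancel out and there is no meaningful direction.
    """
    east = sum(speed * math.sin(math.radians(deg)) for speed, deg in readings)
    north = sum(speed * math.cos(math.radians(deg)) for speed, deg in readings)
    if math.isclose(east, 0.0, abs_tol=1e-9) and math.isclose(north, 0.0, abs_tol=1e-9):
        return None
    # atan2 takes (east, north) here because 0 degrees is north, not east.
    return math.degrees(math.atan2(east, north)) % 360.0


# Two 10 MPH winds at 10 and 350 degrees: the result is northerly
# (within floating point error of 0/360), rather than 180.
print(average_wind_direction([(10, 10), (10, 350)]))
```

Weighting by speed means a strong wind pulls the average towards its direction more than a light breeze does. To ignore speed, pass 1 as the speed for every reading.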
Predictability
Predictability is a measure of how different forecasters' predictions of the weather are. The calculation is pretty straightforward, and just involves getting the standard deviation of the weather state values and converting that into a percentage. Eventually I hope to get a whole weather predictability as opposed to just the weather state, by averaging the percentage deviations across other readings, which will probably give us a better view on the predictions.
So that's it for now. I've got a lot more to do on this - loading more locations, more sources and probably knocking up an API to present all the data.
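One way to picture the calculation in Python is sketched below. It rests on several assumptions that aren't spelled out above: each weather state is given a number on a fixed scale (0 to 4 here, chosen arbitrarily), the deviation is divided by the largest deviation possible on that scale, and the result is flipped so that 100% means every forecaster agrees.

```python
from statistics import pstdev


def predictability(state_values, scale_min=0.0, scale_max=4.0):
    """Turn the spread of forecast weather states into a percentage.

    state_values are numeric weather states, one per forecaster, on a
    scale from scale_min to scale_max (both values are assumptions).
    """
    if not state_values:
        raise ValueError("need at least one forecast")
    # On a bounded scale the population standard deviation is largest when
    # half the values sit at each end, giving half the width of the scale.
    max_deviation = (scale_max - scale_min) / 2.0
    return 100.0 * (1.0 - pstdev(state_values) / max_deviation)


print(predictability([2, 2, 2]))  # all forecasters agree
print(predictability([1, 3]))     # forecasters disagree somewhat
```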
about
Jason is a web developer living in London. My clients include numerous startups, Google, & the BBC.
