Content delivery: insource or outsource
17 December 2007, 10:00
It's all gone a bit negative around here (abusing ITV, dismissing bbc.co.uk 2.0, and listing crap URLs), so I think I should write a bit about what I'm up to.
In the constant scaling battle that is running a large and growing content site, a key decision is where to put the components of your systems. Do you insource, outsource, or use a mixture of both?
So, on play.tm we've recently been moving around quite a few pieces of this jigsaw. To give a bit of background - we've got several million pages of content ranging from text to images to video, and we deliver this to around a million users a month. We're a small company where budgets are tight, but we operate in a competitive marketplace where expectations of the user experience are high. Most visitors have broadband and are experienced digital natives, and we're competing with everyone from the world's largest media conglomerates to individual bloggers for their eyeballs and ultimately advertising revenue. To minimise costs and maximise the quality of the user experience, we are constantly striving to improve our technical infrastructure so that we punch above our weight.
I've talked before about Cachefly. We've been gradually moving play.tm CSS, Javascript and furniture images over to their CDN in order to speed up the site for users who may be far away from our London-based origin webfarm. Because CDN storage is expensive, this works well for the small, most frequently requested files - we host just 5 MB of files there, but they are hit millions of times a day. However, it becomes too expensive to host content images (we have gigs of them) and videos (again, gigs of them and growing very fast). The only problem we've found with Cachefly is that on some Asian connections the latency isn't much better than that to our London server farm. Anycast seemingly doesn't do its job properly and routes those users to San Jose, rather than the probably optimal Tokyo POP. On the whole, though, this service dramatically improves the speed (and scalability) of our site, particularly in off-beat locations, making us (anecdotally) faster than the competition whilst having an (arguably) more richly designed site.
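To make the split concrete, here is a minimal sketch of how a template helper might decide which host serves a given file. It is not our actual code: the hostnames, the `furniture/` directory and the `asset_url` function are all assumptions made up for illustration.

```python
import posixpath

# Hypothetical hostnames, for illustration only.
CDN_BASE = "http://cdn.example.com/"
ORIGIN_BASE = "http://www.example.com/"

# Small, heavily requested file types that are cheap to keep on a CDN.
CDN_EXTENSIONS = {".css", ".js"}
# Hypothetical directory holding the site's furniture images.
CDN_PREFIXES = ("furniture/",)


def asset_url(path):
    """Return the full URL for a site-relative asset path.

    CSS, Javascript and furniture images go to the CDN; everything else
    (content images, video) stays on the origin.
    """
    path = path.lstrip("/")
    extension = posixpath.splitext(path)[1].lower()
    if extension in CDN_EXTENSIONS or path.startswith(CDN_PREFIXES):
        return CDN_BASE + path
    return ORIGIN_BASE + path
```

Keeping the decision in one helper means a file type can be moved on or off the CDN by changing a single set, rather than hunting through templates.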
Images and video are indeed a different game. Very large amounts of storage are required, making CDNs prohibitively expensive due to the fact that they need to replicate your content over many POPs. Also, bandwidth on CDNs can be pricey. With next-gen consoles (well, the PS3 and Xbox 360) capable of outputting HD content, the screenshots and video from these games are increasingly of a very high resolution and therefore file size. There is also a cultural change: PR companies are becoming more aware of the internet as a medium to distribute video, and are therefore producing and distributing more of it. The upshot of this is a storage nightmare. What we need is inexpensive storage, inexpensive bandwidth and, most importantly, the ability to increase the amount of storage available.
In comes S3, a web service from Amazon. Simple Storage Service (S3) allows you to store and transmit files with a simple model of paying only for what you use. The alternative would be to purchase a big array of disks, the server(s) and software to make them spin, networking gear to make them communicate and the bandwidth for them to talk to the outside world. The problem there is the huge upfront cost of buying all the storage we think we may need for years to come - we could get the planning wrong, and we'd probably only be utilising a fraction of its capacity to begin with. S3, on the other hand, allows us to pay only for what we use - GB of storage per month, any bandwidth used, and a very small cost per HTTP transaction. Most importantly, there is no limit on the number of files we can store (if there is, it's ridiculously high), and as a bonus the service won't degrade if we get busy. This solution isn't the best for all use cases, but for our videos it works a treat.
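The pricing model above boils down to a simple sum of three usage-based terms. Here is a rough sketch of it; the function names are made up for illustration, and the rates are deliberately left as parameters rather than real Amazon prices, so plug in whatever the current price list says.

```python
def usage_bill(stored_gb, transferred_gb, requests,
               storage_rate, transfer_rate, request_rate):
    """Bill for one month under a pay-for-what-you-use model.

    The three rates (per GB stored, per GB transferred, per HTTP request)
    are placeholders supplied by the caller, not actual prices.
    """
    return (stored_gb * storage_rate
            + transferred_gb * transfer_rate
            + requests * request_rate)


def cumulative_bills(monthly_usage, storage_rate, transfer_rate, request_rate):
    """Running total of bills over a sequence of months.

    monthly_usage is a list of (stored_gb, transferred_gb, requests) tuples,
    one per month, so growth in storage shows up month by month.
    """
    total = 0
    totals = []
    for stored_gb, transferred_gb, requests in monthly_usage:
        total += usage_bill(stored_gb, transferred_gb, requests,
                            storage_rate, transfer_rate, request_rate)
        totals.append(total)
    return totals
```

Comparing the running total against a single upfront hardware purchase shows how long it takes before buying your own disks would have been cheaper, if it ever is at your growth rate.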
One other area of the site we've outsourced is the RSS feed delivery. We found that RSS feed hits made up a very large proportion of the hits on the site - up to 50 hits/sec. Whilst we did our best to cache the script's output, the capacity of the webheads could certainly be used for something more interesting than pushing this XML. The solution came in the form of Feedburner - this Google-owned service takes a pre-existing feed and "burns" it, producing a copy that can be delivered to the end user. Since Google bought Feedburner they have made the "Pro" services free, allowing access to lots of interesting analytics and letting you "burn" the feeds to a domain name of your choosing. As we already had many people using the feeds, we opted to set up a new feed origin, let Feedburner get its content from there, then redirect all the old users to the burnt feed. It worked really well, and as a bonus Feedburner provides us with a load of detailed stats we previously didn't have access to. Feedburner's pinging service took another process off our boxes too.
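The feed migration described above comes down to two routes: the old feed URL redirects to the burnt feed, and a new origin URL serves the XML for Feedburner to fetch. A minimal sketch of that routing follows. The paths, the burnt feed URL and the `route_feed_request` function are all hypothetical, and the choice of a permanent (301) redirect is my assumption for the sketch rather than a record of what we deployed.

```python
# Hypothetical paths and URL, for illustration only.
OLD_FEED_PATH = "/rss.xml"               # what existing subscribers request
ORIGIN_FEED_PATH = "/feeds/origin.xml"   # new origin, fetched by the feed service
BURNT_FEED_URL = "http://feeds.example.com/main"


def route_feed_request(path):
    """Return (status_code, headers) for a feed-related request path."""
    if path == OLD_FEED_PATH:
        # Existing subscribers are sent on to the burnt feed.
        return 301, {"Location": BURNT_FEED_URL}
    if path == ORIGIN_FEED_PATH:
        # The origin feed is still generated locally, but should only be
        # fetched by the feed service rather than by every subscriber.
        return 200, {"Content-Type": "application/rss+xml"}
    return 404, {}
```

The important detail is that the origin lives at a different URL from the old feed: if the feed service were pointed at the old URL, it would be redirected back to itself.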
about
Jason is a web developer living in London, working for Google and Ferrago Ltd.
