Django – a flying visit

So, rather jokingly, the other day I asked where the London Hackspace jobs board was, and 20 minutes later I’d decided to make it. I needed a little project as a break from the main commercial app I’m working on, which for the last week has involved staring at some rather involved Postfix/mailer code. As usual I grabbed Rails, but paused to update to Rails 3.1: with 3.1 at rc6, I figured a new app should probably start there, since otherwise a lot of the asset code would need rewriting shortly after release.

However, I quickly hit problems, probably just interactions between my authentication library of choice (devise) and 3.1. Notably, I got a complete app-server lockup on user registration, which rather limited what I could do. Getting frustrated at how much had been fiddled with in 3.1 for seemingly no reason, I decided to take a look at how Pythonistas get their web fix.

Continue reading Django – a flying visit

Time-series data in Redis

For Insanity, I’ve been working on some of the support services built into the website, which pull information from ancillary services and tools and present it to our staff in a clear, useful format for decision-making and monitoring. Latest on the agenda has been listener figures from our Icecast streaming servers. This isn’t a perfect measure of our performance, since we broadcast on traditional media too, but it’s certainly one of our most important benchmarks for show quality and popularity, not to mention listener habits and trends.

We’ve historically relied on an RRDtool database updated via Munin and an Icecast plugin. While this served us well in the single-server days, we recently added a relay server to help listeners with network problems connecting to our JANET-hosted box. Now we have to sum two sources of statistics and compensate for the connections the relay itself adds. At that point I weighed up writing a Munin plugin against rolling my own, and decided to try whipping up a solution with Redis.

Redis is mind-blowingly fast, and exceptionally flexible to boot. We’re already planning to use it for caching, so it makes sense to use it for statistics storage too. The goals here were:

  • Fast inserts
  • Very fast retrieval of arbitrary time ranges

Simple goals. I did some digging around, and there are a lot of different approaches to storing time-series data in Redis. The scheme I settled on uses sorted sets, scored by timestamp, with the timestamp also embedded in each member so that duplicate values don’t collapse into a single entry. The sorted sets are partitioned by day; at our regular update interval that’s roughly 8,000 points per day.
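For concreteness, here’s roughly what an insert looks like with redis-rb; the key naming (listeners:YYYY-MM-DD) and the record_sample helper are illustrative, not our production schema:

    require "redis"

    redis = Redis.new

    # Record one listener-count sample. The key is partitioned by day,
    # the score is the Unix timestamp, and the timestamp is prepended
    # to the member so two samples with the same value stay distinct
    # entries in the sorted set.
    def record_sample(redis, value, time = Time.now)
      key = "listeners:#{time.strftime('%Y-%m-%d')}"
      redis.zadd(key, time.to_i, "#{time.to_i}:#{value}")
    end

    record_sample(redis, 142)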

Updates take a bit longer because Redis has to sort on insert, but that scales well (O(log(N))) and the cost is repaid tenfold on retrieval. The partitioning keeps each dataset small, so the N in retrieval’s O(log(N)+M) stays low; M depends on the query. I haven’t benchmarked any of this yet, mostly because I’ve yet to notice any extra performance hit on the pages: it’s snappy in the extreme. How well it scales up, we’ll have to wait and see.
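Retrieval walks the day partitions covered by the requested range and runs a ZRANGEBYSCORE against each one. A minimal sketch, again with names of my own:

    require "date"

    # Fetch every sample in [from, to] (Unix timestamps). Each day's
    # sorted set is queried with ZRANGEBYSCORE, which is O(log(N)+M)
    # per partition, with N capped at about one day's worth of points.
    def fetch_range(redis, from, to)
      samples = []
      date = Time.at(from).to_date
      while date <= Time.at(to).to_date
        key = "listeners:#{date.strftime('%Y-%m-%d')}"
        samples.concat(redis.zrangebyscore(key, from, to))
        date += 1 # Date arithmetic: advance one day
      end
      samples
    end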

We do get a bit of overhead from splitting the timestamp and value apart before we can use either, but that’s pretty trivial overall. We’re also loading the statistics into a GSL vector via rb-gsl-ng, which makes subsequent stats operations on the dataset fast: we can generate a page with 80-odd sparklines and statistics summaries, built from 80 different queries, while adding less than 50ms to the page load time, which is completely acceptable. Overall this is working very well indeed. I’d love to see Redis grow native support for time-series data with duplicate values, but for now this workaround is doing fine.
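To illustrate the splitting step: each member comes back as “timestamp:value”, so it’s one split and a cast before the values can go into a vector. This sketch builds on the fetch_range example above and assumes rb-gsl-ng keeps the classic rb-gsl API (GSL::Vector.alloc taking an array):

    require "gsl" # rb-gsl-ng; classic rb-gsl API assumed

    from = Time.now.to_i - 86_400 # last 24 hours
    to   = Time.now.to_i
    raw  = fetch_range(redis, from, to) # from the retrieval sketch above

    # Split each "timestamp:value" member back apart, keep the values,
    # and load them into a GSL vector so summary statistics come cheap.
    values = raw.map { |member| member.split(":", 2).last.to_f }
    vector = GSL::Vector.alloc(values)

    puts "mean #{vector.mean}, peak #{vector.max}"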

As an addendum to this post, redistat is another tool worth mentioning, partly for the similar application but also for its alternative approach to storing key/value data in Redis, albeit one geared more towards counters than time-series statistics. Worth a look if you’re interested, in any case.

Getting set up with Ruby and Rails

I’ve had a lot of people asking for help setting up a Ruby on Rails environment recently, so I figured I’d put a post together detailing how I set up my boxes.

This won’t be a guide for everyone, but it’s a tried-and-tested setup that not only performs well but also plays nicely with most gems and the development tools you’ll want. This guide covers both development and production environments.

Continue reading Getting set up with Ruby and Rails