Next steps: Video streaming and production

I’ve done a lot of blogging on radio, and Rivendell in particular. I’m a huge proponent of open tools and technologies wherever possible, because they provide tons of flexibility, are cheap, and in many cases are just as powerful or easy to use as the commercial stuff. Radio and audio are complex, but why stop there? At Insanity we’ve been evaluating video streaming as a way of adding to our existing broadcasts and coverage, as video can be far more engaging to consumers than audio, particularly in the YouTube era. But at Insanity we have one major problem: we don’t have any money!

So, you might figure that’s a problem. You’d be partly right. Video involves a lot more data, and as a result loads more numbers to crunch, more bandwidth, and so on. Not to mention that the relevant methods of capture are immensely more expensive to implement than equivalent-quality audio. I’d like, though, to highlight a few nice things about open-source video and production. Continue reading Next steps: Video streaming and production

How to fix voting by popularity

At SURHUL, the Students’ Union of Royal Holloway, University of London, we have a problem. I’m sure it’s not an uncommon one, particularly at students’ unions.

Our electoral system is essentially a popularity contest. Manifestos, campaigning and student outreach have very little impact on the results. Many positions are uncontested, and whoever runs wins simply by virtue of being the only candidate; people assume this means they care about the position enough to run, and that’s enough for them.

I don’t think this is a good way to run elections, and it’s not something that should be encouraged. But it can be fixed very easily, or at least I think so.

Continue reading How to fix voting by popularity

Time-series data in Redis

For Insanity, I’ve been working on some of the support services built into the website, which pull information from ancillary services and tools into a clear and useful format for decision-making and monitoring. Latest on the agenda has been listener figures from our Icecast streaming servers. While this isn’t a perfect benchmark of our performance, since we broadcast on traditional media too, it is certainly one of our most important benchmarks in measuring show quality and popularity, not to mention listener habits and trends.

We’ve historically relied on an RRDtool database updated by Munin and an Icecast plugin. While this served us well in the single-server days, we recently added a relay server to help listeners with network problems connecting to our JANET-hosted box. Now we have to sum two statistics sources and compensate for the added relay connections. At this point I weighed up writing a Munin plugin versus rolling my own, and decided to try whipping up a solution using Redis.

Redis is mind-blowingly fast, and exceptionally flexible to boot. We’re already planning to use it for caching, so it makes sense to use it for statistics storage too. So, the goals here were:

  • Fast inserts
  • Very fast retrieval of arbitrary time ranges

Simple goals. I did some digging around, and there are a lot of different approaches to storing time-series data in Redis. The scheme I settled on uses sorted sets to store the data, scored by timestamp, with the timestamp also included in the member to allow for duplicate values. The sorted sets are partitioned by day; at our regular update interval we’re looking at ~8,000 points per day.
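As a minimal sketch of that scheme (the key names and member format here are illustrative assumptions, not our actual code), the day partitioning and duplicate-safe member encoding might look like:

```ruby
# Hypothetical helpers for the day-partitioned sorted-set scheme.

# One sorted set per UTC day per stream.
def day_key(stream, ts)
  "stats:#{stream}:#{Time.at(ts).utc.strftime('%Y-%m-%d')}"
end

# The member embeds the timestamp, so two samples that happen to have
# the same listener count remain distinct members of the set.
def encode_point(ts, value)
  "#{ts}:#{value}"
end

def decode_point(member)
  ts, value = member.split(':', 2)
  [ts.to_i, value.to_i]
end

# With redis-rb, an insert is then a single ZADD, scored by timestamp:
#   redis.zadd(day_key('main', ts), ts, encode_point(ts, listeners))
```

The score (the timestamp) is what Redis sorts on; the member string is just payload, which is why the timestamp has to be duplicated inside it.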

Updates take a bit longer because Redis has to sort on insert, but that scales well, at O(log(N)), and the losses there are regained tenfold on retrieval. Partitioning keeps the datasets small, meaning the N in O(log(N)+M) stays low; M depends on the query. I have yet to benchmark any of this, because I have yet to notice the extra performance hit on the pages; it’s snappy in the extreme. We’ll have to wait and see how well it scales up, of course.
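Retrieval then means working out which day partitions a time range touches and issuing one ZRANGEBYSCORE per partition, which is where the O(log(N)+M) cost applies per key. A sketch, assuming the same illustrative key naming as above:

```ruby
SECONDS_PER_DAY = 86_400

# Enumerate the day-partition keys a [from_ts, to_ts] query touches.
def keys_for_range(stream, from_ts, to_ts)
  start_day = from_ts - (from_ts % SECONDS_PER_DAY)
  (start_day..to_ts).step(SECONDS_PER_DAY).map do |day|
    "stats:#{stream}:#{Time.at(day).utc.strftime('%Y-%m-%d')}"
  end
end

# With redis-rb, each partition is read with one score-bounded query:
#   points = keys_for_range('main', from_ts, to_ts).flat_map do |key|
#     redis.zrangebyscore(key, from_ts, to_ts)
#   end
```

A query spanning a few days only ever touches a few small keys, rather than one ever-growing set.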

We do incur a bit of overhead because we have to split the timestamp and value apart before we can use either, but that’s pretty trivial overall. We’re also putting the statistics into a GSL vector using rb-gsl-ng, which makes subsequent stats operations on the dataset fast; we can generate a page with 80-odd sparklines and statistics summaries, generated from 80 different queries, without adding more than 50ms to the page load time, which is completely acceptable. Overall, this is working very well indeed. I’d love to see Redis add more innate support for time-series data with duplicate values, but for the time being this workaround is doing okay.
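That splitting overhead amounts to one pass over the returned members. A sketch using plain Ruby arrays (the member format is the illustrative one assumed earlier; in practice the values go into a GSL vector via rb-gsl-ng rather than a plain array):

```ruby
# Split "timestamp:value" members into parallel timestamp and value arrays.
def split_points(members)
  members.map { |m| m.split(':', 2).map(&:to_i) }.transpose
end

timestamps, values = split_points(['100:5', '160:7', '220:6'])
# The values array can then feed a GSL vector for fast summary statistics,
# e.g. GSL::Vector.alloc(values).mean with rb-gsl-ng loaded.
```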

As an addendum to this post, redistat is another tool worth mentioning, partly for the similar application but also for its alternative method of storing key/value data in Redis, albeit one geared more towards counters than time-series statistics. Worth a look if you’re interested, in any case.