1. I recently evaluated Redis for a similar task to this too, along with MongoDB and MySQL 5.1.

    Whilst Redis is very fast for inserts, it’s not so good at summarising the data in the ways we needed (such as grouping entries for a given metric by the hour and summing them). To do this with Redis meant extracting the data and doing the operations ourselves in our code.

    Mongo was able to do some of these kinds of operations internally but required turning to mapreduce for others (which was slow).

    MySQL/InnoDB was the winner here for us. Very simple schema: timestamp, metric name, value (the primary key is a composite of the timestamp and metric name columns). All the types of operations we need to do are MySQL’s bread and butter – easy and very fast.

    I was inserting my test data at 3500/s with no real MySQL tuning (and bin logs enabled). It was faster at grouping and summing too, as we didn’t have to do it ourselves (in Ruby). With 3.5 million records I was able to summarise hourly data for a given metric, suitable for graphing a 60 day graph, in about 80ms.
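    A minimal sketch of the schema and the hourly grouping query described above, using SQLite in place of MySQL/InnoDB purely for illustration; the table and column names are assumptions, not the commenter’s actual DDL.

```python
import sqlite3

# Schema as described: timestamp, metric name, value, with a
# composite primary key on (timestamp, metric name).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stats (
        ts     INTEGER NOT NULL,   -- unix timestamp
        metric TEXT    NOT NULL,
        value  REAL    NOT NULL,
        PRIMARY KEY (ts, metric)
    )
""")

# A few sample points for one metric, two per hour.
rows = [
    (3600 * 0 + 0,    "cpu", 1.0),
    (3600 * 0 + 1800, "cpu", 2.0),
    (3600 * 1 + 0,    "cpu", 3.0),
    (3600 * 1 + 1800, "cpu", 4.0),
]
conn.executemany("INSERT INTO stats VALUES (?, ?, ?)", rows)

# Group entries for a given metric by the hour and sum them --
# the kind of query that is the database's bread and butter.
hourly = conn.execute("""
    SELECT ts / 3600 AS hour, SUM(value)
    FROM stats
    WHERE metric = ?
    GROUP BY hour
    ORDER BY hour
""", ("cpu",)).fetchall()
print(hourly)  # [(0, 3.0), (1, 7.0)]
```

    With a composite index on (timestamp, metric), this kind of GROUP BY can be answered from an index range scan, which is why the 60-day summary above comes back in tens of milliseconds.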

    By my calculations, we should be able to store 98 million of these stats in MySQL with about 10 gigs of RAM (which is enough for hourly data for 4 metrics from 17,000 virtual servers for 60 days).

    All this and it’s transactional too, so rolling these data up to day, week, or month granularity after 60 days (and deleting the hourly rows atomically) is easy peasy.
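    A sketch of that atomic roll-up: summarise hourly rows into a daily table and delete the hourly originals in a single transaction, so a failure part-way through leaves the data untouched. SQLite again stands in for MySQL/InnoDB, and all names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hourly (ts INTEGER, metric TEXT, value REAL,
                         PRIMARY KEY (ts, metric));
    CREATE TABLE daily  (day INTEGER, metric TEXT, value REAL,
                         PRIMARY KEY (day, metric));
""")
conn.executemany("INSERT INTO hourly VALUES (?, ?, ?)", [
    (0,     "cpu", 1.0),
    (3600,  "cpu", 2.0),
    (86400, "cpu", 5.0),   # next day -- left alone by the roll-up below
])

cutoff = 86400  # roll up everything before this timestamp
with conn:  # one transaction: commits on success, rolls back on error
    conn.execute("""
        INSERT INTO daily
        SELECT ts / 86400, metric, SUM(value)
        FROM hourly WHERE ts < ? GROUP BY ts / 86400, metric
    """, (cutoff,))
    conn.execute("DELETE FROM hourly WHERE ts < ?", (cutoff,))

print(conn.execute("SELECT * FROM daily").fetchall())  # [(0, 'cpu', 3.0)]
```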

  2. Mark Cotner

    I have to agree with John Leach, as much as I hate to. If your insert requirements allow for a relational DB, then using one is a better option. However, I will say that getting data into PostgreSQL can be faster if you do it correctly, and you’ll have something other than loop joins available for the data analysis part.

    No one realizes the limitations of MySQL going in. It’s later that they rear their ugly head. Keeps companies like Percona very busy. :)

    Just in case you’re wondering where this biased opinion comes from . . . I’m a MySQL DBA.

    Nice time series post, by the way. I think redis definitely excels at this if you need the additional write speed. I’ve been working with telemetry data for 10 years now and won an award from MySQL (2nd runner-up for app of the year) some time ago. Time series is something I love working with and optimizing.

    There’s no rule that says you can’t use redis for insert velocity, write a simple program to summarize said data (you usually end up doing this anyway in relational DBs), and put it into a relational store like PostgreSQL. If insert velocity warrants this, it could make for a nice RRD-like hybrid solution.
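    The summarize-then-hand-off step of that hybrid could be as small as the sketch below: raw (timestamp, metric, value) points, as they might be drained from a Redis list, are rolled up per metric per hour before being inserted into PostgreSQL. All names here are invented for illustration.

```python
from collections import defaultdict

def summarise_hourly(points):
    """Collapse raw points into {(metric, hour): summed value}."""
    buckets = defaultdict(float)
    for ts, metric, value in points:
        buckets[(metric, ts // 3600)] += value
    return dict(buckets)

# Points as they might come off a fast Redis ingest list.
raw = [(0, "cpu", 1.0), (1800, "cpu", 2.0), (3600, "cpu", 3.0)]
print(summarise_hourly(raw))
# {('cpu', 0): 3.0, ('cpu', 1): 3.0}
```

    The relational store then only ever sees one row per metric per hour, which keeps its insert load a tiny fraction of the raw event rate.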


  3. Mark Cotner

    I just noticed you also play EVE. I’m awksedgreep (along with 9 other alts) and usually hang around in Sinq. I’d love to chat about time series if you want. EVE mail or hit me up anytime for chat.


  4. I don’t disagree at all with the idea that using MySQL/PostgreSQL (we use PostgreSQL as a general database, so that’s an option) is certainly as flexible, if not more so, and gets you some major benefits in terms of the available tools for querying that data (the overhead of processing the data in Ruby is nontrivial for larger datasets, certainly). I may end up going for a PostgreSQL store in the end, but giving Redis a shot and seeing how it performed with the task was, if nothing else, a good learning experience.

    My previous work with EVE Metrics was basically recording millions of points of data on many, many metrics (roughly 100 gigs of data if you include the indexes), which was quite fun to optimize and get working reliably fast without a powerful server (which, as ever, is the limiting factor). With 8 gigs of RAM also hosting the app servers and a few other websites, plus a legacy MySQL server for things that won’t play with PgSQL on that box, it gets crowded fast; obviously, throw 16/32/64+ gigs of RAM at the problem and performance will always improve.

    For this particular setup we’re using a Linode 512 for the current hosting, though we may have to migrate to something with a little more oomph before too long; 512MB of RAM shared with an email server and web/app servers doesn’t leave much for DB caching!

    I don’t actually play EVE any more, though most of my previous work was focused on the game – CCP’s broken the game too comprehensively lately and made some really stupid decisions about the direction to take with what was once a great virtual world. A real shame – I may come back when they’re done with Incarna, but fleet combat really made it for me and 0.0’s pretty dead these days; fights never happen because you nearly always know the outcome before you engage, so it’s just shooting POS structures and the like now.

    What strikes me as potentially an interesting project would be in a similar vein to redistat but with pluggable storage components and focused around time-series data. If nothing else, it would make comparative benchmarking of storage and retrieval of such data much simpler, and potentially lead to a nice framework to use for time series data storage. I may give that a stab this evening and throw something up on GitHub.
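    The pluggable-storage idea above might start from something like this hypothetical sketch: a minimal common interface that time-series backends (Redis, PostgreSQL, and so on) could implement, so the same benchmark harness can drive each one. Every class and method name here is invented for illustration.

```python
from abc import ABC, abstractmethod

class TimeSeriesStore(ABC):
    """Minimal contract a pluggable time-series backend would satisfy."""

    @abstractmethod
    def record(self, metric, ts, value):
        """Store one raw point."""

    @abstractmethod
    def hourly(self, metric):
        """Return [(hour, summed value), ...] for one metric."""

class MemoryStore(TimeSeriesStore):
    """Trivial in-memory backend, useful as a benchmarking baseline."""

    def __init__(self):
        self.points = []

    def record(self, metric, ts, value):
        self.points.append((metric, ts, value))

    def hourly(self, metric):
        sums = {}
        for m, ts, v in self.points:
            if m == metric:
                sums[ts // 3600] = sums.get(ts // 3600, 0.0) + v
        return sorted(sums.items())

store = MemoryStore()
store.record("cpu", 0, 1.0)
store.record("cpu", 3600, 2.0)
print(store.hourly("cpu"))  # [(0, 1.0), (1, 2.0)]
```

    A Redis- or PostgreSQL-backed class implementing the same two methods could then be swapped in and timed against the baseline with identical driver code.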

  5. Mark Cotner

    Well, your use of time series data in presorted lists is ideal. I’d look for redistat to follow your lead eventually. I’ve wanted to take on a project using redis like this for some time. Turns out I blogged about using redis for this some time ago . . . and forgot. :)


    If your data requirements aren’t huge and your peak insert speed is over 5k/sec, then redis is hard to beat.
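    Part of why presorted lists work so well for time series: timestamps only ever grow, so an append-only list is already sorted and any time-range lookup is just two binary searches. A sketch, with plain Python lists standing in for Redis lists.

```python
import bisect

# Points appended in arrival order, so timestamps are presorted.
timestamps = [0, 60, 120, 180, 240]
values     = [1.0, 2.0, 3.0, 4.0, 5.0]

def window(t_start, t_end):
    """Values with t_start <= ts < t_end, found by binary search."""
    lo = bisect.bisect_left(timestamps, t_start)
    hi = bisect.bisect_left(timestamps, t_end)
    return values[lo:hi]

print(window(60, 200))  # [2.0, 3.0, 4.0]
```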


  6. Mark Smith

    Your needs probably aren’t such that this is interesting to you, but my company (StumbleUpon) just released an open source time-series database called OpenTSDB (http://opentsdb.net/). It’s built on the HBase platform (Java), which allows it to do some amazing scaling.

    Also, hai Ix! Long time no see. I’m sorry you’re not on IRC anymore and that the jerks won. :(
