MythTV and Freesat

Or: how to make TV worthwhile if you happen to have a leftover Sky dish on your house.

So when I moved into my current university digs, the previous tenants had left a few things behind. Notably, they'd had Sky, so we had a Sky box in the living room and a dish on the wall. In the UK, if you want fast internet these days, you need Virgin Media, and since Virgin Media bundles cable TV anyway, I didn't want to pay for Sky on top. But Freesat's got some nice stuff on it, including BBC HD and that sort of thing. So how about we get ourselves some free TV?

How to fix voting by popularity

At SURHUL, the Students' Union of Royal Holloway, University of London, we have a problem. I'm sure it's not an uncommon one, particularly at students' unions.

Our electoral system is essentially a popularity contest. Manifestos, campaigning and student outreach have very little impact on the results. Many positions are uncontested, and whoever stands wins simply by virtue of standing; people assume that anyone who cares about the position enough to run must be good enough for it, and that's the end of it.

I don't think this is a good way to run elections, and it's not something that should be encouraged. But it's something that can be fixed very easily, or at least I think it can.


Building Backchat, Part 2

Or: How I learned to give up on projects.

Okay, so, Backchat was hugely interesting as a project. Eventually I used the classifier to produce a set of graphs showing sentiment over time. These graphs aren't too accurate, but they're fairly good at showing how things were going. After that, though, I pretty much dropped the project. This was mainly due to exams cropping up and stealing my time away, but also because of how difficult it was to push the accuracy up to a sensible level.
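The original graphs aren't reproduced here, but a minimal sketch of the general idea, bucketing already-classified tweets into fixed time windows and plotting the share of positive ones, might look like this. The (timestamp, label) input format and the window size are assumptions for illustration, not the actual Backchat data model:

```python
# A sketch of bucketing already-classified tweets into fixed time
# windows and plotting the share of positive tweets per window.
from collections import Counter
from datetime import datetime, timedelta

import matplotlib.pyplot as plt


def sentiment_series(tweets, window=timedelta(minutes=5)):
    """tweets: iterable of (datetime, 'pos' | 'neg') pairs."""
    positives, totals = Counter(), Counter()
    step = window.total_seconds()
    for ts, label in tweets:
        # Truncate each timestamp down to the start of its window.
        bucket = datetime.fromtimestamp((ts.timestamp() // step) * step)
        totals[bucket] += 1
        if label == 'pos':
            positives[bucket] += 1
    xs = sorted(totals)
    return xs, [positives[x] / totals[x] for x in xs]


# Hypothetical sample; the real input was the full classified stream.
classified_tweets = [
    (datetime(2010, 4, 15, 20, 0), 'pos'),
    (datetime(2010, 4, 15, 20, 2), 'neg'),
    (datetime(2010, 4, 15, 20, 7), 'pos'),
]
xs, ys = sentiment_series(classified_tweets)
plt.plot(xs, ys)
plt.ylabel('share of positive tweets')
plt.show()
```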

In my 'final' design I ended up using a bigram classifier. I added parsing of the tweets to pull out words, URLs and user mentions, and then used this to generate my training sets, which improved things a lot. This gave me several thousand tweets for each training set, which worked okay. However, even with this classifier, which was doing a lot better than most others, my results weren't very reliable on a tweet-by-tweet basis. Still, it wasn't too shoddy, and I think the graphs on the right are fairly reliable in terms of general sentiment.
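As a rough illustration of that design, here's a sketch of tweet parsing plus bigram features feeding a naive Bayes classifier. NLTK's NaiveBayesClassifier stands in for whatever the real implementation used, and the two training tweets are obviously hypothetical; as noted above, the real sets held several thousand tweets per class:

```python
# A sketch of tweet parsing plus bigram features, feeding NLTK's
# naive Bayes classifier. The real Backchat classifier may have been
# built quite differently; this just illustrates the technique.
import re

import nltk

URL_RE = re.compile(r'https?://\S+')
MENTION_RE = re.compile(r'@\w+')


def tokenize(tweet):
    # Collapse URLs and @mentions into placeholder tokens so each one
    # becomes a single feature instead of noise.
    tweet = tweet.lower()
    tweet = URL_RE.sub(' URL ', tweet)
    tweet = MENTION_RE.sub(' USER ', tweet)
    return re.findall(r"[a-z']+|URL|USER", tweet)


def bigram_features(tweet):
    tokens = tokenize(tweet)
    features = {f'contains({t})': True for t in tokens}
    for a, b in zip(tokens, tokens[1:]):
        features[f'bigram({a},{b})'] = True
    return features


# Hypothetical training tweets; the real sets held several thousand each.
training = [
    ('loving this debate, great answers from everyone', 'pos'),
    ('what a mess, @user is dodging every question http://example.com', 'neg'),
]
classifier = nltk.NaiveBayesClassifier.train(
    [(bigram_features(text), label) for text, label in training]
)
print(classifier.classify(bigram_features('what a mess this is')))
```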

The AMQP-linked network of processors worked extremely well and resulted in good throughput: I used two parsers, two classifiers and one classifier loader in the end. I was unable to achieve realtime performance, but only because of network constraints: sadly, my ISP at home had decided that I'd used too much bandwidth and clamped me down to 128 kilobits a second. That said, thanks to the streaming API I didn't lose any tweets (as far as I know, aside from a few hundred lost to rate limiting); I just received them out of order and reconstructed the correct order using each tweet's timestamp. The machine I was using for this also pretty much went flat out on disk I/O and CPU usage, but it was able to keep up; it's a fairly old box, only a Pentium 4 with a couple of gigs of RAM.
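As a sketch only, one worker in that pipeline might look like this: consume tweets from a queue, classify them (reusing bigram_features and classifier from the sketch above), and sort the results by each tweet's own timestamp afterwards to undo out-of-order delivery. The pika client library, the queue name and the message field names are all assumptions; the post doesn't say what was actually used:

```python
# A sketch of one worker in an AMQP pipeline, using the pika client
# (an assumption; the original library is not named in the post).
import json

import pika

results = []  # (created_at, label) pairs, sorted once consumption stops


def on_tweet(channel, method, properties, body):
    # The 'text' and 'created_at' field names are assumed for illustration.
    tweet = json.loads(body)
    label = classifier.classify(bigram_features(tweet['text']))
    results.append((tweet['created_at'], label))
    channel.basic_ack(delivery_tag=method.delivery_tag)


connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='parsed_tweets')  # hypothetical queue name
channel.basic_consume(queue='parsed_tweets', on_message_callback=on_tweet)
try:
    channel.start_consuming()
except KeyboardInterrupt:
    channel.stop_consuming()

# Tweets can arrive out of order; each tweet's own timestamp is the
# ground truth, so sort on it before graphing.
results.sort(key=lambda pair: pair[0])
```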

In any case, this was an interesting project, and I'll be open sourcing the data and source code in the coming weeks if anyone wants to have a poke at it. While the debates are now over and done with, I'm sure people can come up with some great uses for sentiment analysis outside of UK politics.