On Tuesday night, Insanity Radio did a hugely successful outside broadcast from our students' union event night. What we did was genuinely novel for the SU's radio, and certainly isn't something I've heard of another student station doing. The result was fantastic audio, a great sound overall, and a fun night. And near-zero latency. Here's how it worked.
Or: How I learned to give up on projects.
Okay, so, Backchat was hugely interesting as a project. Eventually, I produced a set of graphs using the classifier that showed sentiment over time. These graphs aren’t too accurate but are fairly good at showing how things were going. However, after this I pretty much dropped the project. This was mainly due to exams cropping up and stealing my time away, but also because of how difficult it was to approach a sensible level of accuracy.
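For the curious, the sentiment-over-time graphs boil down to bucketing classified tweets by time window and averaging the scores. A minimal sketch of that idea, assuming tweets have already been classified to +1 (positive) or -1 (negative); the function name and bucket width are illustrative, not the actual Backchat code:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical classified tweets: (timestamp, sentiment) pairs,
# sentiment being +1 (positive) or -1 (negative).
tweets = [
    (datetime(2010, 4, 15, 21, 0), +1),
    (datetime(2010, 4, 15, 21, 2), -1),
    (datetime(2010, 4, 15, 21, 3), -1),
    (datetime(2010, 4, 15, 21, 16), +1),
]

def bucket_sentiment(tweets, minutes=15):
    """Average sentiment per fixed-width time bucket."""
    buckets = defaultdict(list)
    for ts, score in tweets:
        # Round the timestamp down to the start of its bucket.
        key = ts.replace(minute=(ts.minute // minutes) * minutes,
                         second=0, microsecond=0)
        buckets[key].append(score)
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}

for window, avg in bucket_sentiment(tweets).items():
    print(window, round(avg, 2))
```

Plotting those per-bucket averages against time gives you the kind of graph shown here: noisy tweet-by-tweet, but a decent picture of the overall trend.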
In my ‘final’ design I ended up using a bigram classifier. I added parsing of the tweets to pull out mentions of words, URLs and users, and then used this to generate my training sets, which improved things a lot. This gave me several thousand tweets for each training set, which worked okay. However, even with this classifier, which was doing a lot better than most others, my results weren’t very reliable on a tweet-by-tweet basis. Still, it wasn’t too shoddy, and I think the graphs on the right are fairly reliable in terms of general sentiment.
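The parsing and feature-extraction step might look something like the following. This is a hedged sketch, not the actual Backchat code: the regexes and function names are my illustration of pulling mentions and URLs out of a tweet and turning the remaining text into word bigrams for a classifier.

```python
import re

MENTION = re.compile(r'@\w+')
URL = re.compile(r'https?://\S+')

def parse(tweet):
    """Extract mentions and URLs, leaving the bare text for the classifier."""
    mentions = MENTION.findall(tweet)
    urls = URL.findall(tweet)
    text = URL.sub('', MENTION.sub('', tweet))
    return mentions, urls, text

def bigrams(text):
    """Lower-case word bigrams: the features fed to the classifier."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))

mentions, urls, text = parse("@nick_clegg I agree with Nick! http://example.com")
print(bigrams(text))  # [('i', 'agree'), ('agree', 'with'), ('with', 'nick')]
```

Bigrams help because pairs like ("not", "good") carry sentiment that individual words lose, which is a big part of why this classifier outperformed the unigram ones I tried.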
The AMQP-linked network of processors worked extremely well and resulted in good throughput: I used two parsers, two classifiers and one classifier loader in the end. I was unable to achieve realtime performance due to network constraints; sadly my ISP at home had decided that I’d used too much bandwidth and clamped me down to 128 kilobits a second. That said, thanks to the streaming API I did not (as far as I know, except for a few hundred lost to rate limiting) lose any tweets; I just received them out of order and then reconstructed the correct order using the timestamps for each tweet. The machine I was using for this also pretty much went flat out on disk I/O and CPU usage, but was able to keep up; it’s a fairly old box, only a Pentium 4 with a couple of gigs of RAM.
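Reconstructing the order is straightforward because every tweet carries its own `created_at` timestamp. A minimal sketch, assuming Twitter's usual timestamp format; the sample data and function name are illustrative:

```python
from datetime import datetime

# Out-of-order tweets as they might arrive from the workers;
# created_at uses Twitter's timestamp format.
received = [
    {"id": 3, "created_at": "Thu Apr 15 21:02:00 +0000 2010"},
    {"id": 1, "created_at": "Thu Apr 15 21:00:00 +0000 2010"},
    {"id": 2, "created_at": "Thu Apr 15 21:01:00 +0000 2010"},
]

def in_order(tweets):
    """Reconstruct chronological order from each tweet's timestamp."""
    fmt = "%a %b %d %H:%M:%S %z %Y"
    return sorted(tweets, key=lambda t: datetime.strptime(t["created_at"], fmt))

print([t["id"] for t in in_order(received)])  # [1, 2, 3]
```

Since arrival order doesn't matter, the workers can pull from the queue at whatever rate they manage, and a single sort at the end puts everything right.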
In any case this was an interesting project, and I’ll be open sourcing the data and source in the coming weeks if anyone wants to have a poke at it. While the debates are now over and done with, I’m sure people can come up with some great uses for sentiment analysis outside of UK politics.
With Dominion just around the corner, we’re looking at how that’ll affect EVE Metrics. Other than the market getting a few things shaken up, as is usual for expansions, things should be minimally impacted. API services will probably be down for a week, given CCP’s track record of breaking the API ‘just in case’ it affects Tranquility, but apart from that things should be fine.
We’ve got some things in the works for Dominion, and we hope you’ll find them useful. We’ve not had much time to work on EVE Metrics, and we’re being distracted by another project at the moment, but we’ll have more time for it in a few weeks’ time, around Christmas. I’m still evaluating what we’ll spend our time on, though, and we’d like to get more feedback via the feedback button on the site; you can vote for other people’s suggestions, so please do!
Other than that, not much to report. We’ll have Dominion items loaded into the site by release day, so you can start using the site straight away with the new items. So far we’ve had good success with some performance improvements on the site; this has mostly been tuning our database server and improving query performance through better indexes, clustering indexes, and so on. Hopefully you’ll notice this in the form of improved page responsiveness and fewer ‘slow loading’ pages. Enjoy!