<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Talk Unafraid &#187; twitter</title>
	<atom:link href="http://www.talkunafraid.co.uk/tag/twitter/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.talkunafraid.co.uk</link>
	<description>EVE Online, Ruby on Rails and Security</description>
	<lastBuildDate>Wed, 01 Sep 2010 17:12:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Experiments in CL, NLP: Building Backchat, Part 1</title>
		<link>http://www.talkunafraid.co.uk/2010/04/experiments-in-cl-nlp-building-backchat-part-1/</link>
		<comments>http://www.talkunafraid.co.uk/2010/04/experiments-in-cl-nlp-building-backchat-part-1/#comments</comments>
		<pubDate>Sun, 18 Apr 2010 02:23:21 +0000</pubDate>
		<dc:creator>James Harrison</dc:creator>
				<category><![CDATA[Awesome Stuff]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[amqp]]></category>
		<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[distributed computing]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[neural networks]]></category>
		<category><![CDATA[nltk]]></category>
		<category><![CDATA[politics]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[sentiment analysis]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.talkunafraid.co.uk/?p=862</guid>
		<description><![CDATA[Okay, so I may have something wrong with me. As soon as anything important (in my view) comes up, I have to build an app for it. Well, sometimes. Still, the impulse is strong, and so at 2:30 AM or thereabouts I registered a domain name and got to work. The aim of the project [...]]]></description>
			<content:encoded><![CDATA[<p>Okay, so I may have something wrong with me. As soon as anything important (in my view) comes up, I <em>have </em>to build an app for it. Well, sometimes. Still, the impulse is strong, and so at 2:30 AM or thereabouts I registered a domain name and got to work.</p>
<p>The aim of the project is this: to build a tool that does real-time analysis of tweets about any event, measuring the sentiment of those tweets towards the event&#8217;s various subjects.</p>
<p>I am fairly good at doing simple apps quickly. I had all but one component of this app done by the first Leaders&#8217; Debate here in the UK (allowing me to collect my data set for future development: around 185,000 tweets from 35,000 users). <a href="http://assets.talkunafraid.co.uk/2010/04/backchat.png" rel="lightbox[862]"><img class="alignright size-thumbnail wp-image-863" title="Backchat in Components" src="http://assets.talkunafraid.co.uk/2010/04/backchat-150x150.png" alt="" width="150" height="150" /></a> I&#8217;ve thrown in a handy diagram which details the data collection portion of the app as it stands. But here&#8217;s the quick overview:</p>
<ul>
<li>Streamer &#8211; Uses the Twitter streaming API to receive new tweets and for each tweet, throws them onto the appropriate AMQP exchanges</li>
<li>Parser &#8211; Receives a tweet and loads it into the database. Doesn&#8217;t actually do any parsing as such yet, but could be extended to do so (extracting URIs and hashtags are the things I&#8217;m thinking of)</li>
<li>Classifier &#8211; Receives a tweet and does clever stuff on it to determine all subjects and associated sentiments, passing the results back to AMQP</li>
<li>ClassificationLoader &#8211; Receives the results from the Classifier and loads them into the database</li>
</ul>
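<p>To make the message flow between those four components concrete, here&#8217;s a minimal in-process sketch. It is not the Backchat code: the <code>FanoutExchange</code> class is a hypothetical stand-in for an AMQP exchange built on Ruby&#8217;s own <code>Queue</code>, the classifier logic is a placeholder, and the arrays stand in for the database.</p>

```ruby
require "json"

# Hypothetical stand-in for an AMQP fanout exchange: every bound queue
# receives its own copy of each published message.
class FanoutExchange
  def initialize
    @queues = []
  end

  # Bind a new queue to the exchange and hand it to the consumer.
  def bind
    queue = Queue.new
    @queues.push(queue)
    queue
  end

  def publish(message)
    @queues.each { |queue| queue.push(message) }
  end
end

tweets  = FanoutExchange.new   # fed by the Streamer
results = FanoutExchange.new   # fed by the Classifier

parser_inbox     = tweets.bind
classifier_inbox = tweets.bind
loader_inbox     = results.bind

# Streamer: serialise each incoming tweet and publish it.
streamer = lambda do |tweet|
  tweets.publish(JSON.generate(tweet))
end

# Parser: would insert into the database; here it just collects rows.
stored_tweets = []
parser = lambda do
  stored_tweets.push(JSON.parse(parser_inbox.pop))
end

# Classifier: placeholder logic only (an assumption for illustration).
classifier = lambda do
  tweet = JSON.parse(classifier_inbox.pop)
  subject = tweet["text"].downcase.include?("labour") ? "Labour" : "unknown"
  results.publish(JSON.generate({ "id" => tweet["id"], "subject" => subject }))
end

# ClassificationLoader: would insert the results into the database.
stored_results = []
loader = lambda do
  stored_results.push(JSON.parse(loader_inbox.pop))
end

streamer.call({ "id" => 1, "text" => "Labour did fine tonight" })
parser.call
classifier.call
loader.call

puts stored_tweets.inspect
puts stored_results.inspect
```

<p>Because each consumer only ever talks to its own queue, any of them could be swapped for a separate process (or several) on another machine once a real broker sits in the middle.</p>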
<p>Now, for starters this app isn&#8217;t done yet, so this is all strictly subject to change. For instance, I&#8217;d like to have the DB loader pass the tweet on to the classifier instead of the streamer since that&#8217;ll let the classifier store with reference to a DB object, and a few things like that. However, this distributed component structure means that I can run multiple copies of every component in parallel to cope with demand, across any number of computers. EC2 included, of course, but I can also use my compute cluster at home where network speed/latency isn&#8217;t a huge issue. Right now I don&#8217;t need that, but it&#8217;s nice to have and doesn&#8217;t involve a lot more work. It also lets me be language-agnostic between components, which leads me to&#8230;</p>
<p>CL/NLP. Short for computational linguistics/natural language processing, this is a seriously badass area of computer science. It&#8217;s still a developing field and a lot of great work is being done in it. As a result, the documentation barely exists, there are no tutorials, no how-to manuals, and what help you have assumes innate knowledge of the field. And I know <em>nothing</em> (well, I know a fair bit now) about linguistics or computational linguistics or NLP. So, getting started was hard work. I ran into <a href="http://blog.knowtheory.net">knowtheory</a>, a chap in the Datamapper IRC channel of all places who happened to be a linguist interested in CL and who has helped out substantially with my methods here.</p>
<p>I&#8217;ve gone through about five distinct versions and methods for my classifier. The first three were written in Python using <a href="http://www.nltk.org/">NLTK</a>, which is great for some stuff but hard to use, especially when you want results quickly. NLTK was giving me very good results but at the cost of speed: several seconds to determine the subjects of a tweet, let alone do sentiment analysis or work out grammatical polarity and all that. Now, getting perfect results at the cost of speed was one way to go, and for all I know it might still be the way to go, but I decided to try a different plan of attack for my fifth attempt. I started fresh in Ruby using the <a href="http://github.com/postmodern/raingrams">raingrams</a> gem for n-gram analysis, and the <a href="http://github.com/luisparravicini/classifier">classifier</a> gem to perform latent semantic indexing on the tweets.</p>
<p>I boiled this down to a really, really simple proof of concept (it&#8217;s worth noting that I spent <em>days</em> on the NLTK approach. Those of you who know me will know that days are very, very rarely used to describe the amount of time I&#8217;d spend on one component of an app to get it to a barely-working stage). I figured I could train two trigram models (using sets of three words) for positive and negative sentiment respectively, then use the total probability of a given tweet&#8217;s words (split into trigrams) appearing in either model as a measure of distance. Positive tweets should have a higher probability in the positively trained model, and a lower probability in the negatively trained one. Neat thing is, this technique sort of worked. I trained LSI to pick up on party names etc., and added common words into an unknown category so that any positive categorization would be quite certain. This doesn&#8217;t take into account grammatical polarity or anything like that, but still. Then, using the classifications, I can work out over my initial dataset what the end result was; and here it is:</p>
<pre># Frequencies
Total: 183518 tweets
Labour: 30871, Tory: 35216, LibDem: 25124
# Average Sentiment
#  calculated by sum of (positive_prob - negative_prob)
#  divided by number of tweets for the party
Labour: -0.000217104050691102
Tory: -0.000247080522382047
LibDem: 0.000394512163310021
# Total time for data loading, training and computation
# I could speed this up with rb-gsl but didn't have it installed
real    13m5.759s
user    12m35.800s
sys     0m12.170s
</pre>
<p>So according to my algorithm, the Liberal Democrats did very well while Labour and especially the Tories didn&#8217;t do so well. Which, if you read the papers, actually fits pretty well. However, algorithmically speaking the individual results on some tweets can be pretty far out, and so there&#8217;s lots of room for improvement. I think my final approach has to consider part-of-speech tagging and chunking, but I need to work out a way to do that faster to be able to integrate it into a realtime app.</p>
<p>All in all, working on Backchat has so far been hugely rewarding and I&#8217;ve learned a lot. I&#8217;m looking further into CL/NLP and at neural network classifiers for potentially improved results, all of which is great fun to learn about and implement. And hopefully before next Thursday I&#8217;ll have a brand new app ready to go for the second Leaders&#8217; Debate!</p>
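<p>For the curious, the two-model trigram idea and the per-party averaging can be sketched in plain Ruby. This is a toy illustration rather than the raingrams-based code I ran: the <code>TrigramModel</code> class, the add-one smoothing, the use of a mean trigram probability per tweet and the tiny training sentences are all assumptions made up for the example.</p>

```ruby
# Minimal trigram model with add-one smoothing. The smoothing is an
# assumption; it only keeps unseen trigrams from scoring exactly zero.
class TrigramModel
  def initialize
    @counts = Hash.new(0)
    @total = 0
  end

  # Split text into lowercase word trigrams.
  def self.trigrams(text)
    text.downcase.scan(/[a-z']+/).each_cons(3).to_a
  end

  def train(text)
    self.class.trigrams(text).each do |gram|
      @counts[gram] += 1
      @total += 1
    end
  end

  # Smoothed probability of a single trigram under this model.
  def probability(gram)
    (@counts[gram] + 1.0) / (@total + @counts.size + 1.0)
  end

  # Mean trigram probability for a text; 0.0 when it has no trigrams.
  def score(text)
    grams = self.class.trigrams(text)
    return 0.0 if grams.empty?
    grams.sum { |gram| probability(gram) } / grams.size
  end
end

# Sentiment is the positive model's score minus the negative model's.
def sentiment(text, positive, negative)
  positive.score(text) - negative.score(text)
end

positive = TrigramModel.new
negative = TrigramModel.new
["what a great answer that was", "really strong and clear point"].each { |t| positive.train(t) }
["what a terrible answer that was", "really weak and evasive point"].each { |t| negative.train(t) }

# Hypothetical classified tweets: the party label would come from the LSI step.
tweets = [
  { party: "LibDem", text: "what a great answer that was" },
  { party: "Tory",   text: "what a terrible answer that was" }
]

# Average sentiment per party: sum of differences divided by tweet count.
totals = Hash.new { |hash, key| hash[key] = { sum: 0.0, count: 0 } }
tweets.each do |tweet|
  entry = totals[tweet[:party]]
  entry[:sum] += sentiment(tweet[:text], positive, negative)
  entry[:count] += 1
end

averages = totals.transform_values { |entry| entry[:sum] / entry[:count] }
averages.each { |party, value| puts "#{party}: #{value}" }
```

<p>The averages come out tiny for the same reason the real figures above do: any single trigram is improbable, so only the sign and the relative size of the difference carry much meaning.</p>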
]]></content:encoded>
			<wfw:commentRss>http://www.talkunafraid.co.uk/2010/04/experiments-in-cl-nlp-building-backchat-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>EVE Fanfest(feed) 2009</title>
		<link>http://www.talkunafraid.co.uk/2009/10/eve-fanfestfeed-2009/</link>
		<comments>http://www.talkunafraid.co.uk/2009/10/eve-fanfestfeed-2009/#comments</comments>
		<pubDate>Sun, 04 Oct 2009 23:35:48 +0000</pubDate>
		<dc:creator>James Harrison</dc:creator>
				<category><![CDATA[EVE]]></category>
		<category><![CDATA[EVE Metrics]]></category>
		<category><![CDATA[MMMetrics]]></category>
		<category><![CDATA[fanfest]]></category>
		<category><![CDATA[flickr]]></category>
		<category><![CDATA[servers]]></category>
		<category><![CDATA[service]]></category>
		<category><![CDATA[stuffisawesome]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.talkunafraid.co.uk/?p=512</guid>
		<description><![CDATA[Well, that fateful time of year comes along again- thousands of EVE Online players meet for fanfest in Reykjavik, Iceland. And I can never make it. This year, my studies conspired against me; except they didn&#8217;t. While unknown until hours beforehand, I actually had no work and a lecture on basic packet switching keeping me [...]]]></description>
			<content:encoded><![CDATA[<p>Well, that fateful time of year comes along again: thousands of EVE Online players meet for fanfest in Reykjavik, Iceland. And I can <em>never make it</em>. This year, my studies conspired against me; except they didn&#8217;t. While unknown until hours beforehand, I actually had no work and a lecture on basic packet switching keeping me in England. Doh.</p>
<p>Anyway. We got a lot of fluff this year. Aside from further elaboration on stuff already announced, there were actually no major announcements made at fanfest. We did have some interesting info about New Eden, CCP&#8217;s EVE-Online-Online website. And there was some evidence (gasp!) that CCP were listening to third party developer suggestions at the API roundtable.</p>
<p>There was almost enough minor stuff announced to make it worthwhile. We did get a release date for Dominion &#8211; 1st December 2009. But no New Eden with the launch. And knowing CCP we&#8217;ll probably not get API changes till a bit after that. What&#8217;s really awesome though is that we will be getting new APIs. I&#8217;m just hoping they&#8217;re useful APIs&#8230;</p>
<p>Anyway, while I was sitting at home being mostly bored, I decided I&#8217;d had enough of pressing F5 on the Twitter search page, and put together a website (ff.mmmetrics.co.uk &#8211; it&#8217;s down now) to grab EVE fanfest feeds from Twitter and Flickr. This became popular enough within a few hours that we had to rip it off the server and give it its own Amazon EC2 virtual server, as it was in danger of crashing ISKsense and EVE Metrics. Doh. A wild success, in any case, for a simple but handy website. What the website did make us realise is how little headroom we have on our current server. We kinda knew that already but it did make the point quite well.</p>
<p>EVE Metrics 2.1 has launched mostly well but we&#8217;re still having issues with the API processing code. Makurid has been working hard to pin down the cause of the problems and destroy it while I&#8217;ve been fixing up servers and moving sites around, and we&#8217;re getting a bit closer to having a complete fix. We&#8217;re not there yet, but we will be soon with any luck.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.talkunafraid.co.uk/2009/10/eve-fanfestfeed-2009/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>RiCal and Google Calendar</title>
		<link>http://www.talkunafraid.co.uk/2009/07/rical-and-google-calendar/</link>
		<comments>http://www.talkunafraid.co.uk/2009/07/rical-and-google-calendar/#comments</comments>
		<pubDate>Fri, 10 Jul 2009 00:29:52 +0000</pubDate>
		<dc:creator>James Harrison</dc:creator>
				<category><![CDATA[Charactr]]></category>
		<category><![CDATA[Code Snippets and Examples]]></category>
		<category><![CDATA[EVE Metrics]]></category>
		<category><![CDATA[MMMetrics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[calendar]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[eve metrics]]></category>
		<category><![CDATA[ical]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.talkunafraid.co.uk/?p=401</guid>
		<description><![CDATA[So, there was a request on Twitter from ChainerCygnus to get Google Calendar support in Charactr, so I went ahead and implemented it. You can now access an iCal feed of skill changes on Charactr on the characters page, and it uses the Charactr API key to authenticate so it works in anything, no need [...]]]></description>
			<content:encoded><![CDATA[<p>So, there was a request on Twitter from <a href="http://twitter.com/chainercygnus/status/2556207281">ChainerCygnus</a> to get Google Calendar support in Charactr, so I went ahead and implemented it. You can now access an iCal feed of skill changes on Charactr on the characters page, and it uses the Charactr API key to authenticate so it works in anything, no need for HTTP Basic authentication support or anything.</p>
<p>Implementing was actually really easy. I grabbed the RiCal gem off Github, threw in the config.gem line in environment.rb, and added this to the characters controller index action:</p>
<div class="geshi no ruby">
<div class="head">respond_to do |format|</div>
<ol>
<li class="li1">
<div class="de1">&nbsp; <span class="kw3">format</span>.<span class="me1">html</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw3">format</span>.<span class="me1">ics</span> <span class="kw1">do</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; rna = <span class="br0">&#91;</span><span class="st0">&#39;(Untrained)&#39;</span>,<span class="st0">&#39;I&#39;</span>,<span class="st0">&#39;II&#39;</span>,<span class="st0">&#39;III&#39;</span>,<span class="st0">&#39;IV&#39;</span>,<span class="st0">&#39;V&#39;</span><span class="br0">&#93;</span> <span class="co1"># Used below for roman numerals</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; cal = RiCal.<span class="me1">Calendar</span> <span class="kw1">do</span> <span class="sy0">|</span>cal<span class="sy0">|</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; cal.<span class="me1">add_x_property</span><span class="br0">&#40;</span><span class="st0">&#39;X-WR-CALNAME&#39;</span>,<span class="st0">&#39;Charactr&#39;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="re1">@characters</span>.<span class="me1">each</span> <span class="kw1">do</span> <span class="sy0">|</span>char<span class="sy0">|</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> char.<span class="me1">skills_in_training</span>.<span class="me1">length</span> <span class="sy0">&gt;</span> <span class="nu0">0</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; char.<span class="me1">skills_in_training</span>.<span class="me1">each</span> <span class="kw1">do</span> <span class="sy0">|</span>s<span class="sy0">|</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; cal.<span class="me1">event</span> <span class="kw1">do</span> <span class="sy0">|</span>e<span class="sy0">|</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; e.<span class="me1">summary</span> <span class="st0">&quot;#{char.name} finishes #{s.type.name} #{rna[s.level]}&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; e.<span class="me1">description</span> <span class="st0">&quot;#{char.name} finishes #{s.type.name} #{rna[s.level]}&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; e.<span class="me1">dtstart</span> s.<span class="me1">end_time</span><span class="nu0">-10</span>.<span class="me1">minutes</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; e.<span class="me1">dtend</span> s.<span class="me1">end_time</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span><span class="co1"># event</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span> <span class="co1"># skills loop</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span> <span class="co1"># skills_in_training &gt; 0</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="kw1">end</span> <span class="co1"># char loop</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">end</span> <span class="co1"># calendar</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; render <span class="re3">:text</span><span class="sy0">=&gt;</span>cal.<span class="me1">export</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw1">end</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">end</span></div>
</li>
</ol>
</div>
<p>So as you can see, it&#8217;s a snap. Rails ships with a preconfigured MIME type for the :ics format, so it&#8217;s handled properly by nginx automatically. I&#8217;m still having some issues making Google Calendar accept the feed&#8217;s name as Charactr, mind you, but the rest works flawlessly and is available for all Charactr users right now.</p>
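<p>If you&#8217;re wondering roughly what ends up in the feed, here&#8217;s a hand-rolled sketch in plain Ruby that writes the same sort of events without RiCal. It is not RiCal&#8217;s actual output: the helper names, the <code>UID</code> and <code>PRODID</code> values and the sample character and skill are made up for illustration, and it skips the text escaping a real iCalendar writer would do.</p>

```ruby
ROMAN = ["(Untrained)", "I", "II", "III", "IV", "V"].freeze

# Format a Time as an iCalendar UTC timestamp.
def ical_time(time)
  time.getutc.strftime("%Y%m%dT%H%M%SZ")
end

# Build one VEVENT covering the last ten minutes of a skill's training.
# The UID scheme is hypothetical; it only needs to be unique per event.
def skill_event(character, skill, level, end_time)
  summary = "#{character} finishes #{skill} #{ROMAN[level]}"
  [
    "BEGIN:VEVENT",
    "UID:#{character}-#{skill}-#{level}@example.invalid".delete(" "),
    "DTSTAMP:#{ical_time(end_time)}",
    "DTSTART:#{ical_time(end_time - 600)}",
    "DTEND:#{ical_time(end_time)}",
    "SUMMARY:#{summary}",
    "DESCRIPTION:#{summary}",
    "END:VEVENT"
  ]
end

# Wrap the events in a VCALENDAR, using CRLF line endings.
def calendar(events)
  lines = [
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//example//sketch//EN",
    "X-WR-CALNAME:Charactr"
  ]
  lines.concat(events.flatten)
  lines.push("END:VCALENDAR")
  lines.join("\r\n") + "\r\n"
end

# Sample data, invented for the example.
end_time = Time.utc(2009, 7, 10, 12, 0, 0)
ics = calendar([skill_event("Alice", "Gunnery", 5, end_time)])
puts ics
```

<p>Each event starts ten minutes before the skill finishes and ends at the finish time, so it shows up as a short block in the calendar rather than a zero-length entry.</p>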
<p>On a side note, we&#8217;ve set up the <a href="http://twitter.com/mmmetrics">@mmmetrics</a> Twitter account as a shared account between all the MMMetrics team; if you&#8217;re having problems with any of the sites and use Twitter that&#8217;s a pretty good port of call. We&#8217;ll also be using that for announcements and so on, so it might make sense to follow if you&#8217;re interested in what we do.</p>
<p>It&#8217;s been a busy day on EVE Metrics 2, so I leave you with a few teaser screenshots. We&#8217;re almost done with the basic market view pages, and it&#8217;s now a matter of implementing the smaller features: market favourites, map features, and the APIs. EM2 won&#8217;t be an instant fulfilment of every feature promised over the past year or so; we&#8217;re taking the development slowly. EM2 at release won&#8217;t be as feature-packed as EM1, but it&#8217;ll work a whole lot better! Once we&#8217;ve gotten it released (hopefully in just under a week) we&#8217;ll be adding features and maintaining the site continuously to get all the features you want implemented without affecting performance.</p>

<a href='http://www.talkunafraid.co.uk/2009/07/rical-and-google-calendar/2009-07-09_1746/' title='Regional Statistics Overview'><img width="150" height="150" src="http://assets.talkunafraid.co.uk/2009/07/2009-07-09_1746-150x150.png" class="attachment-thumbnail" alt="Regional Statistics Overview" title="Regional Statistics Overview" /></a>
<a href='http://www.talkunafraid.co.uk/2009/07/rical-and-google-calendar/2009-07-09_1653/' title='Price History Graphs'><img width="150" height="150" src="http://assets.talkunafraid.co.uk/2009/07/2009-07-09_1653-150x150.png" class="attachment-thumbnail" alt="Price History Graphs" title="Price History Graphs" /></a>
<a href='http://www.talkunafraid.co.uk/2009/07/rical-and-google-calendar/2009-07-09_1923/' title='Order Lists'><img width="150" height="150" src="http://assets.talkunafraid.co.uk/2009/07/2009-07-09_1923-150x150.png" class="attachment-thumbnail" alt="Order Lists" title="Order Lists" /></a>

]]></content:encoded>
			<wfw:commentRss>http://www.talkunafraid.co.uk/2009/07/rical-and-google-calendar/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
