Feb 27 10

accVIEW Rejuvenated

by James Harrison

Well, it’s been way too long since I opened an editor and got to work on accVIEW’s source, and it really showed. In reality, accVIEW was something I slapped together in an afternoon for Vanguard Frontiers, home of myself, PyjamaSam (of Capsuleer fame) and some of the best pilots I’ve ever flown with. We needed a better way to do API checks and this was it.

I made it public and its popularity grew. I added some features and a premium option for those who wanted a bit more, and it’s been ticking along ever since: occasionally throwing horrible errors and falling over, with the background worker regularly dying, all while running on a Quantum Rise data dump. There was also a major security flaw. We didn’t store API keys, so we couldn’t re-validate people regularly. That meant people who had left a corporation could still view their old corp’s requests, and they couldn’t update their account to their new corporation.

No more.

accVIEW has gotten a facelift, skill distribution graphs, a fundamental change to how API keys are handled, improved code throughout and an updated database dump. I’ve also added a ‘forgot password’ feature for those who don’t remember their logins too well, and fixed a few outstanding bugs.

If you’re an accVIEW user, you will be prompted for your API key again the next time you log in. This is expected: we now keep a copy of your key so we can re-validate it regularly (once a day) and confirm that you are still in the corporation you were in last time we looked. If you change corporations, your main character will be dissociated, and you’ll have to re-enter your API keys the next time you log in and choose a new main character.
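To make the daily re-validation concrete, here’s a minimal Python sketch of the check as described above. It is not accVIEW’s actual code: the `Account` record, `revalidate`, `daily_job` and the `fetch_corp_id` callable (standing in for an EVE API lookup of the main character’s current corporation) are all hypothetical names. It also assumes that a corporation change clears both the stored key and the main character, matching the ‘re-enter your API keys’ behaviour.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Account:
    user_id: int
    api_key: Optional[str]          # stored so it can be re-checked later
    main_character: Optional[str]
    corp_id: int                    # corporation seen at the last check


def revalidate(account: Account, fetch_corp_id: Callable[[Account], int]) -> bool:
    """Re-check one account; return True if it is still valid.

    fetch_corp_id is a hypothetical callable that queries the EVE API with
    the stored key and returns the main character's current corporation ID.
    """
    current = fetch_corp_id(account)
    if current != account.corp_id:
        # Corporation changed: dissociate the main character and drop the
        # key, so the user must re-enter keys and pick a new main.
        account.main_character = None
        account.api_key = None
        return False
    return True


def daily_job(accounts: List[Account], fetch_corp_id) -> List[int]:
    """Run once a day (e.g. from cron); return IDs of dissociated users."""
    dissociated = []
    for account in accounts:
        if account.api_key is None:
            continue  # already waiting for the user to re-enter a key
        if not revalidate(account, fetch_corp_id):
            dissociated.append(account.user_id)
    return dissociated
```

Keeping the check in a plain function that takes the lookup as a parameter makes it easy to run against a fake API in tests.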

Enjoy!

Feb 26 10

Varnishing over varnish

by James Harrison

Well, we’ve tried working with Varnish, and after desperately trying to make it play nicely with everything else on the system, we’ve given up and removed Varnish from our application stack entirely. Why? Memory architecture.

Part of the documentation on Varnish’s website is a long architectural explanation arguing that the OS should decide what stays in RAM and what gets swapped to disk, and that Varnish therefore should not do any memory management as such. There is a problem here, however: this design means Varnish basically assumes the OS will handle contention between it and other programs.

This is not a smart move. First off, some OSes are terrible at that sort of thing; Linux is pretty good. But here’s the real issue. Take a database server like PostgreSQL, which correctly lets the OS handle disk caching rather than duplicating that effort internally. This is a great move, and it means you don’t have to guess how much RAM PostgreSQL can take up for disk caching; the OS handles it all. Since that memory is just cache, it can be reallocated to programs that need RAM and later given back to PostgreSQL (or any other app).

varnishd was regularly climbing to around 4-6 gigabytes of RAM usage, forcing even application memory into swap and leaving the OS no memory at all for disk caching, which had a terrible knock-on effect on the performance of PostgreSQL on the same machine. I should point out that the 4-6 gigabyte figure was observed while running varnishd with a 1 gigabyte disk cache.

Basically, if you want to run Varnish (and there are many good reasons to; apart from this issue it’s a fantastic cache server), you need a dedicated machine to run it on. The architecture of the software makes it impossible for it to coexist on a server with other programs. We even tried having Monit restart it whenever it reached 1 gigabyte of RAM usage, but it still had a terrible impact, and the caching suffered as a result. While a 45% cache hit rate on Varnish was a lovely thing and helped reduce load on our backend servers, Varnish was slowing those backend servers down enough to cancel out the benefit.
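Monit’s own configuration isn’t reproduced here. As a rough illustration of the same idea (restart a process once its memory passes 1 gigabyte), a minimal Python sketch might look like this. It assumes Linux, where resident memory is reported on the `VmRSS` line of `/proc/<pid>/status`, and assumes some supervisor starts varnishd again after it is terminated; the function names and the threshold constant are illustrative.

```python
import os
import signal

LIMIT_KB = 1024 * 1024  # 1 GiB, the threshold mentioned above, in kB


def parse_vmrss_kb(status_text: str) -> int:
    """Extract resident set size (kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # line looks like "VmRSS:   123456 kB"
    raise ValueError("VmRSS not found")


def rss_kb(pid: int) -> int:
    """Read the current resident memory of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


def should_restart(rss: int, limit_kb: int = LIMIT_KB) -> bool:
    return rss > limit_kb


def check_once(pid: int) -> bool:
    """Send SIGTERM if over the limit; a supervisor is assumed to restart it."""
    if should_restart(rss_kb(pid)):
        os.kill(pid, signal.SIGTERM)
        return True
    return False
```

Each restart also empties an in-memory cache, which is one reason this kind of watchdog only papers over the problem.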

With the 1 gigabyte of RAM we freed by removing Varnish, we’ve added four more application servers to EVE Metrics. These are more than coping with demand, and we’re happily seeing things stay nice and stable even with a lot of API accesses. So far, then, so good.

On a side note, users of the popular accVIEW application will be happy to know I’m spending a chunk of time this weekend improving the app and adding some much-needed features: persistent API key storage, so that corporate security can be maintained even when people leave or join corporations, a ‘forgot password’ feature, and performance improvements.

Feb 20 10

What’s in your EVE space?

by James Harrison

Well, the latest EVE blogger craze seems to be posting pics of your EVE workspace, so here’s mine, albeit from some months ago.

Many empty bottles of Diet Coke were carefully removed before this picture was taken

The setup is fairly simple. A 22″ monitor runs on a Mac Mini for trading and work, while the 24″ and 17″ widescreen displays run on the desktop machine, a custom-built box running Windows 7 x64 (the only Windows machine in the room, out of 12 machines). That machine is based around an Asus mainboard with an E6600 2.4GHz chip, a 275-series BFGtech OC2 graphics card and 4GB of DDR2 RAM. There’s ~1.2TB of disk space across a bunch of disks in that machine, and another ~5TB distributed throughout the room, mostly full of backups and development snapshots of databases.

The multi-monitor setup is used either with an EVE client per monitor or, more often, with just one EVE client on the main machine or the Mac, depending on what I’m working on (Perl is on the Mac; Ruby is on the dev server, accessed via the desktop). The 17″ monitor nearly always shows IRC (XChat) and any Pidgin windows I have open for MSN/SILC/XMPP. All in all, it works out great.

The desk above is my setup while I’m at university in term-time; back home in the holidays, this is how things usually look.

(Excuse the shoddy censoring of design notes on my whiteboard)

Same setup, though you get a look at the desktop this time. The minifridge is vital to day-to-day operations, and it gets dragged along to uni. The netbook above isn’t really useful for EVE, being a netbook and all that, but it is handy as a little Linux machine.

Also in my EVE space are 4 fairly beefy network switches; various other bits of networking gear (notably a Ubiquiti Networks Bullet M2HP running in bridge mode while I’m at uni, bridging the wired network in my room to the wireless LAN from the router downstairs); an APC Smart-UPS 1500VA rackmount for handling power cuts (though the router isn’t on battery backup, so this doesn’t permit uninterrupted EVE quite yet); and a grand total (at this exact moment) of 6 other server-class machines in 19″ pizza boxes. These boxes get used for testing things; one is set up as a file server and one as a development server (where I do all my work for EVE Metrics etc.).

Aside from computers, there’s always a comfortable chair, my trusty speakers and amplifier, and a large pile of books. Recently I’ve been keeping some weights in reach of my chair so I can do some reps while I wait for things to load or compute, which has been working well. As far as books of the moment go, I’ve just finished Terry Pratchett’s latest, Unseen Academicals, which is very much worth reading if you’re into Pratchett’s stuff. Secrets and Lies: Digital Security in a Networked World by Bruce Schneier is my new ‘current’ book, though Iain M. Banks’ stuff gets reread quite often these days.

Feb 11 10

Learning (a tale of memory)

by James Harrison

We never really stop learning. Learning is perhaps the most important process to occur in our brains; ignore the past, and you are screwed.

This post is a tale of how we’ve been running into issues with memory usage recently, how we’ve been diagnosing and solving them, and the design decisions that led to them.

Jan 29 10

Welcome to Pandora

by James Harrison

We’ve successfully moved all sites, email, DNS and everything else on our old server, Highpoint, to our brand new machine, Pandora. This has entailed a lot more downtime than we’d anticipated, mostly due to lack of preparation on my part, a glorious DNS cock-up, and the added complexity of Highpoint’s backhaul failing three times as we tried to move all the data across.

In total it was a fairly mammoth operation by our standards; we transferred in excess of 100 gigabytes of data between the servers over the course of 12 hours, shifted over 20 websites and 3 major webapps, and got everything up and running again in under a day once we’d moved it all to the new box. The downtime has been annoying and I’ve certainly learned some lessons for next time, but here’s the flipside…
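As a rough sanity check on those numbers, the average transfer rate works out as below. This treats 100 gigabytes as exactly 100 × 10^9 bytes spread evenly over 12 hours. Since we moved ‘in excess of’ that and the backhaul failed along the way, the real rate while data was actually flowing was higher.

```python
def average_throughput(total_bytes: float, seconds: float) -> tuple:
    """Return (megabytes per second, megabits per second), decimal units."""
    bytes_per_s = total_bytes / seconds
    return bytes_per_s / 1e6, bytes_per_s * 8 / 1e6


# 100 GB moved over 12 hours
mb_s, mbit_s = average_throughput(100e9, 12 * 3600)
print(f"{mb_s:.2f} MB/s, {mbit_s:.1f} Mbit/s")  # prints: 2.31 MB/s, 18.5 Mbit/s
```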

We’re now running on a much, much roomier machine. The environment isn’t perfectly set up yet, and we’ll no doubt spend the next week tuning everything, adjusting things till they’re just right and fixing bugs, as well as rewriting chunks of applications to make use of the extended caching capabilities of the new environment. We’re already using this to great effect in the EVE Metrics APIs, but we can make better use of caching throughout our apps.

Once we’ve gotten settled in, we should be performing much better and more reliably than previously. We’ve already seen huge performance gains on our database (we can process more than twice as many uploads per second, for example) and we hope to have things even faster soon.

Of course, to achieve this I have been running on more or less an empty tank as far as sleep is concerned, fitting the work in around my life at university, which has been interesting. Still, things are now more or less stable and everything basically works, so I’m going to grab a few hours of sleep before lectures tomorrow, followed by a long, long lie-in on Saturday. Enjoy!

Jan 14 10

Architecture for the future

by James Harrison

After that EVE-centric post on scalability (thanks to HighScalability.com for linking in; I hope it was an interesting read), I figured it was time to return to EVE Metrics and our other sites, accVIEW and ISKsense.

In the next week we will be migrating to a new server. It’s in the same datacenter with the same host; it’s a slightly faster machine, but it has four times as much RAM (8GB) and an additional 10k RPM hard drive. As part of the migration we’ll be making some changes to the software architecture running the show.

The main difference is that we’re moving away from Passenger, also known as mod_rails. It has some advantages in low-memory conditions, but it has given us more trouble than it’s worth, so we’ll be going back to running application servers manually as daemons, using the excellent Thin application server. For the sites running PHP (this blog, for example), we’ll keep using PHP FPM as we do now; we’ve had no issues with it. Both will sit behind nginx, which acts as a reverse proxy in front of them. Nginx has done very well as a web server: it’s very fast and easy to configure.
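Without Passenger, nginx has to spread requests across several Thin daemons itself. nginx’s default strategy for an upstream group is round-robin. The Python sketch below models only that behaviour, not nginx itself, and the four local ports are a hypothetical example rather than our actual configuration.

```python
from itertools import cycle


class RoundRobinUpstream:
    """Hand out backend addresses in turn, like a simple proxy upstream pool."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._pool = cycle(backends)

    def next_backend(self) -> str:
        return next(self._pool)


# Hypothetical: four Thin daemons listening on local ports 3000-3003.
thin = RoundRobinUpstream([f"127.0.0.1:{port}" for port in range(3000, 3004)])
picks = [thin.next_backend() for _ in range(5)]
```

Round-robin is the simplest choice and works well when requests cost roughly the same; it doesn’t account for one daemon being stuck on a slow request.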

There is only one other major change: we’ll be putting nginx itself behind Varnish, a high-performance HTTP cache. This will let us leverage HTTP caching in our applications more efficiently and speed up requests dramatically. Right now we don’t use HTTP caching much; we’d like to change this, particularly in the EVE Metrics API, so Varnish can handle a good portion of the thousands of API calls we get asking for the price of trit (Tritanium) or what have you. All in all it’ll mean reduced load on the application cluster, so we can keep that smaller and lighter, which in turn leaves more room for the database in memory.
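For Varnish to cache an API response, the application has to mark the response as cacheable, typically with a `Cache-Control` header. Here’s a minimal sketch as a Python WSGI app; our apps are Ruby, so Python is only for illustration. The endpoint, the 300-second lifetime and the price figure are placeholders, not real EVE Metrics values or market data.

```python
import json


def price_api(environ, start_response):
    """Hypothetical API endpoint returning a placeholder Tritanium price.

    The Cache-Control header tells an HTTP cache such as Varnish that it may
    serve this response to other clients for 300 seconds without asking
    the application again.
    """
    body = json.dumps({"type": "Tritanium", "price": 2.5}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
        ("Cache-Control", "public, max-age=300"),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, price_api).serve_forever()` will serve it; a cache in front would then be free to answer repeat requests for up to five minutes.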

That translates to better performance on the more complex parts of the site, i.e. market pages, your account page and corporate pages, and better performance means we can build more. We’re waiting for the new capacity before we add asset support, one of the things we’re really looking forward to: it will let us add a whole new level of functionality by feeding much more information into processes like our inferred trade detector and our planned fulfilled-orders listings. Plus we’ll be adding asset valuation tools, of course.

The architecture I’ve described above will basically be ‘it’ for now, though there is more complexity at the application and DB layers. We still use MySQL for a few legacy applications, so we have a tiny MySQL server running. The complexity at the app layer mainly consists of things like background processing tools and, for EVE Metrics, tasks that are actually executed on a VPS with the results uploaded back to the server (we now do all the major CSV dumps on Makurid’s VPS).

As the guy who ends up fixing all this when it goes wrong, simplicity is always my main priority, but the added complexity of Thin and Varnish should be well worth it in the long run.

Jan 12 10

EVE Scalability Explained

by James Harrison

OK, I’ve seen a bunch of posts on the EVE blogosphere about this recently, and it’s always been a tricky topic to understand. This post aims to demystify EVE’s architecture and explain in simple terms what EVE’s current issues with scaling for fleet fights are, along with some approaches for fixing them. First, a disclaimer: I do not work for CCP, and I don’t get behind-the-scenes information. This post is compiled from several years of EVE third-party development and from talking to people who work at CCP, people who have worked at CCP, and the community at large. To the best of my knowledge it is mostly correct, but I make no promises. If you’re looking for an exact technical description, look elsewhere.

So, let’s start with the basics. This is the (somewhat simplified) hardware layout for Tranquility.

To sum up in words: there are proxy servers that receive your data and route you to the appropriate sol server, which runs on a sol node or reinforced sol node. These servers communicate with a single, shared database server, which is also used for web services like the API and the MyEVE website (and, soon, Spacebook).

There’s an important distinction to be made here, and one that is vital to understanding EVE’s architecture: nodes and servers are not the same thing. Nodes are the actual physical hardware (at the time of writing, IBM blade servers), each of which may run one or more sol servers. Each sol server is, as the name implies (Sol is the name of our sun), responsible for one solar system in EVE. It is a software server process handling everything that goes on in a system: combat, mining, market and so on.
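To make the node/server distinction concrete, here’s a toy Python model. Nodes are machines, each solar system gets one sol server, and a proxy’s routing job is just a lookup from system to node. This illustrates the description above and is not CCP’s code; the node names and the choice of systems are made up.

```python
from dataclasses import dataclass, field


@dataclass
class SolNode:
    """A physical machine (a blade) that can host several sol servers."""
    name: str
    reinforced: bool = False
    systems: list = field(default_factory=list)


class Cluster:
    def __init__(self):
        self.node_for_system = {}

    def assign(self, system: str, node: SolNode) -> None:
        # One sol server per solar system; many systems may share a node.
        node.systems.append(system)
        self.node_for_system[system] = node

    def route(self, system: str) -> SolNode:
        """What a proxy does: find the node running this system's sol server."""
        return self.node_for_system[system]


cluster = Cluster()
shared = SolNode("node-01")
reinforced = SolNode("node-rf-01", reinforced=True)
for name in ["Amarr", "Rens", "Dodixie"]:
    cluster.assign(name, shared)  # three systems sharing one machine
cluster.assign("Jita", reinforced)  # a busy system given a node to itself
```

In this model, adding capacity for the cluster as a whole just means adding nodes and spreading systems over them, but no single system can ever use more than the one node it is assigned to.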

EVE’s scalability issues stem from this design, but first let’s look at what those issues are. Can EVE handle 56,000 players? Yep, easily. Tranquility will be able to handle many more than that without issue, and thanks to this design the capacity can be expanded easily by adding more sol nodes for sol servers to run on, spreading the load efficiently. Will you be able to fit 3000 people onto a gate? Nope. Why? Because EVE was designed so that the capacity of the cluster as a whole scales well, not the capacity of individual systems. This design decision was made back in the early days of EVE, and it has served the game well, with the exception of fleet combat and Jita. So how do you handle the edge cases?

Well, where does lag come from? Proxy servers have an easy job, and they are not a bottleneck in the vast majority of circumstances; the main issue they cause is disconnects. When a proxy server fails, a good chunk of EVE’s inhabitants disappear until they reconnect. The lag is in combat and in high-concurrency systems like Jita, where loads of people trade, talk in local and fly around suicide ganking each other. It stems from intensive processing that has to be done: mathematical steps like calculating transversal velocities between objects, which have complexity (algorithmically speaking) of O(n^2) or worse. If you didn’t understand that, it just means the work grows much faster than the number of ships: for an O(n^2) step, doubling the number of ships roughly quadruples the work.
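To put rough numbers on that, suppose (as a simplification) that every ship has to be compared with every other ship once per server tick, for example to work out transversal velocity. The number of distinct pairs is n(n - 1)/2, which this short Python snippet computes.

```python
def pairwise_interactions(n: int) -> int:
    """Number of distinct ship pairs: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for ships in (100, 1500, 3000):
    print(ships, pairwise_interactions(ships))
# prints:
# 100 4950
# 1500 1124250
# 3000 4498500
```

Going from 1,500 to 3,000 ships takes the pair count from 1,124,250 to 4,498,500, almost exactly four times as much work. The real server certainly prunes a lot of this; the point is only the shape of the growth.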

Obviously, there are optimisations that can be made and better algorithms that can be used, and CCP uses them, but the fact remains: this is a lot of work for a computer. Loads. Absolutely shedloads. And all this challenge gets is one computer, at most. In bad cases it won’t even get that: most sol nodes run multiple sol servers, which is why lag sometimes seems to spread between systems. It really can, and does. Reinforced nodes just have more firepower and a guarantee of exclusivity, but they’re still only one computer. And as Google has taught the industry, lots of small computers are cheaper, easier to fix and faster than a single big one.

True scalability will come to EVE when a sol server can be distributed seamlessly (without rebooting or dropping clients) and near-instantly across multiple sol nodes. Fleet fights will then be able to take all the resources they need, and CCP will be able to run cheaper hardware, making scaling cheaper and easier. You also keep the scalability of the cluster as a whole, assuming you keep some hardware spare for sol servers to grow onto in the event of a fight.

What needs to be done to achieve this, and why haven’t we got it yet? Well, it’s a heck of a lot of work: a huge technical challenge even before you bring internet spaceships into it. Then there are the hardware prerequisites: you need insanely fast, low-latency networking (InfiniBand, Fibre Channel, etc.) and the extra nodes. It’s a huge investment for CCP, but one they’ll have to make eventually unless they find another way of solving the problem, and any other solution (grid sharding, etc.) is likely to break immersion and cohesion in the game, so that seems unlikely.

I hope that helps explain some of the thinking behind EVE’s architecture and why you lost that titan last night. And why it’s likely you’ll lose a few more before it’s fixed.

Minor second disclaimer: It’s 2:30 AM and I’m tired as hell, so this may contain errors. Feel free to point any out in the comments.

Dec 10 09

EVE Mail and training make an appearance

by James Harrison

At last, we get an EVE mail API! It’s a bit rubbish as APIs go (no message bodies yet), but it’s a great step in the right direction. And of course we’ve already got it implemented and polished over at EVE Metrics.

All you need to do is head over to EVE Metrics, log in (or sign up if you’ve not got an account yet), add your API key(s) if you haven’t already, and then enable the EVE mail API method. And voilà: EVE mails in your browser, updated as often as CCP lets us.
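‘As often as CCP lets us’ comes down to API caching: each API response says how long it may be cached. Assuming the usual EVE API XML layout, with `currentTime` and `cachedUntil` elements directly under the `eveapi` root, a minimal Python sketch for working out when to poll again could look like this. The function name and sample document are illustrative.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def seconds_until_refresh(xml_text: str) -> float:
    """How long to wait before polling this API call again.

    Uses the server's own currentTime rather than the local clock, so clock
    skew between our machine and the API servers does not matter.
    """
    root = ET.fromstring(xml_text)
    now = datetime.strptime(root.findtext("currentTime"), TIME_FORMAT)
    until = datetime.strptime(root.findtext("cachedUntil"), TIME_FORMAT)
    return max(0.0, (until - now).total_seconds())


sample = """<eveapi version="2">
  <currentTime>2009-12-10 12:00:00</currentTime>
  <result/>
  <cachedUntil>2009-12-10 12:30:00</cachedUntil>
</eveapi>"""
wait = seconds_until_refresh(sample)  # 1800.0, i.e. 30 minutes
```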

The icing on the cake is that we’ve also provided a feed of your EVE mails for feed readers. Google Reader/iGoogle or any other Atom-compatible reader (which is basically all of them) can now monitor your in-game EVE mails at the click of a button.

We’ve also gotten around to doing skill training: you can see what you’re training on all your accounts (skill queue support is included, of course).

The next logical step from here is notification support: get an email or SMS whenever your characters can train a new skill, whenever you get a new EVE mail, or whenever one of your market orders is outbid or fulfilled. You name it, I’d love to see it notifiable. We’re still in the early days with that, but that’s where we’d like it to end up.
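As an example of what one of those checks might involve, here’s a hypothetical Python sketch of an ‘outbid’ test. It assumes the competing orders have already been narrowed down to the same item and station, and it ignores order ranges and expiry. `Order` and `is_outbid` are illustrative names, not part of EVE Metrics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    order_id: int
    price: float
    is_sell: bool


def is_outbid(mine: Order, competitors: List[Order]) -> bool:
    """True if someone else now has a better price on the same side.

    A better sell order is cheaper; a better buy order pays more.
    """
    same_side = [o.price for o in competitors if o.is_sell == mine.is_sell]
    if not same_side:
        return False
    if mine.is_sell:
        return min(same_side) < mine.price
    return max(same_side) > mine.price
```

A notifier would run a check like this whenever fresh market data arrives and send an email or SMS only when the result flips from False to True, so users aren’t spammed repeatedly.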

We’ll be improving on these and implementing other APIs in the coming days. We want to get notifications loading for all you corporate types, and we’re looking forward to bringing more skill monitoring and information into the UI. I’ve got a lot of ideas bubbling around: we’re getting to the point where we’ve got loads of little snippets of data that can all tie in with each other, creating something really fantastic for you guys and girls, the users. And that’s awesome.

Of course, we need your help to make all this run smoothly and perform well, which it currently has problems doing. We’re still asking for donations, you can buy GTCs in support of us, and we’ve just opened up advertising on the site through Project Wonderful. Any form of help is hugely appreciated.

Dec 3 09

Moondoggie & Market Browsing

by James Harrison

OK. EVE Metrics is my big market browsing project. It’s very complex, it’s got a lot of data, but it all basically comes down to this: People browse the market with a program running on their computer, and when any market data is viewed, EVE Online writes it to a cache file, the program decodes that and fires it at the server. We collect all these reports and build a single picture of the market in EVE.
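One way to think about the ‘single picture’ step is as keeping the freshest report for each region and item. The Python sketch below assumes that simple rule; the `Report` fields and `merge_reports` are hypothetical, and the real EVE Metrics pipeline (order-level merging, validation and so on) is not described here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Report:
    region_id: int
    type_id: int
    generated_at: datetime   # when the client's cache file was written
    orders: tuple            # decoded orders; format left abstract here


def merge_reports(reports):
    """Keep only the newest report for each (region, item) pair."""
    latest = {}
    for r in reports:
        key = (r.region_id, r.type_id)
        if key not in latest or r.generated_at > latest[key].generated_at:
            latest[key] = r
    return latest
```

Because the rule only compares timestamps, reports can arrive in any order from any number of uploaders and the result is the same.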

There’s the top-down view for you. We’ve never really been short of data: we get good market coverage in most regions, and we’re fairly up to date in the grand scheme of things. But compare the actual market of EVE to EVE Metrics and we’re still a long way off having a truly accurate picture. EVE moves quickly; in some markets, orders shuffle around, change price and get bought out from minute to minute.

With Dominion we got a new in-game browser. This means you can now use the full EVE Metrics website in-game, and (through some JavaScript client hook additions) it also lets us provide a fantastic new tool to help us get an even better picture of the market in EVE.

If you fire up the IGB and head over to the upload suggestions page, you’ll be given a list of 10 items and a few options for automatic checking. Choosing automatic checking makes the page ask EVE Metrics for a list of items to check and then automatically go and look at those items in the market. It’s slow, but it works. In the space of a few hours, one user can get us data for an entire region across all the items on the market. This is utterly fantastic, and we’re really looking forward to the larger volume of data it brings to the site.
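The post doesn’t say how the suggested items are chosen. One plausible rule, assumed here purely for illustration, is to ask for the items whose data is stalest, with never-seen items first. In Python, with hypothetical names:

```python
from datetime import datetime
from typing import Dict, List, Optional


def suggest_items(last_seen: Dict[int, Optional[datetime]], limit: int = 10) -> List[int]:
    """Pick the items whose data is oldest; never-seen items come first."""
    never = [t for t, ts in last_seen.items() if ts is None]
    seen = sorted((t for t, ts in last_seen.items() if ts is not None),
                  key=lambda t: last_seen[t])
    return (never + seen)[:limit]
```

Serving the stalest items first means each user’s time goes where it improves the picture most, and several users working at once naturally spread across the market as their uploads refresh the timestamps.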

So, if you’ve got a spare moment, or you need to go AFK for an hour, or you want to help out while you’re mining, or you’re just tired of clicking the next item in the list, install the uploader and visit the page ingame to get started. Every upload counts and helps us build the biggest, best picture of EVE’s market we can manage to produce. Uploads to EVE Metrics are also syndicated to other websites and tools, of course. Your uploads and contribution of time help hundreds of users who use the site, and tens of thousands more who rely on our pricing, history and order APIs for their applications.

Oh, and if you’re a developer, we now have a server status API with all the information you could possibly want on TQ, Sisi and the API servers. It’s up on the site now, along with its documentation. Enjoy!

Nov 29 09

Dominion

by James Harrison

With Dominion just around the corner, we’re looking at how it’ll affect EVE Metrics. Other than the market getting a few things shaken up, as is usual for expansions, things should be minimally impacted. API services will probably be down for a week, given CCP’s track record of breaking the API ‘just in case’ it affects Tranquility, but apart from that things should be fine.

We have some things in the works for Dominion, and we hope you’ll find them useful. We’ve not had much time to work on EVE Metrics lately, and we’re being distracted by another project at the moment, but we’ll have more time for it around Christmas, in a few weeks. I’m still evaluating what we’ll spend that time on, though, and we’d like more feedback via the feedback button on the site. You can vote for other people’s suggestions, so please do!

Other than that, not much to report. We’ll have Dominion items loaded into the site by release day, so you can start using the site straight away with the new items. So far we’ve been very successful with some performance improvements on the site, mostly by tuning our database server and improving query performance through better indexes, clustering indexes, and so on. Hopefully you’ll notice this in the form of improved page responsiveness and fewer ‘slow-loading’ pages. Enjoy!
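To illustrate what a better index buys, here’s a small Python demo using SQLite’s `EXPLAIN QUERY PLAN`. SQLite is a stand-in chosen because it ships with Python; our database server is PostgreSQL, where `EXPLAIN` and `CLUSTER` play the corresponding roles. The table, columns and index name are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region_id INTEGER,"
             " type_id INTEGER, price REAL)")
conn.executemany("INSERT INTO orders (region_id, type_id, price) VALUES (?, ?, ?)",
                 [(r, t, 1.0) for r in range(10) for t in range(100)])

sql = "SELECT price FROM orders WHERE region_id = ? AND type_id = ?"
before = query_plan(conn, sql, (1, 34))   # full table scan
conn.execute("CREATE INDEX idx_orders_region_type ON orders (region_id, type_id)")
after = query_plan(conn, sql, (1, 34))    # search via the new index
```

Before the index, the plan is a scan of every row in `orders`; afterwards SQLite searches the index on `(region_id, type_id)`. Clustering, which physically reorders a table to match an index, isn’t shown here.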