EVE Scalability Explained

OK, I’ve seen a bunch of posts on the EVE blogosphere about this recently and it’s always been a tricky topic to understand. This post aims to demystify EVE’s architecture and explain in simple terms what EVE’s current issues with scaling for fleet fights are, and approaches for fixing them. So first a disclaimer: I do not work for CCP, I don’t get behind the scenes information. This is a post compiled from several years working on EVE third party development and talking to people who do work at CCP, people who have worked at CCP, and the community at large. To the best of my knowledge this is mostly correct, but I make no promises. If you’re looking for an exact technical description, look elsewhere.

So, let’s start with the basics. This is the (somewhat simplified) hardware layout for Tranquility.

To sum up in words: There are proxy servers that receive your data and route you to the appropriate sol server, which is running on a sol node or reinforced sol node. These servers communicate with a single, shared database server, which is also used for web services like the API and the MyEVE website (and, soon, Spacebook).

There’s an important distinction to be made here and one that is vital to understanding EVE’s architecture- nodes and servers are not the same thing. Nodes refer to the actual physical hardware (at time of writing, IBM Blade servers) that may run one or more sol servers. Each sol server is, as the name implies (Sol is the name for our sun) responsible for one solar system in EVE. It is a software server process, handling everything that goes on in a system- combat, mining, market, and so on.
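If it helps to see the node/server split as data, here is a minimal sketch of it in Python. To be clear, this is my own toy model and not how CCP's code looks: the class names, the `route` helper, the node names and the pilot counts are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SolServer:
    """One software process, responsible for exactly one solar system."""
    system: str
    pilots: int = 0  # hypothetical head count, just to give the node a load


@dataclass
class SolNode:
    """One physical machine; an ordinary node may host several sol servers."""
    name: str
    reinforced: bool = False
    servers: list = field(default_factory=list)

    def host(self, server):
        # In this model a reinforced node is reserved for a single sol server.
        if self.reinforced and self.servers:
            raise ValueError(f"{self.name} is reinforced and already in use")
        self.servers.append(server)

    def load(self):
        # Every server on the node competes for the same hardware.
        return sum(s.pilots for s in self.servers)


def route(nodes, system):
    """Roughly what a proxy does here: find the node hosting a system."""
    for node in nodes:
        for server in node.servers:
            if server.system == system:
                return node
    raise KeyError(system)


shared = SolNode("node-01")
shared.host(SolServer("Quiet System A", pilots=12))
shared.host(SolServer("Quiet System B", pilots=30))

dedicated = SolNode("node-02", reinforced=True)
dedicated.host(SolServer("Jita", pilots=900))

nodes = [shared, dedicated]
print(route(nodes, "Jita").name)  # node-02
print(shared.load())              # 42
```

The point of the toy is the shape: one system maps to one server, and one or more servers map to one node.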

EVE’s scalability issues stem from this design, but let’s look at what those issues are. Can EVE handle 56,000 players? Yep, easily. Tranquility will be able to handle many more than that without issue, and because of this design the capacity can be easily expanded by increasing the number of sol nodes for sol servers to run on, spreading the load efficiently and easily. Will you be able to fit 3000 people onto a gate? Nope. Why? Well, because EVE was designed so that the capacity of the whole cluster expanded well, not individual systems. This was a design decision made back in the early days of EVE and it has served EVE well, with the exception of fleet combat and Jita. So how to handle the edge cases?

Well, where does lag come from? Proxy servers have an easy job and they are not a bottleneck in the vast majority of circumstances. The main issues they cause are disconnects; when a proxy server fails, a good chunk of EVE’s inhabitants disappear till they reconnect. The lag is in combat and in high concurrency systems- like Jita, where loads of people trade, talk in local, and fly around suicide ganking each other. This lag stems from intensive processes that have to be done; mathematical steps like calculating transversal velocities between objects, things that have complexity values (algorithmically speaking) of O(n^2) or worse. If you didn’t understand that- well, it just means the more ships you have, the more difficult things get, and fast: double the ships and you at least quadruple the work.
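To put rough numbers on the O(n^2) point, here is a small Python sketch. It assumes the simplest possible case, where every ship has to be checked once against every other ship, so the work is counted in pairs. The function names are mine, and the real server will be doing something smarter than this.

```python
from itertools import combinations


def pair_count(n):
    """Number of distinct ship-to-ship pairs among n ships."""
    return n * (n - 1) // 2


def count_pairs_brute_force(ships):
    """Same number, found by actually walking every pair."""
    return sum(1 for _ in combinations(ships, 2))


for n in (10, 100, 1000, 3000):
    print(n, pair_count(n))
# 10 45
# 100 4950
# 1000 499500
# 3000 4498500
```

Going from 1,000 to 3,000 ships is three times the pilots but roughly nine times the pairs, and that is before anything in the 'or worse' category.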

Obviously, there are optimisations that can be done, better algorithms, and CCP uses them, but the fact remains; this is a lot of work for a computer. Loads. Absolutely shedloads. And that’s all this challenge gets- one computer, at most. In bad cases, it won’t even get that: most sol nodes run multiple servers, which is why lag sometimes seems to cross between systems- it really can, and does. Reinforced nodes just have more firepower and a guarantee of exclusivity, but they’re still only one computer. And as Google has taught the industry, lots of small computers are cheaper, easier to fix, and faster than a single big computer.

True scalability will come to EVE when a sol server can be distributed seamlessly (without rebooting or dropping clients) and near-instantly across multiple sol nodes. That will mean that fleet fights can take all the resources they need, and that CCP gets to maintain cheaper hardware, making scaling the hardware cheaper and easier. And you maintain the scalability of the cluster, assuming you keep some hardware spare for sol servers to grow onto in the event of a fight.

What needs to be done to achieve this? Why haven’t we got this yet? Well, it’s a heck of a lot of work. It’s a huge technical challenge, leaving internet spaceships out of it. Then there are the hardware prerequisites; you need insanely fast low-latency networking (InfiniBand, Fibre Channel, etc), and the extra nodes. It’s a huge investment for CCP, but one they’ll have to make eventually unless they find another way of solving the problem; but any other solution is likely to break immersion and cohesion in the game (grid sharding, etc), and so is unlikely.

I hope that helps explain some of the thinking behind EVE’s architecture and why you lost that titan last night. And why it’s likely you’ll lose a few more before it’s fixed.

Minor second disclaimer: It’s 2:30 AM and I’m tired as hell, so this may contain errors. Feel free to point any out in the comments.

Why ECM ships don’t need a nerf, and how to fix ECM properly

CCP, you may have spotted, are planning an ECM nerf. Now, I’m biased in this- I fly support. I’m rarely seen in fleet fights because I’m in a Buzzard 250km from anyone else, but if I’m there I’m in a Scorpion or a Kitsune. I’ve not gotten a killmail since 2008, for crying out loud. And therein lies my point.

The changes to Scorp, Falcon and Rook assume that ECM pilots just fit ECM as a side benefit of their guns. This is simply not the case in the vast majority of situations. None of the EW ships are particularly strong with a decent quantity of ECM- using racial jammers you need 4 midslots of ECM, and trying to generate a tank out of your remaining mids doesn’t tend to work well. The goal of the ECM pilot is to reduce the damage output of the opposing force by disrupting target locks and removing one or more ships from the battle in terms of actually dealing damage to your team.

Now, there’s one situation and one ship that has received a lot of attention.

Falcons are fairly legendary for having the ability to lock out to ~200km and to jam from that distance with good strength. This is a great asset in fleet fights and smaller engagements like gate camps where having a few distributed ECM platforms around the place to break up hostile fire concentration can swing the battle to the defenders. However, in smaller fights it can lead to ‘problems’ where the hostiles will be set up for close range high DPS situations with a Falcon or two dotted around the battlefield at long range, well out of the attacker’s range. Problems like the attackers not being able to shoot anything.

What is the solution to this, you might ask? Well, CCP’s answer is to turn the Falcon- a paper-thin, untankable (This is a Caldari ship- shields are all about the midslots, which is where your ECM goes. One or the other, chaps) fairly nippy cruiser that can cloak- into a close-range brawler. Otherwise known as ‘primary’. Unless you jam 100% of the targets 100% of the time, you’ve popped already in 99% of situations. Your cloaking advantage is useless, your tank doesn’t exist, your DPS is tiny compared to, say, a HAC.

Then there’s the Rook, which CCP want to turn into a longish-range platform. But wait- drone bay? Why does a long-range platform need a frakking drone bay? It’s enough for one sentry drone, but that’s about all you could conceivably find useful as a Rook pilot. At 80k or thereabouts, you’ll still die nice and fast but you’ll at least be able to do some damage with your one sentry drone and heavy missiles.

No nerf would be complete without planned changes for every good ship, though- the Scorpion gets messed around with too! No more optimal range bonus for the Scorp- it’s getting brawlerfied too. Now, I’ve got a Scorpion set up for W-Space. Strong tank, a little ECM, and cruises for contributing to the longer-range targets DPS-wise. Now, a torpscorp might work well if there were some more hardpoints/highslots added, but your tank would again destroy your ECM. I’ve only got 3 multispecs fitted, along with two SDAs. It’s not a PvP-worthy fitting, by any means, and reduces the Scorp’s usefulness greatly.

Now- enough ranting. How do we fix ECM properly, to give smaller gangs a chance against these obviously overpowered ships? Fix ECCM. Give Remote ECCM a huge boost, either in strength or by making it an area-of-effect module. Of course, RECCM as it stands is useless- RECCM providers will simply be jammed. Why not make RECCM a shield effect, similar to a heavy interdictor, and a highslot module instead of a midslot module? Pick a ship class that has a grey area in terms of its role and choose that as a specialist platform for RECCM. RECCM shielding could give all ships within the bubble a boost to their sensor strength, making them more resistant to ECM.
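To show why a sensor strength boost matters, here is a rough Python sketch. It assumes the commonly quoted rule of thumb that each jam attempt succeeds with probability equal to jammer strength divided by the target's sensor strength. That rule is my assumption here, not something from a dev blog, and every number below is invented purely for illustration, as are the function names.

```python
def jam_chance(jammer_strength, sensor_strength):
    """Chance that one jam attempt lands, capped at 100% (assumed rule)."""
    return min(1.0, jammer_strength / sensor_strength)


def boosted_sensor_strength(sensor_strength, bonus):
    """Sensor strength inside a hypothetical RECCM bubble.

    bonus is a fraction, so 1.0 means +100%.
    """
    return sensor_strength * (1 + bonus)


base = 16.0    # made-up sensor strength of the target
jammer = 8.0   # made-up jammer strength

print(jam_chance(jammer, base))                                # 0.5
print(jam_chance(jammer, boosted_sensor_strength(base, 1.0)))  # 0.25
```

Under that assumption, doubling a gang's sensor strength halves the chance of each jam landing, without touching the ECM ships at all.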

By making RECCM a more viable option for players and making it a clearly defined fun role to play, ECM gets more interesting, other pilots get jammed less and are happier, and balance can be restored to small gangs with the addition of a RECCM pilot or some extra ECCM modules on snipers. Nothing needs nerfing; the battlefield just needs evening up to give RECCM pilots a chance to swing the battle back to their side.

Edit: Dev update- they’re not going to nerf the Falcon to short range, and are making the Rook the brawler instead. Unfortunately they still seem to think Scorpions can survive at anything less than sniper range, and that they should not be able to fight beyond 140k. What the fuck?

Apocryhghyhwhahatever. It’s awesome.

OK, so Apocrypha (took a few goes) is now out on TQ.

First off- well done CCPers, you got it out without deleting any INI files and fairly smoothly. Next: WTF, CCPers?

rant do { What the heck is wrong with you? What made you keep the horrid, why-did-they-do-this BITS patcher? Why would you change from a single simple download over HTTP to a broken protocol implemented on top of HTTP that you can’t download using anything but a proprietary client? Why, after a huge chunk of the players on Sisi reported huge issues with this patcher and complained like there was no tomorrow, did you keep it in? I’m honestly stunned by that bit of madness.

On a side note, here is a real patch (1.5 gigs), and here’s the full client. On the topic of gripes (and the reason why I’ve just posted two links instead of one)- CCP, please stop using llwnd. Their CDN is throttled by various ISPs (including BT in the UK), so downloads are terribly slow. Even on calm days downloads cap out at 40kbps, whereas mirroring using wget to my server runs at around 2 megs a second and I can download from there at around 800kbps. At least provide some choices, or provide an official torrent of the patch. }

Now, the meat of the topic: Yes, the wormholes are awesome. It’s great to see them finally hit TQ, and the deployment was smooth as anything. Not as much in the way of awesome storyline moments, though- planets exploding managed to be boring in the trailer and the news feed wasn’t anywhere near as edge-of-the-seat as Empyrean. Still, it was a solid effort and didn’t disappoint. It’s good to see most of the bugs got fixed, and some of the more annoying UI gripes are cleaned up.

Also great to see is the welcome introduction of XML import/export of overview settings and fittings. I’ve written a few Ruby modules to parse, edit and write these files which I’ll be publishing as part of my EVE tools gem in the coming days. I finally figured I should roll up all my libraries and tools for EVE and pack them together with appropriate tests, so the overview/fitting parsers, the killmail parser and one or two other neat libraries will be appearing as a package soon via GitHub. I’m also planning to implement at least an overview sharing system in EVE Metrics, and possibly fittings later on. The uploader may even get expanded to allow you to submit fittings/overviews quickly and download new ones from the site via a URL handler. So that’s all good.

Looks like the other projector is a dead end. I’m borrowing a more serious multimeter/oscilloscope than what I have to do some further debugging, but the lack of docs is worrying to say the least- hopefully it’s fixable (the power-up issue was a loose connector between the IEC PCB and the power supply PCB, now I get a ‘circuit error’).