It's always been a cause for concern that Blizzard, due to their publisher connection, chose to run World of Warcraft's European servers out of France, a country not exactly renowned for its hardcore Internet expertise. Still, deal with it they did, with more or less the results you'd expect. Every patch downtime ran longer than in the US, the servers in general are less reliable than the US ones, latency to them is generally higher than US players get to theirs, and so on.
However, at the time of writing, things have taken a dramatic turn for the worse. The game is essentially unplayable for 40-man raiding, which is most of what we do in WoW these days, having played it since launch. It takes an enormous amount of effort to put a raid together. People sign up on our web raid calendar and an officer makes the selections. Then people show up, someone hands out potions from the raid bank, substitutions are made, players' items and potions are crafted or brewed on the spot, groups are set up, tactics are explained, everyone gets sorted on TeamSpeak, and you're ready to go.
Only to have half the raid disconnect as soon as you pull. That means a wipe, which means everyone has to run back, buff up and do it all over again. Blizzard are nothing if not a modern e-commerce firm, in that they make it pretty much impossible to engage with them one-on-one unless, of course, it's a billing issue. So people have had to turn to the technical support forum. When that filled up with crap, they finally made WoW*ConcernsEU*blizzard.com (delete the first asterisk, replace the second with an @) available for email. So I sent them an email, largely in futility, and the nuts and bolts of it are:
At the moment you're chucking a party every night and everyone shows up to discover there's no beer. Tell us there's no beer and we'll go to another party instead.
Scarily, we have about 130 accounts in our guild, which is well over a grand (in proper money) a month they're getting from us. Nice work if you can get it, and you'd think it would be worth a simple communication telling us what the hell is up and when they might fix it. The only thing they do say is that it's a network issue, but generally our latency is fine. We just get disconnected.
This leads me to speculate on what's going on, in a purely unproductive, just-for-fun sort of way. Some of it I can deduce from evidence; the rest is guesswork based on a pretty rudimentary knowledge of server stuff, and it's mostly the latter. I don't often speculate, so please take this with a large pinch of salt.
We know that they run the continents separately, perhaps as two game server processes on the same box. We also know that they have a separate instance server. It's not clear how much hardware that involves, but it wouldn't need to be on the same box. Certainly the fact that they've implemented cross-realm PvP would seem to indicate that they use different boxes for instance servers, or at least for PvP.
There are three general 'lag' issues you see in WoW, as near as I can tell. The first is your standard network latency. The in-game display for this is a bit daft, since it's averaged over a very long time, so even once your latency comes good, the indicator stays red for ages. As a result, people disconnect and reconnect just to turn it green, thinking they've fixed their latency when all they've actually done is clear the latency measuring buffer. They really ought to fix that.
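A minimal sketch of that effect, assuming the indicator is a plain average over a long, fixed window of samples. The `LatencyMeter` class, the window length and the `ewma` alternative are my own illustrative assumptions, not anything Blizzard has documented about their client:

```python
from collections import deque


class LatencyMeter:
    """Hypothetical latency display: a plain average over a long, fixed window."""

    def __init__(self, window: int) -> None:
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically

    def record(self, ms: float) -> None:
        self.samples.append(ms)

    def average(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def reset(self) -> None:
        # Roughly what a disconnect/reconnect does: the history is thrown away.
        self.samples.clear()


def ewma(samples: list[float], alpha: float = 0.3) -> float:
    """Exponentially weighted moving average: recent samples dominate,
    so a recovery shows up within a few samples."""
    value = samples[0]
    for s in samples[1:]:
        value = alpha * s + (1 - alpha) * value
    return value


history = [1500.0] * 100 + [120.0] * 50  # a bad spell, then latency recovers
meter = LatencyMeter(window=600)
for s in history:
    meter.record(s)

print(round(meter.average()))  # 1040: still "red" long after the recovery
print(round(ewma(history)))    # 120: tracks the current latency
meter.reset()
meter.record(120.0)
print(meter.average())         # 120.0: "fixed" only by clearing the buffer
```

A shorter window, or an exponentially weighted average like `ewma`, would make the indicator follow the connection you have now rather than the one you had ten minutes ago.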
The second lag issue is what I'll call server lag. This is the delay before the server acknowledges an action you've performed. If you've got server lag, you'll see the start of a casting animation, for example, but nothing more, and the action won't complete. It also shows up as chat lag. This lag comes and goes all the time, even when the servers are working okay-ish.
The third lag is what I'll call database lag. Anything you do that changes your own or anyone else's character data pretty obviously requires a write to a database. The most obvious way this shows up is 'trade lag': you trade someone a simple item and it takes ages to complete after you've both hit accept. Warlocks know this lag too, because unfortunately a DB write is triggered when you Drain Soul a mob just before it dies to get a soul shard. Often the game eventually gives up and you don't get your shard at all.
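To illustrate why a trade is a database write rather than a simple message, here's a minimal sketch using Python's built-in `sqlite3` and a made-up, heavily simplified inventory table. It assumes nothing about Blizzard's real schema (the `inventory` table and the `open_db` and `trade` helpers are hypothetical). The only point is that both characters' records have to change in a single transaction, and the trade can't complete until that commit returns:

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Hypothetical, heavily simplified character-inventory table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE inventory (character TEXT, item TEXT)")
    db.executemany(
        "INSERT INTO inventory VALUES (?, ?)",
        [("Alice", "Major Mana Potion"), ("Bob", "Runecloth")],
    )
    db.commit()
    return db


def trade(db: sqlite3.Connection, a: str, item_a: str, b: str, item_b: str) -> None:
    """Swap two items once both players have hit accept.

    Both writes commit together or not at all, so nobody can end up with both
    items or neither. Until the commit returns, the trade window just hangs."""
    with db:  # transaction: commits on success, rolls back on an exception
        cur = db.execute(
            "UPDATE inventory SET character = ? WHERE character = ? AND item = ?",
            (b, a, item_a))
        if cur.rowcount != 1:
            raise ValueError(f"{a} does not have {item_a}")
        cur = db.execute(
            "UPDATE inventory SET character = ? WHERE character = ? AND item = ?",
            (a, b, item_b))
        if cur.rowcount != 1:
            raise ValueError(f"{b} does not have {item_b}")


db = open_db()
trade(db, "Alice", "Major Mana Potion", "Bob", "Runecloth")
print(db.execute("SELECT character, item FROM inventory ORDER BY item").fetchall())
# [('Bob', 'Major Mana Potion'), ('Alice', 'Runecloth')]
```

On a busy shared database, that commit is queued behind everyone else's writes, which fits the trade (and soul shard) behaviour described above.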
So, imagining a server topology per 'server' (Runetotem, say), we have two realm server processes that act like the game servers we're familiar with from games like Quake. They just handle people walking around and talking, and between them they cover WoW's two big island continents. Presumably they'll add a third for the expansion. We also have a big-ass database which, I think we can reasonably assume, is common to everything on the server.
Note: further rampant speculation. For cross-realm PvP, I think each PvP instance box has read/write access to the database of every game server it serves. When you zone in, it takes a copy of your current character, and when you zone out, it writes that copy back to your home server's DB. This wasn't working very well after the patch, and I suspect the hardware upgrades were associated with it somehow.
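Here's that copy-on-zone-in, write-back-on-zone-out idea as a toy model. Every class and name in it (`RealmDB`, `BattlegroundServer`, `zone_in`, `zone_out`) is hypothetical and purely in-memory. A real system would have network hops and failure handling between these steps, which is exactly where I'd expect trouble:

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Character:
    name: str
    honor: int = 0
    items: list[str] = field(default_factory=list)


class RealmDB:
    """Stand-in for one realm's character database (hypothetical)."""

    def __init__(self) -> None:
        self.rows: dict[str, Character] = {}

    def load(self, name: str) -> Character:
        return copy.deepcopy(self.rows[name])  # hand out a copy, not the record

    def save(self, char: Character) -> None:
        self.rows[char.name] = copy.deepcopy(char)


class BattlegroundServer:
    """Shared box serving players from several realms (hypothetical)."""

    def __init__(self) -> None:
        self.active: dict[str, tuple[Character, RealmDB]] = {}

    def zone_in(self, name: str, home: RealmDB) -> Character:
        # Take a working copy of the character from its home realm's DB.
        self.active[name] = (home.load(name), home)
        return self.active[name][0]

    def zone_out(self, name: str) -> None:
        # Write the (possibly changed) copy back to the home realm's DB.
        char, home = self.active.pop(name)
        home.save(char)


runetotem = RealmDB()
runetotem.save(Character("Alice"))
bg = BattlegroundServer()
alice = bg.zone_in("Alice", runetotem)
alice.honor += 50
print(runetotem.rows["Alice"].honor)  # 0: the home DB hasn't seen it yet
bg.zone_out("Alice")
print(runetotem.rows["Alice"].honor)  # 50
```

If the write-back step is slow or fails, the home realm is left holding a stale copy of the character.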
We also have instance server processes. We don't know whether they're on the same box. Moving them elsewhere would be a really obvious way of spreading the server load, since instances are by their very nature quite bandwidth-intensive compared to the regular old world server. So I suspect they're on a different box, but we can't know that. On the realm's instance box, instances are, I guess, simply spawned as processes when people enter.
Which brings us to the issue we're having right now. We all zone into the instance and people disconnect. Why? I don't know. I think it can be shown that WoW disconnects people when the server is trying to send them a lot of data and they can't keep up, either because there's a network problem along the way or because the client itself falls behind. There has to be a client component, because some people are more susceptible than others, and some have reduced how often they disconnect by removing mods, which is quite strange.
But here's the thing that isn't immediately obvious to people running WoW. Mods register triggers for events, and when an event happens, functions run and Lua code starts doing stuff. Whenever it does stuff, it tends to call predefined functions to check the status of this or that, normally just to initialise variables for later use. I might not be explaining this very well. It's a bit like having a person who may be called upon to tell you how many red objects are in the room. The catch is that this person is in a black box, so he has to gather information about the scene once, before we close the lid. WoW mods are like that. They don't run all the time; many of them are actually triggered when you zone into an instance. You can see them all fire as they scroll through the little combat/general log windows.
No problem there, you think? Well, actually there might be. I don't think WoW pulls down every bit of data about everything; in fact, I know it doesn't. If you call a function that checks, say, the targeted player's reputation with a given faction, then somewhere deep in WoW's binary it has to inject a request for that data into the upstream connection to the server. It then has to get a result back (incidentally stalling your Lua script, and often your level load or zone-in) before the Lua function can return the data your script asked for. Why is this a problem?
Well, we run a LOT of mods. All of them are trying to memorise quite a lot of stuff about the instance, all of them end up injecting those requests into your connection to the server, and then they sit waiting for the results before their Lua scripts can exit. This takes a while, and it's quite a lot of data per client. At some point the instance server process appears to get unhappy with this and disconnects you. Exactly under what conditions, it's impossible to tell.
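A toy model of what I mean, with everything in it hypothetical. Mods register handlers for a zone-in event, each handler queries some data, and any value the client doesn't already hold costs a full round trip to the server while the script waits. The `"ZONE_IN"` event name, the `rtt_ms` figure, the `ClientDataCache`/`EventBus` classes and the faction queries are all made up for illustration. I don't know how WoW's client actually batches or caches these requests:

```python
from collections import defaultdict


class ClientDataCache:
    """Hypothetical client-side store: hits are free, misses block for a round trip."""

    def __init__(self, server: dict, rtt_ms: float) -> None:
        self.server = server    # data only the server holds
        self.local: dict = {}   # data the client already has
        self.rtt_ms = rtt_ms
        self.requests_sent = 0
        self.stalled_ms = 0.0   # time scripts (and the zone-in) spent waiting

    def query(self, key):
        if key not in self.local:
            # Miss: inject a request upstream and wait one round trip for the reply.
            self.requests_sent += 1
            self.stalled_ms += self.rtt_ms
            self.local[key] = self.server[key]
        return self.local[key]


class EventBus:
    """Minimal event dispatcher: handlers run one after another, in order."""

    def __init__(self) -> None:
        self.handlers = defaultdict(list)  # event name -> list of callables

    def register(self, event: str, handler) -> None:
        self.handlers[event].append(handler)

    def fire(self, event: str) -> None:
        for handler in self.handlers[event]:
            handler()


factions = ("Argent Dawn", "Zandalar Tribe", "Brood of Nozdormu")
cache = ClientDataCache({("rep", f): "Friendly" for f in factions}, rtt_ms=150)
bus = EventBus()


# Three hypothetical mods, each snapshotting some state when you zone in.
def mod_a():
    cache.query(("rep", "Argent Dawn"))
    cache.query(("rep", "Zandalar Tribe"))


def mod_b():
    cache.query(("rep", "Brood of Nozdormu"))


def mod_c():
    cache.query(("rep", "Argent Dawn"))  # already cached by mod_a: no round trip


for mod in (mod_a, mod_b, mod_c):
    bus.register("ZONE_IN", mod)  # "ZONE_IN" is a made-up event name
bus.fire("ZONE_IN")
print(cache.requests_sent, cache.stalled_ms)  # 3 450.0
```

Here three uncached values cost three serial round trips. A client that batched the requests, or fetched them before firing the event, could pay roughly one.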
So why is it worse now than it was before? I have no idea. However, if I had to guess, I'd say the instance server is now running on the cross-realm PvP hardware. It might be loaded by players from several servers and therefore busier than it was. That's the only real change in the last patch. It's also the kind of change that would reasonably require a hardware upgrade, and that upgrade is what immediately preceded the disaster we're currently seeing.
An instance server ends up facing the same sort of challenges as a busy web server: lots of people connecting and wanting stuff. The more people do it, the longer it takes to serve each connection, and at some point you fail miserably to cope. The list of open connections, all slowed to a crawl, gets so long that you have no choice but to time them out or run out of memory. You also want to time some of them out to free the process up to serve someone else. On a web server you get a "too many users" message; in WoW, you disconnect.
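A toy queueing model of that web-server comparison, with all numbers invented. Clients arrive each tick, the server serves a fixed number of the oldest waiters, and anyone who has waited longer than the timeout gets dropped. The `simulate` function and its parameters are hypothetical. This sketches the general capacity problem, not how WoW's instance server is actually built:

```python
from collections import deque


def simulate(arrivals_per_tick: int, capacity_per_tick: int,
             timeout_ticks: int, ticks: int) -> int:
    """Return how many clients get dropped for waiting too long.

    Each tick: new clients arrive, the server serves up to capacity_per_tick
    of the oldest waiters, and anyone who has waited more than timeout_ticks
    is timed out (a web server's 'too many users', WoW's disconnect)."""
    waiting = deque()  # arrival tick of each waiting client, oldest first
    dropped = 0
    for now in range(ticks):
        waiting.extend([now] * arrivals_per_tick)
        for _ in range(min(capacity_per_tick, len(waiting))):
            waiting.popleft()
        while waiting and now - waiting[0] > timeout_ticks:
            waiting.popleft()
            dropped += 1
    return dropped


print(simulate(arrivals_per_tick=10, capacity_per_tick=10, timeout_ticks=5, ticks=100))  # 0
print(simulate(arrivals_per_tick=12, capacity_per_tick=10, timeout_ticks=5, ticks=100))  # non-zero
```

Once demand exceeds capacity, a longer timeout only delays the drops while the queue (and memory use) grows. The real fixes are more capacity or less work per client.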
You can, I think, improve things a bit by removing mods, because for the reasons above you're requesting less data during your level load. But really it's a straight capacity issue and a software design flaw.
Where it gets really strange, and this subverts the above theory to some extent, is that the disconnection issue does tend to resolve itself once you've reduced the number of live mobs in the instance. The famous Blackwing Lair 'suppression rooms', crammed with many hundreds of monsters, are the classic example. I say 'subverts' because your mods have no data to query about the mobs in view, other than the one you've targeted. For the number of mobs to matter, we have to get into a subject I do know a little about: network code in action multiplayer games.
Normally, games try to cull entities that are beyond a certain distance as the crow flies. They do still send you data about entities you can't see that are within that distance, partly because you might hear them and partly because it saves the computational cost of working out whether you can actually see something.
I think it can be empirically shown that WoW sends the basic location and movement vectors of every mob in an instance, regardless of whether you can see them or how far away you are. (Well, okay, there's probably a maximum visibility distance, but it's a long way.) You can often break WoW a little and target something through a wall from a mile away. AQ40 is a good example: you can target something much later in the instance by looking left around the left point of that rock outcrop. BWL is a particularly bad instance in this regard because it's built a bit like a cube. As the crow flies, you're very close to a huge number of mobs the moment you zone in through the front door.
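Distance-based culling can be sketched like this. The coordinates, the radius and the `relevant_entities` function are invented for illustration. The sketch also shows why a compact, cube-like instance such as BWL defeats this kind of culling: mobs on another floor or behind a wall can still be close in a straight line. It's a generic technique, not a claim about WoW's actual code:

```python
import math

Vec3 = tuple[float, float, float]


def relevant_entities(player: Vec3, entities: dict[str, Vec3], radius: float) -> list[str]:
    """Distance-based relevance culling: send only entities within `radius`
    of the player as the crow flies. Walls are ignored, which is cheap but
    means a mob behind a wall still counts as nearby."""
    return [name for name, pos in entities.items() if math.dist(player, pos) <= radius]


# Hypothetical coordinates: a cube-like instance packs a lot of mobs into a
# small straight-line distance, even though walls and floors separate them.
entrance = (0.0, 0.0, 0.0)
mobs = {
    "whelp (room ahead)": (10.0, 0.0, 0.0),
    "drake (floor above)": (0.0, 30.0, 40.0),
    "Nefarian (far end)": (300.0, 200.0, 50.0),
}
print(relevant_entities(entrance, mobs, radius=60.0))
# ['whelp (room ahead)', 'drake (floor above)']
```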
So does this subvert my point or back it up? Now, not only do we have your scripts and the WoW client itself querying a load of variables and issuing server requests through your connection, but the damn instance server is also trying to send you updates for every single mob in the instance at the same time. And it's worse than that. Even the most basic netcode (and we'll assume WoW's is exactly that) doesn't send full 3D coordinates and facing direction for every mob in every update. Instead, the data is somewhat compressed, and while there are various schemes, it basically says 'mob x moved left 3 meters' or, if it's smarter, 'mob x is moving left at 3 meters per second'. The client then takes the previously known position and applies those changes.
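The two update styles described above, as a minimal sketch. The `MobState` layout and the helper names are my own invention, and real netcode would also quantise and pack these values into far fewer bytes:

```python
from dataclasses import dataclass


@dataclass
class MobState:
    x: float
    y: float
    vx: float = 0.0  # meters per second
    vy: float = 0.0


def apply_delta(state: MobState, dx: float, dy: float) -> MobState:
    """'Mob x moved left 3 meters': add the offset to the last known position."""
    return MobState(state.x + dx, state.y + dy, state.vx, state.vy)


def extrapolate(state: MobState, dt: float) -> MobState:
    """'Mob x is moving left at 3 meters per second': the client predicts the
    position itself until the server sends a correction (dead reckoning)."""
    return MobState(state.x + state.vx * dt, state.y + state.vy * dt, state.vx, state.vy)


mob = MobState(x=100.0, y=50.0)
print(apply_delta(mob, dx=-3.0, dy=0.0))  # MobState(x=97.0, y=50.0, vx=0.0, vy=0.0)
moving = MobState(x=100.0, y=50.0, vx=-3.0)
print(extrapolate(moving, dt=2.0))        # MobState(x=94.0, y=50.0, vx=-3.0, vy=0.0)
```

Both styles only work if the client already holds a correct starting position.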
There is one vital exception to this: when you zone in. Your client has no idea where the mobs are, so it has to receive full 3D positions, model IDs and modification attributes, alignments, animation states and so on, all at once. Only once it has all that can the server switch to sending you deltas in the compact form.
So what we have is this: you're zoning in, the instance server is telling you about every mob in the suppression rooms and probably fecking Nefarian as well, and your scripts are screaming for updates at the same time. The absolute data sent to you was only correct for a moment, and if mobs are moving around (and in the suppression rooms they are), you've also got a stack of delta data piling up that you need to receive before you're properly 'zoned in'.
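Some back-of-envelope arithmetic for that, as a sketch with made-up numbers. I have no idea what WoW's real per-mob record sizes or link rates are, and `zone_in_backlog` is a hypothetical helper. It optimistically assumes the snapshot gets the whole link to itself while the deltas for moving mobs queue up behind it:

```python
def zone_in_backlog(mobs: int, moving: int, full_bytes: int,
                    delta_bytes: int, link_bytes_per_tick: int) -> tuple[int, int]:
    """Return (ticks until the full snapshot has arrived,
    bytes of delta updates queued behind it by then).
    All sizes are hypothetical placeholders."""
    snapshot = mobs * full_bytes
    ticks = -(-snapshot // link_bytes_per_tick)  # ceiling division
    # Every moving mob keeps producing a delta each tick while you wait.
    backlog = ticks * moving * delta_bytes
    return ticks, backlog


# A packed instance versus the same instance after clearing most of it.
print(zone_in_backlog(mobs=400, moving=150, full_bytes=100, delta_bytes=10, link_bytes_per_tick=4000))  # (10, 15000)
print(zone_in_backlog(mobs=100, moving=20, full_bytes=100, delta_bytes=10, link_bytes_per_tick=4000))   # (3, 600)
```

Cutting the mob count shrinks both numbers at once, which would fit the observation that clearing the suppression rooms makes the disconnects go away.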
This brings me back to my original conclusion: bad, or at least incomplete, software design. Much of this problem has been solved in the past by people making multiplayer games, and Blizzard have made things worse for themselves with this event-triggering system for interface add-ons. With the number of people playing, I shudder to think how much combined bandwidth a modern netcode strategy would save. Significant, I think. Highly fucking significant.
The odd thing is, Blizzard are no strangers to optimisation. You may not recall how bad Ironforge (IF) was at one point, in particular how long it took to load into it. They changed the code to zone you in first and then stream the entity updates to you after you'd loaded, so other players popped into the world around you. They don't, however, appear to have applied that strategy to instances and mobs, and now we're paying for it. There might be a reason for that, I guess: you could zone in and get past some mobs near the door because they hadn't popped into your world yet. I'd say the easiest fix would be to apply the incremental entity loading and simply lock the player's position until it's complete and the world has caught up.
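The fix I'm suggesting, as a rough sketch. Entities stream in a few per tick after the level has loaded, and the player's position stays locked until the backlog is empty. `ZoneIn`, `per_tick` and the entity distances are hypothetical. Loading the nearest entities first is my own addition, so that the mobs by the door are the first to appear:

```python
from collections import deque


class ZoneIn:
    """Hypothetical incremental zone-in: the world appears at once, entities
    stream in a few per tick, and the player stays locked in place until every
    entity has arrived, so nobody can slip past a mob that hasn't popped in yet."""

    def __init__(self, entities: dict[str, float], per_tick: int) -> None:
        # Nearest entities first (values are distances from the player).
        self.pending = deque(sorted(entities, key=entities.get))
        self.loaded: list[str] = []
        self.per_tick = per_tick

    @property
    def player_can_move(self) -> bool:
        return not self.pending

    def tick(self) -> None:
        for _ in range(min(self.per_tick, len(self.pending))):
            self.loaded.append(self.pending.popleft())


zone = ZoneIn({"door guard": 5.0, "whelp": 20.0, "drake": 50.0, "Nefarian": 400.0}, per_tick=2)
ticks = 0
while not zone.player_can_move:
    zone.tick()
    ticks += 1
print(ticks, zone.loaded)  # 2 ['door guard', 'whelp', 'drake', 'Nefarian']
```

The player sees the world straight away and loses only a moment of movement, instead of the whole raid stalling on one giant snapshot.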
At any rate, I hope the above illustrates why I believe the problems we're facing are less about hardware and network performance than about software design. That being the case, you start to understand why they're not saying anything about a time frame. Or indeed anything at all.