View Issue Details

ID: 0000058
Project: Zandronum
Category: [All Projects] Bug
View Status: public
Date Submitted: 2010-09-25 04:52
Last Update: 2018-09-30 19:47
Reporter: AlexMax
Assigned To: Torr Samaho
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Platform: Linux
OS: Ubuntu
OS Version: 10.04 x86
Product Version: 98c
Target Version: (none)
Fixed in Version: 98d
Summary: 0000058: Skulltag doesn't play nice with Linode VPS.
Description: I really wish that I had a better explanation for you than this, but this is what I have to go on.

When I run skulltag-server instances on a Linode VPS, the server will 'hiccup' every so often. By 'hiccup', I mean that all players either experience latency or a few seconds' worth of "waiting for server".

The reason I don't think this is a host problem is that the VPS's performance is not affected at all when Skulltag is pitching a fit. If I very quickly alt-tab over to an open shell and test its responsiveness, it responds fine, the CPU is well within normal limits, as is the RAM, and there is no obvious latency in input, all while the server is still "waiting for server".

This is at the New Jersey location, by the way, in case anyone on the team already has a Linode.
Attached Files: st_weirdlag.pcap (1,161,529 bytes) 2010-09-26 17:19

- Relationships

- Notes
(0000153)
Eruanna (reporter)
2010-09-25 08:50
edited on: 2010-09-25 08:51

I can't fully confirm this, as I don't have this problem often on Obsidian - however, I have occasionally received reports of such problems on there. Obsidian is a London Linode. Since it never seems to happen that often when I am playing, I really can't put my personal experience into this - but anyone who plays on Obsidian regularly might be able to.

Most of the time my issues with Obsidian are network related - the datacenter has been having issues over the past couple of weeks.

(0000155)
AlexMax (developer)
2010-09-25 14:45
edited on: 2010-09-25 16:08

Just out of curiosity, what distribution and kernel do you use? My Linode is located in New Jersey and was using Ubuntu (I'm testing Debian to see if it has the same issues). For what it's worth, a friend of mine hosts a single server at their Texas location and never has any issues.

(0000156)
TIHan (reporter)
2010-09-25 17:43

I have a Linode in Atlanta that I have been using for a month, and I haven't had any problems with "waiting for server". I would like to know what might be causing it so I can help you out. I'm running on Debian 5.0.
(0000176)
Torr Samaho (administrator)
2010-09-25 22:05

Are you sure that this only happens when servers are running on a Linode VPS?
(0000177)
AlexMax (developer)
2010-09-25 23:06

I actually moved away from Linode a year and a half ago because of these sorts of issues. I happened to move back, and what do you know, it persists.

However, it's not consistent. Today I have not had any issues. What sort of information would be useful to you when it starts exhibiting that behavior again? I was planning on doing a tcpdump when I see it happening, is there anything else I should be recording?
(0000187)
AlexMax (developer)
2010-09-26 17:20

I had complaints about this last night. The attached file is a tcpdump of all UDP activity on my server.

I have taken a tcpdump of an offending 'skip' and have pinpointed one of the times where there is a hiccup. By filtering out all packets except those going to and from my personal computer (ip.addr == 174.108.7.186), I pinpointed a particular incident where I remember having a conversation with players on my server: I say "like now?" and the words took forever to get there. Using Wireshark on the log with the filter given in parentheses above, you can clearly see that where the server usually sends a bunch of UDP packets per second to my IP, there are none from second 27.944 to 29.072, in the space of which my client attempts two packets. Clearing the filter at that point, you can see that the server sends out no packets at all, except for Skulltag master heartbeat queries, from 27.944 to 29.070.

Seeing as tcpdump was run locally on my server's connection, it appears at first glance that this is not a network issue, but rather an issue with the server getting 'stuck' for a second or so.
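As an aside, the same view of the attached st_weirdlag.pcap can be reproduced from the command line, assuming a tshark recent enough to accept -Y for display filters (older builds use -R instead); the IP is the client address quoted above:

$ tshark -r st_weirdlag.pcap -t r -Y "ip.addr == 174.108.7.186 && udp"

With relative timestamps (-t r) you can scroll straight to the 27.944-29.072 gap described above, and dropping the -Y filter shows only the master heartbeat queries going out during that window.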
(0000215)
AlexMax (developer)
2010-10-02 15:53

Any ideas here? I've got the Linode for another month, so I'd really like to get this resolved soonish.
(0000220)
Torr Samaho (administrator)
2010-10-02 20:53

Does the server make any DNS lookup during the time it's not responding? Or does it reparse any of your IP lists (banlist/whitelist/adminlist)?
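As an aside, a minimal way to check for stray DNS activity from outside the game code, assuming strace is installed on the VPS and the binary is named skulltag-server as in the launch steps later in this report, is to trace the server's network syscalls while it runs; a blocking lookup would show up as traffic to port 53 sitting in the trace for the whole stall:

# strace -f -tt -e trace=network -p $(pidof skulltag-server)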
(0000255)
AlexMax (developer)
2010-10-03 17:42

I'm in the process of getting a tcpdump that will show this, as the old tcpdump only shows UDP connections.

In the meantime, this is how I will recreate my testing environment, step by step, so you know exactly what I'm working with:

1. Make sure you have a Linode 512 hosted at Newark, New Jersey. It should have nothing else on it at all.
2. From the main screen, click "Deploy a Linux distribution". On the next screen, select Ubuntu 10.04 LTS as your distro, swap disk is 512 MB and your deployment disk drive is 15872 MB. Put in a root password and click okay.
3. Once your distro is set up, boot into it. SSH into it with either putty or command line SSH.
4. At the command prompt:
# aptitude install libsdl1.2-dev screen tcpdump
That will install everything having to do with SDL in one fell swoop.
5. At the command prompt:
# adduser yournickname
# su yournickname
We're going to create our own user for this thing.
6. At the command prompt:
$ mkdir ~/skulltag
$ cd ~/skulltag
$ wget 'http://skulltag.net/download/files/testing/98d/SkullDev98d-2954linux-x86.tar.bz2'
$ wget 'http://www.skulltag.com/download/files/release/st-v098c_linux-base.tar.bz2'
$ tar jxvf SkullDev98d-2954linux-x86.tar.bz2
$ tar jxvf st-v098c_linux-base.tar.bz2
This installs Skulltag. Nothing to see here. Make sure to grab doom2.wad somehow too; I can't give exact steps for that since it depends on where you get doom2.wad from.
7. Close your SSH session and log back in as your user. From there, at the command prompt:
$ screen
$ cd skulltag
$ ./skulltag-server -port 10666 +map d2ctf1
8. Start playing. If you don't see any "skipping", try again a few hours later. Once you do, create a new screen tab (Ctrl-a then c) and then:
$ su
Password: <put in your root password>
$ tcpdump -Xs0 > ~/traffic.pcap
Go back to your server, wait for at least one good "skip", and then Ctrl-C out of tcpdump (a variant that writes a Wireshark-readable capture file is sketched just below).
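A variant of that capture step, assuming the server is listening on UDP 10666 as in step 7 and the VPS's network interface is eth0, writes a file Wireshark can open directly instead of a redirected text dump:

# tcpdump -i eth0 -s0 -w ~/traffic.pcap udp port 10666

The -s0 keeps full packet contents, and the BPF filter limits the capture to the game server's traffic.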

I'll let you know what the results are...
(0000299)
AlexMax (developer)
2010-10-10 00:48

This doesn't seem to happen consistently; in fact, performance has thus far ranged from good to just a hiccup here and there, nothing at all like what I was running into a while back.

I'll let you know when I have another tcpdump ready for you. In the meantime, I have a ton of FFA servers set up on the Linode, and hopefully I can get some people on there to test while I'm observing the hiccuping...
(0000430)
AlexMax (developer)
2010-10-17 19:52

I decided to cancel my Linode VPS subscription and get another VPS. So far, I haven't had the same kind of issues... though AOW2 did have one huge hiccup, but that might have been AOW2 itself, and it's nowhere near as choppy as it was on Linode.
(0000431)
AlexMax (developer)
2010-10-17 20:26

Oh wow, on this new VPS I'm getting a similar symptom. You know that "AOW2 huge hiccup" I mentioned? It just happened again on my Private CTF server. I'll get a tcpdump to you guys as soon as I can...
(0000433)
AlexMax (developer)
2010-10-17 22:22
edited on: 2010-10-17 22:26

Okay, I got a tcpdump. My lists were empty at the time of the hiccup. The log is too huge to post here, but again: at second 1022.33944 the server communicates with a specific IP that I'm following, let's call it #.#.#.#. #.#.#.# sends a UDP packet to the server at 1022.36501, 1022.38774, 1022.41871, and 1022.44244. And... silence from the server until second 1042.02150, over 20 seconds later! Expanding things to look at ALL packets, there seems to be a huge blackout of server communication for about 20 seconds.

So was the entire server frozen? I don't think so. One time when this was happening, a few hours before while I was still connected, I alt-tabbed from the server, moved my cursor around, and didn't hear the server catch up for at least a few more seconds.

So yeah, that's it. Nothing else was going on, no DNS, no nothing.
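A quick way to spot that kind of blackout without eyeballing individual packets, assuming the same tshark as above and a capture file named traffic.pcap (substitute whatever the dump was actually saved as), is to ask for per-second packet counts; the stalled interval shows up as a run of near-zero rows:

$ tshark -r traffic.pcap -q -z io,stat,1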

(0000436)
AlexMax (developer)
2010-10-18 16:18
edited on: 2010-10-18 16:18

Had an interesting conversation with InterServer, my new VPS host:

"Does Skulltag use UDP?

In /proc/user_beancounters I see some failures in dgramrcvbuf

dgramrcvbuf 0 262144 262144 262144 101894

Nothing else has hit a fail count.

dgramrcvbuf = "The total size of receive buffers of UDP and other datagram protocols." "

Their VPS management page also provides a neat little graph for dgramrcvbuf_f (the failure rate of dgramrcvbuf). Lo and behold, I saw a spike that corresponded exactly to the lag I was getting on my server. Reading up on dgramrcvbuf, I found that it's a per-container resource limit; this new VPS uses OpenVZ, but I suspect Xen has something similar:

http://wiki.openvz.org/UBC_secondary_parameters#dgramrcvbuf

At the end of the day, they increased a setting on my VPS called numothersock, which is described here:

http://wiki.openvz.org/UBC_primary_parameters#numothersock

Hopefully, this will solve the problems. However, is there anything abnormal that Skulltag might be doing to run up against this limit?
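To watch those counters directly on the container, assuming the standard OpenVZ /proc/user_beancounters layout quoted above (readable as root), the last column is the failure count, so a spike during a hiccup is easy to catch:

# grep -E 'dgramrcvbuf|numothersock' /proc/user_beancounters
# watch -n 5 "grep -E 'dgramrcvbuf|numothersock' /proc/user_beancounters"

The second line re-checks every five seconds, which is handy while players are reporting lag live.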

(0000453)
Torr Samaho (administrator)
2010-10-23 19:56

A Skulltag server should only allocate one socket when it's started and is supposed to keep using it from then on. Do you know what numothersock was set to before and what it is set to now?
(0000454)
AlexMax (developer)
2010-10-23 22:45
edited on: 2010-10-23 22:45

No idea what it was set to before, but this is what the numbers are set to now:

       uid  resource       held  maxheld  barrier   limit  failcnt
            dgramrcvbuf       0   351424   352144  352144    25644
            numothersock     22       44      460     460        0

This is what it looks like on a day with a 22/24 AOW2 server and an 8/15 Private CTF server. That failcnt is a tally going back to October 19th, since which time both numothersock and dgramrcvbuf itself were increased. Why dgramrcvbuf itself? Well, according to the wiki:

"The dgramrcvbuf limits usually don't need to be high. Only if the Container needs to send and receive very large datagrams, the barriers for both othersockbuf and dgramrcvbuf parameters should be raised."

After reading this, I concluded that instead of overwhelming the number of other sockets, Skulltag was sending very large UDP packets back and forth, and overflowing the buffer with packets that are too damn big. The first time I had them bump the number, the graph that showed failures was still going up under heavy traffic, but it was halved. Since then, they've bumped it twice, and tonight is the first time since then that there have been a significant number of players on my AOW2 and Priv CTF servers. I'll let you know how many failures I get tonight.

(0000455)
AlexMax (developer)
2010-10-24 00:13
edited on: 2010-10-24 00:16

Just so you know, it's still overflowing the buffer every so often with those numbers. The stoppages aren't nearly as bad, but people are still complaining. :(

(0000460)
Torr Samaho (administrator)
2010-10-24 08:43

If it's a problem with the UDP packet size, you can try to lower sv_maxpacketsize. The default is 1400, I'd try lowering it to 1024.
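Assuming the usual +cvar convention of ZDoom-derived ports, that setting could be applied on the launch line used earlier (or placed as sv_maxpacketsize 1024 in the server's config file); the exact value here is just the suggested starting point of 1024:

$ ./skulltag-server -port 10666 +map d2ctf1 +sv_maxpacketsize 1024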
(0000485)
AlexMax (developer)
2010-10-29 21:19
edited on: 2010-10-29 21:20

Aha! That seems to have fixed things. I've been running without any dgramrcvbuf failures for a couple of days now, and my servers have been under an AOW2-sized load a few times since then. This might be a useful thing to add to the wiki, by the way.

Thanks!

(0000499)
Torr Samaho (administrator)
2010-11-01 11:54

I'm glad to hear that! Since the current default value of 1400 seems to cause problems, I'm lowering the default value to 1024 again (like it was in 98b and older versions).


- Issue History
Date Modified Username Field Change
2010-09-25 04:52 AlexMax New Issue
2010-09-25 08:50 Eruanna Note Added: 0000153
2010-09-25 08:51 Eruanna Note Edited: 0000153
2010-09-25 14:45 AlexMax Note Added: 0000155
2010-09-25 14:46 AlexMax Note Edited: 0000155
2010-09-25 16:08 AlexMax Note Edited: 0000155
2010-09-25 17:43 TIHan Note Added: 0000156
2010-09-25 22:05 Torr Samaho Note Added: 0000176
2010-09-25 22:05 Torr Samaho Status new => feedback
2010-09-25 23:06 AlexMax Note Added: 0000177
2010-09-25 23:06 AlexMax Status feedback => new
2010-09-26 17:19 Anonymous File Added: st_weirdlag.pcap
2010-09-26 17:20 Anonymous Note Added: 0000186
2010-09-26 17:20 Anonymous Note Deleted: 0000186
2010-09-26 17:20 AlexMax Note Added: 0000187
2010-10-02 15:53 AlexMax Note Added: 0000215
2010-10-02 20:53 Torr Samaho Note Added: 0000220
2010-10-02 20:53 Torr Samaho Status new => feedback
2010-10-03 17:42 AlexMax Note Added: 0000255
2010-10-03 17:42 AlexMax Status feedback => new
2010-10-09 15:13 Torr Samaho Status new => feedback
2010-10-10 00:45 Anonymous Note Added: 0000298
2010-10-10 00:45 Anonymous Note Deleted: 0000298
2010-10-10 00:48 AlexMax Note Added: 0000299
2010-10-10 00:48 AlexMax Status feedback => new
2010-10-17 19:52 AlexMax Note Added: 0000430
2010-10-17 20:26 AlexMax Note Added: 0000431
2010-10-17 22:22 AlexMax Note Added: 0000433
2010-10-17 22:26 AlexMax Note Edited: 0000433
2010-10-18 16:18 AlexMax Note Added: 0000436
2010-10-18 16:18 AlexMax Note Edited: 0000436
2010-10-23 19:56 Torr Samaho Note Added: 0000453
2010-10-23 19:57 Torr Samaho Status new => feedback
2010-10-23 22:45 AlexMax Note Added: 0000454
2010-10-23 22:45 AlexMax Status feedback => new
2010-10-23 22:45 AlexMax Note Edited: 0000454
2010-10-24 00:13 AlexMax Note Added: 0000455
2010-10-24 00:16 AlexMax Note Edited: 0000455
2010-10-24 08:43 Torr Samaho Note Added: 0000460
2010-10-24 08:43 Torr Samaho Status new => feedback
2010-10-29 21:19 AlexMax Note Added: 0000485
2010-10-29 21:19 AlexMax Status feedback => new
2010-10-29 21:20 AlexMax Note Edited: 0000485
2010-11-01 11:54 Torr Samaho Note Added: 0000499
2010-11-01 11:54 Torr Samaho Status new => resolved
2010-11-01 11:54 Torr Samaho Fixed in Version => 98d
2010-11-01 11:54 Torr Samaho Resolution open => fixed
2010-11-01 11:54 Torr Samaho Assigned To => Torr Samaho
2012-06-09 13:22 Torr Samaho Category General => Bug
2018-09-30 19:47 Blzut3 Status resolved => closed





