View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update | ||||
0000058 | Zandronum | [All Projects] Bug | public | 2010-09-25 04:52 | 2018-09-30 19:47 | ||||
Reporter | AlexMax | ||||||||
Assigned To | Torr Samaho | ||||||||
Priority | normal | Severity | minor | Reproducibility | always | ||||
Status | closed | Resolution | fixed | ||||||
Platform | Linux | OS | Ubuntu | OS Version | 10.04 x86 | ||||
Product Version | 98c | ||||||||
Target Version | Fixed in Version | 98d | |||||||
Summary | 0000058: Skulltag doesn't play nice with Linode VPS. | ||||||||
Description | I really wish that I had a better explanation for you than this, but this is what I have to go on. When I run skulltag-server instances on a Linode VPS, the server will 'hiccup' every so often. By hiccup, I mean that all players either experience latency or see a few seconds' worth of "waiting for server". The reason I don't think this is a host problem is that the performance of the VPS is not affected at all while Skulltag is pitching a fit. If I very quickly alt-tab over to an open shell and test its responsiveness, it responds fine; the CPU is well within normal limits, as is the RAM, and there is no obvious input latency, all while the server is still "Waiting for server". This is at the New Jersey location, by the way, in case anyone on the team already has a Linode. | ||||||||
Attached Files | st_weirdlag.pcap (1,161,529 bytes) 2010-09-26 17:19 | ||||||||
Notes | |
(0000153) Eruanna (reporter) 2010-09-25 08:50 edited on: 2010-09-25 08:51 |
I can't fully confirm this, as I don't have this problem often on Obsidian; however, I have occasionally received reports of such problems there. Obsidian is a London Linode. Since it never seems to happen that often when I'm playing, I can't really speak from personal experience, but anyone who plays on Obsidian regularly might. Most of the time my issues with Obsidian are network-related; the datacenter has been having problems over the past couple of weeks. |
(0000155) AlexMax (developer) 2010-09-25 14:45 edited on: 2010-09-25 16:08 |
Just out of curiosity, what distribution and kernel do you use? My Linode is located in New Jersey and was running Ubuntu (I'm testing Debian to see if it has the same issues). For what it's worth, a friend of mine hosts a single server at their Texas location and never has any issues. |
(0000156) TIHan (reporter) 2010-09-25 17:43 |
I have a Linode in Atlanta that I've been using for a month, and I haven't had any problems with "waiting for server". I'm running Debian 5.0. I'd like to help figure out what might be causing this. |
(0000176) Torr Samaho (administrator) 2010-09-25 22:05 |
Are you sure that this only happens when servers are running on a Linode VPS? |
(0000177) AlexMax (developer) 2010-09-25 23:06 |
I actually moved away from Linode a year and a half ago because of this sort of issue. I happened to move back and, what do you know, it persists. However, it's not consistent; today I have not had any issues. What sort of information would be useful to you when it starts exhibiting that behavior again? I was planning on doing a tcpdump when I see it happening; is there anything else I should be recording? |
(0000187) AlexMax (developer) 2010-09-26 17:20 |
I had complaints about this last night. The attached file is a tcpdump of all UDP activity on my server. I captured an offending 'skip' and have pinpointed one of the times where there is a hiccup. By filtering out all packets except those going to and from my personal computer (ip.addr == 174.108.7.186), I found a particular incident where I remember having a conversation with players on my server: I say "like now?" and the words took forever to get there. Using Wireshark on the log with the filter listed in parentheses above, you can clearly see that where the server usually sends a bunch of UDP packets per second to my IP, there are none from second 27.944 to 29.072, during which my client attempts two packets. Clearing the filter at that point, you can see that the server sends out no packets at all, except for Skulltag master heartbeat queries, from 27.944 to 29.070. Seeing as tcpdump was run locally at my server's connection, it appears at first glance that this is not a network issue, but rather an issue with the server getting 'stuck' for a second or so. |
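[Editor's note: the gap-hunting described above can be automated instead of eyeballed in Wireshark. Here is a minimal Python sketch of the idea; the timestamp list is an illustrative stand-in for the real capture, not data from the attached pcap. With a real capture you would feed in the epoch times of each server-to-client UDP packet.]

```python
# Sketch: flag server "hiccups" by scanning per-packet timestamps for
# unusually long gaps between consecutive outgoing UDP packets.

def find_gaps(timestamps, threshold=1.0):
    """Return (start, end, length) for every gap >= threshold seconds."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev >= threshold:
            gaps.append((prev, cur, cur - prev))
    return gaps

# Illustrative timestamps: the server normally sends several packets per
# second, then goes silent between 27.944 and 29.072, as in the note above.
times = [27.601, 27.715, 27.830, 27.944, 29.072, 29.180]
for start, end, length in find_gaps(times):
    print(f"gap of {length:.3f}s between {start:.3f} and {end:.3f}")
```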
(0000215) AlexMax (developer) 2010-10-02 15:53 |
Any ideas here? I've got the Linode for another month, so I'd really like to get this resolved soonish. |
(0000220) Torr Samaho (administrator) 2010-10-02 20:53 |
Does the server make any DNS lookup during the time it's not responding? Or does it reparse any of your IP lists (banlist/whitelist/adminlist)? |
(0000255) AlexMax (developer) 2010-10-03 17:42 |
I'm in the process of getting a tcpdump that will show this, as the old tcpdump only shows UDP traffic. In the meantime, this is how I will recreate my testing environment, step by step, so you know exactly what I'm working with:

1. Make sure you have a Linode 512 hosted in Newark, New Jersey, with nothing else running on it.

2. From the main screen, click "Deploy a Linux distribution". On the next screen, select Ubuntu 10.04 LTS as your distro, set the swap disk to 512 MB and the deployment disk to 15872 MB. Enter a root password and click OK.

3. Once your distro is set up, boot into it and SSH in, with either PuTTY or command-line SSH.

4. Install the dependencies (this grabs everything having to do with SDL in one fell swoop):
   # aptitude install libsdl1.2-dev screen tcpdump

5. Create a dedicated user for this:
   # adduser yournickname
   # su yournickname

6. Install Skulltag. Nothing to see here. Make sure to grab doom2.wad somehow too; that step depends on where you get doom2.wad from:
   $ mkdir ~/skulltag
   $ cd ~/skulltag
   $ wget http://skulltag.net/download/files/testing/98d/SkullDev98d-2954linux-x86.tar.bz2
   $ wget http://www.skulltag.com/download/files/release/st-v098c_linux-base.tar.bz2
   $ tar jxvf SkullDev98d-2954linux-x86.tar.bz2
   $ tar jxvf st-v098c_linux-base.tar.bz2

7. Close your SSH session and log back in as your user. From there:
   $ screen
   $ cd skulltag
   $ ./skulltag-server -port 10666 +map d2ctf1

8. Start playing. If you don't see any "skipping", try again a few hours later. Once you do, create a new screen window (Ctrl-a then c) and then:
   $ su
   Password: (enter your root password)
   $ tcpdump -s 0 -w ~/traffic.pcap
   Go back to your server, wait for at least one good "skip", and then Ctrl-C out of tcpdump. (Note: writing a usable .pcap file needs tcpdump's -w flag; redirecting its text output with > would not produce a valid capture.)

I'll let you know what the results are... |
(0000299) AlexMax (developer) 2010-10-10 00:48 |
This doesn't seem to happen consistently, in fact performance has thus far ranged from good to just a hiccup here and there that wasn't at all like what I was running into a while back. I'll let you know when I have another tcpdump ready for you. In the meantime, I have a ton of FFA servers set up on the Linode, and hopefully I can get some people on there to test while I'm observing the hiccuping... |
(0000430) AlexMax (developer) 2010-10-17 19:52 |
I decided to cancel my Linode VPS subscription and get another VPS. So far, I haven't had the same kind of issues... though AOW2 did have one huge hiccup, but that might have been AOW2 itself, and it's nowhere near as choppy as it was on Linode. |
(0000431) AlexMax (developer) 2010-10-17 20:26 |
Oh wow, on this new VPS I'm getting a similar symptom. You know that "AOW2 huge hiccup" I mentioned? It just happened again on my Private CTF server. I'll get a tcpdump to you guys as soon as I can... |
(0000433) AlexMax (developer) 2010-10-17 22:22 edited on: 2010-10-17 22:26 |
Okay, I got a tcpdump. My lists were empty at the time of the hiccup. The log is too huge to post here, but again, at second 1022.33944 the server communicates with a specific IP that I'm following; let's call it #.#.#.#. #.#.#.# sends a UDP packet to the server at 1022.36501, 1022.38774, 1022.41871, and 1022.44244. And... silence from the server until second 1042.02150, nearly 20 seconds later! Expanding things to look at ALL packets, there is a huge blackout of server communication for about 20 seconds. So was the entire server frozen? I don't think so. One time this was happening, a few hours earlier when I was still connected, I alt-tabbed from the server, moved my cursor around, and didn't hear the server catch up for at least a few more seconds. So yeah, that's it. Nothing else was going on; no DNS, no nothing. |
(0000436) AlexMax (developer) 2010-10-18 16:18 edited on: 2010-10-18 16:18 |
Had an interesting conversation with InterServer, my new VPS host: "Does Skulltag use UDP? In /proc/user_beancounters I see some failures in dgramrcvbuf:

    resource      held  maxheld  barrier  limit   failcnt
    dgramrcvbuf   0     262144   262144   262144  101894

Nothing else has hit a fail count. dgramrcvbuf = 'The total size of receive buffers of UDP and other datagram protocols.'" Their VPS management page also provides a neat little graph of dgramrcvbuf_f (the failure rate of dgramrcvbuf). Lo and behold, I saw a spike that corresponded exactly to the lag I was getting on my server. Reading up on dgramrcvbuf, I found that it's a VPS option; this new VPS uses OpenVZ, but I suspect Xen has something similar: http://wiki.openvz.org/UBC_secondary_parameters#dgramrcvbuf At the end of the day, they increased a setting on my VPS called numothersock, which is described here: http://wiki.openvz.org/UBC_primary_parameters#numothersock Hopefully, this will solve the problems. However, is there anything perhaps abnormal that Skulltag might be doing to run up against this limit? |
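[Editor's note: watching for these failures can be scripted rather than read off the host's graph. A minimal Python sketch of parsing /proc/user_beancounters for a resource's failcnt column; the SAMPLE text below is a hypothetical stand-in mimicking the output quoted above, and on a real OpenVZ container you would read the file itself.]

```python
# Sketch: extract the failcnt column for a given resource from
# /proc/user_beancounters-style output. failcnt > 0 on dgramrcvbuf means
# the kernel has dropped incoming UDP datagrams for lack of buffer space.

SAMPLE = """\
uid  resource      held  maxheld  barrier  limit   failcnt
123: dgramrcvbuf   0     262144   262144   262144  101894
     numothersock  22    44       460      460     0
"""

def failcnt(beancounters_text, resource):
    """Return the failcnt value for the named resource, or None if absent."""
    for line in beancounters_text.splitlines():
        fields = line.split()
        # The first data line of each container carries a leading "uid:" token.
        if fields and fields[0].endswith(":"):
            fields = fields[1:]
        if fields and fields[0] == resource:
            return int(fields[-1])
    return None

print(failcnt(SAMPLE, "dgramrcvbuf"))  # 101894
```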
(0000453) Torr Samaho (administrator) 2010-10-23 19:56 |
A Skulltag server should only allocate one socket when it starts, and it is supposed to keep using that socket thereafter. Do you know what numothersock was set to before, and what it is set to now? |
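[Editor's note: for reference, the single-socket pattern Torr describes looks like this in miniature. This is a hypothetical Python sketch, not Skulltag's actual C++ code: one UDP socket is created at startup and reused for every send and receive, so numothersock should stay flat no matter how many players connect.]

```python
import socket

# One UDP socket, bound once at startup and reused for all traffic.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.settimeout(5)
server.bind(("127.0.0.1", 0))           # ephemeral port on loopback
host, port = server.getsockname()

# Simulate a client packet arriving and the server echoing on the SAME socket.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(5)
client.sendto(b"connect", (host, port))
data, addr = server.recvfrom(1400)      # same socket handles every peer
server.sendto(b"ack " + data, addr)
reply, _ = client.recvfrom(1400)
print(reply)                            # b'ack connect'

client.close()
server.close()
```

Because all peers share one datagram socket, the per-container socket count stays constant; what grows under load is the data queued in that socket's receive buffer, which is exactly what dgramrcvbuf accounts for.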
(0000454) AlexMax (developer) 2010-10-23 22:45 edited on: 2010-10-23 22:45 |
No idea what it was set to before, but this is what the numbers are set to now:

    resource      held  maxheld  barrier  limit   failcnt
    dgramrcvbuf   0     351424   352144   352144  25644
    numothersock  22    44       460      460     0

This is what it looks like on a day with a 22/24 AOW2 and an 8/15 Private CTF. That failcnt is a tally since October 19th, in which time both numothersock and dgramrcvbuf itself were increased. Why dgramrcvbuf itself? Well, according to the wiki: "The dgramrcvbuf limits usually don't need to be high. Only if the Container needs to send and receive very large datagrams, the barriers for both othersockbuf and dgramrcvbuf parameters should be raised." After reading this, I concluded that rather than exhausting the number of sockets, Skulltag was sending very large UDP packets back and forth and overflowing the buffer with packets that are too damn big. The first time I had them bump the number, the graph of failures was still climbing under heavy traffic, but at half the rate. Since then, they've bumped it twice, and tonight is the first time since then that there has been a significant number of players on my AOW2 and Private CTF servers. I'll let you know how many failures I get tonight. |
(0000455) AlexMax (developer) 2010-10-24 00:13 edited on: 2010-10-24 00:16 |
Just so you know, it's still overflowing the buffer every so often with those numbers. The stoppages aren't nearly as bad, but people are still complaining. :( |
(0000460) Torr Samaho (administrator) 2010-10-24 08:43 |
If it's a problem with the UDP packet size, you can try lowering sv_maxpacketsize. The default is 1400; I'd try lowering it to 1024. |
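[Editor's note: a back-of-envelope sketch of why lowering sv_maxpacketsize helps against a fixed dgramrcvbuf barrier. This ignores per-datagram kernel overhead, which the OpenVZ accounting also charges, so the real capacities are lower; the relative gain from smaller packets is the point.]

```python
# How many queued UDP datagrams fit under the dgramrcvbuf barrier quoted
# in this report, at each candidate sv_maxpacketsize (payload bytes only)?

BARRIER = 262144  # dgramrcvbuf barrier from the note above, in bytes

for packet_size in (1400, 1024):
    capacity = BARRIER // packet_size
    print(f"sv_maxpacketsize {packet_size}: ~{capacity} datagrams before failcnt rises")
```

At 1400 bytes per packet roughly 187 datagrams fit under the barrier; at 1024 bytes, 256 do, so the buffer tolerates noticeably more queued packets before the kernel starts dropping them.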
(0000485) AlexMax (developer) 2010-10-29 21:19 edited on: 2010-10-29 21:20 |
Aha! That seems to have fixed things. I've been running without any dgramrcvbuf failures for a couple of days now, and my servers have been under an AOW2-sized load a few times since then. This might be a useful thing to add to the wiki, by the way. Thanks! |
(0000499) Torr Samaho (administrator) 2010-11-01 11:54 |
I'm glad to hear that! Since the current default value of 1400 seems to cause problems, I'm lowering the default value to 1024 again (like it was in 98b and older versions). |
Issue History | |||
Date Modified | Username | Field | Change |
2010-09-25 04:52 | AlexMax | New Issue | |
2010-09-25 08:50 | Eruanna | Note Added: 0000153 | |
2010-09-25 08:51 | Eruanna | Note Edited: 0000153 | View Revisions |
2010-09-25 14:45 | AlexMax | Note Added: 0000155 | |
2010-09-25 14:46 | AlexMax | Note Edited: 0000155 | View Revisions |
2010-09-25 16:08 | AlexMax | Note Edited: 0000155 | View Revisions |
2010-09-25 17:43 | TIHan | Note Added: 0000156 | |
2010-09-25 22:05 | Torr Samaho | Note Added: 0000176 | |
2010-09-25 22:05 | Torr Samaho | Status | new => feedback |
2010-09-25 23:06 | AlexMax | Note Added: 0000177 | |
2010-09-25 23:06 | AlexMax | Status | feedback => new |
2010-09-26 17:19 | Anonymous | File Added: st_weirdlag.pcap | |
2010-09-26 17:20 | Anonymous | Note Added: 0000186 | |
2010-09-26 17:20 | Anonymous | Note Deleted: 0000186 | |
2010-09-26 17:20 | AlexMax | Note Added: 0000187 | |
2010-10-02 15:53 | AlexMax | Note Added: 0000215 | |
2010-10-02 20:53 | Torr Samaho | Note Added: 0000220 | |
2010-10-02 20:53 | Torr Samaho | Status | new => feedback |
2010-10-03 17:42 | AlexMax | Note Added: 0000255 | |
2010-10-03 17:42 | AlexMax | Status | feedback => new |
2010-10-09 15:13 | Torr Samaho | Status | new => feedback |
2010-10-10 00:45 | Anonymous | Note Added: 0000298 | |
2010-10-10 00:45 | Anonymous | Note Deleted: 0000298 | |
2010-10-10 00:48 | AlexMax | Note Added: 0000299 | |
2010-10-10 00:48 | AlexMax | Status | feedback => new |
2010-10-17 19:52 | AlexMax | Note Added: 0000430 | |
2010-10-17 20:26 | AlexMax | Note Added: 0000431 | |
2010-10-17 22:22 | AlexMax | Note Added: 0000433 | |
2010-10-17 22:26 | AlexMax | Note Edited: 0000433 | View Revisions |
2010-10-18 16:18 | AlexMax | Note Added: 0000436 | |
2010-10-18 16:18 | AlexMax | Note Edited: 0000436 | View Revisions |
2010-10-23 19:56 | Torr Samaho | Note Added: 0000453 | |
2010-10-23 19:57 | Torr Samaho | Status | new => feedback |
2010-10-23 22:45 | AlexMax | Note Added: 0000454 | |
2010-10-23 22:45 | AlexMax | Status | feedback => new |
2010-10-23 22:45 | AlexMax | Note Edited: 0000454 | View Revisions |
2010-10-24 00:13 | AlexMax | Note Added: 0000455 | |
2010-10-24 00:16 | AlexMax | Note Edited: 0000455 | View Revisions |
2010-10-24 08:43 | Torr Samaho | Note Added: 0000460 | |
2010-10-24 08:43 | Torr Samaho | Status | new => feedback |
2010-10-29 21:19 | AlexMax | Note Added: 0000485 | |
2010-10-29 21:19 | AlexMax | Status | feedback => new |
2010-10-29 21:20 | AlexMax | Note Edited: 0000485 | View Revisions |
2010-11-01 11:54 | Torr Samaho | Note Added: 0000499 | |
2010-11-01 11:54 | Torr Samaho | Status | new => resolved |
2010-11-01 11:54 | Torr Samaho | Fixed in Version | => 98d |
2010-11-01 11:54 | Torr Samaho | Resolution | open => fixed |
2010-11-01 11:54 | Torr Samaho | Assigned To | => Torr Samaho |
2012-06-09 13:22 | Torr Samaho | Category | General => Bug |
2018-09-30 19:47 | Blzut3 | Status | resolved => closed |