ID: 0001178
Project: Zandronum
Category: [All Projects] Bug
View Status: public
Date Submitted: 2012-11-11 17:08
Last Update: 2024-03-10 20:40
Reporter: Watermelon
Assigned To:
Priority: high
Severity: minor
Reproducibility: random
Status: closed
Resolution: unable to reproduce
Platform:
OS:
OS Version:
Product Version:
Target Version:
Fixed in Version:
Summary: 0001178: Memory leak (or something) causes lag spikes on the server
Description: Is there any way to diagnose what could be causing the server to have lag spikes and/or connection issues? This is a very generic request, but here is what I've noticed:

- No matter what host I go to (either VPS, home connection or even a dedicated server like GV), there always seem to be intermittent spikes where everyone in the server sees "Connection interrupted" for a second or so. When it's really bad it can last 5+ seconds.

- It seems to only happen on one server, which is confusing: if it were the actual host itself, you'd think that all the servers would be affected at the same time. So what could cause only one of (let's say) five servers to be affected while the others aren't, when they are all on the same VPS/dedicated server?

- This has been happening since 98d as well, but we all (the hosts of the servers) thought it was just VPSes possibly handling data incorrectly

- I *only* run 5 servers on the Linux system, which consume around 300 or so megs of RAM; I have 3700 megs left over and literally nothing else running

- On the BEST-EVER servers, Jenova literally turned off every other running thing in the background that could be turned off and it still occurred

- I've had this happen on FOUR VPSes, all of which were from different hosts. This is not including Grandvoid, which is on a dedicated server

- Happens on any operating system, even Linux



Is there any tool I can run to diagnose this problem? I have no idea what it is, but it's starting to become annoying in game and plagues most servers.
Additional Information: Player count does not affect the lag. We've had the worst lag spikes at 6 players, and 20+ minutes of nothing when there were 27+ people in pub CTF.
Notes
(0005363)
Watermelon (developer)
2012-11-11 17:10
edited on: 2012-11-11 17:22

I forgot to mention the frequency varies. Sometimes 3 happen in a row. It appears to be random, estimated one every 10 or so minutes (+/- 5 minutes).

It seems the server still processes the shots and stuff on its own end and still seems to accept incoming input, but falters on sending any data... if that helps.

EDIT: Zandronum seems to have a higher threshold before it displays connection interrupted, whereas on ST it displayed it much more because every little transmission error showed 'connection interrupted'. Here however, Zandronum has a buffer zone so some of the micro-lag spikes you don't actually notice unless you fire a stream of plasma and notice some not coming out.

(0005375)
Torr Samaho (administrator)
2012-11-11 22:06
edited on: 2012-11-11 22:08

Just to be sure that it's not a hostname lookup issue: check whether manually setting the CVAR masterhostname to the current IP of master.zandronum.com makes any difference.

Quote from Watermelon
It seems the server still processes the shots and stuff on its own end and still seems to accept incoming input, but falters on sending any data... if that helps.
How do you know? Are you looking at the server console output?

Quote from Watermelon
Zandronum seems to have a higher threshold before it displays connection interrupted
FYI, I'm pretty sure that I didn't touch the threshold, but I fixed some bugs that affect this.

Personal comment: "Memory leak" is likely the second most commonly misused term (right after "crash"). I see no indication of a memory leak in your description of the problem. If the overall memory usage of the server is not constantly rising, it's not a memory leak.
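
As a rough illustration of that criterion, the Python sketch below fits a straight line through periodic memory readings and reports a likely leak only when usage keeps climbing. The function names, the threshold and the sample readings are all hypothetical, and how the readings are collected on the server box is left out.

```python
from statistics import mean


def memory_trend(samples_kb):
    """Least-squares slope (KB per sample) of equally spaced memory readings."""
    n = len(samples_kb)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    x_mean = mean(xs)
    y_mean = mean(samples_kb)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, samples_kb))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def looks_like_leak(samples_kb, min_slope_kb=1.0):
    """True only if usage trends upward and ends above where it started."""
    rising = memory_trend(samples_kb) > min_slope_kb
    return rising and samples_kb[-1] > samples_kb[0]


# Made-up readings: one server hovering around 300 MB, one growing steadily.
steady = [300000, 300100, 299900, 300050, 300000, 299950]
growing = [300000 + 500 * i for i in range(6)]
print(looks_like_leak(steady), looks_like_leak(growing))
```

A server that sits at roughly the same usage for hours, as described in this report, would fall into the first group regardless of how often it lags.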

(0005376)
Watermelon (developer)
2012-11-12 01:03
edited on: 2012-11-12 01:04

Quote
Just to be sure that it's not a hostname lookup issue: check whether manually setting the CVAR masterhostname to the current IP of master.zandronum.com makes any difference.

I'm going to try that tomorrow and get back to you on it ASAP

Quote
Quote
It seems the server still processes the shots and stuff on its own end and still seems to accept incoming input, but falters on sending any data... if that helps.

How do you know? Are you looking at the server console output?

When the lag spikes happen, if you are holding "+forward" at the time, your character ends up much farther ahead of where you were, so I assume the server must still be receiving data to process the +forward commands, but for whatever reason the screens on the clients freeze. The only thing I can think of with my limited knowledge is that the update data for the client is not coming through, which results in a connection error.

Quote
Personal comment: "Memory leak" is likely the second most commonly misused term (right after "crash"). I see no indication of a memory leak in your description of the problem. If the overall memory usage of the server is not constantly rising, it's not a memory leak.

I think I did misuse the term here. I don't know what to call it; some kind of overload somewhere?

I'll also be checking into this tomorrow:
Quote
<Konar6> certain lagspikes appear when the server is advertised and there are nameserver issues
<Konar6> don't ask me why

to either confirm or hopefully rule this out.

I'll be setting up Odamex and ZDaemon servers tomorrow to establish whether this happens only with Zandronum. If so, what could possibly be causing it?



EDIT: I further had this problem confirmed by a few more server cluster hosts. Whatever this may be, it seems much more widespread than I thought. Could this just be how the internet is nowadays?

(0005378)
ZzZombo (reporter)
2012-11-12 02:17
edited on: 2012-11-12 02:18

I don't know whether it's related, but I once timed out from a server on my localhost while playing cooperative on the stock Doom I maps without any PWADs. The server didn't crash or anything, so after reconnecting I could keep playing without any trouble. The client gave me the error "CLIENT_CheckForMissingPackets: missing more than 1024 packets. Unable to recover".

(0005381)
Torr Samaho (administrator)
2012-11-12 06:28

Quote from Watermelon
When the lag spikes happen, if you are holding "+forward" at the time, your character ends up much farther ahead of where you were, so I assume the server must still be receiving data to process the +forward commands, but for whatever reason the screens on the clients freeze
I'd say it's more likely that the server completely freezes while your system still receives and buffers the network packets from the clients. As soon as the server unfreezes, it parses all the buffered client movement commands at once, making it look as if you jump ahead.
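
The buffering effect described here can be reproduced in miniature. The Python sketch below is a stand-in built on assumptions, not Zandronum code: a UDP socket on the loopback interface plays the server, a few fake movement commands are sent while it is "frozen" (it simply sleeps), and all of them are then read in one burst. The function name and the packet contents are made up for illustration.

```python
import socket
import time


def demo_buffered_burst(num_packets=5, freeze_seconds=0.2):
    """Send packets to a socket that is not reading, then drain them at once."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # let the OS pick a free port
    receiver.settimeout(1.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        address = receiver.getsockname()
        for i in range(num_packets):
            sender.sendto(f"+forward {i}".encode(), address)

        # The "server" is frozen: nothing is read, the OS queues the packets.
        time.sleep(freeze_seconds)

        # The "server" wakes up and processes everything that piled up.
        received = []
        start = time.monotonic()
        for _ in range(num_packets):
            data, _sender_address = receiver.recvfrom(1024)
            received.append(data.decode())
        drain_seconds = time.monotonic() - start
        return received, drain_seconds
    finally:
        sender.close()
        receiver.close()


commands, drain_seconds = demo_buffered_burst()
print(len(commands), "commands processed in", round(drain_seconds, 4), "s")
```

From a player's point of view, several ticks' worth of movement being applied in one go would look like the character jumping ahead after the freeze.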

Quote
<Konar6> certain lagspikes appear when the server is advertised and there are nameserver issues
<Konar6> don't ask me why

This is the hostname lookup issue I was referring to. The reason is very simple: the server only uses a single thread, calls gethostbyname and has to wait for it to return something.
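
A minimal sketch of why a blocking lookup stalls a single-threaded loop, and of how handing it to a worker thread keeps the loop ticking. It does not reflect Zandronum's actual code: `slow_lookup` is a hypothetical stand-in for a slow `gethostbyname` call, the returned address is a documentation-only placeholder, and the tick loop merely sleeps where a real server would run game logic and send updates.

```python
import threading
import time


def slow_lookup(delay_seconds):
    """Stand-in for a blocking hostname lookup that takes a while."""
    time.sleep(delay_seconds)
    return "192.0.2.1"  # placeholder address, not the real master server


def run_ticks(num_ticks, tick_seconds, lookup_at, lookup, in_background):
    """Run a fixed-rate loop; return the longest gap between two ticks."""
    longest_gap = 0.0
    last = time.monotonic()
    worker = None
    for tick in range(num_ticks):
        if tick == lookup_at:
            if in_background:
                worker = threading.Thread(target=lookup)
                worker.start()  # the loop carries on immediately
            else:
                lookup()  # the whole loop waits here
        time.sleep(tick_seconds)  # stand-in for game logic and network sends
        now = time.monotonic()
        longest_gap = max(longest_gap, now - last)
        last = now
    if worker is not None:
        worker.join()
    return longest_gap


blocking_gap = run_ticks(10, 0.01, 3, lambda: slow_lookup(0.3), in_background=False)
threaded_gap = run_ticks(10, 0.01, 3, lambda: slow_lookup(0.3), in_background=True)
print("longest gap, blocking:", round(blocking_gap, 3))
print("longest gap, threaded:", round(threaded_gap, 3))
```

In the blocking run, one tick is delayed by the full lookup time, which is the kind of pause every connected client would notice at once.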
(0005397)
Torr Samaho (administrator)
2012-11-15 05:58

Did anybody have a chance yet to check whether it's a hostname lookup issue?
(0005398)
Konar6 (reporter)
2012-11-15 11:50

I believe this is bogus. Internet connections aren't stable and it's normal for clients and even servers to experience packet loss. I advised Watermelon to try running a different server on the box (such as ZDaemon) to observe its behavior.
Ever since I noticed the lag spikes from the DNS issues pointed out above (and tracked them down to the server resolving the master's hostname), I have run a local DNS server, which I would recommend to all dedicated server hosts.
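
To check whether name resolution is slow on a given box, a lookup can simply be timed. The helper below is a hypothetical sketch: the function name is made up, and the `resolver` parameter exists only so that it can be exercised with a fake resolver and without network access. By default it calls Python's `socket.gethostbyname`, which blocks until the lookup returns, much like the call discussed in this thread.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.gethostbyname):
    """Return (address or None, seconds spent waiting for the lookup)."""
    start = time.monotonic()
    try:
        address = resolver(hostname)
    except OSError:  # socket.gaierror and socket.herror are subclasses
        address = None
    elapsed = time.monotonic() - start
    return address, elapsed


def fake_slow_resolver(hostname):
    """Hypothetical resolver that answers after 50 ms with a placeholder."""
    time.sleep(0.05)
    return "192.0.2.1"


address, elapsed = time_lookup("master.example", resolver=fake_slow_resolver)
print(address, round(elapsed, 3))
```

On a real host, calling `time_lookup("master.zandronum.com")` repeatedly over an hour or so and noting any call that takes longer than a fraction of a second would show whether lookups line up with the spikes.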

I've had a different issue happen to me and heard of it 2-3 times, though it's probably unrelated, as the issue here is said to affect all clients: a single client can't send any data to the server, but still receives the server's data fine. That client can't move or talk, but can see others moving and talking normally. It lasts for a few seconds.
(0005409)
Torr Samaho (administrator)
2012-11-18 10:26

Quote from Watermelon
I'm going to try that tomorrow and get back to you on it ASAP
This was almost a week ago. Did you have a chance to test this yet?

Quote from Konar6
I believe this is bogus.
I don't question that these lag spikes exist, but I'm not convinced yet that it's a Zandronum bug.
(0005422)
Watermelon (developer)
2012-11-19 22:07

I will test it as soon as I can. Sadly I came down with the flu days ago and I'm barely over the worst part. When I'm out of the woods I'll post my results.
(0005543)
Watermelon (developer)
2012-12-22 05:24

So I've been testing this out just to see if it's actually the VPS or not.

So far I've got the following statistics:
- ZD does not seem to suffer the same problems
- Unable to get Odamex working so no data here
- The lag seems to be really... weird on Zandronum. At times it happens on *all* the servers briefly, which led me to believe that it is just the VPS sucking... but other times it only happens on a single server, which is really unusual. We had multiple people on the same server cluster in games on different ports, and most of the time only one of them would get hammered with this random lag spike while the rest would be fine.
Therefore I'm unsure if the one-off lag spike that occurred on all of them was actually just the internet being the internet or if it was an indication that the VPS is not good.

Likewise I've tried four VPSes and all of them have this same lag issue. It's hard to believe that four individual service providers would have this problem. This is not including other servers like NJ Funcrusher, which have this problem too, and even Grandvoid has this happening although it's a dedicated server.


Is there some kind of diagnostic tool I can run to check for lag spikes? I'd really like to prove without a shadow of a doubt that it is the VPS/box it's hosted on before I continue on with this ticket.
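
One possible approach to the question above, sketched under assumptions: run a tiny process next to the game servers that sleeps for a fixed interval and records how late each wakeup is. If this process also stalls when players see a spike, the box or VPS is the likelier culprit; if it keeps ticking on time, suspicion shifts towards the affected server process. The function names, interval and tolerance are hypothetical, and this is not an existing Zandronum tool.

```python
import time


def find_stalls(wake_times, interval, tolerance):
    """Return (index, lateness) for every wakeup that came in too late."""
    stalls = []
    for i in range(1, len(wake_times)):
        lateness = (wake_times[i] - wake_times[i - 1]) - interval
        if lateness > tolerance:
            stalls.append((i, lateness))
    return stalls


def sample_wake_times(count, interval):
    """Sleep `count` times and record when each sleep actually returned."""
    times = [time.monotonic()]
    for _ in range(count):
        time.sleep(interval)
        times.append(time.monotonic())
    return times


# Hand-written timestamps: the fourth wakeup arrives a full second late.
example = [0.0, 0.1, 0.2, 1.3, 1.4]
example_stalls = find_stalls(example, interval=0.1, tolerance=0.05)
print(example_stalls)

# A short live run; a real check would sample for much longer.
live = sample_wake_times(count=5, interval=0.01)
print(find_stalls(live, interval=0.01, tolerance=0.5))
```

Logging the wall-clock time of each detected stall would make it possible to compare them against the moments players report "Connection interrupted".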
(0005549)
Torr Samaho (administrator)
2012-12-23 11:38

Did you test whether it's a hostname lookup issue?
(0005552)
Watermelon (developer)
2012-12-23 18:12
edited on: 2012-12-23 18:18

What are the steps to rule everything out? I'd like to do as many things as possible in one go to rule them all out


1) DNS server lookup issue -> how do I fix this?

2) Run a local DNS server as Konar said -> how do I do this? never done this before

3) Anything else -> I will do ASAP

The masterhostname points to master.zandronum.com

(0005553)
Torr Samaho (administrator)
2012-12-23 20:58

This should be a good start:
Quote from Torr Samaho
Check whether manually setting the CVAR masterhostname to the current IP of master.zandronum.com makes any difference.
(0005555)
Watermelon (developer)
2012-12-24 01:01

I've had that set for a while and it made no difference

Issue Community Support
Supporters: jwaffe Chronos Ouroboros
Opponents: No one explicitly opposes this issue yet.

- Issue History
Date Modified Username Field Change
2012-11-11 17:08 Watermelon New Issue
2012-11-11 17:10 Watermelon Note Added: 0005363
2012-11-11 17:22 Watermelon Note Edited: 0005363
2012-11-11 22:06 Torr Samaho Note Added: 0005375
2012-11-11 22:08 Torr Samaho Note Edited: 0005375
2012-11-11 22:08 Torr Samaho Note Revision Dropped: 5375: 0002955
2012-11-12 01:03 Watermelon Note Added: 0005376
2012-11-12 01:04 Watermelon Note Edited: 0005376
2012-11-12 02:17 ZzZombo Note Added: 0005378
2012-11-12 02:18 ZzZombo Note Edited: 0005378
2012-11-12 06:28 Torr Samaho Note Added: 0005381
2012-11-15 05:58 Torr Samaho Note Added: 0005397
2012-11-15 05:58 Torr Samaho Status new => feedback
2012-11-15 11:50 Konar6 Note Added: 0005398
2012-11-18 10:26 Torr Samaho Note Added: 0005409
2012-11-19 22:07 Watermelon Note Added: 0005422
2012-11-19 22:07 Watermelon Status feedback => new
2012-12-22 05:24 Watermelon Note Added: 0005543
2012-12-23 11:38 Torr Samaho Note Added: 0005549
2012-12-23 18:12 Watermelon Note Added: 0005552
2012-12-23 18:18 Watermelon Note Edited: 0005552
2012-12-23 20:58 Torr Samaho Note Added: 0005553
2012-12-24 01:01 Watermelon Note Added: 0005555
2014-06-14 03:15 Watermelon Status new => closed
2014-06-14 03:15 Watermelon Resolution open => unable to reproduce
2024-03-10 20:40 Ru5tK1ng Relationship added related to 0003873
2024-03-10 20:40 Ru5tK1ng Relationship deleted related to 0003873





