MantisBT - Zandronum
View Issue Details
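ID: 0001531
Project: Zandronum
Category: [All Projects] Bug
View Status: public
Date Submitted: 2013-10-08 18:02
Last Update: 2017-11-09 09:59
Reporter: Tiger
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: not fixable
Platform: Desktop
OS: Debian [32bit]
Product Version: 1.3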
0001531: Gameserver Interrupt Request - Timeslice Hog
When hosting a Zandronum gameserver, the server process generates a large number of interrupt requests (IRQs) on the host, regardless of whether the server is active. When hosting many Zandronum gameservers at once, the system has to absorb the combined load and becomes sluggish. Unfortunately, I am not sure where this comes from or what is causing such high demand. The issue has been present for as long as anyone knows: it also occurred in Skulltag, though I am not sure in which version it first appeared.
Demonstrating the issue:
After shutting down all of the Zandronum processes, the IRQ rate dropped from ~ 70,000 interrupts per second to ~ 727 interrupts per second:
    
    http://wads.rfc1337.net/ArmadaExtra/Terra_IRQ_ZandronumHost_1.png
    
After 15 to 30 minutes, we restarted all of the Zandronum processes and recorded the next result a few moments later:
    
    http://wads.rfc1337.net/ArmadaExtra/Terra_IRQ_ZandronumHost_2.png
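
For anyone wanting to reproduce the before/after measurement without a graphing tool, the following is a minimal sketch, assuming a Linux host where /proc/stat exposes an "intr" line whose first number is the total count of interrupts serviced since boot. The function names and the 5-second interval are illustrative choices, not part of the original report, and this is not necessarily how the linked graphs were produced.

```python
import time


def read_counter(stat_text, key):
    """Return the first integer on the /proc/stat line that starts with `key`."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == key:
            return int(fields[1])
    raise KeyError(key)


def rate_per_second(before, after, seconds):
    """Average events per second between two counter readings."""
    return (after - before) / seconds


def sample_interrupt_rate(interval=5.0, path="/proc/stat"):
    """Read the system-wide interrupt counter twice and return the rate.

    Assumption: `path` follows the Linux /proc/stat layout.
    """
    with open(path) as f:
        first = read_counter(f.read(), "intr")
    time.sleep(interval)
    with open(path) as f:
        second = read_counter(f.read(), "intr")
    return rate_per_second(first, second, interval)
```

Calling sample_interrupt_rate() once with the gameservers stopped and once with them running gives two figures that can be compared in the same way as the two screenshots.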
Server Processes (known):
    Zandronum gameservers [Armada or based on Armada foundation]
    TeamSpeak
    MySQL
    Ventrilo
    nginx

External Resources:
        IRC Log:
            http://wads.rfc1337.net/ArmadaExtra/armadaCHAN_IRQ.txt
        Images:
            [Killed all Zandronum processes]
                http://wads.rfc1337.net/ArmadaExtra/Terra_IRQ_ZandronumHost_1.png
            [Restarted all of the Zandronum Gameservers]
                http://wads.rfc1337.net/ArmadaExtra/Terra_IRQ_ZandronumHost_2.png


Too Much Information (not sure if this is necessary, but just in case):
    Gameserver Cluster: Armada {Hellspawn}
    Gameservers: 20 <= x <= 40 {approx currently: ~ 20}
    Changelogs: http://wads.rfc1337.net/ArmadaExtra/Hellspawn_Changelog {LEGACY: Hellshire http://wads.rfc1337.net/ArmadaExtra/Hellshire_Changelog}
    IPv4: 64.71.152.93
    Port Range: 10666 <= x <= 10705
    GameMode Majority: Cooperative Based
    Known Problems [external]: Zandronum IRQ;
    Known Problems [Internal]: Possible BrutalDoom bug/crash (needs confirmation);
    Packet size (from frame): 1024 {Default}
    Binary(s): zandronum_host [32bit];
No tags attached.
Issue History
Date Modified | Username | Field | Change
2013-10-08 18:02 | Tiger | New Issue |
2013-10-08 18:12 | Tiger | Note Added: 0007352 |
2014-06-15 14:33 | Watermelon | Note Added: 0009379 |
2014-06-15 14:33 | Watermelon | Status | new => feedback
2015-01-20 05:56 | mifu | Note Added: 0011454 |
2015-01-20 06:19 | Torr Samaho | Note Added: 0011455 |
2015-01-20 07:19 | mifu | Note Added: 0011456 |
2015-01-20 17:57 | Torr Samaho | Note Added: 0011458 |
2015-01-20 17:57 | Torr Samaho | Note Edited: 0011458 |
2015-01-20 17:57 | Torr Samaho | Note Revision Dropped: 11458: 0006461 |
2015-01-20 17:58 | Dusk | Steps to Reproduce Updated |
2015-01-20 17:58 | Dusk | Additional Information Updated |
2015-01-20 23:30 | mifu | Note Added: 0011463 |
2015-01-20 23:32 | mifu | Note Edited: 0011463 |
2015-01-20 23:45 | mifu | Note Edited: 0011463 |
2015-01-21 00:05 | Dusk | Product Version | => 1.3
2015-01-21 07:33 | Torr Samaho | Note Added: 0011470 |
2015-01-21 23:31 | mifu | Note Added: 0011496 |
2015-01-22 07:43 | mifu | Note Added: 0011497 |
2017-11-09 09:59 | Dusk | Status | feedback => closed
2017-11-09 09:59 | Dusk | Resolution | open => not fixable

Notes
(0007352)
Tiger   
2013-10-08 18:12   
Adding '~' garbled some of the integers. Additionally, I was unable to modify this ticket to correct the issue.
(0009379)
Watermelon   
2014-06-15 14:33   
Is it possible to make this happen in 2.0?
(0011454)
mifu   
2015-01-20 05:56   
Hello.

Sorry we are late to this, but it seems we are experiencing the same issue with 2.0.

I do not have any graph data yet, but I am working on that at the moment.
(0011455)
Torr Samaho   
2015-01-20 06:19   
Does this only happen in 2.0 or is 1.4 also affected?
(0011456)
mifu   
2015-01-20 07:19   
We have not tested 1.4 yet, but we can check. I am not going to be able to do this now, however, as I am currently at work (and it will take some time to get a cluster up on a box). I will post here again when I have tested it.

As for more information, it is important to note that for 2.0 I am using different hardware than the Armada cluster, which I believe is running 1.3. The same thing happens on both servers.
(0011458)
Torr Samaho   
2015-01-20 17:57   
Quote from mifu
As for more information, it is important to note that for 2.0 I am using different hardware than the Armada cluster, which I believe is running 1.3. The same thing happens on both servers.
Does this mean that 1.3 is also affected? In that case, there is no need to test 1.4.

(0011463)
mifu   
2015-01-20 23:30   
(edited on: 2015-01-20 23:45)
Yes, 1.3 is affected. What I should really do is test this on a Windows machine, since both of the server boxes use Linux. Would that be worth it?

EDIT: I should have mentioned that my server box that uses 2.0 runs on Ubuntu 14, while 1.3 is on Debian. Sorry about that.

(0011470)
Torr Samaho   
2015-01-21 07:33   
Right now I have no idea what could be causing this. The last time this problem appeared for Skulltag versions that were not affected beforehand, it was caused by DNS lookup problems. These problems should be long fixed now, though.

Was 1.3 always affected or did this just start happening recently? Hopefully, your graph data will give us some hints.
(0011496)
mifu   
2015-01-21 23:31   
I believe 1.3 was always affected. I had noticed this back in Skulltag, but I always thought it was a resource problem, so I just cut down the number of servers I was hosting.

Of course, when I got a powerful box we noticed the same issue occurring. Gameplay on the servers was also affected: players would get frequent lag spikes. We had plenty of RAM left at the time and the CPU was hardly touched. The network was fine as well. What we did notice, however, is that a lot of context switching was happening while the servers were up.

Maybe that is the cause of it, or maybe it is something else. Either way, I will be getting some graphs of the system itself tonight, and hopefully they will provide some hints as to what could be causing it.
(0011497)
mifu   
2015-01-22 07:43   
Here is the graph I am referring to: http://demonizer.allfearthesentinel.net/munin/allfearthesentinel.net/demonizer.allfearthesentinel.net/interrupts.html

This is my Aussie server, which currently experiences this issue.
If I stop some of my servers, context switching drops, the remaining servers are all happy, and the system works like a charm.
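
The context switching mentioned in the notes above can also be broken down per process. Below is a minimal sketch, assuming a Linux host where /proc/<pid>/status contains the voluntary_ctxt_switches and nonvoluntary_ctxt_switches fields and /proc/<pid>/comm holds the process name (truncated by the kernel to 15 characters). The process name "zandronum_host" is taken from the binary listed in the report and may differ on other installations; the helper names are hypothetical.

```python
import os


def parse_ctxt_switches(status_text):
    """Return (voluntary, nonvoluntary) counts from /proc/<pid>/status text."""
    counts = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counts[name] = int(value)
    return counts["voluntary_ctxt_switches"], counts["nonvoluntary_ctxt_switches"]


def pids_named(process_name, proc_root="/proc"):
    """List PIDs whose comm entry equals `process_name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == process_name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return pids


def ctxt_switches_by_pid(process_name="zandronum_host", proc_root="/proc"):
    """Map each matching PID to its (voluntary, nonvoluntary) switch counts.

    Assumption: "zandronum_host" is the server binary name on this host.
    """
    result = {}
    for pid in pids_named(process_name, proc_root):
        try:
            with open(os.path.join(proc_root, str(pid), "status")) as f:
                result[pid] = parse_ctxt_switches(f.read())
        except OSError:
            continue  # the process exited between the two reads
    return result
```

Taking two snapshots of ctxt_switches_by_pid() a few seconds apart and subtracting them would show whether the switches are spread evenly across the gameservers or concentrated in a few of them.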