MantisBT - Zandronum
View Issue Details
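ID: 0000822
Project: Zandronum
Category: [All Projects] Bug
View Status: public
Date Submitted: 2012-04-29 02:42
Last Update: 2024-07-17 17:08
Reporter: AlexMax
Assigned To: Kaminsky
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: Linux
OS: Ubuntu
OS Version: 10.04 x86-64
Product Version: 98d
Target Version: 3.2
Fixed in Version: 3.2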
0000822: Servers freeze after 24 days uptime
Pretty simple. As soon as my Skulltag servers had reached 25 days uptime, people started asking me where they were. I took a look at supervisorctl, and _all_ of my servers were unresponsive except for the ones that were restarted earlier. This is output from supervisor:

st-duel32-duel RUNNING pid 1943, uptime 25 days, 8:51:01
st-duel32-duel2 RUNNING pid 1936, uptime 25 days, 8:51:01
st-idl2012-atf RUNNING pid 3459, uptime 17 days, 23:51:05
st-idl2012-ctf RUNNING pid 1937, uptime 25 days, 8:51:01
st-idl2012-privctf RUNNING pid 1944, uptime 25 days, 8:51:01
st-idl2012-scrimctf RUNNING pid 1942, uptime 25 days, 8:51:01

Of the six servers, the only one that was up was my "Attack the Flag" test, and it was only up for 17 days. This isn't the first time this has happened either.
No tags attached.
Issue History
2012-04-29 02:42  AlexMax  New Issue
2012-04-29 02:46  AlexMax  Note Added: 0003487
2012-04-29 11:17  DevilHunter  Note Added: 0003491
2012-06-09 13:22  Torr Samaho  Category: General => Bug
2013-06-09 20:44  jwaffe  Note Added: 0006410
2013-06-09 20:45  jwaffe  Note Edited: 0006410
2013-06-09 20:45  jwaffe  Note Edited: 0006410
2013-06-22 21:55  Konar6  Note Added: 0006478
2013-06-22 21:56  Konar6  Note Edited: 0006478
2013-06-23 00:48  Dusk  Status: new => confirmed
2013-06-23 11:02  Torr Samaho  Note Added: 0006480
2013-06-23 16:33  Edward-san  Note Added: 0006489
2013-06-23 17:25  jwaffe  Note Added: 0006490
2013-06-23 19:48  Torr Samaho  Note Added: 0006493
2017-06-07 03:32  Ru5tK1ng  Note Added: 0017808
2018-01-21 20:26  Torr Samaho  Note Added: 0019005
2024-03-01 16:42  Kaminsky  Note Added: 0023175
2024-03-01 16:42  Kaminsky  Assigned To: => Kaminsky
2024-03-01 16:42  Kaminsky  Status: confirmed => needs review
2024-03-01 16:42  Kaminsky  Target Version: => 3.2
2024-03-03 21:09  Kaminsky  Note Added: 0023295
2024-03-03 21:09  Kaminsky  Status: needs review => needs testing
2024-07-17 17:08  Kaminsky  Note Added: 0023797
2024-07-17 17:08  Kaminsky  Status: needs testing => resolved
2024-07-17 17:08  Kaminsky  Fixed in Version: => 3.2
2024-07-17 17:08  Kaminsky  Resolution: open => fixed

Notes
(0003487)
AlexMax   
2012-04-29 02:46   
Attack the Flag is running 98e. Will let you know if it crashes too.
(0003491)
DevilHunter   
2012-04-29 11:17   
I think Silvertear has the same issue with his servers

[10:40:11] [@silvertear] yeah they stop working after 25 days or so
[10:40:18] [@silvertear] happens to alexmax's server as well

After he restarted them, they worked just fine.

Haven't noticed this on Armada, but then again, when have any of those servers been up for at least a month lol
(0006410)
jwaffe   
2013-06-09 20:44   
(edited on: 2013-06-09 20:45)
I can confirm this behavior on my servers ([IFOC]), all of them go down at the same time, though I figured it was closer to 28 days. This has been happening as long as I have been hosting, even on Skulltag 98D

When this happens to my servers, they disappear from the master server, I can connect to them using -connect, but I get a bright yellow HOM screen. Killing and restarting the servers fixes all problems for another interval of around 28 days.

My server is running the 64 bit linux build on Ubuntu server

(0006478)
Konar6   
2013-06-22 21:55   
(edited on: 2013-06-22 21:56)
This problem is caused by overflow in SERVER_Tick().
The variables which hold the millisecond timers work with the LONG datatype, and thus overflow when the server has been running for 2,147,483,647 msec, or 24 days and ~ 20 hours.
The timer is however provided by an SDL function SDL_GetTicks() which itself is just Uint32 anyway, so I guess the fix won't be as easy as switching our variables to a 64bit datatype (and switching to ULONG would only postpone the problem to 49 days?)

According to Firestone, he doesn't suffer from this problem on his Windows servers.
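
The figures in this note can be checked with a few lines of C++. This is only a sketch: it assumes LONG is a signed 32-bit integer (which the 2,147,483,647 msec figure implies), and the names `Uptime`, `uptimeFromMs`, `signedLimit` and `unsignedLimit` are made up for illustration, not taken from the Zandronum source.

```cpp
#include <cstdint>
#include <limits>

// Uptime split into whole days and leftover whole hours.
struct Uptime {
    std::uint64_t days;
    std::uint64_t hours;
};

constexpr std::uint64_t kMsPerHour = 1000ULL * 60 * 60;

// Convert a millisecond count into days and hours, rounding down.
constexpr Uptime uptimeFromMs(std::uint64_t ms) {
    return Uptime{ms / kMsPerHour / 24, ms / kMsPerHour % 24};
}

// Largest value a signed 32-bit millisecond timer can hold (assumed LONG).
constexpr Uptime signedLimit =
    uptimeFromMs(std::numeric_limits<std::int32_t>::max());

// Largest value an unsigned 32-bit millisecond timer can hold (Uint32).
constexpr Uptime unsignedLimit =
    uptimeFromMs(std::numeric_limits<std::uint32_t>::max());

// Compile-time checks of the durations quoted in the discussion.
static_assert(signedLimit.days == 24 && signedLimit.hours == 20,
              "signed 32-bit ms timer lasts 24 days, 20 hours");
static_assert(unsignedLimit.days == 49 && unsignedLimit.hours == 17,
              "unsigned 32-bit ms timer lasts 49 days, 17 hours");
```

So a signed 32-bit timer runs out after 24 days and 20 hours, and an unsigned one after 49 days and 17 hours, which matches both the reported freeze time and the "49 days" estimate.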

(0006480)
Torr Samaho   
2013-06-23 11:02   
If it's just an overflow problem in SERVER_Tick(), why aren't the Windows servers affected?
(0006489)
Edward-san   
2013-06-23 16:33   
... it should affect windows servers too, but only after 49 days, which is, coincidentally, the time after which SDL_GetTicks wraps around.
(0006490)
jwaffe   
2013-06-23 17:25   
Ah... the timer rolls over and the server doesn't know what to do with packets from 49 days ago, or something similar I imagine.

It might not be easy to go from using the absolute value of the counter to using a relative value. I'm not sure how deeply it's coded to use the return value instead of the return value minus a point in time to measure against.
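
For what it's worth, the "relative value" idea can be sketched like this. It is not how the server code actually works; `elapsedMs` and `hasExpired` are hypothetical helpers, and the sketch assumes the counter is an unsigned 32-bit millisecond value like the one SDL_GetTicks() returns. Unsigned subtraction is done modulo 2^32, so the difference between two readings stays correct across a rollover as long as the readings are less than one wrap period apart.

```cpp
#include <cstdint>

// Milliseconds elapsed between two readings of a free-running 32-bit
// counter. Unsigned arithmetic wraps modulo 2^32, so the result is
// correct even if the counter rolled over between the two readings.
constexpr std::uint32_t elapsedMs(std::uint32_t now, std::uint32_t earlier) {
    return static_cast<std::uint32_t>(now - earlier);
}

// True once `now` has reached or passed `deadline`. Only valid while the
// two values are less than 2^31 ms (about 24.8 days) apart.
constexpr bool hasExpired(std::uint32_t now, std::uint32_t deadline) {
    return static_cast<std::uint32_t>(now - deadline) < 0x80000000u;
}
```

For example, with an earlier reading of 0xFFFFFFFB (5 ms before the wrap) and a current reading of 5, `elapsedMs` gives 10 rather than a huge or negative number. The catch is the one raised above: every place that compares absolute timer values would have to be changed to compare differences instead.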
(0006493)
Torr Samaho   
2013-06-23 19:48   
Quote from Edward-san

... it should affect windows servers too, but only after 49 days

SERVER_Tick stores the return value of I_MSTime as LONG. So this should also overflow under Windows after 24 days.
(0017808)
Ru5tK1ng   
2017-06-07 03:32   
Is there any reason SERVER_Tick doesn't use ULong?
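
As pointed out earlier in this report, an unsigned 32-bit value would only move the wrap to about 49 days. One possible way around that, shown purely as a sketch and not as the fix that was eventually merged, is to widen the wrapping 32-bit counter into a 64-bit running total. `TickExtender` is a hypothetical class, and it assumes `update()` gets called at least once per wrap period, which a server that ticks many times a second would do.

```cpp
#include <cstdint>

// Hypothetical helper: turns a wrapping 32-bit millisecond counter into
// a 64-bit running total that does not wrap in any practical timeframe.
class TickExtender {
public:
    // Feed in every raw counter reading. Readings must be less than one
    // wrap period (2^32 ms, about 49.7 days) apart.
    std::uint64_t update(std::uint32_t raw) {
        // Modulo-2^32 difference since the previous reading.
        total_ += static_cast<std::uint32_t>(raw - last_);
        last_ = raw;
        return total_;
    }

private:
    std::uint32_t last_ = 0;   // previous raw reading
    std::uint64_t total_ = 0;  // accumulated milliseconds
};
```

Feeding it 0xFFFFFFF0 and then 0x10 returns 0xFFFFFFF0 followed by 0x100000010, so the total keeps counting up across the rollover. The variables that store the result would still need to be 64-bit for this to help.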
(0019005)
Torr Samaho   
2018-01-21 20:26   
The following could help with debugging the issue:

[21:16:19] <Dusk> [20:07:51] <AlexMax> [18:05:29] https://idea.popcount.org/2013-07-19-how-to-sleep-a-million-years/
[21:16:19] <Dusk> [20:07:51] <AlexMax> [18:05:36] this might be useful for trying to fix the 24 day bug
(0023175)
Kaminsky   
2024-03-01 16:42   
https://foss.heptapod.net/zandronum/zandronum-stable/-/merge_requests/36
(0023295)
Kaminsky   
2024-03-03 21:09   
The merge request above got pushed into the default branch of the repository.
(0023797)
Kaminsky   
2024-07-17 17:08   
[12:52 PM] Sean: most of my servers have been running for at least 93 days straight now
[12:54 PM] Sean: my 3.1 servers where I backported that fix are at 135 days

Since Sean's Blue Firestick servers, as of writing this, have been running for this long and still work fine, I'll mark this issue as resolved.