MantisBT - Zandronum
View Issue Details
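ID: 0001035
Project: Zandronum
Category: [All Projects] Bug
View Status: public
Date Submitted: 2012-09-12 23:49
Last Update: 2017-11-08 21:30
Reporter: NotJenova
Priority: normal
Severity: crash
Reproducibility: have not tried
Status: closed
Resolution: unable to reproduce
Platform: Linux
OS: Ubuntu
OS Version: 10.04 x86-64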
0001035: Crash on string manipulation functions?
Players hosting the new Chillax WAD on my servers have been reporting random server crashes. I've decided to investigate the issue, and it seems to be a SEGFAULT.

Here are the initial crash message and the server startup command:

http://pastebin.com/x24u26uB

Here is the Zandronum crashlog:

http://pastebin.com/cFgTr7Zt

And here is me running valgrind on zandronum-server:

http://pastebin.com/tW7iwL2P
No tags attached.
kpatch-crash-on-chillax (0001035).txt (7,075) 2013-04-02 05:04
/tracker/file_download.php?file_id=955&type=bug
zandronum-crash-0x18-memory-access (0001035).txt (7,825) 2013-04-04 21:04
/tracker/file_download.php?file_id=959&type=bug
Issue History
2012-09-12 23:49 | NotJenova | New Issue
2012-09-13 12:31 | Edward-san | Note Added: 0004661
2012-09-13 14:21 | NotJenova | Note Added: 0004663
2012-09-15 19:31 | Torr Samaho | Note Added: 0004673
2012-09-15 19:32 | Torr Samaho | Status | new => feedback
2012-09-15 20:53 | Edward-san | Note Added: 0004678
2012-09-15 21:40 | Edward-san | Note Edited: 0004678 | bug_revision_view_page.php?bugnote_id=4678#r2527
2012-09-15 21:50 | Edward-san | Note Edited: 0004678 | bug_revision_view_page.php?bugnote_id=4678#r2528
2012-09-15 21:50 | Edward-san | Note Edited: 0004678 | bug_revision_view_page.php?bugnote_id=4678#r2529
2012-09-15 23:08 | Edward-san | Note Edited: 0004678 | bug_revision_view_page.php?bugnote_id=4678#r2530
2012-09-16 07:38 | Torr Samaho | Note Added: 0004682
2012-09-16 13:17 | Edward-san | Note Edited: 0004678 | bug_revision_view_page.php?bugnote_id=4678#r2545
2013-04-02 04:59 | tehuser | Note Added: 0006200
2013-04-02 05:04 | tehuser | File Added: kpatch-crash-on-chillax (0001035).txt
2013-04-02 06:28 | tehuser | Note Edited: 0006200 | bug_revision_view_page.php?bugnote_id=6200#r3418
2013-04-04 21:03 | tehuser | Note Added: 0006221
2013-04-04 21:04 | tehuser | File Added: zandronum-crash-0x18-memory-access (0001035).txt
2013-04-04 21:07 | tehuser | Note Edited: 0006221 | bug_revision_view_page.php?bugnote_id=6221#r3437
2013-04-05 01:22 | ZzZombo | Note Added: 0006223
2013-04-05 02:02 | tehuser | Note Added: 0006224
2013-04-05 02:13 | tehuser | Note Edited: 0006224 | bug_revision_view_page.php?bugnote_id=6224#r3439
2013-04-05 03:53 | tehuser | Note Added: 0006225
2013-04-05 03:57 | tehuser | Note Edited: 0006225 | bug_revision_view_page.php?bugnote_id=6225#r3441
2013-04-05 03:59 | tehuser | Note Edited: 0006225 | bug_revision_view_page.php?bugnote_id=6225#r3442
2013-04-06 18:36 | Torr Samaho | Note Added: 0006256
2013-04-06 21:57 | tehuser | Note Added: 0006262
2013-04-06 21:59 | tehuser | Note Edited: 0006262 | bug_revision_view_page.php?bugnote_id=6262#r3457
2013-04-06 22:02 | tehuser | Note Edited: 0006262 | bug_revision_view_page.php?bugnote_id=6262#r3458
2017-11-08 21:30 | Dusk | Status | feedback => closed
2017-11-08 21:30 | Dusk | Resolution | open => unable to reproduce

Notes
(0004661)
Edward-san   
2012-09-13 12:31   
what's zandronum-server_auto?
(0004663)
NotJenova   
2012-09-13 14:21   
It's basically just a renamed zandronum-server.
(0004673)
Torr Samaho   
2012-09-15 19:31   
I can't say what's going on from the logs. Do you have any idea how to reproduce the problem? Or can you compile and run the server in debug mode? This way the crash log should be more informative.
(0004678)
Edward-san   
2012-09-15 20:53   
(edited on: 2012-09-16 13:17)
Crap I forgot to remove this message after creating a new ticket about this here.

(0004682)
Torr Samaho   
2012-09-16 07:38   
A general remark:
Quote from Edward-san
I already told you some time ago (in skulltag forum, via PM).
I can't remember this and don't know if this PM was lost for technical reasons like many other things on the Skulltag forum. Anyway, as I always say, suggestions and reports should not be made on the forum (neither in a thread nor via PM) or on IRC; I simply can't keep track of everything that's reported this way. I saw that you just made a new suggestion for the crashcatcher backport, so this one won't be forgotten now.

Anyway, I can't say whether the crashcatcher improvements would help here or not, but compiling in debug mode and running with gdb definitely will help.
(0006200)
tehuser   
2013-04-02 04:59   
(edited on: 2013-04-02 06:28)
I'm running a server on Chillax wads as well, and the same issue is happening. I switched the server to Konar6's kpatch version to see if that resolved the crashes, but it did not.

I've been running a customized build of kpatch (compiled with -DNDEBUG but unstripped), and I have gdb output from this build if anyone wants it. Note that this is a log from kpatch (and not the official Zandronum), but I have reason to believe the crash is the same (segfault trying to access 0x17 or 0x18 in both Zandronum and kpatch -- likely trying to call a function at an offset from a null pointer).

I've made some modifications to the relevant code to see if the crashes go away, and thus far they have not. I can share these modifications, as well as crash logs generated from the modified code, if anyone is interested.

Actually, I'll go ahead and upload the log from the unmodified kpatch build.

Edit: To be clear (should be clear from the uploaded log), I'm running Chillax and WoC

(0006221)
tehuser   
2013-04-04 21:03   
(edited on: 2013-04-04 21:07)
So I realized that probably none of the devs are interested in seeing crash reports from kpatch (since that's not their code), so I switched back to a custom-built Zandronum 1.0 (built with -g -DNDEBUG). The server crashed again last night in the same bit of code. I'm attaching that crash log.

I can't figure it out though. obj is 0xc4940b8, but when we make a call to obj->PropagateMark() we end up trying to access memory at 0x18. Trying to access 0x18 would make sense to me if obj were NULL, 'cause then 0x18 would seem like a pretty reasonable offset from the instance address. But I don't understand how we're getting to 0x18 from 0xc4940b8. I've looked into resolving this myself and contributing code, but I can't make heads or tails of how we got to 0x18 (or 0x17 in some other crashes -- I think it's always been one or the other).
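
The address arithmetic in this note can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not a diagnosis: the thread never establishes where 0x18 comes from, and `member_address`, `vtable_slot_address`, and `check_fault_addresses` are hypothetical helpers, not Zandronum code. The first helper models reading a data member at a fixed offset from an object pointer. The second models fetching a virtual function pointer through an object's vtable pointer, assuming the common x86-64 layout in which slot i sits at vptr + i * 8.

```cpp
#include <cassert>
#include <cstdint>

// Pointer width assumed for the x86-64 server in the report.
constexpr std::uintptr_t kPointerSize = 8;

// Address read when accessing a data member `offset` bytes into an object.
constexpr std::uintptr_t member_address(std::uintptr_t object,
                                        std::uintptr_t offset) {
    return object + offset;
}

// Address read when loading virtual function number `slot` through `vptr`
// (assumed layout: consecutive pointer-sized slots starting at vptr).
constexpr std::uintptr_t vtable_slot_address(std::uintptr_t vptr,
                                             std::uintptr_t slot) {
    return vptr + slot * kPointerSize;
}

// Works through the cases discussed in the note.
void check_fault_addresses() {
    const std::uintptr_t obj = 0xc4940b8;  // value of obj quoted in the note

    // A null object plus a member offset of 0x18 faults at 0x18.
    assert(member_address(0, 0x18) == 0x18);

    // The same offset from the reported obj is nowhere near 0x18.
    assert(member_address(obj, 0x18) == 0xc4940d0);

    // A null vtable pointer plus slot 3 also lands on 0x18, even when
    // the object pointer itself is non-null.
    assert(vtable_slot_address(0, 3) == 0x18);
}
```

Under these assumptions, a fault at 0x18 does not by itself show that obj was NULL; an object whose first word had been zeroed would produce the same address on a virtual call. Whether that applies here is not something the attached log confirms.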

Edit: Crash log now attached to this report. Filename is zandronum-crash-0x18-memory-access (0001035).txt.

(0006223)
ZzZombo   
2013-04-05 01:22   
I guess you should have attached the full crash log, not only one file from it.
(0006224)
tehuser   
2013-04-05 02:02   
(edited on: 2013-04-05 02:13)
As far as I know the crash log is only one file?

Edit: I just looked through the crash catcher code and it looks to me like the only log created is the one I attached. It could be that it differs for Zandronum 1.1, 'cause I know the crash catcher code changed. Are you requesting that I change the crash catcher code somehow?

(0006225)
tehuser   
2013-04-05 03:53   
(edited on: 2013-04-05 03:59)
On second thought, I may have an explanation. obj is non-null in the previous call to PropagateMark(), but it makes a recursive call to itself, setting Gray = obj->GCNext(). This call would be made even if obj->GCNext() == NULL. When we make the recursive call, we use Gray as our obj. Thus, if the parent object's GCNext() was NULL, when we make the recursive call to PropagateMark(), we end up copying NULL into obj. We never check if obj == NULL, so this scenario could lead to a crash.

That's my theory. The question is, what's the right way to solve it? Just return 0 if obj == NULL?
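
The "return 0 if obj == NULL" idea can be sketched as follows. This is a hypothetical model, not the actual garbage collector code: `GcObject`, `gc_next`, `PropagateOne`, and `PropagateAll` are names made up for the illustration, and the real `PropagateMark()` does more than set a flag. It only shows where such a guard would sit if the gray list were drained one object at a time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a collectable object.
struct GcObject {
    GcObject* gc_next = nullptr;  // next object on the gray list
    bool marked = false;

    // Marks this object and reports how much work was done.
    std::size_t PropagateMark() {
        marked = true;
        return sizeof(GcObject);
    }
};

// Processes one object from the gray list and advances the list head.
// Returns 0 instead of dereferencing when the list is empty.
std::size_t PropagateOne(GcObject*& gray) {
    GcObject* obj = gray;
    if (obj == nullptr) {
        return 0;  // the guard proposed in the note
    }
    gray = obj->gc_next;
    return obj->PropagateMark();
}

// Drains the gray list and returns the total amount of work done.
std::size_t PropagateAll(GcObject*& gray) {
    std::size_t work = 0;
    while (gray != nullptr) {
        work += PropagateOne(gray);
    }
    assert(gray == nullptr);  // the list is empty once the loop ends
    return work;
}
```

A guard like this only makes the step safe against an empty list. If a null entry reaches this point when it should not, the guard hides that rather than explaining it.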

(0006256)
Torr Samaho   
2013-04-06 18:36   
I think this kind of situation should not occur in the garbage collector. Can you compile in full debug mode (CMAKE_BUILD_TYPE=Debug) and see if you get more information?

Quote from tehuser
As far as I know the crash log is only one file?
That's correct. Under Linux there is only one file. The Windows crash logs consist of several files.
(0006262)
tehuser   
2013-04-06 21:57   
(edited on: 2013-04-06 22:02)
I'll look into running a full debug build. I tried that previously with kpatch, but I changed course since it seemed to frequently quit due to failed asserts. Worst case though, I should be able to change asserts so that they just printf() and continue running.
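
A minimal sketch of the "print and continue" assert idea mentioned above, assuming a macro swapped in for the asserts that fire too often. `SOFT_ASSERT`, `g_soft_assert_failures`, and `clamp_health` are hypothetical names for this illustration and do not exist in Zandronum or kpatch.

```cpp
#include <cassert>  // standard assert(), which aborts on failure
#include <cstdio>

// Number of soft assertions that have failed so far.
static int g_soft_assert_failures = 0;

// Logs a failed condition with its location and keeps running,
// unlike assert(), which terminates the process.
#define SOFT_ASSERT(cond)                                            \
    do {                                                             \
        if (!(cond)) {                                               \
            ++g_soft_assert_failures;                                \
            std::fprintf(stderr, "%s:%d: soft assert failed: %s\n",  \
                         __FILE__, __LINE__, #cond);                 \
        }                                                            \
    } while (0)

// Hypothetical caller: reports a negative value, then recovers from it.
int clamp_health(int health) {
    SOFT_ASSERT(health >= 0);
    return health < 0 ? 0 : health;
}
```

The trade-off is that execution continues past a violated invariant, so a later crash may be a consequence of the logged failure rather than an independent bug.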

As another note, I'm changing my explanation of why we're seeing a memory access to 0x18. If a second, recursive call were being made to PropagateMark(), as I previously thought, it should be listed twice in the call stack. But it's not. So, it looks to me like we're somehow going from 0xc4940b8 to 0x18 within the same PropagateMark() call.

Oh, also, this code is single threaded, right? I know there were two threads running at the time of crash, but one thread appears to just be SDL. Everything within the Zandronum server is just within one thread, right?