MantisBT - Zandronum
View Issue Details
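ID: 0004064
Project: Zandronum
Category: [All Projects] Bug
View Status: public
Date Submitted: 2022-12-22 22:25
Last Update: 2023-07-27 01:53
Reporter: Zalewa
Assigned To: DrinkyBird
Priority: normal
Severity: minor
Reproducibility: sometimes
Status: resolved
Resolution: fixed
Platform: PC
OS: Windows
OS Version: 11
Product Version: 3.2
Target Version: 3.2
Fixed in Version: 3.2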
0004064: Trouble with big UDP packets in launcher query responses from game servers
The issue affects game servers, not the master server.

With more and more information being transmitted in the launcher query, the UDP packet that is sent as the reply can be big enough to cause transmission problems.

As an example, there's currently a TSPG server at 104.128.58.120:10711 that is running almost 30 WADs. Moreover, up to 32 players may be connected to this server. When you also consider that Zandronum sends a checksum for each WAD, the number of bytes quickly grows. I checked this server and it's sending a 2KB UDP packet in the response.

A user (King Dumb) has reported problems querying this server when using Windows 11. It would always reply with the "Refreshed too fast" status. The reason for this is that the query packet from his launcher reaches the server, but the 2KB reply from the server is dropped somewhere in transmission, possibly at the network stack of his OS.

Now, I remember that the problem of oversized packets was already foreseen in the master server query. The master server sends its query response split into multiple smaller packets, but the game servers don't do that. I think it may be necessary to introduce the same split into this protocol too.
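A minimal sketch of what application-level splitting could look like, written in Python. The 4-byte header layout (segment index, segment count, total length), the 1400-byte budget, and the function names are all assumptions made for illustration; this is not the format used by the master server or by Zandronum.

```python
import struct
from typing import Iterable, List, Optional

# Assumed per-segment budget, chosen to stay below a 1500-byte MTU.
MAX_SEGMENT_PAYLOAD = 1400

# Hypothetical header: segment index (u8), segment count (u8), total length (u16).
HEADER = struct.Struct("<BBH")


def split_response(payload: bytes, max_payload: int = MAX_SEGMENT_PAYLOAD) -> List[bytes]:
    """Split one large response into self-describing datagrams."""
    chunks = [payload[i:i + max_payload]
              for i in range(0, len(payload), max_payload)] or [b""]
    if len(chunks) > 255 or len(payload) > 0xFFFF:
        raise ValueError("payload does not fit the hypothetical header")
    return [HEADER.pack(index, len(chunks), len(payload)) + chunk
            for index, chunk in enumerate(chunks)]


def reassemble(datagrams: Iterable[bytes]) -> Optional[bytes]:
    """Rebuild the payload; return None if a segment is missing or inconsistent."""
    parts = {}
    count = total = None
    for datagram in datagrams:
        index, count, total = HEADER.unpack_from(datagram)
        parts[index] = datagram[HEADER.size:]
    if count is None or set(parts) != set(range(count)):
        return None
    payload = b"".join(parts[i] for i in range(count))
    return payload if len(payload) == total else None
```

Because every segment carries its own index and the segment count, the receiver can accept segments in any order and can tell when one was lost, which matters since UDP guarantees neither ordering nor delivery.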
I cannot replicate this problem myself, neither on Windows 10 nor on Ubuntu 22.04. The server from the example responds just fine for me, whereas for King Dumb it was impossible to get a proper response from this server.

Discord message where I diagnose the problem: https://discord.com/channels/297616756636254218/479693063128875008/1055582870229491834

MS forums where people discuss a similar issue with UDP and big packets: https://social.technet.microsoft.com/Forums/en-US/965e107e-d9b0-4240-ac3f-74797c91b476/unable-to-send-udp-packets-larger-than-the-mtu-with-windows-build-1809-using-c-udpclient?forum=win10itpronetworking
No tags attached.
related to 0004130 | needs testing | Zalewa | Doomseeker | Doomseeker requests with flag SQF2_PWAD_HASHES even if flag 'check the integrity of local wads' is disabled
related to 0004142 | needs testing | Zalewa | Doomseeker | Support the Zandronum's segmented server query response
Issue History
2022-12-22 22:25 | Zalewa | New Issue
2022-12-22 22:39 | Kaminsky | Status | new => acknowledged
2022-12-31 00:57 | Kaminsky | Target Version | => 3.2
2023-01-05 19:23 | duke | Note Added: 0022675
2023-01-26 15:29 | Zalewa | Note Added: 0022743
2023-01-26 15:31 | Zalewa | Note Edited: 0022743
2023-01-26 18:51 | duke | Note Added: 0022750
2023-01-26 22:08 | duke | Note Edited: 0022750
2023-01-27 04:18 | duke | Note Added: 0022752
2023-01-27 04:19 | duke | Note Edited: 0022752
2023-01-27 04:20 | duke | Note Edited: 0022752
2023-01-27 04:22 | duke | Note Edited: 0022752
2023-03-19 21:48 | DrinkyBird | Assigned To | => DrinkyBird
2023-03-19 21:48 | DrinkyBird | Status | acknowledged => assigned
2023-03-19 21:50 | DrinkyBird | Note Added: 0022814
2023-04-30 16:57 | WaTaKiD | Relationship added | related to 0004130
2023-05-07 21:13 | DrinkyBird | Note Added: 0022843
2023-05-07 21:13 | DrinkyBird | Status | assigned => needs testing
2023-05-07 21:31 | DrinkyBird | Note Added: 0022844
2023-06-25 13:43 | Zalewa | Relationship added | related to 0004142
2023-07-22 22:27 | Zalewa | Note Added: 0022891
2023-07-27 01:49 | DrinkyBird | Note Added: 0022897
2023-07-27 01:49 | DrinkyBird | Status | needs testing => resolved
2023-07-27 01:49 | DrinkyBird | Fixed in Version | => 3.2
2023-07-27 01:49 | DrinkyBird | Resolution | open => fixed
2023-07-27 01:53 | DrinkyBird | Note Edited: 0022897

Notes
(0022675)
duke   
2023-01-05 19:23   
I hope the server protocol doesn't become more complicated with multi-packet responses just because of some Windows bug. Microsoft may fix their software eventually but added protocol complexity will stay forever.

A possible workaround for the people impacted by this problem would be to use https://doomlist.net/ to avoid the need to fully query servers from their end. Being able to join games directly from the website would be a lot easier for people if my proposed change in issue 0004015 was merged.
(0022743)
Zalewa   
2023-01-26 15:29   
(edited on: 2023-01-26 15:31)
Quote from duke
I hope the server protocol doesn't become more complicated with multi-packet responses just because of some Windows bug. Microsoft may fix their software eventually but added protocol complexity will stay forever.

These are just two sentences but there are numerous problems with them:

1. Even though the problem is described on Microsoft's page and linked there to a certain Windows build, the issue with UDP "jumbo" packets isn't limited to just Windows. We can't say this is "just a Windows bug" when other networking equipment or even the ISP may be at fault.
2. What makes you think that Microsoft will "fix their software"?
3. Why are you not accounting for "their software" that Microsoft might not fix and that people may still use? Doomseeker 1.4 still runs on Windows XP.
4. Why are you so concerned about the complexity of the protocol? Isn't it the job of the software developers to manage the complexity? What makes you equate complexity with complications, especially since the master server is already segmenting its packets and there are no problems with it?

Quote from duke
A possible workaround for the people impacted by this problem would be to use https://doomlist.net/ to avoid the need to fully query servers from their end. Being able to join games directly from the website would be a lot easier for people if my proposed change in issue 0004015 was merged.

This is incorrect, because Doomseeker still needs to query the server to learn about its WADs - this is mandatory. We can reduce the packet size here, because in such a scenario Doomseeker is asking for many things it doesn't need, but a server that hosts a sufficiently large number of WADs will still trigger the problem.

(0022750)
duke   
2023-01-26 18:51   
(edited on: 2023-01-26 22:08)
I take back my workaround suggestion, you are right that a server with a large number of wads may go over the packet size limit and prevent the affected people from joining even when using the website links. My mistake.

All I meant to say is, let's not rush into complicating the protocol before we know there isn't a better solution. I don't think we should just accept that some network stack decided that a 2KB packet is "too big" and devs need to work around that.

To answer your points:
1) Indeed we don't have enough information to tell if this is just a Windows bug or not, though it seems likely to me. We need reports from the affected people about their configuration to investigate more.

2) I said Microsoft MAY fix it, I don't know if they will.

3) I don't want to leave people behind either. Hopefully as we get more understanding of the problem we will know if there are any fixes or viable workarounds.

4) More complexity means more potential bugs, more work for people developing the protocol or tools using that protocol. Hopefully the protocol change would be done in a backwards-compatible way, but still. The job of software developers is to manage the *unavoidable* complexity that is inherent in the problems we are trying to solve, while avoiding or minimizing any unnecessary complexity.
Of course the devs are free to disregard my opinion and do whatever they think is best.

(0022752)
duke   
2023-01-27 04:18   
(edited on: 2023-01-27 04:22)
I did some investigation and I must say I was probably also wrong thinking this is just some obscure Windows bug.

While the theoretical maximum for UDP packet size is around 65KB, packets above some threshold (in practice 1500+ bytes, the most common MTU setting) get fragmented on the IP layer and some systems and ISPs have issues with fragmented packets regardless of OS.
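As a rough illustration of where that threshold comes from, here is a small Python sketch that estimates how many IPv4 fragments a UDP payload would need. It assumes IPv4 with a 20-byte header (no options), an 8-byte UDP header, and a path MTU of 1500 bytes; the function names are made up for this example.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def max_unfragmented_udp_payload(mtu: int = 1500) -> int:
    """Largest UDP payload that still fits in a single IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def ipv4_fragment_count(udp_payload_len: int, mtu: int = 1500) -> int:
    """Estimate how many IPv4 packets are needed to carry one UDP datagram."""
    datagram_len = UDP_HEADER + udp_payload_len  # what the IP layer must carry
    last_capacity = mtu - IPV4_HEADER            # data that fits in one packet
    if datagram_len <= last_capacity:
        return 1
    # Every fragment except the last must carry a multiple of 8 bytes.
    full_capacity = last_capacity // 8 * 8
    return 1 + math.ceil((datagram_len - last_capacity) / full_capacity)
```

Under these assumptions, `max_unfragmented_udp_payload()` gives 1472 bytes, and a reply of about 2 KB (for example `ipv4_fragment_count(2048)`) needs two fragments. If either fragment is dropped along the path, the whole datagram is lost.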

This article has a lot of detail and ways to test large packet delivery: https://blog.cloudflare.com/ip-fragmentation-is-broken/
It cites a paper from 2012 saying that around 6% of hosts block inbound fragmented datagrams.

Just tonight there was a MegaMan server up with 20+ wads and 25+ players going over the 1500 byte threshold, and I found out that even the Oracle Cloud instance hosting https://doomlist.net seems to be dropping UDP packets larger than 1500 bytes and was failing to query that server while many players were connected. That's embarrassing.

I suppose that's why the Steam Server Query protocol (https://developer.valvesoftware.com/wiki/Server_queries), a UDP-based protocol similar to the Zandronum Launcher Protocol, does fragmentation at the application level, like Zalewa proposed. Not fun to implement but maybe a good idea after all.

To keep compatibility with older launchers I think servers would need to keep sending big packets by default and allow clients to request fragmentation with a flag.
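The opt-in idea could be sketched roughly like this in Python. The flag name `REQUEST_SEGMENTED`, its value, the 1400-byte budget, and the `build_reply` function are hypothetical; the sketch only shows the backwards-compatible branching, not any real Zandronum message format.

```python
from typing import List

# Hypothetical request flag: set only by launchers that understand segments.
REQUEST_SEGMENTED = 0x01

# Assumed per-segment budget, chosen to stay below a 1500-byte MTU.
MAX_SEGMENT_PAYLOAD = 1400


def build_reply(request_flags: int, payload: bytes,
                max_payload: int = MAX_SEGMENT_PAYLOAD) -> List[bytes]:
    """Return the datagrams a server would send for one query."""
    if not request_flags & REQUEST_SEGMENTED:
        # Legacy launcher: keep the old behaviour, one datagram of any size.
        return [payload]
    # New launcher asked for segments: never exceed the per-segment budget.
    return [payload[i:i + max_payload]
            for i in range(0, len(payload), max_payload)] or [payload]
```

An older launcher that never sets the flag keeps receiving exactly what it receives today, so only launchers that opt in need to learn how to reassemble segments.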

Sorry for muddying the ticket with my previous uninformed opinions.

(0022814)
DrinkyBird   
2023-03-19 21:50   
I've been working on this.
Some initial refactoring was required: https://hg.osdn.net/view/zandronum/zandronum-stable/rev/805780cfc641
The actual segmented protocol changes are still being worked on.
(0022843)
DrinkyBird   
2023-05-07 21:13   
https://hg.osdn.net/view/zandronum/zandronum-stable/rev/0e8360876fdf

Now I need to clean up and update the documentation...
(0022844)
DrinkyBird   
2023-05-07 21:31   
Docs updated: https://wiki.zandronum.com/Launcher_protocol

As a bonus here's the tool I used to test this with: https://github.com/DrinkyBird/zanquerytest/blob/master/index.js
(0022891)
Zalewa   
2023-07-22 22:27   
One question about the new segmented protocol: what happens when an SQF or SQF2 field extends beyond the MTU limit?

I created 100 empty WADs, called them empty_<n>.wad, loaded them on a server, and the server sent me 4796 bytes in a response, not segmented. Now, if we go into the segmented responses: each WAD checksum in the SQF2_PWAD_HASHES segment is 33 bytes, so we reach the 1472 bytes MTU for this segment with 44 WADs.

What happens when a server tries to host more WADs than that?

1. Will the SQF2_PWAD_HASHES be sent in a single segment, breaching the 1472 MTU?
2. Will I receive two segments, both with SQF2_PWAD_HASHES, and I'll have to resume the parsing in the 2nd segment where I left off in the 1st one?

I can't check that myself, just yet.
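The arithmetic behind the 44-WAD figure can be checked with a short Python sketch. It takes the 33-byte entry size and the 1472-byte limit from this note and, as a simplifying assumption, ignores any per-segment header or other field overhead; the function names are made up for illustration.

```python
HASH_ENTRY_BYTES = 33  # size of one WAD checksum entry, as stated in the note
SEGMENT_LIMIT = 1472   # per-segment byte limit, as stated in the note


def max_hashes_per_segment(limit: int = SEGMENT_LIMIT,
                           entry: int = HASH_ENTRY_BYTES) -> int:
    """How many whole checksum entries fit under the limit."""
    return limit // entry


def segments_needed(wad_count: int, limit: int = SEGMENT_LIMIT,
                    entry: int = HASH_ENTRY_BYTES) -> int:
    """Segments required if the entries are spread over several segments."""
    per_segment = max_hashes_per_segment(limit, entry)
    return -(-wad_count // per_segment)  # ceiling division
```

With these numbers, 44 entries take 1452 bytes and a 45th would bring the total to 1485, past the limit. If the field were split across segments, the 100-WAD test case would need at least three of them.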
(0022897)
DrinkyBird   
2023-07-27 01:49   
(edited on: 2023-07-27 01:53)
Marking this as resolved: a Zan beta with this change is now out, Doomseeker now supports it (0004142), and with it, servers with huge responses are now showing up consistently for people affected by this problem.