Opened 13 years ago

Last modified 6 years ago

#968 new defect

UserReporter may cause significant delay when shutting down

Reported by: historic_bruno Owned by:
Priority: Must Have Milestone: Backlog
Component: Core engine Keywords:
Cc: Patch:

Description

To reproduce this, run the game and immediately open Atlas (Tools and Options > Scenario Editor). There is a delay of up to 8s on my system before the UserReporter worker thread exits. Similar behavior occurs when running the game and immediately closing it (the process lingers in the background). The culprit seems to be sys_get_proxy_config which is run on init and may be very slow (tens of seconds). Expected behavior is that the system shuts down as gracefully and responsively as possible when the user desires and if possible, sys_get_proxy_config should not prevent that.

Change History (5)

comment:1 by Philip Taylor, 12 years ago

sys_get_proxy_config is slow on my machine too. I think the difficulty is that the relevant Win32 call is synchronous, but if we just shut down the engine while the user-report thread is still running (instead of waiting for clean termination) then there's likely various race conditions that might causes crashes during shutdown (e.g. the user-report thread uses the profiler, which the engine might be shutting down).

Maybe the user-reporter thread should start a second thread that just does sys_get_proxy_config, and wait asynchronously for the result. When the user-reporter thread is told to shut down, that second thread can safely continue running in the background (with its eventual return value being ignored) while the engine terminates.

comment:2 by elexis, 7 years ago

Priority: Nice to HaveShould Have

See also the report by Imarok in #4527:

Sometimes pyrogenesis takes very long to shutdown. This is because the deinitialization of the UserReporter called from ​here needs between some us and multiple minutes. There is also a TODO there in L185: TODO: should have a timeout in case of network hangs.

Notice in both cases it's the server not responding quickly enough

Last edited 7 years ago by elexis (previous) (diff)

in reply to:  1 comment:3 by elexis, 6 years ago

Replying to Philip Taylor:

wait asynchronously for the result

curl_easy_perform is also blocking and should use the multi handle variant. SDL_QuitRequested() could probably be used to catch the event.

comment:4 by elexis, 6 years ago

Priority: Should HaveMust Have

As of r21898 the UserReporter client requires SSL for the transfer of the UserReport data. Since the hostname still links to Philips server and since Philips server doesn't listen on port 443, the connection will timeout.

When the user wants to exit the application, the libCURL default timeout of 5 minutes blocks the program from exiting for that period. Even if the DNS entry will change soon, it seems better to limit this issue to 15 seconds until asynchroneous libCURL multi-handle requests are implemented.

comment:5 by elexis, 6 years ago

In 21900:

Don't delay the pyrogenesis shutdown for 5 minutes but at most 10 seconds if the server is not responding, refs #968;
in particular Philips server not responding to SSL which became a requirement by the client following rP21898 or following GDPR 32.1.a, refs #5257 while the new backend is not ready yet.

Mark file emptied in rP21898 as deleted.
Add scrolling to the UserReporter window, so that the timeout error strings that became visible following the bugfixes in rP21892 and rP21897 and don't overlap with the buttons.

Note: See TracTickets for help on using tickets.