Opened 6 years ago

Closed 5 years ago

#4936 closed defect (fixed)

Atlas crashes if map descriptions contain Unicode characters

Reported by: Adrián Chaves Owned by: s0600204
Priority: Nice to Have Milestone: Alpha 24
Component: Atlas editor Keywords: unicode
Cc: Patch: phab:D1395

Description

Atlas crashes on Linux when trying to generate a random map from a template with a Unicode character in its description.

This was detected after I replaced sequences of three periods (...) by the Unicode ellipsis character (…) in the description of random map templates (e.g. Unknown) among other places.

elexis detected the issue on his Linux x86_64 machine, Vladislav could not reproduce the issue on Windows, and I could reproduce the issue on Linux x86_64 like elexis.

For the time being we will avoid using non-ASCII characters in translatable English strings, but eventually we should fix this issue and update the English style guide to encourage Unicode usage instead of forbidding it.

Attachments (1)

0ad-atlas-unicode-crash.txt (2.0 KB ) - added by Adrián Chaves 6 years ago.
Crash traceback

Download all attachments as: .zip

Change History (12)

by Adrián Chaves, 6 years ago

Attachment: 0ad-atlas-unicode-crash.txt added

Crash traceback

comment:1 by elexis, 6 years ago

Milestone: BacklogAlpha 23

Meroë currently breaks atlas hard.

While it's arguable to have the title string only in ascii, at least the description should be able to handle these characters.

If we don't fix it, then we have to use Meroe:

Index: binaries/data/mods/public/maps/random/fields_of_meroe.json
===================================================================
--- binaries/data/mods/public/maps/random/fields_of_meroe.json	(revision 21524)
+++ binaries/data/mods/public/maps/random/fields_of_meroe.json	(working copy)
@@ -1,8 +1,8 @@
 {
 	"settings" : {
-		"Name" : "Fields of Meroë",
+		"Name" : "Fields of Meroe",
 		"Script" : "fields_of_meroe.js",
-		"Description" : "The \"Island of Meroë\", a vast peninsula flanked by the Nile and Atbarah rivers, formed the heartland of ancient Kush. Where the harsh deserts start making way for the semi-arid savannahs and small acacia forests dot the landscape. The area is rich in resources and the ever-present Nile brings life, but grave threats loom on the opposite riverbank.",
+		"Description" : "The \"Island of Meroe\", a vast peninsula flanked by the Nile and Atbarah rivers, formed the heartland of ancient Kush. Where the harsh deserts start making way for the semi-arid savannahs and small acacia forests dot the landscape. The area is rich in resources and the ever-present Nile brings life, but grave threats loom on the opposite riverbank.",
 		"Preview" : "fields_of_meroe_dry.png",
 		"Keywords": ["new"],
 		"CircularMap" : true,
Last edited 6 years ago by elexis (previous) (diff)

comment:2 by s0600204, 6 years ago

Could I ask someone/more than one person to check the following?:

  • Start Atlas
  • Press "Continue" as many times as needed
  • Enter the appropriate input to tell Atlas to generate the "Fields of Meroë" map (might be the blank entry between "Extinct Volcano" and "Flood")
  • Pay attention to the console output
  • One of the lines should say Generating <map name> of size [...]
  • Question One: What is the value of <map name> you get?
  • Close Atlas
  • Open 0AD normally
  • Start a new single player match, generating the "Fields of Meroë" map
  • Again, pay attention to the lines ouput to the console
  • And again, one of the lines should say Generating <map name> of size [...]
  • Question Two: What is the value of <map name> you get?

For reference, I get:

  • Atlas: Generating Fields of Meroë of size [...]
  • In-Game: Generating Fields of Meroë of size [...]

comment:3 by elexis, 6 years ago

I get the same results as you.

It's exactly the same encoding issue as in #4320. There was a missing UTF16 ot UTF8 conversion or something.

comment:4 by Stan, 6 years ago

Windows Call Stack. Crash occurs in AtlasUI code, but I can't get to compile it...

ntdll.dll!_RtlReportCriticalFailure@12
()	Unknown	Non-user code. Symbols loaded.
 	ntdll.dll!_RtlpReportHeapFailure@4
()	Unknown	Non-user code. Symbols loaded.
 	ntdll.dll!_RtlpLogHeapFailure@24
()	Unknown	Non-user code. Symbols loaded.
 	ntdll.dll!_RtlFreeHeap@12
()	Unknown	Non-user code. Symbols loaded.
 	msvcr120.dll!free(void * pBlock=0x078a70cc) Line 51	C	Non-user code. Symbols loaded.
>	[Inline Frame] pyrogenesis.exe!AtlasMessage::Shareable<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >::{dtor}() Line 156	C++	Symbols loaded.
 	pyrogenesis.exe!AtlasMessage::Shareable<std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > >::Unalloc() Line 212	C++	Symbols loaded.
 	pyrogenesis.exe!AtlasMessage::Shareable<std::vector<std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >,std::allocator<std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > > > >::operator=(const AtlasMessage::Shareable<std::vector<std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >,std::allocator<std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > > > > & rhs={...}) Line 243	C++	Symbols loaded.
 	pyrogenesis.exe!AtlasMessage::fGetMapList(AtlasMessage::qGetMapList * msg=0x011ce824) Line 352	C++	Symbols loaded.
 	pyrogenesis.exe!AtlasMessage::fGetMapList_wrapper(AtlasMessage::IMessage * msg=0x011ce824) Line 343	C++	Symbols loaded.
 	pyrogenesis.exe!RunEngine(void * data=0x011cfd84) Line 173	C++	Symbols loaded.
 	pyrogenesis.exe!thread_start(void * param=0x03714080) Line 624	C++	Symbols loaded.
 	ucrtbase.dll!thread_start<unsigned int (__stdcall*)(void *)>()	Unknown	Non-user code. Symbols loaded.
 	kernel32.dll!@BaseThreadInitThunk@12
()	Unknown	Non-user code. Symbols loaded.
 	ntdll.dll!__RtlUserThreadStart()	Unknown	Non-user code. Symbols loaded.
 	ntdll.dll!__RtlUserThreadStart@8
()	Unknown	Non-user code. Symbols loaded.

Removing this line makes the file open work again L352 of MapHandlers

	GET_FILE_LIST(L"maps/tutorials/", tutorialFilenames);

So it's likely faulty code there in Map Dialog.cpp, can't track it further so do not know.

		for (const std::wstring& filename : *qry.skirmishFilenames)
		{
			wxString name = filename.substr(skirmishPath.Length());
			listBox->Append(name, new wxStringClientData(filename));
		}

comment:5 by Stan, 6 years ago

Might be interesting https://forums.wxwidgets.org/viewtopic.php?t=6377

I typed wxstring umlaut on Google 

comment:6 by s0600204, 6 years ago

Patch: phab:D1395

Stan, what you're suffering in Comment 4 sounds like it might be a problem with phab:D614 / r21540. Possibly.

comment:7 by Vladislav Belov, 6 years ago

Did you try to compile Atlas? There was a suggestion by Itms to drop old wxWidgets (2.8) and update the precompiled lib.

comment:8 by elexis, 6 years ago

In 21760:

Two hurtful string changes in preparation of string freeze, refs #4936, decided in the staff meeting.

comment:9 by Itms, 6 years ago

I spent some time trying to improve on s0600204's patch, but my conclusion is that the only proper fix would take some work. I would like us to push this bug to A24.

  • I thought the current patch was good, because it is not an issue to use wxWidgets in Atlas. However,
    • AtlasObject should not depend on wxWidgets, it doesn't need to. It is just an interface parsing XML and JSON. One should take a look at premake and remove the unnecessary dependency, as originally hinted by Vlad.
    • One could think the current patch is "good enough" for now, but it prevents us from writing tests for the JSON parser (we currently only have tests for the XML one). Indeed, tests cannot depend on wxWidgets (the dependency must remain optional on Windows). So I really don't want that to land, even temporarily, because it will make future improvements more difficult.
  • Currently AtlasObjects hold strings as wide strings. This is not an issue per se, but since our data files are encoded in UTF8, we need some conversion code. What we have right now is an unmaintained code used by the XML parser (which, by the way, works with unsigned char*s instead of char*s, so we can't just used that code for the JSON parser).
    • It would be great to get rid of the conversion code, so I think the easiest fix is to hold data as UTF8 strings in that project. That removes a lot of overhead.
    • One thing that has to be checked is that jsonspirit and libxml properly load the files and pass data without messing the encoding. I believe they don't, but maybe a compiler flag was enabled somewhere to handle our wide strings.
  • Once this is done, the difficulty is to make sure this UTF8 data is correctly used. So the entire AtlasUI project should be examined, and strings from the AtlasObject project should be loaded into wxWidgets code using wxString::FromUTF8. Currently wxStrings are created in a lot of different inconsistent ways, which explains why the current patch doesn't fix all occurrences of �s.
  • All those changes will fix a big amount of bugs and remove quite a few hacks, but of course, regressions might pop up. Let's do that carefully.

TL;DR: In order to have a sensible and maintainable way of handling Unicode, we should use only UTF8 strings for data in Atlas, and make sure they are handled as UTF8 by wxWidgets. That will allow us to remove old unmaintained code and add tests. This is more work that can be done a few days before a release, and I don't see any quick fix that wouldn't create more problems in the future.

comment:10 by s0600204, 6 years ago

Milestone: Alpha 23Alpha 24

Agreed.

comment:11 by s0600204, 5 years ago

Owner: set to s0600204
Resolution: fixed
Status: newclosed

In 22335:

Improve UTF-8 character handling in Atlas

(Also prevents the compile-time warnings reported in the abandoned D1432)

Accepted by: Itms
Patch linting by: Stan, Vladislav, wraitii
Also tested by: Imarok
Fixes: #4936
Differential Revision: https://code.wildfiregames.com/D1395

Note: See TracTickets for help on using tickets.