Opened 7 years ago
Closed 7 years ago
#4320 closed defect (fixed)
Replay Menu: Support non-ASCII characters
Reported by: | elexis | Owned by: | elexis |
---|---|---|---|
Priority: | Release Blocker | Milestone: | Alpha 22 |
Component: | Core engine | Keywords: | |
Cc: | Patch: | Phab:D518 |
Description (last modified by )
Some users, f.e. with czech characters in the replay path run into that case:
Couldn't open %s. Non-latin characters are not supported yet
Thus the replay menu is entirely empty to affected users.
It occurs in VisualReplay::LoadReplayData
:
const CStr fileName = replayFile.string8(); std::ifstream* replayStream = new std::ifstream(fileName.c_str());
Previously utf8_from_wstring(fileName.string())
and OsString(filePath).c_str()
have been used, see r17314 r17356. It might be a bug in any of the string conversion functions, or even FileExists
Any proposed solution should be tested with all operating systems.
Reported by Bouddica.
Change History (13)
comment:1 by , 7 years ago
Summary: | Replay Menu: Support non-ASCI characters → Replay Menu: Support non-ASCII characters |
---|
comment:2 by , 7 years ago
Savegame loading and config loading does work for Bouddica.
The issue is reproducible on linux by adding a replay subdirectory, named with non-ASCII letters, containing a non-empty commands.txt
. Most, if not all distributions prevent adding users with exotic characters, so the bug should become only apparent on unix if the player rename their directories with those chars (hence the phrasing of error message).
We can notice that the GetFileInfo
call in VisualReplay.cpp
can resolve the given path correctly, but the std::ifstream
using the string8()
version doesn't. The Vfs
callstack reveals that given the current state of our file library functions, OsString
was indeed the correct way to resolve the path, not string8
.
Call stacks: Config Saving:
CConfigDB::WriteFile
fromConfigDB.cpp
CreateFile
fromvfs.cpp
Store
fromPRealDirectory
fromreal_directory.cpp
Store
fromio.h
Open
fromfile.h
FileOpen
fromfile.cpp
wopen
- UNIX:
wopen
fromunix/ufilesystem.cpp
open(OsString(pathname).c_str(), oflag);
system call, http://man7.org/linux/man-pages/man2/open.2.html
- WIN:
wopen(OsString(pathname).c_str(), oflag, _S_IREAD|_S_IWRITE);
fromwin/wposix/wfilesystem.cpp
_wsopen_s(&fd, OsString(pathname).c_str(), oflag, _SH_DENYRD, mode)
from the C Runtime Library https://msdn.microsoft.com/en-US/library/w64k0ytk.aspx (as we don't useO_DIRECT
as far as Philip knows)
- UNIX:
Screenshots Saving:
WriteScreenshot
fromUtil.cpp
tex_write
fromUtil.cpp
CreateFile
fromvfs.cpp
- see above
- Then it uses
GetRealPath
fromvfs.cpp
to translate theVfs
path to anOsPath
, appliesstring8()
and shows the full path usingLOGMESSAGERENDER
anddebug_printf
Loading vfs files ends up with the same callstack leading to wopen
.
comment:3 by , 7 years ago
string8
was only meant for providing something that can be displayed, never as input to anything operating on the file system, since valid paths are just arbitrary bytes, not required to be actually valid utf8 (or any other encoding).
But if you notice the common code path both of these share you might notice that using the vfs might have made this a non-issue.
follow-up: 6 comment:4 by , 7 years ago
From path.h
:
/** * Return a UTF-8 version of the path, in a human-readable but potentially * lossy form. It is *not* safe to take this string and construct a new * Path object from it (it may fail for some non-ASCII paths) - it should * only be used for displaying paths to users. */ std::string string8() const { // TODO: On Unixes, this will only be correct for ASCII or ISO-8859-1 // encoded paths; we should probably assume UTF-8 encoding by default // (but take care to handle non-valid-UTF-8 paths safely). return utf8_from_wstring(path); }
Searching for string8
yields 146 results, of all except these occurances in the following 2 files are LOGERROR
and debug_printf
calls:
Exceptions:
VisualReplay.cpp
: all ocurrances callstd::ifstream
StartVisualReplay
callingCGame::StartVisualReplay
LoadReplayData
to load thecommands.txt
GetReplayAttributes
to load the first line of thecommands.txt
GetReplayMetadata
to loadmetadata.json
SaveReplayMetadata
ScriptInterface.cpp
:LoadScript
LoadGlobalScripts
LoadGlobalScriptFile
ReadJSONFile
Not sure why that string8
proxy is needed in the first place - utf8_from_wstring
could just be used everywhere and the comment could go to that function. Though changing that will be a megapatch without any gain.
Searching for utf8_from_wstring
yields further filepath calls with questionable UTF8 conversions.
archive_zip.cpp
inutf8_from_wstring(pathname.string())
calls in seem suspicious.TerrainTextureEntry.cpp
hasm_Tag = utf8_from_wstring(path.Basename().string());
GUItext.cpp
uses above tag to check whetherIconExists
and to loadm_Font
L10n.cpp
extracts the locale code viautf8_from_wstring
JSI_Mod::GetAvailableMods
usesutf8_from_wstring
on the path- The
TemplateLoader
gets passedutf8_from_wstring(path)
and useswstring_from_utf8
to revert that before using the Vfs
So it seems this UTF8 error could occur in several places if we were to use non-ASCI characters in the public mod.
I didn't forget the Vfs topic. At least for the replay menu part we could do this (the replay writing still needs an open stream that is closed when ending the game).
But before, we have to evaluate whether there will be a performance impact. Loading some hundred replays can take minutes with an HDD, even just checking the filesize of all commands.txt
took about 20s (when the files aren't cached by the HDD yet). So it seems we would have to mount the files at the begin of VisualReplay::LoadReplayData
and unmount again before leaving the function. This is also needed as we want to refresh the replay list when reloading entering the GUI page. (Screenshots or savegames for example are also overwritten if the user changes the savegames or screenshots in the directory manually or with a second instance iirc). #3433 has a reviewable patch that reduces the loading time from 2 minutes to 2 seconds by caching everything into a replayCache.json
file and skipping as much I/O as possible.
Furthermore we have to check whether we lose another I/O performance advantage you had suggested back in the day (reading only the last line of the commands.txt
to determine the replay duration).
I agree that consistent file loading would be an improvement. I have to see and test the diff before I can give a fully informed decision.
comment:5 by , 7 years ago
Milestone: | Backlog → Alpha 22 |
---|---|
Owner: | set to |
comment:6 by , 7 years ago
Replying to elexis:
Not sure why that
string8
proxy is needed in the first place -utf8_from_wstring
could just be used everywhere and the comment could go to that function. Though changing that will be a megapatch without any gain. Searching forstring8
yields 146 results, [...]
One would think that you just answered your own question, otherwise you're welcome to go back to read the commit that introduced that, and the surrounding discussion.
ScriptInterface.cpp
:
LoadScript
LoadGlobalScripts
LoadGlobalScriptFile
All of those use it to get a string that will be reported when we get an error while executing that script, which is already loaded at that point and passed as another parameter.
ReadJSONFile
Didn't find any non-debug use there.
Searching for
utf8_from_wstring
yields further filepath calls with questionable UTF8 conversions.
archive_zip.cpp
inutf8_from_wstring(pathname.string())
calls in seem suspicious.
Our ZIP code does not (at least from a quick glance) support UTF-8 file names, which was only added to the ZIP specification in 2007, also all .zip
files we load should thus really only be ASCII, nothing against actually ensuring that. (Or extending it, or in case that is a lot of work using other code to do that work, though it is working fine so far.)
TerrainTextureEntry.cpp
hasm_Tag = utf8_from_wstring(path.Basename().string());
That is used to be written to map files, which means that there are likely some more limits on what can go there. Could be changed, but might be some work for no real benefit, as we control (or can define as valid) only files that are actually ASCII.
GUItext.cpp
uses above tag to check whetherIconExists
and to loadm_Font
This is again using only files in mods, so controlled by us.
L10n.cpp
extracts the locale code viautf8_from_wstring
The locale code is ASCII, and mod folders/names should be too. Also we do control those files, so no huge issue here.
JSI_Mod::GetAvailableMods
usesutf8_from_wstring
on the path
We need to store that in the config, so a few things are not valid (\n
,
might need escaping, etc) and at some point users shouldn't need to create folders there (#4207). So we can just require mod folders (the displayed name can be arbitrary anyway) to be [-_A-Za-z0-9]
or something like that.
However we should actually check that the mod folder can be stored in the config, so some work needs to be done there. If that check fails, we should display an error/warning message and hide the mod.
- The
TemplateLoader
gets passedutf8_from_wstring(path)
and useswstring_from_utf8
to revert that before using the Vfs
Those paths are coming from the simulation, so we do control them.
So it seems this UTF8 error could occur in several places if we were to use non-ASCI characters in the public mod.
So the fix for most of these is to add a few sanity checks that tell people that do such stupid things to not do that.
Now why are most of these issues above different from the replay issue? Because we do not control the path (and in this case we have a full path, not a vfs path, which only gives us a subset of what we want and control) in the replay case, and must thus expect everything. (We should still strive for the same thing in other code, but most of those don't seem urgent.)
But before, we have to evaluate whether there will be a performance impact. [...]
Mount once and keep it mounted; use hotloading, change the handler that causes hotloading to occur to also handle additions; rely on the vfs not populating whole directory trees, unless someone actually needs them; notice that we can extend the vfs interface to do only partial reads if we really need it; actually benchmark it.
comment:7 by , 7 years ago
Priority: | Must Have → Release Blocker |
---|
Doesn't have to stay a release blocker, but the replay menu is completely broken to the affected people.
comment:8 by , 7 years ago
Owner: | removed |
---|
comment:9 by , 7 years ago
I'm running into a similar issue for the campaigns; the WriteJSONFile function will output a different file name (presumably because it's UTF16?) if you input strange characters (such as éèç) for the campaign name. I could work around this, but it's annoying.
However I'm not sure where it goes wrong? Does VFS support UTF8-filenames?
comment:10 by , 7 years ago
Description: | modified (diff) |
---|
i'm having the same issue so a simple fast easy good solution is to change the replays and/or save path. i do not know how to so please tell me how i can change replays reading path. thank you
comment:11 by , 7 years ago
Patch: | → Phab:D518 |
---|
comment:12 by , 7 years ago
Keywords: | beta removed |
---|
Does loading savegames or config files work? If yes, then maybe using the same way of getting files might help.