Opened 12 years ago
Last modified 3 years ago
#1396 new enhancement
[PATCH] Test UTF-8 handling of scripts and data
Reported by: | historic_bruno | Owned by: | |
---|---|---|---|
Priority: | Nice to Have | Milestone: | Backlog |
Component: | Core engine | Keywords: | patch, simple |
Cc: | Patch: |
Description
Now that we support UTF-8 script and data files (e.g. JSON), it would be nice to test that we don't break it by accident. We already test pre-loaded JSON and script strings, but not loading them from files (beware the confusing LoadScript()
call takes a filename but only for debugging). It shouldn't be any more complicated than figuring out how to add the VFS init stuff to the tests and then adding a few test files. I think we should test UTF-8 with and without BOM, since that seems likely to cause breakage.
Attachments (1)
Change History (15)
comment:1 by , 12 years ago
Keywords: | simple unicode added |
---|
comment:3 by , 11 years ago
We've never really encouraged it, but as some (Windows) text editors add a BOM by default, it seemed better to simply handle it and not confuse people who know nothing about UTF-8 by rejecting their files :) As far as I know, most or all text editors that support UTF-8 can handle the BOM, and most of them save without it by default.
comment:4 by , 10 years ago
I've tried to implement unit-tests for all (reasonable) places where file.DecodeUTF8 is used for reading json/js (only existing calls). Test approach: Except for the ComponentManager tests which uses two new files in /binaries/dat/mods/_test.sim/simulation/components, a stub vfs implementation has been written for the utf8-support tests. This provides three advantages:
- Less additional test files
- Tests run faster as less test classes need to mount a "real" vfs
- Tests are easier to analyze (in Windows environment, the BOM is hidden by most editors)
Patch tested with rev. 14713 in
- Vista/MSVC2008, appears to work ok (compiles, links, tests run ".....OK!")
- openSUSE 12.2/GCC 4.7.1, compiles and links ok, not run (./test and ./pyrogenesis segfault immediately at startup; unfortunately, a clean svn check-out shows the same behavior on that machine, so i couldn't actually test the patch).
The patch could be optimized further by extacting the StubVFS implementations from the test header and moving them into a common (separate) header. I have not done so as i found no precedence case for that in the 0ad project (e.g. where to place the StubVFS implementation)
comment:5 by , 10 years ago
Keywords: | review patch added |
---|---|
Milestone: | Backlog → Alpha 16 |
Summary: | Test UTF-8 handling of scripts and data → [PATCH] Test UTF-8 handling of scripts and data |
comment:6 by , 10 years ago
Milestone: | Alpha 16 → Alpha 17 |
---|
comment:7 by , 10 years ago
Milestone: | Alpha 17 → Alpha 18 |
---|
comment:8 by , 9 years ago
Sorry for the delay in responding. I had a significantly more minimal idea of tests in mind when opening the ticket. At the least, the stub VFS implementations do need to be factored out, but I don't think those are even necessary and should be proposed and discussed in a separate ticket.
As far as performance of the tests goes, we don't care too much. Obviously it shouldn't be so slow that nobody runs it, but the parts that are really slow are disabled by default and there shouldn't be so many test files involved that they impact performance.
comment:9 by , 9 years ago
Milestone: | Alpha 18 → Alpha 19 |
---|
comment:10 by , 9 years ago
Keywords: | review removed |
---|---|
Milestone: | Alpha 19 → Backlog |
Owner: | set to |
I'm looking into making that compatible to the current version. I will also review it at the same time.
comment:11 by , 6 years ago
Keywords: | unicode removed |
---|---|
Owner: | removed |
I never actually progressed on this. It's not "simple" but it is a very good entrypoint for anyone wanting to work on the engine, so I'm leaving the keyword and I'm available for guidance if someone is up for the task :)
comment:12 by , 3 years ago
I just read the ticket and I think it is interesting. However, I'm new in the 0ad project. So, can someone give me advice for the beginning?
comment:13 by , 3 years ago
Keywords: | simple removed |
---|---|
severity: | → simple |
comment:14 by , 3 years ago
Keywords: | simple added |
---|
Since all scripts are supposed to be in UTF-8, I suggest discouraging any use of BOM, and removing it from existing files.