Opened 12 years ago

Last modified 3 years ago

#1396 new enhancement

[PATCH] Test UTF-8 handling of scripts and data

Reported by: historic_bruno Owned by:
Priority: Nice to Have Milestone: Backlog
Component: Core engine Keywords: patch, simple
Cc: Patch:

Description

Now that we support UTF-8 script and data files (e.g. JSON), it would be nice to test that we don't break it by accident. We already test pre-loaded JSON and script strings, but not loading them from files (beware the confusing LoadScript() call takes a filename but only for debugging). It shouldn't be any more complicated than figuring out how to add the VFS init stuff to the tests and then adding a few test files. I think we should test UTF-8 with and without BOM, since that seems likely to cause breakage.

Attachments (1)

track1396.patch (50.1 KB ) - added by Teiresias 10 years ago.
Proposal for utf8-support unit-tests

Download all attachments as: .zip

Change History (15)

comment:1 by historic_bruno, 12 years ago

Keywords: simple unicode added

comment:2 by Adrián Chaves, 11 years ago

Since all scripts are supposed to be in UTF-8, I suggest discouraging any use of BOM, and removing it from existing files.

The Unicode Standard permits the BOM in UTF-8,[2] but does not require or recommend its use.— Wikipedia

Last edited 11 years ago by Adrián Chaves (previous) (diff)

comment:3 by historic_bruno, 11 years ago

We've never really encouraged it, but as some (Windows) text editors add a BOM by default, it seemed better to simply handle it and not confuse people who know nothing about UTF-8 by rejecting their files :) As far as I know, most or all text editors that support UTF-8 can handle the BOM, and most of them save without it by default.

by Teiresias, 10 years ago

Attachment: track1396.patch added

Proposal for utf8-support unit-tests

comment:4 by Teiresias, 10 years ago

I've tried to implement unit-tests for all (reasonable) places where file.DecodeUTF8 is used for reading json/js (only existing calls). Test approach: Except for the ComponentManager tests which uses two new files in /binaries/dat/mods/_test.sim/simulation/components, a stub vfs implementation has been written for the utf8-support tests. This provides three advantages:

  1. Less additional test files
  2. Tests run faster as less test classes need to mount a "real" vfs
  3. Tests are easier to analyze (in Windows environment, the BOM is hidden by most editors)

Patch tested with rev. 14713 in

  • Vista/MSVC2008, appears to work ok (compiles, links, tests run ".....OK!")
  • openSUSE 12.2/GCC 4.7.1, compiles and links ok, not run (./test and ./pyrogenesis segfault immediately at startup; unfortunately, a clean svn check-out shows the same behavior on that machine, so i couldn't actually test the patch).

The patch could be optimized further by extacting the StubVFS implementations from the test header and moving them into a common (separate) header. I have not done so as i found no precedence case for that in the 0ad project (e.g. where to place the StubVFS implementation)

comment:5 by Teiresias, 10 years ago

Keywords: review patch added
Milestone: BacklogAlpha 16
Summary: Test UTF-8 handling of scripts and data[PATCH] Test UTF-8 handling of scripts and data

comment:6 by leper, 10 years ago

Milestone: Alpha 16Alpha 17

comment:7 by Itms, 10 years ago

Milestone: Alpha 17Alpha 18

comment:8 by historic_bruno, 9 years ago

Sorry for the delay in responding. I had a significantly more minimal idea of tests in mind when opening the ticket. At the least, the stub VFS implementations do need to be factored out, but I don't think those are even necessary and should be proposed and discussed in a separate ticket.

As far as performance of the tests goes, we don't care too much. Obviously it shouldn't be so slow that nobody runs it, but the parts that are really slow are disabled by default and there shouldn't be so many test files involved that they impact performance.

comment:9 by Itms, 9 years ago

Milestone: Alpha 18Alpha 19

comment:10 by Itms, 9 years ago

Keywords: review removed
Milestone: Alpha 19Backlog
Owner: set to Itms

I'm looking into making that compatible to the current version. I will also review it at the same time.

comment:11 by Itms, 6 years ago

Keywords: unicode removed
Owner: Itms removed

I never actually progressed on this. It's not "simple" but it is a very good entrypoint for anyone wanting to work on the engine, so I'm leaving the keyword and I'm available for guidance if someone is up for the task :)

comment:12 by Cento1, 3 years ago

I just read the ticket and I think it is interesting. However, I'm new in the 0ad project. So, can someone give me advice for the beginning?

comment:13 by Silier, 3 years ago

Keywords: simple removed
severity: simple

comment:14 by Silier, 3 years ago

Keywords: simple added
Note: See TracTickets for help on using tickets.