Opened 12 years ago

Last modified 11 years ago

#1556 new defect

Properly handle UTF-8 files in reference check tool

Reported by: historic_bruno
Owned by:
Priority: If Time Permits
Milestone: Backlog
Component: Non-game systems
Keywords: perl
Cc:
Patch:

Description

We have a Perl script for finding unused and missing assets. For JSON files, it uses JSON::decode_json, which expects a UTF-8 encoded byte string. We neither encode the input nor strip a potential BOM, so the JSON parsing fails.

maur.json is an example that fails due to its BOM.

Attachments (1)

utf8-bom.patch (940 bytes) - added by retrosnub 12 years ago.


Change History (6)

by retrosnub, 12 years ago

Attachment: utf8-bom.patch added

comment:1 by retrosnub, 12 years ago

Either remove all BOMs and never use them again (they are not recommended anyway when using UTF-8) or do something like in the attached patch.
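For the first option, a one-off cleanup could look roughly like the sketch below. It is written in Python rather than Perl purely for illustration and is not part of the attached patch. The function names `strip_bom_in_place` and `strip_boms` and the `*.json` pattern are assumptions made for this example.

```python
import codecs
from pathlib import Path


def strip_bom_in_place(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from `path`; return True if the file changed."""
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Rewrite the file without the three BOM bytes (EF BB BF).
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


def strip_boms(root: Path, pattern: str = "*.json") -> list:
    """Strip BOMs from every file under `root` matching `pattern`.

    Returns the sorted list of paths that were modified.
    """
    return sorted(p for p in root.rglob(pattern) if strip_bom_in_place(p))
```

Calling `strip_boms(Path("some/data/dir"))` would rewrite only the files that actually start with a BOM and report which ones changed. A second run should return an empty list.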

comment:2 by ben, 12 years ago

In 12456:

Adds UTF-8 BOM stripping to checkrefs.pl when using JSON files, patch by retrosnub. Refs #1556

comment:3 by Markus, 11 years ago

Changeset r12456 (in Alpha 11) fixed this ticket?

comment:4 by historic_bruno, 11 years ago (in reply to: comment:3)

Replying to Markus:

Changeset r12456 (in Alpha 11) fixed this ticket?

It should fix the error loading some JSON files, but I'm not sure it's correctly passing UTF-8 strings into JSON::decode_json.

comment:5 by Adrián Chaves, 11 years ago

I agree with the removal of BOMs, as stated in #1396. Funnily enough, a Python-based tool used for string extraction in the internationalization branch (see #67) also fails to parse JSON files with a BOM (I've reported this upstream, but it won't be fixed anytime soon).
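For reference, the Python side of this can be reproduced and worked around with the standard library alone. The sketch below is not the upstream tool's code. The helper name `load_json_bytes` and the sample document are made up for illustration.

```python
import json


def load_json_bytes(raw: bytes):
    """Parse JSON from raw bytes, tolerating an optional UTF-8 BOM."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
    return json.loads(raw.decode("utf-8-sig"))


# Invented sample: a UTF-8 BOM followed by a small JSON object.
bom_doc = b"\xef\xbb\xbf" + '{"code": "maur", "label": "é"}'.encode("utf-8")

# Decoding as plain UTF-8 keeps the BOM as U+FEFF, which json.loads rejects.
try:
    json.loads(bom_doc.decode("utf-8"))
    plain_utf8_ok = True
except json.JSONDecodeError:
    plain_utf8_ok = False

parsed = load_json_bytes(bom_doc)
```

Here `plain_utf8_ok` ends up `False`, while `parsed` holds the decoded object. Files without a BOM go through `load_json_bytes` unchanged, so the helper is safe to use while BOMs are still being removed from the data files.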
