fix_latin - filters a data stream that is predominantly UTF8 and 'fixes' any
Latin (i.e. non-ASCII 8-bit) characters
fix_latin [options] <input_file >output_file
--use-xs <value> 'auto' | 'always' | 'never'
--version list version number
--help detailed help message
The script acts as a filter, taking source data which may contain a mix of
ASCII, UTF8, ISO8859-1 and CP1252 characters, and producing output that will be
all ASCII/UTF8.
Multi-byte UTF8 characters will be passed through unchanged (although over-long
UTF8 byte sequences will be converted to the shortest normal form). Single
byte characters will be converted as follows:
0x00 - 0x7F ASCII - passed through unchanged
0x80 - 0x9F Converted to UTF8 using CP1252 mappings
0xA0 - 0xFF Converted to UTF8 using Latin-1 mappings
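
The conversion rules above can be illustrated with a minimal Python sketch. This is not the Encoding::FixLatin implementation; the function names (fix_latin_sketch, _utf8_sequence_at, _cp1252_char) are hypothetical, and the handling of the five byte values that CP1252 leaves undefined (falling back to the Latin-1 code point) is an assumption made only so the example is complete.

```python
def _utf8_sequence_at(data: bytes, i: int):
    """Return (length, code_point) if a multi-byte UTF-8 sequence starts
    at data[i], otherwise (0, 0). Over-long forms are accepted here so
    that the caller can re-encode them in shortest form."""
    b = data[i]
    if 0xC0 <= b <= 0xDF:
        length, cp = 2, b & 0x1F
    elif 0xE0 <= b <= 0xEF:
        length, cp = 3, b & 0x0F
    elif 0xF0 <= b <= 0xF7:
        length, cp = 4, b & 0x07
    else:
        return 0, 0
    tail = data[i + 1 : i + length]
    # Every trailing byte must look like 10xxxxxx.
    if len(tail) != length - 1 or any((t & 0xC0) != 0x80 for t in tail):
        return 0, 0
    for t in tail:
        cp = (cp << 6) | (t & 0x3F)
    # Reject values that cannot be encoded as UTF-8.
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return 0, 0
    return length, cp


def _cp1252_char(b: int) -> str:
    """Map a byte in 0x80-0x9F through CP1252."""
    try:
        return bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        # Assumption: bytes undefined in CP1252 fall back to Latin-1.
        return chr(b)


def fix_latin_sketch(data: bytes) -> bytes:
    """Convert mixed ASCII/UTF8/Latin-1/CP1252 bytes to UTF-8 bytes."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # ASCII: unchanged
            out.append(chr(b))
            i += 1
            continue
        length, cp = _utf8_sequence_at(data, i)
        if length:                        # UTF-8: kept, shortest form
            out.append(chr(cp))
            i += length
            continue
        if b < 0xA0:                      # 0x80-0x9F: CP1252
            out.append(_cp1252_char(b))
        else:                             # 0xA0-0xFF: Latin-1
            out.append(chr(b))
        i += 1
    return "".join(out).encode("utf-8")
```

For example, fix_latin_sketch(b"caf\xe9") returns the UTF-8 bytes for "café", while input that already contains the two-byte UTF-8 form of "é" is returned unchanged. A byte sequence that is valid UTF-8 is always treated as UTF-8, even if it was really a run of Latin-1 characters, so a sketch like this cannot distinguish those cases.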
- --use-xs 'auto' | 'always' | 'never'
- Override the default ('auto') behaviour of trying to use the XS
module and falling back to the pure-Perl version if it is not available. Set
to 'never' to always use the pure-Perl version, or 'always' to always use XS
and die if it is not available.
- --version (alias -v)
- Display version number of underlying Encoding::FixLatin and
- --help (alias -?)
- Display this documentation.
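
The three --use-xs settings described above amount to a small decision table. The following Python sketch only illustrates that table; the function name choose_backend, the xs_available flag and the returned labels are hypothetical and do not correspond to anything in the script itself.

```python
def choose_backend(use_xs: str, xs_available: bool) -> str:
    """Return which implementation would be used for a given --use-xs
    value, following the behaviour described in the option text."""
    if use_xs not in ("auto", "always", "never"):
        raise ValueError(f"invalid --use-xs value: {use_xs!r}")
    if use_xs == "never":
        return "pure-perl"          # never try the XS module
    if xs_available:
        return "xs"                 # 'auto' and 'always' both prefer XS
    if use_xs == "always":
        # 'always' refuses to fall back
        raise RuntimeError("XS module requested but not available")
    return "pure-perl"              # 'auto' falls back quietly
```

With 'auto' the result depends only on whether the XS module can be loaded, which is why 'never' and 'always' are mainly useful for testing or for making failures explicit.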
This script was originally written to assist in converting a Postgres database
from SQL-ASCII encoding to UNICODE UTF8 encoding. The following examples
illustrate its use in that context.
If you have a SQL format dump file that you would normally restore by piping
into 'psql', you can simply filter the dump file through this script:
fix_latin < dump_file | psql -d database
If you have a compressed dump file that you would normally restore using
'pg_restore', you can omit the '-d' option on pg_restore and pipe the
resulting SQL through this script and into psql:
pg_restore -O dump_file | fix_latin | psql -d database
To take a look at non-ASCII lines in the dump file:
perl -ne '/^COPY (\S+)/ and $t = $1; print "$t:$_" if /[^\x00-\x7F]/' dump_file
This script is implemented using the Encoding::FixLatin Perl module. For more
details see the module documentation with the command:
perldoc Encoding::FixLatin
In particular you should read the 'LIMITATIONS' section to understand the
circumstances under which data corruption might occur.
Copyright 2009-2014 Grant McLean "<firstname.lastname@example.org>"
This program is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.