Downes.ca ~ Stephen's Web ~ A brief guide to Perl character encoding

Stephen Downes

Knowledge, Learning, Community

I spent most of the day learning more about, and messing with (yet again), character sets. For those (few) of you who are interested, this article dives deep into the subject. Basically: a character encoding is in play anywhere you deal with a string of characters: as content in your code, as something you input or output, as something you display on a web page, as something you store in a database. Perl, as a very old language, defaults a lot of the time to ISO-Latin-1 (ISO-8859-1, a superset of ASCII in which every character is a single byte). Sometimes other applications (such as VS Code) do as well. It depends. But numerous languages need Unicode Transformation Format 8-bit (UTF-8) to encode characters outside that range, and UTF-8 uses between one and four bytes per character. Unless your software has been told to use UTF-8, there's always the danger it will treat each of those bytes as a separate Latin-1 character, so a single character shows up as two (or sometimes three, or even four, if it's (say) an emoji) characters, not one (examples: one byte: $, two bytes: £, three bytes: 한, four bytes: 😄). Anyhow, after 25 years of using the Perl database interface (DBI) I learned I need to enable the mysql_enable_utf8mb4 attribute (a DBD::mysql connection attribute) to handle the full range of UTF-8 characters, including the four-byte ones.
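For anyone who wants to see this for themselves, here is a minimal sketch in Perl, using the core Encode module and the four example characters above. The helper names (utf8_byte_length, misread_as_latin1) and the %connect_attr hash are my own illustrations, not anything from DBI itself, and the commented-out connect line uses placeholder variables. Treat it as a sketch under those assumptions rather than a recipe.

```perl
use strict;
use warnings;
use utf8;                          # the literals in this file are UTF-8
use Encode qw(encode decode);

# Hypothetical helper: how many bytes a character needs in UTF-8.
sub utf8_byte_length {
    my ($char) = @_;
    return length(encode('UTF-8', $char));
}

# Hypothetical helper: simulate the failure, where UTF-8 bytes
# are read back as if each byte were one ISO-8859-1 character.
sub misread_as_latin1 {
    my ($text) = @_;
    return decode('ISO-8859-1', encode('UTF-8', $text));
}

# Tell Perl that STDOUT expects UTF-8, so wide characters print cleanly.
binmode(STDOUT, ':encoding(UTF-8)');

for my $char ('$', '£', '한', '😄') {
    printf "%s: %d byte(s) in UTF-8, %d character(s) if misread as Latin-1\n",
        $char, utf8_byte_length($char), length(misread_as_latin1($char));
}

# Attributes one might pass to DBI->connect when using DBD::mysql.
# Nothing is connected here; $dsn, $user and $pass are placeholders.
my %connect_attr = (
    RaiseError           => 1,
    mysql_enable_utf8mb4 => 1,    # the attribute mentioned in the text
);
# my $dbh = DBI->connect($dsn, $user, $pass, \%connect_attr);
```

Run as is, the loop should report 1, 2, 3 and 4 bytes for the four characters, and the same number of bogus characters when the bytes are misread. The database table and connection character set also need to be utf8mb4 for four-byte characters to survive the round trip, which is a MySQL setting rather than anything this snippet can show.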


Stephen Downes, Casselman, Canada
stephen@downes.ca

Copyright 2024
Last Updated: Nov 21, 2024 06:42 a.m.

Creative Commons License.
