strtolower and UTF-8

Posted on September 27, 2010 by Cyril Mazur
Charset issues have always driven me mad. And since I’m French and I’ve built many French websites, it’s something I couldn’t escape, thanks to all the special characters we have in our language :)

Well, today an issue came up with the strtolower function. Look at what follows:

$t1 = 'Expérience';
$t2 = strtolower($t1);
echo $t2; // echoes 'expience'

See? It drops the two letters “ér”. Never mind exactly why or how it does that (for more details about UTF-8/ISO issues, please use Google); what matters is that it totally screwed up my beautiful string.
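For the curious, here is a minimal sketch of one likely explanation, assuming the server locale was a single-byte one such as ISO-8859-1 (that part is an assumption, not something checked on the actual server). In UTF-8, 'é' is stored as two bytes, 0xC3 0xA9. A byte-by-byte lowercase under ISO-8859-1 reads 0xC3 as 'Ã' and turns it into 0xE3 ('ã'), and what remains is no longer valid UTF-8. The helper name latin1_lower_byte is made up for this example, and the validity check needs the mbstring extension.

```php
<?php
// Hypothetical helper: mimics what a byte-wise lowercase does to a single byte
// under an ISO-8859-1 locale (assumption: that was the active locale).
function latin1_lower_byte(int $byte): int {
    $isAsciiUpper  = $byte >= 0x41 && $byte <= 0x5A;                    // A-Z
    $isLatin1Upper = $byte >= 0xC0 && $byte <= 0xDE && $byte !== 0xD7;  // À-Þ, except ×
    return ($isAsciiUpper || $isLatin1Upper) ? $byte + 0x20 : $byte;
}

$original = "Exp\xC3\xA9rience"; // 'Expérience' encoded as UTF-8

// Lowercase the string one byte at a time, the way a single-byte locale would.
$mangled = '';
foreach (str_split($original) as $char) {
    $mangled .= chr(latin1_lower_byte(ord($char)));
}

echo bin2hex("\xC3\xA9"), "\n";                    // c3a9: the two bytes of 'é'
echo bin2hex(substr($mangled, 3, 2)), "\n";        // e3a9: the same bytes after lowercasing
var_dump(mb_check_encoding($original, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($mangled, 'UTF-8'));    // bool(false)
```

How a broken sequence like this gets displayed depends on whatever renders it, which would explain letters silently disappearing.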

On PHP.net, you can read that this function uses the charset defined by the current locale. This means that whatever the encoding of your string is (UTF-8, ISO, …), and even if you work with UTF-8 everywhere (database tables, database connection, page charset, …), it will use the current locale’s charset anyway.
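To see which locale PHP is actually running under, something like the following can help. It is only a sketch: passing '0' to setlocale asks for the current setting without changing it, and the "does the name mention UTF-8" test is a rough heuristic of mine, not an official check. The helper name looks_like_utf8_locale is invented for this example.

```php
<?php
// Hypothetical helper: rough check of whether a locale name mentions UTF-8.
// Locale names vary between systems (e.g. "fr_FR.UTF-8", "en_US.utf8"),
// so treat this as a heuristic only.
function looks_like_utf8_locale(string $localeName): bool {
    return (bool) preg_match('/utf-?8/i', $localeName);
}

// Passing '0' returns the current LC_CTYPE locale without modifying it.
$current = setlocale(LC_CTYPE, '0');

if ($current === false) {
    echo "Could not read the current locale\n";
} else {
    echo "LC_CTYPE locale: ", $current, "\n";
    echo looks_like_utf8_locale($current)
        ? "Looks like a UTF-8 locale\n"
        : "Does not look like a UTF-8 locale\n";
}
```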

I can see two ways around this.

Option 1: you administer the web server that hosts your pages

That’s my case. If you administer the web server, good news: you won’t have to change a line of your PHP code. The fix is to change the server’s current locale. There are two ways to do so (from Michael C. Schultheiss):

The Easy Way

  1. Install debconf (i.e. run apt-get update then apt-get install debconf, as root)
  2. Run dpkg-reconfigure locales as root
  3. Select a UTF-8 locale, such as en_US.UTF-8 or fr_FR.UTF-8

The Hard Way

  1. Edit /etc/locale.gen as root. If /etc/locale.gen does not exist, create it. Example of a line: en_US.UTF-8 UTF-8
  2. Run /usr/sbin/locale-gen as root

The easy way worked well for me :) Then I just had to restart my PHP-CGI binary, and that’s all, folks. Afterwards, my strtolower worked as expected.
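A quick way to confirm the change took effect is a one-off script like this one, run through the same PHP binary that serves your pages. It is a sketch and not part of the procedure above: it lowercases a known UTF-8 string and checks that the accented bytes survived. The function name strtolower_keeps_utf8_intact is invented for the example.

```php
<?php
// Hypothetical check: does strtolower() leave the multibyte 'é' untouched?
function strtolower_keeps_utf8_intact(): bool {
    $sample   = "Exp\xC3\xA9rience"; // 'Expérience' as UTF-8 bytes
    $expected = "exp\xC3\xA9rience"; // same string with only the ASCII 'E' lowered
    return strtolower($sample) === $expected;
}

echo strtolower_keeps_utf8_intact()
    ? "strtolower leaves UTF-8 intact\n"
    : "strtolower still mangles UTF-8 bytes\n";
```

Note that this only verifies that nothing gets corrupted. strtolower works byte by byte, so it says nothing about whether an uppercase 'É' actually gets lowered.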

Option 2: you can’t touch the web server at all

Then you’re pretty fucked. You will have to change your code, and it’s dirty. But you don’t really have a choice, do you? Instead of calling strtolower directly, wrap it in another function, such as:

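function strtolower_utf8($string) {
    // Convert UTF-8 to ISO-8859-1 so that each character is a single byte
    // (characters that don't exist in ISO-8859-1 come back as '?')
    $result = utf8_decode($string);
    $result = strtolower($result);
    // Convert back to UTF-8
    $result = utf8_encode($result);
    return $result;
}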

Another solution is to use the mb_strtolower function, which acts like strtolower but lets you specify a charset:

mb_strtolower($string, 'UTF-8');
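Here is a slightly fuller sketch of the mb_strtolower approach, assuming the mbstring extension is loaded. It also shows that when the charset argument is left out, the function uses the internal encoding set through mb_internal_encoding. The strings are written with byte escapes so the example does not depend on how the file itself is saved.

```php
<?php
$upper = "EXP\xC3\x89RIENCE"; // 'EXPÉRIENCE' as UTF-8 bytes

// Explicit charset: does not depend on the locale or on the internal encoding.
$explicit = mb_strtolower($upper, 'UTF-8');

// No charset argument: mb_strtolower uses the internal encoding instead.
mb_internal_encoding('UTF-8');
$implicit = mb_strtolower($upper);

var_dump($explicit === "exp\xC3\xA9rience"); // bool(true): 'É' became 'é'
var_dump($implicit === $explicit);           // bool(true)
```

Unlike the strtolower wrapper, this also lowers the accented capital itself.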

Hope this helps :)

About the author

Cyril Mazur is a serial web entrepreneur with experience in various fields: online dating, forex & finance, blogging, online advertising... who enjoys building things that people like to use.

One comment

  1. Manu
    on September 27, 2010
    mb_strtolower($string); without specify a charset works for me ;)
