A frequent problem when it comes to internationalisation is the proper handling of different charsets. With Java and Maven it is relatively easy to set the source encoding to UTF-8, but the SQL database is a frequent point of failure.
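If you use MySQL and your tables are in latin1 when they should be in UTF-8, you can use this little script to convert them. It assumes the latin1 columns actually contain UTF-8 bytes (typically because the application wrote UTF-8 through a latin1 connection): dumping with the latin1 character set leaves those bytes untouched, and reloading them over a utf8 connection makes MySQL interpret them correctly.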
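# Dump the data as raw latin1 bytes, without a SET NAMES line in the file
mysqldump --user=${USER} --password=${PASS} --default-character-set=latin1 --skip-set-charset ${DATABASE} > dump.sql
# Rewrite the charset declarations (note: this also rewrites any "latin1" text inside the data)
sed -r 's/latin1/utf8/g' dump.sql > dump_utf.sql
# Recreate the database with utf8 as its default character set
mysql --user=${USER} --password=${PASS} --execute="DROP DATABASE ${DATABASE}; CREATE DATABASE ${DATABASE} CHARACTER SET utf8 COLLATE utf8_general_ci;"
# Reload the dump over a utf8 connection
mysql --user=${USER} --password=${PASS} --default-character-set=utf8 ${DATABASE} < dump_utf.sql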
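To see why the dump-and-reload trick works, here is a small Java illustration. It is a sketch that assumes the latin1 columns really hold UTF-8 bytes, and it models MySQL's latin1 with Java's ISO-8859-1, which maps every byte to exactly one character and back (MySQL's latin1 is actually closer to Windows cp1252, but both decode the bytes used here the same way). The class and method names are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {

    // What a latin1 client "sees" when the stored bytes are really UTF-8:
    // each byte is decoded as one latin1 character, producing mojibake.
    static String readAsLatin1(byte[] storedBytes) {
        return new String(storedBytes, StandardCharsets.ISO_8859_1);
    }

    // What dumping as latin1 and reloading as utf8 amounts to:
    // get the original bytes back unchanged, then decode them as UTF-8.
    static String repair(String mojibake) {
        byte[] rawBytes = mojibake.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café", written with an escape so the source file encoding doesn't matter
        String original = "caf\u00e9";
        // An application writes UTF-8 bytes into a latin1 column.
        byte[] stored = original.getBytes(StandardCharsets.UTF_8);

        String garbled = readAsLatin1(stored); // "cafÃ©"
        String fixed = repair(garbled);        // "café" again

        System.out.println(garbled.equals("caf\u00c3\u00a9"));
        System.out.println(fixed.equals(original));
    }
}
```

If the columns held genuinely latin1-encoded text instead, the raw bytes would not be valid UTF-8, and this round trip would not recover them.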
As a general rule, pass --default-character-set=utf8 to every MySQL command-line tool you run. Also, don't forget to append the parameters "useUnicode=true&characterEncoding=UTF-8" to your JDBC connection URL so that the connection itself uses UTF-8.
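A minimal sketch of the JDBC side, assuming MySQL Connector/J is the driver: a hypothetical helper appends the two parameters to a connection URL and hands the result to DriverManager.getConnection. The host, port, database name and the class and method names are placeholders, not taken from a real project.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class Utf8JdbcUrl {

    // The parameters that make the connection use UTF-8.
    static final String UTF8_PARAMS = "useUnicode=true&characterEncoding=UTF-8";

    // Appends the UTF-8 parameters, using '?' or '&' depending on
    // whether the URL already has a query string.
    static String withUtf8(String baseUrl) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + UTF8_PARAMS;
    }

    // Opens a connection; needs Connector/J on the classpath and a reachable server.
    static Connection connect(String baseUrl, String user, String password) throws SQLException {
        return DriverManager.getConnection(withUtf8(baseUrl), user, password);
    }

    public static void main(String[] args) {
        // Placeholder host and database name.
        System.out.println(withUtf8("jdbc:mysql://localhost:3306/mydb"));
    }
}
```

Building the URL in one place keeps the parameters from being forgotten on one of several connections. If the URL lives in an XML configuration file instead, remember that the & has to be written as &amp; there.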