UTF-8 Support
The Dada Mail and UTF-8 FAQ
Introduction
Dada Mail can speak UTF-8 and almost expects that everything else around it does, too.
That means:
It treats everything it handles as UTF-8
Everything it returns is in UTF-8
How To Have a Pleasant Experience
If you're installing Dada Mail for the first time, there's nothing you'll need to do, but below are some guidelines on how to keep your lists configured, so you continue to have a good experience.
If you're upgrading, make sure your configuration reflects the advice below.
It's heavily advised to keep everything in Dada Mail speaking UTF-8, without any real exceptions.
Config Variable: $HTML_CHARSET
By default, the config variable $HTML_CHARSET is set to UTF-8.
Keep it that way, same case (UTF-8) - same everything.
Dada Mail is only tested with the charset set this way.
Mail Sending - Advanced Options
Default Character Set
Set this to UTF-8.
Default Plain Text/HTML Message Encoding
There are really only a few choices recommended for Dada Mail.
8bit
Should work.
quoted-printable
If you have any trouble with 8bit, try quoted-printable. Because of the number of times that Dada Mail creates, tweaks, formats and templates out email messages, the encoding can potentially get mucked up. This potential mucking-up is mitigated when Dada Mail uses quoted-printable encoding internally. This should be the default for email messages.
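To see why quoted-printable is the more robust choice, here is a minimal Python sketch (Dada Mail itself is written in Perl, so this is only an illustration, and the function names are made up for the example). It shows that UTF-8 text wrapped in quoted-printable becomes plain ASCII, which survives repeated handling better than raw 8bit bytes:

```python
import quopri


def to_quoted_printable(text: str) -> bytes:
    """Encode text as UTF-8, then wrap those bytes in quoted-printable."""
    return quopri.encodestring(text.encode("utf-8"))


def from_quoted_printable(data: bytes) -> str:
    """Undo the transfer encoding, then decode the UTF-8 bytes."""
    return quopri.decodestring(data).decode("utf-8")


original = "Café déjà vu"
encoded = to_quoted_printable(original)

# Every byte in the encoded form is 7-bit ASCII; the accented
# characters are written out as =XX escape sequences.
is_ascii_only = all(byte < 128 for byte in encoded)
round_tripped = from_quoted_printable(encoded)
```

With 8bit, the same message would carry the raw UTF-8 bytes, and every step that touches the message has to treat them correctly.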
Encode Message Headers
Have this option checked.
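Message headers, unlike bodies, are expected to be ASCII, so non-ASCII text such as a subject line has to be wrapped in encoded-words. The sketch below uses Python's standard email library purely as an illustration of the idea; it is not Dada Mail's code, and the helper names are hypothetical:

```python
from email.header import Header, decode_header


def encode_subject(subject: str) -> str:
    """Return an ASCII-only header value using a UTF-8 encoded-word."""
    return Header(subject, "utf-8").encode()


def decode_subject(raw: str) -> str:
    """Turn an encoded header value back into readable text."""
    parts = decode_header(raw)
    return "".join(
        chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
        for chunk, charset in parts
    )


subject = "Café Olé"
wire = encode_subject(subject)      # what travels in the message header
readable = decode_subject(wire)     # what a mail reader shows
```

Leaving the option unchecked would put the raw UTF-8 bytes into the header, which some mail servers and readers handle badly.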
SQL Backends
Database
PostgreSQL
Encoding for PostgreSQL databases is set when the database is created. Make sure to create your database with a UTF-8 encoding, like so:
CREATE DATABASE dadamail WITH ENCODING 'UTF-8';
MySQL
Nothing you'll have to do, but do note that the schema that's shipped with Dada Mail for MySQL does set the character set and collation for UTF-8 in the tables where this is needed. If you're upgrading Dada Mail (from before 4.0.3), you may have to change the charset/encoding of your tables.
You may also want to double-check the version of the Perl MySQL driver (DBD::mysql) and make sure it supports the mysql_enable_utf8 flag. It's difficult to tell from its own docs, but having at least version 4.004 would be prudent.
SQLite
Nothing you'll have to do.
DBM Files
DBM Files have no encoding support, but Dada Mail knows this and compensates.
Schema
MySQL
The MySQL schemas are set to create tables with an encoding of UTF-8.
PostgreSQL
Nothing has changed.
SQLite
Nothing has changed.
Drivers
The currently supported SQL backends, mysql (MySQL), Pg (PostgreSQL) and SQLite, all have different ways to "enable" their UTF-8 support.
MySQL
The following has been added to the $DBI_PARAMS hashref:
mysql_enable_utf8 => 1,
PostgreSQL
The following has been added to the $DBI_PARAMS hashref:
pg_enable_utf8 => 1,
SQLite
The following has been added to the $DBI_PARAMS hashref:
sqlite_unicode => 1
No explicit encoding/decoding is done in Dada Mail when saving/retrieving data. Hopefully, the drivers are UTF-8-aware enough.
Plugins/Extensions
The Plugins and Extensions that come with Dada Mail have not been as thoroughly tested as the main program. There are still warts.
Bridge
Bridge is in a unique position: it needs to handle a lot of different stuff thrown at it and deal with it gracefully. Dada Mail does, in fact, handle any realistic character set/encoding you throw at it, but it will convert messages it receives to its internal format before it resends them out to your list.
This means messages go out in the encoding of your choice (8bit or quoted-printable) and the charset of your choice (as long as your charset is UTF-8).
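The conversion described above can be pictured roughly as follows. This is a hedged Python sketch of the general idea only, not Bridge's actual code (Bridge is Perl), and the function name and sample message are invented for the example: undo the transfer encoding, decode using the charset the sender declared, then re-encode as UTF-8.

```python
from email import message_from_bytes


def body_as_utf8(raw_message: bytes) -> bytes:
    """Return the body of a single-part message re-encoded as UTF-8."""
    msg = message_from_bytes(raw_message)
    # decode=True undoes quoted-printable/base64 and returns raw bytes.
    payload = msg.get_payload(decode=True)
    # Fall back to ASCII when the sender declared no charset.
    charset = msg.get_content_charset() or "us-ascii"
    text = payload.decode(charset, errors="replace")
    return text.encode("utf-8")


# A made-up incoming message in ISO-8859-1, quoted-printable encoded.
incoming = (
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"Caf=E9\n"
)
normalized = body_as_utf8(incoming)
```

A real implementation also has to cope with multipart messages, attachments and wrongly declared charsets, which this sketch ignores.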
Upgrading
You are potentially going to have problems.
It's possible that, since List Settings were never decoded/encoded correctly in past versions, they'll show up in the program incorrectly once you've upgraded. The easiest thing to do is to edit the mistakes and resave the information. For most of the program, you're going to have to manually export the information and re-import it with the correct encoding, sadly. Dada Mail will probably fail gracefully with old information, but it's possible that you'll see squiggly characters instead of what you want to see. There's nothing in Dada Mail that will stop this from happening. If you experience it (from old information), we're not going to count it as a bug, but rather a known issue.
MySQL Notes
"Specified key was too long; max key length is 1000 bytes", Problem (and Solution)
What's recommended here is to alter some of the fields in some of the tables that make up the MySQL schema.
The majority of the time, when a field is named email, it looks like this:
email text(320),
Changing this field type from text(320) to varchar(80) will both help with this problem and be realistic.
The email fields were set to a size of 320 because the RFC says that email addresses can be this long. In reality, they never really are. Having a field this long (especially when it's used for keys and indexes) tends to muck things up when you have a UTF-8 character set.
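The arithmetic behind the error is easy to check. MySQL's utf8 character set reserves up to 3 bytes per character, so a wide field eats most of the 1000-byte key limit by itself. The sketch below is a simplified estimate: the extra column lengths are hypothetical stand-ins for whatever else your index covers, and per-column overhead is ignored.

```python
MAX_KEY_BYTES = 1000       # limit quoted in the MySQL error message
UTF8_BYTES_PER_CHAR = 3    # MySQL's utf8 reserves up to 3 bytes per character


def index_key_bytes(column_lengths):
    """Estimate index key size from the character length of each column."""
    return sum(length * UTF8_BYTES_PER_CHAR for length in column_lengths)


# Hypothetical index: the email field plus three shorter columns.
other_columns = [16, 64, 1]

before = index_key_bytes([320] + other_columns)  # email as 320 characters
after = index_key_bytes([80] + other_columns)    # email as 80 characters

fits_before = before <= MAX_KEY_BYTES
fits_after = after <= MAX_KEY_BYTES
```

With these example numbers the 320-character field pushes the key to 1203 bytes, while the 80-character field brings it down to 483.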
You'll see this field in the table schemas, as well as the table indexes. The advice is to delete the old table indexes, alter your tables so that anything with a field type of text(320) becomes varchar(80) and, if you want, recreate the table indexes using what's in the dada/extras/mysql_schema.sql file.
Here's some SQL that should remove those indexes:
ALTER TABLE `dada_subscribers` DROP INDEX `dada_subscribers_all_index`;
ALTER TABLE `dada_archives` DROP INDEX `dada_subscribers_all_index`;
Here's some SQL that should work on changing the field types:
ALTER TABLE `dada_bounce_scores` CHANGE `email` `email` VARCHAR( 80 );
ALTER TABLE `dada_profiles` CHANGE `email` `email` VARCHAR( 80 );
ALTER TABLE `dada_profile_fields` CHANGE `email` `email` VARCHAR( 80 );
ALTER TABLE `dada_subscribers` CHANGE `email` `email` VARCHAR( 80 );
Changing the character set of tables
The following tables should have a character set of utf8 and a collation of utf8_bin:
dada_settings
dada_subscribers
dada_profiles
dada_profile_fields
dada_profile_fields_attributes
dada_archives
A quick-and-dirty solution to changing the character sets (if they aren't already in UTF-8) is to just use the following queries:
ALTER TABLE `dada_settings` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
ALTER TABLE `dada_profiles` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
ALTER TABLE `dada_subscribers` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
ALTER TABLE `dada_profile_fields` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
ALTER TABLE `dada_profile_fields_attributes` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
ALTER TABLE `dada_archives` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
There are downsides to this approach. Most notably, any UTF-8 encoded (erm, kinda-encoded) stuff is going to get double-decoded when Dada Mail accesses and uses the information. More information:
http://www.mysqlperformanceblog.com/2009/03/17/converting-character-sets/
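Here is a small Python sketch of what that double handling looks like. It assumes, for illustration, that UTF-8 bytes were stored in a table MySQL believed to be Latin-1; the function names are invented for the example, and MySQL's real latin1 differs slightly from strict ISO-8859-1.

```python
def double_encode(text: str) -> str:
    """Simulate UTF-8 bytes being misread as Latin-1 during a conversion."""
    return text.encode("utf-8").decode("latin-1")


def undo_double_encode(garbled: str) -> str:
    """Reverse the misreading: back to the raw bytes, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "Café"
garbled = double_encode(original)        # each accented letter becomes two characters
repaired = undo_double_encode(garbled)   # back to the original text
```

The repair step only works on text that really was mangled this way. Running it on data that's already correct will raise an error or corrupt it, so back up your tables before trying anything similar.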
Other Problems?
Please let us know via the Support Boards:
http://dadamailproject.com/support/boards/
Or the developer mailing list:
http://dadamailproject.com/cgi-bin/dada/mail.cgi/list/dadadev/
We would love to help you out.
Thanks!
See Also:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
perlunitut - Perl Unicode Tutorial
perlunifaq - Perl Unicode FAQ