UTF-8 Support



The Dada Mail and UTF-8 FAQ


Introduction

Dada Mail speaks UTF-8 and more or less expects everything around it to speak UTF-8, too.

That means:


How To Have a Pleasant Experience

If you're installing Dada Mail for the first time, there's nothing you need to do. The guidelines below explain how to keep your lists configured so you continue to have a good experience.

If you're upgrading, make sure your configuration reflects the advice below.

We strongly advise keeping everything in Dada Mail speaking UTF-8, without any real exceptions.

Config Variable: $HTML_CHARSET

By default, the config variable $HTML_CHARSET is set to UTF-8.

Keep it that way: same case (UTF-8), same everything.

Dada Mail is only tested with the charset set this way.

Advanced Sending Preferences

Default Character Set

Set this to UTF-8.

Default Plain Text/HTML Message Encoding

Only a few choices are really recommended for Dada Mail:

  • 8bit

    Should work.

  • quoted-printable

    If you have any trouble with 8bit, try quoted-printable. Because Dada Mail creates, tweaks, formats and templates out email messages so many times, the encoding can potentially get mucked up.

    This potential mucking-up is mitigated when Dada Mail uses quoted-printable encoding internally. Quoted-printable should be the default for email messages.
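To see why quoted-printable holds up better, here is a small Python sketch (an illustration only, not Dada Mail's own Perl code) that encodes a UTF-8 string as quoted-printable. Every non-ASCII byte becomes an =XX escape, so the encoded body is plain 7-bit ASCII that survives repeated reformatting, and it decodes back to the original text. The sample string is an assumption.

```python
import quopri


def to_quoted_printable(text: str) -> bytes:
    """Encode text as UTF-8 bytes, then as quoted-printable (pure 7-bit ASCII)."""
    return quopri.encodestring(text.encode("utf-8"))


def from_quoted_printable(data: bytes) -> str:
    """Reverse the process: undo quoted-printable, then decode UTF-8."""
    return quopri.decodestring(data).decode("utf-8")


sample = "Café, naïve, 日本語"
encoded = to_quoted_printable(sample)

print(encoded)  # non-ASCII bytes appear as =XX escapes
assert encoded.isascii()
assert from_quoted_printable(encoded) == sample
```

8bit, by contrast, sends the raw UTF-8 bytes, so any step that mishandles non-ASCII bytes can damage them.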

Encode Message Headers

Have this option checked.
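With this option checked, non-ASCII header text (a Subject line, for example) is sent as an RFC 2047 "encoded-word" instead of raw 8-bit bytes. A minimal Python sketch of what that looks like, using the standard library's email.header module (illustrative only; Dada Mail itself is written in Perl, and the subject text is an assumption):

```python
from email.header import Header, decode_header, make_header

subject = "Café news: résumé tips"

# Encode the header as an RFC 2047 encoded-word, e.g. =?utf-8?b?...?=
encoded = Header(subject, "utf-8").encode()
print(encoded)

# A receiving mail client reverses the process
decoded = str(make_header(decode_header(encoded)))

assert encoded.startswith("=?utf-8?")
assert decoded == subject
```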

SQL Backends

Database

PostgreSQL

Encoding for PostgreSQL databases is set when the database is created, so make sure to create your database with a UTF-8 encoding, like so:

 CREATE DATABASE dadamail WITH ENCODING 'UTF-8'

MySQL

There's nothing you'll have to do, but note that the MySQL schema shipped with Dada Mail sets the character set and collation to UTF-8 in the tables where this is needed. If you're upgrading Dada Mail from a version before 4.0.3, you may have to change the character set of your tables.

You may also want to double-check the version of the Perl MySQL driver (DBD::mysql) and make sure it supports the mysql_enable_utf8 flag. It's difficult to tell from its own docs, but having at least version 4.004 would be prudent.

SQLite

Nothing you'll have to do.

DBM Files

DBM Files have no encoding support, but Dada Mail knows this and compensates.
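Compensating usually means encoding text to UTF-8 on the way in and decoding it on the way out, since a DBM file only stores raw bytes. The Python sketch below shows that general pattern with the standard library's dbm module; it is an assumption about the approach, not Dada Mail's actual (Perl) implementation, and the file name and key are hypothetical.

```python
import dbm
import os
import tempfile


def save_setting(db, key: str, value: str) -> None:
    # Encode explicitly so the stored bytes are always UTF-8
    db[key.encode("utf-8")] = value.encode("utf-8")


def load_setting(db, key: str) -> str:
    # DBM hands back raw bytes; decode them as UTF-8
    return db[key.encode("utf-8")].decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_list_settings")  # hypothetical file name
    with dbm.open(path, "n") as db:
        save_setting(db, "list_name", "Liste für Grüße")
        name = load_setting(db, "list_name")

print(name)
assert name == "Liste für Grüße"
```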

Schema

MySQL

The MySQL schemas are set to create tables with a UTF-8 character set.

PostgreSQL

Nothing has changed.

SQLite

Nothing has changed.

Drivers

The currently supported SQL backends, mysql (MySQL), Pg (PostgreSQL) and SQLite, each have a different way to "enable" their UTF-8 support.

  • MySQL

    The following has been added to the $DBI_PARAMS hashref:

     mysql_enable_utf8 => 1,

  • PostgreSQL

    The following has been added to the $DBI_PARAMS hashref:

     pg_enable_utf8 => 1,

  • SQLite

    The following has been added to the $DBI_PARAMS hashref:

     sqlite_unicode => 1,

Dada Mail does no explicit encoding or decoding when saving or retrieving data. It relies on the drivers being UTF-8-aware enough.
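What a driver's UTF-8 flag changes is whether text comes back as decoded characters or as raw bytes. Python's built-in sqlite3 module (used here only as an analogy for the Perl driver flags above, not as part of Dada Mail) makes the difference visible through its text_factory setting. The table and sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (email TEXT, name TEXT)")
conn.execute("INSERT INTO profiles VALUES (?, ?)", ("user@example.com", "Zoë"))

# Default: the driver decodes the stored UTF-8 into str (a UTF-8-aware driver)
(name,) = conn.execute("SELECT name FROM profiles").fetchone()

# text_factory = bytes: the driver hands back the raw UTF-8 bytes untouched
conn.text_factory = bytes
(raw,) = conn.execute("SELECT name FROM profiles").fetchone()
conn.close()

print(repr(name), repr(raw))
assert name == "Zoë"
assert raw == "Zoë".encode("utf-8")
```

Without the flag, the application would have to decode those bytes itself, which is exactly the work Dada Mail leaves to the drivers.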

Plugins/Extensions

The Plugins and Extensions that come with Dada Mail have not been as thoroughly tested as the main program. There are still warts.

Dada Bridge

Dada Bridge is in the unique position of needing to handle a lot of different stuff thrown at it, and to deal with it gracefully. Dada Mail does, in fact, handle any realistic character set or encoding you throw at it, but it converts the messages it receives to its internal format before resending them to your list.

This means resent messages use the encoding of your choice (8bit or quoted-printable) and the charset of your choice (as long as that charset is UTF-8).
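The conversion step can be sketched in Python with the standard library's email package. This only illustrates the idea (Dada Bridge itself is Perl, and its actual code differs): an incoming ISO-8859-1 message is decoded to text, then rebuilt as UTF-8 with quoted-printable encoding. The sample message and the to_internal_format helper are hypothetical.

```python
import email
from email.message import EmailMessage
from email.policy import default

# A hypothetical incoming message in ISO-8859-1, sent as 8bit
raw = (
    b"Subject: Hello list\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Caf\xe9 ouvert\n"
)


def to_internal_format(raw_bytes: bytes) -> EmailMessage:
    incoming = email.message_from_bytes(raw_bytes, policy=default)
    text = incoming.get_content()  # decoded using the declared charset

    outgoing = EmailMessage()
    outgoing["Subject"] = str(incoming["Subject"])
    # Re-encode the same text as UTF-8, transferred as quoted-printable
    outgoing.set_content(text, charset="utf-8", cte="quoted-printable")
    return outgoing


out = to_internal_format(raw)
print(out.as_string())
assert out.get_content_charset() == "utf-8"
assert out["Content-Transfer-Encoding"] == "quoted-printable"
assert out.get_content() == "Café ouvert\n"
```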


Upgrading

You are potentially going to have problems.

It's possible that, since List Settings were never decoded/encoded correctly in past versions, they'll show up incorrectly in the program once you've upgraded. The easiest fix is to edit the mistakes and resave the information. For most of the program, sadly, you're going to have to manually export the information and re-import it with the correct encoding. Dada Mail will probably fail gracefully with old information, but you may see squiggly characters instead of what you want to see. Nothing in Dada Mail will stop this from happening. If you experience it (from old information), we won't count it as a bug, but rather as a known issue.

MySQL Notes

"Specified key was too long; max key length is 1000 bytes", Problem (and Solution)

What's recommended here is to alter some of the fields in some of the tables that make up the MySQL schema.

Most of the time, a field named email looks like this:

 email text(320),

Changing this field type from text(320) to varchar(80) both solves this problem and is realistic.

The email fields were set to a size of 320 because the RFCs say email addresses can be this long (a local part of up to 64 characters, the @, and a domain of up to 255 characters). In reality, they never really are. A field type this long (especially when used for keys and indexes) tends to muck things up with a UTF-8 character set, since MySQL sizes index keys assuming up to three bytes per utf8 character.
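The arithmetic behind the error is short: MySQL's utf8 character set allows up to three bytes per character, and the worst-case key size is checked against the 1000-byte limit in the error message. A small Python sketch of that calculation; the second 64-character column is a hypothetical stand-in, not the exact Dada Mail index definition, and it ignores the few extra bytes MySQL adds for length prefixes.

```python
MAX_KEY_BYTES = 1000      # limit quoted in the MySQL error message
UTF8_BYTES_PER_CHAR = 3   # worst case for MySQL's utf8 (utf8mb3)


def worst_case_key_bytes(column_lengths):
    """Worst-case index size in bytes for utf8 character columns."""
    return sum(length * UTF8_BYTES_PER_CHAR for length in column_lengths)


# Hypothetical composite index: the email column plus a 64-character column
old = worst_case_key_bytes([320, 64])  # email as text(320)
new = worst_case_key_bytes([80, 64])   # email as varchar(80)

print(old, new)  # 1152 432
assert old > MAX_KEY_BYTES
assert new <= MAX_KEY_BYTES
```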

You'll see this field in the table schemas as well as in the table indexes. The advice is to drop the old table indexes, alter your tables so that anything with a field type of text(320) becomes varchar(80), and then, if you want, recreate the table indexes using what's in the dada/extras/mysql_schema.sql file.

Here's some SQL that should remove those indexes:

 ALTER TABLE `dada_subscribers` DROP INDEX `dada_subscribers_all_index`;
 ALTER TABLE `dada_archives`    DROP INDEX `dada_subscribers_all_index`;

Here's some SQL that should work on changing the field types:

 ALTER TABLE `dada_bounce_scores`  CHANGE `email` `email` VARCHAR( 80 );
 ALTER TABLE `dada_profiles`       CHANGE `email` `email` VARCHAR( 80 );
 ALTER TABLE `dada_profile_fields` CHANGE `email` `email` VARCHAR( 80 );
 ALTER TABLE `dada_subscribers`    CHANGE `email` `email` VARCHAR( 80 );

Changing the character set of tables

The following tables should have a character set of utf8 and a collation of utf8_bin.

A quick-and-dirty solution to changing the character sets (if they aren't already in UTF-8) is to just use the following queries:

 ALTER TABLE `dada_settings` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
 
 ALTER TABLE `dada_profiles` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
 
 ALTER TABLE `dada_subscribers` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
 
 ALTER TABLE `dada_profile_fields` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
 
 ALTER TABLE `dada_profile_fields_attributes` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
 
 ALTER TABLE `dada_archives` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;

There are downsides to this approach. Most notably, any UTF-8-encoded (erm, kinda-encoded) stuff already sitting in a non-UTF-8 table is going to get double-encoded, and will look garbled when Dada Mail accesses and uses the information. More information:

http://www.mysqlperformanceblog.com/2009/03/17/converting-character-sets/
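To illustrate the double-encoding problem, the Python sketch below assumes a column declared as latin1 that actually holds UTF-8 bytes, the usual situation with older installs. A naive conversion reinterprets each byte as a Latin-1 character and encodes that character as UTF-8 again, producing the familiar squiggly characters. (MySQL's latin1 is really Windows-1252, which behaves the same for these particular bytes.) The sample word is an assumption.

```python
original = "café"
stored = original.encode("utf-8")  # b'caf\xc3\xa9': UTF-8 bytes in a latin1 column

# What a naive latin1 -> utf8 conversion does to those bytes
converted = stored.decode("latin-1").encode("utf-8")
garbled = converted.decode("utf-8")
print(garbled)  # cafÃ©

# Undoing the extra step recovers the original text
repaired = garbled.encode("latin-1").decode("utf-8")
assert garbled == "cafÃ©"
assert repaired == original
```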


Other Problems?

Please let us know via the Support Boards:

http://dadamailproject.com/support/boards/

Or the developer mailing list:

http://dadamailproject.com/cgi-bin/dada/mail.cgi/list/dadadev/

We would love to help you out.

Thanks!

