Grant wrote:
>>>> Email order receipt will not be send as UTF8 charset, so it's quite
>>>> plausible that Swedish characters are messed up. Proper UTF8 support
>>>> is still under development.
>>>>
>>>> Regards
>>>> Racke
>>> Will IC pass unicode characters properly to mysql? Should they be
>>> displayed properly with [value]?
>> As has been noted in this thread already, full unicode support is far
>> from trivial, and is something that can be difficult to put in as an
>> afterthought. If you are just concerned with the out-going emails
>> (i.e., the site appears to function fine), you can try to use one of
>> the following approaches:
>>
>> If you are using the [email] tag to send out your confirmation/order
>> emails and you know that all of the data will be in the UTF-8
>> encoding, you can add explicit calls to the tag usertag to output
>> mime headers as shown:
>>
>> [email <to, from, etc> extra="[tag op=mime arg=header]"]
>> [tag op='mime' type='text/plain; charset="utf-8"']
>> <body content here>
>> [/email]
>>
>> Another option (depending on how much you want to get your hands
>> dirty) is to roll-your-own email sending usertag/routine in Perl
>> which can harness both Encode and MIME::Lite to explicitly manage/
>> handle the coercion of data to the desired encoding.
>>
>> Please note that if you have non-ascii data that you want to appear
>> in the email headers (to, from, subject, etc) you will need to
>> explicitly encode the data using the MIME-Header encoding to handle
>> this properly.
>>
>> Good Luck,
>>
>> David
>
> Thanks David. I'm not so much concerned with email being displayed
> properly as I am with having the customer's shipping address. Maybe
> the thing to do is use [tag] as you suggested to always send a
> separate UTF-8 email to the admin containing just the shipping address
> so we're sure to have that. We would need to run that UTF-8 address
> through IC to ship though, so that may not do any good anyway. It
> sounds like UTF-8 data is messed up as soon as it hits IC, but maybe
> not. I'm still not clear on that.

Check if UTF8 data is stored as such in the database, try to enter UTF8 strings in user account forms etc.

Regards
Racke

Alright, thanks for everyone's help with this.
Grant,

Did you sort out the last matter you mentioned, regarding getting UTF-8 data into MySQL?
IIRC, Interchange doesn't do much of anything with the incoming data (for a POST or whatever) as far as encoding is concerned; it simply assumes raw encoding on the filehandle between Interchange and the vlink/tlink script.
I believe this can work, provided that:

* the actual web pages themselves, and the forms therein, are properly encoded with UTF-8, marked as such, and thus the browser submits data in UTF-8;
* the client encoding on your DBD::mysql connection is set to raw, or whatever MySQL's equivalent encoding name for this is (I cannot remember; I seem to recall that MySQL may treat the latin1 encoding as simple raw encoding, in which case it wouldn't make a difference -- I moved to Postgres when I started dealing with any real UTF-8 data).
This is all just treating it as raw data, which isn't necessarily ideal. For one, if the data is coming in as raw byte strings (as outlined above), then regexes will give you funky behavior (for instance, the HTML entity encoding routines will appear to break your data). This is because in a raw string, each character represents an octet rather than an actual character, but Perl has no way of knowing that. So, what is in fact a valid high-bit sequence in UTF-8 (for representing any character outside the 7-bit ASCII range) will appear as a series of odd characters in the raw string if you were to simply print the raw string to a non-UTF8 terminal. In order for regexes to work reliably, the raw data needs to be decoded into a UTF8 character scalar, which requires messing with the Perl Encode module.
If you don't need to run regexes or HTML entity filters or whatever against your inbound data, then you could probably get by with raw encoding. Otherwise, this will probably bite you.
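To make the regex problem concrete, here is a minimal sketch using only the core Encode module. The `entity_escape` routine is a hypothetical stand-in, not Interchange's actual entity filter; it just shows how the same substitution behaves on raw octets versus a decoded string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical entity filter, standing in for the real Interchange one:
# replace every non-ASCII character with a numeric HTML entity.
sub entity_escape {
    my ($string) = @_;
    $string =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $string;
}

# "a with ring" (U+00E5) as it arrives from a UTF-8 form post: two octets.
my $raw = "\xC3\xA5";

# On the raw string, Perl sees two one-octet "characters".
my $raw_length  = length $raw;                    # 2
my @raw_matches = $raw =~ /([^\x00-\x7F])/g;      # two separate matches
my $raw_escaped = entity_escape($raw);

# Decode the octets into a character string before doing any text work.
my $text         = decode('UTF-8', $raw);
my $text_length  = length $text;                  # 1
my @text_matches = $text =~ /([^\x00-\x7F])/g;    # one match
my $text_escaped = entity_escape($text);

print "$raw_escaped\n";    # &#195;&#165;  (two bogus entities)
print "$text_escaped\n";   # &#229;        (the intended one)
```

The raw version produces two entities that a browser will render as two unrelated Latin-1 characters, which is the "appears to break your data" symptom described above.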
Assuming the data gets safely into MySQL as well-formed UTF8 (or assuming the data already exists in MySQL), pulling the data out is another matter. You'll need to look at the docs for DBD::mysql to see what it offers for UTF8 support, or to see if reading the data in from a database handle with the client encoding set to UTF8 would do the trick. Basically, UTF8 data coming out of the database will break in things like the table editor because of the same regular expression problem already mentioned; byte sequences that correspond to a single logical character are treated as separate characters and therefore semantically mismatch with the intentions of the regular expressions for things like HTML entity escaping. DBD::Pg (for Postgres) provides a setting for telling the driver to properly elevate text scalars to UTF8, which can address this issue; I'm not familiar with DBD::mysql's offerings for this sort of thing. If you can get the data returned from MySQL to be automatically elevated to UTF8 before Interchange touches it, then you may pull it off.
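If the driver cannot be told to do the elevation itself, a fallback is to decode each fetched row by hand before anything else touches it. The sketch below assumes the columns come back as raw UTF-8 octets; `decode_row` is a hypothetical helper, and the sample hash merely stands in for what `fetchrow_hashref` might return. It is worth checking the driver documentation first (DBD::mysql documents a `mysql_enable_utf8` connection attribute, and DBD::Pg a `pg_enable_utf8` one), because decoding data the driver has already decoded will corrupt it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode every defined column value of one fetched row,
# assuming the driver handed back raw UTF-8 octets.
sub decode_row {
    my ($row) = @_;
    my %decoded;
    for my $column (keys %$row) {
        my $value = $row->{$column};
        $decoded{$column} = defined $value ? decode('UTF-8', $value) : undef;
    }
    return \%decoded;
}

# Stand-in for a row fetched as raw octets (no database needed for the sketch).
my $row = {
    city     => "G\xC3\xB6teborg",
    zip      => '41103',
    address2 => undef,
};

my $clean = decode_row($row);

printf "raw city length:     %d\n", length $row->{city};     # 9 octets
printf "decoded city length: %d\n", length $clean->{city};   # 8 characters
```

Doing this at the point where rows leave the database handle keeps the rest of the code working on character strings only.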
It's a complicated issue. Once you have one Perl scalar that is marked internally as UTF8, any scalar it combines with will be elevated on-demand to UTF8. So, in theory, having one UTF8 string coming from one column in one record of your database could cause the entire output buffer for a page to be elevated. But what about all your template pages and such, and their encodings? File encoding is a somewhat mysterious topic, since files aren't typically flagged as being in a particular encoding. You have to know what kinds of encoding you're using in every aspect of your application in order for this to work out in a controlled fashion.
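The on-demand elevation can be seen in a few lines. This is a sketch with made-up sample strings, not code from Interchange: one decoded value is enough to flag the whole concatenated buffer as UTF8, and any raw octets mixed into that buffer get upgraded as if they were Latin-1, which yields double-encoded output.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $template = "Ship to: ";                  # plain ASCII, e.g. from a template page
my $raw_city = "Malm\xC3\xB6";               # UTF-8 octets that were never decoded
my $city     = decode('UTF-8', $raw_city);   # character string, UTF8 flag on

# One flagged scalar elevates the whole buffer it is combined with.
my $page = $template . $city;
print "template flagged: ", (utf8::is_utf8($template) ? "yes" : "no"), "\n";  # no
print "page flagged:     ", (utf8::is_utf8($page)     ? "yes" : "no"), "\n";  # yes

# Mixing decoded and raw data: the raw octets are upgraded as Latin-1
# characters, so encoding the result back to UTF-8 double-encodes them.
my $mixed  = $city . ' / ' . $raw_city;
my $output = encode('UTF-8', $mixed);
# $output is "Malm\xC3\xB6 / Malm\xC3\x83\xC2\xB6"
```

The second half is why every input source (templates, database, form data) has to agree on whether it delivers octets or characters.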
And of course getting everything elevated to UTF8 will impose some kind of performance penalty. Probably not anything worth worrying about, I would guess, but it's best to be prepared.
As Jon said a little while ago, we're (that is, End Point) preparing a change set to improve UTF8 support, and we've been making good progress. Once it's ready and the IC core team has their say, it should help considerably. However, it will remain a complex issue that requires a lot of attention to detail and brings a significant headache, as it affects all layers of the software stack.
One final note: if you're working with UTF-8, then you will inevitably end up feeling a deep sense of loathing for CP1252, because it pops up *everywhere*. If your MySQL data is supposedly latin1, then it's almost certainly really CP1252. :)
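A quick way to see the difference, sketched with the core Encode module and a made-up sample string: octets in the 0x80-0x9F range are printable punctuation in CP1252 but control characters in true latin1 (ISO-8859-1), so decoding with the wrong name silently produces garbage.

```perl
use strict;
use warnings;
use Encode qw(decode);

# 0x93 and 0x94 are curly double quotes in CP1252,
# but C1 control characters in ISO-8859-1.
my $octets = "\x93quoted\x94";

my $as_latin1 = decode('iso-8859-1', $octets);
my $as_cp1252 = decode('cp1252',     $octets);

printf "latin1 first char: U+%04X\n", ord $as_latin1;   # U+0093
printf "cp1252 first char: U+%04X\n", ord $as_cp1252;   # U+201C
```

If "latin1" data contains octets in that range, treating it as CP1252 when converting to UTF-8 is usually the safer assumption.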
Thanks.
- Ethan

--
Ethan Rowe
End Point Corporation
suppressed

_______________________________________________
interchange-users mailing list
suppressed
http://www.icdevgroup.org/mailman/listinfo/interchange-users