Silly me

Friday, 18th July 2008

This might be obvious to most developers, but remember to UTF-8 encode those Akismet requests.

For the past few days I’ve been wondering why I kept getting errors thrown back at me whenever I submitted comments to the site that contained non-ASCII characters, like pretty quotes (“”):

'ascii' codec can't encode character u'\u201c' in position 78: ordinal not in range(128)

I understood that it had something to do with encoding, but where should I encode it? If I encoded the whole comment body to UTF-8, the admin would act up when trying to edit those comments. It took a little while to get to the following solution:

from django.utils.encoding import smart_str
# Encode only the text handed to Akismet; the comment saved to the database stays Unicode.
is_spam = a.comment_check(smart_str(body, encoding='utf-8', strings_only=False, errors='strict'), akismet_data)

Only encode the Akismet spam check. Bah, I feel so stupid - Django automatically takes care of Unicode input to the database, but you gotta make sure to encode those Akismet checks yourself, since they are sent over HTTP and the request needs encoded bytes rather than a Unicode string.
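
To make the "encode only at the HTTP boundary" idea concrete, here is a minimal sketch in Python 3 syntax. It is not the code from this site: encode_for_http is a hypothetical helper, and the comment_content field is used purely for illustration. It only uses the standard library, so it runs without Django or an Akismet key.

```python
from urllib.parse import urlencode, parse_qs


def encode_for_http(fields, encoding="utf-8"):
    """Hypothetical helper: encode text values to bytes right before
    building a form-encoded request body."""
    encoded = {key: value.encode(encoding) for key, value in fields.items()}
    # urlencode percent-escapes the bytes, so the result is plain ASCII.
    return urlencode(encoded).encode("ascii")


# The comment stays a Unicode string everywhere else in the application.
comment = "She said \u201chello\u201d"

# Forcing it through the ASCII codec fails, like the error quoted above.
try:
    comment.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding as UTF-8 only for the outgoing request works fine.
body = encode_for_http({"comment_content": comment})

# Decoding the body again gives back the original text unchanged.
round_trip = parse_qs(body.decode("ascii"))["comment_content"][0]
```

The point of the design is that the original string is never modified: the bytes exist only for the outgoing request, so the database and the admin keep working with Unicode.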

In the end it was just bad debugging on my part - I should’ve checked that it was only Akismet that needed some help with the encoding.
