Friday, 18 March 2011

Character encoding gotchas - what I needed to do to handle orders from China

Just when you think you've got your Spring web application nicely under control, your first customer from a Scandinavian country tries to place an order. And then you are hit by the evil character encoding monster. Your customer no longer lives in København but in K�benhavn, and their last name is now MÃ¥rtensson instead of Mårtensson. Chances are your customers from China will be treated even worse by your web app.
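To see where those two garbled strings come from, here is a minimal sketch in plain Java. The class and method names are made up for illustration. It assumes the two classic mix-ups: UTF-8 bytes read as ISO-8859-1, and ISO-8859-1 bytes read as UTF-8. The non-ASCII letters are written as Unicode escapes so the example does not depend on the source file encoding, which is yet another place where things can go wrong.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode as UTF-8, then wrongly decode the bytes as ISO-8859-1.
    // Every multi-byte UTF-8 character turns into two or more Latin-1 characters.
    public static String utf8ReadAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Encode as ISO-8859-1, then wrongly decode the bytes as UTF-8.
    // Bytes that are not valid UTF-8 are replaced by U+FFFD, the replacement character.
    public static String latin1ReadAsUtf8(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00e5 is the a-with-ring in the last name, \u00f8 is the o-with-stroke in the city
        String garbledName = utf8ReadAsLatin1("M\u00e5rtensson");
        String garbledCity = latin1ReadAsUtf8("K\u00f8benhavn");

        // garbledName now contains U+00C3 U+00A5 where the single letter used to be
        System.out.println(garbledName);
        // garbledCity now contains U+FFFD where the o-with-stroke used to be
        System.out.println(garbledCity);
    }
}
```

Which of the two symptoms you see tells you in which direction the mix-up happened, and that helps in finding the layer that is misconfigured.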
No problem, you think: "I just need to set the Tomcat default encoding to UTF-8 and we're in business worldwide." Well, if life were that easy, we programmers would be out of a job really quickly. Here's the list of tricks I needed to perform to make sure our expansion to Scandinavia and China could begin:

1. set tomcat default encoding
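In conf/server.xml, set the attribute URIEncoding="UTF-8" on the Connector elements. This controls how Tomcat decodes the request URI, including the query string parameters of GET requests.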
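
The effect of this setting can be illustrated outside the container. The sketch below is not Tomcat's own code: it uses java.net.URLDecoder as a stand-in, with a made-up class name, and decodes the same percent-encoded city name once as UTF-8 and once as ISO-8859-1.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriDecodingDemo {

    // Percent-decode a URI component using the given charset name.
    public static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A browser sending UTF-8 encodes the o-with-stroke (U+00F8) as %C3%B8
        String fromBrowser = "K%C3%B8benhavn";

        // Decoded as UTF-8: the two bytes become one character again
        System.out.println(decode(fromBrowser, "UTF-8"));

        // Decoded as ISO-8859-1: the two bytes become two characters (U+00C3, U+00B8)
        System.out.println(decode(fromBrowser, "ISO-8859-1"));
    }
}
```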

2. in web.xml add a character encoding filter

<filter>
<filter-name>characterEncodingFilter</filter-name>
<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>

and map it to the requests that you need to be treated as UTF-8:

<filter-mapping>
<filter-name>characterEncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>


3. make sure your database is in utf-8
Especially when using MySQL, be aware that by default it creates databases with the latin1 character set. If you didn't pay attention to this small detail when you first created your database, here's what you can do to change it afterwards:
alter database my_database default charset utf8 collate utf8_general_ci;
followed, just to be sure, by the following statement for all your tables:
alter table my_table convert to character set utf8 collate utf8_general_ci;
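
Why latin1 is a dead end for Chinese orders can be checked in a few lines of Java. This is only a sketch with made-up names, and it uses ISO-8859-1 as a stand-in, since MySQL's latin1 is actually closer to windows-1252. The point is the same either way: a single-byte Western European charset has no slot for Chinese characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCoverageDemo {

    // True if every character of the text can be represented in the charset.
    public static boolean fitsIn(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Encode and decode with the same charset; unmappable characters are
    // silently replaced (with '?' for ISO-8859-1), so data is lost.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String copenhagen = "K\u00f8benhavn"; // Danish spelling, fits in Latin-1
        String beijing = "\u5317\u4eac";      // the two Chinese characters for Beijing

        System.out.println(fitsIn(copenhagen, StandardCharsets.ISO_8859_1)); // true
        System.out.println(fitsIn(beijing, StandardCharsets.ISO_8859_1));    // false
        System.out.println(fitsIn(beijing, StandardCharsets.UTF_8));         // true

        // The Latin-1 round trip turns both Chinese characters into question marks
        System.out.println(roundTrip(beijing, StandardCharsets.ISO_8859_1)); // ??
    }
}
```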

4. make sure your DB connection also uses UTF-8, all the time
We're using the DBCP connection pool, configured like this:

<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource" p:connectionProperties="characterEncoding=UTF-8;useUnicode=true;">
<!-- ...other properties... -->
</bean>


5. instruct FreeMarker to use UTF-8 when processing its templates

<bean id="freemarkerConfiguration" class="org.springframework.ui.freemarker.FreeMarkerConfigurationFactoryBean">
<property name="templateLoaderPath" value="classpath:/mailTemplates" />
<property name="freemarkerSettings">
<props>
<prop key="default_encoding">UTF-8</prop>
<prop key="output_encoding">UTF-8</prop>
</props>
</property>
</bean>


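6. when using the Spring RestTemplate, make it use UTF-8
We were using RestTemplate to POST from one web app to another. By default, it uses ISO-8859-1 for its request parameters. This must be overridden as shown here. Note that setting messageConverters replaces the default converter list, so include every converter you need: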

<bean id="restTemplate" class="org.springframework.web.client.RestTemplate">
<property name="messageConverters">
<list>
<bean class="org.springframework.http.converter.StringHttpMessageConverter" />
<bean class="org.springframework.http.converter.FormHttpMessageConverter" >
<property name="charset" value="UTF-8" />
</bean>
</list>
</property>
</bean>
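
What changes on the wire when the form charset switches can be shown with java.net.URLEncoder. This is a sketch with a made-up class name, used here as a stand-in for the form encoding step rather than as Spring's actual code path. The same last name produces different bytes in the request body, and the receiving app only reads it back correctly if it decodes with the same charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FormEncodingDemo {

    // URL-encode a form value using the given charset name.
    public static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String lastName = "M\u00e5rtensson"; // \u00e5 is the a-with-ring

        // ISO-8859-1: one byte for the special letter
        System.out.println(encode(lastName, "ISO-8859-1")); // M%E5rtensson

        // UTF-8: two bytes for the same letter
        System.out.println(encode(lastName, "UTF-8"));      // M%C3%A5rtensson
    }
}
```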


That was all it took!
