Quantcast
Channel: Topic Tag: MySQL | WordPress.org
Viewing all articles
Browse latest Browse all 5527

Southwest on "Trouble converting database containing special characters to UTF-8"

$
0
0

Summary:

I have a WordPress database that is filled with special characters encoded in different ways, and this is interfering with my attempts to convert it all to UTF-8.

Technical Specs:

WordPress version: 4.1.1

Hosted at NearlyFreeSpeech

Backstory:

I have a WordPress blog, currently running WordPress 4.1.1. I've had the same blog for close to ten years. A while ago--unfortunately, I can't remember when--I noticed that old posts were filled with weird character strings wherever I had a special character. For instance, I saw naïve instead of naïve. I corrected these glitches when I saw them, opening the post in the editor and replacing ï with ï.

When I saved the post, the new special character was displaying instead of the original garbled mess. I was happy.

I had some old posts that contained Japanese text, and the Japanese text was goofy here, too. Instead of 日本語, it would display 日本語. Only, when I tried to correct that in the post editor, the Japanese characters I entered would be replaced entirely with question marks upon hitting "save" or "publish".

I didn't replace all of those garbled strings, though; I still have some posts that show naïve instead of naïve (for example). This means that my blog currently contains:

  • mis-encoded special characters (naïve)
  • "proper" special characters (naïve)
  • mis-encoded Japanese (日本語)
  • question marks where I attempted to correct the mis-encoded Japanese (???)

What I want:

I want to make changes to my current database so that:

  1. Japanese (and other non-Latin characters) can be saved and displayed
  2. There are no strings of mis-encoded special characters
  3. All of the characters I "corrected" in the past still display

Steps taken:

Since I'm having problems on a WordPress installation that I've used for nearly a decade, I identified the problem as one with character encoding. I recognize that old versions of WordPress, until v2.2, used the latin1 character set and the latin1_swedish_ci collation. When I look at my database in phpMyAdmin, I see that I have numerous tables that are still collated as latin1_swedish_ci.

I've tried, in numerous ways, to convert my database to UTF-8. Here's what I've done:

  1. Use the UTF-8 Database Converter plugin
  2. Follow this guide to export the database and replace all instances of "latin1" with "UTF8"
  3. As described in the Codex, use a SQL script to convert tables and columns to blob, and then to UTF-8 text./li>
  4. As described in the Codex, use a SQL script to recast tables and columns to latin1 text, then to blob, and then to UTF-8 text

Each time I've done this, I've had the same result.

Actual results:

Upon creating the new database in mySQL, the following things happen:

  • "Mis-encoded" special characters (naïve) and Japanese text (日本語) convert into legible strings (naïve and 日本語)
  • "Corrected" special characters cause data loss. An entry in the database that originally read, "Today, it is 37° and sunny" will lose all of its content from the first special character onward, becoming "Today, it is 37". As far as I can tell, this affects all tables in the database, including wp_options.
  • Question marks where I attempted to correct the mis-encoded Japanese remain question marks.
  • When I create a new post containing special characters and Japanese, those characters are saved correctly to the database, and display without issue.

Help?

I know nothing about databases, but I really really want to believe there's a way for me to correct my database so that Japanese saves and displays correctly... without also screwing up hundreds of entries that I enthusiastically "corrected" earlier, or configuration options.

Any advice at all that can point me in the right direction would be greatly appreciated.

Thank you.


Viewing all articles
Browse latest Browse all 5527

Trending Articles