Character Set Support

This chapter is from the book

3.10 Upgrading Character Sets from MySQL 4.0

What about upgrading from older versions of MySQL? MySQL 4.1 is largely upward compatible with MySQL 4.0 and earlier, for the simple reason that almost all of its character set features are new, so nothing in earlier versions conflicts with them. However, there are some differences and a few things to be aware of.

Most important: a MySQL 4.0 "character set" has the properties of both a MySQL 4.1 character set and a MySQL 4.1 collation. You will have to unlearn this. From here on, character set properties and collation properties are no longer bundled into a single object.

National character sets are treated specially in MySQL 4.1. NCHAR is not the same as CHAR, and N'...' literals are not the same as '...' literals: NCHAR columns and N'...' literals use the predefined national character set, which in MySQL 4.1 is utf8.

Finally, MySQL 4.1 uses a different file format for storing information about character sets and collations. Make sure that you have reinstalled the /share/mysql/charsets/ directory, which contains the new configuration files.
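One way to confirm the reinstall is to look for the index file of the new format. The sketch below assumes that the 4.1 files are XML with an index named Index.xml (as we understand it, 4.0 used plain-text files instead); the function name and the default directory are hypothetical, so pass the charsets directory of your own installation.

```shell
#!/bin/sh
# Hypothetical check: succeeds if DIR holds Index.xml, which we assume is the
# index file of the 4.1 XML character set format.
has_41_charset_files() {
  [ -f "$1/Index.xml" ]
}

# Assumed default location; pass your installation's charsets directory.
dir=${1:-share/mysql/charsets}
if has_41_charset_files "$dir"; then
  echo "$dir: 4.1 character set files found"
else
  echo "$dir: Index.xml not found; reinstall this directory" >&2
fi
```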

If you want to start mysqld from a 4.1.x distribution with data created by MySQL 4.0, start the server with the 4.1 character set and collation that correspond to the character set the 4.0 server used. In this case you won't need to reindex your data.

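There are two ways to do so: set the defaults when configuring a source build, or pass them as options when starting the server: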

shell> ./configure --with-charset=... --with-collation=...
shell> ./mysqld --default-character-set=... --default-collation=...

If you used mysqld with, for example, the MySQL 4.0 danish character set, you should now use the latin1 character set and the latin1_danish_ci collation:

shell> ./configure --with-charset=latin1 \
           --with-collation=latin1_danish_ci
shell> ./mysqld --default-character-set=latin1 \
           --default-collation=latin1_danish_ci

Use the table shown in Section 3.10.1, "4.0 Character Sets and Corresponding 4.1 Character Set/Collation Pairs," to find old 4.0 character set names and their 4.1 character set/collation pair equivalents.

If you have non-latin1 data stored in a 4.0 latin1 table and want to convert the table column definitions to reflect the actual character set of the data, use the instructions in Section 3.10.2, "Converting 4.0 Character Columns to 4.1 Format."

3.10.1 4.0 Character Sets and Corresponding 4.1 Character Set/Collation Pairs

ID  4.0 Character Set  4.1 Character Set  4.1 Collation
--  -----------------  -----------------  --------------------
1   big5               big5               big5_chinese_ci
2   czech              latin2             latin2_czech_ci
3   dec8               dec8               dec8_swedish_ci
4   dos                cp850              cp850_general_ci
5   german1            latin1             latin1_german1_ci
6   hp8                hp8                hp8_english_ci
7   koi8_ru            koi8r              koi8r_general_ci
8   latin1             latin1             latin1_swedish_ci
9   latin2             latin2             latin2_general_ci
10  swe7               swe7               swe7_swedish_ci
11  usa7               ascii              ascii_general_ci
12  ujis               ujis               ujis_japanese_ci
13  sjis               sjis               sjis_japanese_ci
14  cp1251             cp1251             cp1251_bulgarian_ci
15  danish             latin1             latin1_danish_ci
16  hebrew             hebrew             hebrew_general_ci
17  win1251            (removed)          (removed)
18  tis620             tis620             tis620_thai_ci
19  euc_kr             euckr              euckr_korean_ci
20  estonia            latin7             latin7_estonian_ci
21  hungarian          latin2             latin2_hungarian_ci
22  koi8_ukr           koi8u              koi8u_ukrainian_ci
23  win1251ukr         cp1251             cp1251_ukrainian_ci
24  gb2312             gb2312             gb2312_chinese_ci
25  greek              greek              greek_general_ci
26  win1250            cp1250             cp1250_general_ci
27  croat              latin2             latin2_croatian_ci
28  gbk                gbk                gbk_chinese_ci
29  cp1257             cp1257             cp1257_lithuanian_ci
30  latin5             latin5             latin5_turkish_ci
31  latin1_de          latin1             latin1_german2_ci
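If you script your upgrades, the table above can be encoded as a lookup. The sketch below is a hypothetical helper, not a MySQL tool: it maps a 4.0 character set name to its 4.1 pair and prints the corresponding mysqld options, returning an error for win1251 (removed in 4.1) and for names not in the table.

```shell
#!/bin/sh
# Hypothetical helper (not part of MySQL): looks up a 4.0 character set name
# in the 4.0-to-4.1 table and prints the equivalent mysqld options.
# Returns 1 for win1251 (removed in 4.1) and for unknown names.
charset40_options() {
  case $1 in
    big5)       pair='big5 big5_chinese_ci' ;;
    czech)      pair='latin2 latin2_czech_ci' ;;
    dec8)       pair='dec8 dec8_swedish_ci' ;;
    dos)        pair='cp850 cp850_general_ci' ;;
    german1)    pair='latin1 latin1_german1_ci' ;;
    hp8)        pair='hp8 hp8_english_ci' ;;
    koi8_ru)    pair='koi8r koi8r_general_ci' ;;
    latin1)     pair='latin1 latin1_swedish_ci' ;;
    latin2)     pair='latin2 latin2_general_ci' ;;
    swe7)       pair='swe7 swe7_swedish_ci' ;;
    usa7)       pair='ascii ascii_general_ci' ;;
    ujis)       pair='ujis ujis_japanese_ci' ;;
    sjis)       pair='sjis sjis_japanese_ci' ;;
    cp1251)     pair='cp1251 cp1251_bulgarian_ci' ;;
    danish)     pair='latin1 latin1_danish_ci' ;;
    hebrew)     pair='hebrew hebrew_general_ci' ;;
    tis620)     pair='tis620 tis620_thai_ci' ;;
    euc_kr)     pair='euckr euckr_korean_ci' ;;
    estonia)    pair='latin7 latin7_estonian_ci' ;;
    hungarian)  pair='latin2 latin2_hungarian_ci' ;;
    koi8_ukr)   pair='koi8u koi8u_ukrainian_ci' ;;
    win1251ukr) pair='cp1251 cp1251_ukrainian_ci' ;;
    gb2312)     pair='gb2312 gb2312_chinese_ci' ;;
    greek)      pair='greek greek_general_ci' ;;
    win1250)    pair='cp1250 cp1250_general_ci' ;;
    croat)      pair='latin2 latin2_croatian_ci' ;;
    gbk)        pair='gbk gbk_chinese_ci' ;;
    cp1257)     pair='cp1257 cp1257_lithuanian_ci' ;;
    latin5)     pair='latin5 latin5_turkish_ci' ;;
    latin1_de)  pair='latin1 latin1_german2_ci' ;;
    win1251)    echo "$1: removed in MySQL 4.1" >&2; return 1 ;;
    *)          echo "$1: not a MySQL 4.0 character set" >&2; return 1 ;;
  esac
  # Split "charset collation" into two words ($1 and $2 inside the function).
  set -- $pair
  printf '%s %s\n' "--default-character-set=$1" "--default-collation=$2"
}

charset40_options danish
# prints: --default-character-set=latin1 --default-collation=latin1_danish_ci
```

Used as ./mysqld $(charset40_options danish), it would pass the same two options as the danish example given earlier.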


3.10.2 Converting 4.0 Character Columns to 4.1 Format

Normally, the server uses the latin1 character set by default. If you have been storing column data that is actually in some other character set that the 4.1 server now supports directly, you can convert the column. However, you should avoid converting directly from latin1 to the "real" character set, because the server treats the stored bytes as latin1 characters and converts them accordingly, which may result in data loss. Instead, convert the column to a binary column type, and then from the binary type to a non-binary type with the desired character set. Conversion to and from binary involves no attempt at character value conversion and leaves your data intact.

For example, suppose that you have a 4.0 table with three columns that are used to store values represented in latin1, latin2, and utf8:

CREATE TABLE t
(
  latin1_col CHAR(50),
  latin2_col CHAR(100),
  utf8_col CHAR(150)
);

After upgrading to MySQL 4.1, you want to leave latin1_col alone but change latin2_col and utf8_col to have the character sets latin2 and utf8, respectively. First back up your table, then convert the columns as follows:

ALTER TABLE t MODIFY latin2_col BINARY(100);
ALTER TABLE t MODIFY utf8_col BINARY(150);
ALTER TABLE t MODIFY latin2_col CHAR(100) CHARACTER SET latin2;
ALTER TABLE t MODIFY utf8_col CHAR(150) CHARACTER SET utf8;

The first two statements "remove" the character set information from the latin2_col and utf8_col columns. The second two statements assign the proper character sets to the two columns.
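Why the detour through binary matters can be seen at the byte level. The sketch below uses the iconv utility as a stand-in for the server's conversion, which is an assumption: it models a direct conversion as reading each stored byte as a latin1 character. A utf8 value stored as C3 A9 ("é") comes out double-encoded, while the binary path leaves the bytes alone. For the latin2 byte B3 ("ł"), the latin1 reading is "³", which latin2 cannot represent; iconv reports an error, whereas the server, as far as we know, substitutes "?".

```shell
#!/bin/sh
# Models the two conversion paths with iconv (assumption: a direct
# conversion reads every stored byte as a latin1 character).

hex() { od -An -tx1 | tr -d ' \n'; }   # print stdin as continuous hex bytes

# "é" stored as UTF-8 bytes C3 A9 in a column that 4.0 labels latin1.
utf8_bytes=$(printf '\303\251')

# Direct path: each byte is read as a latin1 character and re-encoded,
# so the value becomes double-encoded ("Ã©").
direct=$(printf '%s' "$utf8_bytes" | iconv -f ISO-8859-1 -t UTF-8 | hex)

# Binary path: the bytes are only relabelled, so they stay as they were.
binary=$(printf '%s' "$utf8_bytes" | hex)

# "ł" stored as latin2 byte B3. Read as latin1 it is "³", which latin2
# cannot represent, so a direct latin1-to-latin2 conversion cannot keep it.
if printf '\263' | iconv -f ISO-8859-1 -t ISO-8859-2 >/dev/null 2>&1; then
  latin2_direct=converted
else
  latin2_direct=lost
fi

echo "utf8 direct:     $direct"         # c383c2a9
echo "utf8 via binary: $binary"         # c3a9
echo "latin2 direct:   $latin2_direct"  # lost
```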

If you like, you can combine the to-binary conversions and from-binary conversions into single statements:

ALTER TABLE t
  MODIFY latin2_col BINARY(100),
  MODIFY utf8_col BINARY(150);
ALTER TABLE t
  MODIFY latin2_col CHAR(100) CHARACTER SET latin2,
  MODIFY utf8_col CHAR(150) CHARACTER SET utf8;
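For tables with many columns to convert, the combined form can be generated rather than typed. The sketch below is a hypothetical helper, not a MySQL tool. It assumes every column is a plain CHAR(n) with no other attributes, because MODIFY replaces the whole column definition and would drop attributes such as NOT NULL or DEFAULT that are not restated.

```shell
#!/bin/sh
# Hypothetical helper (not part of MySQL): prints the two combined
# ALTER TABLE statements that convert CHAR columns via BINARY.
# Usage: convert_via_binary TABLE COLUMN:LENGTH:CHARSET ...
# Assumes plain CHAR(n) columns with no other attributes.
convert_via_binary() {
  table=$1
  shift
  # Statement 1: strip character set information by converting to BINARY.
  printf 'ALTER TABLE %s' "$table"
  sep=''
  for spec in "$@"; do
    col=${spec%%:*}; rest=${spec#*:}; len=${rest%%:*}
    printf '%s\n  MODIFY %s BINARY(%s)' "$sep" "$col" "$len"
    sep=','
  done
  printf ';\n'
  # Statement 2: assign the real character set to each column.
  printf 'ALTER TABLE %s' "$table"
  sep=''
  for spec in "$@"; do
    col=${spec%%:*}; rest=${spec#*:}; len=${rest%%:*}; cs=${rest#*:}
    printf '%s\n  MODIFY %s CHAR(%s) CHARACTER SET %s' "$sep" "$col" "$len" "$cs"
    sep=','
  done
  printf ';\n'
}

convert_via_binary t latin2_col:100:latin2 utf8_col:150:utf8
```

For the example table, this prints the same two combined statements shown above, which you can review before feeding them to the mysql client.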