Home > Articles

Character Set Support

  • Print
  • + Share This
This chapter is from the book

This chapter is from the book

3.2 Character Sets and Collations in MySQL

The MySQL server can support multiple character sets. To list the available character sets, use the SHOW CHARACTER SET statement:

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+
| Charset  | Description                 | Default collation   |
+----------+-----------------------------+---------------------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |
| dec8     | DEC West European           | dec8_swedish_ci     |
| cp850    | DOS West European           | cp850_general_ci    |
| hp8      | HP West European            | hp8_english_ci      |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |
| latin1   | ISO 8859-1 West European    | latin1_swedish_ci   |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |
...

The output actually includes another column that is not shown so that the example fits better on the page.

Any given character set always has at least one collation. It may have several collations.

To list the collations for a character set, use the SHOW COLLATION statement. For example, to see the collations for the latin1 ("ISO-8859-1 West European") character set, use this statement to find those collation names that begin with latin1:

mysql> SHOW COLLATION LIKE 'latin1%';
+-------------------+---------+----+---------+----------+---------+
| Collation         | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1  |  5 |         |          |       0 |
| latin1_swedish_ci | latin1  |  8 | Yes     | Yes      |       1 |
| latin1_danish_ci  | latin1  | 15 |         |          |       0 |
| latin1_german2_ci | latin1  | 31 |         | Yes      |       2 |
| latin1_bin        | latin1  | 47 |         | Yes      |       1 |
| latin1_general_ci | latin1  | 48 |         |          |       0 |
| latin1_general_cs | latin1  | 49 |         |          |       0 |
| latin1_spanish_ci | latin1  | 94 |         |          |       0 |
+-------------------+---------+----+---------+----------+---------+

The latin1 collations have the following meanings:

Collation

Meaning

latin1_bin

Binary according to latin1 encoding

latin1_danish_ci

Danish/Norwegian

latin1_general_ci

Multilingual

latin1_general_cs

Multilingual, case sensitive

latin1_german1_ci

German DIN-1

latin1_german2_ci

German DIN-2

latin1_spanish_ci

Modern Spanish

latin1_swedish_ci

Swedish/Finnish


Collations have these general characteristics:

  • Two different character sets cannot have the same collation.

  • Each character set has one collation that is the default collation. For example, the default collation for latin1 is latin1_swedish_ci.

  • There is a convention for collation names: They start with the name of the character set with which they are associated, they usually include a language name, and they end with _ci (case insensitive), _cs (case sensitive), _bin (binary), or _uca (Unicode Collation Algorithm, http://www.unicode.org/reports/tr10/).

  • + Share This
  • 🔖 Save To Your Account