Home > Articles > Data > MySQL

  • Print
  • + Share This
  • 💬 Discuss

This chapter is from the book

3.2 MySQL Data Types

Each table in a database contains one or more columns. When you create a table using a CREATE TABLE statement, you specify a data type for each column. A data type is more specific than a general category such as “number” or “string.” For a column, the data type is the means by which you precisely characterize the kind of values the column may contain, such as SMALLINT or VARCHAR(32). This in turn determines how MySQL treats those values. For example, if you have numeric values, you could store them using either a numeric or string column, but MySQL will treat the values somewhat differently depending on what type you use. Each data type has several characteristics:

  • What kind of values it can represent.
  • How much space values take up.
  • Whether values are fixed-length (all values of the type take the same amount of space) or variable-length (the amount of space depends on the particular value being stored)
  • How MySQL compares and sorts values of the type
  • Whether the type can be indexed

The following discussion surveys MySQL’s data types briefly, and then describes in more detail the syntax for defining them and the properties that characterize each type, such as their range and storage requirements. The type specifications are shown as you use them in CREATE TABLE statements. Optional information is indicated by square brackets ([]). For example, the syntax MEDIUMINT[(M)] indicates that the maximum display width, specified as (M), is optional. On the other hand, for VARCHAR(M), the lack of brackets indicates that (M) is required.

3.2.1 Overview of Data Types

MySQL has numeric data types for integer, fixed-point, floating-point, and bit values, as shown in Table 3.2. Numeric types other than BIT can be signed or unsigned. A special attribute enables sequential integer or floating-point column values to be generated automatically, which is useful in applications that require a series of unique identification numbers.

Table 3.2. Numeric Data Types

Type Name

Meaning

TINYINT

A very small integer

SMALLINT

A small integer

MEDIUMINT

A medium-sized integer

INT

A standard integer

BIGINT

A large integer

DECIMAL

A fixed-point number

FLOAT

A single-precision floating-point number

DOUBLE

A double-precision floating-point number

BIT

A bit field

Table 3.3 shows the MySQL string data types. Strings can hold anything, even arbitrary binary data such as images or sounds. Strings can be compared according to whether they are case sensitive. In addition, you can perform pattern matching on strings. (Actually, in MySQL, you can even perform pattern matching on numeric types, but it’s more commonly done with string types.)

Table 3.3. String Data Types

Type Name

Meaning

CHAR

A fixed-length non-binary (character) string

VARCHAR

A variable-length non-binary string

BINARY

A fixed-length binary string

VARBINARY

A variable-length binary string

TINYBLOB

A very small BLOB (binary large object)

BLOB

A small BLOB

MEDIUMBLOB

A medium-sized BLOB

LONGBLOB

A large BLOB

TINYTEXT

A very small non-binary string

TEXT

A small non-binary string

MEDIUMTEXT

A medium-sized non-binary string

LONGTEXT

A large non-binary string

ENUM

An enumeration; each column value may be assigned one enumeration member

SET

A set; each column value may be assigned zero or more set members

Table 3.4 shows the MySQL date and types, where CC, YY, MM, DD, hh, mm, and ss represent century, year, month, day of the month, hour, minute, and second, respectively. For temporal values, MySQL provides types for dates and times (either combined or separate) and timestamps (a special type that enables you to track when changes were last made to a row). There is also a type for efficiently representing year values when you don’t need an entire date.

Table 3.4. Date and Time Data Types

Type Name

Meaning

DATE

A date value, in 'CCYY-MM-DD' format

TIME

A time value, in 'hh:mm:ss' format

DATETIME

A date and time value, in 'CCYY-MM-DD hh:mm:ss' format

TIMESTAMP

A timestamp value, in 'CCYY-MM-DD hh:mm:ss' format

YEAR

A year value, in CCYY or YY format

Table 3.5 shows the MySQL spatial data types. These represent various kinds of geometrical or geographical values.

Table 3.5. Spatial Data Types

Type Name

Meaning

GEOMETRY

A spatial value of any type

POINT

A point (a pair of X,Y coordinates)

LINESTRING

A curve (one or more POINT values)

POLYGON

A polygon

GEOMETRYCOLLECTION

A collection of GEOMETRY values

MULTILINESTRING

A collection of LINESTRING values

MULTIPOINT

A collection of POINT values

MULTIPOLYGON

A collection of POLYGON values

3.2.2 Specifying Column Types in Table Definitions

To create a table, issue a CREATE TABLE statement that includes a list of the columns in the table. Here’s an example that creates a table named mytbl with three columns named f, c, and i:

CREATE TABLE mytbl
(
  f FLOAT(10,4),
  c CHAR(15) NOT NULL DEFAULT 'none',
  i TINYINT UNSIGNED NULL
);

Each column has a name and a type, and attributes can be associated with the type. The syntax for defining a column is as follows:

col_name col_type [type_attributes] [general_attributes]

The name of the column, col_name, is always first in the definition and must be a legal identifier. The precise rules for identifier syntax are given in Section 2.2, “MySQL Identifier Syntax and Naming Rules.” Briefly summarized, column identifiers may be up to 64 characters long, and may consist of alphanumeric characters from the system character set, as well as the underscore and dollar sign characters (‘_’ and ‘$’). Keywords such as SELECT, DELETE, and CREATE normally are reserved and cannot be used. However, you can include other characters within an identifier or use a reserved word as an identifier if you are willing to put up with the bother of quoting it whenever you refer to it. To quote an identifier, enclose it within backtick (‘'’) characters. If the ANSI_QUOTES SQL mode is enabled, it is allowable to quote identifiers within double quote (‘"’) characters instead.

col_type indicates the column data type; that is, the specific kind of values the column can hold. Some type specifiers indicate the maximum length of the values you store in the column. For others, the length is implied by the type name. For example, CHAR(10) specifies an explicit length of 10 characters, whereas TINYTEXT values have an implicit maximum length of 255 characters. Some of the type specifiers allow you to indicate a maximum display width (how many characters to use for displaying values). For fixed-point and floating-point types, you can specify the number of significant digits and number of decimal places.

Following the column’s data type, you may specify optional type-specific attributes as well as more general attributes. These attributes function as type modifiers. They cause MySQL to change the way it treats column values in some way:

  • The type-specific attributes that are allowable depend on the data type you choose. For example, UNSIGNED and ZEROFILL are allowable only for numeric types, and CHARACTER SET and COLLATE are allowable only for non-binary string types.
  • The general attributes may be given for any data type, with a few exceptions. You may specify NULL or NOT NULL to indicate whether a column can hold NULL values. For most data types, you can specify a DEFAULT clause to define a default value for the column. Default value handling is described in Section 3.2.3, “Configuring Time Zone Support.”

If multiple column attributes are present, there are some constraints on the order in which they may appear. In general, you should be safe if you specify data type-specific attributes such as UNSIGNED or ZEROFILL before general attributes such as NULL or NOT NULL.

3.2.3 Specifying Column Default Values

For all but BLOB and TEXT types, spatial types, or columns with the AUTO_INCREMENT attribute, you can specify a DEFAULT def_value clause to indicate that a column should be assigned the value def_value when a new row is created that does not explicitly specify the column’s value. With some limited exceptions for TIMESTAMP columns, def_value must be a constant. It cannot be an expression or refer to other columns.

If a column definition includes no explicit DEFAULT clause and the column can take NULL values, its default value is NULL. Otherwise, the handling of a missing DEFAULT clause is version dependent.

From MySQL 5.0.2 on, the column is created without any DEFAULT clause. That is, it has no default value. This affects how the server handles the column if a new row that does not specify a value for the column is inserted into the table:

  • When strict SQL mode is not in effect, the column is set to the implicit default for its data type. (Implicit defaults are described shortly.)
  • When strict SQL mode is in effect, an error occurs if the table is transactional. The statement aborts and rolls back. For non-transactional tables, an error occurs and the statement aborts if the row is the first row inserted by the statement. If it is not the first row, you can elect to have the statement abort or to have the column set to its implicit default with a warning. The choice depends on which strict mode setting is in effect. See Section 3.3, “How MySQL Handles Invalid Data Values,” for details.

Before MySQL 5.0.2, MySQL defines the column with a DEFAULT clause that specifies the implicit default value.

The implicit default for a column depends on its data type:

  • For numeric columns, the default is 0, except for columns that have the AUTO_INCREMENT attribute. For AUTO_INCREMENT, the default is the next number in the column sequence.
  • For date and time types except TIMESTAMP, the default is the “zero” value for the type (for example, '0000-00-00' for DATE). For TIMESTAMP, the default is the current date and time for the first TIMESTAMP column in a table, and the “zero” value for any following TIMESTAMP columns. (TIMESTAMP defaults actually are more complex and are discussed in Section 3.2.6.2, “The TIMESTAMP Data Type.”)
  • For string types other than ENUM, the default is the empty string. For ENUM, the default is the first enumeration element. For SET, the default when the column cannot contain NULL actually is the empty set, but that is equivalent to the empty string.

You can use the SHOW CREATE TABLE statement to see which columns have a DEFAULT clause and what the default value is for those columns that have one.

3.2.4 Numeric Data Types

MySQL’s numeric data types group into three groups:

  • Exact-value types, which include the integer types and DECIMAL. Integer types are used for numbers that have no fractional part, such as 43, -3, 0, or -798432. You can use integer columns for data represented by whole numbers, such as weight to the nearest pound, height to the nearest inch, number of stars in a galaxy, number of people in a household, or number of bacteria in a petri dish. The DECIMAL type stores exact values that may have a fractional part, such as 3.14159, -.00273, or -4.78. This is a good data type for information such as monetary values. Integer and DECIMAL values are stored exactly as specified without rounding when possible, and calculations are exact.
  • The floating-point types are available in single precision (FLOAT) and double precision (DOUBLE). These types, like DECIMAL, are used for numbers that may have a fractional part, but they hold approximate-value numbers such as 3.9E+4 or -0.1E-100 that are subject to rounding. They can be used when exact precision is not required or for values that are so large that DECIMAL cannot represent them. Some types of data you might represent as floating-point values are average crop yield, distances, or unemployment rates.
  • The BIT type is used for storing bit-field values.

Values with a fractional part can be assigned to integer columns, but will be rounded using the “round half up” rule: If the fractional part is .5 or greater, the value is rounded away from zero to the next integer (up for positive values, down for negative values.) Conversely, integer values may be assigned to types that allow a fractional part. They are treated as having a fractional part of zero.

When you specify a number, you should not include commas as a separator. For example, 12345678.90 is legal, but 12,345,678.90 is not.

Table 3.6 shows the name and range of each numeric type, and Table 3.7 shows the amount of storage required for values of each type. M represents the maximum display width for integer types, the precision (number of significant digits) for floating-point and decimal types, and the number of bits for BIT. D represents the scale (number of digits following the decimal point) for types that have a fractional part; this is also known as the scale.

Table 3.6. Numeric Data Type Ranges

Type Specification

Range

TINYINT[(M)]

Signed values: −128 to 127 (−27 to 27−1); Unsigned values: 0 to 255 (0 to 28−1)

SMALLINT[(M)]

Signed values: −32768 to 32767 (−215 to 215−1); Unsigned values: 0 to 65535 (0 to 216−1)

MEDIUMINT[(M)]

Signed values: −8388608 to 8388607 (−223 to 223−1); Unsigned values: 0 to 16777215 (0 to 224−1)

INT[(M)]

Signed values: −2147483648 to 2147483647 (−231 to 231−1); Unsigned values: 0 to 4294967295 (0 to 232−1)

BIGINT[(M)]

Signed values: −9223372036854775808 to 9223372036854775807 (−263 to 263−1); Unsigned values: 0 to 18446744073709551615 (0 to 264−1)

DECIMAL([M[,D]])

Varies depending on M and D

FLOAT[(M,D)]

Minimum non-zero values: ±1.175494351E−38; Maximum non-zero values: ±3.402823466E+38

DOUBLE[(M,D)]

Minimum non-zero values: ±2.2250738585072014E−308; Maximum non-zero values: ±1.7976931348623157E+308

BIT[(M)]

0 to 2M−1

Table 3.7. Numeric Data Type Storage Requirements

Type Specification

Storage Required

TINYINT[(M)]

1 byte

SMALLINT[(M)]

2 bytes

MEDIUMINT[(M)]

3 bytes

INT[(M)]

4 bytes

BIGINT[(M)]

8 bytes

DECIMAL([M[,D]])

Varies depending on M, D

FLOAT[(M,D)]

4 bytes

DOUBLE[(M,D)]

8 bytes

BIT[(M)]

Varies depending on M

Storage for DECIMAL values depends on the number of digits on the left and right sides of the decimal point. For each side, 4 bytes are required for each multiple of nine digits, plus 1 to 4 bytes if there are any remaining digits. Storage per value is the sum of the left and right side storage.

A BIT(M) value requires approximately (M+7)/8 bytes.

3.2.4.1 Exact-Value Numeric Data Types

The exact-value data types include the integer types and the fixed-point DECIMAL type.

The integer types in MySQL are TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT. INTEGER is a synonym for INT. These types vary in the range of values they can represent and in the amount of storage space they require. (The larger the range, the more storage is required.) Integer columns can be defined as UNSIGNED to disallow negative values; this shifts the range for the column upward to begin at 0.

When you define an integer column, you can specify an optional display size M. If given, M should be an integer from 1 to 255. It represents the number of characters used to display values for the column. For example, MEDIUMINT(4) specifies a MEDIUMINT column with a display width of 4. If you define an integer column without an explicit width, a default width is assigned. The defaults are the lengths of the “longest” values for each type. Note that displayed values are not chopped to fit within M characters. If the printable representation of a particular value requires more than M characters, MySQL displays the full value.

The display size M for an integer column relates only to the number of characters used to display column values. It has nothing to do with the number of bytes of storage space required. For example, BIGINT values require 8 bytes of storage regardless of the display width. It is not possible to magically cut the required storage space for a BIGINT column in half by defining it as BIGINT(4). Nor does M have anything to do with the range of values allowed. If you define a column as INT(3), that doesn’t restrict it to a maximum value of 999.

DECIMAL is a fixed-point type: Values have a fixed number of decimals. The significance of this fact is that DECIMAL values are not subject to roundoff error the way that floating-point values are—a property that makes DECIMAL especially applicable for storing currency values.

NUMERIC and FIXED are synonyms for DECIMAL.

DECIMAL columns can be defined as UNSIGNED. Unlike the integer types, defining a DECIMAL type as UNSIGNED doesn’t shift the type’s range upward, it merely eliminates the negative end.

For a DECIMAL column, you may specify a maximum number of significant digits M and the number of decimal places D. These correspond to the concepts of “precision” and “scale” with which you may be familiar. The value of M should be from 1 to 65. The value of D should be from 0 to 30 and no greater than M.

M and D are optional. If D is omitted, it defaults to 0. If M is omitted as well, it defaults to 10. In other words, the following equivalences hold:

DECIMAL = DECIMAL(10) = DECIMAL(10,0)
DECIMAL(n) = DECIMAL(n,0)

The maximum possible range for DECIMAL is determined by the values of M and D. If you vary M and hold D fixed, the range becomes larger as M becomes larger (Table 3.8). If you hold M fixed and vary D, the range becomes smaller as D becomes larger (Table 3.9).

Table 3.8. How M Affects the Range of DECIMAL(M,D)

Type Specification

Range

DECIMAL(4,1)

−999.9 to 999.9

DECIMAL(5,1)

−9999.9 to 9999.9

DECIMAL(6,1)

−99999.9 to 99999.9

Table 3.9. How D Affects the Range of DECIMAL(M,D)

Type Specification

Range

DECIMAL(4,0)

−9999 to 9999

DECIMAL(4,1)

−999.9 to 999.9

DECIMAL(4,2)

−99.99 to 99.99

3.2.4.2 Approximate-Value Numeric Data Types

MySQL provides two floating-point types, FLOAT and DOUBLE, that hold approximate-value numbers. DOUBLE PRECISION is a synonym for DOUBLE. The REAL type is a synonym for DOUBLE by default. If the REAL_AS_DEFAULT SQL mode is enabled, REAL is a synonym for FLOAT.

Floating-point types can be defined as UNSIGNED, which eliminates the negative end of the type’s range.

For each floating-point type (just as for DECIMAL), you may specify a maximum number of significant digits M and the number of decimal places D. The value of M should be from 1 to 255. The value of D should be from 0 to 30 and no greater than M.

For FLOAT and DOUBLE, M and D are optional. If you omit both from the column definition, values are stored to the full precision allowed by your hardware.

FLOAT(p) syntax also is allowed. However, whereas p stands for the required number of bits of precision in standard SQL, it is treated differently in MySQL. p may range from 0 to 53 and is used only to determine whether the column stores single-precision or double-precision values. For p values from 0 to 24, the column is treated as single precision. For values from 25 to 53, the column is treated as double precision. That is, the column is treated as a FLOAT or DOUBLE with no M or D values.

3.2.4.3 The BIT Data Type

The BIT data type was introduced in MySQL 5.0.3 as a type for holding bit-field values. When you define a BIT column, you can specify an optional maximum width M that indicates the “width” of the column in bits. M should be an integer from 1 to 64. If omitted, M defaults to 1.

Values retrieved from BIT columns are not displayed in printable form by default. To display a printable representation of bit-field values, add zero or use CAST():

mysql> CREATE TABLE t (b BIT(3));  # 3-bit column; holds values 0 to 7
mysql> INSERT INTO t (b) VALUES(0),(b'11'),(b'101'),(b'111');
mysql> SELECT b+0, CAST(b AS UNSIGNED) FROM t;
+------+---------------------+
| b+0  | CAST(b AS UNSIGNED) |
+------+---------------------+
|    0 |                   0 |
|    3 |                   3 |
|    5 |                   5 |
|    7 |                   7 |
+------+---------------------+

The BIN() function is useful for displaying bit-field values or the result of computations on them in binary notation:

mysql> SELECT BIN(b), BIN(b & b'101'), BIN(b | b'101') FROM t;
+--------+-----------------+-----------------+
| BIN(b) | BIN(b & b'101') | BIN(b | b'101') |
+--------+-----------------+-----------------+
| 0      | 0               | 101             |
| 11     | 1               | 111             |
| 101    | 101             | 101             |
| 111    | 101             | 111             |
+--------+-----------------+-----------------+

3.2.4.4 Numeric Data Type Attributes

The UNSIGNED attribute disallows negative values. It can be used with all numeric types except BIT, but is most commonly used with integer types. Making an integer column UNSIGNED doesn’t change the “size” of the underlying data type’s range; it just shifts the range upward. Consider this table definition:

CREATE TABLE mytbl
(
  itiny   TINYINT,
  itiny_u TINYINT UNSIGNED
);

itiny and itiny_u both are TINYINT columns with a range of 256 values, but differ in the set of allowable values. The range of itiny is −128 to 127, whereas the range of itiny_u is shifted up, resulting in a range of 0 to 255.

UNSIGNED is useful for columns into which you plan to store information that doesn’t take on negative values, such as population counts or attendance figures. Were you to use a signed column for such values, you would use only half of the data type’s range. By making the column UNSIGNED, you effectively double your usable range. For example, if you use the column for sequence numbers, it will take twice as long to run out of values if you make it UNSIGNED.

You can also specify UNSIGNED for DECIMAL or floating-point columns, although the effect is slightly different from that for integer columns. The range does not shift upward; instead, the upper end remains unchanged and the lower end becomes zero (effectively cutting the range in half).

The SIGNED attribute is allowed for all numeric types that allow UNSIGNED. However, it has no effect because such types are signed by default. SIGNED serves simply to indicate explicitly in a column definition that the column allows negative values.

The ZEROFILL attribute can be specified for all numeric types except BIT. It causes displayed values for the column to be padded with leading zeros to the display width. You can use ZEROFILL when you want to make sure column values always display using a given number of digits. Actually, it’s more accurate to say “a given minimum number of digits” because values wider than the display width are displayed in full without being chopped. You can see this by issuing the following statements:

mysql> DROP TABLE IF EXISTS mytbl;
mysql> CREATE TABLE mytbl (my_zerofill INT(5) ZEROFILL);
mysql> INSERT INTO mytbl VALUES(1),(100),(10000),(1000000);
mysql> SELECT my_zerofill FROM mytbl;
+-------------+
| my_zerofill |
+-------------+
|       00001 |
|       00100 |
|       10000 |
|     1000000 |
+-------------+

Note that the final value is displayed in full, even though it is wider than the column’s display width.

If you specify the ZEROFILL attribute for a column, it automatically becomes UNSIGNED as well.

One other attribute, AUTO_INCREMENT, is allowed for integer or floating-point data types. Specify the AUTO_INCREMENT attribute when you want to generate a series of unique identifier values. When you insert NULL into an AUTO_INCREMENT column, MySQL generates the next sequence value and stores it in the column. Normally, unless you take steps to cause otherwise, AUTO_INCREMENT values begin at 1 and increase by 1 for each new row. The sequence may be affected if you delete rows from the table. That is, sequence values might be reused; it is storage engine-dependent whether this occurs.

You can have at most one AUTO_INCREMENT column in a table. The column should have the NOT NULL constraint, and it must be indexed. Generally, an AUTO_INCREMENT column is indexed as a PRIMARY KEY or UNIQUE index. Also, because sequence values always are positive, you normally define the column UNSIGNED as well. For example, you can define an AUTO_INCREMENT column in any of the following ways:

CREATE TABLE ai (i INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY);
CREATE TABLE ai (i INT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE);
CREATE TABLE ai (i INT UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY (i));
CREATE TABLE ai (i INT UNSIGNED NOT NULL AUTO_INCREMENT, UNIQUE (i));

The first two forms shown specify the index information as part of the column definition. The second two specify the index as a separate clause of the CREATE TABLE statement. Using a separate clause is optional if the index includes only the AUTO_INCREMENT column. If you want to create a multiple-column index that includes the AUTO_INCREMENT column, you must use a separate clause. (For an example of this, see Section 3.4.2.1, “AUTO_INCREMENT for MyISAM Tables.”

It is always allowable to define an AUTO_INCREMENT column explicitly as NOT NULL, but if you omit NOT NULL, MySQL adds it automatically.

Section 3.4, “Working with Sequences,” discusses the behavior of AUTO_INCREMENT columns further.

Following the attributes just described, which are specific to numeric columns, you may specify NULL or NOT NULL. If you do not specify NULL or NOT NULL for a numeric column, it allows NULL by default.

You also may specify a default value using the DEFAULT attribute. The following table contains three INT columns, having default values of -1, 1, and NULL:

CREATE TABLE t
(
  i1 INT DEFAULT -1,
  i2 INT DEFAULT 1,
  i3 INT DEFAULT NULL
);

Section 3.2.3, “Specifying Column Default Values,” earlier in the chapter described the rules that MySQL uses for assigning a default value if a column definition includes no DEFAULT clause.

3.2.4.5 Choosing Numeric Data Types

When you choose a type for a numeric column, consider the range of values that you need to represent and choose the smallest type that will cover the range. Choosing a larger type wastes space, leading to tables that are unnecessarily large and that cannot be processed as efficiently as if you had chosen a smaller type. TINYINT is the best for integers if the range of values in your data is small, such as a person’s age or number of siblings. MEDIUMINT can represent millions of values and can be used for many more types of values, at some additional cost in storage space. BIGINT has the largest range of all but requires twice as much storage as the next smallest integer type (INT) and should be used only when really necessary. For floating-point values, DOUBLE takes twice as much space as FLOAT. Unless you need exceptionally high precision or an extremely large range of values, you can probably represent your data at half the storage cost by using FLOAT instead of DOUBLE.

Every numeric column’s range of values is determined by its type. If you attempt to insert a value that lies outside the column’s range, the result depends on whether strict SQL mode is enabled. If it is, an out of range value results in an error. If strict mode is not enabled, truncation occurs: MySQL clips the value to the appropriate endpoint of the range, uses the result, and generates a warning.

Value truncation occurs according to the range of the data type, not the display width. For example, a SMALLINT(3) column has a display width of 3 and a range from −32768 to 32767. The value 12345 is wider than the display width but within the range of the column, so it is inserted without clipping and retrieved as 12345. The value 99999 is outside the range, so it is clipped to 32767 when inserted. Subsequent retrievals return the value 32767.

For fixed-point or floating-point columns, if values are stored that have more digits in the fractional part than allowed by the column specification, rounding occurs. If you store 1.23456 in a FLOAT(8,1) column, the result is 1.2. If you store the same value in a FLOAT(8,4) column, the result is 1.2346. This means you should define such columns with a sufficient number of decimals to store values as precise as you require. If you need accuracy to thousandths, don’t define a type with only two decimal places.

3.2.5 String Data Types

MySQL provides several data types for storing string values. Strings are often used for text values like these:

'N. Bertram, et al.'
'Pencils (no. 2 lead)'
'123 Elm St.'
'Monograph Series IX'

But strings are actually “generic” types in a sense because you can use them to represent any value. For example, you can use binary string types to hold binary data, such as images, sounds, or compressed gzip output.

Table 3.10 shows all the types provided by MySQL for defining string-valued columns, and the maximum size and storage requirements of each type. M represents the maximum length of column values (in bytes for binary strings and characters for non-binary strings), and L represents the actual length of a given value in bytes. w is the number of bytes required for the widest character in the character set. The BLOB and TEXT types each have several variants that are distinguished by the maximum size of values they can hold.

Table 3.10. String Data Types

Type Specification

Maximum Size

Storage Required

BINARY[(M)]

M bytes

M bytes

VARBINARY(M)

M bytes

L + 1 or 2 bytes

CHAR[(M)]

M characters

M × w bytes

VARCHAR(M)

M characters

L + 1 or 2 bytes

TINYBLOB

28−1 bytes

L + 1 bytes

BLOB

216−1 bytes

L + 2 bytes

MEDIUMBLOB

224−1 bytes

L + 3 bytes

LONGBLOB

232−1 bytes

L + 4 bytes

TINYTEXT

28−1 characters

L + 1 bytes

TEXT

216−1 characters

L + 2 bytes

MEDIUMTEXT

224−1 characters

L + 3 bytes

LONGTEXT

232−1 characters

L + 4 bytes

ENUM('value1','value2',...)

65,535 members

1 or 2 bytes

SET('value1','value2',...)

64 members

1, 2, 3, 4, or 8 bytes

Some types hold binary strings (byte strings) and others hold non-binary strings (character strings). Thus, maximum size as listed in Table 3.10 is given in number of bytes per value for binary string types and number of characters for non-binary string types. For example, BINARY(20) holds 20 bytes, whereas CHAR(20) holds 20 characters (which requires more than 20 bytes for multi-byte characters). The differences between byte and character semantics for binary and non-binary strings are characterized in Section 3.1.2, “String Values.” Each of the binary string types for byte strings has a corresponding non-binary type for character strings, as shown in Table 3.11.

Table 3.11. Corresponding Binary and Non-Binary String Types

Binary String Type

Non-Binary String Type

BINARY

CHAR

VARBINARY

VARCHAR

BLOB

TEXT

Each of the non-binary string types, as well as ENUM and SET, can be assigned a character set and collation. Different columns can be assigned different character sets. Character set assignment is discussed in Section 3.2.5.5, “String Data Type Attributes.”

BINARY and CHAR are fixed-length string types. For columns of these types, MySQL allocates the same amount of storage for every value and pads those that are shorter than the column length. Padding uses zero (0x00) bytes for BINARY and spaces for CHAR. Because CHAR(M) must be able to represent the largest possible string in the column’s character set, each column requires M × w bytes, where w is the number of bytes required for the widest character in the character set. For example, ujis characters take from one to three bytes, so CHAR(20) must be allocated 60 bytes in case a value requires three bytes for all 20 characters.

Other string types are variable-length. The amount of storage taken by a value varies from row to row and depends on the length of the values actually stored in the column. This length is represented by L in the table for variable-length types. The extra bytes required in addition to L are the number of bytes needed to store the length of the value. MySQL handles variable-length values by storing both the content of the value and a prefix that records its length. These extra prefix “length bytes” are treated as an unsigned integer. There is a correspondence between a variable-length type’s maximum length, the number of length bytes required for that type, and the range of the unsigned integer type that uses the same number of bytes. For example, a MEDIUMBLOB value may be up to 224−1 bytes long and requires 3 bytes to record the length. The 3-byte integer type MEDIUMINT has a maximum unsigned value of 224−1. That’s not a coincidence.

The length prefix for VARBINARY and VARCHAR requires 1 byte if the maximum length of column values in bytes is less than 256. The requirement is 2 bytes otherwise.

Values for all string types except ENUM and SET are stored as a sequence of bytes and interpreted either as bytes or characters depending on whether the type holds binary or non-binary strings. Values that are too long to store are chopped to fit. (In strict mode, an error occurs instead unless the chopped characters are spaces.) But string types range from very small to very large, with the largest type able to hold nearly 4GB of data, so you should be able to find something long enough to avoid truncation of your information. (The effective maximum column size actually is imposed by the maximum packet size of the client/server communication protocol, which is 1MB by default.)

For ENUM and SET, the column definition includes a list of legal string values, but ENUM and SET values are stored internally as numbers, as detailed later in Section 3.2.5.4, “The ENUM and SET Data Types.” Attempting to store a value other than those in the list causes the value to be converted to '' (the empty string) unless strict mode is enabled. In strict mode, an error occurs instead.

3.2.5.1 The CHAR and VARCHAR Data Types

The CHAR and VARCHAR string types hold non-binary strings, and thus are associated with a character set and collation.

The primary differences between CHAR and VARCHAR lie in whether they have a fixed or variable length, and in how trailing spaces are treated:

  • CHAR is a fixed-length type, whereas VARCHAR is a variable-length type.
  • Values retrieved from CHAR columns have trailing spaces removed. For a CHAR(M) column, values that are shorter than M characters are padded to a length of M when stored, but trailing spaces are stripped when the values are retrieved. As of MySQL 5.1.20, you can enable the PAD_CHAR_TO_FULL_LENGTH SQL mode to cause retrieved CHAR column values to retain trailing spaces.
  • For a VARCHAR(M) column, trailing spaces are retained both for storage and retrieval.

CHAR columns can be defined with a maximum length M from 0 to 255. M is optional for CHAR and defaults to 1 if missing. Note that CHAR(0) is legal. A CHAR(0) column can be used to represent on/off values if you allow it to be NULL. Values in such a column can have one of two values: NULL or the empty string. A CHAR(0) column takes very little storage space in the table—only a single bit.

The syntactically allowable range of M for VARCHAR(M) is 1 to 65,535, but the effective maximum number of characters is less than 65,535 because MySQL has a maximum row size of 65,535 bytes. That has certain implications:

  • A long VARCHAR requires two length bytes, which count against the row size.
  • Use of multi-byte characters reduces the number of characters that can fit within the maximum row size.
  • Inclusion of other columns in the table reduces the amount of space for the VARCHAR column in the row.

Keep in mind two general principles when choosing between CHAR and VARCHAR data types:

  • If your values all are M characters long, a VARCHAR(M) column actually will use more space than a CHAR(M) column due to the extra byte or bytes required to record the length of values. On the other hand, if your values vary in length, VARCHAR columns have the advantage of taking less space. A CHAR(M) column always takes M characters, even if it is empty or NULL.
  • If you’re using MyISAM tables and your values don’t vary much in length, CHAR is a better choice than VARCHAR because the MyISAM storage engine can process fixed-length rows more efficiently than variable-length rows. See Section 5.3, “Choosing Data Types for Efficient Queries.”

3.2.5.2 The BINARY and VARBINARY Data Types

The BINARY and VARBINARY types are similar to CHAR and VARCHAR, with the following differences:

  • CHAR and VARCHAR are non-binary types that store characters and have a character set and collation. Comparisons are based on the collating sequence.
  • BINARY and VARBINARY are binary types that store bytes and have no character set or collation. Comparisons are based on numeric byte values.

The rules for handling of padding for BINARY values are as follows:

  • As of MySQL 5.0.15, short values are padded with 0x00 bytes. Nothing is stripped on retrieval.
  • Before MySQL 5.0.15, short values are padded with spaces. Trailing spaces are stripped on retrieval.

For VARBINARY, no padding occurs when values are stored and no stripping occurs for retrieval.

3.2.5.3 The BLOB and TEXT Data Types

A “BLOB” is a binary large object—basically, a container that can hold anything you want to toss into it, and that you can make about as big as you want. In MySQL, the BLOB type is really a family of types (TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB). These types are identical except in the maximum amount of information they can hold (see Table 3.10). BLOB columns store binary strings. They are useful for storing data that may grow very large or that may vary widely in size from row to row. Some examples are compressed data, encrypted data, images, and sounds.

MySQL also has a family of TEXT types (TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT). These are similar to the corresponding BLOB types, except that TEXT types store non-binary strings rather than binary strings. That is, they store characters rather than bytes, and are associated with a character set and collation. This results in the general differences between binary and non-binary strings that were described earlier in Section 3.1.2, “String Values.” For example, in comparison operations, BLOB values are compared in byte units and TEXT values are compared in character units using the column collation.

BLOB or TEXT columns sometimes can be indexed, depending on the storage engine you’re using:

  • The MyISAM and InnoDB storage engines support BLOB and TEXT indexing. However, you must specify a prefix size to be used for the index. This avoids creating index entries that might be huge and thereby defeat any benefits to be gained by that index. The exception is that prefixes are not used for FULLTEXT indexes on TEXT columns. FULLTEXT searches are based on the entire content of the indexed columns, so any prefix you specify is ignored.
  • MEMORY tables do not support BLOB and TEXT indexes because the MEMORY engine does not support BLOB or TEXT columns at all.

BLOB or TEXT columns may require special care:

  • Due to the typical large variation in the size of BLOB and TEXT values, tables containing them are subject to high rates of fragmentation if many deletes and updates are done. If you’re using a MyISAM table to store BLOB or TEXT values, you can run OPTIMIZE TABLE periodically to reduce fragmentation and maintain good performance. See Chapter 5, “Query Optimization,” for more information.
  • The max_sort_length system variable influences BLOB and TEXT comparison and sorting operations. Only the first max_sort_length bytes of each value are used. (For TEXT columns that use a multi-byte character set, this means that comparisons might involve fewer than max_sort_length characters.) If this causes a problem with the default max_sort_length value of 1024, you might want to increase the value before performing comparisons.
  • If you’re using very large values, you might need to configure the server to increase the value of the max_allowed_packet parameter. See Section 12.6.2, “General-Purpose System Variables,” for more information. You will also need to increase the packet size for any client that wants to use very large values. The mysql and mysqldump clients support setting this value directly using a startup option.

3.2.5.4 The ENUM and SET Data Types

ENUM and SET are special string data types that allow only values chosen from a fixed (predefined) list of allowable strings. The primary difference between them is that ENUM column values must consist of exactly one member of the list of values, whereas SET column values may contain any or all members of the list. In other words, ENUM is used for values that are mutually exclusive, whereas SET allows multiple choices from the list.

The ENUM data type defines an enumeration. ENUM columns may be assigned values consisting of exactly one member chosen from a list of values specified at table-creation time. You can define an enumeration to have up to 65,535 members. Enumerations are commonly used to represent category values. For example, values in a column defined as ENUM('N','Y') can be either 'N' or 'Y'. Or you can use ENUM for such things as available sizes or colors for a product or for answers to multiple-choice questions in a survey or questionnaire where a single response must be selected:

employees ENUM('less than 100','100-500','501-1500','more than 1500')
color ENUM('red','green','blue','black')
size ENUM('S','M','L','XL','XXL')
vote ENUM('Yes','No','Undecided')

If you are processing selections from a Web page that includes mutually exclusive radio buttons, you can use an ENUM to represent the options from which a visitor to your site chooses. For example, if you run an online pizza-ordering service, ENUM columns can be used to represent the type of crust and size of pizza a customer orders:

crust ENUM('thin','regular','pan style','deep dish')
size ENUM('small','medium','large')

If enumeration categories represent counts, it’s important to choose your categories properly when you create the enumeration. For example, when recording white blood cell counts from a laboratory test, you might group the counts into categories like this:

wbc ENUM('0-100','101-300','>300')

If any given test result is provided as an exact count, you can record the value in the wbc column using the category into which the count falls. But you cannot recover the original count if you decide you want to convert the column from a category-based ENUM to an integer column based on exact count. If you really need the exact count, use an integer column instead, and group integer values into categories when you retrieve them using the CASE construct. For example, if wbc is defined as an integer column, you can select it as a category like this:

SELECT CASE WHEN wbc <= 100 THEN '0-100'
            WHEN wbc <= 300 THEN '101-300'
            ELSE '>300' END AS 'wbc category'
FROM ...

The SET type is similar to ENUM in the sense that when you create a SET column, you specify a list of legal set members. But unlike ENUM, each column value may consist of any number of members from the set. The set may have up to 64 members. You can use a SET when you have a fixed set of values that are not mutually exclusive as they are in an ENUM column. For example, you might use a SET to represent options available for an automobile:

SET('luggage rack','cruise control','air conditioning','sun roof')

Then particular SET values would represent those options actually ordered by customers:

'cruise control,sun roof'
'luggage rack,air conditioning'
'luggage rack,cruise control,air conditioning'
'air conditioning'
''

The final value shown (the empty string) means that the customer ordered no options. This is a legal value for any SET column.

SET column definitions are written as a list of individual strings separated by commas to indicate what the set members are. A SET column value, on the other hand, is written as a single string. If the value consists of multiple set members, the members are separated within the string by commas. This means you shouldn’t use a string containing a comma as a SET member.

Other uses for SET columns might be for representing information such as patient diagnoses or results from selections on Web pages. For a diagnosis, there may be a standard list of symptoms to ask a patient about, and the patient might exhibit any or all of them:

SET('dizziness','shortness of breath','cough')

For an online pizza service, the Web page for ordering could have a set of check boxes for ingredients that a customer wants as toppings on a pizza, several of which might be chosen:

SET('pepperoni','sausage','mushrooms','onions','ripe olives')

The way you define the legal value list for an ENUM or SET column is significant in several ways:

  • The list determines the possible legal values for the column, as has already been discussed.
  • If an ENUM or SET column has a collation that is not case sensitive, you can insert legal values in any lettercase and they will be recognized. However, the lettercase of the strings as specified in the column definition determines the lettercase of column values when they are retrieved later. For example, if you have an ENUM('Y','N') column and you store 'y' and 'n' in it, the values are displayed as 'Y' and 'N' when you retrieve them. If the column has a case sensitive or binary collation, you must insert values using exactly the lettercase used in the column definition or the values will not be recognized as legal. On the other hand, you can have distinct elements that differ only in lettercase, something that is not true when you use a collation that is not case sensitive.
  • The order of values in an ENUM definition is the order used for sorting. The order of values in a SET definition also determines sort order, although the relationship is more complicated because column values may contain multiple set members.
  • When MySQL displays a SET value that consists of multiple set members, the order in which it lists the members within the value is determined by the order in which they appear in the SET column definition.

ENUM and SET are classified as string types because enumeration and set members are specified as strings when you create columns of these types. However, the ENUM and SET types actually have a split personality: The members are stored internally as numbers and you can work with them as such. This means that ENUM and SET types are more efficient than other string types because they often can be handled using numeric operations rather than string operations. It also means that ENUM and SET values can be used in either string or numeric contexts. Finally, ENUM and SET columns can cause confusion if you use them in string context but expect them to behave as numbers, or vice versa.

MySQL sequentially numbers ENUM members in the column definition beginning with 1. (The value 0 is reserved for the error member, which is represented in string form by the empty string.) The number of enumeration values determines the storage size of an ENUM column. One byte can represent 256 values and two bytes can represent 65,536 values. (Compare this to the ranges of the one-byte and two-byte integer types TINYINT UNSIGNED and SMALLINT UNSIGNED.) Thus, counting the error member, the maximum number of enumeration members is 65,536 and the storage size depends on whether there are more than 256 members. You can specify a maximum of 65,535 (not 65,536) members in the ENUM definition because MySQL reserves a spot for the error member as an implicit member of every enumeration. When you assign an illegal value to an ENUM column, MySQL assigns the error member. (In strict mode, an error occurs instead.)

The following example demonstrates that you can retrieve ENUM values in either string or numeric form (which shows the numeric ordering of enumeration members and also that the NULL value has no number in the ordering):

mysql> CREATE TABLE e_table (e ENUM('jane','fred','will','marcia'));
mysql> INSERT INTO e_table
    -> VALUES('jane'),('fred'),('will'),('marcia'),(NULL);
mysql> SELECT e, e+0, e+1, e*3 FROM e_table;
+--------+------+------+------+
| e      | e+0  | e+1  | e*3  |
+--------+------+------+------+
| jane   |    1 |    2 |    3 |
| fred   |    2 |    3 |    6 |
| will   |    3 |    4 |    9 |
| marcia |    4 |    5 |   12 |
| NULL   | NULL | NULL | NULL |
+--------+------+------+------+

You can compare ENUM members either by name or number:

mysql> SELECT e FROM e_table WHERE e='will';
+------+
| e    |
+------+
| will |
+------+
mysql> SELECT e FROM e_table WHERE e=3;
+------+
| e    |
+------+
| will |
+------+

It is possible to define the empty string as a legal enumeration member, but this will only cause confusion. The string is assigned a non-zero numeric value, just as any other member listed in the definition. However, an empty string also is used for the error member that has a numeric value of 0, so it would correspond to two internal numeric element values. In the following example, assigning the illegal enumeration value 'x' to the ENUM column causes the error member to be assigned. This is distinguishable from the empty string member listed in the column definition only when retrieved in numeric form:

mysql> CREATE TABLE t (e ENUM('a','','b'));
mysql> INSERT INTO t VALUES('a'),(''),('b'),('x');
mysql> SELECT e, e+0 FROM t;
+------+------+
| e    | e+0  |
+------+------+
| a    |    1 |
|      |    2 |
| b    |    3 |
|      |    0 |
+------+------+

In strict mode, assigning the illegal value 'x' causes an error and no value is stored.

The numeric representation of SET columns is a little different from that for ENUM columns. Set members are not numbered sequentially. Instead, members correspond to successive individual bits in the SET value. The first set member corresponds to bit 0, the second member corresponds to bit 1, and so on. In other words, the numeric values of SET members all are powers of two. The empty string corresponds to a numeric SET value of 0.

SET values are stored as bit values. Eight set members per byte can be stored this way, so the storage size for a SET column is determined by the number of set members, up to a maximum of 64 members. SET values take 1, 2, 3, 4, or 8 bytes for set sizes of 1 to 8, 9 to 16, 17 to 24, 25 to 32, and 33 to 64 members.

The representation of a SET as a set of bits is what allows a SET value to consist of multiple set members. Any combination of bits can be turned on in the value, so the value may consist of any combination of the strings in the SET definition that correspond to those bits.

The following example shows the relationship between the string and numeric forms of a SET column. It displays the numeric value in both decimal and binary form:

mysql> CREATE TABLE s_table (s SET('table','lamp','chair','stool'));
mysql> INSERT INTO s_table
    -> VALUES('table'),('lamp'),('chair'),('stool'),(''),(NULL);
mysql> SELECT s, s+0, BIN(s+0) FROM s_table;
+-------+------+----------+
| s     | s+0  | BIN(s+0) |
+-------+------+----------+
| table |    1 | 1        |
| lamp  |    2 | 10       |
| chair |    4 | 100      |
| stool |    8 | 1000     |
|       |    0 | 0        |
| NULL  | NULL | NULL     |
+-------+------+----------+

If you assign to the column s a value of 'lamp,stool', MySQL stores it internally as 10 (binary 1010) because 'lamp' has a value of 2 (bit 1) and 'stool' has a value of 8 (bit 3).

When you assign values to SET columns, the substrings don’t need to be listed in the same order that you used when you defined the column. However, when you retrieve the value later, members are displayed within the value in definition order. Also, if you assign to a SET column a value containing substrings that are not listed as set members, those strings drop out and the column is assigned a value consisting of the remaining substrings. When you retrieve the value later, the illegal substrings will not be present.

If you assign a value of 'chair,couch,table' to the column s in s_table, two things happen:

  • 'couch' drops out because it’s not a member of the set. This occurs because MySQL determines which bits correspond to each substring of the value to be assigned and turns them on in the stored value. 'couch' corresponds to no bit and is ignored.
  • When you retrieve the value later, it appears as 'table,chair'. On retrieval, MySQL constructs the string value from the numeric value by scanning the bits in order, which automatically reorders the substrings to the order used when the column was defined. This behavior also means that if you specify a set member more than once in a value, it will appear only once when you retrieve the value. If you assign 'lamp,lamp,lamp' to a SET column, it will be simply 'lamp' when retrieved.

In strict mode, use of an illegal SET member causes an error instead and the value is not stored. In the preceding example, assigning a value containing 'couch' would cause an error and the assignment would fail.

The fact that MySQL reorders members in a SET value means that if you search for values using a string, you must list members in the proper order. If you insert 'chair,table' and then search for 'chair,table' you won’t find the row; you must look for it as 'table,chair'.

Sorting and indexing of ENUM and SET columns is done according to the internal (numeric) values of column values. The following example might appear to be incorrect because the values are not displayed in alphanumeric order:

mysql> SELECT e FROM e_table ORDER BY e;
+--------+
| e      |
+--------+
| NULL   |
|        |
| jane   |
| fred   |
| will   |
| marcia |
+--------+

You can better see what’s going on by retrieving both the string and numeric forms of the ENUM values:

mysql> SELECT e, e+0 FROM e_table ORDER BY e;
+--------+------+
| e      | e+0  |
+--------+------+
| NULL   | NULL |
|        |    0 |
| jane   |    1 |
| fred   |    2 |
| will   |    3 |
| marcia |    4 |
+--------+------+

If you have a fixed set of values and you want them to sort in a particular order, you can exploit the ENUM sorting properties: Represent the values as an ENUM column in a table and list the enumeration values in the column definition in the order that you want them to be sorted. Suppose that you have a table representing personnel for a sports organization, such as a football team, and that you want to sort output by personnel position so that it comes out in a particular order, such as coaches, assistant coaches, quarterbacks, running backs, receivers, linemen, and so on. Define the column as an ENUM and list the enumeration elements in the order that you want to see them. Then column values automatically will come out in that order for sort operations.

For cases where you want an ENUM to sort in normal lexical order, you can convert the column to a non-ENUM string by using CAST() and sorting the result:

mysql> SELECT CAST(e AS CHAR) AS e_str FROM e_table ORDER BY e_str;
+--------+
| e_str  |
+--------+
| NULL   |
|        |
| fred   |
| jane   |
| marcia |
| will   |
+--------+

CAST() doesn’t change the displayed values, but has the effect in this statement of performing an ENUM-to-string conversion that alters their sorting properties so they sort as strings.

3.2.5.5 String Data Type Attributes

The attributes unique to the string data types are CHARACTER SET (or CHARSET) and COLLATE for designating a character set and collating order. You can specify these as options for the table itself to set its defaults, or for individual columns to override the table defaults. (Actually, each database also has a default character set and collation, as does the server itself. These defaults sometimes come into play during table creation, as we’ll see later.)

The CHARACTER SET and COLLATE attributes apply to the CHAR, VARCHAR, TEXT, ENUM, and SET data types. They do not apply to the binary string data types (BINARY, VARBINARY, and BLOB), because those types contain byte strings, not character strings.

When you specify the CHARACTER SET and COLLATE attributes, whether at the column, table, or database level, the following rules apply:

  • The character set must be one that the server supports. To display the available character sets, use SHOW CHARACTER SET.
  • For a definition that includes both CHARACTER SET and COLLATE, the character set and collation must be compatible. For example, with a character set of latin2, you could use a collation of latin2_croatian_ci, but not latin1_bin. To display the collations for each character set, use SHOW COLLATION.
  • For a definition with CHARACTER SET but without COLLATE, the character set’s default collation is used.
  • For a definition with COLLATE but without CHARACTER SET, the character set is determined from the first part of the collation name.

To see how these rules apply, consider the following statement. It creates a table that uses several character sets:

CREATE TABLE mytbl
(
  c1  CHAR(10),
  c2  CHAR(40) CHARACTER SET latin2,
  c3  CHAR(10) COLLATE latin1_german1_ci,
  c4  BINARY(40)
) CHARACTER SET utf8;

The resulting table has utf8 as its default character set. No COLLATE table option is given, so the default table collation is the default utf8 collation (which is utf8_general_ci). The c1 column definition contains no CHARACTER SET or COLLATE attributes of its own, so the table defaults are used for it. The table-level character set and collation are not used for c2, c3, and c4: c2 and c3 have their own character set information, and c4 has a binary string type, so the character set attributes do not apply. For c2, the collation is latin2_general_ci, the default collation for latin2. For c3, the character set is latin1, as implied by the collation name latin1_german1_ci.

To see character set information for an existing table, use SHOW CREATE TABLE:

mysql> SHOW CREATE TABLE mytbl\G
*************************** 1. row ***************************
       Table: mytbl
Create Table: CREATE TABLE `mytbl` (
  `c1` char(10) default NULL,
  `c2` char(40) character set latin2 default NULL,
  `c3` char(10) character set latin1 collate latin1_german1_ci default NULL,
  `c4` binary(40) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

If SHOW CREATE TABLE does not display a column character set, it is the same as the table default character set. If it does not display a column collation, it is the default collation for the character set.

You can also add the FULL keyword to SHOW COLUMNS to cause it to display collation information (from which character sets can be derived):

mysql> SHOW FULL COLUMNS FROM mytbl;
+-------+------------+-------------------+------+-----+---------+...
| Field | Type       | Collation         | Null | Key | Default |...
+-------+------------+-------------------+------+-----+---------+...
| c1    | char(10)   | utf8_general_ci   | YES  |     | NULL    |...
| c2    | char(40)   | latin2_general_ci | YES  |     | NULL    |...
| c3    | char(10)   | latin1_german1_ci | YES  |     | NULL    |...
| c4    | binary(40) | NULL              | YES  |     | NULL    |...
+-------+------------+-------------------+------+-----+---------+...

The preceding discussion mentions column and table character set assignments, but character sets actually can be designated at the column, table, database, or server level. When MySQL processes a character column definition, it determines which character set to use for it by trying the following rules in order:

  1. If the column definition includes a character set, use that set. (This includes the case where only a COLLATE attribute is present, because that implies which character set to use.)
  2. Otherwise, if the table definition includes a table character set option, use that set.
  3. Otherwise, use the database character set as the table default character set, which also becomes the column character set. If the database was never assigned a character set explicitly (for example, if it was created prior to MySQL 4.1), the database character set is taken from the server character set.

In other words, MySQL searches up through the levels at which character sets may be specified until it finds a character set defined, and then uses that for the column. The database always has a default character set, so the search process is guaranteed to terminate at the database level even if no character set is specified explicitly at any of the lower levels.

The character set name binary is special. If you assign the binary character set to a non-binary string column, that is equivalent to defining the column using the corresponding binary string type. The following pairs of column definitions each show two equivalent definitions:

c1 CHAR(10) CHARACTER SET binary
c1 BINARY(10)

c2 VARCHAR(10) CHARACTER SET binary
c2 VARBINARY(10)

c3 TEXT CHARACTER SET binary
c3 BLOB

If you specify CHARACTER SET binary for a binary string column, it is ignored because the type already is binary. If you specify CHARACTER SET binary for an ENUM or SET, it is used as is.

If you assign the binary character set as a table option, it applies to each string column that does not have any character set information specified in its own definition.

MySQL provides some shortcut attributes for defining character columns:

  • The ASCII attribute is shorthand for CHARACTER SET latin1.
  • The UNICODE attribute is shorthand for CHARACTER SET ucs2.
  • If you use the BINARY attribute for a non-binary string column, ENUM, or SET, it is shorthand for specifying the binary collation of the column’s character set. For example, assuming a table default character set of latin1, these definitions are equivalent:

    c1 CHAR(10) BINARY
    c2 CHAR(10) CHARACTER SET latin1 BINARY
    c3 CHAR(10) CHARACTER SET latin1 COLLATE latin1_bin

    If you specify the BINARY attribute for a binary string column, it is ignored because the type already is binary.

The general attributes NULL or NOT NULL can be specified for any of the string types. If you don’t specify either of them, NULL is the default. However, defining a string column as NOT NULL does not prevent you from storing an empty string (that is, '') in the column. In MySQL, an empty value is different from a missing value, so don’t make the mistake of thinking that you can force a string column to contain non-empty values by defining it NOT NULL. If you require string values to be non-empty, that is a constraint you must enforce from within your own applications.

You can also specify a default value using a DEFAULT clause for all string data types except the BLOB and TEXT types. Section 3.2.3, “Specifying Column Default Values,” earlier in this chapter, described the rules that MySQL uses for assigning a default value if a column definition includes no DEFAULT clause.

3.2.5.6 Choosing String Data Types

When you choose a data type for a string column, consider the following questions:

Are values represented as character or binary data? For character data, non-binary string types are most appropriate. For binary data, use a binary string type.

Do you want comparisons to be lettercase-aware? If so, use one of the non-binary string types, because those store characters and are associated with a character set and collation.

The case sensitivity of non-binary string values for comparison and sorting purposes is controlled by the collation that you assign to them. If you want string values to be regarded equal regardless of lettercase, use a case-insensitive collation. Otherwise, use either a binary or case-sensitive collation. A binary collation compares character units using the numeric character codes. A case-sensitive collation compares character units using a specific collating order, which need not correspond to character code order. In either case, the lowercase and uppercase versions of a given character are considered distinct for comparisons. Suppose that 'mysql', 'MySQL', and 'MYSQL' are strings in the latin1 character set. They are all considered the same if compared using a case-insensitive collation such as latin1_swedish_ci, but as three different strings if compared using the binary latin1_bin collation or case-sensitive latin1_general_cs collation.

If you want to use a string column both for case-sensitive and not case-sensitive comparisons, use a collation that corresponds to the type of comparison you will perform most often. For comparisons of the other type, apply the COLLATE operator to change the collation. For example, if mycol is a CHAR column that uses the latin1 character set, you can assign it the latin1_swedish_ci collation to perform case-insensitive comparisons by default. The following comparison is not case sensitive:

mycol = 'ABC'

For those times when you need case-sensitive comparisons, use the latin1_general_cs or latin1_bin collation. The following comparisons are case sensitive (it doesn’t matter whether you apply the COLLATE operator to the left hand string or the right hand string):

mycol COLLATE latin1_general_cs = 'ABC'
mycol COLLATE latin1_bin = 'ABC'
mycol = 'ABC' COLLATE latin1_general_cs
mycol = 'ABC' COLLATE latin1_bin

Do you want to minimize storage requirements? If so, use a variable-length type, not a fixed-length type.

Will the allowable column values always be chosen from a fixed set of legal values? If so, ENUM or SET might be a good choice.

ENUM also can be useful if you have a limited set of string values that you want to sort in some non-lexical order. Sorting of ENUM values occurs according to the order in which you list the enumeration values in the column definition, so you can make the values sort in any order you want.

Are trailing pad values significant? If values must be retrieved exactly as they are stored without addition or removal of trailing spaces (or 0x00 bytes, for binary data types) during storage or retrieval, use a TEXT or VARCHAR column for non-binary strings and a BLOB or VARBINARY column for binary strings. This factor is important also if you are storing compressed, hashed, or encrypted values computed in such a way that the encoding method might result in trailing spaces. Table 3.12 shows how trailing padding is handled for storage and retrieval operations for various string data types.

Table 3.12. String Data Type Pad-Value Handling

Data Type

Storage

Retrieval

Result

CHAR

Padded

Stripped

Retrieved values have no trailing padding

BINARY

Padded

No action

Retrieved values have no trailing padding

VARCHAR, VARBINARY

No action

No action

Trailing padding is not changed

TEXT, BLOB

No action

No action

Trailing padding is not changed

As of MySQL 5.1.20, you can enable the PAD_CHAR_TO_FULL_LENGTH SQL mode to cause retrieved CHAR column values to retain trailing spaces. For BINARY columns prior to MySQL 5.0.15, short values are padded with spaces for storage and trailing spaces are stripped for retrieval.

3.2.6 Date and Time Data Types

MySQL provides several data types for storing temporal values: DATE, TIME, DATETIME, TIMESTAMP, and YEAR. Table 3.13 shows these types and the range of legal values for each type. The storage requirements for each type are shown in Table 3.14.

Table 3.13. Date and Time Data Types

Type Specification

Range

DATE

'1000-01-01' to '9999-12-31'

TIME

'-838:59:59' to '838:59:59'

DATETIME

'1000-01-01 00:00:00' to '9999-12-31 23:59:59'

TIMESTAMP

'1970-01-01 00:00:01' to '2038-01-19 03:14:07'

YEAR[(M)]

1901 to 2155 for YEAR(4), and 1970 to 2069 for YEAR(2)

Table 3.14. Date and Time Data Type Storage Requirements

Type Specification

Storage Required

DATE

3 bytes

TIME

3 bytes

DATETIME

8 bytes

TIMESTAMP

4 bytes

YEAR

1 byte

Each date and time type has a “zero” value that is stored when you insert a value that is illegal for the type, as shown in Table 3.15. The “zero” value also is the default value for date and time columns that are defined with the NOT NULL constraint. Depending on the SQL mode, illegal values might be treated as errors and rejected; see Section 3.3, “How MySQL Handles Invalid Data Values.”

Table 3.15. Date and Time Type “Zero” Values

Type Specification

Zero Value

DATE

'0000-00-00'

TIME

'00:00:00'

DATETIME

'0000-00-00 00:00:00'

TIMESTAMP

'0000-00-00 00:00:00'

YEAR

0000

MySQL always represents dates with the year first, in accordance with the standard SQL and ISO 8601 specifications. For example, December 3, 2008, is represented as '2008-12-03'. However, MySQL does allow some leeway in how you can specify input dates. For example, it will convert two-digit year values to four digits, and you need not supply a leading zero digit for month and day values that are less than 10. However, you must specify the year first and the day last. Formats that you may be more used to, such as '12/3/99' or '3/12/99', will not be interpreted as you might intend. In such cases, you might find the STR_TO_DATE() function useful for converting strings in non-ISO format to ISO-format dates. Section 3.2.6.5, “Working with Date and Time Values,” further discusses the date interpretation rules that MySQL uses.

For combined date and time values, it is allowable to specify a ‘T’ character rather than a space between the date and time (for example, '2008-12-31T12:00:00').

Time or combined date and time values can include a microseconds part following the time, consisting of a decimal point and up to six digits. (For example, '12:30:15.5' or '2008-06-15 10:30:12.000045'.) However, current support in MySQL for microsecond values is only partial. Some temporal functions use them, but you cannot store a temporal value that includes a microseconds part in a table; the microseconds part is discarded.

For retrieval, you can display date and time values in a variety of formats by using the DATE_FORMAT() and TIME_FORMAT() functions.

3.2.6.1 The DATE, TIME, and DATETIME Data Types

The DATE and TIME types hold date and time values. The DATETIME type holds combined date and time values. The formats for the three types of values are 'CCYY-MM-DD', 'hh:mm:ss', and 'CCYY-MM-DD hh:mm:ss', where CC, YY, MM, DD hh, mm, and ss represent century, year, month, day, hour, minute, and second, respectively.

For the DATETIME type, the date and time parts are both required; if you assign a DATE value to a DATETIME column, MySQL automatically adds a time part of '00:00:00'. Conversions work in the other direction as well. If you assign a DATETIME value to a DATE or TIME column, MySQL discards the part that is irrelevant:

mysql> CREATE TABLE t (dt DATETIME, d DATE, t TIME);
mysql> INSERT INTO t (dt,d,t) VALUES(NOW(), NOW(), NOW());
mysql> SELECT * FROM t;
+---------------------+------------+----------+
| dt                  | d          | t        |
+---------------------+------------+----------+
| 2007-09-14 10:26:26 | 2007-09-14 | 10:26:26 |
+---------------------+------------+----------+

MySQL treats the time in DATETIME and TIME values slightly different. For DATETIME, the time part represents a time of day and must be in the range from '00:00:00' to '23:59:59'. A TIME value, on the other hand, represents elapsed time—that’s why the range shown in Table 3.13 for TIME columns includes values larger than '23:59:59' and negative values.

One thing to watch out when inserting TIME values into a table is that if you use a “short” (not fully qualified) value, it may not be interpreted as you expect. For example, you’ll probably find that if you insert '30' and '12:30', into a TIME column, one value will be interpreted from right to left and the other from left to right, resulting in stored values of '00:00:30' and '12:30:00'. If you consider '12:30' to represent a value of “12 minutes, 30 seconds,” you should specify it in fully qualified form as '00:12:30'.

3.2.6.2 The TIMESTAMP Data Type

TIMESTAMP is a temporal data type that stores combined date and time values. (The word “timestamp” might appear to connote time only, but that is not the case.) The TIMESTAMP data type has the special properties noted in the following discussion.

TIMESTAMP columns have a range of values from '1970-01-01 00:00:01' to '2038-01-19 03:14:07'. The range is tied to Unix time, where the first day of 1970 is “day zero,” also known as “the epoch.” Values are stored as a four-byte number of seconds since the epoch. The beginning of 1970 determines the lower end of the TIMESTAMP range. The upper end of the range corresponds to the maximum four-byte value for Unix time.

Values are stored in Universal Coordinated Time (UTC). When you store a TIMESTAMP value, the server converts it from the connection’s time zone to UTC. When you retrieve the value later, the server converts it back from UTC to the connection’s time zone, so you see the same value that you stored. However, if another client connects to the server, uses a different time zone, and retrieves the value, it will see the value adjusted to its own time zone. In fact, you can see this effect within a single connection if you change your own time zone:

mysql> CREATE TABLE t (ts TIMESTAMP);
mysql> SET time_zone = '+00:00';   # set time zone to UTC
mysql> INSERT INTO t VALUES('2000-01-01 00:00:00');
mysql> SELECT ts FROM t;
+---------------------+
| ts                  |
+---------------------+
| 2000-01-01 00:00:00 |
+---------------------+
mysql> SET time_zone = '+03:00';   # advance time zone 3 hours
mysql> SELECT ts FROM t;
+---------------------+
| ts                  |
+---------------------+
| 2000-01-01 03:00:00 |
+---------------------+

These examples specify time zones using values given as a signed offset in hours and minutes relative to UTC. It is also possible to use named time zones such as 'Europe/Zurich' if the server time zone tables have been set up as described in Section 12.9.1, “Configuring Time Zone Support.”

TIMESTAMP has automatic initialization and update properties. You can designate any single TIMESTAMP column in a table to have either or both of these properties:

  • “Automatic initialization” means that for new rows the column is set to the current timestamp if you omit it from the INSERT statement or set it to NULL.
  • “Automatic update” means that for existing rows the column is updated to the current timestamp when you change any other column. Setting a column to its current value does not count as a change. You must set it to a different value for automatic update to occur.

In addition, if you set any TIMESTAMP column to NULL, its value is set to the current timestamp. You can defeat this by defining the column with the NULL attribute to allow NULL values to be stored in the column.

Only one TIMESTAMP column in a table can be designated to have automatic properties. You cannot have automatic initialization for one TIMESTAMP column and automatic update for another. Nor can you have automatic initialization for multiple columns, or automatic update for multiple columns.

The syntax for specifying a TIMESTAMP column is as follows, assuming a column name of ts:

ts TIMESTAMP [DEFAULT constant_value] [ON UPDATE CURRENT_TIMESTAMP]

The DEFAULT and ON UPDATE attributes can be given in any order, if both are given. The default value can be CURRENT_TIMESTAMP or a constant value such as 0 or a value in 'CCYY-MM-DD hh:mm:ss' format. Synonyms for CURRENT_TIMESTAMP are CURRENT_TIMESTAMP() and NOW(); they’re all interchangeable in a TIMESTAMP definition.

To have one or both of the automatic properties for the first TIMESTAMP column in a table, you can define it using various combinations of the DEFAULT and ON UPDATE attributes:

  • With DEFAULT CURRENT_TIMESTAMP, the column has automatic initialization. It also has automatic update if ON UPDATE CURRENT_TIMESTAMP is given.
  • With neither attribute, MySQL defines the column with both DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP.
  • With a DEFAULT constant_value attribute that specifies a constant value, the column does not have automatic initialization. It does have automatic update if ON UPDATE CURRENT_TIMESTAMP is given.
  • Without DEFAULT but with ON UPDATE CURRENT_TIMESTAMP, the default value is 0 and the column has automatic update.

To use automatic initialization or update for a TIMESTAMP column other than the first one, you must explicitly define the first one with a DEFAULT constant_value attribute and without ON UPDATE CURRENT_TIMESTAMP. Then you can use DEFAULT CURRENT_TIMESTAMP or ON UPDATE CURRENT_TIMESTAMP (or both) with any other single TIMESTAMP column.

If you want to defeat automatic initialization or update for a TIMESTAMP column, set it explicitly to the desired value for insert or update operations. For example, you can prevent an update from changing the column by setting the column to its current value.

TIMESTAMP column definitions also can include NULL or NOT NULL. The default is NOT NULL. Its effect is that when you explicitly set the column to NULL, MySQL sets it to the current timestamp. (This is true both for inserts and updates.) If you specify NULL, setting the column to NULL stores NULL rather than the current timestamp.

If you want a table to contain a column that is set to the current timestamp for new rows and that remains unchanged thereafter, you can achieve that two ways:

  • Use a TIMESTAMP column declared as follows without an ON UPDATE attribute:

    ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP

    When you create a new row, initialize the column to the current timestamp by setting it to NULL or by omitting it from the INSERT statement. The column will retain its value for subsequent updates unless you change it explicitly.

  • Use a DATETIME column. When you create a row, initialize the column to NOW(). Whenever you update the row thereafter, leave the column alone.

If you want a table to contain columns for both a time-created value and a last-modified value, use two TIMESTAMP columns:

CREATE TABLE t
(
  t_created  TIMESTAMP DEFAULT 0,
  t_modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP
                       ON UPDATE CURRENT_TIMESTAMP
  ... other columns ...
);

When you insert a new row, set both TIMESTAMP columns to NULL to set them to the insertion timestamp. When you update an existing row, leave both columns alone; t_modified will be updated automatically to the modification timestamp if any other columns change value.

3.2.6.3 The YEAR Data Type

YEAR is a one-byte data type intended for efficient representation of year values. A YEAR column definition may include a specification for a display width M, which should be either 4 or 2. If you omit M from a YEAR definition, the default is 4. YEAR(4) has a range of 1901 to 2155. YEAR(2) has a range of 1970 to 2069, but only the last two digits are displayed. You can use the YEAR type when you want to store date information but only need the year part of the date, such as year of birth, year of election to office, and so forth. When you do not require a full date value, YEAR is much more space-efficient than other date types.

TINYINT has the same storage size as YEAR (one byte), but not the same range. To cover the same range of years as YEAR by using an integer type, you would need a SMALLINT, which takes twice as much space. If the range of years you need to represent coincides with the range of the YEAR type, YEAR is more space-efficient than SMALLINT. Another advantage of YEAR over an integer column is that MySQL converts two-digit values into four-digit values for you using MySQL’s usual year-guessing rules. For example, 97 and 14 become 1997 and 2014. However, be aware that inserting the numeric value 00 into a four-digit YEAR column results in the value 0000 being stored, not 2000. If you want a value of 00 to convert to 2000, specify it in string form as '0' or '00'.

3.2.6.4 Date and Time Data Type Attributes

The following remarks apply to all temporal types except TIMESTAMP:

  • The general attributes NULL or NOT NULL may be specified. If you don’t specify either of them, NULL is the default.
  • You can also specify a default value using a DEFAULT clause. Section 3.2.3, “Specifying Column Default Values,” described the rules that MySQL uses for assigning a default value if a column definition includes no DEFAULT clause.

    Note that because default values must be constants, you cannot use a function such as NOW() to supply a value of “the current date and time” as the default for a DATETIME column. To achieve that result, set the column value explicitly to NOW() whenever you create a new row, or else consider using a TIMESTAMP column instead. Alternatively, set up a trigger that initializes the column to the appropriate value; see Section 4.3, “Triggers.”

TIMESTAMP columns are special; the default for the first such column in a table is the current date and time, and the “zero” value for any others. However, the full set of rules governing default values is more complex. See Section 3.2.6.2, “The TIMESTAMP Data Type,” for details.

3.2.6.5 Working with Date and Time Values

MySQL tries to interpret input values for date and time columns in a variety of formats, including both string and numeric forms. Table 3.16 shows the allowable formats for each of the date and time types.

Table 3.16. Date and Time Type Input Formats

Type

Allowable Formats

DATETIME,

'CCYY-MM-DD hh:mm:ss'

TIMESTAMP

'YY-MM-DD hh:mm:ss'

'CCYYMMDDhhmmss'

'YYMMDDhhmmss'

CCYYMMDDhhmmss

YYMMDDhhmmss

DATE

'CCYY-MM-DD'

'YY-MM-DD'

'CCYYMMDD'

'YYMMDD'

CCYYMMDD

YYMMDD

TIME

'hh:mm:ss'

'hhmmss'

hhmmss

YEAR

'CCYY'

'YY'

CCYY

YY

MySQL interprets formats that have no century part (CC) using the rules described in Section 3.2.6.6, “Interpretation of Ambiguous Year Values.” For string formats that include delimiter characters, you don’t have to use ‘-’ for dates and ‘:’ for times. Any punctuation character may be used as the delimiter. Interpretation of values depends on context, not on the delimiter. For example, although times are typically specified using a delimiter of ‘:’, MySQL won’t interpret a value containing ‘:’ as a time in a context where a date is expected. In addition, for the string formats that include delimiters, you need not specify two digits for month, day, hour, minute, or second values that are less than 10. The following are all equivalent:

'2012-02-03 05:04:09'
'2012-2-03 05:04:09'
'2012-2-3 05:04:09'
'2012-2-3 5:04:09'
'2012-2-3 5:4:09'
'2012-2-3 5:4:9'

MySQL may interpret values with leading zeros in different ways depending on whether they are specified as strings or numbers. The string '001231' will be seen as a six-digit value and interpreted as '2000-12-31' for a DATE, and as '2000-12-31 00:00:00' for a DATETIME. On the other hand, the number 001231 will be seen as 1231 after the parser gets done with it and then the interpretation becomes problematic. This is a case where it’s best to supply a string value '001231', or else use a fully qualified value if you are using numbers (that is, 20001231 for DATE and 200012310000 for DATETIME).

In general, you may freely assign values between the DATE, DATETIME, and TIMESTAMP types, although there are certain restrictions to keep in mind:

  • If you assign a DATETIME or TIMESTAMP value to a DATE, the time part is discarded.
  • If you assign a DATE value to a DATETIME or TIMESTAMP, the time part of the resulting value is set to zero ('00:00:00').
  • The types have different ranges. In particular, TIMESTAMP has a more limited range (1970 to 2038); so, for example, you cannot assign a pre-1970 DATETIME value to a TIMESTAMP and expect reasonable results. Nor can you assign values that are far in the future to a TIMESTAMP.

MySQL provides many functions for working with date and time values. See Appendix C for more information.

3.2.6.6 Interpretation of Ambiguous Year Values

For all date and time types that include a year part (DATE, DATETIME, TIMESTAMP, YEAR), MySQL handles values that contain two-digit years by converting them to four-digit years:

  • Year values from 00 to 69 become 2000 to 2069
  • Year values from 70 to 99 become 1970 to 1999

You can see the effect of these rules most easily by storing different two-digit values into a YEAR column and then retrieving the results. This also demonstrates something you should take note of:

mysql> CREATE TABLE y_table (y YEAR);
mysql> INSERT INTO y_table VALUES(68),(69),(99),(00),('00');
mysql> SELECT * FROM y_table;
+------+
| y    |
+------+
| 2068 |
| 2069 |
| 1999 |
| 0000 |
| 2000 |
+------+

Observe that 00 is converted to 0000, not to 2000. That’s because, as a number, 00 is the same as 0, and is a perfectly legal value for the YEAR type. If you insert a numeric zero, that’s what you get. To get 2000 using a value that does not contain the century, insert the string '0' or '00'. You can make sure that MySQL sees a string and not a number by inserting YEAR values using CAST(value AS CHAR) to produce a string result uniformly regardless of whether value is a string or a number.

Keep in mind that the rules for converting two-digit to four-digit year values provide only a reasonable guess. There is no way for MySQL to be certain about the meaning of a two-digit year when the century is unspecified. MySQL’s conversion rules are adequate for many situations, but if they don’t produce the values that you want, it is necessary to provide unambiguous data with four-digit years. For example, to enter birth and death dates into the president table, which lists U.S. presidents back into the 1700s, four-digit year values are in order. Values in these columns span several centuries, so letting MySQL guess the century from a two-digit year is definitely the wrong thing to do.

3.2.7 Spatial Data Types

Spatial values enable representation of values such as points, lines, and polygons. These data types are implemented per the OpenGIS specification, which is available at the Open Geospatial Consortium Web site (http://www.opengeospatial.org/). Table 3.17 lists the spatial data types that MySQL supports.

Table 3.17. Spatial Data Types

Type Name

Meaning

GEOMETRY

A spatial value of any type

POINT

A point (a pair of X,Y coordinates)

LINESTRING

A curve (one or more POINT values)

POLYGON

A polygon

GEOMETRYCOLLECTION

A collection of GEOMETRY values

MULTILINESTRING

A collection of LINESTRING values

MULTIPOINT

A collection of POINT values

MULTIPOLYGON

A collection of POLYGON values

The level of support for spatial types varies by storage engine. The most complete support is implemented in MyISAM. Other engines such as InnoDB, NDB, and ARCHIVE offer more limited support. For example, in MyISAM tables, spatial values can be indexed using either SPATIAL or non-SPATIAL indexes (using INDEX, UNIQUE, or PRIMARY KEY). Other engines that support spatial data types can use only non-SPATIAL indexes (except ARCHIVE, which cannot index spatial columns at all). If a table is partitioned, it cannot contain spatial columns.

Spatial columns included in a SPATIAL index cannot use NULL to represent missing values within columns, because SPATIAL indexes do not allow NULL values. Depending on your application, it might be acceptable to use an empty (zero-dimensional) value instead.

MySQL works with spatial values in three formats. One is the internal format that MySQL uses for storing spatial values in tables. The other two are the Well-Known Text (WKT) and Well-Known Binary (WKB) formats; these are standards for representing spatial values as text strings or in binary format. The syntax for text strings and the binary representation are defined in the OpenGIS specification. For example, the WKT format for a POINT value with coordinates of x and y is written as a string:

'POINT(x y)'

Note the absence of a comma between the coordinate values. In lists of multiple coordinates, commas separate pairs of x and y values. The following string represents a LINESTRING value consisting of several points:

'LINESTRING(10 20, 0 0, 10 20, 0 0)'

More complex values have a more complex representation. This POLYGON has a rectangular outer boundary and a triangular inner boundary:

'POLYGON((0 0, 100 0, 100 100, 0 100, 0 0),(30 30, 30 60, 45 60, 30 30))'

Because spatial values can be complex, most operations on them are done by invoking functions. The set of spatial functions is extensive and includes functions for converting from one format to another (see Appendix C).

The following example shows how to use several aspects of spatial support:

mysql> CREATE TABLE pt_tbl (p POINT);
mysql> INSERT INTO pt_tbl (p) VALUES
    ->   (POINTFROMTEXT('POINT(0 0)')),
    ->   (POINTFROMTEXT('POINT(0 50)')),
    ->   (POINTFROMTEXT('POINT(100 100)'));
mysql> CREATE FUNCTION dist (p1 POINT, p2 POINT)
    ->   RETURNS FLOAT DETERMINISTIC
    ->   RETURN SQRT(POW(X(p2)-X(p1),2) + POW(Y(p2)-Y(p1),2));
mysql> SET @ref_pt = POINTFROMTEXT('POINT(0 0)');
mysql> SELECT ASTEXT(p), dist (p, @ref_pt) AS dist FROM pt_tbl;
+----------------+-----------------+
| ASTEXT(p)      | dist            |
+----------------+-----------------+
| POINT(0 0)     |               0 |
| POINT(0 50)    |              50 |
| POINT(100 100) | 141.42135620117 |
+----------------+-----------------+

The example performs these operations:

  1. It creates a table that includes a spatial column.
  2. It populates the table with some POINT values, using the POINTFROMTEXT() function that produces an internal-format value from a WKT representation.
  3. It creates a stored function that computes the distance between two points, using the X() and Y() functions to extract point coordinates.
  4. It computes the distance of each point in the table from a given reference point.
  • + Share This
  • 🔖 Save To Your Account
MySQL, 4th Edition

This chapter is from the book

MySQL, 4th Edition

Discussions

comments powered by Disqus