Understanding MySQL Character Sets and Connection Collations

天·地·人 · posted on 2010-4-16 11:44:15

Last edited by 天·地·人 on 2010-4-16 12:04

Character Sets and Collations in General

A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set.

Suppose that we have an alphabet with four letters: “A”, “B”, “a”, “b”. We give each letter a number: “A” = 0, “B” = 1, “a” = 2, “b” = 3. The letter “A” is a symbol, the number 0 is the encoding for “A”, and the combination of all four letters and their encodings is a character set.

Suppose that we want to compare two string values, “A” and “B”. The simplest way to do this is to look at the encodings: 0 for “A” and 1 for “B”. Because 0 is less than 1, we say “A” is less than “B”. What we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): “compare the encodings.” We call this simplest of all possible collations a binary collation.
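The binary collation described above can be sketched in a few lines of Python. This is only an illustration of the imaginary four-letter character set; the names ENCODING, binary_key and compare_binary are made up for this sketch and have nothing to do with MySQL internals.

```python
# The imaginary character set: each symbol mapped to its encoding.
ENCODING = {"A": 0, "B": 1, "a": 2, "b": 3}


def binary_key(s):
    """Return the list of encodings for a string (the binary sort key)."""
    return [ENCODING[ch] for ch in s]


def compare_binary(x, y):
    """Binary collation: compare the encodings and nothing else.

    Returns -1, 0 or 1 when x sorts before, equal to, or after y.
    """
    kx, ky = binary_key(x), binary_key(y)
    return (kx > ky) - (kx < ky)


print(compare_binary("A", "B"))                        # -1: "A" is less than "B"
print(sorted(["b", "A", "a", "B"], key=binary_key))    # ['A', 'B', 'a', 'b']
```

Note that under this rule "a" sorts after "B", simply because 2 is greater than 1.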

But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters “a” and “b” as equivalent to “A” and “B”; (2) then compare the encodings. We call this a case-insensitive collation. It is a little more complex than a binary collation.
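The two rules of the case-insensitive collation can be sketched the same way. Again this is a toy model of the imaginary character set, not MySQL code; ENCODING, FOLD and case_insensitive_key are names invented for the example.

```python
# The imaginary character set: each symbol mapped to its encoding.
ENCODING = {"A": 0, "B": 1, "a": 2, "b": 3}

# Rule 1: treat each lowercase letter as equivalent to its uppercase letter.
FOLD = {"a": "A", "b": "B"}


def case_insensitive_key(s):
    """Apply rule 1 (fold case), then rule 2 (take the encodings)."""
    return [ENCODING[FOLD.get(ch, ch)] for ch in s]


def equal_ci(x, y):
    """Two strings are equal under this collation if their keys match."""
    return case_insensitive_key(x) == case_insensitive_key(y)


print(equal_ci("a", "A"))    # True
print(equal_ci("ab", "AB"))  # True
print(equal_ci("a", "B"))    # False
```

Under the binary collation "a" and "A" are different values (2 versus 0); under this collation they compare as equal.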

In real life, most character sets have many characters: not just “A” and “B” but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules, not just for whether to distinguish lettercase, but also for whether to distinguish accents, and for multiple-character mappings.

MySQL can do these things for you:

Store strings using a variety of character sets
Compare strings using a variety of collations
Mix strings with different character sets or collations in the same server, the same database, or even the same table
Allow specification of character set and collation at any level

In these respects, MySQL is far ahead of most other database management systems. However, to use these features effectively, you need to know what character sets and collations are available, how to change the defaults, and how they affect the behavior of string operators and functions.

天·地·人 · posted on 2010-4-16 11:51:40

Last edited by 天·地·人 on 2010-4-16 12:18

Character Sets and Collations in MySQL

The MySQL server can support multiple character sets. To list the available character sets, use the SHOW CHARACTER SET statement.

Any given character set always has at least one collation. It may have several collations. To list the collations for a character set, use the SHOW COLLATION statement. For example, to see the collations for the latin1 (cp1252 West European) character set, use this statement to find those collation names that begin with latin1: SHOW COLLATION LIKE 'latin1%';

The latin1 collations have the following meanings.

Collations have these general characteristics:

Two different character sets cannot have the same collation.
Each character set has one collation that is the default collation. For example, the default collation for latin1 is latin1_swedish_ci. The output for SHOW CHARACTER SET indicates which collation is the default for each displayed character set.
There is a convention for collation names: They start with the name of the character set with which they are associated, they usually include a language name, and they end with _ci (case insensitive), _cs (case sensitive), or _bin (binary).
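The naming convention in the last point can be illustrated with a small parser. This is a sketch that follows only the convention as stated here (character set name first, optional language name, then _ci, _cs or _bin); parse_collation_name is a hypothetical helper, and names that do not follow the convention are not handled.

```python
# Suffix meanings as given by the naming convention.
SUFFIXES = {
    "ci": "case insensitive",
    "cs": "case sensitive",
    "bin": "binary",
}


def parse_collation_name(name):
    """Split a collation name into character set, language and suffix meaning."""
    parts = name.split("_")
    charset = parts[0]                       # name starts with the character set
    suffix = parts[-1]                       # name ends with ci, cs or bin
    language = "_".join(parts[1:-1]) or None  # the language name is optional
    return {
        "charset": charset,
        "language": language,
        "sensitivity": SUFFIXES.get(suffix),
    }


print(parse_collation_name("latin1_swedish_ci"))
print(parse_collation_name("latin1_bin"))
```

For latin1_swedish_ci this yields the character set latin1, the language swedish and "case insensitive"; for latin1_bin there is no language part and the suffix means "binary".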

In cases where a character set has multiple collations, it might not be clear which collation is most suitable for a given application. To avoid choosing the wrong collation, it can be helpful to perform some comparisons with representative data values to make sure that a given collation sorts values the way you expect.
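The idea of trying representative values against each candidate can be sketched as follows. The two key functions below are plain Python stand-ins for a binary and a case-insensitive collation, not MySQL's actual collations; preview_orderings and CANDIDATES are names made up for this example. Against a real server you would run the same kind of check with the collations themselves.

```python
def preview_orderings(samples, key_functions):
    """Sort the same sample values under each candidate ordering rule."""
    return {name: sorted(samples, key=key) for name, key in key_functions.items()}


# Stand-ins for two candidate collations (assumptions for illustration only).
CANDIDATES = {
    "binary-like": lambda s: s,            # compare code points directly
    "case-insensitive-like": str.casefold,  # fold case before comparing
}

samples = ["banana", "Apple", "apple", "Banana"]

for name, ordered in preview_orderings(samples, CANDIDATES).items():
    print(name, ordered)
```

With the binary-like rule all capitalized words come first; with the case-insensitive-like rule "Apple" and "apple" stay next to each other. Seeing both orderings side by side makes it easier to pick the one the application expects.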

admin · posted on 2010-4-17 16:48:31

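Character sets and collations have default settings at four levels: server, database, table, and column. Each client connection also has its own character set and collation settings.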

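A database has many kinds of character set settings. The ones that matter most for programming are the client character set and the database character set.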

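The most common database operations are saving data and reading it back. In this process, whether the text comes back garbled appears to have little to do with the database character set, as long as that character set can represent every character being stored. We only need to make sure that the character set chosen when writing matches the one chosen when reading; in other words, the client character set must be the same for both operations.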
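The effect of a mismatch between the two operations can be shown with a very simplified Python model. It ignores the server entirely and only looks at what happens when bytes written under one client character set are interpreted under another; write_then_read is a hypothetical helper, and utf-8 and latin-1 are just example choices.

```python
def write_then_read(text, write_charset, read_charset):
    """Encode text as the writing client would, then decode it as the reading client would."""
    stored_bytes = text.encode(write_charset)
    # errors="replace" keeps the sketch from raising on undecodable bytes.
    return stored_bytes.decode(read_charset, errors="replace")


original = "字符集"

same = write_then_read(original, "utf-8", "utf-8")
different = write_then_read(original, "utf-8", "latin-1")

print(same == original)       # True: both operations used the same character set
print(different == original)  # False: the text comes back garbled
```

The bytes are identical in both cases; only the character set used to interpret them changes, and that alone decides whether the text is readable.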

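On write, MySQL converts the data from the character set specified by the client into the database character set and stores it in the data files. On read, it converts the data from the database character set back into the character set specified by the client and returns it. Setting the client character set and the database character set to the same value has the obvious benefit of avoiding the performance cost of conversion. In addition, if the database may be migrated later, choosing a database character set that most database systems support will save a lot of trouble.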
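The write and read conversions can be modelled roughly like this. It is a sketch of the idea only, using Python's codecs in place of the server's conversion routines; store and fetch are hypothetical names, and gbk, utf-8 and latin-1 are example character sets.

```python
def store(client_bytes, client_charset, database_charset):
    """Write path: convert from the client character set to the database character set."""
    text = client_bytes.decode(client_charset)
    # Characters the database character set cannot represent become "?".
    return text.encode(database_charset, errors="replace")


def fetch(stored_bytes, database_charset, client_charset):
    """Read path: convert from the database character set back to the client character set."""
    text = stored_bytes.decode(database_charset)
    return text.encode(client_charset, errors="replace")


sent = "数据库".encode("gbk")

# Different character sets: the bytes are converted on the way in and out.
stored = store(sent, "gbk", "utf-8")
returned = fetch(stored, "utf-8", "gbk")
print(returned == sent)      # True: the round trip is lossless

# Same character set on both sides: the stored bytes equal the sent bytes.
print(store(sent, "gbk", "gbk") == sent)   # True

# A database character set that cannot hold the text loses it on write.
print(store(sent, "gbk", "latin-1"))       # b'???'
```

The last case is the one situation where the database character set does matter for garbling: once the characters have been replaced on write, no client setting can bring them back on read.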
