GNU Emacs Lisp Reference Manual - Splitting Characters


Splitting Characters

The functions in this section convert between characters and the byte values used to represent them. For most purposes, there is no need to be concerned with the sequence of bytes used to represent a character, because Emacs translates automatically when necessary.

Function: char-bytes character

This function returns the number of bytes used to represent the given character. The result depends only on the character set that the character belongs to: it equals the dimension of that character set (see section Character Sets) plus the length of its introduction sequence.

(char-bytes 2248)
     => 2
(char-bytes 65)
     => 1
(char-bytes 192)
     => 1

This function gives correct results for both multibyte and unibyte representations because the non-ASCII character codes used in the two representations do not overlap.

Function: split-char character

This function returns a list containing the name of the character set of character, followed by one or two byte values (integers) that identify character within that character set. The number of byte values is the character set's dimension.

(split-char 2248)
     => (latin-iso8859-1 72)
(split-char 65)
     => (ascii 65)

Unibyte non-ASCII characters are treated as part of the ascii character set:

(split-char 192)
     => (ascii 192)
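Because the number of byte values equals the character set's dimension, the dimension can be read off the result of split-char. The following is a minimal sketch of that idea; my-char-dimension is a hypothetical helper name used only for illustration, not a function provided by Emacs.

```elisp
;; Hypothetical helper (not part of Emacs): derive the dimension of
;; CHARACTER's character set from the list returned by `split-char'.
(defun my-char-dimension (character)
  "Return the number of byte values `split-char' reports for CHARACTER."
  ;; `split-char' returns (CHARSET BYTE1 [BYTE2]); dropping the
  ;; charset name leaves only the byte values, which we count.
  (length (cdr (split-char character))))

(my-char-dimension 65)
     ;; => 1
```

The character set name itself is the first element of the same list, so (car (split-char 65)) yields ascii.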

Function: make-char charset &rest byte-values

This function returns the character in character set charset identified by byte-values. This is roughly the inverse of split-char. Normally, you should specify either one or two byte-values, according to the dimension of charset. For example,

(make-char 'latin-iso8859-1 72)
     => 2248
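Since split-char returns the character set followed by the byte values, its result can be passed directly to make-char with apply. The sketch below shows that round trip; my-char-round-trip is a hypothetical name chosen for this example, and the round trip is only expected to hold for ordinary characters, given that make-char is described as roughly the inverse of split-char.

```elisp
;; Hypothetical helper (not part of Emacs): split CHARACTER into its
;; character set and byte values, then rebuild it with `make-char'.
(defun my-char-round-trip (character)
  "Split CHARACTER with `split-char' and rebuild it with `make-char'."
  (let ((parts (split-char character)))
    ;; PARTS has the form (CHARSET BYTE1 [BYTE2]), which matches the
    ;; argument list of `make-char', so `apply' can pass it unchanged.
    (apply #'make-char parts)))

(my-char-round-trip 65)
     ;; => 65
```

Using apply avoids having to check the dimension first, because the list already holds the right number of byte values for the character set.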

If you call make-char with no byte-values, the result is a generic character which stands for charset. A generic character is an integer, but it is not valid for insertion in the buffer as a character. It can be used in char-table-range to refer to the whole character set (see section Char-Tables). char-valid-p returns nil for generic characters. For example:

(make-char 'latin-iso8859-1)
     => 2176
(char-valid-p 2176)
     => nil
(split-char 2176)
     => (latin-iso8859-1 0)
