The Java Tutorials have been written for JDK 8. Examples and practices described in this page don't take advantage of improvements introduced in later releases and might use technology no longer available.
The Character class encapsulates the char data type. In J2SE 5.0, many methods were added to the Character class to support supplementary characters. These methods fall into two categories: methods that convert between char and code point values, and methods that verify the validity of code points or map them.
This section describes a subset of the available methods in the Character class. For the complete list of available APIs, see the Character class specification.
The following table lists the most useful conversion methods, and methods that facilitate conversion, in the Character class. The codePointAt and codePointBefore methods are included because text usually appears in a sequence, such as a String, and these methods can be used to extract the code point at or before a given position in that sequence.
Method(s) | Description |
---|---|
toChars(int codePoint, char[] dst, int dstIndex)<br>toChars(int codePoint) | Converts the specified Unicode code point to its UTF-16 representation and places it in a char array. Sample usage: Character.toChars(0x10400) |
toCodePoint(char high, char low) | Converts the specified surrogate pair to its supplementary code point value. |
codePointAt(char[] a, int index)<br>codePointAt(char[] a, int index, int limit)<br>codePointAt(CharSequence seq, int index) | Returns the Unicode code point at the specified index. The third method takes a CharSequence, and the second method enforces an upper limit on the index. |
codePointBefore(char[] a, int index)<br>codePointBefore(char[] a, int index, int start)<br>codePointBefore(CharSequence seq, int index) | Returns the Unicode code point before the specified index. The third method accepts a CharSequence, and the other methods accept a char array. The second method enforces a lower limit on the index. |
charCount(int codePoint) | Returns 1 for characters that can be represented by a single char. Returns 2 for supplementary characters, which require two chars. |
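As a rough illustration of the conversion methods in the table, the following sketch round-trips the supplementary code point 0x10400 through its UTF-16 form. The class name CodePointConversionSketch and the helper methods encode and decodeAt are hypothetical names chosen for this example; only the Character methods come from the JDK.

```java
public class CodePointConversionSketch {

    // Hypothetical helper: encodes a code point as UTF-16
    // (one char for BMP characters, two for supplementary characters).
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    // Hypothetical helper: reads the code point that starts at the given char index.
    static int decodeAt(CharSequence text, int index) {
        return Character.codePointAt(text, index);
    }

    public static void main(String[] args) {
        int codePoint = 0x10400; // the sample value used in the table above

        char[] units = encode(codePoint);
        System.out.println(units.length);                    // 2
        System.out.println(Character.charCount(codePoint));  // 2
        System.out.println(Character.charCount('A'));        // 1

        // Reassemble the code point from its surrogate pair.
        int roundTrip = Character.toCodePoint(units[0], units[1]);
        System.out.println(roundTrip == codePoint);          // true

        // chars: 'A' at 0, high surrogate at 1, low surrogate at 2, 'B' at 3
        String text = "A" + new String(units) + "B";
        System.out.println(Integer.toHexString(decodeAt(text, 1)));                   // 10400
        System.out.println(Integer.toHexString(Character.codePointBefore(text, 3))); // 10400
    }
}
```

Note that the indexes passed to codePointAt and codePointBefore count char values, not code points, which is why 'B' sits at index 3 in a string of three code points.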
Some of the older methods that take the char primitive data type, such as isLowerCase(char) and isDigit(char), were supplanted by methods that support supplementary characters, such as isLowerCase(int) and isDigit(int). The older methods are still supported but do not work with supplementary characters. To build a global application whose code works with any language, use the newer forms of these methods.
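The difference between the char and int forms can be seen with a small sketch. The class CaseCheckSketch and both counting methods are hypothetical; the example assumes U+10428 (a lowercase Deseret letter) as the supplementary test character.

```java
public class CaseCheckSketch {

    // Walks the text by code point, so supplementary characters are tested whole.
    static int countLowerCase(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = Character.codePointAt(text, i);
            if (Character.isLowerCase(codePoint)) {
                count++;
            }
            i += Character.charCount(codePoint); // advance by 1 or 2 chars
        }
        return count;
    }

    // Walks the text by char, so each surrogate is tested in isolation
    // and a supplementary lowercase letter is missed.
    static int countLowerCaseByChar(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLowerCase(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "a" followed by U+10428, written as its surrogate pair
        String text = "a\uD801\uDC28";
        System.out.println(countLowerCase(text));       // 2
        System.out.println(countLowerCaseByChar(text)); // 1
    }
}
```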
Note that, for performance reasons, most methods that accept a code point do not verify the validity of the code point parameter. You can use the isValidCodePoint method for that purpose.
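One way to apply this is to validate a code point once, at the boundary where it enters your code, before handing it to the unchecked methods. The following is a minimal sketch; the class CodePointValidationSketch and the method isLetterChecked are hypothetical, and throwing IllegalArgumentException is a design choice made for this example rather than something the API requires.

```java
public class CodePointValidationSketch {

    // Hypothetical wrapper: rejects values outside 0x0000..0x10FFFF
    // before calling a method that does not validate its argument.
    static boolean isLetterChecked(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException(
                    String.format("Not a valid code point: 0x%X", codePoint));
        }
        return Character.isLetter(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(Character.isValidCodePoint(0x10FFFF)); // true
        System.out.println(Character.isValidCodePoint(0x110000)); // false
        System.out.println(Character.isValidCodePoint(-1));       // false

        System.out.println(isLetterChecked(0x10400));             // true

        try {
            isLetterChecked(0x110000);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```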
The following table lists some of the verification and mapping methods in the Character class.
Method(s) | Description |
---|---|
isValidCodePoint(int codePoint) | Returns true if the code point is within the range of 0x0000 to 0x10FFFF, inclusive. |
isSupplementaryCodePoint(int codePoint) | Returns true if the code point is within the range of 0x10000 to 0x10FFFF, inclusive. |
isHighSurrogate(char) | Returns true if the specified char is within the high surrogate range of \uD800 to \uDBFF, inclusive. |
isLowSurrogate(char) | Returns true if the specified char is within the low surrogate range of \uDC00 to \uDFFF, inclusive. |
isSurrogatePair(char high, char low) | Returns true if the specified high and low surrogate code values represent a valid surrogate pair. |
codePointCount(CharSequence, int, int)<br>codePointCount(char[], int, int) | Returns the number of Unicode code points in the specified range of the CharSequence or char array. |
isLowerCase(int)<br>isUpperCase(int) | Returns true if the specified Unicode code point is a lowercase or an uppercase character, respectively. |
isDefined(int) | Returns true if the specified Unicode code point is defined in the Unicode standard. |
isJavaIdentifierStart(char)<br>isJavaIdentifierStart(int) | Returns true if the specified character or Unicode code point is permissible as the first character in a Java identifier. |
isLetter(int)<br>isDigit(int)<br>isLetterOrDigit(int) | Returns true if the specified Unicode code point is a letter, a digit, or a letter or digit, respectively. |
getDirectionality(int) | Returns the Unicode directionality property for the given Unicode code point. |
Character.UnicodeBlock.of(int codePoint) | Returns the object representing the Unicode block that contains the given Unicode code point, or returns null if the code point is not a member of a defined block. |
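A few of the verification methods above can be combined to inspect a piece of text, as in the following sketch. The class SurrogateInspectionSketch and its helper methods are hypothetical names introduced for illustration.

```java
public class SurrogateInspectionSketch {

    // Counts the supplementary code points in the text.
    static int countSupplementary(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = Character.codePointAt(text, i);
            if (Character.isSupplementaryCodePoint(codePoint)) {
                count++;
            }
            i += Character.charCount(codePoint);
        }
        return count;
    }

    // Returns true if a well-formed surrogate pair starts at the given char index.
    static boolean startsPairAt(CharSequence text, int index) {
        return index >= 0
                && index + 1 < text.length()
                && Character.isSurrogatePair(text.charAt(index), text.charAt(index + 1));
    }

    public static void main(String[] args) {
        String text = "A\uD801\uDC00B"; // 'A', U+10400, 'B'

        System.out.println(text.length());                                    // 4 chars
        System.out.println(Character.codePointCount(text, 0, text.length())); // 3 code points
        System.out.println(countSupplementary(text));                         // 1
        System.out.println(startsPairAt(text, 1));                            // true
        System.out.println(startsPairAt(text, 0));                            // false

        // U+10400 belongs to the Deseret block.
        System.out.println(
                Character.UnicodeBlock.of(0x10400) == Character.UnicodeBlock.DESERET); // true
    }
}
```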
The String, StringBuffer, and StringBuilder classes also have constructors and methods that work with supplementary characters. The following table lists some of the commonly used methods. For the complete list of available APIs, see the javadoc for the String, StringBuffer, and StringBuilder classes.