In this post we'll learn about Compact Strings, a feature added in Java 9 that adopts a more space-efficient internal representation for strings.
Motivation for Compact Strings in Java
Before Java 9, the String class stored its characters in a char array, using two bytes for each character (UTF-16 encoding). Since String is one of the most used classes, String instances make up a major part of heap usage. It has been observed that most String objects contain only Latin-1 characters, each of which requires only one byte of storage. Always storing strings as UTF-16 therefore means that, for such strings, half of the storage goes unused.
Changes for Compact Strings
To make strings more space efficient, from Java 9 onward the internal representation of the String class has been changed from a UTF-16 char array to a byte array plus an encoding-flag field.
With Compact Strings, the characters are stored in one of two encodings, depending on the contents of the string-
- ISO-8859-1/Latin-1 (one byte per character), or
- UTF-16 (two bytes per character)
The encoding-flag field indicates which encoding is used.
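To make the one-byte versus two-byte difference concrete, here is a minimal sketch that measures how many bytes the same text needs in each encoding. The class name EncodingSizeDemo and its helper methods are made up for this illustration, and it uses the public getBytes() API rather than the internal storage of String, so treat it as an approximation of the idea only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
class EncodingSizeDemo {

    // Bytes needed when every character is stored in one byte (Latin-1).
    static int latin1Size(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Bytes needed when every char is stored in two bytes (UTF-16).
    // UTF_16BE is used because plain UTF_16 prepends a 2-byte byte-order mark.
    static int utf16Size(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String s = "compact";              // 7 Latin-1 characters
        System.out.println(latin1Size(s)); // prints 7
        System.out.println(utf16Size(s));  // prints 14
    }
}
```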
You can see these changes in the source of the String class-
Storage as a char[] array, before Java 9
/** The value is used for character storage. */
private final char value[];
has been changed to a byte[] array-
private final byte[] value;
The encoding-flag field is named coder and is of type byte-
private final byte coder;
coder can have either of these two values-
@Native static final byte LATIN1 = 0;
@Native static final byte UTF16 = 1;
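As a rough sketch of how a coder value could be picked, the code below scans the characters and falls back to UTF-16 as soon as one of them does not fit in a single byte. The class CoderSketch and the method chooseCoder are hypothetical names for this post; only the constant values 0 and 1 mirror the String source, and the real JDK logic may differ in detail.

```java
// Hypothetical illustration, not the JDK implementation.
class CoderSketch {
    static final byte LATIN1 = 0; // same values as the constants in String
    static final byte UTF16 = 1;

    // Returns LATIN1 when every char fits in one byte (0x00 to 0xFF),
    // otherwise UTF16.
    static byte chooseCoder(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    public static void main(String[] args) {
        System.out.println(chooseCoder("hello"));      // prints 0
        System.out.println(chooseCoder("caf\u00E9"));  // prints 0, U+00E9 is Latin-1
        System.out.println(chooseCoder("h\u20ACllo")); // prints 1, U+20AC is not
    }
}
```

Note that a single character outside Latin-1 is enough to make the whole string use two bytes per character.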
Depending on whether the storage is Latin-1 or UTF-16, the methods of the String class need different implementations. These implementations live in two helper classes, one for each encoding-
final class StringLatin1
final class StringUTF16
Based on the value of the encoding-flag field (coder), the methods of the String class call the matching implementation. For example, the compareTo() method-
public int compareTo(String anotherString) {
    byte v1[] = value;
    byte v2[] = anotherString.value;
    if (coder() == anotherString.coder()) {
        return isLatin1() ? StringLatin1.compareTo(v1, v2)
                          : StringUTF16.compareTo(v1, v2);
    }
    return isLatin1() ? StringLatin1.compareToUTF16(v1, v2)
                      : StringUTF16.compareToLatin1(v1, v2);
}
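The same dispatch-on-coder idea can be tried out in a small stand-alone model. MiniString below is a hypothetical class written for this post, not JDK code: it keeps a byte array plus a coder and branches on the coder in charAt(). It assumes big-endian byte order for the UTF-16 case to keep the sketch simple.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical toy model of a compact string, not the JDK implementation.
final class MiniString {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private final byte[] value;
    private final byte coder;

    MiniString(String s) {
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) { latin1 = false; break; }
        }
        coder = latin1 ? LATIN1 : UTF16;
        // Assumption of this sketch: UTF-16 bytes are stored big-endian.
        value = s.getBytes(latin1 ? StandardCharsets.ISO_8859_1
                                  : StandardCharsets.UTF_16BE);
    }

    // One byte per char for LATIN1 (shift by 0), two for UTF16 (shift by 1).
    int length() {
        return value.length >> coder;
    }

    // Branch on the coder, as the String methods do.
    char charAt(int index) {
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    public static void main(String[] args) {
        MiniString a = new MiniString("hello");
        MiniString b = new MiniString("h\u20ACllo");
        System.out.println(a.length() + " " + a.charAt(1)); // prints 5 e
        System.out.println(b.length() + " " + (int) b.charAt(1)); // prints 5 8364
    }
}
```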
That's all for this topic, Compact Strings in Java. If you have any doubts or suggestions, please drop a comment. Thanks!