XML documents can contain international characters, like Norwegian
æøå, or French êèé.
To avoid errors, you should specify the encoding used, or save your XML files as UTF-8.
Character encoding defines a unique binary code for each different character used
in a document.
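As a minimal sketch of this idea, the Python snippet below encodes the same character with three encodings and shows that each one assigns it a different byte sequence. The variable names are illustrative only.

```python
# One character, three encodings, three different byte sequences.
char = "æ"

utf8_bytes = char.encode("utf-8")          # two bytes in UTF-8
latin1_bytes = char.encode("iso-8859-1")   # one byte in ISO-8859-1
utf16_bytes = char.encode("utf-16-be")     # two bytes in UTF-16 (big-endian, no BOM)

print(utf8_bytes.hex(), latin1_bytes.hex(), utf16_bytes.hex())
```

A program that reads these bytes has to know which encoding was used, otherwise it cannot map them back to the right character.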
In computer terms, a character encoding is also called a character set, character
map, code set, or code page.
Unicode is an industry standard for character encoding of text documents. It defines (nearly) every possible international character by a
name and a number.
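The name and number of a character can be looked up with Python's standard `unicodedata` module. This is only an illustration; the helper name `describe` is made up for this example.

```python
import unicodedata

def describe(ch):
    """Return the Unicode code point (as U+XXXX) and the official name of a character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

for ch in "æøåêèé":
    code_point, name = describe(ch)
    print(ch, code_point, name)
```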
Unicode text is most commonly stored in one of two encodings: UTF-8 and UTF-16.
UTF = Universal character set Transformation Format.
UTF-8 uses 1 byte (8 bits) to represent characters in the ASCII set, and two to
four bytes for the rest.
UTF-16 uses 2 bytes (16 bits) for most characters, and four bytes
for the rest.
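The byte counts can be checked directly. The sketch below, with a hypothetical helper `byte_lengths`, measures how many bytes a few sample characters need in each encoding. Note that characters outside the Basic Multilingual Plane, such as emoji, take four bytes in both UTF-8 and UTF-16.

```python
def byte_lengths(ch):
    """Return (bytes in UTF-8, bytes in UTF-16) for a single character."""
    # "utf-16-le" is used so that no byte order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch in ["A", "é", "€", "😀"]:
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"{ch!r}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```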
UTF-8 – The Web Standard
UTF-8 is the standard character encoding on the web.
SQL, and XML.
The first line in an XML document is called the prolog.
The prolog is optional. Normally it contains the XML version number:
<?xml version="1.0"?>
It can also contain information about the encoding used in the
document. This prolog specifies UTF-8 encoding:
<?xml version="1.0" encoding="UTF-8"?>
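To illustrate how a program might read the encoding from the prolog, here is a small Python sketch. The function `declared_encoding` is a hypothetical helper, not part of any library, and it assumes the document is stored in an ASCII-compatible encoding so that the prolog itself can be matched as plain bytes.

```python
import re
from typing import Optional

# Matches an XML prolog and captures the value of its encoding attribute.
_PROLOG_ENCODING = re.compile(
    rb'<\?xml[^>]*?encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_encoding(xml_bytes: bytes) -> Optional[str]:
    """Return the encoding named in the prolog, or None if none is declared."""
    match = _PROLOG_ENCODING.match(xml_bytes)
    return match.group(1).decode("ascii") if match else None

with_encoding = b'<?xml version="1.0" encoding="UTF-8"?><note/>'
without_encoding = b'<?xml version="1.0"?><note/>'

print(declared_encoding(with_encoding))
print(declared_encoding(without_encoding))
```

A real XML parser does this detection itself; the sketch is only meant to show where the information lives.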
The XML standard states that all XML software must understand both UTF-8 and UTF-16.
UTF-8 is the default for documents without encoding information.
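The following sketch uses Python's standard `xml.etree.ElementTree` parser as one example of XML software. It parses a UTF-8 document that has no prolog at all, and a UTF-16 document with a byte order mark; the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# UTF-8 bytes with no prolog: the parser falls back to the UTF-8 default.
utf8_no_prolog = "<note>æøå</note>".encode("utf-8")

# UTF-16 bytes: Python's "utf-16" codec prepends a byte order mark.
utf16_doc = '<?xml version="1.0" encoding="UTF-16"?><note>æøå</note>'.encode("utf-16")

text_from_utf8 = ET.fromstring(utf8_no_prolog).text
text_from_utf16 = ET.fromstring(utf16_doc).text

print(text_from_utf8, text_from_utf16)
```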
In addition, most XML software systems understand encodings like ISO-8859-1,
Windows-1252, and ASCII.
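As an example of one of these other encodings, the sketch below parses a document that is stored and declared as ISO-8859-1, again using Python's `xml.etree.ElementTree`. Other XML software may support a different set of encodings.

```python
import xml.etree.ElementTree as ET

# The prolog declares ISO-8859-1, and the bytes are really stored that way.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><note>êèé</note>'
).encode("iso-8859-1")

root = ET.fromstring(latin1_doc)
print(root.text)
```

The important point is that the declared encoding and the encoding actually used to save the file agree.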
Most often, XML documents are created on one computer, uploaded to a server
on a second computer, and displayed by a browser on a third computer.
If the encoding is not correctly interpreted by all three computers, the
browser might display meaningless text, or you might get an error message.
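Both failure modes can be reproduced in a few lines of Python. In this sketch, UTF-8 bytes are first decoded with the wrong encoding, which produces meaningless text, and then a document saved as ISO-8859-1 but declared as UTF-8 is parsed, which raises an error. The sample strings are illustrative.

```python
import xml.etree.ElementTree as ET

text = "æøå"

# Case 1: bytes written as UTF-8 but read as ISO-8859-1 give garbled text.
garbled = text.encode("utf-8").decode("iso-8859-1")
print(garbled)

# Case 2: the prolog claims UTF-8, but the file was saved as ISO-8859-1.
mislabeled = (
    '<?xml version="1.0" encoding="UTF-8"?><note>æøå</note>'
).encode("iso-8859-1")

try:
    ET.fromstring(mislabeled)
    parse_failed = False
except ET.ParseError as error:
    parse_failed = True
    print("Parse error:", error)
```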
For high-quality XML documents, UTF-8 is the best encoding to use. UTF-8 covers
international characters, and it is also the default if no encoding is declared.
When you write an XML document:
- Use an XML editor that supports encoding
- Make sure you know what encoding the editor uses
- Describe the encoding in the encoding attribute
- UTF-8 is the safest encoding to use
- UTF-8 is the web standard
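When an XML document is generated by a program rather than typed in an editor, the same advice applies. The sketch below, assuming Python's standard `xml.etree.ElementTree`, writes a document as UTF-8 with a prolog that declares the encoding, then reads it back to confirm the characters survive. The element name and text are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("note")
root.text = "æøå êèé"

# Write the document as UTF-8 and ask for a prolog that declares the encoding.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="UTF-8", xml_declaration=True)
xml_bytes = buffer.getvalue()

print(xml_bytes.decode("utf-8"))

# Read the bytes back to confirm the text round-trips unchanged.
round_trip_text = ET.fromstring(xml_bytes).text
```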