XML Encoding

XML documents can contain international characters, like Norwegian
æøå, or French êèé.

To avoid errors, you should specify the encoding used, or save your XML files as
UTF-8.


Character Encoding

Character encoding defines a unique binary code for each different character used
in a document.

In computing, a character encoding is also called a character set, character
map, code set, or code page.


Unicode

Unicode is an industry standard for character encoding of text documents. It
identifies (nearly) every character in international use by a name and a
number (its code point).
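As a small illustration of the name and number Unicode assigns to each character, the Python sketch below looks them up with the standard unicodedata module. The helper name describe is only an assumption made for this example.

```python
import unicodedata

def describe(ch):
    """Return the Unicode number (code point) and official name of one character."""
    # describe is a hypothetical helper name chosen for this sketch.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The Norwegian letters mentioned at the top of this page.
for ch in "æøå":
    print(ch, describe(ch))
```

For example, æ is reported as U+00E6 with the name LATIN SMALL LETTER AE.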

Unicode text can be stored in several encodings. The two most common are UTF-8 and UTF-16.

UTF = Unicode (or Universal Coded Character Set) Transformation Format.

UTF-8 uses 1 byte (8 bits) to represent characters in the ASCII set, and two to
four bytes for the rest.

UTF-16 uses 2 bytes (16 bits) for most characters, and four bytes
for the rest.
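To make the byte counts concrete, here is a minimal Python sketch that encodes a few sample characters and counts the bytes. The function name byte_lengths and the sample characters are assumptions chosen for this example.

```python
def byte_lengths(ch):
    """Return (UTF-8 length, UTF-16 length) in bytes for a single character."""
    # "utf-16-le" is used so that the 2-byte byte order mark is not counted.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

# An ASCII letter, a Norwegian letter, the euro sign, and an emoji.
for ch in ["A", "æ", "€", "😀"]:
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"{ch!r}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

The ASCII letter takes 1 byte in UTF-8, æ takes 2, and the euro sign takes 3. The emoji lies outside the range that fits in 16 bits, so it takes 4 bytes in both encodings.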


UTF-8 – The Web Standard

UTF-8 is the standard character encoding on the web.

UTF-8 is the default character encoding for HTML5 and XML, and the usual
choice for CSS, JavaScript, PHP, and SQL.


XML Encoding

The first line in an XML document is called the prolog:

<?xml version="1.0"?>

The prolog is optional. Normally it contains the XML version
number.

It can also contain information about the encoding used in the
document. This prolog specifies UTF-8 encoding:

<?xml version="1.0" encoding="UTF-8"?>

The XML standard states that all XML software must understand both UTF-8 and
UTF-16.

UTF-8 is the default for documents without encoding information.

In addition, most XML software systems understand encodings like ISO-8859-1,
Windows-1252, and ASCII.
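The sketch below shows how a parser uses the encoding named in the prolog. It uses Python's standard xml.etree.ElementTree module. The sample note and the helper name parse_body are assumptions made for this example. The same text is saved in three encodings, each with a matching prolog, and should be read back identically.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample note; {enc} is filled in with the encoding name.
TEMPLATE = (
    '<?xml version="1.0" encoding="{enc}"?>'
    '<note><body>Norwegian: æøå. French: êèé</body></note>'
)

def parse_body(encoding):
    """Save the note as bytes in the given encoding, then parse it again."""
    # The prolog names the same encoding that is used to produce the bytes.
    data = TEMPLATE.format(enc=encoding).encode(encoding)
    return ET.fromstring(data).find("body").text

for enc in ["UTF-8", "UTF-16", "ISO-8859-1"]:
    print(enc, "->", parse_body(enc))
```

Because the prolog and the actual bytes agree in every case, the parser recovers the same text each time.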


XML Errors

Most often, XML documents are created on one computer, uploaded to a server
on a second computer, and displayed by a browser on a third computer.

If the encoding is not interpreted correctly by all three computers, the
browser might display meaningless text, or you might get an error message.
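Both kinds of failure can be reproduced with a short Python sketch. It assumes a tiny sample note and uses the standard xml.etree.ElementTree parser. The variable names are chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample note whose prolog claims UTF-8.
text = '<?xml version="1.0" encoding="UTF-8"?><note>æøå</note>'

# Case 1: the file is saved as ISO-8859-1 although the prolog says UTF-8.
# The bytes are not valid UTF-8, so the parser reports an error.
wrong_bytes = text.encode("iso-8859-1")
try:
    ET.fromstring(wrong_bytes)
    parsed_ok = True
except ET.ParseError as err:
    parsed_ok = False
    print("Parse error:", err)

# Case 2: UTF-8 bytes are read as Windows-1252.
# There is no error, but the text becomes meaningless.
garbled = "æøå".encode("utf-8").decode("cp1252")
print(garbled)
```

In the second case the three letters come out as Ã¦Ã¸Ã¥, which is the typical look of UTF-8 text read with a single-byte encoding.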

Look at these two XML files: "Note saved with right encoding" and
"Note saved with wrong encoding".

For high-quality XML documents, UTF-8 is the best encoding to use. UTF-8 covers
international characters, and it is also the default if no encoding is declared.


Conclusion

When you write an XML document:

  • Use an XML editor that supports encoding
  • Make sure you know what encoding the editor uses
  • Declare the encoding in the encoding attribute of the prolog
  • UTF-8 is the safest encoding to use
  • UTF-8 is the web standard
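
If you create XML from a program instead of an editor, the same advice applies. The Python sketch below writes a note as UTF-8 and lets the library add a prolog with the encoding attribute. The file name note.xml, the element names, and the helper save_note are assumptions made for this example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def save_note(path, body):
    """Write a small note as UTF-8, with the encoding declared in the prolog."""
    note = ET.Element("note")
    ET.SubElement(note, "body").text = body
    # xml_declaration=True makes the writer emit the prolog line.
    ET.ElementTree(note).write(path, encoding="UTF-8", xml_declaration=True)

# Write to a temporary directory so the sketch leaves no files behind elsewhere.
path = os.path.join(tempfile.mkdtemp(), "note.xml")
save_note(path, "æøå êèé")

with open(path, "rb") as f:
    raw = f.read()

print(raw.splitlines()[0])                       # the prolog line
print(ET.parse(path).getroot().find("body").text)  # the text read back
```

Reading the file back with the same library returns the original text, because the bytes on disk match the encoding named in the prolog.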
