This chapter takes you through the simple syntax rules to follow when writing an XML document. Following is a complete (but very simple) XML document:
<?xml version="1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>
Notice that there are two kinds of information in the above example:
- markup, like <contact-info>, and
- text, or character data, like Tanmay Patil and (011) 123-4567.
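To see this split between markup and text in practice, here is a minimal sketch in Python using the standard-library module xml.etree.ElementTree (our choice of tool; any conforming XML parser would do). The function name markup_and_text is hypothetical and exists only for this illustration. The parser consumes the markup and hands back each element's name and its character data separately.

```python
import xml.etree.ElementTree as ET

# The example document from above; the declaration must be the very first thing.
DOC = """<?xml version="1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>"""


def markup_and_text(doc: str) -> list[tuple[str, str]]:
    """Return (element name, character data) pairs for the root's children."""
    root = ET.fromstring(doc)  # the markup becomes a tree of elements
    # Each child's tag comes from the markup; its .text is the character data.
    return [(child.tag, child.text) for child in root]


for tag, text in markup_and_text(DOC):
    print(f"markup: <{tag}>  text: {text}")
```

The root element itself (contact-info) holds only other elements and the whitespace between them, so no meaningful text is reported for it.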
The following diagram depicts the syntax rules for writing the different kinds of markup and text in an XML document. Let us look at each of them in detail in the following sections.
XML Declaration
An XML document can optionally have an XML declaration. The XML declaration is written as below:
<?xml version="1.0" encoding="UTF-8"?>
Here, version is the XML version and encoding specifies the character encoding used in the document.
Syntax Rules for XML Declaration
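The XML declaration is case-sensitive, i.e., it must begin with “<?xml” in lowercase and may not begin with “<?XML” or any other variant.
If you specify the XML declaration, it must be the very first statement in the XML document; not even whitespace or a comment may precede it.
The HTTP protocol can override the encoding value that you put in the XML declaration (for example, through the charset parameter of the Content-Type header).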
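The declaration rules can be checked mechanically. The following sketch assumes Python's bundled xml.parsers.expat module, a non-validating parser that reports well-formedness errors; the helper name parses is hypothetical. It does not cover the HTTP override, which happens outside the parser.

```python
from xml.parsers import expat


def parses(doc: str) -> bool:
    """Return True if expat accepts doc as a well-formed document."""
    parser = expat.ParserCreate()
    try:
        # Feed the whole document at once; True marks it as the final chunk.
        parser.Parse(doc.encode("utf-8"), True)
        return True
    except expat.ExpatError:
        return False


# Lowercase declaration as the very first thing: accepted.
print(parses('<?xml version="1.0" encoding="UTF-8"?><a/>'))    # True
# Uppercase variant: rejected, because the declaration is case-sensitive.
print(parses('<?XML version="1.0" encoding="UTF-8"?><a/>'))    # False
# Declaration preceded by a newline: rejected, it is not first.
print(parses('\n<?xml version="1.0" encoding="UTF-8"?><a/>'))  # False
# No declaration at all: accepted, because it is optional.
print(parses('<a/>'))                                           # True
```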
Tags and Elements
An XML file is structured by several XML elements, also called XML nodes or XML tags. The name of an XML element is enclosed in < > brackets as shown below:
<element>
Syntax Rules for Tags and Elements
(1) Element Syntax: Each XML element needs to be closed, either with a start tag and an end tag:
<element>....</element>
or, for simple cases (elements with no content), just this way:
<element/>
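As a quick check that the two closing styles mean the same thing, this sketch (assuming Python's xml.etree.ElementTree) parses both spellings of an empty element and compares the results. An element that is never closed is rejected outright.

```python
import xml.etree.ElementTree as ET

# Both spellings of an element with no content.
long_form = ET.fromstring("<element></element>")
short_form = ET.fromstring("<element/>")

# The parser builds the same element either way: same tag, no text, no children.
print(long_form.tag, long_form.text, len(long_form))     # element None 0
print(short_form.tag, short_form.text, len(short_form))  # element None 0

# Re-serializing both gives identical output.
print(ET.tostring(long_form) == ET.tostring(short_form))  # True

# An unclosed element is not well-formed.
try:
    ET.fromstring("<element>")
except ET.ParseError as err:
    print("rejected:", err)
```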
(2) Nesting of elements: An XML element can contain child elements inside it, but elements may not overlap, i.e., an end tag must always have the same name as the most recent unmatched start tag.
The following example shows incorrectly nested tags:
<?xml version="1.0"?>
<contact-info>
<company>shiksha360.com
</contact-info>
</company>
The following example shows correctly nested tags:
<?xml version="1.0"?>
<contact-info>
   <company>shiksha360.com</company>
</contact-info>
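The "most recent unmatched start tag" rule is exactly what a stack enforces: push each start tag, and pop on each end tag. The sketch below is a simplified, hypothetical checker (nesting_ok is not a library function) that scans tags with a regular expression; it assumes no comments, CDATA sections, or ">" inside attribute values, so it is an illustration of the rule rather than a real XML parser.

```python
import re

# Matches start tags (<name ...>), end tags (</name>) and empty-element
# tags (<name .../>). Declarations such as <?xml ...?> are not matched,
# because a name must follow "<" or "</" directly.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)[^>]*?(/?)>")


def nesting_ok(xml: str) -> bool:
    """Return True if every end tag closes the most recent unmatched start tag."""
    stack: list[str] = []
    for closing, name, self_closing in TAG.findall(xml):
        if closing:
            # An end tag must match the most recent unmatched start tag.
            if not stack or stack.pop() != name:
                return False
        elif not self_closing:
            # A start tag opens a new level of nesting.
            stack.append(name)
        # An empty-element tag like <element/> opens and closes at once.
    # Any tag still on the stack was never closed.
    return not stack


good = '<?xml version="1.0"?><contact-info><company>shiksha360.com</company></contact-info>'
bad = '<?xml version="1.0"?><contact-info><company>shiksha360.com</contact-info></company>'
print(nesting_ok(good))  # True
print(nesting_ok(bad))   # False
```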
(3) Root element: An XML document can have only one root element. For example, the following is not a well-formed XML document, because both the x and y elements occur at the top level:
<x>...</x>
<y>...</y>
The following example shows a well-formed XML document:
<root>
   <x>...</x>
   <y>...</y>
</root>
(4) Case sensitivity: The names of XML elements are case-sensitive. That means the start and end tags of an element must be written in the same case.
For example <contact-info> is different from <Contact-Info>.
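Rules (3) and (4) can be confirmed with any conforming parser. This sketch assumes Python's xml.etree.ElementTree, which raises ParseError on well-formedness errors; the helper name well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def well_formed(doc: str) -> bool:
    """Return True if ElementTree's parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Rule (3): exactly one root element.
print(well_formed("<x>...</x><y>...</y>"))              # False: two top-level elements
print(well_formed("<root><x>...</x><y>...</y></root>"))  # True
# Rule (4): start and end tags must match case exactly.
print(well_formed("<contact-info></Contact-Info>"))      # False
print(well_formed("<Contact-Info></Contact-Info>"))      # True
```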
XML Attributes
An XML element can have one or more attributes. An attribute specifies a single property for the element, using a name/value pair. For example:
<a href="http://www.tutorialspoint.com/">...</a>
Here href is the attribute name and http://www.tutorialspoint.com/ is the attribute value.
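Once a document is parsed, attributes come back as ordinary name/value pairs. A minimal sketch, assuming Python's xml.etree.ElementTree, where an element's attrib property is a dictionary and get() looks up a single attribute:

```python
import xml.etree.ElementTree as ET

link = ET.fromstring('<a href="http://www.tutorialspoint.com/">...</a>')

# .attrib maps every attribute name to its value.
print(link.attrib)        # {'href': 'http://www.tutorialspoint.com/'}
# .get() returns one attribute's value, or None if it is absent.
print(link.get("href"))   # http://www.tutorialspoint.com/
print(link.get("title"))  # None
```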
Syntax Rules for XML Attributes
(1) – Attribute names in XML (unlike HTML) are case-sensitive, i.e., HREF and href refer to two different XML attributes.
(2) – The same attribute cannot appear twice in one element (it cannot have two values). The following example is not well-formed because the b attribute is specified twice:
<a b="x" c="y" b="z">....</a>
(3) – Attribute names must not be enclosed in quotation marks, whereas attribute values must always appear in quotation marks (single or double). The following example demonstrates WRONG XML format:
<a b=x>....</a>
Here the attribute value x is not enclosed in quotation marks.
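All three attribute rules can be observed with a standard parser. This sketch assumes Python's xml.etree.ElementTree; the helper well_formed is a hypothetical name used only here.

```python
import xml.etree.ElementTree as ET


def well_formed(doc: str) -> bool:
    """Return True if ElementTree's parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Rule (1): names are case-sensitive, so HREF and href are two distinct attributes.
el = ET.fromstring('<a HREF="x" href="y"/>')
print(el.get("HREF"), el.get("href"))                # x y
# Rule (2): the same attribute twice is an error.
print(well_formed('<a b="x" c="y" b="z">....</a>'))  # False
# Rule (3): values must be quoted; single or double quotes both work.
print(well_formed("<a b=x>....</a>"))                # False
print(well_formed("<a b='x'>....</a>"))              # True
```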
XML References
References allow you to add or include additional text or markup in an XML document. References always begin with the character “&” (which is specially reserved) and end with the character “;”. XML has two types of references:
(1) Entity References: An entity reference contains a name between the start and end delimiters. For example, in &amp; the name is amp. The name refers to a predefined string of text and/or markup.
(2) Character References: These are references, like &#65;, that contain a hash mark (“#”) followed by a number. The number always refers to the Unicode code point of a single character. In this case, 65 refers to the letter “A”.
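When a document is parsed, each reference is replaced by the text it stands for. The sketch below assumes Python's xml.etree.ElementTree and also uses the hexadecimal character-reference form &#x41;, which XML allows alongside the decimal form shown above.

```python
import xml.etree.ElementTree as ET

# An entity reference (&amp;) plus character references in decimal (&#65;)
# and hexadecimal (&#x41;) form.
p = ET.fromstring("<p>Fish &amp; Chips, &#65; and &#x41;</p>")

# The parser has already replaced every reference with its character.
print(p.text)  # Fish & Chips, A and A

# A name that is not a declared entity is an error.
try:
    ET.fromstring("<p>&nbsp;</p>")
except ET.ParseError as err:
    print("rejected:", err)
```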
XML Text
(1) – The names of XML elements and XML attributes are case-sensitive, which means the start and end tags of an element need to be written in the same case.
(2) – To avoid character encoding problems, all XML files should be saved as Unicode UTF-8 or UTF-16 files.
(3) – Whitespace characters like blanks, tabs and line breaks between XML attributes are ignored. Whitespace between XML elements is still passed to the application as text, although many applications choose to ignore it.
(4) – Some characters are reserved by the XML syntax itself, so they can’t be used directly. To use them, replace them with the entities listed below:
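| Not allowed character | Replacement entity | Character description |
|---|---|---|
| < | &lt; | less than |
| > | &gt; | greater than |
| & | &amp; | ampersand |
| ' | &apos; | apostrophe |
| " | &quot; | quotation mark |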
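Rather than substituting these entities by hand, you can let a library do it. This sketch assumes Python's xml.sax.saxutils: escape() always replaces &, < and >, and the extra mapping passed here (our addition) covers the two quote characters. unescape() reverses the process.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > itself; this mapping adds the quote characters.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {v: k for k, v in QUOTES.items()}

raw = 'a < b & "c" > \'d\''
safe = escape(raw, QUOTES)  # "&" is replaced first, so nothing is escaped twice
print(safe)                 # a &lt; b &amp; &quot;c&quot; &gt; &apos;d&apos;

# Reversing the substitution recovers the original text.
print(unescape(safe, UNQUOTES) == raw)  # True
```

The escaped string can then be placed inside an element or an attribute value without breaking the document.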