A brief overview of XML (extensible markup language)

XML is a way to store and exchange data similar to CSV and JSON. It is popular within the publishing industry for its compatibility with programs such as InDesign and Quark, as well as being at the centre of the DocBook format.

Let's get started

XML shares many structural similarities with HTML. The difference is that nothing, from the doctype declaration through to the elements and their attributes, is predefined, so you can name things in any way you wish with little restriction.

The first restriction is that all elements must be contained within a root element. In HTML we have the root element:

<html></html>

In XML we can call this root element anything we like:

<potato></potato>

It must simply exist.

What's next?

The next thing to add is the content. As with HTML, elements can be nested, but every opening tag must have a matching closing tag (an empty element can instead be written as a single self-closing tag, such as <website/>).

<potato>
<websites_I_dig>
<website>Daring Fireball</website>
<website>A List Apart</website>
<website>sketchyTech</website>
</websites_I_dig>
</potato>

If you copy and paste this content to a file (in a regular text editor) and save it with a .xml extension (e.g. websites.xml) then we can open that file in Firefox, for example, and this is what we'll see.


Don't worry at the moment about the top line of what Firefox displays. We'll move on to styling the XML in a later post on XSLT (extensible stylesheet language), but for now you'll see we have a well-formed and structured document. If there were an error in the document, we'd see an "XML Parsing Error" and a far less attractive screen.
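The same well-formedness check can be tried outside the browser. The following is a minimal sketch, assuming Python and its standard library xml.etree.ElementTree module; the helper name is_well_formed and the two broken sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

GOOD = """<potato>
<websites_I_dig>
<website>Daring Fireball</website>
<website>A List Apart</website>
<website>sketchyTech</website>
</websites_I_dig>
</potato>"""

# Two sibling elements with no root element wrapping them.
NO_ROOT = "<website>Daring Fireball</website><website>A List Apart</website>"

# An opening <website> tag that is never closed.
UNCLOSED = "<potato><website>Daring Fireball</potato>"


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


for label, doc in [("good", GOOD), ("no root", NO_ROOT), ("unclosed", UNCLOSED)]:
    print(label, is_well_formed(doc))
```

Only the first document passes; the other two break the root-element and closing-tag rules described above.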

Attributes

In HTML we have a fixed list of attributes that can be attached to elements, such as class, id, src, href, name, and so on. In XML we are free not only to give attributes values of our choosing, but also to name the attributes themselves whatever we like.

<potato>
<websites_I_dig>
<website rating="radical">Daring Fireball</website>
<website rating="awesome">A List Apart</website>
<website rating="too modest to say" blogger="me">sketchyTech</website>
</websites_I_dig>
</potato>


There are also no rules about whether or not attributes are used, or about balancing them between similar entries.
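To show how a program sees these attributes, here is a short sketch, again assuming Python's standard xml.etree.ElementTree module (the variable names are arbitrary). An attribute that an element doesn't carry simply comes back as None, which is what lets similar entries carry different attributes.

```python
import xml.etree.ElementTree as ET

DOC = """<potato>
<websites_I_dig>
<website rating="radical">Daring Fireball</website>
<website rating="awesome">A List Apart</website>
<website rating="too modest to say" blogger="me">sketchyTech</website>
</websites_I_dig>
</potato>"""

root = ET.fromstring(DOC)

# iter() walks the whole tree, so the nesting depth doesn't matter here.
# get() returns None when an element has no attribute of that name.
sites = [
    (site.text, site.get("rating"), site.get("blogger"))
    for site in root.iter("website")
]

for name, rating, blogger in sites:
    print(name, rating, blogger)
```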

What is it good for?

Absolutely everything. Well, almost. It is a format that can be passed around most web scripting languages, like JavaScript and PHP, and "parsed", which means it can be used as a data source. It can also be used to store data reliably: because it is plain text, it can be read in any text editor, which goes a long way towards making it future proof.

What else should I know?

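There are a few characters that XML treats specially in the text of your document. These include:
  1. &
  2. <
  3. >
  4. '
  5. "
and we get around these by using the following entity codes, in the same order, in their place:
  1. &amp;
  2. &lt;
  3. &gt;
  4. &apos;
  5. &quot;
Strictly speaking, only & and < must always be replaced in text. The others are usually accepted as they are, although a quote mark must be replaced inside an attribute value that is wrapped in the same kind of quote. Of course we still use angle brackets (or greater than and less than signs) to open and close element tags, just not in the text.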
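When XML is generated by a program, the replacement is usually done by a library rather than by hand. A minimal sketch, assuming Python's standard xml.sax.saxutils module (the sample strings are invented):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

text = 'Fish & chips < "haute cuisine"'

# escape() replaces &, < and > by default.
escaped = escape(text)

# Extra replacements can be supplied to cover the two quote characters too.
fully_escaped = escape(text, {"'": "&apos;", '"': "&quot;"})

# quoteattr() escapes a value and wraps it in quotes for use as an attribute.
attribute = quoteattr("Rock & Roll")

# Parsing the escaped text gives back the original characters.
element = ET.fromstring("<website>" + escaped + "</website>")

print(escaped)
print(fully_escaped)
print(attribute)
print(element.text == text)
```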

And, finally

It is good practice to declare that it is an XML document at the very beginning.

<?xml version="1.0"?>

Additional things can be added here. For example, the text encoding:

<?xml version="1.0" encoding="UTF-8"?>

and whether or not it is a standalone document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

So let's add these now:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<potato>
<websites_I_dig>
<website rating="radical">Daring Fireball</website>
<website rating="awesome">A List Apart</website>
<website rating="too modest to say" blogger="me">sketchyTech</website>
</websites_I_dig>
</potato>

Time for a final copy and paste into your file and to try it out.
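If you'd rather try it out from a script than a browser, the sketch below parses the finished document and writes it back out with a declaration. It assumes Python 3.8 or later with the standard xml.etree.ElementTree module. Note that this module writes its own declaration and doesn't emit the standalone attribute.

```python
import xml.etree.ElementTree as ET

# Bytes rather than text, so the parser honours the declared encoding.
DOC = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<potato>
<websites_I_dig>
<website rating="radical">Daring Fireball</website>
<website rating="awesome">A List Apart</website>
<website rating="too modest to say" blogger="me">sketchyTech</website>
</websites_I_dig>
</potato>"""

root = ET.fromstring(DOC)
names = [site.text for site in root.iter("website")]

# xml_declaration=True asks for an XML declaration at the top of the output.
serialized = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

print(root.tag, names)
print(serialized.decode("utf-8").splitlines()[0])
```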
