I always skip the chapter on XML. This entry is my attempt to keep doing that while still grasping why things are done the way they are in the world of web document authoring.
A Definition
eXtensible Markup Language, or XML, is not a language in the way HTML is, but rather a set of rules for creating other markup languages. This makes it a metalanguage (a language for describing other languages), which lets you design your own markup languages for different types of documents and gives some insight into the “eXtensible” part of its name. XML can do this because it is a simplified subset of SGML, the international standard metalanguage for text document markup (ISO 8879).
See Also
- Cover Pages (various dates). SGML: General Introductions and Overviews. SGML is vast and complex. XML is a simplified form of SGML. This offers a brief history of the evolution of document standards.
Elements and Structure
The most significant thing about XML is that it offers semantic markup and document structure using elements such as: <dog>Lassie</dog>. The tags <dog> and </dog> add meaning to Lassie for humans and machines alike. Elements can contain other elements, which contain yet more elements, and together they give a document its structure:
<?xml version="1.0"?>
<movie>
<title>Lassie Come Home</title>
<year>1943</year>
<plot>Hard times come for the Carraclough family, and they are forced to sell Lassie to the rich Duke of Rudling.</plot>
<cast>
<human>Roddy McDowall</human>
<dog>Lassie</dog>
<!-- more cast members added here -->
</cast>
<!-- more movie details added here -->
</movie>
Of note is that this representation is both text and data, so it can be stored in a database or in a plain-text file. This means that XML documents are not tied to a proprietary format or device that may become obsolete, and they can be easily shared between incompatible systems.
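To see the "both text and data" point in action, here is a minimal sketch that reads a trimmed copy of the movie document back into ordinary data. It assumes Python and its standard-library xml.etree.ElementTree module, neither of which appears in the example above; the function name summarize is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the movie document, held as plain text.
MOVIE_XML = """<?xml version="1.0"?>
<movie>
  <title>Lassie Come Home</title>
  <year>1943</year>
  <cast>
    <human>Roddy McDowall</human>
    <dog>Lassie</dog>
  </cast>
</movie>"""


def summarize(xml_text):
    """Parse the text and pull out a few values as ordinary Python data."""
    root = ET.fromstring(xml_text)  # root is the <movie> element
    return {
        "title": root.findtext("title"),
        "year": int(root.findtext("year")),
        # Each child of <cast> keeps its tag name, so the markup's meaning survives.
        "cast": [(member.tag, member.text) for member in root.find("cast")],
    }


if __name__ == "__main__":
    print(summarize(MOVIE_XML))
```

The same text could just as easily be handed to a parser in any other language, which is the portability argument in a nutshell.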
Also of note is that XML documents may be used for all sorts of content, not just Lassie movies. Some XML languages use a Document Type Definition (DTD) that defines which elements may be used in the document.
See Also
- W3C (2000). XML Schema. XML Schemas offer a method for defining XML elements and document structure.
- W3C (2006). The Extensible Stylesheet Language Family (XSL). Markup languages describe the structure of a document, not its presentation. Like HTML, XML documents can use Cascading Style Sheets for presentation (fast and preferable) or Extensible Stylesheet Language (slow but sometimes necessary).
Well-Formedness
This is an important distinction to make before rushing to validity: An XML document must be well-formed, and should be valid, but validity is not essential.
Well-formed documents comply with the XML rules for marking up a document, regardless of the specific language. For example, all elements must be correctly nested and may not overlap. Valid documents are both well-formed and compliant with the rules set for a particular XML language. So, in XHTML it is invalid to put a body element inside a link element, even if it is perfectly nested.
Of note to authors: browsers may still be able to render sloppy, error-ridden HTML, but they cannot do so with XML documents, because an XML parser is required to stop when it hits a well-formedness error.
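A rough way to watch a parser refuse a badly nested document is sketched below. It assumes Python's standard-library xml.etree.ElementTree; the helper name is_well_formed and the two sample strings are made up for illustration. It checks well-formedness only: ElementTree does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Correctly nested: <dog> closes before <cast> does.
NESTED = "<cast><dog>Lassie</dog></cast>"
# Overlapping: <cast> is closed while <dog> is still open.
OVERLAPPING = "<cast><dog>Lassie</cast></dog>"

if __name__ == "__main__":
    print(is_well_formed(NESTED))
    print(is_well_formed(OVERLAPPING))
```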
See Also
- Sall, K. (2000). XML Software Guide: XML Parsers. There are hundreds of explicit criteria for creating well-formed XML documents, many of them common sense. It is always a good idea to check the syntax of your document using one of the well-formedness checkers listed at the Web Developer’s Virtual Library.
- Eisenberg, J.D. (2001). How to Read W3C Specs. Learning to read a DTD (they begin with <!DOCTYPE ...>) is not easy, but it is worthwhile if you spend any time authoring XML documents, because the DTD is the ultimate authority on what is and is not syntactically correct for a particular markup language. He also talks about namespaces, which allow you to use elements from different XML applications in the same document.
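The namespace idea from the last entry is easier to picture with a small example. This is only a sketch, assuming Python's standard-library xml.etree.ElementTree. The XHTML namespace URI is the real one, but the page document, the "movies" URI, and the titles function are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies that both define <title>.
DOC = """<page xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:m="http://example.com/ns/movies">
  <h:title>Film notes</h:title>
  <m:title>Lassie Come Home</m:title>
</page>"""

# Prefix-to-URI map used for lookups; the prefixes here are local shorthand
# and need not match the ones used in the document itself.
NS = {
    "h": "http://www.w3.org/1999/xhtml",
    "m": "http://example.com/ns/movies",
}


def titles(xml_text):
    """Return the XHTML title and the movie title as a pair."""
    root = ET.fromstring(xml_text)
    return (
        root.findtext("h:title", namespaces=NS),
        root.findtext("m:title", namespaces=NS),
    )


if __name__ == "__main__":
    print(titles(DOC))
```

The two title elements never collide because each one is identified by its namespace URI plus its local name, not by the local name alone.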
XML on the Web
This is a list of the XML languages that are relevant to the Web. For now, they are just placeholders; but I want to delve into some of these in more detail at some point.
- XHTML (Extensible Hypertext Markup Language)
- RSS (Really Simple Syndication or RDF Site Summary): RSS is an XML language for distributing web content from one web site to many other web sites.
- RDF (Resource Description Framework): RDF is a W3C standard for describing Web resources, such as the title, author, modification date, content, and copyright information of a Web page. This is useful for indexing, searching, or navigating a web site, and could be useful to automated search agents.
- SVG (Scalable Vector Graphics): SVG is an emerging W3C standard for describing two-dimensional vector graphics for the web.
- SMIL (Synchronized Multimedia Integration Language): SMIL (pronounced “smile”) is a W3C recommendation for combining different multimedia presentations in a precise and synchronised way. For example, it defines timing markup, layout markup, animations, visual transitions, and media embedding.
- MathML (Mathematical Markup Language): MathML defines mathematical notation for the web. Because HTML does not offer a way to produce mathematical equations, authors have often resorted to using images of equations. This standard offers a way to give equations mathematical context.