Metadata, Meta Tags, Meta What?

There is a fair amount of confusion about the terms "metadata" and "meta tags". It has certainly confused me in the past, so I hope that this article makes things a little clearer for those who are struggling with these meta-things.

Metadata 

As this article is being written for a newsletter focusing on accessibility, let’s start by looking at meta-things in this context. Checkpoint 13.2 of WCAG 1.0[1] tells us to:

Provide metadata to add semantic information to pages and sites.

What does this mean? Let’s start by defining what we mean by metadata: metadata is data about data. So, providing metadata for our page (which is data) means that we need to describe that data in some way.

How much metadata do we need to provide about our page to satisfy Checkpoint 13.2?  There is probably no one correct answer to this other than to provide as much useful metadata as possible.  The minimum is to provide a value for (X)HTML’s only real metadata element – the <title></title>.

Yes, the <title></title> is metadata – it is data that describes the data that is our page or document.  Please note that "Untitled" is not a good value for our <title></title>, nor is it cool or clever to use the title of the site for the <title></title> of every page in the site.  (Including the site title, however, is good as it allows the page to make more sense when viewed on its own.)
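
As a sketch of the difference, compare the following title elements. The page and site names are taken from this article; the order of the two parts is a matter of taste rather than a rule.

```html
<!-- Unhelpful: says nothing about the page -->
<title>Untitled</title>

<!-- Unhelpful: the same site title on every page of the site -->
<title>Smiffy's Place</title>

<!-- Better: the page title, with the site title included for context -->
<title>Smiffy's Place: Metadata, Meta Tags, Meta What?</title>
```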

Meta Tags or Meta Elements 

What about "meta tags" or "meta elements" then?  Are these not metadata?  No. Meta elements (or tags, if you will) are not metadata in themselves; they are (X)HTML elements that allow us to embed metadata in a page.

Let’s look at how we can use meta elements to put metadata into a document, using a couple of well-known examples:

<html>
<head>
<title>My page</title>
<meta name="description" content="A page I wrote about some stuff." />
<meta name="keywords" content="Smiffy, stuff, things" />
</head>

In the example above, we are presenting three pieces of metadata: the page title, some information called "description" and some information called "keywords".  The description gives a brief summary of what the page is about and the keywords a comma-separated list of terms relevant to the page.  Whilst description and keywords are often published, how much they actually get used is debatable.  There was a time when search engines might have taken note of these but, from what I have been reading recently, they are largely ignored for the simple reason that much metadata of this type cannot be trusted.  The description and keywords metadata values may be of use if your organisation has an in-house search engine that can make use of them, but there are better things available, as we will see shortly.

I have seen many other terms added as metadata, for instance "author".  Great – it’s good to identify the author of a document, but information embedded in meta elements is really for machines to read, not humans.  Besides, can we agree that the creator or writer of a document is called an author?  Probably not.  

The problem that we have here is that we are not working to a formal metadata scheme.  If we make up our own terms, they will probably only be of use to us, and only then if we have our in-house search engine as mentioned earlier.

Formal Metadata Schemes

If we really want to make our metadata useful, we need to agree on what the terms are (description and keywords may be common, but they are informal).  If we all say that the person who creates a document is a "creator", then my in-house search engine can look at your documents and know who wrote them because we have both used the same term – and your in-house search engine can look at my documents and make sense of them in the same way.

Where we can get really clever is when I use a set of terms that you may not be familiar with, but I also provide you with a link to something called a schema that defines those terms.  Your software can then run off and look at the schema and come back and tell you what my metadata means.  This is where we start to touch on what is known as the Semantic Web.

Dublin Core: A Formal Metadata Scheme

Let’s look now at a metadata scheme called Dublin Core[2].  Some people might even call Dublin Core "the" formal metadata scheme, as it is published by the International Organization for Standardization as ISO 15836.  There are two parts to Dublin Core: the 15 Elements, which constitute ISO 15836, and the Terms, which give us much more scope in what we can describe.  It should be noted that Dublin Core metadata isn’t just about describing Web content; it can also describe physical objects, events, services and more.

Rather than go into the boring theory of Dublin Core metadata, let’s get our hands dirty and look at a practical example; this is a selection of the Dublin Core metadata used to describe the document you are currently reading:

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DC.language" scheme="DCTERMS.RFC1766" content="en" />
<meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" />
<meta name="DC.format" scheme="DCTERMS.IMT" content="text/html; charset=UTF-8" />
<meta name="DC.title" lang="en" content="Smiffy's Place: Metadata, Meta Tags, Meta What?" />
<meta name="DC.creator" content="Matthew Smith (Smiffy)" />
<meta name="DC.identifier" content="http://www.smiffysplace.com/metadata-meta-tags-meta-what" />
<meta name="DCTERMS.license" content="http://creativecommons.org/licenses/by-nc-sa/3.0/" />
<meta name="DC.rights" content="(C) Copyright 2001-2007 Matthew Steven Smith" />
<meta name="DC.description" content="Article for the GAWDS newsletter clarifying the differences between metadata and meta tags and what the two are actually for." />
<meta name="DC.subject" content="Accessibility;Dublin Core;HTML;Technical;XHTML;adaptability;metadata;namespace;scheme" />
<meta name="DCTERMS.created" scheme="DCTERMS.W3CDTF" content="2007-08-14" />

Rather than describing every element, I will point out a few items of interest – you can read up on the elements and terms at the Dublin Core site[2].

Firstly, those first two links – what are they there for?  These links point to the schemas for the Dublin Core Elements (DC) and Terms (DCTERMS).  If you don’t understand my metadata, you can follow these links to the schemas so that you can find out how they work – or at least your software can.

The first meta element, DC.language, has more than the usual name and content attributes: it also has an attribute called "scheme".  This tells us that the value of content is part of a formal, controlled vocabulary.  Only values picked from this vocabulary may appear in content, so we are all speaking the same language, with no making up your own.

The element DC.title has yet another new attribute: lang="en".  This means that the title that I am presenting is in English.  I could have several different values of DC.title, each with a different lang attribute, allowing me to present a multi-lingual version of the metadata.
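
A minimal sketch of what that might look like follows. The French title is an illustrative translation only and is not part of this document's real metadata.

```html
<meta name="DC.title" lang="en" content="Smiffy's Place: Metadata, Meta Tags, Meta What?" />
<!-- Hypothetical translation, for illustration only -->
<meta name="DC.title" lang="fr" content="Smiffy's Place : métadonnées, balises meta, méta quoi ?" />
```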

Not forgetting that we started off talking about description and keywords, Dublin Core covers these.  Description becomes DC.description and keywords becomes DC.subject.  If you already have description and keywords meta elements in your documents, you can easily convert them to the equivalent Dublin Core terms by a) renaming them and b) changing the commas in keywords to semicolons in DC.subject.  Easy!
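
Applied to the informal description and keywords example from earlier in this article, the conversion might look like the sketch below. The schema link is included on the assumption that the page does not already declare it.

```html
<!-- Before: informal terms -->
<meta name="description" content="A page I wrote about some stuff." />
<meta name="keywords" content="Smiffy, stuff, things" />

<!-- After: Dublin Core equivalents, with semicolons separating the subjects -->
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.description" content="A page I wrote about some stuff." />
<meta name="DC.subject" content="Smiffy;stuff;things" />
```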

I would encourage readers to go through the code example above in conjunction with the documentation on the Dublin Core site.

Who Uses Dublin Core?

If you decide to start using Dublin Core metadata in your documents, you will be in the company of libraries, museums, governments and educational establishments worldwide.  Contracts with many of these may mandate the use of Dublin Core metadata, so it is a good idea to be familiar with it.

Other Metadata in Meta Elements 

Other metadata worth mentioning that can be embedded in meta elements:

  • The robots term, with which we can tell well-behaved software user agents whether or not we want our pages indexed and links followed.
  • The meta http-equiv terms, which stand in for HTTP headers.  The HTML 4 specification allows server software to copy their contents into the HTTP headers sent to the client, although in practice it is usually the user agent that reads them from the page.  Typically, these can include the character set being used (a duplication of our DC.format, if used) and expiry times for proxy/user agent cache control.

There are those who may contend that these last items are not metadata, as they do not so much describe the data (apart from the character set) as the disposition of the data.  I thought I should include them, though, for completeness.
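
For illustration, the robots and http-equiv terms might appear in a document head as follows. The particular values (noindex, nofollow, the character set and the expiry date) are arbitrary choices for this sketch, not recommendations.

```html
<!-- Ask well-behaved robots not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow" />

<!-- Declare the content type and character set, as the HTTP Content-Type header would -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

<!-- Suggest an expiry time to caches; the date is a placeholder -->
<meta http-equiv="Expires" content="Tue, 14 Aug 2007 12:00:00 GMT" />
```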

So All Metadata Lives in Meta Elements?

No. We have already seen that our only "pure" metadata field is actually the document <title></title>. Other than that, there is no reason at all that we have to put our metadata in meta elements.  Metadata may well be stored externally in another format, such as RDF.  We can then link to it:

<link rel="metadata" type="application/rdf+xml" href="my_ref_metadata.xml"
title="RDF Metadata" />

Metadata for Accessibility

A discussion about metadata for an accessibility-related newsletter would not be complete without a mention of EARL[3], the W3C’s Evaluation And Report Language, which we can use to describe the results of accessibility testing on our pages.  The Fundación Sidar’s HERA[4] software can generate EARL which we can then provide as accessibility metadata, linked from our documents as above.

Remember The Humans

Metadata embedded in (X)HTML <head></head> sections and stored elsewhere as RDF and other formats is all very well for software to read, but not much use to us humans.

If you have metadata that might be of use to your human visitors, such as the document creator, the creation date, license details, etc., put it in the document body, even if it duplicates the machine-readable metadata that you have already taken so much effort over.  Remember: as soon as your page is printed, everything embedded in the head is lost.
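
As a rough sketch, some of the Dublin Core values shown earlier for this article could be repeated in the page body like this. The markup and the class name are assumptions; any visible, printable presentation would serve.

```html
<!-- Human-readable copy of DC.creator, DCTERMS.created and DCTERMS.license -->
<div class="document-info">
  <p>Created by Matthew Smith (Smiffy) on 14 August 2007.</p>
  <p>
    Licensed under a
    <a href="http://creativecommons.org/licenses/by-nc-sa/3.0/">Creative Commons
    Attribution-NonCommercial-ShareAlike 3.0 licence</a>.
  </p>
</div>
```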

Re-Cap

  • Metadata is data describing data.
  • Meta tags or elements are a means of embedding metadata and other information in your (X)HTML document.
  • Metadata that does not conform to a published scheme may be of little or no use.
  • The most widespread formal metadata scheme is Dublin Core.
  • Metadata can be stored external to a document.
  • Present any metadata that may be of use to humans in a human-readable form in your document.

References

  1. http://www.w3.org/TR/WAI-WEBCONTENT/wai-pageauth.html#tech-use-metadata
  2. http://www.dublincore.org
  3. http://www.w3.org/TR/EARL10-Schema/
  4. http://www.sidar.org/hera/index.php.en

Footnote

This article has been reproduced at InDelv.com.