Monthly Archives: December 2009

Technology: Business Asset or Business Risk?

Risky Business

Everything we do, every day, has an element of risk. This is equally true in business as in other aspects of life. Whilst we may be aware of the risks inherent in driving to work, we are often unaware of risks involved in our work – not the regular health & safety risks – but more subtle risks to the business itself. Decisions we make in our use of technology assets generate risks, risks that might go unnoticed but could have a devastating impact on our business, should things go wrong. [And thus on the businesses of the clients that rely on us too; always remember that.]

This is a fairly long article, but I make no apology for this: business risk is a very serious matter. It could be worse: given the subject matter and my years in IT/network management, this could have been a very long article.

Seek, and Ye Shall Find

The process of identifying risks and their potential impacts is known as risk assessment. Risk assessments can be carried out by expensive consultants – or by anyone able to apply a little logical thinking and common sense. (When issues are complex or large amounts of money are at stake, it may be well to consider the expensive consultant route.)

For the purposes of this discussion, I am suggesting that we should list all the technologies that we use in our business and do a risk assessment on them. For each item we need to start by asking two questions:

The Yes/No Question

If this technology were to suddenly become unavailable, for whatever reason, would it affect my ability to do business?

The Quantity Question

Should the previous question yield an answer of ‘yes,’ for how long would I be able to work without this technology before its absence became a serious problem?

Write It Down!

Forewarned is forearmed. When undertaking a risk assessment, findings, plans of action, whom to call, etcetera, should all be documented. There is little point going through the exercise, having a risk become an incident and then finding that nobody can remember what is supposed to happen next.
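
For those who would rather keep the written record in a structured form than on the back of an envelope, here is a minimal sketch of a risk register in Python. The field names, the sorting rule and the example entries are illustrative assumptions, not a prescribed format; the one-hour figure for the telephone is the only number taken from this article.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RiskEntry:
    """One written-down line of a risk assessment (field names are illustrative)."""
    technology: str
    affects_business: bool                    # answer to the Yes/No Question
    tolerable_outage_hours: Optional[float]   # answer to the Quantity Question
    action_plan: str = ""                     # what is supposed to happen next
    contacts: list = field(default_factory=list)  # whom to call


def most_urgent_first(register):
    """Keep only the 'yes' entries, shortest tolerable outage first.

    An entry with no recorded tolerance is treated as zero tolerance
    (an assumption: unknown is handled as urgent).
    """
    relevant = [e for e in register if e.affects_business]
    return sorted(
        relevant,
        key=lambda e: e.tolerable_outage_hours
        if e.tolerable_outage_hours is not None else 0.0,
    )


# Hypothetical example entries.
register = [
    RiskEntry("Accounts PC", True, 72.0, "Swap in the spare machine"),
    RiskEntry("Telephone", True, 1.0, "Report fault from mobile", ["telco faults line"]),
    RiskEntry("Office radio", False, None),
]
ordered = [entry.technology for entry in most_urgent_first(register)]
```

Sorting by tolerable outage simply puts the technologies that hurt soonest at the top of the list, which is a reasonable place to start planning.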

In the following sections, I will run through a list of what I consider to be critical technologies, although not all will apply to all businesses. This list is not intended to be exhaustive but exists to give readers a starting point in performing risk assessments of the technology in their specific businesses.

The Telephone

Whilst there may be businesses out there that still do not have a computer (I have visions of people sitting at high desks, wearing fingerless gloves and half-moon glasses, writing with quill-pens in heavy ledgers,) very few will not have a telephone.

The Telephone Yes/No Question

As regards telephones, I cannot see the Yes/No question ever returning a ‘no.’ I make very little use of the telephone myself but it is an essential tool for When Things Go Wrong. Anyone who thinks that their business would not be affected by the loss of a telephone service should be asking exactly how they intend to call the fire service when their premises are burning down.

The Telephone Quantity Question

How long a business can operate effectively without a telephone depends on the nature of that business. I would not be comfortable knowing that I had no telephone service for over, say, one hour; the next thing to go might be my Internet connection – how would I call my ISP?

For any business where the telephone is a major means of communication with clients, any downtime is bad.

The Telephone – Discussion

Landlines

As we are starting off by looking at one of the most mature of technologies in use, let’s consider first the most mature of telephone technologies: the landline. As there may be businesses that do not have computers, there may be businesses that do not have mobile telephones. Strive not to be one of these because you need some means to call for service when the landline stops working. (Anyone thinking “oh, but we’ve got 10 lines” should be made aware that a backhoe can take out a 40-pair cable just as easily as a 4-pair cable.)

PABXs

If the business in question has a PABX, it should have a service contract for it. (Please tell me it has a service contract!) The answer to our Quantity Question should be used when negotiating the guaranteed response time for the service contract. If the answer is zero time, the minimum response time should be chosen.

–> Important Bit <–

Or should it? If the contract cost with the minimum response time sounds a bit steep, a little more thought is required. The cost of the outage (loss of business, etcetera,) should be weighed against the cost of the contract. Customer expectations should also be borne in mind as a part of this process. This is an important decision for the business owner and should not be undertaken lightly. This decision-making process applies not just to PABX service contracts but all business technology service contracts and Service Level Agreements (SLAs) for online services such as web hosting, too.
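
The weighing-up described above can be sketched as a little arithmetic. This is a deliberately crude model: it assumes an outage lasts exactly as long as the contracted response time and that losses accrue at a flat hourly rate. All function names and figures are hypothetical, and customer expectations are not captured at all.

```python
def expected_annual_outage_cost(outages_per_year, hours_per_outage, loss_per_hour):
    """Rough yearly cost of downtime under a flat hourly loss (a simplification)."""
    return outages_per_year * hours_per_outage * loss_per_hour


def premium_is_worth_it(extra_contract_cost, outages_per_year,
                        slow_response_hours, fast_response_hours, loss_per_hour):
    """True if the faster contract saves more than the extra it costs per year."""
    slow = expected_annual_outage_cost(outages_per_year, slow_response_hours, loss_per_hour)
    fast = expected_annual_outage_cost(outages_per_year, fast_response_hours, loss_per_hour)
    return (slow - fast) > extra_contract_cost


# Hypothetical figures: two outages a year, 24-hour versus 4-hour response,
# and an extra 1,500 per year for the faster contract.
busy_shop = premium_is_worth_it(1500, 2, 24, 4, loss_per_hour=100)
quiet_shop = premium_is_worth_it(1500, 2, 24, 4, loss_per_hour=20)
```

With these made-up numbers the faster contract pays for itself for the business losing 100 per hour but not for the one losing 20 per hour, which is the whole point: the answer depends on the business, not on the contract.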

Final Word on PABXs

Things may be different nowadays, especially if the telephone service is provided over fibre; however, traditional PABXs used to have ports for ordinary, analogue, handsets to be plugged in to provide a service in the event of power failure. If you have a PABX, find out if it has such a port (or ports) and get a handset connected for emergencies if one is not already fitted.

Mobiles

Landline handsets tend to be rather hard to lose and are reasonably robust (decent business handsets, at any rate.) Mobile handsets, on the other hand, are horribly easy both to lose and to break. I have two pieces of advice for the mobile ‘phone user to help mitigate risk:

  • Buy a USB SIM card adapter and software. These are very cheap and allow the contents of the SIM card to be backed up to a computer. Make backups regularly, especially if you add new numbers to your phone book on a regular basis. (Make sure that numbers are always saved to SIM, not to phone.)
  • Have a cheap, spare, handset that you can put your SIM card into in the event of the phone taking a tumble, a ride in the washing machine, or whatever. My SIM has survived the death of several handsets, including Death by Washing. My spare handset has a pay-as-you-go SIM card in it; should the main handset be lost or stolen, I can still make calls.

I know very little about smartphones and do not aspire to own one. However, a smartphone is just a portable computing platform. Computers should be backed up. Check with your vendor to find out how.

Computer Hardware

Computer Hardware Yes/No Question

After some consideration, I am unable to think of a scenario where a business has a computer or computers but can work quite happily without them. On the strength that anyone reading this article is doing so using a computer (rather than having a secretary print off a hard copy to avoid touching that Devil Machine,) I will, as with the telephone, assume that we will be looking at a ‘yes’ response here.

Computer Hardware Quantity Question

This question is where I would expect to see a bit more variance in answers. A business that only uses a computer to run accounts once a week would probably be somewhat more comfortable with an outage than, say, myself. (I am a developer; no computer = no work. It takes a genius like the late but amazing Ada Lovelace to write software before the computer has even been built.)

As the computer is such a fundamental and critical component of my business, I will detail what I do to keep myself in operation.

Computer Hardware – Discussion

If the computer is a key tool in a business, the simple fact is that a spare should be available or some guaranteed means of laying hands on another one quickly. Not only does the spare machine need to be available quickly, it also needs to be ready to do what the regular one does (or did in the event of a failure) – any software used should be installed, it should be set up to work with the office network, etcetera.

Desktop Machines

Thinking about desktop machines, if someone in the organisation is any good with hardware, a set of spares can be carried for emergency repairs. (If several computers are involved, it helps if they are the same make/model or at least that spares are interchangeable.) A spare power supply and hard disc should be carried at the very least. The simplest approach, however, is to have an entire machine into which we can swap the hard disc (assuming this hasn’t died) from a defunct machine, or cannibalise for parts. (Also consider having a spare keyboard, mouse, monitor to hand – although most businesses seem to accumulate these in the course of upgrades.)

Where is the data used by the desktop machine stored? If it is on a server and the user has been disciplined to not save files to the local disc, swapping the machine out with another pre-loaded with the required software should be quick and simple. If, however, files are stored on the local machine, a second, mirrored, hard disc (RAID 1) should always be employed if the machine is mission-critical.

Note that repairs/replacement could be effected by someone outside the business if they were known to be able to attend quickly. However, consideration should always be given to the fact that the critical person may not be available due to whatever reason. Contingency plans should always be made to cover this eventuality.

Laptops

Laptops are far less easy to repair than desktops. Keeping just-in-case spare parts is far more expensive than for their desktop brethren. Furthermore, laptops are easy to drop, steal, spill coffee in (far worse than spilling coffee on a desktop keyboard,) and generally give a hard time.

If, like me, the primary machine is a laptop, a spare is needed. This is probably the point where some readers will be saying “argh, expensive! I can’t afford that!” I would ask those readers to put a cost on the work that they will not be able to do without the spare.

The spare laptop need not be the same as the main one; it just needs to have the same software installed and be configured in a compatible manner. It can be clunky and slow so long as it is up to the task. I run a large, desktop-replacement ThinkPad as my primary. It does a great job, but is only portable in a fairly loose sense of the word. My secondary/backup is a little Vaio; it has a somewhat smaller screen but is very portable. It was also quite cheap.

Only one laptop ever leaves the house – the Vaio. As this puts it into the Getting Stolen risk category, the hard disc is completely encrypted. (My machines hold sensitive client data; I have a duty of care to my clients to ensure that their data never ends up where it shouldn’t.) When at home, I keep the two machines synchronised after every file save. (I do this using version management software – a topic which exceeds the scope of this article but which I mention for the sake of those who might be curious and wish to investigate further.) So, when coffee hits keyboard, ignoring the repair bill, things are not so disastrous.

Oh, and a spare for a laptop can always be a desktop; it might prove a bit tricky to go walkabout with it though. If portability is not an issue, it could save a few $$$.

Networking Gear

I have experienced about as many failures of networking equipment – modems, routers, hubs/switches – as I have of actual computers. As with computers, carry spares. If your business has a $5,000 managed hub, have a little $70 one to tide over essential services when it goes “pfft!” I have a spare Ethernet switch to hand (an old one that I upgraded) and a ready-configured ADSL router/wireless access point. Total cost: $150.

Note that network cables tend to suffer all sorts of abuse – having a couple of spares in the drawer could just help save the day.

Contracts

My approach in the Computer Hardware section has assumed small to medium businesses which look after their own hardware requirements. An alternative, especially when dealing with expensive servers, is to have a maintenance contract. Maintenance contracts are just as much for sole traders as they are for large corporates. My points made in the PABX section regarding response times/SLAs apply in this context too.

With computer hardware services, there are a large number of fly-by-night operators (they exist in the telecomms sector, too.) Anyone considering a contract should look carefully at who will be delivering the service. My inclination would be to buy only from the Big Names such as Dell, IBM, HP/Compaq, Sun if any form of maintenance contract is required.

For those who particularly want to deal with a smaller operator, go ahead – but ensure that second and third smaller operators are also identified for when the first choice cannot/does not deliver.

What About Apple?

I am not an Apple user (apart from my iPod;) this section was written with PCs in mind but all concepts still apply. Vendors should be consulted regarding maintenance contracts and the like.

Network Services

In this section I will be discussing that all-important tool, the Internet connection, along with e-mail, web hosting and this thing they call The Cloud. Now, I’ve already given two examples about the Yes/No question and the Quantity Question; for this section I will leave these as an exercise for the reader and launch straight into some critical network services, the risks and how they might be mitigated.

Internet Connection

Readers may have noticed a theme through the discussion so far – critical technologies require some form of backup. (Readers who have not noticed this are invited to have another coffee before re-reading this article 😉) Internet connections – if mission-critical – should have some form of backup just like all the other technologies mentioned so far. Assuming that the main Internet connection is coming in over a telephone line – either ADSL or a private pair (older technology) – mobile broadband makes a logical backup solution. However, there are limitations:

  • Mobile broadband is not available everywhere
  • Mobile broadband can be slow (it hardly deserves the epithet ‘broadband’)
  • It might not be possible to plug it straight into an existing network (some routers can accommodate this though)

My advice with regards to backing up Internet connections for those of a non-technical nature is simple: talk to the ISP providing the main service. If this ISP cannot assist with a backup service, it may be worthwhile shopping around for another ISP that can.

E-mail

There are many different types of e-mail service (Amanda Gonzalez has written this simple guide at Flying Solo,) each with its own risks. The three main risks that an e-mail system presents are:

  1. Not being able to send/receive e-mails
  2. Losing sent/received e-mails
  3. Losing address books

A few tips/points regarding e-mail:

  • The safest e-mail service is probably a hosted one where availability of backups and an SLA are guaranteed by contract.
  • Personally, I like IMAP; I run (and back up) my own mail servers. My entire IMAP folder structure is copied to a second server in my office and also a server in the USA on a daily basis. IMAP also makes it convenient in that I can access my mail from either laptop at any time.
  • The risk of data loss with POP may be mitigated by backing up the appropriate folder(s) on the computer used to access mail on a regular (daily or more frequent) basis.
  • Unless using an enterprise mail system (GroupWise, Exchange, etcetera) where address books are a server function, address books for IMAP/POP mail clients need to be backed up.
  • Free e-mail services can provide a handy secondary/backup for regular e-mail services. Address books from primary services should be synchronised to secondary services on a regular basis.
  • I would discourage the use of any free e-mail services for mission-critical applications. When paying for a service, the provider has a contractual obligation to make sure that things work; with free services, it is a gamble. (I have seen enough instances of outages, compromised (hacked) systems and user data loss in free e-mail services to recommend them only as secondary/backup systems.)
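
The regular backup of a local mail folder mentioned in the tips above can be sketched as follows. This is only one of many ways to do it: the function name and the archive naming scheme are assumptions, and the location of the mail folder depends entirely on the mail client in use. The backup directory should live outside the mail folder (ideally on another disc or machine), otherwise the archive would end up inside the thing being archived.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def backup_mail_folder(mail_dir: Path, backup_dir: Path,
                       day: Optional[date] = None) -> Path:
    """Write a dated .tar.gz copy of mail_dir into backup_dir and return its path.

    Running this twice on the same day overwrites that day's archive.
    """
    day = day or date.today()
    backup_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends the ".tar.gz" suffix itself.
    base_name = backup_dir / f"mail-{day.isoformat()}"
    archive = shutil.make_archive(str(base_name), "gztar", root_dir=str(mail_dir))
    return Path(archive)
```

Scheduling the script (cron, Task Scheduler or similar) is left to the reader; a backup that relies on somebody remembering to run it is a risk in its own right.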

Web Hosting

Here are a few points to consider when assessing the risks of web hosting:

  • SLA – 99.99% guaranteed uptime sounds great. But is that per year or per month? Lose a 9 there and that’s just under 9 hours in a year. Examine these figures very carefully.
  • Hosting providers (especially the cheaper ones) often perform scheduled maintenance without warning customers. How critical is uptime – is this an issue?
  • Overseas hosting providers often perform scheduled maintenance during the night – which might be in the middle of business hours elsewhere. Could this present an issue?
  • If a hosting provider is also handling DNS and/or registration for a domain, it may be very hard to move to another provider in the event of the first provider going broke (doing a runner, turning ‘funny,’ etcetera; I’ve heard them all.)
  • Always have a hosting contingency plan should it prove necessary to move a site in an emergency.
  • Remember that ftp is not a secure protocol. Personally, I would not use a hosting provider that used ftp with plaintext user name/password logins for any site that handled sensitive (personal, financial) data. ftps (encrypted ftp) should really be the minimum standard.
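
The SLA arithmetic in the first point above is easy to check for yourself. The sketch below converts an uptime percentage into permitted downtime for a given measurement period; the function and constant names are made up for illustration, and a month is taken as one twelfth of a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24               # 8760, ignoring leap years
HOURS_PER_MONTH = HOURS_PER_YEAR / 12   # 730, an average month


def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime permitted by an uptime guarantee over one period."""
    return period_hours * (1 - uptime_percent / 100)


four_nines_year = allowed_downtime_hours(99.99, HOURS_PER_YEAR)
three_nines_year = allowed_downtime_hours(99.9, HOURS_PER_YEAR)
four_nines_month = allowed_downtime_hours(99.99, HOURS_PER_MONTH)
```

Over a year, 99.99% allows a little under 53 minutes of downtime and 99.9% allows about 8.76 hours. Measured per month, 99.99% allows only about 4.4 minutes, so a single 50-minute outage would sit within a yearly 99.99% guarantee but breach a monthly one.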

The Cloud

Readers are likely to have been hearing much buzz of late regarding ‘The Cloud.’ The main thing to understand about Cloud Computing is that, rather than having software installed on my computer, I run software on another computer (or computers) somewhere else.

It is at this point that I should disclose that I am a self-confessed Cloud Skeptic. Whilst I can see the many benefits and possibilities of Cloud Computing, I am very much aware of the risks that come with this technology and which need to be addressed before the business world becomes over-reliant on it.

Web Applications – There Rather Than Somewhere

Here I am, a web applications developer, saying that The Cloud is risky. Is this not an odd thing to do? No – and for two reasons:

  1. I constantly analyse the risks of my own business
  2. I make a distinction between the applications I write and host in known physical locations and applications running somewhere (anywhere.) I run Virtual Private Servers (VPS) for myself and my clients; these are located in data centres I have specified. If I were to ask my provider, they could even send me a photo of the physical machines the VPSs are running on. With a Cloud-hosted application, I just have to be content with it running ‘somewhere.’

My concern over Cloud-hosted applications is that the systems required to produce server instances ‘somewhere’ are far more complex (and immature – and I’ll cop some flak for saying this) than those required to deliver a Virtual Machine on that computer over there. –> *points*

Internet Connection

No, this is not an inadvertent copy and paste from earlier on in this article. If I run software – say a word-processing package – on my computer and my Internet connection fails, I can carry on using it. However, if my word-processing package is actually running as a service somewhere in Cloud-Land, whoops – it’s gone. The Internet connection thus becomes the weakest link in the business for which provision needs to be made accordingly – such as a means of being able to work offline.

Use The Cloud, by all means – just be prepared.

Conclusion

If all that technical detail has readers reeling, not to worry! I will now summarise the entire article in three bullet-points:

  • Technologies on which a business relies present risks.
  • For each technology used by a business, an assessment should be made as to whether it presents a business risk and, if so, to what degree.
  • Action should be taken for each identified risk which may include:
    1. Acquiring backup equipment
    2. Taking out support contracts
    3. Identifying alternative vendors
    4. Documenting plans on how to respond to a risk becoming an incident

Other Stuff

Likely as not, if looking at business risks for the first time, readers might be starting to think that they extend far beyond the technology risks I have discussed. I will, therefore, leave you with some further avenues of thought:

  • Infrastructure – power, water.
  • Premises – where to relocate?
  • Key staff – should more than one person understand their role?
  • Work vehicles – alternatives when off the road?
  • Zombie attack; seriously. Zombies only exist in the movies (and my office, before my first espresso,) but analysing the risks of a hypothetical, if fictional, scenario may identify gaps elsewhere.

Phew, finished! It’s a lot easier to do risk assessments than to tell other people how to do them. Hey, wait, is this thing still recording?

Welcome to the Wheeling Gourmet!

I am pleased to announce the release of a new food site, The Wheeling Gourmet by friend and former chef, Nicolas Steenhout.

The Wheeling Gourmet will have a constantly growing set of recipes, cooking lessons, tips and tricks, and food blog posts.

Whilst there is no specifically gluten-free material at the time of writing, Nicolas tells me that this is an area he will be exploring and developing over the next few months. (I will be sure to keep reminding him of this!)

Nicolas is running a Twitter account for the site: @WgChef in addition to his regular @vavroom account.

Smiffy’s (Gourmet) Bacon and Egg Pie

Preamble

When I was a child, I used to be fed something called bacon and egg pie. I later heard this referred to as quiche, which sounded a bit posh and nobby for such a humble dish. The recipe presented here follows the bacon and egg pie/quiche concept but is gluten-free, having no pastry crust, and fairly low-carbohydrate, for the same reason.

This dish comprises two parts:

The Filler

Bacon, onion, and other vegetables are chopped fairly finely and fried in olive oil. This could be a full-blown ratatouille, if you fancy. The important thing here is to cook until nearly all the water has evaporated; we do not want a wet filler to go with our egg mixture.

I have made two versions of this dish – one with fresh tomato in the filler; the alternative is to add tomato paste to the egg mixture. If using fresh tomato, just observe the caveat of getting rid of the water.

The Eggs

To fill my pie dish, I require 8 eggs, about 100g of goat milk yoghurt and a generous pinch of salt. Yes, other types of yoghurt may be used, but a good goat milk yoghurt lends a savoury flavour that others cannot. Yoghurt should be either set or fairly thick – watery yoghurt will have a potentially adverse effect on texture.

Egg/yoghurt mixture is beaten thoroughly with a hand-whisk. A mild but flavoursome, hard cheese is grated into this mixture (I use the goat cheese, Chevrette.) The cooked bacon and vegetables are now stirred into this mixture. If using tomato paste, this goes in now.

Cooking

Mixture is transferred to a greased pie dish and into the oven. In my fan oven, I cook this at 175° Celsius. This dish should rise as it cooks, from the outside in. Once it has risen evenly to the middle, it is cooked.

Serving Suggestions & Variations

Serve with chips and/or salad, or just on its own. Goes with a wide variety of wines or dry cider.

If you want a meat-free variant, leave out the bacon and use a stronger-flavoured cheese to maintain the savoury nature of the dish.

Whilst the dish is risen and light (at least mine turns out that way,) separating the eggs and beating the whites separately before whisking all together should make it even lighter.

Enjoy!

Getting Better 2009

The Story So Far

Executive summary: I’ve been sick for a while. (End of summary.) I have reported on this at various times in this journal; in February 2007, which would have corresponded fairly closely to the nadir of my health issues, I described a day in the life of (me.)

Skip on a couple of months and I started – albeit slowly – weight training. The Weighty Matters section of this site (includes this article) documents my weight training progress.

Now, 2 years and 10 months from that low point, things are much better – and I am talking quantum levels better, from my rather subjective viewpoint, at any rate.

The Story Today

I think that my work capacity speaks volumes. At the beginning of this year, I was able to work about 12-14 hours per week, never more than 4 hours a day (on a really good day) and never 2 consecutive days. At the time of writing, I am able to work in excess of 20 hours per week, sometimes up to 6 hours per day (best days) and up to 2 days consecutively.

As regards lifting weights, improvements during this year include some of my troublesome joints ceasing to be so (and for the first time in my life,) and breathing issues that hindered exercises with heavier weights, far more than the weights themselves, having resolved.

Where I Am With Weights

My weight training progress seems to have been the most constant of improvements. This is a good thing as it creates motivation (and something to feel good about in general) through positive feedback.

Primarily due to elbow issues – the last of my joint troubles to persist – and a weakness in my thoracic spine due to a probable combination of scoliosis and an old horse-fall, I have simplified my workout cycle to something resembling the following:

  • Bench press
  • Variants on the lat pull
  • Squat
  • Bench press
  • Variants on the lat pull
  • Deadlift

Including rest days, this constitutes about 10 days to 2 weeks of workouts. I tend to avoid using the same intensity of exercise for more than two sessions of a specific workout.

Numbers-wise, I regard workout-weight to be that which can be performed for 6-8 repetitions, expressed in terms of bodyweight. (My bodyweight, that is!) Squats are at 1.75, deadlifts at 1.46 (with hooks – wrists still a bit dodgy,) bench press lagging at 0.92, with 5 reps at 0.97.

My most significant, recent, milestones in terms of absolute weight were reaching what had for long seemed the unobtainable 4+ reps at 100kg (after months in the 90s,) and achieving 4 sets of 5-rep deadlifts at 130kg without hooks. Meaningless milestones to most, but morale-boosters for me. And yes, I can be a bit numbers-obsessive; I have a terrible compulsion to count things – wonder if it means I’m a vampire?

Quantum Sleep-Leap

My health improvements have been due to factors both within and beyond my control. In the list of ‘beyond my control,’ I include things which I could control, but of which I am unaware. It turned out that my sleep quality was one of these things.

After 4 years of CPAP therapy, I thought that my sleep apnoea issues were well and truly nailed. So did my respiratory/sleep physician and the rest of my medical team. At my last appointment, my physician suggested a machine upgrade as the old one was rather old and I had no backup should it fail. (No machine = no sleep. Or at least, no sleep that would count.) I followed up on this advice and acquired a new machine and mask.

Within a week of starting with the new equipment, I had far more energy and my work capacity shot up. Why? Despite 100% CPAP compliance and despite data downloaded from the machine suggesting that all was well, there was more to the problems of sleep than just apnoeas. The discomfort of the mask – possibly designed by a member of the Spanish Inquisition – and the noise of the old machine appeared to have been giving me really lousy sleep-quality. I just wish I had known that a long time ago.

CPAP therapy isn’t the scary thing that it may seem to those who think they may need it or are about to start on it. You need it, you use it, you get used to it. At least I do/did. Having the extra bag when travelling and trying to find a power point near hotel beds can be a bit of an inconvenience, but that’s life. It would be far worse without it.

Keep Taking The Medicine

One of the key players in my improved health has been my doctor (GP.) Lots of diagnostics, trialling medications (mostly hormones,) and regular reviews have finally hit a relatively sweet-spot. Whilst I’m still well below-par when compared with that elusive beast, the population average, I am very pleased with where I am and what I can do compared with that time nearly 3 years ago.

Although I’m not writing the prescriptions or ordering the blood tests, the business between my doctor and myself is very much a factor within my control. Collecting and presenting relevant information and complying with a doctor’s suggestions (including taking medications) is very much the patient’s responsibility. Doctors can’t work miracles.

Despite the best of intentions in both directions, things can go wrong. I have experienced a few occasions where everything started going wrong – fatigue started to get worse again, the weird eye problems that limit my work came back – almost like forgetting to take medications. It transpired that, on each of these occasions, there was a common factor: stale medication. A crucial, heat-sensitive, medication (hydrocortisone,) short-dated and sourced through a rural pharmacy which has no cold-chain deliveries seemed to be the culprit. Changing the source to an online supplier – which incidentally seems to have much longer-dated batches – and ordering for delivery during cooler weather has eliminated yet another unanticipated, external factor.

In addition to medication, I have also received several treatments from a massage therapist. This has certainly helped with various skeleto-muscular issues, which has helped with lifting weights. (Having a massage therapist who understands weight training is a big advantage.)

Lifestyle Management

That which isn’t from my doctor or lifting weights falls under the catch-all of Lifestyle Management. This I will break down into two sections: Eat Well and Don’t Overdo It.

Eat Well

I’m not going to give a nutrition lecture unless it’s “don’t eat processed food.” I tend to break my own rule here by eating the occasional protein bar.

Don’t Overdo It

You may need to push yourself a bit to get moving but, once moving, don’t keep pushing – you may end up crashing to a halt.

One of the most important lessons that I have learned through the time of my incapacity is to gauge my limits and work within them – whether working, lifting weights, gardening or anything else. I won’t pretend otherwise – this can be incredibly frustrating; stopping a job half-way through because you’ve reached your fatigue threshold may be very hard. But then so is the crash from over-doing it, and equally frustrating is the week that you are unable to work due to poor body-management. I have been there many times. I think that I am finally getting the message and no longer exceeding my limits.

Conclusion: My Message

  • This article is all about me. If you see yourself or somebody you know in here, take heart, you are not alone.
  • Chronic fatigue is probably one of the biggest cop-out diagnoses being made by doctors in this day and age.
  • Don’t let your doctor write you off; if it looks like they’re going to, write them off first and find one who really cares and wants to help you.
  • Sick people can lift weights – and doing so with care can help make them less sick.

Stupid Disclaimer

I am not a doctor, this article doesn’t constitute medical advice. If you want medical advice, go to a healthcare professional.

Extensible Metadata for Your CMS

Preamble

I am a metadata enthusiast, especially when it comes to Dublin Core. When it comes to the Web, I don't just want to see metadata for pages, I want to see metadata that conforms to a formal vocabulary (like Dublin Core.) A quick read of my article Metadata, Meta Tags, Meta What? may help the reader get up to speed on this.

Content Management Systems (CMS) can provide a perfect framework for the creation, maintenance and presentation of metadata. Unfortunately, for most CMS software, this functionality is limited – often to informal, 'legacy,' terms – if it exists at all.

In my ideal world, a CMS would provide a ready-to-use means of associating Dublin Core metadata with all pages and be extensible so that the vocabulary could be extended or extra vocabularies added. Compared with some CMS functionality, this is not something that is difficult to achieve, so I can only assume that the general lack of implementation speaks of a total lack of interest in metadata on the part of the CMS developers.

Some time ago, I presented a set of notes on how to achieve this to a developer working on the Mambo CMS. This work never came to anything at the time as the project forked shortly thereafter and said developer left the project. Subsequent to this, I started working through my notes to produce an extensible metadata extension for the Drupal CMS and also described a toolkit that could be used to work with other CMS. Due to ill-health and lack of time, neither of these bore fruit.

The only progress I have really made on this to date has been in advising the developer of mojoPortal on my metadata concepts; a Dublin Core implementation for mojoPortal is being worked on at the time of writing.

Now, some three years on, I will try to make amends through this article by describing my concepts for adding an extensible metadata management system to a CMS.

I will attempt to keep this article as technology-neutral as possible by describing only the SQL table schemata and queries required to implement the system. However, it should be borne in mind that I am writing from a MySQL perspective and that changes may be required if working with other database technologies – especially when it comes to stored procedures.

One assumption that I am making, which is key to the whole concept, is that every page in the CMS is identified internally by a unique integer field. In the Drupal CMS, this would be the Node ID (nid.) If some other system is employed, a lookup table may need to be employed to implement my concepts.

The Simple, Inflexible Approach

To add metadata functionality to our CMS, we first need to extend the database schema. We could do this either by adding new fields to the table where we store our page content or by creating a new table where we can store our metadata.

Our extended table or new table can have a column for every term. This keeps queries and management very simple – but is highly inflexible as adding terms would require modification of the table schema and the queries that relate to it. I find this approach somewhat distasteful – using a flat and fixed data structure when we have the power of a relational database to work with.

Key Metadata Concepts: Triples, n-Tuples

Metadata are data describing data. In the Web context, metadata are various pieces of data that describe properties of a page or media object.

In its simplest form, a metadata statement comprises three elements: the thing we are talking about, the property we are describing, and the value of that property. This set-of-three may be described as a triple or 3-tuple.

Consider the following example of the 'legacy' description metadata element:


<meta name="description" content="an article about metadata" />

Do you see the three elements of the triple? No; that's confusing, isn't it? This is because we are presenting the metadata on the page we are describing; the name attribute of the meta element tells us the property we are describing, the content attribute the value of that property; the subject – the thing we are talking about – is implied. (Those who deal in the grammar of human languages may wish to compare this with the concept of an imperative sentence, where the subject is implied rather than expressed. The name and content attributes thus form the predicate of that sentence.)

Now, who says we can only present metadata about a page on that page? Nobody. If we are storing this metadata in our CMS, we can present it elsewhere, such as in an external RDF file. Our store of metadata may be used to create a library-catalogue of our entire site.
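To make the triple a little more concrete, here is a minimal Python sketch. The names Triple and render_meta are my own inventions for illustration and do not belong to any CMS. It holds the subject explicitly, even though the meta element leaves it implied.

```python
from html import escape
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical record for one metadata statement."""
    subject: int   # unique ID of the page being described
    term: str      # the property we are describing, e.g. "description"
    value: str     # the value of that property


def render_meta(triple: Triple) -> str:
    """Render a triple as a 'legacy' meta element.

    The subject is not written out: on the page itself it is implied.
    """
    return '<meta name="{}" content="{}" />'.format(
        escape(triple.term, quote=True),
        escape(triple.value, quote=True),
    )


example = Triple(subject=1, term="description", value="an article about metadata")
print(render_meta(example))
```

Because the subject travels with the record, the same triple could just as easily be written to an external file that names the page explicitly.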

For this simple case, where our metadata can be represented by triples, we might create a database table like this to accommodate it. (Note that more detailed descriptions of fields will be given for the "real-life" schema later in this document.)


/*
Metadata is stored here.

subject - unique ID of page we are describing.

term - refers to metaterms.term; we look up metaterms.term_name
to find the value that goes in the name attribute of the meta
element.

termvalue - what goes in the content attribute of the meta element.
*/
create table metadata
(
subject int unsigned not null,
term int unsigned not null,
unique index(subject,term),
termvalue text
);

/*
Terms are stored here.
*/
create table metaterms
(
term int unsigned not null auto_increment primary key,
term_name varchar(64)
);

/*
Set up some terms.
*/
insert into metaterms (term_name) values ('description'), ('keywords');

/*
Now create description and keywords records for our page which
has unique ID of 1.
*/
insert into metadata (subject, term, termvalue) values
(1,1,'an article about metadata'),(1,2,'metadata; Dublin Core; blah blah;');

See? All nice and simple for triples.
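As a rough check of the simple schema, the sketch below loads it into an in-memory SQLite database and reads the triples back for one page. SQLite is only a stand-in here, as I am assuming nothing about your own database, so the column types are adapted from the MySQL originals. The function name triples_for_page is hypothetical.

```python
import sqlite3

# SQLite stand-in for the MySQL tables above; types adapted, names unchanged.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table metaterms (
        term integer primary key autoincrement,
        term_name varchar(64)
    );
    create table metadata (
        subject integer not null,
        term integer not null,
        termvalue text,
        unique (subject, term)
    );
    insert into metaterms (term_name) values ('description'), ('keywords');
    insert into metadata (subject, term, termvalue) values
        (1, 1, 'an article about metadata'),
        (1, 2, 'metadata; Dublin Core; blah blah;');
    """
)


def triples_for_page(page_id):
    """Return (term_name, termvalue) pairs for one page, ordered by term ID."""
    rows = conn.execute(
        "select t.term_name, m.termvalue "
        "from metadata m join metaterms t on t.term = m.term "
        "where m.subject = ? order by m.term",
        (page_id,),
    )
    return rows.fetchall()


for name, value in triples_for_page(1):
    print(name, "=>", value)
```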

Dublin Core Complicates the Issue

Let's have a look at a couple of meta elements containing Dublin Core metadata assertions:


<meta name="DC.title" lang="en" content="Extensible Metadata for Your CMS" />
<meta name="DCTERMS.created" scheme="DCTERMS.W3CDTF" content="2009-12-05" />

Our DC.title has an extra property, lang, and DCTERMS.created has an extra property, scheme. This somewhat complicates matters and means that the triple is no longer capable of holding all the bits we need. We are now moving up into the world of the n-tuple (a triple is a tuple with 3 components; an n-tuple is a tuple with n components.) Our triple, or 3-tuple, has now become a 6-tuple.

If you are now wondering how I came up with a 6-tuple, let's have a count:

  1. Subject (this page, implied)
  2. Vocabulary – the first part of the name attribute. From our example, this is either DC or DCTERMS.
  3. Term name – the second part of the name attribute.
  4. Scheme
  5. Language
  6. Value of the content attribute.

So, the mysterious extra member of the n-tuple occurs because we are overloading the name attribute of the meta element.
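The count above can be sketched in Python by splitting the overloaded name attribute at its first dot. The Assertion record and the to_six_tuple function are hypothetical names of my own, offered only as an illustration.

```python
from typing import NamedTuple, Optional


class Assertion(NamedTuple):
    """Hypothetical 6-tuple for one Dublin Core assertion."""
    subject: int
    vocabulary: str
    term: str
    scheme: Optional[str]
    language: Optional[str]
    value: str


def to_six_tuple(subject, name, content, scheme=None, lang=None):
    """Unpack the overloaded name attribute into vocabulary and term.

    "DC.title" becomes ("DC", "title"). A name with no dot is left whole
    in the vocabulary slot and the term comes back empty.
    """
    vocabulary, _, term = name.partition(".")
    return Assertion(subject, vocabulary, term, scheme, lang, content)


print(to_six_tuple(1, "DC.title", "Extensible Metadata for Your CMS", lang="en"))
print(to_six_tuple(1, "DCTERMS.created", "2009-12-05", scheme="DCTERMS.W3CDTF"))
```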

Our database structure just got a bit more complicated. How much more complicated is up to the developer; we can either stand up as purists and use a fully relational model, or we can cheat, simplify things and hope they don't come back to bite us. If we plan things carefully and consider the scenarios in which we are going to use our CMS, hopefully being bitten by the results of Bad Decisions will not be amongst our worries.

The Fully Relational Method

This method is actually not quite fully relational. I have cheated a little, even here, to make metadata searches a little more efficient. Let's have a look at the new schema of our metadata table:

Metadata Table


create table metadata
(
subject int unsigned not null,
index(subject),
termid int unsigned not null,
index(termid),
scheme int unsigned not null,
index(scheme),
lang char(8) not null,
index(lang),
termvalue text,
fulltext(termvalue)
);

Metadata Table Fields

subject
The unique ID of the page in question (eg: nid for Drupal.)
termid
foreign key – refers to the primary key of the metaterms table, described below.
scheme
foreign key – refers to the primary key of the schemes table, described below.
lang
The language of the content of the meta element (eg: EN-US, FR, DE, etc.)
termvalue
The actual value of the content attribute of the meta element.

You will note that this table does not have a field to store the vocabulary. This is not necessary as this may be looked up from the metaterms table.

The columns scheme and lang are designated NOT NULL for purposes of indexing. As values for these are not always present, we would populate them with 0 (zero) and the string 'NULL' respectively when no values are given. The software generating the meta element for the HTML document would skip creation of the respective attributes if these placeholder values were found.
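A minimal sketch of that skipping logic follows, assuming the placeholders 0 and 'NULL' described above. The function name render_dc_meta and its parameters are hypothetical, and the scheme name is assumed to have been looked up from the schemes table already.

```python
from html import escape

NO_SCHEME = 0      # placeholder stored in metadata.scheme
NO_LANG = "NULL"   # placeholder stored in metadata.lang


def render_dc_meta(name, scheme_id, scheme_name, lang, content):
    """Build one meta element, omitting scheme and lang when they hold placeholders."""
    attrs = [("name", name)]
    if scheme_id != NO_SCHEME and scheme_name:
        attrs.append(("scheme", scheme_name))
    if lang != NO_LANG:
        attrs.append(("lang", lang))
    attrs.append(("content", content))
    body = " ".join(
        '{}="{}"'.format(key, escape(value, quote=True)) for key, value in attrs
    )
    return "<meta {} />".format(body)


print(render_dc_meta("DC.title", 0, None, "en", "Extensible Metadata for Your CMS"))
print(render_dc_meta("DCTERMS.created", 18, "DCTERMS.W3CDTF", "NULL", "2009-12-05"))
```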

Metaterms Table

The metaterms table is where we define all the metadata terms that we can use.


create table metaterms
(
termid int unsigned not null auto_increment primary key,
vocabterm varchar(32) not null,
unique index(vocabterm),
vocab int unsigned,
defscheme int unsigned
);

Metaterms Table Fields

termid
The primary key for this table.
vocabterm
The value of the name attribute of the meta element. It is here that a bit of "cheating" takes place. You will recall that the name attribute of the meta element is overloaded by combining both vocabulary and term, as in DC.title. The metaterms table would have a field that contains just the term – at least it would if we were doing things nicely. For the sake of efficiency, however, the vocabterm field contains the same vocabulary+term value that appears in the name attribute of the meta element. (The alternative would be to look up the vocabulary [the DC part of DC.title] from the vocabs table.)
vocab
foreign key – refers to the primary key of the vocabs table.
defscheme
foreign key – refers to the primary key of the schemes table; this is the default scheme for this term. If we want our system to be flexible, we should let the user override this on a per-use basis, if they so wish.

See Appendix A for a dataset that can be used to pre-populate this table.
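The per-use override of defscheme might be resolved along these lines. This is only a sketch: effective_scheme is a hypothetical helper, and I am assuming that 0 is stored in metadata.scheme when no scheme applies, as described under Metadata Table Fields.

```python
def effective_scheme(default_scheme, override=None):
    """Choose the scheme ID to store in metadata.scheme for one assertion.

    default_scheme is metaterms.defscheme (may be None or 0 when the term
    has no default). override is the user's choice, if any; an override
    of 0 means the user explicitly wants no scheme.
    """
    if override is not None:
        return override
    return default_scheme or 0


print(effective_scheme(18))        # term default is used
print(effective_scheme(18, 17))    # user override wins
print(effective_scheme(None))      # no default, no override
```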

Vocabs Table

The vocabs table is where we set up master records for the different vocabularies that we will use. One of the key functions of this table is to provide the URIs that should be linked in our HTML document <head></head>. For a full Dublin Core implementation, these would be:

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />

And here's the schema:


create table vocabs
(
vocab int unsigned not null primary key,
vocabname varchar(8),
vocaburi varchar(128)
);

Vocabs Table Fields

vocab
The primary key for this table.
vocabname
This is the first of the values that are joined in the name attribute of the meta element – the DC of DC.title or DCTERMS of DCTERMS.created.
vocaburi
URI of the schema for this vocabulary, for instance http://purl.org/dc/elements/1.1/ for the DC vocabulary.

Appendix B provides a dataset that can be used to pre-populate this table.

Schemes Table

The schemes table provides a list of possible values that can be used in the scheme attribute of the meta element.


create table schemes
(
scheme int unsigned not null auto_increment primary key,
schemename varchar(32) not null,
index(schemename)
);

Schemes Table Fields

scheme
The primary key for this table.
schemename
The actual value that will appear in the scheme attribute of the meta element.

Appendix C provides a dataset that can be used to pre-populate this table.

Simple/Cheats' Method

If we are prepared to sacrifice flexibility and accept the default scheme in the metaterms table as being the only one that may be used for each term, we can do away with the schemes table altogether and replace the integer column metaterms.defscheme with a varchar column containing that default scheme.

Another option would be to abandon the vocabs table and hard-code the links shown in the vocabs table section into the document template. If additional vocabularies were to be added, any corresponding schema links would also need to be added to the template.

SQL Queries

Whilst the database structure described here should provide what is required to implement a metadata repository for a CMS, I will provide some example queries to help get the ball rolling.

Vocabulary Links

select concat('schema.',vocabname), vocaburi from vocabs
where vocaburi is not null and vocaburi!='';

This will provide values ready to put in the rel and href attributes of link elements. These links could also be added as static text to the page template, as described in the Simple/Cheats' Method section.
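Turning the vocabs rows into link elements could look something like the following sketch. The function name render_schema_links is hypothetical, and the rows are assumed to arrive as (vocabname, vocaburi) pairs; vocabularies without a URI are skipped, mirroring the where clause of the query.

```python
from html import escape


def render_schema_links(vocab_rows):
    """Build link elements from (vocabname, vocaburi) pairs.

    Rows whose URI is None or empty produce no link.
    """
    links = []
    for vocabname, vocaburi in vocab_rows:
        if not vocaburi:
            continue
        links.append(
            '<link rel="schema.{}" href="{}" />'.format(
                escape(vocabname, quote=True), escape(vocaburi, quote=True)
            )
        )
    return links


rows = [
    ("DC", "http://purl.org/dc/elements/1.1/"),
    ("DCTERMS", "http://purl.org/dc/terms/"),
    ("HTML", None),
    ("OTHER", None),
]
print("\n".join(render_schema_links(rows)))
```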

Retrieving Metadata for a Page

Assuming that our page/node ID is 1234:

select t.vocabterm, s.schemename, m.lang, m.termvalue
from metadata m
join metaterms t on t.termid=m.termid
left join schemes s on s.scheme=m.scheme
where m.subject=1234;

This will return values for the meta element attributes name, scheme, lang, and content respectively. As scheme may come back as NULL and lang may hold the placeholder 'NULL', creation of these attributes should be suppressed if no real value is returned for them.

Note that the schemes table is attached with a left join so that a NULL may be returned if the value of metadata.scheme=0. (See Metadata Table Fields.)
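To see the left join at work, the sketch below runs the retrieval query against an in-memory SQLite database. SQLite is again only a stand-in, with the column types adapted and the indexes left out for brevity. The two scheme rows and the sample metadata are illustrative values of my own and not the Appendix C dataset.

```python
import sqlite3

# SQLite stand-in; only the columns the query touches are kept faithful.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table schemes (
        scheme integer primary key autoincrement,
        schemename varchar(32) not null
    );
    create table metaterms (
        termid integer primary key autoincrement,
        vocabterm varchar(32) not null unique,
        vocab integer,
        defscheme integer
    );
    create table metadata (
        subject integer not null,
        termid integer not null,
        scheme integer not null,
        lang char(8) not null,
        termvalue text
    );
    insert into schemes (schemename) values ('URI'), ('W3CDTF');
    insert into metaterms (vocabterm, vocab, defscheme) values
        ('DC.title', 1, 0), ('DCTERMS.created', 2, 2);
    insert into metadata values
        (1234, 1, 0, 'en', 'Extensible Metadata for Your CMS'),
        (1234, 2, 2, 'NULL', '2009-12-05');
    """
)

rows = conn.execute(
    "select t.vocabterm, s.schemename, m.lang, m.termvalue "
    "from metadata m "
    "join metaterms t on t.termid = m.termid "
    "left join schemes s on s.scheme = m.scheme "
    "where m.subject = ? order by m.termid",
    (1234,),
).fetchall()

# The DC.title row has scheme 0, so the left join yields None for schemename.
for row in rows:
    print(row)
```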

Further queries may be added to this section if I think of anything else that might be useful.

Implementation

What I have presented here is a toolkit; how it is implemented is the choice of the developer. Here are some pointers that may assist.

It may be sufficient for many to implement only Dublin Core metadata. When this is the case, no provision need be made in the CMS for maintaining the metaterms, vocabs and schemes tables – the values provided in the appendices should provide all that is needed. If another vocabulary were identified that might be useful to a reasonable number of CMS users, this too could be added to the inserts in the appendices and no provision be made for maintaining it through the CMS.

If a form, or section of form in the CMS page maintenance area is provided for adding/maintaining metadata for pages, some fields could be pre-populated. DC.title could take the existing page title (I cannot see any reasonable situation where these would be different;) DC.identifier – the page URI – could be calculated; DCTERMS.created and DCTERMS.modified could certainly be derived automatically; DC.rights could be taken from a site default; DC.type and DC.format would generally be fixed. And the list goes on. Pre-population of fields would make the task of maintaining metadata less onerous and encourage compliance, which may be an issue in some organisations where provision of metadata is mandated.
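Pre-population might be sketched as below. Everything about the shape of the page record and the site defaults is an assumption of mine (the keys title, path, created, modified, base_uri and rights are hypothetical), as are the fallback values "Text" and "text/html" for DC.type and DC.format.

```python
from datetime import date


def prepopulate(page, site_defaults):
    """Suggest initial metadata values for a page's maintenance form.

    page and site_defaults are plain dicts with hypothetical keys;
    the editor remains free to change anything offered here.
    """
    base = site_defaults["base_uri"].rstrip("/")
    return {
        "DC.title": page["title"],
        "DC.identifier": base + "/" + page["path"].lstrip("/"),
        "DCTERMS.created": page["created"].isoformat(),
        "DCTERMS.modified": page["modified"].isoformat(),
        "DC.rights": site_defaults["rights"],
        "DC.type": site_defaults.get("type", "Text"),
        "DC.format": site_defaults.get("format", "text/html"),
    }


page = {
    "title": "Extensible Metadata for Your CMS",
    "path": "/node/1234",
    "created": date(2009, 12, 5),
    "modified": date(2009, 12, 6),
}
site = {"base_uri": "http://example.org/", "rights": "CC BY 3.0"}
for name, value in prepopulate(page, site).items():
    print(name, "=", value)
```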

Search facilities could be built that could identify lists of documents by author (DC.creator,) creation date, etcetera. I created an experimental metadata repository a while back – some four million pseudo-pages, each with three items of metadata. Searches on unique metadata values all completed in under a second, much to my surprise. The repository used nearly the same table schemata (including indexing) presented here, so a powerful search engine would not be hard to implement for a CMS holding very large numbers of pages. I am currently unable to find the search queries I used, but will append them to the SQL Queries section, should I come across them at any point.
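These are not the lost search queries, but a search by exact metadata value could plausibly take the following shape. It is a sketch against an in-memory SQLite stand-in with cut-down tables; the function pages_by_term and the sample authors are placeholders of my own.

```python
import sqlite3

# Cut-down SQLite stand-in: just enough of the schema to search on.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table metaterms (
        termid integer primary key autoincrement,
        vocabterm varchar(32) not null unique
    );
    create table metadata (
        subject integer not null,
        termid integer not null,
        termvalue text
    );
    create index metadata_termid on metadata (termid);
    insert into metaterms (vocabterm) values ('DC.creator'), ('DC.title');
    insert into metadata values
        (1, 1, 'A. Author'), (1, 2, 'First page'),
        (2, 1, 'B. Writer'), (3, 1, 'A. Author');
    """
)


def pages_by_term(vocabterm, value):
    """Return the IDs of pages whose given term exactly matches value."""
    rows = conn.execute(
        "select m.subject from metadata m "
        "join metaterms t on t.termid = m.termid "
        "where t.vocabterm = ? and m.termvalue = ? "
        "order by m.subject",
        (vocabterm, value),
    )
    return [subject for (subject,) in rows]


print(pages_by_term("DC.creator", "A. Author"))
```

On MySQL, the fulltext index on termvalue would allow word searches as well as exact matches, which this stand-in does not attempt.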

Sitemaps and other machine-readable (RDF) views of the repository could be generated – either as the results of search queries, or just dumps of the entire repository.

License

The contents of this document are released under the Creative Commons Attribution 3.0 Unported License. If you make use of the material presented here, I require attribution as a contributor to your work. A link back to this page would be nice, too. Yes, you can use it commercially; if you make heaps of money out of it, I'm rather partial to full-bodied reds. Hint, hint.

If you do make use of this material in your project, I'd love to hear from you and link to your project from this page.

Appendices

Appendix A

Values for the metaterms table.

insert into metaterms (vocabterm,vocab,defscheme) values
('DC.contributor',1,0),
('DC.coverage',1,0),
('DC.creator',1,0),
('DC.date',1,18),
('DC.description',1,0),
('DC.format',1,4),
('DC.identifier',1,17),
('DC.language',1,13),
('DC.publisher',1,0),
('DC.relation',1,0),
('DC.rights',1,0),
('DC.source',1,0),
('DC.subject',1,0),
('DC.title',1,0),
('DC.type',1,2),
('DCTERMS.abstract',2,0),
('DCTERMS.accessRights',2,0),
('DCTERMS.accrualMethod',2,0),
('DCTERMS.accrualPeriodicity',2,0),
('DCTERMS.accrualPolicy',2,0),
('DCTERMS.alternative',2,0),
('DCTERMS.audience',2,0),
('DCTERMS.available',2,0),
('DCTERMS.bibliographicCitation',2,0),
('DCTERMS.conformsTo',2,0),
('DCTERMS.created',2,18),
('DCTERMS.dateAccepted',2,0),
('DCTERMS.dateCopyrighted',2,0),
('DCTERMS.dateSubmitted',2,0),
('DCTERMS.educationLevel',2,0),
('DCTERMS.extent',2,0),
('DCTERMS.hasFormat',2,0),
('DCTERMS.hasPart',2,0),
('DCTERMS.hasVersion',2,0),
('DCTERMS.instructionalMethod',2,0),
('DCTERMS.isFormatOf',2,0),
('DCTERMS.isPartOf',2,0),
('DCTERMS.isReferencedBy',2,0),
('DCTERMS.isReplacedBy',2,0),
('DCTERMS.issued',2,0),
('DCTERMS.isVersionOf',2,0),
('DCTERMS.license',2,17),
('DCTERMS.mediator',2,0),
('DCTERMS.medium',2,0),
('DCTERMS.modified',2,18),
('DCTERMS.provenance',2,0),
('DCTERMS.references',2,0),
('DCTERMS.replaces',2,0),
('DCTERMS.requires',2,0),
('DCTERMS.rightsHolder',2,0),
('DCTERMS.spatial',2,0),
('DCTERMS.tableOfContents',2,0),
('DCTERMS.temporal',2,0),
('DCTERMS.valid',2,0),
('HTML.title',3,0),
('HTML.doctype',3,0),
('OTHER.description',4,0),
('OTHER.keywords',4,0),
('OTHER.author',4,0),
('OTHER.copyright',4,0),
('OTHER.generator',4,0);

Appendix B

Values for the vocabs table.

insert into vocabs values
('1','DC','http://purl.org/dc/elements/1.1/'),
('2','DCTERMS','http://purl.org/dc/terms/'),
('3','HTML',NULL),
('4','OTHER',NULL);

Note the inclusion of vocabs HTML and OTHER. I have provided these so that our metadata repository can store the title and doctype of the HTML document (vocab=HTML,) and various 'legacy' metadata terms (vocab=OTHER,) if so required. If these are excluded, the corresponding entries should also be excluded from the end of the insert in Appendix A, above.

Appendix C

Values for the schemes table.

insert into schemes (schemename) values
('Box'), ('DCMIType'), ('DDC'), ('IMT'),
('ISO3166'), ('ISO639-2'), ('LCC'), ('LCSH'),
('MESH'), ('NLM'), ('Period'), ('Point'),
('RFC1766'), ('RFC3066'), ('TGN'), ('UDC'),
('URI'), ('W3CDTF');

TinyURL for this page: http://tinyurl.com/ybgr2ds