Semantics and Structure

Published 19:54 on 05 March, 2007

I’m currently working on improving nefariousdesigns.co.uk - both in design and in technology. I’ve continued to learn lots of great web development “stuff” over the past year, and it’s time I updated my personal site to reflect that burgeoning knowledge.

My first port of call is the underlying structure of my HTML - it’s ok, but it could definitely be improved.

I recently had a fantastic conversation with Mike Pearce - an old colleague (who’s now a good friend) - regarding document structure in HTML. Whilst peer-reviewing some of his code, I noticed some not-uncommon structural characteristics that, although not wrong, definitely didn’t convey the best semantics within the document. For this reason I thought I’d document some of the stuff I’ve learned about structured HTML and open it up for comment.

The Point

So what do we gain from enforcing good structure in our HTML documents? Well, to begin with, it aids accessibility by describing and organising our content. This means that screen readers, normal browsers, and even search engines are able to understand our content and can navigate and classify it better.

Good structure is often overlooked because we can easily emulate it visually with styling - it’s often the case that a badly structured HTML document is implying structure using paragraphs or table cells styled as headings. This is simply down to developers not understanding the importance of structure within HTML. A well-structured HTML document is the base for good web standards, as it allows us to set out an organised template for more semantic content. In fact, it’s the very beginning of semantics.
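
As a rough sketch of the difference (the class name and heading text here are made up purely for illustration), compare a paragraph dressed up as a heading with a real heading element:

<!-- Looks like a heading once styled, but says nothing about structure -->
<p class="big-title">Latest Posts</p>

<!-- Describes the structure; styling can be applied separately -->
<h2>Latest Posts</h2>

Both can be made to look identical with CSS, but only the second tells a screen reader or search engine that a new section starts here.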

Semantics

Semantic is one of those words you’ll hear a lot from web standards developers and standardistas; it defines our existence and, without it, we’d all suffer an existential crisis and degrade into a depressed rabble of alcoholics… oh wait…

In all seriousness, it’s a word I’ve heard people use without truly understanding the meaning. So let’s take a look at a dictionary definition:

se·man·tic adj.
  1. Of or relating to meaning, especially meaning in language.
  2. Of, relating to, or according to the science of semantics.

Great - but what does it mean in regard to HTML? Well, basically, it means using HTML properly - to describe our content with the elements supplied in our HTML DocType, rather than describing presentation. Good HTML should be as semantically correct as possible, which means using elements that describe headings, lists, content inflection, and organisation, instead of using elements to arrange content visually. A good example of the latter would be table-based design, where table elements are used to lay out the page rather than to describe tabular data. Once our content is described properly using semantic HTML, we can improve presentation using external stylesheets.
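
To illustrate, here’s a minimal sketch of the same navigation marked up both ways. The link targets and the id are hypothetical, not taken from any real site:

<!-- Presentational: a table used purely to arrange links in a row -->
<table>
  <tr>
    <td><a href="/">Home</a></td>
    <td><a href="/portfolio/">Portfolio</a></td>
    <td><a href="/contact/">Contact</a></td>
  </tr>
</table>

<!-- Semantic: the links are a list, so they're marked up as one -->
<ul id="navigation">
  <li><a href="/">Home</a></li>
  <li><a href="/portfolio/">Portfolio</a></li>
  <li><a href="/contact/">Contact</a></li>
</ul>

The list can then be laid out horizontally from an external stylesheet, leaving the HTML to describe what the content is.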

Perception

To really understand the importance of structure in our HTML documents, you need to alter your perception. I’m not talking about taking excessive amounts of mind-altering drugs (unless that kind of thing floats your boat - I’ve met a few space-cadet web developers in my time), but simply changing your point of view in regard to your vision of the internet.

As humans, we’ve begun to perceive the internet as a series of sites, when in fact it is nothing more than a complex network of single pages, totally oblivious to the domain and file structures we use to organise them. Each page stands alone; it exists as a single entity and links to many similar entities. As humans, we perceive sites because they imply organisation; browsers and search engines only see pages and are generally unconcerned with site structure. This means that each page is a unique document and, as such, needs its own structure to make sense out of context.

This is also the case for anyone landing on our sites from search engines - the site structure is not instantly conveyed, yet the structure of the single page is. If we want people to understand the page they are looking at, we need to organise it accordingly.

Headings

Heading elements are our key method of outlining structure. Much like any other type of document we may be writing, headings allow us to develop a document map - a set of titles for key points or content groupings. This map of our document could be compared to the “contents” section of a text book, since it outlines the hierarchy of our content - and in some cases, could even be used for navigation purposes.

A handy tool I’ve discovered for viewing the structure of a page is the Document Map Extension for Firefox. If you’re using Firefox, I recommend you install it and have a bit of a play - it will help you to visualise what I’m talking about!

The Heading Elements

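HTML provides us with six heading elements: <h1> through to <h6>. It’s best to start with an <h1> and work down through the levels in order without skipping any; the DocType doesn’t actually enforce this, so a validator won’t complain if you break the rule, but the hierarchy stops making sense if you do. A basic structure might only need <h1> elements. Here’s an example: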

<h1>Introduction</h1>

<p>This is the beginning of our document. This could be
any kind of content you so desired - but it would all
fall under the heading "Introduction".</p>

<h1>Summary</h1>

<p>This is another section. Note that, in this example,
both "Summary" and "Introduction" exist at the same
level in the hierarchy.</p>

Should we need to, we can add further layers to our hierarchy using the extra heading elements like so:

<h1>The Ten Word Review</h1>

<h2>What is it?</h2>

<p>It's an excellent site where users review various
things in 10 words. No more, no less.</p>

<h2>What's the point?</h2>

<h3>Information Resource</h3>

<p>Find out what other people think about something.</p>

<h3>Voice Your Opinion</h3>

<p>Speak your mind as succinctly as possible.
Voice your opinion without all that needless waffle.</p>

<h2>Where can I find it?</h2>

<p>That's easy - you can find it here:</p>

<p><a href="http://thetenwordreview.com">
  The Ten Word Review
</a></p>

Here you can see that a good peppering of heading elements allows us to structure our document meaningfully. Obviously it’s of more use when there’s a lot more content, but I hope you get the idea.

Notice that the numbers in the heading elements correspond to the level of the heading within the page - not their order of importance as content. This is a common mistake that I’ve seen made in the past and it’s a fundamental misunderstanding of the elements themselves. It’s also worth pointing out that, even though a site-wide heading (like “Bedtime Reading” on this site) may be an <h2> on one page, there’s no reason it shouldn’t be a different level on a different page. For this reason, it’s worthwhile thinking about content structure well before you start styling things up with CSS. I’ve always been a proponent of markup before styling and this is a good example of where this method is particularly valid.

Understanding the heading elements is the key to applying structure. The real problems arise when we try to decide where to place those elements. This is the most common stumbling block for web developers.

Rules of Engagement

When trawling through the burgeoning number of standards-based sites on the internet, I’ve noticed that most developers place the title of the site within an <h1> element. For home pages, this is certainly the correct thing to do; but on other pages in the site, it really isn’t.

Unfortunately, this phenomenon is a side-effect of visualising site structure instead of page structure - as developers we’re always aware that the page we’re developing is situated within our site, therefore we surmise that the site name is always our top-level heading. This is definitely something I’m personally guilty of in my current structure and is one of the first things I’m going to address with my redesign/realign.

If we alter our perception and view our site as a number of unique (yet still related) pages, we begin to understand that the title of the page itself should be our top-level heading (the <h1> element). In fact, as a side-note, if you make sure the title element and your first <h1> element match, you’re likely to be helping your SEO as well. This is because, as I pointed out earlier, search engines have less concept of your site structure than they do of your page structure.
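
Here’s a minimal sketch of what that might look like for a single page. I’m assuming an HTML 4.01 Strict DocType, and the paragraph text is just a placeholder:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <!-- The title describes this page, not the whole site -->
  <title>Semantics and Structure</title>
</head>
<body>
  <!-- The top-level heading matches the title element -->
  <h1>Semantics and Structure</h1>

  <p>The content of the page starts here.</p>
</body>
</html>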

Case Study

With all this in mind, let’s take a look at the home page of this site:

[Image: a small image of Nefarious Designs' home page]

So what do I use as the top-level heading on this page? Obviously I don’t have the words “home page” anywhere in the content - and why should I? It’d be pretty meaningless in terms of context; the best heading to use here is the title of the site, since the home page is an overview of that. In this particular instance I’d like to use a logo graphic as the heading, rather than just plain text, so I am presented with a number of options. For instance, I could wrap the text “Nefarious Designs” in the <h1> element and then use image replacement in CSS to introduce the logo; or (the option I chose) I can wrap the logo image in the <h1> element - this is still perfectly valid HTML. I chose this option because the graphic has importance as my logo - if it were simply a graphical representation of heading text, I’d probably have used image replacement.
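
A sketch of that second option might look something like this. The file name and dimensions are placeholders for illustration only:

<!-- The logo image is the heading; the alt text carries the site title -->
<h1>
  <img src="/images/logo.png" alt="Nefarious Designs" width="300" height="80">
</h1>

The alt attribute matters here: without it, the top-level heading would be empty for anything that can’t display the image.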

Now that I have defined my top-level heading, I usually do a quick scan of the content to make sure there are no more headings that I wish to exist at the same level. This does not occur often but it’s always worth checking. In the case of my home page (and for most other home pages), I want the title of the entire site to exist at that level, as discussed before. The next step is to look at the next level of headings.

The next level of headings is associated with my content modules - these are defined chunks of related information that I organised together when defining my information architecture. These include the latest three posts, my portfolio, my del.icio.us feed, my flickr feed and my bedtime reading - these are all completely different groups of content and, as such, deserve to be categorised beneath different headings. This method continues within my content - I probably don’t need to go any further for you to understand my methodology; in fact, if you’re interested in seeing further levels of heading, you’re probably best off looking at the source behind one of my posts.
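
As a sketch, the heading outline for a home page organised this way might look something like the following. The exact heading text and the logo file name are assumptions for the purposes of illustration; only the levels matter:

<!-- The site title (as a logo) is the top-level heading -->
<h1><img src="/images/logo.png" alt="Nefarious Designs"></h1>

<!-- Each content module gets a heading one level below the site title -->
<h2>Latest Posts</h2>
<h2>Portfolio</h2>
<h2>del.icio.us</h2>
<h2>Flickr</h2>
<h2>Bedtime Reading</h2>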

Within the text of my posts, the actual post title is the most important element and is marked up as such. In this particular type of page I no longer require the importance I bestowed upon the site title on the home page, since it is not really as relevant here. In fact, I’m more likely to place the title of the post in the title element, therefore I should think about doing the same in my <h1> element. Further headings beneath this will reflect this change in structure - this does, however, mean that we can’t use any headings before our <h1>; but if you think about it, this makes perfect sense since the title of the page should always be the first heading.

Finally our content modules, which are repeated from the homepage, have the choice of being marked up at the same level as my post title; or I could mark them up as a level below since they are content items within a page with that title. In fact, it’s probably preferable to use this method as the content is repeated throughout the site and should probably maintain a constant structure.
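
Sketched out, a post page following this approach might have an outline like the one below. The section headings are borrowed from this post, but the way they are nested and the module heading text are assumptions:

<!-- The post title is the top-level heading -->
<h1>Semantics and Structure</h1>

<!-- Headings within the post -->
<h2>The Point</h2>
<h2>Semantics</h2>
<h2>Headings</h2>
<h3>The Heading Elements</h3>

<!-- Repeated content modules sit a level below the post title -->
<h2>Latest Posts</h2>
<h2>Bedtime Reading</h2>

With the modules at <h2>, they sit at the same level as they would on a home page where the site title is the <h1>, so the repeated content keeps a constant structure across the site.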

Summary

Currently the document map of each page of my site is structured more towards the site than the page itself. This ultimately means that my document structure is inefficient and confusing. By correcting this I will improve both the accessibility of my pages and their search engine optimisation. Both of these goals are key when redesigning my site.

Hopefully, using my own site as an example, I’ve laid out the fundamentals of document structure; and equally how important it is to good web standards development. By improving our understanding of document maps we can improve our pages for navigation by all our users and their respective internet browsing clients; be they standard browsers, screen-readers, or even search engine robots. You can also see how easy it is to improve your structure by taking these things into account - and also how easy it is to get it wrong without realising.