Semantics and Structure

5th March, 2007 @ 10:54am GMT

Accessibility, Browsers, Design, Development, Tutorials, Web, Web Standards, XHTML / 20 Comments

I’m currently working on improving nefariousdesigns.co.uk - both in design and in technology. I’ve continued to learn lots of great web development “stuff” over the past year, and it’s time I updated my personal site to reflect that burgeoning knowledge.

My first port of call is the underlying structure of my HTML - it’s ok, but it could definitely be improved.

I recently had a fantastic conversation with Mike Pearce - an old colleague (who’s now a good friend) - regarding document structure in HTML. Whilst peer-reviewing some of his code, I noticed some not-uncommon structural characteristics that, although not wrong, definitely didn’t convey the best semantics within the document. For this reason I thought I’d document some of the stuff I’ve learned about structured HTML and open it up for comment.

The Point

So what do we gain from enforcing good structure in our HTML documents? Well, to begin with, it aids accessibility by describing and organising our content. This means that screen readers, normal browsers, and even search engines are able to understand our content and can navigate and classify it better.

The importance of good structure is often overlooked because we can easily emulate it visually with styling - it’s often the case that a badly structured HTML document is, in fact, implying structure using paragraphs or table cells styled as headings. This is simply down to developers not understanding the importance of structure within HTML. A well-structured HTML document is the base for good web standards, as it allows us to set out an organised template for more semantic content. In fact, it’s the very beginning of semantics.
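
As a rough illustration (the class name "fake-heading" and the heading text are made up for this sketch), compare a paragraph that only looks like a heading with a real heading element. Both can be styled identically, but only the second tells a screen reader or search engine that a new section starts here:

<!-- Implied structure: styled to look like a heading, but it's just a paragraph -->
<p class="fake-heading">Latest Posts</p>

<!-- Real structure: the element itself says "this is a heading" -->
<h2>Latest Posts</h2>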

Semantics

Semantic is one of those words you’ll hear a lot from web standards developers and standardistas; it defines our existence and, without it, we’d all suffer an existential crisis and degrade into a depressed rabble of alcoholics… oh wait…

In all seriousness, it’s a word I’ve heard people use without truly understanding the meaning. So let’s take a look at a dictionary definition:

se·man·tic
adj.

  1. Of or relating to meaning, especially meaning in language.
  2. Of, relating to, or according to the science of semantics.

Great - but what does it mean in regard to HTML? Well basically, it means using HTML properly: describing our content with the elements supplied in our HTML DocType, rather than describing presentation. Good HTML should be as semantically correct as possible, which means using elements that describe headings, lists, content inflection, and organisation, instead of using elements to arrange content visually. Table-based design is a good example of the latter: it uses table elements to position content rather than to describe tabular data. Once our content is described properly using semantic HTML, we can improve presentation using external stylesheets.
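
To sketch the difference (the content and the file name style.css are placeholders, not taken from a real page), here’s the same snippet marked up presentationally and then semantically:

<!-- Presentational: elements chosen for how they look -->
<table>
  <tr>
    <td><font size="5"><b>My Favourite Books</b></font></td>
  </tr>
  <tr>
    <td>Book one<br />Book two</td>
  </tr>
</table>

<!-- Semantic: elements chosen for what the content is -->
<h2>My Favourite Books</h2>

<ul>
  <li>Book one</li>
  <li>Book two</li>
</ul>

The visual treatment can then live in an external stylesheet, linked from the head of the document:

<!-- style.css is a placeholder file name -->
<link rel="stylesheet" type="text/css" href="style.css" />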

Perception

To really understand the importance of structure in our HTML documents, you need to alter your perception. I’m not talking about taking excessive amounts of mind-altering drugs (unless that kind of thing floats your boat - I’ve met a few space-cadet web developers in my time), but simply changing your point of view in regard to your vision of the internet.

As humans, we’ve begun to perceive the internet as a series of sites, when in fact it is nothing more than a complex network of single pages; totally oblivious to the domain and file structures we use to organise them. Each page is self-contained; it exists as a single entity and links to many similar entities. As humans, we perceive sites because they imply organisation; browsers and search engines only see pages and are generally unconcerned with site structure. This means that each page is a unique document and, as such, may require structure to make sense within context.

This is also the case for anyone landing on our sites from search engines - the site structure is not instantly conveyed, yet the structure of the single page is. If we want people to understand the page they are looking at, we need to organise it accordingly.

Headings

Heading elements are our key method of outlining structure. Much like any other type of document we may be writing, headings allow us to develop a document map - a set of titles for key points or content groupings. This map of our document could be compared to the “contents” section of a text book, since it outlines the hierarchy of our content - and in some cases, could even be used for navigation purposes.

A handy tool I’ve discovered for viewing the structure of a page is the Document Map Extension for Firefox. If you’re using Firefox, I recommend you install it and have a bit of a play - it will help you to visualise what I’m talking about!

The Heading Elements

HTML provides us with six heading elements; <h1> through to <h6>. A validator won’t complain if you start with anything but an <h1>, since the DocType doesn’t enforce heading order, but it’s good practice to start with an <h1> and to avoid skipping levels on the way down. A basic structure might only need <h1> elements. Here’s an example:

<h1>Introduction</h1>

<p>This is the beginning of our document. This could be
any kind of content you so desired - but it would all
fall under the heading "Introduction".</p>

<h1>Summary</h1>

<p>This is another section. Note that, in this example,
both "Summary" and "Introduction" exist at the same
level in the hierarchy.</p>

Should we need to, we can add further layers to our hierarchy using the extra heading elements like so:

<h1>The Ten Word Review</h1>

<h2>What is it?</h2>

<p>It's an excellent site where users review various
things in 10 words. No more, no less.</p>

<h2>What's the point?</h2>

<h3>Information Resource</h3>

<p>Find out what other people think about something.</p>

<h3>Voice Your Opinion</h3>

<p>Speak your mind as succinctly as possible.
Voice your opinion without all that needless waffle.</p>

<h2>Where can I find it?</h2>

<p>That's easy - you can find it here:</p>

<p><a href="http://thetenwordreview.com">
  The Ten Word Review
</a></p>

Here you can see that a good peppering of heading elements allows us to structure our document meaningfully. Obviously it’s of more use when there’s a lot more content, but I hope you get the idea.

Notice that the numbers in the heading elements correspond to the level of the heading within the page - not their order of importance as content. This is a common mistake that I’ve seen made in the past and it’s a fundamental misunderstanding of the elements themselves. It’s also worth pointing out that, even though a site-wide heading (like “Bedtime Reading” on this site) may be an <h2> on one page, there’s no reason it shouldn’t be a different level on a different page. For this reason, it’s worthwhile thinking about content structure well before you start styling things up with CSS. I’ve always been a proponent of markup before styling and this is a good example of where this method is particularly valid.
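
To make the level-versus-importance point concrete, here’s a small sketch (the heading text is invented for illustration). In the first version the level has been picked because “Comments” feels less important, or because an <h4> happens to render smaller; in the second it has been picked by where the section sits in the hierarchy:

<!-- Mistake: level chosen by perceived importance or default font size -->
<h1>My Post Title</h1>
<h4>Comments</h4>

<!-- Better: level chosen by position in the hierarchy -->
<h1>My Post Title</h1>
<h2>Comments</h2>

If the second version looks too big, that’s a job for the stylesheet rather than for a different element.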

Understanding the heading elements is the key to applying structure. The real problems arise when we try to decide where to place those elements. This is the most common stumbling block for web developers.

Rules of Engagement

When trawling through the burgeoning number of standards-based sites on the internet, I’ve noticed that most developers place the title of the site within an <h1> element. For home pages, this is certainly the correct thing to do; but on other pages in the site, it really isn’t.

Unfortunately, this phenomenon is a side-effect of visualising site structure instead of page structure - as developers we’re always aware that the page we’re developing is situated within our site, therefore we surmise that the site name is always our top-level heading. This is definitely something I’m personally guilty of in my current structure and is one of the first things I’m going to address with my redesign/realign.

If we alter our perception and view our site as a number of unique (yet still related) pages, we begin to understand that the title of the page itself should be our top-level heading (the <h1> element). In fact, as a side-note, if you make sure the title element and your first <h1> element match, you’re instantly boosting your SEO. This is because, as I pointed out earlier, search engines have less concept of your site structure than they do of your page structure.
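
A minimal sketch of what that looks like for a post page, using this post’s title as the example (the real title element on any given site may well include other text, such as the site name, so treat this as an illustration only):

<head>
  <title>Semantics and Structure</title>
</head>
<body>
  <!-- The top-level heading matches the title element -->
  <h1>Semantics and Structure</h1>

  <!-- rest of the page content goes here -->
</body>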

Case Study

With all this in mind, let’s take a look at the home page of this site:

[Image: Nefarious Designs’ home page]

So what do I use as the top-level heading on this page? Obviously I don’t have the words “home page” anywhere in the content - and why should I? It’d be pretty meaningless in terms of context; the best heading to use here is the title of the site, since the home page is an overview of that. In this particular instance I’d like to use a logo graphic as the heading, rather than just plain text, so I am presented with a number of options. For instance, I could wrap the text “Nefarious Designs” in the <h1> element and then use image replacement in CSS to introduce the logo; or (the option I chose) I can wrap the logo image in the <h1> element - this is still perfectly valid HTML. I chose this option because the graphic has importance as my logo - if it were simply a graphical representation of heading text, I’d probably have used image replacement.
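
The two options might be sketched like this. The id "logo", the file name logo.png and the dimensions are all placeholders, and the negative text-indent trick is just one of several image replacement techniques:

<!-- Option 1: text in the heading, logo introduced with CSS image replacement -->
<style type="text/css">
  #logo {
    width: 300px;                        /* placeholder dimensions */
    height: 80px;
    background: url(logo.png) no-repeat; /* placeholder file name */
    text-indent: -9999px;                /* push the text off-screen */
    overflow: hidden;
  }
</style>

<h1 id="logo">Nefarious Designs</h1>

<!-- Option 2: the logo image itself inside the heading -->
<h1><img src="logo.png" alt="Nefarious Designs" /></h1>

In the second option the alt attribute carries the heading text for anything that can’t display the image.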

Now that I have defined my top-level heading, I usually do a quick scan of the content to make sure there are no more headings that I wish to exist at the same level. This does not occur often but it’s always worth checking. In the case of my home page (and for most other home pages), I want the title of the entire site to exist at that level, as discussed before. The next step is to look at the next level of headings.

The next level of headings is associated with my content modules - these are defined chunks of related information that I organised together when defining my information architecture. These include the latest three posts, my portfolio, my del.icio.us feed, my Flickr feed and my bedtime reading - these are all completely different groups of content and, as such, deserve to be categorised beneath different headings. This method continues within my content - I probably don’t need to go any further for you to understand my methodology; if you’re interested in seeing further levels of heading, you’re best off looking at the source behind one of my posts.
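
As a rough sketch, the resulting outline for the home page would look something like this (the heading text is guessed from the module names above, and logo.png is a placeholder file name, so don’t read it as the actual source):

<!-- Top level: the site title, as a logo image -->
<h1><img src="logo.png" alt="Nefarious Designs" /></h1>

<!-- Second level: one heading per content module -->
<h2>Latest Posts</h2>
<h2>Portfolio</h2>
<h2>del.icio.us</h2>
<h2>Flickr</h2>
<h2>Bedtime Reading</h2>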

Within the text of my posts, the actual post title is the most important element and is marked up as such. In this particular type of page I no longer require the importance I bestowed upon the site title on the home page, since it is not really as relevant here. In fact, I’m more likely to place the title of the post in the title element, therefore I should think about doing the same in my <h1> element. Further headings beneath this will reflect this change in structure - this does, however, mean that we can’t use any headings before our <h1>; but if you think about it, this makes perfect sense since the title of the page should always be the first heading.

Finally, our content modules, which are repeated from the home page, could be marked up at the same level as my post title; or I could mark them up a level below, since they are content items within a page with that title. The second method is probably preferable, as the content is repeated throughout the site and should maintain a consistent structure.
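
Put together, a post page with the modules a level below the post title might be outlined something like this. It’s only a sketch: logo.png is a placeholder, the sub-headings are borrowed from this post, and the module headings are guessed from the module names:

<!-- The logo is still on the page for branding, but it's no longer a heading -->
<p><img src="logo.png" alt="Nefarious Designs" /></p>

<!-- Top level: the title of the post -->
<h1>Semantics and Structure</h1>

<!-- Second level: sections within the post -->
<h2>The Point</h2>
<h2>Semantics</h2>
<h2>Perception</h2>

<!-- Second level: the repeated content modules, same level as on the home page -->
<h2>Portfolio</h2>
<h2>Bedtime Reading</h2>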

Summary

Currently the document map of each page of my site is structured more towards the site than the page itself. This ultimately means that my document structure is inefficient and confusing. By correcting this I will improve both the accessibility of my pages and their search engine optimisation. Both of these goals are key when redesigning my site.

Hopefully, using my own site as an example, I’ve laid out the fundamentals of document structure; and equally how important it is to good web standards development. By improving our understanding of document maps we can improve our pages for navigation by all our users and their respective internet browsing clients; be they standard browsers, screen-readers, or even search engine robots. You can also see how easy it is to improve your structure by taking these things into account - and also how easy it is to get it wrong without realising.

Comments (20)

  1. Mike Pearce March 5, 2007 @ 11:19 am

    Great post - a little hard to get my head round, it’s most definitely a change of perception. How did we get that perception in the first place though?

  2. Nathan Smith March 5, 2007 @ 4:43 pm

    Interesting point about the h1 usage to reflect site hierarchy vs. page structure. I’m not sure if I disagree, or if it is just an unsettling paradigm shift for me. I will have to think it over further. Thanks for bringing it up!

  3. Brendon Kozlowski March 6, 2007 @ 2:05 am

    I’m curious as to what you plan to do with your site’s logo on deeper pages in your site. I would assume you still wish to keep your site’s logo somewhere on the page as it’s a form of branding. However, from what I read here, you would no longer keep it within an H1 tag (in pages other than HOME), because the title of the page would have changed, as well as the main subject.

    Would you simply remove the H1 tags from enclosing the image, but leave the image in place?

  4. Tim March 6, 2007 @ 10:12 am

    @ Nathan: Yeah, it took me a while to come around to this mode of thinking but, ultimately, it is how our pages are perceived by search engines and browsers. It’s this perception that molds our own view of the web (as readers, that is) and, as a result, I think it’s highly important that we understand that any one of our pages may only ever be viewed as a single page.

    @ Brendon: Yes; for any page other than the home page I would simply leave the logo as an image and remove the h1 element.

  5. Keith March 6, 2007 @ 1:21 pm

    A good read. Thanks:-)

    A few things that came to mind when reading it:

    1) There is a Document Outline option already available in the Web Developer Toolbar, so if you have that installed there’s no need to install the one you mentioned above. To see the outline go to the Information menu and choose ‘View Document Outline’.

    2) I think the reason a lot of sites have the logo inside a H1 tag on all pages is because of the underlying system that powers the site. Lots of blog software, content management systems, etc use the same header file/template for the entire site, which hinders the developer’s ability to change the structure. Thankfully Wordpress has a built-in is_home() function which will allow you to set up an if/else statement to remove the H1 tag from deeper pages. Maybe developers should be taking options like that into consideration when choosing the system that will power their site?

    I’m currently redesigning my own site and it uses a H1 tag (with image replacement) for the logo on all pages. After reading this article I’ll be changing the structure. Thanks for making more work for me ;-)

  6. ephi March 6, 2007 @ 1:24 pm

    I’m not too keen on using <h1> tags twice or more in a single HTML document. If it’s a book, it’s like having two titles for the book. I clearly understand that a webpage is not a book, maybe it’s just a different point of view. Thanks for the enlightening article. ;-)

  7. Tim March 6, 2007 @ 2:04 pm

    @ Keith: Firstly, sorry for making life harder. :)

    In regard to your first point, I never realised that tool was there in Web Developer so thanks for pointing that out. When I get a moment, I’ll update the article to reflect that.

    As for your second point, I completely agree that some CMS/blogging software may limit your choices. In most cases this is just to make life easier for non-back-end developers. Wordpress, for example, is quite happy for you to define different headers for different pages - just don’t use the built in header include methods.

    Basically, back-end code should never limit your choices for front-end code - if it does, something is most definitely wrong.

  8. Jon Henshaw March 6, 2007 @ 3:40 pm

    As a rule of thumb, the code should always be written for a search bot (assuming you care about search engine performance). That means you make sure everything is semantically structured in the code (H1s before H2s, etc… — just as you described in your article). Layout should be secondary, from the sense that in most cases you’ll be able to get whatever layout you want, regardless of markup, if you’re clever enough with how you write your CSS.

  9. Tim Hofman March 6, 2007 @ 4:16 pm

    Good article! It seems that more and more developers are becoming aware of the importance of headings, which obviously is a good thing!

    I agree with you that on an article page the h1 shouldn’t be the sitelogo/name, but the title of the article on the page in question. While working on a redesign, or better said the first real design of my site, I stumbled on a problem with headers prior to navigation blocks.

    It is very helpful for screenreader users when there is a header prior to the ul or ol of the navigation. Now the problem is that the navigation always, well at least most of the time, comes prior to the primary content, and the specs tell us that we must not skip headers and that an h2 should come after an h1 and not vice versa. This means that the primary navigation should be the h1, but that just doesn’t feel right because, like I said, we want the title of the article to be the h1.

    Having multiple h1 headers isn’t the solution either in my opinion, which leaves us with a choice: using an h2 for the navigation, although it’s prior to the h1, or having the article before the navigation.

    It’s not what the spec recommends, but I think that this is the best solution.

  10. Tim March 6, 2007 @ 5:16 pm

    @ Tim: Personally, I’m more inclined to place the content before the navigation, as you suggested - obviously making sure there is a “skip to navigation” link at the top of the page.

    This method ensures that your page is not only more accessible, but also that your content is presented to the search engines first; thus improving search engine optimisation.

  11. Anton March 6, 2007 @ 6:19 pm

    I too, will be restructuring quite a bit. But it might take some time, since I just recently launched it.

    But what you wrote makes a lot of sense. Thanks!

  12. Ryan March 6, 2007 @ 8:25 pm

    the good old w3.org validator also provides an outline option (which can be enabled via a checkbox on their form, or in the query string via a referer check – http://validator.w3.org/check?uri=referer&outline=1)

  13. Tim Hofman March 7, 2007 @ 8:12 am

    It is a possibility to put your content prior to the navigation, although tests with screenreader users show us that less experienced screenreader users do not know how to get a list of links or headers to navigate through, nor do some know or understand our skip links. Those users have to read the whole article before being able to navigate through the website. Another thing is that users expect the navigation to be at the top, not somewhere at the bottom.

    I know I would sacrifice seo a bit for my users. When the rest of the website has good semantic markup etc. you’ll eventually also get a good search engine ranking.

  14. Tim March 7, 2007 @ 9:18 am

    @ Tim: The following statement bothers me a little:

    I know I would sacrifice seo a bit for my users.

    But not half as much as the following from Jon Henshaw’s comment earlier:

    As a rule of thumb, the code should always be written for a search bot (assuming you care about search engine performance).

    Basically we should always sacrifice SEO for our users (which is, I think, what you were getting at). Our HTML should never be written for the search bots but, rather, written to achieve high accessibility. Search engine optimisation should be a very happy side effect of this process.

  15. Tim Hofman March 8, 2007 @ 10:09 am

    @Tim

    You’re right, that’s exactly what I meant to say. SEO is a good thing, but shouldn’t affect the users of a web site.

  16. andy trusz March 8, 2007 @ 12:19 pm

    The specs are very clear about headings: “a heading element briefly describes the topic of the section it introduces”. How a section is defined is up to the author. There is no one “right” way.

    The purpose of the specs is to help machines give a better rendering of the author’s intent. Elements provide consistent means for marking up a document. Headings should group information be it textual, graphic, or multimedia. Extracting headings, as the spec mentions, should provide something like a table of contents — like the outlines one learns to do in school. Similarly, extracting definition lists should result in paired values which make sense. This should happen no matter what document is analyzed.

    It’s your second definition - the science of semantics - that matters here. It’s the meaning behind meaning. Proper use of elements allows machines to more accurately reflect what the author had to say. A UA, for example, could then do a search which returned a manageable number of hits, each of which had a high degree of correlation with the search criteria. Add RDF and OWL and a great deal of the chaos of surfing and searching can be mitigated.

    Of course an author can choose to misuse elements. As a more semantic web emerges, that would mean an author has purposely chosen to mark up the document as gibberish. Want your document to be recognized? Follow standards. Want your document seen as unintelligible? Ignore standards.

    Now we just need browsers and other UAs that can deliver on the promise.

  17. mark rushworth March 8, 2007 @ 3:50 pm

    LOL in your example, the purests would say that that’s a definition list since your describing elements lol!!!

    Mark

  18. andy trusz March 9, 2007 @ 5:09 pm

    Mark wrote:
    “LOL in your example, the purests would say that that’s a definition list since your describing elements lol!!!”

    Surely only the purists. The purest would understand.

    andy

  19. Tom March 10, 2007 @ 9:27 pm

    Next time you write something, get to the point and don’t define the word “semantic” — I think some of what you said was valid, but you were too long winded.

  20. Patrick March 11, 2007 @ 8:36 pm

    Great post, Tks lot!
    But why did you use some titles in your right-hand block of links?
    These are not the same as the semantic content of your article (?)
    Thank you for your answer.

© 2008 Tim Huegdon, All Rights Reserved / Website design and development by Nefarious Designs

No chinchillas were harmed during the making of this website.