
Automatic web usability evaluation: 
what needs to be done?

Giorgio Brajnik

Dipartimento di Matematica e Informatica
Università di Udine
Udine, Italy

giorgio@dimi.uniud.it


 

Abstract

Website redesign and maintenance are likely to absorb more and more resources as web technologies and uses keep evolving at the current pace. Usability evaluation methods need to be run after each change in order to ensure a decent quality level. The means to control the complexity and cost of website maintenance lies in tools performing automatic usability evaluations.

I present a survey of tools that analyze websites, illustrating what kinds of automatic tests they perform and which usability factors the tests are most closely related to. The survey then leads to an analysis of the gaps that remain and of the openings they create for research.
 

 

1. Introduction

It is well known that the average quality of websites is poor, “lack of navigability” being the #1 cause of user dissatisfaction [Fleming, 1998; Nielsen, 1999].

On the one hand, web technologies evolve extremely fast, enabling sophisticated tools to be deployed and complex interactions to take place. Moreover, the life cycle of a website is extremely short: maintenance of a website is performed at a rate higher than that of other software products because of market pressure and the lack of distribution barriers. In addition, the scope of maintenance often becomes so wide that a complete redesign takes place.

On the other hand, the quality of a website is rooted in its usability, which usually results from the adoption of user-centered development and evaluation approaches [Newman and Lamming, 1994; Fleming, 1998; Rosenfeld and Morville, 1998; Nielsen, 1999]. Usability testing is thus a necessary and repeated step during the life cycle of a website.

To test the usability of a website a developer can adopt two kinds of methods: usability inspection methods (e.g. heuristic evaluation [Nielsen and Mack, 1994]) or user testing [Nielsen, 2000]. Heuristic evaluation is based on a pool of experts who inspect and use (part of) a website and identify usability problems that they assume will affect end users. With user testing, a sample of the user population of the website is selected and asked to use (part of) the website and report anything that they think did not work or was not appropriate.

Even though the cost (in terms of time and effort) of both methods is not particularly high, and their application improves website quality and reduces the overall development cost, they are not systematically performed at detailed levels on every part of a website after each maintenance or development step.

It is clear that as change actions on a website increase rapidly in number and variety, more and more resources need to be deployed to ensure that website quality does not decrease (but hopefully increases). It is also clear that any tool that can, at least in part, automate the usability evaluation and maintenance processes will help to close this ever-widening gap.
 
 

The goal of this paper is to present a brief survey of what these tools do and how they contribute to the usability evaluation problem. From the analysis it appears that gaps exist between what these tools achieve and what is required to ensure usability. While some of these gaps are inherently unsolvable, others can probably be filled, provided that additional research is carried out to identify effective techniques.

2. A software engineering view of a website

A website is an interactive software system. It interacts with at least two different kinds of users: end users trying to achieve some goal and developers/maintainers striving to keep the system working and improving it.

End users can be characterized in terms of the goals they pursue and the information-seeking tasks they carry out while browsing.

Information seeking through browsing is a process that almost all websites must support. Unfortunately, it is also a difficult task to model and support because it encompasses complex cognitive, social and cultural processes [Allen, 1996], spanning the interpretation of textual, visual and audio messages, the selection of relevant information, and learning.

 
 

On the other hand we have developers and maintainers. Amongst their activities, a preeminent role is played by: corrective maintenance (i.e. fixing problems with the website behavior or inserting missing contents), adaptive maintenance (i.e. upgrading the site with respect to new technologies, like new browser capabilities), perfective maintenance (i.e. improving the site behavior or content), and preventive maintenance (i.e. fixing problems in behavior or content before they affect users). A large fraction of these activities is aimed at detecting system failures (that is, departures from required behavior), analyzing them and identifying faults (that is, representations within the system of human errors that occurred during development: bugs).
 

 

Maintenance is meant to improve the quality of the website. ISO 9126 defines quality as “the totality of features and characteristics of a software product that bear on its ability to satisfy stated or implied needs”; it includes properties like maintainability, robustness, reliability and usability that are particularly important for websites.

Usability can be defined (ISO 9241) as “the effectiveness, efficiency and satisfaction with which specified users achieve specified goals in particular environments”, where:

    effectiveness is the accuracy and completeness with which users achieve specified goals

    efficiency is the amount of resources expended in relation to the accuracy and completeness of goal achievement

    satisfaction is the comfort and acceptability of use.

General properties like these are not independent: for example, a robustness failure of a website (e.g. some browser incompatibility) will also result in a usability failure (e.g. user inability to complete a task, and dissatisfaction).

 
 

In order to be operationalized, these properties need to be decomposed into more detailed ones that can be assessed in a simpler and perhaps more standard way. For example, maintainability can be decomposed into the complexity of the DHTML code, its size, the number of absolute URLs, etc.
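
For illustration, here is a minimal sketch (my own, not taken from any of the tools surveyed below) of how two of these internal maintainability attributes, code size and number of absolute URLs, could be measured on the source of a single page:

    # Minimal sketch: measure two internal maintainability attributes
    # of an HTML page (code size and number of absolute URLs).
    import re

    def maintainability_metrics(html: str) -> dict:
        # Code size in bytes: a rough proxy for page complexity.
        size = len(html.encode("utf-8"))
        # URLs appearing in href/src attributes; each absolute URL is a
        # potential maintenance burden if the site is moved or restructured.
        urls = re.findall(r'(?:href|src)\s*=\s*["\']([^"\']+)["\']',
                          html, re.IGNORECASE)
        absolute = [u for u in urls
                    if u.lower().startswith(("http://", "https://"))]
        return {"size_bytes": size,
                "n_urls": len(urls),
                "n_absolute_urls": len(absolute)}

    page = '<a href="http://example.com/a">x</a> <img src="img/logo.gif">'
    print(maintainability_metrics(page))
    # {'size_bytes': 61, 'n_urls': 2, 'n_absolute_urls': 1}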

The same applies to usability. It can be described in terms of usability factors (like speed of use, error rate, ease of error recovery, etc.) which in turn can be reduced to other, lower-level properties. The most important properties for website usability include those related to “navigability” (most of them taken from [Fleming, 1998]):

    consistency of presentation and controls

    adequate feedback

    natural organization of the information (systematic labels, clear hierarchical structure)

    contextual navigation (in each state all and only the possible navigation options are available)

    efficient navigation (in terms of time and effort needed to complete a task)

    clear and meaningful labels.

Other properties relevant to usability of a website are:

    robustness (i.e. how well the website handles technology used by users that has not been foreseen by developers)

    flexibility (for example: availability of graphic and textual versions, redundant indexes and site maps, duplicated image map links)

    functionality (i.e. support of users’ goals)

The latter can be further decomposed if we narrow users' goals: for e-commerce sites, for example, further attributes specific to shopping tasks can be identified.

The Web Accessibility Initiative [W3C, 2000] is an effort by the W3C organization to improve website accessibility. It publishes a set of guidelines [WAI, 1999] where accessibility is defined as the ability of a website to be used by people with disabilities.

While usability implies accessibility (at least when an unconstrained user population is considered), the converse is not necessarily true. For example, a missing link to the home page may be a fault affecting usability, while it does not affect accessibility.

 

All these properties (those related to usability as well as those related to accessibility) may be further decomposed into more detailed ones that refer to specific attributes of the website implementation. Indeed, such a decomposition has to be done in order to support usability inspection methods and to identify and fix faults. For example, to determine how flexible a website is, we need to inspect the implementation (or perhaps the design specifications) to determine whether there is a textual version of each page, whether there are textual links duplicating those embedded in images, and so on.
 

 

Some of these lower-level properties refer to attributes that depend only on how the website has been designed and developed (e.g. textual duplicates of links embedded in images): these are internal attributes. Others depend on the website and on its usage (e.g. how meaningful a label is): these are external attributes. The latter is always the case for properties referring to content, which require, in order to be assessed, some sort of interpretation that assigns meaning to symbols.
 

 

While evaluating the usability of a website requires both internal and external attributes, only the former are amenable to automatic testing. External attributes can be evaluated only via semi-automatic means that entail a human evaluation step. However, tools can provide useful assistance by filtering and ranking content that is potentially relevant (for example, by adopting statistical techniques developed in Information Retrieval [Belkin and Croft, 1987]).
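
As a hypothetical illustration of such assistance (the ranking criterion is my own assumption, not a technique taken from the cited literature), the following sketch uses inverse document frequency, a basic Information Retrieval statistic, to rank link labels for human review: labels built from vocabulary that is rare across the site's pages are shown to the evaluator first, on the assumption that they are more likely to be unclear to users.

    # Minimal sketch: rank labels for human review by the rarity of
    # their vocabulary across the site (average inverse document frequency).
    from collections import Counter
    import math
    import re

    def rank_labels_for_review(labels, page_texts):
        # Document frequency of each term over the site's pages.
        df = Counter()
        for text in page_texts:
            df.update(set(re.findall(r"[a-z]+", text.lower())))
        n = len(page_texts)

        def rarity(label):
            terms = re.findall(r"[a-z]+", label.lower())
            if not terms:
                return float("inf")   # empty labels are the most suspicious
            return sum(math.log((n + 1) / (df[t] + 1)) for t in terms) / len(terms)

        return sorted(labels, key=rarity, reverse=True)

    pages = ["products and prices", "contact us", "our products",
             "contact info", "frobnicate the gizmo"]
    print(rank_labels_for_review(["frobnicate", "contact us", "products"], pages))
    # ['frobnicate', 'contact us', 'products']  (rarest vocabulary first)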
 

 

3. Automatic tools for usability evaluation

Tools that support the developer/maintainer in finding usability faults and fixing them can be classified according to:
    location: web-based vs off-line

    type of service: failure identifiers (they discover potential failures by simulating user actions, like filling in a form, and sometimes rank them according to severity); fault analyzers (they find failures and highlight their causes, i.e. faults, usually by systematically analyzing the source code of the website, and sometimes rank the list of faults according to severity); analysis and repair tools (they also assist the developer in fixing the faults)

    information source: automatic usability analysis can be performed on the basis of the actual implementation of a website (sources), on webserver logs, or on data acquired during user testing (user testing data); this paper deals only with tools analyzing website sources

    scope, i.e. the set of attributes that are considered during the automatic analysis. A classification based on scope is:

      a) HTML validators and cleaners (they assist in removing non-standard usage of the language)

      b) HTML/graphic optimizers (they improve downloading and rendering performance by recoding certain parts of HTML or graphic documents)

      c) link checkers (they probe all the links leaving a page to determine whether their targets exist; a minimal sketch of such a service appears right after this list)

      d) usability tools (they detect and sometimes help to fix usability faults).
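
For concreteness, here is a minimal sketch of the link-checker service in item c) above; it is my own illustration, using only the Python standard library, and a real tool would add crawling, throttling and reporting on top of it:

    # Minimal sketch of a link checker: extract the links leaving a page
    # and probe each target to determine whether it exists.
    from html.parser import HTMLParser
    from urllib.error import HTTPError, URLError
    from urllib.parse import urljoin
    from urllib.request import Request, urlopen

    class LinkExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self.links.append(href)

    def check_links(page_url, html):
        parser = LinkExtractor()
        parser.feed(html)
        report = {}
        for href in parser.links:
            target = urljoin(page_url, href)   # resolve relative links
            if not target.startswith(("http://", "https://")):
                continue                        # skip mailto:, javascript:, etc.
            try:
                with urlopen(Request(target, method="HEAD"), timeout=10) as r:
                    report[target] = r.status   # 2xx/3xx: target exists
            except HTTPError as e:
                report[target] = e.code         # e.g. 404: broken link
            except URLError:
                report[target] = None           # unreachable host
        return report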

In the remainder of the paper I will discuss only tools having d) for scope, this being the most general one.
 
 

At the moment the following tools have been developed and are available (or will soon be available) from the web [1]: MacroBot, MetaBot and LinkBot [WatchFire, 2000], Web Criteria [WebCriteria, 2000], A-Prompt [ATRC, 1999], Bobby [CAST, 1999], NetMechanic [NetMechanic, 2000], Web Site Garage [Netscape, 1999], Dr HTML [Imagiware, 1997], WebSAT [NIST, 1999] and LIFT [UsableNet, 2000].

These tools cover a relatively large set of tests, which can be grouped according to usability-related properties as follows [2] (a small sketch showing how a few of these checks might be implemented appears right after the list):

    1. consistency of presentation and controls
      underline: avoid mixing underlined text with underlined links

      link label: different links pointing to the same resource should have the same label

      email label: labels associated to a given email address should be consistent

      color consistency: colors used for background/foreground/links should be consistent among pages

      background consistency: background images should be consistently used

      nav-bar consistency: links included in navigation bars should be consistent among pages

    2. adequate feedback
      freshness: pages should be time- and author-stamped
    3. natural organization of the information

    4. contextual navigation (in each state the required navigation options are available)

      NOFRAMES validity: NOFRAMES should be present and it should contain equivalent navigation options

      link to home: each page should contain a link to the home page

      logical path: each page should contain links to each intermediate page in the path connecting the page to the home

      self-referential pages: pages should not contain links to themselves

      frame titles: frames should set the “title” attribute

      local links validity: links that are local to the website should point to existing resources

      external links validity: links to external resources should be periodically checked

    5. efficient navigation
      site depth: the number of links that need to be followed from home page to other pages should not exceed a threshold

      table coding: table components should have explicit width and height

      image coding: images should also have explicit width and height

      download time: pages should download within given time threshold

      recycled graphics: images used in the website should be shared (so that browsers can cache them)

      hidden elements: pages should not contain elements that cannot be shown (like maps not associated to any image)

    6. clear and meaningful labels
      informative link labels: links pointing to heavy/plug-in dependent resources should specify that in the label

      explicit mailto addresses: labels of “mailto:” links should contain the actual email address

      missing page title: pages should have a title

      table headers: tables should have headers and summaries

      form prompts: within forms, text input fields should have a label

    7. robustness (of the site with respect to the technology used by users)
      browser compatibility: HTML code should not use proprietary structures

      safe colors: page elements should use web-safe colors

      link targets: avoid “_blank” target in frames; use correct targets for links leaving the frames

      HTML validity: only standard HTML code should be used

      portable font-faces: standard font faces should be used in addition to desired ones

      color contrast: background and foreground colors combinations should provide sufficient contrast

    8. flexibility
      image ALT: images should have alternative textual descriptions

      other media ALT: videos, audios, applets and other objects should have alternative textual descriptions

      imagemap links: links embedded in images should be available also in textual format

      auto-refresh: duplicate auto-refresh links in the page body (both forward and backward ones)

      forced downloading: links embedding an image in their label cannot be followed without downloading the image

      tables/frames/font resizing: relative sizes should be used

    9. support of users’ goals
      form coding: forms should have “submit”, “reset” buttons
    10. maintainability
      relative links: URLs that are local to the website should be relative
    11. other
      spelling: spell-check the content of pages

      different media: report on the number of different media that are used in pages/website

      keywords/description: pages should have appropriate META information to be searchable by search engines

      site popularity: how many other websites point to the one under analysis

      marquee/blink: avoid animated features
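
As announced above, here is a small sketch (my own illustration, not code from any of these tools) showing how three of these checks, missing page title, image ALT and image coding (explicit width/height), might be implemented on a single page; real tools run dozens of such checks and aggregate the results per page and per site:

    # Minimal sketch of three automatic usability checks on one page:
    # missing page title, images without ALT text, images without
    # explicit width/height.
    from html.parser import HTMLParser

    class UsabilityChecker(HTMLParser):
        def __init__(self):
            super().__init__()
            self.has_title = False
            self.problems = []

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if tag == "title":
                self.has_title = True
            elif tag == "img":
                src = a.get("src")
                if "alt" not in a:
                    self.problems.append(
                        "image ALT: <img src=%r> lacks a textual description" % src)
                if "width" not in a or "height" not in a:
                    self.problems.append(
                        "image coding: <img src=%r> lacks explicit width/height" % src)

    def check_page(html):
        checker = UsabilityChecker()
        checker.feed(html)
        if not checker.has_title:
            checker.problems.append("missing page title: no <title> element")
        return checker.problems

    print(check_page('<html><body><img src="logo.gif"></body></html>'))
    # reports missing ALT, missing width/height, and missing <title>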

Item 3 in the previous list (natural organization of information) has not been reduced to any lower-level attribute, since it refers to an external property that cannot be assessed without human intervention. The situation for “adequate feedback” (n. 2), “support of users’ goals” (n. 9) and “maintainability” (n. 10) is similar, though slightly more positive.

The following table shows the range of tests performed by each of the tools considered.
 

 
 
Range of tests performed by reviewed tools (an asterisk marks a test performed by the tool)

TEST                      | MacroBot | MetaBot | WebCriteria | A-Prompt | Bobby | NetMech | WebGarage | LinkBot | DrHTML | WebSAT | LIFT
--------------------------|----------|---------|-------------|----------|-------|---------|-----------|---------|--------|--------|-----
1.1 underline             |          |         |             |          |       |         |           |         |        |        |  *
1.2 link label consist.   |          |         |             |          |       |         |           |         |        |        |  *
1.3 email consist.        |          |         |             |          |       |         |           |         |        |        |  *
1.4 color consist.        |          |         |             |          |       |         |           |         |        |        |  *
1.5 backgr. consist.      |          |         |             |          |       |         |           |         |        |        |  *
1.6 nav-bar consist.      |          |         |             |          |       |         |           |         |        |        |  *
2.1 freshness             |          |         |      *      |          |       |         |           |         |        |        |  *
4.1 noframes validity     |          |         |             |    *     |   *   |         |           |         |        |   *    |  *
4.2 link to home          |          |         |             |          |       |         |           |         |        |        |  *
4.3 logical path          |          |         |             |          |       |         |           |         |        |        |  *
4.4 self-ref. pages       |          |         |             |          |       |         |           |         |        |        |  *
4.5 frame titles          |          |         |             |    *     |   *   |         |           |         |        |        |  *
4.6 local links validity  |          |         |             |          |       |    *    |     *     |    *    |   *    |        |  *
4.7 external links valid. |          |         |             |          |       |    *    |     *     |    *    |   *    |        |  *
5.1 site depth            |          |         |      *      |          |       |         |           |         |        |        |  *
5.2 table coding          |          |         |             |          |       |         |           |         |   *    |        |
5.3 image coding          |          |         |             |          |       |    *    |           |    *    |   *    |   *    |  *
5.4 download time         |          |         |      *      |          |       |    *    |     *     |    *    |   *    |   *    |  *
5.5 recycled graphics     |          |         |             |          |       |         |           |         |        |        |  *
5.6 hidden elements       |          |         |             |          |       |         |           |         |        |        |  *
6.1 informative labels    |          |         |             |          |       |         |           |         |        |   *    |  *
6.2 explicit mailto       |          |         |             |          |       |         |           |         |        |        |  *
6.3 missing page title    |          |         |             |          |       |         |           |    *    |        |        |  *
6.4 table headers         |          |         |             |    *     |       |         |           |         |        |        |
6.5 form prompts          |          |         |             |    *     |       |         |           |         |        |        |
7.1 browser compatib.     |          |         |             |          |       |    *    |     *     |         |        |        |  *
7.2 safe colors           |          |         |             |          |       |         |           |         |        |   *    |  *
7.3 link targets          |          |         |             |          |       |         |           |         |        |   *    |  *
7.4 HTML validity         |          |         |             |          |   *   |    *    |     *     |    *    |   *    |        |  *
7.7 portable faces        |          |         |             |          |       |         |           |         |        |        |  *
7.8 color contrast        |          |         |             |    *     |       |         |           |         |        |        |
8.1 image ALT             |          |         |             |    *     |   *   |    *    |           |    *    |   *    |   *    |  *
8.2 other ALT             |          |         |             |    *     |   *   |    *    |           |    *    |   *    |   *    |
8.3 imagemap links        |          |         |             |    *     |   *   |         |           |         |        |        |  *
8.4 auto-refresh          |          |         |             |          |   *   |         |           |         |        |        |  *
8.5 forced downlding      |          |         |             |          |       |         |           |         |        |        |  *
8.6 resize                |          |         |             |          |       |         |           |         |        |        |  *
9.1 form coding           |    *     |         |             |          |       |         |           |         |   *    |   *    |
10.1 relative links       |          |         |             |          |       |         |           |         |        |   *    |
11.1 spell checking       |          |         |             |          |       |    *    |     *     |         |   *    |        |
11.2 different media      |          |    *    |      *      |          |       |         |           |         |        |        |
11.3 keywords/descr.      |          |    *    |             |          |       |         |     *     |         |        |   *    |  *
11.4 site popularity      |          |         |             |          |       |         |     *     |         |        |        |
11.5 marquee/blink        |          |         |             |    *     |   *   |         |           |         |        |   *    |  *
TOTAL                     |    1     |    2    |      4      |    9     |   8   |    9    |     8     |    8    |   10   |   12   |  34

4. Analysis

The table shows a relatively sparse set of features. In particular:
    There is no tool dealing with the external properties related to item 3 (“natural organization of information”), and coverage of the other two items pinpointed in the previous section (“adequate feedback” and “maintainability”) is almost as sparse. Adequate feedback requires either pages containing information that conveys the appropriate meaning, or relatively complex programmatic actions that are more difficult to analyze automatically (for example because they are written in JavaScript instead of plain HTML). Maintainability, on the other hand, does not affect usability and is therefore probably unrelated to the goal of those tools.

    The most frequently adopted tests are the download time of a page, presence of alternative textual descriptions, validation of HTML and links, and presence of search keywords and document descriptions. Obviously, these are the tests that present the best cost/benefit ratio, as they are easy to implement and accurate, in the sense that they rarely fail, either by missing actual faults (false negatives) or by reporting non-existent faults (false positives).

    There are areas in the table that are poorly covered: “consistency” (n. 1), “contextual navigation” (n. 4) and “clear and meaningful labels” (n. 6). The tests encompassed within these items are clearly more difficult to implement than the previously discussed ones. Furthermore they are also less accurate, since they relate to properties that are partly external: consistency, clarity and meaningfulness are, like beauty, in the eye of the beholder. Nonetheless these tests could be offered as heuristic aids, highlighting aspects that are potential problems. By adopting proper ranking strategies, these aspects can be shown to the tool user without necessarily overloading him or her.

5. The test effectiveness problem

While these tools offer a test suite that is reasonably wide and open, at the moment there is no standard way to assess the usability of the tools themselves. This is particularly true for their effectiveness, that is, how accurate the tests they run are. Determining the means to measure and evaluate test effectiveness is an important requirement, from both research and pragmatic viewpoints. In fact, a standard tool evaluation methodology:
    could be used to assess validity of each test and consequently each tool;

    could be used to compare effectiveness of different tools;

    could be used to define standard levels of effectiveness, which might then translate into standard usability levels for websites that have passed certified tests;

    could provide insights for a proper interpretation of the results produced by tests (what the consequences of the problems identified and fixed by tools can be).

The research on web usability and accessibility guidelines [WAI, 1999; Scapin et al., 2000] is a first step towards such a methodology. But more is needed to define a proper methodology.
 
 

Given the fast evolution pace of web technologies and uses, such an evaluation methodology can probably only be based on experiments comparing test results with results obtained through other usability evaluation methods, namely usability inspection methods and user testing.

It should specify a set of tests (by identifying possible usability failures and related faults), how test effectiveness is to be measured, and how the experiment should be performed (what kind of user testing, what kind of questionnaires or data acquisition methods should be adopted, etc.) in order to be valid. The Goal-Question-Metric approach [Fenton and Lawrence Pfleeger, 1997] could be followed as a framework for defining such a methodology.
 
 

Notice that even though many tests are likely to yield false positives, the major consequence of this is a reduced productivity of the maintainer (who has to cope with incorrect information). In my view, it is more important to define effectiveness in terms of the number of false negatives, that is, cases where the automatic tool was not able to identify a fault that was instead uncovered by other means.
 
 

Test sites could be set up where specific faults are injected with the purpose of exercising certain tests. Tools then could be evaluated on the basis of the number of faults that they uncover.
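
A minimal sketch of how such an experiment could be scored, assuming that injected and reported faults are identified by comparable labels (the labels below are invented for illustration):

    # Minimal sketch: score a tool against a test site with injected faults,
    # reporting recall (the complement of the false-negative rate argued
    # for above) together with the raw false-negative/false-positive counts.
    def evaluate_tool(injected, reported):
        detected = injected & reported
        missed = injected - reported    # false negatives
        spurious = reported - injected  # false positives
        recall = len(detected) / len(injected) if injected else 1.0
        return {"recall": recall,
                "false_negatives": len(missed),
                "false_positives": len(spurious)}

    injected = {"p1:missing-alt", "p2:broken-link", "p3:no-title"}
    reported = {"p1:missing-alt", "p3:no-title", "p4:slow-download"}
    print(evaluate_tool(injected, reported))
    # recall = 2/3, 1 false negative, 1 false positive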

6. Conclusions

In this paper a brief survey of automatic usability evaluation tools for websites has been presented. These tools consider a large set of properties that depend only on attributes of the websites themselves (not on the context in which the websites are used, and thus not on their contents).

Especially those supporting repair actions (in addition to the identification of usability faults) have the potential to dramatically reduce the time and effort needed to perform maintenance activities.

Several tests are still missing from current tools, even though they seem viable with currently available technology. In other cases, in order to advance the state of the art in automatic usability evaluation, the test effectiveness problem needs to be formulated and solved. This is the problem of defining a standard methodology for evaluating the effectiveness of these tools, which in turn requires that appropriate models of usability be defined.
 
 

References

[Allen, 1996] B. Allen, Information Tasks: Toward a User-Centered Approach to Information Systems, Academic Press, 1996

[ATRC, 1999] Adaptive Technology Resource Center, A-Prompt: Web Accessibility Verifier, University of Toronto, http://www.snow.utoronto.ca, 1999

[Belkin and Croft, 1987] Belkin N. J. and Croft W. B., Retrieval Techniques, in M. Williams(ed.) Annual Review of Information Science and Technology. Vol 22, pp.109-145, 1987

[CAST, 1999] Center for Applied Special Technology, Bobby 3.1.1, http://www.cast.org/bobby, 1999

[Fenton and Lawrence Pfleeger, 1997] Fenton N.E. and Lawrence Pfleeger S., Software metrics, 2nd ed., International Thompson Publishing Company, 1997

[Fleming, 1998] J. Fleming, WEB navigation: designing the user experience, O'Reilly, 1998

[Imagiware, 1997] Imagiware, HTML Doctor, http://www2.imagiware.com/RxHTML/index_noframes.html, 1997

[NetMechanic, 2000] NetMechanic, http://www.netmechanic.com, 2000

[Netscape, 1999] Netscape, Web Site Garage, http://websitegarage.netscape.com, 1999

[Newman and Lamming, 1994] Newman W. and Lamming M., Interactive System Design, Addison-Wesley, 1994.

[Nielsen and Mack, 1994] J. Nielsen and R. Mack (eds), Usability Inspection Methods, Wiley, 1994.

[Nielsen, 1999] Nielsen J., Designing Web Usability: The Practice of Simplicity, New Riders Publishing, 1999.

[Nielsen, 2000] Nielsen J., http://www.useit.com/alertbox, March 2000

[NIST, 1999] National Institute for Standards and Technology, WebMetric Tools, http://zing.ncsl.nist.gov/webmet, 1999

[Rosenfeld and Morville, 1998] Rosenfeld L. and Morville P., Information architecture for the World Wide Web, O’Reilly, 1998

[Scapin et al., 2000] Scapin D., Leulier C., Vanderdonckt J., Mariage C., Bastien C., Farenc C., Palanque P. and Bastide R., Towards automated testing of web usability guidelines, Human Factors and the Web, 6th Conference (these proceedings), Austin, June 2000

[UsableNet, 2000] UsableNet.com, LIFT: web preflight and usability assistant, http://www.usablenet.com, 2000

[WAI, 1999] WAI, Web Content Accessibility Guidelines 1.0, http://www.w3.org/TR/1999/WAI-WEBCONTENT-19990505, 1999

[WatchFire, 2000] Watchfire, Press materials, http://www.watchfire.com/press, 2000

[WebCriteria, 2000] WebCriteria, http://www.webcriteria.com, 2000

[W3C, 2000] The World Wide Web Consortium, http://www.w3.org, March 2000

[1] The tool list is based on a subjective selection of the tools that are described on the web and that appear to offer significant evaluation services (as of the end of May 2000).
[2] The test list is compiled on the basis of information about the tools gathered from the web in May 2000; I considered only the tests that can be performed automatically. In many cases a test belongs to more than one category: I listed it in the category that I believe fits best.