I present a survey of tools that analyze websites, illustrating the kinds of automatic tests they perform and the usability factors to which these tests are most closely related. The survey then leads to an analysis of the gaps that remain and of the research openings they suggest.
On the one hand, web technologies evolve extremely fast, enabling sophisticated tools to be deployed and complex interactions to take place. Moreover, the life cycle of a website is also extremely fast: a website is maintained at a higher rate than other software products because of market pressure and the lack of distribution barriers. In addition, the scope of maintenance often becomes so wide that a complete redesign takes place.
On the other hand, the quality of a website is rooted in its usability, which usually results from the adoption of user-centered development and evaluation approaches [Newman and Lamming, 1994; Fleming, 1998; Rosenfeld and Morville, 1998; Nielsen, 1999]. Usability testing is thus a necessary and repeated step during the life cycle of a website.
To test the usability of a website a developer can adopt two kinds of methods: usability inspection methods (e.g. heuristic evaluation [Nielsen and Mack, 1994]) or user testing [Nielsen, 2000]. Heuristic evaluation relies on a pool of experts who inspect and use (part of) a website and identify usability problems that they expect will affect end users. With user testing, a sample of the website's user population is selected and asked to use (part of) the website and report what did not work or seemed inappropriate.
Even though the cost (in terms of time and effort) of both methods is not particularly high, and their application improves website quality and reduces overall development cost, they are not systematically performed at a detailed level on every part of a website after each maintenance or development step.
It is clear that as change actions on a website grow in number and variety, more and more resources need to be deployed to ensure that website quality does not decrease (and hopefully increases). It is equally clear that any tool that can, at least in part, automate the usability evaluation and maintenance processes will help to fill this ever-widening gap.
The goal of this paper is to present a brief survey of what these tools do and how they contribute to the usability evaluation problem. From the analysis it appears that gaps exist between what these tools achieve and what is required to ensure usability. While some of these gaps are inherently unbridgeable, others can probably be filled, provided that additional research is carried out to identify effective techniques.
End users can be characterized in terms of:
context: user behavior during information-seeking processes is strongly affected by users' culture, language, previous knowledge of the field, and experience in using the web.
technology: end users interact with the website through a layer of technology that is not under the web designer's control: browsers, protocols, plug-ins, operating system platforms, interaction devices (screens, speech devices, pens, reduced telephone keypads, etc.), network connections.
On the other hand we have developers and maintainers. Among their activities, a prominent role is played by: corrective maintenance (i.e. fixing problems with the website's behavior or inserting missing content), adaptive maintenance (i.e. upgrading the site with respect to new technologies, such as new browser capabilities), perfective maintenance (i.e. improving the site's behavior or content), and preventive maintenance (i.e. fixing problems in behavior or content before they affect users). A large fraction of these activities is aimed at detecting system failures (that is, departures from the required behavior), analyzing them, and identifying faults (that is, representations, within the system, of human errors that occurred during development: bugs).
Maintenance is meant to improve the quality of the website. ISO 9126 defines quality as "the totality of features and characteristics of a software product that bear on its ability to satisfy stated or implied needs"; it includes properties such as maintainability, robustness, reliability and usability, which are particularly important for websites.
Usability can be defined (ISO 9241) as "the effectiveness, efficiency and satisfaction with which specified users achieve specified goals in particular environments", where:
effectiveness means "the accuracy and completeness with which users achieve specified goals",
efficiency means "the resources expended in relation to the accuracy and completeness of goals achieved", and
satisfaction means "the comfort and acceptability of the work system to its users and other people affected by its use".
In order to be operationalized, these properties need to be decomposed into more detailed ones that can be assessed in a simpler and perhaps more standard way. For example, maintainability can be decomposed into the complexity of the DHTML code, its size, the number of absolute URLs, etc.
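As a purely illustrative sketch of how such internal attributes lend themselves to automatic measurement, the following Python fragment (names and scope are assumptions of this example, not taken from any of the tools discussed) computes a page's size and its number of absolute URLs:

    # Minimal sketch: measuring two internal attributes of a page,
    # document size and number of absolute URLs (names are illustrative).
    from html.parser import HTMLParser

    class AbsoluteURLCounter(HTMLParser):
        """Counts href/src attribute values that hold absolute URLs."""
        def __init__(self):
            super().__init__()
            self.absolute_urls = 0

        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                if name in ("href", "src") and value and \
                        value.startswith(("http://", "https://")):
                    self.absolute_urls += 1

    def page_metrics(html_source):
        parser = AbsoluteURLCounter()
        parser.feed(html_source)
        return {"size_bytes": len(html_source.encode("utf-8")),
                "absolute_urls": parser.absolute_urls}

    print(page_metrics('<a href="http://example.com/">home</a>'))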
The same applies to usability. It can be described in terms of usability factors (like speed of use, error rate, ease of error recovery, etc.), which in turn can be reduced to other lower-level properties. The most important properties for website usability include those related to "navigability" (most of them taken from [Fleming, 1998]):
adequate feedback
natural organization of the information (systematic labels, clear hierarchical structure)
contextual navigation (in each state all and only the possible navigation options are available)
efficient navigation (in terms of time and effort needed to complete a task)
clear and meaningful labels.
Other properties relevant to the usability of a website are:
flexibility (for example: availability of graphic and textual versions, redundant indexes and site maps, duplicated image map links)
functionality (i.e. support for users' goals).
The latter can be further decomposed if we narrow down users' goals. For e-commerce sites, for example, other relevant attributes can be:
how easy and effective it is to find the desired item
how easy and effective it is to search the catalog for an item not known a priori
how easy and effective it is to preview an item
what the return policies are and how they are communicated, and similarly for privacy policies
The Web Accessibility Initiative [W3C, 2000] is an effort by the W3C to improve website accessibility. It publishes a set of guidelines [WAI, 1999] where accessibility is defined as the ability of a website to be used by people with disabilities. An accessible website:
ensures graceful transformation: its pages remain accessible despite physical, sensory or technological constraints;
makes content understandable and navigable: it presents its content in a clear and simple language, and provides understandable mechanisms for navigating within and between pages.
While usability implies accessibility (at least when an unconstrained user population is considered), the converse is not necessarily true. For example, a missing link to the home page may be a fault affecting usability, while it does not affect accessibility.
All these properties (whether related to usability or to accessibility) may be further decomposed into more detailed ones that refer to specific attributes of the website implementation. Indeed, such a decomposition has to be done in order to support usability inspection methods and to identify and fix faults. For example, to determine how flexible a website is, we need to inspect the implementation (or perhaps the design specifications) to determine whether there is a textual version of each page, whether there are textual links duplicating those embedded in images, and so on.
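To illustrate, a check of this kind lends itself to automation. The following minimal sketch (in Python, with illustrative names and a deliberately crude heuristic) flags links that occur on a page only with non-textual labels:

    # Sketch: flag links whose only occurrences on the page are embedded
    # in images, i.e. links lacking a textual duplicate.
    from html.parser import HTMLParser

    class LinkContentScanner(HTMLParser):
        """Records, for each anchor, whether its label includes text."""
        def __init__(self):
            super().__init__()
            self._href = None
            self._has_text = False
            self.image_only = set()  # hrefs seen only with non-textual labels
            self.textual = set()     # hrefs seen with a textual label

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self._href = dict(attrs).get("href")
                self._has_text = False

        def handle_data(self, data):
            if self._href and data.strip():
                self._has_text = True

        def handle_endtag(self, tag):
            if tag == "a" and self._href:
                (self.textual if self._has_text else self.image_only).add(self._href)
                self._href = None

    def links_missing_textual_duplicate(html_source):
        scanner = LinkContentScanner()
        scanner.feed(html_source)
        return scanner.image_only - scanner.textual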
Some of these lower-level properties refer to attributes that depend only on how the website has been designed and developed (e.g. textual duplicates of links embedded in images): these are internal attributes. Others depend on the website and its usage (e.g. how meaningful a label is): these are external attributes. The latter is always the case for properties referring to content, which require, before they can be assessed, some sort of interpretation that assigns meaning to symbols.
While evaluating the usability of a website requires both internal and external attributes, only the former are amenable to automatic tests. External attributes can be evaluated only via semi-automatic means that entail a human evaluation step. However, tools can provide useful assistance by filtering and ranking content that is potentially relevant (for example, by adopting statistical techniques developed in Information Retrieval [Belkin and Croft, 1987]).
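As a purely illustrative example of such assistance, a tool could weight the terms occurring in page elements (e.g. link labels) with the standard tf-idf statistic and submit only the highest-ranked items to the human evaluator. The sketch below assumes the pages have already been reduced to lists of tokens:

    # Sketch: tf-idf weighting of terms across pages, usable to rank
    # content for human review (tokenization is assumed done elsewhere).
    import math
    from collections import Counter

    def tfidf(documents):
        """documents: list of token lists; returns one {term: score} per document."""
        n = len(documents)
        df = Counter()                       # document frequency of each term
        for doc in documents:
            df.update(set(doc))
        return [{term: count * math.log(n / df[term])
                 for term, count in Counter(doc).items()}
                for doc in documents]

    pages = [["order", "status", "help"], ["order", "checkout"], ["help", "faq"]]
    print(tfidf(pages))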
Tools for automatic usability analysis can be classified according to:
type of service: failure identifiers (they discover potential failures by simulating user actions, such as filling in a form, sometimes ranking failures by severity); fault analyzers (they find failures and highlight their causes, i.e. faults, usually by systematically analyzing the source code of the website, sometimes ranking the faults by severity); analysis and repair tools (they also assist the developer in fixing the faults)
information source: automatic usability analysis can be performed on the basis of the actual implementation of a website (sources), on webserver logs, or on data acquired during user testing; this paper deals only with tools analyzing website sources
scope, i.e. the set of attributes that are considered during the automatic analysis. A classification based on scope is:
HTML/graphic optimizers (they improve downloading and rendering performance by recoding certain parts of HTML or graphic documents)
link checkers (they probe all the links leaving a page to determine whether their targets exist; a minimal sketch is given after this list)
usability tools (they detect and sometimes help to fix usability faults).
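As an example of the simplest of these services, the core of a link checker reduces to a few lines of code. The following minimal sketch (it assumes the URLs have already been extracted from a page, and its error handling is deliberately coarse) probes each target and reports the broken ones:

    # Minimal link-checker sketch: probe each URL and report broken targets.
    import urllib.error
    import urllib.request

    def check_links(urls, timeout=10):
        broken = []
        for url in urls:
            request = urllib.request.Request(url, method="HEAD")
            try:
                with urllib.request.urlopen(request, timeout=timeout):
                    pass                                  # target exists (2xx/3xx)
            except urllib.error.HTTPError as exc:
                broken.append((url, exc.code))            # e.g. 404
            except (urllib.error.URLError, OSError) as exc:
                broken.append((url, str(exc)))            # DNS error, timeout, ...
        return broken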
At the moment the following tools have been developed and are available (or will soon be available) from the web:
A-Prompt: available from the Adaptive Technology Resource Center [ATRC, 1999]; off-line; fault analyzer and repair tool
Bobby: available from CAST [CAST, 1999]; web-based and off-line, with ranking; fault analyzer
Doctor HTML: available from Imagiware [Imagiware, 1997]; web-based and off-line; fault analyzer
LIFT: available from UsableNet.com [UsableNet, 2000]; web-based and off-line, with ranking; fault analyzer and repair tool
LinkBot: available from Watchfire [WatchFire, 2000]; off-line, with ranking; fault analyzer and repair tool
MacroBot: available from Watchfire [WatchFire, 2000]; off-line; failure identifier
MetaBot: available from Watchfire [WatchFire, 2000]; off-line; fault analyzer and repair tool
NetMechanic: available from NetMechanic [NetMechanic, 2000]; web-based; fault analyzer and repair tool
WebCriteria: available from WebCriteria [WebCriteria, 2000]; web-based; comparative evaluation of a website against a benchmark derived from similar well-established websites; failure identifier
Web Site Garage: available from Netscape [Netscape, 1999]; web-based; fault analyzer
WebSAT: available from NIST [NIST, 1999]; web-based and off-line; fault analyzer
These tools cover a relatively large set of tests, which can be grouped according to usability-related properties as follows:
link label: different links pointing to the same resource should have the same label
email label: labels associated with a given email address should be consistent
color consistency: colors used for background/foreground/links should be consistent among pages
background consistency: background images should be consistently used
nav-bar consistency: links included in navigation bars should be consistent among pages
contextual navigation (in each state the required navigation options are available)
link to home: each page should contain a link to the home page
logical path: each page should contain links to each intermediate page in the path connecting it to the home page
self-referential pages: pages should not contain links to themselves
frame titles: frames should set the “title” attribute
local links validity: links that are local to the website should point to existing resources
external links validity: links to external resources should be periodically checked
table coding: table components should have explicit width and height
image coding: images should also have explicit width and height
download time: pages should download within a given time threshold
recycled graphics: images used in the website should be shared (so that browsers can cache them)
hidden elements: pages should not contain elements that cannot be shown (like maps not associated with any image)
explicit mailto addresses: labels of “mailto:” links should contain the actual email address
missing page title: pages should have a title
table headers: tables should have headers and summaries
form prompts: within forms, text input fields should have a label
safe colors: page elements should use web-safe colors
link targets: avoid “_blank” target in frames; use correct targets for links leaving the frames
HTML validity: only standard HTML code should be used
portable font-faces: standard font faces should be used in addition to desired ones
color contrast: background and foreground color combinations should provide sufficient contrast (a sketch of this test follows the list)
other media ALT: videos, audios, applets and other objects should have alternative textual descriptions
imagemap links: links embedded in images should be available also in textual format
auto-refresh: duplicate auto-refresh links in the page body (both forward and backward ones)
forced downloading: links embedding an image in their label cannot be followed without downloading the image
tables/frames/font resizing: relative sizes should be used
different media: report on the number of different media that are used in pages/website
keywords/description: pages should have appropriate META information to be searchable by search engines
site popularity: how many other websites point to the one under analysis
marquee/blink: avoid animated features
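As an example, the color contrast test above can be automated with the brightness and color-difference heuristics suggested in W3C accessibility techniques, with the commonly cited thresholds of 125 and 500. The sketch below assumes colors are given as (R, G, B) tuples in the 0-255 range:

    # Sketch of the color contrast test, following the W3C-suggested
    # brightness (threshold 125) and color difference (threshold 500) heuristics.
    def brightness(rgb):
        r, g, b = rgb
        return (299 * r + 587 * g + 114 * b) / 1000

    def color_difference(fg, bg):
        return sum(abs(f - b) for f, b in zip(fg, bg))

    def sufficient_contrast(fg, bg):
        return (abs(brightness(fg) - brightness(bg)) >= 125
                and color_difference(fg, bg) >= 500)

    print(sufficient_contrast((0, 0, 0), (255, 255, 255)))        # True
    print(sufficient_contrast((119, 119, 119), (136, 136, 136)))  # False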
Item 3 in the earlier list of usability properties (natural organization of the information) has not been reduced to any lower-level attribute, since it refers to an external property that cannot be assessed without human intervention. The situation for "adequate feedback" (n. 2), "support for users' goals" (n. 9) and "maintainability" (n. 10) is similar, though slightly better.
The following table shows the range of tests performed by each of the
tools considered.
[Table: tests performed by each of the tools considered. Columns: the tools listed above (A-Prompt, Bobby, Doctor HTML, LIFT, LinkBot, MacroBot, MetaBot, NetMechanic, WebCriteria, Web Site Garage, WebSAT); a mark indicates that the tool performs the test. Rows (criteria): 1.1 underline; 1.2 link label consistency; 1.3 email consistency; 1.4 color consistency; 1.5 background consistency; 1.6 nav-bar consistency; 2.1 freshness; 4.1 noframes validity; 4.2 link to home; 4.3 logical path; 4.4 self-referential pages; 4.5 frame titles; 4.6 local links validity; 4.7 external links validity; 5.1 site depth; 5.2 table coding; 5.3 image coding; 5.4 download time; 5.5 recycled graphics; 5.6 hidden elements; 6.1 informative labels; 6.2 explicit mailto; 6.3 missing page title; 6.4 table headers; 6.5 form prompts; 7.1 browser compatibility; 7.2 safe colors; 7.3 link targets; 7.4 HTML validity; 7.7 portable font faces; 7.8 color contrast; 8.1 image ALT; 8.2 other media ALT; 8.3 imagemap links; 8.4 auto-refresh; 8.5 forced downloading; 8.6 resizing; 9.1 form coding; 10.1 relative links; 11.1 spell checking; 11.2 different media; 11.3 keywords/description; 11.4 site popularity; 11.5 marquee/blink. A final TOTAL row gives the number of tests performed by each tool.]
The most frequently adopted tests concern the download time of a page, the presence of alternative textual descriptions, the validation of HTML and links, and the presence of search keywords and document descriptions. These are obviously the tests with the best cost/benefit ratio: they are easy to implement and accurate, in the sense that they rarely fail, either by missing actual faults (false negatives) or by flagging non-existent faults (false positives).
There are areas in the table that are poorly covered: "consistency" (n. 1), "contextual navigation" (n. 4) and "clear and meaningful labels" (n. 6). The tests encompassed by these items are clearly more difficult to implement than those discussed above. They are also less accurate, since they relate to properties that are partly external: consistency, clarity and meaningfulness are, like beauty, in the eye of the beholder. Nonetheless these tests could serve as heuristic tools, highlighting aspects that are potential problems. With proper ranking strategies, these aspects can be shown to the tool user without overloading him or her.
A standard methodology for evaluating the effectiveness of these tests:
could be used to compare the effectiveness of different tools;
could be used to define standard levels of effectiveness, which might then automatically translate into standard usability levels for websites that pass certified tests;
could provide insights for a proper interpretation of the results produced by the tests (i.e. what the consequences are of the problems identified and fixed by tools).
The research on web usability and accessibility guidelines [WAI, 1999; Scapin et al., 2000] is a first step towards such a methodology, but more is needed to define it properly.
Given the fast pace at which web technologies and uses evolve, an evaluation methodology can probably only be based on experiments comparing test results with the results obtained through other usability evaluation methods, namely usability inspection methods and user testing.
It should specify a set of tests (by identifying possible usability failures and related faults), how test effectiveness is to be measured, and how the experiment should be performed (what kind of user testing, what kind of questionnaires or data acquisition methods should be adopted, etc.) in order to be valid. The Goal-Question-Metric approach [Fenton and Lawrence Pfleeger, 1997] could serve as a framework for defining such a methodology.
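As a purely illustrative instance of the Goal-Question-Metric template applied to this problem: Goal: assess how effectively a given tool detects navigation faults; Question: what fraction of the navigation faults uncovered by heuristic evaluation does the tool also report?; Metric: the ratio between the number of faults reported by the tool and the number found by the inspectors.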
Notice that even though many tests are likely to yield false positives, the major consequence of this is reduced productivity for the maintainer (who has to cope with incorrect information). In my view, it is more important to define effectiveness in terms of the number of false negatives, that is, cases where the automatic tool failed to identify a fault that was instead uncovered by other means.
Test sites could be set up where specific faults are injected with the purpose of exercising certain tests. Tools then could be evaluated on the basis of the number of faults that they uncover.
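A minimal sketch of the scoring step follows (the fault identifiers are hypothetical); recall captures sensitivity to false negatives, the measure advocated above, while precision captures false positives:

    # Score a tool against a test site with injected (known) faults.
    def effectiveness(injected, reported):
        true_positives = injected & reported
        recall = len(true_positives) / len(injected)     # 1 - false-negative rate
        precision = len(true_positives) / len(reported) if reported else 1.0
        return recall, precision

    injected = {"missing-alt:img12", "broken-link:p3", "no-title:p7"}
    reported = {"missing-alt:img12", "broken-link:p3", "bad-color:p1"}
    print(effectiveness(injected, reported))   # approximately (0.67, 0.67)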
Especially tools supporting repair actions (in addition to the identification of usability faults) have the potential to dramatically reduce the time and effort needed to perform maintenance activities.
Several tests are still not covered by any tool, even though they seem viable with currently available technology. In other cases, in order to advance the state of the art in automatic usability evaluation, the test-effectiveness problem needs to be formulated and solved: the problem of defining a standard methodology for evaluating the effectiveness of these tools. This in turn requires that appropriate models of usability be defined.
[ATRC, 1999] Adaptive Technology Resource Center, A-Prompt: Web Accessibility Verifier, University of Toronto, http://www.snow.utoronto.ca, 1999
[Belkin and Croft, 1987] Belkin N. J. and Croft W. B., Retrieval Techniques, in M. Williams(ed.) Annual Review of Information Science and Technology. Vol 22, pp.109-145, 1987
[CAST, 1999] Center for Applied Special Technology, Bobby 3.1.1, http://www.cast.org/bobby, 1999
[Fenton and Lawrence Pfleeger, 1997] Fenton N.E. and Lawrence Pfleeger S., Software metrics, 2nd ed., International Thompson Publishing Company, 1997
[Fleming, 1998] J. Fleming, WEB navigation: designing the user experience, O'Reilly, 1998
[Imagiware, 1997] Imagiware, Doctor HTML, http://www2.imagiware.com/RxHTML/index_noframes.html, 1997
[NetMechanic, 2000] NetMechanic, http://www.netmechanic.com, 2000
[Netscape, 1999] Netscape, Web Site Garage, http://websitegarage.netscape.com, 1999
[Newman and Lamming, 1994] Newman W. and Lamming M., Interactive System Design, Addison-Wesley, 1994.
[Nielsen and Mack, 1994] J. Nielsen and R. Mack (eds), Usability Inspection Methods, Wiley, 1994.
[Nielsen, 1999] Nielsen J., Designing Web Usability: The Practice of Simplicity, New Riders Publishing, 1999.
[Nielsen, 2000] Nielsen J., http://www.useit.com/alertbox, March 2000
[NIST, 1999] National Institute for Standards and Technology, WebMetric Tools, http://zing.ncsl.nist.gov/webmet, 1999
[Rosenfeld and Morville, 1998] Rosenfeld L. and Morville P., Information architecture for the World Wide Web, O’Reilly, 1998
[Scapin et al., 2000] Scapin D., Leulier C., Vanderdonckt J., Mariage C., Bastien C., Farenc C., Palanque P., Bastide R., Towards automated testing of web usability guidelines, Human Factors and the WEB, 6th Conference (these proceedings), Austin, June 2000
[UsableNet, 2000] UsableNet.com, LIFT: web preflight and usability assistant, http://www.usablenet.com, 2000
[WAI, 1999] WAI, Web Content Accessibility Guidelines 1.0, http://www.w3.org/TR/1999/WAI-WEBCONTENT-19990505, 1999
[WatchFire, 2000] Watchfire, Press materials, http://www.watchfire.com/press, 2000
[WebCriteria, 2000] WebCriteria, http://www.webcriteria.com, 2000
[W3C, 2000] The World Wide Web Consortium, http://www.w3.org, March 2000