(C) 2002 Giorgio Brajnik.
Presented at CHI2002 workshop
Automatically evaluating usability of Web Sites,
Minneapolis (MN), April 2002, www.acm.org/sigchi/chi2002.
Quality Models based on Automatic Webtesting
Giorgio Brajnik
Dipartimento di Matematica e Informatica
Università di Udine, Italy
There are many books and other published materials that present a wealth of information on what to do, and what to avoid, when designing, developing or maintaining a web site; a valuable instance is [Lynch and Horton, 1999]. For example, they discuss typography on the web, dealing with issues like alignment, capitalization and typefaces. Although extremely educational and useful, this written knowledge is not sufficient. In order to improve the quality of a live site, a webmaster has to study this material and decide which principles to apply, how to apply them, and when.
A crucial decision is which principles to apply, as different situations and contexts call for different choices. In order to determine which principles are relevant to a specific situation, a webmaster has to (i) detect failures of the site, (ii) diagnose them and identify their causes, (iii) prioritize them in terms of importance, (iv) determine how to repair them, and (v) estimate the benefits and costs of these changes.
Consider how often such activities take place. Web technologies change at an extremely rapid pace, and websites follow suit. Driven by market pressure, website contents have to be updated very frequently, and redesigns of a website (its contents, information architecture and look and feel) occur very often. Nevertheless, a constant, or even improved, quality level is required to generate and maintain user trust and motivation to use the site.
A methodology based on a quality model for the site can support the activity of the webmaster. A quality model specifies which properties are important for a website (e.g. its usability, its performance, its visibility) and how these properties are to be determined.
Once a quality model is defined (and there should be one for each situation and context of analysis), the webmaster has to apply it: if the problem lies in low usability, the quality model should emphasize usability factors; if the problem lies in low performance, the model should be based on stress and load tests on web servers; and so on. S/he can use the model to monitor the quality level of the site and to diagnose detected failures. Monitoring the quality level of the site entails measuring certain properties of the site (like counting images that lack the ALT attribute) and, through the model, linking these data to an overall measure of quality. The measuring activity is likely to absorb much of the webmaster's time and effort unless s/he uses automatic tools for analyzing websites.
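As an illustration of such an automated measurement, the ALT-counting example just mentioned can be sketched in a few lines of Python. The class and function names are invented for this sketch and do not come from any particular tool.

```python
# Hypothetical sketch: one metric from a quality model, namely counting
# IMG elements that lack an ALT attribute in a page's HTML source.
from html.parser import HTMLParser

class MissingAltCounter(HTMLParser):
    """Counts <img> tags that have no ALT attribute at all."""
    def __init__(self):
        super().__init__()
        self.missing = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; an empty ALT still counts
        # as present, since it is a deliberate authoring choice.
        if tag == "img" and "alt" not in dict(attrs):
            self.missing += 1

def count_missing_alt(html):
    parser = MissingAltCounter()
    parser.feed(html)
    return parser.missing
```

A real tool would run such a check over every page of the site and feed the resulting counts into the quality model.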
Automatic tools for website analysis can be used to inspect the source code (mainly the HTML pages loaded by the web server), to inspect the live web pages (the HTML that is produced by the web server), to inspect the web server's usage logs, to test the performance of the web server and its backends, and to test the positioning of a site on search engines. Such an investigation produces reports highlighting potential failures and, in some cases, will also describe the causes of such failures, thus providing active support in finding possible solutions.
The claim of this paper is that automatic analysis tools, being systematic and largely automatic, are crucial ingredients in a methodology based on quality models for assuring constant quality levels. Even though there are limits that these tools cannot overcome, many properties included in quality models can be determined automatically by the tools, thus automating at least part of the quality assurance activity. The net effect is increased productivity of the webmaster and fewer errors.
In the context of this paper, quality is a property of a website defined in terms of a system of attributes, like consistency of background colors or average download time. A quality model, defined for supporting a given kind of analysis, is a description of which attributes are important for the analysis, which one is more important than others, and which measurement methods have to be used to assess the attributes values. This definition of quality model follows the one given by [Fenton and Pfleeger, 1997] for software systems. See also [Brajnik, 2001] for additional details.
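To make this definition concrete, a quality model can be sketched as a weighted tree of attributes, where leaves carry a measurement method and inner nodes aggregate their children. This is a minimal illustration only; all names, weights and the assumption that every leaf is scored on a [0, 1] scale are invented here, not taken from the cited works.

```python
# Minimal sketch of a quality model as a weighted attribute tree.
class Attribute:
    def __init__(self, name, weight=1.0, measure=None, children=()):
        self.name = name          # e.g. "usability", "download time"
        self.weight = weight      # relative importance within the model
        self.measure = measure    # callable returning a score in [0, 1]
        self.children = list(children)

    def score(self, site):
        if self.measure is not None:        # leaf: measure directly
            return self.measure(site)
        # inner node: weighted mean of the children's scores
        total = sum(c.weight for c in self.children)
        return sum(c.weight * c.score(site) for c in self.children) / total

# Example: a two-leaf model; the overall score is the weighted mean
# of the leaf scores, here (2.0 * 0.9 + 1.0 * 0.6) / 3.0 = 0.8.
model = Attribute("quality", children=[
    Attribute("alt_coverage", weight=2.0, measure=lambda s: s["alt_ok"]),
    Attribute("download_time", weight=1.0, measure=lambda s: s["speed"]),
])
```

The weights encode which attribute is more important than others; the `measure` callables stand in for the measurement methods prescribed by the model.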
A quality model may involve many interdependent attributes and has, of course, to take into account the particular purpose of the analysis for which quality is being modeled.
Attributes of a web site may span a very large list of properties, possibly at different levels of detail, including usability, efficiency, reliability, maintainability and complexity. Figure 1 shows a portion of a possible quality model of a website centered on usability, based on factors mentioned in [Brajnik, 2000].
How can a quality model be defined? Like any modeling activity, this is a creative task for which it is not possible, in general, to give a precise list of steps to be followed. However a general method to tackle the problem can be based on the Goal, Question, Metrics (GQM) approach outlined first in [Basili and Weiss, 1984] and then often adopted in software engineering investigations.
The GQM approach can be followed on any analysis that requires data collection. Quality assessment is such an activity since the webmaster needs to acquire data about the site to determine its quality level. But which data? And how will the data be used?
The GQM approach prescribes the following steps, described here in the context of website analysis: (i) define the goals of the analysis (e.g. assessing the usability of the site for first-time visitors); (ii) for each goal, formulate the questions whose answers would tell whether the goal is achieved; (iii) for each question, define the metrics, i.e. the measurable attributes that provide the data needed to answer it.
At this point, after goals, questions and metrics are defined, the quality model describes which properties are important for given goals, and how these properties can be traced back to simpler attributes that can be measured. The model also prescribes how measurements have to be taken.
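A GQM chain of this kind can be written down as plain data before any measurement starts. The following sketch uses an invented goal, questions and metrics purely for illustration:

```python
# Illustrative GQM record: one goal, the questions that refine it,
# and the measurable metrics that answer each question.
gqm = {
    "goal": "assess usability of the product pages",
    "questions": {
        "Are images usable without graphics?": [
            "fraction of IMG elements with a meaningful ALT",
        ],
        "Are pages fast enough over a modem?": [
            "average page weight in KB",
            "average download time at 56 kbit/s",
        ],
    },
}
```

Such a record makes explicit which data have to be collected and why, which is exactly what a quality model needs to prescribe.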
The emphasis on measurability of attributes is justified if we consider the purposes for which the quality model is going to be used. We can use it to detect whether a quality property falls below a certain threshold, to compare how two different design alternatives score, to monitor quality levels over time as a site evolves, or to compare the quality level of our site against that of a competitor.
What De Marco said about software engineering 20 years ago fully applies to the websites of today: you cannot control what you cannot measure [De Marco, 1982]. But unless a well-defined quality model is used, the results of quality assessments will be meaningless. In particular, if we rely on subjective methods to determine whether certain properties hold, the data acquisition activities may be too vaguely defined, hindering the validity of the results.
Automatic webtesting tools can play a crucial role in the definition and usage of quality models. They are important because they (i) necessarily adopt only objective metrics, (ii) are systematic and error-free, and (iii) are much more cost-effective than any manual method. As outlined above, such tools come in several flavors: source code checkers, live page analyzers, usage log analyzers, performance testers, and search engine positioning checkers.
Obviously, automatic tools cannot assess all sorts of properties. In particular, anything that requires interpretation (e.g. usage of natural and concise language) or assessment of relevance (e.g. whether the ALT text of an image is equivalent to the image itself) is out of reach. Nevertheless, these tools can highlight a number of issues to be later inspected by humans, and can avoid highlighting them when there is reasonable certainty that the issue is not a problem. For example, a non-empty ALT can contain placeholder text (like the filename of the image); it is reasonably simple to write heuristic programs that detect such a pattern and flag the issue only when appropriate, leaving it unflagged when the ALT contains other text.
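A heuristic of the kind just described might be sketched as follows; the filename pattern is an illustrative guess, not an exhaustive rule set from any real tool.

```python
# Hedged sketch of the placeholder-ALT heuristic: flag ALT text that
# looks like an image filename rather than a description.
import re

# Matches strings that look like a path ending in a common image extension.
_FILENAME_LIKE = re.compile(
    r"^[\w\-./\\]+\.(gif|jpe?g|png|bmp)$", re.IGNORECASE)

def is_placeholder_alt(alt):
    """Return True when the ALT text is probably not meaningful."""
    text = alt.strip()
    if not text:
        # An empty ALT is a deliberate authoring choice, not a placeholder.
        return False
    return bool(_FILENAME_LIKE.match(text))
```

With such a check, a tool flags `alt="logo.gif"` but stays silent on `alt="Company logo"`, reducing the number of issues a human inspector must wade through.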
Therefore, if the quality model that a webmaster is interested in includes attributes that are amenable to treatment by automatic tools, the quality assessment problem can be solved by appropriately configuring the tool, running it to acquire the relevant data, and then weighing the data found by the tool according to the importance criteria defined in the model. Such an activity, being based on a systematic and objective analysis of the website, is both economically feasible and relatively error-free.
As pointed out in [Brajnik, 2001], the issue of the validity of the metrics adopted in a quality model arises when metrics are computed by automated tools and when metrics start dealing with more interesting properties, like assessing accessibility or usability. In these cases a tool may lead to incorrect answers (i.e. incorrect values associated with attributes included in the quality model). This may happen either because the tool produced a false positive (i.e. an issue was reported where there is none) or a false negative (i.e. the tool was unable to detect a problem). Methods like the ones proposed in [Brajnik, 2001] can be applied to limit the consequences of this problem. In general, only relatively simple quality models will be based entirely on automatic tools. In the vast majority of cases, quality assessment will also rely on human inspection and human judgment. However, the contribution that automatic webtesting tools can bring to the quality assessment of websites is significant: low-cost, superficial analyses can be performed automatically, and only thereafter, if needed, a more in-depth and accurate human analysis. In this way, the productivity of webmasters will be enhanced.
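When human judgments are available for a sample of candidate issues, a tool's false positives and false negatives can be counted directly against them. The helper below is an illustrative sketch of such a comparison, not part of any published method:

```python
# Compare a tool's verdicts against human judgments over the same
# candidate issues, counting false positives and false negatives.
def error_rates(tool_flags, human_flags):
    """tool_flags, human_flags: parallel lists of booleans, one per
    candidate issue (True = flagged as a problem)."""
    fp = sum(t and not h for t, h in zip(tool_flags, human_flags))  # tool flagged, human did not
    fn = sum(h and not t for t, h in zip(tool_flags, human_flags))  # human flagged, tool missed
    return fp, fn
```

Tracking these two counts over time is one way to apply the validity checks discussed above and to decide which attributes can safely be left to the tool.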
[Basili and Weiss, 1984] Basili V.R. and Weiss D. "A methodology for collecting valid software engineering data", IEEE Trans. on Software Engineering, SE-10(6), pp. 728-738, 1984.
[Brajnik, 2000] Brajnik, G. "Automatic web usability evaluation: what needs to be done?", in Proc. Human Factors and the Web, 6th Conference, Austin, June 2000, www.dimi.uniud.it/~giorgio/papers/hfweb00.html
[Brajnik, 2001] Brajnik G. "Towards valid quality models for websites", in Proc. Human Factors and the Web, 7th Conference, Madison, WI, June 2001. www.dimi.uniud.it/~giorgio/papers/hfweb01.html
[De Marco, 1982] De Marco T. Controlling Software Projects, Yourdon Press, New York, 1982.
[Fenton and Pfleeger, 1997] Fenton N.E. and Pfleeger S.L. Software Metrics, 2nd ed., International Thomson Publishing Company, 1997.
[Lynch and Horton, 1999] Lynch P. and Horton S. Web Style Guide, Yale University, 1999.
[Nielsen, 1999] Nielsen J. Designing Web Usability: The Practice of Simplicity, New Riders Publishing, 1999.
[Nielsen, 2002] Nielsen J., http://www.useit.com/alertbox/
[Sinha et al., 2001] Sinha R., Hearst M., Ivory M., Draisin M. "Content or Graphics? An empirical analysis of criteria for award-winning websites", in Proc. Human Factors and the Web, 7th Conference, Madison, WI, June 2001.
Giorgio Brajnik (www.dimi.uniud.it/~giorgio) is a faculty member of the Computer Science School at the University of Udine, Italy. He has done research for more than 15 years on user interfaces for information systems, focusing during the last 3 years on the usability of web sites and on webtesting systems. His teaching includes courses in Information Retrieval and Web Design. He is a scientific advisor for UsableNet Inc.