(C) 2002 Giorgio Brajnik.
Presented at the CHI2002 workshop "Automatically evaluating usability of Web Sites", Minneapolis (MN), April 2002, www.acm.org/sigchi/chi2002.


Quality Models based on Automatic Webtesting

Giorgio Brajnik

Dipartimento di Matematica e Informatica
Università di Udine, Italy
giorgio@dimi.uniud.it

The quality problem of web sites

There are many books and other published materials that present a wealth of information on what to do, and what to avoid, when designing, developing, or maintaining a web site; a valuable instance is [Lynch and Horton, 1999]. For example, they discuss typography on the web, dealing with issues like alignment, capitalization, and typefaces. Although extremely educational and useful, this written knowledge is not sufficient. In order to improve the quality of a live site, a webmaster has to study this material and decide which principles to apply, how to apply them, and when.

A crucial decision is which principles to apply, as different situations and contexts call for different choices. In order to determine which principles are relevant to a specific situation, a webmaster has to (i) detect failures of the site, (ii) diagnose them and identify their causes, (iii) prioritize them in terms of importance, (iv) determine how to repair them, and (v) estimate the benefits and costs of these changes.

Consider how often such activities take place. Web technologies change at an extremely rapid pace, and websites follow suit. Driven by market pressure, website contents have to be updated very frequently, and redesigns of a website (its contents, information architecture, and look and feel) occur very often. Nevertheless, a constant or improved quality level is required to generate and maintain user trust and motivation to use the site.

A methodology based on a quality model for the site can support the activity of the webmaster. A quality model specifies which properties are important for a website (e.g. its usability, its performance, its visibility) and how these properties are to be determined.

Once a quality model is defined (and there should be one for each situation and context of analysis), the webmaster has to apply it: if the problem lies in low usability, then the quality model should emphasize usability factors; if the problem lies in low performance, then the model should be based on stress and load tests on web servers; and so on. S/he can use the model to monitor the quality level of the site and to diagnose detected failures. Monitoring the quality level of the site entails measuring certain properties of the site (like counting images that lack the ALT attribute) and, through the model, linking these data to an overall measure of quality. The measuring activity is likely to absorb much of our webmaster's time and effort unless s/he uses automatic tools for analyzing websites.
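
As a concrete illustration of such a measurement, the following Python sketch counts the images lacking an ALT attribute in a single page (the file name is a placeholder; a real tool would crawl the whole site):

    from html.parser import HTMLParser

    class MissingAltCounter(HTMLParser):
        """Count IMG tags, and those among them lacking an ALT attribute."""
        def __init__(self):
            super().__init__()
            self.total_images = 0
            self.missing_alt = 0

        def handle_starttag(self, tag, attrs):
            if tag == "img":
                self.total_images += 1
                # attrs is a list of (name, value) pairs; names are lowercased
                if "alt" not in dict(attrs):
                    self.missing_alt += 1

    def count_missing_alt(html_text):
        parser = MissingAltCounter()
        parser.feed(html_text)
        return parser.missing_alt, parser.total_images

    # "index.html" is a hypothetical saved page of the site under analysis
    with open("index.html", encoding="utf-8") as f:
        missing, total = count_missing_alt(f.read())
    print(missing, "of", total, "images lack an ALT attribute")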

Automatic tools for website analysis can be used to inspect the source code (mainly the HTML pages as stored on the web server), to inspect the live web pages (the HTML that is produced by the web server), to inspect the web server's usage logs, to test the performance of the web server and its backends, and to test the positioning of a site on search engines. Such an investigation produces reports highlighting potential failures and, in some cases, also describes the causes of such failures, thus providing active support in finding possible solutions.
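
For instance, inspecting usage logs can be as simple as scanning for error responses. The sketch below (assuming the common Apache combined log format and a hypothetical file name) counts requests that ended in a 404 status, a typical symptom of broken links:

    import re

    # Matches the status code field that follows the quoted request line
    # in an Apache "combined" log entry (an assumption about the format).
    STATUS = re.compile(r'"\s(\d{3})\s')

    def count_status(log_path, status="404"):
        """Count log entries carrying the given HTTP status code."""
        hits = 0
        with open(log_path, encoding="utf-8", errors="replace") as f:
            for line in f:
                m = STATUS.search(line)
                if m and m.group(1) == status:
                    hits += 1
        return hits

    print("404 responses:", count_status("access.log"))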

The claim of this paper is that automatic analysis tools, being systematic and requiring little human effort, are crucial ingredients in a methodology based on quality models for assuring constant quality levels. Even though there are limits that these tools cannot overcome, many properties included in quality models can be determined automatically, thus automating at least part of the quality assurance activity. The net effect is increased productivity of the webmaster and fewer errors.

Quality models for websites

In the context of this paper, quality is a property of a website defined in terms of a system of attributes, like consistency of background colors or average download time. A quality model, defined to support a given kind of analysis, is a description of which attributes are important for the analysis, which ones are more important than others, and which measurement methods have to be used to assess the attributes' values. This definition of quality model follows the one given by [Fenton and Pfleeger, 1997] for software systems. See also [Brajnik, 2001] for additional details.

A quality model may involve many interdependent attributes and must, of course, take into account the particular purpose of the analysis for which quality is being modeled.

Attributes of a web site may span a very large list of properties, possibly at different levels of detail, including usability, efficiency, reliability, maintainability, and complexity. Figure 1 shows a portion of a possible quality model of a website centered on usability, based on factors mentioned in [Brajnik, 2000].
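
A small portion of such a model can be represented directly in code as a set of weighted attributes. The following sketch (attribute names, weights, and scores are purely illustrative assumptions, not those of Figure 1) aggregates normalized attribute scores into an overall score by a weighted sum:

    # A minimal weighted quality model: each attribute has a relative
    # weight, and each measurement is normalized to the [0, 1] interval.
    # Names, weights, and scores are illustrative only.
    model = {
        "consistency_of_colors": 0.2,
        "images_with_alt":       0.3,
        "average_download_time": 0.3,
        "broken_links":          0.2,
    }

    def overall_score(measurements, weights):
        """Weighted sum of normalized attribute scores."""
        assert abs(sum(weights.values()) - 1.0) < 1e-9
        return sum(weights[a] * measurements[a] for a in weights)

    measurements = {
        "consistency_of_colors": 0.9,
        "images_with_alt":       0.75,  # e.g. 75% of images have ALT text
        "average_download_time": 0.6,
        "broken_links":          1.0,   # no broken links found
    }
    print("usability score:", overall_score(measurements, model))

A weighted sum is only the simplest aggregation choice; a real model could, for example, use nonlinear combinations, or veto an overall score whenever a critical attribute falls below a minimum.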

How can a quality model be defined? Like any modeling activity, this is a creative task for which it is not possible, in general, to give a precise list of steps to follow. However, a general method to tackle the problem can be based on the Goal, Question, Metric (GQM) approach, first outlined in [Basili and Weiss, 1984] and since then often adopted in software engineering investigations.

The GQM approach can be followed in any analysis that requires data collection. Quality assessment is such an activity, since the webmaster needs to acquire data about the site to determine its quality level. But which data? And how will the data be used?

The GQM approach prescribes the following steps, described here in the context of web site analysis:

  1. establish the goals of the analysis. Possible goals include: to detect and remove usability obstacles that hinder an online sale procedure; to learn whether the site conveys trust to the intended user population; to compare two designs of a site to determine which one is more usable; to determine performance bottlenecks in the website and its backend implementation.
    Defining the goals of an analysis is very important. Without clear goals one could collect data that are incomplete or in which no patterns can be detected. It is also important to define the goals at the beginning, before the analysis is carried out. Otherwise it is possible that, as data are acquired, new goals pop up, requiring further data to be acquired. Such a new goal might involve costly changes in the acquisition and analysis procedure (e.g. users have to be interviewed again, or the same questions have to be answered in a slightly different context). Goals normally can be defined by understanding the situation of concern, that is, the reason why the analysis is performed. Which actions will be taken as a result of the quality assessment? What do we need to know in order to take those actions? These are some of the questions that help elicit the goals.
  2. develop questions of interest whose answers would be sufficient to decide on the appropriate course of action. For example, a question related to the online sale obstacles goal mentioned above could be "how many users are able to buy product X in less than 3 minutes and with no mistakes?". Another question, related to the same goal, might be "are we using too many font faces on those forms?".
    Questions of interest constitute a bridge between the goals of the analysis and the measurable properties used for collecting the data. Questions also lead to a sharper definition of the goals. They can be used to filter out inappropriate goals: goals for which questions cannot be formulated, or for which there are no measurable properties, can be discarded. Questions can also be evaluated to determine whether they completely cover their goal and whether they refer to measurable properties.
    In addition, certain questions will be more important than others in fulfilling a goal. It is at this level that quality attributes start to play a role. Each question refers to one or more quality factors (like number of errors, success rate, time required to complete a task, or number of font faces), and the relative importance of questions with respect to goals reflects the importance of the underlying factors. Factors and their importance are the first ingredients of the quality model.
  3. establish measurement methods (i.e. metrics) to be used to collect and organize data to answer the questions of interest. A measurement method (see [Fenton and Pfleeger, 1997] for additional details) should specify at least the attribute being measured, the scale on which its values are expressed, and the procedure and instrument used to collect the data. (A small worked example follows this list.)
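
To make the three steps concrete, the following sketch encodes a small GQM structure in Python; the goal, questions, and metrics are illustrative renderings of the online sale example above, not a prescribed GQM notation:

    from dataclasses import dataclass, field

    @dataclass
    class Metric:
        attribute: str    # which attribute is measured
        scale: str        # scale of the values, e.g. "ratio"
        procedure: str    # how the data are collected

    @dataclass
    class Question:
        text: str
        metrics: list = field(default_factory=list)

    @dataclass
    class Goal:
        statement: str
        questions: list = field(default_factory=list)

    goal = Goal(
        "Detect and remove usability obstacles in the online sale procedure",
        questions=[
            Question(
                "How many users are able to buy product X in less than "
                "3 minutes and with no mistakes?",
                metrics=[
                    Metric("task completion time", "ratio",
                           "timed task in a user test"),
                    Metric("number of errors", "absolute",
                           "observer tally during the test"),
                ],
            ),
            Question(
                "Are we using too many font faces on those forms?",
                metrics=[
                    Metric("number of font faces", "absolute",
                           "automatic scan of the form pages"),
                ],
            ),
        ],
    )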

At this point, after goals, questions and metrics are defined, the quality model describes which properties are important for given goals, and how these properties can be traced back to simpler attributes that can be measured. The model also prescribes how measurements have to be taken.

The emphasis on measurability of attributes is justified if we consider the purposes for which the quality model is going to be used. We can use it to detect whether a quality property falls below a certain threshold, to compare how two different design alternatives score, to monitor quality levels over time as a site evolves, or to compare the quality level of our site against that of a competitor.

What De Marco said about software engineering twenty years ago fully applies to the web sites of today: you cannot control what you cannot measure [De Marco, 1982]. But unless a well-defined quality model is used, the results of quality assessments will be meaningless. In particular, if we rely on subjective methods to determine whether certain properties hold, the data acquisition activities may be too vaguely defined, undermining the validity of the results.

Automatic webtesting systems

Automatic webtesting tools can play a crucial role in the definition and usage of quality models. They are important because they: (i) necessarily adopt objective metrics only, (ii) are systematic and error-free, and (iii) are much more cost-effective than any other manual method. There are several flavors of such tools, corresponding to the kinds of analysis mentioned above: source-code analyzers, which inspect the HTML pages as stored on the web server; live-page analyzers, which inspect the HTML actually delivered by the server; log analyzers, which mine the server's usage logs; load and stress testers, which probe the performance of the server and its backends; and positioning checkers, which test how the site ranks on search engines.

Obviously, automatic tools cannot assess all sorts of properties. In particular, anything that requires interpretation (e.g. whether the language used is natural and concise) or assessment of relevance (e.g. whether the ALT text of an image is equivalent to the image itself) is out of reach. Nevertheless, these tools can highlight a number of issues to be later inspected by humans, and can avoid highlighting them when there is reasonable certainty that the issue is not a problem. For example, a non-empty ALT can contain placeholder text (like the filename of the image); it is reasonably simple to write heuristic programs that detect such a pattern and flag the issue only when appropriate -- and refrain from flagging it if the ALT contains other text.
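
Such a heuristic can be quite small. The sketch below (its patterns and word list are illustrative guesses, not a validated rule set) flags ALT text that looks like a filename or another common placeholder:

    import re

    # Patterns suggesting the ALT text is a placeholder rather than a
    # genuine description (illustrative heuristics only).
    FILENAME = re.compile(r'^\S+\.(gif|jpe?g|png|bmp)$', re.IGNORECASE)
    PLACEHOLDERS = {"image", "img", "picture", "photo", "spacer", "*"}

    def suspicious_alt(alt_text):
        """Return True if the ALT text looks like a placeholder."""
        text = alt_text.strip()
        if not text:
            return False      # empty ALT is a separate, legitimate case
        if FILENAME.match(text):
            return True       # e.g. alt="logo.gif"
        return text.lower() in PLACEHOLDERS

    print(suspicious_alt("logo.gif"))       # True: filename used as ALT
    print(suspicious_alt("Company logo"))   # False: plausible description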

Therefore, if the quality model that a webmaster is interested in includes attributes that are amenable to treatment by automatic tools, the quality assessment problem can be solved by appropriately configuring the tool, running it to acquire the relevant data, and then weighing the data found by the tool according to the importance criteria defined in the model. Such an activity, being based on a systematic and objective analysis of the web site, is at once economically feasible and relatively error-free.

As pointed out in [Brajnik, 2001], the issue of the validity of the metrics adopted in a quality model arises when metrics are computed by automated tools and when they start dealing with more interesting properties, like accessibility or usability. In these cases a tool may produce incorrect answers (i.e. incorrect values associated with attributes included in the quality model). This may happen either because the tool reports false positives (an issue is reported where there is none) or false negatives (the tool fails to detect an actual problem). Methods like the ones proposed in [Brajnik, 2001] can be applied to limit the consequences of this problem. In general, only relatively simple quality models can be based entirely on automatic tools; in the vast majority of cases, quality assessment will also rely on human inspection and judgment. Nevertheless, the contribution that automatic webtesting tools can bring to quality assessment of websites is significant: low-cost, superficial analyses can be performed automatically, and only thereafter, if needed, a more in-depth and accurate human analysis is carried out. In this way, the productivity of webmasters is enhanced.
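
One way to quantify this validity problem, in the spirit of (though not identical to) the methods of [Brajnik, 2001], is to compare the tool's reports against issues confirmed by human judges and compute false positive and false negative rates. A minimal sketch, assuming both inputs are sets of agreed-upon issue identifiers:

    def validity_rates(tool_reports, true_issues):
        """Compare a tool's reported issues against human-confirmed issues.

        Both arguments are collections of issue identifiers.
        """
        tool_reports, true_issues = set(tool_reports), set(true_issues)
        false_positives = tool_reports - true_issues  # reported, not real
        false_negatives = true_issues - tool_reports  # real, but missed
        precision = len(tool_reports & true_issues) / len(tool_reports)
        recall = len(tool_reports & true_issues) / len(true_issues)
        return precision, recall, false_positives, false_negatives

    p, r, fp, fn = validity_rates({"i1", "i2", "i3"}, {"i2", "i3", "i4"})
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67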

References

[Basili and Weiss, 1984] Basili V.R. and Weiss D. "A methodology for collecting valid software engineering data", IEEE Trans. on Software Engineering, SE-10(6), pp. 728-738, 1984.

[Brajnik, 2000] Brajnik, G. "Automatic web usability evaluation: what needs to be done?", in Proc. Human Factors and the Web, 6th Conference, Austin, June 2000, www.dimi.uniud.it/~giorgio/papers/hfweb00.html

[Brajnik, 2001] Brajnik G. "Towards valid quality models for websites", in Proc. Human Factors and the Web, 7th Conference, Madison, WI, June 2001. www.dimi.uniud.it/~giorgio/papers/hfweb01.html

[De Marco, 1982] De Marco T. Controlling Software Projects, Yourdon Press, New York, 1982.

[Fenton and Pfleeger, 1997] Fenton N.E. and Pfleeger S.L. Software Metrics, 2nd ed., International Thomson Publishing, 1997.

[Lynch and Horton, 1999] Lynch P. and Horton S. Web Style Guide, Yale University Press, 1999.

[Nielsen, 1999] Nielsen J. Designing Web Usability: The Practice of Simplicity, New Riders Publishing, 1999.

[Nielsen, 2002] Nielsen J., http://www.useit.com/alertbox/

[Sinha et al., 2001] Sinha R., Hearst M., Ivory M., and Draisin M. "Content or Graphics? An empirical analysis of criteria for award-winning websites", in Proc. Human Factors and the Web, 7th Conference, Madison, WI, June 2001.

Biosketch

Giorgio Brajnik (www.dimi.uniud.it/~giorgio) is a faculty member of the Computer Science School at the University of Udine, Italy. He has done research for more than 15 years on user interfaces for information systems, focusing during the last 3 years on the usability of web sites and on webtesting systems. His teaching includes courses in Information Retrieval and Web Design.
He is a scientific advisor for UsableNet Inc.