HOW MANY RELEVANCES IN IR?

Stefano Mizzaro

Department of Mathematics and Computer Science - University of Udine

Via delle Scienze, 206 - Loc. Rizzi

33100 - Udine - Italy

E-mail: mizzaro@dimi.uniud.it

1. Introduction

It is widely recognised [1] that relevance is a (if not 'the') central concept of IR. In this paper I show that: (i) there are many kinds of relevance, not just one; (ii) these kinds can be classified in a four-dimensional space; and (iii) such a classification helps us to understand the nature of relevance judgement.

2. Kinds of relevance

2.1. First dimension

It is commonly accepted that relevance is a relation between two entities, one from each of two groups. The first group consists of the following three entities:

* Document, the physical entity that the user of an IR system obtains at the end of the information-seeking process;

* Surrogate, a representation of a document, consisting of one or more of the following: title, list of keywords, author name(s), bibliographic data, abstract, and so on;

* Information, the non-physical entity that the user receives/creates when reading a document.

2.2. Second dimension

The second group consists of the following four entities:

* Problem, the problem that a human being is facing and that requires information in order to be solved;

* Information need, an (implicit) representation of the problem in the user's mind. It differs from the problem because the user might not perceive the problem correctly [2,3];

* Request, a representation of the user's information need in a 'human' language, usually natural language;

* Query, a representation of the information need in a 'system' language, e.g. Boolean.

These entities emerge from an analysis of the interaction between a user and an IR system (see Fig. 1): the user has a problem, perceives it and builds the information need, expresses the information need in a request, and formalises the request in a query (perhaps with the help of an intermediary). Perception, expression, and formalisation are not as simple as they might seem at first glance, since well-known problems arise, such as the ASK (Anomalous State of Knowledge), the label effect, and the vocabulary problem (see [4]).

Fig. 1: Problem, information need, request, and query.
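The chain in Fig. 1 can be sketched as an ordered sequence of levels in which each step (perception, expression, formalisation) turns one representation into the next. This is a hypothetical toy model, not the paper's formalisation: the names `LEVELS`, `STEPS` and `derivation_path` are illustrative only.

```python
from typing import List

# Levels of the second dimension, in the order in which they arise (Fig. 1).
LEVELS = ("problem", "information need", "request", "query")

# STEPS[k] is the (hypothetically named) step turning LEVELS[k] into LEVELS[k + 1].
STEPS = ("perception", "expression", "formalisation")


def derivation_path(source: str, target: str) -> List[str]:
    """Return the steps that lead from `source` to a later (or equal) `target` level."""
    i, j = LEVELS.index(source), LEVELS.index(target)
    if i > j:
        raise ValueError(f"{target!r} does not derive from {source!r}")
    return list(STEPS[i:j])


print(derivation_path("problem", "query"))
# ['perception', 'expression', 'formalisation']
print(derivation_path("information need", "request"))
# ['expression']
```

Each step is a point where the problems mentioned above (ASK, label effect, vocabulary problem) can make the next representation diverge from the previous one.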

2.3. Third dimension

On this basis, a relevance can be seen as a relation between two entities, one from each group: the relevance of a surrogate to a query, the relevance of the information received by the user to the information need, and so on. A relevance thus appears to be a point in a two-dimensional space. However, this does not exhaust the possible relevances, since two further dimensions must be taken into account. First, the entities mentioned above can be decomposed into the following three components (the third dimension) [5,6,7]:

* Topic, which refers to the subject area in which the user is interested. For example, 'the concept of relevance in information science';

* Task, which refers to the activity that the user will carry out with the retrieved documents. For example, 'to write a survey paper on ...';

* Context, which includes everything that pertains to neither topic nor task but nevertheless affects the way the search takes place and the way results are evaluated. For example, documents already known to the user (and thus not worth retrieving), the time and/or money available for the search, and so on.

Therefore, a surrogate (a document, some information) is relevant to a query (request, information need, problem) with respect to one or more of these components.
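A minimal sketch of a relevance in the first three dimensions, assuming it can be recorded as one entity from each group plus a non-empty set of components. The class `Relevance` and the constants are hypothetical names chosen for illustration; the point is that two relevances between the same entities are still different kinds if they concern different components.

```python
from dataclasses import dataclass
from typing import FrozenSet

FIRST = ("surrogate", "document", "information")
SECOND = ("query", "request", "information need", "problem")
COMPONENTS: FrozenSet[str] = frozenset({"topic", "task", "context"})


@dataclass(frozen=True)
class Relevance:
    """An entity from each group plus the components the relevance concerns."""
    first: str
    second: str
    components: FrozenSet[str]

    def __post_init__(self) -> None:
        if self.first not in FIRST or self.second not in SECOND:
            raise ValueError(f"unknown entity in ({self.first!r}, {self.second!r})")
        # One or more of topic, task and context, and nothing else.
        if not self.components or not self.components <= COMPONENTS:
            raise ValueError(f"invalid components: {sorted(self.components)}")


by_topic = Relevance("document", "information need", frozenset({"topic"}))
by_task = Relevance("document", "information need", frozenset({"task"}))
print(by_topic == by_task)  # False: same two entities, different components
```

Making the class frozen keeps relevances hashable, so they can be collected in sets or used as dictionary keys.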

2.4. Fourth dimension

The fourth dimension is time: a surrogate (a document, some information) may not be relevant to a query (request, information need, problem) at a certain point in time and become relevant later, or vice versa. This happens, for instance, if the user learns something that enables him to understand a document, or if the user's problem changes. The scenario represented in Fig. 1 must therefore be refined to take into account the highly dynamic interaction between user and IR system.

Fig. 2 illustrates the transformations of problem, information need, request, and query. Four levels, represented by the four ellipses, can be identified, one for each of these elements. At time t(P0) the user has a problem P0. The user perceives it, obtaining the initial information need (N0, at time t(N0)); expresses it, obtaining the initial request (R0, at time t(R0)); and formalises it, obtaining the initial query (Q0, at time t(Q0)). Then a revision takes place: the initial query may be modified (obtaining Q1, at time t(Q1)), and the same may happen to the request and the information need, until the final information need (Np, at time t(Np)), request (Rm, at time t(Rm)), and query (Qn, at time t(Qn)) are obtained.

Fig. 2: The dynamic interaction user-IR system.
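Because each level is revised independently, "the query" or "the information need" denotes different objects at different times; the time coordinate of a relevance selects which version is meant. A minimal sketch, assuming each level keeps a time-ordered list of versions: the integer times and the revision counts (p = 1, m = 1, n = 2) are made up for illustration, and `current` and `snapshot` are hypothetical helpers.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# For each level, (time, label) pairs in increasing time order.
History = Dict[str, List[Tuple[int, str]]]

# Hypothetical revision history in the notation of Fig. 2; only the order of times matters.
revisions: History = {
    "problem": [(0, "P0")],
    "information need": [(1, "N0"), (6, "N1")],
    "request": [(2, "R0"), (7, "R1")],
    "query": [(3, "Q0"), (4, "Q1"), (8, "Q2")],
}


def current(history: History, level: str, t: int) -> Optional[str]:
    """Version of `level` in force at time t, or None if none has been created yet."""
    times = [time for time, _ in history[level]]
    created = bisect.bisect_right(times, t)  # number of versions created at or before t
    return history[level][created - 1][1] if created else None


def snapshot(history: History, t: int) -> Dict[str, Optional[str]]:
    """The problem, information need, request and query as they stand at time t."""
    return {level: current(history, level, t) for level in history}


print(snapshot(revisions, 5))
# {'problem': 'P0', 'information need': 'N0', 'request': 'R0', 'query': 'Q1'}
```

Using `bisect_right` means a version created exactly at time t is already in force at t.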

2.5. Relevance as a point in a four-dimensional space

To summarise, each relevance can be seen as a point in a four-dimensional space, whose dimensions take the following values:

1. Surrogate, document, information;

2. Query, request, information need, problem;

3. Topic, task, context, and every combination of them;

4. The various time instants from the moment the user's problem arises until it is solved.
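The size of this space at a single time instant can be counted directly. The sketch below assumes that "every combination" of topic, task and context means every non-empty subset of them (7 in all); under that assumption there are 3 × 4 × 7 = 84 kinds of relevance per time instant.

```python
from itertools import combinations, product

FIRST = ("surrogate", "document", "information")
SECOND = ("query", "request", "information need", "problem")
COMPONENTS = ("topic", "task", "context")

# Assumption: "every combination" = every non-empty subset of the components.
COMBINATIONS = [
    frozenset(c)
    for k in range(1, len(COMPONENTS) + 1)
    for c in combinations(COMPONENTS, k)
]

# All points of the space for one fixed value of the fourth (time) dimension.
kinds = list(product(FIRST, SECOND, COMBINATIONS))

print(len(COMBINATIONS))  # 7
print(len(kinds))         # 84
```

With T distinct time instants the total is 84 × T, which is why the time dimension is usually left out when drawing the space.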

The situation described so far is partially represented in Fig. 3: the elements of the first dimension are on the left-hand side, and those of the second dimension are on the right-hand side. Each line linking two of these objects is a relevance (emphasised graphically by a small circle on the line). The third dimension is harder to represent graphically, since its elements are only partially (not totally) ordered. In the figure, the three grey levels represent the three components (induced in each relevance by the elements of the first two dimensions), emphasising that a relevance may concern one or more of them. To simplify the figure, the time dimension is not represented. Finally, the grey arrows represent a partial order among the relevances. This order indicates how close a relevance is to the relevance of the information received to the problem with respect to all three components, which is the relevance the user is interested in, and also how difficult each relevance is to measure. The number of steps (arrows) needed to reach the topmost small circle indicates the distance of each kind of relevance from the relevance of the information received to the problem.

Fig. 3: The various kinds of relevance.
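The "number of steps to the topmost circle" can be computed once the arrows are fixed. Since the figure itself is not reproduced here, the sketch below assumes that each arrow moves one step up along either the first dimension (surrogate, document, information) or the second (query, request, information need, problem), ignoring components and time; `successors` and `steps_to_top` are hypothetical helpers.

```python
from collections import deque
from typing import Iterator, Tuple

FIRST = ("surrogate", "document", "information")
SECOND = ("query", "request", "information need", "problem")

Point = Tuple[int, int]  # (index in FIRST, index in SECOND)
TOP: Point = (len(FIRST) - 1, len(SECOND) - 1)  # information / problem


def successors(p: Point) -> Iterator[Point]:
    """Assumed arrows: one step up along either the first or the second dimension."""
    i, j = p
    if i + 1 < len(FIRST):
        yield (i + 1, j)
    if j + 1 < len(SECOND):
        yield (i, j + 1)


def steps_to_top(start: Point) -> int:
    """Fewest arrows from `start` to TOP, found by breadth-first search."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        point, dist = queue.popleft()
        if point == TOP:
            return dist
        for nxt in successors(point):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("TOP is not reachable from start")


for i, first in enumerate(FIRST):
    for j, second in enumerate(SECOND):
        print(f"{first} to {second}: {steps_to_top((i, j))} step(s)")
```

Under this assumption every path between two relevances has the same length, so the result is simply (2 − i) + (3 − j), e.g. 5 steps for surrogate to query; the search is written generally so that it still applies if the actual arrows of Fig. 3 are sparser.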

3. Relevance judgement

A relevance judgement is the assignment of a value to a relevance by a judge at a certain point in time. As with relevance itself, there are many kinds of relevance judgement, which can be classified along five dimensions:

1. The kind of relevance judged (see the previous section);

2. The kind of judge (for instance, it is possible to distinguish between user and non-user);

3. What the judge can use (surrogate, document, or information) to express the relevance judgement. This is the same as the first dimension of relevance, but it is needed separately since, for instance, the judge may assess the relevance of a document on the basis of a surrogate;

4. What the judge can use (query, request, information need, or problem) to express the relevance judgement (needed for the same reason as point 3);

5. The time at which the judgement is expressed (at a given time point, one may of course judge the relevance at another time point).
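A judgement can be sketched as a record with one field per dimension, which makes explicit when the judge works from entities other than those being judged (points 3 and 4) or judges a relevance referring to an earlier time (point 5). This is a minimal sketch under stated assumptions: components are omitted from the judged relevance for brevity, times are hypothetical integers, and the class and method names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relevance:
    """The relevance being judged (components omitted for brevity)."""
    first: str   # surrogate, document or information
    second: str  # query, request, information need or problem
    time: int    # time instant the relevance refers to (hypothetical clock)


@dataclass(frozen=True)
class Judgement:
    judged: Relevance  # 1. the kind of relevance judged
    judge: str         # 2. the kind of judge, e.g. "user" or "non-user"
    uses_first: str    # 3. what the judge actually uses from the first group
    uses_second: str   # 4. what the judge actually uses from the second group
    time: int          # 5. when the judgement is expressed

    def uses_proxies(self) -> bool:
        """True if the judge works from entities other than the ones being judged."""
        return (self.uses_first, self.uses_second) != (self.judged.first, self.judged.second)

    def is_retrospective(self) -> bool:
        """True if the judgement is expressed after the time the relevance refers to."""
        return self.time > self.judged.time


# A non-user assesses a document against an information need from a surrogate and a request.
j = Judgement(
    judged=Relevance("document", "information need", time=3),
    judge="non-user",
    uses_first="surrogate",
    uses_second="request",
    time=10,
)
print(j.uses_proxies(), j.is_retrospective())  # True True
```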

4. Conclusions

The analysis presented in this paper:

* can easily be formalised using set theory (see [8,9]);

* shows that it is short-sighted to speak merely of 'system relevance' (the relevance as seen by an IR system) as opposed to 'user relevance' (the relevance in which the user is interested), and that 'topicality' (relevance with respect to the topic component) is conceptually different from 'system relevance';

* helps avoid ambiguity about which relevance (or relevance judgement) is being discussed;

* should be taken into account when implementing IR systems that work closer to the user; and

* explains why it is not surprising that different studies of relevance obtain different results: one must pay attention both to which kind of relevance is measured and to which kind of relevance judgement is adopted.

References

[1] S. Mizzaro, to appear. "Relevance: The Whole History." Accepted for publication in Journal of the American Society for Information Science.

[2] S. Mizzaro, 1996. "A cognitive analysis of information retrieval." Accepted for publication in Proceedings of the CoLIS2 Conference, Copenhagen, October 13-16, 1996.

[3] S. Mizzaro, 1996. "On the foundations of information retrieval." Accepted for publication in Proceedings of the Annual Conference AICA96, Rome, September 24-27, 1996.

[4] P. Ingwersen, 1992. Information Retrieval Interaction. Taylor Graham, London.

[5] G. Brajnik, S. Mizzaro, and C. Tasso, 1996. "Evaluating User Interfaces to Information Retrieval Systems: A Case Study on User Support." In Proceedings of SIGIR96, Zurich, August 1996, pp. 128-136.

[6] G. Brajnik, S. Mizzaro, and C. Tasso, 1996. "La valutazione di interfacce intelligenti per il reperimento di informazioni." In AI*IA Notizie (supplement to Nr. 3, Year IX, September 1996), pp. 36-37. (In Italian).

[7] G. Brajnik, S. Mizzaro, and C. Tasso, 1995. "Interfacce intelligenti a banche di dati bibliografici." In Sistemi evoluti per basi di dati, D. Saccà (editor), pp. 95-128, Franco Angeli, Milano. (In Italian).

[8] S. Mizzaro, 1995. "Le differenti relevance in information retrieval: una classificazione." In Proceedings of the Annual Conference AICA95, pp. 361-368. (In Italian).

[9] S. Mizzaro, 1996. "Relevance: the whole (hi)story." Technical Report UDMI/12/96/RR, Dipartimento di Matematica e Informatica, University of Udine.
