# Comparing two ranked lists

There comes times when we want to compare two rankings, for example, between my top-5 list of all time best male distance runners, and that of my friend Robin. My top 1 through 5 are:

3. Paul Tergat

4. Emil Zátopek

While Robin agrees mostly with me, but his top-5 list consists of:

3. Emil Zátopek

5. Mo Farah

So the question is, how similar is my ranked list to Robin's? As much fun as it is to rank distance runners, we encounter such problems in other cases, such as movie, or product rankings. A similarity measure can be quite useful to help us to cluster customers, or to evaluate a ranked list generated by a new model to the old ones. But it is not quite straightforward to derive a numerical value, for example, between the two top-5 rankings shown above.

Before we dive in to find the proper protocols, it is best to enumerate some properties we like to have for the similarity measure. **First** we like it to be bounded, say, between [0, 1], with 0 meaning completely different and 1 meaning identical -- we will further elaborate what does (in)different mean later. **Second**, we like such protocol handle non-conjoint case, i.e., to allow items shown in only one of the two lists. In the example above, I have Paul Tergat in my list while Robin has Mo Farah in his list, we should be able to handle some situations. **Third**, we like to handle lists with different lengths, e.g., comparing a top-5 list and a top-10 list, or even more broadly, comparing results between two infinitely long lists (think search results of the same query returned by Google and Bing).

Now we have at least three desired properties in mind, we can seek for the appropriate measures. There are mostly two types of similarity measures between ranked lists: correlation based, and intersection based. One of the key difference between the two categories is that, the correlation based measures only works in the conjoint case, namely, item appears in one list must appear in the other list as well.

An popular example for the first category is the Kendall's tau, which has a nice probability interpretation. In short, we ** randomly** pick two items (say, A and B), and Kendall's tau is the probability that A ranks before (or after) B in both lists. This property nicely bounded Kendall's tau between 0 and 1, however, the conjoint requirement would fail, as in our example, due to (dis)appearance of Paul Tergat and Mo Farah. We can remove all those items that only appear in one list, and re-rank the remaining ones, but it doesn't sound like a satisfactory solution. Hence, In terms of running time we are counting the number of inversions, hence given two lists with the same length of

*N*, Kendall's tau can be calculated in O(

*N*log

*N*). Not surprisingly, scipy already has an implementation for Kendall's tau.

Then comes the intersection based measure. The idea is very simple, we simply count what is the proportion of overlapped items, among all items, as we progress the ranking depth. This will become clear in the figure shown below.

The figure above should be quite self-explanatory, in which I represent runners with the first letter in their last names. The overlap is the ratio between the length of common items set (intersection) and the total item set (union). The average overlap at each ranking depth k, is the average of all the k overlaps. The last average overlap (0.55), will be our similarity measure. A similar metric based on this idea, called rank-biased overlap (RBO), is proposed in this paper.

Compared to the correlation based measures, the intersection based measures handily deal with the non-conjoint cases, hence satisfying our second requirement outline above. Furthermore the authors of RBO also made some modifications, to satisfy the other two requirements. RBO also comes with some other nice properties, such as the top-weightness: the (dis)agreement at the top of the two lists will weight more than the same (dis)agreement towards to the bottom of the lists, this is quite natural, isn't it?

In terms of calculating the RBO, it could be done in O(*N*) time, where *N* being the length of the longer list. Here you can find my implementation of the RBO in python.