Link-Contexts for Ranking
University of California Santa Cruz
Abstract. Anchor text has been shown to be effective in ranking and
a variety of information retrieval tasks on web pages. Some authors have
expanded on anchor text by using the words around the anchor tag, a
link-context, but each with a different definition of link-context. This lack
of consensus raises the question: What is a good link-context?
The two experiments in this paper address the question by comparing the
results of using different link-contexts for the problem of ranking.
Specifically, we concatenate the link-contexts of links pointing to a web
page to create a link-context document used to rank that web page. By
comparing the rankings produced by different link-contexts, we find that
smaller contexts are effective at ranking relevant URLs highly.
In their 2001 paper on anchor text, Craswell et al. show that ranking based
on anchor text is twice as effective as ranking based on document content. This
finding spurred many researchers to leverage anchor text for various information
retrieval tasks on web pages, including ranking, summarization, categorization, and
clustering [12, 5, 4, 2, 3].
As descriptive words often appear around the hyperlink rather than between
the anchor tags, many researchers have looked beyond the anchor text to a
‘context’ around the link. These link-contexts have been defined as sentences,
windows, and the enclosing HTML DOM trees [12, 2] around the anchor
text. Given these competing definitions of link-context, this paper investigates
which definition performs best for the task of ranking.
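To make the competing definitions concrete, the following is a minimal sketch of two of them, a window context and a sentence context, applied to plain text. The function names, the window width, whitespace tokenization, and the naive punctuation-based sentence splitter are all assumptions made for illustration; they are not the definitions or parameters used in the experiments.

```python
import re


def window_context(tokens, anchor_start, anchor_end, width):
    """Return the anchor tokens plus up to `width` tokens on each side.

    `anchor_start` is inclusive and `anchor_end` is exclusive.
    The width is an illustrative parameter, not a value from the paper.
    """
    lo = max(0, anchor_start - width)
    hi = min(len(tokens), anchor_end + width)
    return tokens[lo:hi]


def sentence_context(text, anchor_text):
    """Return the first sentence containing the anchor text, or '' if none.

    Sentences are split naively after '.', '!' or '?' followed by
    whitespace; a real system would likely need a better segmenter.
    """
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if anchor_text in sentence:
            return sentence
    return ""


# Hypothetical page text in which "search engine" is the anchor text.
text = ("Our lab studies retrieval. "
        "See the UCSC search engine project for details. "
        "Contact us anytime.")
tokens = text.split()

# The anchor occupies tokens 7 and 8 ("search", "engine").
print(window_context(tokens, 7, 9, 2))
print(sentence_context(text, "search engine"))
```

In this sketch the window context grows symmetrically with `width`, while the sentence context depends on where the author placed punctuation, which illustrates why the two definitions can select quite different words for the same link.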
Specifically, the two experiments in this paper explore how defining link-
context as windows, sentences, or HTML trees affects ranking quality. The
link-contexts of links pointing to a single page are concatenated to create a link-
context document for that page. Each link-context document is then ranked by
the BM25 algorithm and the