A Study of Blog Search
Gilad Mishne
Maarten de Rijke
Informatics Institute, University of Amsterdam,
Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
{gilad,mdr}@science.uva.nl
Abstract. We present an analysis of a large blog search engine query
log, exploring a number of angles such as query intent, query topics, and
user sessions. Our results show that blog searches differ in intent from
general web searches, suggesting that the primary goals of blog searchers
are tracking references to named entities and locating blogs by theme.
In terms of interest areas, blog searchers are, on average, more
engaged in technology, entertainment, and politics than web searchers,
with a particular interest in current events. The observed user behavior
resembles that of general web search: short sessions, with attention
focused on the first few results only.
1 Introduction
The rise of blogging on the Internet (the publication of journal-like web page
logs, or blogs) has created a highly dynamic and tightly interwoven subset of the
World Wide Web [10]. The blogspace (the collection of blogs and all their links)
is giving rise to a large body of research, both concerning content (e.g., Can we
process blogs automatically and find consumer complaints and breaking reports
about vulnerabilities of products?) and structure (e.g., What are the dynamics of
the blogspace?). A variety of dedicated workshops bear witness to this burst of
research activity around blogs; see, e.g., [20].
In this paper we focus on another aspect of the blogspace: searching blogs.
The exponential rise in the number of blogs from thousands in the late 1990s
to tens of millions in 2005 [3, 18, 19] has created a need for effective access and
retrieval services. Today, there is a broad range of search and discovery tools for
blogs, offered by a variety of players; some focus exclusively on blog access (e.g.,
Blogdigger [2], Blogpulse [3], and Technorati [18]), while general web search engines
such as Google, Yahoo!, and AskJeeves also offer specialized blog services.
The develo