Person Search Made Easy
Nazlı İkizler and Pınar Duygulu
Department of Computer Engineering, Bilkent University, Ankara, Turkey
{inazli,duygulu}@cs.bilkent.edu.tr
Abstract. In this study, we present a method that substantially reduces the
number of retrieved images and increases retrieval performance for
person queries on broadcast news videos. A multi-modal approach
that integrates face and text information is proposed. A state-of-the-art
face detection algorithm is improved with a skin-color-based method
that eliminates false alarms. The pruned set of detected faces is clustered
to group similar faces, and representative faces are selected from each
cluster and presented to the user. For six person queries of TRECVID2004,
the retrieval rate increases on average from 8% to around 50%, and the
number of images that the user has to inspect is reduced from hundreds
or thousands to tens.
1 Introduction
News videos, with their high social impact, are a rich source of information;
therefore, multimedia applications that aim to ease access to them are important.
Indexing, retrieval, and analysis of these news videos constitute a major challenge
due to their multi-modal nature. This challenge has recently been acknowledged by
NIST, and broadcast news videos have been chosen as the data set for the TRECVID
(TREC Video Retrieval Evaluation) [1] competition.
Broadcast news consists mostly of stories about people, which makes queries
related to a specific person important. The common way to retrieve
information related to a person is to search for his/her name in the speech
transcript or closed caption text. Such retrieval methods are based on the assumption that
a person is likely to appear when his/her name is mentioned. However, this
assumption does not always hold. For instance, as shown in Figure 1, Clinton’s face
appears when his name is not mentioned in the speech transcript, whereas his name
is mentioned while the anchorperson or the reporter is speaking. As a result,
a query based only on text is likely to yield f