International Journal of Document Analysis (2006) 8(2) 66–86
DOI 10.1007/s10032-006-0017-x
ORIGINAL PAPER
David W. Embley · Matthew Hurst ·
Daniel Lopresti · George Nagy
Table-processing paradigms: a research survey
Received: 9 January 2005 / Accepted: 1 January 2006 / Published online: 9 May 2006
© Springer-Verlag 2006
Abstract Tables are a ubiquitous form of communication.
While everyone seems to know what a table is, a precise,
analytical definition of “tabularity” remains elusive because
some bureaucratic forms, multicolumn text layouts,
and schematic drawings share many characteristics of tables.
There are significant differences between typeset tables,
electronic files designed for display of tables, and tables in
symbolic form intended for information retrieval. Most past
research has addressed the extraction of low-level geometric
information from raster images of tables scanned from
printed documents, although there is growing interest in the
processing of tables in electronic form as well. Recent research
on table composition and table analysis has improved
our understanding of the distinction between the logical and
physical structures of tables, and has led to improved formalisms
for modeling tables. This review, which is structured
in terms of generalized paradigms for table processing,
indicates that progress on half-a-dozen specific research
issues would open the door to using existing paper and electronic
tables for database update, tabular browsing, structured
information retrieval through graphical and audio interfaces,
multimedia table editing, and platform-independent
display.
Keywords Document analysis · Table recognition · Table
understanding
D. W. Embley
Computer Science Department, Brigham Young University,
Provo, UT 84602, USA
M. Hurst
Intelliseek Applied Research Center, Pittsburgh, PA 15213, USA
D. Lopresti (✉)
Department of Computer Science and Engineering, Lehigh University,
Bethlehem, PA 18015, USA
G. Nagy
Department of Electrical, Computer, and Systems Engineeri