Embedding Data within Knowledge Spaces
James D. Myers, Joe Futrelle, Jeff Gaynor, Joel Plutchak, Peter Bajcsy, Jason Kastner, Kailash
Kotwani, Jong Sung Lee, Luigi Marini, Rob Kooper, Robert E. McGrath, Terry McLaren, Alejandro
Rodriguez, Yong Liu
National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
Abstract
The promise of e-Science will only be realized when data is discoverable, accessible, and
comprehensible within distributed teams, across disciplines, and over the long term, without reliance
on out-of-band (non-digital) means. We have developed the open-source Tupelo semantic content
management framework and are employing it to manage a wide range of e-Science entities (including
data, documents, workflows, people, and projects) and a broad range of metadata (including
provenance, social networks, geospatial relationships, temporal relations, and domain descriptions).
Tupelo couples the use of global identifiers and Resource Description Framework (RDF) statements
with an aggregatable content repository model to provide a unified space for securely managing
distributed heterogeneous content and relationships. The Tupelo framework includes an HTTP-based
data/metadata management protocol, application programming interfaces, and user interface widgets,
which have been incorporated into NCSA’s portal and workflow tools. The framework is a key component in
recent work creating dynamic digital observatories (digital watersheds) that combine observational
and modeled information. Tupelo also supports specialized indexes and inference logic (computation)
relevant to metadata, including geospatial location and provenance. This additional capability creates a
powerful knowledge space that can map between disciplinary conceptual models and between the
storage and data organization choices made by different e-Science organizations.
Key words: semantic web, content management, e-Science, virtual organizations
Introduction
E-Science and Cyberinfrastructure are often des