Querying incomplete information in semistructured data

Yaron Kanza, Werner Nutt, Yehoshua Sagiv

Research output: Contribution to journalArticlepeer-review

28 Citations (Scopus)


Semistructured data occur in situations where information lacks a homogeneous structure and is incomplete. Yet, up to now the incompleteness of information has not been reflected by special features of query languages. Our goal is to investigate the principles of queries that allow for incomplete answers. We do not present, however, a concrete query language. Queries over classical structured data models contain a number of variables and constraints on these variables. An answer is a binding of the variables by elements of the database such that the constraints are satisfied. In the present paper, we loosen this concept in so far as we allow also answers that are partial; that is, not all variables in the query are bound by such an answer. Partial answers make it necessary to refine the model of query evaluation. The first modification relates to the satisfaction of constraints: in some circumstances we consider constraints involving unbound variables as satisfied. Second, in order to prevent a proliferation of answers, we only accept answers that are maximal in the sense that there are no assignments that bind more variables and satisfy the constraints of the query. Our model of query evaluation consists of two phases, a search phase and a filter phase. Semistructured databases are essentially labeled directed graphs. In the search phase, we use a query graph containing variables to match a maximal portion of the database graph. We investigate three different semantics for query graphs, which give rise to three variants of matching. For each variant, we provide algorithms and complexity results. In the filter phase, the maximal matchings resulting from the search phase are subjected to constraints, which may be weak or strong. Strong constraints require all their variables to be bound, while weak constraints do not. We describe a polynomial algorithm for evaluating a special type of queries with filter constraints, and assess the complexity of evaluating other queries for several kinds of constraints. In the final part, we investigate the containment problem for queries consisting only of search constraints under the different semantics. © 2002 Elsevier science (USA).

Original languageEnglish
Pages (from-to)655-693
Number of pages39
JournalJournal of Computer and System Sciences
Issue number3
Publication statusPublished - May 2002


Dive into the research topics of 'Querying incomplete information in semistructured data'. Together they form a unique fingerprint.

Cite this