National Science Foundation
Project Title: Access Methods for Bitemporal Databases
Vassilis J. Tsotras
Department of Computer Science and Engineering
University of California
Riverside, CA 92521
Substitute-PI (Aug.1998-Aug.1999):
Alex Delis
Department of Computer and Information Science
Polytechnic University
Brooklyn, NY 11201
Contact Information
Vassilis J. Tsotras
Department of Computer Science and Engineering
University of California
Riverside, CA 92521
Phone: (909) 787-2888
Fax : (909) 787-4643
Email: tsotras@cs.ucr.edu , http://www.cs.ucr.edu/~tsotras
WWW PAGE
http://www.cs.ucr.edu/~tsotras/temporal.html
Project Award Information
-
Award Number: NSF IIS-9509527
-
Duration: Sept. 1, 1995-Aug. 31, 1999
-
Title: Access Methods for Bitemporal Databases
Keywords
temporal databases, access methods, transaction-time, valid-time,
I/O optimization
Project Summary
Traditional databases capture only the most current data
(snapshot) of the modeled reality. While snapshot information is enough
for a number of applications, it is not sufficient for applications that
require past and/or future data. Instead, a bitemporal database is needed,
i.e., a database that supports both valid time (the time when an event
was valid in the modeled reality) and transaction time (the time when the
database was updated). Much research has been performed recently on access
methods that support transaction time, however, not much has been done
for bitemporal indexes, i.e., methods that support both transaction and
valid time on the same index structure. The objective of this project is
to design efficient access methods for bitemporal databases. A novel approach
is used that reduces bitemporal queries to problems of partial persistence
for which efficient access methods are then designed. Various basic bitemporal
queries are addressed, like the bitemporal pure and range timeslice queries.
We have also examined the problem of temporal hashing, i.e., membership
queries over sets time-evolving sets. Currently we examine efficient ways
to perform bitemporal joins. We are also looking into indexing spatiotemporal
databases. The results of this project aim at more efficient implementation
of Temporal DBMS.
Goals, Objectives, and Targeted Activities
The primary goal of this project is to provide efficient
access methods for bitemporal databases. Our basic contribution is that
bitemporal queries can be seen as partial persistence problems. In the
first two years we investigated various bitemporal queries, starting from
the simplest bitemporal pure-timeslice query. We invented the Bitemporal
Interval Tree for this problem. Our approach was to take the Interval Tree
and make it partially persistent. We implemented this method and compared
it with two other approaches: (i) the 2-R tree methodology, and (ii) an
approach that uses a single R-tree. The performance of the Bitemporal Interval
Tree and its limitations are examined in [2]. While the Bitemporal Interval
Tree has guaranteed worst case performance, it is limited in that it requires
apriori knowledge of the valid-time universe and applies only to the bitemporal
pure-timeslice query.
We then examined more complex bitemporal queries, including
queries involving ranges in the transaction and valid time dimensions.
We were able to provide a more robust solution based on persistent R-trees.
The result is the Bitemporal R-tree [1], that applies partial persistence
on an R-tree. Our method has good average case behavior (which is also
what regular R-trees provide) but it applies to a large variety of bitemporal
queries involving ranges. We implemented this method and compared with
the 2-R and single R-tree approaches. Our results show that the Bitemporal
R-tree behaves much better than the other approaches [1].
Past research in Temporal Access Methods has been concentrated
in devising tree-based indices [4]. Such methods usually address temporal
queries that involve some range predicate ("find all employees in the company
at time t with ids in range R"). It thus interesting to see whether efficient
hashing-based methods are possible for temporal data. In [3] we answer
positively this question, i.e., we provide an efficient hashing scheme
(an extension of linear hashing in a temporal environment) for "temporal
membership queries" (like: "find whether employee with id X was in the
company at time t"). As with a non-temporal environment, (temporal) hashing
is faster than (temporal) tree-based indices for (temporal) membership
queries.
During the same period we also worked on revising the
paper "A Comparison of Access Methods for Temporal Data", co-authored with
B. Salzberg [4]. The initial paper covered transaction-time databases.
In the revised paper we added valid and bitemporal methods. We also introduce
lower bounds for temporal queries (transaction, valid and bitemporal).
The extensive revision almost doubled the size of the initial paper.
In [5] we examined the problem of supporting Temporal
Views in a Management Information Base, the database used for Network Management.
Keeping the history of a network's evolution is useful for many network
management applications (billing, post-mortem fault analysis etc.) A network
environment has the special characteristic that data changes rapidly, i.e.,
if special data structures are used to store this information, they have
to be updated quickly. In [5] we present methods for two kind of views:
Partial and Hierarchical; our methods have the advantage of using minimal
space (proportional to the number of changes), fast updating and fast querying.
Temporal, spatial and spatiotemporal queries are inherently
multidimensional, combining predicates on time dimension(s) with predicates
on explicit attributes and/or several spatial dimensions. In the past there
was no consistent way to refer to temporal or spatiotemporal queries, thus
leading to considerable confusion. In an attempt to eliminate this problem,
we propose a new notation in [8]. Our notation is simple and extensible.
In general we classify spatiotemporal queries according to: Key// X_dimension/
Y_dimension/ Z_dimension// Valid/ Transaction. The proposed notation was
inspired by the one used for queueing system models (e.g. M/M/1 systems).
In addition, we worked in another database related problem,
that of disseminating data to multiple users (clients) through broadcasting.
Under broadcasting, a server continuously and repeatedly broadcasts pages
to a client community without the exact knowledge of what the clients need.
This is due to the lack of, or, limited uplink communication channel from
the clients to the server. When a client needs a certain page, it monitors
the broadcast channel until the desired page is detected and captures it
for use. The client response time corresponds to the elapsed time from
when a request for a particular page is made by a client to when the requested
page is broadcast by the server and consequently received by the client.
One of the main challenges in this environment, is the organization of
the data pages in a broadcast schedule so as to minimize the average client
response time. Due to the asymmetric communication, the server may know
only the past access pattern of the clients or an estimate of the clients'
access requests. The server relies on this information in order to broadcast
the pages according to a schedule that results in low latency for the clients.
In [6] we consider a broadcasting policy in which the server transmits,
at each slot, the page which is most likely to be requested by the client,
given the history of previous broadcasts. This policy results in periodic
schedules with low latency. In [7] we extend our work and discuss characteristics
of the optimal policy for this setting. We also provide a class of suboptimal
policies that result in average user response time close to the lower bound.
In [9] we presented an update to the Temporal Database
bibliography. This bibliography shows that temporal database research has
increased significantly over the last few years, as established by the
number of publications related to the field. In [10] we prepared a general
article for temporal snapshot queries for an encyclopedia. In [11] we prepared
a review article on temporal databases for the J.Wiley Encyclopedia.
On-going work. Currently we are working on the following
problems: (a) efficient bitemporal joins and (b) indexing spatiotemporal
databases. Our approach in (a) is concentrated on indexed temporal relations.
In (b) we are examining the behavior of spatial indexes when made persistent,
i.e., when time is added on the index.
Indication of Success
We have performed most of the work according to the project's
first three years plan. Our work has been concentrated on devising efficient
temporal access methods. We consider a method to be efficient if it provides
fast query time while using space that remains linear to the total number
of changes in the temporal evolution and fast update processing.
We have produced a number of results:
A. We have addressed some of the important problems in
designing access methods for bitemporal databases. We have provided a robust
method with very good average case performance, the Bitemporal R-Tree [1],
a method with good worst case performance for bitemporal timeslice queries,
the Bitemporal Interval Tree [2]. These methods were implemented and compared
with other approaches.
B. We have devised an efficient method to support temporal
hashing [3]; as with non-temporal data, hashing is proved to be faster
than tree-based indices for membership queries.
C. We have also presented lower bounds for such problems
and temporal queries in general [4].
D. We provided efficient ways to support Temporal Views
in a network management environment [5].
E. We have worked on the data dissemination problem in
an asymmetric broadcasting environment [6,7].
We are currently working on two problems related to temporal
access methods, i.e., Bitemporal Joins and Indexing for Spatiotemporal
Databases.
The results of this project will enable the efficient
index implementation of temporal databases. This problem is of particular
importance given the usually large amount of temporal data that needs to
be stored.
Project Impact
There is one Research Assistant currently working on this
project, Mr. George Kollios, who joined Polytechnic University in September
1996. During the first year of his studies he worked on the temporal hashing
problem. He is currently working on the bitemporal joins problem.
Another research assistant (Dr. Anil Kumar) also worked
on problems related to this project. He received his Ph.D. from Polytechnic
University in January 1998.
Parts of this research (especially papers [1] and [3]
from List of Publications) were used in two courses taught at Polytechnic
University (Spring 1997; CIS903: Advanced Topics in Databases) and at University
of California, Riverside (Spring 1998; CS267 Seminar in Databases).
The survey paper [3] provides the database community
and database developers with a much needed comparison of various proposed
temporal access methods. The same paper has also been used in advanced
database courses in other Universities.
Project References
[1] A. Kumar, V.J. Tsotras, C. Faloutsos, "Designing Access
Methods for Bitemporal Data bases", IEEE Transactions on Knowledge and
Data Engineering, Vol 10, No. 1, pp 1-20, Jan-Feb 1998.
[2] A. Kumar, V.J. Tsotras, C. Faloutsos, "Access Methods
for Bitemporal Databases", International Workshop on Temporal Databases.
In Recent Advances in Temporal Databases, J. Clifford, A. Tuzhilin (eds.),
pp. 235-254, Springer-Verlag, 1995.
[3] G. Kollios, V.J. Tsotras, "Hashing Methods for Temporal
Data", submitted for publication; appears also as a TimeCenter TR-24 (http://www.cs.auc.dk/research/DBS/tdb/TimeCenter/publications3.html).
[4] B. Salzberg, V.J. Tsotras, "A Comparison of Access
Methods for Temporal Data", to appear at ACM Computing Surveys,
1999, appears also TimeCenter TR-18 (http://www.cs.auc.dk/research/DBS/tdb/TimeCenter/publications2.html)
[5] V.J. Tsotras, V. Phalke, A. Kumar, B. Gopinath, "Supporting
Temporal Views in a Management Information Base", to appear at the Journal
of Network and Systems Management. Appears also as Tech. Report from Polytechnic
Univ. (CATT-TR-95-87)
[6] C.J. Su, L. Tassiulas, V.J. Tsotras, "A New Method
to Design Broadcast Schedules in a Wireless Communication Environment",
Univ. of Maryland, ISR Tech. Rep. TR-96-30,1996.
[7] C.J. Su, L. Tassiulas, V.J. Tsotras, "Broadcast Scheduling
for Information Distribution", to appear at the ACM/Baltzer Wireless Networks
Journal, 1999.
[8] V.J. Tsotras, C.S. Jensen, R.T. Snodgrass, "A Notation
for Spatiotemporal Queries", ACM Sigmod Record, March 1998.
[9] A. Kumar, V.J. Tsotras, "Temporal Database Bibliography
Update", ACM Sigmod Record, pp 41-51, March 1996.
[10] V. J. Tsotras, "Indexing Temporal Data with the Snapshot
Index", to appear at the Encyclopedia of Microcomputers, Marcel Dekker,
Inc., New York.
[11] V. J. Tsotras, X. Sean Wang, "Temporal Databases",
to appear at the J.Wiley Encyclopedia of Electrical and Electronics Engineering,
John Wiley & Sons, NY, 1999.
Area Background
Conventional database systems capture only a single snapshot
of the modeled reality. While serving many applications well, they are
not sufficient for applications that require the support of time-varying
information (past and/or future data). Instead, temporal database systems
have been proposed as they can store and query temporal data through the
support of two orthogonal time dimensions: the valid and transaction times.
The valid time of a fact is the time when the fact is true in the modeled
reality. Transaction time on the other hand refers to the time when a new
value is posted to the database by a transaction. A temporal database is
categorized as transaction-time, valid-time or bitemporal, according to
which temporal dimension(s) it supports. The transaction time dimension
represents the history of a database activity rather than real world history.
Transaction times are system generated and monotonically increasing. Since
it is impossible to change the past, transaction times cannot be changed
and there is no way to correct errors in past tuples. A valid-time database
maintains the entire history of an enterprise as best known now, i.e.,
it stores the current knowledge about the past and future. Any errors discovered
in this history, are corrected by modifying the database. When a correction
is applied, previous values are not retained; therefore it is not possible
to view the database as it was before the correction.
Clearly both time dimensions are needed in order to accurately
model reality. In a bitemporal database one can query tuples that are valid
at some (valid) time, as known at some other (transaction) time. Due to
the possible large amount of data that a bitemporal database may record,
it is important to invent efficient access methods for such databases.
By efficient we mean a method that has very fast query time, while keeping
its space linear to the number of changes in the temporal evolution (where
a change corresponds to an object addition, deletion or modification) and
has fast update processing.
Area References
R. Snodgrass, I. Ahn, "Temporal Databases", IEEE Computer,
Vol.19, No.9, pp 35-42, 1986.
G. Ozsoyoglu, R. Snodgrass, "Temporal and Real-Time Databases:
A Survey", IEEE Trans. on Knowledge and Data Engineering, Vol. 7, No. 4,
pp 513-532, Aug. 1995.
D. Lomet, B. Salzberg, "Access Methods for Multiversion
Data", Proc. ACM SIGMOD, 1989.
B. Becker, S. Gschwind, T. Ohler, B. Seeger, P. Widmayer,
"An Asymptotically Optimal Multiversion B-Tree", VLDB Journal 5(4): 264-275,
1996.
B. Salzberg, V.J. Tsotras, "A Comparison of Access Methodsfor
Temporal Data", to appear at ACM Computing Surveys, 1999, appears
also TimeCenter TR-18 (http://www.cs.auc.dk/research/DBS/tdb/TimeCenter/publications2.html)
V. J. Tsotras, B. Gopinath, G.W. Hart, "Efficient Management
of Time-Evolving Databases", IEEE Trans. on Knowledge and Data Engineering,
Vol. 7, No. 4, pp 591-608, Aug. 1995.
Potential Related Projects
I am interested in SpatioTemporal related problems. The Bitemporal
R-Tree can be utilized as a general Spatiotemporal access method since
one dimension is used for transaction-time evolution while the other R-tree
dimensions can be used to define for spatial objects.