National Science Foundation

Project Title: Access Methods for Bitemporal Databases

Vassilis J. Tsotras
Department of Computer Science and Engineering
University of California
Riverside, CA 92521

Substitute-PI (Aug.1998-Aug.1999):

Alex Delis
Department of Computer and Information Science
Polytechnic University
Brooklyn, NY 11201

Contact Information

Vassilis J. Tsotras
Department of Computer Science and Engineering
University of California
Riverside, CA 92521
Phone: (909) 787-2888
Fax : (909) 787-4643
Email: tsotras@cs.ucr.edu , http://www.cs.ucr.edu/~tsotras

WWW PAGE

http://www.cs.ucr.edu/~tsotras/temporal.html

Project Award Information

Award Number: NSF IIS-9509527
Duration: Sept. 1, 1995-Aug. 31, 1999
Title: Access Methods for Bitemporal Databases

Keywords

temporal databases, access methods, transaction-time, valid-time, I/O optimization

Project Summary

Traditional databases capture only the most current data (snapshot) of the modeled reality. While snapshot information is enough for a number of applications, it is not sufficient for applications that require past and/or future data. Instead, a bitemporal database is needed, i.e., a database that supports both valid time (the time when an event was valid in the modeled reality) and transaction time (the time when the database was updated). Much research has been performed recently on access methods that support transaction time, however, not much has been done for bitemporal indexes, i.e., methods that support both transaction and valid time on the same index structure. The objective of this project is to design efficient access methods for bitemporal databases. A novel approach is used that reduces bitemporal queries to problems of partial persistence for which efficient access methods are then designed. Various basic bitemporal queries are addressed, like the bitemporal pure and range timeslice queries. We have also examined the problem of temporal hashing, i.e., membership queries over sets time-evolving sets. Currently we examine efficient ways to perform bitemporal joins. We are also looking into indexing spatiotemporal databases. The results of this project aim at more efficient implementation of Temporal DBMS.

Goals, Objectives, and Targeted Activities

The primary goal of this project is to provide efficient access methods for bitemporal databases. Our basic contribution is that bitemporal queries can be seen as partial persistence problems. In the first two years we investigated various bitemporal queries, starting from the simplest bitemporal pure-timeslice query. We invented the Bitemporal Interval Tree for this problem. Our approach was to take the Interval Tree and make it partially persistent. We implemented this method and compared it with two other approaches: (i) the 2-R tree methodology, and (ii) an approach that uses a single R-tree. The performance of the Bitemporal Interval Tree and its limitations are examined in [2]. While the Bitemporal Interval Tree has guaranteed worst case performance, it is limited in that it requires apriori knowledge of the valid-time universe and applies only to the bitemporal pure-timeslice query.
We then examined more complex bitemporal queries, including queries involving ranges in the transaction and valid time dimensions. We were able to provide a more robust solution based on persistent R-trees. The result is the Bitemporal R-tree [1], that applies partial persistence on an R-tree. Our method has good average case behavior (which is also what regular R-trees provide) but it applies to a large variety of bitemporal queries involving ranges. We implemented this method and compared with the 2-R and single R-tree approaches. Our results show that the Bitemporal R-tree behaves much better than the other approaches [1].
Past research in Temporal Access Methods has been concentrated in devising tree-based indices [4]. Such methods usually address temporal queries that involve some range predicate ("find all employees in the company at time t with ids in range R"). It thus interesting to see whether efficient hashing-based methods are possible for temporal data. In [3] we answer positively this question, i.e., we provide an efficient hashing scheme (an extension of linear hashing in a temporal environment) for "temporal membership queries" (like: "find whether employee with id X was in the company at time t"). As with a non-temporal environment, (temporal) hashing is faster than (temporal) tree-based indices for (temporal) membership queries.
During the same period we also worked on revising the paper "A Comparison of Access Methods for Temporal Data", co-authored with B. Salzberg [4]. The initial paper covered transaction-time databases. In the revised paper we added valid and bitemporal methods. We also introduce lower bounds for temporal queries (transaction, valid and bitemporal). The extensive revision almost doubled the size of the initial paper.
In [5] we examined the problem of supporting Temporal Views in a Management Information Base, the database used for Network Management. Keeping the history of a network's evolution is useful for many network management applications (billing, post-mortem fault analysis etc.) A network environment has the special characteristic that data changes rapidly, i.e., if special data structures are used to store this information, they have to be updated quickly. In [5] we present methods for two kind of views: Partial and Hierarchical; our methods have the advantage of using minimal space (proportional to the number of changes), fast updating and fast querying.
Temporal, spatial and spatiotemporal queries are inherently multidimensional, combining predicates on time dimension(s) with predicates on explicit attributes and/or several spatial dimensions. In the past there was no consistent way to refer to temporal or spatiotemporal queries, thus leading to considerable confusion. In an attempt to eliminate this problem, we propose a new notation in [8]. Our notation is simple and extensible. In general we classify spatiotemporal queries according to: Key// X_dimension/ Y_dimension/ Z_dimension// Valid/ Transaction. The proposed notation was inspired by the one used for queueing system models (e.g. M/M/1 systems).
In addition, we worked in another database related problem, that of disseminating data to multiple users (clients) through broadcasting. Under broadcasting, a server continuously and repeatedly broadcasts pages to a client community without the exact knowledge of what the clients need. This is due to the lack of, or, limited uplink communication channel from the clients to the server. When a client needs a certain page, it monitors the broadcast channel until the desired page is detected and captures it for use. The client response time corresponds to the elapsed time from when a request for a particular page is made by a client to when the requested page is broadcast by the server and consequently received by the client. One of the main challenges in this environment, is the organization of the data pages in a broadcast schedule so as to minimize the average client response time. Due to the asymmetric communication, the server may know only the past access pattern of the clients or an estimate of the clients' access requests. The server relies on this information in order to broadcast the pages according to a schedule that results in low latency for the clients. In [6] we consider a broadcasting policy in which the server transmits, at each slot, the page which is most likely to be requested by the client, given the history of previous broadcasts. This policy results in periodic schedules with low latency. In [7] we extend our work and discuss characteristics of the optimal policy for this setting. We also provide a class of suboptimal policies that result in average user response time close to the lower bound.
In [9] we presented an update to the Temporal Database bibliography. This bibliography shows that temporal database research has increased significantly over the last few years, as established by the number of publications related to the field. In [10] we prepared a general article for temporal snapshot queries for an encyclopedia. In [11] we prepared a review article on temporal databases for the J.Wiley Encyclopedia.

On-going work. Currently we are working on the following problems: (a) efficient bitemporal joins and (b) indexing spatiotemporal databases. Our approach in (a) is concentrated on indexed temporal relations. In (b) we are examining the behavior of spatial indexes when made persistent, i.e., when time is added on the index.

Indication of Success

We have performed most of the work according to the project's first three years plan. Our work has been concentrated on devising efficient temporal access methods. We consider a method to be efficient if it provides fast query time while using space that remains linear to the total number of changes in the temporal evolution and fast update processing.
We have produced a number of results:
A. We have addressed some of the important problems in designing access methods for bitemporal databases. We have provided a robust method with very good average case performance, the Bitemporal R-Tree [1], a method with good worst case performance for bitemporal timeslice queries, the Bitemporal Interval Tree [2]. These methods were implemented and compared with other approaches.
B. We have devised an efficient method to support temporal hashing [3]; as with non-temporal data, hashing is proved to be faster than tree-based indices for membership queries.
C. We have also presented lower bounds for such problems and temporal queries in general [4].
D. We provided efficient ways to support Temporal Views in a network management environment [5].
E. We have worked on the data dissemination problem in an asymmetric broadcasting environment [6,7].

We are currently working on two problems related to temporal access methods, i.e., Bitemporal Joins and Indexing for Spatiotemporal Databases.

The results of this project will enable the efficient index implementation of temporal databases. This problem is of particular importance given the usually large amount of temporal data that needs to be stored.

Project Impact

There is one Research Assistant currently working on this project, Mr. George Kollios, who joined Polytechnic University in September 1996. During the first year of his studies he worked on the temporal hashing problem. He is currently working on the bitemporal joins problem.
Another research assistant (Dr. Anil Kumar) also worked on problems related to this project. He received his Ph.D. from Polytechnic University in January 1998.
Parts of this research (especially papers [1] and [3] from List of Publications) were used in two courses taught at Polytechnic University (Spring 1997; CIS903: Advanced Topics in Databases) and at University of California, Riverside (Spring 1998; CS267 Seminar in Databases).
The survey paper [3] provides the database community and database developers with a much needed comparison of various proposed temporal access methods. The same paper has also been used in advanced database courses in other Universities.

Project References

[1] A. Kumar, V.J. Tsotras, C. Faloutsos, "Designing Access Methods for Bitemporal Data bases", IEEE Transactions on Knowledge and Data Engineering, Vol 10, No. 1, pp 1-20, Jan-Feb 1998.

[2] A. Kumar, V.J. Tsotras, C. Faloutsos, "Access Methods for Bitemporal Databases", International Workshop on Temporal Databases. In Recent Advances in Temporal Databases, J. Clifford, A. Tuzhilin (eds.), pp. 235-254, Springer-Verlag, 1995.

[3] G. Kollios, V.J. Tsotras, "Hashing Methods for Temporal Data", submitted for publication; appears also as a TimeCenter TR-24 (http://www.cs.auc.dk/research/DBS/tdb/TimeCenter/publications3.html).

[4] B. Salzberg, V.J. Tsotras, "A Comparison of Access Methods for Temporal Data", to appear at ACM Computing Surveys, 1999, appears also TimeCenter TR-18 (http://www.cs.auc.dk/research/DBS/tdb/TimeCenter/publications2.html)

[5] V.J. Tsotras, V. Phalke, A. Kumar, B. Gopinath, "Supporting Temporal Views in a Management Information Base", to appear at the Journal of Network and Systems Management. Appears also as Tech. Report from Polytechnic Univ. (CATT-TR-95-87)

[6] C.J. Su, L. Tassiulas, V.J. Tsotras, "A New Method to Design Broadcast Schedules in a Wireless Communication Environment", Univ. of Maryland, ISR Tech. Rep. TR-96-30,1996.

[7] C.J. Su, L. Tassiulas, V.J. Tsotras, "Broadcast Scheduling for Information Distribution", to appear at the ACM/Baltzer Wireless Networks Journal, 1999.

[8] V.J. Tsotras, C.S. Jensen, R.T. Snodgrass, "A Notation for Spatiotemporal Queries", ACM Sigmod Record, March 1998.

[9] A. Kumar, V.J. Tsotras, "Temporal Database Bibliography Update", ACM Sigmod Record, pp 41-51, March 1996.

[10] V. J. Tsotras, "Indexing Temporal Data with the Snapshot Index", to appear at the Encyclopedia of Microcomputers, Marcel Dekker, Inc., New York.

[11] V. J. Tsotras, X. Sean Wang, "Temporal Databases", to appear at the J.Wiley Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, NY, 1999.

Area Background

Conventional database systems capture only a single snapshot of the modeled reality. While serving many applications well, they are not sufficient for applications that require the support of time-varying information (past and/or future data). Instead, temporal database systems have been proposed as they can store and query temporal data through the support of two orthogonal time dimensions: the valid and transaction times. The valid time of a fact is the time when the fact is true in the modeled reality. Transaction time on the other hand refers to the time when a new value is posted to the database by a transaction. A temporal database is categorized as transaction-time, valid-time or bitemporal, according to which temporal dimension(s) it supports. The transaction time dimension represents the history of a database activity rather than real world history. Transaction times are system generated and monotonically increasing. Since it is impossible to change the past, transaction times cannot be changed and there is no way to correct errors in past tuples. A valid-time database maintains the entire history of an enterprise as best known now, i.e., it stores the current knowledge about the past and future. Any errors discovered in this history, are corrected by modifying the database. When a correction is applied, previous values are not retained; therefore it is not possible to view the database as it was before the correction.

Clearly both time dimensions are needed in order to accurately model reality. In a bitemporal database one can query tuples that are valid at some (valid) time, as known at some other (transaction) time. Due to the possible large amount of data that a bitemporal database may record, it is important to invent efficient access methods for such databases. By efficient we mean a method that has very fast query time, while keeping its space linear to the number of changes in the temporal evolution (where a change corresponds to an object addition, deletion or modification) and has fast update processing.

Area References

R. Snodgrass, I. Ahn, "Temporal Databases", IEEE Computer, Vol.19, No.9, pp 35-42, 1986.

G. Ozsoyoglu, R. Snodgrass, "Temporal and Real-Time Databases: A Survey", IEEE Trans. on Knowledge and Data Engineering, Vol. 7, No. 4, pp 513-532, Aug. 1995.

D. Lomet, B. Salzberg, "Access Methods for Multiversion Data", Proc. ACM SIGMOD, 1989.

B. Becker, S. Gschwind, T. Ohler, B. Seeger, P. Widmayer, "An Asymptotically Optimal Multiversion B-Tree", VLDB Journal 5(4): 264-275, 1996.

B. Salzberg, V.J. Tsotras, "A Comparison of Access Methodsfor Temporal Data", to appear at ACM Computing Surveys, 1999, appears also TimeCenter TR-18 (http://www.cs.auc.dk/research/DBS/tdb/TimeCenter/publications2.html)

V. J. Tsotras, B. Gopinath, G.W. Hart, "Efficient Management of Time-Evolving Databases", IEEE Trans. on Knowledge and Data Engineering, Vol. 7, No. 4, pp 591-608, Aug. 1995.

Potential Related Projects

I am interested in SpatioTemporal related problems. The Bitemporal R-Tree can be utilized as a general Spatiotemporal access method since one dimension is used for transaction-time evolution while the other R-tree dimensions can be used to define for spatial objects.