Data communication and replication strategies for large scale distributed databases

Xiangning Liu, Purdue University

Abstract

As the number of sites in a distributed database system grows to a large scale, sustaining high system performance, consistency, and availability becomes a problem. To address this problem, this thesis presents a series of experiments on Internet communication software and proposes UDP (User Datagram Protocol) cooling mechanisms for efficient message transmission, a scalable model for data replication, and a scheme for availability evaluation. The experiments were conducted on an Ethernet LAN (Local Area Network), an Internet WAN (Wide Area Network), and an ATM (Asynchronous Transfer Mode) LAN. In the experiments on the Ethernet LAN, a UDP cooling data transmission strategy reduced datagram loss by a factor of 10 for messages larger than 200 Kbytes. In the experiments on the ATM LAN, a deadlock problem in the TCP (Transmission Control Protocol) implementation was identified and is discussed. The proper use of ATM to avoid this deadlock and the use of larger buffer sizes for high performance are proposed. This research builds on prior thesis research on UDP communication by L. Mafla, Y. Zhang, and others in the Raid Lab at Purdue University. The scalable model for data replication groups closely related sites in a hierarchical cluster architecture, and a corresponding hierarchical data-replica structure is formed. In such an environment, a multi-view access protocol, which maintains multiple levels of consistent views by propagating updates level by level, is proposed to support efficient access and insular consistency. Insular consistency guarantees that the database always moves from one consistent state to another, although read-only transactions may access stale data. Compared with N. Adly's research at Cambridge University, the atomic unit of execution in this research is a transaction rather than a single operation, and insular consistency is supported in addition to strong and relaxed consistency.
The scheme for evaluating the availability of transaction systems requires time and space that are only logarithmic in those required by the method of Martell et al. of Politecnico di Milano in Italy. The performance and availability of the systems are evaluated mathematically and verified by computer simulation, which shows the advantage of the proposed multi-view access protocol on the basis of the presented algorithms and experiments.

Degree

Ph.D.

Advisors

Bhargava, Purdue University.

Subject Area

Computer science
