A hierarchical approach to concurrency control for multidatabases

Hierarchical concurrency control has been proposed as one possible approach for multidatabase systems. However, to apply this approach, some restrictions must be imposed on the local concurrency control algorithms. The restriction is identified. Based on this restriction, the hierarchical concurrency control approach is formalized and its correctness is proven. A novel global concurrency control algorithm based on this hierarchical approach is also presented.


Introduction
A Multidatabase System (MDBS) is a facility that allows access to data stored in multiple autonomous and possibly heterogeneous database systems. An MDBS is characterized by strong autonomy requirements [DELO89] [GK88] [EV87] of its local database systems, implying that the operations of each local database system must be unaffected by the MDBS facility. Moreover, each local database system is allowed to leave or join an MDBS without any reprogramming or loss of data consistency in the MDBS.
Global concurrency control is required in order to allow concurrent global updates in an MDBS. A general hierarchical approach to concurrency control has been proposed for the autonomous database environment [GP86]. Many global concurrency control algorithms have recently been proposed based on this general approach [Pu88] [AGS87] [Vid87]. However, this general approach is not suitable for all MDBS environments. In this paper, we concentrate on examining the local concurrency control restrictions under which the hierarchical approach is applicable. A new global concurrency control algorithm based on this general approach is also proposed.
The rest of this paper is organized as follows. In section 2, a transaction model is introduced, global serializability is defined, and an example showing the effect of autonomy on global concurrency control is given. In section 3, the general hierarchical approach is discussed, a static property is defined, and the general hierarchical approach is formalized and its correctness is proved. In section 4, a new global concurrency control algorithm is presented. In section 5, existing global concurrency control algorithms based on the hierarchical approach are surveyed. A summary of this paper is given in section 6.

Global Serializability
A transaction contains a set of read and write operations. Two operations are said to be conflicting if (1) they belong to two different transactions, (2) they access the same data item, and (3) at least one of them is a write operation. A concurrency control algorithm controls the execution order of the conflicting operations such that the serializability property of the history is maintained. Serializability is used as the correctness criterion for the MDBS in this paper; therefore, an LCC has to maintain the serializability […]

[…] $o_i$ precedes $o_j$ in $\mathcal{H}$.
The global serializability theorem can then be stated as follows:

Lemma 2.2 (The Global Serializability Theorem) If the global serialization graph of a global history $\mathcal{H}$ is acyclic, then $\mathcal{H}$ is serializable.

The proof of this theorem is similar to that of the serializability theorem in centralized database systems (see [BG85]) and is not given in this paper.
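The conflict definition and the acyclicity test of the serializability theorem can be illustrated with a minimal sketch. The representation of an operation as a `(transaction, kind, item)` triple and the function names are assumptions made for illustration; the graph is the usual conflict graph, with an edge from one transaction to another whenever an operation of the first conflicts with and precedes an operation of the second.

```python
from collections import defaultdict


def conflicts(a, b):
    """Operations are (transaction, kind, item) triples, kind in {"r", "w"} (assumed format)."""
    different_transactions = a[0] != b[0]
    same_item = a[2] == b[2]
    one_is_write = "w" in (a[1], b[1])
    return different_transactions and same_item and one_is_write


def serialization_graph(history):
    """Return {Ti: set of Tj} with an edge Ti -> Tj when Ti conflicts with and precedes Tj."""
    edges = defaultdict(set)
    for position, a in enumerate(history):
        edges[a[0]]  # touching the key registers the transaction as a vertex
        for b in history[position + 1:]:
            if conflicts(a, b):
                edges[a[0]].add(b[0])
    return dict(edges)


def is_acyclic(edges):
    """Depth-first search; a grey vertex reached again means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in edges}

    def visit(v):
        colour[v] = GREY
        for w in edges[v]:
            if colour[w] == GREY:
                return False
            if colour[w] == WHITE and not visit(w):
                return False
        colour[v] = BLACK
        return True

    return all(colour[v] != WHITE or visit(v) for v in edges)
```

A history whose graph passes `is_acyclic` is serializable in the sense of the theorem; the converse is not claimed by this sketch.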

Autonomy
The difficulty in doing global concurrency control in MDBSs is mainly due to local autonomy.

The set $V$ is called the order domain, and $d_i$ is called the serialization order of transaction $T_i$. The time $t_i$ when the transaction $T_i$ is serialized is called the serialization point of $T_i$. The order domain $V$ can be any countable or uncountable set, provided that a total order can be defined on it.

The Static Properties of the Existing Concurrency Control Algorithms
Most of the existing concurrency control algorithms are static. Moreover, for most of them the order domain is the same as the time domain. For example, in the two phase locking algorithm [EGLT76], locking is used to resolve the execution order of the conflicting operations. The time when a transaction acquires the locks for all the data items it needs is called the lock point of the transaction. The lock point can be used as the serialization order of a transaction, since for any two transactions $T_i$ and $T_j$, if $T_i$ reaches its serialization point before $T_j$, then $T_i$ must be serialized before $T_j$ (see footnote 1). Moreover, the lock point occurs within the lifetime of the transaction. In other words, the two phase locking algorithm is static. Furthermore, the serialization point and the serialization order of a transaction are the same in a two phase locking algorithm.

Footnote 1: For a serializable history, $T_i$ is said to be serialized before $T_j$ if $T_i$ conflicts with $T_j$ and the conflicting operations of $T_i$ are executed before those of $T_j$.
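As a small illustration of the lock point, the following sketch extracts it from the lock trace of a single transaction. The trace format `(time, action, item)` and the function name are hypothetical; the only facts taken from the text are that the lock point is the time at which the last needed lock is acquired and that it falls within the lifetime of the transaction.

```python
def lock_point(events):
    """Return the time of the last lock acquisition of one transaction.

    events: list of (time, action, item), action in {"lock", "unlock"},
    sorted by time (assumed trace format). Raises ValueError when the
    trace is not two-phase, i.e. a lock is acquired after an unlock.
    """
    point = None
    shrinking = False
    for time, action, _item in events:
        if action == "unlock":
            shrinking = True
        elif action == "lock":
            if shrinking:
                raise ValueError("not two-phase: lock acquired after an unlock")
            point = time
    return point
```

Sorting transactions by the value returned here gives one serialization order consistent with their conflicts, under the assumption that every transaction in the trace obeys two phase locking.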
For the timestamping concurrency control algorithm [BHG87], a timestamp assigned at the beginning of a transaction is used to resolve conflicts. All conflicting operations must be executed according to their corresponding transaction timestamps. Since the transactions are serialized by their timestamps, the timestamp of a transaction can be used as its serialization order. This order is determined at the beginning of the transaction, which is within the lifetime of the transaction. Therefore, the timestamping algorithm is static.
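The rule that conflicting operations execute in timestamp order can be sketched as a basic timestamp ordering scheduler. This is a textbook-style illustration rather than the algorithm of any particular local system; the class and method names are assumptions, and timestamps are assumed to be positive integers.

```python
class TimestampScheduler:
    """Basic timestamp ordering: reject any operation that arrives too late."""

    def __init__(self):
        self.read_ts = {}   # item -> largest timestamp that has read it
        self.write_ts = {}  # item -> largest timestamp that has written it

    def read(self, ts, item):
        # A read is late if a younger transaction has already written the item.
        if ts < self.write_ts.get(item, 0):
            return False
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        # A write is late if a younger transaction has already read or written the item.
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False
        self.write_ts[item] = ts
        return True
```

A rejected operation would normally cause its transaction to be aborted and restarted with a new timestamp; that step is omitted here.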
For the optimistic concurrency control algorithm [KR81], a transaction number is assigned to a transaction at the end of its read phase. The transaction is then validated using this number. It can be reasoned that transactions are serialized according to their transaction numbers. Therefore, the transaction number can be used as the serialization order of a transaction. Furthermore, a transaction number is determined at the end of the read phase of a transaction, which occurs within the lifetime of the transaction. From these observations, we conclude that the optimistic concurrency control algorithm is static.
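A minimal sketch of validation by transaction number follows. It is a simplified serial validation in the spirit of the optimistic approach, not a reproduction of [KR81]: the class, its methods, and the choice to hand out the number only when validation succeeds are assumptions made to keep the example short.

```python
class OptimisticValidator:
    """Serial validation: a transaction passes if no transaction that was
    numbered after it started has written anything it read."""

    def __init__(self):
        self.counter = 0      # last transaction number handed out
        self.committed = []   # list of (transaction number, write set)

    def begin(self):
        # The start number marks which committed transactions can be ignored.
        return self.counter

    def validate(self, start_number, read_set, write_set):
        reads = set(read_set)
        for number, writes in self.committed:
            if number > start_number and writes & reads:
                return None  # validation failed, the transaction must restart
        self.counter += 1
        self.committed.append((self.counter, set(write_set)))
        return self.counter  # serves as the serialization order
```

The returned number plays the role of the serialization order discussed above: transactions that validate successfully are serialized in increasing number.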
A newly developed concurrency control algorithm is the value dates concurrency control algorithm [LT88]. In this algorithm, a transaction must specify the time (or date) when it is going to finish. The algorithm then uses these value dates to resolve conflicts. If a data item is not available to a transaction (the operation of the transaction is late), then the transaction can either be undone or access another data item if possible. Since only the transaction with the later value date can be scheduled to wait until the other terminates, the algorithm serializes the transactions according to their value dates. Therefore, the value date of a transaction can be used as its serialization order.

Not all of the existing concurrency control algorithms are static. For example, the serialization graph testing algorithm [Cas81] is not static. Even though the serialization graph testing algorithm maintains the serializability property of the transaction execution, there is no specific time at which the serialization order of a transaction can be determined. The serialization order of a transaction is affected by the execution of the concurrent transactions in such a way that it may not be determined.

The Importance of the Static Properties on Global Concurrency Control
Two important static properties are:
• The serialization order is determined within the lifetime of a transaction; and
• The serialization order is a total order.
The first property enables us to obtain the serialization order without violating local autonomy. If a transaction is not serialized within its lifetime, then other transactions must be consulted in order to determine its serialization order. Among them there may be some local transactions, and accessing information about local transactions is a violation of local autonomy. The second property is needed to prevent the invisible indirect serialization orders between subtransactions that are introduced by local transactions. One such invisible indirect serialization order is shown in example 1. The invisible indirect serialization order will not occur if the concurrency control algorithm is static. This can be argued by using example 1. Consider local history $h_2$. If the local concurrency control algorithm in $LDBS_2$ is static, then the orders of $G_1$ and $G_2$ at $LDBS_2$ must be determined by the time both $G_1$ and $G_2$ finish. If $G_1$ precedes $G_2$, then there is no valid serialization order for $L$, since according to the conflicts the serialization order of $L$ must precede that of $G_1$ and follow that of $G_2$, which is not possible. In this case $L$ will be aborted by the LCC. If the serialization order of $G_2$ precedes that of $G_1$, then the indirect serialization order introduced by $L$ is the same as the order between $G_1$ and $G_2$, and is therefore visible to the GCC.

Correctness of the hierarchical approach
Under the assumption that local concurrency control algorithms are static, it can be shown that the global concurrency control problem reduces to the problem of maintaining the compatibility of the serialization orders of the global subtransactions. In this section, we formalize the hierarchical concurrency control approach and prove its correctness.

[…] exists a total order on $\mathcal{G}$ such that for all $G_i, G_j \in \mathcal{G}$, if $G_i$ precedes $G_j$ in the total order, then for any $k$ from 1 to $n$, $S_k(G_{ik}) < S_k(G_{jk})$ if both $G_{ik}$ and $G_{jk}$ exist.
The correctness of the hierarchical approach is stated in the following theorem. Theorem

Limitations of the Hierarchical Approach
The hierarchical approach is very restrictive in the sense that it only allows a small subset of all serializable histories. It is possible for a serializable history not to satisfy the two conditions of theorem 3.1. This is illustrated by the following example.
Example 3: Consider the MDBS in example 1, and suppose now that we have the following local histories:

$h_1$: $w_{g_1}(a)$ […]

$h_2$: […] $(b)$
According to the local histories, we have $S_1(G_{11}) < S_1(G_{21})$ at $LDBS_1$ (since $w_{g_1}(a)$ conflicts with and precedes $w_{g_2}(a)$), and we may also have $S_2(G_{22}) < S_2(G_{12})$ at $LDBS_2$ (this depends on which subtransaction is serialized first). In this case, the serialization orders are not compatible.
However, the global history is serializable. This is because the fact that the subtransactions $G_{12}$ and $G_{22}$ at $LDBS_2$ do not conflict is not taken into account in the conditions of theorem 3.1. Based on this observation, it seems that the two conditions in theorem 3.1 are too restrictive. However, due to the lack of local information, we do not know whether the local transactions at the local database systems will introduce an indirect order between two non-conflicting subtransactions. The only thing that we can do is to assume that they do conflict, even though they may not conflict at all. In fact, the impossibility of detecting conflicts between subtransactions is the major obstacle to the design of an efficient global concurrency control algorithm.

The Site Queue Algorithm
Using the assumption that local concurrency control algorithms are static, the task of the GCC is reduced to maintaining the compatibility of the serialization orders of subtransactions. In this section, we present an algorithm for maintaining the compatibility of the subtransaction serialization orders. The proposed algorithm is a top down approach in the sense that the GCC decides the serialization orders of the subtransactions at the global level and then enforces them at the local level. In this section, we also discuss a way of simulating the prepared state for the basic two phase commit protocol [G"79].

Assumptions
Before we outline the site queue algorithm, we first state our assumptions as follows:

A1 For all local concurrency control algorithms, the serialization point of a transaction is the same as its serialization order.

A2 The communication network is capable of maintaining the order of the messages it sends, such that the order of the messages received at the destination site is the same as the order of the messages sent from the source site.

The first assumption is used to simplify the algorithm. This assumption can be relaxed by some minor modifications to the algorithm (see next section). The second assumption can be relaxed by appending to each message the message identifiers of those messages which are sent ahead of it. The LDBS can then use this information to restore the order of the messages.
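The relaxation of assumption A2 can be sketched as follows. Each message carries the identifiers of the messages sent ahead of it, and the receiver holds a message back until all of those have been delivered. The class name, the message layout, and the method signature are assumptions for illustration only.

```python
class OrderRestorer:
    """Deliver messages only after every message sent ahead of them."""

    def __init__(self):
        self.delivered = set()  # identifiers already handed to the LDBS
        self.pending = []       # (identifier, predecessor ids, payload) held back

    def receive(self, msg_id, predecessors, payload):
        """Accept one message; return the payloads that become deliverable, in order."""
        self.pending.append((msg_id, frozenset(predecessors), payload))
        released = []
        progress = True
        while progress:
            progress = False
            for entry in list(self.pending):
                identifier, needed, body = entry
                if needed <= self.delivered:
                    self.pending.remove(entry)
                    self.delivered.add(identifier)
                    released.append(body)
                    progress = True
        return released
```

The cost of carrying full predecessor lists grows with the number of messages; a real implementation would more likely use per-sender sequence numbers, which is a different technique from the one described in the text.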
In addition to these two assumptions, we also assume that the serialization point of a subtransaction is visible to the GCC. One way of exposing the serialization point without violating local autonomy is discussed in a later section.

Site Queue Concurrency Control Algorithm
A server is created to maintain a subtransaction queue (Figure 5).

The Submission Rule: The server can submit a subtransaction to the LTM for execution only when the subtransaction is at the front of the queue and the previously submitted subtransaction is serialized or aborted.

An outline for the server is shown in figure 6.

    LOOP:
    do forever
    begin
        on receiving a subtransaction
        begin
            insert the subtransaction into the rear of the queue;
            go to LOOP;
        end;
        if the queue is not empty then
        begin
            submit the first subtransaction in the queue to the LTM;
            wait for SERIALIZED or ABORT from the previously submitted subtransaction;
        end;
    end;

It is noted that since a subtransaction may not be serialized before it is aborted, an ABORT message has to be sent to the server if a subtransaction is aborted. Since the subtransactions of global transactions are queued one after the other, and are then serialized in the local database system according to their positions in the queue, the site queue algorithm maintains the compatibility of the serialization orders of the subtransactions.

For some local concurrency control algorithms, the order domain is the same as the time domain. However, the serialization point may not be the same as the serialization order (as in the value dates algorithm). In this case, the server has to wait for the previously submitted subtransaction to reach its serialization order, rather than its serialization point, before it can submit the next subtransaction.
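The submission rule can be sketched in an event-driven form. This is an illustrative reading of the server outline under assumption A1, not the paper's implementation: the class name, the `submit_to_ltm` callback, and the `notify` method are hypothetical, and the blocking wait of the outline is replaced by a callback on the SERIALIZED or ABORT message.

```python
from collections import deque


class SiteQueueServer:
    """One server per LDBS; submits queued subtransactions one at a time."""

    def __init__(self, submit_to_ltm):
        self.queue = deque()
        self.submit_to_ltm = submit_to_ltm  # hypothetical hook into the LTM
        self.in_flight = None               # submitted, not yet serialized or aborted

    def receive(self, subtransaction):
        # New subtransactions always join the rear of the queue.
        self.queue.append(subtransaction)
        self._try_submit()

    def notify(self, subtransaction, message):
        # SERIALIZED or ABORT for the in-flight subtransaction frees the server.
        if message not in ("SERIALIZED", "ABORT"):
            raise ValueError(f"unexpected message: {message}")
        if subtransaction == self.in_flight:
            self.in_flight = None
            self._try_submit()

    def _try_submit(self):
        # Submission rule: front of the queue, and nothing still in flight.
        if self.in_flight is None and self.queue:
            self.in_flight = self.queue.popleft()
            self.submit_to_ltm(self.in_flight)
```

For example, after `receive("G1")` and `receive("G2")` only `G1` has been passed to the LTM; `G2` follows once `notify("G1", "SERIALIZED")` or `notify("G1", "ABORT")` arrives.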

Exposing Serialization Points and Simulating Prepared States
It is discussed in [GL84] that the major problem of implementing the two phase commit protocol in an MDBS is the lack of a prepared state in the local database systems. A prepared state is one in which a transaction has finished all of its read and computation operations and has all of its updates stored in stable storage. A transaction in the prepared state is able to commit or abort according to a global decision. Since a prepared state may not be supported by the local database systems, we simulate a prepared state for a subtransaction by properly restructuring the subtransaction. This restructuring also makes the serialization point of a subtransaction visible to the outside world.
In the proposed format (Figure 6), a subtransaction contains database operations, communication primitives and control statements.All of them are enclosed within a BEGIN_TRANSACTION and an END_TRANSACTION.
It is assumed that the local database system buffers the write operations in the private working area of the transaction until the transaction issues its commit operation. It is also assumed that the local database system supports a rollback operation which can recover a failed transaction. Every database operation is embedded within a conditional statement which takes a proper action when the execution of the database operation fails. At the point in the transaction where the subtransaction is first serialized, a send operation is inserted, which reports a SERIALIZED message to an agent of the GCC. If all the database operations are successfully executed, the subtransaction waits for a PREPARE message from the coordinator of the two phase commit protocol. Subsequently, if a PREPARE message is received, the subtransaction responds with a READY message to the coordinator, and then waits for a COMMIT or ABORT message from the coordinator. If a COMMIT message is received, the subtransaction commits by issuing a commit operation. Otherwise, the subtransaction aborts itself by issuing a rollback operation. In case there is any failure in executing a database operation, the subtransaction aborts itself and responds with an ABORT message to both the GCC agent and the coordinator. All of the above communication is done by using the send and receive primitives.
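The restructured subtransaction described above can be sketched as a single function. Everything about the interfaces is an assumption for illustration: `db` is a hypothetical local interface with `execute(op)` returning a success flag plus `commit()` and `rollback()`, `send(destination, message)` and `receive()` stand for the communication primitives, and `serialization_index` is the assumed position of the operation after which the subtransaction is serialized. Rolling back when something other than PREPARE arrives is also an assumption; the text does not cover that case.

```python
def run_subtransaction(db, operations, serialization_index, send, receive):
    """Run one subtransaction with a simulated prepared state."""
    for index, op in enumerate(operations):
        if not db.execute(op):
            # Any failed database operation aborts the subtransaction.
            db.rollback()
            send("gcc_agent", "ABORT")
            send("coordinator", "ABORT")
            return "aborted"
        if index == serialization_index:
            # Expose the serialization point to the GCC agent.
            send("gcc_agent", "SERIALIZED")

    # Simulated prepared state: updates are buffered, nothing is committed yet.
    if receive() != "PREPARE":
        db.rollback()  # assumption: anything but PREPARE leads to rollback
        return "aborted"
    send("coordinator", "READY")

    if receive() == "COMMIT":
        db.commit()
        return "committed"
    db.rollback()
    return "aborted"
```

Because the buffered updates live only in the transaction's private working area, this sketch shares the limitation noted in the text: it cannot survive a site failure between READY and COMMIT.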
By restructuring a subtransaction as above, we simulate a prepared state in the subtransaction without violating local autonomy. The role of the participant of the two phase commit protocol [BHG87] is assumed by the subtransaction, while the coordinator is the same as usual and must be implemented in one of the local database systems. Since the two phase commit protocol with a simulated prepared state is very similar to the basic two phase commit protocol, it will not be further detailed in this paper.
It is worth noting that since the updates are not stored in stable storage, the two phase commit protocol with a simulated prepared state cannot tolerate site failure. However, it can tolerate subtransaction failure.

The first class, in which validation takes place at commit time, is the bottom up approach. In the second, the GCC controls the execution of the subtransactions such that the serialization orders are prevented from becoming incompatible. This is the top down approach.
The superdatabases approach proposed by [Pu88] is an example of the bottom up approach. In this approach, the LDBSs report to the superdatabase (which is a global transaction manager) the serialization order of each subtransaction under its control. The serialization order of a subtransaction is called the O_element (order-element). The O_elements of the subtransactions of a global transaction are then used to construct an O_vector. The superdatabase then searches for a consistent position for this O_vector in the set of O_vectors of the recently committed global transactions. If a consistent position can be found, then the global transaction is committed; otherwise, it is aborted.

The Altruistic Locking protocol [AGS87] and the Non-two-phase Locking protocol [Vid87] are examples of the top down approach. In these protocols, locking is used to maintain the compatibility of the serialization orders. Before submitting a subtransaction to an LDBS, the global transaction must lock the intended LDBS. A subtransaction can be submitted to a local database system only when the lock of the LDBS is available (not locked). The way that the LDBSs are locked and released must follow a correct protocol to guarantee the compatibility of the serialization orders. The Altruistic Locking protocol is a variant of the two phase locking protocol which allows early release of locks. In the Non-two-phase Locking protocol, the LDBSs are first ordered as a rooted tree, and the tree protocol [KS86] is then applied on this rooted tree. Both of these protocols can be used as the global concurrency control protocol.

It is to be noted that the static property is also required for the top down approach. The static property is needed to guarantee that the submission order of a subtransaction is the same as its serialization order. As shown in example 1, since the local concurrency control algorithm in $LDBS_2$ is not static, even though the subtransaction of $G_1$ is submitted before that of $G_2$, the serialization order of the subtransaction of $G_2$ precedes that of $G_1$.

An example of a protocol that does not follow the hierarchical approach is the site graph algorithm proposed by Breitbart [BST87]. In this algorithm, a site graph is constructed in which nodes are sites (LDBSs) and edges are global transactions. If a global transaction accesses two data items on two different sites, an edge labeled by this global transaction is added between these two sites. Global serializability is maintained by retaining the acyclicity of the site graph. A database operation can be executed only if it does not create a cycle in the site graph. This approach allows only a low degree of concurrency for global transactions, since it does not allow any two global transactions to concurrently access more than one common site. Furthermore, it is not easy to purge the graph.
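The acyclicity test of the site graph can be illustrated with a union-find sketch. This is not the algorithm of [BST87] itself: the class and method names are hypothetical, each transaction's sites are linked to the first site it accessed (an assumed edge layout that avoids a transaction forming a cycle with itself), and edges are never removed, which mirrors the remark that purging the graph is not easy.

```python
class SiteGraph:
    """Admit an access only if the undirected site graph stays acyclic."""

    def __init__(self):
        self.parent = {}    # union-find forest over sites
        self.sites_of = {}  # global transaction -> sites it has accessed

    def _find(self, site):
        self.parent.setdefault(site, site)
        while self.parent[site] != site:
            self.parent[site] = self.parent[self.parent[site]]  # path halving
            site = self.parent[site]
        return site

    def request(self, transaction, site):
        """Return True if the access is admitted, False if it would close a cycle."""
        visited = self.sites_of.setdefault(transaction, [])
        if site in visited:
            return True
        if visited:
            a, b = self._find(visited[0]), self._find(site)
            if a == b:
                return False  # the two sites are already connected: cycle
            self.parent[a] = b
        else:
            self._find(site)  # first site of the transaction: no edge yet
        visited.append(site)
        return True
```

With two sites shared by two global transactions, the second transaction's access to its second common site is refused, which is the low-concurrency behaviour described above.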
Generally speaking, the bottom up approach suffers from a high abortion rate of the global transactions. This can be illustrated by the following simple analysis. Let us assume that a bottom up approach is used for global concurrency control. Because of local autonomy, every pair of subtransactions executed on the same LDBS is assumed to conflict. Consider an MDBS with three LDBSs. Suppose that there are two global transactions, each of which has one subtransaction on each LDBS. Let us further assume that for any pair of subtransactions on the same LDBS, the probability that the serialization order of one precedes that of the other is $\frac{1}{2}$. Then the probability that these two global transactions have compatible serialization orders is $\frac{1}{4}$, since the same one of the two possible orders must arise at all three LDBSs. If the number of global transactions increases to three in the above example, then the probability that the serialization orders are compatible becomes $\frac{1}{36}$, which is very low. It can be shown that when the number of concurrent global transactions becomes large, the completion rate in the bottom up approach will be small. Since aborting global transactions is costly, we conjecture that the top down approach is more efficient.
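The two probabilities in this analysis can be checked by brute-force enumeration. The sketch assumes that every global transaction has a subtransaction at every LDBS, that each LDBS independently serializes them in a uniformly random order, and that the orders are compatible exactly when all LDBSs agree; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product


def compatible_probability(num_transactions, num_sites):
    """Probability that independent, uniformly random site orders all agree."""
    orders = list(permutations(range(num_transactions)))
    total = 0
    compatible = 0
    # Enumerate every combination of one serialization order per site.
    for assignment in product(orders, repeat=num_sites):
        total += 1
        if len(set(assignment)) == 1:  # every site chose the same order
            compatible += 1
    return Fraction(compatible, total)


print(compatible_probability(2, 3))  # 1/4
print(compatible_probability(3, 3))  # 1/36
```

Under these assumptions the value is $(m!)^{1-n}$ for $m$ global transactions and $n$ sites, which shrinks quickly as $m$ grows.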

Conclusion
One way of doing global concurrency control in an MDBS is to impose a control hierarchy on the GCC and the LCCs. In hierarchical concurrency control, the LCCs control the execution of local transactions and global subtransactions to retain the serializability of the local executions, while the GCC controls the execution of global subtransactions to maintain the compatibility of the subtransaction serialization orders. However, this approach is not applicable to all MDBS environments. In this paper, we identify a class of local concurrency control algorithms to which the hierarchical concurrency control approach can be applied. This class of local concurrency control algorithms is characterized by having the static property. One contribution of this paper is to highlight this property. Other contributions are (1) to formalize the hierarchical approach and prove its correctness; (2) to propose a new deadlock free global concurrency control algorithm; and (3) to suggest a way of implementing the two phase commit protocol.
Transaction Model for MDBS

As shown in figure 1, an MDBS is composed of a set of pre-existing local database management systems, a set of global transactions, a set of local transactions, a set of Local Transaction Managers (LTMs) and a Global Transaction Manager (GTM). A local transaction is a transaction which is issued directly to one of the local database systems. A global transaction consists of a set of subtransactions, each of which accesses one local database system on behalf of the global transaction. It is assumed that each global transaction can have at most one subtransaction per local database system (this assumption simplifies the concurrency control problem). The LTM controls the execution of local transactions and global subtransactions in the local database system, while the GTM controls the execution of the global transactions at the global level. The LTMs and the GTM as a whole are responsible for transaction management in the MDBS. Transaction management includes concurrency control, commitment control and recovery control. Concurrency control is performed by the Local Concurrency Controller (LCC) and the Global Concurrency Controller (GCC) collectively.

Figure 1: A Transaction Model for MDBS

Figure 3: The serialization point and the serialization order

To facilitate the discussion, we define a serialization function $S_k$ for the local history $h_k$. Let $ST_k$ denote the set of all the successfully executed global subtransactions and/or local transactions in $h_k$. The serialization function $S_k$ is defined as follows:

Definition 3.1 A serialization function $S_k$ for the local history $h_k$ is a mapping $S_k : ST_k \to V$.

A serialization function maps successfully executed transactions in a local history into their corresponding serialization orders. Now, we define the compatibility of the serialization orders of the subtransactions of a set of global transactions $\mathcal{G}$ as follows.

Definition 3.2 The serialization orders of the subtransactions of a set of global transactions $\mathcal{G}$ in an MDBS with $n$ LDBSs are compatible if there […]

Figure 5: The Site Queue of an MDBS

Figure 6: Pseudo code for the server

Survey of the Existing Global Concurrency Control Algorithms

Most of the proposed global concurrency control algorithms can be classified under the hierarchical concurrency control approach. Depending on how the compatibility of the serialization orders is maintained, most of the global concurrency control algorithms can be further classified into one of two classes. In the first, a global transaction is first executed without any global control. At commit time, the execution of the global transaction is validated against the set of committed global transactions. The validation is done by the GCC by comparing the serialization orders of the subtransactions with the serialization orders of a set of recently committed global subtransactions.
The autonomy requirement significantly aggravates the concurrency control problem in an MDBS. The local autonomy requirement […]

Let $h_1$ and $h_2$ be local histories at $LDBS_1$ and $LDBS_2$, respectively: […]

[…] terminates (either commits or aborts).

A domain set $V$ is a totally ordered set. In other words, there is an irreflexive, transitive binary relation $<$ on $V$ such that for any $a, b \in V$ with $a \neq b$, either $a < b$ or $b < a$. If $a < b$ we say that $a$ precedes $b$, and vice versa. A concurrency control algorithm is said to be static if it has the following properties:

1. For every successfully executed transaction $T_i$, there exists a corresponding $d_i \in V$, such that:
2. For any two different transactions $T_i$ and $T_j$, $d_i \neq d_j$;
3. $\exists$ a timestamp $t_i \in LT_i$ such that $d_i$ will be determined when time $t_i$ is reached; and
4. If $T_i$ conflicts with $T_j$ and is serialized before $T_j$, then $d_i < d_j$.
In order to check the compatibility of the serialization orders, we propose the Serialization Order Graph (SOG), which is defined as follows.

Definition 3.3 A serialization order graph is a directed graph in which the vertices are the set of all global transactions, and there is an arc from $G_i$ to $G_j$ for any pair of global transactions $G_i, G_j \in \mathcal{G}$ if and only if $\exists k$ such that both $G_{ik}$ and $G_{jk}$ exist and $S_k(G_{ik}) < S_k(G_{jk})$ is true.

Example 2: Let $G_1$, $G_2$ and $G_3$ be three global transactions executed in an MDBS with three LDBSs. Suppose we have $S_1(G_{31}) < S_1(G_{11}) < S_1(G_{21})$, $S_2(G_{32}) < S_2(G_{22})$, and $S_3(G_{13}) < S_3(G_{23})$. $G_1$ does not have a subtransaction at $LDBS_2$, and $G_3$ does not have any subtransaction at $LDBS_3$. The set of the serialization orders of the subtransactions of the global transactions is compatible. The serialization order graph of this global history is shown in […]

Lemma 3.1 The serialization orders are compatible if and only if the SOG of the set of the global transactions is acyclic.

Proof. ($\Rightarrow$) […] $S_p(G_{ip}) < S_p(G_{jp})$ for any $p$ from 1 to $n$. Now assume that the SOG is cyclic, and let the cycle be $G_1 \to G_2 \to \dots \to G_k \to G_1$. Since $G_1$ precedes $G_2$, there exists a number $k$ such that $S_k(G_{1k}) < S_k(G_{2k})$. Because the serialization orders are compatible, we have $S_p(G_{1p}) < S_p(G_{2p})$ for any $p$ from 1 to $n$. Similarly, $S_p(G_{2p}) < S_p(G_{3p})$ for any $p$. Because the $<$ relation is transitive, we conclude that $S_p(G_{1p}) < S_p(G_{1p})$ for any $p$, which is a contradiction. In other words, if the serialization orders are compatible, there should be no cycle in the SOG. $\Box$

($\Leftarrow$) If the SOG is acyclic, a total order on $\mathcal{G}$ which satisfies the condition in definition 3.2 exists and can be obtained by using topological sort on the SOG [Deo74]; therefore, the serialization orders are compatible. $\Box$

[…] $L_{n,1} \to \dots \to L_{n,p_n} \to G_1$. From $G_i \to L_{i,1}$ we deduce that there are conflicting operations between $G_i$ and $L_{i,1}$; furthermore, the conflicting operations of $G_i$ precede those of $L_{i,1}$. If $L_{i,1}$ is at $LDBS_k$, then we have $S_k(G_{ik}) < S_k(L_{i,1})$. From $L_{i,1} \to L_{i,2}$ we have $S_k(L_{i,1}) < S_k(L_{i,2})$, and so on. Finally we have $S_k(G_{ik}) < S_k(G_{i+1,k})$. Similarly, we have $S_{k_1}(G_{1,k_1}) < S_{k_1}(G_{2,k_1})$, $S_{k_2}(G_{2,k_2}) < S_{k_2}(G_{3,k_2})$, $\dots$, $S_{k_{n-1}}(G_{n-1,k_{n-1}}) < S_{k_{n-1}}(G_{n,k_{n-1}})$ and $S_{k_n}(G_{n,k_n}) < S_{k_n}(G_{1,k_n})$ for some $k_1, k_2, \dots, k_n \in \{1, \dots, n\}$. In other words, we have a cycle in the corresponding SOG of the history. By lemma 3.1, the serialization orders are not compatible, which contradicts condition 2 of the theorem. In conclusion, we have shown that if the global history is not serializable, then the two conditions of the theorem do not hold. That is, if the two conditions in the theorem hold, then the global history is serializable.
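The serialization order graph of definition 3.3 and the topological sort used in the proof of lemma 3.1 can be sketched with the standard library. The input format (one list of global transactions per LDBS, in increasing serialization order) and the function names are assumptions; the sample orders below follow one reading of Example 2 and should be treated as illustrative.

```python
from graphlib import CycleError, TopologicalSorter


def build_sog(site_orders):
    """Map each global transaction to the set of transactions with an arc into it.

    site_orders: one list per LDBS, listing the global transactions that have a
    subtransaction there, in increasing serialization order (assumed format).
    """
    predecessors = {}
    for order in site_orders:
        for position, transaction in enumerate(order):
            predecessors.setdefault(transaction, set()).update(order[:position])
    return predecessors


def compatible_order(site_orders):
    """Return a total order witnessing compatibility, or None if the SOG is cyclic."""
    try:
        return list(TopologicalSorter(build_sog(site_orders)).static_order())
    except CycleError:
        return None


# One reading of Example 2: G3 < G1 < G2 at LDBS 1, G3 < G2 at LDBS 2, G1 < G2 at LDBS 3.
example_2 = [["G3", "G1", "G2"], ["G3", "G2"], ["G1", "G2"]]
print(compatible_order(example_2))  # ['G3', 'G1', 'G2']
```

When two sites disagree, for instance `[["G1", "G2"], ["G2", "G1"]]`, the graph contains a cycle and `compatible_order` returns `None`, matching the statement of lemma 3.1.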