Abstract | This thesis addresses the problem of maintaining consistency in distributed databases. The first part analyzes communication in distributed databases from the standpoint of database consistency, as well as the effect of network partitioning on transaction processing in a distributed database. Furthermore, from the database standpoint, the ACID properties of transactions are described, and different views on transaction types are given.
Primary and derived horizontal fragmentation are analyzed. The advantages and disadvantages of horizontal fragmentation are described and illustrated with an extensive example. Vertical fragmentation is analyzed as well.
Query processing is analyzed. The problem of query cost is broken down. Techniques for joining relations are compared, and a choice of technique is proposed for particular conditions. Static and dynamic query optimizers are compared. An example shows how the query strategy determines the speed, and therefore the cost, of query processing. Query processing in a modern WAN environment is described through Mariposa, an experimental distributed database system.
Data replication plays a very significant role in distributed databases. By replicating data across multiple nodes, transactions gain faster access to local copies of the data. This thesis presents an analysis and comparison of eager and lazy replication methods. Eager replication provides data consistency in a very direct way. Moreover, with the “update everywhere” approach, all types of transactions can be executed on any node without restrictions. Despite these characteristics, eager “update everywhere” replication is rarely used in practice because current solutions have severe limitations in terms of efficiency and complexity. The thesis also covers concurrency control for transaction execution. Two concurrency control algorithms, two-phase locking (2PL) and timestamp ordering (BTO), are analyzed and compared. Deadlock is analyzed, and methods for its detection and resolution are compared.
The standard two-phase commit protocol (2PC), which is used as the standard for maintaining consistency in commercial databases, is covered, followed by descriptions of its variants PrC, PrA, PrAny, and DPr2PC, including how they work and their drawbacks. The 3PC protocol is described and analyzed, as is a relatively new optimistic commit protocol called PROMPT. Next, the one-phase Early Prepare (EP) protocol is described and compared with PROMPT. The PEP protocol, which arose as a combination of the previous two protocols, is then described. Finally, the new 1-2PC protocol is analyzed; it is interesting because it can operate as either a one-phase or a two-phase protocol. |
Abstract (english) | This thesis addresses the problem of maintaining consistency in distributed database systems. It first compares new communication technologies with regard to distributed database consistency. It then considers partitioning of the communication network and the consequences of such partitioning for transaction processing in distributed database systems. From the viewpoint of distributed database transactions, it also presents the ACID properties and attempts a further classification of such transactions.
The thesis further analyzes primary and derived horizontal fragmentation. The advantages and disadvantages of horizontal fragmentation are described and illustrated with an extensive example. An analysis of vertical fragmentation is presented as well.
Next, query processing is analyzed and the concept of query cost is introduced. Techniques for joining relations are compared, and a recommendation is made as to which technique is preferable under given circumstances. Static and dynamic query optimizers are then compared. A real-world example shows how different strategies affect the speed and cost of processing a query. Query processing is also illustrated through the Mariposa distributed database management system.
Data replication is an important topic in distributed database systems. By replicating data across sites, transactions gain fast access to local copies. The thesis analyzes and compares “eager” and “lazy” replication methods. “Eager” replica control provides data consistency in a straightforward way. Furthermore, with an “update everywhere” approach, all types of transactions can be submitted at any site without restrictions. Despite these characteristics, “eager update everywhere” replication is rarely used in commercial systems, since existing solutions have severe disadvantages in terms of performance and complexity. Next, two concurrency control algorithms are presented and compared: Two-Phase Locking (2PL) and Basic Timestamp Ordering (BTO). Database deadlock is also analyzed, together with workable methods for its detection and resolution.
Methods for preserving the consistency of a distributed database are presented through the Two-Phase Commit protocol (2PC), which is analyzed for its strengths and weaknesses. A similar analysis is given for other consistency-preserving protocols: PrC, PrA, PrAny, DPr2PC and 3PC, as well as for a newer addition to the family, the PROMPT protocol. The Early Prepare (EP) protocol is presented and compared with PROMPT. The PEP protocol, which was developed as a combination of the two (EP and PROMPT), is presented as well. Finally, the recently developed 1-2PC protocol is analyzed. This protocol is interesting because of its dual mode of operation: it can function as either a one-phase or a two-phase protocol. |
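
For readers unfamiliar with the two-phase commit idea named in the abstract, the following is a minimal illustrative sketch and is not taken from the thesis. The names `Vote`, `Participant`, and `two_phase_commit` are hypothetical. The sketch assumes reliable, synchronous participants and deliberately omits logging, timeouts, and failure recovery, which are the areas where the protocol variants listed above differ.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical participant: votes in phase 1, applies the decision in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: a participant that votes YES enters the prepared state
        # and must then wait for the coordinator's decision.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def finish(self, decision):
        # Phase 2: apply the global decision ("committed" or "aborted").
        self.state = decision


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant voted YES."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = "committed" if all(v is Vote.YES for v in votes) else "aborted"
    # Phase 2: send the same decision to every participant.
    for p in participants:
        p.finish(decision)
    return decision


if __name__ == "__main__":
    sites = [Participant("site_a"), Participant("site_b", can_commit=False)]
    print(two_phase_commit(sites), [p.state for p in sites])
```

A single NO vote forces a global abort, so all sites end in the same state; that all-or-nothing outcome is the consistency guarantee the abstract attributes to 2PC.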