Distributed Database
A distributed database is a database stored on multiple storage devices that are neither attached to one another nor to a common central processing unit. It is a collection of data whose parts are under the control of separate DBMSs running on different computers. These computers are connected through a communication network. The sites do not share memory; they coordinate only by communicating with one another.
Types of distributed database
Distributed databases are of two types.
(a) Homogeneous distributed database: In a homogeneous distributed database, all sites have identical database management system software, are aware of one another, and agree to cooperate in processing users' requests. In such a system, local sites surrender a portion of their autonomy in terms of their right to change schemas or database management system software.
(b) Heterogeneous distributed database: In a heterogeneous distributed database, different sites may use different schemas and different database management system software. The sites may not be aware of one another, and they may provide only limited facilities for cooperation in transaction processing.
Data Replication
If a relation r is replicated, a copy of r is stored at two or more sites. In the most extreme case, we have full replication, in which a copy is stored at every site in the system.
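The definitions above can be illustrated with a minimal sketch. The site names, the relations s and t, and the `placement` table are hypothetical and exist only for this example.

```python
# Hypothetical site names and placement, chosen only for illustration.
SITES = {"site_A", "site_B", "site_C"}

# Which sites hold a copy of each relation.
placement = {
    "r": {"site_A", "site_B", "site_C"},  # copy at every site
    "s": {"site_A", "site_C"},            # copy at two sites
    "t": {"site_B"},                      # single copy, not replicated
}

def is_replicated(relation):
    """A relation is replicated if it is stored at two or more sites."""
    return len(placement[relation]) >= 2

def is_fully_replicated(relation):
    """Full replication: a copy is stored at every site in the system."""
    return placement[relation] == SITES

for name in sorted(placement):
    print(name, is_replicated(name), is_fully_replicated(name))
```

Here r is fully replicated, s is replicated but not fully, and t is not replicated at all.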
Merits of Distributed Database
The merits of a distributed database are:
(a) A distributed DBMS allows each site to store and maintain its own database, which gives fast and efficient access to local data.
(b) It allows users to access data stored at remote sites, while each site retains control over its own local data.
(c) If one sub-system fails for any reason, the whole system does not go down, because the other sites on the network can continue functioning.
(d) New sub-systems can be added to the system at any time with little or no effort.
(e) If a user needs data from multiple sub-systems, the query can be subdivided into sub-queries that are evaluated in parallel.
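Point (e) can be sketched as follows. This is a toy illustration, not how any particular DBMS works: the sites are simulated as in-memory lists, and `SITE_DATA`, `sub_query`, and `global_query` are hypothetical names. Threads stand in for sites working at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of one "orders" relation, one list of amounts per site.
SITE_DATA = {
    "site_A": [120, 40, 75],
    "site_B": [10, 300],
    "site_C": [55, 65, 5, 90],
}

def sub_query(site, minimum):
    """Sub-query run at one site: sum of local order amounts >= minimum."""
    return sum(amount for amount in SITE_DATA[site] if amount >= minimum)

def global_query(minimum):
    """Split the query into one sub-query per site, run the sub-queries
    in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(SITE_DATA)) as pool:
        partials = pool.map(lambda site: sub_query(site, minimum), SITE_DATA)
        return sum(partials)

print(global_query(50))  # total of all order amounts of at least 50
```

Each site returns only a partial sum, so little data crosses the network; the final combination step is cheap.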
Demerits of distributed database systems
The demerits of distributed database systems are:
(a) Complex software is required for a distributed database environment.
(b) The various sites must exchange messages and perform additional calculations to ensure proper coordination among the sites.
(c) A by-product of the increased complexity and need for coordination is the additional exposure to improper updating and other problems of data integrity.
(d) If the data is not distributed properly according to its usage, or if queries are not formulated correctly, responses to requests for data can be extremely slow.
How might a distributed database designed for a local area network differ from one designed for a wide area network?
Data transfer on a local area network is much faster than on a wide area network. Replication and fragmentation therefore do not improve throughput and response time on a LAN as much as they do on a WAN. Even on a LAN, however, replication is useful for increasing reliability.
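A rough calculation shows why a local replica saves more on a WAN. The link speeds, latencies, and relation size below are assumed figures chosen for illustration, not measurements, and the cost model (latency plus transmission time) is a simplification.

```python
def transfer_seconds(size_bytes, bandwidth_bits_per_s, latency_s):
    """Rough cost of one remote read: latency plus transmission time."""
    return latency_s + (size_bytes * 8) / bandwidth_bits_per_s

RELATION_BYTES = 100 * 10**6  # hypothetical 100 MB relation

# Assumed link figures, for illustration only.
lan = transfer_seconds(RELATION_BYTES, 1 * 10**9, 0.0005)  # 1 Gbit/s, 0.5 ms
wan = transfer_seconds(RELATION_BYTES, 50 * 10**6, 0.050)  # 50 Mbit/s, 50 ms

# A local replica avoids this cost on every read of the relation.
print(f"LAN remote read: {lan:.2f} s")
print(f"WAN remote read: {wan:.2f} s")
```

Under these assumptions, a remote read over the WAN costs many times more than over the LAN, so keeping a replica near the reader pays off far more on a WAN.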
Types of failure in a distributed system
The types of failure that can occur in a distributed system include:
(a) Computer failure
(b) Disk failure
(c) Communication failure
Difference between data replication in a distributed system and the maintenance of a remote backup site
In a remote backup system, all transactions are performed at the primary site, and the data is replicated at the remote backup site. The backup site is kept synchronized with the primary by sending it all log records. When the primary site fails, the remote backup site takes over processing. A distributed system offers greater availability by keeping multiple copies of the data at different sites, whereas a remote backup system offers lower availability but at lower cost and execution overhead. In a distributed system, transaction code runs at all the sites; in a remote backup system, it runs only at the primary site. Distributed transactions use two-phase commit to keep the data in a consistent state, whereas a remote backup system does not use two-phase commit and so avoids the related overhead.
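The two-phase commit mentioned above can be sketched in a few lines. This is a simplified illustration only: the `Participant` class and `two_phase_commit` function are hypothetical names, and logging, timeouts, and recovery from coordinator or site failure are all left out.

```python
class Participant:
    """One site taking part in a distributed transaction (hypothetical class)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local part of the transaction can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere only if every site votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:                   # phase 2: global commit
            p.commit()
        return "committed"
    for p in participants:                       # phase 2: global abort
        p.abort()
    return "aborted"


sites = [Participant("site_A"), Participant("site_B", can_commit=False)]
print(two_phase_commit(sites))  # one "no" vote aborts the whole transaction
```

The two rounds of messages between the coordinator and every site are the overhead that a remote backup system avoids.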
Difference between parallel and distributed database
In a distributed database, data is logically divided among the divisions, departments, and projects of an organisation and physically distributed across its offices and factories, where each unit maintains its own data. Sharing data over a network in this way, to improve the efficiency of data access, is what a distributed database system provides. A parallel database system, by contrast, exploits the fact that relational queries lend themselves to parallel execution, including pipelined parallelism, which improves the overall performance of the database system. A parallel database aims at a linear increase in performance as processing and storage power are added, whereas sharing data is the key goal of a distributed database.
Pros and Cons of Data replication
There are a number of pros and cons of replication.
(a) Availability: If one of the sites containing relation r fails, the relation can still be found at another site. Thus, the system can continue to process queries involving r despite the failure of one site.
(b) Increased parallelism: When the majority of accesses to relation r only read the relation, several sites can process queries involving r in parallel.
(c) Increased overhead on update: The system must ensure that all replicas of relation r are consistent, so every update must be propagated to all sites containing replicas. The result is increased overhead.
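The trade-off in points (a) and (c) can be seen in a toy model. It assumes a read-one, write-all scheme, which is only one possible way to keep replicas consistent; `ReplicatedRelation` and its methods are hypothetical names.

```python
class ReplicatedRelation:
    """Toy replicated relation r using read-one, write-all (an assumption)."""

    def __init__(self, sites):
        self.replicas = {site: {} for site in sites}  # one copy of r per site
        self.up = set(sites)                          # sites currently available
        self.update_messages = 0                      # counts propagation work

    def fail(self, site):
        self.up.discard(site)

    def read(self, key):
        # Availability: any one live replica can answer the query.
        if not self.up:
            raise RuntimeError("no replica of r is available")
        site = min(self.up)
        return self.replicas[site].get(key)

    def write(self, key, value):
        # Simplification: strict write-all needs every replica to be reachable.
        if self.up != set(self.replicas):
            raise RuntimeError("write-all requires every replica to be up")
        # Update overhead: the change is sent to every replica.
        for copy in self.replicas.values():
            copy[key] = value
            self.update_messages += 1


r = ReplicatedRelation(["site_A", "site_B", "site_C"])
r.write("account_1", 500)      # one logical update, three physical updates
r.fail("site_A")
print(r.read("account_1"))     # still answered by a surviving replica
print(r.update_messages)
```

One logical write costs one message per replica, while a read survives the loss of a site as long as one copy remains.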