Distributed databases are used to meet growing workload demands without requiring changes to the database application or vertical scaling of individual machines.
Distributed databases address problems with throughput, latency, scalability, availability, and fault tolerance that can arise when relying on a single machine and a single database.
In this article, you’ll learn what distributed databases are, along with their advantages and disadvantages.
What Is a Distributed Database?
A distributed database consists of multiple interconnected databases spread across several locations and linked by a network. Because the databases are connected, users see them as a single database.
Distributed databases run on multiple nodes. Together, the nodes form a distributed system that scales horizontally. Adding more nodes to the system increases computational power and availability and removes the single point of failure.
The processing demands are split among processors on various database nodes, and multiple components of the distributed database are physically stored in multiple places.
A centralized distributed database management system (DDBMS) manages the distributed data as if it were stored in a single physical location. The DDBMS synchronizes all data transactions between the databases and ensures that changes made to one database are automatically reflected in the databases at other locations.
Features of Distributed Databases
The following are some common traits of distributed databases:
- Location independence – Data is physically stored at various locations and managed by a separate DDBMS.
- Distributed query processing – Distributed databases answer queries in a distributed environment that manages data at multiple sites. High-level queries are converted into a query execution plan for easier management.
- Distributed transaction management – Commit protocols, distributed concurrency control techniques, and distributed recovery methods keep the distributed database consistent in the presence of many concurrent transactions and failures.
- Seamless integration – A collection of interconnected databases represents a single logical database.
- Network linking – The databases in the collection are connected by a network and communicate with one another.
- Transaction processing – Distributed databases incorporate transaction processing, in which a program groups one or more database operations into a single unit.
A transaction is atomic: it is either executed completely or not at all.
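The commit protocols and the all-or-nothing behavior described above can be illustrated with a simplified two-phase commit. This is a minimal in-memory sketch, not how any particular DDBMS implements it. The `Participant` class and the `two_phase_commit` function are hypothetical names introduced for this example, and the sketch assumes no network failures or timeouts.

```python
class Participant:
    """A hypothetical database site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates whether the site can apply the change
        self.data = {}                # committed state
        self.pending = None           # staged change awaiting the coordinator's decision

    def prepare(self, change):
        # Phase 1: stage the change and vote yes (True) or no (False).
        if not self.can_commit:
            return False
        self.pending = dict(change)
        return True

    def commit(self):
        # Phase 2 (commit): apply the staged change to the committed state.
        if self.pending is not None:
            self.data.update(self.pending)
            self.pending = None

    def rollback(self):
        # Phase 2 (abort): discard the staged change.
        self.pending = None


def two_phase_commit(participants, change):
    """Apply `change` at every site or at none of them."""
    votes = [p.prepare(change) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


# Every site votes yes, so the change is committed everywhere.
sites = [Participant("site_a"), Participant("site_b")]
ok = two_phase_commit(sites, {"balance": 100})

# One site votes no, so the change is applied nowhere.
failing = [Participant("site_a"), Participant("site_c", can_commit=False)]
aborted = two_phase_commit(failing, {"balance": 100})
```

In the second call, `site_a` stages the change but discards it after `site_c` votes no, so neither site ends up with a partial update.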
Types of Distributed Databases
Two categories of distributed databases exist:
- Homogeneous
- Heterogeneous
1: Homogeneous
A network of identical databases kept at several locations makes up a homogeneous distributed database. The sites are easily controllable because they share an operating system, DDBMS, and data structure.
Homogeneous databases let users easily access data from each database.
A homogeneous database is exemplified in the diagram below:
2: Heterogeneous
A heterogeneous distributed database uses different operating systems, DDBMS, and data models across its sites.
In a heterogeneous distributed database, a given site may be completely unaware of the other sites, which limits cooperation when handling user requests. Because of this limitation, translations are needed to establish communication between sites.
A heterogeneous database is exemplified in the diagram below:
Distributed Database Storage
There are two approaches to handling distributed database storage:
- Replication
- Fragmentation
1: Replication
Systems that use database replication keep copies of the data at many locations. A database is fully redundant if every single record is accessible from numerous locations.
Database replication has the benefit of increasing data accessibility across locations and enabling the handling of concurrent query demands.
With database replication, data must be updated frequently and synchronized with other sites to maintain an exact copy of the database. Changes made at one site must be propagated to the other sites to avoid inconsistencies.
Constant updates burden the servers and make concurrency control more difficult, because many concurrent queries must be checked across all available sites.
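The trade-off described above can be illustrated with a small sketch of synchronous full replication, where every write is applied to all copies and a read can be served by any one of them. The `ReplicatedStore` class and the site names are hypothetical, and the model assumes no failures and no concurrent writers.

```python
class ReplicatedStore:
    """Hypothetical fully replicated key-value store (in-memory model)."""

    def __init__(self, site_names):
        # One complete copy of the data per site.
        self.replicas = {name: {} for name in site_names}
        self.writes_applied = 0  # counts per-replica write operations

    def write(self, key, value):
        # A write must reach every replica to keep the copies identical.
        for copy in self.replicas.values():
            copy[key] = value
            self.writes_applied += 1

    def read(self, key, site):
        # A read can be served locally by any single replica.
        return self.replicas[site][key]


store = ReplicatedStore(["paris", "tokyo", "austin"])
store.write("user:1", "Ada")
store.write("user:2", "Grace")
```

With three sites, the two logical writes result in six per-replica writes, which is the synchronization overhead the text refers to. Reads, in contrast, touch only one copy.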
2: Fragmentation
With fragmentation, relations are divided into smaller parts called fragments. Each fragment is stored at the location where it is needed.
Fragmentation requires that the fragments can later be reassembled into the original relation without losing data.
Fragmentation avoids data inconsistency because it does not create copies of the data.
Two different types of fragmentation exist:
- Horizontal fragmentation – The relation is divided into groups of rows (tuples), and each group is assigned to one fragment.
- Vertical fragmentation – The relation schema is divided into smaller schemas. Each fragment contains a common candidate key to ensure a lossless join.
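Both kinds of fragmentation, and the requirement that fragments can be reassembled without loss, can be sketched on a toy relation. The `employees` data, the column names, and the helper functions are assumptions made for illustration. Here `id` plays the role of the common candidate key.

```python
# A relation modeled as a list of rows; `id` is the candidate key.
employees = [
    {"id": 1, "name": "Ada", "city": "London", "salary": 120},
    {"id": 2, "name": "Grace", "city": "New York", "salary": 130},
    {"id": 3, "name": "Alan", "city": "London", "salary": 110},
]


def horizontal_fragments(rows, attribute):
    """Group whole rows by the value of one attribute."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments


def vertical_fragments(rows, key, column_groups):
    """Split columns into groups; every fragment keeps the key column."""
    return [
        [{key: row[key], **{c: row[c] for c in cols}} for row in rows]
        for cols in column_groups
    ]


def rebuild_horizontal(fragments):
    # Reconstruction is the union of the row groups.
    merged = [row for frag in fragments.values() for row in frag]
    return sorted(merged, key=lambda r: r["id"])


def rebuild_vertical(fragments, key):
    # Reconstruction is a join of the fragments on the shared key.
    joined = {}
    for frag in fragments:
        for row in frag:
            joined.setdefault(row[key], {}).update(row)
    return [joined[k] for k in sorted(joined)]


by_city = horizontal_fragments(employees, "city")
by_columns = vertical_fragments(employees, "id", [["name", "city"], ["salary"]])
```

Calling `rebuild_horizontal(by_city)` or `rebuild_vertical(by_columns, "id")` returns the original rows. The vertical join works only because each fragment carries the `id` key.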
Advantages and Disadvantages of Distributed Databases
The following are some significant benefits and drawbacks of distributed databases:
| Advantages | Disadvantages |
| --- | --- |
| Modular development | Costly software |
| Reliability | Large overhead |
| Lower communication costs | Data integrity |
| Better response | Improper data distribution |
The following sections describe these advantages and disadvantages in more detail.
Advantages
- Modular development
A distributed database that has been developed in a modular fashion can be expanded to new locations or units by adding new servers and data to the current configuration and seamlessly connecting them to the distributed system. Distributed databases continue to operate normally after this kind of expansion.
- Reliability
Distributed databases are more reliable than centralized ones. When a centralized database fails, the entire system stops. When a failure occurs in a distributed database, the system keeps running, although with reduced performance, until the problem is fixed.
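The reliability argument above can be sketched as a read that falls back to another site when the first one is down. This is a simplified model that assumes every site holds a copy of the requested data. The `Site` class and `read_with_failover` function are hypothetical names used only for this example.

```python
class Site:
    """Hypothetical database site holding a copy of the data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.online = True

    def get(self, key):
        if not self.online:
            raise ConnectionError(f"{self.name} is unavailable")
        return self.data[key]


def read_with_failover(sites, key):
    """Try each site in order and return the first successful answer."""
    attempts = 0
    for site in sites:
        attempts += 1
        try:
            return site.get(key), site.name, attempts
        except ConnectionError:
            continue  # try the next site
    raise ConnectionError("no site could serve the request")


records = {"order:7": "shipped"}
sites = [Site("primary", records), Site("backup", records)]

sites[0].online = False  # simulate a failure at the first site
value, served_by, attempts = read_with_failover(sites, "order:7")
```

The request still succeeds, but it needs a second attempt, which mirrors the reduced performance mentioned above. If every site is offline, the function raises `ConnectionError`, as a centralized system would on its single failure.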
- Lower communication costs
In distributed databases, local data storage lowers communication costs for data modification. Centralized databases do not allow for local data storage.
- Better response
When user requests are satisfied locally, a distributed database system with efficient data distribution offers a quicker response. All user queries in centralized databases are handled by the same central computer. Response times lengthen as a result, especially when there are many queries.
Disadvantages
- Costly software
A distributed database system often requires expensive software to provide data transparency and coordination across multiple sites.
- Large overhead
When database replication is used, operations across multiple sites require many computations and constant synchronization, which adds significant processing overhead.
- Data integrity
Data integrity is at risk when database replication is used, because the same data can be updated at several sites.
- Improper data distribution
Responsiveness to user requests depends heavily on effective data distribution. If data is not distributed properly across the sites, responsiveness may suffer.
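The effect of a poor placement rule can be sketched by counting how many keys each site would store under two strategies. The site names, the key format, and both placement functions are assumptions made for illustration, not a recommendation for any specific system.

```python
import hashlib
from collections import Counter

SITES = ["site_0", "site_1", "site_2"]  # hypothetical site names


def place_by_hash(key):
    """Spread keys using a stable hash of the whole key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return SITES[int(digest, 16) % len(SITES)]


def place_by_prefix(key):
    """A naive rule: route on the first character of the key."""
    return SITES[ord(key[0]) % len(SITES)]


def load_per_site(keys, placement):
    """Count how many keys each site would store."""
    counts = Counter({site: 0 for site in SITES})
    counts.update(placement(k) for k in keys)
    return counts


keys = [f"customer-{i}" for i in range(3000)]
hashed = load_per_site(keys, place_by_hash)
prefixed = load_per_site(keys, place_by_prefix)
```

Because every key starts with the same character, the prefix rule sends all 3000 keys to a single site, which then has to answer every request. The hash rule spreads the same keys roughly evenly across the three sites.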
Conclusion
You now know what distributed databases are and how they work.
Compared with centralized databases, distributed databases offer many advantages. After reading this article, you should be able to choose the database type that suits your needs.