All modern technologies have one thing in common: they generate large amounts of data, whether they are messaging applications, IoT devices, or machine learning and AI algorithms. As data volumes grow, scalability becomes crucial for the applications that depend on that data. Database scalability means adding or removing resources as application demand changes. The underlying problem is how to query and retrieve the relevant data from among billions of records.
In today’s topic we will learn about Database Sharding and Database Partitioning, their pros and cons, and how they differ from each other.
Database Sharding
Sharding is a specific type of horizontal partitioning that distributes data across several independent databases known as shards. Each shard is a self-contained unit that holds a subset of the data. Sharding is useful for large databases behind high-traffic applications because it spreads both data storage and query load across multiple systems.
Sharding splits data horizontally across databases using a shard key. The shard key, for example a user ID or a geographic location, determines which database holds which portion of the data. A shard function applies an algorithm, such as a hash or a range lookup, to the shard key to pick the shard for each record.
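The role of the shard key and shard function can be illustrated with a minimal sketch. It assumes hash-based routing on a user ID; the shard names and the helpers `shard_for` and `group_by_shard` are hypothetical, and real systems may use range or directory-based schemes instead.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to database connections.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]


def shard_for(user_id, shards=SHARDS):
    """Shard function: map a shard key (here a user ID) to one shard."""
    # hashlib produces the same digest in every process, so routing stays stable.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(user_ids, shards=SHARDS):
    """Show which shard would hold each user ID."""
    placement = {name: [] for name in shards}
    for uid in user_ids:
        placement[shard_for(uid, shards)].append(uid)
    return placement


if __name__ == "__main__":
    for name, ids in group_by_shard(range(1, 21)).items():
        print(name, ids)
```

The same user ID always maps to the same shard, so the application knows where to read and write a given record. Note that a plain modulo scheme like this one remaps most keys when the number of shards changes, which is one reason a sharding scheme is hard to change later.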
Advantages and Disadvantages of Database Sharding
PROS
- Scalability improves for both read and write operations
- Smaller data sets per shard improve performance
- Availability is high because the failure of one shard does not impact the others
- Commodity hardware becomes a viable choice instead of high-end servers
CONS
- Added database management and application logic make sharding complex to implement and manage
- Joins and transactions across shards can be challenging to execute
- Changing the sharding scheme is difficult once it is implemented
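To see why cross-shard operations cost more than single-shard ones, consider this small sketch. It assumes two in-memory "shards" of order records and a hypothetical modulo-based `shard_for` function; none of the names come from a real library.

```python
# Two hypothetical shards, each holding the orders of a subset of users.
shards = {
    "shard_0": [{"user_id": 2, "amount": 30}, {"user_id": 4, "amount": 10}],
    "shard_1": [{"user_id": 1, "amount": 25}, {"user_id": 3, "amount": 5}],
}


def shard_for(user_id):
    """Hypothetical shard function: even IDs on shard_0, odd IDs on shard_1."""
    return f"shard_{user_id % 2}"


def orders_for_user(user_id):
    """Single-shard query: the shard key tells us exactly where to look."""
    rows = shards[shard_for(user_id)]
    return [row for row in rows if row["user_id"] == user_id]


def total_revenue():
    """Cross-shard query: ask every shard for a partial result, then combine."""
    partial_sums = [sum(row["amount"] for row in rows) for rows in shards.values()]
    return sum(partial_sums)


if __name__ == "__main__":
    print(orders_for_user(3))
    print(total_revenue())
```

A lookup by shard key touches one shard, while the aggregate has to contact every shard and merge the partial results in the application. Joins and transactions that span shards need the same kind of coordination.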
Database Partitioning
Database partitioning, as the name suggests, means dividing a database into smaller pieces. Each partition is a subset of the data that meets a specific criterion. Partitioning improves performance, eases management, and spreads access load across smaller storage units.
A database can be split vertically or horizontally. In vertical partitioning each partition holds the same rows but different columns, whereas in horizontal partitioning each partition has the same columns (schema) but different rows. In horizontal partitioning the schema of one table is replicated across multiple smaller tables. Vertical partitioning divides both the table schema and the data by column.
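The difference between the two directions can be sketched on a small in-memory table. The sample rows and the helpers `horizontal_split` and `vertical_split` are made up for illustration; a real database would do this with its own partitioning features.

```python
# A hypothetical "users" table represented as a list of rows.
users = [
    {"id": 1, "name": "Asha", "country": "IN", "bio": "Network engineer"},
    {"id": 2, "name": "Ben", "country": "US", "bio": "Database admin"},
    {"id": 3, "name": "Chen", "country": "CN", "bio": "Backend developer"},
    {"id": 4, "name": "Dana", "country": "DE", "bio": "Data analyst"},
]


def horizontal_split(rows, boundary):
    """Same columns in both parts, different rows (split on an ID range)."""
    low = [row for row in rows if row["id"] <= boundary]
    high = [row for row in rows if row["id"] > boundary]
    return low, high


def vertical_split(rows, key, columns):
    """Same rows in both parts, different columns; the key is kept in both."""
    narrow = [{key: row[key], **{c: row[c] for c in columns}} for row in rows]
    rest = [
        {k: v for k, v in row.items() if k == key or k not in columns}
        for row in rows
    ]
    return narrow, rest


if __name__ == "__main__":
    low, high = horizontal_split(users, boundary=2)
    core, details = vertical_split(users, key="id", columns=["name"])
    print(low, high, core, details, sep="\n")
```

In the vertical split the key column appears in both parts so that the original rows can be rebuilt with a join on `id`.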
Advantages and Disadvantages of Database Partitioning
PROS
- Partition pruning improves query performance
- Data management becomes easier, for example archiving old partitions
- Availability is enhanced because individual partitions can be taken offline without impacting the others
- Scalability is better for certain query types
CONS
- Design complexity and maintenance overhead increase
- Data can be distributed unevenly if partitioning is not implemented properly
- Querying across several partitions can be challenging
- Certain queries, such as joins across partitions, can perform worse
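Partition pruning and archival of old partitions, both mentioned in the lists above, can be sketched with range partitioning by month. The `partitions` dictionary, the sample rows, and the functions `query` and `archive_before` are assumptions made for this example, not the API of any database.

```python
from datetime import date

# Hypothetical monthly partitions of an "events" table, keyed by (year, month).
partitions = {
    (2024, 1): [{"day": date(2024, 1, 5), "value": 10},
                {"day": date(2024, 1, 20), "value": 7}],
    (2024, 2): [{"day": date(2024, 2, 3), "value": 4}],
    (2024, 3): [{"day": date(2024, 3, 9), "value": 12}],
}


def query(start, end):
    """Return rows with start <= day <= end and the partitions that were scanned."""
    rows, scanned = [], []
    for key in sorted(partitions):
        # Pruning: skip partitions whose month lies entirely outside the range.
        if key < (start.year, start.month) or key > (end.year, end.month):
            continue
        scanned.append(key)
        rows.extend(r for r in partitions[key] if start <= r["day"] <= end)
    return rows, scanned


def archive_before(cutoff):
    """Detach every partition older than the cutoff (year, month) and return it."""
    return {key: partitions.pop(key) for key in sorted(partitions) if key < cutoff}


if __name__ == "__main__":
    rows, scanned = query(date(2024, 2, 1), date(2024, 2, 29))
    print(rows, scanned)
    print(archive_before((2024, 2)))
```

A query limited to one month scans a single partition instead of the whole table, and archiving removes whole partitions at once rather than deleting rows one by one. A query whose range covers many months still has to scan many partitions, which is the cost noted in the cons list.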
Comparison: Database Sharding vs Database Partitioning
Parameter | Database Sharding | Database Partitioning |
Definition | Sharding involves multiple database servers. Data is partitioned and distributed across those servers. | Partitioning divides data into smaller, manageable chunks within the same server. |
Features | Meant to handle large volumes of reads and writes; cross-shard queries are expensive | Improved query performance for large data sets |
Complexity | Complex to implement, with added operational complexity | Less complex to implement than sharding |
When to use | * To reduce index size for better performance * To distribute a database across a large number of machines for better performance * To segment data by geography because of regulatory requirements and cross-border data transfer restrictions * For applications that require horizontal scaling, where a large number of read/write requests must be processed * For large-scale applications such as e-commerce platforms | * Tables larger than about 2 GB are generally considered good candidates for partitioning * Tables holding historical data, where new data is added to a new partition * Table data that must be distributed across a variety of storage devices * Mostly used to improve SQL query performance * To simplify maintenance tasks such as backups and indexing in large databases |
ABOUT THE AUTHOR
– Rashmi Bhardwaj (Author/Editor)