What is Database Sharding? Database Sharding vs Database Partitioning

Rashmi Bhardwaj | Blog,Storage

Modern technologies have one thing in common: they generate large amounts of data, whether from messaging applications, IoT devices, machine learning, or AI algorithms. As data grows, scalability becomes crucial for the applications that use it. Database scalability means adding or removing resources as application demand changes. The underlying problem is how to query and retrieve the relevant data from among billions of records.

In this article we will look at Database Sharding and Database Partitioning, their pros and cons, and how they differ from each other.

Database Sharding   

Sharding is a form of partitioning. Specifically, it is a type of horizontal partitioning that distributes data across several independent databases known as shards. Each shard is a self-contained unit that holds a subset of the data. Sharding is useful for large databases behind high-traffic applications, because it distributes both data storage and query load across multiple systems.


Sharding splits data horizontally across databases using a shard key. The shard key is an attribute of the data, such as user ID or geographic location, and it determines which database holds which portion of the data. A shard function applies an algorithm (for example, hashing) to the shard key to select the shard for each record.
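To make the shard key and shard function concrete, here is a minimal sketch in Python. It assumes hash-based sharding on a user ID; the shard names and the `shard_for` function are hypothetical and chosen only for illustration, not taken from any particular database.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to connection details.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]


def shard_for(shard_key: str, shards=SHARDS) -> str:
    """Shard function: hash the shard key, then map the hash onto one shard."""
    # Use a stable hash. Python's built-in hash() is randomized per process
    # for strings, so it would route the same key differently after a restart.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# The same shard key is always routed to the same shard.
for user_id in ["user-1001", "user-1002", "user-1003"]:
    print(user_id, "->", shard_for(user_id))
```

Hashing tends to spread keys evenly across shards. Sharding by geographic location would instead use a lookup from region to shard, which keeps related data together at the cost of a less even spread.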

Advantages and Disadvantages of Database Sharding 

PROS

  • Scalability improves for both read and write operations
  • Each shard holds a smaller data set, which improves query performance
  • Availability is high, as the failure of one shard does not impact the others
  • Commodity hardware becomes a viable choice instead of high-end servers

CONS

  • Database management and application logic become more complex to implement and maintain
  • Joins and transactions across shards can be challenging to execute
  • Changing the sharding scheme is difficult once it has been implemented
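The last point can be illustrated with a small sketch. It assumes a simple hash-modulo shard function (one common choice, not the only one) and counts how many keys would land on a different shard if the shard count changed from 4 to 5. The function names and the key format are hypothetical.

```python
import hashlib


def shard_index(key: str, shard_count: int) -> int:
    """Hash-modulo shard function: stable hash of the key, modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_index(k, old_count) != shard_index(k, new_count)
    )
    return moved / len(keys)


# Hypothetical user IDs used as shard keys.
keys = [f"user-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.0%} of keys change shard when going from 4 to 5 shards")
```

With hash-modulo routing, a key stays put only when its hash gives the same remainder under both shard counts, so most keys move (about four in five for a 4-to-5 change). Each moved key means copying rows between databases, which is why a sharding scheme is costly to change later.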


Database Partitioning 

Database partitioning, as the name suggests, means dividing a database into smaller parts. Each partition is a subset of the data that meets a specific criterion. Partitioning improves performance, simplifies management, and spreads the access load across smaller storage units.

Databases can be split vertically or horizontally. In a vertical split, each partition holds the same rows (identified by the same key) but different columns. In a horizontal split, each partition has the same columns, or schema, but different rows. In horizontal partitioning, the schema of one table is replicated across multiple smaller tables. Vertical partitioning divides both the table schema and the data.
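The difference between the two kinds of split can be sketched with plain Python data structures. The `users` table, its columns, and both helper functions below are hypothetical and only illustrate the idea; a real database does this at the storage level.

```python
# Hypothetical "users" table represented as a list of rows.
users = [
    {"id": 1, "name": "Asha", "country": "IN", "bio": "Writes about storage"},
    {"id": 2, "name": "Ben", "country": "US", "bio": "Database administrator"},
    {"id": 3, "name": "Chitra", "country": "IN", "bio": "Network engineer"},
]


def split_horizontally(rows, key):
    """Horizontal split: every partition has the same columns but different rows."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[key], []).append(row)
    return partitions


def split_vertically(rows, primary_key, column_groups):
    """Vertical split: every partition has the same rows but different columns.

    The primary key is repeated in each partition so rows can be joined back.
    """
    return [
        [{primary_key: row[primary_key], **{c: row[c] for c in cols}} for row in rows]
        for cols in column_groups
    ]


by_country = split_horizontally(users, "country")
core, details = split_vertically(users, "id", [["name", "country"], ["bio"]])

print(sorted(by_country))  # partition names of the horizontal split
print(core[0])             # first row of the frequently used columns
print(details[0])          # first row of the rarely used columns
```

Note that the vertical split repeats only the key column, not whole rows, so the original row can be rebuilt by joining the partitions on `id`.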

Advantages and Disadvantages of Database Partitioning 

PROS

  • Partition pruning improves query performance by skipping partitions that cannot contain matching rows
  • Data management is easier, for example when archiving old partitions
  • Availability improves, as individual partitions can be taken offline without impacting the others
  • Scalability is better for certain query types
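Partition pruning, mentioned in the first point above, can be sketched as follows. The example assumes an `orders` table that is range-partitioned by year; the data, the partition layout, and the `query_orders` function are all hypothetical, and a real database engine performs this step inside its query planner.

```python
from datetime import date

# Hypothetical range partitions of an "orders" table, keyed by (first_day, last_day).
partitions = {
    (date(2023, 1, 1), date(2023, 12, 31)): [
        {"id": 1, "order_date": date(2023, 5, 10)},
        {"id": 2, "order_date": date(2023, 11, 2)},
    ],
    (date(2024, 1, 1), date(2024, 12, 31)): [
        {"id": 3, "order_date": date(2024, 3, 15)},
        {"id": 4, "order_date": date(2024, 8, 20)},
    ],
    (date(2025, 1, 1), date(2025, 12, 31)): [
        {"id": 5, "order_date": date(2025, 1, 5)},
    ],
}


def query_orders(start, end):
    """Return orders in [start, end] and the number of partitions scanned."""
    scanned = 0
    results = []
    for (first_day, last_day), rows in partitions.items():
        # Pruning: skip any partition whose date range cannot overlap the query.
        if last_day < start or first_day > end:
            continue
        scanned += 1
        results.extend(r for r in rows if start <= r["order_date"] <= end)
    return results, scanned


results, scanned = query_orders(date(2024, 3, 1), date(2024, 3, 31))
print([r["id"] for r in results], "partitions scanned:", scanned)
```

A query restricted to March 2024 only needs to read the 2024 partition; the other two are skipped without being scanned. The same layout also makes archival simple, since an old year can be detached as a whole.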

CONS

  • Design becomes more complex and maintenance overhead increases
  • Data can be distributed unevenly if the partitioning criterion is poorly chosen
  • Querying across several partitions can be challenging
  • Some query types, such as joins across partitions, can perform worse

Comparison: Database Sharding vs Database Partitioning

Definition

  • Database Sharding: Data is distributed across multiple database servers, each holding one shard. Each shard can itself be replicated, for example with read replicas.
  • Database Partitioning: Data is divided into smaller, manageable chunks within the same server.

Features

  • Database Sharding: Meant to handle large volumes of reads and writes. Cross-shard queries are expensive.
  • Database Partitioning: Improved query performance for large data sets.

Complexity

  • Database Sharding: Complex to implement and to operate.
  • Database Partitioning: Less complex to implement than sharding.

When to use Database Sharding

  • To reduce index size for better performance
  • To distribute a database across a large number of machines for better performance
  • To segment data by geography because of regulatory requirements and cross-border data transfer restrictions
  • For applications that require horizontal scaling, where a large number of read/write requests must be processed
  • For large-scale applications such as e-commerce platforms

When to use Database Partitioning

  • For large tables; tables larger than 2 GB are commonly cited as good candidates for partitioning
  • For tables holding historical data, where new data is added to a new partition
  • When table data must be distributed across a variety of storage devices
  • To improve SQL query performance, which is the most common use
  • To simplify maintenance tasks such as backups and indexing in large databases

