To understand AWS Kinesis, we first need to understand what streaming data is.
Streaming data consists of records generated continuously by thousands of data sources, which typically send the records simultaneously and in small sizes. A few examples of real-time streaming data are:
- Purchases from online stores, e.g. Amazon.com.
- Stock Exchanges & Trading Systems where the stock prices fluctuate in real time.
- Online Gaming
- Social networking data, e.g. activity on facebook.com, Instagram.com, etc.
- Geospatial data, e.g. the data used by Uber & OLA to track cab locations in real time.
Amazon Kinesis is a service hosted on AWS Cloud which acts as the landing point for all your streaming data. It is used to capture, store, and process data from large, distributed streams such as event logs and social media feeds. After processing the data, Kinesis distributes it to multiple consumers simultaneously.
TYPES OF AWS KINESIS SERVICES:
There are three types of Kinesis services:
- Kinesis Streams
- Kinesis Firehose
- Kinesis Analytics
A Kinesis stream consists of one or more shards:
- Each shard supports up to 5 read transactions per second, up to a maximum total read rate of 2 MB per second, and up to 1,000 records per second for writes, up to a maximum total write rate of 1 MB per second.
- The data capacity of your stream is a function of the number of shards that you specify for the data stream. The total capacity of the Kinesis stream is the sum of the capacities of all its shards.
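To make the shard arithmetic concrete, here is a minimal sketch that sums the per-shard limits quoted above and estimates how many shards a workload would need. The function names and the sample workload are made up for illustration; only the per-shard limits come from the text.

```python
import math

# Per-shard limits quoted in the list above.
WRITE_MB_PER_SEC = 1
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2
READ_TRANSACTIONS_PER_SEC = 5


def stream_capacity(shard_count):
    """Total stream capacity is the sum of the capacities of its shards."""
    return {
        "write_mb_per_sec": shard_count * WRITE_MB_PER_SEC,
        "write_records_per_sec": shard_count * WRITE_RECORDS_PER_SEC,
        "read_mb_per_sec": shard_count * READ_MB_PER_SEC,
        "read_transactions_per_sec": shard_count * READ_TRANSACTIONS_PER_SEC,
    }


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Smallest shard count that satisfies the write and read limits at once."""
    return max(
        1,
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(read_mb_per_sec / READ_MB_PER_SEC),
    )


# Hypothetical workload: 3.5 MB/s in, 2,500 records/s in, 7 MB/s out.
print(stream_capacity(4))
print(shards_needed(write_mb_per_sec=3.5, records_per_sec=2500, read_mb_per_sec=7))
```

The shard count is driven by whichever limit is hit first, which is why the sketch takes the maximum of the three estimates.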
In Kinesis Streams, data persists for 24 hours by default, and retention can be extended up to 7 days.
Producers generate the data and send it to the Kinesis stream, where it is stored in the shards. By default, data is retained in the shards for 24 hours, which can be increased up to 7 days. Once the data is in the shards, consumers, typically EC2 instances, read it from the shards and turn it into useful data. Once the consumers have finished their calculations, the useful data is moved to other AWS services, e.g. DynamoDB, S3, and Redshift.
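The producer and consumer roles described above could look roughly like the sketch below. It assumes `client` is a boto3 Kinesis client (created with `boto3.client("kinesis")`) and that the stream already exists. The function names, the JSON encoding, and reading a single batch from one shard are illustrative choices, not the only way to do it.

```python
import json


def send_event(client, stream_name, event, partition_key):
    """Producer side: write one JSON-encoded event to the stream.

    Records that share a partition key are routed to the same shard.
    """
    response = client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )
    return response["ShardId"], response["SequenceNumber"]


def read_shard(client, stream_name, shard_id, limit=100):
    """Consumer side: read one batch, starting at the oldest retained record."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # oldest record still in the shard
    )["ShardIterator"]
    batch = client.get_records(ShardIterator=iterator, Limit=limit)
    # Each record's Data field holds the raw bytes the producer sent.
    return [json.loads(record["Data"]) for record in batch["Records"]]
```

A real consumer would keep polling with the `NextShardIterator` value returned by `get_records` and would read every shard of the stream, not just one.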
Kinesis Firehose is a service used for delivering streaming data to destinations such as Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service.
One major difference between Kinesis Streams and Kinesis Firehose is that in Kinesis Streams you have to manage the resources (such as shards) manually, while in Kinesis Firehose resource management is fully automated. Firehose can optionally use a Lambda function to transform the data before delivery.
Kinesis Firehose delivers data only to a fixed set of destinations such as S3, Redshift, and Elasticsearch, while consumers of a Kinesis stream can send data to many more AWS services. Unlike Kinesis Streams, Kinesis Firehose has no data retention, so the data is either analyzed or sent to S3 directly.
After analysis, the data received from the producer is sent either to S3 directly or to Redshift via S3.
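Since Firehose manages delivery for you, the producer side is the only code you write. Here is a minimal sketch, assuming `client` is a boto3 Firehose client (created with `boto3.client("firehose")`) and that a delivery stream with the given name already exists. The function name and the newline-delimited JSON format are illustrative choices.

```python
import json


def deliver_event(client, delivery_stream_name, event):
    """Send one JSON event to a Firehose delivery stream."""
    # A trailing newline keeps one JSON object per line in the
    # files that end up at the destination.
    payload = (json.dumps(event) + "\n").encode("utf-8")
    response = client.put_record(
        DeliveryStreamName=delivery_stream_name,
        Record={"Data": payload},
    )
    return response["RecordId"]
```

Note that there is no shard or partition key here: the destination is configured on the delivery stream itself, not in the producer code.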
Kinesis Analytics is a Kinesis service in which streaming data is processed and analyzed using standard SQL.
In Amazon Kinesis Analytics you can run SQL queries on the streaming data and store the results in S3, Redshift, or an Elasticsearch cluster.
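To give a feel for the kind of SQL aggregation this enables, the sketch below runs a plain SQL query over a handful of made-up trade events. It uses Python's built-in `sqlite3` purely as a local stand-in: it is not Kinesis Analytics, and the table name, column names, and data are all hypothetical.

```python
import sqlite3

# Made-up events standing in for records arriving on a stream.
events = [
    ("AMZN", 101.0),
    ("AMZN", 103.0),
    ("MSFT", 50.0),
    ("AMZN", 102.0),
    ("MSFT", 54.0),
]


def average_price_per_ticker(rows):
    """Count the trades and average the price for each ticker using SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (ticker TEXT, price REAL)")
    conn.executemany("INSERT INTO trades VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT ticker, COUNT(*), AVG(price) FROM trades "
        "GROUP BY ticker ORDER BY ticker"
    ).fetchall()
    conn.close()
    return result


print(average_price_per_ticker(events))
```

In Kinesis Analytics a query like this would run continuously over the incoming stream rather than once over a fixed table.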
AWS Kinesis Pricing
Kinesis Data Streams follows a pay-as-you-go model. Billing is straightforward, with no upfront costs or minimum fees: you pay only for the resources you use. Pricing is based on two main parameters:
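- Shard Hour: a shard is the base throughput unit of an AWS Kinesis data stream, and you are charged for each shard per hour.
- PUT Payload Unit: each record written is counted in 25 KB payload chunks. For example, a 5 KB record counts as one PUT Payload Unit and a 45 KB record counts as two.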
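The two parameters above can be combined into a rough cost estimate. The sketch below is only an illustration: the function names are made up, 25 KB is taken as 25 * 1024 bytes, PUT Payload Units are assumed to be priced per million, and both prices are arguments you must fill in from the current AWS pricing page, since none are hard-coded here.

```python
import math

PAYLOAD_UNIT_BYTES = 25 * 1024  # assumption: one PUT Payload Unit covers up to 25 KB


def put_payload_units(record_size_bytes):
    """Number of 25 KB chunks needed to cover one record."""
    return max(1, math.ceil(record_size_bytes / PAYLOAD_UNIT_BYTES))


def estimated_cost(shards, records_per_sec, record_size_bytes,
                   price_per_shard_hour, price_per_million_units, hours=730):
    """Shard-hour cost plus PUT Payload Unit cost over the given number of hours."""
    shard_cost = shards * hours * price_per_shard_hour
    units = put_payload_units(record_size_bytes) * records_per_sec * 3600 * hours
    return shard_cost + units / 1_000_000 * price_per_million_units


print(put_payload_units(5 * 1024))   # a 5 KB record
print(put_payload_units(45 * 1024))  # a 45 KB record
```

The default of 730 hours approximates one month. Optional features are not included in this estimate.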
Other optional charges apply for:
- Enhanced fan-out
- Extended data retention
- Long-term data retention
AWS also provides a Pricing Calculator that helps estimate the cost associated with a service (in this case AWS Kinesis).
Here is the link to AWS Pricing Calculator