Amazon Managed Streaming for Apache Kafka: Streaming messages from producer to consumer using Amazon MSK and creating an MSK event source using Lambda

Gargee Bhatnagar · Dec 11, 2021 · 9 min read

The challenge I faced was how to stream messages between client machines. Apache Kafka does exactly this, but when deploying the solution on the cloud we also need high availability, compatibility, and security. So I chose Amazon Managed Streaming for Apache Kafka (Amazon MSK), a highly available and secure AWS service. It is also inexpensive, charging only for broker hours and Kafka storage. Finally, MSK can be added as an event source (trigger) in Lambda to get log streams in CloudWatch.

Amazon MSK is a fully managed, highly available, and secure service that makes it easy for developers and DevOps managers to run applications on Apache Kafka in the AWS Cloud without needing Apache Kafka infrastructure management expertise. Amazon MSK operates highly available Apache Kafka clusters, provides security features out of the box, is fully compatible with open-source versions of Apache Kafka (allowing existing applications to migrate without code changes), and has built-in AWS integrations that accelerate application development. To learn more, read the Amazon MSK documentation.

Apache Kafka (Kafka) is an open-source platform that enables customers to capture streaming data like clickstream events, transactions, IoT events, application and machine logs, and have applications that perform real-time analytics, run continuous transformations, and distribute this data to data lakes and databases in real time. You can use Kafka as a streaming data store to decouple applications producing streaming data (producers) from those consuming streaming data (consumers).

AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes. With Lambda, you can run code for virtually any type of application or backend service, all with zero administration. Just upload your code as a ZIP file or container image, and Lambda automatically and precisely allocates compute execution power and runs your code based on the incoming request or event, for any scale of traffic. You can set up your code to trigger automatically from 140 AWS services or call it directly from any web or mobile app. You can write Lambda functions in your favorite language (Node.js, Python, Go, Java, and more) and use both serverless and container tools, such as AWS SAM or the Docker CLI, to build, test, and deploy your functions. To learn more, read the AWS Lambda documentation.
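For illustration, a minimal Python handler looks something like the sketch below; the event shape depends on whichever service triggers the function:

```python
import json

def lambda_handler(event, context):
    # Lambda calls this entry point with the triggering event as a dict;
    # anything printed here lands in the function's CloudWatch log stream.
    print(json.dumps(event))
    return {"statusCode": 200}
```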

In this post, you will learn how to stream messages from a producer to a consumer using Amazon MSK and how to create an MSK event source using Lambda. I used Managed Streaming for Apache Kafka to stream messages from producer to consumer, and created an MSK trigger (event source) in Lambda to capture message records as log streams in CloudWatch.

Prerequisites

You’ll need Amazon EC2 servers for this post. Getting started with Amazon EC2 provides instructions on how to launch an EC2 server.

You’ll also need a VPC with two public and two private subnets. Getting started with VPC provides instructions on how to create a VPC, private and public subnets, an internet gateway, a NAT gateway, and route tables. For this blog, I assume the two EC2 servers and the VPC are already created.

Architecture Overview

The architecture diagram shows the overall deployment architecture with the data flow, EC2 servers, Amazon VPC, Amazon Managed Streaming for Apache Kafka, AWS Lambda, and Amazon CloudWatch.

Solution overview

The blog post consists of the following phases:

  1. Create an MSK cluster in the custom VPC
  2. Install the Apache Kafka client libraries and tools on the client machines (EC2 servers) and create a topic
  3. Set up a Lambda function that uses the Amazon MSK cluster and topic as an event source, and create an IAM role with the required permissions
  4. Test output on the consumer server from the producer server, and check CloudWatch for the MSK log streams

I have a VPC and two EC2 servers as below →

Phase 1: Create an MSK cluster in the custom VPC

1. Open the Amazon MSK console and create a cluster named MSK-Cluster. Select the Kafka version and choose the MSK default configuration option. In the networking section, select the custom VPC created earlier, set the number of zones to 2, and choose the zones, their private subnets, and a security group.

2. In the broker section, select the broker type required and the number of brokers per zone. In storage, enter the EBS storage required. In the security section, select both the TLS and plaintext methods. Leave the other settings at their defaults, or customize them if required.

3. Once the cluster is created, you can view the client information (the bootstrap servers and the Apache ZooKeeper connection string) as well as the cluster configuration and broker details. The same setup is sketched in code below.
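As a minimal boto3 sketch of this phase (the region, Kafka version, subnet IDs, and security group ID below are placeholders for the values from the custom VPC):

```python
import boto3

kafka = boto3.client("kafka", region_name="us-east-1")  # region is an assumption

response = kafka.create_cluster(
    ClusterName="MSK-Cluster",
    KafkaVersion="2.8.1",                 # the version selected in the console
    NumberOfBrokerNodes=2,                # one broker in each of the 2 zones
    BrokerNodeGroupInfo={
        "InstanceType": "kafka.m5.large",
        "ClientSubnets": ["subnet-aaaa1111", "subnet-bbbb2222"],  # private subnets
        "SecurityGroups": ["sg-0123456789abcdef0"],
        "StorageInfo": {"EbsStorageInfo": {"VolumeSize": 100}},   # GiB per broker
    },
    # Phase 1 enables both TLS and plaintext between clients and brokers.
    EncryptionInfo={"EncryptionInTransit": {"ClientBroker": "TLS_PLAINTEXT"}},
)
print(response["ClusterArn"])
```

Once the cluster reaches the ACTIVE state, `kafka.get_bootstrap_brokers(ClusterArn=...)` returns the same bootstrap-server strings shown under the console's client information.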

Phase 2: Install the Apache Kafka client libraries and tools on the client machines (EC2 servers) and create a topic

1. Connect to the producer and consumer servers via SSH and install Java and the Kafka client libraries. Configure cacerts as the certificates on each server for the Kafka client. Then create a topic by specifying a topic name and replication factor, as sketched below.
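For illustration, here is a minimal kafka-python sketch of the topic-creation step; the bootstrap address and topic name are placeholders taken from the cluster's client information:

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Placeholder bootstrap address; port 9092 is the plaintext listener
# (9094 is the TLS listener), both enabled in Phase 1.
admin = KafkaAdminClient(
    bootstrap_servers="b-1.msk-cluster.abc123.kafka.us-east-1.amazonaws.com:9092"
)

# Topic name is illustrative; replication factor 2 matches the broker count,
# since the replication factor cannot exceed the number of brokers.
admin.create_topics(
    [NewTopic(name="MSKTopic", num_partitions=1, replication_factor=2)]
)
admin.close()
```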

Phase 3: Set up a Lambda function that uses the Amazon MSK cluster and topic as an event source, and create an IAM role with the required permissions

1. Create an IAM role with the required managed policies, then create a Lambda function named MSK-Function using that role. Add a trigger in Lambda with the cluster name and topic. Deploy code that writes the records to CloudWatch as log streams, and configure the function's VPC settings with the custom VPC, private subnets, and custom security group. A sketch of the handler and trigger follows.
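A minimal sketch of what the deployed function code can look like: for an MSK trigger, Lambda delivers batches under event["records"], keyed by topic and partition, with base64-encoded message values, and anything printed goes to the function's CloudWatch log stream:

```python
import base64

def lambda_handler(event, context):
    # event["records"] maps "topic-partition" keys to lists of records;
    # each record's value is base64-encoded by the MSK event source.
    for topic_partition, records in event["records"].items():
        for record in records:
            value = base64.b64decode(record["value"]).decode("utf-8")
            print(f"{topic_partition} offset={record['offset']}: {value}")
```

The console trigger from the step above can equivalently be created through the API; the cluster ARN below is a placeholder for the ARN of MSK-Cluster:

```python
import boto3

client = boto3.client("lambda")
client.create_event_source_mapping(
    EventSourceArn="arn:aws:kafka:us-east-1:111122223333:cluster/MSK-Cluster/...",
    FunctionName="MSK-Function",
    Topics=["MSKTopic"],          # the topic created in Phase 2
    StartingPosition="LATEST",
)
```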

Phase 4: Test output on the consumer server from the producer server, and check CloudWatch for the MSK log streams
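A minimal sketch of this test with kafka-python, assuming the placeholder bootstrap address and topic from the earlier phases; run the producer half on the producer server and the consumer half on the consumer server:

```python
from kafka import KafkaProducer, KafkaConsumer

# Placeholder bootstrap address and topic, as in Phase 2.
BOOTSTRAP = "b-1.msk-cluster.abc123.kafka.us-east-1.amazonaws.com:9092"

# On the producer server: send a test message to the topic.
producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
producer.send("MSKTopic", b"hello from the producer server")
producer.flush()

# On the consumer server: read the topic from the beginning.
consumer = KafkaConsumer(
    "MSKTopic",
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,   # stop iterating after 10 s with no messages
)
for message in consumer:
    print(message.value.decode("utf-8"))
```

Each message sent by the producer should appear on the consumer side, and the same records should show up in the MSK-Function log streams in CloudWatch.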

Clean-up

Delete the Amazon Managed Streaming for Apache Kafka cluster, the AWS Lambda function, the EC2 servers, the VPC, the IAM role, and the CloudWatch log group.

Pricing

Here I review the pricing and estimated cost of this example.

Cost of Managed Streaming for Apache Kafka →

USD 0.221 per broker-hour for Kafka.m5.large:RunBroker × 12.487 hours = $2.76

USD 0.114 per GB-month for Kafka.Storage.GP2:RunVolume × 1.678 GB-month = $0.19

Total = $2.76 + $0.19 = $2.95

Cost of EC2 = $0.43 for 17.455 hours + $0.01 for 0.122 GB-month = $0.44

Cost of Lambda = $0.0

Cost of Data Transfer = $0.0

Cost of Key Management Service = $0.0

Cost of Cloudwatch = $0.0

Cost of Simple Notification Service = $0.0

Total cost = $2.95 + $0.44 = $3.39

Summary

In this post, I showed how to stream messages from producer to consumer using Amazon MSK and how to create an MSK event source using Lambda.

For more details on Amazon Managed Streaming for Apache Kafka, check out Getting started with Amazon MSK and open the Amazon MSK console; to learn more, read the Amazon MSK documentation. For more details on AWS Lambda, check out Getting started with AWS Lambda and open the AWS Lambda console; to learn more, read the AWS Lambda documentation.

Thanks for reading!

Connect with me: Linkedin


Gargee Bhatnagar

DevOps Engineer and AWS Solution Architect at Electromech Corporation