Apache Kafka
A Distributed Streaming Platform

About Kafka

Apache Kafka is an open-source stream-processing platform written in Scala and Java. It was originally developed at LinkedIn and later donated to the Apache Software Foundation.

Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.

PUBLISH & SUBSCRIBE
PROCESS
STORE

Kafka Topics

The following are common use cases for Kafka.

Messaging

Kafka works well as a replacement for a more traditional message broker. Message brokers are used for a variety of reasons, such as decoupling processing from data producers and buffering unprocessed messages.
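
The decoupling and buffering described above can be illustrated with a minimal sketch. It uses Python's in-memory `queue.Queue` as a stand-in for a broker, not a Kafka client; the `SENTINEL` marker and the sample messages are assumptions made only for this illustration.

```python
import queue
import threading

# In-memory stand-in for a broker; a real deployment would use a Kafka client.
buffer = queue.Queue()
SENTINEL = None  # hypothetical end-of-stream marker used only in this sketch


def produce(messages):
    """Publish messages without knowing who will consume them."""
    for message in messages:
        buffer.put(message)
    buffer.put(SENTINEL)


def consume(results):
    """Drain buffered messages at the consumer's own pace."""
    while True:
        message = buffer.get()
        if message is SENTINEL:
            break
        results.append(message.upper())


processed = []
consumer = threading.Thread(target=consume, args=(processed,))
consumer.start()
produce(["order created", "order paid"])
consumer.join()
print(processed)
```

The producer never calls the consumer directly; the buffer absorbs messages until the consumer is ready for them.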

Website Activity Tracking

The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds.

Metrics

Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.
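
As a rough sketch of that aggregation step, the snippet below folds per-application statistics into one centralized view. The application names, metric names, and values are hypothetical, and a plain list stands in for the feed that would normally arrive through Kafka.

```python
from collections import defaultdict

# Hypothetical samples reported by distributed applications:
# (application, metric name, value).
samples = [
    ("checkout", "requests", 120),
    ("search", "requests", 300),
    ("checkout", "errors", 2),
    ("search", "errors", 5),
    ("checkout", "requests", 80),
]


def aggregate(records):
    """Sum each metric across all applications into one central feed."""
    totals = defaultdict(int)
    for _application, metric, value in records:
        totals[metric] += value
    return dict(totals)


central_feed = aggregate(samples)
print(central_feed)
```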

Log Aggregation

Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing.

Stream Processing

Many users of Kafka process data in processing pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing.
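
A multi-stage pipeline of this kind might look like the following sketch. Plain Python lists stand in for Kafka topics, and the topic names (`raw-clicks`, `enriched-clicks`, `click-counts`) and the enrichment rule are assumptions chosen for illustration, not part of any Kafka API.

```python
from collections import Counter

# Plain lists stand in for Kafka topics; the topic names are hypothetical.
topics = {
    "raw-clicks": ["home", "cart", "home", "checkout"],
    "enriched-clicks": [],
    "click-counts": [],
}


def enrich_stage(source, sink):
    """Stage 1: read raw events and write enriched events to a new topic."""
    for page in topics[source]:
        topics[sink].append(
            {"page": page, "is_purchase_step": page in {"cart", "checkout"}}
        )


def aggregate_stage(source, sink):
    """Stage 2: read enriched events and write per-page counts to a new topic."""
    counts = Counter(event["page"] for event in topics[source])
    topics[sink].extend(sorted(counts.items()))


enrich_stage("raw-clicks", "enriched-clicks")
aggregate_stage("enriched-clicks", "click-counts")
print(topics["click-counts"])
```

Each stage reads from one topic and writes to another, so later stages can be added or replaced without touching the earlier ones.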

Event Sourcing

Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.
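
The idea can be sketched in a few lines: current state is never stored directly but is rebuilt by replaying the ordered log. The account events and the `apply_event` and `replay` helpers below are hypothetical, and a Python list stands in for the log that Kafka would store.

```python
# Time-ordered log of state changes; a list stands in for a Kafka topic.
events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]


def apply_event(balance, event):
    """Return the new balance after applying a single event."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    raise ValueError(f"unknown event type: {event['type']}")


def replay(log, initial=0):
    """Rebuild state by applying every event in order."""
    balance = initial
    for event in log:
        balance = apply_event(balance, event)
    return balance


current_balance = replay(events)
balance_after_two_events = replay(events[:2])
print(current_balance, balance_after_two_events)
```

Replaying a prefix of the log recovers the state at an earlier point in time, which is one reason a long-retained log suits this design style.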

Course Contents

The following topics are covered in the Kafka course:

Understanding the principles of messaging systems
Understanding messaging systems
Peeking into a point-to-point messaging system
Publish-subscribe messaging system
Advanced Message Queuing Protocol (AMQP)
Using messaging systems in big data streaming applications

Kafka origins
Kafka's architecture
Message topics
Message partitions
Replication and replicated logs
Message producers
Message consumers
Role of Zookeeper

Kafka producer internals
Kafka Producer APIs
Producer object and ProducerRecord object
Custom partition
Additional producer configuration
Introduction
Use Cases
Architecture
Components of Kafka - Broker, Producer, Consumer, Topic, Partition
Ecosystem
Kafka vs Flume

First Things First
Installing a Kafka Broker
Broker Configuration
General Broker
Topic Defaults
num.partitions
log.retention.ms
log.retention.bytes
log.segment.bytes
log.segment.ms
message.max.bytes
Hardware Selection
Kafka in the Cloud
Kafka Clusters
How Many Brokers
Broker Configuration
Operating System Tuning
Virtual Memory
Disk
Networking
Production Concerns
Garbage Collector Options
Datacenter Layout
Colocating Applications on Zookeeper
Getting Started With Clients

Zookeeper
Single node Kafka
Hands-On - Setting Up
Multi node Kafka
Hands-On - Multi Node Setup
Console Producer & Console Consumer
Hands-On - Producer & Consumer
High Availability & Performance

Producer overview
Constructing a Kafka Producer
Sending a Message to Kafka
Serializers
Custom Serializers
Serializing using Apache Avro
Using Avro records with Kafka
Partitions
Configuring Producers
acks
buffer.memory
compression.type
retries
batch.size
linger.ms
client.id
max.in.flight.requests.per.connection
timeout.ms and metadata.fetch.timeout.ms
Old Producer APIs

Performance tuning
Serialization
Message Delivery Semantics
Replication
Log Compaction
Quotas
Hands-On

KafkaConsumer Concepts
Consumers and Consumer Groups
Consumer Groups - Partition Rebalance
Creating a Kafka Consumer
Subscribing to Topics
The Poll Loop
Commits and Offsets
Automatic Commit
Commit Current Offset
Asynchronous Commit
Combining Synchronous and Asynchronous commits
Commit Specified Offset
Rebalance Listeners
Seek and Exactly Once Processing
But How Do We Exit?
Deserializers
Configuring Consumers
fetch.min.bytes
fetch.max.wait.ms
max.partition.fetch.bytes
session.timeout.ms
auto.offset.reset
enable.auto.commit
partition.assignment.strategy
client.id
Stand Alone Consumer - Why and How to Use a Consumer without a Group
Older consumer APIs

Cluster Membership
Replication
Request Processing
Produce Requests
Fetch Requests
Other Requests
Physical Storage
Partition Allocation
File Management
File Format
Indexes
Compaction
How Compaction Works
Deleted Events
When Are Topics Compacted

Broker Configs
Hands-On
Producer Configs
Consumer Configs
Consumer groups
Hands-On

API Design
Producer and Consumer APIs (Java)
Hands-On Producer & Consumer API
Message format
Log
Hands-On

Managing Topics
Decommissioning nodes
Data mirroring
Data centers and Racks
Monitoring
Security
Authorization and ACL
REST API
Hands-On

Overview
Confluent Platform vs Apache Kafka
Kafka Streams
Kafka Connectors
Confluent Platform Hands-On Use Cases

Millions of Messages per second
How to Handle with Kafka?
IoT Hands-On Use Case
Kafka with Spark
Hands-On
Kafka with Flume (for Hadoop/HBase/Hive)
Hands-On

IoT Realtime Streaming Data via Kafka
Using Kafka in Big Data Applications
Managing high volumes in Kafka
Appropriate hardware choices
Producer read and consumer write choices
Kafka message delivery semantics
At least once delivery
At most once delivery
Exactly once delivery
Big data and Kafka common usage patterns
Kafka and data governance
Alerting and monitoring
Useful Kafka metrics
Producer metrics
Broker metrics
Consumer metrics

An overview of securing Kafka
Wire encryption using SSL
Steps to enable SSL in Kafka
Configuring SSL for Kafka Broker
Configuring SSL for Kafka clients
Kerberos SASL for authentication
Steps to enable SASL/GSSAPI in Kafka
Configuring SASL for Kafka broker
Configuring SASL for Kafka client - producer and consumer
Understanding ACL and authorization
Common ACL operations
List ACLs
Understanding Zookeeper authentication
Apache Ranger for authorization
Adding Kafka Service to Ranger
Adding policies
Best practices

Latency and throughput
Data and state persistence
Data sources
External data lookups
Data formats
Data serialization
Level of parallelism
Out-of-order events
Message processing semantics
Integrating Kafka with Streaming Applications

Introduction to Kafka Streams
Using Kafka in Stream processing
Kafka Streams - lightweight stream processing library
Kafka Streams architecture
Integrated framework advantages
Understanding tables and Streams together
Maven dependency
Kafka Streams word count
KTable
Use case example of Kafka Streams

Securing Kafka
The Confluent Platform
Introduction
Installing the Confluent Platform
Using Kafka operations
Using the Schema Registry
Using the Kafka REST Proxy
Using Kafka Connect
Using Kafka with Confluent Platform
Introduction to Confluent Platform
Deep diving into Confluent architecture
Understanding Kafka Connect and Kafka Streams

Kafka Streams
Moving Kafka data to HDFS
Gobblin architecture
Kafka Connect
Flume
Apache Kafka Connect API
Kafka JDBC Connector
Kafka Elasticsearch Connector
Spark Streaming with Kafka IoT Use Case Demo

Download

Download Apache Kafka course plan

WeCanDoNow
