Manual offsets in Kafka Consumers Example

The consumer code in Kafka Producer And Consumer Example so far auto-commits offsets every 5 seconds. Now let's update the consumer to take a third argument that manually controls where offset consumption starts. If you pass 0 as the last argument, the consumer will assume that you want to start from the beginning,…
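The article walks through the full listing, but as a minimal sketch of the idea (the broker address, the argument order, and the use of manual partition assignment are assumptions of this sketch, not necessarily the article's code), a consumer with auto-commit disabled might look like this:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ManualOffsetConsumer {
    public static void main(String[] args) {
        String topic = args[0];
        String groupId = args[1];
        long startOffset = Long.parseLong(args[2]); // 0 = consume from the beginning

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        // Turn off the 5-second auto-commit; offsets are committed by hand below.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign all partitions of the topic explicitly so we can seek right away.
            List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
                    .map(info -> new TopicPartition(topic, info.partition()))
                    .collect(Collectors.toList());
            consumer.assign(partitions);
            if (startOffset == 0) {
                consumer.seekToBeginning(partitions);
            }
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                consumer.commitSync(); // commit only after the batch is processed
            }
        }
    }
}
```

Committing with commitSync() after the processing loop, rather than letting the client auto-commit on a timer, is what guarantees a record is only marked consumed once it has actually been handled.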

Partitioning in Kafka Example

The DefaultPartitioner is good enough for most cases: it distributes messages across partitions on a round-robin basis to balance out the load. But if you want to control which partition your messages are sent to, you need to implement a custom partitioner instead. For this example, let's assume that we have a retail site that consumers can…
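As a rough sketch of what such a partitioner can look like (the "premium-" key prefix and the decision to reserve partition 0 are invented for illustration):

```java
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

// Hypothetical partitioner: keys prefixed "premium-" always land on
// partition 0; all other keys are spread across partitions by hash.
public class CustomerTypePartitioner implements Partitioner {

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        String k = (String) key;
        if (k != null && k.startsWith("premium-")) {
            return 0; // reserve partition 0 for premium traffic
        }
        // Everything else: stable hash, masked to stay non-negative.
        return k == null ? 0 : (k.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public void close() { }
}
```

The producer picks it up via the partitioner.class property, e.g. props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, CustomerTypePartitioner.class.getName()).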

Kafka Producer And Consumer Example

A simple producer/consumer application. The Kafka producer will retrieve user input from the console and send each new line as a message to a Kafka server. The consumer will retrieve messages for a given topic and print them to the console. The producer and consumer components in this case are your own implementations of kafka-console-producer.sh…
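As a hedged sketch of the producer half (the topic argument, broker address, and Scanner-based console loop are this sketch's assumptions, not necessarily the article's listing):

```java
import java.util.Properties;
import java.util.Scanner;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ConsoleProducer {
    public static void main(String[] args) {
        String topic = args[0]; // topic to publish to

        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             Scanner scanner = new Scanner(System.in)) {
            // Send each console line as a message until the input stream ends (Ctrl-D).
            while (scanner.hasNextLine()) {
                producer.send(new ProducerRecord<>(topic, scanner.nextLine()));
            }
        }
    }
}
```

The matching consumer subscribes to the same topic and prints each record's value, mirroring what kafka-console-consumer.sh does out of the box.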

Kafka

Apache Kafka is a messaging system built to scale for big data. Similar to Apache ActiveMQ or RabbitMQ, Kafka enables applications built on different platforms to communicate via asynchronous message passing. But Kafka differs from these more traditional messaging systems in key ways: It's designed to scale horizontally, by adding more commodity servers. It provides much…

Spark Machine Learning Example

Spark Machine Learning Application. A machine learning application that uses the collaborative filtering technique to predict which movies to recommend to a user, based on other users' ratings of different movies. Our recommendation engine solution will use the Alternating Least Squares (ALS) machine learning algorithm. Even though the data sets used in the code example in…
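To make that concrete, here is a minimal sketch using Spark MLlib's RDD-based ALS API; the ratings.csv file name, the user id, and the rank/iterations/lambda values are illustrative assumptions, not the article's exact setup:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.recommendation.ALS;
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
import org.apache.spark.mllib.recommendation.Rating;

public class MovieRecommender {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("MovieRecommender").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Hypothetical input file: one "userId,movieId,rating" triple per line.
        JavaRDD<Rating> ratings = sc.textFile("ratings.csv").map(line -> {
            String[] parts = line.split(",");
            return new Rating(Integer.parseInt(parts[0]),
                              Integer.parseInt(parts[1]),
                              Double.parseDouble(parts[2]));
        });

        // Train the ALS model: rank = 10 latent factors, 10 iterations,
        // lambda = 0.01 regularization. These values are illustrative.
        MatrixFactorizationModel model = ALS.train(ratings.rdd(), 10, 10, 0.01);

        // Recommend 5 movies for user 1 (a made-up user id).
        for (Rating r : model.recommendProducts(1, 5)) {
            System.out.println("movie " + r.product() + " predicted rating " + r.rating());
        }

        sc.stop();
    }
}
```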

Spark Streaming Example

Spark Streaming Application. This example illustrates a web server log analytics use case to show how Spark Streaming can help run analytics on data streams that are generated in a continuous manner. These log messages are considered time series data, which is defined as a sequence of data points consisting of successive measurements captured…
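A hedged sketch of what such a streaming job can look like in Java (the socket source on localhost:9999, the 10-second batch interval, and the assumption that the HTTP status code is the 9th whitespace-separated field of a combined-format log line are all choices made for this sketch):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class LogAnalytics {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("LogAnalytics").setMaster("local[2]");
        // Process the stream in 10-second micro-batches.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Assumes log lines arrive on a local socket, e.g. fed by `nc -lk 9999`.
        JavaReceiverInputDStream<String> logs = jssc.socketTextStream("localhost", 9999);

        // Count HTTP status codes per batch; field 8 holds the status code
        // in common/combined log format.
        JavaPairDStream<String, Integer> statusCounts = logs
                .filter(line -> line.split(" ").length > 8)
                .mapToPair(line -> new Tuple2<>(line.split(" ")[8], 1))
                .reduceByKey(Integer::sum);

        statusCounts.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```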

Spark

Spark gives us a comprehensive, unified framework for managing big data processing requirements across data sets that are diverse in nature (text data, graph data, etc.) as well as in their source (batch vs. real-time streaming data). Spark enables applications in Hadoop clusters to run up to 100 times faster in…