This unit is a Processor unit type.
The Kafka Sender unit writes input events to a Kafka topic.
An event comes in through the in port. A Kafka record is created and sent to the configured Kafka topic. The Kafka API is asynchronous, meaning feedback about success or failure is not provided immediately. Fill in the Ack info settings to be notified when feedback arrives.
Successful events are emitted through the send port, enriched with the fields defined in Ack info.
If an error occurs, the input event is enriched with new fields describing the problem, and the event is sent through the error port.
When an input event enters through the flush port, a signal is sent to flush the Kafka producer. The event is then sent to the flushed output port without modification.
When an input event enters through the metrics input port, Kafka metrics are obtained. For every metric obtained, the input event is enriched with fields containing the metric name and value, and sent to the metrics output port.
After dragging this unit into the Flow canvas, double-click it to access its configuration options. The following table describes the configuration options of this unit:
|General||Name||Enter a name for the unit. It must start with a letter, and cannot contain spaces. Only letters, numbers, and underscores are allowed.|
|Kafka||Client ID||An ID string to pass to the server when making requests. The purpose of this is to be able to track the source of requests beyond just IP/port by allowing a logical application name to be included in server-side request logging.|
|Bootstrap servers||A list of host/port pairs to use for establishing the initial connection to the Kafka cluster, in the format host1:port1,host2:port2,... The client will make use of all servers, irrespective of which servers are specified here for bootstrapping; this list only impacts the initial hosts used to discover the full set of servers. Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).|
|Acks||The number of acknowledgments the producer requires the leader to have received before considering a request complete. This controls the durability of records that are sent. The following settings are allowed:|
0 - The producer will not wait for any acknowledgment from the server. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the retries configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1.
1 - The leader will write the record to its local log and respond without waiting for acknowledgment from all followers. If the leader fails immediately after acknowledging the record, but before the followers have replicated it, the record will be lost.
all - The leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive, and is the strongest available guarantee.
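These settings map onto the standard Kafka producer configuration keys. A minimal sketch of how they might be assembled with the plain Java standard library (the property keys are the Kafka client's own; the host names are placeholders):

```java
import java.util.Properties;

public class ProducerConfigSketch {
    // Build a Kafka producer configuration reflecting the settings above.
    // The acks argument chooses the durability trade-off:
    // "0" = fire-and-forget, "1" = leader-only ack, "all" = all in-sync replicas.
    static Properties producerProps(String acks) {
        Properties props = new Properties();
        // Initial hosts used to discover the full cluster (placeholder values).
        props.put("bootstrap.servers", "host1:9092,host2:9092");
        props.put("acks", acks);
        return props;
    }

    public static void main(String[] args) {
        Properties p = producerProps("all");
        System.out.println(p.getProperty("acks")); // prints "all"
    }
}
```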
|Linger||Amount of time to wait (in seconds) before sending requests in the absence of a full batch. Leave at 0 (default) for no delay.|
The producer groups together any records that arrive between request transmissions into a single batched request. Normally this occurs only under load, when records arrive faster than they can be sent out. However, in some circumstances the client may want to reduce the number of requests even under moderate load. This setting accomplishes this by adding a small amount of artificial delay: rather than immediately sending out a record, the producer waits for up to the given delay to allow other records to be sent, so that the sends can be batched together.
This can be thought of as analogous to Nagle's algorithm in TCP.
For example, a linger of 5 ms would reduce the number of requests sent, but would add up to 5 ms of latency to records sent in the absence of load.
Once a full batch of records for a partition has accumulated, it will be sent immediately regardless of this setting. However, if fewer than the batch size in bytes have accumulated for the partition, the unit will 'linger' for the specified time, waiting for more records to arrive.
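The interplay between linger and batch size can be pictured as a simple decision: a full batch is sent immediately, and an under-full batch is sent only once the linger delay has elapsed. A conceptual sketch (the method and parameter names are illustrative, not the unit's internals):

```java
public class LingerSketch {
    // Decide whether an accumulated batch should be sent now.
    // batchBytes: bytes accumulated for the partition
    // batchSize:  configured batch size in bytes
    // waitedMs / lingerMs: time waited so far vs. configured linger delay
    static boolean shouldSendNow(int batchBytes, int batchSize,
                                 long waitedMs, long lingerMs) {
        if (batchBytes >= batchSize) {
            return true;             // full batch: send immediately, ignore linger
        }
        return waitedMs >= lingerMs; // under-full: wait up to the linger delay
    }

    public static void main(String[] args) {
        System.out.println(shouldSendNow(16384, 16384, 0, 5)); // full batch -> true
        System.out.println(shouldSendNow(100, 16384, 2, 5));   // still lingering -> false
    }
}
```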
|Batch size||Enter the default batch size in bytes. Records larger than this size will not be batched. Enter 0 to disable batching entirely.|
The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server.
A small batch size will make batching less common and may reduce throughput.
A very large batch size will use more memory to allocate a buffer of the specified batch size in anticipation of additional records.
|Buffer memory||Enter the total bytes of memory to use to buffer records waiting to be sent to the server.|
If records are sent faster than they can be delivered to the server, the producer will block for up to the Max block time and then throw an exception.
This setting should correspond roughly to the total memory the producer will use, but it is not a hard bound, since not all memory is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests.
|Compression type||Enter the compression type for all data generated by the producer. The default is none (i.e. no compression). Valid values are none, gzip, snappy, or lz4.|
Compression is of full batches of data, so the efficacy of batching will also impact the compression ratio (more batching means better compression).
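The set of accepted values above can be checked before configuring the unit. A small validation sketch (the class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class CompressionSketch {
    // The compression types this setting accepts, per the table above.
    static final List<String> VALID_TYPES =
            Arrays.asList("none", "gzip", "snappy", "lz4");

    static boolean isValidCompressionType(String type) {
        return VALID_TYPES.contains(type);
    }

    public static void main(String[] args) {
        System.out.println(isValidCompressionType("lz4"));  // true
        System.out.println(isValidCompressionType("zstd")); // false: not in this unit's list
    }
}
```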
|Max block||Controls how long (in seconds) KafkaProducer.send() and KafkaProducer.partitionsFor() will block. These methods can block either because the buffer is full or because metadata is unavailable. Blocking in user-supplied serializers or partitioners will not be counted against this timeout.|
|Retries||The number of retries on error. Setting a value greater than zero will cause the client to resend any record whose send failed with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error.|
|Request timeout||The maximum amount of time (in seconds) the client will wait for the response to a request. If the response is not received before the timeout elapses, the client will resend the request if necessary, or fail the request if retries are exhausted.|
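The retries and request timeout settings combine into a simple policy: each attempt waits up to the timeout for a response, and a failed or timed-out attempt is repeated until retries are exhausted. A schematic sketch (names and the success model are illustrative):

```java
public class RetrySketch {
    // Simulate the retry policy: a send is attempted up to (1 + retries) times.
    // attemptsUntilSuccess models which attempt would get a response in time
    // (0 = a response never arrives before the timeout).
    static boolean sendWithRetries(int retries, int attemptsUntilSuccess) {
        for (int attempt = 1; attempt <= 1 + retries; attempt++) {
            if (attemptsUntilSuccess != 0 && attempt >= attemptsUntilSuccess) {
                return true;  // response arrived before the timeout
            }
            // otherwise: request timed out or failed transiently; retry
        }
        return false;         // retries exhausted: the request fails
    }

    public static void main(String[] args) {
        System.out.println(sendWithRetries(3, 2)); // succeeds on the second attempt
        System.out.println(sendWithRetries(0, 2)); // no retries left -> fails
    }
}
```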
There are static (unit-level) and dynamic (event-level) settings. Everything that can be defined in a Kafka record can be a dynamic setting. The timestamp and key can be dynamic or omitted; the value must always be dynamic.
|Topic||The topic for sent records.|
|Topic field||Enter the name of an input event field containing the topic for the sent record.|
|Partition||Enter the partition number to which sent records are assigned.|
|Partition field||Enter the name of an input event field containing the partition value for the sent record.|
|Timestamp field||Enter the name of an input event field containing the timestamp value for the sent record.|
|Key field||Enter the name of an input event field containing a key for the sent record.|
|Value field||Enter the name of an input event field containing the value for the sent record.|
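The static/dynamic resolution described above can be sketched for the topic setting. This is a minimal illustration in which the input event is modeled as a plain map; the assumption that a configured Topic field takes precedence over the static Topic, and all names below, are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class RecordFieldsSketch {
    // Resolve the topic for an outgoing record: read it from the input
    // event when a Topic field is configured, otherwise fall back to the
    // static Topic setting. (Illustrative precedence, not the unit's spec.)
    static String resolveTopic(String staticTopic, String topicField,
                               Map<String, String> event) {
        if (topicField != null && event.containsKey(topicField)) {
            return event.get(topicField); // dynamic, per-event value
        }
        return staticTopic;               // static, unit-level value
    }

    // Hypothetical sample event used in the demo below.
    static Map<String, String> sampleEvent() {
        Map<String, String> event = new HashMap<>();
        event.put("target", "alerts");
        event.put("payload", "{\"msg\":\"hi\"}");
        return event;
    }

    public static void main(String[] args) {
        System.out.println(resolveTopic("default-topic", "target", sampleEvent())); // alerts
        System.out.println(resolveTopic("default-topic", null, sampleEvent()));     // default-topic
    }
}
```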
|Metric name field||Enter a name for the output event field containing the name of the metric.|
|Metric value field||Enter a name for the output event field containing the value of the metric.|
|Ack info||Checksum field||Enter a name for the output event field containing a checksum of the sent value.|
|Partition field||Enter a name for the output event field containing the partition to which the record is assigned.|
|Offset field||Enter a name for the output event field containing the offset where the record is stored.|
|Timestamp field||Enter a name for the output event field containing the timestamp assigned to the record.|
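The Ack info enrichment can be pictured as copying values from the broker's acknowledgment metadata into a copy of the input event, under the field names configured above. A sketch in which the event and metadata are modeled as plain values (all names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class AckEnrichSketch {
    // Enrich a copy of the input event with acknowledgment metadata,
    // using the field names configured in the Ack info tab.
    static Map<String, Object> enrich(Map<String, Object> event,
                                      String partitionField, int partition,
                                      String offsetField, long offset,
                                      String timestampField, long timestamp) {
        Map<String, Object> out = new HashMap<>(event); // input event is not mutated
        out.put(partitionField, partition);
        out.put(offsetField, offset);
        out.put(timestampField, timestamp);
        return out;
    }

    // Hypothetical sample event used in the demo below.
    static Map<String, Object> sampleEvent() {
        Map<String, Object> event = new HashMap<>();
        event.put("value", "hello");
        return event;
    }

    public static void main(String[] args) {
        Map<String, Object> sent = enrich(sampleEvent(),
                "kafkaPartition", 2, "kafkaOffset", 41L, "kafkaTs", 1700000000000L);
        System.out.println(sent.get("kafkaOffset")); // 41
    }
}
```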
|in||Events to be sent as Kafka records.|
|flush||Events to request the Kafka producer to be flushed.|
|metrics||Events to request Kafka metrics.|
|send||Outputs events successfully sent to Kafka, enriched with the fields defined in the Ack info tab.|
|error||Outputs events that could not be sent to Kafka, enriched with standard error fields.|
|flushed||Outputs events that requested a flush, without any modification.|
|metrics||Outputs events that requested metrics, repeated once for every obtained metric, and enriched with fields containing the metric name and value.|
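The per-metric fan-out on this port can be sketched as follows: one copy of the input event is emitted per metric, each enriched with the metric name and value under the configured field names (the metric names shown are examples from the Kafka producer's metric set; all other names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MetricsFanOutSketch {
    // For every metric, emit a copy of the input event enriched with
    // the metric name and value, under the configured field names.
    static List<Map<String, Object>> fanOut(Map<String, Object> event,
                                            Map<String, Double> metrics,
                                            String nameField, String valueField) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map.Entry<String, Double> m : metrics.entrySet()) {
            Map<String, Object> copy = new HashMap<>(event);
            copy.put(nameField, m.getKey());
            copy.put(valueField, m.getValue());
            out.add(copy);
        }
        return out;
    }

    // Hypothetical demo: two metrics produce two output events.
    static List<Map<String, Object>> demo() {
        Map<String, Object> event = new HashMap<>();
        event.put("id", 1);
        Map<String, Double> metrics = new HashMap<>();
        metrics.put("record-send-rate", 12.5);
        metrics.put("batch-size-avg", 512.0);
        return fanOut(event, metrics, "metricName", "metricValue");
    }

    public static void main(String[] args) {
        System.out.println(demo().size()); // 2: one output event per metric
    }
}
```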