Kafka record extractor - User Guide


The Kafka record extractor deploys a kafkacat container on Docker and extracts SMF records, filtered by a configured time interval, from an Apache Kafka cluster. In some troubleshooting cases, support teams may request an extract of SMF records. The following instructions describe how to install, configure, and run the Kafka record extractor.
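
Conceptually, the extraction is equivalent to a timestamp-bounded kafkacat consume. The sketch below illustrates the idea only; the broker address, image tag, topic, and record limit are example values, and the actual command run by the container may differ:

BASH
# Consume up to 500 records from topic "smf" between two timestamps.
# kcat (formerly kafkacat) accepts millisecond timestamps after s@ / e@.
docker run --rm edenhill/kcat:1.7.1 \
  -b 172.17.0.1:9092 -C -t smf \
  -o s@$(date -d "2021-04-12 20:52:00 UTC" +%s%3N) \
  -o e@$(date -d "2021-04-12 21:52:00 UTC" +%s%3N) \
  -c 500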

Step 1: Prerequisites

A Linux distribution with Docker installed is required.

Step 2: Installation

New users should contact technical support to receive their access credentials.

Download the kafka-record-extractor.tar archive from our repository.

Extract the archive:

CODE
tar -xvf kafka-record-extractor.tar

Step 3: Configuration

Customize the properties in the settings.config configuration file:

Configuration file syntax:
  • The character # can be used at the beginning of a line to comment out that line.

  • Colons are used as delimiters to distinguish between keys and values, for example:

    CODE
    KAFKA_HOST_IP: 172.17.0.1
  • Colons may not be used as part of a key or a value.

  • A value can be expressed as a bash command by enclosing it in $(<command>), for example:

    CODE
    KAFKA_HOST_IP: $(ip -4 addr show docker0 | grep -Po 'inet \K[\d.]+')
  • Any leading and/or trailing spaces in keys and values are trimmed during processing (see the short example after this list).
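
As a brief illustration of these rules, the first two lines below are ignored as comments, and the surrounding spaces on the last line are trimmed, so the key KAFKA_PORT receives the value 9092:

CODE
# This whole line is a comment and is ignored
# KAFKA_PORT: 9093
   KAFKA_PORT   :   9092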

The date system command converts the strings in START_TIMEDATE and END_TIMEDATE into corresponding timestamps. Refer to the date(1) documentation for the supported date string formats.
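
For instance, assuming GNU date is available on the host, you can check in advance that a timedate string is parseable and see the resulting Unix timestamp:

BASH
date -d "2021-04-12 20:52:00 UTC" +%s
# prints 1618260720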

KEY                     EXAMPLE VALUE             DESCRIPTION
KAFKA_HOST_IP           127.0.0.1, localhost      The Kafka cluster IP address or hostname.
KAFKA_PORT              9092                      The port listening for connections to the Kafka cluster.
KAFKA_TOPIC             smf                       The existing topic that contains the SMF records.
START_TIMEDATE          2021-04-12 20:52:00 UTC   The earliest cluster arrival time of a record to be
                                                  extracted. Forms a time interval with END_TIMEDATE.
END_TIMEDATE            2021-04-12 21:52:00 UTC   The latest cluster arrival time of a record to be
                                                  extracted. Forms a time interval with START_TIMEDATE.
MAX_RECORDS_TO_EXTRACT  500                       The maximum number of records to extract.
ARCHIVE_RECORDS         true                      Whether extracted records are archived into gzipped
                                                  tarballs.
RECORDS_PER_ARCHIVE     100                       The maximum number of records per archive. Only processed
                                                  if ARCHIVE_RECORDS is true. When set to 0, all extracted
                                                  records will be archived into a single gzipped tarball.
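
Putting it all together, a complete settings.config could look like the following sketch; all values are the example values from the table above and must be adapted to your environment:

CODE
# Kafka cluster connection
KAFKA_HOST_IP: $(ip -4 addr show docker0 | grep -Po 'inet \K[\d.]+')
KAFKA_PORT: 9092
KAFKA_TOPIC: smf

# Extraction window and limit
START_TIMEDATE: 2021-04-12 20:52:00 UTC
END_TIMEDATE: 2021-04-12 21:52:00 UTC
MAX_RECORDS_TO_EXTRACT: 500

# Archiving
ARCHIVE_RECORDS: true
RECORDS_PER_ARCHIVE: 100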

Step 4: Run

The properties must be configured correctly before running the Kafka record extractor.

Run the bash script extractor.sh located in the root of the extracted folder:

BASH
./extractor.sh
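
If the script is not executable on your system, restore the execute bit first (a general shell remedy, not specific to this tool):

BASH
# Only needed if the execute permission was lost during extraction
chmod +x extractor.sh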

Extracted records or gzipped archives are stored in the extract/records subfolder.

Naming Convention

  • Record: <KAFKA_TOPIC>-<TOPIC_PARTITION>-<PARTITION_OFFSET>_<DATE>_<TIME>_UTC.smf

  • Archive: records-<DATE>_<TIME>_UTC-x<ARCHIVE_NUMBER>.tar.gz
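
As a hypothetical example, a record read from partition 0 at offset 1042, and the first archive of a run, could be named as follows (the exact <DATE> and <TIME> rendering is determined by the script):

CODE
smf-0-1042_2021-04-12_20-52-00_UTC.smf
records-2021-04-12_21-52-00_UTC-x1.tar.gz

Gzipped archives can be unpacked with tar -xzf <archive>.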
