Apache NiFi Guided Tour, part 1

Russell Bateman
February 2023


This guide will teach how to...

Practical requirements include...

Table of Contents

Introduction
Setting up Apache NiFi
Creating a NiFi Flow
Run the Flow
 
Guided Tour, part 2
Guided Tour, part 3

Apache NiFi is a software project designed to automate the flow of data between software systems, leveraging the concept of extract, transform and load (ETL). This software, originally developed by the NSA, was called "Niagara Files," the metaphor being a multitude of data (in files) flowing over a waterfall.

Apache NiFi offers myriad standard and special-purpose processors that, between them, accomplish nearly any task related to extracting data from, transforming, or loading data into widely disparate systems (databases, queue managers, etc.).

When what NiFi offers isn't enough, it's possible and reasonably easy to write a custom processor to use in combination with NiFi's standard array. This is done in Java, using any IDE, to generate the custom processor as a "NiFi archive" or NAR (compare TAR, JAR, WAR, etc.).

To get started, download and set up Apache NiFi...

Setting up Apache NiFi

  1. Download and install NiFi 1.19.1 locally from Apache NiFi Downloads. Choose the Apache NiFi Binary 1.19.1, which yields the artifact nifi-1.19.1-bin.zip. Put this artifact someplace such as /home/user/dev/nifi and unpack it there.

  2. Follow the instructions at How to get NiFi to work (unsecurely) as before.... This will obviate the need to set up users, certificates, etc.

  3. In those instructions, choose a port number that is free on your development host. The example uses 9999; keep it or assign a different (valid) port number if you like, but remember what it is for the browser URL later.

  4. Edit this same file (conf/nifi.properties), find the line containing nifi.nar.library.directory=./lib, then add the following line after it:
    nifi.nar.library.directory.custom=./custom-lib
    

  5. Once set up, launch NiFi using this command:
    ~/dev/nifi/nifi-1.19.1/bin $ ./nifi.sh start
    
  6. Open a new tab in any browser at this address (substituting your port number):
    http://localhost:port-number/nifi
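If you would rather script steps 4 and 6 than do them by hand, here is a minimal sketch. The function names add_custom_nar_dir and nifi_url are my own inventions, and I'm assuming the port chosen in step 3 is recorded under nifi.web.http.port in conf/nifi.properties. The demonstration runs against a throwaway two-line sample file rather than your real configuration.

```shell
#!/bin/sh
# Sketch only: scripted versions of steps 4 and 6, exercised on a throwaway
# sample file rather than a real conf/nifi.properties.

# Insert the custom NAR directory line right after the standard one.
# Safe to run twice: it does nothing if the line is already there.
add_custom_nar_dir() {
    props="$1"
    if grep -q '^nifi\.nar\.library\.directory\.custom=' "$props"; then
        return 0
    fi
    tmp="$props.tmp.$$"
    awk '
        { print }
        $0 == "nifi.nar.library.directory=./lib" {
            print "nifi.nar.library.directory.custom=./custom-lib"
        }
    ' "$props" > "$tmp" && mv "$tmp" "$props"
}

# Build the browser URL from the HTTP port recorded in the properties file
# (assumes the port is stored under nifi.web.http.port).
nifi_url() {
    port=$(sed -n 's/^nifi\.web\.http\.port=//p' "$1" | head -n 1)
    printf 'http://localhost:%s/nifi\n' "$port"
}

# Two-line sample standing in for the real file (an assumption for the demo).
sample=$(mktemp)
printf '%s\n' 'nifi.nar.library.directory=./lib' 'nifi.web.http.port=9999' > "$sample"

add_custom_nar_dir "$sample"
add_custom_nar_dir "$sample"    # second call changes nothing
cat "$sample"
nifi_url "$sample"
rm -f "$sample"
```

To use it for real, call add_custom_nar_dir with the path to your own conf/nifi.properties while NiFi is stopped. It is probably also wise to create the custom-lib directory under the NiFi root before launching, though I have not verified that NiFi requires it.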
    

Creating a NiFi Flow

  1. Minimize the Navigation and Operation palettes (because they're useless to us in this exercise and take up real estate).

  2. In the toolbar at the top of the NiFi canvas, click the Processor tool (the first tool icon from the left) and drag a new processor down onto the canvas. A dialog will open. In the Filter edit field, type "GenerateFlowFile", then click Add.

  3. Configure GenerateFlowFile by right-clicking on it, choosing Configure, then...
    1. In the Settings tab, change the Yield Duration to 60 sec.
    2. In the Scheduling tab, ensure Run Schedule is 1 min.
    3. In the Properties tab, configure the following:
      1. Custom Text: Type in the text below using SHIFT-ENTER to insert newlines:
        This is a test of the Emergency Broadcast System. This is only a test.
        The quick brown fox jumped over the lazy dog's back and got clean away.
        
      2. Important: leave all other property values at the defaults.
    4. Click the Apply button at the bottom right.

  4. Now create an instance of Wait on the canvas. There's no need to configure it: it's just a placeholder that stops the flow, and any processor would do as long as it is never started.

  5. Hover over GenerateFlowFile, click the circled-arrow icon produced (by the hover action) and drag it to the Wait processor. When a dialog appears, ensure that the success checkbox is checked, then click Add. An arc will appear connecting the two processors with a queue in between.
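The canvas is the easy way to do all this, but the same processors can, as far as I know, also be created through NiFi's REST API. What follows is only a sketch under assumptions: processor_payload is a name I made up, the positions are arbitrary, and the curl call shown in the comment assumes the unsecured setup on port 9999 described earlier. The script itself only prints the JSON request bodies; it does not contact NiFi. Configuring the processors and connecting them is left to the canvas steps above.

```shell
#!/bin/sh
# Sketch only: print the JSON body for creating a processor on the root canvas.
#   $1 = fully qualified processor class
#   $2 = x position on the canvas, $3 = y position (arbitrary values)
processor_payload() {
    printf '{"revision":{"version":0},"component":{"type":"%s","position":{"x":%s,"y":%s}}}\n' \
        "$1" "$2" "$3"
}

processor_payload org.apache.nifi.processors.standard.GenerateFlowFile 100 100
processor_payload org.apache.nifi.processors.standard.Wait 100 400

# To send one of these to a running, unsecured NiFi on port 9999 (not run here):
#   processor_payload org.apache.nifi.processors.standard.GenerateFlowFile 100 100 |
#     curl -s -X POST -H 'Content-Type: application/json' -d @- \
#          http://localhost:9999/nifi-api/process-groups/root/processors
```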

Run the Flow

  1. Start GenerateFlowFile, then stop it immediately:
    1. Right-click the processor and choose Start.
    2. Right-click the processor and choose Stop.
    3. In an unoccupied portion of your canvas, right-click and choose Refresh.
    4. You should see between GenerateFlowFile and Wait that one flowfile is in the queue.
    5. Right-click the success queue and choose List queue.

  2. Observe, at the extreme right end of the (single) flowfile listed, that you can:
    1. download the contents of the flowfile,
    2. examine its contents, or
    3. ponder its provenance, i.e., how it was created and where it's been over its lifetime. (As you can imagine, provenance is a very useful tool in debugging problems in the flow of files through NiFi.)

  3. To the extreme left of the (queue list) window that appears, click the View Details control (a tiny dark circle with i in it). In the resulting dialog, click the View button. This opens a new browser tab displaying the contents of the flowfile you caused GenerateFlowFile to create. Close the new browser tab.

  4. Back in the FlowFile dialog, notice that, by clicking the Attributes tab, it's possible to inspect a flowfile's attribute metadata. For now, click the OK button to close.

  5. Close the window listing the flowfiles in the queue (click the X button at the upper right).
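If you want to confirm from a terminal that a flowfile is queued, NiFi's REST API offers, to my knowledge, a flow status resource whose JSON includes a flowFilesQueued count. The sketch below pulls that number out with sed. The function name queued_count is mine, and the sample JSON is a made-up, heavily abbreviated stand-in for a real response. The count covers every queue in the flow, which in this exercise is just the one.

```shell
#!/bin/sh
# Sketch only: extract the integer following "flowFilesQueued": from JSON
# arriving on standard input.
queued_count() {
    sed -n 's/.*"flowFilesQueued"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical, abbreviated response used only to exercise the function;
# the byte count is invented.
sample='{"controllerStatus":{"flowFilesQueued":1,"bytesQueued":142}}'
printf '%s\n' "$sample" | queued_count

# Against a live, unsecured NiFi on port 9999 (not run here):
#   curl -s http://localhost:9999/nifi-api/flow/status | queued_count
```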

Write your own custom processor

This is part 2 of this guided tour. You won't need to write the processor code; it will be more an exercise in setting up a NAR project in IntelliJ IDEA. (If you prefer Eclipse, that can be done too, but there's no help forthcoming from me to do that; it's been a decade since I last used that IDE.)

Please see Apache NiFi Guided Tour, part 2: Setting up a NiFi custom-processor project.