Notes on Tuning Apache NiFi 
Russell Bateman
10 April 2023
last update:
 Random notes on tuning NiFi flows... 
-  Modernly, most discussions of flow tuning involve a NiFi cluster of nodes.
			I speculate that adapting the observations to a single-node installation
			is still useful. Setting up, managing and maintaining clustered NiFi is
			not the easiest thing to do. Graduating from single-node NiFi to a
			cluster is a quantum-sized leap.
			
-  Use the NiFi Summary UI to identify the number of threads used per
			processor. This is reached from the General (hamburger) menu,
			Summary. The summary is global (doesn't change scope just
			because you're currently down inside a process group). It provides tabs
			of summaries for (or by)
			
			-  Processors 
-  Input Ports 
-  Output Ports 
-  Remote Process Groups 
-  Connections (relationship arcs) 
-  Process Groups 
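
 The same per-processor figures can be pulled from the REST API. Below is a
 minimal sketch, assuming an unsecured NiFi listening on http://localhost:8080;
 the endpoint and field names are those of the 1.x API, so verify them against
 your version.

# Sketch: list active thread counts per processor via the NiFi REST API.
# Assumes an unsecured instance at http://localhost:8080; a secured instance
# additionally needs a bearer token or client certificate.
import requests

BASE = "http://localhost:8080/nifi-api"

# Recursive status for the root process group (the same data the Summary UI shows).
status = requests.get(
    f"{BASE}/flow/process-groups/root/status",
    params={"recursive": "true"},
).json()

def walk(snapshot):
    """Yield (name, activeThreadCount) for every processor in the snapshot tree."""
    for entry in snapshot.get("processorStatusSnapshots", []):
        s = entry["processorStatusSnapshot"]
        yield s["name"], s["activeThreadCount"]
    for child in snapshot.get("processGroupStatusSnapshots", []):
        yield from walk(child["processGroupStatusSnapshot"])

root = status["processGroupStatus"]["aggregateSnapshot"]
for name, threads in sorted(walk(root), key=lambda pair: -pair[1]):
    print(f"{threads:3d}  {name}")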
 
-  How to find the values evoked in discussions of tuning (in the UI)?
			(Some of this works differently depending on whether NiFi is clustered
			or not):

			-  thread-pool size
					—set in the General (hamburger) menu → Controller Settings →
					General → Maximum Timer Driven Thread Count

			-  active-thread count
					—given as the leftmost figure on the second banner (top-level
					UI)

			-  available cores
					—see below in system diagnostics

			-  core load average
					—see below in system diagnostics

			-  number of concurrent tasks
					—set in
					Configure Processor → Scheduling → Concurrent Tasks

-  
			Cloudera: Tuning your data flow, published 2019-06-26, last
			updated 2022-12-13. This document discusses
			
			-  The size of flowfiles, large or small, can influence how to tune
						flows. There is nothing built in to isolate flowfiles by size; most
						often, tuning should be done in a global way to ensure no side effects
						between data flows.
			
-  The first and primary bottleneck in NiFi is disk I/O. Distribute
						especially the flowfile content, flowfile
						metadata, and, to a lesser degree, the provenance
						repositories to dedicated disk volumes (the relevant
						nifi.properties entries appear in a snippet near the end of these
						notes). Obviously, SSDs will be far more performant than spindles.
			
-  Viewing various values in the flow
						
						-  Number of threads (in a cluster)
						
-  Number of active threads 
-  Number of cores 
-  Configuring thread-pool size 
 
-  Concurrent tasks 
-  Run duration 
-  Recommendations
						
						-  Adjust the thread-pool size based on the number of
									cores.
						
-  Do not increase the thread-pool size unless the number
									of active threads is observed to remain equal to the maximum
									available threads. (In a three-node cluster with eight cores
									per node, do not increase beyond 24 per node unless the active
									thread count, displayed in the UI, is frequently seen to be
									around 72.)
						
-  If the count of active threads is frequently equal to
									the maximum number of available threads, review the core load
									average on the nodes.
						
-  If the core load average is below 80% of the number of
									cores and the count of active threads is at its maximum,
									slightly increase the thread-pool size. Increase the pool size
									to (n+1) times the number of cores, where n is the current
									multiplier (a worked example follows this list). Keep the load
									average around 80% of the number of cores to account for the
									loss of one node: you want some resources to remain available
									to process the additional work landing on the remaining nodes.
						
-  For the number of concurrent tasks: if

									-  there is back-pressure somewhere in the workflow,
									-  the load average is low, and
									-  the count of active threads is not at the maximum,

									consider increasing the number of concurrent tasks on the
									processors that are not processing enough data and are causing
									the back-pressure.

									Increase the number of concurrent tasks iteratively, by only 1
									each time, while monitoring

									-  how things are evolving globally, in particular how the
												thread pool is shared across all the workflows of the
												cluster,
									-  the load average, and
									-  the active thread count.

									If a processor has active threads (the count is displayed on
									its right side) but is not processing data as fast as expected
									while the load average on the server is low, the issue could
									be related to disk I/O. Check the I/O statistics on the disks
									hosting the NiFi repositories (content, flowfile and
									provenance).
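
 To make the pool-sizing arithmetic above concrete, here is the earlier
 three-node, eight-core scenario worked out (a small Python sketch, purely
 illustrative):

# Worked example of the pool-sizing recommendation above, using the
# three-node, eight-core scenario; the numbers are illustrative only.
cores_per_node = 8
nodes = 3
multiplier = 3                                       # starting point: 3x the core count

pool_size = multiplier * cores_per_node              # 24 threads per node
cluster_max_active = pool_size * nodes               # 72 active threads across the cluster UI
load_average_ceiling = 0.8 * cores_per_node          # ~6.4, i.e. 80% of the cores per node

# Raise the multiplier (to 4, giving a pool of 32 per node) only if the active
# thread count sits at cluster_max_active while the per-node load average stays
# below load_average_ceiling.
next_pool_size = (multiplier + 1) * cores_per_node   # 32 threads per node
print(pool_size, cluster_max_active, load_average_ceiling, next_pool_size)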
 
 
-  It is a good practice to start by setting the Timer Driven Thread Pool
			size equal to three times the number of cores available on your NiFi
			nodes (this is Maximum Timer Driven Thread Count under the General
			(hamburger) menu → Controller Settings). For example, if you have nodes
			with eight cores, you would set this value to 24.
			
-  Concurrent Tasks increases how many flowfiles are processed at once by a
			single processor by using system resources that are then not usable by
			other processors. This provides a relative weighting of
			processors: it controls how much of the system's resources should be
			allocated to this processor instead of to other processors.
 
 This field is available for most processors. There are, however, some
			types of processors that can only be scheduled with a single concurrent
			task. It is also worth noting that increasing this value for a processor
			may increase thread contention and, as a result, setting this to a large
			value can harm productivity.
 
 As a best practice, you should not generally set this value to anything
			greater than 12.
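
 For completeness, a processor's Concurrent Tasks value can also be changed
 through the REST API, which is handy when nudging it up by 1 at a time as
 recommended above. A minimal sketch, assuming an unsecured NiFi at
 http://localhost:8080 and a hypothetical processor id; verify the field names
 against your NiFi version.

# Sketch: bump a processor's Concurrent Tasks by 1 via the NiFi REST API.
# The processor normally has to be stopped before its configuration can change.
import requests

BASE = "http://localhost:8080/nifi-api"
PROCESSOR_ID = "0123456789ab-example"    # hypothetical; copy the real id from the UI

# Read the current configuration and the revision (needed for optimistic locking).
entity = requests.get(f"{BASE}/processors/{PROCESSOR_ID}").json()
current = entity["component"]["config"]["concurrentlySchedulableTaskCount"]

update = {
    "revision": entity["revision"],
    "component": {
        "id": PROCESSOR_ID,
        "config": {"concurrentlySchedulableTaskCount": current + 1},
    },
}
requests.put(f"{BASE}/processors/{PROCESSOR_ID}", json=update).raise_for_status()
print(f"Concurrent tasks: {current} -> {current + 1}")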
-  NiFi is optimized to support flowfiles of any size. This is achieved by
			never materializing the file into memory directly. Instead, NiFi uses
			input and output streams to process events (there are a few exceptions
			with some specific processors). This means that NiFi does not require
			significant memory even if it is processing very large files. Most of the
			memory on the system should be left available for the OS cache. By having
			a large enough OS cache, many of the disk reads are skipped completely.
			Consequently, unless NiFi is used for very specific, memory-oriented data
			flows, setting the Java heap to 8 or 16 Gb is usually sufficient.
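
 The heap itself is set in conf/bootstrap.conf. In a stock install the relevant
 entries are typically numbered java.arg.2 and java.arg.3 (the numbering may
 differ if the file has been customized); an 8 Gb example:

# conf/bootstrap.conf (excerpt): initial and maximum JVM heap for NiFi.
java.arg.2=-Xms8g
java.arg.3=-Xmx8g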
			
 
 The performance expected depends directly on the hardware and the flow
			design. For example, when reading compressed data from a cloud object
			store, decompressing the data, filtering it based on specific values,
			compressing the filtered data, and sending it to a cloud object store,
			the following results can be reasonably expected:
 
 
			
				
| Nodes | Data rate per second | Events per second | Data rate per day | Events per day |
|-------|----------------------|-------------------|-------------------|----------------|
|     1 | 192.5 Mb             | 946,000           | 16.6 Tb           | 81.7 billion   |
|    25 | 5.8 Gb               | 26 million        | 501 Tb            | 2.25 trillion  |
|   150 | 32.6 Gb              | 141.3 million     | 2.75 Pb           | 12.2 trillion  |
 The characteristics of the above include:
					(per node)
			-  32 cores 
-  15 Gb RAM 
-  2 Gb heap 
-  1 Tb persistent SSD (400 Mb/s write, 1200 Mb/s read) 
 
 At a minimum:
			(per node)
			-  4 cores 
-  6 disks dedicated to the repositories (including mirroring)*
			
-  4 Gb heap 
-  1 Tb persistent SSD (400 Mb/s write, 1200 Mb/s read) 
 
 See the full exposé by Mark Payne here.
 
 * Why 6 disks dedicated to the NiFi repositories? Because there is one
			disk and one mirror for each of content, flowfile and
			provenance, in that order of priority and return on investment.
			Why leave out the database (4th) repository? It holds only the user data
			(keys, who's logged in, etc.) plus a database to track configuration
			changes made in the UI.
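
 For reference, the repository locations are set in conf/nifi.properties. A
 sketch of pointing each repository at its own volume follows; the /mnt/...
 mount points are made up for illustration.

# conf/nifi.properties (excerpt): put each repository on its own disk volume.
# Substitute your actual mount points for the hypothetical /mnt/... paths.
nifi.flowfile.repository.directory=/mnt/flowfile-repo
nifi.content.repository.directory.default=/mnt/content-repo
nifi.provenance.repository.directory.default=/mnt/provenance-repo
nifi.database.directory=/mnt/database-repo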
 Single-node NiFi differences... 
From the NiFi UI of a single-node installation, there is no menu choice,
Cluster, under the General (hamburger) menu. What is available is reached
under General → Summary, then clicking
system diagnostics at lower right of the window. This yields tabs
of readouts for
-  JVM
Heap (71.0%)                                  Non-heap
Max:   512 MB                                 Max:   -1 bytes
Total: 512 MB                                 Total: 261.2 MB
Used:  362.02 MB                              Used:  245.48 MB
Free:  149.98 MB                              Free:  15.72 MB
Garbage Collection
G1 Old Generation:   0 times (00:00:00.000)
G1 Young Generation: 792 times (00:00:01.546)
Runtime
Uptime: 87:14:57.964
 
-  System
Available Cores:                              Core Load Average: (for the last minute)
16                                            20.02
 
-  Version
System Diagnostics
------------------
NiFi
NiFi Version    1.13.2
Tag             nifi-1.13.2-RC1
Build Date/Time 03/17/2021 20:45:53 MDT
Branch          UNKNOWN
Revision        174938e
Java
Version         11.0.10
Vendor          AdoptOpenJDK
Operating System
Name            Linux
Version         5.4.0-146-generic
Architecture    amd64
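
 The figures above (and a few more) can also be read programmatically from the
 REST API. A minimal sketch, assuming an unsecured single-node NiFi at
 http://localhost:8080; field names follow the 1.x flow-status and
 system-diagnostics endpoints, so verify them against your version.

# Sketch: read active thread count, available cores and core load average
# from the NiFi REST API of an unsecured instance.
import requests

BASE = "http://localhost:8080/nifi-api"

# Top-banner figures (active threads, queued flowfiles, ...).
status = requests.get(f"{BASE}/flow/status").json()["controllerStatus"]
print("Active threads:", status["activeThreadCount"])
print("Queued:", status["queued"])

# System diagnostics (the same dialog dumped above).
snap = requests.get(f"{BASE}/system-diagnostics").json()["systemDiagnostics"]["aggregateSnapshot"]
print("Available cores:", snap["availableProcessors"])
print("Core load average:", snap["processorLoadAverage"])
print("Heap used:", snap["usedHeap"], "of", snap["maxHeap"])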
 
 Home-grown, practical, remedial hypotheses to test 
-  Make processors that process next to nothing by weight (lightweight
			processors) get all the cycles they want, for example:

			-  UpdateAttribute
			-  RouteOnAttribute
 
-  Weigh the heavy processing in the balance instead: bridle the leisure of
			the big guns (heavy processors) by applying back-pressure, etc.