Which AWS EC2 instance type should you select for Apache NiFi? I recommend the following instance types, for reasons discussed further below.
| Instance Type | EBS-Optimized | Enhanced Networking | Estimated $/mo* | Notes |
* Estimated monthly price based on on-demand hourly prices in the US-East-1 region as of May 2017.
** By the time your Apache NiFi installation is growing beyond the bounds of a 2xlarge instance type, you should have enough data to make a more educated instance type selection. For additional growth, you may also wish to consider a cluster of NiFi instances rather than scaling up the size of a single instance.
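To make the footnote's arithmetic concrete, here is a minimal sketch of turning an on-demand hourly price into a monthly estimate. The 730-hour month (8,760 hours per year divided by 12) and the function name are assumptions for illustration; the hourly price in the example is a placeholder, not a quoted AWS price.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12 months


def estimated_monthly_cost(hourly_price_usd, hours=HOURS_PER_MONTH):
    """Return the estimated monthly on-demand cost in USD, rounded to cents."""
    if hourly_price_usd < 0:
        raise ValueError("hourly price must be non-negative")
    return round(hourly_price_usd * hours, 2)


if __name__ == "__main__":
    # 0.10 USD/hour is a placeholder value, not a real AWS price.
    print(estimated_monthly_cost(0.10))
```

Substitute the current hourly price for your region to get a figure comparable to the table's estimate.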
Why Is This Complicated?
Choosing an instance type for NiFi is complicated because it involves an intersection of:
- AWS's bewildering array of instance types, many of which are "optimized" for compute, memory, IO, etc. All of these attributes sound good.
- General requirements of Apache NiFi.
- Unique requirements of your NiFi flow, which you have to learn from experience.
- Your budget.
Basic Requirements for Apache NiFi
Apache NiFi 1.x requires more than 1 gigabyte of RAM to start up, and can easily use 2 gigabytes for a simple flow. It is not feasible to run NiFi 1.x on a micro instance. A t2.small is the least expensive instance type for running an experimental NiFi. A t2.medium is an economical starter instance type for a modest production flow.
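The memory floor can be expressed as a simple filter. The sketch below assumes the T2 memory sizes AWS published for that family and a hypothetical helper named candidates; it ignores operating system overhead, so treat the result as a lower bound rather than a sizing rule.

```python
# Memory per T2 size in GiB, as published by AWS for the T2 family.
T2_MEMORY_GIB = {
    "t2.nano": 0.5,
    "t2.micro": 1,
    "t2.small": 2,
    "t2.medium": 4,
    "t2.large": 8,
}


def candidates(min_memory_gib, catalog=T2_MEMORY_GIB):
    """Return instance types with at least min_memory_gib of RAM, smallest first."""
    fits = [(mem, name) for name, mem in catalog.items() if mem >= min_memory_gib]
    return [name for mem, name in sorted(fits)]
```

Calling candidates(2) drops the nano and micro sizes and leaves t2.small as the smallest option, which matches the guidance above.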
Start With a General-Purpose Instance
Don't waste your time and money speculating about how your flow should be optimized for compute, memory, or IO. Start with a general-purpose instance type and monitor your instance's performance for a while to see where resources are constrained.
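As a rough illustration of "see where resources are constrained", the sketch below takes utilization samples you have already collected (from CloudWatch, sar, or any other tool) and reports which resources run hot on average. The function name, the resource names, and the 80% threshold are all assumptions chosen for the example.

```python
from statistics import mean


def constrained_resources(samples, threshold=0.80):
    """Return resources whose average utilization meets or exceeds threshold.

    samples maps a resource name to a list of utilization fractions (0.0 to 1.0).
    Resources with no samples are skipped. Results are sorted from most to
    least utilized.
    """
    averages = {name: mean(values) for name, values in samples.items() if values}
    hot = [(avg, name) for name, avg in averages.items() if avg >= threshold]
    return [name for avg, name in sorted(hot, reverse=True)]
```

If disk IO and memory come back as constrained while CPU does not, that is evidence against paying for a compute-optimized instance. An average hides short spikes, so a percentile may be a better summary for bursty flows.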
- M4 is the current family of general-purpose instances, and you should pick one of these by default.
- T2 instance types are good choices for low-cost experimentation and learning NiFi. For production, you should consider whether the "burstable CPU" nature of these instances is really a good fit for your streaming data flow.
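To judge whether burstable CPU fits a steady streaming flow, it helps to work through the credit arithmetic. In the T2 model, one CPU credit is one vCPU at 100% utilization for one minute. The sketch below estimates how long a sustained load can run before the credit balance is exhausted. The function name is hypothetical, and the earn rate and starting balance are inputs you should take from the current AWS documentation for your instance type.

```python
def hours_until_credits_exhausted(balance, earn_per_hour, vcpus, utilization):
    """Estimate hours of sustained load before the CPU credit balance hits zero.

    One CPU credit is one vCPU at 100% utilization for one minute, so an
    instance spends vcpus * utilization * 60 credits per hour.
    Returns float("inf") when credits are earned at least as fast as spent.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    spend_per_hour = vcpus * utilization * 60
    net_drain = spend_per_hour - earn_per_hour
    if net_drain <= 0:
        return float("inf")
    return balance / net_drain
```

For example, a single-vCPU instance that earns 12 credits per hour and starts with an assumed balance of 288 credits drains a net 48 credits per hour at full load, so it can sustain that load for 6 hours before being throttled to its baseline. A flow that runs hot around the clock will spend most of its time at baseline performance.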
Prefer an EBS-Optimized Instance
Apache NiFi is typically disk IO intensive, and you should prefer instance types with EBS-optimized disk IO. In AWS-speak, an EBS-optimized instance has a separate network connection to EBS storage, rather than sharing the default network connection.
Certain instance types are EBS-optimized at no additional charge, including M4 general-purpose and C4 compute-optimized, and these are great for NiFi. The actual bandwidth to EBS storage varies. You can scale it up by selecting a larger instance, by selecting provisioned IOPS disks, or both.
Prefer an Enhanced-Networking Instance
Apache NiFi frequently performs network-intensive work that can benefit from AWS's Enhanced-Networking instance types. The M4 general-purpose family and the C4 compute-optimized family both include Enhanced Networking by default.
Don't Forget the Disks
Although instance type is a popular question, disk configuration is frequently more critical to flow performance. Consider allocating separate disks for the Provenance, FlowFile, and Content repositories. Separate disks will improve fault tolerance, provide additional IO and IO scaling options, and make it easier to monitor IO activity.
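One quick way to confirm that the repositories really sit on separate filesystems is to compare device IDs. This is a minimal sketch: the function name and the example paths are hypothetical, and you would substitute the repository directories configured in your own nifi.properties.

```python
import os


def shared_devices(repo_paths, stat=os.stat):
    """Group repository names by the device ID of the filesystem they live on.

    repo_paths maps a repository name to a directory path.
    Returns only the groups where two or more repositories share a device.
    """
    by_device = {}
    for name, path in repo_paths.items():
        device = stat(path).st_dev
        by_device.setdefault(device, []).append(name)
    return [sorted(names) for names in by_device.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical mount points; replace with your configured directories.
    repos = {
        "flowfile": "/mnt/nifi/flowfile_repository",
        "content": "/mnt/nifi/content_repository",
        "provenance": "/mnt/nifi/provenance_repository",
    }
    print(shared_devices(repos))
```

An empty list means each repository is on its own filesystem. Note that separate partitions of a single EBS volume report different device IDs but still share the same underlying IO, so this check is necessary rather than sufficient.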
Do I Need a Compute-Optimized Instance?
Not just yet. It seems intuitive to many users that their flow will be CPU-bound. While you may eventually benefit from a compute-optimized instance, you should run NiFi on a general-purpose instance for long enough to understand your actual performance constraints. Proper utilization of CPU resources also requires tuning your NiFi flow to allocate threads to processors, and to relieve unnecessary CPU demands where appropriate (regex and script optimization, for example).
Finally, the differences between M4 general-purpose and C4 compute-optimized instances are modest, so don't get too caught up in choosing one over the other. Consider the following chart of EC2 instance types by virtual CPU and RAM resources, sized by monthly cost:
M-series and C-series instances have relatively similar performance characteristics. Only starting at the 2xlarge and 4xlarge level is there any significant divergence in the CPU/Memory/Price relationship where a decision might be warranted. The decision to use a compute-optimized instance is really a decision to save money by not paying for extra memory. To repeat the points above, the impact of disk configuration and flow tuning will be greater than the differences between M-series and C-series instances.
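The "not paying for extra memory" trade-off can be put in numbers. The sketch below uses the published vCPU and memory figures for matching M4 and C4 sizes; the helper names are hypothetical, and monthly prices are deliberately left as an input because they change over time and by region.

```python
# (vCPUs, memory in GiB) for matching M4 and C4 sizes.
SPECS = {
    "m4.xlarge": (4, 16.0),
    "c4.xlarge": (4, 7.5),
    "m4.2xlarge": (8, 32.0),
    "c4.2xlarge": (8, 15.0),
    "m4.4xlarge": (16, 64.0),
    "c4.4xlarge": (16, 30.0),
}


def memory_per_vcpu(instance_type, specs=SPECS):
    """Return GiB of memory per vCPU for the given instance type."""
    vcpus, memory_gib = specs[instance_type]
    return memory_gib / vcpus


def savings_per_gib_forgone(general, compute, monthly_price, specs=SPECS):
    """Monthly dollars saved per GiB of memory given up by choosing compute.

    monthly_price maps instance type to a monthly cost that you supply.
    Intended for pairs with the same vCPU count, such as m4.2xlarge and
    c4.2xlarge.
    """
    memory_delta = specs[general][1] - specs[compute][1]
    price_delta = monthly_price[general] - monthly_price[compute]
    return price_delta / memory_delta
```

At every size listed, the M4 type offers 4 GiB per vCPU and the C4 type 1.875 GiB per vCPU. If monitoring shows your flow never approaches the smaller memory figure, the savings function tells you what each unused GiB was costing.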
* AWS measures compute performance in both Virtual CPUs and Elastic Compute Units (ECUs). C-series Compute-optimized instances have better ECU performance than the vCPU numbers reflect.
Need Some Help?
Please visit the BatchIQ Support Portal to open a ticket and get customized assistance.