Tuning performance of Foreman

1. Introduction to performance tuning

Tune Foreman to improve system response times and handle larger numbers of managed hosts. Performance tuning helps optimize resource usage and ensures smooth operations as your infrastructure scales.

2. System requirements for tuning

You can find the hardware and software requirements in Preparing your environment for Foreman server installation in Installing Foreman Server 3.19 on Debian/Ubuntu.

3. Hardware and operating system configuration

You can tune the hardware and operating system to optimize Foreman performance for your scenario.

CPU: The more physical cores available to Foreman, the higher the task throughput it can achieve. Some Foreman components, such as Puppet and PostgreSQL, are CPU-intensive applications and benefit from a higher number of available CPU cores.
Memory: The more memory available on the system running Foreman, the better the response times of Foreman operations. Because Foreman uses PostgreSQL as its database, additional memory combined with the tunings improves application response times because more data is retained in memory.
Disk: Foreman should be deployed using SSDs to avoid performance bottlenecks. Components such as PostgreSQL benefit from using SSDs due to their lower latency compared to HDDs.
Network: Network performance affects the communication between the Foreman server and Smart Proxies. A reliable network with minimal jitter and low latency is required for operations such as synchronization between the Foreman server and Smart Proxies. At the very least, ensure that the network does not cause connection resets and similar problems.
Server Power Management: By default, your server is likely configured to conserve power. While this keeps the maximum power consumption in check, it also lowers the performance that Foreman can achieve. For a server running Foreman, set the BIOS to run the system in performance mode.

3.1. Enabling tuned profiles

Enable tuned profiles to automatically optimize the performance settings of Foreman for its workloads. This improves overall system performance by applying configuration optimizations tailored to your environment.

Procedure

Install tuned on your Smart Proxy:
```
# apt install tuned
```
Check if tuned is running:
```
# systemctl status tuned
```
If tuned is not running, enable it:
```
# systemctl enable --now tuned
```
Optional: View a list of available tuned profiles:
```
# tuned-adm list
```
Enable a tuned profile depending on your scenario:
```
# tuned-adm profile "My_Tuned_Profile"
```
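
To confirm which profile is in effect afterwards, you can parse the output of tuned-adm active. The following is a minimal sketch, assuming the output has the form "Current active profile: <name>". The helper name active_profile and the sample profile name are illustrative only:

```shell
#!/bin/sh
# Extract the profile name from a line of the form
# "Current active profile: <name>" (assumed output of `tuned-adm active`).
active_profile() {
    sed -n 's/^Current active profile: //p'
}

# Example with a literal line. On a real system, pipe the command instead:
#   tuned-adm active | active_profile
echo "Current active profile: throughput-performance" | active_profile
```

If the helper prints nothing, no profile is active or the output format differs from the one assumed here.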

3.2. Transparent Hugepage and database performance

Disable Transparent Hugepage to improve database performance in Foreman, especially for PostgreSQL and Redis workloads that perform poorly with this memory management feature enabled.

Transparent Hugepage is a memory management technique that the Linux kernel uses to reduce Translation Lookaside Buffer (TLB) overhead by using larger memory pages. Because databases have sparse rather than contiguous memory access patterns, database workloads often perform poorly when Transparent Hugepage is enabled.

In deployments with databases running on separate servers, the benefit of using Transparent Hugepage on the Foreman server might be limited.
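
To see whether Transparent Hugepage is currently enabled, you can read the kernel setting under /sys/kernel/mm/transparent_hugepage/enabled, where the active mode is shown in square brackets. The following is a minimal sketch; the helper name thp_active_mode is hypothetical:

```shell
#!/bin/sh
# Print the active mode from the kernel's THP setting line. The active
# value is the one in square brackets, for example "always madvise [never]".
thp_active_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Read the live setting only if the file exists and is readable.
thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    thp_active_mode < "$thp_file"
fi
```

A result of never means that Transparent Hugepage is disabled.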

Additional resources

How to disable transparent hugepages (THP) on Red Hat Enterprise Linux

4. Configuring Foreman for performance

Foreman includes several components that communicate with each other. You can tune these components independently of each other to achieve the maximum possible performance for your scenario.

4.1. Considerations for performance tuning

You can optimize Foreman performance by adjusting various tunables to suit your workload.

Performance tuning requires careful testing in non-production environments with valid backups because most configuration changes require a Foreman restart. Setting up monitoring before applying changes helps you evaluate their effectiveness in your specific environment.

Our test environment might differ significantly from yours, although we try hard to mimic real-world environments.

Optional: After any change, run this quick Foreman health check:

$ foreman-maintain health check

4.2. Puma tunings

Tune Puma to optimize Foreman performance for handling concurrent host operations and API requests. Proper Puma configuration is essential for deployments managing large numbers of hosts or frequent operations.

4.2.1. Puma workers and threads auto-tuning

Auto-tuning configures Puma workers and threads based on available CPU and memory to optimize Foreman performance. Understanding this default behavior helps you determine when manual tuning is necessary for your environment.

If you do not specify Puma worker and thread values with foreman-installer and none are present in your Foreman configuration, foreman-installer configures a balanced number of workers. It follows this formula:

min(CPU_COUNT * 1.5, RAM_IN_GB - 1.5)

This is sufficient for most cases. With some usage patterns, however, you need to tune Puma manually, for example to limit the resources dedicated to Puma so that other Foreman components can use them. Each Puma worker consumes around 1 GiB of RAM.
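
As a rough illustration of the formula above, the following sketch computes the worker count for a given CPU count and RAM size. The function name puma_workers is hypothetical, and rounding the result down to a whole number is an assumption of this sketch rather than confirmed installer behavior:

```shell
#!/bin/sh
# Sketch of the documented formula: min(CPU_COUNT * 1.5, RAM_IN_GB - 1.5).
# Arguments: CPU count, RAM in GB.
# Assumption: the result is rounded down to a whole number of workers.
puma_workers() {
    awk -v cpus="$1" -v ram_gb="$2" 'BEGIN {
        by_cpu = cpus * 1.5
        by_ram = ram_gb - 1.5
        workers = (by_cpu < by_ram) ? by_cpu : by_ram
        printf "%d\n", int(workers)
    }'
}

puma_workers 4 20    # CPU-bound case: prints 6
puma_workers 16 8    # RAM-bound case: prints 6
```

At around 1 GiB of RAM per worker, six workers account for roughly 6 GiB of memory on their own.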

View your current Foreman server settings

# cat /etc/systemd/system/foreman.service.d/installer.conf

View the currently active Puma workers

# systemctl status foreman

4.2.2. Manually tuning Puma workers and threads count

Manually configure Puma workers and threads count to optimize Foreman performance for your specific hardware and workload requirements when auto-tuning does not meet your needs.

For example, the following configuration uses two workers and five threads:

# foreman-installer \
--foreman-foreman-service-puma-workers 2 \
--foreman-foreman-service-puma-threads-max 5

4.2.3. Puma workers and threads recommendations

Configure Puma workers and threads based on your deployment size and hardware to optimize Foreman performance, especially for concurrent host registration workloads.

These recommendations come from testing concurrent registration across the following deployment sizes with different numbers of workers and threads. They are based purely on concurrent registration performance, so they might not reflect your exact use case.

Name	Number of hosts	RAM	Cores	Recommended Puma threads	Recommended Puma workers
default	0 – 5000	20 GiB	4	16	4 – 6
medium	5000 – 10000	32 GiB	8	16	8 – 12
large	10000 – 20000	64 GiB	16	16	12 – 18
extra-large	20000 – 60000	128 GiB	32	16	16 – 24
extra-extra-large	60000+	256 GiB+	48+	16	20 – 26
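
The table can be read as a simple lookup from the number of managed hosts to a profile name. The following sketch shows one way to express that lookup. The function name puma_profile is hypothetical, and because adjacent ranges in the table share their endpoints, treating each upper bound as inclusive is an assumption of this sketch:

```shell
#!/bin/sh
# Map a managed-host count to the profile name from the table above.
# Assumption: each range's upper bound is inclusive.
puma_profile() {
    hosts=$1
    if [ "$hosts" -le 5000 ]; then
        echo "default"
    elif [ "$hosts" -le 10000 ]; then
        echo "medium"
    elif [ "$hosts" -le 20000 ]; then
        echo "large"
    elif [ "$hosts" -le 60000 ]; then
        echo "extra-large"
    else
        echo "extra-extra-large"
    fi
}

puma_profile 12000    # prints large
```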

Tuning the number of workers is the more important aspect here; in some cases we have seen up to a 52% performance increase. Although the installer uses 5 threads by default, we recommend 16 threads with all the tuning profiles in the table above, because we have seen up to a 23% performance increase with 16 threads (14% with 8 threads and 10% with 32 threads) compared to a setup with 4 threads.

To arrive at these recommendations, we used a concurrent registrations test case, which is a very specific use case. Results can differ on your Foreman, which might have a more balanced workload than registrations alone. Keeping the default of 5 threads is a good choice as well.

These are some of the measurements that led us to these recommendations:

	4 workers, 4 threads	4 workers, 8 threads	4 workers, 16 threads	4 workers, 32 threads
Improvement	0%	14%	23%	10%

Use 4 – 6 workers on a default setup (4 CPUs). We have seen about 25% higher performance with 5 workers compared to 2 workers, but 8% lower performance with 8 workers compared to 2 workers, as the following table shows:

	2 workers, 16 threads	4 workers, 16 threads	6 workers, 16 threads	8 workers, 16 threads
Improvement	0%	26%	22%	-8%

Use 8 – 12 workers on a medium setup (8 CPUs), as the following table shows:

	2 workers, 16 threads	4 workers, 16 threads	8 workers, 16 threads	12 workers, 16 threads	16 workers, 16 threads
Improvement	0%	51%	52%	52%	42%

Use 16 – 24 workers on a 32 CPU setup. This was tested on a machine with 90 GiB of RAM, and memory turned out to be a factor because the system started swapping; a proper extra-large setup should have 128 GiB. Higher numbers of workers were problematic at the higher registration concurrency levels we tested, so we cannot recommend them.

	4 workers, 16 threads	8 workers, 16 threads	16 workers, 16 threads	24 workers, 16 threads	32 workers, 16 threads	48 workers, 16 threads
Improvement	0%	37%	44%	52%	too many failures	too many failures

4.2.4. Puma workers configuration

Configure Puma workers to scale Foreman performance based on available CPU resources. Adding more workers increases concurrent request capacity, which improves registration times and overall system responsiveness.

If you have enough CPUs, adding more workers improves performance. For example, we compared Foreman setups with 8 and 16 CPUs:

Table 1. foreman-installer options used to test effect of workers count
Foreman VM with 8 CPUs, 40 GiB RAM	Foreman VM with 16 CPUs, 40 GiB RAM
`--foreman-foreman-service-puma-threads-max 16`	`--foreman-foreman-service-puma-threads-max 16`
`--foreman-foreman-service-puma-workers {2|4|8|16}`	`--foreman-foreman-service-puma-workers {2|4|8|16}`

In the 8 CPU setup, changing the number of workers from 2 to 16 improved concurrent registration time by 36%. In the 16 CPU setup, the same change resulted in a 55% improvement.

Adding more workers can also increase the total registration concurrency that Foreman can handle. In our measurements, a setup with 2 workers was able to handle up to 480 concurrent registrations, and adding more workers raised that limit.

4.2.5. Puma threads configuration

Configure Puma threads to improve parallel host registration performance and optimize resource utilization per worker process in Foreman.

For example, we have compared these two setups:

Foreman VM with 8 CPUs, 40 GiB RAM	Foreman VM with 8 CPUs, 40 GiB RAM
`--foreman-foreman-service-puma-threads-max 16`	`--foreman-foreman-service-puma-threads-max 8`
`--foreman-foreman-service-puma-workers 2`	`--foreman-foreman-service-puma-workers 4`

Using more workers with the same total number of threads results in a speedup of about 11% in a highly concurrent registration scenario. Moreover, adding more workers did not consume more CPU or RAM but delivered more performance.
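
The two setups compared above run the same total number of Puma threads, which is the number of workers multiplied by the maximum threads per worker. The following short sketch makes that arithmetic explicit; the function name total_threads is hypothetical:

```shell
#!/bin/sh
# Total Puma threads = workers * maximum threads per worker.
total_threads() {
    echo $(( $1 * $2 ))
}

total_threads 2 16    # first setup: prints 32
total_threads 4 8     # second setup: prints 32
```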

4.3. Apache HTTPD performance tuning

Tune Apache httpd to improve Foreman performance and increase concurrency for Foreman web UI and API requests.

Apache httpd is a core part of Foreman and acts as the web server that handles requests sent through the Foreman web UI or exposed APIs.

4.4. Tuning Dynflow

When you have many hosts checking in and running concurrent tasks, you can increase the number of Dynflow workers to improve Foreman task execution performance.

Dynflow is the workflow management system and task orchestrator. It is a Foreman plugin that executes the different Foreman tasks out of order. When many hosts check in on Foreman and run several tasks, Dynflow benefits from additional tuning that specifies how many executors it can launch.

For more information about the tunings involved related to Dynflow, see https://foreman.example.com/foreman_tasks/sidekiq.

Foreman contains a Dynflow service called dynflow-sidekiq that performs tasks scheduled by Dynflow. Sidekiq workers can be grouped into various queues to ensure that many tasks of one type do not block the execution of tasks of another type.

The Foreman community recommends increasing the number of Sidekiq workers to scale the Foreman tasking system for bulk concurrent tasks. There are two options available:

You can increase the number of threads used by a worker (the worker's concurrency). This has limited impact for values larger than five because of how Ruby implements thread concurrency.
You can increase the number of workers, which is recommended.

Procedure

Increase the number of workers from one to three while keeping the concurrency of each worker at five threads:

# foreman-installer --foreman-dynflow-worker-instances 3    # optionally, add --foreman-dynflow-worker-concurrency 5

Optional: Check if there are three worker services:

# systemctl -a | grep dynflow-sidekiq@worker-[0-9]
dynflow-sidekiq@worker-1.service        loaded    active   running   Foreman jobs daemon - worker-1 on sidekiq
dynflow-sidekiq@worker-2.service        loaded    active   running   Foreman jobs daemon - worker-2 on sidekiq
dynflow-sidekiq@worker-3.service        loaded    active   running   Foreman jobs daemon - worker-3 on sidekiq
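
If you want to script this check, you can count the matching units instead of reading the listing by eye. The following is a minimal sketch that reuses the unit name pattern from the command above; the helper name has_dynflow_workers is hypothetical:

```shell
#!/bin/sh
# Succeeds when the unit listing on stdin contains exactly $1 units
# whose names match dynflow-sidekiq@worker-<number>.
has_dynflow_workers() {
    expected=$1
    # grep -c exits non-zero when the count is 0, so keep the "0" it prints.
    actual=$(grep -c 'dynflow-sidekiq@worker-[0-9]' || true)
    [ "$actual" -eq "$expected" ]
}

# Example with literal lines. On a real system, pipe the listing instead:
#   systemctl -a | has_dynflow_workers 3
printf '%s\n' \
    'dynflow-sidekiq@worker-1.service loaded active running' \
    'dynflow-sidekiq@worker-2.service loaded active running' \
    'dynflow-sidekiq@worker-3.service loaded active running' |
    has_dynflow_workers 3 && echo "found 3 worker services"
```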

4.5. Pull-based REX transport tuning

You can adjust MQTT broker settings and client polling intervals to optimize pull-based remote execution performance for the scale and network conditions of your environment.

For more information, see Transport modes for remote execution in Managing hosts.

4.5.1. Increasing host limit for pull-based REX transport

Tune the mosquitto MQTT server on your Smart Proxy to support more than the default limit of 1024 connected hosts when scaling pull-based remote execution to larger deployments.

Prerequisites

You have enabled pull-based remote execution on your Smart Proxy. For more information, see Configuring pull-based transport for remote execution in Installing a Smart Proxy Server 3.19 on Debian/Ubuntu.

Procedure

On your Smart Proxy, set the upper limit of connected hosts for pull-based remote execution in /etc/foreman-installer/custom-hiera.yaml:
```
systemd::dropin_files:
  limits.conf:
    unit: mosquitto.service
    content: "[Service]\nLimitNOFILE=5000\n"
```
This example configures the mosquitto service on your Smart Proxy to handle up to 5000 hosts.
Re-run the installer for the changes to take effect:
```
# foreman-installer
```
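
After the installer finishes, you can check that the new limit is in effect by querying the unit with systemctl show. The following is a minimal sketch, assuming the property is printed as "LimitNOFILE=<n>". The helper names nofile_limit and limit_covers_hosts are hypothetical:

```shell
#!/bin/sh
# Extract <n> from a "LimitNOFILE=<n>" line, as printed by
#   systemctl show mosquitto.service --property=LimitNOFILE
nofile_limit() {
    sed -n 's/^LimitNOFILE=\([0-9][0-9]*\)$/\1/p'
}

# Succeeds when the limit read from stdin is at least the host count in $1.
limit_covers_hosts() {
    limit=$(nofile_limit)
    [ -n "$limit" ] && [ "$limit" -ge "$1" ]
}

# Example with a literal line. On a real Smart Proxy, pipe the command instead:
#   systemctl show mosquitto.service --property=LimitNOFILE | limit_covers_hosts 4000
echo "LimitNOFILE=5000" | limit_covers_hosts 4000 && echo "limit covers 4000 hosts"
```

The broker also uses a few file descriptors for its own listeners and files, so leave some headroom above the expected number of hosts.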

4.6. Tuning PostgreSQL

Configure PostgreSQL database settings in Foreman to improve response times and operational performance. Tuning PostgreSQL reduces database query latency and enhances overall system responsiveness, especially in environments with high database activity.

The PostgreSQL authors recommend disabling Transparent Hugepage on servers running PostgreSQL. For more information, see Transparent Hugepage and database performance.

You can apply a set of tunings to PostgreSQL to improve its response times. These tunings modify the postgresql.conf file. You can use them to tune your Foreman instance irrespective of a tuning profile.

Procedure

Append the following to /etc/foreman-installer/custom-hiera.yaml to tune PostgreSQL:
```
postgresql::server::config_entries:
  max_connections: 1000
  shared_buffers: 2GB
  work_mem: 8MB
  autovacuum_vacuum_cost_limit: 2000
```
This tuning configuration alters the following keys:

max_connections

The key defines the maximum number of connections that can be accepted by the PostgreSQL processes that are running.

shared_buffers

The shared buffers define the memory used by all the active connections inside PostgreSQL to store the data for the different database operations. An optimal value varies between 2 GiB and a maximum of 25% of your total system memory, depending on the frequency of the operations being conducted on Foreman.

work_mem

The work_mem setting is the memory that each PostgreSQL process can allocate for an operation, such as a sort or hash table, to store the intermediate results of that operation. Setting this value to 8 MB should be more than enough for most of the intensive operations on Foreman.

autovacuum_vacuum_cost_limit

The key defines the cost limit for the vacuuming operations that the autovacuum process runs to clean up dead tuples in the database relations. The cost limit bounds how much work the process performs in a single run before it pauses. The Foreman community recommends setting the value to 2000, as in the medium, large, extra-large, and extra-extra-large profiles, based on the general load that Foreman puts on the PostgreSQL server process.
Reconfigure your Foreman:
```
# foreman-installer
```
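
To sanity-check the values above against your hardware, the following sketch computes the 25% upper bound for shared_buffers and the memory used if every allowed connection ran one work_mem-sized operation at the same time. The function names are hypothetical, and the second figure is only a rough estimate because a single query can run more than one such operation:

```shell
#!/bin/sh
# Upper bound for shared_buffers: 25% of total system memory.
# Argument: total memory in GiB. Result in MiB.
shared_buffers_max_mib() {
    echo $(( $1 * 1024 / 4 ))
}

# Memory used if every connection runs one work_mem-sized operation at once.
# Arguments: max_connections, work_mem in MB. Result in MB.
work_mem_total_mb() {
    echo $(( $1 * $2 ))
}

shared_buffers_max_mib 32    # prints 8192
work_mem_total_mb 1000 8     # prints 8000
```

With the example configuration, a server with 32 GiB of RAM could raise shared_buffers to at most 8 GiB, and 1000 connections at 8 MB each could use about 8000 MB for intermediate results.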

4.6.1. Benchmarking raw database performance

Use pgbench to measure raw PostgreSQL performance and validate storage behavior on your Foreman environment. Benchmarking helps you establish baseline performance metrics and verify that tuning changes improve database responsiveness.

Prerequisites

A non-production system with a valid backup.

Warning

Never run benchmarks on a production system or without a valid backup.
Before you start testing, determine how large the database files are. Testing with a very small database does not produce meaningful results. For example, if the database is only 20 GiB and the buffer pool is 32 GiB, the benchmark does not show problems with a large number of connections because the data fits entirely in the buffer pool.

Procedure

To identify the largest tables by disk space for Foreman, use the postgres-size-report script in the satellite-support Git repository. For more information, see github.com/RedHatSatellite/satellite-support.
Ensure the PostgreSQL data directory /var/lib/postgresql has enough space for the benchmark workload. You might need to resize the volume to 100 GiB or more, depending on how long the benchmark runs and how much data it generates.

Note

The choice of file system for the PostgreSQL data directory might also affect the results.
Install the pgbench utility:
```
# apt install postgresql-contrib
```
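
A typical pgbench session first initializes test tables and then runs a timed workload. The following sketch only prints such a command sequence so that you can review it before running anything. The function name pgbench_plan, the database name benchdb, and all numeric values are illustrative assumptions; the sketch also assumes that a dedicated benchmark database already exists and that pgbench is on your PATH:

```shell
#!/bin/sh
# Print (do not run) a pgbench initialize-and-run sequence.
# Arguments: database name, scale factor, client count, duration in seconds.
pgbench_plan() {
    db=$1
    scale=$2
    clients=$3
    seconds=$4
    # -i initializes the benchmark tables, -s sets the scale factor.
    echo "pgbench -i -s $scale $db"
    # -c sets the number of concurrent clients, -T the run time in seconds.
    echo "pgbench -c $clients -T $seconds $db"
}

pgbench_plan benchdb 100 10 60
```

Choose a scale factor large enough that the generated data does not fit entirely in memory, for the reason given in the warning above.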

4.7. Redis tuning

Understanding Redis tuning in Foreman helps you optimize caching and task tracking performance. Properly configured Redis ensures stable memory consumption and improves responsiveness of Foreman operations.

Redis is an in-memory data store. Foreman uses Redis for caching and Dynflow uses Redis to track its tasks. Given the way Foreman uses Redis, its memory consumption should be stable.

The Redis authors recommend disabling Transparent Hugepage on servers running Redis.

Additional resources

Transparent Hugepage and database performance
