1. Introduction to performance tuning
This document provides guidelines for tuning Foreman for performance and scalability.
2. System requirements for tuning
You can find the hardware and software requirements in Preparing environment for Foreman server installation in Installing Foreman Server 3.16 on Debian/Ubuntu.
3. Determining hardware and operating system configuration
- CPU: The more physical cores available to Foreman, the higher the throughput it can achieve for its tasks. CPU-intensive components such as Puppet and PostgreSQL benefit significantly from a higher number of available CPU cores.
- Memory: The more memory available on the system running Foreman, the better the response times of Foreman operations. Because Foreman uses PostgreSQL as its database, additional memory combined with the tunings in this document improves application response times by keeping more data in memory.
- Disk: Deploy Foreman on SSDs to avoid performance bottlenecks. Components such as PostgreSQL benefit from the lower latency of SSDs compared to HDDs.
- Network: Network performance affects the communication between the Foreman server and Smart Proxies. A reliable network with minimal jitter and low latency is required for trouble-free operations such as synchronization between the Foreman server and Smart Proxies; at a minimum, ensure that the network does not cause connection resets or similar failures.
- Server power management: By default, your server is likely configured to conserve power. While this keeps maximum power consumption in check, it also lowers the performance that Foreman can achieve. For a server running Foreman, configure the BIOS to run the system in performance mode to raise the maximum performance levels that Foreman can achieve; a quick way to check the current CPU frequency governor is shown after this list.
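To check which CPU frequency scaling governor the operating system is currently using (a rough indicator of whether power saving is active), you can inspect sysfs; note that this file may not exist on some virtual machines or older hardware:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor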
3.1. Enabling tuned profiles
On bare metal, the Foreman community recommends running the throughput-performance tuned profile on Foreman server and Smart Proxies.
On virtual machines, the Foreman community recommends running the virtual-guest profile.
- Install tuned on your Smart Proxy:
  # apt install tuned
- Check if tuned is running:
  # systemctl status tuned
- If tuned is not running, enable it:
  # systemctl enable --now tuned
- Optional: View a list of available tuned profiles:
  # tuned-adm list
- Enable a tuned profile depending on your scenario:
  # tuned-adm profile "My_Tuned_Profile"
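For example, on a bare-metal Foreman server you would typically apply the throughput-performance profile recommended above and confirm that it is active:
# tuned-adm profile throughput-performance
# tuned-adm active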
3.2. Disable Transparent Hugepage
Transparent Hugepage is a memory management technique used by the Linux kernel to reduce the overhead of Translation Lookaside Buffer (TLB) lookups by using larger memory pages. Because databases tend to have sparse rather than contiguous memory access patterns, database workloads often perform poorly when Transparent Hugepage is enabled. To improve PostgreSQL and Redis performance, disable Transparent Hugepage. In deployments where the databases run on separate servers, there may be a small benefit to using Transparent Hugepage on the Foreman server only.
For more information on how to disable Transparent Hugepage, see How to disable transparent hugepages (THP) on Red Hat Enterprise Linux.
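One common approach on Debian/Ubuntu systems (the article above targets Red Hat Enterprise Linux) is to add transparent_hugepage=never to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, regenerate the GRUB configuration, reboot, and verify the setting; the bracketed value in the output should read [never]:
# update-grub
# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]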
4. Configuring Foreman for performance
Foreman includes several components that communicate with each other. You can tune these components independently of each other to achieve the maximum possible performance for your scenario.
4.1. Applying configurations
The following sections suggest various tunables and how to apply them. Always test these changes in a non-production environment first, with a valid backup and a proper outage window, because in most cases a Foreman restart is required.
It is also good practice to set up monitoring before applying any change, because it allows you to evaluate the effect of the change. Although we try hard to mimic real-world environments, our testing environment might differ significantly from yours.
Optional: After any change, run this quick Foreman health check:
# foreman-maintain health check
4.2. Puma tunings
Puma is a Ruby application server that serves all Foreman-related requests from hosts. For any Foreman configuration that must handle a large number of hosts or frequent operations, it is important to tune Puma appropriately.
4.2.1. Puma workers and threads auto-tuning
If you do not provide any Puma workers and thread values with foreman-installer, or they are not present in your Foreman configuration, foreman-installer configures a balanced number of workers.
It follows this formula:
min(CPU_COUNT * 1.5, RAM_IN_GB - 1.5)
This is suitable for most cases, but with some usage patterns further tuning is needed, for example to limit the amount of resources dedicated to Puma so that other Foreman components can use them. Each Puma worker consumes around 1 GiB of RAM.
To check the currently configured values and the running workers, inspect the systemd drop-in created by the installer and the status of the foreman service:
# cat /etc/systemd/system/foreman.service.d/installer.conf
# systemctl status foreman
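The drop-in file written by the installer typically contains Environment lines for the Puma settings. The snippet below is illustrative only; the exact variable names and values may differ between Foreman versions and your hardware:
[Service]
Environment=FOREMAN_PUMA_THREADS_MIN=5
Environment=FOREMAN_PUMA_THREADS_MAX=5
Environment=FOREMAN_PUMA_WORKERS=4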
4.2.2. Manually tuning Puma workers and threads count
If you decide not to rely on Puma workers and threads auto-tuning, you can apply custom numbers for these tunables. The example below uses 2 workers and 5 threads:
# foreman-installer \
  --foreman-foreman-service-puma-workers 2 \
  --foreman-foreman-service-puma-threads-max 5
4.2.3. Puma workers and threads recommendations
To recommend thread and worker configurations for the different tuning profiles, we conducted Puma tuning tests on Foreman with different tuning profiles. The main test was concurrent host registration with various combinations of workers and threads. Our recommendations are based purely on concurrent registration performance, so they might not reflect your exact use case.
| Name | Number of hosts | RAM | Cores | Recommended Puma threads | Recommended Puma workers |
|---|---|---|---|---|---|
| default | 0 – 5000 | 20 GiB | 4 | 16 | 4 – 6 |
| medium | 5000 – 10000 | 32 GiB | 8 | 16 | 8 – 12 |
| large | 10000 – 20000 | 64 GiB | 16 | 16 | 12 – 18 |
| extra-large | 20000 – 60000 | 128 GiB | 32 | 16 | 16 – 24 |
| extra-extra-large | 60000+ | 256 GiB+ | 48+ | 16 | 20 – 26 |
Tuning the number of workers is the more important aspect here, and in some cases we have seen up to a 52% performance increase. Although the installer uses 5 threads by default, we recommend 16 threads with all the tuning profiles in the table above, because we have seen up to a 23% performance increase with 16 threads (14% with 8 threads and 10% with 32 threads) compared to a setup with 4 threads.
To arrive at these recommendations, we used a concurrent registration test case, which is a very specific use case. Your Foreman might have a more balanced use case (not only registrations), so keeping the default 5 threads is a good choice as well.
These are some of the measurements that led us to these recommendations:
| | 4 workers, 4 threads | 4 workers, 8 threads | 4 workers, 16 threads | 4 workers, 32 threads |
|---|---|---|---|---|
| Improvement | 0% | 14% | 23% | 10% |
Use 4 – 6 workers on a default setup (4 CPUs) – we have seen about 25% higher performance with 5 workers when compared to 2 workers, but 8% lower performance with 8 workers when compared to 2 workers – see table below:
| | 2 workers, 16 threads | 4 workers, 16 threads | 6 workers, 16 threads | 8 workers, 16 threads |
|---|---|---|---|---|
| Improvement | 0% | 26% | 22% | -8% |
Use 8 – 12 workers on a medium setup (8 CPUs) – see table below:
| | 2 workers, 16 threads | 4 workers, 16 threads | 8 workers, 16 threads | 12 workers, 16 threads | 16 workers, 16 threads |
|---|---|---|---|---|---|
| Improvement | 0% | 51% | 52% | 52% | 42% |
Use 16 – 24 workers on a 32 CPU setup. This was tested on a machine with 90 GiB of RAM, and memory turned out to be a limiting factor because the system started swapping; a proper extra-large setup should have 128 GiB. A higher number of workers was problematic at the higher registration concurrency levels we tested, so we cannot recommend it.
| | 4 workers, 16 threads | 8 workers, 16 threads | 16 workers, 16 threads | 24 workers, 16 threads | 32 workers, 16 threads | 48 workers, 16 threads |
|---|---|---|---|---|---|---|
| Improvement | 0% | 37% | 44% | 52% | too many failures | too many failures |
4.2.4. Configuring Puma workers
If you have enough CPUs, adding more workers improves performance. For example, we compared a Foreman VM with 8 CPUs and 40 GiB of RAM against a Foreman VM with 16 CPUs and 40 GiB of RAM.
In the 8 CPU setup, changing the number of workers from 2 to 16 improved concurrent registration time by 36%. In the 16 CPU setup, the same change resulted in a 55% improvement.
Adding more workers can also increase the total registration concurrency that Foreman can handle. In our measurements, a setup with 2 workers was able to handle up to 480 concurrent registrations, and adding more workers improved this further.
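A change like this can be applied with the installer option introduced earlier; the worker count below is only an illustration, not a sizing recommendation for your hardware:
# foreman-installer --foreman-foreman-service-puma-workers 16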
4.2.5. Configuring Puma threads
More threads allow hosts to be registered in parallel in less time. For example, we compared two setups, both on a Foreman VM with 8 CPUs and 40 GiB of RAM, that used different numbers of workers with the same total number of threads.
Using more workers with the same total number of threads results in about an 11% speedup in a highly concurrent registration scenario. Moreover, adding more workers did not consume more CPU or RAM, but delivered more performance.
4.3. Apache HTTPD performance tuning
Apache httpd is a core part of Foreman and acts as the web server handling requests made through the Foreman web UI or the exposed APIs. To increase the concurrency of operations, httpd is the first point where tuning can help boost the performance of your Foreman.
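As an illustrative, hedged example of where such tuning is typically applied, you can raise the Apache event MPM limits through /etc/foreman-installer/custom-hiera.yaml and then rerun foreman-installer. The parameter names come from the Puppet apache module used by the installer, and the values below are examples only, not recommendations:
apache::mod::event::serverlimit: 64
apache::mod::event::maxrequestworkers: 1024
apache::mod::event::maxconnectionsperchild: 4000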
4.4. Dynflow tuning
Dynflow is the workflow management system and task orchestrator. It is a Foreman plugin and executes the different Foreman tasks in an out-of-order manner. When many clients check in on Foreman and run several tasks, Dynflow can benefit from additional tuning that specifies how many executors it can launch.
For more information about Dynflow tuning, see https://foreman.example.com/foreman_tasks/sidekiq.
Foreman contains a Dynflow service called dynflow-sidekiq that performs tasks scheduled by Dynflow. Sidekiq workers can be grouped into various queues to ensure that many tasks of one type do not block the execution of tasks of another type.
The Foreman community recommends increasing the number of Sidekiq workers to scale the Foreman tasking system for bulk concurrent tasks. There are two options available:
- You can increase the number of threads used by a worker (the worker's concurrency). This has limited impact for values larger than five due to the Ruby implementation of thread concurrency.
- You can increase the number of workers, which is recommended.
- Increase the number of workers from one to three while keeping five threads (concurrency) per worker:
  # foreman-installer --foreman-dynflow-worker-instances 3   # optionally, add --foreman-dynflow-worker-concurrency 5
- Optional: Check if there are three worker services:
  # systemctl -a | grep dynflow-sidekiq@worker-[0-9]
  dynflow-sidekiq@worker-1.service loaded active running Foreman jobs daemon - worker-1 on sidekiq
  dynflow-sidekiq@worker-2.service loaded active running Foreman jobs daemon - worker-2 on sidekiq
  dynflow-sidekiq@worker-3.service loaded active running Foreman jobs daemon - worker-3 on sidekiq
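Each dynflow-sidekiq@ worker reads its settings from a YAML file under /etc/foreman/dynflow/. The snippet below is only an illustrative sketch of such a file; the exact file name and keys may differ between Foreman versions, so treat it as an assumption rather than a reference:
:concurrency: 5
:queues:
  - default
  - remote_execution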
4.5. Pull-based REX transport tuning
Foreman has a pull-based transport mode for remote execution. This transport mode uses MQTT as its messaging protocol and includes an MQTT client running on each host. For more information, see Transport modes for remote execution in Managing hosts.
4.5.1. Increasing host limit for pull-based REX transport
You can tune the mosquitto MQTT server and increase the number of hosts connected to it.
- Enable pull-based remote execution on your Foreman server or Smart Proxy server:
  # foreman-installer --foreman-proxy-plugin-remote-execution-script-mode pull-mqtt
  Note that your Foreman server or Smart Proxy server can only use one transport mode, either SSH or MQTT.
- Create a config file to increase the default number of hosts accepted by the MQTT service:
  # cat >/etc/systemd/system/mosquitto.service.d/limits.conf <<EOF
  [Service]
  LimitNOFILE=5000
  EOF
  This example sets the limit to allow the mosquitto service to handle 5000 hosts.
- Apply your changes:
  # systemctl daemon-reload
  # systemctl restart mosquitto.service
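Optional: Confirm that the new file descriptor limit is in effect for the running service:
# systemctl show mosquitto.service | grep LimitNOFILE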
4.6. PostgreSQL tuning
Foreman uses the PostgreSQL database to store persistent context across a wide variety of tasks. The database is used extensively to provide Foreman with the data it needs to function smoothly. This makes PostgreSQL a heavily used process, and tuning it can have several benefits for the overall operational responsiveness of Foreman.
The PostgreSQL authors recommend disabling Transparent Hugepage on servers running PostgreSQL. For more information, see Disable Transparent Hugepage.
You can apply a set of tunings to PostgreSQL to improve its response times. These tunings modify the postgresql.conf file.
- To tune PostgreSQL, append the following to /etc/foreman-installer/custom-hiera.yaml:
  postgresql::server::config_entries:
    max_connections: 1000
    shared_buffers: 2GB
    work_mem: 8MB
    autovacuum_vacuum_cost_limit: 2000
- Reconfigure your Foreman:
  # foreman-installer
You can use this to tune your Foreman instance effectively, irrespective of the tuning profile in use.
The above tuning configuration alters the following keys:
- max_connections: Defines the maximum number of connections that the running PostgreSQL processes can accept.
- shared_buffers: Defines the memory used by all active connections inside PostgreSQL to store data for the different database operations. An optimal value varies between 2 GiB and a maximum of 25% of your total system memory, depending on the frequency of the operations performed on Foreman.
- work_mem: The memory allocated per PostgreSQL process and used to store the intermediate results of the operations that the process performs. Setting this value to 8 MB should be more than enough for most intensive operations on Foreman.
- autovacuum_vacuum_cost_limit: Defines the cost limit for the vacuuming operation inside the autovacuum process that cleans up dead tuples in database relations. The cost limit defines the number of tuples that can be processed in a single run by the process. The Foreman community recommends setting the value to 2000, as is done for the medium, large, extra-large, and extra-extra-large profiles, based on the general load that Foreman puts on the PostgreSQL server process.
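Optional: After rerunning foreman-installer, you can confirm that the new values are active. A minimal check using psql (assuming the default local postgres superuser account) might look like this:
# sudo -u postgres psql -c 'SHOW shared_buffers;'
# sudo -u postgres psql -c 'SHOW max_connections;'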
4.6.1. Benchmarking raw DB performance
To get a list of the top table sizes in disk space for Foreman, check the postgres-size-report script in the satellite-support git repository.
You can use the pgbench utility to measure PostgreSQL performance on your system. Note that you may need to resize the PostgreSQL data directory /var/lib/postgresql to 100 GiB, or whatever the benchmark requires to run.
Install it with apt install postgresql-contrib.
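A minimal, hedged pgbench run might look like the following; the database name, scale factor, and client counts are illustrative only, and the benchmark should never be pointed at the production foreman database:
# sudo -u postgres createdb pgbench_test
# sudo -u postgres pgbench -i -s 50 pgbench_test
# sudo -u postgres pgbench -c 10 -j 2 -T 60 pgbench_test
# sudo -u postgres dropdb pgbench_test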
For more information, see github.com/RedHatSatellite/satellite-support.
The choice of filesystem for the PostgreSQL data directory might matter as well.
4.7. Redis tuning
Redis is an in-memory data store. Foreman uses Redis for caching and Dynflow uses Redis to track its tasks. Given the way Foreman uses Redis, its memory consumption should be stable.
The Redis authors recommend disabling Transparent Hugepage on servers running Redis. For more information, see Disable Transparent Hugepage.