Eking out some Nextcloud performance

Nextcloud is notorious in the self-hosted community for being difficult to get to a decent level of performance. After enabling the basic caching with both APCu and Redis, there are several options to trim some fat. Once all the easy stuff is taken care of, the hidden bottlenecks are where I am focusing my efforts. So far I have had some success by switching to UNIX sockets in my dockerised Nextcloud deployment.

Generally I've found:

  • Shipping file logging off to syslog made a noticeable visual difference over logging to the nextcloud.log file.
  • PostgreSQL is often touted as a decent option for easy performance gains.
  • Using the Preview Generator app alongside Imaginary makes images less of an issue for general browsing.

But what else can you do after that? Try to find the bottlenecks in your setup. Be it spinning rust vs SSD vs M.2 drives, there is usually some form of low-hanging fruit that is causing issues. A big potential issue is, of course, your abstraction layer, in my case Docker. Docker adds some minor overheads to any service, a trade-off for simplifying deployment and replication, and one of these overheads is the networking stack. My understanding is that Docker's networking, when not in host mode, acts as a NAT, even when one container is talking to another. One way to bypass the networking overhead between local services is to use UNIX sockets.
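
To get a feel for the difference between the two transports, here is a minimal Python sketch that times small ping/pong round trips over loopback TCP and over a UNIX domain socket on the same machine. It is not part of my setup: the function names are made up for illustration, and it does not go through Docker's bridge network, so it only hints at the saving rather than measuring it. Results will vary from machine to machine.

```python
import os
import socket
import tempfile
import threading
import time


def _echo_once(server: socket.socket) -> None:
    """Accept one connection and echo data back until the peer closes."""
    conn, _ = server.accept()
    with conn:
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)


def time_round_trips(family: int, address, rounds: int = 2000) -> float:
    """Return the seconds taken for `rounds` 4-byte ping/pong exchanges."""
    server = socket.socket(family, socket.SOCK_STREAM)
    server.bind(address)
    server.listen(1)
    thread = threading.Thread(target=_echo_once, args=(server,), daemon=True)
    thread.start()

    client = socket.socket(family, socket.SOCK_STREAM)
    client.connect(server.getsockname())
    start = time.perf_counter()
    for _ in range(rounds):
        client.sendall(b"ping")
        received = b""
        while len(received) < 4:
            received += client.recv(4 - len(received))
    elapsed = time.perf_counter() - start

    client.close()   # lets the echo thread see EOF and exit
    thread.join()
    server.close()
    return elapsed


def compare(rounds: int = 2000) -> dict:
    """Time the same exchange over loopback TCP and a temporary UNIX socket."""
    with tempfile.TemporaryDirectory() as tmp:
        unix_path = os.path.join(tmp, "bench.sock")  # hypothetical scratch path
        return {
            "tcp": time_round_trips(socket.AF_INET, ("127.0.0.1", 0), rounds),
            "unix": time_round_trips(socket.AF_UNIX, unix_path, rounds),
        }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f} s")
```

Binding TCP to port 0 lets the kernel pick a free port, so the sketch should not clash with anything already listening.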

In researching how to achieve this I found @jonbaldie's post on How to Connect to Redis with Unix Sockets in Docker. A few modifications and I was ready to test and verify that this made a difference.

Setup

These are the modifications I made to my docker-compose file. Note that I have made a few changes to avoid the need to set the folder and socket permissions to 777. This is mainly handled by setting each container's group ID to that of the www-data group (GID 33) from the Nextcloud app container.

version: '2'

services:
    #Temporary busybox container to set correct permissions to shared socket folder
    tmp:
      image: busybox
      command: sh -c "chown -R 33:33 /tmp/docker/ && chmod -R 770 /tmp/docker/"
      volumes:
        - /tmp/docker/

    db:
      container_name: nextcloud_db
      image: postgres:14-alpine
      restart: always
      volumes:
        - ./volumes/postgresql:/var/lib/postgresql/data
        - /etc/localtime:/etc/localtime:ro
        - /etc/timezone:/etc/timezone:ro
      env_file:
        - db.env
      # Unix socket modifications
      # Run as a member of the www-data GID 33 group but keep postgres uid as 70
      user: "70:33"
      # Add the /tmp/docker/ socket folder to postgres
      command: postgres -c unix_socket_directories='/var/run/postgresql/,/tmp/docker/'
      depends_on:
        - tmp
      # Add shared volume from Temporary busybox container
      volumes_from:
        - tmp

    redis:
      container_name: nextcloud_redis
      image: redis:alpine
      restart: always
      volumes:
        - /etc/localtime:/etc/localtime:ro
        - /etc/timezone:/etc/timezone:ro
      # Unix socket modifications
        - ./volumes/redis.conf:/etc/redis.conf
      # Run redis with custom config
      command: redis-server /etc/redis.conf
      # Run as a member of the www-data GID 33 group but keep redis uid as 999
      user: "999:33"
      depends_on:
        - tmp
      # Add shared volume from Temporary busybox container
      volumes_from:
        - tmp

    app:
      container_name: nextcloud_app
      image: nextcloud:apache
      restart: always
      ports:
        - 127.0.0.1:9001:80
      volumes:
        - ./volumes/nextcloud:/var/www/html
        - ./volumes/php.ini:/usr/local/etc/php/conf.d/zzz-custom.ini
        - /etc/localtime:/etc/localtime:ro
        - /etc/timezone:/etc/timezone:ro
      depends_on:
        - db
        - redis
      # Unix socket modifications
      # Add shared volume from Temporary busybox container
      volumes_from:
        - tmp

This is the redis.conf file that tells Redis to listen only on the UNIX socket, and what permissions to use on said socket. Note that I have a password enabled here; it is not really needed if Redis is not exposed publicly, but I've used it as best practice.

# 0 = do not listen on a port
port 0

# listen on localhost only
bind 127.0.0.1

# create a unix domain socket to listen on
unixsocket /tmp/docker/redis.sock

# set permissions for the socket
unixsocketperm 770

requirepass [password]
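
A quick way to confirm that Redis answers on the socket is to send it AUTH and PING by hand. The following is a minimal Python sketch using only the standard library; the helper names are hypothetical, and it assumes the socket path and password placeholder from the config above. It speaks the plain Redis wire protocol (RESP) rather than relying on a client library.

```python
import socket


def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        raw = part.encode()
        out.append(b"$" + str(len(raw)).encode() + b"\r\n" + raw + b"\r\n")
    return b"".join(out)


def ping_over_socket(path: str, password: str, timeout: float = 1.5) -> list:
    """Send AUTH then PING over a UNIX socket and return the two reply lines."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        client.connect(path)
        client.sendall(encode_command("AUTH", password) + encode_command("PING"))
        buffer = b""
        # Both replies are single-line (status or error), so wait for two CRLFs.
        while buffer.count(b"\r\n") < 2:
            chunk = client.recv(1024)
            if not chunk:
                break
            buffer += chunk
    return buffer.decode().split("\r\n")[:2]


if __name__ == "__main__":
    # Path taken from redis.conf above; replace the placeholder with your password.
    print(ping_over_socket("/tmp/docker/redis.sock", "[password]"))
```

A healthy server should answer with a +OK line followed by +PONG; a line starting with -ERR or -WRONGPASS points at the password, while a permission error on connect points at the socket's owner, group or mode.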

Finally, the Nextcloud config, which I updated to reflect the connection changes:

'dbtype' => 'pgsql',
'dbhost' => '/tmp/docker/',
'dbname' => 'nextcloud',
'dbuser' => 'nextcloud',
'dbpassword' => '{password}',

'memcache.local' => '\\OC\\Memcache\\APCu',
'memcache.distributed' => '\\OC\\Memcache\\Redis',
'memcache.locking' => '\\OC\\Memcache\\Redis',
'redis' =>
array (
  'host' => '/tmp/docker/redis.sock',
  'port' => 0,
  'dbindex' => 0,
  'password' => '{password}',
  'timeout' => 1.5,
),
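
If Nextcloud cannot connect after these changes, the first thing worth checking is whether both socket files exist and are usable by the www-data group. Below is a minimal Python sketch, with a hypothetical helper name, that reports the type, mode and ownership of a path. It assumes the shared /tmp/docker/ folder from the compose file and Postgres running on its default port 5432, in which case its socket file is named .s.PGSQL.5432.

```python
import os
import stat


def describe_socket(path: str) -> dict:
    """Report whether `path` is a UNIX socket and who is allowed to use it."""
    info = os.stat(path)
    group_read = bool(info.st_mode & stat.S_IRGRP)
    group_write = bool(info.st_mode & stat.S_IWGRP)
    return {
        "is_socket": stat.S_ISSOCK(info.st_mode),
        "mode": oct(stat.S_IMODE(info.st_mode)),
        "uid": info.st_uid,
        "gid": info.st_gid,
        "group_rw": group_read and group_write,
    }


if __name__ == "__main__":
    # Assumed paths: the Redis socket from redis.conf and the Postgres
    # socket for the default port inside the shared folder.
    for path in ("/tmp/docker/redis.sock", "/tmp/docker/.s.PGSQL.5432"):
        try:
            print(path, describe_socket(path))
        except FileNotFoundError:
            print(path, "is missing")
```

Run inside the Nextcloud container, both entries should show is_socket as True, and with this setup I would expect a gid of 33 with group read and write access.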

Verifying the changes made a difference

There is not much point in doing this without verification, otherwise we are all just participating in a cargo cult seeking performance enlightenment. With that in mind I set out to do some very basic benchmarks to ensure the performance gain I felt when navigating my Nextcloud install was in fact happening.

I did all my testing inside my Nextcloud container to better simulate a real-world result. I temporarily modified redis.conf to allow both socket and TCP/IP connections, then installed the redis-tools and postgresql-contrib packages to get the tools required.

# 0 = do not listen on a port
# port 0
port 6379

# listen on localhost only
# bind 127.0.0.1
bind 0.0.0.0

sudo docker exec -it nextcloud_app bash

apt update && apt install redis-tools && apt install postgresql-contrib

I then performed the same tests as @jonbaldie, using the commands time redis-benchmark -a [password] -h redis -p 6379 and time redis-benchmark -a [password] -s /tmp/docker/redis.sock.

REDIS    TCP (s)   UNIX (s)   % Diff
Real     242.8     165.5      32%
User     63.4      60.9       4%
Sys      132.1     70.6       47%
Total    438.4     297.1      32%

As you can see, on my system I saw a staggering 32% difference, compared to @jonbaldie's 13%. Clearly the Redis socket is a very worthwhile modification.
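
For anyone wanting to check the arithmetic, the % Diff column is the time saved relative to the TCP run. A minimal Python sketch, with a hypothetical function name and the figures copied from the table above:

```python
# Timings in seconds copied from the Redis table above: (TCP, UNIX).
REDIS_TIMES = {
    "Real": (242.8, 165.5),
    "User": (63.4, 60.9),
    "Sys": (132.1, 70.6),
    "Total": (438.4, 297.1),
}


def percent_saved(tcp_seconds: float, unix_seconds: float) -> float:
    """Time saved by the UNIX socket as a percentage of the TCP time."""
    return (tcp_seconds - unix_seconds) / tcp_seconds * 100


if __name__ == "__main__":
    for row, (tcp, unix) in REDIS_TIMES.items():
        print(f"{row}: {percent_saved(tcp, unix):.0f}%")
```

Most of the saving sits in the Sys row, which is the time spent in the kernel and is where the networking stack does its work.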

Using some of what I learned from reading this article, I now wanted to test my Postgres database using its benchmarking tool, pgbench. I did a quick database backup just in case, but it shouldn't harm the Nextcloud db, as it only adds the tables pgbench_accounts, pgbench_branches, pgbench_tellers and pgbench_history to perform the tests.

First, time the initialisation of the test tables over TCP:

pgbench -h db -i -p 5432 -U nextcloud -d nextcloud

...

done in 1.85 s (drop tables 0.00 s, create tables 0.13 s, client-side generate 0.60 s, vacuum 0.60 s, primary keys 0.51 s)

Then I ran three tests using the command pgbench -h db -c 10 -p 5432 -U nextcloud -d nextcloud, simulating 10 clients over TCP.

Postgres TCP                               Run 1       Run 2       Run 3       Average
latency average (ms)                       265.887     333.644     280.873     293.468
tps (including connections establishing)   37.60993    29.972067   35.603308   34.395102
tps (excluding connections establishing)   38.089613   30.24576    35.997626   34.777666

Clean up in between tests:

psql -h /tmp/docker/ -U nextcloud -d nextcloud

DROP TABLE pgbench_accounts, pgbench_branches, pgbench_tellers, pgbench_history;

First, time the initialisation of the test tables over the UNIX socket:

pgbench -h /tmp/docker/ -i -U nextcloud -d nextcloud

...

done in 1.42 s (drop tables 0.00 s, create tables 0.11 s, client-side generate 0.68 s, vacuum 0.25 s, primary keys 0.38 s).

Then I ran three tests using the command pgbench -h /tmp/docker/ -c 10 -U nextcloud -d nextcloud, simulating 10 clients over the UNIX socket.

Postgres UNIX                              Run 1       Run 2       Run 3       Average
latency average (ms)                       291.566     290.129     222.446     268.047
tps (including connections establishing)   34.297528   34.467479   44.954712   37.906573
tps (excluding connections establishing)   34.397523   34.570084   45.137941   38.035183

My results show a much more modest performance difference with the database. But it's still an unambiguous improvement so well worth the minor amount of effort.

Postgres UNIX vs TCP                       % Diff
latency average                            9%
tps (including connections establishing)   10%
tps (excluding connections establishing)   9%
testing tables initialisation              23%
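
The averages and percentages can be reproduced from the individual runs. A minimal Python sketch, with hypothetical names and the figures copied from the pgbench output above; a negative change means the UNIX socket figure is lower, which is the good direction for latency and initialisation time:

```python
from statistics import mean

# Three pgbench runs per transport, copied from the tables above.
TCP = {
    "latency_ms": [265.887, 333.644, 280.873],
    "tps_incl": [37.60993, 29.972067, 35.603308],
    "tps_excl": [38.089613, 30.24576, 35.997626],
}
UNIX = {
    "latency_ms": [291.566, 290.129, 222.446],
    "tps_incl": [34.297528, 34.467479, 44.954712],
    "tps_excl": [34.397523, 34.570084, 45.137941],
}
# Time taken by `pgbench -i` on each transport, in seconds.
INIT_SECONDS = {"tcp": 1.85, "unix": 1.42}


def relative_change(before: float, after: float) -> float:
    """Percentage change going from `before` to `after` (negative = lower)."""
    return (after - before) / before * 100


def summarise() -> dict:
    """Compare the per-metric averages of the UNIX runs against the TCP runs."""
    changes = {
        metric: relative_change(mean(TCP[metric]), mean(UNIX[metric]))
        for metric in TCP
    }
    changes["init_seconds"] = relative_change(
        INIT_SECONDS["tcp"], INIT_SECONDS["unix"]
    )
    return changes


if __name__ == "__main__":
    for metric, change in summarise().items():
        print(f"{metric}: {change:+.1f}%")
```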

Finding, testing and minimising bottlenecks is possibly the most difficult task for any selfhosting admin. I hope you found this of use in your own bottleneck hunting journey.
