System Configuration
This section describes how to configure the varfish-docker-compose
setup.
When running with the varfish-docker-compose
files and the provided database files, VarFish comes preconfigured with sensible default settings and also contains some example datasets to try out.
There are a few things that you might want to tweak.
Please note that there might be more settings that you can change when exploring the VarFish source code but right now their use is not supported for external users.
VarFish & Docker Compose
The recommended (and supported) way to deploy VarFish is using Docker compose.
The VarFish server and its component are not installed on the system itself but rather a number of Docker containers with fixed Docker images are run and work together.
The base docker-compose.yml
file starts a fully functional VarFish server.
Docker Compose supports using so-called override files.
Basically, the mechanism works by providing an docker-compose.override.yml
file that is automatically read at startup when running docker-compose up
.
This file is put into the .gitignore so it is not in the varfish-docker-compose
repository but rather created in the checkouts (e.g., manually or using a configuration management tool such as Ansible).
On startup, Docker Compose will read first the base docker-compose.yml
file.
It will then read the override file (if it exists) and recursively merge both YAML files with the override file overriding taking precedence over the base file.
Note that the recursive merging will be done on YAML dicts only, lists will overwritten.
The mechanism in detail is described in the official documentation.
We provide the following files that you can use/combine into the local docker-compose.override.yml
file of your installation.
docker-compose.override.yml-cert
– use TLS encryption with your own certificate from your favourite certificate provider (by default an automatically generated self-signed certificate will be used by traefik, the reverse proxy).docker-compose.override.yml-letsencrypt
– use letsencrypt to obtain a certificate.docker-compose.override.yml-cadd
– spawn Docker containers for allowing pathogenicity annotation of your variants with CADD.
The overall process is to copy any of the *.override.yml-*
files to docker-compose.yml
and adjusting it to your need (e.g., merging with another such file).
Note that you could also explicitely provide multiple override files but we do not consider this further. For more information on the override mechanism see the official documentation.
The following sections describe the possible adjustment with Docker Compose override files.
TLS / SSL Configuration
The varfish-docker-compose
setup uses traefik as a reverse proxy and must be reconfigured if you want to change the default behaviour of using self-signed certificates.
Use the contents of docker-compose.override.yml-cert
for providing your own certificate.
You have to put the cerver certificate and key into config/traefik/tls/server.crt
and server.key
and then restart the traefik
container.
Make sure to provide the full certificate chain if needed (e.g., for DFN issued certificates).
If your site is reachable from the internet then you can also use the contents of docker-compose.override.yml-letsencrypt
which will use [letsencrypt](https://letsencrypt.org/) to obtain the certificates.
Make sure to adjust the line with --certificatesresolvers.le.acme.email=
to your email address.
Note well that if you make your site reachable from the internet then you should be aware of the implications.
VarFish is MIT licensed software which means that it comes “without any warranty of any kind”, see the LICENSE
file for details.
After changing the configuration, restart the site (e.g., with docker-compose down && docker-compose up -d
if it is running in detached mode).
LDAP Configuration
VarFish can be configured to use up to two upstream LDAP servers (e.g., OpenLDAP or Microsoft Active Directory).
For this, you have to set the following environment variables in the file .env
in your varfish-docker-compose
checkout and restart the site.
The variables are given with their default values.
ENABLE_LDAP=0
Enable primary LDAP authentication server (values:
0
,1
).AUTH_LDAP_SERVER_URI=
URI for primary LDAP server (e.g.,
ldap://ldap.example.com:port
orldaps://...
).AUTH_LDAP_BIND_DN=
Distinguished name (DN) to use for binding to the LDAP server.
AUTH_LDAP_BIND_PASSWORD=
Password to use for binding to the LDAP server.
AUTH_LDAP_USER_SEARCH_BASE=
DN to use for the search base, e.g.,
DC=com,DC=example,DC=ldap
AUTH_LDAP_USERNAME_DOMAIN=
Domain to use for user names, e.g. with
EXAMPLE
users from this domain can login withuser@EXAMPLE
.AUTH_LDAP_DOMAIN_PRINTABLE=${AUTH_LDAP_USERNAME_DOMAIN}
Domain used for printing the user name.
If you have the first LDAP configured then you can also enable the second one and configure it.
ENABLE_LDAP_SECONDARY=0
Enable secondary LDAP authentication server (values:
0
,1
).
The remaining variable names are derived from the ones of the primary server but using the prefix AUTH_LDAP2
instead of AUTH_LDAP
.
SAML Configuration
Besides LDAP configuration, it is also possible to authenticate with existing SAML 2.0 ID Providers (e.g. Keycloak). Since varfish is built on top of sodar core, you can also refer to the sodar-core documentation for further help in configuring the ID Providers.
To enable SAML authentication with your ID Provider, a few steps are necessary. First, add a SAML Client for your ID Provider of choice. The sodar-core documentation features examples for Keycloak. Make sure you have assertion signing turned on and allow redirects to your varfish site.
The SAML processing URL should be set to the externally visible address of your varfish deployment, e.g. https://varfish.example.com/saml2_auth/acs/
.
Next, you need to obtain your metadata.xml aswell as the signing certificate and key file from the ID Provider. Make sure you convert these keys to standard OpenSSL
format, before starting your varfish instance (you can find more details here).
If you deploy varfish without docker, you can pass the file paths of your metadata.xml and key pair directly. Otherwise, make sure that you have included them
into a single folder and added the corresponding folder to your docker-compose.yml
(or add it as a docker-compose-overrrided.yml
), like in the following snippet.
varfish-web:
...
volumes:
- "/path/to/my/secrets:/secrets:ro"
Then, define atleast the following variables in your docker-compose .env
file (or the environment variables when running the server natively).
ENABLE_SAML
[Default 0] Enable [1] or Disable [0] SAML authentication
SAML_CLIENT_ENTITY_ID
The SAML client ID set in the ID Provider config (e.g. “varfish”)
SAML_CLIENT_ENTITY_URL
The externally visible URL of your varfish deployment
SAML_CLIENT_METADATA_FILE
The path to the metadata.xml file retrieved from your ID Provider. If you deploy using docker, this must be a path inside the container.
SAML_CLLIENT_IDP
The url to your IDP. In case of keycloak it can look something like https://keycloak.example.com/auth/realms/<my_varfish_realm>
SAML_CLIENT_KEY_FILE
Path to the SAML signing key for the client.
SAML_CLIENT_CERT_FILE
Path to the SAML certificate for the client.
SAML_CLIENT_XMLSEC1
[Default /usr/bin/xmlsec1] Path to the xmlsec executable.
By default, the SAML attributes map is configured to work with Keycloak as SAML Auth provider. If you are using a different ID Provider,
or different settings you also need to adjust the SAML_ATTRIBUTES_MAP
option.
SAML_ATTRIBUTES_MAP
A dictionary identifying the SAML claims needed to retrieve user information. You need to set atleast
email
,username
,first_name
andlast_name
. Example:SAML_ATTRIBUTES_MAP="email=email,username=uid,first_name=firstName,last_name=name"
To set initial user permissions on first login, you can use the following options:
SAML_NEW_USER_GROUPS
Comma separated list of groups for a new user to join.
SAML_NEW_USER_ACTIVE_STATUS
[Default True] Whether a new user is considered active.
SAML_NEW_USER_STAFF_STATUS
[Default True] New users get the staff status.
SAML_NEW_USER_SUPERUSER_STATUS
[Default False] New users are marked superusers (I advise leaving this one alone).
If you encounter any troubles with this rather involved procedure, feel free to take a look at the discussion forums on github and open a thread.
Sending of Emails
You can configure VarFish to send out emails, e.g., when permissions are granted to users.
PROJECTROLES_SEND_EMAIL=0
Enable sending of emails.
EMAIL_SENDER=
String to use for the sender, e.g.,
noreply@varfish.example.com
.EMAIL_SUBJECT_PREFIX=
Prefix to use for email subjects, e.g.,
[VarFish]
.EMAIL_URL=
URL to the SMTP server to use, e.g.,
smtp://user:password@mail.example.com:1234
.
External Postgres Server
In some setups, it might make sense to run your own Postgres server. The most common use case would be that you want to run VarFish in a setting where fast disks are not available (virtual machines or in a “cloud” setting). You might still have a dedicated, fast Postgres server running (or available as a service from your cloud provider). In this case, you can configure the database connection settings as follows.
DATABASE_URL=postgresql://postgres:password@postgres/varfish
Adjust to the credentials, server, and database name that you want to use.
The default settings do not make for secure settings in the general case.
However, Docker Compose will create a private network that is only available to the Docker containers.
In the default docker-compose
setup, postgres server is thus not exposed to the outside and only reachable by the VarFish web server and queue workers.
Miscellaneous Configuration
VARFISH_LOGIN_PAGE_TEXT
Text to display on the login page.
FIELD_ENCRYPTION_KEY
Key to use for encrypting secrets in the database (such as saved public keys for the Beacon Site feature). You can generate such a key with the following command:
python -c 'import os, base64; print(base64.urlsafe_b64encode(os.urandom(32)))'
.VARFISH_QUERY_MAX_UNION
Maximal number of cases to query for at the same time for joint queries. Default is
20
.
Sentry Configuration
Sentry is a service for monitoring web apps. Their open source version can be installed on premise. You can configure sentry support as follows
ENABLE_SENTRY=0
Enable Sentry support.
SENTRY_DSN=
A sentry DSN to report to. See Sentry documentation for details.
System and Docker (Compose) Tweaks
A number of customizations customizations of the installation can be done using Docker or Docker Compose. Other customizations have to be done on the system level. This section lists those that the authors are aware of but in particular network-related settings can be done on many levels.
Using Non-Default HTTP(S) Ports
If you want to use non-standard HTTP and HTTPS ports (defaults are 80 and 443) then you can tweak this in the traefik
container section.
You have to adjust two parts, below we give them separately with full YAML “key” paths.
services:
traefik:
ports:
- "80:80"
- "443:443"
To listen on ports 8080
and 8443
instead, your override file should have:
- services:
- traefik:
- ports:
“8080:80”
“8443:443”
Also, you have to adjust the command line arguments to traefik for the web
(HTTP) and websecure
(HTTPS) entrypoints.
services:
traefik:
command:
# ...
- "--entrypoints.web.address=:80"
- "--entrypoints.websecure.address=:443"
Use the following in your override file.
services:
traefik:
command:
# ...
- "--entrypoints.web.address=:8080"
- "--entrypoints.websecure.address=:8443"
Based on the docker-compose.yml
file alone, your docker-compose.override.yml
file should contain the following line.
You will have to adjust the file accordingly if you want to use a custom static certificate or letsencrypt by incorporating the files from the provided example docker-compose.override.yml-*
files.
services:
traefik:
ports:
- "8080:80"
- "8443:443"
command:
- "--providers.docker=true"
- "--providers.docker.exposedbydefault=false"
- "--entrypoints.web.address=:80"
- "--entrypoints.web.http.redirections.entryPoint.to=websecure"
- "--entrypoints.web.http.redirections.entryPoint.scheme=https"
- "--entrypoints.web.http.redirections.entrypoint.permanent=true"
- "--entrypoints.web.address=:80"
- "--entrypoints.websecure.address=:443"
Then, restart by calling docker-compose up -d
in the directory with the docker-compose.yml
file.
Listing on Specific IPs
By default, the traefik
container will listen on all IPs and interfaces of the host machine.
You can change this by prefixing the ports
list with the IPs to listen on.
The settings to adjust here are:
services:
traefik:
ports:
- "80:80"
- "443:443"
And they need to be overwritten as follows in your override file.
services:
traefik:
ports:
- "10.0.0.1:80:80"
- "10.0.0.1:443:443"
More details can be found in the corresponding section of the Docker Compose manual.
Of course, you can combine this with adjusting the ports, e.g., to 10.0.0.1:8080:80
etc.
Limit Incoming Traffic
In some settings you might want to limit incoming traffic to certain networks / IP ranges.
In principle, this is possible with adjusting the Traefik load balancer/reverse proxy.
However, we would recommend you to use the firewall of your operating system or your overall network for this purpose.
Consult the corresponding manual (e.g., of firewalld
for CentOS/Red Hat or of ufw
for Debian/Ubuntu) for instructions.
We remark that in most cases it is better to perform an actual separation of networks and place each (virtual) machine into one network only.
Understanding Volumes
The volumes
sub directory of the varfish-docker-compose
directory contains the data for the containers.
These are as follows.
cadd-rest-api
Databases for variant annotation with CADD (large).
exomiser
Databases for variant prioritization (medium)
jannovar
Transcript databases for annotation (small).
minio
Storage for files uploaded from client via REST API (big).
postgres
PostgreSQL databases (very big).
redis
Storage for the work queues (small).
traefik
Configuration and certificates for load balancer (very small).
In principle, you can put these on different storages systems (e.g., some over the network and some on directly attached disks).
The main motivation is that fast storage is expensive.
Putting the small and medium sized directories on slower, cheaper storage will have little or no effect on storage efficiency.
At the same time, access to redis
and exomiser
directories should be fast.
As for postgres
, this storage is accessed most heavily and should be on storage as fast as you can afford.
cadd-rest-api
should also be on fast storage but it is accessed almost only read-only.
You can put the minio
folder on slower storage to shave off some storage costs from your VarFish installation.
To summarize:
You can put
minio
on cheaper storage.As for
cadd-rest-api
, you can probably get away to put this on cheaper storage.Put everything else, in particular
postgres
on storage as fast as you can afford.
As described in the section Performance Tuning, the authors recommend using an advanced file system such as ZFS on multiple SSDs for large, fast storage and enabling compression. You will get excellent performance and can expect storage saving of 50%.
Beacon Site (Experimental)
An experimental support for the GA4GH beacon protocol.
VARFISH_ENABLE_BEACON_SITE=
Whether or not to enable experimental beacon site support.
Undocumented Configuration
The following list remains a points to implement with Docker Compose and document.
Kiosk Mode
Updating Extras Data