Docker & Data Builds

This section describes how to build the Docker images and also the VarFish site data tarballs. The intended audience are VarFish developers.

Build Docker Images

Building the image:

$ ./docker/build-docker.sh

By default the latest tag is used. You can change this with.

$ GIT_TAG=v0.1.0 ./docker/build-docker.sh

Get varfish-docker-compose

The database is built in varfish-docker-compose.

$ git clone git@github.com:bihealth/varfish-docker-compose.git
$ cd varfish-docker-compose
$ ./init.sh

First-Time Container Startup

You have to startup the postgres container once to create the Postgres database. Once it has been initialized, shutdown with Ctrl-C.

$ docker-compose up postgres
<Ctrl-C>

Now copy over the postgresql.conf file that has been tuned for the VarFish use cases.

$ cp config/postgres/postgresql.conf volumes/postgres/data/postgresql.conf

Bring up the site again so we can build the database.

$ docker-compose up

Wait until varfish-web is up and running and all migrations have been applied, look for VARFISH MIGRATIONS END in the output of run-docker-compose-up.sh.

Pre-Build Postgres Database

Download static data

$ cd /plenty/space
$ wget https://file-public.bihealth.org/transient/varfish/athenea/varfish-server-background-db-20201006.tar.gz{,.sha256}
$ sha256sum -c varfish-server-background-db-20201006.tar.gz.sha256
$ tar xzvf varfish-server-background-db-20201006.tar.gz

Adjust the docker-compose.yml file such that /plenty/space is visible in the varfish-web container.

volumes:
    - "/plenty/space:/data"

Get the name of the running varfish-web container.

$ docker ps
CONTAINER ID   IMAGE                                                       COMMAND                  CREATED          STATUS              PORTS                                      NAMES
44be6ece102e   minio/minio                                                 "/usr/bin/docker-ent…"   11 minutes ago   Up About a minute   9000/tcp                                   varfish-docker-compose_minio_1
3b23113e5aa1   quay.io/biocontainers/exomiser-rest-prioritiser:12.1.0--1   "exomiser-rest-prior…"   11 minutes ago   Up About a minute                                              varfish-docker-compose_exomiser-rest-prioritiser_1
b8c49e8c24a6   quay.io/biocontainers/jannovar-cli:0.33--0                  "jannovar -Xmx6G -Xm…"   11 minutes ago   Up About a minute                                              varfish-docker-compose_jannovar_1
409a535b9951   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   12 minutes ago   Up About a minute   8080/tcp                                   varfish-docker-compose_varfish-celerybeat_1
7eb7425c59e2   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   12 minutes ago   Up About a minute   8080/tcp                                   varfish-docker-compose_varfish-celeryd-import_1
020811fde306   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   12 minutes ago   Up About a minute   8080/tcp                                   varfish-docker-compose_varfish-celeryd-query_1
87b03ee0249b   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   12 minutes ago   Up About a minute   8080/tcp                                   varfish-docker-compose_varfish-celeryd-default_1
7a3fdb337fae   bihealth/varfish-server:0.22.1-0                            "docker-entrypoint.s…"   12 minutes ago   Up About a minute   8080/tcp                                   varfish-docker-compose_varfish-web_1
9295a101570f   postgres:12                                                 "docker-entrypoint.s…"   12 minutes ago   Up About a minute   5432/tcp                                   varfish-docker-compose_postgres_1
1c4d6e235074   traefik:v2.3.1                                              "/entrypoint.sh --pr…"   12 minutes ago   Up About a minute   0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   varfish-docker-compose_traefik_1
8d72fd096743   redis:6                                                     "docker-entrypoint.s…"   12 minutes ago   Up About a minute   6379/tcp                                   varfish-docker-compose_redis_1

Initialize the tables (while at least docker-compose up varfish-web postgres redis is running).

$ docker exec -it -w /usr/src/app varfish-docker-compose_varfish-web_1 python manage.py import_tables --tables-path /data --threads 8

Then, shutdown the docker-compose up, remove the volumes: entry for varfish-web, and create a tarball of the postgres database to have a clean copy.

Add Other Data

Copy the other required data for jannovar and exomiser. You can find the appropriate files to download on the Jannovar (via Zenodo) and Exomiser data download sites:

You should use the hg19 data for Exomiser for any genome release as we will only use the the gene to phenotype prioritization that is independent of the genome release.

The result should look similar to this:

# tree volumes/jannovar volumes/exomiser
volumes/jannovar
├── hg19_ensembl.ser
├── hg19_refseq_curated.ser
└── hg19_refseq.ser
volumes/exomiser
├── 1909_hg19
│   ├── 1909_hg19_clinvar_whitelist.tsv.gz
.   .   [..]
│   └── 1909_hg19_variants.mv.db
└── 1909_phenotype
    ├── 1909_phenotype.h2.db
    ├── phenix
    │   ├── 10.out
    .   .   [..]
    │   ├── ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt
    │   ├── hp.obo
    │   └── phenotype_annotation.tab
    └── rw_string_10.mv

3 directories, 55 files

Create a Superuser

While the docker-compose up is running

$ docker exec -it -w /usr/src/app varfish-docker-compose_varfish-web_1 python manage.py createsuperuser
Username: root
Email address:
Password: <changeme>
Password (again): <changeme>
Superuser created successfully.

Setup Initial Data

Create test category & project.

Obtain API key and configure varfish-cli.

Import some test data through the API.

$ varfish-cli --no-verify-ssl case create-import-info --resubmit \
    92f5d735-0967-4db2-a801-50fe96359f51 \
    $(find path/to/variant_export/work/*NA12878* -name '*.tsv.gz' -or -name '*.ped')

Create Data Tarballs

Now create the released data tarballs.

tar -cf - volumes | pigz -c > varfish-site-data-v1-20210728-grch37.tar.gz && sha256sum varfish-site-data-v1-20210728-grch37.tar.gz >varfish-site-data-v1-20210728-grch37.tar.gz.sha256 &
tar -cf - volumes | pigz -c > varfish-site-data-v1-20210728-grch38.tar.gz && sha256sum varfish-site-data-v1-20210728-grch38.tar.gz >varfish-site-data-v1-20210728-grch38.tar.gz.sha256 &
tar -cf - test-data | pigz -c > varfish-test-data-v1-20211125.tar.gz && sha256sum varfish-test-data-v1-20211125.tar.gz >varfish-test-data-v1-20211125.tar.gz.sha256