Software Stack Deployment

This page will go through how to get started with the software stack installation. We will break down the process into pre-requisites, local infrastructure, with two stages of configuration and deployment, and finally the centralized infrastructure.

Note

The centralized infrastructure is not required for the local infrastructure to work, but it is recommended for larger projects with multiple sites and a large amount of data.

Pre-requisites

As explained in the hardware requirements section, the BIDS-flux infrastructure recommends using two local servers and one centralized server to ensure secure and efficient functionality. One server should have ample storage and moderate compute power, while the other should have limited storage but high computational power. The centralized server should have robust storage and sufficient computational power to handle data from all sites.

Both local and centralized infrastructures will rely on docker , specifically using docker-compose and docker swarm mode to run the services. Super user permissions will be required. Additionally, both local and centralized servers/VMs will use DataLad (which wraps Git and Git-annex) for data management and version control. Docker swarm mode was selected to take advantage of the built-in secret features and the ability to scale services across multiple servers if necessary.

Install Docker on all servers/VMs by following the official Docker documentation for your specific Linux distribution: install docker and docker compose. You can verify the installation with the following command:

sudo docker --version
sudo docker run hello-world

Install DataLad in all the servers/VMs. Follow the official documentation to install Datalad for your correct linux distribution. You can run the following command to check if git is installed correctly:

datalad status

Note

Git and Git-annex will need to be installed for datalad to work properly. Carefully follow the instructions in the DataLad installation guide to install them.

Local Infrastructure

Each local server/VM will need to have docker engine running which we will use to initialize a docker swarm. The recommended breakdown is to use two servers and divide the services as follows:

  1. Data Server:

    1. GitLab

    2. GitLab - Runner (Docker-in-Docker)

    3. MinIO

    4. Mercure / StoreSCP

  2. Processing Server:

    1. GitLab Runner

  1. On the Data Server you will run the following command to initialize the docker swarm manager node .

    docker swarm init --dport 9789
    

    This command will initialize the docker swarm and will return a token that you will need to run on the Processing Server. The output should look something like this:

    Swarm initialized: current node (dxn1zf6l61qsb1josjja83ngz) is now a manager.
    
    To add a worker to this swarm, run the following command:
    
        sudo docker swarm join \
        --token SWMTKN-1-49nj1cmql0jkz5s954yi3oex3nedyz0fb0xx14ie39trti4wxv-8vxv8rssmk743ojnwacrr2e7c \
        --advertise-addr 192.168.99.100:2377
    
    To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
    

    Warning

    Be aware of the issues with docker swarm in a VMWare virtual machine.

    Note

    Make sure that your network is configured correctly and that the data and processing servers/VMs can communicate with each other on the required ports by docker https://docs.docker.com/engine/swarm/swarm-tutorial/#open-protocols-and-ports-between-the-hosts.

  2. Go into the worker node (processing server) and run the following command with the information obtained from the previous command.

    docker swarm join --token TOKEN --advertise-addr <IP-ADDRESS-OF-WORKER-1> <IP-ADDRESS-OF-MANAGER>:2377
    
  3. Create an attachable docker overlay network. This network will be used by all the services to securely communicate to each other.

    docker network create --driver=overlay --attachable BIDS-flux-net --gateway=192.11.0.2
    
  4. Once you have Docker, Git installed, and the docker swarm configured, you can start deploying the services. You will need to clone the software stack git repository which contains the docker-compose yaml files to deploy the services into the manager node in this case this will be the data server.

    git clone https://gitlab.unf-montreal.ca/bids-flux/local-stack.git
    

    Note

    You can also clone the following repositories to keep your repositories up to date with following releases:

    git clone https://gitlab.unf-montreal.ca/bids-flux/containers.git
    git clone https://gitlab.unf-montreal.ca/bids-flux/ci-pipelines.git
    
  5. The deployment of the services will be mostly automatic, nevertheless, there will still be some manual configurations that will require careful attention to detail.

Configuration Stage 1

  1. Change directory into the local-stack cloned repository and follow the next steps.

    cd local-stack
    
  2. The .env file will need to be set up with the proper DOMAIN_NAME of the Docker Swarm nodes where the individual services will be deployed. Once again, for BIDS-flux, the recommended breakdown is:

    Data Server: GitLab, GitLab - Runner (Docker-in-Docker), MinIO, Mercure / StoreSCP

    Processing Server: GitLab Runner

    This is an example of what the .env file should look like:

    # This is an example of what you will want to configure
    DOMAIN_NAME=data-server.org
    DICOM_ENDPOINT_HOST=data-server.org
    GITLAB_HOST=data-server.org
    STORAGE_SERVER_HOST=data-server.org
    PROC_SERVER_HOST=proc-server.org
    
  3. The .env file also contains information regarding the directory were in the filesystem will the infraestructure be storing all its data for future backups.

    # This location is usually standard but feel free to modify is required
    GITLAB_HOME=/srv/gitlab
    MERCURE_BASE=/opt/mercure
    MINIO_HOME=/mnt/minio-disks
    

    Warning

    Mare sure that the directories exist, otherwise docker wearm will fail to start the services.

  4. If you are using the recommended Mercure, you will require to configure some fields of the config/mercure-conf/default_mercure.json to:

    1. The Modules field in the json file to properly point to the dicom-indexer image.

    2. The environment variables to be used for this containers.

    3. The docker arguments including the docker command to run.

    4. Any necessary directory bindings for this container.

    Note

    This step can be manually finetunned using the Mercure GUI once Mercure has been installed.

    Note

    You may have noticed that the mercure service is not included in the `` BIDSflux-stack.yml`` file, this is okay. Currently, Mercure needs to be installed using docker-compose as opposed to docker swarm, but don’t worry, we will install it right after.

Stack Deployment Stage 1

  1. One you have completed the initial configuration, we need to deploy de secrets for the docker-warm services by running the deploy/generate_secrets.sh. This command will create the secrets required for the deployment of the services.

    bash deploy/generate_secrets.sh
    

    Important

    The secrets will only be displayed once so make sure to store them in a safe place.

  2. You will need to run the following command to initiate the docker swarm for the BIDSflux infraestructure. This will create a new docker stack where the docker swarm services will be deployed.

    sudo docker stack deploy -c BIDSflux_stack.yml BIDSflux
    

    Note

    It takes some time to finish up downloading the images and deploying the services.

    You can confirm the docker stack initialization by checking the individual services.

    sudo docker services ls
    

    This should return some information of the deployment status, for example, the gitlab service.

    ID             NAME                      MODE         REPLICAS   IMAGE                                                                             PORTS
    jhyou70vh0zz   BIDSflux_gitlab               replicated   1/1        gitlab/gitlab-ee:17.7.1-ee.0                                                      *:80->80/tcp, *:222->22/tcp, *:443->443/tcp, *:5050->5050/tcp
    

    What we care about the most is the REPLICAS as it tells us how many of the asked deployments are successfully up and running. You can also run the following command to get the service logs.

    sudo docker service logs BIDSflux_gitlab
    

    Note

    If you see REPLICAS as 0/1 this means that your deployment is ongoing or that there was an error with the deployment. If you encounter issues make sure that the Docker Swarm ports are open between the servers/VMs, and that the directories specified in the .env file exist. You can get more information using the following command.

    sudo docker stack ps BIDSflux --no-trunc | grep <Service with 0/1 replicas>
    
  3. You now should have all BIDSflux services running with 1/1 replicas, so, it is time to move to the next configuration stage.

Configuration Stage 2

  1. Run the mercure-setup.sh script in preparation for the Mercure deployment, this script will create some of the required directories and assign the correct USERNAME and permissions for mercure to run properly.

    bash mercure-setup.sh
    
  2. We need to create a root user GITLAB_TOKEN. For this you will need to go to your browser and open the GitLab instance, log in, and create a GITLAB_TOKEN that we will need for the following steps. You can do this by going to the URL defined by your DOMAIN_NAME in the .env file.”

    https://<DOMAIN_NAME>:443
    

    You will need to log in using the following credentials:

    username: root
    password: <gitlab_root_password> #as it was created using the deploy/generate_secrets.sh script
    

    Once you are logged in, go to the settings and create a new personal access token. Make sure to select the following scopes:

    api
    read_user
    read_repository
    write_repository
    read_registry
    write_registry
    read_package
    write_package
    admin_mode
    
  3. The next step is to run the deploy/init_ni-dataops.py and the deploy/runner_registration.py scripts to configure required users, tokens, variables, groups, clone the necessary resositories from the BIDS-flux, and the registration of the processing workforce the gitlab runners. Follow these steps:

    1. You will need to declare the following variables in your shell environment:

      • GITLAB_TOKEN #this was defined in the previous step where we created the personal access token.

      • BOT_EMAIL_DOMAIN #this can be an email domain of your choice, but it is recommended to use the same as the DOMAIN_NAME.

    Important

    If you do not have python installed, you must install it using the appropriate packages for your linux distribution.

    1. Create a python environment to install the required python packages to complete the configuration using the deploy/python-env.txt file. You can do this using the following command:

    python3 -m venv --system-site-packages /path/to/specific/directory/env
    source /path/to/specific/directory/env/bin/activate
    pip install -r deploy/python-env.txt
    
    1. Figure out what are the docker containers that are running the gitlab runners so we can use this information to register the correct runners in the correct servers. The dind runner will be running in the data server and the proc runners will be running in the processing server.

      sudo docker ps | grep gitlab-runner
      
    2. Once you have the python environment created and activated. You need to run the following script twice, once to register the dind gitlab runner which will handle tasks that require running docker inside a docker container like when building images inside a docker container; and a second time to register the untaged, bids, and processing runners which will handle the main pipeline tasks like DICOM to BIDS conversion, and the running of derivative pipelines:

      python deploy/runner_registration.py ~/.docker/config.json deploy/runner_configuration.json BIDSflux_gitlab-runner.x
      
      python deploy/runner_registration.py ~/.docker/config.json deploy/dind_runner_configuration.json BIDSflux_gitlab-runner-dind.x
      

      You can verify that the gitlab runners were registered correctly by going to the GitLab instance and checking the instance-wide runners in the amdin page settings.

      1. On the left sidebar, at the bottom, select Admin.

      2. Select CI/CD > Runners.

      You should see something like this:

      _images/runners.png
    3. After successfully registering the GitLab Runners, you can run the script which will finalize the configuration of the local GitLab instance. This script will show you two tokens that you will need to store in a safe place. The first token is the GITLAB_BOT_TOKEN which will be used to push the data to the GitLab instance, and the second token is the BIDS_API_TOKEN which will be used to provide access to the data in the pipelines.

      python deploy/init_ni-dataops.py --ci_config_path deploy/ci_variables.json
      

      Important

      Make sure that you safely store the BIDS_API_TOKEN and the DICOM_API_TOKEN as you will require them for the next steps.

  4. If you are using storescp instead of mercure you will need to properly configure these .env variables.

    # Required if you are using storescp an not mercure, if using mercure these will be configured someplace else
    GITLAB_REGISTRY_PATH=registry.gitlab.${DOMAIN_NAME}/ni-dataops/containers
    S3_URL_PATTERN='s3://s3.data-server.org/test.{ReferringPhysicianName}.{StudyDescriptionPath[1]}.dicoms'
    GITLAB_INDEXER_GROUP_TEMPLATE="{ReferringPhysicianName}/{StudyDescriptionPath[1]}"
    
    1. CI_SERVER_HOST: The URL of the GitLab instance where the data will be pushed. Make sure to change the DOMAIN_NAME_PLACEHOLDER to the correct domain name.

    2. S3_URL_PATTERN: The URL pattern for the S3 bucket where the data will be pushed to in MinIO. Ideally the information used here should match the one used in the GITLAB_INDEXER_GROUP_TEMPLATE.

    3. GITLAB_INDEXER_GROUP_TEMPLATE: The template for the GitLab group where the data will be pushed. This should be the same as the one used in the S3_URL_PATTERN.

    Additionally, you will need to uncomment the following lines in the BIDS-flux.yml file corresponding to the service deployment.

    # # this service requires:
    # # - gitlab instance to be started
    # # - deploy to be run to have containers repo fork
    # # - ni-dataops/containers to have completed containers build so that image below is in registry
    # dicom_endpoint:
    #   image: ${GITLAB_REGISTRY_PATH}/dicom_indexer:latest
    # #  hostname: storescp
    # #  profiles: [dicom_endpoint]
    #   depends_on: [gitlab, gitlab-runner-proc]
    #   environment:
    #     CI_SERVER_HOST: $CI_SERVER_HOST
    #     GITLAB_BOT_USERNAME: $GITLAB_BOT_USERNAME
    #     GITLAB_BOT_EMAIL: $GITLAB_BOT_EMAIL
    #     STORESCP_AET: $STORESCP_AET
    #     GITLAB_INDEXER_GROUP_TEMPLATE: "{ReferringPhysicianName}/{StudyDescriptionPath[1]}"
    #     S3_URL_PATTERN: 's3://s3.data-server.org/test.{ReferringPhysicianName}.{StudyDescriptionPath[1]}.dicoms'
    #   networks:
    #     - BIDS-flux-net
    #   secrets:
    #     - source: dicom_bot_token
    #       target: /var/run/secrets/dicom_bot_gitlab_token
    #     - source: s3_id
    #       target: /var/run/secrets/s3_id
    #     - source: s3_key
    #       target: /var/run/secrets/s3_secret
    
    #   ports:
    #     - "$STORESCP_PORT:$STORESCP_PORT"
    #   deploy:
    #     placement:
    #       constraints:
    #         - node.hostname == $DICOM_ENDPOINT_HOST
    #   entrypoint: ["/usr/bin/storescp", "-aet", "$STORESCP_AET", "-pm", "-od", "/tmp", "-su", "", "--eostudy-timeout", "60", "--exec-on-eostudy", "python indexer/index_dicom.py", "--gitlab-url $CI_SERVER_HOST", "--storage-remote", '$S3_URL_PATTERN', "--gitlab-group-template", '$GITLAB_INDEXER_GROUP_TEMPLATE', '#p', '$STORESCP_PORT']
    

    If you are using mercure, you can skip to step 2 at the end of Stack Deployment Stage 2.

Stack Deployment Stage 2

  1. As promised you can now deploy Mercure and you can do so with a simple command.

    sudo docker compose -f docker-compose-mercure.yml up -d
    

    Here the -f tells docker compose which file to use, -d tells docker to run in detached mode, and up is the command to deploy the mercure services

    You can also navigate to the Mercure GUI deployed in http://<DOMAIN_NAME>:8000 and check the status/logs of the services there. If all services are up, you should see something like this:

    _images/mercure-gui.png

    You will need to login using the default credentials:

    username: admin
    password: router
    

    You can always change the login credentials in the GUI settings.

    Note

    Alternatibly, you can check if the mercure services are running check the logs running:

    sudo docker ps | grep mercure #identify the mercure related containers and check the logs of the indiviudal containers
    sudo docker logs mercure-receiver-1
    sudo docker logs mercure-dispatcher-1
    sudo docker logs mercure-cleaner-1
    sudo docker logs mercure-router-1
    

    Note

    Refer to the Mercure documentation for more information on how to configure the mercure services and troubleshooting.

    In the GUI, you will be able to see the status of the services, logs, and configure the routing/processing of DICOMS. Which brings us to the next step.

    Configure the DICOM receiver rules to properly route/process the received DICOMS.

    Navigate to the Settings tab and go to the Rules section. Here you will be able to configure the rules filtering based on the DICOM tags available. Mercure is very powerful and flexible. You can configure actions to re-route the received DICOMS to another DICOM service, to process the DICOMS, or to do both. The rules can be based on individual MRI series or based on the study (complete set of series in an MRI visit with the same StudyInstanceUID) completion, and how to define series/study completion is also flexible. You can define the study/series completion rules based on the time after the last DICOM transfer, or based on the received series in case of the study-wide actions.

    Let’s go through the configuration rules of the pre-configured rule.

    _images/mercure-rules.png
    1. In the selection rule we will indicate what DICOMS will trigger this rule. In our case all dicoms which have the Modality tag set to MR. This means that all the DICOMS that are received with these tags will trigger this rule.

    2. The trigger is set to Completed Study. This means that when all the series of the study are received this will trigger the action.

    3. The action is set as Process Only. This means that when the the completion criteria is met we will proceed to process the data.

    4. The Completion Criteria is set to Listed Series Received. This means that when the expected series have been received the action will be triggered. In our case we have an example of two series which’s SeriesDescription is ‘MRSI’ and ‘STAGE_preproc’

    5. You can also Force an action if the completion criteria is not met. This means that even if the expected series are not received, you can decide what action to perform with the data.

    The next configuration needed is the Mercure Modules which are docker images that will be used to process the data received.

    _images/mercure-modules.png

    Note

    The imaged built for the dicom-indexer module would have been created when we ran the deploy/init_ni-dataops.py script.

    You can edit this module to better suit the needs for your project.

    _images/mercure-modules2.png
    1. Docker Tag: registry.DOMAIN_NAME_PLACEHOLDER/ni-dataops/containers/dicom_indexer:latest is the name of the docker image that was build in consecuence to the creation of the containers repository in Gitlab. Make sure to change the DOMAIN_NAME_PLACEHOLDER to the correct domain name.

    2. Environment Variables for the jobs to run whenever the module is triggered. These variables will be passed to the docker container when it is run. You can add any environment variable you want to pass to the container.

      {
          "CI_SERVER_HOST": "gitlab.DOMAIN_NAME_PLACEHOLDER",
          "GITLAB_BOT_USERNAME": "bids_bot",
          "GITLAB_BOT_EMAIL": "bids_bot@DOMAIN_NAME_PLACEHOLDER",
          "GITLAB_TOKEN": "GITLAB_TOKEN_PLACEHOLDER",
          "AWS_ACCESS_KEY_ID": "AWS_ACCESS_KEY_ID_PLACEHOLDER",
          "AWS_SECRET_ACCESS_KEY": "AWS_SECRET_ACCESS_KEY_PLACEHOLDER",
          "S3_URL_PATTERN": "s3://s3.DOMAIN_NAME_PLACEHOLDER/test.{ReferringPhysicianName}.{StudyDescription}.dicoms",
          "GITLAB_INDEXER_GROUP_TEMPLATE": "{ReferringPhysicianName}/{StudyDescription}",
          "GIT_SSH_PORT": 222,
          "DEBUG": false
      }
      
      1. CI_SERVER_HOST: The URL of the GitLab instance where the data will be pushed. Make sure to change the DOMAIN_NAME_PLACEHOLDER to the correct domain name.

      2. GITLAB_BOT_USERNAME: The username of the GitLab bot which should also be given access to the data being pushed. Make sure to change the DOMAIN_NAME_PLACEHOLDER to the correct domain name.

      3. GITLAB_BOT_EMAIL: The email of the GitLab bot which should also be given access to the data being pushed.

      4. GITLAB_TOKEN: The token of the GitLab bot which will be used to push the data dicom_bot. This token was created when we ran the deploy/init_ni-dataops.py script. You were asked to safely store this token Configuration Stage 2.

      5. AWS_ACCESS_KEY_ID: The AWS access key ID for the S3 bucket where the data will be pushed. You were asked to safely store this token Stack Deployment Stage 1 under s3_id.

      6. AWS_SECRET_ACCESS_KEY: The AWS secret access key for the S3 bucket where the data will be pushed. You were asked to safely store this token Stack Deployment Stage 1 under s3_secret.

      7. S3_URL_PATTERN: The URL pattern for the S3 bucket where the data will be pushed to in MinIO. Ideally the information used here should match the one used in the GITLAB_INDEXER_GROUP_TEMPLATE.

      8. GITLAB_INDEXER_GROUP_TEMPLATE: The template for the GitLab group where the data will be pushed. This should be the same as the one used in the S3_URL_PATTERN.

      9. GIT_SSH_PORT: The port used to connect to the GitLab instance. This should be the same as the one used in the .env file.

      10. DEBUG: Set to true if you want to see the logs of the module when it is run. This is useful for debugging purposes.

  2. If you are using storescp then re-run the following command after you have uncommented the lines in the BIDS-flux.yml file.:

    sudo docker stack deploy -c BIDSflux_stack.yml BIDSflux
    

Centralized Infrastructure