Usage Notes
-----------

After completing the :ref:`software deployment ` you should have at least 2 servers/VMs (it could technically be just 1): a **data server** and a **processing server**. This section is intended as a guide on how to use the infrastructure for its intended purpose of neuroimaging data management and analysis acceleration.

Data Flow
^^^^^^^^^

This section gives an overview of the data flow from the moment a study participant is scanned to the moment the data is available for a release.

.. _diagram:

.. mermaid::

   %%{ init: { "theme": "forest", "sequence": { "showSequenceNumbers": true } } }%%
   sequenceDiagram
     box Gray Dicom Acquisition
     participant MRI Scanner
     participant Mercure as Mercure<br/>Storescp
     end
     box Orange Local Gitlab
     participant DicomStudy
     participant BIDS
     participant MRIQC
     participant BIDS-derivatives
     end
     box Green Centralized Gitlab
     participant BIDS-fed
     participant BIDS-derivatives-fed
     end
     MRI Scanner ->> Mercure: send dicom
     Mercure ->> DicomStudy: init
     Mercure ->> BIDS: init
     loop Every new session
       Mercure ->> DicomStudy: add session as submodule
       DicomStudy ->> BIDS: trigger heudiconv
       BIDS ->>+BIDS: open MR
       activate BIDS
       BIDS ->> BIDS: test: bids-validator
       BIDS ->> BIDS: test: protocol compliance
       BIDS ->> BIDS: run: defacing
       BIDS ->> MRIQC: trigger: mriqc
       MRIQC ->> BIDS: include/add link to reports in MR
       create actor A as Dataset Admin
       BIDS ->>-A: Notify
       A ->> BIDS: Review + (Fix) + merge MR
       loop For each configured preproc pipelines
         BIDS ->>+BIDS-derivatives: trigger preproc newly merged session
         BIDS-derivatives ->> BIDS-derivatives: open MR (preproc reports as artifacts)
         BIDS-derivatives ->>-A: Notify
         A ->> BIDS-derivatives: Review reports + merge MR
       end
     end
     A ->> BIDS: Create release
     BIDS ->> BIDS-fed: Push git + "green" data (minio)
     A ->> BIDS-derivatives: Create release
     BIDS-derivatives ->> BIDS-derivatives-fed: Push git + "green" data (minio)

As the diagram shows, the data flow is divided into 3 parts:

1) Data acquisition in DICOM format, which is pushed to the Mercure instance. The data is then automatically pushed to the local GitLab instance.
2) The local GitLab Git flow, which covers the workflow from the push of a new DICOM session to the execution of data conversion and processing.
3) Data integration and sharing using a centralized GitLab instance.

The dataset admin stick figure corresponds to the individual responsible for QC review, dataset merging and pipeline monitoring. These tasks can be divided among multiple individuals for more efficient and robust management.

We use the DICOM networking protocol to transfer the images from the scanner to the Mercure instance on the data server, where they are archived and automatically pushed to the GitLab instance based on the following DICOM tags:

- **ReferringPhysicianName:** Determines the Principal Investigator name and corresponds to the root GitLab group name in the hierarchical structure of the dataset.
- **StudyDescription:** Determines the study name and corresponds to the sub-group name directly below the root group in the hierarchical structure of the dataset.
- **StudyInstanceUID:** Determines the unique DICOM study ID and is used to track the DICOM data in GitLab.
- **PatientID:** Determines the unique BIDS participant ID and session ID and is used to link the DICOM data to the BIDS dataset.

Technically, Mercure can receive data from any MRI vendor; however, so far it has only been configured to work with Siemens and GE 3T MRI scanners. It can be adapted to any scanner vendor with some additional work and testing.

.. note::

   The selected DICOM tags can be modified to adapt to the restrictions of the acquisition site. Nevertheless, it is advisable to keep enough information in the tags to reliably create an equivalent structure.

Git Flow
^^^^^^^^

Using DataLad for data version control enables tracking the provenance of datasets from their creation to their sharing. This is achieved through a Git flow approach, where changes to the dataset are stored in separate branches and merged when ready. The previous :ref:`sequence diagram <diagram>` illustrates the workflow, starting from the push of a new DICOM session to executing data processing and release mechanisms on the federated instance (details to be designed). Each GitLab actor/repository (e.g., DicomStudy, BIDS, Derivatives) is specific to a study, as defined by the DICOM tag `StudyDescription` (e.g., C-PIP).
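
As a rough illustration of how the DICOM tags described earlier can map onto the GitLab hierarchy, the sketch below builds a ``<Principal Investigator>/<Study>`` group path from a dictionary of tag values. It is not the BIDS-flux implementation: the helper names ``slugify`` and ``study_group_path``, the character-replacement rule, and the example tag values are assumptions made for this example.

.. code-block:: python

   import re

   # Tags that the hierarchy described in this section relies on.
   REQUIRED_TAGS = (
       "ReferringPhysicianName",
       "StudyDescription",
       "StudyInstanceUID",
       "PatientID",
   )


   def slugify(value):
       """Replace runs of characters outside [A-Za-z0-9_.-] with '-' (assumed rule)."""
       return re.sub(r"[^A-Za-z0-9_.-]+", "-", value.strip()).strip("-")


   def study_group_path(tags):
       """Return '<principal investigator>/<study>' for a dict of DICOM tag values."""
       missing = [name for name in REQUIRED_TAGS if not tags.get(name)]
       if missing:
           raise ValueError("missing DICOM tags: " + ", ".join(missing))
       return "/".join(
           slugify(tags[name]) for name in ("ReferringPhysicianName", "StudyDescription")
       )


   # Hypothetical tag values, for illustration only.
   example_tags = {
       "ReferringPhysicianName": "Smith^Jane",
       "StudyDescription": "C-PIP",
       "StudyInstanceUID": "1.2.840.0.0.1",
       "PatientID": "01_001",
   }
   print(study_group_path(example_tags))  # Smith-Jane/C-PIP

Failing early when a tag is missing mirrors the advice in the note above: without all four tags, the hierarchy cannot be reconstructed reliably.
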
Each study follows this structure within a GitLab group, organized under groups corresponding to the local Principal Investigator or local consortia.

All operations on GitLab are automated through GitLab pipelines, executed as CI jobs by the GitLab runners, and can be divided into different phases.

Pilot Phase
~~~~~~~~~~~

During the pilot phase, an experimenter acquires one or more sessions to test sequences and/or the full protocol. Sessions labelled ``dev`` or ``pilot`` in the ``PatientID`` are considered pilot sessions. Pilot sessions are converted to BIDS like regular sessions but open a `Merge Request (MR)` to the `pilot branch`. That MR triggers the same workflow as for sessions in the production phase, including BIDS validation, defacing, and MRIQC, all of which are useful for examining the compliance and quality of the data.

Once merged into the pilot branch, they also trigger:

- A configuration of the `forbids `_ tool that will enforce the protocol in future sessions.
- A configuration of standard pre-processing pipelines based on the acquired data.
- A run of the standard pre-processing pipelines to check that the pilot data are compatible and produce sensible results.

Merging new sessions that iterate on the protocol reconfigures the protocol and pipelines, and also opens a `Merge Request` from the `cherry-picked` configs on `config` to the `base` branch. When the protocol is finalized and all checks pass, that MR with the latest config is reviewed, manually edited if necessary, and merged, effectively setting up the repo for tests and derivatives generation during the production phase.

.. mermaid::

   %%{ init: { "theme": "forest" } }%%
   gitGraph:
     commit "start"
     branch config
     branch base
     checkout base
     commit id:"zzzzzzzzzzz"
     branch pilot
     checkout base
     branch convert/pilot1
     checkout convert/pilot1
     commit id:"heudiconv"
     commit id:"post-heudiconv-fixes"
     commit id:"fill-intendedfor/b0field"
     commit id:"deface"
     checkout pilot
     merge convert/pilot1
     commit id:"configs"
     checkout config
     cherry-pick id:"configs"
     checkout base
     commit id:"to better align"
     branch convert/pilot2
     checkout convert/pilot2
     commit id:"heudiconv-2"
     commit id:"post-heudiconv-fixes-2"
     commit id:"fill-intendedfor/b0field-2"
     commit id:"deface-2"
     checkout pilot
     merge convert/pilot2
     commit id:"reconfigs"
     checkout config
     cherry-pick id:"reconfigs"
     checkout base
     merge config
     checkout main
     merge base

Production Phase
~~~~~~~~~~~~~~~~

During the production phase, new sessions are `converted` into separate ``convert/{session_name}`` branches and open new Merge Requests with tests and QC reports to be reviewed, and edited if necessary, before merging into the `dev` branch.

.. mermaid::

   %%{ init: { "theme": "forest" } }%%
   gitGraph:
     commit "start"
     branch base
     checkout base
     commit id:"zzzzzzzzzzz"
     branch dev
     checkout base
     branch convert/session_name1
     checkout convert/session_name1
     commit id:"heudiconv"
     commit id:"post-heudiconv-fixes"
     commit id:"fill-intendedfor/b0field"
     commit id:"deface"
     checkout dev
     merge convert/session_name1
     checkout base
     branch convert/session_name2
     checkout convert/session_name2
     commit id:"heudiconv-2"
     commit id:"post-heudiconv-fixes-2"
     commit id:"fill-intendedfor/b0field-2"
     commit id:"deface-2"
     checkout dev
     merge convert/session_name2

Release Phase
~~~~~~~~~~~~~

When working on a data release, a new release branch can be created from ``dev``, iterated upon (e.g., editing the README or docs) through branches and MRs, and finally merged into the ``main`` branch and tagged with a release version. New sessions continue to be added to the ``dev`` branch in the background.

.. mermaid::

   %%{ init: { "theme": "forest" } }%%
   gitGraph:
     branch dev
     checkout main
     commit
     commit id:"previous_release" tag:"rel/www"
     checkout dev
     commit id:"long history"
     commit id:"bunch_of_sessions_now"
     branch rel/xxx
     checkout rel/xxx
     branch fix/xyz
     checkout dev
     commit
     commit
     checkout fix/xyz
     commit id:"random-fix"
     checkout rel/xxx
     merge fix/xyz
     checkout dev
     commit
     commit
     checkout rel/xxx
     branch fix/zyx
     checkout fix/zyx
     commit id:"edit README"
     checkout rel/xxx
     merge fix/zyx
     checkout main
     merge rel/xxx tag:"rel/xxx"
     checkout dev
     commit

Pipeline Management
^^^^^^^^^^^^^^^^^^^

Automated
~~~~~~~~~~

Once properly configured, the data ingestion process is fully automated. The data is pushed to the Mercure instance and automatically pushed to the local GitLab instance. The data is then converted to BIDS format and processed using the configured pipelines.

Heudiconv Conversion to BIDS
============================

The Heudiconv tool is used to convert DICOM files to BIDS format following a set of heuristics that define how the data should be organized. The heuristics file is a Python script that can be found in the `ci-pipelines BIDS-flux repository `_. In general, the heuristics file defines several functions:

- **def custom_seqinfo(wrapper, series_files):** Extracts the relevant DICOM tags from the DICOM files; these are used to determine the BIDS sequence information.
- **def infotoids(seqinfos, outdir):** Uses the extracted DICOM tags to determine the BIDS subject and session IDs.
- **def infotodict(seqinfo):** Heuristic evaluator that determines which runs belong where. The allowed template fields follow the conventions of the Python ``string`` module.

Deface of BIDS images
============================

Defacing of BIDS images is performed using a simple custom tool that affinely registers the T1w image to MNI space and applies a mask to the image.

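
To make the heuristic functions listed under *Heudiconv Conversion to BIDS* more concrete, here is a minimal sketch of ``infotoids`` and ``infotodict``. It is not the heuristic shipped in the ci-pipelines repository: the ``SeqInfo`` stand-in, the ``<subject>_<session>`` layout of ``PatientID``, and the protocol-name matching rule are all assumptions for illustration.

.. code-block:: python

   from collections import namedtuple

   # Minimal stand-in for the per-series record that Heudiconv passes to the
   # heuristic; the real record carries many more fields.
   SeqInfo = namedtuple("SeqInfo", ["series_id", "protocol_name", "patient_id"])


   def create_key(template, outtype=("nii.gz",), annotation_classes=None):
       """Build the (template, outtype, annotation_classes) key used by infotodict."""
       if not template:
           raise ValueError("Template must be a valid format string")
       return template, outtype, annotation_classes


   def infotoids(seqinfos, outdir):
       """Derive subject and session IDs from PatientID (assumed '<subject>_<session>')."""
       patient_id = next(iter(seqinfos)).patient_id
       subject, _, session = patient_id.partition("_")
       return {"locator": None, "subject": subject, "session": session or None}


   def infotodict(seqinfo):
       """Assign each series to a BIDS template; fields use str.format syntax."""
       t1w = create_key("sub-{subject}/{session}/anat/sub-{subject}_{session}_T1w")
       info = {t1w: []}
       for series in seqinfo:
           # Assumed rule: any protocol name containing "t1w" is a T1w anatomical.
           if "t1w" in series.protocol_name.lower():
               info[t1w].append(series.series_id)
       return info

In this sketch a ``PatientID`` of ``01_001`` yields subject ``01`` and session ``001``; a real heuristic would also have to handle the ``dev`` and ``pilot`` labels described in the pilot phase.
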

BIDS-validation
============================

The BIDS-validation process is performed using the dockerized version of the `BIDS-validator tool `_, which checks the newly created BIDS dataset for compliance with the BIDS standard. This step is repeated for every change made to the BIDS DataLad dataset in GitLab.

MRIQC
============================

The MRIQC tool is used to assess the quality of the BIDS images. The MRIQC reports are generated and stored in the ``qc/mriqc`` DataLad dataset in GitLab. The reports are linked to the BIDS images, allowing for easy access and review through the merge request.

Manual Input
~~~~~~~~~~~~

The dataset administrator is responsible for reviewing the BIDS-converted data and associated MRIQC reports. They may also manually edit the BIDS dataset when necessary. Additionally, the administrator oversees the approval process for merge requests, ensuring that any required modifications are made prior to granting approval.

Retriggering of Heudiconv
============================

If the Heudiconv conversion process fails or requires reconfiguration of the heuristics, the dataset administrator can manually trigger the process again using the GitLab interface. This allows for flexibility in managing the conversion process and ensuring that the data is properly formatted.

.. note::

   If the DICOM data was only partially converted, causing the pipeline to fail BIDS validation after a new ``convert/sub-1_ses-1`` branch was created, you will need to either rename the branch to something like ``convert/sub-1_ses-1_originalconv`` or delete it. Otherwise, the retrigger process will try to recreate the same branch and fail. The partially converted data is kept in the S3-compatible storage (MinIO) unless you delete it manually. You can delete it using a combination of git, git-annex, and DataLad with the following commands:

   .. code-block:: bash

      # <remote> is a placeholder for the name of the git-annex remote backed by MinIO
      git checkout convert/sub-1_ses-1
      git annex drop --from=<remote> /path/to/data --force
      datalad save --message "deleted partial conversion data"

   The changes need to be saved afterwards because git-annex must be notified that the binary data was dropped from the remote. Otherwise, when reconverting the data, DataLad might assume the data already exists in the remote and not upload the complete data.

Manual Editing of BIDS Dataset
==============================

The dataset administrator can manually edit the BIDS dataset using git and DataLad commands. This allows for flexibility in managing the dataset and ensuring that it meets the BIDS standard.

.. code:: bash

   # Rename or move a file, then record the change
   git mv /path/to/file /new/path/to/file
   datalad save --message "Renamed files"

   # Remove a file, then record the change
   git rm /path/to/file
   datalad save --message "Deleted files"

   # Publish the changes; <remote> is a placeholder for the second remote's name
   datalad push --to=origin
   datalad push --to=<remote>

MRIQC Report & Merge Request Review
===================================

The MRIQC reports need to be reviewed by the dataset administrator. Depending on the project needs, the dataset administrator can either approve the merge request from the new ``convert/sub-1_ses-1`` branch into the ``dev`` branch or reject it.

Data Access
^^^^^^^^^^^

Access Management
~~~~~~~~~~~~~~~~~

Access to the data is managed through GitLab groups and S3 bucket policies. This access can be as granular as the project requires. The dataset administrator is responsible for managing access to the data, including granting and revoking permissions as needed. Access to the data is typically restricted to authorized personnel only, ensuring that sensitive information is protected. When data is ready to be shared openly or with specific collaboration groups or individuals, the dataset administrator can create a release branch and tag it with a version number. The different access tiers available in GitLab are described in the `official GitLab documentation `_.

S3 bucket policies can be used to restrict access to the data stored in MinIO.
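
For example, a policy that opens a released bucket for read-only access could look like the sketch below, written in the standard S3 policy JSON layout. The bucket name and the choice of an anonymous, read-only grant are assumptions for illustration; actual policies depend on the sharing rules of each project.

.. code-block:: python

   import json


   def read_only_policy(bucket):
       """Build a policy allowing anonymous listing and download of one bucket."""
       return {
           "Version": "2012-10-17",
           "Statement": [
               {
                   # Listing applies to the bucket itself.
                   "Effect": "Allow",
                   "Principal": {"AWS": ["*"]},
                   "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                   "Resource": [f"arn:aws:s3:::{bucket}"],
               },
               {
                   # Downloading applies to the objects inside the bucket.
                   "Effect": "Allow",
                   "Principal": {"AWS": ["*"]},
                   "Action": ["s3:GetObject"],
                   "Resource": [f"arn:aws:s3:::{bucket}/*"],
               },
           ],
       }


   # "c-pip-bids-release" is a hypothetical bucket name.
   print(json.dumps(read_only_policy("c-pip-bids-release"), indent=2))

Restricting the grant to specific users or groups would mean replacing the wildcard principal with the relevant identities.
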
The dataset administrator can create policies that allow or deny access to specific users or groups based on their roles and responsibilities. This ensures that only authorized personnel have access to sensitive data, while still allowing for collaboration and sharing of non-sensitive data.

Locally
~~~~~~~

GitLab serves as a catalogue for the BIDS-flux data. To access data from the BIDS-flux infrastructure you will need to work with two of the software applications deployed for BIDS-flux: GitLab and MinIO.

GitLab tracks the structure and history of the repositories, or in our case, the study directory hierarchy. The hierarchy of directories inside GitLab is defined in this order: **Principal Investigator** / **Study Name** / (``bids``, ``sourcedata``, ``qc``, ``derivatives``). **Principal Investigator** is the investigator heading the study. **Study Name** is the name of a study under that principal investigator. Under each independent study, you will find 4 repositories containing study-specific data. The ``sourcedata`` repository keeps track of all the DICOM files of the study. The ``bids`` repository tracks the BIDS-formatted images for the study. The ``qc`` repository tracks the quality control checks for the data of the study, and the ``derivatives`` repository tracks processing steps for the BIDS-formatted data.

MinIO serves as the object storage for all the data of the repositories in GitLab. GitLab tracks the history and structure of the files, while MinIO stores all the images and binary objects (all non-text files).
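
A typical access session clones a repository from GitLab and then fetches file content from the object storage. The sketch below only assembles the corresponding DataLad commands and prints them. The host name, the ``https`` URL layout, the local directory name, and the file path are hypothetical, and the sketch assumes the storage remote and its credentials are already available to the clone.

.. code-block:: python

   import shlex


   def access_commands(gitlab_host, principal_investigator, study_name, repository, path):
       """Return the DataLad commands to clone one repository and fetch one file."""
       url = f"https://{gitlab_host}/{principal_investigator}/{study_name}/{repository}.git"
       dest = f"{study_name}-{repository}"  # local directory name (arbitrary choice)
       return [
           # Structure and history come from GitLab.
           ["datalad", "clone", url, dest],
           # File content is retrieved from the object storage on demand.
           ["datalad", "get", f"{dest}/{path}"],
       ]


   for command in access_commands(
       "gitlab.example.org",
       "smith",
       "C-PIP",
       "bids",
       "sub-01/ses-001/anat/sub-01_ses-001_T1w.nii.gz",
   ):
       print(shlex.join(command))

Keeping the clone and the content retrieval as separate steps reflects the split described above: the clone only needs GitLab access, while ``datalad get`` also needs access to the MinIO bucket.
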