NOMAD Oasis runs on the oasis.mch.rwth-aachen.de machine, which is managed by our IT (currently: Sergej Laiko). Contact IT (currently Sergej) for login and password. The system itself runs in a virtual machine (VM) on an Ubuntu derivative optimized for VMs. The machine currently has 8 cores and 32GB of RAM. The disc space of the VM itself is quite small at 32GB; the Oasis data and docker images reside on a separate /dev/xvdb1 drive (currently 1TB).
The upstream documentation for running an Oasis is at https://nomad-lab.eu/prod/rae/docs/oasis.html. The system runs as a set of docker https://www.docker.com/ containers; you can think of docker as another lightweight level of virtualization, so that each of the several services needed for a working Oasis has its own container. The complete Oasis configuration resides in the oasis folder. The separate docker containers are configured in the docker-compose.yaml file, the main Oasis config file is nomad.yaml, the webserver settings are in nginx.conf, and the ssl keys are there as well (in order to have a working https).
All the NOMAD Oasis data reside within the docker directory tree. In fact, the main drive /dev/xvdb1 is mounted as /var/snap/docker/common/var-lib-docker, so most of the docker settings and images should be on it as well.
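Since the data drive will fill up as uploads accumulate, it is worth checking the free space now and then. A quick check (standard coreutils, nothing Oasis-specific):

    df -h /var/snap/docker/common/var-lib-docker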
Oasis is started automatically on system startup. If not, you can start it manually by running

    cd /home/admo/oasis_1
    docker-compose up -d

from the oasis_1 directory. You can stop everything with

    docker-compose down

For further docker-compose commands see the docker-compose docs https://docs.docker.com/compose/.
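Two standard docker-compose commands cover most day-to-day checks (both stock docker-compose, run from the oasis_1 directory):

    docker-compose ps            # list the containers and their state/health
    docker-compose logs -f app   # follow the logs of one service, e.g. app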
While the upstream NOMAD provides prebuilt docker images, we want to apply some customizations, so we have to build our own images. This is done directly from the https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/ tree, which is cloned to the nomad-git directory on the VM.
Updating the git tree works in the usual way, i.e.

    cd nomad-git
    git fetch
    git rebase origin/<desired branch>

origin/master is the latest stable branch, so that is the safe bet, unless some extra features from development branches are needed. The tricky part is that the parsers and other components are incorporated through submodules. So after the rebase of the nomad-FAIR tree,

    git submodule update --init --recursive

is also needed to keep the submodules in sync with the nomad tree. If one needs to update a parser or submodule to a newer version than what is included in the current tree, one must go to the specific directory; for example, to update the lobster parser to the latest upstream do
    cd dependencies/parsers/lobster/
    git fetch && git rebase origin/master
    cd ../../..
    git add dependencies/parsers/lobster/
    git commit
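To double-check which submodule commit is now recorded in the tree (plain git, nothing NOMAD-specific):

    git submodule status dependencies/parsers/lobster/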
When you are happy with the update and changes, the docker image can be built with

    docker build .
After the build finishes, check the list of all built images with

    docker images

and write down the image ID of the most recent one. Now tag it with

    docker tag <image_id> <some_new_random_tag>
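For example (the image id and tag name here are made up for illustration):

    docker tag 3f1a2b4c5d6e mch-nomad-custom
    # alternatively, name the image directly at build time and skip the tag step:
    docker build -t mch-nomad-custom .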
Then replace the currently used images for the worker and app containers in the docker-compose.yaml with the newly built one:

    worker:
      ...
      image: <some_new_random_tag>:latest
      ...
    app:
      ...
      image: <some_new_random_tag>:latest
      ...
Now just restart the docker containers. It is a good idea to check beforehand that no one is uploading something to or downloading something from the Oasis; this can be checked with top by verifying that there are no processes with high CPU usage running.

    docker-compose down
    docker-compose up -d

Everything should be running the new version now.
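To verify that the worker and app containers really picked up the new image, you can inspect the running containers (standard docker; the container names are the ones set in our docker-compose.yaml):

    docker inspect -f '{{.Config.Image}}' nomad_oasis_worker_v1 nomad_oasis_app_v1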
We have a set of custom patches on top of the vanilla NOMAD Oasis, so it is possible there will be some conflicts during the rebase. They need to be resolved in the usual git way; however, sometimes Python or web frontend programming skills might be needed if the conflicts are more complex. Most of the patches are quite simple though.
The current set of patches is:
The MCh-specific tweaking is likely going to stay forever, as well as the system type (MatID based) classification patches. The rest should hopefully not be needed once the OpenMX and LOBSTER parsers are properly upstreamed. The fix for the VASP parser is questionable, so it might also never make it to the upstream branch.
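To see which local patches are currently carried on top of upstream (plain git, assuming the local branch has been rebased onto origin/master as described above):

    cd nomad-git
    git log --oneline origin/master..HEAD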
This is the main Oasis config file. It was mostly copy-pasted from the example at https://nomad-lab.eu/prod/rae/docs/oasis.html and adapted for our specific URL and settings. See the inline comments for explanations of some of the more specific settings.
    services:
      api_host: 'oasis.mch.rwth-aachen.de'
      https: true
      https_upload: true
      api_port: 443
      api_base_path: '/nomad-oasis'

    mongo:
      db_name: nomad_v1

    elastic:
      entries_index: nomad_oasis_entries_v1
      materials_index: nomad_oasis_materials_v1

    # This was needed to combat some hard to debug parser bugs in the past,
    # might not be needed anymore and probably comes with some performance penalty,
    # however it is likely the safe option. See https://matsci.org/t/36150 for
    # the original problems.
    # process_reuse_parser: false

    # If new mainfile matching should be done on reprocessing (for example if a
    # parser for a new DFT code was added). See https://matsci.org/t/36286 for
    # more details.
    reprocess_match: true

    # How large systems (in atoms) should be characterized with the system type
    # normalizer. Please note that the main limit seems to be the memory.
    normalize:
      system_classification_with_clusters_threshold: 350

    oasis:
      is_oasis: true
      uses_central_user_management: true
      # Add access for new users by adding their email here.
      allowed_users:
        - nomad-oasis@mch.rwth-aachen.de
        - email1@mch.rwth-aachen.de
        - ...

    meta:
      deployment: 'oasis'
      deployment_id: 'oasis.mch.rwth-aachen.de'
      maintainer_email: 'nomad-oasis@mch.rwth-aachen.de'
      deployment_url: 'https://oasis.mch.rwth-aachen.de/api'

    celery:
      timeout: 10000
      max_memory: 16000000
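After editing nomad.yaml, the containers have to pick up the changed file. Since it is bind-mounted into the app and worker containers, restarting just those two services should be enough (my assumption; a full docker-compose down/up also works):

    docker-compose restart app worker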
Mostly copy-pasted from https://nomad-lab.eu/prod/rae/docs/oasis.html. The only changes are the --concurrency=5 switch for the worker container (it controls the number of cores used for parsing; 5 is a compromise between parallelism and possible slowdowns due to swapping when parsing large cases; the memory-heavy stuff is ATM mostly the vasprun.xml parser https://github.com/nomad-coe/nomad-parser-vasp/issues/12 and the system normalizer for systems with a lot of atoms) and the OMP_NUM_THREADS: 1 environment variable for the worker to prevent overload (see https://github.com/nomad-coe/nomad/issues/10 for details).
    version: '3'

    x-common-variables: &nomad_backend_env
      NOMAD_RABBITMQ_HOST: rabbitmq
      NOMAD_ELASTIC_HOST: elastic
      NOMAD_MONGO_HOST: mongo

    services:
      # broker for celery
      rabbitmq:
        restart: always
        image: rabbitmq:3.11.5
        container_name: nomad_oasis_rabbitmq_v1
        environment:
          - RABBITMQ_ERLANG_COOKIE=SWQOKODSQALRPCLNMEQG
          - RABBITMQ_DEFAULT_USER=rabbitmq
          - RABBITMQ_DEFAULT_PASS=rabbitmq
          - RABBITMQ_DEFAULT_VHOST=/
        volumes:
          - nomad_oasis_rabbitmq:/var/lib/rabbitmq
        healthcheck:
          test: ["CMD", "rabbitmq-diagnostics", "--silent", "--quiet", "ping"]
          interval: 10s
          timeout: 10s
          retries: 30
          start_period: 10s

      # the search engine
      elastic:
        ulimits:
          nofile:
            soft: 1048576
            hard: 1048576
        restart: unless-stopped
        image: docker.elastic.co/elasticsearch/elasticsearch:7.17.1
        container_name: nomad_oasis_elastic_v1
        environment:
          - ES_JAVA_OPTS=-Xms512m -Xmx512m
          - discovery.type=single-node
        volumes:
          - elastic:/usr/share/elasticsearch/data
        healthcheck:
          test:
            - "CMD"
            - "curl"
            - "--fail"
            - "--silent"
            - "http://elastic:9200/_cat/health"
          interval: 10s
          timeout: 10s
          retries: 30
          start_period: 60s

      # the user data db
      mongo:
        ulimits:
          nofile:
            soft: 1048576
            hard: 1048576
        restart: unless-stopped
        image: mongo:5.0.6
        container_name: nomad_oasis_mongo_v1
        environment:
          - MONGO_DATA_DIR=/data/db
          - MONGO_LOG_DIR=/dev/null
        volumes:
          - mongo:/data/db
          - ./.volumes/mongo:/backup
        command: mongod --logpath=/dev/null # --quiet
        healthcheck:
          test:
            - "CMD"
            - "mongo"
            - "mongo:27017/test"
            - "--quiet"
            - "--eval"
            - "'db.runCommand({ping:1}).ok'"
          interval: 10s
          timeout: 10s
          retries: 30
          start_period: 10s

      # nomad worker (processing)
      worker:
        ulimits:
          nofile:
            soft: 1048576
            hard: 1048576
        restart: unless-stopped
        image: gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:latest
        container_name: nomad_oasis_worker_v1
        environment:
          <<: *nomad_backend_env
          NOMAD_SERVICE: nomad_oasis_worker
          OMP_NUM_THREADS: 1
        depends_on:
          rabbitmq:
            condition: service_healthy
          elastic:
            condition: service_healthy
          mongo:
            condition: service_healthy
        volumes:
          - ./configs/nomad.yaml:/app/nomad.yaml
          - /var/snap/docker/common/var-lib-docker/volumes/oasis_nomad_oasis_files/_data:/app/.volumes/fs
        command: python -m celery -l info -A nomad.processing worker -Q celery --concurrency=5

      # nomad app (api + gui)
      app:
        ulimits:
          nofile:
            soft: 1048576
            hard: 1048576
        restart: unless-stopped
        image: gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:latest
        container_name: nomad_oasis_app_v1
        environment:
          <<: *nomad_backend_env
          NOMAD_SERVICE: nomad_oasis_app
          NOMAD_SERVICES_API_PORT: 80
          NOMAD_FS_EXTERNAL_WORKING_DIRECTORY: "$PWD"
        depends_on:
          rabbitmq:
            condition: service_healthy
          elastic:
            condition: service_healthy
          mongo:
            condition: service_healthy
        volumes:
          - ./configs/nomad.yaml:/app/nomad.yaml
          - /var/snap/docker/common/var-lib-docker/volumes/oasis_nomad_oasis_files/_data:/app/.volumes/fs
        command: ./run.sh
        healthcheck:
          test:
            - "CMD"
            - "curl"
            - "--fail"
            - "--silent"
            - "http://localhost:8000/-/health"
          interval: 10s
          timeout: 10s
          retries: 30
          start_period: 10s

      # nomad gui (a reverse proxy for nomad)
      proxy:
        ulimits:
          nofile:
            soft: 1048576
            hard: 1048576
        restart: unless-stopped
        image: nginx:1.13.9-alpine
        container_name: nomad_oasis_proxy_v1
        command: nginx -g 'daemon off;'
        volumes:
          - ./configs/nginx.conf:/etc/nginx/conf.d/default.conf
          - ./configs/cert/cert-oasis-witchchain.pem:/ssl/cert-oasis-witchchain.pem
          - ./configs/cert/server-oasis-mch-key.pem:/ssl/server-oasis-mch-key.pem
        depends_on:
          app:
            condition: service_healthy
          worker:
            condition: service_started # TODO: service_healthy
        ports:
          - 443:443

    volumes:
      mongo:
        name: "nomad_oasis_mongo"
      elastic:
        name: "nomad_oasis_elastic"
      rabbitmq:
        name: "nomad_oasis_rabbitmq"
      keycloak:
        name: "nomad_oasis_keycloak"
      nomad_oasis_files:

    networks:
      default:
        name: nomad_oasis_network
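After hand-editing docker-compose.yaml it is worth validating the syntax before restarting anything; docker-compose can parse and print the resolved configuration without touching the running containers:

    cd /home/admo/oasis_1
    docker-compose config > /dev/null && echo "compose file OK"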
Config file for the nginx server, based on the default one as well. The custom settings mostly concern the ssl config. See the nginx docs http://nginx.org/en/docs/ for more info.
    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }

    server {
        listen 443 ssl;
        server_name oasis.mch.rwth-aachen.de;
        proxy_set_header Host $host;

        ssl on;
        ssl_certificate /ssl/cert-oasis-witchchain.pem;
        ssl_certificate_key /ssl/server-oasis-mch-key.pem;
        ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3;
        ssl_ciphers HIGH:!aNULL:!MD5;

        location / {
            proxy_pass http://app:8000;
        }

        location ~ /nomad-oasis\/?(gui)?$ {
            rewrite ^ /nomad-oasis/gui/ permanent;
        }

        location /nomad-oasis/gui/ {
            proxy_intercept_errors on;
            error_page 404 = @redirect_to_index;
            proxy_pass http://app:8000;
        }

        location @redirect_to_index {
            rewrite ^ /nomad-oasis/gui/index.html break;
            proxy_pass http://app:8000;
        }

        location ~ \/gui\/(service-worker\.js|meta\.json)$ {
            add_header Last-Modified $date_gmt;
            add_header Cache-Control 'no-store, no-cache, must-revalidate, proxy-revalidate, max-age=0';
            if_modified_since off;
            expires off;
            etag off;
            proxy_pass http://app:8000;
        }

        location ~ /api/v1/uploads(/?$|.*/raw|.*/bundle?$) {
            client_max_body_size 35g;
            proxy_request_buffering off;
            proxy_pass http://app:8000;
        }

        location ~ /api/v1/.*/download {
            proxy_buffering off;
            proxy_pass http://app:8000;
        }
    }
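Changes to nginx.conf can be syntax-checked and reloaded inside the running proxy container without a full restart (standard nginx commands; the container name is the one set in our docker-compose.yaml):

    docker exec nomad_oasis_proxy_v1 nginx -t          # test the configuration
    docker exec nomad_oasis_proxy_v1 nginx -s reload   # reload if the test passed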
The GUI has almost no administration options. Hence, if some hand-editing of the database and uploads is needed, one has to use the command line nomad admin tools.
First, ssh to the Oasis machine and connect to the app container with

    docker exec -ti nomad_oasis_app_v1 /bin/bash
Now the nomad admin command provides a lot of management options. See https://nomad-lab.eu/prod/rae/docs/client/cli_ref.html#admin-cli-commands for documentation, or run it with the --help switch. Apply extreme caution, as one can easily delete or destroy everything with the nomad admin tools!
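Before any destructive operation it helps to double-check the upload ids first. Listing the uploads should work roughly like this (check nomad admin uploads --help for the exact subcommands and flags of your NOMAD version):

    nomad admin uploads ls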
For example, if you want to delete a selected upload, do

    nomad admin uploads rm -- <upload id>
If you updated a parser and want to regenerate the metadata, or possibly detect new entries in old uploads (if the update brought a completely new parser), that can be done with

    nomad admin uploads re-process -- <upload id>

for a selected upload id, or with

    nomad admin uploads re-process

for all uploads (when no upload id is specified, the command is applied to all uploads). Please note that the re-processing is quite intensive and can run for hours (or days in the future, when we have more entries in the Oasis).
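Because a full re-process can run for hours, it is safer not to tie it to your ssh session. One option (my habit, not from the upstream docs) is to start it detached directly from the host:

    # -d detaches the exec, so the job keeps running after you log out
    docker exec -d nomad_oasis_app_v1 nomad admin uploads re-process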
A daily backup is performed on the RWTH Commvault System and a monthly backup on the DC1.mch.rwth-aachen.de NAS station.
The main channel to the developers is the https://matsci.org/c/nomad/32 forum. Developers usually respond within hours. If a bug is found, it should be reported at the github repo https://github.com/nomad-coe/nomad/issues or to the specific parser subprojects https://github.com/nomad-coe/nomad-parser-*/issues. pavel.ondracka@gmail.com might be willing to help as well.