blog:nomad_oasis_management
Differences
This shows you the differences between two versions of the page.
blog:nomad_oasis_management [2024/06/14 17:21] – [Docker settings] laiko | blog:nomad_oasis_management [2024/07/16 10:58] (current) – removed fecik | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== MCh NOMAD Oasis Administration ====== | ||
- | FIXME | ||
- | NOMAD Oasis runs on the oasis.mch.rwth-aachen.de machine, which is managed by our IT (currently Sergej Laiko). Contact IT for the login and password. The system runs in a virtual machine (VM) on an Ubuntu variant optimized for VMs. The machine currently has 8 cores and 32GB of RAM. The disk space of the VM itself is quite small at 32GB; the Oasis data and docker images reside on a separate | ||
- | |||
- | ===== MCh NOMAD Oasis ===== | ||
- | |||
- | The upstream documentation for running an Oasis is here [[https:// | ||
- | |||
- | ==== Docker settings ==== | ||
- | |||
- | All the Nomad Oasis data reside within the docker directory tree. In fact the main drive ''/ | ||
- | |||
- | Oasis is started automatically on system startup. If not, you can start it manually by running | ||
- | cd / | ||
- | docker-compose up -d | ||
- | from the '' | ||
- | docker-compose down | ||
- | For further '' | ||
- | |||
- | ==== Building our custom oasis ==== | ||
- | |||
- | While the upstream NOMAD provides docker images for use, we want to apply some customization, | ||
- | |||
- | Updating the git tree works in the usual way, i.e. | ||
- | cd nomad-git | ||
- | git fetch | ||
- | git rebase origin/< | ||
- | |||
- | '' | ||
- | |||
- | git submodule update --init --recursive | ||
- | |||
- | is needed to keep the submodules in sync with the nomad tree. If one needs to update a parser or submodule to a newer version than what is included in the current tree, one must go to the specific directory. For example, to update the lobster parser to the latest upstream, do | ||
- | |||
- | cd dependencies/ | ||
- | git fetch && git rebase origin/ | ||
- | cd ../../.. | ||
- | git add dependencies/ | ||
- | git commit | ||
- | |||
- | When you are happy with the update and changes, the docker images can be built with | ||
- | |||
- | docker build . | ||
- | |||
- | After the build finishes, check the list of all built images with | ||
- | |||
- | docker images | ||
- | |||
- | and write down the image ID of the most recent one. Now tag it with | ||
- | |||
- | docker tag < | ||
- | |||
- | and replace the currently used images for '' | ||
- | |||
- | worker: | ||
- | ... | ||
- | image: < | ||
- | ... | ||
- | |||
- | app: | ||
- | ... | ||
- | image: < | ||
- | ... | ||
- | |||
- | Now just restart the docker containers. It is a good idea to check beforehand that no one is uploading/ | ||
- | |||
- | docker-compose down | ||
- | docker-compose up -d | ||
- | | ||
- | Everything should be running the new version now. | ||
- | |||
- | === Current patches === | ||
- | |||
- | We have a set of custom patches on top of the vanilla NOMAD Oasis, so there may be some conflicts during the rebase. They need to be resolved in the usual '' | ||
- | |||
- | The current set of patches is | ||
- | * MCh specific tweaking (this patch updates the webpage to be actually relevant for us as by default it is just a copy of the upstream NOMAD) | ||
- | * Try harder to detect surfaces (small patch for the system type classification: | ||
- | * VASP parser update (we have one custom VASP parser patch from https:// | ||
- | * LOBSTER parser update (LOBSTER parser is now part of the upstream but some minor parts of integration are still missing, see https:// | ||
- | * OpenMX parser | ||
- | * Mark systems that are too large for the MatID classification | ||
- | |||
- | The MCh specific tweaking is likely going to stay forever, as are the system type (MatID based) classification patches. The rest should hopefully no longer be needed once the OpenMX and LOBSTER parsers are properly upstreamed. The fix for the VASP parser is questionable, so it might never make it to the upstream branch. | ||
- | |||
- | ==== Current oasis settings ==== | ||
- | |||
- | === nomad.yaml === | ||
- | |||
- | This is the main Oasis config file. It was mostly copy-pasted from the example one at [[https:// | ||
- | |||
- | client: | ||
- | url: ' | ||
- | | ||
- | services: | ||
- | api_host: ' | ||
- | https: true | ||
- | https_upload: | ||
- | api_port: 443 | ||
- | api_base_path: | ||
- | admin_user_id: | ||
- | | ||
- | keycloak: | ||
- | realm_name: fairdi_nomad_prod | ||
- | username: ' | ||
- | password: ' | ||
- | oasis: true | ||
- | | ||
- | mongo: | ||
- | db_name: nomad_v0_8 | ||
- | | ||
- | elastic: | ||
- | | ||
- | | ||
- | # This was needed to combat some hard to debug parser bugs in the past, | ||
- | # might not be needed anymore and probably comes with some performance penalty | ||
- | # however it is likely the safe option. See https:// | ||
- | # the original problems. | ||
- | process_reuse_parser: | ||
- | | ||
- | # If new mainfile matching should be done on reprocessing (for example if a parser | ||
- | # for a new DFT code was added). See https:// | ||
- | reprocess_match: | ||
- | | ||
- | # How large systems (in atoms) should be characterized with the system type normalizer. | ||
- | # Please note that the main limit seems to be the memory. | ||
- | normalize: | ||
- | system_classification_with_clusters_threshold: | ||
- | | ||
- | oasis: | ||
- | # Add access for new users by adding their email here. | ||
- | allowed_users: | ||
- | - email1@mch.rwth-aachen.de | ||
- | - ... | ||
- | |||
- | meta: | ||
- | release: ' | ||
- | deployment_id: | ||
- | maintainer_email: | ||
- | |||
- | === docker-compose.yaml === | ||
- | |||
- | Mostly copy pasted from [[https:// | ||
- | |||
- | |||
- | version: ' | ||
- | | ||
- | x-common-variables: | ||
- | NOMAD_RABBITMQ_HOST: | ||
- | NOMAD_ELASTIC_HOST: | ||
- | NOMAD_MONGO_HOST: | ||
- | | ||
- | services: | ||
- | # broker for celery | ||
- | rabbitmq: | ||
- | restart: always | ||
- | image: rabbitmq: | ||
- | container_name: | ||
- | environment: | ||
- | - RABBITMQ_ERLANG_COOKIE=SWQOKODSQALRPCLNMEQG | ||
- | - RABBITMQ_DEFAULT_USER=rabbitmq | ||
- | - RABBITMQ_DEFAULT_PASS=rabbitmq | ||
- | - RABBITMQ_DEFAULT_VHOST=/ | ||
- | volumes: | ||
- | - nomad_oasis_rabbitmq:/ | ||
- | | ||
- | # the search engine | ||
- | elastic: | ||
- | restart: always | ||
- | image: docker.elastic.co/ | ||
- | container_name: | ||
- | environment: | ||
- | - discovery.type=single-node | ||
- | volumes: | ||
- | - nomad_oasis_elastic:/ | ||
- | | ||
- | # the user data db | ||
- | mongo: | ||
- | restart: always | ||
- | image: mongo:4 | ||
- | container_name: | ||
- | environment: | ||
- | - MONGO_DATA_DIR=/ | ||
- | - MONGO_LOG_DIR=/ | ||
- | volumes: | ||
- | - nomad_oasis_mongo:/ | ||
- | command: mongod --logpath=/ | ||
- | | ||
- | # nomad worker (processing) | ||
- | worker: | ||
- | restart: always | ||
- | image: mchnomad33: | ||
- | container_name: | ||
- | environment: | ||
- | <<: *nomad_backend_env | ||
- | NOMAD_SERVICE: | ||
- | OMP_NUM_THREADS: | ||
- | links: | ||
- | - rabbitmq | ||
- | - elastic | ||
- | - mongo | ||
- | volumes: | ||
- | - ./ | ||
- | - nomad_oasis_files:/ | ||
- | command: python -m celery worker -l info -A nomad.processing -Q celery, | ||
- | | ||
- | # nomad app (api + gui) | ||
- | app: | ||
- | restart: always | ||
- | image: mchnomad33: | ||
- | container_name: | ||
- | environment: | ||
- | <<: *nomad_backend_env | ||
- | NOMAD_SERVICE: | ||
- | links: | ||
- | - rabbitmq | ||
- | - elastic | ||
- | - mongo | ||
- | volumes: | ||
- | - ./ | ||
- | - nomad_oasis_files:/ | ||
- | command: ./run.sh | ||
- | | ||
- | # nomad gui (a reverse proxy for nomad) | ||
- | gui: | ||
- | restart: always | ||
- | image: nginx: | ||
- | container_name: | ||
- | command: nginx -g ' | ||
- | volumes: | ||
- | - ./ | ||
- | - ./ | ||
- | - ./ | ||
- | links: | ||
- | - app | ||
- | ports: | ||
- | - 443:443 | ||
- | | ||
- | volumes: | ||
- | nomad_oasis_mongo: | ||
- | nomad_oasis_elastic: | ||
- | nomad_oasis_rabbitmq: | ||
- | nomad_oasis_files: | ||
- | |||
- | === nginx.conf === | ||
- | |||
- | Config file for the nginx server, also based on the default one. The custom settings are mostly the SSL config. See nginx docs [[http:// | ||
- | |||
- | server { | ||
- | listen | ||
- | server_name | ||
- | proxy_set_header Host $host; | ||
- | | ||
- | ssl on; | ||
- | ssl_certificate | ||
- | ssl_certificate_key | ||
- | ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3; | ||
- | ssl_ciphers | ||
- | | ||
- | location / { | ||
- | proxy_pass http:// | ||
- | } | ||
- | | ||
- | location ~ / | ||
- | rewrite ^ / | ||
- | } | ||
- | | ||
- | location / | ||
- | proxy_intercept_errors on; | ||
- | error_page 404 = @redirect_to_index; | ||
- | proxy_pass http:// | ||
- | } | ||
- | | ||
- | location @redirect_to_index { | ||
- | rewrite ^ / | ||
- | proxy_pass http:// | ||
- | } | ||
- | | ||
- | location ~ \/ | ||
- | add_header Last-Modified $date_gmt; | ||
- | add_header Cache-Control ' | ||
- | if_modified_since off; | ||
- | expires off; | ||
- | etag off; | ||
- | proxy_pass http:// | ||
- | } | ||
- | | ||
- | location ~ \/ | ||
- | client_max_body_size 200g; | ||
- | proxy_request_buffering off; | ||
- | proxy_pass http:// | ||
- | } | ||
- | | ||
- | location ~ \/ | ||
- | proxy_buffering off; | ||
- | proxy_pass http:// | ||
- | } | ||
- | | ||
- | location ~ \/ | ||
- | proxy_buffering off; | ||
- | proxy_read_timeout 600; | ||
- | proxy_pass http:// | ||
- | } | ||
- | | ||
- | location ~ \/ | ||
- | proxy_buffering off; | ||
- | proxy_read_timeout 600; | ||
- | proxy_pass http:// | ||
- | } | ||
- | } | ||
- | |||
- | ==== Nomad admin tools ==== | ||
- | |||
- | The GUI has almost no administration options. Hence, if some hand-editing of the database and uploads is needed, one has to use the command line '' | ||
- | |||
- | First, ssh to the oasis machine and connect to the app container with | ||
- | docker exec -ti nomad_oasis_app /bin/bash | ||
- | |||
- | Now the '' | ||
- | |||
- | For example, to delete a selected upload, do | ||
- | nomad admin uploads rm -- <upload id> | ||
- | |||
- | If you updated a parser and want to regenerate the metadata or possibly detect new entries in old uploads (if the update has a completely new parser), that can be done with | ||
- | nomad admin uploads re-process -- <upload id> | ||
- | for selected '' | ||
- | nomad admin uploads re-process | ||
- | for all uploads (when no upload id is specified, the command is applied to all uploads). Please note that the re-processing is quite intensive and can run for hours (days in the future when we have more entries in the oasis). | ||
- | ==== Backups ==== | ||
- | |||
- | There MUST be regular backups of the ''/ | ||
- | ===== Getting help ===== | ||
- | |||
- | The main channel to developers is [[https:// | ||
- | |||
- | ===== Current TODO list ===== | ||
- | |||
- | * Get LOBSTER and OpenMX parsers fully upstream so we get rid of the maintenance burden. If there are some changes in the upstream metainfo scheme or some significant refactors, the parsers will likely break and will need to be fixed. If we can get it to upstream NOMAD, the developers introducing the changes will have to fix the parsers as well. The parsers are currently residing at [[https:// | ||
- | * Make sure everything is regularly backed up! It is unclear whether a monthly backup is sufficient; maybe discuss with Sergej. | ||
- | * Make sure there is enough disk space. We now have just 1TB (at the time of writing ~50% full). Sergej promised to enlarge it months ago, but nothing has happened so far. | ||
- | * Configure docker to not require root. | ||
- | * Get more RAM for the VM (this would allow higher concurrency for the workers and hence faster parsing). | ||
- | * The current MCh Nomad Oasis manual/ |
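
The re-processing commands from the Nomad admin tools section can be wrapped in a small helper when several uploads need to be re-processed. The following is only a sketch and not part of the original setup: the function name `reprocess_cmds` is hypothetical, and it merely prints the `docker exec` commands (using the `nomad_oasis_app` container name and the `nomad admin uploads re-process` command quoted above) so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Oasis setup): print one re-process
# command per upload id so the list can be reviewed before running it.
# Assumes the app container is named nomad_oasis_app, as in the
# "docker exec" example from the admin tools section.
reprocess_cmds() {
    for id in "$@"; do
        printf 'docker exec nomad_oasis_app nomad admin uploads re-process -- %s\n' "$id"
    done
}
```

Calling `reprocess_cmds` with one or more upload ids prints one command per id; piping that output to `sh` on the oasis machine would then execute them one after another. Since re-processing can run for hours, printing the commands first rather than running them directly seems the safer default.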
blog/nomad_oasis_management.1718378500.txt.gz · Last modified: 2024/06/14 17:21 by laiko