# STRAW2analysis

All analysis for the STRAW project.

To install:

1. Create a conda virtual environment from the `environment.yml` file.

    ```shell
    cd config
    conda env create --file environment.yml
    conda activate straw2analysis
    ```

    If you have already created this environment, you can update it using:

    ```shell
    conda deactivate
    conda env update --file environment.yml
    conda activate straw2analysis
    ```

    To use this environment in the Jupyter notebooks under `./exploration/`, you can select it under Kernel > Change kernel after running:

    ```shell
    ipython kernel install --user --name=straw2analysis
    ```

2. Provide a file called `.env` to be used by `python-dotenv`. It should be placed in the top folder of the application and should have the form:

    ```
    DB_PASSWORD=database-password
    ```

# RAPIDS

To install RAPIDS, follow the [instructions on their webpage](https://www.rapids.science/1.6/setup/installation/).

Here, I include additional information related to the installation and specific to the STRAW2analysis project.
The installation was tested on Windows using Ubuntu 20.04 on Windows Subsystem for Linux ([WSL2](https://docs.microsoft.com/en-us/windows/wsl/install)).

## Custom configuration

### Credentials

As mentioned under [Database in RAPIDS documentation](https://www.rapids.science/1.6/snippets/database/), a `credentials.yaml` file is needed to connect to a database.
It should contain:

```yaml
PSQL_STRAW:
  database: staw
  host: 212.235.208.113
  password: password
  port: 5432
  user: staw_db
```

where `password` needs to be replaced with the actual database password.

## Possible installation issues

### Missing dependencies for RPostgres

When installing the `RPostgres` R package (used to connect to the PostgreSQL database), an error might occur:

```text
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libpq was not found.
Try installing:
 * deb: libpq-dev (Debian, Ubuntu, etc)
 * rpm: postgresql-devel (Fedora, EPEL)
 * rpm: postgreql8-devel, psstgresql92-devel, postgresql93-devel, or postgresql94-devel (Amazon Linux)
 * csw: postgresql_dev (Solaris)
 * brew: libpq (OSX)
If libpq is already installed, check that either:
(i)  'pkg-config' is in your PATH AND PKG_CONFIG_PATH contains a libpq.pc file; or
(ii) 'pg_config' is in your PATH.
If neither can detect , you can set INCLUDE_DIR
and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------[ ERROR MESSAGE ]----------------------------
:1:10: fatal error: libpq-fe.h: No such file or directory
compilation terminated.
```

The package requires `libpq` to compile from source, so install the `libpq` development package listed for your system (for example, `libpq-dev` on Ubuntu).

### Timezone environment variable for tidyverse (relevant for WSL2)

One of the R packages, `tidyverse`, might need access to the `TZ` environment variable during installation.
On Ubuntu 20.04 on WSL2, this triggers the following error:

```text
> install.packages('tidyverse')

ERROR: configuration failed for package ‘xml2’
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
Warning in system("timedatectl", intern = TRUE) :
  running command 'timedatectl' had status 1
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
  namespace ‘xml2’ 1.3.1 is already loaded, but >= 1.3.2 is required
Calls: ... namespaceImportFrom -> asNamespace -> loadNamespace
Execution halted
ERROR: lazy loading failed for package ‘tidyverse’
```

This happens because this WSL2 setup is not booted with systemd, so `timedatectl`, which is used to look up the time zone, cannot run:

```bash
~$ timedatectl
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
```

and later

```bash
Warning message:
In system("timedatectl", intern = TRUE) :
  running command 'timedatectl' had status 1
Execution halted
```

This can be amended by setting the environment variable manually before attempting to install `tidyverse`:

```bash
export TZ='Europe/Ljubljana'
```

## Possible runtime issues

### Unix end of line characters

Upon running RAPIDS, an error might occur:

```bash
/usr/bin/env: ‘python3\r’: No such file or directory
```

This is due to Windows-style end-of-line characters.
To amend this, I added a `.gitattributes` file to force `git` to check out `rapids` using Unix EOL characters.
If this still fails, `dos2unix` can be used to convert them.

### System has not been booted with systemd as init system (PID 1)

See [the installation issue above](#timezone-environment-variable-for-tidyverse-relevant-for-wsl2).

## Update RAPIDS

To update RAPIDS, first pull and merge [origin](https://github.com/carissalow/rapids), such as with:

```commandline
git fetch --progress "origin" refs/heads/master
git merge --no-ff origin/master
```

Next, update the conda and R virtual environments. The R environment can be restored with:

```bash
R -e 'renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"))'
```
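
The R command above covers only the R packages; the conda part of the update is not spelled out here. A minimal sketch of both steps as one shell function follows. It assumes the conda environment is named `rapids` and that an `environment.yml` sits in the top folder of the RAPIDS checkout. Neither is stated in this README, so adjust both to your setup. The helper names `run` and `update_rapids_envs` and the `DRY_RUN` switch are hypothetical and exist only for illustration.

```shell
# Sketch only: update the conda and R environments after merging upstream.
# ASSUMPTIONS (not from this README): the conda environment is called
# "rapids" and environment.yml is in the current (RAPIDS top) folder.
CONDA_ENV_NAME="${CONDA_ENV_NAME:-rapids}"
CRAN_REPO="https://packagemanager.rstudio.com/all/__linux__/focal/latest"

# Print a command; execute it only when DRY_RUN is not set to 1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Update the conda environment first, then restore the R packages with renv.
update_rapids_envs() {
    run conda env update --name "$CONDA_ENV_NAME" --file environment.yml
    run R -e "renv::restore(repos = c(CRAN = \"$CRAN_REPO\"))"
}
```

After sourcing the snippet from the top folder of the RAPIDS checkout, `DRY_RUN=1 update_rapids_envs` only prints the two commands, which is a cheap way to check them before running `update_rapids_envs` for real.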