RAPIDS
Reproducible Analysis Pipeline for Data Streams
For more information refer to our documentation
By MoSHI, University of Pittsburgh
Installation
For RAPIDS installation, refer to the documentation.
For the installation of the Docker version:
-
Follow the instructions to set up RAPIDS via Docker (from scratch).
-
Delete the current contents of the /rapids/ folder when in a container session.
cd ..
rm -rf rapids/{*,.*}
cd rapids
-
Clone RAPIDS workspace from Git and checkout a specific branch.
git clone "https://repo.ijs.si/junoslukan/rapids.git" .
git checkout <branch_name>
-
Install the missing libpq-dev dependency with bash.
apt-get update -y
apt-get install -y libpq-dev
-
Restore R venv. Type R to go to the interactive R session and then:
renv::restore()
-
Install the cr-features module from https://repo.ijs.si/matjazbostic/calculatingfeatures.git (branch master). Then follow the "cr-features module" section below.
-
Install all required packages from environment.yml. The --prune option also deletes conda packages that are not present in the environment file.
conda env update --file environment.yml --prune
-
If you wish to update your R or Python virtual environments:
R, in an interactive session:
renv::snapshot()
Python:
conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' | sed 's/^.*mkl=.*$/ - mkl/' > environment.yml
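The two sed filters in the export command above replace any line that mentions libgfortran, and any line that pins mkl=, with an unpinned entry. The following is a minimal Python sketch of the same substitution, meant only to show the effect on a small sample. The function name and the sample package versions are illustrative assumptions and are not part of RAPIDS.

```python
def unpin_platform_packages(environment_yml: str) -> str:
    """Mimic the two sed filters: drop the version pins for libgfortran and mkl."""
    unpinned = []
    for line in environment_yml.splitlines():
        if "libgfortran" in line:
            # Equivalent of: sed 's/^.*libgfortran.*$/ - libgfortran/'
            line = " - libgfortran"
        elif "mkl=" in line:
            # Equivalent of: sed 's/^.*mkl=.*$/ - mkl/'
            line = " - mkl"
        unpinned.append(line)
    return "\n".join(unpinned)


# Hypothetical excerpt of an exported environment file.
sample = "\n".join([
    "dependencies:",
    " - libgfortran=3.0.0",
    " - mkl=2021.2.0",
    " - mkl-service=2.3.0",
    " - numpy=1.19.2",
])
print(unpin_platform_packages(sample))
```

Only lines containing the literal text mkl= are rewritten, so an entry such as mkl-service keeps its pin, just as it would with the sed command.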
cr-features module
This RAPIDS extension uses the cr-features library, available from the calculatingfeatures repository listed in the installation steps above.
To use cr-features library:
-
Follow the installation instructions in the README.md.
-
Copy built calculatingfeatures folder into the RAPIDS workspace.
-
Install the cr-features package by:
pip install path/to/the/calculatingfeatures/folder
For example, if the folder is copied to the main parent directory:
pip install ./calculatingfeatures
The cr-features package has to be built and installed every time to get the newest version. Alternatively, the newest version of the Docker image must be used.
Updating RAPIDS
To update RAPIDS, first pull and merge origin, such as with:
git fetch --progress "origin" refs/heads/master
git merge --no-ff origin/master
Next, update the conda and R virtual environments.
R -e 'renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"))'
Custom configuration
Credentials
As mentioned under Database in the RAPIDS documentation, a credentials.yaml file is needed to connect to a database.
It should contain:
PSQL_STRAW:
  database: staw
  host: 212.235.208.113
  password: password
  port: 5432
  user: staw_db
where password needs to be specified as well.
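As a quick sanity check before running the pipeline, the credentials can be verified for completeness once they have been loaded into a dictionary. The sketch below assumes the file has already been parsed by a YAML library of your choice; the helper name missing_credentials and the list of required keys are assumptions derived from the example above, not a RAPIDS API.

```python
REQUIRED_KEYS = ("database", "host", "password", "port", "user")


def missing_credentials(credentials: dict, group: str = "PSQL_STRAW") -> list:
    """Return the required keys that are absent or empty for the given group."""
    entries = credentials.get(group) or {}
    return [key for key in REQUIRED_KEYS if entries.get(key) in (None, "")]


# Dictionary equivalent of the credentials.yaml example, with the password left empty.
example = {
    "PSQL_STRAW": {
        "database": "staw",
        "host": "212.235.208.113",
        "password": "",
        "port": 5432,
        "user": "staw_db",
    }
}
print(missing_credentials(example))
```

An empty list means every expected field is filled in; any names returned still need to be specified in credentials.yaml.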
Possible installation issues
Missing dependencies for RPostgres
When installing the RPostgres R package (used to connect to the PostgreSQL database), an error might occur:
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libpq was not found. Try installing:
* deb: libpq-dev (Debian, Ubuntu, etc)
* rpm: postgresql-devel (Fedora, EPEL)
* rpm: postgreql8-devel, psstgresql92-devel, postgresql93-devel, or postgresql94-devel (Amazon Linux)
* csw: postgresql_dev (Solaris)
* brew: libpq (OSX)
If libpq is already installed, check that either:
(i) 'pkg-config' is in your PATH AND PKG_CONFIG_PATH contains a libpq.pc file; or
(ii) 'pg_config' is in your PATH.
If neither can detect , you can set INCLUDE_DIR
and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------[ ERROR MESSAGE ]----------------------------
<stdin>:1:10: fatal error: libpq-fe.h: No such file or directory
compilation terminated.
The library requires libpq for compiling from source, so install it accordingly.
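The error message names two helper tools, pkg-config and pg_config, that the build uses to locate libpq. A small sketch, using only the Python standard library, can report whether either tool is on PATH before you retry the installation. Finding pkg-config alone does not guarantee that a libpq.pc file is visible to it, so treat the result as a first hint rather than a full diagnosis.

```python
import shutil


def libpq_detection_tools() -> dict:
    """Report which of the helper tools named in the error message are on PATH."""
    return {tool: shutil.which(tool) is not None for tool in ("pkg-config", "pg_config")}


found = libpq_detection_tools()
if not any(found.values()):
    print("Neither pkg-config nor pg_config was found; install the libpq development package.")
else:
    print(found)
```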
Timezone environment variable for tidyverse (relevant for WSL2)
One of the R packages, tidyverse, might need access to the TZ environment variable during the installation.
On Ubuntu 20.04 on WSL2 this triggers the following error:
> install.packages('tidyverse')
ERROR: configuration failed for package ‘xml2’
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
Warning in system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
namespace ‘xml2’ 1.3.1 is already loaded, but >= 1.3.2 is required
Calls: <Anonymous> ... namespaceImportFrom -> asNamespace -> loadNamespace
Execution halted
ERROR: lazy loading failed for package ‘tidyverse’
This happens because WSL2 does not use the timedatectl service, which provides this variable.
~$ timedatectl
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
and later
Warning message:
In system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1
Execution halted
This can be amended by setting the environment variable manually before attempting to install tidyverse:
export TZ='Europe/Ljubljana'
Note: if this is needed to avoid runtime issues, you need to either define this environment variable in each new terminal window or (better) define it in your ~/.bashrc or ~/.bash_profile.
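To make the variable persistent without adding it twice, the export line can be appended to the shell startup file only when no TZ export is present yet. The following sketch works on the file contents as a string; the function name with_tz_export is hypothetical, and reading from and writing back to ~/.bashrc is left to the reader.

```python
def with_tz_export(bashrc_text: str, timezone: str = "Europe/Ljubljana") -> str:
    """Return bashrc_text with an export TZ line appended, unless one already exists."""
    for line in bashrc_text.splitlines():
        if line.strip().startswith("export TZ="):
            return bashrc_text  # already defined, leave the contents untouched
    # Make sure the new line starts on its own line.
    separator = "" if bashrc_text.endswith("\n") or not bashrc_text else "\n"
    return f"{bashrc_text}{separator}export TZ='{timezone}'\n"


original = "alias ll='ls -l'\n"
updated = with_tz_export(original)
print(updated)
```

Calling the function again on its own output returns the text unchanged, so it is safe to run repeatedly.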
Possible runtime issues
Unix end of line characters
Upon running rapids, an error might occur:
/usr/bin/env: ‘python3\r’: No such file or directory
This is due to Windows-style end-of-line characters.
To amend this, I added a .gitattributes file to force git to check out rapids using Unix EOL characters.
If this still fails, dos2unix can be used to change them.
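The error arises because the first line of the script ends in a carriage return, so env looks for an interpreter literally named python3\r. The sketch below illustrates the core of the conversion in Python. It is a simplified stand-in for dos2unix, not a replacement for it, and the sample script is made up for the illustration.

```python
def to_unix_eol(data: bytes) -> bytes:
    """Replace Windows (CRLF) line endings with Unix (LF) ones."""
    return data.replace(b"\r\n", b"\n")


# A script as it might look after a checkout with Windows line endings.
script = b"#!/usr/bin/env python3\r\nprint('hello')\r\n"
print(to_unix_eol(script))
```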