growthcleanr
has been developed and tested using R
versions 3.6 and 4+. It should work using R on Windows, macOS, or
Unix/Linux, although there are some additional platform-specific notes you may wish to review.
To get started with growthcleanr
, install it from
CRAN:
install.packages("growthcleanr")
To install the latest development version from GitHub using
devtools
:
devtools::install_github("carriedaymont/growthcleanr", ref="main")
Installing growthcleanr
will install several additional
packages in turn.
See GitHub and source-level install for developers for additional details.
These packages are not required for running
cleangrowth()
and the other main functions of
growthcleanr
, but may be necessary for certain use
cases.
argparser
is used by the gcdriver.R
script for command line operation as described in Working with large data sets
bit64
is necessary if you have long (64-bit) integer
subjid
values
These can be installed the usual way:
install.packages(c("argparser", "bit64"))
We have observed an issue with using growthcleanr
with
Windows in a large agency where some R packages are installed by an
administrator for a user who may not have write access permissions on
the package folder. Similar issues may occur with some networked drives.
This caused problems with growthcleanr
’s parallel
processing option. If possible, install R and its packages in locations
hosted on the same local machine and folder(s) for which the primary
user has write permissions. These steps should help to avoid this
problem.
Some users have reported that growthcleanr
runs more
slowly on Windows compared with Linux or macOS.
growthcleanr
uses the data.table
package for R extensively. data.table
provides a faster
version of R’s data frames, and is used to improve
growthcleanr
performance. Typically data.table
installs in a manner that will be able to take advantage of multiple
threads. You will know it worked successfully if, when you load the
data.table
library in R, you see something like the
following:
library(data.table)
1.12.2 using 2 threads (see ?getDTthreads). Latest news: r-datatable.com data.table
That data.table reports “using 2 threads” indicates that installation
has succeeded. If the message reports using only one thread, see the
advice under the “OpenMP
enabled compiler for Mac” instructions to re-install
data.table
.
Users have reported errors running multiple batches with
parallel = T
from within RStudio. If this happens, the
problem may be resolved by running from RGui or from the command line
using Rscript
. An example standalone script that may be
used for this purpose is documented in Working with large data sets.
This package includes a Dockerfile
that enables easy
installation of R and growthcleanr
on a machine with Docker installed. It requires an
up-to-date Docker install, and a few command-line steps, but can save
time over installing R and growthcleanr
’s dependencies
manually.
To install and run growthcleanr
using Docker, open the
PowerShell on Windows, or open the Terminal on macOS, and enter this
docker
command:
docker run -it ghcr.io/mitre/growthcleanr/gcr-image:latest R
The image tag latest
in the example above will refer to
the latest version of the package available on the main branch of the mitre/growthcleanr
repository, which is typically in close sync with the upstream carriedaymont/growthcleanr
repository. This will usually be the latest released version of
growthcleanr
. To explicitly choose a release by name,
replace latest
with the release tag, e.g. for the released
package v2.1.0
:
docker run -it ghcr.io/mitre/growthcleanr/gcr-image:v2.1.0 R
Whichever package you use, the first time this command is run, it
might take a few minutes to download and extract several necessary
components, but this should be fully automated. If successful, you
should see an R prompt, from which you can use the
growthcleanr
package immediately.
This R environment is virtualized inside Docker, however, and
isolated from your local machine. Because of this, you will need to map
a local folder on your computer into the Docker environment to work with
your own data. For example, if you are on Windows, and your data is in
C:\Users\exampleuser\analysis
, specify a mapping using the
added -v
step below:
docker run -it -v C:\Users\exampleusers\analysis:/usr/src/app \
ghcr.io/mitre/growthcleanr/gcr-image:latest R
Note that the slashes in file paths reverse direction from the reference to the folder location on your Windows machine (before the colon) to the folder location on the Docker container (after the colon); this is intentional, and accounts for how the two different environments reference disk locations.
Note also that when mapping a folder on Windows, you may be prompted to confirm that you indeed want to “Share” the folder. This is a standard Windows security practice, and it is okay to confirm and proceed.
If you are on macOS, and your data is in
/Users/exampleuser/analysis
, specify a folder mapping like
this:
docker run -it -v /Users/exampleuser/analysis:/usr/src/app \
ghcr.io/mitre/growthcleanr/gcr-image:latest R
If you mapped a folder, then inside the Docker environment’s R
prompt, when you then issue a command like list.files()
,
you should see a list of the same files in the R session that you see in
that folder on your desktop. You can now open and read your data files,
run cleangrowth()
and other analyses, and write result
files to that same directory.
Exit the Docker R environment with quit()
as you
normally would. Any new files you saved will appear in the desktop
folder you mapped.
You can install the growthcleanr
package directly from
GitHub using devtools
in the R console with:
install.packages("devtools")
devtools::install_github("carriedaymont/growthcleanr", ref="main")
growthcleanr
itself has several dependencies, so it may
take a little while to download and install everything on your
machine.
Note that the ref="main"
part is required; the default
value of ref
refers to a branch name that is not used in
the growthcleanr
repository, which instead uses a default
branch called “main
”.
To install a different branch, for example if you want to test a
branch associated with a merge request, specify the branch name as the
value of ref
.
If you are unable to install devtools
, a similar
function is available in the remotes
package:
install.packages("remotes")
remotes::install_github("carriedaymont/growthcleanr", ref="main")
If you are developing the growthcleanr
code itself, you
can download or clone the growthcleanr
source code and then
install it from source. To clone the source using git
:
git clone https://github.com/carriedaymont/growthcleanr.git
Once you have the growthcleanr
package source, open an R
session from the growthcleanr
base directory. Then install
growthcleanr using the R devtools
package:
devtools::install(".")