--- title: "trimmer" author: "Lars Kjeldgaard" output: rmarkdown::html_vignette date: "`r Sys.Date()`" vignette: > %\VignetteIndexEntry{trimr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(pryr) ``` `trimmer` 0.7.5 is now available on CRAN. `trimmer` is a lightweight toolkit to trim a (potentially big) R object without breaking the results of a given function call, where the (trimmed) R object is given as argument. The `trim` function is the bread and butter of `trimmer`. It seeks to reduce the size of an R object by recursively removing elements from the object one-by-one. It does so in a 'greedy' fashion - it constantly tries to remove the element that uses the most memory. The trimming process is constrained by a reference function call. The trimming procedure will not allow elements to be removed from the object, that will cause results from the function call to diverge from the original results of the function call. ## Motivation There can be many data reasons as to why, you might want to 'trim' an R object. A typical example could be a R model object. It will typically contain all kinds of (more or less useful) stuff and meta data with information about the model. You might want to try to reduce the size of the object for (memory) efficiency purposes, such that the model only contains only what is in fact needed to predict new observations - and _nothing_ else! ## Installation Install the development version of `trimmer` with: ```{r install_github, eval = FALSE} remotes::install_github("smaakage85/trimmer") ``` Or install the version released on CRAN: ```{r install_cran, eval = FALSE} install.packages("trimmer") ``` ## Trimming Process The trimming procedure - conducted with `trim()` - consists of the following steps: 1. Call the specified function with the object before trimming and save results for reference. 2. Compute size of elements in the most shallow layer of object. These elements are the candidates for elimination. 3. Identify the candidate element that uses most memory. 4. Call function again, but this time with the element from step 3 removed from object. 5. If results from function call are the same as in step 1, remove element from object. If results diverge, keep element - and expand the list with candidates for elimination with elements of most shallow layer of this object (if there are any). 6. Repeat steps 3 to 5, until target size is reached or until no further elements can be removed without results from function call diverge. ## Workflow Example Get ready by loading the package. ```{r} library(trimmer) ``` Train a model on the famous `mtcars` data set. ```{r} # load training data. trn <- datasets::mtcars # estimate model. mdl <- lm(mpg ~ ., data = trn) ``` I want to trim the model object `mdl` as possible without affecting the predictions, computed with function `predict()`, for the resulting model. The trimming is then simply conducted by invoking: ```{r} mdl_trim <- trim(obj = mdl, obj_arg_name = "object", fun = predict, newdata = trn) ``` And that's it! Note, that I provide the `trim` function with the extra argument `newdata`, that is passed to the function call with `fun`. This means, that the trimming is constrained by, that the results of 'fun' (=`predict`) _MUST_ be exactly the same on these data before and after the trimming. The trimmed model object now measures `r pf_obj_size(object_size(mdl_trim))`. The original object measured `r pf_obj_size(object_size(mdl))`. ## Set Target Size If you just want the object size to be below some threshold, you can set that as a criterion. The 'trimming' process will continue no further, when this threshold is reached. This approach can be time-saving compared to minimizing the object as much as possible (=default setting). ```{r} mdl_trim <- trim(obj = mdl, obj_arg_name = "object", fun = predict, newdata = trn, size_target = 0.015) ``` With these settings, the trimmed model object measures `r pf_obj_size(object_size(mdl_trim))`. The original object measured `r pf_obj_size(object_size(mdl))`. ## Other Applications `trimmer` is compatible with all R objects, that inherit from the `list` class - not just R model objects - and all kinds of functions - not just the `predict function`. Hence `trimmer` is quite a flexible tool. To illustrate I will trim the same object but under the constraint, that the results from the `summary()` function must be preserved. ```{r} mdl_trim <- trim(obj = mdl, obj_arg_name = "object", fun = summary) ``` ## Other Notes You can choose whether or not to tolerate warnings from reference function calls with argument `tolerate_warnings`. You can also choose, that certain elements _MUST NOT_ be removed in the trimming process. Do this with the `dont_touch` argument. ## Future Development I would like to extend the framework to also support parallellization. That is it, I hope, that you will enjoy the `trimmer` package :)