Package 'recorder' reference manual

Title:	Toolkit to Validate New Data for a Predictive Model
Description:	A lightweight toolkit to validate new observations when computing their predictions with a predictive model. The validation process consists of two steps: (1) record relevant statistics and meta data of the variables in the original training data for the predictive model and (2) use these data to run a set of basic validation tests on the new set of observations.
Authors:	Lars Kjeldgaard [aut, cre]
Maintainer:	Lars Kjeldgaard <[email protected]>
License:	MIT + file LICENSE
Version:	0.8.2
Built:	2025-03-08 02:14:13 UTC
Source:	https://github.com/smaakage85/recorder

Compress Results of Detailed Tests

Description

Subsets the results of the tests to those where at least one row failed.

Usage

compress_detailed_tests(dt)

Arguments

`dt`	`list` results of detailed tests.

Value

list with test failures.

Concatenate Validation Test Failures Descriptions

Description

Concatenates validation test failures descriptions to a single character vector.

Usage

concatenate_test_failures(test_failures)

Arguments

`test_failures`	`data.frame` with test results as columns.

Value

character concatenated descriptions of test failures with one string per row.

Create Data Frame with Test Results

Description

Create Data Frame with Test Results

Usage

create_test_results_df(x)

Arguments

`x`	`list` results of tests.

Value

data.table with test results as columns.

Create Meta Data of Validation Tests

Description

Creates meta data of available validation tests as a list. The list has one element for each available validation test, and the entries are named after the tests.

Usage

create_tests_meta_data()

Details

The meta data of a validation test consists of:

evaluate_level: is the test evaluated on column level ('col') or on row level ('row')?
evaluate_class: what classes of variables are being tested with this specific test?
description: a short description of what a test failure means for the given test

Value

list meta data of validation tests.

Examples

create_tests_meta_data()

Get Clean Rows

Description

Identifies the rows in new data that passed all validation tests.

Usage

get_clean_rows(playback, ignore_tests = NULL, ignore_cols = NULL,
  ignore_combinations = NULL)

Arguments

`playback`	`data.playback` to extract failed tests from.
`ignore_tests`	`character` ignore test results from tests with these names.
`ignore_cols`	`character` ignore test results from tests of columns with these names.
`ignore_combinations`	`list` ignore test results from specific tests of specific columns.

Details

Look up the descriptions and other meta data of the available validation tests with get_tests_meta_data.

Value

logical with the same length as the number of rows in new data. The value is TRUE, if the row passed all tests, otherwise FALSE.

Examples

# record tape from `iris`.
tape <- record(iris)
# load data.
data(iris_newdata)
# validate new data by playing the tape on it.
playback <- play(tape, iris_newdata)

get_clean_rows(playback)
get_clean_rows(playback, ignore_tests = "outside_range")
get_clean_rows(playback, ignore_cols = "junk")
get_clean_rows(playback, ignore_combinations = list(outside_range = "Sepal.Width"))
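
Because the returned vector has one entry per row of the new data, it can be used directly as a row filter. The following is a minimal sketch, assuming the goal is to split the new data into rows that passed all tests and rows to set aside; the names `clean_rows` and `dirty_rows` are illustrative and not part of the package.

```r
library(recorder)

# record a tape from the training data and play it on the new data.
tape <- record(iris, verbose = FALSE)
data(iris_newdata)
playback <- play(tape, iris_newdata, verbose = FALSE)

# TRUE for rows that passed all tests, FALSE otherwise.
clean <- get_clean_rows(playback)

# split the new data with the logical vector (illustrative names).
clean_rows <- iris_newdata[clean, ]
dirty_rows <- iris_newdata[!clean, ]
```

The same filter can be relaxed with `ignore_tests`, `ignore_cols` or `ignore_combinations` when some failures are acceptable for the model at hand.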

Get Failed Tests

Description

Extracts the results of the validation tests that failed for at least one row in new data.

Usage

get_failed_tests(playback, ignore_tests = NULL, ignore_cols = NULL,
  ignore_combinations = NULL)

Arguments

`playback`	`data.playback` to extract failed tests from.
`ignore_tests`	`character` ignore test results from tests with these names.
`ignore_cols`	`character` ignore test results from tests of columns with these names.
`ignore_combinations`	`list` ignore test results from specific tests of specific columns.

Value

data.table with test results as logicals for all of the tests with at least one failure. A failed test for any given row is equivalent to a value of TRUE. If all tests passed, the function will simply return a data.table with one column, 'any_failures', that is always FALSE, to ensure that the output is (type) stable and consistent.

Examples

# record tape from `iris`.
tape <- record(iris)
# load data.
data(iris_newdata)
# validate new data by playing the tape on it.
playback <- play(tape, iris_newdata)

get_failed_tests(playback)
get_failed_tests(playback, ignore_tests = "outside_range")
get_failed_tests(playback, ignore_cols = "junk")
get_failed_tests(playback, ignore_combinations = list(outside_range = "Sepal.Width"))
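
Since the result holds one logical column per failed test, it can be summarised with ordinary matrix tools. A small sketch, assuming a count of failures per test is of interest; the names `n_failed` and `any_failed` are illustrative only.

```r
library(recorder)

tape <- record(iris, verbose = FALSE)
data(iris_newdata)
playback <- play(tape, iris_newdata, verbose = FALSE)

failed <- get_failed_tests(playback)

# number of rows failing each reported test (TRUE counts as 1).
n_failed <- colSums(as.matrix(failed))

# rows with at least one failed test.
any_failed <- rowSums(as.matrix(failed)) > 0
```

If all tests passed, the single `any_failures` column keeps this code working unchanged, and every count is zero.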

Get Failed Tests as a String

Description

Concatenates information of the tests that failed into one single character vector.

Usage

get_failed_tests_string(playback, ignore_tests = NULL,
  ignore_cols = NULL, ignore_combinations = NULL)

Arguments

`playback`	`data.playback` to extract failed tests from.
`ignore_tests`	`character` ignore test results from tests with these names.
`ignore_cols`	`character` ignore test results from tests of columns with these names.
`ignore_combinations`	`list` ignore test results from specific tests of specific columns.

Details

Look up the descriptions and other meta data of the available validation tests with get_tests_meta_data.

Value

character with one entry for each row in new data. Each entry concatenates information of the tests that did NOT pass for the corresponding row in new data.

Examples

# record tape from `iris`.
tape <- record(iris)
# load data.
data(iris_newdata)
# validate new data by playing the tape on it.
playback <- play(tape, iris_newdata)

get_failed_tests_string(playback)
get_failed_tests_string(playback, ignore_tests = "outside_range")
get_failed_tests_string(playback, ignore_cols = "junk")
get_failed_tests_string(playback, ignore_combinations = list(outside_range = "Sepal.Width"))
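
As the result has one entry per row of the new data, it can be stored next to the observations it describes. A minimal sketch, assuming a copy of the new data with an extra column is wanted; the names `annotated` and `failed_tests` are made up for this illustration.

```r
library(recorder)

tape <- record(iris, verbose = FALSE)
data(iris_newdata)
playback <- play(tape, iris_newdata, verbose = FALSE)

# keep the failure descriptions alongside the rows they belong to.
annotated <- iris_newdata
annotated$failed_tests <- get_failed_tests_string(playback)
head(annotated)
```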

Get Meta Data of Validation Tests in a Data Frame

Description

Gets meta data of available validation tests as a data.frame.

Usage

get_tests_meta_data()

Details

The meta data of a validation test consists of:

test_name: name of the test
evaluate_level: is the test evaluated on column level ('col') or on row level ('row')?
evaluate_class: what classes of variables are being tested with this specific test?
description: a short description of what a test failure means for the given test

Value

data.frame meta data of validation tests.

Examples

get_tests_meta_data()
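
The columns listed under Details can be used to look up a single test by name. A short sketch, assuming the `outside_range` test used in the examples elsewhere in this manual; the variable name `meta` is illustrative.

```r
library(recorder)

meta <- get_tests_meta_data()

# names of all available validation tests.
meta$test_name

# description of what a failure means for one specific test.
meta$description[meta$test_name == "outside_range"]
```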

Ignore Certain Test Results

Description

Ignores certain test results in accordance with user inputs.

Usage

ignore(tests, variables_newdata, ignore_tests = NULL,
  ignore_cols = NULL, ignore_combinations = NULL)

Arguments

`tests`	`list` test results.
`variables_newdata`	`character` names of variables in new data.
`ignore_tests`	`character` ignore test results from tests with these names.
`ignore_cols`	`character` ignore test results from tests of columns with these names.
`ignore_combinations`	`list` ignore test results from specific tests of specific columns.

Details

Look up the descriptions and other meta data of the available validation tests with get_tests_meta_data.

Value

list only the relevant test results.

Ignore Test Results from Tests of Specific Columns

Description

Ignore Test Results from Tests of Specific Columns

Usage

ignore_cols(tests, col_names, variables_newdata)

Arguments

`tests`	`list` test results.
`col_names`	`character` names of columns for which test results should be ignored.
`variables_newdata`	`character` names of variables in new data.

Value

list results after removing tests.

Ignore Test Results from Specific Tests of Specific Columns

Description

Ignore Test Results from Specific Tests of Specific Columns

Usage

ignore_combinations(tests, combinations, variables_newdata)

Arguments

`tests`	`list` test results.
`combinations`	`list` combinations of tests and columns from which test results should be ignored.
`variables_newdata`	`character` names of variables in new data.

Value

list test results after removals.

Ignore Results from Specific Tests

Description

Ignore Results from Specific Tests

Usage

ignore_tests(tests, test_names = NULL)

Arguments

`tests`	`list` test results.
`test_names`	`character` names of tests to be ignored.

Value

list results after removing specific tests.

Simulated Iris New Data

Description

A mutated version of the famous 'iris' data set.

Usage

iris_newdata

Format

A data.frame with 150 rows and 5 columns.

Source

Script attached.

Order Test Results by Test Names

Description

Order Test Results by Test Names

Usage

order_by_tests(dt)

Arguments

`dt`	`list` test results.

Value

list test results ordered by test names.

Validate New Data by Playing a Data Tape on It

Description

Runs a set of validation tests on new data to be predicted with an existing predictive model. These tests are based on statistics and meta data of the variables in the training data - recorded with record.

Usage

play(tape, newdata, verbose = TRUE)

Arguments

`tape`	`data.tape` statistics and meta data recorded from training data.
`newdata`	`data.frame` new data to be predicted with an existing predictive model.
`verbose`	`logical` should messages be printed?

Details

Look up the descriptions and other meta data of the available validation tests with get_tests_meta_data.

Value

data.playback results from validation tests.

Examples

# record tape from `iris`.
tape <- record(iris)
# load data.
data(iris_newdata)
# validate new data by playing the tape on it.
play(tape, iris_newdata)
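
The two-step workflow (record, then play) is not tied to the bundled data sets. The following sketch runs it on small hand-made data frames; `train` and `new_obs` and their columns are hypothetical and only serve to illustrate the calls documented in this manual.

```r
library(recorder)

# hypothetical training data and new observations.
train <- data.frame(
  age = c(23, 35, 47, 52, 61),
  group = factor(c("a", "b", "a", "b", "a"))
)
new_obs <- data.frame(
  age = c(30, 95, NA),
  group = factor(c("a", "b", "b"), levels = c("a", "b"))
)

# step 1: record statistics and meta data of the training data.
tape <- record(train, verbose = FALSE)

# step 2: run the validation tests on the new observations.
playback <- play(tape, new_obs, verbose = FALSE)

# inspect which tests failed for which rows.
get_failed_tests(playback)
get_failed_tests_string(playback)
```

Setting `verbose = FALSE` only suppresses the progress messages; the recorded tape and the test results are the same either way.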

Print Data Playback

Description

Print Data Playback

Usage

## S3 method for class 'data.playback'
print(x, ...)

Arguments

`x`	A 'data.playback' object.
`...`	further arguments passed to or from other methods.

Value

The original object (invisibly)

Examples

# record tape from `iris`.
tape <- record(iris)
# load data.
data(iris_newdata)
# validate new data by playing the tape on it.
playback <- play(tape, iris_newdata)
# print it.
print(playback)

Record Statistics and Meta Data of Variables in Training Data

Description

Records statistics and meta data of variables in the training data for a predictive model. The recorded data can then be used to compute a set of validation tests on new data with play.

Usage

record(x, ...)

Arguments

`x`	training data (or just a single variable from the training data) to record the statistics and other relevant meta data of.
`...`	further arguments passed to or from other methods.

Value

list recorded statistics and meta data. The list will inherit from the data.tape class when the function is invoked with a data.frame.

Examples

record(iris)

Record Statistics and Meta Data of a Character

Description

Records statistics and meta data of a character.

Usage

## S3 method for class 'character'
record(x, ...)

Arguments

`x`	`character`
`...`	all further arguments.

Value

list recorded statistics and meta data.

Examples

record(letters)

Record Statistics and Meta Data of a Data Frame

Description

Records statistics and meta data of a data.frame.

Usage

## S3 method for class 'data.frame'
record(x, verbose = TRUE, ...)

Arguments

`x`	`data.frame` training data for predictive model.
`verbose`	`logical` should messages be printed?
`...`	all further arguments.

Value

list recorded statistics and meta data.

Examples

record(iris)

Record Statistics and Meta Data

Description

Records statistics and meta data.

Usage

## Default S3 method:
record(x, ...)

Arguments

`x`	anything.
`...`	all further arguments.

Value

list recorded statistics and meta data.

Examples

some_junk_letters <- letters[1:10]
class(some_junk_letters) <- "junk"
record(some_junk_letters)

Record Statistics and Meta Data of a Factor

Description

Records statistics and meta data of a factor.

Usage

## S3 method for class 'factor'
record(x, ...)

Arguments

`x`	`factor`
`...`	all further arguments.

Value

list recorded statistics and meta data.

Examples

record(iris$Species)

Record Statistics and Meta Data of an Integer

Description

Records statistics and meta data of an integer.

Usage

## S3 method for class 'integer'
record(x, ...)

Arguments

`x`	`integer`
`...`	all further arguments.

Value

list recorded statistics and meta data.

Examples

record(c(1:10, NA_integer_))

Record Statistics and Meta Data of a Numeric

Description

Records statistics and meta data of a numeric.

Usage

## S3 method for class 'numeric'
record(x, ...)

Arguments

`x`	`numeric`
`...`	all further arguments.

Value

list recorded statistics and meta data.

Examples

record(iris$Sepal.Length)

Run Validation Tests on Variable in New Data

Description

Runs a set of validation tests on a variable in new data. These tests are based on statistics and meta data of the same variable recorded (with record) from the training data.

Usage

run_validation_tests(x, parameters, ...)

Arguments

`x`	variable in new data.
`parameters`	`list` statistics and meta data of the same variable recorded from training data (with `record`).
`...`	further arguments passed to or from other methods. Not used at the moment.

Details

Look up the descriptions and other meta data of the available validation tests with get_tests_meta_data.

Value

list results from validation tests.

Run Validation Tests on Character

Description

Runs a set of validation tests on a character in new data. These tests are based on statistics and meta data of the same variable recorded (with record) from the training data.

Usage

## S3 method for class 'character'
run_validation_tests(x, parameters, ...)

Arguments

`x`	`character` in new data.
`parameters`	`list` statistics and meta data of the same variable recorded from training data (with `record`).
`...`	further arguments passed to or from other methods. Not used at the moment.

Value

list results from validation tests.

Run Validation Tests on Variable

Description

Runs a set of validation tests on a variable in new data. These tests are based on statistics and meta data of the same variable recorded (with record) from the training data.

Usage

## Default S3 method:
run_validation_tests(x, parameters, ...)

Arguments

`x`	anything.
`parameters`	`list` statistics and meta data of the same variable recorded from training data (with `record`).
`...`	further arguments passed to or from other methods. Not used at the moment.

Value

list results from validation tests.

Run Validation Tests on Factor

Description

Runs a set of validation tests on a factor in new data. These tests are based on statistics and meta data of the same variable recorded (with record) from the training data.

Usage

## S3 method for class 'factor'
run_validation_tests(x, parameters, ...)

Arguments

`x`	`factor` in new data.
`parameters`	`list` statistics and meta data of the same variable recorded from training data (with `record`).
`...`	further arguments passed to or from other methods. Not used at the moment.

Value

list results from validation tests.

Run Validation Tests on Integer

Description

Runs a set of validation tests on an integer in new data. These tests are based on statistics and meta data of the same variable recorded (with record) from the training data.

Usage

## S3 method for class 'integer'
run_validation_tests(x, parameters, ...)

Arguments

`x`	`integer` in new data.
`parameters`	`list` statistics and meta data of the same variable recorded from training data (with `record`).
`...`	further arguments passed to or from other methods. Not used at the moment.

Value

list results from validation tests.

Run Validation Tests on a Numeric

Description

Runs a set of validation tests on a numeric in new data. These tests are based on statistics and meta data of the same variable recorded (with record) from the training data.

Usage

## S3 method for class 'numeric'
run_validation_tests(x, parameters, ...)

Arguments

`x`	`numeric` in new data.
`parameters`	`list` statistics and meta data of the same variable recorded from training data (with `record`).
`...`	further arguments passed to or from other methods. Not used at the moment.

Value

list results from validation tests.
