| Title: | Toolkit to Validate New Data for a Predictive Model |
|---|---|
| Description: | A lightweight toolkit to validate new observations when computing their predictions with a predictive model. The validation process consists of two steps: (1) record relevant statistics and meta data of the variables in the original training data for the predictive model and (2) use these data to run a set of basic validation tests on the new set of observations. |
| Authors: | Lars Kjeldgaard [aut, cre] |
| Maintainer: | Lars Kjeldgaard <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.8.2 |
| Built: | 2026-05-10 07:52:43 UTC |
| Source: | https://github.com/smaakage85/recorder |
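The two steps in the description can be illustrated with a minimal base R sketch. It is not the package's implementation: `record_range()`, `check_range()`, and the `unexpected_NA` check are hypothetical names chosen for illustration, and only a single numeric variable is handled.

```r
# Hypothetical helper (step 1): record statistics of a numeric training variable.
record_range <- function(x) {
  list(
    min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE),
    any_NA = anyNA(x)
  )
}

# Hypothetical helper (step 2): test new values against the recorded statistics.
check_range <- function(x, parameters) {
  list(
    # TRUE where a non-missing value falls outside the training range.
    outside_range = !is.na(x) & (x < parameters$min | x > parameters$max),
    # TRUE where a value is missing although training data had no NAs.
    unexpected_NA = is.na(x) & !parameters$any_NA
  )
}

params <- record_range(iris$Sepal.Length)
new_values <- c(5.1, 12.0, NA)
result <- check_range(new_values, params)
```

Each element of `result` is a logical vector with one entry per new observation, where TRUE marks a failure of that test.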
Subsets the test results to those tests where at least one row failed.

    compress_detailed_tests(dt)

| Argument | Description |
|---|---|
| dt | |
list with test failures.
Concatenates the descriptions of validation test failures into a single character vector.

    concatenate_test_failures(test_failures)

| Argument | Description |
|---|---|
| test_failures | |

character vector of concatenated test failure descriptions, with one string per row.
Create Data Frame with Test Results
    create_test_results_df(x)

| Argument | Description |
|---|---|
| x | |
data.table with test results as columns.
Creates meta data of the available validation tests as a list. The list has one element per available validation test, and each entry is named after its test.

    create_tests_meta_data()

The meta data of a validation test consists of:
- whether the test is evaluated on column level ('col') or on row level ('row')
- which classes of variables are tested with this specific test
- a short description of what a test failure means for the given test
list meta data of validation tests.
    create_tests_meta_data()
Get Clean Rows
    get_clean_rows(playback, ignore_tests = NULL, ignore_cols = NULL, ignore_combinations = NULL)

| Argument | Description |
|---|---|
| playback | |
| ignore_tests | |
| ignore_cols | |
| ignore_combinations | |
Look up the descriptions and other meta data of the available
validation tests with get_tests_meta_data.
logical vector with one element per row in the new data. The value is TRUE
if the row passed all tests and FALSE otherwise.
    # record tape from `iris`.
    tape <- record(iris)
    # load data.
    data(iris_newdata)
    # validate new data by playing new tape on it.
    playback <- play(tape, iris_newdata)
    get_clean_rows(playback)
    get_clean_rows(playback, ignore_tests = "outside_range")
    get_clean_rows(playback, ignore_cols = "junk")
    get_clean_rows(playback, ignore_combinations = list(outside_range = "Sepal.Width"))
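Because the result has one logical value per row of the new data, it can serve directly as a row filter before predictions are computed. The following is a sketch of that use, assuming the `recorder` package is installed; the variable names `clean` and `clean_data` are chosen here for illustration.

```r
library(recorder)

# Record the training data and validate the new data without console output.
tape <- record(iris, verbose = FALSE)
data(iris_newdata)
playback <- play(tape, iris_newdata, verbose = FALSE)

# One TRUE/FALSE per row of the new data, ignoring tests on the 'junk' column.
clean <- get_clean_rows(playback, ignore_cols = "junk")

# Keep only the rows that passed all remaining tests.
clean_data <- iris_newdata[clean, , drop = FALSE]
```

Rows dropped this way can be inspected separately rather than being sent to the model.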
Get Failed Tests
    get_failed_tests(playback, ignore_tests = NULL, ignore_cols = NULL, ignore_combinations = NULL)

| Argument | Description |
|---|---|
| playback | |
| ignore_tests | |
| ignore_cols | |
| ignore_combinations | |
data.table with test results as logicals for all of the tests
with at least one failure. A value of TRUE means that the test failed for the
given row. If all tests passed, the function simply
returns a data.table with one column, 'any_failures', that is always FALSE,
to ensure that the output is (type) stable and consistent.
    # record tape from `iris`.
    tape <- record(iris)
    # load data.
    data(iris_newdata)
    # validate new data by playing new tape on it.
    playback <- play(tape, iris_newdata)
    get_failed_tests(playback)
    get_failed_tests(playback, ignore_tests = "outside_range")
    get_failed_tests(playback, ignore_cols = "junk")
    get_failed_tests(playback, ignore_combinations = list(outside_range = "Sepal.Width"))
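Since the returned columns are logicals, they can be summed to see how many rows failed each test. This is a sketch under the assumption that the `recorder` package is installed; `failed` and `failure_counts` are names chosen here for illustration.

```r
library(recorder)

# Record the training data and validate the new data without console output.
tape <- record(iris, verbose = FALSE)
data(iris_newdata)
playback <- play(tape, iris_newdata, verbose = FALSE)

# One logical column per test with at least one failure.
failed <- get_failed_tests(playback)

# Number of failing rows per column (TRUE counts as 1).
failure_counts <- colSums(failed)
```

If no test failed, the single 'any_failures' column sums to zero, so the same code works in both cases.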
Concatenates information about the tests that failed into a single character vector.

    get_failed_tests_string(playback, ignore_tests = NULL, ignore_cols = NULL, ignore_combinations = NULL)

| Argument | Description |
|---|---|
| playback | |
| ignore_tests | |
| ignore_cols | |
| ignore_combinations | |
Look up the descriptions and other meta data of the available
validation tests with get_tests_meta_data.
character vector with one entry for each row in the new data. Each
entry concatenates information about the tests that did NOT pass for the
corresponding row in the new data.
    # record tape from `iris`.
    tape <- record(iris)
    # load data.
    data(iris_newdata)
    # validate new data by playing new tape on it.
    playback <- play(tape, iris_newdata)
    get_failed_tests_string(playback)
    get_failed_tests_string(playback, ignore_tests = "outside_range")
    get_failed_tests_string(playback, ignore_cols = "junk")
    get_failed_tests_string(playback, ignore_combinations = list(outside_range = "Sepal.Width"))
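With one entry per row, the string can be attached to the new data as an extra column, for example to log why a row was flagged. This is a sketch assuming the `recorder` package is installed; the `report` object and the `failed_tests` column name are chosen here for illustration.

```r
library(recorder)

# Record the training data and validate the new data without console output.
tape <- record(iris, verbose = FALSE)
data(iris_newdata)
playback <- play(tape, iris_newdata, verbose = FALSE)

# Copy the new data and add the failure descriptions as a column.
report <- iris_newdata
report$failed_tests <- get_failed_tests_string(playback)
```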
Gets meta data of available validation tests as a data.frame.
    get_tests_meta_data()

The meta data of a validation test consists of:
- name of the test
- whether the test is evaluated on column level ('col') or on row level ('row')
- which classes of variables are tested with this specific test
- a short description of what a test failure means for the given test
data.frame meta data of validation tests.
    get_tests_meta_data()
Ignores certain test results in accordance with user inputs.
    ignore(tests, variables_newdata, ignore_tests = NULL, ignore_cols = NULL, ignore_combinations = NULL)

| Argument | Description |
|---|---|
| tests | |
| variables_newdata | |
| ignore_tests | |
| ignore_cols | |
| ignore_combinations | |
Look up the descriptions and other meta data of the available
validation tests with get_tests_meta_data.
list only the relevant test results.
Ignore Test Results from Tests of Specific Columns
    ignore_cols(tests, col_names, variables_newdata)

| Argument | Description |
|---|---|
| tests | |
| col_names | |
| variables_newdata | |
list results after removing tests.
Ignore Test Results from Specific Tests of Specific Columns
    ignore_combinations(tests, combinations, variables_newdata)

| Argument | Description |
|---|---|
| tests | |
| combinations | |
| variables_newdata | |
list test results after removals.
Ignore Results from Specific Tests
    ignore_tests(tests, test_names = NULL)

| Argument | Description |
|---|---|
| tests | |
| test_names | |
list results after removing specific tests.
A mutated version of the famous 'iris' data set.
    iris_newdata
A data.frame with 150 rows and 5 columns.
Script attached.
Order Test Results by Test Names
    order_by_tests(dt)

| Argument | Description |
|---|---|
| dt | |
list test results ordered by test names.
Runs a set of validation tests on new data to be predicted with an existing
predictive model. These tests are based on statistics and meta data of
the variables in the training data - recorded with record.
    play(tape, newdata, verbose = TRUE)

| Argument | Description |
|---|---|
| tape | |
| newdata | |
| verbose | |
Look up the descriptions and other meta data of the available
validation tests with get_tests_meta_data.
data.playback results from validation tests.
    # record tape from `iris`.
    tape <- record(iris)
    # load data.
    data(iris_newdata)
    # validate new data by playing new tape on it.
    play(tape, iris_newdata)
Print Data Playback
    ## S3 method for class 'data.playback'
    print(x, ...)

| Argument | Description |
|---|---|
| x | A 'data.playback' object. |
| ... | further arguments passed to or from other methods. |
The original object (invisibly)
    # record tape from `iris`.
    tape <- record(iris)
    # load data.
    data(iris_newdata)
    # validate new data by playing new tape on it.
    playback <- play(tape, iris_newdata)
    # print it.
    print(playback)
Records statistics and meta data of variables in the training data for a
predictive model. The recorded data can then be used to compute a set
of validation tests on new data with play.
    record(x, ...)

| Argument | Description |
|---|---|
| x | training data (or just a single variable from the training data) to record the statistics and other relevant meta data of. |
| ... | further arguments passed to or from other methods. |
list recorded statistics and meta data. The list will inherit
from the data.tape class when the function is invoked with a
data.frame.
    record(iris)
Records statistics and meta data of a character.
    ## S3 method for class 'character'
    record(x, ...)

| Argument | Description |
|---|---|
| x | |
| ... | all further arguments. |
list recorded statistics and meta data.
    record(letters)
Records statistics and meta data of a data.frame.

    ## S3 method for class 'data.frame'
    record(x, verbose = TRUE, ...)

| Argument | Description |
|---|---|
| x | |
| verbose | |
| ... | all further arguments. |
list recorded statistics and meta data.
    record(iris)
Records statistics and meta data.
    ## Default S3 method:
    record(x, ...)

| Argument | Description |
|---|---|
| x | anything. |
| ... | all further arguments. |
list recorded statistics and meta data.
    some_junk_letters <- letters[1:10]
    class(some_junk_letters) <- "junk"
    record(some_junk_letters)
Records statistics and meta data of a factor.
    ## S3 method for class 'factor'
    record(x, ...)

| Argument | Description |
|---|---|
| x | |
| ... | all further arguments. |
list recorded statistics and meta data.
    record(iris$Species)
Records statistics and meta data of an integer.
    ## S3 method for class 'integer'
    record(x, ...)

| Argument | Description |
|---|---|
| x | |
| ... | all further arguments. |
list recorded statistics and meta data.
    record(c(1:10, NA_integer_))
Records statistics and meta data of a numeric.
    ## S3 method for class 'numeric'
    record(x, ...)

| Argument | Description |
|---|---|
| x | |
| ... | all further arguments. |
list recorded statistics and meta data.
    record(iris$Sepal.Length)
Runs a set of validation tests on a variable in new data. These tests are
based on statistics and meta data of the same variable recorded
(with record) from the training data.
    run_validation_tests(x, parameters, ...)

| Argument | Description |
|---|---|
| x | variable in new data. |
| parameters | |
| ... | further arguments passed to or from other methods. Not used at the moment. |
Look up the descriptions and other meta data of the available
validation tests with get_tests_meta_data.
list results from validation tests.
Runs a set of validation tests on a character in new data. These tests
are based on statistics and meta data of the same variable recorded
(with record) from the training data.
    ## S3 method for class 'character'
    run_validation_tests(x, parameters, ...)

| Argument | Description |
|---|---|
| x | |
| parameters | |
| ... | further arguments passed to or from other methods. Not used at the moment. |
list results from validation tests.
Runs a set of validation tests on a variable in new data. These tests
are based on statistics and meta data of the same variable recorded
(with record) from the training data.

    ## Default S3 method:
    run_validation_tests(x, parameters, ...)

| Argument | Description |
|---|---|
| x | anything. |
| parameters | |
| ... | further arguments passed to or from other methods. Not used at the moment. |
list results from validation tests.
Runs a set of validation tests on a factor in new data. These tests
are based on statistics and meta data of the same variable recorded
(with record) from the training data.
    ## S3 method for class 'factor'
    run_validation_tests(x, parameters, ...)

| Argument | Description |
|---|---|
| x | |
| parameters | |
| ... | further arguments passed to or from other methods. Not used at the moment. |
list results from validation tests.
Runs a set of validation tests on an integer in new data. These tests
are based on statistics and meta data of the same variable recorded
(with record) from the training data.

    ## S3 method for class 'integer'
    run_validation_tests(x, parameters, ...)

| Argument | Description |
|---|---|
| x | |
| parameters | |
| ... | further arguments passed to or from other methods. Not used at the moment. |
list results from validation tests.
Runs a set of validation tests on a numeric in new data. These tests
are based on statistics and meta data of the same variable recorded
(with record) from the training data.
    ## S3 method for class 'numeric'
    run_validation_tests(x, parameters, ...)

| Argument | Description |
|---|---|
| x | |
| parameters | |
| ... | further arguments passed to or from other methods. Not used at the moment. |
list results from validation tests.