# Call the parent class
library(filtro)
class_score
Create your own score class object
Create a new score class object for feature selection.
Introduction
To use code in this article, you will need to install the following packages: filtro and modeldata.
Currently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects can be found here. However, you might need to define your own scoring objects. This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection.
For reference, filtro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function.
Regarding scoring objects:
There is a parent class class_score
, which defines the fixed properties that are shared across all subclasses. The parent class is already implemented, and serves as the infrastructure we build on when we make our own scoring class object.
Therefore, the general procedure is to:
Create a subclass
class_score_*
that inherts fromclass_score
. This subclass introduces additional, method- or score-specific properties, as opposed to the general characteristics already defined in the parent class.Implement the scoring method in
fit()
, which computes feature score. Thefit()
generic refers to the subclass from step 1 to use the appropriatefit()
method.
The hierarchy can be visualized as:
class_score
└─> class_score_*
└─> fit()
Additionally, we provide guidance on documenting an S7 method.
Parent class (General scoring object)
All the subclasses (custom scoring objects) share the same parent class named class_score
. The parent class is already implemented, and we need this to build our own scoring class object:
These are the fixed properties (attributes) for this object:
args(class_score)
#> function (outcome_type = c("numeric", "factor"), predictor_type = c("numeric",
#> "factor"), case_weights = logical(0), range = integer(0), inclusive = logical(0),
#> fallback_value = integer(0), score_type = character(0), transform_fn = function() NULL,
#> direction = character(0), deterministic = logical(0), tuning = logical(0),
#> calculating_fn = function() NULL, label = character(0), packages = character(0),
#> results = data.frame())
#> NULL
For example:
case_weights
: Does the method accpet case weights? It isTRUE
orFALSE
.fallback_value
: What is a value that can be used for the statistic so that it will never be eliminated? For example,0
orInf
.direction
: What direction of values indicates the most important values? For example,minimize
ormaximum
.results
: A slot for the results once the method is fitted. Initially, this is an empty data frame.
For details on its constructor and its remaining properties, please refer to the package documentation or check it out here.
Subclass (Custom scoring object)
All custom scoring objects implemented in filtro are subclasses of the parent class class_score
, meaning that they all inherit the parent class’s fixed properties. When creating a new scoring object, we do so by defining another subclass of this parent class.
class_score
└─> class_score_aov (example shown)
└─> class_score_cor
└─> ...
As an example, we demonstrate how to create a custom scoring object for ANOVA F-test named class_score_aov
.
For reference, the ANOVA F-test filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by lm()
. The lm()
function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance.
By setting parent = class_score
, the subclass class_score_aov
inherits all of the fixed properties from the parent class. Additional, implementation-specific properties can be added using the properties =
argument. For example:
# Create a subclass named 'class_score_aov'
<- S7::new_class(
class_score_aov "class_score_aov",
parent = class_score,
properties = list(
neg_log10 = S7::new_property(S7::class_logical, default = TRUE)
) )
In addition to the properties inherited from the parent class (discussed in the previous section), class_score_aov
also includes the following property:
neg_log10
: Represent the score as-log10(p_value)
? It isTRUE
orFALSE
.
For the ANOVA F-test filter, users can represent the score using either the
p-value or
F-statistic.
We demonstrate how to create these instances (objects) accordingly.
score_aov_pval
is created as an instance of the class_score_aov
subclass by calling its constructor and specifying its properties:
# ANOVA p-value
<-
score_aov_pval class_score_aov(
outcome_type = c("numeric", "factor"),
predictor_type = c("numeric", "factor"),
case_weights = TRUE,
range = c(0, Inf),
inclusive = c(FALSE, FALSE),
fallback_value = Inf,
score_type = "aov_pval",
direction = "maximize",
deterministic = TRUE,
tuning = FALSE,
label = "ANOVA p-values"
)
Once instantiated, individual properties can be accessed via object@
. For example:
@case_weights
score_aov_pval#> [1] TRUE
@fallback_value
score_aov_pval#> [1] Inf
@direction
score_aov_pval#> [1] "maximize"
Note that by default, the returned p-value is transformed to -log10(p_value)
, which means larger values correspond to more important predictors. This is why the fallback value is set to Inf
and the direction is set to "maximize"
.
score_aov_fstat
is another instance of the class_score_aov
subclass:
# ANOVA F-statistic
<-
score_aov_fstat class_score_aov(
outcome_type = c("numeric", "factor"),
predictor_type = c("numeric", "factor"),
case_weights = TRUE,
range = c(0, Inf),
inclusive = c(FALSE, FALSE),
fallback_value = Inf,
score_type = "aov_fstat",
direction = "maximize",
deterministic = TRUE,
tuning = FALSE,
label = "ANOVA F-statistics"
)
The F-statistic is not transformed, nor does it provide an option for transformation. Nevertheless, it also uses a fallback value of Inf
with the direction set to "maximize"
, since larger F-statistic values indicate more important predictors.
Fitting (or estimating) feature score
So far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter.
We now discuss the dual role of fit()
: it functions both as a generic and as the methods used to fit (or estimate) feature score.
The
fit()
generic is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method.We also define multiple methods named
fit()
. Eachfit()
method performs the actual fitting or score estimation for a specific class of object.
In other words, when fit()
is called, the generic refers to the custom scoring object class_score_*
to determine which method to dispatch. The actual scoring computation is performed within the dispatched method.
class_score
└─> class_score_aov (example shown)
└─> fit()
└─> class_score_cor
└─> fit()
└─> ...
Let’s use the ANOVA F-test filter again as an example:
# User-level example: Check the class of the object
class(score_aov_pval)
#> [1] "filtro::class_score_aov" "filtro::class_score"
#> [3] "S7_object"
class(score_aov_fstat)
#> [1] "filtro::class_score_aov" "filtro::class_score"
#> [3] "S7_object"
Both instances (objects) belong to the custom scoring object class_score_aov
. Therefore, when fit()
is called, the method for class_score_aov
is dispatched, performing the actual fitting using the ANOVA F-test:
# User-level example: Method dispatch for objects of class `class_score_aov`
|>
score_aov_pval fit(Sale_Price ~ ., data = ames)
|>
score_aov_fstat fit(Sale_Price ~ ., data = ames)
Defining S7 methods
For users to use the fit()
method described above, we need to define a S7 method that implements the scoring logic.
The following code defines the fit()
method specifically for the class_score_aov
subclass, specifying how feature score should be computed using ANOVA F-test:
# Define the scoring method for `class_score_aov`
::method(fit, class_score_aov) <- function(
S7
object,
formula,
data,case_weights = NULL,
...
) {# This is where you add the rest of the code for this implementation
@results <- res
object
object }
We would want to do something similar to define a S7 method for other class_score_*
subclass.
Documenting S7 methods
Documentation for S7 methods is still a work in progress, but our current best approach is as follows:
We re-export the
fit()
generic from generics.Instead of documenting each
fit()
method, we document it in the “Details” section and the “Estimating the scores” subsection of the documentation for the corresponding object (instance)score_*
.
The code below opens the help page for the fit()
generic:
# User-level example: Help page for `fit()` generic
?fit
The code below opens the help page for specific fit()
method:
# User-level example: Help page for `fit()` method along with the documentation for the specific object
?score_aov_pval ?score_aov_fstat
For users to access the help page using ?
as described above, the fit()
method needs to be exported using #' @export
, but it is not documented directly.
#' @export
::method(fit, class_score_aov) <- function(
S7
object,
...
) {# Include the rest of the function here
@results <- res
object
object }
Instead, documentation is provided in the “Details” section and the “Estimating the scores” subsection of the documentation for the score_aov_pval
object.
#' Scoring via analysis of variance hypothesis tests
#'
#' @description
#'
#' @name score_aov_pval
#' @family class score metrics
#'
#' @details
#'
#' These objects are used when either:
#'
#' ...
#'
#' ## Estimating the scores
#'
#' In \pkg{filtro}, the `score_*` objects define a scoring method (e.g., data
#' input requirements, package dependencies, etc). To compute the scores for
#' a specific data set, the `fit()` method is used. The main arguments for
#' these functions are:
#'
#' \describe{
#' \item{`object`}{A score class object (e.g., `score_aov_pval`).}
#' \item{`formula`}{A standard R formula with a single outcome on the right-hand side and one or more predictors (or `.`) on the left-hand side. The data are processed via [stats::model.frame()]}
#' \item{`data`}{A data frame containing the relevant columns defined by the formula.}
#' \item{`...`}{Further arguments passed to or from other methods.}
#' \item{`case_weights`}{A quantitative vector of case weights that is the same length as the number of rows in `data`. The default of `NULL` indicates that there are no case weights.}
#' }
#'
#' ...
#'
#' @export
<-
score_aov_pval class_score_aov(
outcome_type = c("numeric", "factor"),
predictor_type = c("numeric", "factor"),
case_weights = TRUE,
range = c(0, Inf),
inclusive = c(FALSE, FALSE),
fallback_value = Inf,
score_type = "aov_pval",
transform_fn = function(x) x,
direction = "maximize",
deterministic = TRUE,
tuning = FALSE,
label = "ANOVA p-values"
)
We can have the score_aov_fstat
object share the same help page as score_aov_pval
by using #' @name
. This avoids repeated documentation for similar or related objects.
#' @name score_aov_pval
#' @export
<-
score_aov_fstat class_score_aov(
outcome_type = c("numeric", "factor"),
predictor_type = c("numeric", "factor"),
case_weights = TRUE,
range = c(0, Inf),
inclusive = c(FALSE, FALSE),
fallback_value = Inf,
score_type = "aov_fstat",
transform_fn = function(x) x,
direction = "maximize",
deterministic = TRUE,
tuning = FALSE,
label = "ANOVA F-statistics"
)
Accessing results after fitting
Once the method has been fitted via fit()
, the data frame of results can be accessed via object@results
.
We use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. Sale_Price
is the outcome and is numeric.
library(modeldata)
<- modeldata::ames |>
ames_subset # Use a subset of data for demonstration
::select(
dplyr
Sale_Price,
MS_SubClass,
MS_Zoning,
Lot_Frontage,
Lot_Area,
Street
)<- ames_subset |>
ames_subset ::mutate(Sale_Price = log10(Sale_Price)) dplyr
Next, we fit the score as we discuss before:
# Specify ANOVA p-value and fit score
<-
ames_aov_pval_res |>
score_aov_pval fit(Sale_Price ~ ., data = ames_subset)
# Specify ANOVA F-statistic and fit score
<-
ames_aov_fstat_res |>
score_aov_fstat fit(Sale_Price ~ ., data = ames_subset)
Recall that individual properties of an object can be accessed using object@
. Once the method has been fitted, the resulting data frame can be accessed via object@results
:
@results
ames_aov_pval_res#> # A tibble: 5 × 4
#> name score outcome predictor
#> <chr> <dbl> <chr> <chr>
#> 1 aov_pval 237. Sale_Price MS_SubClass
#> 2 aov_pval 130. Sale_Price MS_Zoning
#> 3 aov_pval NA Sale_Price Lot_Frontage
#> 4 aov_pval NA Sale_Price Lot_Area
#> 5 aov_pval 5.75 Sale_Price Street
@results
ames_aov_fstat_res#> # A tibble: 5 × 4
#> name score outcome predictor
#> <chr> <dbl> <chr> <chr>
#> 1 aov_fstat 94.5 Sale_Price MS_SubClass
#> 2 aov_fstat 115. Sale_Price MS_Zoning
#> 3 aov_fstat NA Sale_Price Lot_Frontage
#> 4 aov_fstat NA Sale_Price Lot_Area
#> 5 aov_fstat 22.9 Sale_Price Street
Session information
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> ─ Session info ─────────────────────────────────────────────────────
#> version R version 4.5.0 (2025-04-11)
#> language (EN)
#> date 2025-08-29
#> pandoc 3.6.3
#> quarto 1.7.32
#>
#> ─ Packages ─────────────────────────────────────────────────────────
#> package version date (UTC) source
#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)
#> filtro 0.2.0 2025-08-26 CRAN (R 4.5.0)
#> modeldata 1.5.1 2025-08-22 CRAN (R 4.5.0)
#> purrr 1.1.0 2025-07-10 CRAN (R 4.5.0)
#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)
#> tibble 3.3.0 2025-06-08 CRAN (R 4.5.0)
#>
#> ────────────────────────────────────────────────────────────────────