Models used to predict a sample's label ("POS": robustly maps R-loops; "NEG": poorly maps R-loops) from the results of R-loop-forming sequence (RLFS) analysis (RLSeq::analyzeRLFS()). These models are used by RLSeq::predictCondition().

fft_model(quiet = FALSE)

prep_features(quiet = FALSE)

Arguments

quiet

If TRUE, messages are suppressed. Default: FALSE.

Value

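A model object: prep_features() returns a preProcess object from the caret package, and fft_model() returns a caretStack object from the caretEnsemble package.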

Details

Source

The models were developed as part of a semi-automated online learning scheme found in the RLBase-data protocol here. Briefly, R-loop-forming sequence (RLFS) analysis was performed with RLSeq::analyzeRLFS() on every sample peakset in RLBase (see rlfs_res for full results). The samples were then manually inspected, and any sample whose results starkly contradicted its label was removed: of 693 samples, 135 were excluded for this reason. The remaining steps were performed automatically:

  • First, the 558 remaining samples were partitioned 50:25:25 (train:test:discovery). Features were then transformed on the full dataset with the "YeoJohnson" transform followed by standardization (centering and scaling), via caret::preProcess().

  • Then, feature selection was performed in the discovery set using Boruta::Boruta().

  • Then, a stacked ensemble model was trained on the training set:

    • The top-level (stacking) model is a random forest, and the five base models in the stack are:

      • Linear discriminant analysis

      • Recursive partitioning

      • Generalized linear model (logit)

      • K-nearest neighbors

      • Support vector machine (radial)

    • Repeated 10-fold cross-validation (5 repeats) was used during training.

  • Finally, the model was evaluated on the test set, where it achieved an accuracy of 0.9043. For more details, see the HTML report here.
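The first step above combines two mechanical operations: a 50:25:25 split and a Yeo-Johnson transform followed by standardization. The base-R sketch below illustrates both under stated assumptions: the split is a simple, unstratified random split (the actual split may have been stratified), and λ is supplied by hand, whereas caret::preProcess() estimates λ from the data for each feature. The helpers split_50_25_25(), yeo_johnson() and transform_feature() are hypothetical and are not part of RLSeq or caret.

```r
# Hypothetical helper: unstratified random 50:25:25 split (train:test:discovery).
split_50_25_25 <- function(n, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  n_train <- floor(0.50 * n)
  n_test  <- floor(0.25 * n)
  n_disc  <- n - n_train - n_test  # the remainder goes to the discovery set
  groups <- rep(c("train", "test", "discovery"), times = c(n_train, n_test, n_disc))
  factor(sample(groups), levels = c("train", "test", "discovery"))
}

# Hypothetical helper: Yeo-Johnson transform of a numeric vector for a fixed lambda.
yeo_johnson <- function(y, lambda, tol = 1e-8) {
  out <- numeric(length(y))
  pos <- y >= 0
  if (abs(lambda) > tol) {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log1p(y[pos])  # limiting case lambda = 0
  }
  if (abs(lambda - 2) > tol) {
    out[!pos] <- -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    out[!pos] <- -log1p(-y[!pos])  # limiting case lambda = 2
  }
  out
}

# Hypothetical helper: transform, then center to mean 0 and scale to sd 1.
transform_feature <- function(y, lambda) {
  as.numeric(scale(yeo_johnson(y, lambda)))
}
```

For example, split_50_25_25(558) assigns 279 samples to training, 139 to testing and 140 to discovery; the real partition may have been rounded or stratified differently.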

Structure

  • prep_features()

    • A feature-transform model that prepares the data for classification.

    • It is an object of class preProcess, returned by caret::preProcess().

  • fft_model()

    • A binary classifier which returns "POS" or "NEG".

    • It is an object of class caretStack, returned by caretEnsemble::caretStack().
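Both objects belong to standard caret and caretEnsemble classes. The sketch below shows how objects of these classes could be produced by following the steps in the Source section, using documented calls from caret, Boruta and caretEnsemble. It is an illustration under assumptions, not the RLBase-data code: the function build_models_sketch() and its inputs feats (a data frame of numeric RLFS features), label (a factor with levels "NEG" and "POS") and grp (a train/test/discovery factor) are hypothetical, and trainControl() settings beyond repeated 10-fold cross-validation with 5 repeats are guesses.

```r
# Hypothetical sketch; not the RLBase-data training code.
# Requires caret, Boruta and caretEnsemble, plus the model back-ends
# MASS (lda), rpart, kernlab (svmRadial) and randomForest (rf).
build_models_sketch <- function(feats, label, grp) {
  # Yeo-Johnson + centering/scaling, fit on the full dataset as described above
  pp <- caret::preProcess(feats, method = c("YeoJohnson", "center", "scale"))
  x <- stats::predict(pp, newdata = feats)

  # Boruta feature selection on the discovery set only
  disc <- grp == "discovery"
  bor <- Boruta::Boruta(x = x[disc, , drop = FALSE], y = label[disc])
  keep <- Boruta::getSelectedAttributes(bor, withTentative = FALSE)

  # 10-fold cross-validation repeated 5 times; saved predictions feed the stack
  ctrl <- caret::trainControl(
    method = "repeatedcv", number = 10, repeats = 5,
    savePredictions = "final", classProbs = TRUE
  )

  # Five base learners trained on the training set (caret method names)
  train <- grp == "train"
  base_models <- caretEnsemble::caretList(
    x = x[train, keep, drop = FALSE], y = label[train],
    trControl = ctrl,
    methodList = c("lda", "rpart", "glm", "knn", "svmRadial")
  )

  # Random-forest model stacked on top of the base learners
  stack <- caretEnsemble::caretStack(base_models, method = "rf", trControl = ctrl)

  list(prep = pp, features = keep, model = stack)
}
```

The returned list holds a preProcess object, the selected feature names and a caretStack. Test-set evaluation is left out because predict() on a caretStack may return class labels or class probabilities depending on the installed caretEnsemble version.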

Usage

These models are used internally by RLSeq::predictCondition().
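A minimal sketch of how the two models could be chained at prediction time, assuming they are loaded from RLHub (assumed here to be the package exporting fft_model() and prep_features()) and that features is a data frame with the same RLFS feature columns prep_features() was fit on. predict_label_sketch() is a hypothetical helper; RLSeq::predictCondition() may perform additional steps, such as restricting the input to the selected features.

```r
# Hypothetical helper; not the implementation of RLSeq::predictCondition().
predict_label_sketch <- function(features) {
  # Make sure the S3 predict() methods for preProcess and caretStack are registered
  loadNamespace("caret")
  loadNamespace("caretEnsemble")

  pf  <- RLHub::prep_features(quiet = TRUE)  # preProcess object
  fft <- RLHub::fft_model(quiet = TRUE)      # caretStack object

  # Apply the Yeo-Johnson transform and standardization, then classify
  transformed <- stats::predict(pf, newdata = features)
  # Depending on the caretEnsemble version, this yields "POS"/"NEG" labels
  # or a table of class probabilities
  stats::predict(fft, newdata = transformed)
}
```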

Examples

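# Load the stacked ensemble classifier (a caretStack object)
fftModel <- fft_model()

# Load the feature-transform model (a preProcess object)
pfModel <- prep_features()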