Title: | Robust Tuning and Training for Cross-Source Prediction |
---|---|
Description: | Provides robust parameter tuning and model training for predictive models applied across data sources where the data distribution varies slightly from source to source. This package implements three primary tuning methods: cross-validation-based internal tuning, external tuning, and the 'RobustTuneC' method. External tuning includes a conservative option where parameters are tuned internally on the training data and validated on an external dataset, providing a slightly pessimistic AUC estimate. It supports Lasso, Ridge, Random Forest, Boosting, and Support Vector Machine classifiers. Currently, only binary classification is supported. The response variable must be the first column of the dataset and a factor with exactly two levels. The tuning methods are based on the paper by Nicole Ellenbach, Anne-Laure Boulesteix, Bernd Bischl, Kristian Unger, and Roman Hornung (2021) "Improved Outcome Prediction Across Data Sources Through Robust Parameter Tuning" <doi:10.1007/s00357-020-09368-z>. |
Authors: | Yuting He [aut, cre], Nicole Ellenbach [ctb], Roman Hornung [ctb] |
Maintainer: | Yuting He <[email protected]> |
License: | GPL-3 |
Version: | 0.1.6 |
Built: | 2025-02-16 06:33:00 UTC |
Source: | https://github.com/yuting-he/robustprediction |
This package provides robust parameter tuning and predictive modeling techniques, useful for situations where prediction across different data sources is important and the data distribution varies slightly from source to source.
The 'RobustPrediction' package helps users build and tune classifiers using the 'RobustTuneC', internal, or external tuning method. The package supports the following classifiers: boosting, lasso, ridge, random forest, and support vector machine (SVM). It is intended for scenarios where parameter tuning across data sources is important.
The 'RobustPrediction' package provides comprehensive tools for robust parameter tuning and predictive modeling, particularly for cross-source prediction tasks.
The package includes functions for tuning model parameters using three methods:
- **Internal tuning**: Standard cross-validation on the training data to select the best parameters.
- **External tuning**: Parameter tuning based on an external dataset that is independent of the training data. This method has two variants controlled by the `estperf` argument:
  - **Standard external tuning (`estperf = FALSE`)**: Parameters are tuned directly using the external dataset. This is the default approach and provides a straightforward method for selecting optimal parameters based on external data.
  - **Conservative external tuning (`estperf = TRUE`)**: Internal tuning is first performed on the training data, and then the model is evaluated on the external dataset. This approach provides a more conservative (slightly pessimistic) AUC estimate, as described by Ellenbach et al. (2021). For the most accurate performance evaluation, it is recommended to use a second external dataset.
- **RobustTuneC**: A method designed to combine internal and external tuning for better performance in cross-source scenarios.
The package supports Lasso, Ridge, Random Forest, Boosting, and SVM classifiers. These models can be trained and tuned using the provided methods, and the package reports the model's AUC (Area Under the Curve) value to help users evaluate prediction performance.
It is particularly useful when the data to be predicted comes from a different source than the training data, where variability between datasets may require more robust parameter tuning techniques. The methods provided in this package may help reduce overfitting the training data distribution and improve model generalization across different data sources.
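The conservative external tuning idea described above can be sketched with standard tools. This is a minimal illustration using glmnet and pROC, not the package's own implementation; `train` and `extern` are hypothetical data frames with a two-level factor response in the first column, as the package requires:

```r
library(glmnet)  # penalized logistic regression
library(pROC)    # AUC computation

# Internal tuning: cross-validate lambda on the training data only
x_train <- as.matrix(train[, -1])
cv_fit  <- cv.glmnet(x_train, train[[1]], family = "binomial")

# Conservative evaluation: score the untouched external dataset
x_ext   <- as.matrix(extern[, -1])
probs   <- predict(cv_fit, newx = x_ext, s = "lambda.min", type = "response")
ext_auc <- auc(roc(extern[[1]], as.numeric(probs), quiet = TRUE))
```

Because lambda is chosen without looking at the external data, the resulting AUC tends to be slightly pessimistic for the cross-source setting, which is the behaviour the `estperf = TRUE` option formalizes.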
This package requires the following packages: glmnet, mboost, mlr, pROC, ranger.
Maintainer: Yuting He [email protected]
Other contributors:
Nicole Ellenbach [contributor]
Roman Hornung [contributor]
Ellenbach, N., Boulesteix, A.-L., Bischl, B., Unger, K., & Hornung, R. (2021). Improved outcome prediction across data sources through robust parameter tuning. Journal of Classification, 38, 212-231. <doi:10.1007/s00357-020-09368-z>.
# Example usage:
data(sample_data_train)
data(sample_data_extern)
res <- tuneandtrain(sample_data_train, sample_data_extern,
  tuningmethod = "robusttunec", classifier = "lasso")
This dataset, named 'sample_data_extern', is a subset of publicly available microarray data from the HG-U133PLUS2 chip. It contains expression levels of 200 genes across 50 samples, used primarily as an external validation set in robust feature selection studies. The data has been sourced from the ArrayExpress repository and has been referenced in several research articles.
sample_data_extern
A data frame with 50 observations and 201 variables, including:
Factor. The response variable.
Numeric. Expression level of gene 236694_at.
Numeric. Expression level of gene 222356_at.
Numeric. Expression level of gene 1554125_a_at.
Numeric. Expression level of gene 232823_at.
Numeric. Expression level of gene 205766_at.
Numeric. Expression level of gene 1560446_at.
Numeric. Expression level of gene 202565_s_at.
Numeric. Expression level of gene 234887_at.
Numeric. Expression level of gene 209687_at.
Numeric. Expression level of gene 221592_at.
Numeric. Expression level of gene 1570123_at.
Numeric. Expression level of gene 241368_at.
Numeric. Expression level of gene 243324_x_at.
Numeric. Expression level of gene 224046_s_at.
Numeric. Expression level of gene 202775_s_at.
Numeric. Expression level of gene 216332_at.
Numeric. Expression level of gene 1569545_at.
Numeric. Expression level of gene 205946_at.
Numeric. Expression level of gene 203547_at.
Numeric. Expression level of gene 243239_at.
Numeric. Expression level of gene 234245_at.
Numeric. Expression level of gene 210832_x_at.
Numeric. Expression level of gene 224549_x_at.
Numeric. Expression level of gene 236628_at.
Numeric. Expression level of gene 214848_at.
Numeric. Expression level of gene 1553015_a_at.
Numeric. Expression level of gene 1554199_at.
Numeric. Expression level of gene 1557636_a_at.
Numeric. Expression level of gene 1558511_s_at.
Numeric. Expression level of gene 1561713_at.
Numeric. Expression level of gene 1561883_at.
Numeric. Expression level of gene 1568720_at.
Numeric. Expression level of gene 1569168_at.
Numeric. Expression level of gene 1569443_s_at.
Numeric. Expression level of gene 1570103_at.
Numeric. Expression level of gene 200916_at.
Numeric. Expression level of gene 201554_x_at.
Numeric. Expression level of gene 202371_at.
Numeric. Expression level of gene 204481_at.
Numeric. Expression level of gene 205831_at.
Numeric. Expression level of gene 207061_at.
Numeric. Expression level of gene 207423_s_at.
Numeric. Expression level of gene 209896_s_at.
Numeric. Expression level of gene 212646_at.
Numeric. Expression level of gene 214068_at.
Numeric. Expression level of gene 217727_x_at.
Numeric. Expression level of gene 221103_s_at.
Numeric. Expression level of gene 221785_at.
Numeric. Expression level of gene 224207_x_at.
Numeric. Expression level of gene 228257_at.
Numeric. Expression level of gene 228877_at.
Numeric. Expression level of gene 231173_at.
Numeric. Expression level of gene 231328_s_at.
Numeric. Expression level of gene 231639_at.
Numeric. Expression level of gene 232221_x_at.
Numeric. Expression level of gene 232349_x_at.
Numeric. Expression level of gene 232849_at.
Numeric. Expression level of gene 233601_at.
Numeric. Expression level of gene 234403_at.
Numeric. Expression level of gene 234585_at.
Numeric. Expression level of gene 234650_at.
Numeric. Expression level of gene 234897_s_at.
Numeric. Expression level of gene 236071_at.
Numeric. Expression level of gene 236689_at.
Numeric. Expression level of gene 238551_at.
Numeric. Expression level of gene 239414_at.
Numeric. Expression level of gene 241034_at.
Numeric. Expression level of gene 241131_at.
Numeric. Expression level of gene 241897_at.
Numeric. Expression level of gene 242611_at.
Numeric. Expression level of gene 244805_at.
Numeric. Expression level of gene 244866_at.
Numeric. Expression level of gene 32259_at.
Numeric. Expression level of gene 1552264_a_at.
Numeric. Expression level of gene 1552880_at.
Numeric. Expression level of gene 1553186_x_at.
Numeric. Expression level of gene 1553372_at.
Numeric. Expression level of gene 1553438_at.
Numeric. Expression level of gene 1554299_at.
Numeric. Expression level of gene 1554362_at.
Numeric. Expression level of gene 1554491_a_at.
Numeric. Expression level of gene 1555098_a_at.
Numeric. Expression level of gene 1555990_at.
Numeric. Expression level of gene 1556034_s_at.
Numeric. Expression level of gene 1556822_s_at.
Numeric. Expression level of gene 1556824_at.
Numeric. Expression level of gene 1557278_s_at.
Numeric. Expression level of gene 1558603_at.
Numeric. Expression level of gene 1558890_at.
Numeric. Expression level of gene 1560791_at.
Numeric. Expression level of gene 1561083_at.
Numeric. Expression level of gene 1561364_at.
Numeric. Expression level of gene 1561553_at.
Numeric. Expression level of gene 1562523_at.
Numeric. Expression level of gene 1562613_at.
Numeric. Expression level of gene 1563351_at.
Numeric. Expression level of gene 1563473_at.
Numeric. Expression level of gene 1566780_at.
Numeric. Expression level of gene 1567257_at.
Numeric. Expression level of gene 1569664_at.
Numeric. Expression level of gene 1569882_at.
Numeric. Expression level of gene 1570252_at.
Numeric. Expression level of gene 201089_at.
Numeric. Expression level of gene 201261_x_at.
Numeric. Expression level of gene 202052_s_at.
Numeric. Expression level of gene 202236_s_at.
Numeric. Expression level of gene 202948_at.
Numeric. Expression level of gene 203080_s_at.
Numeric. Expression level of gene 203211_s_at.
Numeric. Expression level of gene 203218_at.
Numeric. Expression level of gene 203236_s_at.
Numeric. Expression level of gene 203347_s_at.
Numeric. Expression level of gene 203960_s_at.
Numeric. Expression level of gene 204609_at.
Numeric. Expression level of gene 204806_x_at.
Numeric. Expression level of gene 204949_at.
Numeric. Expression level of gene 204979_s_at.
Numeric. Expression level of gene 205823_at.
Numeric. Expression level of gene 205902_at.
Numeric. Expression level of gene 205967_at.
Numeric. Expression level of gene 206186_at.
Numeric. Expression level of gene 207151_at.
Numeric. Expression level of gene 207379_at.
Numeric. Expression level of gene 207440_at.
Numeric. Expression level of gene 207883_s_at.
Numeric. Expression level of gene 208277_at.
Numeric. Expression level of gene 208280_at.
Numeric. Expression level of gene 209224_s_at.
Numeric. Expression level of gene 209561_at.
Numeric. Expression level of gene 209630_s_at.
Numeric. Expression level of gene 210118_s_at.
Numeric. Expression level of gene 210342_s_at.
Numeric. Expression level of gene 211566_x_at.
Numeric. Expression level of gene 211756_at.
Numeric. Expression level of gene 212170_at.
Numeric. Expression level of gene 212494_at.
Numeric. Expression level of gene 213118_at.
Numeric. Expression level of gene 214475_x_at.
Numeric. Expression level of gene 214834_at.
Numeric. Expression level of gene 215718_s_at.
Numeric. Expression level of gene 216283_s_at.
Numeric. Expression level of gene 217206_at.
Numeric. Expression level of gene 217557_s_at.
Numeric. Expression level of gene 217577_at.
Numeric. Expression level of gene 218152_at.
Numeric. Expression level of gene 218252_at.
Numeric. Expression level of gene 219714_s_at.
Numeric. Expression level of gene 220506_at.
Numeric. Expression level of gene 220889_s_at.
Numeric. Expression level of gene 221204_s_at.
Numeric. Expression level of gene 221795_at.
Numeric. Expression level of gene 222048_at.
Numeric. Expression level of gene 223142_s_at.
Numeric. Expression level of gene 223439_at.
Numeric. Expression level of gene 223673_at.
Numeric. Expression level of gene 224363_at.
Numeric. Expression level of gene 224512_s_at.
Numeric. Expression level of gene 224690_at.
Numeric. Expression level of gene 224936_at.
Numeric. Expression level of gene 225334_at.
Numeric. Expression level of gene 225713_at.
Numeric. Expression level of gene 225839_at.
Numeric. Expression level of gene 226041_at.
Numeric. Expression level of gene 226093_at.
Numeric. Expression level of gene 226543_at.
Numeric. Expression level of gene 227695_at.
Numeric. Expression level of gene 228295_at.
Numeric. Expression level of gene 228548_at.
Numeric. Expression level of gene 229234_at.
Numeric. Expression level of gene 229658_at.
Numeric. Expression level of gene 229725_at.
Numeric. Expression level of gene 230252_at.
Numeric. Expression level of gene 230471_at.
Numeric. Expression level of gene 231149_s_at.
Numeric. Expression level of gene 231556_at.
Numeric. Expression level of gene 231754_at.
Numeric. Expression level of gene 232011_s_at.
Numeric. Expression level of gene 233030_at.
Numeric. Expression level of gene 234161_at.
Numeric. Expression level of gene 235050_at.
Numeric. Expression level of gene 235094_at.
Numeric. Expression level of gene 235278_at.
Numeric. Expression level of gene 235671_at.
Numeric. Expression level of gene 235952_at.
Numeric. Expression level of gene 236158_at.
Numeric. Expression level of gene 236181_at.
Numeric. Expression level of gene 237055_at.
Numeric. Expression level of gene 237768_x_at.
Numeric. Expression level of gene 238897_at.
Numeric. Expression level of gene 239160_at.
Numeric. Expression level of gene 239998_at.
Numeric. Expression level of gene 240254_at.
Numeric. Expression level of gene 240612_at.
Numeric. Expression level of gene 240692_at.
Numeric. Expression level of gene 240822_at.
Numeric. Expression level of gene 240842_at.
Numeric. Expression level of gene 241331_at.
Numeric. Expression level of gene 241598_at.
Numeric. Expression level of gene 241927_x_at.
Numeric. Expression level of gene 242405_at.
This dataset was extracted from a larger dataset available on ArrayExpress and is used as an external validation set for feature selection tasks and other machine learning applications in bioinformatics.
The original dataset can be found on ArrayExpress: https://www.ebi.ac.uk/arrayexpress
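Before using the data with the tuning functions, it can help to confirm it matches the package's required layout (response first, factor with exactly two levels). A quick sketch of such a check, assuming the package and dataset are installed:

```r
data(sample_data_extern)
# The package requires the response in the first column as a two-level factor
stopifnot(is.factor(sample_data_extern[[1]]),
          nlevels(sample_data_extern[[1]]) == 2)
dim(sample_data_extern)  # expected: 50 rows, 201 columns
```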
Ellenbach, N., Boulesteix, A.L., Bischl, B., et al. (2021). Improved Outcome Prediction Across Data Sources Through Robust Parameter Tuning. Journal of Classification, 38, 212–231. doi:10.1007/s00357-020-09368-z.
Hornung, R., Causeur, D., Bernau, C., Boulesteix, A.L. (2017). Improving cross-study prediction through addon batch effect adjustment or addon normalization. Bioinformatics, 33(3), 397–404. doi:10.1093/bioinformatics/btw650.
# Load the dataset
data(sample_data_extern)
# View the first few rows of the dataset
head(sample_data_extern)
# Summary of the dataset
summary(sample_data_extern)
This dataset, named 'sample_data_train', is a subset of publicly available microarray data from the HG-U133PLUS2 chip. It contains expression levels of 200 genes across 50 samples, used primarily as a training set in robust feature selection studies. The data has been sourced from the ArrayExpress repository and has been referenced in several research articles.
sample_data_train
A data frame with 50 observations and 201 variables, including:
Factor. The response variable.
Numeric. Expression level of gene 236694_at.
Numeric. Expression level of gene 222356_at.
Numeric. Expression level of gene 1554125_a_at.
Numeric. Expression level of gene 232823_at.
Numeric. Expression level of gene 205766_at.
Numeric. Expression level of gene 1560446_at.
Numeric. Expression level of gene 202565_s_at.
Numeric. Expression level of gene 234887_at.
Numeric. Expression level of gene 209687_at.
Numeric. Expression level of gene 221592_at.
Numeric. Expression level of gene 1570123_at.
Numeric. Expression level of gene 241368_at.
Numeric. Expression level of gene 243324_x_at.
Numeric. Expression level of gene 224046_s_at.
Numeric. Expression level of gene 202775_s_at.
Numeric. Expression level of gene 216332_at.
Numeric. Expression level of gene 1569545_at.
Numeric. Expression level of gene 205946_at.
Numeric. Expression level of gene 203547_at.
Numeric. Expression level of gene 243239_at.
Numeric. Expression level of gene 234245_at.
Numeric. Expression level of gene 210832_x_at.
Numeric. Expression level of gene 224549_x_at.
Numeric. Expression level of gene 236628_at.
Numeric. Expression level of gene 214848_at.
Numeric. Expression level of gene 1553015_a_at.
Numeric. Expression level of gene 1554199_at.
Numeric. Expression level of gene 1557636_a_at.
Numeric. Expression level of gene 1558511_s_at.
Numeric. Expression level of gene 1561713_at.
Numeric. Expression level of gene 1561883_at.
Numeric. Expression level of gene 1568720_at.
Numeric. Expression level of gene 1569168_at.
Numeric. Expression level of gene 1569443_s_at.
Numeric. Expression level of gene 1570103_at.
Numeric. Expression level of gene 200916_at.
Numeric. Expression level of gene 201554_x_at.
Numeric. Expression level of gene 202371_at.
Numeric. Expression level of gene 204481_at.
Numeric. Expression level of gene 205831_at.
Numeric. Expression level of gene 207061_at.
Numeric. Expression level of gene 207423_s_at.
Numeric. Expression level of gene 209896_s_at.
Numeric. Expression level of gene 212646_at.
Numeric. Expression level of gene 214068_at.
Numeric. Expression level of gene 217727_x_at.
Numeric. Expression level of gene 221103_s_at.
Numeric. Expression level of gene 221785_at.
Numeric. Expression level of gene 224207_x_at.
Numeric. Expression level of gene 228257_at.
Numeric. Expression level of gene 228877_at.
Numeric. Expression level of gene 231173_at.
Numeric. Expression level of gene 231328_s_at.
Numeric. Expression level of gene 231639_at.
Numeric. Expression level of gene 232221_x_at.
Numeric. Expression level of gene 232349_x_at.
Numeric. Expression level of gene 232849_at.
Numeric. Expression level of gene 233601_at.
Numeric. Expression level of gene 234403_at.
Numeric. Expression level of gene 234585_at.
Numeric. Expression level of gene 234650_at.
Numeric. Expression level of gene 234897_s_at.
Numeric. Expression level of gene 236071_at.
Numeric. Expression level of gene 236689_at.
Numeric. Expression level of gene 238551_at.
Numeric. Expression level of gene 239414_at.
Numeric. Expression level of gene 241034_at.
Numeric. Expression level of gene 241131_at.
Numeric. Expression level of gene 241897_at.
Numeric. Expression level of gene 242611_at.
Numeric. Expression level of gene 244805_at.
Numeric. Expression level of gene 244866_at.
Numeric. Expression level of gene 32259_at.
Numeric. Expression level of gene 1552264_a_at.
Numeric. Expression level of gene 1552880_at.
Numeric. Expression level of gene 1553186_x_at.
Numeric. Expression level of gene 1553372_at.
Numeric. Expression level of gene 1553438_at.
Numeric. Expression level of gene 1554299_at.
Numeric. Expression level of gene 1554362_at.
Numeric. Expression level of gene 1554491_a_at.
Numeric. Expression level of gene 1555098_a_at.
Numeric. Expression level of gene 1555990_at.
Numeric. Expression level of gene 1556034_s_at.
Numeric. Expression level of gene 1556822_s_at.
Numeric. Expression level of gene 1556824_at.
Numeric. Expression level of gene 1557278_s_at.
Numeric. Expression level of gene 1558603_at.
Numeric. Expression level of gene 1558890_at.
Numeric. Expression level of gene 1560791_at.
Numeric. Expression level of gene 1561083_at.
Numeric. Expression level of gene 1561364_at.
Numeric. Expression level of gene 1561553_at.
Numeric. Expression level of gene 1562523_at.
Numeric. Expression level of gene 1562613_at.
Numeric. Expression level of gene 1563351_at.
Numeric. Expression level of gene 1563473_at.
Numeric. Expression level of gene 1566780_at.
Numeric. Expression level of gene 1567257_at.
Numeric. Expression level of gene 1569664_at.
Numeric. Expression level of gene 1569882_at.
Numeric. Expression level of gene 1570252_at.
Numeric. Expression level of gene 201089_at.
Numeric. Expression level of gene 201261_x_at.
Numeric. Expression level of gene 202052_s_at.
Numeric. Expression level of gene 202236_s_at.
Numeric. Expression level of gene 202948_at.
Numeric. Expression level of gene 203080_s_at.
Numeric. Expression level of gene 203211_s_at.
Numeric. Expression level of gene 203218_at.
Numeric. Expression level of gene 203236_s_at.
Numeric. Expression level of gene 203347_s_at.
Numeric. Expression level of gene 203960_s_at.
Numeric. Expression level of gene 204609_at.
Numeric. Expression level of gene 204806_x_at.
Numeric. Expression level of gene 204949_at.
Numeric. Expression level of gene 204979_s_at.
Numeric. Expression level of gene 205823_at.
Numeric. Expression level of gene 205902_at.
Numeric. Expression level of gene 205967_at.
Numeric. Expression level of gene 206186_at.
Numeric. Expression level of gene 207151_at.
Numeric. Expression level of gene 207379_at.
Numeric. Expression level of gene 207440_at.
Numeric. Expression level of gene 207883_s_at.
Numeric. Expression level of gene 208277_at.
Numeric. Expression level of gene 208280_at.
Numeric. Expression level of gene 209224_s_at.
Numeric. Expression level of gene 209561_at.
Numeric. Expression level of gene 209630_s_at.
Numeric. Expression level of gene 210118_s_at.
Numeric. Expression level of gene 210342_s_at.
Numeric. Expression level of gene 211566_x_at.
Numeric. Expression level of gene 211756_at.
Numeric. Expression level of gene 212170_at.
Numeric. Expression level of gene 212494_at.
Numeric. Expression level of gene 213118_at.
Numeric. Expression level of gene 214475_x_at.
Numeric. Expression level of gene 214834_at.
Numeric. Expression level of gene 215718_s_at.
Numeric. Expression level of gene 216283_s_at.
Numeric. Expression level of gene 217206_at.
Numeric. Expression level of gene 217557_s_at.
Numeric. Expression level of gene 217577_at.
Numeric. Expression level of gene 218152_at.
Numeric. Expression level of gene 218252_at.
Numeric. Expression level of gene 219714_s_at.
Numeric. Expression level of gene 220506_at.
Numeric. Expression level of gene 220889_s_at.
Numeric. Expression level of gene 221204_s_at.
Numeric. Expression level of gene 221795_at.
Numeric. Expression level of gene 222048_at.
Numeric. Expression level of gene 223142_s_at.
Numeric. Expression level of gene 223439_at.
Numeric. Expression level of gene 223673_at.
Numeric. Expression level of gene 224363_at.
Numeric. Expression level of gene 224512_s_at.
Numeric. Expression level of gene 224690_at.
Numeric. Expression level of gene 224936_at.
Numeric. Expression level of gene 225334_at.
Numeric. Expression level of gene 225713_at.
Numeric. Expression level of gene 225839_at.
Numeric. Expression level of gene 226041_at.
Numeric. Expression level of gene 226093_at.
Numeric. Expression level of gene 226543_at.
Numeric. Expression level of gene 227695_at.
Numeric. Expression level of gene 228295_at.
Numeric. Expression level of gene 228548_at.
Numeric. Expression level of gene 229234_at.
Numeric. Expression level of gene 229658_at.
Numeric. Expression level of gene 229725_at.
Numeric. Expression level of gene 230252_at.
Numeric. Expression level of gene 230471_at.
Numeric. Expression level of gene 231149_s_at.
Numeric. Expression level of gene 231556_at.
Numeric. Expression level of gene 231754_at.
Numeric. Expression level of gene 232011_s_at.
Numeric. Expression level of gene 233030_at.
Numeric. Expression level of gene 234161_at.
Numeric. Expression level of gene 235050_at.
Numeric. Expression level of gene 235094_at.
Numeric. Expression level of gene 235278_at.
Numeric. Expression level of gene 235671_at.
Numeric. Expression level of gene 235952_at.
Numeric. Expression level of gene 236158_at.
Numeric. Expression level of gene 236181_at.
Numeric. Expression level of gene 237055_at.
Numeric. Expression level of gene 237768_x_at.
Numeric. Expression level of gene 238897_at.
Numeric. Expression level of gene 239160_at.
Numeric. Expression level of gene 239998_at.
Numeric. Expression level of gene 240254_at.
Numeric. Expression level of gene 240612_at.
Numeric. Expression level of gene 240692_at.
Numeric. Expression level of gene 240822_at.
Numeric. Expression level of gene 240842_at.
Numeric. Expression level of gene 241331_at.
Numeric. Expression level of gene 241598_at.
Numeric. Expression level of gene 241927_x_at.
Numeric. Expression level of gene 242405_at.
This dataset was extracted from a larger dataset available on ArrayExpress. It is used as a training set for feature selection tasks and other machine learning applications in bioinformatics.
The original dataset can be found on ArrayExpress: https://www.ebi.ac.uk/arrayexpress
Ellenbach, N., Boulesteix, A.L., Bischl, B., et al. (2021). Improved Outcome Prediction Across Data Sources Through Robust Parameter Tuning. Journal of Classification, 38, 212–231. doi:10.1007/s00357-020-09368-z.
Hornung, R., Causeur, D., Bernau, C., Boulesteix, A.L. (2017). Improving cross-study prediction through addon batch effect adjustment or addon normalization. Bioinformatics, 33(3), 397–404. doi:10.1093/bioinformatics/btw650.
# Load the dataset:
data(sample_data_train)
# Dimension of the dataset:
dim(sample_data_train)
# View the first rows of the dataset:
head(sample_data_train)
This function tunes and trains a classifier using a specified tuning method. Depending on the method chosen, the function will either perform RobustTuneC, external tuning, or internal tuning.
tuneandtrain(data, dataext = NULL, tuningmethod, classifier, ...)
data |
A data frame containing the training data. The first column should be the response variable, which must be a factor for classification tasks. The remaining columns should be the predictor variables. Ensure that the data is properly formatted, with no missing values. |
dataext |
A data frame containing the external validation data, required only for the tuning methods "robusttunec" and "ext". Similar to the 'data' argument, the first column should be the response variable (factor), and the remaining columns should be the predictors. If 'tuningmethod = "int"', this parameter is ignored. |
tuningmethod |
A character string specifying which tuning approach to use. Options are: "robusttunec" (the RobustTuneC method), "ext" (external tuning), and "int" (internal cross-validation tuning). |
classifier |
A character string specifying which classifier to use. Must correspond to one of the supported classifiers: Lasso, Ridge, Random Forest, Boosting, or SVM (the examples in this documentation use classifier = "lasso" and classifier = "ridge"). |
... |
Additional parameters to be passed to the specific tuning and training functions. These can include options such as the number of trees for Random Forest, the number of folds for cross-validation, or hyperparameters specific to the chosen classifier. |
A list containing the results of the tuning and training process, which typically includes:
- The best hyperparameters selected during the tuning process.
- The final trained model.
- Performance metrics (AUC) on the training or validation data, depending on the tuning method.
# Load sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage: Robust tuning with Ridge classifier
result_ridge <- tuneandtrain(sample_data_train, sample_data_extern,
  tuningmethod = "robusttunec", classifier = "ridge")
result_ridge$best_lambda
result_ridge$best_model
result_ridge$final_auc

# Example usage: Internal cross-validation with Lasso classifier
result_lasso <- tuneandtrain(sample_data_train, tuningmethod = "int",
  classifier = "lasso", maxit = 120000, nlambda = 200, nfolds = 5)
result_lasso$best_lambda
result_lasso$best_model
result_lasso$final_auc
result_lasso$active_set_Train
This function tunes and trains a classifier using an external validation dataset. Based on the specified classifier, the function selects and runs the appropriate tuning and training process. The external validation data is used to optimize the model's hyperparameters and improve generalization performance across datasets.
tuneandtrainExt(data, dataext, classifier, ...)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. Ensure that the data is properly formatted, with no missing values. |
dataext |
A data frame containing the external validation data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. The external data is used for tuning hyperparameters to avoid overfitting on the training data. |
classifier |
A character string specifying the classifier to use. Must correspond to one of the supported classifiers: Lasso, Ridge, Random Forest, Boosting, or SVM (the examples in this documentation use classifier = "lasso" and classifier = "ridge"). |
... |
Additional arguments to pass to the specific classifier function. These may include hyperparameters such as the number of trees for Random Forest, regularization parameters for Lasso/Ridge, or kernel settings for SVM. |
A list containing the results from the classifier's tuning and training process. The returned object typically includes:
- best_model: The final trained model using the best hyperparameters.
- best_hyperparams: The optimal hyperparameters found during the tuning process.
- final_auc: Performance metrics (AUC) of the final model.
# Load sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage with Lasso
result_lasso <- tuneandtrainExt(sample_data_train, sample_data_extern,
  classifier = "lasso", maxit = 120000, nlambda = 100)
result_lasso$best_lambda
result_lasso$best_model
result_lasso$final_auc
result_lasso$active_set_Train

# Example usage with Ridge
result_ridge <- tuneandtrainExt(sample_data_train, sample_data_extern,
  classifier = "ridge", maxit = 120000, nlambda = 100)
result_ridge$best_lambda
result_ridge$best_model
result_ridge$final_auc
This function tunes and trains a Boosting classifier using the mboost::glmboost function. It provides two strategies for tuning the number of boosting iterations (mstop), controlled by the estperf argument:
- When estperf = FALSE (default): hyperparameters are tuned using the external validation dataset, and the mstop value that gives the highest AUC on the external dataset is selected as the best model. No AUC value is returned in this case, since an AUC computed on the same external data used for tuning would be over-optimistic.
- When estperf = TRUE: hyperparameters are tuned internally using the training dataset. The model is then validated on the external dataset to provide a conservative (slightly pessimistic) AUC estimate.
tuneandtrainExtBoost(
  data,
  dataext,
  estperf = FALSE,
  mstop_seq = seq(5, 1000, by = 5),
  nu = 0.1
)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
dataext |
A data frame containing the external validation data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
estperf |
A logical value indicating whether to use internal tuning with external validation (TRUE) or external tuning (FALSE). Default is FALSE. |
mstop_seq |
A numeric vector specifying the sequence of boosting iterations to evaluate. Default is seq(5, 1000, by = 5). |
nu |
A numeric value specifying the learning rate for boosting. Default is 0.1. |
A list containing the following components:
best_mstop: The optimal number of boosting iterations determined during the tuning process.
best_model: The trained Boosting model using the selected mstop.
est_auc: The AUC value evaluated on the external dataset. This is only returned when estperf = TRUE, providing a conservative (slightly pessimistic) estimate of the model's performance.
# Load sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage with external tuning (default)
mstop_seq <- seq(50, 500, by = 50)
result <- tuneandtrainExtBoost(sample_data_train, sample_data_extern,
  mstop_seq = mstop_seq, nu = 0.1)
print(result$best_mstop)  # Optimal mstop
print(result$best_model)  # Trained Boosting model
# Note: est_auc is not returned when estperf = FALSE

# Example usage with internal tuning and external validation
result_internal <- tuneandtrainExtBoost(sample_data_train, sample_data_extern,
  estperf = TRUE, mstop_seq = mstop_seq, nu = 0.1)
print(result_internal$best_mstop)  # Optimal mstop
print(result_internal$best_model)  # Trained Boosting model
print(result_internal$est_auc)     # AUC on external validation dataset
This function tunes and trains a Lasso classifier using the glmnet package. It provides two strategies for tuning hyperparameters, controlled by the estperf argument:

When estperf = FALSE (default): hyperparameters are tuned using the external validation dataset. The lambda value that gives the highest AUC on the external dataset is selected as the best model. However, no AUC value is returned in this case, as per best practices.

When estperf = TRUE: hyperparameters are tuned internally on the training dataset. The model is then validated on the external dataset to provide a conservative (slightly pessimistic) AUC estimate.
tuneandtrainExtLasso(
  data,
  dataext,
  estperf = FALSE,
  maxit = 120000,
  nlambda = 100
)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
dataext |
A data frame containing the external validation data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
estperf |
A logical value indicating whether to use internal tuning with external validation (TRUE) or external tuning (FALSE). Default is FALSE. |
maxit |
An integer specifying the maximum number of iterations. Default is 120000. |
nlambda |
An integer specifying the number of lambda values to use in the Lasso model. Default is 100. |
A list containing the following components:
best_lambda: The optimal lambda value determined during the tuning process.
best_model: The trained Lasso model using the selected lambda value.
est_auc: The AUC value evaluated on the external dataset. This is only returned when estperf = TRUE, providing a conservative (slightly pessimistic) estimate of the model's performance.
active_set_Train: The number of active (non-zero) coefficients in the model trained on the training dataset.
# Load sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage with external tuning (default)
result <- tuneandtrainExtLasso(sample_data_train, sample_data_extern,
  maxit = 120000, nlambda = 100)
print(result$best_lambda)
print(result$best_model)
print(result$active_set_Train)

# Example usage with internal tuning and external validation
result_internal <- tuneandtrainExtLasso(sample_data_train, sample_data_extern,
  estperf = TRUE, maxit = 120000, nlambda = 100)
print(result_internal$best_lambda)
print(result_internal$best_model)
print(result_internal$est_auc)
print(result_internal$active_set_Train)
This function tunes and trains a Random Forest classifier using the ranger package. It provides two strategies for tuning the min.node.size parameter, controlled by the estperf argument:

When estperf = FALSE (default): hyperparameters are tuned using the external validation dataset. The min.node.size value that gives the highest AUC on the external dataset is selected as the best model. However, no AUC value is returned in this case, as per best practices.

When estperf = TRUE: hyperparameters are tuned internally on the training dataset. The model is then validated on the external dataset to provide a conservative (slightly pessimistic) AUC estimate.
tuneandtrainExtRF(data, dataext, estperf = FALSE, num.trees = 500)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
dataext |
A data frame containing the external validation data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
estperf |
A logical value indicating whether to use internal tuning with external validation (TRUE) or external tuning (FALSE). Default is FALSE. |
num.trees |
An integer specifying the number of trees in the Random Forest. Default is 500. |
A list containing the following components:
best_min_node_size: The optimal min.node.size value determined during the tuning process.
best_model: The trained Random Forest model using the selected min.node.size.
est_auc: The AUC value evaluated on the external dataset. This is only returned when estperf = TRUE, providing a conservative (slightly pessimistic) estimate of the model's performance.
# Load sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage with external tuning (default)
result <- tuneandtrainExtRF(sample_data_train, sample_data_extern, num.trees = 500)
print(result$best_min_node_size)  # Optimal min.node.size
print(result$best_model)          # Trained Random Forest model
# Note: est_auc is not returned when estperf = FALSE

# Example usage with internal tuning and external validation
result_internal <- tuneandtrainExtRF(sample_data_train, sample_data_extern,
  estperf = TRUE, num.trees = 500)
print(result_internal$best_min_node_size)  # Optimal min.node.size
print(result_internal$best_model)          # Trained Random Forest model
print(result_internal$est_auc)             # AUC on external validation dataset
This function tunes and trains a Ridge classifier using the glmnet package. It provides two strategies for tuning the regularization parameter lambda, controlled by the estperf argument:

When estperf = FALSE (default): hyperparameters are tuned using the external validation dataset. The lambda value that gives the highest AUC on the external dataset is selected as the best model. However, no AUC value is returned in this case, as per best practices.

When estperf = TRUE: hyperparameters are tuned internally on the training dataset. The model is then validated on the external dataset to provide a conservative (slightly pessimistic) AUC estimate.
tuneandtrainExtRidge(
  data,
  dataext,
  estperf = FALSE,
  maxit = 120000,
  nlambda = 100
)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
dataext |
A data frame containing the external validation data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
estperf |
A logical value indicating whether to use internal tuning with external validation (TRUE) or external tuning (FALSE). Default is FALSE. |
maxit |
An integer specifying the maximum number of iterations. Default is 120000. |
nlambda |
An integer specifying the number of lambda values to use in the Ridge model. Default is 100. |
A list containing the following components:
best_lambda: The optimal lambda value determined during the tuning process.
best_model: The trained Ridge model using the selected lambda.
est_auc: The AUC value evaluated on the external dataset. This is only returned when estperf = TRUE, providing a conservative (slightly pessimistic) estimate of the model's performance.
# Load sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage with external tuning (default)
result <- tuneandtrainExtRidge(sample_data_train, sample_data_extern,
  maxit = 120000, nlambda = 100)
print(result$best_lambda)  # Optimal lambda
print(result$best_model)   # Final trained model
# Note: est_auc is not returned when estperf = FALSE

# Example usage with internal tuning and external validation
result_internal <- tuneandtrainExtRidge(sample_data_train, sample_data_extern,
  estperf = TRUE, maxit = 120000, nlambda = 100)
print(result_internal$best_lambda)  # Optimal lambda
print(result_internal$best_model)   # Final trained model
print(result_internal$est_auc)      # AUC on external validation dataset
This function tunes and trains a Support Vector Machine (SVM) classifier using the mlr package. It provides two strategies for tuning the cost parameter, controlled by the estperf argument:

When estperf = FALSE (default): hyperparameters are tuned using the external validation dataset. The cost value that gives the highest AUC on the external dataset is selected as the best model. However, no AUC value is returned in this case, as per best practices.

When estperf = TRUE: hyperparameters are tuned internally on the training dataset. The model is then validated on the external dataset to provide a conservative (slightly pessimistic) AUC estimate.
tuneandtrainExtSVM(
  data,
  dataext,
  estperf = FALSE,
  kernel = "linear",
  cost_seq = 2^(-15:15),
  scale = FALSE
)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
dataext |
A data frame containing the external validation data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
estperf |
A logical value indicating whether to use internal tuning with external validation (TRUE) or external tuning (FALSE). Default is FALSE. |
kernel |
A character string specifying the kernel type to be used in the SVM. Default is "linear". |
cost_seq |
A numeric vector specifying the sequence of cost values to evaluate. Default is 2^(-15:15). |
scale |
A logical value indicating whether to scale the predictor variables. Default is FALSE. |
A list containing the following components:
best_cost: The optimal cost value determined during the tuning process.
best_model: The trained SVM model using the selected cost.
est_auc: The AUC value evaluated on the external dataset. This is only returned when estperf = TRUE, providing a conservative (slightly pessimistic) estimate of the model's performance.
# Load sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage with external tuning (default)
result <- tuneandtrainExtSVM(sample_data_train, sample_data_extern,
  kernel = "linear", cost_seq = 2^(-15:15), scale = FALSE)
print(result$best_cost)   # Optimal cost
print(result$best_model)  # Final trained model
# Note: est_auc is not returned when estperf = FALSE

# Example usage with internal tuning and external validation
result_internal <- tuneandtrainExtSVM(sample_data_train, sample_data_extern,
  estperf = TRUE, kernel = "linear", cost_seq = 2^(-15:15), scale = FALSE)
print(result_internal$best_cost)   # Optimal cost
print(result_internal$best_model)  # Final trained model
print(result_internal$est_auc)     # AUC on external validation dataset
This function tunes and trains a specified classifier using internal cross-validation. The classifier is specified by the 'classifier' argument, and the function delegates to the appropriate tuning and training function based on this choice.
tuneandtrainInt(data, classifier, ...)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
classifier |
A character string specifying the classifier to use. Must be one of 'boosting', 'rf', 'lasso', 'ridge', 'svm'. |
... |
Additional arguments to pass to the specific classifier function. |
A list containing the results from the specific classifier's tuning and training process. The list typically includes:
best_hyperparams: The best hyperparameters selected by cross-validation.
best_model: The final trained model using the selected hyperparameters.
final_auc: Cross-validation results (AUC).
# Load sample data
data(sample_data_train)

# Example usage with Lasso
result_lasso <- tuneandtrainInt(sample_data_train, classifier = "lasso",
  maxit = 120000, nlambda = 100)
result_lasso$best_lambda
result_lasso$best_model
result_lasso$final_auc
result_lasso$active_set_Train

# Example usage with Ridge
result_ridge <- tuneandtrainInt(sample_data_train, classifier = "ridge",
  maxit = 120000, nlambda = 100)
result_ridge$best_lambda
result_ridge$best_model
result_ridge$final_auc
This function tunes and trains a Boosting classifier using the mboost package. The function evaluates a sequence of boosting iterations on the training dataset using internal cross-validation and selects the best model based on the Area Under the Curve (AUC).
tuneandtrainIntBoost(data, mstop_seq = seq(5, 1000, by = 5), nu = 0.1)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
mstop_seq |
A numeric vector of boosting iterations to be evaluated. Default is a sequence from 5 to 1000 with a step of 5. |
nu |
A numeric value for the learning rate. Default is 0.1. |
This function performs K-fold cross-validation on the training dataset, where the number of boosting iterations (mstop) is tuned to maximize the AUC. The optimal number of boosting iterations is selected, and the final model is trained on the entire training dataset.
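The same selection idea can be sketched with mboost's own tools. Note this is an illustrative approximation, not the package's internal code: mboost's cvrisk() picks mstop by minimizing cross-validated risk rather than maximizing AUC as this function does. It assumes the sample_data_train layout described above (binary factor response in the first column).

```r
# Illustrative sketch only: choosing mstop for glmboost via 5-fold CV.
library(mboost)

data(sample_data_train)

# Fit a boosted logistic model over the full mstop path
fit <- glmboost(as.formula(paste(names(sample_data_train)[1], "~ .")),
                data = sample_data_train, family = Binomial(),
                control = boost_control(mstop = 1000, nu = 0.1))

# 5-fold CV over the boosting path; cvrisk() minimizes out-of-sample risk
cv <- cvrisk(fit, folds = cv(model.weights(fit), type = "kfold", B = 5))
best_mstop <- mstop(cv)

# Truncate the model at the selected number of iterations
fit_best <- fit[best_mstop]
```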
A list containing the best number of boosting iterations ('best_mstop') and the final Boosting classifier model ('best_model').
# Load sample data
data(sample_data_train)

# Example usage
mstop_seq <- seq(5, 5000, by = 5)
result <- tuneandtrainIntBoost(sample_data_train, mstop_seq, nu = 0.1)
result$best_mstop
result$best_model
This function tunes and trains a Lasso classifier using the glmnet package. The function performs internal cross-validation to evaluate a sequence of lambda (regularization) values and selects the best model based on the Area Under the Curve (AUC).
tuneandtrainIntLasso(data, maxit = 120000, nlambda = 200, nfolds = 5)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
maxit |
An integer specifying the maximum number of iterations. Default is 120000. |
nlambda |
An integer specifying the number of lambda values to use in the Lasso model. Default is 200. |
nfolds |
An integer specifying the number of folds for cross-validation. Default is 5. |
This function trains a logistic Lasso model on the training dataset using cross-validation. The lambda value that results in the highest AUC during cross-validation is chosen as the best model, and the final model is trained on the full training dataset with this optimal lambda value.
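The mechanism described above can be sketched with plain glmnet. This is an illustrative approximation, not the package's internal code, and it assumes the sample_data_train layout described under data (binary factor response in the first column):

```r
# Illustrative sketch only: selecting a Lasso lambda by cross-validated AUC.
library(glmnet)

data(sample_data_train)
x <- as.matrix(sample_data_train[, -1])
y <- sample_data_train[[1]]

# cv.glmnet with type.measure = "auc" scores each lambda by CV AUC;
# lambda.min is the value with the best CV score
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1,
                    type.measure = "auc", nfolds = 5,
                    nlambda = 200, maxit = 120000)
best_lambda <- cv_fit$lambda.min

# Refit on the full training data at the chosen lambda
final_fit <- glmnet(x, y, family = "binomial", alpha = 1,
                    lambda = best_lambda, maxit = 120000)
active_set <- sum(coef(final_fit) != 0) - 1  # non-zero coefficients, excluding intercept
```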
A list containing the best lambda value ('best_lambda'), the final trained model ('best_model'), and the number of active coefficients ('active_set_Train').
# Load sample data
data(sample_data_train)

# Example usage
result <- tuneandtrainIntLasso(sample_data_train, maxit = 120000, nlambda = 200, nfolds = 5)
result$best_lambda
result$best_model
result$active_set_Train
This function tunes and trains a Random Forest classifier using the ranger package with internal cross-validation. The function evaluates a sequence of min.node.size values on the training dataset and selects the best model based on the Area Under the Curve (AUC).
tuneandtrainIntRF(data, num.trees = 500, nfolds = 5, seed = 123)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
num.trees |
An integer specifying the number of trees in the Random Forest. Default is 500. |
nfolds |
An integer specifying the number of folds for cross-validation. Default is 5. |
seed |
An integer specifying the random seed for reproducibility. Default is 123. |
Random Forest constructs multiple decision trees and aggregates their predictions. The min.node.size parameter controls the minimum number of samples in each terminal node, affecting model complexity. This function performs cross-validation within the training dataset to evaluate the impact of different min.node.size values. The min.node.size value that results in the highest AUC is selected as the best model.
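A grid search over min.node.size can be sketched as follows. This is an illustrative approximation, not the package's internal code: it scores each candidate by out-of-bag AUC rather than the K-fold cross-validation this function uses, and the node_sizes grid is an assumed example:

```r
# Illustrative sketch only: tuning min.node.size with ranger, scored by OOB AUC.
library(ranger)
library(pROC)

data(sample_data_train)
node_sizes <- c(1, 5, 10, 20, 50)  # assumed candidate grid

aucs <- sapply(node_sizes, function(ns) {
  fit <- ranger(dependent.variable.name = names(sample_data_train)[1],
                data = sample_data_train, num.trees = 500,
                min.node.size = ns, probability = TRUE, seed = 123)
  # fit$predictions holds out-of-bag class probabilities
  oob_prob <- fit$predictions[, 2]
  as.numeric(auc(sample_data_train[[1]], oob_prob, quiet = TRUE))
})

best_min_node_size <- node_sizes[which.max(aucs)]
```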
A list containing the best 'min.node.size' value ('best_min_node_size') and the final trained model ('best_model').
# Load sample data
data(sample_data_train)

# Example usage
result <- tuneandtrainIntRF(sample_data_train, num.trees = 500, nfolds = 5, seed = 123)
result$best_min_node_size
result$best_model
This function tunes and trains a Ridge classifier using the glmnet package. The function evaluates a sequence of lambda (regularization) values using internal cross-validation and selects the best model based on the Area Under the Curve (AUC).
tuneandtrainIntRidge(
  data,
  maxit = 120000,
  nlambda = 200,
  nfolds = 5,
  seed = 123
)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
maxit |
An integer specifying the maximum number of iterations. Default is 120000. |
nlambda |
An integer specifying the number of lambda values to use in the Ridge model. Default is 200. |
nfolds |
An integer specifying the number of folds for cross-validation. Default is 5. |
seed |
An integer specifying the random seed for reproducibility. Default is 123. |
The function trains a logistic Ridge regression model on the training dataset and performs cross-validation to select the best lambda value. The lambda value that gives the highest AUC on the training dataset during cross-validation is chosen as the best model.
A list containing the best lambda value ('best_lambda') and the final trained model ('best_model').
# Load sample data
data(sample_data_train)

# Example usage
result <- tuneandtrainIntRidge(sample_data_train, maxit = 120000, nlambda = 200,
  nfolds = 5, seed = 123)
result$best_lambda
result$best_model
This function tunes and trains a Support Vector Machine (SVM) classifier using the mlr package. The function evaluates a sequence of cost values using internal cross-validation and selects the best model based on the Area Under the Curve (AUC).
tuneandtrainIntSVM(
  data,
  kernel = "linear",
  cost_seq = 2^(-15:15),
  scale = FALSE,
  nfolds = 5,
  seed = 123
)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
kernel |
A character string specifying the kernel type to be used in the SVM. Default is "linear". |
cost_seq |
A numeric vector of cost values to be evaluated. Default is '2^(-15:15)'. |
scale |
A logical indicating whether to scale the predictor variables. Default is FALSE. |
nfolds |
An integer specifying the number of folds for cross-validation. Default is 5. |
seed |
An integer specifying the random seed for reproducibility. Default is 123. |
In Support Vector Machines, the cost parameter controls the trade-off between achieving a low training error and a low testing error. This function trains an SVM model on the training dataset, performs cross-validation, and selects the cost value that results in the highest AUC. The final model is then trained using the optimal cost value, and the performance is reported based on the AUC.
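The cost grid search can be sketched with e1071 instead of mlr. This is an illustrative approximation, not the package's internal code: e1071's tune.svm() scores by cross-validated error rather than AUC, and the tuning backend differs from mlr:

```r
# Illustrative sketch only: 5-fold CV over a 2^(-15:15) cost grid with e1071.
library(e1071)

data(sample_data_train)

tuned <- tune.svm(x = sample_data_train[, -1], y = sample_data_train[[1]],
                  kernel = "linear", cost = 2^(-15:15), scale = FALSE,
                  tunecontrol = tune.control(cross = 5))
best_cost <- tuned$best.parameters$cost

# Refit on the full training data at the selected cost
final_svm <- svm(x = sample_data_train[, -1], y = sample_data_train[[1]],
                 kernel = "linear", cost = best_cost, scale = FALSE,
                 probability = TRUE)
```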
A list containing the best cost value ('best_cost') and the final trained model ('best_model').
# Load sample data
data(sample_data_train)

# Example usage
result <- tuneandtrainIntSVM(
  sample_data_train,
  kernel = "linear",
  cost_seq = 2^(-15:15),
  scale = FALSE,
  nfolds = 5,
  seed = 123
)
result$best_cost
result$best_model
This function tunes and trains a specified classifier using the "RobustTuneC" method and the provided data.
tuneandtrainRobustTuneC(data, dataext, classifier, ...)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
dataext |
A data frame containing the external validation data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
classifier |
A character string specifying the classifier to use. Must be one of 'boosting', 'rf', 'lasso', 'ridge', 'svm'. |
... |
Additional arguments to pass to the specific classifier function. |
A list containing the results from the specific classifier's tuning and training process. The returned object typically includes:
best_hyperparams: The best hyperparameters selected through the RobustTuneC method.
best_model: The final trained model based on the best hyperparameters.
final_auc: Performance metrics (AUC) of the final model.
# Load sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage with Lasso
result_lasso <- tuneandtrainRobustTuneC(sample_data_train, sample_data_extern,
  classifier = "lasso", maxit = 120000, nlambda = 100)
result_lasso$best_lambda
result_lasso$best_model
result_lasso$final_auc
result_lasso$active_set_Train

# Example usage with Ridge
result_ridge <- tuneandtrainRobustTuneC(sample_data_train, sample_data_extern,
  classifier = "ridge", maxit = 120000, nlambda = 100)
result_ridge$best_lambda
result_ridge$best_model
result_ridge$final_auc
This function tunes and trains a Boosting classifier using the mboost::glmboost function and the "RobustTuneC" method. The function performs K-fold cross-validation on the training dataset and evaluates a sequence of boosting iterations (mstop) based on the Area Under the Curve (AUC).
tuneandtrainRobustTuneCBoost(
  data,
  dataext,
  K = 5,
  mstop_seq = seq(5, 1000, by = 5),
  nu = 0.1
)
data |
Training data as a data frame. The first column should be the response variable. |
dataext |
External validation data as a data frame. The first column should be the response variable. |
K |
Number of folds to use in cross-validation. Default is 5. |
mstop_seq |
A sequence of boosting iterations to consider. Default is a sequence starting at 5 and increasing by 5 each time, up to 1000. |
nu |
Learning rate for the boosting algorithm. Default is 0.1. |
After cross-validation, the best mstop value is selected based on the AUC, and the final Boosting model is trained using this optimal mstop. The external validation dataset is then used to calculate the final AUC and assess the model performance.
A list containing the best number of boosting iterations ('best_mstop'), the final trained model ('best_model'), and the chosen c value ('best_c').
# Load the sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage with the sample data
mstop_seq <- seq(50, 500, by = 50)
result <- tuneandtrainRobustTuneCBoost(sample_data_train, sample_data_extern,
  mstop_seq = mstop_seq)
result$best_mstop
result$best_model
result$best_c
This function tunes and trains a Lasso classifier using the glmnet package and the "RobustTuneC" method. The function uses K-fold cross-validation to evaluate a sequence of lambda (regularization) values and selects the best model based on the Area Under the Curve (AUC).
tuneandtrainRobustTuneCLasso(
  data,
  dataext,
  K = 5,
  maxit = 120000,
  nlambda = 100
)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
dataext |
A data frame containing the external validation data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
K |
Number of folds to use in cross-validation. Default is 5. |
maxit |
Maximum number of iterations. Default is 120000. |
nlambda |
The number of lambda values to use for cross-validation. Default is 100. |
This function trains a logistic Lasso model using the training dataset and validates it through cross-validation. After selecting the best lambda value based on the training data, the model is then applied to an external validation dataset to compute the final AUC. The lambda value that results in the highest AUC on the external validation dataset is chosen as the best model.
A list containing the best lambda value ('best_lambda'), the final trained model ('best_model'), the number of active coefficients ('active_set_Train'), and the chosen c value ('best_c').
# Load sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage
result <- tuneandtrainRobustTuneCLasso(sample_data_train, sample_data_extern,
  K = 5, maxit = 120000, nlambda = 100)
result$best_lambda
result$best_model
result$best_c
This function tunes and trains a Random Forest classifier using the ranger package and the "RobustTuneC" method. The function uses K-fold cross-validation to evaluate different min.node.size values on the training dataset and selects the best model based on the Area Under the Curve (AUC).
tuneandtrainRobustTuneCRF(data, dataext, K = 5, num.trees = 500)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
dataext |
A data frame containing the external validation data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
K |
Number of folds to use in cross-validation. Default is 5. |
num.trees |
An integer specifying the number of trees to grow in the Random Forest. Default is 500. |
Random Forest constructs multiple decision trees and aggregates their predictions. The min.node.size parameter controls the minimum number of samples in each terminal node, affecting model complexity. This function evaluates the candidate min.node.size values through cross-validation and then applies the best model to an external validation dataset. The min.node.size value that results in the highest AUC on the validation dataset is selected.
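For intuition, the K-fold split behind the cross-validation step can be sketched in base R as below; the exact fold-assignment scheme used internally is an assumption here (it may, for example, stratify by class):

```r
# Assign each of n observations to one of K folds of equal size
# by shuffling a repeated 1..K label vector.
set.seed(1)
K <- 5
n <- 100
folds <- sample(rep(seq_len(K), length.out = n))
table(folds)  # 20 observations per fold
```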
A list containing the best minimum node size ('best_min_node_size'), the final trained model ('best_model'), and the chosen c value ('best_c').
# Load sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage
result <- tuneandtrainRobustTuneCRF(sample_data_train, sample_data_extern,
  K = 5, num.trees = 500)
result$best_min_node_size
result$best_model
result$best_c
This function tunes and trains a Ridge classifier using the glmnet package with the "RobustTuneC" method. The function evaluates a sequence of lambda (regularization) values using K-fold cross-validation (K specified by the user) on the training dataset and selects the best model based on the Area Under the Curve (AUC).
tuneandtrainRobustTuneCRidge(data, dataext, K = 5, maxit = 120000, nlambda = 100)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
dataext |
A data frame containing the external validation data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
K |
Number of folds to use in cross-validation. Default is 5. |
maxit |
Maximum number of iterations. Default is 120000. |
nlambda |
The number of lambda values to use for cross-validation. Default is 100. |
The function first performs K-fold cross-validation on the training dataset to select the best lambda value based on AUC. Then, the model is further validated on an external dataset, and the lambda value that provides the best performance on the external dataset is chosen as the final model. The Ridge regression is fitted using the selected lambda value, and the final model's performance is evaluated using AUC on the external validation dataset.
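For intuition about what the lambda penalty does, the closed-form solution of linear ridge regression is sketched below. This is an illustration only: the package fits a penalized logistic model via glmnet, which is solved iteratively, and intercept handling is omitted here.

```r
# Linear ridge coefficients: (X'X + lambda * I)^{-1} X'y.
# Larger lambda shrinks the coefficients toward zero.
ridge_coef <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

X <- diag(2)          # toy 2x2 design matrix
y <- c(1, 2)
ridge_coef(X, y, 0)   # lambda = 0 reproduces least squares: (1, 2)
ridge_coef(X, y, 1)   # lambda = 1 shrinks the estimates: (0.5, 1)
```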
A list containing the best lambda value ('best_lambda'), the final trained model ('best_model'), and the chosen c value ('best_c').
# Load sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage
result <- tuneandtrainRobustTuneCRidge(sample_data_train, sample_data_extern,
  K = 5, maxit = 120000, nlambda = 100)
result$best_lambda
result$best_model
result$best_c
This function tunes and trains a Support Vector Machine (SVM) classifier using the "RobustTuneC" method. It performs K-fold cross-validation (with K specified by the user) to select the best model based on the Area Under the Curve (AUC) metric.
tuneandtrainRobustTuneCSVM(data, dataext, K = 5, seed = 123, kernel = "linear", cost_seq = 2^(-15:15), scale = FALSE)
data |
A data frame containing the training data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
dataext |
A data frame containing the external validation data. The first column should be the response variable (factor), and the remaining columns should be the predictor variables. |
K |
Number of folds to use in cross-validation. Default is 5. |
seed |
An integer specifying the random seed for reproducibility. Default is 123. |
kernel |
A character string specifying the kernel type to be used in the SVM. It can be "linear", "polynomial", "radial", or "sigmoid". Default is "linear". |
cost_seq |
A numeric vector of cost values to be evaluated. Default is '2^(-15:15)'. |
scale |
A logical value indicating whether to scale the predictor variables. Default is 'FALSE'. |
In Support Vector Machines, the cost parameter controls the trade-off between achieving a low training error and a low testing error. This function trains an SVM model on the training dataset, performs cross-validation to evaluate the different cost values, and selects the one that yields the highest AUC. The final model is trained using the optimal cost value, and its performance is reported using the AUC metric on the external validation dataset.
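A quick look at the default cost grid, whose values follow directly from the default '2^(-15:15)':

```r
# Default search grid: 31 cost values spaced on a log2 scale,
# from 2^-15 (about 3.05e-05) up to 2^15 (32768).
cost_seq <- 2^(-15:15)
length(cost_seq)  # 31 candidate values
range(cost_seq)
```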
A list containing the best cost value ('best_cost'), the final trained model ('best_model'), and the chosen c value ('best_c').
# Load sample data
data(sample_data_train)
data(sample_data_extern)

# Example usage
result <- tuneandtrainRobustTuneCSVM(sample_data_train, sample_data_extern,
  K = 5, seed = 123, kernel = "linear", cost_seq = 2^(-15:15), scale = FALSE)
result$best_cost
result$best_model
result$best_c