ppi.prediction

Description:

Function to use a support vector machine or random forest machine learning algorithm to classify quantitative protein-protein interaction data.

Usage:
ppi.prediction(
PPIdf = NULL,
referenceSet = NULL,
seed = 555,
method.scaling = "robust.scaler",
construct.scaling = "robust.scaler",
scale = TRUE,
independent.reference = FALSE,
independent.PPIdf = FALSE,
iter.scaler = TRUE,
range = c(0.25, 0.75),
data.scaling = "main",
negative.reference = c("RRS", "inter-complex"),
assay = c("mean_cBRET", "mean_mCit"),
svm.scale = TRUE,
sampling = "unweighted",
weightBy = NULL,
weightHi = TRUE,
model.type = "svm",
kernelType = "linear",
svm.parameters = FALSE,
C = 100,
gamma = NULL,
coef0 = 0,
degree = 2,
ensembleSize = 25,
top = NULL,
inclusion = NULL,
cutoff = "all",
iter = 5,
verbose = TRUE
)

Arguments:

PPIdf: binary PPI data set containing the interactions to be classified
referenceSet: reference PPI data set containing the reference interactions used to train the models
seed: set seed for reproducibility
method.scaling: method used to scale the data; choose between "robust.scaler" and "standard.scaler"
construct.scaling: accepted scaling arguments are: "none", "standardize", "robust.scaler", "construct", "orientation"
scale: logical; if TRUE, a "robust.scaler" or "standard.scaler" normalization is performed as specified under method.scaling
svm.scale: logical; controls the 'scale' argument of the e1071::svm function (see ?e1071::svm for details). Set to FALSE if normalization is already performed via scale = TRUE and method.scaling = "robust.scaler".
independent.reference: logical; is referenceSet a collection of independently collected reference sets? If TRUE, then a "robust.scaler" normalization is performed on the distinct reference sets indicated by a column 'dataset'
independent.PPIdf: logical; is PPIdf a collection of independently collected data sets? If TRUE, then a "robust.scaler" normalization is performed on the distinct data sets indicated by a column 'dataset'
iter.scaler: logical; if TRUE and when using "robust.scaler", robust scaler normalization is performed iteratively until the IQR of each construct is within the IQR of all loaded data sets
range: IQR range used in "robust.scaler"
data.scaling: specifies for which assay the constructs are scaled; when "main", only the first assay is scaled; when "all", all assays are scaled
negative.reference: string in the column "complex" that specifies the negative/random interactions
assay: assay parameters used for training
sampling: use "weighted" or "unweighted" sampling to generate the independent training sets
weightBy: assay parameter used for weighted sampling
weightHi: logical; if TRUE, weighted sampling preferentially uses higher values; if FALSE, it preferentially uses smaller values
model.type: the machine learning algorithm used; supported are support vector machines ("svm") and random forests ("randomForest")
kernelType: the kernel used in training and predicting; see ?e1071::svm for details
svm.parameters: logical; if TRUE, the best parameters (degree, gamma, coef0, cost) are calculated; if FALSE, the parameters must be provided manually
C: cost of constraints violation (default: 1); this is the 'C' constant of the regularization term in the Lagrange formulation; see ?e1071::svm for details
gamma: parameter needed for all kernels except linear (default: 1/(data dimension)); see ?e1071::svm for details
coef0: parameter needed for kernels of type polynomial and sigmoid (default: 0); see ?e1071::svm for details
degree: parameter needed for kernels of type polynomial (default: 3); see ?e1071::svm for details
ensembleSize: number of independent training sets assembled
top: number of highest-scoring PRS and RRS interactions to randomly sample from; if NULL, only interactions above the 'cs' will be used
inclusion: number of interactions to include during training of each model (>30); if NULL, the number of interactions is calculated from the assembled training sets (ensembleSize) so that each interaction has been sampled with 99.99% probability
cutoff: sample interactions with quantitative scores above the specified cutoff from the main assay
iter: number of iterations performed to reclassify the training set
verbose: logical; give detailed information
Value:

A list with elements containing the results from the machine learning prediction classes and the parameters used.
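Examples:

A minimal usage sketch, not run. The shape of the input data frames, the reference labels ("PRS", "RRS"), and the use of random values are assumptions inferred from the argument descriptions above; only the column names 'complex', 'mean_cBRET', and 'mean_mCit' are taken from the defaults. Adapt to your data.

```r
## Not run:
## Hypothetical input data; structure is an assumption inferred
## from the argument descriptions above.
set.seed(555)
ppi <- data.frame(
  complex    = "unknown",
  mean_cBRET = rnorm(100),
  mean_mCit  = rnorm(100)
)
reference <- data.frame(
  complex    = rep(c("PRS", "RRS"), each = 50),  # positive / random reference interactions
  mean_cBRET = c(rnorm(50, mean = 2), rnorm(50, mean = 0)),
  mean_mCit  = c(rnorm(50, mean = 2), rnorm(50, mean = 0))
)

res <- ppi.prediction(
  PPIdf              = ppi,
  referenceSet       = reference,
  negative.reference = "RRS",
  assay              = c("mean_cBRET", "mean_mCit"),
  model.type         = "svm",
  kernelType         = "linear",
  svm.parameters     = TRUE,  # tune degree, gamma, coef0, and cost automatically
  ensembleSize       = 25,
  seed               = 555
)
str(res)  # a list with the prediction results and the parameters used
## End(Not run)
```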