Function that uses a support vector machine or random forest machine learning algorithm to classify quantitative protein-protein interaction data.

ppi.prediction(
  PPIdf = NULL,
  referenceSet = NULL,
  seed = 555,
  method.scaling = "robust.scaler",
  construct.scaling = "robust.scaler",
  scale = TRUE,
  independent.reference = FALSE,
  independent.PPIdf = FALSE,
  iter.scaler = TRUE,
  range = c(0.25, 0.75),
  data.scaling = "main",
  negative.reference = c("RRS", "inter-complex"),
  assay = c("mean_cBRET", "mean_mCit"),
  svm.scale = TRUE,
  sampling = "unweighted",
  weightBy = NULL,
  weightHi = TRUE,
  model.type = "svm",
  kernelType = "linear",
  svm.parameters = FALSE,
  C = 100,
  gamma = NULL,
  coef0 = 0,
  degree = 2,
  ensembleSize = 25,
  top = NULL,
  inclusion = NULL,
  cutoff = "all",
  iter = 5,
  verbose = TRUE
)

Arguments

PPIdf:

binary PPI data set containing interactions to be classified

referenceSet:

reference PPI data set containing reference interactions used to train the machine learning models

seed:

seed for random number generation, set to make results reproducible

construct.scaling:

accepted scaling arguments are: "none", "standardize", "robust.scaler", "construct", "orientation"

scale:

logical; if TRUE, a 'robust.scaler' or 'standard.scaler' normalization is performed as specified by 'method.scaling'

svm.scale:

logical; controls the 'scale' argument of the e1071::svm function; see ?e1071::svm for details. Set to FALSE when using 'scale = TRUE' with 'method.scaling = "robust.scaler"', so that the data are not scaled twice.

method.scaling:

method used to scale the data; either 'robust.scaler' or 'standard.scaler'

independent.reference:

logical; is 'referenceSet' a collection of independently collected reference sets? If TRUE, a 'robust.scaler' normalization is performed separately on the distinct reference sets indicated by the column 'dataset'

independent.PPIdf:

logical; is 'PPIdf' a collection of independently collected data sets? If TRUE, a 'robust.scaler' normalization is performed separately on the distinct data sets indicated by the column 'dataset'

iter.scaler:

logical; if TRUE and 'method.scaling = "robust.scaler"', robust scaler normalization is performed iteratively until the IQR of each construct falls within the IQR of all loaded data sets

range:

quantile range defining the IQR used by "robust.scaler"
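A minimal sketch of the robust-scaler idea (a hypothetical helper for illustration, not the package's internal code): values are centred on the median and divided by the interquartile range defined by 'range'.

```r
# Hypothetical illustration of robust scaling; the package's internal
# implementation may differ.
robust_scale <- function(x, range = c(0.25, 0.75)) {
  q <- quantile(x, probs = range, na.rm = TRUE)
  # centre on the median, scale by the IQR
  (x - median(x, na.rm = TRUE)) / (q[[2]] - q[[1]])
}

robust_scale(c(1, 2, 3, 4, 100))
```

Because the median and IQR are largely insensitive to extreme values, a few very strong interactors do not distort the scaling the way they would under mean/SD standardization.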

data.scaling:

specifies for which 'assay' the constructs are scaled. With "main", only the first assay is scaled; with "all", all assays are scaled.

negative.reference:

string in the column "complex" that identifies the negative/random interactions

assay:

assay parameters used for training

sampling:

use "weighted" or "unweighted" sampling to generate the independent training sets

weightBy:

assay parameter used for weighted sampling

weightHi:

logical; if TRUE, weighted sampling preferentially selects higher values; if FALSE, it preferentially selects lower values
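The difference between the two sampling modes can be sketched as follows (the data frame and its columns are illustrative placeholders, not the package's internal code); weighted sampling draws interactions with probability proportional to the assay given in 'weightBy':

```r
set.seed(555)
ppi <- data.frame(id = 1:100, mean_cBRET = runif(100))

# unweighted: every interaction is equally likely to be drawn
unweighted <- ppi[sample(nrow(ppi), 30), ]

# weighted, weightHi = TRUE: higher mean_cBRET values are preferred
w <- ppi$mean_cBRET
weighted_hi <- ppi[sample(nrow(ppi), 30, prob = w / sum(w)), ]

# weighted, weightHi = FALSE: lower mean_cBRET values are preferred
w_lo <- max(w) - w
weighted_lo <- ppi[sample(nrow(ppi), 30, prob = w_lo / sum(w_lo)), ]
```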

model.type:

the machine learning algorithm used. Supported are support vector machines ("svm") and random forests ("randomForest")

kernelType:

the kernel used in training and predicting, see ?e1071::svm for details

svm.parameters:

if TRUE, the best parameters (degree, gamma, coef0, cost) are calculated automatically; if FALSE, the parameters must be provided manually

C:

cost of constraints violation; the ‘C’-constant of the regularization term in the Lagrange formulation (e1071::svm default: 1; this function defaults to 100). See ?e1071::svm for details.

gamma:

parameter needed for all kernels except linear (default: 1/(data dimension)); see ?e1071::svm for details

coef0:

parameter needed for kernels of type polynomial and sigmoid (default: 0); see ?e1071::svm for details

degree:

parameter needed for kernels of type polynomial (e1071::svm default: 3; this function defaults to 2); see ?e1071::svm for details

ensembleSize:

number of independent training sets assembled

top:

number of highest-scoring PRS and RRS interactions to randomly sample from; if NULL, only interactions above the 'cutoff' will be used

inclusion:

number of interactions to include during training of each model (>30); if NULL, the number of interactions is calculated from the number of training sets assembled ('ensembleSize') so that each interaction is sampled with 99.99% probability.
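Assuming each of the 'ensembleSize' training sets draws 'inclusion' of N candidate interactions uniformly at random, the probability that a given interaction appears in at least one set is 1 - (1 - inclusion/N)^ensembleSize. The default behaviour can then be sketched as solving this for 99.99% (an illustrative reconstruction, not necessarily the package's exact formula):

```r
# Smallest per-model sample size so that each of N interactions is drawn
# at least once with >= 99.99% probability across ensembleSize models.
min_inclusion <- function(N, ensembleSize, p = 0.9999) {
  ceiling(N * (1 - (1 - p)^(1 / ensembleSize)))
}

min_inclusion(N = 1000, ensembleSize = 25)  # 309
```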

cutoff:

sample interactions with quantitative scores above the specified 'cutoff' from the main assay

iter:

number of iterations performed to reclassify the training set

verbose:

logical; if TRUE, print detailed progress information

Value

A list with elements containing the machine learning classification results and the parameters used.
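A hedged usage sketch ('my_ppi' and 'my_reference' are placeholder data frames, not objects shipped with the package; adapt the names to your data):

```r
# Illustrative call: 'my_ppi' and 'my_reference' must contain the assay
# columns named in 'assay', and the reference set must carry a column
# 'complex' whose "RRS"/"inter-complex" entries mark the negatives.
res <- ppi.prediction(
  PPIdf        = my_ppi,
  referenceSet = my_reference,
  assay        = c("mean_cBRET", "mean_mCit"),
  model.type   = "svm",
  kernelType   = "linear",
  ensembleSize = 25
)

str(res)  # list with the classification results and the parameters used
```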