Function that uses a support vector machine or random forest machine learning algorithm to classify quantitative protein-protein interaction data.

ppi.prediction(
  PPIdf = NULL,
  referenceSet = NULL,
  seed = 555,
  method.scaling = "robust.scaler",
  construct.scaling = "robust.scaler",
  scale = TRUE,
  independent.reference = FALSE,
  independent.PPIdf = FALSE,
  iter.scaler = TRUE,
  range = c(0.25, 0.75),
  data.scaling = "main",
  negative.reference = c("RRS", "inter-complex"),
  assay = c("mean_cBRET", "mean_mCit"),
  svm.scale = TRUE,
  sampling = "unweighted",
  weightBy = NULL,
  weightHi = TRUE,
  model.type = "svm",
  kernelType = "linear",
  svm.parameters = FALSE,
  C = 100,
  gamma = NULL,
  coef0 = 0,
  degree = 2,
  ensembleSize = 25,
  top = NULL,
  inclusion = NULL,
  cutoff = "all",
  iter = 5,
  verbose = TRUE
)

Arguments

PPIdf:

binary PPI data set containing interactions to be classified

referenceSet:

reference PPI data set containing reference interactions used to train the machine learning models

seed:

seed for random number generation, set to make results reproducible

construct.scaling:

accepted scaling arguments are: "none", "standardize", "robust.scaler", "construct", "orientation"

scale:

logical; if TRUE, a 'robust.scaler' or 'standard.scaler' normalization is performed as specified by 'method.scaling'

svm.scale:

logical; controls the 'scale' argument of the e1071::svm function; see ?e1071::svm for details. Set to FALSE when using 'scale = TRUE' with 'method.scaling = "robust.scaler"', so that the data are not scaled twice.

method.scaling:

method used to scale the data; either 'robust.scaler' or 'standard.scaler'

independent.reference:

logical; is 'referenceSet' a collection of independently collected reference sets? If TRUE, a 'robust.scaler' normalization is performed separately on the distinct reference sets indicated by the column 'dataset'

independent.PPIdf:

logical; is 'PPIdf' a collection of independently collected data sets? If TRUE, a 'robust.scaler' normalization is performed separately on the distinct data sets indicated by the column 'dataset'

iter.scaler:

logical; if TRUE and 'method.scaling = "robust.scaler"', robust scaler normalization is performed iteratively until the IQR of each construct falls within the IQR of all loaded data sets

range:

quantile range defining the IQR used by "robust.scaler"
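A minimal sketch of the robust-scaler idea (a hypothetical helper for illustration, not the package's internal code): values are centred on the median and divided by the interquartile range defined by 'range'.

```r
# Hypothetical illustration of robust scaling; the package's internal
# implementation may differ.
robust_scale <- function(x, range = c(0.25, 0.75)) {
  q <- quantile(x, probs = range, na.rm = TRUE)
  # centre on the median, scale by the IQR
  (x - median(x, na.rm = TRUE)) / (q[[2]] - q[[1]])
}

robust_scale(c(1, 2, 3, 4, 100))
```

Because the median and IQR are largely insensitive to extreme values, a few very strong interactors do not distort the scaling the way they would under mean/SD standardization.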

data.scaling:

specifies for which 'assay' the constructs are scaled. With "main", only the first assay is scaled; with "all", all assays are scaled.

negative.reference:

string in the column "complex" that identifies the negative/random interactions

assay:

assay parameters used for training

sampling:

use "weighted" or "unweighted" sampling to generate the independent training sets

weightBy:

assay parameter used for weighted sampling

weightHi:

logical; if TRUE, weighted sampling preferentially selects higher values; if FALSE, it preferentially selects lower values
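The difference between the two sampling modes can be sketched as follows (the data frame and its columns are illustrative placeholders, not the package's internal code); weighted sampling draws interactions with probability proportional to the assay given in 'weightBy':

```r
set.seed(555)
ppi <- data.frame(id = 1:100, mean_cBRET = runif(100))

# unweighted: every interaction is equally likely to be drawn
unweighted <- ppi[sample(nrow(ppi), 30), ]

# weighted, weightHi = TRUE: higher mean_cBRET values are preferred
w <- ppi$mean_cBRET
weighted_hi <- ppi[sample(nrow(ppi), 30, prob = w / sum(w)), ]

# weighted, weightHi = FALSE: lower mean_cBRET values are preferred
w_lo <- max(w) - w
weighted_lo <- ppi[sample(nrow(ppi), 30, prob = w_lo / sum(w_lo)), ]
```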

model.type:

the machine learning algorithm used. Supported are support vector machines ("svm") and random forests ("randomForest")

kernelType:

the kernel used in training and predicting, see ?e1071::svm for details

svm.parameters:

if TRUE, the best parameters (degree, gamma, coef0, cost) are calculated automatically; if FALSE, the parameters must be provided manually

C:

cost of constraints violation; the ‘C’-constant of the regularization term in the Lagrange formulation (e1071::svm default: 1; this function defaults to 100). See ?e1071::svm for details.

gamma:

parameter needed for all kernels except linear (default: 1/(data dimension)); see ?e1071::svm for details

coef0:

parameter needed for kernels of type polynomial and sigmoid (default: 0); see ?e1071::svm for details

degree:

parameter needed for kernels of type polynomial (e1071::svm default: 3; this function defaults to 2); see ?e1071::svm for details

ensembleSize:

number of independent training sets assembled

top:

number of highest-scoring PRS and RRS interactions to randomly sample from; if NULL, only interactions above the 'cutoff' will be used

inclusion:

number of interactions to include during training of each model (>30); if NULL, the number of interactions is calculated from the number of training sets assembled ('ensembleSize') so that each interaction is sampled with 99.99% probability.
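Assuming each of the 'ensembleSize' training sets draws 'inclusion' of N candidate interactions uniformly at random, the probability that a given interaction appears in at least one set is 1 - (1 - inclusion/N)^ensembleSize. The default behaviour can then be sketched as solving this for 99.99% (an illustrative reconstruction, not necessarily the package's exact formula):

```r
# Smallest per-model sample size so that each of N interactions is drawn
# at least once with >= 99.99% probability across ensembleSize models.
min_inclusion <- function(N, ensembleSize, p = 0.9999) {
  ceiling(N * (1 - (1 - p)^(1 / ensembleSize)))
}

min_inclusion(N = 1000, ensembleSize = 25)  # 309
```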

cutoff:

sample interactions with quantitative scores above the specified 'cutoff' from the main assay

iter:

number of iterations performed to reclassify the training set

verbose:

logical; if TRUE, print detailed progress information

Value

A list with elements containing the machine learning classification results and the parameters used.
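A hedged usage sketch ('my_ppi' and 'my_reference' are placeholder data frames, not objects shipped with the package; adapt the names to your data):

```r
# Illustrative call: 'my_ppi' and 'my_reference' must contain the assay
# columns named in 'assay', and the reference set must carry a column
# 'complex' whose "RRS"/"inter-complex" entries mark the negatives.
res <- ppi.prediction(
  PPIdf        = my_ppi,
  referenceSet = my_reference,
  assay        = c("mean_cBRET", "mean_mCit"),
  model.type   = "svm",
  kernelType   = "linear",
  ensembleSize = 25
)

str(res)  # list with the classification results and the parameters used
```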