Obtain predicted values on test data using a trained super greedy forest.

# S3 method for class 'rfsgt'
predict(object, newdata, get.tree = NULL,
        block.size = 10, seed = NULL, do.trace = FALSE, ...)

Arguments

object

An object of class rfsgt obtained from a previous training call to rfsgt.

newdata

Test data. If not provided, the training data are used and the original training forest is restored.

get.tree

Vector of integer(s) identifying the trees over which the ensemble is calculated. By default, all trees in the forest are used.

block.size

Determines how the cumulative error rate is calculated. To obtain the cumulative error rate on every nth tree, set the value to an integer n between 1 and ntree.

seed

Negative integer specifying the seed for the random number generator.

do.trace

Number of seconds between updates to the user on the approximate time to completion.

...

Additional options.
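The get.tree and block.size arguments both act on the sequence of trees in the forest. As a rough illustration only, the base-R sketch below assumes, as in a standard regression forest, that the ensemble prediction is the average of the per-tree predictions and that the cumulative error is the mean squared error of the running ensemble, recorded after every block.size-th tree and after the last tree. The per-tree prediction matrix and the helpers ensemble_over_trees and cumulative_error are hypothetical; they are not part of the rfsgt API, and the package's native routine may compute these quantities differently.

```r
## Hypothetical per-tree predictions (rows = observations, columns = trees).
## A fitted forest would supply these; here they are fixed numbers for illustration.
tree.pred <- matrix(c(4, 4, 2, 2, 0, 0, 2, 2), nrow = 2)  # 2 observations x 4 trees
y <- c(0, 0)                                               # hypothetical responses

## Ensemble over a subset of trees (cf. get.tree), assuming the ensemble
## is the plain average of the selected trees' predictions.
ensemble_over_trees <- function(tree.pred, get.tree = NULL) {
  if (is.null(get.tree)) get.tree <- seq_len(ncol(tree.pred))  # default: all trees
  rowMeans(tree.pred[, get.tree, drop = FALSE])
}

## Cumulative mean squared error of the running ensemble (cf. block.size),
## recorded after every block.size-th tree and after the final tree.
cumulative_error <- function(tree.pred, y, block.size = 10) {
  ntree <- ncol(tree.pred)
  at <- unique(c(seq_len(ntree %/% block.size) * block.size, ntree))
  err <- vapply(at, function(t) {
    yhat <- ensemble_over_trees(tree.pred, get.tree = seq_len(t))  # first t trees
    mean((yhat - y)^2)
  }, numeric(1))
  names(err) <- at
  err
}

ensemble_over_trees(tree.pred)                      # 2 2 (all four trees)
ensemble_over_trees(tree.pred, get.tree = c(1, 2))  # 3 3 (trees 1 and 2 only)
cumulative_error(tree.pred, y, block.size = 2)      # 9 after tree 2, 4 after tree 4
```

With block.size = 1 the same helper records the error after every tree, which is the finest (and most expensive) setting.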

Details

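Returns predicted values from a trained super greedy forest. When newdata is supplied, predictions are made for the new observations; when it is omitted, the training data are used and the original training forest is restored.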

Value

An object of class c("rfsgt", "predict", family) containing predictions and prediction-time summaries. When newdata is omitted, the object corresponds to the restored training forest; when newdata is supplied, it corresponds to predictions on the new data. The returned list contains:

call

The matched prediction call.

family

Model family inherited from the fitted rfsgt object.

n

Number of observations predicted.

samptype

Sampling type inherited from the fitted forest.

sampsize

Tree sample size inherited from the fitted forest.

ntree

Number of trees in the fitted forest.

hcut

hcut value used by the fitted forest.

splitrule

Split rule used by the fitted forest.

yvar

Response values used for prediction-error calculation. This is the training response in restore mode, the response from newdata when present, and NULL when newdata has no response column.

yvar.names

Response variable name from the fitted forest.

xvar

Predictor data used for prediction, after applying the same encoding and filtering as in training.

xvar.augment

Augmented base-learner data used for prediction, or NULL when no augmented terms are used.

xvar.names

Names of the retained predictor variables.

xvar.augment.names

Names of augmented base-learner terms, or NULL when no augmented terms are used.

xvar.info

Predictor-encoding metadata used to align new data with the training design.

term.map

Term map describing generated base-learner terms.

leaf.count

Number of terminal nodes in each tree.

forest

Stored forest object used to make the predictions.

membership

Matrix of terminal-node membership by observation and tree when membership output is requested; otherwise NULL.

inbag

Matrix of bootstrap membership counts in restore mode when membership output is requested; otherwise NULL.

block.size

Block size used for cumulative prediction error calculations.

perf.type

Internal performance-measure type used for prediction-error calculation.

predicted

Ensemble predicted values.

predicted.oob

Out-of-bag predictions in restore mode when available; otherwise NULL.

ambrOffset

Terminal-node offset matrix used internally to recover local terminal-node quantities for new observations; NULL in restore mode.

err.rate

Prediction error when response values are available; otherwise NULL.

ctime.internal

Timing information reported by the native prediction routine.

ctime.external

Elapsed R-side timing, computed from proc.time().
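The inbag and predicted.oob components are related through the out-of-bag (OOB) idea. Assuming the usual random-forest convention, an observation's OOB prediction averages only the trees in which its bootstrap count is zero, and is missing when the observation was in-bag for every tree. The base-R sketch below illustrates that convention on hypothetical per-tree predictions and inbag counts; the helper oob_predict is hypothetical and not part of the rfsgt API, and rfsgt's native routine may compute OOB values differently.

```r
## Out-of-bag ensemble (assumed convention): for each observation, average only
## the trees in which its bootstrap (inbag) count is zero; NA if it was in-bag
## for every tree.
oob_predict <- function(tree.pred, inbag) {
  stopifnot(identical(dim(tree.pred), dim(inbag)))
  oob <- inbag == 0                  # TRUE where the observation was out-of-bag
  n.oob <- rowSums(oob)              # number of OOB trees per observation
  s <- rowSums(tree.pred * oob)      # sum of OOB tree predictions
  ifelse(n.oob > 0, s / n.oob, NA_real_)
}

## Hypothetical inputs: 3 observations x 2 trees.
tree.pred <- matrix(c(1, 2, 3,
                      4, 5, 6), nrow = 3)
inbag <- matrix(c(1, 0, 2,
                  0, 0, 1), nrow = 3)  # bootstrap counts per observation and tree

oob_predict(tree.pred, inbag)   # 4.0 3.5 NA
```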

Author

Hemant Ishwaran and Udaya B. Kogalur

References

Ishwaran H. (2025). Super greedy trees. To appear in Artificial Intelligence Review.

See also

Examples


## ------------------------------------------------------------
##
##  mtcars: for CRAN testing
##
## ------------------------------------------------------------

o <- rfsgt(mpg~., mtcars[1:20,], ntree=3, treesize=1)
p <- predict(o, mtcars[-(1:20),])
print(o)
print(p)



# \donttest{
## ------------------------------------------------------------
##
## train/test using Friedman 3
##
## ------------------------------------------------------------

if (requireNamespace("mlbench", quietly = TRUE)) {

  ## train/test using Friedman 3
  d.trn <- data.frame(mlbench::mlbench.friedman3(100))
  o <- rfsgt(y ~ ., d.trn, hcut = 1, ntree = 3, treesize = 1)
  print(o)

  d.tst <- data.frame(mlbench::mlbench.friedman3(200))
  y.tst <- d.tst$y
  x.tst <- d.tst[, colnames(d.tst) != "y"]
  yhat <- predict(o, x.tst)$predicted
  cat("test set mse:", mean((yhat - y.tst)^2), "\n")

## train sgf on friedman 3
d.trn <- data.frame(mlbench::mlbench.friedman3(500))
o <- rfsgt(y~.,d.trn, hcut=1)
print(o)

## test sgf
d.tst <- data.frame(mlbench::mlbench.friedman3(1000))
y.tst <- d.tst$y
x.tst <- d.tst[, colnames(d.tst)!= "y"]
yhat <- predict(o, x.tst)$predicted
cat("test set mse:", mean((yhat - y.tst)^2), "\n")

## ------------------------------------------------------------
##
## restore a trained super greedy forest using boston
##
## ------------------------------------------------------------

## run sgf on boston
data(BostonHousing, package = "mlbench")
o <- rfsgt(medv~., BostonHousing)
print(o)

## restore the forest
print(predict(o))

## ------------------------------------------------------------
##
## coherence check using boston housing with factors
##
## ------------------------------------------------------------

## boston housing data: make factors
data(BostonHousing, package = "mlbench")
Boston <- BostonHousing[1:40,]
Boston$zn <- factor(Boston$zn)
Boston$chas <- factor(Boston$chas)
Boston$lstat <- factor(round(0.2 * Boston$lstat))
Boston$nox <- factor(round(20 * Boston$nox))
Boston$rm <- factor(round(Boston$rm))
     
## grow a single tree - save inbag information
o <- rfsgt(medv~., Boston, hcut=2, filter=FALSE, ntree=1, membership=TRUE, nodesize=3)

## coherence matrix
pred <- data.frame(
      inbag=o$inbag,
      pred.inb=o$predicted,
      pred.oob=o$predicted.oob,
      pred.inb.restore=predict(o)$predicted,
      pred.oob.restore=predict(o)$predicted.oob,
      pred.test=predict(o,Boston)$predicted)
print(pred)

## coherence check
cat("coherence for inbag data:", all(pred$pred.inb == pred$pred.test, na.rm = TRUE), "\n")
cat("  coherence for oob data:", all(pred$pred.oob == pred$pred.test, na.rm = TRUE), "\n")


## canonical example of train/test with prediction
trn <- sample(1:nrow(Boston), nrow(Boston)/2, replace=FALSE)
o.trn <- rfsgt(medv~., Boston[trn,], hcut=2)
print(predict(o.trn, Boston[-trn,]))


## ------------------------------------------------------------
## prediction after tuning hcut and pre-filtering with tune.hcut
## ------------------------------------------------------------

## fit the forest using the tuned hcut
dta <- data.frame(mlbench::mlbench.friedman3(500))
f <- tune.hcut(y~., dta, hcut=5, verbose=TRUE)
o <- rfsgt(y~., dta, filter=f)
print(o)

## test the tuned forest on new data
print(predict(o, data.frame(mlbench::mlbench.friedman3(25000))))

## override the optimized hcut
o2 <- rfsgt(y~., dta, filter=use.tune.hcut(f, hcut=2))
print(o2)
print(predict(o2, data.frame(mlbench::mlbench.friedman3(25000))))

}
# }