TreeSS: A model-free Tree-based Subdata Selection method for prediction

John Stufken

George Mason University
Barton Lectures in Computational Mathematics


Date: Wednesday, September 20, 2023
Time: 3:30 pm - 5:00 pm
Location: Petty 150

With ever larger datasets, there is a growing interest in methods that select a small portion of the entire dataset (subdata) so that reliable inferences can be obtained by analyzing only the selected subdata. Many of the subdata selection methods that have been proposed in recent years are based on model assumptions for the data. While these methods can work extremely well when the model assumptions hold, they may yield poor results if the assumptions are wrong. In addition, subdata that is good for one task may not be so good for another. In this presentation we introduce and discuss a model-free subdata selection method (TreeSS) that is based on using binary decision trees and that focuses on selecting subdata that performs well for prediction.