Submodular functions model the intuitive notion of diminishing returns. Due to their far-reaching applications, they have been rediscovered in many fields such as information theory, operations research, statistical physics, economics, and machine learning. They also enjoy computational tractability as they can be minimized exactly or maximized approximately. The goal of this talk is simple. We see how a little bit of randomness, a little bit of greediness, and the right combination can lead to pretty good methods for offline, streaming, and distributed solutions.
Many modern neural networks are trained in an over-parameterized regime where the parameters of the model exceed the size of the training dataset. Due to their over-parameterized nature these models in principle have the capacity to (over)fit any set of labels including pure noise. Despite this high fitting capacity, somewhat paradoxically, models trained via first-order methods (often with early stopping) continue to predict well on yet unseen test data.
- 1 of 61
- next ›