DEPARTMENT SEMINAR

Maria-Florina (Nina) Balcan
Carnegie Mellon University

Department of Computer Science

February 28, 2008
Computer Science Building, Room 151
4:00 PM

Faculty Host: Andrew McCallum

"A Theory of Similarity Functions for Learning and Clustering"

One of the most powerful tools developed in machine learning in recent
years is the class of kernel methods. These methods perform well in
many applications, and there is also a well-developed theory of when a
given kernel is useful for a given learning problem. However, while a
kernel can be thought of as just a pairwise similarity function that
satisfies additional mathematical properties, the existing theory
requires viewing kernels as implicit (and often difficult to
characterize) maps into high-dimensional spaces. In this work we
develop an alternative theory of learning with more general similarity
functions, which requires neither reference to implicit spaces nor
that the function be positive semi-definite. Our results strictly
generalize the standard theory, and any good kernel function under the
usual definition can be shown to also be a good similarity function
under our definition. 

We then show how our framework can also be applied to clustering:
multi-way classification from purely unlabeled data. In particular,
using this perspective we develop a new model that directly addresses
the fundamental question of what kind of information a clustering
algorithm needs in order to produce a highly accurate partition of the
data. Our work can be viewed as an approach to defining a
discriminative model for clustering with non-interactive feedback. 

BIO:

Maria-Florina Balcan is a Ph.D. candidate at Carnegie Mellon
University under the supervision of Avrim Blum. She received B.S. and
M.S. degrees from the Faculty of Mathematics, University of Bucharest,
Romania. Her main research interests are Computational and Statistical
Machine Learning, Computational Aspects in Economics and Game Theory,
and Algorithms. She is a recipient of the IBM PhD Fellowship. 

Refreshments at 3:40 PM in the atrium, outside the presentation
room.