Publications

2009
Active Learning by Labeling Features
Gregory Druck, Burr Settles, Andrew McCallum.
To appear in Proceedings of EMNLP.
Methods that learn from prior information about input features, such as generalized expectation (GE), have been used to train accurate models with very little effort. In this paper, we
propose an active learning approach in which the machine solicits "labels" on features rather than instances. In both simulated and real user experiments on two sequence labeling
tasks, we show that our active learning method outperforms passive learning with features as well as traditional active learning with instances. Preliminary experiments suggest that novel
interfaces which intelligently solicit labels on multiple features facilitate more efficient annotation.
@inproceedings{druck09active,
Author = {Gregory Druck and Burr Settles and Andrew McCallum},
Booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2009)},
Title = {Active Learning by Labeling Features},
Year = {2009}}
Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria
Gregory Druck, Gideon Mann, Andrew McCallum.
To appear in Proceedings of ACL.
In this paper, we propose a novel method for semi-supervised learning of non-projective log-linear dependency parsers using directly expressed linguistic prior knowledge (e.g. a noun's
parent is often a verb). Model parameters are estimated using a generalized expectation (GE) objective function that penalizes the mismatch between model predictions and linguistic
expectation constraints. In a comparison with two prominent "unsupervised" learning methods that require indirect biasing toward the correct syntactic structure, we show that GE can attain
better accuracy with as few as 20 intuitive constraints. We also present positive experimental results on longer sentences in multiple languages.
@inproceedings{druck09semi,
Author = {Gregory Druck and Gideon Mann and Andrew McCallum},
Booktitle = {Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 09)},
Title = {Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria},
Year = {2009}}
Alternating Projections for Learning with Expectation Constraints
Kedar Bellare, Gregory Druck, Andrew McCallum.
To appear in Proceedings of UAI.
We present an objective function for learning with unlabeled data that utilizes auxiliary expectation constraints. We optimize this objective function using a procedure that alternates between
information and moment projections. Our method provides an alternative interpretation of the posterior regularization framework (Graca et al., 2008), maintains uncertainty
during optimization, unlike constraint-driven learning (Chang et al., 2007), and is more efficient than generalized expectation criteria (Mann and McCallum, 2008).
Applications of this framework include minimally supervised learning, semi-supervised learning, and learning with constraints that are more expressive than the underlying model. In experiments, we demonstrate comparable accuracy to
generalized expectation criteria for minimally supervised learning, and use expressive structural constraints to guide semi-supervised learning, providing a 3%-6% improvement over
state-of-the-art constraint-driven learning.
@inproceedings{bellare09alternating,
Author = {Kedar Bellare and Gregory Druck and Andrew McCallum},
Booktitle = {Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 09)},
Title = {Alternating Projections for Learning with Expectation Constraints},
Year = {2009}}

2008

Learning from Labeled Features using Generalized Expectation Criteria
Gregory Druck, Gideon Mann, Andrew McCallum.
In Proceedings of SIGIR.
A version of this paper appeared in the Proceedings of NESCAI 2008.
A version of this paper appeared as U. of Massachusetts Amherst Tech. Report UM-CS-2007-62.
It is difficult to apply machine learning to new domains because often we lack labeled problem instances. In this paper, we provide a solution to this problem that leverages domain knowledge in
the form of affinities between input features and classes. For example, in a baseball vs. hockey text classification problem, even without any labeled data, we know that the
presence of the word "puck" is a strong indicator of hockey. We refer to this type of domain knowledge as a labeled feature. We propose a method for
training discriminative probabilistic models with labeled features and unlabeled instances. Unlike previous approaches that use labeled features to create labeled pseudo-instances, we use
labeled features directly to constrain the model's predictions on unlabeled instances. We express these soft constraints using generalized expectation (GE) criteria: terms in a parameter
estimation objective function that express preferences on values of a model expectation. In this paper we train multinomial logistic regression models using GE criteria, but the method we
develop is applicable to other discriminative probabilistic models. The complete objective function also includes a Gaussian prior on parameters, which encourages generalization by spreading
parameter weight to unlabeled features. Experimental results on text classification data sets show that this method outperforms heuristic approaches to training classifiers with labeled
features. Experiments with human annotators show that it is more beneficial to spend limited annotation time labeling features rather than labeling instances. For example, after only one minute
of labeling features, we can achieve 80% accuracy on the ibm vs. mac text classification problem using GE-FL (GE with labeled features), whereas ten minutes of labeling documents results in an accuracy of
only 77%.
@inproceedings{druck08learning,
Author = {Gregory Druck and Gideon Mann and Andrew McCallum},
Booktitle = {Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval},
Pages = {595--602},
Title = {Learning from Labeled Features using Generalized Expectation Criteria},
Year = {2008}}
Learning to Predict the Quality of Contributions to Wikipedia
Gregory Druck, Gerome Miklau, Andrew McCallum.
In AAAI Workshop on Wikipedia and AI.
Although some have argued that Wikipedia's open edit policy is one of the primary reasons for its success, it also raises concerns about quality: vandalism, bias, and errors can be problems.
Despite these challenges, Wikipedia articles are often (perhaps surprisingly) of high quality, which many attribute to both the dedicated Wikipedia community and "good Samaritan" users. As
Wikipedia continues to grow, however, it becomes more difficult for these users to keep up with the increasing number of articles and edits. This motivates the development of tools to assist
users in creating and maintaining quality. In this paper, we propose metrics that quantify the quality of contributions to Wikipedia through implicit feedback from the community. We then learn
discriminative probabilistic models that predict the quality of a new edit using features of the changes made, the author of the edit, and the article being edited. By estimating the parameters
of these models, we also gain an understanding of factors that influence quality. We advocate using edit quality predictions and information gleaned from model analysis not to place
restrictions on editing, but instead to alert users to potential quality problems and to facilitate the development of additional incentives for contributors. We evaluate the edit quality
prediction models on the Spanish Wikipedia. Experiments demonstrate that the models perform better when given access to content-based features of the edit, rather than only features of the
contributing user. This suggests that a user-based solution to the Wikipedia quality problem may not be sufficient.
@inproceedings{druck08wikiai,
Author = {Gregory Druck and Gerome Miklau and Andrew McCallum},
Booktitle = {Proceedings of the AAAI Workshop on Wikipedia and Artificial Intelligence (WIKIAI 08)},
Pages = {7--12},
Title = {Learning to Predict the Quality of Contributions to Wikipedia},
Year = {2008}}

2007

Leveraging Existing Resources using Generalized Expectation Criteria
Gregory Druck, Gideon Mann, Andrew McCallum.
In NIPS Workshop on Learning Problem Design.
Updated: 12/17/07
It is difficult to apply machine learning to many real-world tasks because there are no existing labeled instances. In one solution to this problem, a human expert provides instance labels that
are used in traditional supervised or semi-supervised training. Instead, we want a solution that allows us to leverage existing resources other than complete labeled instances. We propose the
use of generalized expectation (GE) criteria to achieve this goal. A GE criterion is a term in a training objective function that assigns a score to values of a
model expectation. In this paper, the expectations are model-predicted class distributions conditioned on the presence of selected features, and the score function is the Kullback-Leibler
divergence from reference distributions that are estimated using existing resources. We apply this method to the problem of named-entity recognition, leveraging available lexicons. Using no
conventionally labeled instances, we learn a sliding-window multinomial logistic regression model that obtains an F1 score of 0.692 on the CoNLL 2003 data. To attain the same accuracy, a
supervised classifier requires 4,000 labeled instances.
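The GE criterion described in this abstract can be illustrated with a small sketch. It assumes a multinomial logistic regression model over numeric features, a single labeled feature with a given reference class distribution, and the divergence direction KL(reference || model expectation); the function names are hypothetical. It only evaluates the term for fixed weights and does not compute gradients, fit parameters, or model the sliding-window NER setting.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def predict_proba(weights: List[List[float]], x: Sequence[float]) -> List[float]:
    """p(y | x) for multinomial logistic regression; weights[y][j] scores feature j for class y."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w_y, x)) for w_y in weights]
    return softmax(scores)


def ge_kl_term(weights: List[List[float]], unlabeled: List[List[float]],
               feature: int, reference: Sequence[float]) -> float:
    """Hypothetical GE term: KL(reference || model expectation) for one labeled feature.

    The model expectation is the average predicted class distribution over the
    unlabeled instances in which `feature` is present (nonzero).
    """
    with_feature = [x for x in unlabeled if x[feature] != 0]
    if not with_feature:
        return 0.0  # the feature never occurs, so this sketch assigns no penalty
    expected = [0.0] * len(weights)
    for x in with_feature:
        for y, p in enumerate(predict_proba(weights, x)):
            expected[y] += p / len(with_feature)
    # Classes with zero reference mass contribute nothing to the KL sum.
    return sum(r * math.log(r / q) for r, q in zip(reference, expected) if r > 0)
```

For example, with all-zero weights the model predicts a uniform distribution, so a reference of `[0.9, 0.1]` yields a positive penalty; weights that raise the score of class 0 whenever the feature fires shrink it. In training, a term like this would be subtracted from the objective and minimized together with any other terms.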
@inproceedings{druck07leveraging,
Author = {Gregory Druck and Gideon Mann and Andrew McCallum},
Booktitle = {Proceedings of the Neural Information Processing Systems (NIPS) Workshop on Learning Problem Design},
Title = {Leveraging Existing Resources using Generalized Expectation Criteria},
Year = {2007}}
Generalized Expectation Criteria
Andrew McCallum, Gideon Mann, Gregory Druck.
U. of Massachusetts Amherst Tech. Report UM-CS-2007-60.
This working note has not been updated recently. The 2008 SIGIR paper and the 2009 ACL and EMNLP papers provide up-to-date descriptions of GE.
This note describes generalized expectation (GE) criteria, a
framework for incorporating preferences about model expectations into
parameter estimation objective functions. We discuss relations to
other methods, various learning paradigms it supports, and
applications that can leverage its flexibility.
@techreport{mccallum07generalized,
Author = {Andrew McCallum and Gideon Mann and Gregory Druck},
Institution = {University of Massachusetts Amherst},
Number = {UM-CS-2007-60},
Title = {Generalized Expectation Criteria},
Year = {2007}}
Semi-Supervised Classification with Hybrid Generative/Discriminative Methods
Gregory Druck, Chris Pal, Xiaojin Zhu, Andrew McCallum.
In Proceedings of KDD.
We compare two recently proposed frameworks for combining generative and discriminative probabilistic classifiers and apply them to semi-supervised classification. In both cases we explore the
tradeoff between maximizing a discriminative likelihood of labeled data and a generative likelihood of labeled and unlabeled data. While prominent semi-supervised learning methods assume low-density
regions between classes or are subject to generative modeling assumptions, we conjecture that hybrid generative/discriminative methods allow semi-supervised learning in the presence of
strongly overlapping classes and reduce the risk of modeling structure in the unlabeled data that is irrelevant for the specific classification task of interest. We apply both hybrid approaches
within naively structured Markov random field models and provide a thorough empirical comparison with two well-known semi-supervised learning methods on six text classification tasks. A
semi-supervised hybrid generative/discriminative method provides the best accuracy in 75% of the experiments, and the multi-conditional learning hybrid approach achieves the highest
overall mean accuracy across all tasks.
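One generic way to write the tradeoff described above, assuming a labeled set $\mathcal{L}$, an unlabeled set $\mathcal{U}$, shared parameters $\theta$, and a hypothetical weight $\lambda \ge 0$, is the following; neither of the two compared frameworks is necessarily parameterized exactly this way.

$$
\mathcal{O}(\theta) = \sum_{i \in \mathcal{L}} \log p_\theta(y_i \mid x_i)
+ \lambda \Big[ \sum_{i \in \mathcal{L}} \log p_\theta(x_i, y_i) + \sum_{j \in \mathcal{U}} \log \sum_{y} p_\theta(x_j, y) \Big]
$$

Setting $\lambda = 0$ recovers purely discriminative training on the labeled data, while increasing $\lambda$ places more weight on modeling the inputs, including the unlabeled ones.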
@inproceedings{druck07semi,
Author = {Gregory Druck and Chris Pal and Andrew McCallum and Xiaojin Zhu},
Booktitle = {Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 07)},
Pages = {280--289},
Title = {Semi-supervised classification with hybrid generative/discriminative methods},
Year = {2007}}
Learning A* Underestimates: Using Inference to Guide Inference
Gregory Druck, Mukund Narasimhan, Paul Viola.
In Proceedings of AISTATS.
We present a technique for speeding up inference of structured variables using a priority-driven search algorithm rather than the more conventional dynamic programming. A priority-driven
search algorithm is guaranteed to return the optimal answer if the priority function is an underestimate of the true cost function. We introduce the notion of a probable approximate
underestimate, and show that it can be used to compute a probable approximate solution to the inference problem when used as a priority function. We show that we can learn probable
approximate underestimate functions which have the functional form of simpler, easy-to-decode models. These models can be learned from unlabeled data by solving a linear/quadratic
optimization problem. As a result, we get a priority function that can be computed quickly and yields solutions that are (provably) almost optimal most of the time. Using these ideas,
discriminative classifiers such as semi-Markov CRFs and discriminative parsers can be sped up using a generalization of the A* algorithm. Further, this technique resolves one of the biggest
obstacles to the use of A* as a general decoding procedure, namely that of coming up with an admissible priority function. Applying this technique results in an algorithm that is more than 3
times as fast as the Viterbi algorithm for decoding semi-Markov Conditional Markov Models.
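The guarantee stated in this abstract (a priority-driven search returns the optimal answer when its priority function underestimates the true cost) is the standard A* property. The sketch below is plain A* over a small explicit graph with a hand-supplied admissible heuristic `h`; it is not the paper's method of learning probable approximate underestimates, nor a semi-Markov decoder, and the graph and function names are illustrative assumptions.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(successor, edge cost), ...]


def a_star(graph: Graph, start: str, goal: str,
           h: Callable[[str], float]) -> Optional[Tuple[float, List[str]]]:
    """Expand nodes in order of priority g(n) + h(n).

    If h never overestimates the remaining cost to `goal` (an admissible
    underestimate), the first time `goal` is popped its cost is optimal.
    """
    frontier = [(h(start), 0.0, start, [start])]  # (priority, cost so far, node, path)
    best_g = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry: a cheaper path to `node` was found later
        for nxt, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None  # goal unreachable from start
```

With `h` returning 0 everywhere, the search reduces to Dijkstra's algorithm; a tighter admissible `h` returns the same optimal cost and can expand fewer nodes, which is the kind of speedup the abstract targets.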
@inproceedings{druck07learning,
Author = {Gregory Druck and Mukund Narasimhan and Paul Viola},
Booktitle = {Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 07)},
Pages = {99--106},
Title = {Learning A* underestimates: Using inference to guide inference},
Year = {2007}}

2006

Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification
Andrew McCallum, Chris Pal, Gregory Druck, Xuerui Wang.
In Proceedings of AAAI.
This paper presents multi-conditional learning (MCL), a training
criterion based on a product of multiple conditional likelihoods.
When combining the traditional conditional probability of "label given
input" with a generative probability of "input given label", the
latter acts as a surprisingly effective regularizer. When applied to
models with latent variables, MCL combines the structure-discovery
capabilities of generative topic models, such as latent Dirichlet
allocation and the exponential family harmonium, with the accuracy and robustness of
discriminative classifiers, such as logistic regression and
conditional random fields. We present results on several standard
text data sets showing significant reductions in classification error
due to MCL regularization, and substantial gains in precision and
recall due to the latent structure discovered under MCL.
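Taking logs, a criterion based on a product of the two conditional likelihoods named above can be sketched as below. The weights $\alpha$ and $\beta$ and the single shared parameter vector $\theta$ are illustrative assumptions, not necessarily the paper's exact formulation.

$$
\mathcal{O}_{\mathrm{MCL}}(\theta) = \sum_{i} \Big[ \alpha \log p_\theta(y_i \mid x_i) + \beta \log p_\theta(x_i \mid y_i) \Big]
$$

With $\beta = 0$ this is ordinary conditional (discriminative) training; a positive $\beta$ adds the generative "input given label" term that the abstract describes as acting like a regularizer.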
@inproceedings{mccallum06multi,
Author = {Andrew McCallum and Chris Pal and Gregory Druck and Xuerui Wang},
Booktitle = {Proceedings of the American Association for Artificial Intelligence National Conference on Artificial Intelligence (AAAI 06)},
Pages = {433--439},
Title = {Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification},
Year = {2006}}