Publications
Book Chapters
- J. Kubica, S. Singh, D. Sorokina
Parallel Large-scale Feature Selection
Scaling Up Machine Learning, Cambridge University Press, 2011
Website
Refereed
- S. Singh, A. McCallum
Towards Asynchronous Distributed MCMC Inference for Large Graphical Models
Neural Information Processing Systems (NIPS), Big Learning Workshop on Algorithms, Systems, and Tools for Learning at Scale, 2011
,
- S. Singh, B. Martin, A. McCallum
Inducing Value Sparsity for Parallel Inference in Tree-shaped Models
Neural Information Processing Systems (NIPS), Workshop on Computational Trade-offs in Statistical Learning (COST), 2011
,
- S. Singh, A. Subramanya, F. Pereira, A. McCallum
Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Association for Computational Linguistics: Human Language Technologies (ACL HLT), 2011
Best Talk Award at Machine Reading Project Phase 3 Kickoff, Seattle, WA
PDF, Slides, ,
Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
@inproceedings{singh11:large-scale,
Author = {Sameer Singh and Amarnag Subramanya and Fernando Pereira and Andrew McCallum},
Booktitle = {Association for Computational Linguistics: Human Language Technologies (ACL HLT)},
Title = {Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models},
Year = {2011}}
- S. Singh, A. Subramanya, F. Pereira, A. McCallum
Distributed MAP Inference for Undirected Graphical Models
Neural Information Processing Systems (NIPS), Workshop on Learning on Cores, Clusters, and Clouds (LCCC), 2010
PDF, Slides, Video, ,
In this work, we distribute the MCMC-based MAP inference using the Map-Reduce framework. The variables are assigned randomly to machines, which leads to some factors that neighbor variables on separate machines. Parallel MCMC-chains are initiated using proposal distributions that only suggest local changes such that factors that lie across machines are not examined. After a fixed number of samples on each machine, we redistribute the variables amongst the machines to enable proposals across variables that were on different machines. To demonstrate the distribution strategy on a real-world information extraction application, we model the task of cross-document coreference.
@inproceedings{singh10:distributed,
Author = {Sameer Singh and Amarnag Subramanya and Fernando Pereira and Andrew McCallum},
Booktitle = {Neural Information Processing Systems (NIPS), Workshop on Learning on Cores, Clusters and Clouds},
Title = {Distributed MAP Inference for Undirected Graphical Models},
Year = {2010}}
- S. Singh, L. Yao, S. Riedel, A. McCallum
Constraint-Driven Rank-Based Learning for Information Extraction
Human Language Technologies: Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT), 2010
PDF, Slides, ,
Most learning algorithms for factor graphs require complete inference over the dataset or an instance before making an update to the parameters. SampleRank is a rank-based learning framework that alleviates this problem by updating the parameters during inference. Most semi-supervised learning algorithms also rely on the complete inference, i.e. calculating expectations or MAP configurations. We extend the SampleRank framework to the semi-supervised learning, avoiding these inference bottlenecks. Different approaches for incorporating unlabeled data and prior knowledge into this framework are explored. We evaluated our method on a standard information extraction dataset. Our approach outperforms the supervised method significantly and matches the result of the competing semi-supervised learning approach.
@inproceedings{singh10:constraint:,
Author = {Sameer Singh and Limin Yao and Sebastian Riedel and Andrew McCallum},
Booktitle = {Human Language Technologies: Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT)},
Title = {Constraint-Driven Rank-Based Learning for Information Extraction},
month = {June},
year = {2010},
address = {Los Angeles, California},
publisher = {Association for Computational Linguistics},
pages = {729--732}
}
- S. Singh, D. Hillard, C. Leggetter
Minimally-Supervised Extraction of Entities from Text Advertisements
Human Language Technologies: Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT), 2010
PDF, Slides, ,
Extraction of entities from ad creatives is an important problem that can benefit many computational advertising tasks. Supervised and semi-supervised solutions rely on labeled data which is expensive, time consuming, and difficult to procure for ad creatives. A small set of manually derived constraints on feature expectations over unlabeled data can be used to *partially* and *probabilistically* label large amounts of data. Utilizing recent work in constraint-based semi-supervised learning, this paper injects light weight supervision specified as these ``constraints'' into a semi-Markov conditional random field model of entity extraction in ad creatives. Relying solely on the constraints, the model is trained on a set of unlabeled ads using an online learning algorithm. We demonstrate significant accuracy improvements on a manually labeled test set as compared to a baseline dictionary approach. We also achieve accuracy that approaches a fully supervised classifier.
@inproceedings{singh10:minimally:,
Author = {Sameer Singh and Dustin Hillard and Chris Leggetter},
Booktitle = {Human Language Technologies: Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT)},
Title = {Minimally-Supervised Extraction of Entities from Text Advertisements},
Year = {2010}}
- A. McCallum, K. Schultz, S. Singh
FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs
Neural Information Processing Systems Conference (NIPS), 2009
PDF, Outline, ,
Discriminatively trained undirected graphical models have had wide empirical success, and there has been increasing interest in toolkits that ease their application to complex relational data. The power in relational models is in their repeated structure and tied parameters; at issue is how to define these structures in a powerful and flexible way. Rather than using a declarative language, such as SQL or first-order logic, we advocate using an imperative language to express various aspects of model structure, inference, and learning. By combining the traditional, declarative, statistical semantics of factor graphs with imperative definitions of their construction and operation, we allow the user to mix declarative and procedural domain knowledge, and also gain significant efficiencies. We have implemented such "imperatively defined factor graphs" in a system we call Factorie, a software library for an object-oriented, strongly-typed, functional language. In experimental comparisons to Markov Logic Networks on joint segmentation and coreference, we find our approach to be 3-20 times faster while reducing error by 20-25%---achieving a new state of the art.
@inproceedings{mccallum09:factorie:,
Author = {Andrew McCallum and Karl Schultz and Sameer Singh},
Booktitle = {Neural Information Processing Systems (NIPS)},
Title = {FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs},
Year = {2009}}
- M. Wick, K. Rohanimesh, S. Singh, A. McCallum
Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference
Neural Information Processing Systems Conference (NIPS), 2009
PDF, Outline, ,
Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses has exponential blow-up, solutions are often derived from MCMC. However, because of limitations in the design and parameterization of the jump function, these sampling-based methods suffer from local minima|the system must transition through lower-scoring configurations before arriving at a better MAP solution. This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL). Rather than setting parameters to maximize the likelihood of the training data, parameters of the factor graph are treated as a log-linear function approximator and learned with methods of temporal difference (TD); MAP inference is performed by executing the resulting policy on held out test data. Our method allows efficient gradient updates since only factors in the neighborhood of variables affected by an action need to be computed|we bypass the need to compute marginals entirely. Our method yields dramatic empirical success, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48% reduction in error over state-of-the-art in that domain.
@inproceedings{wick09:training,
Author = {Michael Wick and Khashayar Rohanimanesh and Sameer Singh and Andrew McCallum},
Booktitle = {Neural Information Processing Systems (NIPS)},
Title = {Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference},
Year = {2009}}
- S. Singh, K. Schultz, A. McCallum
Bi-directional Joint Inference for Entity Resolution and Segmentation using Imperatively-Defined Factor Graphs
Machine Learning and Knowledge Discovery in Databases (Lecture Notes in Computer Science) and European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2009
PDF, Slides, Video, ,
There has been growing interest in using joint inference across multiple subtasks as a mechanism for avoiding the cascading accumulation of errors in traditional pipelines. Several recent papers demonstrate joint inference between the segmentation of entity mentions and their de-duplication, however, they have various weaknesses: inference information flows only in one direction, the number of uncertain hypotheses is severely limited, or the subtasks are only loosely coupled. This paper presents a highly-coupled, bi-directional approach to joint inference based on efficient Markov chain Monte Carlo sampling in a relational conditional random field. The model is specified with our new probabilistic programming language that leverages imperative constructs to define factor graph structure and operation. Experimental results show that our approach provides a dramatic reduction in error while also running faster than the previous state-of-the-art system.
@inproceedings{singh09:bi-directional,
Author = {Sameer Singh and Karl Schultz and Andrew McCallum},
Booktitle = {Machine Learning and Knowledge Discovery in Databases (Lecture Notes in Computer Science) and European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)},
Pages = {414-429},
Title = {Bi-directional Joint Inference for Entity Resolution and Segmentation Using Imperatively-Defined Factor Graphs},
Year = {2009}}
- S. Singh, J. Kubica, S. Larsen, D. Sorokina
Parallel Large Scale Feature Selection for Logistic Regression
SIAM International Conference on Data Mining (SDM), 2009
PDF, Slides, ,
In this paper we examine the problem of efficient feature evaluation for logistic regression on very large data sets. We present a new forward feature selection heuristic that ranks features by their estimated effect on the resulting model's performance. An approximate optimization, based on backfitting, provides a fast and accurate estimate of each new feature's coefficient in the logistic regression model. Further, the algorithm is highly scalable by parallelizing simultaneously over both features and records, allowing us to quickly evaluate billions of potential features even for very large data sets.
@inproceedings{singh09:parallel,
author = {Sameer Singh and
Jeremy Kubica and
Scott Larsen and
Daria Sorokina},
title = {Parallel Large Scale Feature Selection for Logistic Regression},
booktitle = {SIAM Data Mining Conference (SDM)},
year = {2009},
pages = {1171-1182},
ee = {http://www.siam.org/proceedings/datamining/2009/dm09_107_singhs.pdf}
}
- A. McCallum, K. Rohanimanesh, M. Wick, K. Schultz, S. Singh
FACTORIE: Efficient Probabilistic Programming via Imperative Declarations of Structure, Inference and Learning
Neural Information Processing Systems (NIPS), Workshop on Probabilistic Programming, 2008
PDF, ,
Abstract to be added
@inproceedings{mccallum08:factorie:,
Author = {Andrew McCallum and Khashayar Rohanimanesh and Michael Wick and Karl Schultz and Sameer Singh},
Booktitle = {Neural Information Processing Systems (NIPS), Workshop on Probabilistic Programming},
Title = {FACTORIE: Efficient Probabilistic Programming via Imperative Declarations of Structure, Inference and Learning},
Year = {2008}}
- T. Kichkaylo, C. van Buskirk, S. Singh, H. Neema, M. Orosz, R. Neches
Mixed-Initiative Planning for Space Exploration Missions
International Conference on Automated Planning and Scheduling Workshop (ICAPS), 2007
,
Abstract to be added
Bibtex to be added
- S.R. Schach, T.O.S. Adeshiyan, D. Balasubramanian, G. Madl, E.P. Osses, S. Singh, K. Suwanmongkol, M. Xie, D.G. Feitelson
Common Coupling and Pointer Variables, with Application to a Linux Case Study
Software Quality Journal (SQJ), 2007
PDF, ,
Abstract to be added
Bibtex to be added
- D.G. Feitelson, T.O.S. Adeshiyan, D. Balasubramanian, Y. Etsion, G. Madl, E.P. Osses, S. Singh, K. Suwanmongkol, M. Xie, S.R. Schach
Fine-Grain Analysis of Common Coupling and its Application to a Linux Case Study
Journal of Systems and Software (JSS)>, 2007
PDF, ,
Abstract to be added
Bibtex to be added
- S. Singh, J.A. Adams
Transfer of Learning for Complex Domains: A Demonstration Using Multiple Robots
International Conference on Robotics & Automation (ICRA), 2006
PDF, ,
Abstract to be added
Bibtex to be added
- S. Singh
Finding the shortest path for a mobile robot in an unmapped maze from minimum runs
Int Conf on CAD, CAM, Robotics & Autonomous Factories (INCARF), 2003
Non-Refereed
- S. Singh, M. Wick, A. McCallum
Distantly Labeling Data for Large Scale Cross-Document Coreference
Computing Research Repository (CoRR) eprint arXiv:1005.4298, 2010
PDF, ,
Cross-document coreference, the problem of resolving entity mentions across multi-document collections, is crucial to automated knowledge base construction and data mining tasks. However, the scarcity of large labeled data sets has hindered supervised machine learning research for this task. In this paper we develop and demonstrate an approach based on ``distantly-labeling'' a data set from which we can train a discriminative cross-document coreference model. In particular we build a dataset of more than a million people mentions extracted from 3.5 years of New York Times articles, leverage Wikipedia for distant labeling with a generative model (and measure the reliability of such labeling); then we train and evaluate a conditional random field coreference model that has factors on cross-document entities as well as mention-pairs. This coreference model obtains high accuracy in resolving mentions and entities that are not present in the training data, indicating applicability to non-Wikipedia data. Given the large amount of data, our work is also an exercise demonstrating the scalability of our approach.
@article{DBLP:journals/corr/abs-1005-4298,
author = {Sameer Singh and Michael L. Wick and Andrew McCallum},
title = {Distantly Labeling Data for Large Scale Cross-Document Coreference},
journal = {Computing Research Repository (CoRR)},
volume = {abs/1005.4298},
year = {2010},
ee = {http://arxiv.org/abs/1005.4298},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
- S. Singh, Readers: A. Barto, A. McCallum
Option Discovery in Hierarchical Reinforcement Learning for Training Large Factor Graphs for Information Extraction
University of Massachusetts Amherst, PhD Candidacy/Synthesis Report, 2009
PDF, ,
Since exact training and inference is not possible for most factor graphs, a number of techniques have been proposed to train models approximately, but they do not scale to large factor graphs used in recent work on joint inference on multiple information extraction tasks. SampleRank is an MCMC based training algorithm that performs competitively in accuracy and speed on these problems, but is sensitive to local minima in the scoring function. This problem has been averted by framing the problem in the reinforcement learning framework.
Both Reinforcement Learning and SampleRank are restricted in speed (and therefore accuracy) due to the proposal function. Naive proposal functions propose small changes to the configuration that do not contain information to accurately represent the evaluation of the move. If the proposal function is designed to propose larger moves in the configuration space, the number of possible moves increases dramatically, and domain knowledge is required to select useful moves, which may not be possible for most tasks. These problems lead to requiring a very large number of samples before good moves are discovered and high accuracy is obtained.
To avert these problems, MAP inference in factor graphs is reframed as hierarchical reinforcement learning, and a novel method for discovering options fast is introduced. Sample trajectories are analyzed to detect dependencies between primitive actions. These dependencies are exploited to extract the commonly occurring sequences efficiently, and are then abstracted to form closed loop stochastic policies. The method is efficient at discovering useful options, and can handle a large number of long trajectories. These options require fewer samples to reach regions of state space that are close to the goal state. Options discovered are shown to reduce the number of samples required for high accuracy on a real world information extraction dataset.
Bibtex to be added
- K. Rohanimanesh, M. Wick, S. Singh, A. McCallum
Reinforcement Learning for MAP Inference in Large Factor Graphs
University of Massachusetts Amherst, CMPSCI Technical Report, UM-CS-2008-040, 2008
,
Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses has exponential blow-up, solutions are often derived from MCMC. However, because of limitations in the design and parameterization of the jump function, these sampling-based methods suffer from local minima|the system must transition through lower-scoring configurations before arriving at a better MAP solution. This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL) to model delayed reward with a log-linear function approximation of residual future score improvement. Our method provides dramatic empirical success, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48\% reduction in error over state-of-the-art in that domain.
@techreport{rohanimanesh08:map-inference,
Address = {University of Massachusetts},
Author = {Khashayar Rohanimanesh and Michael Wick and Sameer Singh and Andrew McCallum},
Number = {UM-CS-2008-040},
Title = {MAP inference in Large Factor Graphs with Reinforcement Learning},
Year = {2008}}