
PhD Student
Office 264
Dept of Computer Science
140 Governors Drive
University of Massachusetts
Amherst, MA (USA) 01003
Phone: (562) 726 4373
Email: sameer |{AT}| cs.umass.edu
Consider attending the Big Learning Workshop I'm co-organizing at NIPS 2011
About
I'm a PhD student in Computer Science at UMass Amherst.
I work with Andrew McCallum, as part of the Information Extraction and Synthesis Lab (IESL), on learning and inference techniques for large factor graphs. I've also been working on FACTORIE and have worked on some of the interesting machine learning problems within Rexa.
I received the Yahoo! Key Scientific Challenges Award for 2010-2011 (yahoo link, umass story). For 2009-2010, the Computer Science Department granted me the Department Award for Accomplishments in Search and Mining (sponsored by Yahoo!). The university also awarded me the Graduate School Fellowship for 2010-2011.
I co-chaired the fourth North-East Students Colloquium on Artificial Intelligence (NESCAI) 2010 with David Mimno. The conference was held at UMass on April 16-18, 2010.
I interned at Google Research in Mountain View, CA, last summer, where I worked on inference for large graphical models, with cross-document coreference as the task. In summer 2009, I worked with the Advertising Sciences team in Yahoo! Labs on extracting entities from ads using minimal supervision. Before I started my PhD, I interned for two semesters at Google Pittsburgh, where I got a chance to apply machine learning to some of the biggest data sets available.
Before starting my PhD, I completed my MS in Computer Science at Vanderbilt in May 2007, where I worked with Doug Fisher. I grew up in New Delhi, India, got my Bachelor's in Electrical Engineering from NSIT in 2004, and graduated from Sardar Patel Vidyalaya in 2000.
Research Interests
Recent Publications
Book Chapters
- J. Kubica, S. Singh, D. Sorokina
Parallel Large-scale Feature Selection
Scaling Up Machine Learning, Cambridge University Press, 2011
Website
Refereed
- S. Singh, A. McCallum
Towards Asynchronous Distributed MCMC Inference for Large Graphical Models
Neural Information Processing Systems (NIPS), Big Learning Workshop on Algorithms, Systems, and Tools for Learning at Scale, 2011
- S. Singh, B. Martin, A. McCallum
Inducing Value Sparsity for Parallel Inference in Tree-shaped Models
Neural Information Processing Systems (NIPS), Workshop on Computational Trade-offs in Statistical Learning (COST), 2011
- S. Singh, A. Subramanya, F. Pereira, A. McCallum
Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Association for Computational Linguistics: Human Language Technologies (ACL HLT), 2011
Best Talk Award at Machine Reading Project Phase 3 Kickoff, Seattle, WA
PDF, Slides
Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
@inproceedings{singh11:large-scale,
Author = {Sameer Singh and Amarnag Subramanya and Fernando Pereira and Andrew McCallum},
Booktitle = {Association for Computational Linguistics: Human Language Technologies (ACL HLT)},
Title = {Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models},
Year = {2011}}
- S. Singh, A. Subramanya, F. Pereira, A. McCallum
Distributed MAP Inference for Undirected Graphical Models
Neural Information Processing Systems (NIPS), Workshop on Learning on Cores, Clusters, and Clouds (LCCC), 2010
PDF, Slides, Video
In this work, we distribute the MCMC-based MAP inference using the Map-Reduce framework. The variables are assigned randomly to machines, which leads to some factors that neighbor variables on separate machines. Parallel MCMC-chains are initiated using proposal distributions that only suggest local changes such that factors that lie across machines are not examined. After a fixed number of samples on each machine, we redistribute the variables amongst the machines to enable proposals across variables that were on different machines. To demonstrate the distribution strategy on a real-world information extraction application, we model the task of cross-document coreference.
@inproceedings{singh10:distributed,
Author = {Sameer Singh and Amarnag Subramanya and Fernando Pereira and Andrew McCallum},
Booktitle = {Neural Information Processing Systems (NIPS), Workshop on Learning on Cores, Clusters and Clouds},
Title = {Distributed MAP Inference for Undirected Graphical Models},
Year = {2010}}
- S. Singh, L. Yao, S. Riedel, A. McCallum
Constraint-Driven Rank-Based Learning for Information Extraction
Human Language Technologies: Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT), 2010
PDF, Slides
Most learning algorithms for factor graphs require complete inference over the dataset or an instance before making an update to the parameters. SampleRank is a rank-based learning framework that alleviates this problem by updating the parameters during inference. Most semi-supervised learning algorithms also rely on complete inference, i.e., calculating expectations or MAP configurations. We extend the SampleRank framework to semi-supervised learning, avoiding these inference bottlenecks. Different approaches for incorporating unlabeled data and prior knowledge into this framework are explored. We evaluated our method on a standard information extraction dataset. Our approach outperforms the supervised method significantly and matches the result of the competing semi-supervised learning approach.
@inproceedings{singh10:constraint:,
Author = {Sameer Singh and Limin Yao and Sebastian Riedel and Andrew McCallum},
Booktitle = {Human Language Technologies: Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT)},
Title = {Constraint-Driven Rank-Based Learning for Information Extraction},
month = {June},
year = {2010},
address = {Los Angeles, California},
publisher = {Association for Computational Linguistics},
pages = {729--732}
}
- S. Singh, D. Hillard, C. Leggetter
Minimally-Supervised Extraction of Entities from Text Advertisements
Human Language Technologies: Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT), 2010
PDF, Slides
Extraction of entities from ad creatives is an important problem that can benefit many computational advertising tasks. Supervised and semi-supervised solutions rely on labeled data which is expensive, time-consuming, and difficult to procure for ad creatives. A small set of manually derived constraints on feature expectations over unlabeled data can be used to *partially* and *probabilistically* label large amounts of data. Utilizing recent work in constraint-based semi-supervised learning, this paper injects lightweight supervision specified as these "constraints" into a semi-Markov conditional random field model of entity extraction in ad creatives. Relying solely on the constraints, the model is trained on a set of unlabeled ads using an online learning algorithm. We demonstrate significant accuracy improvements on a manually labeled test set as compared to a baseline dictionary approach. We also achieve accuracy that approaches a fully supervised classifier.
@inproceedings{singh10:minimally:,
Author = {Sameer Singh and Dustin Hillard and Chris Leggetter},
Booktitle = {Human Language Technologies: Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT)},
Title = {Minimally-Supervised Extraction of Entities from Text Advertisements},
Year = {2010}}
- A. McCallum, K. Schultz, S. Singh
FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs
Neural Information Processing Systems Conference (NIPS), 2009
PDF, Outline
Discriminatively trained undirected graphical models have had wide empirical success, and there has been increasing interest in toolkits that ease their application to complex relational data. The power in relational models is in their repeated structure and tied parameters; at issue is how to define these structures in a powerful and flexible way. Rather than using a declarative language, such as SQL or first-order logic, we advocate using an imperative language to express various aspects of model structure, inference, and learning. By combining the traditional, declarative, statistical semantics of factor graphs with imperative definitions of their construction and operation, we allow the user to mix declarative and procedural domain knowledge, and also gain significant efficiencies. We have implemented such "imperatively defined factor graphs" in a system we call Factorie, a software library for an object-oriented, strongly-typed, functional language. In experimental comparisons to Markov Logic Networks on joint segmentation and coreference, we find our approach to be 3-20 times faster while reducing error by 20-25%, achieving a new state of the art.
@inproceedings{mccallum09:factorie:,
Author = {Andrew McCallum and Karl Schultz and Sameer Singh},
Booktitle = {Neural Information Processing Systems (NIPS)},
Title = {FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs},
Year = {2009}}
- M. Wick, K. Rohanimanesh, S. Singh, A. McCallum
Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference
Neural Information Processing Systems Conference (NIPS), 2009
PDF, Outline
Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses has exponential blow-up, solutions are often derived from MCMC. However, because of limitations in the design and parameterization of the jump function, these sampling-based methods suffer from local minima: the system must transition through lower-scoring configurations before arriving at a better MAP solution. This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL). Rather than setting parameters to maximize the likelihood of the training data, parameters of the factor graph are treated as a log-linear function approximator and learned with methods of temporal difference (TD); MAP inference is performed by executing the resulting policy on held out test data. Our method allows efficient gradient updates since only factors in the neighborhood of variables affected by an action need to be computed; we bypass the need to compute marginals entirely. Our method yields dramatic empirical success, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48% reduction in error over state-of-the-art in that domain.
@inproceedings{wick09:training,
Author = {Michael Wick and Khashayar Rohanimanesh and Sameer Singh and Andrew McCallum},
Booktitle = {Neural Information Processing Systems (NIPS)},
Title = {Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference},
Year = {2009}}
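As a rough illustration of the distribution strategy summarized in the abstract of "Distributed MAP Inference for Undirected Graphical Models" above, here is a minimal Python sketch. It is not the paper's implementation. It simulates the machines one after another instead of using Map-Reduce, and the toy pairwise model (a factor contributes its weight when its two variables share a label), the Metropolis acceptance rule, the temperature, and all function names (distributed_map, run_local_chain, local_score_delta, total_score) are assumptions made for illustration only.

```python
import math
import random


def total_score(assignment, factors):
    """Score of a full assignment: sum of weights of factors whose
    two variables currently share a label (toy model, an assumption)."""
    return sum(w for (a, b), w in factors.items()
               if assignment[a] == assignment[b])


def local_score_delta(var, new_val, assignment, neighbors, local_vars):
    """Score change from setting var to new_val, looking only at factors
    whose other endpoint is on the same (simulated) machine."""
    old_val = assignment[var]
    delta = 0.0
    for other, weight in neighbors[var]:
        if other not in local_vars:
            continue  # factor lies across machines: not examined
        before = assignment[other] == old_val
        after = assignment[other] == new_val
        delta += weight * (int(after) - int(before))
    return delta


def run_local_chain(local_vars, assignment, neighbors, labels,
                    num_samples, temperature, rng):
    """Metropolis chain whose proposals only change local variables."""
    local_list = sorted(local_vars)  # sorted for reproducibility
    for _ in range(num_samples):
        var = rng.choice(local_list)
        new_val = rng.choice(labels)
        delta = local_score_delta(var, new_val, assignment,
                                  neighbors, local_vars)
        # Always accept improvements; accept worse moves rarely.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            assignment[var] = new_val


def distributed_map(num_vars, factors, labels, num_machines,
                    rounds, samples_per_round, temperature=0.1, seed=0):
    """Alternate between local sampling on random shards and
    redistribution of the variables among the shards."""
    rng = random.Random(seed)
    neighbors = {v: [] for v in range(num_vars)}
    for (a, b), w in factors.items():
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))
    assignment = {v: rng.choice(labels) for v in range(num_vars)}
    for _ in range(rounds):
        # Redistribute: assign variables to machines at random.
        order = list(range(num_vars))
        rng.shuffle(order)
        shards = [set(order[m::num_machines]) for m in range(num_machines)]
        # Shards are disjoint and each chain touches only its own
        # variables, so this loop could run in parallel.
        for shard in shards:
            if shard:
                run_local_chain(shard, assignment, neighbors, labels,
                                samples_per_round, temperature, rng)
    return assignment


if __name__ == "__main__":
    # Two groups of mentions: positive weights within a group,
    # negative weights across groups (hypothetical toy data).
    groups = [[0, 1, 2], [3, 4, 5]]
    factors = {}
    for a in range(6):
        for b in range(a + 1, 6):
            same = any(a in g and b in g for g in groups)
            factors[(a, b)] = 1.0 if same else -1.0
    result = distributed_map(6, factors, labels=[0, 1], num_machines=2,
                             rounds=20, samples_per_round=50)
    print(result, total_score(result, factors))
```

Reshuffling the shards each round is what lets a factor that was ignored in one round (because its variables sat on different machines) be scored in a later round. A real deployment would also have to choose how entity labels are represented and merged across machines, which this sketch sidesteps by using a small fixed label set.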
