Andrew McCallum


Selected Publications by Topic (since 2001)


Shortcuts:

  1. Social Network Analysis, Topic Models, Expertise Modeling and Clustering
  2. Coreference, Object Correspondence, Entity Resolution
  3. Efficient Inference and Learning in Graphical Models
  4. Joint Inference for NLP
  5. Information Extraction
  6. Semi-supervised Learning, Active Learning, Interactive Learning
  7. Bioinformatics
  8. Computer Vision, Networking, etc.
  9. Text Classification

Social Network Analysis, Topic Models, Expertise Modeling, and Clustering

  • Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email. Andrew McCallum, Xuerui Wang and Andres Corrada-Emmanuel. Journal of Artificial Intelligence Research (JAIR), 2007. (Journal paper version of IJCAI conference paper on Author-Recipient-Topic (ART) model.)
  • Mixtures of Hierarchical Topics with Pachinko Allocation. David Mimno, Wei Li and Andrew McCallum. ICML, 2007. (From a large document collection, automatically discover topic hierarchies in which documents may be flexibly represented as mixtures across multiple leaves, not just mixtures up and down a single leaf-root path. Thus, for example, we can represent a document about instructing a robot in natural language, where robotics and natural language are two separate leaves. This new model, hPAM, combines the best of pachinko allocation (PAM) and hierarchical LDA (hLDA). Dramatic improvements in held-out data likelihood and in mutual information between discovered topics and human-assigned categories.)
  • Nonparametric Bayes Pachinko Allocation. Wei Li, David Blei and Andrew McCallum. UAI, 2007. (A version of pachinko allocation that automatically determines the number of topics (and super-topics) and its sparse connectivity structure using Dirichlet process priors. Positive results in rediscovering known structure in synthetic data, and in held-out likelihood versus PAM, hLDA and HDP.)
  • Expertise Modeling for Matching Papers with Reviewers. David Mimno and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (The Author-Persona-Topic model is an LDA-style topic model especially designed to represent expertise as a mixture of topical intersections. We show positive results in matching reviewers to conference papers, as assessed by human judgements.)
  • Transfer Learning for Enhancing Information Flow in Organizations and Social Networks. Chris Pal, Xuerui Wang and Andrew McCallum. Submitted to Conference on Email and Spam (CEAS), 2007. Technical Note. (A continuous hidden-variable conditional random field for CC prediction/suggestion in email.)
  • Generalized Component Analysis for Text with Heterogeneous Attributes. Xuerui Wang, Chris Pal and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (A topic model based on an undirected graphical model, which makes it easier to incorporate multiple modalities.)
  • Joint Group and Topic Discovery from Relations and Text. Andrew McCallum, Xuerui Wang and Natasha Mohanty, Statistical Network Analysis: Models, Issues and New Directions, Lecture Notes in Computer Science 4503, pp. 28-44, (Book chapter), 2007. (Book chapter version of NIPS 2006 conference paper. Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)
  • Topical N-grams: Phrase and Topic Discovery, with an Application to Information Retrieval. Xuerui Wang, Andrew McCallum and Xing Wei, Proceedings of the 7th IEEE International Conference on Data Mining (ICDM), 2007. (A topic model in the LDA style that uses a Markov model to automatically discover topically-relevant arbitrary-length phrases, not just lists of single words. The phrase discovery is not simply a post-processing step, but an intrinsic part of the model that helps it discover better topics. Experiments on document retrieval tasks.)
  • Organizing the OCA: Learning faceted subjects from a library of digital books. David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL), 2007. (Introduces the DCM-LDA topic model, which represents topics by a Dirichlet-compound-multinomial rather than a multinomial. In addition to obtaining interesting information about the different variances of the topics, this model lends itself to efficient parallelization with very coarse-grained synchronization. The result is a topic model that can run on over 1 billion words in just a few hours.)
  • Mining a digital library for influential authors. David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL), 2007. (A probabilistic model that ranks authors based on their influence on particular areas of scientific research. Integrates topics with citation patterns.)
  • On Discriminative and Semi-Supervised Dimensionality Reduction. Chris Pal, Michael Kelm, Xuerui Wang, Greg Druck and Andrew McCallum. Advances in Neural Information Processing Systems, Workshop on Novel Applications of Dimensionality Reduction (NIPS Workshop), 2006. (Using Multi-Conditional Learning, learn to distribute mixture components just where needed to address some discriminative task. See the compelling figure on synthetic overlapping spiral data.)
  • A Continuous-Time Model of Topic Co-occurrence Trends. Xuerui Wang, Wei Li, and Andrew McCallum. AAAI Workshop on Event Detection, 2006. (Capture the time distributions not only of topics, but also of their co-occurrences. For example, notice that while NLP and ML have both been around for a long time, their co-occurrence has been rising only recently. The model is effectively a combination of the Pachinko Allocation Model (PAM) and Topics-Over-Time (TOT).)
  • Combining Generative and Discriminative Methods for Pixel Classification with Multi-Conditional Learning. Michael Kelm, Chris Pal, and Andrew McCallum. International Conference on Pattern Recognition (ICPR), 2006. (Multi-conditional learning explored in the context of computer vision.)
  • Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification. Andrew McCallum, Chris Pal, Greg Druck, Xuerui Wang. AAAI, 2006. (Estimate parameters of an undirected graphical model not by joint likelihood, or conditional likelihood, but by a product of multiple conditional likelihoods. Can act as an improved regularizer. With latent variables, can cluster structured, relational data, like Latent Dirichlet Allocation and its successors, but with undirected graphical models and (cross-cutting) conditional training. Improved results on document classification, on Jebara-inspired synthetic data, and over the Harmonium as tested on an information retrieval task.)
  • Pachinko Allocation: DAG-structured Mixture Models of Topic Correlations. Wei Li, and Andrew McCallum. ICML, 2006. (An LDA-style topic model that captures correlations between topics, enabling discovery of finer-grained topics. Similar motivations to Blei and Lafferty's Correlated Topic Model (CTM), but uses a DAG to capture arbitrary, nested and possibly sparse correlations among topics. Interior nodes of the DAG have a Dirichlet distribution over their children; words are in the leaves. Provides improved interpretability and held-out data likelihood.)
  • Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends. Xuerui Wang and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD) 2006. (A new LDA-style topic model that models trends over time. The meaning of a topic remains fixed and reliable, but its prevalence over time is captured, and topics may thus focus in on co-occurrence patterns that are time-sensitive. Unlike other work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps. Improvements in topic saliency and the ability to predict time given words.)
  • Bibliometric Impact Measures Leveraging Topic Analysis. Gideon Mann, David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL) 2006. (Use a new topic model that leverages n-grams to discover interpretable, fine-grained topics in over a million research papers. Use these topic divisions as well as automated citation analysis to extend three existing bibliometric impact measures, and create three new ones: Topical Diversity, Topical Transfer, Topical Precedence.)
  • Group and Topic Discovery from Relations and Their Attributes. Xuerui Wang, Natasha Mohanty and Andrew McCallum. Neural Information Processing Systems (NIPS), 2006. (Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)
  • A Note on Topical N-grams. Xuerui Wang and Andrew McCallum. University of Massachusetts Technical Report UM-CS-2005-071, 2005. (Discover topics like Latent Dirichlet Allocation, but model phrases in addition to single words on a per-topic basis. For example, in the Politics topic, "white house" has special meaning as a collocation, while in the RealEstate topic, modeling the individual words is sufficient. Our TNG model produces much cleaner, more interpretable topics. A collapsed Gibbs sampling sketch of the shared LDA machinery appears at the end of this list.)
  • Predictive Random Fields: Latent Variable Models Fit by Multiway Conditional Probability with Applications to Document Analysis. Andrew McCallum, Xuerui Wang and Chris Pal. UMass Technical Report UM-CS-2005-053, version 2.1. 2005. (Cluster structured, relational data, like Latent Dirichlet Allocation and its successors, but with undirected graphical models that are conditionally-trained. Improved results on Jebara-inspired synthetic data, and over the Harmonium as tested on an information retrieval task. This is an evolving tech report, which needs to be updated---in particular, we are now referring to this method as "Multi-Conditional Learning" or "Multi-Conditional Mixtures".)
  • Learning Clusterwise Similarity with First-order Features. Aron Culotta and Andrew McCallum. NIPS Workshop on the Theoretical Foundations of Clustering. 2005. (Discriminatively-trained graph-partitioning methods for clustering, with features over entire clusters, including existential and universal quantifiers. Efficiently instantiate these features only on demand.)
  • Group and Topic Discovery from Relations and Text. Xuerui Wang, Natasha Mohanty and Andrew McCallum. KDD Workshop on Link Discovery: Issues, Approaches and Applications (LinkKDD) 2005. (Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)
  • Topic and Role Discovery in Social Networks. Andrew McCallum, Andres Corrada-Emmanuel and Xuerui Wang. IJCAI, 2005. (Conference paper version of tech report by same authors in 2004 below. Also includes new results with Role-Author-Recipient-Topic model. Discover roles by social network analysis with a Bayesian network that models both links and text messages exchanged on those links. Experiments with Enron email and academic email.)
  • The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email. Andrew McCallum, Andres Corrada-Emmanuel, Xuerui Wang. Technical Report UM-CS-2004-096, 2004. (Also presented at the NIPS'04 Workshop on "Structured Data and Representations in Probabilistic Models for Categorization.") (Social network analysis that not only models links between people, but also the word content of the messages exchanged between them. Discovers salient topics guided by the sender-recipient structure in the data, and provides improved ability to measure role-similarity between people. A generative model in the style of Latent Dirichlet Allocation.)
  • Disambiguating Web Appearances of People in a Social Network. Ron Bekkerman and Andrew McCallum. WWW Conference, 2005. (Find homepages and other Web pages mentioning particular people. Do a better job by leveraging a collection of related people.)
  • Multi-Way Distributional Clustering via Pairwise Interactions. Ron Bekkerman, Ran El-Yaniv and Andrew McCallum. ICML 2005. (Distributional clustering in multiple feature dimensions or modalities at once--made efficient by a factored representation as used in graphical models, and by a combination of top-down and bottom-up clustering. Results on email clustering, and new best results on 20 Newsgroups.)
  • Extracting Social Networks and Contact Information from Email and the Web. Aron Culotta, Ron Bekkerman and Andrew McCallum. Conference on Email and Spam (CEAS) 2004. (Describes an early version of an end-to-end system that automatically populates your email address book with a large social network, including "friends-of-friends," and information about people's expertise.)
  • An Exploration of Entity Models, Collective Classification and Relation Description. Hema Raghavan, James Allan and Andrew McCallum. KDD Workshop on Link Analysis and Group Detection, August 2004. (Part of a student synthesis project: includes an application of RMNs to classifying people in newswire.)
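
Many of the models in this section extend Latent Dirichlet Allocation, and most are trained with collapsed Gibbs sampling. As a point of reference, below is a minimal sketch of a collapsed Gibbs sampler for plain LDA; the variable names and hyperparameter defaults are illustrative, not taken from any of the papers above.

    import numpy as np

    def lda_gibbs(docs, V, K, iters=200, alpha=0.1, beta=0.01, seed=0):
        """Collapsed Gibbs sampling for plain LDA.
        docs: list of lists of word ids in [0, V); K: number of topics."""
        rng = np.random.default_rng(seed)
        ndk = np.zeros((len(docs), K))   # document-topic counts
        nkw = np.zeros((K, V))           # topic-word counts
        nk = np.zeros(K)                 # total words assigned to each topic
        z = [rng.integers(K, size=len(doc)) for doc in docs]
        for d, doc in enumerate(docs):   # tally the random initialization
            for i, w in enumerate(doc):
                ndk[d, z[d][i]] += 1; nkw[z[d][i], w] += 1; nk[z[d][i]] += 1
        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]          # remove the current assignment
                    ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                    # p(z_i = k | z_-i, w), up to a constant
                    p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                    k = rng.choice(K, p=p / p.sum())
                    z[d][i] = k          # record the new assignment
                    ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        return ndk, nkw                  # posterior topic-count statistics

Each model above (ART, PAM, TOT, TNG, and so on) changes the generative story, and hence the sampling distribution, but the decrement-counts / resample / increment-counts loop is the common core.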

Coreference, Object Correspondence, Entity Resolution

  • Resource-bounded Information Gathering for Correlation Clustering. Pallika Kanani and Andrew McCallum. Conference on Computational Learning Theory (COLT) Open Problems Track, 2007. (We present a new class of problems in which the goal is to perform correlation clustering under circumstances in which accuracy can be improved by augmenting the given graph with additional information.)
  • First-Order Probabilistic Models for Coreference Resolution. Aron Culotta, Michael Wick, Robert Hall and Andrew McCallum. NAACL/HLT, 2007. (Traditional coreference uses features only over pairs of mentions. Here we present a conditional random field with first-order logic for expressing features, enabling features over sets of mentions. The result is a new state of the art on ACE 2004 coreference, jumping from 69 to 79---a 45% reduction in error. The advance depends crucially on a new method of parameter estimation for such "weighted logic" models, based on learning rankings and error-driven training.)
  • Probabilistic Representations for Integrating Unreliable Data Sources. David Mimno and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Probabilistic representation of field values used in merging and augmenting information from DBLP and research paper PDFs.)
  • Efficient Strategies for Improving Partitioning-Based Author Coreference by Incorporating Web Pages as Graph Nodes. Pallika Kanani and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Improve entity resolution by adding web pages as new "mentions" to the graph-partitioning problem, and do so efficiently by selecting a subset of the possible queries and a subset of the returned pages.)
  • Canonicalization of Database Records using Adaptive Similarity Measures. Aron Culotta, Michael Wick, Robert Hall, Matthew Marzilli and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (Defines and explores the problem of "canonicalization"---selecting the best field values for a single, standard record formed from a set of consolidated, co-resolved information sources, such as arise from merging databases or combining multiple sources of information extraction.)
  • Improving Author Coreference by Resource-bounded Information Gathering from the Web. Pallika Kanani, Andrew McCallum and Chris Pal. International Joint Conference on Artificial Intelligence (IJCAI), 2007. (Sometimes there is simply insufficient information to make an accurate entity resolution decision, and we must gather additional evidence. This paper describes the use of web queries to improve research paper author coreference, exploring two methods of augmenting a graph partitioning problem: using the web to obtain new features on existing edges, and using the web to obtain new nodes in the graph. We then go on to describe decision-theoretic approaches for maximizing accuracy gain with a limited budget of web queries, and demonstrate our methods on three large data sets.)
  • Tractable Learning and Inference with Higher-Order Representations. Aron Culotta and Andrew McCallum. ICML Workshop on Open Problems in Statistical Relational Learning, 2006. (When working with CRFs having features based on first-order logic, the "unrolled" graphical model would be far too large to fully instantiate. This paper describes a method leveraging MCMC to perform inference and learning while only partially instantiating the model. Positive results on entity resolution (of research paper authors) are described.)
  • Practical Markov Logic Containing First-order Quantifiers with Application to Identity Uncertainty. Aron Culotta and Andrew McCallum. Technical Report IR-430, University of Massachusetts, September 2005. (Use existential and universal quantifiers in Markov Logic, doing so practically and efficiently by incrementally instantiating these terms as needed. Applied to object correspondence, this model combines the expressivity of BLOG with the predictive accuracy advantages of conditional probability training. Experiments on citation matching and author disambiguation.)
  • Joint Deduplication of Multiple Record Types in Relational Data. Aron Culotta and Andrew McCallum. Fourteenth Conference on Information and Knowledge Management (CIKM), 2005.
    (Longer tech report version: A Conditional Model of Deduplication for Multi-type Relational Data. Technical Report IR-443, University of Massachusetts, September 2005.) (Leverage relations among multiple entity types to perform coreference collectively among all types. Uses CRF-style graph partitioning with a learned distance metric. Experimental results on joint coreference of both citations and their venues, showing that accuracy on both improves.)
  • A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance. Andrew McCallum, Kedar Bellare and Fernando Pereira. Conference on Uncertainty in AI (UAI), 2005. (Train a string edit distance function from both positive and negative examples of string pairs (matching and mismatching). Significantly, the model designer is free to use arbitrary, fancy features of both strings, and also very flexible edit operations. This model is an example of an increasingly popular and interesting class---conditionally-trained models with latent variables. Positive results on citations, addresses and names.)
  • Disambiguating Web Appearances of People in a Social Network. Ron Bekkerman and Andrew McCallum. WWW Conference, 2005. (Find homepages and other Web pages mentioning particular people. Do a better job by leveraging a collection of related people.)
  • Conditional Models of Identity Uncertainty with Application to Noun Coreference. Andrew McCallum and Ben Wellner. Neural Information Processing Systems (NIPS), 2004. (A model of object consolidation, based on graph partitioning with learned edge weights. Conference paper version of 2003 work in KDD Workshop on Data Cleaning.)
  • An Integrated, Conditional Model of Information Extraction and Coreference with Application to Citation Matching. Ben Wellner, Andrew McCallum, Fuchun Peng, Michael Hay. Conference on Uncertainty in Artificial Intelligence (UAI), 2004. (A conditionally-trained graphical model for identity uncertainty in relational domains, representing mentions, entities and their attributes. Also a first example of joint inference for extraction and identity uncertainty--coreference decisions actually integrate out uncertainty about information extraction.)
  • Object Consolidation by Graph Partitioning with a Conditionally-trained Distance Metric. Andrew McCallum and Ben Wellner. KDD Workshop on Data Cleaning, Record Linkage and Object Consolidation, 2003. (Later, improved version of workshop paper immediately below.)
  • Toward Conditional Models of Identity Uncertainty with Application to Proper Noun Coreference. Andrew McCallum and Ben Wellner. IJCAI Workshop on Information Integration on the Web, 2003. (A conditionally-trained model of object consolidation, based on graph partitioning with learned edge weights. A greedy sketch of this partitioning approach appears at the end of this list.)
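
Several of the papers in this section cast entity resolution as graph partitioning over learned pairwise scores. The sketch below shows the greedy agglomerative core of that idea, assuming a score function such as the log-odds of a trained pairwise classifier; the names are illustrative, and the published systems use more sophisticated partitioning methods and first-order features over whole clusters.

    def greedy_partition(mentions, score):
        """Greedy agglomerative clustering for coreference.
        score(a, b) > 0 suggests a and b are coreferent, < 0 distinct."""
        clusters = [[m] for m in mentions]
        while True:
            best, pair = 0.0, None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    # total affinity between the two candidate clusters
                    s = sum(score(a, b) for a in clusters[i] for b in clusters[j])
                    if s > best:
                        best, pair = s, (i, j)
            if pair is None:              # no merge improves the objective
                return clusters
            i, j = pair
            clusters[i].extend(clusters.pop(j))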

Efficient Inference and Learning in Graphical Models

  • Efficient Computation of Entropy Gradient for Semi-Supervised Conditional Random Fields. Gideon Mann and Andrew McCallum. NAACL/HLT, (short paper) 2007. (A new, faster dynamic program for calculating the entropy of a finite-state subsequence and its gradient.)
  • Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields. Charles Sutton and Andrew McCallum. ICML, 2007. (Train a large CRF five times faster by dividing it into separate pieces and reducing the number of predicted variable combinations with pseudolikelihood. Analysis in terms of belief propagation and Bethe energy.)
  • Sparse Message Passing Algorithms for Weighted Maximum Satisfiability. Aron Culotta, Andrew McCallum, Bart Selman, Ashish Sabharwal. New England Student Symposium on Artificial Intelligence (NESCAI), 2007. (A new algorithm for solving weighted maximum satisfiability (WMAX-SAT) problems that divides a large problem into sub-problems, and coordinates the global solution by message passing with sparse messages. Inspired by the desire to do joint inference in (a) large weighted logics a la Markov Logic Networks, and (b) large NLP pipelines, in which there are efficient pre-existing (dynamic programming) solutions to sub-parts of the pipeline. Positive results versus WalkSAT!)
  • Improved Dynamic Schedules for Belief Propagation. Charles Sutton and Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2007. (Significantly faster inference in graphical models by selecting which BP messages to send based on an approximation to their residual. A priority-queue sketch of residual scheduling appears at the end of this list.)
  • Tractable Learning and Inference with Higher-Order Representations. Aron Culotta and Andrew McCallum. ICML Workshop on Open Problems in Statistical Relational Learning, 2006. (When working with CRFs having features based on first-order logic, the "unrolled" graphical model would be far too large to fully instantiate. This paper describes a method leveraging MCMC to perform inference and learning while only partially instantiating the model. Positive results on entity resolution (of research paper authors) are described.)
  • Practical Markov Logic Containing First-order Quantifiers with Application to Identity Uncertainty. Aron Culotta, Andrew McCallum. HLT Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, 2006. (Markov Logic Networks are Conditional Random Fields that use first-order logic to define features and parameter-tying patterns. Making such models scale to non-trivial data set sizes is a challenge because the size of the full instantiation of the model is exponential in the arity of the formulae. Here we describe a method of partial instantiation that allows such models to scale to entity resolution problems with millions of entity mentions. On both citation and author entity resolution problems, we show that including such first-order features provides increases in accuracy.)
  • Sparse Forward-Backward using Minimum Divergence Beams for Fast Training of Conditional Random Fields. Chris Pal, Charles Sutton, and Andrew McCallum. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2006. (An alternative method for beam-search based on variational principles. Enables not only faster test-time performance of large-state-space CRFs, but this method makes beam search robust enough to be used at training time, enabling dramatically faster learning of discriminative finite-state methods for speech, IE and other applications.)
  • Fast, Piecewise Training for Discriminative Finite-state and Parsing Models. Charles Sutton and Andrew McCallum. Center for Intelligent Information Retrieval Technical Report IR-403. 2005. (Further results with "piecewise training", a method also described in a UAI'05 paper below.)
  • Piecewise Training for Undirected Models. Charles Sutton and Andrew McCallum. UAI, 2005. (Efficiently train a large graphical model in separately normalized pieces, and amazingly often obtain higher accuracy than without this approximation. This paper also shows that this piecewise objective is a lower bound on the exact likelihood, and gives results with three different graphical model structures.)
  • Constrained Kronecker Deltas for Fast Approximate Inference and Estimation. Chris Pal, Charles Sutton, Andrew McCallum. Submitted to UAI, 2005. (Sometimes the graph of the graphical model is not large and complex, but the cardinality of the variables is large. This paper describes a new and generalized method for beam search on graphical models, showing positive experimental results for both inference and training. Experiments on NetTalk.)
  • Piecewise Training with Parameter Independence Diagrams: Comparing Globally- and Locally-trained Linear-chain CRFs. Andrew McCallum and Charles Sutton. Center for Intelligent Information Retrieval, University of Massachusetts Technical Report IR-383. 2004. (Also presented at NIPS 2004 Workshop on Learning with Structured Outputs.) (Large undirected graphical models are expensive to train because they require global inference to calculate the gradient of the parameters. We describe a new method for fast training in locally-normalized pieces. Amazingly, the resulting models also give higher accuracy than their globally-trained counterparts.)
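
For concreteness, the dynamic-schedule idea in "Improved Dynamic Schedules for Belief Propagation" above can be sketched as a priority queue over messages, ordered by how much each message would change if sent. The sketch below uses exact residuals on a pairwise MRF; the paper's contribution is a cheaper approximation to the residual, and all names here are illustrative rather than the authors' code.

    import heapq
    import numpy as np

    def residual_bp(unary, pair, edges, max_updates=10000, tol=1e-6):
        """Sum-product BP with residual scheduling on a pairwise MRF.
        unary[v]: np.array of node potentials; pair[(u, v)]: potential
        matrix over (x_u, x_v); edges: list of (u, v) node-id pairs."""
        msgs = {}
        for u, v in edges:                # uniform initial messages
            msgs[(u, v)] = np.ones(len(unary[v])) / len(unary[v])
            msgs[(v, u)] = np.ones(len(unary[u])) / len(unary[u])
        def new_msg(u, v):                # recompute the message u -> v
            belief = unary[u].copy()
            for (a, b), m in msgs.items():
                if b == u and a != v:
                    belief = belief * m
            pot = pair[(u, v)] if (u, v) in pair else pair[(v, u)].T
            m = belief @ pot
            return m / m.sum()
        heap = []
        for key in msgs:                  # residual: change if message is sent
            r = np.abs(new_msg(*key) - msgs[key]).max()
            heapq.heappush(heap, (-r, key))
        for _ in range(max_updates):
            if not heap:
                break
            neg_r, (u, v) = heapq.heappop(heap)
            if -neg_r < tol:              # largest residual is tiny: converged
                break
            msgs[(u, v)] = new_msg(u, v)
            for key in msgs:              # only messages out of v are affected
                if key[0] == v and key[1] != u:
                    r = np.abs(new_msg(*key) - msgs[key]).max()
                    heapq.heappush(heap, (-r, key))
        return msgs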


Joint Inference for NLP

Information Extraction

  • Penn/UMass/CHOP BiocreativeII Systems. Kuzman Ganchev, Koby Crammer, Fernando Pereira, Gideon Mann, Kedar Bellare, Andrew McCallum, Steven Carroll, Yang Jin, and Peter White. BiocreativeII Evaluation Workshop. 2007. (Description of our high-ranking entry in the competition for extraction and linkage from bioinformatics text.)
  • Dynamic Conditional Random Fields. Charles Sutton, Andrew McCallum and Khashayar Rohanimanesh. Journal of Machine Learning Research (JMLR), Vol. 8(Mar), pages 693-723, 2007. (Journal paper version of ICML paper by the same authors, with new experiments on marginal likelihood training.)
  • Learning Field Compatibilities to Extract Database Records from Unstructured Text. Aron Culotta, Michael Wick, and Andrew McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2006. (Record extraction, jointly accounting for multi-field compatibility by content and layout features.)
  • Reducing Weight Undertraining in Structured Discriminative Learning. Charles Sutton, Michael Sindelar, and Andrew McCallum. HLT-NAACL, 2006. (Separately train CRFs with different subsets of the features, then integrate them at test time---four different variations on the method. Especially makes more reliable use of lexicon features and other highly-predictive but brittle features.)
  • An Introduction to Conditional Random Fields for Relational Learning. Charles Sutton and Andrew McCallum. In Introduction to Statistical Relational Learning. Edited by Lise Getoor and Ben Taskar. MIT Press. 2006. (An overview and introduction to conditional random fields for beginners and experts alike---motivation, background, mathematical foundations, linear-chain form, general-structure form, inference, parameter estimation, tips and tricks, an example application to information extraction with a skip-chain structure.)
  • Table extraction for answer retrieval. Xing Wei, Bruce Croft and Andrew McCallum. Information Retrieval Journal, volume 9, issue 5, pages 589-611, November 2006. (Information extraction from tables, using conditional random fields with language and layout features, with application to question answering. Journal paper version of our SIGIR 2003 paper.)
  • Information Extraction: Distilling Structured Data from Unstructured Text. Andrew McCallum. ACM Queue, volume 3, number 9, November 2005. (An overview of information extraction by machine learning methods, written for people not familiar with machine learning, especially CTOs and other people in business.)
  • Feature Bagging: Preventing Weight Undertraining in Structured Discriminative Learning. Charles Sutton, Michael Sindelar, and Andrew McCallum. Center for Intelligent Information Retrieval, University of Massachusetts Technical Report IR-402. 2005. (Avoid a common under-appreciated problem: overly heavy reliance on a few discriminative features which may not be as reliably present in the testing data. Discusses four methods of separate training and combination, and presents statistically-significant improvements---including new best results on CoNLL-2000 NP Chunking.)
  • Composition of Conditional Random Fields for Transfer Learning. Charles Sutton and Andrew McCallum. Proceedings of Human Language Technologies / Empirical Methods in Natural Language Processing (HLT/EMNLP) 2005. (Improve information extraction from email data by using the output of another extractor that was trained on large quantities of newswire. Improve accuracy further by using joint inference between the two tasks---so that the final target task can actually affect the output of the intermediate task.)
  • Reducing Labeling Effort for Structured Prediction Tasks. Aron Culotta and Andrew McCallum. AAAI, 2005. (A step toward bringing trainable information extraction to the masses! Make it easier for end-users to train IE by providing multiple-choice labeling options, and propagating any constraints their labels provide on portions of the record-labeling task.)
  • Extracting Social Networks and Contact Information from Email and the Web. Aron Culotta, Ron Bekkerman and Andrew McCallum. Conference on Email and Spam (CEAS) 2004. (Describes an early version of an end-to-end system that automatically populates your email address book with a large social network, including "friends-of-friends," and information about people's expertise.)
  • Accurate Information Extraction from Research Papers using Conditional Random Fields. Fuchun Peng and Andrew McCallum. Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2004. (Applies CRFs to extraction from research paper headers and reference sections, to obtain current best-in-the-world accuracy. Also compares some simple regularization methods.)
  • Chinese Segmentation and New Word Detection using Conditional Random Fields. Fuchun Peng, Fangfang Feng, and Andrew McCallum. Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), August 23-27, 2004, Geneva, Switzerland. (State-of-the-art Chinese word segmentation with CRFs, with rich features and many lexicons; also using confidence estimation to add new words to the lexicon.)
  • Confidence Estimation for Information Extraction. Aron Culotta and Andrew McCallum. Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2004, short paper. (How to provide not only an answer, but a formally-justified confidence in that answer--using constrained forward-backward.)
  • Rapid Development of Hindi Named Entity Recognition Using Conditional Random Fields and Feature Induction. Wei Li and Andrew McCallum. ACM Transactions on Asian Language Information Processing, 2003. (How we developed a named entity recognition system for Hindi in just a few weeks.)
  • Efficiently Inducing Features of Conditional Random Fields. Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2003. (CRFs give you the power to include a kitchen sink's worth of features. How do you decide which ones to include to avoid over-fitting and running out of memory? A formal, information-theoretic approach, with carefully-chosen approximations to make it efficient with millions of candidate features. This technique was key to the success on Hindi above, as well as to work by Pereira's group at UPenn.)
  • Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. Andrew McCallum and Wei Li. Seventh Conference on Natural Language Learning (CoNLL), 2003. (This is the first publication about named entity extraction with CRFs.)
  • Table Extraction Using Conditional Random Fields. David Pinto, Andrew McCallum, Xing Wei and W. Bruce Croft. Proceedings of the ACM SIGIR, 2003. (Application of CRFs to finding tables in government reports. Uses both language and layout features.)
  • Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. John Lafferty, Andrew McCallum and Fernando Pereira. ICML-2001. (A conditionally-trained model for sequences and other structured data, with global normalization. The original CRF paper. Don't bother reading the section on parameter estimation---use BFGS instead of Iterative Scaling; e.g. see [McCallum UAI 2003]. A log-space forward-pass sketch appears at the end of this list.)
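
Nearly all of the extraction systems above rest on the forward-backward algorithm for linear-chain CRFs; constrained forward-backward, used for confidence estimation above, runs the same recursion with some labels clamped. Below is a minimal log-space forward pass computing the normalizer log Z. It assumes precomputed per-position score matrices and is an illustrative sketch, not code from any of these systems.

    import numpy as np

    def crf_log_forward(log_phi):
        """Forward pass for a linear-chain CRF in log space.
        log_phi[0]: shape (K,) scores for the first label;
        log_phi[t], t >= 1: shape (K, K) transition-plus-emission scores.
        Returns log Z, the normalizer of p(y | x)."""
        alpha = log_phi[0]
        for t in range(1, len(log_phi)):
            # alpha_t(y) = logsumexp_{y'} [alpha_{t-1}(y') + log_phi[t][y', y]]
            scores = alpha[:, None] + log_phi[t]
            m = scores.max(axis=0)                 # stabilize the logsumexp
            alpha = m + np.log(np.exp(scores - m).sum(axis=0))
        m = alpha.max()
        return m + np.log(np.exp(alpha - m).sum())

Running the same recursion backward gives per-position marginals; clamping a label to a user-corrected value, as in constrained Viterbi or constrained forward-backward, amounts to restricting the corresponding score matrices.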

Semi-supervised Learning, Active Learning, Interactive Learning

  • Simple, Robust, Scalable Semi-supervised Learning via Expectation Regularization. Gideon Mann and Andrew McCallum. International Conference on Machine Learning (ICML), 2007. (Semi-supervised learning is seldom used in real applications because it is often complicated to implement, fragile in tuning or inefficient for large data. We introduce a new highly usable approach to semi-supervised learning, augmenting traditional label log-likelihood with an additional term that encourages model predictions on unlabeled data to match certain expectations. Positive results on 5 data sets versus EM, transductive SVM, entropy regularization and a graph-based method.)
  • Learning Extractors from Unlabeled Text using Relevant Databases. Kedar Bellare and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Use conditional random fields to learn information extractors both from DB fields and from alignments of DB records in free text. Uses an Alignment CRF, similar to our UAI 2005 paper.)
  • Semi-Supervised Classification with Hybrid Generative/Discriminative Methods. Greg Druck, Chris Pal, Xiaojin Zhu and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (Leverage unlabeled data for text classification by using an objective function that combines (1) the joint probability of labels and words and (2) the conditional probability of labels given words.)
  • Corrective Feedback and Persistent Learning for Information Extraction. Aron Culotta, Trausti Kristjansson, Andrew McCallum, Paul Viola. Artificial Intelligence Journal, volume 170, pages 1101-1122, 2006. (Help a user interactively correct the results of extraction by providing uncertainty cues in the UI, and by using constrained Viterbi to automatically make additional corrections after the first human correction. Journal paper version of the AAAI paper by the same authors below. Adds experiments with active learning.)
  • Semi-supervised Text Classification Using EM. Kamal Nigam, Andrew McCallum and Tom Mitchell. In Chapelle, O., Zien, A., and Scholkopf, B. (Eds.) Semi-Supervised Learning. MIT Press: Boston. 2006. (Overview, description, and experiments on using expectation maximization with naive Bayes text classifiers for learning from labeled and unlabeled data. A chapter in a book about various methods of semi-supervised learning. An EM sketch for this approach appears at the end of this list.)
  • Semi-Supervised Sequence Modeling with Syntactic Topic Models. Wei Li and Andrew McCallum. AAAI, 2005. (Learn a low-dimensional manifold from large quantities of unlabeled text data, then use components of the manifold as additional features when training a linear-chain CRF with limited labeled data. The manifold is learned using HMM-LDA [Griffiths, Steyvers, Blei, Tenenbaum 2004], an unsupervised model with special structure suitable for sequences and topics. Experiments with English part-of-speech tagging and Chinese word segmentation.)
  • Reducing Labeling Effort for Structured Prediction Tasks. Aron Culotta and Andrew McCallum. AAAI, 2005. (A step toward bringing trainable information extraction to the masses! Make it easier for end-users to train IE by providing multiple-choice labeling options, and propagating any constraints their labels provide on portions of the record-labeling task.)
  • Interactive Information Extraction with Constrained Conditional Random Fields. Trausti Kristjansson, Aron Culotta, Paul Viola and Andrew McCallum. Nineteenth National Conference on Artificial Intelligence (AAAI 2004). San Jose, CA. (Winner of Honorable Mention Award.) (Help a user interactively correct the results of extraction by providing uncertainty cues in the UI, and by using constrained Viterbi to automatically make additional corrections after the first human correction.)
  • A Note on Semi-supervised Learning using Markov Random Fields. Wei Li and Andrew McCallum. Technical Note, February 3, 2004. (A general framework for semi-supervised learning in Conditional Random Fields, with a focus on learning the distance metric between instances. Experimental results with collective classification of documents.)
  • Learning with Scope, with Application to Information Extraction and Classification. David Blei, Drew Bagnell and Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2002. (Learn highly reliable formatting-based extractors on the fly at test time, using graphical models and variational inference. Describes both generative and conditional versions of the model.)
  • Toward Optimal Active Learning through Sampling Estimation of Error Reduction. Nick Roy and Andrew McCallum. ICML-2001. (A leave-one-out approach to active learning.)
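
As a concrete reference for the EM approach in the Nigam, McCallum and Mitchell chapter above, here is a minimal sketch of semi-supervised multinomial naive Bayes trained with EM. The variable names and smoothing constant are illustrative, and the chapter's down-weighting of unlabeled data is omitted.

    import numpy as np

    def nb_em(Xl, yl, Xu, C, iters=10, alpha=1.0):
        """EM for semi-supervised multinomial naive Bayes.
        Xl: labeled doc-word count matrix (n_l, V); yl: labels in [0, C);
        Xu: unlabeled doc-word count matrix (n_u, V)."""
        Rl = np.eye(C)[yl]                   # labeled responsibilities (fixed)
        Ru = np.full((Xu.shape[0], C), 1.0 / C)
        for _ in range(iters):
            # M-step: class priors and word distributions, with smoothing
            prior = Rl.sum(0) + Ru.sum(0) + alpha
            prior = prior / prior.sum()
            counts = Rl.T @ Xl + Ru.T @ Xu + alpha        # (C, V)
            log_theta = np.log(counts / counts.sum(1, keepdims=True))
            # E-step: posterior class responsibilities for unlabeled docs
            log_post = np.log(prior) + Xu @ log_theta.T
            log_post -= log_post.max(1, keepdims=True)    # for stability
            Ru = np.exp(log_post)
            Ru /= Ru.sum(1, keepdims=True)
        return prior, log_theta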

Bioinformatics

  • Gene Prediction with Conditional Random Fields. Aron Culotta, David Kulp, and Andrew McCallum. Technical Report UM-CS-2005-028, University of Massachusetts, Amherst, April 2005. (Use finite-state CRFs to locate introns and exons in DNA sequences. Shows the advantages of CRFs' ability to straightforwardly incorporate homology evidence from protein databases.)

Computer Vision, Networking, etc.

Text Classification