Andrew McCallum

Publications

2008

  • Reinforcement Learning for MAP Inference in Large Factor Graphs. Khashayar Rohanimanesh, Michael Wick, Sameer Singh, and Andrew McCallum. University of Massachusetts Technical Report #UM-CS-2008-040 (TR), 2008.
  • Gibbs Sampling for Logistic Normal Topic Models with Graph-Based Priors. David Mimno, Hanna Wallach and Andrew McCallum. NIPS Workshop on Analyzing Graphs, (NIPS WS), 2008, Whistler, BC.
  • FACTORIE: Efficient Probabilistic Programming for Relational Factor Graphs via Imperative Declarations of Structure, Inference and Learning. Andrew McCallum, Khashayar Rohanimanesh, Michael Wick, Karl Schultz, Sameer Singh. NIPS Workshop on Probabilistic Programming, (NIPS WS), 2008. (Discriminatively trained undirected graphical models, or conditional random fields, have had wide empirical success, and there has been increasing interest in toolkits that ease their application to complex relational data. Although there has been much historic interest in the combination of logic and probability, we argue that in this mixture 'logic' is largely a red herring. The power in relational models is in their repeated structure and tied parameters; logic is not necessarily the best way to define these structures. Rather than using a declarative language, such as SQL or first-order logic, we advocate using an object-oriented imperative language to express various aspects of model structure, inference and learning. By combining the traditional, declarative, statistical semantics of factor graphs with imperative definitions of their construction and operation, we allow the user to mix declarative and procedural domain knowledge, and also gain significant efficiencies. We have implemented our ideas in a system we call FACTORIE, a software library for an object-oriented, strongly-typed, functional JVM language named Scala. A toy sketch of the imperative style appears after this list.)
  • A Discriminative Approach to Ontology Alignment. Michael Wick, Khashayar Rohanimanesh, Andrew McCallum, and AnHai Doan. In the International Workshop on New Trends in Information Integration (NTII) at the conference for Very Large Databases (VLDB WS), Auckland, New Zealand, 2008. (New state-of-the-art results on ontology alignment using graph-shaped conditional random fields, joint inference, and parameter estimation by Rank-Based Training.)
  • A Unified Approach for Schema Matching, Coreference, and Canonicalization. Michael Wick, Khashayar Rohanimanesh, Karl Schultz, Andrew McCallum. In Conference on Knowledge Discovery and Data Mining (KDD). 2008. (Information integration, performing joint inference over schema matching, entity resolution and canonicalization, using conditional random fields, features encoding clauses in first-order logic, and efficient inference by Metropolis-Hastings. Positive experimental results on multiple data sets.)
  • Unsupervised Deduplication using Cross-field Dependencies. Robert Hall, Charles Sutton, Andrew McCallum. In Conference on Knowledge Discovery and Data Mining (KDD). 2008. (Hierarchical Dirichlet process model that jointly clusters citation venue strings based on both string-edit distance and title information.)
  • Bayesian Modeling of Dependency Trees Using Hierarchical Pitman-Yor Priors. Hanna Wallach, Charles Sutton, Andrew McCallum. In International Conference on Machine Learning, Workshop on Prior Knowledge for Text and Language Processing. (ICML WS), 2008. (Two Bayesian dependency parsing models: 1. Model with Pitman-Yor prior that significantly improves Eisner's classic model; 2. Latent-variable model that learns "syntactic" topics.)
  • Learning from Labeled Features using Generalized Expectation Criteria. Gregory Druck, Gideon Mann and Andrew McCallum. Proceedings of ACM Special Interest Group on Information Retrieval, (SIGIR), 2008. (Learn classifiers by labeling features rather than instances. Extensive evaluation on many text data sets, showing substantial improvement over other methods of semi-supervised learning.)
  • Learning to Predict the Quality of Contributions to Wikipedia. Gregory Druck, Gerome Miklau and Andrew McCallum. AAAI Workshop on Wikipedia and AI, (AAAI WS), 2008. (Predict the longevity of an edit to Wikipedia, using textual features of the edit as well as features of the editor. Could be part of a tool to prioritize verification of changes to Wikipedia.)
  • Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression. David Mimno and Andrew McCallum. (Plenary presentation.) Conference on Uncertainty in Artificial Intelligence, (UAI), 2008. (Text documents are usually accompanied by metadata, such as the authors, the publication venue, the date, and any references. Work in topic modeling that has taken such information into account, such as Author-Topic, Citation-Topic, and Topic-over-Time models, has generally focused on constructing specific models that are suited only for one particular type of metadata. This paper presents a simple, unified model for learning topics from documents given arbitrary non-textual features, which can be discrete, categorical, or continuous. The feature-dependent prior is sketched after this list.)
  • Generalized Expectation Criteria for Semi-Supervised Learning of Conditional Random Fields. Gideon Mann and Andrew McCallum. Proceedings of the Association for Computational Linguistics, (ACL), 2008. (Generalized expectation for semi-supervised learning of linear-chain conditional random fields.)
  • Piecewise Training for Structured Prediction. Charles Sutton and Andrew McCallum. Accepted to the Machine Learning Journal, (MLJ), 2008. (Efficiently train CRFs in separate pieces. This works well even though full joint inference is used at test time.)
  • Pachinko Allocation: Scalable Mixture Models of Topic Correlations. Wei Li and Andrew McCallum. Submitted to the Journal of Machine Learning Research, (JMLR), 2008. (The pachinko allocation model represents nested correlations among topics using a DAG. The work here is in efficiently fitting these models (as well as plain LDA) by creating and leveraging sparsity in the distribution over topics to be sampled for each document.)
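
A toy illustration of the imperative style advocated in the FACTORIE entry above (hypothetical Python, not FACTORIE's actual Scala API): factor-graph structure and parameter tying come from ordinary loops and shared functions rather than a declarative logic language.

    # Hypothetical sketch, not FACTORIE's real API: imperative
    # construction of a relational factor graph.
    from dataclasses import dataclass
    from itertools import combinations

    @dataclass
    class Mention:
        text: str
        label: str = "O"      # variable to be inferred

    @dataclass
    class Factor:
        variables: tuple
        score_fn: object      # one shared function => tied parameters

        def score(self):
            return self.score_fn(*self.variables)

    def same_label(a, b):     # a single parameterized scoring function
        return 2.0 if a.label == b.label else -1.0

    def build_graph(mentions):
        factors = []
        # Pairwise factors are created only where a procedural test
        # (token overlap) says they belong, not by a logical formula.
        for a, b in combinations(mentions, 2):
            if set(a.text.split()) & set(b.text.split()):
                factors.append(Factor((a, b), same_label))
        return factors

Inference and learning would then operate over the returned factors; the point is only that graph construction is ordinary imperative code.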
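
For the Dirichlet-multinomial regression entry above, the core construction (our notation, following the paper's description) is a log-linear, feature-dependent Dirichlet prior over each document's topic proportions:

    \alpha_{d,t} = \exp(\mathbf{x}_d^\top \boldsymbol{\lambda}_t), \qquad
    \boldsymbol{\theta}_d \sim \mathrm{Dirichlet}(\boldsymbol{\alpha}_d), \qquad
    z_{d,i} \sim \mathrm{Mult}(\boldsymbol{\theta}_d), \qquad
    w_{d,i} \sim \mathrm{Mult}(\boldsymbol{\phi}_{z_{d,i}})

where x_d is document d's metadata feature vector; because x_d may mix discrete, categorical, and continuous features, one model covers authors, venues, dates, and references.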

2007

  • Unsupervised Coreference of Publication Venues. Robert Hall, Charles Sutton and Andrew McCallum. University of Massachusetts Amherst Technical Report, (TR), 2007. (A generative non-parametric mixture model for entity resolution of publication venues that leverages both the venue titles as well as distributions over words in paper titles.)
  • Generalized Expectation Criteria. Andrew McCallum, Gideon Mann and Gregory Druck. University of Massachusetts Amherst Technical Report #2007-60, (TR), 2007. (This note introduces and motivates Generalized Expectation (GE) criteria. GE criteria are terms in a parameter-estimation objective function that express preferences about model expectations. In certain simple cases, GE falls into the same equivalence class as moment matching, maximum likelihood and maximum entropy estimation. However, our work focuses on leveraging GE's special flexibility in three non-traditional ways: (1) GE criteria can be specified independently of the model parameterization. In factor graphs, we break the traditional one-to-one mapping between (a) subsets of variables participating in parameterized model factors and (b) subsets of variables over which the objective function's expectations are calculated. (2) Within the same objective function, multiple GE terms that are conditional expectations can be conditioned on multiple different data sets. This is useful for semi-supervised learning and transfer learning. (3) A target expectation (or, more generally, the expectation preference function) can come from any source, including other tasks or human domain knowledge. GE is the successor to Expectation Regularization, which is described in our ICML 2007 paper below. The general form of a GE objective is sketched after this list.)
  • Reducing Annotation Effort using Generalized Expectation Criteria--DRAFT. Gregory Druck, Gideon Mann and Andrew McCallum. University of Massachusetts Amherst Technical Report #2007-62, (TR), 2007. (A version of Generalized Expectation (GE) in which the supervision is provided by labeling features instead of instances. Dramatically faster wall-clock labeling to achieve high accuracy. Experiments on document classification.)
  • Community-based Link Prediction with Text. David Mimno, Hanna M. Wallach and Andrew McCallum. In Proceedings of the NIPS 2007 Workshop on Statistical Network Modeling (NIPS WS), 2007. (New state-of-the-art results in link-prediction using a latent-variable topic model, in which "community" variables are associated with topic distributions and author distributions. Thus the model combines the use of language/topics and co-authorships to discover communities.)
  • Leveraging Existing Resources using Generalized Expectation Criteria. Gregory Druck, Gideon Mann and Andrew McCallum. NIPS Workshop on Learning Problem Design, (NIPS WS), 2007. (Generalized Expectation applied in situations in which there is no labeled data. All supervision is obtained from existing auxiliary resources such as lexicons. Experiments on information extraction.)
  • Lightly-Supervised Attribute Extraction for Web Search. Kedar Bellare, Partha Pratim Talukdar, Giridhar Kumaran, Fernando Pereira, Mark Liberman, Andrew McCallum and Mark Dredze. NIPS Workshop on Machine Learning for Web Search, (NIPS WS), 2007. (Extract a large number of attributes of different entities from natural language text. Methods based on co-training and maximum entropy classifiers.)
  • People-LDA: Anchoring Topics to People Using Face Recognition. Vidit Jain, Erik Learned-Miller, and Andrew McCallum. International Conference on Computer Vision (ICCV), 2007. (Jointly model people's identity, face appearance in an image, and surrounding text in the image captions with an LDA-style topic model. Improved results in identifying coherent sets of person "mentions"---that is, improved co-reference by using both text and image features.)
  • Joint Group and Topic Discovery from Relations and Text. Andrew McCallum, Xuerui Wang and Natasha Mohanty, Statistical Network Analysis: Models, Issues and New Directions, Lecture Notes in Computer Science 4503, pp. 28-44, (Book chapter), 2007. (Book chapter version of NIPS 2006 conference paper. Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)
  • Topical N-grams: Phrase and Topic Discovery, with an Application to Information Retrieval. Xuerui Wang, Andrew McCallum and Xing Wei, Proceedings of the 7th IEEE International Conference on Data Mining (ICDM), 2007. (A topic model in the LDA style that uses a Markov model to automatically discover topically-relevant arbitrary-length phrases, not just lists of single words. The phrase discovery is not simply a post-processing step, but an intrinsic part of the model that helps it discover better topics. Experiments on document retrieval tasks.)
  • Canonicalization of Database Records using Adaptive Similarity Measures. Aron Culotta, Michael Wick, Robert Hall, Matthew Marzilli and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (Defines and explores the problem of "canonicalization"---selecting the best field values for a single, standard record formed from a set of consolidated, co-resolved information sources, such as arise from merging databases, or combining multiple sources of information extraction.)
  • Generalized Component Analysis for Text with Heterogeneous Attributes. Xuerui Wang, Chris Pal and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (A topic model based on an undirected graphical model, which makes it easier to incorporate multiple modalities.)
  • Semi-Supervised Classification with Hybrid Generative/Discriminative Methods. Greg Druck, Chris Pal, Xiaojin Zhu and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (Leverage unlabeled data for text classification by using an objective function that combines (1) joint probability of labels and words and (2) conditional probability of labels given words.)
  • Expertise Modeling for Matching Papers with Reviewers. David Mimno and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (The Author-Persona-Topic model is an LDA-style topic model especially designed to represent expertise as a mixture of topical intersections. We show positive results in matching reviewers to conference papers, as assessed by human judgements.)
  • Learning Extractors from Unlabeled Text using Relevant Databases. Kedar Bellare and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Use conditional random fields to learn information extractors both from DB fields and from alignments of DB records in free text. Uses an Alignment CRF, similar to our UAI 2005 paper.)
  • Efficient Strategies for Improving Partitioning-Based Author Coreference by Incorporating Web Pages as Graph Nodes. Pallika Kanani and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Improve entity resolution by adding web pages as new "mentions" to the graph-partitioning problem, and do so efficiently by selecting a subset of the possible queries and a subset of the returned pages.)
  • Probabilistic Representations for Integrating Unreliable Data Sources. David Mimno and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Probabilistic representation of field values used in merging and augmenting information from DBLP and research paper PDFs.)
  • Author Disambiguation using Error-Driven Machine Learning With a Ranking Loss Function. Aron Culotta, Pallika Kanani, Robert Hall, Michael Wick, and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Entity resolution of people using high-order features, made efficient with Metropolis-Hastings and SampleRank, a learning method based on ranking.)
  • Nonparametric Bayes Pachinko Allocation. Wei Li, David Blei and Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2007. (A version of pachinko allocation that automatically determines the number of topics (and super-topics), and its sparse connectivity structure by Dirichlet process priors. Positive results in rediscovering known structure in synthetic data, and in held-out likelihood versus PAM, hLDA and HDP.)
  • Improved Dynamic Schedules for Belief Propagation. Charles Sutton and Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2007. (Significantly faster inference in graphical models by selecting which BP messages to send based on an approximation to their residual. A scheduling sketch appears after this list.)
  • Simple, Robust, Scalable Semi-supervised Learning via Expectation Regularization. Gideon Mann and Andrew McCallum. International Conference on Machine Learning (ICML), 2007. (Semi-supervised learning is seldom used in real applications because it is often complicated to implement, fragile in tuning or inefficient for large data. We introduce a new highly usable approach to semi-supervised learning, augmenting traditional label log-likelihood with an additional term that encourages model predictions on unlabeled data to match certain expectations. Positive results on 5 data sets versus EM, transductive SVM, entropy regularization and a graph-based method.)
  • Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields. Charles Sutton and Andrew McCallum. ICML, 2007. (Train a large CRF five times faster by dividing it into separate pieces and reducing the number of predicted variable combinations with pseudolikelihood. Analysis in terms of belief propagation and Bethe energy.)
  • Mixtures of Hierarchical Topics with Pachinko Allocation. David Mimno, Wei Li and Andrew McCallum. ICML, 2007. (From a large document collection automatically discover topic hierarchies, where documents may be flexibly represented as mixtures across multiple leaves, not just mixtures up and down a single leaf-root path. Thus, for example, we can represent a document about instructing a robot in natural language, where those two topics are leaves. This new model, hPAM, combines the best of pachinko allocation (PAM) and hierarchical LDA (hLDA). Dramatic improvements in held-out data likelihood and mutual information between discovered topics and human-assigned categories.)
  • Transfer Learning for Enhancing Information Flow in Organizations and Social Networks. Chris Pal, Xuerui Wang and Andrew McCallum. Submitted to Conference on Email and Spam (CEAS), 2007. Technical Note. (Continuous hidden variable conditional random field for CC prediction/suggestion in email.)
  • Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email. Andrew McCallum, Xuerui Wang and Andres Corrada-Emmanuel. Journal of Artificial Intelligence Research (JAIR), 2007. (Journal paper version of IJCAI conference paper on Author-Recipient-Topic (ART) model.)
  • Efficient Computation of Entropy Gradient for Semi-Supervised Conditional Random Fields. Gideon Mann and Andrew McCallum. NAACL/HLT, (short paper) 2007. (A new, faster dynamic program for calculating the entropy of a finite-state subsequence and its gradient.)
  • First-Order Probabilistic Models for Coreference Resolution. Aron Culotta, Michael Wick, Robert Hall and Andrew McCallum. NAACL/HLT, 2007. (Traditional coreference uses features only over pairs of mentions. Here we present a conditional random field with first-order logic for expressing features, enabling features over sets of mentions. The result is a new state-of-the-art result on ACE 2004 coref, jumping from 69 to 79---a 45% reduction in error. The advance depends crucially on a new method of parameter estimation for such "weighted logic" models based on learning rankings and error-driven training.)
  • Sparse Message Passing Algorithms for Weighted Maximum Satisfiability. Aron Culotta, Andrew McCallum, Bart Selman, Ashish Sabharwal. New England Student Symposium on Artificial Intelligence (NESCAI), 2007. (A new algorithm for solving weighted maximum satisfiability (WMAX-SAT) problems that divides a large problem into sub-problems, and coordinates the global solution by message passing with sparse messages. Inspired by the desire to do joint-inference in (a) large weighted logics ala Markov Logic Networks, (b) large NLP pipelines, in which there are efficient pre-existing (dynamic programming) solutions to sub-parts of the pipeline. Positive results versus WalkSAT!)
  • Cryptogram Decoding for OCR using Numerization Strings. Gary Huang, Erik Learned-Miller and Andrew McCallum. ICDAR, 2007. (Robust OCR without font appearance models by incorporating language modeling.)
  • Penn/UMass/CHOP BiocreativeII Systems. Kuzman Ganchev, Koby Crammer, Fernando Pereira, Gideon Mann, Kedar Bellare, Andrew McCallum, Steven Carroll, Yang Jin, and Peter White. BiocreativeII Evaluation Workshop. 2007. (Description of our high-ranking entry in the competition for extraction and linkage from bioinformatics text.)
  • Resource-bounded Information Gathering for Correlation Clustering. Pallika Kanani and Andrew McCallum. Conference on Computational Learning Theory (COLT) Open Problems Track, 2007. (We present a new class of problems in which the goal is to perform correlation clustering under circumstances in which accuracy can be improved by augmenting the given graph with additional information.)
  • Organizing the OCA: Learning faceted subjects from a library of digital books. David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL), 2007. (Introduces the DCM-LDA topic model, which represents topics by a Dirichlet-compound-multinomial rather than a multinomial. In addition to obtaining interesting information about the different variances of the topics, this model lends itself to efficient parallelization with very coarse-grained synchronization. The result is a topic model that can run on over 1 billion words in just a few hours.)
  • Mining a digital library for influential authors. David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL), 2007. (A probabilistic model that ranks authors based on their influence on particular areas of scientific research. Integrates topics with citation patterns.)
  • Improving Author Coreference by Resource-bounded Information Gathering from the Web. Pallika Kanani, Andrew McCallum and Chris Pal. International Joint Conference on Artificial Intelligence (IJCAI), 2007. (Sometimes there is simply insufficient information to make an accurate entity resolution decision, and we must gather additional evidence. This paper describes the use of web queries to improve research paper author coreference, exploring two methods of augmenting a graph partitioning problem: using the web to obtain new features on existing edges, and using the web to obtain new nodes in the graph. We then go on to describe decision-theoretic approaches for maximizing accuracy gain with a limited budget of web queries, and demonstrate our methods on three large data sets.)
  • Dynamic Conditional Random Fields. Charles Sutton, Andrew McCallum and Khashayar Rohanimanesh. Journal of Machine Learning Research (JMLR), Vol. 8(Mar), pages 693-723, 2007. (Journal paper version of ICML paper by the same authors, with new experiments on marginal likelihood training.)
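
Schematically (our notation; the choice of distance is a design decision), a Generalized Expectation objective of the kind described in the GE entries above augments the usual likelihood with penalties on model expectations:

    \mathcal{O}(\theta) = \log p_\theta(\mathcal{D}) - \sum_j \lambda_j \, \Delta\big(\tilde{\mathbf{f}}_j, \; \mathbb{E}_\theta[\mathbf{f}_j]\big)

where each \tilde{f}_j is a target expectation (from labeled features, auxiliary resources, or domain knowledge), \Delta is a distance such as KL divergence or squared error, and the functions f_j need not coincide with the model's own factors.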
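
The dynamic schedule in the Improved Dynamic Schedules entry above can be pictured as a priority queue keyed on message residuals. A minimal sketch follows, with hypothetical helpers compute_message and neighbors assumed; the paper's contribution is approximating the residual cheaply, whereas this sketch computes it exactly.

    # Residual-based message scheduling for belief propagation.
    # compute_message(edge, messages) -> updated message vector;
    # neighbors(edge) -> edges whose residuals change after a send.
    import heapq
    from itertools import count

    def residual_bp(messages, compute_message, neighbors,
                    max_updates=10000, tol=1e-6):
        def residual(edge):
            new = compute_message(edge, messages)
            return max(abs(a - b) for a, b in zip(new, messages[edge])), new

        tie = count()                 # tiebreaker so edges never compare
        heap = []
        for e in messages:
            r, _ = residual(e)
            heapq.heappush(heap, (-r, next(tie), e))
        for _ in range(max_updates):
            if not heap:
                break
            _, _, e = heapq.heappop(heap)
            r, new = residual(e)      # entries can be stale: recompute
            if r < tol:
                continue
            messages[e] = new         # "send" the highest-residual message
            for d in neighbors(e):    # only affected residuals change
                rd, _ = residual(d)
                heapq.heappush(heap, (-rd, next(tie), d))
        return messages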

2006

  • On Discriminative and Semi-Supervised Dimensionality Reduction. Chris Pal, Michael Kelm, Xuerui Wang, Greg Druck and Andrew McCallum. Advances in Neural Information Processing Systems, Workshop on Novel Applications of Dimensionality Reduction, (NIPS Workshop), 2006. (Using Multi-Conditional Learning, learn to distribute mixture components just where needed to address some discriminative task. See the compelling figure on synthetic overlapping spiral data.)
  • Learning Field Compatibilities to Extract Database Records from Unstructured Text. Michael Wick, Aron Culotta and Andrew McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2006. (Record extraction, jointly accounting for multi-field compatibility by content and layout features.)
  • Tractable Learning and Inference with Higher-Order Representations. Aron Culotta and Andrew McCallum. ICML Workshop on Open Problems in Statistical Relational Learning, 2006. (When working with CRFs having features based on first-order logic, the "unrolled" graphical model would be far too large to fully instantiate. This paper describes a method leveraging MCMC to perform inference and learning while only partially instantiating the model. Positive results on entity resolution (of research paper authors) are described.)
  • Corrective Feedback and Persistent Learning for Information Extraction. Aron Culotta, Trausti Kristjansson, Andrew McCallum, Paul Viola. Artificial Intelligence Journal (AIJ), volume 170, pages 1101-1122, 2006. (Help a user interactively correct the results of extraction by providing uncertainty cues in the UI, and by using constrained Viterbi to automatically make additional corrections after the first human correction. Journal paper version of AAAI paper by the same authors below. Adds experiments with active learning.)
  • CC Prediction with Graphical Models. Chris Pal and Andrew McCallum. Conference on Email and Anti-Spam (CEAS), 2006. (Help keep an organization coordinated by suggesting who to carbon-copy on your outgoing email message.)
  • Practical Markov Logic Containing First-order Quantifiers with Application to Identity Uncertainty. Aron Culotta, Andrew McCallum. HLT Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, 2006. (Markov Logic Networks are Conditional Random Fields that use first-order logic to define features and parameter tying patterns. Making such models scale to non-trivial data set sizes is a challenge because the size of the full instantiation of the model is exponential in the arity of the formulae. Here we describe a method of partial instantiation that allows such models to scale to entity resolution problems with millions of entity mentions. On both citation and author entity resolution problems we show that including such first-order features provides increases in accuracy.)
  • A Continuous-Time Model of Topic Co-occurrence Trends. Xuerui Wang, Wei Li, and Andrew McCallum. AAAI Workshop on Event Detection, 2006. (Capture the time distributions not only of topics, but also of their co-occurrences. For example, notice that while NLP and ML have both been around for a long time, their co-occurrence has been rising recently. The model is effectively a combination of the Pachinko Allocation Model (PAM) and Topics-Over-Time (TOT).)
  • Combining Generative and Discriminative Methods for Pixel Classification with Multi-Conditional Learning. Michael Kelm, Chris Pal, and Andrew McCallum. Draft accepted to the International Conference on Pattern Recognition (ICPR), 2006. (Multi-conditional learning explored in the context of computer vision.)
  • Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification. Andrew McCallum, Chris Pal, Greg Druck, Xuerui Wang. AAAI, 2006. (Estimate parameters of an undirected graphical model not by joint likelihood, or conditional likelihood, but by a product of multiple conditional likelihoods. Can act as an improved regularizer. With latent variables, can cluster structured, relational data, like Latent Dirichlet Allocation and its successors, but with undirected graphical models and (cross-cutting) conditional-training. Improved results on document classification, Jebara-inspired synthetic data, and over the Harmonium as tested on an information retrieval task. The objective is sketched after this list.)
  • Pachinko Allocation: DAG-structured Mixture Models of Topic Correlations. Wei Li, and Andrew McCallum. ICML, 2006. (An LDA-style topic model that captures correlations between topics, enabling discovery of finer-grained topics. Similar motivations to Blei and Lafferty's Correlated Topic Model (CTM), but uses a DAG to capture arbitrary, nested and possibly sparse correlations among topics. Interior nodes of the DAG have a Dirichlet distribution over their children; words are in the leaves. Provides improved interpretability and held-out data likelihood.)
  • Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends. Xuerui Wang and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD) 2006. (A new LDA-style topic model that models trends over time. The meaning of a topic remains fixed and reliable, but its prevalence over time is captured, and topics may thus focus in on co-occurrence patterns that are time-sensitive. Unlike other work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps. Improvements in topic saliency and the ability to predict time given words. The generative outline appears after this list.)
  • Exploring the Use of Conditional Random Field Models and HMMs for Historical Handwritten Document Recognition. Shaolei L. Feng, R. Manmatha and Andrew McCallum. IEEE International Conference on Document Image Analysis for Libraries (DIAL 06), pp. 30-37. 2006. (Mixed results on CRFs applied to handwritten word recognition.)
  • Reducing Weight Undertraining in Structured Discriminative Learning. Charles Sutton, Michael Sindelar, and Andrew McCallum. HLT-NAACL, 2006. (Separately train CRFs with different subsets of the features, then integrate them at test time---four different variations on the method. Especially make more reliable use of lexicon features and other highly-predictive but brittle features.)
  • Integrating Probabilistic Extraction Models and Relational Data Mining to Discover Relations and Patterns in Text. Aron Culotta, Andrew McCallum and Jonathan Betz. HLT-NAACL, 2006. (Extract relations from Wikipedia articles. Run data mining on the relational graph to obtain patterns that are predictive of relations---such as "opponent of my opponent is my ally" and "a person is likely to have the same religion as their parents." Then use features derived from these patterns in a second run of extraction that improves accuracy.)
  • Bibliometric Impact Measures Leveraging Topic Analysis. Gideon Mann, David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL) 2006. (Use a new topic model that leverages n-grams to discover interpretable, fine-grained topics in over a million research papers. Use these topic divisions as well as automated citation analysis to extend three existing bibliometric impact measures, and create three new ones: Topical Diversity, Topical Transfer, Topical Precedence.)
  • An Introduction to Conditional Random Fields for Relational Learning. Charles Sutton and Andrew McCallum. Book chapter in Introduction to Statistical Relational Learning. Edited by Lise Getoor and Ben Taskar. MIT Press. 2006. (An overview and introduction to conditional random fields for beginners and experts alike---motivation, background, mathematical foundations, linear-chain form, general-structure form, inference, parameter estimation, tips and tricks, an example application to information extraction with a skip-chain structure.)
  • Sparse Forward-Backward using Minimum Divergence Beams for Fast Training of Conditional Random Fields. Chris Pal, Charles Sutton, and Andrew McCallum. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2006. (An alternative method for beam-search based on variational principles. Enables not only faster test-time performance of large-state-space CRFs, but this method makes beam search robust enough to be used at training time, enabling dramatically faster learning of discriminative finite-state methods for speech, IE and other applications.)
  • Table extraction for answer retrieval. Xing Wei, Bruce Croft and Andrew McCallum. Information Retrieval Journal (IRJ), volume 9, issue 5, pages 589-611, November 2006. (Information extraction from tables, using conditional random fields with language and layout features, with application to question answering. Journal paper version of our SIGIR 2003 paper.)
  • Semi-supervised Text Classification Using EM. Kamal Nigam, Andrew McCallum and Tom Mitchell. Book chapter in Chapelle, O., Zien, A., and Scholkopf, B. (Eds.) Semi-Supervised Learning. MIT Press: Boston. 2006. (Overview, description, experiments on using expectation maximization with naive Bayes text classifiers for learning from labeled and unlabeled data. A chapter in a book about various methods of semi-supervised learning.)
  • Group and Topic Discovery from Relations and Their Attributes. Xuerui Wang, Natasha Mohanty and Andrew McCallum. Neural Information Processing Systems (NIPS), 2006. (Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)
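
For the Multi-Conditional Learning entry above, the objective is, schematically (our notation), a weighted combination of conditional log-likelihoods rather than a single joint or conditional likelihood:

    \mathcal{O}(\theta) = \alpha \sum_i \log p_\theta(y_i \mid x_i) + \beta \sum_i \log p_\theta(x_i \mid y_i)

with \alpha = 1, \beta = 0 recovering purely discriminative training; intermediate weightings provide the regularization effect the entry mentions.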
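
The Topics-over-Time entry above has this generative outline (our notation; each topic's time distribution is a Beta over normalized timestamps):

    \theta_d \sim \mathrm{Dirichlet}(\alpha); \quad \text{for each token } i:\;
    z_{d,i} \sim \mathrm{Mult}(\theta_d), \quad
    w_{d,i} \sim \mathrm{Mult}(\phi_{z_{d,i}}), \quad
    t_{d,i} \sim \mathrm{Beta}(\psi_{z_{d,i}})

A topic's word distribution \phi_z stays fixed while its Beta parameters \psi_z capture when it is prevalent, with no Markov assumption or discretization of time.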

2005

  • A Note on Topical N-grams. Xuerui Wang and Andrew McCallum. University of Massachusetts Technical Report UM-CS-2005-071, 2005. (Discover topics like Latent Dirichlet Allocation, but model phrases in addition to single words on a per-topic basis. For example, in the Politics topic, "white house" has special meaning as a collocation, while in the RealEstate topic, modeling the individual words is sufficient. Our TNG model produces much cleaner, more interpretable topics.)
  • Pachinko allocation: A Directed Acyclic Graph for Topic Correlations. Wei Li and Andrew McCallum. NIPS Workshop on Nonparametric Bayesian Methods, 2005. (Similar motivations to Blei and Lafferty's Correlated Topic Model (CTM), but uses a DAG to capture arbitrary and possibly sparse correlations among topics. Interior nodes of the DAG have a Dirichlet distribution over their children; words are in the leaves. Provides improved interpretability and classification, as well as improved held-out likelihood over CTM. See ICML 2006 paper above.)
  • Direct Maximization of Rank-Based Metrics for Information Retrieval. Don Metzler, W. Bruce Croft and Andrew McCallum. CIIR Technical Report IR-429, 2005.
  • Information Extraction: Distilling Structured Data from Unstructured Text. Andrew McCallum. ACM Queue, volume 3, number 9, November 2005. (An overview of information extraction by machine learning methods, written for people not familiar with machine learning, especially CTOs and other people in business.)
  • Learning Clusterwise Similarity with First-order Features. Aron Culotta and Andrew McCallum. NIPS Workshop on the Theoretical Foundations of Clustering. 2005. (Discriminatively-trained graph-partitioning methods for clustering, with features over entire clusters, including existential and universal quantifiers. Efficiently instantiate these features only on demand.)
  • Composition of Conditional Random Fields for Transfer Learning. Charles Sutton and Andrew McCallum. Proceedings of Human Language Technologies / Empirical Methods in Natural Language Processing (HLT/EMNLP) 2005. (Improve information extraction from email data by using the output of another extractor that was trained on large quantities of newswire. Improve accuracy further by using joint inference between the two tasks---so that the final target task can actually affect the output of the intermediate task.)
  • Feature Bagging: Preventing Weight Undertraining in Structured Discriminative Learning. Charles Sutton, Michael Sindelar, and Andrew McCallum. Center for Intelligent Information Retrieval, University of Massachusetts Technical Report IR-402. 2005. (Avoid a common under-appreciated problem: overly heavy reliance on a few discriminative features which may not be as reliably present in the testing data. Discusses four methods of separate training and combination, and presents statistically-significant improvements---including new best results on CoNLL-2000 NP Chunking.)
  • Fast, Piecewise Training for Discriminative Finite-state and Parsing Models. Charles Sutton and Andrew McCallum. Center for Intelligent Information Retrieval Technical Report IR-403. 2005. (Further results with "piecewise training", a method also described in a UAI'05 paper below.)
  • Practical Markov Logic Containing First-order Quantifiers with Application to Identity Uncertainty. Aron Culotta and Andrew McCallum. Technical Report IR-430, University of Massachusetts, September 2005. (Use existential and universal quantifiers in Markov Logic, doing so practically and efficiently by incrementally instantiating these terms as needed. Applied to object correspondence, this model combines the expressivity of BLOG with the predictive accuracy advantages of conditional probability training. Experiments on citation matching and author disambiguation.)
  • Joint Deduplication of Multiple Record Types in Relational Data. Aron Culotta and Andrew McCallum. Fourteenth Conference on Information and Knowledge Management (CIKM), 2005.
    (Longer Tech Report version: A Conditional Model of Deduplication for Multi-type Relational Data. Technical Report IR-443, University of Massachusetts, September 2005.) (Leverage relations among multiple entity types to perform coreference collectively among all types. Uses CRF-style graph partitioning with a learned distance metric. Experimental results on joint coreference of both citations and their venues showing that accuracy on both improves.)
  • Collective Multi-Label Classification. Nadia Ghamrawi and Andrew McCallum. Fourteenth Conference on Information and Knowledge Management (CIKM), 2005. (Multi-label document classification with a conditional maximum entropy model that captures not only the traditional dependencies between words and the class labels, but also the co-occurrence dependencies between the class labels. Performs joint inference among all class labels. The model form is sketched after this list.)
  • Predictive Random Fields: Latent Variable Models Fit by Multiway Conditional Probability with Applications to Document Analysis. Andrew McCallum, Xuerui Wang and Chris Pal. UMass Technical Report UM-CS-2005-053, version 2.1. 2005. (Cluster structured, relational data, like Latent Dirichlet Allocation and its successors, but with undirected graphical models that are conditionally-trained. Improved results on Jebara-inspired synthetic data, and over the Harmonium as tested on an information retrieval task. This is an evolving Tech Report, which needs to be updated---in particular we are now referring to this method as "Multi-Conditional Learning" or "Multi-Conditional Mixtures".)
  • Group and Topic Discovery from Relations and Text. Xuerui Wang, Natasha Mohanty and Andrew McCallum. KDD Workshop on Link Discovery: Issues, Approaches and Applications (LinkKDD) 2005. (Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)
  • Detecting Anomalies in Network Traffic Using Maximum Entropy Estimation. Yu Gu, Andrew McCallum and Don Towsley. Internet Measurement Conference, 2005. (Build a density model of normal Internet traffic with Maximum Entropy and feature induction. Detect network attacks by density threshold.)
  • A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance. Andrew McCallum, Kedar Bellare and Fernando Pereira. Conference on Uncertainty in AI (UAI), 2005. (Train a string edit distance function from both positive and negative examples of string pairs (matching and mismatching). Significantly, the model designer is free to use arbitrary, fancy features of both strings, and also very flexible edit operations. This model is an example of an increasingly popular interesting class---conditionally-trained models with latent variables. Positive results on citations, addresses and names.)
  • Joint Parsing and Semantic Role Labeling. Charles Sutton and Andrew McCallum. CoNLL (Shared Task), 2005. (Attempt to improve accuracy by performing joint inference over parsing and semantic role labeling---preserving uncertainty and multiple hypotheses in Dan Bikel's parser. Unfortunately the effort yielded negative results, most likely because the components would need to produce better-calibrated probabilities.)
  • Gene Prediction with Conditional Random Fields. Aron Culotta, David Kulp, and Andrew McCallum. Technical Report UM-CS-2005-028, University of Massachusetts, Amherst, April 2005. (Use finite-state CRFs to locate introns and exons in DNA sequences. Shows the advantages of CRFs' ability to straightforwardly incorporate homology evidence from protein databases.)
  • Semi-Supervised Sequence Modeling with Syntactic Topic Models. Wei Li and Andrew McCallum. AAAI, 2005. (Learn a low-dimensional manifold from large quantities of unlabeled text data, then use components of the manifold as additional features when training a linear-chain CRF with limited labeled data. The manifold is learned using HMM-LDA [Griffiths, Steyvers, Blei, Tenenbaum 2004], an unsupervised model with special structure suitable for sequences and topics. Experiments with English part-of-speech tagging and Chinese word segmentation.)
  • Reducing Labeling Effort for Structured Prediction Tasks. Aron Culotta and Andrew McCallum. AAAI, 2005. (A step toward bringing trainable information extraction to the masses! Make it easier for end-users to train IE by providing multiple-choice labeling options, and propagating any constraints their labels provide on portions of the record-labeling task.)
  • Topic and Role Discovery in Social Networks. Andrew McCallum, Andres Corrada-Emmanuel and Xuerui Wang. IJCAI, 2005. (Conference paper version of tech report by same authors in 2004 below. Also includes new results with Role-Author-Recipient-Topic model. Discover roles by social network analysis with a Bayesian network that models both links and text messages exchanged on those links. Experiments with Enron email and academic email.)
  • Piecewise Training for Undirected Models. Charles Sutton and Andrew McCallum. UAI, 2005. (Efficiently train a large graphical model in separately normalized pieces, and amazingly often obtain higher accuracy than without this approximation. This paper also shows that this piecewise objective is a lower bound on the exact likelihood, and gives results with three different graphical model structures. The bound is sketched after this list.)
  • Constrained Kronecker Deltas for Fast Approximate Inference and Estimation. Chris Pal, Charles Sutton, Andrew McCallum. Submitted to UAI, 2005. (Sometimes the graph of the graphical model is not large and complex, but the cardinality of the variables is large. This paper describes a new and generalized method for beam search on graphical models, showing positive experimental results for both inference and training. Experiments on NetTalk.)
  • Multi-Way Distributional Clustering via Pairwise Interactions. Ron Bekkerman, Ran El-Yaniv and Andrew McCallum. ICML 2005. (Distributional clustering in multiple feature dimensions or modalities at once, made efficient by a factored representation as used in graphical models, and by a combination of top-down and bottom-up clustering. Results on email clustering, and new best results on 20 Newsgroups.)
  • Disambiguating Web Appearances of People in a Social Network. Ron Bekkerman and Andrew McCallum. WWW Conference, 2005. (Find homepages and other Web pages mentioning particular people. Do a better job by leveraging a collection of related people.)
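
The Collective Multi-Label Classification entry above pairs word-label factors with label-label co-occurrence factors; schematically (our notation):

    p_\theta(\mathbf{y} \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_j \theta_j^\top f_j(x, y_j) + \sum_{j<k} \theta_{jk}^\top f_{jk}(y_j, y_k) \Big)

Joint inference over the whole label vector y lets a confidently predicted label reinforce or suppress correlated labels.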
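
For the Piecewise Training entry above, the lower-bound claim can be stated compactly (our notation): normalizing each piece a locally over-counts the partition function, so

    \log Z(\theta) \le \sum_a \log Z_a(\theta_a)
    \quad\Longrightarrow\quad
    \ell_{\mathrm{PW}}(\theta) = \sum_a \Big( \theta_a^\top f_a(y_a, x) - \log Z_a(\theta_a) \Big) \le \ell(\theta)

Hence maximizing the cheap piecewise objective maximizes a lower bound on the exact training likelihood.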
