Andrew McCallum

Code

  • FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.  It is flexible, supporting multiple modeling and inference paradigms. Its original emphasis was on conditional random fields, undirected graphical models, MCMC inference, online training, and discriminative parameter estimation. However, it now also supports directed generative models (such as latent Dirichlet allocation), and has preliminary support for variational inference, including belief propagation and mean-field methods.  It is also scalable, with demonstrated success on problems with many millions of variables and factors, and on models that have changing structure, such as case factor diagrams. It has also been plugged into a database back-end, representing a new approach to probabilistic databases capable of handling billions of variables.
  • MALLET is a library of Java code for machine learning applied to text. It provides facilities not only for document classification, but also for information extraction, part-of-speech tagging, noun phrase segmentation, and much more. The library is quite mature; however, its front-ends and documentation are not yet as polished as rainbow's.
  • Libbow is a library of C code for document classification, clustering and retrieval. Also provided with the library are rainbow, its popular front-end for document classification, and archer, a speedy disk-based document retrieval engine that has an AltaVista-like query interface and can handle several gigabytes of text.
  • Cora HMM is the C implementation of HMMs used for information extraction in Cora. It was written by Kristie Seymore.
  • RLKIT is a software library that makes it easy to test various reinforcement learning algorithms in different environments with different sensory-motor systems. It's implemented in Objective-C and GNU Guile (Scheme).