Andrew McCallum

Associate Professor
Department of Computer Science
University of Massachusetts Amherst
140 Governors Drive
Amherst, MA 01003

voice: (413) 545-1323
fax: (413) 545-1789
mccallum@cs.umass.edu

Research

The main goal of my research is to dramatically increase our ability to mine actionable knowledge from unstructured text. I am especially interested in information extraction from the Web, understanding the connections between people and between organizations, expert finding, social network analysis, and mining the scientific literature & community. Toward this end my group develops and employs various methods in statistical machine learning, natural language processing, information retrieval and data mining---tending toward probabilistic approaches and graphical models. For more information see our current projects and publications.

Teaching

This past Fall I taught CMPSCI 585, an Introduction to Natural Language Processing.

News & Notables

  • With Sam Roweis, I am Co-Program-Chair of ICML 2008.
  • We have publicly launched Rexa, a new research paper search engine. It is a sibling to CiteSeer and Google Scholar, except that it provides search and browsing over more "object types", including not just papers, but also people, grants and topics.
  • Charles Sutton and I have written a comprehensive introduction to conditional random fields, a book chapter in Lise Getoor and Ben Taskar's book on statistical relational learning.
  • I've written an introduction to information extraction by machine learning, intended for an audience that doesn't know machine learning. Information Extraction: Distilling Structured Data from Unstructured Text. Andrew McCallum. ACM Queue, Volume 3, Number 9, November 2005.
  • MALLET is a Java toolkit for machine learning applied to natural language. It provides facilities for document classification, information extraction, part-of-speech tagging, noun phrase segmentation, general finite state transducers and classification, and much more---all designed to be extremely efficient for large data and feature sets. Although the toolkit is quite mature in functionality, its documentation is still sparse.
  • Three of my papers made it into CiteSeer's list of most cited computer science papers.