Andrew McCallum
Research
The main goal of my research is to dramatically increase our ability to mine actionable knowledge from unstructured text. I am especially interested in information extraction from the Web, understanding the connections between people and between organizations, expert finding, social network analysis, and mining the scientific literature & community. Toward this end my group develops and employs various methods in statistical machine learning, natural language processing, information retrieval and data mining---tending toward probabilistic approaches and graphical models. For more information see our current projects and publications.
Teaching
This past Fall I taught CMPSCI 585, an Introduction to Natural Language Processing.
News & Notables
- With Sam Roweis, I am Co-Program-Chair of ICML 2008.
- We have publicly launched Rexa, a new research paper search engine. It is a sibling to CiteSeer and Google Scholar, except that it provides search and browsing over more "object types", including not just papers, but also people, grants and topics.
- Charles Sutton and I have a comprehensive introduction to conditional random fields, a book chapter in Lise Getoor and Ben Taskar's book on statistical relational learning.
- I've written an introduction to information extraction by machine learning, intended for an audience that doesn't know machine learning. Information Extraction: Distilling Structured Data from Unstructured Text. Andrew McCallum. ACM Queue, Volume 3, Number 9, November 2005.
- MALLET is a Java toolkit for machine learning applied to natural language. It provides facilities for document classification, information extraction, part-of-speech tagging, noun phrase segmentation, general finite state transducers and classification, and much more---all designed to be extremely efficient for large data and feature sets. Although the toolkit is quite mature in functionality, its documentation is still sparse.
- Three of my papers made it into CiteSeer's list of most cited computer science papers.