DEPARTMENT SEMINAR 

Andrew McGregor
University of California, San Diego
Electrical and Computer Engineering

Tuesday, March 11, 2008
Computer Science Building, Room 151
4:00 PM

Faculty Host: Ramesh Sitaraman

"Computing on Streams: New Results and Directions"

Over the last ten years, the data stream model has become an active
topic of research. I think there are two main reasons for this. First,
it is a model that is applicable to a range of practical problems such
as monitoring network traffic, query-planning in databases, designing
I/O efficient algorithms for massive data sets, and aggregation in
sensor networks. Second, it is a rich source of theoretical questions
and there are strong connections with other areas of theoretical
computer science including communication complexity, compressed
sensing, metric embeddings, pseudo-random generators, and
approximation algorithms. 

In this talk, I will overview some of my recent work in the area. I
will start by discussing algorithmic results and then go into detail
about an algorithm for identifying correlations in data streams. This
work builds upon previous work on using random linear projections and
introduces the idea of “bi-linear sketches.” I will then discuss more
general complexity issues such as, a) the trade-off between memory
requirements and the number of passes taken over the data, and b)
whether it matters how the data is ordered. I will conclude by
presenting results on “robust lower bounds,” a notion we introduce in
an attempt to understand when data stream problems may be solved in
practice, even when they are “hard in theory.” 

This talk will include joint work with Amit Chakrabarti and Graham
Cormode (STOC 2008); Sudipto Guha (ICALP 2007); and Piotr Indyk (SODA
2008). 

Bio:
Andrew McGregor is a postdoctoral scholar in the newly formed
Information Theory and Applications Center at the University of
California, San Diego. He received his Ph.D. from the University of
Pennsylvania in 2007. During the four summers at graduate school, he
visited DIMACS, NJ and the Fundamental Mathematics Department at Bell
Labs. Prior to this, he received the Certificate of Advanced Study in
Mathematics (2001) and a B.A. in Mathematics (2000) from the
University of Cambridge. His interests include processing massive data
sets and data streams; clustering; approximation algorithms; sequence
and phylogenetic tree reconstruction; coding and information
theory. His work appears in theoretical computer science, database,
data mining, coding theory, and machine learning publications. 

Refreshments at 3:40 PM in the atrium, outside the presentation room.