Andrew McGregor

Associate Professor

See also Google Scholar and DBLP.

Journal Publications:

  1. Trace Reconstruction: Generalized and Parameterized
    IEEE Transactions on Information Theory (with A. Krishnamurthy, A. Mazumdar, S. Pal)
  2. Correlation Clustering in Data Streams
    Algorithmica (with K. Ahn, G. Cormode, S. Guha, A. Wirth)
  3. Verifiable Stream Computation and Arthur–Merlin Communication
    SIAM Journal of Computing, 2019 (with A. Chakrabarti, G. Cormode, J. Thaler, S. Venkatasubramanian)
  4. Storage Capacity as an Information-Theoretic Analogue of Vertex Cover
    IEEE Transactions on Information Theory, 2019 (with A. Mazumdar, S. Vorotnikova)
  5. Structural Results on Matching Estimation with Applications to Streaming.
    Algorithmica, 2019 (with M. Bury, E. Grigorescu, M. Monemizadeh, C. Schwiegelshohn, S. Vorotnikova)
  6. Better Approximation of The Streaming Maximum Coverage Problem
    Theory of Computer Systems 2019 (with H. Vu)
  7. AUTOMAN: A Platform for Integrating Human-Based and Digital Computation
    Communications of the ACM 59(6): 102-109 2016 (with D. Barowy, C. Curtsinger, E. Berger)
  8. Robust Lower Bounds for Communication and Stream Computation
    Theory of Computing 2016 (with A. Chakrabarti G. Cormode)
  9. The matrix mechanism: optimizing linear counting queries under differential privacy
    VLDB Journal 2015 (with C. Li, G. Miklau, M. Hay, V. Rastogi)
  10. Space-Efficient Estimation of Statistics over Sub-Sampled Streams
    Algorithmica, 2015. (with A. Pavan, S. Tirthapura, D. Woodruff)
  11. Annotations in Data Streams
    ACM Transactions on Algorithms, 11 (2014), no. 1, pg. 1-30 (with A. Chakrabarti, G. Cormode, J. Thaler)
  12. Information Cost Tradeoffs for Augmented Index and Streaming Language Recognition
    SIAM Journal of Computing, 42 (2013), no. 1, pg. 61–83. (with A. Chakrabarti, G. Cormode, R. Kondapally)
  13. SCALLA: A Platform for Scalable One-Pass Analytics using MapReduce
    ACM Trans. Database Syst, 37 (2012), no. 4, pg. 1-43 (with B. Li, E. Mazur, Y. Diao, P. Shenoy)
  14. CLARO: Modeling and Processing Uncertain Data Streams
    VLDB Journal, 21 (2012), no. 5, pg. 651-676 (with T. Tran, L. Peng, Y. Diao, A. Liu)
  15. A Near-Optimal Algorithm for Computing the Entropy of a Stream
    ACM Transactions on Algorithms, 6 (2010), no. 3, pg. 1-21 (with A. Chakrabarti, G. Cormode)
  16. On the Hardness of Approximating Stopping and Trapping Sets
    IEEE Transactions on Information Theory, 56 (2010), no. 4, pg. 1640-1650 (with O. Milenkovic)
  17. Sub-linear Estimation of Entropy and Information Distances
    ACM Transactions on Algorithms, 5 (2009), no. 4, pg. 1-16. (with S. Guha, S. Venkatasubramanian)
  18. Stream Order and Order Statistics: Quantile Estimation in Random-Order Streams
    SIAM Journal of Computing, 38 (2009), no. 5, 2044-2059 (with S. Guha)
  19. Graph Distances in the Data Stream Model
    SIAM Journal of Computing, 38 (2008), no. 5, pg. 1709-1727 (with J. Feigenbaum, S. Kannan, S. Suri, J. Zhang)
  20. Estimating Statistical Aggregates on Probabilistic Data Streams
    ACM Trans. Database Syst, 33 (2008), no. 4, pg. 1-30 (with T. S. Jayram, S. Muthukrishnan, E. Vee)
  21. Sketching Information Divergences
    Journal of Machine Learning, 72 (2008), no. 1-2, pg. 5-19 (with S. Guha, P. Indyk)
  22. Distance distribution of binary codes and the error probability of decoding
    IEEE Transactions on Information Theory, 51 (2005), no. 12, pg. 4237-4246. (with A. Barg)
  23. On Graph Problems in a Semi-Streaming Model
    Theoretical Computer Science, 348 (2005), no. 2-3, pg. 207-216. (with J. Feigenbaum, S. Kannan, S. Suri, J. Zhang)

Conference Publications (N.B. Links point to journal/extended versions if appropriate):

  1. Estimation of Entropy in Constant Space with Improved Sample Complexity
    ISIT 2024 (with R. Sengupta)
  2. Tight Approximations for Graphical House Allocation
    AAMAS 2024 (with H. Hosseini, R. Sengupta, R. Vaish and V. Viswanathan)
  3. Estimation of Entropy in Constant Space with Improved Sample Complexity
    NeurIPS 2022 (with M. Aliakbarpour, J. Nelson, E. Waingarten)
  4. Estimation of Entropy in Constant Space with Improved Sample Complexity
    NeurIPS 2022 (with M. Aliakbarpour, J. Nelson, E. Waingarten)
  5. Improving the Efficiency of the PC Algorithm by Using Model-Based Conditional Independence Tests
    CML4Impact 2022 (NeurIPS Workshop) (with E. Cai, D. Jensen)
  6. Non-Adaptive Edge Counting and Sampling via Bipartite Independent Set Queries
    ESA 2022 (with R. Addanki, C. Musco)
  7. Graph Reconstruction from Random Subgraphs
    ICALP 2022 (with R. Sengupta)
  8. Improved Approximation and Scalability for Fair Max-Min Diversification
    ICDT 2022 (with R. Addanki, A. Meliou, Z. Moumoulidou)
  9. PredictRoute: A Network Path Prediction Toolkit.
    Sigmetrics 2021 (with R. Singh, D. Tench, P. Gill)
  10. Cluster Trellis: Data Structures & Algorithms for Exact Inference in Hierarchical Clustering
    AISTATS 2021 (with C. Greenberg, S. Macaluso, N. Monath, J. Lee, P. Flaherty, K. Cranmer, A. McCallum)
  11. Intervention Efficient Algorithms for Approximate Learning of Causal Graphs
    ALT 2021 (with R. Addanki, C. Musco)
  12. Diverse Data Selection under Fairness Constraints
    ICDT 2021 (with Z. Moumoulidou, A. Meliou)
  13. Maximum Coverage in the Data Stream Model: Parameterized and Generalized
    ICDT 2021 (with D. Tench, H. Vu)
  14. Cache me Outside: A New Look at DNS Cache Probing
    PAM 2021 (with A. Niaki, W. Marczak, S. Farhoodi, P. Gill, N. Weaver)
  15. Efficient Intervention Design for Causal Discovery with Latents
    ICML 2020 (with R. Addanki, S. Kasiviswanathan, C. Musco)
  16. Triangle and Four Cycle Counting in the Data Stream Model
    PODS 2020 (with S. Vorotnikova)
  17. Algebraic and Analytic Approaches for Parameter Learning in Mixture Models
    ALT 2020 (with A. Krishnamurthy, A. Mazumdar, S. Pal)
  18. Vertex Ordering Problems in Directed Graph Streams
    SODA 2020 (with A. Chakrabarti, P. Ghosh, S. Vorotnikova)
  19. Sample Complexity of Learning Mixture of Sparse Linear Regressions
    NeurIPS 2019 (with A. Krishnamurthy, A. Mazumdar, S. Pal)
  20. Trace Reconstruction: Generalized and Parameterized
    ESA 2019 (with A. Krishnamurthy, A. Mazumdar, S. Pal)
  21. Mesh: Compacting Memory Management for C/C++ Applications
    PLDI 2019 (with B. Powers, D. Tench, E. Berger)
  22. The Complexity of Counting Cycles in the Adjacency List Streaming Model
    PODS 2019 (with J. Kallaugher, E. Price, S. Vorotnikova)
  23. Compact Representation of Uncertainty In Clustering
    NeurIPS 2018 (with C. Greenberg, A. Kobren, N. Monath, P. Flaherty, A. McCallum)
  24. Connect the Dots to Prove It: A Novel Way to Learn Proof Construction
    SIGCSE 2018 (with M. McCartin-Lim, B. Woolf)
  25. A Simple, Space-Efficient, Streaming Algorithm for Matchings in Low Arboricity Graphs
    SOSA 2018 (with S. Vorotnikova)
  26. Storage Capacity as an Information-Theoretic Analogue of Vertex Cover
    ISIT 2017 (with A. Mazumdar, S. Vorotnikova)
  27. Better Approximation of The Streaming Maximum Coverage Problem
    ICDT 2017 (with H. Vu)
  28. Planar Matchings in Streams Revisited
    APPROX 2016 (with S. Vorotnikova)
  29. Stochastic Streams: Sample Complexity vs. Space Complexity
    ESA 2016 (with M. Crouch, G. Valiant, D. Woodruff)
  30. Better Algorithms for Counting Triangles in Data Streams
    PODS 2016 (with S. Vorotnikova, H. Vu)
  31. Sketching, Embedding, and Dimensionality Reduction for Information Spaces
    AISTATS 2016 (with A. Abdullah, R. Kumar, S. Vassilvitskii, S. Venkatasubramanian)
  32. Kernelization via Sampling with Applications to Dynamic Graph Streams
    SODA 2016 (with R. Chitnis, G. Cormode, H. Esfandiari, M. Hajiaghayi, M. Monemizadeh, S. Vorotnikova)
  33. Run Generation Revisited: What Goes Up May or May Not Come Down
    ISAAC 2015 (with M. Bender, S. McCauley, S. Singh, H. T. Vu)
  34. Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution
    ICDM 2015 (with O. Simpson, C. Seshadhri)
  35. Densest Subgraph in Dynamic Graph Streams
    MFCS 2015 (with D. Tench, S. Vorotnikova, H. Vu)
  36. Correlation Clustering in Data Streams
    ICML 2015 (with K. Ahn, G. Cormode, S. Guha, A. Wirth)
  37. Evaluating Bayesian Networks via Data Streams
    COCOON 2015 (with H. T. Vu)
  38. Vertex and Hyperedge Connectivity in Dynamic Graph Streams
    PODS 2015 (with S. Guha, D. Tench)
  39. Verifiable Stream Computation and Arthur–Merlin Communication
    CCC 2015 (with A. Chakrabarti, G. Cormode, J. Thaler, S. Venkatasubramanian)
  40. Trace Reconstruction Revisited
    ESA 2014 (with E. Price, S. Vorotnikova)
  41. Dynamic Graphs in the Sliding-Window Model
    ESA 2013 (with M. Crouch, D. Stubbs)
  42. Sketching Earth-Mover Distance on Graph Metrics
    APPROX 2013 (with D. Stubbs)
  43. Spectral Sparsification of Dynamic Graph Streams
    APPROX 2013 (with K. Ahn, S. Guha)
  44. Efficient Nearest-Neighbor Search in the Probability Simplex
    ICTIR 2013 (with K. Krstovski, D. Smith, H. Wallach)
  45. Homomorphic Fingerprints under Misalignments
    STOC 2013 (with A. Andoni, A. Goldberger, E. Porat)
  46. AUTOMAN: A Platform for Integrating Human-Based and Digital Computation
    OOPSLA 2012 (with D. Barowy, C. Curtsinger, E. Berger). [Press Coverage].
  47. Approximate Principal Direction Trees
    ICML 2012 (with M. McCartin-Lim, R. Wang)
  48. Space-Efficient Estimation of Statistics over Sub-Sampled Streams
    PODS 2012 (with A. Pavan, S. Tirthapura, D. Woodruff)
  49. Graph Sketches: Sparsfiers, Spanners, and Subgraphs
    PODS 2012 (with K. Ahn, S. Guha)
  50. Analyzing Graph Structure via Linear Measurements
    SODA 2012 (with K. Ahn, S. Guha)
  51. The Shifting Sands Algorithm
    SODA 2012 (with P. Valiant)
  52. Periodicity and Cyclic Shifts via Linear Sketches
    APPROX 2011 (with M. Crouch)
  53. A Platform for Scalable One-pass Analytics using MapReduce
    SIGMOD 2011 (with B. Li, E. Mazur, Y. Diao, P. Shenoy)
  54. Polynomial Fitting of Data Streams with Applications to Codeword Testing
    STACS 2011 (with A. Rudra, S. Uurtamo)
  55. Fast Query Expansion Using Approximations of Relevance Models
    CIKM 2010 (with M. Cartright, J. Allan, V. Lavrenko)
  56. The Limits of Two-Party Differential Privacy
    FOCS 2010 (with I. Mironov, T. Pitassi, O. Reingold, K. Talwar, S. Vadhan)
  57. Information Cost Tradeoffs for Augmented Index and Streaming Language Recognition
    FOCS 2010 (with A. Chakrabarti, G. Cormode, R. Kondapally)
  58. Conditioning and Aggregating Uncertain Data Streams: Going Beyond Expectations
    VLDB 2010 (with T. Tran, Y. Diao, L. Peng, A. Liu)
  59. Optimizing Linear Counting Queries Under Differential Privacy
    PODS 2010 (with C. Li, M. Hay, V. Rastogi, G. Miklau)
  60. Space-Efficient Estimation of Robust Statistics and Distribution Testing
    ICS 2010 (with S. Chien, K. Ligett)
  61. Probabilistic Histograms for Probabilistic Data
    VLDB 2009 (with G. Cormode, A. Deligiannakis, M. Garofalakis)
  62. The Oil Searching Problem
    ESA 2009 (with K. Onak, R. Panigrahy)
  63. Annotations in Data Streams
    ICALP 2009 (with A. Chakrabarti, G. Cormode).
  64. Sampling to Estimate Conditional Functional Dependencies
    SIGMOD 2009 (with G. Cormode, L. Golab, F. Korn, D. Srivastava, X. Zhang)
  65. Finding Metric Structure in Information Theoretic Clustering
    COLT 2008 (with K. Chaudhuri)
  66. Tight Lower Bounds for Multi-Pass Stream Computation via Pass Elimination
    ICALP 2008 (with S. Guha)
  67. Approximation Algorithms for Clustering Uncertain Data
    PODS 2008 (with G. Cormode)
  68. Robust Lower Bounds for Communication and Stream Computation
    STOC 2008 (with A. Chakrabarti, G. Cormode) Slides
  69. Sorting and Selection with Random Costs
    LATIN 2008 (with S. Angelov, K. Kunal)
  70. Declaring Independence via the Sketching of Sketches
    SODA 2008 (with P. Indyk)
  71. On the Hardness of Approximating Stopping and Trapping Sets in LDPC Codes
    ITW 2007 (with O. Milenkovic)
  72. Lower Bounds for Quantile Estimation in Random-Order and Multi-Pass Streaming
    ICALP 2007 (with S. Guha) Slides
  73. Checking and Spot-Checking of Heaps
    ICALP 2007 (with M. Chu, S. Kannan) Slides
  74. Sketching Information Divergences
    COLT 2007 (with S. Guha, P. Indyk)
    Invited to Special Issue of Machine Learning Journal
  75. Estimating Statistical Aggregates on Probabilistic Data Streams
    PODS 2007 (with T. S. Jayram, S. Muthukrishnan, E. Vee)
    Invited to Special Issue of ACM Transactions on Database Systems
  76. Space-Efficient Sampling
    AISTATS 2007 (with S. Guha)
  77. Island Hopping and Path Colouring with Applications to WDM Network Design
    SODA 2007 (with F.B. Shepherd) Slides
  78. A Near-Optimal Algorithm for Computing the Entropy of a Stream
    SODA 2007 (with A. Chakrabarti, G. Cormode) Slides
  79. Spatial Scan Statistics: Approximations and Performance Study
    KDD 2006 (with D. Agarwal, J. Phillips, S. Venkatasubramanian, Z. Zhu)
  80. Approximate Quantiles and the Order of the Stream
    PODS 2006 (with S. Guha)
  81. Streaming and Sublinear Approximation of Entropy and Information Distances
    SODA 2006 (with S. Guha, S. Venkatasubramanian)
  82. More on Reconstructing Strings from Random Traces: Insertions and Deletions
    ISIT 2005 (with S. Kannan) Slides
  83. Approximating the Best-Fit Tree Under L_p Norms
    APPROX 2005 (with B. Harb, S. Kannan) Slides
  84. Finding Matchings in the Streaming Model
    APPROX 2005 Slides
  85. Graph Distances in the Streaming Model: The Value of Space
    SODA 2005. (with J. Feigenbaum, S. Kannan, S. Suri, J. Zhang) Slides
  86. A Problem in Scheduling: Your Time Starts Now...
    FUN with Algorithms 2004
  87. List decoding of concatenated codes: improved performance estimates
    ISIT 2004 (with A. Barg) Slides
  88. On Graph Problems in a Semi-Streaming Model
    ICALP 2004 (with J. Feigenbaum, S. Kannan, S. Suri, J. Zhang)
    Invited to Special Issue of Theoretical Computer Science
  89. Reconstructing Strings from Random Traces
    SODA 2004 (with T. Batu, S. Kannan, S. Khanna)
  90. More on the Reliability Function of the Binary Symmetric Channel
    ISIT 2003 (with A. Barg) Slides
  91. Distance distribution of binary codes and the error probability of decoding
    WCC 2003 International Workshop on Coding and Cryptography (with A. Barg)
  92. Physical Modeling of Musical Instruments using Bond Graphs
    Brazilian Symposium on Computer Music 1999 (with E. Miranda, P. Gawthrop)
Manuscripts, Invited Papers, etc.:
  1. Graph Stream Algorithms: A Survey
    SIGMOD Record
  2. Graph Sketching and Streaming: New Approaches for Analyzing Massive Graphs
    CSR 2017
  3. Towards a Theory of Homomorphic Compression
    Invited Session at CiE 2013
  4. Better Bounds for Frequency Moments in Random-Order Streams
    Manuscript (with A. Andoni, K. Onak, R. Panigrahy)
  5. Graph Mining in Streams
    Entry for the Encyclopedia of Database Systems
  6. Processing Data Streams
    Ph.D. Thesis (Advisor: Sampath Kannan) Slides
  7. Open Questions In Data Streams and Related Topics
    Questions arising at the IITK Workshop on Algorithms for Data Streams 2006.
  8. Open Questions In Data Streams, Property Testing, and Related Topics
    Questions arising at Sublinear Algorithms 2011 and IITK Workshop on Algorithms for Data Streams 2009 (with P. Indyk, I. Newman, K. Onak)

NB: Since most of the papers above are published, the copyright has been transferred to the respective publishers. Therefore, the papers cannot be duplicated for commercial purposes. See below for ACM's copyright notice.

Copyright 20xy by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that new copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted.