Yanlei Diao

Assistant Professor
Department of Computer Science
University of Massachusetts Amherst

Email:     {first-name}@cs.umass.edu
Phone:     413.545.1135
Fax:         413.545.1249
Address:  Computer Science Research Center
                University of Massachusetts
                140 Governors Drive
                Amherst, MA 01003-9264

Curriculum Vitae (pdf)  
 




Research Interests

Information architectures and database management systems, with a focus on data streams, sensor data management, data dissemination, XML query processing, and learning-based data processing.

Member of the Database Group, the Center for Advanced RFID Research, and the Systems Group at UMass Amherst.

 


Projects

SASE: Complex Event Processing over Streams. We study stream processing in the context of large-scale event-based systems that are gaining adoption in applications such as supply chain management, surveillance, network and application monitoring, and environmental monitoring. These systems generate high volumes of events. End applications require these events to be filtered and correlated for complex pattern detection, aggregated on different temporal and geographic scales, and transformed into new events that reach a semantic level appropriate for the applications. We address issues in stream-based event processing ranging from query language design to computational complexity to efficient implementation.
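
As a rough illustration of pattern detection over an event stream, the sketch below finds events of types A, then B, then C that occur within a time window. It is a naive run-based matcher written for this page, not SASE's implementation: the `Event` fields, the function name, and the matching policy (irrelevant events are skipped and each partial match advances at the first suitable event) are assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    type: str  # event type, e.g. "A" (hypothetical field)
    ts: int    # timestamp (hypothetical field)


def match_sequence(events: Sequence[Event], pattern: Sequence[str],
                   window: int) -> List[Tuple[Event, ...]]:
    """Return every in-order match of the types in `pattern` whose first
    and last events are at most `window` time units apart."""
    runs: List[List[Event]] = []  # partial matches, oldest event first
    matches: List[Tuple[Event, ...]] = []
    for e in events:
        # Discard partial matches that can no longer finish inside the window.
        runs = [r for r in runs if e.ts - r[0].ts <= window]
        next_runs: List[List[Event]] = []
        for r in runs:
            if e.type == pattern[len(r)]:
                extended = r + [e]
                if len(extended) == len(pattern):
                    matches.append(tuple(extended))  # complete match
                else:
                    next_runs.append(extended)
            else:
                next_runs.append(r)  # skip an irrelevant event
        # Every event of the first type may start a new partial match.
        if e.type == pattern[0]:
            if len(pattern) == 1:
                matches.append((e,))
            else:
                next_runs.append([e])
        runs = next_runs
    return matches
```

For the stream A@1, B@2, C@3, A@10, C@12, B@13, C@20 with a window of 5, only (A@1, B@2, C@3) matches, because the partial match started at time 10 expires before the final C arrives.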

SPIRE: Inference and Compression over RFID Streams. Radio Frequency Identification (RFID) technology is gaining acceptance in an increasing number of applications for tracking and monitoring purposes. Despite its promise to provide unprecedented visibility in various domains, RFID technology presents numerous challenges, including incomplete and noisy data, lack of information about inter-object relationships, and high data volumes. In this project, we design and develop an efficient inference and compression system over RFID streams. It provides accurate interpretation of incomplete and noisy raw data; in particular, it infers the locations of unobserved objects and inter-object relationships such as collocation and containment. To handle high data volumes, SPIRE performs interpretation online, which enables online compression by identifying and discarding redundant data close to the hardware.
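
One simple way to discard redundant readings close to the hardware is to forward a reading only when it changes an object's last known location, since repeated readings of a stationary tag carry no new information. The sketch below assumes readings arrive as hypothetical `(timestamp, tag_id, location)` tuples and does not model SPIRE's actual inference of missing readings or containment; the last-seen table is only a crude stand-in for a location estimate of unobserved objects.

```python
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[int, str, str]  # (timestamp, tag_id, location): assumed format


def compress_readings(
        readings: Iterable[Reading]) -> Tuple[List[Reading], Dict[str, str]]:
    """Keep only readings that change a tag's last known location.

    Returns the compressed stream and the table of last known locations,
    which can serve as a naive estimate for tags not observed recently."""
    last_location: Dict[str, str] = {}
    output: List[Reading] = []
    for ts, tag, location in readings:
        if last_location.get(tag) != location:
            last_location[tag] = location
            output.append((ts, tag, location))
        # Otherwise the reading is redundant and is dropped.
    return output, last_location
```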

STONES: Low Power Sensor Databases. Recent advances in flash technology have enabled sensor nodes to be equipped with high-capacity local storage. We are designing new sensor databases on flash that support power-constrained processing and multi-resolution storage.
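
Multi-resolution storage can be pictured as keeping recent sensor samples at full resolution while replacing older samples with coarser summaries, which reduces the amount of data a node has to keep. The aging policy below (a fixed number of recent raw samples, older ones averaged in fixed-size blocks) and its parameter names are assumptions for illustration, not the STONES design.

```python
from typing import List, Sequence, Tuple


def multi_resolution(samples: Sequence[float], recent: int,
                     factor: int) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series into (coarse, fine) parts.

    The newest `recent` samples stay at full resolution; every block of
    `factor` older samples is summarized by its mean (the last block may
    be shorter)."""
    if recent < 0 or factor < 1:
        raise ValueError("recent must be >= 0 and factor >= 1")
    split = max(len(samples) - recent, 0)
    old, fine = list(samples[:split]), list(samples[split:])
    coarse = [sum(old[i:i + factor]) / len(old[i:i + factor])
              for i in range(0, len(old), factor)]
    return coarse, fine
```

For example, eight samples 1 through 8 with `recent=2` and `factor=3` become the coarse summaries [2.0, 5.0] plus the raw samples [7, 8].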

ONYX: Internet-Scale XML Data Dissemination. We study Internet-scale data dissemination that delivers XML-encoded documents from multiple publishing sites to millions of subscribers based on the subscribers' data interests. We explore the idea of content-based routing of documents in distributed dissemination systems. We seek to enhance such data dissemination with advanced services such as stateful publish/subscribe and QoS. We investigate implementations that are able to meet demanding efficiency and scalability requirements.
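
Content-based routing forwards a document only toward the parts of the network whose subscribers want it. In the hypothetical sketch below, each outgoing link of a broker is annotated with the path expressions of the subscriptions reachable through it, and a document is sent on every link with at least one matching expression. Path matching uses the limited XPath subset supported by Python's `xml.etree.ElementTree`; the link names and routing-table layout are assumptions, and ONYX's actual routing tables and query aggregation are not shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def route(document: str, routing_table: Dict[str, List[str]]) -> List[str]:
    """Return the outgoing links on which `document` should be forwarded.

    `routing_table` maps a link name to the path expressions of the
    subscriptions reachable through that link (hypothetical layout)."""
    root = ET.fromstring(document)
    selected = []
    for link, paths in routing_table.items():
        # Forward once per link if any downstream subscription matches.
        if any(root.find(path) is not None for path in paths):
            selected.append(link)
    return sorted(selected)
```

With a table mapping `link-east` to `.//story[@topic='sports']`, `link-west` to `.//story[@topic='finance']`, and `link-north` to `.//title`, a sports story with a title is forwarded on `link-east` and `link-north` only.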

Fast and Memory-Efficient Packet Content Scanning. Packet content scanning compares the packet payload against a set of patterns specified as regular expressions. Traditional methods for fast packet scanning have prohibitively high memory requirements. We develop regular expression rewrite techniques to reduce memory usage, and grouping schemes to increase regular expression matching speed without increasing memory usage. Our implementation can achieve orders-of-magnitude performance improvements over the implementations used in the Linux L7-filter and Snort systems. Such efficient packet content scanning enables new technologies such as real-time worm detection, content lookup in overlay networks, and fine-grained load balancing.
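
The grouping idea can be sketched as follows: patterns are split into groups, each group is compiled into one combined expression so that a single pass checks the whole group, and individual patterns are confirmed only in groups that fire. Python's `re` is a backtracking engine rather than the DFA-based matchers this work targets, so the sketch shows only the control flow, not the memory trade-off; the fixed group size and the confirmation step are assumptions, not the grouping algorithm from the papers.

```python
import re
from typing import List, Set, Tuple

Group = Tuple[re.Pattern, List[Tuple[int, re.Pattern]]]


def build_groups(patterns: List[bytes], group_size: int) -> List[Group]:
    """Combine consecutive patterns into alternations of `group_size` each."""
    groups: List[Group] = []
    for start in range(0, len(patterns), group_size):
        chunk = patterns[start:start + group_size]
        combined = re.compile(b"|".join(b"(?:" + p + b")" for p in chunk))
        members = [(start + i, re.compile(p)) for i, p in enumerate(chunk)]
        groups.append((combined, members))
    return groups


def scan(payload: bytes, groups: List[Group]) -> Set[int]:
    """Return the indices of all patterns that occur in `payload`."""
    hits: Set[int] = set()
    for combined, members in groups:
        if combined.search(payload):  # one scan per group
            # Confirm which members matched (an alternation reports only one).
            hits.update(i for i, p in members if p.search(payload))
    return hits
```

Groups that do not fire cost a single scan each, so the confirmation work is spent only where a match is already known to exist.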

YFilter: High-Volume XML Message Brokering. We design a message brokering system that provides fast, on-the-fly filtering of incoming XML messages for large numbers of simultaneous queries, and transforms the matching messages according to recipient-specific requirements. We explore key issues, including shared processing of queries for efficient and scalable filtering, and the use of the filtering solutions for customized result generation. We released YFilter 1.0, a freely available software system containing the filtering engine and the query workload generator of YFilter.
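
Shared processing of path queries can be illustrated by combining all queries into a single nondeterministic finite automaton (NFA), so that a location step common to many queries is processed once per element. The sketch below follows that idea for simple paths built from `/`, `//`, and `*` steps, driving the NFA with start and end element events and a stack of active state sets. Predicates, result customization, and YFilter's actual data structures are omitted; all names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, List, Set, Tuple


class State:
    """One NFA state; states are shared by all queries with a common prefix."""

    def __init__(self, self_loop: bool = False) -> None:
        self.children: Dict[str, "State"] = {}  # element name, "*" or "//"
        self.accepts: List[int] = []           # ids of queries accepted here
        self.self_loop = self_loop             # True for "//" states


def parse_path(path: str) -> List[Tuple[str, str]]:
    """Turn "/a//b/*" into [("/", "a"), ("//", "b"), ("/", "*")]."""
    steps, descendant = [], False
    for part in path.split("/")[1:]:
        if part == "":
            descendant = True
        else:
            steps.append(("//" if descendant else "/", part))
            descendant = False
    return steps


def build_nfa(queries: List[str]) -> State:
    root = State()
    for qid, query in enumerate(queries):
        node = root
        for axis, name in parse_path(query):
            if axis == "//":
                # "//" becomes a self-looping state that matches any element.
                node = node.children.setdefault("//", State(self_loop=True))
            node = node.children.setdefault(name, State())
        node.accepts.append(qid)
    return root


def closure(states: Set[State]) -> Set[State]:
    """Add the "//" states reachable without consuming an element."""
    return states | {s.children["//"] for s in states if "//" in s.children}


def filter_document(xml_text: str, root: State) -> Set[int]:
    """Return the ids of all queries matched by the document."""
    matched: Set[int] = set()
    stack = [closure({root})]  # active states for each open element
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            active: Set[State] = set()
            for s in stack[-1]:
                for key in (elem.tag, "*"):
                    if key in s.children:
                        active.add(s.children[key])
                if s.self_loop:
                    active.add(s)
            active = closure(active)
            for s in active:
                matched.update(s.accepts)
            stack.append(active)
        else:
            stack.pop()  # return to the parent's states on an end tag
    return matched
```

Because `/a/b`, `/a/*/c`, and `/a//d` share the state for their first step `a`, that step is evaluated once per element no matter how many queries begin with it.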

Stream-based XQuery Processing. We develop a memoization-based approach to shared processing for the full XQuery language in a stream-based environment. We implement the approach by extending the streaming XQuery processor that BEA Systems incorporates as part of their BEA WebLogic Integration 8.1 product. We demonstrate the effectiveness of the approach in typical use cases of XQuery.
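
Memoization lets queries that share a sub-expression reuse its result instead of recomputing it. The sketch below keys a cache on the sub-expression name and the current stream item, so two queries referencing the same sub-expression trigger one evaluation per item. The item format, the `net_price` sub-expression, and the class names are hypothetical; the approach described above operates inside a full streaming XQuery processor, which this does not attempt to model.

```python
from typing import Callable, Dict, List, Tuple

Item = Tuple[int, float]  # hypothetical stream item: (order_id, price)


class MemoTable:
    """Caches sub-expression results per (name, item) pair."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, Item], float] = {}
        self.evaluations = 0  # number of non-cached evaluations

    def get(self, name: str, fn: Callable[[Item], float], item: Item) -> float:
        key = (name, item)
        if key not in self._cache:
            self.evaluations += 1
            self._cache[key] = fn(item)
        return self._cache[key]


def net_price(item: Item) -> float:
    return item[1] * 0.9  # hypothetical shared sub-expression


def run_queries(stream: List[Item]) -> Tuple[List[int], List[int], int]:
    """Evaluate two queries that share `net_price` over the stream."""
    memo = MemoTable()
    expensive, cheap = [], []
    for item in stream:
        if memo.get("net_price", net_price, item) > 100:
            expensive.append(item[0])
        if memo.get("net_price", net_price, item) < 20:
            cheap.append(item[0])
    # A real streaming engine would also evict entries for items it has
    # finished with; this cache grows without bound.
    return expensive, cheap, memo.evaluations
```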

 


Teaching

Spring 2008, CMPSCI 445: Information Systems

Fall 2007, CMPSCI 445: Information Systems

Spring 2007, CMPSCI 645: Database Design and Implementation

Fall 2006, CMPSCI 691LL: Networked Information Systems

Spring 2006, CMPSCI 645: Database Design and Implementation (with Gerome Miklau)

 


Selected Publications

2008

Fast Packet Pattern Matching Algorithms. Fang Yu, Yanlei Diao, Randy Katz, T. V. Lakshman. Book Chapter. To appear in Algorithms for Next Generation Architectures edited by Graham Cormode and Marina Thottan.

Efficient Pattern Matching over Event Streams. Jagrati Agrawal, Daniel Gyllstrom, Yanlei Diao, and Neil Immerman. SIGMOD 2008. (pdf)

On Supporting Kleene Closure over Event Streams. Daniel Gyllstrom, Jagrati Agrawal, Yanlei Diao, and Neil Immerman. ICDE 2008. (pdf)

Efficient Data Interpretation and Compression over RFID Streams. Richard Cocci, Thanh Tran, Yanlei Diao, and Prashant Shenoy. ICDE 2008. (pdf) (tech report)

Publish/Subscribe over Streams. Yanlei Diao and Michael Franklin. Article. To appear in Encyclopedia of Database Systems. (pdf)

XML Publish/Subscribe. Yanlei Diao and Michael Franklin. Article. To appear in Encyclopedia of Database Systems. (pdf)

2007

SPIRE: Scalable Processing of RFID Event Streams. Richard Cocci, Yanlei Diao, and Prashant Shenoy. In Proceedings of the 5th RFID Academic Convocation, April 2007. (pdf)

SASE+: An Agile Language for Kleene Closure over Event Streams. Yanlei Diao, Neil Immerman, and Daniel Gyllstrom. UMass Technical Report 07-03. (pdf)

SASE: Complex Event Processing over Streams. Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg, and Gordon Anderson. In Proceedings of the Third Biennial Conference on Innovative Data Systems Research (CIDR 2007), Asilomar, CA, January 2007. Demo proposal. (pdf)

Re-thinking Data Management for Storage-centric Sensor Networks. Yanlei Diao, Deepak Ganesan, Gaurav Mathur, and Prashant Shenoy. In Proceedings of the Third Biennial Conference on Innovative Data Systems Research (CIDR 2007), Asilomar, CA, January 2007. (pdf)

2006

Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection. Fang Yu, Zhifeng Chen, Yanlei Diao, T.V. Lakshman, and Randy H. Katz. In Proceedings of ACM / IEEE Symposium on Architectures for Networking and Communications Systems (ANCS 2006), San Jose, CA, December 3-5, 2006. (pdf)

High-Performance Complex Event Processing over Streams. Eugene Wu, Yanlei Diao, and Shariq Rizvi. SIGMOD 2006, June 2006. (pdf) (ppt)


Before 2006

Yanlei Diao. Query Processing for Large-Scale XML Message Brokering. PhD Dissertation. August 2005. ACM SIGMOD Dissertation Award Honorable Mention. (pdf)

YFilter 1.0 code release. October 2004.

Yanlei Diao, Shariq Rizvi, and Michael J. Franklin. Towards an Internet-Scale XML Dissemination Service. In Proceedings of VLDB 2004, August 2004. (pdf) (ppt)

Yanlei Diao, Daniela Florescu, Donald Kossmann, Michael J. Carey, and Michael J. Franklin. Implementing Memoization in a Streaming XQuery Processor. In Proceedings of the 2nd International XML Database Symposium (XSym 2004), August 2004. (pdf)

Yanlei Diao, Michael J. Franklin. Query Processing for High-Volume XML Message Brokering. In Proceedings of VLDB 2003, September 2003. (pdf) (ppt)

Yanlei Diao, Mehmet Altinel, Michael J. Franklin, Hao Zhang, Peter Fischer. Path Sharing and Predicate Evaluation for High-Performance XML Filtering. ACM TODS, December 2003. (pdf)

Yanlei Diao and Michael J. Franklin. High-Performance XML Filtering: An Overview of YFilter. IEEE Data Engineering Bulletin, March 2003. (pdf)

Yanlei Diao, Peter Fischer, Michael Franklin, Raymond To. YFilter: Efficient and Scalable Filtering of XML Documents. Demo paper. In Proceedings of ICDE 2002, February 2002. (pdf)

Yanlei Diao, Hongjun Lu, Songting Chen, Zengping Tian. Toward Learning Based Web Query Processing. In Proceedings of VLDB 2000, September 2000. (pdf) (ppt)

Songting Chen, Yanlei Diao, Hongjun Lu, Zengping Tian. FACT: A Learning Based Web Query Processing System. Demo paper. In Proceedings of SIGMOD 2000, May 2000. (ppt)

Yanlei Diao, Hongjun Lu, Dekai Wu. A Comparative Study of Classification Based Personal E-mail Filtering. In Proceedings of PAKDD 2000, April 2000. (ps) (pdf)

 


Invited Talks

"SASE+: Expressing and Processing Complex Event Patterns over Streams"

  • New England Database Day, Cambridge, MA, Feb 4, 2008
  • Microsoft Research Center, Seattle, WA, Oct 12, 2007
  • StreamBase, Lexington, MA, Feb 16, 2007

"High-Performance Complex Event Processing over Streams"

  • Hong Kong Baptist University, China, Jan 16, 2007
  • Fudan University, China, Sep 7, 2006

"Query Processing for Large-Scale XML Message Brokering"

  • Cisco Systems Inc., San Jose, CA, Jan 5, 2007
  • Tsinghua University, China, Sep 6, 2006
  • Beijing University, China, Sep 5, 2006
  • Distinguished Faculty Lecture Series, University of Texas at Austin, Dec 12-14, 2005
  • AT&T Research Lab, Dec 15, 2005

 


Service and Activities

Program Committee Member

  • Co-chair, International Workshop on Data Management for Sensor Networks (DMSN 2008)
  • International Conference on Very Large Data Bases (VLDB), 2007, 2008
  • VLDB PhD Workshop, 2008
  • International Conference on Data Engineering (ICDE), 2007, 2008
  • International Workshop on Networking Meets Databases (NetDB), 2008
  • International Workshop on RFID Data Management (RFDM), 2008
  • International Workshop on Data Management for Sensor Networks (DMSN), 2007
  • International Workshop on Scalable Stream Processing Systems (SSPS), 2007
  • International XML Database Symposium (XSym), 2006
  • ACM International Conference on Management of Data (SIGMOD), Demonstrations, 2005

Journal reviewer

  • ACM Transactions on Database Systems (TODS)
  • ACM Transactions on Information Systems (TOIS)
  • ACM Transactions on Internet Technologies (TOIT)
  • International Journal on Very Large Data Bases (VLDB Journal)
  • Journal of Computer Science and Technology (JCST)

 


Useful Links

DBLP
ACM SIGMOD
Database Conferences: SIGMOD, VLDB, PODS, ICDE, EDBT, SIGKDD, WWW... 
Database Journals: ACM TODS, SIGMOD Record, VLDB Journal, TKDE, IEEE Data Engineering Bulletin
New England Database Society

Academic advice:

  • "Tips for writing technical papers", Jennifer Widom, Stanford University.
  • "On the academic interview circuit: an end-to-end discussion", Ugur Cetintemel, Brown University. SIGMOD Record 2001
  • "Interviewing During a Tight Job Market", Qiong Luo, Hong Kong University of Science and Technology, and Zachary Ives, University of Pennsylvania. SIGMOD Record 2002
  • "Time Management for New Faculty", Anastassia Ailamaki, Carnegie Mellon University, and Johannes Gehrke, Cornell University. SIGMOD Record 2003
  • Tao Xie's collection

XML data sources:

 

 


Last Modified: January 2006