Yanlei Diao

Assistant Professor
Department of Computer Science
University of Massachusetts Amherst

Email: {first-name}@cs.umass.edu
Phone: 413.545.1135
Fax: 413.545.1249
Address: Department of Computer Science Room 232
140 Governors Drive
Amherst, MA 01003-9264
Assistant: Rachel Lavery {last-name}@cs.umass.edu



Research Interests

Information architectures and database management systems, with a focus on data streams, sensor data management, data dissemination, XML query processing, and learning-based data processing.

Database and Information Management Lab, co-directed with Prof. Gerome Miklau (DBLab wiki)
Systems Group at UMass Amherst, member
Center for Advanced RFID Research, member

 


Current Projects

CLARO: Uncertain Data Stream Processing. The goal of this project is to design and develop a stream processing system that captures data uncertainty from data collection to query processing to final result generation. Such uncertain data stream processing is crucial to many real-world applications such as hazardous weather monitoring, object tracking and monitoring, and traffic monitoring. To achieve this goal, our project takes a principled approach, grounded in probability and statistical theory, that supports uncertainty as a first-class citizen, and integrates this approach efficiently into high-volume stream processing. The project has two main contributions. The first is to capture the uncertainty of raw data streams emanating from sensing devices. The second is to capture uncertainty as data propagates through various query processing operators.
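
As a small illustration of uncertainty propagating through a query operator, the sketch below sums a window of uncertain readings. It is not CLARO's implementation: it assumes each reading is an independent Gaussian, and the names Gaussian, uncertain_sum, and prob_above are hypothetical. Under that assumption the sum is again Gaussian, with the means added and the variances added, so a downstream predicate can be answered with a probability rather than a yes/no value.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """An uncertain reading modeled as a normal distribution (illustrative assumption)."""
    mean: float
    var: float


def uncertain_sum(readings):
    """Sum independent Gaussian readings: the means add and the variances add."""
    total_mean = sum(r.mean for r in readings)
    total_var = sum(r.var for r in readings)
    return Gaussian(total_mean, total_var)


def prob_above(g, threshold):
    """Return P(X > threshold) for X ~ N(g.mean, g.var)."""
    z = (threshold - g.mean) / math.sqrt(2.0 * g.var)
    return 0.5 * math.erfc(z)


# Hypothetical window of three sensor readings (mean, variance).
window = [Gaussian(10.0, 1.0), Gaussian(12.0, 4.0), Gaussian(11.0, 4.0)]
total = uncertain_sum(window)
```

Here total carries both a value and its uncertainty, and prob_above(total, 36.0) gives the probability that the true sum exceeds 36. Correlated readings or non-Gaussian distributions would need a different treatment.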

SASE: Complex Event Processing over Streams. We study stream processing in the context of large-scale event-based systems that are gaining adoption in applications such as supply chain management, surveillance, network and application monitoring, and environmental monitoring. These systems create high volumes of events. End applications require these events to be filtered and correlated for complex pattern detection, aggregated on different temporal and geographic scales, and transformed into new events that reach a semantic level appropriate for the applications. We address issues involved in stream-based event processing, ranging from the query language to computational complexity to fast implementation.
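
To make "filtered and correlated for complex pattern detection" concrete, here is a minimal sketch of one kind of pattern: an event of one type followed by an event of another type, for the same key, within a time window. It is not the SASE query language or engine. The function detect_sequences, the event layout (timestamp, type, key), and the event type names are all assumptions made for illustration, and the input is assumed to arrive in timestamp order.

```python
from collections import defaultdict


def detect_sequences(events, first_type, second_type, window):
    """Return (key, first_ts, second_ts) for each second_type event that follows
    a first_type event with the same key by at most `window` time units.

    `events` is an iterable of (timestamp, type, key) in timestamp order.
    """
    pending = defaultdict(list)  # key -> timestamps of earlier first_type events
    matches = []
    for ts, etype, key in events:
        if etype == first_type:
            pending[key].append(ts)
        elif etype == second_type:
            # Drop first_type events that have fallen out of the window.
            live = [t for t in pending[key] if ts - t <= window]
            pending[key] = live
            # Every surviving first event pairs with this second event.
            matches.extend((key, t, ts) for t in live)
    return matches


# Hypothetical readings: an item seen on a shelf, then at an exit.
events = [
    (1, "SHELF", "tag1"),
    (2, "SHELF", "tag2"),
    (4, "EXIT", "tag1"),
    (20, "EXIT", "tag2"),
]
found = detect_sequences(events, "SHELF", "EXIT", window=10)
```

With these inputs only tag1 produces a match, because tag2's exit arrives 18 time units after its shelf reading. A real engine would also need negation, aggregation, and a policy for selecting among multiple candidate matches.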

SPIRE: RFID Data Management. Radio Frequency Identification (RFID) technology is gaining acceptance in an increasing number of applications for tracking and monitoring purposes. Despite its promise to provide unprecedented visibility in various domains, RFID technology presents numerous challenges, including incomplete and noisy data, lack of information about inter-object relationships, and high volumes. In this project, we design and develop an efficient inference and compression system over RFID streams. It provides accurate interpretation of incomplete and insufficient raw data; in particular, it infers locations of unobserved objects and inter-object relationships such as collocation and containment. To handle high data volumes, SPIRE performs online interpretation, enabling online compression by identifying and discarding redundant data close to the hardware.

STONES: Low Power Sensor Databases. Recent advances in flash technology have enabled sensor nodes to be equipped with high-capacity local storage. We are designing new sensor databases on flash that support power-constrained processing and multi-resolution storage.

 


Past Projects

Fast and Memory-Efficient Packet Content Scanning. Packet content scanning compares the packet payload against a set of patterns specified as regular expressions. Traditional methods for fast packet scanning have prohibitively high memory requirements. We develop regular expression rewrite techniques to reduce memory usage, and grouping schemes to increase the regular expression matching speed without increasing memory usage. Our implementation can achieve orders-of-magnitude performance improvements over the implementations used in the Linux L7-filter and the Snort system. Such efficient packet content scanning enables new technologies such as real-time worm detection, content lookup in overlay networks, and fine-grained load balancing.
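
The basic idea of scanning a payload against grouped patterns can be sketched as follows. This is only an illustration of grouping, not the project's rewrite or grouping algorithms: Python's re module is a backtracking matcher rather than the automaton-based matcher such work targets, the fixed group_size split is an arbitrary choice, and compile_groups, scan, and the sample patterns are hypothetical. The patterns are assumed to contain no named groups of their own.

```python
import re


def compile_groups(patterns, group_size):
    """Combine byte-string patterns into alternation groups of at most `group_size`.

    Each pattern is wrapped in a named group p<index> so a match can be
    traced back to the pattern that produced it.
    """
    groups = []
    for start in range(0, len(patterns), group_size):
        chunk = patterns[start:start + group_size]
        combined = b"|".join(
            b"(?P<p%d>" % (start + offset) + pattern + b")"
            for offset, pattern in enumerate(chunk)
        )
        groups.append(re.compile(combined))
    return groups


def scan(payload, groups):
    """Return the sorted indexes of patterns that matched somewhere in `payload`.

    Caveat: an alternation reports one alternative per match position, so two
    patterns in the same group that overlap in the payload may hide each other.
    """
    hits = set()
    for group in groups:
        for match in group.finditer(payload):
            hits.add(int(match.lastgroup[1:]))
    return sorted(hits)


# Hypothetical signature set and payload.
patterns = [rb"GET /admin", rb"\x90{4,}", rb"cmd\.exe"]
groups = compile_groups(patterns, group_size=2)
payload = b"GET /admin HTTP/1.1\r\n\r\n" + b"\x90" * 8
matched = scan(payload, groups)
```

Each payload is scanned once per group rather than once per pattern, so fewer, larger groups mean fewer passes, at the cost of a larger combined matcher per group.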

ONYX: Internet-Scale XML Data Dissemination. We study Internet-scale data dissemination that delivers XML-encoded documents from multiple publishing sites to millions of subscribers based on the subscribers' data interests. We explore the idea of content-based routing of documents in distributed dissemination systems. We seek to enhance such data dissemination with advanced services such as stateful publish/subscribe and QoS. We investigate implementations that are able to meet demanding efficiency and scalability requirements.

YFilter: High-Volume XML Message Brokering. We design a message brokering system that provides fast, on-the-fly filtering of incoming XML messages for large numbers of simultaneous queries, and transforms the matching messages according to recipient-specific requirements. We explore the key issues, including shared processing of queries for efficient and scalable filtering, and leveraging the filtering solutions for customized result generation. We released YFilter 1.0, a freely available software system containing the filtering engine and the query workload generator of YFilter.

Stream-based XQuery Processing. We develop a memoization-based approach to shared processing for the full XQuery language in a stream-based environment. We implement the approach by extending the streaming XQuery processor that BEA Systems incorporates as part of their BEA WebLogic Integration 8.1 product. We demonstrate the effectiveness of the approach in typical use cases of XQuery.

 


Students

PhD Students
    Thanh Tran
    Junghee Jo
    Boduo Li
    Liping Peng
    Haopeng Zhang

MS Students
    Jagrati Agrawal

Graduated MS Students
    Richard Cocci (State Street Corporation, Boston)
    Ravishankar Guruswamy Rajamony (Goldman Sachs)
    Daniel Gyllstrom (UMass)

Visiting Students
    Zhao Cao
    Yanming Nie