|
Research Interests
|
Information architectures and database management systems, with a focus on data streams,
sensor data management, data dissemination, XML query processing, and learning-based data
processing.
Database and Information Management Lab,
co-directed with Prof. Gerome Miklau (DBLab wiki)
Systems Group at UMass Amherst, member
Center for Advanced RFID Research, member
|
|
Current Projects
|
CLARO: Uncertain Data Stream Processing.
The goal of this project is to design and develop a stream processing system that captures data uncertainty from data collection to query processing to final result generation.
Such uncertain data stream processing is crucial to many real-world applications such as hazardous weather monitoring, object tracking and monitoring, and traffic monitoring.
To achieve this goal, the project takes a principled approach, grounded in probability and statistical theory, that treats uncertainty as a first-class citizen and integrates efficiently into high-volume stream processing. It makes two main contributions: the first is to capture the uncertainty of raw data streams emanating from sensing devices; the second is to capture uncertainty as data propagates through query processing operators.
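As a rough illustration of uncertainty propagating through query operators, the sketch below assumes that each tuple carries an independent Gaussian value. It pushes that distribution through a sum aggregate and a threshold predicate. The names `Gaussian`, `sum_independent`, and `prob_greater` are hypothetical and are not taken from CLARO; the Gaussian and independence assumptions are ours, chosen only to keep the example short.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """An uncertain value modeled as N(mean, var). Hypothetical type for illustration."""
    mean: float
    var: float


def sum_independent(values):
    """Sum aggregate over independent Gaussians: means add and variances add."""
    return Gaussian(sum(v.mean for v in values), sum(v.var for v in values))


def prob_greater(value, threshold):
    """Selection predicate: return P(X > threshold) for X ~ N(mean, var)."""
    if value.var == 0:
        return 1.0 if value.mean > threshold else 0.0
    z = (threshold - value.mean) / math.sqrt(2.0 * value.var)
    return 0.5 * (1.0 - math.erf(z))


# One window of uncertain sensor readings (made-up numbers).
window = [Gaussian(1.0, 0.5), Gaussian(2.0, 0.25), Gaussian(3.0, 0.25)]
total = sum_independent(window)
print(total, prob_greater(total, 6.0))
```

The aggregate result is itself a distribution rather than a single number, so a downstream predicate returns a probability instead of a yes/no answer.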
SASE: Complex Event Processing over Streams.
We study stream processing in the context of large-scale event-based systems, which are gaining
adoption in applications such as supply chain management, surveillance, network and application
monitoring, and environmental monitoring. These systems create high volumes of events. End
applications require these events to be filtered and correlated for complex pattern detection,
aggregated on different temporal and geographic scales, and transformed into new events that
reach a semantic level appropriate for the applications. We address issues in stream-based event
processing ranging from the query language to computational complexity to fast implementation.
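To make "correlated for complex pattern detection" concrete, here is a minimal sketch of one such pattern: an event of one type followed by an event of another type for the same tag within a time window. The `Event` tuple, the `match_sequence` function, and the event type names are hypothetical. The sketch assumes events arrive in timestamp order and is not the SASE query language or engine.

```python
from collections import namedtuple

# Hypothetical event record: event type, tag identifier, timestamp.
Event = namedtuple("Event", ["type", "tag", "ts"])


def match_sequence(events, first_type, second_type, window):
    """Yield (a, b) where a.type == first_type, b.type == second_type,
    both share a tag, and a.ts < b.ts <= a.ts + window.
    Assumes `events` is ordered by timestamp."""
    pending = {}  # tag -> first-type events that may still find a partner
    for ev in events:
        if ev.type == second_type:
            # Drop candidates that have fallen out of the window.
            live = [a for a in pending.get(ev.tag, []) if ev.ts - a.ts <= window]
            pending[ev.tag] = live
            for a in live:
                if a.ts < ev.ts:
                    yield (a, ev)
        if ev.type == first_type:
            pending.setdefault(ev.tag, []).append(ev)


stream = [
    Event("SHELF", "t1", 1),
    Event("SHELF", "t2", 2),
    Event("EXIT", "t1", 5),
    Event("EXIT", "t2", 20),
]
matches = list(match_sequence(stream, "SHELF", "EXIT", window=10))
print(matches)
```

Only the `t1` pair falls inside the 10-unit window; the `t2` exit arrives too late to match.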
SPIRE: RFID Data Management.
Radio Frequency Identification (RFID) technology is gaining acceptance in an increasing number of
applications for tracking and monitoring purposes. Despite its promise to provide unprecedented
visibility in various domains, RFID technology presents numerous challenges, including incomplete
and noisy data, lack of information about inter-object relationships, and high volumes. In this
project, we design and develop an efficient inference and compression system over RFID streams.
It provides accurate interpretation of incomplete and insufficient raw data; in particular, it
infers locations of unobserved objects and inter-object relationships such as collocation and
containment. To handle high data volumes, SPIRE performs online interpretation, enabling online
compression by identifying and discarding redundant data close to the hardware.
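The idea of online compression by discarding redundant data can be sketched as follows. This is a deliberately simplified illustration and not SPIRE's inference algorithm. It assumes each reading already has a location attached and treats a reading as redundant when it repeats the last known location of the same tag. The function name `compress_readings` and the reading layout are hypothetical.

```python
def compress_readings(readings):
    """Yield only readings that change a tag's last known location.

    `readings` is an iterable of (timestamp, tag_id, location) tuples,
    assumed to arrive in timestamp order."""
    last_location = {}  # tag_id -> most recently emitted location
    for ts, tag, loc in readings:
        if last_location.get(tag) != loc:
            last_location[tag] = loc
            yield (ts, tag, loc)


raw = [
    (1, "a", "dock"),
    (2, "a", "dock"),   # redundant: tag "a" is still at the dock
    (3, "b", "dock"),
    (4, "a", "shelf"),
    (5, "b", "dock"),   # redundant: tag "b" has not moved
]
compressed = list(compress_readings(raw))
print(compressed)
```

Because the function is a generator that keeps only one entry per tag, it can run incrementally near the reader rather than after the data has been collected.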
STONES: Low Power Sensor Databases.
Recent advances in flash technology have enabled sensor nodes to be equipped with high-capacity
local storage. We are designing new sensor databases on flash that support power-constrained
processing and multi-resolution storage.
|
|
Past Projects
|
Fast and Memory-Efficient Packet Content Scanning.
Packet content scanning compares the packet payload against a set of patterns specified as regular
expressions. Memory requirements using traditional methods for fast packet scanning are
prohibitively high. We develop regular expression rewrite techniques to reduce memory usage, and
grouping schemes to increase the regular expression matching speed without increasing memory
usage. Our implementation achieves orders-of-magnitude performance improvements over the
implementations used in the Linux L7-filter and the Snort system. Such efficient packet content
scanning enables new technologies such as real-time worm detection, content lookup in overlay
networks, and fine-grained load balancing.
ONYX: Internet-Scale XML Data Dissemination.
We study Internet-scale data dissemination that delivers XML-encoded documents from multiple
publishing sites to millions of subscribers based on the subscribers' data interests. We explore
the idea of content-based routing of documents in distributed dissemination systems. We seek to
enhance such data dissemination with advanced services such as stateful publish/subscribe and QoS.
We investigate implementations that are able to meet demanding efficiency and scalability
requirements.
YFilter: High-Volume XML Message Brokering.
We design a message brokering system that provides fast, on-the-fly filtering of incoming XML
messages for large numbers of simultaneous queries, and transforms the matching messages according
to recipient-specific requirements. We explore key issues, including shared processing of
queries for efficient and scalable filtering, and leveraging the filtering solutions for customized
result generation. We released YFilter 1.0, a freely available software system containing the
filtering engine and the query workload generator of YFilter.
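A minimal sketch of shared processing of queries, assuming the queries are simple child-axis paths such as `/news/head/title`: queries with a common prefix share trie nodes, so each element name in a message is examined once regardless of how many queries are registered. The functions `build_trie` and `match_path` and the sample queries are hypothetical. The sketch ignores wildcards, descendant axes, and predicates, and it is not the YFilter 1.0 API.

```python
def build_trie(queries):
    """Build a prefix-sharing trie from {query_id: '/a/b/c'} path queries."""
    root = {"children": {}, "accept": []}
    for qid, path in queries.items():
        node = root
        for step in path.strip("/").split("/"):
            node = node["children"].setdefault(step, {"children": {}, "accept": []})
        node["accept"].append(qid)
    return root


def match_path(trie, element_path):
    """Return ids of all queries satisfied along one root-to-element path,
    given as a list of element names."""
    matched = []
    node = trie
    for name in element_path:
        node = node["children"].get(name)
        if node is None:
            break  # no registered query continues along this path
        matched.extend(node["accept"])
    return matched


queries = {
    "q1": "/news/head/title",
    "q2": "/news/head",
    "q3": "/news/body",
}
trie = build_trie(queries)
print(sorted(match_path(trie, ["news", "head", "title"])))
print(match_path(trie, ["news", "body", "p"]))
```

Here `q1` and `q2` share the nodes for `news` and `head`, so adding more queries with that prefix adds no extra work for those two steps.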
Stream-based XQuery Processing.
We develop a memoization-based approach to shared processing for the full XQuery language in a
stream-based environment. We implement the approach by extending the streaming XQuery processor
that BEA Systems incorporates as part of its BEA WebLogic Integration 8.1 product. We demonstrate
the effectiveness of the approach in typical XQuery use cases.
|