SOPA: Scalable one-pass analytics on high volumes of data.
Recently, MapReduce has emerged as a popular programming model for processing large datasets using a cluster of machines. However, the MapReduce model is geared towards batch processing and requires the dataset to be fully loaded into the cluster before analytical queries can run. In this project, we examine, from a systems standpoint, what architectural design changes are necessary to bring the benefits of the MapReduce model to fast one-pass analytics. Our work includes theoretical and empirical analyses of existing MapReduce systems and the proposal of a new data analysis platform that employs advanced hashing and frequency analysis to enable scalable, fast one-pass analytics.
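As a rough illustration of what one-pass frequency analysis can look like, the sketch below implements the classic Misra-Gries frequent-items summary. It is a textbook technique chosen for illustration, not necessarily the method used in SOPA; the function name `misra_gries` and the parameter `k` are assumptions made for this example.

```python
from collections.abc import Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> dict[Hashable, int]:
    """Summarize a stream in one pass using at most k - 1 counters.

    Every item that occurs more than n / k times in a stream of length n
    is guaranteed to appear in the result. The stored counts are
    underestimates of the true frequencies (by at most n / k).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the
            # ones that reach zero. The new item is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # Hypothetical stream in which "a" is the only heavy hitter.
    stream = ["a"] * 6 + ["b", "c", "d", "a", "b"]
    print(misra_gries(stream, k=3))
```

The summary never holds more than k - 1 entries regardless of stream length, which is what makes this kind of analysis feasible without first loading the full dataset.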
CLARO: Uncertain Data Stream Processing.
The goal of this project is to design and develop a stream processing system that captures data uncertainty from data collection to query processing to final result generation.
Such uncertain data stream processing is crucial to many real-world applications such as hazardous weather monitoring and traffic monitoring.
To achieve this goal, our project takes a principled approach grounded in probability and statistical theory to support uncertainty as a first-class citizen, and integrates this approach efficiently into high-volume stream processing.
In particular, we aim to capture the uncertainty of raw data streams as they are produced, as well as changes in uncertainty as data propagates through various query processing operators.
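To make the idea of uncertainty propagating through operators concrete, here is a minimal sketch under strong simplifying assumptions: each reading is modeled as an independent Gaussian, a sum operator adds means and variances, and a filter reports the probability that a value exceeds a threshold. The `Gaussian` class and `windowed_sum` function are hypothetical names for this example and do not describe CLARO's actual data model.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Gaussian:
    """An uncertain value modeled as a normal distribution (assumption)."""
    mean: float
    variance: float

    def __add__(self, other: "Gaussian") -> "Gaussian":
        # For independent Gaussians, means and variances both add.
        return Gaussian(self.mean + other.mean, self.variance + other.variance)

    def prob_greater_than(self, threshold: float) -> float:
        """Return P(X > threshold) using the normal CDF."""
        if self.variance == 0.0:
            return 1.0 if self.mean > threshold else 0.0
        z = (threshold - self.mean) / math.sqrt(2.0 * self.variance)
        return 0.5 * (1.0 - math.erf(z))


def windowed_sum(readings: Iterable[Gaussian]) -> Gaussian:
    """Sum operator: the output carries the accumulated uncertainty."""
    total = Gaussian(0.0, 0.0)
    for reading in readings:
        total = total + reading
    return total


if __name__ == "__main__":
    # Two hypothetical sensor readings with different noise levels.
    total = windowed_sum([Gaussian(10.0, 4.0), Gaussian(12.0, 5.0)])
    print(total, total.prob_greater_than(25.0))
```

The point of the sketch is that the operator's output is itself a distribution, so a downstream filter can return a probability rather than a hard yes/no answer.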
STONES: Flash-based Data Management Systems.
Recent advances in flash technology have enabled embedded devices, personal computers, and high-end servers to be equipped with high-capacity flash memory and its packaged devices such as solid state drives (SSDs). Flash memory and SSDs provide faster random access and more energy-efficient operation than traditional hard disks. In this project, we are designing new storage systems and query processing algorithms for large-scale data analysis and high-performance databases that employ hybrid storage of flash memory and hard disks.
SASE: Complex Event Processing over Streams.
We study stream processing in the context of large-scale event-based systems that are gaining adoption in applications such as supply chain management, financial services, and network and application monitoring.
These systems create high volumes of events. End applications require these events to be filtered and correlated for complex pattern detection, aggregated on different temporal and geographic scales, and transformed into new events that reach a semantic level appropriate for the applications. We address issues involved in stream-based event processing, ranging from the query language to computational complexity to fast implementation.
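The correlation step described above can be sketched as a simple sequence pattern: an event of one kind followed by an event of another kind with the same key, within a time window. This is a simplified illustration only; it is not the SASE query language or its evaluation strategy, and the names `Event` and `match_sequence` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    kind: str         # event type, e.g. "login_failed"
    key: str          # correlation attribute, e.g. a user id
    timestamp: float  # seconds; the stream is assumed to be time-ordered


def match_sequence(
    events: Iterable[Event], first: str, second: str, window: float
) -> Iterator[tuple[Event, Event]]:
    """Yield (a, b) where a.kind == first, b.kind == second, both share
    a key, and b occurs at most `window` seconds after a.

    Assumes `first` and `second` are different kinds. Expired events are
    pruned only when a `second` event arrives for the same key, so this
    sketch does not bound memory for keys that never complete.
    """
    pending: dict[str, list[Event]] = {}
    for event in events:
        if event.kind == first:
            pending.setdefault(event.key, []).append(event)
        elif event.kind == second:
            candidates = pending.get(event.key, [])
            # Keep only earlier events that are still inside the window.
            live = [a for a in candidates
                    if event.timestamp - a.timestamp <= window]
            if live:
                pending[event.key] = live
            else:
                pending.pop(event.key, None)
            for a in live:
                yield (a, event)


if __name__ == "__main__":
    # Hypothetical monitoring stream.
    stream = [
        Event("login_failed", "u1", 0.0),
        Event("login_failed", "u2", 1.0),
        Event("password_reset", "u1", 5.0),
        Event("password_reset", "u2", 100.0),
    ]
    for a, b in match_sequence(stream, "login_failed", "password_reset", 10.0):
        print(a, "->", b)
```

Each matched pair could then be emitted as a new, higher-level event for the application.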
SPIRE: RFID Data Stream Processing.
Radio Frequency Identification (RFID) technology is gaining acceptance in an increasing number of applications for tracking and monitoring purposes. Despite its promise to provide unprecedented visibility in various domains, RFID technology presents numerous challenges, including incomplete and noisy data, lack of information about inter-object relationships, and high volumes.
In this project, we develop an RFID stream processing system that employs probabilistic inference to derive the locations of unobserved objects and inter-object relationships such as containment, and that further supports probabilistic query processing to derive high-level information.