Faculty Recruiting Support CICS

Recent Computer Science Ph.D. Graduates (Fall 2012)

Recent Computer Science Ph.D. graduates (Fall 2012)

Michael Bendersky; Information Retrieval with Query Hypergraphs; (W. Bruce Croft, Advisor); Sept. 2012; Software Engineer, Google, Inc.

In this thesis we focus on verbose natural language search queries. To this end, we propose an expressive query representation based on query hypergraphs. Unlike the existing query representations, query hypergraphs model the dependencies between arbitrary concepts in the query, rather than dependencies between single query terms.  Query hypergraphs are parameterized by importance weights, which are assigned based on their contribution to the retrieval effectiveness. 

Query hypergraphs are not limited to modeling the explicit query, and we develop two methods for query expansion using query hypergraphs. In these methods, the expansion concepts may come either from the retrieval corpus or from a combination of external information sources. We empirically demonstrate that query hypergraphs are significantly more effective than many of the current state-of-the-art retrieval methods. Query hypergraphs improve the retrieval performance for all query types, and, in particular, they exhibit the highest effectiveness gains for verbose queries.

Filip Jagodzinski; Towards Large Scale Validation of Protein Flexibility Using Rigidity Analysis; (Ileana Streinu, Advisor); Sept. 2012; Assistant Professor, Dept. of Computer Science, Central Washington Univ.

Proteins flex and bend to perform their functions. At the atomic level, their motions cannot be observed. Rigidity analysis is a graph-based technique that infers the flexibility of molecules. Due to the lack of convenient tools for curating protein data, the usefulness of rigidity analysis in inferring biophysical properties has been demonstrated on only a handful of molecules. Conversely there is no agreed-upon choice of modeling of important stabilizing interactions.

We make progress towards large-scale validation of protein flexibility using rigidity analysis. We develop the KINARI software that permits automated curation of protein data. Rigidity analysis of protein biological assemblies generated by KINARI provides information that would be missed if only the unprocessed data were analyzed. We develop KINARI-Mutagen, which permits evaluation of the effects of mutations. Finally, we systematically vary the modeling of inter-atomic interactions and measure how rigidity parameters correlate with experimental data.

[Note:  Special thanks to Filip for all of his efforts as a Significant Bits graduate student liaison for the past six years.]

Jin Young Kim; Retrieval and Evaluation Techniques for Personal Information; (W. Bruce Croft, Advisor); Sept. 2012; Applied Researcher, Microsoft Bing

Providing an effective mechanism for personal information retrieval is important for many applications, and requires different techniques than have been developed for general web search. This thesis focuses on developing retrieval models and representations for personal search, and on designing evaluation frameworks that can be used to demonstrate retrieval effectiveness in a personal environment.

From the retrieval model perspective, personal information can be viewed as a collection of multiple document types each of which has unique metadata. Based on this perspective, we propose a retrieval model that exploits document metadata and multi-type structure. Proposed retrieval models were found to be effective in other structured document collections, such as movies and job descriptions.

Evaluating these methods is particularly challenging for personal information due to privacy issues. This thesis introduces a set of techniques that enables realistic and repeatable evaluation of techniques for personal information retrieval. In particular, we describe techniques for simulating test collections and show that game-based user studies can collect more realistic usage data with relatively small cost.

Scott Kuindersma; Variable Risk Policy Search for Dynamic Robot Control; (Roderic Grupen and Andrew Barto, Advisors); Sept. 2012; Postdoctoral Associate, MIT Computer Science and Artificial Intelligence Laboratory

In this thesis, I present efficient global and local risk-sensitive stochastic optimization algorithms suitable for performing policy search in variety of problems of interest to robotics researchers. These algorithms exploit new techniques in nonparameteric heteroscedastic regression to directly model the policy dependent distribution of cost. For local search, learned cost models can be used as critics for performing risk-sensitive gradient descent. Alternatively, decision-theoretic criteria can be applied to globally select policies to balance exploration and exploitation in a principled way, or to perform greedy minimization with respect to risk-sensitive criteria. This separation of learning and policy selection leads to variable risk control, where risk sensitivity can be flexibly adjusted and appropriate policies can be selected at runtime without requiring additional policy executions. I describe several experiments with the uBot-5 including learning dynamic arm motions to stabilize after large impacts, lifting heavy objects while balancing, and developing safe fall bracing behaviors.

Matthew Rattigan; Leveraging Relational Representations for Causal Discovery; (David Jensen, Advisor); Sept. 2012; Digital Analyst, Obama For America

This thesis represents a synthesis of relational learning and causal discovery, two subjects at the frontier of machine learning research.  Relational learning investigates algorithms for constructing statistical models of data drawn from multiple types of interrelated entities, and causal discovery investigates algorithms for constructing causal models from observational data. 

Traditionally, propositional (or "flat") data representations have dominated the statistical sciences.  These representations assume that data consist of independent and identically distributed (iid) entities which can be represented by a single data table.  More recently, data scientists have increasingly focused on "relational" data sets that consist of interrelated, heterogeneous entities.  However, relational learning and causal discovery are rarely combined. 

This unexplored topical intersection represents an opportunity for advancement, in which we can provide insight into the challenges found in each subject area.  By adopting a causal viewpoint, we can clarify the mechanisms that produce previously identified pathologies in relational learning.  Analogously, we can utilize relational data to establish and strengthen causal claims in ways that are impossible using only propositional representations.

Xiaobing Xue; Modeling Reformulation as Query Distributions; (W. Bruce Croft, Advisor); Sept. 2012; Software Developer, Twitter

Query reformulation modifies the original query with aim of providing a better representation of a user's information need and consequently improving the retrieval performance. Previous reformulation models typically generate words and phrases related to the original query, but do not consider how these words and phrases would fit together in realistic or actual queries.

A novel framework is proposed that models reformulation as a distribution of reformulated queries, where each reformulated query is associated with a probability indicating its importance. This framework considers a reformulated query as the basic unit and can capture the important query-level dependencies between words and phrases in a realistic or actual query. Since a reformulated query is the output of applying a single or multiple reformulation operations, this framework combines different operations such as query segmentation, query substitution and query deletion within the same framework. Moreover, a retrieval model is considered as an integrated part of this framework, which considers the reformulation model and the retrieval model jointly.

Tingxin Yan; Designing Novel Mobile Systems By Exploiting Sensing, User Context, and Crowdsourcing; (Deepak Ganesan, Advisor); Sept. 2012; Assistant Professor, Department of Computer Science & Engineering, Univ. of Arkansas

We focus first on the domain of context-aware mobile systems. We study the problem of how to incorporate user context into mobile operating system design by presenting a system named FALCON--an application-preloading engine, which infers user context from sensing data, learns associations between user context and application usage, and preloads applications to improve their responsiveness. Compared with existing caching schemes, Falcon improves the application responsiveness by 2 times.

The second focus is on the domain of participatory sensing. We explore the problem of improving image search accuracy by presenting a mobile service named CrowdSearch that achieves over 95% accuracy consistently across multiple categories of images with response time in a minute.

We then study the problem of image search under resource constraints, by presenting a mobile system named SenSearch that turns smartphones into micro image search engines, where images are collected, indexed, and transmitted using compact features that are two magnitudes smaller than their raw format. SenSearch improves the energy and bandwidth cost by five times compared with existing image search engines.