UMass Amherst
Department of Computer Science
 


The Analytic Web

The Internet has irrevocably changed the way in which scientific research is undertaken. Huge volumes of data, stored on computers all over the world, are now available to scientists everywhere. As a result, observations taken around the globe can be accessed quickly by scientists, raising the prospect of accelerated formulation and validation of scientific hypotheses. Extensive computing power, mass storage, and fast Internet access seem poised to foster rapid expansion of scientific knowledge.

But there are risks associated with this attractive scenario. Anyone who uses shared data must understand how the scientists who produced the data acquired and processed them. Failure to take this information into account can lead to misuse of the data and, in turn, to misleading or incorrect results.

To address these issues, the Analytic Web project, funded by the NSF ITR program, is investigating computer support for web-based scientific processes. This project brings together researchers from the UMass Computer Science Department and from Harvard Forest to explore automated support for defining, analyzing, and automating scientific processes. Professors Lee Osterweil and Lori Clarke are leading the software engineering effort; Research Assistant Professor David Jensen is providing statistical analysis expertise; and Vision Lab researchers Professor Ed Riseman, Professor Al Hanson, and Senior Research Scientist Howard Schultz are working closely with the Harvard Forest researchers on collecting data and carefully defining ecological processes. The ecologists from Harvard Forest (Emery Boose, Aaron Ellison, David Foster, and Julian Hadley) are concerned with measuring and predicting forest carbon dioxide sequestration.

The ecologists gather data from a flux tower, a 10-meter structure located in Petersham, MA, in the midst of Harvard Forest. The flux tower takes in ambient air and measures the percentage of carbon dioxide in the air five times per second. These measurements are affected by various natural factors such as temperature, wind speed, and tree species (identified from aerial photographs that are evaluated by the Vision Lab). The ecologists apply and evaluate a number of cleansing, estimation, and statistical processes with the aim of determining a model of forest carbon dioxide sequestration. Such findings could have a substantial impact on policies for controlling greenhouse gases, which contribute to global warming. These processes also serve as excellent case studies for the researchers’ investigation into support for the Analytic Web.

Central to this investigation is a careful study of the models needed to represent scientific processes effectively. This aspect of the work builds upon Osterweil’s ongoing research aimed at developing languages for the specification of processes. Originally focused on languages for defining software development processes, this work has recently widened its focus to address processes in such diverse areas as medical procedures, government functions, and electronic commerce. This work has led to the development of Little-JIL, a graphical language that incorporates representations of such semantic issues as exception management, resource utilization, timing constraints, and concurrency control. “These are all essential to the articulate definition of processes, but most are absent from current process definition languages,” says Osterweil. “Thus our intention is to use Little-JIL as a starting point in our efforts to model scientific processes, expecting that experience will point the way towards modifications and enhancements needed to support working scientists.”
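To make two of the features listed above more concrete, here is a minimal sketch of a hierarchical process with exception management. It is plain Python, not Little-JIL notation, and every name in it (Step, SensorGap, check_readings, drop_missing, the sample readings) is an assumption made up for illustration. Resource utilization, timing constraints, and concurrency control are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """A process step: a leaf with an action, or a parent of substeps."""
    name: str
    action: Optional[Callable[[dict], None]] = None
    substeps: list = field(default_factory=list)
    # Maps an exception type to the handler step run when that type is raised.
    handlers: dict = field(default_factory=dict)

    def run(self, data, log):
        log.append(self.name)
        try:
            if self.action is not None:
                self.action(data)
            for substep in self.substeps:  # substeps run sequentially
                substep.run(data, log)
        except Exception as error:
            for exc_type, handler in self.handlers.items():
                if isinstance(error, exc_type):
                    handler.run(data, log)
                    return
            raise  # no handler at this level: propagate to the parent step


class SensorGap(Exception):
    """Hypothetical exception: one or more readings are missing."""


def check_readings(data):
    if None in data["co2"]:
        raise SensorGap("missing reading")


def drop_missing(data):
    data["co2"] = [value for value in data["co2"] if value is not None]


cleanse = Step(
    "cleanse data",
    substeps=[Step("check readings", action=check_readings)],
    handlers={SensorGap: Step("drop missing readings", action=drop_missing)},
)

data = {"co2": [0.038, None, 0.039]}  # made-up sample values
log = []
cleanse.run(data, log)
```

Attaching the handler to the parent step keeps the recovery policy in the process definition itself rather than buried inside the code of each action.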

Figure 1: Little-JIL Process for Cleansing Data

Figure 1 shows a small part of a Little-JIL process for cleansing the carbon dioxide data collected from a flux tower. One of the team’s first findings has been the need to complement the process model with a derivation model. A derivation model is similar to a data-flow model or state diagram, in that it shows how types of data are processed, but it must also distinguish data instances, as illustrated in Figure 2.

The derivation model and process model together carefully document the processing applied to various instances of the dataset. The description is adequate to be used as the basis for execution. Thus, in documenting their processes, scientists are provided with an execution framework. Although there is considerable work to be done on the models and the user interface for such a framework, the ecologists have already found it preferable to their current programming environment (don’t ask!). “In the future, we plan to investigate using such models to support automatic rederivation and configuration management,” remarks Osterweil.

Figure 2: Derivation Model for Carbon Dioxide Sequestration Evaluation
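The idea of a derivation model that distinguishes data instances can be sketched in a few lines of Python. This is an illustrative assumption about how such a record might look, not the project's actual representation: the class names, the version-number scheme, and the step names "cleanse" and "fit_model" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataInstance:
    """One concrete dataset instance (hypothetical naming scheme)."""
    name: str     # data type, e.g. "co2_raw"
    version: int  # distinguishes instances of the same type


@dataclass
class DerivationGraph:
    """Records which process step derived each instance from which inputs."""
    # derived instance -> (process step name, tuple of input instances)
    derived_from: dict = field(default_factory=dict)

    def record(self, output, step, inputs):
        self.derived_from[output] = (step, tuple(inputs))

    def lineage(self, instance):
        """Return every instance the given one depends on, directly or not."""
        seen = set()
        stack = [instance]
        while stack:
            current = stack.pop()
            _step, inputs = self.derived_from.get(current, (None, ()))
            for parent in inputs:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def stale_after_change(self, changed):
        """Return derived instances that depend on `changed`."""
        return {out for out in self.derived_from if changed in self.lineage(out)}


raw = DataInstance("co2_raw", 1)
cleansed = DataInstance("co2_cleansed", 1)
model = DataInstance("sequestration_model", 1)

graph = DerivationGraph()
graph.record(cleansed, "cleanse", [raw])
graph.record(model, "fit_model", [cleansed])
```

With such a record, a query like graph.stale_after_change(raw) identifies which derived instances would have to be recomputed if the raw data changed, which is the kind of question automatic rederivation would need to answer.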

Another central theme in this project is analyzing the soundness of scientific processes. Clarke is leading this effort, building upon her previous research in finite-state verification. In this research, Clarke and her colleagues are developing an analyzer, called FLAVERS, capable of determining whether user-specified sequences of events, desirable or undesirable, can occur on any execution of a concurrent system. In this project, the team is investigating how such analysis techniques can be applied to Little-JIL process models. Eventually they would like to build upon the work of Jensen and Tim Oates (Ph.D. ’00) to specify and detect unreliable statistical processes. They are also exploring the consistency relationships between the process and derivation models.
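The general idea behind this kind of analysis can be shown with a toy example. The sketch below is not FLAVERS and does not model concurrency; it only searches a small, hypothetical flow graph together with a property automaton to see whether an undesirable event sequence (analyzing data before cleansing it) is possible on some path. The graph, the event names, and the property are all assumptions chosen for illustration.

```python
from collections import deque

# Hypothetical flow graph: node -> (event emitted, successor nodes).
FLOW = {
    "start":   (None,      ["collect"]),
    "collect": ("collect", ["cleanse", "analyze"]),  # one branch skips cleansing
    "cleanse": ("cleanse", ["analyze"]),
    "analyze": ("analyze", []),
}


def property_step(state, event):
    """Property automaton: 'analyze' must not occur before 'cleanse'."""
    if state == "violation":
        return "violation"
    if event == "cleanse":
        return "cleansed"
    if event == "analyze" and state == "uncleansed":
        return "violation"
    return state


def find_violation(flow, start="start"):
    """Breadth-first search over (node, property state) pairs.

    Returns a path of nodes that drives the property automaton into the
    violation state, or None if no path through the flow graph can.
    """
    initial = (start, property_step("uncleansed", flow[start][0]))
    queue = deque([(initial, [start])])
    visited = {initial}
    while queue:
        (node, state), path = queue.popleft()
        if state == "violation":
            return path
        for successor in flow[node][1]:
            next_state = property_step(state, flow[successor][0])
            pair = (successor, next_state)
            if pair not in visited:
                visited.add(pair)
                queue.append((pair, path + [successor]))
    return None


counterexample = find_violation(FLOW)

# Removing the branch that skips cleansing makes the property hold.
SAFE_FLOW = dict(FLOW)
SAFE_FLOW["collect"] = ("collect", ["cleanse"])
safe_result = find_violation(SAFE_FLOW)
```

Because the search visits each (node, property state) pair at most once, it terminates even on graphs with loops, and a returned path serves as a counterexample that shows how the undesirable sequence can arise.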

“Although we are still in the early stages of this project, we have successfully defined and automated a few of the carbon dioxide sequestration measurement processes,” says Clarke. Visualization, execution, and easy modification of these processes have been demonstrated at an ecological conference. Work on improving the model representations and the associated analyses is underway. Eventually the researchers want to make the Analytic Web framework available to the general scientific community. Through this framework, they hope to provide support for defining, executing, and analyzing scientific processes that should foster safe reuse of data and processes and facilitate scientific discovery. “Ultimately we hope to see these scientific processes made available to students in universities, colleges, and high schools, in order to bring the challenges and excitement of scientific discovery into laboratories and classrooms around the country and the world,” says Clarke.

     



© 2008 University of Massachusetts Amherst. Site Policies.
This site is maintained by the Department of Computer Science.