Proto-Value Functions Group

Welcome to the Proto-Value Functions (PVF) research website. Proto-value functions are automatically learned basis functions that are useful in approximating task-specific value functions and compactly representing powers of transition matrices. Applications of proto-value functions include solving Markov decision processes, developing new algorithms for reinforcement learning, and planning.




An example of how proto-value functions reflect the large-scale geometric structure of an environment. A three-room "grid-world" and a corresponding Laplacian (or Fourier) proto-value basis function are shown. Note how the PVF captures the structure inherent in the state space. These basis functions are task-independent and can support the learning of multiple tasks in this environment.



Proto-value functions provide a novel framework for solving the credit assignment problem, which is a fundamental challenge facing any AI system. The framework provides a unified approach to learning both representation and behavior. It offers a new way of solving Markov decision processes and reinforcement learning problems, using multiscale spectral and manifold learning methods. Previous work on manifold and spectral methods has largely focused on dimensionality reduction, (semi-)supervised learning, and clustering. Furthermore, manifold and spectral techniques have mostly focused on Laplacian or Fourier-based global approaches.

Laplacian proto-value functions are learned by constructing a directed or undirected graph connecting "nearby" states or state-action pairs, and diagonalizing the random walk matrix representing the Markov diffusion process on the graph. These basis functions are eigenvector-based and global in nature: their support is the entire state space. In contrast, diffusion wavelet proto-value functions are formed by performing a multiscale diffusion analysis of the random walk diffusion matrix. In this approach, the constructed basis functions are compact and represent an integrated temporal and spatial abstraction of the underlying diffusion process.
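The Laplacian construction can be illustrated with a minimal sketch. It is not the group's software. It assumes an undirected, connected graph and uses the symmetric normalized Laplacian, whose eigenvalues are one minus those of the random walk matrix, so that a symmetric eigensolver can be applied. The two-room grid layout and the names grid_adjacency and laplacian_basis are hypothetical choices made only for this example.

```python
import numpy as np

def grid_adjacency(rows, cols, walls=frozenset()):
    """Adjacency matrix of a 4-connected grid graph, skipping wall cells."""
    cells = [(r, c) for r in range(rows) for c in range(cols)
             if (r, c) not in walls]
    index = {cell: i for i, cell in enumerate(cells)}
    W = np.zeros((len(cells), len(cells)))
    for (r, c), i in index.items():
        # Linking each cell to its right and lower neighbours covers every edge once.
        for dr, dc in ((0, 1), (1, 0)):
            j = index.get((r + dr, c + dc))
            if j is not None:
                W[i, j] = W[j, i] = 1.0
    return W, cells

def laplacian_basis(W, k):
    """Return the k smallest eigenvalues and eigenvectors of the normalized Laplacian."""
    d = W.sum(axis=1)  # vertex degrees; assumes no isolated states
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # L = I - D^{-1/2} W D^{-1/2}, symmetric because W is symmetric.
    L = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvals[:k], eigvecs[:, :k]

# Hypothetical layout: two rooms separated by a wall with a single doorway.
rows, cols = 5, 11
walls = frozenset((r, 5) for r in range(rows) if r != 2)
W, cells = grid_adjacency(rows, cols, walls)

eigvals, pvfs = laplacian_basis(W, k=6)
print(pvfs.shape)  # one row per state, one column per basis function
```

Because the columns of pvfs are orthonormal, the least-squares weights for approximating a value function vector v in this basis are simply pvfs.T @ v. For large state spaces a dense eigendecomposition would be impractical, and a sparse eigensolver would be the more likely choice.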

Together, the Laplacian and wavelet-based manifold learning methods hold the promise of a new generation of powerful tools for solving MDPs and RL problems, including ways of approximating value functions that respect geodesic distances on the underlying manifold; faster methods of policy evaluation and novel variants of policy iteration where both the representation and the optimal policy can be learned simultaneously; algorithms for hierarchical reinforcement learning where the underlying hierarchy is learned automatically; novel approaches to transfer learning by transferring shared representations; and reinforcement learning methods that do not require (task-specific) rewards.