Proto-Value Functions Group
Proto-value functions are a novel framework for solving the credit assignment problem, a fundamental challenge facing any AI system. The framework provides a unified approach to learning both representation and behavior, and a novel way of solving Markov decision processes (MDPs) and reinforcement learning (RL) problems using multiscale spectral and manifold learning methods. Previous work on manifold and spectral methods has largely focused on dimensionality reduction, (semi-)supervised learning, and clustering. Furthermore, manifold and spectral techniques have mostly relied on Laplacian or Fourier-based global approaches.

Laplacian proto-value functions are learned by constructing a directed or undirected graph connecting "nearby" states or state-action pairs, and diagonalizing the random walk matrix representing the Markov diffusion process on the graph. These basis functions are eigenvector-based and global in nature: their support is the entire state space. In contrast, diffusion wavelet proto-value functions are formed by a multiscale diffusion analysis of the random walk diffusion matrix. The basis functions constructed in this approach are compact, and represent an integrated temporal and spatial abstraction of the underlying diffusion process.

Together, the Laplacian and wavelet-based manifold learning methods hold the promise of a new generation of powerful tools for solving MDPs and RL problems, including: ways of approximating value functions that respect geodesic distances on the underlying manifold; faster methods of policy evaluation and novel variants of policy iteration in which the representation and the optimal policy are learned simultaneously; algorithms for hierarchical reinforcement learning in which the underlying hierarchy is learned automatically; novel approaches to transfer learning by transferring shared representations; and reinforcement learning methods that do not require (task-specific) rewards.
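
As a rough illustration of the Laplacian construction described above, the Python sketch below builds an undirected graph over the states of a small grid world, forms the symmetric normalized graph Laplacian, and takes its smoothest eigenvectors as basis functions. The grid domain, the function names (grid_adjacency, laplacian_basis, project), and the target function are hypothetical choices made for illustration; this is not the group's software. Working with the normalized Laplacian I - D^{-1/2} W D^{-1/2} instead of the random walk matrix P = D^{-1} W is an assumption made for numerical convenience: the two are related by a similarity transform, so the Laplacian eigenvalues are one minus those of P.

```python
import numpy as np


def grid_adjacency(rows, cols):
    """Adjacency matrix of an undirected 4-neighbor grid graph.

    Hypothetical example domain: each grid cell is one state, and
    "nearby" states are cells that share an edge.
    """
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if r + 1 < rows:  # neighbor below
                j = (r + 1) * cols + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < cols:  # neighbor to the right
                j = r * cols + (c + 1)
                W[i, j] = W[j, i] = 1.0
    return W


def laplacian_basis(W, k):
    """Return the k smallest eigenvalues and eigenvectors of the
    symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.

    Assumes every state has at least one neighbor (no zero degrees).
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvals[:k], eigvecs[:, :k]


def project(basis, v):
    """Least-squares approximation of v in the span of the basis columns."""
    weights, *_ = np.linalg.lstsq(basis, v, rcond=None)
    return basis @ weights


rows, cols = 5, 5
W = grid_adjacency(rows, cols)
eigvals, basis = laplacian_basis(W, k=6)

# Hypothetical target: a discounted distance-to-goal function with the
# goal in the top-left corner, standing in for a value function.
target = np.array([0.9 ** (r + c) for r in range(rows) for c in range(cols)])

approx = project(basis, target)
error = np.linalg.norm(target - approx)
print(f"approximation error with {basis.shape[1]} basis functions: {error:.4f}")
```

Because the eigenvectors are ordered by eigenvalue and the subspaces are nested, increasing k cannot increase the least-squares error, and using all 25 eigenvectors reproduces the target exactly. The sketch covers only the global Laplacian basis; the diffusion wavelet construction is not shown.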