Machine Learning and Friends Lunch

Concurrent Decision Making in Markov Decision Processes

Abstract

Concurrent decision making and coordination have been recognized as fundamental problems in many areas of robotics, control, and computer science, and as a formidable challenge in Artificial Intelligence in particular. By concurrent decision making we refer to a class of problems that require agents to accomplish long-term goals by executing multiple activities concurrently. In general, the problem is difficult to solve because it requires learning and planning over a combinatorial set of interacting concurrent activities that have uncertain outcomes and compete for limited resources in the system.

In this talk we present a general framework for modeling the concurrent decision making problem based on semi-Markov decision processes (SMDPs). Our approach uses a centralized control formalism, in which a central control mechanism initiates, executes, and monitors concurrent activities. This view also captures the type of concurrency that exists in single-agent domains, where a single agent can perform multiple activities simultaneously by exploiting the degrees of freedom (DOF) in the system. We present a set of coordination mechanisms that our model uses to monitor the execution and termination of concurrent activities. These mechanisms incorporate various natural activity completion schemes based on the individual termination of each activity. We provide theoretical results that establish the correctness of the model semantics, which allows us to apply standard SMDP learning and planning techniques to the concurrent decision making problem.
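As a rough illustration of how a completion scheme can be derived from the individual termination of each activity, the sketch below computes when a set of concurrently running activities counts as finished. The two scheme names ("any" and "all"), the function name, and the sample durations are assumptions made for this example; the abstract does not specify which schemes the model uses.

```python
def decision_epoch_length(durations, scheme):
    """Return the number of time steps until a concurrent activity set completes.

    durations: time steps each individual activity needs before it terminates.
    scheme:    hypothetical completion rule, assumed for this sketch:
               "any" -> the set completes when the first activity terminates
               "all" -> the set completes when every activity has terminated
    """
    if not durations:
        raise ValueError("at least one activity is required")
    if scheme == "any":
        return min(durations)
    if scheme == "all":
        return max(durations)
    raise ValueError(f"unknown completion scheme: {scheme!r}")


# Three activities started together, terminating after 3, 5, and 2 steps.
sample_durations = [3, 5, 2]
any_length = decision_epoch_length(sample_durations, "any")
all_length = decision_epoch_length(sample_durations, "all")
```

Under either rule, the concurrent set behaves like a single temporally extended action with a variable duration, which is the kind of object an SMDP is designed to model.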

Although standard SMDP solution methods apply in principle, they do not scale to concurrent decision making systems with many degrees of freedom. This problem is a classic example of the curse of dimensionality in the action space: the set of concurrent activities grows exponentially as the system admits more degrees of freedom. To alleviate this problem, we develop a novel decision-theoretic framework motivated in spirit by the coarticulation phenomenon investigated in speech and motor control research. We show that applying coarticulation to systems with excess degrees of freedom naturally generates concurrency. We present a set of theoretical results that characterize the efficiency of concurrent decision making under the coarticulation framework, compared with the case in which the agent may only execute activities sequentially (i.e., no coarticulation). We also present a set of techniques for scaling the coarticulation framework to large domains. We empirically evaluate our algorithms in a set of simulated domains, ranging from an agent navigating a grid world while performing concurrent activities to a simulated domain with multiple degrees of freedom in which tasks can be performed concurrently.
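The exponential growth of the concurrent activity set can be made concrete with a small enumeration. The sketch below assumes, purely for illustration, that each degree of freedom independently selects one activity from its own list; the function name and activity labels are hypothetical and not taken from the talk.

```python
from itertools import product


def joint_activities(activities_per_dof):
    """Enumerate every concurrent activity as one choice per degree of freedom.

    activities_per_dof: a list with one entry per degree of freedom, each entry
    being the list of activities that degree of freedom can execute (an
    assumption of this sketch, not a definition from the talk).
    """
    return list(product(*activities_per_dof))


# Hypothetical system: every degree of freedom offers the same 3 activities.
activities = ["reach", "hold", "release"]
sizes = {k: len(joint_activities([activities] * k)) for k in range(1, 6)}
# With m activities per degree of freedom and k degrees of freedom,
# the joint set has m ** k elements.
```

Each added degree of freedom multiplies the number of joint activities by the number of choices it offers, so any method that enumerates the joint set becomes impractical quickly.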
