Percentile Policies for Tracking of Markovian Random Processes with Asymmetric Cost and Observation

Author name not available

Publication date: 3 March 2017

Abstract: Motivated by wide-ranging applications such as video delivery over networks using Multiple Description Codes, congestion control, and inventory management, we study the state-tracking of a Markovian random process with a known transition matrix and a finite ordered state set. At each time step, the decision-maker must select a state as an action so as to minimize the total expected cost. The decision-maker faces asymmetries in both cost and observation: if the selected state is less than the actual state of the Markovian process, an under-utilization cost is incurred and only partial information about the actual state is revealed; otherwise, the decision incurs an over-utilization cost and reveals the actual state in full. We formulate this problem as a Partially Observable Markov Decision Process, which can be expressed as a dynamic program (DP) based on the last fully observed state and the time of that full observation. This formulation determines the sequence of actions to be taken between any two consecutive full observations of the actual state. However, this DP grows exponentially in the number of states, leaving little hope for a computationally feasible solution. We present a class of computationally tractable policies with a percentile structure. A generalization of binary search, this class of policies attempts at any given time to reduce the uncertainty by a given percentage. Among all percentile policies, we search for the one with the minimum expected cost. The result of this search is a heuristic policy, which we evaluate through numerical simulations. We show that it outperforms myopic policies and, under some conditions, performs close to the optimal policy. Furthermore, we derive a lower bound on the cost of the optimal policy that can be computed with low complexity and gives a measure of how close our heuristic policy is to the optimal policy.
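The abstract describes percentile policies only at a high level. The Python sketch below illustrates one plausible reading of the idea; it is not the authors' method. Several details are assumptions not stated in the source. The partial observation after under-utilization is taken to be the fact that the actual state exceeds the chosen one. Costs are linear in the gap, with hypothetical weights c_under and c_over. A single fixed percentile theta is applied at every step. All function names are hypothetical. The policy keeps a belief over the ordered states and acts at the smallest state whose cumulative belief reaches theta. With theta = 0.5 this is the median, which is the binary-search analogy. The "search" over percentile policies is shown here as a plain simulation-based grid search.

```python
import random

def percentile_action(belief, theta):
    """Smallest state whose cumulative belief reaches the percentile theta (0 < theta <= 1)."""
    cum = 0.0
    for state, p in enumerate(belief):
        cum += p
        if cum >= theta - 1e-12:  # small tolerance for floating-point sums
            return state
    return len(belief) - 1

def propagate(belief, P):
    """One Markov step of the belief: b'(j) = sum_i b(i) * P[i][j]."""
    n = len(belief)
    return [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]

def condition_above(belief, a):
    """ASSUMED partial observation: the actual state is known to be > a."""
    masked = [p if s > a else 0.0 for s, p in enumerate(belief)]
    total = sum(masked)
    return [p / total for p in masked]

def simulate(P, theta, steps, c_under=1.0, c_over=1.0, seed=0):
    """Average per-step cost of the fixed-percentile policy (hypothetical linear costs)."""
    rng = random.Random(seed)
    n = len(P)
    x = 0                                                # true state, initially fully observed
    belief = [1.0 if s == x else 0.0 for s in range(n)]
    total_cost = 0.0
    for _ in range(steps):
        x = rng.choices(range(n), weights=P[x])[0]       # the process moves
        belief = propagate(belief, P)                    # predicted belief before acting
        a = percentile_action(belief, theta)
        if a < x:                                        # under-utilization: partial observation
            total_cost += c_under * (x - a)
            belief = condition_above(belief, a)
        else:                                            # over-utilization: full observation
            total_cost += c_over * (a - x)
            belief = [1.0 if s == x else 0.0 for s in range(n)]
    return total_cost / steps

def best_percentile(P, thetas, steps=20000, **costs):
    """Grid search over candidate percentiles, keeping the one with the lowest simulated cost."""
    return min(thetas, key=lambda th: simulate(P, th, steps, **costs))
```

For example, with a three-state chain `P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]`, one could call `best_percentile(P, [i / 20 for i in range(1, 21)], c_under=2.0, c_over=1.0)`. In this sketch the true trajectory does not depend on the actions, so reusing the same seed gives every candidate theta the same sample path, a common-random-numbers comparison. A finer or per-step choice of percentiles would need a different search.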
