A \(Sarsa(\lambda)\) algorithm based on double-layer fuzzy reasoning - MaRDI portal

A \(Sarsa(\lambda)\) algorithm based on double-layer fuzzy reasoning (Q473823)

scientific article; zbMATH DE number 6372503

    Statements

    A \(Sarsa(\lambda)\) algorithm based on double-layer fuzzy reasoning (English)
    24 November 2014
    Summary: Solving reinforcement learning problems in continuous spaces with function approximation is currently an active research topic in machine learning. On continuous-space problems, the classic \(Q\)-iteration algorithms based on lookup tables or function approximation converge slowly and make it difficult to derive a continuous policy. To overcome these weaknesses, we propose an algorithm named DFR-\(Sarsa(\lambda)\), based on double-layer fuzzy reasoning, and prove its convergence. In this algorithm, the first reasoning layer uses fuzzy sets over states to compute continuous actions; the second reasoning layer uses fuzzy sets over actions to compute the components of the \(Q\)-value. The two fuzzy layers are then combined to compute the \(Q\)-value function over the continuous action space. In addition, the algorithm uses the membership degrees of the activated rules in the two fuzzy reasoning layers to update the eligibility traces. Experimental results on the Mountain Car and Cart-pole Balancing problems show that DFR-\(Sarsa(\lambda)\) not only yields a continuous action policy but also has better convergence performance.
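
The two-layer structure described in the summary can be illustrated with a minimal sketch. It is not the authors' DFR-\(Sarsa(\lambda)\) update rules, which the summary does not spell out. It assumes a one-dimensional state and action, normalized triangular membership functions, a greedy or random candidate action per state rule, and accumulating traces in a linear Sarsa(\(\lambda\)) update. All names (`memberships`, `DoubleLayerFuzzySarsa`, `firing`, and the parameters) are hypothetical.

```python
import bisect
import random


def memberships(x, centers):
    """Normalized triangular membership degrees of x over sorted centers."""
    x = min(max(x, centers[0]), centers[-1])  # clamp into the covered range
    j = min(bisect.bisect_right(centers, x) - 1, len(centers) - 2)
    w = (x - centers[j]) / (centers[j + 1] - centers[j])
    mu = [0.0] * len(centers)
    mu[j], mu[j + 1] = 1.0 - w, w  # at most two neighbouring sets fire
    return mu


class DoubleLayerFuzzySarsa:
    """Illustrative sketch only: fuzzy state layer + fuzzy action layer."""

    def __init__(self, state_centers, action_centers,
                 alpha=0.1, gamma=0.99, lam=0.9):
        self.sc = list(state_centers)
        self.ac = list(action_centers)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        # One Q-value component and one trace per (state set, action set).
        self.theta = [[0.0] * len(self.ac) for _ in self.sc]
        self.trace = [[0.0] * len(self.ac) for _ in self.sc]

    def continuous_action(self, s, epsilon=0.0, rng=random):
        """Layer 1: blend per-rule candidate actions by state membership."""
        action = 0.0
        for ms, row in zip(memberships(s, self.sc), self.theta):
            if rng.random() < epsilon:
                k = rng.randrange(len(self.ac))  # exploratory candidate
            else:
                k = max(range(len(self.ac)), key=row.__getitem__)
            action += ms * self.ac[k]
        return action

    def firing(self, s, a):
        """Joint firing strength of every (state set, action set) pair."""
        mu_s = memberships(s, self.sc)
        mu_a = memberships(a, self.ac)  # layer 2: fuzzy sets over actions
        return [[ms * ma for ma in mu_a] for ms in mu_s]

    def q_value(self, s, a):
        """Combine both layers: firing-weighted sum of Q-value components."""
        phi = self.firing(s, a)
        return sum(p * t for prow, trow in zip(phi, self.theta)
                   for p, t in zip(prow, trow))

    def update(self, s, a, r, s_next, a_next, done=False):
        """One Sarsa(lambda) step; traces grow by the firing strengths."""
        target = r if done else r + self.gamma * self.q_value(s_next, a_next)
        delta = target - self.q_value(s, a)
        decay = self.gamma * self.lam
        for i, prow in enumerate(self.firing(s, a)):
            for j, p in enumerate(prow):
                self.trace[i][j] = decay * self.trace[i][j] + p
                self.theta[i][j] += self.alpha * delta * self.trace[i][j]
        return delta
```

In this sketch an agent would call `continuous_action` to pick an action, observe the reward and next state, pick the next action, and then call `update`; the traces would normally be reset to zero at the start of each episode.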

    Identifiers