Cite this article
  • Chen Xiliang, Cao Lei, Shen Chi. Research on action sequence planning based on deep inverse reinforcement learning [J]. 国防科技, 2019, 40(4):

Research on Action Sequence Planning Based on Deep Inverse Reinforcement Learning
Chen Xiliang (陈希亮), Cao Lei (曹雷), Shen Chi (沈驰)
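Abstract:
Deep reinforcement learning depends heavily on the reward function when solving sequential decision-making tasks, yet the reward function suffers from sparse and delayed feedback. To address this, the paper proposes a method for generating and optimizing action sequences based on deep inverse reinforcement learning. The method reconstructs the reward function from expert demonstration trajectories, so that the implicit expert experience contained in high-quality demonstration data can be acquired and exploited and the regularities underlying the data can be mined. The reconstructed reward function is then combined with the environment's inherent reward function through reward shaping. The resulting reward function gives more timely and accurate feedback on the behavior of intelligent agents and greatly accelerates the convergence of reinforcement learning.
Keywords:  deep reinforcement learning  operational course of action  intelligentized warfare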
DOI:10.13943/j.issn 1671-4547.2019.04.12
Research on course of action planning based on deep inverse reinforcement learning
Abstract:
Deep reinforcement learning relies heavily on the reward function when solving sequential decision tasks, and the reward function suffers from sparse and delayed feedback. This paper proposes a method for generating and optimizing action sequences based on deep inverse reinforcement learning. The reward function is reconstructed from expert demonstration trajectories, so that the implicit expert experience in high-quality demonstration data is acquired and used and the underlying patterns in the demonstrations are mined. The reconstructed reward function is then combined with the environment's inherent reward function through reward shaping. The new reward function gives more timely and accurate feedback on the behavior of intelligent entities and greatly accelerates the convergence of reinforcement learning.
Key words:  deep reinforcement learning  course of action planning  smart warfare
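
The reward shaping step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two reward functions are combined, so the weighted sum used here is an assumption, and every name below (shape_reward, sparse_env_reward, learned_reward, weight) is hypothetical rather than taken from the paper. The learned reward is a hand-written stand-in for a function that inverse reinforcement learning would recover from expert trajectories.

```python
from typing import Callable, Sequence

State = Sequence[float]
RewardFn = Callable[[State, int], float]


def shape_reward(env_reward: RewardFn,
                 learned_reward: RewardFn,
                 weight: float = 0.5) -> RewardFn:
    """Build r(s, a) = r_env(s, a) + weight * r_learned(s, a).

    The additive form and the weight are assumptions made for illustration.
    """
    def shaped(state: State, action: int) -> float:
        return env_reward(state, action) + weight * learned_reward(state, action)
    return shaped


def sparse_env_reward(state: State, action: int) -> float:
    # Hypothetical sparse reward: feedback arrives only at the goal.
    return 1.0 if state[0] >= 10.0 else 0.0


def learned_reward(state: State, action: int) -> float:
    # Hypothetical stand-in for a reward recovered by inverse RL:
    # it gives a small bonus to action 1 in every state.
    return 0.1 * action


shaped = shape_reward(sparse_env_reward, learned_reward, weight=0.5)
```

With this sketch, a state far from the goal still yields a nonzero signal when the agent picks the action the stand-in reward favors, which is the sense in which a shaped reward can give earlier feedback than the sparse environment reward alone.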