My Learning Channel

Classical Conditioning [RL / Reinforcement Learning]

2022-09-18


Introduction

Pavlov's dog: ring a bell, then give food.

The mouse with an implanted fear memory: the mouse walks into a room and receives an electric shock; after repeated trials it learns a fear memory of the room.


Framing It as a Reinforcement Learning Problem

Pavlov's dog: ring a bell, give food; the next time the bell rings, the dog salivates.

1. Pavlov's dog:

1. state (s):

food (US), bell (CS)

Conditioned Stimulus - Unconditioned Stimulus, i.e. CS - US

2. reward (r):

fixed

a good outcome: r(food) > 0 / Consequence

3. value (v):

a value that can change through learning; learning = updating v(bell)

craving: v(bell) > 0

Expected consequence, subjective (from the agent)

2. The mouse with an implanted fear memory:

1. state (s):

electric shock (US)

room (CS)

Conditioned Stimulus - Unconditioned Stimulus, i.e. CS - US

2. reward (r):

fixed

a bad outcome: r(shock) < 0 / Consequence

3. value (v):

a value that can change through learning; learning = updating v(room)

fear: v(room) < 0

Expected consequence, subjective (from the agent)
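The mapping above can be sketched in a few lines of code. This is only an illustration of the distinction between fixed rewards and learnable values; the dictionaries and the specific numbers are my own, not from the post:

```python
# fixed, objective rewards r: properties of the environment (the US)
r = {"food": +1, "shock": -1}

# learnable, subjective values v of the conditioned stimuli;
# both are neutral before any learning
v = {"bell": 0.0, "room": 0.0}

# after conditioning, the agent's values come to track the reward
# that each CS predicts:
v["bell"] = +1.0   # craving: v(bell) > 0, the bell predicts food
v["room"] = -1.0   # fear:    v(room) < 0, the room predicts a shock
```

The point is that r is fixed by the environment, while v lives inside the agent and is the only thing that learning changes.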

Algorithm

Rescorla-Wagner model:

V_cs = V_cs + A * (V_us * us - V_cs * cs)

*Note: cs and us are binary (yes/no) variables indicating whether each stimulus is present on the trial.

*A is the learning rate, with range [0, 1).
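As a worked example of the update rule (the numbers are my own, chosen for illustration): start from V_cs = 0 with A = 0.1 and V_us = 1, and suppose the CS is paired with the US on every trial (cs = us = 1):

```python
A, V_us = 0.1, 1.0
V_cs = 0.0
cs = us = 1  # both stimuli present on each trial

# trial 1: V_cs <- 0 + 0.1 * (1*1 - 0*1) = 0.1
V_cs = V_cs + A * (V_us * us - V_cs * cs)

# trial 2: V_cs <- 0.1 + 0.1 * (1 - 0.1) = 0.19
V_cs = V_cs + A * (V_us * us - V_cs * cs)
```

Each trial closes a fraction A of the remaining gap between V_cs and V_us, so with perfect pairing V_cs converges toward V_us = 1.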

Code

```python
# conditioning: CS, US, P(US|CS), R(US) => each trial: (CS, US = 0 or 1, R = 0 or 1)
# learning: Rescorla-Wagner model, trial-by-trial learning; learning parameter: A as learning rate
# variation to be tested: several compounded CS
```
```python
import numpy as np
import matplotlib.pyplot as plt
```
```python
# environment creator
def gen_trial():
    cs = True  # a single conditioned stimulus, present on every trial
    us = np.random.rand(1) > .2  # in 80% of trials the CS is followed by a US
    trial = [cs, us]
    return trial
```
```python
# settings for the learner
A = .1  # learning rate
```
```python
# now run the experiment
Ntr = 30
V_us = 1  # initial value of the US (e.g., food) is positive -- an attractive stimulus
V_cs = 0  # initial value of the CS (e.g., a beep sound) is neutral
V_arr = np.ones(Ntr)
for k in range(Ntr):
    V_arr[k] = V_cs
    [cs, us] = gen_trial()
    V_cs = V_cs + A * (V_us * us - V_cs * cs)
```
```python
plt.plot(np.arange(Ntr), V_arr)
plt.xlabel('trial num')
plt.ylabel('V(cs)')
plt.show()
```

(Figure: the learning curve of V(cs) over trials.)
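Since the US follows the CS with probability 0.8, the expected update is zero when V_cs = 0.8 * V_us, so the learning curve should fluctuate around 0.8. Here is a quick sanity check of that fixed point; it is a self-contained sketch of the same simulation, and the trial count and random seed are my own choices:

```python
import numpy as np

A, V_us = 0.1, 1.0
rng = np.random.default_rng(0)

V_cs = 0.0
tail = []
for k in range(3000):
    cs = 1
    us = rng.random() > .2          # US follows the CS on ~80% of trials
    V_cs = V_cs + A * (V_us * us - V_cs * cs)
    if k >= 2000:                   # collect values after convergence
        tail.append(V_cs)

# the average of the late trials hovers near the fixed point 0.8
mean_tail = np.mean(tail)
```

With a higher learning rate A the curve converges faster but fluctuates more widely around 0.8; a smaller A gives a slower but smoother curve.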

