My Learning Channel

Classical Conditioning [RL / Reinforcement Learning]

2022-09-18


Introduction

Pavlov's dog: ring a bell, then give food.

The mouse with an implanted fear memory: the mouse walks into a room and receives an electric shock; after repeated trials it learns a fear memory of the room.


Framing It as a Reinforcement Learning Problem

Pavlov's dog: ring a bell, give food; the next time the bell rings, the dog salivates.

1. Pavlov's dog:

1. state (s):

food (US), bell (CS)

Conditioned Stimulus - Unconditioned Stimulus, i.e. CS - US

2. reward (r):

fixed

a good outcome: r(food) > 0 / Consequence

3. value (v):

a value that can change through learning; learning = updating v(bell)

craving: v(bell) > 0

Expected consequence, subjective (from the agent)

2. The mouse with an implanted fear memory:

1. state (s):

electric shock (US)

room (CS)

Conditioned Stimulus - Unconditioned Stimulus, i.e. CS - US

2. reward (r):

fixed

a bad outcome: r(shock) < 0 / Consequence

3. value (v):

a value that can change through learning; learning = updating v(room)

fear: v(room) < 0

Expected consequence, subjective (from the agent)
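The mapping above can be sketched in a few lines of code. This is only an illustration of the distinction between fixed rewards and learnable values; the dictionaries and the specific numbers are my own, not from the post:

```python
# fixed, objective rewards r: properties of the environment (the US)
r = {"food": +1, "shock": -1}

# learnable, subjective values v of the conditioned stimuli;
# both are neutral before any learning
v = {"bell": 0.0, "room": 0.0}

# after conditioning, the agent's values come to track the reward
# that each CS predicts:
v["bell"] = +1.0   # craving: v(bell) > 0, the bell predicts food
v["room"] = -1.0   # fear:    v(room) < 0, the room predicts a shock
```

The point is that r is fixed by the environment, while v lives inside the agent and is the only thing that learning changes.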

Algorithm

Rescorla-Wagner model:

V_cs = V_cs + A * (V_us * us - V_cs * cs)

*Note: cs and us are binary (yes/no) variables indicating whether each stimulus is present on the trial.

*A is the learning rate, with range [0, 1).
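As a worked example of the update rule (the numbers are my own, chosen for illustration): start from V_cs = 0 with A = 0.1 and V_us = 1, and suppose the CS is paired with the US on every trial (cs = us = 1):

```python
A, V_us = 0.1, 1.0
V_cs = 0.0
cs = us = 1  # both stimuli present on each trial

# trial 1: V_cs <- 0 + 0.1 * (1*1 - 0*1) = 0.1
V_cs = V_cs + A * (V_us * us - V_cs * cs)

# trial 2: V_cs <- 0.1 + 0.1 * (1 - 0.1) = 0.19
V_cs = V_cs + A * (V_us * us - V_cs * cs)
```

Each trial closes a fraction A of the remaining gap between V_cs and V_us, so with perfect pairing V_cs converges toward V_us = 1.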

Code

```python
# conditioning: CS, US, P(US|CS), R(US) => each trial: (CS, US = 0 or 1, R = 0 or 1)
# learning: Rescorla-Wagner model, trial-by-trial learning; learning parameter: A as learning rate
# variation to be tested: several compounded CS
```
```python
import numpy as np
import matplotlib.pyplot as plt
```
```python
# environment creator
def gen_trial():
    cs = True  # a single conditioned stimulus, present on every trial
    us = np.random.rand(1) > .2  # in 80% of trials the CS is followed by a US
    trial = [cs, us]
    return trial
```
```python
# settings for the learner
A = .1  # learning rate
```
```python
# now run the experiment
Ntr = 30
V_us = 1  # initial value of the US (e.g., food) is positive -- an attractive stimulus
V_cs = 0  # initial value of the CS (e.g., a beep sound) is neutral
V_arr = np.ones(Ntr)
for k in range(Ntr):
    V_arr[k] = V_cs
    [cs, us] = gen_trial()
    V_cs = V_cs + A * (V_us * us - V_cs * cs)
```
```python
plt.plot(np.arange(Ntr), V_arr)
plt.xlabel('trial num')
plt.ylabel('V(cs)')
plt.show()
```

(Figure: the learning curve of V(cs) over trials.)
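Since the US follows the CS with probability 0.8, the expected update is zero when V_cs = 0.8 * V_us, so the learning curve should fluctuate around 0.8. Here is a quick sanity check of that fixed point; it is a self-contained sketch of the same simulation, and the trial count and random seed are my own choices:

```python
import numpy as np

A, V_us = 0.1, 1.0
rng = np.random.default_rng(0)

V_cs = 0.0
tail = []
for k in range(3000):
    cs = 1
    us = rng.random() > .2          # US follows the CS on ~80% of trials
    V_cs = V_cs + A * (V_us * us - V_cs * cs)
    if k >= 2000:                   # collect values after convergence
        tail.append(V_cs)

# the average of the late trials hovers near the fixed point 0.8
mean_tail = np.mean(tail)
```

With a higher learning rate A the curve converges faster but fluctuates more widely around 0.8; a smaller A gives a slower but smoother curve.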

