Is it possible to create a new gym environment in OpenAI?

卜盛
2023-12-01

For an extremely small example environment, see my banana-gym.

Creating a new environment

See the main page of that repository:

The steps are:

Create a new repository with a PIP-package structure.

It should look like this:

gym-foo/
  README.md
  setup.py
  gym_foo/
    __init__.py
    envs/
      __init__.py
      foo_env.py
      foo_extrahard_env.py
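A minimal setup.py sketch for this layout (the package name, version, and dependency list are assumptions; adapt them to your project):

# setup.py
from setuptools import setup

setup(
    name='gym_foo',
    version='0.0.1',
    install_requires=['gym'],  # plus any other dependencies foo_env.py needs
)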

For the full contents of these files, follow the link above. What is not mentioned there is especially how some of the functions in foo_env.py should look. Looking at examples and at gym.openai.com/docs/ helps. Here is an example:

import gym


class FooEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        # Define self.action_space and self.observation_space here.
        pass

    def _step(self, action):
        """
        Parameters
        ----------
        action :

        Returns
        -------
        ob, reward, episode_over, info : tuple
            ob (object) :
                an environment-specific object representing your observation of
                the environment.
            reward (float) :
                amount of reward achieved by the previous action. The scale
                varies between environments, but the goal is always to increase
                your total reward.
            episode_over (bool) :
                whether it's time to reset the environment again. Most (but not
                all) tasks are divided up into well-defined episodes, and done
                being True indicates the episode has terminated. (For example,
                perhaps the pole tipped too far, or you lost your last life.)
            info (dict) :
                diagnostic information useful for debugging. It can sometimes
                be useful for learning (for example, it might contain the raw
                probabilities behind the environment's last state change).
                However, official evaluations of your agent are not allowed to
                use this for learning.
        """
        self._take_action(action)
        # self.env, getState() and hfo_py.IN_GAME are placeholders borrowed
        # from a soccer environment; replace them with your own simulation.
        self.status = self.env.step()
        reward = self._get_reward()
        ob = self.env.getState()
        episode_over = self.status != hfo_py.IN_GAME
        return ob, reward, episode_over, {}

    def _reset(self):
        pass

    def _render(self, mode='human', close=False):
        pass

    def _take_action(self, action):
        pass

    def _get_reward(self):
        """Reward is given for XY."""
        # FOOBAR and ABC are placeholder status values.
        if self.status == FOOBAR:
            return 1
        elif self.status == ABC:
            return self.somestate ** 2
        else:
            return 0
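For gym.make to find the environment, the two __init__.py files have to register it. A sketch of what they typically contain, assuming the layout above (the id string 'foo-v0' is a placeholder):

# gym_foo/__init__.py
from gym.envs.registration import register

register(
    id='foo-v0',  # placeholder id; the convention is '<Name>-v<version>'
    entry_point='gym_foo.envs:FooEnv',
)

# gym_foo/envs/__init__.py
from gym_foo.envs.foo_env import FooEnv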

Using your environment

import gym
import gym_foo  # importing the package runs its register() call

env = gym.make('MyEnv-v0')  # the id must match the one registered in gym_foo/__init__.py
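After that, the environment can be driven like any built-in one. A minimal random-agent loop, assuming the old gym API used above (step returns a 4-tuple) and that __init__ defines action_space:

import gym
import gym_foo  # noqa: F401

env = gym.make('MyEnv-v0')
ob = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # pick a random action
    ob, reward, done, info = env.step(action)
env.close()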

Examples

See the banana-gym repository mentioned above for a minimal example.
