deluca.envs.classic.MountainCar

class deluca.envs.classic.MountainCar(*args, **kwargs)

Public Data Attributes:

Inherited from Env

`reward_range`
`action_space`
`observation_space`
`observation`	assume observations are fully observable

Inherited from JaxObject

`name`
`attrs`

Inherited from Env

`metadata`
`reward_range`
`spec`
`action_space`
`observation_space`
`unwrapped`	Completely unwrap this env.

Public Methods:

`__init__`([goal_velocity, seed, horizon])	Initialize self.
`step`(action)	Run one timestep of the environment’s dynamics.
`reset`()	Resets the environment to an initial state and returns an initial observation.
`render`([mode])	Renders the environment.

Inherited from Env

`__new__`(cls, *args, **kwargs)	For avoiding super().__init__()
`check_spaces`()
`__init_subclass__`(*args, **kwargs)	For avoiding a decorator for each subclass
`reset`()	Resets the environment to an initial state and returns an initial observation.
`dynamics`(state, action)
`check_action`(action)
`check_observation`(observation)
`step`(action)	Run one timestep of the environment’s dynamics.
`jacobian`(func, state, action)
`hessian`(func, state, action)
`close`()	Override close in your subclass to perform any necessary cleanup.

Inherited from JaxObject

`__new__`(cls, *args, **kwargs)	For avoiding super().__init__()
`__init_subclass__`(*args, **kwargs)	For avoiding a decorator for each subclass
`__str__`()	Return str(self).
`__setattr__`(key, val)	Implement setattr(self, name, value).
`save`(path)
`load`(path)
`throw`(err, msg)

Inherited from Env

`step`(action)	Run one timestep of the environment’s dynamics.
`reset`()	Resets the environment to an initial state and returns an initial observation.
`render`([mode])	Renders the environment.
`close`()	Override close in your subclass to perform any necessary cleanup.
`seed`([seed])	Sets the seed for this env’s random number generator(s).
`__str__`()	Return str(self).
`__enter__`()	Support with-statement for the environment.
`__exit__`(*args)	Support with-statement for the environment.

Private Methods:

`_height`(xs)

__init__(goal_velocity=0, seed=0, horizon=50)

Initialize self. See help(type(self)) for accurate signature.

render(mode='human')

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note

Make sure that the 'render.modes' key of your class's metadata includes the list of supported modes. It is recommended to call super() in implementations to use the functionality of this method.

Parameters: mode (str) – the mode to render with

Example:

    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception
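The 'ansi' mode described above can be illustrated with a small self-contained sketch. TextEnv, its 10-cell track, and the position attribute are hypothetical names invented for this illustration; they are not part of deluca or of the Env base class, and raising NotImplementedError for unsupported modes is an assumption about reasonable behavior rather than documented API.

```python
class TextEnv:
    """Hypothetical environment used only to illustrate the mode convention."""

    # Advertise the supported modes, as the note above recommends.
    metadata = {"render.modes": ["human", "ansi"]}

    def __init__(self):
        self.position = 3  # index of the car on an assumed 10-cell track

    def render(self, mode="human"):
        if mode not in self.metadata["render.modes"]:
            # Assumed behavior for unsupported modes.
            raise NotImplementedError(f"unsupported render mode: {mode}")
        track = "".join("C" if i == self.position else "_" for i in range(10))
        if mode == "ansi":
            return track  # the caller receives the text representation
        print(track)      # 'human': show it and return nothing
        return None


env = TextEnv()
frame = env.render(mode="ansi")
```

Here 'human' prints the frame and returns nothing, while 'ansi' hands the same text back to the caller, which matches the convention listed above.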

reset()

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns: observation (object) – the initial observation.
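The requirement that reset() must not reset the random number generator can be sketched as follows. ToyEnv is a hypothetical stand-in written for this illustration, not deluca's implementation, and the initial position range of [-0.6, -0.4] is an assumption borrowed from the classic mountain-car formulation.

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment; not part of deluca."""

    def __init__(self, seed=0):
        # The generator is created once, at construction time.
        self.rng = random.Random(seed)
        self.state = None

    def reset(self):
        # Draw a fresh initial state WITHOUT re-seeding self.rng, so
        # consecutive episodes start from independent samples.
        position = self.rng.uniform(-0.6, -0.4)  # assumed range
        velocity = 0.0
        self.state = (position, velocity)
        return self.state


env = ToyEnv(seed=0)
first = env.reset()
second = env.reset()
```

Because the generator persists across calls, first and second differ, while constructing a new ToyEnv with the same seed reproduces the same sequence of initial states.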

step(action)

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns: a tuple (observation, reward, done, info):

observation (object) – agent's observation of the current environment
reward (float) – amount of reward returned after the previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
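The step()/reset() contract described above can be sketched as a simple episode loop. ToyMountainCar and run_episode are hypothetical names for this illustration. The dynamics, the discrete three-action encoding, the goal position of 0.5, and the reward of -1 per step are assumptions taken from the classic mountain-car formulation; they are not confirmed to match deluca's MountainCar. Only the horizon default of 50 comes from the signature documented here.

```python
import math


class ToyMountainCar:
    """Hypothetical stand-in; the dynamics are assumed, not deluca's."""

    def __init__(self, horizon=50):
        self.horizon = horizon
        self.reset()

    def reset(self):
        self.position, self.velocity, self.t = -0.5, 0.0, 0
        return (self.position, self.velocity)

    def step(self, action):
        # Assumed encoding: 0 = push left, 1 = no push, 2 = push right.
        self.velocity += (action - 1) * 0.001 - 0.0025 * math.cos(3 * self.position)
        self.velocity = max(-0.07, min(0.07, self.velocity))
        self.position = max(-1.2, min(0.6, self.position + self.velocity))
        if self.position == -1.2 and self.velocity < 0:
            self.velocity = 0.0  # stop at the left wall
        self.t += 1
        done = self.position >= 0.5 or self.t >= self.horizon
        reward = -1.0  # assumed: constant cost per timestep
        info = {"t": self.t}
        return (self.position, self.velocity), reward, done, info


def run_episode(env, policy):
    """Reset, then step until done; the caller must reset again afterwards."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
    return total, info


env = ToyMountainCar(horizon=50)
total, info = run_episode(env, lambda obs: 2)  # always push right
```

The loop stops calling step() as soon as done is True, since results after that point are undefined; starting another episode requires a new call to reset().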