deluca.envs.classic.Acrobot

class deluca.envs.classic.Acrobot(*args, **kwargs)[source]

Acrobot is a 2-link pendulum with only the second joint actuated. Initially, both links point downwards. The goal is to swing the end-effector to a height at least the length of one link above the base. Both links can swing freely and can pass by each other, i.e., they do not collide when they have the same angle.

STATE: The state consists of the sin() and cos() of the two rotational joint angles, plus the joint angular velocities: [cos(theta1), sin(theta1), cos(theta2), sin(theta2), thetaDot1, thetaDot2]. For the first link, an angle of 0 corresponds to the link pointing downwards. The angle of the second link is measured relative to the first link, so an angle of 0 means the two links are aligned. A state of [1, 0, 1, 0, …, …] means that both links point downwards.

ACTIONS: The action applies +1, 0, or -1 torque on the joint between the two pendulum links.

Warning:

This version of the domain integrates the system dynamics with the Runge-Kutta method; it is more realistic, but also considerably harder, than the original version, which employs Euler integration (see the AcrobotLegacy class).
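A minimal usage sketch based on the signatures documented below. The concrete action value is an assumption (a Gym-style Acrobot takes a discrete index selecting a torque from AVAIL_TORQUE; the exact encoding in deluca may differ):

    from deluca.envs.classic import Acrobot

    env = Acrobot(seed=0, horizon=50)
    obs = env.reset()                       # initial observation
    for _ in range(50):                     # at most `horizon` steps
        action = 1                          # assumed encoding: index into AVAIL_TORQUE
        obs, reward, done, info = env.step(action)
        if done:                            # episode ended; call reset() before stepping again
            break
    env.close()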

Public Data Attributes:

dt

LINK_LENGTH_1

LINK_LENGTH_2

LINK_MASS_1

[kg] mass of link 1

LINK_MASS_2

[kg] mass of link 2

LINK_COM_POS_1

[m] position of the center of mass of link 1

LINK_COM_POS_2

[m] position of the center of mass of link 2

LINK_MOI

moments of inertia for both links

MAX_VEL_1

MAX_VEL_2

AVAIL_TORQUE

torque_noise_max

book_or_nips

Whether to use the dynamics equations from the NIPS paper or from the book.

action_arrow

domain_fig

actions_num

observation

Assume full observability: the observation is the environment state.

Inherited from Env

reward_range

action_space

observation_space

observation

Assume full observability: the observation is the environment state.

Inherited from JaxObject

name

attrs

Inherited from Env

metadata

reward_range

spec

action_space

observation_space

unwrapped

Completely unwrap this env.

Public Methods:

__init__([seed, horizon])

Initialize self.

reset()

Resets the environment to an initial state and returns an initial observation.

step(action)

Run one timestep of the environment’s dynamics.

Inherited from Env

__new__(cls, *args, **kwargs)

Avoids the need to call super().__init__() in subclasses.

check_spaces()

__init_subclass__(*args, **kwargs)

Avoids the need for a decorator on each subclass.

reset()

Resets the environment to an initial state and returns an initial observation.

dynamics(state, action)

check_action(action)

check_observation(observation)

step(action)

Run one timestep of the environment’s dynamics.

jacobian(func, state, action)

hessian(func, state, action)

close()

Override close in your subclass to perform any necessary cleanup.

Inherited from JaxObject

__new__(cls, *args, **kwargs)

Avoids the need to call super().__init__() in subclasses.

__init_subclass__(*args, **kwargs)

Avoids the need for a decorator on each subclass.

__str__()

Return str(self).

__setattr__(key, val)

Implement setattr(self, name, value).

save(path)

load(path)

throw(err, msg)

Inherited from Env

step(action)

Run one timestep of the environment’s dynamics.

reset()

Resets the environment to an initial state and returns an initial observation.

render([mode])

Renders the environment.

close()

Override close in your subclass to perform any necessary cleanup.

seed([seed])

Sets the seed for this env’s random number generator(s).

__str__()

Return str(self).

__enter__()

Support with-statement for the environment.

__exit__(*args)

Support with-statement for the environment.

Private Methods:

_terminal()

_dsdt(augmented_state, t)


LINK_COM_POS_1

[m] position of the center of mass of link 1

LINK_COM_POS_2

[m] position of the center of mass of link 2

LINK_MASS_1

[kg] mass of link 1

LINK_MASS_2

[kg] mass of link 2

LINK_MOI

moments of inertia for both links

__init__(seed=0, horizon=50)[source]

Initialize self. See help(type(self)) for accurate signature.

book_or_nips = 'book'

Whether to use the dynamics equations from the NIPS paper or from the book.

property observation

Assume full observability: the observation is the environment state.

reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

the initial observation.

Return type

observation (object)
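For example, two back-to-back calls should produce independent initial observations without any reseeding (a sketch; the initial-state distribution is internal to the environment):

    env = Acrobot(seed=0)
    first = env.reset()     # one initial state
    second = env.reset()    # an independent draw; the RNG is not re-seeded in between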

step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns

observation (object) – agent’s observation of the current environment

reward (float) – amount of reward returned after the previous action

done (bool) – whether the episode has ended, in which case further step() calls will return undefined results

info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type

tuple
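Because the observation encodes each joint angle as a cosine/sine pair (see STATE above), the angles can be recovered with arctan2. A sketch using jax.numpy, assuming the documented observation layout:

    import jax.numpy as jnp

    def decode_observation(obs):
        # obs = [cos(theta1), sin(theta1), cos(theta2), sin(theta2), thetaDot1, thetaDot2]
        theta1 = jnp.arctan2(obs[1], obs[0])  # 0 when link 1 points downwards
        theta2 = jnp.arctan2(obs[3], obs[2])  # 0 when the two links are aligned
        return theta1, theta2, obs[4], obs[5]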