deluca.envs.classic.Acrobot

class deluca.envs.classic.Acrobot(*args, **kwargs)[source]

Acrobot is a 2-link pendulum with only the second joint actuated. Initially, both links point downwards. The goal is to swing the end-effector to a height at least the length of one link above the base. Both links can swing freely and can pass by each other, i.e., they do not collide when they have the same angle.

STATE: The state consists of the sin() and cos() of the two rotational joint angles, plus the joint angular velocities: [cos(theta1), sin(theta1), cos(theta2), sin(theta2), thetaDot1, thetaDot2]. For the first link, an angle of 0 corresponds to the link pointing downwards. The angle of the second link is measured relative to the first link, so an angle of 0 means the two links are aligned. A state of [1, 0, 1, 0, …, …] means that both links point downwards.

ACTIONS: The action applies +1, 0, or -1 torque on the joint between the two pendulum links.

Warning:

This version of the domain integrates the system dynamics with the Runge-Kutta method; it is more realistic, but also considerably harder, than the original version, which employs Euler integration (see the AcrobotLegacy class).
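A minimal usage sketch based on the signatures documented below. The concrete action value is an assumption (a Gym-style Acrobot takes a discrete index selecting a torque from AVAIL_TORQUE; the exact encoding in deluca may differ):

    from deluca.envs.classic import Acrobot

    env = Acrobot(seed=0, horizon=50)
    obs = env.reset()                       # initial observation
    for _ in range(50):                     # at most `horizon` steps
        action = 1                          # assumed encoding: index into AVAIL_TORQUE
        obs, reward, done, info = env.step(action)
        if done:                            # episode ended; call reset() before stepping again
            break
    env.close()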

Public Data Attributes:

dt

LINK_LENGTH_1

LINK_LENGTH_2

LINK_MASS_1

[kg] mass of link 1

LINK_MASS_2

[kg] mass of link 2

LINK_COM_POS_1

[m] position of the center of mass of link 1

LINK_COM_POS_2

[m] position of the center of mass of link 2

LINK_MOI

moments of inertia for both links

MAX_VEL_1

MAX_VEL_2

AVAIL_TORQUE

torque_noise_max

book_or_nips

Whether to use the dynamics equations from the NIPS paper or from the book.

action_arrow

domain_fig

actions_num

observation

Assume full observability: the observation is the environment state.

Inherited from Env

reward_range

action_space

observation_space

observation

Assume full observability: the observation is the environment state.

Inherited from JaxObject

name

attrs

Inherited from Env

metadata

reward_range

spec

action_space

observation_space

unwrapped

Completely unwrap this env.

Public Methods:

__init__([seed, horizon])

Initialize self.

reset()

Resets the environment to an initial state and returns an initial observation.

step(action)

Run one timestep of the environment’s dynamics.

Inherited from Env

__new__(cls, *args, **kwargs)

Avoids the need to call super().__init__() in subclasses.

check_spaces()

__init_subclass__(*args, **kwargs)

Avoids the need for a decorator on each subclass.

reset()

Resets the environment to an initial state and returns an initial observation.

dynamics(state, action)

check_action(action)

check_observation(observation)

step(action)

Run one timestep of the environment’s dynamics.

jacobian(func, state, action)

hessian(func, state, action)

close()

Override close in your subclass to perform any necessary cleanup.

Inherited from JaxObject

__new__(cls, *args, **kwargs)

Avoids the need to call super().__init__() in subclasses.

__init_subclass__(*args, **kwargs)

Avoids the need for a decorator on each subclass.

__str__()

Return str(self).

__setattr__(key, val)

Implement setattr(self, name, value).

save(path)

load(path)

throw(err, msg)

Inherited from Env

step(action)

Run one timestep of the environment’s dynamics.

reset()

Resets the environment to an initial state and returns an initial observation.

render([mode])

Renders the environment.

close()

Override close in your subclass to perform any necessary cleanup.

seed([seed])

Sets the seed for this env’s random number generator(s).

__str__()

Return str(self).

__enter__()

Support with-statement for the environment.

__exit__(*args)

Support with-statement for the environment.

Private Methods:

_terminal()

_dsdt(augmented_state, t)


LINK_COM_POS_1

[m] position of the center of mass of link 1

LINK_COM_POS_2

[m] position of the center of mass of link 2

LINK_MASS_1

[kg] mass of link 1

LINK_MASS_2

[kg] mass of link 2

LINK_MOI

moments of inertia for both links

__init__(seed=0, horizon=50)[source]

Initialize self. See help(type(self)) for accurate signature.

book_or_nips = 'book'

Whether to use the dynamics equations from the NIPS paper or from the book.

property observation

Assume full observability: the observation is the environment state.

reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns

the initial observation.

Return type

observation (object)
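For example, two back-to-back calls should produce independent initial observations without any reseeding (a sketch; the initial-state distribution is internal to the environment):

    env = Acrobot(seed=0)
    first = env.reset()     # one initial state
    second = env.reset()    # an independent draw; the RNG is not re-seeded in between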

step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters

action (object) – an action provided by the agent

Returns

observation (object) – agent’s observation of the current environment

reward (float) – amount of reward returned after the previous action

done (bool) – whether the episode has ended, in which case further step() calls will return undefined results

info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type

tuple
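Because the observation encodes each joint angle as a cosine/sine pair (see STATE above), the angles can be recovered with arctan2. A sketch using jax.numpy, assuming the documented observation layout:

    import jax.numpy as jnp

    def decode_observation(obs):
        # obs = [cos(theta1), sin(theta1), cos(theta2), sin(theta2), thetaDot1, thetaDot2]
        theta1 = jnp.arctan2(obs[1], obs[0])  # 0 when link 1 points downwards
        theta2 = jnp.arctan2(obs[3], obs[2])  # 0 when the two links are aligned
        return theta1, theta2, obs[4], obs[5]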