deluca.envs.classic.Acrobot

class deluca.envs.classic.Acrobot(*args, **kwargs)[source]

Acrobot is a 2-link pendulum with only the second joint actuated. Initially, both links point downwards. The goal is to swing the end-effector to a height at least the length of one link above the base. Both links can swing freely and can pass by each other, i.e., they do not collide when they have the same angle.

STATE: The state consists of the sin() and cos() of the two rotational joint angles and the joint angular velocities: [cos(theta1), sin(theta1), cos(theta2), sin(theta2), thetaDot1, thetaDot2]. For the first link, an angle of 0 corresponds to the link pointing downwards. The angle of the second link is relative to the angle of the first link; an angle of 0 corresponds to the two links having the same angle. A state of [1, 0, 1, 0, ..., ...] means that both links point downwards.

ACTIONS: The action applies a torque of +1, 0, or -1 on the joint between the two pendulum links.

Warning: This version of the domain uses the Runge-Kutta method for integrating the system dynamics and is more realistic, but also considerably harder, than the original version, which employs Euler integration; see the AcrobotLegacy class.
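The state encoding described above can be illustrated with a small library-independent sketch (pure Python; the helper function below is hypothetical, not part of deluca). It shows why the downward rest state theta1 = theta2 = 0 with zero velocities corresponds to [1, 0, 1, 0, 0, 0]:

```python
import math

def encode_observation(theta1, theta2, theta1_dot, theta2_dot):
    """Encode joint angles and velocities into the documented 6-dim layout:
    [cos(theta1), sin(theta1), cos(theta2), sin(theta2), thetaDot1, thetaDot2]
    """
    return [
        math.cos(theta1), math.sin(theta1),
        math.cos(theta2), math.sin(theta2),
        theta1_dot, theta2_dot,
    ]

# Both links hanging straight down, at rest:
obs = encode_observation(0.0, 0.0, 0.0, 0.0)
print(obs)  # [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

Because the angles enter only through sin/cos, the observation is continuous even as the joints wrap around past 2*pi.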
Public Data Attributes:

- dt
- LINK_LENGTH_1
- LINK_LENGTH_2
- LINK_MASS_1: [kg] mass of link 1
- LINK_MASS_2: [kg] mass of link 2
- LINK_COM_POS_1: [m] position of the center of mass of link 1
- LINK_COM_POS_2: [m] position of the center of mass of link 2
- LINK_MOI: moments of inertia for both links
- MAX_VEL_1
- MAX_VEL_2
- AVAIL_TORQUE
- torque_noise_max
- book_or_nips: use dynamics equations from the NIPS paper or the book
- action_arrow
- domain_fig
- actions_num
- observation: assume observations are fully observable

Inherited from Env:

- reward_range
- action_space
- observation_space
- observation: assume observations are fully observable

Inherited from JaxObject:

- name
- attrs

Inherited from Env:

- metadata
- reward_range
- spec
- action_space
- observation_space
- unwrapped: Completely unwrap this env.
Public Methods:

- __init__([seed, horizon]): Initialize self.
- reset(): Resets the environment to an initial state and returns an initial observation.
- step(action): Run one timestep of the environment's dynamics.

Inherited from Env:

- __new__(cls, *args, **kwargs): For avoiding super().__init__()
- check_spaces()
- __init_subclass__(*args, **kwargs): For avoiding a decorator for each subclass
- reset(): Resets the environment to an initial state and returns an initial observation.
- dynamics(state, action)
- check_action(action)
- check_observation(observation)
- step(action): Run one timestep of the environment's dynamics.
- jacobian(func, state, action)
- hessian(func, state, action)
- close(): Override close in your subclass to perform any necessary cleanup.
Inherited from JaxObject:

- __new__(cls, *args, **kwargs): For avoiding super().__init__()
- __init_subclass__(*args, **kwargs): For avoiding a decorator for each subclass
- __str__(): Return str(self).
- __setattr__(key, val): Implement setattr(self, name, value).
- save(path)
- load(path)
- throw(err, msg)

Inherited from Env:

- step(action): Run one timestep of the environment's dynamics.
- reset(): Resets the environment to an initial state and returns an initial observation.
- render([mode]): Renders the environment.
- close(): Override close in your subclass to perform any necessary cleanup.
- seed([seed]): Sets the seed for this env's random number generator(s).
- __str__(): Return str(self).
- __enter__(): Support with-statement for the environment.
- __exit__(*args): Support with-statement for the environment.
Private Methods:

- _terminal()
- _dsdt(augmented_state, t)
- LINK_COM_POS_1 = 0.5
  [m] position of the center of mass of link 1
- LINK_COM_POS_2 = 0.5
  [m] position of the center of mass of link 2
- LINK_MASS_1 = 1.0
  [kg] mass of link 1
- LINK_MASS_2 = 1.0
  [kg] mass of link 2
- LINK_MOI = 1.0
  moments of inertia for both links
- book_or_nips = 'book'
  use dynamics equations from the NIPS paper or the book
- property observation
  assume observations are fully observable
reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment's random number generator(s); random variables in the environment's state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

- Returns: the initial observation.
- Return type: observation (object)
step(action)[source]

Run one timestep of the environment's dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment's state.

Accepts an action and returns a tuple (observation, reward, done, info).

- Parameters: action (object) – an action provided by the agent
- Returns:
  - observation (object): agent's observation of the current environment
  - reward (float): amount of reward returned after the previous action
  - done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  - info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type: tuple (observation, reward, done, info)
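The reset()/step() contract documented above can be exercised with a minimal stand-in environment. This is a sketch of the protocol only; the `CountdownEnv` class and its trivial dynamics are hypothetical and unrelated to deluca's Acrobot:

```python
class CountdownEnv:
    """Toy environment implementing the documented contract:
    reset() returns an initial observation, and
    step(action) returns (observation, reward, done, info)."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # initial observation

    def step(self, action):
        self.t += 1
        observation = self.t
        reward = float(action)         # echo the action as a dummy reward
        done = self.t >= self.horizon  # episode ends at the horizon
        info = {"t": self.t}           # auxiliary diagnostics
        return observation, reward, done, info

env = CountdownEnv(horizon=2)
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    obs, reward, done, info = env.step(1)  # always apply action 1
    total_reward += reward
print(total_reward)  # 2.0
```

Note that once done is True the loop stops and reset() must be called before stepping again, matching the warning that further step() calls after episode end return undefined results.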