deluca.agents.Deep

class deluca.agents.Deep(*args, **kwargs)[source]

Generic deep controller that uses zero-order methods to train on an environment.
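
Example (illustrative). A minimal training sketch; the environment below is a stand-in, not a deluca API, and the state size and action set are arbitrary choices for illustration:

    import jax.numpy as jnp

    from deluca.agents import Deep

    class DummyEnv:
        """Stand-in environment for illustration only (not part of deluca)."""

        def reset(self):
            return jnp.zeros(4)

        def step(self, action):
            # next state, reward, done flag (gym-style, assumed for this sketch)
            return jnp.zeros(4), 1.0, False

    env = DummyEnv()
    agent = Deep(env_state_size=4, action_space=jnp.array([0, 1]))

    state = env.reset()
    for _ in range(100):
        action = agent(state)                   # __call__: state -> action
        state, reward, done = env.step(action)  # interact with the environment
        agent.feed(reward)                      # record the reward for update()
        if done:
            break

    agent.update()  # apply the accumulated weight updates
    agent.reset()   # reset the agent before the next episode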

Public Data Attributes:

Inherited from JaxObject

name

attrs

Public Methods:

__init__(env_state_size, action_space[, …])

Description: initializes the Deep agent

reset()

Description: reset the agent

policy(state, w)

Description: Policy that maps state to action parameterized by w

softmax_grad(softmax)

Description: Vectorized softmax Jacobian

__call__(state)

Description: provide an action given a state

feed(reward)

Description: compute the gradient and store it, together with the reward, for later weight updates

update()

Description: update weights

Inherited from Agent

__init_subclass__(*args, **kwargs)

Avoids the need for a decorator on each subclass

__call__(state)

Description: provide an action given a state

reset()

Description: reset the agent

feed(reward)

Description: compute the gradient and store it, together with the reward, for later weight updates

Inherited from JaxObject

__new__(cls, *args, **kwargs)

Avoids the need to call super().__init__()

__init_subclass__(*args, **kwargs)

Avoids the need for a decorator on each subclass

__str__()

Return str(self).

__setattr__(key, val)

Implement setattr(self, name, value).

save(path)

load(path)

throw(err, msg)


__call__(state: jnp.ndarray)[source]

Description: provide an action given a state

Parameters

state (jnp.ndarray) –

Returns

action to take

Return type

jnp.ndarray

__init__(env_state_size, action_space, learning_rate: numbers.Real = 0.001, gamma: numbers.Real = 0.99, max_episode_length: int = 500, seed: int = 0) → None[source]

Description: initializes the Deep agent

Parameters
  • env_state_size – dimension of the environment state

  • action_space – actions available in the environment

  • learning_rate (Real) –

  • gamma (Real) –

  • max_episode_length (int) –

  • seed (int) –

Returns

None
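
Only env_state_size and action_space are required; the remaining arguments take the defaults shown in the signature. A construction sketch with illustrative values:

    import jax.numpy as jnp

    from deluca.agents import Deep

    agent = Deep(
        env_state_size=4,                # illustrative state dimension
        action_space=jnp.array([0, 1]),  # illustrative discrete action set
        learning_rate=0.001,             # default
        gamma=0.99,                      # default
        max_episode_length=500,          # default
        seed=0,                          # default
    )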

feed(reward: numbers.Real) → None[source]

Description: compute the gradient and store it, together with the reward, for later weight updates

Parameters

reward (Real) –

Returns

None
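
A minimal, self-contained sketch of the act/feed/update cycle (state and reward values are illustrative):

    import jax.numpy as jnp

    from deluca.agents import Deep

    agent = Deep(env_state_size=4, action_space=jnp.array([0, 1]))

    action = agent(jnp.zeros(4))  # act on the current state
    agent.feed(1.0)               # store this step's reward (and gradient) for the update
    agent.update()                # consume everything stored by feed() so far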

policy(state: jnp.ndarray, w: jnp.ndarray) → jnp.ndarray[source]

Description: Policy that maps state to action parameterized by w

Parameters
  • state (jnp.ndarray) –

  • w (jnp.ndarray) –

Return type

jnp.ndarray
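
A common form for such a parameterized policy, and a reasonable mental model here, is a linear map followed by a softmax over the action set. The sketch below illustrates that form; it is not necessarily this class's exact implementation:

    import jax.numpy as jnp

    def linear_softmax_policy(state: jnp.ndarray, w: jnp.ndarray) -> jnp.ndarray:
        """Map a state to action probabilities via a linear layer and a softmax.

        w is assumed to have shape (state_size, num_actions).
        """
        logits = state @ w
        logits = logits - jnp.max(logits)  # subtract the max for numerical stability
        exp = jnp.exp(logits)
        return exp / jnp.sum(exp)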

reset() → None[source]

Description: reset the agent

Parameters

None

Returns

None

softmax_grad(softmax: jnp.ndarray) → jnp.ndarray[source]

Description: Vectorized softmax Jacobian

Parameters

softmax (jnp.ndarray) –

Return type

jnp.ndarray
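
For a softmax output s, the Jacobian with respect to the underlying logits is diag(s) - s s^T, i.e. ds_i/dz_j = s_i * (delta_ij - s_j). A vectorized sketch of that formula (an illustration, not necessarily the library's exact code):

    import jax.numpy as jnp

    def softmax_jacobian(softmax: jnp.ndarray) -> jnp.ndarray:
        """Jacobian of the softmax probabilities w.r.t. the logits: diag(s) - s s^T."""
        s = softmax.reshape(-1, 1)
        return jnp.diagflat(s) - s @ s.T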

update() → None[source]

Description: update weights

Parameters

None

Returns

None
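
update() consumes the gradients and rewards accumulated by feed(). As a mental model, a REINFORCE-style update scales each per-step gradient by the discounted return from that step onward; the sketch below illustrates that scheme as an assumption about the general technique, not this class's exact implementation:

    def reinforce_update(w, step_gradients, rewards, lr=0.001, gamma=0.99):
        """Gradient-ascent update: each step's gradient is weighted by the
        discounted return G_t = r_t + gamma * G_{t+1} from that step onward."""
        returns = []
        g = 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.append(g)
        returns.reverse()
        for grad, ret in zip(step_gradients, returns):
            w = w + lr * ret * grad
        return w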