gym_os2r.rewards

class gym_os2r.rewards.BalancingV1(observation_index, normalized)

Bases: gym_os2r.rewards.RewardBase

Balancing reward. Start from standing positions and stay standing.

calculate_reward(obs, actions)

Calculates the reward given an observation and the action history. The reward is calculated by the reward class specified in the task's kwargs.

Parameters

obs (np.array) – numpy array with the same dimensions as the task's observation space.
actions (Deque[np.array]) – deque of actions taken by the environment; each action is a numpy array with the same dimensions as the task's action space.

Returns

The calculated reward.

Return type

(float)

class gym_os2r.rewards.BalancingV2(observation_index, normalized)

Bases: gym_os2r.rewards.RewardBase

Balancing reward. Start from standing positions and stay standing. Smaller control signals are favoured.

calculate_reward(obs, actions)

Calculates the reward given an observation and the action history. The reward is calculated by the reward class specified in the task's kwargs.

Parameters

obs (np.array) – numpy array with the same dimensions as the task's observation space.
actions (Deque[np.array]) – deque of actions taken by the environment; each action is a numpy array with the same dimensions as the task's action space.

Returns

The calculated reward.

Return type

(float)

class gym_os2r.rewards.BalancingV3(observation_index, normalized)

Bases: gym_os2r.rewards.RewardBase

Balancing reward. Start from standing positions and stay standing. Small changes in control signal magnitude are favoured.

calculate_reward(obs, actions)

Calculates the reward given an observation and the action history. The reward is calculated by the reward class specified in the task's kwargs.

Parameters

obs (np.array) – numpy array with the same dimensions as the task's observation space.
actions (Deque[np.array]) – deque of actions taken by the environment; each action is a numpy array with the same dimensions as the task's action space.

Returns

The calculated reward.

Return type

(float)
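
The descriptions of BalancingV2 and BalancingV3 differ in what they penalise: the size of the control signal versus how much it changes between steps. The sketch below illustrates that distinction with a deque of actions. It is not the library's implementation; the function names, the squared-error form, and the assumption that the newest action sits at the right end of the deque are all hypothetical.

```python
from collections import deque

import numpy as np


def control_magnitude_penalty(actions):
    """Mean squared magnitude of the most recent action (hypothetical helper)."""
    latest = np.asarray(actions[-1], dtype=float)  # assumes newest action is last
    return float(np.mean(latest ** 2))


def control_change_penalty(actions):
    """Mean squared difference between the two most recent actions (hypothetical helper)."""
    if len(actions) < 2:
        return 0.0  # no previous action to compare against
    latest = np.asarray(actions[-1], dtype=float)
    previous = np.asarray(actions[-2], dtype=float)
    return float(np.mean((latest - previous) ** 2))


# Two consecutive actions for a two-joint system.
history = deque(maxlen=10)
history.append(np.array([0.2, -0.4]))
history.append(np.array([0.5, -0.4]))

magnitude = control_magnitude_penalty(history)
change = control_change_penalty(history)
```

A reward could subtract a weighted version of either penalty from a posture term; the weighting is a design choice this page does not specify.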

class gym_os2r.rewards.HoppingV1(observation_index, normalized)

Bases: gym_os2r.rewards.RewardBase

Balancing reward. Start from standing positions and stay standing. Smaller control signals are favoured.

calculate_reward(obs, actions)

Calculates the reward given an observation and the action history. The reward is calculated by the reward class specified in the task's kwargs.

Parameters

obs (np.array) – numpy array with the same dimensions as the task's observation space.
actions (Deque[np.array]) – deque of actions taken by the environment; each action is a numpy array with the same dimensions as the task's action space.

Returns

The calculated reward.

Return type

(float)

class gym_os2r.rewards.RewardBase(observation_index, normalized)

Bases: object

Base class for rewards. Please follow this convention when making a new reward.

observation_index is a dictionary that gives the index within the observation of a specified joint's position or velocity.

abstract calculate_reward(obs, actions)

Calculates the reward given an observation and the action history. The reward is calculated by the reward class specified in the task's kwargs.

Parameters

obs (np.array) – numpy array with the same dimensions as the task's observation space.
actions (Deque[np.array]) – deque of actions taken by the environment; each action is a numpy array with the same dimensions as the task's action space.

Returns

The calculated reward.

Return type

(float)

get_supported_task_modes()

Get the list of task modes supported by the reward function.

Returns: list of supported task modes.
Return type: (list)

is_task_supported(task_mode)

Check if the ‘task_mode’ is supported by the reward function.

Parameters: task_mode (str) – name of task mode.
Returns: True for supported, False otherwise.
Return type: (bool)
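
The convention described above can be sketched as follows. To keep the example runnable without gym_os2r installed, it uses a stand-in base class that mirrors only the documented interface; the stand-in, the `supported_task_modes` attribute, the `"hip_pos"` and `"hip_vel"` keys, the `"free_hip"` task mode, the cosine reward, and the float return value are all assumptions made for illustration, not the library's actual internals.

```python
from abc import ABC, abstractmethod
from collections import deque

import numpy as np


class RewardBaseSketch(ABC):
    """Stand-in mirroring the documented RewardBase interface (not the real class)."""

    def __init__(self, observation_index, normalized):
        self.observation_index = observation_index  # maps names to obs indices
        self.normalized = normalized
        self.supported_task_modes = []  # assumed attribute name

    @abstractmethod
    def calculate_reward(self, obs, actions):
        """Return the reward for the given observation and action history."""

    def get_supported_task_modes(self):
        return self.supported_task_modes

    def is_task_supported(self, task_mode):
        return task_mode in self.supported_task_modes


class UprightSketch(RewardBaseSketch):
    """Hypothetical reward: highest when the (assumed) hip angle is zero."""

    def __init__(self, observation_index, normalized):
        super().__init__(observation_index, normalized)
        self.supported_task_modes = ["free_hip"]  # hypothetical task mode

    def calculate_reward(self, obs, actions):
        hip_angle = obs[self.observation_index["hip_pos"]]  # hypothetical key
        return float(np.cos(hip_angle))


reward_fn = UprightSketch({"hip_pos": 0, "hip_vel": 1}, normalized=False)
reward = reward_fn.calculate_reward(np.array([0.0, 0.3]), deque())
```

Looking values up through `observation_index` rather than hard-coded positions keeps the reward independent of how a particular task orders its observation vector.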

class gym_os2r.rewards.StandingV1(observation_index, normalized)

Bases: gym_os2r.rewards.RewardBase

Standing reward. Start from ground and stand up.

calculate_reward(obs, actions)

Calculates the reward given an observation and the action history. The reward is calculated by the reward class specified in the task's kwargs.

Parameters

obs (np.array) – numpy array with the same dimensions as the task's observation space.
actions (Deque[np.array]) – deque of actions taken by the environment; each action is a numpy array with the same dimensions as the task's action space.

Returns

The calculated reward.

Return type

(float)

class gym_os2r.rewards.StraightV1(observation_index, normalized)

Bases: gym_os2r.rewards.RewardBase

Standing reward. Start from ground and stand up.

calculate_reward(obs, actions)

Calculates the reward given an observation and the action history. The reward is calculated by the reward class specified in the task's kwargs.

Parameters

obs (np.array) – numpy array with the same dimensions as the task's observation space.
actions (Deque[np.array]) – deque of actions taken by the environment; each action is a numpy array with the same dimensions as the task's action space.

Returns

The calculated reward.

Return type

(float)

gym_os2r.rewards.rewards_utils

Soft indicator function evaluating whether a number is within bounds.

gym_os2r.rewards.rewards_utils.tolerance(x, bounds=(0.0, 0.0), margin=0.0, sigmoid='gaussian', value_at_margin=0.1)

Returns 1 when x falls inside the bounds, and a value between 0 and 1 otherwise.

Parameters

x (float or np.array) – value to apply the tolerance to.
bounds (tuple) – A tuple of floats specifying inclusive (lower, upper) bounds for the target interval. These can be infinite if the interval is unbounded at one or both ends, or they can be equal to one another if the target value is exact.
margin (float) –
Parameter that controls how steeply the output decreases as x moves out-of-bounds.
- If margin == 0 then the output will be 0 for all values of x outside of bounds.
- If margin > 0 then the output will decrease sigmoidally with increasing distance from the nearest bound.
sigmoid (str) – choice of sigmoid type. Valid values are: ‘gaussian’, ‘linear’, ‘hyperbolic’, ‘long_tail’, ‘cosine’, ‘tanh_squared’.
value_at_margin (float) – A float between 0 and 1 specifying the output value when the distance from x to the nearest bound is equal to margin. Ignored if margin == 0.

Returns

A float or numpy array with values between 0.0 and 1.0.

Return type

(float, np.array)

Raises

(ValueError) – If bounds[0] > bounds[1].
(ValueError) – If margin is negative.
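
The behaviour described above can be illustrated with a minimal sketch that covers only a Gaussian falloff. This is an independent reimplementation for illustration, not the code of `gym_os2r.rewards.rewards_utils.tolerance`; the name `soft_tolerance` and the exact Gaussian scaling (chosen so the output equals `value_at_margin` at a distance of `margin` from the nearest bound) are assumptions.

```python
import numpy as np


def soft_tolerance(x, bounds=(0.0, 0.0), margin=0.0, value_at_margin=0.1):
    """Gaussian-only sketch of a soft in-bounds indicator (hypothetical helper)."""
    lower, upper = bounds
    if lower > upper:
        raise ValueError("Lower bound must be <= upper bound.")
    if margin < 0:
        raise ValueError("margin must be non-negative.")

    x_arr = np.asarray(x, dtype=float)
    in_bounds = np.logical_and(lower <= x_arr, x_arr <= upper)

    if margin == 0:
        # Hard indicator: 1 inside the bounds, 0 outside.
        value = np.where(in_bounds, 1.0, 0.0)
    else:
        # Distance to the nearest bound, in units of margin.
        distance = np.where(x_arr < lower, lower - x_arr, x_arr - upper) / margin
        # Scale chosen so that the falloff equals value_at_margin at distance 1.
        scale = np.sqrt(-2.0 * np.log(value_at_margin))
        falloff = np.exp(-0.5 * (distance * scale) ** 2)
        value = np.where(in_bounds, 1.0, falloff)

    return float(value) if np.isscalar(x) else value


inside = soft_tolerance(0.5, bounds=(0.0, 1.0))
hard_outside = soft_tolerance(2.0, bounds=(0.0, 1.0), margin=0.0)
at_margin = soft_tolerance(2.0, bounds=(0.0, 1.0), margin=1.0)
batch = soft_tolerance(np.array([0.5, 3.0]), bounds=(0.0, 1.0))
```

A nonzero margin gives a learning agent a graded signal when it is close to the target interval, whereas margin == 0 yields a sparse all-or-nothing reward.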