Quick Start¶
This section gives simple boilerplate code that uses gym-os2r to simulate the physical monopod. The simulation provides an environment that mirrors the physical properties of the real monopod for reinforcement learning research compatible with OpenAI Gym.
Following the structure of gym-ignition, we provide the following classes to train a model for the monopod:

`runtime`
: Provides the code that deals with real-time execution for environments running on the real robot. The implementation for the physical platform is `RealtimeRuntime`.

`monopod`
: Provides the structure of the decision-making logic for the monopod. The code of the task is independent from the runtime, and only the core ScenarIO APIs should be used. The active runtime then executes the task on either the simulated or the real world by selecting the corresponding scenario implementation.

`gym_os2r.randomizers`
: Each randomizer acts as a `gym.Wrapper` class that randomizes the domain of the simulated environment every rollout. The two randomizers provided are `gym_os2r.randomizers.monopod` and `gym_os2r.randomizers.monopod_no_rand`.

`gym_os2r.rewards.RewardBase`
: Abstract base class that provides a simple interface for implementing new reward functions. This class can be passed to the `monopod` task through its `kwargs`.
A minimal example for `gym-os2r` is shown below. This example creates a monopod environment and then performs random actions.
Minimal Example¶
```python
import gym
import time
import functools

from gym_ignition.utils import logger
from gym_os2r import randomizers
from gym_os2r.common import make_env_from_id

# Set verbosity
# logger.set_level(gym.logger.ERROR)
logger.set_level(gym.logger.DEBUG)

env_id = "Monopod-stand-v1"
kwargs = {}
make_env = functools.partial(make_env_from_id, env_id=env_id, **kwargs)

# env = randomizers.monopod.MonopodEnvRandomizer(env=make_env)
env = randomizers.monopod_no_rand.MonopodEnvNoRandomizer(env=make_env)

# Enable the rendering
env.render('human')

# Initialize the seed
env.seed(42)

for epoch in range(1000):

    # Reset the environment
    observation = env.reset()

    # Initialize returned values
    done = False

    while not done:
        # Execute a random action
        action = env.action_space.sample()
        observation, reward, done, _ = env.step(action)

env.close()
time.sleep(5)
```
Environment Ids¶
The provided environment ids with their corresponding `kwargs` are listed in the table below.

| Environment Id | `task_mode` | `reward_class` | `reset_positions` |
|---|---|---|---|
| `Monopod-stand-v1` |  |  |  |
|  |  |  |  |
|  |  |  |  |
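If the table is not at hand, the registered ids can also be listed programmatically. The snippet below is only a sketch: it assumes that importing `gym_os2r` registers the Monopod environments with the standard Gym registry at import time, and it uses the classic `gym` registry API, so adjust it for your `gym` version.

```python
import gym
import gym_os2r  # assumption: importing the package registers the Monopod environments

# Classic gym registry API (gym < 0.26); on newer gym use gym.envs.registry.keys().
monopod_ids = [spec.id for spec in gym.envs.registry.all() if "Monopod" in spec.id]
print(monopod_ids)
```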
Kwarg Options¶
The `gym-os2r` package provides multiple `kwargs` for ease of customizing the environment. The available `kwargs` are listed in the attributes of the `monopod` class. The following table concisely lists all the different options:
| Kwarg | Required | Type | Description | Available Options |
|---|---|---|---|---|
| `task_mode` |  | str | Defines the configured mode of the monopod, i.e. how many joints are actuated and how many are observed. |  |
| `reward_class` |  | `RewardBase` subclass | Defines the reward function for the task. The reward class has access to the previous action and the current observation. | Provided reward functions in `gym_os2r.rewards` (e.g. `BalancingV1`) |
| `reset_positions` |  | [str] | Array of allowed positions for the monopod to be reset into. One of these is chosen at random during each reset. |  |
An example of how to specify the `kwargs` in the environment is shown below. Replace the `kwargs` with the ones that are desired.
```python
import functools

from gym_os2r import randomizers
from gym_os2r.common import make_env_from_id
from gym_os2r.rewards import BalancingV1

env_id = "Monopod-stand-v1"

kwargs = {
    'task_mode': 'free_hip',
    'reward_class': BalancingV1,
    'reset_positions': ['float', 'lay', 'stand']
}

make_env = functools.partial(make_env_from_id, env_id=env_id, **kwargs)
env = randomizers.monopod_no_rand.MonopodEnvNoRandomizer(env=make_env)
```
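If none of the provided reward functions fit, a new one can be implemented by subclassing `gym_os2r.rewards.RewardBase` and passing the subclass as `reward_class`. The snippet below is only a sketch: the class name `HipUprightV1`, the `calculate_reward(obs, action)` method name, and the observation indexing are assumptions, so check the `RewardBase` source for the exact interface your version expects.

```python
import numpy as np

from gym_os2r.rewards import RewardBase


class HipUprightV1(RewardBase):
    """Hypothetical custom reward: keep the hip near upright while penalizing effort."""

    def calculate_reward(self, obs, action):
        # Assumed interface: the reward sees the current observation and the
        # previous action (see the kwargs table above).
        hip_angle = float(obs[0])               # hypothetical index of the hip joint angle
        effort = float(np.square(action).sum())
        return 1.0 - abs(hip_angle) - 0.01 * effort
```

The subclass would then be passed exactly like `BalancingV1` above, e.g. `kwargs = {'reward_class': HipUprightV1}`.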
Default Reset Positions¶
The reset positions shipped with the environment are all shown below. You can choose any number of these positions to train with.
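For example, to always reset into the standing position, pass only that entry in the `reset_positions` kwarg (reusing the names from the example above):

```python
# Restrict resets to a single shipped position; any subset of the default
# reset positions can be listed here.
kwargs = {'reset_positions': ['stand']}
```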
Default Randomizer Features¶
| Randomized Property | Method | Distribution | Note |
|---|---|---|---|
| Link mass | Scale | Uniform(0.8, 1.2) | Mass of each link in the robot is scaled between 80% and 120% of the default value. The scaling is sampled from a uniform distribution. |
| Joint friction | Absolute | Uniform(0.01, 0.1) | Joint frictions are sampled from a uniform distribution between 0.01 and 0.1. |
| Link damping | Scale | Uniform(0.8, 1.2) | Damping of each link in the robot is scaled between 80% and 120% of the default value. The scaling is sampled from a uniform distribution. |
| Ground friction | Absolute | Uniform(0.8, 1.2) | The ground plane's surface friction is sampled from a uniform distribution between 0.8 and 1.2. |
| Link inertia | – | – | Link inertia needs to satisfy the triangle inequality, so it can currently only be scaled trivially. Better randomization will be added in the future; track the feature here. |
Only one line of code needs to change to use domain randomization while training. The following code block illustrates this.
```python
import functools

from gym_os2r import randomizers
from gym_os2r.common import make_env_from_id
from gym_os2r.rewards import BalancingV1

env_id = "Monopod-stand-v1"
kwargs = {'reward_class': BalancingV1}

make_env = functools.partial(make_env_from_id, env_id=env_id, **kwargs)

# Only this line changes: use the randomizing wrapper instead of the no-op one.
env = randomizers.monopod.MonopodEnvRandomizer(env=make_env)
```
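With the randomizing wrapper in place, the rollout loop itself is unchanged from the minimal example; every reset triggers a new randomization of the simulated domain. A short sketch:

```python
env.seed(42)

for rollout in range(3):
    # Each reset re-randomizes masses, frictions and damping before the rollout.
    observation = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()
        observation, reward, done, _ = env.step(action)

env.close()
```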