Models

All models are derived from nnabla_rl.models.Model

Model

class nnabla_rl.models.model.Model(scope_name: str)[source]

Model Class

Parameters

scope_name (str) – the scope name of model

deepcopy(new_scope_name: str) T[source]

Create a (deep) copy of the model. All parameters associated with the model (if any exist) are copied, and new_scope_name is assigned to the copy.

Parameters

new_scope_name (str) – scope_name of parameters for newly created model

Returns

copied model

Return type

Model

Raises

ValueError – The given scope name is the same as the model’s or already exists.

get_internal_states() Dict[str, Variable][source]

Get the internal state variables of the RNN cell. Models which use LSTM, GRU, and/or any other recurrent network component must implement this method.

Returns

Value of each internal state. Each key is the name of an internal state.

Return type

Dict[str, nn.Variable]

get_parameters(grad_only: bool = True) Dict[str, Variable][source]

Retrieve the parameters associated with this model.

Parameters

grad_only (bool) – Retrieve only the parameters with need_grad = True. Defaults to True.

Returns

Parameter map.

Return type

OrderedDict

internal_state_shapes() Dict[str, Tuple[int, ...]][source]

Return the internal state shape as a tuple of ints for each internal state (excluding the batch_size). This method is called by RNNModelTrainer and its subclasses to set up training variables. Models which use LSTM, GRU, and/or any other recurrent network component must implement this method.

Returns

Internal state shapes. Each key is the name of an internal state.

Return type

Dict[str, Tuple[int, ...]]

is_recurrent() bool[source]

Check whether the model uses a recurrent network component. Models which use LSTM, GRU, and/or any other recurrent network component must return True.

Returns

True if the model uses a recurrent network component. Otherwise False.

Return type

bool

load_parameters(filepath: Union[str, Path]) None[source]

Load model parameters from given filepath.

Parameters

filepath (str or pathlib.Path) – parameter file path

reset_internal_states()[source]

Reset the internal state variables of the RNN cell to zero.

save_parameters(filepath: Union[str, Path]) None[source]

Save model parameters to given filepath.

Parameters

filepath (str or pathlib.Path) – parameter file path

property scope_name: str

Get scope name of this model.

Returns

scope name of the model

Return type

str

set_internal_states(states: Optional[Dict[str, Variable]] = None)[source]

Set the internal state variables of the RNN cell to the given states. Models which use LSTM, GRU, and/or any other recurrent network component must implement this method.

Parameters

states (None or Dict[str, nn.Variable]) – If None, reset all internal states to zero. If states are provided, set the provided states as the internal states.
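The recurrent-state methods described so far (is_recurrent, internal_state_shapes, get_internal_states, set_internal_states, reset_internal_states) might fit together roughly as in the following minimal sketch. It is pure Python and does not use nnabla: ToyRecurrentModel, the state name "h", and the nested lists standing in for nn.Variable are all assumptions made for illustration, not the library's implementation.

```python
from typing import Dict, List, Optional, Tuple

State = List[List[float]]  # hypothetical stand-in for nn.Variable: [batch][hidden]


class ToyRecurrentModel:
    """Illustrative only; mirrors the method names documented above."""

    def __init__(self, hidden_size: int, batch_size: int = 1):
        self._hidden_size = hidden_size
        self._batch_size = batch_size
        self._h: State = []
        self.reset_internal_states()

    def is_recurrent(self) -> bool:
        # A model with a recurrent component reports True.
        return True

    def internal_state_shapes(self) -> Dict[str, Tuple[int, ...]]:
        # Shapes exclude the batch dimension.
        return {"h": (self._hidden_size,)}

    def get_internal_states(self) -> Dict[str, State]:
        return {"h": self._h}

    def set_internal_states(self, states: Optional[Dict[str, State]] = None) -> None:
        if states is None:
            # No states given: fall back to zeros.
            self.reset_internal_states()
        else:
            self._h = states["h"]

    def reset_internal_states(self) -> None:
        self._h = [[0.0] * self._hidden_size for _ in range(self._batch_size)]
```

In this sketch the keys returned by internal_state_shapes and get_internal_states match, which is what lets a caller allocate a variable per state name and later feed it back through set_internal_states.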

shallowcopy() T[source]

Create a (shallow) copy of the model. Unlike deepcopy, shallowcopy keeps sharing the original network parameters by using the same scope_name as the original model. However, all class members are (deep) copied to the new instance. Do NOT use this method unless you understand what it does.

Returns

(shallow) copied model

Return type

Model
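The difference between deepcopy and shallowcopy can be illustrated with a toy parameter store keyed by scope name. This is a minimal sketch under stated assumptions: ToyModel and the module-level _PARAMS dictionary are hypothetical and only mimic the documented behavior (parameters copied under a new scope name versus shared through the same scope name). They are not how nnabla_rl stores parameters.

```python
import copy

# Hypothetical parameter store: scope name -> list of weights.
_PARAMS = {}


class ToyModel:
    def __init__(self, scope_name, weights=None):
        self.scope_name = scope_name
        if weights is not None:
            _PARAMS[scope_name] = list(weights)

    def get_parameters(self):
        return _PARAMS[self.scope_name]

    def deepcopy(self, new_scope_name):
        # Reject a scope name that is the model's own or already in use.
        if new_scope_name == self.scope_name or new_scope_name in _PARAMS:
            raise ValueError(f"scope name already in use: {new_scope_name}")
        copied = copy.deepcopy(self)
        copied.scope_name = new_scope_name
        # Parameters are duplicated under the new scope name.
        _PARAMS[new_scope_name] = list(self.get_parameters())
        return copied

    def shallowcopy(self):
        # Class members are copied, but the scope name (and therefore
        # the parameters it points to) stays the same.
        return copy.deepcopy(self)
```

With this toy, changing a weight of the original model afterwards is visible through the shallow copy but not through the deep copy, which is why a deep copy suits something like a target network while a shallow copy is easy to misuse.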

List of Models

class nnabla_rl.models.Perturbator(scope_name)[source]

Bases: Model

Abstract class for perturbator

Perturbator generates noise to append to the current state’s action.

class nnabla_rl.models.Policy(scope_name: str)[source]

Bases: Model

class nnabla_rl.models.DeterministicPolicy(scope_name: str)[source]

Bases: Policy

Abstract class for deterministic policy

This policy returns an action for the given state.

abstract pi(s: Variable) Variable[source]

Parameters

s (nnabla.Variable) – State variable

Returns

Action for the given state

Return type

nnabla.Variable

class nnabla_rl.models.StochasticPolicy(scope_name: str)[source]

Bases: Policy

Abstract class for stochastic policy

This policy returns a probability distribution of action for the given state.

abstract pi(s: Variable) Distribution[source]

Parameters

s (nnabla.Variable) – State variable

Returns

Probability distribution of the action for the given state

Return type

nnabla_rl.distributions.Distribution
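The contrast between the two policy types is in what pi returns: an action for DeterministicPolicy, a distribution over actions for StochasticPolicy. The sketch below shows that contrast in pure Python. ToyGaussian, deterministic_pi, and stochastic_pi are hypothetical names, floats stand in for nnabla.Variable, and the linear mapping from state to action is an arbitrary assumption.

```python
import random


class ToyGaussian:
    """Hypothetical stand-in for a distribution object."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def sample(self, rng):
        # Draw one action from the distribution.
        return rng.gauss(self.mean, self.std)


def deterministic_pi(s):
    # Deterministic policy: the action itself is returned.
    return 2.0 * s


def stochastic_pi(s):
    # Stochastic policy: a distribution over actions is returned;
    # the caller decides how to sample from it.
    return ToyGaussian(mean=2.0 * s, std=0.1)
```

A caller of the stochastic variant would typically sample from the returned object, for example `stochastic_pi(1.5).sample(random.Random(0))`.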

class nnabla_rl.models.QFunction(scope_name: str)[source]

Bases: Model

Base QFunction Class

all_q(s: Variable) Variable[source]

Compute Q-values for each action for given state

Parameters

s (nn.Variable) – state variable

Returns

Q-values for each action for given state

Return type

nn.Variable

argmax_q(s: Variable) Variable[source]

Compute the action which maximizes the Q-value for given state

Parameters

s (nn.Variable) – state variable

Returns

action which maximizes the Q-value for given state

Return type

nn.Variable

max_q(s: Variable) Variable[source]

Compute maximum Q-value for given state

Parameters

s (nn.Variable) – state variable

Returns

maximum Q-value for given state

Return type

nn.Variable

abstract q(s: Variable, a: Variable) Variable[source]

Compute Q-value for given state and action

Parameters
  • s (nn.Variable) – state variable

  • a (nn.Variable) – action variable

Returns

Q-value for given state and action

Return type

nn.Variable
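For a discrete action space, the four QFunction methods relate to each other in a simple way: max_q is the largest entry of all_q, argmax_q is its index, and q evaluated at that index gives max_q back. The following sketch shows the relationship with a plain lookup table. The table and the free functions are assumptions for illustration and do not use nn.Variable.

```python
def all_q(q_table, s):
    # Q-values of every action for state s.
    return list(q_table[s])


def q(q_table, s, a):
    # Q-value of one (state, action) pair.
    return q_table[s][a]


def max_q(q_table, s):
    # Largest Q-value over all actions.
    return max(all_q(q_table, s))


def argmax_q(q_table, s):
    # Index of the action with the largest Q-value.
    values = all_q(q_table, s)
    return max(range(len(values)), key=values.__getitem__)
```

Given these definitions, `q(table, s, argmax_q(table, s))` equals `max_q(table, s)` for any state in the table.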

class nnabla_rl.models.ValueDistributionFunction(scope_name: str, n_action: int, n_atom: int, v_min: float, v_max: float)[source]

Bases: Model

Base value distribution class.

Computes the probabilities of q-value for each action. Value distribution function models the probabilities of q value for each action by dividing the values between the maximum q value and minimum q value into ‘n_atom’ number of bins and assigning the probability to each bin.

Parameters
  • scope_name (str) – scope name of the model

  • n_action (int) – Number of actions used in the target environment.

  • n_atom (int) – Number of bins.

  • v_min (float) – Minimum value of the distribution.

  • v_max (float) – Maximum value of the distribution.

all_probs(s: Variable) Variable[source]

Compute probabilities of atoms for all possible actions for given state

Parameters

s (nn.Variable) – state variable

Returns

probabilities of atoms for all possible actions for given state

Return type

nn.Variable

as_q_function() QFunction[source]

Convert the value distribution function to QFunction.

Returns

QFunction instance which computes the q-values based on the probabilities.

Return type

nnabla_rl.models.q_function.QFunction

max_q_probs(s: Variable) Variable[source]

Compute probabilities of atoms for given state that maximizes the q_value

Parameters

s (nn.Variable) – state variable

Returns

probabilities of atoms for given state that maximizes the q_value

Return type

nn.Variable

abstract probs(s: Variable, a: Variable) Variable[source]

Compute probabilities of atoms for given state and action

Parameters
  • s (nn.Variable) – state variable

  • a (nn.Variable) – action variable

Returns

probabilities of atoms for given state and action

Return type

nn.Variable
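How a q-value can be recovered from atom probabilities is sketched below. It assumes, as in C51-style categorical methods, that the n_atom atoms are evenly spaced from v_min to v_max inclusive and that the q-value is the probability-weighted sum of the atom values. The function names are hypothetical, and plain lists replace nn.Variable.

```python
def atom_values(n_atom, v_min, v_max):
    # Assumption: atoms are evenly spaced and include both end points.
    if n_atom < 2:
        raise ValueError("n_atom must be at least 2")
    delta = (v_max - v_min) / (n_atom - 1)
    return [v_min + i * delta for i in range(n_atom)]


def expected_q(probs, atoms):
    # Expected value of the categorical distribution over atoms.
    return sum(p * z for p, z in zip(probs, atoms))


def max_q_probs(all_probs, atoms):
    # all_probs[a] holds the atom probabilities of action a.
    q_values = [expected_q(p, atoms) for p in all_probs]
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return all_probs[best]
```

In this reading, as_q_function corresponds to applying expected_q per action, and max_q_probs returns the distribution of the greedy action rather than a scalar.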

class nnabla_rl.models.QuantileDistributionFunction(scope_name: str, n_quantile: int)[source]

Bases: Model

Base quantile distribution class.

Computes the quantiles of q-value for each action. Quantile distribution function models the quantiles of q value for each action by dividing the probability (which is between 0.0 and 1.0) into ‘n_quantile’ number of bins and assigning the n-quantile to n-th bin.

Parameters
  • scope_name (str) – scope name of the model

  • n_quantile (int) – Number of bins.

all_quantiles(s: Variable) Variable[source]

Computes the quantiles of q-value for each action for the given state.

Parameters

s (nn.Variable) – state variable

Returns

quantiles of q-value for each action for the given state

Return type

nn.Variable

as_q_function() QFunction[source]

Convert the quantile distribution function to QFunction.

Returns

QFunction instance which computes the q-values based on the quantiles.

Return type

nnabla_rl.models.q_function.QFunction

max_q_quantiles(s: Variable) Variable[source]

Compute the quantiles of q-value for given state that maximizes the q_value

Parameters

s (nn.Variable) – state variable

Returns

quantiles of q-value for given state that maximizes the q_value

Return type

nn.Variable

quantiles(s: Variable, a: Variable) Variable[source]

Computes the quantiles of q-value for given state and action.

Parameters
  • s (nn.Variable) – state variable

  • a (nn.Variable) – action variable

Returns

quantiles of q-value for given state and action.

Return type

nn.Variable
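The quantile representation can be sketched in the same spirit. The example assumes, as in QR-DQN-style methods, that the probability range [0, 1] is split into n_quantile equal bins and that the q-value of an action is the mean of its quantile values. All names are hypothetical, and lists stand in for nn.Variable.

```python
def quantile_bins(n_quantile):
    # Equal-width bins covering the probability range [0, 1].
    return [(i / n_quantile, (i + 1) / n_quantile) for i in range(n_quantile)]


def q_from_quantiles(quantiles):
    # Assumption: each quantile carries equal probability mass,
    # so the q-value is the plain average.
    return sum(quantiles) / len(quantiles)


def max_q_quantiles(all_quantiles):
    # all_quantiles[a] holds the quantile values of action a.
    q_values = [q_from_quantiles(qs) for qs in all_quantiles]
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return all_quantiles[best]
```

Unlike the atom-based model, the bin probabilities here are fixed and the model output is the return value assigned to each bin.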

class nnabla_rl.models.StateActionQuantileFunction(scope_name: str, n_action: int, K: int, risk_measure_function: ~typing.Callable[[~nnabla._variable.Variable], ~nnabla._variable.Variable] = <function risk_neutral_measure>)[source]

Bases: Model

State-action quantile function class.

Computes the return samples of q-value for each action. State-action quantile function computes the return samples of q value for each action using sampled quantile threshold (e.g. \(\tau\sim U([0,1])\)) for given state.

Parameters
  • scope_name (str) – scope name of the model

  • n_action (int) – Number of actions used in the target environment.

  • K (int) – Number of samples for quantile threshold \(\tau\).

  • risk_measure_function (Callable[[nn.Variable], nn.Variable]) – Risk measure function which modifies the weightings of tau. Defaults to the risk neutral measure, which does not change the taus.

all_quantile_values(s: Variable, tau: Variable) Variable[source]

Compute the return samples for all actions for given state and quantile threshold.

Parameters
  • s (nn.Variable) – state variable.

  • tau (nn.Variable) – quantile threshold.

Returns

return samples from implicit return distribution for given state using tau.

Return type

nn.Variable

as_q_function() QFunction[source]

Convert the state action quantile function to QFunction.

Returns

QFunction instance which computes the q-values based on return samples.

Return type

nnabla_rl.models.q_function.QFunction

max_q_quantile_values(s: Variable, tau: Variable) Variable[source]

Compute the return samples from distribution that maximizes q value for given state using quantile threshold.

Parameters
  • s (nn.Variable) – state variable.

  • tau (nn.Variable) – quantile threshold.

Returns

return samples from implicit return distribution that maximizes q for given state using tau.

Return type

nn.Variable

quantile_values(s: Variable, a: Variable, tau: Variable) Variable[source]

Compute the return samples for given state and action.

Parameters
  • s (nn.Variable) – state variable.

  • a (nn.Variable) – action variable.

  • tau (nn.Variable) – quantile threshold.

Returns

return samples from implicit return distribution for given state and action using tau.

Return type

nn.Variable

sample_tau(shape: Optional[Iterable] = None) Variable[source]

Sample quantile thresholds from uniform distribution

Parameters

shape (Tuple[int] or None) – shape of the quantile threshold to sample. If None the shape will be (1, K).

Returns

quantile thresholds

Return type

nn.Variable
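The flow from sampled thresholds to a q-value estimate can be sketched as follows: sample tau from U([0, 1]) with default shape (1, K), pass it through the risk measure function, evaluate return samples, and average them. This is a pure-Python illustration under assumptions. toy_quantile_values is a made-up quantile function (returns uniform on [-1, 1]), estimate_q is a hypothetical helper, and nested lists stand in for nn.Variable with a two-dimensional shape.

```python
import random


def risk_neutral_measure(tau):
    # Leaves the sampled thresholds unchanged.
    return tau


def sample_tau(K, shape=None, rng=None):
    # Assumption: shape is two-dimensional; defaults to (1, K).
    rng = rng or random.Random()
    rows, cols = (1, K) if shape is None else shape
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


def toy_quantile_values(tau):
    # Hypothetical inverse CDF of a return uniform on [-1, 1].
    return [[2.0 * t - 1.0 for t in row] for row in tau]


def estimate_q(K, risk_measure_function=risk_neutral_measure, rng=None):
    tau = sample_tau(K, rng=rng)
    # The risk measure reweights the thresholds before evaluation.
    tau = [[risk_measure_function(t) for t in row] for row in tau]
    samples = toy_quantile_values(tau)
    # Average the K return samples per batch row.
    return [sum(row) / len(row) for row in samples]
```

Swapping risk_measure_function for one that shrinks tau toward 0 (for example `lambda t: 0.25 * t`) would bias the estimate toward the lower tail of the return distribution, which is one way to express risk aversion.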

class nnabla_rl.models.reward_function.RewardFunction(scope_name: str)[source]

Bases: Model

Base reward function class

abstract r(s_current: Variable, a_current: Variable, s_next: Variable) Variable[source]

Computes the reward for the given state, action and next state. One (or more than one) of the input variables may not be used in the actual computation.

Parameters
  • s_current (nnabla.Variable) – State variable

  • a_current (nnabla.Variable) – Action variable

  • s_next (nnabla.Variable) – Next state variable

Returns

Reward for the given state, action and next state.

Return type

nnabla.Variable

class nnabla_rl.models.VFunction(scope_name: str)[source]

Bases: Model

Base Value function class

abstract v(s: Variable) Variable[source]

Compute the state value (V) for given state

Parameters

s (nn.Variable) – state variable

Returns

State value for given state

Return type

nn.Variable

class nnabla_rl.models.Encoder(scope_name: str)[source]

Bases: Model

abstract encode(x: Variable, **kwargs) Variable[source]

Encode the input variable to latent representation.

Parameters

x (nn.Variable) – encoder input.

Returns

latent variable

Return type

nn.Variable

class nnabla_rl.models.VariationalAutoEncoder(scope_name: str)[source]

Bases: Encoder

abstract decode(z: Optional[Variable], **kwargs) Variable[source]

Reconstruct the latent representation.

Parameters

z (nn.Variable, optional) – latent variable. If the input is None, random sample will be used instead.

Returns

reconstructed variable

Return type

nn.Variable

abstract decode_multiple(z: Optional[Variable], decode_num: int, **kwargs)[source]

Reconstruct multiple latent representations.

Parameters

z (nn.Variable, optional) – latent variable. If the input is None, random sample will be used instead.

Returns

Reconstructed input and latent distribution

Return type

nn.Variable

abstract encode_and_decode(x: Variable, **kwargs) Tuple[Distribution, Variable][source]

Encode the input variable and reconstruct.

Parameters

x (nn.Variable) – encoder input.

Returns

latent distribution and reconstructed input

Return type

Tuple[Distribution, nn.Variable]

abstract latent_distribution(x: Variable, **kwargs) Distribution[source]

Compute the latent distribution \(p(z|x)\).

Parameters

x (nn.Variable) – encoder input.

Returns

latent distribution

Return type

Distribution
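The relationship between latent_distribution, encode, decode, and encode_and_decode can be sketched with a scalar toy. Everything here is an assumption for illustration: ToyVAE is hypothetical, floats stand in for nn.Variable, a (mean, std) tuple stands in for Distribution, and the encoder and decoder are arbitrary linear maps. The one behavior taken from the text above is that decode falls back to a random sample when z is None.

```python
import random


class ToyVAE:
    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def latent_distribution(self, x):
        # Hypothetical encoder: mean depends on x, std is fixed.
        return (0.5 * x, 1.0)

    def _sample(self, dist):
        # Draw z = mean + std * noise from the latent distribution.
        mean, std = dist
        return mean + std * self._rng.gauss(0.0, 1.0)

    def encode(self, x):
        # Latent variable sampled from the latent distribution.
        return self._sample(self.latent_distribution(x))

    def decode(self, z=None):
        if z is None:
            # No latent given: use a random sample instead.
            z = self._rng.gauss(0.0, 1.0)
        # Hypothetical decoder.
        return 2.0 * z

    def encode_and_decode(self, x):
        dist = self.latent_distribution(x)
        return dist, self.decode(self._sample(dist))
```

Returning the latent distribution together with the reconstruction, as encode_and_decode does, gives a training loop both ingredients a typical VAE loss needs: a regularization term on the distribution and a reconstruction term.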