Models¶
All models are derived from nnabla_rl.models.Model
Model¶
- class nnabla_rl.models.model.Model(scope_name: str)[source]¶
Model Class
- Parameters
scope_name (str) – the scope name of model
- deepcopy(new_scope_name: str) → nnabla_rl.models.model.Model[source]¶
Create a copy of the model. All the model's parameters (if they exist) will be copied to the new model.
- Parameters
new_scope_name (str) – scope_name of parameters for newly created model
- Returns
copied model
- Return type
Model
- Raises
ValueError – Given scope name is the same as the model's or already exists.
- get_parameters(grad_only: bool = True) → Dict[str, nnabla._variable.Variable][source]¶
Retrieve parameters associated with this model
- Parameters
grad_only (bool) – Retrieve only parameters with need_grad = True. Defaults to True.
- Returns
Parameter map.
- Return type
parameters (OrderedDict)
- load_parameters(filepath: Union[str, pathlib.Path]) → None[source]¶
Load model parameters from given filepath.
- Parameters
filepath (str or pathlib.Path) – parameter file path
- save_parameters(filepath: Union[str, pathlib.Path]) → None[source]¶
Save model parameters to given filepath.
- Parameters
filepath (str or pathlib.Path) – parameter file path
- property scope_name: str¶
Get scope name of this model.
- Returns
scope name of the model
- Return type
scope_name (str)
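The scoping and copying contract above can be illustrated with a plain-Python sketch. This is not the nnabla_rl implementation (a real Model holds nn.Variable parameters under nnabla parameter scopes); it only mirrors the documented behavior of get_parameters, deepcopy, and the ValueError on a duplicate scope name. All names here are illustrative.

```python
# Conceptual sketch, not the nnabla_rl implementation: a minimal model that
# keeps parameters keyed by name and copies them under a fresh scope name.
from copy import deepcopy as _deepcopy


class ToyModel:
    def __init__(self, scope_name, parameters=None):
        self._scope_name = scope_name
        # parameter name -> (value, need_grad) pairs
        self._parameters = dict(parameters or {})

    @property
    def scope_name(self):
        return self._scope_name

    def get_parameters(self, grad_only=True):
        # When grad_only is True, return only trainable parameters
        return {name: value
                for name, (value, need_grad) in self._parameters.items()
                if need_grad or not grad_only}

    def deepcopy(self, new_scope_name):
        # A copy must live under a different scope name
        if new_scope_name == self._scope_name:
            raise ValueError("scope name must differ from the original model's")
        return ToyModel(new_scope_name, _deepcopy(self._parameters))


model = ToyModel("pi", {"w": (0.5, True), "mean": (0.0, False)})
print(model.get_parameters())        # only the need_grad=True parameters
copied = model.deepcopy("pi_target")
print(copied.scope_name)
```

The deepcopy pattern above is what target networks in RL typically rely on: the copied model shares no parameter storage with the original, so the two can be updated independently.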
List of Models¶
- class nnabla_rl.models.Perturbator(scope_name)[source]¶
Bases:
nnabla_rl.models.model.Model
Abstract class for perturbator
Perturbator generates noise to add to the current state's action
- class nnabla_rl.models.Policy(scope_name: str)[source]¶
Bases:
nnabla_rl.models.model.Model
- class nnabla_rl.models.DeterministicPolicy(scope_name: str)[source]¶
Bases:
nnabla_rl.models.policy.Policy
Abstract class for deterministic policy
This policy returns an action for the given state.
- class nnabla_rl.models.StochasticPolicy(scope_name: str)[source]¶
Bases:
nnabla_rl.models.policy.Policy
Abstract class for stochastic policy
This policy returns a probability distribution of action for the given state.
- abstract pi(s: nnabla._variable.Variable) → nnabla_rl.distributions.distribution.Distribution[source]¶
- Parameters
s (nnabla.Variable) – State variable
- Returns
Probability distribution of the action for the given state
- Return type
Distribution
- class nnabla_rl.models.QFunction(scope_name: str)[source]¶
Bases:
nnabla_rl.models.model.Model
Base QFunction Class
- all_q(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]¶
Compute Q-values for each action for given state
- Parameters
s (nn.Variable) – state variable
- Returns
Q-values for each action for given state
- Return type
nn.Variable
- argmax_q(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]¶
Compute the action which maximizes the Q-value for given state
- Parameters
s (nn.Variable) – state variable
- Returns
action which maximizes the Q-value for given state
- Return type
nn.Variable
- class nnabla_rl.models.ValueDistributionFunction(scope_name: str, n_action: int, n_atom: int, v_min: float, v_max: float)[source]¶
Bases:
nnabla_rl.models.model.Model
Base value distribution class.
Computes the probabilities of q-value for each action. Value distribution function models the probabilities of q value for each action by dividing the values between the maximum q value and minimum q value into ‘n_atom’ number of bins and assigning the probability to each bin.
- Parameters
scope_name (str) – scope name of the model
n_action (int) – Number of actions used in the target environment.
n_atom (int) – Number of bins.
v_min (float) – Minimum value of the distribution.
v_max (float) – Maximum value of the distribution.
- all_probs(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]¶
Compute probabilities of atoms for all possible actions for given state
- Parameters
s (nn.Variable) – state variable
- Returns
probabilities of atoms for all possible actions for given state
- Return type
nn.Variable
- as_q_function() → nnabla_rl.models.q_function.QFunction[source]¶
Convert the value distribution function to QFunction.
- Returns
QFunction instance which computes the q-values based on the probabilities.
- Return type
QFunction
- max_q_probs(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]¶
Compute probabilities of atoms for the action that maximizes the q-value for given state
- Parameters
s (nn.Variable) – state variable
- Returns
probabilities of atoms for the action that maximizes the q-value for given state
- Return type
nn.Variable
- abstract probs(s: nnabla._variable.Variable, a: nnabla._variable.Variable) → nnabla._variable.Variable[source]¶
Compute probabilities of atoms for given state and action
- Parameters
s (nn.Variable) – state variable
a (nn.Variable) – action variable
- Returns
probabilities of atoms for given state and action
- Return type
nn.Variable
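How atom probabilities turn back into a Q-value (the conversion as_q_function performs) can be sketched with plain arithmetic. The atoms z_i are evenly spaced between v_min and v_max, and the Q-value is the expectation over them; the probability values below are hypothetical, and the helper names are not nnabla_rl's.

```python
# Sketch of the value-distribution -> Q-value conversion: n_atom bin values
# evenly spaced in [v_min, v_max], Q = sum_i p_i * z_i (the expectation).

def atom_values(v_min, v_max, n_atom):
    # Atom positions z_i, evenly spaced between v_min and v_max
    delta = (v_max - v_min) / (n_atom - 1)
    return [v_min + i * delta for i in range(n_atom)]


def q_from_probs(probs, v_min, v_max):
    # Expected Q-value under the categorical distribution over atoms
    z = atom_values(v_min, v_max, len(probs))
    return sum(p * z_i for p, z_i in zip(probs, z))


probs = [0.1, 0.2, 0.3, 0.4]          # hypothetical atom probabilities
print(q_from_probs(probs, -10.0, 10.0))
```

A uniform distribution over the atoms yields the midpoint of [v_min, v_max], which is a quick sanity check for an implementation like this.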
- class nnabla_rl.models.QuantileDistributionFunction(scope_name: str, n_action: int, n_quantile: int)[source]¶
Bases:
nnabla_rl.models.model.Model
Base quantile distribution class.
Computes the quantiles of q-value for each action. Quantile distribution function models the quantiles of the q-value for each action by dividing the probability range (between 0.0 and 1.0) into ‘n_quantile’ bins and assigning the n-th quantile to the n-th bin.
- Parameters
scope_name (str) – scope name of the model
n_action (int) – Number of actions used in the target environment.
n_quantile (int) – Number of bins.
- all_quantiles(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]¶
Computes the quantiles of q-value for each action for the given state.
- Parameters
s (nn.Variable) – state variable
- Returns
quantiles of q-value for each action for the given state
- Return type
nn.Variable
- as_q_function() → nnabla_rl.models.q_function.QFunction[source]¶
Convert the quantile distribution function to QFunction.
- Returns
QFunction instance which computes the q-values based on the quantiles.
- Return type
QFunction
- max_q_quantiles(s: nnabla._variable.Variable) → nnabla._variable.Variable[source]¶
Compute the quantiles of q-value for the given state, for the action that maximizes the q-value
- Parameters
s (nn.Variable) – state variable
- Returns
quantiles of q-value for the given state, for the action that maximizes the q-value
- Return type
nn.Variable
- quantiles(s: nnabla._variable.Variable, a: nnabla._variable.Variable) → nnabla._variable.Variable[source]¶
Computes the quantiles of q-value for given state and action.
- Parameters
s (nn.Variable) – state variable
a (nn.Variable) – action variable
- Returns
quantiles of q-value for given state and action.
- Return type
nn.Variable
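Since each of the n_quantile bins carries equal probability mass, the Q-value recovered by as_q_function reduces, in this toy view, to the mean of the quantiles. The sketch below uses made-up quantile values and is not the nnabla_rl implementation.

```python
# Sketch: with n_quantile equally weighted quantiles, the expected return
# (the Q-value) is simply their mean, since each bin has mass 1/n_quantile.

def q_from_quantiles(quantiles):
    # Each quantile carries probability mass 1 / n_quantile
    return sum(quantiles) / len(quantiles)


quantiles = [-1.0, 0.5, 2.0, 4.5]   # hypothetical return quantiles for one action
print(q_from_quantiles(quantiles))
```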
- class nnabla_rl.models.StateActionQuantileFunction(scope_name: str, n_action: int, K: int, risk_measure_function: Callable[[nnabla._variable.Variable], nnabla._variable.Variable] = <function risk_neutral_measure>)[source]¶
Bases:
nnabla_rl.models.model.Model
State-action quantile function class.
Computes the return samples of q-value for each action. State-action quantile function computes the return samples of q value for each action using sampled quantile threshold (e.g. \(\tau\sim U([0,1])\)) for given state.
- Parameters
scope_name (str) – scope name of the model
n_action (int) – Number of actions used in the target environment.
K (int) – Number of samples for quantile threshold \(\tau\).
risk_measure_function (Callable[[nn.Variable], nn.Variable]) – Risk measure function which modifies the weightings of tau. Defaults to the risk-neutral measure, which leaves the taus unchanged.
- all_quantile_values(s: nnabla._variable.Variable, tau: nnabla._variable.Variable) → nnabla._variable.Variable[source]¶
Compute the return samples for all actions for the given state and quantile threshold.
- Parameters
s (nn.Variable) – state variable.
tau (nn.Variable) – quantile threshold.
- Returns
return samples from implicit return distribution for given state using tau.
- Return type
nn.Variable
- as_q_function() → nnabla_rl.models.q_function.QFunction[source]¶
Convert the state action quantile function to QFunction.
- Returns
QFunction instance which computes the q-values based on return samples.
- Return type
QFunction
- max_q_quantile_values(s: nnabla._variable.Variable, tau: nnabla._variable.Variable) → nnabla._variable.Variable[source]¶
Compute the return samples from distribution that maximizes q value for given state using quantile threshold.
- Parameters
s (nn.Variable) – state variable.
tau (nn.Variable) – quantile threshold.
- Returns
return samples from implicit return distribution that maximizes q for given state using tau.
- Return type
nn.Variable
- quantile_values(s: nnabla._variable.Variable, a: nnabla._variable.Variable, tau: nnabla._variable.Variable) → nnabla._variable.Variable[source]¶
Compute the return samples for given state and action.
- Parameters
s (nn.Variable) – state variable.
a (nn.Variable) – action variable.
tau (nn.Variable) – quantile threshold.
- Returns
return samples from implicit return distribution for given state and action using tau.
- Return type
nn.Variable
- sample_tau(shape: Optional[Iterable] = None) → nnabla._variable.Variable[source]¶
Sample quantile thresholds from uniform distribution
- Parameters
shape (Tuple[int] or None) – shape of the quantile threshold to sample. If None the shape will be (1, K).
- Returns
quantile thresholds
- Return type
nn.Variable
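The tau-sampling and risk-measure ideas can be sketched in plain Python: thresholds are drawn from U([0, 1]), and a risk measure function may reweight them before the quantile function is evaluated. The cvar_measure below is one common example of a risk-averse reweighting (scaling taus into [0, alpha]); the function names are illustrative, not nnabla_rl's.

```python
# Sketch of sample_tau and risk_measure_function: thresholds from U([0, 1]),
# optionally reweighted by a risk measure before evaluating quantile values.
import random


def sample_tau(k, rng):
    # K quantile thresholds drawn uniformly from [0, 1]
    return [rng.random() for _ in range(k)]


def risk_neutral_measure(taus):
    # Risk-neutral: leave the thresholds unchanged
    return taus


def cvar_measure(taus, alpha=0.25):
    # Example risk-averse measure: squeeze thresholds into [0, alpha],
    # focusing the quantile function on the lower tail of the returns
    return [alpha * tau for tau in taus]


rng = random.Random(0)
taus = sample_tau(4, rng)
print(all(0.0 <= t <= 1.0 for t in taus))
print(risk_neutral_measure(taus) == taus)
print(max(cvar_measure(taus)) <= 0.25)
```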
- class nnabla_rl.models.reward_function.RewardFunction(scope_name: str)[source]¶
Bases:
nnabla_rl.models.model.Model
Base reward function class
- abstract r(s_current: nnabla._variable.Variable, a_current: nnabla._variable.Variable, s_next: nnabla._variable.Variable) → nnabla._variable.Variable[source]¶
Computes the reward for the given state, action and next state. One (or more than one) of the input variables may not be used in the actual computation.
- Parameters
s_current (nnabla.Variable) – State variable
a_current (nnabla.Variable) – Action variable
s_next (nnabla.Variable) – Next state variable
- Returns
Reward for the given state, action and next state.
- Return type
nnabla.Variable
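The note that some inputs may go unused can be shown with a toy reward function in plain Python (the real r works on nnabla Variables). The goal-based reward here is entirely hypothetical; it only illustrates a reward that depends on s_next alone while still accepting all three arguments.

```python
# Toy sketch of the RewardFunction contract: the reward may depend on any
# subset of (s_current, a_current, s_next). This one uses only s_next.

def r(s_current, a_current, s_next):
    # Hypothetical sparse reward: +1 when the next state reaches the goal
    goal = 1.0
    return 1.0 if s_next >= goal else 0.0


print(r(0.0, 0.5, 1.2))
print(r(0.0, 0.5, 0.3))
```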
- class nnabla_rl.models.VFunction(scope_name: str)[source]¶
Bases:
nnabla_rl.models.model.Model
Base Value function class
- class nnabla_rl.models.Encoder(scope_name: str)[source]¶
Bases:
nnabla_rl.models.model.Model
- class nnabla_rl.models.VariationalAutoEncoder(scope_name: str)[source]¶
Bases:
nnabla_rl.models.encoder.Encoder
- abstract decode(z: Optional[nnabla._variable.Variable], **kwargs) → nnabla._variable.Variable[source]¶
Reconstruct the variable from the latent representation.
- Parameters
z (nn.Variable, optional) – latent variable. If the input is None, random sample will be used instead.
- Returns
reconstructed variable
- Return type
nn.Variable
- abstract decode_multiple(z: Optional[nnabla._variable.Variable], decode_num: int, **kwargs)[source]¶
Generate multiple reconstructions from the latent representation.
- Parameters
z (nn.Variable, optional) – encoder input. If the input is None, random samples will be used instead.
decode_num (int) – number of reconstructions to generate.
- Returns
Reconstructed input and latent distribution
- Return type
nn.Variable
- abstract encode_and_decode(x: nnabla._variable.Variable, **kwargs) → Tuple[nnabla_rl.distributions.distribution.Distribution, nnabla._variable.Variable][source]¶
Encode the input variable and reconstruct.
- Parameters
x (nn.Variable) – encoder input.
- Returns
latent distribution and reconstructed input
- Return type
Tuple[Distribution, nn.Variable]
- abstract latent_distribution(x: nnabla._variable.Variable, **kwargs) → nnabla_rl.distributions.distribution.Distribution[source]¶
Compute the latent distribution \(p(z|x)\).
- Parameters
x (nn.Variable) – encoder input.
- Returns
latent distribution
- Return type
Distribution
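The VariationalAutoEncoder contract — encode_and_decode returning both the latent distribution and a reconstruction, and decode falling back to a random latent when z is None — can be sketched as a toy in plain Python. Nothing here is the nnabla_rl API: the Gaussian latent, the reparameterized sample, and the trivial "decoder" are all stand-ins for real network computations.

```python
# Toy sketch of the VAE contract: latent_distribution computes p(z|x),
# decode reconstructs from z (sampling z randomly when it is None), and
# encode_and_decode chains the two via a reparameterized latent sample.
import random


class ToyVAE:
    def __init__(self, rng=None):
        self._rng = rng or random.Random(0)

    def latent_distribution(self, x):
        # Hypothetical Gaussian latent p(z|x), returned as (mean, stddev)
        mean = sum(x) / len(x)
        return (mean, 1.0)

    def decode(self, z=None):
        # If no latent is given, use a random sample instead
        if z is None:
            z = self._rng.gauss(0.0, 1.0)
        return [z, z]  # trivial "reconstruction" from the latent

    def encode_and_decode(self, x):
        dist = self.latent_distribution(x)
        mean, stddev = dist
        z = mean + stddev * self._rng.gauss(0.0, 1.0)  # reparameterized sample
        return dist, self.decode(z)


vae = ToyVAE()
dist, reconstruction = vae.encode_and_decode([1.0, 3.0])
print(dist)
print(len(reconstruction))
```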