MultiHeadAttention

class ivy.neural_net_stateful.layers.MultiHeadAttention(query_dim, num_heads=8, head_dim=64, dropout_rate=0.0, context_dim=None, scale=None, dev_str=None, v=None, build_mode='on_init')[source]

Bases: ivy.neural_net_stateful.module.Module

__init__(query_dim, num_heads=8, head_dim=64, dropout_rate=0.0, context_dim=None, scale=None, dev_str=None, v=None, build_mode='on_init')[source]

Multi-head attention layer.

Parameters
  • query_dim (int) – The dimension of the attention queries.

  • num_heads (int, optional) – Number of attention heads. Default is 8.

  • head_dim (int, optional) – The dimension of each of the heads. Default is 64.

  • dropout_rate (float, optional) – The rate of dropout. Default is 0.

  • context_dim (int, optional) – The dimension of the context array. Default is None, in which case query_dim is used.

  • scale (float, optional) – The value by which to scale the query-key similarity measure. Default is head_dim^-0.5.

  • dev_str (str, optional) – The device on which to create the layer's variables, e.g. 'cuda:0', 'cuda:1', 'cpu'. Default is 'cpu'.

  • v (ivy container of variables, optional) – The variables for the attention layer, as a container. Constructed internally by default.

  • build_mode (str, optional) – How the Module is built: either on initialization, explicitly by the user via a call to build(), or the first time the __call__ method is run. Default is 'on_init'.


Supported Frameworks:

JAX, TensorFlow, PyTorch, MXNet, NumPy
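
A minimal usage sketch follows, assuming a backend has already been set (here PyTorch via ivy.set_framework). The input shapes and the self-attention call pattern (passing only the query array, so the layer attends over it as its own context) are illustrative assumptions, not taken from the source above.

    # Hypothetical usage sketch; shapes and call pattern are assumptions.
    import ivy
    from ivy.neural_net_stateful.layers import MultiHeadAttention

    ivy.set_framework('torch')  # assumed backend; any supported framework works

    # Build the layer: queries of dimension 128, 8 heads of 64 dimensions each.
    attn = MultiHeadAttention(query_dim=128, num_heads=8, head_dim=64)

    # Batch of 2 sequences, each with 16 tokens of dimension query_dim.
    x = ivy.random_uniform(shape=(2, 16, 128))

    # Self-attention: with no separate context array, the layer attends over x.
    out = attn(x)
    print(out.shape)  # expected (2, 16, 128): output is projected back to query_dim

Since context_dim defaults to query_dim, the same layer can also be used for cross-attention by passing a separate context array whose last dimension matches context_dim.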