MultiHeadAttention(query_dim, num_heads=8, head_dim=64, dropout_rate=0.0, context_dim=None, scale=None, dev_str=None, v=None, build_mode='on_init')
__init__(query_dim, num_heads=8, head_dim=64, dropout_rate=0.0, context_dim=None, scale=None, dev_str=None, v=None, build_mode='on_init')
Multi-head attention layer.
query_dim (int) – The dimension of the attention queries.
num_heads (int, optional) – Number of attention heads. Default is 8.
head_dim (int, optional) – The dimension of each of the heads. Default is 64.
dropout_rate (float, optional) – The dropout rate. Default is 0.
context_dim (int, optional) – The dimension of the context array. Default is None, in which case query_dim is used.
scale (float, optional) – The value by which to scale the query-key similarity measure. Default is head_dim ** -0.5.
dev_str (str, optional) – The device on which to create the layer's variables, e.g. 'cuda:0', 'cuda:1' or 'cpu'. Default is 'cpu'.
v (ivy Container of variables, optional) – The variables for the attention layer, as a container. Constructed internally by default.
build_mode (str, optional) – How the Module is built: either on initialization, explicitly by the user calling build(), or the first time the __call__ method is run. Default is 'on_init'.
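To illustrate what this layer computes, the sketch below implements a standalone multi-head attention forward pass in NumPy. This is a minimal, hypothetical reference implementation (the function name, weight layout, and omission of dropout are assumptions, not the ivy module's internals); it mirrors the parameters above: queries are projected from the input of width query_dim, keys and values from a context array of width context_dim, and the similarity is scaled by head_dim ** -0.5 by default.

```python
import numpy as np

def multi_head_attention(x, context, w_q, w_k, w_v, w_out, num_heads, scale=None):
    """Illustrative multi-head attention forward pass (sketch, not ivy's code)."""
    b, n, _ = x.shape
    m = context.shape[1]
    # Project queries from the input, keys/values from the context array
    q = x @ w_q          # (b, n, num_heads * head_dim)
    k = context @ w_k    # (b, m, num_heads * head_dim)
    v = context @ w_v    # (b, m, num_heads * head_dim)
    head_dim = q.shape[-1] // num_heads
    if scale is None:
        scale = head_dim ** -0.5  # default scaling, as in the parameter list above
    # Split into heads: (b, num_heads, seq, head_dim)
    q = q.reshape(b, n, num_heads, head_dim).transpose(0, 2, 1, 3)
    k = k.reshape(b, m, num_heads, head_dim).transpose(0, 2, 1, 3)
    v = v.reshape(b, m, num_heads, head_dim).transpose(0, 2, 1, 3)
    # Scaled dot-product similarity, softmax over the context axis
    sim = (q @ k.transpose(0, 1, 3, 2)) * scale  # (b, num_heads, n, m)
    attn = np.exp(sim - sim.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    out = attn @ v  # (b, num_heads, n, head_dim)
    # Merge heads and apply the output projection back to query_dim
    out = out.transpose(0, 2, 1, 3).reshape(b, n, num_heads * head_dim)
    return out @ w_out  # (b, n, query_dim)

# Example with query_dim=16, context_dim=12, num_heads=2, head_dim=4
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 5, 16))    # queries
ctx = rng.standard_normal((1, 7, 12))  # context
w_q = rng.standard_normal((16, 8))
w_k = rng.standard_normal((12, 8))
w_v = rng.standard_normal((12, 8))
w_out = rng.standard_normal((8, 16))
y = multi_head_attention(x, ctx, w_q, w_k, w_v, w_out, num_heads=2)
print(y.shape)  # (1, 5, 16)
```

When context is omitted in practice (context_dim is None), the layer attends over the input itself (self-attention), so the same query array would be passed as both x and context here.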