Embeddings
fasttext_embedding
Fasttext_embedding
- class mindnlp.modules.embeddings.fasttext_embedding.Fasttext(vocab: Vocab, init_embed, requires_grad: bool = True, dropout=0.0)[source]
Bases: TokenEmbedding
Embedding layer.
- Parameters
vocab (Vocab) – Vocabulary used to initialize the embedding.
init_embed (Tensor) – Tensor whose values are used directly to initialize the embedding.
requires_grad (bool) – Whether the embedding parameter is updated by gradients during training. Default: True.
dropout (float) – Dropout applied to the embedding output. Default: 0.0.
Examples
>>> vocab = Vocab.from_list(['default', 'one', 'two', 'three'])
>>> init_embed = Tensor(np.zeros((4, 4)).astype(np.float32))
>>> fasttext_embed = Fasttext(vocab, init_embed)
>>> ids = Tensor([1, 2, 3])
>>> output = fasttext_embed(ids)
- construct(ids)[source]
- Parameters
ids (Tensor) – Ids to query.
- Returns
Tensor, the embedding vectors for the queried ids.
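To make the lookup concrete, here is a minimal NumPy sketch of what an embedding query does conceptually: each id selects one row of the embedding matrix. It reuses the shapes from the example above but does not call MindNLP; `lookup` is a hypothetical helper name, and the matrix is filled with distinct values (rather than zeros) only so the selected rows are recognizable.

```python
import numpy as np

def lookup(table: np.ndarray, ids: np.ndarray) -> np.ndarray:
    """Return one row of `table` per id (hypothetical helper, not the MindNLP API)."""
    return table[ids]

# 4 tokens with 4-dimensional vectors, matching the shapes in the example above.
table = np.arange(16, dtype=np.float32).reshape(4, 4)
ids = np.array([1, 2, 3])
output = lookup(table, ids)
print(output.shape)
```

The result has one vector per queried id, so three ids against a 4-column matrix give a 3 x 4 output.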
- dims = [300]
- classmethod from_pretrained(name='1M', dims=300, root='/home/docs/.mindnlp', special_tokens=('<pad>', '<unk>'), special_first=True, **kwargs)[source]
Creates Embedding instance from given pre-trained word vector.
- Parameters
name (str) – The name of the pretrained vector. Default: “1M”.
dims (int) – The dimension of the pretrained vector. Default: 300.
root (str) – Default storage directory. Default: DEFAULT_ROOT.
special_tokens (tuple<str,str>) – Special tokens to add to the vocabulary. Default: (“<pad>”, “<unk>”).
special_first (bool) – Whether the tokens in special_tokens are placed at the beginning of the vocabulary. If True, they are added at the beginning; otherwise they are added at the end. Default: True.
kwargs (dict) –
requires_grad (bool): Whether the embedding parameter is updated by gradients during training.
dropout (float): Dropout applied to the embedding output.
- Returns
Fasttext, an embedding instance built from the pretrained word vectors.
Vocab, the vocabulary extracted from the file.
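The effect of special_tokens and special_first can be sketched without MindNLP. The snippet below parses a tiny in-memory sample in the fastText `.vec` text layout (a header line with the word count and dimension, then one token and its values per line) and places the special tokens at the beginning or end. `read_vec` is a hypothetical helper, and giving the special tokens zero vectors is an assumption; the actual loader may initialize them differently.

```python
import io
import numpy as np

def read_vec(stream, special_tokens=("<pad>", "<unk>"), special_first=True):
    """Parse fastText-style text vectors (hypothetical helper, not the MindNLP loader)."""
    count, dim = map(int, stream.readline().split())  # header: "<count> <dim>"
    tokens, rows = [], []
    for _ in range(count):
        parts = stream.readline().rstrip().split(" ")
        tokens.append(parts[0])
        rows.append(np.array([float(v) for v in parts[1:]], dtype=np.float32))
    # Assumption: special tokens receive zero vectors.
    specials = list(special_tokens)
    special_rows = [np.zeros(dim, dtype=np.float32) for _ in specials]
    if special_first:
        tokens, rows = specials + tokens, special_rows + rows
    else:
        tokens, rows = tokens + specials, rows + special_rows
    return tokens, np.stack(rows)

sample = "2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n"
tokens, matrix = read_vec(io.StringIO(sample))
tokens_last, _ = read_vec(io.StringIO(sample), special_first=False)
print(tokens, matrix.shape)
```

With special_first=True the ids 0 and 1 belong to the special tokens, so every pretrained word is shifted by two positions.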
- classmethod load(foldername=None, root='/home/docs/.mindnlp', load_npy=False, vocab=None, npy_path=None)[source]
Load embedding from the specified location.
- Parameters
foldername (str) – Name of the folder to load. Default: None.
root (Path) – Path of the embedding folder. Default: DEFAULT_ROOT.
load_npy (bool) – Whether to initialize the embedding from an npy file. vocab and npy_path are used only when load_npy is True. Default: False.
vocab (Vocab) – Vocabulary to use when initializing from an npy file. Default: None.
npy_path (Path) – Location of the npy file. Default: None.
- Returns
None
- save(foldername, root='/home/docs/.mindnlp')[source]
Save the embedding to the specified location.
- Parameters
foldername (str) – Name of the folder to store.
root (Path) – Path of the embedding folder. Default: DEFAULT_ROOT.
- Returns
None
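As a rough illustration of a save and load round trip, the sketch below writes an embedding matrix to `<root>/<foldername>/` as an npy file and reads it back. It uses only NumPy and a temporary directory; the helper names and the file name `embedding.npy` are assumptions for illustration, not the layout MindNLP actually uses.

```python
import tempfile
from pathlib import Path

import numpy as np

def save_table(table, foldername, root):
    """Write `table` to <root>/<foldername>/embedding.npy (file name is an assumption)."""
    folder = Path(root) / foldername
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "embedding.npy"
    np.save(path, table)
    return path

def load_table(foldername, root):
    """Read the matrix written by save_table."""
    return np.load(Path(root) / foldername / "embedding.npy")

with tempfile.TemporaryDirectory() as root:
    table = np.random.rand(4, 4).astype(np.float32)
    save_table(table, "demo", root)
    restored = load_table("demo", root)

print(np.array_equal(table, restored))
```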
- urls = {'1M': 'https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip', '1M-subword': 'https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.vec.zip'}
glove_embedding
- class mindnlp.modules.embeddings.glove_embedding.Glove(vocab: Vocab, init_embed, requires_grad: bool = True, dropout=0.0)[source]
Bases: TokenEmbedding
Embedding layer.
- Parameters
vocab (Vocab) – Vocabulary used to initialize the embedding.
init_embed (Tensor) – Tensor whose values are used directly to initialize the embedding.
requires_grad (bool) – Whether the embedding parameter is updated by gradients during training. Default: True.
dropout (float) – Dropout applied to the embedding output. Default: 0.0.
Examples
>>> vocab = Vocab.from_list(['default', 'one', 'two', 'three'])
>>> init_embed = Tensor(np.zeros((4, 4)).astype(np.float32))
>>> glove_embed = Glove(vocab, init_embed)
>>> ids = Tensor([1, 2, 3])
>>> output = glove_embed(ids)
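The dropout parameter acts on the vectors returned by the lookup. The NumPy sketch below shows the common "inverted dropout" convention, where entries are zeroed with probability `rate` and the survivors are rescaled. Treating the parameter as a drop probability and using this exact scheme are assumptions; the source only says that dropout is applied to the embedding output.

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero entries with probability `rate`, rescale the rest."""
    if rate == 0.0:
        return x  # nothing is dropped, the input is returned as-is
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
vectors = np.ones((3, 4), dtype=np.float32)  # stand-in for looked-up embeddings
unchanged = dropout(vectors, 0.0, rng)
dropped = dropout(vectors, 0.5, rng)
print(dropped)
```

With a rate of 0.0, the documented default in the signature, the output is identical to the input.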
- construct(ids)[source]
- Parameters
ids (Tensor) – Ids to query.
- Returns
Tensor, the embedding vectors for the queried ids.
- dims = [50, 100, 200, 300]
- classmethod from_pretrained(name='6B', dims=300, root='/home/docs/.mindnlp', special_tokens=('<pad>', '<unk>'), special_first=True, **kwargs)[source]
Creates Embedding instance from given pre-trained word vector.
- Parameters
name (str) – The name of the pretrained vector. Default: ‘6B’.
dims (int) – The dimension of the pretrained vector. Default: 300.
root (str) – Default storage directory. Default: DEFAULT_ROOT.
special_tokens (tuple<str,str>) – Special tokens to add to the vocabulary. Default: (“<pad>”, “<unk>”).
special_first (bool) – Whether the tokens in special_tokens are placed at the beginning of the vocabulary. If True, they are added at the beginning; otherwise they are added at the end. Default: True.
kwargs (dict) –
requires_grad (bool): Whether the embedding parameter is updated by gradients during training.
dropout (float): Dropout applied to the embedding output.
- Returns
Glove, an embedding instance built from the pretrained word vectors.
Vocab, the vocabulary extracted from the file.
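The interaction between name and dims can be sketched as follows. GloVe archives contain plain-text files named after the corpus and dimension, with one token and its values per line and no header line. Both helpers below are hypothetical, and rejecting unsupported dimensions with a ValueError is an assumption about reasonable behavior, not a statement about what MindNLP does.

```python
SUPPORTED_DIMS = [50, 100, 200, 300]  # the `dims` attribute documented for Glove

def glove_member_name(name: str, dims: int) -> str:
    """Hypothetical helper: text file expected inside a GloVe archive."""
    if dims not in SUPPORTED_DIMS:
        raise ValueError(f"dims must be one of {SUPPORTED_DIMS}, got {dims}")
    return f"glove.{name}.{dims}d.txt"

def parse_glove_line(line: str):
    """Split one 'token v1 v2 ...' line into a token and a list of floats."""
    token, *values = line.rstrip().split(" ")
    return token, [float(v) for v in values]

member = glove_member_name("6B", 300)
token, vector = parse_glove_line("the 0.25 -0.5 1.0\n")
print(member, token, vector)
```

Not every archive ships every dimension, so a real loader would also need to check that the chosen file exists in the downloaded archive.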
- classmethod load(foldername=None, root='/home/docs/.mindnlp', load_npy=False, vocab=None, npy_path=None)[source]
Load embedding from the specified location.
- Parameters
foldername (str) – Name of the folder to load. Default: None.
root (Path) – Path of the embedding folder. Default: DEFAULT_ROOT.
load_npy (bool) – Whether to initialize the embedding from an npy file. vocab and npy_path are used only when load_npy is True. Default: False.
vocab (Vocab) – Vocabulary to use when initializing from an npy file. Default: None.
npy_path (Path) – Location of the npy file. Default: None.
- Returns
None
- save(foldername, root='/home/docs/.mindnlp')[source]
Save the embedding to the specified location.
- Parameters
foldername (str) – Name of the folder to store.
root (Path) – Path of the embedding folder. Default: DEFAULT_ROOT.
- Returns
None
- urls = {'42B': 'http://nlp.stanford.edu/data/glove.42B.300d.zip', '6B': 'http://nlp.stanford.edu/data/glove.6B.zip', '840B': 'http://nlp.stanford.edu/data/glove.840B.300d.zip', 'twitter.27B': 'http://nlp.stanford.edu/data/glove.twitter.27B.zip'}
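A small sketch of how a name could be resolved through this mapping: look up the URL and derive the archive file name from its path. The dictionary is copied from the attribute above; `archive_name` is a hypothetical helper and nothing is downloaded.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Copied from the `urls` attribute documented above.
urls = {
    "42B": "http://nlp.stanford.edu/data/glove.42B.300d.zip",
    "6B": "http://nlp.stanford.edu/data/glove.6B.zip",
    "840B": "http://nlp.stanford.edu/data/glove.840B.300d.zip",
    "twitter.27B": "http://nlp.stanford.edu/data/glove.twitter.27B.zip",
}

def archive_name(name: str) -> str:
    """Return the file name at the end of the URL registered for `name`."""
    return PurePosixPath(urlparse(urls[name]).path).name

print(archive_name("6B"))
```

An unknown name raises KeyError here, which makes a misspelled vector name fail before any network access.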
word2vec_embedding
Word2vec_embedding
- class mindnlp.modules.embeddings.word2vec_embedding.Word2vec(vocab: Vocab, init_embed, requires_grad: bool = True, dropout=0.0)[source]
Bases: TokenEmbedding
Embedding layer.
- Parameters
vocab (Vocab) – Vocabulary used to initialize the embedding.
init_embed (Tensor) – Tensor whose values are used directly to initialize the embedding.
requires_grad (bool) – Whether the embedding parameter is updated by gradients during training. Default: True.
dropout (float) – Dropout applied to the embedding output. Default: 0.0.
Examples
>>> vocab = Vocab.from_list(['default', 'one', 'two', 'three'])
>>> init_embed = Tensor(np.zeros((4, 4)).astype(np.float32))
>>> word2vec_embed = Word2vec(vocab, init_embed)
>>> ids = Tensor([1, 2, 3])
>>> output = word2vec_embed(ids)
- construct(ids)[source]
- Parameters
ids (Tensor) – Ids to query.
- Returns
Tensor, the embedding vectors for the queried ids.
- dims = [300]
- classmethod from_pretrained(name='google-news', dims=300, root='/home/docs/.mindnlp', special_tokens=('<pad>', '<unk>'), special_first=True, use_gensim=True, **kwargs)[source]
Creates Embedding instance from given pre-trained word vector.
- Parameters
name (str) – The name of the pretrained vector. Default: ‘google-news’.
dims (int) – The dimension of the pretrained vector. Default: 300.
root (str) – Default storage directory. Default: DEFAULT_ROOT.
special_tokens (tuple<str,str>) – Special tokens to add to the vocabulary. Default: (“<pad>”, “<unk>”).
special_first (bool) – Whether the tokens in special_tokens are placed at the beginning of the vocabulary. If True, they are added at the beginning; otherwise they are added at the end. Default: True.
use_gensim (bool) – Whether to load the word vectors with the gensim library. Default: True.
kwargs (dict) –
requires_grad (bool): Whether the embedding parameter is updated by gradients during training.
dropout (float): Dropout applied to the embedding output.
- Returns
Word2vec, an embedding instance built from the pretrained word vectors.
Vocab, the vocabulary extracted from the file.
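For readers curious what loading word2vec vectors without gensim involves, the sketch below writes and reads the binary word2vec layout as it is commonly described: a text header with the word count and dimension, then each word followed by a space and its raw float32 values. Both functions are hypothetical and operate on an in-memory buffer; how MindNLP reads the file when use_gensim is False is not stated in the source.

```python
import io
import numpy as np

def write_word2vec_binary(stream, tokens, matrix):
    """Write a header line, then '<word> ' + little-endian float32 values per entry."""
    stream.write(f"{len(tokens)} {matrix.shape[1]}\n".encode("utf-8"))
    for token, row in zip(tokens, matrix):
        stream.write(token.encode("utf-8") + b" ")
        stream.write(row.astype("<f4").tobytes())
        stream.write(b"\n")  # some writers emit this newline, some do not

def read_word2vec_binary(stream):
    """Read the layout produced above (hypothetical helper, not the MindNLP loader)."""
    count, dim = map(int, stream.readline().split())
    tokens, rows = [], []
    for _ in range(count):
        word = bytearray()
        while (ch := stream.read(1)) != b" ":
            if ch == b"":
                raise EOFError("unexpected end of stream")
            if ch != b"\n":  # skip a newline left over from the previous entry
                word += ch
        tokens.append(word.decode("utf-8"))
        rows.append(np.frombuffer(stream.read(4 * dim), dtype="<f4"))
    return tokens, np.stack(rows)

buffer = io.BytesIO()
source = np.array([[0.5, 1.5], [2.5, 3.5]], dtype=np.float32)
write_word2vec_binary(buffer, ["king", "queen"], source)
buffer.seek(0)
tokens, matrix = read_word2vec_binary(buffer)
print(tokens, matrix.shape)
```

In practice gensim handles this format, including edge cases such as unusual encodings, which is a good reason to leave use_gensim at its default.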
- classmethod load(foldername=None, root='/home/docs/.mindnlp', load_npy=False, vocab=None, npy_path=None)[source]
Load embedding from the specified location.
- Parameters
foldername (str) – Name of the folder to load. Default: None.
root (Path) – Path of the embedding folder. Default: DEFAULT_ROOT.
load_npy (bool) – Whether to initialize the embedding from an npy file. vocab and npy_path are used only when load_npy is True. Default: False.
vocab (Vocab) – Vocabulary to use when initializing from an npy file. Default: None.
npy_path (Path) – Location of the npy file. Default: None.
- Returns
None
- save(foldername, root='/home/docs/.mindnlp')[source]
Save the embedding to the specified location.
- Parameters
foldername (str) – Name of the folder to store.
root (Path) – Path of the embedding folder. Default: DEFAULT_ROOT.
- Returns
None
- urls = {'google-news': 'https://github.com/RaRe-Technologies/gensim-data/releases/download/word2vec-google-news-300/word2vec-google-news-300.gz'}
Embedding class