id2vec


Just under 1M identifier embeddings, generated for identifiers extracted from half of the Public Git Archive (PGA) in June 2018. A new pipeline was used that splits and stems identifiers; the full description can be found in the "Algorithms" section of the sourced.ml repository.
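To give a rough idea of what the splitting step does, here is a minimal sketch that breaks an identifier into lowercase sub-tokens on underscores, digits, and camelCase boundaries. It is an illustration only, not the sourced.ml implementation: the function name `split_identifier` and the exact splitting rules are assumptions, and stemming is left out because it needs a stemmer library.

```python
import re


def split_identifier(identifier):
    """Split an identifier into lowercase sub-tokens (hypothetical helper)."""
    # First break on any run of non-letters: underscores, digits, punctuation.
    chunks = re.split(r"[^A-Za-z]+", identifier)
    parts = []
    for chunk in chunks:
        # Then split camelCase / PascalCase, keeping acronyms such as "HTTP" whole.
        parts.extend(
            re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+", chunk)
        )
    return [part.lower() for part in parts if part]


print(split_identifier("getHTTPResponse_code"))
# ['get', 'http', 'response', 'code']
```

Under this scheme, identifiers that share sub-tokens (for example `read_file` and `readFile`) map to the same tokens, which is why the vocabulary stays below 1M entries even for a large corpus.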

Example:

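from sourced.ml.models import Id2Vec

# Load the model by its ID (see "ID" under References).
id2vec = Id2Vec().load("3467e9ca-ec11-444a-ba27-9fa55f5ee6c1")
# The length of the model is the number of tokens it holds.
print("Number of tokens:", len(id2vec))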
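Once the embeddings are loaded, a typical use is to look for tokens with similar vectors. The sketch below shows the idea with cosine similarity on a tiny made-up vocabulary. The vectors in `toy_vectors` are invented and have 3 dimensions rather than 300, and the helpers `cosine_similarity` and `nearest` are hypothetical. How to read the token-to-vector mapping out of the `Id2Vec` object is not covered here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, vectors, k=2):
    """Return the k tokens whose vectors are most similar to the query token."""
    scored = [
        (cosine_similarity(vectors[query], vec), token)
        for token, vec in vectors.items()
        if token != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [token for _, token in scored[:k]]


# Invented 3-dimensional vectors, for illustration only.
toy_vectors = {
    "read": [0.9, 0.1, 0.0],
    "write": [0.8, 0.2, 0.1],
    "color": [0.0, 0.1, 0.9],
}

print(nearest("read", toy_vectors, k=1))
# ['write']
```

With the real model, a brute-force scan like this would compare the query against every token, so a vectorized or indexed search is likely a better fit at that scale.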

References

ID

3467e9ca-ec11-444a-ba27-9fa55f5ee6c1

Uploaded

2018-07-19 13:14:53.000621

Version

1.0.0

File

https://storage.googleapis.com/models.cdn.sourced.tech/models%2Fid2vec%2F3467e9ca-ec11-444a-ba27-9fa55f5ee6c1.asdf

Size

1.2 GB

Data collection date

June 2018

Number of tokens

999,424

Size of each embedding

300

License