92609e70-f79c-46b5-8419-55726e873cfc

Last updated 9 months ago

Generated from the 140,000 most starred projects on GitHub in October 2016. Produced by the legacy pipeline, without token splitting or stemming, and later converted with some loss of quality.

Example:

from sourced.ml.models import Id2Vec

# Load the identifier embeddings model by its UUID.
id2vec = Id2Vec().load("92609e70-f79c-46b5-8419-55726e873cfc")
print("Number of tokens:", len(id2vec))
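
The File field in the metadata below is a percent-encoded URL (`%2F` is an encoded `/`). As a minimal sketch, the snippet below rebuilds that URL from the model ID using only the standard library. The helper name `model_file_url` and the `models/<kind>/<uuid>.asdf` layout are assumptions inferred from the single URL on this page, not a documented API.

```python
from urllib.parse import quote, unquote

# Assumption: host and bucket prefix are copied from the File field on this page.
BASE_URL = "https://storage.googleapis.com/models.cdn.sourced.tech/"
MODEL_ID = "92609e70-f79c-46b5-8419-55726e873cfc"


def model_file_url(model_id, kind="id2vec"):
    """Hypothetical helper: build the download URL for a model file.

    The object name layout "models/<kind>/<uuid>.asdf" is inferred from
    the File field and may not hold for other models.
    """
    object_name = "models/{}/{}.asdf".format(kind, model_id)
    # safe="" makes quote() encode "/" as "%2F", matching the File field.
    return BASE_URL + quote(object_name, safe="")


url = model_file_url(MODEL_ID)
print(url)           # percent-encoded form, as listed under File
print(unquote(url))  # human-readable form with plain slashes
```

The file is about 1.1 GB, so it is worth checking available disk space before downloading it directly.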

ID

92609e70-f79c-46b5-8419-55726e873cfc

Uploaded

2017-06-18 17:37:06.255615

Version

1.0.0

File

https://storage.googleapis.com/models.cdn.sourced.tech/models%2Fid2vec%2F92609e70-f79c-46b5-8419-55726e873cfc.asdf

Size

1.1 GB

Data collection date

October 2016

Number of (sub)tokens

5,720,096

Number of repositories

112,273

License