1e3da42a-28b6-4b33-94a2-a5671f4102f4

Last updated 7 months ago

Bags of identifiers generated from 140,000 most starred projects on GitHub in October 2016 - ~112k after deduplication.

Example:

from sourced.ml.models import BOW
bow = BOW().load("1e3da42a-28b6-4b33-94a2-a5671f4102f4")
print("Number of documents:", len(bow))
print("Number of tokens:", len(bow.tokens))

References

ID

1e3da42a-28b6-4b33-94a2-a5671f4102f4

Uploaded

2017-06-19 09:16:08.942880

Version

1.0.0

File

https://storage.googleapis.com/models.cdn.sourced.tech/models%2Fbow%2F1e3da42a-28b6-4b33-94a2-a5671f4102f4.asdf

Size

380.8 MB

Data collection date

October 2016

Number of (sub)tokens

999,424

Number of repositories

112,273

License

Dependencies