Schema

You can execute the SHOW TABLES statement to get a list of the available tables. To get all the columns and types of a specific table, you can write DESCRIBE TABLE [tablename].

gitbase exposes the following tables:

Main tables

repositories

+---------------+------+
| name          | type |
+---------------+------+
| repository_id | TEXT |
+---------------+------+

This table contains all the repositories in the dataset. repository_id is the path to the repository folder.

For siva files, the id is the path plus the siva file name.

remotes

+----------------------+------+
| name                 | type |
+----------------------+------+
| repository_id        | TEXT |
| remote_name          | TEXT |
| remote_push_url      | TEXT |
| remote_fetch_url     | TEXT |
| remote_push_refspec  | TEXT |
| remote_fetch_refspec | TEXT |
+----------------------+------+

This table returns all the remotes configured in the git config file of each repository.

refs

+---------------+------+
| name          | type |
+---------------+------+
| repository_id | TEXT |
| ref_name      | TEXT |
| commit_hash   | TEXT |
+---------------+------+

This table contains all the git hash references and the symbolic reference HEAD from all the repositories.
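
As an illustration of querying refs, here is a minimal sketch that looks up the commit each repository's HEAD points to. It assumes an in-memory SQLite database as a stand-in for gitbase, so it runs without a server; the repository ids and hashes are hypothetical sample data. The SELECT statement is the part meant to carry over to gitbase, and it has not been run against a real gitbase instance.

```python
import sqlite3

# In-memory SQLite stands in for gitbase here; only the column names
# come from the refs table above. The rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refs (repository_id TEXT, ref_name TEXT, commit_hash TEXT)")
conn.executemany(
    "INSERT INTO refs VALUES (?, ?, ?)",
    [
        ("repo_a", "HEAD", "aaa111"),
        ("repo_a", "refs/heads/master", "aaa111"),
        ("repo_a", "refs/heads/dev", "bbb222"),
        ("repo_b", "HEAD", "ccc333"),
    ],
)

# The commit that HEAD points to in each repository.
head_query = """
    SELECT repository_id, commit_hash
    FROM refs
    WHERE ref_name = 'HEAD'
    ORDER BY repository_id
"""
heads = conn.execute(head_query).fetchall()
print(heads)
```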

commits

+---------------------+-----------+
| name                | type      |
+---------------------+-----------+
| repository_id       | TEXT      |
| commit_hash         | TEXT      |
| commit_author_name  | TEXT      |
| commit_author_email | TEXT      |
| commit_author_when  | TIMESTAMP |
| committer_name      | TEXT      |
| committer_email     | TEXT      |
| committer_when      | TIMESTAMP |
| commit_message      | TEXT      |
| tree_hash           | TEXT      |
| commit_parents      | JSON      |
+---------------------+-----------+

The commits table contains all the commits from all the references of all the repositories, with no duplicates within a repository. The same commit can exist in several repositories; in that case it appears once per repository.

Note that this table shows not only the commits reachable from HEAD but all the commits in the repository, which can be many more than those on the HEAD reference.
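
The once-per-repository behaviour matters when counting. The following sketch, again using in-memory SQLite as a stand-in for gitbase with hypothetical rows and only a subset of the commits columns, contrasts the raw row count with the number of distinct commits and lists the hashes shared by more than one repository.

```python
import sqlite3

# SQLite stands in for gitbase; only three of the commits columns are
# modelled and every row is hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE commits (repository_id TEXT, commit_hash TEXT, commit_author_email TEXT)"
)
conn.executemany(
    "INSERT INTO commits VALUES (?, ?, ?)",
    [
        ("repo_a", "aaa111", "ann@example.com"),
        ("repo_a", "bbb222", "bob@example.com"),
        ("repo_fork", "aaa111", "ann@example.com"),  # same commit, second repository
    ],
)

# Counting rows counts a shared commit once per repository.
total_rows = conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]

# Counting distinct hashes counts each commit once.
unique_commits = conn.execute(
    "SELECT COUNT(DISTINCT commit_hash) FROM commits"
).fetchone()[0]

# Commits that appear in more than one repository.
shared = conn.execute(
    """
    SELECT commit_hash, COUNT(*) AS repo_count
    FROM commits
    GROUP BY commit_hash
    HAVING COUNT(*) > 1
    """
).fetchall()

print(total_rows, unique_commits, shared)
```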

blobs

+---------------+-------+
| name          | type  |
+---------------+-------+
| repository_id | TEXT  |
| blob_hash     | TEXT  |
| blob_size     | INT64 |
| blob_content  | BLOB  |
+---------------+-------+

This table exposes blob objects, which are the contents of files without their paths.

Note that this table returns every blob from every commit in every repository, which is potentially a lot of data. In most cases you will want to filter by commit, by reference, or by repository.
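
A filtered query might look like the sketch below. It assumes in-memory SQLite as a stand-in for gitbase, with hypothetical rows and a hypothetical 1024-byte size threshold; it restricts the result to one repository and selects only blob metadata, leaving blob_content out of the column list.

```python
import sqlite3

# SQLite stands in for gitbase; column names follow the blobs table
# above (INT64 is modelled as INTEGER) and the rows are sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blobs (repository_id TEXT, blob_hash TEXT, "
    "blob_size INTEGER, blob_content BLOB)"
)
conn.executemany(
    "INSERT INTO blobs VALUES (?, ?, ?, ?)",
    [
        ("repo_a", "b1", 12, b"hello world\n"),
        ("repo_a", "b2", 2048, b"x" * 2048),
        ("repo_b", "b3", 5, b"abcd\n"),
    ],
)

# Filter by repository and size; blob_content is not selected, so only
# the hash and size of each matching blob are returned.
small_blobs = conn.execute(
    """
    SELECT blob_hash, blob_size
    FROM blobs
    WHERE repository_id = ? AND blob_size < ?
    ORDER BY blob_hash
    """,
    ("repo_a", 1024),
).fetchall()
print(small_blobs)
```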

tree_entries

+-----------------+------+
| name            | type |
+-----------------+------+
| repository_id   | TEXT |
| tree_entry_name | TEXT |
| blob_hash       | TEXT |
| tree_hash       | TEXT |
| tree_entry_mode | TEXT |
+-----------------+------+

The tree_entries table contains the entries of all the tree objects from all the repositories.

files

+-----------------+-------+
| name            | type  |
+-----------------+-------+
| repository_id   | TEXT  |
| file_path       | TEXT  |
| blob_hash       | TEXT  |
| tree_hash       | TEXT  |
| tree_entry_mode | TEXT  |
| blob_content    | BLOB  |
| blob_size       | INT64 |
+-----------------+-------+

files is a utility table that combines tree_entries and blobs to build files. It includes the file path.

Queries on this table are expensive and should be written carefully, either by applying filters or by using the blobs or tree_entries tables directly.

Relation tables

commit_blobs

+---------------+------+
| name          | type |
+---------------+------+
| repository_id | TEXT |
| commit_hash   | TEXT |
| blob_hash     | TEXT |
+---------------+------+

This table represents the relation between commits and blobs. With this table you can obtain all the blobs contained in a commit object.
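
One way to use this relation is to join it with blobs to fetch the blobs of a single commit. The sketch below assumes in-memory SQLite as a stand-in for gitbase, hypothetical rows, and a blobs table reduced to three columns; the join keys on both repository_id and blob_hash so that rows from different repositories are not mixed.

```python
import sqlite3

# SQLite stands in for gitbase; the rows are hypothetical and blobs is
# reduced to the columns this example needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commit_blobs (repository_id TEXT, commit_hash TEXT, blob_hash TEXT)")
conn.execute("CREATE TABLE blobs (repository_id TEXT, blob_hash TEXT, blob_size INTEGER)")
conn.executemany(
    "INSERT INTO commit_blobs VALUES (?, ?, ?)",
    [
        ("repo_a", "c1", "b1"),
        ("repo_a", "c1", "b2"),
        ("repo_a", "c2", "b2"),
        ("repo_a", "c2", "b3"),
    ],
)
conn.executemany(
    "INSERT INTO blobs VALUES (?, ?, ?)",
    [("repo_a", "b1", 10), ("repo_a", "b2", 20), ("repo_a", "b3", 30)],
)

# Blobs contained in commit c2, with their sizes.
blobs_in_commit = conn.execute(
    """
    SELECT cb.blob_hash, b.blob_size
    FROM commit_blobs AS cb
    JOIN blobs AS b
      ON b.repository_id = cb.repository_id
     AND b.blob_hash = cb.blob_hash
    WHERE cb.repository_id = ? AND cb.commit_hash = ?
    ORDER BY cb.blob_hash
    """,
    ("repo_a", "c2"),
).fetchall()
print(blobs_in_commit)
```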

commit_trees

+---------------+------+
| name          | type |
+---------------+------+
| repository_id | TEXT |
| commit_hash   | TEXT |
| tree_hash     | TEXT |
+---------------+------+

This table represents the relation between commits and trees. With this table you can obtain all the tree entries contained in a commit object.

commit_files

+---------------+------+
| name          | type |
+---------------+------+
| repository_id | TEXT |
| commit_hash   | TEXT |
| file_path     | TEXT |
| blob_hash     | TEXT |
| tree_hash     | TEXT |
+---------------+------+

This table represents the relation between commits and files. Using this table, you can obtain all the files related to a certain commit object.
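
The sketch below shows two possible uses of this relation: listing the files of one commit, and comparing blob hashes between two commits to find paths that were added or modified. It assumes in-memory SQLite as a stand-in for gitbase; the commits c1 and c2, the paths, and the hashes are hypothetical.

```python
import sqlite3

# SQLite stands in for gitbase; column names follow the commit_files
# table above and every row is hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE commit_files (repository_id TEXT, commit_hash TEXT, "
    "file_path TEXT, blob_hash TEXT, tree_hash TEXT)"
)
conn.executemany(
    "INSERT INTO commit_files VALUES (?, ?, ?, ?, ?)",
    [
        ("repo_a", "c1", "README.md", "r1", "t1"),
        ("repo_a", "c1", "main.go", "m1", "t1"),
        ("repo_a", "c2", "README.md", "r1", "t2"),
        ("repo_a", "c2", "main.go", "m2", "t2"),
        ("repo_a", "c2", "LICENSE", "l1", "t2"),
    ],
)

# All file paths present in commit c2.
files_in_c2 = [
    path
    for (path,) in conn.execute(
        "SELECT file_path FROM commit_files "
        "WHERE repository_id = 'repo_a' AND commit_hash = 'c2' "
        "ORDER BY file_path"
    )
]

# Paths that are new in c2 or whose blob differs from the one in c1.
changed = [
    path
    for (path,) in conn.execute(
        """
        SELECT cur.file_path
        FROM commit_files AS cur
        LEFT JOIN commit_files AS prev
          ON prev.repository_id = cur.repository_id
         AND prev.file_path = cur.file_path
         AND prev.commit_hash = :old_commit
        WHERE cur.repository_id = 'repo_a'
          AND cur.commit_hash = :new_commit
          AND (prev.blob_hash IS NULL OR prev.blob_hash <> cur.blob_hash)
        ORDER BY cur.file_path
        """,
        {"old_commit": "c1", "new_commit": "c2"},
    )
]
print(files_in_c2, changed)
```

Comparing blob_hash values avoids reading any file content: two paths with the same blob hash hold identical content.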

ref_commits

+---------------+-------+
| name          | type  |
+---------------+-------+
| repository_id | TEXT  |
| commit_hash   | TEXT  |
| ref_name      | TEXT  |
| history_index | INT64 |
+---------------+-------+

This table allows us to get the commit history of a specific reference name. The history_index column represents the position of the commit in the history of that reference.

This table is like the log of a specific reference.

Commits will be repeated if they are in several repositories or references.
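
A log-style query can be sketched as follows, using in-memory SQLite as a stand-in for gitbase with hypothetical rows. The sample data assumes that history_index 0 is the commit the reference points to and that the index grows towards older commits; that numbering is an assumption of this example. The second query shows a commit being repeated because it belongs to two references.

```python
import sqlite3

# SQLite stands in for gitbase; rows are hypothetical, commits is
# reduced to three columns, and history_index 0 is ASSUMED to be the
# tip of the reference.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ref_commits (repository_id TEXT, commit_hash TEXT, "
    "ref_name TEXT, history_index INTEGER)"
)
conn.execute("CREATE TABLE commits (repository_id TEXT, commit_hash TEXT, commit_message TEXT)")
conn.executemany(
    "INSERT INTO ref_commits VALUES (?, ?, ?, ?)",
    [
        ("repo_a", "c3", "HEAD", 0),
        ("repo_a", "c2", "HEAD", 1),
        ("repo_a", "c1", "HEAD", 2),
        ("repo_a", "c2", "refs/heads/dev", 0),
        ("repo_a", "c1", "refs/heads/dev", 1),
    ],
)
conn.executemany(
    "INSERT INTO commits VALUES (?, ?, ?)",
    [
        ("repo_a", "c1", "Initial commit"),
        ("repo_a", "c2", "Fix bug"),
        ("repo_a", "c3", "Add tests"),
    ],
)

# History of HEAD in repo_a, ordered by position in the reference.
log = conn.execute(
    """
    SELECT rc.history_index, c.commit_message
    FROM ref_commits AS rc
    JOIN commits AS c
      ON c.repository_id = rc.repository_id
     AND c.commit_hash = rc.commit_hash
    WHERE rc.repository_id = 'repo_a' AND rc.ref_name = 'HEAD'
    ORDER BY rc.history_index
    """
).fetchall()

# Commit c2 is reachable from two references, so it has two rows.
c2_rows = conn.execute(
    "SELECT COUNT(*) FROM ref_commits WHERE commit_hash = 'c2'"
).fetchone()[0]
print(log, c2_rows)
```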

Database diagram

[Figure: gitbase schema diagram]