Commit 2b9cd22

search: create and document git-stats script (sourcegraph#32663)
This is a script we have shared directly with customers before to understand the scale of their monorepos. This commit stores it in our repository and documents it under our monorepo documentation.

Test Plan: ran git-stats on the sourcegraph repo. Note that the links in the documentation will only work once this PR has landed.
1 parent 811d62f commit 2b9cd22

2 files changed: +72 -0 lines changed

dev/git-stats

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+#!/usr/bin/env bash
+
+# This script outputs statistics for the current git repository. This script
+# is used by the search-core team to understand the size and shape of a
+# repository. In particular we use this when understanding the scale of a
+# monorepo to help us guide our work.
+
+set -e
+
+# Do everything from the gitdir. This also makes ls-tree below cover the
+# whole tree rather than only a subtree.
+cd "$(git rev-parse --git-dir)"
+
+# The size of the git store (not the working copy).
+echo "$(du -sh .)" gitdir
+
+# The number of commits reachable from HEAD
+echo "$(git rev-list --count HEAD)" commits
+
+
+# Some awk which extracts statistics on the files in the latest commit.
+echo
+echo HEAD statistics
+git ls-tree -r --long HEAD | awk '
+  BEGIN {
+    base = 10
+    logbase = log(base)
+  }
+  $4 != "-" {
+    if ($4 == 0) {
+      hist[0]++
+    } else {
+      hist[int(log($4) / logbase) + 1]++
+    }
+    total += $4
+    count++
+  }
+  END {
+    printf("%.3fGiB\n%d files\n", total / 1024 / 1024 / 1024, count)
+    printf("histogram:\n")
+    for (x in hist) {
+      printf("%d^%d\t%d\n", base, x, hist[x])
+    }
+  }
+'
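
To make the histogram labels concrete: the awk program above files each blob under bucket `int(log(size) / log(10)) + 1`, with empty files in bucket 0, so a row labelled `10^x` counts files smaller than 10^x bytes (and, for x of 1 or more, at least 10^(x-1) bytes). The following is a minimal sketch of that rule in isolation. The `bucket` helper is hypothetical and is not part of `git-stats`.

```shell
# Hypothetical helper (not part of git-stats): print the histogram bucket
# that the awk program would assign to a file of the given size in bytes.
bucket() {
  awk -v size="$1" 'BEGIN {
    if (size == 0) {
      print 0                               # empty files get their own bucket
    } else {
      print int(log(size) / log(10)) + 1    # number of decimal digits in size
    }
  }'
}

bucket 0      # empty file
bucket 9      # single-digit size
bucket 10     # two-digit size
bucket 5000   # four-digit size
```

Sizes that are exact powers of ten, such as 1000, can be sensitive to floating-point rounding in `log` and may land one bucket lower than expected, so treat the bucket boundaries as approximate.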

doc/admin/monorepo.md

Lines changed: 27 additions & 0 deletions
@@ -24,3 +24,30 @@ Sourcegraph's code search index scales horizontally with the number of files bei
 Sourcegraph clones code from your code host via the usual `git clone` or `git fetch` commands. Some organisations use custom `git` binaries or commands to speed up these operations. Sourcegraph supports using alternative git binaries to allow cloning. This can be done by inheriting from the `gitserver` docker image and installing the custom `git` onto the `$PATH`.
 
 Some monorepos use a custom command for `git fetch` to speed up fetch. Sourcegraph provides the `experimentalFeatures.customGitFetch` site setting to specify the custom command.
+
+## Statistics
+
+You can help the Sourcegraph developers understand the scale of your monorepo by sharing some statistics with the team. The bash script [`git-stats`](https://github.com/sourcegraph/sourcegraph/blob/main/dev/git-stats), when run in your git repository, calculates these statistics.
+
+Example output on the Sourcegraph repository:
+
+``` shellsession
+$ wget https://raw.githubusercontent.com/sourcegraph/sourcegraph/main/dev/git-stats
+$ chmod +x git-stats
+$ ./git-stats
+725M	. gitdir
+19671 commits
+
+HEAD statistics
+0.096GiB
+8638 files
+histogram:
+10^0	6
+10^1	69
+10^2	667
+10^3	2236
+10^4	4589
+10^5	971
+10^6	86
+10^7	14
+```
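
As a quick sanity check when reading shared output, the histogram rows should add up to the reported file count (8638 in the example output above). Below is a small sketch, assuming the output was saved to a file. The `histogram_total` helper and the temporary file are hypothetical and are not part of `git-stats`.

```shell
# Hypothetical helper (not part of git-stats): sum the per-bucket counts
# found in saved git-stats output.
histogram_total() {
  awk '
    /^histogram:/ { in_hist = 1; next }   # bucket rows follow this line
    in_hist       { sum += $2 }           # second column is the file count
    END           { print sum + 0 }
  ' "$1"
}

# Sample rows copied from the example output for the Sourcegraph repository.
stats_file="$(mktemp)"
cat > "$stats_file" <<'EOF'
8638 files
histogram:
10^0 6
10^1 69
10^2 667
10^3 2236
10^4 4589
10^5 971
10^6 86
10^7 14
EOF

histogram_total "$stats_file"   # prints 8638, matching the "files" line
```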
