Commit 1e3ca76

Merge branch 'master' into feature/uast-imports
2 parents 583341e + 69b10c9 commit 1e3ca76

9 files changed: +51 -57 lines changed

.travis.yml

Lines changed: 1 addition & 17 deletions
@@ -4,28 +4,14 @@ go_import_path: github.com/src-d/gitbase
  go: 1.12.x

  env:
- - GO111MODULE=on
+ - GO111MODULE=on GOPROXY=https://proxy.golang.org

  matrix:
  fast_finish: true
- addons:
- apt:
- sources:
- - ubuntu-toolchain-r-test
- packages:
- - gcc-6
- - g++-6
- - libonig-dev
-
-
- before_install:
- - sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-6 90
- - sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 90

  before_script:
  - docker run -d --name bblfshd --privileged -p 9432:9432 bblfsh/bblfshd:v2.14.0-drivers
  - docker exec -it bblfshd bblfshctl driver list
- - go get -v github.com/go-sql-driver/mysql/...

  script:
  - make test-coverage codecov
@@ -61,8 +47,6 @@ jobs:
  - echo "skipping before_script for macOS"

  script:
- - brew update
- - brew install oniguruma
  - make packages || echo "" # will fail because of docker being missing
  - if [ ! -f "build/gitbase_darwin_amd64/gitbase" ]; then echo "gitbase binary not generated" && exit 1; fi
  - cd build

Dockerfile

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ WORKDIR $GITBASE_PATH
  ENV GO_BUILD_ARGS="-o /bin/gitbase"
  ENV GO_BUILD_PATH="./cmd/gitbase"
  ENV GO111MODULE=on
+ ENV GOPROXY=https://proxy.golang.org

  RUN make static-build

Jenkinsfile

Lines changed: 5 additions & 2 deletions
@@ -7,7 +7,7 @@ pipeline {
  nodeSelector 'srcd.host/type=jenkins-worker'
  containerTemplate {
  name 'regression-gitbase'
- image 'srcd/regression-gitbase:v0.2.1'
+ image 'srcd/regression-gitbase:v0.3.1'
  ttyEnabled true
  command 'cat'
  }
@@ -17,13 +17,16 @@ pipeline {
  GOPATH = "/go"
  GO_IMPORT_PATH = "github.com/src-d/regression-gibase"
  GO_IMPORT_FULL_PATH = "${env.GOPATH}/src/${env.GO_IMPORT_PATH}"
+ GO111MODULE = "on"
+ PROM_ADDRESS = "http://prom-pushgateway-prometheus-pushgateway.monitoring.svc.cluster.local:9091"
+ PROM_JOB = "gitbase_perfomance"
  }
  triggers { pollSCM('0 0,12 * * *') }
  stages {
  stage('Run') {
  when { branch 'master' }
  steps {
- sh '/bin/regression --complexity=2 --csv local:HEAD'
+ sh '/bin/regression --complexity=2 --csv --prom local:HEAD'
  }
  }
  stage('PR-run') {

docs/using-gitbase/examples.md

Lines changed: 5 additions & 5 deletions
@@ -41,7 +41,7 @@ HAVING num > 1;
  ## Get the number of blobs per HEAD commit

  ```sql
- SELECT COUNT(commit_blob),
+ SELECT COUNT(blob_hash),
  commit_hash
  FROM ref_commits
  NATURAL JOIN commits
@@ -137,7 +137,7 @@ CREATE INDEX files_lang_idx ON files USING pilosa (language(file_path, blob_cont
  DROP INDEX files_lang_idx ON files;
  ```

- ## Calculating code line changes in the last commit
+ ## Calculating code line changes in the last commit

  This query will report how many lines of actual code (only code, not comments, blank lines or text) changed in the last commit of each repository.

@@ -166,10 +166,10 @@ The output will be similar to this:
  +-----------------+------------------+--------------------+
  ```

- ## Calculating code line changes for files in the last commit
+ ## Calculating code line changes for files in the last commit

  This query will report how many lines of actual code (only code, not comments, blank lines or text) changed in each file of the last commit of each repository. It's similar to the previous example. `COMMIT_STATS` is an aggregation over the result of `COMMIT_FILE_STATS` so to speak.
- We will only report those files that whose language has been identified.
+ We will only report those files whose language has been identified.

  ```sql
  SELECT
@@ -277,7 +277,7 @@ We'll get the following output:

  From this output, we can obtain some information about our query:
  - It's been running for 36 seconds.
- - It's querying commit_files table and has processed 8 out of 9 partitions.
+ - It's querying `commit_files` table and has processed 8 out of 9 partitions.

  To kill a query that's currently running you can use the value in `Id`. If we were to kill the previous query, we would need to use the following query:

docs/using-gitbase/functions.md

Lines changed: 25 additions & 19 deletions
@@ -6,19 +6,19 @@ To make some common tasks easier for the user, there are some functions to inter

  | Name | Description |
  |:-------------|:-------------------------------------------------------------------------------------------------------------------------------|
- |`commit_stats(repository_id, [from_commit_hash], to_commit_hash) json`|returns the stats between two commits for a repository. If from is empty, it will compare the given `to_commit_hash` with its parent commit. Vendored files stats are not included in the result of this function. This function is more thoroughly explained later in this document.|
- |`commit_file_stats(repository_id, [from_commit_hash], to_commit_hash) json array`|returns an array with the stats of each file in `to_commit_hash` since the given `from_commit_hash`. If from is not given, the parent commit will be used. Vendored files stats are not included in the result of this function. This function is more thoroughly explained later in this document.|
- |`is_remote(reference_name)bool`| check if the given reference name is from a remote one |
- |`is_tag(reference_name)bool`| check if the given reference name is a tag |
- |`is_vendor(file_path)bool`| check if the given file name is a vendored file |
- |`language(path, [blob])text`| gets the language of a file given its path and the optional content of the file |
- |`uast(blob, [lang, [xpath]]) blob`| returns a node array of UAST nodes in semantic mode |
- |`uast_mode(mode, blob, lang) blob`| returns a node array of UAST nodes specifying its language and mode (semantic, annotated or native) |
- |`uast_xpath(blob, xpath) blob`| performs an XPath query over the given UAST nodes |
- |`uast_extract(blob, key) text array`| extracts information identified by the given key from the uast nodes |
- |`uast_children(blob) blob`| returns a flattened array of the children UAST nodes from each one of the UAST nodes in the given array |
- |`loc(path, blob) json`| returns a JSON map, containing the lines of code of a file, separated in three categories: Code, Blank and Comment lines |
- |`version() text`| returns the gitbase version in the following format `8.0.11-{GITBASE_VERSION}` for compatibility with MySQL versioning |
+ |`commit_stats(repository_id, [from_commit_hash], to_commit_hash) json`|returns the stats between two commits for a repository. If `from_commit_hash` is empty, it will compare the given `to_commit_hash` with its parent commit. Vendored files stats are not included in the result of this function. This function is more thoroughly explained later in this document.|
+ |`commit_file_stats(repository_id, [from_commit_hash], to_commit_hash) json array`|returns an array with the stats of each file in `to_commit_hash` since the given `from_commit_hash`. If `from_commit_hash` is not given, the parent commit will be used. Vendored files stats are not included in the result of this function. This function is more thoroughly explained later in this document.|
+ |`is_remote(reference_name)bool`| checks if the given reference name is from a remote one. |
+ |`is_tag(reference_name)bool`| checks if the given reference name is a tag. |
+ |`is_vendor(file_path)bool`| checks if the given file name is a vendored file. |
+ |`language(path, [blob])text`| gets the language of a file given its path and the optional content of the file. |
+ |`uast(blob, [lang, [xpath]]) blob`| returns a node array of UAST nodes in semantic mode. |
+ |`uast_mode(mode, blob, lang) blob`| returns a node array of UAST nodes specifying its language and mode (semantic, annotated or native). |
+ |`uast_xpath(blob, xpath) blob`| performs an XPath query over the given UAST nodes. |
+ |`uast_extract(blob, key) text array`| extracts information identified by the given key from the uast nodes. |
+ |`uast_children(blob) blob`| returns a flattened array of the children UAST nodes from each one of the UAST nodes in the given array. |
+ |`loc(path, blob) json`| returns a JSON map, containing the lines of code of a file, separated in three categories: Code, Blank and Comment lines. |
+ |`version() text`| returns the gitbase version in the following format `8.0.11-{GITBASE_VERSION}` for compatibility with MySQL versioning. |
  ## Standard functions

  These are all functions that are available because they are implemented in `go-mysql-server`, used by gitbase.
@@ -159,23 +159,29 @@ Check out the [UAST v2 specification](https://docs.sourced.tech/babelfish/uast/u

  Using these selectors as in,

- > uast_extract(nodes_column, @common_selector)
+ ```
+ uast_extract(nodes_column, @common_selector)
+ ```

  you will extract the value of that property for each node.

  Nodes that have no value for the requested property will not be present in any way in the final array. That is, having a sequence of nodes `[node-1, node-2, node-3]` and knowing that node-2 doesn't have a value for the requested property, the returned array will be `[prop-1, prop-3]`.

  Also, if you want to retrieve values from a non common property, you can pass it directly

- > uast_extract(nodes_column, 'some-property')
+ ```
+ uast_extract(nodes_column, 'some-property')
+ ```

  ## How to use `loc`

  `loc` will return statistics about the lines of code in a file, such as the code lines, comment lines, etc.

  It requires a file path and a file content.

- > loc(file_path, blob_content)
+ ```
+ loc(file_path, blob_content)
+ ```

  The result of this function is a JSON document with the following shape:

@@ -266,9 +272,9 @@ FROM (

  It can be used in two ways:
  - To get the statistics of a specific commit `COMMIT_STATS(repository_id, commit_hash)`
- - To get the statistics of a the diff of a commit range `COMMIT_STATS(repository_id, from_commit, to_commit)`
+ - To get the statistics of the diff of a commit range `COMMIT_STATS(repository_id, from_commit, to_commit)`

- `commit_stats` it's pretty much an aggregation of the result of `commit_file_stats`. While `commit_file_stats` has the stats for each file in a commit, `commit_stats` has the global stats of all files in the commit. As a result, it outputs a single structure instead of an array of them.
+ `commit_stats` is pretty much an aggregation of the result of `commit_file_stats`. While `commit_file_stats` has the stats for each file in a commit, `commit_stats` has the global stats of all files in the commit. As a result, it outputs a single structure instead of an array of them.

  The shape of the result returned by this function is the following:

@@ -300,7 +306,7 @@ The shape of the result returned by this function is the following:

  **NOTE:** Files that are considered vendored files are ignored for the purpose of computing these statistics. Note that `.gitignore` is considered a vendored file.

- The result returned by this function is a JSON, which means to access its fields, the use of `JSON_EXTRACT is needed.
+ The result returned by this function is a JSON, which means that to access its fields, the use of `JSON_EXTRACT` is needed.

  For example, code additions would be accessed like this:
  ```sql

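The `uast_extract` rule documented in the functions.md hunk above (nodes without the requested property are simply left out of the result) can be modelled in a few lines. This is only a plain-Python sketch of the documented behaviour, not gitbase's implementation: representing nodes as dictionaries and the helper name `extract_property` are assumptions made for illustration.

```python
from typing import Any


def extract_property(nodes: list[dict[str, Any]], key: str) -> list[Any]:
    """Collect `key` from every node, skipping nodes that have no value for it.

    Hypothetical helper: it mirrors the documented result shape only.
    """
    return [node[key] for node in nodes if node.get(key) is not None]


# node-2 has no value for the requested property, so it leaves no trace
# (not even a placeholder) in the returned array.
nodes = [
    {"id": "node-1", "some-property": "prop-1"},
    {"id": "node-2"},
    {"id": "node-3", "some-property": "prop-3"},
]

extracted = extract_property(nodes, "some-property")
print(extracted)
```

One consequence worth keeping in mind: because missing values are dropped rather than padded, positions in the result do not line up with positions in the input node array.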
docs/using-gitbase/indexes.md

Lines changed: 3 additions & 3 deletions
@@ -7,8 +7,8 @@ Indexes are implemented as bitmaps using [pilosa](https://github.com/pilosa/pilo
  Thus, to create indexes you must specify pilosa as the type of index. You can find some examples in the [examples](./examples.md#create-an-index-for-columns-on-a-table) section about managing indexes.

  Note that you can create an index either **on one or more columns** or **on a single expression**.
- In practice, having multiple indexes - one per column is better and more flexible than one index for multiple columns. It is because of data structures (bitmaps) used to represent index values.
- Even if you have one index on multiple columns, every columns is stored in independent _field_.
+ In practice, having multiple indexes (one per column) is better and more flexible than one index for multiple columns. It is because of data structures (bitmaps) used to represent index values.
+ Even if you have one index on multiple columns, every column is stored in an independent _field_.
  Merging those _fields_ by any logic operations is fast and much more flexible. The main difference of having multiple columns per index is, it internally calculates intersection across columns, so the index won't be used if you use _non_ `AND` operation in a filter, e.g.:

  With index on (`A`, `B`), the index will be used for following query:
@@ -26,4 +26,4 @@ and for the second query also two indexes will be used and the result will be a

  You can find some more examples in the [examples](./examples.md#create-an-index-for-columns-on-a-table) section.

- See [go-mysql-server](https://github.com/src-d/go-mysql-server/tree/541fde3b92093b3a449e803342a7a18c686275e6#indexes) documentation for more details
+ See [go-mysql-server](https://github.com/src-d/go-mysql-server/tree/541fde3b92093b3a449e803342a7a18c686275e6#indexes) documentation for more details.

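The indexes.md text above argues that one bitmap per column is more flexible than a single multi-column index, because per-column bitmaps can be merged with any logical operation while a multi-column index only yields the intersection. A rough sketch of that idea follows. It uses Python integers as bitmaps and has nothing to do with pilosa's actual storage format; the column data and helper names are made up for illustration.

```python
def build_bitmap(column: list[str], value: str) -> int:
    """Return an integer whose bit i is set when row i holds `value`."""
    bitmap = 0
    for row_id, cell in enumerate(column):
        if cell == value:
            bitmap |= 1 << row_id
    return bitmap


def rows_of(bitmap: int) -> list[int]:
    """List the row ids whose bit is set in `bitmap`."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Two hypothetical columns of a four-row table.
col_a = ["x", "y", "x", "y"]
col_b = ["1", "1", "2", "2"]

a_is_x = build_bitmap(col_a, "x")  # rows 0 and 2
b_is_1 = build_bitmap(col_b, "1")  # rows 0 and 1

# A = 'x' AND B = '1': the intersection, which is all a combined index offers.
rows_and = rows_of(a_is_x & b_is_1)

# A = 'x' OR B = '1': a union, only possible when each column has its own bitmap.
rows_or = rows_of(a_is_x | b_is_1)

print(rows_and, rows_or)
```

Both merges are single bitwise operations over the same two bitmaps, which is why keeping the columns in separate indexes costs little and allows filters other than `AND`.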
docs/using-gitbase/optimize-queries.md

Lines changed: 8 additions & 8 deletions
@@ -2,7 +2,7 @@

  Even though in each release performance improvements are included to make gitbase faster, there are some queries that might take too long. By rewriting them in some ways, you can squeeze that extra performance you need by taking advantage of some optimisations that are already in place.

- There are two ways to optimize a gitbase query:
+ There are three ways to optimize a gitbase query:
  - Create an index for some parts.
  - Making sure the joined tables are squashed.
  - Making sure not squashed joins are performed in memory.
@@ -82,7 +82,7 @@ So, as a good rule of thumb, the right side of an inner join should always be th
  The more obvious way to improve the performance of a query is to create an index for such query. Since you can index multiple columns or a single arbitrary expression, this may be useful for some kinds of queries. For example, if you're querying by language, you may want to index that so there is no need to compute the language each time.

  ```sql
- CREATE INDEX files_language_idx ON files USING pilosa (language(file_path, blob_content))
+ CREATE INDEX files_language_idx ON files USING pilosa (language(file_path, blob_content));
  ```

  Once you have the index in place, gitbase only looks for the rows with the values matching your conditions.
@@ -199,37 +199,37 @@ This advice can be applied to all squashed tables, not only `repository_id`.

  This query will get squashed, because `NATURAL JOIN` makes sure all columns with equal names are used in the join.
  ```sql
- SELECT * FROM refs NATURAL JOIN ref_commits NATURAL JOIN commits
+ SELECT * FROM refs NATURAL JOIN ref_commits NATURAL JOIN commits;
  ```

  This query, however, will not be squashed.
  ```sql
  SELECT * FROM refs r
  INNER JOIN ref_commits rc ON r.ref_name = rc.ref_name
- INNER JOIN commits c ON rc.commit_hash = c.commit_hash
+ INNER JOIN commits c ON rc.commit_hash = c.commit_hash;
  ```

  **It requires some filters to be present in order to perform the squash.**

  This query will be squashed.

  ```sql
- SELECT * FROM commit_files NATURAL JOIN files
+ SELECT * FROM commit_files NATURAL JOIN files;
  ```

  This query will not be squashed, as the join between `commit_files` and `files` requires more filters to be squashed.

  ```sql
  SELECT * FROM commit_files cf
- INNER JOIN files f ON cf.file_path = f.file_path
+ INNER JOIN files f ON cf.file_path = f.file_path;
  ```

  **TIP:** we suggest always using `NATURAL JOIN` for joining tables, since it's less verbose and already satisfies all the filters for squashing tables.
  The only exception to this advice is when joining `refs` and `ref_commits`. A `NATURAL JOIN` between `refs` and `ref_commits` will only get the HEAD commit of the reference. The same happens with `commits` and `commit_trees`/`commit_files`.

  You can find the full list of conditions that need to be met for the squash to be applied [here](#list-of-filters-for-squashed-tables).

- **Only works if the tables joined follow a hierarchy.** Joinin `commits` and `files` does not work, or joining `blobs` with `files`. It needs to follow one of the hierarchies of tables.
+ **Only works if the tables joined follow a hierarchy.** Joining `commits` and `files` does not work, or joining `blobs` with `files`. It needs to follow one of the hierarchies of tables.

  ```
  repositories -> refs -> ref_commits -> commits -> commit_trees -> tree_entries -> blobs
@@ -374,4 +374,4 @@ FROM (
  GROUP BY lang
  ```

- As a good rule of thumb: defer as much as possible GROUP BY and ORDER BY operations and only perform them with the minimum amount of data needed.
+ As a good rule of thumb: defer as much as possible GROUP BY and ORDER BY operations and only perform them with the minimum amount of data needed.

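The TIP in the optimize-queries.md hunk above notes that a `NATURAL JOIN` between `refs` and `ref_commits` only returns the HEAD commit of each reference. That follows from `NATURAL JOIN` matching on every column the two tables share, which here would include `commit_hash`. The sketch below reproduces the effect with Python's built-in `sqlite3` as a stand-in engine. The table layouts are simplified assumptions rather than the full gitbase schema, and the squash optimisation itself is not modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, assumed layouts: both tables share repository_id,
    -- ref_name and commit_hash.
    CREATE TABLE refs (repository_id TEXT, ref_name TEXT, commit_hash TEXT);
    CREATE TABLE ref_commits (
        repository_id TEXT, ref_name TEXT, commit_hash TEXT, history_index INTEGER
    );
    INSERT INTO refs VALUES ('repo', 'HEAD', 'c3');
    INSERT INTO ref_commits VALUES
        ('repo', 'HEAD', 'c3', 0),
        ('repo', 'HEAD', 'c2', 1),
        ('repo', 'HEAD', 'c1', 2);
    """
)

# NATURAL JOIN also matches on commit_hash, so only the commit the
# reference points at survives the join.
natural = conn.execute(
    "SELECT commit_hash FROM refs NATURAL JOIN ref_commits ORDER BY history_index"
).fetchall()

# Joining only on repository and reference name keeps the whole history.
explicit = conn.execute(
    """
    SELECT rc.commit_hash
    FROM refs r
    INNER JOIN ref_commits rc
        ON r.repository_id = rc.repository_id AND r.ref_name = rc.ref_name
    ORDER BY rc.history_index
    """
).fetchall()

print(natural)
print(explicit)
```

When the full history of a reference is needed, the explicit join conditions have to leave `commit_hash` out, which is the exception the TIP describes.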
docs/using-gitbase/schema.md

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ This table represents the relation between commits and [files](#files). Using th

  This table allow us to get the commit history from a specific reference name. `history_index` column represents the position of the commit from a specific reference.

- This table it's like the [log](https://git-scm.com/docs/git-log) from a specific reference.
+ This table is like the [log](https://git-scm.com/docs/git-log) from a specific reference.

  Commits will be repeated if they are in several repositories or references.

docs/using-gitbase/supported-languages.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
- ## Supported languages
+ # Supported languages

  Gitbase supports many programming languages depending on the use case.
  For instance the `language(path, [blob])` function supports all languages which [enry's package](https://github.com/src-d/enry) can autodetect.
@@ -14,4 +14,4 @@ If your use case requires _Universal Abstract Syntax Tree_ then most likely one

  The _UAST_ functions support programming languages which already have implemented [babelfish](https://docs.sourced.tech/babelfish) driver.
  The list of currently supported languages on babelfish, you can find [here](https://docs.sourced.tech/babelfish/languages#supported-languages).
- Drivers which are still in development can be find [here](https://docs.sourced.tech/babelfish/languages#in-development).
+ Drivers which are still in development can be found [here](https://docs.sourced.tech/babelfish/languages#in-development).
