## Calculating code line changes for files in the last commit

This query will report how many lines of actual code (only code, not comments, blank lines or text) changed in each file of the last commit of each repository. It's similar to the previous example: `COMMIT_STATS` is, so to speak, an aggregation over the result of `COMMIT_FILE_STATS`.

We will only report those files whose language has been identified.

```sql
SELECT
...
```

From this output, we can obtain some information about our query:

- It has been running for 36 seconds.
- It's querying the `commit_files` table and has processed 8 out of 9 partitions.

To kill a query that's currently running, you can use the value in the `Id` column. If we were to kill the previous query, we would need to use the following query:

|`commit_stats(repository_id, [from_commit_hash], to_commit_hash) json`|returns the stats between two commits for a repository. If `from_commit_hash` is empty, it will compare the given `to_commit_hash` with its parent commit. Vendored file stats are not included in the result of this function. This function is more thoroughly explained later in this document.|
|`commit_file_stats(repository_id, [from_commit_hash], to_commit_hash) json array`|returns an array with the stats of each file in `to_commit_hash` since the given `from_commit_hash`. If `from_commit_hash` is not given, the parent commit will be used. Vendored file stats are not included in the result of this function. This function is more thoroughly explained later in this document.|
|`is_remote(reference_name) bool`|checks if the given reference name is from a remote.|
|`is_tag(reference_name) bool`|checks if the given reference name is a tag.|
|`is_vendor(file_path) bool`|checks if the given file path is a vendored file.|
|`language(path, [blob]) text`|gets the language of a file given its path and, optionally, the content of the file.|
|`uast(blob, [lang, [xpath]]) blob`|returns a node array of UAST nodes in semantic mode.|
|`uast_mode(mode, blob, lang) blob`|returns a node array of UAST nodes, specifying its language and mode (semantic, annotated or native).|
|`uast_xpath(blob, xpath) blob`|performs an XPath query over the given UAST nodes.|
|`uast_extract(blob, key) text array`|extracts information identified by the given key from the UAST nodes.|
|`uast_children(blob) blob`|returns a flattened array of the children UAST nodes from each one of the UAST nodes in the given array.|
|`loc(path, blob) json`|returns a JSON map containing the lines of code of a file, separated into three categories: Code, Blank and Comment lines.|
|`version() text`|returns the gitbase version in the format `8.0.11-{GITBASE_VERSION}`, for compatibility with MySQL versioning.|

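As a rough illustration of how several of these functions combine, the query below is a minimal sketch, not an official example. It assumes the `files` table exposes `file_path` and `blob_content` columns, as described in the schema documentation.

```sql
-- Sketch: language and line counts of each file, skipping vendored files.
SELECT
    file_path,
    language(file_path, blob_content) AS lang,
    loc(file_path, blob_content) AS line_counts
FROM files
WHERE NOT is_vendor(file_path)
LIMIT 10;
```

`is_vendor` is used in the `WHERE` clause so that vendored files are filtered out before `language` and `loc` are projected.
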
## Standard functions

These are all the functions that are available because they are implemented in `go-mysql-server`, which gitbase uses.

Using these selectors, as in

```
uast_extract(nodes_column, @common_selector)
```

you will extract the value of that property for each node.

Nodes that have no value for the requested property are omitted from the final array. That is, given a sequence of nodes `[node-1, node-2, node-3]` where `node-2` has no value for the requested property, the returned array will be `[prop-1, prop-3]`.

Also, if you want to retrieve values from a non-common property, you can pass it directly:

```
uast_extract(nodes_column, 'some-property')
```

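As an end-to-end illustration of the non-common property form, here is a minimal sketch that collects identifier names from Go files. The XPath `'//uast:Identifier'` and the `'Name'` property are assumptions based on the UAST v2 schema, and `'Go'` is assumed to be the language name that `language` returns for Go files.

```sql
-- Sketch: identifier names found in Go files.
-- '//uast:Identifier' and 'Name' assume the UAST v2 schema.
SELECT
    file_path,
    uast_extract(
        uast(blob_content, 'Go', '//uast:Identifier'),
        'Name'
    ) AS identifier_names
FROM files
WHERE language(file_path, blob_content) = 'Go'
LIMIT 5;
```

Identifier nodes without a `Name` value would simply be missing from `identifier_names`, as described above.
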
## How to use `loc`

`loc` returns statistics about the lines of code in a file, such as the number of code lines, comment lines, etc.

It requires a file path and the file content:

```
loc(file_path, blob_content)
```

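In a query, `loc` is typically applied to the `files` table. The following is a minimal sketch that reads only the code-line figure. It assumes the returned JSON has a top-level `Code` key, matching the Code, Blank and Comment categories listed in the functions table.

```sql
-- Sketch: code lines per file, extracted from the JSON returned by loc.
-- The '$.Code' path is an assumption about the result shape.
SELECT
    file_path,
    JSON_EXTRACT(loc(file_path, blob_content), '$.Code') AS code_lines
FROM files
WHERE NOT is_vendor(file_path)
LIMIT 10;
```
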
The result of this function is a JSON document with the following shape:
It can be used in two ways:

- To get the statistics of a specific commit: `COMMIT_STATS(repository_id, commit_hash)`
- To get the statistics of the diff of a commit range: `COMMIT_STATS(repository_id, from_commit, to_commit)`

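Both forms can be sketched as follows. The repository ID and commit hashes in the second query are hypothetical placeholders, and the first query assumes the `commits` table exposes `repository_id` and `commit_hash`.

```sql
-- Sketch: stats of each commit compared with its parent (two-argument form).
SELECT
    repository_id,
    commit_hash,
    commit_stats(repository_id, commit_hash) AS stats
FROM commits
LIMIT 10;

-- Sketch: stats of the diff between two commits (three-argument form).
-- 'my-repo', '<from_hash>' and '<to_hash>' are hypothetical placeholders.
SELECT commit_stats('my-repo', '<from_hash>', '<to_hash>') AS range_stats;
```
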
`commit_stats` is essentially an aggregation of the result of `commit_file_stats`. While `commit_file_stats` has the stats for each file in a commit, `commit_stats` has the global stats of all files in the commit. As a result, it outputs a single structure instead of an array of them.

The shape of the result returned by this function is the following:
**NOTE:** Files that are considered vendored are ignored when computing these statistics. Note that `.gitignore` is considered a vendored file.

The result returned by this function is a JSON document, which means that accessing its fields requires `JSON_EXTRACT`.

For example, code additions would be accessed like this:
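
The following is a minimal sketch of that kind of access. It assumes the result contains a `Code` object with an `Additions` field. If the actual shape differs, adjust the JSON path.

```sql
-- Sketch: code additions per commit.
-- '$.Code.Additions' is an assumption about the shape of the result.
SELECT
    commit_hash,
    JSON_EXTRACT(commit_stats(repository_id, commit_hash), '$.Code.Additions') AS code_additions
FROM commits
LIMIT 10;
```
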

docs/using-gitbase/indexes.md

Thus, to create indexes you must specify pilosa as the type of index. You can find some examples in the [examples](./examples.md#create-an-index-for-columns-on-a-table) section about managing indexes.
Note that you can create an index either **on one or more columns** or **on a single expression**.
In practice, having multiple indexes (one per column) is better and more flexible than one index for multiple columns. This is because of the data structures (bitmaps) used to represent index values.
Even if you have one index on multiple columns, every column is stored in an independent _field_.
Merging those _fields_ with any logical operation is fast and much more flexible. The main difference when an index has multiple columns is that it internally calculates the intersection across columns, so the index won't be used if you use a _non_-`AND` operation in a filter, e.g.:

With an index on (`A`, `B`), the index will be used for the following query:

You can find some more examples in the [examples](./examples.md#create-an-index-for-columns-on-a-table) section.
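
To make the `AND`-versus-`OR` rule concrete, here is a minimal sketch on the `commits` table. The index names and email values are hypothetical, and `commit_author_email` and `committer_email` are assumed to be columns of `commits`.

```sql
-- Sketch: one index over two columns (hypothetical index name).
CREATE INDEX commits_emails_idx ON commits USING pilosa (commit_author_email, committer_email);

-- Both indexed columns combined with AND: the multi-column index can be used.
SELECT commit_hash FROM commits
WHERE commit_author_email = 'alice@example.com' AND committer_email = 'bob@example.com';

-- A non-AND operation (OR): the multi-column index won't be used.
SELECT commit_hash FROM commits
WHERE commit_author_email = 'alice@example.com' OR committer_email = 'bob@example.com';

-- Alternative: one index per column, so each condition can use its own index.
CREATE INDEX commits_author_email_idx ON commits USING pilosa (commit_author_email);
CREATE INDEX commits_committer_email_idx ON commits USING pilosa (committer_email);
```
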

See [go-mysql-server](https://github.com/src-d/go-mysql-server/tree/541fde3b92093b3a449e803342a7a18c686275e6#indexes) documentation for more details.

docs/using-gitbase/optimize-queries.md

Even though each release includes performance improvements that make gitbase faster, some queries might still take too long. By rewriting them, you can squeeze out the extra performance you need by taking advantage of some optimisations that are already in place.

There are three ways to optimize a gitbase query:

- Create an index for some parts.
- Make sure the joined tables are squashed.
- Make sure joins that are not squashed are performed in memory.
The most obvious way to improve the performance of a query is to create an index for it. Since you can index multiple columns or a single arbitrary expression, this may be useful for some kinds of queries. For example, if you're querying by language, you may want to index that expression so there is no need to compute the language each time.

```sql
CREATE INDEX files_language_idx ON files USING pilosa (language(file_path, blob_content));
```

Once you have the index in place, gitbase only looks for the rows with the values matching your conditions.
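
For instance, a query that filters on exactly the indexed expression is the kind that can benefit from `files_language_idx`. This is a minimal sketch, and `'Go'` is just an example value.

```sql
-- Sketch: filters on the same expression that files_language_idx indexes,
-- so gitbase can look up matching rows instead of computing language() for every file.
SELECT file_path
FROM files
WHERE language(file_path, blob_content) = 'Go';
```
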
This query will get squashed, because `NATURAL JOIN` makes sure all columns with equal names are used in the join.

```sql
...
INNER JOIN commits c ON rc.commit_hash = c.commit_hash;
```

**It requires some filters to be present in order to perform the squash.**

This query will be squashed.

```sql
SELECT * FROM commit_files NATURAL JOIN files;
```

This query will not be squashed, as the join between `commit_files` and `files` requires more filters to be squashed.

```sql
SELECT * FROM commit_files cf
INNER JOIN files f ON cf.file_path = f.file_path;
```

**TIP:** we suggest always using `NATURAL JOIN` for joining tables, since it's less verbose and already satisfies all the filters for squashing tables.
The only exception to this advice is when joining `refs` and `ref_commits`. A `NATURAL JOIN` between `refs` and `ref_commits` will only get the HEAD commit of the reference. The same happens with `commits` and `commit_trees`/`commit_files`.
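
For that exception, the join can be written explicitly instead. This is a minimal sketch, assuming `refs` and `ref_commits` both have `repository_id`, `ref_name` and `commit_hash` columns. Under that assumption, a `NATURAL JOIN` would also match on `commit_hash`, which would explain why only the commit the reference points to is returned. Whether this explicit version is squashed depends on the list of conditions for squashed tables.

```sql
-- Sketch: all commits of the HEAD reference, joining refs and ref_commits explicitly.
-- The join deliberately does not compare refs.commit_hash with ref_commits.commit_hash.
SELECT c.commit_hash, c.commit_message
FROM refs r
INNER JOIN ref_commits rc
    ON r.repository_id = rc.repository_id
    AND r.ref_name = rc.ref_name
INNER JOIN commits c
    ON rc.repository_id = c.repository_id
    AND rc.commit_hash = c.commit_hash
WHERE r.ref_name = 'HEAD';
```
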
You can find the full list of conditions that need to be met for the squash to be applied [here](#list-of-filters-for-squashed-tables).

**Only works if the tables joined follow a hierarchy.** Joining `commits` and `files` does not work, nor does joining `blobs` with `files`. The join needs to follow one of the hierarchies of tables.

docs/using-gitbase/schema.md

This table allows us to get the commit history of a specific reference name. The `history_index` column represents the position of the commit in the history of that reference.

This table is like the [log](https://git-scm.com/docs/git-log) of a specific reference.

Commits will be repeated if they are in several repositories or references.
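
A log-like listing for a single reference can be sketched as follows. `'my-repo'` is a hypothetical repository ID. Filtering by it keeps commits from other repositories out of the result. The query assumes `ref_commits` and `commits` share only the `repository_id` and `commit_hash` columns, so the `NATURAL JOIN` matches on those.

```sql
-- Sketch: commits of HEAD in one repository, ordered by their position in its history.
-- 'my-repo' is a hypothetical repository_id.
SELECT history_index, commit_hash, commit_message
FROM ref_commits NATURAL JOIN commits
WHERE repository_id = 'my-repo' AND ref_name = 'HEAD'
ORDER BY history_index
LIMIT 20;
```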