Commit 4d5ca8b

Merge pull request #247 from bzz/doc-update
doc: cleanup and simplify
2 parents a4c166c + c7272bd commit 4d5ca8b

README.md

Lines changed: 46 additions & 106 deletions
@@ -1,24 +1,21 @@
11
# enry [![GoDoc](https://godoc.org/github.com/src-d/enry?status.svg)](https://godoc.org/github.com/src-d/enry) [![Build Status](https://travis-ci.com/src-d/enry.svg?branch=master)](https://travis-ci.com/src-d/enry) [![codecov](https://codecov.io/gh/src-d/enry/branch/master/graph/badge.svg)](https://codecov.io/gh/src-d/enry)
22

3-
File programming language detector and toolbox to ignore binary or vendored files. *enry*, started as a port to _Go_ of the original [linguist](https://github.com/github/linguist) _Ruby_ library, that has an improved *2x performance*.
3+
Programming language detector and toolbox to ignore binary or vendored files. *enry* started as a port to _Go_ of the original [linguist](https://github.com/github/linguist) _Ruby_ library and offers *2x better performance*.
44

5-
* [Installation](#installation)
6-
* [Examples](#examples)
75
* [CLI](#cli)
8-
* [Java bindings](#java-bindings)
9-
* [Python bindings](#python-bindings)
6+
* [Library](#library)
7+
* [Go](#go)
8+
* [Java bindings](#java-bindings)
9+
* [Python bindings](#python-bindings)
1010
* [Divergences from linguist](#divergences-from-linguist)
1111
* [Benchmarks](#benchmarks)
1212
* [Why Enry?](#why-enry)
1313
* [Development](#development)
1414
* [Sync with github/linguist upstream](#sync-with-githublinguist-upstream)
1515
* [Misc](#misc)
16-
* [Benchmark](#benchmark)
17-
* [Faster regexp engine (optional)](#faster-regexp-engine-optional)
1816
* [License](#license)
1917

20-
Installation
21-
------------
18+
# CLI
2219

2320
The recommended way to install the `enry` command-line tool is to either
2421
[download a release](https://github.com/src-d/enry/releases) or run:
@@ -27,10 +24,29 @@ The recommended way to install the `enry` command-line tool is to either
2724
(cd "$(mktemp -d)" && go mod init enry && go get github.com/src-d/enry/v2/cmd/enry)
2825
```
2926

30-
Examples
31-
--------
27+
The *enry* CLI accepts flags similar to *linguist*'s (`--breakdown`, `--json`) and produces similar output:
28+
29+
```bash
30+
$ enry
31+
97.71% Go
32+
1.60% C
33+
0.31% Shell
34+
0.22% Java
35+
0.07% Ruby
36+
0.05% Makefile
37+
0.04% Scala
38+
0.01% Gnuplot
39+
```
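
The percentage report above can be illustrated with a minimal sketch. It assumes we already have a count per language (files, lines, or bytes) and only shows how such counts turn into sorted percentages. The `share` type, the `breakdown` function, and the sample counts are hypothetical and are not part of enry's code or API.

```go
package main

import (
	"fmt"
	"sort"
)

// share is one row of the report: a language and its percentage of the total.
// Hypothetical type, used only for this illustration.
type share struct {
	Language string
	Percent  float64
}

// breakdown converts per-language counts (files, lines, or bytes) into
// percentages, sorted from the largest share to the smallest.
func breakdown(counts map[string]int) []share {
	total := 0
	for _, n := range counts {
		total += n
	}
	shares := make([]share, 0, len(counts))
	if total == 0 {
		return shares
	}
	for lang, n := range counts {
		shares = append(shares, share{lang, 100 * float64(n) / float64(total)})
	}
	sort.Slice(shares, func(i, j int) bool {
		if shares[i].Percent != shares[j].Percent {
			return shares[i].Percent > shares[j].Percent
		}
		// Break ties by name so the order is deterministic.
		return shares[i].Language < shares[j].Language
	})
	return shares
}

func main() {
	// Hypothetical counts, not taken from a real repository.
	counts := map[string]int{"Go": 750, "C": 200, "Shell": 50}
	for _, s := range breakdown(counts) {
		fmt.Printf("%.2f%%\t%s\n", s.Percent, s.Language)
	}
}
```

How enry itself counts and weighs files is not shown here; the sketch only mirrors the shape of the output.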
40+
41+
Note that enry's CLI **_does not need an actual git repository to work_**, which is an intentional difference from linguist.
42+
43+
# Library
3244

33-
If you are working in a [Go module](https://github.com/golang/go/wiki/Modules),
45+
*enry* is also available as a native Go library with FFI bindings for multiple programming languages.
46+
47+
## Go
48+
49+
In a [Go module](https://github.com/golang/go/wiki/Modules),
3450
import `enry` to the module by running:
3551

3652
```go
@@ -61,9 +77,9 @@ lang := enry.GetLanguage("foo.cpp", []byte("<cpp-code>"))
6177
// result: C++ true
6278
```
6379

64-
Note that the returned boolean value `safe` is set either to `true`, if there is only one possible language detected, or to `false` otherwise.
80+
Note that the returned boolean value `safe` is `true` if there is only one possible language detected.
6581

66-
To get a list of possible languages for a given file, you can use the plural version of the detecting functions.
82+
To get a list of all possible languages for a given file, there is a plural version of the same API.
6783

6884
```go
6985
langs := enry.GetLanguages("foo.h", []byte("<cpp-code>"))
@@ -76,96 +92,18 @@ langs := enry.GetLanguagesByFilename("Gemfile", []byte("<content>"), []string{})
7692
// result: []string{"Ruby"}
7793
```
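
To make the relationship between the `safe` flag and the plural API concrete, here is a minimal, self-contained sketch. It does not call enry; the `candidatesByExt` table and the `guessByExtension` function are hypothetical stand-ins that only mimic the behaviour described above, where `safe` is `true` when exactly one language is possible.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// candidatesByExt is a tiny, hypothetical extension table for illustration;
// it is not enry's data, which is generated from linguist and far larger.
var candidatesByExt = map[string][]string{
	".go": {"Go"},
	".h":  {"C", "C++", "Objective-C"},
	".rb": {"Ruby"},
}

// guessByExtension returns the first candidate for the file's extension.
// safe is true only when that candidate is the single possible language.
func guessByExtension(filename string) (language string, safe bool) {
	langs := candidatesByExt[filepath.Ext(filename)]
	if len(langs) == 0 {
		return "", false
	}
	return langs[0], len(langs) == 1
}

func main() {
	for _, name := range []string{"main.go", "foo.h", "notes.xyz"} {
		lang, safe := guessByExtension(name)
		fmt.Printf("%-10s language=%q safe=%t\n", name, lang, safe)
	}
}
```

When `safe` is `false`, a caller would typically fall back to the plural functions to inspect every candidate rather than trust the single guess.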
7894

79-
80-
CLI
81-
------------
82-
83-
You can use enry as a command,
84-
85-
```bash
86-
$ enry --help
87-
enry v2.0.0 build: 05-08-2019_20_40_35 commit: 6ccf0b6, based on linguist commit: e456098
88-
enry, A simple (and faster) implementation of github/linguist
89-
usage: enry [-mode=(file|line|byte)] [-prog] <path>
90-
enry [-mode=(file|line|byte)] [-prog] [-json] [-breakdown] <path>
91-
enry [-mode=(file|line|byte)] [-prog] [-json] [-breakdown]
92-
enry [-version]
93-
```
94-
95-
and on repository root, it'll return an output similar to *linguist*'s output,
96-
97-
```bash
98-
$ enry
99-
97.71% Go
100-
1.60% C
101-
0.31% Shell
102-
0.22% Java
103-
0.07% Ruby
104-
0.05% Makefile
105-
0.04% Scala
106-
0.01% Gnuplot
107-
```
108-
109-
but not only the output; its flags are also the same as *linguist*'s ones,
110-
111-
```bash
112-
$ enry --breakdown
113-
97.71% Go
114-
1.60% C
115-
0.31% Shell
116-
0.22% Java
117-
0.07% Ruby
118-
0.05% Makefile
119-
0.04% Scala
120-
0.01% Gnuplot
121-
122-
Scala
123-
java/build.sbt
124-
java/project/plugins.sbt
125-
126-
Java
127-
java/src/main/java/tech/sourced/enry/Enry.java
128-
java/src/main/java/tech/sourced/enry/GoUtils.java
129-
java/src/main/java/tech/sourced/enry/Guess.java
130-
java/src/test/java/tech/sourced/enry/EnryTest.java
131-
132-
Makefile
133-
Makefile
134-
java/Makefile
135-
136-
Go
137-
benchmark_test.go
138-
```
139-
140-
even the JSON flag,
141-
142-
```bash
143-
$ enry --json | jq .
144-
{
145-
"C": [
146-
"internal/tokenizer/flex/lex.linguist_yy.c",
147-
"internal/tokenizer/flex/lex.linguist_yy.h",
148-
"internal/tokenizer/flex/linguist.h",
149-
"python/_c_enry.c",
150-
"python/enry.c"
151-
],
152-
"Gnuplot": [
153-
"benchmarks/plot-histogram.gp"
154-
],
155-
"Go": [
156-
"benchmark_test.go",
157-
```
158-
159-
Note that enry's CLI **_doesn't need a git repository to work_**, which is intentionally different from the linguist.
160-
16195
## Java bindings
16296

97+
Generated Java bindings using a C shared library and JNI are available under [`java`](https://github.com/src-d/enry/blob/master/java).
98+
99+
The library is published on Maven as [tech.sourced:enry-java](https://mvnrepository.com/artifact/tech.sourced/enry-java) for macOS and Linux. Windows support is planned under [src-d/enry#150](https://github.com/src-d/enry/issues/150).
163100

164-
Generated Java bindings using a C shared library and JNI are available under [`java`](https://github.com/src-d/enry/blob/master/java) and published on Maven at [tech.sourced:enry-java](https://mvnrepository.com/artifact/tech.sourced/enry-java) for macOS and linux.
101+
## Python bindings
165102

103+
Generated Python bindings using a C shared library and cffi are WIP under [src-d/enry#154](https://github.com/src-d/enry/issues/154).
166104

167-
## Python bindings
168-
Generated Python bindings using a C shared library and cffi are not available yet and are WIP under [src-d/enry#154](https://github.com/src-d/enry/issues/154).
105+
A library is going to be published on PyPI as [enry](https://pypi.org/project/enry/) for
106+
macOS and Linux. Windows support is planned under [src-d/enry#150](https://github.com/src-d/enry/issues/150).
169107

170108
Divergences from linguist
171109
------------
@@ -199,26 +137,27 @@ In all the cases above that have an issue number - we plan to update enry to mat
199137
Benchmarks
200138
------------
201139

202-
Enry's language detection has been compared with Linguist's one. In order to do that, Linguist's project directory [*linguist/samples*](https://github.com/github/linguist/tree/master/samples) was used as a set of files to run benchmarks against.
140+
Enry's language detection has been compared with Linguist's on [*linguist/samples*](https://github.com/github/linguist/tree/master/samples).
203141

204142
We got these results:
205143

206144
![histogram](benchmarks/histogram/distribution.png)
207145

208-
The histogram shows the number of files detected (y-axis) per time interval bucket (x-axis). As one can see, most of the files were detected faster by enry.
146+
The histogram shows the _number of files_ (y-axis) per _time interval bucket_ (x-axis).
147+
Most of the files were detected faster by enry.
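
As a rough illustration of how such a histogram is built, the sketch below groups per-file detection times into interval buckets and counts the files in each. The `bucketize` function, the bucket bounds, and the sample timings are hypothetical; they are not the benchmark code or data behind the plot above.

```go
package main

import "fmt"

// bucketize counts how many timings (in milliseconds) fall into each interval.
// bounds must be sorted ascending; bucket i covers [bounds[i-1], bounds[i]),
// and the final bucket collects everything at or above the last bound.
func bucketize(timingsMs []float64, bounds []float64) []int {
	counts := make([]int, len(bounds)+1)
	for _, t := range timingsMs {
		i := 0
		for i < len(bounds) && t >= bounds[i] {
			i++
		}
		counts[i]++
	}
	return counts
}

func main() {
	// Hypothetical per-file detection times in milliseconds, not measured data.
	timings := []float64{0.005, 0.05, 0.5, 0.7, 2, 20}
	bounds := []float64{0.01, 0.1, 1, 10}
	for i, n := range bucketize(timings, bounds) {
		fmt.Printf("bucket %d: %d files\n", i, n)
	}
}
```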
209148

210-
We found few cases where enry turns slower than linguist due to
211-
Go regexp engine being slower than Ruby's, based on [oniguruma](https://github.com/kkos/oniguruma) library, written in C.
149+
There are several cases where enry is slower than linguist due to
150+
the Go regexp engine being slower than Ruby's, which is based on the [oniguruma](https://github.com/kkos/oniguruma) library, written in C.
212151

213152
See [instructions](#misc) for running enry with oniguruma.
214153

215154

216155
Why Enry?
217156
------------
218157

219-
In the movie [My Fair Lady](https://en.wikipedia.org/wiki/My_Fair_Lady), [Professor Henry Higgins](http://www.imdb.com/character/ch0011719/?ref_=tt_cl_t2) is one of the main characters. Henry is a linguist and at the very beginning of the movie enjoys guessing the origin of people based on their accent.
158+
In the movie [My Fair Lady](https://en.wikipedia.org/wiki/My_Fair_Lady), [Professor Henry Higgins](http://www.imdb.com/character/ch0011719/) is a linguist who at the very beginning of the movie enjoys guessing the origin of people based on their accent.
220159

221-
"Enry Iggins" is how [Eliza Doolittle](http://www.imdb.com/character/ch0011720/?ref_=tt_cl_t1), [pronounces](https://www.youtube.com/watch?v=pwNKyTktDIE) the name of the Professor during the first half of the movie.
160+
"Enry Iggins" is how [Eliza Doolittle](http://www.imdb.com/character/ch0011720/) [pronounces](https://www.youtube.com/watch?v=pwNKyTktDIE) the name of the Professor.
222161

223162
## Development
224163

@@ -228,7 +167,7 @@ To build enry's CLI run:
228167

229168
this will generate a binary in the project's root directory called `enry`.
230169

231-
To run the tests:
170+
To run the tests use:
232171

233172
make test
234173

@@ -267,6 +206,7 @@ Separating all the necessary "manual" code changes to a different PR that includ
267206
## Misc
268207

269208
<details>
209+
<summary>Running a benchmark & faster regexp engine</summary>
270210

271211
### Benchmark
272212
