Commit 55ac797

author
Jakob Nybo Andersen
committed
Improve language in documentation
1 parent e636b0b commit 55ac797

3 files changed

+37
-40
lines changed

docs/src/index.md

Lines changed: 32 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -8,8 +8,8 @@ end
88
```
99

1010
# CIGARStrings.jl
11-
CIGARStrings.jl provide functionality for parsing and working with Concise Idiosyncratic Gapped Alignment Report - or CIGAR - strings.
12-
CIGARs were popularized by the [SAM format](https://en.wikipedia.org/wiki/SAM_(file_format)), and are a compact run length encoding notation to represent pairwise alignments.
11+
CIGARStrings.jl provides functionality for parsing and working with Concise Idiosyncratic Gapped Alignment Report (CIGAR) strings.
12+
CIGARs were popularized by the [SAM format](https://en.wikipedia.org/wiki/SAM_(file_format)), and are a compact run-length encoding notation used to represent pairwise alignments.
1313
They can be found in the SAM, BAM, PAF, and GFA formats.
1414

1515
For example, the following pairwise alignment of a query to a reference:
@@ -19,17 +19,17 @@ For example, the following pairwise alignment of a query to a reference:
1919
|||| || | |
2020
R: TAGAACCATA--TGC
2121
```
22-
Can be represented by the CIGAR `5M3D2M2I3M`, representing:
22+
can be represented by the CIGAR `5M3D2M2I3M`, representing:
2323
1. 5 matches/mismatches
2424
2. Then, 3 deletions
2525
3. Then, 2 matches/mismatches
2626
4. Then, 2 insertions
2727
5. Finally, 3 matches/mismatches.
2828

29-
A CIGAR strings is always written in terms of the _query_, and not the reference.
29+
A CIGAR string is always written in terms of the _query_, not the reference.
3030

3131
## Individual alignment operations
32-
One run of identical alignment operations, e.g. "5 matches/mismatches" are represented
32+
One run of identical alignment operations, e.g. "5 matches/mismatches," is represented
3333
by a single `CIGARElement`.
3434
Conceptually, a `CIGARElement` is an alignment operation (represented by a `CIGAROp`) and a length:
3535

@@ -41,16 +41,16 @@ CIGAROp
4141
## CIGARs
4242
A CIGAR string is represented by an `AbstractCIGAR`, which currently has two subtypes: `CIGAR` and `BAMCIGAR`.
4343
These types differ in their memory layout: The former stores the CIGAR as its ASCII representation (as used in the SAM format), and the latter stores it in a binary format (as used in the BAM format).
44-
Both typs store its underlying data as an `ImmutableMemoryView{UInt8}`.
44+
Both types store their underlying data as an `ImmutableMemoryView{UInt8}`.
4545

4646
```@docs
4747
AbstractCIGAR
4848
```
4949

50-
The API for these two types are almost interchangeable, so examples below will use `CIGAR`, since its plaintext representation makes examples easier.
50+
The API for these two types is almost interchangeable, so examples below use `CIGAR`, since its plaintext representation makes examples easier to read.
5151
See [BAMCIGAR section](@ref bamcigar) for a list of all differences between the two types.
5252

53-
CIGAR strings are validated upon construction
53+
CIGAR strings are validated upon construction.
5454

5555
```jldoctest
5656
julia> CIGAR("2M1D3M")
@@ -64,7 +64,7 @@ ERROR: Error around byte 4: Invalid operation. Possible values are "MIDNSHP=X".
6464
Since CIGAR strings occur in various bioinformatics file formats, it is expected
6565
that users of CIGARStrings.jl will construct `CIGAR`s from a view into a buffer storing a chunk of the file.
6666

67-
This is zero-copy, and will not to allocate on Julia 1.14 and forward.
67+
This is zero-copy, and does not allocate on Julia 1.14 and later.
6868
For example:
6969

7070
```jldoctest
@@ -80,7 +80,7 @@ CIGAR("15M9D18M")
8080
CIGAR
8181
```
8282

83-
`CIGAR`s are iterable, and returns its `CIGARElement`s, in order:
83+
`CIGAR`s are iterable, and return their `CIGARElement`s in order:
8484

8585
```jldoctest
8686
julia> collect(CIGAR("2M1D3M"))
@@ -129,8 +129,6 @@ alignment length is 15.
129129
R: TAGAACCATA--TGC
130130
```
131131

132-
We always have `aln_length(c) ≥ max(query_length(c), ref_length(c))`
133-
134132
```jldoctest
135133
julia> c = CIGAR("5M3D2M2I3M");
136134
@@ -144,19 +142,19 @@ julia> aln_length(c)
144142
15
145143
```
146144

147-
Since the CIGAR operation `M` (`OP_M`) is ambiguous to whether is represents matches,
145+
Since the CIGAR operation `M` (`OP_M`) is ambiguous about whether it represents matches,
148146
mismatches, or a combination of these, the function [`count_matches`](@ref) can be used to
149147
count the number of matches in a CIGAR given the number of mismatches.
150148

151-
The number of mismatches are typically output by mappers, making this information
152-
handily accessible:
149+
Mismatch counts are typically output by mappers, making this information
150+
readily accessible.
153151

154-
The alignment identity (number of matches, not mismatches divided by alignment length)
152+
Alignment identity (number of matches, excluding mismatches, divided by alignment length)
155153
can be obtained with [`aln_identity`](@ref).
156-
Like [`count_matches`](@ref), this takes the number of mismatches as an argument:
154+
Like [`count_matches`](@ref), this takes the number of mismatches as an argument.
157155

158156
## Comparing CIGARs
159-
When comparing `CIGAR`s using `==`, it will check if the `CIGAR`s are literally identical, in the
157+
When comparing `CIGAR`s using `==`, Julia checks whether the `CIGAR`s are literally identical, in the
160158
sense that they are composed of the same bytes:
161159

162160
```jldoctest compare
@@ -176,7 +174,7 @@ However, in the above example, since the CIGAR operation `M` signifies a match o
176174
CIGARs are indeed compatible, since `10M` is also a valid CIGAR annotation for the same alignment
177175
as `4=1X5=`.
178176

179-
This notion of compatibility tested with `is_compatible`:
177+
This notion of compatibility can be tested with `is_compatible`:
180178

181179
```@docs
182180
is_compatible
@@ -204,10 +202,10 @@ are also written in this alignment.
204202
We can see that query position 6 aligns to reference position 9, which is also
205203
alignment position 9.
206204

207-
These position translation can be obtained using the function [`pos_to_pos`](@ref),
205+
These position translations can be obtained using the function [`pos_to_pos`](@ref),
208206
specifying the source and destination coordinate systems [`query`](@ref), [`ref`](@ref)
209207
or [`aln`](@ref).
210-
When passed an integer, this function returns `Translation` object that contains two properties: `.pos` and `.kind`.
208+
When passed an integer, this function returns a `Translation` object with two properties: `.pos` and `.kind`.
211209

212210
When a position translation has a straightforward answer, the `.kind` property is
213211
`CIGARStrings.pos`, and the `.pos` field is the corresponding position:
@@ -222,10 +220,10 @@ julia> pos_to_pos(aln, query, c, 9)
222220
Translation(pos, 6)
223221
```
224222

225-
Note that these operations are in __linear time__, as they scan the CIGAR string from the beginning.
223+
Note that these operations run in __linear time__, as they scan the CIGAR string from the beginning.
226224

227225
To efficiently query multiple translations in the same scan of the CIGAR string, you can pass a sorted (ascending) iterator of integers.
228-
In this case, `pos_to_pos` will return a lazy iterator of `Pair{Int, Translation}`, representing `source_index => destination_index`:
226+
In this case, `pos_to_pos` returns a lazy iterator of `Pair{Int, Translation}`, representing `source_position => mapped_translation`:
229227

230228
```jldoctest
231229
julia> c = CIGAR("4M3D2M2I3M"); # alignment above
@@ -252,21 +250,21 @@ CIGARStrings.TranslationKind
252250
## Normalization
253251
The CIGAR format is redundant, in that the same alignment can be written in multiple different ways. In particular:
254252

255-
* The `P` and `H` operations means nothing w.r.t the query and reference.
256-
`P` is only used to pad w.r.t a third sequence, and `H` signifies that part of
253+
* The `P` and `H` operations mean nothing with respect to the query and reference.
254+
`P` is only used to pad with respect to a third sequence, and `H` signifies that part of
257255
the true query is missing from the input query sequence.
258256
* The `=` and `X` operations are usually redundant with `M`, since the information of matches/mismatches is not given by the alignment itself, but can be determined from the input sequences given the alignment.
259-
* Consecutive runs of the same operation is allowed, such as `1M1M`, but is better written `2M`
257+
* Consecutive runs of the same operation are allowed, such as `1M1M`, but are better written as `2M`.
260258

261-
This package provides the functions [`normalize`](@ref), [`normalize!`](@ref) and [`unsafe_normalize`](@ref) which creates new cigars written in the canonical form.
262-
In the canonical form, each of the points above are addressed: `H` is converted to `S`, `P` is removed, `=` and `X` is converted to `M`, and consecutive identical operations are merged.
259+
This package provides the functions [`normalize`](@ref), [`normalize!`](@ref), and [`unsafe_normalize`](@ref), which create new CIGARs written in canonical form.
260+
In canonical form, each of the points above is addressed: `H` and `P` are removed, `=` and `X` are converted to `M`, and consecutive identical operations are merged.
263261

264-
Note that the normalized form of a cigar corresponds to the _same_ pairwise alignment.
262+
Note that the normalized form of a CIGAR corresponds to the _same_ pairwise alignment.
265263
Therefore, it is guaranteed that if `is_compatible(a, b)`, then `normalize(a) == normalize(b)` (though not the other way around).
266-
It is also guaranteed that the result of position translation is identical for a cigar and its normalized version.
264+
It is also guaranteed that the result of position translation is identical for a CIGAR and its normalized version.
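
The canonical-form rules described in the added lines above can be sketched in plain Julia, without the package. This is only an illustration under stated assumptions: `sketch_normalize` is a hypothetical name and not part of CIGARStrings.jl, it operates on a plain string rather than a `CIGAR`, it performs no validation, and it follows the rules as worded in the new text (drop `H` and `P`, map `=` and `X` to `M`, merge adjacent runs).

```julia
# Hypothetical helper, NOT part of CIGARStrings.jl: applies the three
# canonicalization rules to a plain CIGAR string. No validation is done;
# text that does not match `<digits><op>` is silently skipped.
function sketch_normalize(cigar::AbstractString)
    runs = Tuple{Int, Char}[]
    for m in eachmatch(r"(\d+)([MIDNSHP=X])", cigar)
        len = parse(Int, m.captures[1])
        op = only(m.captures[2])
        # Rule 1: `H` and `P` say nothing about query vs. reference, so drop them.
        op in ('H', 'P') && continue
        # Rule 2: `=` and `X` are folded into `M`.
        op in ('=', 'X') && (op = 'M')
        # Rule 3: merge with the previous run if the operation is the same.
        if !isempty(runs) && last(runs)[2] == op
            runs[end] = (last(runs)[1] + len, op)
        else
            push!(runs, (len, op))
        end
    end
    return join(string(len, op) for (len, op) in runs)
end

sketch_normalize("4=1X5=")   # "10M"
sketch_normalize("1M1M")     # "2M"
```

With this sketch, `4=1X5=` and `10M` reduce to the same string, which mirrors the guarantee that compatible CIGARs normalize to equal values.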
267265

268266
## Errors and error recovery
269-
CIGARStrings.jl allows you to parse a poential CIGAR string without throwing an exception if the data is invalid, using the function [`CIGARStrings.try_parse`](@ref).
267+
CIGARStrings.jl allows you to parse a potential CIGAR string without throwing an exception if the data is invalid, using the function [`CIGARStrings.try_parse`](@ref).
270268

271269
```@docs
272270
CIGARStrings.CIGARError
@@ -282,14 +280,14 @@ However, in order to make zero-copy CIGARs possible, the `BAMCIGAR` type is back
282280
CIGARStrings.BAMCIGAR
283281
```
284282

285-
A `BAMCIGAR` can be constructed from its binary representation, using any type which implements `MemoryViews.MemoryView`:
283+
A `BAMCIGAR` can be constructed from its binary representation using any type that implements `MemoryViews.MemoryView`:
286284

287285
```jldoctest
288286
julia> BAMCIGAR("\x54\4\0\0\x70\4\0\0")
289287
BAMCIGAR(CIGAR("69S71M"))
290288
```
291289

292-
This is not zero-cost: Like `CIGAR` the type contains some metadata and is validated upon construction.
290+
This is not zero-cost: like `CIGAR`, the type contains some metadata and is validated upon construction.
293291

294292
Like `CIGAR`, the `try_parse` function can be used:
295293

@@ -298,7 +296,7 @@ julia> CIGARStrings.try_parse(BAMCIGAR, "\x5f\4\0\0\x70\4\0\0")
298296
CIGARStrings.CIGARError(1, CIGARStrings.Errors.InvalidOperation)
299297
```
300298

301-
`CIGAR` and `BAMCIGAR` can be converted ifallably to each other:
299+
`CIGAR` and `BAMCIGAR` can be converted infallibly to each other:
302300

303301
```jldoctest
304302
julia> c = CIGAR("6H19S18M1I22=8I2S");

src/CIGARStrings.jl

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -564,7 +564,7 @@ end
564564
query_length(::AbstractCIGAR)::Int
565565
566566
Get the number of biosymbols in the query of the `CIGAR`. This is the same
567-
as the lengths of all `CIGARElement`s of type `M`, `I`, `S`, `H`, `=` and `X`.
567+
as the lengths of all `CIGARElement`s of type `M`, `I`, `S`, `=` and `X`.
568568
569569
See also: [`ref_length`](@ref), [`aln_length`](@ref)
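
To make the docstring change concrete, the sketch below sums run lengths over the query-consuming operations listed in the new line (`M`, `I`, `S`, `=`, `X`, with `H` no longer counted). It is a standalone illustration, not the package's implementation: the `sketch_*` names are hypothetical, the input is a plain string, and the reference-consuming set (`M`, `D`, `N`, `=`, `X`) is an assumption taken from the SAM convention, since this diff does not show the `ref_length` docstring.

```julia
# Operations that consume query symbols, per the updated docstring
# (hard clips `H` are excluded).
const SKETCH_QUERY_OPS = ('M', 'I', 'S', '=', 'X')
# Operations that consume reference symbols. ASSUMPTION: taken from the
# SAM convention, not from anything shown in this diff.
const SKETCH_REF_OPS = ('M', 'D', 'N', '=', 'X')

# Hypothetical helper: sum the lengths of all runs whose operation is in `ops`.
# No validation is performed on the input string.
function sketch_length(cigar::AbstractString, ops)
    total = 0
    for m in eachmatch(r"(\d+)([MIDNSHP=X])", cigar)
        len = parse(Int, m.captures[1])
        op = only(m.captures[2])
        if op in ops
            total += len
        end
    end
    return total
end

sketch_query_length(cigar) = sketch_length(cigar, SKETCH_QUERY_OPS)
sketch_ref_length(cigar) = sketch_length(cigar, SKETCH_REF_OPS)

sketch_query_length("5M3D2M2I3M")  # 12
sketch_ref_length("5M3D2M2I3M")    # 13
```

Under this definition a leading `6H` contributes nothing to the query length, which is the behavior the corrected docstring describes.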
570570

src/bamcigar.jl

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -9,13 +9,12 @@ end
99
"""
1010
BAMCIGAR <: AbstractCIGAR
1111
12-
A BAMCIGAR is an alternative representation of a CIGAR,
12+
A `BAMCIGAR` is an alternative representation of a `CIGAR`,
1313
stored compactly in 32-bit integers.
14-
Semantically, a BAMCIGAR behaves much similar to a CIGAR.
14+
Semantically, a `BAMCIGAR` behaves much like a `CIGAR`.
1515
16-
Construct a BAMCIGAR either from a CIGAR, taking an optional `Vector{UInt8}`
17-
to use as backing storage, or using [`CIGARStrings.try_parse`](@ref),
18-
or [`BAMCIGAR(::MutableMemoryView{UInt8}, ::CIGAR)`](@ref)
16+
Construct a `BAMCIGAR` either from a `CIGAR`, or using [`encode!`](@ref)
17+
or [`encode_append!`](@ref)
1918
2019
# Examples
2120
```jldoctest
