This repository was archived by the owner on Apr 30, 2021. It is now read-only.

Commit 618d118

Update README.md
1 parent d338140 commit 618d118

1 file changed: +14 -29 lines changed


README.md

Lines changed: 14 additions & 29 deletions
@@ -1,40 +1,25 @@
-# Deep-Segmentation
-A sentence segmenter that actually works! Now for English, French and Italian.
+# DeepSegment: A sentence segmenter that actually works!
+# For the original implementation please use the "master" branch of this repo.
 
 The Demo is available at http://bpraneeth.com/projects
 
-The code and pre-trained models for "DeepCorrection 1: Sentence Segmentation of unpunctuated text." as explained in the medium posts at https://medium.com/@praneethbedapudi/deepcorrection-1-sentence-segmentation-of-unpunctuated-text-a1dbc0db4e98 and
-https://medium.com/@praneethbedapudi/deepsegment-2-0-multilingual-text-segmentation-with-vector-alignment-fd76ce62194f
-
-
-The pre-trained models are available at https://github.com/bedapudi6788/DeepSegment-Models
-
+# Installation:
+```
+pip install --upgrade deepsegment
+# please install tensorflow or tensorflow-gpu separately. Tested with tf and tf-gpu versions 1.8 to 2.0
+```
 
-# Requirements:
-seqtag
+# Usage:
 
 ```
-# if you are using gpu for prediction, please see https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory for restricting memory usage
-
 from deepsegment import DeepSegment
-# the config file can be found at in the pre-trained model zip. Change the model paths in the config file before loading.
-# Since the complete glove embeddings are not needed for predictions, "glove_path" can be left empty in config file
-segmenter = DeepSegment('path_to_config')
+# The default language is 'en'
+segmenter = DeepSegment('en')
 segmenter.segment('I am Batman i live in gotham')
-['I am Batman', 'i live in gotham']
+# ['I am Batman', 'i live in gotham']
+
 ```
 
 # To Do:
-Add a sliding window for processing very long texts.
-
-Update the seqtag model to work with tf 2.0 (Change to tf.data may be).
-
-Update to add Indic languages.
-
-
-# Notes:
-Of all the sentence segmentation models I evaluated, without doubt deepsegment is the best in terms of accuracy in real word (bad punctuation, wrong punctuation)
-
-I trained flair's ner model on the same data and flair has better results but, it's miniscule (0.3% absolute accuracy increase).
-
-Since I want to keep using tf and keras for now, and since flair embeddings are not available for all the languages I want deepsegment to work on, I am going to keep using seqtag for this project.
+1. Add a sliding window for processing very long texts.
+2. Publish docker tf-serving image and deepsegment-client.
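
The first to-do item (a sliding window for very long texts) is not implemented in this commit. One possible shape for it is sketched below, purely as an illustration. The helper name `segment_long_text`, the `window` parameter, and the stitching strategy are all assumptions and not part of deepsegment. The sketch also assumes the segmenter only splits its input and never changes, drops, or adds words, so that word counts can be used to advance the window.

```python
from typing import Callable, List


def segment_long_text(
    text: str,
    segment_fn: Callable[[str], List[str]],
    window: int = 100,
) -> List[str]:
    """Segment `text` by running `segment_fn` over successive word windows.

    Hypothetical helper, not part of deepsegment. `segment_fn` is any
    callable mapping a string to a list of sentences and is assumed to
    preserve the words of its input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")

    words = text.split()
    sentences: List[str] = []
    start = 0
    while start < len(words):
        chunk = words[start:start + window]
        # Drop empty results so every accepted piece holds at least one word.
        pieces = [p for p in segment_fn(" ".join(chunk)) if p.strip()]
        is_last_window = start + window >= len(words)
        if not is_last_window and len(pieces) > 1:
            # The final piece may be cut off by the window edge, so it is
            # discarded here and re-processed at the start of the next window.
            pieces = pieces[:-1]
        if not pieces:
            # Fallback (assumption): keep the chunk whole if nothing came back.
            pieces = [" ".join(chunk)]
        sentences.extend(pieces)
        # Advance past the words covered by the accepted sentences.
        start += sum(len(p.split()) for p in pieces)
    return sentences
```

With the segmenter from the Usage section, this would presumably be called as `segment_long_text(long_text, segmenter.segment)`. Note that a window containing no sentence boundary is accepted whole, so a `window` smaller than the longest expected sentence forces artificial splits.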
