This is a demonstration of our NLP system: HMTL is a neural model that uses multi-task learning to solve four fundamental NLP tasks, namely Named Entity Recognition, Entity Mention Detection, Relation Extraction and Coreference Resolution.
[comment]: <> (For a brief introduction to multi-task learning, you can refer to our blog post (LINK TO COME). Each of the four tasks considered is detailed in the following section.)
You can try the demo through its web interface here (LINK TO COME). HMTL also ships with the web visualization client if you prefer to run it on your local machine.
The web demo (LINK TO COME) is based on Python 3.6 and AllenNLP.
The easiest way to set up a clean, working environment with the necessary dependencies is to follow the setup section in the parent folder.
A few supplementary dependencies, listed in `requirements.txt`, are required to run the demo.
We also release three pre-trained HMTL models trained on English corpora. The three models differ mainly in the size of the ELMo embeddings used, and thus in the size of the model. Larger models generally perform better, although not on every task:
Model Name | NER (F1) | EMD (F1) | RE (F1) | CR (F1) | Description |
---|---|---|---|---|---|
conll_small_elmo | 85.73 | 83.51 | 58.40 | 62.85 | Small version of ELMo |
conll_medium_elmo | 86.41 | 84.02 | 58.78 | 61.62 | Medium version of ELMo |
conll_full_elmo (default model) | 86.40 | 85.59 | 61.37 | 62.26 | Original version of ELMo |
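As a small illustration of how to read the table, the sketch below copies the F1 scores above into a dictionary and looks up the best model for a given task. The names `SCORES` and `best_model` are hypothetical helpers for this example and are not part of HMTL.

```python
# F1 scores copied from the table above (hypothetical helper, not part of HMTL).
SCORES = {
    "conll_small_elmo": {"NER": 85.73, "EMD": 83.51, "RE": 58.40, "CR": 62.85},
    "conll_medium_elmo": {"NER": 86.41, "EMD": 84.02, "RE": 58.78, "CR": 61.62},
    "conll_full_elmo": {"NER": 86.40, "EMD": 85.59, "RE": 61.37, "CR": 62.26},
}


def best_model(task):
    """Return the name of the model with the highest F1 for `task`."""
    return max(SCORES, key=lambda name: SCORES[name][task])


for task in ("NER", "EMD", "RE", "CR"):
    print(task, best_model(task))
```

The full model leads on EMD and RE, while the smaller models are marginally ahead on NER and CR, so a smaller model may be a reasonable choice when memory is limited.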
To download the pre-trained models, please install `git lfs` and run `git lfs pull`. The model weights will be saved in the `model_dumps` folder.
Named Entity Recognition aims at identifying and classifying named entities (real-world objects, such as persons, locations, etc., that can be denoted with a proper name).
[Homer Simpson]PERS lives in [Springfield]LOC with his wife and kids.
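To make the bracket notation concrete, here is a minimal sketch that converts an annotated sentence like the one above into plain text plus labeled character spans. It only parses the illustrative notation used in this README; it is not HMTL's actual input or output format, and `parse_entities` is a hypothetical helper.

```python
import re

# Matches "[span text]LABEL", e.g. "[Springfield]LOC".
ANNOTATION = re.compile(r"\[([^\]]+)\]([A-Z]+)")


def parse_entities(annotated):
    """Return (plain_text, entities); entities are (start, end, label) offsets into plain_text."""
    plain_parts = []
    entities = []
    cursor = 0  # position in the annotated string
    offset = 0  # length of the plain text built so far
    for match in ANNOTATION.finditer(annotated):
        before = annotated[cursor:match.start()]
        plain_parts.append(before)
        offset += len(before)
        span, label = match.group(1), match.group(2)
        entities.append((offset, offset + len(span), label))
        plain_parts.append(span)
        offset += len(span)
        cursor = match.end()
    plain_parts.append(annotated[cursor:])
    return "".join(plain_parts), entities


text, entities = parse_entities(
    "[Homer Simpson]PERS lives in [Springfield]LOC with his wife and kids."
)
for start, end, label in entities:
    print(label, text[start:end])
```

The same helper works for the entity mention notation, since mentions are written with the same `[span]LABEL` convention.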
HMTL is trained on OntoNotes 5.0 and can recognize 18 types of named entities: PERSON, NORP, FAC, ORG, GPE, LOC, etc.
Entity Mention Detection aims at identifying and classifying entity mentions (real-world objects, such as persons, locations, etc., that are not necessarily denoted with a proper name).
[The men]PERS held on [the sinking vessel]VEH until [the ship]VEH was able to reach them from [Corsica]LOC.
HMTL can recognize different types of mentions: PER, GPE, ORG, FAC, LOC, WEA and VEH.
Relation Extraction aims at extracting the semantic relations between the mentions.
HMTL detects the following types of relations:
Shortname | Full Name | Description | Example |
---|---|---|---|
ART | Artifact | User-Owner-Inventor-Manufacturer | {Leonard de Vinci painted the Joconde., ARG1 = Leonard de Vinci, ARG2 = Joconde} |
GEN-AFF | Gen-Affiliation | Citizen-Resident-Religion-Ethnicity, Org-Location | {The people of Iraq., ARG1 = The people, ARG2 = Iraq} |
ORG-AFF | Org-Affiliation | Employment, Founder, Ownership, Student-Alum, Sports-Affiliation, Investor-Shareholder, Membership | {Martin Geisler, ITV News, Safwan southern Iraq., ARG1 = Martin Geisler, ARG2 = ITV News} |
PART-WHOLE | Part-whole | Artifact, Geographical, Subsidiary | {They could safeguard the fields in Iraq., ARG1 = the fields, ARG2 = Iraq} |
PER-SOC | Person-social | Business, Family, Lasting-Personal | {Sean Flynn, son of the famous actor Errol Flynn, ARG1 = son, ARG2 = Errol Flynn} |
PHYS | Physical | Located, Near | {The two journalists worked from the hotel., ARG1 = the two journalists, ARG2 = the hotel} |
For more details, please refer to the dataset release notes.
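One way to picture a relation is as a typed pair of mentions. The sketch below models that with a small named tuple and validates the type against the short names in the table above. `Relation`, `make_relation` and `describe` are hypothetical names for this example; HMTL's internal representation may differ.

```python
from typing import NamedTuple

# Short name -> full name, copied from the table above.
RELATION_TYPES = {
    "ART": "Artifact",
    "GEN-AFF": "Gen-Affiliation",
    "ORG-AFF": "Org-Affiliation",
    "PART-WHOLE": "Part-whole",
    "PER-SOC": "Person-social",
    "PHYS": "Physical",
}


class Relation(NamedTuple):
    """A typed, directed relation between two mentions (illustrative only)."""
    label: str
    arg1: str
    arg2: str


def make_relation(label, arg1, arg2):
    """Build a Relation, rejecting labels that are not in the table."""
    if label not in RELATION_TYPES:
        raise ValueError(f"unknown relation type: {label!r}")
    return Relation(label, arg1, arg2)


def describe(relation):
    """Render a relation using the full name of its type."""
    return f"{RELATION_TYPES[relation.label]}({relation.arg1} -> {relation.arg2})"


print(describe(make_relation("PHYS", "the two journalists", "the hotel")))
```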
In a text, two or more expressions can refer to the same person or thing in the world. Coreference Resolution aims at finding the coreferent spans and clustering them.
[My mom]1 tasted [the cake]2. [She]1 liked [it]2.
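The numeric indices in the example above identify clusters: spans that share an index refer to the same entity. A minimal sketch of grouping them, assuming only the bracket notation used in this README (not HMTL's actual output format), with `parse_clusters` as a hypothetical helper:

```python
import re
from collections import defaultdict

# Matches "[span text]N", e.g. "[She]1".
MENTION = re.compile(r"\[([^\]]+)\](\d+)")


def parse_clusters(annotated):
    """Group annotated spans by cluster index, keeping their order of appearance."""
    clusters = defaultdict(list)
    for match in MENTION.finditer(annotated):
        clusters[int(match.group(2))].append(match.group(1))
    return dict(clusters)


print(parse_clusters("[My mom]1 tasted [the cake]2. [She]1 liked [it]2."))
```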
HMTL can be used as a REST API. A simple example server script is provided in `server.py`.
To launch a specific model (please make sure you are in an environment with all the dependencies first: `source .env/bin/activate`):
gunicorn -b:8000 'server:build_app(model_name="<model_name>")'
or simply launch the default (full) model:
gunicorn -b:8000 'server:build_app()'
You can then call the model with the following command: `curl http://localhost:8000/jmd/?text=Barack%20Obama%20is%20the%20former%20president.`
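The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call: it percent-encodes the text and queries the `/jmd/` endpoint on `localhost:8000`. The function names are hypothetical, and since the response format is not described here, the raw response body is returned as a string without further parsing.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_query_url(text, host="localhost", port=8000):
    """Return the demo endpoint URL with `text` percent-encoded."""
    # The /jmd/ path and the `text` parameter come from the curl example.
    return f"http://{host}:{port}/jmd/?text={quote(text)}"


def query_hmtl(text, host="localhost", port=8000, timeout=30):
    """Send `text` to a running server and return the raw response body."""
    with urlopen(build_query_url(text, host, port), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, assuming the gunicorn server is already running locally:
# print(query_hmtl("Barack Obama is the former president."))
```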