Repository: dmis-lab/bern
Branch: master
Commit: b56b1763c11a
Files: 32
Total size: 477.1 KB

Directory structure:
gitextract_5jlzvvp_/
├── .gitignore
├── LICENSE
├── README.md
├── __init__.py
├── biobert_ner/
│   ├── __init__.py
│   ├── conf/
│   │   ├── bert_config.json
│   │   └── vocab.txt
│   ├── fast_predict2.py
│   ├── modeling.py
│   ├── ops.py
│   ├── run_ner.py
│   ├── tokenization.py
│   └── utils.py
├── convert.py
├── download.py
├── load_dicts.sh
├── normalize.py
├── normalizers/
│   ├── __init__.py
│   ├── chemical_normalizer.py
│   ├── gene_auxiliary_normalizer.py
│   ├── miRNA_normalizer.py
│   ├── mutation_normalizer.py
│   ├── pathway_normalizer.py
│   └── species_normalizer.py
├── requirements.txt
├── scripts/
│   ├── bern_checker.sh
│   ├── download_biobert_ner_models.sh
│   └── download_norm.sh
├── server.py
├── service_checker.py
├── stop_normalizers.sh
└── utils.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# BERN
normalization/
pmc/
pubmed/
pubmed_pubtator/
.idea/
biobert_ner/pretrainedBERT/
biobert_ner/result/
biobert_ner/tmp/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

================================================
FILE: LICENSE
================================================
BSD 2-Clause License

Copyright (c) 2019, Donghyeon Kim, Jinhyuk Lee, Chan Ho So, Hwisang Jeon, Minbyul Jeong, Yonghwa Choi, Wonjin Yoon, Mujeen Sung, Jaewoo Kang
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

================================================
FILE: README.md
================================================
# BERN
BERN is a BioBERT-based multi-type NER tool that also supports normalization of extracted entities. This repository contains the official implementation of BERN. You can use BERN at https://bern.korea.ac.kr, or host your own server by following the description below. Please refer to our [paper (Kim et al., IEEE Access 2019)](https://doi.org/10.1109/ACCESS.2019.2920708) for more details. This project is done by [DMIS Laboratory](https://dmis.korea.ac.kr) at Korea University.

**[Updates]**

*****

**Check out [BERN2](https://github.com/dmis-lab/BERN2), an improved version of BERN with much faster and more accurate inference!**

*****

**Fixed our gene normalizer to respond to issues between 2020-03-12 and 2020-03-13**
1. Download gnormplus-normalization_19.jar at [this URL](https://drive.google.com/open?id=1ZTKJyRLBeqG2ioTtUqvmW0C_H6PmHZGl) and place (overwrite) the file under normalization/resources/normalizers/gene directory.
2. Stop normalizers by running stop_normalizers.sh
3. Start the normalizers by running load_dicts.sh

**Done - Server down due to air conditioning problems in our server room 2019-10-10 - 2019-10-11 7:55 AM (UTC-0)**

**Fixed our disease normalizer 2019-08-19, 2019-08-10 and 2019-08-02 issues**
1. Download disease_normalizer_19.jar at [this URL](https://drive.google.com/open?id=1YbAanyQJ24PPBOu0NO8a1aCxWLdlQhk-) and place the file under normalization/resources/normalizers/disease directory.
2. Stop normalizers by running stop_normalizers.sh and restart the normalizers by running load_dicts.sh

**Done - Server check 2019-07-18 8:20 AM - 1:30 PM (UTC-0)**

Overview of BERN.

The description below gives instructions on hosting your own BERN. Please refer to https://bern.korea.ac.kr for the RESTful Web service of BERN.

## Requirements
* Environment
    * Python >= 3.6
    * CUDA 9 or higher
* Main components
    * [BioBERT NER models (Lee et al., 2019)](https://arxiv.org/abs/1901.08746)
    * [tmTool APIs (Wei et al., 2016)](https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/)
    * [GNormPlus (Wei et al., 2015)](https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/gnormplus/)
    * [tmVar 2.0 (Wei et al., 2018)](https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmvar/)
    * [TensorFlow 1.13.1](https://github.com/tensorflow/tensorflow/releases/tag/v1.13.1)

Note that you will need at least 66 GB of free disk space and 32 GB or more RAM.

## Installation
* Clone this repo
```
cd
git clone https://github.com/dmis-lab/bern.git
```
* Install python packages
```
pip3 install -r requirements.txt --user
```
* Install GNormPlus & run GNormPlusServer.jar
    * FYI: Download Google Drive files with WGET: https://gist.github.com/iamtekeste/3cdfd0366ebfd2c0d805#gistcomment-2316906
```
cd ~/bern
wget https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/download/GNormPlus/GNormPlusJava.zip
unzip GNormPlusJava.zip
cd GNormPlusJava
wget -O ./crfpp-0.58.tar.gz https://drive.google.com/uc?id=0B4y35FiV1wh7QVR6VXJ5dWExSTQ
tar xvfz crfpp-0.58.tar.gz
cp -rf CRF++-0.58/* CRF
cd CRF
sh ./configure
make
sudo make install
cd ..
chmod 764 Ab3P
# chmod 764 CRF/crf_test

# Set FocusSpecies to 9606 (Human)
sed -i 's/= All/= 9606/g' setup.txt; echo "FocusSpecies: from All to 9606 (Human)"
sh Installation.sh
rm -r CRF++-0.58
rm crfpp-0.58.tar.gz

# Download GNormPlusServer.jar
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1g-JlhqeDIlZX5YFk8Y27_M8BXUXcQRSX' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1g-JlhqeDIlZX5YFk8Y27_M8BXUXcQRSX" -O GNormPlusServer.jar && rm -rf /tmp/cookies.txt

# Start GNormPlusServer
nohup java -Xmx16G -Xms16G -jar GNormPlusServer.jar 18895 >> ~/bern/logs/nohup_gnormplus.out 2>&1 &
```
* Install tmVar2 & run tmVar2Server.jar
```
cd ~/bern
wget ftp://ftp.ncbi.nlm.nih.gov/pub/lu/Suppl/tmVar2/tmVarJava.zip
unzip tmVarJava.zip
cd tmVarJava
wget -O ./crfpp-0.58.tar.gz https://drive.google.com/uc?id=0B4y35FiV1wh7QVR6VXJ5dWExSTQ
tar xvfz crfpp-0.58.tar.gz
cp -rf CRF++-0.58/* CRF
cd CRF
sh ./configure
make
sudo make install
cd ..
chmod 764 CRF/crf_test
sh Installation.sh
rm -r CRF++-0.58
rm crfpp-0.58.tar.gz

# Download tmVar2Server.jar
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1kQYzLHLFLsU9qKpRRGjXkIYmaYK6bPJm' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1kQYzLHLFLsU9qKpRRGjXkIYmaYK6bPJm" -O tmVar2Server.jar && rm -rf /tmp/cookies.txt

# Download dependencies
wget https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.20.0/sqlite-jdbc-3.20.0.jar
wget https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.5.2/stanford-corenlp-3.5.2.jar

# Start tmVar2Server
nohup java -Xmx8G -Xms8G -jar tmVar2Server.jar 18896 >> ~/bern/logs/nohup_tmvar.out 2>&1 &
```
* Download normalization resources and pre-trained BioBERT NER models
```
cd ~/bern/scripts
sh download_norm.sh
sh download_biobert_ner_models.sh
```
* Run named entity normalizers
```
cd ..
sh load_dicts.sh
```
* Run BERN server
```
# Check your GPU number(s)
echo $CUDA_VISIBLE_DEVICES

# Set your GPU number(s)
export CUDA_VISIBLE_DEVICES=0

# Run BERN
# Please check gnormplus_home directory and tmvar2_home directory.
nohup python3 -u server.py --port 8888 --gnormplus_home ~/bern/GNormPlusJava --gnormplus_port 18895 --tmvar2_home ~/bern/tmVarJava --tmvar2_port 18896 >> logs/nohup_BERN.out 2>&1 &

# Print logs
tail -F logs/nohup_BERN.out
```

* Usage
    * PMID(s) (HTTP GET)
        * http://\
[
{
"denotations": [
{
"id": [
"MESH:C567763",
"BERN:262813101"
],
"obj": "disease",
"span": {
"begin": 0,
"end": 13
}
},
{
"id": [
"MIM:171834",
"HGNC:8975",
"Ensembl:ENSG00000121879",
"BERN:324295302"
],
"obj": "gene",
"span": {
"begin": 53,
"end": 58
}
},
{
"id": [
"MESH:C567763",
"BERN:262813101"
],
"obj": "disease",
"span": {
"begin": 133,
"end": 146
}
},
{
"id": [
"MESH:D014652",
"BERN:256572101"
],
"obj": "disease",
"span": {
"begin": 158,
"end": 174
}
},
{
"id": [
"MESH:C567763",
"BERN:262813101"
],
"obj": "disease",
"span": {
"begin": 193,
"end": 231
}
},
{
"id": [
"MESH:C567763",
"BERN:262813101"
],
"obj": "disease",
"span": {
"begin": 234,
"end": 288
}
},
{
"id": [
"MESH:C567763",
"BERN:262813101"
],
"obj": "disease",
"span": {
"begin": 589,
"end": 593
}
},
{
"id": [
"MIM:171834",
"HGNC:8975",
"Ensembl:ENSG00000121879",
"BERN:324295302"
],
"obj": "gene",
"span": {
"begin": 748,
"end": 758
}
},
{
"id": [
"CUI-less"
],
"mutationType": "ProteinMutation",
"normalizedName": "p.F83S;CorrespondingGene:5290",
"obj": "mutation",
"span": {
"begin": 857,
"end": 866
}
},
{
"id": [
"BERN:257523801"
],
"obj": "disease",
"span": {
"begin": 906,
"end": 928
}
},
{
"id": [
"CUI-less"
],
"obj": "gene",
"span": {
"begin": 1009,
"end": 1024
}
},
{
"id": [
"MESH:C567763",
"BERN:262813101"
],
"obj": "disease",
"span": {
"begin": 1043,
"end": 1047
}
}
],
"elapsed_time": {
"ner": 0.611,
"normalization": 0.218,
"tmtool": 1.281,
"total": 2.111
},
"project": "BERN",
"sourcedb": "PubMed",
"sourceid": "29446767",
"text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome.",
"timestamp": "Thu Jul 04 06:15:27 +0000 2019"
}
]
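
The result above can be post-processed with a few lines of Python. The following is a minimal sketch, not part of the BERN code base: the helper name `extract_mentions` is hypothetical, and the embedded record is an abbreviated copy of the sample response (first two denotations, truncated text). It assumes that `span.end` is an inclusive character offset, which is what the sample suggests (offsets 53 to 58 cover the six characters of `PIK3CA`); check this against the output of your own server before relying on it.

```python
import json


def extract_mentions(document):
    """Return (entity type, surface text, ids) for each denotation of one result.

    `document` is one element of the JSON list returned by BERN.
    """
    text = document["text"]
    mentions = []
    for denotation in document["denotations"]:
        span = denotation["span"]
        # Assumption: "end" is inclusive in this output, so add 1 for slicing.
        surface = text[span["begin"]:span["end"] + 1]
        mentions.append((denotation["obj"], surface, denotation["id"]))
    return mentions


# Abbreviated copy of the sample response shown above.
response = json.loads("""
[
  {
    "denotations": [
      {"id": ["MESH:C567763", "BERN:262813101"], "obj": "disease",
       "span": {"begin": 0, "end": 13}},
      {"id": ["MIM:171834", "HGNC:8975", "Ensembl:ENSG00000121879",
              "BERN:324295302"], "obj": "gene",
       "span": {"begin": 53, "end": 58}}
    ],
    "sourcedb": "PubMed",
    "sourceid": "29446767",
    "text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations"
  }
]
""")

for obj, surface, ids in extract_mentions(response[0]):
    print(f"{obj}\t{surface}\t{', '.join(ids)}")
```

Entities that could not be normalized carry the single identifier `CUI-less` in the sample, so downstream code may want to filter on that value.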