- Clone repo
- Create Python virtual environment
- Make sure that your current Java environment is Java 8.
  - If the setup fails at the JAMR step, check that Java 8 is configured for the newly downloaded `transition-amr-parser` project.
- Make sure `cuda` is enabled if you are on a machine with a GPU.
- Run `make install [$isi_username]`
  - This assumes that your conda installation is within `~/miniconda3`. If it is not, replace Line 27 of `setup.sh` with: `source ~/PATH_TO_MINICONDA_INSTALL`.
  - If you provide `isi_username`, it will assume that you can access the `minlp-dev-01` server and that you are working from a local system. In that case, you will be prompted for a password after you see `"Downloading model..."`. If not, it will assume that you are working from a `/nas`-mounted server.
- You will also need to download and unzip this file into `data/`:
  - UIUC EDL data (param: `edl.edl_output_dir`): https://drive.google.com/file/d/16ANEPjqy4byNY3B2BmYqsu1ZcBlp9tfR/view?usp=sharing
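Unpacking the archive can be scripted; the helper below is a minimal sketch (the function name and the example archive name are our assumptions, not part of the repo):

```python
# Sketch: unpack a downloaded archive into data/.
# The helper name and archive name are illustrative assumptions,
# not part of the cdse-covid repo.
import pathlib
import zipfile


def unzip_to_data(archive: str, dest: str = "data") -> pathlib.Path:
    """Extract `archive` into `dest`, creating `dest` if needed."""
    dest_path = pathlib.Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_path)
    return dest_path
```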
These instructions assume that you are building the image on the SAGA cluster.
- Clone repo
- `cd` into `cdse-covid` and clone the following repos:

  ```
  git clone https://github.com/isi-vista/aida-tools.git
  git clone https://github.com/elizlee/amr-utils.git
  git clone https://github.com/isi-vista/saga-tools.git
  git clone https://github.com/IBM/transition-amr-parser.git
  ```

- Make sure that your `transition-amr-parser` installation is updated and on the `master` branch. `cd` to `transition-amr-parser/preprocess` and do the following:

  ```
  git clone https://github.com/jflanigan/jamr.git
  git clone https://github.com/damghani/AMR_Aligner.git
  mv AMR_Aligner kevin
  cd transition-amr-parser/preprocess/kevin
  git clone https://github.com/moses-smt/mgiza.git
  ```
- Copy the following files from `/scratch/dockermount/cdse_covid_resources`:
  - The Wikidata classifier: `wikidata_classifier.state_dict` --> `cdse-covid/wikidata_linker/resources`
  - The AMR parser model: `/scratch/dockermount/cdse_covid_resources/AMR2.0` --> `transition-amr-parser/DATA`
- `cd` back into `cdse-covid` and run `docker build . -t isi-cdse-covid:<tag>`
- Generate workflow
  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.pegasus_pipeline.claim_pipeline params/claim_detection.params
  ```
- Navigate to experiment dir specified in your params file, execute the workflow, and monitor the progress
  ```
  bash setup.sh
  pegasus-status PEGASUS/RUN/DIR -w 60
  ```
We provide a simple way to run the whole pipeline without needing Pegasus WMS.
- Create a parameter file with your own values for the parameters in `params/run_pipeline_params.params`.
- Make sure that your cdse-covid conda environment is active.
- Run `bash ./run_pipeline.sh your/params/file`
- Create the AMR files

  Each file in `TXT_FILES` should contain one sentence per line.

  ```
  conda activate transition-amr-parser
  python -m cdse_covid.pegasus_pipeline.run_amr_parsing_all \
      --corpus TXT_FILES \
      --output AMR_FILES \
      --max-tokens MAX_TOKENS \
      --amr-parser-model TRANSITION_AMR_PARSER_PATH
  ```
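As an illustration of the expected one-sentence-per-line corpus format, a hypothetical helper (not part of the pipeline) for writing such a file might look like:

```python
# Hypothetical helper: write a corpus file in the one-sentence-per-line
# format that files in TXT_FILES are expected to use. Not part of cdse-covid.
import pathlib
from typing import Iterable


def write_corpus_file(sentences: Iterable[str], path: str) -> int:
    """Write one sentence per line; return the number of lines written."""
    lines = [s.strip() for s in sentences if s.strip()]
    pathlib.Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```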
- Preprocessing

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.pegasus_pipeline.ingesters.aida_txt_ingester \
      --corpus TXT_FILES \
      --output SPACIFIED \
      --spacy-model SPACY_PATH
  ```
- EDL ingestion

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.pegasus_pipeline.ingesters.edl_output_ingester \
      --edl-output EDL_OUTPUT \
      --output EDL_MAPPING_FILE
  ```
- Claim detection

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.claim_detection.run_claim_detection \
      --input SPACIFIED \
      --patterns claim_detection/topics.json \
      --out CLAIMS_OUT \
      --spacy-model SPACY_PATH
  ```
- Semantic extraction from AMR

  ```
  conda activate transition-amr-parser
  python -m cdse_covid.semantic_extraction.run_amr_parsing \
      --input CLAIMS_OUT \
      --output AMR_CLAIMS_OUT \
      --amr-parser-model TRANSITION_AMR_PARSER_PATH \
      --max-tokens MAX_TOKENS \
      --domain DOMAIN
  ```
- Semantic extraction from SRL

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.semantic_extraction.run_srl \
      --input AMR_CLAIMS_OUT \
      --output SRL_OUT \
      --spacy-model SPACY_PATH
  ```
- Wikidata linking

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.semantic_extraction.run_wikidata_linking \
      --claim-input CLAIMS_OUT \
      --srl-input SRL_OUT \
      --amr-input AMR_CLAIMS_OUT \
      --output WIKIDATA_OUT
  ```
- Entity merging

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.semantic_extraction.run_entity_merging \
      --edl EDL_MAPPING_FILE \
      --qnode-freebase QNODE_FREEBASE_MAPPING \
      --freebase-to-qnodes FREEBASE_TO_QNODES \
      --claims WIKIDATA_OUT \
      --output ENTITY_OUT \
      --include-contains
  ```
- Postprocessing

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.pegasus_pipeline.convert_claims_to_json \
      --input ENTITY_OUT \
      --output OUTPUT_FILE
  ```
- Converting the JSON to AIF

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.pegasus_pipeline.ingesters.claims_json_to_aif \
      --claims-json OUTPUT_FILE \
      --aif-dir AIF_OUTPUT_DIR
  ```
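It can help to sanity-check that the postprocessing output is well-formed JSON before feeding it to the AIF converter. A small sketch, assuming only that `OUTPUT_FILE` parses as standard JSON (the helper name is ours, not the repo's):

```python
# Sketch of a sanity check on the claims JSON produced by postprocessing.
# Assumes only that the file parses as JSON; the schema is not checked.
import json
import pathlib
from typing import Any


def load_claims_json(path: str) -> Any:
    """Parse the claims JSON file, raising ValueError on malformed input."""
    text = pathlib.Path(path).read_text(encoding="utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"{path} is not valid JSON: {err}") from err
```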
- Before pushing, first run `make precommit` to run all precommit checks.
  - You can run these checks individually if you so desire. Please see [Makefile](./Makefile) for a list of all commands.
- After ensuring all linting requirements are met, rebase the new branch against master.
- Create a new PR, requesting review from at least one collaborator.