- Clone repo
- Create Python virtual environment
- Make sure that your current Java environment is Java 8.
  - If the setup fails at the JAMR step, check that Java 8 is configured for the newly downloaded `transition-amr-parser` project.
- Make sure `cuda` is enabled if you are on a machine with a GPU.
- Run `make install [$isi_username]`
  - This assumes that your conda installation is within `~/miniconda3`. If it is not, replace Line 27 of `setup.sh` with: `source ~/PATH_TO_MINICONDA_INSTALL`.
  - If you provide `isi_username`, it will assume that you can access the `minlp-dev-01` server and that you are working from a local system. In that case, you will be prompted for a password after you see `"Downloading model..."`. If not, it will assume that you are working from a `/nas`-mounted server.
- You will also need to download and unzip this file into `data/`:
  - UIUC EDL data (param: `edl.edl_output_dir`): https://drive.google.com/file/d/16ANEPjqy4byNY3B2BmYqsu1ZcBlp9tfR/view?usp=sharing
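Unpacking the archive can be scripted; the helper below is a minimal sketch (the function name and the example archive name are our assumptions, not part of the repo):

```python
# Sketch: unpack a downloaded archive into data/.
# The helper name and archive name are illustrative assumptions,
# not part of the cdse-covid repo.
import pathlib
import zipfile


def unzip_to_data(archive: str, dest: str = "data") -> pathlib.Path:
    """Extract `archive` into `dest`, creating `dest` if needed."""
    dest_path = pathlib.Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_path)
    return dest_path
```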
These instructions assume that you are building the image on the SAGA cluster.
- Clone repo
- `cd` into `cdse-covid` and clone the following repos:

  ```
  git clone https://github.com/isi-vista/aida-tools.git
  git clone https://github.com/elizlee/amr-utils.git
  git clone https://github.com/isi-vista/saga-tools.git
  git clone https://github.com/IBM/transition-amr-parser.git
  ```

- Make sure that your `transition-amr-parser` installation is updated and on the `master` branch. `cd` to `transition-amr-parser/preprocess` and do the following:

  ```
  git clone https://github.com/jflanigan/jamr.git
  git clone https://github.com/damghani/AMR_Aligner.git
  mv AMR_Aligner kevin
  cd transition-amr-parser/preprocess/kevin
  git clone https://github.com/moses-smt/mgiza.git
  ```
- Copy the following files from `/scratch/dockermount/cdse_covid_resources`:
  - The Wikidata classifier: `wikidata_classifier.state_dict` --> `cdse-covid/wikidata_linker/resources`
  - The AMR parser model: `/scratch/dockermount/cdse_covid_resources/AMR2.0` --> `transition-amr-parser/DATA`
- `cd` back into `cdse-covid` and run `docker build . -t isi-cdse-covid:<tag>`
- Generate workflow
  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.pegasus_pipeline.claim_pipeline params/claim_detection.params
  ```
- Navigate to experiment dir specified in your params file, execute the workflow, and monitor the progress
  ```
  bash setup.sh
  pegasus-status PEGASUS/RUN/DIR -w 60
  ```
We provide a simple way to run the whole pipeline without needing Pegasus WMS.
- Create a parameter file with your own values for the parameters in `params/run_pipeline_params.params`.
- Make sure that your cdse-covid conda environment is active.
- Run `bash ./run_pipeline.sh your/params/file`
- Create the AMR files

  Each file in `TXT_FILES` should contain one sentence per line.

  ```
  conda activate transition-amr-parser
  python -m cdse_covid.pegasus_pipeline.run_amr_parsing_all \
      --corpus TXT_FILES \
      --output AMR_FILES \
      --max-tokens MAX_TOKENS \
      --amr-parser-model TRANSITION_AMR_PARSER_PATH
  ```
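As an illustration of the expected one-sentence-per-line corpus format, a hypothetical helper (not part of the pipeline) for writing such a file might look like:

```python
# Hypothetical helper: write a corpus file in the one-sentence-per-line
# format that files in TXT_FILES are expected to use. Not part of cdse-covid.
import pathlib
from typing import Iterable


def write_corpus_file(sentences: Iterable[str], path: str) -> int:
    """Write one sentence per line; return the number of lines written."""
    lines = [s.strip() for s in sentences if s.strip()]
    pathlib.Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```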
- Preprocessing

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.pegasus_pipeline.ingesters.aida_txt_ingester \
      --corpus TXT_FILES \
      --output SPACIFIED \
      --spacy-model SPACY_PATH
  ```
- EDL ingestion

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.pegasus_pipeline.ingesters.edl_output_ingester \
      --edl-output EDL_OUTPUT \
      --output EDL_MAPPING_FILE
  ```
- Claim detection

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.claim_detection.run_claim_detection \
      --input SPACIFIED \
      --patterns claim_detection/topics.json \
      --out CLAIMS_OUT \
      --spacy-model SPACY_PATH
  ```
- Semantic extraction from AMR

  ```
  conda activate transition-amr-parser
  python -m cdse_covid.semantic_extraction.run_amr_parsing \
      --input CLAIMS_OUT \
      --output AMR_CLAIMS_OUT \
      --amr-parser-model TRANSITION_AMR_PARSER_PATH \
      --max-tokens MAX_TOKENS \
      --domain DOMAIN
  ```
- Semantic extraction from SRL

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.semantic_extraction.run_srl \
      --input AMR_CLAIMS_OUT \
      --output SRL_OUT \
      --spacy-model SPACY_PATH
  ```
- Wikidata linking

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.semantic_extraction.run_wikidata_linking \
      --claim-input CLAIMS_OUT \
      --srl-input SRL_OUT \
      --amr-input AMR_CLAIMS_OUT \
      --output WIKIDATA_OUT
  ```
- Entity merging

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.semantic_extraction.run_entity_merging \
      --edl EDL_MAPPING_FILE \
      --qnode-freebase QNODE_FREEBASE_MAPPING \
      --freebase-to-qnodes FREEBASE_TO_QNODES \
      --claims WIKIDATA_OUT \
      --output ENTITY_OUT \
      --include-contains
  ```
- Postprocessing

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.pegasus_pipeline.convert_claims_to_json \
      --input ENTITY_OUT \
      --output OUTPUT_FILE
  ```
- Converting the JSON to AIF

  ```
  conda activate <cdse-covid-env>
  python -m cdse_covid.pegasus_pipeline.ingesters.claims_json_to_aif \
      --claims-json OUTPUT_FILE \
      --aif-dir AIF_OUTPUT_DIR
  ```
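It can help to sanity-check that the postprocessing output is well-formed JSON before feeding it to the AIF converter. A small sketch, assuming only that `OUTPUT_FILE` parses as standard JSON (the helper name is ours, not the repo's):

```python
# Sketch of a sanity check on the claims JSON produced by postprocessing.
# Assumes only that the file parses as JSON; the schema is not checked.
import json
import pathlib
from typing import Any


def load_claims_json(path: str) -> Any:
    """Parse the claims JSON file, raising ValueError on malformed input."""
    text = pathlib.Path(path).read_text(encoding="utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"{path} is not valid JSON: {err}") from err
```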
- Before pushing, first run `make precommit` to run all precommit checks.
  - You can run these checks individually if you so desire. Please see [Makefile](./Makefile) for a list of all commands.
- After ensuring all linting requirements are met, rebase the new branch against master.
- Create a new PR, requesting review from at least one collaborator.