Components of this Repo

Last edited: week of Feb 12th 2024.

#TODO: rewrite for public consumption, see TODO subsections

Repo components: scripts

.cleanup.py

Removes all output directories left behind by failed training runs, i.e. those that do not contain checkpoints or hyperparameters.
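The cleanup rule described above could look roughly like the following. This is a minimal sketch, not the actual contents of `.cleanup.py`. The function names, the `*.ckpt` checkpoint pattern, the `hparams.yaml` file name, and the reading of "failed" as "contains neither a checkpoint nor a hyperparameter file" are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def find_failed_runs(root, ckpt_glob="*.ckpt", hparams_name="hparams.yaml"):
    """Return run directories directly under `root` that look like failed runs.

    Assumption: a run counts as failed when it contains neither a checkpoint
    file (matching `ckpt_glob`) nor a hyperparameter file (`hparams_name`)
    anywhere below it. Both names are hypothetical defaults.
    """
    failed = []
    for run_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        has_ckpt = any(run_dir.rglob(ckpt_glob))
        has_hparams = any(run_dir.rglob(hparams_name))
        if not has_ckpt and not has_hparams:
            failed.append(run_dir)
    return failed


def remove_failed_runs(root, dry_run=True):
    """Delete failed run directories; with dry_run=True only report them."""
    failed = find_failed_runs(root)
    for run_dir in failed:
        if not dry_run:
            shutil.rmtree(run_dir)
    return failed
```

Calling `remove_failed_runs("outputs")` only lists the candidates. Passing `dry_run=False` actually deletes them, so the default errs on the side of keeping data.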

.graveyard.py

Contains old code for reference, mostly methods for data module creation and batch collation.

opt_model.py

kg_dataset.py

datamodules.py

tokenizer_tools.py

augmentations.py

main.py

eval.py

bytestream.py

run_tests.py

test_load_interactive.py

train_manual.sh

Misc.:

requirements.txt

comments.txt

.gitignore

results-32.11.23.txt

README.md

train_ddp_process_1.log

Repo components: folders

data/

  • data/yago3-10/
    • test/train/valid/valid_tiny.del: test, train, and validation splits as tab-delimited files. Each line in a file is a unique triple {subject_idx, rel_idx, object_idx}.
    • entity/relation_mentions.del: link ids to their canonical mentions. The first column is the entity/relation id; the second column is the corresponding entity/relation mention.
comments for cleanup/TODOs:
  • describe contents / function of [and script creating] query_solutions.pckl for yago3-10 and wikidata
  • describe data/wikidataXX-XX
  • add credit for datasets and data preprocessing/train/val/test splits
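To make the `.del` layout described above concrete, here is a small sketch of how the split and mention files could be read. It is not the loader used in `kg_dataset.py`. The helper names are hypothetical, and it assumes that ids are integers and that the files have no header row.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject_idx: int
    rel_idx: int
    object_idx: int


def load_triples(path):
    """Read a tab-delimited split file into a list of Triple tuples.

    Assumption: each non-empty line holds three integer indices.
    """
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            s, r, o = line.split("\t")[:3]
            triples.append(Triple(int(s), int(r), int(o)))
    return triples


def load_mentions(path):
    """Read an id-to-mention file into a dict {id: mention}.

    Assumption: the first column is an integer id; everything after the
    first tab is treated as the mention string.
    """
    mentions = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            idx, mention = line.split("\t", 1)
            mentions[int(idx)] = mention
    return mentions
```

With both loaded, a triple can be turned back into readable text by looking up `subject_idx` and `object_idx` in the entity mentions and `rel_idx` in the relation mentions.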

outputs/

comments for cleanup/TODOs:

figures/

comments for cleanup/TODOs:

.hydra/

  • hydra.yaml: defines the (default) Hydra behaviour, including default options for multirun execution in sweeps.
comments for cleanup/TODOs:
  • what actually was in here?? @Daniel
  • add default options in sweep if used.

The conf/ folder:

The config folder where Hydra looks up all available config files and config groups.

  • config.yaml: defines all (default) model/execution parameters that shouldn't be hardcoded; serves as the input to the final config of a run.
comments for cleanup/TODOs:
  • default folders for config packages