ANLI data distribution makes it hard to create internal dev - so it's temporarily ignored #4

@denizbeser

In the original ANLI dev data, every datapoint is distinct. The training set, however, contains many repetitions, so a random split would place copies of the same item on both sides and leak training data into the internal dev set. This makes a 90-10-10 split infeasible.
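To get a feel for how much repetition there is, a quick count along the following lines may help. This is only a sketch: the issue does not say which field repeats, so the `premise` key, the `repetition_stats` helper, and the toy records are all assumptions made for illustration.

```python
from collections import Counter

def repetition_stats(examples, key):
    """Count how many examples share their key with at least one other example."""
    counts = Counter(key(ex) for ex in examples)
    return {
        "examples": sum(counts.values()),
        "distinct_keys": len(counts),
        # Examples whose key occurs more than once in the data.
        "examples_with_repeated_key": sum(c for c in counts.values() if c > 1),
    }

# Toy records standing in for ANLI training data (field names are assumed).
toy_train = [
    {"premise": "p1", "hypothesis": "h1"},
    {"premise": "p1", "hypothesis": "h2"},
    {"premise": "p2", "hypothesis": "h3"},
]
stats = repetition_stats(toy_train, key=lambda ex: ex["premise"])
print(stats)
```

Running the same function on the original dev data with the same key would show whether it is really free of repeats under that definition.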

Potential solutions:

  1. Separate out a non-overlapping dev set from the training data and set up a cross-validation experiment.
  2. Use a fraction of the original dev set as the internal dev set for ANLI.
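Both options can be sketched in a few lines of plain Python. This is a rough illustration rather than the project's implementation: `group_kfold`, `carve_internal_dev`, the `premise` grouping key, and the toy records are hypothetical names and data chosen for the example. The first function keeps all examples that share a key in the same fold, so no repeated item ends up in both train and dev. The second holds out a random fraction of the original dev set.

```python
import random
from collections import defaultdict

def group_kfold(examples, key, n_folds=5, seed=0):
    """Yield (train, dev) pairs in which no group key appears on both sides."""
    groups = defaultdict(list)
    for ex in examples:
        groups[key(ex)].append(ex)
    keys = sorted(groups)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(keys)
    folds = [keys[i::n_folds] for i in range(n_folds)]
    for dev_keys in folds:
        dev_set = set(dev_keys)
        train = [ex for k in keys if k not in dev_set for ex in groups[k]]
        dev = [ex for k in dev_keys for ex in groups[k]]
        yield train, dev

def carve_internal_dev(dev_examples, fraction=0.5, seed=0):
    """Split the original dev set into (internal_dev, remaining_dev)."""
    shuffled = list(dev_examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy records standing in for ANLI data (field names are assumed).
toy = [
    {"premise": "p1", "hypothesis": "h1"},
    {"premise": "p1", "hypothesis": "h2"},
    {"premise": "p2", "hypothesis": "h3"},
    {"premise": "p3", "hypothesis": "h4"},
    {"premise": "p4", "hypothesis": "h5"},
]
folds = list(group_kfold(toy, key=lambda ex: ex["premise"], n_folds=2))
internal_dev, remaining_dev = carve_internal_dev(toy, fraction=0.4)
```

The grouped folds are uneven in size whenever some keys repeat more than others, which is the price of keeping duplicates together. The second option avoids that but shrinks the dev data left for final evaluation.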
