ANLI data distribution makes it hard to create internal dev - so it's temporarily ignored

If you look at the original dev data, you will see that every datapoint is distinct. The training set, however, contains a lot of repetitions. This makes it infeasible to do a 90-10-10 split.

Potential solutions:
1) One reasonable option would be to separate out a non-overlapping dev set and set up a cross-fold validation experiment.
2) Use a fraction of the original dev set as the internal dev set for ANLI.
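
A minimal sketch of both options, using only the Python standard library. It assumes each example is a dict and that repetitions can be detected through a shared field (here called `premise`, which is an assumption about the data layout; substitute whichever field actually repeats). The function names `grouped_folds` and `split_original_dev` are hypothetical helpers, not part of any existing code in this repo.

```python
import random
from collections import defaultdict


def grouped_folds(examples, key, n_folds=5, seed=0):
    """Option 1: build folds where all examples sharing `key` stay together.

    Returns a list of folds, each a list of indices into `examples`.
    Because whole groups are assigned to one fold, a repeated item can
    never appear in both the training folds and the held-out fold.
    """
    groups = defaultdict(list)
    for idx, ex in enumerate(examples):
        groups[ex[key]].append(idx)

    group_ids = list(groups)
    random.Random(seed).shuffle(group_ids)
    # Largest groups first; the stable sort keeps the shuffled order among ties.
    group_ids.sort(key=lambda g: -len(groups[g]))

    folds = [[] for _ in range(n_folds)]
    for g in group_ids:
        # Greedy balancing: put each group into the currently smallest fold.
        min(folds, key=len).extend(groups[g])
    return folds


def split_original_dev(dev_examples, fraction=0.5, seed=0):
    """Option 2: carve a random fraction of the original dev set as internal dev."""
    idx = list(range(len(dev_examples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * fraction)
    internal_dev = [dev_examples[i] for i in idx[:cut]]
    remaining = [dev_examples[i] for i in idx[cut:]]
    return internal_dev, remaining
```

For cross-validation with option 1, each fold in turn serves as the internal dev set while the remaining folds are used for training. Fold sizes are only approximately equal, since groups are never broken up.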
