Question about ultrafineweb dataset preparation for tequila/sherry

Hello,

I'm trying to reproduce the tequila/sherry results, but I'm not seeing the accuracy reported in the paper (my own tequila training results are attached below). I suspect that the way I prepared the dataset differs from what the authors did. Could you share how to prepare the ultrafineweb dataset for tequila/sherry QAT?

<img width="2114" height="118" alt="Image" src="https://github.com/user-attachments/assets/50364566-222d-4d6a-b259-d75a9c497b7c" />

Thanks very much
Sho


Question about ultrafineweb dataset preparation for tequila/sherry #233
