Hello,
I'm trying to reproduce the results of tequila/sherry, but I'm not seeing the accuracy reported in the paper (I've attached the tequila results from my own training run here). I suspect that the way I prepared the dataset differs from what the authors did. Could you share how to prepare the ultrafineweb dataset for tequila/sherry QAT?
Thanks very much
Sho