mlem/contrib/pandas.py
Outdated
    if has_index(df):
    if (
        has_index(df)
        and PANDAS_FORMATS["stata"].write_func != self.write_func
This problem occurs only with the Stata format, and only in the case described in the original issue: you import a Stata file and want to re-write its initial content. Otherwise the dataframe is saved as CSV and nothing goes wrong at write time. In this case there can be no real index, since the Stata format doesn't support saving an index in any special way; it is saved as a regular column. So at import MLEM cannot have an index in this dataframe, which lets us skip resetting it.
I can't see a better solution than a workaround like this: if the index column has an empty name, rename it to something like "__index__" and save an instruction to rename it back to "" on load. I'd like to avoid that for now, since we don't write to the Stata format anyway, only to CSV, so the problem won't occur outside the special case from the first paragraph.
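For reference, a minimal sketch of what that rename-and-restore workaround could look like, written against plain lists of column names so it stays independent of MLEM internals. The helper names `escape_empty_column` and `restore_columns` are hypothetical, and `"__index__"` is only the placeholder suggested above, not anything MLEM defines.

```python
PLACEHOLDER = "__index__"  # hypothetical placeholder name, as suggested in the comment


def escape_empty_column(columns):
    """Replace an empty column name with PLACEHOLDER.

    Returns the new column names plus a mapping that records how to undo
    the rename on load.
    """
    if "" in columns and PLACEHOLDER in columns:
        # Refuse to clobber a real column that already uses the placeholder.
        raise ValueError(f"column {PLACEHOLDER!r} already exists")
    restore = {}
    escaped = []
    for name in columns:
        if name == "":
            escaped.append(PLACEHOLDER)
            restore[PLACEHOLDER] = ""  # the "instruction" to rename back
        else:
            escaped.append(name)
    return escaped, restore


def restore_columns(columns, restore):
    """Undo the rename recorded by escape_empty_column."""
    return [restore.get(name, name) for name in columns]


cols = ["", "price", "mpg"]
escaped, restore = escape_empty_column(cols)
roundtrip = restore_columns(escaped, restore)
```

With a real dataframe, the same mapping could probably be applied through `DataFrame.rename(columns=...)` on write and again, inverted, on load; where the `restore` mapping would be persisted is left open here.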
Like the original reporter in the issue, I don't fully understand why we're resetting the index here. Any idea, @aguschin?
You may use the index in the model itself: it may carry information the model uses directly, such as a customer ID or a timestamp. So, to make things more reproducible and precise, we decided to keep the index. We argued about this with @mike0sv a while ago, so the decision was made early on. We can debate again whether this is bad practice, and maybe change the default behavior to not store the index.
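To illustrate the general idea (not MLEM's actual code path), here is a small sketch of keeping a meaningful index across a CSV round trip: the index is reset into a regular column before writing and set back after reading. The dataframe, the `customer_id` name, and the `index_cols` variable are made up for the example.

```python
import io

import pandas as pd

# Hypothetical data: the index carries a customer id the model may rely on.
df = pd.DataFrame(
    {"amount": [10.0, 20.0]},
    index=pd.Index(["c1", "c2"], name="customer_id"),
)

# Move the index into a regular column so a plain columnar format can store it,
# and remember which columns made up the index.
index_cols = list(df.index.names)
flat = df.reset_index()

buf = io.StringIO()
flat.to_csv(buf, index=False)
buf.seek(0)

# On load, turn the remembered columns back into the index.
restored = pd.read_csv(buf).set_index(index_cols)
```

Without the `reset_index` / `set_index` pair, writing with `index=False` would silently drop the customer ids.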
Makes sense. This is the sort of thing code comments are usually for: explaining why we're doing non-obvious operations (they can link to GitHub issues, discussions, etc.).
omesser
left a comment
I suggest adding comments: one explaining why we reset the index, and one on the new condition that links to the issue.
Codecov Report
Patch coverage:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #624      +/-   ##
==========================================
- Coverage   86.17%   86.16%   -0.01%
==========================================
  Files         107      107
  Lines        9705     9710       +5
==========================================
+ Hits         8363     8367       +4
- Misses       1342     1343       +1
close #618