DataCutter-related fixes for multiclass#263
Conversation
Codecov Report
@@ Coverage Diff @@
## master #263 +/- ##
==========================================
- Coverage 86.61% 84.75% -1.86%
==========================================
Files 315 315
Lines 10345 10369 +24
Branches 346 563 +217
==========================================
- Hits 8960 8788 -172
- Misses 1385 1581 +196
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #263 +/- ##
==========================================
- Coverage 86.24% 82.47% -3.78%
==========================================
Files 320 321 +1
Lines 10471 10539 +68
Branches 352 558 +206
==========================================
- Hits 9031 8692 -339
- Misses 1440 1847 +407
Continue to review full report at Codecov.
|
|
@gerashegalov can you elaborate more on the proposed fixes? thanks. |
8333743 to
618bd92
Compare
|
@tovbinm there are multiple issues. One is the sort stability when there are multiple top labels with identical cardinality. We can solve this by adding the label as a secondary sort key. Similar issue with filtering labels for the minFraction prior to sorting. And #251 caused the data cutter omission to begin with |
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/selector/ModelSelector.scala
Outdated
Show resolved
Hide resolved
|
@gerashegalov please let us know when you want the PR reviewed. |
6abd993 to
7571463
Compare
|
@tovbinm it's ready |
core/src/main/scala/com/salesforce/op/stages/impl/selector/ModelSelector.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
tovbinm
left a comment
There was a problem hiding this comment.
Can you please add a new test case explicitly for label trimming issue?
|
not ready for review yet, checkpointing WIP, sorting out test issues. |
843f8cd to
a43409a
Compare
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
b7c72de to
5a5b19d
Compare
|
marking as ready for review since all unit testing are passing |
leahmcguire
left a comment
There was a problem hiding this comment.
Just remove the parts that were in there for testing and LGTM
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/tuning/DataCutter.scala
Outdated
Show resolved
Hide resolved
test table num_vals
27992db to
17c53fb
Compare
|
🎆 🍾 |
Related issues
Fixes #245
DataCutter.estimatewhereasfilterbeforesortis good for performance it may not drop labels non-deterministically unlike filtering over a stable order.Describe the proposed solution
Remove non-determinism