Threshold metrics calculation fix when unseen labels are present#293
Merged
Conversation
leahmcguire
approved these changes
Apr 19, 2019
gerashegalov
approved these changes
Apr 19, 2019
Contributor
gerashegalov
left a comment
LGTM, minor comment
```diff
  val label: Label = scoresAndLabels._2.toInt
- val trueClassScore: Double = scores(label)
+ // The label may be unseen during model training, so treat scores for unseen classes as all being zero
+ val trueClassScore: Double = if (scores.isDefinedAt(label)) scores(label) else 0.0
```
Contributor
How about `scores.lift(label).getOrElse(0.0)`?
Contributor
Author
Ahh, I was trying to remember how to do that!
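The two forms discussed here are equivalent: `Seq.lift` turns positional indexing into a total function returning `Option`, so `lift(label).getOrElse(0.0)` collapses the explicit `isDefinedAt` check into one expression. A minimal sketch with hypothetical score values:

```scala
// Hypothetical per-class scores from a model trained on 3 classes
val scores: Seq[Double] = Seq(0.7, 0.2, 0.1)

// Explicit bounds check, as in the original fix
def scoreExplicit(scores: Seq[Double], label: Int): Double =
  if (scores.isDefinedAt(label)) scores(label) else 0.0

// Reviewer's suggestion: lift indexing into an Option and default to 0.0
def scoreLifted(scores: Seq[Double], label: Int): Double =
  scores.lift(label).getOrElse(0.0)
```

Both return the score for an in-range label and `0.0` for a label index the model never produced.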
Related issues
Issue exposed by fixes in #263
Describe the proposed solution
Simple one-line fix to the threshold metrics calculation. When the true label score is looked up, the label will either be among the classes the model was trained on (which may have been pruned by DataCutter), in which case its score is used, or it will be a label the model was not trained on, in which case the score is treated as 0.
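The behavior above can be sketched over a small batch of (score vector, label) pairs; the values and the pruned label are hypothetical, for illustration only:

```scala
// Hypothetical (per-class scores, true label) pairs; label 3 was pruned by
// DataCutter, so the model only emits scores for classes 0..2
val scoresAndLabels: Seq[(Seq[Double], Int)] = Seq(
  (Seq(0.6, 0.3, 0.1), 0),
  (Seq(0.2, 0.5, 0.3), 1),
  (Seq(0.1, 0.2, 0.7), 3) // unseen label: no score at index 3
)

// The one-line fix: fall back to 0.0 when the label has no score entry
val trueClassScores: Seq[Double] = scoresAndLabels.map { case (scores, label) =>
  scores.lift(label).getOrElse(0.0)
}
```

Rows with seen labels keep their actual score; the row with the unseen label contributes 0.0 instead of throwing `IndexOutOfBoundsException`.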
Describe alternatives you've considered
n/a
Additional context
Merge #263 first since this PR was branched off of it.
Unit test resides in `MultiClassificationModelSelectorTest`.