Correct conversion of Spark model stages into MLeap local models#261
Conversation
Codecov Report
@@ Coverage Diff @@
## master #261 +/- ##
==========================================
- Coverage 86.67% 86.36% -0.31%
==========================================
Files 317 318 +1
Lines 10403 10447 +44
Branches 322 552 +230
==========================================
+ Hits 9017 9023 +6
- Misses 1386 1424 +38
So now we have to spool up Spark to use this sparkless scoring?? Could you get the necessary info another way? E.g. serialize the dataframe schema along with the model?
case m: VectorSlicerModel => x => m.apply(x(0).asInstanceOf[Vector])
case m: WordLengthFilterModel => x => m.apply(x(0).asInstanceOf[Seq[String]])
case m: WordToVectorModel => x => m.apply(x(0).asInstanceOf[Seq[String]])
case m => throw new RuntimeException(s"Unsupported MLeap model: ${m.getClass.getName}")
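The pattern in this diff can be sketched in isolation: each local model type is matched explicitly so the correct typed `apply` overload is resolved at compile time, and every model is turned into a plain scoring function `Array[Any] => Any`. This is a minimal self-contained sketch with made-up stand-in model classes, not the real MLeap classes or this PR's actual code:

```scala
// Stand-in "local model" classes, each with a typed apply method
// (hypothetical; real MLeap models have their own signatures).
case class VectorSlicerModel(indices: Array[Int]) {
  def apply(v: Seq[Double]): Seq[Double] = indices.toSeq.map(v)
}
case class WordLengthFilterModel(minLength: Int) {
  def apply(words: Seq[String]): Seq[String] = words.filter(_.length >= minLength)
}

object MLeapConverter {
  // Convert a model instance into a generic scoring function.
  // Matching on the concrete type picks the right apply overload
  // statically, instead of looking one up reflectively at runtime.
  def toScoringFn(model: Any): Array[Any] => Any = model match {
    case m: VectorSlicerModel     => x => m(x(0).asInstanceOf[Seq[Double]])
    case m: WordLengthFilterModel => x => m(x(0).asInstanceOf[Seq[String]])
    case m => throw new RuntimeException(s"Unsupported MLeap model: ${m.getClass.getName}")
  }
}
```

The trade-off is exactly the one raised below: every wrapped stage type must appear in this match, and an unlisted model fails at scoring time rather than at compile time.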
So every wrapped Spark stage has to be in this list? We should add that to the docs on wrapping...
I currently added all the stages from the features package. We could also add models from the classification, regression and recommendation packages, but we already have the first two covered as our own OpTransformer stages, so I did not see much of a point in adding them.
So, to your question: for right now I think we have everything covered except recommenders, which I am planning to add once we are ready.
Can you add a TODO with the classification and regression models? I don't know that this will be much use without them...
Adding those is very easy. The thing is, we already have classification and regression models as OpTransformers, so MLeap won't be used to run them.
@leahmcguire we used to do this before when loading Spark stages. Now I just explicitly exposed to users the ability to control the Spark session lifecycle. The only way to avoid the Spark session entirely is to export our models into MLeap format, which is indeed a possibility and I am open to discussing it. As of right now, local scoring assumes the model format as we have it today (i.e. json + parquet files).
Related issues
- When converting `StringIndexerModel` into an MLeap model, it was expecting `ml_attr` metadata to be present in the transformed Dataframe.
- The `apply` method on MLeap models did not work correctly, since many models had more than one `apply` method present.

Describe the proposed solution
Avoid relying on the `apply` method on MLeap models and explicitly convert MLeap models into scoring methods.

Describe alternatives you've considered
N/A
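The second related issue (more than one `apply` method) can be demonstrated with a toy example: a reflective lookup by method name alone returns every overload, so "the" `apply` method is not well defined. `LocalModel` below is a made-up stand-in, not a real MLeap class:

```scala
// A toy local model with two apply overloads (hypothetical class).
// A reflective lookup by name cannot tell them apart, which is why
// the PR switches to explicit, typed conversion per model class.
class LocalModel {
  def apply(v: Seq[Double]): Double = v.sum  // numeric scoring path
  def apply(text: String): Int = text.length // string scoring path
}

val applyMethods = classOf[LocalModel].getMethods.filter(_.getName == "apply")
println(s"apply overloads found: ${applyMethods.length}")
```

Here the filter yields two candidate methods, so any code that grabs the first `apply` it finds may invoke the wrong overload and fail at runtime.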