[ADD] Robustly refit models in final ensemble in parallel #471
theodorju merged 18 commits into automl:development from
Conversation
Codecov Report
@@ Coverage Diff @@
## development #471 +/- ##
================================================
+ Coverage 64.65% 85.23% +20.58%
================================================
Files 231 232 +1
Lines 16304 16456 +152
Branches 3009 3048 +39
================================================
+ Hits 10542 14027 +3485
+ Misses 4714 1578 -3136
+ Partials 1048 851 -197
nabenabe0928
left a comment
Hi thanks for the PR.
I think the changes make the codebase look better.
I added some minor comments.
autoPyTorch/api/base_task.py
Outdated
if old_identifier_index is not None:
    replace_old_identifiers_to_refit_identifiers[list(self.models_.keys())[old_identifier_index]] = refit_identifier
else:
    self._logger.warning(f"Refit for {config} failed. Updating ensemble weights accordingly.")
Do we still update the ensemble weights?
Thanks, I have fixed it.
Thanks for the PR. Looks good to me!
nabenabe0928
left a comment
Hi, thanks for the work!
I checked your changes and approved them :)
theodorju
left a comment
Thanks for the changes. I'm just leaving some minor comments.
autoPyTorch/api/base_task.py
Outdated
metric=self._metric,
dask_client=self._dask_client,
backend=self._backend,
memory_limit=self._memory_limit,
Shouldn't this use the memory limit populated above at lines 807-809:
- memory_limit=self._memory_limit,
+ memory_limit=memory_limit,
Thanks for pointing it out. I have fixed it now
temporary_directory='./tmp/autoPyTorch_example_tmp_01',
output_directory='./tmp/autoPyTorch_example_out_01',
delete_tmp_folder_after_terminate=False,
delete_output_folder_after_terminate=False,
If uncommenting these was on purpose, I'd suggest we remove the lines above.
no, it was an artefact of debugging, I have fixed it now. Thanks
theodorju
left a comment
Thanks for the changes. Looks good now.
self.config["early_stopping_rounds"] = early_stopping

if self.has_val_set:
    early_stopping = 150 if X_train.shape[0] > 10000 else max(round(150 * 10000 / X_train.shape[0]), 10)
We don't do early stopping if self.has_val_set is set to False?
Yeah, we can't, as we rely on external libraries to implement this for us.
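For context, the patience heuristic in the quoted hunk can be factored out as a small helper. This is just an illustrative sketch (the function name and its standalone form are not part of the codebase):

```python
def early_stopping_rounds(n_train: int) -> int:
    """Patience heuristic from the quoted snippet: datasets with more than
    10,000 training rows get a fixed patience of 150 rounds; smaller
    datasets scale the patience up inversely with size, floored at 10."""
    if n_train > 10_000:
        return 150
    return max(round(150 * 10_000 / n_train), 10)
```

So a 1,000-row dataset gets 1,500 rounds of patience, while anything above 10,000 rows gets the fixed 150.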
setuptools.setup(
    name="autoPyTorch",
-   version="0.2",
+   version="0.2.1",
I believe we should also update the version in __version__.py.
Yeah that's exactly my commit as well :P. I made the change but didn't add it in the previous commit.
Looks good. I'll proceed with merging this PR.
* [FIX] Documentation and docker workflow file (#449)
* fixes to documentation and docker
* fix to docker
* Apply suggestions from code review
* add change log for release (#450)
* [FIX] release docs (#452)
* Release 0.2
* Release 0.2.0
* fix docs new line
* [FIX] ADD forecasting init design to pip data files (#459)
* add forecasting_init.json to data files under setup
* avoid undefined reference in scale_value
* checks for time series dataset split (#464)
* checks for time series dataset split
* maint
* Update autoPyTorch/datasets/time_series_dataset.py
Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
* [FIX] Numerical stability scaling for timeseries forecasting tasks (#467)
* resolve rebase conflict
* add checks for scaling factors
* flake8 fix
* resolve conflict
* [FIX] pipeline options in `fit_pipeline` (#466)
* fix update of pipeline config options in fit pipeline
* fix flake and test
* suggestions from review
* [FIX] results management and visualisation with missing test data (#465)
* add flexibility to avoid checking for test scores
* fix flake and test
* fix bug in tests
* suggestions from review
* [ADD] Robustly refit models in final ensemble in parallel (#471)
* add parallel model runner and update running traditional classifiers
* update pipeline config to pipeline options
* working refit function
* fix mypy and flake
* suggestions from review
* fix mypy and flake
* suggestions from review
* finish documentation
* fix tests
* add test for parallel model runner
* fix flake
* fix tests
* fix traditional prediction for refit
* suggestions from review
* add warning for failed processing of results
* remove unnecessary change
* update autopytorch version number
* update autopytorch version number and the example file
* [DOCS] Release notes v0.2.1 (#476)
* Release 0.2.1
* add release docs
* Update docs/releases.rst
Co-authored-by: Difan Deng <33290713+dengdifan@users.noreply.github.com>
Similar to `fit_pipeline`, the `refit` function now runs the models found in the final ensemble in parallel using dask. It is also robust to failures while refitting, in which case it reuses the original model instead.

Types of changes
Note that a Pull Request should only contain one of refactoring, new features or documentation changes.
Please separate these changes and send us individual PRs for each.
For more information on how to create a good pull request, please refer to The anatomy of a perfect pull request.
Checklist:
Description
To enable catching errors and adding constraints, I have used the `ExecuteTAEFuncWithQueue` class. As the code for training models in parallel is also used for running the traditional models, I have created a `run_models_on_dataset` function which encapsulates this functionality.

Motivation and Context
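The shape of such a helper can be sketched as follows. This is a minimal, library-agnostic illustration using `concurrent.futures` threads in place of dask and `ExecuteTAEFuncWithQueue`; the signature and names here are illustrative, not the actual autoPyTorch API:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_models_on_dataset(configs, fit_fn, n_workers=2):
    """Fit one model per config in parallel; a failing fit is recorded
    instead of aborting the whole run. autoPyTorch achieves the same
    effect with dask plus the TAE, which also enforces time and memory
    constraints per fit."""
    fitted, failed = {}, {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = {pool.submit(fit_fn, cfg): name for name, cfg in configs.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                fitted[name] = future.result()
            except Exception as err:
                failed[name] = err  # keep going; caller decides how to react
    return fitted, failed
```

The key design point is that exceptions are captured per future, so one bad configuration cannot take down the whole refit.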
Currently, refit runs all the models sequentially and fails if any one of the models to be refitted fails. Moreover, there is no way to limit the time and memory used for the refit. With this PR, I have added the regular `TAE` which is used for search and other model fittings, which allows us to gracefully exit when a refit fails, as well as add the relevant constraints.

This PR fixes #469.
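The fallback behaviour discussed in the review thread (reuse the original model when its refit fails) boils down to a merge step like the following. This is a hypothetical illustration; `merge_refit_results` is not a function in the codebase:

```python
def merge_refit_results(models, refit_results):
    """Replace each original model with its refitted counterpart when the
    refit succeeded; on failure (signalled here by None), keep the
    original model so the ensemble stays usable."""
    merged = dict(models)  # copy so the original mapping is untouched
    for identifier, refitted in refit_results.items():
        if refitted is not None:
            merged[identifier] = refitted
    return merged
```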
How has this been tested?
I have added a test for `run_models_on_dataset` which ensures that at least one of the 5 random configs is successful. I have also extended the test for tabular classification to verify that refit works as expected, i.e., the ensemble is updated.
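The described assertion (several candidate configs, at least one must fit) could look roughly like this; the test name and the stand-in fit function are illustrative, and the real test exercises the parallel runner rather than this sequential stand-in:

```python
def test_at_least_one_config_succeeds():
    """Refit must tolerate individual failures: with several candidate
    configs, the run passes as long as at least one model fits."""
    def fit(cfg):
        if cfg % 2:  # stand-in: odd configs "fail" to fit
            raise RuntimeError(f"fit failed for config {cfg}")
        return {"config": cfg}

    successes = []
    for cfg in range(5):
        try:
            successes.append(fit(cfg))
        except RuntimeError:
            pass  # an individual failure is tolerated, not fatal

    assert len(successes) >= 1
```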