Mutable tensor for local session and web session by sighingnow · Pull Request #536 · mars-project/mars

sighingnow · 2019-07-11T08:32:20Z

What do these changes do?

Support mutable tensor in local session and web session.

For the LocalSession, we
1. maintain a map of name -> mutable tensor in LocalSession, and write to the underlying ndarray directly.
2. on seal, use the ndarray to construct an mt.tensor and execute it with self._executor, so that the executor knows the tensor has been executed.

For the WebSession, we
1. add several necessary endpoints to the web API server.
2. on write, forward the index and value to MutableTensorActor; the MutableTensorActor maintains the buffer and sends record chunks to the corresponding workers.
3. on seal, do the same thing as LocalClusterSession.
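
The LocalSession half of this design can be illustrated with a minimal sketch. It is not the code in this PR: the class name `LocalMutableTensors` and its methods are hypothetical, and a plain Python list stands in for the ndarray so that the example needs only the standard library.

```python
class LocalMutableTensors:
    """Hypothetical stand-in for the name -> mutable tensor map kept by a local session."""

    def __init__(self):
        # name -> mutable buffer (an ndarray in the PR, a list in this sketch)
        self._buffers = {}

    def create(self, name, size):
        if name in self._buffers:
            raise ValueError("mutable tensor %r already exists" % name)
        self._buffers[name] = [0] * size

    def write(self, name, index, value):
        # Writes go straight to the buffer; index is an int or a slice.
        buf = self._buffers[name]
        if isinstance(index, slice):
            for pos in range(*index.indices(len(buf))):
                buf[pos] = value
        else:
            buf[index] = value

    def seal(self, name):
        # Sealing drops the mutable entry and hands back immutable data.
        return tuple(self._buffers.pop(name))


store = LocalMutableTensors()
store.create("t", 4)
store.write("t", 1, 7)
store.write("t", slice(2, None), 9)
sealed = store.seal("t")
```

After `seal`, the name is no longer writable in this sketch, which mirrors the idea that a sealed mutable tensor becomes an ordinary tensor.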

Some design points that need clarification:

The index (such as (slice(1, None, None))) cannot be serialized to JSON directly, so I added a helper class MutableTensor.Index and leverage Serializable.to_json to do the serialization.
In WebSession, the index calculation and chunk transfer are done in MutableTensorActor.
All functionality of mutable tensors in LocalSession also works with the WebSession (as shown by the unit tests).
Along with this PR I also did some refactoring and improvement of the previous implementation to avoid code duplication.
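
To make the first point concrete, here is a small sketch of one way to round-trip an index through JSON. It does not use Mars's Serializable.to_json; the functions `index_to_json` and `index_from_json` and the dict layout are assumptions made for illustration only.

```python
import json


def index_to_json(index):
    """Encode an int/slice index (or a tuple of them) as a JSON string."""
    if not isinstance(index, tuple):
        index = (index,)
    items = []
    for item in index:
        if isinstance(item, slice):
            items.append({"type": "slice", "start": item.start,
                          "stop": item.stop, "step": item.step})
        elif isinstance(item, int):
            items.append({"type": "int", "value": item})
        else:
            raise TypeError("unsupported index item: %r" % (item,))
    return json.dumps(items)


def index_from_json(text):
    """Decode the string produced by index_to_json back into a tuple index."""
    result = []
    for item in json.loads(text):
        if item["type"] == "slice":
            result.append(slice(item["start"], item["stop"], item["step"]))
        else:
            result.append(item["value"])
    return tuple(result)


index = (slice(1, None, None), 3)
restored = index_from_json(index_to_json(index))
```

Passing a slice straight to `json.dumps` raises a TypeError, which is why some intermediate representation is needed.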

About the (known) failing test cases:

(Edit: this exception info issue has been fixed in commit 31520ebc9 of this PR.)

It seems there is no way to obtain, on the client side, the information and message of an exception that was originally raised on the server. We have _dump_exception on the server and a reraise on the client:

    def _dump_exception(self, exc_info):
        pickled_exc = pickle.dumps(exc_info)
        self.write(json.dumps(dict(
            exc_info=base64.b64encode(pickled_exc),
        )))
        raise web.HTTPError(500, 'Internal server error')

    if resp.status_code >= 400:
        resp_json = json.loads(resp.text)
        exc_info = base64.b64decode(resp_json['exc_info'])
        six.reraise(*exc_info)

But it doesn't work because:

base64.b64encode(pickled_exc) returns bytes, not str, so it cannot be passed to json.dumps.
Even after fixing that problem (with .decode('ascii')), we find that resp.text is something like

<html><title>500: Internal Server Error</title><body>500: Internal Server Error</body></html>

rather than the serialized exception information. I am not sure whether this limitation is a bug or by design.
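
The first problem can be shown with a standard-library-only sketch. It covers only the encoding round trip, not the HTTP layer, so it says nothing about the second problem (the error page replacing the body). The helper names `dump_exception` and `load_exception` are hypothetical, and the sketch pickles just the exception instance rather than a full exc_info tuple, since the standard pickle module cannot serialize traceback objects.

```python
import base64
import json
import pickle


def dump_exception(exc):
    """Server side: turn an exception into a JSON string."""
    pickled = pickle.dumps(exc)
    # b64encode returns bytes; decode to str so that json.dumps accepts it.
    encoded = base64.b64encode(pickled).decode("ascii")
    return json.dumps({"exc": encoded})


def load_exception(body):
    """Client side: recover the exception object from the JSON string."""
    payload = json.loads(body)
    # b64decode only undoes the base64 step; pickle.loads is still required.
    return pickle.loads(base64.b64decode(payload["exc"]))


body = dump_exception(ValueError("bad index"))
exc = load_exception(body)
```

A client could then `raise exc` to surface the server-side error locally. Unpickling data from an untrusted server is unsafe, so this only makes sense between trusted endpoints.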

Related issue number

#415

wjsi · 2019-07-11T08:41:57Z

mars/web/api.py

 register_web_handler('/api/session/(?P<session_id>[^/]+)/graph/(?P<graph_key>[^/]+)/data/(?P<tileable_key>[^/]+)',
                     GraphDataHandler)
+register_web_handler('/api/session/(?P<session_id>[^/]+)/mutable-tensor', MutableTensorHandler)
+register_web_handler('/api/session/(?P<session_id>[^/]+)/mutable-tensor/write', MutableTensorWriteHandler)


This API design does not conform to RESTful style. It could be changed to /api/session/(?P<session_id>[^/]+)/mutable-tensor/(?P<name>[^/]+), using the GET / PUT / POST methods to handle data read / write / seal.

Will revise.

Hi @wjsi, a mutable tensor requires four endpoints: create/get/write/seal. Is it OK to use the following mapping?

POST for create

HEAD for get (is HEAD OK here?)

PUT for write

GET for seal

GET shall not cause any side effects on the storage. Therefore create and write can be merged into PUT, with the API implementation deciding whether to create, while POST is used for seal. Alternatively, use POST with an action indicating create or seal. If I were the author of this PR, I would prefer the former solution.

create and write can be merged into PUT and the implementation of API decide whether to create while POST for seal

Nice suggestion, thanks!

I have revised the endpoint to use POST for both write and seal (these are easier to distinguish, since write has a body payload and seal does not; write's body payload is raw bytes rather than JSON, which makes create and write harder to distinguish without an extra parameter).

Now all four APIs of the mutable tensor share the same HTTP endpoint.

Custom headers can also be used to pass string metadata if you do not like query strings.

qinxuye

LGTM overall. I have some questions about serialization, mentioned briefly in a comment on #540.

qinxuye · 2019-07-14T16:28:53Z

mars/tensor/expressions/utils.py

+                                                _nsplits=tensor.nsplits, _key=tensor.key, _chunks=tensor.chunks))
+
+
+def setitem_as_records(nsplits_acc, output_chunk, value, ts, is_scalar):


Doc should be updated.

qinxuye · 2019-07-14T16:39:24Z

One more thing: it is really cool to see that the error can be serialized and sent to the client, but sadly no test has been added for that. Could you please try to add some unit tests?

hekaisheng · 2019-07-15T04:10:00Z

We can move MutableTensor.Index out of MutableTensor as a unified way to serialize indexes. Will you do it in this PR, or should it be done later in my PR #540?

sighingnow · 2019-07-15T13:09:27Z

The exception info returned by the HTTP API of the web session is already validated in the self.assertRaises part of this PR. I will add a standalone test for the exception info as well.

sighingnow · 2019-07-15T13:11:34Z

@hekaisheng I will revise the patch and move the Index class out as soon as possible. Do you think SerializableIndex is a reasonable name for this class (since we also have an Index class in the dataframe part)?

sighingnow · 2019-07-16T02:04:31Z

Added tests for returning exception info from the web API.
Added the mars.tensor.core.mutable_tensor constructor. I haven't imported it in mars.tensor.__init__ since there seems to be a cyclic import; I will dig into it as soon as possible.
I will rebase to master (to adapt to the serializable Index part) after "Support fetch tensor data slices from client" #540 is merged, and will ping reviewers then.

qinxuye · 2019-07-16T02:34:45Z

Added tests for returning exception info from the web API.

Added the mars.tensor.core.mutable_tensor constructor. I haven't imported it in mars.tensor.__init__ since there seems to be a cyclic import; I will dig into it as soon as possible.

I will rebase to master (to adapt to the serializable Index part) after "Support fetch tensor data slices from client" #540 is merged, and will ping reviewers then.

Great. I have an idea: can we add a fill_value parameter to create_mutable_tensor, so that we can initialize the tensor with some value for users?

sighingnow · 2019-07-16T02:57:23Z

Great. I have an idea: can we add a fill_value parameter to create_mutable_tensor, so that we can initialize the tensor with some value for users?

Agreed. The initial_value can be a property of MutableTensorActor and will be used to create the empty numpy tensor when sealing. Will do that ASAP.
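
As a rough illustration of how a fill value and buffered write records could combine at seal time, consider the sketch below. The function `seal_from_records`, the `(index, value, ts)` record layout, and the rule that later timestamps win are assumptions for this example, not the behaviour of MutableTensorActor; a Python list again stands in for the numpy tensor.

```python
def seal_from_records(size, records, fill_value=0):
    """Build the sealed data from a fill value plus (index, value, ts) records."""
    # Start from a buffer where every position holds the fill value.
    buffer = [fill_value] * size
    # Replay writes in timestamp order so that a later write overrides
    # an earlier one at the same position (an assumption of this sketch).
    for index, value, _ts in sorted(records, key=lambda record: record[2]):
        buffer[index] = value
    return buffer


records = [(1, 5, 2), (1, 3, 1), (3, 8, 1)]
sealed = seal_from_records(4, records, fill_value=-1)
```

Positions that never receive a write keep the fill value, which is the user-visible effect the fill_value parameter is meant to provide.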

hekaisheng · 2019-07-16T09:51:45Z

~~#540 has been merged, you can rebase master now.~~

Better to rebase after #546 is merged.

sighingnow · 2019-07-16T10:22:11Z

Better to rebase after #546 is merged.

Will do that.

sighingnow · 2019-07-16T15:06:02Z

Rebased to master and deleted MutableTensor.Index.
Implemented fill_value for the mutable tensor constructor.

qinxuye

LGTM

wjsi · 2019-07-17T06:26:27Z

mars/scheduler/mutable.py

+            ep = self.get_scheduler(chunk_key)
+            # register quota
+            quota_ref = self.ctx.actor_ref(MemQuotaActor.default_uid(), address=ep)
+            quota_ref.request_batch_quota({record_chunk_key: records.nbytes})


ReceiverActor takes little process memory, so a quota request is not needed.

The quota is for the chunk of records (the (index, value) records of write operations), and the record chunk may be spilled, thus the quota is required, IMO.

ReceiverActor stores data in plasma_store or on disk. This means the process memory cost is zero when receiving data from other machines. What's more, we serialize with pyarrow with zero-copy and spill in small chunks (not chunks in Mars), hence the additional memory cost is no more than the size of these chunks. Therefore there is no need to request quotas before data transfer.

Fixed. No need to request quota now.

wjsi · 2019-07-17T06:31:20Z

mars/worker/utils.py

    return '%s_load_memory_%s' % (graph_key, chunk_key)
+
+
+def put_chunk(session_id, chunk_key, data, receiver_ref):


This function should be put in transfer.py and renamed to put_remote_chunk, as it is not referenced by any code in the worker module.

wjsi · 2019-07-17T06:43:13Z

mars/web/api.py

+
+    def post(self, session_id, name):
+        try:
+            # If the request contains no body payload, it is seal, otherwise it is create


I still think it is better to use a customized header or a query string argument to define the action of POST.

Fixed. A parameter action has been added.
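
The idea of selecting the POST behaviour through an explicit action parameter, rather than by inspecting the body, might look roughly like this. The handler functions, the `ACTIONS` table, and the query-string form `action=...` are hypothetical; the actual handler in mars/web/api.py may differ.

```python
from urllib.parse import parse_qs


def handle_create(body):
    # Placeholder: a real handler would parse the creation arguments.
    return "created with %d payload bytes" % len(body)


def handle_seal(body):
    # Placeholder: a real handler would trigger the seal.
    return "sealed"


# Explicit action name -> handler, instead of guessing from the payload.
ACTIONS = {"create": handle_create, "seal": handle_seal}


def dispatch_post(query_string, body=b""):
    values = parse_qs(query_string).get("action")
    if not values or values[0] not in ACTIONS:
        raise ValueError("missing or unknown action")
    return ACTIONS[values[0]](body)


seal_result = dispatch_post("action=seal")
create_result = dispatch_post("action=create", b"{}")
```

With an explicit action, a create request with an empty body is no longer mistaken for a seal, and an unknown action fails loudly.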

qinxuye · 2019-07-17T16:51:47Z

Could you please confirm that the new commits have resolved your comments? @wjsi

wjsi

LGTM

wjsi reviewed Jul 11, 2019

sighingnow force-pushed the mut-web branch from 7c7470e to 1d70904 Compare July 12, 2019 09:50

qinxuye added the "mod: mutable tensor" and "type: feature" labels Jul 12, 2019

qinxuye added this to the v0.2.0rc1 milestone Jul 12, 2019

qinxuye mentioned this pull request Jul 14, 2019

Support fetch tensor data slices from client #540

Merged

qinxuye reviewed Jul 14, 2019

sighingnow force-pushed the mut-web branch from e5e8d59 to f5ac2b6 Compare July 16, 2019 02:15

qinxuye mentioned this pull request Jul 16, 2019

Mars roadmaps and enhancement proposals #537

Open

sighingnow added 12 commits July 16, 2019 23:03

Mutable tensor for LocalSession.

c2a626a

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Mutable tensor for web session, and a bit refactor.

6cbc018

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Fix style.

9a60ee9

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Try fix py27 error.

6a7b02a

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Skip error message checking for web session.

d2e720c

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Fix py35 error.

e98a264

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Revert "Skip error message checking for web session."

c84cf4b

This reverts commit f566da1.

Return the mars web api exception in a proper manner.

c1a2534

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Fix record type of mutable tensor.

61f555e

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Fix flake8.

e7c7dcc

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Revise the web endpoint for mutable tensor.

7f4ded5

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Fix for py27.

f8aa7a6

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

sighingnow added 5 commits July 16, 2019 23:03

Test _dump_exception of web API.

a8fa970

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Revise the doc string, and add mt.mutable_tensor constructor.

5830158

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Add mars.tensor.core.mutable_tensor constructor.

0091b8c

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Remove the Index definition inside the MutableTensor.

902339a

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Implement fill_value parameter for mutable_tensor constructor.

3b982fa

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

sighingnow force-pushed the mut-web branch from f5ac2b6 to 3b982fa Compare July 16, 2019 15:05

qinxuye previously approved these changes Jul 17, 2019

wjsi requested changes Jul 17, 2019

sighingnow added 3 commits July 17, 2019 17:42

Rename put_chunk as put_remote_chunk and move it to transfer.py.

4c14ba9

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

Revise the endpoint of create and seal.

ddb353f

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

No need to request quota.

486c209

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>

sighingnow dismissed qinxuye’s stale review via 486c209 July 17, 2019 09:53

wjsi approved these changes Jul 18, 2019

wjsi merged commit 7fe0cef into mars-project:master Jul 18, 2019

sighingnow deleted the mut-web branch July 18, 2019 09:55
