Summary: Adding support for DLPack tensors to Python op
Reviewed By: Yangqing
Differential Revision: D6577702
fbshipit-source-id: e14ef213fcdb2930ffe164667971a92aa8db503c
Summary:
Implementation of polling async net executor.
Notes:
- New net executor async_polling - schedules CPU and GPU ops asynchronously, uses single polling thread
- Events: update to Caffe2 events to support async CPU events, adding new methods:
Query() - non-blocking checking of event states: INITIALIZED -> RECORDED -> SUCCESS/FAILED
ErrorMessage() - when operation runs asynchronously and fails calling this on event will give error message
- Tasks: using existing DAGNet's algorithm to compute CPU and GPU chains, a separate task for each chain
- Polling: using single thread to query state of events - for CPU tasks atomically queries task state, for GPU task - uses cudaEventQuery; using Event
- Scheduling of CPU ops: using global thread pools
- Scheduling of GPU ops: using GPU thread pool per GPU device
Reviewed By: dzhulgakov
Differential Revision: D5985110
fbshipit-source-id: a9de7fcbb71d046a3aa1b573072b89a65dfeee8c
Summary:
To be used with predictor "online": C++ version of memonger for simple nets. Very simple greedy algorithm. Works well at least on Resnet-50 inference graph: only 3 shared blobs are used.
Next I will integrate this with predictor and run canary (separate diff).
Reviewed By: asaadaldien
Differential Revision: D5375392
fbshipit-source-id: d36e419e39a32e568e105657c27fb00c85a2535d
Summary:
This allows to construct a python op by passing a pickled "builder function call" as an argument to the op.
The builder function is called at PythonOp construction time and returns a function that will be called when the op is run.
This way we allow to drop the dependency on 'tokens', which didn't work properly for protobufs that get distributed to other processes. Now, the PythonOp definition is self-contained: as long as the build dependencies are right, sharding the protobuf is enough to execute the net remotely.
Reviewed By: dzhulgakov
Differential Revision: D5080833
fbshipit-source-id: a5deaca5d3143024cdb121519689224e9dbec5ce
Summary: This diff is one step towards enabling python 3 build by making it be more diligent in its handling of strings.
Reviewed By: salexspb
Differential Revision: D4893083
fbshipit-source-id: 28b8adf3280e8d1f0a7dc9b0fee5ad53f2fada57
Summary:
aaronmarkham this solves your Windows build issue. Basically:
(1) VS 2017 does not have CUDA support yet, and we will be waiting on NVidia to do so.
(2) VS 2015 and 2017 need different cmake generator strings.
This PR shows how to determine those and also updates appveyor to do contbuild guard for the following 3 settings:
- VS2015 without cuda
- VS2017 without cuda
- VS2015 with cuda
Closes https://github.com/caffe2/caffe2/pull/210
Differential Revision: D4745007
Pulled By: Yangqing
fbshipit-source-id: 50952552843abd0eb6f4145d9f132daeee3a6794
Summary:
DPER has very strange python ops that play with Workspace - they are somewhat similar to LoadOp/SaveOp, so I guess the semantics is fine.
Thus it makes sense to allow python operators to receive workspace pointer similarly to regular Operators.
I didn't figure out a better way to implement optional argument than just checking the number of args function receives on python side.
Reviewed By: ajtulloch
Differential Revision: D4242943
fbshipit-source-id: d97d4227815b741c8f884cfe254b06d2b56b5a41