Summary:
Strengthen the matcher for StaticDispatch kernels: all input and output tensors must be on CPU, and all Device-typed attributes must be CPU.
Previously, we only checked that the output tensor was on CPU. That check missed the case of a DeviceToHost aten._to_copy, where the output is on CPU but the input is not.
This prepares for turning on static dispatch kernels by default.
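The stricter rule can be illustrated with a minimal Python sketch. It is not the actual matcher code: the `Device` and `Node` classes and the `matches_static_dispatch` function are hypothetical stand-ins, assumed here only to show why checking outputs alone is insufficient.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for a device descriptor."""
    type: str
    index: int = -1


@dataclass
class Node:
    """Hypothetical, simplified graph node (not the real node class)."""
    target: str
    input_devices: List[Device]
    output_devices: List[Device]
    attributes: Dict[str, Any] = field(default_factory=dict)


def is_cpu(device: Device) -> bool:
    return device.type == "cpu"


def matches_static_dispatch(node: Node) -> bool:
    """Return True only if every tensor and every Device-typed attribute is CPU."""
    tensors_on_cpu = all(
        is_cpu(d) for d in node.input_devices + node.output_devices
    )
    # Only attributes whose value is a Device are checked; others are ignored.
    device_attrs_on_cpu = all(
        is_cpu(v) for v in node.attributes.values() if isinstance(v, Device)
    )
    return tensors_on_cpu and device_attrs_on_cpu


cpu, cuda = Device("cpu"), Device("cuda", 0)

# DeviceToHost copy: the output is on CPU, so an output-only check would accept it.
d2h_copy = Node("aten._to_copy", [cuda], [cpu], {"device": cpu})
print(matches_static_dispatch(d2h_copy))  # False: the input is not on CPU
```

In this sketch the DeviceToHost copy is rejected because its input lives on a non-CPU device, which is the case the output-only check let through.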
Test Plan:
Tests still need to be added before landing.
Rollback Plan:
Differential Revision: D78747600
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159187
Approved by: https://github.com/dolpm
Summary:
Placement is leaked to too many classes!
In this diff, we consolidate all placement lookups into one place: Graph::ApplyDevicePlacement.
After placement is applied, the in-memory graph, tensorMeta, and weightMeta already hold the re-mapped device.
Subsequent weight loading, sample input loading, and target device inference look up the re-mapped device from the graph's tensorMeta.
The graph's tensorMeta becomes the single source of truth!
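The idea of applying placement once and reading the result everywhere else can be sketched in Python as follows. This is an illustration under assumptions, not the real Graph::ApplyDevicePlacement: the `Graph` class, its `tensor_meta` and `weight_meta` dictionaries, the string device names, and `target_device_for` are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Graph:
    """Hypothetical graph holding per-value device metadata as strings."""
    tensor_meta: Dict[str, str] = field(default_factory=dict)
    weight_meta: Dict[str, str] = field(default_factory=dict)

    def apply_device_placement(self, placement: Dict[str, str]) -> None:
        """Rewrite every recorded device in place, exactly once.

        `placement` maps an original device to its re-mapped device;
        devices without an entry are left unchanged.
        """
        for meta in (self.tensor_meta, self.weight_meta):
            for name, device in list(meta.items()):
                meta[name] = placement.get(device, device)


def target_device_for(graph: Graph, value_name: str) -> str:
    """Downstream consumers read the device from the graph, not from placement."""
    return graph.tensor_meta[value_name]


graph = Graph(
    tensor_meta={"x": "cuda:0", "y": "cpu"},
    weight_meta={"w": "cuda:0"},
)
graph.apply_device_placement({"cuda:0": "cuda:1"})

print(target_device_for(graph, "x"))  # cuda:1
print(graph.weight_meta["w"])         # cuda:1
```

Because the mapping is applied up front, later stages such as weight loading never need to see the placement object itself; they only consult the metadata stored on the graph.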
Test Plan:
Need to add some tests before landing.
This is a big change.
Rollback Plan:
Differential Revision: D78841818
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158996
Approved by: https://github.com/henryoier