pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Edward Z. Yang	2a332afbf4	Add SymFloat, support SymInt to SymFloat conversion (#84284 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84284 Approved by: https://github.com/albanD	2022-09-03 01:30:32 +00:00
Elias Ellison	97b2dff600	Add Initial Support For Fake Tensor Constant Tracking (#84387 ) Adds support for constant tensor tracking within FakeTensors. Copy-pasta'ing from `proxy_tensor.py` why this is useful: ``` # In some circumstances, we will be tracing in a situation where a tensor # is statically known to be a constant (currently, this only happens if # you run torch.tensor; deterministic factory functions like torch.arange # don't get this treatment). When the tensor in question is small, it's # helpful to due constant propagation in case we call item() (in which # case we can return the constant value that is known, rather than give # an error.) ``` This PR only attempts to add support for the tracing scenarios where we run each operation linearly - aot autograd, torchdynamo. It does not yet handle how constant tensors should be handled as part of the persistent fx graph. Additionally, it does not yet attempt to de-duplicate or interact with ProxyMode's only constant tensor handling. Edit: plan is to rely on functionalization for fx graph Pull Request resolved: https://github.com/pytorch/pytorch/pull/84387 Approved by: https://github.com/ezyang	2022-09-02 02:43:04 +00:00
Horace He	6a3ecda5a2	Started storing faketensor/symbolic shape metadata on FX nodes in make_fx (#84114 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84114 Approved by: https://github.com/SherlockNoMad	2022-08-31 04:39:48 +00:00
Edward Z. Yang	ad44670fa1	Back out "Revert D38984222: Don't introduce new overload for SymInt (#83628 )" (#84173 ) Also Back out "Revert D39075159: [acc_tensor] Use SymIntArrayRef for overloaded empty.memory_format's signature" Original commit changeset: dab4a9dba4fa Original commit changeset: dcaf16c037a9 Original Phabricator Diff: D38984222 Original Phabricator Diff: D39075159 Also update Metal registrations for C++ registration changes. Also update NNPI registration to account for tightened schema checking Differential Revision: [D39084762](https://our.internmc.facebook.com/intern/diff/D39084762/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39084762/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/84173 Approved by: https://github.com/Krovatkin	2022-08-29 18:01:07 +00:00
Kimish Patel	cfd18e105f	[Pytorch][Ondevice quantization] Add device side API to convert model (#83807 ) Summary: This diff adds device side API which will convert the model to its quantized equivalent. THe input model must have been prepared AOT for quantization. API is implemented by: - Running reset obervers - Running observe method - Running quantize method - And replacing method, e.g. forward, with its quantized equivalent. Test Plan: test/quantization/jit/test_ondevice_quantization.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38889818](https://our.internmc.facebook.com/intern/diff/D38889818) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83807 Approved by: https://github.com/iseeyuan	2022-08-29 17:57:38 +00:00
Kimish Patel	5c7e801c50	[pytorch][on device quant] Finalize method for ondevice quant (#83571 ) Summary: After inserting quant dequant nodes in the graph, we need 1. Insert packed param creation and quantized op 2. Create packed_params attribute in the top module. For this we need graph that inlined except for calculate_qparams method calls. But they can be inlined too. So perhaps we need to make sure no other callmethods exist. 3. Insert SetAttr for the packed param 4. Insert GetAttr for the packed param 5. Use GetAttr output for quantized op where applicable, e.g. linear_dynamic The above is added to quantize_<method-name> method created inprevious step. Once the above steps are done clone the method into quantized_<method-name> Modify quantize_<method-name>: 1. Remove all outputs from the method. 2. Run dce 3. Remove all inputs from the method except self. Modify quantized_<method-name>: 1. Remove all packed_param setAttr nodes. 2. Run dce. This should result in removal of all nodes that generate packed param. Test Plan: To be written Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38771416](https://our.internmc.facebook.com/intern/diff/D38771416) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83571 Approved by: https://github.com/jerryzh168	2022-08-29 17:53:11 +00:00
Kimish Patel	446afb5f9f	[On Device Quantization][pytorch]Make insert_quant_dequant support ondevice ptq (#83570 ) Summary: This diff adds a way to: - clone previously observed method - Add calls to observer's calculate_qparams methods - Extract the scale and zero point - Use them to insert quant dequant nodes Now for forward method we have - observe_forward - quantize_forward observe_forward is used post training to observer statistics. In the case of dynamic PTQ this requires just running that method once to update weight observer statistics. quantize_forward method will be used to use the observer statistics to calculate quantization parameters and apply that to quant dequant op. Subsequent diffs will replace dequant + op with their quantized op counter parts and replace quantize ops with relevant packed params class where possible Test Plan: To be written Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38771419](https://our.internmc.facebook.com/intern/diff/D38771419) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83570 Approved by: https://github.com/jerryzh168	2022-08-29 17:51:00 +00:00
Kimish Patel	9189edb3b3	[Quantization][Pytorch] On device quantization support part 1 (#83568 ) Summary: TO support on device quantization this diff introduces observer insertion. Specifically observers are inserted by adding new method with prefix observ_. Intent is that post training, this method will be run to record statistics Test Plan: test_ondevice_quantization.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38771417](https://our.internmc.facebook.com/intern/diff/D38771417) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83568 Approved by: https://github.com/jerryzh168	2022-08-29 17:22:30 +00:00
Ivan Yashchuk	3aae6ff1e1	Add nvprims.var_mean (#83508 ) This PR adds nvfuser-specific primitive - `var_mean`. Interpretation `torch.var_mean` -> `torch.ops.nvprims.var_mean` is handled by `TorchRefsNvfuserCapabilityMode` context manager. I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean"`). Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception. Here's a simple comparison of performance with this PR and master (on 3080ti): ```py import torch from torch._prims.context import TorchRefsNvfuserCapabilityMode from torch.fx.experimental.proxy_tensor import make_fx from torch._prims.executor import execute def func(a): return torch.native_layer_norm(a, (1024,), None, None, 1e-6) a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda") with TorchRefsNvfuserCapabilityMode(): gm = make_fx(func)(a) for _ in range(10): execute(gm, a, executor="strictly_nvfuser"); ``` run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py` ```py # WITH THIS PR # kernel1 run in 0.032768 ms, achieved: 641.25 GB/s # kernel1 run in 0.033792 ms, achieved: 621.818 GB/s # kernel1 run in 0.032768 ms, achieved: 641.25 GB/s # kernel1 run in 0.032608 ms, achieved: 644.396 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # kernel1 run in 0.032768 ms, achieved: 641.25 GB/s # kernel1 run in 0.03072 ms, achieved: 684 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # ON MASTER # kernel1 run in 0.05632 ms, achieved: 373.091 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.043808 ms, achieved: 479.649 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s ``` So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape. Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`). Ref. https://github.com/pytorch/pytorch/issues/80187 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508 Approved by: https://github.com/ngimel	2022-08-28 18:45:25 +00:00
PyTorch MergeBot	b159a5230f	Revert "Add nvprims.var_mean (#83508 )" This reverts commit `7e7694b661`. Reverted https://github.com/pytorch/pytorch/pull/83508 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-08-28 11:30:27 +00:00
Ivan Yashchuk	7e7694b661	Add nvprims.var_mean (#83508 ) This PR adds nvfuser-specific primitive - `var_mean`. Interpretation `torch.var_mean` -> `torch.ops.nvprims.var_mean` is handled by `TorchRefsNvfuserCapabilityMode` context manager. I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean"`). Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception. Here's a simple comparison of performance with this PR and master (on 3080ti): ```py import torch from torch._prims.context import TorchRefsNvfuserCapabilityMode from torch.fx.experimental.proxy_tensor import make_fx from torch._prims.executor import execute def func(a): return torch.native_layer_norm(a, (1024,), None, None, 1e-6) a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda") with TorchRefsNvfuserCapabilityMode(): gm = make_fx(func)(a) for _ in range(10): execute(gm, a, executor="strictly_nvfuser"); ``` run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py` ```py # WITH THIS PR # kernel1 run in 0.032768 ms, achieved: 641.25 GB/s # kernel1 run in 0.033792 ms, achieved: 621.818 GB/s # kernel1 run in 0.032768 ms, achieved: 641.25 GB/s # kernel1 run in 0.032608 ms, achieved: 644.396 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # kernel1 run in 0.032768 ms, achieved: 641.25 GB/s # kernel1 run in 0.03072 ms, achieved: 684 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # kernel1 run in 0.031744 ms, achieved: 661.935 GB/s # ON MASTER # kernel1 run in 0.05632 ms, achieved: 373.091 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.043808 ms, achieved: 479.649 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s # kernel1 run in 0.044032 ms, achieved: 477.209 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s # kernel1 run in 0.043008 ms, achieved: 488.571 GB/s ``` So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape. Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`). Ref. https://github.com/pytorch/pytorch/issues/80187 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508 Approved by: https://github.com/ngimel	2022-08-27 09:05:20 +00:00
PyTorch MergeBot	c7edcd6968	Revert "Don't introduce new overload for SymInt (#83628 )" This reverts commit `9790d90e4b`. Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to Breaks internal builds, see D39076487	2022-08-27 01:23:17 +00:00
Edward Z. Yang	9790d90e4b	Don't introduce new overload for SymInt (#83628 ) Previously, we introduced new SymInt overloads for every function we wanted. This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented. This PR takes a simpler but more risky approach: just take the original function and changes its ints to SymInts. This is BC-breaking in the following ways: * The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this. This is not BC-breaking in the following ways: * The user facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints. (e.g., at::empty(IntArrayRef, ...). To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed. * This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underyling type. Structure of the PR: * The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it as if it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other: * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular: * When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`. This is handled with cloneWithRealTypes before we check for schema differences. * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!) * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway. * Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where this is work to do. Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions to call. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other caes. * The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK. * I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it. * I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload) * I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.) * I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints. * I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628 Approved by: https://github.com/albanD, https://github.com/bdhirsh	2022-08-26 01:35:40 +00:00
PyTorch MergeBot	a7edf71360	Revert "Don't introduce new overload for SymInt (#83628 )" This reverts commit `8fae7027b3`. Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to breaking internal builds, see https://www.internalfb.com/diff/D38984222	2022-08-25 00:49:40 +00:00
Edward Z. Yang	8fae7027b3	Don't introduce new overload for SymInt (#83628 ) Previously, we introduced new SymInt overloads for every function we wanted. This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented. This PR takes a simpler but more risky approach: just take the original function and changes its ints to SymInts. This is BC-breaking in the following ways: * The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this. This is not BC-breaking in the following ways: * The user facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints. (e.g., at::empty(IntArrayRef, ...). To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed. * This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underyling type. Structure of the PR: * The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it as if it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other: * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular: * When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`. This is handled with cloneWithRealTypes before we check for schema differences. * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!) * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway. * Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where this is work to do. Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions to call. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other caes. * The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK. * I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it. * I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload) * I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.) * I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints. * I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628 Approved by: https://github.com/albanD, https://github.com/bdhirsh	2022-08-23 22:04:07 +00:00
Nikolay Korovaiko	5b621205f4	Revert "Revert "adding a custom caster for c10::SymInt (#82692 )"" (#83223 ) This should fix the MacOS build errors and reland #82692 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83223 Approved by: https://github.com/albanD	2022-08-12 00:46:50 +00:00
David Berard	1f99bdfcc4	[JIT] Retry - Support scripting torch.is_autocast_enabled() (#82394 ) This adds an `aten::is_autocast_enabled` op into the jit runtime so that autocasting ops can be scripted and called from within jit. Differential Revision: [D38294040](https://our.internmc.facebook.com/intern/diff/D38294040) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82394 Approved by: https://github.com/eellison	2022-08-10 18:26:17 +00:00
goldenxuett	2b6905413e	[JIT] Add SchemaCheckMode OpInfo test (#82442 ) - Move test_schema_check to torch/test directory. - Add opInfo test for SchemaCheckMode to check all operator schemas - Add various changes (using isClose instead of equals, skipping complex number cases for certain ops, etc...) in order to have test_schema_check pass. Differential Revision: [D38437946](https://our.internmc.facebook.com/intern/diff/D38437946) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82442 Approved by: https://github.com/davidberard98	2022-08-09 23:13:43 +00:00
PyTorch MergeBot	daeea7d2c3	Revert "adding a custom caster for c10::SymInt (#82692 )" This reverts commit `dee63f4f7b`. Reverted https://github.com/pytorch/pytorch/pull/82692 on behalf of https://github.com/seemethere due to Broke internal builds, see [logs](https://www.internalfb.com/intern/sandcastle/job/4503600373141339/insights)	2022-08-09 22:17:41 +00:00
Edward Z. Yang	988bd0173c	Add OpOverload.decompose API (#83075 ) This allows you to directly call into the CompositeImplicitAutograd implementation of an operator, without changing any aspects of the dispatcher state. In particular, you can use this to recursively call into a decomposition, dispatching back to your tensor subclass/mode as desired. Hypothetically, we should also make these available in the decompositions dictionary, but I'm leaving this as future work as enumerating these decompositions is annoying (as operators are lazily registered.) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/83075 Approved by: https://github.com/albanD	2022-08-09 18:53:19 +00:00
Tugsbayasgalan Manlaibaatar	b4b60c2a2e	Get rid of ENABLE_UPGRADERS macro (#77574 ) Since it's been a while after we merged the upgrader design and we haven't encountered any issues, let's get rid of the macro for safe rollout Pull Request resolved: https://github.com/pytorch/pytorch/pull/77574 Approved by: https://github.com/gmagogsfm	2022-08-09 05:33:14 +00:00
Nikolay Korovaiko	dee63f4f7b	adding a custom caster for c10::SymInt (#82692 ) ### Description Adding a custom caster for `c10::SymInt`. This simplifies handling of c10::SymInt on C++/Pytorch boundary. Namely, removing if statements to handle the union nature (e.g. SymIntNode, int) of c10::SymInt. ### Issue <!-- Link to Issue ticket or RFP --> ### Testing <!-- How did you test your change? --> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82692 Approved by: https://github.com/ezyang	2022-08-08 21:40:53 +00:00
Nikolay Korovaiko	8b20e47974	add integer divison for symints (#82791 ) ### Description This PR brings integer division (floor) to symints + tests. ### Issue https://github.com/orgs/pytorch/projects/17/views/2 ### Testing added two tests to TestPySymInts Pull Request resolved: https://github.com/pytorch/pytorch/pull/82791 Approved by: https://github.com/ezyang	2022-08-04 20:00:51 +00:00
Ivan Yashchuk	ec67c6abbe	Add torch.ops.nvprims namespace for nvFuser-specific prims (#82155 ) New namespace `torch.ops.nvprims` is meant for specific to the nvFuser set of primitives. All `impl_nvfuser` attributes are removed from `torch.ops.prims` functions. `NvfuserPrimsMode()` context manager can be used for automatic rewrite of `torch.ops.prims` calls to `torch.ops.nvprims` when possible. The previous way to test whether a prim would be executable with nvFuser was to test `impl_nvfuser is not None`, now all functions in the `torch.ops.nvprims` namespace are supposed to have the `impl_nvfuser` attribute and hence all are executable by nvFuser. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82155 Approved by: https://github.com/jjsjann123, https://github.com/ngimel	2022-08-04 16:51:56 +00:00
Edward Z. Yang	df69660832	Revert "Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552 )"" (#82599 ) This reverts commit `532b8a9e00`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82599 Approved by: https://github.com/albanD	2022-08-02 19:37:02 +00:00
Andres Lugo-Reyes	f1a1356907	[ROCm] Enable/fix unit tests test_stream_args and test_event_args (#82346 ) ### Description Removed some stubbed out code that was necessary for ROCm builds to support JIT compilation of Event and Stream classes. Original motivation for the code to be stubbed out in the ROCm case was likely due to this pull request: https://github.com/pytorch/pytorch/pull/48020 In this PR, the include statement at the at the top of cuda.h was incorrectly pointed to aten/src/ATen/cuda/CUDAEvent.h when it should have been set to ATen/cuda/CUDAEvent.h. This error caused the hipification process of build_amd.py to not hipify this include statement correctly, causing errors. The include statement in question was subsequently fixed in the following commit: `acd072967a` This PR re-introduces the stubbed out code to the ROCm build and "unskips" the associated unit tests. ### Testing Note: bullets prepended by ROCm were tested on systems with AMD GPUs while the others were tested with NVIDIA GPUs. - apply commit - (ROCm)`python tools/amd_build/build_amd.py` - `python setup.py develop` - (ROCm)`PYTORCH_TEST_WITH_ROCM=1 python test/test_jit.py TestCUDA.test_event_args` - (ROCm)`PYTORCH_TEST_WITH_ROCM=1 python test/test_jit.py TestCUDA.test_stream_args` - `python test/test_jit.py TestCUDA.test_event_args` - `python test/test_jit.py TestCUDA.test_stream_args` - Confirm tests pass in all scenarios Pull Request resolved: https://github.com/pytorch/pytorch/pull/82346 Approved by: https://github.com/malfet	2022-08-01 22:55:15 +00:00
PyTorch MergeBot	532b8a9e00	Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552 )" This reverts commit `9465c0e0b5`. Reverted https://github.com/pytorch/pytorch/pull/82552 on behalf of https://github.com/zengk95 due to This seems to be breaking windows binary wheels	2022-08-01 20:25:35 +00:00
Edward Z. Yang	9465c0e0b5	Add a lint rule for torch/csrc/util/pybind.h include (#82552 ) We define specializations for pybind11 defined templates (in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently it is important that these specializations always be #include'd when making use of pybind11 templates whose behavior depends on these specializations, otherwise we can cause an ODR violation. The easiest way to ensure that all the specializations are always loaded is to designate a header (in this case, torch/csrc/util/pybind.h) that ensures the specializations are defined, and then add a lint to ensure this header is included whenever pybind11 headers are included. The existing grep linter didn't have enough knobs to do this conveniently, so I added some features. I'm open to suggestions for how to structure the features better. The main changes: - Added an --allowlist-pattern flag, which turns off the grep lint if some other line exists. This is used to stop the grep lint from complaining about pybind11 includes if the util include already exists. - Added --match-first-only flag, which lets grep only match against the first matching line. This is because, even if there are multiple includes that are problematic, I only need to fix one of them. We don't /really/ need this, but when I was running lintrunner -a to fixup the preexisting codebase it was annoying without this, as the lintrunner overall driver fails if there are multiple edits on the same file. I excluded any files that didn't otherwise have a dependency on torch/ATen, this was mostly caffe2 and the valgrind wrapper compat bindings. Note the grep replacement is kind of crappy, but clang-tidy lint cleaned it up in most cases. See also https://github.com/pybind/pybind11/issues/4099 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552 Approved by: https://github.com/albanD	2022-08-01 17:16:58 +00:00
Edward Z. Yang	50e8abbcad	Change SymIntNode into an intrusive pointer (#82548 ) This will make the pointer type a single word, which is important for packing it into an int64_t This time, this diff doesn't segfault when you build with DEBUG mode; more details at https://github.com/pybind/pybind11/issues/4099 Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82548 Approved by: https://github.com/albanD	2022-08-01 15:07:21 +00:00
PyTorch MergeBot	3b9cbb1738	Revert "Change SymIntNode into an intrusive pointer (#82432 )" This reverts commit `7be44f8158`. Reverted https://github.com/pytorch/pytorch/pull/82432 on behalf of https://github.com/ezyang due to segfaults on test but not caught in CI	2022-07-29 20:08:59 +00:00
Edward Z. Yang	7be44f8158	Change SymIntNode into an intrusive pointer (#82432 ) This will make the pointer type a single word, which is important for packing it into an int64_t Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82432 Approved by: https://github.com/albanD, https://github.com/Krovatkin	2022-07-29 17:32:54 +00:00
William Tambellini	6e56629efa	[JIT] JIT script init verbose assert (#80495 ) Log the sizes of inputs in the assert of setInputTensorTypes(...) in jit/python/script_init.cpp for easy debugging. Helps/close: https://github.com/pytorch/pytorch/issues/72763 Fixes #72763 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80495 Approved by: https://github.com/davidberard98	2022-07-29 00:50:18 +00:00
Edward Z. Yang	34bdd46e6e	Rename shared_ptr<SymIntNodeImpl> to SymIntNode (#82355 ) Makes code a lot more compact! It also makes it possible to swap out the shared ptr implementation, which I am about to do next. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82355 Approved by: https://github.com/Krovatkin	2022-07-28 18:27:45 +00:00
Edward Z. Yang	fd5ac1e6b5	Rename SymbolicIntNode to SymIntNodeImpl (#82350 ) Done via ``` git grep -l 'SymbolicIntNode' \| xargs sed -i 's/SymbolicIntNode/SymIntNodeImpl/g' ``` Reasoning for the change: * Sym is shorter than Symbolic, and consistent with SymInt * You usually will deal in shared_ptr<...>, so we're going to reserve the shorter name (SymIntNode) for the shared pointer. But I don't want to update the Python name, so afterwards I ran ``` git grep -l _C.SymIntNodeImpl \| xargs sed -i 's/_C.SymIntNodeImpl/_C.SymIntNode/' ``` and manually fixed up the binding code Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82350 Approved by: https://github.com/Krovatkin	2022-07-28 18:27:45 +00:00
PyTorch MergeBot	554b4060aa	Revert "[JIT] Support scripting torch.is_autocast_enabled() (#81305 )" This reverts commit `bcc9084bc4`. Reverted https://github.com/pytorch/pytorch/pull/81305 on behalf of https://github.com/malfet due to Broke lite-intepreter builds, see https://github.com/pytorch/pytorch/runs/7550084494?check_suite_focus=true	2022-07-28 00:02:53 +00:00
David Berard	bcc9084bc4	[JIT] Support scripting torch.is_autocast_enabled() (#81305 ) This adds an `aten::is_autocast_enabled` op into the jit runtime so that autocasting ops can be scripted and called from within jit. Differential Revision: [D37901585](https://our.internmc.facebook.com/intern/diff/D37901585) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81305 Approved by: https://github.com/qihqi, https://github.com/eellison	2022-07-27 22:32:08 +00:00
albanD	4b7de26556	Fix C API to be compatible with latest 3.11 beta (#81242 ) Based off https://github.com/pytorch/pytorch/pull/80511 with extra changes: - Update pybind to the latest release as it contains some needed fixes - Extend the compat header to do reduce changes in code Pull Request resolved: https://github.com/pytorch/pytorch/pull/81242 Approved by: https://github.com/malfet, https://github.com/mattip	2022-07-27 08:37:10 +00:00
goldenxuett	d576a7dc97	[JIT] Fix python binding error with empty containers in init.cpp (#81786 ) - toTypeInferredIValue will throw an error when given an empty container because it isn't able to tell what kind of container it is. Thus empty containers are ignored in addArgumentValue/s, overlaps, and is_alias_of. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81786 Approved by: https://github.com/davidberard98	2022-07-23 05:50:39 +00:00
goldenxuett	c9497886fd	[JIT] Modify is_mutable in FunctionSchema and SchemaInfo to have SchemaArgument parameter instead of index (#81784 ) - Modify the is_mutable(size_t index) overload to become is_mutable(const SchemaArgument& argument) due to cases where one might want to check the mutability of either input or output arguments. - Refactored all calls to the function to use this new overload - Tested through is_mutable() tests in test_schema_info.cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/81784 Approved by: https://github.com/davidberard98	2022-07-20 22:09:56 +00:00
goldenxuett	21a4be34cd	[JIT] Enchance training ops check to be more inclusive and account for possible pybind exceptions (#81782 ) - Modified is_mutable python binding to accept a string instead of a string_view for better python compatibility. - Modified argument value adding python bindings to deal with input/self edge case due to inconsistencies in how the first variable is named. - Modified _is_alias_of and created _contains_alias_of python bindings to accurately find out if values are aliasing, or contain an alias. - Fixed is_mutable implementation to cover all ops that have mutable optional arguments. (These are all the ops that have the optional arguments 'running_mean' and 'running_var' along with either 'train', 'training' or 'use_input_stats.' Pull Request resolved: https://github.com/pytorch/pytorch/pull/81782 Approved by: https://github.com/davidberard98	2022-07-20 22:09:54 +00:00
goldenxuett	8e454cc702	[JIT] Add SchemaInfo python bindings to init.cpp (#81518 ) - Added python bindings for SchemaInfo class, SchemaArgument struct, and SchemaArgType enum. - Tested that argument values are added correctly to SchemaInfo binding in test_schema_check.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/81518 Approved by: https://github.com/davidberard98	2022-07-19 22:33:19 +00:00
Edward Z. Yang	7e60e315da	Add support for Generator conversion to/from IValue (#81697 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/81697 Approved by: https://github.com/anjali411	2022-07-19 16:50:10 +00:00
Horace He	97938d872e	Added a couple more symint magic methods + symbolic shape infra (#81086 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81086 Approved by: https://github.com/ezyang	2022-07-16 23:47:58 +00:00
albanD	1afb804f26	Improve wrapper subclass detection for serialization (#81105 ) Fixes https://github.com/pytorch/pytorch/issues/80983 Also fix a small bug uncovered by the new test where creating memory_view for 0-sized inputs is not valid and is now skipped Pull Request resolved: https://github.com/pytorch/pytorch/pull/81105 Approved by: https://github.com/ezyang	2022-07-11 14:02:37 +00:00
Edward Z. Yang	91b0250606	Remove dead code from torch.ops torch function handling (#80993 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/80993 Approved by: https://github.com/anjali411	2022-07-06 22:56:18 +00:00
Edward Z. Yang	421f04dd02	Only allow numbers as tensors if operator was explicitly allowlisted so (#80587 ) Fixes https://github.com/pytorch/pytorch/issues/80508 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/80587 Approved by: https://github.com/ngimel	2022-06-30 18:59:38 +00:00
Rodrigo Kumpera	b4e491798c	Avoid temporary buffers for tensors with torch.save. (#80404 ) Fix torch.save _open_zipfile_writer optimization that uses a c++ stream when `f` is a os.PathLike. This fastpath requires that we don't `open()` in python if possible, so don't do it unconditionally. Fix PyTorchStreamWriter construction binding that takes a buffer object. Use py::memoryview instead of py::bytes as the former doesn't copy the data. Validated with a trivial benchmark that calls torch.save in a loop 20x with a 10M elements float32 tensor either on cpu or cuda. Saved to /dev/null. Tried two variants 'str' and 'open' In 'str' we pass the string "/dev/null" to torch.save. In 'open' we pass `open("/dev/null", "wb")` to torch.save. Timing in seconds. Before this patch: str-cpu :: 0.757 open-cpu :: 0.757 str-cuda :: 1.367 open-cuda :: 1.366 After this patch: str-cpu :: 0.256 open-cpu :: 0.251 str-cuda :: 0.896 open-cuda :: 0.834 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/80404 Approved by: https://github.com/jamesr66a	2022-06-30 00:19:42 +00:00
Horace He	7850a328b4	Revert "Revert "parse pysymints to IValues (#80066 )"" (#80419 ) This is a reland of https://github.com/pytorch/pytorch/pull/80066 with the relative path changed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80419 Approved by: https://github.com/Krovatkin	2022-06-28 17:21:34 +00:00
PyTorch MergeBot	0322ecc3fd	Revert "parse pysymints to IValues (#80066 )" This reverts commit `f532b3a619`. Reverted https://github.com/pytorch/pytorch/pull/80066 on behalf of https://github.com/seemethere due to Uses relative includes which causes internal builds to fail	2022-06-24 20:15:09 +00:00
Nikolay Korovaiko	f532b3a619	parse pysymints to IValues (#80066 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/80066 Approved by: https://github.com/Chillee	2022-06-23 19:51:08 +00:00

1 2 3 4 5 ...

655 Commits