As this is the oldest gcc that is fully compatible with C++17 standard.
- Replace number of conditional version with simpler `if(CMAKE_COMPILER_IS_GNUCXX)` or `append_cxx_flag_if_supported`.
- As `-Wsuggest-override` condition was hidden before incorrect guard, add missing `override` keywords to `torch::autograd::PyFunctionTensorPostAccGradHooks::apply_with_saved` , `caffe2::python::TensorFeeder::Feed` and `cafee2::NetObserverReporterPrint::report```
Fixes https://github.com/pytorch/pytorch/issues/101839
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112858
Approved by: https://github.com/Skylion007, https://github.com/albanD
This PR replace c10::guts::to_string with std::to_string. The major part of changes is using void* as optimizer state key since string is used only for serialization and using pointers as hashing keys is more efficient than a string.
Some other guts functions in the affected source files are also replaced.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108480
Approved by: https://github.com/Skylion007
```
In file included from /local/pytorch3/test/cpp/api/optim.cpp:7:
local/pytorch3/test/cpp/api/support.h:44:3: warning: '~WarningCapture' overrides a destructor but is not marked 'override' [-Winconsistent-missing-destructor-override]
~WarningCapture() {
^
local/pytorch3/c10/util/Exception.h:167:11: note: overridden virtual function is here
virtual ~WarningHandler() = default;
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107191
Approved by: https://github.com/janeyx99
This is part of effort to enable missed cpp tests for ROCm platform.
In this change,
- enabled test_libtorch cpp tests (more than 3107 tests)
- fixed missing dependency: libcaffe2_nvrtc.so required by FunctionalTest.Conv1d
- test_api binary is changed to exclude failed tests InitTest and IntegrationTest - to revisit later
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106712
Approved by: https://github.com/jithunnair-amd, https://github.com/kit1980
When the hook registered by Tensor::register_hook (in C++) gets passed
an undefined tensor, it raises an internal assert in debug mode.
The cause is that we attempt to construct an OptionalTensorRef
(4448c78a5d/aten/src/ATen/core/Tensor.h (L68))
which asserts that the passed-in TensorBase is defined.
The fix is that we create a new TensorRef class to convert the
TensorBase into a Tensor without bumping the refcount (which is what
OptionalTensorRef does). We cannot reuse OptionalTensorRef because
OptionalTensorRef represents `optional<Tensor>` that cannot hold an
Undefined Tensor.
For some more historical context, it looks like this behavior was introduced
in #63612
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105587
Approved by: https://github.com/soulitzer
This PR enables `-Winconsistent-missing-destructor-override` and `-Winconsistent-missing-override`
and fixes violations.
<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 47e904e</samp>
This pull request updates the code of various classes and operators in the `caffe2` and `aten` subdirectories to use the `override` specifier instead of the `virtual` keyword for destructors and other virtual functions that override a base class function. This improves the code readability, quality, and consistency with C++ best practices. It also modifies the `./CMakeLists.txt` file to enable warnings for these specifiers, but disable errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104032
Approved by: https://github.com/malfet
BackendMeta offers a binary interface for the backend to attach arbitrary data to TensorImpl. TensorImpl has exactly one "slot" for backend metadata, however backend is free to compose any structure that is opaque to the framework beyond iheriting standard BackendMeta base.
Change-Id: I670fcdd16dd1c2b00f7eaa1cbc5b5dfea59a6221
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97429
Approved by: https://github.com/ezyang
BackendMeta offers a binary interface for the backend to attach arbitrary data to TensorImpl. TensorImpl has exactly one "slot" for backend metadata, however backend is free to compose any structure that is opaque to the framework beyond iheriting standard BackendMeta base.
Change-Id: I670fcdd16dd1c2b00f7eaa1cbc5b5dfea59a6221
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97429
Approved by: https://github.com/ezyang
Fixes https://github.com/pytorch/pytorch/issues/96887
We error out in BOTH the case when graph is created and when it is not created.
Still bc-breaking, but not as severe because we are limiting to the case where someone uses setup_context.
This makes setup_context and non-setup_context versions diverge in their behavior
- With the non-setup_context version, saved variables are assumed to have the grad_fn of the inputs.
- But now with the setup_context version, we produce an error for this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97212
Approved by: https://github.com/zou3519
Fixes https://github.com/pytorch/pytorch/issues/96887
We error out in BOTH the case when graph is created and when it is not created.
Still bc-breaking, but not as severe because we are limiting to the case where someone uses setup_context.
This makes setup_context and non-setup_context versions diverge in their behavior
- With the non-setup_context version, saved variables are assumed to have the grad_fn of the inputs.
- But now with the setup_context version, we produce an error for this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97212
Approved by: https://github.com/zou3519
Fixes#95796
### Implementation
Adds python implementation for `nn.ZeroPad1d` and `nn.ZeroPad3d` in `torch/nn/modules/padding.py`.
Adds cpp implementation for `nn::ZeroPad1d` and `nn::ZeroPad3d` in the following 3 files, refactored with templates similarly to `nn::ConstantPad`'s implementation: <br>
- `torch/crsc/api/include/torch/nn/modules/padding.h`
- `torch/csrc/api/include/torch/nn/options/padding.h`
- `torch/csrc/api/src/nn/modules/padding.cpp`
Also added relevant definitions in `torch/nn/modules/__init__.py`.
### Testing
Adds the following tests:
- cpp tests of similar length and structure as `ConstantPad` and the existing `ZeroPad2d` impl in `test/cpp/api/modules.cpp`
- cpp API parity tests in `torch/testing/_internal/common_nn.py`
- module init tests in `test/test_module_init.py`
Also added relevant definitions in `test/cpp_api_parity/parity-tracker.md`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96295
Approved by: https://github.com/soulitzer
Attempts to fix#92656
BC-breaking! This changes the default of zero_grad in optim and in nn to default set grads to None instead of zero tensors. We are changing the default because there are proven perf wins and existing code has typically not regressed due to this change. (will probably have to flesh out this note more).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92731
Approved by: https://github.com/ngimel
Fixes#81690
TODO:
* [x] C++ Unpickler Fix (locally tested pickled in Python and unpickled in C++)
* [x] C++ Pickler Fix (locally tested pickled in C++ and unpickled in Python)
* [x] Do quant_tensor, sparse_tensor, etc require similar changes? (Sparse and Quant don't need this)
* [x] Add Comments
* [x] How to make sure C++ and Python are in sync? (Functions in `pickler.h` help in getting and setting Tensor Metadata (math-bits for now) on a tensor. They are the only place which should handle this.)
Notes:
Quant Tensor don't support complex dtypes and for float they segfault with `_neg_view` : https://github.com/pytorch/pytorch/issues/88484
Sparse Tensor:
```python
>>> a = torch.tensor([[0, 2.], [3j, 0]]).to_sparse()
>>> a.conj().is_conj()
False
>>> a._neg_view()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotImplementedError: Cannot access storage of SparseTensorImpl
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88182
Approved by: https://github.com/ezyang, https://github.com/anjali411
The symbol seems to conflict under some compiler versions, giving
an error like "relocation refers to global symbol which is defined in a
discarded section". Simple enough to put it in an anonymous namespace,
so why not.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86092
Approved by: https://github.com/Chillee
### Introduction
<!-- What did you change and why was it needed? -->
Removing unnecessary weight gradient calculation is very important for applications that need high-order derivatives during training. However, this is not supported by the current Autograd engine.
For more detail: The backward function of a `matmul` operator (e.g., `linear` `addmm` `mm`), has two matmuls, one for `input gradient` and another for `weight gradient`. For a typical neural network (nn) with a few linear layers and activation functions, if the user calls `torch.autograd.grad()` to calculate the derivative of the nn output `y` w.r.t the nn input `x`, only the `input gradient` of the `matmul` operator is needed, and the `weight gradient` is discarded. However, the current PyTorch autograd engine will always calculate the `weight gradient` if `weight` requires gradient (the calculation of the high-order derivative is performed during training).
The figure attached shows the autograd graph of the following code snippet:
```py
y = torch.nn.functional.linear(x, weight, bias)
y = y.pow(2)
# first order derivative
y__x, = torch.autograd.grad(y, x, grad_outputs=grad_outputs, create_graph=True)
# first order derivative
y__x__x, = torch.autograd.grad(y__x, x, grad_outputs=grad_outputs, create_graph=True)
```
The path with ❌ is not needed when calculating derivatives.
<img width="50%" alt="image" src="https://user-images.githubusercontent.com/9999318/182018117-719c5a23-bcc6-4a63-8e8d-1bca3ebda2e3.png">
### Issue
<!-- Link to Issue ticket or RFP -->
Related issue: https://github.com/pytorch/pytorch/issues/56500
### Method
When calling `torch.autograd.grad`, `exec_info_` is created for each GraphTask, which allows filtering paths on the graph that are not needed. However, when the GraphTask calls into the node, the node still does not know whether the edges are needed or not. In the case of matmul, `weight.requires_grad is True` so the weight gradient is always calculated.
Following https://github.com/pytorch/pytorch/issues/56500#issuecomment-825694656, this PR passes the graph task's thread_local `exec_info_` into the node, so it could trim unnecessary edges during `torch.autograd.grad` calls.
### Benchmark
Benchmark script: https://gist.github.com/yueyericardo/24158433a2021c51eeef9c3e2722df99
Benchmark result:
6 hidden layers, batch size 10000, on A100
FP32 result
| hessian benchmark | FP32 (before) | FP32 (After) | FP32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward) | 55.658 ms | 29.392 ms (1.90X) | 29.547 ms (1.90X) |
| Linear + ReLU (with backward) | 81.173 ms | 54.917 ms (1.47X) | 68.988 ms (1.18X) |
TF32 result
| hessian benchmark | TF32 (before) | TF32 (after) | TF32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward) | 19.801 ms | 11.259 ms (1.76X) | 10.754 ms (1.84X) |
| Linear + ReLU (with backward) | 29.167 ms | 20.466 ms (1.42X) | 22.784 ms (1.28X) |
For FP32 result, we could get 1.9X speed up for hessian calculation, and 1.47X speed up during training, which is even faster than functorch `vmap(jacfwd(jacrev` implementation. (functorch has performance regression on v0.2.0, https://github.com/pytorch/functorch/issues/989, so we are using v0.1.1 for benchmark)
@zou3519 does functorch also includes similar optimizations during hessian calculation? If not, what do we need to do so the functorch could also benefit from this PR?
### Testing
<!-- How did you test your change? -->
- [x] we need to figure out a way for unittest
### Thanks
Thanks for the great blog: [How Computational Graphs are Executed in PyTorch | PyTorch](https://pytorch.org/blog/how-computational-graphs-are-executed-in-pytorch/)
cc @zasdfgbnm @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82544
Approved by: https://github.com/soulitzer
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on
Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeListst.txt`
Delete `-Wno-unknown-warning-option` to test that conditions indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on
Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeListst.txt`
Delete `-Wno-unknown-warning-option` to test that conditions indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
See this doc: https://docs.google.com/document/d/1KiRdnoj6B4cI3yl017hTbCqcOGO1gWIpUf20sldipHM/edit#
Two issues (1) regarding hooks in general and (2) regarding retains grad hooks are fixed, Python hooks, which rely on a different mechanism are not discussed here:
- Hooks in cpp in general
- (fixed) new hooks to registered to a newer version of the tensor no longer get applied to grad_fn
associated with older version of the tensor when the first hook was ever registered
- (unchanged) hooks registered to the older version of the tensor remain active on
- Retains grad hooks
- (fixed) now get moved to the latest grad_fn. NB: To the user, retains_grad is not considered hooks
or expected to behave like hooks (which we consider properties of the grad_fn) vs retains_gradness
which is a property of the tensor.
- (not in this PR) Python hooks
- (will fix) same issue as hooks in cpp where new hooks are being applied to grad_fn associated
with the older version of the tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79996
Approved by: https://github.com/albanD
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75244
Original commit changeset: d653a5af662a
Original Phabricator Diff: D35060736 (d9d34922a0)
Test Plan: Model loading test, verified that D35060736 (d9d34922a0) will cause the torch::save => torch::load failure.
Reviewed By: yinghai, jianyuh
Differential Revision: D35387009
fbshipit-source-id: 9d176992d402d57779e2af3d905b3c1538335298
(cherry picked from commit 6c8cc0d3b8a88b15e35702d70e18bbae8aa4628a)
Summary:
There are various possible approaches, but the approach chosen minimizes disruption to source control blame.
Addresses:
```
error: Function _ZN23FunctionalTest_Pad_Test8TestBodyEv is too big to optimize [-Werror,-Wignored-optimization-argument]
```
Test Plan: buck2 build mode/opt caffe2/test/cpp/api:functional
Reviewed By: jamesr66a
Differential Revision: D34027291
fbshipit-source-id: 9dfd771ad56d3d4bc0d41b38b04654c8dae7c006
(cherry picked from commit d43b5a7ed6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69394
Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D32837991
fbshipit-source-id: fc7c4f76d2f32a17a0faf329294b3fe7cb81df32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705359
fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67195
Now that `is_nonzero` is part of `at::native` refer https://github.com/pytorch/pytorch/pull/66663, replacing `TensorCompare::is_nonzero` to `at::native::is_nonzero`
ghstack-source-id: 141514416
Test Plan: CI
Reviewed By: larryliu0820
Differential Revision: D31704041
fbshipit-source-id: 36813e5411d0aa2eb2d0442e2a195bbed417b33d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66744
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D31705358
fbshipit-source-id: d6ea350cbaa8f452fc78f238160e5374be637a48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63878
See https://github.com/pytorch/pytorch/issues/64407, https://github.com/pytorch/pytorch/issues/62032 for context:
In this PR:
- Add boxed kernel by replicating `gen_inplace_or_view`'s logic that is ONLY for use with the Autograd not-implemented kernel
- Unlike `gen_inplace_or_view` we always pass a view_func to as_view in order to ensure that an "derivative is not implemented" error is raised even if an in-place update is performed on the view. Without the `view_func`, the CopySlice + AsStridedBackward nodes would replace the NotImplemented node.
- This limitation makes it impossible to use this node for general use
- view relationship must be between first input (must be tensor) and first output (may be tensor or vec of tensor)
- do not support non-differentiable views (_values, _indices, view.dtype) - view relationship is always fw and bw differentiable
- Adds the macro `#define REGISTER_AUTOGRAD_NOT_IMPLEMENTED_FALLBACK(ns, op)` to be the interface for this feature:
- static initialization can be slowed down(? not measured) if there are many registrations, because each line translates to 2 library calls but the workaround is just to manually use the two functions `AutogradNotImplementedFallback` and `ADInplaceOrViewFallback` and call `m.impl`.
- Adds testing:
- for views: view relationship created
- performing in-place operation on the view, raises properly
- trying to create two view relationships is not allowed,
- single view relationship but not first input/first output should error
- view relation created properly for tensor vector output
- for in-place:
- version count bump
- triggers rebase_history
- multiple mutations is okay and also updates version counter
- TODO (follow up): Update tutorials for adding third-party operators (and document the above limitations)
- TODO (follow up): Look at torch-audio/torch-vision and identify places where this can simplify existing code
EDIT: Made it more clear what is introduced in this PR and moved some more contextual stuff into the issue itself
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30901714
Pulled By: soulitzer
fbshipit-source-id: 48de14c28be023ff4bd31b7ea5e7cba88aeee04c
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`
Delete number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants
Do not delete `caffe2::OperatorBase::Output` calls as they have side effects
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041
Reviewed By: ngimel
Differential Revision: D31360142
Pulled By: malfet
fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`
Delete number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954
Reviewed By: ngimel
Differential Revision: D31326599
Pulled By: malfet
fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65861
First in a series. This PR changes the code in deploy.h/cpp and
interpreter_impl.h/cpp to be camel case instead of snake case. Starting
with this as it has the most impact on downstream users.
Test Plan: Imported from OSS
Reviewed By: shannonzhu
Differential Revision: D31291183
Pulled By: suo
fbshipit-source-id: ba6f74042947c9a08fb9cb3ad7276d8dbb5b2934
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612
This makes Tensor inherit from a new class TensorBase, that provides a subset of Tensor that doesn't
directly depend on native_functions.yaml. Code that only includes TensorBase.h with thus not need to
be rebuilt every time someone changes an operator signature.
Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable
with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to
minimize friction in code mixing the two types.
To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error
into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build
system for certain folders, or just define it at the top of any file.
I've also included an example of manually special-casing the commonly used `contiguous` operator.
The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in
`Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can
materialize a `Tensor` for use in dispatch without actually increasing its refcount.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D30728580
Pulled By: ezyang
fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03