Since clamp_min and maximum are the same op, reuse the same kernel (it also correctly propagates NaNs from both the input and the boundary; clamp* propagated NaNs from the input only).
Also fixed codegen to make Tensor? overloads come before Scalar? overloads, cc @alband.
Fixes #67428 and #76795 (scalar overloads for clamp* are still not fixed, will do in the next PR).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77306
Approved by: https://github.com/albanD
Consider the following JIT graph, where the type of `%a` and `%b` are out of sync with tuple `%c`.
Before:
```
graph(%a : Float(123), %b : Float(4, 5, 6)):
%c : (Tensor, Tensor) = prim::TupleConstruct(%a, %b)
return (%c)
```
After:
```
graph(%a : Float(123), %b : Float(4, 5, 6)):
%c : (Float(123), Float(4, 5, 6)) = prim::TupleConstruct(%a, %b)
return (%c)
```
This PR adds a pass `RefineTypes(...)` to update all such instances with the correct type. This is also available via Python by using `torch._C._jit_pass_refine_types(...)`.
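A minimal usage sketch (this assumes the pass takes a JIT graph and mutates it in place, as other `_jit_pass_*` helpers do):
```
import torch

@torch.jit.script
def fn(a: torch.Tensor, b: torch.Tensor):
    return (a, b)

graph = fn.graph
torch._C._jit_pass_refine_types(graph)
print(graph)  # tuple element types are updated to match the input tensor types
```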
A unit test has been added for unnamed tuples, but no test exists for `NamedTuple` (though it was tested manually) since it isn't supported by the parser:
```
RuntimeError:
unknown type specifier:
graph(%a : Float(123), %b : Float(4, 5, 6)):
%c : NamedTuple(Tensor : Tuple, Tensor : Tuple) = prim::TupleConstruct(%a, %b)
~~~~~~~~~~ <--- HERE
return (%c)
```
cc: @ke1337 @antoniojkim @wconstab @eellison
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76919
Approved by: https://github.com/eellison
Adds the ability to grab the git tag when using
`generate_torch_version.py` so that users who build from source on a
specific tag will get the version that they expect.
Behavior is now this:
1. Check whether a git tag is available on the current commit
2. If a tag is available, use the tagged version and do not attempt to derive the version any other way
3. If no tag is available, fall back to the previous workflow for determining the version
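A minimal sketch of what the tag check could look like (an illustration under assumptions, not necessarily what `generate_torch_version.py` actually does):
```
import subprocess

def get_tagged_version():
    try:
        return subprocess.check_output(
            ["git", "describe", "--tags", "--exact-match"],
            stderr=subprocess.DEVNULL,
        ).decode().strip()          # e.g. "v1.12.0" when built from a release tag
    except subprocess.CalledProcessError:
        return None                 # no tag on this commit: fall back to step 3
```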
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Fixes https://github.com/pytorch/pytorch/issues/77052
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77279
Approved by: https://github.com/ezyang
This is convenient for cases where we don't have Storage bound
correctly (e.g., meta tensors). It is also consistent with a universe
where we get rid of storages, although arguably this is never
gonna happen.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77007
Approved by: https://github.com/ngimel
Summary:
For some reason `aten/src/ATen/native/BatchLinearAlgebraKernel.cpp` was part of `aten_cpu_source_non_codegen_list` rather than `aten_native_source_non_codegen_list`.
Fixes linking issues after https://github.com/pytorch/pytorch/pull/67833
```
stderr: ld.lld: error: undefined symbol: at::TensorIteratorBase::for_each(c10::function_ref<void (char**, long long const*, long long, long long)>, long long)
>>> referenced by TensorIterator.h:352 (buck-out/gen/fe3a39b8/xplat/caffe2/aten_headerAndroid#header-mode-symlink-tree-with-header-map,headers/ATen/TensorIterator.h:352)
>>> buck-out/gen/fe3a39b8/xplat/caffe2/aten_cpuAndroid#android-x86,compile-pic-BatchLinearAlgebraKernel.cpp.o93aa6b34/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp.o:(at::native::(anonymous namespace)::unpack_pivots_cpu_kernel(at::TensorIterator&, long long))
clang: error: linker command failed with exit code 1 (use -v to see invocation)
When running <c++ link>.
When building rule //xplat/caffe2:aten_cpuAndroid#android-x86,shared (ovr_config//platform/android:fbsource-base)."
```
Test Plan: CI
Reviewed By: dreiss, cccclai
Differential Revision: D36215453
fbshipit-source-id: 5f9c7cab742bb87a70b5acda46ef85817e50575c
(cherry picked from commit a1691c34f6bae484f710ac9321bfd8a8c999189e)
grep_linter.py was using the `-P` flag of `grep`, which is available in
GNU grep but notably *not* available in the BSD grep that is installed
on Macs.
Use `-E` instead, which uses ERE instead of PCRE. Sadly we were actually
using two PCRE features in our linters:
- Negative lookaheads. I changed these to less-accurate-but-still-good-enough
  versions that use `[^...]` expressions (see the sketch after this list).
- Apparently ERE doesn't support the `\t` atom lol. So I used a literal tab
  character instead (and then had to disable the TAB linter for
  `.lintrunner.toml`).
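For illustration, a hedged sketch of the kind of rewrite involved (the pattern below is hypothetical, not one of the actual linter rules):
```
import re

# PCRE-style negative lookahead (needs GNU grep -P):
pcre_style = re.compile(r"#include <(?!ATen/)")
# ERE-compatible approximation using a character class (works with grep -E), less accurate:
ere_style = re.compile(r"#include <[^A]")

for line in ["#include <vector>", "#include <ATen/ATen.h>"]:
    print(bool(pcre_style.search(line)), bool(ere_style.search(line)))
```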
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76947
Approved by: https://github.com/ezyang
- Run attempt detection was broken because it was comparing a str
(retrieved from the CLI input) to an int (retrieved from the
filename). Make them both ints so they will actually compare equal.
- `root.findall` only searches direct children, which didn't work for cpp
  unittests and pytest-generated reports. Change to `root.iter`, which
  does a recursive search (see the sketch after this list).
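A minimal illustration of the difference (not the report-parsing code itself):
```
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<testsuites><testsuite><testcase name='a'/></testsuite></testsuites>"
)
print(len(root.findall("testcase")))      # 0 -- only searches direct children
print(len(list(root.iter("testcase"))))   # 1 -- searches the whole subtree
```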
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76982
Approved by: https://github.com/janeyx99
Summary: Currently OpKind is stored as an object field called op_ for each IR
node, and one usage of op_ is to avoid dynamic_cast in NodeCast when we
need to downcast a base-node pointer into a concrete sub-node pointer.
As a result, we need to construct and pass in an op when downcasting
nodes, and this becomes quite annoying when we start to implement the
trie-based IR node reuse. More importantly, the op for each subclass
should be unique to that subclass, so making it a const static field
is a more logical design.
In this PR, we still keep the object-level op_ for easier XLA adoption. As
future work, we can come back to remove op_, make the op() method
virtual, and get rid of OpKind in all the node constructors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76711
Approved by: https://github.com/wconstab, https://github.com/JackCaoG
This PR allows users to author a CUDA kernel in Python.
```
import torch
from torch.cuda.jiterator import create_jit_fn
code_string = "template <typename T> T my_kernel(T x, T y, T alpha) { return -x * y + x - y + alpha; }"
jitted_fn = create_jit_fn(code_string, alpha=0)
a = torch.rand(3, device='cuda')
b = torch.rand(3, device='cuda')
result = jitted_fn(a, b, alpha=1.0)
```
Limitations:
- Only supports elementwise kernels
- 1~8 tensor inputs (empty input, e.g. factory methods, is not supported)
- input tensors must live on a CUDA device
- CPU Scalars are not supported
- kwargs must be pre-declared when calling create_jit_fn
- kwargs must be convertible to at::Scalar, one of float64, int64_t, bool (complex is not supported for now)
TODOs:
- [x] consolidate union and c10::variant implementation
- [x] plug into existing op testing framework
- [ ] rename files, place files in the right folder
- [ ] place util functions in the right file
- [x] enforce assumptions in python interface e.g <8 inputs, kwargs types
- [x] Add user-facing documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76394
Approved by: https://github.com/mruberry
This PR adds `linalg.lu_solve`. While doing so, I found a bug in MAGMA
when calling the batched MAGMA backend with trans=True. We work around
it by instead solving two triangular systems.
We also update the heuristics for this function, as they were fairly
outdated. We found that cuSolver is king, so luckily we do not need to
rely on the buggy MAGMA backend for this function.
We added tests covering both left and right solves, as well as tests
for the different backends. We also activated the tests for AMD, as
those should work as well.
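A hedged usage sketch of the new function, assuming the usual `torch.linalg.lu_factor` / `torch.linalg.lu_solve` pairing:
```
import torch

A = torch.randn(3, 3, dtype=torch.float64)
B = torch.randn(3, 2, dtype=torch.float64)
LU, pivots = torch.linalg.lu_factor(A)
X = torch.linalg.lu_solve(LU, pivots, B)   # solves A @ X = B
torch.testing.assert_close(A @ X, B)
```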
Fixes https://github.com/pytorch/pytorch/issues/61657
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72935
Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry
This PR modifies `lu_unpack` by:
- Using less memory when unpacking `L` and `U`
- Fusing the subtraction of `1` with `unpack_pivots_stub`
- Defining tensors of the correct types to avoid copies
- Porting `lu_unpack` to a structured kernel so that its `_out` version
  does not incur extra copies
Then we implement `linalg.lu` as a structured kernel, as we want to
compute its derivative manually. We do so because composing the
derivatives of `torch.lu_factor` and `torch.lu_unpack` would be less efficient.
This new function and `lu_unpack` come with everything they can come with:
forward and backward AD, decent docs, correctness tests, an OpInfo, complex support,
support for meta tensors, and support for vmap and vmap over the gradients.
I really hope we don't continue adding more features.
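A hedged usage sketch of `torch.linalg.lu` exercising the autograd support listed above (assuming the `P, L, U` return convention):
```
import torch

A = torch.randn(2, 4, 4, dtype=torch.float64, requires_grad=True)
P, L, U = torch.linalg.lu(A)                 # batched P, L, U factors
torch.testing.assert_close(P @ L @ U, A)     # reconstruction check
(L.sum() + U.sum()).backward()               # exercises the new backward formula
```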
This PR also avoids saving some of the tensors that were previously
saved unnecessarily for the backward in `lu_factor_ex_backward` and
`lu_backward` and does some other general improvements here and there
to the forward and backward AD formulae of other related functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67833
Approved by: https://github.com/IvanYashchuk, https://github.com/nikitaved, https://github.com/mruberry
We derive and implement a more concise rule for the forward and backward
derivatives of the QR decomposition. While doing this we:
- Fix the composite compliance of `linalg.qr` and we make it support batches
- Improve the performance and simplify the implementation of both forward and backward
- Avoid saving the input matrix for the backward computation.
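A hedged illustration of the batched, differentiable `torch.linalg.qr` described above:
```
import torch

A = torch.randn(4, 5, 3, dtype=torch.float64, requires_grad=True)
Q, R = torch.linalg.qr(A)                  # works on batches of matrices
(Q.sum() + R.sum()).backward()             # uses the new backward derivative
print(A.grad.shape)                        # torch.Size([4, 5, 3])
```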
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76115
Approved by: https://github.com/nikitaved, https://github.com/albanD
Summary: TrieCache provides a way to look up an IR node before we
actually create it. If the lookup hits in TrieCache, we reuse the
existing node and move the current pointer in TrieCache to point to that
node; if the lookup misses, we create a new node and insert it into TrieCache.
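A conceptual Python sketch of this behaviour (names are illustrative; the real implementation is in C++):
```
class TrieNode:
    def __init__(self, ir_node=None):
        self.ir_node = ir_node
        self.children = {}  # keyed by whatever identifies the next IR node

class TrieCache:
    def __init__(self):
        self.root = TrieNode()
        self.current = self.root  # pointer into the trie

    def get_or_create(self, key, make_ir_node):
        child = self.current.children.get(key)
        if child is None:                       # miss: create the node and insert it
            child = TrieNode(make_ir_node())
            self.current.children[key] = child
        self.current = child                    # hit or miss: advance the pointer
        return child.ir_node
```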
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76542
Approved by: https://github.com/wconstab, https://github.com/JackCaoG
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75800
This leads to more similarities between OSS CMake and eventually OSS
Bazel. We will be able to generate files with the same names and not
have different file lists between the builds.
ghstack-source-id: 155300043
Test Plan: Verified locally and in CI.
Reviewed By: dreiss
Differential Revision: D35648586
fbshipit-source-id: 9f1638b5665ebcc64466883f65ef24a2bfd05228
(cherry picked from commit 7f2acff1baa8dfafddefdc720714f8d39feda436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76720
This PR fixes an issue in hipify_python introduced by https://github.com/pytorch/pytorch/pull/76141.
https://github.com/pytorch/pytorch/pull/76141 made all the `includes` paths "absolute", but this was not done for `args.extra_include_dir`; `new_dir`, which is a relative path, is directly added to `includes`. This PR fixes it by passing the absolute path (`abs_new_dir`).
Test Plan: CI
Reviewed By: albanD
Differential Revision: D36089556
fbshipit-source-id: 1607075a4cb13696c1b25923f56b08a8cb3c6578
(cherry picked from commit 2ca648728f01c03320015f90d33404e75f978206)
This functionality does not seem to be used,
and there are some requests to update the dependency.
Add `third_party` to torch_cpu include directories if compiling with
Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76173
We need this facility temporarily to sequence some changes without
breakage. This is generally not a good idea since the main purpose of
this effort is to replicate builds in OSS Bazel.
ghstack-source-id: 155215491
Test Plan: Manual test and rely on CI.
Reviewed By: dreiss
Differential Revision: D35815290
fbshipit-source-id: 89bacda373e7ba03d6a3fcbcaa5af42ae5eac154
(cherry picked from commit 1b808bbc94c939da1fd410d81b22d43bdfe1cda0)
Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710
IR builder class introduced to decouple the explicit usage of `TsNode` in core lazy tensors.
Requires https://github.com/pytorch/pytorch/pull/75324 to be merged in first.
**Background**
- there are ~ 5 special ops used in lazy core but defined as :public {Backend}Node. (DeviceData, Expand, Scalar...)
- we currently require all nodes derive from {Backend}Node, so that backends can make this assumption safely
- it is hard to have shared 'IR classes' in core/ because they depend on 'Node'
**Motivation**
1. avoid copy-paste of "special" node classes for each backend
2. in general decouple and remove all dependencies that LTC has on the TS backend
**Summary of changes**
- new 'IRBuilder' interface that knows how to make 5 special ops
- move 'special' node classes to `ts_backend/`
- implement TSIRBuilder that makes the special TS Nodes
- new backend interface API to get the IRBuilder
- update core code to call the builder
CC: @wconstab @JackCaoG @henrytwo
Partially fixes #74628
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75433
Approved by: https://github.com/wconstab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76129
Previously, quantized_max_pool2d_cudnn was made available to the
frontend through torch.ops.quantized.max_pool2d.
We improve the integration by also making it available through
torch.max_pool2d, which is made possible by registering
quantized_max_pool2d_cudnn in native_functions.yaml under
quantized_max_pool2d, which is called in max_pool2d.
Ideally and ultimately, we will get rid of the quantized_max_pool2d
registration in native_functions.yaml, and directly register
quantized_max_pool2d and quantized_max_pool2d_cudnn under max_pool2d,
but current support for quantized dispatch keys blocks us from doing so.
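A hedged sketch of the improved integration (shown on CPU; the cudnn path additionally needs a CUDA build with the experimental quantized cudnn support):
```
import torch

x = torch.randn(1, 3, 8, 8)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
out = torch.max_pool2d(qx, kernel_size=2)  # dispatches to the quantized kernel
print(out.shape, out.dtype)                # torch.Size([1, 3, 4, 4]) torch.quint8
```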
Test Plan:
```
python test/run_tests.py
```
Differential Revision: D35789078
Reviewed By: jerryzh168
Pulled By: dzdang
fbshipit-source-id: 5d8220255bfab663b4779b5d3c66dea9f79d8ee7
(cherry picked from commit c27164da29043f7dc9a4c27d24a93cd37162c23e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76333
The current PyTorch multi-head attention and transformer
implementations are slow. This should speed them up for inference.
ghstack-source-id: 154737857
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: cpuhrsch
Differential Revision: D35239925
fbshipit-source-id: 5a7eb8ff79bc6afb4b7d45075ddb2a24a6e2df28
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75869
ghstack-source-id: 154696012
Test Plan: Verified nothing uses this and relying on CI for confirmation.
Reviewed By: dreiss
Differential Revision: D35674694
fbshipit-source-id: c1d602aa4d85642594160a33606093c33817988f
(cherry picked from commit cac15ca941be298a692570491e96f2db6095e3c1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75868
This is unused in OSS and internally.
ghstack-source-id: 154696014
Test Plan: I manually verified it is unused and am relying on CI to confirm.
Reviewed By: dreiss
Differential Revision: D35674693
fbshipit-source-id: 945ec0590e9d939eab8944ae48bae72cb61e6261
(cherry picked from commit 01a29161b0a3b386078df3cd081358786a6d8f53)
Fixes https://github.com/pytorch/pytorch/issues/75464. Adds a context manager that will throw if the ops in the context are not fused.
API is :
```
with torch.jit.strict_fusion():
...
```
A few TODOs:
[+] Compose/figure out how to do with autodiff - right now it will run on autodiff as well
[+] Support all of the nvfuser operators that are added in guarding
[+] Figure out what to do with control flow that isn't taken (right now it will just error). This is probably a source of the original issue :/ - will just error
[+] (After those are figured out) add to docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75777
Approved by: https://github.com/davidberard98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76180
Provides string variables to let xla customize the generated code sufficiently to facilitate their migration onto LTC.
Some/all of these custom variables are expected to be short-lived for migration and eventually revert to using the original content that points to LTC functionality.
Test Plan: Imported from OSS
Reviewed By: huiguoo
Differential Revision: D35861778
Pulled By: wconstab
fbshipit-source-id: ef7aae55334628e2e7ff0c22e5c86ab95439256d
(cherry picked from commit 971f075e0c21804558f46c685508bd23daa42d4f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75605
Use case: Milan models have multiple backends and need to use static dispatch to save on static initialization time and to hit native functions directly from the unboxed APIs.
This change passes in List[BackendIndex] and adds the ability to generate code for multiple static backends with 1 or 0 kernels.
ghstack-source-id: 154525738
(Note: this ignores all push blocking failures!)
Test Plan:
Builds lite_predictor_flatbuffer with multiple backends
```
buck build --config pt.enable_lightweight_dispatch=1 --config pt.static_dispatch_backend=CPU,QuantizedCPU,CompositeExplicitAutograd //xplat/caffe2/fb/lite_predictor:lite_predictor_flatbuffer
```
Reviewed By: larryliu0820
Differential Revision: D35510644
fbshipit-source-id: f985718ad066f8578b006b4759c4a3bd6caac176
(cherry picked from commit a6999729c8cc26c54b8d5684f6585d6c50d8d913)
Summary:
Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710
Move shape cache implementation to the backend interface. Also, clean up some of the hashing logic in the base node class.
CC: wconstab JackCaoG henrytwo
Partially Fixes https://github.com/pytorch/pytorch/issues/74628
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75324
Reviewed By: anjali411
Differential Revision: D35730823
Pulled By: wconstab
fbshipit-source-id: cf6fa326319b9324e5f422a78817b6fb5bf7e9b8
(cherry picked from commit faec5043df56639e2fd23de2d91ae796e4f3df70)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75525
Creating injection point for ProfilerKineto to attach global callback. We'll disable the KinetoObserver via `'kineto.disable_libkineto_observer=1'` and enable this to swap out the implementations.
Test Plan:
1. add temporary logs in the stub + registration method
2. `buck build mode/opt //kineto/libkineto/fb/integration_tests:trace_tester --config 'kineto.disable_libkineto_observer=1' --config 'kineto.enable_libkineto_client=1'`
3. `./buck-out/gen/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 1000000` should see log for registration
4. `dyno gputrace` should see log for start/stop
Reviewed By: aaronenyeshi, robieta
Differential Revision: D35456304
fbshipit-source-id: c0a23a57181818e5a0ee495410163d90874355a9
(cherry picked from commit 5dfc723937356693fc041f5a011161e83a8d2528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76049
## Context
We are trying to add an out variant for an existing operator, e.g.,:
```
chunk.out(Tensor self, int chunks, int dim=0, *, Tensor(a!)[] out) -> Tensor(a!)[]
```
Notice the out argument is a mutable list of tensors. The existing guideline defined in [model.py](https://fburl.com/nn299ifx) requires the same argument type to be returned from this operator. However, we don't support a mutable tensor list as a return type, and it does not seem useful to add one.
The solution I'm proposing is to relax the constraint that the number of outs needs to match the number of returns, so we can return nothing:
```
chunk.out(Tensor self, int chunks, int dim=0, *, Tensor(a!)[] out) -> ()
```
Test Plan: Rely on existing CI
Reviewed By: ezyang, iseeyuan
Differential Revision: D35737310
fbshipit-source-id: 66b5738cc1dcd13d532a6c97fea979bd58f381df
(cherry picked from commit 9aac5493285cd4f49a07053edfa5916c449a930c)
Moves JIT shape function registration to Python. Like JIT decompositions, a script must be run after adding new definitions, which serializes them into a C++ file.
This was a request so that torch-mlir could define functions in python and upstream their shape functions. cc @silvasean @makslevental
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75546
Approved by: https://github.com/davidberard98
Sharding for linux-bionic-py3.7-clang9 previously included slow test times in the calculation for how long a test takes, causing the sharding to be uneven:
| Duration | Count | Name|
| ----------- | ----------- | ---|
| 11.2m | 221 |linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)|
| 1.1h | 218 | linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)|
Numbers taken from https://hud.pytorch.org/metrics from 04/10/2022 12:20 PM to 04/17/2022 12:20 PM.
The duration of these jobs on this PR are 39m and 38m.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75918
Approved by: https://github.com/seemethere, https://github.com/janeyx99
This pull request enables accumulating gradients for the CSR tensor.
Functions that work and are tested:
- tensor.abs()
- tensor.neg()
- tensor.conj_physical()
- torch.addmm
`torch.mm` also works, but tests will be added later.
In addition, this PR makes accessing strides, storage, and contiguity info on a CSR tensor throw an error.
`tensor.to_sparse_csr().to_sparse_csr()` was failing and now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75435
Approved by: https://github.com/cpuhrsch
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75808
Just as it is often difficult to write a single kernel that can handle both CPU and CUDA, so can it be difficult to do the same for NestedTensor.
ghstack-source-id: 154171542
(Note: this ignores all push blocking failures!)
Test Plan: CI?
Reviewed By: bdhirsh
Differential Revision: D35603836
fbshipit-source-id: fb0ebb19d34531ed96ce176aca325f8e2b5f90e6
(cherry picked from commit 0bcd753f93c04256c1b745f84a74ecccf0dceef5)
To do https://github.com/pytorch/pytorch/pull/75972 in a lint free
way I need to reformat all the imports (which are now incorrectly
indented). This is a pain to do manually, so I plan to ask black to
do it for me. But the files are not black compliant. So first reformat
everything with black.
This commit was generated with:
```
black tools/codegen
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76015
Approved by: https://github.com/bdhirsh
We would for some reason report formatting-based lints as showing up at
line 1 column 1. This removes them for now. Maybe eventually we can
recover better line numbers from the formatting diff and post messages
for each diff cluster, but that requires actual changes to the linting
engine.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75928
Approved by: https://github.com/janeyx99
This PR turns the previously introduced `ITensorList` into a more general `IList`
class. It is a container wrapper for arbitrary types (given their appropriate
implementations).
In summary, I have:
- Renamed `ITensorList` (its iterators and macros, for consistency) to `IList`
- Made `IList` a templated class (for an arbitrary type `T`), given that the type:
- Specializes `IListTagImpl<T, Tag>`, for all `IListTag`
- Introduced type aliases (for both list and iterator types):
- `at::ITensorList` -> `c10::IList<at::Tensor>`
- `at::IOptTensorRefList` -> `c10::IList<at::OptionalTensorRef>`
- Added support for `Tensor?[]` in the structured codegen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69606
Approved by: https://github.com/ezyang
I figured these out by unconditionally turning on a no-op torch function
mode on the test suite and then fixing errors as they showed up. Here's
what I found:
- _parse_to failed internal assert when __torch_function__'ed because it
claims its name is "to" to the argument parser; added a name override
so we know how to find the correct name
- Infix operator magic methods on Tensor did not uniformly handle
__torch_function__ and the conversion of TypeError to NotImplemented. Now, we always
do the __torch_function__ handling in
_wrap_type_error_to_not_implemented and your implementation of
__torch_function__ gets its TypeErrors converted to NotImplemented
(for better or for worse; see
https://github.com/pytorch/pytorch/issues/75462 )
- A few cases where code was incorrectly testing whether a Tensor was
Tensor-like now use is_tensor_like (in grad
and in distributions). Also update docs for has_torch_function to
push people to use is_tensor_like (see the sketch after this list).
- is_grads_batched was dropped from grad in handle_torch_function, now
fixed
- Report that you have a torch function even if torch function is
disabled if a mode is enabled. This makes it possible for a mode
to return NotImplemented, pass to a subclass which does some
processing and then pass back to the mode even after the subclass
disables __torch_function__ (so the tensors are treated "as if"
they are regular Tensors). This brings the C++ handling behavior
in line with the Python behavior.
- Make the Python implementation of overloaded types computation match
the C++ version: when torch function is disabled, there are no
overloaded types (because they all report they are not overloaded).
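A hedged illustration of the `is_tensor_like` check mentioned in the list above:
```
import torch
from torch.overrides import is_tensor_like

class MyTensorLike:
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        return NotImplemented

print(is_tensor_like(torch.randn(2)))    # True: a real Tensor
print(is_tensor_like(MyTensorLike()))    # True: defines __torch_function__
print(is_tensor_like(3.0))               # False: plain Python float
```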
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75484
Approved by: https://github.com/zou3519
The README instructions for the coverage tool should be fixed.
1. Some CMake options are not consistent with 'pytorch/CMakeLists.txt'.
- 'CODE_COVERAGE' should be 'USE_CPP_CODE_COVERAGE'.
- 'CMAKE_BUILD_CONFIG' should be 'CMAKE_BUILD_TYPE'.
2. Some arguments of 'oss_coverage.py' are incorrect.
- Neither '--interested-only' nor '--interested-folder' works. I guess both of them were meant to be '--interest-only'.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75091
Approved by: https://github.com/ezyang
Here's what native_dropout does:
- it randomly drops out elements of the input with probability p by
multiplying the input with a random mask
- it scales the output with `(p == 1 ? 0.0 : 1.0 / (1.0 - p))`
Further, native_dropout returns two things: the output and the mask
used.
Derivation of formula:
- dropout(x, mask) = mask * x * (p == 1 ? 0.0 : 1.0 / (1.0 - p))
- therefore the forward derivative with respect to `x` is: x_tangent * mask * (p == 1 ? 0.0 : 1.0 / (1.0 - p))
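A small sketch of the rule above (illustration only, not the actual native_dropout kernel or its autograd implementation):
```
def dropout_like(x, mask, p):
    # forward: mask the input and rescale so the expected value is preserved
    scale = 0.0 if p == 1 else 1.0 / (1.0 - p)
    return mask * x * scale

def dropout_jvp(x_tangent, mask, p):
    # forward-mode derivative w.r.t. x applied to the tangent
    scale = 0.0 if p == 1 else 1.0 / (1.0 - p)
    return x_tangent * mask * scale
```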
Test Plan:
- OpInfo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75288
Approved by: https://github.com/soulitzer
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74838
This matches the structure of internal builds.
ghstack-source-id: 153272856
Test Plan: Should be a no-op, rely on CI to validate.
Reviewed By: ezyang
Differential Revision: D35187899
fbshipit-source-id: 44b51145df54c41836149704d0d84d4a882f158e
(cherry picked from commit af48ae5e7dd6ea7d6dc3acfb367d76508f7d6b0c)