Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57689
* Older versions of libgnustl have issues with the `thread_local` C++ qualifier on Android devices prior to NDK r17. Use the c10::tls<> wrapper, which has smart-pointer semantics, in such cases.
* Convenient macro `C10_DEFINE_TLS_static` was added as well:
```
// Define static TLS variable str_tls_ of type std::string
C10_DEFINE_TLS_static(std::string, str_tls_);
//////// Exercise it ////////
{
*str_tls_ = "abc";
assert(str_tls_->length() == 3);
}
```
ghstack-source-id: 128233742
Test Plan: CI +
Reviewed By: ilia-cher
Differential Revision: D27875779
fbshipit-source-id: 7764f96ac1e121051c6ea66eabcedb9ef54d290e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57309
Addressing a race condition that can occur in `torch_shm_manager` between the time its temporary file is unlinked and when it `bind()`s the manager server socket to that same name. In that time window, other threads/processes can re-create another temporary file with the same name, causing `bind()` to fail with `EADDRINUSE`.
This diff introduces `c10::TempDir` and associated helper functions that mirror those of `c10::TempFile`. The manager socket name is now generated from a combination of a temporary directory, which remains valid for the lifetime of `torch_shm_manager`, and a well-known file name within that directory that is never used outside of `bind()`.
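For illustration, a minimal sketch of the idea (the helper names here are made up for this example and are not the code in this diff): because the uniquely named resource is the directory rather than the socket file, no other process can race us for the name that `bind()` creates.
```cpp
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Sketch: the directory lives for the lifetime of torch_shm_manager; the
// well-known file name inside it is only ever created by bind() below.
std::string make_manager_socket_path(const std::string& tempdir_path) {
  return tempdir_path + "/manager.sock";
}

int bind_manager_socket(const std::string& socket_path) {
  int fd = socket(AF_UNIX, SOCK_STREAM, 0);
  sockaddr_un addr{};
  addr.sun_family = AF_UNIX;
  // Assumes socket_path fits into sun_path; a real implementation would check.
  socket_path.copy(addr.sun_path, sizeof(addr.sun_path) - 1);
  // No prior unlink() is needed: the name is fresh because the directory is fresh.
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```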
Reviewed By: ejguan
Differential Revision: D28047914
fbshipit-source-id: 148d54818add44159881d3afc2ffb31bd73bcabf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57049
There was a comment above CUDAMultiStreamGuard which said "TODO: Implement this generically in c10". This is what I'm doing here.
The new generic MultiStreamGuard class takes a vector of device-agnostic c10::Streams and supports any device type (CUDA, but also ROCm and others) by using a VirtualGuardImpl. A class called CUDAMultiStreamGuard is still kept around, partly for convenience and partly for performance, as it avoids a vtable lookup.
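For illustration, a rough sketch of the device-agnostic shape of such a guard (using the usual c10 guard vocabulary; this is not the exact implementation in the diff):
```cpp
#include <vector>
#include <c10/core/Stream.h>
#include <c10/core/impl/VirtualGuardImpl.h>

// Sketch: for each stream, remember the stream that was current on its device
// and restore it on destruction. VirtualGuardImpl dispatches to the right
// backend at runtime via the stream's device type, so no CUDA-specific code
// is needed here.
class MultiStreamGuardSketch {
 public:
  explicit MultiStreamGuardSketch(const std::vector<c10::Stream>& streams) {
    original_streams_.reserve(streams.size());
    for (const auto& s : streams) {
      c10::impl::VirtualGuardImpl impl(s.device_type());
      original_streams_.push_back(impl.exchangeStream(s));
    }
  }
  ~MultiStreamGuardSketch() {
    for (const auto& s : original_streams_) {
      c10::impl::VirtualGuardImpl impl(s.device_type());
      impl.exchangeStream(s);
    }
  }

 private:
  std::vector<c10::Stream> original_streams_;
};
```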
ghstack-source-id: 127713139
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28029158
fbshipit-source-id: 2f3181371f8cb0d77a3b2e6aa510f1dd74e8f69b
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os


def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files


def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])


def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)


if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
That test was skipped due to a compiler bug. That bug should be fixed in 11.2, so we should re-enable the test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50227
Reviewed By: malfet
Differential Revision: D27909195
Pulled By: anjali411
fbshipit-source-id: c802702079d0e521f53fc98cd0fc3ded0c12b455
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55684
Upcoming changes to `MaybeOwned<T>` will require that T is
one of these two types and will have custom code for both.
This diff updates the tests to continue to build under these new
requirements; it is being sent separately to demonstrate that the
tests continue to work on the current implementation.
ghstack-source-id: 126405918
Test Plan: CI will run the rewritten tests.
Reviewed By: bhosmer
Differential Revision: D27630289
fbshipit-source-id: e38097d9ca04f3337cfa543ebcc8fb5d6916fcf3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55419
Turns out it's useful to have these. I chose to implement them in the straightforward safe way, rather than always borrowing.
ghstack-source-id: 126369328
Test Plan: Added more automated tests.
Reviewed By: hlu1
Differential Revision: D27545805
fbshipit-source-id: 84bb4458b86672ad340cc1f0aa18b80ca7ee13f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55244
Add the ability to move from the underlying object in a `MaybeOwned`.
FWIW, `MaybeOwned` is new territory for me personally and this move-and-dereference operation is even more so, but I think it makes sense and the tests pass.
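For illustration, a hedged sketch of what "move from the underlying object" can look like on a MaybeOwned-style wrapper (the names and layout below are illustrative, not the actual c10 implementation, which uses a union for storage):
```cpp
#include <utility>

template <typename T>
class MaybeOwnedSketch {
  bool isBorrowed_;
  const T* borrow_;  // valid when borrowed
  T own_;            // valid when owned (simplified; requires default-constructible T)
 public:
  explicit MaybeOwnedSketch(const T& t) : isBorrowed_(true), borrow_(&t) {}
  explicit MaybeOwnedSketch(T&& t)
      : isBorrowed_(false), borrow_(nullptr), own_(std::move(t)) {}

  const T& operator*() const& {
    return isBorrowed_ ? *borrow_ : own_;
  }

  // Move-and-dereference: only meaningful on an rvalue MaybeOwned. If we own
  // the value we can move it out cheaply; if we merely borrow, we must copy.
  T operator*() && {
    return isBorrowed_ ? *borrow_ : std::move(own_);
  }
};
```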
ghstack-source-id: 126170046
Test Plan: Added automated tests.
Reviewed By: bhosmer
Differential Revision: D27522809
fbshipit-source-id: 82b180031e93d725209b6328f656315c232e5237
Summary:
fix Semmle warning: Comparison of narrow type with wide type in loop condition
For example, consider the following piece of code:
for (int i = 0; i < array.size(); ++i) {}
The problem is that `array.size()` returns `size_t`, which can be a wider type than `int` depending on the implementation, so there is a chance that `i` overflows (for a very large array whose size is beyond the range of `int`) and the loop never terminates.
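A minimal illustration of the fix (using an index type that matches the container's size type; the exact replacement varies by call site):
```cpp
#include <vector>

void iterate(const std::vector<int>& array) {
  // The index type matches the container's size type, so there is no narrowing
  // comparison and no possibility of the loop counter overflowing.
  for (size_t i = 0; i < array.size(); ++i) {
    // ... use array[i] ...
  }
}
```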
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53951
Reviewed By: zou3519
Differential Revision: D27181495
Pulled By: malfet
fbshipit-source-id: 0612c5cedcdc656c193085e7fbb87dd163f20688
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53303
The old code did a heap allocation unnecessarily and was a
little convoluted. I think that it was structured that way to avoid
double-evaluating arguments; I just forced them to be evaluated once
as though they were passed to a function by binding const references
to them.
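For illustration, a hedged sketch of the "bind const references" trick (not the actual CAFFE_ENFORCE_* code): the macro arguments are evaluated exactly once when the references are initialized, and the comparison then uses the bound names.
```cpp
#include <stdexcept>
#include <string>

// Sketch only: evaluate x and y once by binding them to const references, as
// if they had been passed to a function, then compare the bound values.
#define ENFORCE_GE_SKETCH(x, y)                                   \
  do {                                                            \
    const auto& lhs_ = (x);                                       \
    const auto& rhs_ = (y);                                       \
    if (!(lhs_ >= rhs_)) {                                        \
      throw std::runtime_error(std::string("Enforce failed: ") +  \
                               #x " >= " #y);                     \
    }                                                             \
  } while (0)
```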
ghstack-source-id: 123918262
Test Plan:
1) `buck run mode/opt-clang //caffe2/caffe2/fb/tests:logging_bench`
Before:
```
============================================================================
caffe2/caffe2/fb/tests/logging_bench.cpp relative time/iter iters/s
============================================================================
glog_CHECK 2.01ns 498.63M
caffe2_ENFORCE_GE 50.00% 4.01ns 249.31M
glog_CHECK_GE 17.39% 11.53ns 86.73M
fbcode_ENFORCE 100.00% 2.01ns 498.65M
caffe2_ENFORCE 100.00% 2.01ns 498.63M
caffe2_ENFORCE_THAT 50.00% 4.01ns 249.33M
============================================================================
```
After:
```
============================================================================
caffe2/caffe2/fb/tests/logging_bench.cpp relative time/iter iters/s
============================================================================
glog_CHECK 2.01ns 498.63M
caffe2_ENFORCE_GE 97.44% 2.06ns 485.88M
glog_CHECK_GE 17.39% 11.53ns 86.73M
fbcode_ENFORCE 100.00% 2.01ns 498.65M
caffe2_ENFORCE 100.00% 2.01ns 498.65M
caffe2_ENFORCE_THAT 97.28% 2.06ns 485.06M
============================================================================
```
Looks like about a 1.94x speedup!
2) Inspect generated assembly for logging_bench.cpp before & after by:
```
$ compile-commands caffe2/caffe2/fb/tests/logging_bench.cpp -f "mode/opt-clang"
$ jq -r '.[0].arguments | sh' < compile_commands.json | sed -e "s/'-c'/'-S'/g" | sed -E -e "s/'-g[12]'/'-g0'/g" > out.sh
$ sh out.sh
```
Then diff logging_bench.s as you like.
Before: P255408666
After: P277883307
Net about 1500 lines deleted from the assembly. We can see that the
happy path (which the benchmark tests) no longer contains string
creation.
Reviewed By: dzhulgakov
Differential Revision: D26829714
fbshipit-source-id: 6e11f8ea29292ae3d9f2cc89d08afcb06f7d39c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53317
This seems like it might help in cases where we have to call
`Tensor::contiguous`, but we expect that the tensor in question will
be contiguous a good portion of the time.
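For illustration, a hedged sketch of the calling pattern MaybeOwned enables here (the helper name `borrow_or_copy_contiguous` is made up; it is not the API added by this diff):
```cpp
#include <ATen/ATen.h>
#include <c10/util/MaybeOwned.h>

// If the tensor is already contiguous (the common case here), just borrow it;
// only materialize a new contiguous tensor when we actually have to.
c10::MaybeOwned<at::Tensor> borrow_or_copy_contiguous(const at::Tensor& t) {
  if (t.is_contiguous()) {
    return c10::MaybeOwned<at::Tensor>::borrowed(t);
  }
  return c10::MaybeOwned<at::Tensor>::owned(t.contiguous());
}
```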
ghstack-source-id: 123203771
Test Plan:
Profiled AdIndexer on inline_cvr; time spent in
clip_ranges_gather_sigrid_hash_each_feature<int> was cut in half from
1.37% to 0.66%
Reviewed By: smessmer
Differential Revision: D26738036
fbshipit-source-id: b5db10783ccd103dae0ab3e79338a83b5e507ebb
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52221
The previous code forced a `std::string` to be created even when the default message or a user-provided string literal message was used. Now it's not forced and we don't need an outlined lambda in those cases either.
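For illustration, a hedged sketch of the idea (not the real TORCH_CHECK internals): keep the failure path as a call that accepts a `const char*` when the message is a string literal, so neither the happy path nor the call site has to build a `std::string`.
```cpp
#include <stdexcept>
#include <string>

// Literal message: no std::string is ever constructed.
[[noreturn]] void checkFailSketch(const char* msg) {
  throw std::runtime_error(msg);
}
// Composed message: the std::string is built lazily, only on failure.
[[noreturn]] void checkFailSketch(const std::string& msg) {
  throw std::runtime_error(msg);
}

#define CHECK_SKETCH(cond, msg) \
  do {                          \
    if (!(cond)) {              \
      checkFailSketch(msg);     \
    }                           \
  } while (0)
```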
ghstack-source-id: 121877056
Test Plan:
Compare assembly for
```
#include <c10/util/Exception.h>
void f(bool b) {
TORCH_CHECK(b, "message");
}
void g(bool b) {
TORCH_CHECK(b);
}
void h(bool b) {
TORCH_CHECK(b, "message", random());
}
```
before/after in fbcode optimized build.
Before: P174696735
After: P174696840
For `f()` and `g()`, we go from a call to an outlined lambda that did a bunch of `std::string` creation to a load of a string constant before calling `torchCheckFail`. This is a clear improvement.
For `h()`, results are mixed: we save a bunch of *extra* string goop in the outlined lambda and instead call `c10::detail::_str_wrapper` directly. This is good for overall size. However, we no longer outline the call to `random()`, which is less than ideal. I hope to recover the ability to fully outline the `random()` call in future diffs; this is just thorny enough that I don't want to cram even more into one diff.
Added automated test to make sure `TORCH_CHECK` and `TORCH_INTERNAL_ASSERT` only evaluate their arguments once.
Profiled AdIndexer mergenet benchmark in perf to check that `IValue::toTensor` is still getting inlined.
Reviewed By: bhosmer
Differential Revision: D26380783
fbshipit-source-id: 288860772423994ac739a8f33e2c09f718e8dd38
Summary:
libc++ implements csqrt using the polar form of the number, which results in higher numerical error if `arg` is close to 0, pi/2, pi, or 3pi/4.
Fixes https://github.com/pytorch/pytorch/issues/47500
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52018
Reviewed By: walterddr
Differential Revision: D26359947
Pulled By: malfet
fbshipit-source-id: 8c9f4dc45948cb29c43230dcee9b030c2642d981
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46414
For loops are often written with mismatched data types which causes silent type and sign coercion in the absence of integer conversion warnings. Getting around this in templated code requires convoluted patterns such as
```
for (auto i = decltype(var){0}; i < var; i++)
```
with this diff we can instead write
```
for (const auto i : c10::irange(var))
```
Note that this loop is type-safe and const-safe.
The function introduced here (`c10::irange`) allows for type-safety and const-ness within for loops, which prevents the accidental truncation or modification of integers and other types, improving code safety.
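A small usage example, assuming the common pattern of iterating up to a container's size:
```cpp
#include <vector>
#include <c10/util/irange.h>

void scale(std::vector<double>& v, double factor) {
  // i is deduced with the same type as v.size(), so there is no narrowing
  // comparison, and i cannot be accidentally modified inside the loop body.
  for (const auto i : c10::irange(v.size())) {
    v[i] *= factor;
  }
}
```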
Test Plan:
```
buck test //caffe2/c10:c10_test_0
```
Reviewed By: ngimel
Differential Revision: D24334732
fbshipit-source-id: fec5ebda3643ec5589f7ea3a8e7bbea4432ed771
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47508
This moves SizesAndStrides to a specialized representation
that is 5 words smaller in the common case of tensor rank 5 or less.
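For illustration, a hedged sketch of the kind of small-buffer representation being described (the field names and inline rank threshold are illustrative only, not the actual class):
```cpp
#include <cstdint>
#include <cstring>

// Sketch: ranks up to kInlineRank store sizes and strides inline, avoiding a
// heap allocation; larger ranks fall back to a heap buffer. The inline array
// packs sizes followed by strides.
class SizesAndStridesSketch {
  static constexpr size_t kInlineRank = 5;
  size_t rank_ = 1;
  union {
    int64_t inline_storage_[2 * kInlineRank];  // sizes[0..rank) then strides[0..rank)
    int64_t* out_of_line_storage_;
  };

 public:
  SizesAndStridesSketch() {
    std::memset(inline_storage_, 0, sizeof(inline_storage_));
  }
  bool isInline() const { return rank_ <= kInlineRank; }
  int64_t* sizes_data() {
    return isInline() ? inline_storage_ : out_of_line_storage_;
  }
  int64_t* strides_data() {
    return isInline() ? inline_storage_ + rank_ : out_of_line_storage_ + rank_;
  }
};
```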
ghstack-source-id: 119313560
Test Plan:
SizesAndStridesTest added in previous diff passes under
ASAN + UBSAN.
Run framework overhead benchmarks. Looks more or less neutral.
Reviewed By: ezyang
Differential Revision: D24772023
fbshipit-source-id: 0a75fd6c2daabb0769e2f803e80e2d6831871316
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47507
This introduces a new SizesAndStrides class as a helper for
TensorImpl, in preparation for changing its representation.
ghstack-source-id: 119313559
Test Plan:
Added new automated tests as well.
Run framework overhead benchmarks. Results seem to be neutral-ish.
Reviewed By: ezyang
Differential Revision: D24762557
fbshipit-source-id: 6cc0ede52d0a126549fb51eecef92af41c3e1a98
Summary:
All pretty minor. I avoided renaming `class DestructableMock` to `class DestructibleMock` and similar such symbol renames (in this PR).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49815
Reviewed By: VitalyFedyunin
Differential Revision: D25734507
Pulled By: mruberry
fbshipit-source-id: bbe8874a99d047e9d9814bf92ea8c036a5c6a3fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48911
This enables us to use hacky_wrapper_for_legacy_signatures for ops with out arguments so they can use templated unboxing logic without having to be rewritten.
This only actually enables it for one op as a proof of concept. There will be a separate PR enabling it for more ops.
ghstack-source-id: 118379659
Test Plan: waitforsandcastle
Reviewed By: bhosmer
Differential Revision: D25363336
fbshipit-source-id: da075d2cc58814f886a25d52652511dbbe990cec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47014
Some tests are better than zero tests.
ghstack-source-id: 115769678
Test Plan: Run new tests, passes
Reviewed By: smessmer
Differential Revision: D24558649
fbshipit-source-id: 50b8872f4f15c9a6e1f39b945124a31b57dd61d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46007
When the owner releases the object, the target becomes null and it is illegal to
access refcount_ again. This PR fixes this and returns null in that case.
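For illustration, a hedged sketch of the shape of the fix (a simplified weak-pointer lock; not the actual c10 code):
```cpp
#include <atomic>
#include <cstdint>

struct RefCountedSketch {
  std::atomic<int64_t> refcount_{0};
};

// Sketch: before touching refcount_, check whether the target still exists.
// If the owner already released it, there is nothing to lock; return null
// instead of reading through a dangling/null target.
RefCountedSketch* lock_sketch(RefCountedSketch* target) {
  if (target == nullptr) {
    return nullptr;  // owner already released the object
  }
  int64_t count = target->refcount_.load();
  do {
    if (count == 0) {
      return nullptr;  // object is being destroyed; cannot revive it
    }
  } while (!target->refcount_.compare_exchange_weak(count, count + 1));
  return target;
}
```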
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D24374846
Pulled By: wanchaol
fbshipit-source-id: 741074f59c0904a4d60b7bde956cad2d0925be4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44066
Add an STL input iterator to DispatchKeySet:
* The iterator iterates from the first non-undefined DispatchKey to NumDispatchKeys.
* The iterator is invalidated once the underlying DispatchKeySet is invalidated.
Note: see http://www.cplusplus.com/reference/iterator/ for a comparison of the
different iterator categories.
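For illustration, a hedged sketch of what iterating a bitset-backed key set looks like (simplified; the real class exposes a proper STL input iterator over DispatchKey values):
```cpp
#include <cstddef>
#include <cstdint>

enum class DispatchKeySketch : uint8_t { Undefined = 0, CPU, CUDA, XLA, NumKeys };

// Sketch: walk the set bits of the 64-bit representation, yielding each key
// from the first non-undefined DispatchKey up to NumKeys.
template <typename Fn>
void for_each_key(uint64_t repr, Fn fn) {
  for (uint8_t i = 1; i < static_cast<uint8_t>(DispatchKeySketch::NumKeys); ++i) {
    if (repr & (1ULL << (i - 1))) {  // bit (i - 1) corresponds to key i
      fn(static_cast<DispatchKeySketch>(i));
    }
  }
}
```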
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23611405
Pulled By: linux-jedi
fbshipit-source-id: 131b287d60226a1d67a6ee0f88571f8c4d29f9c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41610
Previously, operators that have a `Tensor?` (i.e. optional tensor) in their schema implemented it using `Tensor` in C++ and filled in an undefined tensor for the None case.
The c10 operator library, however, expects `Tensor?` to be represented as `optional<Tensor>`, so those operators couldn't be c10-full yet and still had to use codegenerated unboxing instead of templated unboxing.
This PR changes that. It extends the `hacky_wrapper_for_legacy_signatures` to not only take care of TensorOptions, but now also map between signatures taking `Tensor` and `optional<Tensor>`.
For this, it requires an additional template parameter, the expected signature, and it uses that to go argument-by-argument and unwrap any optionals it finds.
ghstack-source-id: 108873701
Test Plan: waitforsandcastle
Reviewed By: bhosmer
Differential Revision: D22607879
fbshipit-source-id: 57b2fb01a294b804f82cd55cd70f0ef4a478e14f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38999
Adds boxing for inplace and outplace kernels, itemizes
remaining unsupported cases, and fails compilation when
new unsupported types are introduced in op signatures.
Test Plan: Imported from OSS
Differential Revision: D21718547
Pulled By: bhosmer
fbshipit-source-id: 03295128b21d1843e86789fb474f38411b26a8b6
Summary:
This file should have been renamed as `complex.h`, but unfortunately, it was named as `complex_type.h` due to a name clash with FBCode. Is this still the case and is it easy to resolve the name clash? Maybe related to the comment at https://github.com/pytorch/pytorch/pull/39834#issuecomment-642950012
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39885
Differential Revision: D22018575
Pulled By: ezyang
fbshipit-source-id: e237ccedbe2b30c31aca028a5b4c8c063087a30f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39492
This PR adds use_c10_dispatcher: full to ops taking TensorOptions. To allow this, since the c10 operator library doesn't know about TensorOptions, we need to register the operator kernels as optional<ScalarType>, optional<Device>, optional<Layout>, optional<bool> instead, and also call them this way.
Changes:
- Add use_c10_dispatcher: full to those ops.
- Write hacky_wrapper_for_legacy_signatures, which takes an old-style kernel (i.e. one written to take TensorOptions) and creates a wrapper kernel for it that takes the scattered optional<ScalarType>, optional<Device>, optional<Layout>, optional<bool> instead.
- Change codegen so that all op registrations are wrapped into hacky_wrapper_for_legacy_signatures. This is added to all ops but is a no-op if the op doesn't take TensorOptions. This allows us in the future to just change a kernel signature from TensorOptions to the scattered version and have it work without having to touch codegen.
- Change codegen so that the frontend calls those operators with expanded arguments instead of with a TensorOptions object. This is required because now the kernels are written in this way.
This PR does not remove TensorOptions special cases from codegen; instead it separates kernels from the codegen/frontend issues. After this, kernels can be worked on separately without having to touch codegen, and codegen can be worked on without having to touch kernels.
Codegen diff: P133121032
ghstack-source-id: 106426630
Test Plan: waitforsandcastle
Differential Revision: D21581908
fbshipit-source-id: 6d4a9f526fd70fae40581bf26f3ccf794ce6a89e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39823
Add a compile-time function pointer that can be used to pass function pointers in template args.
This is very useful for metaprogramming function wrappers.
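For illustration, a hedged sketch of the pattern (the names below are illustrative, not the actual c10 API): wrapping a function pointer in a type lets it travel through template arguments and be recovered at compile time, so wrappers can be generated per-function rather than per-call.
```cpp
#include <type_traits>

template <class FuncType, FuncType* func_ptr>
struct CompileTimeFunctionPointerSketch {
  using FuncT = FuncType;
  static constexpr FuncType* func() { return func_ptr; }
};

#define FN_SKETCH(f) \
  CompileTimeFunctionPointerSketch<std::remove_pointer_t<decltype(f)>, f>()

// Example wrapped function and a wrapper that calls it through the type alone.
int add(int a, int b) { return a + b; }

template <class CTFP>
int call_twice(CTFP, int a, int b) {
  return CTFP::func()(a, b) + CTFP::func()(a, b);
}

// usage: call_twice(FN_SKETCH(add), 1, 2) == 6
```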
ghstack-source-id: 105944072
Test Plan: waitforsandcastle
Differential Revision: D21986243
fbshipit-source-id: a123571c18aa0e65908cbb131f28922ceb59061c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38418
This is useful in reducing verbosity in c10::complex's general usage, and potentially also offers
performance benefits.
This brings back #34506 (which was made for std::complex).
Differential Revision: D21587012
Test Plan: Imported from OSS
Pulled By: malfet
fbshipit-source-id: 6dd10c2f417d6f6d0935c9e1d8b457fd29c163af
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30922
New C++14 feature we can use now.
ghstack-source-id: 103767403
Test Plan: waitforsandcastle
Differential Revision: D18869644
fbshipit-source-id: 54541c8004b2116386668a31eb9b0410a603b7dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37101
Fixes #36954.
The basic concept is to streamline the process of rethrowing
c10::Error with extra error information. This is in a few
steps:
- I completely remodeled the Error data type and the internal
invariants. Instead of manually adding in newlines, the
message stack formatting process is responsible for inserting
newlines and spacing as necessary. Call sites are then
modified to respect the new API model.
- TORCH_RETHROW macro is added, which adds context to an error
message and then rethrows it.
New internal assert failure looks like:
```
0 INTERNAL ASSERT FAILED at ../c10/test/util/exception_test.cpp:64, please report a bug to PyTorch.
Exception raised from TestBody at ../c10/test/util/exception_test.cpp:64 (most recent call first):
frame #0: <unknown function> + 0x6aab9 (0x7ff611d3aab9 in /data/users/ezyang/pytorch-tmp/build/lib/libc10.so)
frame #1: ...
```
Error message with context looks like:
```
This is an error
This is context 1
This is context 2
```
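For illustration, a hedged sketch of how the rethrow-with-context pattern is meant to be used at a call site (`parse_file` and `load_checkpoint` are hypothetical helpers, not part of this diff):
```cpp
#include <string>
#include <c10/util/Exception.h>

void parse_file(const std::string& path);  // hypothetical helper that may throw c10::Error

void load_checkpoint(const std::string& path) {
  try {
    parse_file(path);
  } catch (c10::Error& e) {
    // Appends context to the existing error and rethrows it, so the final
    // message prints the original error followed by this line of context.
    TORCH_RETHROW(e, "while loading checkpoint from ", path);
  }
}
```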
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21202891
Pulled By: ezyang
fbshipit-source-id: 361cadd16bc52e5886dba08e79277771ada76169
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37094
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21202892
Pulled By: ezyang
fbshipit-source-id: d59e6bffabd90cc734056bdce2cd1fe63262fab8
Summary:
Issue: https://github.com/pytorch/pytorch/issues/35284
~This depends on and contains https://github.com/pytorch/pytorch/pull/35524. Please review after the dependency gets merged and I will rebase to get a clean diff.~
The implementation of most functions follows the pattern
```C++
template<typename T>
C10_HOST_DEVICE c10::complex<T> some_function(c10::complex<T> x) {
#if defined(__CUDACC__) || defined(__HIPCC__)
return static_cast<c10::complex<T>>(thrust::some_function(static_cast<thrust::complex<T>>(x)));
#else
return static_cast<c10::complex<T>>(std::some_function(static_cast<std::complex<T>>(x)));
#endif
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35725
Differential Revision: D21256854
Pulled By: ezyang
fbshipit-source-id: 2112ba6b79923450feafd7ebdc7184a3eaecadb6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31091
This implements a C++17 "if constexpr" like feature for C++14.
This can be used, for example, to replace SFINAE or to force the compiler to remove some parts of a function in the assembly based on a condition.
PRs stacked on top will use this to simplify some of our template metaprogramming.
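For illustration, a hedged sketch of the core trick (not the exact c10::guts API): dispatch on std::integral_constant so only the selected branch is instantiated, and hand the chosen callback an identity function so dependent expressions stay dependent.
```cpp
#include <type_traits>
#include <utility>

namespace sketch {

// The "identity" passed to the callbacks keeps expressions type-dependent, so
// the branch that is not taken is never instantiated for the current types.
struct Identity {
  template <class T>
  decltype(auto) operator()(T&& x) const { return std::forward<T>(x); }
};

template <class ThenCb, class ElseCb>
decltype(auto) if_constexpr_impl(std::true_type, ThenCb&& then_cb, ElseCb&&) {
  return std::forward<ThenCb>(then_cb)(Identity{});
}

template <class ThenCb, class ElseCb>
decltype(auto) if_constexpr_impl(std::false_type, ThenCb&&, ElseCb&& else_cb) {
  return std::forward<ElseCb>(else_cb)(Identity{});
}

template <bool Condition, class ThenCb, class ElseCb>
decltype(auto) if_constexpr(ThenCb&& then_cb, ElseCb&& else_cb) {
  return if_constexpr_impl(std::integral_constant<bool, Condition>{},
                           std::forward<ThenCb>(then_cb),
                           std::forward<ElseCb>(else_cb));
}

}  // namespace sketch
```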
ghstack-source-id: 102867141
Test Plan: unit tests
Differential Revision: D18927220
fbshipit-source-id: 19a135e00af6ebb0139ce3730353762d4512158f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36984
Follow the LOG(WARNING) format for C++-side warnings in order to play well with larger services, especially when using glog. I need to hook into glog internals a bit in order to override FILE/LINE without having to change the whole thing to be macros, but it seems to be stable between glog versions.
Note, this also changes caffe2_log_level to warning by default - I think it's a much better default when compiling without glog (or maybe even info).
With glog output, stderr capture doesn't work any more in tests. That's why we instead use c10-level warnings capture.
Test Plan:
Run unittest in both glog and non-glog build mode:
glog:
```
W0416 12:06:49.778215 3311666 exception_test.cpp:23] Warning: I'm a warning (function TestBody)
```
no-glog:
```
[W exception_test.cpp:23] Warning: I'm a warning (function TestBody)
```
Reviewed By: ilia-cher
Differential Revision: D21151351
fbshipit-source-id: fa926d9e480db5ff696990dad3d80f79ef79f24a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36768
Follow the LOG(WARNING) format for C++-side warnings in order to play well with larger services, especially when using glog. I need to hook into glog internals a bit in order to override FILE/LINE without having to change the whole thing to be macros, but it seems to be stable between glog versions.
Note, this also changes caffe2_log_level to warning by default - I think it's a much better default when compiling without glog (or maybe even info)
Test Plan:
Run unittest in both glog and non-glog build mode:
glog:
```
W0416 12:06:49.778215 3311666 exception_test.cpp:23] Warning: I'm a warning (function TestBody)
```
no-glog:
```
[W exception_test.cpp:23] Warning: I'm a warning (function TestBody)
```
Reviewed By: ilia-cher
Differential Revision: D21078446
fbshipit-source-id: b5d36aac54d6b6295a72de6754696ccafbcb84ca
Summary:
Step 0 of https://github.com/pytorch/pytorch/issues/35284
Reference: https://en.cppreference.com/w/cpp/numeric/complex
We are targeting C++20. The differences across C++ versions are mostly `constexpr` qualifiers; newer versions have more functions declared as `constexpr`.
This PR adds the core of `c10::complex`, it includes
- standard constructors as in `std::complex`
- explicit conversion constructors converting from `std/thrust::complex` to `c10::complex`
- standard assignment operators as in `std::complex`
- conversion assignment operators converting from `std/thrust::complex` to `c10::complex`
- other standard operators as in `std::complex`
- standard methods as in `std::complex`
- explicit casting operators to std/thrust
- basic non-member functions as in `std::complex`:
  - arithmetic operators
  - `==`, `!=`
  - `<<`, `>>`
  - `std::real`, `std::imag`, `std::abs`, `std::arg`, `std::norm`, `std::conj`, `std::proj`, `std::polar`
- Some of them are intentionally not completely implemented; these are marked as `TODO` and will be implemented in the future.
This PR does not include:
- overload of math functions
which will come in the next PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35524
Differential Revision: D21021677
Pulled By: anjali411
fbshipit-source-id: 9e144e581fa4b2bee62d33adaf756ce5aadc0c71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33313
Instead of just remembering the number of arguments and iterating over the stack,
the DispatchKeyExtractor now remembers the exact locations of the dispatch relevant arguments
(i.e. Tensor arguments) and only looks at those.
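For illustration, a hedged sketch of the idea (not the actual DispatchKeyExtractor): precompute a bitmask of which argument positions are Tensors, then union dispatch keys only from those stack slots.
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch type standing in for the real IValue / DispatchKeySet.
struct ValueSketch {
  uint64_t tensor_key_bits;  // empty if the value is not a tensor
};

class KeyExtractorSketch {
  uint64_t tensor_arg_mask_;  // bit i set => argument i is dispatch-relevant
 public:
  explicit KeyExtractorSketch(uint64_t tensor_arg_mask)
      : tensor_arg_mask_(tensor_arg_mask) {}

  // Only the precomputed Tensor positions are inspected; all other arguments
  // on the stack are skipped entirely.
  uint64_t extract(const std::vector<ValueSketch>& stack) const {
    uint64_t keys = 0;
    uint64_t mask = tensor_arg_mask_;
    while (mask != 0) {
      size_t index = __builtin_ctzll(mask);  // lowest set bit (GCC/Clang builtin)
      keys |= stack[index].tensor_key_bits;
      mask &= mask - 1;                      // clear that bit
    }
    return keys;
  }
};
```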
ghstack-source-id: 101908386
Test Plan: unit tests, benchmarks
Differential Revision: D19748549
fbshipit-source-id: b5b9ff2233b3507e0b600460f422912cfa9e3f0f
Summary:
Introduce the DISABLED_ON_WINDOWS macro, which adds a `DISABLED_` prefix to the string if compiled for Win32.
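The macro presumably looks something like the following sketch (written from the description above, not copied from the diff):
```cpp
#include <gtest/gtest.h>

// On Win32 builds, prefix the test name with DISABLED_ so googletest skips it;
// elsewhere the name passes through unchanged.
#ifdef _WIN32
#define DISABLED_ON_WINDOWS(TestName) DISABLED_##TestName
#else
#define DISABLED_ON_WINDOWS(TestName) TestName
#endif

// usage: TEST(MyFixture, DISABLED_ON_WINDOWS(UsesUnixSockets)) { /* ... */ }
```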
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35549
Test Plan: CI
Differential Revision: D20700915
Pulled By: malfet
fbshipit-source-id: adddfe2db89b7139093ceef6899862bce0adcf2d
Summary:
Fixes incorrect usages of symbol annotations including:
1. Exporting or importing a function/class in an anonymous namespace.
2. Exporting or importing a function/class implementation in a header file. However, by removing the symbol annotations, they are now local symbols. If they need to remain global, I can move the implementations to the source file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35364
Differential Revision: D20670031
Pulled By: ezyang
fbshipit-source-id: cd8018dee703e2424482c27fe9608e040d8105b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32728
It doesn't have much to do with tensors anymore.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19628093
Pulled By: ezyang
fbshipit-source-id: 4d57111cdf44ba347bec8a32bb5b4b47a83c1eaf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31313
This is a bugfix. The reason we couldn't enable the constexpr-ness for it before is that it was buggy,
and without constexpr it crashed at runtime rather than failing at compile time, which unfortunately seems to have slipped past our CI...
ghstack-source-id: 96380160
Test Plan: Now it works even when enabling constexpr for it
Differential Revision: D19087471
fbshipit-source-id: 28be107389f4507d35d08eab4b089a405690529b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31917
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19301480
Pulled By: ezyang
fbshipit-source-id: fcce8868733965b9fbd326b4ec273135759df377
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30916
These macros said "make it constexpr if we're in C++14". Since we're now always C++14, we can just say "constexpr" instead.
ghstack-source-id: 96369584
Test Plan: waitforsandcastle
Differential Revision: D18869635
fbshipit-source-id: f41751e4e26fad6214ec3a98db2d961315fd73ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915
Since we now have C++14, we don't need these c10::guts helpers anymore
ghstack-source-id: 95777609
Test Plan: waitforsandcastle
Differential Revision: D18869639
fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26618
Implement a mechanism to get type names at compile time.
In a future diff, I'm planning to introduce this to caffe2::TypeMeta and a few other places.
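For illustration, a hedged sketch of the usual trick (not the exact c10 implementation): carve the type name out of `__PRETTY_FUNCTION__`, which is usable in constexpr context on GCC 5+ and Clang.
```cpp
#include <cstddef>

struct StringViewSketch {
  const char* data;
  size_t size;
};

// Sketch: the pretty-printed signature of this function template contains the
// instantiated type name; a real implementation slices off the compiler-specific
// prefix and suffix. Here we just expose the whole string for simplicity.
template <typename T>
constexpr StringViewSketch type_name_sketch() {
#if defined(__clang__) || defined(__GNUC__)
  constexpr const char* sig = __PRETTY_FUNCTION__;
  size_t len = 0;
  while (sig[len] != '\0') {
    ++len;
  }
  return {sig, len};
#else
  return {"unsupported compiler", 20};
#endif
}
```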
ghstack-source-id: 95337871
Test Plan: unit tests
Differential Revision: D17519253
fbshipit-source-id: e14017f962fd181d147accb3f53fa8d6ee42a3f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30709
intrusive_ptr doesn't provide an explicit incref method. When users want to
incref the target, they create an intrusive_ptr to wrap the target, then make
a copy (which does the actual incref), then release both the first intrusive_ptr
and the copy to prevent a decref at destruction time. This is very
inefficient. Instead, do the incref/decref directly.
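For illustration, a hedged before/after sketch of the workaround being removed (the direct-incref helper name below is illustrative only):
```cpp
#include <c10/util/intrusive_ptr.h>

// Before (sketch): incref by round-tripping through intrusive_ptr.
// A temporary wrapper is reclaimed, copied (the copy does the incref), and
// then both are released so neither decrefs on destruction.
template <class T>
void incref_via_wrappers(T* target) {
  auto wrapper = c10::intrusive_ptr<T>::reclaim(target);
  auto copy = wrapper;  // the actual incref happens here
  wrapper.release();    // give ownership back without a decref
  copy.release();       // keep the extra reference alive
}

// After (sketch): bump the count directly, with no temporaries.
template <class T>
void incref_directly(T* target) {
  c10::raw::intrusive_ptr::incref(target);
}
```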
Differential Revision: D18798505
fbshipit-source-id: 524d4f30d07d733df09d54423b044d80e4651454
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26616
Implement C++17 std::string_view for C++11.
This is useful for compile-time type name retrieval, which I'm going to stack on top of this.
It is also useful for replacing `const std::string&` throughout our codebase.
ghstack-source-id: 92100314
Test Plan: unit tests
Differential Revision: D17518992
fbshipit-source-id: 48e31c677d51b0041f4b37e89a92bd176d4a0b08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26502
Create type ids at compile time instead of incrementing a counter at runtime. This is done by computing a compile time crc64 on the type name. We couldn't do this before, because we still used GCC4 and that compiler didn't support the use of `__PRETTY_FUNCTION__` in a constexpr context. However, since GCC5 this is possible and we can use this trick.
This does not change the semantics of preallocated type ids. I actually think we don't need to preallocate anymore, but I split the removal of preallocation into a separate diff to be able to test it separately.
ghstack-source-id: 91896920
Test Plan: unit tests
Differential Revision: D17488861
fbshipit-source-id: ce7b059d7c8686b69cb091a4a8beaf4b96391343
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25675
This will be used to support OrderedDict in python. Modifies the existing `flat_hash_map` to preserve insertion and deletion order.
Test Plan: Imported from OSS
Differential Revision: D17440131
Pulled By: eellison
fbshipit-source-id: c7a6a290c8471627f5a061c0cca8e98ff131c9b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25597
We now take advantage of the new bitset representation TensorTypeSet to store "Variable-ness" of a tensor directly in the dispatch key. We introduce a new thread local TensorTypeSet "excluded" and replace the previous thread local boolean with it; we no longer have to query `is_variable()` to do dispatch (I didn't delete `is_variable`, because there are still a lot of uses of it). The key change is in `dispatchTypeId`.
Knock-on effects:
* Because Variable is now a TensorTypeId, I can eliminate the out-of-line registration `registerVariableOp` for variables; instead, make the registrar take a TensorTypeId (instead of a Backend) and you just register under the Variable key.
* Tensors aren't really ever created with Variable information initialized correctly at the start; instead, a tensor "becomes" a Variable because we set its `autograd_meta_`. These setters now correctly setup invariants on the dispatch type set. The new invariant is that if `autograd_meta_ != nullptr`, then `type_set().has(TensorTypeId::VariableTensorId)`.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17265919
Pulled By: ezyang
fbshipit-source-id: a90a7ed14f5cb1086137483ae3d0646fcd4c42d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25308
Instead of storing a single TensorTypeId in a Tensor, we store a bitset of tensor type IDs in a Tensor, TensorTypeSet. This class comes with some unit tests. This is in preparation for making Variable a TensorTypeId. In order to help flush out places where this makes a semantic difference, we rename `Tensor::type_id()` to `Tensor::type_set()` and smoke out all of the locations where this was semantically meaningful.
Because the new tensor type set is 64-bits, this increases the size of Tensor by a word.
Listing of semantic changes:
* Many TensorImpl related constructors just propagate TensorTypeId to a parent constructor. These are pretty simple to adjust.
* Backend extensions are now in the business of explicitly constructing a TensorTypeSet and then passing it in. This is probably OK for now but when Variable drops, these dispatch IDs may get immediately overwritten to have Variable set.
* `sparseTensorSetToDeviceType` and similar functions previously did an equality test with TensorTypeId, to determine what an appropriate device type is. This equality is now replaced with a set inclusion test. This is valid, under the assumption that we don't ever have weird sets like "this tensor is simultaneously a sparse CPU tensor and a sparse CUDA tensor", which will be true in the short term plan of adding Variable to the dispatch ID.
* `impl::dispatchTypeId` was generally introduced for cases where we legitimately need to convert from `TensorTypeSet -> TensorTypeId` in a dispatch related manner. At the moment, the implementation is trivial, but they will soon be adjusted to handle TLS. I've tried to make these call sites as forwards compatible as possible:
* `checked_tensor_unwrap` and co now use `dispatchTypeId`. When Variable is added to the type set, these will always be called in a context where the Variable type ID is disabled, so we will get the correct underlying tensor type ID.
* Uses of `Backend` in dispatch are now replaced with `TensorTypeSet`. The general heuristic here for whether or not to accept a `TensorTypeId` or `TensorTypeSet` is that we want to make the generated code as simple as possible. It is easier to retrieve a `TensorTypeSet`, so that's a more appropriate API in these cases.
* In some cases, I could not conveniently switch an implementation to the new semantics, because it was blocked on some other refactor. In this case, I introduced `legacyExtractTypeId`, which gives what would be a BC-compatible `TensorTypeSet` to `TensorTypeId` implementation that will continue to report the same values it would have prior to this change. This is **different** from `dispatchTypeId`, because this function does NOT respect TLS; it always ignores Variable type IDs.
* c10 dispatcher tests, which are oblivious to Variable dispatch, use this BC function (actually, they use `extractTypeId`, an overload for Tensor).
* The implementation of `new_*` methods heavily relies on tensor type ID, I chose not to unwind this. PR to refactor this at https://github.com/pytorch/pytorch/pull/25475
* Slicing also relies on tensor type ID, see `torch/csrc/autograd/python_variable_indexing.cpp` (though in some cases in this file, I was able to replace use of tensor type ID with TensorOptions)
* In some cases, there is an equality test on tensor type ID which would be better done by testing "tensor axes". In those cases, I replaced those equality tests with more equality tests.
* Example: `torch/csrc/nn/type_checks.h`
* There is a total punt in `torch/csrc/tensor/python_tensor.cpp` where "instance of" checking is done via dispatch ids. In general, the Variable-ness of a tensor doesn't participate in instanceof testing. It's not entirely clear what to do here.
* Instead of storing `Backend` in `VariableInfo`, we now just store Layout.
c10 dispatcher test updates were done with:
```
:%s/\([^ ]\+\)\.type_id()/extractTypeId(\1)/g
:%s/\([^( ]\+\)->type_id()/extractTypeId(*\1)/g
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25308
Differential Revision: D17092791
Test Plan: sandcastle and ossci
Reviewed By: bwasti
Pulled By: ezyang
fbshipit-source-id: 22207d14fe62dd31ee19cc5011af22e3d9aabb5b
Summary:
Enabled torch.nn.functional.log_softmax and torch.nn.CrossEntropyLoss for bfloat16 data type.
In order to do that, the following dependencies have to be enabled.
- RNE (round to nearest even)
- AccumulateType
- bfloat16 arithmetic operator overload
Also, we implement full std::numeric_limits support for the bfloat16 data type.
Background for the dependencies:
- RNE vs truncate
From torch.nn.CrossEntropyLoss test. input_size=(128, 1000)
RNE result:
float output: tensor(7.3981, dtype=torch.float32, grad_fn=<NllLossBackward>)
bfloat16 output: tensor(7.3125, dtype=torch.bfloat16, grad_fn=<NllLossBackward>)
truncate result:
float output: tensor(7.3981, dtype=torch.float32, grad_fn=<NllLossBackward>)
bfloat16 output: tensor(5.8750, dtype=torch.bfloat16, grad_fn=<NllLossBackward>)
- scalar_t vs AccumulateType (AccumulateType of bfloat16 is float)
AccumulateType is essential to keep accuracy, especially for reduction-related operations.
We have verified this with both local test cases and a real topology. It turns out that a bfloat16 accumulator would cause a huge relative error (even more than 50%) when the number of elements is large.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24457
Differential Revision: D17113018
Pulled By: ezyang
fbshipit-source-id: 8d61297ca118f9b5c6730a01efcf3a3704d2f206
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25252
Our model going forward for extensions will be that you will have to
get an allocation of an ID in our system. This is how things work
in practice today; we're just simplifying our underlying registration
since there is no need to have distributed registration.
There are some codemods in this diff:
```
codemod --extensions cpp,h,cc,cuh,py,in --exclude-paths=c10/core/TensorTypeId.h '([A-Za-z]+?)TensorId\(\)' 'TensorTypeId::\1TensorId'
codemod --extensions cpp,h,cc,cuh,py,in 'TensorTypeIds::undefined\(\)' 'TensorTypeId::UndefinedTensorId'
codemod --extensions cpp 'TensorType1\(\)' 'TensorTypeId::CPUTensorId'
codemod --extensions cpp 'TensorType2\(\)' 'TensorTypeId::CUDATensorId'
codemod --extensions cpp 'TensorType3\(\)' 'TensorTypeId::XLATensorId'
codemod --extensions cpp 'TensorType1' 'CPUTensorId'
codemod --extensions cpp 'TensorType2' 'CUDATensorId'
codemod --extensions cpp 'TensorType3' 'XLATensorId'
```
The main hand-written changes are in c10/core/TensorTypeId.h
Other manual fixes:
- aten/src/ATen/core/op_registration/op_registration.cpp - stop using
std::string operator+
- aten/src/ATen/function_wrapper.py - handle a hardcoded TypeId() that
wasn't caught by codemod
- torch/csrc/tensor/python_tensor.h - fix now incorrect forward declaration
of TensorTypeId
- aten/src/ATen/core/op_registration/ - remove out-of-line registration
Differential Revision: D17072001
Test Plan: ossci and sandcastle
Pulled By: ezyang
fbshipit-source-id: c641515fd0604c045c54fbb1d6b1b950f45e89d1
Summary:
Enable add, sub, mul, and div on CPU for the bfloat16 type.
Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22851
Differential Revision: D16256757
Pulled By: izdeby
fbshipit-source-id: 8b62f7581fc0ca0d2cff48ab40d877a9fcf70a5b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21809
Many error messages show dispatch keys, for example when the dispatcher didn't find a kernel to dispatch to.
Previously, this was a string like "CPU" or "CUDA" for known backends and just an arbitrary number for other backends.
Now, tensor type id registration also registers a name for the dispatch key and shows that in the error messages.
There is no API change, just the error messages are better now.
Differential Revision: D15835809
fbshipit-source-id: 4f0c9d0925c6708b02d79c653a2fae75b6623bb9