Summary:
A [comment](https://github.com/pytorch/pytorch/pull/62445/files#r680132022) claims it was added for consistency with the top level `CMakeLists.txt`, but `-Wno-unused-variable` is not mentioned there.
Fix violations in the 50+ files that were added in the interim, either by removing unused variables or by decorating the code with `C10_UNUSED` when a local variable is likely used to extend an object's lifetime until the end of the block.
This warning suppression caused a preventable revert in https://github.com/pytorch/pytorch/pull/72633#issuecomment-1092300787
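A minimal sketch of the `C10_UNUSED` lifetime pattern described above (illustrative code, not a hunk from the diff; `bump` and `g_mu` are hypothetical):
```
// C10_UNUSED (from c10/macros/Macros.h) silences -Wunused-variable for a
// local that is never read but whose destructor must run at end of scope.
#include <c10/macros/Macros.h>
#include <mutex>

std::mutex g_mu;
int g_counter = 0;

void bump() {
  // The guard is not referenced again, but its lifetime protects g_counter.
  C10_UNUSED std::lock_guard<std::mutex> guard(g_mu);
  ++g_counter;
}
```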
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75538
Reviewed By: anjali411
Differential Revision: D35747333
Pulled By: malfet
fbshipit-source-id: 3fc5828e44a4c05ba0e89e92613e6ebbdb260626
(cherry picked from commit c179fba21cfa2a0093fad50ccad5a22dd7cff52c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74169
Alias DB was being way too conservative about the semantics of exported Caffe2 ops - it thought some pure functions were writing to their inputs, which caused `ReplaceWithMaybeCopy` to fail. This in turn led to a huge decrease in out-variant coverage and regressions in many models.
I've extended the export macro to let the user specify an `AliasAnalysisKind` and marked all of the quantization compression ops as pure functions.
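The extended Caffe2 export macro itself is internal; for reference, marking an op as a pure function looks roughly like this sketch using the generic `TORCH_LIBRARY` registration API (the namespace, schema, and kernel here are made up):
```
#include <torch/library.h>

// AliasAnalysisKind::PURE_FUNCTION tells Alias DB that the op neither
// reads from nor writes to any aliased memory.
TORCH_LIBRARY(mynamespace, m) {
  m.def(torch::schema(
      "decompress(Tensor input) -> Tensor",
      c10::AliasAnalysisKind::PURE_FUNCTION));
  m.impl("decompress", [](const at::Tensor& input) { return input; });
}
```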
ghstack-source-id: 151394133
Reviewed By: hlu1
Differential Revision: D34733630
fbshipit-source-id: e968812e052f14261c10f9a280abe1d910de1f2f
(cherry picked from commit 5e9de49b98caff57be13e8bd101144ae2475b6b5)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71196
`caffe2` headers contain code that can elicit warnings when built with strict compiler flags. Rather than force downstream/consuming code to weaken their compiler flags, suppress those warnings in the header using `#pragma clang diagnostic` suppressions.
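The suppression in question is the standard clang pragma sandwich; a generic sketch (the warning and function here are illustrative, not from the diff):
```
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wshadow"
inline int header_example(int x) {
  int total = 0;
  for (int i = 0; i < x; ++i) {
    int x = i;  // would trip -Wshadow in strict downstream builds
    total += x;
  }
  return total;
}
#pragma clang diagnostic pop
```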
Test Plan: CI Pass
Reviewed By: malfet
Differential Revision: D33536233
fbshipit-source-id: 74404e7a5edaf244f79f7a0addd991a84442a31f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66746
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable suppressions were added by hand.
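A before/after sketch of the rewrite (assuming a zero-based loop; `c10::irange` lives in c10/util/irange.h):
```
#include <c10/util/irange.h>
#include <cstddef>

int sum_before(const int* data, size_t n) {
  int total = 0;
  for (size_t i = 0; i < n; i++) {  // old style
    total += data[i];
  }
  return total;
}

int sum_after(const int* data, size_t n) {
  int total = 0;
  for (const auto i : c10::irange(n)) {  // new style
    total += data[i];
  }
  return total;
}
```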
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705361
fbshipit-source-id: 33fd22eb03086d114e2c98e56703e8ec84460268
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67436
This information is useful for comparing static runtime to Caffe2.
Reviewed By: d1jang
Differential Revision: D31991571
fbshipit-source-id: eb83bc4564b05d56fb9a550863eea3f6312f3f6c
Summary:
This PR is to update PyTorch with the following cub changes:
- Starting with cub 1.13.1, cub requires users to define `CUB_NS_QUALIFIER` if `CUB_NS_PREFIX` is also defined. Besides that, a new mechanism, `CUB_WRAPPED_NAMESPACE`, is added.
Accordingly, I make the following changes to PyTorch:
- Starting with CUDA 11.5, define `CUB_WRAPPED_NAMESPACE` globally as an nvcc flag.
- Fix caffe2 failures caused by the above change.
- Add an `aten/src/ATen/cuda/cub_definitions.cuh` header that defines helper macros about feature availability (see the sketch below).
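A hedged sketch of what the wrapped namespace does and what such a helper macro might look like (the macro name here is illustrative, not the one in `cub_definitions.cuh`):
```
// With the nvcc flag -DCUB_WRAPPED_NAMESPACE=at_cuda_detail, all of cub
// is wrapped in that namespace, so its symbols live under
// ::at_cuda_detail::cub and cannot collide with another copy of cub.

// Feature-availability helper in the spirit of cub_definitions.cuh:
#if defined(CUDA_VERSION) && CUDA_VERSION >= 11050
#define MY_CUB_WRAPPED_NAMESPACE_AVAILABLE() 1
#else
#define MY_CUB_WRAPPED_NAMESPACE_AVAILABLE() 0
#endif
```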
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66219
Reviewed By: bdhirsh
Differential Revision: D31626931
Pulled By: ngimel
fbshipit-source-id: 97ebf5ef671ade8bf46d0860edc317f22660f26d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable suppressions were added by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345
`FooType::get()` can return a const reference. Inconveniently, converting `shared_ptr<FooType>` to `shared_ptr<Type>` requires a copy and a refcount bump, so to properly take advantage of this in `unshapedType()` we need to take a `const Type&` in `isSubtypeOf()`, which is good practice anyway: don't require a `shared_ptr` if you don't need to take ownership.
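A minimal sketch of the signature direction (heavily simplified; the real hierarchy is `c10::Type` and the subtype logic is much richer):
```
#include <memory>

struct Type {
  virtual ~Type() = default;
  // Takes a const reference instead of shared_ptr<Type>: callers that
  // merely inspect a type pay no copy or atomic refcount bump.
  bool isSubtypeOf(const Type& rhs) const { return this == &rhs; }
};

bool check(const std::shared_ptr<Type>& a, const Type& b) {
  return a->isSubtypeOf(b);  // no shared_ptr conversion needed
}
```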
ghstack-source-id: 140044165
Test Plan:
CI
perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.
Reviewed By: hlu1
Differential Revision: D31027361
fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66113
For a benchmark compiled in opt-mode, in which the lookup items were shuffled and then looked up in round-robin fashion 10M times (for a total of 140M lookups), we see:
```
Function            Container            Time (ms)   Multiplier
TypeMetaToDataType  if-chain                   233           1x
TypeMetaToDataType  std::vector                795        3.41x
TypeMetaToDataType  std::map                  1566        6.72x
TypeMetaToDataType  std::unordered_map        2136        9.17x
DataTypeToTypeMeta  switch                     102           1x
DataTypeToTypeMeta  std::vector                666        6.53x
DataTypeToTypeMeta  std::map                  1212        11.9x
DataTypeToTypeMeta  std::unordered_map        1539        15.1x
DataTypeToTypeMeta  folly::F14FastMap         1789        17.5x
```
From this, we draw two conclusions:
1. Using a complex container like `std::map` is worse than using a simple vector lookup here (there aren't enough items for the Big-O to assert itself).
2. Using any container at all is a mistake. (Unless we pull in more exotic reasoning like invalidating the code cache or preventing inlining.)
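A sketch of the container-free approach (illustrative names, not the actual caffe2 mapping code):
```
#include <cstddef>
#include <cstdint>

enum class DataType : int32_t { FLOAT, DOUBLE, INT32, UNDEFINED };

// A plain switch compiles to a compare chain or jump table: no hashing,
// no heap-allocated nodes, and it inlines cleanly at call sites.
constexpr size_t ItemSize(DataType dt) {
  switch (dt) {
    case DataType::FLOAT:  return 4;
    case DataType::INT32:  return 4;
    case DataType::DOUBLE: return 8;
    default:               return 0;
  }
}
```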
Test Plan: Sandcastle
Reviewed By: dzhulgakov
Differential Revision: D31375117
fbshipit-source-id: 0b310c6c2e94080d125c82fb7c2b43ab869adbcb
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`
Delete a number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variables in range loops (see the sketch below)
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants
Do not delete `caffe2::OperatorBase::Output` calls as they have side effects
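A minimal sketch of the range-loop suppression (assuming `c10::irange`):
```
#include <c10/util/irange.h>

int count_to(int n) {
  int total = 0;
  for (const auto i : c10::irange(n)) {
    (void)i;  // loop index intentionally unused; silences -Wunused-variable
    ++total;
  }
  return total;
}
```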
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041
Reviewed By: ngimel
Differential Revision: D31360142
Pulled By: malfet
fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`
Delete a number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variables in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954
Reviewed By: ngimel
Differential Revision: D31326599
Pulled By: malfet
fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610
- Replace HIP_PLATFORM_HCC with USE_ROCM (a before/after sketch follows below)
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.
- In the next PR:
  - Remove the mapping from CUDA_VERSION to HIP_VERSION and from CUDA to HIP in hipify.
  - HIP_PLATFORM_HCC is deprecated, so add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.
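A hedged before/after sketch of the guard replacement (illustrative code, not a hunk from the diff):
```
// Before: keyed off the deprecated platform macro.
// #ifdef __HIP_PLATFORM_HCC__
//   ... ROCm-only code ...
// #endif

// After: keyed off the build-system flag.
#ifdef USE_ROCM
// ... ROCm-only code ...
#endif
```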
cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd
Reviewed By: jbschlosser
Differential Revision: D30909053
Pulled By: ezyang
fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612
This makes Tensor inherit from a new class TensorBase that provides a subset of Tensor that doesn't
directly depend on native_functions.yaml. Code that only includes TensorBase.h will thus not need to
be rebuilt every time someone changes an operator signature.
Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable
with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to
minimize friction in code mixing the two types.
To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error
into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build
system for certain folders, or just define it at the top of any file.
I've also included an example of manually special-casing the commonly used `contiguous` operator.
The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in
`Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can
materialize a `Tensor` for use in dispatch without actually increasing its refcount.
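A minimal sketch of consuming the new header under the enforcement macro (the free function here is illustrative):
```
// Opting a file into the enforcement described above: including any
// generated-operator header below this line becomes a compile error.
#define TORCH_ASSERT_NO_OPERATORS
#include <ATen/core/TensorBase.h>

// Metadata-only code can take TensorBase and stays decoupled from
// native_functions.yaml.
int64_t numel_of(const at::TensorBase& t) {
  return t.numel();
}
```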
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D30728580
Pulled By: ezyang
fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64401
PlanExecutorTest.BlockingErrorPlan uses `ASSERT_DEATH` which internally performs a `fork()`. This can cause problems under certain configurations that use threads. This change updates this test to use the "threadsafe" style for GTest death tests in order to improve its quality in multithreaded environments.
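The style switch is standard GoogleTest; a minimal sketch of the pattern:
```
#include <gtest/gtest.h>
#include <cstdlib>

TEST(PlanExecutorTestSketch, BlockingErrorPlan) {
  // "threadsafe" re-executes the test binary in the child process instead
  // of running the statement after a bare fork(), which is safer when
  // other threads exist.
  testing::FLAGS_gtest_death_test_style = "threadsafe";
  ASSERT_DEATH({ std::abort(); }, "");
}
```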
Test Plan:
I confirmed that this change fixes the issue on my devvm with the following command:
```
buck test mode/dev //caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest.BlockingErrorPlan
```
Reviewed By: praihan
Differential Revision: D30709447
fbshipit-source-id: 12ffd9ad0371e2e5b43a9873c80568e5ab02d246
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64244
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64040
In operator cost inference functions, in many places we were using `sizeof(x.data_type())`. Since `data_type()` returns a 32-bit integer from [this enum](https://www.internalfb.com/code/fbsource/[15e7ffe4073cf08c61077c7c24a4839504b964a2]/fbcode/caffe2/caffe2/proto/caffe2.proto?lines=20), `sizeof(x.data_type())` always yields 4 no matter what actual data type x has. Big thanks to Jack Langman for specifically pointing out this bug.
We now instead use the size in bytes of the actual data type.
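A hedged sketch of the bug and the fix (the enum, struct, and helper are illustrative stand-ins for the proto types):
```
#include <cstddef>
#include <cstdint>

enum DataType : int32_t { DT_FLOAT = 1, DT_DOUBLE = 2, DT_UINT8 = 3 };
struct TensorShape {
  DataType dt;
  DataType data_type() const { return dt; }
};

size_t element_size(DataType dt) {
  if (dt == DT_FLOAT) return 4;
  if (dt == DT_DOUBLE) return 8;
  if (dt == DT_UINT8) return 1;
  return 0;
}

size_t bytes(const TensorShape& x, size_t numel) {
  // Bug: sizeof(x.data_type()) == sizeof(int32_t) == 4, always.
  // Fix: use the byte width of the actual element type:
  return numel * element_size(x.data_type());
}
```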
Test Plan:
Added unit tests BatchMatMulMemCostTest:
buck test //caffe2/caffe2/fb/fbgemm:batch_matmul_op_test -- BatchMatMulMemCostTest
Extended existing unit test test_columnwise_concat for different data types:
buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test -- test_columnwise_concat
Reviewed By: CrazySherman
Differential Revision: D30656698
fbshipit-source-id: d42c0c9a0c5b0ddc5dba39e4994f1f85a5e618bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64285
With C++14 heterogeneous ordered container lookup, it is no longer necessary to create a `std::string` in order to look up elements of a `CaffeMap` keyed by std::string. Accordingly, this diff reworks the argument-getting operator functions to avoid that in favor of `c10::string_view`.
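For reference, heterogeneous lookup works as in this standalone sketch (using `std::string_view` as a stand-in for `c10::string_view`):
```
#include <map>
#include <string>
#include <string_view>

// std::less<> is a transparent comparator, so find() accepts any type
// comparable with the key -- no temporary std::string is materialized.
std::map<std::string, int, std::less<>> args{{"axis", 1}, {"keepdims", 0}};

int lookup(std::string_view name) {
  auto it = args.find(name);
  return it == args.end() ? -1 : it->second;
}
```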
ghstack-source-id: 137139818
Test Plan: buildsizebot iOS apps -- code size win. Fewer strings is probably marginally good for perf, but this only happens at setup time anyway.
Reviewed By: dzhulgakov
Differential Revision: D26826676
fbshipit-source-id: ee653b14dc2c528bae8c90f0fc6a7a419cbca1d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64040
In operator cost inference functions, in many places we were using `sizeof(x.data_type())`. Since `data_type()` returns a 32-bit integer from [this enum](https://www.internalfb.com/code/fbsource/[15e7ffe4073cf08c61077c7c24a4839504b964a2]/fbcode/caffe2/caffe2/proto/caffe2.proto?lines=20), `sizeof(x.data_type())` always yields 4 no matter what actual data type x has. Big thanks to Jack Langman for specifically pointing out this bug.
We now instead use the size in bytes of the actual data type.
Test Plan:
Added unit tests BatchMatMulMemCostTest:
buck test //caffe2/caffe2/fb/fbgemm:batch_matmul_op_test -- BatchMatMulMemCostTest
Extended existing unit test test_columnwise_concat for different data types:
buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test -- test_columnwise_concat
Differential Revision: D30561459
fbshipit-source-id: 976fa5167097a35af548498480001aafd7851d93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61500
libstdc++ defines a static variable called `std::__ioinit` in iostream that adds global constructor size overhead to each translation unit that includes iostream. To reduce the size overhead from that, we can often include ostream instead.
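A sketch of the substitution (ordinary standard-library usage; `Point` is illustrative):
```
// Before: #include <iostream>  // pulls in std::__ioinit's static init
// After: streaming declarations only.
#include <ostream>

struct Point { int x, y; };

inline std::ostream& operator<<(std::ostream& os, const Point& p) {
  return os << '(' << p.x << ", " << p.y << ')';
}
```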
ghstack-source-id: 136163529
Test Plan: buildsizebot some mobile apps
Reviewed By: dhruvbird
Differential Revision: D29648016
fbshipit-source-id: 9c3139712c71248513cc5032d21e77f3ecbae8fe
Summary:
Add `-Wno-writable-strings` (which is clang's flavor of `-Wwrite-strings`) to the list of warnings ignored while compiling torch_python.
Avoid unnecessary copies in range loops (see the sketch below)
Fix a number of signed/unsigned comparisons
Found while building locally on M1
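A minimal sketch of the range-loop copy fix:
```
#include <string>
#include <vector>

size_t total_length(const std::vector<std::string>& names) {
  size_t n = 0;
  // Binding by const reference avoids copying each string per iteration
  // (was: for (auto name : names)).
  for (const auto& name : names) {
    n += name.size();
  }
  return n;
}
```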
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62930
Reviewed By: albanD
Differential Revision: D30171981
Pulled By: malfet
fbshipit-source-id: 25bd43dab5675f927ca707e32737ed178b04651e
Summary:
These cases were found by compiling with clang on Windows.
In those cases the functions would still be exported, which is a waste of space in the symbol table.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62952
Reviewed By: gchanan
Differential Revision: D30191291
Pulled By: ezyang
fbshipit-source-id: 3319b0ec4f5fb02e0fe1b81dbbcedcf12a0c795e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62632
Update the caffe2/core/context.h to directly use `at::mt19937` instead of the
`at::CPUGeneratorImpl` wrapper class from the ATen-cpu library.
Using `at::CPUGeneratorImpl` causes circular dependencies between the ATen and
caffe2 code. In particular the `at::CPUGeneratorImpl::get_state()` logic
depends on CPU Tensor functionality that currently depends on code from
caffe2.
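Direct use of the engine looks like this minimal sketch (assuming `at::mt19937` from ATen/core/MT19937RNGEngine.h):
```
#include <ATen/core/MT19937RNGEngine.h>
#include <cstdint>

uint32_t draw(uint64_t seed) {
  at::mt19937 engine(seed);   // header-only Mersenne Twister
  return engine();            // next 32-bit value, no virtual dispatch
}
```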
Test Plan:
The RNG behavior should be identical to the previous code (perhaps even
faster, since we now avoid virtual function calls).
buck test //caffe2/caffe2:caffe2_test_cpu \
//caffe2/caffe2/python: //caffe2/caffe2/fb/operators:
Differential Revision: D29915701
fbshipit-source-id: f9b2eab8d3b21b2224d30bcf52be9c0e7eb7cd0a
Summary:
As the GoogleTest `TEST` macro, as well as `DEFINE_DISPATCH`, is non-compliant with the `cppcoreguidelines-avoid-non-const-global-variables` clang-tidy check, drop the per-line suppressions.
All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h" | \
          xargs grep cppcoreguidelines-avoid-non-const-global-variables | \
          cut -f1 -d: | sort | uniq`; do
  sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i
done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Follow-up to https://github.com/pytorch/pytorch/issues/18584. This PR covers the remaining places where an event or stream query might result in a "not ready" error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61554
Reviewed By: mrshenli
Differential Revision: D29763973
Pulled By: ezyang
fbshipit-source-id: 41d988d1826b2309cc6b01a81144094b353abdf9
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61006
Test Plan: Modified existing unit test to test for eps = 0. It would fail without the equality test first.
Reviewed By: ajyu
Differential Revision: D29423770
fbshipit-source-id: 168e7de00d8522c4b646a8335d0120700915f260
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59775
This operator is similar to `GetAllBlobNames` but also returns the estimated
size required to serialize each blob.
One goal of this operator is to allow checkpoint saving logic to estimate the
amount of space/bandwidth required to save a checkpoint when first starting
training, without actually serializing any blobs yet. Currently the
checkpointing logic uses `GetAllBlobNames` to determine the blobs to
checkpoint. It can instead be updated to use `EstimateAllBlobSizes` to also
get an estimate for how much space will be required for the checkpoint.
ghstack-source-id: 132275153
Test Plan: Included a new unit test.
Reviewed By: mraway
Differential Revision: D29020227
fbshipit-source-id: 811e5d86c4b59183e84e6424c48c97739be09043
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60208
Update the DB APIs so that `db::Transaction::Put()` accepts the value by
rvalue reference. This allows DB implementations to write data asynchronously
without being forced to make an additional copy of the data in memory.
`Put()` implementations can now use the string move constructor or assignment
operator to get the string data and continue performing the write
asynchronously after returning from `Put()`.
Note that I chose to entirely replace the existing `Put()`, removing the
ability for callers to call `Put()` with a `const std::string&` argument for
the value, rather than simply adding another overloaded version of `Put()`.
This was done because in practice there were no call sites using `Put()` that
cannot move in their data. Eliminating the `const std::string&` API entirely
simplifies the DB implementations: DBs that wish to support move semantics do
not have to implement both the move and the copy versions of `Put()`.
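A minimal sketch of the resulting interface (simplified from the description above, not the exact caffe2/core/db.h declaration):
```
#include <string>

class Transaction {
 public:
  virtual ~Transaction() = default;
  // Value taken by rvalue reference: an implementation may move it into
  // an internal queue and finish the write asynchronously after Put()
  // returns, without copying the payload.
  virtual void Put(const std::string& key, std::string&& value) = 0;
};

// Callers hand off ownership explicitly:
//   txn->Put("blob_name", std::move(serialized));
```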
Test Plan:
Searched through fbcode to try and make sure I found all `db::Transaction`
subclasses, and will check sandcastle results to help confirm.
Ran the modelstore checkpointing unit tests.
Differential Revision: D29204425
fbshipit-source-id: 28be6646e92e5df71954d4bb3dc0c8add30ed041
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60207
Update the `BlobSerializerBase` API so that the serialized blob data is
passed as a `std::string&&` rather than a `const std::string&`. This allows the
acceptor to take ownership of the string data and, for example, queue it for
storing asynchronously, rather than having to make a copy of the data if it
needs to remain valid after returning.
All existing `BlobSerializerBase` implementations already pass in a valid
rvalue reference to the data, so this change did not require updating any of
the existing serializer implementations.
ghstack-source-id: 132216750
Test Plan:
Examined all ~46 `BlobSerializerBase` subclasses in fbsource to confirm they
already pass in an rvalue reference for this argument. Also searched for
`BlobSerializerBase` on google and did not find any external references to
this class in other open source projects that might be affected.
Differential Revision: D29204426
fbshipit-source-id: b1d567e52a5c17a01d651c70bbfa2fddbaea6cd9
Summary:
The previous PR is https://github.com/pytorch/pytorch/issues/57781
We now add two CUDA bindings to avoid using ctypes, which fixes a Windows issue.
However, we still use ctypes to allocate the stream and create its pointer
(we could do this with a 0-dim tensor too if that feels better).
CC. ezyang rgommers ngimel mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59527
Reviewed By: albanD
Differential Revision: D29053062
Pulled By: ezyang
fbshipit-source-id: 661e7e58de98b1bdb7a0871808cd41d91fe8f13f