Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.
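For illustration, a minimal before/after sketch of the transformation (assuming `c10::irange` from `c10/util/irange.h` is on the include path; the actual rewrites were produced by the script above):
```
#include <c10/util/irange.h>

#include <cstdint>
#include <vector>

// Before: index-based loop with an explicit type and bound.
int64_t sum_before(const std::vector<int64_t>& v) {
  int64_t sum = 0;
  for (size_t i = 0; i < v.size(); i++) {
    sum += v[i];
  }
  return sum;
}

// After: range-based loop over c10::irange; the index is const and its
// type is deduced from the bound, which also avoids signed/unsigned mixups.
int64_t sum_after(const std::vector<int64_t>& v) {
  int64_t sum = 0;
  for (const auto i : c10::irange(v.size())) {
    sum += v[i];
  }
  return sum;
}
```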
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705359
fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66443
For some reason, this logging is adding noise to a lot of flow jobs. I am not sure if this is actually needed.
This is called from the `__init__`, so it runs every time and logs every key:value in the current local symbols.
Test Plan: N/A
Reviewed By: chowarfb
Differential Revision: D31534372
fbshipit-source-id: bed032b66fed548c97a6f66b1b9e905fd2738851
Summary:
Addresses this network risk mitigation mentioned in https://github.com/pytorch/pytorch/issues/65439#issuecomment-924627239.
I didn't include any mobile app/benchmarking changes because I think using pretrained weights matters there.
I ended up removing the changes in test_utils because those were sensitive to the pretrained variable.
I am saving the quantization test changes for another PR because they are currently disabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66312
Reviewed By: ejguan
Differential Revision: D31542992
Pulled By: janeyx99
fbshipit-source-id: 57b4f70247af25cc96c57abd9e689c34641672ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66056
We keep running into this unrelated failure when landing diffs for the GPU inference project, so this operator unit test is being disabled on GPU because the operator does not exist there:
RuntimeError: [enforce fail at operator.cc:277] op. Cannot create operator of type 'SmartDecaySparseAdam' on the device 'CUDA'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing. Operator def: input: "param" input: "mom1" input: "mom2" input: "last_seen" input: "indices" input: "grad" input: "lr" input: "iter" output: "param" output: "mom1" output: "mom2" output: "last_seen" name: "" type: "SmartDecaySparseAdam" arg { name: "beta1" f: 0 } arg { name: "beta2" f: 0.9 } arg { name: "epsilon" f: 1e-05 } device_option { device_type: 1 }
https://www.internalfb.com/intern/testinfra/diagnostics/5910974579962988.562949996565057.1633122845/
Test Plan: sandcastle
Reviewed By: jianyuh
Differential Revision: D31364731
fbshipit-source-id: 7fbd994cbe7f6ca116f5f34506a1ed7f14759bdf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610
- Replace HIP_PLATFORM_HCC with USE_ROCM
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.
- In the next PR:
  - Remove the mapping from CUDA_VERSION to HIP_VERSION and from CUDA to HIP in hipify.
  - HIP_PLATFORM_HCC is deprecated, so add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd
Reviewed By: jbschlosser
Differential Revision: D30909053
Pulled By: ezyang
fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64244
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64040
In operator cost inference functions, in many places we are using sizeof(x.data_type()). Since data_type() returns a 32 bit integer from [this enum](https://www.internalfb.com/code/fbsource/[15e7ffe4073cf08c61077c7c24a4839504b964a2]/fbcode/caffe2/caffe2/proto/caffe2.proto?lines=20), we are basically always getting 4 for sizeof(x.data_type()) no matter what actual data type x has. Big thanks to Jack Langman for specifically pointing to this bug.
We instead use the size in bytes of the actual data type.
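A minimal sketch of the bug and the fix; the enum and helper below are illustrative stand-ins, not the actual caffe2 definitions:
```
#include <cstddef>
#include <cstdint>
#include <iostream>

// Illustrative stand-in for the proto data-type enum (the real one is in caffe2.proto).
enum class DataType : int32_t { FLOAT = 1, DOUBLE = 2, INT8 = 3 };

// Hypothetical helper mapping the enum to the element size in bytes.
size_t ElementSizeInBytes(DataType t) {
  switch (t) {
    case DataType::FLOAT: return 4;
    case DataType::DOUBLE: return 8;
    case DataType::INT8: return 1;
  }
  return 0;
}

int main() {
  DataType dt = DataType::DOUBLE;
  // Bug: sizeof of the enum value is the size of the 32-bit enum itself,
  // always 4, regardless of which data type it names.
  std::cout << sizeof(dt) << "\n";              // prints 4
  // Fix: look up the byte size of the actual data type instead.
  std::cout << ElementSizeInBytes(dt) << "\n";  // prints 8
  return 0;
}
```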
Test Plan:
Added unit tests BatchMatMulMemCostTest:
buck test //caffe2/caffe2/fb/fbgemm:batch_matmul_op_test -- BatchMatMulMemCostTest
Extended existing unit test test_columnwise_concat for different data types:
buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test -- test_columnwise_concat
Reviewed By: CrazySherman
Differential Revision: D30656698
fbshipit-source-id: d42c0c9a0c5b0ddc5dba39e4994f1f85a5e618bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64040
In operator cost inference functions, in many places we are using sizeof(x.data_type()). Since data_type() returns a 32 bit integer from [this enum](https://www.internalfb.com/code/fbsource/[15e7ffe4073cf08c61077c7c24a4839504b964a2]/fbcode/caffe2/caffe2/proto/caffe2.proto?lines=20), we are basically always getting 4 for sizeof(x.data_type()) no matter what actual data type x has. Big thanks to Jack Langman for specifically pointing to this bug.
We instead use the size in bytes of the actual data type.
Test Plan:
Added unit tests BatchMatMulMemCostTest:
buck test //caffe2/caffe2/fb/fbgemm:batch_matmul_op_test -- BatchMatMulMemCostTest
Extended existing unit test test_columnwise_concat for different data types:
buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test -- test_columnwise_concat
Differential Revision: D30561459
fbshipit-source-id: 976fa5167097a35af548498480001aafd7851d93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63133
SplitOp is costly but is missing a cost inference function, which hurts cost-based balancing. Changes are:
(1) Addition of a CostInferenceFunction for SplitOp (a rough sketch follows below)
(2) Small fix in CostInferenceFunction for ConcatOp
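For intuition on (1), a rough sketch of what a Split cost function might report; the struct and function below are illustrative only (the real CostInferenceFunction in caffe2 takes the operator def and input shapes):
```
#include <cstdint>

// Illustrative stand-in; OpSchema::Cost in caffe2 carries at least
// flops, bytes_read, and bytes_written.
struct Cost {
  uint64_t flops = 0;
  uint64_t bytes_read = 0;
  uint64_t bytes_written = 0;
};

// Hypothetical cost model for Split: pure data movement, so the cost is
// dominated by reading the input once and writing it back out once.
Cost SplitCostSketch(uint64_t input_num_elements, uint64_t element_size_bytes) {
  Cost c;
  c.flops = 0;  // no arithmetic, only copies
  c.bytes_read = input_num_elements * element_size_bytes;
  c.bytes_written = input_num_elements * element_size_bytes;
  return c;
}
```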
Test Plan:
Added unit tests:
buck test //caffe2/caffe2/python/operator_test:split_op_cost_test
buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test
Reviewed By: smacke
Differential Revision: D30247360
fbshipit-source-id: 989e962f3a981acc85b73aac3fb23e603b7d1591
Summary:
Add `-Wno-writable-strings` (clang's flavor of `-Wwrite-strings`) to the list of warnings ignored while compiling torch_python.
Avoid unnecessary copies in range loops (see the example below)
Fix a number of signed-unsigned comparisons
Found while building locally on M1
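As an illustration of the range-loop fix (a generic example, not the specific torch_python code):
```
#include <string>
#include <vector>

void Example(const std::vector<std::string>& names) {
  // Copies every string on each iteration.
  for (auto name : names) {
    (void)name;
  }
  // Binds by const reference; no per-iteration copy.
  for (const auto& name : names) {
    (void)name;
  }
}
```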
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62930
Reviewed By: albanD
Differential Revision: D30171981
Pulled By: malfet
fbshipit-source-id: 25bd43dab5675f927ca707e32737ed178b04651e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63068
The caffe2 core.Net constructor can accept a caffe2_pb2.NetDef proto, but it always creates a copy. This is wasteful when we can prove that the proto being passed to it will not be used anywhere else. So we add an "inplace" argument to the `core.Net` constructor that allows clients to give away ownership of the passed proto without copying. We default this argument to `False`, ensuring that behavior does not change unless explicitly requested.
Test Plan: Let CI run.
Differential Revision: D29976510
fbshipit-source-id: 26e13ca76f3431b8ef0de51f08bbf263491d323e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62493
This diff adds a broadcast fastpath for the caffe2 broadcast utility function, which just copies the contents of a smaller tensor into a larger one. We also update the tests to exercise the new functionality.
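A minimal sketch of what such a fastpath amounts to in the simplest case, where the larger tensor's size is an exact multiple of the smaller one's (the real caffe2 utility works on shapes and axes; this is only the idea):
```
#include <cstring>
#include <vector>

// When the output size is an exact multiple of the input size, "broadcasting"
// degenerates to repeatedly copying the input buffer, which is cheaper than a
// general index-mapped broadcast.
void BroadcastFastpathSketch(const std::vector<float>& small,
                             std::vector<float>& large) {
  const size_t n = small.size();
  if (n == 0 || large.size() % n != 0) {
    return;  // a real implementation would fall back to the general path
  }
  for (size_t offset = 0; offset < large.size(); offset += n) {
    std::memcpy(large.data() + offset, small.data(), n * sizeof(float));
  }
}
```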
Test Plan: unit tests + let CI run
Differential Revision: D29938285
fbshipit-source-id: 543ecc548500380e307be91902696033454964a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62437
In this diff we add a broadcast fastpath for MulGradient and DivGradient ops, whose tests we update to exercise the new functionality.
Test Plan: Added test cases to elementwise ops (which will exercise the new MulGradient / DivGradient broadcast fastpath functionality) that will be run by CI. It's worth noting there's still no code (outside of the new test cases) that takes the new code paths added -- the user must explicitly request allow_broadcast_fastpath=True, and nothing outside of the added tests currently does so.
Differential Revision: D29938273
fbshipit-source-id: 281c1a109e38c25b9bf9ff8d832de60ac3c231a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62428
In this diff we add a broadcast fastpath for reduce utility functions. These functions are used by various elementwise ops, whose tests we update to exercise the new functionality.
Test Plan: Added test cases to elementwise ops (which will exercise the new reducer functionality) that will be run by CI. It's worth noting there's still no code (outside of the new test cases) that takes the new code paths added -- the user must explicitly request `allow_broadcast_fastpath=True`, and nothing outside of the added tests currently does so.
Differential Revision: D29938264
fbshipit-source-id: 5d5542bd93afb85fd9f7a4073f766adc07eb3b65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62059
GetOperatorCost in Workspace exposes only flops and bytes_written. Make an additional piece, bytes_read, available from OperatorSchema::Cost.
Test Plan:
Added the two additional pieces in the unit test testGetOperatorCost in workspace_test
buck test caffe2/caffe2/python:workspace_test -- testGetOperatorCost
buck test //aml/ml_foundation/exp_platform/large_scale_training/distributed_hogwild/auto_device_placement/tests/...
buck test //aiplatform/training/autotuning/tests/...
buck test //aiplatform/training/pipelining/tests/...
buck test //deeplearning/fblsim/tests/...
Flow tests:
ADP Greedy: f288078287
ADP MILP: f288079278
Reviewed By: CrazySherman, xtaofb
Differential Revision: D29860676
fbshipit-source-id: 8b3a9f2bf17c0dae48cfe2800e8821bf441e0b03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62135
The initial implementation of Adam with Smart Decay had an off-by-one error. This was in the summation of the geometric series used to calculate how much built-up momentum would have been discharged in skipped minibatches.
The unit tests should have caught this, but the testing strategy missed it because k, the number of skipped minibatches, was always either 0 or so high that the impact of the bug was too small to detect. The impact of the bug was proportional to 1/k. The testing strategy has been adjusted to cover this case.
Differential Revision: D29889309
fbshipit-source-id: b086c0efed5c27f621061e726533c73658daffc6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62058
This is the second diff in this stack. This diff includes the changes to DPER3; the first diff includes the changes to Caffe2.
We want to decay learning parameters properly. Previously this was not done when a parameter is absent from a minibatch. We fix this by keeping track of missed minibatches and making decay catch up accordingly.
The exponential moving averages (EMA) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. Ideally, even for parameters absent from a minibatch, 0 should be added to the EMAs and the EMAs should then be decayed by multiplying by beta1 and beta2, respectively.
To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen.
We hope this will significantly improve the inconsistent learning parameter issue we have seen with Adam.
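As a sketch of the bookkeeping, using the standard Adam EMA form with gradient g_t (the exact constants in the operator may differ): if a parameter was last seen k minibatches ago, the k skipped decays collapse into a single multiplication by beta^k, and the momentum discharged over the skipped steps involves a geometric sum of this kind (the off-by-one fix mentioned earlier concerned the limits of exactly such a sum):
```
m_t = \beta_1^{k} \, m_{t-k} + (1 - \beta_1) \, g_t
v_t = \beta_2^{k} \, v_{t-k} + (1 - \beta_2) \, g_t^{2}
\sum_{j=1}^{k} \beta_1^{j} = \frac{\beta_1 (1 - \beta_1^{k})}{1 - \beta_1}
```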
Differential Revision: D29638897
fbshipit-source-id: 18d8e227d72c2e23010ca81e0f6eeb78872c8d3c
Summary:
As the GoogleTest `TEST` macro is non-compliant with `cppcoreguidelines-avoid-non-const-global-variables`, as is `DEFINE_DISPATCH`, the check is disabled in `.clang-tidy` and the now-redundant per-line NOLINT suppressions are removed.
All changes but the ones to `.clang-tidy` are generated using the following script:
```
# Delete the "// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)"
# suppression comments from every C/C++ source or header that mentions the check.
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary: A test case that triggers db_options with the save operator is missing.
Test Plan: buck test
Differential Revision: D29642719
fbshipit-source-id: 72b7374d40430398abac26dfe91538550525384d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61548
We want to decay learning parameters properly. Previously this was not done when a parameter is absent from a minibatch. We fix this by keeping track of missed minibatches and making decay catch up accordingly.
The exponential moving averages (EMA) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. Ideally, even for parameters absent from a minibatch, 0 should be added to the EMAs and the EMAs should then be decayed by multiplying by beta1 and beta2, respectively.
To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen
* we calculate the amount of momentum that would have been discharged over the missed minibatches and update the weight accordingly.
Differential Revision: D29654246
fbshipit-source-id: 7a6cd7966eb1f31116d99dfce79a78b2d3ee9e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61551
We aim to enable the rate limiter in C2 load, with a fixed bandwidth limit.
This diff updates LoadOp to pass down the manifold db options.
Test Plan:
```
buck test mode/opt caffe2/caffe2/python/operator_test:load_save_test
```
Differential Revision: D29639102
fbshipit-source-id: cf69549adadf4c7f12a8a2b7f3ca39092cab4b99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61488
We want to decay learning parameters properly. Previously this was not done when a parameter is absent from a minibatch. We fix this by keeping track of missed minibatches and making decay catch up accordingly.
The exponential moving averages (EMA) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. Ideally, even for parameters absent from a minibatch, 0 should be added to the EMAs and the EMAs should then be decayed by multiplying by beta1 and beta2, respectively.
To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen.
Differential Revision: D27978269
fbshipit-source-id: e47524101ddfcb281c46c505b9b7a8f0835bc64a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60402
Add float64 data type support to ScatterWeightedSum for cases where float32's ~10^7 precision is not sufficient.
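For context on the precision limit, a tiny standalone example (not caffe2 code) showing an accumulator update being lost at single precision but kept at double:
```
#include <iostream>

int main() {
  // float carries ~7 significant decimal digits, so adding a small update to
  // a large accumulator can be lost entirely; double keeps it.
  float f = 1e8f;
  double d = 1e8;
  f += 1.0f;  // 1e8 + 1 is not representable as float; f stays 1e8
  d += 1.0;   // double has ~15-16 digits, so the update survives
  std::cout << (f == 1e8f) << " " << (d == 1e8) << "\n";  // prints "1 0"
  return 0;
}
```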
Test Plan: buck test caffe2/caffe2/python/operator_test:sparse_ops_test -- testScatterWeightedSum
Reviewed By: jianyuh
Differential Revision: D29190324
fbshipit-source-id: 871a60744694e901a2c7685a67350860745d6729
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59775
This operator is similar to `GetAllBlobNames` but also returns the estimated
size required to serialize each blob.
One goal of this operator is to allow checkpoint saving logic to estimate the
amount of space/bandwidth required to save a checkpoint when first starting
training, without actually serializing any blobs yet. Currently the
checkpointing logic uses `GetAllBlobNames` to determine the blobs to
checkpoint. It can instead be updated to use `EstimateAllBlobSizes` to also
get an estimate for how much space will be required for the checkpoint.
ghstack-source-id: 132275153
Test Plan: Included a new unit test.
Reviewed By: mraway
Differential Revision: D29020227
fbshipit-source-id: 811e5d86c4b59183e84e6424c48c97739be09043
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60382
Instead of setting weight_decay `w` uniformly for all ids, for each row `i` in the sparse embedding table the effective weight_decay `w_i` becomes `w*freq_i`, where `freq_i = halflife/counter_i \in [\log(2), halflife]`. The counter comes from `rowwise_counter`, with the update `counter_i = 1 + \exp(-iter_{\delta}*\rho)*counter_i`.
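The same update rules restated in display form (notation copied from the summary; `iter_{\delta}` is presumably the iteration gap since the row was last updated):
```
w_i = w \cdot \mathrm{freq}_i
\mathrm{freq}_i = \frac{\mathrm{halflife}}{\mathrm{counter}_i} \in [\log(2),\ \mathrm{halflife}]
\mathrm{counter}_i \leftarrow 1 + \exp(-\mathrm{iter}_{\delta} \cdot \rho) \cdot \mathrm{counter}_i
```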
Test Plan:
buck test //caffe2/caffe2/python/operator_test:adagrad_test -- test_row_wise_sparse_adagrad
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay
Reviewed By: 0x10cxR1
Differential Revision: D25581030
fbshipit-source-id: 54b3831b20516c76c559b13d8deb809e2ee3b446
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60106
In Caffe2, some elementwise in-place compatible ops lack coverage for the in-place case. We add tests for a subset of them here and thereby increase coverage.
Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:elementwise_ops_test
```
Let CI run.
Reviewed By: clrfb
Differential Revision: D29143189
fbshipit-source-id: 83138ad8eff8fe95c40aece53714da3577396a23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57080
The ONNX optimizer is removed in ONNX 1.9.
This PR removes the ONNX optimizer from the C++ code path and uses a `try-except` block in Python to keep it compatible with both ONNX 1.8 and 1.9.
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D28467330
Pulled By: malfet
fbshipit-source-id: 5e4669dd0537648898e593f9e253da18d6dc7568
Co-authored-by: neginraoof <neginmr@utexas.edu>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58062
Make the function templated so that BatchSparseToDense supports int32 lengths/indices.
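A hypothetical sketch of the templating pattern (names and semantics heavily simplified; the real operator also handles default values, validation, and additional outputs):
```
#include <cstdint>
#include <vector>

// Template the kernel on the index/length type so that int32 and int64
// inputs share a single implementation.
template <typename TInd>
void BatchSparseToDenseSketch(const std::vector<TInd>& lengths,
                              const std::vector<TInd>& indices,
                              const std::vector<float>& values,
                              int64_t dense_last_dim,
                              std::vector<float>& output) {
  size_t pos = 0;
  for (size_t b = 0; b < lengths.size(); ++b) {
    const size_t row = b * static_cast<size_t>(dense_last_dim);
    for (TInd j = 0; j < lengths[b]; ++j, ++pos) {
      output[row + static_cast<size_t>(indices[pos])] = values[pos];
    }
  }
}

// Both instantiations reuse the same body:
//   BatchSparseToDenseSketch<int32_t>(...);
//   BatchSparseToDenseSketch<int64_t>(...);
```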
Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:batch_sparse_to_dense_op_test
```
Reviewed By: khabinov
Differential Revision: D28271423
fbshipit-source-id: 41b88b7a3663616b533aaf4731ff35cdf6ec4c85
Summary: Relax test deadlines for c2 tests. We run on loaded machines, and timings are unreliable.
Test Plan: Fixes existing tests
Reviewed By: mruberry
Differential Revision: D28690006
fbshipit-source-id: 457707e81a1ec92548c1f23ea7a0022fa0a3bfda
Summary: Tests are frequently failing with "exceeded the deadline of 1000.00ms"; we expect this to happen, so remove the deadline.
Test Plan: N/A: Fix breakages
Reviewed By: robieta
Differential Revision: D28581051
fbshipit-source-id: 4825ada9af151fa5d57c45c549138c15ba613705
Summary: When run on very heavily loaded machines, some of these tests are timing out. It's not an issue with the tests; it's an issue with the environment. I've removed the timeout so we at least keep unit test coverage.
Test Plan: N/A: Fix breakages
Reviewed By: ngimel
Differential Revision: D28492334
fbshipit-source-id: aed3ee371763161aab2d356f5623c7df053fda6f
Summary:
This is the only line (not in `third_party`) matching the regex `^#!.*python2`, and [it is not the first line of its file](https://github.com/koalaman/shellcheck/wiki/SC1128), so it has no effect. As a followup to https://github.com/pytorch/pytorch/issues/58275, this PR removes that shebang to reduce confusion, so now all Python shebangs in this repo are `python3`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58409
Reviewed By: walterddr
Differential Revision: D28478469
Pulled By: samestep
fbshipit-source-id: c17684c8651e45d3fc383cbbc04a31192d10f52f