Summary: A test case that triggers db_options with the save operator is missing.
Test Plan: buck test
Differential Revision: D29642719
fbshipit-source-id: 72b7374d40430398abac26dfe91538550525384d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61548
We want to decay learning parameters properly. Previously this was not done when a parameter was absent from a minibatch. We fix this by keeping track of missed minibatches and making the decay catch up accordingly.
The exponential moving averages (EMA) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. For parameters absent from a minibatch, 0 should still be added to the EMAs, and the EMAs should then be decayed by multiplying by beta1 and beta2 respectively.
To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter was seen
* decay the EMAs by multiplying by beta1^k and beta2^k instead of beta1 and beta2, where k is the number of minibatches since the parameter was last seen
* calculate the amount of momentum that would have been discharged over the missed minibatches and update the weight accordingly.
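As a rough illustration of the catch-up step, the sketch below assumes zero gradients over the k missed minibatches and ignores the second-moment denominator and bias correction, which the actual operator may handle differently. The function names are hypothetical. It shows that one decay by beta1^k matches k single-step decays, and that the momentum discharged over those steps is a geometric series with a closed form.

```python
def decay_stepwise(m, beta1, k):
    """Apply k single-step decays with a zero gradient (m <- beta1 * m)."""
    discharged = 0.0
    for _ in range(k):
        m = beta1 * m      # EMA update when the gradient is 0
        discharged += m    # momentum that would have moved the weight this step
    return m, discharged


def decay_catch_up(m, beta1, k):
    """Closed-form equivalent: one multiplication instead of k."""
    m_new = m * beta1 ** k
    # Geometric series: m*beta1 + m*beta1^2 + ... + m*beta1^k
    discharged = m * beta1 * (1.0 - beta1 ** k) / (1.0 - beta1)
    return m_new, discharged
```

Both functions should agree up to floating-point rounding, so the closed form can replace the loop for parameters that were skipped.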
Differential Revision: D29654246
fbshipit-source-id: 7a6cd7966eb1f31116d99dfce79a78b2d3ee9e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61551
We aim to enable a rate limiter in C2 load, with a fixed bandwidth limit.
This diff updates LoadOp to pass down the manifold db options.
Test Plan:
```
buck test mode/opt caffe2/caffe2/python/operator_test:load_save_test
```
Differential Revision: D29639102
fbshipit-source-id: cf69549adadf4c7f12a8a2b7f3ca39092cab4b99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61488
We want to decay learning parameters properly. Previously this was not done when a parameter was absent from a minibatch. We fix this by keeping track of missed minibatches and making the decay catch up accordingly.
The exponential moving averages (EMA) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. For parameters absent from a minibatch, 0 should still be added to the EMAs, and the EMAs should then be decayed by multiplying by beta1 and beta2 respectively.
To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter was seen
* decay the EMAs by multiplying by beta1^k and beta2^k instead of beta1 and beta2, where k is the number of minibatches since the parameter was last seen.
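The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the operator's implementation: the class name `LazyEma`, the dictionary storage, and the scalar per-row gradient are all assumptions made for the example.

```python
class LazyEma:
    """Per-row moment EMAs that are only touched when a row is seen."""

    def __init__(self, beta1=0.9, beta2=0.999):
        self.beta1, self.beta2 = beta1, beta2
        self.m = {}          # row id -> first-moment EMA
        self.v = {}          # row id -> second-moment EMA
        self.last_seen = {}  # row id -> iteration of the last update

    def update(self, row, grad, iteration):
        # k counts this minibatch plus any missed ones since the last update.
        k = iteration - self.last_seen.get(row, iteration - 1)
        m = self.m.get(row, 0.0) * self.beta1 ** k + (1 - self.beta1) * grad
        v = self.v.get(row, 0.0) * self.beta2 ** k + (1 - self.beta2) * grad * grad
        self.m[row], self.v[row], self.last_seen[row] = m, v, iteration
        return m, v
```

Calling `update` only for rows present in a minibatch should give the same EMAs as a dense update that feeds a zero gradient to every absent row.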
Differential Revision: D27978269
fbshipit-source-id: e47524101ddfcb281c46c505b9b7a8f0835bc64a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60402
Add float64 data type support for ScatterWeightedSum for cases where 10^7 precision is not sufficient.
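To illustrate why float32 can be insufficient, the sketch below emulates float32 rounding with the standard `struct` module and applies one scalar step of a weighted sum. The helper names are hypothetical and this is not the operator's code; it only shows a small update being lost in float32 and kept in float64.

```python
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def weighted_add(x, w, delta, as_float32):
    """One scalar step of a weighted sum, x + w * delta, in the chosen precision."""
    result = x + w * delta
    return to_float32(result) if as_float32 else result


small_update = 1e-8  # smaller than half the float32 spacing near 1.0
lost_in_float32 = weighted_add(1.0, 1.0, small_update, as_float32=True) == 1.0
lost_in_float64 = weighted_add(1.0, 1.0, small_update, as_float32=False) == 1.0
```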
Test Plan: buck test caffe2/caffe2/python/operator_test:sparse_ops_test -- testScatterWeightedSum
Reviewed By: jianyuh
Differential Revision: D29190324
fbshipit-source-id: 871a60744694e901a2c7685a67350860745d6729
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59775
This operator is similar to `GetAllBlobNames` but also returns the estimated
size required to serialize each node.
One goal of this operator is to allow checkpoint saving logic to estimate the
amount of space/bandwidth required to save a checkpoint when first starting
training, without actually serializing any blobs yet. Currently the
checkpointing logic uses `GetAllBlobNames` to determine the blobs to
checkpoint. It can instead be updated to use `EstimateAllBlobSizes` to also
get an estimate for how much space will be required for the checkpoint.
ghstack-source-id: 132275153
Test Plan: Included a new unit test.
Reviewed By: mraway
Differential Revision: D29020227
fbshipit-source-id: 811e5d86c4b59183e84e6424c48c97739be09043
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60382
Instead of setting weight_decay `w` uniformly for all ids, the actual weight decay `w_i` for each row i in the sparse embedding table becomes `w*freq_i`, where `freq_i = halflife/counter_i \in [\log(2), halflife]`. The counter comes from `rowwise_counter` and is defined as `counter_i = 1 + \exp(-iter_{\delta}*\rho)*counter_i`.
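A small worked example of the formulas above, as a sketch only. It assumes `rho = log(2)/halflife` and that `iter_delta` is the number of iterations since the row was last seen; neither is stated in this summary. The function names are hypothetical.

```python
import math


def update_counter(counter, iter_delta, rho):
    """counter_i = 1 + exp(-iter_delta * rho) * counter_i"""
    return 1.0 + math.exp(-iter_delta * rho) * counter


def row_weight_decay(w, halflife, counter):
    """w_i = w * freq_i with freq_i = halflife / counter_i"""
    return w * (halflife / counter)


halflife = 1000.0
rho = math.log(2) / halflife  # assumption: not stated in the summary above

# A row seen for the first time gets counter 1, so freq_i equals halflife.
rare = update_counter(0.0, iter_delta=1, rho=rho)

# A row seen in every iteration approaches counter = 1 / (1 - exp(-rho)),
# which puts freq_i close to log(2) for a large halflife.
frequent = 0.0
for _ in range(50_000):
    frequent = update_counter(frequent, iter_delta=1, rho=rho)
```

Under these assumptions, rarely seen rows receive the largest weight decay and frequently seen rows the smallest, matching the stated range of `freq_i`.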
Test Plan:
buck test //caffe2/caffe2/python/operator_test:adagrad_test -- test_row_wise_sparse_adagrad
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay
Reviewed By: 0x10cxR1
Differential Revision: D25581030
fbshipit-source-id: 54b3831b20516c76c559b13d8deb809e2ee3b446
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60106
In Caffe2, some elementwise in-place compatible ops lack coverage for the in-place case. We add tests for a subset of them here and thereby increase coverage.
Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:elementwise_ops_test
```
Let CI run.
Reviewed By: clrfb
Differential Revision: D29143189
fbshipit-source-id: 83138ad8eff8fe95c40aece53714da3577396a23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58062
Template the function so that BatchSparseToDense supports int32 lengths/indices.
Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:batch_sparse_to_dense_op_test
```
Reviewed By: khabinov
Differential Revision: D28271423
fbshipit-source-id: 41b88b7a3663616b533aaf4731ff35cdf6ec4c85
Summary: Relax test deadlines for c2 tests. We run on loaded machines, and timings are unreliable.
Test Plan: Fixes existing tests
Reviewed By: mruberry
Differential Revision: D28690006
fbshipit-source-id: 457707e81a1ec92548c1f23ea7a0022fa0a3bfda
Summary: Tests are frequently failing with "exceeded the deadline of 1000.00ms". We expect this to happen, so remove the deadline.
Test Plan: N/A: Fix breakages
Reviewed By: robieta
Differential Revision: D28581051
fbshipit-source-id: 4825ada9af151fa5d57c45c549138c15ba613705
Summary: When run on very heavily loaded machines, some of these tests are timing out. It's not an issue with the test, it's an issue with the environment. I've removed the timeout so we at least keep unit test coverage.
Test Plan: N/A: Fix breakages
Reviewed By: ngimel
Differential Revision: D28492334
fbshipit-source-id: aed3ee371763161aab2d356f5623c7df053fda6f
Summary: Removed the deadline restriction since the first run can take more than the deadline, while subsequent runs are shorter.
Reviewed By: ngimel
Differential Revision: D28260077
fbshipit-source-id: 8ed2f5c16bc184bf4fae0a59b662fa1da2d4dd0a
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.
The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:
- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`
I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):
- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)
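For orientation, the kind of check such a script performs can be sketched in a few lines of Python. This is not the contents of `tools/trailing_newlines.py`; the function names and the treatment of empty files are assumptions made for the example.

```python
def ends_with_single_newline(data: bytes) -> bool:
    """Return True if `data` is empty or ends with exactly one newline."""
    if not data:
        return True  # assumption: empty files are treated as fine
    return data.endswith(b"\n") and not data.endswith(b"\n\n")


def files_with_bad_endings(paths):
    """Yield every path whose contents fail the check above."""
    for path in paths:
        with open(path, "rb") as f:
            if not ends_with_single_newline(f.read()):
                yield path
```

Reading each file once in binary mode keeps the check simple and avoids any dependence on text encodings.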
To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737
Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:
- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true
In contrast, this run (after correcting the trailing newlines in this PR) succeeded:
- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241
To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```
Reviewed By: malfet
Differential Revision: D27409736
Pulled By: samestep
fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54042
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53881
1. Fix the position_weighted optimizer: the position weighted layer uses the default optimizer, but its gradient is actually a gradient_slice, which causes problems if it is not handled properly in the new optimizer. The solution is to use sparseadagrad when the gradient is a gradient_slice.
2. Optimizer implementation of v1 and v2: using 1st momentum with/without bias_correction.
3. Also implement decoupled weight decay in the new optimizer.
Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_2 -- test_mlp_optimization
buck test //caffe2/caffe2/python:optimizer_test -- TestDecayAdagrad
buck test //caffe2/caffe2/python/operator_test:decay_adagrad_test
ctr_mbl_feed work flow: f255731660
oc work flow: f255739503
Reviewed By: 0x10cxR1
Differential Revision: D26839668
fbshipit-source-id: 2b6881c1a88540ef5766be40f5e80001257e2199
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53735
Add an option to BlobSerializationOptions to request that float data be
serialized as bfloat16. This reduces the serialized data size at the expense
of some loss in precision.
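To make the size/precision trade-off concrete, here is a sketch of a float32 to bfloat16 conversion that keeps only the upper 16 bits of the float32 encoding. Truncation without rounding is an assumption of this example; the serializer may round differently. The function names are hypothetical.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Keep the upper 16 bits of the float32 encoding (truncation, no rounding)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand 16 stored bits back to float32 by zero-filling the low mantissa bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


original = 3.14159
restored = bfloat16_bits_to_float(float_to_bfloat16_bits(original))
```

Each value takes 2 bytes instead of 4, and `restored` differs from `original` because only about 8 bits of significand survive.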
ghstack-source-id: 124317910
Test Plan: Included a new unit test.
Reviewed By: mraway
Differential Revision: D26658205
fbshipit-source-id: 74521ed161059066355a3f208488ed01a344dbb5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53402
Add an `options` field to the `Save` operator which accepts options for how to
serialize different blobs. At the moment this simply allows controlling the
existing `chunk_size` behavior, but in the future we can add other options,
such as the ability to control compression settings or other serialization
formats.
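As a sketch of what a chunk-count check can compare against, the helper below computes the number of chunks needed when each chunk holds at most `chunk_size` elements. The function name is hypothetical, and special `chunk_size` values (such as a default or a no-chunking setting) are deliberately not modeled.

```python
def expected_num_chunks(num_elements: int, chunk_size: int) -> int:
    """Chunks needed if each chunk holds at most `chunk_size` elements."""
    if num_elements < 0 or chunk_size <= 0:
        raise ValueError("this sketch only covers positive chunk sizes")
    # Ceiling division using integers only.
    return (num_elements + chunk_size - 1) // chunk_size
```

For example, a blob with 1001 elements and a chunk size of 100 would need 11 chunks under this model.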
ghstack-source-id: 123567034
Test Plan:
Added a new test to `load_save_test.py` that passes in options and verifies
that blobs were serialized with the expected number of chunks.
buck test caffe2/caffe2:caffe2_test_cpu \
caffe2/caffe2/core:serialization_test \
caffe2/caffe2/python/operator_test:load_save_test
Reviewed By: mraway
Differential Revision: D26502577
fbshipit-source-id: 6e302e530bb96990517c2e35c505db7f14a56284
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53401
This is a reland of D26641599 (cd9ac54ea7) after rebasing onto D26802576 (f595ba1bae).
Add some small utility functions to read the blob names back from the minidb
file so that we can verify how many chunks were written for each blob.
ghstack-source-id: 123567033
Test Plan: buck test caffe2/caffe2/python/operator_test:load_save_test
Reviewed By: mraway
Differential Revision: D26853942
fbshipit-source-id: 0b45078fdd279f547752c8fdb771e296374a00da
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
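For readers without `gsed`, the effect of the substitution `s/ *$//` on a file's text can be approximated in Python as below. This is only a sketch of the per-line rewrite; it does not reproduce the `git grep` file selection, and the function name is hypothetical.

```python
import re

# Runs of spaces directly before a line ending or the end of the text.
_TRAILING_SPACES = re.compile(r" +$", flags=re.MULTILINE)


def strip_trailing_spaces(text: str) -> str:
    """Remove spaces at the end of every line; tabs are left alone."""
    return _TRAILING_SPACES.sub("", text)
```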
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Add some small utility functions to read the blob names back from the minidb
file so that we can verify how many chunks were written for each blob.
Test Plan: buck test caffe2/caffe2/python/operator_test:load_save_test
Reviewed By: mraway
Differential Revision: D26641599
fbshipit-source-id: bccb0af157d85e585e95bc7be61c4584fba3cb04
Summary:
Add a test in `load_save_test.py` that passes in a chunk_size parameter,
to ensure that we exercise the logic that passes the chunk size to the C++
serialization code.
Test Plan:
Ran the tests with the vlog level set to 3 and manually verified the log
messages showed that we were serializing in the expected chunks.
There are existing C++ tests that confirm chunking behavior works as expected
in the pure C++ code.
Reviewed By: mraway
Differential Revision: D26502578
fbshipit-source-id: cd0074f2358da81c68b0fed2c2a94818d83a957d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51762
Update test_util.py to add a `make_tempdir()` function to the `TestCase`
class. The main advantage of this function is that the temporary
directory will be automatically cleaned up when the test case finishes,
so that test case does not need to worry about manually cleaning up this
directory.
This also prefixes the directory name with `caffe2_test.` so that it is
more obvious where the temporary directories came from if they are ever
left behind after a crashed or killed test process.
This updates the tests in `operator_test/load_save_test.py` to use this
new function, so they no longer have to perform their own manual cleanup
in each test.
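A minimal sketch of the idea, using only the standard library. The class names are hypothetical and the real helper in test_util.py may be implemented differently; the sketch relies on `unittest`'s `addCleanup` to remove the directory when the test finishes.

```python
import os
import shutil
import tempfile
import unittest


class TempDirTestCase(unittest.TestCase):
    """Hypothetical base class illustrating a make_tempdir() helper."""

    def make_tempdir(self) -> str:
        # The prefix makes leftover directories easy to attribute to these tests.
        path = tempfile.mkdtemp(prefix="caffe2_test.")
        # addCleanup runs after the test finishes, whether it passes or fails.
        self.addCleanup(shutil.rmtree, path, ignore_errors=True)
        return path


class ExampleTest(TempDirTestCase):
    seen_path = None

    def test_uses_tempdir(self) -> None:
        path = self.make_tempdir()
        ExampleTest.seen_path = path
        self.assertTrue(os.path.isdir(path))
```

Because cleanup is registered at creation time, individual tests need no `tearDown` or `try`/`finally` blocks for their temporary directories.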
Test Plan: python caffe2/python/operator_test/load_save_test.py
Reviewed By: mraway
Differential Revision: D26271178
Pulled By: simpkins
fbshipit-source-id: 51175eefed39d65c03484482e84923e5f39a4768
Summary:
Increase the deadline to avoid
flakiness of the test on ROCM.
Signed-off-by: Roy, Arindam <rarindam@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52013
Reviewed By: albanD
Differential Revision: D26360209
Pulled By: mrshenli
fbshipit-source-id: 1ddc7062c5ff7c980233d22844073de9fb7dcbb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52083
This makes minor fixes in `caffe2/python` to address all errors currently
reported by Pyre.
I updated the code to fix errors when doing so looked simple and safe,
and added `pyre-fixme` comments in other places.
ghstack-source-id: 121109695
Test Plan: Confirmed that Pyre no longer reports errors under `caffe2/python`
Differential Revision: D26272279
fbshipit-source-id: b1eb19d323b613f23280ce9c71e800e874ca1162
Summary:
The test is flaky on ROCM when deadline is set to 1 second. This is affecting builds as it is failing randomly.
Disabling for now.
Signed-off-by: Arindam Roy <rarindam@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50964
Reviewed By: houseroad
Differential Revision: D26049370
Pulled By: BIT-silence
fbshipit-source-id: 22337590a8896ad75f1281e56fbbeae897f5c3b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49402
In cases of NCCLAllReduce operations there could be non-trivial overhead for
launching cooperative kernels (especially in case of async execution of
different parts of the model). This diff is reviving this operator to make it
possible to fuse multiple operations into a single kernel.
Test Plan:
Unit-test.
Used in a later diff.
Reviewed By: xianjiec
Differential Revision: D25531206
fbshipit-source-id: 64b1c161233a726f9e2868f1059316e42a8ea1fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49322
In some cases async execution might lose dependencies (Alias-like ops) or produce suboptimal scheduling when there is a choice of which parts to schedule first. An example of the latter can happen in ModelParallel training, where a copy can get lower priority than the rest of the execution on a given GPU, which causes other GPUs to starve.
This operator addresses these issues by introducing extra explicit dependencies between ops.
Test Plan:
Unit-test.
E2E testing in the future diffs.
Reviewed By: xianjiec
Differential Revision: D24933471
fbshipit-source-id: 1668994c7856d73926cde022378a99e1e8db3567
Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/46461 with a similar goal.
Makes them more readable and possibly faster. Care has to be taken because `(x for x in xs)` is a generator expression that is evaluated only when it is consumed (in Python 3, `map` likewise returns a lazy iterator), whereas a list comprehension applies the function immediately. Lazy evaluation is a benefit in cases where it is not required to actually create the list of values in memory (e.g. when passing to `tuple` or `extend` or `join`).
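The difference in evaluation time can be seen in a small sketch. Since `map` in Python 3 also returns a lazy iterator, the eager case here is shown with a list comprehension. The names `square` and `calls` are made up for the example.

```python
calls = []


def square(x):
    calls.append(x)  # record when the function actually runs
    return x * x


eager = [square(x) for x in range(3)]    # list comprehension: runs immediately
calls_after_list = list(calls)

lazy = (square(x) for x in range(3, 6))  # generator expression: nothing runs yet
calls_after_genexp = list(calls)

total = sum(lazy)                        # consuming the generator runs square()
calls_after_sum = list(calls)
```

If the function has side effects or depends on state that changes before the generator is consumed, the two forms can behave differently, which is why each replacement needs to be checked.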
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46462
Reviewed By: zou3519
Differential Revision: D24422343
Pulled By: ezyang
fbshipit-source-id: 252e33499c92ac0b15238f2df32681dbbda2b237
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45551
The FP16 version of SparseNormalize op in Caffe2 is missing. This Diff adds FP16 support to unblock MC process of adding FP16 to Dper3.
Check https://fb.quip.com/L0T2AXGwUY3n#EReACAeifk3 .
One open question is whether a pure FP16 SparseNormalize op will affect accuracy. We may need to do the computation in the FP32 domain.
ghstack-source-id: 114184398
Test Plan:
```
buck run mode/opt //caffe2/caffe2/python/operator_test:sparse_normalize_test
```
```
buck run mode/opt -c python.package_style=inplace mode/no-gpu //caffe2/caffe2/python/benchmarks:sparse_normalize_benchmark -- --fp16
```
Reviewed By: jspark1105
Differential Revision: D24005618
fbshipit-source-id: 8b918ec4063fdaafa444779b95206ba2b7b38537