Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/46461 with a similar goal
Makes them more readable and possibly faster. Care has to be taken because a list comprehension `[f(x) for x in xs]` builds the whole list immediately, while a generator expression `(f(x) for x in xs)` (like `map` in Python 3) is evaluated lazily. Laziness is a benefit in cases where the full list of values never needs to exist in memory (e.g. when the result is passed to `tuple`, `extend`, or `join`).
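A minimal illustration of the eager/lazy distinction (the names here are only for demonstration):
```python
def f(x):
    print("evaluating", x)
    return 2 * x

xs = [1, 2, 3]

eager = [f(x) for x in xs]   # list comprehension: f runs for every element now
lazy = (f(x) for x in xs)    # generator expression: nothing has run yet
also_lazy = map(f, xs)       # Python 3 map: also lazy, nothing has run yet

# Consumers that only iterate never need the whole list in memory:
total = sum(lazy)                             # evaluates f element by element
joined = ",".join(str(x) for x in also_lazy)  # same, via the map iterator
```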
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46462
Reviewed By: zou3519
Differential Revision: D24422343
Pulled By: ezyang
fbshipit-source-id: 252e33499c92ac0b15238f2df32681dbbda2b237
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46457
Wanted to see if a CopyMatrix specialized for float that uses mkl_somatcopy could be faster, but it wasn't. Still want to check in the benchmark so it can be used later.
Test Plan: .
Reviewed By: dskhudia
Differential Revision: D24345901
fbshipit-source-id: d3e68dbb560e3138fda11c55789cd41bc0715c6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45551
The FP16 version of the SparseNormalize op in Caffe2 is missing. This diff adds FP16 support to unblock the MC process of adding FP16 to Dper3.
Check https://fb.quip.com/L0T2AXGwUY3n#EReACAeifk3 .
One open question: will a pure-FP16 SparseNormalize op affect accuracy? It may be safer to do the computation in the FP32 domain.
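A minimal numpy sketch of the FP32-domain variant suggested above (the helper name and the L2 max-norm rule are assumptions for illustration; the real op's semantics may differ):
```python
import numpy as np

def sparse_normalize_fp16(param, indices, norm=1.0, eps=1e-6):
    # Hypothetical sketch: rescale only the touched rows so their L2 norm
    # is at most `norm`, doing the arithmetic in FP32 and casting back to
    # FP16 storage.
    rows = param[indices].astype(np.float32)
    row_norms = np.linalg.norm(rows, axis=1, keepdims=True)
    scale = np.minimum(1.0, norm / np.maximum(row_norms, eps))
    param[indices] = (rows * scale).astype(np.float16)

param = np.random.rand(8, 4).astype(np.float16)  # FP16 embedding table
sparse_normalize_fp16(param, indices=np.array([0, 3, 5]))
```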
ghstack-source-id: 114184398
Test Plan:
```
buck run mode/opt //caffe2/caffe2/python/operator_test:sparse_normalize_test
```
```
buck run mode/opt -c python.package_style=inplace mode/no-gpu //caffe2/caffe2/python/benchmarks:sparse_normalize_benchmark -- --fp16
```
Reviewed By: jspark1105
Differential Revision: D24005618
fbshipit-source-id: 8b918ec4063fdaafa444779b95206ba2b7b38537
Summary: This diff adds a string equality checking operator.
Test Plan: Unit tests
Differential Revision: D24042344
fbshipit-source-id: c8997c6130e3438f2ae95dae69f76978e2e95527
Summary: `__repr__` calling `self.tasks()` ends up marking the instance as "used", which doesn't seem appropriate. I was debugging a value being passed around and ran into `Cannot add Task to an already used TaskGroup.` just because the value had been logged once.
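For context, a minimal sketch of the pattern being fixed (the class internals below are assumptions for illustration, not the actual TaskGroup code):
```python
class TaskGroup:
    def __init__(self):
        self._tasks = []
        self._already_used = False

    def tasks(self):
        # Public accessor: marks the group as used, after which no
        # further tasks may be added.
        self._already_used = True
        return self._tasks

    def __repr__(self):
        # Read the internal list directly instead of calling
        # self.tasks(), so that merely logging the object does not
        # mark it as used.
        return f"TaskGroup(tasks={self._tasks!r})"
```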
Test Plan:
Added a unit test -- didn't see a clean public method to test it, but I'm happy to add one if that makes sense.
Will wait for sandcastle to trigger everything else; I'm not at all familiar with this code so any other recommendations would be great!
Reviewed By: cryptopic
Differential Revision: D23541198
fbshipit-source-id: 5d1ec674a1ddaedf113140133b90e0da6afa7270
Summary: Currently GetSingleArgument overflows because it expects an int instead of an int64 when using a 1cycle (hill policy) annealing schedule.
Test Plan:
Unit test:
buck test caffe2/caffe2/python/operator_test:learning_rate_op_test
Differential Revision: D23938169
fbshipit-source-id: 20d65df800d7a0f1dd9520705af31f63ae716463
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45315
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45314
In D23858329 (721cfbf842), we put the PriorCorrectionCalibrationPrediction unit test in an OSS test file, which causes test failures in the public trunk.
This diff moves it to the FB-only test file.
Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_gather_ranges_to_dense_op
buck test //caffe2/caffe2/fb/python/operator_test:torch_integration_test -- test_prior_correct_calibration_prediction_op
```
All pass.
Reviewed By: houseroad
Differential Revision: D23899012
fbshipit-source-id: 1ed97d8702e2765991e6caf5695d4c49353dae82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45178
## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are blocking and thus non-cancellable. If an error occurs, we need to be able to safely stop all net execution so we can throw the exception to the caller.
## Summary
* Adds a hypothesis test for queue ops cancellation.
Test Plan:
## Unit test added to verify that queue ops propagate errors
```
buck test caffe2/caffe2/python:hypothesis_test
buck test caffe2/caffe2/python:hypothesis_test -- test_safe_dequeue_blob__raises_exception_when_hang --stress-runs 1000
```
```
Summary
Pass: 1000
ListingSuccess: 1
```
Reviewed By: d4l3k
Differential Revision: D23847576
fbshipit-source-id: 2fc351e1ee13ea8b32d976216d2d01dfb6fcc1ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45231
Two operators, `PriorCorrectionCalibrationPrediction` and `GatherRangesToDense`, are not supported in PT, which prevents Glow from working.
To unblock, we first use a C2->PT conversion; in the long term, we need to implement PT custom ops.
This diff does the conversion to unblock the current project.
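For reference, a numpy sketch of the behavior the `GatherRangesToDense` conversion has to reproduce (re-derived from the expected outputs in the test plan below; the actual op signature may differ):
```python
import numpy as np

def gather_ranges_to_dense(data, ranges, lengths):
    # Hypothetical reference: `ranges` has shape (batch, len(lengths), 2)
    # holding (start, length) pairs into the flat `data`; empty ranges
    # produce all-zero rows.
    outputs = []
    for m, expected in enumerate(lengths):
        out = np.zeros((ranges.shape[0], expected), dtype=data.dtype)
        for n in range(ranges.shape[0]):
            start, length = ranges[n, m]
            if length > 0:
                out[n] = data[start:start + length]
        outputs.append(out)
    return outputs

data = np.array([6, 5, 4, 3, 2, 1], dtype=np.int64)
ranges = np.array([[[0, 4]], [[0, 0]]])  # second row: empty range -> zeros
print(gather_ranges_to_dense(data, ranges, lengths=[4]))
# [array([[6, 5, 4, 3], [0, 0, 0, 0]])] -- matches the reference output below
```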
Test Plan:
Run the unit tests; the test input is from the current DPER example.
All pass.
```
buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_prior_correct_calibration_prediction_op --print-passing-details
> c2 reference output
> [0.14285715 0.27272728 0.39130434 0.5       ]
> PT converted output
> tensor([0.1429, 0.2727, 0.3913, 0.5000])

buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_gather_ranges_to_dense_op --print-passing-details
> c2 reference output
> [array([[6, 5, 4, 3], [0, 0, 0, 0]], dtype=int64)]
> PT converted output
> [tensor([[6, 5, 4, 3], [0, 0, 0, 0]])]
```
Reviewed By: allwu, qizzzh
Differential Revision: D23858329
fbshipit-source-id: ed37118ca7f09e1cd0ad1fdec3d37f66dce60dd9
Summary:
The `2to3` tool has a `future` fixer that specifically removes these redundant imports; the `caffe2` directory has the most of them:
```
2to3 -f future -w caffe2
```
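The redundant imports in question are the Python 2 compatibility headers, e.g.:
```python
# Removed by `2to3 -f future -w`:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
```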
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033
Reviewed By: seemethere
Differential Revision: D23808648
Pulled By: bugra
fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
Summary:
## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are blocking and thus non-cancellable. If an error occurs, we need to be able to safely stop all net execution so we can throw the exception to the caller.
* When an error occurs in a net, or the net is cancelled, running ops get their `Cancel` method called.
* This diff adds a `Cancel` method to `SafeEnqueueBlobsOp` and `SafeDequeueBlobsOp` that calls `queue->close()` to force all the blocking ops to return.
* Adds a unit test that verifies the error propagation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44495
Test Plan:
## Unit Test added to verify that queue ops propagate errors
```
buck test caffe2/caffe2/python:hypothesis_test
```
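A minimal sketch of the behavior being verified, using the Caffe2 Python API (the exact op arguments and status semantics are assumptions based on the summary above, not the test's actual code):
```python
from caffe2.python import core, workspace

# Create a small queue, then close it to simulate cancellation.
workspace.RunOperatorOnce(core.CreateOperator(
    "CreateBlobsQueue", [], ["queue"], num_blobs=1, capacity=4))
workspace.RunOperatorOnce(core.CreateOperator(
    "CloseBlobsQueue", ["queue"], []))

# With the queue closed, SafeDequeueBlobs returns instead of blocking;
# its status output records that the queue was closed (assumed semantics).
workspace.RunOperatorOnce(core.CreateOperator(
    "SafeDequeueBlobs", ["queue"], ["blob", "status"]))
print(workspace.FetchBlob("status"))
```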
Reviewed By: dzhulgakov
Differential Revision: D23236088
Pulled By: dahsh
fbshipit-source-id: daa90d9ee32483fb51195e269a52cf5987bb0a5a
Summary:
Make `gcs_cuda_only` and `gcs_gpu_only` return empty device lists if CUDA/GPU (CUDA or ROCm) is not available.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44578
Reviewed By: walterddr
Differential Revision: D23664227
Pulled By: malfet
fbshipit-source-id: 176b5d964c0b02b8379777cd9a38698c11818690
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44540
Support fp16 as the output type for UniformFill.
Reviewed By: jianyuh
Differential Revision: D23558030
fbshipit-source-id: 53a5b2c92cfe78cd11f55e6ee498e1bd682fe4a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44089
Add support for fp16 as the input type in the SparseLengthsSum/Mean Caffe2 operators.
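A reference sketch of the op's semantics with an fp16 table (accumulating in fp32 is my assumption here, mirroring common mixed-precision practice, not necessarily what the kernel does):
```python
import numpy as np

def sparse_lengths_sum(data, indices, lengths):
    # SparseLengthsSum semantics: gather rows of `data` by `indices` and
    # sum them in segments whose sizes are given by `lengths`.
    out, pos = [], 0
    for n in lengths:
        rows = data[indices[pos:pos + n]].astype(np.float32)
        out.append(rows.sum(axis=0))
        pos += n
    return np.stack(out).astype(data.dtype)

data = np.random.rand(10, 4).astype(np.float16)  # fp16 embedding table
print(sparse_lengths_sum(data, indices=np.array([0, 1, 1]), lengths=[2, 1]))
```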
Reviewed By: xianjiec
Differential Revision: D23436877
fbshipit-source-id: 02fbef2fde17d4b0abea9ca5d17a36aa989f98a0
Summary:
Expose the `nesterov` option of the SGD optimizer from Caffe2 to DPER.
The DPER SGD optimizer (https://fburl.com/diffusion/chpobg0h) already refers to the NAG SGD optimizer in Caffe2 (https://fburl.com/diffusion/uat2lnan), so we only need to add the `nesterov` parameter to the DPER SGD optimizer.
Analysis of run results: N345540.
- train_ne increases as momentum (m) decreases.
- for m=0.95, 0.9: eval_ne is lower with NAG than production (no NAG, m = 0.95).
- for m=0.99: eval_ne with or without NAG is higher than production. It indicates larger variance in validation and overfit in training (lower train_ne).
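For reference, the textbook update that the `nesterov` flag toggles (a sketch only; the Caffe2 kernel's exact formulation may differ):
```python
import numpy as np

def sgd_step(w, grad, state, lr=0.1, momentum=0.95, nesterov=False):
    # Heavy-ball momentum vs. Nesterov accelerated gradient (NAG).
    state["v"] = momentum * state["v"] + grad
    if nesterov:
        # NAG: take the gradient step from the "looked-ahead" position.
        w -= lr * (grad + momentum * state["v"])
    else:
        w -= lr * state["v"]
    return w

w = np.zeros(3)
state = {"v": np.zeros(3)}
w = sgd_step(w, grad=np.ones(3), state=state, nesterov=True)
```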
Test Plan:
1. Unit tests:
`buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_sgd_without_nesterov`
`buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_sgd_with_nesterov`
2. Build the DPER front-end package: `flow-cli canary ads.dper3.workflows.sparse_nn.train --mode opt --entitlement ads_global --run-as-secure-group team_ads_ml_ranking`. The build result (refreshed) is here: https://www.internalfb.com/intern/buck/build/2a368b55-d94b-45c1-8617-2753fbce994b. Flow package version is ads_dper3.canary:856b545cc6b249c0bd328f845adeb0d2.
3. Build the DPER back-end package: `flow-cli canary dper.workflows.dper3.train --mode opt --entitlement ads_global --run-as-secure-group team_ads_ml_ranking`. The build result (refreshed) is here: https://www.internalfb.com/intern/buck/build/70fa91cd-bf6e-4a08-8a4d-41e41a77fb52. Flow package version is aml.dper2.canary:84123a34be914dfe86b1ffd9925869de.
4. Compare prod with NAG-enabled runs:
a) refreshed prod run (m=0.95): f213877098
   NAG-enabled run (m=0.95): f213887113
b) prod run (m=0.9): f214065288
   NAG-enabled run (m=0.9): f214066319
c) prod run (m=0.99): f214065804
   NAG-enabled run (m=0.99): f214066725
d) changed the data type of nesterov to `bool` and launched a validation run
   NAG-enabled (m=0.95): f214500597
Reviewed By: ustctf
Differential Revision: D23152229
fbshipit-source-id: 61703ef6b4e72277f4c73171640fb8afc6d31f3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44043
To invoke `cancel` from the net instance in Python, we expose it through pybind state.
Reviewed By: dzhulgakov
Differential Revision: D23249660
fbshipit-source-id: 45a1e9062dca811746fcf2e5e42199da8f76bb54
Summary: Exports the operator to PyTorch, to be made into a low-level module.
Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_learning_rate
```
Reviewed By: yf225
Differential Revision: D23545582
fbshipit-source-id: 6b6d9aa6a47b2802ccef0f87c1263c6cc2d2fdf6
Summary: Integrate the AOT flow with the model exporter.
Test Plan:
buck test dper3/dper3_backend/delivery/tests:dper3_model_export_test
replayer test see D23407733
Reviewed By: ipiszy
Differential Revision: D23313689
fbshipit-source-id: 39ae8d578ed28ddd6510db959b65974a5ff62888
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43938
resubmit
Test Plan: unit test included
Reviewed By: mruberry
Differential Revision: D23443493
fbshipit-source-id: 7b68f8f7d1be58bee2154e9a498b5b6a09d11670
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43591
50 randomized inputs instead of 100 doesn't change the balance that much, but it speeds up the test runtime.
Test Plan: CI
Reviewed By: orionr, seemethere
Differential Revision: D23332393
fbshipit-source-id: 7a8ff9127ee3e045a83658a7a670a844f3862987
Summary:
Separate user embeddings and ad embeddings in blobsOrder. New order:
1. meta_net_def
2. preload_blobs
3. user_embeddings (embeddings in the remote request-only net)
4. ad_embeddings (embeddings in the other remote net)
Add a field requestOnlyEmbeddings in meta_net_def to record user_embeddings.
This is for flash verification.
Test Plan:
buck test dper3/dper3_backend/delivery/tests:blob_reorder_test
Run a flow with canary package f211282476
Check the net: n326826, request_only_embeddings are recorded as expected
Reviewed By: ipiszy
Differential Revision: D23008305
fbshipit-source-id: 9360ba3d078f205832821005e8f151b8314f0cf2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43205
A number of tests that forward to `TestLoadSaveBase.load_save` are all marked as flaky because they regularly take much longer to start up than hypothesis' default deadline of 200ms. This diff fixes the problem by removing the deadline for `load_save`. This is alright, as these tests aren't meant to be testing the performance of these operators.
I would have set the deadline to 60s, but the caffe2 GitHub CI appears to use a different version of hypothesis that doesn't accept a `datetime.timedelta` deadline, so instead of trying to figure out an approach that works on both, I've just removed the deadline entirely.
I've also tagged all existing tasks WRT these failures.
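For reference, disabling the deadline on a hypothesis test looks roughly like this (the test name and strategy are illustrative only):
```python
from hypothesis import given, settings, strategies as st

@settings(deadline=None)  # disable hypothesis' per-example deadline entirely
@given(st.integers(min_value=0, max_value=2**32 - 1))
def test_load_save(seed):
    ...  # run the (slow-to-start) load/save round trip here
```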
Differential Revision: D23175752
fbshipit-source-id: 324f9ff034df1ac4874797f04f50067149a6ba48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42927
Added fp16 fusion to the net transforms.
Refactored the transforms, as well as glow_transform, out of opt/custom so that the OSS builds pass.
Test Plan: added net runner tests for this
Reviewed By: yinghai
Differential Revision: D23080881
fbshipit-source-id: ee6451811fedfd07c6560c178229854bca29301f
Summary:
1. Fix illegal memory access issue for SplitByLengths operator in the CUDA context.
2. Add support to scaling lengths vector for SplitByLengths operator.
3. Add support to test SplitByLengths operator in the CUDA context.
Example of the SplitByLengths operator processing a scaling lengths vector:
value vector A = [1, 2, 3, 4, 5, 6]
lengths vector B = [1, 2]
After execution of the SplitByLengths operator, the outputs are [1, 2] and [3, 4, 5, 6]: since len(A) = 6 is twice sum(B) = 3, each length is scaled by 2.
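A small sketch of that scaling behavior (a hypothetical reference implementation, not the CUDA kernel):
```python
import numpy as np

def split_by_lengths(values, lengths):
    # When sum(lengths) divides len(values), each length is scaled by
    # len(values) // sum(lengths) before splitting.
    scale = len(values) // sum(lengths)
    out, start = [], 0
    for l in lengths:
        out.append(values[start:start + l * scale])
        start += l * scale
    return out

print(split_by_lengths(np.array([1, 2, 3, 4, 5, 6]), [1, 2]))
# [array([1, 2]), array([3, 4, 5, 6])]
```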
Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:concat_split_op_test
Reviewed By: kennyhorror
Differential Revision: D23079841
fbshipit-source-id: 3700e7f2ee0a5a2791850071fdc16e5b054f8400
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42763
add the fp16 fusions as net transforms:
- layernorm fused with mul+add
- swish int8
Test Plan: added unit test, ran flows
Reviewed By: yinghai
Differential Revision: D23002043
fbshipit-source-id: f0b13d51d68c240b05d2a237a7fb8273e996328b