Commit Graph

125 Commits

Author SHA1 Message Date
Yun Wang (Speech)
0d57e87000 Fix test_div in caffe2/caffe2/python:hypothesis_test (#106694)
Summary: Suppress the "too_slow" health check for `test_div`.
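
A minimal sketch of what such a suppression looks like (the decorator placement and strategy are illustrative; the real test lives in hypothesis_test.py):
```
from hypothesis import HealthCheck, given, settings, strategies as st

@settings(suppress_health_check=[HealthCheck.too_slow])
@given(x=st.floats(allow_nan=False, allow_infinity=False))
def test_div(x):
    pass  # placeholder body; the real test exercises the Div operator
```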

Test Plan: Sandcastle

Differential Revision: D48105842

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106694
Approved by: https://github.com/malfet
2023-08-08 04:50:21 +00:00
Omkar Salpekar
ae1ed27756 [codemod][numpy] replace np.str with str (#103931)
Summary:
`np.str` was removed in numpy 1.20.0. It was an alias for the builtin `str`, so the replacement is safe.

The whole change is mechanical, generated using the following one-liner:
```
fbgr -sl 'np\.str\b' | xargs perl -pi -e 's,\bnp\.str\b,str,g'
```
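
For those outside that tooling, a hedged OSS equivalent of the codemod (pathlib + re in place of fbgr/perl) might look like:
```
import pathlib
import re

# Rewrite np.str -> str across a source tree; behavior-preserving since
# np.str was a plain alias for the builtin.
for path in pathlib.Path("caffe2").rglob("*.py"):
    src = path.read_text()
    new = re.sub(r"\bnp\.str\b", "str", src)
    if new != src:
        path.write_text(new)
```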

Test Plan: sandcastle

Differential Revision: D46586144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103931
Approved by: https://github.com/huydhn
2023-06-21 18:16:42 +00:00
Yun Wang (Speech)
4130e4f284 [hypothesis==6.70.1] Fix more test errors (#98685)
Summary:
This diff fixes more test failures (T150117218) caused by upgrading the "hypothesis" library to 6.70.1 (D44523679).

# //caffe2/caffe2/python:hypothesis_test
This test generates float numbers and filters out those whose absolute values are less than 1e-2.
It is a known issue with the new version of "hypothesis" that it generates zeros or floats with small absolute values too often:
https://github.com/HypothesisWorks/hypothesis/issues/3603
I'm circumventing this issue by suppressing the health check `filter_too_much`.
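
A hedged sketch of the pattern in question (the strategy bounds are illustrative): heavy filtering of small floats trips `filter_too_much` on hypothesis 6.70.1, so the check is suppressed:
```
from hypothesis import HealthCheck, given, settings, strategies as st

@settings(suppress_health_check=[HealthCheck.filter_too_much])
@given(x=st.floats(-10, 10).filter(lambda v: abs(v) >= 1e-2))
def test_example(x):
    assert abs(x) >= 1e-2  # the filter guarantees this
```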

# //caffe2/caffe2/quantization/server:resize_nearest_dnnlowp_op_test
All arithmetic should be done in float32 when calculating the reference, since the network under test uses float32 everywhere.
Mixing float32 with float64 or even integers will produce intermediate values in float64.
The difference in precision may cause off-by-1 errors when converting to integer (see the sketch below).
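
A hedged numpy illustration of the off-by-1 risk (the values are made up, not taken from the test):
```
import numpy as np

ref64 = 2.1 * 100                          # float64 arithmetic
ref32 = np.float32(2.1) * np.float32(100)  # float32 arithmetic: ~209.99998
print(int(ref64), int(ref32))              # -> 210 209: off by one
```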

Test Plan:
Run all the tests in both "dev" and "opt" modes:
```
for mode in dev opt; do
  buck2 test mode/$mode //caffe2/caffe2/python:hypothesis_test -- --run-disabled
  buck2 test mode/$mode //caffe2/caffe2/quantization/server:resize_nearest_dnnlowp_op_test -- --run-disabled
  buck2 test mode/$mode //caffe2/caffe2/fb/layers/tests:tum_history_test -- --run-disabled
  buck2 test mode/$mode //caffe2/caffe2/fb/dper/layer_models/tests:nn_ops_test -- --run-disabled
  buck2 test mode/$mode //caffe2/caffe2/fb/metrics:metrics_test -- --run-disabled
  buck2 test mode/$mode //deeplearning/numeric_suite/toolkit/test:net_transform_test -- --run-disabled
  buck2 test mode/$mode //f3/type_system:tests -- --run-disabled
done
```

**NOTE:** In the first test (`//caffe2/caffe2/python:hypothesis_test`), the two methods `test_constant_fill_from_tensor` and `test_recurrent` would crash.
But these crash on hypothesis 5.49.0, too, so I'm leaving them alone.

Differential Revision: D44812706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98685
Approved by: https://github.com/malfet
2023-04-11 19:07:55 +00:00
Nikita Shulga
1906eaf22f [BE] Get rid of future (#92596)
PyTorch has been Python-3.X+ for ages, so it's a shame to still rely on `future.utils`, even in the deprecated Caffe2 codebase.

For reference:
https://peps.python.org/pep-0469/#migrating-directly-to-python-3
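
A hedged before/after sketch of the kind of change involved, per PEP 469:
```
d = {"a": 1, "b": 2}

# Before, with the `future` package:
#   from future.utils import viewitems
#   for k, v in viewitems(d): ...

# After, plain Python 3:
for k, v in d.items():
    print(k, v)
```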

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92596
Approved by: https://github.com/kit1980, https://github.com/orionr
2023-01-19 08:46:50 +00:00
Adam Simpkins
c4eb22009e Drop some Python 2 compatibility code (#51769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51769

Remove some Python 2 compatibility code that otherwise causes errors to
be reported from static type checkers.

Static type checkers complain that the old Python 2 modules and
functions referenced by this code do not exist. Given that Python 2
support is entirely deprecated now, we can simply remove the
compatibility code.
ghstack-source-id: 121313191

Test Plan:
Was able to get Pyre to successfully type check the `caffe2/python`
directory with this and some other changes.

Reviewed By: Tianshu-Bao

Differential Revision: D26271723

Pulled By: simpkins

fbshipit-source-id: fec8a09466be6867388832380480aafd36616aa1
2021-02-11 11:02:33 -08:00
Lu Fang
1fdc35da2c [BE] Fix the broken test -- caffe2/caffe2/python:hypothesis_test - test_recurrent (#50668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50668

GPU initialization is sometimes slow.

Test Plan: buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --exact 'caffe2/caffe2/python:hypothesis_test - test_recurrent (caffe2.caffe2.python.hypothesis_test.TestOperators)' --run-disabled

Reviewed By: hl475

Differential Revision: D25939037

fbshipit-source-id: 832700cf42ece848cda66dd629a06ecda207f086
2021-01-17 21:21:38 -08:00
Taylor Robie
faf6032945 Remove deadlines for Caffe2 hypothesis_test when running on GPU. (#49591)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49591

A bunch of these tests are marked flaky, and have been since time immemorial. (Read: as far back as Buck will build.) However, closer inspection reveals that they fail if and only if run on a GPU worker. What seems to be going on is that there are more jobs than GPUs, so the contention causes waits which register as timeouts on the test.

This diff is kind of hacky, but it basically just drops deadlines if a GPU is present. Because Caffe2 is going away I'm not too terribly concerned about a beautiful solution, but we may as well keep some test coverage if it's easy.
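
A minimal sketch of the approach, assuming a GPU probe along these lines (the profile name is illustrative):
```
from hypothesis import settings

def gpu_present() -> bool:
    try:
        from caffe2.python import workspace
        return workspace.NumGpuDevices() > 0
    except Exception:
        return False

# Device contention, not slow code, is what trips the per-example timer,
# so drop the deadline entirely when a GPU is present.
if gpu_present():
    settings.register_profile("gpu-no-deadline", deadline=None)
    settings.load_profile("gpu-no-deadline")
```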

CC Sebastian, Ilia, Min, and Hongzheng who also have tasks for what seems to be the same flakiness.

Test Plan: Turn the tests back on and see if they fall over. (The failure repros reliably on an OnDemand GPU and is fixed by this change, so it's not really just a hail Mary.)

Reviewed By: ngimel

Differential Revision: D25632981

fbshipit-source-id: 43dcce416fea916ba91f891e9e5b59b2c11cca1a
2020-12-18 10:00:24 -08:00
Peiyao Zhou
4078f44668 [TB][embedding supporting] Modify histogram to accept multiple types to skip CastOp and avoid OOMing in CastOp
Summary: To support min/max/mean/std, SummarizeOp needs to skip size checking (similar to the LpNorm error mentioned above) and accept multiple types

Test Plan:
unit test:
`buck test //caffe2/caffe2/fb/tensorboard/tests:tensorboard_accumulate_histogram_op_test`

https://our.intern.facebook.com/intern/testinfra/testrun/1407375057859572

`buck test //caffe2/caffe2/fb/tensorboard/tests:tensorboard_accumulate_histogram_op_test --stress-runs 1000`

https://our.intern.facebook.com/intern/testinfra/testrun/2533274832166362

Reviewed By: cryptopic

Differential Revision: D24605507

fbshipit-source-id: fa08372d7c9970083c38abd432d4c86e84fb10e0
2020-11-11 12:03:54 -08:00
Brandon Lin
4a581ba6c2 Implement LengthsToOffsets operator in Caffe2 (#46590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46590

This operator is very similar to LengthsToRanges but doesn't pack the offsets next to the original lengths.
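
A hedged illustration of the difference (the values are made up):
```
lengths = [2, 3, 1]

# LengthsToRanges packs each offset next to its length: an N x 2 output.
ranges = [[0, 2], [2, 3], [5, 1]]

# LengthsToOffsets keeps only the running offsets (an exclusive prefix sum).
offsets = [0, 2, 5]
assert offsets == [sum(lengths[:i]) for i in range(len(lengths))]
```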

Reviewed By: yf225

Differential Revision: D24419746

fbshipit-source-id: aa8b014588bb22eced324853c545f8684086c4e4
2020-10-29 07:03:34 -07:00
Danny Huang
cd7a682282 [caffe2] adds hypothesis test for queue ops cancel (#45178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45178

## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are currently blocking and thus non-cancellable. If an error
occurs, we need to be able to safely stop all net execution so we can throw
the exception to the caller.

## Summary
* Adds a hypothesis test for queue ops cancellation.

Test Plan:
## Unit test added to verify that queue ops propagate errors

```
buck test caffe2/caffe2/python:hypothesis_test
buck test caffe2/caffe2/python:hypothesis_test -- test_safe_dequeue_blob__raises_exception_when_hang --stress-runs 1000
```

```
Summary
  Pass: 1000
  ListingSuccess: 1
```

Reviewed By: d4l3k

Differential Revision: D23847576

fbshipit-source-id: 2fc351e1ee13ea8b32d976216d2d01dfb6fcc1ad
2020-09-24 14:43:52 -07:00
Bugra Akyildiz
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
There is a tool called `2to3` whose `future` fixer specifically removes these; the `caffe2` directory has the most redundant imports:

```2to3 -f future -w caffe2```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
Mike Ruberry
b6f4bb0a70 Revert D23236088: [pytorch][PR] [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp
Test Plan: revert-hammer

Differential Revision:
D23236088 (0ccc38b773)

Original commit changeset: daa90d9ee324

fbshipit-source-id: 933c7deab177250075683a9bea143ac37f16a598
2020-09-16 23:32:50 -07:00
Danny Huang
0ccc38b773 [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp (#44495)
Summary:
## Motivation

* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are currently blocking and thus non-cancellable. If an error
  occurs, we need to be able to safely stop all net execution so we can throw
  the exception to the caller.

* When an error occurs in a net, or the net is cancelled, running ops will have
  their `Cancel` method called.

* This diff adds a `Cancel` method to `SafeEnqueueBlobsOp`
  and `SafeDequeueBlobsOp` that calls `queue->close()` to force all the
  blocking ops to return.
* Adds a unit test that verifies the error propagation (see the sketch below).
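
A hedged sketch of the behavior under test (the ops are real Caffe2 queue ops, but the wiring here is illustrative): once the queue is closed, a dequeue returns with a failure status instead of blocking forever.
```
from caffe2.python import core, workspace

workspace.RunOperatorOnce(core.CreateOperator(
    "CreateBlobsQueue", [], ["queue"], capacity=1, num_blobs=1))
workspace.RunOperatorOnce(core.CreateOperator(
    "CloseBlobsQueue", ["queue"], []))
workspace.RunOperatorOnce(core.CreateOperator(
    "SafeDequeueBlobs", ["queue"], ["blob", "status"]))
print(workspace.FetchBlob("status"))  # expected: True -- closed, did not block
```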

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44495

Test Plan:
## Unit Test added to verify that queue ops propagate errors
```
buck test caffe2/caffe2/python:hypothesis_test
```

Reviewed By: dzhulgakov

Differential Revision: D23236088

Pulled By: dahsh

fbshipit-source-id: daa90d9ee32483fb51195e269a52cf5987bb0a5a
2020-09-16 18:17:34 -07:00
Lingyi Liu
bc64efae48 Back out "Revert D19987020: [pytorch][PR] Add the sls tensor train op" (#43938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43938

resubmit

Test Plan: unit test included

Reviewed By: mruberry

Differential Revision: D23443493

fbshipit-source-id: 7b68f8f7d1be58bee2154e9a498b5b6a09d11670
2020-09-01 11:42:12 -07:00
Mike Ruberry
cc52386096 Revert D19987020: [pytorch][PR] Add the sls tensor train op
Test Plan: revert-hammer

Differential Revision:
D19987020 (f31b111a35)

Original commit changeset: e3ca7b00a374

fbshipit-source-id: a600c747a45dfb51e0882196e382a21ccaa7b989
2020-08-29 12:46:11 -07:00
Lingyi Liu
f31b111a35 Add the sls tensor train op (#33525)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33525

Reviewed By: wx1988

Differential Revision: D19987020

Pulled By: lly-zero-one

fbshipit-source-id: e3ca7b00a374a75ee42716c4e6236bf168ebebf1
2020-08-29 12:16:44 -07:00
Christopher Whelan
7a9ae52550 [hypothesis] Deadline followup (#42842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42842

Test Plan: `buck test`

Reviewed By: thatch

Differential Revision: D23045269

fbshipit-source-id: 8a3f4981869287a0f5fb3f0009e13548b7478086
2020-08-11 15:33:23 -07:00
Christopher Whelan
5cd0f5e8ec [PyFI] Update hypothesis and switch from tp2 (#41645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41645

Pull Request resolved: https://github.com/facebookresearch/pytext/pull/1405

Test Plan: buck test

Reviewed By: thatch

Differential Revision: D20323893

fbshipit-source-id: 54665d589568c4198e96a27f0ed8e5b41df7b86b
2020-08-08 12:13:04 -07:00
Dinesh Govindaraj
f153b35b9b Shape inference for SparseToDense in ExpertCombiner
Summary: Adding shape inference for SparseToDense. The proposed shape inference only works when data_to_infer_dim is given; otherwise the SparseToDense output dimension depends on the maximum value of the input tensor.

Test Plan:
buck test //caffe2/caffe2/python:sparse_to_dense_test
buck test //caffe2/caffe2/python:hypothesis_test -- test_sparse_to_dense

Dper3 Changes:
f204594813
buck test dper3/dper3_models/ads_ranking/model_impl/sparse_nn/tests:sparse_nn_lib_test

Reviewed By: zhongyx12, ChunliF

Differential Revision: D22479511

fbshipit-source-id: 8983a9baea8853deec53ad6f795c874c3fb93de0
2020-07-15 08:04:48 -07:00
rohithkrn
df252c059c [ROCm] Skip caffe2 unique op test for rocm3.5 (#41219)
Summary:
The unique op test failure in caffe2 blocks upgrading CI to ROCm 3.5.1. Skipping the test to unblock; it will be re-enabled after root-causing and fixing the issue.
jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41219

Differential Revision: D22471452

Pulled By: xw285cornell

fbshipit-source-id: 9e503c8b37c0a4b92632f77b2f8a90281a9889c3
2020-07-09 20:00:29 -07:00
Hongzheng Shi
f6b0fbe2c5 topk tensor k support (#39407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39407

 - support passing a single-element tensor as k for the topk module
 - support passing a single-element tensor to the constant fill output

Test Plan:
buck test dper3/dper3/modules/tests:core_modules_test -- test_topk_gating_without_split_examples_tensor_k
buck test caffe2/caffe2/python:hypothesis_test -- test_constant_fill_from_tensor

Reviewed By: huayuli00

Differential Revision: D21843739

fbshipit-source-id: 0c5f5c03e9f57eeba40c0068784625164c2527ec
2020-06-15 13:10:20 -07:00
Nikita Shulga
e2a178ca21 Update caffe2 hypothesis_test_util to support hypothesis-5 (#39498)
Summary:
Extracting the forward-backward `hypothesis` interface update parts of https://github.com/pytorch/pytorch/pull/39430 into a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39498

Differential Revision: D21900210

Pulled By: malfet

fbshipit-source-id: 75e637cf839f49dc141d37e1686ce45ff4721245
2020-06-05 08:27:50 -07:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Lu Fang
c89340f068 Extend HasElements to support multiple inputs (#28717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28717

Make HasElements support multiple inputs: if any input has elements, return true (see the sketch below).
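
A hedged pure-Python reference for the extended semantics (not the op itself):
```
import numpy as np

def has_elements(*tensors):
    # With multiple inputs, return true if ANY input is non-empty.
    return any(t.size > 0 for t in tensors)

assert has_elements(np.zeros(0), np.zeros(2))           # one non-empty -> True
assert not has_elements(np.zeros(0), np.zeros((0, 3)))  # all empty -> False
```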

Test Plan: to be added

Reviewed By: BIT-silence

Differential Revision: D17972759

fbshipit-source-id: 3ecdea74a30fcfaaa6490fef1debc6cde68db922
2019-10-27 23:00:07 -07:00
Junjie Bai
a7eb18e243 Enable Unique operator tests on ROCm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26046

Differential Revision: D17331522

Pulled By: bddppq

fbshipit-source-id: 729624d1df15a1c0c7ba2b7e7e3c3a903fb13abf
2019-09-11 16:36:14 -07:00
Andrey Malevich
d58059bc6f Fix SliceGradientOp to handle properly empty batches (#23784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23784

The backward path does nothing during the gradient pass when the input is empty; as a
result, the workspace can preserve gradient values from the previous iteration and feed
inconsistent inputs to some of the backward-pass operators. This diff fixes this
discrepancy by always reinitializing the output during the backward path.
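
A hedged pure-Python sketch of the fix (not the C++ op): always reinitialize the gradient output, even for an empty batch, so stale values cannot leak between iterations.
```
import numpy as np

def slice_gradient(data_shape, grad_output, starts, ends):
    grad_data = np.zeros(data_shape, dtype=grad_output.dtype)  # fresh every call
    if grad_output.size:  # empty batch: keep the zeros, scatter nothing
        grad_data[tuple(slice(s, e) for s, e in zip(starts, ends))] = grad_output
    return grad_data
```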

Reviewed By: dzhulgakov

Differential Revision: D16646096

fbshipit-source-id: 8ca68dfad17a63fc87c033cce7b36b40bd77245c
2019-08-06 02:43:32 -07:00
Your Name
99674eb86f Re-enable test_dag_net_forking on ROCm (#21013)
Summary:
Fixes #16229
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21013

Differential Revision: D15515824

Pulled By: bddppq

fbshipit-source-id: 23a6c7eaad6129328c6b9dfcc55ac2d31a6d2dc0
2019-05-28 12:12:53 -07:00
rohithkrn
aa88c2c0b6 Unify gpu_support variable in python tests (#16748)
Summary:
Assign `has_gpu_support = has_cuda_support or has_hip_support` and make the corresponding changes in the python tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16748

Differential Revision: D13983132

Pulled By: bddppq

fbshipit-source-id: ca496fd8c6ae3549b736bebd3ace7fa20a6dad7f
2019-02-07 00:29:51 -08:00
peter.yeh@amd.com
10cd9d5a03 Skip dag_net_forking test on Rocm (#16639)
Summary:
- Skip the test due to flaky behavior on AMD/ROCm.
- The fix is expected in ROCm 2.2 (HSA runtime).
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16639

Differential Revision: D13915231

Pulled By: bddppq

fbshipit-source-id: 66e1d275836337170b15ceb9d60cfdd3242d4df8
2019-02-01 00:53:54 -08:00
Yangqing Jia
da73d709a8 Remove unsafecoalesce op (#12897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12897

The UnsafeCoalesce op is a holdover from the memonger days, when we tried to coalesce
operators into more efficient computation kernels. It creates a somewhat unsafe
underlying memory storage pattern.

With the new tensor unification I am not sure whether it is still safe for us to do
so, so I propose we delete it for the sake of safety.

Reviewed By: bddppq, ilia-cher

Differential Revision: D10475980

fbshipit-source-id: b1a838c9f47d681c309ee8e2f961b432236e157e
2018-10-22 09:42:26 -07:00
Will Feng
cdead5ace1 Enable CircleCI for Linux jobs (#12389)
Summary:
Changes in this PR:
1. Intermediate Docker image is shared from build stage to test stage through ECR, in order to fix the Caffe2 flaky CUDA tests.
2. There are ~7 Caffe2 operator tests that are only flaky in `caffe2_py2_gcc4_8_ubuntu14_04_test` on CPU. Disabling those tests on that config only, which is okay to do because we are still running those tests in other test jobs.

After this PR is merged, CircleCI will be running on master automatically, and will be running on PRs if the author rebased their PR onto the newest master (which we will ask all the authors to do when we switch off Jenkins for Linux).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12389

Differential Revision: D10224267

Pulled By: yf225

fbshipit-source-id: dd1a90a425c3d13b870d3d328cb301eee2e6e2cd
2018-10-08 17:09:37 -07:00
Jongsoo Park
29610621ec 64B align for avx512 (#11748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11748

For AVX-512, we need to align at a multiple of 64B, not 32B.
Regardless of AVX-512, it is in general a good idea to be cache-line aligned.

Reviewed By: ilia-cher

Differential Revision: D9845056

fbshipit-source-id: b1d3ed67749c0c1a64acd5cc230a1279e8023512
2018-09-17 14:08:31 -07:00
Will Feng
c9e66351a7 Port all PyTorch and Caffe2 jobs to CircleCI (#11264)
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.

Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect

Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264

Differential Revision: D9656793

Pulled By: yf225

fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
2018-09-05 16:28:11 -07:00
rohithkrn
f5910c8a36 Add MIOPEN recurrent operator (#10840)
Summary:
The goal of this PR is to enable the MIOpen engine (for HIP devices) for the recurrent operator and also enable the corresponding unit test.
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10840

Differential Revision: D9518980

Pulled By: bddppq

fbshipit-source-id: 214661e79a47c5dc6b712ef0fba986bd99db051f
2018-08-27 15:39:56 -07:00
Xiuyan Ni
db96a0951f Add SIMD version to GFTRL optimizer (#9698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9698

Add SIMD version to GFTRL optimizer

Differential Revision: D8949723

fbshipit-source-id: 835ce2ce49630ae43fc6bac63c545c14b25f5a26
2018-07-30 15:27:24 -07:00
Junjie Bai
bdbbcf068a Temporarily disable test_unique on rocm since it keeps running into segfault (#9872)
Summary:
petrex

https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3758/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3757/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3752/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9872

Reviewed By: ezyang

Differential Revision: D9013335

Pulled By: bddppq

fbshipit-source-id: 80490a0fd4a86aa9c8454378c0edddc57d135c4e
2018-07-26 08:34:00 -07:00
Junjie Bai
7af5883860 Eanble python tests on ROCM (#9616)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9616

Differential Revision: D8960623

Pulled By: bddppq

fbshipit-source-id: bde93bda6230094e6bf4badd8ee79f0688ae1993
2018-07-24 11:37:58 -07:00
Xiuyan Ni
4e5369349f Add FTRL Optimizer with Group Lasso regularizer (#9074)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9074

Implement an optimizer, based on the FTRL optimizer, which supports a Group
Lasso regularizer.

The relevant paper list for this optimizer:
1. About the FTRL Optimizer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf,
2. About the group lasso regularizer solver: http://www.cse.cuhk.edu.hk/~king/PUB/ICML2010-Yang-473.pdf
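
For intuition, the group soft-thresholding step at the heart of a group-lasso solver is the standard proximal operator (a textbook result, not taken from this diff):
```
\operatorname{prox}_{\lambda\,\lVert\cdot\rVert_2}(v)
  \;=\; \max\!\left(0,\; 1 - \frac{\lambda}{\lVert v \rVert_2}\right) v
```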

Differential Revision: D8623146

fbshipit-source-id: 40e08aa6319d1ad7aa95e8716e3de83b9cfb8452
2018-07-06 13:41:00 -07:00
bddppq
bc4feab3e3
Fix flaky atomic iter test (#7649) 2018-05-17 21:17:29 -07:00
Paul Jesse Hellemn
b875fb281c
Update from facebook (#7451)
* [bootcamp] Improve "Shape" operator to support axes specification

Improve the "Shape" operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimensions for axes 1 and 0, in that order. In the current version, the "axes" input allows duplicates and can have arbitrary length (see the sketch below).
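
A hedged numpy illustration of the semantics (not the Caffe2 op itself):
```
import numpy as np

x = np.zeros((4, 5, 6))
full_shape = list(x.shape)               # Shape(x)              -> [4, 5, 6]
partial = [x.shape[a] for a in (1, 0)]   # Shape(x, axes=[1, 0]) -> [5, 4]
assert partial == [5, 4]
```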

* Back out "Add barrier net that runs before training nets"

Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures.

* Change warning to verbose log to reduce log spam

The `LOG(WARNING)` was a bit spammy for regular use, so let's just make it a `VLOG`.

* Extract the shared code from different caffe2_benchmark binaries

The OSS benchmark and Internal benchmark will share most functions in the benchmark.

* Support MFR in sequence training

As titled.

* Make knowledge distillation work using a logged prediction feature as the teacher label.

1) Add loading a raw dense feature as the teacher label.
2) Optional calibration function for the teacher label.
3) Add the teacher label into the generic unit test.
4) Deprecated the TTSN workflow version that used feature_options to configure the teacher label.

* [C2/CUDA]: unjoined cross entropy sigmoid

as desc

* Add async_scheduling executor into deferrable_net_exec_test

Add async_scheduling into tests and fix some exception cases

* Fix Event disabled error

When disabling event in RNN ops make sure we don't call Finish on disabled
event from op's RunAsync

* cuda ensure cpu output op can handle both TensorCPU and TensorCUDA

as desc.

* [C2 Core] Infer input device option in C2 hypothesis_test checkers

Improve how we default input blob device options.
Previously it defaulted to wherever the op lives, but that is not necessarily the case.

For example:
CopyCPUToGPU

* [C2 Op]SplitByLengthsOp CPU/GPU implementation

[C2 Op]SplitByLengthsOp CPU/GPU implementation

* fix undefined symbol error

not sure why we're getting undefined symbol even with link_whole = True
Need to figure out why but need this workaround for now

* Add tools in DAIPlayground platform to help debugging models

Add additional tools to allow Playground to override individual methods defined in AnyExp. This will allow users to create modules that specifically change certain default method behavior. An example included in this diff is deactivating the test model and checkpointing. When debugging model problems, switching off components helps quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory).

* add shape and type inference for int8 conversion operator

* Fix flaky test for group_norm

Fix flaky test for group_norm

* Fix group_norm_op_test flaky

Fix group_norm_op_test flaky

* Implementation of composite learning rate policy

In many state-of-the-art deep learning works, people use a simple trick to
schedule the learning rate: use a fixed learning rate until the error plateaus,
then switch to a different fixed learning rate, and so on. In this diff,
we implemented a simple version of the composite learning rate. The user gives
a set of learning rate policies and corresponding iteration counts, and the
optimizer will change the learning rate policy based on the number of iterations so far.

For example, suppose the user gives two learning rate policies, FixedLearningRate
and PolyLearningRate, with an iteration number of 1k. For the first 1k iterations,
we use FixedLearningRate; for the following iterations, we use PolyLearningRate
(see the sketch below).
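
A minimal sketch of such a schedule (hypothetical helper, not the Caffe2 API): fixed for the first 1k iterations, then a power-1 polynomial decay.
```
def composite_lr(iter_num, base_lr=0.1, switch_at=1000, max_iter=10000):
    if iter_num < switch_at:                                  # FixedLearningRate
        return base_lr
    progress = min((iter_num - switch_at) / (max_iter - switch_at), 1.0)
    return base_lr * (1.0 - progress)                         # PolyLearningRate

assert composite_lr(500) == 0.1   # fixed phase
assert composite_lr(5500) < 0.1   # decaying phase
```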

* Split two use cases of CachedReader into two classes, DBFileReader and CachedReader

# Use Cases:

1). input: DB file -> output: DatasetReader.

Use DBFileReader.

2). input: Reader -> build cache DB file -> output: DatasetReader.

Use CachedReader.

# Changes to CachedReader:

1). Move db_path to the constructor,
because with a mock reader the cache will always be built ahead of time.

# Changes to tests:

1). Make a separate TestCase class for CachedReader and DBFileReader.

2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path.

3). Make deleting db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`.

* Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization"

Original commit changeset: 4489c6133f11

* Fix LARS bug

Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them.

* [tum] support sparse init & add uniformFill option

as title

* Propagate exception for async nets

Capture the exception when an exception is thrown in async nets and re-throw it after wait().  This allows exceptions to be propagated up to the caller.

This diff was a part of D7752068.  We split the diff so that C2 core files changes are in a separate diff.

* Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc

Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a

Included changes:
- **[69894f2](https://github.com/onnx/onnx/commit/69894f2)**: Use op schema.all tensor types in random like definitions (#865) <Scott McKay>
- **[b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90)**: Clarify random like operators (#846) <Scott McKay>
- **[fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb)**: Refactor shape inference implementation (#855) <anderspapitto>
- **[b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8)**: fix cmake warning message (#863) <Eric S. Yu>
- **[f585c5d](https://github.com/onnx/onnx/commit/f585c5d)**: add pytorch-operator test for tile (#831) <Wenhao Hu>
- **[993fe70](https://github.com/onnx/onnx/commit/993fe70)**: add install step (#832) <Eric S. Yu>
- **[68bc26c](https://github.com/onnx/onnx/commit/68bc26c)**: add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang>
- **[9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda)**: fix string representation of scalar types (#858) <G. Ramalingam>
- **[1078925](https://github.com/onnx/onnx/commit/1078925)**: fix y in pow test case to scalar (#852) <Wenhao Hu>
- **[c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f)**: Add some math function shape inference (#845) <anderspapitto>
- **[ff667d1](https://github.com/onnx/onnx/commit/ff667d1)**: Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan>
- **[11c6876](https://github.com/onnx/onnx/commit/11c6876)**: clear initializer names when clear initializer (#849) <Wenhao Hu>
- **[73c34ae](https://github.com/onnx/onnx/commit/73c34ae)**: Clarify FeatureVectorizer description. (#843) <Scott McKay>
- **[1befb9b](https://github.com/onnx/onnx/commit/1befb9b)**: Remove useless text in docs (#850) <Lu Fang>
- **[e84788f](https://github.com/onnx/onnx/commit/e84788f)**: Fix SELU attributes' default values (#839) <Lu Fang>
- **[ebac046](https://github.com/onnx/onnx/commit/ebac046)**: Add tile test case (#823) <Wenhao Hu>
- **[8b7a925](https://github.com/onnx/onnx/commit/8b7a925)**: a few more shape inference functions (#772) <anderspapitto>
- **[9718f42](https://github.com/onnx/onnx/commit/9718f42)**: Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake>
- **[ef083d0](https://github.com/onnx/onnx/commit/ef083d0)**: Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang>
- **[45ceb55](https://github.com/onnx/onnx/commit/45ceb55)**: Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko>
- **[4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0)**: [WIP] reenable shape inference tests (#834) <anderspapitto>
- **[22d17ee](https://github.com/onnx/onnx/commit/22d17ee)**: RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani>
- **[de65b95](https://github.com/onnx/onnx/commit/de65b95)**: dimension denotation (#443) <Tian Jin>
- **[eccc76e](https://github.com/onnx/onnx/commit/eccc76e)**: fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang>
- **[d582beb](https://github.com/onnx/onnx/commit/d582beb)**: disable shape inference test to unbreak ci (#830) <Lu Fang>
- **[485b787](https://github.com/onnx/onnx/commit/485b787)**: function proto for composite op. (#802) <Ke Zhang>
- **[cd58928](https://github.com/onnx/onnx/commit/cd58928)**: specify defaults for attributes of Affine op (#820) <G. Ramalingam>
- **[7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9)**: merge the dummy backend back into the main one (#743) <anderspapitto>
- **[1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a)**: [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan>
- **[3769a98](https://github.com/onnx/onnx/commit/3769a98)**: Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang>

* [C2]ReluN Op

relu n op.

tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6

* Call destructor when assigning a blob value

* Add executor overrides

Add executor overrides flag to enable migration to async_scheduling executor

* Add barrier net that runs before training nets - attempt #2

Add a synchronize barrier net that is run before training nets. With this net, shards that are faster will wait for the other shards before starting training. This reduces the chance of the faster shards timing out during GLOO AllReduce.
Removed the explicit data_parallel_model.py synchronize call in the holmes workflow.

This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case exception thrown in the barrier init net is not handled.

To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net. Since errors from the param_init_net run are handled gracefully with re-rendezvous, this should fix the problem.

* Handle empty nets in async_scheduling

Make sure we don't get stuck on empty nets

* use CUDA_ARCH for conditional compile

* [C2 fix] infer function for ensure_cpu_output_op

* Update group_norm test to reduce flaky test

* Fix lr_multiplier for GPU
2018-05-10 23:14:27 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
mlappelbaum
d11fc90317 Export atomic iter count (#2379)
* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Add axis to top_k_op. (#2416)

* Revert update on top_k_op

* Add axis to top_k_op

Add axis to top_k_op

* [auto] Update onnx to a8e4648 - Adjust link flags when built in Windows Debug mode (#647)
a8e4648a7d

* [auto] Update onnx to f4acf28 - Remove allowconsumed enforceconsumed from op schema. (#617)
f4acf281ef

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Initialize cpuinfo in the thread pool

Thread pool called cpuinfo_get_processors_count() without initializing cpuinfo. Only by luck it didn't make Caffe2 single-threaded: threadpool is initialized after NNPACK, and NNPACK initializes cpuinfo itself.

This commit also updates cpuinfo to a version that aborts with a fatal error if it's used uninitialized.

* Updated Python Op and Image Pre-Processing Pipeline tutorials && Added CIFAR-10 Part 1 tutorial (#2286)

* Updated Basics tutorial: (1) Added Python 3 support with __future__ statements; (2) Various grammatical/typo fixes and minor refactoring of Markdown

* Added Python 3 support and made minor typo fixes

* Added Python 3 support with future imports, refactored and corrected errors in Markdown, added comments

* Added Python 3 support with future imports, Added use of caffe_translator.py to translate downloaded .caffemodel file to .pb files

* Upgrades to Image Pre-Processing Pipeline tutorial

* Updated Python Op tutorial

* removed markdown with empty links

* Added Part 1 of an end-to-end CIFAR-10 tutorial

* Updated MNIST Dataset and Databases tutorial with python3 support and markdown fixes

* Tweaks to markup, less training iterations

* changed permissions of CIFAR10_Part1; typo corrections in Image_Pre-Processing_Pipeline

* Typo corrections in Multi-GPU Training tutorial

* sync Python_Op py_gen with the IPython notebook

* nit typo correction

* [auto] Update onnx to 5cb999d - Minor cleanups to shape inference (#653)
5cb999ddc1

* [auto] Update onnx to ecac1c1 - Merge Rel 1.1.0 branch into master (#657)
ecac1c1624

* Strip down onnx to only pb definitions in mobile build (#2426)

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count
2018-03-26 19:26:09 -07:00
Orion Reblitz-Richardson
0ea8964fd6 Revert "Export number of iterations of AtomicIterOp" (#2359)
* Revert "Use -DCMAKE_BUILD_TYPE=Release for local build by default"

This reverts commit 035c62081f6420405b9f1380cc5d21b4c6ae78f6.

* Revert "Export number of iterations of AtomicIterOp (#2338)"

This reverts commit 91b7a0cb48c6b079e2ca8fd5c26819a003937d76.
2018-03-21 16:11:29 -07:00
mlappelbaum
8346088094 Export number of iterations of AtomicIterOp (#2338)
* Exported AtomicIterOp count

* Exported AtomicIterOp count
2018-03-21 12:39:30 -07:00
Orion Reblitz-Richardson
6aa087d902 Revert "export num iterations of AtomicIter"
This reverts commit be9c8e5591f5d38131b9bdc2249542f27dadc221.
2018-03-20 13:34:22 -07:00
Matan Appelbaum
fac306d3c9 export num iterations of AtomicIter
as title.  Useful for tracking number of EASGD updates.
2018-03-20 13:34:22 -07:00
Orion Reblitz-Richardson
5c381bbc57 Patch cuda-convnet2 from internal Facebook changes.
* Unfortunately this needs to be manually monkey patched.
* This should get it so GitHub and fbcode versions match.
2018-02-28 14:20:48 -08:00
Pieter Noordhuis
52fa742c51 Revert D6893040: Implementing Pow operator (this merges existing pow with a scalar and new pow with a tensor exponent).
Summary:
This reverts commit 30f614beea6f859fee25ce4f85573142885dde45

bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
cause_a_sev_many_files

Differential Revision:
D6893040

Original commit changeset: 30f614beea6f

fbshipit-source-id: 5e98a24699088283f864efe31234874bdacbe3c3
2018-02-14 10:34:08 -08:00
Maxim Naumov
f7cc8e8822 Implementing Pow operator (this merges existing pow with a scalar and new pow with a tensor exponent).
Summary: The old pow operator has been deleted in math_ops.cc, math_ops.cu and math_ops.h, while the new operator, supporting scalar and tensor exponents, has been added in pow_op.cc, pow_op.h and elementwise_op.cu.

Reviewed By: houseroad

Differential Revision: D6893040

fbshipit-source-id: 30f614beea6f859fee25ce4f85573142885dde45
2018-02-13 17:46:35 -08:00
Huazhong Ning
90543ff13a weighted sampling reader dequeue outputs table index
Summary: Weighted sampling reader dequeue randomly chooses a hive reader to read a mini-batch. This diff allows dequeue to output the index of the randomly chosen table to a specific blob.

Reviewed By: kennyhorror

Differential Revision: D6621070

fbshipit-source-id: 754b981fc2bcfdb0146d2a0a5b677e7cfe74211b
2018-01-24 19:06:25 -08:00