Commit Graph

1443 Commits

Author SHA1 Message Date
Richard Barnes
c44300884e Clarify timing of GetDeviceProperty() (#46715)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46715

Test Plan: N/A

Reviewed By: ezyang

Differential Revision: D24455538

fbshipit-source-id: 1770807d178f618ef6338e28f669f09e4cbd2009
2020-10-22 11:29:31 -07:00
Tristan Rice
0c9787c758 caffe2: use at::mt19937 instead of std::mt19937 (10x speedup) (#43987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43987

This replaces the caffe2 CPU random number generator (std::mt19937) with at::mt19937, which is the one currently used in PyTorch. The ATen RNG is 10x faster than the std one and appears to be more robust, given bugs in the std implementation (https://fburl.com/diffusion/uhro7lqb)

For large embedding tables (10GB+) we see UniformFillOp taking upwards of 10 minutes because we're bottlenecked on the single-threaded RNG. Swapping to at::mt19937 cuts that time to 10% of its current value.
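A minimal sketch of the drop-in swap, assuming only that at::mt19937 mirrors std::mt19937's calling convention (the fill loop and helper name are invented for illustration, not the actual UniformFillOp code):

```cpp
#include <ATen/core/MT19937RNGEngine.h>

#include <cstdint>
#include <vector>

// at::mt19937 exposes the same operator() shape as std::mt19937,
// yielding a uint32_t per draw, so the engine can be swapped in place.
void fill_uniform(std::vector<float>& out, uint64_t seed) {
  at::mt19937 gen(seed);  // previously: std::mt19937 gen(seed);
  for (auto& v : out) {
    // map a 32-bit draw into [0, 1); the real op uses a proper distribution
    v = static_cast<float>(gen()) / 4294967296.0f;
  }
}
```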

Test Plan: Ran all relevant tests + CI. This doesn't introduce new features (+ is a core change) so existing tests+CI should be sufficient to catch regressions.

Reviewed By: dzhulgakov

Differential Revision: D23219710

fbshipit-source-id: bd16ed6415b2933e047bcb283a013d47fb395814
2020-10-16 16:08:35 -07:00
Tristan Rice
dd169ca17c caffe2/plan_executor: propagate exceptions from reporter substeps (#46424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46424

Currently, if an exception occurs in a reporter thread, the process is killed via std::terminate. This adds support for handling the reporter exception if FLAGS_caffe2_handle_executor_threads_exceptions is set to true.
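A hedged sketch of the behavior change; run_reporter_substep and the surrounding plumbing are invented for illustration:

```cpp
#include <exception>

extern bool FLAGS_caffe2_handle_executor_threads_exceptions;
void run_reporter_substep();  // hypothetical stand-in for the real substep

std::exception_ptr reporter_error;

void reporter_thread_body() {
  try {
    run_reporter_substep();
  } catch (...) {
    if (FLAGS_caffe2_handle_executor_threads_exceptions) {
      reporter_error = std::current_exception();  // rethrown by the executor
    } else {
      throw;  // old behavior: escapes the thread and hits std::terminate
    }
  }
}
```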

Test Plan: buck test mode/opt -c python.package_style=inplace //caffe2/caffe2/python:hypothesis_test //caffe2/caffe2:caffe2_test_cpu -- --stress-runs 100

Reviewed By: dahsh

Differential Revision: D24345027

fbshipit-source-id: 0659495c9e27680ebae41fe5a3cf26ce2f455cb3
2020-10-16 12:28:57 -07:00
Hao Lu
16c52d918b [caffe2] Bypass memonger for in-place ops (#46378)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46378

Reviewed By: dzhulgakov

Differential Revision: D24236604

fbshipit-source-id: 9f599687467ea969e89243482f8e2a41f7db0a23
2020-10-15 16:03:52 -07:00
Danny Huang
85c3ba5588 [caffe2] add PlanExecutorTest ErrorPlanWithCancellableStuckNet (#46110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46110

## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145).
* We need a test to cover and exhibit that we can cancel a stuck net and propagate the error with the plan executor.

## Summary
* Added PlanExecutorTest `ErrorPlanWithCancellableStuckNet` for the plan executor.
* Set cancelCount to zero at the beginning of tests to avoid global state being carried over in some test environments.

Test Plan:
## Unit Test Added

```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 1000
```

Reviewed By: d4l3k

Differential Revision: D24226577

fbshipit-source-id: c834383bfe6ab50747975c229eb42a363eed3458
2020-10-12 12:00:15 -07:00
Danny Huang
87226f72d2 [caffe2] temp remove ErrorPlanWithCancellableStuckNet (#46080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46080

Temporary removal of ErrorPlanWithCancellableStuckNet; will fill out more later.

Test Plan:
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
```
remove a test

Reviewed By: fegin

Differential Revision: D24213971

fbshipit-source-id: e6e600bad00b45c726311193b4b3238f1700526e
2020-10-08 23:35:45 -07:00
Danny Huang
487624e369 [caffe2] plan executor error propagation test with blocking cancellable op (#45319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45319

## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145).
* We need a test to cover and exhibit that we can cancel a stuck net and propagate the error with the plan executor.

## Summary
* Added `ErrorPlanWithCancellableStuckNet` for the plan executor.
* We set up a plan with two nets: one stuck net with a blocking operator that never
  returns, and one error net with an error op that throws; we tested that the plan
  throws and cancels (a sketch of such a blocking, cancellable op follows).
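The illustrative shape of a blocking, cancellable operator for such a test (hypothetical class; the real test uses caffe2's operator plumbing):

```cpp
#include <condition_variable>
#include <mutex>

class StuckBlockingOp {
 public:
  bool Run() {  // blocks forever until Cancel() is called
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return cancelled_; });
    return true;
  }
  void Cancel() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      cancelled_ = true;
    }
    cv_.notify_all();  // wake the stuck Run() so the plan can unwind
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool cancelled_ = false;
};
```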

Test Plan:
## Unit Test added
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
```
```
Summary
  Pass: 400
  ListingSuccess: 2
```

Reviewed By: d4l3k

Differential Revision: D23920548

fbshipit-source-id: feff41f73698bd6ea9b744f920e0fece4ee44438
2020-10-08 19:54:49 -07:00
Tristan Rice
59e4803b94 Recommit: caffe2/plan_executor: wait for 1 minute after exception and then abort (#45981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45981

This is a recommit of previously reverted D20850851 (3fbddb92b1).

TL;DR - combining condition_variables and atomics is a bad idea

https://stackoverflow.com/questions/49622713/c17-atomics-and-condition-variable-deadlock
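A sketch of the fix implied by the TL;DR: keep the flag a plain bool guarded by the condition_variable's mutex. With an atomic flag written outside the lock, the waiter can observe the flag as false, get preempted before wait(), miss the notify, and sleep forever.

```cpp
#include <condition_variable>
#include <mutex>

std::mutex mu;
std::condition_variable cv;
bool done = false;  // deliberately not std::atomic

void signal_done() {
  {
    std::lock_guard<std::mutex> lk(mu);
    done = true;  // write happens under the same mutex the waiter holds
  }
  cv.notify_all();  // the waiter cannot miss this notification
}

void wait_done() {
  std::unique_lock<std::mutex> lk(mu);
  cv.wait(lk, [] { return done; });  // predicate re-checked under the lock
}
```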

This also adds some ifdefs to disable the death test for mobile, xplat and tsan builds since forking doesn't play nicely with them.

Test Plan:
buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 1000 test_atomic_iter_with_concurrent_steps --timeout 120
  buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 100
  buck test mode/opt caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100

no timeouts https://www.internalfb.com/intern/testinfra/testconsole/testrun/7036874440059883/

will ensure no timeouts in OSS

Reviewed By: walterddr, dahsh

Differential Revision: D24165505

fbshipit-source-id: 17cd23bfbcd9c2826a4067a387023d5186353196
2020-10-08 14:17:30 -07:00
Rong Rong
1bb2d41b68 Revert D20850851: caffe2/plan_executor: wait for 1 minute after exception and then abort
Test Plan: revert-hammer

Differential Revision:
D20850851 (3fbddb92b1)

Original commit changeset: 330503775d80

fbshipit-source-id: 612c6c3c4d5586bc8ad00a112cd00fc74fb44243
2020-10-07 09:04:24 -07:00
Tristan Rice
3fbddb92b1 caffe2/plan_executor: wait for 1 minute after exception and then abort (#45297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45297

If we have two concurrent substeps and one of them throws an exception and the other is blocking, we'll currently hang. This waits up to 1 minute for it to complete before terminating the process.
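A minimal sketch of that policy (all names invented): after one substep throws, give the remaining blocking substep a minute to finish; if it never does, abort instead of hanging forever.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdlib>
#include <mutex>

void wait_or_abort(std::mutex& mu, std::condition_variable& cv, bool& finished) {
  std::unique_lock<std::mutex> lk(mu);
  if (!cv.wait_for(lk, std::chrono::minutes(1), [&] { return finished; })) {
    std::abort();  // substep is stuck; fail fast instead of hanging
  }
}
```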

Test Plan: buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100

Reviewed By: dahsh

Differential Revision: D20850851

fbshipit-source-id: 330503775d8062a34645ba55fe38e6770de5e3c7
2020-10-06 12:59:09 -07:00
Sebastian Messmer
2ac7de7d53 Remove hacky_wrapper from BackendSelect kernels (#44062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44062

Previously, BackendSelect kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, and pin_memory, and they used hacky_wrapper to be callable. This caused a re-wrapping step: calling into a BackendSelect kernel required taking the individual scattered arguments and packing them into a TensorOptions, and the kernel itself then gathered them again for redispatch.

Now with this PR, BackendSelect kernels are written in the new way and no hacky_wrapper or rewrapping is needed for them.
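Illustrative only, not the generated kernel code: the legacy path packed the four scattered arguments into a TensorOptions just so the kernel could scatter them again for redispatch.

```cpp
#include <ATen/ATen.h>

at::Tensor legacy_backend_select(
    c10::optional<at::ScalarType> dtype,
    c10::optional<at::Layout> layout,
    c10::optional<at::Device> device,
    c10::optional<bool> pin_memory) {
  // Gather the scattered arguments into one TensorOptions...
  at::TensorOptions options = at::TensorOptions()
                                  .dtype(dtype)
                                  .layout(layout)
                                  .device(device)
                                  .pinned_memory(pin_memory);
  // ...which the kernel then unpacked back into the four fields.
  return at::empty({0}, options);
}
```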
ghstack-source-id: 112825789

Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216117032/

vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170194/

Reviewed By: ezyang

Differential Revision: D23484192

fbshipit-source-id: e8fb49c4692404b6b775d18548b990c4cdddbada
2020-09-25 09:04:03 -07:00
Bugra Akyildiz
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
There is a tool called `2to3` whose `future` fixer specifically removes these imports; the `caffe2` directory has the most redundant imports:

```2to3 -f future -w caffe2```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
Nikita Shulga
2ae74c0632 Compile less legacy code when BUILD_CAFFE2 is set to False (take 2) (#44453)
Summary:
2nd attempt to land https://github.com/pytorch/pytorch/pull/44079

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44453

Reviewed By: walterddr, seemethere

Differential Revision: D23619528

Pulled By: malfet

fbshipit-source-id: c7c206ebd327dcf3994789bd47008b05ff862fe7
2020-09-11 16:27:47 -07:00
Danny Huang
2b8f0b2023 [caffe2] adds Cancel to OperatorBase and NetBase (#44145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44145

## Motivation

* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are blocking and thus non-cancellable. If an error
  occurs we need to be able to safely stop all net execution so we can throw
  the exception to the caller.

## Summary
* Adds `NetBase::Cancel()`, which iterates over the entire list of
  operators and calls `Cancel` on each (a sketch follows this list).
* Cancel on all ops was added to `Net` since there's nothing async-specific about it.
* `AsyncSchedulingNet` calls the parent `Cancel`.
* To preserve backwards compatibility, `AsyncSchedulingNet`'s `Cancel` still calls
  `CancelAndFinishAsyncTasks`.
* Adds `Cancel()` to `OperatorBase`.
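The rough shape of the change, sketched with hypothetical simplified classes (the real NetBase/OperatorBase have far more machinery):

```cpp
#include <vector>

struct Op {
  virtual ~Op() = default;
  virtual void Cancel() {}  // default: no-op; ops override as needed
};

struct Net {
  std::vector<Op*> ops;
  virtual ~Net() = default;
  virtual void Cancel() {
    for (auto* op : ops) {
      op->Cancel();  // cancel every operator in the net
    }
  }
};

struct AsyncNet : Net {
  void Cancel() override {
    Net::Cancel();                // new base behavior
    CancelAndFinishAsyncTasks();  // preserved for backwards compatibility
  }
  void CancelAndFinishAsyncTasks() { /* existing async cleanup */ }
};
```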

Reviewed By: dzhulgakov

Differential Revision: D23279202

fbshipit-source-id: e1bb0ff04a4e1393f935dbcac7c78c0baf728550
2020-09-11 12:50:26 -07:00
Wanchao Liang
d07a36e0c1 Revert D23490149: [pytorch][PR] Compile less legacy code when BUILD_CAFFE2 is set to False
Test Plan: revert-hammer

Differential Revision:
D23490149 (15e99b6ff6)

Original commit changeset: a76382c30d83

fbshipit-source-id: 75057fa9af2c19eb976962552118bf0a99911b38
2020-09-04 22:59:39 -07:00
Nikita Shulga
15e99b6ff6 Compile less legacy code when BUILD_CAFFE2 is set to False (#44079)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44079

Reviewed By: walterddr

Differential Revision: D23490149

Pulled By: malfet

fbshipit-source-id: a76382c30d83127d180ec63ac15093a7297aae53
2020-09-04 20:04:21 -07:00
Jiakai Liu
3a0e35c9f2 [pytorch] deprecate static dispatch (#43564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564

Static dispatch was originally introduced for mobile selective build.

Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23324452

Pulled By: ljk53

fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
2020-08-27 14:52:48 -07:00
Natalia Gimelshein
d1d32003bb force pytorch tensors to contiguous before calling c2 ops
Summary: Per title; this makes the c2 wrappers safer, as the contiguity of torch inputs is not guaranteed.
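A minimal sketch of the guard (helper name invented): normalize a possibly non-contiguous torch input before handing it to a c2 op.

```cpp
#include <ATen/ATen.h>

// contiguous() returns the tensor itself if already contiguous,
// otherwise materializes a contiguous copy.
at::Tensor ensure_contiguous(const at::Tensor& t) {
  return t.is_contiguous() ? t : t.contiguous();
}
```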

Test Plan: covered by existing tests

Reviewed By: dzhulgakov

Differential Revision: D23310137

fbshipit-source-id: 3fe12abc7e394b8762098d032200778018e5b591
2020-08-24 23:04:13 -07:00
Sean Lynch
f80b695a75 Properly format db.h and db.cc (#43027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43027

Format db.h and db.cc using the default formatter.

This change was split off of D22705434.

Test Plan: Wait for sandcastle.

Reviewed By: rohithmenon, marksantaniello

Differential Revision: D23113765

fbshipit-source-id: 3f02d55bfb055bda0fcba5122336fa001562d42e
2020-08-24 18:29:45 -07:00
Tristan Rice
5e04bb2c1c caffe2: expose CPUContext RandSeed for backwards compatibility with external RNG (#43239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43239

This is an incremental step in the process of migrating the caffe2 random number generator off of std::mt19937 and onto at::mt19937+at::CPUGeneratorImpl. The ATen variants are much more performant (10x faster).

This adds a way to get the CPUContext RandSeed for tail use cases that require a std::mt19937 and borrow the CPUContext one.

Test Plan: This isn't used anywhere within the caffe2 codebase. Compile should be sufficient.

Reviewed By: dzhulgakov

Differential Revision: D23203280

fbshipit-source-id: 595c1cb447290604ee3ef61d5b5fc079b61a4e14
2020-08-21 19:36:38 -07:00
Ehsan K. Ardestani
ecb9e790ed Remove excessive logging in plan_executor (#42888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42888

as title

Test Plan: flow-cli test-locally dper.workflows.evaluation.eval_workflow --parameters-file /mnt/public/ehsanardestani/temp/quant_eval_inputs_all.json

Reviewed By: amylittleyang

Differential Revision: D23066529

fbshipit-source-id: f925afd1734e617e412b0f171e16c781d13272d9
2020-08-11 23:57:17 -07:00
Ehsan K. Ardestani
a5af2434fe NVMified NE Eval
Summary:
This diff NVMifies the NE Eval Flow. It:
- defines a `LoadNVM` operator which either
  - receives a list of NVM blobs, or
  - extracts the blobs that could be NVMified from the model;
- dumps NVMified blobs into NVM;
- deallocates them from DRAM;
- NVMifies the Eval net on the dper and C2 backends.

The specific NVMOp for SLS is pushed through different diffs.

Test Plan: flow-cli test-locally dper.workflows.evaluation.eval_workflow --parameters-file=/mnt/public/ehsaardestani/temp/small_model.json 2>&1 | tee log

Reviewed By: yinghai, amylittleyang

Differential Revision: D22469973

fbshipit-source-id: ed8379ad404e96d04ac05e580176d3aca984575b
2020-08-06 10:25:31 -07:00
Dmytro Dzhulgakov
06d978a9ad [c10/cuda] Reorganize device_count() and robustly surface ASAN warnings (#42249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42249

Main change is to bring Caffe2's superior error messages for cuda initialization into c10 and use them in all code paths.

Basic logic:

| Case | Call to device_count() | init_cuda, e.g. allocating tensor |
| -- | -- | -- |
| all good | non-zero | just works |
| no gpus | 0, no warning | throw exception with good message |
| driver issues | 0, produce warning | throw exception with good message |
| out of memory with ASAN | 0, produce warning| throw exception with ASAN message |

Previously, the error thrown from init_cuda was very generic and the ASAN warning (if any) was buried in the logs.

Other clean-up changes:
* cache device_count() always in a static variable (see the sketch below)
* move all ASAN macros into c10
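A sketch of the "cache in a static" point above; the real c10 code adds the warning/exception behavior from the table.

```cpp
#include <cuda_runtime.h>

int cached_device_count() {
  static int count = [] {
    int c = 0;
    if (cudaGetDeviceCount(&c) != cudaSuccess) {
      return 0;  // driver problems / no GPUs: report zero, warn elsewhere
    }
    return c;
  }();
  return count;  // the driver is queried exactly once per process
}
```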

Test Plan:
Hard to unittest because of build modes. Verified manually that the behavior from the table above holds by running the following script in different modes (ASAN/no-ASAN, CUDA_VISIBLE_DEVICES=):

```
print('before import')
import torch
print('after import')
print('devices: ', torch.cuda.device_count())
x = torch.tensor([1,2,3])
print('tensor creation')
x = x.cuda()
print('moved to cuda')
```

Reviewed By: ngimel

Differential Revision: D22824329

fbshipit-source-id: 5314007313a3897fc955b02f8b21b661ae35fdf5
2020-08-05 11:39:31 -07:00
Rohith Menon
4e16be9073 [MemLeak] Fix memory leak from releasing unique ptr (#41883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41883

Fix memory leak from releasing unique ptr

Test Plan:
Tested serialization with and without the change.

Heap profile without change:
```
Welcome to jeprof!  For help, type 'help'.
(jeprof) top
Total: 7298.4 MB
  4025.2  55.2%  55.2%   4025.2  55.2% c10::alloc_cpu (inline)
  3195.3  43.8%  98.9%   3195.3  43.8% caffe2::SerializeUsingBytesOrInt32
    63.6   0.9%  99.8%     63.6   0.9% __gnu_cxx::new_allocator::allocate (inline)
     5.0   0.1%  99.9%      5.0   0.1% google::protobuf::RepeatedField::Reserve
     2.5   0.0%  99.9%      2.5   0.0% folly::aligned_malloc (inline)
     1.2   0.0%  99.9%      1.2   0.0% caffe2::detail::CopyFromProtoWithCast (inline)
     1.0   0.0%  99.9%      1.0   0.0% __new_exitfn
     1.0   0.0% 100.0%      1.0   0.0% std::_Function_base::_Base_manager::_M_init_functor (inline)
     0.5   0.0% 100.0%      0.5   0.0% folly::HHWheelTimerBase::newTimer (inline)
     0.5   0.0% 100.0%      0.5   0.0% std::__detail::_Hashtable_alloc::_M_allocate_node
```

Heap profile with change:
```
Welcome to jeprof!  For help, type 'help'.
(jeprof) top
Total: 6689.2 MB
  4025.2  60.2%  60.2%   4025.2  60.2% c10::alloc_cpu (inline)
  2560.0  38.3%  98.4%   2560.0  38.3% caffe2::::HugePagesArena::alloc_huge (inline)
    90.9   1.4%  99.8%     90.9   1.4% __gnu_cxx::new_allocator::allocate (inline)
     5.0   0.1%  99.9%      5.0   0.1% google::protobuf::RepeatedField::Reserve
     2.0   0.0%  99.9%      2.0   0.0% prof_backtrace_impl (inline)
     1.0   0.0%  99.9%     20.3   0.3% std::__cxx11::basic_string::_M_construct (inline)
     1.0   0.0%  99.9%      1.0   0.0% std::_Function_base::_Base_manager::_M_init_functor (inline)
     0.5   0.0%  99.9%      0.5   0.0% folly::UnboundedQueue::allocNextSegment (inline)
     0.5   0.0% 100.0%      0.5   0.0% folly::aligned_malloc (inline)
     0.5   0.0% 100.0%      0.5   0.0% __new_exitfn
```

Reviewed By: yinghai

Differential Revision: D22662093

fbshipit-source-id: d0b8ff1ed26c72b14bb02fb1146c51ef11a7e519
2020-07-22 16:54:19 -07:00
Stanislau Hlebik
b774ce54f8 remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
2020-07-17 17:19:47 -07:00
Stanislau Hlebik
8fdea489af remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
2020-07-17 17:17:03 -07:00
Lu Fang
b2e52186b9 Rename capacity to nbytes in ShareExternalPointer to avoid confusion in future (#41461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41461

capacity is misleading, and we have many wrong uses internally. Let's rename it to nbytes to avoid confusion in the future. Ultimately, we could remove this parameter if possible;
so far I haven't seen any case where this capacity is necessary.

Test Plan: oss ci

Differential Revision: D22544189

fbshipit-source-id: f310627f2ab8f4ebb294e0dd5eabc380926991eb
2020-07-15 22:04:18 -07:00
Linbin Yu
df1f8a48d8 add null check for c2 tensor conversion (#41096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41096

The spark spot model had some issues in tensor conversion (see P134598596). They happen when we convert an undefined c10 tensor to a caffe2 tensor.
This diff adds a null check.

Test Plan: spark spot model runs without problem

Reviewed By: smessmer

Differential Revision: D22330705

fbshipit-source-id: dfe0f29a48019b6611cad3fd8f2ae49e8db5427e
2020-07-09 11:44:23 -07:00
Nikita Shulga
d1352192e2 Move OperatorBase::AddRelatedBlobInfo implementation to .cc file (#40844)
Summary:
If a virtual function is implemented in a header file, its implementation will be included as a weak symbol in every shared library that includes this header, along with all of its dependencies.

This was one of the reasons why the size of libcaffe2_module_test_dynamic.so was 500Kb (the AddRelatedBlobInfo implementation pulled a quarter of libprotobuf.a with it).

The combination of this and https://github.com/pytorch/pytorch/issues/40845 reduces the size of `libcaffe2_module_test_dynamic.so` from 500Kb to 50Kb.
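A sketch of the fix pattern with a hypothetical class:

```cpp
// widget.h -- declaration only; no virtual bodies in the header.
struct Widget {
  virtual ~Widget();
  virtual void AddRelatedInfo();
};

// widget.cc -- defining the virtuals out of line anchors the vtable (and
// whatever heavy dependencies the bodies pull in, e.g. protobuf) to one
// translation unit instead of emitting weak copies everywhere.
Widget::~Widget() = default;
void Widget::AddRelatedInfo() { /* implementation lives here */ }
```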
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40844

Differential Revision: D22334725

Pulled By: malfet

fbshipit-source-id: 836a4cbb9f344355ddd2512667e77472546616c0
2020-07-01 11:48:15 -07:00
Nikita Shulga
cbdf399fc6 Move OperatorSchema default inference function implementations to .cc… (#40845)
Summary:
Moves OperatorSchema default inference function implementations to a .cc file.

This prevents the implementations of those functions (as lambdas) from being embedded as weak symbols into every shared library that includes this header.

The combination of this and https://github.com/pytorch/pytorch/pull/40844 reduces the size of `libcaffe2_module_test_dynamic.so` from 500Kb to 50Kb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40845

Differential Revision: D22334779

Pulled By: malfet

fbshipit-source-id: 64706918fc2947350a58c0877f294b1b8b085455
2020-07-01 11:42:52 -07:00
Sean Lynch
64689c2474 Remove unecessary copy within blob serialization (#40096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40096

Declaring `tensor_proto` to be of type `auto` means that it will copy the entire `TensorProto` instead of just keeping a reference. This changes it to use a const reference instead.
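The deduction difference, sketched with a stand-in for TensorProto (the type and loop are invented for illustration):

```cpp
#include <string>
#include <vector>

struct FakeTensorProto {
  std::string big_payload;  // copying this is what the fix avoids
};

void deserialize(const std::vector<FakeTensorProto>& tensors) {
  for (const auto& tensor_proto : tensors) {  // reference: no copy
    (void)tensor_proto;
  }
  // The old code was equivalent to `for (auto tensor_proto : tensors)`,
  // which deep-copies every element's payload on each iteration.
}
```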

Test Plan:
Using the model loader benchmark to measure model loading performance:

### `tensor_proto` is of type `const auto&`
```
============================================================================
caffe2/caffe2/fb/predictor/ModelLoaderBenchmark.cpprelative  time/iter  iters/s
============================================================================
BlobProtoInt32DeserializationFloat16                        11.08ms    90.27
BlobProtoByteDeserializationFloat16             1509.73%   733.73us    1.36K
----------------------------------------------------------------------------
BlobProtoInt32DeserializationUInt8                          10.48ms    95.45
BlobProtoByteDeserializationUInt8               2974.57%   352.22us    2.84K
============================================================================
```

### `tensor_proto` is of type `auto`
```
============================================================================
caffe2/caffe2/fb/predictor/ModelLoaderBenchmark.cpprelative  time/iter  iters/s
============================================================================
BlobProtoInt32DeserializationFloat16                        13.84ms    72.26
BlobProtoByteDeserializationFloat16              658.85%     2.10ms   476.08
----------------------------------------------------------------------------
BlobProtoInt32DeserializationUInt8                          17.09ms    58.51
BlobProtoByteDeserializationUInt8               3365.98%   507.80us    1.97K
============================================================================
```

Reviewed By: marksantaniello

Differential Revision: D21959644

fbshipit-source-id: 6bc2dfbde306f88bf7cd4f9b14b95ac69c2e1b4d
2020-06-16 14:45:59 -07:00
Ilia Cherniavskii
01986e9890 Wait for all op types in SimpleNet (#39493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39493

Make sure we wait for all types, incl. async cpu ops

Test Plan: CI

Reviewed By: kennyhorror

Differential Revision: D21873540

fbshipit-source-id: 37875cade68e1b3323086833f8d4db79362a68e8
2020-06-11 13:00:34 -07:00
Dmytro Dzhulgakov
e46060701d [caffe2] Fix of initializing ATen's CUDA before using caching allocator (#39759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39759

Caffe2 has a mode where it uses PT's caching allocator, but somehow we were not calling the initialization explicitly.

Now, I have no idea why it worked before. Probably worth running a bisect separately.

Reviewed By: houseroad

Differential Revision: D21962331

fbshipit-source-id: f16ad6b27a67dbe0bda93939cca8c94620d22a09
2020-06-09 17:25:42 -07:00
Natalia Gimelshein
9c19a12965 fix asserts in cuda code (#39047)
Summary:
- Gets rid of some in-kernel asserts where they can be replaced with static_asserts.
- Replaces a bare in-kernel `assert` in one case with `CUDA_KERNEL_ASSERT` where necessary.
- Replaces host-code `assert`s with `TORCH_INTERNAL_ASSERT`.

Another group of asserts is in the fractional max pooling kernels, which should be fixed regardless (https://github.com/pytorch/pytorch/issues/39044); the problems there are not just asserts.
I've audited the remaining cases of in-kernel asserts; they are more like `TORCH_INTERNAL_ASSERT`, so they should not fire with invalid user data. I think it's ok to leave them as is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39047

Differential Revision: D21750392

Pulled By: ngimel

fbshipit-source-id: e9417523a2c672284de3515933cb7ed166e56719
2020-05-28 15:51:38 -07:00
Natalia Gimelshein
ba14a701dc restore proper cuda assert behavior with DNDEBUG (#38943)
Summary:
Per title. https://github.com/pytorch/pytorch/issues/32719 essentially disabled asserts in cuda kernels in release builds. Asserts in cuda kernels are typically used to prevent invalid reads/writes, so without asserts invalid reads/writes become silent errors in most cases (sometimes they would still cause "illegal memory access" errors, but because of the caching allocator this usually won't happen).
We don't need two macros, CUDA_ALWAYS_ASSERT and CUDA_KERNEL_ASSERT, because all current asserts in cuda kernels are important to prevent illegal memory accesses, and they should never be disabled.
This PR removes the macro CUDA_ALWAYS_ASSERT and instead makes CUDA_KERNEL_ASSERT (the one commonly used in kernels) an assertion in both release and debug builds.
Fixes https://github.com/pytorch/pytorch/issues/38771
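A conceptual model of the macro's behavior, not the real definition: plain assert() compiles away under NDEBUG, whereas a kernel assert must keep firing in release builds to stop invalid reads/writes.

```cpp
#include <cstdio>
#include <cstdlib>

// Unlike assert(), this stays active even when NDEBUG is defined.
#define KERNEL_ASSERT_SKETCH(cond)                              \
  do {                                                          \
    if (!(cond)) {                                              \
      std::fprintf(stderr, "assertion failed: %s\n", #cond);    \
      std::abort(); /* device code would trap the thread */     \
    }                                                           \
  } while (0)

float checked_read(const float* data, int i, int n) {
  KERNEL_ASSERT_SKETCH(i >= 0 && i < n);  // active even with -DNDEBUG
  return data[i];
}
```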
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38943

Differential Revision: D21723767

Pulled By: ngimel

fbshipit-source-id: d88d8aa1b047b476d5340e69311e65aff4da5074
2020-05-26 18:11:00 -07:00
Kurt Mohler
f9eb8824f1 Remove datatype from Storage and StorageImpl (#38870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38870

* Removed dtype data member from StorageImpl
* Removed any methods or method arguments in Storage/StorageImpl that deal with dtypes
* Update all callers of the changed API

Part of issue https://github.com/pytorch/pytorch/issues/33950
Original PR: https://github.com/pytorch/pytorch/pull/38038

Reviewed By: albanD

Differential Revision: D21549645

Pulled By: ezyang

fbshipit-source-id: 4289b356c55ff6b9530376a79343b99b540ee3de
2020-05-21 15:26:08 -07:00
Ilia Cherniavskii
a94fb71b12 Memory profiling (#37775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37775

Adding memory usage into profiler table output

Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install --cmake

```
import torch
import torchvision.models as models
model = models.resnet18()
inp = torch.randn(5, 3, 224, 224)

with torch.autograd.profiler.profile(profile_memory=True, record_shapes=True) as prof:
    model(inp)

print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_memory_usage", row_limit=15))
```

```
---------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Name                         Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CPU Mem Total    Number of Calls  Input Shapes
---------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
resize_                      0.37%            577.936us        0.37%            577.936us        9.796us          339.03 Mb        59               [[0]]
empty                        0.69%            1.061ms          0.74%            1.139ms          5.556us          47.42 Mb         205              []
stride                       0.00%            0.853us          0.00%            0.853us          0.853us          19.53 Kb         1                [[5, 1000]]
empty_strided                0.01%            21.393us         0.02%            26.033us         5.207us          252 b            5                []
is_complex                   0.02%            37.425us         0.02%            37.425us         1.291us          208 b            29               [[]]
masked_select                0.04%            55.333us         0.06%            93.616us         46.808us         120 b            2                [[30], [30]]
conv2d                       0.01%            18.009us         9.62%            14.902ms         14.902ms         0 b              1                [[5, 3, 224, 224], [64, 3, 7, 7], [
convolution                  0.01%            12.436us         9.61%            14.884ms         14.884ms         0 b              1                [[5, 3, 224, 224], [64, 3, 7, 7], [
_convolution                 0.03%            52.381us         9.60%            14.871ms         14.871ms         0 b              1                [[5, 3, 224, 224], [64, 3, 7, 7], [
size                         0.00%            5.429us          0.00%            5.429us          0.339us          0 b              16               [[5, 3, 224, 224]]
contiguous                   0.00%            1.934us          0.00%            1.934us          0.967us          0 b              2                [[5, 3, 224, 224]]
_convolution_nogroup         0.02%            27.505us         9.57%            14.814ms         14.814ms         0 b              1                [[5, 3, 224, 224], [64, 3, 7, 7], [
_nnpack_available            0.02%            34.267us         0.02%            34.267us         1.713us          0 b              20               []
thnn_conv2d                  0.01%            13.274us         9.54%            14.771ms         14.771ms         0 b              1                [[5, 3, 224, 224], [64, 3, 7, 7], [
thnn_conv2d_forward          5.98%            9.264ms          19.02%           29.446ms         14.723ms         0 b              2                [[5, 3, 224, 224], [64, 3, 7, 7], [
---------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  -----------------------------------
Self CPU time total: 154.855ms
```

Reviewed By: ngimel

Differential Revision: D21384248

Pulled By: ilia-cher

fbshipit-source-id: 31359cce2aa06f6255ed1ad8c60d03cb640bfec3
2020-05-19 15:48:48 -07:00
Xiang Gao
5e2d8745c8 RIP CUDA <9.2: circleci, aten, and caffe2 (#36846)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36846

Test Plan: Imported from OSS

Differential Revision: D21620850

Pulled By: ngimel

fbshipit-source-id: 7ad1676a12f86250f301095ffc6f365a3b370f34
2020-05-18 13:41:05 -07:00
Allan Di Wu
d35ab0b7ae Fix CUDA memory management issues caused by not using PinnedCPUAllocator (#38066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38066

Increasing priority for PinnedCPUAllocator to make sure it is set when CUDA is enabled.

Test Plan: buck test mode/dev-nosan //vision/fair/detectron2/tests:test_export_caffe2 -- 'testMaskRCNNGPU \(test_export_caffe2\.TestCaffe2Export\)'

Reviewed By: ppwwyyxx

Differential Revision: D21465835

fbshipit-source-id: 643cff30d35c174085e5fde5197ddb05885b2e99
2020-05-07 21:52:00 -07:00
Ansha Yu
32329c3338 [nomni] fix outputs check to replaceSubgraph (#38005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38005

D21445887 runs into a dbgo build crash on this stack P130135519

It is because the assertion sg_inputs_copy.size() == 0 is too restrictive.
nn::getOutputs(sg) returns "output" nodes which can include any inputs
that have additional consumers that are not in the subgraph itself.
To fix, proposing to remove inputs from the output check.

Test Plan:
Run tests

Sanity canaries:
https://our.intern.facebook.com/intern/ads/canary/426498931666198610/
https://our.intern.facebook.com/intern/ads/canary/426498935267166205/

Reviewed By: bwasti

Differential Revision: D21445881

fbshipit-source-id: 419a4b1a230f0370619cea574403bfa114e56a7c
2020-05-07 19:58:15 -07:00
Edward Yang
fe88806784 Back out "Revert D21171334: [pytorch][PR] Change StorageImpl to track byte count rather than element count" (#37893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37893

Original commit changeset: 50746043acf3

Test Plan: sandcastle and ossci

Reviewed By: malfet, seemethere, ngimel

Differential Revision: D21416509

fbshipit-source-id: 735ec4e61f9d36d4537f52dd2dc6267751aeb94b
2020-05-05 22:43:15 -07:00
Nikita Shulga
9f060d3873 [Caffe2] Increase timing threshold to 50 ms on Windows (#37892)
Summary:
Helps prevent the following accidental failures:
```
..\caffe2\core\parallel_net_test.cc:303
The difference between ms and 350 is 41, which exceeds kTimeThreshold, where
ms evaluates to 391,
350 evaluates to 350, and
kTimeThreshold evaluates to 40.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37892

Differential Revision: D21417251

Pulled By: malfet

fbshipit-source-id: 300cff7042e466f014850cc7cc406c725d5d0c04
2020-05-05 19:45:36 -07:00
Edward Yang
a2fc7f787a Revert D21171334: [pytorch][PR] Change StorageImpl to track byte count rather than element count
Test Plan: revert-hammer

Differential Revision:
D21171334

Original commit changeset: 37329a379de9

fbshipit-source-id: 50746043acf3c76754688de0fe6f1cc12437ea2f
2020-05-05 16:36:15 -07:00
Kurt Mohler
3706803b60 Change StorageImpl to track byte count rather than element count (#37776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37776

* Remove type-specific size tracking in favor of byte size tracking in Storage and StorageImpl
* Changed numel() and set_numel() to nbytes() and set_nbytes()
* Added enum argument to Storage/StorageImpl constructor to indicate new meaning of the size parameter
* Update all callers of the changed API

Part of issue https://github.com/pytorch/pytorch/issues/33950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37028

Differential Revision: D21171334

Pulled By: ezyang

fbshipit-source-id: 37329a379de9a3a83cc5e9007e455a3e1c2d10b8
2020-05-05 14:20:51 -07:00
Edward Yang
a058e938f9 Refactor error msg stack handling, add TORCH_RETHROW (#37101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37101

Fixes #36954.

The basic concept is to streamline the process of rethrowing
c10::Error with extra error information.  This is in a few
steps:

- I completely remodeled the Error data type and the internal
  invariants.  Instead of manually adding in newlines, the
  message stack formatting process is responsible for inserting
  newlines and spacing as necessary.  Call sites are then
  modified to respect the new API model.
- TORCH_RETHROW macro is added, which adds context to an error
  message and then rethrows it.

New internal assert failure looks like:

```
0 INTERNAL ASSERT FAILED at ../c10/test/util/exception_test.cpp:64, please report a bug to PyTorch.
Exception raised from TestBody at ../c10/test/util/exception_test.cpp:64 (most recent call first):
frame #0: <unknown function> + 0x6aab9 (0x7ff611d3aab9 in /data/users/ezyang/pytorch-tmp/build/lib/libc10.so)
frame #1: ...
```

Error message with context looks like:

```
This is an error
  This is context 1
  This is context 2
```
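A minimal usage sketch of the macro, assuming it takes the caught error plus message parts (do_work and batch_idx are invented for illustration):

```cpp
#include <c10/util/Exception.h>

void do_work(int batch_idx);  // hypothetical worker that may throw

void process(int batch_idx) {
  try {
    do_work(batch_idx);
  } catch (c10::Error& e) {
    // Appends context to the message stack, then rethrows the same error.
    TORCH_RETHROW(e, "while processing batch ", batch_idx);
  }
}
```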

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D21202891

Pulled By: ezyang

fbshipit-source-id: 361cadd16bc52e5886dba08e79277771ada76169
2020-05-04 11:56:45 -07:00
Edward Yang
efd8f70cac Make msg() and msg_with_backtrace() private (#37094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37094

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D21202892

Pulled By: ezyang

fbshipit-source-id: d59e6bffabd90cc734056bdce2cd1fe63262fab8
2020-05-04 11:54:34 -07:00
cyy
2658bae570 use std::move (#34365)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34365

Differential Revision: D21349942

Pulled By: mrshenli

fbshipit-source-id: 4deb51cbb557501b43990ec7080c71a839cb5db9
2020-05-01 13:42:23 -07:00
Sebastian Messmer
4e976b9334 Remove callBoxedWorkaround (#36850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36850

Since all unboxing now happens after dispatch, all c10 ops support unboxing; we can now use op.callBoxed() for all ops and no longer need callBoxedWorkaround (which went through the JIT registry).
ghstack-source-id: 102879558

Test Plan: waitforsandcastle

Differential Revision: D21102375

fbshipit-source-id: d1e041116563a9650d5a86b07eb96d217d8756f3
2020-04-24 23:13:31 -07:00
Nikita Shulga
e7a72bb0c6 Add nomnigraph include folder to Caffe2_GPU_INCLUDE (#37056)
Summary:
Because `caffe2/contrib/tensorrt` includes nomnigraph headers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37056

Test Plan: `cmake ../pytorch -DPYTHON_EXECUTABLE=/usr/bin/python3.7 -DCMAKE_BUILD_TYPE=RELWITHDEBINFO -DUSE_CUDA=YES -DBUILD_TEST=YES -DUSE_TENSORRT=YES -DTENSORRT_ROOT=$HOME/Downloads/TensorRT-7.0.0.11 -DCMAKE_CXX_COMPILER=/usr/bin/cuda-g++ -DCMAKE_C_COMPILER=/usr/bin/cuda-gcc -DUSE_MKLDNN=ON -G Ninja; ninja torch_cuda`

Differential Revision: D21178927

Pulled By: malfet

fbshipit-source-id: e1bed94fdb395ebfd6eb5d950ca378da77592531
2020-04-22 09:44:13 -07:00
Lu Fang
d933ec14ce [c10] Fix the handling for Caffe2 ops which return tensor lists (#36841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36841

Right now, every c2 op's output is unwrapped blindly. This is not correct if a single tensor list is returned.

Test Plan: buck test mode/dev-nosan mode/no-gpu //caffe2/caffe2/fb/python/operator_test:torch_integration_test

Reviewed By: alyssawangqq

Differential Revision: D21100463

fbshipit-source-id: 9f22f3ddf029e7da9d98008d68820bf7f8239d4f
2020-04-18 13:30:43 -07:00
Lin Yang
cc5befc461 [Format] format a few files (#35187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35187

When I touch these files, lint always introduces some unintended changes; to prevent that from happening, we need to format the code first.
The change was generated by:
  arc f

Test Plan: integration test.

Differential Revision: D20587596

fbshipit-source-id: 512cf6b86bd6632a61c80ed53e3a9e229feecc2a
2020-04-17 14:30:01 -07:00
Edward Yang
dd64e738c5 Expunge TensorId from all DispatchKey names. (#36240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36240

It's annoying, historical, and unnecessary (enum class is already
namespaced).  I did this codemod with:

```
git grep -l 'CPUTensorId' | xargs sed -i 's/CPUTensorId/CPU/g'
git grep -l 'CUDATensorId' | xargs sed -i 's/CUDATensorId/CUDA/g'
git grep -l 'VariableTensorId' | xargs sed -i 's/VariableTensorId/Autograd/g'
git grep -l 'HIPTensorId' | xargs sed -i 's/HIPTensorId/HIP/g'
git grep -l 'MSNPUTensorId' | xargs sed -i 's/MSNPUTensorId/MSNPU/g'
git grep -l 'XLATensorId' | xargs sed -i 's/XLATensorId/XLA/g'
git grep -l 'PrivateUse1_TensorId' | xargs sed -i 's/PrivateUse1_TensorId/PrivateUse1/g'
git grep -l 'PrivateUse2_TensorId' | xargs sed -i 's/PrivateUse2_TensorId/PrivateUse2/g'
git grep -l 'PrivateUse3_TensorId' | xargs sed -i 's/PrivateUse3_TensorId/PrivateUse3/g'
git grep -l 'AutocastTensorId' | xargs sed -i 's/AutocastTensorId/Autocast/g'
git grep -l '_PreAutogradTensorId' | xargs sed -i 's/_PreAutogradTensorId/_PreAutograd/g'
git grep -l 'TESTING_ONLY_GenericWrapperTensorId' | xargs sed -i 's/TESTING_ONLY_GenericWrapperTensorId/TESTING_ONLY_GenericWrapper/g'
git grep -l 'TESTING_ONLY_GenericModeTensorId' | xargs sed -i 's/TESTING_ONLY_GenericModeTensorId/TESTING_ONLY_GenericMode/g'
```

Then I did a git grep for remaining TensorId occurrences, and manually
killed those (mostly in codegen, and some docs that needed updating).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20929255

Pulled By: ezyang

fbshipit-source-id: dc371b6aa6e6ea7c0a5660137c14debde806a09d
2020-04-13 23:33:44 -07:00
Hao Lu
4d1ccafb4b [caffe2] Enable copying for caffe2::Tensor (#36468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36468

Since `caffe2::Tensor` is now refcounted, enabling the copy constructor and the copy assignment operator should be fine.
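An illustrative sketch with hypothetical types (caffe2 actually uses an intrusive refcount, but the idea is the same): once the implementation object is refcounted, the compiler-generated copy operations are cheap and safe, since copies alias the same impl rather than cloning data.

```cpp
#include <memory>

struct TensorImplStub { /* storage, sizes, strides, ... */ };

struct RefcountedTensor {
  std::shared_ptr<TensorImplStub> impl_;
  RefcountedTensor() = default;
  RefcountedTensor(const RefcountedTensor&) = default;             // bumps refcount
  RefcountedTensor& operator=(const RefcountedTensor&) = default;  // bumps refcount
};
```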

Test Plan:
```
buck test mode/dev //caffe2/caffe2:caffe2_test_cpu -- TensorTest
```

AI/AF canaries with changes up to D20959214:

https://our.intern.facebook.com/intern/experiment_store/experiment/3298538636995/#commit1-commit2
https://our.intern.facebook.com/intern/experiment_store/experiment/2199027015376/#commit1-commit2

AI/AF canaries on this diff:
https://our.intern.facebook.com/intern/ads/canary/425960191574068914/
https://our.intern.facebook.com/intern/ads/canary/425960179835413033/

Reviewed By: yinghai

Differential Revision: D20985924

fbshipit-source-id: ead5f5ceff23d0adc06d598128de16a5533d767b
2020-04-13 21:41:52 -07:00
Tristan Rice
ce54f0d411 Back out "Revert D20449887: [dt][caffe2] enable using smart exceptions in async nets" (#36172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36172

Original commit changeset: 3d7801613f86

D20449887 broke some OSS tests as the OSS export sync wasn't working correctly.

Test Plan:
Manually export latest version to OSS to trigger the tests

+ test plan in D20449887

verified onnx tests are passing in https://github.com/pytorch/pytorch/pull/36172

Reviewed By: andrewwdye

Differential Revision: D20902279

fbshipit-source-id: bc30fcc9f5cc8076f69a5d92675fd27455948372
2020-04-13 11:31:52 -07:00
Tristan Rice
90c7db8ae3 caffe2/core/plan_executor: add cancellation of async nets on error + propagate exceptions via std::exception_ptr for stack traces (#31966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31966

This has three parts:
* When `--caffe2_handle_executor_threads_exceptions` is set and a parallel execution step throws an exception, execution can currently hang waiting for async nets to finish. This adds cancellation code to cancel any async nets.
* The exceptions returned from parallel workers now pass a std::exception_ptr so the stack trace can be recorded with folly::SmartExceptionTracer (see the sketch after this list).
* Defines the Cancel method at the NetBase level to avoid pulling in the unsupported AsyncSchedulingNet for fbandroid.
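A sketch of the std::exception_ptr hand-off described above (the worker body is invented): the worker captures the live exception object, so the rethrow on the waiting thread preserves the original exception type and lets tooling such as folly::SmartExceptionTracer record the trace.

```cpp
#include <exception>
#include <stdexcept>
#include <thread>

void run_worker_and_rethrow() {
  std::exception_ptr worker_error;
  std::thread t([&] {
    try {
      throw std::runtime_error("net failed");  // stand-in for a substep error
    } catch (...) {
      worker_error = std::current_exception();  // capture, don't terminate
    }
  });
  t.join();
  if (worker_error) {
    std::rethrow_exception(worker_error);  // same type, full fidelity
  }
}
```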

Test Plan:
Added unit tests for plan_executor

  buck test //caffe2/caffe2:caffe2_test_cpu
  buck test //caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100

Reviewed By: boryiingsu

Differential Revision: D19320177

fbshipit-source-id: d9939fcea1317751fa3de4172dfae7f781b71b75
2020-04-09 14:38:18 -07:00
Nikita Shulga
0f34d648c8 Fix signed-unsigned warnings (RELAND) (#36224)
Summary:
This is a reland of https://github.com/pytorch/pytorch/pull/36196.
Before the fix, bazel spews the following multi-line warning for every single caffe2 operator:
```
In file included from ./c10/util/logging_is_google_glog.h:50,
                 from ./c10/util/Logging.h:26,
                 from ./caffe2/core/logging.h:2,
                 from ./caffe2/core/blob.h:13,
                 from ./caffe2/core/operator.h:18,
                 from ./caffe2/sgd/adadelta_op.h:1,
                 from caffe2/sgd/adadelta_op.cc:1:
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h: In instantiation of 'std::string* google::Check_LTImpl(const T1&, const T2&, const char*) [with T1 = int; T2 = long unsigned int; std::string = std::__cxx11::basic_string<char>]':
./caffe2/core/operator.h:192:5:   required from 'const T& caffe2::OperatorBase::Input(int, caffe2::DeviceType) [with T = caffe2::Tensor; caffe2::DeviceType = c10::DeviceType]'
./caffe2/core/operator.h:890:48:   required from 'const caffe2::Tensor& caffe2::Operator<Context>::Input(int, caffe2::DeviceType) [with Context = caffe2::CPUContext; caffe2::DeviceType = c10::DeviceType]'
./caffe2/sgd/adadelta_op.h:87:5:   required from 'bool caffe2::SparseAdadeltaOp<Context>::RunOnDevice() [with Context = caffe2::CPUContext]'
./caffe2/sgd/adadelta_op.h:85:8:   required from here
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:32: warning: comparison of integer expressions of different signedness: 'const int' and 'const long unsigned int' [-Wsign-compare]
  722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
      |                                ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:148:53: note: in definition of macro 'GOOGLE_PREDICT_TRUE'
  148 | #define GOOGLE_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
      |                                                     ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:1: note: in expansion of macro 'DEFINE_CHECK_OP_IMPL'
  722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
      | ^~~~~~~~~~~~~~~~~~~~
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36224

Test Plan: CI

Differential Revision: D20919506

Pulled By: malfet

fbshipit-source-id: b8b4b7c62dcbc109b30165b19635a6ef30033e73
2020-04-08 16:29:27 -07:00
Akshay Bhandary
83abd7ffbf Revert D20909696: [pytorch][PR] Fix signed-unsigned warnings
Test Plan: revert-hammer

Differential Revision:
D20909696

Original commit changeset: 16723355f473

fbshipit-source-id: e1cf6e9d42f852693549a94d7f5830196781f00e
2020-04-08 01:21:04 -07:00
Nikita Shulga
25fe27981f Fix signed-unsigned warnings (#36196)
Summary:
Otherwise, bazel spews the following multi-line warning for every single caffe2 operator:
```
In file included from ./c10/util/logging_is_google_glog.h:50,
                 from ./c10/util/Logging.h:26,
                 from ./caffe2/core/logging.h:2,
                 from ./caffe2/core/blob.h:13,
                 from ./caffe2/core/operator.h:18,
                 from ./caffe2/sgd/adadelta_op.h:1,
                 from caffe2/sgd/adadelta_op.cc:1:
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h: In instantiation of 'std::string* google::Check_LTImpl(const T1&, const T2&, const char*) [with T1 = int; T2 = long unsigned int; std::string = std::__cxx11::basic_string<char>]':
./caffe2/core/operator.h:192:5:   required from 'const T& caffe2::OperatorBase::Input(int, caffe2::DeviceType) [with T = caffe2::Tensor; caffe2::DeviceType = c10::DeviceType]'
./caffe2/core/operator.h:890:48:   required from 'const caffe2::Tensor& caffe2::Operator<Context>::Input(int, caffe2::DeviceType) [with Context = caffe2::CPUContext; caffe2::DeviceType = c10::DeviceType]'
./caffe2/sgd/adadelta_op.h:87:5:   required from 'bool caffe2::SparseAdadeltaOp<Context>::RunOnDevice() [with Context = caffe2::CPUContext]'
./caffe2/sgd/adadelta_op.h:85:8:   required from here
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:32: warning: comparison of integer expressions of different signedness: 'const int' and 'const long unsigned int' [-Wsign-compare]
  722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
      |                                ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:148:53: note: in definition of macro 'GOOGLE_PREDICT_TRUE'
  148 | #define GOOGLE_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
      |                                                     ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:1: note: in expansion of macro 'DEFINE_CHECK_OP_IMPL'
  722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
      | ^~~~~~~~~~~~~~~~~~~~
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36196

Differential Revision: D20909696

Pulled By: malfet

fbshipit-source-id: 16723355f473379ba9da6d3c33bd561b9724800a
2020-04-07 21:31:01 -07:00
Edward Yang
459163b8eb Revert D20449887: [dt][caffe2] enable using smart exceptions in async nets
Test Plan: revert-hammer

Differential Revision:
D20449887

Original commit changeset: 047fdf1bd52f

fbshipit-source-id: 3d7801613f86885c204f3946f3a52a855516faa3
2020-04-06 19:37:05 -07:00
Tristan Rice
8ef82fc2c9 [dt][caffe2] enable using smart exceptions in async nets (#34753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34753

This improves support for exceptions and capturing stack traces in caffe2 async nets. We generally want to use exceptions everywhere we can in order to preserve stack information. It also makes the exception timestamp more accurate so multiple exceptions at the same time can be correctly ordered.

Test Plan: Updated the tests to use the new error semantics + adds a test to ensure the stack is correctly propagated through deferrable async scheduling.

Reviewed By: andrewwdye

Differential Revision: D20449887

fbshipit-source-id: 047fdf1bd52fd7c7c1f3fde77df9a27ed9e288e7
2020-04-06 14:27:07 -07:00
Tristan Rice
676fc929b7 [caffe2] fix type and shape inference for common gradient ops (#35857)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35857

This fixes a lot of common ops for InferBlobShapesAndTypes and adds support for testing the inferred shapes and types of gradient ops.

Ops:
* Concat
* Split
* LeakyReLU
* Relu
* Prelu
* Gelu
* Elu
* Sinh, Tanh, Cosh
* Abs
* ... and a number of other simple element wise ops

Test Plan:
Added support to hypothesis test to check the shape and type of gradient ops.

Enabled it for all the ops I fixed the shape and type inference for.

  buck test caffe2/caffe2/python/operator_test:

Reviewed By: pradeepd24

Differential Revision: D20806284

fbshipit-source-id: 77f796d9ff208e09e871bdbadf9a0a7c196b77f2
2020-04-02 11:17:04 -07:00
Nikita Shulga
16774f7353 Increase TimerTest tolerance to 20% on Windows (#35818)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35818

Test Plan: CI

Differential Revision: D20798424

Pulled By: malfet

fbshipit-source-id: 57e8d9c6b93903a6632168a4a35bf946d8c518aa
2020-04-01 14:29:05 -07:00
Edward Yang
3f3b96b1f8 Revert D20735881: [pytorch][PR] [WIP] [reland][pytorch][PR] Fix some incorrect annotation…
Test Plan: revert-hammer

Differential Revision:
D20735881

Original commit changeset: d21e940380f0

fbshipit-source-id: fb50a099320bfac92c9b8e1ca12cdc50d302342f
2020-03-30 12:28:27 -07:00
peter
e7a37823b0 [WIP] [reland][pytorch][PR] Fix some incorrect annotation… (#35588)
Summary:
Relands "Fix some incorrect annotations found by clang-cl".

This reverts commit a9b540d109.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35588

Differential Revision: D20735881

Pulled By: ezyang

fbshipit-source-id: d21e940380f0c1b9b9b84e9cc892985fd3ad0ac3
2020-03-30 11:42:19 -07:00
Nikita Shulga
a9b540d109 Revert D20670031: [pytorch][PR] Fix some incorrect annotations found by clang-cl
Test Plan: revert-hammer

Differential Revision:
D20670031

Original commit changeset: cd8018dee703

fbshipit-source-id: 6900bf46346f0f415812607e5eff67259fc7b478
2020-03-27 18:26:01 -07:00
peter
45c9ed825a Formatting cmake (to lowercase without space for if/elseif/else/endif) (#35521)
Summary:
Running commands:
```bash
shopt -s globstar

sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i caffe2/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i torch/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i c10/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i cmake/**/*.cmake
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i cmake/**/*.cmake.in
```
We may further convert all the commands into lowercase according to the following issue: 77543bde41.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35521

Differential Revision: D20704382

Pulled By: malfet

fbshipit-source-id: 42186b9b1660c34428ab7ceb8d3f7a0ced5d2e80
2020-03-27 14:25:17 -07:00
peter
0c16cedafe Fix some incorrect annotations found by clang-cl (#35364)
Summary:
Fixes incorrect usages of symbol annotations, including:
1. Exporting or importing a function/class in an anonymous namespace.
2. Exporting or importing a function/class implementation in a header file. However, by removing the symbol annotations, they are now local symbols. If they need to remain global, I can move the implementations to the source file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35364

Differential Revision: D20670031

Pulled By: ezyang

fbshipit-source-id: cd8018dee703e2424482c27fe9608e040d8105b8
2020-03-27 10:40:04 -07:00
Linbin Yu
93065ff767 [1] add missing header for C10_EXPORT_CAFFE2_OP_TO_C10 (#35245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35245

Add the missing header file for the C10_EXPORT_CAFFE2_OP_TO_C10_CPU macro.

(Note: this ignores all push blocking failures!)

Test Plan: buck build -c caffe2.expose_op_to_c10=1 //xplat/caffe2:mask_rcnn_opsAndroid

Reviewed By: dreiss

Differential Revision: D20528761

fbshipit-source-id: 7cd186ba72964c2e193aca994f87a91a71c3c5d7
2020-03-24 22:16:03 -07:00
Nikita Shulga
6f737dd4a3 Fix signed-unsigned warnings (#34791)
Summary:
And a few typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34791

Test Plan: CI

Differential Revision: D20524879

Pulled By: malfet

fbshipit-source-id: 58fa03bd6356979e77cd1bffb6370d41a177c409
2020-03-19 00:29:56 -07:00
Nikita Shulga
a3de359464 Do not throw from CUDAContext destructor (#34756)
Summary:
Throwing from a destructor leads to undefined behaviour (most often a segfault),
so it's better to leak memory than to segfault.
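A sketch of the pattern with a hypothetical class: swallow failures in the destructor rather than letting an exception escape, since throwing during stack unwinding is undefined behaviour and usually crashes the process.

```cpp
struct ContextLike {
  ~ContextLike() {
    try {
      release_device_resources();
    } catch (...) {
      // Deliberately leak instead of crashing the whole process.
    }
  }
  void release_device_resources() { /* may throw on driver errors */ }
};
```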
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34756

Test Plan: Run `test_pytorch_onnx_caffe2`

Differential Revision: D20504228

Pulled By: malfet

fbshipit-source-id: 7a05776fea9036f602e95b8182f8493cb5886dab
2020-03-18 00:13:18 -07:00
Nikita Shulga
e70c28856f [Caffe2] Move more method implementations from tensor.h to tensor.cc (#34811)
Summary:
To speed up compilation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34811

Test Plan: CI

Differential Revision: D20476992

Pulled By: malfet

fbshipit-source-id: 922cde93783fbfc04854851d7a05a635d5239792
2020-03-16 22:15:18 -07:00
Nikita Shulga
ef78fa8668 caffe2::OperatorBase do not need to be aware of at::Tensor functions (#34810)
Summary:
Replacing <ATen/core/Tensor.h> with <ATen/core/TensorBody.h> speeds up compilation of caffe2 operators by 15%.
For example, it reduces pool_op.cu compilation from 18.8s to 16s.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34810

Test Plan: CI

Differential Revision: D20472230

Pulled By: malfet

fbshipit-source-id: e1b261cc24ff577f09e2d5f6428be2063c6d4a8b
2020-03-16 12:58:05 -07:00
Linbin Yu
2fe7fc681d [PT] add macro to expose caffe2 ops to PyTorch mobile (#34578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34578

Right now C10_EXPORT_CAFFE2_OP_TO_C10_CPU doesn't work on mobile since we disabled some code paths. This diff adds a new macro to enable these code paths so we can register caffe2 ops in PT mobile.

Test Plan:
verified caffe2 ops are registered in PT mobile
(on the whole stack)

```
_caffe2::BBoxConcatBatchSplits(Tensor[] input_list, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor output)
_caffe2::BBoxTransform(Tensor rois, Tensor deltas, Tensor im_info, float[] weights, bool apply_scale, bool rotated, bool angle_bound_on, int angle_bound_lo, int angle_bound_hi, float clip_angle_thresh, bool legacy_plus_one, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor output_0, Tensor output_1)
_caffe2::BoxWithNMSLimit(Tensor scores, Tensor boxes, Tensor batch_splits, float score_thresh, float nms, int detections_per_im, bool soft_nms_enabled, str soft_nms_method, float soft_nms_sigma, float soft_nms_min_score_thres, bool rotated, bool cls_agnostic_bbox_reg, bool input_boxes_include_bg_cls, bool output_classes_include_bg_cls, bool legacy_plus_one, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor scores, Tensor boxes, Tensor classes, Tensor batch_splits, Tensor keeps, Tensor keeps_size)
_caffe2::GenerateProposals(Tensor scores, Tensor bbox_deltas, Tensor im_info, Tensor anchors, float spatial_scale, int pre_nms_topN, int post_nms_topN, float nms_thresh, float min_size, bool angle_bound_on, int angle_bound_lo, int angle_bound_hi, float clip_angle_thresh, bool legacy_plus_one, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor output_0, Tensor output_1)
_caffe2::HeatmapMaxKeypoint(Tensor heatmaps, Tensor bboxes_in, bool should_output_softmax=True, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor keypoints)
_caffe2::ResizeNearest(Tensor X, str order, float width_scale, float height_scale, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor Y)
_caffe2::RoIAlign(Tensor features, Tensor rois, str order, float spatial_scale, int pooled_h, int pooled_w, int sampling_ratio, bool aligned, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor)
```

Reviewed By: dreiss

Differential Revision: D20128254

fbshipit-source-id: 49a837dddc431eb528b5c72ffdfe0d0131cd10b4
2020-03-11 19:15:14 -07:00
Rohith Menon
879a90b322 [ModelLoading] Use byte encoding for uint8, fp16 etc. instead of int32 (#34343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34343

Use byte encoding for uint8, fp16 etc. instead of int32 in TensorProto serialization/deserialization

tl;dr
- fp16 tensor deserialization 12x faster, serialized size 25% lower
- uint8 tensor deserialization 36x faster, serialized size 25% lower
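
A rough sketch of why the byte encoding wins (helper names are illustrative, not the actual TensorProto API): the int32 path widens every element into a repeated field entry, while the byte path is a single bulk copy.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Old path: widen every uint8 element into an int32 field entry.
std::vector<int32_t> encodeAsInt32(const std::vector<uint8_t>& data) {
  return std::vector<int32_t>(data.begin(), data.end());  // per-element work
}

// New path: one contiguous byte field, 1 byte per element. (C++17 for the
// non-const std::string::data().)
std::string encodeAsBytes(const std::vector<uint8_t>& data) {
  std::string out(data.size(), '\0');
  std::memcpy(out.data(), data.data(), data.size());
  return out;
}

std::vector<uint8_t> decodeBytes(const std::string& raw) {
  std::vector<uint8_t> out(raw.size());
  std::memcpy(out.data(), raw.data(), raw.size());
  return out;  // a bulk copy; no per-element decode loop
}

int main() {
  std::vector<uint8_t> data = {1, 2, 3, 4};
  // 4 bytes of payload as raw bytes vs. 16 bytes as int32 entries.
  return encodeAsBytes(data).size() < encodeAsInt32(data).size() * 4 ? 0 : 1;
}
```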

Test Plan:
```
============================================================================
caffe2/caffe2/fb/predictor/ModelLoaderBenchmark.cpprelative  time/iter  iters/s
============================================================================
BlobProtoInt32DeserializationFloat16                        12.37ms    80.82
BlobProtoByteDeserializationFloat16             1125.46%     1.10ms   909.64
----------------------------------------------------------------------------
BlobProtoInt32DeserializationUInt8                          17.57ms    56.92
BlobProtoByteDeserializationUInt8               3629.45%   484.02us    2.07K
============================================================================
```

Reviewed By: yinghai

Differential Revision: D20137451

fbshipit-source-id: 8ed4be2286a6d4c7e134fcb0832f22bc645039a1
2020-03-06 11:58:30 -08:00
Artem Volkhin
75d29f8d3e Allow converting IValue to vector<string> (#34269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34269

follow up for https://github.com/pytorch/pytorch/pull/16519

Test Plan: unit tests

Reviewed By: houseroad

Differential Revision: D20261495

fbshipit-source-id: 947f3cbd469d9258ec2dbb36cb68efe15a3b19eb
2020-03-05 12:31:23 -08:00
Michael Ranieri
1702152ef9 fixup unit tests (#34105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34105

make parallel_net_test.cc chronos conforming.
exclude gtest asserts that check thrown exceptions when exceptions are disabled.

Test Plan: CI green

Differential Revision: D20153525

fbshipit-source-id: 7371e559da948f46773fed09e3a23a77411d59e0
2020-03-03 10:33:21 -08:00
cyy
8a14b41617 fix warnings reported by PVS (#33868)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33868

Differential Revision: D20169059

Pulled By: ailzhang

fbshipit-source-id: ec12226ae27ddd89fa5bacdd35151981ebfedcfd
2020-03-02 18:51:38 -08:00
Michael Ranieri
b874c039f6 Allow checking for cached module before asserting (#33954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33954

fixes caffe2/core/module_test.cc on windows
misc lint fixes.

Test Plan: CI green

Reviewed By: malfet

Differential Revision: D20153512

fbshipit-source-id: aeae84a028e26edd65c7218611e3c49a8d9bb8c0
2020-03-02 15:43:50 -08:00
Michael Ranieri
9239608037 fix windows clang attributes (#33959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33959

make sure clang on windows uses correct attributes.
add support for cl.exe style pragma attributes

Test Plan: CI green

Differential Revision: D20153548

fbshipit-source-id: bfbfd374e8f5e7d7b8598453c3ca2b6693a425f1
2020-03-02 13:20:51 -08:00
Igor Sugak
5dde8cd483 [caffe2] fix no matching function min/max Clang errors (#33563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33563

When NVCC or Clang are driving CUDA compilation many math functions are declared by default, with a small difference: Clang marks them as `__device__` only, while NVCC uses both `__host__` and `__device__`. This makes every un-elaborated `min` or `max` function call from a `__host__` function generate a syntax error when Clang is used.

Fix the errors by using `std::min` and `std::max` from `<algorithm>`, since C++14 they are `constexpr` and can be used in the `__device__` code [1].

1. https://llvm.org/docs/CompileCudaWithLLVM.html#algorithm
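
An illustrative before/after of the fix, compiled as CUDA C++ (the `clamp` helper is hypothetical):

```cpp
#include <algorithm>

// Before: an unqualified min()/max() binds to the CUDA math declarations,
// which Clang marks __device__-only, so __host__ code fails to compile:
//   float clamp_bad(float v, float lo, float hi) { return min(max(v, lo), hi); }

// After: std::min/std::max are constexpr since C++14 and therefore callable
// from __device__ code under Clang (nvcc needs --expt-relaxed-constexpr).
__host__ __device__ inline float clamp(float v, float lo, float hi) {
  return std::min(std::max(v, lo), hi);
}

int main() { return clamp(2.f, 0.f, 1.f) == 1.f ? 0 : 1; }
```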

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```

Reviewed By: ngimel

Differential Revision: D20005795

fbshipit-source-id: 98a3f35e8a96c15d3ad3d2066396591f5cca1696
2020-02-28 11:33:24 -08:00
Michael Suo
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00
Yinghai Lu
a2f3c6c26f Call RandomNumberSeed() on-demand (#33539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33539

We rarely use the `random_seed_` in context but we always initialize it with `RandomNumberSeed()` which isn't trivial. This diff makes it that we only call `RandomNumberSeed()` once when we want to use `random_seed_`.
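
A minimal sketch of the lazy-initialization pattern this diff applies (class and member names are illustrative):

```cpp
#include <cstdint>
#include <optional>
#include <random>

// Stand-in for the non-trivial seed generator mentioned above.
uint32_t RandomNumberSeed() { return std::random_device{}(); }

class ContextLike {
 public:
  ContextLike() = default;  // construction no longer pays for seeding

  uint32_t randomSeed() {
    if (!random_seed_) {
      random_seed_ = RandomNumberSeed();  // computed once, on first use only
    }
    return *random_seed_;
  }

 private:
  std::optional<uint32_t> random_seed_;  // empty until somebody asks
};

int main() {
  ContextLike ctx;
  return ctx.randomSeed() == ctx.randomSeed() ? 0 : 1;  // stable after first call
}
```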

Test Plan:
unittests.

Canaries:
AF: https://our.intern.facebook.com/intern/ads/canary/424753437441438410
AI: https://our.intern.facebook.com/intern/ads/canary/424753467414318838
Prospector: https://our.intern.facebook.com/intern/ads/canary/424753976999968569

Reviewed By: ipiszy

Differential Revision: D19993190

fbshipit-source-id: 1d2606bd65476ff3b519c69f9cbfa3b80f75cdff
2020-02-22 01:22:18 -08:00
Dmytro Dzhulgakov
e10aa6b72f Fix flaky DagNetTest unittest
Summary: The first run of the net is noisy sometimes - just run it twice.

Reviewed By: cheshen1

Differential Revision: D20039274

fbshipit-source-id: 639e65646bf52f3efe1ecd4bbcd0e413d9389b29
2020-02-21 16:08:04 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Zachary DeVito
7e3c438913 Renaming IValue List functions (#32093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32093

toGenericListRef -> toListRef
isGenericList -> isList
toGenericList -> toList
toXListRef -> toXVector

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D19369767

Pulled By: zdevito

fbshipit-source-id: 4f0078f95b83e6586524c03f7bcf206722fdd9ae
2020-01-17 15:17:45 -08:00
Yanghan Wang
9b6ec61bfd exposing CPU/GPU Copy ops (#32248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32248

expose CPU/GPU copy ops

Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:torch_integration_test

Reviewed By: houseroad

Differential Revision: D19405856

fbshipit-source-id: 1df4aa202e26647cb81e9fe7e4478e594a5f7f3e
2020-01-17 12:40:43 -08:00
Yinghai Lu
df514fd8c0 C++ C2/Glow operator unittest
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32258

Test Plan:
```
 buck test glow/fb/test/numerics:fp16_op_test
```

Reviewed By: bddppq

Differential Revision: D19401786

fbshipit-source-id: 1382b5208be6172d3e6f768dedad7ebec31cffc9
2020-01-17 12:13:34 -08:00
Pavel Belevich
62b06b9fae Rename TensorTypeId to DispatchKey (#32154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32154

TensorTypeId -> DispatchKey
	c10/core/TensorTypeId.h -> c10/core/DispatchKey.h
	c10/core/TensorTypeId.cpp -> c10/core/DispatchKey.cpp
	TensorTypeId::* -> DispatchKey::*
	TensorTypeId type_id -> DispatchKey dispatch_key
		type_id -> dispatch_key
	TensorTypeId::NumTensorIds -> DispatchKey::NumDispatchKeys
	RealTensorTypeId -> RealDispatchKey
TensorTypeSet -> DispatchKeySet
	TensorTypeIds -> DispatchKeys
	c10/core/TensorTypeSet.h -> c10/core/DispatchKeySet.h
	c10/core/TensorTypeSet.cpp -> c10/core/DispatchKeySet.cpp
	type_set() -> key_set()
	type_set_ -> key_set_
	typeSet -> keySet
ExcludeTensorTypeIdGuard -> ExcludeDispatchKeyGuard
IncludeTensorTypeIdGuard -> IncludeDispatchKeyGuard
LocalTensorTypeSet -> LocalDispatchKeySet
	c10/core/impl/LocalTensorTypeSet.h -> c10/core/impl/LocalDispatchKeySet.h
	c10/core/impl/LocalTensorTypeSet.cpp -> c10/core/impl/LocalDispatchKeySet.cpp
	tls_local_tensor_type_set -> tls_local_dispatch_key_set
	tls_is_tensor_type_id_excluded -> tls_is_dispatch_key_excluded
	tls_set_tensor_type_id_excluded -> tls_set_dispatch_key_excluded
	tls_is_tensor_type_id_included -> tls_is_dispatch_key_included
	tls_set_tensor_type_id_included -> tls_set_dispatch_key_included
MultiDispatchTensorTypeSet -> MultiDispatchKeySet
	multi_dispatch_tensor_type_set -> multi_dispatch_key_set
tensorTypeIdToBackend -> dispatchKeyToBackend
backendToTensorTypeId -> backendToDispatchKey
initForTensorTypeSet -> initForDispatchKeySet
inferred_type_set -> inferred_key_set
computeTensorTypeId -> computeDispatchKey
PODLocalTensorTypeSet raw_local_tensor_type_set -> PODLocalDispatchKeySet raw_local_dispatch_key_set
get_default_tensor_type_id -> get_default_dispatch_key
inferred_type_id -> inferred_dispatch_key
actual_type_id -> actual_dispatch_key
typeSetToDispatchKey_ -> dispatchKeySetToDispatchKey_
get_type_id() -> get_dispatch_key()
legacyExtractTypeId -> legacyExtractDispatchKey
extractTypeId -> extractDispatchKey

Test Plan: Imported from OSS

Differential Revision: D19398900

Pulled By: pbelevich

fbshipit-source-id: 234ad19f93d33e00201b61e153b740a339035776
2020-01-15 11:16:08 -08:00
Zachary DeVito
14593f077f remove list specialization from ivalue (#30734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30734

What are specialized lists?

The IValues that hold List[int], List[Tensor], and List[AnythingElse] are different C++ types.
e.g. List[int] has a std::vector<int> while List[AnythingElse] holds a std::vector<IValue>.

Why do we have specialized lists?

When we first created the JIT we needed to bind the ATen C++ API which has std::vector<int>,
std::vector<Tensor> as inputs. The easiest way to match this API was to make our IValues contain
these same types. Conversion was just unwrapping the IValue, very easy and cheap.

What is the problem with specialized lists?

We end up with significant special-casing throughout the compiler. Other types like Dict are not
specialized. So in the Pickler, for instance, there is a single piece of logic to handle
their serialization. For Lists, we end up with multiple cases. Furthermore, it doesn't
match Python, leading to problems along translation boundaries. Our pickle serialization
is slightly different from Python's, so it is harder to load objects from our IValue serialization
as Python values.

They also make it harder to provide an easy-to-use user API. We'd like to match pybind11 for C++
bindings to TorchScript. This would entail having a single torch::List class (untemplated)
that can be used to construct inputs. This is made much harder if the underlying ivalue needs
to be different depending on the type inside the list. The ideal case would be to have a constructor like

```
template<typename T>
List(std::vector<T> foo);
```

It would then set up the type tags correctly based on type T, without the need for passing tags.
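
A rough, self-contained model of what an unspecialized list looks like; `Value` and the tagging scheme are stand-ins for the real IValue/type machinery:

```cpp
#include <string>
#include <typeinfo>
#include <vector>

struct Value {  // stand-in for IValue: a type-erased element
  explicit Value(int v) : repr_(std::to_string(v)) {}
  explicit Value(double v) : repr_(std::to_string(v)) {}
  std::string repr_;
};

class List {  // untemplated: one C++ type for List[int], List[float], ...
 public:
  template <typename T>
  explicit List(const std::vector<T>& foo)
      : type_tag_(typeid(T).name()) {  // tag inferred from T, no caller hint
    for (const T& x : foo) elements_.emplace_back(x);
  }

 private:
  std::vector<Value> elements_;  // uniform storage instead of vector<T>
  std::string type_tag_;         // runtime record of the element type
};

int main() {
  List ints(std::vector<int>{1, 2, 3});        // the same class...
  List floats(std::vector<double>{1.0, 2.0});  // ...for every element type
}
```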

Do specialized lists improve perf?

Not in a way we have been able to measure. Our major concern initially was having to translate
a std::vector<IValue> to std::vector<int> to call ATen functions. This was especially a concern
for aten::_convolution which takes a number of mostly-constant lists of integers. However,
when we measure the effect of actually having to do this conversion for an aten::_convolution,
it does not take measurable time (benchmark results below).
This is true even if you use a trivial convolution (e.g. 1x1x1), and comment out the actual convolution code.

What are the issues removing them?

This PR removes list specialization but keeps the serialization format, and IValue APIs almost exactly
the same. The only visible change is that toTensorListRef and family have turned into toTensorVector
because they now return by value a copy of the list as a vector.

Further PRs can then clean up the complexity issues that arose from specialization. This will likely
involve removing the isTensorList/isIntList functions, and refactoring the code that used them to
work generically. At some point we will also change serialization to no longer write specialized
lists in the pickle binary. This is forward incompatible, so will go in its own PR.

Benchmark:
```
import torch

import torch.nn as nn
import torch.nn.functional as F
import time

class MnistNet(nn.Module):
    def __init__(self):
        super(MnistNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, kernel_size=1)
        self.conv2 = nn.Conv2d(1, 1, kernel_size=1)

    def forward(self, x):
        for i in range(10):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
        return x

model = MnistNet()
x = torch.rand(1, 1, 1, 1)
r = torch.jit.trace(model, x )
r(x)
r(x)
r(x)
r(x)
print(torch.jit.last_executed_optimized_graph())

while True:
    b = time.time()
    for i in range(100):
        r(x)
    e = time.time()
    print(e - b)
```

Results (no observable difference):

```
Before (actual conv)
0.13251137733459473
0.13260436058044434
0.13276338577270508
0.1327497959136963
0.13250041007995605
0.13270330429077148
0.13290190696716309
0.13265132904052734
0.13274288177490234
0.1326758861541748
0.13253355026245117
0.13254785537719727
0.13260746002197266
0.13285017013549805
0.13264012336730957
0.132490873336792
0.13280034065246582
0.13243484497070312
0.1325232982635498
0.1326127052307129
0.13264131546020508
0.13274383544921875
0.13298296928405762
0.1326909065246582
-------------------
After (actual conv)
0.13127517700195312
0.13150334358215332
0.13092470169067383
0.13102364540100098
0.13134360313415527
0.13155555725097656
0.13314104080200195
0.13151955604553223
0.13160037994384766
0.1315293312072754
0.13137340545654297
0.13148093223571777
0.131455659866333
0.1327371597290039
0.13134026527404785
0.13152337074279785
0.13151192665100098
0.13165974617004395
0.13403725624084473
0.13251852989196777
0.13135504722595215
0.1315624713897705
0.1317615509033203
0.1314380168914795
0.13157200813293457
--------------------

The following replace the convolution operator with a no-op, to show
that even if the conv op was made faster, then we still would not see
a difference:

Before (fake conv)
0.0069539546966552734
0.0069522857666015625
0.007120847702026367
0.007344722747802734
0.007689952850341797
0.007932662963867188
0.00761723518371582
0.007501363754272461
0.007532835006713867
0.007141828536987305
0.007174253463745117
0.007114410400390625
0.007071495056152344
------------------
After (fake conv)
0.007458209991455078
0.007337093353271484
0.007268190383911133
0.007313251495361328
0.007306575775146484
0.007468700408935547
0.0073091983795166016
0.007308483123779297
0.007538318634033203
0.007356882095336914
0.007464170455932617
0.007372140884399414
```

Test Plan: Imported from OSS

Differential Revision: D18814702

Pulled By: zdevito

fbshipit-source-id: 0371c73b63068fdc12f24b801371ea90f23531a6
2020-01-12 18:28:25 -08:00
Hong Xu
daf00beaba Remove duplicated Numa detection code. (#30628)
Summary:
cmake/Dependencies.cmake (1111a6b810/cmake/Dependencies.cmake (L595-L609)) has already detected Numa. Duplicated detection and variables may lead to
incorrect results.

Close https://github.com/pytorch/pytorch/issues/29968
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30628

Differential Revision: D18782479

Pulled By: ezyang

fbshipit-source-id: f74441f03367f11af8fa59b92d656c6fa070fbd0
2020-01-03 08:48:46 -08:00
peterjc123
c4121ed8db Fix is_fundamental template for MSVC (#30959)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30932
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30959

Differential Revision: D18891797

Pulled By: mingbowan

fbshipit-source-id: e6c36ee80065e66117873e768f86f507c48aaef1
2019-12-19 12:10:22 -08:00
Tristan Rice
b0bd35ff13 caffe2/event: allow multiple errors such as when cancelled (#31335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31335

When an error occurs in a net we end up cancelling all the async ops. If one error occurs it's highly likely other errors will occur as well.

Typically we see:
1. SendOp failed due to a network error
2. async scheduling cancels all other ops via `SetFinished("Cancelled");`
3. Another SendOp fails due to a network error and crashes the process when the exception is thrown.

This changes caffe2 ops to allow failing twice.
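
A sketch of the tolerance being added (illustrative names, not the actual caffe2 Event API): a second failure on an already-finished event is recorded and ignored rather than thrown.

```cpp
#include <iostream>
#include <string>

class EventLike {
 public:
  // The first SetFinished records the result; subsequent failures are
  // tolerated instead of crashing the process, since cancellation and a
  // late network error can both try to finish the same event.
  void SetFinished(const std::string& err_msg) {
    if (finished_) {
      std::cerr << "ignoring secondary error: " << err_msg << '\n';
      return;  // previously this path threw and took the process down
    }
    finished_ = true;
    error_ = err_msg;
  }

 private:
  bool finished_ = false;
  std::string error_;
};

int main() {
  EventLike ev;
  ev.SetFinished("Cancelled");              // async scheduling cancels the op
  ev.SetFinished("network error on send");  // the late failure is no longer fatal
}
```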

Test Plan: buck test //caffe2/caffe2:caffe2_test_cpu

Reviewed By: andrewwdye

Differential Revision: D19106548

fbshipit-source-id: 4b7882258a240894cc16d061a563c83a3214d3d9
2019-12-18 13:10:57 -08:00
Sebastian Messmer
643ca5def2 Replace c10::guts::stuff with std::stuff (#30915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915

Since we now have C++14, we don't need these c10::guts helpers anymore
ghstack-source-id: 95777609

Test Plan: waitforsandcastle

Differential Revision: D18869639

fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
2019-12-16 13:57:19 -08:00
Sebastian Messmer
409151e1bb Use [[noreturn]] instead of C10_NORETURN or CAFFE_NORETURN (#30917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30917

This is a C++14 feature, we can use this now.
ghstack-source-id: 95255753

Test Plan: waitforsandcastle

Differential Revision: D18869637

fbshipit-source-id: dd02036b9faeaffa64b2d2d305725443054da31b
2019-12-15 23:54:16 -08:00
Richard Zou
9047d4df45 Remove all remaining usages of BUILD_NAMEDTENSOR (#31116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116

Changelist:
- remove BUILD_NAMEDTENSOR macro
- remove torch._C._BUILD_NAMEDTENSOR
- remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR

Future:
- In the next diff, I will remove all usages of
ATen/core/EnableNamedTensor.h since that header doesn't do anything
anymore
- After that, we'll be done with the BUILD_NAMEDTENSOR removal.

Test Plan: - run CI

Differential Revision: D18934951

Pulled By: zou3519

fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d
2019-12-12 09:53:03 -08:00
Shunting Zhang
7f5f2e8871 add ZERO_COLLISION_HASH to caffe2 data type (#30912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30912

Add a new data type ZERO_COLLISION_HASH .

Test Plan: ci

Reviewed By: boryiingsu

Differential Revision: D18843626

fbshipit-source-id: b2d8280f13c78b4a656cf95822198df59de7b64c
2019-12-10 21:36:24 -08:00
Edward Yang
38986e1dea Split libtorch.so back into libtorch_{cpu,cuda,hip} (#30315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315

The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.

Some things of note:

* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
* In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to make torch_cpu/torch_cuda caffe2_interface_library so that they get whole-archive linked into torch when you statically link. And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in the Caffe2Config.cmake. However, I am not too sure why the old code did it this way in the first place; in any case, it doesn't seem to have broken anything to switch it this way.
* There are some uses of `__HIP_PLATFORM_HCC__` still in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failure when trying to run the CUDA fuser. This doesn't really matter substantively right now because we still in-place HIPify, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706

Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18790941

Pulled By: ezyang

fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
2019-12-04 08:04:57 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Sebastian Messmer
aa2862b843 Hide the OperatorKernel* argument from the stack based kernel API (#29337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29337

This argument is needed by boxing wrappers so they're able to get a pointer to the corresponding unboxed kernel and call into it.
But if a kernel is registered in a boxed way, we don't need it and should hide this from the API.
This is especially needed for the backend fallback API where users would only be left wondering why this argument is there and what it does.
Also, hiding it allows us to potentially totally remove it in a future refactoring if we find some way to do so.
ghstack-source-id: 94481316

Test Plan: unit tests

Differential Revision: D18361991

fbshipit-source-id: 5cef26c896fe3f2a5db730d3bc79dcd62e7ef492
2019-11-23 15:25:01 -08:00
Sebastian Messmer
583c288232 Add a OperatorHandle argument to boxed kernels (#29201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29201

This is required for boxed backend fallback kernels (e.g. lazy, AMP) because they need to know which op was actually called.
ghstack-source-id: 94481313

Test Plan: I will add unit tests in a diff stacked on top

Differential Revision: D18282746

fbshipit-source-id: 339a1bbabd6aff31a587b98f095c75104dfc6f99
2019-11-23 15:24:49 -08:00
Junjie Bai
352731bd6e Revert D18632773: Split libtorch.so back into libtorch_{cpu,cuda,hip}
Test Plan: revert-hammer

Differential Revision:
D18632773

Original commit changeset: ea717c81e0d7

fbshipit-source-id: 18601439f9f81c9f389020e5a0e4e04adb21772d
2019-11-21 15:01:09 -08:00
Edward Yang
ec30d9028a Split libtorch.so back into libtorch_{cpu,cuda,hip} (#29731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29731

The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.

Some subtleties about the patch:
- There were a few functions that crossed the CPU-CUDA boundary without API macros. I just added them, easy enough. An inverse situation was aten/src/THC/THCTensorRandom.cu, where we weren't supposed to put API macros directly in a cpp file.
- DispatchStub wasn't getting all of its symbols related to static members on DispatchStub exported properly. I tried a few fixes but in the end I just moved everyone off using DispatchStub to dispatch CUDA/HIP (so they just use normal dispatch for those cases.) Additionally, there were some mistakes where people incorrectly were failing to actually import the declaration of the dispatch stub, so added includes for those cases.
- torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
- The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
- In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
- A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly. This situation also happens with custom C++ extensions.
- There's a ROCm compiler bug where extern "C" on functions is not respected. There's a little workaround to handle this.
- Because I was too lazy to check if HIPify was converting TORCH_CUDA_API into TORCH_HIP_API, I just made it so HIP build also triggers the TORCH_CUDA_API macro. Eventually, we should translate and keep the nature of TORCH_CUDA_API constant in all cases.

Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18632773

Pulled By: ezyang

fbshipit-source-id: ea717c81e0d7554ede1dc404108603455a81da82
2019-11-21 11:27:33 -08:00
Edward Yang
65bb34d885 Remove TensorImpl::is_variable, deprecate Tensor::is_variable (#29653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29653

I didn't remove is_variable from Tensor for BC reasons, but I did
remove as many uses as I could from the codebase.
at::impl::variable_excluded_from_dispatch got moved to TensorBody.h
so that it's more widely accessible.

This diff is NOT semantics preserving.  Here are the major differences:

- In a number of native operator implementations, we tested that arguments
  are not variable.  I replaced these with asserts that variable is
  excluded from dispatch.  I actually don't think these asserts are really
  necessary now (they should certainly be true, but it's hard to get
  it wrong), but I've kept them for old time's sake.  At least, they'll detect
  if you call these functions before you've processed variable (indicating
  a bug in your kernel.)

- There are a number of places where we do a per-tensor test for being a
  variable, for better error reporting when someone commits Tensor/Variable
  confusion.  Although these tests are substantively the same as the
  tests above, in these cases I decided to *delete* the test entirely.
  The reasoning is that in these cases, we didn't really care about
  dispatch (also, see above; I'm not too sure we really need the dispatch
  asserts), we cared about Tensor/Variable confusion.  Since Tensor/Variable
  confusion is impossible now, we don't need the tests.  One of the key
  factors which pushed me one way or another was whether or not a function
  was doing per-tensor validation; if I kept the assert in such functions,
  I'd repeatedly access the TLS.  Even if we want to bring back the asserts,
  they would have to go somewhere else.

  Another similar idiom is the number of places we do !x.defined() ||
  x.is_variable(); I treated this equivalently.

- nuclear_norm's computation of compute_uv is a bit weird, but I think
  it's OK to just delete the is_variable case (I *suspect* that it is
  always the case that self.is_variable(), but it doesn't really matter.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18496168

Pulled By: ezyang

fbshipit-source-id: 5a1ded931e0c10a6b758ba64a8380d34110e0c3e
2019-11-14 11:41:02 -08:00
Christy Lee
b8dca04f73 Add error message if CUDA startup fails (#29670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29670

This is the entry point for loading CUDA code; improve the error message to prompt users to check that GPU code is included.

Test Plan: Build without gpu code.  Run the binary.  Check that the new error message exists.

Reviewed By: yfeldblum

Differential Revision: D18453798

fbshipit-source-id: 63d9ec50acdf57ef4baf3f7d99c836c56bc1435e
2019-11-13 16:48:40 -08:00
Junjie Bai
b0c245d52d Consolidate the places that find pybind11 include dirs (#29659)
Summary:
Also move the logic that installs the pybind11 headers from setup.py to cmake (to align with other headers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29659

Differential Revision: D18458208

Pulled By: bddppq

fbshipit-source-id: cfd1e74b892d4a65591626ab321780c8c87b810d
2019-11-12 14:51:56 -08:00
Yavuz Yetim
2704af0970 AsyncIf op implementation
Summary:
This diff adds the following:
- An AsyncIf op to support conditional async execution. It assumes that then_net and else_net are async scheduling nets. The op itself completes when every async op in the active net completes. Cancellation cancels the inner nets and the async ops.
- Unit tests targeting asynchronicity and error/cancellation handling.

Test Plan:
New unit tests

With --stress-runs=2000:
https://our.intern.facebook.com/intern/testinfra/testrun/4785074616784325

Reviewed By: ilia-cher

Differential Revision: D18051357

fbshipit-source-id: 1399a437b3ca63fd4ea0cf08d173f85b9242cc1f
2019-11-07 08:51:31 -08:00
Ilia Cherniavskii
7190789f58 Handling of failing and terminal async cpu ops (#29052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29052

Make sure we handle the case of multiple, async, terminal (no children)
and failing cpu ops.

Test Plan: AsyncIf tests

Reviewed By: yyetim

Differential Revision: D18276401

Pulled By: ilia-cher

fbshipit-source-id: 35b175dd025bc7e392056ac1331b159376a29e60
2019-11-04 12:01:21 -08:00
Yinghai Lu
c60bf2704a Support Offline Tensors through ONNXIFI layer
Summary:
Previous import was b2ec1a8041879b7be98d81387a14cae895f952f4

Included changes:
- **[97fe555](https://github.com/houseroad/foxi/commit/97fe555)**: Add deferred weight reader pointer when initializing the graph (#15) <Yinghai Lu>
- **[ba2faf7](https://github.com/houseroad/foxi/commit/ba2faf7)**: Add status and timeout to events (#14) <Jack Montgomery>

Test Plan: kicksandcastle

Reviewed By: ipiszy

Differential Revision: D18231697

fbshipit-source-id: 7566e2438d2b57f0feaadcd51f55a03552adeab9
2019-10-31 10:33:42 -07:00
Sebastian Messmer
bb0e46b65a Remove preallocation of type ids (#28024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28024

We preallocated type ids to align them with ScalarType. At that point, the maximum type id was 10 and we used 11 to specify undefined type id.
However, since then, ScalarType got more additions, 11 isn't undefined anymore, and numbers 11-15 have meaning.
caffe2::TypeIdentifier also got its separate additions, 12 and upwards have meaning that differs from ScalarType.

I'm going with the (CI-tested) assumption that caffe2::TypeIdentifier and ScalarType actually don't need to be aligned
and remove the functionality for preallocated type ids. This simplifies our type ids.
ghstack-source-id: 92051872

Test Plan: unit tests

Differential Revision: D17936165

fbshipit-source-id: 2c9df2b9b3f35b3e319641c96638321ac3433d5c
2019-10-16 23:08:11 -07:00
Sebastian Messmer
d9de2e0ba9 Back out "Revert D17936166: [wip] Constexpr type ids" (#28155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28155

Original commit changeset: 92c63a96dedd
ghstack-source-id: 92051874

Test Plan: unit tests

Differential Revision: D17964410

fbshipit-source-id: 1d989d28b3e1de6d43c915f122f2b65a77a332eb
2019-10-16 18:24:04 -07:00
Lu Fang
1819fade35 Revert D17936166: [wip] Constexpr type ids
Test Plan: revert-hammer

Differential Revision:
D17936166

Original commit changeset: 68cfa926c721

fbshipit-source-id: 92c63a96dedd8764e342c6437c6ea308d93d29b2
2019-10-16 06:47:10 -07:00
Sebastian Messmer
9cc4405dc9 Constexpr type ids (#28023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28023

ghstack-source-id: 91987335

Test Plan: waitforsandcastle

Differential Revision: D17936166

fbshipit-source-id: 68cfa926c721e5fbc96e083eb47e784bf34a9df4
2019-10-15 21:21:20 -07:00
Sebastian Messmer
ef8bcfe2c7 Revert D17488861: constexpr type ids
Test Plan: revert-hammer

Differential Revision:
D17488861

Original commit changeset: ce7b059d7c86

fbshipit-source-id: 426fca9abe7122190fc17ac6976bc6bcbd5718df
2019-10-15 09:59:21 -07:00
Sebastian Messmer
1865f31efa Revert D17490109: Remove preallocation of type ids
Test Plan: revert-hammer

Differential Revision:
D17490109

Original commit changeset: 800c340d9d35

fbshipit-source-id: a3e39bbce53c828fe553379d9f2b66dc8a07c982
2019-10-15 09:59:17 -07:00
Sebastian Messmer
cf01f53b5a Remove preallocation of type ids (#26509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26509

We preallocated type ids to align them with ScalarType. At that point, the maximum type id was 10 and we used 11 to specify undefined type id, see https://github.com/pytorch/pytorch/pull/10139.
However, since then, ScalarType got more additions, 11 isn't undefined anymore, and numbers 11-15 have meaning.
caffe2::TypeIdentifier also got its separate additions, 12 and upwards have meaning that differs from ScalarType.

I'm going with the (CI-tested) assumption that caffe2::TypeIdentifier and ScalarType actually don't need to be aligned
and remove the functionality for preallocated type ids. This simplifies our type ids.
ghstack-source-id: 91896918

Test Plan: unit tests

Differential Revision: D17490109

fbshipit-source-id: 800c340d9d3556a99f6e3ffc33af14ad68d7cc59
2019-10-15 08:47:13 -07:00
Sebastian Messmer
6f865c1e37 constexpr type ids (#26502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26502

Create type ids at compile time instead of incrementing a counter at runtime. This is done by computing a compile time crc64 on the type name. We couldn't do this before, because we still used GCC4 and that compiler didn't support the use of `__PRETTY_FUNCTION__` in a constexpr context. However, since GCC5 this is possible and we can use this trick.

This does not change the semantics of preallocated type ids. I actually think we don't need to preallocate anymore, but I split the removal of preallocation into a separate diff to be able to test it separately.
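
A minimal sketch of the trick, using a constexpr FNV-1a hash as a stand-in for the crc64 mentioned above; `__PRETTY_FUNCTION__` spells out the instantiated type, so hashing it yields a distinct compile-time id per type (modulo hash collisions):

```cpp
#include <cstdint>
#include <cstdio>

constexpr uint64_t fnv1a(const char* s, uint64_t h = 1469598103934665603ull) {
  // Recursive form keeps this valid as a C++11/14 constexpr function.
  return *s == '\0'
      ? h
      : fnv1a(s + 1, (h ^ static_cast<uint64_t>(*s)) * 1099511628211ull);
}

template <typename T>
constexpr uint64_t type_id() {
  // Usable in a constexpr context on GCC >= 5 and Clang, per the diff above.
  return fnv1a(__PRETTY_FUNCTION__);
}

static_assert(type_id<int>() != type_id<float>(), "ids are distinct per type");

int main() {
  std::printf("%llu\n", static_cast<unsigned long long>(type_id<int>()));
}
```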

ghstack-source-id: 91896920

Test Plan: unit tests

Differential Revision: D17488861

fbshipit-source-id: ce7b059d7c8686b69cb091a4a8beaf4b96391343
2019-10-15 08:47:09 -07:00
Edward Yang
0b6186d778 Remove Tensor.h, TensorMethods.h from src/core. (#27086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27086

This is a major source of merge conflicts, and AFAICT isn't necessary anymore (it may have been necessary for some mobile build stuff in the past).

This is a commandeer of #25031

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D17687345

Pulled By: ezyang

fbshipit-source-id: bf6131af835ed1f9e3c10699c81d4454a240445f
2019-10-06 09:37:50 -07:00
Hao Lu
1f0328c6d4 Add randomFill to test_utils.h
Summary: Add helper function randomFill to test_utils.h so we can use it in benchmark scripts as well as tests.

Test Plan:
```
buck run mode/opt //tvm/sparse:cblas_bench
```

Reviewed By: yinghai

Differential Revision: D17759193

fbshipit-source-id: e4909b04e83ca9382ab4718855fb63743d028de1
2019-10-04 18:29:22 -07:00
Sebastian Messmer
ed207b53ab c10::KernelFunction (#26337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26337

- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.

ghstack-source-id: 90459911

Test Plan: unit tests

Differential Revision: D17416607

fbshipit-source-id: fd221f1d70eb3f1b4d33092eaa7e37d25684c934
2019-09-20 18:55:25 -07:00
Ansha Yu
e44ea6cd5e tvm operator dynolog (#26295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26295

Log the following in scuba caffe2_tvm_operator_stats:
1. everything in caffe2_operator_stats
2. fallback netdef
3. tvm module graph_json
4. whether compilation triggered this round
5. number of compilations stored in tvm_runtime_map
6. (not yet logged) last compilation time if any
7. (not yet logged) total bytes occupied by compilation
8. whether this compilation is fallback
9. batch size as observed by tvm op

Test Plan:
```
buck run mode/dbg //tvm/sparse:tvm_bbpredictor_benchmark -- --init_net ~/tmp/ads/84480054_204/init_net.pb --input_init_net ~/tmp/ads/84480054_204/input_init_net.pb --pred_net ~/tmp/ads/84480054_204/pred_net.pb --warmup 1000 --iter 1000 --num_cycles 5 --caffe2_logging_operator_dyno_sampling_rate=1 --vmodule=Logger=2
```

Logs show up in the scuba:
https://our.intern.facebook.com/intern/scuba/query/?dataset=caffe2_tvm_operator_stats

https://fburl.com/scuba/lq2h22e4

Auto submitted adindexer canary:
https://our.intern.facebook.com/intern/ads/canary/421064436039494716
Additional adindexer canary:
https://our.intern.facebook.com/intern/ads/canary/421082681202831286/
Additional adfinder canary:
https://our.intern.facebook.com/intern/ads/canary/421082685084831037/

Reviewed By: yinghai

Differential Revision: D17358412

fbshipit-source-id: d2119c12ddeaa86217c163e32fb1e211952139f5
2019-09-18 18:37:17 -07:00
Andrey Malevich
28d3eb8156 Back out "Back out "[Caffe2] Fix device_option propagation"" (#25908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25908

Original commit changeset: f6e961e88c01

device_option propagation is completely broken in Caffe2 for cases when pass-through operators are used. As an example, the Gather operator doesn't have a gradient and passes through its inputs, which results in incorrect detection of the components for sparse parameter aggregation (the component will be empty instead of holding the real device).
This diff is trying to fix this issue.

The original diff had a problem: Caffe2 does not handle cases when the device option is present but contains only metadata (for example, the one for auto-generated reduction ops in the backward pass). This diff addresses that issue by merging device options during the backward pass.

Test Plan:
1. net_transform is finally working with a Gather + FloatToHalf transformed model instead of failing because of an incorrect number of components.
2. New unit-test.
3. Verify that previously broken benchmark is now passing

ezyang do you have suggestions what else I should test?

Reviewed By: ezyang

Differential Revision: D17281528

fbshipit-source-id: 4a1bc386f29f6a34fbf8008effde9d4890abebfa
2019-09-17 04:01:36 -07:00
Sebastian Messmer
0e30e6570d Call aten ops through c10 dispatcher (#23668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23668

- The eager mode frontend now calls operators who are defined in native_functions.yaml with `use_c10_dispatcher: True` through the c10 dispatcher and not anymore through globalATenDispatch().
- These operators aren't registered with globalAtenDispatch anymore, only on c10 now.
- Backend extensions calling globalATenDispatch().registerOp() to add their own kernels still work, this function will forward the registration to the c10 dispatcher for them.

ghstack-source-id: 90130455

Test Plan: benchmarks at https://docs.google.com/document/d/1gpzKZcFf1JJameY1vKxF7Cloul9s6D8HKIK2_Pp1hFo/edit#

Differential Revision: D16603133

fbshipit-source-id: 991f17b355e9c78c5e86fee4fa381df7ab98ac82
2019-09-15 01:18:07 -07:00
Jiakai Liu
67c530851c get rid of protobuf dependencies (#25650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650

This PR removes protobuf dependencies from mobile build altogether:
- caffe2/proto: protobuf files, including caffe2.proto and torch.proto;
- caffe2 components that depend on caffe2.proto, including most part of
caffe2/core, caffe2/utils;
- libprotobuf / libprotobuf-lite dependencies;
- protobuf compiler;
- some utils class, e.g.: netdef_converter.cpp;
- introduce a macro to disable third_party/onnx which depends on protobuf;

Test Plan:
- builds;
- link with demo app to make sure it can load and run a model in pickle format;

Differential Revision: D17183548

Pulled By: ljk53

fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531
2019-09-06 08:48:20 -07:00
Jiakai Liu
a3d0abf729 move GetDimFromOrderString to caffe2/core/types.h (#25671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25671

To decouple string_utils.h from types.h and protobuf headers.
Logically GetDimFromOrderString seems to be more similar to
StringToStorageOrder compared to other string_utils functions.

Test Plan: - Will check all internal/external CI jobs.

Reviewed By: yinghai

Differential Revision: D17191912

Pulled By: ljk53

fbshipit-source-id: fe555feef27bfd74c92b6297c12fb668252ca9ff
2019-09-05 04:32:04 -07:00
Sebastian Messmer
791347642b Allow TensorMethods.h to include Dispatcher.h (alternative) (#23888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23888

This is an alternative to https://github.com/pytorch/pytorch/pull/23684.

Instead of splitting a bunch of headers into declaration and definition, we change tensor includes to only include the tensor declaration when the tensor definition isn't needed.
ghstack-source-id: 89357687

Test Plan: waitforsandcastle

Differential Revision: D16673569

fbshipit-source-id: fa1d92809b05de7910a8c2dc2f55abe071ca63bf
2019-09-04 01:35:19 -07:00
iotamudelta
4fe857187c switch to rocThrust for thrust/cub APIs (#25620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25620

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25602

Enable rocThrust with hipCUB and rocPRIM for ROCm. They are the ROCm implementations of the thrust and cub APIs and replace the older hip-thrust and cub-hip packages going forward. ROCm 2.5 is the first release to contain the new packages as an option, as of 2.6 they will be the only available option.

Add hipification rules to correctly hipify thrust::cuda to thrust::hip and cub:: to hipcub:: going forward. Add hipification rules to hipify specific cub headers to the general hipcub header.

Infrastructure work to correctly find, include and link against the new packages. Add the macro definition that selects the HIP backend for Thrust.

Since include chains are now a little different from CUDA's Thrust, add includes for functionality used where applicable.

Skip four tests that fail with the new rocThrust for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21864

Reviewed By: xw285cornell

Differential Revision: D16940768

Pulled By: bddppq

fbshipit-source-id: 3dba8a8f1763dd23d89eb0dd26d1db109973dbe5
2019-09-03 22:16:30 -07:00
Yinghai Lu
4edf77b6c0 Fuse to individual operators to GatherFuse8BitRowwiseQuantFloatMulLengthElim (#25519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25519

Fuse  Gather-Fused8BitRowwiseQuantizedToFloat-Mul-LengthsSum opportunistically.

Test Plan:
```
buck test caffe2/caffe2/opt/custom:concat_elim_test
```

Reviewed By: dreamingleo

Differential Revision: D17125045

fbshipit-source-id: 8ee50410eb13a82e1e5c8180f392fce2fe9cd728
2019-09-03 19:08:49 -07:00
Edward Yang
58a0dee749 Replace open registration TensorTypeId with closed enum. (#25252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25252

Our model going forward for extensions will be that you will have to
get an allocation of an ID in our system.  This is how things work
in practice today; we're just simplifying our underlying registration
since there is no need to have distributed registration.

There are some codemods in this diff:

```
codemod --extensions cpp,h,cc,cuh,py,in --exclude-paths=c10/core/TensorTypeId.h '([A-Za-z]+?)TensorId\(\)' 'TensorTypeId::\1TensorId'
codemod --extensions cpp,h,cc,cuh,py,in 'TensorTypeIds::undefined\(\)' 'TensorTypeId::UndefinedTensorId'
codemod --extensions cpp 'TensorType1\(\)' 'TensorTypeId::CPUTensorId'
codemod --extensions cpp 'TensorType2\(\)' 'TensorTypeId::CUDATensorId'
codemod --extensions cpp 'TensorType3\(\)' 'TensorTypeId::XLATensorId'
codemod --extensions cpp 'TensorType1' 'CPUTensorId'
codemod --extensions cpp 'TensorType2' 'CUDATensorId'
codemod --extensions cpp 'TensorType3' 'XLATensorId'
```

The main hand-written changes are in c10/core/TensorTypeId.h

Other manual fixes:

- aten/src/ATen/core/op_registration/op_registration.cpp - stop using
  std::string operator+
- aten/src/ATen/function_wrapper.py - handle a hardcoded TypeId() that
  wasn't caught by codemod
- torch/csrc/tensor/python_tensor.h - fix now incorrect forward declaration
  of TensorTypeId
- aten/src/ATen/core/op_registration/ - remove out-of-line registration

Differential Revision: D17072001

Test Plan: ossci and sandcastle

Pulled By: ezyang

fbshipit-source-id: c641515fd0604c045c54fbb1d6b1b950f45e89d1
2019-08-29 08:55:58 -07:00
Lucian Grijincu
9c9f14029d Revert D16929363: Revert D16048264: Add static dispatch mode to reduce mobile code size
Differential Revision:
D16929363

Original commit changeset: 69d302929e18

fbshipit-source-id: add36a6047e4574788eb127c40f6166edeca705f
2019-08-20 17:08:31 -07:00
Lucian Grijincu
bd6cf5099b Revert D16048264: Add static dispatch mode to reduce mobile code size
Differential Revision:
D16048264

Original commit changeset: ad1e50951273

fbshipit-source-id: 69d302929e183e2da26b64dcc24c69c3b7de186b
2019-08-20 16:26:18 -07:00
Roy Li
6824c9018d Add static dispatch mode to reduce mobile code size
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22335

Test Plan: Imported from OSS

Differential Revision: D16048264

Pulled By: li-roy

fbshipit-source-id: ad1e50951273962a51bac7c25c3d2e5a588a730e
2019-08-20 12:21:32 -07:00
Rui Zhu
5b0de85868 Register FC/Conv DNNLowp separately for supporting both tensor type (#24361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24361

Currently we only support Conv in the kernel but have an entry point for both tensor types using one shared class.
It is time to make the change.

Reviewed By: csummersea

Differential Revision: D16604713

fbshipit-source-id: b98d39a2c7960707cd50ba27e43dce73f741eeeb
2019-08-14 17:15:42 -07:00
Edward Yang
5ae909b443 Revert D15920763: Move TensorOptions to ATen/core
Differential Revision:
D15920763

Original commit changeset: c3429973180a

fbshipit-source-id: 0efb27722b371e1047f02240f071bc222b52e51d
2019-08-13 12:07:18 -07:00
Zachary DeVito
4a754dc3e3 cleanup warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24133

Test Plan: Imported from OSS

Differential Revision: D16746249

Pulled By: zdevito

fbshipit-source-id: 051f048b03043d6947544cd02ae44288bd439ef9
2019-08-12 16:12:30 -07:00
Richard Zou
bde73860c6 Move TensorOptions to ATen/core (#22020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22020
ghimport-source-id: 62766d49658ee84b8076c555432b50e13d104bc6

Test Plan: Imported from OSS

Differential Revision: D15920763

Pulled By: zou3519

fbshipit-source-id: c3429973180a65606da82face5c0ee377035e716
2019-08-12 07:41:12 -07:00
Supriya Rao
9223fa1c46 Add support to serialize qtensor in JIT. (#23356)
Summary:
Adds qtensor specific fields to the proto file so that they get serialized into the model.json

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23356
ghstack-source-id: 87263428

Differential Revision: D16473237

fbshipit-source-id: bf5b51d0863d036d30a1644a3c3b74516468224b
2019-07-26 15:52:15 -07:00
Yinghai Lu
b964bdb53a Fbgemm fp16 tensor support (#23101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23101

Support for
- Shape inference
- Tensor info extraction

Reviewed By: zrphercule

Differential Revision: D16345251

fbshipit-source-id: 53ef674b5b1581e6267e6d2070e34355280dae79
2019-07-19 17:08:03 -07:00
Yinghai Lu
2a8d5a132c Fix workspace destruction ordering (#23096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23096

Summary: Nets can have state that depends on the rest of the state in the Workspace. Hence, they should be destructed first.
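
For illustration, the C++ member-ordering rule the fix leans on: data members are destroyed in reverse declaration order, so declaring the nets after the rest of the workspace state guarantees they are torn down first (names are hypothetical):

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <string>

struct Blob { ~Blob() { std::cout << "blob destroyed\n"; } };
struct Net  { ~Net()  { std::cout << "net destroyed\n";  } };  // may reference blobs

class WorkspaceLike {
  // Members are destroyed in reverse declaration order: nets_ (declared
  // last) goes first, so no net outlives the state it points into.
  std::map<std::string, std::unique_ptr<Blob>> blobs_;
  std::map<std::string, std::unique_ptr<Net>> nets_;

 public:
  WorkspaceLike() {
    blobs_["x"] = std::make_unique<Blob>();
    nets_["train"] = std::make_unique<Net>();
  }
};

int main() {
  WorkspaceLike ws;
}  // prints "net destroyed" before "blob destroyed"
```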

Reviewed By: ajyu

Differential Revision: D16382987

fbshipit-source-id: 3fd030ba206e2d0e897abb9e31c95bdaeb9482b7
2019-07-19 16:49:50 -07:00
Will Feng
3a12520844 Pass Variable into Caffe2 ops, by requiring that the Variable doesn't require grad (#22473)
Summary:
As part of the Variable/Tensor merge, we want to be able to pass Variables into Caffe2 without doing extra shallow copy, to improve performance and also allow for in-place mutations in Caffe2 ops. There are a few approaches outlined in https://github.com/pytorch/pytorch/pull/22418, and this PR is the chosen approach.

Specifically, we can have the assumption that we won't be connecting autograd to C2 gradients at any point (as it's too tricky and not that useful). Therefore, we can pass Variable into Caffe2 ops by requiring that all Variables in Caffe2 don't require grad. For code paths in Caffe2 that might potentially track gradients (e.g. `ScriptModuleOp` and `call_caffe2_op_from_c10`), we use the `torch::NoGradGuard` to make sure gradients are not tracked.
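
A hedged sketch of the resulting calling convention; `torch::NoGradGuard` and `TORCH_CHECK` are the real C++ APIs, while `call_caffe2_op` is a placeholder for the boundary described above:

```cpp
#include <torch/torch.h>

// Placeholder for the Caffe2 boundary discussed above.
void call_caffe2_op(const torch::Tensor& /*t*/) {}

void run_through_caffe2(const torch::Tensor& input) {
  // Invariant from this PR: Variables crossing into Caffe2 must not require grad.
  TORCH_CHECK(!input.requires_grad(), "Caffe2 inputs must not require grad");

  // Make sure nothing the op does internally gets tracked by autograd.
  torch::NoGradGuard no_grad;
  call_caffe2_op(input);
}

int main() {
  run_through_caffe2(torch::zeros({2, 2}));
}
```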

This supersedes https://github.com/pytorch/pytorch/pull/22418.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22473

Differential Revision: D16099042

Pulled By: yf225

fbshipit-source-id: 57efc3c7cfb3048d9abe90e63759acc14ebd2972
2019-07-08 11:31:10 -07:00
Jongsoo Park
9db7bc8bc7 fix uninitialized variable warning (#22477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22477

There is actually no use of an uninitialized variable, but some compilers are not smart enough to reason that the two if branches are always taken together.
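
A tiny reproduction of the pattern (illustrative): the compiler cannot prove the two branches fire together, so initializing at declaration silences the warning without changing behaviour.

```cpp
int f(bool cond) {
  int x = 0;  // redundant at runtime, but provably initialized for the compiler
  if (cond) {
    x = 42;    // branch 1 sets x...
  }
  if (cond) {
    return x;  // ...branch 2 reads it only when branch 1 also ran
  }
  return -1;
}

int main() { return f(true) == 42 ? 0 : 1; }
```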

Reviewed By: hx89

Differential Revision: D16100211

fbshipit-source-id: 25f01d668063603d7aaa776451afe8a10415d2ea
2019-07-06 00:36:45 -07:00
Sebastian Messmer
ed60d9fcf9 List/Dict remember and check their element type (#22005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22005

When a Dict or List is created with type information, it will remember that.
If at any later point this list is instantiated as a List<T> with a concrete type, it will assert that T is the correct type.
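
A rough model of the behaviour (stand-in types, not the real c10 implementation): the type-erased list records its element type when known and asserts on a later typed instantiation.

```cpp
#include <cassert>
#include <optional>
#include <typeindex>

class GenericListLike {
 public:
  explicit GenericListLike(std::type_index elem) : elem_type_(elem) {}
  GenericListLike() = default;  // created without type information

  template <typename T>
  void toTyped() {
    // If a type was recorded at creation, a concrete instantiation must match.
    if (elem_type_) {
      assert(*elem_type_ == std::type_index(typeid(T)) && "element type mismatch");
    } else {
      elem_type_ = std::type_index(typeid(T));  // learn the type now
    }
  }

 private:
  std::optional<std::type_index> elem_type_;
};

int main() {
  GenericListLike l{std::type_index(typeid(int))};
  l.toTyped<int>();     // ok: matches the recorded element type
  // l.toTyped<float>(); // would trip the assert
}
```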

Differential Revision: D15914462

fbshipit-source-id: a8c3d91cb6d28d0c1ac0b57a4c4c6ac137153ff7
2019-07-05 15:17:51 -07:00
Sebastian Messmer
e68dc899d1 Fix compiler warnings (#22162)
Summary:
Fix various compiler warnings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22162

Differential Revision: D16085339

Pulled By: smessmer

fbshipit-source-id: d36a4b334315f1a5942cac46443a7d166ca36d0d
2019-07-02 14:12:55 -07:00
Hong Xu
693871ded3 Rename macros and build options NAMEDTENSOR_ENABLED to BUILD_NAMEDTENSOR (#22360)
Summary:
Currently the build system accepts USE_NAMEDTENSOR from the environment
variable and turns it into NAMEDTENSOR_ENABLED when passing to CMake.
This discrepancy does not seem necessary and complicates the build
system. The naming of this build option is also semantically incorrect
("BUILD_" vis-a-vis "USE_").  This commit eradicate this issue before it
is made into a stable release.

The support of NO_NAMEDTENSOR is also removed, since PyTorch has been
quite inconsistent about "NO_*" build options.

 ---

Note: All environment variables with their names starting with `BUILD_` are currently automatically passed to CMake with no need of an additional wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22360

Differential Revision: D16074509

Pulled By: zou3519

fbshipit-source-id: dc316287e26192118f3c99b945454bc50535b2ae
2019-07-02 11:46:13 -07:00
Haixin Liu
869ce89474 use feenableexcept when glibc is available (#22241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22241

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20387

glibc has a non-standard function, feenableexcept, that enables trapping of floating-point exceptions. Compared to feclearexcept + fetestexcept, this approach allows us to see precisely where the exception was raised from the stack trace.
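
A minimal glibc example of the approach (feenableexcept is the real, non-standard API; everything else is a toy): the offending instruction raises SIGFPE at the fault site instead of setting a flag to poll later.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // feenableexcept is a glibc extension
#endif
#include <fenv.h>
#include <cstdio>

int main() {
  // Trap immediately on invalid operations, divide-by-zero, and overflow.
  feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW);

  volatile double zero = 0.0;
  double bad = 1.0 / zero;   // SIGFPE is raised right here, so the stack
  std::printf("%f\n", bad);  // trace points at the faulting line (unreached)
}
```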

Reviewed By: jspark1105

Differential Revision: D15301095

fbshipit-source-id: 94f6e72456b2280f78d7d01c2ee069ae46d609bb
2019-07-02 10:49:55 -07:00
Andrew Naguib
3cba9e8aaa Error Message Paraphrasing (#22369)
Summary:
Saying `I` in an error message is too subjective to be used in a framework.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22369

Differential Revision: D16067712

Pulled By: soumith

fbshipit-source-id: 2a390646bd5b15674c99f65e3c460a7272f508b6
2019-06-30 00:13:02 -07:00
Vitaly Fedyunin
516c7e4456 Adding memory_format to empty and empty_like operators (#20558)
Summary:
Original RFC https://github.com/pytorch/pytorch/issues/19092

To ensure that we are not introducing a BC-breaking change, empty_like returns a contiguous tensor by default.

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nCwh = torch.empty_like(nhwC)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Now we need a way to preserve memory format in `empty_like`

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nhwC, memory_format=torch.preserve_format)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

like_nCwh = torch.empty_like(nCwh, memory_format=torch.preserve_format)
like_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Usage of `torch.preserve_format` allows us to avoid `if` constructs.

We can also generate different memory format outputs

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nCwh, memory_format=torch.channels_last)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

new_nCwh = torch.empty_like(nhwC, memory_format=torch.contiguous_format)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20558

Differential Revision: D15502474

Pulled By: VitalyFedyunin

fbshipit-source-id: 2e120d57eefad6fb8e04b8322c79871392f64331
2019-06-26 11:48:27 -07:00
Sebastian Messmer
de85abf226 Allow default construction of Dict/List (#22084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22084

For DictPtr/ListPtr, default construction was disallowed because it was ambiguous whether it's supposed to create an empty list or a nullptr.
But since we renamed them to Dict/List, we can now allow default construction without ambiguity.

Differential Revision: D15948098

fbshipit-source-id: 942a9235b51608d1870ee4a2f2f0a5d0d45ec6e6
2019-06-25 17:40:48 -07:00
Sebastian Messmer
e425789286 Fix "missing return statement" warning (#22216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22216

-

Differential Revision: D15989670

fbshipit-source-id: d0534a3bf1eef29657738e271d35503a2f75a043
2019-06-25 16:57:42 -07:00
Ilia Cherniavskii
7b1d6c8912 Update intra_inter_benchmark (#22051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22051
ghimport-source-id: 70710b3866b1a5e21656b77d2695ada74d00254e

Test Plan:
PARALLEL_BACKEND=NATIVE_TBB USE_OPENMP=0 USE_TBB=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ USE_CUDA=0 BLAS=MKL USE_MKLDNN=1 BUILD_BINARY=1
python setup.py develop --cmake

./build/bin/intra_inter_benchmark

Imported from OSS

Differential Revision: D15933951

Pulled By: ilia-cher

fbshipit-source-id: 88ad8f7a1634c1612ffaa68f22721ffc73d9b2ba
2019-06-21 23:06:27 -07:00
Sebastian Messmer
275087383b ListPtr->List DictPtr->Dict step 2 (#21937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21937

This changes call sites to use the new naming scheme

Reviewed By: zdevito

Differential Revision: D15892404

fbshipit-source-id: 8d32aa90a0ead1066688166478f299fde9c2c133
2019-06-19 18:02:05 -07:00
Sebastian Messmer
44128e09f0 Speed up op lookup and registration (#21806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21806

Dispatcher::findSchema(op_name) now uses a lookup table instead of iterating through the list of operators to find it.

This speeds up op lookup (as in finding the operator handle from the name, not as in finding a kernel when you already have the operator handle),
and it also speeds up op registration, since that needs to check whether an op with the same name already exists.
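
As an illustration of the idea (not the actual Dispatcher code), a toy sketch that replaces a linear scan with a name-keyed hash map, which also makes the duplicate-name check during registration O(1):

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct OperatorEntry {
  std::string name;
  // schema, kernels, ...
};

class ToyDispatcher {
 public:
  // Registration benefits too: the duplicate-name check is now O(1).
  bool registerOp(OperatorEntry op) {
    auto result = lookup_.emplace(op.name, ops_.size());
    if (!result.second) return false;  // an op with this name already exists
    ops_.push_back(std::move(op));
    return true;
  }

  // O(1) average-case lookup instead of iterating over every operator.
  std::optional<std::size_t> findSchema(const std::string& name) const {
    auto it = lookup_.find(name);
    if (it == lookup_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::vector<OperatorEntry> ops_;
  std::unordered_map<std::string, std::size_t> lookup_;
};
```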

Differential Revision: D15834256

fbshipit-source-id: c3639d7b567e4ed5e3627c3ebfd01b7d08b55ac1
2019-06-19 12:05:14 -07:00
Will Feng
04f09d4235 Move unwrap logic from c10 to caffe2 (#21620)
Summary:
After https://github.com/pytorch/pytorch/pull/17072, we are allowed to pass Variables into ATen ops, thus there is no need to unwrap input variables in the c10 call path.

Note that since Caffe2 still expects inputs to be pure Tensors, we moved the unwrapping logic to the Caffe2 wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21620

Differential Revision: D15763560

Pulled By: yf225

fbshipit-source-id: 5375f0e51eb320f380ae599ebf98e6b259f0bff8
2019-06-14 22:02:43 -07:00
Sherman Wong
adc99efb46 Add batch id to tracer event (#21446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21446

This is used to make it easier to trace the iteration id when looking at the trace diagram.

Reviewed By: ilia-cher

Differential Revision: D15628950

fbshipit-source-id: ee75b3bdb14a36abc18c7bddc49d8ec9789b724d
2019-06-13 17:13:42 -07:00
Sungmann Cho
f59581218f Fix spelling errors (#21665)
Summary:
alloctor -> allocator
excutable -> executable
excution -> execution
foward -> forward
initiaize -> initialize
paralell -> parallel
preprocesor -> preprocessor
tranpose -> transpose
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21665

Differential Revision: D15806155

Pulled By: soumith

fbshipit-source-id: d92b21ec8650a2b32f05faf9af0b7d2b073e992c
2019-06-13 15:21:55 -07:00
Karl Ostmo
49481d576d Torch rename (#20774)
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants).  Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.

The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774

Differential Revision: D15769965

Pulled By: kostmo

fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
2019-06-12 20:12:34 -07:00
Sebastian Messmer
b527e48588 Use c10::List (#21177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21177

- Integrate c10::ListPtr into IValue and the c10 dispatcher.
- Streamline conversion to/from IValue. Before, we had IValue::to<> and kernel_functor.h had its own ivalue_to_arg_type and return_type_to_ivalue. They are now unified. Also, this means that nested types like Dicts of Lists of Optionals of Dicts of ... now work as expected.

Differential Revision: D15476433

fbshipit-source-id: bde9df80df20091aa8e6ae17ba7e90abd149b954
2019-06-12 13:58:24 -07:00
Sebastian Messmer
fe5ceea580 Rename caffe2<->c10 operator wrappers (#21322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21322

Naming is everything.

- Rename c10_operator.h -> export_caffe2_op_to_c10.h
- Rename operator_c10wrapper.h -> export_c10_op_to_caffe2.h
- Rename corresponding macros

This hugely improves readability and explains what these things are doing.

Reviewed By: dzhulgakov

Differential Revision: D15616816

fbshipit-source-id: d976aefcb43a0f55d85c3424fdd9aca7e71c3603
2019-06-07 13:48:10 -07:00
Rui Zhu
2b902e9738 Fix the offset numerical bug when casting (#21484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21484

cast<int32_t*> => cast<int32_t>

Also fixed a reserve problem which could produce an invalid pointer.

Reviewed By: yinghai

Differential Revision: D15699866

fbshipit-source-id: 374418476bddd60f5c5306c8c57319ccf28b9990
2019-06-07 12:33:18 -07:00
Peng Gong
78a376592d add cancelAsyncCallback method to OperatorBase (#21492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21492

If one async operator fails, the async_scheduling net currently just marks all scheduled async operators as finished without cancelling their callbacks.

The new behavior is to cancel the callbacks first, then set the event status to finished.
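
A toy sketch of the ordering change (not Caffe2's actual classes), where pending callbacks are cancelled before the event is marked finished so that no callback can fire against work that actually failed:

```cpp
#include <functional>
#include <mutex>
#include <vector>

class ToyEvent {
 public:
  void addCallback(std::function<void()> cb) {
    std::lock_guard<std::mutex> g(mu_);
    callbacks_.push_back(std::move(cb));
  }

  void onFailure() {
    std::lock_guard<std::mutex> g(mu_);
    callbacks_.clear();  // cancel the callbacks first...
    finished_ = true;    // ...then set the event status to finished
  }

 private:
  std::mutex mu_;
  std::vector<std::function<void()>> callbacks_;
  bool finished_ = false;
};
```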

Reviewed By: ilia-cher

Differential Revision: D15702475

fbshipit-source-id: 55a1774d768b2e238bab859b83332f1877a001ca
2019-06-06 20:57:12 -07:00
Junjie Bai
4c19421f16 Register gradient op with engine (#21205)
Summary:
cc dreiss
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21205

Differential Revision: D15578948

Pulled By: bddppq

fbshipit-source-id: ef285174e8637daef624c8088ebd903a70582345
2019-05-31 18:48:47 -07:00
Sebastian Messmer
85777b92b2 Assert against using Operator methods not supported when exporting it to c10, part 2 (#17946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17946

Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.

Reviewed By: ezyang

Differential Revision: D14430749

fbshipit-source-id: 2b0037a9ed227a22aa7376a90e6d3d09d3e04707
2019-05-29 13:16:00 -07:00
Jongsoo Park
0290897bca tracing for intra_op_parallel (#20603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20603

When we use intra_op_parallel operators, Caffe2 tracing generated a trace only for the master task, giving the false impression that many threads are underutilized.
This diff also traces the child tasks.

Reviewed By: ilia-cher

Differential Revision: D14820008

fbshipit-source-id: ff4ed203804d86d9231c21c99d869f1ddf1d1ef9
2019-05-28 17:39:23 -07:00
Kimish Patel
d6d192e0af Added engine information to the profiling result. (#20493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20493

This helps distinguish if the op was a quantized op or not.

Reviewed By: salexspb

Differential Revision: D15337854

fbshipit-source-id: 43c7aef143085cfaeb4ec2102a7f36cc454e0e94
2019-05-28 16:41:12 -07:00
Kimish Patel
7afa75006e Enable operator profiling via command line (#20173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20173

Enabled op profiling even when the net type is not dag or prof_dag. Also added
engine type info to the summary.

Reviewed By: salexspb, ilia-cher

Differential Revision: D15177813

fbshipit-source-id: 5be0efeaabc9a961cf1d73b0703749c08bb1adbb
2019-05-28 16:41:08 -07:00
Sebastian Messmer
6063ffd055 Specify dispatch key with kernel (#20821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20821

Change registration API. Instead of

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>()
        .dispatchKey(CPUTensorId()));

it is now

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>(CPUTensorId()));

This binds kernel and dispatch key together, allowing them to be separate from other future configuration options like alias analysis or autograd wrappers.

The semantic problem behind this is that the dispatch key is a *kernel config parameter* and not an *operator config parameter*, while things like autograd wrappers, alias info, and the kernel itself are *operator config parameters*. While previously the different kinds of config parameters were mixed, this diff separates them.

Before this change, it wouldn't have been well defined if you specified a dispatchKey together with an autogradWrapper or aliasInfo for example.

    // what is this supposed to do?
    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .aliasInfo(DEFAULT)
        .dispatchKey(CPUTensorId()));

If we get more kernel config parameters in the future, we could introduce something like this

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>(torch::RegisterOperators::kernelOptions()
            .dispatchKey(CPUTensorId())
            .otherConfig());

but that's overkill as long as dispatch keys are the only kernel config parameter, and we can introduce that later without breaking backwards compatibility.

A nice side effect of this is that people can register multiple kernels to the same operator in the same `.op()` call:

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel1>(CPUTensorId())
        .kernel<Kernel2>(CUDATensorId()));

Reviewed By: dzhulgakov

Differential Revision: D15455790

fbshipit-source-id: 1c46bfe676dcacf74cf36bd3f5df3d2c32b8fb11
2019-05-24 14:23:35 -07:00
Sebastian Messmer
4501dc305d Assert against using Operator methods not supported when exporting it to c10 (#17818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17818

Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.

Reviewed By: ezyang

Differential Revision: D14392459

fbshipit-source-id: bf86e6cb0a7cfefd112a65dc85cc243e57a5ad52
2019-05-24 13:45:01 -07:00
Dmytro Dzhulgakov
c25e33789e Lightweight at-most-once logging for API usage (#20745)
Summary:
Resubmit #20698 which got messed up.

The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a very lightweight mechanism to do so - only the first invocation of a trigger point is logged. This is significantly cheaper than #18235, so we can afford to put logging in e.g. TensorImpl.

Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745

Differential Revision: D15429196

Pulled By: dzhulgakov

fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
2019-05-23 23:17:59 -07:00
Yinghai Lu
48bf7b9be8 Fix oscillation in coalesceInsertedDataDependencies (#20833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20833

Att. The algorithm is still "horrendously inefficient". But since we are sunsetting Nomnigraph, I just did the minimal fix here.

Reviewed By: tracelogfb

Differential Revision: D15463880

fbshipit-source-id: 413a1280a92c1923ba49031177816a2d5f888575
2019-05-23 14:04:20 -07:00
Yinghai Lu
cf7ef5e631 Add onnxifi support for Int8FCDNNLowPPackedWeightBlob (#20564)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20564

Reviewed By: bddppq

Differential Revision: D15106712

fbshipit-source-id: 428db9c23cfd36ddedc8d79121fbbb3bb484c993
2019-05-20 16:57:11 -07:00
Edward Z. Yang
9b1dbffba5 Re-sync with internal repository (#20702) 2019-05-20 09:22:57 -04:00
Dmytro Dzhulgakov
d3059b9c49 Lightweight logging for once-only API usage 2019-05-19 23:04:40 -07:00
Jongsoo Park
7b9ee598d6 separate option for FE_OVERFLOW (#20476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20476

There are overflow exceptions raised by legitimate computation. For example, for a large negative x,
sigmoid(x) = 1 / (1 + exp(-x)) = 1 / (1 + inf) = 0, where the intermediate exp overflows even though the final result is correct.
This diff separates out the option for FE_OVERFLOW to make the caffe2_operator_throw_if_fp_exceptions=1 option less noisy.
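
A small sketch (not the PR's code) of such a legitimate FE_OVERFLOW, where the intermediate exp() overflows yet the final sigmoid value is correct:

```cpp
#include <cfenv>
#include <cmath>
#include <cstdio>

int main() {
  std::feclearexcept(FE_ALL_EXCEPT);
  double x = -1000.0;                     // large negative input
  double s = 1.0 / (1.0 + std::exp(-x));  // exp(1000) overflows to +inf
  int overflowed = std::fetestexcept(FE_OVERFLOW) != 0;
  std::printf("sigmoid(%g) = %g, FE_OVERFLOW raised: %d\n", x, s, overflowed);
  // Expected output: sigmoid(-1000) = 0, FE_OVERFLOW raised: 1
  return 0;
}
```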

Reviewed By: hx89

Differential Revision: D15332947

fbshipit-source-id: 9148233f5b84551a0900f0557ba22f2b1508ae0c
2019-05-19 16:05:27 -07:00
Sebastian Messmer
cb6be42403 Options based registration API (#20514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20514

Change API from

    static auto registry = c10::RegisterOperators()
      .op("my::op",
        c10::kernel(...),
        c10::dispatchKey(...)
      );

to

    static auto registry = c10::RegisterOperators()
      .op("my::op", c10::RegisterOperators::options()
        .kernel(...)
        .dispatchKey(...)
      );

because this allows better discoverability. People looking for the available options will find them more easily, and IDE autocompletion will work better.

Reviewed By: zdevito

Differential Revision: D15346348

fbshipit-source-id: 4b74a33b75c2b9cda4a903639fb7abd2c7cff167
2019-05-17 20:54:42 -07:00
Vitaly Fedyunin
5b78a5eadb Memory format support for contiguous and is_contiguous (#20455)
Summary:
#19975 was split into 2 PRs.

This one:

Introduce a MemoryFormat argument to the `x.is_contiguous(memory_format=torch.channels_last)` and `y = x.contiguous(memory_format=torch.channels_last)` functions.

At this moment both functions just operate on strides and don't store any tensor state.

(Original RFC #19092)

-----

Expands the functionality of the two tensor functions `.is_contiguous` and `.contiguous` (both the Python and C++ APIs).

Note: We had several complaints about the `.to(memory_format)` function and decided not to support it.

1.  `.contiguous` now supports an optional keyword-only argument, `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - Using `torch.contiguous_format` preserves the existing `.contiguous()` behavior.

    - Calling `x.contiguous(memory_format=torch.channels_last)` returns a new tensor that maintains the same semantic layout (NCHW) but has a different memory allocation pattern.

        `x.contiguous(memory_format=torch.channels_last)` expects the input tensor to be 3d, 4d, or 5d, and fails otherwise.

2. `.is_contiguous` now supports an optional keyword-only argument, `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - `x.is_contiguous(memory_format=torch.contiguous_format)` preserves the same functionality as `x.is_contiguous()` and remains unchanged.

    - `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) the input tensor is contiguous in memory AND B) it is allocated in NHWC (or the analogous 3d/5d) format.

Note: For phase one, `x.is_contiguous(memory_format=torch.channels_last)` recalculates the tensor's state on every call. This will be optimized later; a hedged C++ sketch of the API follows below.
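
As a hedged illustration of the C++ side (enum and member names assumed from the ATen headers of this era; verify against your checkout):

```cpp
#include <ATen/ATen.h>
#include <iostream>

int main() {
  at::Tensor nchw = at::randn({2, 3, 4, 5});  // must be 3d, 4d, or 5d
  at::Tensor nhwc = nchw.contiguous(at::MemoryFormat::ChannelsLast);
  // Same logical NCHW semantics, different memory allocation pattern:
  std::cout << nhwc.is_contiguous(at::MemoryFormat::ChannelsLast)  // 1 (true)
            << " " << nhwc.is_contiguous() << "\n";                // 0 (false)
  return 0;
}
```
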
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455

Differential Revision: D15341577

Pulled By: VitalyFedyunin

fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
2019-05-16 07:18:24 -07:00
Rui Zhu
c129ab06e9 Change onnxifi workflow to support multi-group quantized & Add multi quantization info to caffe2.proto (#20439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20439

This is the QTensorProto workflow for multi-group quantization on the C2 side.
Nothing DNNLOWP-tensor-related is included in this PR, so once the Glow side is finished, we should be able to test this PR using resnet50.

Reviewed By: yinghai

Differential Revision: D15096919

fbshipit-source-id: 741eecd59eb79d24d9fe2b035f6246d42422d25c
2019-05-15 19:24:08 -07:00
Edward Yang
73a97387c1 Replace AT_CHECK with TORCH_CHECK [shard 9/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20435

Reviewed By: jerryzh168

Differential Revision: D15318877

fbshipit-source-id: 4d83571187ea14a604fef83ac355d328b46d93e1
2019-05-15 08:05:59 -07:00
Kedar Pujara
254de9e8ec Removing cyclic dependency (#20511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20511

Removed the cyclic dependency between caffe2/core/net.h and workspace.h.

Differential Revision: D15303412

fbshipit-source-id: 6e772e372cd0cf2af05d7815f1df8ae20bc2a65e
2019-05-14 18:55:19 -07:00
Sebastian Messmer
9e7f22b223 Remove dependencies from Caffe2Go on PyTorch JIT (#20463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20463

Source file changes mostly involve ifdef'ing out references to JIT code
from files that are part of Caffe2Go. Update internal build scripts to
remove those files from our globs.

After this, changes to most of the JIT files should not trigger mobile CI.

Reviewed By: dzhulgakov

Differential Revision: D15329407

fbshipit-source-id: 48f614c6b028eef0a03ce5161d083a3e078b0412
2019-05-14 14:36:08 -07:00
Ansha Yu
a9aaf698a4 add c2 benchmark runs in cpp (#20108)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20108

Add cpp runs for c2, hooked up via pybind. Output is printed to the terminal. This is not hooked up to the pep output yet because I'd like to verify the numbers first.

Note that this isn't quite the same mechanism as the pytorch cpp hookup, which uses cpp_python_extensions. If I can use the same mechanism to pull all the inputs for c2 through cpp and do FeedBlobs in cpp, then I'll switch to that.

Reviewed By: zheng-xq

Differential Revision: D15155976

fbshipit-source-id: 708079dacd3e19aacfe43d70c5e5bc54da2cf9e3
2019-05-13 17:01:08 -07:00
Richard Zou
e01a5bf28b Add USE_NAMEDTENSOR compilation flag. (#20162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20162
ghimport-source-id: 0efcd67f04aa087e1dd5faeee550daa2f13ef1a5

Reviewed By: gchanan

Differential Revision: D15278211

Pulled By: zou3519

fbshipit-source-id: 6fee981915d83e820fe8b50a8f59da22a428a9bf
2019-05-09 09:09:16 -07:00
Yanbo Liang
a8387b7779 Delete TensorImpl::GetDevice() (#20025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20025

Delete TensorImpl::GetDevice() and clean all its call sites.

Reviewed By: ezyang

Differential Revision: D15170917

fbshipit-source-id: b6862b74aa036198544f79d18a8c0f995cb0ca7b
2019-05-06 12:44:23 -07:00
Tongliang Liao
1dfeffbff5 Expose test utils (#20114)
Summary:
Some functions were not decorated with `CAFFE2_API`, which makes them unusable when creating unit tests for custom ops outside the Caffe2 repo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20114

Differential Revision: D15217490

Pulled By: ezyang

fbshipit-source-id: dda3910ad24e566567607deaac705a34ec8e7b8d
2019-05-06 07:06:04 -07:00
Tongliang Liao
f2c715cbe1 Fix the spelling of "context"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20055

Differential Revision: D15217488

Pulled By: ezyang

fbshipit-source-id: bb2b57b5e749357b47a01c6c3e73addf3c5418c7
2019-05-06 06:54:30 -07:00
Sebastian Messmer
fb8792e2b6 Remove torch/jit from xplat build (#19967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19967

-

Reviewed By: dreiss, dzhulgakov

Differential Revision: D15150843

fbshipit-source-id: af7d6902934883be9d8021b3601de2fe1f3bf806
2019-05-02 15:31:06 -07:00
Zachary DeVito
55c719b161 Remove operator.h's dependency on function_schema.h (#19817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19817

A lot of files were depending on the JIT's typesystem
because operator.h depends on function_schema.h. However,
this isn't fundamental to the design. This diff tries to
remove the direct dependency and include the c10
wrapper helpers only in the files where they are required.

Reviewed By: smessmer

Differential Revision: D15112247

fbshipit-source-id: 2c53d83e542c32d9a398c8b60dbf40ab7a1cb0f6
2019-04-29 19:50:43 -07:00
Xiaomeng Yang
2ce39de3fc Add elementwise_affine for layer_norm_op (#19713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19713

Add elementwise_affine for layer_norm_op

Reviewed By: houseroad

Differential Revision: D15075454

fbshipit-source-id: e8a7d3da1c81e49fa55323f5e74a68bc4ef8d83f
2019-04-26 17:20:01 -07:00
David Goodwin
c855e04d5f Caffe2 shouldn't fail if CUDA peer access is already enabled
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19586

Differential Revision: D15061544

Pulled By: dzhulgakov

fbshipit-source-id: 6a5f9f4fe45259d689671f58ad5206cdaf15c5bd
2019-04-24 13:22:27 -07:00
Yinghai Lu
b85edac16f Fix out-of-topological-order issue in Nomnigraph (#19458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19458

The algorithm in https://fburl.com/ggh9iyvc fails to actually ensure a topological ordering of the nodes. The fix is ugly but effective. I think we need a real topological sort to fix this issue more nicely. Mikhail Zolotukhin, Bram Wasti.

Differential Revision: D15011893

fbshipit-source-id: 130c3aa442f5d578adfb14fbe5f16aa722434942
2019-04-19 12:19:39 -07:00
Sebastian Messmer
17f05ad5e5 Moving at::Tensor into caffe2::Tensor without bumping refcount (#19388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19388

The old implementation forced a refcount bump when converting at::Tensor to caffe2::Tensor.
Now, it is possible to move it without a refcount bump.
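
A hedged sketch of the conversion (constructor signatures assumed; the point is that std::move avoids the atomic refcount bump):

```cpp
#include <ATen/ATen.h>
#include <caffe2/core/tensor.h>
#include <utility>

void Convert() {
  at::Tensor src = at::ones({2, 3});
  // Copy-converting bumps the refcount on the shared TensorImpl...
  caffe2::Tensor copied(src);
  // ...while move-converting transfers ownership without touching it.
  caffe2::Tensor moved(std::move(src));
}
```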

Reviewed By: dzhulgakov

Differential Revision: D14986815

fbshipit-source-id: 92b4b0a6f323ed38376ffad75f960cad250ecd9b
2019-04-18 14:13:26 -07:00
Sebastian Messmer
601f36bacc Use string based schema for exposing caffe2 ops (#19287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19287

Since we now have a string-schema-based op registration API, we can also use it when exposing caffe2 operators.
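
A hedged sketch of what string-schema registration looks like in the c10 API of this era (the operator name `_caffe2::MyCopy` and its kernel are hypothetical):

```cpp
#include <ATen/ATen.h>
#include <ATen/core/op_registration/op_registration.h>

// A functor kernel; the full schema is given as a plain string instead of
// building a FunctionSchema object by hand.
struct MyCopyKernel final : c10::OperatorKernel {
  at::Tensor operator()(const at::Tensor& input) {
    return input.clone();
  }
};

static auto registry = c10::RegisterOperators().op(
    "_caffe2::MyCopy(Tensor input) -> Tensor",
    c10::RegisterOperators::options().kernel<MyCopyKernel>(c10::CPUTensorId()));
```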

Reviewed By: dzhulgakov

Differential Revision: D14931925

fbshipit-source-id: ec162469d2d94965e8c99d431c801ae7c43849c8
2019-04-18 02:04:50 -07:00
Sebastian Messmer
db611b7caf Delete C10Tensor (#19328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19328

Plans changed and we don't want this class anymore.

Reviewed By: dzhulgakov

Differential Revision: D14966746

fbshipit-source-id: 09ea4c95b352bc1a250834d32f35a94e401f2347
2019-04-17 00:02:27 -07:00
Will Feng
c7b5a8a876 Change is_variable() to check existence of AutogradMeta, and remove is_variable_ (#19139)
Summary:
Currently, a TensorImpl's `is_variable_` is true if and only if the TensorImpl has AutogradMeta. This PR unifies these two concepts by removing `is_variable_` and changing `is_variable()` to check for the existence of AutogradMeta instead.

Removing `is_variable_` is part of the work in Variable/Tensor merge.
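
The gist of the unification, as a hedged sketch (simplified; member names assumed):

```cpp
#include <memory>

struct AutogradMeta {};

struct TensorImplSketch {
  // No separate is_variable_ flag anymore: a tensor "is a variable"
  // exactly when it carries autograd metadata.
  bool is_variable() const { return autograd_meta_ != nullptr; }

  std::unique_ptr<AutogradMeta> autograd_meta_;
};
```
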
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19139

Differential Revision: D14893339

Pulled By: yf225

fbshipit-source-id: ceb5e22c3c01f79b5d21d5bdbf4a7d1bc397796a
2019-04-11 14:03:33 -07:00
Gregory Chanan
b6ee83a5b4 Materialize a non-default device for C2 legacy storage. (#18605)
Summary:
It's not intended that Storages have 'default' CUDA devices, but this is allowable via the Storage::create_legacy codepath.

This also messes with device caching, because the initial cache is obtained from the Storage, which may have a 'default' device.

Instead, we materialize a device by allocating 0 bytes via the allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18605

Differential Revision: D14680620

Pulled By: gchanan

fbshipit-source-id: 6d43383d836e90beaf12bfe37c3f0506843f5432
2019-04-11 13:50:41 -07:00
Yinghai Lu
bbe648dffb Allow empty net type (#19154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19154

I recently saw a weird workflow error due to an empty but set net_type. Maybe we should just fall back to the simple net in this case.

Reviewed By: dzhulgakov

Differential Revision: D14890072

fbshipit-source-id: 4e9edf8232298000713bebb0bfdec61e9c5df17d
2019-04-11 12:43:07 -07:00
Alexander Sidorov
0ca8f7a15f Make BlackBoxPredictor handle networks throwing exceptions (#19080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19080

OSS: add a tiny unit test utility function to create tensors given shape and data, outside of any workspace. I use it in an internal test.

Reviewed By: dzhulgakov

Differential Revision: D14814194

fbshipit-source-id: 6d53b235d99a97da812215f5c7f11fecad363c8c
2019-04-09 16:42:12 -07:00
Lu Fang
75d6d8833d remove interned_string.h dep (#19061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19061

remove the deps on interned_string.h

Reviewed By: BIT-silence

Differential Revision: D14850078

fbshipit-source-id: 07e6ad72a7de369049ea56f32b72276fb4c59b32
2019-04-09 09:59:15 -07:00
Lu Fang
443a58e03d Export C10 operator in PyTorch Model (#18210)
Summary:
Almost there, feel free to review.

These c10 operators are exported to the _caffe2 domain.

TODO:

- [x] let the onnx checker pass
- [x] test tensor list as argument
- [x] test caffe2 backend and converter
- [x] check the c10 schema can be exported to onnx
- [x] refactor the test case to share some code
- [x] fix the problem in ONNX_ATEN_FALLBACK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18210

Reviewed By: zrphercule

Differential Revision: D14600916

Pulled By: houseroad

fbshipit-source-id: 2592a75f21098fb6ceb38c5d00ee40e9e01cd144
2019-04-08 16:06:00 -07:00
Duc Ngo
e7b2669151 caffe2 - Expose tensor filler util to Python (#18886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18886

Expose tensor filler util to Python and add a unit test (both C++/Python)

Reviewed By: salexspb

Differential Revision: D14784470

fbshipit-source-id: bb8e013d1755c27c166e87d5a8491a97c65d3d8d
2019-04-08 11:54:10 -07:00
Jerry Zhang
40a54bf2f1 Change ReinitializeTensor to use C10_LOG_FIRST_N (#18531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18531

Currently we use C10_LOG_EVERY_MS to log the data type change, but it pollutes the logs of some services;
we would like to change it to C10_LOG_FIRST_N to prevent that.
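
A hedged sketch of the logging change (macros per c10/util/Logging.h; exact arguments assumed):

```cpp
#include <c10/util/Logging.h>

void OnDataTypeChange() {
  // Before: rate-limited by wall clock, so a long-running service still
  // emits the message over and over.
  // C10_LOG_EVERY_MS(WARNING, 1000) << "Changing the data type of Tensor";

  // After: emitted at most once per process, then silenced.
  C10_LOG_FIRST_N(WARNING, 1) << "Changing the data type of Tensor";
}
```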

Reviewed By: dzhulgakov

Differential Revision: D14647704

fbshipit-source-id: b84e4002bd4aa94d616133cd1049c3d4ab05386e
2019-04-02 21:03:37 -07:00
Rui Zhu
19fe2b9db4 Adding quantized tensor shape/type info support for caffe2=>glow in caffe2 side (#18621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18621

This diff added caffe2 support for onnxifi quantization.

Reviewed By: yinghai

Differential Revision: D14648767

fbshipit-source-id: 4ddb492cacbba6142305866e6dbb875880acaea3
2019-03-31 17:42:27 -07:00