Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43987
This replaces the caffe2 CPU random number generator (std::mt19937) with at::mt19937, the one currently used in PyTorch. The ATen RNG is 10x faster than the std one and appears to be more robust, given known bugs in the std implementation (https://fburl.com/diffusion/uhro7lqb).
For large embedding tables (10GB+) we see UniformFillOp taking upwards of 10 minutes, as we're bottlenecked on the single-threaded RNG. Swapping to at::mt19937 cuts that time to 10% of its current value.
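For reference, a minimal sketch of the engine swap at the call-site level (assuming ATen's `at::mt19937` from `ATen/core/MT19937RNGEngine.h`; `fill_uniform` is an illustrative helper, not the actual UniformFillOp code):
```
#include <ATen/core/MT19937RNGEngine.h>
#include <cstdint>
#include <vector>

// Illustrative helper: fill a buffer with uniform floats in [0, 1) using the
// ATen Mersenne Twister instead of std::mt19937.
void fill_uniform(std::vector<float>& out, uint64_t seed) {
  at::mt19937 gen(seed);  // same seeding/call shape as std::mt19937
  for (auto& v : out) {
    v = static_cast<float>(gen()) / 4294967296.0f;  // gen() yields a uint32_t
  }
}
```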
Test Plan: Ran all relevant tests + CI. This doesn't introduce new features (+ is a core change) so existing tests+CI should be sufficient to catch regressions.
Reviewed By: dzhulgakov
Differential Revision: D23219710
fbshipit-source-id: bd16ed6415b2933e047bcb283a013d47fb395814
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46424
Currently, if an exception occurs in a reporter thread, the process is killed via std::terminate. This adds support for handling the reporter exception when FLAGS_caffe2_handle_executor_threads_exceptions is set to true.
Test Plan: buck test mode/opt -c python.package_style=inplace //caffe2/caffe2/python:hypothesis_test //caffe2/caffe2:caffe2_test_cpu -- --stress-runs 100
Reviewed By: dahsh
Differential Revision: D24345027
fbshipit-source-id: 0659495c9e27680ebae41fe5a3cf26ce2f455cb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46110
## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145).
* We need a test to cover and exhibit that we can cancel a stuck net and propagate the error with the plan executor.
## Summary
* Added PlanExecutorTest `ErrorPlanWithCancellableStuckNet` for plan executor.
* Set cancelCount to zero at the beginning of tests to avoid global state being carried over in some test environments.
Test Plan:
## Unit Test Added
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 1000
```
Reviewed By: d4l3k
Differential Revision: D24226577
fbshipit-source-id: c834383bfe6ab50747975c229eb42a363eed3458
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46080
Temporary removal of ErrorPlanWithCancellableStuckNet; will fill it out more later.
Test Plan:
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
```
remove a test
Reviewed By: fegin
Differential Revision: D24213971
fbshipit-source-id: e6e600bad00b45c726311193b4b3238f1700526e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45319
## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145)
* We need a test to cover and exhibit that we can cancel a stuck net and propagate the error with the plan executor.
## Summary
* Added `ErrorPlanWithCancellableStuckNet` for plan executor.
* We set up a plan with two nets: one stuck net with a blocking operator that never returns, and one error net with an error op that throws, and tested that it throws and cancels.
Test Plan:
## Unit Test added
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
```
```
Summary
Pass: 400
ListingSuccess: 2
```
Reviewed By: d4l3k
Differential Revision: D23920548
fbshipit-source-id: feff41f73698bd6ea9b744f920e0fece4ee44438
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45981
This is a recommit of the previously reverted D20850851 (3fbddb92b1).
TL;DR: combining condition_variables and atomics is a bad idea:
https://stackoverflow.com/questions/49622713/c17-atomics-and-condition-variable-deadlock
This also adds some ifdefs to disable the death test for mobile, xplat, and TSAN builds, since forking doesn't play nicely with them.
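For context, a minimal sketch of the lost-wakeup hazard behind that link (illustrative names only, not the reverted caffe2 code):
```
#include <atomic>
#include <condition_variable>
#include <mutex>

std::atomic<bool> done{false};
std::mutex mu;
std::condition_variable cv;

void waiter() {
  std::unique_lock<std::mutex> lock(mu);
  while (!done.load()) {  // the racy setter can run entirely between this check...
    cv.wait(lock);        // ...and this wait, so the notification is lost forever
  }
}

void setter_racy() {
  done.store(true);  // not published under `mu`
  cv.notify_all();
}

void setter_fixed() {
  {
    std::lock_guard<std::mutex> lock(mu);  // publish under the same mutex the waiter holds
    done.store(true);
  }
  cv.notify_all();
}
```
Because the fixed setter flips the flag while holding the waiter's mutex, the notify can no longer slip in between the predicate check and the wait.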
Test Plan:
buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 1000 test_atomic_iter_with_concurrent_steps --timeout 120
buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 100
buck test mode/opt caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
no timeouts https://www.internalfb.com/intern/testinfra/testconsole/testrun/7036874440059883/
will ensure no timeouts in OSS
Reviewed By: walterddr, dahsh
Differential Revision: D24165505
fbshipit-source-id: 17cd23bfbcd9c2826a4067a387023d5186353196
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45297
If we have two concurrent substeps and one of them throws an exception while the other is blocking, we currently hang. This change waits up to one minute for the blocking substep to complete before terminating the process.
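A rough sketch of the behavior described above (`run_substeps` is an illustrative stand-in, not the actual plan executor code):
```
#include <chrono>
#include <exception>
#include <functional>
#include <future>

void run_substeps(std::function<void()> blocking, std::function<void()> failing) {
  auto pending = std::async(std::launch::async, blocking);
  try {
    failing();  // throws
  } catch (...) {
    if (pending.wait_for(std::chrono::minutes(1)) != std::future_status::ready) {
      std::terminate();  // stuck substep never finished; don't hang the process forever
    }
    throw;  // propagate the original error once the sibling is done
  }
  pending.get();
}
```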
Test Plan: buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
Reviewed By: dahsh
Differential Revision: D20850851
fbshipit-source-id: 330503775d8062a34645ba55fe38e6770de5e3c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44062
Previously, BackendSelect kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, pin_memory, and they used hacky_wrapper to be callable. This caused a re-wrapping step: calling into a BackendSelect kernel required taking the individual scattered arguments, packing them into a TensorOptions, and then unpacking them again inside the kernel for redispatch.
Now with this PR, BackendSelect kernels are written in the new way and no hacky_wrapper or re-wrapping is needed for them.
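A toy sketch of the re-wrapping overhead being removed (the names are illustrative stand-ins, not ATen types):
```
#include <optional>

// Toy stand-ins: `Options` plays the role of TensorOptions, ints play dtype/device.
struct Options {
  std::optional<int> dtype;
  std::optional<int> device;
};

// Legacy shape: the wrapper packed the scattered arguments into Options, and the
// kernel consumed the packed struct (an extra pack/unpack per call).
int legacy_kernel(const Options& opts) {
  return opts.dtype.value_or(0) + opts.device.value_or(0);
}
int legacy_entry(std::optional<int> dtype, std::optional<int> device) {
  return legacy_kernel(Options{dtype, device});
}

// New shape: the kernel takes the scattered arguments directly, no re-wrapping.
int new_kernel(std::optional<int> dtype, std::optional<int> device) {
  return dtype.value_or(0) + device.value_or(0);
}
```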
ghstack-source-id: 112825789
Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216117032/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170194/
Reviewed By: ezyang
Differential Revision: D23484192
fbshipit-source-id: e8fb49c4692404b6b775d18548b990c4cdddbada
Summary:
The `2to3` tool has a `future` fixer that can be targeted specifically to remove these; the `caffe2` directory has the most redundant imports:
```2to3 -f future -w caffe2```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033
Reviewed By: seemethere
Differential Revision: D23808648
Pulled By: bugra
fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44145
## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are currently blocking and thus non-cancellable. If an error occurs, we need to be able to safely stop all net execution so we can throw the exception to the caller.
## Summary
* Adds `NetBase::Cancel()`, which iterates over the entire list of operators and calls Cancel on each (see the sketch after this list).
* Cancelling all ops was added to Net since there's nothing async-specific about it.
* `AsyncSchedulingNet` calls parent Cancel.
* To preserve backwards compatibility, `AsyncSchedulingNet`'s Cancel still calls `CancelAndFinishAsyncTasks`.
* Adds `Cancel()` to `OperatorBase`.
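A simplified sketch of the resulting cancellation flow (illustrative declarations, not the actual caffe2 signatures):
```
#include <vector>

class OperatorBase {
 public:
  virtual ~OperatorBase() = default;
  virtual void Cancel() {}  // default no-op; blocking ops override this to unblock themselves
};

class NetBase {
 public:
  virtual ~NetBase() = default;
  virtual void Cancel() {
    for (OperatorBase* op : operators_) {
      op->Cancel();  // nothing async-specific here, so it lives on NetBase
    }
  }
 protected:
  std::vector<OperatorBase*> operators_;
};

class AsyncSchedulingNet : public NetBase {
 public:
  void Cancel() override {
    NetBase::Cancel();            // cancel every op via the parent implementation
    CancelAndFinishAsyncTasks();  // kept for backwards compatibility
  }
 private:
  void CancelAndFinishAsyncTasks() { /* drain in-flight async tasks */ }
};
```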
Reviewed By: dzhulgakov
Differential Revision: D23279202
fbshipit-source-id: e1bb0ff04a4e1393f935dbcac7c78c0baf728550
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564
Static dispatch was originally introduced for mobile selective build.
Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23324452
Pulled By: ljk53
fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
Summary: Per the title, this makes the C2 wrappers safer, as contiguity of torch inputs is not guaranteed.
Test Plan: covered by existing tests
Reviewed By: dzhulgakov
Differential Revision: D23310137
fbshipit-source-id: 3fe12abc7e394b8762098d032200778018e5b591
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43027
Format db.h and db.cc using the default formatter.
This change was split off of D22705434.
Test Plan: Wait for sandcastle.
Reviewed By: rohithmenon, marksantaniello
Differential Revision: D23113765
fbshipit-source-id: 3f02d55bfb055bda0fcba5122336fa001562d42e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43239
This is an incremental step in the process of migrating the caffe2 random number generator off of std::mt19937 and onto at::mt19937 + at::CPUGeneratorImpl. The ATen variants are much more performant (10x faster).
This adds a way to get the CPUContext RandSeed for tail use cases that require a std::mt19937 and borrow the CPUContext one.
Test Plan: This isn't used anywhere within the caffe2 codebase. Compile should be sufficient.
Reviewed By: dzhulgakov
Differential Revision: D23203280
fbshipit-source-id: 595c1cb447290604ee3ef61d5b5fc079b61a4e14
Summary:
This diff NVMifies the NE Eval Flow.
- It defines a `LoadNVM` operator which:
  - either receives a list of NVM blobs, or extracts the blobs that could be NVMified from the model,
  - dumps the NVMified blobs into NVM,
  - and deallocates them from DRAM.
- NVMifies the Eval net on the dper and C2 backends.
The specific NVMOp for SLS is pushed in separate diffs.
Test Plan: flow-cli test-locally dper.workflows.evaluation.eval_workflow --parameters-file=/mnt/public/ehsaardestani/temp/small_model.json 2>&1 | tee log
Reviewed By: yinghai, amylittleyang
Differential Revision: D22469973
fbshipit-source-id: ed8379ad404e96d04ac05e580176d3aca984575b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42249
The main change is to bring Caffe2's superior error messages for CUDA initialization into c10 and use them in all code paths.
Basic logic:
| Case | Call to device_count() | init_cuda, e.g. allocating tensor |
| -- | -- | -- |
| all good | non-zero | just works |
| no gpus | 0, no warning | throw exception with good message |
| driver issues | 0, produce warning | throw exception with good message |
| out of memory with ASAN | 0, produce warning| throw exception with ASAN message |
Previously, the error thrown from init_cuda was very generic and the ASAN warning (if any) was buried in the logs.
Other clean up changes:
* always cache device_count() in a static variable (see the sketch after this list)
* move all ASAN macros into c10
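A minimal sketch of the device_count() caching cleanup (`query_driver_device_count` is a hypothetical stand-in for the actual CUDA query plus the warning logic from the table above):
```
// Stand-in for the real cudaGetDeviceCount call and its error handling.
static int query_driver_device_count() {
  return 0;
}

int device_count() {
  // Function-local static: the driver is queried exactly once per process,
  // and later calls (including after a failed init) reuse the cached value.
  static int count = query_driver_device_count();
  return count;
}
```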
Test Plan:
Hard to unittest because of build modes. Verified manually that the behavior from the table above holds by running the following script in different modes (ASAN/no-ASAN, CUDA_VISIBLE_DEVICES=):
```
print('before import')
import torch
print('after import')
print('devices: ', torch.cuda.device_count())
x = torch.tensor([1,2,3])
print('tensor creation')
x = x.cuda()
print('moved to cuda')
```
Reviewed By: ngimel
Differential Revision: D22824329
fbshipit-source-id: 5314007313a3897fc955b02f8b21b661ae35fdf5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41461
`capacity` is misleading, and we have many incorrect uses internally. Let's rename it to `nbytes` to avoid the confusion in the future. Ultimately, we could remove this parameter if possible.
So far I haven't seen any case where this capacity is necessary.
Test Plan: oss ci
Differential Revision: D22544189
fbshipit-source-id: f310627f2ab8f4ebb294e0dd5eabc380926991eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41096
The spark spot model had some issues in tensor conversion, see P134598596. They happen when we convert an undefined c10 tensor to a caffe2 tensor.
This diff adds a null check.
Test Plan: spark spot model runs without problem
Reviewed By: smessmer
Differential Revision: D22330705
fbshipit-source-id: dfe0f29a48019b6611cad3fd8f2ae49e8db5427e
Summary:
If a virtual function is implemented in a header file, its implementation will be included as a weak symbol in every shared library that includes the header, along with all of its dependencies.
This was one of the reasons the size of libcaffe2_module_test_dynamic.so was 500Kb (the AddRelatedBlobInfo implementation pulled a quarter of libprotobuf.a with it).
The combination of this and https://github.com/pytorch/pytorch/issues/40845 reduces the size of `libcaffe2_module_test_dynamic.so` from 500Kb to 50Kb.
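A small sketch of the fix pattern (`Widget` is a hypothetical class used only to illustrate the weak-symbol issue, not the actual AddRelatedBlobInfo code):
```
// widget.h -- before: the virtual member is defined in the header, so its body is
// emitted as a weak symbol into every translation unit (and shared library) that
// includes this header, dragging its dependencies along with it.
struct Widget {
  virtual ~Widget() = default;
  virtual int Describe() const { return 42; }
};

// widget.h -- after: only declare the virtual member in the header...
struct WidgetFixed {
  virtual ~WidgetFixed();
  virtual int Describe() const;
};

// widget.cc -- ...and define it once, in a single translation unit:
//   WidgetFixed::~WidgetFixed() = default;
//   int WidgetFixed::Describe() const { return 42; }
```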
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40844
Differential Revision: D22334725
Pulled By: malfet
fbshipit-source-id: 836a4cbb9f344355ddd2512667e77472546616c0
Summary:
… file
This prevents implementation of those functions(as lambdas) to be embedded as weak symbol into every shared library that includes this header.
Combination of this and https://github.com/pytorch/pytorch/pull/40844 reduces size of `libcaffe2_module_test_dynamic.so` from 500kb to 50Kb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40845
Differential Revision: D22334779
Pulled By: malfet
fbshipit-source-id: 64706918fc2947350a58c0877f294b1b8b085455
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40096
Declaring `tensor_proto` to be of type `auto` means that it will copy the entire `TensorProto` instead of just keeping a reference. This changes it to use a const reference instead.
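A toy illustration of the pitfall (a plain struct stands in for the real TensorProto messages):
```
#include <cstddef>
#include <string>
#include <vector>

struct Proto {
  std::string big_payload;
};

std::size_t total_payload_bytes(const std::vector<Proto>& protos) {
  std::size_t total = 0;
  // Writing `for (auto proto : protos)` would copy every message on each iteration;
  // binding a const reference keeps the copy-free access the fix restores.
  for (const auto& proto : protos) {
    total += proto.big_payload.size();
  }
  return total;
}
```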
Test Plan:
Using the model loader benchmark to measure model loading performance:
### `tensor_proto` is of type `const auto&`
```
============================================================================
caffe2/caffe2/fb/predictor/ModelLoaderBenchmark.cpprelative time/iter iters/s
============================================================================
BlobProtoInt32DeserializationFloat16 11.08ms 90.27
BlobProtoByteDeserializationFloat16 1509.73% 733.73us 1.36K
----------------------------------------------------------------------------
BlobProtoInt32DeserializationUInt8 10.48ms 95.45
BlobProtoByteDeserializationUInt8 2974.57% 352.22us 2.84K
============================================================================
```
### `tensor_proto` is of type `auto`
```
============================================================================
caffe2/caffe2/fb/predictor/ModelLoaderBenchmark.cpprelative time/iter iters/s
============================================================================
BlobProtoInt32DeserializationFloat16 13.84ms 72.26
BlobProtoByteDeserializationFloat16 658.85% 2.10ms 476.08
----------------------------------------------------------------------------
BlobProtoInt32DeserializationUInt8 17.09ms 58.51
BlobProtoByteDeserializationUInt8 3365.98% 507.80us 1.97K
============================================================================
```
Reviewed By: marksantaniello
Differential Revision: D21959644
fbshipit-source-id: 6bc2dfbde306f88bf7cd4f9b14b95ac69c2e1b4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39493
Make sure we wait for all types, incl. async cpu ops
Test Plan: CI
Reviewed By: kennyhorror
Differential Revision: D21873540
fbshipit-source-id: 37875cade68e1b3323086833f8d4db79362a68e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39759
Caffe2 has a mode where it uses PT's caching allocator. Somehow we were not calling the initialization explicitly.
Now, I have no idea why it worked before. Probably worth running a bisect separately.
Reviewed By: houseroad
Differential Revision: D21962331
fbshipit-source-id: f16ad6b27a67dbe0bda93939cca8c94620d22a09
Summary:
Gets rid of some in-kernel asserts where they can be replaced with static_asserts.
Replaces a bare in-kernel `assert` in one case with `CUDA_KERNEL_ASSERT` where necessary.
Replaces host-code `assert`s with `TORCH_INTERNAL_ASSERT`.
Another group of asserts is in the fractional max pooling kernels, which should be fixed regardless (https://github.com/pytorch/pytorch/issues/39044); the problems there are not just asserts.
I've audited the remaining cases of in-kernel asserts; they behave more like `TORCH_INTERNAL_ASSERT`, so they should not fire on invalid user data. I think it's ok to leave them as is.
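A hedged CUDA-C++ sketch of where each kind of assert belongs (`gather_kernel` is a made-up example; the macros are the real c10 ones):
```
#include <c10/macros/Macros.h>
#include <c10/util/Exception.h>
#include <cstdint>

// Hypothetical kernel: the bounds check depends on user-provided indices, so it must
// stay an always-on device-side CUDA_KERNEL_ASSERT rather than a bare assert().
__global__ void gather_kernel(const int64_t* indices, int64_t num_indices, int64_t bound) {
  int64_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < num_indices) {
    CUDA_KERNEL_ASSERT(indices[i] >= 0 && indices[i] < bound);
  }
}

void launch_gather(const int64_t* indices, int64_t num_indices, int64_t bound) {
  // Host-side invariant: use TORCH_INTERNAL_ASSERT instead of a bare assert().
  TORCH_INTERNAL_ASSERT(num_indices >= 0);
  // Compile-time invariant: static_assert costs nothing at runtime.
  static_assert(sizeof(int64_t) == 8, "unexpected index width");
  if (num_indices == 0) {
    return;
  }
  unsigned int blocks = static_cast<unsigned int>((num_indices + 255) / 256);
  gather_kernel<<<blocks, 256>>>(indices, num_indices, bound);
}
```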
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39047
Differential Revision: D21750392
Pulled By: ngimel
fbshipit-source-id: e9417523a2c672284de3515933cb7ed166e56719
Summary:
Per title. https://github.com/pytorch/pytorch/issues/32719 essentially disabled asserts in cuda kernels in release build. Asserts in cuda kernels are typically used to prevent invalid reads/writes, so without asserts invalid read/writes are silent errors in most cases (sometimes they would still cause "illegal memory access" errors, but because of caching allocator this usually won't happen).
We don't need two macros (CUDA_ALWAYS_ASSERT and CUDA_KERNEL_ASSERT), because all current asserts in cuda kernels are important to prevent illegal memory accesses and should never be disabled.
This PR removes the CUDA_ALWAYS_ASSERT macro and instead makes CUDA_KERNEL_ASSERT (the one commonly used in the kernels) an assertion in both release and debug builds.
Fixes https://github.com/pytorch/pytorch/issues/38771
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38943
Differential Revision: D21723767
Pulled By: ngimel
fbshipit-source-id: d88d8aa1b047b476d5340e69311e65aff4da5074
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38066
Increasing priority for PinnedCPUAllocator to make sure it is set when CUDA is enabled.
Test Plan: buck test mode/dev-nosan //vision/fair/detectron2/tests:test_export_caffe2 -- 'testMaskRCNNGPU \(test_export_caffe2\.TestCaffe2Export\)'
Reviewed By: ppwwyyxx
Differential Revision: D21465835
fbshipit-source-id: 643cff30d35c174085e5fde5197ddb05885b2e99
Summary:
Helps prevent the following accidental failures:
```
..\caffe2\core\parallel_net_test.cc:303
The difference between ms and 350 is 41, which exceeds kTimeThreshold, where
ms evaluates to 391,
350 evaluates to 350, and
kTimeThreshold evaluates to 40.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37892
Differential Revision: D21417251
Pulled By: malfet
fbshipit-source-id: 300cff7042e466f014850cc7cc406c725d5d0c04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37776
* Remove type-specific size tracking in favor of byte size tracking in Storage and StorageImpl
* Changed numel() and set_numel() to nbytes() and set_nbytes()
* Added enum argument to Storage/StorageImpl constructor to indicate new meaning of the size parameter
* Update all callers of the changed API
Part of issue https://github.com/pytorch/pytorch/issues/33950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37028
Differential Revision: D21171334
Pulled By: ezyang
fbshipit-source-id: 37329a379de9a3a83cc5e9007e455a3e1c2d10b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37101
Fixes #36954.
The basic concept is to streamline the process of rethrowing
c10::Error with extra error information. This is in a few
steps:
- I completely remodeled the Error data type and the internal
invariants. Instead of manually adding in newlines, the
message stack formatting process is responsible for inserting
newlines and spacing as necessary. Call sites are then
modified to respect the new API model.
- TORCH_RETHROW macro is added, which adds context to an error
message and then rethrows it.
New internal assert failure looks like:
```
0 INTERNAL ASSERT FAILED at ../c10/test/util/exception_test.cpp:64, please report a bug to PyTorch.
Exception raised from TestBody at ../c10/test/util/exception_test.cpp:64 (most recent call first):
frame #0: <unknown function> + 0x6aab9 (0x7ff611d3aab9 in /data/users/ezyang/pytorch-tmp/build/lib/libc10.so)
frame #1: ...
```
Error message with context looks like:
```
This is an error
This is context 1
This is context 2
```
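A small sketch of the intended call-site pattern (`load_block` is a made-up helper; `TORCH_RETHROW` is the macro added here):
```
#include <c10/util/Exception.h>

// Hypothetical helper that can fail with a c10::Error.
void load_block(int i) {
  TORCH_CHECK(i != 3, "This is an error");
}

void load_all(int n) {
  for (int i = 0; i < n; ++i) {
    try {
      load_block(i);
    } catch (c10::Error& e) {
      // Appends a context line (like the "This is context 1" lines above) and rethrows.
      TORCH_RETHROW(e, "while loading block ", i);
    }
  }
}
```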
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21202891
Pulled By: ezyang
fbshipit-source-id: 361cadd16bc52e5886dba08e79277771ada76169
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37094
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21202892
Pulled By: ezyang
fbshipit-source-id: d59e6bffabd90cc734056bdce2cd1fe63262fab8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36850
Since all unboxing now happens after dispatch, which means that all c10 ops support unboxing, we can use op.callBoxed() for all ops and no longer need callBoxedWorkaround (which went through the JIT registry).
ghstack-source-id: 102879558
Test Plan: waitforsandcastle
Differential Revision: D21102375
fbshipit-source-id: d1e041116563a9650d5a86b07eb96d217d8756f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36841
Right now, all C2 ops' outputs are unwrapped blindly. This is not correct if a single tensor list is returned.
Test Plan: buck test mode/dev-nosan mode/no-gpu //caffe2/caffe2/fb/python/operator_test:torch_integration_test
Reviewed By: alyssawangqq
Differential Revision: D21100463
fbshipit-source-id: 9f22f3ddf029e7da9d98008d68820bf7f8239d4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35187
When I touch these files, lint always introduces some unintended changes. To prevent that, we need to format the code first.
The change is generated by:
arc f
Test Plan: integration test.
Differential Revision: D20587596
fbshipit-source-id: 512cf6b86bd6632a61c80ed53e3a9e229feecc2a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36172
Original commit changeset: 3d7801613f86
D20449887 broke some OSS tests as the OSS export sync wasn't working correctly.
Test Plan:
Manually export latest version to OSS to trigger the tests
+ test plan in D20449887
verified onnx tests are passing in https://github.com/pytorch/pytorch/pull/36172
Reviewed By: andrewwdye
Differential Revision: D20902279
fbshipit-source-id: bc30fcc9f5cc8076f69a5d92675fd27455948372
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31966
This has three parts:
* When `--caffe2_handle_executor_threads_exceptions` is set and a parallel execution step throws an exception, it can hang waiting for async nets to finish. This adds cancellation code to cancel any async nets.
* This makes the exceptions returned from parallel workers pass through a std::exception_ptr so the stack trace can be recorded with folly::SmartExceptionTracer (see the sketch after this list).
* Defines the Cancel method at the NetBase level to avoid pulling the unsupported AsyncSchedulingNet into fbandroid.
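A minimal sketch of the std::exception_ptr hand-off (illustrative only; the real logic lives in the plan executor):
```
#include <exception>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

void run_workers(const std::vector<std::function<void()>>& jobs) {
  std::mutex mu;
  std::exception_ptr first_error;
  std::vector<std::thread> threads;
  for (const auto& job : jobs) {
    threads.emplace_back([&mu, &first_error, job] {
      try {
        job();
      } catch (...) {
        std::lock_guard<std::mutex> lock(mu);
        if (!first_error) {
          // Keep the full exception object (not just a message string) so tools
          // like folly::SmartExceptionTracer can still recover its stack trace.
          first_error = std::current_exception();
        }
      }
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  if (first_error) {
    std::rethrow_exception(first_error);  // propagate the first worker failure
  }
}
```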
Test Plan:
Added unit tests for plan_executor
buck test //caffe2/caffe2:caffe2_test_cpu
buck test //caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
Reviewed By: boryiingsu
Differential Revision: D19320177
fbshipit-source-id: d9939fcea1317751fa3de4172dfae7f781b71b75
Summary:
This is a reland of https://github.com/pytorch/pytorch/pull/36196
Before the fix, bazel spews the following multi-line warning for every single caffe2 operator:
```
In file included from ./c10/util/logging_is_google_glog.h:50,
from ./c10/util/Logging.h:26,
from ./caffe2/core/logging.h:2,
from ./caffe2/core/blob.h:13,
from ./caffe2/core/operator.h:18,
from ./caffe2/sgd/adadelta_op.h:1,
from caffe2/sgd/adadelta_op.cc:1:
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h: In instantiation of 'std::string* google::Check_LTImpl(const T1&, const T2&, const char*) [with T1 = int; T2 = long unsigned int; std::string = std::__cxx11::basic_string<char>]':
./caffe2/core/operator.h:192:5: required from 'const T& caffe2::OperatorBase::Input(int, caffe2::DeviceType) [with T = caffe2::Tensor; caffe2::DeviceType = c10::DeviceType]'
./caffe2/core/operator.h:890:48: required from 'const caffe2::Tensor& caffe2::Operator<Context>::Input(int, caffe2::DeviceType) [with Context = caffe2::CPUContext; caffe2::DeviceType = c10::DeviceType]'
./caffe2/sgd/adadelta_op.h:87:5: required from 'bool caffe2::SparseAdadeltaOp<Context>::RunOnDevice() [with Context = caffe2::CPUContext]'
./caffe2/sgd/adadelta_op.h:85:8: required from here
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:32: warning: comparison of integer expressions of different signedness: 'const int' and 'const long unsigned int' [-Wsign-compare]
722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
| ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:148:53: note: in definition of macro 'GOOGLE_PREDICT_TRUE'
148 | #define GOOGLE_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
| ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:1: note: in expansion of macro 'DEFINE_CHECK_OP_IMPL'
722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
| ^~~~~~~~~~~~~~~~~~~~
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36224
Test Plan: CI
Differential Revision: D20919506
Pulled By: malfet
fbshipit-source-id: b8b4b7c62dcbc109b30165b19635a6ef30033e73
Summary:
Otherwise, bazel spews the following multi-line warning for every single caffe2 operator:
```
In file included from ./c10/util/logging_is_google_glog.h:50,
from ./c10/util/Logging.h:26,
from ./caffe2/core/logging.h:2,
from ./caffe2/core/blob.h:13,
from ./caffe2/core/operator.h:18,
from ./caffe2/sgd/adadelta_op.h:1,
from caffe2/sgd/adadelta_op.cc:1:
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h: In instantiation of 'std::string* google::Check_LTImpl(const T1&, const T2&, const char*) [with T1 = int; T2 = long unsigned int; std::string = std::__cxx11::basic_string<char>]':
./caffe2/core/operator.h:192:5: required from 'const T& caffe2::OperatorBase::Input(int, caffe2::DeviceType) [with T = caffe2::Tensor; caffe2::DeviceType = c10::DeviceType]'
./caffe2/core/operator.h:890:48: required from 'const caffe2::Tensor& caffe2::Operator<Context>::Input(int, caffe2::DeviceType) [with Context = caffe2::CPUContext; caffe2::DeviceType = c10::DeviceType]'
./caffe2/sgd/adadelta_op.h:87:5: required from 'bool caffe2::SparseAdadeltaOp<Context>::RunOnDevice() [with Context = caffe2::CPUContext]'
./caffe2/sgd/adadelta_op.h:85:8: required from here
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:32: warning: comparison of integer expressions of different signedness: 'const int' and 'const long unsigned int' [-Wsign-compare]
722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
| ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:148:53: note: in definition of macro 'GOOGLE_PREDICT_TRUE'
148 | #define GOOGLE_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
| ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:1: note: in expansion of macro 'DEFINE_CHECK_OP_IMPL'
722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
| ^~~~~~~~~~~~~~~~~~~~
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36196
Differential Revision: D20909696
Pulled By: malfet
fbshipit-source-id: 16723355f473379ba9da6d3c33bd561b9724800a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34753
This improves support for exceptions and capturing stack traces in caffe2 async nets. We generally want to use exceptions everywhere we can in order to preserve stack information. It also makes the exception timestamp more accurate so multiple exceptions at the same time can be correctly ordered.
Test Plan: Updated the tests to use the new error semantics + adds a test to ensure the stack is correctly propagated through deferrable async scheduling.
Reviewed By: andrewwdye
Differential Revision: D20449887
fbshipit-source-id: 047fdf1bd52fd7c7c1f3fde77df9a27ed9e288e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35857
This fixes a lot of common ops for InferBlobShapesAndTypes as well as adds support for testing the inferred shapes and types of gradient ops.
Ops:
* Concat
* Split
* LeakyReLU
* Relu
* Prelu
* Gelu
* Elu
* Sinh, Tanh, Cosh
* Abs
* ... and a number of other simple element wise ops
Test Plan:
Added support to hypothesis test to check the shape and type of gradient ops.
Enabled it for all the ops I fixed the shape and type inference for.
buck test caffe2/caffe2/python/operator_test:
Reviewed By: pradeepd24
Differential Revision: D20806284
fbshipit-source-id: 77f796d9ff208e09e871bdbadf9a0a7c196b77f2
Summary:
Fixes incorrect usages of symbol annotations including:
1. Exporting or importing a function/class in an anonymous namespace.
2. Exporting or importing a function/class implementation in a header file. However, by removing the symbol annotations, they become local symbols. If they need to remain global, I can move the implementations to the source file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35364
Differential Revision: D20670031
Pulled By: ezyang
fbshipit-source-id: cd8018dee703e2424482c27fe9608e040d8105b8
Summary:
And a few typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34791
Test Plan: CI
Differential Revision: D20524879
Pulled By: malfet
fbshipit-source-id: 58fa03bd6356979e77cd1bffb6370d41a177c409
Summary:
Throwing from a destructor leads to undefined behaviour (most often a segfault),
so it's better to leak memory than to segfault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34756
Test Plan: Run `test_pytorch_onnx_caffe2`
Differential Revision: D20504228
Pulled By: malfet
fbshipit-source-id: 7a05776fea9036f602e95b8182f8493cb5886dab
Summary:
To speed up compilation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34811
Test Plan: CI
Differential Revision: D20476992
Pulled By: malfet
fbshipit-source-id: 922cde93783fbfc04854851d7a05a635d5239792
Summary:
Replacing <ATen/core/Tensor.h> with <ATen/core/TensorBody.h> speeds up compilation of caffe2 operators by 15%.
For example, it reduces pool_op.cu compilation from 18.8s to 16s.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34810
Test Plan: CI
Differential Revision: D20472230
Pulled By: malfet
fbshipit-source-id: e1b261cc24ff577f09e2d5f6428be2063c6d4a8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34105
Make parallel_net_test.cc chronos-conforming.
Exclude gtest asserts that check thrown exceptions when exceptions are disabled.
Test Plan: CI green
Differential Revision: D20153525
fbshipit-source-id: 7371e559da948f46773fed09e3a23a77411d59e0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33954
Fixes caffe2/core/module_test.cc on Windows.
Miscellaneous lint fixes.
Test Plan: CI green
Reviewed By: malfet
Differential Revision: D20153512
fbshipit-source-id: aeae84a028e26edd65c7218611e3c49a8d9bb8c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33959
Make sure clang on Windows uses the correct attributes.
Add support for cl.exe-style pragma attributes.
Test Plan: CI green
Differential Revision: D20153548
fbshipit-source-id: bfbfd374e8f5e7d7b8598453c3ca2b6693a425f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33563
When NVCC or Clang are driving CUDA compilation many math functions are declared by default, with a small difference: Clang marks them as `__device__` only, while NVCC uses both `__host__` and `__device__`. This makes every un-elaborated `min` or `max` function call from a `__host__` function generate a syntax error when Clang is used.
Fix the errors by using `std::min` and `std::max` from `<algorithm>`, since C++14 they are `constexpr` and can be used in the `__device__` code [1].
1. https://llvm.org/docs/CompileCudaWithLLVM.html#algorithm
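A small sketch of the pattern being fixed (`clamp_count` is a made-up host-side example):
```
#include <algorithm>
#include <cstdint>

int64_t clamp_count(int64_t n, int64_t lo, int64_t hi) {
  // A bare `min`/`max` would resolve to the CUDA math declarations, which clang marks
  // __device__-only; std::min/std::max are constexpr since C++14 and work in both
  // host and device code.
  return std::max(lo, std::min(n, hi));
}
```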
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: ngimel
Differential Revision: D20005795
fbshipit-source-id: 98a3f35e8a96c15d3ad3d2066396591f5cca1696
Summary: The first run of the net is noisy sometimes - just run it twice.
Reviewed By: cheshen1
Differential Revision: D20039274
fbshipit-source-id: 639e65646bf52f3efe1ecd4bbcd0e413d9389b29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30734
What are specialized lists?
The IValues that hold List[int], List[Tensor], and List[AnythingElse] are different C++ types.
e.g. List[int] has a std::vector<int> while List[AnythingElse] holds a std::vector<IValue>.
Why do we have specialized lists?
When we first created the JIT we needed to bind the ATen C++ API which has std::vector<int>,
std::vector<Tensor> as inputs. The easiest way to match this API was to make our IValues contain
these same types. Conversion was just unwrapping the IValue, very easy and cheap.
What is the problem with specialized lists?
We end up with significant special cases through the compiler. Other types like Dict are not
specialized. So in the Pickler, for instance, there is a single piece of logic to handle
their serialization. For Lists, we end up with multiple cases. Furthermore, it doesn't
match Python, leading to problems along translation boundaries. Our pickle serialization
is slightly different than python, so it is harder to load objects from our IValue serialization
as Python values.
They also make it harder to provide an easy-to-use user API. We'd like to match pybind11 for C++
bindings to TorchScript. This would entail having a single torch::List class (untemplated)
that can be used to construct inputs. This is made much harder if the underlying ivalue needs
to be different depending on the type inside the list. The ideal case would be to have a constructor like
```
template<typename T>
List(std::vector<T> foo);
```
It would then set up the type tags correctly based on type T, without the need for passing tags.
Do specialized lists improve perf?
Not in a way we have been able to measure. Our major concern initially was having to translate
a std::vector<IValue> to std::vector<int> to call ATen functions. This was especially a concern
for aten::_convolution which takes a number of mostly-constant lists of integers. However,
when we measure the effect of actually having to do this conversion for an aten::_convolution,
it does not take measurable time (benchmark results below).
This is true even if you use a trivial convolution (e.g. 1x1x1), and comment out the actual convolution code.
What are the issues removing them?
This PR removes list specialization but keeps the serialization format, and IValue APIs almost exactly
the same. The only visible change is that toTensorListRef and family have turned into toTensorVector
because they now return by value a copy of the list as a vector.
Further PRs can then clean up the complexity issues that arose from specialization. This will likely
involve removing the isTensorList/isIntList functions, and refactoring the code that used them to
work generically. At some point we will also change serialization to no longer write specialized
lists in the pickle binary. This is forward incompatible, so will go in its own PR.
Benchmark:
```
import torch
import torch.nn as nn
import torch.nn.functional as F
import time
class MnistNet(nn.Module):
    def __init__(self):
        super(MnistNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, kernel_size=1)
        self.conv2 = nn.Conv2d(1, 1, kernel_size=1)
    def forward(self, x):
        for i in range(10):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
        return x
model = MnistNet()
x = torch.rand(1, 1, 1, 1)
r = torch.jit.trace(model, x )
r(x)
r(x)
r(x)
r(x)
print(torch.jit.last_executed_optimized_graph())
while True:
    b = time.time()
    for i in range(100):
        r(x)
    e = time.time()
    print(e - b)
```
Results (no observable difference):
```
Before (actual conv)
0.13251137733459473
0.13260436058044434
0.13276338577270508
0.1327497959136963
0.13250041007995605
0.13270330429077148
0.13290190696716309
0.13265132904052734
0.13274288177490234
0.1326758861541748
0.13253355026245117
0.13254785537719727
0.13260746002197266
0.13285017013549805
0.13264012336730957
0.132490873336792
0.13280034065246582
0.13243484497070312
0.1325232982635498
0.1326127052307129
0.13264131546020508
0.13274383544921875
0.13298296928405762
0.1326909065246582
-------------------
After (actual conv)
0.13127517700195312
0.13150334358215332
0.13092470169067383
0.13102364540100098
0.13134360313415527
0.13155555725097656
0.13314104080200195
0.13151955604553223
0.13160037994384766
0.1315293312072754
0.13137340545654297
0.13148093223571777
0.131455659866333
0.1327371597290039
0.13134026527404785
0.13152337074279785
0.13151192665100098
0.13165974617004395
0.13403725624084473
0.13251852989196777
0.13135504722595215
0.1315624713897705
0.1317615509033203
0.1314380168914795
0.13157200813293457
--------------------
The following replace the convolution operator with a no-op, to show
that even if the conv op was made faster, then we still would not see
a difference:
Before (fake conv)
0.0069539546966552734
0.0069522857666015625
0.007120847702026367
0.007344722747802734
0.007689952850341797
0.007932662963867188
0.00761723518371582
0.007501363754272461
0.007532835006713867
0.007141828536987305
0.007174253463745117
0.007114410400390625
0.007071495056152344
------------------
After (fake conv)
0.007458209991455078
0.007337093353271484
0.007268190383911133
0.007313251495361328
0.007306575775146484
0.007468700408935547
0.0073091983795166016
0.007308483123779297
0.007538318634033203
0.007356882095336914
0.007464170455932617
0.007372140884399414
```
Test Plan: Imported from OSS
Differential Revision: D18814702
Pulled By: zdevito
fbshipit-source-id: 0371c73b63068fdc12f24b801371ea90f23531a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31335
When an error occurs in a net we end up cancelling all the async ops. If one error occurs it's highly likely other errors will occur as well.
Typically we see:
1. SendOp failed due to a network error
2. async scheduling cancels all other ops via `SetFinished("Cancelled");`
3. Another SendOp fails due to a network error and crashes the process when the exception is thrown.
This changes caffe2 ops to allow failing twice.
Test Plan: buck test //caffe2/caffe2:caffe2_test_cpu
Reviewed By: andrewwdye
Differential Revision: D19106548
fbshipit-source-id: 4b7882258a240894cc16d061a563c83a3214d3d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915
Since we now have C++14, we don't need these c10::guts helpers anymore
ghstack-source-id: 95777609
Test Plan: waitforsandcastle
Differential Revision: D18869639
fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30917
This is a C++14 feature; we can use it now.
ghstack-source-id: 95255753
Test Plan: waitforsandcastle
Differential Revision: D18869637
fbshipit-source-id: dd02036b9faeaffa64b2d2d305725443054da31b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116
Changelist:
- remove BUILD_NAMEDTENSOR macro
- remove torch._C._BUILD_NAMEDTENSOR
- remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR
Future:
- In the next diff, I will remove all usages of
ATen/core/EnableNamedTensor.h since that header doesn't do anything
anymore
- After that, we'll be done with the BUILD_NAMEDTENSOR removal.
Test Plan: - run CI
Differential Revision: D18934951
Pulled By: zou3519
fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30912
Add a new data type, ZERO_COLLISION_HASH.
Test Plan: ci
Reviewed By: boryiingsu
Differential Revision: D18843626
fbshipit-source-id: b2d8280f13c78b4a656cf95822198df59de7b64c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315
The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.
Some things of note:
* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to apply caffe2_interface_library to torch_cpu/torch_cuda so that they get whole-archive linked into torch when you statically link. And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in the Caffe2Config.cmake. I am not too sure why the old code did it this way in the first place; however, switching it doesn't seem to have broken anything.
* There's some uses of `__HIP_PLATFORM_HCC__` still in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failure when trying to run the CUDA fuser. This doesn't really matter substantively right now because we still in-place HIPify, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706
Fixes #27215 (as our libraries are smaller), and executes on part of the plan in #29235.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18790941
Pulled By: ezyang
fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29337
This argument is needed by boxing wrappers so they're able to get a pointer to the corresponding unboxed kernel and call into it.
But if a kernel is registered in a boxed way, we don't need it and should hide this from the API.
This is especially needed for the backend fallback API where users would only be left wondering why this argument is there and what it does.
Also, hiding it allows us to potentially totally remove it in a future refactoring if we find some way to do so.
ghstack-source-id: 94481316
Test Plan: unit tests
Differential Revision: D18361991
fbshipit-source-id: 5cef26c896fe3f2a5db730d3bc79dcd62e7ef492
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29201
This is required for boxed backend fallback kernels (e.g. lazy, AMP) because they need to know which op was actually called.
ghstack-source-id: 94481313
Test Plan: I will add unit tests in a diff stacked on top
Differential Revision: D18282746
fbshipit-source-id: 339a1bbabd6aff31a587b98f095c75104dfc6f99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29731
The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
Some subtleties about the patch:
- There were a few functions that crossed CPU-CUDA boundary without API macros. I just added them, easy enough. An inverse situation was aten/src/THC/THCTensorRandom.cu where we weren't supposed to put API macros directly in a cpp file.
- DispatchStub wasn't getting all of its symbols related to static members on DispatchStub exported properly. I tried a few fixes but in the end I just moved everyone off using DispatchStub to dispatch CUDA/HIP (so they just use normal dispatch for those cases.) Additionally, there were some mistakes where people incorrectly were failing to actually import the declaration of the dispatch stub, so added includes for those cases.
- torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
- The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
- In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
- A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly. This situation also happens with custom C++ extensions.
- There's a ROCm compiler bug where extern "C" on functions is not respected. There's a little workaround to handle this.
- Because I was too lazy to check if HIPify was converting TORCH_CUDA_API into TORCH_HIP_API, I just made it so HIP build also triggers the TORCH_CUDA_API macro. Eventually, we should translate and keep the nature of TORCH_CUDA_API constant in all cases.
Fixes#27215 (as our libraries are smaller), and executes on
part of the plan in #29235.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18632773
Pulled By: ezyang
fbshipit-source-id: ea717c81e0d7554ede1dc404108603455a81da82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29653
I didn't remove is_variable from Tensor for BC reasons, but I did
remove as many uses as I could from the codebase.
at::impl::variable_excluded_from_dispatch got moved to TensorBody.h
so that it's more widely accessible.
This diff is NOT semantics preserving. Here are the major differences:
- In a number of native operator implementations, we tested that arguments
are not variable. I replaced these with asserts that variable is
excluded from dispatch. I actually don't think these asserts are really
necessary now (they should certainly be true, but it's hard to get
it wrong), but I've kept them for old time's sake. At least, they'll detect
if you call these functions before you've processed variable (indicating
a bug in your kernel.)
- There are a number of places where we do a per-tensor test for being a
variable, for better error reporting when someone commits Tensor/Variable
confusion. Although these tests are substantively the same as the
tests above, in these cases I decided to *delete* the test entirely.
The reasoning is that in these cases, we didn't really care about
dispatch (also, see above; I'm not too sure we really need the dispatch
asserts), we cared about Tensor/Variable confusion. Since Tensor/Variable
confusion is impossible now, we don't need the tests. One of the key
factors which pushed me one way or another was whether or not a function
was doing per-tensor validation; if I kept the assert in such functions,
I'd repeatedly access the TLS. Even if we want to bring back the asserts,
they would have to go somewhere else.
Another similar idiom is the number of places we do !x.defined() ||
x.is_variable(); I treated this equivalently.
- nuclear_norm's computation of compute_uv is a bit weird, but I think
it's OK to just delete the is_variable case (I *suspect* that it is
always the case that self.is_variable(), but it doesn't really matter.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18496168
Pulled By: ezyang
fbshipit-source-id: 5a1ded931e0c10a6b758ba64a8380d34110e0c3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29670
This is the entry point for loading CUDA code; improve the error message to prompt users to check that GPU code is included.
Test Plan: Build without gpu code. Run the binary. Check that the new error message exists.
Reviewed By: yfeldblum
Differential Revision: D18453798
fbshipit-source-id: 63d9ec50acdf57ef4baf3f7d99c836c56bc1435e
Summary:
Also move the logic that installs the pybind11 headers from setup.py to cmake (to align with other headers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29659
Differential Revision: D18458208
Pulled By: bddppq
fbshipit-source-id: cfd1e74b892d4a65591626ab321780c8c87b810d
Summary:
This diff adds the following:
- An AsyncIf to support conditional async execution. This op assumes that then_net and else_net are async scheduling nets. This op itself completes when every async op in the active net completes. Cancellation cancels the inner nets and the async ops.
- Unit tests targeting asynchronicity and error/cancellation handling.
Test Plan:
New unit tests
With --stress-runs=2000:
https://our.intern.facebook.com/intern/testinfra/testrun/4785074616784325
Reviewed By: ilia-cher
Differential Revision: D18051357
fbshipit-source-id: 1399a437b3ca63fd4ea0cf08d173f85b9242cc1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29052
Make sure we handle the case of multiple, async, terminal (no children)
and failing cpu ops.
Test Plan: AsyncIf tests
Reviewed By: yyetim
Differential Revision: D18276401
Pulled By: ilia-cher
fbshipit-source-id: 35b175dd025bc7e392056ac1331b159376a29e60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28024
We preallocated type ids to align them with ScalarType. At that point, the maximum type id was 10 and we used 11 to specify undefined type id.
However, since then, ScalarType got more additions, 11 isn't undefined anymore, and numbers 11-15 have meaning.
caffe2::TypeIdentifier also got its separate additions, 12 and upwards have meaning that differs from ScalarType.
I'm going with the (CI-tested) assumption that caffe2::TypeIdentifier and ScalarType actually don't need to be aligned
and remove the functionality for preallocated type ids. This simplifies our type ids.
ghstack-source-id: 92051872
Test Plan: unit tests
Differential Revision: D17936165
fbshipit-source-id: 2c9df2b9b3f35b3e319641c96638321ac3433d5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26509
We preallocated type ids to align them with ScalarType. At that point, the maximum type id was 10 and we used 11 to specify undefined type id, see https://github.com/pytorch/pytorch/pull/10139.
However, since then, ScalarType got more additions, 11 isn't undefined anymore, and numbers 11-15 have meaning.
caffe2::TypeIdentifier also got its separate additions, 12 and upwards have meaning that differs from ScalarType.
I'm going with the (CI-tested) assumption that caffe2::TypeIdentifier and ScalarType actually don't need to be aligned
and remove the functionality for preallocated type ids. This simplifies our type ids.
ghstack-source-id: 91896918
Test Plan: unit tests
Differential Revision: D17490109
fbshipit-source-id: 800c340d9d3556a99f6e3ffc33af14ad68d7cc59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26502
Create type ids at compile time instead of incrementing a counter at runtime. This is done by computing a compile time crc64 on the type name. We couldn't do this before, because we still used GCC4 and that compiler didn't support the use of `__PRETTY_FUNCTION__` in a constexpr context. However, since GCC5 this is possible and we can use this trick.
This does not change the semantics of preallocated type ids. I actually think we don't need to preallocate anymore, but I split the removal of preallocation into a separate diff to be able to test it separately.
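A hedged sketch of the compile-time id idea (it uses a simple FNV-1a hash instead of the crc64 the real change uses; `type_id_for<T>()` is illustrative, not the c10 API):
```
#include <cstdint>

// Simple FNV-1a stand-in for the compile-time crc64 used by the real change.
constexpr uint64_t hash_string(const char* s, uint64_t h = 14695981039346656037ull) {
  return *s == '\0'
      ? h
      : hash_string(s + 1, (h ^ static_cast<uint64_t>(*s)) * 1099511628211ull);
}

template <typename T>
constexpr uint64_t type_id_for() {
  // __PRETTY_FUNCTION__ embeds the instantiated type name (e.g. "... [with T = int]"),
  // and since GCC5 it can be used in a constexpr context, so hashing it yields a
  // per-type compile-time constant instead of a runtime counter.
  return hash_string(__PRETTY_FUNCTION__);
}

static_assert(type_id_for<int>() != type_id_for<float>(), "distinct types, distinct ids");
```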
ghstack-source-id: 91896920
Test Plan: unit tests
Differential Revision: D17488861
fbshipit-source-id: ce7b059d7c8686b69cb091a4a8beaf4b96391343
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27086
This is a major source of merge conflicts, and AFAICT isn't necessary anymore (it may have been necessary for some mobile build stuff in the past).
This is a commandeer of #25031
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D17687345
Pulled By: ezyang
fbshipit-source-id: bf6131af835ed1f9e3c10699c81d4454a240445f
Summary: Add helper function randomFill to test_utils.h so we can use it in benchmark scripts as well as tests.
Test Plan:
```
buck run mode/opt //tvm/sparse:cblas_bench
```
Reviewed By: yinghai
Differential Revision: D17759193
fbshipit-source-id: e4909b04e83ca9382ab4718855fb63743d028de1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26337
- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.
ghstack-source-id: 90459911
Test Plan: unit tests
Differential Revision: D17416607
fbshipit-source-id: fd221f1d70eb3f1b4d33092eaa7e37d25684c934
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25908
Original commit changeset: f6e961e88c01
device_option propagation is completely broken in Caffe2 when pass-through operators are used. As an example, the Gather operator doesn't have a gradient and passes through its inputs, which results in incorrect detection of the components for sparse parameter aggregation (the component will be empty instead of the real device).
This diff is trying to fix this issue.
The original diff had a problem: Caffe2 does not handle cases where the device option is present but contains only metadata (for example, the one for auto-generated reduction ops in the backward pass). This diff addresses that by merging device options during the backward pass.
Test Plan:
1. net_transform is finally working with Gather + FloatToHalf transformed model instead of failing because of incorrect number of components.
2. New unit-test.
3. Verify that previously broken benchmark is now passing
ezyang do you have suggestions what else I should test?
Reviewed By: ezyang
Differential Revision: D17281528
fbshipit-source-id: 4a1bc386f29f6a34fbf8008effde9d4890abebfa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23668
- The eager mode frontend now calls operators that are defined in native_functions.yaml with `use_c10_dispatcher: True` through the c10 dispatcher and no longer through globalATenDispatch().
- These operators aren't registered with globalATenDispatch anymore, only with c10 now.
- Backend extensions calling globalATenDispatch().registerOp() to add their own kernels still work, this function will forward the registration to the c10 dispatcher for them.
ghstack-source-id: 90130455
Test Plan: benchmarks at https://docs.google.com/document/d/1gpzKZcFf1JJameY1vKxF7Cloul9s6D8HKIK2_Pp1hFo/edit#
Differential Revision: D16603133
fbshipit-source-id: 991f17b355e9c78c5e86fee4fa381df7ab98ac82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650
This PR removes protobuf dependencies from mobile build altogether:
- caffe2/proto: protobuf files, including caffe2.proto and torch.proto;
- caffe2 components that depend on caffe2.proto, including most part of
caffe2/core, caffe2/utils;
- libprotobuf / libprotobuf-lite dependencies;
- protobuf compiler;
- some utils class, e.g.: netdef_converter.cpp;
- introduce a macro to disable third_party/onnx which depends on protobuf;
Test Plan:
- builds;
- link with demo app to make sure it can load and run a model in pickle format;
Differential Revision: D17183548
Pulled By: ljk53
fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25671
To decouple string_utils.h from types.h and protobuf headers.
Logically, GetDimFromOrderString seems more similar to StringToStorageOrder than to the other string_utils functions.
Test Plan: - Will check all internal/external CI jobs.
Reviewed By: yinghai
Differential Revision: D17191912
Pulled By: ljk53
fbshipit-source-id: fe555feef27bfd74c92b6297c12fb668252ca9ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23888
This is an alternative to https://github.com/pytorch/pytorch/pull/23684.
Instead of splitting a bunch of headers into declaration and definition, we change tensor includes to only include the tensor declaration when the tensor definition isn't needed.
ghstack-source-id: 89357687
Test Plan: waitforsandcastle
Differential Revision: D16673569
fbshipit-source-id: fa1d92809b05de7910a8c2dc2f55abe071ca63bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25620
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25602
Enable rocThrust with hipCUB and rocPRIM for ROCm. They are the ROCm implementations of the thrust and cub APIs and replace the older hip-thrust and cub-hip packages going forward. ROCm 2.5 is the first release to contain the new packages as an option, as of 2.6 they will be the only available option.
Add hipification rules to correctly hipify thrust::cuda to thrust::hip and cub:: to hipcub:: going forward. Add hipification rules to hipify specific cub headers to the general hipcub header.
Infrastructure work to correctly find, include and link against the new packages. Add the macro definition to choose the HIP backend to Thrust.
Since include chains are now a little different from CUDA's Thrust, add includes for functionality used where applicable.
Skip four tests that fail with the new rocThrust for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21864
Reviewed By: xw285cornell
Differential Revision: D16940768
Pulled By: bddppq
fbshipit-source-id: 3dba8a8f1763dd23d89eb0dd26d1db109973dbe5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25252
Our model going forward for extensions will be that you will have to
get an allocation of an ID in our system. This is how things work
in practice today; we're just simplifying our underlying registration
since there is no need to have distributed registration.
There are some codemods in this diff:
```
codemod --extensions cpp,h,cc,cuh,py,in --exclude-paths=c10/core/TensorTypeId.h '([A-Za-z]+?)TensorId\(\)' 'TensorTypeId::\1TensorId'
codemod --extensions cpp,h,cc,cuh,py,in 'TensorTypeIds::undefined\(\)' 'TensorTypeId::UndefinedTensorId'
codemod --extensions cpp 'TensorType1\(\)' 'TensorTypeId::CPUTensorId'
codemod --extensions cpp 'TensorType2\(\)' 'TensorTypeId::CUDATensorId'
codemod --extensions cpp 'TensorType3\(\)' 'TensorTypeId::XLATensorId'
codemod --extensions cpp 'TensorType1' 'CPUTensorId'
codemod --extensions cpp 'TensorType2' 'CUDATensorId'
codemod --extensions cpp 'TensorType3' 'XLATensorId'
```
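For illustration, after these codemods a call site moves from the old function-call spelling to the enum-style spelling (a minimal sketch; the variables are hypothetical):
```cpp
#include <c10/core/TensorTypeId.h>

// Old spelling (pre-codemod):  auto key = CPUTensorId();
// New spelling (post-codemod): enum-style identifiers on TensorTypeId.
constexpr auto cpu_key  = c10::TensorTypeId::CPUTensorId;
constexpr auto cuda_key = c10::TensorTypeId::CUDATensorId;
```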
The main hand-written changes are in c10/core/TensorTypeId.h
Other manual fixes:
- aten/src/ATen/core/op_registration/op_registration.cpp - stop using
std::string operator+
- aten/src/ATen/function_wrapper.py - handle a hardcoded TypeId() that
wasn't caught by codemod
- torch/csrc/tensor/python_tensor.h - fix now incorrect forward declaration
of TensorTypeId
- aten/src/ATen/core/op_registration/ - remove out-of-line registration
Differential Revision: D17072001
Test Plan: ossci and sandcastle
Pulled By: ezyang
fbshipit-source-id: c641515fd0604c045c54fbb1d6b1b950f45e89d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24361
Currently we only support Conv in the kernel but have an entry point for both types using one shared class.
It is time to make the change.
Reviewed By: csummersea
Differential Revision: D16604713
fbshipit-source-id: b98d39a2c7960707cd50ba27e43dce73f741eeeb
Summary:
Adds qtensor-specific fields to the proto file so that they get serialized into the model.json.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23356
ghstack-source-id: 87263428
Differential Revision: D16473237
fbshipit-source-id: bf5b51d0863d036d30a1644a3c3b74516468224b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23096
Nets can have state that depends on the rest of the state in the Workspace. Hence, they should be destructed first.
Reviewed By: ajyu
Differential Revision: D16382987
fbshipit-source-id: 3fd030ba206e2d0e897abb9e31c95bdaeb9482b7
Summary:
As part of the Variable/Tensor merge, we want to be able to pass Variables into Caffe2 without doing extra shallow copy, to improve performance and also allow for in-place mutations in Caffe2 ops. There are a few approaches outlined in https://github.com/pytorch/pytorch/pull/22418, and this PR is the chosen approach.
Specifically, we can have the assumption that we won't be connecting autograd to C2 gradients at any point (as it's too tricky and not that useful). Therefore, we can pass Variable into Caffe2 ops by requiring that all Variables in Caffe2 don't require grad. For code paths in Caffe2 that might potentially track gradients (e.g. `ScriptModuleOp` and `call_caffe2_op_from_c10`), we use the `torch::NoGradGuard` to make sure gradients are not tracked.
This supersedes https://github.com/pytorch/pytorch/pull/22418.
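A rough sketch of how the guard is used (the wrapper function here is hypothetical; only `torch::NoGradGuard` is the real API):
```cpp
#include <torch/torch.h>

// Hypothetical helper: run a Caffe2 op on a tensor that may be an autograd
// Variable; the guard makes sure no gradient tracking happens in this scope.
void run_caffe2_op_without_grad(const torch::Tensor& input) {
  torch::NoGradGuard no_grad;  // gradients are not tracked inside this scope
  // ... hand `input` to the Caffe2 operator here ...
}
```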
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22473
Differential Revision: D16099042
Pulled By: yf225
fbshipit-source-id: 57efc3c7cfb3048d9abe90e63759acc14ebd2972
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22477
There is actually no use of an uninitialized variable, but some compilers are not smart enough to reason that the two if branches are always taken together.
Reviewed By: hx89
Differential Revision: D16100211
fbshipit-source-id: 25f01d668063603d7aaa776451afe8a10415d2ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22005
When a Dict or List is created with type information, it will remember it.
If at any later point the list is instantiated as a List<T> with a concrete type, it will assert that T is the correct type.
Differential Revision: D15914462
fbshipit-source-id: a8c3d91cb6d28d0c1ac0b57a4c4c6ac137153ff7
Summary:
Currently the build system accepts USE_NAMEDTENSOR as an environment
variable and turns it into NAMEDTENSOR_ENABLED when passing it to CMake.
This discrepancy does not seem necessary and complicates the build
system. The naming of this build option is also semantically incorrect
("BUILD_" vis-a-vis "USE_"). This commit eradicates the issue before it
is made into a stable release.
Support for NO_NAMEDTENSOR is also removed, since PyTorch has been
quite inconsistent about "NO_*" build options.
---
Note: All environment variables with their names starting with `BUILD_` are currently automatically passed to CMake with no need of an additional wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22360
Differential Revision: D16074509
Pulled By: zou3519
fbshipit-source-id: dc316287e26192118f3c99b945454bc50535b2ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22241
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20387
glibc has a non-standard function, feenableexcept, that enables trapping of floating-point exceptions. Compared to feclearexcept + fetestexcept, this approach lets us see precisely where the exception is raised from the stack trace.
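A minimal, glibc-specific sketch of the trap-based approach (the example program is illustrative, not code from this diff):
```cpp
#include <cfenv>   // feenableexcept is a glibc extension (needs _GNU_SOURCE; g++ defines it by default)
#include <cstdio>

int main() {
  // Turn selected floating-point exceptions into traps (SIGFPE), so the stack
  // trace points at the exact offending instruction instead of having to poll
  // with feclearexcept()/fetestexcept() after the fact.
  feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);

  volatile double zero = 0.0;
  std::printf("%f\n", 1.0 / zero);  // traps here once FE_DIVBYZERO is enabled
  return 0;
}
```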
Reviewed By: jspark1105
Differential Revision: D15301095
fbshipit-source-id: 94f6e72456b2280f78d7d01c2ee069ae46d609bb
Summary:
Saying `I` in an error message is too subjective to be used in a framework.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22369
Differential Revision: D16067712
Pulled By: soumith
fbshipit-source-id: 2a390646bd5b15674c99f65e3c460a7272f508b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22084
For DictPtr/ListPtr, default construction was disallowed because it was ambiguous whether it was supposed to create an empty list or a nullptr.
But since we renamed them to Dict/List, we can now allow default construction without ambiguity.
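A minimal sketch of what this enables (the header paths are an assumption about the current layout):
```cpp
#include <ATen/core/Dict.h>
#include <ATen/core/List.h>
#include <cstdint>
#include <string>

// Default construction now unambiguously means "empty container",
// rather than a possibly-null handle as with the old *Ptr types.
c10::List<int64_t> empty_list;
c10::Dict<std::string, int64_t> empty_dict;
```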
Differential Revision: D15948098
fbshipit-source-id: 942a9235b51608d1870ee4a2f2f0a5d0d45ec6e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21937
This changes call sites to use the new naming scheme
Reviewed By: zdevito
Differential Revision: D15892404
fbshipit-source-id: 8d32aa90a0ead1066688166478f299fde9c2c133
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21806
Dispatcher::findSchema(op_name) now uses a lookup table instead of iterating through the list of operators to find it.
This speeds up op lookup (as in finding the operator handle from the name, not as in finding a kernel when you already have the operator handle),
and it also speeds up op registration, since that needs to check whether an op with the same name already exists.
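A minimal sketch of the kind of name-to-operator lookup table described above (simplified, with hypothetical types; not the actual Dispatcher code):
```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>

struct OperatorEntry { std::string schema; /* kernels etc. would live here */ };

class MiniDispatcher {
 public:
  // Registration checks for an existing name via the hash map (expected O(1))
  // instead of scanning the whole operator list.
  bool registerOp(const std::string& name, std::string schema) {
    if (byName_.count(name)) return false;       // op with the same name exists
    operators_.push_back({std::move(schema)});
    byName_.emplace(name, operators_.size() - 1);
    return true;
  }

  // findSchema: hash-map lookup instead of iterating over all operators.
  const OperatorEntry* findSchema(const std::string& name) const {
    auto it = byName_.find(name);
    return it == byName_.end() ? nullptr : &operators_[it->second];
  }

 private:
  std::deque<OperatorEntry> operators_;                  // stable addresses on growth
  std::unordered_map<std::string, std::size_t> byName_;
};
```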
Differential Revision: D15834256
fbshipit-source-id: c3639d7b567e4ed5e3627c3ebfd01b7d08b55ac1
Summary:
After https://github.com/pytorch/pytorch/pull/17072, we are allowed to pass Variables into ATen ops, thus there is no need to unwrap input variables in the c10 call path.
Note that since Caffe2 still expects inputs to be pure Tensors, we moved the unwrapping logic to the Caffe2 wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21620
Differential Revision: D15763560
Pulled By: yf225
fbshipit-source-id: 5375f0e51eb320f380ae599ebf98e6b259f0bff8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21446
This is used for easier tracing of the iteration id when looking at the trace diagram.
Reviewed By: ilia-cher
Differential Revision: D15628950
fbshipit-source-id: ee75b3bdb14a36abc18c7bddc49d8ec9789b724d
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants). Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.
The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774
Differential Revision: D15769965
Pulled By: kostmo
fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21177
- Integrate c10::ListPtr into IValue and the c10 dispatcher.
- Streamline conversion to/from IValue. Before, we had IValue::to<> and kernel_functor.h had its own ivalue_to_arg_type and return_type_to_ivalue. They are now unified. Also, this means that nested types like Dicts of Lists of Optional of Dict of ... now work as expected (see the sketch below).
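As a rough sketch of the nested round-trip this enables (written against the current-style c10::List/c10::Dict API, which may differ slightly from the ListPtr-era code in this diff):
```cpp
#include <ATen/core/Dict.h>
#include <ATen/core/List.h>
#include <ATen/core/ivalue.h>

void roundtrip_example() {
  c10::Dict<std::string, c10::List<int64_t>> dict;
  dict.insert("key", c10::List<int64_t>({1, 2, 3}));

  c10::IValue iv(dict);  // wrap the nested container in an IValue
  auto back = iv.to<c10::Dict<std::string, c10::List<int64_t>>>();  // and unwrap it
  (void)back;
}
```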
Differential Revision: D15476433
fbshipit-source-id: bde9df80df20091aa8e6ae17ba7e90abd149b954
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21492
If one async operator fails, the async_scheduling net currently only marks all scheduled async operators as finished without cancelling their callbacks.
The new behavior is to cancel the callbacks first, then set the event status to finished.
Reviewed By: ilia-cher
Differential Revision: D15702475
fbshipit-source-id: 55a1774d768b2e238bab859b83332f1877a001ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17946
Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.
Reviewed By: ezyang
Differential Revision: D14430749
fbshipit-source-id: 2b0037a9ed227a22aa7376a90e6d3d09d3e04707
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20603
When we use intra_op_parallel operators, Caffe2 tracing generated a trace only for the master task, giving the false impression that many threads are underutilized.
This diff also traces child tasks.
Reviewed By: ilia-cher
Differential Revision: D14820008
fbshipit-source-id: ff4ed203804d86d9231c21c99d869f1ddf1d1ef9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20493
This helps distinguish whether or not the op was a quantized op.
Reviewed By: salexspb
Differential Revision: D15337854
fbshipit-source-id: 43c7aef143085cfaeb4ec2102a7f36cc454e0e94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20173
Enabled op profiling even when net type is not dag or prof dag. Also added
engine type info to summary.
Reviewed By: salexspb, ilia-cher
Differential Revision: D15177813
fbshipit-source-id: 5be0efeaabc9a961cf1d73b0703749c08bb1adbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20821
Change registration API. Instead of
  static auto registry = torch::RegisterOperators()
    .op("my::op", torch::RegisterOperators::options()
      .kernel<Kernel>()
      .dispatchKey(CPUTensorId()));
it is now
  static auto registry = torch::RegisterOperators()
    .op("my::op", torch::RegisterOperators::options()
      .kernel<Kernel>(CPUTensorId()));
This binds kernel and dispatch key together, allowing them to be separate from other future configuration options like alias analysis or autograd wrappers.
The semantic problem behind this is that the dispatch key is a *kernel config parameter* and not an *operator config parameter*, while things like autograd wrappers, alias info, and actually the kernel itself are *operator config parameters*. While previously the different kinds of config parameters were mixed, this diff now separates them.
Before this change, it wouldn't have been well defined if you specified a dispatchKey together with an autogradWrapper or aliasInfo for example.
  // what is this supposed to do?
  static auto registry = torch::RegisterOperators()
    .op("my::op", torch::RegisterOperators::options()
      .aliasInfo(DEFAULT)
      .dispatchKey(CPUTensorId()));
If we get more kernel config parameters in the future, we could introduce something like this
  static auto registry = torch::RegisterOperators()
    .op("my::op", torch::RegisterOperators::options()
      .kernel<Kernel>(torch::RegisterOperators::kernelOptions()
        .dispatchKey(CPUTensorId())
        .otherConfig()));
but that's overkill as long as dispatch keys are the only kernel config parameter, and we can introduce that later without breaking backwards compatibility.
A nice side effect of this is that people can register multiple kernels to the same operator in the same `.op()` call:
  static auto registry = torch::RegisterOperators()
    .op("my::op", torch::RegisterOperators::options()
      .kernel<Kernel1>(CPUTensorId())
      .kernel<Kernel2>(CUDATensorId()));
Reviewed By: dzhulgakov
Differential Revision: D15455790
fbshipit-source-id: 1c46bfe676dcacf74cf36bd3f5df3d2c32b8fb11
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17818
Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.
Reviewed By: ezyang
Differential Revision: D14392459
fbshipit-source-id: bf86e6cb0a7cfefd112a65dc85cc243e57a5ad52
Summary:
Resubmit #20698 which got messed up.
The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so: only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235 and thus we can afford to put logging in e.g. TensorImpl.
Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcomed.
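A minimal sketch of a log-once trigger point of this kind (an illustrative mechanism with a made-up macro name, not the actual implementation added in this PR):
```cpp
#include <atomic>
#include <iostream>

// One static flag per call site: the event is emitted only the first time
// execution reaches that particular trigger point, and is nearly free afterwards.
#define LOG_API_USAGE_ONCE(event)                          \
  do {                                                     \
    static std::atomic<bool> _logged{false};               \
    if (!_logged.exchange(true)) {                         \
      std::cout << "API usage: " << (event) << "\n";       \
    }                                                      \
  } while (0)

void tensor_created() {
  LOG_API_USAGE_ONCE("tensor.create");  // logged only on the first call
}
```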
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745
Differential Revision: D15429196
Pulled By: dzhulgakov
fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20833
Att. The algorithm is still "horrendously inefficient". But since we are sunsetting Nomnigraph, I just did the minimal fix here.
Reviewed By: tracelogfb
Differential Revision: D15463880
fbshipit-source-id: 413a1280a92c1923ba49031177816a2d5f888575
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20514
Change API from
  static auto registry = c10::RegisterOperators()
    .op("my::op",
      c10::kernel(...),
      c10::dispatchKey(...)
    );
to
  static auto registry = c10::RegisterOperators()
    .op("my::op", c10::RegisterOperators::options()
      .kernel(...)
      .dispatchKey(...)
    );
because this allows better discoverability. People looking for which options are available will find them more easily, and IDE autocompletion will work better.
Reviewed By: zdevito
Differential Revision: D15346348
fbshipit-source-id: 4b74a33b75c2b9cda4a903639fb7abd2c7cff167
Summary:
#19975 was separated into 2 PRs.
This one:
Introduce MemoryFormat argument to the `x.is_contiguous(memory_format=torch.channels_last)` and to the `y = x.contiguous(memory_format=torch.channels_last)` functions.
At this moment both functions just operate on strides and don't store any tensor state.
(Original RFC #19092)
-----
Expands functionality of two tensor functions `.is_contiguous` and `.contiguous` (both python and c++ api).
Note: We had several complaints about `.to(memory_format)` function, and decided not to support it.
1. `.contiguous` now supports an optional keyword-only argument, `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.
- Using `torch.contiguous_format` will preserve existing `.contiguous()` behavior.
- Calling `x.contiguous(memory_format=torch.channels_last)` returns a new tensor which maintains the same semantic layout (NCHW) but has a different memory allocation pattern.
`x.contiguous(memory_format=torch.channels_last)` expects the input tensor to be 3d, 4d or 5d, and fails otherwise.
2. `.is_contiguous` now supports an optional keyword-only argument, `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.
- `x.is_contiguous(memory_format=torch.contiguous_format)` preserves the same functionality as `x.is_contiguous()` and remains unchanged.
- `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) the input tensor is contiguous in memory AND B) it is allocated in memory in NHWC (or similar for 3d, 5d) format.
Note: By the end of phase one, `x.is_contiguous(memory_format=torch.channels_last)` will calculate the state of the Tensor on every call. This functionality is going to be updated later.
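A short sketch of the same calls through the C++ API (the tensor shown is hypothetical):
```cpp
#include <torch/torch.h>

void channels_last_example() {
  // A 4d NCHW tensor (channels_last applies to 3d/4d/5d inputs).
  auto x = torch::randn({8, 3, 32, 32});

  // Same semantic NCHW layout, but memory laid out in channels-last (NHWC) order.
  auto y = x.contiguous(at::MemoryFormat::ChannelsLast);

  bool default_ok  = x.is_contiguous();                                // classic check
  bool channels_ok = y.is_contiguous(at::MemoryFormat::ChannelsLast);  // NHWC check
  (void)default_ok;
  (void)channels_ok;
}
```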
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455
Differential Revision: D15341577
Pulled By: VitalyFedyunin
fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20439
This is the QTensorProto workflow for multi-group quantization on the C2 side.
No DNNLOWP Tensor related changes are included in this PR, so once we finish the Glow side, we should be able to test this PR using resnet50.
Reviewed By: yinghai
Differential Revision: D15096919
fbshipit-source-id: 741eecd59eb79d24d9fe2b035f6246d42422d25c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20463
Source file changes mostly involve ifdef'ing-out references to JIT code
from files that are part of Caffe2Go. Update Internal build scripts to
remove those files from our globs.
After this, changes to most of the JIT files should not trigger mobile CI.
Reviewed By: dzhulgakov
Differential Revision: D15329407
fbshipit-source-id: 48f614c6b028eef0a03ce5161d083a3e078b0412
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20108
Add cpp runs for c2, hooked up via pybinds. Print output to terminal. This is not hooked up with the pep output yet because I'd like to verify the numbers first.
Note that this isn't quite the same mechanism as the pytorch cpp hookup, which uses cpp_python_extensions. If I can use the same mechanism to pull all the inputs for c2 through cpp and do FeedBlobs in cpp, then I'll switch to that.
Reviewed By: zheng-xq
Differential Revision: D15155976
fbshipit-source-id: 708079dacd3e19aacfe43d70c5e5bc54da2cf9e3
Summary:
Some functions were not decorated with `CAFFE2_API`, which makes them unusable when creating unit tests for custom ops outside the Caffe2 repo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20114
Differential Revision: D15217490
Pulled By: ezyang
fbshipit-source-id: dda3910ad24e566567607deaac705a34ec8e7b8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19817
A lot of files were depending on the JIT's typesystem
because operator.h depends on function_schema.h. However,
this isn't fundamental to the design. This diff tries to
remove the direct dependency and only include the c10
wrapper helpers in files where they are required.
Reviewed By: smessmer
Differential Revision: D15112247
fbshipit-source-id: 2c53d83e542c32d9a398c8b60dbf40ab7a1cb0f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19458
The algorithm in https://fburl.com/ggh9iyvc fails to really ensure topological ordering of nodes. The fix is ugly but effective. I think we need a real topological sort to fix this issue more nicely. Mikhail Zolotukhin, Bram Wasti.
Differential Revision: D15011893
fbshipit-source-id: 130c3aa442f5d578adfb14fbe5f16aa722434942
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19388
The old implementation forced a refcount bump when converting at::Tensor to caffe2::Tensor.
Now, it is possible to move it without a refcount bump.
Reviewed By: dzhulgakov
Differential Revision: D14986815
fbshipit-source-id: 92b4b0a6f323ed38376ffad75f960cad250ecd9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19287
Since we now have a string-schema-based op registration API, we can also use it when exposing caffe2 operators.
Reviewed By: dzhulgakov
Differential Revision: D14931925
fbshipit-source-id: ec162469d2d94965e8c99d431c801ae7c43849c8
Summary:
Currently, a TensorImpl's `is_variable_` is true if and only if the TensorImpl has AutogradMeta. This PR unifies these two concepts by removing `is_variable_` and changing `is_variable()` to check for the existence of AutogradMeta instead.
Removing `is_variable_` is part of the work in Variable/Tensor merge.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19139
Differential Revision: D14893339
Pulled By: yf225
fbshipit-source-id: ceb5e22c3c01f79b5d21d5bdbf4a7d1bc397796a
Summary:
It's not intended that Storages have 'default' CUDA devices, but this is allowable via the Storage::create_legacy codepath.
This also messes with device caching, because the initial cache is obtained from the Storage, which may have a 'default' device.
Instead, we materialize a device by allocating 0 bytes via the allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18605
Differential Revision: D14680620
Pulled By: gchanan
fbshipit-source-id: 6d43383d836e90beaf12bfe37c3f0506843f5432
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19154
I recently saw some weird workflow errors due to an empty but set net_type. Maybe we should just fall back to the simple net in this case.
Reviewed By: dzhulgakov
Differential Revision: D14890072
fbshipit-source-id: 4e9edf8232298000713bebb0bfdec61e9c5df17d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19080
OSS: add a tiny unit test utility function to create tensors given shape and data outside of any workspace. I use it in an internal test
Reviewed By: dzhulgakov
Differential Revision: D14814194
fbshipit-source-id: 6d53b235d99a97da812215f5c7f11fecad363c8c
Summary:
Almost there, feel free to review.
These c10 operators are exported to the _caffe2 domain.
TODO:
- [x] let the onnx checker pass
- [x] test tensor list as argument
- [x] test caffe2 backend and converter
- [x] check the c10 schema can be exported to onnx
- [x] refactor the test case to share some code
- [x] fix the problem in ONNX_ATEN_FALLBACK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18210
Reviewed By: zrphercule
Differential Revision: D14600916
Pulled By: houseroad
fbshipit-source-id: 2592a75f21098fb6ceb38c5d00ee40e9e01cd144
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18531
Currently we use C10_LOG_EVERY_MS to log the data type change, but it pollutes the logs of some services,
so we would like to change it to C10_LOG_FIRST_N to prevent that.
Reviewed By: dzhulgakov
Differential Revision: D14647704
fbshipit-source-id: b84e4002bd4aa94d616133cd1049c3d4ab05386e