Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46560
Follow-up for D24236604 (16c52d918b).
For nets that pass the schema check, memonger preserves the in-placeness of operators that are already in-place, so we can safely enable it for correct input nets.
(Note: this ignores all push blocking failures!)
Differential Revision: D24402482
fbshipit-source-id: a7e95cb0e3eb87adeac79b9b69eef207957b0bd5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43987
This replaces the caffe2 CPU random number generator (std::mt19937) with at::mt19937, the one currently used in pytorch. The ATen RNG is 10x faster than the std one and appears to be more robust given bugs in the std implementation (https://fburl.com/diffusion/uhro7lqb).
For large embedding tables (10GB+) we see UniformFillOp taking upwards of 10 minutes because we're bottlenecked on the single-threaded RNG. Swapping to at::mt19937 cuts that time to about 10% of the current value.
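As a rough illustration (not the actual operator code), a fill loop using at::mt19937 could look like the sketch below; it assumes only the engine's seed constructor and 32-bit operator() from ATen/core/MT19937RNGEngine.h, and hand-rolls the scaling that the real op delegates to ATen's distribution helpers.
```cpp
#include <ATen/core/MT19937RNGEngine.h>
#include <cstdint>
#include <vector>

void uniform_fill(std::vector<float>& data, float lo, float hi, uint64_t seed) {
  at::mt19937 gen(seed);
  for (auto& v : data) {
    // Map a raw 32-bit draw into [lo, hi).
    const float u = static_cast<float>(gen()) / static_cast<float>(UINT32_MAX);
    v = lo + u * (hi - lo);
  }
}
```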
Test Plan: Ran all relevant tests + CI. This doesn't introduce new features (+ is a core change) so existing tests+CI should be sufficient to catch regressions.
Reviewed By: dzhulgakov
Differential Revision: D23219710
fbshipit-source-id: bd16ed6415b2933e047bcb283a013d47fb395814
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46424
Currently, if an exception occurs in a reporter thread, the process is killed via std::terminate. This adds support for handling the reporter exception when FLAGS_caffe2_handle_executor_threads_exceptions is set to true.
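A minimal sketch of the handling (names are illustrative; only the flag name comes from this diff):
```cpp
#include <exception>

std::exception_ptr first_reporter_exception;  // surfaced to the caller later

void run_reporter(bool handle_exceptions /* FLAGS_caffe2_handle_executor_threads_exceptions */) {
  try {
    // ... run the reporter net periodically ...
  } catch (...) {
    if (!handle_exceptions) {
      throw;  // previous behavior: escapes the thread and std::terminate fires
    }
    // New behavior: remember the first exception and rethrow it from the
    // executor instead of killing the process.
    first_reporter_exception = std::current_exception();
  }
}
```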
Test Plan: buck test mode/opt -c python.package_style=inplace //caffe2/caffe2/python:hypothesis_test //caffe2/caffe2:caffe2_test_cpu -- --stress-runs 100
Reviewed By: dahsh
Differential Revision: D24345027
fbshipit-source-id: 0659495c9e27680ebae41fe5a3cf26ce2f455cb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46110
## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145).
* We need a test to cover and demonstrate that we can cancel a stuck net and propagate the error through the plan executor.
## Summary
* Added PlanExecutorTest `ErrorPlanWithCancellableStuckNet` for the plan executor.
* Set cancelCount to zero at the beginning of tests to avoid global state being carried over in some test environments.
Test Plan:
## Unit Test Added
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 1000
```
Reviewed By: d4l3k
Differential Revision: D24226577
fbshipit-source-id: c834383bfe6ab50747975c229eb42a363eed3458
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46080
Temporary removal of ErrorPlanWithCancellableStuckNet; it will be filled out more later.
Test Plan:
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
```
remove a test
Reviewed By: fegin
Differential Revision: D24213971
fbshipit-source-id: e6e600bad00b45c726311193b4b3238f1700526e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45319
## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145)
* We need a test to cover and demonstrate that we can cancel a stuck net and propagate the error through the plan executor.
## Summary
* Added `ErrorPlanWithCancellableStuckNet` for the plan executor.
* We set up a plan with two nets: a stuck net with a blocking operator that never returns, and an error net with an op that throws. The test verifies that the plan throws and the stuck net is cancelled; a minimal sketch of the test shape follows below.
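A minimal sketch of the test shape, assuming gtest; the helper and the blocking/throwing ops are illustrative, and the real test registers its own operators:
```cpp
TEST(PlanExecutorTest, ErrorPlanWithCancellableStuckNet) {
  // The plan contains two concurrent substeps:
  //   net_stuck: a blocking op that only returns once Cancel() is called
  //   net_error: an op that throws immediately
  PlanDef plan_def = MakeErrorPlanWithCancellableStuckNet();  // hypothetical helper
  Workspace ws;
  cancelCount = 0;  // reset global test state up front
  ASSERT_THROW(ws.RunPlan(plan_def), std::exception);
  // The stuck net's blocking op must have been cancelled rather than left hanging.
  ASSERT_EQ(cancelCount, 1);
}
```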
Test Plan:
## Unit Test added
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
```
```
Summary
Pass: 400
ListingSuccess: 2
```
Reviewed By: d4l3k
Differential Revision: D23920548
fbshipit-source-id: feff41f73698bd6ea9b744f920e0fece4ee44438
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45981
This is a recommit of previously reverted D20850851 (3fbddb92b1).
TL;DR - combining condition_variables and atomics is a bad idea
https://stackoverflow.com/questions/49622713/c17-atomics-and-condition-variable-deadlock
This also adds some ifdefs to disable the death test for mobile, xplat and tsan builds since forking doesn't play nicely with them.
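For reference, a sketch of the safe pattern (illustrative names): the flag is only written under the same mutex the waiter uses, so the notify cannot be lost between the predicate check and the wait.
```cpp
#include <condition_variable>
#include <mutex>

std::mutex mu;
std::condition_variable cv;
bool done = false;  // plain bool, always accessed under mu

void signal_done() {
  {
    std::lock_guard<std::mutex> lock(mu);  // write the flag under the lock
    done = true;
  }
  cv.notify_all();
}

void wait_done() {
  std::unique_lock<std::mutex> lock(mu);
  cv.wait(lock, [] { return done; });  // predicate checked under the same lock
}
```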
Test Plan:
buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 1000 test_atomic_iter_with_concurrent_steps --timeout 120
buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 100
buck test mode/opt caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
no timeouts https://www.internalfb.com/intern/testinfra/testconsole/testrun/7036874440059883/
will ensure no timeouts in OSS
Reviewed By: walterddr, dahsh
Differential Revision: D24165505
fbshipit-source-id: 17cd23bfbcd9c2826a4067a387023d5186353196
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45297
If we have two concurrent substeps and one of them throws an exception while the other is blocking, we currently hang. This change waits up to 1 minute for the blocking substep to complete before terminating the process.
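Roughly, the waiting logic looks like this sketch (only the 1 minute grace period comes from this diff; the names are illustrative):
```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Returns true if the blocking substep finished within the grace period;
// otherwise the caller gives up and terminates the process.
bool wait_for_substep(std::condition_variable& cv, std::mutex& mu, bool& finished) {
  std::unique_lock<std::mutex> lock(mu);
  return cv.wait_for(lock, std::chrono::minutes(1), [&] { return finished; });
}
```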
Test Plan: buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
Reviewed By: dahsh
Differential Revision: D20850851
fbshipit-source-id: 330503775d8062a34645ba55fe38e6770de5e3c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44062
Previously, BackendSelect kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, and pin_memory, and they used hacky_wrapper to be callable. This caused a re-wrapping step: calling into a BackendSelect kernel required taking the individual scattered arguments and packing them into a TensorOptions, and the kernel itself then gathered them again for redispatch.
Now with this PR, BackendSelect kernels are written in the new way and no hacky_wrapper or rewrapping is needed for them.
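As a rough illustration of the difference, the declarations below are sketches, not the generated code:
```cpp
#include <ATen/ATen.h>

// Legacy style: the kernel took a packed TensorOptions, so the scattered
// dtype/layout/device/pin_memory had to be gathered into one object first.
at::Tensor empty_backend_select_legacy(
    at::IntArrayRef size, const at::TensorOptions& options);

// New style: the kernel takes the scattered arguments directly, matching the
// dispatcher calling convention, so no pack/unpack round trip is needed.
at::Tensor empty_backend_select(
    at::IntArrayRef size,
    c10::optional<at::ScalarType> dtype,
    c10::optional<at::Layout> layout,
    c10::optional<at::Device> device,
    c10::optional<bool> pin_memory);
```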
ghstack-source-id: 112825789
Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216117032/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170194/
Reviewed By: ezyang
Differential Revision: D23484192
fbshipit-source-id: e8fb49c4692404b6b775d18548b990c4cdddbada
Summary:
The `2to3` tool has a `future` fixer that specifically removes these redundant `__future__` imports; the `caffe2` directory has the most of them:
```2to3 -f future -w caffe2```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033
Reviewed By: seemethere
Differential Revision: D23808648
Pulled By: bugra
fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44145
## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are currently blocking and thus non-cancellable. If an error
occurs, we need to be able to safely stop all net execution so we can throw
the exception to the caller.
## Summary
* Adds `NetBase::Cancel()`, which iterates over the entire list of
operators and calls Cancel on each (see the sketch below).
* Cancel on all ops was added to Net since there's nothing Async specific about it.
* `AsyncSchedulingNet` calls the parent Cancel.
* To preserve backwards compatibility, `AsyncSchedulingNet`'s Cancel still calls
`CancelAndFinishAsyncTasks`.
* Adds `Cancel()` to `OperatorBase`.
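A sketch of the resulting behavior (member names are from the summary; treat the bodies as illustrative):
```cpp
void NetBase::Cancel() {
  // Iterate every operator in the net and ask it to cancel; blocking ops
  // override OperatorBase::Cancel() to unblock themselves.
  for (auto* op : GetOperators()) {
    op->Cancel();
  }
}

void AsyncSchedulingNet::Cancel() {
  NetBase::Cancel();             // cancel individual operators first
  CancelAndFinishAsyncTasks();   // kept for backwards compatibility
}
```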
Reviewed By: dzhulgakov
Differential Revision: D23279202
fbshipit-source-id: e1bb0ff04a4e1393f935dbcac7c78c0baf728550
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564
Static dispatch was originally introduced for mobile selective build.
Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23324452
Pulled By: ljk53
fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
Summary: Per title; makes C2 wrappers safer, as the contiguity of torch inputs is not guaranteed (see the sketch below).
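A minimal sketch of the guard, assuming the wrapper receives an at::Tensor (illustrative; the real change lives inside the wrapper code):
```cpp
#include <ATen/ATen.h>

// Ensure the tensor handed to the caffe2 wrapper is contiguous;
// contiguous() returns the input unchanged when it is already contiguous.
at::Tensor prepare_for_c2(const at::Tensor& input) {
  return input.contiguous();
}
```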
Test Plan: covered by existing tests
Reviewed By: dzhulgakov
Differential Revision: D23310137
fbshipit-source-id: 3fe12abc7e394b8762098d032200778018e5b591
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43027
Format db.h and db.cc using the default formatter.
This change was split off of D22705434.
Test Plan: Wait for sandcastle.
Reviewed By: rohithmenon, marksantaniello
Differential Revision: D23113765
fbshipit-source-id: 3f02d55bfb055bda0fcba5122336fa001562d42e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43239
This is an incremental step as part of the process to migrate caffe2 random number generator off of std::mt19937 and to instead use at::mt19937+at::CPUGeneratorImpl. The ATen variants are much more performant (10x faster).
This adds a way to get the CPUContext RandSeed for tail use cases that require a std::mt19937 and currently borrow the CPUContext one; a usage sketch follows below.
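A sketch of the intended tail use; the accessor name is taken from the summary, and its exact signature is an assumption:
```cpp
#include <random>
#include "caffe2/core/context.h"

void tail_use_case(caffe2::CPUContext& context) {
  // Seed a local std::mt19937 from the context instead of borrowing
  // caffe2's internal engine directly.
  std::mt19937 gen(context.RandSeed());  // assumed accessor from this diff
  std::uniform_int_distribution<int> dist(0, 9);
  const int sample = dist(gen);
  (void)sample;
}
```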
Test Plan: This isn't used anywhere within the caffe2 codebase. Compile should be sufficient.
Reviewed By: dzhulgakov
Differential Revision: D23203280
fbshipit-source-id: 595c1cb447290604ee3ef61d5b5fc079b61a4e14
Summary:
This diff NVMifies the NE Eval Flow.
- It defines a `LoadNVM` operator which:
  - either receives a list of NVM blobs, or extracts the blobs that could be NVMified from the model,
  - dumps the NVMified blobs into NVM,
  - and deallocates them from DRAM.
- NVMifies the Eval net on the dper and C2 backends.
Specific NVMOp for SLS is pushed through different diffs.
Test Plan: flow-cli test-locally dper.workflows.evaluation.eval_workflow --parameters-file=/mnt/public/ehsaardestani/temp/small_model.json 2>&1 | tee log
Reviewed By: yinghai, amylittleyang
Differential Revision: D22469973
fbshipit-source-id: ed8379ad404e96d04ac05e580176d3aca984575b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42249
Main change is to bring Caffe2's superior error messages for cuda initialization into c10 and use them in all code paths.
Basic logic:
| Case | Call to device_count() | init_cuda, e.g. allocating tensor |
| -- | -- | -- |
| all good | non-zero | just works |
| no gpus | 0, no warning | throw exception with good message |
| driver issues | 0, produce warning | throw exception with good message |
| out of memory with ASAN | 0, produce warning| throw exception with ASAN message |
Previously, the error thrown from init_cuda was very generic and the ASAN warning (if any) was buried in the logs.
Other clean up changes:
* always cache device_count() in a static variable (sketch below)
* move all ASAN macros into c10
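A sketch of the caching (illustrative, not the actual c10 code):
```cpp
#include <cuda_runtime.h>

int cached_device_count() {
  // Query the driver only once; later calls reuse the cached value.
  static int count = [] {
    int c = 0;
    if (cudaGetDeviceCount(&c) != cudaSuccess) {
      // No GPUs / driver issues: report 0 here and let the first real CUDA
      // initialization throw the descriptive exception instead.
      c = 0;
    }
    return c;
  }();
  return count;
}
```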
Test Plan:
Hard to unittest because of build modes. Verified manually that the behavior from the table above holds by running the following script in different modes (ASAN/no-ASAN, CUDA_VISIBLE_DEVICES=):
```
print('before import')
import torch
print('after import')
print('devices: ', torch.cuda.device_count())
x = torch.tensor([1,2,3])
print('tensor creation')
x = x.cuda()
print('moved to cuda')
```
Reviewed By: ngimel
Differential Revision: D22824329
fbshipit-source-id: 5314007313a3897fc955b02f8b21b661ae35fdf5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41461
capacity is misleading, and we have many wrong uses internally. Let's rename it to nbytes to avoid the confusion in the future. Ultimately, we could remove this parameter if possible.
So far I haven't seen any case where this capacity is necessary.
Test Plan: oss ci
Differential Revision: D22544189
fbshipit-source-id: f310627f2ab8f4ebb294e0dd5eabc380926991eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41096
The spark spot model had some issues in tensor conversion, see P134598596. It happens when we convert an undefined c10 tensor to a caffe2 tensor.
This diff adds a null check.
Test Plan: spark spot model runs without problem
Reviewed By: smessmer
Differential Revision: D22330705
fbshipit-source-id: dfe0f29a48019b6611cad3fd8f2ae49e8db5427e
Summary:
If a virtual function is implemented in a header file, its implementation will be included as a weak symbol in every shared library that includes this header, along with all of its dependencies.
This was one of the reasons the size of libcaffe2_module_test_dynamic.so was 500 KB (the AddRelatedBlobInfo implementation pulled a quarter of libprotobuf.a with it).
The combination of this and https://github.com/pytorch/pytorch/issues/40845 reduces the size of `libcaffe2_module_test_dynamic.so` from 500 KB to 50 KB.
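A sketch of the change (class and parameter are illustrative; AddRelatedBlobInfo is just the example mentioned above):
```cpp
// header (module.h): declaration only, no body in the header
class ModuleSchema {
 public:
  virtual ~ModuleSchema() = default;
  virtual void AddRelatedBlobInfo(int info);
};

// source (module.cc): the single, strong definition
void ModuleSchema::AddRelatedBlobInfo(int info) {
  // Implementation is emitted in exactly one translation unit,
  // instead of as a weak symbol in every library including the header.
  (void)info;
}
```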
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40844
Differential Revision: D22334725
Pulled By: malfet
fbshipit-source-id: 836a4cbb9f344355ddd2512667e77472546616c0
Summary:
… file
This prevents the implementations of those functions (as lambdas) from being embedded as weak symbols in every shared library that includes this header.
The combination of this and https://github.com/pytorch/pytorch/pull/40844 reduces the size of `libcaffe2_module_test_dynamic.so` from 500 KB to 50 KB.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40845
Differential Revision: D22334779
Pulled By: malfet
fbshipit-source-id: 64706918fc2947350a58c0877f294b1b8b085455
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40096
Declaring `tensor_proto` to be of type `auto` means that it will copy the entire `TensorProto` instead of just keeping a reference. This changes it to use a const reference instead.
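A sketch of the pitfall; the accessor below stands in for protobuf's const-reference getters and is not the exact call site:
```cpp
#include "caffe2/proto/caffe2_pb.h"

void deserialize(const caffe2::BlobProto& blob_proto) {
  // protobuf getters return a const reference; `auto` deep-copies the whole
  // TensorProto, while `const auto&` only binds to the existing message.
  const auto& tensor_proto = blob_proto.tensor();  // no copy
  // auto tensor_proto = blob_proto.tensor();      // would copy the message
  (void)tensor_proto;
}
```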
Test Plan:
Using the model loader benchmark to measure model loading performance:
### `tensor_proto` is of type `const auto&`
```
============================================================================
caffe2/caffe2/fb/predictor/ModelLoaderBenchmark.cpprelative time/iter iters/s
============================================================================
BlobProtoInt32DeserializationFloat16 11.08ms 90.27
BlobProtoByteDeserializationFloat16 1509.73% 733.73us 1.36K
----------------------------------------------------------------------------
BlobProtoInt32DeserializationUInt8 10.48ms 95.45
BlobProtoByteDeserializationUInt8 2974.57% 352.22us 2.84K
============================================================================
```
### `tensor_proto` is of type `auto`
```
============================================================================
caffe2/caffe2/fb/predictor/ModelLoaderBenchmark.cpprelative time/iter iters/s
============================================================================
BlobProtoInt32DeserializationFloat16 13.84ms 72.26
BlobProtoByteDeserializationFloat16 658.85% 2.10ms 476.08
----------------------------------------------------------------------------
BlobProtoInt32DeserializationUInt8 17.09ms 58.51
BlobProtoByteDeserializationUInt8 3365.98% 507.80us 1.97K
============================================================================
```
Reviewed By: marksantaniello
Differential Revision: D21959644
fbshipit-source-id: 6bc2dfbde306f88bf7cd4f9b14b95ac69c2e1b4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39493
Make sure we wait for all types, incl. async cpu ops
Test Plan: CI
Reviewed By: kennyhorror
Differential Revision: D21873540
fbshipit-source-id: 37875cade68e1b3323086833f8d4db79362a68e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39759
Caffe2 has a mode where it uses PT's caching allocator. Somehow we were not calling the initialization explicitly.
Now, I have no idea why it worked before. Probably worth running a bisect separately.
Reviewed By: houseroad
Differential Revision: D21962331
fbshipit-source-id: f16ad6b27a67dbe0bda93939cca8c94620d22a09
Summary:
Gets rid of some in-kernel asserts where they can be replaced with static_asserts.
Replaces a bare in-kernel `assert` with `CUDA_KERNEL_ASSERT` in the one case where a runtime check is necessary.
Replaces host-code `assert`s with `TORCH_INTERNAL_ASSERT`.
Another group of asserts is in the fractional max pooling kernels, which should be fixed regardless (https://github.com/pytorch/pytorch/issues/39044); the problems there are not just the asserts.
I've audited the remaining in-kernel asserts; they are more like `TORCH_INTERNAL_ASSERT`, so they should not fire on invalid user data. I think it's OK to leave them as is.
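For illustration, a sketch of the host-side substitutions (an assumed example, not code from this PR):
```cpp
#include <c10/util/Exception.h>
#include <cstdint>
#include <type_traits>

template <typename T>
void host_side_check(int64_t n) {
  // Compile-time property: no runtime assert needed at all.
  static_assert(std::is_floating_point<T>::value, "expected a floating point type");
  // Host-side runtime invariant: TORCH_INTERNAL_ASSERT survives release
  // builds and produces a useful message, unlike a bare assert().
  TORCH_INTERNAL_ASSERT(n >= 0, "negative element count: ", n);
}
```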
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39047
Differential Revision: D21750392
Pulled By: ngimel
fbshipit-source-id: e9417523a2c672284de3515933cb7ed166e56719
Summary:
Per title. https://github.com/pytorch/pytorch/issues/32719 essentially disabled asserts in cuda kernels in release builds. Asserts in cuda kernels are typically used to prevent invalid reads/writes, so without asserts invalid reads/writes become silent errors in most cases (sometimes they would still cause "illegal memory access" errors, but because of the caching allocator this usually won't happen).
We don't need two macros, CUDA_ALWAYS_ASSERT and CUDA_KERNEL_ASSERT, because all current asserts in cuda kernels are important to prevent illegal memory accesses, and they should never be disabled.
This PR removes the CUDA_ALWAYS_ASSERT macro and instead makes CUDA_KERNEL_ASSERT (which is commonly used in the kernels) an assertion in both release and debug builds.
Fixes https://github.com/pytorch/pytorch/issues/38771
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38943
Differential Revision: D21723767
Pulled By: ngimel
fbshipit-source-id: d88d8aa1b047b476d5340e69311e65aff4da5074
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38066
Increasing priority for PinnedCPUAllocator to make sure it is set when CUDA is enabled.
Test Plan: buck test mode/dev-nosan //vision/fair/detectron2/tests:test_export_caffe2 -- 'testMaskRCNNGPU \(test_export_caffe2\.TestCaffe2Export\)'
Reviewed By: ppwwyyxx
Differential Revision: D21465835
fbshipit-source-id: 643cff30d35c174085e5fde5197ddb05885b2e99
Summary:
Helps prevent accidental failures like the following:
```
..\caffe2\core\parallel_net_test.cc:303
The difference between ms and 350 is 41, which exceeds kTimeThreshold, where
ms evaluates to 391,
350 evaluates to 350, and
kTimeThreshold evaluates to 40.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37892
Differential Revision: D21417251
Pulled By: malfet
fbshipit-source-id: 300cff7042e466f014850cc7cc406c725d5d0c04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37776
* Remove type-specific size tracking in favor of byte size tracking in Storage and StorageImpl
* Changed numel() and set_numel() to nbytes() and set_nbytes()
* Added enum argument to Storage/StorageImpl constructor to indicate new meaning of the size parameter
* Update all callers of the changed API
Part of issue https://github.com/pytorch/pytorch/issues/33950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37028
Differential Revision: D21171334
Pulled By: ezyang
fbshipit-source-id: 37329a379de9a3a83cc5e9007e455a3e1c2d10b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37101
Fixes #36954.
The basic concept is to streamline the process of rethrowing
c10::Error with extra error information. This is in a few
steps:
- I completely remodeled the Error data type and the internal
invariants. Instead of manually adding in newlines, the
message stack formatting process is responsible for inserting
newlines and spacing as necessary. Call sites are then
modified to respect the new API model.
- The TORCH_RETHROW macro is added, which adds context to an error
message and then rethrows it (usage sketch below).
New internal assert failure looks like:
```
0 INTERNAL ASSERT FAILED at ../c10/test/util/exception_test.cpp:64, please report a bug to PyTorch.
Exception raised from TestBody at ../c10/test/util/exception_test.cpp:64 (most recent call first):
frame #0: <unknown function> + 0x6aab9 (0x7ff611d3aab9 in /data/users/ezyang/pytorch-tmp/build/lib/libc10.so)
frame #1: ...
```
Error message with context looks like:
```
This is an error
This is context 1
This is context 2
```
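A hedged sketch of the usage pattern for TORCH_RETHROW; the call site below is illustrative and the failing check is just a placeholder:
```cpp
#include <string>
#include <c10/util/Exception.h>

void load_blob(const std::string& name) {
  try {
    // ... deserialize the blob ...
    TORCH_CHECK(false, "This is an error");  // placeholder failure
  } catch (c10::Error& e) {
    // Attach context and rethrow; the context lines stack under the
    // original message, as in the example output above.
    TORCH_RETHROW(e, "While loading blob '", name, "'");
  }
}
```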
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21202891
Pulled By: ezyang
fbshipit-source-id: 361cadd16bc52e5886dba08e79277771ada76169
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37094
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21202892
Pulled By: ezyang
fbshipit-source-id: d59e6bffabd90cc734056bdce2cd1fe63262fab8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36850
Since all unboxing now happens after dispatch, which means that all c10 ops support unboxing, we can now use op.callBoxed() for all ops and no longer need callBoxedWorkaround (which was going through the JIT registry).
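A minimal sketch of calling through the boxed convention (the caller is illustrative; the handle and stack types are the existing dispatcher API):
```cpp
#include <ATen/core/dispatch/Dispatcher.h>

void call_boxed_example(const c10::OperatorHandle& op, torch::jit::Stack* stack) {
  // Arguments are pushed onto *stack* by the caller; callBoxed runs the op
  // through the boxed calling convention and leaves results on the stack.
  op.callBoxed(stack);
}
```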
ghstack-source-id: 102879558
Test Plan: waitforsandcastle
Differential Revision: D21102375
fbshipit-source-id: d1e041116563a9650d5a86b07eb96d217d8756f3