Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43987
This replaces the caffe2 CPU random number generator (std::mt19937) with at::mt19937, the one currently used in PyTorch. The ATen RNG is 10x faster than the std one and appears to be more robust given bugs in the std implementation (https://fburl.com/diffusion/uhro7lqb)
For large embedding tables (10GB+) we see UniformFillOp taking upwards of 10 minutes because we're bottlenecked on the single-threaded RNG. Swapping to at::mt19937 cuts that time to roughly 10% of its current value.
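As a rough illustration of the swap (a minimal sketch assuming only the public engine interfaces; the actual change lives in caffe2's fill/math code paths, not in standalone helpers like these):
```lang=cpp
#include <cstdint>
#include <random>
#include <ATen/core/MT19937RNGEngine.h>

// Both engines are Mersenne Twisters producing 32-bit outputs, so swapping
// one for the other at a call site looks roughly like this.
void fill_uniform_std(float* data, int64_t n, uint64_t seed) {
  std::mt19937 gen(static_cast<uint32_t>(seed));
  for (int64_t i = 0; i < n; ++i) {
    data[i] = gen() * (1.0f / 4294967296.0f);  // map [0, 2^32) to [0, 1)
  }
}

void fill_uniform_aten(float* data, int64_t n, uint64_t seed) {
  at::mt19937 gen(seed);  // ATen engine: same algorithm, markedly faster in practice
  for (int64_t i = 0; i < n; ++i) {
    data[i] = gen() * (1.0f / 4294967296.0f);
  }
}
```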
Test Plan: Ran all relevant tests plus CI. This doesn't introduce new features (and is a core change), so the existing tests and CI should be sufficient to catch regressions.
Reviewed By: dzhulgakov
Differential Revision: D23219710
fbshipit-source-id: bd16ed6415b2933e047bcb283a013d47fb395814
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45979
For some reason, we sometimes cannot write out the debug files. This shouldn't block the whole service, so we opt to log an error instead of throwing one.
Test Plan: Run the net_runner test at `/` and observe the error being printed out while the test still passes.
Reviewed By: ipiszy
Differential Revision: D24165081
fbshipit-source-id: a4e1d0479d54d741e615e3a00b3003f512394fd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45610
Also document this option in the usual documentation places.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D24058199
Pulled By: suo
fbshipit-source-id: 81574fbd042f47587e2c7820c726fac0f68af2a7
Summary:
Adding a calibration module called histogram binning:
Divide the prediction range (e.g., [0, 1]) into B bins. In each bin, use two parameters to store the number of positive examples and the total number of examples that fall into the bucket, so we essentially have a histogram of the model's predictions.
As a result, for each bin we have an empirical estimate of the real CTR (num_pos / num_example). We use this value as the final calibrated prediction whenever the pre-calibration prediction falls into the corresponding bin.
In this way, the predictions within each bin should be well calibrated if we have sufficient examples; that is, this calibration module gives us a fine-grained calibrated model.
Theoretically, this calibration layer can fix any uncalibrated model or prediction if we have sufficient bins and examples. It makes it possible to apply any kind of weight allocation to the training data without worrying about calibration issues.
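As a rough, self-contained sketch of the idea (illustrative only; the dper3 module's actual parameterization, smoothing, and fallback behavior are not reproduced here):
```lang=cpp
#include <vector>

// B bins over [0, 1]; each bin tracks (num_pos, num_example). At inference, a
// prediction is replaced by the empirical positive rate of the bin it falls
// into, falling back to the raw prediction for empty bins.
struct HistogramBinningCalibration {
  explicit HistogramBinningCalibration(int num_bins)
      : pos_(num_bins, 0.0), total_(num_bins, 0.0) {}

  int bin_of(double prediction) const {
    const int num_bins = static_cast<int>(pos_.size());
    int b = static_cast<int>(prediction * num_bins);
    if (b < 0) b = 0;
    if (b >= num_bins) b = num_bins - 1;
    return b;
  }

  void observe(double prediction, bool label) {
    const int b = bin_of(prediction);
    total_[b] += 1.0;
    if (label) pos_[b] += 1.0;
  }

  double calibrate(double prediction) const {
    const int b = bin_of(prediction);
    return total_[b] > 0.0 ? pos_[b] / total_[b] : prediction;
  }

  std::vector<double> pos_, total_;
};
```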
Test Plan:
buck test dper3/dper3/modules/calibration/tests:calibration_test -- test_histogram_binning_calibration
buck test dper3/dper3_models/ads_ranking/tests:model_paradigm_e2e_tests -- test_sparse_nn_histogram_binning_calibration
All tests passed.
Example workflows:
f215431958
{F326445092}
f215445048
{F326445223}
Reviewed By: chenshouyuan
Differential Revision: D23356450
fbshipit-source-id: c691b66c51ef33908c17575ce12e5bee5fb325ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41606
The previous diff (D22220798 (59294fbbb9) and D22220797) was recently reverted (D22492356 (28291d3cf8), D22492355) because of a bug associated with the AsyncIf op. The AsyncIf op has NetDefs as args, and the SSA rewriting didn't take that into account: it has a special path for the If op, but not for AsyncIf. I made several changes to fix the bug:
1) Add the AsyncIf op to the special path used for the If op in SSA rewriting.
2) Clear the inputs/outputs of the NetDefs that are args in If/AsyncIf ops, because they're no longer valid.
3) Revert renamed inputs/outputs in the arg NetDefs that appear in the external_outputs of the parent NetDef.
Items 2) and 3) fix existing bugs in the `SsaRewrite` function that were just never exposed before.
The algorithm for `RemoveOpsByType` is the same as in my previous diff D22220798 (59294fbbb9). The only new changes in this diff are in `onnx::SsaRewrite` and a few newly added unit tests.
(Note: this ignores all push blocking failures!)
Reviewed By: yinghai
Differential Revision: D22588652
fbshipit-source-id: ebb68ecd1662ea2bae14d4be8f61a75cd8b7e3e6
Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.
Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`
Reviewed By: xcheng16
Differential Revision: D22199952
fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243
*** Why ***
As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool. Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version.
The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point. That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks. With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene. As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands.
This is a tricky change though: to guard against potential performance regressions (of which I have witnessed none, but out of an abundance of caution), we have decided to keep using C2's internal implementation whenever building for Caffe2. Again, this is mainly to avoid potential performance regressions in production C2 use cases, even if, as far as I can tell, doing so results in reduced performance.
So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK, since XNNPACK does not provide any build options to link against an external implementation the way NNPACK and QNNPACK do.
The goal of this PR, then, is to unify all usage on mobile onto the third party implementation, both for improved performance and better code hygiene. This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all of which get routed to the exact same third party implementation in this PR.
Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well. The implementation of ATen parallel_for on non-mobile builds remains unchanged.
*** How ***
This is where things get tricky.
A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use.
pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary or a symbol collision will occur, violating the ODR. This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation. In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL and other variables can result in. Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes the side effects, considering the significance of the performance we are leaving on the table; yet, as a result of the combinatorial explosion explained above, I cannot guarantee that every single combination will work as expected on the first try. I am heavily relying on CI to find any issues, as local testing can only go so far.
Having said that, this PR provides a simple, non-mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool, that automatically routes to C2's implementation or the third party version depending on the build configuration. This simplifies the logic at the cost of pushing the complexity to the build scripts. From there on, this thread pool is used in ATen parallel_for and in NNPACK and family, again routing all threading to the C2 or third party pthreadpool depending on the build configuration.
When all is said and done, the layering will look like this:
a) aten::parallel_for, uses
b) caffe2::PThreadPool, which uses
c) pthreadpool C API, which delegates to
c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here.
c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to
c-2-1) caffe2::ThreadPool, and the rabbit hole ends here.
NNPACK and (PyTorch) QNNPACK hook directly into (c); they never go through (b).
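Callers are unaffected by this unification; for example, an aten::parallel_for call such as the sketch below keeps working as-is, and only the pthreadpool implementation servicing it underneath changes with the build configuration:
```lang=cpp
#include <vector>
#include <ATen/Parallel.h>

// The caller only sees (a); on mobile builds the work is dispatched through
// caffe2::PThreadPool (b) into the pthreadpool C API (c) as described above.
void scale(std::vector<float>& v, float alpha) {
  at::parallel_for(0, v.size(), /*grain_size=*/2048, [&](int64_t begin, int64_t end) {
    for (int64_t i = begin; i < end; ++i) {
      v[i] *= alpha;
    }
  });
}
```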
Differential Revision: D21232894
Test Plan: Imported from OSS
Reviewed By: dreiss
Pulled By: AshkanAliabadi
fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39277
This PR contains initial changes that makes PyTorch build with Ampere GPU, CUDA 11, and cuDNN 8.
TF32 related features will not be included in this PR.
Test Plan: Imported from OSS
Differential Revision: D21832814
Pulled By: malfet
fbshipit-source-id: 37f9c6827e0c26ae3e303580f666584230832d06
Summary:
Because MacOS is not iOS
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37283
Test Plan: CI
Differential Revision: D21244398
Pulled By: malfet
fbshipit-source-id: b822e216e83887e2f2961b5c5384eaf749629f61
Summary:
- Add a couple of checks for USE_XNNPACK to disable additional code paths if XNNPACK is not supported.
When passing through the code paths where the platform checks are made (cmake/Dependencies.cmake:89), if XNNPACK is not supported, then the variable FXDIV_SOURCE_DIR will not be set. CMake emits errors when add_subdirectory is called with an empty FXDIV_SOURCE_DIR.
see: https://github.com/pytorch/pytorch/issues/34606
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35607
Differential Revision: D20895645
Pulled By: seemethere
fbshipit-source-id: 3bd10cf89f0fb6825fdd6e1d52c71ee37c67b953
Summary: ATenOp should go away, but before it does it's important to understand what's going on inside of it. We already log `arguments`, but it's rather hard to parse in Scuba since it's a list, not a dictionary. Let's extract the operator name explicitly so that grouping works well.
Test Plan: unittest
Reviewed By: ngimel
Differential Revision: D21057966
fbshipit-source-id: 86be7cca39055620477a28bd5d8ab29e8edd2ff9
Summary:
`std::mismatch(InputIt1 first1, InputIt1 last1, InputIt2 first2)` assumes that the container behind the `first2` iterator contains at least `last1 - first1` elements, which is not the case if `prefix` is longer than `str`.
Found while running unit tests on Windows
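A sketch of the hazard and the guarded fix (illustrative helper names, not the exact function in the tree):
```lang=cpp
#include <algorithm>
#include <string>

// Unsafe: std::mismatch(first1, last1, first2) reads up to last1 - first1
// elements from the second range, so a prefix longer than str walks off the
// end of str.
bool starts_with_unsafe(const std::string& str, const std::string& prefix) {
  return std::mismatch(prefix.begin(), prefix.end(), str.begin()).first == prefix.end();
}

// Safe: bail out early when the prefix cannot possibly fit.
bool starts_with(const std::string& str, const std::string& prefix) {
  return prefix.size() <= str.size() &&
      std::mismatch(prefix.begin(), prefix.end(), str.begin()).first == prefix.end();
}
```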
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36672
Differential Revision: D21049407
Pulled By: malfet
fbshipit-source-id: ad45779d47a0c6898900e0247c920829a2179f62
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36371
It allows us to drop a circular dependency and remove unknown_symbols in the Buck build.
It'd be good to get rid of GetCpuId altogether in favor of cpuinfo, but it's not really blocking anything.
Reviewed By: malfet
Differential Revision: D20958000
fbshipit-source-id: ed17a2a90a51dc1adf9e634af56c85f0689f8f29
Summary:
Does the same things as D19658565 but for Caffe2 models.
From the investigation in https://fb.quip.com/PbgsAEmoJVuf, the model id that predictor uses and the model id saved inside the model don't match. A common reason is recurring fluent2 jobs, but there are others.
Since the model_id from predictor is what the rest of the datasets use, it's way more useful imho. I've considered adding both ids, but it'd require additional piping and I don't think it's that useful.
Test Plan: unittests added
Reviewed By: houseroad
Differential Revision: D20630599
fbshipit-source-id: 3e6d0cb0b6f8c8b6ae5935138f55ae7a2ff60653
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34625
These templated function calls are not specifying the template args correctly. The first arg is the index type, not the array data type. That means, right now it's using `T` as the index type as well, which will break if we do a template specialization for uint8_t. If we omit both, it will correctly infer that the index type is `int` and the data type is `T`.
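The shape of the problem, using hypothetical names rather than the actual caffe2 call sites:
```lang=cpp
#include <cstdint>

// Hypothetical signature mirroring the pattern described above: the first
// template parameter is the index type, the second is the array data type.
template <typename IndexType, typename DataType>
void GatherRows(const DataType* src, const IndexType* idx, int64_t n, DataType* dst) {
  for (int64_t i = 0; i < n; ++i) {
    dst[i] = src[idx[i]];
  }
}

template <typename T>
void caller(const T* src, const int* idx, int64_t n, T* dst) {
  // Wrong: GatherRows<T>(src, idx, n, dst) binds T to IndexType, which breaks
  // once the data type gets a uint8_t specialization.
  // Right: omit both arguments and let deduction infer IndexType = int,
  // DataType = T.
  GatherRows(src, idx, n, dst);
}
```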
Reviewed By: BIT-silence
Differential Revision: D20358728
fbshipit-source-id: 8cbd8eeb14bce602c02eb6fce2cc141f0121fa24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34321
Mostly cosmetic as we can infer the shape anyway. It can remove a lot of the noise in the log though.
Note that weight sharing doesn't work yet. I'll add another diff to address this.
Reviewed By: houseroad
Differential Revision: D20290841
fbshipit-source-id: fe6f9b60d05dbe150af15b5d9d7a69fd902e12cc
Summary:
We get a segfault without this when using XNNPACK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34087
Differential Revision: D20199787
Pulled By: kimishpatel
fbshipit-source-id: d3d274e7bb197461632b21688820cd4c10dcd819
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33957
Lots of small preprocessor warning cleanups for Windows.
Test Plan: CI green
Reviewed By: malfet, albanD
Differential Revision: D20153582
fbshipit-source-id: 18fd61c466fd1f55ededdae4448b3009a9cedc04
Summary:
Mainly renames C2's pthread_create, the only conflicting symbol referenced internally by NNPACK, to pthread_create_c2.
Removed 2 other conflicting symbols that are not used internally at all.
Pointing XNNPACK to the original repo instead of the fork.
Copy-pasted the new interface and implementation to caffe2/utils/threadpool, so that internal builds compile against this.
When the threadpool is unified, this will be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33869
Differential Revision: D20140580
Pulled By: kimishpatel
fbshipit-source-id: de70df0af9c7d6bc065e85ede0e1c4dd6a9e6be3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33574
Sprinkle the Clang identification macro over the places that would otherwise cause build errors when Clang is used to drive the CUDA compilation.
Note: `__clang__` is defined both when Clang is used as the host compiler by NVCC and when Clang drives the compilation itself. `__CUDA__` is defined only in the latter case.
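A minimal sketch of the guard pattern being sprinkled in (the exact sites and workarounds vary):
```lang=cpp
// __clang__ alone cannot distinguish "Clang as NVCC's host compiler" from
// "Clang driving the CUDA compilation"; only the latter also defines __CUDA__.
#if defined(__clang__) && defined(__CUDA__)
  // Workarounds specific to Clang compiling the CUDA code itself.
#else
  // NVCC path (possibly with Clang as the host compiler).
#endif
```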
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Reviewed By: BIT-silence
Differential Revision: D20007440
fbshipit-source-id: 53caa70695b99461a3910d41dc71a9f6d0728a75
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33554
NVCC/GCC accept the existing syntax, but Clang does not; it requires a proper escape. Here `%laneid` is one of the many special registers that CUDA's pseudo-asm provides [1]. Using the extra `%` doesn't change the semantics, as PTX still sees `%laneid` after the string is processed by the asm tool.
1. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html
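A minimal sketch of the escaped form in device code (the actual call site differs):
```lang=cpp
// Inside a __device__ function: read the lane index via inline PTX.
// NVCC/GCC accept a single '%', but Clang requires it to be escaped as '%%'
// inside the asm template; PTX still receives '%laneid'.
__device__ unsigned int lane_id() {
  unsigned int laneid;
  asm volatile("mov.u32 %0, %%laneid;" : "=r"(laneid));
  return laneid;
}
```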
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Reviewed By: bddppq
Differential Revision: D20003621
fbshipit-source-id: 8e550e55a3455925e7bd92c6df3e504b5d38c2dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33523
When using `ThreadPool::setNumThreads` to set the number of threads, it should not exceed the number of big cores. Otherwise, the performance could degrade significantly.
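The intended policy, as a hedged sketch (a free function with illustrative names; the real logic lives inside caffe2::ThreadPool):
```lang=cpp
#include <algorithm>
#include <cstddef>
#include <iostream>

// On big.LITTLE SoCs, scheduling pool threads beyond the big cluster pushes
// work onto slow cores; cap the requested count at the number of big cores.
size_t clampToBigCores(size_t requested, size_t num_big_cores) {
  if (requested > num_big_cores) {
    std::cerr << "Requested " << requested << " threads; capping at "
              << num_big_cores << " big cores\n";
  }
  return std::min(requested, num_big_cores);
}
```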
Test Plan:
```
cd ~/fbsource/xplat
buck test caffe2:caffe2_testAndroid
```
Reviewed By: dreiss
Differential Revision: D19779267
fbshipit-source-id: 4e980e8a0ccc2f37e1c8ed16e2f4651d72924dbd
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31911
Test Plan:
* CI builds including GPU and OSS-build tests
* The `defined(__HIP_DEVICE_COMPILE__)` instance a few lines below is proof that this is a define/undef flag, not a define01 flag
Reviewed By: hlu1
Differential Revision: D19296560
fbshipit-source-id: 1c45069aec534b0bf4a87751a74680675c985e06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915
Since we now have C++14, we don't need these c10::guts helpers anymore
ghstack-source-id: 95777609
Test Plan: waitforsandcastle
Differential Revision: D18869639
fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29885
### Summary
Currently, we have a deadlock issue on iOS when running Resnet50. The problem happens when a task running in the ThreadPool calls `getNumThread()`, which tries to acquire the same mutex, thus causing the deadlock. The fix is to simply remove the guard for `_numThreads`, as it's not likely to change after initialization.
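A minimal sketch of the deadlock pattern (illustrative toy code, not the actual caffe2::ThreadPool implementation):
```lang=cpp
#include <functional>
#include <mutex>

// run() holds the pool mutex while executing the task; if the task calls
// getNumThreads(), which takes the same (non-recursive) mutex, the thread
// deadlocks on itself.
struct ToyThreadPool {
  size_t getNumThreads() {
    std::lock_guard<std::mutex> guard(mutex_);  // blocks forever if called from run()
    return num_threads_;
  }

  void run(const std::function<void()>& task) {
    std::lock_guard<std::mutex> guard(mutex_);
    task();  // task -> getNumThreads() -> same mutex -> deadlock
  }

  std::mutex mutex_;
  size_t num_threads_ = 1;
};
```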
### Test Plan
1. Generate a Resnet50 model using trace_model.py
2. Run `ios/TestApp/bootstrap.sh` to do the benchmark
cc shoumikhin AshkanAliabadi
Test Plan: Imported from OSS
Differential Revision: D18533505
Pulled By: xta0
fbshipit-source-id: 2a069d20b59833ec8b02ff05515c3739a85a15de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27001
This unconditional log line spams the logs enough that it's a drag on CPU and will eventually fill up the logs.
Test Plan: Allow unit test and automated testing to give feedback.
Reviewed By: jspark1105
Differential Revision: D17638140
fbshipit-source-id: 4e8a44bda31327ba7e797f7579a9e3bf866eef7e
Summary:
Rename the old mobile_threadpool() API and replace it with a new version that returns caffe2::ThreadPool instead of pthreadpool_t.
Test Plan: - builds
Differential Revision: D17543413
Pulled By: ljk53
fbshipit-source-id: a3effd24e8ce9d677a2a04ebe6b6e1582e6f0a65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650
This PR removes protobuf dependencies from mobile build altogether:
- caffe2/proto: protobuf files, including caffe2.proto and torch.proto;
- caffe2 components that depend on caffe2.proto, including most part of
caffe2/core, caffe2/utils;
- libprotobuf / libprotobuf-lite dependencies;
- protobuf compiler;
- some utility classes, e.g. netdef_converter.cpp;
- introduce a macro to disable third_party/onnx which depends on protobuf;
Test Plan:
- builds;
- link with demo app to make sure it can load and run a model in pickle format;
Differential Revision: D17183548
Pulled By: ljk53
fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25671
To decouple string_utils.h from types.h and protobuf headers.
Logically GetDimFromOrderString seems to be more similiar to
StringToStorageOrder comparing to other string_utils functions.
Test Plan: - Will check all internal/external CI jobs.
Reviewed By: yinghai
Differential Revision: D17191912
Pulled By: ljk53
fbshipit-source-id: fe555feef27bfd74c92b6297c12fb668252ca9ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25620
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25602
Enable rocThrust with hipCUB and rocPRIM for ROCm. They are the ROCm implementations of the thrust and cub APIs and replace the older hip-thrust and cub-hip packages going forward. ROCm 2.5 is the first release to contain the new packages as an option; as of 2.6 they will be the only available option.
Add hipification rules to correctly hipify thrust::cuda to thrust::hip and cub:: to hipcub:: going forward. Add hipification rules to hipify specific cub headers to the general hipcub header.
Infrastructure work to correctly find, include and link against the new packages. Add the macro definition to choose the HIP backend to Thrust.
Since include chains are now a little different from CUDA's Thrust, add includes for functionality used where applicable.
Skip four tests that fail with the new rocThrust for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21864
Reviewed By: xw285cornell
Differential Revision: D16940768
Pulled By: bddppq
fbshipit-source-id: 3dba8a8f1763dd23d89eb0dd26d1db109973dbe5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23658
**How things work for caffe2:**
Caffe2 Ops -> NNPACK/QNNPACK -> pthreadpool_compute_1/2/3/4d_tiled -> pthreadpool_compute_1d (caffe2 shim) -> caffe2::ThreadPool
**Before this PR:**
Pytorch Ops -> NNPACK/QNNPACK -> pthreadpool_compute_1/2/3/4d_tiled -> pthreadpool_compute_1d (third_party implementation without mobile optimization)
caffe2::ThreadPool is optimized for mobile. This change leverages that logic for pytorch mobile as a temporary solution to improve pytorch mobile perf. It is guarded by the C10_MOBILE macro.
For server side we return nullptr.
**Plan for next steps:**
Implement a mobile version of "at::parallel_for" which uses caffe2::ThreadPool internally so all ATen/TH multithreading usage is mobile optimized.
Refactor QNNPACK and/or pthreadpool to explicitly use the "at::parallel_for" primitive to replace pthreadpool_compute_1d for PyTorch.
After QNNPACK is refactored, we will delete the mobile_threadpool() API.
ghstack-source-id: 88073396
Reviewed By: dreiss
Differential Revision: D16594020
fbshipit-source-id: 9f94600756d5f86d24a12a2fd7df3eebd0994f1d
Summary:
These implicit fallthroughs lead to the following warning on g++ 7, because g++ cannot recognize the implicit `abort` call inside `LOG(FATAL)`. We suppress the warning by adding explicit `return`s; a sketch of the fix follows the warning output below.
```
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc: In function void caffe2::math::GemmEx(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, T, const T*, int, const T*, int, T, T*, int, Context*) [with T = float; Context = caffe2::CPUContext; Engine = caffe2::DefaultEngine]:
/home/hong/wsrc/pytorch/c10/util/logging_is_not_google_glog.h:98:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
   ::c10::MessageLogger((char*)__FILE__, __LINE__, n).stream()
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:179:11: note: in expansion of macro LOG
   LOG(FATAL) << "Unexpected CBLAS_TRANSPOSE for trans_B";
   ^
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:182:5: note: here
   case CblasTrans: {
   ^~~~
In file included from /home/hong/wsrc/pytorch/c10/util/Logging.h:28:0,
                 from /home/hong/wsrc/pytorch/caffe2/core/logging.h:2,
                 from /home/hong/wsrc/pytorch/caffe2/core/types.h:9,
                 from /home/hong/wsrc/pytorch/caffe2/utils/math.h:17,
                 from /home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:14:
/home/hong/wsrc/pytorch/c10/util/logging_is_not_google_glog.h:98:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
   ::c10::MessageLogger((char*)__FILE__, __LINE__, n).stream()
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:202:11: note: in expansion of macro LOG
   LOG(FATAL) << "Unexpected CBLAS_TRANSPOSE for trans_B";
   ^
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:205:5: note: here
   default:
   ^~~~~~~
```
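A sketch of the fix pattern (not the actual GemmEx body; the CBLAS enums and LOG come from the surrounding cblas and c10 headers):
```lang=cpp
// g++ 7 does not know that LOG(FATAL) aborts, so a case that ends with it
// looks like an implicit fallthrough into the next case. The explicit
// (unreachable) returns after LOG(FATAL) silence -Wimplicit-fallthrough
// without changing behavior.
void GemmSketch(CBLAS_TRANSPOSE trans_A, CBLAS_TRANSPOSE trans_B) {
  switch (trans_A) {
    case CblasNoTrans: {
      switch (trans_B) {
        case CblasNoTrans:
          // ... N/N GEMM path ...
          return;
        case CblasTrans:
          // ... N/T GEMM path ...
          return;
        default:
          LOG(FATAL) << "Unexpected CBLAS_TRANSPOSE for trans_B";
          return;  // unreachable; suppresses the fallthrough warning
      }
    }  // without the return above, g++ warns this can fall through ...
    case CblasTrans: {  // ... into here (the "note: here" in the output above)
      // ... T/* GEMM paths ...
      return;
    }
    default:
      LOG(FATAL) << "Unexpected CBLAS_TRANSPOSE for trans_A";
      return;
  }
}
```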
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24053
Differential Revision: D16732530
Pulled By: ezyang
fbshipit-source-id: 90373879f25b52efca5bf151c7ed58d6ad19d925