Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71746
This PR contains the following improvements:
- It exposes a new environment variable `TORCH_CPP_LOG_LEVEL` that enables users to set the log level of the c10 logging facility (supports both GLOG and the c10 loggers). Valid values are `INFO`, `WARNING`, `ERROR`, and `FATAL`, or their numerical equivalents `0`, `1`, `2`, and `3`.
- It implements an `initLogging()` function and calls it as part of the `torch._C` module import to ensure that the underlying logging facility is correctly initialized in Python.
With these changes, a user can dynamically set the log level of c10, as in the following example:
```
$ TORCH_CPP_LOG_LEVEL=INFO python my_torch_script.py
```
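A minimal sketch of what the env-var handling could look like; the helper name `ParseLogLevelFromEnv` and the fallback behavior are illustrative assumptions, not the actual implementation:
```
#include <cstdlib>
#include <string>

// Map TORCH_CPP_LOG_LEVEL (name or number) to a numeric severity.
// Unrecognized or unset values fall back to the caller-provided default.
int ParseLogLevelFromEnv(int default_level) {
  const char* value = std::getenv("TORCH_CPP_LOG_LEVEL");
  if (value == nullptr) {
    return default_level;
  }
  const std::string level(value);
  if (level == "INFO"    || level == "0") return 0;
  if (level == "WARNING" || level == "1") return 1;
  if (level == "ERROR"   || level == "2") return 2;
  if (level == "FATAL"   || level == "3") return 3;
  return default_level;
}
```
The real `initLogging()` would forward this value to whichever backend (GLOG or the c10 logger) the build uses.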
ghstack-source-id: 149822703
Test Plan: Run existing tests.
Reviewed By: malfet
Differential Revision: D33756252
fbshipit-source-id: 7fd078c03a598595d992de0b474a23cec91838af
(cherry picked from commit 01d6ec6207faedf259ed1368730e9e197cb3e1c6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56641
Currently, ddpLoggingData is a flat struct, which requires both internal DDP developers and external users to know the struct field names. This is not flexible for deleting or adding fields in the future, and it also makes ddpLoggingData hard to access.
With maps/dicts, developers and users can easily access the fields without hard-coding field names, and adding or removing a field becomes easier.
Since C++ does not support maps whose values have different types, ddpLoggingData now contains two types of maps (see the sketch below).
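A minimal sketch of the two-map layout, assuming one string-valued and one integer-valued map; the field names `strs_map` and `ints_map` are illustrative:
```
#include <cstdint>
#include <map>
#include <string>

// Map-based DDPLoggingData: one map per value type, since a single C++ map
// cannot hold heterogeneous values.
struct DDPLoggingData {
  std::map<std::string, std::string> strs_map;  // e.g. "backend_name" -> "nccl"
  std::map<std::string, int64_t> ints_map;      // e.g. "world_size" -> 8
};
```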
ghstack-source-id: 127482694
Test Plan: unit tests
Reviewed By: SciPioneer
Differential Revision: D27923723
fbshipit-source-id: c90199c14925fc50ef219000e2f809dc7601cce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54919
Log the use of the uneven inputs API for better tracking and use-case detection.
ghstack-source-id: 125446499
Test Plan: CI, added ut
Reviewed By: zhaojuanmao, SciPioneer
Differential Revision: D27410764
fbshipit-source-id: abc8055a2e15a3ee087d9959f8881b05a0ea933e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54649
Some operator<< code manually implemented string join in C++; it turns out there is a c10 util for this. Use the util instead of rolling our own.
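Presumably the util in question is `c10::Join` from `c10/util/StringUtil.h`; a hedged sketch of the replacement pattern:
```
#include <ostream>
#include <string>
#include <vector>

#include <c10/util/StringUtil.h>

struct ParamNames {
  std::vector<std::string> names;
};

// Before: a hand-rolled loop that appended ", " between elements.
// After: delegate the joining to the existing c10 utility.
std::ostream& operator<<(std::ostream& os, const ParamNames& p) {
  return os << "[" << c10::Join(", ", p.names) << "]";
}
```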
ghstack-source-id: 124840043
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27316705
fbshipit-source-id: 5118097f84be2f38a503d8f81faa38c8d95ec17a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53303
The old code did an unnecessary heap allocation and was a little convoluted. I think it was structured that way to avoid double-evaluating arguments; I just forced them to be evaluated once, as though they were passed to a function, by binding const references to them (see the sketch below).
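A sketch of the single-evaluation pattern described above; `MY_ENFORCE_GE` is a stand-in, not the real CAFFE_ENFORCE_GE:
```
#include <stdexcept>
#include <string>

// Binding the operands to const references evaluates each argument exactly
// once, just as passing them to a function would, while keeping the failure
// path (string construction, throw) out of the happy path.
#define MY_ENFORCE_GE(a, b)                                        \
  do {                                                             \
    const auto& a_ = (a); /* evaluated once */                     \
    const auto& b_ = (b); /* evaluated once */                     \
    if (!(a_ >= b_)) {                                             \
      throw std::runtime_error(std::string("Enforce failed: ") +   \
                               #a " >= " #b);                      \
    }                                                              \
  } while (false)

int next_id() { static int id = 0; return ++id; }  // side-effecting expression

void example(int limit) {
  MY_ENFORCE_GE(limit, 0);      // `limit` evaluated once
  MY_ENFORCE_GE(next_id(), 1);  // the call happens exactly once
}
```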
ghstack-source-id: 123918262
Test Plan:
1) `buck run mode/opt-clang //caffe2/caffe2/fb/tests:logging_bench`
Before:
```
============================================================================
caffe2/caffe2/fb/tests/logging_bench.cpp      relative  time/iter  iters/s
============================================================================
glog_CHECK                                                 2.01ns   498.63M
caffe2_ENFORCE_GE                               50.00%     4.01ns   249.31M
glog_CHECK_GE                                   17.39%    11.53ns    86.73M
fbcode_ENFORCE                                 100.00%     2.01ns   498.65M
caffe2_ENFORCE                                 100.00%     2.01ns   498.63M
caffe2_ENFORCE_THAT                             50.00%     4.01ns   249.33M
============================================================================
```
After:
```
============================================================================
caffe2/caffe2/fb/tests/logging_bench.cpp      relative  time/iter  iters/s
============================================================================
glog_CHECK                                                 2.01ns   498.63M
caffe2_ENFORCE_GE                               97.44%     2.06ns   485.88M
glog_CHECK_GE                                   17.39%    11.53ns    86.73M
fbcode_ENFORCE                                 100.00%     2.01ns   498.65M
caffe2_ENFORCE                                 100.00%     2.01ns   498.65M
caffe2_ENFORCE_THAT                             97.28%     2.06ns   485.06M
============================================================================
```
Looks like about a 1.94x speedup!
2) Inspect generated assembly for logging_bench.cpp before & after by:
```
$ compile-commands caffe2/caffe2/fb/tests/logging_bench.cpp -f "mode/opt-clang"
$ jq -r '.[0].arguments | sh' < compile_commands.json | sed -e "s/'-c'/'-S'/g" | sed -E -e "s/'-g[12]'/'-g0'/g" > out.sh
$ sh out.sh
```
Then diff logging_bench.s as you like.
Before: P255408666
After: P277883307
Net about 1500 lines deleted from the assembly. We can see that the
happy path (which the benchmark tests) no longer contains string
creation.
Reviewed By: dzhulgakov
Differential Revision: D26829714
fbshipit-source-id: 6e11f8ea29292ae3d9f2cc89d08afcb06f7d39c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53162
It is possible that there are multiple data types in mixed precision training, so log the data types as a list of data type names.
ghstack-source-id: 123452626
Test Plan: unit test
Reviewed By: SciPioneer
Differential Revision: D26769256
fbshipit-source-id: 8f7d73821e89864fedbbce723f301fe8fbad5685
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53145
Add a new API that allows users to set the sample rate for runtime stats, and add per-iteration latency breakdowns to the DDPLoggingData struct. For example,
if users set the sample rate to 1, they can analyze how per-iteration latency changes over time (not averaged).
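A sketch of how sample-rate gating could work; the names below are hypothetical, not the actual API:
```
#include <cstdint>

// With sample_rate == 1, stats are collected every iteration, so the
// per-iteration latency can be inspected directly rather than averaged
// over sampled iterations.
struct RuntimeStatsSampler {
  int64_t sample_rate = 100;   // collect once every `sample_rate` iterations
  int64_t num_iterations = 0;

  bool should_collect_this_iteration() {
    ++num_iterations;
    return sample_rate > 0 && (num_iterations % sample_rate == 0);
  }
};
```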
ghstack-source-id: 123443369
Test Plan: unit test
Reviewed By: SciPioneer
Differential Revision: D26763957
fbshipit-source-id: baff6a09c2a590e6eb91362ca6f47ae8fa6ddb0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52966
Logs the registered comm hook if there is one; otherwise logs "builtin_allreduce".
ghstack-source-id: 123174803
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D26709388
fbshipit-source-id: 484fdbbd6643ec261b3797bd8d9824b2b6a1a490
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52223
After the previous diffs, `c10::str()` will return a
`CompileTimeEmptyString` when passed 0 arguments and a `const char*` when
passed a single `const char*` argument. We can take advantage of this to
outline (move out of the inlined fast path) further std::string creation from CAFFE_ENFORCE.
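A simplified sketch of the overload shape this relies on (not the actual c10 implementation):
```
#include <sstream>
#include <string>

// Returned for zero-argument calls; converts to an empty C-string so the
// happy path never has to build a std::string.
struct CompileTimeEmptyString {
  operator const char*() const { return ""; }
};

inline CompileTimeEmptyString str() { return {}; }    // 0 arguments
inline const char* str(const char* s) { return s; }   // 1 C-string argument

template <typename... Args>
std::string str(const Args&... args) {                // general case
  std::ostringstream oss;
  (oss << ... << args);  // C++17 fold expression, for brevity in this sketch
  return oss.str();
}
```
Because the no-message and literal-message forms of CAFFE_ENFORCE hit the cheap overloads, the std::string construction can be outlined into the failure path.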
ghstack-source-id: 121877053
(Note: this ignores all push blocking failures!)
Test Plan:
Compare assembly for
```
#include <c10/util/Logging.h>
#include <cstdlib>  // for random()

void f(bool b) {
  CAFFE_ENFORCE(b);
}
void g(bool b) {
  CAFFE_ENFORCE(b, "message");
}
void h(bool b) {
  CAFFE_ENFORCE(b, "message", random());
}
```
before & after this diff.
before: P174902847
after: P174902912
f & g are clearly much improved, and h is about the same.
(I tried measuring caffe2 perf on the AdIndexer MergeNet benchmark, but didn't see a win, which makes sense because the change is small.)
Reviewed By: bhosmer
Differential Revision: D26405181
fbshipit-source-id: c51a9e459ae7d9876494a83ade6f6fe725619512
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51386
Add stats such as rebuilt bucket stats, unused parameter stats, and performance stats to the DDP logging data.
1. GPU time stats are not collected for single-process multiple-device mode in this diff, as that requires events to be created and recorded on multiple devices.
2. Use the at::cuda event API for safer calls.
3. Events may not be created in the autograd hook if the hook is not triggered in the user's code, e.g., when the user runs in non-sync mode in some iterations. So we check whether the events were created before synchronizing, and skip invalid results (see the sketch after this list).
4. Users may not set the device upfront, so we explicitly set the proper device before creating events in our prepare_forward() and prepare_backward() calls.
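An illustration of the "only measure when the events actually ran" check, using the plain CUDA runtime API rather than the at::cuda wrapper mentioned above:
```
#include <cuda_runtime.h>

// Only report elapsed time when both events were actually recorded this
// iteration; e.g. the backward-hook events never fire when gradient
// synchronization is skipped.
float maybe_elapsed_ms(cudaEvent_t start, cudaEvent_t end, bool recorded) {
  if (!recorded) {
    return -1.0f;  // mark the measurement as invalid and skip it
  }
  cudaEventSynchronize(end);  // ensure the bracketed work has finished
  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, end);
  return ms;
}
```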
ghstack-source-id: 121933566
Test Plan: unit tests
Reviewed By: SciPioneer
Differential Revision: D26158645
fbshipit-source-id: ce5f15187802eba76accb980449be68902c10178
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51066
The backend name of a process group created using the distributed_c10d Python API is tracked, but there is no good way to track the name of a process group created using the ProcessGroup C++ API. In some cases, knowing the backend name of a process group is useful, e.g., to log the backend name, or to write code that depends on a known backend.
ghstack-source-id: 120628432
Test Plan: unit tests
Reviewed By: pritamdamania87
Differential Revision: D26059769
fbshipit-source-id: 6584c6695c5c3570137dc98c16e06cbe4b7f5503
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50622
1. Define a DDPLoggingData struct that is the placeholder for all the DDP-related logging fields (see the sketch after this list).
2. Put the DDPLoggingData struct in the c10 directory so that it can be easily imported by c10 and torch files.
3. Expose a get_ddp_logging_data() method in Python so that users can get the logging data and dump it in their applications.
4. Unit tests verify that the logging data can be set and retrieved as expected.
5. Follow-ups will add more logging fields such as perf stats, internal states, env variables, etc.
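A minimal sketch of such a placeholder struct; the fields shown are hypothetical examples rather than the actual field list:
```
#include <cstdint>
#include <string>

// Flat placeholder for DDP-related logging fields (illustrative only).
struct DDPLoggingData {
  int64_t world_size = -1;
  int64_t rank = -1;
  std::string module_name;
  std::string device_ids;
  std::string backend_name;
};
```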
ghstack-source-id: 120275870
Test Plan: unit tests
Reviewed By: SciPioneer
Differential Revision: D25930527
fbshipit-source-id: 290c200161019c58e28eed9a5a2a7a8153113f99
Summary:
All pretty minor. I avoided renaming `class DestructableMock` to `class DestructibleMock` and similar symbol renames (in this PR).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49815
Reviewed By: VitalyFedyunin
Differential Revision: D25734507
Pulled By: mruberry
fbshipit-source-id: bbe8874a99d047e9d9814bf92ea8c036a5c6a3fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37023
Optimize the binary size of assert macros, through two ideas:
- Concatenate string literals with __FILE__ and __LINE__ at compile time into one literal, instead of keeping them in separate literals and combining them with c10::str (see the sketch below).
- Optimize the binary size of c10::str for some scenarios, especially the scenario where it is called with an empty parameter list, which is actually a common call pattern in assert macros.
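A sketch of the first idea; the macro names are illustrative, not the actual c10 macros:
```
#include <cstdio>

// Stringize __LINE__ in the preprocessor so the file/line prefix becomes a
// single string literal at compile time, instead of being assembled at
// runtime via c10::str.
#define MY_STRINGIZE_IMPL(x) #x
#define MY_STRINGIZE(x) MY_STRINGIZE_IMPL(x)
#define MY_FILE_LINE __FILE__ ":" MY_STRINGIZE(__LINE__)

void report_failure() {
  // Adjacent literals are concatenated by the compiler into one literal,
  // e.g. "foo.cpp:42: check failed".
  std::puts(MY_FILE_LINE ": check failed");
}
```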
In server oss builds, this PR reduces binary size from 118.05 MB to 117.05 MB
ghstack-source-id: 102607237
Test Plan: Run the OSS server build (python setup.py install) and check that the size of libtorch_cpu.so is reduced from 118.05 MB to 117.05 MB.
Differential Revision: D20719400
fbshipit-source-id: 5c61f4195b947f06aafb8f0c8e255de3366e1ff2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31575
We need a new exception class specifically for the enforce_finite operator, because we need to map it to a specific Python exception, ExitException, not the RuntimeError that all c10::Errors get mapped to by default. This diff includes:
- Define c10::EnforceFiniteNotMet
- Add a CAFFE_ENFORCE_FINITE API that throws c10::EnforceFiniteNotMet
- Map c10::EnforceFiniteNotMet to the Python ExitException
- Apply CAFFE_ENFORCE_FINITE in caffe2 ops (a sketch of the pattern follows this list)
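A sketch of the pattern; std::runtime_error stands in for c10::Error here, and the macro is illustrative rather than the actual c10 code:
```
#include <cmath>
#include <stdexcept>
#include <string>

// A dedicated exception type lets the binding layer translate this specific
// failure into ExitException instead of the generic RuntimeError.
class EnforceFiniteNotMet : public std::runtime_error {
 public:
  explicit EnforceFiniteNotMet(const std::string& msg)
      : std::runtime_error(msg) {}
};

#define MY_ENFORCE_FINITE(value)                               \
  do {                                                         \
    const auto& v_ = (value);                                  \
    if (!std::isfinite(v_)) {                                  \
      throw EnforceFiniteNotMet("non-finite value: " #value);  \
    }                                                          \
  } while (false)
```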
Test Plan:
- integration test pass: https://fburl.com/fblearner/xwkzbqyo
- integration test with D19213617: https://fburl.com/fblearner/479y4jrj generates the error message as desired
- Example:
- Original error message f157597803
{F225477055}
- Updated error message (with D19213617 to generate the error): f158571327
{F225477071}
Reviewed By: zheng-xq
Differential Revision: D19206240
fbshipit-source-id: bd256862801d5957a26b76d738edf4e531f03827
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30917
This is a C++14 feature; we can use it now.
ghstack-source-id: 95255753
Test Plan: waitforsandcastle
Differential Revision: D18869637
fbshipit-source-id: dd02036b9faeaffa64b2d2d305725443054da31b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20876
Tell the compiler that assertions are likely to succeed.
This allows the compiler to generate better code and optimize for the success case.
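A sketch of the hint; the macro names are illustrative (c10 has a similar LIKELY-style wrapper around the same builtin):
```
#include <stdexcept>

// Tell the optimizer the condition is almost always true, so the failure
// path is laid out off the hot path.
#if defined(__GNUC__) || defined(__clang__)
#define MY_LIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 1))
#else
#define MY_LIKELY(expr) (expr)
#endif

#define MY_ASSERT(cond)                                        \
  do {                                                         \
    if (!MY_LIKELY(cond)) {                                    \
      throw std::runtime_error("assertion failed: " #cond);    \
    }                                                          \
  } while (false)
```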
Differential Revision: D15480066
fbshipit-source-id: 4485154d66b2ee0ef8a401718712dbd61d811aee
Summary:
Resubmit #20698 which got messed up.
The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so: only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, and thus we can allow putting logging in, e.g., TensorImpl.
Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
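A minimal sketch of a log-once trigger point; the names are illustrative:
```
#include <iostream>
#include <string>

// In a custom build this would forward to a centrally registered handler.
void log_api_usage_once(const std::string& event) {
  std::cerr << "API usage: " << event << '\n';
}

// A function-local static ensures the handler fires on the first invocation
// only, so trigger points stay cheap in hot code and nothing fires merely
// from linking with the library.
#define MY_LOG_API_USAGE_ONCE(event)             \
  do {                                           \
    static const bool logged_ =                  \
        (log_api_usage_once(event), true);       \
    (void)logged_;                               \
  } while (false)

void tensor_ctor_example() {
  MY_LOG_API_USAGE_ONCE("tensor.create");  // logged only the first time
}
```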
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745
Differential Revision: D15429196
Pulled By: dzhulgakov
fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18531
Currently we use C10_LOG_EVERY_MS to log the data type change, but it pollutes the logs of some services,
so we would like to change it to C10_LOG_FIRST_N to prevent that.
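A usage sketch, assuming glog-style `(severity, threshold)` signatures for both c10 macros:
```
#include <c10/util/Logging.h>

void on_dtype_change() {
  // Old: emits at most once per 1000 ms, but keeps recurring for the
  // lifetime of the process, which can pollute long-running service logs.
  C10_LOG_EVERY_MS(WARNING, 1000) << "tensor data type changed";

  // New: emits only the first 10 occurrences, then stays quiet.
  C10_LOG_FIRST_N(WARNING, 10) << "tensor data type changed";
}
```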
Reviewed By: dzhulgakov
Differential Revision: D14647704
fbshipit-source-id: b84e4002bd4aa94d616133cd1049c3d4ab05386e
Summary: Some automation to fix uninitialized members in caffe2 code. Ran a canary to make sure there are no regressions in prod, but I'm not sure how to test caffe2 comprehensively.
Reviewed By: ezyang
Differential Revision: D13776185
fbshipit-source-id: fb2a479971cc0276d8784be1c44f01252410bd24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13239
The previous diff missed the if (dtype_initialized) check, duh.
Also, to guard against log spam, use LOG_EVERY_MS if it's available.
Reviewed By: kennyhorror
Differential Revision: D12818938
fbshipit-source-id: 76590bd1b28010fb13f5d33423c8eac1395e9f76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12881
TSIA. This should not change any functionality.
Remaining work:
- Change the build script to deprecate use of CAFFE2_USE_MINIMAL_GOOGLE_GLOG and use a C10 macro instead.
- Unify the exception name (EnforceNotMet -> Error)
- Unify the logging and warning APIs (like AT_WARNING)
Reviewed By: dzhulgakov
Differential Revision: D10441597
fbshipit-source-id: 4784dc0cd5af83dacb10c4952a2d1d7236b3f14d