Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71746
This PR contains the following improvements:
- It exposes a new environment variable `TORCH_CPP_LOG_LEVEL` that enables users to set the log level of the c10 logging facility (supports both GLOG and the c10 loggers). Valid values are `INFO`, `WARNING`, `ERROR`, and `FATAL`, or their numerical equivalents `0`, `1`, `2`, and `3`.
- It implements an `initLogging()` function and calls it as part of the `torch._C` module import to ensure that the underlying logging facility is correctly initialized in Python.
With these changes, a user can dynamically set the log level of c10, as in the following example:
```
$ TORCH_CPP_LOG_LEVEL=INFO python my_torch_script.py
```
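A minimal sketch of what the env-var handling could look like; the helper name `ParseLogLevelFromEnv` and the fallback behavior are illustrative assumptions, not the actual implementation:
```
#include <cstdlib>
#include <string>

// Map TORCH_CPP_LOG_LEVEL (name or number) to a numeric severity.
// Unrecognized or unset values fall back to the caller-provided default.
int ParseLogLevelFromEnv(int default_level) {
  const char* value = std::getenv("TORCH_CPP_LOG_LEVEL");
  if (value == nullptr) {
    return default_level;
  }
  const std::string level(value);
  if (level == "INFO"    || level == "0") return 0;
  if (level == "WARNING" || level == "1") return 1;
  if (level == "ERROR"   || level == "2") return 2;
  if (level == "FATAL"   || level == "3") return 3;
  return default_level;
}
```
The real `initLogging()` would forward this value to whichever backend (GLOG or the c10 logger) the build uses.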
ghstack-source-id: 149822703
Test Plan: Run existing tests.
Reviewed By: malfet
Differential Revision: D33756252
fbshipit-source-id: 7fd078c03a598595d992de0b474a23cec91838af
(cherry picked from commit 01d6ec6207faedf259ed1368730e9e197cb3e1c6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56641
Currently, ddpLoggingData is a flat struct, which requires both internal DDP developers and external users to know the struct field names. This is not flexible for deleting or adding fields in the future, and it also makes ddpLoggingData hard to access.
With maps/dicts, developers and users can easily access the fields without hard-coding field names, and adding or removing a field becomes easier.
Since C++ does not support maps whose values have different types, ddpLoggingData now contains two types of maps (see the sketch below).
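A minimal sketch of the two-map layout, assuming one string-valued and one integer-valued map; the field names `strs_map` and `ints_map` are illustrative:
```
#include <cstdint>
#include <map>
#include <string>

// Map-based DDPLoggingData: one map per value type, since a single C++ map
// cannot hold heterogeneous values.
struct DDPLoggingData {
  std::map<std::string, std::string> strs_map;  // e.g. "backend_name" -> "nccl"
  std::map<std::string, int64_t> ints_map;      // e.g. "world_size" -> 8
};
```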
ghstack-source-id: 127482694
Test Plan: unit tests
Reviewed By: SciPioneer
Differential Revision: D27923723
fbshipit-source-id: c90199c14925fc50ef219000e2f809dc7601cce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54919
Log the use of the uneven inputs API for better tracking and use-case detection.
ghstack-source-id: 125446499
Test Plan: CI, added ut
Reviewed By: zhaojuanmao, SciPioneer
Differential Revision: D27410764
fbshipit-source-id: abc8055a2e15a3ee087d9959f8881b05a0ea933e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54649
Some operator<< code manually implemented string join in C++; it turns out there is a c10 util for this. Use the util instead of rolling our own.
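Presumably the util in question is `c10::Join` from `c10/util/StringUtil.h`; a hedged sketch of the replacement pattern:
```
#include <ostream>
#include <string>
#include <vector>

#include <c10/util/StringUtil.h>

struct ParamNames {
  std::vector<std::string> names;
};

// Before: a hand-rolled loop that appended ", " between elements.
// After: delegate the joining to the existing c10 utility.
std::ostream& operator<<(std::ostream& os, const ParamNames& p) {
  return os << "[" << c10::Join(", ", p.names) << "]";
}
```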
ghstack-source-id: 124840043
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27316705
fbshipit-source-id: 5118097f84be2f38a503d8f81faa38c8d95ec17a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53303
The old code did an unnecessary heap allocation and was a little convoluted. I think it was structured that way to avoid double-evaluating arguments; I just forced them to be evaluated once, as though they were passed to a function, by binding const references to them (see the sketch below).
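A sketch of the single-evaluation pattern described above; `MY_ENFORCE_GE` is a stand-in, not the real CAFFE_ENFORCE_GE:
```
#include <stdexcept>
#include <string>

// Binding the operands to const references evaluates each argument exactly
// once, just as passing them to a function would, while keeping the failure
// path (string construction, throw) out of the happy path.
#define MY_ENFORCE_GE(a, b)                                        \
  do {                                                             \
    const auto& a_ = (a); /* evaluated once */                     \
    const auto& b_ = (b); /* evaluated once */                     \
    if (!(a_ >= b_)) {                                             \
      throw std::runtime_error(std::string("Enforce failed: ") +   \
                               #a " >= " #b);                      \
    }                                                              \
  } while (false)

int next_id() { static int id = 0; return ++id; }  // side-effecting expression

void example(int limit) {
  MY_ENFORCE_GE(limit, 0);      // `limit` evaluated once
  MY_ENFORCE_GE(next_id(), 1);  // the call happens exactly once
}
```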
ghstack-source-id: 123918262
Test Plan:
1) `buck run mode/opt-clang //caffe2/caffe2/fb/tests:logging_bench`
Before:
```
============================================================================
caffe2/caffe2/fb/tests/logging_bench.cpp      relative  time/iter  iters/s
============================================================================
glog_CHECK                                                 2.01ns   498.63M
caffe2_ENFORCE_GE                               50.00%     4.01ns   249.31M
glog_CHECK_GE                                   17.39%    11.53ns    86.73M
fbcode_ENFORCE                                 100.00%     2.01ns   498.65M
caffe2_ENFORCE                                 100.00%     2.01ns   498.63M
caffe2_ENFORCE_THAT                             50.00%     4.01ns   249.33M
============================================================================
```
After:
```
============================================================================
caffe2/caffe2/fb/tests/logging_bench.cpp      relative  time/iter  iters/s
============================================================================
glog_CHECK                                                 2.01ns   498.63M
caffe2_ENFORCE_GE                               97.44%     2.06ns   485.88M
glog_CHECK_GE                                   17.39%    11.53ns    86.73M
fbcode_ENFORCE                                 100.00%     2.01ns   498.65M
caffe2_ENFORCE                                 100.00%     2.01ns   498.65M
caffe2_ENFORCE_THAT                             97.28%     2.06ns   485.06M
============================================================================
```
Looks like about a 1.94x speedup!
2) Inspect generated assembly for logging_bench.cpp before & after by:
```
$ compile-commands caffe2/caffe2/fb/tests/logging_bench.cpp -f "mode/opt-clang"
$ jq -r '.[0].arguments | sh' < compile_commands.json | sed -e "s/'-c'/'-S'/g" | sed -E -e "s/'-g[12]'/'-g0'/g" > out.sh
$ sh out.sh
```
Then diff logging_bench.s as you like.
Before: P255408666
After: P277883307
Net about 1500 lines deleted from the assembly. We can see that the
happy path (which the benchmark tests) no longer contains string
creation.
Reviewed By: dzhulgakov
Differential Revision: D26829714
fbshipit-source-id: 6e11f8ea29292ae3d9f2cc89d08afcb06f7d39c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53162
It is possible that there are multiple data types in mixed precision training, so log the data types as a list of data type names.
ghstack-source-id: 123452626
Test Plan: unit test
Reviewed By: SciPioneer
Differential Revision: D26769256
fbshipit-source-id: 8f7d73821e89864fedbbce723f301fe8fbad5685
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53145
Add a new API that allows users to set the sample rate for runtime stats, and add per-iteration latency breakdowns to the DDPLoggingData struct. For example,
if users set the sample rate to 1, they can analyze how per-iteration latency changes over time (not averaged).
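A sketch of how sample-rate gating could work; the names below are hypothetical, not the actual API:
```
#include <cstdint>

// With sample_rate == 1, stats are collected every iteration, so the
// per-iteration latency can be inspected directly rather than averaged
// over sampled iterations.
struct RuntimeStatsSampler {
  int64_t sample_rate = 100;   // collect once every `sample_rate` iterations
  int64_t num_iterations = 0;

  bool should_collect_this_iteration() {
    ++num_iterations;
    return sample_rate > 0 && (num_iterations % sample_rate == 0);
  }
};
```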
ghstack-source-id: 123443369
Test Plan: unit test
Reviewed By: SciPioneer
Differential Revision: D26763957
fbshipit-source-id: baff6a09c2a590e6eb91362ca6f47ae8fa6ddb0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52966
Logs the registered comm hook if there is one; otherwise logs "builtin_allreduce".
ghstack-source-id: 123174803
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D26709388
fbshipit-source-id: 484fdbbd6643ec261b3797bd8d9824b2b6a1a490
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52223
After the previous diffs, `c10::str()` will return a
`CompileTimeEmptyString` when passed 0 arguments and a `const char*` when
passed a single `const char*` argument. We can take advantage of this to
outline (move out of the inlined fast path) further std::string creation from CAFFE_ENFORCE.
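A simplified sketch of the overload shape this relies on (not the actual c10 implementation):
```
#include <sstream>
#include <string>

// Returned for zero-argument calls; converts to an empty C-string so the
// happy path never has to build a std::string.
struct CompileTimeEmptyString {
  operator const char*() const { return ""; }
};

inline CompileTimeEmptyString str() { return {}; }    // 0 arguments
inline const char* str(const char* s) { return s; }   // 1 C-string argument

template <typename... Args>
std::string str(const Args&... args) {                // general case
  std::ostringstream oss;
  (oss << ... << args);  // C++17 fold expression, for brevity in this sketch
  return oss.str();
}
```
Because the no-message and literal-message forms of CAFFE_ENFORCE hit the cheap overloads, the std::string construction can be outlined into the failure path.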
ghstack-source-id: 121877053
(Note: this ignores all push blocking failures!)
Test Plan:
Compare assembly for
```
#include <c10/util/Logging.h>
#include <cstdlib>  // for random()

void f(bool b) {
  CAFFE_ENFORCE(b);
}
void g(bool b) {
  CAFFE_ENFORCE(b, "message");
}
void h(bool b) {
  CAFFE_ENFORCE(b, "message", random());
}
```
before & after this diff.
before: P174902847
after: P174902912
f & g are clearly much improved, and h is about the same.
(I tried measuring caffe2 perf on the AdIndexer MergeNet benchmark, but didn't see a win, which makes sense because the change is small.)
Reviewed By: bhosmer
Differential Revision: D26405181
fbshipit-source-id: c51a9e459ae7d9876494a83ade6f6fe725619512
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51386
Add stats such as rebuilt bucket stats, unused parameter stats, and performance stats to the DDP logging data.
1. GPU time stats are not collected for single-process multiple-device mode in this diff, as that requires events to be created and recorded on multiple devices.
2. Use the at::cuda event API for safer calls.
3. Events may not be created in the autograd hook if the hook is not triggered in the user's code, e.g., when the user runs in non-sync mode in some iterations. So we check whether the events were created before synchronizing, and skip invalid results (see the sketch after this list).
4. Users may not set the device upfront, so we explicitly set the proper device before creating events in our prepare_forward() and prepare_backward() calls.
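An illustration of the "only measure when the events actually ran" check, using the plain CUDA runtime API rather than the at::cuda wrapper mentioned above:
```
#include <cuda_runtime.h>

// Only report elapsed time when both events were actually recorded this
// iteration; e.g. the backward-hook events never fire when gradient
// synchronization is skipped.
float maybe_elapsed_ms(cudaEvent_t start, cudaEvent_t end, bool recorded) {
  if (!recorded) {
    return -1.0f;  // mark the measurement as invalid and skip it
  }
  cudaEventSynchronize(end);  // ensure the bracketed work has finished
  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, end);
  return ms;
}
```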
ghstack-source-id: 121933566
Test Plan: unit tests
Reviewed By: SciPioneer
Differential Revision: D26158645
fbshipit-source-id: ce5f15187802eba76accb980449be68902c10178
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51066
The backend name of a process group created using the distributed_c10d Python API is tracked, but there is no good way to track the name of a process group created using the ProcessGroup C++ API. In some cases, knowing the backend name of a process group is useful, e.g., to log the backend name, or to write code that depends on a known backend.
ghstack-source-id: 120628432
Test Plan: unit tests
Reviewed By: pritamdamania87
Differential Revision: D26059769
fbshipit-source-id: 6584c6695c5c3570137dc98c16e06cbe4b7f5503
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50622
1. Define a DDPLoggingData struct that is the placeholder for all the DDP-related logging fields (see the sketch after this list).
2. Put the DDPLoggingData struct in the c10 directory so that it can be easily imported by c10 and torch files.
3. Expose a get_ddp_logging_data() method in Python so that users can get the logging data and dump it in their applications.
4. Unit tests verify that the logging data can be set and retrieved as expected.
5. Follow-ups will add more logging fields such as perf stats, internal states, env variables, etc.
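A minimal sketch of such a placeholder struct; the fields shown are hypothetical examples rather than the actual field list:
```
#include <cstdint>
#include <string>

// Flat placeholder for DDP-related logging fields (illustrative only).
struct DDPLoggingData {
  int64_t world_size = -1;
  int64_t rank = -1;
  std::string module_name;
  std::string device_ids;
  std::string backend_name;
};
```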
ghstack-source-id: 120275870
Test Plan: unit tests
Reviewed By: SciPioneer
Differential Revision: D25930527
fbshipit-source-id: 290c200161019c58e28eed9a5a2a7a8153113f99
Summary:
All pretty minor. I avoided renaming `class DestructableMock` to `class DestructibleMock` and similar symbol renames (in this PR).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49815
Reviewed By: VitalyFedyunin
Differential Revision: D25734507
Pulled By: mruberry
fbshipit-source-id: bbe8874a99d047e9d9814bf92ea8c036a5c6a3fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37023
Optimize the binary size of assert macros, through two ideas:
- Concatenate string literals with __FILE__ and __LINE__ at compile time into one literal, instead of keeping them in separate literals and combining them with c10::str (see the sketch below).
- Optimize the binary size of c10::str for some scenarios, especially the scenario where it is called with an empty parameter list, which is actually a common call pattern in assert macros.
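A sketch of the first idea; the macro names are illustrative, not the actual c10 macros:
```
#include <cstdio>

// Stringize __LINE__ in the preprocessor so the file/line prefix becomes a
// single string literal at compile time, instead of being assembled at
// runtime via c10::str.
#define MY_STRINGIZE_IMPL(x) #x
#define MY_STRINGIZE(x) MY_STRINGIZE_IMPL(x)
#define MY_FILE_LINE __FILE__ ":" MY_STRINGIZE(__LINE__)

void report_failure() {
  // Adjacent literals are concatenated by the compiler into one literal,
  // e.g. "foo.cpp:42: check failed".
  std::puts(MY_FILE_LINE ": check failed");
}
```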
In server oss builds, this PR reduces binary size from 118.05 MB to 117.05 MB
ghstack-source-id: 102607237
Test Plan: Run the OSS server build (python setup.py install) and check that the size of libtorch_cpu.so is reduced from 118.05 MB to 117.05 MB.
Differential Revision: D20719400
fbshipit-source-id: 5c61f4195b947f06aafb8f0c8e255de3366e1ff2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31575
We need a new exception class specifically for the enforce_finite operator, because we need to map it to a specific Python exception, ExitException, not the RuntimeError that all c10::Errors get mapped to by default. This diff includes:
- Define c10::EnforceFiniteNotMet
- Add a CAFFE_ENFORCE_FINITE API that throws c10::EnforceFiniteNotMet
- Map c10::EnforceFiniteNotMet to the Python ExitException
- Apply CAFFE_ENFORCE_FINITE in caffe2 ops (a sketch of the pattern follows this list)
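A sketch of the pattern; std::runtime_error stands in for c10::Error here, and the macro is illustrative rather than the actual c10 code:
```
#include <cmath>
#include <stdexcept>
#include <string>

// A dedicated exception type lets the binding layer translate this specific
// failure into ExitException instead of the generic RuntimeError.
class EnforceFiniteNotMet : public std::runtime_error {
 public:
  explicit EnforceFiniteNotMet(const std::string& msg)
      : std::runtime_error(msg) {}
};

#define MY_ENFORCE_FINITE(value)                               \
  do {                                                         \
    const auto& v_ = (value);                                  \
    if (!std::isfinite(v_)) {                                  \
      throw EnforceFiniteNotMet("non-finite value: " #value);  \
    }                                                          \
  } while (false)
```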
Test Plan:
- integration test pass: https://fburl.com/fblearner/xwkzbqyo
- integration test with D19213617: https://fburl.com/fblearner/479y4jrj generates the error message as desired
- Example:
- Original error message f157597803
{F225477055}
- Updated error message (with D19213617 to generate the error): f158571327
{F225477071}
Reviewed By: zheng-xq
Differential Revision: D19206240
fbshipit-source-id: bd256862801d5957a26b76d738edf4e531f03827
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30917
This is a C++14 feature; we can use it now.
ghstack-source-id: 95255753
Test Plan: waitforsandcastle
Differential Revision: D18869637
fbshipit-source-id: dd02036b9faeaffa64b2d2d305725443054da31b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20876
Tell the compiler that assertions are likely to succeed.
This allows the compiler to generate better code and optimize for the success case.
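A sketch of the hint; the macro names are illustrative (c10 has a similar LIKELY-style wrapper around the same builtin):
```
#include <stdexcept>

// Tell the optimizer the condition is almost always true, so the failure
// path is laid out off the hot path.
#if defined(__GNUC__) || defined(__clang__)
#define MY_LIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 1))
#else
#define MY_LIKELY(expr) (expr)
#endif

#define MY_ASSERT(cond)                                        \
  do {                                                         \
    if (!MY_LIKELY(cond)) {                                    \
      throw std::runtime_error("assertion failed: " #cond);    \
    }                                                          \
  } while (false)
```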
Differential Revision: D15480066
fbshipit-source-id: 4485154d66b2ee0ef8a401718712dbd61d811aee
Summary:
Resubmit #20698 which got messed up.
The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so: only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, and thus we can allow putting logging in, e.g., TensorImpl.
Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
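A minimal sketch of a log-once trigger point; the names are illustrative:
```
#include <iostream>
#include <string>

// In a custom build this would forward to a centrally registered handler.
void log_api_usage_once(const std::string& event) {
  std::cerr << "API usage: " << event << '\n';
}

// A function-local static ensures the handler fires on the first invocation
// only, so trigger points stay cheap in hot code and nothing fires merely
// from linking with the library.
#define MY_LOG_API_USAGE_ONCE(event)             \
  do {                                           \
    static const bool logged_ =                  \
        (log_api_usage_once(event), true);       \
    (void)logged_;                               \
  } while (false)

void tensor_ctor_example() {
  MY_LOG_API_USAGE_ONCE("tensor.create");  // logged only the first time
}
```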
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745
Differential Revision: D15429196
Pulled By: dzhulgakov
fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18531
Currently we use C10_LOG_EVERY_MS to log the data type change, but it pollutes the logs of some services,
so we would like to change it to C10_LOG_FIRST_N to prevent that.
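A usage sketch, assuming glog-style `(severity, threshold)` signatures for both c10 macros:
```
#include <c10/util/Logging.h>

void on_dtype_change() {
  // Old: emits at most once per 1000 ms, but keeps recurring for the
  // lifetime of the process, which can pollute long-running service logs.
  C10_LOG_EVERY_MS(WARNING, 1000) << "tensor data type changed";

  // New: emits only the first 10 occurrences, then stays quiet.
  C10_LOG_FIRST_N(WARNING, 10) << "tensor data type changed";
}
```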
Reviewed By: dzhulgakov
Differential Revision: D14647704
fbshipit-source-id: b84e4002bd4aa94d616133cd1049c3d4ab05386e
Summary: Some automation to fix uninitialized members in caffe2 code. Ran a canary to make sure there are no regressions in prod, but I'm not sure how to test caffe2 comprehensively.
Reviewed By: ezyang
Differential Revision: D13776185
fbshipit-source-id: fb2a479971cc0276d8784be1c44f01252410bd24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13239
The previous diff missed the if (dtype_initialized) check, duh.
Also, to guard against log spam, use LOG_EVERY_MS if it's available.
Reviewed By: kennyhorror
Differential Revision: D12818938
fbshipit-source-id: 76590bd1b28010fb13f5d33423c8eac1395e9f76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12881
TSIA. This should not change any functionality.
Remaining work:
- Change the build script to deprecate use of CAFFE2_USE_MINIMAL_GOOGLE_GLOG and use a C10 macro instead.
- Unify the exception name (EnforceNotMet -> Error)
- Unify the logging and warning APIs (like AT_WARNING)
Reviewed By: dzhulgakov
Differential Revision: D10441597
fbshipit-source-id: 4784dc0cd5af83dacb10c4952a2d1d7236b3f14d