Commit Graph

15 Commits

albanD
c141f28b64 Fix compilation warning and spurious print (#87297)
Fixes a compilation warning, makes this warning an error, and removes a spurious print.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87297
Approved by: https://github.com/malfet
2022-10-19 20:56:37 +00:00
zaf
78c8a0d752 [quant][ao_migration] torch.nn.quantized.functional → torch.ao.nn.quantized.functional (#78712)
Context: To avoid cluttering the `torch.nn` namespace, the quantized modules namespace is moved to `torch.ao.nn`.

The list of the `nn.quantized` files that are being migrated:

- [ ] `torch.nn.quantized` → `torch.ao.nn.quantized`
  - [X] [Current PR] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional`
  - [ ] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules`
  - [ ] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic`
  - [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference`
- [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable`
- [ ] `torch.nn.qat` → `torch.ao.nn.qat`
  - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules`
  - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic`
- [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic`
  - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules`
  - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat`
  - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized`
    - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules`
    - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic`

The majority of the files are simply moved to the new location.
However, specific files need to be double-checked:

- [Documentation](docs/source/quantization-support.rst) @vkuzo
- [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10

Differential Revision: [D36792967](https://our.internmc.facebook.com/intern/diff/D36792967/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36792967/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78712
Approved by: https://github.com/jerryzh168
2022-08-18 17:51:54 +00:00
David Chen
90821aab10 Add SOFT_ASSERT to gracefully recover from invariant violations (#82689)
Summary: Implement SOFT_ASSERT, which fails in debug mode but only triggers a warning log in release mode. This allows us to gracefully handle invariant violations when processing traces that don't necessarily need to crash the entire program.
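
A minimal sketch of the idea, assuming the debug/release split is keyed off `NDEBUG` (the real macro in the tree may differ in name and behavior, e.g. it may yield a testable bool):

```
#include <cstdio>
#include <cstdlib>

// Debug builds: hard-fail so the invariant violation is caught early.
// Release builds: log a warning and keep processing the trace.
#ifndef NDEBUG
#define SOFT_ASSERT(cond)                                      \
  do {                                                         \
    if (!(cond)) {                                             \
      std::fprintf(stderr, "assert failed: %s\n", #cond);      \
      std::abort();                                            \
    }                                                          \
  } while (0)
#else
#define SOFT_ASSERT(cond)                                      \
  do {                                                         \
    if (!(cond)) {                                             \
      std::fprintf(stderr, "soft assert failed: %s\n", #cond); \
    }                                                          \
  } while (0)
#endif
```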

Test Plan: Added SOFT_ASSERT test in containers.cpp

Differential Revision: D38327334

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82689
Approved by: https://github.com/robieta
2022-08-10 00:58:07 +00:00
Michael Suo
30fb2c4aba [lint] autoformat test/cpp and torch/csrc
Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang
2022-06-11 21:11:16 +00:00
Scott Wolchok
52af4fc5ba [PyTorch] Make RecordFunction store inputs as ArrayRef (#72484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72484

Stepping stone toward stack-allocating array of inputs.

Funnily enough, this seems to improve performance too.
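
For context, a simplified sketch of the change (not the actual `RecordFunction` API): `c10::ArrayRef` is a non-owning view, so the inputs can stay wherever the caller put them, including on the stack.

```
#include <ATen/core/ivalue.h>
#include <c10/util/ArrayRef.h>

// Owning storage forces a heap-backed copy per observed call:
//   std::vector<c10::IValue> inputs_;
// A view lets the caller keep (and eventually stack-allocate) the array:
void observeInputs(c10::ArrayRef<c10::IValue> inputs) {
  for (const c10::IValue& iv : inputs) {
    (void)iv;  // inspect each input without copying it
  }
}
```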
ghstack-source-id: 155492056

Test Plan:
1) CI
2) framework overhead benchmark with --stressTestRecordFunction --captureRecordFunctionInputs goes from 0.76 usec/iter to 0.72.

Reviewed By: chaekit, robieta

Differential Revision: D34061169

fbshipit-source-id: 073fedf1d3d162f927c4e9867cfda7dbfabba215
(cherry picked from commit dae77cf1cd8813d902d73999ad97133a3ef8e291)
2022-05-05 21:38:42 +00:00
Nikita Shulga
f6c275f55d Remove -Wno-unused-variable from utils.cmake (take 2) (#75538)
Summary:
[Comment](https://github.com/pytorch/pytorch/pull/62445/files#r680132022) claims it got added for consistency with the top-level CMakeLists.txt, but `-Wno-unused-variable` is not mentioned there.

Fix violations in 50+ files that were added in the interim, either by removing unused variables or by decorating the code with `C10_UNUSED` when a local variable is likely used to extend an object's lifetime until the end of the block, as sketched below.
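
A sketch of the lifetime-extension case, with `[[maybe_unused]]` standing in for `C10_UNUSED`: the variable exists only for its constructor/destructor side effects, so its name is never read, yet it must live to the end of the block.

```
#include <mutex>

std::mutex mu;

void critical_section() {
  // The guard is never referenced by name; annotating it keeps
  // -Wunused-variable (now an error) from firing.
  [[maybe_unused]] std::lock_guard<std::mutex> guard(mu);
  // ... work that must happen while the lock is held ...
}
```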

Caused preventable revert in https://github.com/pytorch/pytorch/pull/72633#issuecomment-1092300787

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75538

Reviewed By: anjali411

Differential Revision: D35747333

Pulled By: malfet

fbshipit-source-id: 3fc5828e44a4c05ba0e89e92613e6ebbdb260626
(cherry picked from commit c179fba21cfa2a0093fad50ccad5a22dd7cff52c)
2022-04-20 17:41:59 +00:00
PyTorch MergeBot
5c56b2286b Revert "Remove -Wno-unused-variable from utils.cmake"
This reverts commit 018cbe1f5c.

Reverted https://github.com/pytorch/pytorch/pull/75538 on behalf of https://github.com/seemethere
2022-04-19 17:19:09 +00:00
Nikita Shulga
018cbe1f5c Remove -Wno-unused-variable from utils.cmake
[Comment](https://github.com/pytorch/pytorch/pull/62445/files#r680132022) claims it got added for consistency with the top-level CMakeLists.txt, but `-Wno-unused-variable` is not mentioned there.

Fix violations in 50+ files that were added in the interim, either by removing unused variables or by decorating the code with `C10_UNUSED` when a local variable is likely used to extend an object's lifetime until the end of the block.

Caused preventable revert in https://github.com/pytorch/pytorch/pull/72633#issuecomment-1092300787

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75538
Approved by: https://github.com/cpuhrsch
2022-04-19 15:26:55 +00:00
alexmsettle
c0a6add7ee Changes to support input sequence ID tracking (#70264)
Summary:
Adds input sequence ID tracking in the NVTX markers. This feature adds additional information to the NVTX marker string, e.g. `seq_ids=[101, 102, 103]`. This indicates the sequence id of the op that produced each input tensor, based on the tensor's position index in the array. In the above example, input tensor 0 was produced by the node with sequence id 101, input tensor 1 is from node 102, and input tensor 2 is from the node with sequence id 103. This is the same way the sizes array is organized. If you know the sequence id of a node and the sequence ids of its input edges, then you have enough information to construct the network graph.
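
A hypothetical sketch of the marker payload described above (the exact string PyTorch emits may differ):

```
#include <cstdint>
#include <string>
#include <vector>

std::string makeNvtxMsg(const std::string& op, int64_t seq,
                        const std::vector<int64_t>& input_seq_ids) {
  std::string msg = op + ", seq = " + std::to_string(seq) + ", seq_ids = [";
  for (size_t i = 0; i < input_seq_ids.size(); ++i) {
    if (i != 0) msg += ", ";
    msg += std::to_string(input_seq_ids[i]);
  }
  return msg + "]";
}

// makeNvtxMsg("aten::add", 104, {101, 102, 103})
//   -> "aten::add, seq = 104, seq_ids = [101, 102, 103]"
```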

Fixes https://github.com/pytorch/pytorch/issues/66105

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70264

Reviewed By: chaekit

Differential Revision: D34792707

Pulled By: robieta

fbshipit-source-id: 4407b853c929a737505803b0db77a8ecd966cce2
(cherry picked from commit cd3c0c8c9d4d63d7897f60521c407883240d1d5b)
2022-03-31 22:15:39 +00:00
Taylor Robie
0b1f3bd158 [Profiler] Prefer TSC to wall clock when available (#73855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73855

Calling the clock is one of the most expensive parts of profiling. We can reduce the profiling overhead by using `rdtsc` instead. The tradeoff is that we have to measure and convert (shift and scale).
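
A sketch of the shift-and-scale conversion, assuming x86 and `__rdtsc` (the calibration details here are illustrative, not the commit's actual implementation):

```
#include <chrono>
#include <cstdint>
#include <x86intrin.h>  // __rdtsc (x86 only)

struct TscConverter {
  uint64_t tsc_origin;  // shift
  int64_t ns_origin;    // shift
  double ns_per_tick;   // scale

  int64_t toNs(uint64_t tsc) const {
    return ns_origin +
        static_cast<int64_t>(static_cast<double>(tsc - tsc_origin) * ns_per_tick);
  }
};

TscConverter calibrate() {
  using namespace std::chrono;
  const uint64_t t0 = __rdtsc();
  const auto w0 = steady_clock::now();
  while (steady_clock::now() - w0 < milliseconds(10)) {
    // spin so both clocks accumulate a measurable delta
  }
  const uint64_t t1 = __rdtsc();
  const auto w1 = steady_clock::now();
  const int64_t ns = duration_cast<nanoseconds>(w1 - w0).count();
  return {t0, duration_cast<nanoseconds>(w0.time_since_epoch()).count(),
          static_cast<double>(ns) / static_cast<double>(t1 - t0)};
}
```

Once calibrated, `toNs(__rdtsc())` replaces a full clock call on the hot path.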

Test Plan: I added a cpp unit test with *very* aggressive anti-flake measures. I also ran the overhead benchmark (9 replicates) with `--stressTestKineto` (0.94 -> 0.89 us) and `--stressTestKineto --kinetoProfileMemory` (1.27 -> 1.17 us)

Reviewed By: chaekit

Differential Revision: D34231071

fbshipit-source-id: e3b3dd7580d93bcc783e87c7f2fc726cb74f4df8
(cherry picked from commit e8be9f8160793c6ee35d5af02bca3e01703e377d)
2022-03-13 18:29:06 +00:00
Scott Wolchok
bf82d2012e [PyTorch] Add IValue::toDimVector & mostly replace toIntVector with it (#71247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71247

Most uses of toIntVector() were for a Tensor shape. We have DimVector to avoid heap allocations in those cases, so let's use it.
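
For illustration, the substitution looks roughly like this (assuming `at::DimVector` is a small-buffer vector sized for typical tensor ranks):

```
#include <ATen/core/ivalue.h>

void useShape(const c10::IValue& iv) {
  // Before: heap-allocates even for a rank-4 shape.
  // std::vector<int64_t> shape = iv.toIntVector();

  // After: inline storage for small ranks, no heap allocation.
  at::DimVector shape = iv.toDimVector();
  (void)shape;
}
```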
ghstack-source-id: 146933314

Test Plan: CI -- if we think DimVector is good in general then I think we have to think this change is good?

Reviewed By: mikeiovine

Differential Revision: D33556198

fbshipit-source-id: cf2ad92c2d0b99ab1df4da0f6843e6ccb9a6320b
2022-01-14 14:32:40 -08:00
Taylor Robie
786f946098 [Profiler] Add glue layer to reduce the use of #ifdef USE_KINETO in the profiler code. (#69798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69798

One of the major sources of complexity in `profiler_kineto.cpp` is that Kineto may or may not be available. The code (including the types) follows two related but often distinct codepaths, and large sections may or may not be `#ifdef`'d out.

Optimizing such code while preserving correctness is quite difficult; at one point I realized that I had broken the non-Kineto case, because moving work into the finalize step ran afoul of a very large `#ifdef` around the finalize logic.

In order to make optimization more tractable, I gathered all of the calls to Kineto APIs and isolated them in the `kineto_shim.h/.cpp` files: the header allows callers to pretend as though Kineto is always available (mostly), and the cpp file hides most of the horrible `#ifdef`s so they don't pollute the main profiler code.
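
The shape of that shim, sketched (the file names follow the commit; the namespace and bodies here are assumptions):

```
// kineto_shim.h -- callers may pretend Kineto is always available.
namespace profiler_shim {
void prepareTrace();
}  // namespace profiler_shim

// kineto_shim.cpp -- the only translation unit that sees the #ifdef.
#ifdef USE_KINETO
#include <libkineto.h>
#endif

namespace profiler_shim {
void prepareTrace() {
#ifdef USE_KINETO
  // ... call into libkineto here ...
#endif
  // Without Kineto this is a no-op, so call sites stay #ifdef-free.
}
}  // namespace profiler_shim
```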

Test Plan: Unit tests.

Reviewed By: aaronenyeshi

Differential Revision: D32690568

fbshipit-source-id: 9a276654ef0ff9d40817c2f88f95071683f150c5
2022-01-11 15:57:46 -08:00
Amir Khojaste
748790588c Upgrading the loop to use irange (#70326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70326

See D24145988 for context: it allows loops such as `for(int i = 0; i < 10; i++)` to be expressed as `for(const auto i : c10::irange(10))`. This is nice because it auto-types the loops and adds const-safety to the iteration variable.
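
Spelled out (assuming the `c10/util/irange.h` header):

```
#include <c10/util/irange.h>

void example() {
  // Before: the index type is chosen by hand and i is mutable in the body.
  for (int i = 0; i < 10; i++) {
    // ...
  }

  // After: the loop variable is deduced and const.
  for (const auto i : c10::irange(10)) {
    (void)i;  // ...
  }
}
```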

Test Plan: buck run //caffe2/torch/fb/sparsenn:test

Reviewed By: r-barnes

Differential Revision: D33243400

fbshipit-source-id: b1f1b4163f4bf662031baea9e5268459b40c69a3
2022-01-06 07:06:53 -08:00
Jay Chae
12653be434 [PyTorch] Optimize no input NVTX collection (#70133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70133

We were creating `sstream` + string concats via `getNvtxStr` even when there were no inputs, wasting precious time. This diff avoids the `stringstream` when there is no input, to squeeze out performance: a 60% reduction in overhead.
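
A sketch of the fast path (signature simplified and assumed; only the function name comes from the commit):

```
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

std::string getNvtxStr(const std::string& name,
                       const std::vector<std::vector<int64_t>>& input_sizes) {
  if (input_sizes.empty()) {
    return name;  // fast path: no stringstream, no concatenation
  }
  std::stringstream ss;  // slow path, paid only when there are inputs
  ss << name << ", sizes = [";
  for (size_t i = 0; i < input_sizes.size(); ++i) {
    if (i != 0) ss << ", ";
    ss << "[";
    for (size_t j = 0; j < input_sizes[i].size(); ++j) {
      if (j != 0) ss << ", ";
      ss << input_sizes[i][j];
    }
    ss << "]";
  }
  ss << "]";
  return ss.str();
}
```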

Test Plan:
Before
```
I1214 22:48:07.964118 2971180 bench.cpp:154] Mean 0.970494
I1214 22:48:07.964139 2971180 bench.cpp:155] Median 0.969054
I1214 22:48:07.964144 2971180 bench.cpp:156] Min 0.962247
I1214 22:48:07.964148 2971180 bench.cpp:157] stddev 0.00774841
I1214 22:48:07.964154 2971180 bench.cpp:158] stddev / mean 0.00798398
```

After
```
I1214 22:59:00.039872 3437853 bench.cpp:154] Mean 0.384333
I1214 22:59:00.039896 3437853 bench.cpp:155] Median 0.384886
I1214 22:59:00.039899 3437853 bench.cpp:156] Min 0.370235
I1214 22:59:00.039902 3437853 bench.cpp:157] stddev 0.00435907
I1214 22:59:00.039907 3437853 bench.cpp:158] stddev / mean 0.0113419
```

Reviewed By: aaronenyeshi, robieta

Differential Revision: D33137501

fbshipit-source-id: ce0e8cf9aef7ea22fd8aed927e76be4ca375efc3
2022-01-04 23:40:22 -08:00
Taylor Robie
ebc66bfeea [Profiler] Pull helper methods into dedicated file. (And start torch/csrc/profiler folder.) (#69255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69255

One thing that I've found as I optimize the profiler is that there's a lot of intermingled code, where the Kineto profiler relies on the legacy (autograd) profiler for generic operations. This made optimization hard because I had to manage too many complex dependencies. (Exacerbated by the `USE_KINETO` `#ifdef`s sprinkled around.) This PR is the first of several to restructure the profiler(s) so the later optimizations go in more easily.

Test Plan: Unit tests

Reviewed By: aaronenyeshi

Differential Revision: D32671972

fbshipit-source-id: efa83b40dde4216f368f2a5fa707360031a85707
2021-12-16 10:33:47 -08:00