Summary: Fixed a bunch of fbcode imports that happened to work but confused autodeps. After this autodeps still suggests "improvements" to TARGETS (which breaks our builds) but at least it can find all the imports.
Test Plan:
```
fbpython fbcode/tools/build/buck/linters/lint_autoformat.py --linter=autodeps --default-exec-timeout=1800 -- fbcode/caffe2/TARGETS fbcode/caffe2/test/TARGETS
```
Before:
```
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/testing.py:229) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fbur$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export.py:87) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fburl$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_serdes.py:9) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fb$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_serdes.py:10) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fburl$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_retraceability.py:7) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https:$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_retraceability.py:6) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See ht$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export_nonstrict.py:7) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See http$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_export_nonstrict.py:6) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See $
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_export_training_ir_to_run_decomp.py:8) when processing rule "test_export". Please make sure it's listed in the srcs parameter of an$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export_training_ir_to_run_decomp.py:10) when processing rule "test_export". Please make sure it's listed in the srcs parameter of anoth$
ERROR while processing caffe2/test/TARGETS: Found "//python/typeshed_internal:typeshed_internal_library" owner for "cv2" but it is protected by visibility rules: [] (from caffe2/test/test_bundled_images.py:7) when processing rule "test_bundled_$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "caffe2.test.profiler_test_cpp_thread_lib" (from caffe2/test/profiler/test_cpp_thread.py:29) when processing rule "profiler_test_cpp_thread". Please make sure it's listed in t$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._utils_internal.get_file_path_2" (from caffe2/test/test_custom_ops.py:23) when processing rule "custom_ops". Please make sure it's listed in the srcs parameter of anoth$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._utils_internal.get_file_path_2" (from caffe2/test/test_public_bindings.py:13) when processing rule "public_bindings". Please make sure it's listed in the srcs paramete$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._C._profiler.symbolize_tracebacks" (from caffe2/test/test_cuda.py:3348) when processing rule "test_cuda". Please make sure it's listed in the srcs parameter of another $
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._C._profiler.gather_traceback" (from caffe2/test/test_cuda.py:3348) when processing rule "test_cuda". Please make sure it's listed in the srcs parameter of another rule$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for include <torch/csrc/autograd/profiler_kineto.h> (from caffe2/test/profiler/test_cpp_thread.cpp:2) when processing profiler_test_cpp_thread_lib. Some things to try:
```
Differential Revision: D62049222
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135614
Approved by: https://github.com/oulgen, https://github.com/laithsakka
Enable Windows inductor UTs for `test/inductor/test_binary_folding.py`
The failing UT depends on https://github.com/pytorch/pytorch/pull/134427, so this PR needs a rebase once https://github.com/pytorch/pytorch/pull/134427 is merged.
```cmd
2024-08-25T23:32:23.0905727Z Traceback (most recent call last):
2024-08-25T23:32:23.0906516Z File "C:\actions-runner\_work\pytorch\pytorch\test\inductor\test_binary_folding.py", line 18, in <module>
2024-08-25T23:32:23.0908200Z from inductor.test_inductor_freezing import TestCase
2024-08-25T23:32:23.0909883Z File "C:\actions-runner\_work\pytorch\pytorch\test\inductor\test_inductor_freezing.py", line 39, in <module>
2024-08-25T23:32:23.0911128Z raise unittest.SkipTest("requires sympy/functorch/filelock")
2024-08-25T23:32:23.0911801Z unittest.case.SkipTest: requires sympy/functorch/filelock
2024-08-25T23:32:23.0912370Z Got exit code 1
2024-08-25T23:32:23.0913155Z No stepcurrent file found. Either pytest didn't get to run (e.g. import error) or file got deleted (contact dev infra)
```
Local test passes:
<img width="1898" alt="image" src="https://github.com/user-attachments/assets/4a6e3f66-4bbc-4aab-8f0d-2e2318046e53">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134425
Approved by: https://github.com/ezyang, https://github.com/jansel
This PR properly registers the tensor used in the module compute as a parameter. The bug was previously hidden because dynamo considered every tensor on an nn module to be constant; with inlined NN modules, this is no longer the case.
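A minimal sketch of the pattern in question (the module and attribute names here are illustrative, not the actual code the PR touches):
```python
import torch
import torch.nn as nn

class Mod(nn.Module):
    def __init__(self):
        super().__init__()
        # Before: a plain tensor attribute. Dynamo used to treat every tensor
        # on an nn module as a constant, so this happened to work.
        # self.scale = torch.ones(3)
        # After: registered as a parameter, so it is traced as a graph input
        # rather than baked in as a constant once NN modules are inlined.
        self.scale = nn.Parameter(torch.ones(3))

    def forward(self, x):
        return x * self.scale
```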
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128356
Approved by: https://github.com/anijain2305
ghstack dependencies: #128355
`test_dist` uses bfloat16, which isn't well supported by Triton on pre-sm80 hardware, so this splits the test in two and adds a skip. It also adds a `skipCUDAIf` decorator that skips only on CUDA devices, so the test still runs on CPU.
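For illustration, a minimal sketch of what such a device-conditional skip could look like, assuming the test class exposes the device under test as `self.device` (the decorator actually added by the PR may differ):
```python
import functools
import unittest

def skipCUDAIf(condition, reason):
    # Skip the decorated test only when running on a CUDA device;
    # CPU runs are unaffected.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            if condition and str(self.device).startswith("cuda"):
                raise unittest.SkipTest(reason)
            return fn(self, *args, **kwargs)
        return wrapper
    return decorator

# Hypothetical usage on a bfloat16 test:
# @skipCUDAIf(not SM80OrLater, "bfloat16 requires sm80 or newer")
# def test_dist_bf16(self): ...
```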
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113384
Approved by: https://github.com/lezcano
This PR adds an `efficient_conv_bn_eval_graph_transform` pass to inductor. It tries to identify consecutive conv + bn **computation** with bn in eval mode, and rewrites it to a more efficient implementation. It does not modify parameters, which makes it **support training** without any pain. If no such pattern is identified, it does nothing; therefore, it is backward compatible.
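For intuition, here is a rough sketch of the conv + eval-mode bn folding arithmetic such a transform exploits (the function and argument names are illustrative; the pass computes the fused weights inside the graph rather than overwriting the original parameters, which is why training keeps working):
```python
import torch

def fold_conv_bn_eval(conv_w, conv_b, bn_rm, bn_rv, bn_w, bn_b, eps=1e-5):
    # In eval mode, bn is a fixed per-channel affine map:
    #   y = (z - running_mean) / sqrt(running_var + eps) * gamma + beta
    # so it folds into the preceding conv's weight and bias.
    scale = bn_w / torch.sqrt(bn_rv + eps)           # per-channel scale
    fused_w = conv_w * scale.reshape(-1, 1, 1, 1)    # scale each output filter
    fused_b = (conv_b - bn_rm) * scale + bn_b
    return fused_w, fused_b
```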
It has a great benefit in terms of memory footprint. For ResNet-50 with batch size 64, image size 224, and forward + backward training:
| Technique | Memory Footprint (GB) | Remarks |
|-------------------------------|----------------------------|-------------------------------------------|
| Eager Mode | 5.18 | |
| torch.compile | 5.46 | Strangely, not saving memory |
| torch.compile with this PR | 2.88 | **Saves about 50% memory!** |
The script to measure the memory footprint:
```python
from torchvision.models.resnet import resnet50
import torch

net = resnet50().eval().cuda()
input = torch.randn(64, 3, 224, 224).cuda()
opt_net = torch.compile(net)  # Use torch.compile
# opt_net = net  # Eager mode
current_memory = torch.cuda.memory_allocated()
torch.cuda.reset_peak_memory_stats()
for i in range(10):
    opt_net.zero_grad()
    output = opt_net(input)
    output.sum().backward()
    del output
peak_memory = torch.cuda.max_memory_allocated()
additional_peak_memory = peak_memory - current_memory
print(f"Additional peak memory used: {additional_peak_memory / (1024 ** 3)} GB")
```
More results can be found in the corresponding paper (this method is called Tune Mode in the tables).
<img width="709" alt="image" src="https://github.com/pytorch/pytorch/assets/23236638/db4815b0-d93e-4726-b1d5-e6651f256484">
<img width="653" alt="image" src="https://github.com/pytorch/pytorch/assets/23236638/22e5e1ab-6129-4c3d-a875-3c7343293b2e">
Note: the difference between this PR and https://github.com/pytorch/pytorch/pull/106372 is that https://github.com/pytorch/pytorch/pull/106372 tries to fix and change the implementation of `torch.fx.experimental.optimization.fuse`, which causes compatibility issues, while this PR only introduces a new graph transform pass and does not break the previous code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108757
Approved by: https://github.com/jansel