Summary:
Currently, we define some C++ functions in one C++ Python extension
which are used by another. This happens to work, but isn't guaranteed to.
This diff moves these functions to a separate C++ library rule to fix this.
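A minimal sketch of the intended layout, using open-source Buck rule names (`cxx_library`, `cxx_python_extension`); the target and file names here are hypothetical:
```
# Hypothetical targets: the shared C++ code gets its own library rule,
# and both extensions depend on it instead of one extension relying on
# symbols that happen to be exported by the other.
cxx_library(
    name = "shared_helpers",
    srcs = ["shared_helpers.cpp"],
    exported_headers = ["shared_helpers.h"],
)

cxx_python_extension(
    name = "ext_a",
    srcs = ["ext_a.cpp"],
    deps = [":shared_helpers"],
)

cxx_python_extension(
    name = "ext_b",
    srcs = ["ext_b.cpp"],
    deps = [":shared_helpers"],
)
```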
Test Plan: CI
Differential Revision: D42552515
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92325
Approved by: https://github.com/kit1980, https://github.com/Skylion007
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610
- Replace HIP_PLATFORM_HCC with USE_ROCM.
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.
- In the next PR:
  - Remove the mapping from CUDA_VERSION to HIP_VERSION, and from CUDA to HIP, in hipify (see the sketch below).
  - HIP_PLATFORM_HCC is deprecated, so add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.
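For context, hipify is the Python tool that rewrites CUDA sources for ROCm via substitution tables. A toy sketch of the idea; the real tables live under torch/utils/hipify, and these entries are illustrative only:
```
# Toy illustration of hipify-style substitution; not the actual tables.
# The CUDA_VERSION -> HIP_VERSION entry is the kind of mapping the
# follow-up PR removes.
CUDA_TO_HIP = {
    "CUDA_VERSION": "HIP_VERSION",
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
}

def hipify(source: str) -> str:
    for cuda_token, hip_token in CUDA_TO_HIP.items():
        source = source.replace(cuda_token, hip_token)
    return source

print(hipify("cudaMalloc(&ptr, n);"))  # -> hipMalloc(&ptr, n);
```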
cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd
Reviewed By: jbschlosser
Differential Revision: D30909053
Pulled By: ezyang
fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
Summary:
Add `-Wno-writable-strings` (which is clang's flavor of `-Wwrite-strings`) to the list of warnings ignored while compiling torch_python.
Avoid unnecessary copies in range loops.
Fix a number of signed-unsigned comparisons.
Found while building locally on M1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62930
Reviewed By: albanD
Differential Revision: D30171981
Pulled By: malfet
fbshipit-source-id: 25bd43dab5675f927ca707e32737ed178b04651e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62059
GetOperatorCost in Workspace exposes only flops and bytes_written. Make an additional piece, bytes_read, available from OperatorSchema::Cost.
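A hedged sketch of querying the extended cost from Python; the field names follow the summary above, and the exact return type is whatever `caffe2.python.workspace.GetOperatorCost` yields:
```
import numpy as np
from caffe2.python import core, workspace

# Feed the inputs so cost inference has concrete shapes to work with.
workspace.FeedBlob("X", np.random.rand(4, 8).astype(np.float32))
workspace.FeedBlob("W", np.random.rand(16, 8).astype(np.float32))
workspace.FeedBlob("b", np.random.rand(16).astype(np.float32))

op = core.CreateOperator("FC", ["X", "W", "b"], ["Y"])
cost = workspace.GetOperatorCost(op, ["X", "W", "b"])
# bytes_read is the newly exposed piece alongside flops and bytes_written.
print(cost.flops, cost.bytes_written, cost.bytes_read)
```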
Test Plan:
Added the two additional pieces in the unit test testGetOperatorCost in workspace_test
buck test caffe2/caffe2/python:workspace_test -- testGetOperatorCost
buck test //aml/ml_foundation/exp_platform/large_scale_training/distributed_hogwild/auto_device_placement/tests/...
buck test //aiplatform/training/autotuning/tests/...
buck test //aiplatform/training/pipelining/tests/...
buck test //deeplearning/fblsim/tests/...
Flow tests:
ADP Greedy: f288078287
ADP MILP: f288079278
Reviewed By: CrazySherman, xtaofb
Differential Revision: D29860676
fbshipit-source-id: 8b3a9f2bf17c0dae48cfe2800e8821bf441e0b03
Summary:
The `cppcoreguidelines-avoid-non-const-global-variables` check is disabled, as GoogleTest's `TEST` macro is non-compliant with it, as is `DEFINE_DISPATCH`.
All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in $(find . -type f \( -iname "*.c*" -o -iname "*.h" \) | \
           xargs grep cppcoreguidelines-avoid-non-const-global-variables | \
           cut -f1 -d: | sort | uniq); do
  sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" "$i"
done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os


def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files


def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])


def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)


if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56717
The signal_handler was under the caffe2 namespace but was being used
by PyTorch as well.
I've fixed this by moving it to the c10 namespace, where both C2 and PyTorch
can use it.
The signal_handler interface in caffe2/utils/signal_handler.h is kept the same
for backward compatibility with C2, but most of the common code is moved to c10.
ghstack-source-id: 127446929
Test Plan: waitforbuildbot
Reviewed By: ezyang
Differential Revision: D27946738
fbshipit-source-id: d6228d1a0108f4c807d405e7a0bb799c5375388f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55003
Using the `caffe2::setPrintStackTracesOnFatalSignal` utility in
distributed tests to set a signal handler that dumps the state of all threads
for all processes when it receives a FATAL signal. This would help in debugging
tests further.
I had to revert all the Python faulthandler code, since only one signal handler
function is supported: running Python's faulthandler together with
`setPrintStackTracesOnFatalSignal` doesn't work.
Sample output:
```
SIGSEGV(11), PID: 3492872, Thread 3492872:
[0] ???(0x7fa7b2d1d61b) in libcaffe2_caffe2_caffe2_cpu.so
[1] ???(0x7fa7b2d1d3fb) in libcaffe2_caffe2_caffe2_cpu.so
[2] ???(0x7fa7b2d1d33d) in libcaffe2_caffe2_caffe2_cpu.so
[3] ???(0x7fa7b2d1d167) in libcaffe2_caffe2_caffe2_cpu.so
[4] ???(0x7fa7ce683150) in libpthread.so.0
[5] ???(0x7fa7be2b233c) in libcaffe2__C_impl_cuda.so
[6] ???(0x7fa7be2ce80c) in libcaffe2__C_impl_cuda.so
[7] ???(0x7fa7be2a0512) in libcaffe2__C_impl_cuda.so
[8] torch::distributed::rpc::TensorPipeAgent::send(torch::distributed::rpc::WorkerInfo const&, torch::distributed::rpc::Message&&, float, std::unordered_map<signed char, signed char, std::hash<signed char>, std::equal_to<signed char>, std::allocator<std::pair<signed char const, signed char> > > const&)+0x24f(0x7fa7be29f71f) in libcaffe2__C_impl_cuda.so
[9] torch::distributed::autograd::sendMessageWithAutograd(torch::distributed::rpc::RpcAgent&, torch::distributed::rpc::WorkerInfo const&, torch::distributed::rpc::Message&&, bool, float, bool)+0x393(0x7fa7b602b203) in libcaffe2_libtorch.so
[10] torch::distributed::rpc::pyRpcPythonUdf(torch::distributed::rpc::WorkerInfo const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<at::Tensor, std::allocator<at::Tensor> >&, float, bool)+0x201(0x7fa7bd844971) in libcaffe2__C_impl_cuda.so
```
ghstack-source-id: 125630551
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D27419714
fbshipit-source-id: 8aca9a14ef688004053d8798124d9c3a3fbe3489
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54274
Some of the Python tests need to be aware of whether or not FBGEMM is
available, so expose this setting in the pybind extension.
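A hedged sketch of the intended use in tests; the attribute name below is an assumption, not necessarily the actual binding:
```
import unittest
from caffe2.python import workspace

class FbgemmDependentTest(unittest.TestCase):
    # "has_fbgemm" is a hypothetical name for the exposed setting.
    @unittest.skipIf(not getattr(workspace, "has_fbgemm", False),
                     "FBGEMM not available in this build")
    def test_fbgemm_path(self):
        pass  # exercise FBGEMM-specific behavior here
```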
ghstack-source-id: 124317732
Test Plan: Will use this variable in the tests on D26658205.
Reviewed By: mraway
Differential Revision: D27171780
fbshipit-source-id: 4c94144a959bf8bf0e1553b6e029e94a91794e29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52388
Pull Request resolved: https://github.com/pytorch/glow/pull/5364
This allows us to change global variables through onnxifi calls, and adds Python bindings along with it. Note that we supply a dummy backend_id, as it's not needed by Glow since the setting is global.
#codemod
Test Plan:
```
buck test mode/dev //glow/fb/test:test_onnxifi_optionnnpi
```
Reviewed By: jfix71, khabinov
Differential Revision: D26481652
fbshipit-source-id: 19b8201c77f653cf7d93ad68760aa7fb5ec45ff4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44043
To invoke `cancel` from the net instance in Python, we expose it through pybind state.
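A hedged sketch of what this enables from Python; the exact access pattern through the pybind workspace is an assumption:
```
from caffe2.python import core, workspace

net = core.Net("example")
workspace.CreateNet(net)
# Hypothetical access pattern: fetch the pybind net instance by name and
# invoke the newly exposed cancel() on it.
workspace.C.Workspace.current.nets[net.Name()].cancel()
```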
Reviewed By: dzhulgakov
Differential Revision: D23249660
fbshipit-source-id: 45a1e9062dca811746fcf2e5e42199da8f76bb54
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42927
Added fp16 fusion to net transforms.
Refactored the transforms as well as glow_transform out of opt/custom so that the OSS builds pass.
Test Plan: added net runner tests for this
Reviewed By: yinghai
Differential Revision: D23080881
fbshipit-source-id: ee6451811fedfd07c6560c178229854bca29301f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42763
Add the fp16 fusions as net transforms:
- layernorm fused with mul+add
- swish int8
Test Plan: added unit test, ran flows
Reviewed By: yinghai
Differential Revision: D23002043
fbshipit-source-id: f0b13d51d68c240b05d2a237a7fb8273e996328b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42249
Main change is to bring Caffe2's superior error messages for cuda initialization into c10 and use them in all code paths.
Basic logic:
| Case | Call to device_count() | init_cuda, e.g. allocating tensor |
| -- | -- | -- |
| all good | non-zero | just works |
| no gpus | 0, no warning | throw exception with good message |
| driver issues | 0, produce warning | throw exception with good message |
| out of memory with ASAN | 0, produce warning| throw exception with ASAN message |
Previously, the error thrown from init_cuda was very generic and the ASAN warning (if any) was buried in the logs.
Other clean up changes:
* always cache device_count() in a static variable
* move all ASAN macros into c10
Test Plan:
Hard to unittest because of build modes. Verified manually that the behavior from the table above holds by running the following script in different modes (ASAN/no-ASAN, CUDA_VISIBLE_DEVICES=):
```
print('before import')
import torch
print('after import')
print('devices: ', torch.cuda.device_count())
x = torch.tensor([1,2,3])
print('tensor creation')
x = x.cuda()
print('moved to cuda')
```
Reviewed By: ngimel
Differential Revision: D22824329
fbshipit-source-id: 5314007313a3897fc955b02f8b21b661ae35fdf5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42516
As titled. We need it for some scripts.
Reviewed By: houseroad
Differential Revision: D22918112
fbshipit-source-id: 8a1696ceeeda67a34114bc57cb52c925711cfb4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42421
Previously, we could only feed shape info with float dtype and batch-based dim type when doing onnxifi from Python. This diff removes that limitation by using the TensorBoundShapes protobuf as a generic shape info struct, making the onnxifi interface in Python more flexible.
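A hedged sketch of building the generic shape info; the message and field names follow caffe2.proto's TensorBoundShape(s) messages as best recalled and should be treated as assumptions:
```
from caffe2.proto import caffe2_pb2

# One TensorBoundShapes proto can now carry per-blob dtype and dim-type
# info instead of the old float-only, batch-only hints.
tbs = caffe2_pb2.TensorBoundShapes()
entry = tbs.shapes.add()
entry.name = "X"
entry.shape.dims.extend([32, 16])
entry.shape.data_type = caffe2_pb2.TensorProto.INT32  # no longer float-only
entry.dim_type.extend([
    caffe2_pb2.TensorBoundShape.BATCH,     # first dim is the batch dim
    caffe2_pb2.TensorBoundShape.CONSTANT,  # second dim is fixed
])
```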
Reviewed By: ChunliF
Differential Revision: D22889781
fbshipit-source-id: 1a89f3a68c215a0409738c425b4e0d0617d58245
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40081
Add the ability to time out an OnnxifiOp run, so that if the backend hangs, the op can error out quickly.
Test Plan:
```
buck test glow/fb/test:test_onnxifinnpi -- test_timeout
```
Reviewed By: jackm321
Differential Revision: D22064533
fbshipit-source-id: 25487287c10ab217eb95692f09d48e13e19436ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36741
Create a child workspace that shares the parent workspace's blobs. Register the child workspace in the registrar to enable switching into it and feeding blobs to it alone.
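A hedged sketch of the resulting flow from Python; creating and registering the child happens on the C++ side in this diff, so assume a child named "child" is already registered:
```
import numpy as np
from caffe2.python import workspace

workspace.FeedBlob("weights", np.ones(4, dtype=np.float32))  # parent blob

# Switch into the registered child workspace and feed to it alone.
workspace.SwitchWorkspace("child")
workspace.FeedBlob("act", np.zeros(4, dtype=np.float32))  # child-local
print(workspace.HasBlob("weights"))  # True: shared from the parent
workspace.SwitchWorkspace("default")
```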
Test Plan: numeric suite unit tests in stacked diff
Reviewed By: hx89
Differential Revision: D21055567
fbshipit-source-id: 374b12aef75a4c58452c271f8961ee156ce6c559
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35555
As titled, so that we can lower the SparseLengthsSum* part of SparseLengthsSum*Sparse. We update the tying policy between Gather and SparseLengthsWeightSum* so that we don't bother lowering a single Gather into the backend, which is inefficient to execute on card and creates bubbles between contiguous lowered graphs.
Test Plan:
```
buck test glow/fb/test:test_onnxifinnpi
```
Reviewed By: ipiszy
Differential Revision: D20688525
fbshipit-source-id: cb8e38239057ff13a8d385ed09d0d019421de78b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515
Once upon a time we thought this was necessary. In reality it is not, so
removing it.
For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.
There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.
Test Plan: Imported from OSS
Differential Revision: D20353503
Pulled By: suo
fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30802
Change shape_hints from map<string, TensorShape> to ShapeInfoMap to capture dimType info from the model file.
Reviewed By: ipiszy
Differential Revision: D18821486
fbshipit-source-id: c5d9ed72e158d3698aba38900aeda00f776745b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23848
Problem:
In an experiment running feed model 127607201 (/mnt/public/tracelog/feed_repro2/127607201_0.predictor), we encountered a blob dimensionality mismatch error when running the onnxified net. This is because the model initializes input blobs in the current workspace with blob size 0, and onnxifi() falsely identifies those input blobs as weight blobs and assigns them the wrong dimensions.
Solution:
Add an option to pass the correct weight blob names to onnxifi() instead of using all blobs in the current workspace (see the sketch below).
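A hedged sketch of the new option from Python; the keyword name and net setup here are assumptions:
```
from caffe2.python import core
from caffe2.python.onnx.onnxifi import onnxifi_caffe2_net

pred_net = core.Net("pred").Proto()  # stand-in predict net
shape_hints = {"X": [1, 16]}         # blob name -> shape hints

# Pass the real weight blob names explicitly instead of letting onnxifi()
# treat every blob in the current workspace as a weight.
pred_net_onnxified = onnxifi_caffe2_net(
    pred_net,
    shape_hints,
    weight_names=["fc_w", "fc_b"],   # assumed keyword for the new option
)
```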
Reviewed By: yinghai
Differential Revision: D16661396
fbshipit-source-id: cabe44db6b64e6538bef4b65e380312214b3ba9f
Summary:
As part of the Variable/Tensor merge, we want to be able to pass Variables into Caffe2 without doing extra shallow copy, to improve performance and also allow for in-place mutations in Caffe2 ops. There are a few approaches outlined in https://github.com/pytorch/pytorch/pull/22418, and this PR is the chosen approach.
Specifically, we can have the assumption that we won't be connecting autograd to C2 gradients at any point (as it's too tricky and not that useful). Therefore, we can pass Variable into Caffe2 ops by requiring that all Variables in Caffe2 don't require grad. For code paths in Caffe2 that might potentially track gradients (e.g. `ScriptModuleOp` and `call_caffe2_op_from_c10`), we use the `torch::NoGradGuard` to make sure gradients are not tracked.
This supersedes https://github.com/pytorch/pytorch/pull/22418.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22473
Differential Revision: D16099042
Pulled By: yf225
fbshipit-source-id: 57efc3c7cfb3048d9abe90e63759acc14ebd2972
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21718
Add a detection method for whether the package is built for AMD.
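A hedged sketch of the check from Python; whether this exact attribute is what the PR adds is an assumption, though it is the conventional way to detect a ROCm build:
```
import torch

# torch.version.hip is None on CUDA/CPU builds and a version string on
# ROCm builds (assumed here to be the detection this PR introduces).
is_rocm = torch.version.hip is not None
print("built for AMD/ROCm" if is_rocm else "not a ROCm build")
```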
Reviewed By: bddppq
Differential Revision: D15795893
fbshipit-source-id: 91a21ee76b2273b1032507bdebe57e016717181d