pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
cyy	dee100945e	[2/N] Move c10::variant to std::variant (#109723 ) This PR moves most of c10::variant calls to std::variant. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109723 Approved by: https://github.com/ezyang	2023-09-24 02:47:43 +00:00
Bin Bao	8856c1628e	[inductor] Change AOTInductor to return output tensors (#109790 ) Summary: Change AOTInductor to directly return output tensors instead of taking pre-allocated output tensors to return the results. This gives several benefits: * It makes sure AOTInductor has the same behavior when managing the output tensors as the default Inductor, which is widely tested and thus more reliable. * As we have debugged before, there are cases we still have to codegen extra copy_ ops to fill the pre-allocated output tensors which doesn't make sense for performance. * With the coming enhanced memory planning, this again will make sure the memory planning logic is the between AOTInductor and Inductor, which will greatly simplify the problem and improve the reliability. This change also combines D49494954 from Yang and https://github.com/pytorch/pytorch/pull/109560 from Angela. Differential Revision: D49502318 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109790 Approved by: https://github.com/chenyang78	2023-09-22 02:31:52 +00:00
cyy	e9e93c5350	[Reland] Move torch::make_unique to std::make_unique (#109780 ) We can first try to move torch::make_unique to std::make_unique despite reverting of #108866 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/109780 Approved by: https://github.com/ezyang	2023-09-21 18:30:21 +00:00
Bin Bao	9c2715bbb2	[inductor] Clean up AOTInductor runtime ABI (#109678 ) Summary: Change the AOTInductor runtime interface to avoid referring to aten data structures directly, mostly at::Tensor and ProxyExecutor. This a combination of https://github.com/pytorch/pytorch/pull/109436, https://github.com/pytorch/pytorch/pull/109498, https://github.com/pytorch/pytorch/pull/109450, https://github.com/pytorch/pytorch/pull/109606, plus a few internal build changes. Reviewed By: frank-wei Differential Revision: D49374820 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109678 Approved by: https://github.com/frank-wei, https://github.com/chenyang78	2023-09-21 00:25:24 +00:00
PyTorch MergeBot	1cc052bcab	Revert "[1/N] Add -Wdeprecated and related fixes (#108626 )" This reverts commit `a53a677b4d`. Reverted https://github.com/pytorch/pytorch/pull/108626 on behalf of https://github.com/clee2000 due to I'm getting errors internally that look like the below on x86_64-apple-ios-simulator with clang 16 ([comment](https://github.com/pytorch/pytorch/pull/108626#issuecomment-1728102447))	2023-09-20 16:49:11 +00:00
cyy	a53a677b4d	[1/N] Add -Wdeprecated and related fixes (#108626 ) This PR adds -Wdeprecated to CMake warnings and fixes related issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108626 Approved by: https://github.com/ezyang, https://github.com/Skylion007	2023-09-19 09:24:04 +00:00
PyTorch MergeBot	525e4f42d0	Revert "replace torch::make_unique with std::make_unique (#108866 )" This reverts commit `03e35efbf7`. Reverted https://github.com/pytorch/pytorch/pull/108866 on behalf of https://github.com/clee2000 due to Sorry but I found more usages of `torch::make_unique` internally, I can go change all of these, but I'd prefer if that gets done before this gets merged ([comment](https://github.com/pytorch/pytorch/pull/108866#issuecomment-1722577925))	2023-09-17 21:57:30 +00:00
cyy	4c208c1475	Remove unneeded linking in CMake targets (#109192 ) This PR removes unused library dependencies, help refactoring in the future. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109192 Approved by: https://github.com/ezyang	2023-09-15 19:43:25 +00:00
cyy	03e35efbf7	replace torch::make_unique with std::make_unique (#108866 ) It should be safe to remove the old torch::make_unique functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108866 Approved by: https://github.com/albanD	2023-09-14 20:52:26 +00:00
Yang Chen	9cd4548f01	AOTInductor dynamic shape (#109012 ) Summary: This PR adds dynamic-shape support for AOTInductor * On the runtime/interface side, we added two structs, StaticDimInfo and DynamicDimInfo, to hold values for static and dynamic dimensions, respectively. Dynamic dimensions are tracked by an unordered map field defined in AOTInductorModelBase. At inference time, the inference run method will assign the current real dimensional value to each dynamic dimension before executing any kernel. * On the CUDA wrapper codegen side, we generate dynamic symbols appropriately for shape computations. We simulate kernel launch grids in the C++ land by re-using the grid functions from the Python world. The returned grid configs, which may contain symbolic expressions, are printed out in their C++ forms via the CppPrinter. Note that when dynamic shapes are involved, we have to compute grid configs for each kernel at runtime in the same way as we do for launching the corresponding Triton kernel. Otherwise, we may end up with memory-access failures or mis-computations caused by invalid indices for fetching or storing data in device memory. Differential Revision: D49100472 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109012 Approved by: https://github.com/khabinov, https://github.com/desertfire, https://github.com/hl475	2023-09-14 08:00:30 +00:00
FFFrog	003c5bb156	Add checks to `num_layers` for `RNN`, `LSTM`, `GRU` (#108853 ) Fixes #108223 As the title shown Pull Request resolved: https://github.com/pytorch/pytorch/pull/108853 Approved by: https://github.com/mikaylagawarecki	2023-09-09 19:33:52 +00:00
shibo19	a5e1d38025	add check for torch_arg (#108397 ) Fixes https://github.com/pytorch/pytorch/issues/108219 add check for torch_arg marco, as for inchannel/outchannel/groups, it should be greater than 0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108397 Approved by: https://github.com/mikaylagawarecki	2023-09-08 23:18:27 +00:00
FFFrog	f30f9fec87	Fix the issue described by #106769 (#108340 ) Fixes #106769 Align the behavior of the C++ interface with the Python interface 1. Remove some checks in C++ frontend api ,which duplicate with below `50fa5880e8/aten/src/ATen/native/RNN.cpp (L676-L690)` 3. Add some checks 4. support 1D 5. Add Test Pull Request resolved: https://github.com/pytorch/pytorch/pull/108340 Approved by: https://github.com/mikaylagawarecki	2023-09-08 22:22:09 +00:00
Mu-Chu Lee	30a33b76b9	[AOTInductor] Include constants in AOTInductor .so file. (#108473 ) Summary: Include constants in AOTInductor .so file. Added some difference: 1) serialize with ctypes instead of the native of torch.storage 2) Use the underlying for_blob instead of from_blob to construct Tensor. Test Plan: Unit tests: ``` test/inductor/test_aot_inductor.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/108473 Approved by: https://github.com/angelayi	2023-09-08 03:49:53 +00:00
cyy	e4f3e5434f	[Reland] Elimates c10::guts::to_string (#108748 ) Reland of PR #108480, after relanding another blocking PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108748 Approved by: https://github.com/huydhn	2023-09-07 13:35:17 +00:00
Bin Bao	60bd30ee0b	[inductor] Move AOTInductor runtime headers (#108564 ) Summary: Move AOTInductor runtime header files into its own subdirectory, to separate them from to-be-added libtorch C interface. Reviewed By: frank-wei Differential Revision: D48905038 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108564 Approved by: https://github.com/frank-wei	2023-09-06 11:50:41 +00:00
PyTorch MergeBot	8da04e023e	Revert "Eliminate c10::guts::to_string (#108480 )" This reverts commit `4146be192e`. Reverted https://github.com/pytorch/pytorch/pull/108480 on behalf of https://github.com/huydhn due to Sorry for reverting this, but this is needed to keep trunk green after https://github.com/pytorch/pytorch/pull/108479 was reverted. Both will need to be relanded ([comment](https://github.com/pytorch/pytorch/pull/108480#issuecomment-1707067595))	2023-09-05 18:04:53 +00:00
shibo19	03aac0bff6	add input check at the beginning for C++ API `interpolate` (#108506 ) Fixes https://github.com/pytorch/pytorch/issues/108346 add the input check to the beginning for C++ API `interpolate`, raise an error when got an invalid input. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108506 Approved by: https://github.com/ezyang	2023-09-05 17:56:17 +00:00
cyy	4146be192e	Eliminate c10::guts::to_string (#108480 ) This PR replace c10::guts::to_string with std::to_string. The major part of changes is using void* as optimizer state key since string is used only for serialization and using pointers as hashing keys is more efficient than a string. Some other guts functions in the affected source files are also replaced. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108480 Approved by: https://github.com/Skylion007	2023-09-04 08:12:53 +00:00
Sherlock Huang	b9dfdc091b	[AOTInductor][Reland] Proxy Executor for Extern Fallback kernels (#107279 ) (#108350 ) Summary: This is a prototype for running extern fallback kernels with a host side proxy executor. Sample of generated cpp wrapper call: ``` at::Tensor buf0; // output buffer void* tensor_args_var_0[] = {&arg0_1, &arg0_1, &arg1_1, &arg0_1, &arg1_1, &buf0}; int64_t int_args_var_1[] = {81, 81, 7, 7, 7, 81}; proxy_executor->call_function("buf0", int_args_var_1, tensor_args_var_0); ``` - In my current implementation, proxy executor interprets the raw pointers according to the ops schema. This assumes that custom op MUST have a valid schema registered to Dispatcher. (I would like to validate this assumption) - I am using callboxed() API of the custom kernels. This is inevitable, as we wish to have a single call_function API for all possible custom kernels. - These are all the input argument types I have support so far. union Argument { # Bool value does not matter 1: bool asNone; 2: TensorArgument asTensor; 3: list<TensorArgument> asTensors; 5: i64 asInt; 7: list<i64> asInts; 8: double asFloat; 9: list<double> asFloats; 10: string asString; 10.5: list<string> asStrings; 11: SymIntArgument asSymInt; 12: list<SymIntArgument> asSymInts; 13: ScalarType asScalarType; 14: MemoryFormat asMemoryFormat; 15: Layout asLayout; 16: Device asDevice; 17: bool asBool; 18: list<bool> asBools; } - Need a policy for handling unpopulated argument with default values. Here are the options, and it has BC implications. 1. requires exported fx graph to explicitly populate default values, if users doesn't specify. 2. requires cpp wrapper to explicitly populate default values, if fx graph doesn't specify. 3. Proxy executor look up from opSchema for default values. For fixing T162112344 Test Plan: frontend: buck2 run mode/dev-sand mode/inplace -c fbcode.enable_gpu_sections=True sigmoid/frontend:export_main test: buck2 run mode/dev-sand //deeplearning/aot_inductor/test:test_custom_ops backend: buck2 run mode/dev-nosan //deeplearning/aot_inductor/fb:main buck2 test 'fbcode//mode/opt' fbcode//caffe2/torch/fb/model_transform/experimental/benchmark/test:test_aot_inductor_benchmark -- --exact 'caffe2/torch/fb/model_transform/experimental/benchmark/test:test_aot_inductor_benchmark - test_aot_inductor_benchmark_cmf30x (caffe2.torch.fb.model_transform.experimental.benchmark.test.test_aot_inductor_benchmark.AOTInductorBenchmark)' Reviewed By: suo Differential Revision: D48747417 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108350 Approved by: https://github.com/izaitsevfb	2023-09-02 17:14:10 +00:00
Bin Bao	06d74e6b24	Revert "[AOTInductor] Include constants in AOTInductor .so file. (#10… (#108349 ) This reverts commit `c3239442a3` due to internal test failures. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108349 Approved by: https://github.com/aakhundov, https://github.com/zhxchen17	2023-08-31 16:26:02 +00:00
Pritam Damania	704b0b3c67	[RESUBMIT] Standardize on error types for distributed errors. (#108191 ) We have a plethora of error types for various errors raised from c10d. These include `RuntimeError`, `TimeoutError`, `SocketError`, `DistBackendError` etc. This results in messy code during error handling somewhat like this: ``` if "NCCL" in exception_str: ... if "Timed out initializing process group in store based barrier on rank" in exception_str: ... if "The client socket has timed out after" in exception_str: ... if "Broken pipe" in exception_str: ... if "Connection reset by peer" in exception_str: ... ``` To address this issue, in this PR I've ensured added these error types: 1. DistError - the base type of all distributed errors 2. DistBackendError - this already existed and referred to PG backend errors 3. DistStoreError - for errors originating from the store 4. DistNetworkError - for general network errors coming from the socket library Pull Request resolved: https://github.com/pytorch/pytorch/pull/108191 Approved by: https://github.com/H-Huang	2023-08-30 21:47:39 +00:00
Emmanuel Menage	fe1f26af8a	Add support for PickleOpCode::APPEND in torch unpickler (#104027 ) Reviewed By: qiminglu Differential Revision: D46760650 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104027 Approved by: https://github.com/ezyang	2023-08-30 14:24:50 +00:00
Mu-Chu Lee	c3239442a3	[AOTInductor] Include constants in AOTInductor .so file. (#107718 ) Summary: Include the constants into AOTInductor .so file. We do not modify existing API signatures but create necessary format with weight lifted out instead. Test Plan: test/inductor/test_aot_inductor.py Reviewers: Subscribers: Tasks: Tags: Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/107718 Approved by: https://github.com/angelayi, https://github.com/eellison	2023-08-29 22:37:30 +00:00
FFFrog	78810d78e8	Fix the coredump described by #106702 (#108002 ) Fixes #106702 and add some tests As shown by [maxUnpool1d](https://pytorch.org/docs/master/generated/torch.nn.MaxUnpool1d)(`MaxUnpool2d`, `MaxUnpool3d` also), `Input` and `Output` support `(N,C,)` or `(C,)`, but the c++ api currently supports the former, and the latter will cause a coredump. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108002 Approved by: https://github.com/albanD	2023-08-29 17:14:16 +00:00
Rodrigo Kumpera	fe2cda64dc	[C10D] Implement new libuv backend for TCPStore. (#108066 ) The new backend is currently under a flag 'use_libuv' in TCPStore constructor to reduce the impact on existing users as we test it. This is a reland of #105870 with a fix for a bad test. Differential Revision: [D48742554](https://our.internmc.facebook.com/intern/diff/D48742554) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108066 Approved by: https://github.com/H-Huang, https://github.com/fduwjj	2023-08-29 14:55:14 +00:00
PyTorch MergeBot	d4ff06ec84	Revert "Standardize on error types for distributed errors. (#107651 )" This reverts commit `0e2317479b`. Reverted https://github.com/pytorch/pytorch/pull/107651 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing inductor test in trunk for one of its model moco ([comment](https://github.com/pytorch/pytorch/pull/107651#issuecomment-1696578138))	2023-08-28 23:58:33 +00:00
Pritam Damania	0e2317479b	Standardize on error types for distributed errors. (#107651 ) We have a plethora of error types for various errors raised from c10d. These include `RuntimeError`, `TimeoutError`, `SocketError`, `DistBackendError` etc. This results in messy code during error handling somewhat like this: ``` if "NCCL" in exception_str: ... if "Timed out initializing process group in store based barrier on rank" in exception_str: ... if "The client socket has timed out after" in exception_str: ... if "Broken pipe" in exception_str: ... if "Connection reset by peer" in exception_str: ... ``` To address this issue, in this PR I've ensured added these error types: 1. DistError - the base type of all distributed errors 2. DistBackendError - this already existed and referred to PG backend errors 3. DistStoreError - for errors originating from the store 4. DistNetworkError - for general network errors coming from the socket library Pull Request resolved: https://github.com/pytorch/pytorch/pull/107651 Approved by: https://github.com/H-Huang	2023-08-28 21:58:15 +00:00
PyTorch MergeBot	d3f92ca9e9	Revert "[C10D] Implement new libuv backend for TCPStore. (#105870 )" This reverts commit `3c841163ce`. Reverted https://github.com/pytorch/pytorch/pull/105870 on behalf of https://github.com/huydhn due to I think the distributed failure is related as this is now failing in trunk ([comment](https://github.com/pytorch/pytorch/pull/105870#issuecomment-1683117192))	2023-08-17 23:41:00 +00:00
Rodrigo Kumpera	3c841163ce	[C10D] Implement new libuv backend for TCPStore. (#105870 ) The new backend is currently under a flag 'use_libuv' in TCPStore constructor to reduce the impact on existing users as we test it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/105870 Approved by: https://github.com/H-Huang	2023-08-17 20:40:32 +00:00
soulitzer	d9dc4b2b4c	[BE] Add missing override to remove build warning spam (#107191 ) ``` In file included from /local/pytorch3/test/cpp/api/optim.cpp:7: local/pytorch3/test/cpp/api/support.h:44:3: warning: '~WarningCapture' overrides a destructor but is not marked 'override' [-Winconsistent-missing-destructor-override] ~WarningCapture() { ^ local/pytorch3/c10/util/Exception.h:167:11: note: overridden virtual function is here virtual ~WarningHandler() = default; ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/107191 Approved by: https://github.com/janeyx99	2023-08-15 17:32:34 +00:00
hongxyan	df8493455e	[ROCm] enable test_api (test_libtorch) cpp unit tests (#106712 ) This is part of effort to enable missed cpp tests for ROCm platform. In this change, - enabled test_libtorch cpp tests (more than 3107 tests) - fixed missing dependency: libcaffe2_nvrtc.so required by FunctionalTest.Conv1d - test_api binary is changed to exclude failed tests InitTest and IntegrationTest - to revisit later Pull Request resolved: https://github.com/pytorch/pytorch/pull/106712 Approved by: https://github.com/jithunnair-amd, https://github.com/kit1980	2023-08-14 20:09:34 +00:00
angelayi	5b13c779d4	[AOTInductor] Remove call to aot_autograd when receiving ExportedProgram (#105977 ) https://github.com/pytorch/pytorch/issues/105555 Existing flow first exports and then calls torch._inductor.aot_compile. However, export calls aot_autograd with the core aten decomposition table, and then torch._inductor.aot_compile calls aot_autograd again with the inductor decomposition table. The 2nd calling of aot_autograd is supposedly causing some problems, and seems excessive, so instead we will create a new function, torch._export.aot_compiler which will export using the inductor decomposition table, pass it to inductor's compile_fx_aot, and because it has already been exported, avoid recalling aot_autograd. ``` def aot_compile( f: Callable, args: Tuple[Any], kwargs: Optional[Dict[str, Any]] = None, constraints: Optional[List[Constraint]] = None, ) -> Tuple[str, ExportedProgram]: ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/105977 Approved by: https://github.com/desertfire, https://github.com/zhxchen17, https://github.com/eellison	2023-08-04 15:35:23 +00:00
Edward Z. Yang	7b9d250f06	Change _dynamo.export to be export(f)(args, *kwargs) (#106109 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106109 Approved by: https://github.com/voznesenskym	2023-07-27 21:41:13 +00:00
Rodrigo Kumpera	174b0c22cb	[C10D] Remove watchKey functionality from the Store. (#105014 ) The feature was never fully finished and never got any adoption but TCPStore pays the cost of twice the number of tcp connections anyway. While the cost of all those idle connections is minimal is doesn't come for free: - It increases the likelyhood of a connection refused failure during the initialization stampede. - TCPStore uses poll for checking for socket availability which scales linearly on the number of sockets regardless of their status. Pull Request resolved: https://github.com/pytorch/pytorch/pull/105014 Approved by: https://github.com/fduwjj	2023-07-21 21:18:55 +00:00
Richard Zou	ed6de45563	Fix Tensor::register_hook behavior on undefined tensors (#105587 ) When the hook registered by Tensor::register_hook (in C++) gets passed an undefined tensor, it raises an internal assert in debug mode. The cause is that we attempt to construct an OptionalTensorRef (`4448c78a5d/aten/src/ATen/core/Tensor.h (L68)`) which asserts that the passed-in TensorBase is defined. The fix is that we create a new TensorRef class to convert the TensorBase into a Tensor without bumping the refcount (which is what OptionalTensorRef does). We cannot reuse OptionalTensorRef because OptionalTensorRef represents `optional<Tensor>` that cannot hold an Undefined Tensor. For some more historical context, it looks like this behavior was introduced in #63612 Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/105587 Approved by: https://github.com/soulitzer	2023-07-21 14:37:21 +00:00
Justin Chu	73e1455327	[BE] Enable ruff's UP rules and autoformat test/ (#105434 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434 Approved by: https://github.com/albanD	2023-07-19 20:36:06 +00:00
Bin Bao	b10de43c0a	Add aot_inductor as a test backend for benchmarking (#105221 ) Summary: Original PR at https://github.com/pytorch/pytorch/pull/104977. Landing from fbcode instead. Add an aot_inductor backend (Export+AOTInductor) in the benchmarking harness. Note it is not a dynamo backend. Moved files from torch/_inductor/aot_inductor_include to torch/csrc/inductor as a more standard way for exposing headers Created a caching function in benchmarks/dynamo/common.py for compiling, loading and caching the .so file, as a proxy for a pure C++ deployment, but easier for benchmarking. Differential Revision: D47452591 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105221 Approved by: https://github.com/jansel	2023-07-18 13:16:36 +00:00
Howard Huang	9165d46b89	DDP + C10D sparse all_reduce changes (#103916 ) (#104256 ) Summary: reland of https://github.com/pytorch/pytorch/pull/103916 ## Changes prototyping sparse allreduce using the sparse dispatch key. When passing in sparse tensors into `dist.allreduce()` we can execute our dispatched function. prior to this change, passing a sparse tensor into `allreduce()` will error out with `Tensor must be dense...` ## Example script ```python # python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 this_script.py import torch import torch.distributed as dist def main(): dist.init_process_group(backend="nccl") rank = dist.get_rank() a = torch.tensor([[0, 2.], [3, 0]]).to(rank) a = a.to_sparse() print(f"rank {rank} - a: {a}") dist.all_reduce(a) if __name__ == "__main__": main() ``` output: ``` rank 1 - a: tensor(indices=tensor([[0, 1], [1, 0]]), values=tensor([2., 3.]), device='cuda:1', size=(2, 2), nnz=2, layout=torch.sparse_coo) allreduce_sparse_cuda_ tensor.is_sparse() = 1 in ProcessGroupNCCL::allreduceSparse rank 0 - a: tensor(indices=tensor([[0, 1], [1, 0]]), values=tensor([2., 3.]), device='cuda:0', size=(2, 2), nnz=2, layout=torch.sparse_coo) allreduce_sparse_cuda_ tensor.is_sparse() = 1 in ProcessGroupNCCL::allreduceSparse ``` Test Plan: Testing commands (OSS): ``` # python pytest test/distributed/test_c10d_nccl.py -vsk test_sparse_allreduce_ops # c++ build/bin/ProcessGroupNCCLTest --gtest_filter=ProcessGroupNCCLTest.testSparseAllreduce ``` Testing commands (internal, ondemand GPU): ddp tests: ``` buck build mode/opt -c hpc_comms.use_ncclexp=default //caffe2/test/distributed:c10d --show-full-output # Get the .par file from the previous command and use it below TORCH_SHOW_CPP_STACKTRACE=1 /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbcode/c8344b52091f4f7f/caffe2/test/distributed/__c10d__/c10d.par -r test_ddp_set_sparse_metadata ``` c10d tests: ``` # build tests and run with log output (python) buck build mode/opt -c hpc_comms.use_ncclexp=default //caffe2/test/distributed:c10d --show-full-output NCCL_DEBUG=WARN /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbcode/c8344b52091f4f7f/caffe2/test/distributed/__c10d__/c10d.par -r test_sparse_allreduce_ops # python NCCL_DEBUG=WARN buck test mode/opt -c hpc_comms.use_ncclexp=default //caffe2/test/distributed:c10d -- --exact 'caffe2/test/distributed:c10d - test_sparse_allreduce_ops (test_c10d_nccl.ProcessGroupNCCLTest)' # c++ NCCL_DEBUG=WARN buck run mode/opt -c hpc_comms.use_ncclexp=default //caffe2/test/cpp/c10d:ProcessGroupNCCLTest -- --gtest_filter=ProcessGroupNCCLTest.testSparseAllreduce ``` Differential Revision: D47056695 Pulled By: H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/104256 Approved by: https://github.com/rohan-varma	2023-06-28 00:37:52 +00:00
Yang Chen	d2281e38ae	Adds the initial support for AOTInductor model and interface (#104202 ) This PR combines the C++ code for the AOTInductor's model and interface with Bin Bao's changes to AOTInductor codegen. It adds a number of AOTInductor C interfaces that can be used by an inference runtime. Under the hood of the interfaces, the model code generated by the AOTInductor's codegen is wrapped into a class, AOTInductorModel, which manages tensors and run the model inference. On top of AOTInductorModel, we provide one more abstract layer, AOTInductorModelContainer, which allows the user to have multiple inference runs concurrently for the same model. This PR also adjusts the compilation options for AOT codegen, particularly some fbcode-related changes such as libs to be linked and header-file search paths. Note that this is the very first version of the AOTInductor model and interface, so many features (e.g. dynamic shape) are incomplete. We will support those missing features in in future PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104202 Approved by: https://github.com/desertfire	2023-06-27 00:37:26 +00:00
PyTorch MergeBot	436d035dc7	Revert "DDP + C10D sparse all_reduce changes (#103916 )" This reverts commit `fed5fba6e4`. Reverted https://github.com/pytorch/pytorch/pull/103916 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/103916#issuecomment-1608412325))	2023-06-26 22:37:58 +00:00
Howard Huang	fed5fba6e4	DDP + C10D sparse all_reduce changes (#103916 ) Summary: ## Changes prototyping sparse allreduce using the sparse dispatch key. When passing in sparse tensors into `dist.allreduce()` we can execute our dispatched function. prior to this change, passing a sparse tensor into `allreduce()` will error out with `Tensor must be dense...` ## Example script ```python # python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 this_script.py import torch import torch.distributed as dist def main(): dist.init_process_group(backend="nccl") rank = dist.get_rank() a = torch.tensor([[0, 2.], [3, 0]]).to(rank) a = a.to_sparse() print(f"rank {rank} - a: {a}") dist.all_reduce(a) if __name__ == "__main__": main() ``` output: ``` rank 1 - a: tensor(indices=tensor([[0, 1], [1, 0]]), values=tensor([2., 3.]), device='cuda:1', size=(2, 2), nnz=2, layout=torch.sparse_coo) allreduce_sparse_cuda_ tensor.is_sparse() = 1 in ProcessGroupNCCL::allreduceSparse rank 0 - a: tensor(indices=tensor([[0, 1], [1, 0]]), values=tensor([2., 3.]), device='cuda:0', size=(2, 2), nnz=2, layout=torch.sparse_coo) allreduce_sparse_cuda_ tensor.is_sparse() = 1 in ProcessGroupNCCL::allreduceSparse ``` Test Plan: Testing commands (OSS): ``` # python pytest test/distributed/test_c10d_nccl.py -vsk test_sparse_allreduce_ops # c++ build/bin/ProcessGroupNCCLTest --gtest_filter=ProcessGroupNCCLTest.testSparseAllreduce ``` Testing commands (internal, ondemand GPU): ddp tests: ``` buck build mode/opt -c hpc_comms.use_nccl=exp //caffe2/test/distributed:c10d --show-full-output # Get the .par file from the previous command and use it below TORCH_SHOW_CPP_STACKTRACE=1 /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbcode/c8344b52091f4f7f/caffe2/test/distributed/__c10d__/c10d.par -r test_ddp_set_sparse_metadata ``` c10d tests: ``` # build tests and run with log output (python) buck build mode/opt -c hpc_comms.use_nccl=exp //caffe2/test/distributed:c10d --show-full-output NCCL_DEBUG=WARN /data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbcode/c8344b52091f4f7f/caffe2/test/distributed/__c10d__/c10d.par -r test_sparse_allreduce_ops # python NCCL_DEBUG=WARN buck test mode/opt -c hpc_comms.use_nccl=exp //caffe2/test/distributed:c10d -- --exact 'caffe2/test/distributed:c10d - test_sparse_allreduce_ops (test_c10d_nccl.ProcessGroupNCCLTest)' # c++ NCCL_DEBUG=WARN buck run mode/opt -c hpc_comms.use_nccl=exp //caffe2/test/cpp/c10d:ProcessGroupNCCLTest -- --gtest_filter=ProcessGroupNCCLTest.testSparseAllreduce ``` Differential Revision: D46724856 Pulled By: H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/103916 Approved by: https://github.com/rohan-varma	2023-06-26 20:42:17 +00:00
cyy	483f748dd5	[BE] Enforce missing `override` keyword (#104032 ) This PR enables `-Winconsistent-missing-destructor-override` and `-Winconsistent-missing-override` and fixes violations. <!-- copilot:summary --> ### <samp>🤖 Generated by Copilot at 47e904e</samp> This pull request updates the code of various classes and operators in the `caffe2` and `aten` subdirectories to use the `override` specifier instead of the `virtual` keyword for destructors and other virtual functions that override a base class function. This improves the code readability, quality, and consistency with C++ best practices. It also modifies the `./CMakeLists.txt` file to enable warnings for these specifiers, but disable errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104032 Approved by: https://github.com/malfet	2023-06-24 02:34:24 +00:00
Edward Z. Yang	bc6ec97e02	Switch dynamic_shapes to True by default (#103597 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103597 Approved by: https://github.com/voznesenskym	2023-06-15 15:16:20 +00:00
PaDarochek	b00d388ada	Update test_misc.cpp (#97768 ) Potential null dereference after dynamic cast was found during static analysis. Description: Dereference of `ctx` is performed in `TORCH_CHECK` on line 1176, while `ctx` pointer may equal `nullptr`. Previous `TORCH_CHECK` on line 1175 checks the value of `ctx_ptr` pointer that may be of type that cannot be casted to `TestContext`. In such case, `dynamic_cast` returns `nullptr` despite `ctx_ptr` is not equal to `nullptr`. Fix:* - Check `ctx` instead of `ctx_ptr` for equality to zero. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97768 Approved by: https://github.com/kit1980	2023-06-13 16:14:11 +00:00
Daniil Kutz	e6fc7d814d	Segmentation fault in flatbuffers when parsing malformed modules (#95221 ) Fixes #95061, #95062 Add Flatbuffer verification before parsing to avoid crashing on malformed modules. Flatbuffers doesn't perform boundary checks at runtime for the sake of performance, so when parsing untrusted modules it is highly recommended to verify overall buffer integrity. This bug can be triggered both by C++ (`torch::jit::load`, `torch::jitload_jit_module_from_file`) and Python API (`torch.jit.load`, `torch.jit.jit_module_from_flatbuffer`). Crash files to reproduce: [crash-1feb368861083e3d242e5c3fcb1090869f4819c4.txt](https://github.com/pytorch/pytorch/files/10795267/crash-1feb368861083e3d242e5c3fcb1090869f4819c4.txt) [crash-7e8ffd314223be96b43ca246d3d3481702869455.txt](https://github.com/pytorch/pytorch/files/10795268/crash-7e8ffd314223be96b43ca246d3d3481702869455.txt) [crash-ad4d7c6183af8f34fe1cb5c8133315c6389c409f.txt](https://github.com/pytorch/pytorch/files/10795279/crash-ad4d7c6183af8f34fe1cb5c8133315c6389c409f.txt) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95221 Approved by: https://github.com/qihqi, https://github.com/davidberard98	2023-05-24 21:16:19 +00:00
Michael Voznesensky	39f52c0218	Switch AOT Inductor test to export, add dynamic, fix invocation bug (#101585 ) Fixes Pull Request resolved: https://github.com/pytorch/pytorch/pull/101585 Approved by: https://github.com/ngimel, https://github.com/desertfire	2023-05-17 05:52:08 +00:00
Howard Huang	a206e8b027	[small BE] update NcclTest dim size (#101127 ) Previously input dimensions are fixed to 3x3, this is a small change to make that configurable. Will be used in future additions to nccl tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/101127 Approved by: https://github.com/rohan-varma	2023-05-15 23:05:10 +00:00
mantaionut	bfb2888b51	Re enable AutogradNotImplementedFallback on Windows (#101062 ) Fixes #48763 Due to #48763 AutogradNotImplementedFallback were disabled. I re-enabled them and the CI is successfully. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101062 Approved by: https://github.com/zou3519	2023-05-15 13:41:06 +00:00
Rohan Varma	6d6abba0d8	[IValue] Better handle sparseTensors in extractStorages (#100783 ) Sparse tensors don't seem to be handled when we have tensors instead of pyobjects. Differential Revision: [D45632427](https://our.internmc.facebook.com/intern/diff/D45632427/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100783 Approved by: https://github.com/H-Huang	2023-05-11 23:44:51 +00:00
PyTorch MergeBot	9ff547a57f	Revert "Fix ordered dict loading with LibTorch (#100743 )" This reverts commit `d371a890a2`. Reverted https://github.com/pytorch/pytorch/pull/100743 on behalf of https://github.com/jeanschmidt due to New test introduced SerializationTest.SaveStateDict is adding regressions ([comment](https://github.com/pytorch/pytorch/pull/100743#issuecomment-1542400538))	2023-05-10 15:29:14 +00:00
Daniel Falbel	d371a890a2	Fix ordered dict loading with LibTorch (#100743 ) Fixes #100741 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100743 Approved by: https://github.com/Skylion007	2023-05-09 13:52:45 +00:00
kwanghoon-meta	3fb0bf4d96	Automatic pulling ExtraFileMaps without explicit mapping. Differential Revision: D45170126nnPull Request resolved: https://github.com/pytorch/pytorch/pull/99747	2023-05-01 16:27:56 -07:00
PyTorch MergeBot	380ccfd442	Revert "Added round_with_scale_factor arg to ATen (#97868 )" This reverts commit `aa99c5b4ed`. Reverted https://github.com/pytorch/pytorch/pull/97868 on behalf of https://github.com/osalpekar due to Caused breakages in the glow compiler - see [D45374622](https://www.internalfb.com/diff/D45374622) for more details	2023-04-28 20:47:00 +00:00
vfdev-5	aa99c5b4ed	Added round_with_scale_factor arg to ATen (#97868 ) Addresses #62396 following the strategy described in https://github.com/pytorch/pytorch/pull/64983#issuecomment-1026177629. Fixing output size to match opencv, scikit-image, scipy if scale factor is specified on ATen side only due to JIT FC. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97868 Approved by: https://github.com/lezcano, https://github.com/mikaylagawarecki	2023-04-26 18:48:37 +00:00
Bin Bao	e43918b93a	[inductor] Fix AOTInductor (#99203 ) Summary: Fix the broken AOTInductor flow and add a smoketest on CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99203 Approved by: https://github.com/jansel	2023-04-25 14:42:12 +00:00
soulitzer	abe96654de	[reland][BE][autograd Function] Raise an error if input is returned a… (#98051 ) …s-is and saved for forward or backward in setup_context Fixes #ISSUE_NUMBER Relanding this in a new non-ghstack PR so I can import this to do co-dev Pull Request resolved: https://github.com/pytorch/pytorch/pull/98051 Approved by: https://github.com/zou3519	2023-04-11 15:42:54 +00:00
mikey dagitses	9d36361601	make TensorImpl::data_ptr_impl() non-const and have mutable in the name (#97744 ) See D44409928. Differential Revision: [D44450468](https://our.internmc.facebook.com/intern/diff/D44450468/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97744 Approved by: https://github.com/ezyang	2023-04-09 11:08:41 +00:00
mikey dagitses	c68a94c5ea	distinguish mutability of untyped Storage::data (#97690 ) See D44409928. Differential Revision: [D44429769](https://our.internmc.facebook.com/intern/diff/D44429769/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97690 Approved by: https://github.com/ezyang	2023-04-08 02:02:28 +00:00
Paweł Piskorski	2d9b2bcfba	Extend TensorImpl with BackendMeta (#97429 ) BackendMeta offers a binary interface for the backend to attach arbitrary data to TensorImpl. TensorImpl has exactly one "slot" for backend metadata, however backend is free to compose any structure that is opaque to the framework beyond iheriting standard BackendMeta base. Change-Id: I670fcdd16dd1c2b00f7eaa1cbc5b5dfea59a6221 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/97429 Approved by: https://github.com/ezyang	2023-04-04 23:47:03 +00:00
PyTorch MergeBot	7eaaefafb3	Revert "Extend TensorImpl with BackendMeta (#97429 )" This reverts commit `bc38b278bf`. Reverted https://github.com/pytorch/pytorch/pull/97429 on behalf of https://github.com/huydhn due to Sorry for reverting your PR as I am trying to root cause a libtorch build failure on Windows starting from your change `bc38b278bf`. AFAICT, there is no other change from the log. I will reland this if the failure is unrelated	2023-04-04 05:13:18 +00:00
ppiskorski	bc38b278bf	Extend TensorImpl with BackendMeta (#97429 ) BackendMeta offers a binary interface for the backend to attach arbitrary data to TensorImpl. TensorImpl has exactly one "slot" for backend metadata, however backend is free to compose any structure that is opaque to the framework beyond iheriting standard BackendMeta base. Change-Id: I670fcdd16dd1c2b00f7eaa1cbc5b5dfea59a6221 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/97429 Approved by: https://github.com/ezyang	2023-04-04 03:01:14 +00:00
PyTorch MergeBot	45acfc8574	Revert "[BE][autograd Function] Raise an error if input is returned as-is and saved for forward or backward in setup_context (#97212 )" This reverts commit `313db584f3`. Reverted https://github.com/pytorch/pytorch/pull/97212 on behalf of https://github.com/soulitzer due to Internally someone is rely on _wrap_outputs and we updated its signature	2023-03-30 22:03:07 +00:00
soulitzer	313db584f3	[BE][autograd Function] Raise an error if input is returned as-is and saved for forward or backward in setup_context (#97212 ) Fixes https://github.com/pytorch/pytorch/issues/96887 We error out in BOTH the case when graph is created and when it is not created. Still bc-breaking, but not as severe because we are limiting to the case where someone uses setup_context. This makes setup_context and non-setup_context versions diverge in their behavior - With the non-setup_context version, saved variables are assumed to have the grad_fn of the inputs. - But now with the setup_context version, we produce an error for this case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97212 Approved by: https://github.com/zou3519	2023-03-29 17:54:00 +00:00
PyTorch MergeBot	2ef6ffdfa1	Revert "[BE][autograd Function] Raise an error if input is returned as-is and saved for forward or backward in setup_context (#97212 )" This reverts commit `f3aca45a16`. Reverted https://github.com/pytorch/pytorch/pull/97212 on behalf of https://github.com/soulitzer due to TestAutogradFunctionCUDA.test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_output_mark_dirty_True_cuda leaks	2023-03-28 18:30:51 +00:00
soulitzer	f3aca45a16	[BE][autograd Function] Raise an error if input is returned as-is and saved for forward or backward in setup_context (#97212 ) Fixes https://github.com/pytorch/pytorch/issues/96887 We error out in BOTH the case when graph is created and when it is not created. Still bc-breaking, but not as severe because we are limiting to the case where someone uses setup_context. This makes setup_context and non-setup_context versions diverge in their behavior - With the non-setup_context version, saved variables are assumed to have the grad_fn of the inputs. - But now with the setup_context version, we produce an error for this case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97212 Approved by: https://github.com/zou3519	2023-03-28 03:14:32 +00:00
Nikita Shulga	96e3b3ac72	[BE] Cleanup CMake flag suppressions (#97584 ) Use `append_cxx_flag_if_supported` to determine whether or not `-Werror` is supported Do not suppress deprecation warnings if glog is not used/installed, as the way check is written right now, it will suppress deprecations even if `glog` is not installed. Similarly, do not suppress deprecations on MacOS simply because we are compiling with protobuf. Fix deprecation warnings in: - MPS by replacing `MTLResourceOptionCPUCacheModeDefault`->`MTLResourceCPUCacheModeDefaultCache` - In GTests by replacing `TYPED_TEST_CASE`->`TYPED_TEST_SUITE` - In `codegen/onednn/interface.cpp`, by using passing `Stack` by reference rathern than pointer. Do not guard calls to `append_cxx_flag_if_supported` with `if(CLANG)` or `if(GCC)`. Fix some deprecated calls in `Metal` hide more complex exception under `C10_CLANG_DIAGNOSTIC_IGNORE` Pull Request resolved: https://github.com/pytorch/pytorch/pull/97584 Approved by: https://github.com/kit1980	2023-03-27 18:46:09 +00:00
Ke Wen	f89af60183	Rewrite NCCL watchdog to more reliably throw timeout (#97066 ) Fixes #97191 This PR aims to propagate collective exceptions (async error or timeout) up to the program, so as to avoid silent stuck job. ### Previous output in #97191 ``` Rank 0 is the problematic rank Rank 4 completed Rank 5 completed Rank 3 completed Rank 6 completed Rank 2 completed Rank 7 completed Rank 1 completed [E ProcessGroupNCCL.cpp:464] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, Timeout(ms)=10000) ran for 10917 milliseconds before timing out. Rank 0 completed [E ProcessGroupNCCL.cpp:478] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. [E ProcessGroupNCCL.cpp:483] To avoid data inconsistency, we are taking the entire process down. ``` Although it says that it is taking the process down, it sometimes fails to do so. ### New output after this PR: ``` ... [E ProcessGroupNCCL.cpp:459] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, Timeout(ms)=10000) ran for 10599 milliseconds before timing out. [E ProcessGroupNCCL.cpp:473] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. [E ProcessGroupNCCL.cpp:479] To avoid data inconsistency, we are taking the entire process down. [E ProcessGroupNCCL.cpp:818] [Rank 0] NCCL watchdog thread terminated with exception: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, Timeout(ms)=10000) ran for 10599 milliseconds before timing out. ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 194470) of binary: /data/home/kw2501/repos/pytorch-dev-env/bin/python Traceback (most recent call last): File "/pytorch-dev-env/bin/torchrun", line 33, in <module> sys.exit(load_entry_point('torch', 'console_scripts', 'torchrun')()) File "/pytorch-dev/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper return f(args, *kwargs) File "/pytorch-dev/torch/distributed/run.py", line 794, in main run(args) File "/pytorch-dev/torch/distributed/run.py", line 785, in run elastic_launch( File "/pytorch-dev/torch/distributed/launcher/api.py", line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/pytorch-dev/torch/distributed/launcher/api.py", line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ hang.py FAILED ------------------------------------------------------------ Failures: <NO_OTHER_FAILURES> ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-03-20_22:00:42 host : node0 rank : 0 (local_rank: 0) exitcode : -6 (pid: 194470) error_file: <N/A> traceback : Signal 6 (SIGABRT) received by PID 194470 ============================================================ ``` The log suggests that TorchX monitor is triggered, and job is torn down. ### Major changes in this PR: 1. Merge ncclWatchDog thread and workCleanupLoop thread into one so that the watch action and the throw action are streamlined. Previously, ncclWatchDog is responsible for watching comm error and timeout, and workCleanupLoop is responsible for watching Work item error and throwing exception. This two-thread design is not streamlined, raising the chance of missing the throw. Also, it is duplicated to watch at multiple level. 2. Rethrow exception at watchdog thread. 3. Clean up a bunch of duplicated functions, e.g. `checkAndThrowException` and `handleNcclException`. 4. Turn on ASYNC_ERROR_HANDLING by default Pull Request resolved: https://github.com/pytorch/pytorch/pull/97066 Approved by: https://github.com/rohan-varma	2023-03-25 04:30:20 +00:00
Pruthvi Madugundu	baf71a8aad	[ROCm] Update clock intrinsic handling for AMD gfx11 family (#97005 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97005 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet	2023-03-24 18:29:49 +00:00
PyTorch MergeBot	f25cdf8aeb	Revert "Rewrite NCCL watchdog to more reliably throw timeout (#97066 )" This reverts commit `95e8d0c39e`. Reverted https://github.com/pytorch/pytorch/pull/97066 on behalf of https://github.com/clee2000 due to sorry but I think this broke periodic mutigpu tests `416bac5b81` https://github.com/pytorch/pytorch/actions/runs/4505085943/jobs/7930826040	2023-03-24 06:27:00 +00:00
Ke Wen	95e8d0c39e	Rewrite NCCL watchdog to more reliably throw timeout (#97066 ) Fixes #97191 This PR aims to propagate collective exceptions (async error or timeout) up to the program, so as to avoid silent stuck job. ### Previous output in #97191 ``` Rank 0 is the problematic rank Rank 4 completed Rank 5 completed Rank 3 completed Rank 6 completed Rank 2 completed Rank 7 completed Rank 1 completed [E ProcessGroupNCCL.cpp:464] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, Timeout(ms)=10000) ran for 10917 milliseconds before timing out. Rank 0 completed [E ProcessGroupNCCL.cpp:478] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. [E ProcessGroupNCCL.cpp:483] To avoid data inconsistency, we are taking the entire process down. ``` Although it says that it is taking the process down, it sometimes fails to do so. ### New output after this PR: ``` ... [E ProcessGroupNCCL.cpp:459] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, Timeout(ms)=10000) ran for 10599 milliseconds before timing out. [E ProcessGroupNCCL.cpp:473] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. [E ProcessGroupNCCL.cpp:479] To avoid data inconsistency, we are taking the entire process down. [E ProcessGroupNCCL.cpp:818] [Rank 0] NCCL watchdog thread terminated with exception: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, Timeout(ms)=10000) ran for 10599 milliseconds before timing out. ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 194470) of binary: /data/home/kw2501/repos/pytorch-dev-env/bin/python Traceback (most recent call last): File "/pytorch-dev-env/bin/torchrun", line 33, in <module> sys.exit(load_entry_point('torch', 'console_scripts', 'torchrun')()) File "/pytorch-dev/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper return f(args, *kwargs) File "/pytorch-dev/torch/distributed/run.py", line 794, in main run(args) File "/pytorch-dev/torch/distributed/run.py", line 785, in run elastic_launch( File "/pytorch-dev/torch/distributed/launcher/api.py", line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/pytorch-dev/torch/distributed/launcher/api.py", line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ hang.py FAILED ------------------------------------------------------------ Failures: <NO_OTHER_FAILURES> ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-03-20_22:00:42 host : node0 rank : 0 (local_rank: 0) exitcode : -6 (pid: 194470) error_file: <N/A> traceback : Signal 6 (SIGABRT) received by PID 194470 ============================================================ ``` The log suggests that TorchX monitor is triggered, and job is torn down. ### Major changes in this PR: 1. Merge ncclWatchDog thread and workCleanupLoop thread into one so that the watch action and the throw action are streamlined. Previously, ncclWatchDog is responsible for watching comm error and timeout, and workCleanupLoop is responsible for watching Work item error and throwing exception. This two-thread design is not streamlined, raising the chance of missing the throw. Also, it is duplicated to watch at multiple level. 2. Rethrow exception at watchdog thread. 3. Clean up a bunch of duplicated functions, e.g. `checkAndThrowException` and `handleNcclException`. 4. Turn on ASYNC_ERROR_HANDLING by default Pull Request resolved: https://github.com/pytorch/pytorch/pull/97066 Approved by: https://github.com/rohan-varma	2023-03-23 21:31:21 +00:00
Rishub Tamirisa	f3b8638074	Adding nn.ZeroPad1d and nn.ZeroPad3d (#96295 ) Fixes #95796 ### Implementation Adds python implementation for `nn.ZeroPad1d` and `nn.ZeroPad3d` in `torch/nn/modules/padding.py`. Adds cpp implementation for `nn::ZeroPad1d` and `nn::ZeroPad3d` in the following 3 files, refactored with templates similarly to `nn::ConstantPad`'s implementation: <br> - `torch/crsc/api/include/torch/nn/modules/padding.h` - `torch/csrc/api/include/torch/nn/options/padding.h` - `torch/csrc/api/src/nn/modules/padding.cpp` Also added relevant definitions in `torch/nn/modules/__init__.py`. ### Testing Adds the following tests: - cpp tests of similar length and structure as `ConstantPad` and the existing `ZeroPad2d` impl in `test/cpp/api/modules.cpp` - cpp API parity tests in `torch/testing/_internal/common_nn.py` - module init tests in `test/test_module_init.py` Also added relevant definitions in `test/cpp_api_parity/parity-tracker.md` Pull Request resolved: https://github.com/pytorch/pytorch/pull/96295 Approved by: https://github.com/soulitzer	2023-03-10 03:51:41 +00:00
kshitij12345	3b966a6ce3	[autograd] disable backward/grad for complex scalar output (#92753 ) Fixes https://github.com/pytorch/pytorch/issues/92750 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92753 Approved by: https://github.com/ezyang	2023-02-23 11:38:27 +00:00
Peter Bell	bc438af6fe	std/var: support floating point correction value (#94073 ) Ref https://github.com/pytorch/pytorch/issues/61492#issuecomment-1413003480 The array API specifies correction to be `Union[int, float]` while we currently only support integers. https://data-apis.org/array-api/latest/API_specification/generated/array_api.std.html As std/var is calculated currently, the final count of elements is already done in floating point so we can make the correction floating point without any loss of precision or generality. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94073 Approved by: https://github.com/ezyang	2023-02-23 05:50:45 +00:00
cyy	1ab112cfab	code is clean enough that some warnings can be enabled (#95139 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/95139 Approved by: https://github.com/Skylion007	2023-02-21 07:24:20 +00:00
Xuehai Pan	046e88a291	[BE] [3/3] Rewrite `super()` calls in test (#94592 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592 Approved by: https://github.com/ezyang, https://github.com/seemethere	2023-02-12 22:20:53 +00:00
Aaron Gokaslan	8fce9a09cd	[BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308 ) Apply parts of pyupgrade to torch (starting with the safest changes). This PR only does two things: removes the need to inherit from object and removes unused future imports. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-07 21:10:56 +00:00
cyy	bf9be50bb8	Some more fixes (#94049 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/94049 Approved by: https://github.com/Skylion007	2023-02-07 01:51:06 +00:00
Ivan Yashchuk	fba13d94a1	Remove deprecated torch.symeig (#70988 ) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`. - [x] XLA PR: https://github.com/pytorch/xla/pull/4498 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988 Approved by: https://github.com/lezcano, https://github.com/kit1980, https://github.com/malfet	2023-01-31 11:59:11 +00:00
Kshiteej K	68a98537d5	[fix] nn c++ : segfault in modulelist and moduledict (#93074 ) Fixes https://github.com/pytorch/pytorch/issues/73565 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93074 Approved by: https://github.com/albanD	2023-01-27 12:20:19 +00:00
Han Qi	1f352f7c1f	Update flatbuffer test models to match pkl models (#93022 ) Also regenerate upgrader with ``` python torchgen/operator_versions/gen_mobile_upgraders.py ``` Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/93022 Approved by: https://github.com/tugsbayasgalan	2023-01-26 21:17:57 +00:00
Jane Xu	819bd5b77a	[nn] add set_to_none flag for C++ optim endpoint (#92989 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92989 Approved by: https://github.com/ngimel, https://github.com/Skylion007	2023-01-26 04:16:52 +00:00
jjsjann123	c11b301bcd	[NVFUSER] refactor nvfuser build (#89621 ) This PR is the first step towards refactors the build for nvfuser in order to have the coegen being a standalone library. Contents inside this PR: 1. nvfuser code base has been moved to `./nvfuser`, from `./torch/csrc/jit/codegen/cuda/`, except for registration code for integration (interface.h/interface.cpp) 2. splits the build system so nvfuser is generating its own `.so` files. Currently there are: - `libnvfuser_codegen.so`, which contains the integration, codegen and runtime system of nvfuser - `nvfuser.so`, which is nvfuser's python API via pybind. Python frontend is now exposed via `nvfuser._C.XXX` instead of `torch._C._nvfuser` 3. nvfuser cpp tests is currently being compiled into `nvfuser_tests` 4. cmake is refactored so that: - nvfuser now has its own `CMakeLists.txt`, which is under `torch/csrc/jit/codegen/cuda/`. - nvfuser backend code is not compiled inside `libtorch_cuda_xxx` any more - nvfuser is added as a subdirectory under `./CMakeLists.txt` at the very end after torch is built. - since nvfuser has dependency on torch, the registration of nvfuser at runtime is done via dlopen (`at::DynamicLibrary`). This avoids circular dependency in cmake, which will be a nightmare to handle. For details, look at `torch/csrc/jit/codegen/cuda/interface.cpp::LoadingNvfuserLibrary` Future work that's scoped in following PR: - Currently since nvfuser codegen has dependency on torch, we need to refactor that out so we can move nvfuser into a submodule and not rely on dlopen to load the library. @malfet - Since we moved nvfuser into a cmake build, we effectively disabled bazel build for nvfuser. This could impact internal workload at Meta, so we need to put support back. cc'ing @vors Pull Request resolved: https://github.com/pytorch/pytorch/pull/89621 Approved by: https://github.com/davidberard98	2023-01-26 02:50:44 +00:00
Jane Xu	b90496eef5	[nn] zero_grad() set_to_none default True (#92731 ) Attempts to fix #92656 BC-breaking! This changes the default of zero_grad in optim and in nn to default set grads to None instead of zero tensors. We are changing the default because there are proven perf wins and existing code has typically not regressed due to this change. (will probably have to flesh out this note more). Pull Request resolved: https://github.com/pytorch/pytorch/pull/92731 Approved by: https://github.com/ngimel	2023-01-26 01:04:28 +00:00
PyTorch MergeBot	acdd462b1a	Revert "Remove deprecated torch.symeig (#70988 )" This reverts commit `d70ed68162`. Reverted https://github.com/pytorch/pytorch/pull/70988 on behalf of https://github.com/kit1980 due to Failing XLA tests, forward fix unsuccessful	2023-01-24 19:03:40 +00:00
Ivan Yashchuk	d70ed68162	Remove deprecated torch.symeig (#70988 ) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988 Approved by: https://github.com/lezcano, https://github.com/kit1980	2023-01-23 22:51:40 +00:00
Edward Z. Yang	5c6f5439b7	Implement SymBool (#92149 ) We have known for a while that we should in principle support SymBool as a separate concept from SymInt and SymFloat ( in particular, every distinct numeric type should get its own API). However, recent work with unbacked SymInts in, e.g., https://github.com/pytorch/pytorch/pull/90985 have made this a priority to implement. The essential problem is that our logic for computing the contiguity of tensors performs branches on the passed in input sizes, and this causes us to require guards when constructing tensors from unbacked SymInts. Morally, this should not be a big deal because, we only really care about the regular (non-channels-last) contiguity of the tensor, which should be guaranteed since most people aren't calling `empty_strided` on the tensor, however, because we store a bool (not a SymBool, prior to this PR it doesn't exist) on TensorImpl, we are forced to immediately compute these values, even if the value ends up not being used at all. In particular, even when a user allocates a contiguous tensor, we still must compute channels-last contiguity (as some contiguous tensors are also channels-last contiguous, but others are not.) This PR implements SymBool, and makes TensorImpl use SymBool to store the contiguity information in ExtraMeta. There are a number of knock on effects, which I now discuss below. * I introduce a new C++ type SymBool, analogous to SymInt and SymFloat. This type supports logical and, logical or and logical negation. I support the bitwise operations on this class (but not the conventional logic operators) to make it clear that logical operations on SymBool are NOT short-circuiting. I also, for now, do NOT support implicit conversion of SymBool to bool (creating a guard in this case). This does matter too much in practice, as in this PR I did not modify the equality operations (e.g., `==` on SymInt) to return SymBool, so all preexisting implicit guards did not need to be changed. I also introduced symbolic comparison functions `sym_eq`, etc. on SymInt to make it possible to create SymBool. The current implementation of comparison functions makes it unfortunately easy to accidentally introduce guards when you do not mean to (as both `s0 == s1` and `s0.sym_eq(s1)` are valid spellings of equality operation); in the short term, I intend to prevent excess guarding in this situation by unit testing; in the long term making the equality operators return SymBool is probably the correct fix. * ~~I modify TensorImpl to store SymBool for the `is_contiguous` fields and friends on `ExtraMeta`. In practice, this essentially meant reverting most of the changes from https://github.com/pytorch/pytorch/pull/85936 . In particular, the fields on ExtraMeta are no longer strongly typed; at the time I was particularly concerned about the giant lambda I was using as the setter getting a desynchronized argument order, but now that I have individual setters for each field the only "big list" of boolean arguments is in the constructor of ExtraMeta, which seems like an acceptable risk. The semantics of TensorImpl are now that we guard only when you actually attempt to access the contiguity of the tensor via, e.g., `is_contiguous`. By in large, the contiguity calculation in the implementations now needs to be duplicated (as the boolean version can short circuit, but the SymBool version cannot); you should carefully review the duplicate new implementations. I typically use the `identity` template to disambiguate which version of the function I need, and rely on overloading to allow for implementation sharing. The changes to the `compute_` functions are particularly interesting; for most of the functions, I preserved their original non-symbolic implementation, and then introduce a new symbolic implementation that is branch-less (making use of our new SymBool operations). However, `compute_non_overlapping_and_dense` is special, see next bullet.~~ This appears to cause performance problems, so I am leaving this to an update PR. * (Update: the Python side pieces for this are still in this PR, but they are not wired up until later PRs.) While the contiguity calculations are relatively easy to write in a branch-free way, `compute_non_overlapping_and_dense` is not: it involves a sort on the strides. While in principle we can still make it go through by using a data oblivious sorting network, this seems like too much complication for a field that is likely never used (because typically, it will be obvious that a tensor is non overlapping and dense, because the tensor is contiguous.) So we take a different approach: instead of trying to trace through the logic computation of non-overlapping and dense, we instead introduce a new opaque operator IsNonOverlappingAndDenseIndicator which represents all of the compute that would have been done here. This function returns an integer 0 if `is_non_overlapping_and_dense` would have returned `False`, and an integer 1 otherwise, for technical reasons (Sympy does not easily allow defining custom functions that return booleans). The function itself only knows how to evaluate itself if all of its arguments are integers; otherwise it is left unevaluated. This means we can always guard on it (as `size_hint` will always be able to evaluate through it), but otherwise its insides are left a black box. We typically do NOT expect this custom function to show up in actual boolean expressions, because we will typically shortcut it due to the tensor being contiguous. It's possible we should apply this treatment to all of the other `compute_` operations, more investigation necessary. As a technical note, because this operator takes a pair of a list of SymInts, we need to support converting `ArrayRef<SymNode>` to Python, and I also unpack the pair of lists into a single list because I don't know if Sympy operations can actually validly take lists of Sympy expressions as inputs. See for example `_make_node_sizes_strides` * On the Python side, we also introduce a SymBool class, and update SymNode to track bool as a valid pytype. There is some subtlety here: bool is a subclass of int, so one has to be careful about `isinstance` checks (in fact, in most cases I replaced `isinstance(x, int)` with `type(x) is int` for expressly this reason.) Additionally, unlike, C++, I do NOT define bitwise inverse on SymBool, because it does not do the correct thing when run on booleans, e.g., `~True` is `-2`. (For that matter, they don't do the right thing in C++ either, but at least in principle the compiler can warn you about it with `-Wbool-operation`, and so the rule is simple in C++; only use logical operations if the types are statically known to be SymBool). Alas, logical negation is not overrideable, so we have to introduce `sym_not` which must be used in place of `not` whenever a SymBool can turn up. To avoid confusion with `__not__` which may imply that `operators.__not__` might be acceptable to use (it isn't), our magic method is called `__sym_not__`. The other bitwise operators `&` and `\|` do the right thing with booleans and are acceptable to use. * There is some annoyance working with booleans in Sympy. Unlike int and float, booleans live in their own algebra and they support less operations than regular numbers. In particular, `sympy.expand` does not work on them. To get around this, I introduce `safe_expand` which only calls expand on operations which are known to be expandable. TODO: this PR appears to greatly regress performance of symbolic reasoning. In particular, `python test/functorch/test_aotdispatch.py -k max_pool2d` performs really poorly with these changes. Need to investigate. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92149 Approved by: https://github.com/albanD, https://github.com/Skylion007	2023-01-21 02:21:56 +00:00
Aaron Enye Shi	2edf589e66	[Profiler] Fix SOFT_ASSERT test to not raise on debug builds (#91464 ) Summary: There was a patch to not raise SOFT_ASSERT in debug builds. Update this test to match it. Test Plan: This test passes after this patch. Differential Revision: D42270123 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/91464 Approved by: https://github.com/robieta	2022-12-30 05:31:03 +00:00
Han Qi	b8ba4802fe	Add an option to skip loading of debug traces (#91430 ) Summary: Debug traces consumes lots of memory especially for small models. Test Plan: Unit test Reviewers: Subscribers: Tasks: Tags: Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/91430 Approved by: https://github.com/davidberard98	2022-12-29 22:53:17 +00:00
lezcano	41a0318f2d	Remove overload at::frobenius_norm(const Tensor&) (#81762 ) This function is an auxiliary function for `torch.norm`. This particular overload was not even used or tested. I hope it's not used internally either. If it is, we can simply drop this PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/81762 Approved by: https://github.com/ngimel	2022-12-28 13:12:01 +00:00
mikey dagitses	322e4b4c8a	set -Wsuggest-override for builds (#89852 ) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/89852). * __->__ #89852 * #89851 set -Wsuggest-override for builds Summary: This was flagged by a Meta internal build. Test Plan: Rely on CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89852 Approved by: https://github.com/malfet	2022-12-19 22:08:47 +00:00
Salil Desai	e2dc60c6cb	[Vulkan + Profiler] Add Timestamp Adjustment Algorithm (#90672 ) @bypass-github-export-checks This change ensures that vulkan event start/end times are correctly synced with their parent CPU times. This sometimes requires increasing CPU event durations (to fully contain their child events) and delaying CPU event start times (to prevent overlaps), so this should not be used unless Vulkan events are being profiled and it is ok to use this modified timestamp/duration information instead of the the original information. Differential Revision: [D39893109](https://our.internmc.facebook.com/intern/diff/D39893109/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39893109/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/90672 Approved by: https://github.com/kimishpatel	2022-12-19 20:01:07 +00:00
Howard Huang	7a0f29b776	Allow Process Group to support multiple backends (#88330 ) (#90997 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/88330 ### Implementation Move backend-specific (NCCL, Gloo, etc) collective implementations to corresponding `Backend` class. Update ProcessGroup to support multiple backends and use dispatcher to calls backends based on tensor device type. ### Changes #### c++ changes (ProcessGroup files, `Ops.cpp`, `init.cpp`) - Update pybind definitions for new process group base class and new backend class - Update pybinded backend class with collective definitions to keep BC with Python PG instances (e.g. `dist.ProcessGroupGloo`, `dist.ProcessGroupNCCL`) which are used in tests - Switch `ProcessGroupGloo`, `ProcessGroupNCCL`, `ProcessGroupMPI`, `ProcessGroupUCC` to derive from the `Backend` class. - Update CPU/CUDA `Ops.cpp` and `OpsImpl.cpp` to perform this dispatching by querying the backend using the device type - Update internal dispatched implementation of `barrier` to use a tensor which allows operation to be dispatched. - Update `allgather` collective to use `TensorList`. For some reason it was using the default implementation of `allgather` rather than dispatching it correctly. I still don't understand why and had originally filed an issue in 85122. #### python changes (`distributed_c10d.py`, test files) - Add BackendConfig class to specify the default configurations of backends and `get_backend_config()` API - `get_backend()` deprecation warning - `init_process_group` how returns a generic `ProcessGroup` object, it contains a list of backends (the ones stated above) which it will dispatch operations to. - `new_group` updated to return the same as above - Update `test_c10d_gloo.py`, Update `DistributedDataParallelTest` to use `init_process_group`, Update `ReducerTest`, update `test_broadcast_coalesced_gloo` to move from PG instance and gloo options - Update `test_c10d_nccl.py`, Update `DistributedDataParallelTest` to use `init_process_group` - Specific tests updated: `test_Backend_enum_class` ### Changes missing - lazy initialization of backends - support parsing of BackendConfig ### open questions - Pure Python PG extensions (https://github.com/pytorch/pytorch/pull/66338) # Example This is a basic script (using 2 backends within a process group) ```python # python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 basic_scenario.py import torch.distributed as dist import torch import os if __name__ == "__main__": rank = os.environ.get("RANK") # initialize with both gloo and nccl dist.init_process_group() # with gloo dist.all_reduce(torch.tensor([1.0])) print(f"Rank {rank} finished") # with nccl dist.all_reduce(torch.tensor([1.0], device=f"cuda:{rank}")) ``` Test Plan: Imported from OSS Differential Revision: D42069829 Pulled By: H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/90997 Approved by: https://github.com/awgu, https://github.com/fduwjj	2022-12-16 23:15:00 +00:00
Han Qi (qihqi)	25eb7c3ae3	Clean up dependancy for flatbuffer_loader (#86041 ) Test Plan: waitforsandcastle Differential Revision: D38445936 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86041 Approved by: https://github.com/cccclai	2022-12-08 03:48:04 +00:00
Richard Barnes	a580a63448	[codemod][llvm15] LLVM-15 fixes for caffe2/test/cpp/jit/test_module_api.cpp (#89938 ) Summary: This fixes issues which block `caffe2/test/cpp/jit/test_module_api.cpp` from compiling with LLVM-15. Test Plan: Sandcastle Reviewed By: meyering Differential Revision: D41603454 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89938 Approved by: https://github.com/soumith	2022-12-04 12:50:14 +00:00
Richard Barnes	cc01614186	[codemod][llvm15] LLVM-15 fixes for caffe2/test/cpp/jit/test_graph_executor.cpp (#89936 ) Summary: This fixes issues which block `caffe2/test/cpp/jit/test_graph_executor.cpp` from compiling with LLVM-15. Test Plan: Sandcastle Reviewed By: meyering Differential Revision: D41603459 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89936 Approved by: https://github.com/soumith	2022-12-01 03:30:31 +00:00
Everton Constantino	a00efe55c3	Fix CheckOutputStreamSetting on JitLoggingTest as it failed if logging wasn't enabled. (#82722 ) `JIT_LOG` checks if logging was enabled for that particular file and when it isn't it doesn't output anything. Since the test checks for the size of `test_stream` it fails. I believe forcing the file to have logging enabled to see if the stream is being correctly set during test makes no sense so this patches just forcibly outputs and checks if it worked. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82722 Approved by: https://github.com/davidberard98	2022-11-23 22:46:29 +00:00
mikey dagitses	f057a45faf	reland "support running test_mobile_profiler with buck1/buck2 and OSS (#89001 )" (#89091 ) We modify this to no longer use std::experimental::filesystem::path and use our own custom type instead. This reverts commit `c53a5ac6cc`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89091 Approved by: https://github.com/r-barnes, https://github.com/malfet	2022-11-17 21:04:23 +00:00
Kazuaki Ishizaki	088f2fa567	Fix typos in messages under test (#89121 ) This PR fixes typos of messages in `.cpp` and `.py` files under test directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89121 Approved by: https://github.com/mruberry, https://github.com/kit1980	2022-11-17 01:55:03 +00:00
Everton Constantino	dd6beca854	Changing the use from ASSERT_EQ to ASSERT_FLOAT_EQ on nn_utils test. (#83693 ) Changing the use from ASSERT_EQ to ASSERT_FLOAT_EQ on nn_utils.cpp:ClipGradNorm as this is the proper way to compare equality between floating point values. This avoids `test_api` ClipGradNorm failing for WoA. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83693 Approved by: https://github.com/ngimel, https://github.com/kit1980	2022-11-15 04:10:52 +00:00

1 2 3 4 5 ...

2184 Commits