Summary:
When converting a contiguous CuPy ndarray to a Tensor via `__cuda_array_interface__`, an error occurs due to incorrect handling of default strides. This PR fixes the problem and makes `torch.tensor(cupy_ndarray)` work for contiguous inputs.
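A minimal usage sketch of the fixed path, assuming CuPy and a CUDA device are available (the array values here are purely illustrative):
```
import cupy
import torch

# Contiguous CuPy array exposing __cuda_array_interface__ (strides may be None).
a = cupy.arange(6, dtype=cupy.float32).reshape(2, 3)

# After this fix, the default strides are handled correctly and the data is
# copied into a new CUDA tensor.
t = torch.tensor(a)
print(t.shape, t.device)
```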
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24947
Differential Revision: D18838986
Pulled By: ezyang
fbshipit-source-id: 2d827578f54ea22836037fe9ea8735b99f2efb42
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30551
To enable quantizing with shared types, we need to insert GetAttr nodes for
quantization parameters, since the code might be shared by multiple module instances
and we'd like the quantized module instances to share the same code as well, just
with different attribute values.
Test Plan:
test_jit.py, test_quantization.py
Imported from OSS
Differential Revision: D18818652
fbshipit-source-id: fc95623cac59dcedd9e3f95397524eae515e7a11
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30474
There are some common parts in `isBiasOfConvOrLinear` and `isWeightOfConvOrLinear` that we can factor
out. The refactor will make it easier to extend to new patterns.
Test Plan:
python test/test_jit.py
python test/test_quantization.py
Imported from OSS
Differential Revision: D18795725
fbshipit-source-id: 446463da5e3fa8464db441ed0d9651930487b3b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30679
Caffe2 expects quantized ops to be in NHWC format, while PyTorch inputs are in NCHW.
Add a JIT pass that inserts an nchw2nhwc permute before each conv op and an nhwc2nchw permute after it.
A graph rewriter then finds consecutive redundant permutes and removes them from the graph.
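A small sketch (illustrative shapes, not the actual pass) of the layout conversion being inserted and why back-to-back permutes are redundant:
```
import torch

x_nchw = torch.randn(1, 8, 16, 16)

# nchw2nhwc permute inserted before the quantized conv op.
x_nhwc = x_nchw.permute(0, 2, 3, 1)

# nhwc2nchw permute inserted after the conv op.
back = x_nhwc.permute(0, 3, 1, 2)

# Two consecutive inverse permutes cancel out, which is what the graph
# rewriter exploits to drop redundant pairs between adjacent conv ops.
assert torch.equal(x_nchw, back)
```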
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps
Imported from OSS
Differential Revision: D18790518
fbshipit-source-id: 4dd39cf0b31b21f5586c0edfdce2260d4e245112
Summary:
This PR adds docs for how we expose declarations in `at::` to `torch::`, to make the semantics more clear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30760
Differential Revision: D18833081
Pulled By: yf225
fbshipit-source-id: eff4d8815c67f681ce3a930ce99771cf2e55dbd9
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29161.
I looked a bit at the code changes related to this and think I have all of the use cases of `DeprecatedTypeProperties` covered in the message, but suggestions from someone with more context on this would be very much appreciated :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30281
Differential Revision: D18830818
Pulled By: ezyang
fbshipit-source-id: 1a7fcee15354ae09e6644577e7fa33bd26acfe20
Summary:
Support for variadic inputs to `checkpoint_sequential` was deprecated in https://github.com/pytorch/pytorch/issues/21006. This case was supposed to warn with a `DeprecationWarning` for PyTorch 1.2, but to simply fail with a `TypeError` since PyTorch 1.3. This patch removes the PyTorch 1.2 `DeprecationWarning`.
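A sketch of the supported calling convention after this change (module shapes are illustrative):
```
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 10))
x = torch.randn(4, 10, requires_grad=True)

# Pass a single input; the old variadic form now fails with TypeError.
out = checkpoint_sequential(model, 2, x)
out.sum().backward()
```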
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25985
Differential Revision: D18809875
Pulled By: albanD
fbshipit-source-id: e84dd8629c04979c4b2dc63e8ada94292e8cedd0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30550
Right now we have an `InsertQuantDeQuantHelper` per module, but we need
it to be global because we need to know which graphs have been quantized before,
and based on this information we can decide how to handle each module instance.
Test Plan:
test_jit.py, test_quantization.py
Imported from OSS
Differential Revision: D18818651
fbshipit-source-id: bfcaf37094ce20a257171a0c99b05b9348ebc13d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30037
Support quantization for modules with reused submodules, e.g. relu (automatically made unique).
We first do a pass over the graph to find all duplicate uses of the same module and record the `Value`s of the
module instance; for each of these values we create a new module and change the access to that module.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D18821483
fbshipit-source-id: 1698b981e9e9f0c728d9f03fcbcfbd260151f679
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30473
Invoke the `ConstantPooling` and `FuseLinear` passes before
`insertObservers`.
`ConstantPooling` is for cleaning up the traced graph, e.g. when we
have two constant nodes with the same value, this pass merges them,
which allows us to handle fewer quantization patterns.
`FuseLinear` merges the exploded linear function back into `aten::linear` so
that we can quantize this function properly. We need to fuse it because right now
the way we recognize weight and bias is by matching the argument position in certain function
calls, e.g. the 1st argument of aten::conv2d is the weight. Therefore we have to preserve
the boundary of the linear function to recognize the weight of linear, since in the exploded
linear code the input of addmm is the transposed weight rather than the original weight of linear.
ghstack-source-id: 94887831
Test Plan:
This is needed for quantizing traced model tests to pass
Imported from OSS
Differential Revision: D18795722
fbshipit-source-id: 192d9d1e56307e2e1d90e30dce0502e31cb4f829
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29217
We want to preserve constant information in ClassType so that
users can access the constants in the module by name.
This is also used later for freezing some attributes (converting
attributes to constants).
Test Plan:
tbd
Imported from OSS
Differential Revision: D18799955
fbshipit-source-id: fbfbcd5d3f7f560368b96e2a87e270c822a3d03a
Summary:
This is a re-do of https://github.com/pytorch/pytorch/issues/27064, which was reverted (b8792c0438). This was landed at the same time as other work that added new operators to the `torch` namespace so the check for whether the `torch` namespace is exhaustively checked for overridability was triggering test failures.
I've temporarily disabled that check and added an explanatory comment that the check will be re-enabled in a future PR that will be merged during a time when the commit velocity on PyTorch is lower.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30730
Differential Revision: D18813270
Pulled By: ezyang
fbshipit-source-id: 70477c4656dca8fea6e7bc59259555041fcfbf68
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30549
Preparing for later refactoring
Test Plan:
.
Imported from OSS
Differential Revision: D18802464
fbshipit-source-id: 0b5afb143549d93eed4c429125d3d5fd253093a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30548
ClassTypes can be shared among different module instances, but previously we assumed
they would be unique. This PR enables the insert_observers pass to work with shared class types.
Test Plan:
python test/test_jit.py
python test/test_quantization.py
Imported from OSS
Differential Revision: D18802465
fbshipit-source-id: b782e71e44a043af45577ac2b5c83e695155bb8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315
The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.
Some things of note:
* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
* In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to make torch_cpu/torch_cuda caffe2_interface_library so that they get whole-archive linked into torch when you statically link. And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in Caffe2Config.cmake. I am not too sure why the old code did it that way in the first place; however, it doesn't seem to have broken anything to switch it this way.
* There are some uses of `__HIP_PLATFORM_HCC__` still in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failure when trying to run the CUDA fuser. This doesn't really matter substantively right now because we still in-place HIPify, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706
Fixes #27215 (as our libraries are smaller), and executes on part of the plan in #29235.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18790941
Pulled By: ezyang
fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30467
Introduce the function jit.export_opnames(module), which returns a list of all operator names used in the module and its submodules. One use case is mobile custom builds that link only the operators in the returned list, to reduce binary size.
Example:
import torch
m = torch.jit.load("example.pt")
print(torch.jit.export_opnames(m))
The outputs are in alphabetical order:
['aten::_convolution', 'aten::add.Tensor', 'aten::add_.Tensor', 'aten::addmm', 'aten::append.Tensor', 'aten::cat', 'aten::dropout', 'aten::embedding', 'aten::matmul', 'aten::max.dim', 'aten::mul.Tensor', 'aten::permute', 'aten::relu', 'aten::t', 'aten::tanh', 'prim::ListConstruct', 'prim::TupleConstruct', 'prim::TupleUnpack']
Test Plan: Imported from OSS
Differential Revision: D18801619
Pulled By: iseeyuan
fbshipit-source-id: f9b198d3e82b095daf704ee595d8026ad889bb13
Summary:
With the CI failure caused in 8bbafa0b32 fixed (the lambdas in the CUDA kernels had an incorrect return type).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30521
Differential Revision: D18770151
Pulled By: ailzhang
fbshipit-source-id: 02f0fe1d5718c34d24da6dbb5884ee8b247ce39a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30197
This default constructor was added because std::map's operator[]
requires a default constructor. However, instead of using operator[], we can
use emplace and remove the constructor, to ensure that the FutureInfo struct
doesn't get constructed with garbage values.
ghstack-source-id: 94802453
Test Plan: Unit tests pass.
Differential Revision: D18627675
fbshipit-source-id: c4cb000e60081478c0fd7308e17103ebbc4dc554
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30677
Currently you can only add FunctionEvents to a FunctionEventAvg. This change makes it possible to add multiple FunctionEventAvg objects together, which is useful for merging multiple profiles, such as when dealing with distributed training.
Test Plan:
added unit test
buck test //caffe2/test:autograd -- test_profiler
Reviewed By: bddppq
Differential Revision: D18785578
fbshipit-source-id: 567a441dec885db7b0bd8f6e0ac9a60b18092278
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28389
Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`.
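A sketch of the kind of `worker_init_fn` this protects, assuming a Linux host (where `os.sched_setaffinity` is available); the dataset and worker count are illustrative:
```
import os
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # Pin each worker to its own core; with the atfork handler in place,
    # Intel OpenMP's lazy affinity setup no longer clobbers this afterwards.
    os.sched_setaffinity(0, {worker_id})

dataset = TensorDataset(torch.randn(8, 3))
loader = DataLoader(dataset, num_workers=2, worker_init_fn=worker_init_fn)
for batch in loader:
    pass
```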
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29006
Differential Revision: D18782456
Pulled By: ezyang
fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3
Summary:
This fixes the second issue reported in https://github.com/pytorch/pytorch/issues/29909, namely that a loop counter is assigned the wrong values after transitioning to a bailout graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30186
Differential Revision: D18646845
Pulled By: Krovatkin
fbshipit-source-id: 1f7c601dd9f35892979385ffa132fb0886a4f203
Summary:
This PR removes `namespace F = torch::nn::functional` from `torch/nn/modules/batchnorm.h`, so that people don't have to have `torch::nn::functional` defined as `F` if they don't want to.
Fixes https://github.com/pytorch/pytorch/issues/30682.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30684
Differential Revision: D18795717
Pulled By: yf225
fbshipit-source-id: c9feffbeb632cc6b4ce3e6c22c0a78533bab69ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30659
I could only find one usage of TupleParser and it doesn't seem worth maintaining just for that one usage.
Test Plan: Imported from OSS
Differential Revision: D18795979
Pulled By: nairbv
fbshipit-source-id: 6e50d65fc8fade0944f36ab20d00f1539a3d4cb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30498
Updated Int8SliceOp to accept dim, start, and end indices, similar to PyTorch.
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_slice
Imported from OSS
Differential Revision: D18740519
fbshipit-source-id: 2313f37a4936edb150ce04911b241e591e191801
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30345
Skip ProcessGroupGlooAsyncTest if CUDA is not available; otherwise, on non-GPU Sandcastle hosts the test aborts because it fails to load the CUDA library.
ghstack-source-id: 94771241
Test Plan: test skipped on non GPU host
Differential Revision: D18665322
fbshipit-source-id: 8c7b89aeecc6ec007bee12d864a6058384254e61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30636
Currently DeQuantStub is still in the whitelist because set union has
lower precedence than set difference.
Fixes issue: https://github.com/pytorch/pytorch/issues/29646
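A small illustration of the precedence issue, using plain Python set semantics (the set contents are illustrative, not the actual whitelist):
```
a = {"Linear", "DeQuantStub"}
b = {"Conv2d"}
c = {"DeQuantStub"}

# '-' binds tighter than '|', so a | b - c == a | (b - c) and the
# "DeQuantStub" entry in a survives.
print(a | b - c)      # still contains 'DeQuantStub'
print((a | b) - c)    # no 'DeQuantStub' -- what was intended
```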
Test Plan:
verified locally that we don't attach qconfig for DeQuantStub
Imported from OSS
Differential Revision: D18775275
fbshipit-source-id: 8da07e40963555671b3d4326c9291706103f858e
Summary:
Convolution nodes are traced as aten::_convolution and are currently supported in ONNX.
Scripted convolutions use aten::conv<1,2,3>d, which is currently not supported in ONNX.
This PR adds the symbolics for aten::conv<1,2,3>d and aten::conv_transpose<1,2,3>d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30618
Reviewed By: hl475
Differential Revision: D18778145
Pulled By: houseroad
fbshipit-source-id: 4af0379f29974a1ce8443024d1d87b3eb8d2dd36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30546
factor out this function for later support of quantizing shared types
Test Plan:
test_jit.py, test_quantization.py
Imported from OSS
Differential Revision: D18776304
fbshipit-source-id: f5a736b0f69019cefe17ec4517da1ae5462f78e1
Summary:
This test seems to only check that we throw exceptions in the `WorkerInfo` constructor when invalid names are passed in, so I don't think we need to complicate it by initializing RPC and exposing ourselves to potential flakiness.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30620
Differential Revision: D18766955
Pulled By: rohan-varma
fbshipit-source-id: 11643de4d57431e5f46e096c7766de3ab0b9b05a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30527
When we introduced dtype.is_signed we allowed for support of
quantized types, but we're not sure what the correct result should be.
See discussion at https://github.com/pytorch/pytorch/pull/29511
Test Plan: Imported from OSS
Differential Revision: D18765410
Pulled By: nairbv
fbshipit-source-id: c87cfe999b604cfcbbafa561e04d0d5cdbf41e6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30603
Pickler object needs to be kept in scope until data is written out to the
final serialized string. tensorData in particular is a reference to memory
owned by the descoped Pickle object.
Noticed this by inspection. In practice, this potential read-after-free here
is limited to non-cpu tensors, and any such use was very soon after free.
ghstack-source-id: 94756036
Test Plan: existing test suite at buck test mode/dev-nosan caffe2/test:rpc_fork
Differential Revision: D18760463
fbshipit-source-id: 9de890d66626aa48f13ca376dd9bd50b92e0cb00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30354
TCPStoreTest would timeout since the TCPStore constructor for the
server would block the main thread waiting for workers. The workers themselves
were spawned later on once the server store is created. As a result, this test
would always timeout.
To fix the test, I moved the server store to a thread so that the workers can
register with the server in parallel.
In addition to this made a few improvements to tcputils::connect. When
tcputils::connect() encountered an exception, it always looked at `errno` for
the error code. In some cases `errno` could be overwritten and the real error
code would be stored in `std::system_error`. As a result, I've modified the
code to look at the error code in `std::system_error` if we catch an exception
of that type.
ghstack-source-id: 94758939
Test Plan: waitforbuildbot
Differential Revision: D18668454
fbshipit-source-id: d5a3c57b066b094bfecda9a79d9d31bfa32e17f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30490
Add symbolic mapping to Int8AvgPool2d and Int8Reshape op in C2
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps
Imported from OSS
Differential Revision: D18740520
fbshipit-source-id: 1606125500c4b549fbc984e7929b7fd5204396a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29785
TLDR: This change improves process_group's serialization speed:
Serialize_Tensor64: 12.38us -> 1.99us (~-84%)
Deserialize_Tensor64: 33.89us -> 5.62us (~-84%)
Serialize_Tensor1M: 525.74us -> 285.43us (~-45%)
Deserialize_Tensor1M: 892.61us -> 273.68us (~-70%)
After speaking with the jit team, we had consensus that torch::save()/load()
are somewhat high-overhead for RPC serialization, mostly intended for
persistent disk data.
(Particularly, for large tensors, 35% of the time is spent in CRC checking, even
with the fb-side changes to substitute 40x faster SSE-accelerated CRC checking;
also, for small tensors, the zip container overhead is considerable, as is the
overhead of lexing/parsing an embedded text python program for each RPC.)
The jit team encouraged us to use jit::pickler, with the WriteableTensorData
way of outputting result tensors (not the default side-tensor table, or
with pickling the actual tensors). This ends up just pickling some tensor
metadata, and giving us some tensor blobs that we can mindlessly
blit over the wire (they copy to cpu memory if needed).
There is as yet no standardized container format for the pickled data
(there is jit::pickle_save() checked in, but it's experimental and
no load function is yet provided), but they encouraged us to just use
something sensible for this, and possibly revisit later. For now, I made
the directory headers slightly http-inspired.
Note that serialization is just one component of the pipeline, but that
said, we also see reasonable reductions in end-to-end echo times (noisier):
ProcessGroupAgent_Echo(Tensor_Small) 855.25us -> 492.65us (~-42%)
ProcessGroupAgent_Echo(Tensor_1M) 10.82ms -> 6.94ms (~-35%)
ProcessGroupAgent_Echo(Small_NoTensor) 688.82us -> 301.72us (~-56%)
ProcessGroupAgent_Echo(1MB_NoTensor) 4.65ms -> 3.71ms (~-20%)
I moved the "wire serialization" logic to a separate file to assist with
unittesting.
ghstack-source-id: 94694682
Test Plan:
buck test mode/dev-nosan caffe2/test/cpp/api:serialize
buck test mode/dev-nosan caffe2/test/...
Differential Revision: D18493938
fbshipit-source-id: 07ddfe87dbe56472bc944f7d070627052c94a8f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30330
This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The API is changed so that the previous `wait_all_workers` does not destroy the agent, and this is now done in a new `shutdown` method. All callsites are updated appropriately.
ghstack-source-id: 94673884
Test Plan: Unit tests pass.
Reviewed By: mrshenli
Differential Revision: D18661775
fbshipit-source-id: 5aaa7c14603e18253394224994f6cd43234301c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30361
### Summary
By default, the compiler will choose `clock_gettime` for the iOS build. However, that API is not available until iOS 10. Since the Facebook app still supports iOS 9.0, we have to use `gettimeofday` instead.
```shell
xplat/caffe2/torch/csrc/autograd/profiler.h:86:3: error: 'clock_gettime' is only available on iOS 10.0 or newer [-Werror,-Wunguarded-availability]
xplat/caffe2/torch/csrc/autograd/profiler.h:86:17: error: '_CLOCK_MONOTONIC' is only available on iOS 10.0 or newer [-Werror,-Wunguarded-availability]
```
P.S. the open-sourced version is iOS 12.0 and above, so we don't have this problem.
### Test Plan
- buck build works
- Don't break CIs
Test Plan: Imported from OSS
Differential Revision: D18730262
Pulled By: xta0
fbshipit-source-id: fe6d954b8d3c23cbc9d1e25a2e72e0b0c1d4eaa9
Summary:
PyTorch dim and ONNX axis have different meanings.
ONNX only supports log_softmax with dim = -1. Transpose must be added before and after log_softmax to support other cases.
This requires input rank to be known at export time.
Fixes https://github.com/pytorch/pytorch/issues/17918
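A minimal numerical check of the transpose trick described above (this is a sketch of the equivalence, not the exporter code itself):
```
import torch

x = torch.randn(2, 3, 4)
dim = 1

ref = torch.log_softmax(x, dim=dim)
# Move `dim` to the last axis, apply log_softmax(dim=-1), then move it back.
via_transpose = torch.log_softmax(x.transpose(dim, -1), dim=-1).transpose(dim, -1)

assert torch.allclose(ref, via_transpose)
```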
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30433
Reviewed By: hl475
Differential Revision: D18723520
Pulled By: houseroad
fbshipit-source-id: d0ed3b3f051d08d46495a7abfa854edd120dca3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25768
The round robin process group can be constructed from multiple other
process groups. Every collective call against this new process group
is delegated to the specified process groups in a round robin fashion.
Doing so may benefit performance when calling into multiple NCCL
process groups. Instead of adding support for round-robin usage of
NCCL communicators, we achieve the same without changing the NCCL
process group and adding this wrapper class.
The API to create this round robin process group is a bit harsh. If we
find it adds significant benefit we can revisit and make this a first
class citizen in the torch.distributed module.
ghstack-source-id: 94578376
Test Plan: The newly added test passes.
Reviewed By: chenyangyu1988
Differential Revision: D17226323
fbshipit-source-id: ec9f754b66f33b983fee30bfb86a1c4c5d74767d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30415
This enables subclassing of c10d.Store and implementing its interface in Python.
ghstack-source-id: 94586627
Test Plan: New tests passes.
Reviewed By: vladbelous
Differential Revision: D18693018
fbshipit-source-id: fa1eba4bd11cc09a3d6bf3f35369c885033c63c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30120
The example given for functional conv2d didn't work. This diff fixes the example in docs so that it works.
Fixes https://github.com/pytorch/pytorch/issues/29649
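A working functional conv2d call of the kind the docs example should show (the shapes here are illustrative):
```
import torch
import torch.nn.functional as F

inputs = torch.randn(1, 4, 5, 5)    # (N, C_in, H, W)
filters = torch.randn(8, 4, 3, 3)   # (C_out, C_in, kH, kW)
out = F.conv2d(inputs, filters, padding=1)
print(out.shape)                    # torch.Size([1, 8, 5, 5])
```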
ghstack-source-id: 94601559
Test Plan: Tried the example locally
Differential Revision: D18604606
fbshipit-source-id: ff1a4f903e2843efe30d962d4ff00e5065cd1d7e
Summary:
In ONNX opset 11, a series of sequence ops were added. Operators that are related to Tensor[] in PyTorch can be exported using these sequence ops.
In this PR, unbind/split that produces Tensor[], and __getitem__ that takes Tensor[] as input, are exported correctly to ONNX opset 11.
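A minimal export sketch exercising the unbind path (the module, shapes, and file name are illustrative):
```
import torch

class UnbindAdd(torch.nn.Module):
    def forward(self, x):
        a, b = x.unbind(0)   # produces a Tensor[] in the traced graph
        return a + b

# The sequence-op based export requires opset 11 or later.
torch.onnx.export(UnbindAdd(), torch.randn(2, 3), "unbind_add.onnx", opset_version=11)
```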
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29136
Reviewed By: hl475
Differential Revision: D18309222
Pulled By: houseroad
fbshipit-source-id: be12c96bf8d0a56900683ef579f1c808c0a1af21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30202
Pytorch Upsample operator has output_size as an argument.
For quantized tensor inputs we cannot get the input_size to calculate the width and height scale factor.
Instead we pass the output_size directly to caffe2 to calculate the scale factors.
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_upsample
Imported from OSS
Differential Revision: D18631478
fbshipit-source-id: 38a39129bc863f4ecf2293acc068e40ab7edc825
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30217
Before this commit, RRefContext throws an error if it detects any
RRef leak during shutdown. However, this requires applications to
make sure they have freed all references to RRefs in application
code, which can be a bad debugging experience for large
applications. Besides, this also relies on Python GC to free things
up in time, which might not always be true. After this commit,
RRefContext ignores leaking RRefs during shutdown, as shutdown
is called when the application has finished training and no longer
cares about local states. Hence, it should be OK to just ignore
those leaks and destroy OwnerRRefs. If an application would like to
enforce no leaks, just set torch.distributed.rpc.api._ignore_rref_leak
to False.
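A sketch of the opt-out knob mentioned above, for applications that still want strict leak checking:
```
import torch.distributed.rpc.api as rpc_api

# Re-enable the error on leaked RRefs at shutdown.
rpc_api._ignore_rref_leak = False
```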
Test Plan: Imported from OSS
Differential Revision: D18632546
Pulled By: mrshenli
fbshipit-source-id: 2744b2401dafdd16de0e0a76cf8e07777bed0f38
Summary:
The PyTorch exporter does not add any name to the ONNX operators in the exported graph. A common request is to add names to op nodes by default. This helps the readability of the graph in visualization tools such a Netron, or when the ONNX graph is printed as a string. Also, it helps with the debuggability of the ONNX graph.
Therefore this PR adds name to operators in the exporters. The names follow a simple format, <op_type>_<index>. Expect files for tests in `test/onnx/test_operators.py` have been updated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27342
Reviewed By: hl475
Differential Revision: D17790979
Pulled By: houseroad
fbshipit-source-id: 1eaae88b5f51f152735a2ff96e22827837e34d9d
Summary:
This should resolve https://github.com/pytorch/pytorch/issues/29008. This flag has two effects on the tracer.
- Remove the trailing underscore for in-place operators, e.g. index_put_ ==> index_put. This is handled separately in utils.py as well.
- Add out as an input for backward computation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29466
Reviewed By: hl475
Differential Revision: D18422815
Pulled By: houseroad
fbshipit-source-id: 317b6a3c8a5751fe6fe49d7543e429d281ed0d6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30357
Fix issue https://github.com/pytorch/pytorch/issues/29032 in loading from state dict for observers and fake quant.
ghstack-source-id: 94468814
Test Plan: Ensures that load/save of fake quant and observers with missing keys works correctly.
Differential Revision: D18668517
fbshipit-source-id: 0eda6f47c39102e55977fc548b9a03664f123ad7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30430
When a module isn't a TracedModule, attempt to get name information with `original_name` property on module and default to 'Module' when no such property exists.
Test Plan:
### Change child module to scripted module:
```
model = torchvision.models.alexnet()
model.classifier = torch.jit.script(model.classifier)
```
### Add graph
```
w = SummaryWriter()
w.add_graph(model, torch.rand((2, 3, 224, 224)))
w.close()
```
### No errors
However, the graph is disconnected in parts and hard to understand.
{F223327878}
Reviewed By: sanekmelnikov
Differential Revision: D18690836
fbshipit-source-id: 42295d06b7c1d48d5401776dca1e0d12cd64b49d
Summary:
There is no `out` argument to `argsort` according to the source code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24335
Differential Revision: D16829134
Pulled By: vincentqb
fbshipit-source-id: 8f91154984cd4a753ba1d6105fb8a9bfa0da22b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30362
Right now the QAT modules (qat.ConvBn2d, qat.ConvBnReLU2d, qat.Conv2d)
are not convenient for supporting other dimensions of Conv; this PR refactors
these modules so that we can support Conv1d/Conv3d better.
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D18691152
fbshipit-source-id: 5b561e6b054eadd31b98cabdf1ac67a61ee9b805
Summary:
In this PR, we mainly handle the case where there are multiple uses of a Value when inserting the quant-dequant pair. This change adds one dequant for each use of the Value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30145
Differential Revision: D18671600
Pulled By: lly-zero-one
fbshipit-source-id: 61324a98861da85b80dcf7e930381311118ae53b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30208
Adds default arg for init_method so users don't have to pass this in,
and moves it to `RpcBackendOptions` struct. Removes `init_method` arg from rpc.init_rpc. Also fixes some docs.
ghstack-source-id: 94500475
Test Plan: Unit tests pass.
Reviewed By: mrshenli
Differential Revision: D18630074
fbshipit-source-id: 04b7dd7ec96f4c4da311b71d250233f1f262135a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29337
This argument is needed by boxing wrappers so they're able to get a pointer to the corresponding unboxed kernel and call into it.
But if a kernel is registered in a boxed way, we don't need it and should hide this from the API.
This is especially needed for the backend fallback API where users would only be left wondering why this argument is there and what it does.
Also, hiding it allows us to potentially totally remove it in a future refactoring if we find some way to do so.
ghstack-source-id: 94481316
Test Plan: unit tests
Differential Revision: D18361991
fbshipit-source-id: 5cef26c896fe3f2a5db730d3bc79dcd62e7ef492
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29201
This is required for boxed backend fallback kernels (e.g. lazy, AMP) because they need to know which op was actually called.
ghstack-source-id: 94481313
Test Plan: I will add unit tests in a diff stacked on top
Differential Revision: D18282746
fbshipit-source-id: 339a1bbabd6aff31a587b98f095c75104dfc6f99
Summary:
In this PR, we enhance graph-mode quantization for aten::_convolution, which can be generated from the tracing path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30245
Differential Revision: D18671597
Pulled By: lly-zero-one
fbshipit-source-id: 78a2470fbb0fe0def55d63c6bda7cbb5c89f7848
Summary:
This PR updates `torch::pickle_save` to use the new zipfile format introduced in #29232 and adds `torch::pickle_load` which can decode the zipfile format. Now that `torch.save/load` use this format as well (if the `_use_new_zipfile_serialization` flag is `True`), raw values saved in Python can be loaded in C++ and vice versa.
Fixes #20356
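A Python-side sketch of producing the zipfile format that `torch::pickle_load` can now read (the file name and saved value are illustrative):
```
import torch

# Saved with the new zipfile container so the same file can be read from C++.
torch.save({"weights": torch.randn(3)}, "data.pt", _use_new_zipfile_serialization=True)
```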
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30108
Pulled By: driazati
Differential Revision: D18607087
fbshipit-source-id: 067cdd5b1cf9c30ddc7e2e5021a8cceee62d8a14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30241
We need an API to get all worker infos. This will be used by backend-agnostic `rpc.wait_all_workers()` API.
ghstack-source-id: 94454935
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_get_worker_infos
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_get_worker_infos
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_get_worker_infos
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_get_worker_infos
```
Differential Revision: D5693412
fbshipit-source-id: 5123c8248b6d44fd36b8a5f381dbabb2660e6f0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30167
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29164
- Created GlooDeviceFactory to hide device creation details
- Added a transport option to the Python interface
The reason for making the factory class is to make it easier to extend the gloo transport in the future
Test Plan: Imported from OSS
Reviewed By: satgera, d4l3k
Differential Revision: D18596527
fbshipit-source-id: e8114162ee8d841c0e0769315b48356b37d6ca0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29207
The logic calling c10 ops from JIT did some variable wrapping to make sure all results are always variables.
Thanks to ezyang, this is not needed anymore because everything is a variable now.
ghstack-source-id: 93345590
Test Plan: waitforsandcastle
Differential Revision: D18327507
fbshipit-source-id: 86512c5e19d6972d70f125feae172461c25e3cb6
Summary:
This PR looks for a `constants.pkl` file at the top level in a zip file
in `torch.load`. If found, it calls `torch.jit.load` instead and issues
a warning to call `torch.jit.load` directly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29339
Differential Revision: D18611095
Pulled By: driazati
fbshipit-source-id: f070a02f6b5509054fc3876b3e8356bbbcc183e1
Summary:
Perf improvements to multi_head_attention_forward
- qkv_same and kv_same were not used outside of that branch. Further, kv_same was calculated even though it is not used when qkv_same is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30142
Differential Revision: D18610938
Pulled By: cpuhrsch
fbshipit-source-id: 19b7456f20aef90032b0f42d7da8c8a2d5563ee3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30020
This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The destructor calls this same `localShutdown` method, but we ensure this is not called multiple times.
ghstack-source-id: 94415336
Test Plan: Unit tests pass.
Differential Revision: D5578006
fbshipit-source-id: 6258879fb44c9fca97fdfad64468c1488c16ac02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30239
Use unboxed registration per smessmer's request. For some ops with optional args or tensor lists, where unboxed registration is not supported, we still use boxed registration.
Test Plan: Imported from OSS
Differential Revision: D18653846
Pulled By: iseeyuan
fbshipit-source-id: c22ce8111dfff0ba63316a9bcfe2b712b2d31fc1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30201
Provide a default constructor so that users don't have to construct
RPC agent options. Also rename this to RpcBackendOptions as suggested.
ghstack-source-id: 94411768
Test Plan: Unit tests pass.
Differential Revision: D18628698
fbshipit-source-id: 81fb45f124ad1006e628f6045162308093c9d446
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29118
It's never a good idea to throw from a destructor, and per #28288 we
can't use `std::make_shared` on a class with a `noexcept(false)`
destructor.
To fix this, we `abort` instead of throwing from the `NCCLComm` destructor.
Closes #28288.
ghstack-source-id: 93182910
Test Plan: ProcessGroupNCCLErrorsTest runs successfully.
Reviewed By: pritamdamania87
Differential Revision: D18298271
fbshipit-source-id: ccac37753fef64fb63cb304433f4f97dc5621379
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30286
add_hparams() in torch.utils.tensorboard.writer produced the following error
python3.7/site-packages/torch/utils/tensorboard/writer.py", line 294, in add_hparams
with SummaryWriter(log_dir=os.path.join(self.file_writer.get_logdir(), str(time.time()))) as w_hp:
AttributeError: 'NoneType' object has no attribute 'get_logdir'
Other methods such as add_scalar() and add_histogram() use self._get_file_writer() instead of self.file_writer directly.
Test Plan:
```
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
writer.add_hparams({"a": 0, "b": 0}, {"hparam/test_accuracy": 0.5})
writer.flush()
writer.close()
```
Reviewed By: J0Nreynolds, sanekmelnikov
Differential Revision: D18650610
fbshipit-source-id: 1039dd2067d37913a8a131c8b372491a63154899
Summary:
When creating the ONNX graph, we overwrite the output type with the output type of the PT graph.
In some special cases, when using scripting, the PT graph does not have type information. We want to avoid overwriting the type in these cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25906
Reviewed By: hl475
Differential Revision: D18645903
Pulled By: houseroad
fbshipit-source-id: 56acc43e0c15c74ac8ebd689e04f7371054e362e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30244
This makes several small changes to the tensorboard graph parsing methods to address the recent changes to the PyTorch JIT trace/graph.
- Inline graph to get information for all nodes
- Assign and propagate scope names to GetAttr nodes
- Prune all useless GetAttr nodes (any with a ClassType output type - tensors and primitives are kept)
- Create output nodes so output tensor shape can be examined
Reviewed By: sanekmelnikov
Differential Revision: D18556323
fbshipit-source-id: b73a809bacfa554c3fe9c4ae3563525f57539874
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30243
Before this commit, rpc docs shows init_rpc as the following:
```
torch.distributed.rpc.init_rpc(
name,
backend=<BackendType.PROCESS_GROUP: BackendValue(
construct_rpc_agent_options_handler=<function _process_group_construct_rpc_agent_options_handler>,
init_backend_handler=<function _process_group_init_backend_handler>)>,
init_method=None,
rank=-1,
world_size=None,
rpc_agent_options=None
)
```
It unnecessarily leaks implementation details. This commit adds a
__repr__ function to BackendType Enum class to address this problem.
Closes #29905
Test Plan: Imported from OSS
Differential Revision: D18641559
Pulled By: mrshenli
fbshipit-source-id: 19bf8a2d21c8207f026d097d8e3f077578d53106
Summary:
Given that pybind11 implements these GIL functions, I don't think it makes sense for PyTorch to have its own bespoke versions.
Fixes https://github.com/pytorch/pytorch/issues/29065
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29095
Differential Revision: D18301806
Pulled By: ezyang
fbshipit-source-id: 03da6a26c41ee65aaadf7b67b9f0b14d2def2a5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30168
The previous implementation of `clone` in `script::Module` copies both the module instance and the
class type. After we enabled type sharing in https://github.com/pytorch/pytorch/pull/26666, we also
need a function that clones the instance only and shares the underlying class type.
Test Plan:
tbd
Imported from OSS
Differential Revision: D18631324
fbshipit-source-id: dbadcf19695faee0f755f45093b24618c047b9d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29731
The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
Some subtleties about the patch:
- There were a few functions that crossed CPU-CUDA boundary without API macros. I just added them, easy enough. An inverse situation was aten/src/THC/THCTensorRandom.cu where we weren't supposed to put API macros directly in a cpp file.
- DispatchStub wasn't getting all of its symbols related to static members on DispatchStub exported properly. I tried a few fixes but in the end I just moved everyone off using DispatchStub to dispatch CUDA/HIP (so they just use normal dispatch for those cases.) Additionally, there were some mistakes where people incorrectly were failing to actually import the declaration of the dispatch stub, so added includes for those cases.
- torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
- The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
- In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
- A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly. This situation also happens with custom C++ extensions.
- There's a ROCm compiler bug where extern "C" on functions is not respected. There's a little workaround to handle this.
- Because I was too lazy to check if HIPify was converting TORCH_CUDA_API into TORCH_HIP_API, I just made it so HIP build also triggers the TORCH_CUDA_API macro. Eventually, we should translate and keep the nature of TORCH_CUDA_API constant in all cases.
Fixes#27215 (as our libraries are smaller), and executes on
part of the plan in #29235.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18632773
Pulled By: ezyang
fbshipit-source-id: ea717c81e0d7554ede1dc404108603455a81da82
Summary:
This PR enables per-channel (row-wise) dynamic quantization for the linear operator. Given that we have seen some accuracy drop due to per-tensor quantization, we expect per-channel quantization to help improve accuracy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30122
Differential Revision: D18630541
Pulled By: lly-zero-one
fbshipit-source-id: d52685deec5e7de46cd686ae649a8c8765b9cacf
Summary:
The original design of `torch::nn::utils::clip_grad_norm_` / `clip_grad_value_` takes input by non-const reference, which prevents users from passing rvalue reference as input into the functions. This PR changes the functions to take input by value, which matches the Python version's semantics, and also adheres to the C++ API convention that if a function modifies its input in-place, it should take that input by value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30216
Differential Revision: D18632543
Pulled By: yf225
fbshipit-source-id: 97a09d6467f982fe9c8120f483a9c07fcf13699e
Summary:
A prim::BailOut also needs to capture max trip counts as for some graphs they aren't constants and they are used in continuation graphs to figure out the remaining number of iterations to run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30097
Differential Revision: D18624446
Pulled By: Krovatkin
fbshipit-source-id: 085d25981c6669f65848996cd2d50066cc252048
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28287
This PR eliminates the static distinction between
Tensor and Variable. Every Variable is a Tensor, no need to static_cast
or call the Variable constructor.
To do this, I need Tensor to have API parity with Variable. I have already
moved most of the methods I don't want in Tensor off Variable.
These implementations are all placed in Tensor.cpp.
One API difference is that all Variable methods now have const, so we no longer
have faux const-correctness (see https://github.com/zdevito/ATen/issues/27 for
back story)
This diff is BC breaking in a few ways:
- Because torch::autograd::Variable is now just an alias of at::Tensor, ADL for
`torch::autograd` functions no longer works, you have to explicitly qualify
them with `torch::autograd` (examples: `torch/nn/parallel/data_parallel.h`)
- Because Variable and Tensor are now the same type, code which assumes that
they are different types (e.g., for the purposes of templating, or enable_if checks)
will not work until you delete the (now) redundant overload/specialization.
(examples: `torch/nn/modules/container/any.h`, `torch/csrc/utils/pybind.h`)
Some other notes:
- I'm not sure what was going on with the old template implementation of `extract_vars`,
but I couldn't get the sfinae version to work. Replacing it with an overloading-based version
made it work.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18571426
Pulled By: ezyang
fbshipit-source-id: 2ea8151e5f1d8512cdebf1345399642e68b707b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29577
`torch.autograd.grad` can return None if one of the inputs is not in the
autograd graph or does not require grad; this fixes it so that it returns a
list of optional tensors instead of a list of tensors.
This might unfortunately have BC issues, but I think it's rare both
internally and externally (only training uses it, and most training
uses backward instead of autograd.grad), so whitelist it.
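A Python sketch of the behavior being modeled, where an input outside the autograd graph yields None (the Python API requires `allow_unused` for this case; tensors here are illustrative):
```
import torch

x = torch.randn(3, requires_grad=True)
y = torch.randn(3, requires_grad=True)
out = (x * 2).sum()   # y never participates in the graph

grads = torch.autograd.grad(out, [x, y], allow_unused=True)
print(grads)          # (tensor([2., 2., 2.]), None)
```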
Test Plan: Imported from OSS
Differential Revision: D18491642
fbshipit-source-id: d32b2b3446cf9e8b9a98f6d203a21a75643d8991
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29928
Original author: Shihao Xu
- Add abort to `c10d::ProcessGroup::Work`.
- Change the return type of `c10d::ProcessGroup::Work::wait()` to boolean to indicate if the work is aborted after waiting.
- Add unit test for the correctness of abort.
ghstack-source-id: 94305515
Differential Revision: D5685727
fbshipit-source-id: 6e682bb563c2393a5c303c877331140417d3f607
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30052
Some of the examples provided in `rpc/api.py` were not updated along
with the code changes, this PR updates them. Also removes the
`dist.ProcessGroup` information since `init_rpc` now initializes a default
process group.
ghstack-source-id: 94273004
Test Plan: Unit tests pass
Differential Revision: D18582596
fbshipit-source-id: a637683f0221f9600f7e50b74e9f7e5a1d331d8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30172
RRefContext is a conventional singleton, used by rref.cpp. At module teardown
time, it's not defined whether rref_context.cpp or rref.cpp will be destroyed first.
We were observing a SIGSEGV because RRefContext is destroyed before a dangling
~UserRRef() call is able to execute. Particularly, the underlying
ctx.agent()->getWorkerInfo(ownerId_) call failed.
This change just avoids the SIGSEGV by forcing an intentional leak, though we still
need to deal with why there's a dangling UserRref at module destruction time.
ghstack-source-id: 94287441
Test Plan:
existing test suite
test_elastic_averaging in context of D18511430, where the segfault reproed reliable.
Differential Revision: D18620786
fbshipit-source-id: 17b6ccc0eb1724b579a68615e4afb8e9672b0662
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30140
This seems more semantically correct to me, and makes it so we don't have to iterate over Uses of observed values
Test Plan: Imported from OSS
Differential Revision: D18610676
Pulled By: jamesr66a
fbshipit-source-id: f835266f148bd8198b05cd9df95276e1112dd250
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30050
Renames this API to wait_all_workers as discussed.
ghstack-source-id: 94273005
Test Plan: Unit tests pass
Differential Revision: D18581466
fbshipit-source-id: 4ff5d5fb2d528f17252d5b5f30c3047d2efb92bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30146
This PR fixes naming for kl_div and binary_cross_entropy functional options, to be more consistent with the naming scheme of other functional options.
Test Plan: Imported from OSS
Differential Revision: D18618971
Pulled By: yf225
fbshipit-source-id: 2af62c1a0ace2cd0c36c2f1071639bf131d8fe61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29494
`calculate_qparams` for per-channel quantization should return the axis. This
PR adds that and also adds corresponding support in graph mode.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D18580905
fbshipit-source-id: f9691c1f043f8bca39f81716a4d0b10f60a65396
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29881
Breaking these into separate files allows us to have three different builds:
- Mobile inference-only.
- Mobile with module saving.
- Server with module saving and other export functions like ONNX.
And this can be accomplished just by selecting which cpp files to compile,
without setting any preprocessor flags.
Test Plan: CI. Local mobile+saving build.
Reviewed By: smessmer
Differential Revision: D18509296
fbshipit-source-id: 9438273bac4624df5c7f035b2bacb901cce43053
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30146
This PR fixes naming for kl_div and binary_cross_entropy functional options, to be more consistent with the naming scheme of other functional options.
Test Plan: Imported from OSS
Differential Revision: D18612158
Pulled By: yf225
fbshipit-source-id: 8c403fa1c2a0a65734a3ec2387cc0937c46cab24
Summary:
VitalyFedyunin, this PR ports sigmoid backward to ATen.
**Test script:**
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
if torch.cuda.is_available():
    device = "cuda"

# warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    for i in range(1000):
        output = input.sigmoid().sum()
        output.backward()

# get running time
for n in [100, 10000]:
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    for i in range(10000):
        output = input.sigmoid().sum()
        t1 = _time()
        output.backward()
        t2 = _time()
        bwd_t = bwd_t + (t2 - t1)
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d), backwad avg time is %.2f (ms)." % (n, bwd_avg))
```
Test Device: CPU: skx-8280, GPU: Tesla P40
**Performance**:
Before:
```
GPU:
input size(128, 100), backwad avg time is 0.14 (ms).
input size(128, 10000), backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100), backwad avg time is 0.06 (ms).
input size(128, 10000), backwad avg time is 4.21 (ms).
OMP_NUM_THREADS=1
input size(128, 100), backwad avg time is 0.06 (ms).
input size(128, 10000), backwad avg time is 2.30 (ms).
```
After:
```
GPU:
input size(128, 100), backwad avg time is 0.14 (ms).
input size(128, 10000), backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100), backwad avg time is 0.05 (ms).
input size(128, 10000), backwad avg time is 0.48 (ms).
OMP_NUM_THREADS=1
input size(128, 100), backwad avg time is 0.04 (ms).
input size(128, 10000), backwad avg time is 0.86 (ms).
```
How to set the number of threads? Use the following script:
```
num_threads=$1
script=$2
last_core=`expr $num_threads - 1`
echo "using $num_threads OMP threads"
echo "bind cores to 0~$last_core"
export OMP_NUM_THREADS=$num_threads
export KMP_AFFINITY=granularity=fine,compact,1,0
numactl --physcpubind=0-$last_core --membind=0 python $script
```
and run **./run.sh num_threads test.py**.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29185
Differential Revision: D18587352
Pulled By: VitalyFedyunin
fbshipit-source-id: 8167ca261960399f795d35a83fa8c4be365bc4da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29826
After save/load, we lose concrete type information. So if you tried to
script something that contained a loaded ScriptModule as a submodule,
the following sequence happened:
1. During ConcreteType inference, the loaded submodule got a new
inferred type.
2. But it already has a type! So there was a type mismatch.
To fix this, we should generate a ConcreteType directly from the loaded
submodule type (similar to what we do for interfaces). This makes sense
too--the ConcreteModuleType should be empty, since all the "sugaredness"
was stripped out during the save/load process.
Test Plan: Imported from OSS
Differential Revision: D18575009
Pulled By: suo
fbshipit-source-id: 4d329b7e9b7e7624f459e50092e35ab0ab813791
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29825
We made `ModuleInfo` a union initially to represent the idea that a
submodule could either be a regular module or a module interface.
This PR represents module interfaces as a ConcreteModuleType with no
info (e.g. no "sugaredness"), and with the interface type as the
underlying `jitType_`. This has the effect of reducing the special
casing around adding/maintaining module info.
Test Plan: Imported from OSS
Differential Revision: D18575011
Pulled By: suo
fbshipit-source-id: 53e297b39aa1a03bcdadd795ff225aa68fec9d70
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29824
We have two distinct phases/uses for ConcreteModuleType:
1. We are building it up and using it to check whether we can
reuse JIT types. (RawConcreteModuleType)
2. We are using it to satisfy ModuleValue::attr queries.
(ConcreteModuleType)
These types share an underlying `ConcreteModuleTypeData` which
actually stores the relevant info.
Previously they were the same type because I was lazy, but it's been the
source of a bug. So split them to formalize the differing invariants for
the two phases.
Test Plan: Imported from OSS
Differential Revision: D18575010
Pulled By: suo
fbshipit-source-id: 3e4ebcd36e78b947150d8f0dbb74ecccad23e7c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29930
Right now, rethrowing remote exceptions from Python calls is coupled with deserialization.
For an owner ref, setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself and call ownerRef.to_here(), the Python-call remote exception will not be rethrown.
This diff moves remote exception rethrowing out of deserialization, so the exception can be handled for ownerRef.localValue() or ownerRef.to_here().
Closes #29924
ghstack-source-id: 94210894
Test Plan: unit tests
Differential Revision: D18541916
fbshipit-source-id: 7cda93f623d52c740b3c1b1fa9a442f866984340
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30093
https://github.com/pytorch/pytorch/pull/28226 introduced a `worker_to_id` arg to the `def init_rpc` function for other `RpcAgent`s, while it's not really used by `ProcessGroupAgent`. Cleanup is wanted for this, as described in https://github.com/pytorch/pytorch/issues/29031.
To adapt to the difference of different `RpcAgent`, adding a `RpcAgentOptions` base classes, which allow leveraging inheritance to add extra fields.
ghstack-source-id: 94197295
Test Plan:
### OSS RPC + RRef tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork
```
```
buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/test:thrift_rpc_fork_test -- test_sync_rpc
```
### Prototype RRef tests
```
buck test mode/dev-nosan caffe2/torch/fb/distributed/pytorch/tests:test_rpc
```
```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_rpc_thrift_rpc_agent
```
### Dist autograd
```
buck test mode/dev-nosan caffe2/test:dist_autograd_fork
```
```
buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/test:thrift_dist_autograd_fork_test
```
Differential Revision: D18595578
fbshipit-source-id: 616fca3b844c171ed5277bbc6a2b1693bc3a8065
Summary:
Overwrite the `__setstate__` func in nn.MultiheadAttention and add the `self._qkv_same_embed_dim` attribute to the `dict`. Current users should not be affected by the change.
The changes have been tested to load a MultiheadAttention model trained by PyTorch 1.1. If users have an old MultiheadAttention model, please use `torch.load` func to load the old model for inference under v1.4.0 and above.
```
import torch
model = torch.load('old_v1.1.0_MultiheadAttention.pt') # model works for torch 1.4
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29001
Differential Revision: D18257671
Pulled By: zhangguanheng66
fbshipit-source-id: fa41b85f6d53034dc9f445af60f2ad9636e9abf7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30062
This allows to catch exceptions during optimizer creation.
ghstack-source-id: 94232436
Test Plan: new unit test.
Differential Revision: D18586108
fbshipit-source-id: 71cfdf337fe803dbea8787b4c68e5a52b70a1f68
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30067
### Summary
The mobile build has been broken since last week due to a runtime error caused by a missing operator in JIT:
```shell
libc++abi.dylib: terminating with uncaught exception of type torch::jit::script::ErrorReport:
Unknown builtin op: aten::_adaptive_avg_pool2d_backward.
Could not find any similar ops to aten::_adaptive_avg_pool2d_backward. This op may not exist or may not be currently supported in TorchScript.
:
at <string>:9:28
grad_self = grad.expand(self.size()) / (self_size[-1] * self_size[-2])
else:
grad_self = torch._adaptive_avg_pool2d_backward(grad, self)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
return grad_self
```
### How this happens
Since we've disabled autograd for the open-sourced version, the `backward` ops don't get registered with JIT.
When `forward` runs, a `GraphExecutor` is created according to the value of `executor_mode`. In the mobile case, this flag was set to true, which gives us the `ProfilingGraphExecutorImpl` object. It seems this executor eventually tries to emit IR for the autograd schemas, which causes the error.
### Fix
There are two ways to fix it.
1. Add a macro to disable `profiling_mode` as well as `executor_mode` on mobile. Like what `FBCODE_CAFFE2` does [here](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/profiling_graph_executor_impl.cpp#L22).
2. Disable the two modes at runtime by calling `torch::jit::getExecutorMode() = false;` before calling forward.
(IMO, the second fix is more of a workaround, as it doesn't make sense from a user's perspective (why would I need to do this?), but the upside is that we don't have to introduce yet another macro.)
Feel free to drop comments if there is a better way to fix it.
### How this was not detected by our mobile CI
We're working on adding runtime tests to our mobile build to prevent issues like this.
### Test Plan
- The error above disappears
- Don't break CI
cc AshkanAliabadi
Test Plan: Imported from OSS
Differential Revision: D18605998
Pulled By: xta0
fbshipit-source-id: 11fa85c2b44d54bc28a9c45731af0f5d17d5804c
Summary:
This uses newly added InlinedCallStack to print the original call stack
even if the real call stack is shallower because of inlining.
This change also makes torchscript stacktraces look like python ones.
Example:
```
@torch.jit.script
def baz(c, b):
    return c + b

@torch.jit.script
def foo(c, b):
    return baz(c, b)

@torch.jit.script
def bar(c, b):
    return foo(c, b)

bar(torch.rand(10), torch.rand(9))
```
Output before:
```
Traceback (most recent call last):
File "fail.py", line 25, in <module>
bar(torch.rand(10), torch.rand(9))
RuntimeError: The size of tensor a (10) must match the size of tensor b (9) at non-singleton dimension 0
The above operation failed in interpreter, with the following stack trace:
at fail.py:15:11
torch.jit.script
def baz(c, b):
return c + b
~~~~~ <--- HERE
```
Output after:
```
Traceback (most recent call last):
File "fail.py", line 41, in <module>
bar(torch.rand(10), torch.rand(9))
RuntimeError: The size of tensor a (10) must match the size of tensor b (9) at non-singleton dimension 0
The above operation failed in interpreter.
Traceback (most recent call last):
File "fail.py", line 33
torch.jit.script
def bar(c, b):
return foo(c, b)
~~~ <--- HERE
File "fail.py", line 29, in foo
torch.jit.script
def foo(c, b):
return baz(c, b)
~~~ <--- HERE
File "fail.py", line 25, in baz
torch.jit.script
def baz(c, b):
return c + b
~~~~~ <--- HERE
```
Output of non-scripted python code:
```
Traceback (most recent call last):
File "fail.py", line 36, in <module>
bar(torch.rand(10), torch.rand(9))
File "fail.py", line 21, in bar
return foo(c, b)
File "fail.py", line 18, in foo
return baz(c, b)
File "fail.py", line 15, in baz
return c + b
RuntimeError: The size of tensor a (10) must match the size of tensor b (9) at non-singleton dimension 0
```
Differential Revision: D18532812
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: e7e5ba5e4a8f1c7086406271d0f1685d9db8541a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27921
InlinedCallStack serves a similar purpose to Scope, but instead of storing
string names of the functions, it stores pointers to the Function objects
themselves. Currently, scopes are used in tracing and callstacks are
used in scripting; hopefully I will be able to merge them in the future.
gh-metadata: pytorch pytorch 27921 gh/ZolotukhinM/139/head
Differential Revision: D17914132
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: b1daa6700199ee1a97a7f49a6fced9ac0dc13051
Summary:
Hi yf225,
I have a few doubts related to the implementation:
1) What tests do I have to write?
2) What does _load_state_from_dict do?
3) Do I need to override the reset() function? I cannot see its utility.
4) InstanceNormOptions could be replaced with BatchNormOptions, but I find that
`track_running_status` is not defined; instead, `stateful` is defined.
This implements InstanceNorm{1,2,3}d (https://github.com/pytorch/pytorch/issues/25883).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28790
Differential Revision: D18588666
Pulled By: yf225
fbshipit-source-id: bb9b81f01f62c3fc8765fa0ba0716768087ee155
Summary:
Since torchvision is not using input_channels / output_channels / with_bias in ConvOptions anymore (https://github.com/pytorch/vision/pull/1576), we can remove the bridges now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29838
Differential Revision: D18597943
Pulled By: yf225
fbshipit-source-id: 59101437f032f042574998eb90eaf0be09352364
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30112
Currently, some torch::nn functionals take `input` as `Tensor&` in order to be able to change `input`'s value in place. We likely shouldn't do this, because it prevents the following use case:
```cpp
F::elu(torch::tensor(1), F::ELUFuncOptions().inplace(true))
```
The solution is to change the type of `input` to `Tensor`, so that we can pass an rvalue into the functional.
Test Plan: Imported from OSS
Differential Revision: D18601580
Pulled By: yf225
fbshipit-source-id: 639a86eb62f6c986b0f20bf7e201983e83126e73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29770
We were passing around const and non-const references for
DistAutogradContext from DistAutogradContainer. This wasn't safe since the
context could be deleted from the container and a thread might still be using
the reference. This usually would happen when a backward pass fails on the node
driving the backward pass (resulting in delete context messages being sent to
all nodes) but other nodes are still executing code related to that autograd
context.
This was also the reason why `test_backward_autograd_engine_error` was flaky.
Using a std::shared_ptr everywhere ensures we're safe and never crash.
Closes #28928. Closes #26922.
ghstack-source-id: 94201446
Differential Revision: D18494814
fbshipit-source-id: 0c925fdbd5755f6d876dad56885e2cbaf41fc5f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29492
Previously, graph mode quantization only worked for per-tensor quantization.
This PR adds support for per-channel quantization as well. Changes include:
- inserting per-channel quantization calls (insert_quant_dequant)
- adding support for folding prepacked per-channel quantized weights (fold_prepack)
Test Plan:
Testing is not possible until we can script PerChannelObserver, which comes in https://github.com/pytorch/pytorch/pull/29416.
We'll add tests in a separate PR after that.
Imported from OSS
Differential Revision: D18580444
fbshipit-source-id: 347c07f201648ec49f070523642a9170278f8aa4
Summary:
Stacked PRs
* https://github.com/pytorch/pytorch/issues/29244 - Use custom CRC
* **https://github.com/pytorch/pytorch/issues/29232 - Add zipfile serialization**
This adds a serialization method that uses a zipfile (https://github.com/pytorch/pytorch/issues/26567). Right now it is
guarded behind a flag, `_use_new_zipfile_serialization` (usage is sketched below). In release mode, simple benchmarks on large and small tensors show performance about the same as, or slightly better than, the current serialization.
Follow ups:
* Flip the `_use_new_zipfile_serialization` flag
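A small usage example of the flag as it stands today (the file name is arbitrary):
```
import torch

tensors = {"weight": torch.randn(1000, 1000)}
torch.save(tensors, "checkpoint.pt", _use_new_zipfile_serialization=True)
loaded = torch.load("checkpoint.pt")   # torch.load detects the format automatically
```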
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29232
Differential Revision: D18332036
Pulled By: driazati
fbshipit-source-id: 1bac0847c4d599612cba905f2cac8248783be2f4
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29187
This introduces a new class `_NormBase` that `_InstanceNorm` and `_BatchNorm` inherit from separately. This means the `isinstance(module, _BatchNorm)` check won't falsely pass for `_InstanceNorm`.
The suggested fix of adding `and not isinstance(module, _InstanceNorm)` works as well, but requires introducing a cyclic dependency between `instancenorm.py` and `batchnorm.py`.
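A minimal sketch of the resulting hierarchy (class bodies elided) and the isinstance behavior it gives:
```
import torch.nn as nn

class _NormBase(nn.Module):      # shared parameters, buffers, and reset logic
    pass

class _BatchNorm(_NormBase):     # batch-statistics specific behavior
    pass

class _InstanceNorm(_NormBase):  # instance-statistics specific behavior
    pass

m = _InstanceNorm()
assert isinstance(m, _NormBase)       # shared base class
assert not isinstance(m, _BatchNorm)  # no longer falsely passes
```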
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29985
Differential Revision: D18588104
Pulled By: yf225
fbshipit-source-id: f599da3b902ad9c56836db4d429bfc462ed51338
Summary:
Support exporting left/right bitshifts to ONNX for all opset versions.
ONNX has a BitShift operator in opset 11, but it only supports unsigned integers, so it can't be used for most PyTorch tensors (uint8 is the only unsigned integer type in PyTorch).
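A rough sketch of the kind of model this makes exportable (the module and tensor here are purely illustrative; dtype and opset coverage follow the description above):
```
import io
import torch

class Shifter(torch.nn.Module):
    def forward(self, x):
        return (x << 2) >> 1   # left and right bitshifts

x = torch.arange(8, dtype=torch.int32)
torch.onnx.export(Shifter(), (x,), io.BytesIO())
```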
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28210
Reviewed By: hl475
Differential Revision: D18575512
Pulled By: houseroad
fbshipit-source-id: 74161db67f599996a0614981edcc171af6780d21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29601
Follow up from https://github.com/pytorch/pytorch/pull/28392. Adds a background thread to `ProcessGroupAgent` that polls for timed out RPCs at a pre-set interval, and marks them as completed with a timeout exception if they have timed out. Also deletes the futures from the corresponding maps `futures_` and `futureTimeouts`. Unit tests are added to ensure that timed out RPCs are appropriately cleaned up.
Also adds a `shutdown` variable to process group agent to control the shutting down of this background thread, which can eventually be extended to use for controlling a clean shutdown of process group agent.
ghstack-source-id: 94175131
Test Plan: Added unit tests
Differential Revision: D18434215
fbshipit-source-id: c48abdb8759fe1447200ec66bb9d4b1c50ec4535
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30060
Mobile forward() passed inputs by reference, which is different from JIT's script::module. To make it consistent, change it to pass by value.
Test Plan: Imported from OSS
Differential Revision: D18587786
Pulled By: iseeyuan
fbshipit-source-id: fa398124fd0a5168f708733ff88f0ba327726f43
Summary:
This is a fix for batch norm 2D with affine=False.
Repro: https://github.com/pytorch/pytorch/issues/29271
The error occurs because the output of the unsqueeze op does not have scalar type information, so I moved the references to the scalar type after the unsqueeze line.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29458
Reviewed By: hl475
Differential Revision: D18400975
Pulled By: houseroad
fbshipit-source-id: f5c5633857c584edcef3b9e9946861dcfccccd75
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29958
DistributedOptimizer relies on hashing WorkerInfo in order to coalesce fan-out RPCs. This will likely be a very common use case (EASGD will do the same, for example).
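To illustrate the use case, a hedged sketch (`param_rrefs` is assumed to be a list of RRefs to parameters living on remote workers): coalescing fan-out RPCs means bucketing by owner, which requires `WorkerInfo` to be usable as a dict key.
```
from collections import defaultdict

def group_by_owner(param_rrefs):
    # Bucket parameter RRefs by the worker that owns them.
    per_worker = defaultdict(list)
    for rref in param_rrefs:
        # rref.owner() returns a WorkerInfo; hashing lets it serve as a dict key.
        per_worker[rref.owner()].append(rref)
    return per_worker
```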
ghstack-source-id: 94169198
Test Plan: unit test.
Differential Revision: D18548257
fbshipit-source-id: 7d67d4e1b9bc60403c372164982a75ae8c1d8389
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29787
The initializedContextIds_ map was never cleaned up in DistEngine and
kept growing as we continued to run backward passes. To fix this, in this PR
we ensure that the context id is cleaned up from this map once we are done with
the backward pass.
Closes #29083
ghstack-source-id: 94161770
Test Plan: waitforbuildbot
Differential Revision: D18498937
fbshipit-source-id: 8d31fc066f6994627766f2b6ca36efa1bef89840
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30033
Removing this API for now since we don't have a concrete use case for it yet,
and exposing it as a public API might result in users depending on it.
We can always add some variant of this API back if needed later.
ghstack-source-id: 94138302
Test Plan: waitforbuildbot
Differential Revision: D18578056
fbshipit-source-id: 078c62331725e03bd5702624afc16b1cdcdf26a4
Summary:
Update the requirements on input dimensions for `torch.nn.SyncBatchNorm`:
1. 2D inputs are now permissible (https://github.com/pytorch/pytorch/issues/20204);
2. at least two elements are required along the normalization plane (matching BatchNorm behavior).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29626
Differential Revision: D18492531
Pulled By: albanD
fbshipit-source-id: f008e46a2d520d73c3c2730890a7424eba2ede9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29694
This PR adds preliminary support required to run quantized PyTorch models on a Caffe2 backend.
For quantized ops we use a custom domain name 'caffe2' to register the ops if they are in the "quantized" namespace.
The change also adds a JIT pass to unpack the quantized weights and insert the unpacked values into the graph.
The actual tensor values are looked up from the params dict.
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2.py TestQuantizedOps
Imported from OSS
Reviewed By: houseroad
Differential Revision: D18467130
fbshipit-source-id: 53ebd8c43935f7d7e74305dad6c231a2247df176
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29667
Some previous implementations are defined in native_functions.yaml.
In this case, I don't define them explicitly in Tensor; instead,
they are placed in VariableTypeManual.cpp. Doing so would have deleted
some documentation, so that documentation was moved to native_functions.yaml.
This also replaces `current_version` with just `_version`.
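On the Python side the same counter is visible as `Tensor._version`; a small illustration of the renamed accessor:
```
import torch

t = torch.zeros(3)
print(t._version)   # 0
t.add_(1)           # in-place ops bump the version counter
print(t._version)   # 1
```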
This is a carved out portion of #28287, rebased past Tensor-Variable
merge.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18504934
Pulled By: ezyang
fbshipit-source-id: be7adf45b637daffe2b0b1631eb31d967525fc31
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29665
Our intention is to merge the static distinction between Tensor and
Variable. Ordinarily, this would entail merging the methods of Tensor
and Variable. But there are a lot of "private"-ish methods on Variable
that we don't actually want to dump onto the Tensor class. So, as prep
work, we move all of those methods off of Variable and into
the torch::autograd::impl namespace (impl as in, please don't use this
end users). This ends up being a fairly large patch because all of
the call sites have to play ball too.
While I was on the topic, I also moved any of the touched functions into
the C++ file, so that modifying them would not trigger a recompilation of
all of torch.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18496169
Pulled By: ezyang
fbshipit-source-id: afb203252620ec274be596b3e7b1d84d321bad3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29762
Rename this API as discussed, since its use cases extend beyond
model parallelism.
ghstack-source-id: 94020627
Test Plan: Unit tests pass
Differential Revision: D18491743
fbshipit-source-id: d07676bb14f072c64da0ce99ee818bcc582efc57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27979
Adds memory_format keyword argument (positional for cpp).
'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) Otherwise, if the tensor is stored in the channels-last format, the output tensor will also be in channels-last format.
3) In all other cases, the output tensor will be contiguous.
---
A dense tensor stores its values in a single contiguous block of memory.
A non-overlapping tensor is one in which each element occupies its own, non-repeated memory location.
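For illustration, rule (1) with `clone` as the memory_format-taking op (a sketch; which operators accept a memory_format argument follows their own signatures):
```
import torch

x = torch.randn(2, 3, 4, 4).contiguous(memory_format=torch.channels_last)
y = x.clone(memory_format=torch.preserve_format)
assert y.stride() == x.stride()          # rule (1): dense input keeps its exact strides
assert y.is_contiguous(memory_format=torch.channels_last)

t = torch.randn(4, 5).t()                # transposed, but still non-overlapping and dense
u = t.clone(memory_format=torch.preserve_format)
assert u.stride() == t.stride()          # rule (1) again
```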
Test Plan: Imported from OSS
Differential Revision: D17980311
Pulled By: VitalyFedyunin
fbshipit-source-id: 12d013521091fcc9c045833577f6dc78d7b1e68f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29970
Add operators and JMP instruction used in PyText model in lite interpreter.
Test Plan: Imported from OSS
Differential Revision: D18555483
fbshipit-source-id: e5124d908762f78fb548505aecf33be8c8503275
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29960
An overload name is required for mobile operators that share a name but have different schemas. Since it's not used in JIT, it's safe to add overload names to JIT operators.
Test Plan: Imported from OSS
Differential Revision: D18555484
fbshipit-source-id: b451379af24e255d8b0c61b964ae32fd1a64ed34
Summary:
Hi yf225, I have added **NLLLoss and CrossEntropyLoss**.
Also, while using log_softmax in cross_entropy_loss, I am getting an error:
```
../caffe2/../torch/csrc/api/include/torch/nn/functional/loss.h:537:63: error: no matching function for call to ‘log_softmax(const at::Tensor&)’
const Tensor& log_softmax_input = torch::log_softmax(input);
aten/src/ATen/Functions.h:5551:22: note: candidate: at::Tensor at::log_softmax(const at::Tensor&, int64_t, c10::optional<c10::ScalarType>)
static inline Tensor log_softmax(const Tensor & self, int64_t dim, c10::optional<ScalarType> dtype) {
                     ^~~~~~~~~~~
aten/src/ATen/Functions.h:5551:22: note: candidate expects 3 arguments, 1 provided
```
I think the other two parameters should be optional, as in the Python frontend (see the documentation at https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.log_softmax). Otherwise, the build completed without errors and the tests pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29812
Differential Revision: D18548249
Pulled By: yf225
fbshipit-source-id: 2ab350abd2a6f498d4dba2345f51ad87471f3038
Summary:
Since torchvision is not using input_channels / output_channels / with_bias in ConvOptions anymore (https://github.com/pytorch/vision/pull/1576), we can remove the bridges now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29838
Differential Revision: D18531481
Pulled By: yf225
fbshipit-source-id: e48d9e8cf110095f83d9ed18b9fec020ec725f3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29914
Currently we're visiting all submodules every time we're visiting a
method of a module.
Test Plan: Imported from OSS
Differential Revision: D18534602
Pulled By: ZolotukhinM
fbshipit-source-id: 38c5b0ab0bdd27599fd0a6af0eaa3603c68a97a8
Summary:
Changelog:
- Expose is_signed for torch.dtype by modifying torch/csrc/Dtype.cpp
- Allow half, bfloat16 and bool to also be "known" by the isSignedType function
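A quick illustration of the exposed attribute (the printed values follow from each dtype's signedness):
```
import torch

print(torch.int8.is_signed)      # True
print(torch.uint8.is_signed)     # False
print(torch.half.is_signed)      # True  (half is now "known" to isSignedType)
print(torch.bfloat16.is_signed)  # True
print(torch.bool.is_signed)      # False
```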
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29511
Test Plan:
- Add tests in test/test_torch.py
Closes https://github.com/pytorch/pytorch/issues/29475
Differential Revision: D18439030
Pulled By: albanD
fbshipit-source-id: 4b1f9da70c1c8dfd0a5bc028b6936acd1c64af47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29839
std::to_string isn't reliably available on Android. Use c10::to_string
instead in some more files that we want to add to some Android builds.
Test Plan: CI
Reviewed By: linbinyu
Differential Revision: D18509295
fbshipit-source-id: 678af1abbea05777310499634ab01afbe21134d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29781
Even though the request might not contain any requires_grad tensor,
the return value could. Therefore, we should always include the
autograd context id in the request.
Closes #28819
Test Plan: Imported from OSS
Differential Revision: D18496709
Pulled By: mrshenli
fbshipit-source-id: 2f870c410291a1300952895b7488ea07e5574228
Summary:
This PR adds `reset_parameters` to the torch::nn modules whose Python version also has `reset_parameters` defined, so that there is better parity between Python and C++ version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29832
Differential Revision: D18515939
Pulled By: yf225
fbshipit-source-id: 5aa23e5c7ce1026787c04ffeb6c7f167620dd491
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29765
Instead of wrapping this C++ function with Python, which causes
unnecessary overhead, we can move it to pybind and use the `DefaultRpcAgent`
to get the timeout.
ghstack-source-id: 93879236
Test Plan: unit tests pass
Differential Revision: D18493195
fbshipit-source-id: fd0f1f13ee15acb5ea1ae7c696925c9b54304f6d
Summary:
Fix for https://github.com/pytorch/pytorch/issues/21545
We were silently giving wrong semantics previously:
Python behavior:
```
def test(x=[]):
x.append(1)
return len(x)
print(test()) # 1
print(test()) # 2
```
By checking at the python layer, we prevent any new models from serializing this behavior but do not break existing serialized models.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29833
Differential Revision: D18513168
Pulled By: eellison
fbshipit-source-id: 6fe73f28e1f9d39dedeaf67a04718089d14401a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29696
The paths distributed/autograd/context/dist_autograd_context.h and
distributed/autograd/context/dist_autograd_container.h were repetitive.
Therefore renaming these to distributed/autograd/context/context.h and
distributed/autograd/context/container.h
ghstack-source-id: 93850266
Test Plan: waitforbuildbot
Differential Revision: D18467624
fbshipit-source-id: bbf3905396f553006851af296c880c1bd106ec47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29653
I didn't remove is_variable from Tensor for BC reasons, but I did
remove as many uses as I could from the codebase.
at::impl::variable_excluded_from_dispatch got moved to TensorBody.h
so that it's more widely accessible.
This diff is NOT semantics preserving. Here are the major differences:
- In a number of native operator implementations, we tested that arguments
are not variable. I replaced these with asserts that variable is
excluded from dispatch. I actually don't think these asserts are really
necessary now (they should certainly be true, but it's hard to get
it wrong), but I've kept them for old time's sake. At least, they'll detect
if you call these functions before you've processed variable (indicating
a bug in your kernel.)
- There are a number of places where we do a per-tensor test for being a
variable, for better error reporting when someone commits Tensor/Variable
confusion. Although these tests are substantively the same as the
tests above, in these cases I decided to *delete* the test entirely.
The reasoning is that in these cases, we didn't really care about
dispatch (also, see above; I'm not too sure we really need the dispatch
asserts), we cared about Tensor/Variable confusion. Since Tensor/Variable
confusion is impossible now, we don't need the tests. One of the key
factors which pushed me one way or another was whether or not a function
was doing per-tensor validation; if I kept the assert in such functions,
I'd repeatedly access the TLS. Even if we want to bring back the asserts,
they would have to go somewhere else.
Another similar idiom is the number of places we do !x.defined() ||
x.is_variable(); I treated this equivalently.
- nuclear_norm's computation of compute_uv is a bit weird, but I think
it's OK to just delete the is_variable case (I *suspect* that it is
always the case that self.is_variable(), but it doesn't really matter.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18496168
Pulled By: ezyang
fbshipit-source-id: 5a1ded931e0c10a6b758ba64a8380d34110e0c3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29634
This implementation supports rpc.remote to self by doing the
following steps:
1. create an owner RRef
2. add the owner RRef to owners_ in RRefContext, and keep it alive
by using RRefId as the ForkId.
3. Go through serde and insert the message into the caller's thread pool.
4. When the response message gets processed, remove the RRef itself from the
RRef fork map.
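A small usage sketch of what this enables (assumes `rpc.init_rpc("worker0", rank=0, world_size=1)` has already been called):
```
import torch
import torch.distributed.rpc as rpc

self_name = rpc.get_worker_info().name
rref = rpc.remote(self_name, torch.add, args=(torch.ones(2), torch.ones(2)))
assert rref.is_owner()        # the caller owns the RRef it just created
print(rref.to_here())         # tensor([2., 2.])
print(rref.local_value())     # the owner can also read the value directly
```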
Test Plan: Imported from OSS
Differential Revision: D18445812
Pulled By: mrshenli
fbshipit-source-id: e3b9aa98962c388acbc2ce294101a236d5cb2da6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29673
Following https://github.com/pytorch/pytorch/pull/29364 and https://github.com/pytorch/pytorch/pull/29404, this PR makes `F::EmbeddingFuncOptions` and `F::EmbeddingBagFuncOptions` separate classes from `torch::nn::EmbeddingOptions` and `torch::nn::EmbeddingBagOptions`, so that it's easier to enforce that arguments such as `num_embeddings` and `embedding_dim` are required for `torch::nn::EmbeddingOptions` and `torch::nn::EmbeddingBagOptions`.
Test Plan: Imported from OSS
Differential Revision: D18462540
Pulled By: yf225
fbshipit-source-id: f2abf431e48675b0a9d7f6f398cdb90ff9037c35
Summary:
Uses the new overload mechanism for RNNs, making it so that Python and TorchScript go through the same path, with an API in line with the one specified
in https://docs.python.org/3/library/typing.html#typing.overload
This brings the TorchScriptable RNNs closer to the base implementation; unifying them should be done in a follow-up PR, but there are still a few limitations that make it difficult to do so.
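A sketch of the pattern (simplified, made-up signatures; the TorchScript-side decorator mirrors `typing.overload`, which is what is shown here):
```
from typing import Optional, Tuple, overload

from torch import Tensor
from torch.nn.utils.rnn import PackedSequence

class RNNLike:
    @overload
    def forward(self, input: Tensor,
                hx: Optional[Tensor] = None) -> Tuple[Tensor, Tensor]: ...

    @overload
    def forward(self, input: PackedSequence,
                hx: Optional[Tensor] = None) -> Tuple[PackedSequence, Tensor]: ...

    def forward(self, input, hx=None):
        # Single runtime implementation; dispatch happens on the input type.
        raise NotImplementedError
```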
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29614
Differential Revision: D18486982
Pulled By: eellison
fbshipit-source-id: aaaea66a4a7f12d2e46199ca254f9e8f7475500e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29632
This PR is BC-breaking in the following way:
Previously, C++ `torch::tensor` with a floating-point literal with no suffix (e.g. `torch::tensor(1.1)`) or a (nested) braced-init-list of
floating-point literals with no suffix (e.g. `torch::tensor({{1.1, 2.2}})`) produced a tensor with dtype `at::kDouble`. After this PR, it produces a tensor with dtype `torch::get_default_dtype()`, matching Python `torch.tensor` behavior.
Test Plan: Imported from OSS
Differential Revision: D18465819
Pulled By: yf225
fbshipit-source-id: 6834fe50335c677bc3832f2a5e9cf8d1ede9f665
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29605
Adds a wrapper around the existing createException function that
allows passing an error string instead of a regular C++ exception. This
allows us to create exceptions for errors that aren't necessarily C++
exceptions. This function is used by
https://github.com/pytorch/pytorch/pull/29601 and
https://github.com/pytorch/pytorch/pull/26336.
ghstack-source-id: 93819039
Test Plan: Unit tests pass
Differential Revision: D18439216
fbshipit-source-id: 70b6a2e4f107304e322cdd2630847ad0071bc0c1
Summary:
This PR changes the implementation of C++ Conv{1,2,3}d layers to exactly match the Python version, and add F::conv{1,2,3}d functionals. For more thorough testing, I will rely on the parity test mechanism which uses values from `common_nn.py` to generate the inputs and options that we are interested in testing.
This PR is BC-breaking in the following way:
In `Conv{1,2,3}dOptions`:
- `with_bias` is renamed to `bias`.
- `input_channels` is renamed to `in_channels`.
- `output_channels` is renamed to `out_channels`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28917
Differential Revision: D18471526
Pulled By: yf225
fbshipit-source-id: 7a33f60654ad93cc2e043245e7ff9e0ef9da15b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29213
A trivial use of make_variable is one where requires_grad=False. This
transformation is not technically semantics preserving, as make_variable
will create a shallow copy of the tensor in question; however, I
am guessing that we have the invariant that we don't actually make
use of this shallow copy in a nontrivial way.
There were some cases where the surrounding code expected a Variable proper
to be returned; I retained those sites.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18353503
Pulled By: ezyang
fbshipit-source-id: 57fe34d82e009c0cc852266fb0b79d6d9c62bb03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29203
There is no more Variable/Tensor distinction, so fix the misleading name.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18353505
Pulled By: ezyang
fbshipit-source-id: dadc394d533ab7746f70bc186c6645441a784518