Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66554
In native_functions.yaml, the schemas for batch_norm and instance_norm
are incorrect: the inputs `running_mean` and `running_var` are mutated,
but are not marked as such in the function schema. Since `(a!)?`
annotations are currently not working (see #65760), this instead adds a
special case to `alias_analysis.cpp`. If the value of `training` or
`use_input_stats` is known to be `false`, then `alias_analysis` will
mark the input as _not_ being written to.
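The mutation in question can be sketched as follows. This is an illustrative Python model of batch norm's running-stats update, not the actual ATen implementation; it shows why `running_mean`/`running_var` are written only when `training` is true, which is the fact the alias-analysis special case exploits:

```python
def batch_norm_stats(x, running_mean, running_var, training, momentum=0.1):
    """Illustrative sketch: running stats are mutated in place only in
    training mode. Stats are stored in single-element lists so the
    in-place write is visible. Hypothetical helper, not ATen's code."""
    if training:
        n = len(x)
        batch_mean = sum(x) / n
        batch_var = sum((v - batch_mean) ** 2 for v in x) / n
        # These in-place writes are what the (a!)? schema annotation
        # would have expressed, had it been supported.
        running_mean[0] = (1 - momentum) * running_mean[0] + momentum * batch_mean
        running_var[0] = (1 - momentum) * running_var[0] + momentum * batch_var
    # In eval mode the running stats are only read, never written.
    return running_mean[0], running_var[0]
```

When `training` is statically known to be `false`, the running-stats arguments are read-only, so alias analysis can safely skip the write annotation.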
Test Plan:
Removed the `skip` annotation on the following test, and added a special
exception in `check_alias_annotations`:
```
python test/test_ops.py -k test_variant_consistency_jit_nn_functional_batch_norm
```
Also:
```
./build/bin/test_jit --gtest_filter="*BatchAndInstanceNormFixture*"
```
Imported from OSS
Reviewed By: eellison
Differential Revision: D31612339
fbshipit-source-id: 12ca61b782b9e41e06883ba080a276209dc435bb
Summary:
This saves the cost of copying text from the stack to the heap in some cases (such as
parsing function schemas during the loading phase of libtorch.so).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65309
Reviewed By: swolchok
Differential Revision: D31060315
Pulled By: gmagogsfm
fbshipit-source-id: 0caf7a688b40df52bb4388c5191d1a42351d6f1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66744
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable suppression warnings added by hand.
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D31705358
fbshipit-source-id: d6ea350cbaa8f452fc78f238160e5374be637a48
Summary:
- Adds Node base class and unit tests
- Also adds metadata utils to enable source code annotation and scope tracking
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66601
Test Plan: Add new unit tests
Reviewed By: desertfire
Differential Revision: D31634044
fbshipit-source-id: a042d54f06fbc480acfc63c18d43cb6fceb6fea5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable suppression warnings added by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066
I noticed a bunch of time being spent heap-allocating Tuples
in the unpickler. 1-, 2-, and 3-element Tuples are apparently common
enough that they get their own bytecode instructions, so I decided to
try also giving them their own representation. We store up to 3
IValues inline in `Tuple` rather than doing a second heap allocation
for a `std::vector<IValue>`.
ghstack-source-id: 140695395
Test Plan:
Added automated tests for TupleElements.
Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422
We went from 347 ms to 302 ms.
Reviewed By: dhruvbird
Differential Revision: D30592622
fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66393
Third try!
Fixes:
- `test_nccl_timeout` can be flaky because of its 1s timeout; bump up the timeout to resolve the flakiness. In general we should not have been relying on `time.sleep` for this test; filed https://github.com/pytorch/pytorch/issues/66354 to track that.
- ciflow/all did not actually run tests due to a bug causing multigpu tests to not be run. This has since been fixed.
ghstack-source-id: 140560113
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D31534735
fbshipit-source-id: 8b7e0f4fed3972b7a77cbcda28876c9eefb0c7e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63878
See https://github.com/pytorch/pytorch/issues/64407, https://github.com/pytorch/pytorch/issues/62032 for context:
In this PR:
- Adds a boxed kernel by replicating `gen_inplace_or_view`'s logic; it is ONLY for use with the Autograd not-implemented kernel
- Unlike `gen_inplace_or_view`, we always pass a `view_func` to `as_view` in order to ensure that a "derivative is not implemented" error is raised even if an in-place update is performed on the view. Without the `view_func`, the CopySlice + AsStridedBackward nodes would replace the NotImplemented node.
- This limitation makes this node unusable for general purposes
- the view relationship must be between the first input (must be a tensor) and the first output (may be a tensor or a vector of tensors)
- non-differentiable views (`_values`, `_indices`, `view.dtype`) are not supported; the view relationship is always forward- and backward-differentiable
- Adds the macro `#define REGISTER_AUTOGRAD_NOT_IMPLEMENTED_FALLBACK(ns, op)` as the interface for this feature:
- static initialization may be slowed down (not measured) if there are many registrations, because each line translates to two library calls; the workaround is to manually call `m.impl` with the two functions `AutogradNotImplementedFallback` and `ADInplaceOrViewFallback`.
- Adds testing:
- for views: view relationship created
- performing an in-place operation on the view raises properly
- trying to create two view relationships is not allowed
- a single view relationship that is not between the first input and first output should error
- view relation created properly for tensor vector output
- for in-place:
- version counter bump
- triggers rebase_history
- multiple mutations are okay and also update the version counter
- TODO (follow up): Update tutorials for adding third-party operators (and document the above limitations)
- TODO (follow up): Look at torch-audio/torch-vision and identify places where this can simplify existing code
EDIT: Made it clearer what is introduced in this PR and moved some more contextual material into the issue itself
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30901714
Pulled By: soulitzer
fbshipit-source-id: 48de14c28be023ff4bd31b7ea5e7cba88aeee04c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66421
Original commit changeset: ab6bb8fe4e83
Plus this includes the BUILD.bazel changes, the reason for the revert.
Test Plan: See original diff
Reviewed By: gdankel
Differential Revision: D31542513
fbshipit-source-id: ee30aca2d6705638f97e04b77a9ae31fe5cc4ebb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66242
While working on random test generation, I observed that many simple transformations were upsetting vectorization. Digging deeper, I found that it calls `SplitWithTail`, which incorrectly splits the loop when the loop start is not zero. This patch normalizes the loop before we start splitting it.
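The normalization-then-split idea can be sketched like this. This is an illustrative Python model of the transformation, not NNC's actual C++ implementation; `split_with_tail` is a hypothetical name for the sketch:

```python
def split_with_tail(start, stop, factor):
    """Sketch: split a loop over [start, stop) into factor-sized chunks
    plus a tail loop. The loop is first normalized to a zero-based trip
    count so the split arithmetic stays correct when `start` != 0."""
    n = stop - start          # normalized trip count
    main_iters = n // factor  # number of full chunks
    tail = n % factor         # leftover iterations
    visited = []
    for outer in range(main_iters):
        for inner in range(factor):
            # Denormalize the index back to the original iteration space.
            visited.append(start + outer * factor + inner)
    for inner in range(tail):
        visited.append(start + main_iters * factor + inner)
    return visited
```

Splitting without the `n = stop - start` normalization, i.e. computing chunk counts directly from `stop`, visits the wrong index range whenever `start` is nonzero, which is the bug this fixes.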
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D31506853
Pulled By: anijain2305
fbshipit-source-id: 5c5f2568ce0a239bfaa515458be52541eafd23b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445
PyTorch currently uses the old style of compiling CUDA in CMake which is just a
bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as
a language just like C++ or C.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31503350
fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64397
This diff exposes a way to add events to the Kineto profiler from an
external source.
This can be a backend that executes a subgraph and wants to record this
execution in the Kineto profiler.
This diff also adds "backend" metadata to identify the backend an event
would have executed on.
Test Plan:
test_lite_interpreter
Imported from OSS
Reviewed By: raziel
Differential Revision: D30710710
fbshipit-source-id: 51399f9b0b647bc2d0076074ad4ea9286d0ef3e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345
FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership.
ghstack-source-id: 140044165
Test Plan:
CI
perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.
Reviewed By: hlu1
Differential Revision: D31027361
fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
Summary:
These utils are prerequisites for Lazy Node base class.
- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary
Fixes https://github.com/pytorch/pytorch/issues/65636
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66181
Original commit changeset: 3d0d5377d71e
Test Plan:
Run PyTorch XLA corresponding PR in XLA CI:
https://github.com/pytorch/xla/pull/3148/files
Reviewed By: suo
Differential Revision: D31416438
fbshipit-source-id: 58a6a49c5bc30134bc6bae2e42778f359b9a8f40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63129
1. Add an API to get `supported_types` from the runtime, exposed in C++ only.
2. Add an API to get `contained_types` from the model, exposed in both C++ and Python.
3. Add a field `contained_types_` in `type_parser.cpp` to track the contained types when parsing the Python string.
4. Expand the `is_compatible` API to check types: it checks the contained type list from the model against the supported type list from the runtime.
5. Expand the unit test for compatibility to cover types.
6. Add a unit test in Python to check the type list.
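The core of the type check in item 4 amounts to set containment. A minimal sketch (hypothetical helper name; the real APIs live in the lite interpreter's C++ compatibility layer):

```python
def is_compatible(model_contained_types, runtime_supported_types):
    """Sketch: a model is type-compatible with a runtime iff every type
    contained in the model is in the runtime's supported set. Returns
    the verdict plus any missing types, for a useful error message."""
    missing = set(model_contained_types) - set(runtime_supported_types)
    return len(missing) == 0, sorted(missing)
```

Reporting the missing types, rather than a bare boolean, is what lets the compatibility error name exactly which types the runtime lacks.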
ghstack-source-id: 139826944
Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.GetContainTypes'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleSuccess'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail'
buck test //caffe2/test:mobile
```
Reviewed By: iseeyuan
Differential Revision: D30231419
fbshipit-source-id: 8427f423ec28cc5de56411f15fd960d8595d6947
Summary:
These utils are prerequisites for Lazy Node base class.
- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary
Fixes https://github.com/pytorch/pytorch/issues/65636
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65635
Reviewed By: alanwaketan
Differential Revision: D31260343
Pulled By: wconstab
fbshipit-source-id: 8bb1194188e3e77fc42e08a14ba37faed37a9c2e
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`
Delete number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variables in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants
Do not delete `caffe2::OperatorBase::Output` calls as they have side effects
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041
Reviewed By: ngimel
Differential Revision: D31360142
Pulled By: malfet
fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66025
This change adds an option to selectively enable precise alias analysis for `prim::TupleConstruct` (introduced by D30437737 (cd458fe092)), limiting its exposure to `StaticRuntime` for now.
Test Plan: Modified existing unit tests whose behavior depends on D30437737 (cd458fe092).
Reviewed By: eellison
Differential Revision: D31350285
fbshipit-source-id: 3ce777f07f99650d74634481ad0805192dce55c6
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`
Delete number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variables in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954
Reviewed By: ngimel
Differential Revision: D31326599
Pulled By: malfet
fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3
Summary:
Description:
- Only added `stdout` and `stderr` as possible options from the Python
API for now; file-path passing can be added later.
- Put the class `JitLoggingConfig` in the cpp file, as none of its methods were being used outside of this file.
Python API:
`torch._C._jit_set_logging_stream('stdout|stderr')`
C++ API:
`::torch::jit::set_jit_logging_output_stream(ostream);`
Testing:
- Tested python API locally.
- Unit test for the C++ API is written
Fixes https://github.com/pytorch/pytorch/issues/54182
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65768
Reviewed By: mrshenli
Differential Revision: D31291739
Pulled By: ZolotukhinM
fbshipit-source-id: eee72edc20488efad78a01c5b0ed8a132886a08d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65861
First in a series. This PR changes the code in deploy.h/cpp and
interpreter_impl.h/cpp to be camel case instead of snake case. Starting
with this as it has the most impact on downstream users.
Test Plan: Imported from OSS
Reviewed By: shannonzhu
Differential Revision: D31291183
Pulled By: suo
fbshipit-source-id: ba6f74042947c9a08fb9cb3ad7276d8dbb5b2934
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65551
Previously we had a big switch on Op kind to decide how to lower a given
JIT operator to NNC. This PR changes this switch to a hash table lookup.
Why? This helps us with at least two things:
1) With this approach we can easily check if we know how to handle a
given node in advance - i.e. we can inspect the entire graph and tell
whether it's possible to compile it or not without actually trying to do
that and dying in the middle. This would allow us to, say, provide
user-friendly error messages in AOT workflow.
2) We can switch to using the schema instead of the op kind to determine
the correct lowering. Unlike the op schema, the op kind might be ambiguous
(see e.g. #64963), and using it instead of the schema can lead to bugs.
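Both benefits fall out of a table keyed by the full schema. A minimal sketch (illustrative only; NNC's real lowering registry is C++, and these lambdas stand in for actual lowerings):

```python
# Lowerings keyed by the full operator schema rather than the op kind.
# Keying by schema disambiguates overloads that share a kind, and makes
# "can we compile this graph?" a membership test done before lowering.
LOWERINGS = {
    "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor":
        lambda a, b, alpha=1: a + alpha * b,
    "aten::mul.Tensor(Tensor self, Tensor other) -> Tensor":
        lambda a, b: a * b,
}

def can_compile(graph_schemas):
    """Inspect the whole graph up front instead of dying mid-lowering."""
    return all(schema in LOWERINGS for schema in graph_schemas)

def lower(schema, *args):
    return LOWERINGS[schema](*args)
```

With a `switch` on op kind, the only way to learn that a graph is uncompilable is to try lowering it and fail partway through; the table lets an AOT workflow reject the graph with a friendly error first.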
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31148926
Pulled By: ZolotukhinM
fbshipit-source-id: ac12684e2126c899426ef5e4cc1e3f70fa01f704
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64879
This change makes the output of `prim::TupleConstruct` alias only its inputs *when* the created tuple is directly returned from the graph.
The same treatment could be applied to any tuple newly constructed by `prim::TupleConstruct` whose elements do not escape. However, this change focuses on only the simplest, but frequently used, case: tuples constructed solely to be returned from a graph.
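The rule can be stated as a tiny decision function. This is a hypothetical sketch of the policy, not the actual code in the JIT's alias analysis:

```python
def tuple_output_aliases(tuple_inputs, directly_returned):
    """Sketch: when the constructed tuple is directly returned from the
    graph, its output aliases exactly its inputs; otherwise it falls
    back to the conservative wildcard set ("may alias anything")."""
    if directly_returned:
        return set(tuple_inputs)
    return {"*"}  # wildcard alias set
```

The precise case is what lets downstream passes reason about the returned tuple's elements without assuming they may alias arbitrary graph values.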
Test Plan:
Added
- `AliasMoveForTupleConstructWithSingleUseAsGraphOutput`
- `WildcardAliasForTupleConstructWithUses`
to cover the newly added code.
Reviewed By: eellison
Differential Revision: D30437737
fbshipit-source-id: 417fbc6bc348062e60e7acdddd340d4754d090eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65173
Initializes dummy NCCL communicators in the constructor as a basic health
check that communicators can be initialized prior to launching the first
collective.
After successful init, we immediately use `ncclCommAbort` to destroy these
communicators to ensure they don't interfere with regular communicator creation
during collectives.
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D31005792
fbshipit-source-id: c2c582dee25a098361ead6ef03f541e7833c606b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64824
See comment in function_schema.h for explanation. I claim that this is a good tradeoff because the aliasing information seems to be used only in compiler-ish code paths, where performance isn't as critical as actual execution. If performance is important there too, perhaps we should hoist isWrite into the Argument itself since there are several paths that only care about isWrite.
ghstack-source-id: 138958896
Test Plan: CI, profile schema parsing on startup and see much fewer page faults in createArgumentVector.
Reviewed By: suo
Differential Revision: D30860719
fbshipit-source-id: 1d4d2328f2b8e34f5ddf9d82083fd4dd7b7f738f
Summary:
1. Enable support for operators with both default args and out args. For `torch.add(x, h, out=x)`, the number of specified arguments will be 3 instead of 4.
2. Bump the bytecode version from 6 to 7.
3. Implement the backport_v7_to_v6 function. Also slightly refactor the local_thread to allow re-emitting operators.
4. Add a unit test to cover the backport function.
5. Update the expected result from 4 to 3 in unit test `DefaultArgsWithOutArg` to cover the number of specified arguments.
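The counting rule in item 1 can be sketched as follows. This is an illustrative Python model with a hypothetical helper name; the real logic lives in the mobile bytecode emitter:

```python
def num_specified_args(arg_values, defaults, num_out_args):
    """Sketch: out arguments are always counted as specified; trailing
    non-out arguments whose value equals their schema default are
    dropped. For torch.add(x, h, out=x) with schema
    add(Tensor self, Tensor other, *, Scalar alpha=1, Tensor out) this
    yields 3 specified arguments instead of 4, since alpha keeps its
    default. `defaults` uses None for required (defaultless) args."""
    non_out = arg_values[:len(arg_values) - num_out_args]
    specified = len(non_out)
    # Walk backwards, dropping trailing args equal to their default.
    while specified > 0 and non_out[specified - 1] == defaults[specified - 1]:
        specified -= 1
    return specified + num_out_args
```

Only trailing defaults can be dropped because the bytecode reconstructs omitted arguments positionally from the schema.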
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63651
ghstack-source-id: 138539912
Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions
```
Reviewed By: raziel, tugsbayasgalan
Differential Revision: D30454080
fbshipit-source-id: 357c50b96682430675142d20d688d1f64e1de307
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179
This follows up on PR https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into the build.
Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`.
Reviewed By: iseeyuan
Differential Revision: D31006555
fbshipit-source-id: c38e221800af4cf72963a353c452c5437f56a0ac