Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19607
Explicit is better than implicit - it's pretty hard to debug where a particular file is if it's not greppable.
As a follow-up step we should look at whether we can just include build_variables.py in CMake directly to share the setup between the two build systems.
Reviewed By: ezyang
Differential Revision: D15023348
fbshipit-source-id: 600ef2d1871bc28530c6a02681b284f7499904df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19282
This is largely a hack: we need to use the function schema parser from ATen/core,
but we aren't yet clear on what the final software architecture should look like.
- Add function schema parser files from jit to ATen/core build target.
- Also move ATen/core build target one directory up to allow this.
We only change the build targets and don't move the files yet because this is likely
not the final build setup and we want to avoid repeated interruptions
for other developers. cc zdevito
Reviewed By: dzhulgakov
Differential Revision: D14931922
fbshipit-source-id: 26462e2e7aec9e0964706138edd3d87a83b964e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19281
String<->Number conversions aren't available in the STL used in our Android environment.
This diff adds workarounds so that the function schema parser can be compiled for Android.
Reviewed By: dzhulgakov
Differential Revision: D14931649
fbshipit-source-id: d5d386f2c474d3742ed89e52dff751513142efad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19280
We want to use the function schema parser from ATen/core, but with as few dependencies as possible.
This diff moves the function schema parser into its own file and removes some of its dependencies.
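For context, a minimal sketch of what the parser does, exercised from Python (assuming the parser is exposed through a binding like `torch._C.parse_schema`; that binding name is an assumption, not something introduced by this diff):
```python
import torch

# Parse a function schema string into a structured schema object.
schema = torch._C.parse_schema(
    "aten::add(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor")
print(schema.name, [arg.name for arg in schema.arguments])
```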
Reviewed By: dzhulgakov
Differential Revision: D14931651
fbshipit-source-id: c2d787202795ff034da8cba255b9f007e69b4aea
Summary:
This PR propagates the use of first-class module objects into the compiler. This creates a transitional state where:
* compiler.cpp creates Graphs where `self` is a Module class and attributes/parameters/buffers/submodules are looked up with `prim::GetAttr`
* GraphExecutor still runs "lowered graphs" where the self object has been removed by a compiler pass `lower_first_class_method`.
* Tracing still creates "lowered graphs", and a pass "lift_lowered_method" creates a first-class method graph for things.
* This PR separates out Method and Function. A script::Function is a pure Graph with no `self` bound. Similar to Python, a script::Method is just a bound `self` and its underlying `script::Function`.
* This PR also separates CompilationUnit from Module. A CompilationUnit is just a list of named script::Functions. Classes have a CompilationUnit holding the class methods, and Modules also have a CompilationUnit holding their Methods. This avoids the weird circular case Module --has a-> Class -> has a -> Module ...
Details:
* In this transitional state, we maintain two copies of a Graph, first-class module and lowered. The first-class one has a `self` argument whose type is the module's class type. The lowered one is the lowered graph that uses the initial_ivalues inputs.
* When defining lowered methods using `_defined_lowered` we immediately create the first-class equivalent. The reverse is done lazily, creating lowered_methods on demand from the class.
* The two-way conversions will be deleted in a future PR when the executor itself runs first-class objects. However, this requires more changes to (1) tracing, (2) the Python bindings, and (3) the ONNX export pass, and would make this PR way too large.
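As a rough illustration of the two forms (a toy module using the `torch.jit.ScriptModule`/`script_method` API; the module itself is hypothetical, not taken from the PR):
```python
import torch

class Scale(torch.jit.ScriptModule):
    def __init__(self):
        super(Scale, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(3, 3))

    @torch.jit.script_method
    def forward(self, x):
        # In the first-class graph this access compiles to prim::GetAttr on
        # the `self` argument; in the lowered graph the weight instead shows
        # up as an extra graph input.
        return torch.mm(x, self.weight)

m = Scale()
print(m.graph)  # inspect which of the two forms you are looking at
```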
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19167
Differential Revision: D14891966
Pulled By: zdevito
fbshipit-source-id: 0b5f03118aa65448a15c7a7818e64089ec93d7ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18314
ghimport-source-id: 8cecb768d476ab19c9460f39c8f94a764e4cb052
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18314 Add ability to specialize class types to ArgumentSpec**
* #18226 Add Slot type to abstract the raw pointers being used for slots.
Differential Revision: D14574395
fbshipit-source-id: cc3af6e56e9ae52990f4a1ad56ecceaa2d493577
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18763
Without the `link_whole` flag, in opt builds some of the files are not linked into the `_C_impl` library, which causes some static initializers not to run (namely, registering a custom Python operation from python_interpreter.cpp). This diff fixes it.
Differential Revision: D14732471
fbshipit-source-id: 57cff6b4b6d479ad7ab7fd29f677746d91d6ff45
Summary:
This commit adds the `c10d::Reducer` class that hooks into autograd
and performs gradient bucketing and reduction. These are the core
parts of `nn.parallel.DistributedDataParallel` that up to now were
only usable for CUDA models.
This should enable the following:
* Distributed data parallelism for models defined using the C++ frontend.
* Allow overlap of gradient computation and reduction for non-CUDA models.
* Enable distributed data parallelism for models with some unused parameters.
This does not include any logic for computing bucket assignment, which
can be done separately; either by observing autograd execution order
(this is what Apex does), or by assigning buckets based on some
maximum byte size, or both.
Also see #17757 and #13273.
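As a sketch of the maximum-byte-size strategy mentioned above (a standalone illustration; the function name and default cap are assumptions, not part of the c10d API):
```python
import torch

def assign_buckets(params, bucket_cap_bytes=1 << 20):
    """Greedily group parameters into buckets no larger than bucket_cap_bytes."""
    buckets, current, current_bytes = [], [], 0
    for p in params:
        nbytes = p.numel() * p.element_size()
        if current and current_bytes + nbytes > bucket_cap_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(p)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets

model = torch.nn.Linear(1024, 1024)
print([len(b) for b in assign_buckets(model.parameters())])
```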
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18251
Reviewed By: mrshenli
Differential Revision: D14571899
Pulled By: pietern
fbshipit-source-id: 20f95eefd288dfe8cfffe0a28ca22fa7c9c3cd4c
Summary:
This defines a generic counters API that users can utilize to provide monitoring functionality in e.g. a production service. We expose both counters for runtime internals as well as a TorchScript API to create user-defined counters. Synopsis of the API:
- `torch/csrc/jit/script/logging.h` specifies the externally-facing API in C++
- `torch/jit/_logging.py` specifies the Python API
We use an interface, `LoggerBase`, to define the interactions between users and a logging backend. Implementing a subclass of `LoggerBase` allows the user to handle these events in a custom way, such as logging into a DB or calling into an infra-specific counters API.
From the frontend perspective, we can create log events in two ways:
1. We provide an `add_stat_value(name, val)` function. This calls into the Logger backend with a key/value pair. For example, we might call `add_stat_value('foo', 1)` to bump an event counter.
2. We provide a `time_point()` function to record a timestamp in nanoseconds. This can be used in conjunction with `add_stat_value` to record runtime wall clock durations.
Examples of frontend usage can be found in `test_jit.py TestLogging`.
We provide a trivial `LockingLogger` implementation as an example and for testing purposes. It is likely not ready for production usage. It demonstrates that a backend implementing the API can do things like specify aggregation types and report these aggregate stats via the `get_counters()` API.
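A minimal usage sketch of the two frontend calls above (assuming the module is importable as `torch.jit._logging`, per the path listed above; the counter names and the workload are illustrative):
```python
import torch
from torch.jit import _logging  # Python-side API described above

def run_step(x):
    # stand-in workload, used only for illustration
    return (x * 2).sum()

start = _logging.time_point()                   # nanosecond timestamp
out = run_step(torch.ones(4))
_logging.add_stat_value("run_step.calls", 1)    # bump an event counter
_logging.add_stat_value("run_step.duration_ns",
                        _logging.time_point() - start)
```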
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18235
Differential Revision: D14545060
Pulled By: jamesr66a
fbshipit-source-id: 04099543a1898cfdd411511e46e03d5dce9b4881
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18267
Motivation: we don't actually want to use it for real under any circumstances. This is an idea to unblock our internal progress and parallelize workstreams. We can easily define schemas for all ops in question and implement forwarding to C2 ops which is NOT going to be performant. Then several things can be happening in parallel:
* move code of ops outside of C2 ops that depend on protobuf into c10
* development of optimization/fusion passes
* building python-level wrappers with clean API
* improving perf
This demonstrates Relu, quant, and dequant. It seems to cover all the necessary use cases (except maybe weight prepacking). Ideally I'd demonstrate Conv, but I'll get to it later in a separate PR (contributions welcome).
Reviewed By: ezyang
Differential Revision: D14531232
fbshipit-source-id: 4cd4a71ae0cb373c6c0e81f965c442b82a1b4069
Summary:
Allows serialization/loading of attributes (`IValue`s of any type).
* metadata (attribute name, type) is stored in the `model.json`
* The binary format is a subset of the `pickle` module that supports the operations necessary for `IValue`s
* Attributes are serialized in the order they are defined on a module to a list in a single `attributes` file, with submodule attributes coming first. This order directly matches the order attributes are listed in `model.json`
* This can be inspected in Python with `pickle.load()` or with `pickletools` (PyTorch need not be installed for this to work)
* A class is used to store a tensor's index into the tensor table of the model, so to unpickle the file you have to use a custom Unpickler:
```python
import pickle

class TensorID(object):
    # stores an index into the model's tensor table
    def __setstate__(self, id):
        self.id = id

class JitUnpickler(pickle.Unpickler):
    # map the TensorID placeholder back to our local class; defer everything
    # else to the default lookup
    def find_class(self, module, name):
        if module == '__main__' and name == 'TensorID':
            return TensorID
        return super(JitUnpickler, self).find_class(module, name)

JitUnpickler(open("my_model/attributes.pkl", "rb")).load()
```
* pickle format: https://svn.python.org/projects/python/trunk/Lib/pickletools.py
* It currently does not support/guarantee that anything saved out with `pickle` directly (i.e. if you edit `attributes` with `pickle` yourself instead of with our tools) will be imported correctly
Also will fix #17683 and fix #16367
Followup Work:
* document format / choice of pickle: #17951
* create an example
* list specializations
* int size specializations, large binputs
* do a first pass over attributes to output only necessary `BINPUT` ops
* attribute reassignment (e.g. `self.my_attribute = new_value`)
* `tensor.save("some_checkpoint.pkl")` support with tensors embedded in Pickle file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17423
Differential Revision: D14470965
Pulled By: driazati
fbshipit-source-id: 6a21a9939efdbe59b4bc57fd31d6d630bab5297e
Summary:
Stack:
* **#17856 [jit] support serialization of classes**
Add support for saving/loading TorchScript modules that depend on user-defined classes.
We track class dependencies the same way we track tensor constants, then write them
all out such that we can just compile them in order before compiling the module
hierarchy.
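Roughly what this enables (a hedged sketch; the class and module are made up for illustration and rely on the `torch.jit.script` class decorator described elsewhere in this log):
```python
import torch

@torch.jit.script
class Counter(object):
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1

class UsesCounter(torch.jit.ScriptModule):
    @torch.jit.script_method
    def forward(self, x):
        c = Counter()
        c.increment()
        return x + c.count

m = UsesCounter()
m.save("uses_counter.pt")  # the Counter class is serialized alongside the module
loaded = torch.jit.load("uses_counter.pt")
```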
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17856
Reviewed By: shannonzhu
Differential Revision: D14461599
Pulled By: suo
fbshipit-source-id: 7115f87e069fd00dc8381d7de9997864fef7ea9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17742
This path isn't used anymore, and is incompatible with the changes stacked on top of this diff.
Removing it.
cc bwasti to check and confirm these can really be deleted
Reviewed By: ezyang
Differential Revision: D14362426
fbshipit-source-id: 32cdc19f28c2a981ae1e204901420998367ee588
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17729
When doing "import torch" in fbcode, previously the caffe2 cuda kernels weren't loaded because libcaffe2_gpu.so wasn't loaded.
Once you also did "from caffe2.python import workspace", then the cuda kernels were loaded because that triggered a runtime mechanism for loading libcaffe2_gpu.so.
We want the cuda kernels to always be available, so this diff adds a dependency from caffe2:libtorch_cuda to caffe2:caffe2_gpu.
Reviewed By: ezyang
Differential Revision: D14353498
fbshipit-source-id: 76a9fe69f231b308ab40eac393bb216c6fad3658
Summary:
TH_Index_Base is hard-coded to 0 and can be removed from the codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17591
Differential Revision: D14269273
Pulled By: izdeby
fbshipit-source-id: d844e261f4af7297bad8a81e7d6dcf0a391b94e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17528
As titled. register_prim_ops is messy because someone ruined its clang-format formatting, but I figured it's okay to include it here since this is such a mechanical change.
Reviewed By: driazati
Differential Revision: D14236943
fbshipit-source-id: c2b22845837b7f830015510e48ec2ee5202fa407
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17594
The original version of this broke things because a concurrent change raced with it in CI.
Reviewed By: ezyang
Differential Revision: D14266663
fbshipit-source-id: e8ac5dfcb7349b4f2c425d9f0eabbfc964314063
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17511
AliasTracker was doing bookkeeping for three concepts: the points-to graph,
writes, and wildcards.
This PR makes AliasTracker's job clearer: it keeps track of the points-to
graph. Thus it has been renamed MemoryDAG. Write and wildcard tracking were
pulled back into AliasDb as part of this; I may decide to pull them into their
own little modules later, since I don't want the alias analysis stuff to get too
bloated.
This refactor is necessary because we want to start tracking information for
aliasing elements that _aren't_ first-class IR Values (e.g. the "stuff" inside
a list). So MemoryDAG can't know too much about Values.
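Conceptually (a toy Python sketch of a points-to structure, not the C++ MemoryDAG interface), the graph answers "may these two elements alias?" by checking whether anything is reachable from both:
```python
class PointsToGraph(object):
    """Toy points-to graph: each element may point to other elements."""

    def __init__(self):
        self.points_to = {}  # element -> set of elements it may point to

    def add_edge(self, frm, to):
        self.points_to.setdefault(frm, set()).add(to)
        self.points_to.setdefault(to, set())

    def _reachable(self, element):
        # an element may alias itself and everything it can point to, transitively
        seen, stack = set(), [element]
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            stack.extend(self.points_to.get(cur, ()))
        return seen

    def may_alias(self, a, b):
        return bool(self._reachable(a) & self._reachable(b))

g = PointsToGraph()
g.add_edge("list_element", "wildcard")
g.add_edge("tensor_view", "wildcard")
print(g.may_alias("list_element", "tensor_view"))  # True: both may reach the wildcard
```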
Reviewed By: houseroad
Differential Revision: D14231251
fbshipit-source-id: 6cd98ae6fced8d6c1522c2454da77c3c1b2b0504
Summary:
First pass at user-defined types. The following is contained in this PR:
- `UserType` type, which contains a reference to a module with all methods for the type, and a separate namespace for data attributes (map of name -> TypePtr).
- `UserTypeRegistry`, similar to the operator registry
- `UserObject` which is the runtime representation of the user type (just a map of names -> IValues)
- `UserTypeValue` SugaredValue, to manage getattr and setattr while generating IR, plus compiler.cpp changes to make that work.
- Frontend changes to get `torch.jit.script` to work as a class decorator (see the sketch after this list)
- `ClassDef` node in our AST.
- primitive ops for object creation, setattr, and getattr, plus alias analysis changes to make mutation safe.
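A hedged sketch of what the decorator frontend enables (an illustrative class and function, not taken from the PR's tests):
```python
import torch

@torch.jit.script
class Pair(object):
    def __init__(self, a, b):
        # data attributes; their types are inferred from these assignments
        self.a = a
        self.b = b

    def total(self):
        return self.a + self.b

@torch.jit.script
def use_pair(x, y):
    p = Pair(x, y)      # object creation
    p.a = p.a + 1       # setattr
    return p.total()    # getattr + method call
```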
Things that definitely need to get done:
- Import/export, python_print support
- String frontend doesn't understand class definitions yet
- Python interop (using a user-defined type outside TorchScript) is completely broken
- Static methods (without `self`) don't work
Things that are nice but not essential:
- Method definition order shouldn't matter (right now you can only reference a method that's already been defined)
- Class definitions can only contain defs; no other expressions are supported.
Things I definitely won't do initially:
- Polymorphism/inheritance
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17314
Differential Revision: D14194065
Pulled By: suo
fbshipit-source-id: c5434afdb9b39f84b7c85a9fdc2891f8250b5025
Summary:
Creates a new type parser shared between the IR parser and the schema parser.
Also adds parsing of CompleteTensorType and DimensionedTensorType, and feature-gates that for the IRParser.
Renames the existing type parser for Python annotations to python_type_parser and names the new one jit_type_parser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17383
Differential Revision: D14186438
Pulled By: eellison
fbshipit-source-id: bbd5e337917d8862c7c6fa0a0006efa101c76afe
Summary:
A lightweight implementation of the LLVM FileCheck utility. It currently only handles string matching; regexes and saving a regex to a variable name can be added as needed.
The current intended usage is through the FileCheckBuilder Python handle, as shown in the tests.
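A hedged usage sketch (the exact Python handle name and methods below are assumptions, shown only to convey the flavor of a FileCheck-style API):
```python
import torch

@torch.jit.script
def double(x):
    return x + x

# Assumed FileCheck-style usage: assert that certain substrings appear,
# in order, in the textual IR dump.
torch._C.FileCheck().check("graph").check("aten::add").run(str(double.graph))
```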
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16858
Differential Revision: D14096244
Pulled By: eellison
fbshipit-source-id: c7c8d1457691c105e6ccbb3c1a378d96baac2569
Summary:
It might need some cleaning up and might be missing some features, but it should already work for most cases.
This PR is based on top of PR #16986 (so please review only the last commit here).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16987
Differential Revision: D14074577
Pulled By: ZolotukhinM
fbshipit-source-id: 712b598f423265655f574bb9903e2066628eaad3
Summary:
Currently the converters are very straightforward, i.e. there is no code that tries to
preserve semantics; we purely perform conversion from one format to another.
Two things that we might want to add/change:
1. Add semantic conversion as well (but probably it would be a good idea to keep
it separate as a temporary thing).
2. Make sure we don't mess with value names, as they are crucial for current
uses of NetDefs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17123
Differential Revision: D14090244
Pulled By: ZolotukhinM
fbshipit-source-id: 07175fa9235582e1d1da5f10a42a5c1280b1b394
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16534
All c10 ops from the c10 dispatcher are now automatically registered with JIT
Reviewed By: dzhulgakov
Differential Revision: D13869275
fbshipit-source-id: 5ab5dec5b983fe661f977f9d29d8036768cdcab6
Summary:
This just moves things around to make AliasTracker independently testable and keep things a little more separate. Follow-on PRs will change the interfaces of AliasDb and AliasTracker to be more clearly distinct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16588
Differential Revision: D13891894
Pulled By: suo
fbshipit-source-id: c5b590b5fdd462afefe743e499034068bf35784a
Summary:
This puts stubs in the autograd profiler for its use of CUDA APIs, allowing the CUDA parts of libtorch to be linked separately from the CPU parts.
This also edits the buck build.
Previous:
* GPU builds: _C -> csrc -> caffe2
* CPU builds: _C -> csrc-cpu -> caffe2
Now:
* GPU builds: _C -> libtorch_cuda -> (libtorch -> caffe2, for CPU)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15898
Reviewed By: ailzhang
Differential Revision: D13617991
Pulled By: zdevito
fbshipit-source-id: 6d84a50bb356a54b4217f93219902755601b00e1
Summary:
This improves the error message for "unknown builtin op" to suggest similarly named ops.
Currently it prints out all operators whose name is within an edit distance of two of the unknown name.
Related issue: https://github.com/pytorch/pytorch/issues/13409
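The actual suggestion logic lives in the C++ compiler; as a rough Python sketch of the idea (illustrative only), it amounts to listing every registered operator whose name lies within edit distance two of the unknown name:
```python
def edit_distance(a, b):
    # classic Levenshtein distance with a rolling 1-D table
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,           # delete from a
                                     dp[j - 1] + 1,       # insert into a
                                     prev + (ca != cb))   # substitute
    return dp[-1]

def suggest_ops(unknown, registered, max_edits=2):
    return [name for name in registered if edit_distance(unknown, name) <= max_edits]

print(suggest_ops("aten::reluu", ["aten::relu", "aten::relu_", "aten::selu"]))
```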
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15183
Differential Revision: D13578509
Pulled By: eellison
fbshipit-source-id: 5c73408eda1f7aa456f5bd28790c34df0c76aeca
Summary:
This PR allows a subclass of programs whose return statements are not the final statement in the graph.
`final_returns.h` contains a comment describing how this is accomplished.
To minimize complexity in `compiler.cpp`, this pass is done as an AST-to-AST rewrite before the compiler runs.
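For example (a hypothetical script function, shown only to illustrate the kind of non-final return this enables), the rewrite pushes the code following an early return into the implicit else branch so that every path ends in a single final return:
```python
import torch

@torch.jit.script
def sign(x: int) -> int:
    if x > 0:
        return 1       # early return, rewritten into an if/else by the pass
    if x < 0:
        return -1
    return 0
```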
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15463
Differential Revision: D13538962
Pulled By: zdevito
fbshipit-source-id: 67105ca873351825b4a364092ab1873779f3e462
Summary:
This separates the different parts of compiler.cpp to make their relationship more clear. In particular it adds:
* sugared_value.{h,cpp} - all the public SugaredValues that the compiler defines and a few that were inside compiler.cpp
* type_parser.{h, cpp} - Turns TreeRefs that define types into TypePtr
* schema_matching.{h, cpp} - infrastructure for matching arguments against overloaded schema and emitting builtin operators with a particular schema.
Retains:
* compiler.{h, cpp} - now responsible simply for the `defineMethodsInModule` infrastructure.
Some utility functions like inlineCallTo have moved to ir.h.
The only thing that is not a pure move is some changes in module.h/cpp that remove multiple returns from `Method::emit_call_to`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15355
Reviewed By: suo, wanchaol
Differential Revision: D13507524
Pulled By: zdevito
fbshipit-source-id: 69ec936a9ff1a383c12a883616346b219c72e393
Summary:
This PR enables autodiff to use the forward/backward graph compiled from Python code, instead of using symbolic gradients (modifying the original graph directly).
We put the map in a separate .h file for now to wait for the native_functions.yaml and derivatives.yaml merge. This should ideally go into native_functions.yaml eventually.
This PR should be enough to unblock us for now; we can start writing gradients for aten functions in Python.
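A hedged sketch of the shape such a Python-written gradient takes (the function chosen and how an entry is keyed to its aten function are assumptions; only the pattern of returning the output together with a backward closure is intended):
```python
# Illustrative only: a forward that returns its result plus a backward
# closure which maps grad_output to gradients for each input.
def mul(self, other):
    def backward(grad_output):
        grad_self = grad_output * other
        grad_other = grad_output * self
        return grad_self, grad_other

    return self * other, backward
```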
Differential Revision: D13494635
Pulled By: ailzhang
fbshipit-source-id: f8d51a15243ac46afd09d930c573ccdfcd9fdaaf