Summary:
Implementation of the less popular proposal for eliminating overlap between LetStmt and Let: removing both and storing a mapping between Var and value Expr in the Block.
This complicates some tests but simplifies the IR by restricting where variable binding can occur.
I used the unit tests & python integration tests to verify this is correct, but I'm unsure of coverage, particularly around the dependency checker in loopnest. ZolotukhinM, your review would be useful there.
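For readers unfamiliar with the change, here is a minimal sketch of where the bindings now live; the types below are placeholders, not the actual tensorexpr IR classes.
```
// Placeholder types for illustration only; the real tensorexpr IR differs.
#include <memory>
#include <unordered_map>
#include <vector>

struct Expr {};        // stands in for a value expression
struct Var : Expr {};  // stands in for a variable
struct Stmt {};        // stands in for a statement

// Instead of separate Let/LetStmt nodes, the Block owns the Var -> Expr map,
// so variable binding can only happen at Block scope.
struct Block : Stmt {
  std::vector<std::shared_ptr<Stmt>> stmts;
  std::unordered_map<const Var*, std::shared_ptr<Expr>> var_bindings;
};
```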
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37606
Differential Revision: D21467483
Pulled By: nickgg
fbshipit-source-id: b402d3fce4cacf35d75f300f0a7dca32a43b6688
Summary:
In the IR Simplifier, when doing partial factorization of Round+Mod patterns we divide by the lower number, which could be zero. Add a quick check against zero to avoid the crash.
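A minimal sketch of the guard, with a standalone helper standing in for the simplifier's factorization step (names and types here are illustrative, not the actual simplifier code):
```
#include <algorithm>
#include <cstdint>
#include <optional>

// Divide by the lower of the two scalars during partial factorization;
// return nullopt (i.e. "don't factorize") when that scalar is zero.
std::optional<int64_t> divideByLower(int64_t lhsScalar, int64_t rhsScalar) {
  int64_t lower = std::min(lhsScalar, rhsScalar);
  if (lower == 0) {
    return std::nullopt;  // quick check: skip factorization, avoid div by zero
  }
  return std::max(lhsScalar, rhsScalar) / lower;
}
```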
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38055
Differential Revision: D21478486
Pulled By: nickgg
fbshipit-source-id: c5083f672e91662b7d1271d817cade7fa6c39967
Summary:
The IR Simplifier early exits when working with dtypes that are not safe to reorder. There are some cases where we still want to simplify ops in these dtypes: x + 0, x - 0, x * 0 and x * 1. It's safe to eliminate the op here and it reduces clutter in the expr.
Also added a quick simplification of casts which do nothing (their target type is the same as the type of the underlying expression).
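A toy sketch of the kind of fold this enables even for non-reorderable dtypes (placeholder node types, not the tensorexpr IR):
```
#include <memory>

// Toy expression nodes for illustration; not the tensorexpr IR.
struct Expr { virtual ~Expr() = default; };
struct Constant : Expr {
  double value;
  explicit Constant(double v) : value(v) {}
};
struct Add : Expr {
  std::shared_ptr<Expr> lhs, rhs;
  Add(std::shared_ptr<Expr> l, std::shared_ptr<Expr> r) : lhs(l), rhs(r) {}
};

// x + 0 (and 0 + x) can be folded even when the dtype is unsafe to reorder,
// because no reordering happens: we simply drop a no-op.
std::shared_ptr<Expr> simplifyAdd(const std::shared_ptr<Add>& add) {
  if (auto c = std::dynamic_pointer_cast<Constant>(add->rhs); c && c->value == 0) {
    return add->lhs;  // x + 0 => x
  }
  if (auto c = std::dynamic_pointer_cast<Constant>(add->lhs); c && c->value == 0) {
    return add->rhs;  // 0 + x => x
  }
  return add;  // leave everything else alone for unsafe dtypes
}
```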
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37960
Differential Revision: D21457736
Pulled By: nickgg
fbshipit-source-id: 40e20a3b55fc1afb2ec50071812238a08bded2ac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36291
Move profiler state to be a thread local property and reuse the
existing thread local propagation mechanism to ensure
correct profiling of async tasks. This also makes the
push/pop callback APIs thread safe and easier to use in,
e.g., the distributed profiler.
Test Plan:
USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py develop install
./build/bin/test_jit
python test/test_autograd.py
python test/test_jit.py
Differential Revision: D20938501
Pulled By: ilia-cher
fbshipit-source-id: c0c6c3eddcfea8fc7c14229534b7246a0ad25845
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37745
This PR makes it possible to set TLS callbacks and use
them transparently not only in the main thread but also
in any async tasks
Test Plan: Imported from OSS
Differential Revision: D21374873
Pulled By: ilia-cher
fbshipit-source-id: 3be2e121673b32d7694e17e794f3b474826dffe9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548
Moving RecordFunction from torch::autograd::profiler into the at namespace
Test Plan:
CI
Imported from OSS
Differential Revision: D21315852
fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37491
This PR modernizes the RecordFunction API and adds thread local callbacks
in addition to the global ones
Changes:
- support for TLS callbacks; this is going to be the foundation of the profiler and other tools
- modernize the interface around a simple set of functions `(add|remove|has|clear)(Global|ThreadLocal)Callback`, and add `RecordFunctionCallback` to easily construct callbacks to pass in (see the sketch after this list)
- we also add `.setShouldRun` to the callback interface to support cases where simple uniform sampling is not enough
- to properly support add/remove, introduce the idea of a callback handle returned by add
- the internal implementation still uses SmallVector to store intermediate state (as before) - in this case it is a vector of handles of the callbacks that were picked to run
- to speed up runtime we keep these vectors sorted; this way we can quickly enumerate the callbacks that need to be run
- added tests for new functionality
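A hedged sketch of how the pieces above fit together; the function and method names come from the list above, but the exact signatures and header path are assumptions, not the verified API:
```
// Names follow the description above (addThreadLocalCallback,
// RecordFunctionCallback, setShouldRun, removeCallback); signatures are
// approximate and may not match the real headers exactly.
#include <ATen/record_function.h>  // assumed header location

void attachThreadLocalObserver() {
  auto handle = at::addThreadLocalCallback(
      at::RecordFunctionCallback(
          [](const at::RecordFunction& fn) {
            // "start" callback: e.g. note fn.name() and a start timestamp
          },
          [](const at::RecordFunction& fn) {
            // "end" callback: e.g. record the duration for fn
          })
          .setShouldRun([](const at::RecordFunctionCallback& /*cb*/) {
            // custom sampling policy; return false to skip this invocation
            return true;
          }));

  // ... later, the handle returned by add* is what you pass to remove:
  at::removeCallback(handle);
}
```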
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install
./build/bin/test_jit
CI
record_function_benchmark: https://gist.github.com/ilia-cher/f1e094dae47fe23e55e7672ac4dcda2f
Imported from OSS
Differential Revision: D21300448
fbshipit-source-id: 6d55c26dbf20b33d35c3f1604dcc07bb063c8c43
Summary:
This PR adds more supported operations in the CUDA fuser. We are covering the major point-wise operations supported in the legacy fuser.
In an attempt to adapt to the legacy executor:
1. added a naive shape propagation pass on the PyTorch JIT IR;
2. small refactor of graph partitioning;
3. fallback to interpreter execution of the fusion group.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37849
Reviewed By: yf225
Differential Revision: D21444320
Pulled By: soumith
fbshipit-source-id: 712e18ab8497f8d58a07e6f8d200cdab52cf0d74
Summary:
In D21209901 TensorPipe added support for a vector of payloads inside each message, instead of a single one, so that users with multiple payloads can send them separately as they are instead of having to copy them into a new block of contiguous memory. The PyTorch agent is using the old API, which is preventing us from deleting it. This change has no effect on the over-the-wire format and thus none on performance.
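Roughly, the API shift looks like the sketch below; the struct layouts are illustrative, not the actual tensorpipe::Message definition.
```
#include <cstddef>
#include <vector>

// Illustrative only: the message now carries a vector of payloads, so several
// buffers can be sent as-is instead of being copied into one contiguous block.
struct Payload {
  void* data = nullptr;
  size_t length = 0;
};

struct Message {
  std::vector<Payload> payloads;  // previously: a single data/length pair
};
```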
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37919
ghstack-source-id: 103572164
Test Plan:
On both workers
```
import os
import torch
import torch.distributed.rpc as rpc
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "8765"
```
On worker 0
```
rpc.init_rpc(name="foo", rank=0, backend=rpc.backend_registry.BackendType.TENSORPIPE, world_size=2, rpc_backend_options=rpc.TensorPipeRpcBackendOptions(worker_name_to_id={"foo": 0, "bar": 0}))
```
On worker 1
```
rpc.init_rpc(name="bar", rank=1, backend=rpc.backend_registry.BackendType.TENSORPIPE, world_size=2, rpc_backend_options=rpc.TensorPipeRpcBackendOptions(worker_name_to_id={"foo": 0, "bar": 0}))
```
On worker 0
```
In [15]: rpc.rpc_sync("bar", torch.add, args=(torch.full((2,2), 1), torch.full((2,2), 2)))
Out[15]:
tensor([[3., 3.],
        [3., 3.]])
In [16]: rpc.rpc_sync("bar", torch.add, args=(1, 2))
Out[16]: 3
```
Differential Revision: D21425536
fbshipit-source-id: a0ec2be825556b39aff018a2834baf815a6d8fa5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37472
Our convention is for `findX` to return an optional version and `getX`
to assert that the X is there. Fix up `getMethod` to be consistent with
this convention.
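For illustration, the convention looks like this (hypothetical class, not the real Module API):
```
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder illustrating the find/get convention.
struct Method { std::string name; };

struct Holder {
  std::vector<Method> methods;

  // findX: may return nothing.
  std::optional<Method> findMethod(const std::string& name) const {
    for (const auto& m : methods) {
      if (m.name == name) return m;
    }
    return std::nullopt;
  }

  // getX: asserts the method exists, so callers don't have to check.
  Method getMethod(const std::string& name) const {
    auto m = findMethod(name);
    if (!m) {
      throw std::runtime_error("Method '" + name + "' is not defined");
    }
    return *m;
  }
};
```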
Test Plan: Imported from OSS
Differential Revision: D21297543
Pulled By: suo
fbshipit-source-id: b40f56231cc8183e61bbb01fe5c0c113bcb6464d
Summary:
Remove the requirement for the axes provided to reorderAxis to come from a Tensor. We were using that to determine the relevant loops, but we can alternatively determine them by traversing the parents of each provided For.
resistor does this work for you?
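The parent traversal amounts to something like the sketch below (toy statement types, not the real tensorexpr classes):
```
#include <vector>

// Toy statement hierarchy for illustration; not the real tensorexpr classes.
struct Stmt {
  Stmt* parent = nullptr;
  virtual ~Stmt() = default;
};
struct For : Stmt {};

// Collect the enclosing loop nest of a statement by walking parent pointers,
// rather than asking the Tensor that produced it.
std::vector<For*> enclosingLoops(Stmt* s) {
  std::vector<For*> loops;
  for (Stmt* p = s; p != nullptr; p = p->parent) {
    if (auto* f = dynamic_cast<For*>(p)) {
      loops.push_back(f);
    }
  }
  return loops;
}
```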
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37873
Differential Revision: D21428016
Pulled By: nickgg
fbshipit-source-id: b16b2f41cb443dfc2c6548b7980731d1e7d89a35
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36611
Previously, Buf represented the underlying storage but didn't have a
dtype. That resulted in dtypes being specified in different places and there was no
mechanism to enforce their consistency: e.g. one could have created a kFloat
expression and used a kInt buffer to store its result. Now we're
centralizing where the logic regarding the storage is located, and we can
start enforcing semantics rules.
Follow-ups: we can merge Buffer and BufHandle classes as the former is
now a mere wrapper over the latter.
Test Plan: Imported from OSS
Differential Revision: D21027356
Pulled By: ZolotukhinM
fbshipit-source-id: c06aa2c4077fdcde3bb4ca622d324aece79b5a9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37776
* Remove type-specific size tracking in favor of byte size tracking in Storage and StorageImpl
* Changed numel() and set_numel() to nbytes() and set_nbytes()
* Added enum argument to Storage/StorageImpl constructor to indicate new meaning of the size parameter
* Update all callers of the changed API (see the sketch below)
Part of issue https://github.com/pytorch/pytorch/issues/33950
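The caller-side change boils down to something like this sketch (variable names are made up):
```
#include <cstddef>
#include <cstdint>

// Sizes that used to be passed to Storage as an element count are now passed
// as a byte count, e.g. numel * sizeof(float) for a float storage.
size_t storageSizeInBytes(int64_t numel, size_t itemsize) {
  return static_cast<size_t>(numel) * itemsize;
}
```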
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37028
Differential Revision: D21171334
Pulled By: ezyang
fbshipit-source-id: 37329a379de9a3a83cc5e9007e455a3e1c2d10b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37681
By passing by value, we can std::move, and avoid unnecessarily copying
args that are part of any std::function/lambda state (e.g. in the jit
interpreter, there is a std::vector<> stack passed in the
InterpreterContinuation)
This also makes the API consistent with, e.g., folly and general best practices.
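A minimal sketch of the pattern, with a stand-in queue rather than the real at::launch internals:
```
#include <functional>
#include <utility>
#include <vector>

// Stand-in task queue, not the real thread pool.
static std::vector<std::function<void()>> g_pending;

// Taking the callable by value lets callers move it in; we then move it again
// into storage, so captured state (e.g. a large vector) is never copied.
void launchTask(std::function<void()> fn) {
  g_pending.push_back(std::move(fn));
}

void example() {
  std::vector<int> stack(256, 0);
  // The lambda owns `stack` by move; launchTask then moves the whole closure.
  launchTask([s = std::move(stack)]() mutable { /* interpret s */ });
}
```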
Added a minor at::launch() benchmark to test/cpp/; the difference is
mostly noticeable when copying the std::function<> internal args is
non-trivial.
Benchmarks pre/post (min over ~5 runs)
NoData: 5.81 us -> 5.63 us (-3.2%)
WithData(0): 6.67 us -> 5.88 us (-11.8%)
WithData(4): 6.98 us -> 6.51 us (-6.7%)
WithData(256): 9.44 us -> 7.89 us (-16.5%)
ghstack-source-id: 103322321
Test Plan:
- perf: buck run mode/opt caffe2/test/cpp/api:parallel_benchmark pre/post
- correctness buck test mode/dev-nosan caffe2/test/...
Reviewed By: dzhulgakov
Differential Revision: D21355148
fbshipit-source-id: 3567e730845106f1991091e4a892d093e00571c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36197
Create APIs to convert between rpc::message and tensorpipe::message
1. tensorpipeSerialize() - converts rpc::message to tensorpipe::message without memory copy (tensors).
2. tensorpipeAllocateMessage - allocates rpc::message based on received tensorpipe descriptor to prepare memory-copy-free receiving.
Test Plan: buck test caffe2/test/cpp/rpc:test_tensorpipe_serialization
Reviewed By: lw
Differential Revision: D20084125
fbshipit-source-id: ffbc310f93443e50261aed752be0fe176610dd2a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37704
If the input tensor cannot be chunked across all devices, run `parallel_apply` on fewer devices.
Modify the input tensor dimension in `DataParallelUsesAllAvailableCUDADevices_CUDA` so it is chunkable by any number of available CUDA devices.
Test Plan: Run `test/cpp/api/parallel` on machine with 6 GPUs
Differential Revision: D21365416
fbshipit-source-id: 60fdfed4a0e6256b2c966c2ea3e8d0bfb298d9a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37106
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.
The graph fuser pass operates by pushing nodes into a fusion group. So
we start with
```
x, y = f(a, b, c)
```
and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
  x_in, y_in = f(a_in, b_in, c_in)
  -> x_in, y_in
```
We destroy the `x` and `y` `Value*`s in the process. This operation is
easy to express as an update to the aliasDb--`x_out` just takes on all
the aliasing information `x` used to have. In particular, since we know
`f` and `prim::fusionGroup` are purely functional, we don't have to mess
with any write information.
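Conceptually the in-place update is just a transfer of `x`'s aliasing record to `x_out`; a toy sketch with a plain map rather than the real AliasDb/MemoryDAG structures:
```
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Toy alias table keyed by value name; the real AliasDb tracks Value* and
// memory locations, but the in-place update is the same idea.
using AliasTable =
    std::unordered_map<std::string, std::unordered_set<std::string>>;

// When the fuser replaces `x` with `x_out`, the new value inherits everything
// `x` used to alias; no write info needs touching because both the original
// op and prim::fusionGroup are purely functional.
void transferAliasInfo(AliasTable& table, const std::string& oldValue,
                       const std::string& newValue) {
  auto it = table.find(oldValue);
  if (it == table.end()) return;
  auto aliases = std::move(it->second);
  table.erase(it);
  table[newValue] = std::move(aliases);
}
```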
This PR is the bare minimum to get this working, in the interest of
unscrewing the compilation times ASAP.
Followups I want to do:
- We don't have a way of expressing deletion of values in AliasDb. In
`graph_fuser.cpp` we sometimes construct nodes that we end up throwing
away, and we are littering `MemoryDAG` with references to dangling
pointers. Because of the way the pass works, it's fine, but this is
fragile so I want to fix it.
- We should decouple alias analysis from write tracking, to simplify the
job of keeping the write caches consistent as we mutate the aliasing
information.
- the tensorexpr fuser doesn't do this and thus is incorrect today; we
need to update it to do the same.
Test Plan: Imported from OSS
Differential Revision: D21219179
Pulled By: suo
fbshipit-source-id: 8ae5397b3a0ad90edec2fbc555647091f1ad5284
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36345
During compilation, we spend a huge amount of time in alias analyis.
This PR does a few things to speed it up.
1. Separate the analysis into two phases: one where we build up the
necessary data structures, and the other where we service aliasing
queries. This allows us to defer building indices/maintaining index
consistency until after the "buildup" phase is done.
2. Properly memoize (dynamic programming) the memory-location lookups.
3. Done naively, setting wildcards invalidates the above memoization,
triggering costly recomputation. So I added a cache-aware `setWildcards`.
Sadly that means alias analysis needs to reach into the guts of
MemoryDAG, but the speedup is worth it (see the sketch below).
Sadly, these changes are kind of coupled for correctness reasons, so
they're all here at once.
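A toy sketch of the memoization plus cache-aware update idea (the real MemoryDAG types differ; this only shows why patching the cache in place beats wholesale invalidation):
```
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using ElementId = uint64_t;
using MemoryLocations = std::unordered_set<ElementId>;

struct MemoCache {
  std::unordered_map<ElementId, MemoryLocations> memoized;

  // Memoized lookup: compute once, reuse on every subsequent aliasing query.
  const MemoryLocations& getMemoryLocations(ElementId e) {
    auto it = memoized.find(e);
    if (it == memoized.end()) {
      it = memoized.emplace(e, computeMemoryLocations(e)).first;
    }
    return it->second;
  }

  // Cache-aware wildcard update: patch the affected cached entry in place
  // instead of discarding the whole memo table and recomputing it.
  void setWildcard(ElementId e, ElementId wildcardLocation) {
    getMemoryLocations(e);  // make sure the entry exists
    memoized[e].insert(wildcardLocation);
  }

 private:
  MemoryLocations computeMemoryLocations(ElementId e) {
    return {e};  // placeholder for the real (expensive) traversal
  }
};
```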
I used this model (thanks IlyaOvodov) as a provisional benchmark. You
can get it here:
https://www.dropbox.com/s/jlyygn6yygj1jkx/yolov3.zip. Unzip it and run
`python test_timing.py`.
Baseline: (752.076s) right before 6bc8ffe824
After optimizing before inlining: (699.593s)
After deferring cache construction: (426.180s)
After cache-aware `setWildcards`: (193.678s)
So a nice 75% speedup to overall compilation. There's a lot more to do
in other places of the compilation pipeline though.
Followup to this PR specifically: Everything that fans out from the
`analyze` call is the "buildup" phase of AliasDB construction. This
should be factored into a separate analysis pass to statically
distinguish the two phases (right now we just null out stuff to
accomplish the same thing dynamically).
Test Plan: Imported from OSS
Differential Revision: D20952727
Pulled By: suo
fbshipit-source-id: 099f797222d7e71e5c04991584adc2c7eab5a70f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32684
Previously we had `clone` and `clone_instance`, where `clone` clones both type
and value and `clone_instance` only clones the value; both of them are shallow copies.
We need to re-evaluate whether we should expose them as user facing APIs.
I think we should hide `clone`, but `clone_instance` might be useful as well, especially
when copying a model with very large weights, where people might just want a shallow copy.
This PR adds a `deepcopy` that might be useful as a user API; it deep copies the values, including
Tensor, but it does not deep copy `Blob`, `Capsule`, `Future` or `PyObject`.
For more discussions please see the following issue.
fixes: https://github.com/pytorch/pytorch/issues/32519
Test Plan: Imported from OSS
Differential Revision: D21220756
fbshipit-source-id: 476bf11fe82c08fac36e7457879a09f545ffdc5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37382
After adding c10::DispatchKey::Profiler, the behavior of RecordFunction
observers is also controlled by the dispatch key;
this PR moves that logic out of the profiler and into RecordFunction.
Reviewed By: jamesr66a
Differential Revision: D21268320
fbshipit-source-id: 93207e3b55325d20dcc5b1e8f448ab86933321da
Summary:
Today in PyTorch, warnings triggered in C++ are printed to Python users like this:
`../aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.`
This may be unhelpful to Python users, who have complained it's difficult to relate these messages back to their programs. After this PR, warnings that go through the PyWarningHandler (and allow it to add context) print like this:
```
test/test_torch.py:16463: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead. (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:81.)
cpu_result = getattr(cpu_tensor, op_str)(*cpu_args)
```
This relates the warning back to the user's program. The information about the cpp file and line number is preserved in the body of the warning message.
Some warnings, like those generated in the JIT, already account for a user's Python context, and so they specify that they should be printed verbatim and are unaffected by this change. Warnings originating in Python and warnings that go through c10's warning handler, which prints to cerr, are also unaffected.
A test is added to test_torch.py for this behavior. The test relies on uint8 indexing being deprecated and its warning originating from its current header file, which is an unfortunate dependency. We could implement a `torch.warn` function, instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36052
Differential Revision: D20887740
Pulled By: mruberry
fbshipit-source-id: d3515c6658a387acb7fccaf83f23dbb452f02847
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37251
This was broken by recent changes to how we serialize with type tags. We
save a name (like `Dict[str, MyNamedTuple]`) and then rely on the
mobile type parser to resolve that name back into a set of types.
This doesn't work for any NamedTypes, as the mobile type parser doesn't
know how to resolve those. The unpickler allows the caller to inject a
type resolver for this purpose; use that so that when importing in a
non-mobile environment you get the right results.
A second problem also had to be fixed: the SourceImporter type loader
would only load named types directly (e.g. `MyNamedTuple`) and choked if
it was a general type that contained a named tuple (e.g.
`List[MyNamedTuple]`). Fixed that and renamed `loadNamedType` to
`loadType` for clarity.
Test Plan: Imported from OSS
Differential Revision: D21235213
Pulled By: suo
fbshipit-source-id: 16db0f4c5e91a890d67a8687cc8ababa6b94b0f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36984
Follow the LOG(WARNING) format for C++-side warnings in order to play well with larger services, especially when using glog. I need to hook into glog internals a bit in order to override FILE/LINE without having to change the whole thing to macros, but the hook seems to be stable between glog versions.
Note, this also changes caffe2_log_level to warning by default - I think it's a much better default when compiling without glog (or maybe even info).
With glog output, stderr capture doesn't work any more in tests. That's why we instead use c10-level warnings capture.
Test Plan:
Run unittest in both glog and non-glog build mode:
glog:
```
W0416 12:06:49.778215 3311666 exception_test.cpp:23] Warning: I'm a warning (function TestBody)
```
no-glog:
```
[W exception_test.cpp:23] Warning: I'm a warning (function TestBody)
```
Reviewed By: ilia-cher
Differential Revision: D21151351
fbshipit-source-id: fa926d9e480db5ff696990dad3d80f79ef79f24a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37050
With this change curly braces are printed as a part of Block rather than
a part of the enclosing statement. It allows us, for instance, to more
easily see nested blocks: now they will be printed each in its own
curly-braced scope.
As a side effect, I had to change how we print loop options. Previously
we did it like this:
```
for (...) { // <loop options>
  <loop body (Block)>
}
```
Now, since everything in between { and } is a part of the block, we have
to do it the following way:
```
for (...) /* <loop options> */ {
  <loop body (Block)>
}
```
Note the change from '//' to '/* .. */' for the loop option comments.
Test Plan: Imported from OSS
Differential Revision: D21171851
Pulled By: ZolotukhinM
fbshipit-source-id: 39f51a9e15aec03b6527b0634fd4b9e01a912cda
Summary:
Some IR optimizations were leaving superfluous Blocks in the IR; this PR adds simplification and merging of enclosing Block statements to the IR Simplifier, e.g.
```
Block {
  Stmt 1
  Block {
    Stmt 2
  }
  Block {}
}
```
becomes
```
Block {
  Stmt 1
  Stmt 2
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37013
Differential Revision: D21166208
Pulled By: nickgg
fbshipit-source-id: 6dcdf863980d94731a8ddf184882c4a5b7259381
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36976
The bounds check and the read were swapped in two places - I noticed
ASAN complaining in an unrelated change on an erroneous buffer.
Adding a couple simple test cases.
ghstack-source-id: 102606986
Test Plan: buck test mode/dev caffe2/test/cpp/rpc:
Differential Revision: D21148936
fbshipit-source-id: 7ec5007535f7310437ac1b9a72852a223b9dd29a
Summary:
When eliminating For loops which execute only once, e.g. `for i = 0; i < 1; ++i { thing; }` => `thing;`, we do variable substitution while the temporary simplifier ExprNodes still exist, which could put them in an invalid state and leave unsimplified terms in the expression. The fix is to apply the substitution before simplifying the body of the for loop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36965
Differential Revision: D21145248
Pulled By: nickgg
fbshipit-source-id: d874600c7a098fc05b8ef3109e516e2eaa2c24e0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36742
Now, you can define a custom class inside a TORCH_LIBRARY block.
It looks very similar to what you did before. Instead of
```
static auto m = torch::class_<Class>("Namespace", "Class").def("foo", foo);
```
you write
```
TORCH_LIBRARY(Namespace, m) {
  m.class_<Class>("Class")
    .def("foo", foo);
}
```
All the old usages still work, but at some point we should start
updating the tutorials when we're ready to go 100% live with the
new pybind11 style API.
The custom class API previously lived in the torch/ folder and in the torch
namespace, so for consistency the new TORCH_LIBRARY also got
moved to torch/library.h. The definition of Library::class_ is at the
bottom of that header because I need all of the class_ constructors
available, but there is a circular dependency between the two headers.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D21089648
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 8d54329c125242605336c22fa1642aae6940b507
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36215
Make it possible to disable observers, e.g. to avoid
infinite recursion if an observer uses an operator
Test Plan:
USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py develop install
./build/bin/test_jit
Differential Revision: D20912676
Pulled By: ilia-cher
fbshipit-source-id: 29760cdfe488a02f943f755967b78779d6dbcef3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36745
As we hold a mutex for our custom C++ Node, when calling reentrant
backward from a custom C++ function we will be concurrently holding many
mutexes, up to MAX_DEPTH. TSAN only allows 65 mutexes at once, otherwise
it will complain. This PR lowers the limit according to TSAN.
TSAN Reference: https://github.com/google/sanitizers/issues/950
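Illustrative sketch only; the depth values and the exact macro check are assumptions, not the actual engine constants:
```
// Pick a smaller reentrant-backward depth limit when compiling under TSAN,
// since TSAN caps how many mutexes a single thread may hold at once.
#if defined(__has_feature)
#if __has_feature(thread_sanitizer)
#define BUILT_WITH_TSAN 1
#endif
#endif

#ifdef BUILT_WITH_TSAN
constexpr int kMaxReentrantDepth = 32;   // assumed value: stay under TSAN's cap
#else
constexpr int kMaxReentrantDepth = 100;  // assumed default limit
#endif
```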
Test Plan: Imported from OSS
Differential Revision: D21072604
Pulled By: wanchaol
fbshipit-source-id: 99cd1acab41a203d834fa4947f4e6f0ffd2e70f2