Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36606
This PR refactors the continuation logic of the async mode in the autograd
engine to avoid launching spinning work. To achieve that:
1. remove the continuation logic in
execute_graph_task_with_continuiation
2. separate the usage of execute_graph_task between the dist_engine and the
local engine; the dist_engine now universally uses
`execute_graph_task_until_ready_queue_empty` (a better name is appreciated
here; a rough sketch of its behavior is below)
3. remove enqueue_blocked_task_on_cpu
4. remove the async mode in `execute_with_graph_task`, as we no longer need
it in the dist_engine
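As a rough sketch of the intended behavior (a toy Python model, not the actual C++ implementation; `ready_queue` and `run_node` are hypothetical stand-ins):
```
from queue import Queue

def execute_graph_task_until_ready_queue_empty(ready_queue: Queue, run_node) -> None:
    # Toy model: drain the calling thread's ready queue in place instead of
    # re-enqueuing continuation ("spinning") work onto the engine threads.
    while not ready_queue.empty():
        node_task = ready_queue.get()
        run_node(node_task)
```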
Test Plan: Imported from OSS
Differential Revision: D21032731
Pulled By: wanchaol
fbshipit-source-id: 708ea3bc14815bdc151b56afa15eb85b4ac0f4b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37061
This PR refactors:
1. `set_device`, moving it out of Engine
2. `graph_task_completed`, moving it into GraphTask
3. `mark_graph_task_completed`, moving it into GraphTask
This also makes it easy for the distributed engine to call those functions.
Test Plan: Imported from OSS
Differential Revision: D21188688
Pulled By: wanchaol
fbshipit-source-id: f56106e6ed7d966cfa4d962781c7865cc3c5321d
Summary:
# Goals
Do the following things during a distributed backward pass.
1. Accumulate the gradient of a variable to RPC context once the gradient is ready instead of at the very end of the backward pass.
2. Run post/pre hooks installed in `AccumulateGrad` nodes once the gradient is ready for the variable. Currently, the hooks in `AccumulateGrad` are not executed, because the `AccumulateGrad` function itself is not even evaluated by the local engine.
3. Make it extensible to support post hooks installed by DDP's reducer.
# Introduce GradCapturePreHook
## Why do we need this?
### Root issue:
* The dist engine uses the autograd.grad-like API on the vanilla engine and then, in the Future callback, populates the context with the gradients. This is a poor emulation of the .backward() call on the vanilla engine.
### Practical issue:
* The leaf's hooks are not called (because they are associated with the AccumulateGrad node, which is not called in the autograd.grad-like API). Modules like DDP rely on these hooks.
* The Future is marked as completed before the context is actually populated with the grads, leading to unexpected behavior on the user side.
* The Future callback is only called at the very end of the backward pass, which is too late for DDP if it wants to overlap computation and gradient transfers.
### Proposed solution:
* Provide hooks in the autograd.grad-like API that allow the distributed engine to populate the context and call the hooks, to better emulate the .backward() call.
## Who can install a grad capture pre-hook?
This will be an internal hook at the C++ level and it won't be exposed to Python code. Only call sites directly interacting with the local engine can install such hooks.
## Signature
The returned `grad` will be captured.
```
virtual torch::Tensor operator()(const torch::Tensor& grad) = 0;
```
## Where are hooks installed?
Grad capture pre-hooks are installed in GraphTask::ExecInfo::Capture. ExecInfo is per node. Every backward run will have its own GraphTask instance.
## When/How will hooks be called?
When the local engine captures the grads for a node, all grad capture pre-hooks are called one by one in the order they were added. The output grads of the hooks replace the original grads.
The output of the last hook will be used for grad capturing.
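Conceptually (a toy Python sketch of the chaining behavior described above; the names are illustrative, the real hooks live in C++):
```
def run_grad_capture_pre_hooks(grad, hooks):
    # Hooks run in the order they were added; each hook's output replaces
    # the grad, and the output of the last hook is what gets captured.
    for hook in hooks:
        grad = hook(grad)
    return grad
```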
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34501
Test Plan:
All existing tests should pass.
```
python setup.py develop
python test/distributed/rpc/test_dist_autograd_spawn.py DistAutogradTestWithSpawn.test_post_hooks
```
Differential Revision: D20953673
Pulled By: hczhu
fbshipit-source-id: 543b3844823330ea9f9856bab7c5cb2679290a53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36745
Since we hold a mutex for our custom C++ Node, when calling reentrant
backward from a custom C++ function we can end up concurrently holding many
mutexes, up to MAX_DEPTH of them. TSAN only allows 65 mutexes held at once,
otherwise it complains. This PR lowers the limit accordingly.
TSAN Reference: https://github.com/google/sanitizers/issues/950
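For reference, a reentrant backward looks like the following at the Python level (a minimal sketch; the commit above concerns the analogous custom C++ function case):
```
import torch

class Reentrant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Re-enter the autograd engine from inside backward(); nesting such
        # calls is what MAX_DEPTH bounds.
        with torch.enable_grad():
            y = x.detach().requires_grad_(True)
            (y * y).sum().backward()
        return grad_output

out = Reentrant.apply(torch.randn(3, requires_grad=True))
out.sum().backward()
```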
Test Plan: Imported from OSS
Differential Revision: D21072604
Pulled By: wanchaol
fbshipit-source-id: 99cd1acab41a203d834fa4947f4e6f0ffd2e70f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35101
TSAN is reporting a lock-order-inversion in the context of dist autograd because
we're holding a lock when GraphTask calls markCompleted() on the relevant futureResult_.
Add an atomic bool so this can be protected without holding the mutex,
and also fix the alignment of a few struct vars.
ghstack-source-id: 101805283
Test Plan: buck test mode/opt-tsan //caffe2/test/distributed/rpc:dist_autograd_spawn_thrift
Differential Revision: D20553517
fbshipit-source-id: 446e3718dd68876bd312166ecceed1d92868ce4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35523
In this PR we extend ThreadLocalState to cover dispatch keys and
ThreadLocalDebugInfo, and move it from the JIT interpreter down to the
thread management (at::launch) and autograd (backward threads) code.
Test Plan: unit tests (CI)
Reviewed By: dzhulgakov
Differential Revision: D20615714
fbshipit-source-id: 16a9fc96a25cb6c2629230b1187fbf78786ac565
Summary: This diff fixes issues with the current handling of debug information passed along during the execution of the model (for example, multiple calls to the debug guard may override each other).
Test Plan: CI test/cpp/jit
Reviewed By: dzhulgakov
Differential Revision: D20602775
fbshipit-source-id: 4683957954028af81a1a0f1f12b243650230c9bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33157
This PR enables graph-level thread parallelism on CPU for the Autograd
Engine. It replaces https://github.com/pytorch/pytorch/pull/29574 because of
the drawbacks of task-level parallelism with the existing autograd
system.
Fixes https://github.com/pytorch/pytorch/issues/18333
The graph level parallelism on CPU design:
1. Remove the single CPU thread that was initialized in the Engine itself and allow
the owning thread (the one that calls Engine::execute) to drive the Engine
execution, so that outer threading can enable thread
parallelism.
2. Maintain a separate ReadyQueue per CPU thread, and stash the
ReadyQueue for different devices/threads into a thread-local
shared_ptr; the Engine itself keeps the shared_ptr of the
ReadyQueue for each non-CPU device.
3. The CPU thread-local ReadyQueue is initialized per CPU-thread
Engine::execute call (or `backward()`/`grad()` call), and its shared_ptr is
stored in the GraphTask, since every `backward()` call has
its own GraphTask.
4. Cross-device NodeTask pushes are accomplished by 2 and 3: we can refer
to a device's ReadyQueue from the Engine, and to the CPU's ReadyQueue from the
GraphTask, which means we can push to a different ReadyQueue
according to the device.
5. Termination of the CPU thread: if we mark the graph_task as
completed, we exit the while loop and terminate the current
backward execution, because it is guaranteed that all other NodeTasks
are finished before we mark a GraphTask as complete.
6. The reentrant thread logic stays the same; reentrant thread detection is
similar to before: we set worker_device to NO_DEVICE initially
and set it to CPU afterwards to detect whether this is a reentrant call.
7. We still have the reentrant thread pool that creates new threads for
deep reentrant cases, and it reuses the parent thread's ReadyQueue
for performance.
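With the owning thread driving execution (item 1 above), backward passes can be launched from several Python threads concurrently; a minimal sketch with independent per-thread graphs:
```
import threading
import torch

def train_fn():
    x = torch.ones(5, 5, requires_grad=True)
    # Forward and backward both run on the calling thread; each backward()
    # builds and drives its own GraphTask, so the threads run in parallel.
    y = (x + 3) * (x + 4) * 0.5
    y.sum().backward()

threads = [threading.Thread(target=train_fn) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```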
Since we introduce thread parallelism on CPU, we have to ensure the
thread safety of the GraphTask. This is not a problem if we execute all
forwards in different threads, since we build separate GraphTasks in
different threads and each GraphTask is a separate instance that shares
nothing, i.e. Hogwild training on CPU is fine in this case.
But there might be cases where the user would like to do part of the work in
a single thread and the rest in several threads
concurrently, so thread safety is crucial in those cases. The thread
safety strategy for the multithreaded autograd is as follows:
1. Add a mutex to protect thread safety in the Autograd Node/Function, and
hold the lock for the different data-racing cases.
2. Lock the mutex during Node::apply(); this ensures that Nodes
writing to shared variables are not racing across threads (i.e.
AccumulateGrad and custom C++ Autograd Nodes that write to shared
variables).
3. Lock the mutex during Node::release_variables(); this serves the
purpose that when we release saved_variables from one thread, no
other thread can call Node::apply(), which ensures the variable
references from other threads aren't dangling.
4. If we don't release any variables and there is no shared data read/write in
the Node, i.e. it is purely functional, we don't lock the mutex.
This way we protect thread safety on the Autograd Node, but we
still cannot protect thread safety of Node pre/post C++ hooks
(Python hooks are automatically thread safe); we rely on the user to
write thread-safe C++ hooks if they want the hooks to be correctly
applied in a multithreaded environment.
**User visible changes**:
There are not many user-visible changes. Since we use the owning
thread to drive the autograd execution, users can write their own
threading code without blocking on the Autograd engine. Some behaviors
that users should be aware of:
**Non-determinism**:
If we call backward() on multiple threads concurrently but with
shared inputs (i.e. Hogwild CPU training): since parameters are automatically shared across threads, gradient accumulation might become non-deterministic on backward calls across threads, because two backward calls might access and try to accumulate the same .grad attribute. This is technically not safe; it might result in a race condition and the result might be invalid to use.
But this is the expected pattern if the user is using the multithreading
approach to drive the whole training process with shared
parameters; users who use multithreading should have the threading model
in mind and should expect this to happen. Users should use the functional
interface `torch.autograd.grad()` to calculate the gradients instead of
calling `backward()` on the loss.
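A minimal sketch of the suggested functional pattern (each thread computes gradients without writing to the shared `.grad` attribute):
```
import torch

w = torch.randn(10, requires_grad=True)  # parameter shared across threads

def worker_step(data):
    loss = (w * data).sum()
    # torch.autograd.grad() returns the gradients instead of accumulating
    # them into w.grad, so concurrent threads do not race on .grad.
    (grad_w,) = torch.autograd.grad(loss, (w,))
    return grad_w

g = worker_step(torch.randn(10))
```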
**Graph retaining**:
If part of the autograd graph is shared between threads, e.g. we run the first
part of the forward pass in a single thread and the second part in multiple threads,
then the first part of the graph is shared. In this case, when different threads execute grad() or backward() on the same graph,
one thread might destroy the graph on the fly and the
other thread will crash. We will error out to the user,
similar to calling `backward()` twice without `retain_graph=True`, and let the user know they should use `retain_graph=True`.
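For example, when the first part of the graph is shared, every backward except the last one should pass `retain_graph=True`; a minimal single-threaded sketch of the same rule:
```
import torch

x = torch.randn(4, requires_grad=True)
shared = x * x              # shared part of the graph, built once

loss_a = shared.sum()
loss_b = (shared ** 2).sum()

loss_a.backward(retain_graph=True)  # keep the shared graph alive
loss_b.backward()                   # would error without retain_graph above
```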
**TODOs**:
[ ] benchmark the PR with example models and datasets to demonstrate
the performance gain in CPU training
[ ] ensure that we don't regress the single thread autograd performance
**Follow ups**:
[ ] a correct and tight integration with distributed autograd
[ ] try to unify the thread pool between JIT and Autograd, and see if
there's a unifying pattern that we could apply universally
Test Plan: Imported from OSS
Differential Revision: D20236771
Pulled By: wanchaol
fbshipit-source-id: 1e0bd4eec14ffebeffdb60b763b8d6f0e427eb64
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35066
Closes #24965
Prior to this commit, final_callbacks_ are cleared on exit of ANY
backward. When using reentrant backward, the last backward would
remove all callbacks from the engine. However, this might lead to
unexpected behavior. For example, the application could install
a final callback after forward and expect this callback to fire
when all gradients are ready. If there is a reentrant backward on
a subgraph, it would fire the callback and delete it on exit,
meaning that when fired, not all gradients are ready.
**Failed Attempt**
The 1st attempt was to move the callback to the GraphTask
in engine::execute(). However, this failed because more callbacks
could be installed during the backward pass.
**Current Solution**
Final callbacks are stored as a member variable in the GraphTask.
* Insertion: use the thread_local current_graph_task to find the
target GraphTask, and append the final callback (see the sketch below).
* Deletion: final callbacks have the same lifetime as their GraphTask.
* Execution: use the GraphTask provided in the argument to find the
final callbacks.
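At the Python level the final callbacks are reachable through an internal, non-public binding; a minimal sketch (assuming `Variable._execution_engine.queue_callback`, the internal API used e.g. by DDP):
```
import torch
from torch.autograd import Variable

x = torch.randn(3, requires_grad=True)
loss = (x * x).sum()

def on_grads_ready():
    print("final callback: all grads of this GraphTask are ready")

def hook(grad):
    # Queued from inside the backward pass, so the callback attaches to the
    # currently running GraphTask and fires when that task completes.
    Variable._execution_engine.queue_callback(on_grads_ready)
    return grad

x.register_hook(hook)
loss.backward()
```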
Test Plan: Imported from OSS
Differential Revision: D20546474
Pulled By: mrshenli
fbshipit-source-id: d3f3449bb5af9f8703bcae63e6b52056cd535f11
Summary:
Because `this` must be valid while `Engine::main_thread` is running, at least for non-reentrant worker threads
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34529
Test Plan: Run `test_api --gtest-filter=ModulesTest.InstanceNorm1d` in a loop
Differential Revision: D20552717
Pulled By: malfet
fbshipit-source-id: a0197671db1b7b1499dda675e43e0826f368bf0d
Summary:
Make sure that there cannot be more than one instance of either `torch::autograd::Engine` or `torch::autograd::python::PythonEngine`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34567
Test Plan: CI
Differential Revision: D20390622
Pulled By: malfet
fbshipit-source-id: c90595032afc88f552dee52901361b58b282dc1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33214
Distributed autograd had some custom logic in terms of how we
accumulated gradients. This was mostly done early on to enable basic
functionality. However, in the long term we should merge this logic with what
we have in the local autograd engine. A lot of work has gone into ensuring we
accumulate grads correctly and efficiently, and we should reuse that as a
starting point.
We can investigate later whether distributed autograd needs further custom
logic for additional optimizations.
In this PR I've merged the gradient accumulation logic and also the gradient
hooks. As a result, now gradient hooks are called in distributed autograd as
well.
ghstack-source-id: 99838019
Test Plan: waitforbuildbot
Differential Revision: D19843284
fbshipit-source-id: 7923d7e871fb6afd3e98dba7de96606264dcb5f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33885
Fixes: #32835
Fixes: #5834
This cannot be combined with CUDA's implementation, as each of them requires an individual `std::once_flag` as well as a different `forked_autograd_child` function. The CUDA version relies on the Python module, while autograd uses TORCH_CHECK to report the error to both Python and C++.
Test Plan: Imported from OSS
Differential Revision: D20144024
Pulled By: VitalyFedyunin
fbshipit-source-id: e7cf30568fff5110e9df7fe5b23f18ed992fa17f
Summary:
Just update the comment to make it accurate.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32222
Differential Revision: D19410428
Pulled By: albanD
fbshipit-source-id: ad13596382613c2728e674a47049ea4f563964b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31230
A major issue with distributed autograd currently is that we block an
RPC thread when we call Engine::execute_with_graph_task.
To resolve this issue, I've made modifications to the local autograd engine
such that `execute_with_graph_task` returns a Future instead. Engine::execute()
and DistEngine::execute() still wait() on this
Future, which ensures there is no change in behavior yet.
In follow up PRs we can modify the distributed autograd engine to take
advantage of this Future.
Closes #26359
ghstack-source-id: 96298057
Test Plan: waitforbuildbot
Differential Revision: D18999709
fbshipit-source-id: 388f54467fd2415a0acb7df17bd063aedc105229
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30642
Adding a couple of basic metrics for distributed autograd which would
help in determining stuckness.
ghstack-source-id: 95156189
Test Plan: waitforbuildbot
Differential Revision: D18776478
fbshipit-source-id: a0556ad6fe2b7c3cd0082ee2350c1c78cafaaec5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29041
1) Enhanced autograd unit tests to test the
torch.distributed.autograd.backward() API more thoroughly on Python UDFs.
2) Enhanced `python_error` to override `what` such that it returns an
appropriate error string if we call `what()` on this error. This ensures we can
propagate exceptions over the wire during RPCs (since we get the error string
by calling what() on the exception)
ghstack-source-id: 93098679
Test Plan: waitforbuildbot
Reviewed By: mrshenli
Differential Revision: D18273041
fbshipit-source-id: 85d3932fed6337668a812367fdfce233c1b3ff8e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28824
1) Enhanced autograd unit tests to test the
torch.distributed.autograd.backward() API more thoroughly on Python UDFs.
2) Enhanced `python_error` to override `what` such that it returns an
appropriate error string if we call `what()` on this error. This ensures we can
propagate exceptions over the wire during RPCs (since we get the error string
by calling what() on the exception)
ghstack-source-id: 92972494
Test Plan: waitforbuildbot
Differential Revision: D18195584
fbshipit-source-id: b795daf644ba1816fdec484545192ab55a2f71e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27940
1) If we receive an error for outstanding rpcs, we enqueue an appropriate error
on the local autograd engine.
2) Add an `exit_on_error` mode for the local autograd engine, where the
computation stops if we see an error.
ghstack-source-id: 92603377
Test Plan: Added unit tests to test failures.
Differential Revision: D17916844
fbshipit-source-id: 199a7832f1033c36a9bbcc1e80d86576c04965d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27022
This change implements the "FAST" mode distributed autograd backward
pass as described in https://github.com/pytorch/pytorch/issues/23110.
At a high level the backward pass works as follows:
1. We start by computing dependencies on the node that calls
`torch.distributed.backward`.
2. This node computes the dependencies starting from the root nodes provided in
the backward call and all the 'send' functions present in the current autograd
context. The "FAST" mode assumes all 'send' functions are part of the autograd
computation.
3. Once the dependency computation is done, the distributed autograd engine
calls the local autograd engine to execute the autograd graph. Note that the
autograd graph on a single node is not necessarily connected because of
inter-node communication. As a result, we have special handling to ensure the
local autograd engine executes the entire graph starting from the
provided roots and all 'send' functions on the node.
4. When the local autograd engine hits a 'recv' function, it performs an async
RPC to send the gradients over to the appropriate node and stores a future in
the autograd context to keep track of this RPC.
5. On the destination node, the appropriate 'send' function is looked up and
enqueued on the local autograd engine. If this is the first time the node is
hearing about this autograd context id on the backward pass, then the node
computes dependencies for the local autograd engine.
6. As part of compute dependencies, the distributed autograd engine discovers
all leaf nodes and ensures those are passed as 'outputs' to the local autograd
engine. This avoids running the 'AccumulateGrad' function.
7. The gradients computed for the leaf nodes are then actually accumulated in
`DistAutogradContext` for the appropriate autograd context id.
8. The distributed autograd engine waits for the local autograd engine
to complete and also waits for all the 'Futures' (stored in 4.) for respective
RPCs to finish.
We have made the following changes to the local autograd engine for this
purpose:
1. Expose GraphTask and NodeTask so that the distributed autograd engine can
use them.
2. Expose an `execute_with_graph_task` API which allows the distributed engine
to build a GraphTask and pass it to the local autograd engine.
3. Expose an `enqueue_on_cpu` API, which allows the distributed engine to build
a `NodeTask` for a 'send' function and enqueue it on the local autograd engine.
In addition to this a few general improvements:
1. Added a `PropagateGradients` RPC call for the 'recv' function to pass
gradients to the appropriate node during the backward pass.
2. Use IValues as much as possible in serialization for RpcWithAutograd.
3. If Future.wait() contains a message of type EXCEPTION, we throw an appropriate
exception instead of just returning the message. This is in line with what most
Future.wait() APIs do.
4. Added a `get_gradients(context_id)` API which allows users to retrieve a map
from Tensor to respective gradient for the provided context_id on the local
node.
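At the user level the pieces above combine roughly as follows (a minimal sketch; it assumes `rpc.init_rpc(...)` has already been called on each worker, a peer named "worker1", and the later `backward(context_id, roots)` signature):
```
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc

t = torch.rand(3, 3, requires_grad=True)
with dist_autograd.context() as context_id:
    res = rpc.rpc_sync("worker1", torch.add, args=(t, t))
    loss = res.sum()
    # Kicks off the FAST-mode distributed backward pass described above.
    dist_autograd.backward(context_id, [loss])
    # Gradients are accumulated per context rather than in t.grad.
    grads = dist_autograd.get_gradients(context_id)
```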
ghstack-source-id: 91794926
Test Plan: unit tests.
Differential Revision: D17652615
fbshipit-source-id: 96f65c52adb2706ee29f4b49e1655afaa0a3bec3
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22397
Test Plan:
Added test for reentrant backwards with checkpoint and a test for a recursive backwards function (which should fail if we run all the reentrant tasks recursively in the same thread) and for testing priority of reentrant tasks.
~~Will add a test for priority of reentrant tasks in future pr.~~
Imported from OSS
Differential Revision: D16131955
fbshipit-source-id: 18301d45c1ec9fbeb566b1016dbaf7a84a09c7ac
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.
I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.
I used the following script to do the canonicalization:
```
import subprocess
import re
import os.path
files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
for fn in files:
    if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
        continue
    if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
        continue
    with open(fn, 'r') as f:
        c = f.read()
    def fmt(p):
        return "#include <{}>".format(p)
    def repl(m):
        p = m.group(1)
        if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
            return fmt(p)
        if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
            return fmt(p)
        for root in ["aten/src", "torch/lib", ""]:
            for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                new_p = os.path.relpath(os.path.join(bad_root, p), root)
                if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                    return fmt(new_p)
        print("ERROR: ", fn, p)
        return m.group(0)
    new_c = re.sub(r'#include "([^"]+)"', repl, c)
    if new_c != c:
        print(fn)
        with open(fn, 'w') as f:
            f.write(new_c)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849
Reviewed By: dzhulgakov
Differential Revision: D13363445
Pulled By: ezyang
fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
Summary:
Linting `torch/csrc/` (non-recursive) and `torch/csrc/autograd` (non-recursive).
Fixed things like:
- `typedef` vs `using`
- Use `.empty()` instead of comparing with empty string/using `.size() == 0`
- Use range for loops instead of old style loops (`modernize-`)
- Remove some `virtual` + `override`
- Replace `stdint.h` with `cstdint`
- Replace `return Type(x, y)` with `return {x, y}`
- Use boolean values (`true`/`false`) instead of numbers (1/0)
- More ...
ezyang apaszke cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11050
Differential Revision: D9597505
Pulled By: goldsborough
fbshipit-source-id: cb0fb4793ade885a8dbf4b10484487b84c64c7f2
Summary:
More clang tidy cleanups in `torch/csrc`. This time:
1. `hicpp-use-equals-default` recommends `= default` instead of `{}` for constructors/destructors. This is better practice because it expresses the intent better (https://stackoverflow.com/questions/6502828/what-does-default-mean-after-a-class-function-declaration)
2. `readability-inconsistent-declaration-parameter-name` enforces that parameter names in the declaration match parameter names in the definition. This is just generally useful and can prevent confusion and bugs.
Also updated my script a little bit.
apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9737
Differential Revision: D9069069
Pulled By: goldsborough
fbshipit-source-id: f7b3f3a4eb4c9fadc30425a153566d3b613a41ae
* Factor python dependency out of interpreter
* Remove NO_PYTHON for the autograd engine
If there are no python bindings, then a default Engine is constructed
the first time it is requested.
If the python libraries are loaded, then they override the default
accessor and the default engine becomes a python Engine.
Note: it is possible for two engines to be generated if a non-python
one gets created before the python bindings are loaded. This case
is rare, and just results in additional threads being spawned.
* Fixing AlexNet test which is skipped in CI
* Add backward() to Tensor and Variable
* Add at:: in front of Tensor
* Trying to not move optional to appease windows?
* Move implementation into cpp file
* Undo some formatting changes
* Autograd container for trading compute for memory
* add a unit test for checkpoint
* address comments
* address review comments
* adding some docs for the checkpoint api
* more comments
* more comments
* repro bug
* Fix a subtle bug/apply some review comments
* Update checkpoint.py
* Run everything in grad mode
* fix flake and chunk=1
* use imperative backward as per discussion
* remove Variable and also add models and test for models
* Add a simple thread local variable to check for autograd grad mode
* remove models and models test after debugging
* address review comments
* address more comments
* address more comments
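For reference, the resulting user-facing API is `torch.utils.checkpoint`; a minimal usage sketch:
```
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    return torch.relu(x @ x)

x = torch.randn(16, 16, requires_grad=True)
# Activations inside `block` are not stored; they are recomputed during
# backward, trading compute for memory.
y = checkpoint(block, x)
y.sum().backward()
```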
This PR adds the possibility to build the C++ parts of autograd and jit, with no dependency on Python.
The goal is to allow taking a PyTorch IR representation (a tree s-expr) and running it with provided inputs.
Prerequisite: build PyTorch so that codegen runs once.
Instructions:
cd tools/cpp_build
bash build_all.sh
This will build libtorchjit and torchjit_test in tools/cpp_build/build/torchjit-build. The latter basically runs the code in test_jit.cpp for now.
While writing the PR, it turned out that a few of Python.h includes were redundant. They were removed here (PyTorch tests still pass on my machine, we'll see CI).
* Introduce Python-free builds of autograd and jit
* Remove NO_PYTHON ifdef in functions/special
* Improve Function interface
* Undo tracer changes
* Fix bug in VariableType.set_history
* Rename function_counter and sequence_number to sequence_nr
* Clarify Function documentation
* Replace swap_next_edges with next_edges() getter
* Bring back set_gradient_edge
* Simplify special.cpp
* add_gradient_edge -> create_gradient_edge
* Add mutable getters for pre/post hooks
* Use make_variable with Edge
* Remove remove_gradient_edge in favor of detach_
* Fix documentation and remove create_gradient_edge friend method
* Canonicalize some includes
Previously the side-effect-free grad calculation was performed
using callbacks that could also override the decision to run a
function. However, this had a few problems, e.g. it forced us to iterate
over pretty much all functions in the graph and drop their buffers.
This patch improves the mechanism by adding explicit support for this
kind of evaluation in execute(). It's safer, and the algorithm used to
decide which nodes have to be evaluated was replaced with a faster one.
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().
In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled().
Fixes #3627
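A minimal sketch of the replacement flag from Python:
```
import torch

x = torch.ones(3, requires_grad=True)

with torch.no_grad():
    y = x * 2           # no graph is recorded inside the context
print(y.requires_grad)  # False

torch.set_grad_enabled(False)
z = x * 2
print(z.requires_grad)  # False
torch.set_grad_enabled(True)
```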
Primary things I had to fix:
- Suppress _XOPEN_SOURCE warnings by ensuring that Python.h is included
first, because it always unconditionally defines this macro.
- Turn off strict aliasing, because Python 2 doesn't work with strict
aliasing.
- Workaround setuptools bug, where it's incorrectly passing
-Wstrict-prototypes to C++ compilers (where this doesn't make
any sense)
To compile csrc with -Werror, run `CFLAGS="-Werror" python setup.py build_ext`
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
The core autograd Variable, Function, and Engine no longer depend on the
Python API. This lets us implement functions in C++. In the future, we
can also multithread the engine and release the GIL for most of the
non-Python backwards.