Applies some more harmless pyupgrade fixes. This one removes deprecated aliases in unit tests and upgrades more `yield` for-loops into `yield from` generator delegation, which is more performant and propagates more information and exceptions from the original generator. This is the modern recommended way of forwarding generators.
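For illustration, a minimal before/after of the `yield from` upgrade (the `read_lines` helper is hypothetical):
```python
def read_lines(path):
    # Hypothetical helper: lazily yield lines from a file.
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

# Before: forwarding the inner generator with an explicit loop.
def read_all_old(paths):
    for p in paths:
        for line in read_lines(p):
            yield line

# After: `yield from` delegates to the inner generator, also forwarding
# send()/throw() and propagating the generator's return value.
def read_all(paths):
    for p in paths:
        yield from read_lines(p)
```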
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94309
Approved by: https://github.com/albanD
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48021
Extends the operator schema check from simple memonger to dag memonger as well. As part of this, a fix is made to handle in-place ops (ops with at least one output name the same as an input blob). Earlier, all output blobs from ops were treated as shareable, which failed the assertion that external input blobs with the same name are not allowed to share.
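A minimal sketch of the in-place check described above (hypothetical op structs with `.input`/`.output` lists, not the actual caffe2 code):
```python
# An op is in-place if at least one output name matches an input blob.
def is_inplace(op):
    inputs = set(op.input)
    return any(out in inputs for out in op.output)

# Outputs that alias an input (or an external input) must not be
# treated as shareable by the memonger.
def shareable_outputs(op, external_inputs):
    blocked = set(op.input) | set(external_inputs)
    return [out for out in op.output if out not in blocked]
```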
Test Plan: Added corresponding unit tests
Reviewed By: hlu1
Differential Revision: D24968862
fbshipit-source-id: b6679a388a82b0d68f65ade64b85560354aaa3ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47718
Distributed Inference splits a predict net into multiple parts, part0 being the main part containing ops that make remote calls to the other parts. The part0 predict net may contain AsyncIf ops to optimize rpc call usage. AsyncIf ops have internal nets which may refer to memongered blobs. This change handles AsyncIf ops by updating their internal nets to refer to the memongered blobs.
As part of this change, I am also updating the dag memonger traversal to always start from root ops, i.e. ops with in-degree 0. The earlier logic started traversing ops from the input head blobs, and if one of the head inputs was used by a non-root op that got visited before its parent, the traversal would throw an assertion error here: https://fburl.com/diffusion/ob110s9z . Almost all distributed inference part0 nets were hitting this assertion error.
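A minimal sketch of the new starting point (hypothetical op structs, not the caffe2 implementation):
```python
# A root op has in-degree 0: none of its inputs is produced by a
# *different* op in the net, so traversal starting from roots always
# visits parents before children.
def find_root_ops(ops):
    producers = {}  # blob name -> indices of ops that write it
    for i, op in enumerate(ops):
        for blob in op.output:
            producers.setdefault(blob, set()).add(i)
    return [op for i, op in enumerate(ops)
            if not any(producers.get(blob, set()) - {i} for blob in op.input)]
```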
Test Plan: Added corresponding tests in memonger_test.py. Could not find unit tests for the C++ version of memonger.
Reviewed By: hlu1
Differential Revision: D24872010
fbshipit-source-id: 1dc99b2fb52b2bc692fa4fc0aff6b7e4c5e4f5b0
Summary:
Distributed Inference splits a predict net into multiple parts, part0 being the main part containing ops that make remote calls to the other parts. The part0 predict net may contain AsyncIf ops to optimize rpc call usage. AsyncIf ops have internal nets which may refer to memongered blobs. This change handles AsyncIf ops by updating their internal nets to refer to the memongered blobs. Here is one reference part0 predict net with AsyncIf ops: https://www.internalfb.com/intern/paste/P145812115/
As part of this change, I am also updating the dag memonger traversal to always start from root ops, i.e. ops with in-degree 0. The earlier logic started traversing ops from the input head blobs, and if one of the head inputs was used by a non-root op that got visited before its parent, the traversal would throw an assertion error here: https://fburl.com/diffusion/ob110s9z . Almost all distributed inference part0 nets were hitting this assertion error.
Reviewed By: hlu1
Differential Revision: D24346771
fbshipit-source-id: ad2dd2e63f3e822ad172682f6d63f8474492255d
Summary:
The `2to3` tool's `future` fixer specifically removes these redundant `from __future__` imports; the `caffe2` directory has the most of them:
```
2to3 -f future -w caffe2
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033
Reviewed By: seemethere
Differential Revision: D23808648
Pulled By: bugra
fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
Summary:
Goal of this PR is to unify cuda and hip device types in caffe2 python front end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221
Differential Revision: D13148564
Pulled By: bddppq
fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b
Summary:
Basically:
- more generator vs. list changes.
- a difference in the return type of bellman_ford(); see _get_path. 2.x returns a list.
- nx 2 removed nbunch from topological_sort, so we will need to manually use lexicographical_topological_sort with an explicit key derived from the source node order, as sketched below.
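A minimal sketch of that workaround under networkx 2.x (the toy graph is illustrative only):
```python
import networkx as nx

# Key the sort on the original node order to get a deterministic
# topological order in the spirit of nx 1.x's nbunch-seeded traversal.
g = nx.DiGraph([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
order = {node: i for i, node in enumerate(g.nodes())}
print(list(nx.lexicographical_topological_sort(g, key=order.get)))
# ['a', 'b', 'c', 'd']
```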
Closes https://github.com/caffe2/caffe2/pull/1243
Reviewed By: ajtulloch
Differential Revision: D5883195
Pulled By: Yangqing
fbshipit-source-id: 215d01fdd026d3af1a11ff866bf835e104370e4c
Summary:
This fixes the apparent discrepancy (list vs. iterator). After this, there are still 3 failures regarding topological sort, but that seems a bit involved. Someone should look deeper.
Closes https://github.com/caffe2/caffe2/pull/1242
Reviewed By: akyrola
Differential Revision: D5881806
Pulled By: Yangqing
fbshipit-source-id: 5a200010724befde2fa8ce1b61a9c1ba42cad46a
Summary:
release_blobs_when_used() analyzes when a blob is output for the last time and inserts a Free op after that point, unless the blob is aliased.
memonger.estimate_memory_usage() does a static memory analysis based on shape inference. See experimental/akyrola/test.py for example use.
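A minimal sketch of the last-use analysis behind release_blobs_when_used() (hypothetical structures, not the actual implementation):
```python
# Find the last op at which each blob appears; a Free op can be
# inserted right after that position, skipping aliased blobs.
def last_use_positions(ops, aliased=frozenset()):
    last_use = {}
    for idx, op in enumerate(ops):
        for blob in list(op.input) + list(op.output):
            if blob not in aliased:
                last_use[blob] = idx
    return last_use  # {blob: op index after which blob can be freed}
```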
Reviewed By: asaadaldien
Differential Revision: D5729199
fbshipit-source-id: 527a5152dbd4ef3bbe28b776c29163fff25f700a
Summary: When we ported memonger to C++ in D5544219, we forgot to include the special handling of RecurrentNetwork ops. This fixes that and adds a test.
Reviewed By: asaadaldien
Differential Revision: D5692407
fbshipit-source-id: 4e739b5dd6c7298303eee9bfa1aa4d19359eb7b5
Summary: This test was failing on non-GPU builds because it refers to operator CopyGPUToCPU. Thanks pietern for catching this.
Reviewed By: asaadaldien
Differential Revision: D5698763
fbshipit-source-id: 0bde0f3e99c58647dba2ea6da4d51938e763d10c
Summary:
Enforce that blobs don't mix between operators on different GPUs or CPU/GPU. Add test.
+ Fix memonger when no namescope is provided.
Reviewed By: asaadaldien
Differential Revision: D5644708
fbshipit-source-id: 0cb361efd6361b6e2138462584bab6b4de039b5d
Summary: Fix a bug reported by dzhulgakov that occurs when an input blob is used twice in the same op: it was released to the recycled-blob pool twice.
Reviewed By: dzhulgakov, volkhin
Differential Revision: D5414023
fbshipit-source-id: 861bb46fe901023cb9a496401736e6ecb77d5fae
Summary:
To be used with the predictor "online": a C++ version of memonger for simple nets. It uses a very simple greedy algorithm and works well at least on the Resnet-50 inference graph, where only 3 shared blobs are used.
Next I will integrate this with predictor and run canary (separate diff).
Reviewed By: asaadaldien
Differential Revision: D5375392
fbshipit-source-id: d36e419e39a32e568e105657c27fb00c85a2535d
Summary: Memonger had a bug where it crashed if a blob was an input to multiple ops. This fixes that and adds a test.
Reviewed By: asaadaldien
Differential Revision: D5374860
fbshipit-source-id: 1d5044001eacdbe6db43f69727da9297558f5c5c
Summary: Let's try this again. Verify graphs every time memonger is run. Will definitely check the timing, though.
Reviewed By: akyrola
Differential Revision: D5308188
fbshipit-source-id: 512a76c759b670d31c49d1d492dd8ee1eaf3bafd
Summary:
compute_interference_graph() was not able to handle the case where a blob name is reused for operators supporting in-place computation. For example, in the following network with operators Mul and Sub
(blob) -> [Mul] -> (blob) -> [Sub] -> (blob)
an incorrect edge was added from [Sub] to [Mul], causing nx.is_directed_acyclic_graph() to fail.
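A small illustration of that failure mode with networkx:
```python
import networkx as nx

g = nx.DiGraph()
g.add_edge("Mul", "Sub")                 # correct data dependency
g.add_edge("Sub", "Mul")                 # spurious edge from the bug
print(nx.is_directed_acyclic_graph(g))   # False -> the assertion fails
```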
Reviewed By: ajtulloch
Differential Revision: D5271604
fbshipit-source-id: f6095b6f8e1dba556ba223a82c8170be7f744529
Summary: Make verify_graph_equality get called by share_grad_blobs and optimize_inference_for_dag
Reviewed By: akyrola
Differential Revision: D5288993
fbshipit-source-id: b9f105ce00148b2673eed2dd390ab74f82f990ad
Summary:
Since D5193393 introduced a "token" system for memonger that prevents sharing of blobs across parallel branches, we can be more aggressive in blob sharing. Thus, this removes the tracking of 'unused free blobs' and just relies on the token system.
For forward-only resnet50, this reduces the number of shared blobs to 5 (optimal according to akirillov's calculation).
This requires careful testing, so I will not land it soon.
Reviewed By: asaadaldien
Differential Revision: D5208985
fbshipit-source-id: 2e520c4ea2351a2ec327b6c5f2e3af24234d1c9a
Summary:
We want to make sure that a graph optimized by memonger doesn't have any possibility of two threads writing into the same output blob at the same time, when blobs are renamed.
Creates a graph whose edges are built such that a parent node's output blob is a child node's input blob, with no node between the parent and child that writes to the same blob. If two nets generate the same such graph, then the "path" of data is the same.
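A hedged sketch of that graph construction (hypothetical op dicts, not the caffe2 implementation):
```python
import networkx as nx

def data_path_graph(ops):
    g, last_writer = nx.DiGraph(), {}
    for i, op in enumerate(ops):
        g.add_node(i, op_type=op["type"])
        for blob in op["inputs"]:          # edge: producer -> consumer
            if blob in last_writer:
                g.add_edge(last_writer[blob], i)
        for blob in op["outputs"]:
            last_writer[blob] = i
    return g

# Two nets are considered equivalent if their data-path graphs match,
# regardless of how memonger renamed the blobs.
def graphs_equal(ops_a, ops_b):
    a, b = data_path_graph(ops_a), data_path_graph(ops_b)
    return nx.is_isomorphic(
        a, b, node_match=lambda x, y: x["op_type"] == y["op_type"])
```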
Reviewed By: akyrola
Differential Revision: D5210385
fbshipit-source-id: 6317fc4e16289339b50c2dcd86ec8b32d2d544a5
Summary: Also fixed a small bug in ModelHelper constructor
Reviewed By: harouwu
Differential Revision: D5246799
fbshipit-source-id: 3719ca078f0e2b5e463fc93da9c8215f5583bd9a
Summary:
This diff fixes various issues with memonger, and works at least with rbgirshick's failure case, Resnet-50, and a new, harder unit test. I will still create a proper resnet50 test.
1) Introduce the concept of "tokens" (see the sketch after this list). These are passed down the dependency chains, and a blob can be used for recycling only if it owns all the tokens currently in possession. Tokens are added when branching, and tokens are redeemed after all inputs are satisfied. A bit hard to explain.
2) There were various bugs due to bad code: the free_blobs data structure has a different type depending on whether blob sizes are available. I plan to rewrite this soon. But there were some bugs.
3) Added a harder unit test that failed before.
4) Added a test for resnet50 + memonger
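A very rough sketch of the token rule from point 1 (hypothetical names, not the caffe2 implementation):
```python
def can_recycle(blob_tokens: set, op_tokens: set) -> bool:
    # A free blob may be recycled by an op only if it carries every
    # token currently live on that op's path, so blobs never cross
    # parallel branches.
    return op_tokens.issubset(blob_tokens)

assert can_recycle({"root", "branch_a"}, {"root", "branch_a"})
assert not can_recycle({"root"}, {"root", "branch_b"})
```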
Reviewed By: asaadaldien
Differential Revision: D5193393
fbshipit-source-id: bc2a714877aa1201c32a5ba8ade862865e455711
Summary:
Failure mode:
```
- 7 passing examples, 0 failing examples, 0 invalid examples
- Typical runtimes: 12-14987 ms
- Stopped because settings.timeout=60
```
After this change:
```
- 5 passing examples, 0 failing examples, 0 invalid examples
- Typical runtimes: 12-15475 ms
- Stopped because settings.max_examples=5
```
Obviously, the `DYNAMIC_PROGRAMMING` tests are the troublemakers. An alternate solution would be to make separate tests for the two assignment algorithms (one fast, one slow).
Closes https://github.com/caffe2/caffe2/pull/676
Differential Revision: D5147363
Pulled By: akyrola
fbshipit-source-id: 85d9f8198e53c10de2a8d6645e2b0eb7953c96e0
Summary:
D5116828 changed how in-place ops are handled in memonger and fixed a crash in NeuralMT. However, it still produced an incorrect memongerization, because an op with one in-place input-output pair but another non-in-place output would still be handled incorrectly, as the other output's branch would not be followed properly.
This is fixed by removing the whole in-place op special handling. It is actually no longer needed; it was left over from an older version of memonger that used a topological sort of the ops.
Reviewed By: asaadaldien
Differential Revision: D5128142
fbshipit-source-id: b551b0faebdde410e6bd7516958c63cf610cc065
Summary: Memonger ignores ops with an in-place input and output, but did not work correctly if there were also non-in-place inputs, as with Mul. Simple fix to also look at in-placeness during the traversal.
Reviewed By: jhcross
Differential Revision: D5116828
fbshipit-source-id: 52817f1221597986cc09cc65d094417c1923d965
Summary:
Added optional support for using activation blobs for sharing as well. Making this change revealed a non-optimal implementation in the blob sharing: we need to prefer reusing free blobs that are already shared by many other blobs; otherwise memory usage can increase as the pool of 'free blobs' grows.
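A minimal sketch of that preference (hypothetical helper, not the caffe2 code):
```python
def pick_free_blob(free_blobs, share_counts):
    # Among recyclable free blobs, pick the one already shared by the
    # most other blobs, keeping the pool of distinct shared blobs small.
    return max(free_blobs, key=lambda b: share_counts.get(b, 0))
```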
Also, my first version only passed "free blobs" (i.e. blobs in the recycling pool) down the first branch when operators forked. Now we pass the blobs that were not used by the first branch down the second branch, and so on.
Also added support for blob size information in the heuristic. This uses the shape inference mechanism.
I also had to make some small tweaks:
- use the Sum() operator as a way to match shapes of blobs that otherwise had unknown shapes. This is related to the Sum() operator that is added to combine multiple incoming gradient inputs (with _autosplit gradients).
- a couple of random shape inference fixes
This reduces the Resnet-50 memory usage on 64 batch from 9.45 Gig to 8.5 Gig.
For a 32 batch, the memory usage is 4330 MiB, down from 4800 MB, compared to Torch's 6856MiB (thanks prigoyal for checking this for me).
This is unfortunately quite a bunch to review...
Reviewed By: asaadaldien
Differential Revision: D4393909
fbshipit-source-id: 9c7c94125f96512bea80463ebcb63c215ef95ff9
Summary: Memonger's inference optimization is very efficient, but does not work if a multi-threaded DAG net is used. So I added this alternative, which shares code with the gradient memonger and does the blob recycling by traversing the DAG while ensuring that blobs do not cross parallel branches.
Reviewed By: viswanathgs
Differential Revision: D4884303
fbshipit-source-id: dfd0a6ecdb91f4edbb0b743729c92f4cd015602e
Summary:
Added a DP + recursion algorithm for finding blob assignments based on blob sizes. This algorithm gives optimal assignments. See comments for details.
The algorithm is not used by default; set algo=memonger.AssignmentAlgorithm.DYNAMIC_PROGRAMMING and provide blob_sizes in optimize_interference() to use it (see the usage sketch below). The blob sizes can be retrieved by running the net once and then calling blob_sizes = memonger.collect_blob_sizes(net). All blob sizes are assumed to be 1 if blob_sizes is not provided; in that case, using algo=memonger.AssignmentAlgorithm.GREEDY may be better.
Testing on the segmentation model, memory usage is reduced by 19% (14.96MB to 12.08MB) compared to the greedy algorithm (without considering the conv shared buffer). The algorithm runs in 15s for the model with 55 sharable blobs.
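A hedged usage sketch based on the names mentioned above, assuming an existing core.Net called `net`; exact signatures and return types vary across caffe2 versions:
```python
from caffe2.python import memonger, workspace

# Run the net once so blobs (and hence their sizes) exist in the workspace.
workspace.RunNetOnce(net)
blob_sizes = memonger.collect_blob_sizes(net)

result = memonger.optimize_interference(
    net,
    static_blobs=[],   # blobs that must not be shared (e.g. inputs/outputs)
    blob_sizes=blob_sizes,
    algo=memonger.AssignmentAlgorithm.DYNAMIC_PROGRAMMING,
)
# Depending on the caffe2 version, the optimized net may be returned
# directly or wrapped in a result object (e.g. result.net).
```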
Reviewed By: ajtulloch
Differential Revision: D4818476
fbshipit-source-id: 606936f4cf2715408d60b9a5cf3bcaf1985a0fec
Summary: Used blob sizes for finding assignments in a greedy way.
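A rough sketch of size-aware greedy assignment under stated assumptions (an event-list model with hypothetical names, not the caffe2 implementation):
```python
import bisect

def greedy_assign(events, sizes):
    # events: sequence of ('alloc', blob) / ('free', blob) in execution
    # order; sizes: blob name -> byte size.
    free = []        # sorted list of (slot_size, slot_id) available for reuse
    slot_size = []   # slot_id -> size
    assignment = {}  # blob -> slot_id
    for kind, blob in events:
        if kind == "free":
            sid = assignment[blob]
            bisect.insort(free, (slot_size[sid], sid))
        else:
            i = bisect.bisect_left(free, (sizes[blob], -1))
            if i < len(free):
                _, sid = free.pop(i)          # smallest free slot that fits
            else:
                sid = len(slot_size)          # no fit: open a new slot
                slot_size.append(sizes[blob])
            assignment[blob] = sid
    return assignment, slot_size
```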
Reviewed By: ajtulloch
Differential Revision: D4818159
fbshipit-source-id: 89180a6117ba5be058e1d2f9488b06d618e91917
Summary:
Added an ordering function (topological_sort_traversal_longest_path()) to reduce the live spans of computed blobs. The idea is to sort the ops based on the length of their execution paths so that ops on longer paths are scheduled first.
Tested on the segmentation model with an on-the-fly decoder: memory usage dropped from 21.7MB to 14MB (the original size is 33MB with compressed parameters and without considering the conv buffer), compared to using topological_sort_traversal() as the ordering function.
It is a general ordering function so I put it in memonger.py directly.
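A minimal sketch of the longest-path scoring idea (not the actual caffe2 implementation):
```python
import networkx as nx

# Score each op by the longest execution path that runs through it,
# then topologically sort preferring higher scores, so ops on long
# paths are scheduled first and live spans shrink.
def longest_path_scores(g):
    topo = list(nx.topological_sort(g))
    depth = {n: 0 for n in g}             # longest path from a root to n
    for n in topo:
        for s in g.successors(n):
            depth[s] = max(depth[s], depth[n] + 1)
    height = {n: 0 for n in g}            # longest path from n to a leaf
    for n in reversed(topo):
        for s in g.successors(n):
            height[n] = max(height[n], height[s] + 1)
    return {n: depth[n] + height[n] for n in g}

g = nx.DiGraph([(0, 1), (1, 2), (2, 3), (0, 4), (4, 3)])
scores = longest_path_scores(g)
print(list(nx.lexicographical_topological_sort(g, key=lambda n: -scores[n])))
# [0, 1, 2, 4, 3]: the length-3 path through 1 and 2 is scheduled first
```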
Reviewed By: ajtulloch
Differential Revision: D4790135
fbshipit-source-id: e661b45c1640de44ce1a9fdd009a4fba38f8e042
Summary:
This diff brings us to roughly par with Torch on ResNet memory usage. On batch size 32, Resnet-50 took 7497MiB, after this 5010 MiB. This will thus allow us to handle 64 images / GPU, or 256 images / 4 GPUs.
In addition, I added a special argument to DagNet that causes it to run only one thread for the first iteration. This is needed since there are allocations on the first iteration's backward pass due to gradient sharing, which would otherwise cause NCCL to deadlock.
The sharing of gradient buffers requires inferring which gradients can share memory (i.e that they are not used concurrently). Previous memonger code uses topological sort, but rbgirshick showed that it does not work with tree-like models. Thus, I wrote a new optimization algorithm based on DFS. It takes about 0.25 secs / GPU on resnet-50, so is clearly fast enough.
Module data_parallel_model supports this feature natively.
Reviewed By: prigoyal
Differential Revision: D4363209
fbshipit-source-id: 73b11e7610438098bb11bff0af8075ab0cf2c0f1