Commit Graph

50 Commits

Author SHA1 Message Date
Nikita Shulga
1906eaf22f [BE] Get rid of future (#92596)
PyTorch has been Python-3.X+ for ages, so it's a shame to still rely on `future.utils` even in a deprecated Caffe2 codebase

For the reference:
https://peps.python.org/pep-0469/#migrating-directly-to-python-3

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92596
Approved by: https://github.com/kit1980, https://github.com/orionr
2023-01-19 08:46:50 +00:00
Tongliang Liao
198d727d01 Remove trailing semicolon. (#74031)
Summary:
Resolve https://github.com/pytorch/pytorch/pull/24388#discussion_r823210924

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74031

Reviewed By: ezyang

Differential Revision: D34820695

Pulled By: soulitzer

fbshipit-source-id: a42ff3a98aae25bda37680b6e1a8d5d6f0468ba4
(cherry picked from commit d428b4f2f8a2af18561e45fecc6617bbc023b68e)
2022-03-13 16:25:42 +00:00
Tongliang Liao
adae0d35d2 RNN args renaming in memonger.
RNN ops may contain link_internal/link_external and alias_src/alias_dst arguments.
They should be renamed together with the input/output blobs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/24388
Approved by: https://github.com/ezyang
2022-03-09 20:21:33 +00:00
Avinash Nagaraj Bukkittu
70a09d97d1 Use nodes instead of node
Summary: `networkx 2.4+` renamed the `node` attribute of graph objects to `nodes`. This caused failures in `caffe2`'s `topological_sort_traversal_longest_path` function, which uses the networkx library for topological sorting.

Differential Revision: D27718857

fbshipit-source-id: 812fbb613946565d089cc84a20f3cdf7df046e19
2021-04-13 10:45:35 -07:00
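A minimal illustration of the API change the commit above works around; the graph and the `order` attribute are invented for the example:

```python
import networkx as nx

# Before networkx 2.4 the per-node attribute dict was also exposed as
# `G.node`; 2.4 removed that alias, leaving only `G.nodes`, which exists
# throughout the 2.x series.
G = nx.DiGraph()
G.add_edge("op_a", "op_b")

G.nodes["op_a"]["order"] = 0  # `G.node["op_a"]` raises AttributeError on 2.4+
```

Code written against `G.nodes` runs on every 2.x release, which is why the fix is a pure rename.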
Bugra Akyildiz
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
There is a tool called `2to3` whose `future` fixer removes these specifically; the `caffe2` directory has the most redundant imports:

```2to3 -f future -w caffe2```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
Tao Wu
08c3339e7c [pyfi] override TP2 networkx -> PyFI networkx (#37764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37764

Auto-generated diff for TP2->PyFI migration.

```
networkx
  TP2 version: 2.0
  PyFI active wheels (networkx):
    py2-darwin           -> 2.3
    py2-platform007      -> 2.2
    py3-darwin           -> 2.3
    py3-platform007      -> 2.3
    py3.7-platform007    -> 2.3
```

#buildmore

excited_python

Test Plan: buildallthethings

Reviewed By: thatch

Differential Revision: D19790867

fbshipit-source-id: d6f893beee794df5408a5117978b534cafc6ec83
2020-05-11 13:20:00 -07:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Tongliang Liao
4f254c3c33 Fix typo "properlyh"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24067

Differential Revision: D16732526

Pulled By: ezyang

fbshipit-source-id: 0f3a5b53c0e46bd40a6e5c838504301766c00a82
2019-08-09 11:43:04 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Dmytro Dzhulgakov
2972a6ca02 Revert D6026557: [caffe2][PR] Fix "No handlers could be found for logger"
Summary:
This reverts commit 95c634872ac02be721257169e38c8fead04cd66b

bypass-lint

Differential Revision: D6026557

fbshipit-source-id: 663c28583ce3b01070ff5449115ed7e222f71776
2017-10-12 20:21:52 -07:00
Luke Yeager
75bece6ede Fix "No handlers could be found for logger"
Summary: Closes https://github.com/caffe2/caffe2/pull/1316

Differential Revision: D6026557

Pulled By: Yangqing

fbshipit-source-id: 95c634872ac02be721257169e38c8fead04cd66b
2017-10-10 22:32:13 -07:00
Andrey Malevich
e13f199452 Switch RNNOp to use NetDef argument for step representation.
Summary: Before this diff, RNNOp used TextFormat to represent steps. This diff changes RNNOp to prefer a NetDef argument instead. To stay backward compatible it still supports TextFormat for existing models, though we can now compile RNNs without TextFormat as well.

Reviewed By: salexspb

Differential Revision: D5949330

fbshipit-source-id: 9336a8f5ccf30ad8d8e3a7067b9437e1704b1c9f
2017-10-10 22:01:51 -07:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Yangqing Jia
85b08f1b99 Trying to fix all networkx 2 issues.
Summary:
Basically:

- more generator vs list changes.
- difference in the return type of bellman_ford(), see _get_path. 2.x returns list.
- nx 2 removed nbunch in topological_order, so we will need to manually use lexicographical_topological_sort with an explicit key derived from the source node order.
Closes https://github.com/caffe2/caffe2/pull/1243

Reviewed By: ajtulloch

Differential Revision: D5883195

Pulled By: Yangqing

fbshipit-source-id: 215d01fdd026d3af1a11ff866bf835e104370e4c
2017-09-21 16:01:47 -07:00
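The `nbunch` replacement mentioned in the commit above can be sketched as follows; the `source_order` map standing in for "the source node order" is hypothetical:

```python
import networkx as nx

def stable_topo_sort(graph, source_order):
    """Deterministic topological order under networkx 2.x.

    nx 2 removed the `nbunch` argument of topological_sort, so ties between
    ready nodes are broken with lexicographical_topological_sort and an
    explicit key derived from the original node order.
    """
    return list(nx.lexicographical_topological_sort(
        graph, key=lambda n: source_order[n]))

# "a" and "b" are both ready first; the key makes "b" win deterministically.
G = nx.DiGraph([("a", "c"), ("b", "c")])
order = stable_topo_sort(G, {"b": 0, "a": 1, "c": 2})
```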
Yangqing Jia
84182b1853 Partially fix memonger with networkx 2.0
Summary:
This fixes the apparent discrepancy (list vs iterator). After this, there are still 3 failures regarding topological sort but that seems a bit involved. Someone shall look deeper.
Closes https://github.com/caffe2/caffe2/pull/1242

Reviewed By: akyrola

Differential Revision: D5881806

Pulled By: Yangqing

fbshipit-source-id: 5a200010724befde2fa8ce1b61a9c1ba42cad46a
2017-09-21 10:24:41 -07:00
Aapo Kyrola
3ff351fc89 insert Free ops when blob used last time + memory allocation estimator
Summary:
release_blobs_when_used() analyzes when a blob is output for the last time, and inserts a Free op after that point, unless the blob was aliased.
memonger.estimate_memory_usage() does a static memory analysis based on shape inference. See experimental/akyrola/test.py for example use.

Reviewed By: asaadaldien

Differential Revision: D5729199

fbshipit-source-id: 527a5152dbd4ef3bbe28b776c29163fff25f700a
2017-09-05 12:03:04 -07:00
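The last-use analysis described above can be sketched roughly like this; the `(op_type, inputs, outputs)` triples and the `"Free"` marker are simplified stand-ins for Caffe2's OperatorDef, not its real API:

```python
def last_use_points(ops):
    """Index of the last op that reads or writes each blob.

    `ops` is a list of (op_type, inputs, outputs) triples -- a simplified
    stand-in for Caffe2's OperatorDef.
    """
    last = {}
    for i, (_, inputs, outputs) in enumerate(ops):
        for blob in inputs + outputs:
            last[blob] = i
    return last

def insert_free_ops(ops, keep):
    """Append a hypothetical ("Free", [blob], []) marker right after the
    last use of every blob not in `keep` (e.g. aliased or external blobs)."""
    last = last_use_points(ops)
    result = []
    for i, op in enumerate(ops):
        result.append(op)
        for blob, j in last.items():
            if j == i and blob not in keep:
                result.append(("Free", [blob], []))
    return result

ops = [("Conv", ["x"], ["y"]), ("Relu", ["y"], ["z"])]
optimized = insert_free_ops(ops, keep={"x", "z"})
```

Here `y` dies after the Relu, so a Free marker lands right behind it, while the kept input and output are untouched.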
Aapo Kyrola
7fad4be4c6 Device-specific memongering
Summary:
Enforce that blobs don't mix between operators on different GPUs or CPU/GPU. Add test.

+ Fix memonger when no namescope is provided.

Reviewed By: asaadaldien

Differential Revision: D5644708

fbshipit-source-id: 0cb361efd6361b6e2138462584bab6b4de039b5d
2017-08-17 13:31:26 -07:00
Ahmed Taei
a0fe96d7cd Rewrite memonger DAG in C++.
Summary: This diff replaces the core of the DAG memonger algorithm, _compute_blob_recycling_for_dag, with a C++ implementation.

Reviewed By: akyrola

Differential Revision: D5544219

fbshipit-source-id: 9f868880c8d0eb997ad3dd39433f9d0b9216d303
2017-08-16 16:17:15 -07:00
Aapo Kyrola
c05c500a82 check _grad suffix
Summary:
Memonger had a subtle bug which caused it to recycle the "splitinfo" outputs of Concat/Split. That is bad since they live on the CPU device, and it would cause them to be reallocated. This caused a big slowdown with Kaiming's trainer.

The bug was that we checked for gradients by looking for "_grad" anywhere in the name, although we should only allow it as a suffix. Admittedly, string checking is not elegant in the first place, but that is how Caffe2 works now.

Reviewed By: asaadaldien

Differential Revision: D5627251

fbshipit-source-id: c12be2323109bf81c3725d8884c7ef024e010bd5
2017-08-14 19:47:59 -07:00
Aapo Kyrola
8079abbaf1 fix traversal order
Summary: Memonger did not properly track the number of times a blob output has to be produced before an operator can be visited. Actually, I remember fixing this before, but well. This bug manifested in Priya's model (so thanks, prigoyal), and benz's model verifier nicely caught the wrong output.

Reviewed By: asaadaldien

Differential Revision: D5524912

fbshipit-source-id: 10f4d7056b84aba0274a918af508ea043e6026f9
2017-07-30 21:47:48 -07:00
Aapo Kyrola
baef769035 add code comments to memonger
Summary: Add some comments to dag-memonger to help asaadaldien with his C++ port.

Reviewed By: asaadaldien

Differential Revision: D5435459

fbshipit-source-id: dd5d482efb017418d22f42ee79fbd4668bd31bdd
2017-07-17 13:07:33 -07:00
Aapo Kyrola
192e0546bf fix for back-and-forth models, pass reference instead of copy
Summary:
akirillov again presented me with a memonger bug: his model, which has a kind of 'back-and-forth' structure where blobs are passed left and right in a ladder-like pattern, revealed a bug in memonger: I should pass the set of free blobs as a reference, not a copy, so that the recyclings are properly accounted for. Hard to explain.

Since we have the graph verifier, we can be more confident with these changes.

I also added some helpful debug to the graph verifier.

Differential Revision: D5396925

fbshipit-source-id: 0bffb3a0bf8532afcd6b5bc9331c779768a8c5c5
2017-07-11 10:52:14 -07:00
Aapo Kyrola
ad62e82179 fast simple-net memonger for C++
Summary:
To be used with predictor "online": C++ version of memonger for simple nets. Very simple greedy algorithm. Works well at least on Resnet-50 inference graph: only 3 shared blobs are used.

Next I will integrate this with predictor and run canary (separate diff).

Reviewed By: asaadaldien

Differential Revision: D5375392

fbshipit-source-id: d36e419e39a32e568e105657c27fb00c85a2535d
2017-07-06 15:17:07 -07:00
Aapo Kyrola
21ba0ff560 small fix to when input blob is input to multiple ops
Summary: Memonger had a bug where it crashed if an input blob was the input to multiple ops. This fixes that and adds a test.

Reviewed By: asaadaldien

Differential Revision: D5374860

fbshipit-source-id: 1d5044001eacdbe6db43f69727da9297558f5c5c
2017-07-05 22:37:26 -07:00
Thomas Dudziak
5355634dac Dict fixes/improvements and unittest targets for Python 3 in caffe2 core
Summary: As title

Reviewed By: salexspb

Differential Revision: D5316104

fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30
2017-06-29 17:05:41 -07:00
Ben Zhang
e128245e8c Move memonger graph equality into memonger
Summary: Let's try this again. Verify graphs every time memonger is run. Will definitely check the timing though.

Reviewed By: akyrola

Differential Revision: D5308188

fbshipit-source-id: 512a76c759b670d31c49d1d492dd8ee1eaf3bafd
2017-06-28 17:36:40 -07:00
Aapo Kyrola
4d16578284 fix + verification for inplace blobs
Summary:
Fixes a memonger bug where it could recycle a blob that was released by the same op being processed.
Added a verification step to ensure in-place assignments are not changed.

Reviewed By: asaadaldien

Differential Revision: D5331495

fbshipit-source-id: 20b08f6de5b973e8c9868aa048c142cac1eb6c58
2017-06-27 13:51:03 -07:00
Ben Zhang
4862c0f47f Memonger in O(blobs)
Summary:
Made them faster.

This should be equivalent to the algorithm akyrola suggested, just with a list (of parents) as an intermediate representation instead of a string.

Reviewed By: akyrola

Differential Revision: D5308133

fbshipit-source-id: c976a513d10e79c157ea803afb99b147e9ea3357
2017-06-26 11:04:13 -07:00
Thomas Dudziak
342de07231 Core unit test fixes for Python 3
Summary: As title

Differential Revision: D5291327

fbshipit-source-id: 7dd9279c53ba55d3422c31973ffcec5705787fdf
2017-06-23 13:22:16 -07:00
Ben Zhang
f937e4bffb Revert D5288993: Memonger Graph Equality into Memonger
Summary: This reverts commit b9f105ce00148b2673eed2dd390ab74f82f990ad

Differential Revision: D5288993

fbshipit-source-id: 8f2e69c0ca21e142eb43b450d0b52ba76a5e429f
2017-06-21 13:45:50 -07:00
Peizhao Zhang
8464ec5c3a Fixed a bug in compute_interference_graph() when using with multiple in-place operators.
Summary:
compute_interference_graph() could not handle the case where a blob is reused twice by operators that support in-place parameters. For example, in the following network with operators Mul and Sub

(blob) -> [Mul] -> (blob) -> [Sub] -> (blob)

an incorrect edge is added from [Sub] to [Mul], which causes nx.is_directed_acyclic_graph() to fail.

Reviewed By: ajtulloch

Differential Revision: D5271604

fbshipit-source-id: f6095b6f8e1dba556ba223a82c8170be7f744529
2017-06-21 12:01:37 -07:00
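The acyclicity issue above can be illustrated with a small sketch: drawing an edge only from the *latest* writer of a blob to each subsequent reader keeps a chain of in-place ops a DAG. The op triples are simplified stand-ins, not compute_interference_graph()'s real input:

```python
def dataflow_edges(ops):
    """Edges from the most recent writer of each blob to the op reading it.

    For (b) -> [Mul] -> (b) -> [Sub] -> (b), tracking only the latest
    writer yields Mul -> Sub and no back edge, so the graph stays acyclic
    even though the blob name is reused in place.
    """
    latest_writer = {}
    edges = set()
    for i, (_, inputs, outputs) in enumerate(ops):
        for blob in inputs:
            if blob in latest_writer:
                edges.add((latest_writer[blob], i))
        for blob in outputs:
            latest_writer[blob] = i
    return edges

ops = [("Mul", ["b"], ["b"]), ("Sub", ["b"], ["b"])]
edges = dataflow_edges(ops)
```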
Ben Zhang
f222e226b4 Memonger Graph Equality into Memonger
Summary: Make verify_graph_equality get called by share_grad_blobs and optimize_inference_for_dag

Reviewed By: akyrola

Differential Revision: D5288993

fbshipit-source-id: b9f105ce00148b2673eed2dd390ab74f82f990ad
2017-06-21 10:09:15 -07:00
Aapo Kyrola
5084ff3b9b improve blob sharing
Summary:
Since D5193393 introduced a "token" system for memonger that prevents sharing of blobs across parallel branches, we can be more aggressive in blob sharing. Thus, this removes the tracking of 'unused free blobs' and just relies on the token system.
For forward-only resnet50, this reduces the number of shared blobs to 5 (optimal according to akirillov's calculation).

This requires careful testing, so I will not land it soon.

Reviewed By: asaadaldien

Differential Revision: D5208985

fbshipit-source-id: 2e520c4ea2351a2ec327b6c5f2e3af24234d1c9a
2017-06-20 12:08:57 -07:00
Ben Zhang
1ec0b89361 Memonger Graph Verifier
Summary:
We want to make sure that a graph optimized by memonger doesn't have any possibility of two threads writing into the same output blob at the same time, when blobs are renamed.

Creates a graph whose edges are built such that a parent node's output blob is a child node's input blob, with no node between the parent and child that writes to the same blob. If two nets generate the same such graph, then the "path" of the data is the same.

Reviewed By: akyrola

Differential Revision: D5210385

fbshipit-source-id: 6317fc4e16289339b50c2dcd86ec8b32d2d544a5
2017-06-19 00:46:32 -07:00
haracejacob
2ec294a8bb Fix a few typos and grammars in comment
Summary:
Fix a few typos and grammars in comment

by using language-check, python library
spell_checker source code is here : https://github.com/17-1-SKKU-OSS/011A/blob/master/spell_checker/spell_checker.py
here is the text file which indicates what things should be fixed :  https://github.com/17-1-SKKU-OSS/011A/tree/master/spell_checker/fix/caffe2
Closes https://github.com/caffe2/caffe2/pull/719

Differential Revision: D5165118

Pulled By: aaronmarkham

fbshipit-source-id: 7fb8ef7a99d03cd5fd2f9ebdb01b9865e90fc37b
2017-06-14 18:22:39 -07:00
Aapo Kyrola
27e01744b2 Probably fixed memonger
Summary:
This diff fixes various issues with memonger, and works at least with rbgirshick's failure case, Resnet-50, and a new harder unit test. I will still create a proper resnet50 test.

1) Introduce the concept of "tokens". These are passed down the dependency chains, and a blob can be used for recycling only if it owns all the tokens currently in play. Tokens are added when branching, and tokens are redeemed after all inputs are satisfied. A bit hard to explain.
2) There were various bugs due to bad code: the free_blobs data structure is of a different type when we have blob sizes than when we don't. I plan to rewrite this soon. But there were some bugs.
3) Added a harder unit test that failed before.
4) Added test for resnet50 + memonger

Reviewed By: asaadaldien

Differential Revision: D5193393

fbshipit-source-id: bc2a714877aa1201c32a5ba8ade862865e455711
2017-06-08 09:19:24 -07:00
Peizhao Zhang
87a12dd355 Caught exception when fetching uninitialized blobs when collecting blob sizes in workspace.
Summary: Caught exception when fetching uninitialized blobs when collecting blob sizes in workspace. Some of the output blobs (like mask output of DropOut when is_test=1) may be nullptr and FetchBlob will fail.

Differential Revision: D5198641

fbshipit-source-id: 45ee26c4cb1c25cc48904e9f7d7c007224c97418
2017-06-07 15:35:32 -07:00
Aapo Kyrola
da6b82b810 fix another bug related to in-place ops --> treat in-place ops like any other
Summary:
D5116828 changed how in-place ops are handled in memonger and fixed a crash in NeuralMT. However, it still produced an incorrect memongerization, because an op with one in-place input/output pair but another non-in-place output would still be handled incorrectly: the other output's branch would not be followed properly.

This is fixed by actually removing the whole in-place op special handling. This actually is not needed anymore, it was leftover from an older version of memonger that used topological sort of the ops.

Reviewed By: asaadaldien

Differential Revision: D5128142

fbshipit-source-id: b551b0faebdde410e6bd7516958c63cf610cc065
2017-05-24 23:32:03 -07:00
Aapo Kyrola
6c511f64cc fix handling of ops with in-place input/output
Summary: Memonger ignores ops with an in-place input and output, but did not work correctly if there were also non-in-place inputs, as with Mul. Simple fix to also look at in-placeness during the traversal.

Reviewed By: jhcross

Differential Revision: D5116828

fbshipit-source-id: 52817f1221597986cc09cc65d094417c1923d965
2017-05-23 18:23:33 -07:00
Aapo Kyrola
f82a510be6 share forward activation blobs + pass unused free blobs down all branches + use shape inference
Summary:
Added optional support for sharing activation blobs as well. Making this change revealed a non-optimal implementation in the blob sharing: we need to prefer reusing free blobs that are already shared by many other blobs. Otherwise memory usage can increase as the pool of 'free blobs' grows.

Also, my first version only passed "free blobs" (i.e. blobs in the recycling pool) down the first branch when operators forked. Now we pass the blobs that were not used by the first branch down the second branch, and so on.

Also added support for blob size information in the heuristic. This uses the shape inference mechanism.

I had to also do some small tweaks:
- use Sum() operator as a way to match shapes of blobs that had otherwise unknown shapes. This is related to the Sum() operator that is added to combine multiple incoming gradient inputs (with _autosplit gradients).
- a couple of random shape inference fixes

This reduces the Resnet-50 memory usage on 64 batch from 9.45 Gig to 8.5 Gig.
For a 32 batch, the memory usage is 4330 MiB, down from 4800 MiB, compared to Torch's 6856 MiB (thanks, prigoyal, for checking this for me).

This is unfortunately quite a bunch to review...

Reviewed By: asaadaldien

Differential Revision: D4393909

fbshipit-source-id: 9c7c94125f96512bea80463ebcb63c215ef95ff9
2017-04-25 14:23:25 -07:00
Luke Yeager
b7be2016aa Fix typos in memonger.py
Summary:
Found while browsing the code. Cool stuff in here!
Closes https://github.com/caffe2/caffe2/pull/276

Differential Revision: D4911421

Pulled By: Yangqing

fbshipit-source-id: 3bef10a4001a6b4d4527c054519d69131799a0e2
2017-04-18 20:52:41 -07:00
Aapo Kyrola
3c9dfe4736 dag-compatible forward memonger
Summary: Memonger's inference optimization is very efficient, but does not work if a multi-threaded DAG net is used. So I added this alternative, which shares code with the gradient memonger and does the blob recycling by traversing the DAG while ensuring that blobs do not cross parallel branches.

Reviewed By: viswanathgs

Differential Revision: D4884303

fbshipit-source-id: dfd0a6ecdb91f4edbb0b743729c92f4cd015602e
2017-04-13 22:08:09 -07:00
Peizhao Zhang
cb3bd0ede8 Added a DP + recursion algorithm for finding optimal blob assignments based on blob sizes.
Summary:
Added a DP + recursion algorithm for finding blob assignments based on blob sizes. This algorithm gives optimal assignments. See comments for details.

The algorithm is not used by default; set algo=memonger.AssignmentAlgorithm.DYNAMIC_PROGRAMMING and provide blob_sizes in optimize_interference() to use it. The blob sizes can be retrieved by running the net once and then calling blob_sizes = memonger.collect_blob_sizes(net). All blob sizes are assumed to be 1 if blob_sizes is not provided; in that case, algo=memonger.AssignmentAlgorithm.GREEDY may be a better choice.

Testing on the segmentation model, memory usage is reduced by 19% (14.96MB to 12.08MB) compared to the greedy algorithm (without considering the conv shared buffer). The algorithm runs in 15s for the model, which has 55 sharable blobs.

Reviewed By: ajtulloch

Differential Revision: D4818476

fbshipit-source-id: 606936f4cf2715408d60b9a5cf3bcaf1985a0fec
2017-04-07 02:18:08 -07:00
Peizhao Zhang
59f464434d Used blob sizes for finding assignments in a greedy way.
Summary: Used blob sizes for finding assignments in a greedy way.

Reviewed By: ajtulloch

Differential Revision: D4818159

fbshipit-source-id: 89180a6117ba5be058e1d2f9488b06d618e91917
2017-04-06 12:36:38 -07:00
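A rough sketch of the size-aware greedy sharing idea from the commit above; the live-range/size dicts are hypothetical inputs, not Caffe2's actual interface, which works on nets:

```python
def greedy_share(live_ranges, sizes):
    """Greedy size-aware blob sharing (a sketch, not Caffe2's code).

    Blobs are processed in order of first use; a blob may take over a slot
    whose previous owner has died, preferring the slot closest to its own
    size.  live_ranges maps blob -> (first_use, last_use) op indices.
    """
    order = sorted(live_ranges, key=lambda b: live_ranges[b][0])
    slot_of, slot_size, slot_free_at = {}, [], []
    for blob in order:
        start, end = live_ranges[blob]
        # Slots whose current owner is dead before `blob` starts.
        candidates = [s for s in range(len(slot_size))
                      if slot_free_at[s] < start]
        if candidates:
            s = min(candidates, key=lambda c: abs(slot_size[c] - sizes[blob]))
            slot_size[s] = max(slot_size[s], sizes[blob])
        else:
            s = len(slot_size)
            slot_size.append(sizes[blob])
            slot_free_at.append(end)
        slot_of[blob] = s
        slot_free_at[s] = end
    return slot_of, slot_size

live = {"a": (0, 1), "b": (0, 2), "c": (2, 3)}
sizes = {"a": 4, "b": 8, "c": 4}
slot_of, slot_size = greedy_share(live, sizes)
```

Here "c" reuses the slot freed by "a" (same size), so three blobs fit in two slots totaling 12 units instead of 16.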
Peizhao Zhang
a54000dc6a Added an ordering function to reduce live spans of computed blobs.
Summary:
Added an ordering function (topological_sort_traversal_longest_path()) to reduce the live spans of computed blobs. The idea is to sort the ops by the length of their execution path so that ops on longer paths run first.

Tested on a segmentation model with an on-the-fly decoder: memory usage drops from 21.7MB to 14MB (the original size is 33MB with compressed parameters and without considering the conv buffer), compared to using topological_sort_traversal() as the ordering function.

It is a general ordering function, so I put it in memonger.py directly.

Reviewed By: ajtulloch

Differential Revision: D4790135

fbshipit-source-id: e661b45c1640de44ce1a9fdd009a4fba38f8e042
2017-04-06 12:20:39 -07:00
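The longest-path idea above can be sketched as a longest-distance-to-sink computation over a DAG; the node and edge data are invented for the example, and this is not the actual topological_sort_traversal_longest_path() code:

```python
def longest_path_first_order(nodes, edges):
    """Order ops so those on longer execution paths come first.

    Computes, for each node, the length of the longest path to any sink,
    then sorts descending.  Since an edge u -> v forces dist[u] > dist[v],
    the result is still a valid topological order.
    """
    succ = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    # Kahn's algorithm gives a topological order of the (acyclic) graph.
    topo, ready = [], [n for n in nodes if indeg[n] == 0]
    while ready:
        n = ready.pop()
        topo.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    dist = {n: 0 for n in nodes}
    for n in reversed(topo):  # sinks first
        for m in succ[n]:
            dist[n] = max(dist[n], dist[m] + 1)
    return sorted(nodes, key=lambda n: -dist[n])

order = longest_path_first_order(["a", "b", "c", "d"],
                                 [("a", "b"), ("a", "c"), ("c", "d")])
```

The branch a -> c -> d is longer than a -> b, so "c" is scheduled ahead of "b", shortening the span over which b's output must stay live.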
Aapo Kyrola
02f0c1c9d7 make memonger work with RecurrentNetwork(Gradient)
Summary:
This diff enables support of recurrent networks for memonger:
1. Memonger descends into the step-nets and renames the blobs accordingly
2. Memonger tells the gradient op about the renamed blobs by adding a parameter "paramname.renamed=<new name>"
3. RecurrentNetworkGradientOp applies remapping to links and gradient blobs.

I first thought of refactoring the whole gradient blob management of the recurrent network, but that looks to be very hard without a major revise of the code.

Note, I did not enable memonger for neural_mt, since I think the team should do more testing before enabling this.

Reviewed By: salexspb

Differential Revision: D4812823

fbshipit-source-id: 1ffdf3cfb4fcd00eec5bb0ece3bf416aa6d3e26b
2017-04-05 09:48:25 -07:00
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Viswanath Sivakumar
9775ffc6ae Fixes to topological sort, canonical blob naming, sharing final blob
Summary: Three small changes:

Reviewed By: ajtulloch

Differential Revision: D4437131

fbshipit-source-id: c849e36e1c4d1dce947076349df863fafe62c66d
2017-01-25 15:14:26 -08:00
Aapo Kyrola
95b3309a87 Gradient Input memory sharing using memonger blob sharing
Summary:
This diff brings us roughly to par with Torch on ResNet memory usage. At batch size 32, Resnet-50 took 7497 MiB before and 5010 MiB after. This will thus allow us to handle 64 images per GPU, or 256 images on 4 GPUs.

In addition, I added a special argument to DagNet that makes it run only one thread for the first iteration. This is needed because there are allocations in the first iteration's backward pass due to gradient sharing, which would cause NCCL to deadlock.

Sharing gradient buffers requires inferring which gradients can share memory (i.e. that they are not used concurrently). Previous memonger code used topological sort, but rbgirshick showed that it does not work with tree-like models. Thus, I wrote a new optimization algorithm based on DFS. It takes about 0.25 secs/GPU on resnet-50, so it is clearly fast enough.

Module data_parallel_model supports this feature natively.

Reviewed By: prigoyal

Differential Revision: D4363209

fbshipit-source-id: 73b11e7610438098bb11bff0af8075ab0cf2c0f1
2017-01-09 19:44:23 -08:00
Yangqing Jia
09bed67e4f add untracked files
2016-07-21 11:26:41 -07:00