Summary:
Remove the use of `NextName` in the layer model helper, so that the same function returns a `model_helper` that constructs an identical `Net` when called under the same NameScope.
`NextScopedBlob` should only take effect when there is a real name conflict; otherwise it returns a ScopedBlobReference.
This is critical for parameter blobs. In the long run, we need to be able to specify parameter blobs more explicitly (kennyhorror is working on this). This solution works in the short term for, e.g., two-tower sparse NN models.
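The conflict-only renaming described above can be illustrated with a minimal sketch. It is not the Caffe2 implementation: `next_scoped_blob`, `build_tower`, the `used_names` set, and the `_auto_N` suffix are all hypothetical names chosen for illustration.

```python
def next_scoped_blob(used_names, scope, name):
    """Return a scoped blob name, renaming only on a real conflict.

    Hypothetical stand-in for illustration; not the Caffe2 code.
    """
    base = scope + name
    if base not in used_names:
        # No conflict: return the plain scoped name unchanged.
        used_names.add(base)
        return base
    # Real conflict: append a numeric suffix until the name is free.
    index = 0
    candidate = "{}_auto_{}".format(base, index)
    while candidate in used_names:
        index += 1
        candidate = "{}_auto_{}".format(base, index)
    used_names.add(candidate)
    return candidate


def build_tower(used_names, scope):
    # The same construction function, called under a given scope.
    return [next_scoped_blob(used_names, scope, n) for n in ("fc_w", "fc_b")]
```

Under these assumptions, calling `build_tower` twice with fresh name sets and the same scope yields identical blob names, which is the property that makes the constructed nets identical. Names only change when the set already contains a clashing entry.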
Reviewed By: kennyhorror
Differential Revision: D4555423
fbshipit-source-id: 2c4b99a61392e5d51aa878f7346466a8f14be187
Summary:
This is a fairly large diff, sorry about that. It includes basic shape and type inference functionality, based on YQ's Schema scaffolding. I added some helper functions to make it easier to write simple translations.
A bigger refactoring was needed for ConvPoolBase so that we could use the shape inference already present in the schema.
I annotated enough operators to be able to infer forward-pass shapes for a basic convnet, and added a test for that. I intend to bootcamp some annotations and annotate enough to handle Resnets fully. I still need to think about whether gradients could be annotated in an easier way.
Only shapes are exposed to Python for now; types will follow later. Also, the inference is not yet called anywhere except the unit test.
Also, I am not sure everything is in the best location in the code, but it shouldn't be hard to move things around.
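To give a rough idea of what forward-pass shape inference for a basic convnet computes, here is a minimal sketch in plain Python. It does not use the schema code from this diff: `conv_output_shape` and `pool_output_shape` are hypothetical helpers, and the sketch assumes NCHW layout, square kernels, no dilation, and floor rounding.

```python
def _out_dim(size, kernel, stride, pad):
    # Standard sliding-window output size with floor rounding.
    return (size + 2 * pad - kernel) // stride + 1


def conv_output_shape(input_shape, num_filters, kernel, stride=1, pad=0):
    """Infer the NCHW output shape of a 2D convolution (hypothetical helper)."""
    n, _channels, h, w = input_shape
    return (n, num_filters,
            _out_dim(h, kernel, stride, pad),
            _out_dim(w, kernel, stride, pad))


def pool_output_shape(input_shape, kernel, stride=1, pad=0):
    """Infer the NCHW output shape of a 2D pooling op (hypothetical helper)."""
    n, channels, h, w = input_shape
    return (n, channels,
            _out_dim(h, kernel, stride, pad),
            _out_dim(w, kernel, stride, pad))


# Chain the per-operator rules to propagate shapes through a small net.
data = (32, 3, 224, 224)
conv1 = conv_output_shape(data, num_filters=64, kernel=7, stride=2, pad=3)
pool1 = pool_output_shape(conv1, kernel=3, stride=2)
```

Each operator only needs a rule mapping input shapes to output shapes; inferring shapes for the whole net is then a single pass over the operators in order.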
Reviewed By: dzhulgakov
Differential Revision: D4436818
fbshipit-source-id: eebee5937ccc9ac09c245465302388a1fae6933c
Summary:
I noticed that constructing the Xray model takes quite a while. To measure this, I wrote a benchmark script that creates a resnet-50 model on 8 GPUs. This takes about 95 secs, which is kind of annoying when you want to quickly debug stuff.
Profiling with Python's cProfile showed that most of the time is spent in net.BlobIsDefined(), which does a linear search over external inputs and operator outputs, so it gets slower and slower as nets grow. This can be fully optimized by keeping a separate lookup table of operator inputs and outputs (and external inputs and outputs). It is a bit annoying to maintain this separate data structure, but I set up the unit tests to ensure things work correctly across Clones.
After the optimization, the net construction drops from 95 secs to 8.2 secs!
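The idea behind the optimization can be sketched as follows. This is a simplified toy, not the actual `core.Net` code: `TinyNet` and its methods are hypothetical, and it tracks only defined blobs (external inputs and operator outputs) in a single set.

```python
class TinyNet(object):
    """Toy net that keeps a lookup set alongside its operator list."""

    def __init__(self):
        self.external_inputs = []
        self.ops = []          # list of (inputs, outputs) pairs
        self._defined = set()  # lookup table of defined blob names

    def add_external_input(self, name):
        self.external_inputs.append(name)
        self._defined.add(name)

    def add_op(self, inputs, outputs):
        self.ops.append((list(inputs), list(outputs)))
        self._defined.update(outputs)

    def blob_is_defined_linear(self, name):
        # Original approach: scan every external input and op output.
        if name in self.external_inputs:
            return True
        return any(name in outputs for _inputs, outputs in self.ops)

    def blob_is_defined(self, name):
        # Optimized approach: a single set membership test.
        return name in self._defined

    def clone(self):
        # The lookup table must be copied too, or the clone goes stale.
        other = TinyNet()
        other.external_inputs = list(self.external_inputs)
        other.ops = [(list(i), list(o)) for i, o in self.ops]
        other._defined = set(self._defined)
        return other
```

The linear check costs time proportional to the size of the net on every call, so building a net of N operators costs roughly N squared overall. The set lookup makes each check constant time on average, at the price of keeping the set in sync on every mutation and clone.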
Reviewed By: azzolini
Differential Revision: D4288307
fbshipit-source-id: 0bb82c8bde9d86a2702b298f4aa706cba509346e
Summary:
I got a weird error about NoneType not being iterable, which made me think
it was an error in the C2 core, whereas it was actually an error in my code.
Reviewed By: Yangqing
Differential Revision: D4192799
fbshipit-source-id: 0122f13e205c1c6a0766545f0ad6296228d3a3d9