Summary:
Followup to [the serialized test framework](https://github.com/pytorch/pytorch/pull/10594)
Round 1 for refactoring tests, starting alphabetically. I added some functionality, so I wanted to send out some of these initial changes sooner.
I'm skipping all tests that don't explicitly call assertReferenceChecks. Some tests directly call np.allclose, and others are simply TestCase (rather than HypothesisTestCase).
1. Start alphabetically producing serialized outputs for test functions, annotating those we want to include with `serialized_test_util.given`. So far I've only added one test per operator, but this already seems to add quite a few tests.
2. Add functionality to allow us to generate outputs using pytest by adding pytest argument options. This allows us to skip adding a `__main__` function to quite a few tests.
3. Catch any exceptions generating the gradient operator and skip serializing/reading it, since certain operators don't have gradients.
4. Add functionality to better handle jagged array inputs, which numpy doesn't handle well: we explicitly convert them to dtype=object (see the sketch after this list).
5. Make only one file per test function, rather than 4, to reduce the number of files in the github repo.
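For the jagged-array handling in item 4, the conversion amounts to something like the following (a minimal NumPy sketch; the helper name is hypothetical):
```python
import numpy as np

def to_object_array(rows):
    # NumPy can't build a rectangular array from rows of unequal length,
    # so hold each row as a separate object instead.
    arr = np.empty(len(rows), dtype=object)
    for i, row in enumerate(rows):
        arr[i] = np.asarray(row)
    return arr

jagged = to_object_array([[1, 2, 3], [4], [5, 6]])
```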
I also noticed that there is some hypothesis handling that makes `serialized_test_util.given` incompatible with stacking more hypothesis decorators on top. For example, there are tests that do
```
@settings(...)
@given(...)
def test_my_stuff(...):
    ...
```
But there is a hypothesis handler that explicitly checks that `given` is called below `settings`, so we cannot refactor this to `serialized_test_util.given`. I've just avoided decorating these kinds of tests for now; I hope that's alright.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11350
Reviewed By: houseroad
Differential Revision: D9693857
Pulled By: ajyu
fbshipit-source-id: a9b4279afbe51c90cf2025c5ac6b2db2111f4af7
Summary: Cleaning up converter.cc and allowing networks that have "pass through" inputs (that are also outputs but aren't actually consumed by the network)
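For illustration, a minimal NetDef with such a pass-through blob might look like this (blob and net names are hypothetical):
```python
from caffe2.proto import caffe2_pb2

net = caffe2_pb2.NetDef()
net.name = "passthrough_example"
op = net.op.add()
op.type = "Relu"
op.input.append("data")
op.output.append("out")
# "skip" is declared as both an input and an output of the network,
# but no op consumes it; the converter now accepts nets like this.
net.external_input.extend(["data", "skip"])
net.external_output.extend(["out", "skip"])
```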
Reviewed By: duc0
Differential Revision: D9759435
fbshipit-source-id: 1ddfcc60a1b865a06682e4022230dfecc4b89ec3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11748
For avx512, we need to align at a multiple of 64B, not 32B.
Regardless of avx512, it's generally a good idea to be cache-line aligned.
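The rounding itself is simple; a sketch of the round-up arithmetic (not the actual allocator code):
```python
CACHE_LINE = 64  # bytes: one cache line, and the alignment avx512 wants

def align_up(nbytes, alignment=CACHE_LINE):
    # round nbytes up to the next multiple of a power-of-two alignment
    return (nbytes + alignment - 1) & ~(alignment - 1)

assert align_up(100) == 128
assert align_up(128) == 128
```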
Reviewed By: ilia-cher
Differential Revision: D9845056
fbshipit-source-id: b1d3ed67749c0c1a64acd5cc230a1279e8023512
Summary:
Requires https://github.com/onnx/onnx/pull/1377
This PR makes it so that slices with dynamic boundary values can be exported from pytorch and run in caffe2 via ONNX.
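As a rough sketch of the kind of model this aims to make exportable (module and file names here are hypothetical):
```python
import torch

class DynamicSlice(torch.nn.Module):
    def forward(self, x, y):
        # the slice boundary depends on another input's runtime size,
        # so it can't be baked into the exported graph as a constant
        return x[:, : y.size(0)]

x, y = torch.randn(3, 8), torch.randn(5)
torch.onnx.export(DynamicSlice(), (x, y), "dynamic_slice.onnx")
```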
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11255
Differential Revision: D9790216
Pulled By: jamesr66a
fbshipit-source-id: 6adfcddc5788df4d34d7ca98341077140402a3e2
Summary:
* Many ops in the LSTM part of the model don't have implementations in ideep/mkl, and it doesn't make sense to copy back and forth for the few available ops, because the majority of the RNN will run on CPU
* Thus the strategy is to enable mkl only for the resnet18 part of the model, then switch to the default CPU engine for the LSTM part
* The net may contain some external_inputs falsely added during ONNX->Caffe2 conversion. A canary in the service shows their existence could lead to a service crash (presumably because these blobs somehow get shared between threads). They're now manually removed, which seems to be enough to avoid the crash; see the sketch below
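The removal described in the last point amounts to something like this (a hypothetical helper over a caffe2_pb2.NetDef):
```python
def strip_unconsumed_external_inputs(net):
    # drop external_inputs that no op actually reads; these can be
    # falsely added during the ONNX->Caffe2 conversion
    consumed = {name for op in net.op for name in op.input}
    kept = [name for name in net.external_input if name in consumed]
    del net.external_input[:]
    net.external_input.extend(kept)
```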
Reviewed By: viswanathgs
Differential Revision: D8888763
fbshipit-source-id: da7761bcb7d876ff7bbb6640ae4b24712c0b1de6
Summary:
This fixes the build when CuDNN was not found on the system.
From the `git blame`, it looks like the bug has been around for 2 years :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11562
Differential Revision: D9784589
Pulled By: soumith
fbshipit-source-id: b33153436dced0a503c9833cdf52f7093f3394b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11413
LengthsTileOp was implemented using a sequence of device memcopies initiated on the CPU, which was very slow. I changed it to use a kernel; the TUM benchmark improved from 13k to 20k QPS as a result.
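For reference, the op's semantics (independent of the new kernel) amount to a per-row repeat (hypothetical helper):
```python
import numpy as np

def lengths_tile_ref(data, lengths):
    # row i of `data` is repeated lengths[i] times along the outer axis
    return np.repeat(data, lengths, axis=0)

out = lengths_tile_ref(np.array([[1.0], [2.0]]), np.array([3, 1]))
# rows of out: 1, 1, 1, 2
```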
Reviewed By: manojkris, xianjiec
Differential Revision: D9724988
fbshipit-source-id: 2f98c697730982734d7c6a26d0b6967310d49900
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11418
Several improvements that aim to make the APIs more straightforward to use
- Get rid of the helper methods `subgraph` and `nonTerminal`. Users should now create an NNMatchGraph directly via the graph's createNode and createEdge APIs
- Get rid of the `operatorSubgraph` helper method
- The invertGraphTraversal flag applies to both the match graph and the scanned graph. This allows users to create the match graph in the same direction as the scanned graph, reducing confusion
- Additional parameters of MatchNode (count, includeInSubgraph, nonTerminal) are removed from the constructors and moved into setter methods. (We no longer enforce that MatchNode is immutable, but this improves code clarity.)
- Tests are updated to reflect the changes
Follow up changes:
- Possibly clean up the tests further. This change aims to minimally modify the unit tests.
- Add a validity check that enforces the current limitation of the match graph (a single source node) and throws if the match graph does not satisfy the criterion.
- Detect the single source node automatically, so callers just need to pass in the matchGraph instead of a source node reference.
Differential Revision: D9732565
fbshipit-source-id: ae8320e2bc89b867f6bb4b1c1aad635f4b219fa1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10974
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10291
This new operator will do the following:
Given a LENGTHS vector and n_splits, output a "split" LENGTHS vector where:
1. Each length in the input vector is split into n_splits values (thus the output vector should have LENGTHS.size(0) * n_splits elements)
2. The new lengths in the output should be split as evenly as possible; if a length is not divisible by n_splits, order the new values in descending order (e.g. n_splits = 3, length = 5 -> 2 2 1)
3. If n_splits > some element in the array, its split elements will contain 0s (e.g. n_splits = 3, length = 2 -> 1 1 0); see the reference sketch below
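A minimal NumPy reference of the splitting rule above (the helper name is hypothetical):
```python
import numpy as np

def lengths_split_ref(lengths, n_splits):
    # each length l becomes n_splits pieces, as even as possible,
    # with the larger pieces first: l=5, n_splits=3 -> [2, 2, 1]
    out = []
    for l in lengths:
        base, rem = divmod(l, n_splits)
        out.extend([base + 1] * rem + [base] * (n_splits - rem))
    return np.array(out, dtype=np.int32)

assert list(lengths_split_ref([5, 2], 3)) == [2, 2, 1, 1, 1, 0]
```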
Reviewed By: bddppq, chocjy
Differential Revision: D9013119
fbshipit-source-id: 82bf3371ec08c41fc3379177f0007afc142e0d84
Summary:
Fixes the issue discussed in #10838. `hidden_size` should be the last dimension regardless of whether we're in ONNX or PyTorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11368
Differential Revision: D9734814
Pulled By: soumith
fbshipit-source-id: 7f69947a029964e092c7b88d1d79b188a417bf5f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10888
Add a CUDA version of SpatialBNOp and optimize SpatialBN on CPU
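For context, inference-mode SpatialBN computes the following per channel (a NumPy reference sketch, not the CUDA code):
```python
import numpy as np

def spatial_bn_ref(x, scale, bias, mean, var, eps=1e-5):
    # NCHW input; normalize each channel with its running statistics
    shape = (1, x.shape[1], 1, 1)
    inv_std = 1.0 / np.sqrt(var.reshape(shape) + eps)
    return scale.reshape(shape) * (x - mean.reshape(shape)) * inv_std \
        + bias.reshape(shape)
```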
Reviewed By: houseroad
Differential Revision: D9512435
fbshipit-source-id: 6f828c88d56d30dc9a2f98a297a161c35cc511b1
Summary:
This is an experimental build on top of what orionr and mingzhe09088 built.
Essentially, the idea is that we will need separate *_API versions for different shared libraries. If this theory is right, I'll try to clean up the design a bit and document it properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11266
Reviewed By: orionr
Differential Revision: D9682942
Pulled By: Yangqing
fbshipit-source-id: c79653199e67a1500c9174f39f8b0357324763f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11098
Added a test covering the CPU version across multiple devices.
Reviewed By: enosair, BIT-silence
Differential Revision: D9584520
fbshipit-source-id: 0d8c85e6d402bc7b34d5f8f16ef655ff9b61b49e
Summary:
It turns out that a net.type explicitly set to '' is not acceptable to CreateNet,
but an unset net.type is.
Fix that in this diff (see the sketch below). Also related to T33613083.
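In protobuf terms the distinction looks like this (a sketch):
```python
from caffe2.proto import caffe2_pb2

net = caffe2_pb2.NetDef()
net.type = ""            # explicitly set to '': rejected by CreateNet
net.ClearField("type")   # left unset: accepted, default net type is used
```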
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11286
Reviewed By: Maratyszcza, wat3rBro
Differential Revision: D9659920
Pulled By: harouwu
fbshipit-source-id: d68f24b754e18e1121f029656d885c48ab101946
Summary: Closing the gap a bit on API, allowing users to go NetDef -> nomnigraph -> NetDef in Python now (round-trip sketch below)
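A sketch of the round trip this enables, assuming the Python bindings are exposed roughly as follows:
```python
from caffe2.python import core, nomnigraph

net = core.Net("example")
net.Relu(["x"], ["y"])

nn = nomnigraph.NNModule(net.Proto())  # NetDef -> nomnigraph
back = nn.convertToCaffe2Proto()       # nomnigraph -> NetDef
```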
Reviewed By: duc0
Differential Revision: D9670495
fbshipit-source-id: 6497518ffc05a186deb0d657e06317980d39ddd5
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.
Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect
Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264
Differential Revision: D9656793
Pulled By: yf225
fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254
Previously we used DeviceType from caffe2.proto directly, but it's an `enum` with an implicit conversion to int, which does not provide type safety; e.g. we have to explicitly check that a device type is valid in event.h:
```
template <int d>
struct EventCreateFunctionRegisterer {
explicit EventCreateFunctionRegisterer(EventCreateFunction f) {
static_assert(d < MaxDeviceTypes, "");
Event::event_creator_[d] = f;
}
};
```
at::DeviceType is an `enum class`; it has no implicit conversion to int and provides better type-safety guarantees. In this diff we have done the following refactor (taking CPU as an example):
1. caffe2::DeviceType → caffe2::DeviceTypeProto
2. caffe2::CPU → caffe2::PROTO_CPU
3. caffe2::DeviceType = at::DeviceType
4. caffe2::CPU = at::DeviceType::CPU
```
codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type\(\), ' 'device_type(), PROTO_'
```
+ some manual changes
In short, after this diff, in c++, caffe2::CPU refers to the at::DeviceType::CPU and the old proto caffe2::CPU will be caffe2::PROTO_CPU.
On the Python side, we have a temporary workaround that aliases `caffe2_pb2.CPU = caffe2_pb2.PROTO_CPU` to make the change easier to review; this will be removed later.
Reviewed By: ezyang
Differential Revision: D9545704
fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10874
Fixes the log message "WARNING:data_workers:Warning, data loading lagging behind: name=0", where the size of a queue was reported instead of the source name
Reviewed By: panshen1, Novitial
Differential Revision: D9506606
fbshipit-source-id: 03717cfa9b991afb335ef877378afa3b52fd8f22
Summary:
Keep net type info when generating the model's complete net. This preserves the performance optimization option
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11032
Reviewed By: wat3rBro
Differential Revision: D9564125
Pulled By: harouwu
fbshipit-source-id: c6546af9b1d4ff5eddf6124e24a5da1b8baf47df
Summary:
Generate serialized test inputs/outputs/backward graphs of tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run hypothesis tests that are actually random plus a single fixed-seed hypothesis test.
To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked-in test case (see the sketch below).
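A rough sketch of a refactored test following these steps (the import path and decorator spelling are assumptions based on this summary, and the assertion call is elided since its signature isn't shown here):
```python
from hypothesis import strategies as st
from caffe2.python.serialized_test import serialized_test_util

class TestMyOp(serialized_test_util.SerializedTestCase):         # step 1
    @serialized_test_util.given_and_seeded(n=st.integers(1, 8))  # step 1a
    def test_my_op(self, n):
        # build the operator here, then run the serialized-output
        # assertion described above
        pass

if __name__ == "__main__":
    serialized_test_util.testWithArgs()                          # step 1b
```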
Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`
Outputs are in `caffe2/python/serialized_test/data`. The operator test outputs are in a further subdirectory, `operator_test`, to allow for other tests in the future (model zoo tests?)
Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change as usually there are multiple tests in a single file, so we may create more structure to account for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10594
Reviewed By: ezyang
Differential Revision: D9370359
Pulled By: ajyu
fbshipit-source-id: 2ce77389cd8bcc0255d3bccd61569833e545ede8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11048
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739
I wanted to assert that the blobs in the workspace of the new session after loading a checkpoint are exactly the same as the blobs in the workspace of the old session before saving the checkpoint.
But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added, along with a dummy net `task:output`. See https://fburl.com/937lf2yk
This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan".
Adding a dummy TaskOutput when the user specifies none is a hack.
The reason for it is that a ZMQ socket can't send an empty blob list.
As a result, if the Task on the worker had no output,
the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`.
TaskOutput is at the user layer, and the hack shouldn't be exposed there, polluting user workspaces.
Instead, we should move the creation of the dummy blob to some deeper layer,
and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces.
After this change, the workaround becomes totally transparent, with no side effects for users.
Reviewed By: mraway
Differential Revision: D9566744
fbshipit-source-id: 18292dd64a6d48192c34034200a7c9811d2172af
Summary: When conversion fails, dump more information to help fix up the netdef
Reviewed By: hyuen, yinghai
Differential Revision: D9558667
fbshipit-source-id: 8917cc61c9be6285697e4f8395a9dbc7135f618e
Summary:
1. Support ops needed for inference of Faster-RCNN/Mask-RCNN in Detectron, mostly as direct fallbacks.
2. Use CPU device to hold 0-dim tensors and integer tensors in both fallback op and blob feeder, needed by Detectron models.
3. Ignore 0-dim tensor in MKL-DNN concat operator.
4. Generate dynamic library of Detectron module for CPU device.
This PR obsoletes #9164.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10157
Differential Revision: D9276837
Pulled By: yinghai
fbshipit-source-id: dc364932ae4a2e7fcefdee70b5fce3c0cee91b6f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10955
Add a GPU version of the HardSigmoid op to Caffe2. Updated the test file to include GPU tests.
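For context, HardSigmoid computes the same thing on either device (a NumPy reference sketch):
```python
import numpy as np

def hard_sigmoid_ref(x, alpha, beta):
    # y = max(0, min(1, alpha * x + beta))
    return np.clip(alpha * x + beta, 0.0, 1.0)
```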
Reviewed By: enosair
Differential Revision: D9499353
fbshipit-source-id: fcb51902063d0c3e4b10354533a8a42cf827c545
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11003
Need an interface to rewrite the graph after the net is built and after gradient ops are added.
Reviewed By: aazzolini, harouwu
Differential Revision: D9557827
fbshipit-source-id: 2e082f0321c0776e488a29e18047d950948e7c37
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739
I wanted to assert that the blobs in the workspace of the new session after loading a checkpoint are exactly the same as the blobs in the workspace of the old session before saving the checkpoint.
But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added, along with a dummy net `task:output`. See https://fburl.com/937lf2yk
This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan".
Adding a dummy TaskOutput when the user specifies none is a hack.
The reason for it is that a ZMQ socket can't send an empty blob list.
As a result, if the Task on the worker had no output,
the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`.
TaskOutput is at the user layer, and the hack shouldn't be exposed there, polluting user workspaces.
Instead, we should move the creation of the dummy blob to some deeper layer,
and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces.
After this change, the workaround becomes totally transparent, with no side effects for users.
Reviewed By: mraway
Differential Revision: D9413150
fbshipit-source-id: 51aaf3201e26570b4fcf5738e9b9aa17c58777ac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10929
Some Workspace class methods were missing on the Python side.
They enable writing the New Checkpoint Framework with more control over the workspace and a cleaner implementation (see the usage sketch below).
Added
- ws.feed_blob(name, arr)
- ws.remove_blob(name)
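A usage sketch, assuming the bindings are reachable through the pybind Workspace object:
```python
import numpy as np
from caffe2.python import workspace

ws = workspace.C.Workspace()      # a standalone (non-default) workspace
ws.feed_blob("x", np.ones((2, 3), dtype=np.float32))
print(ws.blobs["x"].fetch())
ws.remove_blob("x")
```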
Reviewed By: mraway
Differential Revision: D9486867
fbshipit-source-id: ea02d2e3a39d716a5a3da0482f57d4ac4c893763