Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43205
A number of tests that forward to `TestLoadSaveBase.load_save` are marked as flaky because they regularly take much longer to start up than hypothesis' default deadline of 200ms. This diff fixes the problem by removing the deadline for `load_save`. This is alright since these tests aren't meant to test the performance of these operators.
I would set the deadline to 60s if I could, but it appears that the caffe2 GitHub CI uses a different version of hypothesis that doesn't allow using `datetime.timedelta`, so instead of trying to figure out an approach that works on both I've just removed the deadline entirely.
I've also tagged all existing tasks WRT these failures.
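For reference, a minimal sketch (not the actual diff; the test name and strategy are placeholders) of how a hypothesis deadline is disabled for a slow test:
```
from hypothesis import given, settings, strategies as st

@settings(deadline=None)  # no per-example deadline; these tests don't measure performance
@given(st.integers(min_value=1, max_value=8))
def test_load_save(num_blobs):
    ...  # exercise the load/save round-trip
```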
Differential Revision: D23175752
fbshipit-source-id: 324f9ff034df1ac4874797f04f50067149a6ba48
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4812
- if no compilation options are passed, default to C-step
- fixed the FC and batchmatmul implementations to match C-step
- fixed the fakelowp map calling to make sure we use the fp32 substitution of operators
- updated the accumulator test to make it pass with fp32
Test Plan:
fakelowp tests
glow/test/numerics
net_runner
Reviewed By: jfix71
Differential Revision: D23086534
fbshipit-source-id: 3fbb8c4055bb190becb39ce8cdff6671f8558734
Summary:
Had a bunch of merged commits that shouldn't have been there, reverted them to prevent conflicts. Lots of new features, highlights listed below.
**Overall:**
- Enables pointwise fusion, single (but N-D) broadcast -> pointwise fusion, and single (but N-D) broadcast -> pointwise -> single (but N-D) reduction fusion (an illustrative example of such a chain follows the feature list below).
**Integration:**
- Separate "magic scheduler" logic that takes a fusion and generates code generator schedule
- Reduction fusion scheduling with heuristics closely matching eagermode (unrolling supported, but no vectorize support)
- 2-Stage caching mechanism, one on contiguity, device, type, and operations, the other one is input size->reduction heuristic
**Code Generation:**
- More generic support in code generation for computeAt
- Full rework of loop nest generation and Indexing to more generically handle broadcast operations
- Code generator has automatic kernel launch configuration (including automatic allocation of grid reduction buffers)
- Symbolic (runtime) tiling on grid/block dimensions is supported
- Simplified index generation based on user-defined input contiguity
- Automatic broadcast support (similar to numpy/pytorch semantics)
- Support for compile time constant shared memory buffers
- Parallelized broadcast support (i.e. block reduction -> block broadcast support)
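For illustration only (not part of the PR; shapes and ops are made up), the kind of pointwise -> broadcast -> reduction chain described above that the fuser targets:
```
import torch

def fused_pattern(x, bias):
    y = torch.relu(x)        # pointwise
    y = y + bias             # N-D broadcast of bias over x's leading dims
    return y.sum(dim=-1)     # single reduction

x = torch.randn(8, 16, 32)
bias = torch.randn(32)
out = fused_pattern(x, bias)   # candidate for a single generated kernel
```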
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43129
Reviewed By: mrshenli
Differential Revision: D23162207
Pulled By: soumith
fbshipit-source-id: 16deee4074c64de877eed7c271d6a359927111b2
Summary:
Since OpenMP is not available on some platforms, or might be disabled by the user, set the default `ATEN_THREADING` based on the USE_OPENMP and USE_TBB options.
Fixes https://github.com/pytorch/pytorch/issues/43036
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43067
Reviewed By: houseroad
Differential Revision: D23138856
Pulled By: malfet
fbshipit-source-id: cc8f9ee59a5559baeb3f19bf461abbc08043b71c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42927
added fp16 fusion to net transforms
refactored the transforms as well as glow_transform to get out of opt/custom so that the OSS builds passed
Test Plan: added net runner tests for this
Reviewed By: yinghai
Differential Revision: D23080881
fbshipit-source-id: ee6451811fedfd07c6560c178229854bca29301f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43037
In the previous version of mish_op.cc, the output would be 'nan' for large inputs. We rewrote mish_op.cc to solve this problem.
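As a hedged sketch of the numerical issue (illustrative Python, not the actual mish_op.cc change): mish(x) = x * tanh(softplus(x)); a naive evaluation overflows in the intermediate exponentials for large x and can end up with inf/inf = nan, while a stable softplus keeps intermediates bounded.
```
import numpy as np

def mish_naive(x):
    sp = np.log(1.0 + np.exp(x))   # exp(x) overflows to inf for large x
    # tanh expanded naively: inf/inf -> nan for large inputs
    return x * (np.exp(sp) - np.exp(-sp)) / (np.exp(sp) + np.exp(-sp))

def mish_stable(x):
    # softplus rewritten as max(x, 0) + log1p(exp(-|x|)) to avoid overflow
    sp = np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))
    return x * np.tanh(sp)

x = np.array([1.0, 50.0, 1000.0])
print(mish_naive(x))    # nan for the large input
print(mish_stable(x))   # finite everywhere
```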
Test Plan:
Unit test
buck test //dper3/dper3/modules/tests:core_modules_test -- test_linear_compress_embedding_with_attention_with_activation_mish
{F284052906}
buck test mode/opt //dper3/dper3_models/ads_ranking/tests:model_paradigm_e2e_tests -- test_sparse_nn_with_mish
{F284224158}
## Workflow
f212113434
{F285281318}
Differential Revision: D23102644
fbshipit-source-id: 98f1ea82f8c8e05b655047b4520c600fc1a826f4
Summary:
1. Fix an illegal memory access issue for the SplitByLengths operator in the CUDA context.
2. Add support for scaling lengths vectors in the SplitByLengths operator.
3. Add a test for the SplitByLengths operator in the CUDA context.
Example of the SplitByLengths operator processing a scaling lengths vector:
value vector A = [1, 2, 3, 4, 5, 6]
length vector B = [1, 2]
After execution of the SplitByLengths operator, the output should be [1, 2] and [3, 4, 5, 6].
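A small illustrative sketch of the scaling-lengths behavior described above (plain Python, not the CUDA kernel; assumes sum(lengths) evenly divides the value vector's length):
```
import numpy as np

def split_by_lengths(values, lengths):
    # scale each length so the pieces cover the whole value vector
    scale = len(values) // sum(lengths)   # assumes exact division
    out, offset = [], 0
    for l in lengths:
        out.append(values[offset:offset + l * scale])
        offset += l * scale
    return out

print(split_by_lengths(np.array([1, 2, 3, 4, 5, 6]), [1, 2]))
# -> [array([1, 2]), array([3, 4, 5, 6])]
```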
Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:concat_split_op_test
Reviewed By: kennyhorror
Differential Revision: D23079841
fbshipit-source-id: 3700e7f2ee0a5a2791850071fdc16e5b054f8400
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43014
Changing this behavior mimics the behavior of the old hypothesis testing library.
Test Plan: ran all tests on devserver
Reviewed By: hl475
Differential Revision: D23085949
fbshipit-source-id: 433fdfbb04b6a609b738eb7c319365049a49579b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42691
fix quantization of FC bias to match nnpi
quantize biases to fp16
Test Plan: improved the unit test to have input tensors in fp32
Reviewed By: tracelogfb
Differential Revision: D22941521
fbshipit-source-id: 00afb70610f8a149110344d52595c39e3fc988ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42871
The old version of hypothesis.testing was not enforcing deadlines. After the library got updated, the default deadline is 200ms, but even with 1s or more, tests are flaky. Changing the deadline to non-enforced, which is the same behavior as the old version.
Test Plan: tested fakelowp/tests
Reviewed By: hl475
Differential Revision: D23059033
fbshipit-source-id: 79b6aec39a2714ca5d62420c15ca9c2c1e7a8883
Summary: I found out that without exporting to public format, the IDEEP transpose operator in the middle of a convolution net produces incorrect results (probably reading some out-of-bounds memory). Exporting to public format might not be the most efficient solution, but at least it ensures correct behavior.
Test Plan: Running ConvFusion followed by transpose should give identical results on CPU and IDEEP
Reviewed By: bwasti
Differential Revision: D22970872
fbshipit-source-id: 1ddca16233e3d7d35a367c93e72d70632d28e1ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42763
add the fp16 fusions as net transforms:
- layernorm fused with mul+add
- swish int8
Test Plan: added unit test, ran flows
Reviewed By: yinghai
Differential Revision: D23002043
fbshipit-source-id: f0b13d51d68c240b05d2a237a7fb8273e996328b
Summary:
add a fuse path for deq->swish->quant
update swish fake op interface to take arguments accordingly
Test Plan:
net_runner passes
unit tests need to be updated
Reviewed By: venkatacrc
Differential Revision: D22962064
fbshipit-source-id: cef79768db3c8af926fca58193d459d671321f80
Summary:
Backout D22800959 (f30ac66e79). This one is causing the timeout (machine stuck) issues for dedup kernels. Reverting it makes the unit test pass. Still need to investigate why this is the culprit...
Original commit changeset: 641d52a51070
Test Plan:
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```
Reviewed By: jspark1105
Differential Revision: D23008389
fbshipit-source-id: 4f1b9a41c78eaa5541d57b9d8aa12401e1d495f2
Summary: Add Python type annotations for the `caffe2.distributed.python` module.
Test Plan: Will check sandcastle results.
Reviewed By: jeffdunn
Differential Revision: D22994012
fbshipit-source-id: 30565cc41dd05b5fbc639ae994dfe2ddd9e56cb1
Summary:
This PR adds the `torch.linalg` namespace as part of our continued effort to be more compatible with NumPy. The namespace is tested by adding a single function, `torch.linalg.outer`, and testing it in a new test suite, test_linalg.py (a usage sketch follows the list below). It follows the same pattern as https://github.com/pytorch/pytorch/pull/41911, which added the `torch.fft` namespace.
Future PRs will likely:
- add more functions to torch.linalg
- expand the testing done in test_linalg.py, including legacy functions, like torch.ger
- deprecate existing linalg functions outside of `torch.linalg` in preference to the new namespace
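A possible usage sketch of the new function (assuming a build carrying this PR):
```
import torch

a = torch.tensor([1., 2., 3.])
b = torch.tensor([4., 5.])
print(torch.linalg.outer(a, b))   # 3x2 outer product, same result as torch.ger(a, b)
```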
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42664
Reviewed By: ngimel
Differential Revision: D22991019
Pulled By: mruberry
fbshipit-source-id: 39258d9b116a916817b3588f160b141f956e5d0b
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4787
Resurrect ONNX as a backend through onnxifiGlow (was killed as part of D16215878). Then look for the `use_glow_aot` argument in the Onnxifi op. If it's there and true, then we override whatever `backend_id` is set and use the ONNX backend.
Reviewed By: yinghai, rdzhabarov
Differential Revision: D22762123
fbshipit-source-id: abb4c3458261f8b7eeae3016dda5359fa85672f0
Summary: Put user embedding before ads embedding in blobReorder, for flash verification reasons.
Test Plan:
```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:enable_large_model_loading -- --model_path_src="/home/$USER/models/" --model_path_dst="/home/$USER/models_modified/" --model_file_name="182560549_0.predictor"
```
https://www.internalfb.com/intern/anp/view/?id=320921 to check blobsOrder
Reviewed By: yinghai
Differential Revision: D22964332
fbshipit-source-id: 78b4861476a3c889a5ff62492939f717c307a8d2
Summary:
Previously, when inferring Int8FC, we failed to carry over the scale and zero point properly.
Also fixed the int8 FC weight data type to be int8 instead of uint8, as that's what C2 actually uses.
Test Plan: Use net_runner to lower a single Int8Dequantize op. Previously, scale and bias would always be 1 and 0. Now the proper values are set.
Reviewed By: yinghai
Differential Revision: D22912186
fbshipit-source-id: a6620c3493e492bdda91da73775bfc9117db12d1
Summary:
This diff NVMifies the NE Eval Flow.
- It defines a `LoadNVM` operator which:
  - either receives a list of NVM blobs, or extracts the blobs that could be NVMified from the model,
  - dumps NVMified blobs into NVM,
  - and deallocates them from DRAM.
- NVMifies the Eval net on the dper and C2 backends.
Specific NVMOp for SLS is pushed through different diffs.
Test Plan: flow-cli test-locally dper.workflows.evaluation.eval_workflow --parameters-file=/mnt/public/ehsaardestani/temp/small_model.json 2>&1 | tee log
Reviewed By: yinghai, amylittleyang
Differential Revision: D22469973
fbshipit-source-id: ed8379ad404e96d04ac05e580176d3aca984575b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42522
Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.
There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by linking those targets against the `tensorpipe` CMake target, so that the include paths defined by TensorPipe, which contain that auto-generated header, are picked up.
I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.
Test Plan: CI
Reviewed By: malfet
Differential Revision: D22959472
fbshipit-source-id: 1959a41c4a66ef78bf0f3bd5e3964969a2a1bf67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42570
ProfiledType doesn't do anything and is not used atm, removing
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D22938664
Pulled By: ilia-cher
fbshipit-source-id: 037c512938028f44258b702bbcde3f8c144f4aa0
Summary:
This PR creates a new namespace, torch.fft (torch::fft) and puts a single function, fft, in it. This function is a simplified version of NumPy's [numpy.fft.fft](https://numpy.org/doc/1.18/reference/generated/numpy.fft.fft.html?highlight=fft#numpy.fft.fft) that accepts no optional arguments. It is intended to demonstrate how to add and document functions in the namespace, and is not intended to deprecate the existing torch.fft function.
Adding this namespace was complicated by the existence of the torch.fft function in Python. Creating a torch.fft Python module makes this name ambiguous: does it refer to a function or module? If the JIT didn't exist, a solution to this problem would have been to make torch.fft refer to a callable class that mimicked both the function and module. The JIT, however, cannot understand this pattern. As a workaround it's required to explicitly `import torch.fft` to access the torch.fft.fft function in Python:
```
import torch.fft
t = torch.randn(128, dtype=torch.cdouble)
torch.fft.fft(t)
```
See https://github.com/pytorch/pytorch/issues/42175 for future work. Another possible future PR is to get the JIT to understand torch.fft as a callable class so it need not be imported explicitly to be used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41911
Reviewed By: glaringlee
Differential Revision: D22941894
Pulled By: mruberry
fbshipit-source-id: c8e0b44cbe90d21e998ca3832cf3a533f28dbe8d
Summary:
Enforce counter value to double type in rowwise_counter.
**Context:**
The existing implementation uses float type for the counter value. But due to the precision limit of single-precision floating point [1], we observed that the counter value can't increment beyond 16777216.0 (i.e., the max value is 16777216.0) in our earlier experiments. We decided to enforce double type to avoid this issue.
[1] https://stackoverflow.com/questions/12596695/why-does-a-float-variable-stop-incrementing-at-16777216-in-c
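A quick demonstration of the limit referenced above (illustrative numpy, not the operator code): incrementing a float32 counter by 1 has no effect once it reaches 2**24 = 16777216, while a double keeps counting.
```
import numpy as np

c32 = np.float32(16777216.0)
print(c32 + np.float32(1.0))   # 16777216.0 -- the increment is lost
c64 = np.float64(16777216.0)
print(c64 + 1.0)               # 16777217.0 -- double precision keeps counting
```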
Test Plan:
op test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/python/operator_test(f0b0b48c)$ buck test :rowwise_counter_test
Trace available for this run at /tmp/testpilot.20200728-083200.729292.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision cd2638f1f47250eac058b8c36561760027d16add fbpkg f88726c8ebde4ba288e1172a348c7f46 at Mon Jul 27 18:11:43 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/887/t.par
Discovering tests
Running 1 test
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/7881299364977047
✓ caffe2/caffe2/python/operator_test:rowwise_counter_test - test_rowwise_counter (caffe2.caffe2.python.operator_test.rowwise_counter_test.TestRowWiseCounter) 0.265 1/1 (passed)
✓ caffe2/caffe2/python/operator_test:rowwise_counter_test - main 14.414 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7881299364977047
Summary (total time 18.51s):
PASS: 2
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
optimizer test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/python(7d66fbb9)$ buck test :optimizer_test
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7036874434841896
Summary (total time 64.87s):
PASS: 48
FAIL: 0
SKIP: 24
caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestMomentumSgd)
caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestGFtrl)
caffe2/caffe2/python:optimizer_test - test_caffe2_cpu_vs_numpy (caffe2.caffe2.python.optimizer_test.TestYellowFin)
caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestSparseRAdam)
caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestRowWiseAdagradWithCounter)
caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestAdagrad)
caffe2/caffe2/python:optimizer_test - test_caffe2_gpu_vs_numpy (caffe2.caffe2.python.optimizer_test.TestYellowFin)
caffe2/caffe2/python:optimizer_test - testDense (caffe2.caffe2.python.optimizer_test.TestRowWiseAdagrad)
caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestFtrl)
caffe2/caffe2/python:optimizer_test - testSparse (caffe2.caffe2.python.optimizer_test.TestRmsProp)
...and 14 more not shown...
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
param download test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/fb/net_transforms/tests(7ef20a38)$ sudo buck test :param_download_test
Finished test run: Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/6473924481526935
```
e2e flow:
f208394929
f207991149
f207967273
ANP notebook to check the counter value loaded from the flows
https://fburl.com/anp/5fdcbnoi
screenshot of the loaded counter (note that counter max is larger than 16777216.0)
{F250926501}
Reviewed By: ellie-wen
Differential Revision: D22711514
fbshipit-source-id: 426fed7415270aa3f276dda8141907534734337f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41934
The model exported from online training workflow with int8 quantization contains FCs with 4 inputs. The extra input is the quant_param blob. This diff is to adjust the bound_shape_inferencer and int8 op schema to get shape info for the quant_param input.
Test Plan:
```
buck test caffe2/caffe2/opt:bound_shape_inference_test
```
Reviewed By: yinghai
Differential Revision: D22683554
fbshipit-source-id: 684d1433212a528120aba1c37d27e26b6a31b403
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41603
Pull Request resolved: https://github.com/pytorch/glow/pull/4704
Previously in the glow onnxifi path, when an error is encountered, we log it to stderr then just return ONNXIFI_STATUS_INTERNAL_ERROR to C2. C2 then does CAFFE2_ENFORCE_EQUAL(return_code, ONNXIFI_STATUS_SUCCESS). The error message that eventually went to the user is something like
[enforce fail at onnxifi_op.cc:545] eventStatus == ONNXIFI_STATUS_SUCCESS. 1030 vs 0
This diff adds plumbing to get human readable error message out of glow into C2.
Test Plan:
Run ads replayer. Overload it with traffic. Now the error message sent back to the client used to be
E0707 00:57:45.697196 3709559 Caffe2DisaggAcceleratorTask.cpp:493] During running REMOTE_OTHER net: [enforce fail at onnxifi_op.cc:545] eventStatus == ONNXIFI_STATUS_SUCCESS. 1030 vs 0 (Error from operator:....
Now it's
```
E0707 16:46:48.366263 1532943 Client.cpp:966] Exception when calling caffe2_run_disagg_accelerator on remote predictor for model 190081310_0 : apache::thrift::TApplicationException: c10::Error: [enforce fail at onnxifi_op.cc:556] .
Error code: RUNTIME_REQUEST_REFUSED
Error message: The number of allowed queued requests has been exceeded. queued requests: 100 allowed requests: 100
Error return stack:
glow/glow/lib/Runtime/HostManager/HostManager.cpp:673
glow/glow/lib/Onnxifi/HostMana (Error from operator:...
```
Reviewed By: gcatron, yinghai
Differential Revision: D22416857
fbshipit-source-id: 564bc7644d9666eb660725c2dca5637affae9b73
Summary: this breaks if we cut the net at certain int8 op boundaries.
Test Plan: with net_runner to lower a single Int8Quantize op. It used to break. Now it works.
Reviewed By: yinghai
Differential Revision: D22912178
fbshipit-source-id: ca306068c9768df84c1cfa8b34226a1330e19912
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42591
We don't support lowering with 2-input Int8Quantize and 4-input Int8FC. Just do a conversion to absorb the quantization params into the op itself.
Test Plan:
```
buck test caffe2/caffe2/quantization/server:quantize_dnnlowp_op_test
```
Reviewed By: benjibc
Differential Revision: D22942673
fbshipit-source-id: a392ba2afdfa39c05c5adcb6c4dc5f814c95e449
Summary:
1. Fix an illegal memory access issue for the SplitByLengths operator in the CUDA context.
2. Add support for scaling lengths vectors in the SplitByLengths operator.
3. Add a test for the SplitByLengths operator in the CUDA context.
Example of the SplitByLengths operator processing a scaling lengths vector:
value vector A = [1, 2, 3, 4, 5, 6]
length vector B = [1, 2]
After execution of the SplitByLengths operator, the output should be [1, 2] and [3, 4, 5, 6].
Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:concat_split_op_test
Reviewed By: kennyhorror
Differential Revision: D22780307
fbshipit-source-id: c5ca60ae16b24032cedfa045a421503b713daa6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42249
Main change is to bring Caffe2's superior error messages for cuda initialization into c10 and use them in all code paths.
Basic logic:
| Case | Call to device_count() | init_cuda, e.g. allocating tensor |
| -- | -- | -- |
| all good | non-zero | just works |
| no gpus | 0, no warning | throw exception with good message |
| driver issues | 0, produce warning | throw exception with good message |
| out of memory with ASAN | 0, produce warning| throw exception with ASAN message |
Previously, the error thrown from init_cuda was very generic and the ASAN warning (if any) was buried in the logs.
Other clean up changes:
* cache device_count() always in a static variable
* move all asan macros in c10
Test Plan:
Hard to unittest because of build modes. Verified manually that the behavior from the table above holds by running the following script in different modes (ASAN/no-ASAN, CUDA_VISIBLE_DEVICES=):
```
print('before import')
import torch
print('after import')
print('devices: ', torch.cuda.device_count())
x = torch.tensor([1,2,3])
print('tensor creation')
x = x.cuda()
print('moved to cuda')
```
Reviewed By: ngimel
Differential Revision: D22824329
fbshipit-source-id: 5314007313a3897fc955b02f8b21b661ae35fdf5
Summary: The current OutputColumnMaxHistogramObserver outputs 2048 bins for each column, so the file is extremely large and the dumping time is quite long, even though in the end we only use the min and max. This diff makes the number of bins configurable by adding an argument, with the default set to 16 to reduce dumping overhead. When we need more bins to analyze the results, we only need to change this argument.
Test Plan:
buck run caffe2/caffe2/quantization/server:observer_test
{F263843430}
Reviewed By: hx89
Differential Revision: D22918202
fbshipit-source-id: bda34449355b269b24c55802012450ebaa4d280c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42516
att. We need it for some scripts.
Reviewed By: houseroad
Differential Revision: D22918112
fbshipit-source-id: 8a1696ceeeda67a34114bc57cb52c925711cfb4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42137
This PR implements an SGD optimizer class similar to torch::optim::SGD, but it doesn't inherit from torch::optim::Optimizer, for use on mobile devices (or other lightweight use case).
Adding Martin's comment for visibility: "SGD may be the only optimizer used in near future. If more client optimizers are needed, refactoring the full optim codes and reusing the existing code would be an option."
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D22846514
Pulled By: ann-ss
fbshipit-source-id: f5f46804aa021e7ada7c0cd3f16e24404d10c7eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42421
Previously, when doing onnxifi from Python, we could only feed shape info with float dtype and batch-based dim type. This diff removes that limitation and uses the TensorBoundShapes protobuf as a generic shape info struct. This will make the onnxifi interface in Python more flexible.
Reviewed By: ChunliF
Differential Revision: D22889781
fbshipit-source-id: 1a89f3a68c215a0409738c425b4e0d0617d58245
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42381
Introduce new tag to support distributed hogwild.
Reviewed By: boryiingsu
Differential Revision: D20484099
fbshipit-source-id: 5973495589e0a7ab185d3867b37437aa747f408a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42397
Since the autograd registration is unified to code-gen, we don't need to keep a manual registration file for mobile.
Remove it to avoid extra maintenance.
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D22883153
Pulled By: iseeyuan
fbshipit-source-id: 6db0bd89369beab9eed6e9a9692dd46f5bd1ff48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42380
[Caffe2] Remove explicit divide-by-zero in SpatialBN training mode
Test Plan: buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:spatial_bn_op_test
Reviewed By: houseroad
Differential Revision: D22873214
fbshipit-source-id: 70b505391b5db02b45fc46ecd7feb303e50c6280
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42255
Changes to match Fused Op: Dequantize->Swish->Quantize
* Changes to scale handling
Results showing matching intermediate and final Swish_Int8 Op.
P137389801
Test Plan: test case test_deq_swish_quant_nnpi.py
Reviewed By: hyuen
Differential Revision: D22827499
fbshipit-source-id: b469470ca66f6405ccc89696694af372ce6ce89e
Summary:
This diff provides an option for the DC++ module to use squeezed sparse feature embeddings to generate attention weights, with the purpose of reducing the network size to achieve QPS gains. There are three squeeze options along the embedding dimension: sum, max, and mean; they are provided for both the attention weight and resnet generation (a small illustrative sketch follows below).
Example workflow: f208474456
{F257199459}
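An illustrative sketch of the squeeze options described above (plain PyTorch; shapes and names are assumptions, not the DC++ module code):
```
import torch

emb = torch.randn(4, 10, 64)             # batch x num_features x embedding_dim
squeezed_sum = emb.sum(dim=-1)           # "sum" option  -> (4, 10)
squeezed_max = emb.max(dim=-1).values    # "max" option  -> (4, 10)
squeezed_mean = emb.mean(dim=-1)         # "mean" option -> (4, 10)
# the squeezed features then feed the attention-weight / resnet branches
```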
Test Plan:
1. Test single ops
buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test -- test_reduce_back_mean
buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test -- test_reduce_back_max
2. Test DC++ module
buck test dper3/dper3/modules/tests:core_modules_test -- test_dc_pp_arch_one_layer_compressed_embeddings_only_squeeze_input
buck test dper3/dper3/modules/tests:core_modules_test -- test_dc_pp_arch_shared_input_squeeze_input
buck test dper3/dper3/modules/tests:core_modules_test -- test_dc_pp_input_compress_embeddings_squeeze_input
3. Test Arch
buck test dper3/dper3_models/ads_ranking/model_impl/sparse_nn/tests:sparse_nn_lib_test -- test_dense_sparse_interaction_compress_dot_arch_dot_compress_pp_squeezed_input
4. e2e test
buck test dper3/dper3_models/ads_ranking/tests:model_paradigm_e2e_tests -- test_sparse_nn_compress_dot_attention_fm_max_fc_size_squeeze_input
Reviewed By: taiqing
Differential Revision: D22825069
fbshipit-source-id: 29269ea22cb47d487a1c92a1f6daae1055f54cfc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42225
Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.
There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by adding the `tensorpipe` CMake target as a dependency, so that the include paths defined by TensorPipe are used, which contain that auto-generated header.
I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.
Test Plan: CircleCI is all green.
Reviewed By: beauby
Differential Revision: D22812445
fbshipit-source-id: e6d824bb28f5afe75fd765de0430968174f3531f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42286
One more bug to fix. Operators such as If and AsyncIf need special treatment not just in `onnx::SsaRewrite`, but also in `RemoveOpsByType`. The solution needs two steps:
1) add external inputs/outputs of the subnets of If/AsyncIf op to the inputs/outputs of the op
2) if the inputs/outputs of the If/AsyncIf op need to be renamed as a result, the same inputs/outputs of the subnets need to be renamed as well.
I also added unit tests to cover this corner case.
Test Plan:
```
buck test //caffe2/caffe2/fb/predictor:black_box_predictor_test
mkdir /tmp/models
rm -rf /tmp/$USER/snntest
rm -rf /tmp/snntest
buck run mode/opt admarket/lib/ranking/prediction_replayer/snntest_replayer_test/tools:snntest_replay_test -- --serving_paradigm=USER_AD_PRECOMPUTATION_DSNN
```
Differential Revision: D22834028
fbshipit-source-id: c070707316cac694f452a96e5c80255abf4014bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42287
We shouldn't use block_size for thread dimensions in linear_index_weight_offsets_dedup_kernel, since the kernel doesn't iterate the embedding dimensions.
ghstack-source-id: 108834058
Test Plan:
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```
Reviewed By: jspark1105
Differential Revision: D22800959
fbshipit-source-id: 641d52a51070715c04f9fd286e7e22ac62001f61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42219
Introduce a new extra info tag on the forward net for operators sharing the same input. The effect is that the auto-generated sum of gradients for that input will not follow the device tags of the operators in the forward net. This allows more flexible device allocation.
Test Plan:
# unit test
`./buck-out/gen/caffe2/caffe2/python/core_gradients_test#binary.par -r testMultiUseInputAutoGenSumDevice`
Reviewed By: xianjiec, boryiingsu
Differential Revision: D22609080
fbshipit-source-id: d558145e5eb36295580a70e1ee3a822504dd439a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42151
Previously our Caffe2 SpatialBN op implementation computed running_var incorrectly, without the unbiasing coefficient. This should have failed the test because the output differs from CuDNN's output, but our tests were too weak to find this bug. This diff fixes all of them.
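For reference, a small sketch of the unbiasing coefficient mentioned above (illustrative numpy, not the SpatialBN kernel): running_var should accumulate the unbiased batch variance, i.e. the biased variance scaled by n / (n - 1).
```
import numpy as np

x = np.random.randn(32, 8)                 # n = 32 samples per channel
n = x.shape[0]
biased_var = x.var(axis=0)                 # divides by n
unbiased_var = biased_var * n / (n - 1)    # what running_var should accumulate
momentum = 0.9
running_var = np.ones(8)
running_var = momentum * running_var + (1 - momentum) * unbiased_var
```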
Test Plan: buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:spatial_bn_op_test
Reviewed By: houseroad
Differential Revision: D22786127
fbshipit-source-id: db80becb67d60c44faae180c7e4257cb136a266d
Summary: Sometimes first dim of X in FC is BATCH_OF_FEATURE_MAX instead of BATCH. This caused an issue in f207899183 (when first dim of X is 64 but is set to 1 in inferFC). Change the check from `!= BATCH` to `== UNKNOWN`
Test Plan: unit test
Reviewed By: yinghai
Differential Revision: D22784691
fbshipit-source-id: eb66ba361d6fe75672b13edbac2fbd269a7e7a00
Summary:
Found while trying to get RocM Caffe2 CI green
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42168
Reviewed By: seemethere
Differential Revision: D22791879
Pulled By: malfet
fbshipit-source-id: 8f7ef9711bdc5941b2836e4c8943bb95c72ef8af
Summary:
Found while trying to get RocM Caffe2 job green
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42169
Reviewed By: seemethere
Differential Revision: D22791896
Pulled By: malfet
fbshipit-source-id: 9df6233876aec5ead056365499bab970aa7e8bdc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42118
We toggle tracing on with a certain probability. In the case of 3 inferences with trace on/off/on, we leak the trace from the first inference. Always cleaning up the trace will fix it.
Test Plan:
predictor
I created a tiny repro here: D22786551
With this fix, this issue is gone.
Reviewed By: gcatron
Differential Revision: D22768382
fbshipit-source-id: 9ee0bbcb2bc5f76107dae385759fe578909a683d
Summary:
the onnxifi path didn't handle the input/output name rewrite for ssa correctly for AsyncIf op. Add support for it.
Also fixed a place where we lose the net type while doing onnxifi transform.
Test Plan: Load 163357582_593 which is a multi feed model that uses AsyncIf. This used to fail with c2 not finding some blobs in workspace. Now it works.
Reviewed By: dhe95
Differential Revision: D21268230
fbshipit-source-id: ce7ec0e952513d0f251df1bfcfb2b0250f51fd94
Summary: we need this op to avoid splicing a dense tensor and then using the Mergesinglescaler op
Test Plan: integrated test with dper2
Differential Revision: D22677523
fbshipit-source-id: f4f9a1f06841b0906ec8cbb435482ae0a89e1721
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42114
Remove settings for the logit test case.
(Note: this ignores all push blocking failures!)
Test Plan: test_op_nnpi_fp16.py test case.
Reviewed By: hyuen
Differential Revision: D22766728
fbshipit-source-id: 2fe8404b103c613524cf1beddf1a0eb9068caf8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41482
This adds a new tag for use with pipeline parallelism.
Test Plan: CI
Reviewed By: heslami
Differential Revision: D22551487
fbshipit-source-id: 90910f458a9bce68f7ef684773322a49aa24494a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41687
Specifically, this makes a new library (lazy), which can be used from both core and workspace.
This allows workspace.CreateNet to trigger lazy loading of dyndep dependencies.
Test Plan: Added a unit test specifically for workspace.CreateNet
Reviewed By: dzhulgakov
Differential Revision: D22441877
fbshipit-source-id: 3a9d1af9962585d08ea2566c9c85bec7377d39f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41820
Pull Request resolved: https://github.com/pytorch/glow/pull/4721
In order to support int8 quantized tensor as an input to OnnxifiOp, we need to
- Add support to recognize and extract shape meta from int8 tensor at input of OnnxifiOp
- Make a copy of the input data and shift it by 128 in Glow if the input data is a uint8 quantized tensor, to get the correct result, because Glow uses int8 to represent quantized data regardless.
- Propagate correct quantization parameters through shape info in C2.
This diff implements the above.
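An illustrative numpy sketch of the uint8 -> int8 shift described above (not the Glow code): shifting the quantized values by 128 and adjusting the zero point by the same amount leaves the dequantized values unchanged.
```
import numpy as np

u8 = np.array([0, 128, 255], dtype=np.uint8)
zero_point_u8 = 128
i8 = (u8.astype(np.int16) - 128).astype(np.int8)   # [-128, 0, 127]
zero_point_i8 = zero_point_u8 - 128                # 0
# scale * (q - zero_point) is identical before and after the shift
```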
Test Plan:
```
buck test caffe2/caffe2/contrib/fakelowp/test:test_int8_quantnnpi
```
Reviewed By: jackm321
Differential Revision: D22650584
fbshipit-source-id: 5e867f7ec7ce98bb066ec4128ceb7cad321b3392
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41693
Add non zero offset test cases for Quantize and Dequantize Ops.
Test Plan: Added new test case test_int8_non_zero_offset_quantize part of the test_int8_ops_nnpi.py test file.
Reviewed By: hyuen
Differential Revision: D22633796
fbshipit-source-id: be17ee7a0caa6e9bc7b175af539be2e6625ad47a
Summary:
## TLDR
Support using a NaN default value for missing dense features in RawInputProcessor for DPER2, in preparation for subsequent support for null flag features in compute meta. For train_eval this is already supported in DPER3, and we do not plan to support this in DPER2 train_eval.
Differential Revision: D22439142
fbshipit-source-id: 99ae9755bd41a5d5f43bf5a9a2819d64f3883005
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41618
More LayerNorm Vectorization in calcMeanStd function.
Test Plan: test covered in test_layernorm_nnpi_fp16.py
Reviewed By: hyuen
Differential Revision: D22606585
fbshipit-source-id: be773e62f0fc479dbc2d6735f60c2e98441916e9
Summary:
A minor spell check!
I have gone through a dozen .md files to fix the typos.
zou3519 take a look!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41599
Reviewed By: ezyang
Differential Revision: D22601629
Pulled By: zou3519
fbshipit-source-id: 68d8f77ad18edc1e77874f778b7dadee04b393ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41606
The previous diff (D22220798 (59294fbbb9) and D22220797) was recently reverted (D22492356 (28291d3cf8), D22492355) because of a bug associated with the op AsyncIf. The AsyncIf op has net_defs as args and the SSA rewriting didn't take that into account. It has a special path for the op If, but not for AsyncIf. Several changes I made to fix the bug:
1) Add op AsyncIf to the special path for If op in SSA rewriting
2) clear inputs/outputs of the netdefs that are args in If/AsyncIf ops because they're no longer valid
3) revert renamed inputs/outputs in the arg netdefs that are in the external_outputs in the parent netdef
2) and 3) are existing bugs in the `SsaRewrite` function that were just never exposed before.
The algorithm for `RemoveOpsByType` is the same as in my previous diff D22220798 (59294fbbb9). The only new changes in this diff are in `onnx::SsaRewrite` and a few newly added unit tests.
(Note: this ignores all push blocking failures!)
Reviewed By: yinghai
Differential Revision: D22588652
fbshipit-source-id: ebb68ecd1662ea2bae14d4be8f61a75cd8b7e3e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41577
* Remove skipping test
* Use fma_avx_emulation
* Increase test examples to 100
(Note: this ignores all push blocking failures!)
Test Plan: Tests are covered in test_sls_8bit_nnpi.py
Reviewed By: hyuen
Differential Revision: D22585742
fbshipit-source-id: e1f62f47eb10b402b11893ffca7a6786e31daa79
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41575
Fixes https://github.com/pytorch/pytorch/issues/34294
This updates the C++ argument parser to correctly handle `TensorList` operands. I've also included a number of updates to the testing infrastructure; this is because we're now doing a much more careful job of testing the signatures of aten kernels, using the type information about the arguments as read in from `Declarations.yaml`. The changes to the tests are required because we're now only checking for `__torch_function__` attributes on `Tensor`, `Optional[Tensor]` and elements of `TensorList` operands, whereas before we were checking for `__torch_function__` on all operands, so the relatively simplistic approach the tests were using before -- assuming all positional arguments might be tensors -- doesn't work anymore. I now think that checking for `__torch_function__` on all operands was a mistake in the original design.
The updates to the signatures of the `lambda` functions are to handle this new, more stringent checking of signatures.
I also added override support for `torch.nn.functional.threshold` and `torch.nn.functional.layer_norm`, which did not yet have python-level support.
Benchmarks are still WIP.
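A minimal illustration of the `__torch_function__` protocol that the argument parser now checks on `TensorList` elements (a sketch, not the PR's test code; assumes a torch version with Tensor-subclass `__torch_function__` support):
```
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        print(f"intercepted {func.__name__}")
        return super().__torch_function__(func, types, args, kwargs)

t = torch.randn(2, 2).as_subclass(LoggingTensor)
out = torch.cat([t, t])   # element of a TensorList operand -> override is honored
```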
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34725
Reviewed By: mruberry
Differential Revision: D22357738
Pulled By: ezyang
fbshipit-source-id: 0e7f4a58517867b2e3f193a0a8390e2ed294e1f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41464
If the input is int8 rowwise quantized, currently we cannot lower it to Glow. And previously, we had some error when running with in-batch broadcast. The main issue is that the Tile op doesn't support the uint8_t type, which is very easily added here. However, this will result in the non-ideal situation where we leave Tile -> Fused8BitRowwiseQuantizedToFloat on the host side, which probably hurts the memory bandwidth a lot. Even if we later add support for Fused8BitRowwiseQuantizedToFloat in Glow, it's still not ideal because we are doing redundant compute on identical columns. So the solution here is to swap the order of Fused8BitRowwiseQuantizedToFloat and Tile to make it Tile -> Fused8BitRowwiseQuantizedToFloat. In this way, it will resolve the error we saw immediately. For the short term, we can still run Tile on card. And for the longer term, things run faster on card.
The optimization is a heuristic. If the net doesn't contain such a pattern, in-batch broadcast will work as before.
(Note: this ignores all push blocking failures!)
Test Plan:
```
buck test caffe2/caffe2/opt/custom:in_batch_broadcast_test
```
Reviewed By: benjibc
Differential Revision: D22544162
fbshipit-source-id: b6dd36a5925a9c8103b80f034e7730a7a085a6ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41343
Currently caffe2.InitOpLibrary does the dll import unilaterally. Instead, if we make a lazy version and use it, then many pieces of code which do not need the caffe2 operators get a lot faster.
On a real test, the import time went from 140s to 68.8s.
This also cleans up the algorithm slightly (although it makes a very minimal
difference), by parsing the list of operators once, rather than every time a
new operator is added, since we defer the RefreshCall until after we've
imported all the operators.
The key way we maintain safety, is that as soon as someone does an operation
which requires a operator (or could), we force importing of all available
operators.
Future work could include trying to identify which code is needed for which
operator and only import the needed ones. There may also be wins available by
playing with dlmopen (which opens within a namespace), or seeing if the dl
flags have an impact (I tried this and didn't see an impact, but dlmopen may
make it better).
Note that this was previously landed and reverted. The issue was that if an import failed and raised an exception, the specific library would not be removed from the lazy imports. This caused our tests which had libraries that failed to poison all other tests that ran after it. This has been fixed and a unit test has been added for this case (to help make it obvious what failed).
Test Plan:
I added a new test a lazy_dyndep_test.py (copied from all_compare_test.py).
I'm a little concerned that I don't see any explicit tests for dyndep, but this
should provide decent coverage.
I've added a specific test to handle the poisoning issues mentioned above, which caused the previous version to get reverted.
Differential Revision: D22506369
fbshipit-source-id: 7395df4778e8eb0220630c570360b99a7d60eb83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41505
fix the dequantization to match the fixes from quantization
Test Plan:
test is not conclusive, since only comparing emulation with reference collected from Amy's run
running an evaluation workflow at the moment
Reviewed By: venkatacrc
Differential Revision: D22558092
fbshipit-source-id: 3ff00ea15eac76007e194659c3b4949f07ff02a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41494
revert back to the changes from amylittleyang to make quantization work
Test Plan:
ran against a dump from ctr_instagram, and verified that:
- nnpi and fakelowp match bitwise
- nnpi differs by at most 1 vs fbgemm, most likely due to the type of rounding
Reviewed By: venkatacrc
Differential Revision: D22555276
fbshipit-source-id: 7074521d181f15ef6270985bb71c4b44d25d1c30
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41476
deleted this test by default, re-adding it in its own file to make it
more explicit
Test Plan: ran the test
Reviewed By: yinghai
Differential Revision: D22550217
fbshipit-source-id: 758e279b2bab3b23452a3d0ce75fb366f7afb7be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41461
capacity is misleading, and we have many wrong uses internally. Let's rename it to nbytes to avoid the confusion in the future. Ultimately, we could remove this parameter if possible.
So far I haven't seen any case where this capacity is necessary.
Test Plan: oss ci
Differential Revision: D22544189
fbshipit-source-id: f310627f2ab8f4ebb294e0dd5eabc380926991eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36893
Adding an end to end test for running a simple training loop in C++
for the distributed RPC framework.
The goal of this change is to enable LeakSanitizer and potentially catch memory
leaks in the Future. Enabling LSAN with python multiprocessing is tricky and we
haven't found a solution for this. As a result, adding a C++ test that triggers
most of the critical codepaths would be good for now.
As an example, this unit test would've caught the memory leak fixed by:
https://github.com/pytorch/pytorch/pull/31030
ghstack-source-id: 107781167
Test Plan:
1) Verify the test catches memory leaks.
2) waitforbuildbot
Reviewed By: mrshenli
Differential Revision: D21112208
fbshipit-source-id: 4eb2a6b409253108f6b6e14352e593d250c7a64d
Summary: Adding epsilon input argument to the Logit Op
Test Plan: Added test_logit test case.
Reviewed By: hyuen
Differential Revision: D22537133
fbshipit-source-id: d6f89afd1589fda99f09550a9d1b850cfc0b9ee1
Summary:
Add support for including pytorch via an add_subdirectory()
This requires using PROJECT_* instead of CMAKE_*, which refers to the top-most project including pytorch.
TEST=add_subdirectory() into a pytorch checkout and build.
There are still some hardcoded references to TORCH_SRC_DIR; I will fix them in a follow-on commit. For now you can create a symlink to <pytorch>/torch/ in your project.
Change-Id: Ic2a8aec3b08f64e2c23d9e79db83f14a0a896abc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41387
Reviewed By: zhangguanheng66
Differential Revision: D22539944
Pulled By: ezyang
fbshipit-source-id: b7e9631021938255f0a6ea897a7abb061759093d
Summary: Adding shape inference for SparseToDense. The proposed shape inference impl only works when data_to_infer_dim is given; otherwise the SparseToDense output dimension depends on the max value of the input tensor.
Test Plan:
buck test //caffe2/caffe2/python:sparse_to_dense_test
buck test //caffe2/caffe2/python:hypothesis_test -- test_sparse_to_dense
Dper3 Changes:
f204594813
buck test dper3/dper3_models/ads_ranking/model_impl/sparse_nn/tests:sparse_nn_lib_test
Reviewed By: zhongyx12, ChunliF
Differential Revision: D22479511
fbshipit-source-id: 8983a9baea8853deec53ad6f795c874c3fb93de0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41452
The model exported from online training workflow with int8 quantization contains FCs with 4 inputs. The extra input is the quant_param blob. This diff is to adjust the bound_shape_inferencer to get shape info for the quant_param input.
Test Plan:
```
buck test caffe2/caffe2/opt:bound_shape_inference_test
```
Reviewed By: anurag16
Differential Revision: D22543215
fbshipit-source-id: 0977fca06630e279d47292e6b44f3d8180a767a5
Summary:
1. Support SparseAdagradFusedWithSparseLengthsMeanGradient and RowWiseSparseAdagradFusedWithSparseLengthsMeanGradient on CPU and GPU
2. Add the dedup implementation of fused RowWiseAdagrad op on GPUs for mean pooling
Reviewed By: xianjiec
Differential Revision: D22165603
fbshipit-source-id: 743fa55ed5893c34bc6406ddfbbbb347b88091d1
Summary:
remove layernorm templates and make them float since that's the only variant
minor fixes in logging and testing
Test Plan: ran the test
Reviewed By: venkatacrc
Differential Revision: D22527359
fbshipit-source-id: d6eec362a6e88e1c12fddf820ae629ede13fb2b8
Summary:
nccl tests and parallelize_bmuf_distributed test are failing on rocm3.5.1. Skipping these tests to upgrade the CI to rocm3.5.1
jeffdaily sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41409
Reviewed By: orionr
Differential Revision: D22528928
Pulled By: seemethere
fbshipit-source-id: 928196b7a62a441d391e69f54b278313ecc75d77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40179
- Pass no-psabi to suppress GCC's "The ABI for passing parameters with 64-byte alignment has changed in GCC 4.6" warning
- Fix use of deprecated data() accessor (and minor optimization: hoist
accessor out of loop)
- Undeprecate NetDef.num_workers, no one is serious about fixing these
- Suppress warnings about deprecated pthreadpool types
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D22234138
Pulled By: ezyang
fbshipit-source-id: 6a1601b6d7551a7e6487a44ae65b19acdcb7b849
Summary: add logit and swish to this list
Test Plan: f203925461
Reviewed By: amylittleyang
Differential Revision: D22506814
fbshipit-source-id: b449e4ea16354cb76915adb01cf317cffb494733
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41215
To unblock int8 model productization on accelerators, we need the shape and type info for all the blobs after int8 quantization. This diff added shape inference functions for int8 quantization related ops.
Test Plan:
```
buck test caffe2/caffe2/quantization/server:int8_gen_quant_params_test
buck test caffe2/caffe2/quantization/server:fully_connected_dnnlowp_op_test
```
Reviewed By: hx89
Differential Revision: D22467487
fbshipit-source-id: 8298abb0df3457fcb15df81f423f557c1a11f530
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41313
This diff backs out the backout diff. The failure was due to C++ `or`
not being supported in MSVC. This is now replaced with ||
Original commit changeset: fc7f3f8c968d
Test Plan: Existing unit tests, check github CI.
Reviewed By: malfet
Differential Revision: D22494777
fbshipit-source-id: 3271288919dc3a6bfb82508ab9d021edc910ae45
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41305
added a warning message when layernorm under/overflows, which is what
nnpi does, reducing the frequency of the logging to every 1000
Test Plan: compilation
Reviewed By: yinghai
Differential Revision: D22492726
fbshipit-source-id: 9343beeae6e65bf3846c6b3d2edd2a08dac85ed6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41315
We should pass the number of indices rather than the embedding size in the SparseAdagrad fused PyTorch operator
Reviewed By: jianyuh
Differential Revision: D22495422
fbshipit-source-id: ec5d3a5c9547fcd8f95106d912b71888217a5af0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41303
the error came from I0710 18:02:48.025024 1780875 NNPIOptions.cpp:49]
[NNPI_LOG][D] [KS] convert_base_kernel_ivp.cpp(524): Output Scale 108240.101562
is out of valid range +-(Min 0.000061 Max 65504.000000)!!!
Seems like the weights we are using are too small, thus generating scaling
factors out of the range of fp16 (>65k). I am tentatively increasing this
factor to a higher value to avoid this. (10x bigger)
Also increased max_examples to 100
Test Plan: ran this test
Reviewed By: yinghai
Differential Revision: D22492481
fbshipit-source-id: c0f9e59b0e70895ab787868ef1d87e6e80106554
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41299
When using `cub::DeviceRadixSort::SortPairs` (https://nvlabs.github.io/cub/structcub_1_1_device_radix_sort.html), the `end_bit` argument, or the most-significant bit index (exclusive) needed for key comparison, should be passed with `int(log2(float(num_rows)) + 1)` instead of `int(log2(float(num_indice)) + 1)`. This is because all the values in indices array are guaranteed to be less than num_rows (hash_size), not num_indices. Thanks ngimel for pointing this point and thanks malfet for quickly fixing the log2() compilation issues.
Note:
An optional bit subrange [begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.
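An illustrative computation of the two end_bit values discussed above (plain Python; sizes are made up): the bit count must cover the largest possible key, which is bounded by the number of rows (hash_size), not by how many indices are sorted.
```
import math

num_rows = 1_000_000       # hash_size; every index value is < num_rows
num_indices = 50_000_000   # how many index entries get sorted

end_bit_correct = int(math.log2(float(num_rows))) + 1      # 20 bits
end_bit_previous = int(math.log2(float(num_indices))) + 1  # 26 bits, from the wrong bound
```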
Test Plan:
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```
Reviewed By: malfet
Differential Revision: D22491662
fbshipit-source-id: 4fdabe86244c948af6244f9bd91712844bf1dec1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40875
This op uses the given num_bins and a spacing strategy to automatically bin and compute the histogram of given matrices.
Test Plan: Unit tests.
Reviewed By: neha26shah
Differential Revision: D22329069
fbshipit-source-id: 28406b94e284d52d875f73662fc82f93dbc00064
Summary:
unique op test failure in caffe2 blocks upgrading CI to rocm3.5.1. Skipping the test to unblock; will re-enable after root-causing and fixing the issue.
jeffdaily sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41219
Differential Revision: D22471452
Pulled By: xw285cornell
fbshipit-source-id: 9e503c8b37c0a4b92632f77b2f8a90281a9889c3
Summary:
the current quantization rounding function uses fbgemm, which defaults to round-to-nearest. The current hardware implementation uses round flush to infinity. Adding such an option to switch the rounding mode.
Test Plan: ran against test_fc_int8
Reviewed By: venkatacrc
Differential Revision: D22452306
fbshipit-source-id: d2a1fbfc695612fe07caaf84f52669643507cc9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39341
This PR introduces neon backend for vec256 class for float datatype.
For now only aarch64 is enabled due to a few issues with enabling it on aarch32.
Test Plan:
vec256_test
Imported from OSS
Differential Revision: D21822399
fbshipit-source-id: 3851c4336d93d1c359c85b38cf19904f82bc7b8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40059
This benchmark is added specifically for mobile to see if the compiler is auto-vectorizing, in which case the NEON backend for vec256 gives no advantage for the add op.
Test Plan:
CI
Imported from OSS
Differential Revision: D22055146
fbshipit-source-id: 43ba6c4ae57c6f05d84887c2750ce21ae1b0f0b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40806
When the input is empty, the operator will crash on "runtime error: division by zero". This has been causing Inference platform server crashes.
Example crash logs:
{P134526683}
Test Plan:
Unit test
See reproducing steps in the Test Plan of D22300135
Reviewed By: houseroad
Differential Revision: D22302089
fbshipit-source-id: aaa5391fddc86483b0f3aba3efa7518e54913635
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41096
The spark spot model had some issues in tensor conversion, see P134598596. It happens when we convert an undefined c10 tensor to caffe2 tensor.
This diff added a null check.
Test Plan: spark spot model runs without problem
Reviewed By: smessmer
Differential Revision: D22330705
fbshipit-source-id: dfe0f29a48019b6611cad3fd8f2ae49e8db5427e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39488
Currently caffe2.InitOpLibrary does the dll import unilaterally. Instead, if we make a lazy version and use it, then many pieces of code which do not need the caffe2 operators get a lot faster.
On a real test, the import time went from 140s to 68.8s.
This also cleans up the algorithm slightly (although it makes a very minimal
difference), by parsing the list of operators once, rather than every time a
new operator is added, since we defer the RefreshCall until after we've
imported all the operators.
The key way we maintain safety, is that as soon as someone does an operation
which requires a operator (or could), we force importing of all available
operators.
Future work could include trying to identify which code is needed for which
operator and only import the needed ones. There may also be wins available by
playing with dlmopen (which opens within a namespace), or seeing if the dl
flags have an impact (I tried this and didn't see an impact, but dlmopen may
make it better).
Test Plan:
I added a new test a lazy_dyndep_test.py (copied from all_compare_test.py).
I'm a little concerned that I don't see any explicit tests for dyndep, but this
should provide decent coverage.
Differential Revision: D21870844
fbshipit-source-id: 3f65fedb65bb48663670349cee5e1d3e22d560ed