Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45401
Added a DeleteKey API for the TCP Store
ghstack-source-id: 112997162
Test Plan:
Modified the existing get/set test to use delete. Verified that the
correct keys were deleted and that the numKeys API returned the right values.
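For reference, a minimal sketch of the exercised API surface from Python (assuming the c10d TCPStore bindings; host/port are placeholders):
```
from datetime import timedelta

import torch.distributed as dist

store = dist.TCPStore("127.0.0.1", 29500, 1, True, timedelta(seconds=30))
store.set("first_key", "first_value")
store.set("second_key", "second_value")
assert store.get("first_key") == b"first_value"

store.delete_key("first_key")  # the new DeleteKey API
# Note: the count may include the store's internal init key.
print(store.num_keys())
```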
Reviewed By: mrshenli
Differential Revision: D23955730
fbshipit-source-id: 5c9f82be34ff4521c59f56f8d9c1abf775c67f9f
Summary: As title.
Test Plan:
FBL job without this diff failed:
f221545832
Error message:
```
NonRetryableException: AssertionError: Label is missing in training stage for HistogramBinningCalibration
```
FBL job with canary package built in this diff is running without failure:
f221650379
Reviewed By: chenshouyuan
Differential Revision: D23959508
fbshipit-source-id: c077230de29f7abfd092c84747eaabda0b532bcc
Summary: Adding support for type double to caffe2 MeanOp and MeanGradientOp.
Test Plan:
All tests passed.
Example FBL job failed without this diff:
f221169563
Error message:
```
c10::Error: [enforce fail at mean_op.h:72] . Mean operator only supports 32-bit float, but input was of type double (Error from operator:
input: "dpsgd_8/Copy_3" input: "dpsgd_8/Copy_4" output: "dpsgd_8/Mean_2" name: "" type: "Mean" device_option { device_type: 0 device_id: 0 })
```
Example FBL job is running without failure with the canary package built from this diff:
f221468723
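For illustration, a minimal sketch (not part of the diff) of the newly supported case:
```
import numpy as np
from caffe2.python import core, workspace

# Elementwise mean across inputs; double previously hit the enforce above.
workspace.FeedBlob("a", np.array([1.0, 2.0], dtype=np.float64))
workspace.FeedBlob("b", np.array([3.0, 4.0], dtype=np.float64))
workspace.RunOperatorOnce(core.CreateOperator("Mean", ["a", "b"], ["m"]))
print(workspace.FetchBlob("m"))  # [2. 3.]
```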
Reviewed By: chenshouyuan
Differential Revision: D23956222
fbshipit-source-id: 6c81bbc390d812ae0ac235e7d025141c8402def1
Summary: Currently, GetSingleArgument overflows because it expects an int instead of an int64 when using a 1cycle (hill policy) annealing schedule.
Test Plan:
unittest
buck test caffe2/caffe2/python/operator_test:learning_rate_op_test
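A hypothetical repro sketch; the hill-policy argument names and values are assumptions, not taken from the test:
```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("iter", np.array([0], dtype=np.int64))
op = core.CreateOperator(
    "LearningRate", ["iter"], ["lr"],
    policy="hill", base_lr=0.1,
    # Exceeds int32 range; previously truncated when read via
    # GetSingleArgument<int>.
    num_iter=3_000_000_000,
    start_multiplier=0.1, gamma=0.999, power=1.0, end_multiplier=0.01,
)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("lr"))
```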
Differential Revision: D23938169
fbshipit-source-id: 20d65df800d7a0f1dd9520705af31f63ae716463
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43963
Added a DeleteKey API for the TCP Store
ghstack-source-id: 112939762
Test Plan:
Modified the existing get/set test to use delete. Verified that the
correct keys were deleted and that the numKeys API returned the right values.
Reviewed By: jiayisuse
Differential Revision: D23009117
fbshipit-source-id: 1a0d95b43d79e665a69b2befbaa059b2b50a1f66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43962
TCPStore needs a getNumKeys API for our logging needs.
ghstack-source-id: 112939761
Test Plan: Adding tests to C++ Store Tests
Reviewed By: pritamdamania87
Differential Revision: D22985085
fbshipit-source-id: 8a0d286fbd6fd314dcc997bae3aad0e62b51af83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44062
Previously, BackendSelect kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, pin_memory, and they used hacky_wrapper to be callable. This caused a re-wrapping step: calling into a BackendSelect kernel required taking the individual scattered arguments, packing them into a TensorOptions, and the kernel itself then gathered them again for redispatch.
Now with this PR, BackendSelect kernels are written in the new way and no hacky_wrapper or rewrapping is needed for them.
ghstack-source-id: 112825789
Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216117032/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170194/
Reviewed By: ezyang
Differential Revision: D23484192
fbshipit-source-id: e8fb49c4692404b6b775d18548b990c4cdddbada
Summary:
A lot of changes are in this update; some highlights:
- Added Doxygen config file
- Split the fusion IR (higher level TE like IR) from kernel IR (lower level CUDA like IR)
- Improved latency with dynamic shape handling for the fusion logic
- Prevent recompilation for pointwise + reduction fusions when not needed
- Improvements to inner dimension reduction performance
- Added input -> kernel + kernel launch parameters cache, added eviction policy
- Added reduction fusions with multiple outputs (still single reduction stage)
- Fixed code generation bugs for symbolic tiled GEMM example
- Added thread predicates to prevent shared memory from being loaded multiple times
- Improved syncthreads placement with shared memory and removed a read-before-write race
- Fixes to FP16 reduction fusions where output would come back as FP32
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45218
Reviewed By: ezyang
Differential Revision: D23905183
Pulled By: soumith
fbshipit-source-id: 12f5ad4cbe03e9a25043bccb89e372f8579e2a79
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45315
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45314
In D23858329 (721cfbf842), we put the PriorCorrectionCalibrationPrediction unit test in an OSS file, which caused a test failure in the public trunk.
This diff moves it to the FB-only test file.
Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_gather_ranges_to_dense_op
buck test //caffe2/caffe2/fb/python/operator_test:torch_integration_test -- test_prior_correct_calibration_prediction_op
```
All pass.
Reviewed By: houseroad
Differential Revision: D23899012
fbshipit-source-id: 1ed97d8702e2765991e6caf5695d4c49353dae82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45096
Add operator to compute the equalization scale. This will be used in the integration of equalization into dper int8 fixed quant scheme quantization flow.
Design docs:
https://fb.quip.com/bb7SAGBxPGNC
https://fb.quip.com/PDAOAsgoLfRr
Test Plan: buck test caffe2/caffe2/quantization/server:compute_equalization_scale_test
Reviewed By: jspark1105
Differential Revision: D23779870
fbshipit-source-id: 5e6a8c220935a142ecf8e61100a8c71932afa8d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45178
## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are blocking and thus non-cancellable. If an error
occurs, we need to be able to safely stop all net execution so we can throw
the exception to the caller.
## Summary
* Adds a hypothesis test for queue ops cancellation.
Test Plan:
## Unit test added to verify that queue ops propagate errors
```
buck test caffe2/caffe2/python:hypothesis_test
buck test caffe2/caffe2/python:hypothesis_test -- test_safe_dequeue_blob__raises_exception_when_hang --stress-runs 1000
```
```
Summary
Pass: 1000
ListingSuccess: 1
```
Reviewed By: d4l3k
Differential Revision: D23847576
fbshipit-source-id: 2fc351e1ee13ea8b32d976216d2d01dfb6fcc1ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45177
## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are blocking and thus non-cancellable. If an error
occurs, we need to be able to safely stop all net execution so we can throw
the exception to the caller.
## Summary
* When an error occurs in a net or the net is cancelled, running ops will have
their `Cancel` method called.
* This diff adds a `Cancel` method to `SafeEnqueueBlobsOp` and
`SafeDequeueBlobsOp` that calls queue->close() to force all the blocking ops
to return (see the sketch below).
* Adds a unit test that verifies the error propagation.
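A minimal sketch of the mechanism (op names from caffe2.python; exact status semantics assumed):
```
import numpy as np
from caffe2.python import core, workspace

workspace.RunOperatorOnce(core.CreateOperator(
    "CreateBlobsQueue", [], ["queue"], capacity=2, num_blobs=1))
workspace.FeedBlob("x", np.array([1.0], dtype=np.float32))
workspace.RunOperatorOnce(core.CreateOperator(
    "SafeEnqueueBlobs", ["queue", "x"], ["x", "status"]))
# Cancel relies on this: closing the queue forces any blocked
# Safe{En,De}queueBlobs call to return with `status` set.
workspace.RunOperatorOnce(core.CreateOperator(
    "CloseBlobsQueue", ["queue"], []))
workspace.RunOperatorOnce(core.CreateOperator(
    "SafeDequeueBlobs", ["queue"], ["y", "status"]))
print(workspace.FetchBlob("status"))
```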
Test Plan:
## Unit test added to verify that queue ops propagate errors
```
buck test caffe2/caffe2/python:hypothesis_test -- test_safe_dequeue_blob__raises_exception_when_hang --stress-runs 1000
```
```
Summary
Pass: 1000
ListingSuccess: 1
```
Reviewed By: d4l3k
Differential Revision: D23846967
fbshipit-source-id: c7ddd63259e033ed0bed9df8e1b315f87bf59394
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45231
There are two operators, `PriorCorrectionCalibrationPrediction` and
`GatherRangesToDense`, that are not supported in PT, which prevents GLOW from working.
To unblock, we first try a C2->PT conversion; in the long term, we need to implement PT custom ops.
This diff does the conversion to unblock the current project.
Test Plan:
Ran the unit tests; the test input is from the current DPER example.
All pass.
```buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_prior_correct_calibration_prediction_op --print-passing-details
> c2 reference output
> [0.14285715 0.27272728 0.39130434 0.5 ]
> PT converted output
> tensor([0.1429, 0.2727, 0.3913, 0.5000])
buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_gather_ranges_to_dense_op --print-passing-details
c2 reference output
> [array([[6, 5, 4, 3], [0, 0, 0, 0]], dtype=int64)]
> PT converted output
> [tensor([[6, 5, 4, 3], [0, 0, 0, 0]])]
```
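For context, a minimal sketch reproducing the C2 reference output above directly through the caffe2 op (the PT path goes through the conversion added in this diff):
```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("data", np.array([1, 2, 3, 4, 5, 6], dtype=np.int64))
# One feature with an (offset, length) range per example; (0, 0) is empty.
workspace.FeedBlob("ranges", np.array([[[2, 4]], [[0, 0]]], dtype=np.int32))
workspace.FeedBlob("key", np.array([0, 1, 3, 2, 1, 0], dtype=np.int64))
workspace.RunOperatorOnce(core.CreateOperator(
    "GatherRangesToDense", ["data", "ranges", "key"], ["out"], lengths=[4]))
print(workspace.FetchBlob("out"))  # [[6 5 4 3] [0 0 0 0]]
```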
Reviewed By: allwu, qizzzh
Differential Revision: D23858329
fbshipit-source-id: ed37118ca7f09e1cd0ad1fdec3d37f66dce60dd9
Summary:
There is a tool called `2to3` whose `future` fixer specifically removes these redundant imports; the `caffe2` directory has the most:
```2to3 -f future -w caffe2```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033
Reviewed By: seemethere
Differential Revision: D23808648
Pulled By: bugra
fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45110
A recent change in DSNN quantizes the ad embedding to 8 bits. Ad embeddings are part of the inputs to the DSNN merge net. To correctly pass shape hints of input tensors including quantized ad embeddings, we need to be able to annotate the data types in shape hints.
A note on the corner cases: if the type is omitted or is not a valid type (e.g., white space), I decided to return the default type, float, instead of throwing an exception.
Test Plan:
```
buck test caffe2/caffe2/fb/opt:shape_info_utils_test
```
Reviewed By: yinghai
Differential Revision: D23834091
fbshipit-source-id: 5e072144a7a7ff4b5126b618062dfc4041851dd3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44639
As title; this will unblock migration of several modules that need learning rate functionality.
Test Plan:
```
buck test //dper3/dper3/modules/low_level_modules/tests:learning_rate_test
```
Reviewed By: yf225
Differential Revision: D23681733
fbshipit-source-id: 1d98cb35bf6a4ff0718c9cb6abf22401980b523c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44936
Need to provide the max sequence size and max element size instead of the total.
Added a check that onnxifi was successful.
Test Plan: sls tests
Reviewed By: yinghai
Differential Revision: D23779437
fbshipit-source-id: 5048d6536ca00f0a3b0b057c4e2cf6584b1329d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44840
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44762
Move CostInferenceForFCGradient to fc_inference.cc/h to be used in multiple .cc files.
Test Plan: CI
Reviewed By: qizzzh
Differential Revision: D23714877
fbshipit-source-id: d27f33e270a93b0e053f2af592dc4a24e35526cd
Summary:
## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are blocking and thus non-cancellable. If an error
occurs, we need to be able to safely stop all net execution so we can throw
the exception to the caller.
* When an error occurs in a net or the net is cancelled, running ops will have
their `Cancel` method called.
* This diff adds a `Cancel` method to `SafeEnqueueBlobsOp` and
`SafeDequeueBlobsOp` that calls queue->close() to force all the blocking ops
to return.
* Adds a unit test that verifies the error propagation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44495
Test Plan:
## Unit Test added to verify that queue ops propagate errors
```
buck test caffe2/caffe2/python:hypothesis_test
```
Reviewed By: dzhulgakov
Differential Revision: D23236088
Pulled By: dahsh
fbshipit-source-id: daa90d9ee32483fb51195e269a52cf5987bb0a5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44819
Addresses these issues that our validation team found:
A) test_op_nnpi_fp16: hypothesis triggers max_examples * max_examples runs.
B) batchnorm: the batchnorm test is derived from a unit test that doesn't have the settings required for hypothesis, hence the default value of 100 examples gets used.
Test Plan:
buck test //caffe2/caffe2/contrib/fakelowp/test/...
https://our.intern.facebook.com/intern/testinfra/testrun/5910974543950859
Reviewed By: hyuen
Differential Revision: D23740970
fbshipit-source-id: 16fcc49f7bf84a5d7342786f671cd0b4e0fc87d3
Summary:
Make `gcs_cuda_only` and `gcs_gpu_only` return empty device lists if CUDA/GPU (CUDA or ROCm) is not available.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44578
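A sketch of how these strategy dicts are typically consumed in operator tests (standard hypothesis_test_util pattern; the test body is illustrative):
```
from hypothesis import given

import caffe2.python.hypothesis_test_util as hu


class TestMyOp(hu.HypothesisTestCase):
    @given(X=hu.tensor(), **hu.gcs_cuda_only)
    def test_my_op(self, X, gc, dc):
        # With this change, the device lists are empty on machines without
        # CUDA/ROCm instead of containing devices that would fail at runtime.
        ...
```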
Reviewed By: walterddr
Differential Revision: D23664227
Pulled By: malfet
fbshipit-source-id: 176b5d964c0b02b8379777cd9a38698c11818690
Summary:
Tests for Vec256 classes (https://github.com/pytorch/pytorch/issues/15676)
Testing
Current list:
- [x] Blends
- [x] Memory: UnAlignedLoadStore
- [x] Arithmetic: Plus, Minus, Multiplication, Division
- [x] Bitwise: BitAnd, BitOr, BitXor
- [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual
- [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp
- [x] SignManipulation: Absolute, Negate
- [x] Interleave: Interleave, DeInterleave
- [x] Rounding: Round, Ceil, Floor, Trunc
- [x] Mask: ZeroMask
- [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal
- [x] Trigonometric: Sin, Cos, Tan
- [x] Hyperbolic: Tanh, Sinh, Cosh
- [x] InverseTrigonometric: Asin, ACos, ATan, ATan2
- [x] Logarithm: Log, Log2, Log10, Log1p
- [x] Exponents: Exp, Expm1
- [x] ErrorFunctions: Erf, Erfc, Erfinv
- [x] Pow: Pow
- [x] LGamma: LGamma
- [x] Quantization: quantize, dequantize, requantize_from_int
- [x] Quantization: widening_subtract, relu, relu6
Missing:
- [ ] Constructors, initializations
- [ ] Conversion , Cast
- [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex)
#### Notes on tests and testing framework
- some math functions are tested within a domain range
- mostly the testing framework randomly tests against the std implementation within the domain, or within the implementation domain for some math functions
- some functions are tested against a local version. ~~For example, std::round and the vector version of round differ, so it was tested against the local version~~
- round was tested against pytorch at::native::round_impl. ~~for double type on **VSX vec_round failed for (even)+0.5 values**~~. It was solved by using vec_rint
- ~~**complex types are not tested**~~ **After enabling complex testing, due to precision and domain issues some of the complex functions failed for VSX and x86 AVX as well. I will either test them against a local implementation or check within the accepted domain**
- ~~quantizations are not tested~~ Added tests for quantize, dequantize, requantize_from_int, relu, relu6, widening_subtract functions
- the testing framework should be improved further
- ~~For now `-DBUILD_MOBILE_TEST=ON `will be used for Vec256Test too~~
Vec256 Test cases will be built for each CPU_CAPABILITY
Fixes: https://github.com/pytorch/pytorch/issues/15676
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42685
Reviewed By: malfet
Differential Revision: D23034406
Pulled By: glaringlee
fbshipit-source-id: d1bf03acdfa271c88744c5d0235eeb8b77288ef8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44540
Support fp16 as the output type for UniformFill
Reviewed By: jianyuh
Differential Revision: D23558030
fbshipit-source-id: 53a5b2c92cfe78cd11f55e6ee498e1bd682fe4a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44089
Add support for fp16 as the input type in the SparseLengthsSum/Mean caffe2 operators
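A minimal sketch (example data assumed) of the newly supported fp16 input:
```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("data", np.random.rand(5, 2).astype(np.float16))
workspace.FeedBlob("indices", np.array([0, 2, 4, 1], dtype=np.int64))
workspace.FeedBlob("lengths", np.array([2, 2], dtype=np.int32))
workspace.RunOperatorOnce(core.CreateOperator(
    "SparseLengthsSum", ["data", "indices", "lengths"], ["out"]))
print(workspace.FetchBlob("out"))  # two pooled rows
```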
Reviewed By: xianjiec
Differential Revision: D23436877
fbshipit-source-id: 02fbef2fde17d4b0abea9ca5d17a36aa989f98a0
Summary: As title; this will unblock migration of several modules that need learning rate functionality.
Test Plan:
```
buck test //dper3/dper3/modules/low_level_modules/tests:learning_rate_test
```
WIP: need to add more learning rate tests for the different policies
Reviewed By: yf225
Differential Revision: D23584071
fbshipit-source-id: f6656531b1caba38c3e3a7d6e16d9591563391e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44145
## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are now blocking thus being non-cancellable. If an error
occurs we need to be able to safely stop all net execution so we can throw
the exception to the caller.
## Summary
* Adds `NetBase::Cancel()` to NetBase, which iterates over the entire list of
operators and calls Cancel on each.
* Cancel on all ops was added to Net since there's nothing async-specific about it.
* `AsyncSchedulingNet` calls the parent Cancel.
* To preserve backwards compatibility, `AsyncSchedulingNet`'s Cancel still calls
`CancelAndFinishAsyncTasks`.
* Adds `Cancel()` to `OperatorBase`.
Reviewed By: dzhulgakov
Differential Revision: D23279202
fbshipit-source-id: e1bb0ff04a4e1393f935dbcac7c78c0baf728550
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44406
This fix makes fakelowp identical to hw:
- mask out the floating point number with 0x7fff so we are always dealing
with positive numbers
- the dsp implementation is correct; ice-ref suffers from this same problem
Test Plan: Tested with test_fusions.py; can't enable the test until the fix in ice-ref appears.
Reviewed By: venkatacrc
Differential Revision: D23603878
fbshipit-source-id: a72d93a4bc811f98d1b5e82ddb204be028addfeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44440
`aten-op.cc` takes a long time to compile due to the large generated constructor. For each case, the `std::function` constructor and the initialization functions are inlined, producing a huge amount of intermediate code that takes a long time to optimize, given that many compiler optimization passes are superlinear in the function size.
This diff moves each case to a separate function, so that each one is cheap to optimize, and the constructor is just a large jump table, which is easy to optimize.
Reviewed By: dzhulgakov
Differential Revision: D23593741
fbshipit-source-id: 1ce7a31cda10d9b0c9d799716ea312a291dc0d36
Summary:
Expose the `nesterov` option of the caffe2 SGD optimizer to dper.
The dper SGD optimizer (https://fburl.com/diffusion/chpobg0h) already refers to the NAG SgdOptimizer in caffe2 (https://fburl.com/diffusion/uat2lnan), so we just need to add the 'nesterov' parameter to the dper SGD optimizer.
Analysis of run results: N345540.
- train_ne increases as momentum (m) decreases.
- for m=0.95, 0.9: eval_ne is lower with NAG than production (no NAG, m = 0.95).
- for m=0.99: eval_ne with or without NAG is higher than production. It indicates larger variance in validation and overfit in training (lower train_ne).
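For context, a minimal sketch of the underlying caffe2 op that the dper parameter plumbs through to (values illustrative):
```
import numpy as np
from caffe2.python import core, workspace

for name, val in [("grad", [0.1]), ("moment", [0.0]),
                  ("lr", [0.01]), ("param", [1.0])]:
    workspace.FeedBlob(name, np.array(val, dtype=np.float32))

# nesterov=1 switches the update to Nesterov accelerated gradient.
workspace.RunOperatorOnce(core.CreateOperator(
    "MomentumSGDUpdate",
    ["grad", "moment", "lr", "param"],
    ["grad", "moment", "param"],
    momentum=0.95, nesterov=1))
print(workspace.FetchBlob("param"))
```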
Test Plan:
1. unit tests:
`buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_sgd_without_nesterov`
`buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_sgd_with_nesterov`
.
2. Build the dper front end package: `flow-cli canary ads.dper3.workflows.sparse_nn.train --mode opt --entitlement ads_global --run-as-secure-group team_ads_ml_ranking`. The build result (refreshed) is here https://www.internalfb.com/intern/buck/build/2a368b55-d94b-45c1-8617-2753fbce994b. Flow package version is ads_dper3.canary:856b545cc6b249c0bd328f845adeb0d2.
.
3. Build the dper back end package: `flow-cli canary dper.workflows.dper3.train --mode opt --entitlement ads_global --run-as-secure-group team_ads_ml_ranking`. The build result (refreshed) is here: https://www.internalfb.com/intern/buck/build/70fa91cd-bf6e-4a08-8a4d-41e41a77fb52. Flow package version is aml.dper2.canary:84123a34be914dfe86b1ffd9925869de.
.
4. Compare prod with NAG-enabled runs:
a) refreshed prod run (m=0.95): f213877098
NAG enabled run (m=0.95): f213887113
.
b) prod run (m=0.9): f214065288
NAG enabled run (m=0.9): f214066319
.
c) prod run (m=0.99): f214065804
NAG enabled run (m=0.99): f214066725
.
d) changed the data type of nesterov to `bool` and launched a validation run
NAG enabled (m=0.95): f214500597
Reviewed By: ustctf
Differential Revision: D23152229
fbshipit-source-id: 61703ef6b4e72277f4c73171640fb8afc6d31f3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44043
To invoke `cancel` from the net instance in Python, we expose it through pybind state.
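Hypothetical usage of the new binding (the pybind accessor and the `cancel` method name are assumptions based on the description above, not verified against the diff):
```
from caffe2.python import core, workspace

net = core.Net("example")
net.ConstantFill([], "x", shape=[1], value=1.0)

ws = workspace.C.Workspace.current
py_net = ws.create_net(net.Proto().SerializeToString())  # pybind net instance
py_net.cancel()  # newly exposed binding
```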
Reviewed By: dzhulgakov
Differential Revision: D23249660
fbshipit-source-id: 45a1e9062dca811746fcf2e5e42199da8f76bb54
Summary: Exporting the Bucketize operator on CUDA. Also adding a unit test.
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/sparsenn:gpu_test -- test_bucketize
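A minimal sketch of the underlying caffe2 operator on CUDA (the internal sparsenn binding used in the test is not shown; example data assumed):
```
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace

with core.DeviceScope(core.DeviceOption(caffe2_pb2.CUDA, 0)):
    workspace.FeedBlob("x", np.array([0.5, 1.5, 2.5], dtype=np.float32))
    workspace.RunOperatorOnce(core.CreateOperator(
        "Bucketize", ["x"], ["out"], boundaries=[1.0, 2.0]))
print(workspace.FetchBlob("out"))  # [0 1 2]
```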
Differential Revision: D23581321
fbshipit-source-id: 7f21862984c04d840410b8718db93006f526938a