Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46134
Make sure in-place ops stay in-place after SsaRewrite. This seems to break the premise of SSA, but it's necessary to ensure correctness. Note that we only preserve the in-place ops that *enforce* in-place; ops like `Relu` don't enforce in-place, they merely allow it.
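A minimal sketch of the distinction, using the caffe2 Python API (the op choices here are illustrative assumptions, not code from this diff):
```
from caffe2.python import core

net = core.Net("inplace_example")
# Relu merely *allows* in-place: a rewrite may legally rename its
# output to a fresh blob without changing semantics.
net.Relu(["X"], ["X"])
# Optimizer ops like SparseAdagrad *enforce* in-place: the updated
# param/moment outputs must alias the corresponding inputs, so a
# rewrite has to keep the output names identical to the input names
# or the parameter update is silently lost.
net.SparseAdagrad(
    ["param", "moment", "indices", "grad", "lr"], ["param", "moment"]
)
```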
(Note: this ignores all push blocking failures!)
Reviewed By: yinghai
Differential Revision: D24234957
fbshipit-source-id: 274bd3ad6227fce6a98e615aad7e57cd2696aec3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46560
Follow-up for D24236604 (16c52d918b).
For nets that pass the schema check, memonger makes sure to preserve the in-placeness of operators that are already in-place, so we can safely enable it for correct input nets.
(Note: this ignores all push blocking failures!)
Differential Revision: D24402482
fbshipit-source-id: a7e95cb0e3eb87adeac79b9b69eef207957b0bd5
Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/46461 with a similar goal
Makes them more readable and possibly faster. Care has to be taken because a list comprehension `[f(x) for x in xs]` materializes the whole result immediately, while a generator expression `(f(x) for x in xs)` is evaluated lazily. The lazy form is a benefit in cases where the list of values never needs to exist in memory (e.g. when passing to `tuple`, `extend`, or `join`).
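A minimal illustration of the eager/lazy distinction (not code from this PR):
```
xs = range(5)

# Eager: the full list of results is materialized in memory up front.
eager = [str(x * x) for x in xs]

# Lazy: values are produced one at a time as the consumer pulls them,
# so no intermediate list is ever built.
lazy = (str(x * x) for x in xs)

print(",".join(eager))  # 0,1,4,9,16
print(",".join(lazy))   # 0,1,4,9,16 -- join consumes the generator
```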
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46462
Reviewed By: zou3519
Differential Revision: D24422343
Pulled By: ezyang
fbshipit-source-id: 252e33499c92ac0b15238f2df32681dbbda2b237
Summary: It creates CPU overload issues when OpenMP is enabled and OMP_NUM_THREADS=1 is not set.
Test Plan: buck test //caffe2/caffe2/quantization/server:quantize_dnnlowp_op_test
Reviewed By: jspark1105
Differential Revision: D24437305
fbshipit-source-id: 426209fc33ce0d4680c478f584716837ee62cb5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46244
- What does the generated binding code do?
The Python binding codegen produces code that takes the input list of
PyObjects, finds the matching ATen C++ function using PythonArgParser,
converts the PyObjects into C++ types and calls the ATen C++ function:
```
+--------+  parsing   +------------------------+  binding   +-----------------------+
| PyObjs | ---------> | PythonArgParser Output | ---------> | Cpp Function Dispatch |
+--------+            +------------------------+            +-----------------------+
```
- Are Python arguments 1-1 mapped to C++ arguments?
Python arguments might be reordered, packed, or unpacked when binding to
C++ arguments, as illustrated below:
```
// Binding - Reorder & Packing
// aten::empty.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None,
//                   Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor

         Python Args               Cpp Args
-----------------------------------------------------------
      0: size                      size
      1: names                     names
      2: memory_format -------+
      3: dtype         -----+-|--> options
      4: layout            / |
      5: device           /  +--> memory_format
      6: pin_memory      /
      7: requires_grad -+

// Binding - Unpacking
// aten::max.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices)

         Python Args               Cpp Args
-----------------------------------------------------------
                                        +----> max
                                       /-----> max_values
      0: input            self        /
      1: dim              dim        /
      2: keepdim          keepdim   /
      3: out      -----------------+
```
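The reorder-and-pack step can be sketched in Python pseudocode as follows (`cpp_empty` is a hypothetical stand-in for the dispatched C++ function; the real generated code is C++):
```
def cpp_empty(size, names, options, memory_format):
    # Placeholder for the ATen C++ function the binding dispatches to.
    raise NotImplementedError

def empty_names_binding(size, names, memory_format=None, dtype=None,
                        layout=None, device=None, pin_memory=None,
                        requires_grad=False):
    # Scattered Python keyword arguments are packed into a single
    # options struct (TensorOptions in the real binding), while
    # memory_format is reordered to the end and passed through as-is.
    options = {
        "dtype": dtype,
        "layout": layout,
        "device": device,
        "pin_memory": pin_memory,
        "requires_grad": requires_grad,
    }
    return cpp_empty(size, names, options, memory_format)
```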
- Why do we want to rewrite the python binding codegen?
The old codegen takes Declarations.yaml as input. It doesn't distinguish
between Python arguments and C++ arguments - they are all mixed together
as a bag of untyped dict objects. Different methods process these arg
objects and add new attributes for various purposes, so the semantics of
these attributes are not obvious. The complicated binding logic happens
implicitly and is scattered across the codegen.
```
           +--------------------+
           |  Native Functions  |
           +--------------------+
                     |
                     |
                     v
           +--------------------+
           |   Cpp Signatures   |
           +--------------------+
                     |
                     |
                     v
           +--------------------+
           |  Declarations.yaml |
           +--------------------+
                     |                +-------------------------------------+
                     | +------------> |        PythonArgParser Schema       |
                     | |              +-------------------------------------+
                     | |                                 .
                     | |                                 .
                     v |                                 .
           +--------------------+     +-------------------------------------+
           | NonTyped Args Objs | --> | PythonArgParser -> Cpp Args Binding |
           +--------------------+     +-------------------------------------+
                     |                                   .
                     |                                   .
                     |                                   .
                     |                +-------------------------------------+
                     +--------------> |        Cpp Function Dispatch        |
                                      +-------------------------------------+
```
This PR leverages the new immutable data models introduced in the new
ATen codegen. It introduces dedicated data models for the Python schema.
This way, we not only avoid subtle Declarations.yaml conversions but
also decouple the generation of the Python schema, the Python-to-C++
binding, and the C++ function call.
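A minimal sketch of what "dedicated data models" means here (the names are assumed for illustration, not the exact classes in tools/codegen/api/python.py):
```
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class PythonArgument:
    name: str
    type: str               # e.g. 'Tensor', 'int[]'
    default: Optional[str]  # None if the argument is required

@dataclass(frozen=True)
class PythonSignature:
    name: str
    arguments: Tuple[PythonArgument, ...]

    def schema_str(self) -> str:
        # Render the PythonArgParser schema from typed fields instead
        # of probing ad-hoc attributes on untyped dicts.
        args = ", ".join(
            f"{a.type} {a.name}" + (f"={a.default}" if a.default else "")
            for a in self.arguments
        )
        return f"{self.name}({args})"
```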
The ultimate state will be like the following diagram:
```
                +-------------------+      +-------------------------------------+
      +-------> | Python Signatures | ---> |        PythonArgParser Schema       |
      |         +-------------------+      +-------------------------------------+
      |                   |                                    .
      |                   |                                    .
      |                   |                                    .
+------------------+      |                +-------------------------------------+
| Native Functions |      +--------------> | PythonArgParser -> Cpp Args Binding |
+------------------+      |                +-------------------------------------+
      |                   |                                    .
      |                   |                                    .
      |                   |                                    .
      |         +-------------------+      +-------------------------------------+
      +-------> |  Cpp Signatures   | ---> |        Cpp Function Dispatch        |
                +-------------------+      +-------------------------------------+
```
This PR has migrated the core binding logic from
tools/autograd/gen_python_functions.py to tools/codegen/api/python.py.
It produces byte-for-byte identical results (tested with #46243).
Will migrate the rest of gen_python_functions.py in subsequent PRs.
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D24388874
Pulled By: ljk53
fbshipit-source-id: f88b6df4e917cf90d868a2bbae2d5ffb680d1841
Summary:
1. Added CudaFusionGuard as a custom TypeCheck for nvfuser; enabled dynamic shape support with the profiling executor (see the sketch after this list);
2. Dropped support for the legacy fuser;
3. Re-enabled nvfuser tests;
4. Added registration of profiling records to allow profiling on user-specified nodes.
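A rough sketch of how dynamic shapes interact with the profiling executor (an assumed illustration: `_jit_set_profiling_executor` is an internal toggle, and CudaFusionGuard is inserted by the fuser, not by user code):
```
import torch

torch._C._jit_set_profiling_executor(True)  # profiling-based executor

@torch.jit.script
def f(x):
    return torch.relu(x) * 2.0

# Warm-up runs let the executor profile shapes; the fuser then guards
# its fused kernel with a runtime check (CudaFusionGuard) and falls
# back to the unfused graph when an input violates the profile.
for _ in range(3):
    f(torch.randn(8, 16, device="cuda"))
f(torch.randn(4, 32, device="cuda"))  # new shape -> guard decides
```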
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46452
Reviewed By: zou3519, anjali411
Differential Revision: D24364642
Pulled By: ngimel
fbshipit-source-id: daf53a9a6b6636e1ede420a3a6d0397d4a8b450b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43987
This replaces the Caffe2 CPU random number generator (std::mt19937) with at::mt19937, the one currently used in PyTorch. The ATen RNG is 10x faster than the std one and appears to be more robust, given bugs in the std implementation (https://fburl.com/diffusion/uhro7lqb).
For large embedding tables (10GB+) we see UniformFillOp taking upwards of 10 minutes because we're bottlenecked on the single-threaded RNG. Swapping to at::mt19937 cuts that time to 10% of the current value.
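A minimal sketch of the bottleneck being addressed (the fill size is illustrative):
```
import time
from caffe2.python import core, workspace

# A single large UniformFill is dominated by the throughput of the
# single-threaded RNG, which is what the at::mt19937 swap speeds up.
op = core.CreateOperator("UniformFill", [], ["X"], shape=[10000000])
start = time.time()
workspace.RunOperatorOnce(op)
print("UniformFill took %.2fs" % (time.time() - start))
```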
Test Plan: Ran all relevant tests + CI. This doesn't introduce new features (and is a core change), so existing tests + CI should be sufficient to catch regressions.
Reviewed By: dzhulgakov
Differential Revision: D23219710
fbshipit-source-id: bd16ed6415b2933e047bcb283a013d47fb395814
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46424
Currently, if an exception occurs in a reporter thread, the process is killed via std::terminate. This adds support for handling the reporter exception if FLAGS_caffe2_handle_executor_threads_exceptions is set to true.
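From Python, opting in would look roughly like this (a sketch; the flag name is taken from the summary above):
```
from caffe2.python import workspace

# Propagate reporter-thread exceptions to the caller instead of
# letting them kill the process via std::terminate.
workspace.GlobalInit([
    "caffe2",
    "--caffe2_handle_executor_threads_exceptions=true",
])
```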
Test Plan: buck test mode/opt -c python.package_style=inplace //caffe2/caffe2/python:hypothesis_test //caffe2/caffe2:caffe2_test_cpu -- --stress-runs 100
Reviewed By: dahsh
Differential Revision: D24345027
fbshipit-source-id: 0659495c9e27680ebae41fe5a3cf26ce2f455cb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46457
Wanted to see whether a float-specialized CopyMatrix that uses mkl_somatcopy would be faster, but it wasn't. Still checking in the benchmark so it can be used later.
Test Plan: .
Reviewed By: dskhudia
Differential Revision: D24345901
fbshipit-source-id: d3e68dbb560e3138fda11c55789cd41bc0715c6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46449
modifies `ComputeEqualizationScale` to have a single output `S`
Test Plan:
```
buck test caffe2/caffe2/quantization/server:compute_equalization_scale_test
```
plus e2e tests
Reviewed By: hx89
Differential Revision: D23946768
fbshipit-source-id: 137c2d7a58bb858db411248606a5784b8066ab23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45551
The FP16 version of the SparseNormalize op in Caffe2 is missing. This diff adds FP16 support to unblock the MC process of adding FP16 to Dper3.
Check https://fb.quip.com/L0T2AXGwUY3n#EReACAeifk3 .
One open question is whether a pure FP16 SparseNormalize op will affect accuracy; maybe we should do the computation in the FP32 domain.
ghstack-source-id: 114184398
Test Plan:
```
buck run mode/opt //caffe2/caffe2/python/operator_test:sparse_normalize_test
```
```
buck run mode/opt -c python.package_style=inplace mode/no-gpu //caffe2/caffe2/python/benchmarks:sparse_normalize_benchmark -- --fp16
```
Reviewed By: jspark1105
Differential Revision: D24005618
fbshipit-source-id: 8b918ec4063fdaafa444779b95206ba2b7b38537
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46110
## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145).
* We need a test that covers and exhibits that the plan executor can cancel a stuck net and propagate the error.
## Summary
* Added PlanExecutorTest `ErrorPlanWithCancellableStuckNet` for plan executor.
* Set cancelCount to zero at the beginning of tests to avoid global state being carried over in some test environments.
Test Plan:
## Unit Test Added
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 1000
```
Reviewed By: d4l3k
Differential Revision: D24226577
fbshipit-source-id: c834383bfe6ab50747975c229eb42a363eed3458
Summary: Added an OpSchema::NeedsAllInputShapes wrapper around the TensorInferenceFunction to fix an exception when referencing the dim array while the input shape was unknown. There may be other operators that could use a similar change; these are just the ones that were causing InferShapesAndTypes to throw an exception in my examples.
Test Plan: Tested with notebook n352716
Differential Revision: D23745442
fbshipit-source-id: d63eddea47d7ba595e73c4693d34c790f3a329cc
Summary: I think this preprocessor check is incorrect. The fused multiply-add (FMA) instructions are not part of AVX2.
Test Plan: CI
Reviewed By: jspark1105
Differential Revision: D24237836
fbshipit-source-id: 44f9b9179918332eb85ac087827726300f56224e
Summary: Adds a new flag, shape_is_set, to the shape-inference structs for in-place ops to prevent duplicated inference.
Test Plan:
buck test mode/opt-clang caffe2/caffe2/opt:bound_shape_inference_test
buck test mode/opt-clang caffe2/caffe2/fb/opt:shape_info_utils_test
Reviewed By: ChunliF
Differential Revision: D24134767
fbshipit-source-id: 5142e749fd6d1b1092a45425ff7b417a8086f215
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46080
Temporary removal of ErrorPlanWithCancellableStuckNet; will fill it out more later.
Test Plan:
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
```
remove a test
Reviewed By: fegin
Differential Revision: D24213971
fbshipit-source-id: e6e600bad00b45c726311193b4b3238f1700526e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45319
## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145)
* We need a test that covers and exhibits that the plan executor can cancel a stuck net and propagate the error.
## Summary
* Added `ErrorPlanWithCancellableStuckNet` for plan executor.
* We set up a plan with two nets: one stuck net with a blocking operator that never returns, and one error net with an op that throws; the test verifies that the plan throws and cancels.
Test Plan:
## Unit Test added
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
```
```
Summary
Pass: 400
ListingSuccess: 2
```
Reviewed By: d4l3k
Differential Revision: D23920548
fbshipit-source-id: feff41f73698bd6ea9b744f920e0fece4ee44438
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45981
This is a recommit of previously reverted D20850851 (3fbddb92b1).
TL;DR: combining condition variables and atomics is a bad idea; see
https://stackoverflow.com/questions/49622713/c17-atomics-and-condition-variable-deadlock
This also adds some ifdefs to disable the death test for mobile, xplat, and TSAN builds, since forking doesn't play nicely with them.
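The lost-wakeup hazard behind the TL;DR, sketched in Python for illustration (the linked issue describes the C++17 atomics case, but the shape of the bug is the same):
```
import threading

cond = threading.Condition()
done = False  # plays the role of the atomic flag

def signaller():
    global done
    done = True            # flag flipped outside the lock...
    with cond:
        cond.notify_all()

def buggy_waiter():
    if not done:           # ...and checked outside the lock: the notify
        with cond:         # can fire in this gap, so wait() sleeps forever
            cond.wait()

def correct_waiter():
    with cond:
        while not done:    # check and wait under the same lock: no gap
            cond.wait()
```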
Test Plan:
buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 1000 test_atomic_iter_with_concurrent_steps --timeout 120
buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 100
buck test mode/opt caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
no timeouts https://www.internalfb.com/intern/testinfra/testconsole/testrun/7036874440059883/
will ensure no timeouts in OSS
Reviewed By: walterddr, dahsh
Differential Revision: D24165505
fbshipit-source-id: 17cd23bfbcd9c2826a4067a387023d5186353196
Summary:
This enables the cuda fuser on ROCm and enables tests for them.
Part of this patch is based on work by Rohith Nallamaddi; thank you.
Errors are my own, of course.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45965
Reviewed By: seemethere
Differential Revision: D24170457
Pulled By: walterddr
fbshipit-source-id: 3dd25b3501a41d2f00acba3ce8642ce51c49c9a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45952
Pull Request resolved: https://github.com/pytorch/glow/pull/4967
When Glow compilation hits a nonrecoverable fatal error (the hardware is busted), we would like to throw a special exception, distinct from the normal caffe2::EnforceNotMet, so that we can signal the upper-layer application to handle it differently.
Test Plan: Manually code some error and add LOG(FATAL) in the special exception path and wait for application to fatal.
Reviewed By: ipiszy
Differential Revision: D24156792
fbshipit-source-id: 4ae21bb0d36c89eac331fc52dd4682826b3ea180
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45979
For some reason, we sometimes cannot write out the debug files. This shouldn't block the whole service. Hence, we opt to log an error instead of throwing.
Test Plan: Run net_runner test at `/` and observe error being printed out but the test passes.
Reviewed By: ipiszy
Differential Revision: D24165081
fbshipit-source-id: a4e1d0479d54d741e615e3a00b3003f512394fd4
Summary:
The CPU implementation of `torch.symeig` uses `[zc]heev`, but MAGMA only has the d-suffixed (divide-and-conquer) flavors of those functions.
Fixes https://github.com/pytorch/pytorch/issues/45922
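An assumed repro, not copied from the issue: symeig on a complex CUDA tensor, which is the path that now routes to MAGMA's d-suffixed divide-and-conquer routines:
```
import torch

a = torch.randn(4, 4, dtype=torch.complex128, device="cuda")
a = a + a.conj().transpose(-2, -1)  # make the matrix Hermitian
# On CUDA this now dispatches to MAGMA's zheevd (cheevd for complex64),
# since plain zheev has no MAGMA flavor.
w, v = torch.symeig(a, eigenvectors=True)
```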
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46002
Reviewed By: walterddr
Differential Revision: D24177730
Pulled By: malfet
fbshipit-source-id: 0e9aeb60a83f8a4b8ac2a86288721bd362b6040b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45992
Created a template version of AddFakeFp16 to take both float and int inputs.
Test Plan: notebook with local bento kernel: N369049
Reviewed By: amylittleyang
Differential Revision: D24169720
fbshipit-source-id: 679de391224f65f6c5b3ca890eb0d157f09712f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45986
Recurrent networks have subnets that are not well supported by `RemoveOpsByType`. Here we exclude recurrent networks by adding the same check as in memonger.
Test Plan:
```
buck test //caffe2/caffe2/fb/predictor:black_box_predictor_test
```
AdIndexer canary for sanity check:
https://www.internalfb.com/intern/ads/canary/430059485214766620
Differential Revision: D24167284
fbshipit-source-id: fa90d1c1f34af334a599d879af09d4c0bf7c27bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45875
Adds a googlebenchmark harness for perf testing programs generated by
tensorexpr, sans any pytorch wrappings (for python-level benchmarks of
tensorexpr, see benchmarks/tensorexpr).
Currently there's a harness for gemm that sets up the problem using torch (and
also measures the perf of a torch::mm to give a baseline).
Right now there's just an unoptimized implementation that is expected not to be
very fast. More optimized versions are coming.
Sample output from my dev box:
```
Run on (48 X 2501 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 256K (x24)
  L3 Unified 30720K (x2)
--------------------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
Gemm/Torch/128/128/128                    73405 ns        73403 ns         8614 GFLOPS=57.1411G/s
Gemm/TensorExprNoopt/128/128/128        3073003 ns      3072808 ns          229 GFLOPS=1.36497G/s
```
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D24142403
Pulled By: bertmaher
fbshipit-source-id: 3354aaa56868a43a553acd1ad9a192f28d8e3597
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45297
If we have two concurrent substeps and one of them throws an exception while the other is blocking, we'll currently hang. This change waits up to one minute for the blocking substep to complete before terminating the process.
Test Plan: buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
Reviewed By: dahsh
Differential Revision: D20850851
fbshipit-source-id: 330503775d8062a34645ba55fe38e6770de5e3c7
Summary: This diff adds a string equality checking operator.
Test Plan: Unit tests
Differential Revision: D24042344
fbshipit-source-id: c8997c6130e3438f2ae95dae69f76978e2e95527
Summary:
The torchbind tests didn't work because we somehow missed the rename of caffe2_gpu to torch_... (hip for us) in https://github.com/pytorch/pytorch/issues/20774 (merged 2019-06-13, oops) and still tried to link against the old library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45426
Reviewed By: VitalyFedyunin
Differential Revision: D24112439
Pulled By: walterddr
fbshipit-source-id: a66a574e63714728183399c543d2dafbd6c028f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45649
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44275
* This diff applies the WarpReduce optimization to the dedup version of the RowWiseSparseAdagrad fused op. We achieve roughly a 1.33x performance improvement with this diff.
* Port the approach from D23948802 for finding num_dup.
* Fix a likely FP16 bug in the dedup kernel.
Reviewed By: jianyuh
Differential Revision: D23561994
fbshipit-source-id: 1a633fcdc924593063a67f9ce0d36eadb19a7efb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45610
Also document in the usual places that this option exists.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D24058199
Pulled By: suo
fbshipit-source-id: 81574fbd042f47587e2c7820c726fac0f68af2a7
Summary: `__repr__` calling self.tasks() ends up marking the instance as "used", which doesn't seem appropriate. I was debugging a value being passed around and then ran into `Cannot add Task to an already used TaskGroup.` because the value had been logged once.
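A sketch of the failure mode (API names per caffe2.python.task; the exact repro is hypothetical):
```
from caffe2.python import core
from caffe2.python.task import Task, TaskGroup

tg = TaskGroup()
print(tg)  # __repr__ used to call self.tasks(), marking tg as "used"

# Before this fix, adding a task after the group had merely been
# logged raised: "Cannot add Task to an already used TaskGroup."
with tg:
    net = core.Net("work")
    net.ConstantFill([], ["x"], shape=[1], value=1.0)
    Task(step=core.execution_step("step", [net]))
```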
Test Plan:
Added a unit test -- didn't see a clean public method to test it, but I'm happy to add one if that makes sense.
Will wait for sandcastle to trigger everything else; I'm not at all familiar with this code so any other recommendations would be great!
Reviewed By: cryptopic
Differential Revision: D23541198
fbshipit-source-id: 5d1ec674a1ddaedf113140133b90e0da6afa7270