A number of OSS PRs were reverted because they introduced new signed-unsigned comparison warnings, which are treated as errors in some internal builds.
It is not clear how those selective rules are applied, but this PR removes `-Wno-sign-compare` from the PyTorch codebase.
The only tricky part of this PR is making sure that non-ASCII character detection works for both signed and unsigned chars here:
6e3d51b08a/torch/csrc/jit/serialization/python_print.cpp (L926)
Several files are excluded from sign-compare checks when flash attention is used, due to a violation in cutlass that is to be fixed by https://github.com/NVIDIA/cutlass/pull/869
Sign-compare violations in the caffe2 codebase are deliberately left unfixed.
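In sketch form, a signedness-safe check looks like this (illustrative only, not the actual code at the link):
```cpp
#include <string>

// Whether a byte is outside the ASCII range, regardless of whether the
// platform's plain `char` is signed or unsigned: cast to unsigned char
// first so bytes 0x80-0xFF never compare as negative values.
bool has_non_ascii(const std::string& s) {
  for (char c : s) {
    if (static_cast<unsigned char>(c) > 127) {
      return true;
    }
  }
  return false;
}
```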
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96723
Approved by: https://github.com/albanD
Summary:
Usage of fast math in the BatchBoxCox kernel produced different numerical results between the dev and optimized builds, which caused a few internal tests to fail.
For now, disable the compiler-optimized version and rely on ATen vectorization.
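For context, a scalar sketch of the transform the kernel computes, assuming the two-parameter form used by caffe2's batch_box_cox (lambda1 as the power, lambda2 as a shift); fast math may legally reassociate or approximate the pow/log here, which is how the results diverged:
```cpp
#include <cmath>

// Scalar reference for one element; the real kernel is vectorized.
float box_cox(float x, float lambda1, float lambda2) {
  const float v = x + lambda2;
  return lambda1 == 0.0f ? std::log(v)
                         : (std::pow(v, lambda1) - 1.0f) / lambda1;
}
```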
Differential Revision: D41211784
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88875
Approved by: https://github.com/hyuen
Summary:
1) Add an MKL/AVX2-based implementation to perfkernels. This implementation is similar to caffe2/operators/batch_box_cox_op.cc
2) Migrate caffe2's batch_box_cox_op to use this implementation
Test Plan: CI
Differential Revision: D40208074
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86569
Approved by: https://github.com/hyuen
We're no longer building Caffe2 mobile as part of our CI, and it adds a lot of clutter to our makefiles. Any lingering internal dependencies will use the buck build and so won't be affected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84338
Approved by: https://github.com/dreiss
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70248
Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for (TYPE var = x0; var < x_max; var++)
```
to the format
```
for (const auto var : irange(x_max))
```
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.
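For illustration, a minimal before/after of the transformation (using PyTorch's c10::irange):
```cpp
#include <c10/util/irange.h>

void example(int n) {
  // before: index type chosen ad hoc, prone to sign-compare warnings
  for (int i = 0; i < n; i++) { /* ... */ }
  // after: the index type is deduced from the bound, and the index
  // is immutable inside the loop body
  for (const auto i : c10::irange(n)) { /* ... */ }
}
```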
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D32813863
fbshipit-source-id: 527244b4a2b220fdfe7f17dee3599603f492a2ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705359
fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
This PR deletes some code in `MiscCheck.cmake` that performs the exact
same function as `FindAVX.cmake`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61748
Reviewed By: ejguan
Differential Revision: D29791282
Pulled By: malfet
fbshipit-source-id: 6595fd1b61c8ae12b821fad8c9a34892dd52d213
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60677
Add a rule to wrap conversions.h and depend on that, rather than
relying on a glob which violates package boundaries.
Test Plan: `buck2 build fbcode//caffe2/caffe2:caffe2_core`
Reviewed By: mzlee
Differential Revision: D29370841
fbshipit-source-id: d4dd383eb8457d4f5118574e34e6f17c32fde647
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os
def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
* `#if` with an undefined name is a warning when `-Wundef` is specified (which it is in ovrsource, for example)
* identifiers starting with two underscores are [reserved for compiler internals](https://en.cppreference.com/w/cpp/language/identifiers)
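Both issues in a small sketch (macro names hypothetical):
```cpp
// With -Wundef, an undefined identifier in #if is a warning rather than
// silently evaluating to 0:
#if MY_FEATURE            // warning: 'MY_FEATURE' is not defined [-Wundef]
#endif
#if defined(MY_FEATURE)   // fine: explicitly tests whether it is defined
#endif

// Identifiers containing a double underscore (or starting with an
// underscore followed by a capital) belong to the implementation:
#define __MY_GUARD__ 1    // reserved name; prefer MY_GUARD instead
```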
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D27318070
fbshipit-source-id: 4989fc6a3bf3c176eddd7c25aca47414e4973edd
Summary:
Follow-up to https://github.com/pytorch/pytorch/issues/53951.
This PR fixes the remaining Semmle warnings: comparison of narrow type with wide type in loop condition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54471
Reviewed By: bdhirsh
Differential Revision: D27262493
Pulled By: malfet
fbshipit-source-id: 05765758da79699936af11de237c3ff3d34373d6
Summary:
Fix Semmle warning: comparison of narrow type with wide type in loop condition.
For example, consider the following piece of code:
for (int i = 0; i < array.size(); ++i) {}
The problem is that array.size() returns size_t, which can be a wider type than int depending on the implementation, so there is a chance that i overflows (for a very large array whose size is beyond the range of int) and the loop never terminates.
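A minimal sketch of the usual fix, assuming a standard container, is to give the counter the same width as the bound:
```cpp
#include <vector>

void iterate(const std::vector<float>& array) {
  // size_t matches the return type of size(), so the comparison never
  // mixes widths and the counter cannot overflow before reaching the bound
  for (size_t i = 0; i < array.size(); ++i) {
    /* ... */
  }
}
```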
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53951
Reviewed By: zou3519
Differential Revision: D27181495
Pulled By: malfet
fbshipit-source-id: 0612c5cedcdc656c193085e7fbb87dd163f20688
Summary:
When building libtorch as a static library, these three static libraries are generated but won't be installed to CMAKE_INSTALL_LIBDIR:
- libCaffe2_perfkernels_avx2.a
- libCaffe2_perfkernels_avx512.a
- libCaffe2_perfkernels_avx.a
This PR will fix this issue.
Note that after this fix there are still static libraries missing from CMAKE_INSTALL_LIBDIR, but they belong to third_party repos, and we need to fix them in the corresponding repos:
- libfoxi_loader.a
- libonnx.a
- libonnx_proto.a
- libfmt.a
- libnccl_static.a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53825
Reviewed By: ngimel
Differential Revision: D27013844
Pulled By: malfet
fbshipit-source-id: 8a84cc72b6ae87393ca26c4e474f5526a7b18ab2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46758
It's in general helpful to support int32 indices and offsets, especially when such tensors are large and need to be transferred to accelerator backends. Since it may not be very useful to support the combination of int32 indices and int64 offsets, here we enforce that these two must have the same type.
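A minimal sketch of how such a constraint can be enforced (illustrative, not the exact check in this PR):
```cpp
#include <ATen/ATen.h>

// indices and offsets may both be int32 or both int64, but not mixed
void check_index_types(const at::Tensor& indices, const at::Tensor& offsets) {
  TORCH_CHECK(
      indices.scalar_type() == offsets.scalar_type(),
      "expected indices and offsets to have the same dtype, got ",
      indices.scalar_type(), " and ", offsets.scalar_type());
}
```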
Test Plan: unit tests
Reviewed By: ngimel
Differential Revision: D24470808
fbshipit-source-id: 94b8a1d0b7fc9fe3d128247aa042c04d7c227f0b
Summary: I think this preprocessor check is incorrect. The fused multiply-add (FMA) instructions are not part of AVX2.
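A sketch of the corrected gating, assuming GCC/Clang feature macros (FMA has its own __FMA__ macro and -mfma flag, separate from AVX2):
```cpp
#if defined(__AVX2__) && defined(__FMA__)
#include <immintrin.h>
// Guard FMA intrinsics on __FMA__ rather than assuming __AVX2__ implies it:
// a compiler can target AVX2 without enabling FMA.
__m256 fma_example(__m256 a, __m256 b, __m256 c) {
  return _mm256_fmadd_ps(a, b, c);  // a * b + c, fused into one instruction
}
#endif
```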
Test Plan: CI
Reviewed By: jspark1105
Differential Revision: D24237836
fbshipit-source-id: 44f9b9179918332eb85ac087827726300f56224e
Summary:
The `2to3` tool has a `future` fixer that can be targeted specifically to remove these; the `caffe2` directory has the most redundant imports:
```2to3 -f future -w caffe2```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033
Reviewed By: seemethere
Differential Revision: D23808648
Pulled By: bugra
fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/387
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39985
AVX2-optimized 2/4-bit row-wise quantization/dequantization in perfkernels.
This diff slightly changes the numerics of quantization by multiplying by the inverse of the scale instead of dividing by the scale.
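A scalar sketch of the change (names hypothetical; the real kernels use AVX2 intrinsics):
```cpp
void quantize_row(const float* x, float* q, int n, float xmin, float scale) {
  // before: q[i] = (x[i] - xmin) / scale  (one division per element)
  // after: hoist the division; a * (1/b) is not bit-identical to a / b,
  // which is the slight numerics change mentioned above
  const float inv_scale = 1.0f / scale;
  for (int i = 0; i < n; ++i) {
    q[i] = (x[i] - xmin) * inv_scale;
  }
}
```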
Test Plan:
On my devserver:
for i in 2 4 8; do echo $i; buck run mode/opt :fused_rowwise_nbit_conversion_bench -- --bit-rate=$i; done
Before this diff
2-bit
3.35394 ms. 100%. FloatToFused2BitRowwiseQuantized
4-bit
3.60351 ms. 100%. FloatToFused4BitRowwiseQuantized
8-bit
0.434467 ms. 100%. FloatToFused8BitRowwiseQuantized
After this diff
2-bit
0.606386 ms. 100%. FloatToFused2BitRowwiseQuantized
4-bit
0.446683 ms. 100%. FloatToFused4BitRowwiseQuantized
8-bit
0.4349 ms. 100%. FloatToFused8BitRowwiseQuantized
Reviewed By: choudharydhruv, jianyuh
Differential Revision: D22033195
fbshipit-source-id: d3a219e47b8345268d90a160c9314ed0d5b71467
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37705
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37372
Posted note: [Regularizing SparseNN Against Over-fitting](https://fb.workplace.com/notes/taiqing-wang/regularizing-sparsenn-against-over-fitting/220306075902708/)
**Problem formulation**
L(w) = J(w) + lambda/2 * ||w||^2
J(w) is the empirical loss, and ||w||^2 is the squared L2 norm of the parameters, a.k.a. L2 regularizer.
dL(w)/dw_i = dJ(w)/dw_i + lambda * w_i
dL(w)/dw_i is the gradient of L(w) w.r.t. w_i.
To implement the L2 regularizer, lambda * w_i is added to the gradient of J(w) w.r.t. w_i. lambda is called weight decay in this implementation.
**Code changes**
* In the initialization method of AdagradOptimizer, a new input argument, weight_decay, is added.
* In the _run function of AdagradOptimizer, the weight decay will be skipped for 1d bias vectors.
* In the parameter update functions of Adagrad, the gradient is updated by weight_decay * w_i, as sketched below. The default value for weight_decay is zero.
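A minimal scalar sketch of the update described above, using the conventional gradient-descent sign (the real kernels are vectorized and follow caffe2's own conventions):
```cpp
#include <cmath>

// One Adagrad step for a single weight, with the L2 term folded into the
// gradient as dL/dw = dJ/dw + lambda * w.
void adagrad_step_with_weight_decay(float& w, float& moment, float grad,
                                    float lr, float epsilon,
                                    float weight_decay) {
  grad += weight_decay * w;   // lambda * w_i added to the empirical gradient
  moment += grad * grad;      // Adagrad accumulator
  w -= lr * grad / (std::sqrt(moment) + epsilon);
}
```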
Test Plan:
`buck build caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay`
`./buck-out/gen/caffe2/caffe2/fb/dper/layer_models/tests/split_1/sparse_nn_test_weight_decay#binary.par`
Reviewed By: jspark1105
Differential Revision: D21258652
fbshipit-source-id: d2366ddcd736a03205a2d16f914703b16d9fce8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36371
It allows us to drop a circular dependency and remove unknown_symbols in the Buck build.
It'd be good to get rid of GetCpuId altogether in favor of cpuinfo, but it's not really blocking anything.
Reviewed By: malfet
Differential Revision: D20958000
fbshipit-source-id: ed17a2a90a51dc1adf9e634af56c85f0689f8f29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35556
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35542
Apply explicit vectorization to lstm_unit operator.
Enabled by -DENABLE_VECTORIZATION=1
This optimization requires vector library support and was tested with Intel SVML & clang.
However, compilers that support OpenMP 4.5 with the omp simd extension should also benefit.
After the code changes
In file included from caffe2/caffe2/operators/lstm_unit_op.cc:1:
caffe2/caffe2/operators/lstm_unit_op.h:60:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
VECTOR_LOOP for (int d = 0; d < D; ++d) {
caffe2/caffe2/operators/lstm_unit_op.h:60:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
caffe2/caffe2/operators/lstm_unit_op.h:112:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
VECTOR_LOOP for (int d = 0; d < D; ++d) {
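For illustration, a VECTOR_LOOP macro along these lines can be built on OpenMP's simd pragma (a sketch; the macro in the tree may be defined differently):
```cpp
// Expands to an omp simd hint when explicit vectorization is enabled,
// and to nothing otherwise, so the loop still compiles everywhere.
#if defined(ENABLE_VECTORIZATION) && defined(_OPENMP)
#define VECTOR_LOOP _Pragma("omp simd")
#else
#define VECTOR_LOOP
#endif

void scale(float* x, int D, float a) {
  VECTOR_LOOP
  for (int d = 0; d < D; ++d) {
    x[d] *= a;
  }
}
```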
Test Plan:
Checked the failures in OSS CI:
- No build failures related to this change
- Failing tests are:
- py3.6-clang7-rocmdeb-ubuntu16.04-test2
>RuntimeError: fft: ATen not compiled with MKL support
- caffe2_onnx_ort2_py3_6_clang7_ubuntu16_04_test:
>gradient_check_test.py::TestMakeTwo
Exited with code exit status 1
- pytorch_macos_10_13_py3_test, test errors like:
> ERROR [0.014s]: test_boolean_indexing_weirdness_cpu (__main__.NumpyTestsCPU)
RuntimeError: shape mismatch: indexing tensors could not be broadcast together with shapes [0], [2]
- caffe2_onnx_ort1_py3_6_clang7_ubuntu16_04_test
- No failure info
Reviewed By: jspark1105
Differential Revision: D20484640
fbshipit-source-id: 8fb82dbd6698c8de3e0bbbc0b48d15b70e36ca94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32974
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/286
Re-attempt of D18805426. Decided to be consistent with PyTorch Adagrad.
There was an inconsistency in the order of operations between the scalar and SIMD code when we compute Adagrad. This diff makes them consistent by doing w += lr * grad / (sqrt(moment) + epsilon) in Adagrad and w += lr / (sqrt(moment) + epsilon) * grad in RowWiseSparseAdagrad.
The Adagrad order is consistent with PyTorch (see the addcmul_cpu_kernel function in aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp). The RowWiseSparseAdagrad order makes the computation more efficient: lr / (sqrt(moment) + epsilon) is shared among all elements in the row.
Also, we're not going to use FMA, to be consistent with PyTorch (even though it provides a small accuracy benefit).
Test Plan: CI
Reviewed By: wx1988
Differential Revision: D19342865
fbshipit-source-id: e950c16f2e1c4a2f2a3ef53b1705db373c67f341
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32683
Pull Request resolved: https://github.com/pytorch/glow/pull/4079
Similar to D17768404, we changed the 8-bit fused version of the EmbeddingBag operator to add the option to include the last offset, and parallelized the op.
ghstack-source-id: 97404645
Test Plan:
To generate the AVX2 code (`embedding_lookup_fused_8bit_rowwise_idx_avx2.cc`):
```
python hp_emblookup_codegen.py --fused --use-offsets
```
To test the correctness:
```
buck test //caffe2/torch/fb/sparsenn:test -- test_embedding_bag_byte_rowwise_offsets --print-passing-details
```
Reviewed By: yinghai
Differential Revision: D19592761
fbshipit-source-id: f009d675ea3f2228f62e9f86b7ccb94700a0dfe0
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4049
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27477
We would like to add the intra-op parallelization support for the EmbeddingBag operator.
This should bring a speedup for the DLRM benchmark:
https://github.com/pytorch/pytorch/pull/24385
Benchmark code:
```
from __future__ import absolute_import, division, print_function, unicode_literals
import torch
import time
eb = torch.nn.EmbeddingBag(1000000, 64, mode='sum')
input = torch.LongTensor(1500).random_(0, 1000000)
offsets = torch.zeros(64, dtype=torch.int64)
niter = 10000
s = time.time()
for _ in range(niter):
    out = eb(input, offsets)
time_per_iter = (time.time() - s) / niter
print('time_per_iter', time_per_iter)
print('GB/s', (input.numel() * 64 * 4 + out.numel() * 4) / time_per_iter / 1e9)
```
The following results are single core on Skylake T6:
- Before our change (with the original caffe2::EmbeddingLookup)
time_per_iter 6.313693523406982e-05
GB/s 6.341517821789133
- After our change, using the EmbeddingLookupIdx API, which takes offsets instead of lengths.
time_per_iter 5.7627105712890626e-05
GB/s 6.947841559053659
- With Intel's PR: https://github.com/pytorch/pytorch/pull/24385
time_per_iter 7.393271923065185e-05
GB/s 5.415518381664018
As for multi-core performance: because Clang doesn't work with OpenMP, I can only measure single-core performance on SKL T6.
ghstack-source-id: 97124557
Test Plan:
With D16990830:
```
buck run mode/dev //caffe2/caffe2/perfkernels:embedding_bench
```
With D17750961:
```
buck run mode/opt //experimental/jianyuhuang/embeddingbag:eb
buck run mode/opt-lto //experimental/jianyuhuang/embeddingbag:eb
```
OSS test
```
python run_test.py -i nn -- TestNNDeviceTypeCPU.test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu
```
Buck test
```
buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu"
OMP_NUM_THREADS=3 buck test mode/opt -c pytorch.parallel_backend=tbb //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets" --print-passing-details
```
Generate the AVX2 code for embedding_lookup_idx_avx2.cc:
```
python hp_emblookup_codegen.py --use-offsets
```
Differential Revision: D17768404
fbshipit-source-id: 8dcd15a62d75b737fa97e0eff17f347052675700
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31470
Optimize performance of these two operators.
Additionally use nearbyint instead of round to be consistent with 4-bit embedding table quantization.
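The difference in a nutshell: under the default rounding mode, nearbyint rounds ties to even, while round rounds ties away from zero. A tiny sketch:
```cpp
#include <cmath>
#include <cstdio>

int main() {
  // ties-to-even vs ties-away-from-zero
  std::printf("%.1f %.1f\n", std::nearbyint(2.5), std::round(2.5));  // 2.0 3.0
  std::printf("%.1f %.1f\n", std::nearbyint(3.5), std::round(3.5));  // 4.0 4.0
  return 0;
}
```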
Reviewed By: hyuen
Differential Revision: D19072103
fbshipit-source-id: efe96f14aeff7958cceb453ed625d3fd693891ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30737
Original commit changeset: 2a8b2a3f5401
Reverting this to be safe until we address test failures in T58528495
Test Plan: CI
Reviewed By: wx1988
Differential Revision: D18812384
fbshipit-source-id: 2a3ac554024773022ec827f259127e4c8cffe6e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30449
There was an inconsistency in the order of operations between the scalar and SIMD code when we compute Adagrad.
In this diff we first compute effective_lr = lr / (sqrt(moment) + epsilon) and then multiply by the gradient.
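In sketch form, the reordering (scalar version, names hypothetical):
```cpp
#include <cmath>

// before: w += lr * grad / (std::sqrt(moment) + epsilon);
// after:  divide once, then multiply; mathematically equal but not
//         bit-identical, so scalar and SIMD paths must both use this order
void adagrad_update(float& w, float moment, float grad,
                    float lr, float epsilon) {
  const float effective_lr = lr / (std::sqrt(moment) + epsilon);
  w += effective_lr * grad;
}
```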
Test Plan: CI
Reviewed By: protonu
Differential Revision: D18703416
fbshipit-source-id: 2a8b2a3f5401466549561412bd22f07abac3c598
Summary:
We (me, fnabulsi, bmcdb) have a handful of fixes used locally to build and run with clang-cl. I am aware of https://github.com/pytorch/pytorch/issues/8784, but it has not been touched in almost a year.
It may be more practical to upstream the non-controversial fixes piecewise. For example, this one.
Here, the dummy version of `_cvtsh_ss` for MSVC is not required (and in fact causes conflicts) when using clang-cl, so it can be #ifdef'd out.
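In sketch form (clang-cl defines both _MSC_VER and __clang__, which is what such a guard exploits; the stub body here is a placeholder, not the real emulation):
```cpp
#if defined(_MSC_VER) && !defined(__clang__)
// Provide the stub only for real MSVC; clang-cl supplies the actual
// intrinsic, and redefining it there is what caused the conflicts.
static inline float _cvtsh_ss(unsigned short /*h*/) {
  return 0.0f;  // placeholder body for illustration only
}
#endif
```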
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29726
Differential Revision: D18478120
Pulled By: ezyang
fbshipit-source-id: cdcd94251e68347446f2ad1ac5a0e71089f7d0ab