pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Bugra Akyildiz	27c7158166	Remove __future__ imports for legacy Python 2 support (#45033) Summary: There is a tool called `2to3` whose `future` fixer can be targeted specifically to remove these; the `caffe2` directory has the most redundant imports: ```2to3 -f future -w caffe2``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033 Reviewed By: seemethere Differential Revision: D23808648 Pulled By: bugra fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38	2020-09-23 17:57:02 -07:00
Huayu Li	cd4c3b48a6	Add LN after specialized output embeddings and flexible LCE (#35178) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35178 * add layer norm (LN) after specialized output embeddings * add flexible lce inside specialized module Test Plan: * unit-tests * buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- * buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_6 -- * workflows * flexible lce: f177025325 {F232112501} * LN: f177025301 {F232112982} Differential Revision: D20586281 fbshipit-source-id: 664e77cb4cb5bec6646cafd2e4afb88aff27df03	2020-04-30 15:32:09 -07:00
Fei Tian	809ee9d04c	Enable personalized FC weight_init and sparse_emb weight_init (#31707) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31707 Change the initialization value for FC weight init and sparse embedding lookup init. The previous default initialization was uniform(-\sqrt(1/input_dim), \sqrt(1/input_dim)); now a flexible hyperparameter, say \alpha, can be passed in to change it to uniform(-\sqrt(\alpha/input_dim), \sqrt(\alpha/input_dim)). Reviewed By: chonglinsun Differential Revision: D18825615 fbshipit-source-id: 4c5f2e07f2b3f5d642fd96d64dbf68892ebeb30b	2020-01-07 10:10:54 -08:00
Summer Deng	a42d093db2	FCTransposed to FbFCPacked (#29766) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29766 * Add FbgemmPackTranspose op to support the packing on FCTransposed weights * Add FCTransposed to FbFCPacked transformation to Dper fp16 exporter Test Plan: ``` buck test mode/opt caffe2/caffe2/fb/fbgemm:fb_fc_packed_op_test ``` ``` buck test mode/opt caffe2/caffe2/python:layers_test ``` Differential Revision: D18482306 fbshipit-source-id: e8f1947b3d0d04892293509ebf88742f5f0f5997	2019-12-10 10:18:21 -08:00
Xing Wang	a1513dced3	Integrate FC fp16 exporter into Dper2 (#26582) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26582 Add the blob quantization. Replace the op in the eval/predictor net. Test Plan: # Unit test: ----- buck build fblearner/flow/projects/dper/tests/validators:test_exporter_options_validators ./buck-out/gen/fblearner/flow/projects/dper/tests/validators/test_exporter_options_validators#binary.par ---- buck build caffe2/caffe2/fb/dper/layer_models/tests:exporter_test ./buck-out/gen/caffe2/caffe2/fb/dper/layer_models/tests/exporter_test-2.7#binary.par Reviewed By: chocjy Differential Revision: D17439720 fbshipit-source-id: 68de5d0322b0111aeca5ed552210bf80a4cddc78	2019-09-29 10:19:28 -07:00
Yan Zhu	003f97cefa	fc layer accept axis argument (#13822) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13822 as title Reviewed By: xianjiec Differential Revision: D12996338 fbshipit-source-id: 1aa61e71e2d79535325ea7034c82e1cb6bf3a9f6	2018-11-11 13:44:57 -08:00
sf-wind	5b86c3af4a	Update from facebook (#8384 ) * [fix] fixup the bias multiplier data access issue Hotfix for failues in conv_transpose * [D2][Easy]: lint regularizer lint with black * [GanH]: Split mu in adaptive weight for diagnose * [Dper] Add the ability to split FC weights into multiple smaller ones * fix SumReduceLikeOp for empty blob as desc. * add ctc_greedy_decoder for caffe2 ctc_greedy_decoder same as tf's * Update event callback handling Allow multiple callbacks per event * Add WeightedSum layer The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in honet: https://fburl.com/f4rmolg2 crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm * Replicate DAG's behavior Some callers expect RunAsync to block, replicate that behavior in case of explicit 'dag' net type * [dper] layernorm layer as title * Override dag, async_dag, async_polling Overriding dag, async_dag and async_polling with async_scheduling * Name the thread pools Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead. * [Caffe2] FilleOp should support int64_t dimensions Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc) * Remove caffe2/caffe2/contrib/torch/ It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!) #accept2ship * Fix linearWarmup multiplier check The multiplier needs to be non-negative, not strictly positive. * Revert D3314316 This is after 2 years and we do not seem to have a use case for this one, so for the sake of clean API design we should potentially remove this. 
This would allow us to potentially pass in arguments to optionally construct an object, although it is indeed a little bit unclear how we can reuse existing objects if constructor arguments are passed in. In any case, we may want to remove this dangling feature. * Speedup generate proposals by partial_sort. Speedup generate proposals by partial_sort. FACEBOOK: - Saw speed improvement for training with this op. - Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details. * More parallel processing friendly for CPP version of GenerateProposals. More parallel processing friendly for CPP version of GenerateProposals. * [DT] [43/n] Lift stop conditions inside reader code back to flow control 1. Split multi_reader function into local_reader and remote_reader 2. Lifted stop conditions inside Limiter back to flow control 3. Split epoch flow building logic into 3 cases: - single machine (1 reader, 1 trainer on trainer0 node, no PS) - (1 reader + 1 trainer) on trainer0 node, has PS - multiple readers, readers do not share nodes with trainers, might have PS or not * Resolve conflicts for torch/_thnn/utils.py * [Caffe2] Handle image decoding errors Image decoding errors can make the whole training fail. This diff is to handle them 1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors. 2.Replace the image with empty in case of error 3.Count the number of errors and throw runtime exception if the rate reaches given number The empty image data is kept. It might introduce noise in the training data. * Update MKL exporter to IDEEP ops TSIA * [Caffe2] GlobalInit is thread safe, fixing the comment With the mutex and lock, GlobalInit is thread safe. Update the comments. 
* Back out "Add support for generating ATen files during fbcode build" Original commit changeset: 28970ddba353 @override-unit-failures (Note: this ignores all push blocking failures!) * [DT]: fix predictor save similar to D6610058, here we add the fix for distributed online training * Remove net_singlethread_async_gpu.cc Closes https://github.com/caffe2/caffe2/pull/2528 This removes net_singlethread_async_gpu.cc as part of our effort to clean CUDAContext and the net executors. * Inline DFS task execution Add a DFS inline task execution mode in executor * Add c10 folder to fbcode This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten. * add dependencies for online trainer Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/ * Resolve conflicts for tools/jit/gen_jit_dispatch.py * [Fix] sparse regularization in distributed training * Support advanced pooling options in sum processor * support advanced pooling options in sum processor * remove redundant code * support attention in sum processor * Improve shard logging in net tracing code Make it handle arbitrary shard ids instead of just one digit ids. * [Caffe2] Call GlobalInit in predictor only in mobile FACEBOOK: Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens: User does not call GlobalInit and initFacebook after program starts User sets a flag manually: https://fburl.com/mcsumw7d User calls OSS predictor. OSS predictor calls GlobalInit GlobalInit calls initFacebook initFacebook resets all flags: https://fburl.com/tolszha1 Thus, the user manually set flags are overwritten This would happen anytime GlobalInit is called long after the program starts. 
I suppose the intention of the user in this case is not to call GlobalInit throughout the program, but use Caffe2 regardless (is that desired?) But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2. This issue doesn't exist in mobile, since initFacebook is not called on mobile. For now, guard the GlobalInit in predictor for mobile only. May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this. * resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py * Add empty fix for SumLikeReduceOp Add empty fix for SumLikeReduceOp * Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * Remove Declarations.yaml * Include common.h * Change std::stoi to caffe2::stoi * Add thread_name.cc to the CMake file * No need to subtract 1. Fix test segfaults * Fix NetTest, ObserverTest Fix tests (cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41) * CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU * Add a variable to avoid conversion resizing issue 
* Remove the code per soumith's comments * Remove blank lines in the end of file * Resolve conflicts for torch/_thnn/utils.py * Update MKL exporter to IDEEP ops TSIA * Back out "Add support for generating ATen files during fbcode build" Original commit changeset: 28970ddba353 @override-unit-failures (Note: this ignores all push blocking failures!) * add dependencies for online trainer Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/ * Resolve conflicts for tools/jit/gen_jit_dispatch.py * Support advanced pooling options in sum processor * support advanced pooling options in sum processor * remove redundant code * support attention in sum processor * resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py * Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! 
@cause_a_sev_many_files * Remove Declarations.yaml * Include common.h * Change std::stoi to caffe2::stoi * [caffe2] uprade IDEEP and hotfix for conv op accuracy issue (#8364) * [IDEEP] Upgrade IDEEP version Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com> * [IDEEP] Fix accuracy issue in conv op Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com> * Fix build error due to lack of src in CMakeLists Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com> * Remove the code per soumith's comments * [ONNX] Add an ATen fallback pathway for ONNX export (#8273) * ATen fallback for ONNX export * Move to enum * Fix model test * Add comment * Address comments BC interface * Remove imaginary file (#8415) * [Caffe2] Enable AMD/MIOPEN ops for Caffe2 (#8306) * Add hip support for caffe2 core * Add MIOPEN header/wrapper to caffe2 core * Add HIP device into caffe2 PB * top level makefile change for rocm/hip * makefile scaffolding for AMD/RocM/HIP * Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files * caffe2 PB update for AMD/ROCM HIP device * Add AMD/RocM/Thrust dependency * HIP threadpool update * Fix makefile macro * makefile fix: duplicate test/binary name * makefile clean-up * makefile clean-up * add HIP operator registry * add utilities for hip device * Add USE_HIP to config summary * makefile fix for BUILD_TEST * merge latest * Fix indentation * code clean-up * Guard builds without HIP and use the same cmake script as PyTorch to find HIP * Setup rocm environment variables in build.sh (ideally should be done in the docker images) * setup locale * set HIP_PLATFORM * Revert "set HIP_PLATFORM" This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad. 
* continue the build script environment variables mess * HCC_AMDGPU_TARGET * Cleanup the mess, has been fixed in the lastest docker images * Assign protobuf field hip_gpu_id a new field number for backward compatibility * change name to avoid conflict * Fix duplicated thread pool flag * Refactor cmake files to not add hip includes and libs globally * Fix the wrong usage of environment variables detection in cmake * Add MIOPEN CNN operators * Revert "Add MIOPEN CNN operators" This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a. * Add MIOPEN pooling operator * Add MIOPEN activation operator * Add MIOPEN softmax operator * Add MIOPEN spatial batch norm operator * Add MIOPEN loacl response normalization operator * Add MIOPEN conv operator * Clean-up LRN ops * enable fp16 in MIOPEN pool ops * Enable fp16 for MIOPEN relu op * Enable fp16 for MIOPEN spatial batch norm op * code clean-up * revert float16 support * Create Caffe2 python binding for AMD/ROCM/HIP * Add op fallback for HIP operator * add hip src/test files in cmake * exclude hip src/test files * fix python binding for hip backend * fix MIOPEN pooling op workspace * hack to compile miopen operators * fix include path for MIOPEN ops * Fix include path * Add HIP math utilities * Fix path for HIP math utils * cmake fix * Cmake fix / hipcc for hip files * suppress hipcc warning * cmake fix /replcae USE_HIP with USE_ROCM * revert LoadHIP.cmake change * fix include for thrust/cub-hip * include path fix for conversion.h * Updated with latest upstream changes * clang format fixes * Context_hip updates * Fixed typo in rocblas handle get function * Updated hipified math utils * Updated math hip test util * Updated context hip test * Updated common_hip * Updated net async dag for HIP * Added MIOPEN in operator hip test * fix * C2 dependencies clean-up * fix include path for building custom protobuf * Decouple miopen pool op and conv_pool_op base * cmake refactor * fix operator_hip_test * move all hip/miopen ops 
files into caffe2/operators/hip * sanitize cmake * permission issue * remove extra parenthesis * remove artifact from resolving merge conflict * cont. sanitize cmake files * fix syntax error * sanitize conversion.h * . * Revert "." This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9. * clang-format * Enable some reduce operators' ONNX backend tests (#8418) * fix old comment to point to the right file (#8416) * Stop pinning nccl version. (#8421) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428) * Enable some of the ONNX backend test on broadcasting (#8423) * Enable some of the ONNX backend test on broadcasting * enable gemm broadcast * Expose proto utils and ONNX (#8073) * Expose proto utils and ONNX from PyTorch libcaffe2.so * Try to use protobuf from _C.so * Fix ONNX proto header include * Adjust order of imports for ONNX until nanopb goes away * Set and use ONNX_NAMESPACE for PyTorch builds * Show protobuf summary for all builds * Add ONNX_NAMESPACE for cpp_build * Statically link libprotobuf.a into libtorch.so * Set ONNX_NAMESPACE on Windows build * Move core/dispatch up as well * Add /MD flag for Windows build of _C * Potential Windows fix for ONNX and protobuf * Add direct linkage from _C to ONNX on Windows * Only include protobuf wrapper for PyTorch * Pass extra_compile_args to _nvrtc ext build * Remove installation of .a files * Rebase creates some weird situations, revert them manually * Remove more weird changes due to rebase * Need to add thread_name.cc after merge	2018-06-13 13:10:45 -07:00
Yinghai Lu	ef8f556212	[Caffe2] Changes done inside Facebook (#6378 ) * fix unit test for sqrt op From the error logging: [idx, grad, grad_estimate] are: [[ 146. 0.5 0.45776367] [ 147. 0.5 0.45776367] The gradient == 0.5 is correct, which means the SqrtOp and its gradient is doing right job. (Because y = sqrt(x), loss = y^2/2 = x/2, and then d(loss)/dx = 1/2 = 0.5; ) The test failed because of numerical problem of grad_estimate (in unit test). It can be because the step_size is small, and float precision is not high (when there are multiple elements in the tensor, we do sum(y^2) to compute loss) This diff - increase the step size, and also move the test cases to be further away from 0 (where sqrt(x) is not well defined) to be safe :) - also clean up, and merge the test case for inplace Vs. non-inplace Tested with: `CAFFE2_HYPOTHESIS_PROFILE=debug ai_bt caffe2/caffe2/python/operator_test:elementwise_ops_test -- "test_sqrt"` * CompositeReader & CompositeReaderBuilder A new type of reader gluing multiple readers together. * Back out "Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid" Original commit changeset: 9325a4356dbe * [dai][WIP] convert params to int8 on ps before sending to trainer Add float->uint8 conversion in addition to float->fp16 conversion in model_saver. * [easy] improve unit test for sparse length sum ops as desc. #accept2ship * Update GitHub upstream to `771fcb3455` * move sparse hash unique ops to OOS and add unit tests - move the SparseHash version to OOS, since 'sparsehash' is already deps of caffe2 OOS: https://fburl.com/arssw4n1 - The 'SparseHash' engine is also being used in OOS, so the SparseHash version shall be in OOS to reduce confusion: https://fburl.com/o5ea7ah2 - fix the CUDA UniqueOp for the case when batch is empty. 
- add unit test * group_norm_op for caffe2 This is the cuda op for Group Normalization (GN): https://arxiv.org/abs/1803.08494 This code implements GN in one op that computes Y=gamma * (X-mu) / sigma + beta and also its gradients. It is expected to have minimal memory consumption (similar to the BN op), without creating new blobs if GN were implemented as several ops (e.g., reshape, norm_mean/std, affine_channel). * Resubmit D7405233: disappeared in D7464958 OOS publish causes the op missing -- however, test was still there * [c2] add sparse hash engine for cuda unique op The SparseHash version of UniqueOp copy input tensor to CPU, and make use of sparse hash map to get unique output, and then copy back to GPU. * [dper][gpu] enable unit testing gpu trainer for sparse nn to debug the GPU trainer using mock data in unit test. make it easier to develop GPU trainer for new models. * Reuse Gloo context for Synchronize() calls Previously we were creating (and leaking) the Gloo context on each call to Synchronize(). Now only run the common world op and create the barrier net once, then run the barrier net on each Synchronize() call. Since timeout is associated with the Gloo context, assert that the timeout is fixed instead of trying to handle the complexity of multiple timeouts (and associated contexts). * [GanH/WGAN][1/n]: add FC param clipping as titled * [mobile] minimizing changes between caffe2_benchmark and speed_benchmark * [GanH]: enable diagnose within model avoid finding blob names but to directly enable inside the model * Add `net_transformer_fun` option to DPM This callback allows for various transformations to be made to the model after gradient operators have been added. The immediate motivation for this is to allow transformations such has "checkpoint-and-recompute" which allow trading off memory for additional compute. Adding several callbacks like this has made DPM's API less than ideal at this stage. However, I could not find any reasonable alternative. 
* [DT] [33/n] Compile flow task groups Task groups need to be compiled in order to pickle the object in fblearner. However, I also changed the Job's compile function, as creating a new object is not necessary. * Initial commit for sparse_normalize vectorization and benchmark * [GanH]: LB Calibration for JSD as titled * Tracing event in async executor Adding event tracing through TRACE_EVENT macro in async executor * [Resubmit] D7409751 Resetting book-keeping blobs when the reservoir is reset D7409751 got lost in D7464958 * Visualizing realtime weights values We want to visualize the weight values as the optimizer is iterating. This diff supports visualizing the weights at an assigned index. Currently, we assume the blob to be 2 dimensional. * [GanH][Easy]: Fix Homotopy Weighting Apparently, there was a bug in the homotopy weight (alpha, beta) update * [c2] move sparse hash unique op out of oss so that oss does not need to depend on google hash map. * Get rid of std::round as it's not supported on Android * Revert changes on setup.py * Skip shaky test on Dataio * fix	2018-04-10 21:11:43 -07:00
Orion Reblitz-Richardson	1d5780d42c	Remove Apache headers from source. * LICENSE file contains details, so removing from individual source files.	2018-03-27 13:10:18 -07:00
Xue Feng	0e9b0cf779	add error msg in fc input_record Summary: as titled Reviewed By: xianjiec Differential Revision: D6787879 fbshipit-source-id: 4bbdd11455480b25fa18121fa4527a9f0a03addc	2018-01-23 14:48:15 -08:00
Yan Shang	41bb662d96	add dense regularization Reviewed By: xianjiec Differential Revision: D5617571 fbshipit-source-id: 875d7c8753bdb3b6847d5e3f47ad8568cdf172f8	2018-01-08 13:03:17 -08:00
Yangqing Jia	8286ce1e3a	Re-license to Apache Summary: Closes https://github.com/caffe2/caffe2/pull/1260 Differential Revision: D5906739 Pulled By: Yangqing fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902	2017-09-28 16:22:00 -07:00
Jiyan Yang	a8695178aa	Adding parameter sharing API to Dper2 Summary: To achieve this, I modified the blob name scheme defined in a layer. Before, it was scope/fc_w and scope/fc_w_auto_0 (if there is another fc within the same scope). Now I have changed it to scope/fc/w and scope/fc_auto_0/w. That is, we rely on the uniqueness of the scoped layer name to define names for blobs. I also overrode the create_param method in LayerModelHelper to let it use the resolved name for blobs given the parameter-sharing context. There are some details, such as making the initializer more structured, that I need to finalize. Reviewed By: kennyhorror Differential Revision: D5435132 fbshipit-source-id: a0525f5ea0977e255dd5ea765b38913f5951d455	2017-08-03 00:33:18 -07:00
Aaron Markham	58f7f2b441	doxygen python block added Summary: Closes https://github.com/caffe2/caffe2/pull/226 Differential Revision: D4793550 Pulled By: JoelMarcey fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e	2017-03-29 06:46:16 -07:00
Kittipat Virochsiri	da36212259	SamplingTrain layer Summary: `SamplingTrain` layer is a wrapper around another layer subclassing `SamplingTrainableMixin`. When instantiated in the training context, `SamplingTrain` produces sparse output of the wrapped layer. Output can be paired with `indices` to create a Map schema. When instantiated in the prediction context, the full output of the wrapped layer is produced. This is like the SampledFC function in model helper, https://fburl.com/gi9g1awh, with the ability to be instantiated in both training and prediction contexts. I'd like to get consensus on whether we should introduce the `SamplingTrain` layer and the accompanying mixin. This can probably be accomplished in some other way, but I think this is not too bad. Reviewed By: xianjiec Differential Revision: D4689887 fbshipit-source-id: 7be8a52d82f3a09a053378146262df1047ab26a8	2017-03-27 23:31:55 -07:00
Xianjie Chen	aed3aabc7f	model and preprocessor can handle empty dense inputs Summary: we may not need dense feature inputs in some models (e.g., double helix). Reviewed By: dzhulgakov Differential Revision: D4568755 fbshipit-source-id: 6850508f86fafb53f81783b2a2a38776be5455d7	2017-02-22 11:19:15 -08:00
Xianjie Chen	8949abe10b	more clear about supported output dimension Summary: Do I understand correctly? It must be of size 1 for sigrid Reviewed By: kennyhorror Differential Revision: D4576541 fbshipit-source-id: 92fa8dc62e36ff095e14cceeb80b03c0028f5695	2017-02-16 21:01:52 -08:00
Xianjie Chen	d0621a2449	NextScopedBlob with well-defined behavior and respect namescope Summary: Remove the use of `NextName` in layer model helper, so that the same function returns a `model_helper` that constructs an identical `Net` when under the same NameScope. The `NextScopedBlob` should only take effect when there is a real name conflict; otherwise it returns ScopedBlobReference. This is critical for parameter blobs. In the long run, we need to be able to specify parameter blobs more explicitly (kennyhorror is working on this). This solution works in the short term for, e.g., two tower sparse nn models. Reviewed By: kennyhorror Differential Revision: D4555423 fbshipit-source-id: 2c4b99a61392e5d51aa878f7346466a8f14be187	2017-02-16 17:16:36 -08:00
Yangqing Jia	589398950f	fbsync at f5a877	2016-11-18 15:41:06 -08:00
Yangqing Jia	238ceab825	fbsync. TODO: check if build files need update.	2016-11-15 00:00:46 -08:00

20 Commits