Enables the clang-tidy check [`misc-use-internal-linkage`](https://clang.llvm.org/extra/clang-tidy/checks/misc/use-internal-linkage.html). This check was introduced in Clang-Tidy 18 and became available here with the recent update to Clang-Tidy 19.
The check flags functions and variables that are used only within their translation unit so that they can be marked static. This keeps unwanted symbols from leaking into other units, enables more link-time optimizations, and may shrink the resulting binaries.
Most of the detected violations were fixed by marking the symbols static. In other cases the symbols were in fact consumed by other files, so their declaring headers were included instead. A few declarations were simply wrong and have been corrected.
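For illustration, a minimal sketch of the kind of change this check drives (the function name is hypothetical):
```
// helper.cpp -- computeStride (a hypothetical helper) is used only in
// this translation unit.

// Before: external linkage. The symbol leaks out of this file, other
// translation units could link against it, and the linker must keep it.
// int computeStride(int size, int step) { return (size + step - 1) / step; }

// After: internal linkage. The symbol stays private to this file, name
// collisions with other units are impossible, and the optimizer is free
// to inline or drop it.
static int computeStride(int size, int step) {
  return (size + step - 1) / step;
}
```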
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148948
Approved by: https://github.com/Skylion007
Summary: Similar to the alloc and dealloc events already reported by the PyTorch profiler, we now report Out of Memory events as well. This is useful for performance troubleshooting.
Test Plan: Added test_oom_tracing to test/test_profiler.py
Differential Revision: D36268132
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80050
Approved by: https://github.com/robieta
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70859
ghstack-source-id: 147642534
Test Plan: Extracting code unmodified to a new library: relying on CI to validate.
Reviewed By: malfet
Differential Revision: D33329688
fbshipit-source-id: f60327467d197ec1862fb3554f8b83e6c84cab5c
(cherry picked from commit f82e7c0e9b)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70858
ghstack-source-id: 147642533
Test Plan: Extracted a constant to a new header, trusting CI build to validate.
Reviewed By: malfet
Differential Revision: D33329689
fbshipit-source-id: 8697bb81a5cc3366462ebdf1f214b62d478fa77c
(cherry picked from commit 16663847e1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66746
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x0, x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of hand-applied reversions and unused-variable warning suppressions.
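For reference, a minimal before/after example of the transformation (with `x0 = 0`, the common case):
```
#include <cstdint>
#include <iostream>
#include <c10/util/irange.h>

int main() {
  const int64_t x_max = 4;
  // Before: a classic index loop with a mutable counter.
  for (int64_t var = 0; var < x_max; var++) {
    std::cout << var << '\n';
  }
  // After: c10::irange yields the same indices; the loop variable is
  // const and its type is deduced from the bound.
  for (const auto var : c10::irange(x_max)) {
    std::cout << var << '\n';
  }
}
```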
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705361
fbshipit-source-id: 33fd22eb03086d114e2c98e56703e8ec84460268
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x0, x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of hand-applied reversions and unused-variable warning suppressions.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Report the pointer, the allocation size, the total allocated memory, and the total reserved memory all in one report.
`ptr` and `alloc_size` will be used for associating events with the op trace.
`allocated_size` and `reserved_size` will be used for the memory trace.
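As a hedged sketch, the reported fields could be grouped like this (illustrative only, not the actual profiler struct):
```
#include <cstdint>

// Hypothetical grouping of the per-event fields named above.
struct MemoryEvent {
  void* ptr;              // address of the (de)allocation, for op-trace association
  int64_t alloc_size;     // size of this allocation (or free) in bytes
  int64_t allocated_size; // total memory currently allocated, for the memory trace
  int64_t reserved_size;  // total memory currently reserved, for the memory trace
};
```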
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61282
Reviewed By: ejguan
Differential Revision: D29796282
Pulled By: chaekit
fbshipit-source-id: 5314c867632d3af1fa9a3811b35eaa5e931a5d87
Summary:
The GoogleTest `TEST` macro is non-compliant with the `cppcoreguidelines-avoid-non-const-global-variables` check, as is `DEFINE_DISPATCH`.
All changes except the ones to `.clang-tidy` were generated using the following script:
```
# Strip the NOLINTNEXTLINE suppressions for this check from every C/C++
# source and header that mentions it.
for i in $(find . -type f -iname "*.c*" -or -iname "*.h" \
             | xargs grep cppcoreguidelines-avoid-non-const-global-variables \
             | cut -f1 -d: | sort | uniq); do
  sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" "$i"
done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58254
Don't use CUDA synchronization when profiling in CPU-only mode.
Also includes minor fixes: a clarification in a docstring and a fix for spammy logging.
(Note: this ignores all push blocking failures!)
Test Plan: manual + CI
Reviewed By: gdankel, chaekit
Differential Revision: D28423667
Pulled By: ilia-cher
fbshipit-source-id: 04c71727f528ae8e2e0ff90e88271608d291bc69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51421
Explicitly mark, in the profiler output, memory events that did not happen
within an operator context.
Test Plan: python test/test_profiler.py -k test_memory_profiler
Reviewed By: ngimel
Differential Revision: D26166518
Pulled By: ilia-cher
fbshipit-source-id: 3c14d3ac25a7137733ea7cc65f0eb48693a98f5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48161
- Register BlackBoxPredictor AllocationArenaPool as CPUCachingAllocator
- Use the AllocationArenaPool in both BlackBoxPredictor and StaticRuntime
Test Plan:
```
buck run //caffe2/caffe2/fb/predictor:black_box_predictor_test
buck run //caffe2/caffe2/fb/predictor:pytorch_predictor_test
```
AF canary:
https://www.internalfb.com/intern/ads/canary/431021257540238874/
Reviewed By: dzhulgakov
Differential Revision: D24977611
fbshipit-source-id: 33ba596b43c1e558c3ab237a0feeae93565b2d35
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43951
AllocationPlan: stores the sequence of allocations, their sizes, and their lifetimes. It also stores `total_size`, the total size of the single memory blob required to satisfy all of the allocations, along with the offset of each allocation within that blob. Thus the allocation plan contains:
- allocation sizes
- allocation lifetimes
- allocation offsets
- total size

AllocationPlanner: takes a pointer to an AllocationPlan and fills it with the plan, i.e. the sizes, lifetimes, offsets, and total size. This is done via WithProfileAllocationsGuard, which takes an `AllocationPlan*`, constructs an `AllocationPlanner*`, and sets the thread-local `allocation_planner` to it. MobileCPUAllocator then profiles allocations via `allocation_planner`. In WithValidateAllocationsGuard, the allocations profiled into the allocation plan are validated.

CPUProfilingAllocator: owned by the application. Using WithProfilingAllocatorGuard, the application passes in both the CPUProfilingAllocator and the AllocationPlan created earlier; the CPUProfilingAllocator then manages allocations and frees according to the plan. Allocations not managed by the CPUProfilingAllocator are routed through c10::alloc_cpu and c10::free_cpu.
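A hedged usage sketch of the three phases, using the guard names from the description above; the header path, the exact constructor signatures, and the `run_inference` helper are assumptions:
```
#include <c10/mobile/CPUProfilingAllocator.h> // assumed header path

void run_inference(); // stand-in for one deterministic inference pass

void profile_validate_replay() {
  c10::AllocationPlan plan;
  {
    // Phase 1: record allocation sizes and lifetimes into the plan.
    c10::WithProfileAllocationsGuard profile_guard(&plan);
    run_inference();
  }
  bool plan_valid = false;
  {
    // Phase 2: check that a fresh run matches the recorded plan
    // (the bool out-parameter is an assumption of this sketch).
    c10::WithValidateAllocationsGuard validate_guard(&plan, &plan_valid);
    run_inference();
  }
  c10::CPUProfilingAllocator profiling_allocator; // owned by the application
  {
    // Phase 3: replay. Allocations are served at the precomputed offsets
    // within one blob of total_size bytes; unmanaged allocations fall
    // back to c10::alloc_cpu / c10::free_cpu.
    c10::WithProfilingAllocatorGuard allocator_guard(&profiling_allocator, &plan);
    run_inference();
  }
}
```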
Test Plan:
cpu_profiling_allocator_test on mobile.
Imported from OSS
Reviewed By: dreiss
Differential Revision: D23451019
fbshipit-source-id: 98bf1dbcfa8fcfb83d505ac01095e84a3f5b778d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45364
Also adds some more comments about the usage, limitations, and drawbacks.
Test Plan: Build and run benchmark binary.
Reviewed By: gchanan
Differential Revision: D23944193
fbshipit-source-id: 30d4f4991d2185a0ab768d94c846d73730fc0835
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42006
This PR introduces a simple CPU caching allocator, intended specifically
for mobile use cases and for inference. Nothing in the implementation
prevents other use cases, but its simplicity may not be suitable
everywhere.
It simply tracks allocations by size and relies on deterministic,
repeatable behavior in which allocations of the same sizes are made on
every inference.
Thus, when a pointer is freed after the first allocation, instead of
returning the memory to the system, the allocator caches it for
subsequent use.
Memory is freed automatically at the end of the process, or it can be
explicitly freed.
At the moment this is enabled in DefaultMobileCPUAllocator only.
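A minimal sketch of the size-bucketed caching idea described above, under the stated assumption of deterministic, repeatable allocation sizes (names and details are illustrative, not the actual DefaultMobileCPUAllocator):
```
#include <cstdlib>
#include <mutex>
#include <unordered_map>
#include <vector>

class SimpleCachingAllocator {
 public:
  void* allocate(size_t size) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto& bucket = free_blocks_[size];
    if (!bucket.empty()) {
      void* ptr = bucket.back(); // reuse a cached block of the same size
      bucket.pop_back();
      return ptr;
    }
    void* ptr = std::malloc(size); // cache miss: fall back to the system
    allocation_sizes_[ptr] = size;
    return ptr;
  }

  void free(void* ptr) {
    std::lock_guard<std::mutex> guard(mutex_);
    // Instead of returning the memory to the system, cache it for reuse.
    free_blocks_[allocation_sizes_.at(ptr)].push_back(ptr);
  }

  // Explicitly release all currently cached blocks back to the system.
  void free_cached() {
    std::lock_guard<std::mutex> guard(mutex_);
    for (auto& bucket : free_blocks_) {
      for (void* ptr : bucket.second) {
        std::free(ptr);
      }
    }
    free_blocks_.clear();
  }

 private:
  std::mutex mutex_;
  std::unordered_map<size_t, std::vector<void*>> free_blocks_;
  std::unordered_map<void*, size_t> allocation_sizes_;
};
```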
Test Plan:
android test: cpu_caching_allocator_test
Imported from OSS
Reviewed By: dreiss
Differential Revision: D22726976
fbshipit-source-id: 9a38b1ce34059d5653040a1c3d035bfc97609e6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37640
Enable an oversize arena to reduce memory fragmentation. Memory requests above a threshold (configurable with FLAGS_caffe2_oversize_threshold) are fulfilled from a dedicated arena, separate from the existing huge-page arena.
Two additional parameters are introduced to configure the two-phase decay of the memory arena:
- caffe2_dirty_decay_ms
- caffe2_muzzy_decay_ms
In the current JEMalloc implementation, oversized allocations are purged immediately regardless of which arena they are placed in, so we need to extend the decay time indefinitely; the default for caffe2_muzzy_decay_ms is therefore set to -1.
We now enable the arena allocator statically. To ensure it is correctly installed regardless of static initialization order, we add a priority flag to c10::SetAllocator so that only higher-priority allocators can overwrite existing ones.
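A minimal sketch of that priority mechanism (illustrative, not the exact c10::SetAllocator code):
```
#include <cstdint>

struct Allocator; // opaque here

namespace {
Allocator* g_cpu_allocator = nullptr;
uint8_t g_cpu_allocator_priority = 0;
} // namespace

// Only an allocator registered with equal or higher priority may replace
// the currently installed one, so the final winner is independent of
// static initialization order.
void SetAllocator(Allocator* alloc, uint8_t priority = 0) {
  if (priority >= g_cpu_allocator_priority) {
    g_cpu_allocator = alloc;
    g_cpu_allocator_priority = priority;
  }
}
```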
ghstack-source-id: 103276877
Test Plan:
buck test mode/dev //caffe2/caffe2/fb/init:huge_pages_allocator_test
Benchmarking known CV model that benefits from page arena:
```
PyTorchModelBench.cpp:183] test / base : 86.9532%
```
By adjusting `dirty_decay_ms` and `muzzy_decay_ms`, we obtained the following plots:
https://pxl.cl/15SWW
https://pxl.cl/15TnL
The figures show that performance does not change much until the dirty decay time is made indefinite (set to -1). Setting either muzzy decay or dirty decay to -1 reaches the best performance, regardless of which one it is. Even setting the decay time very long (100s, longer than the run) does not change performance by much.
## Observe performance difference in production with a variety of models (WIP)
Reviewed By: dzhulgakov
Differential Revision: D21258581
fbshipit-source-id: c006f8b94f28aef0666e52f48d4e82cf0d3a48af
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36032
QNNPACK and XNNPACK may access the input and/or output tensors out of bounds.
This is by design, chosen to make the implementation of micro-kernels
both simpler and faster as a result of not having to individually handle the
corner cases where the number of processed elements is not a multiple of SIMD
register width. This behavior will trigger ASAN though, and may result in a
segfault if the accessed memory location just so happens to fall on a page
the current process has no read access to. Here we define a custom allocator
that allocates the extra storage required to keep this behavior safe. This
allocator could have been restricted to QNNPACK and XNNPACK only, but that
would have negative performance ramifications, as input tensors must now be
reallocated, and copied over, if the tensor is not allocated with this
allocator to begin with. Making this allocator the default on mobile builds
minimizes the probability of unnecessary reallocations and copies, and
also enables acceleration of operations where the output tensor is allocated
outside of the function doing the implementation, wherein the implementation
cannot simply re-allocate the output with the guarding allocator.
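A hedged sketch of the guarding idea (the constant and helper names are illustrative, not the actual mobile allocator):
```
#include <cstddef>
#include <cstdlib>

// Widest access the micro-kernels are assumed to make past the logical
// end of a tensor; the real value would be chosen to cover all kernels.
constexpr size_t kGuardBytes = 64;

// Over-allocate so that out-of-bound reads by QNNPACK / XNNPACK
// micro-kernels land in memory this process owns, avoiding the ASAN
// reports and page-fault scenario described above.
void* guarded_alloc(size_t nbytes) {
  return std::malloc(nbytes + kGuardBytes);
}

void guarded_free(void* ptr) {
  std::free(ptr);
}
```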
Test Plan: Imported from OSS
Differential Revision: D20970217
Pulled By: AshkanAliabadi
fbshipit-source-id: 65cca2d38d7c0cef63c732f393016f50f1fa5199
Summary:
Some legacy TH code was relying on alloc to throw when called with a negative number, e.g. `torch.linspace(0, 1, -1)`, and this breaks the ASAN build. I still believe alloc should receive a size_t, but I added a safety enforce inside.
This should fix ASAN. I'll follow up separately with a proper fix for empty_cpu (which is probably the right place to do it).
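A sketch of the kind of safety enforce described (illustrative; the actual code and error mechanism differ):
```
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Reject negative sizes before they are implicitly converted into an
// enormous unsigned value and handed to the system allocator.
void* alloc_cpu(std::ptrdiff_t nbytes) {
  if (nbytes < 0) {
    throw std::runtime_error(
        "alloc_cpu() called with negative size: " + std::to_string(nbytes));
  }
  return std::malloc(static_cast<size_t>(nbytes));
}
```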
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17071
Differential Revision: D14074157
Pulled By: dzhulgakov
fbshipit-source-id: 3ed3bdb873e446edecb558e1df491310fd7179e3