Summary:
This was missing, so the incorrect `name` passed into `_to_worker_info` was not printed in the error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31969
Differential Revision: D19331927
Pulled By: rohan-varma
fbshipit-source-id: e74d47daec3224c2d9b9da3c0a6404cfa67baf65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31858
Trying to upgrade the docker image but ran into the following error:
```
Running test_nn ... [2020-01-04 18:05:12.537860]
Traceback (most recent call last):
File "test_nn.py", line 45, in <module>
from common_cuda import TEST_CUDA, TEST_MULTIGPU, TEST_CUDNN, TEST_CUDNN_VERSION
File "/var/lib/jenkins/workspace/test/common_cuda.py", line 16, in <module>
import numba.cuda
File "/opt/conda/lib/python3.6/site-packages/numba/__init__.py", line 178, in <module>
_ensure_llvm()
File "/opt/conda/lib/python3.6/site-packages/numba/__init__.py", line 100, in _ensure_llvm
raise ImportError(msg)
ImportError: Numba requires at least version 0.30.0 of llvmlite.
Installed version is 0.28.0.
```
Test Plan: Imported from OSS
Differential Revision: D19282923
Pulled By: ljk53
fbshipit-source-id: bdeefbf4f6c0c97df622282f76e77eb1eadba436
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31031
This activation will be needed for the LSTM implementation.
Also includes the QNNPack implementation.
Test Plan: Imported from OSS
Differential Revision: D19334280
Pulled By: z-a-f
fbshipit-source-id: ae14399765a47afdf9b1e072d3967c24ff473e8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31857
According to mingbowan, we will change to a string docker image version
because the tag is no longer an integer now that the docker image build
job has moved to CircleCI:
http://ossci-docker.s3-website.us-east-1.amazonaws.com/pytorch.html
Test Plan: - with stacked PR
Differential Revision: D19282726
Pulled By: ljk53
fbshipit-source-id: 7a12ae89a11cf15163b905734d50fed6dc98cb07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31995
Fixes #31906.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19331259
Pulled By: ezyang
fbshipit-source-id: 5d24bf3555e632211a9b6f8e50ff241603c18b3d
Summary:
Fix for https://github.com/pytorch/pytorch/issues/19420
So after actually writing a C++ JSON dumping class, I figured that
a faster and cleaner way would be to simply rewrite the Python without
the JSON module, since the JSON that we need to output is so simple.
For now I decided to not touch the `parse_cpu_trace` function since
only changing `export_chrome_trace` shows a 4x speedup.
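For context, the approach is roughly the following: a minimal sketch of hand-formatting the trace JSON, where the helper name and event fields are made up for illustration and this is not the actual `export_chrome_trace` code.
```python
# Sketch: emit chrome-trace events via string formatting instead of the json
# module. `events` is assumed to be a list of dicts with name/start/end/tid.
def dump_chrome_trace(events, path):
    chunks = []
    for evt in events:
        chunks.append(
            '{{"name": "{}", "ph": "X", "ts": {}, "dur": {}, '
            '"tid": {}, "pid": "CPU functions", "args": {{}}}}'.format(
                evt["name"], evt["start"], evt["end"] - evt["start"], evt["tid"]
            )
        )
    # Write the whole array in one go; no json module involved.
    with open(path, "w") as f:
        f.write("[" + ",\n".join(chunks) + "]")
```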
Here's the script I used for benchmarking:
``` python
import time
import torch
x = torch.ones(2, 2)
start = time.time()
with torch.autograd.profiler.profile() as prof:
    for _ in range(10000):
        x * x
for i in range(50):
    prof.export_chrome_trace("trace.json")
stop = time.time()
print(stop-start)
```
master branch (using json dump) -> 8.07515025138855
new branch (without json dump) -> 2.0943689346313477
I checked the trace file generated in the [test](https://github.com/pytorch/pytorch/blob/master/test/test_autograd.py#L2659)
and it does work fine.
Please let me know what you think.
If you still insist on the C++ version I can send a new patch soon enough.
CC ezyang rgommers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30724
Differential Revision: D19298955
Pulled By: ezyang
fbshipit-source-id: b0d7324ea5f90884ab8a00dd272f3aa3d9bc0427
Summary:
Fix https://github.com/pytorch/pytorch/issues/24704.
Benchmark script :
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    return time.time()

device = "cpu"

# warm up
for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.geometric_(0.5)

for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.geometric_(0.5)
        t2 = _time()
        fwd_t = fwd_t + (t2 - t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test device: **skx-8180**.
Before:
```
input size(128, 1) forward time is 0.0092 (ms).
input size(128, 10) forward time is 0.0802 (ms).
input size(128, 100) forward time is 0.7994 (ms).
input size(128, 1000) forward time is 7.8403 (ms).
```
After:
```
input size(128, 1) forward time is 0.0088 (ms).
input size(128, 10) forward time is 0.0781 (ms).
input size(128, 100) forward time is 0.7815 (ms).
input size(128, 1000) forward time is 7.7163 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31878
Differential Revision: D19314510
Pulled By: ezyang
fbshipit-source-id: 2d95bf9938c8becf280890acf9e37223ddd08a39
Summary:
VitalyFedyunin, this PR ports the LogSigmoid activation to ATen.
Test script:
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    return time.time()

device = "cpu"
m = nn.LogSigmoid()

# warm up
for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backward avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
**Before:**
```
input size(128, 1) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.10 (ms); backward avg time is 0.03 (ms).
input size(128, 100) forward time is 0.90 (ms); backward avg time is 0.09 (ms).
input size(128, 1000) forward time is 9.04 (ms); backward avg time is 0.87 (ms).
```
**After:**
```
input size(128, 1) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 100) forward time is 0.04 (ms); backward avg time is 0.03 (ms).
input size(128, 1000) forward time is 0.28 (ms); backward avg time is 0.07 (ms).
```
**OMP_NUM_THREADS=1:**
```
Before:
input size(128, 1) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.10 (ms); backward avg time is 0.03 (ms).
input size(128, 100) forward time is 0.88 (ms); backward avg time is 0.10 (ms).
input size(128, 1000) forward time is 8.72 (ms); backward avg time is 0.81 (ms).
After:
input size(128, 1) forward time is 0.01 (ms); backward avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backward avg time is 0.02 (ms).
input size(128, 100) forward time is 0.07 (ms); backward avg time is 0.03 (ms).
input size(128, 1000) forward time is 0.63 (ms); backward avg time is 0.15 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24724, https://github.com/pytorch/pytorch/issues/24725.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30958
Differential Revision: D19275111
Pulled By: ezyang
fbshipit-source-id: bbfe82e58fb27a4fb21c1914c6547a9050072e5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31962
I added precision tests for CUDA half, float, and double.
The precision for CUDA half seems bad, but I checked the numbers against
previous versions of pytorch. The output of CUDA Half linspace+logspace
are exactly the same when compared with 1.2.0.
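For reference, the comparison can be sanity-checked from Python along these lines; this is only a sketch, not the actual test, and the sizes are arbitrary:
```python
import torch

if torch.cuda.is_available():
    # Generate the same sequence in half and double precision and look at the
    # worst-case absolute error; CUDA half is noticeably less accurate.
    ref = torch.linspace(0, 10, steps=50, dtype=torch.double, device="cuda")
    half = torch.linspace(0, 10, steps=50, dtype=torch.half, device="cuda")
    print((half.double() - ref).abs().max())
```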
Test Plan: - Run CI
Differential Revision: D19320182
Pulled By: zou3519
fbshipit-source-id: 38d3d4dea2807875ed0b0ec2b93b19c10a289988
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31162
This should help us resolve a multitude of weird segfaults and crashes
when PyTorch is imported along with other packages. Those would often
happen because libtorch symbols were exposed globally and could be used
as a source of relocations in shared libraries loaded after libtorch.
Fixes #3059.
Some of the subtleties in preparing this patch:
* Getting ASAN to play ball was a pain in the ass. The basic problem is that when we load with `RTLD_LOCAL`, we may now load a library multiple times into the address space; this happens when we have custom C++ extensions. Since the libraries are usually identical, this is usually benign, but it is technically undefined behavior and UBSAN hates it. I applied a few fixes to get things to "work" correctly: I preload libstdc++ (so that it is seen consistently over all library loads) and turned off vptr checks entirely. Another possibility is that we should have a mode where we use RTLD_GLOBAL to load _C, which would be acceptable in environments where you're sure C++ lines up correctly. There's a long comment in the test script going into more detail about this.
* Making some of our shared library dependencies load with `RTLD_LOCAL` breaks them. OpenMPI and MKL don't work; they play linker shenanigans to look up their symbols, which doesn't work when loaded locally, and if we load a library with `RTLD_LOCAL` we aren't able to subsequently see it with `ctypes`. To solve this problem, we employ a clever device invented by apaszke: we create a dummy library `torch_global_deps` with dependencies on all of the libraries which need to be loaded globally, and then load that with `RTLD_GLOBAL`. As long as none of these libraries have C++ symbols, we can avoid confusion about the C++ standard library (see the sketch below).
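A rough sketch of the resulting load order, written with plain `ctypes` for illustration; the library path is a placeholder, and in an installed wheel the real step happens during `import torch` itself:
```python
import ctypes

# 1. Load the dummy dependency library with RTLD_GLOBAL so that the C-only
#    dependencies it pulls in (MKL, OpenMPI, ...) stay visible process-wide.
try:
    ctypes.CDLL("libtorch_global_deps.so", mode=ctypes.RTLD_GLOBAL)
except OSError:
    pass  # placeholder path not found here; torch handles the real one on import

# 2. The main extension (_C) is then loaded the normal way, which on Linux
#    now uses RTLD_LOCAL, keeping libtorch's C++ symbols out of the global
#    symbol table.
import torch  # noqa: E402
```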
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D19262579
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 06a48a5d2c9036aacd535f7e8a4de0e8fe1639f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31161
Previously, it wasn't necessary to specify `DT_NEEDED` in C++ extensions on Linux (aka pass `-l` flags) because all of the symbols would have already been loaded with `RTLD_GLOBAL`, so there wouldn't be any undefined symbols. But when we switch to loading `_C` with `RTLD_LOCAL`, it's now necessary for all the C++ extensions to know what libraries to link with. The resulting code is clearer and more uniform, so it's a win all around.
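For illustration, an extension's `setup.py` now has to spell out its link dependencies explicitly, roughly like this; the extension name, source file, and the extra `libraries` entry are made up, and `CppExtension` already fills in the core torch libraries:
```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="my_ext",
    ext_modules=[
        CppExtension(
            name="my_ext",
            sources=["my_ext.cpp"],
            # With _C loaded RTLD_LOCAL there are no globally visible torch
            # symbols to fall back on, so every needed library has to become a
            # DT_NEEDED entry of the extension.
            libraries=["c10"],
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```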
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262578
Pulled By: ezyang
fbshipit-source-id: a893cc96f2e9aad1c064a6de4f7ccf79257dec3f
Summary:
Special case for `norm` out where p == 2: instead of calling `pow`,
we use multiplication as a faster code path.
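The idea, sketched at the Python level for clarity (the actual change is inside the norm kernel, not in Python):
```python
import torch

x = torch.randn(1024)

# General path: |x|^p summed, then the 1/p root.
general = x.abs().pow(2).sum().pow(0.5)

# p == 2 special case: multiplication avoids the pow call entirely.
special = (x * x).sum().sqrt()

print(torch.allclose(general, special))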
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31903
Differential Revision: D19312749
Pulled By: ngimel
fbshipit-source-id: 73732b7b37a243a14438609784795b920271a0b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31800
If we know that two constants are the same object, we can ignore other constraints and pool them together. This fixes an issue introduced by the other PR where quantization relied on constant pooling happening for correctness.
Test Plan: Imported from OSS
Differential Revision: D19269499
Pulled By: eellison
fbshipit-source-id: 9d4396125aa6899cb081863d463d4f024135cbf4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31501
We have a number of places in our code base where we should be checking if it's safe to change the alias relationship between two sets of values. This PR adds an api to Alias Db to consolidate the logic, and refactors Constant Pooling and `CSE` to use the new api. Next steps: add api usage in peephole.cpp where applicable.
Happy to bikeshed `AliasDb::safeToChangeAliasingRelationship`. Previously I suggested `AliasDb::safeToIntroduceAliasing`; however, that's not quite accurate, because this API also handles cases where it is unsafe to remove aliasing.
Alternate suggestions: `safeToChangeAliasing`, `validToChangeAliasing`, `validToChangeAliasingRelationship`
Related: https://github.com/pytorch/pytorch/issues/28360
Test Plan: Imported from OSS
Differential Revision: D19254413
Pulled By: eellison
fbshipit-source-id: 17f7f52ad2d1526d303132767cbbb32f8189ae15
Summary:
This is a first-pass attempt at documenting `IValue` to help with problems like the one in #17165. Most users are probably concerned with
* how to make an `IValue` that matches the input type to their graph (most of the constructors are pretty self-explanatory, so as long as they are in the docs I think it's enough)
* how to extract the results after running their graph (there is a small note on the behavior of `.toX()` based on confusions we've had in the past)
Preview:
https://driazati.github.io/pytorch_doc_previews/31904/api/structc10_1_1_i_value.html#exhale-struct-structc10-1-1-i-value
There are also some random CSS fixes to clean up the style.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31904
Pulled By: driazati
Differential Revision: D19318733
fbshipit-source-id: b29dae3349d5a7ea5a3b8e09cd23f7ff8434edb4
Summary:
This hooks up `inspect` so that Python functions get their parameter
names attached instead of naming them `0, 1, 2, ...`. This also fixes
issue #28537 where `ignore` functions were improperly typing `self`.
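The parameter names come from Python's own introspection; a minimal sketch of the kind of information `inspect` provides (the example function is hypothetical, not the TorchScript frontend code):
```python
import inspect

def scale(self, factor, bias=0.0):
    return self * factor + bias

sig = inspect.signature(scale)
print(list(sig.parameters))            # ['self', 'factor', 'bias']
print(sig.parameters["bias"].default)  # 0.0
```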
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29300
Pulled By: driazati
Differential Revision: D19256434
fbshipit-source-id: 6a1fe7bd0afab708b8439517798955d0abfeb44c
Summary:
Stacked PRs
* **#31908 - Remove C++ docs contributing page**
* #31905 - Add doc previewing instructions
We should have one source of truth for contribution instructions (CONTRIBUTING.md).
This PR moves the instructions from the C++ doc pages there instead of keeping
a separate page.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31908
Pulled By: driazati
Differential Revision: D19296366
fbshipit-source-id: c1daf004259342bd09e09dea3b80e34db47066ec
Summary:
Stacked PRs
* #31908 - Remove C++ docs contributing page
* **#31905 - Add doc previewing instructions**
This adds some instructions on how to get started with GitHub Pages so you can show reviewers your documentation changes. Hopefully we can delete this eventually and build docs automatically on relevant PRs in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31905
Pulled By: driazati
Differential Revision: D19296364
fbshipit-source-id: df47fa1a8d7be029c3efcf6521298583ad9f7a95
Summary:
Fix https://github.com/pytorch/pytorch/issues/24684.
Benchmark script :
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    return time.time()

device = "cpu"

# warm up
for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.cauchy_()

for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.cauchy_()
        t2 = _time()
        fwd_t = fwd_t + (t2 - t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test device: **skx-8180**.
Before:
```
input size(128, 1) forward time is 0.0071 (ms).
input size(128, 10) forward time is 0.0596 (ms).
input size(128, 100) forward time is 0.5798 (ms).
input size(128, 1000) forward time is 5.8395 (ms).
```
After:
```
input size(128, 1) forward time is 0.0070 (ms).
input size(128, 10) forward time is 0.0583 (ms).
input size(128, 100) forward time is 0.5714 (ms).
input size(128, 1000) forward time is 5.7674 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31824
Differential Revision: D19314411
Pulled By: ezyang
fbshipit-source-id: 58098546face3e5971b023f702cfe44ff1cccfbc
Summary:
VitalyFedyunin, this PR ports the Softplus activation to ATen.
**Test script:**
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.Softplus()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

# warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backward avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.06 (ms); backward avg time is 0.12 (ms).
input size(128, 10000) forward time is 0.06 (ms); backward avg time is 0.18 (ms).
CPU:
input size(128, 100) forward time is 1.16 (ms); backward avg time is 0.69 (ms).
input size(128, 10000) forward time is 60.19 (ms); backward avg time is 31.86 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backward avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backward avg time is 0.17 (ms).
CPU:
input size(128, 100) forward time is 0.43 (ms); backward avg time is 0.16 (ms).
input size(128, 10000) forward time is 1.65 (ms); backward avg time is 0.83 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) forward time is 0.53 (ms); backward avg time is 0.28 (ms).
input size(128, 10000) forward time is 51.33 (ms); backward avg time is 25.48 (ms).
After:
input size(128, 100) forward time is 0.44 (ms); backward avg time is 0.16 (ms).
input size(128, 10000) forward time is 42.05 (ms); backward avg time is 13.97 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24633, https://github.com/pytorch/pytorch/issues/24634, https://github.com/pytorch/pytorch/issues/24766, https://github.com/pytorch/pytorch/issues/24767.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30504
Differential Revision: D19274913
Pulled By: ezyang
fbshipit-source-id: 21b29e8459dcba5a040cc68333887b45a858328e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31897
The previous version only used AVX2. The `_simd` version uses AVX-512 if the CPU is capable of it.
Test Plan: Unittest
Reviewed By: tracelogfb
Differential Revision: D19291499
fbshipit-source-id: 3b1ee0ba756e5c9defbd5caf7f68982d9b2ca06c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31031
This activation will be needed for the LSTM implementation.
Also includes the QNNPack implementation.
Test Plan: Imported from OSS
Differential Revision: D18903453
Pulled By: z-a-f
fbshipit-source-id: 0050b1cebb1ddb179b7ecbcb114fe70705070f67
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31412
The root cause is `plan_caches` being resized in one thread while another holds a reference to an existing `CuFFTParamsLRUCache`, which then becomes invalidated.
I was able to reproduce the crash very reliably without this fix applied, and I no longer see it with the fix. Being a race condition, it's hard to say for sure, though.
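For reference, a rough sketch of the kind of multi-threaded workload that exercises the plan cache concurrently (written against today's `torch.fft` API; this is not the reproducer from the issue):
```python
import threading
import torch

def worker(n):
    x = torch.randn(n, device="cuda")
    for _ in range(100):
        # Different signal sizes create different cuFFT plans, so several
        # threads hit the per-device plan cache at the same time.
        torch.fft.rfft(x)

threads = [threading.Thread(target=worker, args=(2 ** (8 + i),)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```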
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31861
Differential Revision: D19312314
Pulled By: ezyang
fbshipit-source-id: 06e4561128d503f2d70cdfe1982be0f3db2a8cf8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31313
This is a bugfix. The reason we couldn't enable the constexpr-ness for it before is that it was buggy,
and without constexpr it crashed at runtime rather than at compile time, which unfortunately seems to have slipped past our CI...
ghstack-source-id: 96380160
Test Plan: Now it works even when enabling constexpr for it
Differential Revision: D19087471
fbshipit-source-id: 28be107389f4507d35d08eab4b089a405690529b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31026
This is error prone and probably wrong. Since we don't use LeftRight on the hot path anymore, let's remove this.
ghstack-source-id: 96369644
Test Plan: none
Differential Revision: D18902165
fbshipit-source-id: 7b9478cd7cc071f403d75da20c7c889c27248b5c
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31911
Test Plan:
* CI builds including GPU and OSS-build tests
* The `defined(__HIP_DEVICE_COMPILE__)` instance a few lines below is proof that this is a define/undef flag, not a define01 flag.
Reviewed By: hlu1
Differential Revision: D19296560
fbshipit-source-id: 1c45069aec534b0bf4a87751a74680675c985e06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31147
The goal here is to add more tests of the current autograd behavior to make sure no regressions are introduced when modifying it.
Do let me know if you think of other corner cases I missed.
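As one example of the flavor of behavior being pinned down (an illustrative check, not one of the tests added in this PR): gradients accumulate into `.grad` across backward calls unless it is cleared in between.
```python
import torch

x = torch.ones(3, requires_grad=True)
(x * 2).sum().backward()
(x * 3).sum().backward()
# Without zeroing x.grad between the two calls, the gradients accumulate.
assert torch.equal(x.grad, torch.full((3,), 5.0))
```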
Test Plan: Imported from OSS
Differential Revision: D19301082
Pulled By: albanD
fbshipit-source-id: 2cb07dcf99e56eb1f2c56a179796f2e6042d5a2d