pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

xiaobing.zhang 9ba6a768de Add op bitwise_or (#31559)	2020-01-08 15:06:30 -08:00

Summary:
ezyang, this PR adds the bitwise_or operator, in the same way as https://github.com/pytorch/pytorch/pull/31104.

Benchmark script:
```
import timeit
import torch
torch.manual_seed(1)

for n, t in [(10, 100000),(1000, 10000)]:
    print('__or__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a | b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}")', number=t))

for n, t in [(10, 100000),(1000, 10000)]:
    print('__ior__ (a.numel() == {}) for {} times'.format(n, t))
    for device in ('cpu', 'cuda'):
        for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64'):
            print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a | b\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.randint(0, 10, ({n},), dtype = {dtype}, device="{device}"); b = torch.tensor(5, dtype = {dtype}, device="{device}")', number=t))
```

Device: Tesla P100, skx-8180
CUDA version: 9.0.176

Before:
```
__or__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times		0.17616272252053022
device: cpu, dtype: torch.uint8, 100000 times		0.17148233391344547
device: cpu, dtype: torch.int16, 100000 times		0.17616403382271528
device: cpu, dtype: torch.int32, 100000 times		0.17717823758721352
device: cpu, dtype: torch.int64, 100000 times		0.1801931718364358
device: cuda, dtype: torch.int8, 100000 times		1.270583058707416
device: cuda, dtype: torch.uint8, 100000 times		1.2636413089931011
device: cuda, dtype: torch.int16, 100000 times		1.2839747751131654
device: cuda, dtype: torch.int32, 100000 times		1.2548385225236416
device: cuda, dtype: torch.int64, 100000 times		1.2650810535997152
__or__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times		0.031136621721088886
device: cpu, dtype: torch.uint8, 10000 times		0.030786747112870216
device: cpu, dtype: torch.int16, 10000 times		0.02391665056347847
device: cpu, dtype: torch.int32, 10000 times		0.024147341027855873
device: cpu, dtype: torch.int64, 10000 times		0.024414129555225372
device: cuda, dtype: torch.int8, 10000 times		0.12741921469569206
device: cuda, dtype: torch.uint8, 10000 times		0.1249831635504961
device: cuda, dtype: torch.int16, 10000 times		0.1283819805830717
device: cuda, dtype: torch.int32, 10000 times		0.12591975275427103
device: cuda, dtype: torch.int64, 10000 times		0.12655890546739101
__ior__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times		0.3908365070819855
device: cpu, dtype: torch.uint8, 100000 times		0.38267823681235313
device: cpu, dtype: torch.int16, 100000 times		0.38239253498613834
device: cpu, dtype: torch.int32, 100000 times		0.3817988149821758
device: cpu, dtype: torch.int64, 100000 times		0.3901665909215808
device: cuda, dtype: torch.int8, 100000 times		1.4211318120360374
device: cuda, dtype: torch.uint8, 100000 times		1.4215159295126796
device: cuda, dtype: torch.int16, 100000 times		1.4307750314474106
device: cuda, dtype: torch.int32, 100000 times		1.4123614141717553
device: cuda, dtype: torch.int64, 100000 times		1.4480243818834424
__ior__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times		0.06468924414366484
device: cpu, dtype: torch.uint8, 10000 times		0.06442475505173206
device: cpu, dtype: torch.int16, 10000 times		0.05267547257244587
device: cpu, dtype: torch.int32, 10000 times		0.05286940559744835
device: cpu, dtype: torch.int64, 10000 times		0.06211103219538927
device: cuda, dtype: torch.int8, 10000 times		0.15332304500043392
device: cuda, dtype: torch.uint8, 10000 times		0.15353196952492
device: cuda, dtype: torch.int16, 10000 times		0.15300503931939602
device: cuda, dtype: torch.int32, 10000 times		0.15274472255259752
device: cuda, dtype: torch.int64, 10000 times		0.1512152962386608
```

After:
```
__or__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times		0.2465507509186864
device: cpu, dtype: torch.uint8, 100000 times		0.2472386620938778
device: cpu, dtype: torch.int16, 100000 times		0.2469814233481884
device: cpu, dtype: torch.int32, 100000 times		0.2535214088857174
device: cpu, dtype: torch.int64, 100000 times		0.24855613708496094
device: cuda, dtype: torch.int8, 100000 times		1.4351346511393785
device: cuda, dtype: torch.uint8, 100000 times		1.4434308474883437
device: cuda, dtype: torch.int16, 100000 times		1.4520929995924234
device: cuda, dtype: torch.int32, 100000 times		1.4456610176712275
device: cuda, dtype: torch.int64, 100000 times		1.4580101007595658
__or__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times		0.029985425993800163
device: cpu, dtype: torch.uint8, 10000 times		0.03024935908615589
device: cpu, dtype: torch.int16, 10000 times		0.026356655173003674
device: cpu, dtype: torch.int32, 10000 times		0.027377349324524403
device: cpu, dtype: torch.int64, 10000 times		0.029163731262087822
device: cuda, dtype: torch.int8, 10000 times		0.14540370367467403
device: cuda, dtype: torch.uint8, 10000 times		0.1456305105239153
device: cuda, dtype: torch.int16, 10000 times		0.1450125053524971
device: cuda, dtype: torch.int32, 10000 times		0.1472016740590334
device: cuda, dtype: torch.int64, 10000 times		0.14709716010838747
__ior__ (a.numel() == 10) for 100000 times
device: cpu, dtype: torch.int8, 100000 times		0.27195510920137167
device: cpu, dtype: torch.uint8, 100000 times		0.2692424338310957
device: cpu, dtype: torch.int16, 100000 times		0.27726674638688564
device: cpu, dtype: torch.int32, 100000 times		0.2815811652690172
device: cpu, dtype: torch.int64, 100000 times		0.2852728571742773
device: cuda, dtype: torch.int8, 100000 times		1.4743850827217102
device: cuda, dtype: torch.uint8, 100000 times		1.4766502184793353
device: cuda, dtype: torch.int16, 100000 times		1.4774163831025362
device: cuda, dtype: torch.int32, 100000 times		1.4749693805351853
device: cuda, dtype: torch.int64, 100000 times		1.5772947426885366
__ior__ (a.numel() == 1000) for 10000 times
device: cpu, dtype: torch.int8, 10000 times		0.03614502027630806
device: cpu, dtype: torch.uint8, 10000 times		0.03619729354977608
device: cpu, dtype: torch.int16, 10000 times		0.0319912089034915
device: cpu, dtype: torch.int32, 10000 times		0.03319283854216337
device: cpu, dtype: torch.int64, 10000 times		0.0343862259760499
device: cuda, dtype: torch.int8, 10000 times		0.1581476852297783
device: cuda, dtype: torch.uint8, 10000 times		0.15974601730704308
device: cuda, dtype: torch.int16, 10000 times		0.15957212820649147
device: cuda, dtype: torch.int32, 10000 times		0.16002820804715157
device: cuda, dtype: torch.int64, 10000 times		0.16129320487380028
```

Fix https://github.com/pytorch/pytorch/issues/24511, https://github.com/pytorch/pytorch/issues/24515, https://github.com/pytorch/pytorch/issues/24658, https://github.com/pytorch/pytorch/issues/24662.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/31559
Differential Revision: D19315875
Pulled By: ezyang
fbshipit-source-id: 4a3ca88fdafbeb796079687e676228111eb44aad
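
The Before and After timings in the commit message above are easier to compare as ratios. The following is a minimal sketch, not part of the PR: it copies a handful of the torch.int8 timings verbatim from the commit message and divides After by Before. The dictionary layout and the `ratios` helper are illustrative assumptions.

```python
# Timings in seconds, copied from the commit message (dtype torch.int8 only).
# Keys are (operator, device, numel); this layout is an assumption made here.
before = {
    ("__or__", "cpu", 10): 0.17616272252053022,
    ("__or__", "cpu", 1000): 0.031136621721088886,
    ("__or__", "cuda", 10): 1.270583058707416,
    ("__ior__", "cpu", 10): 0.3908365070819855,
    ("__ior__", "cpu", 1000): 0.06468924414366484,
    ("__ior__", "cuda", 10): 1.4211318120360374,
}
after = {
    ("__or__", "cpu", 10): 0.2465507509186864,
    ("__or__", "cpu", 1000): 0.029985425993800163,
    ("__or__", "cuda", 10): 1.4351346511393785,
    ("__ior__", "cpu", 10): 0.27195510920137167,
    ("__ior__", "cpu", 1000): 0.03614502027630806,
    ("__ior__", "cuda", 10): 1.4743850827217102,
}


def ratios(before, after):
    """Return after/before per key; a value below 1.0 means the run got faster."""
    return {key: after[key] / before[key] for key in before}


if __name__ == "__main__":
    for (op, device, numel), ratio in sorted(ratios(before, after).items()):
        print(f"{op:8s} {device:5s} numel={numel:<5d} after/before = {ratio:.2f}")
```

On these samples, the CPU `__ior__` rows come out below 1.0 (faster after the change), while the CPU `__or__` row for 10 elements and both CUDA rows come out above 1.0.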
..
_static	Improve documentation around builtin functions (#30347)	2019-12-04 13:50:40 -08:00
_templates	Generate sphinx docs with secure content. (#18508)	2019-03-27 11:01:48 -07:00
community	Update persons_of_interest.rst	2019-12-05 21:20:40 -08:00
notes	minor doc tweak to use mp.spawn in example (#30381)	2020-01-06 22:19:01 -08:00
org/pytorch	Revert D17850696: [pytorch][PR] Updates to quantization related files, index.rst, and javadocs	2019-10-10 09:23:33 -07:00
scripts	Add torch.nn.GELU for GELU activation (#28944)	2019-11-03 21:55:05 -08:00
__config__.rst	Allow a non-OpenMP based build (#19749)	2019-05-06 19:34:48 -07:00
autograd.rst	Added docs for context method mixins. Fixes issue #27365 (#28643)	2019-10-28 08:31:35 -07:00
bottleneck.rst	[docs] Clarify more CUDA profiling gotchas in bottleneck docs (#6763)	2018-04-19 13:15:27 -04:00
checkpoint.rst	Stashing checkpointing RNG states based on devices of arg tensors (#14518)	2018-12-11 09:48:45 -08:00
conf.py	Improve documentation around builtin functions (#30347)	2019-12-04 13:50:40 -08:00
cpp_extension.rst	Inline JIT C++ Extensions (#7059)	2018-04-30 11:48:44 -04:00
cuda_deterministic_backward.rst	Typo correction in cuda_deterministic_backward.rst (#25011)	2019-08-22 21:19:39 -07:00
cuda_deterministic.rst	Amend nondeterminism notes (#12217)	2018-10-16 23:59:26 -07:00
cuda.rst	Fix most documentation warnings (#27782)	2019-10-13 10:34:01 -07:00
cudnn_deterministic.rst	Amend nondeterminism notes (#12217)	2018-10-16 23:59:26 -07:00
cudnn_persistent_rnn.rst	don't copy weight gradients in rnn (#12600)	2018-10-12 13:34:10 -07:00
data.rst	Fix typo in data.rst docs	2019-12-18 09:52:10 -08:00
distributed.rst	Fix typos (#30606)	2019-12-02 20:17:42 -08:00
distributions.rst	Revert D18249048: Moved VonMises distribution with sampling upstream from Pyro.	2019-11-04 09:50:50 -08:00
dlpack.rst	document torch.utils.dlpack (#9343)	2018-07-11 07:46:09 -07:00
hub.rst	Fix typos (#30606)	2019-12-02 20:17:42 -08:00
index.rst	Restructure docs organization and naming (#31849)	2020-01-07 11:16:53 -08:00
jit_builtin_functions.rst	Fix builtin function reference (#24056)	2019-08-09 15:58:15 -07:00
jit_language_reference.rst	Cleanup after moving language reference (#31146)	2019-12-18 15:09:35 -08:00
jit_python_reference.rst	Add Python language reference docs (#30686)	2019-12-26 13:21:36 -08:00
jit_unsupported.rst	add unsupported section (#31329)	2019-12-18 13:56:02 -08:00
jit.rst	Add Python language reference docs (#30686)	2019-12-26 13:21:36 -08:00
math-quantizer-equation.png	adding quantization.rst file for quantization feature (#27559)	2019-10-09 16:45:09 -07:00
model_zoo.rst	add/move a few apis in torch.hub (#18758)	2019-04-10 23:10:39 -07:00
multiprocessing.rst	Bag of documentation fixes; fix more sphinx warnings (#27850)	2019-10-15 07:31:14 -07:00
name_inference.rst	Fix typos (#30606)	2019-12-02 20:17:42 -08:00
named_tensor.rst	Bag of documentation fixes; fix more sphinx warnings (#27850)	2019-10-15 07:31:14 -07:00
nn.functional.rst	Breaks up NN module in docs so it loads faster.	2019-06-11 09:38:41 -07:00
nn.init.rst	Bag of documentation fixes; fix more sphinx warnings (#27850)	2019-10-15 07:31:14 -07:00
nn.rst	Pruning Functionality (#24076)	2019-11-08 19:38:00 -08:00
onnx.rst	Update onnx landing page for 1.3 (#27581)	2019-10-11 20:53:50 -07:00
optim.rst	Fix capitalization inconsistency in optim.rst	2019-12-04 08:17:03 -08:00
packages.rst	Revert D17850696: [pytorch][PR] Updates to quantization related files, index.rst, and javadocs	2019-10-10 09:23:33 -07:00
quantization.rst	Updates to quantization documentation (#30288)	2019-11-23 09:29:30 -08:00
random.rst	Fix most documentation warnings (#27782)	2019-10-13 10:34:01 -07:00
rpc.rst	Document WorkerInfo and RpcBackendOptions structures in RPC docs. (#31077)	2019-12-11 11:39:57 -08:00
sparse.rst	Bag of documentation fixes; fix more sphinx warnings (#27850)	2019-10-15 07:31:14 -07:00
storage.rst
tensor_attributes.rst	Expose a torch.result_type and simplify tensor iterator	2019-09-25 06:52:23 -07:00
tensorboard.rst	Add method add_hparams to API doc (#27344)	2019-10-03 17:07:45 -07:00
tensors.rst	Add op bitwise_or (#31559)	2020-01-08 15:06:30 -08:00
torch.rst	Add op bitwise_or (#31559)	2020-01-08 15:06:30 -08:00
type_info.rst	Allow converting char tensor to numpy; add [fi]info.min (#15046)	2018-12-24 09:11:24 -08:00