Commit Graph

43090 Commits

Jason Ansel
ac26f8237c Allow disabling nvfuser without CUDA (#71358)
Summary:
On a CPU-only build of PyTorch, `torch._C._jit_set_nvfuser_enabled(False)` would throw an error even though disabling is a no-op. With this fix:
```
>>> torch._C._jit_set_nvfuser_enabled(False)
False
>>> torch._C._jit_set_nvfuser_enabled(True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Running CUDA fuser is only supported on CUDA builds.
>>>
```
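The guard after the fix can be sketched as follows (a minimal pure-Python model of the assumed logic, not the actual C++ binding):

```python
# Assumed logic: disabling nvfuser is always permitted, even on CPU-only
# builds, while enabling it still requires a CUDA build.
_nvfuser_enabled = False

def jit_set_nvfuser_enabled(enable: bool, cuda_build: bool) -> bool:
    """Returns the previous value, mirroring _jit_set_nvfuser_enabled."""
    global _nvfuser_enabled
    if enable and not cuda_build:
        raise RuntimeError("Running CUDA fuser is only supported on CUDA builds.")
    old = _nvfuser_enabled
    _nvfuser_enabled = enable
    return old
```

With this shape of check, the `False` call falls through to the no-op path instead of raising.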

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71358

Reviewed By: eellison

Differential Revision: D33601135

Pulled By: jansel

fbshipit-source-id: c764df2fa197ce7b4f71e5df0a91cd988766e99c
(cherry picked from commit a801df9321)
2022-01-19 20:01:09 +00:00
Pearu Peterson
214f4bf2ff Support sparse.sum on empty sparse tensor (#71091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71091

Fixes https://github.com/pytorch/pytorch/issues/65394

The masked sum on a full input tensor (of any layout) with an all-true mask is the same as the sum on the strided input tensor (after applying `to_dense` to sparse inputs).
Since masked sum uses `torch.sparse.sum`, for the simplicity of the masked reduction implementations its reduction behavior ought to be defined by the behavior of `torch.sum`. This PR implements that behavioral connection with respect to the directional summation of empty sparse tensors, which correspond to all-zero strided tensors.
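The semantics being pinned down can be illustrated with a toy COO model (pure Python, no torch): an empty (nnz == 0) sparse tensor corresponds to an all-zero strided tensor, so a directional sum over it must yield all zeros as well.

```python
def coo_sum(indices, values, shape, dim):
    """Sum a toy COO tensor over `dim`; result kept sparse as {index: value}."""
    out_shape = shape[:dim] + shape[dim + 1:]
    out = {}
    for idx, v in zip(indices, values):
        key = idx[:dim] + idx[dim + 1:]
        out[key] = out.get(key, 0) + v
    return out, out_shape

# nnz == 0: no stored entries, so the reduced tensor has none either,
# i.e. it is all zeros -- matching torch.sum on the dense equivalent.
result, out_shape = coo_sum([], [], (3, 4), dim=0)
```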

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: davidberard98

Differential Revision: D33651750

Pulled By: cpuhrsch

fbshipit-source-id: 703891bff88c8da6270b4272f5d2da81688db67d
(cherry picked from commit 53f97e80f7)
2022-01-19 18:58:08 +00:00
Rohan Varma
3b589c3497 [DDP Checkpointing] non-reentrant checkpoint tests (#69060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69060

Saved variable hooks checkpointing was added in https://github.com/pytorch/pytorch/pull/69508; this PR adds some tests for DDP.

Specifically, we can support almost all DDP use cases with this new API, such as dynamic module with find_unused_parameters=True. One case remains to be supported, which is static_graph + non-reentrant based checkpointing. The underlying reason this does not work is https://github.com/pytorch/pytorch/issues/58111.
ghstack-source-id: 147219887

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32712126

fbshipit-source-id: ba5ae9ca77fd8929ee020c7dc97838bae9a1931b
(cherry picked from commit 9c7f93e217)
2022-01-19 18:09:41 +00:00
Richard Barnes
75aaa9f92b Remove simd qualifier for pragma omp loop in upsample_nearest_op.h (#71462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71462

Fixes
```
      6 aienv/aienv_ig_reels_base:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
      6 deep_entity_classification/si_dec_gnn:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
      6 feed_recommendation_infra/multifeed_execution_graph_service_nosan:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
     12 mobile_cv/mobile-vision_experimental:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
     30 mobile_cv/mobile-vision_xraymobilev2_detection_caffe2:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
     42 aienv/aienv:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
    128 feed_recommendation_infra/multifeed_recagg_dev:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
    136 fluent2/fblearner_flow_projects_fluent2_nosan:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
   1338 f6/f6_nosan:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
```

Test Plan: Sandcastle

Reviewed By: luciang

Differential Revision: D33641869

fbshipit-source-id: 8424849cfac5cb0109272dec2086863067bbde66
(cherry picked from commit d18429905c)
2022-01-19 18:04:10 +00:00
kshitij12345
908fd3d78b [fix] composite compliance: quantile and nanquantile (#70894)
Summary:
Reference https://github.com/pytorch/pytorch/issues/69991

Refactored such that only the `out` variant copies the result into `out`; otherwise we just return the result of the composite functions as is.
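The refactor pattern can be sketched like this (hypothetical, simplified names; the real operators work on tensors):

```python
def _quantile_impl(values, q):
    """Composite computation: linear interpolation between sorted values."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = int(pos), min(int(pos) + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def quantile(values, q):
    # non-out variant: return the composite result as is, no extra copy
    return _quantile_impl(values, q)

def quantile_out(values, q, out):
    # out variant: the only place that copies into `out`
    out[0] = _quantile_impl(values, q)
    return out
```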

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70894

Reviewed By: samdow

Differential Revision: D33641742

Pulled By: zou3519

fbshipit-source-id: 671be13b31a7fff3afc0b7976706a5ecfc51ccac
(cherry picked from commit e7d5ac9af3)
2022-01-19 17:54:00 +00:00
Mike Ruberry
a0ada2d22b Back out "[pytorch][PR] Performance and memory improvements to batched torch.linalg.solve" (#71421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71421

Original commit changeset: 7a0dd443cd0e

Original Phabricator Diff: D33028236 (410e91adee)

Test Plan: PyTorch OSS CI

Reviewed By: ngimel

Differential Revision: D33637628

fbshipit-source-id: 1e81485be202b2f9d6a1ff315279cc099754c2dc
(cherry picked from commit c2d730bfeb)
2022-01-19 17:26:01 +00:00
Nikita Shulga
8a9243996c Lazy load pandas when importing pytorch (#71316)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/71313

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71316

Reviewed By: wenleix

Differential Revision: D33595043

Pulled By: malfet

fbshipit-source-id: da8c7a7f132696645191d7b7055c4c21970d92c3
(cherry picked from commit 2d4847780a)
2022-01-19 17:02:50 +00:00
Jane Xu
671a0b5376 Move sccache compilation log to its own group (#71444)
Summary:
The sccache compilation log is often misleading.

We can move it to its own group so people don't see it right away.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71444

Reviewed By: atalman

Differential Revision: D33659650

Pulled By: janeyx99

fbshipit-source-id: f22fd21640a8747beeacce8857bbb8281efd76f4
(cherry picked from commit e25970abf9)
2022-01-19 16:47:36 +00:00
Andrey Talman
7ed2a43d26 Adding wheels with py3.10 (#71419)
Summary:
Adding wheels with py3.10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71419

Reviewed By: janeyx99

Differential Revision: D33657770

Pulled By: atalman

fbshipit-source-id: 5d24f1771991ff07fbfd92d04d3d5211cf53084c
(cherry picked from commit bf2f2624e1)
2022-01-19 16:40:39 +00:00
Pritam Damania
b56ba296b1 Support multiple input dims for sharded linear. (#70266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70266

Addresses some of the issues mentioned in
https://github.com/pytorch/pytorch/issues/65638. The ShardedLinear implementation
currently only supports 2D inputs.

On the other hand `nn.Linear` supports arbitrary dimensions for inputs and
outputs. As a result, in this PR I've added support to ensure that
ShardedLinear supports arbitrary input dims as well.
ghstack-source-id: 147206607

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D33267630

fbshipit-source-id: 0460994c3aa33348b80547d9274206ef90cb29b6
(cherry picked from commit 7c289e1dbf)
2022-01-19 08:07:14 +00:00
Rohan Varma
fbc3b8c1bb [RPC] Fix a few flaky RPC tsan tests (#71460)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71460

When running with TSAN, we use a larger RPC timeout: https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/dist_utils.py#L68. As a result, the assertions here are invalid.

Tried to fix this by just setting `self.rpc_backend_options.rpc_timeout` to the new timeout, but `rpc_backend_options` is reconstructed every time it is accessed, so this doesn't work: https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/distributed/rpc/tensorpipe_rpc_agent_test_fixture.py#L15

Just removing the asserts should be fine as they don't really add value to what's being tested.
ghstack-source-id: 147208455

Test Plan: CI

Reviewed By: fduwjj

Differential Revision: D33648421

fbshipit-source-id: 9a5052b1c851fe7f838792d8bdf17d0563b4aa00
(cherry picked from commit 96ddab3433)
2022-01-19 06:12:43 +00:00
Chen Lai
9515213070 [Operator Versioning] Remove version compare as they are decoupled now (#71461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71461

After the operator versioning work, the version in the model file is used for operator versioning, while bytecode_version is used for bytecode versioning (the bytecode schema). They are two separate things now, and this comparison is no longer needed.
ghstack-source-id: 147209286

Test Plan: CI

Reviewed By: iseeyuan, tugsbayasgalan

Differential Revision: D33648592

fbshipit-source-id: beaa136a728f88435176a00c07b2d521210f107f
(cherry picked from commit e90e650e1a)
2022-01-19 04:51:45 +00:00
Pearu Peterson
677fab6d1d Support broadcast_to on sparse COO tensors (#71073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71073

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D33645744

Pulled By: cpuhrsch

fbshipit-source-id: 4775c9636c4e868022a8c1bbfec93e351d1cf885
(cherry picked from commit 640f21e09a)
2022-01-19 04:33:41 +00:00
Mike Ruberry
9b9b878c89 Fixes jiterator cache macro include + updates CUDA note with cache variables (#71452)
Summary:
Per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71452

Reviewed By: ngimel

Differential Revision: D33646495

Pulled By: mruberry

fbshipit-source-id: bbf627e6d7a724a83a3ea2ae9c0f50430f8d578e
(cherry picked from commit d1e72b144a)
2022-01-19 03:45:05 +00:00
Peter Bell
125bdb6d51 empty_meta: Add functions that don't depend on Tensor (#70615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70615

This adds `at::detail::empty_meta` and
`at::detail::empty_strided_meta` to complement the cpu API.

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33623678

Pulled By: ngimel

fbshipit-source-id: 59e003116361fb547ec2c633bbc15a7973e21d0e
(cherry picked from commit b4f5836fa1)
2022-01-19 03:41:20 +00:00
Mengchi Zhang
b4a75af758 [fx2trt] Export some options out (#71315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71315

Add variables in LowerSetting to export options from TRTInterpreter and interpreter.run:
- explicit precision
- int8_mode

Export skip_folding_node_fn options from split_const_subgraphs.

Reviewed By: wushirong

Differential Revision: D33585385

fbshipit-source-id: 3d20b69d255ad97487e462436ae479587a8e2118
(cherry picked from commit f24a279517)
2022-01-19 02:13:31 +00:00
Peter Bell
87215ed526 empty_strided: Factor out generic implementation (#70614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70614

This creates an `empty_strided_generic` function which, similar to
`empty_generic`, is a device-independent tensor constructor. This also
adds `at::detail::empty_strided_cpu` to complement
`at::detail::empty_cpu`.

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33623679

Pulled By: ngimel

fbshipit-source-id: 85994e88d664870bf425f398dfcdfc467885c694
(cherry picked from commit 2ff2a89df5)
2022-01-19 01:54:16 +00:00
Matthias Braun
d5e9a276ea Adapt to llvm marking SmallVector::set_size private (#71434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71434

See also https://reviews.llvm.org/D115380

Reviewed By: zhuhan0

Differential Revision: D33638540

fbshipit-source-id: a55e51462dc0d8f55a75bb79d9d76db781a36af2
(cherry picked from commit 78d1d65f77)
2022-01-19 00:54:03 +00:00
Eli Uriegas
30739f5329 ci: Change binary trigger to be nightly push (#71447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71447

Changes the nightly build trigger to be based on pushes to the `nightly`
branch instead of being based on the tagged push. This aligns it with
our current CircleCI trigger and should make it so that it's easily
viewable using tools like https://hud.pytorch.org/ci/pytorch/pytorch/nightly

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33647102

Pulled By: seemethere

fbshipit-source-id: c6757da35b7ec2d68bf36160dd7f3cb9ed040899
(cherry picked from commit 99b7b22650)
2022-01-19 00:27:42 +00:00
Peter Bell
6f4c491c6b empty_cpu: Add functions that don't depend on Tensor (#70613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70613

This refactors `at::detail::empty_cpu` to use only `TensorBase` so you
can construct tensors without including `Tensor.h`. It also adds a
`TensorOptions` version to reduce friction in operators moving from
the `at::empty` API.

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33623682

Pulled By: ngimel

fbshipit-source-id: 7a7b08bc2ed06830a3d698197a0c8389a096dc1d
(cherry picked from commit 2e17ad0bbd)
2022-01-19 00:01:58 +00:00
Yan Li
6964aa2ced backout D33469839 (#71443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71443

The cogwheel test inline_cvr_infer_canary_pyper_model_publish is timing out.

The convert_fx call takes > 20 mins for local and local_ro sub modules, which used to take ~ 2 mins.

Test Plan:
Fblearn flow run
* the following cmd took 1113 seconds before the diff and 5002 seconds after.
    flow-cli clone-locally 320014219  --run-as-secure-group pytorch_at_scale  --operators pyper_model_publish_workflow.pyper_model_publish_workflow.process_torch_package_model_files.process_non_sparse_parameters[0]

Cogwheel test
* Cogwheel test with packages in B3588 (the last good run) took 4694.48s
* Cogwheel test with packages in B3590 (the first timeout) took 13975.83s
* Cogwheel test with the following packages took 4535.04s
  * all packages in B3588 except the model publish
  * the model publish built with D33469839 (043e84b3d2) reversed (created D33633570)

Reviewed By: albanD, jerryzh168

Differential Revision: D33633570

fbshipit-source-id: dc5e777c48a90c551641a3f79126461f6a60449e
(cherry picked from commit 03ab65023a)
2022-01-18 23:51:51 +00:00
Rohan Varma
4fd1992a60 [Docs][BE] DDP doc fix (#71363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71363

Looks like the DDP example is currently broken, as per
https://discuss.pytorch.org/t/official-ddp-example-is-broken/141493. Fix the
issue by setting the correct environment variable.
ghstack-source-id: 147080377

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D33607250

fbshipit-source-id: e0e7d03cc365c186253b959c4c5405a5e3609218
(cherry picked from commit 32472884ec)
2022-01-18 22:24:51 +00:00
Taylor Robie
322f13d914 [Profiler] Fix memory profile type from recent refactor (#71417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71417

I accidentally changed CPU_INSTANT_EVENT to CPU_OP, which broke TensorBoard.

Test Plan: Make memory profiling unit test check this case.

Reviewed By: aaronenyeshi

Differential Revision: D33637286

fbshipit-source-id: c95945f6b85cd4168820bd4d2a9203274a0a5bd6
(cherry picked from commit b1e258672a)
2022-01-18 22:18:11 +00:00
Nikita Shulga
ff8fb717db Fix get_git_repo_dir (#71448)
Summary:
Otherwise, rev-list will only pick-up commits in `.github` repo

Before:
```
% git -C .github rev-list 1eb6146d967b2d09af37c54af411d03f0b790209..1ff7f65cc1ad499a71457368894ca14bed069749 -- .
598b55fd18
ae089d6bdf
```
After
```
% git -C . rev-list 1eb6146d967b2d09af37c54af411d03f0b790209..1ff7f65cc1ad499a71457368894ca14bed069749 -- .
1ff7f65cc1
2ac58b0dc1
598b55fd18
55899528a2
ae089d6bdf
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71448

Reviewed By: seemethere, atalman

Differential Revision: D33644256

Pulled By: malfet

fbshipit-source-id: fa2e06f6767e7702af6ce85471aea07fa58292c0
(cherry picked from commit 594cecc0e1)
2022-01-18 22:12:41 +00:00
XiaobingSuper
b8679ee1fc fix conv+bn folding issue when bn hasn't running states (#71259)
Summary:
When folding conv+bn where the bn module has no running stats, both the JIT and FX paths raise errors:

```
import torch

import torch.nn as nn

import torch.fx.experimental.optimization as optimization

class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.conv = nn.Conv2d(32, 64, 3, stride=2)
        self.bn = nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return x

x = torch.randn([1, 32, 50, 50])

model = M().eval()

'''
# jit path
with torch.no_grad():
    traced = torch.jit.trace(model, x).eval()
    traced = torch.jit.freeze(traced)
'''

# FX path
fused_model = optimization.fuse(model)
```

Observed errors:
1. JIT path
```
Traceback (most recent call last):
  File "bn_test.py", line 27, in <module>
    traced = torch.jit.freeze(traced)
  File "/home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.8/site-packages/torch/jit/_freeze.py", line 119, in freeze
    run_frozen_optimizations(out, optimize_numerics, preserved_methods)
  File "/home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.8/site-packages/torch/jit/_freeze.py", line 167, in run_frozen_optimizations
    torch._C._jit_pass_optimize_frozen_graph(mod.graph, optimize_numerics)
RuntimeError: Expected Tensor but got None
```
2. FX path
```
Traceback (most recent call last):
  File "bn_test.py", line 31, in <module>
    model = optimization.fuse(model, inplace=True)
  File "/home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.8/site-packages/torch/fx/experimental/optimization.py", line 71, in fuse
    fused_conv = fuse_conv_bn_eval(conv, bn)
  File "/home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.8/site-packages/torch/nn/utils/fusion.py", line 11, in fuse_conv_bn_eval
    fuse_conv_bn_weights(fused_conv.weight, fused_conv.bias,
  File "/home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.8/site-packages/torch/nn/utils/fusion.py", line 23, in fuse_conv_bn_weights
    bn_var_rsqrt = torch.rsqrt(bn_rv + bn_eps)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'
```

This PR fixes the issue.
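A guard of the following shape would avoid both tracebacks (this is a hypothetical sketch of the kind of check needed; the actual fix in the PR may differ in detail):

```python
# With track_running_stats=False, bn.running_mean and bn.running_var are
# None, so the fused weights cannot be computed from them.
def try_fuse_conv_bn(conv, bn_running_mean, bn_running_var):
    if bn_running_mean is None or bn_running_var is None:
        return conv, False  # leave the modules unfused
    return conv, True       # safe to fold the bn stats into conv's weights
```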

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71259

Reviewed By: anjali411

Differential Revision: D33595049

Pulled By: davidberard98

fbshipit-source-id: 0fe56bb2bb25d6d54ebc53789d2ad22458da9012
(cherry picked from commit 5672c08378)
2022-01-18 22:12:41 +00:00
Nikita Shulga
a986154950 Lazy import packaging in torch_version (#71345)
Summary:
As it is a pretty big package and to be used during normal
course of PyTorch initialization

Fixes https://github.com/pytorch/pytorch/issues/71280

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71345

Reviewed By: seemethere

Differential Revision: D33594547

Pulled By: malfet

fbshipit-source-id: e0abea82dbdc29914512b610692701140d3e68a2
(cherry picked from commit 1ff7f65cc1)
2022-01-18 22:12:41 +00:00
Andrey Talman
efd274bbcb Fix for windows builds with python 3.10 , getting rid of ssize_t (ssize_t is not a C++ defined type) (#71390)
Summary:
Fix for Windows builds with Python 3.10, getting rid of ssize_t (ssize_t is not a C++-defined type).

Here is the completed bin build : https://app.circleci.com/pipelines/github/pytorch/pytorch/441527/workflows/144edb79-b398-4d70-92fe-b63158c1b439/jobs/16954881

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71390

Reviewed By: samdow

Differential Revision: D33637686

Pulled By: atalman

fbshipit-source-id: fcdfca672dc20385a3d2339c20e69bd2d1717e88
(cherry picked from commit 2ac58b0dc1)
2022-01-18 22:12:41 +00:00
Peiqi Yin
ea0524dbc3 [FIX LOG] Complete a '\n' in GRAPH_DEBUG (#70421)
Summary:
In graph_executor.cpp, line 963, a '\n' is missing from a GRAPH_DEBUG statement that all the other GRAPH_DEBUG calls here include, so the debug output runs together:

```
[DEBUG graph_executor.cpp:963] After CheckInplace (end of runOptimization)graph(%0 : Float(*, *, *, *, requires_grad=0, device=cpu),
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70421

Reviewed By: Gamrix

Differential Revision: D33596430

Pulled By: davidberard98

fbshipit-source-id: 0e7c3c02ce44bf925f0c45e96a382104059fe397
(cherry picked from commit 55899528a2)
2022-01-18 22:12:41 +00:00
Eli Uriegas
02ac73a973 ci: Add PR trigger for binary builds workflows (#71431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71431

Adds a PR trigger based on paths to the binary build workflows to make
it easier to test / verify changes to the binary build workflows without
adding a bunch of skipped checks to the majority of our workflows

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: atalman

Differential Revision: D33641276

Pulled By: seemethere

fbshipit-source-id: 0ed65cbcebf06dfe998f81d67df817250dd1a716
(cherry picked from commit 598b55fd18)
2022-01-18 21:19:27 +00:00
Nikita Shulga
5243986df6 Update syncbranches workflow (#71420)
Summary:
- Use `pytorchmergebot` credentials to do the merge
- Infer sync branch name from the workflow rather than hardcode it
- Move common functions from `syncbranches.py` to `gitutils.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71420

Reviewed By: bigfootjon

Differential Revision: D33638846

Pulled By: malfet

fbshipit-source-id: a568fd9ca04f4f142a7f5f64363e9516f5f4ef1c
2022-01-18 11:31:57 -08:00
Jane Xu
1eb6146d96 Add manual simple retry to ECR login (#71287)
Summary:
The current retry with AWS_MAX_ATTEMPTS does not seem to work, as we still get failures: https://github.com/pytorch/pytorch/runs/4806177738?check_suite_focus=true

This should hopefully alleviate the failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71287

Reviewed By: malfet, seemethere

Differential Revision: D33573788

Pulled By: janeyx99

fbshipit-source-id: 300fde9a9fa5a2da3e9d18b7989a3676500d8011
2022-01-18 10:56:53 -08:00
Peter Bell
2bb6a4f437 Generate aten_interned_strings.h automatically (#69407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69407

This generates aten_interned_strings.h from `native_functions.yaml`,
which is more like how it was originally done. The items deleted from
`interned_strings.h` are duplicates that need to be removed in order
for the code to compile; some of the remaining items may still be out
of date, but it is fairly benign even if that's the case.
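The generation step can be sketched as follows (a toy illustration only: the real generator parses `native_functions.yaml`, and the macro format shown is an assumption, not the exact header contents):

```python
def generate_interned_strings(op_names):
    unique = sorted(set(op_names))  # duplicates must be removed to compile
    body = "\n".join(f"  _(aten, {name}) \\" for name in unique)
    return "#define FORALL_ATEN_BASE_SYMBOLS(_) \\\n" + body

# duplicate "add" collapses to a single interned symbol
header = generate_interned_strings(["add", "mul", "add", "conv2d"])
```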

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32923636

Pulled By: albanD

fbshipit-source-id: a0fd6b3714e70454c5f4ea9b19da5e047d2a4687
2022-01-18 08:29:54 -08:00
Michael Dagitses
d665097cad allow Bazel to build without glog and gflags (#70850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70850

We support both, so we want to ensure both continue to work.
ghstack-source-id: 146960552

Test Plan: Tested manually. A subsequent diff adds this test configuration to CI.

Reviewed By: malfet

Differential Revision: D33297464

fbshipit-source-id: 70e1431d0907d480c576239af93ef57036d5e4d7
2022-01-18 08:08:46 -08:00
Michael Dagitses
ffdc6b4994 extract //c10/macros to its own package (#70849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70849

ghstack-source-id: 146960563

Test Plan: Bazel CI tests will protect this.

Reviewed By: malfet

Differential Revision: D33297235

fbshipit-source-id: 6504a977e82ad2f2232a74233b96cdea8bf94a20
2022-01-18 08:08:42 -08:00
Michael Dagitses
8d0e354191 fix CAFFE2_BUILD_MAIN_LIB to the correct C10_BUILD_MAIN_LIB (#70848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70848

This is the C10 library, so that's the main lib we are building
here. While here, use `local_defines` instead of `copts` for this
definition. Both `copts` and `local_defines` only apply to the
compilation units in the library, and not transitively.
ghstack-source-id: 146998039

Test Plan: We are relying on CI to verify this doesn't cause any problems.

Reviewed By: malfet

Differential Revision: D33429420

fbshipit-source-id: b3fc84c0588bd43346e3f9f77e851d293bde9428
2022-01-18 08:05:20 -08:00
Erjia Guan
fd9e08df5d Make Demux serializable with lambda function (#71311)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71311

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D33584552

Pulled By: ejguan

fbshipit-source-id: 52324faf5547f9f77582ec170ec91ce3114cfc61
2022-01-18 06:47:54 -08:00
CodemodService FBSourceClangFormatLinterBot
f0db15122f [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33629127

fbshipit-source-id: 47befcd98cfa544a4d822161d8bfbe8d7a788e4d
2022-01-18 01:50:08 -08:00
Mike Ruberry
d17f340a2e The Cacherator (#71350)
Summary:
This PR adds a persistent filesystem cache for jitted kernels. The cache is disabled on Windows because it relies on POSIX headers.

The cache writes, by default, to `~/.cache/torch/kernels`, but the location can be controlled by setting the `PYTORCH_KERNEL_CACHE_PATH` environment variable. A separate environment variable, `USE_PYTORCH_KERNEL_CACHE`, will disable all caching logic when set to zero.

The use of a persistent filesystem cache dramatically lowers the "first call time" for an operator AFTER it has been compiled, because it skips (most of) the jit compilation process. On systems where we're compiling only to ptx, that ptx still has to be just-in-time compiled by the driver API, so an additional latency of around 10 milliseconds is expected at first call time. On systems which compile to SASS, the additional first call time latency is about one millisecond. This compares with times of 150+ milliseconds for just-in-time kernel compilation.

Files in the cache use a mostly human readable string that includes an SHA1 hash of the CUDA C string used to generate them. Note that this is not an SHA1 hash of the file's contents, because the contents are the compiled ptx or SASS. No verification is done when the file is loaded to ensure the kernel is what's expected, but it's far more likely you'll be struck by a meteor than observe two file names conflict. Using SHA1 hashes to generate unique ids this way is a common practice (GitHub does it, too).
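The naming scheme can be sketched like this (the default directory and env var follow the description above; the exact file-name format and the `kernel_cache_file` helper are assumptions for illustration):

```python
import hashlib
import os
import pathlib

def kernel_cache_file(kernel_name: str, cuda_src: str) -> pathlib.Path:
    """Readable kernel name plus a SHA1 of the CUDA C source string
    (not of the compiled artifact stored in the file)."""
    root = os.environ.get("PYTORCH_KERNEL_CACHE_PATH",
                          os.path.expanduser("~/.cache/torch/kernels"))
    digest = hashlib.sha1(cuda_src.encode("utf-8")).hexdigest()
    return pathlib.Path(root) / f"{kernel_name}-{digest}"
```

The key property is that identical source strings always map to the same file, while any change to the source yields a new one.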

This cache design could be reused by other fusion systems and should allow us to jiterate more operations without fear of regressing the "incremental development" scenario where users are tweaking or extending programs slightly, rerunning them, and then repeating that process again and again. Without a cache, each run of the program would have to recompile every jitted kernel, but with this cache we expect a negligible impact to the user experience.

cc kshitij12345, xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71350

Reviewed By: ngimel

Differential Revision: D33626671

Pulled By: mruberry

fbshipit-source-id: d55df53416fbe46348623846f699f9b998e6c318
2022-01-17 23:52:14 -08:00
Peter Bell
7b9fff90d2 empty_generic: Remove redundant device argument (#70612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70612

The device information is embedded in the `DataPtr` returned from the
allocator, so this argument is completely ignored.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33623681

Pulled By: ngimel

fbshipit-source-id: bea64707bb17d46debb0ed7c1175493df56fee77
2022-01-17 20:18:43 -08:00
Ivan Yashchuk
f93ffc9ea8 Sparse CSR: Handle zero matrix consistently for triangular_solve (#71304)
Summary:
This PR enables `test_block_triangular` tests on the CPU.
These tests revealed a problem with how the nnz == 0 case was handled. Now we return a tensor filled with NaNs on both CUDA and CPU.
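A toy illustration of the agreed semantics (pure Python, no torch): an nnz == 0 sparse triangular matrix is all zeros and hence singular, so the solve yields a NaN-filled result of the right-hand side's shape.

```python
import math

def solve_empty_triangular(b_rows: int, b_cols: int):
    # nnz == 0 means the system has no solution; fill with NaN
    return [[math.nan] * b_cols for _ in range(b_rows)]

x = solve_empty_triangular(2, 3)
```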

cc nikitaved pearu cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71304

Reviewed By: davidberard98

Differential Revision: D33600482

Pulled By: cpuhrsch

fbshipit-source-id: d09cb619f8b6e54b9f07eb16765ad1c183c42487
2022-01-17 13:47:49 -08:00
Nolan O'Brien
17540c5c80 [warnings][Caffe2] Suppress warnings in non-c10 headers (#71370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71370

Round out suppressing warnings in `caffe2` headers

Test Plan: CI check

Reviewed By: r-barnes

Differential Revision: D33613084

fbshipit-source-id: 9306d480bd796aeae4d887ad26b6ddc2c571c9e4
2022-01-17 10:09:31 -08:00
Nolan O'Brien
cf47338191 [Caffe2][warnings] Suppress -Wimplicit-int-float-conversion in TypeSafeSignMath.h for clang (#71369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71369

Suppress `-Wimplicit-int-float-conversion` in `TypeSafeSignMath.h` when building with clang

Test Plan: CI check

Reviewed By: r-barnes

Differential Revision: D33612983

fbshipit-source-id: cff1239bc252d4a2f54a50a2bbcd48aeb8bf31ca
2022-01-17 10:05:21 -08:00
Xu Zhao
ddf97a59ca Remove the dependency of pytorch nightly. (#71323)
Summary:
This PR removes the PyTorch nightly dependencies of TorchBench CI. Instead, it relies on the bisection script to install TorchBench dependencies (https://github.com/pytorch/benchmark/pull/694).
This will unblock TorchBench CI users when the nightly build fails (e.g., https://github.com/pytorch/pytorch/issues/71260)

RUN_TORCHBENCH: resnet18
TORCHBENCH_BRANCH: xz9/optimize-bisection

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71323

Reviewed By: wconstab

Differential Revision: D33591713

Pulled By: xuzhao9

fbshipit-source-id: f1308ea33ece1f18196c993b40978351160ccc0c
2022-01-17 09:52:36 -08:00
Nolan O'Brien
a383d01774 [fbcode][warnings] Suppress warnings in caffe2/c10 (#71356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71356

Suppress remaining header based warnings in `caffe2/c10` when building with `clang`

Test Plan: CI pass

Reviewed By: r-barnes

Differential Revision: D33600097

fbshipit-source-id: e1c0d84a0bad768eb03e047d62b5379cf28b48e2
2022-01-15 18:34:08 -08:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
1ecfa1d61a Load zip file in deploy interpreter (#71072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71072

This PR replaces the old logic of loading frozen torch through CPython by loading zipped torch modules directly into the deploy interpreter. We use the ELF file to embed the zip file as a section and load it back from the interpreter executable. Then we insert the zip file directly into the sys.path of each initialized interpreter. Python's implicit ZipImporter module can load modules from a zip file as long as the zip file is on sys.path.
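The approach above relies on Python's built-in zip import machinery; a minimal standalone sketch (file and module names here are hypothetical) of what happens once an archive lands on sys.path:

```python
import os
import sys
import tempfile
import zipfile

tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "frozen_modules.zip")

# Pack a tiny module into the archive.
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("greet.py", "def hello():\n    return 'hello from zip'\n")

# Prepend the archive to sys.path; the implicit zipimport hook picks it up
# transparently, so the module imports as if it were a plain file on disk.
sys.path.insert(0, archive)

import greet
print(greet.hello())  # -> hello from zip
```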

Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy

Reviewed By: shunting314

Differential Revision: D32442552

fbshipit-source-id: 627f0e91e40e72217f3ceac79002e1d8308735d5
2022-01-15 14:39:59 -08:00
Jerry Zhang
08d8f81704 [quant][fix][fx][graphmode] Fix qconfig setting for fused modules (#71254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71254

When we configure linear and relu with the same qconfig, we have utility functions that also
generate a qconfig for the fused linear-relu module, but this code was not called in the correct order,
which resulted in unexpected behavior. This PR fixes the issue. Please see the test case for more details.
(Test case is from Supriya)

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_fused_module_qat_swap

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33558321

fbshipit-source-id: d95114dc4b77264e603c262c2da02a3de4acba69
2022-01-14 23:31:11 -08:00
Lucian Grijincu
bb49352354 caffe2/torch/csrc/jit/frontend/tree_views: workaround nvcc compiler error
Test Plan:
Move it outside the header so it's not seen by nvcc

```
$ buck2 build -c fbcode.platform=platform010 fbcode//accelerators/pytorch/lib/cuda:ngram_repeat_block_cuda
Downloading buck2...
[======================================================================]

watchman fresh instance event, clearing cache
Using disallowed linker flag 'arvr/third-party/toolchains/platform009/build/mesa/lib/libGL.so' in library rule 'fbsource//third-party/toolchains:opengl'
Using disallowed linker flag 'arvr/third-party/freeglut/3.0.0/libs/x64-linux/libglut.a' in library rule 'fbsource//third-party/toolchains:GLUT'
Action Failed for fbcode//accelerators/pytorch/lib/cuda:ngram_repeat_block_cuda (ovr_config//platform/linux:x86_64-fbcode-platform010-clang-6dbc4bb1b9a32829)#5:
cxx_compile ngram_repeat_block_cuda_kernel.cu (pic) failed with non-zero exit code 1
debug information: action_digest=b2bda91d24dad53e960c740ef9a412cee1902d86:94
stdout:
stderr:
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h: In instantiation of 'static torch::jit::Maybe<T> torch::jit::Maybe<T>::create(const torch::jit::SourceRange&, const T&) [with T = torch::jit::List<torch::jit::Property>]':
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:505:117:   required from here
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:220:33: error: cannot convert 'const torch::jit::List<torch::jit::Property>' to 'torch::jit::TreeList&&' {aka 'c10::SmallVector<c10::intrusive_ptr<torch::jit::Tree>, 4>&&'}
  220 |     return Maybe<T>(Compound::create(TK_OPTION, range, {value}));
      |                ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
fbcode/caffe2/torch/csrc/jit/frontend/tree.h:144:1: note:   initializing argument 3 of 'static torch::jit::TreeRef torch::jit::Compound::create(int, const torch::jit::SourceRange&, torch::jit::TreeList&&)'
  143 |       const SourceRange& range_,
      |         ~~~~~~~~~~~~~~~~~~~~~~~~
  144 |       TreeList&& trees_) {
      | ^
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h: In instantiation of 'static torch::jit::Maybe<T> torch::jit::Maybe<T>::create(const torch::jit::SourceRange&, const T&) [with T = torch::jit::List<torch::jit::Assign>]':
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:505:171:   required from here
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:220:33: error: cannot convert 'const torch::jit::List<torch::jit::Assign>' to 'torch::jit::TreeList&&' {aka 'c10::SmallVector<c10::intrusive_ptr<torch::jit::Tree>, 4>&&'}
  220 |     return Maybe<T>(Compound::create(TK_OPTION, range, {value}));
      |                ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
fbcode/caffe2/torch/csrc/jit/frontend/tree.h:144:1: note:   initializing argument 3 of 'static torch::jit::TreeRef torch::jit::Compound::create(int, const torch::jit::SourceRange&, torch::jit::TreeList&&)'
  143 |       const SourceRange& range_,
      |         ~~~~~~~~~~~~~~~~~~~~~~~~
  144 |       TreeList&& trees_) {
      | ^
cc1plus: note: unrecognized command-line option '-Wno-ignored-optimization-argument' may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option '-Wno-ambiguous-reversed-operator' may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option '-Wno-ignored-optimization-argument' may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option '-Wno-ambiguous-reversed-operator' may have been intended to silence earlier diagnostics
command: buck-out/v2/gen/fbcode/999b02f9444004c1/tools/build/__wrap_nvcc.py__/wrap_nvcc.py -_NVCC_BIN_ fbcode ...<omitted>... ors/pytorch/lib/cuda/__ngram_repeat_block_cuda__/__objects__/ngram_repeat_block_cuda_kernel.cu.pic.o (rerun with -v to view the untruncated command)

```

Reviewed By: zhxchen17

Differential Revision: D33592885

fbshipit-source-id: a36dcb3c8265d009b2287f0a479695d1ddbf85aa
2022-01-14 21:58:31 -08:00
Lucian Grijincu
4bf1be898d caffe: fix warning: overloaded virtual function "torch::jit::Function::call" is only partially overridden in class "torch::jit::GraphFunction"
Summary:
Need to bring all overloaded signatures of `torch::jit::Function::call` into scope in `GraphFunction`.

https://www.internalfb.com/code/fbsource/[36035b9e4e41813e215ffd5f4377d65b7259237e]/fbcode/caffe2/aten/src/ATen/core/function.h?lines=91-101

Test Plan:
```
Action Failed for fbcode//accelerators/pytorch/lib/cuda:ngram_repeat_block_cuda (ovr_config//platform/linux:x86_64-fbcode-platform010-clang-6dbc4bb1b9a32829)#5:
cxx_compile ngram_repeat_block_cuda_kernel.cu (pic) failed with non-zero exit code 1
debug information: action_digest=988629a726bc4eabcaf334db2317a969958d5fd2:94
stdout:
stderr:
fbcode/caffe2/torch/csrc/jit/api/function_impl.h(11): warning: overloaded virtual function "torch::jit::Function::call" is only partially overridden in class "torch::jit::GraphFunction"

fbcode/caffe2/torch/csrc/jit/api/function_impl.h(11): warning: overloaded virtual function "torch::jit::Function::call" is only partially overridden in class "torch::jit::GraphFunction"

fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h: In instantiation of 'static torch::jit::Maybe<T> torch::jit::Maybe<T>::create(const torch::jit::SourceRange&, const T&) [with T = torch::jit::List<torch::jit::Property>]':
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:505:117:   required from here
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:220:33: error: cannot convert 'const torch::jit::List<torch::jit::Property>' to 'torch::jit::TreeList&&' {aka 'c10::SmallVector<c10::intrusive_ptr<torch::jit::Tree>, 4>&&'}
  220 |     return Maybe<T>(Compound::create(TK_OPTION, range, {value}));
      |                ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
fbcode/caffe2/torch/csrc/jit/frontend/tree.h:144:1: note:   initializing argument 3 of 'static torch::jit::TreeRef torch::jit::Compound::create(int, const torch::jit::SourceRange&, torch::jit::TreeList&&)'
  143 |       const SourceRange& range_,
      |         ~~~~~~~~~~~~~~~~~~~~~~~~
  144 |       TreeList&& trees_) {
      | ^
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h: In instantiation of 'static torch::jit::Maybe<T> torch::jit::Maybe<T>::create(const torch::jit::SourceRange&, const T&) [with T = torch::jit::List<torch::jit::Assign>]':
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:505:171:   required from here
fbcode/caffe2/torch/csrc/jit/frontend/tree_views.h:220:33: error: cannot convert 'const torch::jit::List<torch::jit::Assign>' to 'torch::jit::TreeList&&' {aka 'c10::SmallVector<c10::intrusive_ptr<torch::jit::Tree>, 4>&&'}
  220 |     return Maybe<T>(Compound::create(TK_OPTION, range, {value}));
      |                ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
fbcode/caffe2/torch/csrc/jit/frontend/tree.h:144:1: note:   initializing argument 3 of 'static torch::jit::TreeRef torch::jit::Compound::create(int, const torch::jit::SourceRange&, torch::jit::TreeList&&)'
  143 |       const SourceRange& range_,
      |         ~~~~~~~~~~~~~~~~~~~~~~~~
  144 |       TreeList&& trees_) {
      | ^
cc1plus: note: unrecognized command-line option '-Wno-ignored-optimization-argument' may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option '-Wno-ambiguous-reversed-operator' may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option '-Wno-ignored-optimization-argument' may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option '-Wno-ambiguous-reversed-operator' may have been intended to silence earlier diagnostics
command: buck-out/v2/gen/fbcode/999b02f9444004c1/tools/build/__wrap_nvcc.py__/wrap_nvcc.py -_NVCC_BIN_ fbcode ...<omitted>... ors/pytorch/lib/cuda/__ngram_repeat_block_cuda__/__objects__/ngram_repeat_block_cuda_kernel.cu.pic.o (rerun with -v to view the untruncated command)
```

Differential Revision: D33579670

fbshipit-source-id: 9acb443732feb3e921ce0fa5f38f21ed44f64114
2022-01-14 20:27:09 -08:00
Nikita Shulga
3ed27a96ed [BE] Refactor repetitions into `TorchVersion._cmp_wrapper` (#71344)
Summary:
First step towards https://github.com/pytorch/pytorch/issues/71280
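A hedged sketch of the refactor pattern behind a `_cmp_wrapper` helper (class name, parsing, and method list below are illustrative, not the actual torch implementation): one helper parameterized on the comparison operator replaces several near-identical rich-comparison methods.

```python
import operator

class Version(str):
    def _cmp_wrapper(self, cmp, other):
        # Compare dotted version strings numerically, component by component.
        as_tuple = lambda v: tuple(int(p) for p in str(v).split("."))
        return cmp(as_tuple(self), as_tuple(other))

# Install every rich comparison from the single wrapper, removing repetition.
for name, op in [("__gt__", operator.gt), ("__lt__", operator.lt),
                 ("__ge__", operator.ge), ("__le__", operator.le),
                 ("__eq__", operator.eq)]:
    setattr(Version, name,
            lambda self, other, _op=op: self._cmp_wrapper(_op, other))

print(Version("1.11.0") > Version("1.9.0"))  # -> True (plain str compare would say False)
```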

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71344

Reviewed By: b0noI

Differential Revision: D33594463

Pulled By: malfet

fbshipit-source-id: 0295f0d9f0342f05a390b2bd4aa0a5958c76579b
2022-01-14 19:57:55 -08:00
Scott Wolchok
c43e0286a9 [PyTorch][Lazy] Make hashing null optionals cheap (#71290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71290

The existing code called an out-of-line hash function on a constant. That call returns the same random-looking 64-bit integer every time, so I changed the constant to an integer generated with `hex(random.randint(0x1000000000000000, 0xFFFFFFFFFFFFFFFF))`, getting the same effect without the runtime hashing.
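The idea can be sketched in a few lines: hashing a fixed input is deterministic, so the runtime call is pure overhead and can be replaced by a literal generated once offline (the sentinel string below is a hypothetical stand-in for the original constant):

```python
import random

# One-off generation of the replacement literal, using exactly the expression
# from the summary; run once and paste the result into the source.
precomputed = hex(random.randint(0x1000000000000000, 0xFFFFFFFFFFFFFFFF))

# Hashing a fixed constant yields the same value on every call, which is why
# the out-of-line hash call added nothing but runtime cost.
assert hash("null-optional-sentinel") == hash("null-optional-sentinel")
print(precomputed)
```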
ghstack-source-id: 146991945

Test Plan: CI

Reviewed By: wconstab

Differential Revision: D33574676

fbshipit-source-id: d6ce1e1cc0db67dfede148b7e3173508ec311ea8
2022-01-14 17:13:50 -08:00