Summary:
Added Complex support with AVX to unary ops and binary ops.
I need to add nan propagation to minimum() and maximum() in the future.
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: pytorch-cpu-strided-complex extension
Preliminary Benchmarks are here.
I tried rrii and riri and found that riri is better in most situations.
Divide is very slow because you can't reduce 1/(x+y)
Sqrt is also very slow.
Reciprocal could be sped up after I add conj()
Everything else is typically within 20% of the real number performance.
Questions:
Why does macOS not support mil? #if AT_MKL_ENABLED() && !defined(__APPLE__) in vml.h. MKL does support some complex operations like Abs, so I was curious about trying it.
Is MKL just calling AVX?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26500
Differential Revision: D17835431
Pulled By: ezyang
fbshipit-source-id: 6746209168fbeb567af340c22bf34af28286bd54
Summary:
According to https://github.com/pytorch/pytorch/issues/27285 , seems we do not intend to use shebang as an indication of Python version, thus
we enable EXE001 flake8 check.
For violations, we either remove shebang from non-executable Python scripts or grant them executable permission.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27560
Differential Revision: D17831782
Pulled By: ezyang
fbshipit-source-id: 6282fd3617b25676a6d959af0d318faf05c09b26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27173
`docs/source/named_tensor.rst` is the entry point; most users will land
either here or the named tensor tutorial when looking to use named
tensors. We should strive to make this as readable, concise, and understandable
as possible.
`docs/source/name_inference.rst` lists all of the name inference rules.
It should be clear but it's hard to make it concise.
Please let me know if anything doesn't make sense and please propose
alternative wordings and/or restructuring to improve the documentation.
This should ultimately get cherry-picked into the 1.3 branch as one
monolithic commit so it would be good to get all necessary changes made
in this PR and not have any follow ups.
Test Plan: - built and reviewed locally with `cd docs/ && make html`.
Differential Revision: D17763046
Pulled By: zou3519
fbshipit-source-id: c7872184fc4b189d405b18dad77cad6899ae1522
Summary:
Adds comprehensive memory instrumentation to the CUDA caching memory allocator.
# Counters
Added comprehensive instrumentation for the following stats:
- Allocation requests (`allocation`)
- Allocated memory (`allocated_bytes`)
- Reserved segments from cudaMalloc (`segment`)
- Reserved memory (`reserved_bytes`)
- Active memory blocks (`active`)
- Active memory (`active_bytes`)
- Inactive, non-releasable blocks (`inactive_split`)
- Inactive, non-releasable memory (`inactive_split_bytes`)
- Number of failed cudaMalloc calls that result in a cache flush and retry (`cuda_malloc_retries`)
- Number of OOMs (`num_ooms`)
Except for the last two, these stats are segmented between all memory, large blocks, and small blocks. Along with the current value of each stat, historical counts of allocs/frees as well as peak usage are tracked by the allocator.
# Snapshots
Added the capability to get a "memory snapshot" – that is, to generate a complete dump of the allocator block/segment state.
# Implementation: major changes
- Added `torch.cuda.memory_stats()` (and associated C++ changes) which returns all instrumented stats as a dictionary.
- Added `torch.cuda.snapshot()` (and associated C++ changes) which returns a complete dump of the allocator block/segment state as a list of segments.
- Added memory summary generator in `torch.cuda.memory_summary()` for ease of client access to the instrumentation stats. Potentially useful to dump when catching OOMs. Sample output here: https://pastebin.com/uKZjtupq
# Implementation: minor changes
- Add error-checking helper functions for Python dicts and lists in `torch/csrc/utils/`.
- Existing memory management functions in `torch.cuda` moved from `__init__.py` to `memory.py` and star-imported to the main CUDA module.
- Add various helper functions to `torch.cuda` to return individual items from `torch.cuda.memory_stats()`.
- `torch.cuda.reset_max_memory_cached()` and `torch.cuda.reset_max_memory_allocated()` are deprecated in favor of `reset_peak_stats`. It's a bit difficult to think of a case where only one of those stats should be reset, and IMO this makes the peak stats collectively more consistent.
- `torch.cuda.memory_cached()` and `torch.cuda.max_memory_cached()` are deprecated in favor of `*memory_reserved()`.
- Style (add access modifiers in the allocator class, random nit fixes, etc.)
# Testing
- Added consistency check for stats in `test_cuda.py`. This verifies that the data from `memory_stats()` is faithful to the data from `snapshot()`.
- Ran on various basic workflows (toy example, CIFAR)
# Performance
Running the following speed benchmark: https://pastebin.com/UNndQg50
- Before this PR: 45.98 microseconds per tensor creation
- After this PR: 46.65 microseconds per tensor creation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27361
Differential Revision: D17758747
Pulled By: jma127
fbshipit-source-id: 5a84e82d696c40c505646b9a1b4e0c3bba38aeb6
Summary:
10 lines of error context (on both sides) is overkill, especially now
that we have line numbers. With a compilation stack of a couple
functions, it becomes a pain to scroll to the top of the stack to see
the real error every time.
This also fixes class names in the compilation stack to a format of
`ClassName.method_name` instead of the the full qualified name
Old output
```
clip_boxes_to_image(Tensor boxes, (int, int) size) -> (Tensor):
Expected a value of type 'Tuple[int, int]' for argument 'size' but instead found type 'Tuple[int, int, int]'.
:
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:365:20
top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)
batch_idx = torch.arange(num_images, device=device)[:, None]
objectness = objectness[batch_idx, top_n_idx]
levels = levels[batch_idx, top_n_idx]
proposals = proposals[batch_idx, top_n_idx]
final_boxes = []
final_scores = []
for boxes, scores, lvl, img_shape in zip(proposals, objectness, levels, image_shapes):
boxes = box_ops.clip_boxes_to_image(boxes, img_shape)
~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
keep = box_ops.remove_small_boxes(boxes, self.min_size)
boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]
# non-maximum suppression, independently done per level
keep = box_ops.batched_nms(boxes, scores, lvl, self.nms_thresh)
# keep only topk scoring predictions
keep = keep[:self.post_nms_top_n]
boxes, scores = boxes[keep], scores[keep]
final_boxes.append(boxes)
final_scores.append(scores)
'RegionProposalNetwork.filter_proposals' is being compiled since it was called from 'RegionProposalNetwork.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:446:8
num_images = len(anchors)
num_anchors_per_level = [o[0].numel() for o in objectness]
objectness, pred_bbox_deltas = \
concat_box_prediction_layers(objectness, pred_bbox_deltas)
# apply pred_bbox_deltas to anchors to obtain the decoded proposals
# note that we detach the deltas because Faster R-CNN do not backprop through
# the proposals
proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
proposals = proposals.view(num_images, -1, 4)
boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
losses = {}
if self.training:
assert targets is not None
labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
loss_objectness, loss_rpn_box_reg = self.compute_loss(
objectness, pred_bbox_deltas, labels, regression_targets)
losses = {
'RegionProposalNetwork.forward' is being compiled since it was called from 'MaskRCNN.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/generalized_rcnn.py:53:8
"""
if self.training and targets is None:
raise ValueError("In training mode, targets should be passed")
original_image_sizes = [(img.shape[-2], img.shape[-3]) for img in images]
images, targets = self.transform(images, targets)
features = self.backbone(images.tensors)
if isinstance(features, torch.Tensor):
features = OrderedDict([(0, features)])
proposals, proposal_losses = self.rpn(images, features, targets)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)
losses = {}
losses.update(detector_losses)
losses.update(proposal_losses)
# TODO: multiple return types??
# if self.training:
```
New output
```
RuntimeError:
clip_boxes_to_image(Tensor boxes, (int, int) size) -> (Tensor):
Expected a value of type 'Tuple[int, int]' for argument 'size' but instead found type 'Tuple[int, int, int]'.
:
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:365:20
final_scores = []
for boxes, scores, lvl, img_shape in zip(proposals, objectness, levels, image_shapes):
boxes = box_ops.clip_boxes_to_image(boxes, img_shape)
~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
keep = box_ops.remove_small_boxes(boxes, self.min_size)
boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]
'RegionProposalNetwork.filter_proposals' is being compiled since it was called from 'RegionProposalNetwork.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:446:8
proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
proposals = proposals.view(num_images, -1, 4)
boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
losses = {}
'RegionProposalNetwork.forward' is being compiled since it was called from 'MaskRCNN.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/generalized_rcnn.py:53:8
if isinstance(features, torch.Tensor):
features = OrderedDict([(0, features)])
proposals, proposal_losses = self.rpn(images, features, targets)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
detections = self.transform.postprocess
```
](https://our.intern.facebook.com/intern/diff/17560963/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26765
Pulled By: driazati
Differential Revision: D17560963
fbshipit-source-id: e463548744b505ca17f0158079b80e08fda47d49
Summary:
Adds the method `add_hparams` to `torch.utils.tensorboard` API docs. Will want to have this in PyTorch 1.3 release.
cc sanekmelnikov lanpa natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27344
Differential Revision: D17753689
Pulled By: orionr
fbshipit-source-id: cc8636e0bdcf3f434444cd29471c62105491039d
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/25980.
Our old serialization was in tar (like `resnet18-5c106cde.pth` was in this format) so let's only support automatically unzip if checkpoints are zipfiles.
We can still manage to get it work with tarfile, but let's delay it when there's an ask.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26723
Differential Revision: D17551795
Pulled By: ailzhang
fbshipit-source-id: 00b4e7621f1e753ca9aa07b1fe356278c6693a1e
Summary:
This PR does a few small improvements to hub:
- add support `verbose` option in `torch.load`. Note that this mutes hitting cache message but keeps the message of first download as suggested. fixes https://github.com/pytorch/pytorch/issues/24791
- add support loading state dict from tar file or zip file in `torch.hub.load_state_dict_from_url`.
- add `torch.hub.download_url_to_file` as public API, and add BC bit for `_download_url_to_file`.
- makes hash check in filename optional through `check_hash`, many users don't have control over the naming, relaxing this constraint could potentially avoid duplicating download code on user end.
- move pytorch CI off `pytorch/vision` and use `ailzhang/torchhub_example` as a dedicated test repo. fixes https://github.com/pytorch/pytorch/issues/25865
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25980
Differential Revision: D17495679
Pulled By: ailzhang
fbshipit-source-id: 695df3e803ad5f9ca33cfbcf62f1a4f8cde0dbbe
Summary:
Changelog:
- Remove `torch.gels` which was deprecated in v1.2.0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26480
Test Plan: - No tests were changed and all callsites for `torch.gels` where modified to `torch.lstsq` when `torch.lstsq` was introduced
Differential Revision: D17527207
Pulled By: zou3519
fbshipit-source-id: 28e2fa3a3bf30eb6b9029bb5aab198c4d570a950
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26240
In particular adds support for empty/empty_like which is needed for memory layouts to work.
Test Plan: Imported from OSS
Differential Revision: D17443220
Pulled By: dzhulgakov
fbshipit-source-id: 9c9e25981999c0edaf40be104a5741e9c62a1333
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25263
This adds an api to return true in script and false in eager, which together with ignore allows guarding of not yet supported JIT features. Bikeshedding requested please.
cc zou3519
```
def foo():
if not torch.jit.is_scripting():
return torch.linear(...)
else:
return addmm(...)
```
Test Plan: Imported from OSS
Differential Revision: D17272443
Pulled By: eellison
fbshipit-source-id: de0f769c7eaae91de0007b98969183df93a91f42
Summary:
Improve handling of mixed-type tensor operations.
This PR affects the arithmetic (add, sub, mul, and div) operators implemented via TensorIterator (so dense but not sparse tensor ops).
For these operators, we will now promote to reasonable types where possible, following the rules defined in https://github.com/pytorch/pytorch/issues/9515, and error in cases where the cast would require floating point -> integral or non-boolean to boolean downcasts.
The details of the promotion rules are described here:
https://github.com/nairbv/pytorch/blob/promote_types_strict/docs/source/tensor_attributes.rst
Some specific backwards incompatible examples:
* now `int_tensor * float` will result in a float tensor, whereas previously the floating point operand was first cast to an int. Previously `torch.tensor(10) * 1.9` => `tensor(10)` because the 1.9 was downcast to `1`. Now the result will be the more intuitive `tensor(19)`
* Now `int_tensor *= float` will error, since the floating point result of this operation can't be cast into the in-place integral type result.
See more examples/detail in the original issue (https://github.com/pytorch/pytorch/issues/9515), in the above linked tensor_attributes.rst doc, or in the test_type_promotion.py tests added in this PR:
https://github.com/nairbv/pytorch/blob/promote_types_strict/test/test_type_promotion.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22273
Reviewed By: gchanan
Differential Revision: D16582230
Pulled By: nairbv
fbshipit-source-id: 4029cca891908cdbf4253e4513c617bba7306cb3
Summary:
All of the code examples should now run as unit tests, save for those
that require interaction (i.e. show `pdb` usage) and those that use
CUDA.
`save` had to be moved before `load` in `jit/__init__.py` so `load`
could use the file generated by `save`
](https://our.intern.facebook.com/intern/diff/17192417/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25668
Pulled By: driazati
Differential Revision: D17192417
fbshipit-source-id: 931b310ae0c3d2cc6affeabccae5296f53fe42bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25262
Preserve the type of ignore'd functions on serialization. Currently we first compile an ignore'd function with it's annotated type when first compiling, but do not preserve it. This is important for being able to compile models with not-yet-supported features in JIT.
```
torch.jit.ignore
def unsupported(x):
return x
def foo():
if not torch.jit._is_scripting():
return torch.linear(...)
else:
return unsupported(...)
```
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision: D17199043
Pulled By: eellison
fbshipit-source-id: 1196fd94c207b9fbee1087e4b2ef7d4656a6647f
Summary:
Adds links to torchaudio and torchtext to docs index. We should eventually evolve this to bring the audio and text docs builds in like torchvision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24245
Differential Revision: D17163539
Pulled By: soumith
fbshipit-source-id: 5754bdf7579208e291e53970b40f73ef119b758f
Summary:
I think...
I'm having issues building the site, but it appears to get rid of the error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25544
Differential Revision: D17157327
Pulled By: ezyang
fbshipit-source-id: 170235c52008ca78ff0d8740b2d7f5b67397b614
Summary:
I presume this is what was intended.
cc t-vi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25011
Differential Revision: D16980939
Pulled By: soumith
fbshipit-source-id: c55b22e119f3894bd124eb1dce4f92a719ac047a
Summary:
Another pass over the docs, this covers most of the remaining stuff
* content updates for new API
* adds links to functions instead of just names
* removes some useless indentations
* some more code examples + `testcode`s
](https://our.intern.facebook.com/intern/diff/16847964/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24445
Pulled By: driazati
Differential Revision: D16847964
fbshipit-source-id: cd0b403fe4a89802ce79289f7cf54ee0cea45073
Summary:
Stacked PRs
* #24445 - [jit] Misc doc updates #2
* **#24435 - [jit] Add docs to CI**
This integrates the [doctest](http://www.sphinx-doc.org/en/master/usage/extensions/doctest.html) module into `jit.rst` so that we can run our code examples as unit tests. They're added to `test_jit.py` under the `TestDocs` class (which takes about 30s to run). This should help prevent things like #24429 from happening in the future. They can be run manually by doing `cd docs && make doctest`.
* The test setup requires a hack since `doctest` defines everything in the `builtins` module which upsets `inspect`
* There are several places where the code wasn't testable (i.e. it threw an exception on purpose). This may be resolvable, but I'd prefer to leave that for a follow up. For now there are `TODO` comments littered around.
](https://our.intern.facebook.com/intern/diff/16840882/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24435
Pulled By: driazati
Differential Revision: D16840882
fbshipit-source-id: c4b26e7c374cd224a5a4a2d523163d7b997280ed
Summary:
This patch writes documentation for `Tensor.record_stream()`, which is not a documented API currently. I've discussed publishing it with colesbury in https://github.com/pytorch/pytorch/issues/23729.
The documentation is based on [the introduction at `CUDACachingAllocator.cpp`](25d1496d58/c10/cuda/CUDACachingAllocator.cpp (L47-L50)). ~~I didn't explain full details of the life cycle of memory blocks or stream awareness of the allocator for the consistent level of details with other documentations.~~ I explained about the stream awareness in a note block.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24078
Differential Revision: D16743526
Pulled By: zou3519
fbshipit-source-id: 05819c3cc96733e2ba93c0a7c0ca06933acb22f3
Summary:
This is a bunch of changes to the docs for stylistic changes,
correctness, and updates to the new script API / recent TorchScript
changes (i.e. namedtuple)
For reviewers, ping me to see a link of the rendered output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24371
Pulled By: driazati
Differential Revision: D16832417
fbshipit-source-id: a28e748cf1b590964ca0ae2dfb5d8259c766a203
Summary:
Stacked PRs
* #24258 - [jit] Add `trace_module` to docs
* **#24208 - [jit] Cleanup documentation around `script` and `trace`**
Examples / info was duplicated between `ScriptModule`, `script`, and
`trace`, so this PR consolidates it and moves some things around to make
the docs more clear.
For reviewers, if you want to see the rendered output, ping me for a
link
](https://our.intern.facebook.com/intern/diff/16746236/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24208
Pulled By: driazati
Differential Revision: D16746236
fbshipit-source-id: fac3c6e762a31c897b132b8421baa8d4d61f694c
Summary:
**Patch Description**:
Update the docs to reflect one no longer needs to install tensorboard nightly, as Tensorboard 1.14.0 was [released last week](https://github.com/tensorflow/tensorboard/releases/tag/1.14.0).
**Testing**:
Haven't actually tested pytorch with tensorboard 1.14 yet. I'll update this PR once I have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22026
Differential Revision: D16772136
Pulled By: orionr
fbshipit-source-id: 2e1e17300f304f50026837abbbc6ffb25704aac0
Summary:
This was previously buggy and not being displayed on master. This fixes
the issues with the script to generate the builtin function schemas and
moves it to its own page (it's 6000+ lines of schemas)
Sphinx looks like it will just keep going if it hits errors when importing modules, we should find out how to turn that off and put it in the CI
This also includes some other small fixes:
* removing internal only args from `script()` and `trace()` docs, this also requires manually keeping these argument lists up to date but I think the cleanliness is worth it
* removes outdated note about early returns
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24056
Pulled By: driazati
Differential Revision: D16742406
fbshipit-source-id: 9102ba14215995ffef5aaafcb66a6441113fad59