Summary:
This would be the case when package is build for local development rather than for installation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47390
Reviewed By: janeyx99
Differential Revision: D24738416
Pulled By: malfet
fbshipit-source-id: 22bd676bc46e5d50a09539c969ce56d37cfe5952
Summary:
As typing.NoReturn is used in the codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47314
Reviewed By: seemethere
Differential Revision: D24712847
Pulled By: malfet
fbshipit-source-id: f0692d408316d630bc11f1ee881b695437fb47d4
Summary:
libiomp runtime is the only external dependency OS X package has if compiled with MKL
Copy it to the stage directory from one of the available rpathes
And remove all absolute rpathes, since project shoudl have none
Fixes https://github.com/pytorch/pytorch/issues/38607
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47262
Reviewed By: walterddr
Differential Revision: D24705094
Pulled By: malfet
fbshipit-source-id: 9f588a3ec3c6c836c8986d858fb53df815a506c8
Summary:
Also, be a bit future-proof in support version list
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46921
Reviewed By: seemethere
Differential Revision: D24568733
Pulled By: malfet
fbshipit-source-id: ae34f8da1ed39b80dc34db0b06e4ef142104a3ff
Summary:
import print_function to make setup.py invoked by Python2 print human readable error:
```
% python2 setup.py
Python 2 has reached end-of-life and is no longer supported by PyTorch.
```
Also, remove `future` from the list of the PyTorch package install dependencies
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46317
Reviewed By: walterddr, bugra
Differential Revision: D24305004
Pulled By: malfet
fbshipit-source-id: 9181186170562384dd2c0e6a8ff0b1e93508f221
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45844
Someone pointed out that dataclasses were actually added to the python
stdlib in 3.7 and not 3.8, so bumping down the dependency on dataclasses
from 3.8 -> 3.7 makes sense here
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: walterddr, malfet
Differential Revision: D24113367
Pulled By: seemethere
fbshipit-source-id: 03d2d93f7d966d48a30a8e2545fd07dfe63b4fb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45610
Also add to the usual documentation places that this option exists.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D24058199
Pulled By: suo
fbshipit-source-id: 81574fbd042f47587e2c7820c726fac0f68af2a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45611
dataclasses was made a standard library item in 3.8
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D24031740
Pulled By: seemethere
fbshipit-source-id: 15bdf1fe0d8de9b8ba7912e4a651f06b18d516ee
Summary:
There is a module called `2to3` which you can target for future specifically to remove these, the directory of `caffe2` has the most redundant imports:
```2to3 -f future -w caffe2```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033
Reviewed By: seemethere
Differential Revision: D23808648
Pulled By: bugra
fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
Summary:
The ATen/native/cuda headers were copied to torch/include, but then not included in the final package. Further, add ATen/native/hip headers to the installation, as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45097
Reviewed By: mruberry
Differential Revision: D23831006
Pulled By: malfet
fbshipit-source-id: ab527928185faaa912fd8cab208733a9b11a097b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44577
I would like to to move this to cmake so that I can depend on it
happening from other parts of the build.
This PR pulls out the logic for determining the version string and
writing the version file into its own module. `setup.py` still receives
the version string and uses it as before, but now the code for writing
out `torch/version.py` lives in a custom command in torch/CMakeLists.txt
I noticed a small inconsistency in how version info is populated.
`TORCH_BUILD_VERSION` is populated from `setup.py` at configuration
time, while `torch/version.py` is written at build time. So if, e.g. you
configured cmake on a certain git rev, then built it in on another, the
two versions would be inconsistent.
This does not appear to matter, so I opted to preserve the existing
behavior.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23734781
Pulled By: suo
fbshipit-source-id: 4002c9ec8058503dc0550f8eece2256bc98c03a4
Summary:
This can be taken from the system in which case it is not used from the submodule. Hence the check here limits the usage unnecessarily
ccing malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44278
Reviewed By: malfet
Differential Revision: D23568552
Pulled By: ezyang
fbshipit-source-id: 7fd2613251567f649b12eca0b1fe7663db9cb58d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42629
How to approach reviewing this diff:
- The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
- The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`.
- All of the inputs to the old codegen are deleted.
- Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI.
- LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D23183978
Pulled By: ezyang
fbshipit-source-id: 6073ba432ad182c7284a97147b05f0574a02f763
Summary:
This prevents confusing errors when the interpreter encounters some
syntax errors in the middle.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42870
Reviewed By: albanD
Differential Revision: D23269265
Pulled By: ezyang
fbshipit-source-id: 61f62cbe294078ad4a909fa87aa93abd08c26344
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42522
Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.
There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by adding the `tensorpipe` CMake target as a dependency, so that the include paths defined by TensorPipe are used, which contain that auto-generated header. So instead we link those targets to the tensorpipe target in order for them to pick up the correct include directories.
I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.
Test Plan: CI
Reviewed By: malfet
Differential Revision: D22959472
fbshipit-source-id: 1959a41c4a66ef78bf0f3bd5e3964969a2a1bf67
Summary:
Import __future__ to make `print(*args)` a syntactically correct statement under Python-2
Otherwise, if once accidentally invokes setup.py using Python-2 interpreter they will be greeted by:
```
File "setup.py", line 229
print(*args)
^
SyntaxError: invalid syntax
```
instead of:
```
Python 2 has reached end-of-life and is no longer supported by PyTorch.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41960
Reviewed By: orionr, seemethere
Differential Revision: D22710174
Pulled By: malfet
fbshipit-source-id: ffde3ddd585707ba1d39e57e0c6bc9c4c53f8004
Summary:
Switch off `/Z7` so that we don't generate debug info in Release and MinSizeRel builds, so that we will probably get smaller static libraries and object files and faster build time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39703
Differential Revision: D21960684
Pulled By: ezyang
fbshipit-source-id: 909a237a138183591d667885b13fc311470eed65
Summary:
It just depends on a single `torch_python` library.
C library does not depend on standard C++ library and as result it closes https://github.com/pytorch/pytorch/issues/36941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39375
Reviewed By: orionr
Differential Revision: D21840645
Pulled By: malfet
fbshipit-source-id: 777c189feee9d6fc686816d92cb9f109b8aac7ca
Summary:
**Summary**
This commit adds the headers required to define and use JIT backends to
`package_data` in `setup.py` so that they are exported and copied to the
same place as the rest of the headers when PyTorch is installed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38525
Differential Revision: D21601806
Pulled By: SplitInfinity
fbshipit-source-id: 1615dd4047777926e013d7dd14fe427d5ffb8b70
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35617
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up some cruft that we put in place to support it.
Test Plan: CI
Differential Revision: D20842883
Pulled By: dreiss
fbshipit-source-id: 18dc5219ba99658c0ca7e2f26863df008c420e6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38157
This removes the error prone process of assembling `torch/__init__.pyi`
(and frequently forgetting to expose things), since now we can simply
rely on the true source file to get things done. Most of the old
codegen in gen_pyi.py is now rerouted to various files:
- `torch/_C/__init__.pyi` (the dumping pile of all misc bindings)
- `torch/_C/_nn.pyi` (NN function bindings)
- `torch/_C/_VariableFunctions.pyi` (torch function bindings)
`torch.types` grew a bunch more definitions that previously where
defined in `torch/__init__.pyi`
Some miscellaneous changes
- Fixed a bug where we treat single TensorList argument as implying
varargs are accepted. This is actually only supported on IntList.
This means we can correctly generate a stub for dequantize.
- Add missing manual stub for nonzero
- Switched torch/onnx/operators.py to directly refer to _C module,
since apparently mypy doesn't think that methods prefixed with
underscores get reexported. This may be a recurring theme; maybe
we need to find a better way to solve it.
Because I was really lazy, I dumped namedtuple definitions in both
`torch._C` and `torch._C._VariableFunctions`. This is definitely wrong.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21497400
Pulled By: ezyang
fbshipit-source-id: 07b126141c82efaca37be27c07255cb2b9b3f064
Summary:
We should not rely on the async exceptions. Catching C++ only exception is more sensible and may get a boost in both space (1163 MB -> 1073 MB, 0.92x) and performance(51m -> 49m, 0.96x).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37235
Differential Revision: D21256918
Pulled By: ezyang
fbshipit-source-id: 572ee96f2e4c48ad13f83409e4e113483b3a457a
Summary:
These options are disabled by default, and are supposed to be used by
linux distro developers. With the existing shortcut option
USE_SYSTEM_LIBS toggled, these new options will be enabled as well.
Additionally, when USE_SYSTEM_LIBS is toggled, setup.py should
no longer check the existence of git submodules.
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37277
Differential Revision: D21256999
Pulled By: ezyang
fbshipit-source-id: 84f97d008db5a5e41a289cb7bce94906de3c52cf
Summary:
Line 33+ contains instructions on how to disable use, 108+ on how to enable it.
The default in CMakeLists.txt is enabled, so drop the latter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36993
Differential Revision: D21161793
Pulled By: ngimel
fbshipit-source-id: 08c5eecaf8768491f90d4a52c338ecea32a0c35e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35613
Python 2 has reached end-of-life and is no longer supported by PyTorch.
To spare users from a long, doomed setup when trying to use PyTorch with
Python 2, detect this case early and fail with a clear message. This
commit covers setup.py.
Test Plan: Attempted to build PyTorch with Python 2 and saw a clear error *quickly*.
Differential Revision: D20842881
Pulled By: dreiss
fbshipit-source-id: caaaa0dbff83145ff668bd25df6d7d4b3ce12e47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35411
The file and class names in ATen/core/boxing were quite confusing.
Let's rename them for readability.
Also move function schema inference out of the boxing logic into op_registration.h where it belongs.
ghstack-source-id: 101539206
Test Plan: waitforsandcastle
Differential Revision: D20653621
fbshipit-source-id: 6a79c73d5758bee1e072d543c030913b18a69c7c
Summary:
The original behavior of pytorch c10d only supports built-in c10d backends, such as
nccl/gloo/mpi. This patch is used to extend the c10d capability to support dynamically
loading 3rd party communication libraries which are derived from ProcessGroup base class.
related RFC is in: https://github.com/pytorch/pytorch/issues/27955
Through this way, user just need specify a 3rd party c10d backend name when invoking
torch.distributed.init_process_group(). The proposed logic will try to load corresponding
c10d backend cpp extension automatically. as for how to develop a new 3rd party c10d backend
through cpp extension, pls refer to test/cpp_extensions/cpp_c10d_extension.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28068
Differential Revision: D19174838
Pulled By: agolynski
fbshipit-source-id: 3409a504a43ce7260e6f9d1207c00e87471fac62
Summary:
As a followup to https://github.com/pytorch/pytorch/pull/35042 this removes python2 from setup.py and adds Python 3.8 to the list of supported versions. We're already testing this in CircleCI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35539
Differential Revision: D20709060
Pulled By: orionr
fbshipit-source-id: 5d40bc14cb885374fec370fc7c5d3cde8769039a
Summary:
## Motivation
This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.
DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.
This PR focuses on the migration of all existing functionalities, including minor fixes, performance improvement and code clean up. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, INT8 inference, etc. and to let the Pytorch community derive more benefits from the Intel Architecture.
<br>
## What's included?
Even DNNL has many breaking changes to the API, we managed to absorb most of them in ideep. This PR contains minimalist changes to the integration code in pytorch. Below is a summary of the changes:
<br>
**General:**
1. Replace op-level allocator with global-registered allocator
```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);
// after
ideep::sum::compute(scales, {x, y}, z);
```
The allocator is now being registeted at `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.
```
RegisterEngineAllocator cpu_alloc(
ideep::engine::cpu_engine(),
[](size_t size) {
return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
},
[](void* p) {
c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
}
);
```
------
2. Simplify group convolution
We had such a scenario in convolution where ideep tensor shape mismatched aten tensor: when `groups > 1`, DNNL expects weights tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in 2d conv case.
As shown below, a lot of extra checks came with this difference in shape before. Now we've completely hidden this difference in ideep and all tensors are going to align with pytorch's definition. So we could safely remove these checks from both aten and c2 integration code.
```
// aten/src/ATen/native/mkldnn/Conv.cpp
if (w.ndims() == x.ndims() + 1) {
AT_ASSERTM(
groups > 1,
"Only group _mkldnn_conv2d weights could have been reordered to 5d");
kernel_size[0] = w.get_dim(0) * w.get_dim(1);
std::copy_n(
w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```
------
3. Enable DNNL built-in cache
Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.
This change will be mainly reflected in lower memory usage from memory profiling results. On the code side, we removed couple of lines of `op_key_` that depended on the ideep cache before.
------
4. Use 64-bit integer to denote dimensions
We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with 32-bit dims used by caffe2. So we use something like `{stride_.begin(), stride_.end()}` to cast parameter `stride_` into a int64 vector.
<br>
**Misc changes in each commit:**
**Commit:** change build options
Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, DNNL built-in cache is enabled by option `DNNL_ENABLE_PRIMITIVE_CACHE`.
Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (not use MKL anymore)
------
**Commit:** aten reintegration
- aten/src/ATen/native/mkldnn/BinaryOps.cpp
Implement binary ops using new operation `binary` provided by DNNL
- aten/src/ATen/native/mkldnn/Conv.cpp
Clean up group convolution checks
Simplify conv backward integration
- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp
Simplify prepacking convolution weights
- test/test_mkldnn.py
Fixed an issue in conv2d unit test: it didn't check conv results between mkldnn and aten implementation before. Instead, it compared the mkldnn with mkldnn as the default cpu path will also go into mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this issue
- torch/utils/mkldnn.py
Prepack weight tensor on module `__init__` to achieve better performance significantly
------
**Commit:** caffe2 reintegration
- caffe2/ideep/ideep_utils.h
Clean up unused type definitions
- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc
Unify tensor initialization with `ideep::tensor::init`. Obsolete `ideep::tensor::reinit`
- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc
Clean up group convolution checks
Revamp convolution API
- caffe2/ideep/operators/conv_transpose_op.cc
Clean up group convolution checks
Clean up deconv workaround code
------
**Commit:** custom allocator
- Register c10 allocator as mentioned above
<br><br>
## Performance
We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.
ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%
_Configuration:
Platform: Skylake 8180
Latency Test: 4 threads, warmup 30, iteration 500, batch size 1
Throughput Test: 56 threads, warmup 30, iteration 200, batch size 64_
† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is an expected issue since we no longer cache any buffer in the ideep. As for the solution, we suggest users opt for caching allocator like **jemalloc** as a drop-in replacement for system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422
Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results
10% improvement for ResNext with avx512, neutral on avx2
More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP
Reviewed By: yinghai
Differential Revision: D20381325
Pulled By: dzhulgakov
fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34774
This PR provides pybind11's `type_caster<at::Generator>` that allows mapping `at::Generator` instance returned from user-defined method to python `torch::Generator`, defined as `THPGenerator ` c++ class.
This allows 1) defining custom RNG in c++ extension 2) using custom RNG in python code.
`TestRNGExtension.test_rng` shows how to use custom RNG defined in `rng_extension.cpp`
Test Plan: Imported from OSS
Differential Revision: D20549451
Pulled By: pbelevich
fbshipit-source-id: 312a6deccf8228f7f60695bbf95834620d52f5eb
Summary:
Because `past` is used in `caffe2.python.core`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35057
Test Plan: CI
Differential Revision: D20547042
Pulled By: malfet
fbshipit-source-id: cad2123c7b88271fea37f21e616df551075383a8
Summary:
Was originally not a requirement but we should add it back here since
it's required on import and we require it anyways for our conda
packages.
Tested with:
```
❯ pkginfo -f requires_dist *.whl
requires_dist: ['numpy']
```
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34510
Differential Revision: D20352125
Pulled By: seemethere
fbshipit-source-id: 383e396fe500ed7043d83c3df57d1772d0fff1e6
Summary:
Per https://github.com/pytorch/pytorch/issues/19161 PyTorch is incompatible with 3.6.0 due to the missing `PySlice_Unpack`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34724
Test Plan: CI + try to load pytorch binary using python-3.6.0
Differential Revision: D20449052
Pulled By: malfet
fbshipit-source-id: 2c787fc64f5d1377c7f935ad2f3c77f46723d7dd
Summary:
Attempt to build pytorch with ASAN on system with gcc-8 fails due to the mismatch system compilation flags.
Address the issue by using original compiler to build `torch._C` extension
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34549
Test Plan: Run `.jenkins/pytorch/build-asan.sh` on FC-30
Differential Revision: D20373781
Pulled By: malfet
fbshipit-source-id: 041c8d25f96b4436385a5e0eb6fc46e9b5fdf3f1
Summary:
This PR move glu to Aten(CPU).
Test script:
```
import torch
import torch.nn.functional as F
import time
torch.manual_seed(0)
def _time():
if torch.cuda.is_available():
torch.cuda.synchronize()
return time.time()
device = "cpu"
#warm up
for n in [10, 100, 1000, 10000]:
input = torch.randn(128, n, requires_grad=True, device=device)
grad_output = torch.ones(128, n // 2, device=device)
for i in range(1000):
output = F.glu(input)
output.backward(grad_output)
for n in [10, 100, 1000, 10000]:
fwd_t = 0
bwd_t = 0
input = torch.randn(128, n, requires_grad=True, device=device)
grad_output = torch.ones(128, n // 2, device=device)
for i in range(10000):
t1 = _time()
output = F.glu(input)
t2 = _time()
output.backward(grad_output)
t3 = _time()
fwd_t = fwd_t + (t2 -t1)
bwd_t = bwd_t + (t3 - t2)
fwd_avg = fwd_t / 10000 * 1000
bwd_avg = bwd_t / 10000 * 1000
print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
% (n, fwd_avg, bwd_avg))
```
Test device: **skx-8180.**
Before:
```
input size(128, 10) forward time is 0.04 (ms); backwad avg time is 0.08 (ms).
input size(128, 100) forward time is 0.06 (ms); backwad avg time is 0.14 (ms).
input size(128, 1000) forward time is 0.11 (ms); backwad avg time is 0.31 (ms).
input size(128, 10000) forward time is 1.52 (ms); backwad avg time is 2.04 (ms).
```
After:
```
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.05 (ms).
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.09 (ms).
input size(128, 1000) forward time is 0.07 (ms); backwad avg time is 0.17 (ms).
input size(128, 10000) forward time is 0.13 (ms); backwad avg time is 1.03 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24707, https://github.com/pytorch/pytorch/issues/24708.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33179
Differential Revision: D19839835
Pulled By: VitalyFedyunin
fbshipit-source-id: e4d3438556a1068da2c4a7e573d6bbf8d2a6e2b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32325
The purpose of this PR is to enable PyTorch dispatching on `at::Generator*` parameters and demonstrate how it can be used in cpp extensions to implement custom RNG.
1. `CustomRNGKeyId` value added to DispatchKey enum and `DispatchKeySet key_set_` added to `at::Generator`
2. The overloaded `operator()(at::Generator* gen)` added to MultiDispatchKeySet.
3. The existing CPUGenerator and CUDAGenerator class are supplied with CPUTensorId and CUDATensorId dispatch keys
4. The implementation of CPU's `cauchy_kernel`(as an example, because it's already moved to ATen) was templatized and moved to `ATen/native/cpu/DistributionTemplates.h` to make it available for cpp extensions
5. Minor CMake changes to make native/cpu tensors available for cpp extensions
6. RegisterCustomRNG test that demonstrates how CustomCPUGenerator class can be implemented and how custom_rng_cauchy_ native function can be registered to handle Tensor::cauchy_ calls.
Test Plan: Imported from OSS
Differential Revision: D19604558
Pulled By: pbelevich
fbshipit-source-id: 2619f14076cee5742094a0be832d8530bba72728
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445
Create distributed and rpc directories under caffe/test for better management
of unit tests.
Differential Revision: D18702786
fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
Summary:
This PR adds a more complete list of pytorch header files to be installed at build time. It also fixes one instance of including a header from local src directory instead of installed directory.
A more complete set of headers enable other modules to correctly work with pyTorch built for ROCm.
cc: ezyang bddppq iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32076
Differential Revision: D19372933
Pulled By: ezyang
fbshipit-source-id: 3b5f3241c001fa05ea448c359a706ce9a8214aa0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31155
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262584
Pulled By: ezyang
fbshipit-source-id: 147ac5a9c36e813ea9a2f68b498880942d661be5
Summary:
We dont have ATen/native/*.h in torch target before, and we would like it to be exposed for external use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30835
Differential Revision: D18836160
Pulled By: zrphercule
fbshipit-source-id: 7330a9c9d8b65f173cc332b1cfeeb18c7dca20a8
Summary:
This adds the HIP_VERSION cmake variable as hip_version.
This should help detecting ROCm, e.g. in https://github.com/pytorch/pytorch/issues/22091.
To parallel CUDA, hip_version is a string.
An alternative variant might be to split by '.' and only take the first two parts.
The method suffers a bit from ROCm not being as monolithic as CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29815
Differential Revision: D18532267
Pulled By: bddppq
fbshipit-source-id: 1bde4ad0cfacc47bfd1c0945e130921d8575a5bf
Summary:
Also move the logic that installs the pybind11 headers from setup.py to cmake (to align with other headers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29659
Differential Revision: D18458208
Pulled By: bddppq
fbshipit-source-id: cfd1e74b892d4a65591626ab321780c8c87b810d
Summary:
We dont have ATen/native/quantized/cpu/*.h in torch target before, and we would like it to be exposed for external use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29418
Differential Revision: D18383534
Pulled By: zrphercule
fbshipit-source-id: 72c06ae2c10e8cc49e7256c9e9b89288263bbfde
Summary:
This problem is from issue [https://github.com/pytorch/pytorch/issues/28753](https://github.com/pytorch/pytorch/issues/28753)
The header files on directories`math` and `threadpool` should be included on the built package because they are included on the other header files, such as on file `torch/include/caffe2/utils/math.h`
```
#include "caffe2/core/common.h"
#include "caffe2/core/types.h"
#include "caffe2/utils/math/broadcast.h"
#include "caffe2/utils/math/elementwise.h"
#include "caffe2/utils/math/reduce.h"
#include "caffe2/utils/math/transpose.h"
#include "caffe2/utils/math/utils.h"
```
But the `setup.py` doesn't include the header files on `master` branch. The header files on `utils` directory of a built `torch` package are the following:
```
> ls include/caffe2/utils
bench_utils.h conversions.h eigen_utils.h map_utils.h murmur_hash3.h proto_wrap.h smart_tensor_printer.h
cast.h cpuid.h filler.h math-detail.h proto_convert.h signal_handler.h string_utils.h
cblas.h cpu_neon.h fixed_divisor.h math.h proto_utils.h simple_queue.h zmq_helper.h
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28869
Differential Revision: D18226319
Pulled By: soumith
fbshipit-source-id: 51575ddc559181c069b3324aa9b2d1669310ba25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26140
Per https://github.com/pytorch/pytorch/issues/25883, we want to work
towards C++/Python API parity. This diff adds clip_grad_norm_ to the c++ API to
improve parity.
ghstack-source-id: 91334333
ghstack-source-id: 91334333
Test Plan: Added a unit test
Differential Revision: D17312367
fbshipit-source-id: 753ba3a4d084d01f3cc8919da3108e67c809ad65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27149
Extract version to version.txt and add reading version logic to setup.py and fb/torch_version.py
ghstack-source-id: 91271883
Test Plan: N/A
Reviewed By: gchanan, ezyang
Differential Revision: D17689307
fbshipit-source-id: 21899502027cec71b63d9dc151e09ff5ff3f279d
Summary:
FindCUDNN.cmake and cuda.cmake have done the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.
Previously in https://github.com/pytorch/pytorch/issues/25482, one test failed because TensorRT detects cuDNN differently, and there may be situations we can find cuDNN but TensorRT cannot. This is fixed by passing our detection result down to TensorRT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25876
Differential Revision: D17346270
Pulled By: ezyang
fbshipit-source-id: c1e7ad4a1cb20f964fe07a72906f2f002425d894
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26337
- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.
ghstack-source-id: 90459911
Test Plan: unit tests
Differential Revision: D17416607
fbshipit-source-id: fd221f1d70eb3f1b4d33092eaa7e37d25684c934
Summary:
This PR aims to re-organize C++ API `torch::nn` folder structure in the following way:
- Every module in `torch/csrc/api/include/torch/nn/modules/` (except `any.h`, `named_any.h`, `modulelist.h`, `sequential.h`, `embedding.h`) has a strictly equivalent Python file in `torch/nn/modules/`. For example:
`torch/csrc/api/include/torch/nn/modules/pooling.h` -> `torch/nn/modules/pooling.py`
`torch/csrc/api/include/torch/nn/modules/conv.h` -> `torch/nn/modules/conv.py`
`torch/csrc/api/include/torch/nn/modules/batchnorm.h` -> `torch/nn/modules/batchnorm.py`
`torch/csrc/api/include/torch/nn/modules/sparse.h` -> `torch/nn/modules/sparse.py`
- Containers such as `any.h`, `named_any.h`, `modulelist.h`, `sequential.h` are moved into `torch/csrc/api/include/torch/nn/modules/container/`, because their implementations are too long to be combined into one file (like `torch/nn/modules/container.py` in Python API)
- `embedding.h` is not renamed to `sparse.h` yet, because we have another work stream that works on API parity for Embedding and EmbeddingBag, and renaming the file would cause conflict. After the embedding API parity work is done, we will rename `embedding.h` to `sparse.h` to match the Python file name, and move the embedding options out to options/ folder.
- `torch/csrc/api/include/torch/nn/functional/` is added, and the folder structure mirrors that of `torch/csrc/api/include/torch/nn/modules/`. For example, `torch/csrc/api/include/torch/nn/functional/pooling.h` contains the functions for pooling, which are then used by the pooling modules in `torch/csrc/api/include/torch/nn/modules/pooling.h`.
- `torch/csrc/api/include/torch/nn/options/` is added, and the folder structure mirrors that of `torch/csrc/api/include/torch/nn/modules/`. For example, `torch/csrc/api/include/torch/nn/options/pooling.h` contains MaxPoolOptions, which is used by both MaxPool modules in `torch/csrc/api/include/torch/nn/modules/pooling.h`, and max_pool functions in `torch/csrc/api/include/torch/nn/functional/pooling.h`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26262
Differential Revision: D17422426
Pulled By: yf225
fbshipit-source-id: c413d2a374ba716dac81db31516619bbd879db7f
Summary:
local build is slow... test in CI...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26083
Differential Revision: D17346949
Pulled By: ailzhang
fbshipit-source-id: f552d1a4be55ad4e2bd915af7c5a2c1b6667c446
Summary:
What dist_check.py does is largely merely determining whether we should
use set "USE_IBVERBS" to ON or OFF when the user sets "USE_GLOO_IBVERBS"
to ON. But this is unnecessary, because this complicated determination
will always be overrided by gloo:
2101e02cea/cmake/Dependencies.cmake (L19-L28)
Since dist_check.py becomes irrelevant, this commit also simplifies the
setting of `USE_DISTRIBUTED` (by removing its explicit setting in Python scripts), and deprecate `USE_GLOO_IBVERBS` in favor
of `USE_IBVERBS`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25879
Differential Revision: D17282395
Pulled By: pietern
fbshipit-source-id: a10735f50728d89c3d81fd57bcd26764e7f84dd1
Summary:
FindCUDNN.cmake and cuda.cmake have done the detection. This commit deletes `tools/setup_helpers/cudnn.py` as it is no longer needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25482
Differential Revision: D17226408
Pulled By: ezyang
fbshipit-source-id: abd9cd0244cabea1f5d9f93f828d632d77c8dd5e
Summary:
In facebookincubator/gloo#212, a libuv based Gloo transport was introduced,
which allows us to use Gloo on macOS (and later perhaps also Windows). This
commit updates CMake code to enable building with USE_DISTRIBUTED=1 on macOS.
A few notes:
* The Caffe2 ops are not compiled, for they depend on `gloo::transport::tcp`.
* The process group implementation uses `gloo::transport::tcp` on Linux (because of `epoll(2)` on Linux and `gloo::transport::uv` on macOS).
* The TCP store works but sometimes crashes on process termination.
* The distributed tests are not yet run.
* The nightly builds don't use `USE_DISTRIBUTED=1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25260
Reviewed By: mrshenli
Differential Revision: D17202381
Pulled By: pietern
fbshipit-source-id: ca80a82e78a05b4154271d2fb0ed31c8d9f26a7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25083
I missed this in the last PR
Test Plan: Imported from OSS
Differential Revision: D17005372
Pulled By: jamesr66a
fbshipit-source-id: 1200a6cd88fb9051aed8baf3162a9f8ffbf65189
Summary:
`python_requires` helps the installer choose the correct version of this package for the user's running Python.
This is especially necessary when dropping Python 2 (https://github.com/pytorch/pytorch/issues/23795) but is useful now too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23863
Differential Revision: D16692908
Pulled By: soumith
fbshipit-source-id: 3c9ba2eb1d1cf12763d6284daa4f18f605abb373
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23895
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16688489
Pulled By: ezyang
fbshipit-source-id: a56d0180a0bc57775badd9e31ea3d441d5fd4f88
Summary:
add setup metadata to help PyPI flesh out content on pypi package page.
Apparently this might help flesh out the "Used By" feature according to driazati
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22085
Differential Revision: D16604703
Pulled By: soumith
fbshipit-source-id: ddb4f7ba7c24fdf718260aed28cc7bc9afb46de9
Summary:
Currently the build type is decided by the environment variable DEBUG
and REL_WITH_DEB_INFO. This commit also lets CMAKE_BUILD_TYPE be
effective. This makes the interface more consistent with CMake. This
also prepares https://github.com/pytorch/pytorch/issues/22776.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22875
Differential Revision: D16281663
Pulled By: ezyang
fbshipit-source-id: 952f92aad85ff59f1c7abe8256eca8a4a0936026
Summary:
---
How does the current code subsume all detections in the deleted `nccl.py`?
- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`.
- The main NCCL detection happens in [FindNCCL.cmake](8377d4b32c/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](8377d4b32c/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code defer the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.
- `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.
- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`.
* The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA. `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.
* Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly:
- It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system.
- It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu` but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`, see https://unix.stackexchange.com/a/226180/38242 ).
---
Regarding for relevant issues:
- https://github.com/pytorch/pytorch/issues/12063 and https://github.com/pytorch/pytorch/issues/2877: These are properly handled, as explained in the updated comment.
- https://github.com/pytorch/pytorch/issues/2941 does not changes NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f81e A versioned library detection is added, but the order is reversed: The unversioned library becomes preferred. This is because normally unversioned libraries are linked to versioned libraries and preferred by users, and local installation by users are often unversioned. Like the document of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:
> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22930
Differential Revision: D16440275
Pulled By: ezyang
fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
Summary:
MKL-DNN is the main library for computation when we use ideep device. It can use kernels implemented by different algorithms (including JIT, CBLAS, etc.) for computation. We add the "USE_MKLDNN_CBLAS" (default OFF) build option so that users can decide whether to use CBLAS computation methods or not.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19014
Differential Revision: D16094090
Pulled By: ezyang
fbshipit-source-id: 3f0b1d1a59a327ea0d1456e2752f2edd78d96ccc
Summary:
Following up b811b6d5c0
* Use property instead of __setattr__ in CMake.
* Add a comment clarifying when built_ext.run is called.
---
cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21792
Differential Revision: D15860606
Pulled By: umanwizard
fbshipit-source-id: ba1fa07f58d4eac81ac27fa9dc7115d1cdd3dec0
Summary:
Currently when building extensions, variables such as USE_CUDA, USE_CUDNN are used to determine what libraries should be linked. But we should use what CMake has detected, because:
1. If CMake found them unavailable but the variables say some libraries should be linked, the build would fail.
2. If the first build is made using a set of non-default build options, rebuild must have these option passed to setup.py again, otherwise the extension build process is inconsistent with CMake. For example,
```bash
# First build
USE_CUDA=0 python setup.py install
# Subsequent builds like this would fail, unless "build/" is deleted
python setup.py install
```
This commit addresses the above issues by using variables from CMakeCache.txt when building the extensions.
---
The changes in `setup.py` may look lengthy, but the biggest changed block is mostly moving them into a function `configure_extension_build` (along with some variable names changed to `cmake_cache_vars['variable name']` and other minor changes), because it must be called after CMake has been called (and thus the options used and system environment detected by CMake become available).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21653
Differential Revision: D15824506
Pulled By: ezyang
fbshipit-source-id: 1e1eb7eec7debba30738f65472ccad966ee74028
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants). Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.
The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774
Differential Revision: D15769965
Pulled By: kostmo
fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
Summary:
Add an option to setup.py to stop the build process once cmake terminates. This leaves users a chance to fine adjust build options. Also update README accordingly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21034
Differential Revision: D15530096
Pulled By: soumith
fbshipit-source-id: 71ac6ff8483c3ee77c38d88f0d059db53a7d3901
Summary:
Sometimes users forget using the "--recursive" option when they update submodules. This added check should help expose this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20937
Differential Revision: D15502846
Pulled By: mrshenli
fbshipit-source-id: 34c28a2c71ee6442d16b8b741ea44a18733b1536
Summary:
When detecting the presence of NumPy using import, move numpy-related variable assignments outside the try block (i.e., to an else block) to improve readability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20739
Differential Revision: D15453916
Pulled By: ezyang
fbshipit-source-id: d3c37f2b290846be3c6a1462251cbb3e95d493be
Summary:
I haven't had a chance to rigorously try these out yet so don't merge yet.
Closes#18725.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18963
Differential Revision: D14832897
Pulled By: ezyang
fbshipit-source-id: 4780e7a34126bc66ddbfd9d808dfc9e0edd77e68
Summary:
Added stubs for:
* The `device` module
* The `cuda` module
* Parts of the `optim` module
* Began adding stubs for the `autograd` module. I'll annotate more later but `no_grad` and friends are probably the most used exports from it so it seemed like a good place to start.
This would close#16996, although comments on that issue reference other missing stubs so maybe it's worth keeping open as an umbrella issue.
The big remaining missing package is `nn`.
Also added a `py.typed` file so mypy will pick up on the type stubs. That closes#17639.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18511
Differential Revision: D14715053
Pulled By: ezyang
fbshipit-source-id: 9e4882ac997063650e6ce47604b3eaf1232c61c9
Summary:
`python setup.py develop` fails with following messages.
~~~
...
-- Building with NumPy bindings
-- Not using cuDNN
-- Not using MIOpen
-- Not using CUDA
-- Using MKLDNN
-- Not using NCCL
-- Building without distributed package
Copying extension caffe2.python.caffe2_pybind11_state
Copying caffe2.python.caffe2_pybind11_state from torch\Lib\site-packages\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd to C:\data\source\pytorch\build\lib.win-amd64-3.7\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd
copying torch\Lib\site-packages\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd -> C:\data\source\pytorch\build\lib.win-amd64-3.7\caffe2\python
building 'torch._C' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\torch
creating build\temp.win-amd64-3.7\Release\torch\csrc
...
creating C:\data\source\pytorch\build\lib.win-amd64-3.7\torch
C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /nodefaultlib:libucrt.lib ucrt.lib /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\data\source\pytorch\torch\lib /LIBPATH:C:\data\dlenv\libs /LIBPATH:C:\data\dlenv\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" shm.lib torch_python.lib /EXPORT:PyInit__C build\temp.win-amd64-3.7\Release\torch/csrc/stub.obj /OUT:build\lib.win-amd64-3.7\torch\_C.cp37-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.7\Release\torch/csrc\_C.cp37-win_amd64.lib /NODEFAULTLIB:LIBCMT.LIB
ライブラリ build\temp.win-amd64-3.7\Release\torch/csrc\_C.cp37-win_amd64.lib とオブジェクト build\temp.win-amd64-3.7\Release\torch/csrc\_C.cp37-win_amd64.exp を作成中
コード生成しています。
コード生成が終了しました。
copying build\lib.win-amd64-3.7\torch\_C.cp37-win_amd64.pyd -> torch
copying build\lib.win-amd64-3.7\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd -> caffe2\python
copying build/temp.win-amd64-3.7/Release/torch/csrc/_C.cp37-win_amd64.lib -> build/lib.win-amd64-3.7/torch/lib/_C.lib
error: could not create 'build/lib.win-amd64-3.7/torch/lib/_C.lib': No such file or directory
~~~
When `python setup.py install` is executed, `torch/lib` has been created by previous process (copying many files) and this copy succeeds. But in develop mode, that process does not executed and this copy fails.
This patch creates `torch/lib` directory if do not exist.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18666
Differential Revision: D14704269
Pulled By: ezyang
fbshipit-source-id: b2d7c698a906b945bf34bb78f17b91b4fdfd3294
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments. flake8-3 will
report an import unused; flake8-2 will not. For now, I just
noqa'd all these sites.
All the changes were done by hand.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18090
This schema inference is needed by the c10 operator registration mechanism. Move it to c10.
It is going to be used by diffs stacked on top.
Reviewed By: ezyang
Differential Revision: D14491454
fbshipit-source-id: 0f8ddcdbd91467c8347d315dd443a1ca8b216481
Summary:
Add check and provide useful warning/error information to user if foxi is not checked out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17477
Reviewed By: zrphercule
Differential Revision: D14212896
Pulled By: houseroad
fbshipit-source-id: 557247d5d8fdc016b1c24c2a21503e59f874ad09
Summary:
Fix#16650.
Headers such as `ATen/cpu/vml.h` contain `#include <ATen/cpu/vec256/vec256.h>`
for example, but these vec256 headers aren't included, due to commit e4c0bb1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17220
Differential Revision: D14165695
Pulled By: ezyang
fbshipit-source-id: 27b2aa2a734b3719ca4af0565f79623b64b2620f
Summary:
light weight implementation of LLVM filecheck utility. Currently only handles string matching - regexes & saving a regex to a variable name can be added as needed.
Current intended usage is through FileCheckBuilder python handle, and is shown in the tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16858
Differential Revision: D14096244
Pulled By: eellison
fbshipit-source-id: c7c8d1457691c105e6ccbb3c1a378d96baac2569
Summary:
Since we don't do tmp_install any more it's better to include all necessary headers.
cc kostmo for better suggestions of how to list all headers here
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16890
Differential Revision: D14079848
Pulled By: dzhulgakov
fbshipit-source-id: 4522c80d05e5d91f99f6700cde46cac559330d28
Summary:
This is needed to check for wrong arguments or --help options
before `build_deps()` is executed. Otherwise command line arguments
are not parsed and checked until `setup()` is run.
Fixes: #16707
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16914
Differential Revision: D14041236
Pulled By: soumith
fbshipit-source-id: 41f635772ccf47f05114775d5a19ae04c495ab3b
Summary:
Rehash of previous attempts. This tries a different approach where we accept the install as specified in cmake (leaving bin/ include/ and lib/ alone), and then try to adjust the rest of the files to this more standard layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16414
Differential Revision: D13863635
Pulled By: zdevito
fbshipit-source-id: 23725f5c64d7509bf3ca8f472dcdcad074de9828
Summary:
We have:
- This is an initial stab at creating a type stub `torch/__init__.pyi` .
- This is only tested on Python 3, since that's the only Python version mypy
works on.
- So far, we only aim at doing this for torch functions and torch.Tensor.
- Quite a few methods and functions have to be typed manually. These are
done in `torch/__init__.pyi.in`
For me, PyCharm (the non-paid one) didn't seem to indicate errors in the .pyi when opening and seemed to be able to get the type hint for the few functions I tried, but I don't use PyCharm for my usual PyTorch activities, so I didn't extensively try this out.
An example of a generated PYI is at [this gist](https://gist.github.com/ezyang/bf9b6a5fa8827c52152858169bcb61b1).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12500
Differential Revision: D13695553
Pulled By: ezyang
fbshipit-source-id: 4566c71913ede4e4c23ebc4a72c17151f94e8e21
Summary:
This commit removes the dependency on `build_pytorch_libs.sh` by moving the remaining functionality that is not expressible in cmake into python. Removing the indirection through bash also removes over 300 lines of environment munging code that is incredibly hard to understand because it passes a lot of secret parameters through `os.env`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16289
Reviewed By: ezyang
Differential Revision: D13821662
Pulled By: zdevito
fbshipit-source-id: d658d26925e3b1169ac1e3d44a159cf8a1f0d9b1
Summary:
Now it is only necessary to use 'develop' or 'install' to build. Incremental cmake is on by default. `develop --cmake` forces it to rerun.
The NinjaBuilder stuff is dead. It was used to make building _C.so
faster but now _C.so is just an empty stub file.
Removed a bunch of custom build commands from setup.py that are
no longer meaningful now that cmake handles most of the build.
Removed unused targets in build_pytorch_lib.sh/bat
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16162
Differential Revision: D13744155
Pulled By: zdevito
fbshipit-source-id: d836484782c65b7f8e8c7a82620886f7a7777892
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16050
The c10 dispatcher will (soon) depend on IValue and IValue can't be moved to c10 yet because it depends on at::Tensor, which depends on legacy Type dispatch and we don't want the legacy dispatch in c10.
So instead, we move the c10 dispatcher back to ATen/core until we can actually move at::Tensor to c10.
Reviewed By: ezyang
Differential Revision: D13684517
fbshipit-source-id: 1125f4254223907c52f96ff73034f6d4ae9fd0a7
Summary:
Confirmed on a local run that all the additional headers are present. This shouldn't be caught in any existing tests though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16124
Differential Revision: D13720773
Pulled By: pjh5
fbshipit-source-id: 22a42639f5649cac555ecc5a8b6760a8cbfcf01f
Summary:
bypass-lint
- Change all Caffe2 builds to use setup.py instead of cmake
- Add a -cmake- Caffe2 build configuration that uses cmake and only builds cpp
- Move skipIfCI logic from onnx test scripts to the rest of CI logic
- Removal of old PYTHONPATH/LD_LIBRARY_PATH/etc. env management
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15917
Reviewed By: orionr
Differential Revision: D13637583
Pulled By: pjh5
fbshipit-source-id: c5c5639db0251ba12b6e4b51b2ac3b26a8953153
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15316
This starts cleaning up the files in c10 according to the module structure we decided on.
Move to c10/util:
- Half.h, Half-inl.h, Half.cpp, bitcasts.h
Move to c10/core:
- Device.h, Device.cpp
- DeviceType.h, DeviceType.cpp
i-am-not-moving-c2-to-c10
Reviewed By: dzhulgakov
Differential Revision: D13498493
fbshipit-source-id: dfcf1c490474a12ab950c72ca686b8ad86428f63
Summary:
Currently re-implements the dataloader for stateful datasets. Outstanding work:
- Refactor DataLoader and DataLoader2 to have common base classes and only differ in specifi pieces of logic,
- Figure out how to not duplicate the `MapDataset` logic for stateful vs. non-stateful
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15096
Differential Revision: D13522043
Pulled By: goldsborough
fbshipit-source-id: 08e461ca51783047f11facc4d27dfa2e4f1e4c2a
Summary:
…done once
This allow no-op build to work correctly even when BUILD_CAFFE2_OPS is on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14982
Differential Revision: D13413960
Pulled By: zdevito
fbshipit-source-id: 6e5412a8c375af8a47c76f548cdd31cff15f3853
Summary:
This is broken out of https://github.com/pytorch/pytorch/pull/13733/
We want to install cpp tests so they can ultimately be runnable from that location for Caffe2 tests run from PyTorch builds.
cc pjh5 yf225 anderspapitto
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15000
Reviewed By: pjh5
Differential Revision: D13416253
Pulled By: orionr
fbshipit-source-id: 51280be0a22557a742f90c9f303c58c35cbd4a38
Summary:
1. Changes the prints along the 'rebuild' pathway to respect the '-q' flag of setup.py
A clean rebuild now only prints:
[zdevito@devgpu172.prn2 /data/users/zdevito/pytorch] python setup.py -q rebuild develop
[0/1] Install the project...
-- Install configuration: "RelWithDebInfo"
ninja: no work to do.
ninja: no work to do.
ninja: no work to do.
ninja: no work to do.
ninja: no work to do.
ninja: no work to do.
2. Deletes apparently dead calls to `generate_code`. Now that CMake builds these files,
it appears that it is getting called twice and the second version is never used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14972
Reviewed By: soumith
Differential Revision: D13396330
Pulled By: zdevito
fbshipit-source-id: 83c45143bbc6a6d2c1cfee929291ec059f2b5dc3
Summary:
This has 4 changes
1) propagate USE_SYSTEM_NCCL. Previously it was ignored and cmake always did a FindPackage
2) respect SCCACHE_DISABLE in our caffe2 sccache wrapper for circleci
3) use SCCACHE_DISABLE when building nccl, because it triggers the same bug as when using CCACHE (already tracked in https://github.com/pytorch/pytorch/issues/13362). This was hidden because we weren't respecting USE_SYSTEM_NCCL, and were never building nccl ourselves in CI
4) In one particular CI configuration (caffe2, cuda 8, cudnn 7), force USE_SYSTEM_NCCL=1. Building the bundled nccl triggers a bug in nvlink. I've done some investigation, but this looks like a tricky, preexisting bug, so rather than hold up this diff I'm tracking it separately in https://github.com/pytorch/pytorch/issues/14486
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14195
Differential Revision: D13237502
Pulled By: anderspapitto
fbshipit-source-id: 1100ac1269c7cd39e2e0b3ba12a56a3ce8977c55
Summary:
export - print a method with python_print
import - import a method with import_method
We want to ensure:
export(g) == export(import(export(g)))
That is after after exporting/importing once, the graph will stay exactly
the same. This is less strict that g == import(export(g)) which would
require us to maintain a lot more information about the structure of the
IR and about the names of debug symbols.
This PR addresses this with the following fixes:
* print out double-precision numbers with high enough precision such
that they always parse in the same way
* when creating loop-carried dependencies, sort them
by variable name, ensuring a consistent order
* parse nan correctly
* DCE: remove unused outputs of if statements, and loop-carried dependencies
in loops that are dead both after the loop and inside the body of the
loop.
* Do not set uniqueName for variables whose names are _[0-9]+, these
are probably rare in user code, and we need a way to communicate
that we do not care about a variable name when re-parsing the graph.
Otherwise temporary variable names will jitter around.
* Expand the definition of a constant in printing code to None,
and family.
* Allow re-treeing to work as long as the only thing in its way is a
constant node. These do not have side effects but are sometimes
inserted in a different order when tracing compared to how we print them.
* Print all constant nodes out first in the order in which they are used_val
(or, if they are inlined, ensure they get assigned CONSTANT.cX number
in a consistent order). Cleanup tuples (this is done in the compiler,
but not in the tracer, leading to some tuple indexing jitter if not
done).
* use strtod_l, not std::stod which can throw exceptions
Other:
* Add REL_WITH_DEB_INFO to setup.py. It already existed for the
cmake files. Threading it into setup.py allows us to turn on
debug symbols with optimization everywhere.
* enable round trip testing for all generated graphs. This only adds
~6 seconds to total build time but tests printing for every graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14064
Differential Revision: D13094637
Pulled By: zdevito
fbshipit-source-id: 0a1c6912194d965f15d6b0c6cf838ccc551f161d
Summary:
This is the next minimal step towards moving _C into cmake. For now,
leave _C in setup.py, but reduce it to an empty stub file. All of its
sources are now part of the new torch-python cmake target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12742
Reviewed By: soumith
Differential Revision: D13089691
Pulled By: anderspapitto
fbshipit-source-id: 1c746fda33cfebb26e02a7f0781fefa8b0d86385
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13838
According to Sebastian, the detail convention is specifically for header-private
functionality. That's not what c10/detail is; it's general, library private headers
which may be used in multiple places within PyTorch. Rename it to impl to avoid
the confusion in nomenclature.
Reviewed By: smessmer
Differential Revision: D13024368
fbshipit-source-id: 050f2632d83a69e3ae53ded88e8f938c5d61f0ef
Summary:
The python lib path on Windows was set to an incorrect path. This fixes it to be consistent with Linux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13848
Differential Revision: D13030945
Pulled By: soumith
fbshipit-source-id: 7fb9013ffe66cff98018aea25fdb5cda03cbceb1
Summary:
1) Use the hip-thrust version of Thrust as opposed to the GH master. (ROCm 267)
2) CentOS 7.5 docker (ROCm 279)
* Always install the libraries at docker creation for ubuntu.
* Add Dockerfile for CentOS ROCm
* Enable the centos build
* Source devtoolset in bashrc
* Set locales correctly depending on whether we are on Ubuntu or CentOS
* Install a newer cmake for CentOS
* Checkout thrust as there is no package for CentOS yet.
PyTorch/Caffe2 on ROCm passed tests: https://github.com/ROCmSoftwarePlatform/pytorch/pull/280
For attention: bddppq ezyang
Docker rebuild for Ubuntu not urgent (getting rid of Thrust checkout and package install is mainly cosmetic). If docker for CentOS 7.5 is wanted, build is necessary. Build of PyTorch tested by me in CentOS docker. PyTorch unit tests work mostly, however, a test in test_jit causes a python recursion error that seems to be due to the python2 on CentOS as we haven't ever seen this on Ubuntu - hence please do not enable unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12899
Differential Revision: D13029424
Pulled By: bddppq
fbshipit-source-id: 1ca8f4337ec6a603f2742fc81046d5b8f8717c76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13342
This PR introduces a few new concepts:
- DeviceGuardImplInterface, and implementations for CPU and CUDA, which
provide a generic interface for interfacing with device and stream state,
without requiring a direct dependency on the code in question.
- InlineDeviceGuard, a general template for generating both specialized
and dynamically dispatched device guard implementations. Dynamic
dispatch is done by specializing it on a VirtualGuardImpl.
- Provide a device-independent DeviceGuard class, which can be used even
from CPU code. It uses the aforementioned dynamic dispatch.
- CUDA-specialized CUDAGuard class, which doesn't have a dynamic dispatch
but can only be used from CUDA.
- StreamGuard, which is the same as above, but for streams rather than
devices.
- Optional variants of all the aforementioned guards, which are a no-op if
no device/stream is specified
- CUDAMultiStreamGuard, specifically for the case when we want to set
a device on every guard.
There are some subtle semantic changes, which have been thoroughly documented
in the class definition.
BC-breaking changes:
- Move constructor/assignment have been removed from all device guard
implementations.
- In some cases where you previously wrote 'set_device' (or 'set_stream'), you now must write
'reset_device', because if you switch devices/device types, the stream/device on the
previous device is unset. This is different from previous behavior.
- CUDAGuard no longer handles streams, or multiple streams. Use CUDAStreamGuard
or CUDAMultiStreamGuard as appropriate for your use case.
Reviewed By: dzhulgakov
Differential Revision: D12849620
fbshipit-source-id: f61956256f0b12be754b3234fcc73c2abc1be04e
Summary:
We now have submodules that have submodules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13769
Reviewed By: soumith
Differential Revision: D13000203
Pulled By: SsnL
fbshipit-source-id: 63c0c19c6c9d25ae3bf255a2421a82ca68278866
Summary:
MKLDNN is not supported on ppc64le change USE_MKLDNN to OFF for ppc64le
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13759
Differential Revision: D12993121
Pulled By: soumith
fbshipit-source-id: 539d5cfcff2c03b59fa71e10b52fac333a64c381
Summary:
- fixes weights-contiguous requirement for THCUNN Convolutions
- Add tests that conv backward pass works for non-contiguous weights
- fix RNN tests / error messages to be consistent and pass
- relax weight grad precision for fp16 for a particular test
- fix regression of CMAKE_PREFIX_PATH not passing through
- add missing skipIfNoLapack annotations where needed
Differential Revision: D12918456
Pulled By: soumith
fbshipit-source-id: 8642d36bffcc6f2957800d6afa1e10bef2a91d05
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13217
Caffe2 proto headers are not included in pytorch package data (https://github.com/pytorch/pytorch/blob/master/setup.py#L1180). However, they are required for building custom Caffe2 ops living outside PyTorch/Caffe2 repo (e.g. custom Detectron ops).
Reviewed By: pjh5
Differential Revision: D12815881
fbshipit-source-id: 4d1aaa6a69a2193247586e85e4244fbbdb3e8192
Summary:
libcaffe2.so depends on libgloo.a for the ops in caffe2/contrib/gloo.
Symbols in libgloo.a that are not used are ignored and don't end up in
libcaffe2.so. libc10d.a depends on the caffe2 target, which in turn
depends on the gloo target, and it expects all libgloo.a symbols to be
part of libcaffe2.so. Symbols from libgloo.a that are not used in
libcaffe2.so remain undefined in libc10d.a.
To fix this, we link to libgloo.a when linking _C.so, such that any
gloo symbols in libc10d.a are resolved when linking _C.so.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13462
Differential Revision: D12892830
Pulled By: pietern
fbshipit-source-id: 7560b3899b62f76081b394498480e513a84cefab
Summary:
always build nccl from within the main cmake build, rather than via a separate invocation in build_pytorch_libs.sh. Use the existing caffe2 codepaths
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13150
Differential Revision: D12815674
Pulled By: anderspapitto
fbshipit-source-id: a710b6f242d159b9816911a25ee2c4b8c3f855aa
Summary:
We want to move _C into the same cmake invocation that builds
libcaffe2 and libtorch. However, _C depends on THD and c10d, which in
turn depend on libcaffe2. That means that we can't move _C into that
cmake file unless we do these two first. This change does so.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12775
Differential Revision: D10457374
Pulled By: anderspapitto
fbshipit-source-id: 2c1aa3b8a418a73d2112e93c7da53a2e70cf7bba
Summary:
rather than pass a list through a text file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12951
Differential Revision: D10528309
Pulled By: anderspapitto
fbshipit-source-id: d94befcd61b6304815859694b623046f256462df
Summary:
This is used to patch our cmake cuda scripts - should be in the installation script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13013
Reviewed By: ir413
Differential Revision: D10519104
Pulled By: Yangqing
fbshipit-source-id: 542049224ea41068f32d4c0f6399c7e8b684f764
Summary:
This PR implements a DataLoader API for the C++ frontend.
The components present in this API largely match the Python API. It consists of:
- `Dataset`s: Conceptually a function from a set of indices to a batch of examples;
- `Transform`s: A functional transformation of a dataset. A `Map<D, T>` for Dataset `D` and transform `T` is itself a dataset;
- `Sampler`s: Specify a strategy for generating indices for a new batch;
- A `DataLoader`, with the ability to automatically parallelize fetching of samples across multiple worker threads;
Note that collation functions fall naturally out of the `Map<Dataset, Transform>` abstraction.
Things that are missing right now that maybe should be added:
- Memory pinning for CUDA tensors
The API was designed to be generalizable to almost any kind of dataset, transform or sampling strategy, while providing a convenient API out of the box. To achieve this, it is quite heavily templatized on various possible input types.
There are many parts to this PR! Right now, I would like feedback on:
- Your impression of the general usability of the API;
- Your impression of which parts seem too complex or overthought;
- The implementation of the parallelization aspects of the DataLoader. I've followed the Python implementation in some matters, but also differ in others. I think my implementation is a little cleaner and decouples components slightly better than the Python dataloader.
I haven't added too many comments yet, as this is fresh out of the oven. Let me know if anything is unclear from the code itself.
There also aren't any tests yet. I will write a comprehensive test suite once we agree on the API and implementation.
apaszke ezyang The controller you requested could not be found. pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11918
Reviewed By: ezyang
Differential Revision: D9998881
Pulled By: goldsborough
fbshipit-source-id: 22cf357b63692bea42ddb1cc2abc71dae5030aea
Summary:
There will be a link error when the caffe2 doesn't use its protobuf under third_party. The pytorch will always link that protobuf. The pytorch doesn't use the protobuf directly. We could remove it from
the list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12451
Differential Revision: D10262676
Pulled By: ezyang
fbshipit-source-id: c2ff3fdf757fc21ed689e7f663c082064b1a0bca
Summary:
There are still a few work to be done:
- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h
This is mainly a codemod and not causing functional changes. If you find your job failing and trace back to this diff, usually it can be fixed by the following approaches:
(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. Especially, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.
Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354
Reviewed By: orionr
Differential Revision: D10238910
Pulled By: Yangqing
fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
Summary:
Properly set cmake python_library and include_dirs hints, so that systems with multiple version of python can still find the correct libraries and header files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12569
Differential Revision: D10359910
Pulled By: soumith
fbshipit-source-id: 2238dcbed7aac8a818c9435e6bba46cda5f81cad
Summary:
- Removed the old nccl file
- Make open-source NCCL a submodule
- CMake to make NCCL itself
NCCL2 now is in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12359
Reviewed By: orionr, yns88
Differential Revision: D10219665
Pulled By: teng-li
fbshipit-source-id: 134ff47057512ba617b48bf390c1c816fff3f881
Summary:
Previously, we were only enabling Flush-To-Zero (FTZ) and
Denormals-Are-Zero (DAZ) when compiling with SSE3 enabled. After,
Christian's patch (https://github.com/pytorch/pytorch/pull/12109) we
won't be compiling core files with SSE3 or SSE4 enabled, to better
support older AMD processors.
This moves the FTZ and DAZ code behind a runtime CPU check in
preparation for that change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12386
Differential Revision: D10222237
Pulled By: colesbury
fbshipit-source-id: 7ffe32561ab965e1e5f9eb6e679602bbf4775538
Summary:
- Removed the old nccl file
- Make open-source NCCL a submodule
- CMake to make NCCL itself
NCCL2 now is in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12312
Differential Revision: D10190845
Pulled By: teng-li
fbshipit-source-id: 08d42253b774149a66919d194f88b34628c39bae
Summary:
This variable is already being used so this just serves to document that. I think it's an important variable, too, so it should definitely be documented there somewhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12265
Differential Revision: D10162261
Pulled By: soumith
fbshipit-source-id: e0d01e012c2fedea63372de9967a8eaa3745fe94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11970
Adds an ATen-core-headers target, which caffe2_cpu_internal depends
on, and makes ATen-core depend on caffe2_headers. If you link against
ATen-core, you must ALSO link against caffe2_cpu_internal; if you
link against caffe2_cpu_internal, you must ALSO link against ATen-core,
otherwise you'll have undefined symbols.
Then, we merge template data<T>() method with Caffe2 implementation,
demonstrating that includes to Caffe2 (core) from ATen/core are working
Reviewed By: jerryzh168
Differential Revision: D9967509
fbshipit-source-id: 3d220c38b2c3c646f8ff2884fdcc889fa9276c7a
Summary:
This does 6 things:
- add c10/util/Registry.h as the unified registry util
- cleaned up some APIs such as export condition
- fully remove aten/core/registry.h
- fully remove caffe2/core/registry.h
- remove a bogus aten/registry.h
- unifying all macros
- set up registry testing in c10
Also, an important note that we used to mark the templated Registry class as EXPORT - this should not happen, because one should almost never export a template class. This PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077
Reviewed By: ezyang
Differential Revision: D10050771
Pulled By: Yangqing
fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf
Summary:
This unifies our versions across setup.py, libtorch, and libcaffe2. CMake has a default version (bumped to 1.0.0) that can be overridden by setup.py. The versions are also printed as a part of cmake/Summary.cmake to make sure they are correct.
cc Yangqing ezyang soumith goldsborough pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12053
Differential Revision: D10041878
Pulled By: orionr
fbshipit-source-id: a98a01771f6c008d1016ab63ab785c3a88c3ddb0
Summary:
Currently the C++ API and C++ extensions are effectively two different, entirely orthogonal code paths. This PR unifies the C++ API with the C++ extension API by adding an element of Python binding support to the C++ API. This means the `torch/torch.h` included by C++ extensions, which currently routes to `torch/csrc/torch.h`, can now be rerouted to `torch/csrc/api/include/torch/torch.h` -- i.e. the main C++ API header. This header then includes Python binding support conditioned on a define (`TORCH_WITH_PYTHON_BINDINGS`), *which is only passed when building a C++ extension*.
Currently stacked on top of https://github.com/pytorch/pytorch/pull/11498
Why is this useful?
1. One less codepath. In particular, there has been trouble again and again due to the two `torch/torch.h` header files and ambiguity when both ended up in the include path. This is now fixed.
2. I have found that it is quite common to want to bind a C++ API module back into Python. This could be for simple experimentation, or to have your training loop in Python but your models in C++. This PR makes this easier by adding pybind11 support to the C++ API.
3. The C++ extension API simply becomes richer by gaining access to the C++ API headers.
soumith ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11510
Reviewed By: ezyang
Differential Revision: D9998835
Pulled By: goldsborough
fbshipit-source-id: 7a94b44a9d7e0377b7f1cfc99ba2060874d51535
Summary:
Python never closes shared library it `dlopen`s. This means that calling `load` or `load_inline` (i.e. building a JIT C++ extension) with the same C++ extension name twice in the same Python process will never re-load the library, even if the compiled source code and the underlying shared library have changed. The only way to circumvent this is to create a new library and load it under a new module name.
I fix this, of course, by introducing a layer of indirection. Loading a JIT C++ extension now goes through an `ExtensionVersioner`, which hashes the contents of the source files as well as build flags, and if this hash changed, bumps an internal version stored for each module name. A bump in the version will result in the ninja file being edited and a new shared library and effectively a new C++ extension to be compiled. For this the version name is appended as `_v<version>` to the extension name for all versions greater zero.
One caveat is that if you were to update your code many times and always re-load it in the same process, you may end up with quite a lot of shared library objects in your extension's folder under `/tmp`. I imagine this isn't too bad, since extensions are typically small and there isn't really a good way for us to garbage collect old libraries, since we don't know what still has handles to them.
Fixes https://github.com/pytorch/pytorch/issues/11398 CC The controller you requested could not be found.
ezyang gchanan soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11725
Differential Revision: D9948244
Pulled By: goldsborough
fbshipit-source-id: 695bbdc1f1597c5e4306a45cd8ba46f15c941383
Summary:
The PR aims to resolve issues related to BUILD_PYTHON and BUILD_TEST after FULL_CAFFE2 is removed on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11385
Reviewed By: orionr
Differential Revision: D9884906
Pulled By: mingzhe09088
fbshipit-source-id: fc114c0cbff6223f1ec261161e4caecc1fef5dd6
Summary:
I'm just doing the honors and bumping the version to 1.0.0.
1.0 preview and RC releases will have the 1.0.0.dev{date} tag
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11717
Reviewed By: SsnL
Differential Revision: D9840857
Pulled By: soumith
fbshipit-source-id: 4c9c2e01dccb3c521dab26c49e1569d970a87ace
Summary:
This way it shows up in all current and future setup.py commands, as otherwise we'd have to override every once to have them all call copy_protos. This is needed because the nightly packages still do not include caffe2_pb2, because setup.py bdist does not go through setup.py install or setup.py develop
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11726
Reviewed By: orionr
Differential Revision: D9844075
Pulled By: pjh5
fbshipit-source-id: 57b469e48010aacd0c08c214ba8a7e5d757feefa
Summary:
Currently, because of some setup.py logic, `ninja` caching of the `generate_code.py` build step was broken. This resulted in `generate_code.py` running every single time builds were happening, regardless of whether inputs changed.
This updated logic fixes the input caching
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11644
Reviewed By: orionr
Differential Revision: D9814348
Pulled By: soumith
fbshipit-source-id: 2012960908d0f600488d410094095cfd72adc34f
Summary:
This makes torch.distributed works for CPU only build.
Also added one more CI test case to cover MPI CPU build.
All CI tests should cover this change
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11513
Differential Revision: D9784546
Pulled By: teng-li
fbshipit-source-id: 0976a6b0fd199670926f0273e17ad7d2805e42e7
Summary:
This whitelists train/eval functions in script modules, and tests that nested nn.Modules still work.
This also changes the code for calling python functions from script to allow non-tensor inputs/outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11505
Differential Revision: D9765466
Pulled By: zdevito
fbshipit-source-id: 1177bff931324422b69e18fa0bbaa82e3c98ec69
Summary:
This speeds up incremental builds by doing the following changes:
- Uses `rsync` instead of `cp` (when `rsync` is found) which is a bit smarter in doing "maybe copy"
- Introduces a `rebuild` mode which does not rerun `cmake` in `build_pytorch_libs.sh`.
*Note: `rebuild` should only be used if you dont add / remove files to the build, as `cmake` is not rerun*
Current no-op rebuild speedup:
- 1m 15s -> 20s
There are some lingering bugs. No-op rebuilds rerun `cmake` for two rebuilds (likely that cmake logic is dependent on the install folder, hence kicking off rebuild).
So what you see
```
python setup.py rebuild develop # first time - ~5 mins
python setup.py rebuild develop # second time - ~3 mins
python setup.py rebuild develop # third time - ~2 mins
python setup.py rebuild develop # fourth time - ~20 seconds
python setup.py rebuild develop # fifth time - ~20 seconds
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11487
Differential Revision: D9769087
Pulled By: soumith
fbshipit-source-id: 20fbecde33af6426149c13767e8734fb3be783c5
Summary:
Add flags for LMDB and LevelDB, default `OFF`. These can be enabled with
```
USE_LMDB=1 USE_LEVELDB=1 python setup.py build_deps
```
Also add a flag to build Caffe2 ops, which is default `ON`. Disable with
```
NO_CAFFE2_OPS=1 python setup.py build_deps
```
cc Yangqing soumith pjh5 mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11462
Reviewed By: soumith
Differential Revision: D9758156
Pulled By: orionr
fbshipit-source-id: 95fd206d72fdf44df54fc5d0aeab598bff900c63
Summary:
This PR is stacked on https://github.com/pytorch/pytorch/pull/10610, and only adds changes in one file `.jenkins/pytorch/test.sh`, where we now build the custom op tests and run them.
I'd also like to take this PR to discuss whether the [`TorchConfig.cmake`](https://github.com/pytorch/pytorch/blob/master/cmake/TorchConfig.cmake.in) I made is robust enough (we will also see in the CI) orionr Yangqing dzhulgakov what do you think?
Also ezyang for CI changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10611
Differential Revision: D9597627
Pulled By: goldsborough
fbshipit-source-id: f5af8164c076894f448cef7e5b356a6b3159f8b3
Summary:
Continuing pjh5's work to remove FULL_CAFFE2 flag completely.
With these changes you'll be able to also do something like
```
NO_TEST=1 python setup.py build_deps
```
and this will skip building tests in caffe2, aten, and c10d. By default the tests are built.
cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11321
Reviewed By: mingzhe09088
Differential Revision: D9694950
Pulled By: orionr
fbshipit-source-id: ff5c4937a23d1a263378a196a5eda0cba98af0a8
Summary:
The next function I'm moving to C++ is `sync_params`. It is stacked on top of https://github.com/pytorch/pytorch/pull/9729, so some changes will go away when it lands and I rebase.
I also split code into a `.h` and `.cpp` file for better code organization.
The controller you requested could not be found. pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9805
Differential Revision: D9688604
Pulled By: goldsborough
fbshipit-source-id: 4467104d3f9e2354425503b9e4edbd59603e20a8
Summary:
* purge hcSPARSE now that rocSPARSE is available
* integrate a custom hcc and HIP
* hcc brings two important compiler fixes (fixes hundreds of unit tests)
* HIP brings a smart dispatcher that allows us to avoid a lot of static_casts (we haven't yet removed the automatic static_casts but this catches some occurrences the script did not catch)
* mark 5 unit tests skipping that have regressed w/ the new hcc (we don't know yet what is at fault)
* optimize bitonic sort - the comparator is always an empty struct - therefore passing it by value saves at least 3 bytes. It also removes an ambiguity around passing references to `__global__` functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11198
Differential Revision: D9652340
Pulled By: ezyang
fbshipit-source-id: f5af1d891189da820e3d13b7bed91a7a43154690
Summary:
Now that we're building everything together, making all distributed flags conditional of USE_DISTRIBUTED being set.
cc pietern The controller you requested could not be found. cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11221
Reviewed By: Yangqing
Differential Revision: D9664267
Pulled By: orionr
fbshipit-source-id: a296cda5746ad150028c97160f8beacba955ff73
Summary:
- In Python 2, use of `/` (regardless of int/float/Tensor) causes a compiler error if
`from __future__ import division` is not imported in the file.
- The / operator is universally set to do "true" division for integers
- Added a `prim::FloorDiv` operator because it is used in loop unrolling.
The error if users use '/' in python 2 without importing from __future__
occurs when building the JIT AST.
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11016
Differential Revision: D9613527
Pulled By: zou3519
fbshipit-source-id: 0cebf44d5b8c92e203167733692ad33c4ec9dac6
Summary:
Will use USE_DISTRIBUTED for both c10d and THD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11237
Differential Revision: D9647825
Pulled By: teng-li
fbshipit-source-id: 06e0ec9b5e2f8f38780fc88718f8499463e9e969
Summary:
* improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893
Differential Revision: D9615053
Pulled By: ezyang
fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
Summary:
Added MPI group support.
And this will make all previous group test cases of MPI passed.
Also, release the MPI thread level support by serializing different PG's MPI ops. This is required.
The build is fixed too
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11128
Differential Revision: D9602188
Pulled By: teng-li
fbshipit-source-id: 1d618925ae5fb7b47259b23051cc181535aa7497
Summary:
We no longer use nanopb in PyTorch (or Caffe2) so removing. All protobuf manipulation should go through standard protobuf, which is statically linked inside libcaffe2.so by default.
cc zdevito pjh5 ezyang Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10772
Reviewed By: pjh5
Differential Revision: D9465894
Pulled By: orionr
fbshipit-source-id: 8cdf9f1d3953b7a48478d381814d7107df447201
Summary:
In prep for making FULL_CAFFE2 default, users shouldn't be required to have protobuf installed.
cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10771
Reviewed By: pjh5
Differential Revision: D9474458
Pulled By: orionr
fbshipit-source-id: 3e28f5ce64d125a0a0418ce083f9ec73aec62492
Summary:
* first integration of MIOpen for batch norm and conv on ROCm
* workaround a ROCm compiler bug exposed by elementwise_kernel through explicit capture of variables in the densest packing
* workaround a ROCm compiler bug exposed by having `extern "C" __host__` as a definition and just `__host__` in the implementation through the hipify script
* use fabs() in accordance with C++11 for double absolute, not ::abs() which is integer-only on ROCm
* enable test_sparse set on CI, skip tests that don't work currently on ROCm
* enable more tests in test_optim after the elementwise_bug got fixed
* enable more tests in test_dataloader
* improvements to hipification and ROCm build
With this, resnet18 on CIFAR data trains without hang or crash in our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10612
Reviewed By: bddppq
Differential Revision: D9423872
Pulled By: ezyang
fbshipit-source-id: 22c0c985217d65c593f35762b3eb16969ad96bdd
Summary:
This is the last step in the custom operator implementation: providing a way to build from C++ and Python. For this I:
1. Created a `FindTorch.cmake` taken largely from ebetica with a CMake function to easily create simple custom op libraries
2. Created a ` torch/op.h` header for easy inclusion of necessary headers,
3. Created a test directory `pytorch/test/custom_operator` which includes the basic setup for a custom op.
1. It defines an op in `op.{h,cpp}`
2. Registers it with the JIT using `RegisterOperators`
3. Builds it into a shared library via a `CMakeLists.txt`
4. Binds it into Python using a `setup.py`. This step makes use of our C++ extension setup that we already have. No work, yey!
The pure C++ and the Python builds are separate and not coupled in any way.
zdevito soumith dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10226
Differential Revision: D9296839
Pulled By: goldsborough
fbshipit-source-id: 32f74cafb6e3d86cada8dfca8136d0dfb1f197a0
Summary:
delete build_caffe2.sh, replace with build_libtorch.py as suggested by peter (and copy-pasted from his draft PR). This ensures that all consumers of the torch CMake file go through as unified a path as possible.
In order to change the surrounding infrastructure as little as possible, I made some tweaks to enable build_pytorch_libs.sh to generate the test binaries relative to the current directory, rather than hardcoding to pytorch/build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10508
Differential Revision: D9354398
Pulled By: anderspapitto
fbshipit-source-id: 05b03df087935f88fca7ccefc676af477ad2d1e9
Summary:
In my environment, it looks like setup.py hangs when running
```
FULL_CAFFE2=1 python setup.py build_deps
```
Removing this fixes things, but we might also want to look at `tests_require`, which came over from `setup_caffe2.py`.
cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10530
Differential Revision: D9349597
Pulled By: orionr
fbshipit-source-id: 589145eca507dfaf16386884ee2fbe60299660b4
Summary:
It just calls into `ninja install`. For iterative work on
libtorch.so/_C.so,
`python setup.py rebuild_libtorch develop` should provide quick iteration
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10036
Differential Revision: D9317869
Pulled By: anderspapitto
fbshipit-source-id: 45ea45a1b445821add2fb9d823a724fc319ebdd2
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406
Reviewed By: Jorghi12
Differential Revision: D9277093
Pulled By: ezyang
fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
Summary:
```
This removes PyObjectFinalizer. We were seeing SIGSEGV at exit in some
programs that use multiprocessing. The backtrace pointed to
StorageRef.__del__ being called from subtype_dealloc. My guess is that
the Python interpreter was shutdown before all C++ Storage objects were
deallocated. Deallocating the C++ Storage called the finalizer which
called back into Python after it was no longer safe to do so.
This avoids a callback from C++ into Python during Storage finalization.
Instead, dead Storage objects (expired weak references) are collected
periodically when shared_cache exceeds a limit. The limit is scaled with
2x the number of live references, which places an upper bound on the
amount of extra memory held by dead Storage objects. In practice, this
should be very small.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10407
Differential Revision: D9272400
Pulled By: colesbury
fbshipit-source-id: ecb14d9c6d54ffc91e134c34a4e770a4d09048a2
Summary:
I am using this to test a CI job to upload pip packages, and so am using the Caffe2 namespace to avoid affecting the existing pytorch packages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9544
Reviewed By: orionr
Differential Revision: D9267111
Pulled By: pjh5
fbshipit-source-id: a68162ed29d2eb9ce353d8435ccb5f16c3b0b894
Summary:
This was used as a convenient way for us to convert c1 models. Now that conversion is more or less done, we should probably require any users who need to convert c1 models to explicitly install c1. This PR removes the explicit c1 proto (which was copied from c1) in favor of explicit installation.
Note that caffe_translator would still work properly, only difference is that now users need to install c1 separately.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10380
Differential Revision: D9267981
Pulled By: Yangqing
fbshipit-source-id: a6ce5d9463e6567976da83f2d08b2c3d94d14390
Summary:
Using Visual Studio Code and Visual Studio, these IDEs store configurations to `FOLDER/.vscode` and `FOLDER/.vs`.
But "setup.py clean" deletes these folders because those are described in `.gitignore` file.
To prevent this, add "BEGIN NOT-CLEAN-FILES" marker to `.gitignore` file and "setup.py clean" ignores lines after this marker.
Discussed in #10206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10233
Differential Revision: D9175515
Pulled By: ezyang
fbshipit-source-id: 24074a7e6e505a3d51382dc5ade5c65c97deda37
Summary:
This PR adds strings to the ast and implements them for print statements. Strings are lifted as attributes to the print node. They must be arguments to print itself, not as an argument for an object that is passed to print. If they are encountered elsewhere a NYI exception will be thrown.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9324
Reviewed By: jramseyer
Differential Revision: D8807128
Pulled By: eellison
fbshipit-source-id: 984401ff458ed18d473c6d1bd86750e56c77d078
Summary:
ATenCore.h is a dummy header to just test that this is working at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10019
Reviewed By: smessmer
Differential Revision: D9067262
Pulled By: ezyang
fbshipit-source-id: 58bab9c0aa83b56335e36b719b9b6505400d8dee
Summary:
* THTensor now stores `sizes_` and `strides_` which is a `std::vector<int64_t>`
* Anywhere a "public" API function made use of a int64_t* of sizes, I opted to just finagle it out of the tensor using THTensor_getSizePtr rather than try to rewrite all of these sites to use ArrayRef. They should use ArrayRef eventually, but not yet.
* There are new utility functions for resizing sizes/strides in one go (THTensor_resizeDim), or replacing sizes and strides with completely new values (THTensor_setSizesAndStrides)
* Anywhere you said `t->size[n] = 0`, we now say `THTensor_setSizeAt(t, n, 0)`, ditto for strides
* Anywhere you said `t->size[n]`, we now say `t->size(n)` (coming soon: ditto for strides)
Previous review of just the `std::vector` change in #9518, but I'm planning to merge this all in one go.
Note for gchanan: review from commit "ci" and after
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9561
Reviewed By: cpuhrsch
Differential Revision: D8901926
Pulled By: ezyang
fbshipit-source-id: 483cf275060ab0a13845cba1ece39dd127142510
Summary:
Prior to this diff, there have been two ways of compiling the bulk of the torch codebase. There was no interaction between them - you had to pick one or the other.
1) with setup.py. This method
- used the setuptools C extension functionality
- worked on all platforms
- did not build test_jit/test_api binaries
- did not include the C++ api
- always included python functionality
- produced _C.so
2) with cpp_build. This method
- used CMake
- did not support Windows or ROCM
- was capable of building the test binaries
- included the C++ api
- did not build the python functionality
- produced libtorch.so
This diff combines the two.
1) cpp_build/CMakeLists.txt has become torch/CMakeLists.txt. This build
- is CMake-based
- works on all platforms
- builds the test binaries
- includes the C++ api
- does not include the python functionality
- produces libtorch.so
2) the setup.py build
- compiles the python functionality
- calls into the CMake build to build libtorch.so
- produces _C.so, which has a dependency on libtorch.so
In terms of code changes, this mostly means extending the cmake build to support the full variety of environments and platforms. There are also a small number of changes related to the fact that there are now two shared objects - in particular, windows requires annotating some symbols with dllimport/dllexport, and doesn't allow exposing thread_local globals directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8792
Reviewed By: ezyang
Differential Revision: D8764181
Pulled By: anderspapitto
fbshipit-source-id: abec43834f739049da25f4583a0794b38eb0a94f
Summary:
Use decorator `torch.jit.batch` to implement auto-batching (call `to_batch` pass to do IR tranformation).
- `to_batch` pass: "to_batch.h/cpp" in csrc/jit/passess to transform a graph to a new batched graph.
- Write several basic operators for BatchTensor (add, mul, sigmoid, tanh, mm, matmul, select).
- Register the operators in a lookup table `<std::string, std::shared_ptr<Graph>>`. (use the Graph to replace the original node in IR graph)
Move BatchTensor in python from torch.BatchTensor to torch.jit.BatchTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9198
Reviewed By: zdevito
Differential Revision: D8744466
Pulled By: ChunliF
fbshipit-source-id: 9ea56a30f55cb870f13a2069a47cc635419763ff
Summary:
This is a series of two commits that should probably be read separately. They are stacked on top of #9018 since the second commit requires it for correctness.
Commit 1
=======
This commit is the first in a series that will clean up how we handle declaring operators and intrinsics in the JIT to make it more modular and readable. This introduces readable declarations that can be used to register operators and switches gen_jit_dispatch to generate this schema. A follow up PR will remove the dispatch keys like "add-3" and resolve ops directly based on the registered schema, further simplifying the generation process.
* Switches schema over to parsed declarations, in the future this will allow something like:
```
registry.register_intrinsic("foo(Tensor a, Tensor b) -> Tensor", [](Stack& stack) {
...
})
```
This will allow the scalable registration of intrinsics for lists, tuples, and other ops, as long as meta-data for these ops (e.g. derivatives and size propagation routines).
The declarations resemble those used by PythonArgParser but have been singificantly cleaned up to minimize the number of types that can appear in the declaration. We should strive to get the other parts of PyTorch switched over to this restricted declaration set when possible, but it is too much to do in a single PR. My hope is that eventually we will use a very similar language to describe declarations in C10, and this can serve as a guide for that.
Parsing is done using the script lexer, so it is very robust to whitespace and extensible for future types.
This removes the other way we encoded schema, and makes it easier to see what schema are registered.
Current generated declarations: https://gist.github.com/zdevito/a96a17766fb3a098d69a91ee00abaaf6
* Switches how we handle attempting to use an integer in the place of a fixed-sized int list, such as in conv (e.g. 'int[3] stride=1'). Now that we can statically distinguish between int and Tensor, we handle the expansion as an implicit conversion in the compiler. This allows us to simplify the interpreter since it no longer needs to handle the conversion itself.
* Schema declarations have been changed so that they match the type system in the IR exactly. In particular, attribute_info which was used by liftConstantAttributes has been dropped and constant attributes are lifted purely based on the type of the input. Type conversions in compiler have been simplified due to this change.
* Error highlighting in ErrorReport now only reports at most 20 lines of code, to make reading where an error occurred easier.
Commit 2
=======
This commit unifies aten_dispatch and aten_schema into a single Operator object that both contains schema and implementation information. In the future we can use this object to also contain functionality like shape prop and autodiff needed by all operators. Operators are registered globally, and dispatch logic uses the schema information to figure out which variant to use. Descriptor keys, a frequent source of inscrutable debug errors, have been removed.
* Introduce Operator, to replace TensorOp. Unlike TensorOp, we use Operator for all op implementations, including primitives that may occur in the graphs. The only exceptions are ops that are only known to the interpreter like jumps, and GraphExecutors where we need to record additional debug info.
* Adds a global registry for Operator implementations. aten_dispatch.cpp turns into register_aten_ops.cpp, which registers all the Operators for aten with the operator registry. register_prim_ops.cpp now contains the implementations for primitive operators that used to be in the interpreter. This means that it is now safe to use `getOperation(node)` to lookup the true interpreter function for the node, which will simplify const-propagation passes.
* Remove addInterpreterOpHandler in favor of global operator registry.
* Instead of descriptors, we match Node arguments directly against FunctionSchema describing expected inputs in `matchSchema`. `matchSchema` knows how parse both attributes and positional inputs from a node and match it to the appropriate registered operator. Debug error messages when we try to run an invalid operator are significantly improved: they now automatically display the schema for the op with the same name that are registered.
* Merge aten_schema into regsiter_aten_ops. Each Operator takes a string schema which is parsed to determine when to dispatch to that op.
* Cleans up gen_jit_dispatch.py now that we do not need to write out descriptors. In particular, skip_scalar_overloads can be removed since Richard's code sorts declarations to put Tensor, Tensor declarations first.
* remove matchSchemaAndLiftConstantAttributes and use emitBuiltinCall instead to remove code duplication
* refactor stack manipulation functions into a separate header file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8885
Reviewed By: jamesr66a
Differential Revision: D8751048
Pulled By: zdevito
fbshipit-source-id: 312aabfbf88307c5f6ab947b6caf691468b94557
Summary:
The underlying use-case is the file descriptor to storage cache in
torch.multiprocessing.reductions. Previously, this was implemented by wrapping
an existing allocator with a "weak ref" allocator which also knew to null out
the weak reference when the storage died. This is terribly oblique, and
prevents us from refactoring the allocators to get rid of per-storage allocator
state.
So instead of going through this fiasco, we instead directly implement weak
pointers and finalizers in THStorage. Weak pointers to THStorage retain the
THStorage struct, but not the data_ptr. When all strong references die,
data_ptr dies and the finalizers get invoked.
There is one major hazard in this patch, which is what happens if you
repeatedly call _weak_ref on a storage. For cleanliness, we no longer
shove our grubby fingers into the finalizer struct to see if there is already
a Python object for the weak reference and return it; we just create a new one
(no one is checking these Python objects for identity). This means if you
keep calling it, we'll keep piling on finalizers. That's bad! But I am
not going to fix it until it is actually a problem for someone, because
then we need to add another caching layer.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9148
Differential Revision: D8729106
Pulled By: ezyang
fbshipit-source-id: 69710ca3b7c7e05069090e1b263f8b6b9f1cf72f
Summary:
Tested on my mac on a pretty clean anaconda3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8509
Reviewed By: orionr
Differential Revision: D8702257
Pulled By: pjh5
fbshipit-source-id: eda03ef9732da9fc56b31d909af5c0e39520d689
Summary:
When we moved the libaten build into libcaffe2, we changed the location where it generated compile_commands.json such that it was no longer being picked up by the build script. This fixes it so it is still found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9227
Reviewed By: goldsborough
Differential Revision: D8757984
Pulled By: zdevito
fbshipit-source-id: 73df26bf08d98f18ac841d6c0db7e332fd328ab6
Summary:
With the Cppzation of a few files in `TH`/`THC`, the CPP extensions got broken whenever the user uses feature from `THC` in their files, when pytorch is installed via `python setup.py install`.
This addresses issues such as
```
/home/me/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/THC/THCDeviceTensorUtils.cuh:5:25: fatal error: THCTensor.hpp: No such file or directory
```
Closes https://github.com/pytorch/pytorch/pull/9182
Reviewed By: soumith
Differential Revision: D8734581
Pulled By: fmassa
fbshipit-source-id: 2a1138f208592eaccb01fcdb805a6b369d7a497a
Summary:
Add BatchTensor class
- construct from data, mask, dims or construct from list of tensors
- can return a list of tensors from an BatchTensor class
next step: do IR level transformation and operators
Closes https://github.com/pytorch/pytorch/pull/8922
Differential Revision: D8668986
Pulled By: ChunliF
fbshipit-source-id: 8b24d2a9f46a3b42dbb397e99e9e059dfb2b326e