Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43317
Previous version was returning the path with a prefix so subsequent `getRecord` would fail.
There's only one place in PyTorch codebase that uses this function (introduced in https://github.com/pytorch/pytorch/pull/29339 ) and it's unlikely that anyone else is using it - it's not a public API anyway.
Test Plan: unittest
Reviewed By: houseroad
Differential Revision: D23235241
fbshipit-source-id: 6f7363e6981623aa96320f5e39c54e65d716240b
Summary:
solves most of gh-38011 in the framework of solving gh-32703.
These should only be formatting fixes, I did not try to fix grammer and syntax.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41068
Differential Revision: D22411919
Pulled By: zou3519
fbshipit-source-id: 25780316b6da2cfb4028ea8a6f649bb18b746440
Summary:
Move Storage class from __init__.pyi.in to types.py and make it a protocol, since this is not a real class
Expose `PyTorchFileReader` and `PyTorchFileWriter` native classes
Ignore function attributes, as there are yet no good way to type annotate those, see https://github.com/python/mypy/issues/2087
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40862
Differential Revision: D22344743
Pulled By: malfet
fbshipit-source-id: 95cdb6f980ee79383960f306223e170c63df3232
Summary:
BC NOTE:
This change makes it so modules saved with torch.jit.save in PyTorch 1.6 can be loaded by previous versions of PyTorch unless they use torch.div or (soon) torch.full. It also lets tensors saved using torch.save be loaded by previous versions. So this is the opposite of BC-breaking, but I'm using that label to highlight this issue since we don't have a "BC-improving" label.
PR NOTE:
When an operator's semantics change in PyTorch we want to do two things:
1) Preserve the semantics of older serialized Torchscript programs that use the operator
2) Ensure the new semantics are respected
Historically, this meant writing a Versioned Symbol that would remap older versions of the operator into current PyTorch code (1), and bumping the produced file format version (2). Unfortunately, bumping the produced file format version is a nuclear option for ensuring semantics are respected, since it also prevents older versions of PyTorch from loading anything (even tensors!) from newer versions.
Dynamic versioning addresses the nuclear consequences of bumping the produced file format version by only bumping it when necessary. That is, when an operator with changed semantics is detected in the serialized Torchscript. This will prevent Torchscript programs that use the changed operator from loading on earlier versions of PyTorch, as desired, but will have no impact on programs that don't use the changed operator.
Note that this change is only applicable when using torch.jit.save and torch.jit.load. torch.save pickles the given object using pickle (by default), which saves a function's Python directly.
No new tests for this behavior are added since the existing tests for versioned division in test_save_load already validate that models with div are loaded correctly at version 4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40279
Reviewed By: dzhulgakov
Differential Revision: D22168291
Pulled By: mruberry
fbshipit-source-id: e71d6380e727e25123c7eedf6d80e5d7f1fe9f95
Summary:
I added the following to the docs:
1. `torch.save`.
1. Added doc for `_use_new_zipfile_serialization` argument.
2. Added a note telling that extension does not matter while saving.
3. Added an example showing the use of above argument along with `pickle_protocol=5`.
2. `torch.split`
1. Added an example showing the use of the function.
3. `torch.squeeze`
1. Added a warning for batch_size=1 case.
4. `torch.set_printoptions`
1. Changed the docs of `sci_mode` argument from
```
sci_mode: Enable (True) or disable (False) scientific notation. If
None (default) is specified, the value is defined by `_Formatter`
```
to
```
sci_mode: Enable (True) or disable (False) scientific notation. If
None (default=False) is specified, the value is defined by
`torch._tensor_str._Formatter`.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39303
Differential Revision: D21904504
Pulled By: zou3519
fbshipit-source-id: 92a324257d09d6bcfa0b410d4578859782b94488
Summary:
Instead of copying to a buffer, then setting a tensor's storage with that buffer, create a storage directly from the file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36362
Pulled By: driazati
Differential Revision: D21889537
fbshipit-source-id: edbd430073c2bbf52332fe7b3b2590e7d936dedf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).
Test Plan: CI
Differential Revision: D20842886
Pulled By: dreiss
fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
Summary:
This PR implements the following linear algebra algorithms for low-rank matrices:
- [x] Approximate `A` as `Q Q^H A` - using Algorithm 4.4 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
+ exposed as `torch.lowrank.get_approximate_basis(A, q, niter=2, M=None) -> Q`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] SVD - using Algorithm 5.1 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
+ uses `torch.lowrank.get_approximate_basis`
+ exposed as `torch.svd_lowrank(A, q=6, niter=2, M=None) -> (U, S, V)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] PCA - using `torch.svd_lowrank`
+ uses `torch.svd_lowrank`
+ exposed as `torch.pca_lowrank(A, center=True, q=None, niter=2) -> (U, S, V)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices, uses non-centered sparse matrix algorithm
+ [x] documentation
- [x] generalized eigenvalue solver using the original LOBPCG algorithm [Knyazev, 2001](https://epubs.siam.org/doi/abs/10.1137/S1064827500366124)
+ exposed as `torch.lobpcg(A, B=None, k=1, method="basic", ...)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] generalized eigenvalue solver using robust LOBPCG with orthogonal basis selection [Stathopoulos, 2002](https://epubs.siam.org/doi/10.1137/S1064827500370883)
+ exposed as `torch.lobpcg(A, B=None, k=1, method="ortho", ...)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] generalized eigenvalue solver using the robust and efficient LOBPCG Algorithm 8 from [Duersch et al, 2018](https://epubs.siam.org/doi/abs/10.1137/17M1129830) that switches to orthogonal basis selection automatically
+ the "ortho" method improves iterations so rapidly that in the current test cases it does not make sense to use the basic iterations at all. If users will have matrices for which basic iterations could improve convergence then the `tracker` argument allows breaking the iteration process at user choice so that the user can switch to the orthogonal basis selection if needed. In conclusion, there is no need to implement Algorithm 8 at this point.
- [x] benchmarks
+ [x] `torch.svd` vs `torch.svd_lowrank`, see notebook [Low-rank SVD](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/Low-rank%20SVD.ipynb). In conclusion, the low-rank SVD is going to be useful only for large sparse matrices where the full-rank SVD will fail due to memory limitations.
+ [x] `torch.lobpcg` vs `scipy.sparse.linalg.lobpcg`, see notebook [LOBPCG - pytorch vs scipy](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/LOBPCG%20-%20pytorch%20vs%20scipy.ipynb). In conculsion, both implementations give the same results (up to numerical errors from different methods), scipy lobpcg implementation is generally faster.
+ [x] On very small tolerance cases, `torch.lobpcg` is more robust than `scipy.sparse.linalg.lobpcg` (see `test_lobpcg_scipy` results)
Resolves https://github.com/pytorch/pytorch/issues/8049.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29488
Differential Revision: D20193196
Pulled By: vincentqb
fbshipit-source-id: 78a4879912424595e6ea95a95e483a37487a907e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32289
This has been fixed upstream as of Python 3.8.2. I think the easiest and least invasive way to ameliorate this is to catch the error condition and print a more informative error asking the user to update their Python version. It might be possible to buffer the data into memory and then read from memory, but that would be an invasive change and might cause memory exhaustion for very large models.
Suggestions for alternate fixes or ways to improve the error message wording are very welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33824
Differential Revision: D20131722
Pulled By: ezyang
fbshipit-source-id: a6e3fbf4bf7f9dcce5772b36f7a622cbf14b5ae4
Summary:
Stacked PRs
* #32958 - Make zip serialization the default
* **#32244 - Fix some bugs with zipfile serialization**
It includes the following changes:
* Split up tests so that we can test both serialization methods
* Loading something within a buffer doesn't work anymore, so those tests are only on the old serialization method (it's possible but introduces a big slowdown since it requires a linear scan of the entire zipfile to find the magic number at the end)
* Call `readinto` on a buffer if possible instead of `read` + a copy
* Disable CRC-32 checks on read (there was some issue where miniz said the CRC was wrong but `zipinfo` and `unzip` said the zip file was fine)
](https://our.intern.facebook.com/intern/diff/19418935/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32244
Pulled By: driazati
Reviewed By: eellison
Differential Revision: D19418935
fbshipit-source-id: df140854f52ecd04236225417d625374fd99f573
Summary:
In the long string, formalstring thinks it is good to have a name.
When using dict, literal is better for readability and faster than dict constructor.
I always appreciate your efforts in creating the world's best frameworks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31352
Differential Revision: D19191967
Pulled By: ngimel
fbshipit-source-id: 21f063b163b67de8cf9761a4db5991f74318e991
Summary:
This PR updates `torch::pickle_save` to use the new zipfile format introduced in #29232 and adds `torch::pickle_load` which can decode the zipfile format. Now that `torch.save/load` use this format as well (if the `_use_new_zipfile_serialization` flag is `True`), raw values saved in Python can be loaded in C++ and vice versa.
Fixes#20356
](https://our.intern.facebook.com/intern/diff/18607087/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30108
Pulled By: driazati
Differential Revision: D18607087
fbshipit-source-id: 067cdd5b1cf9c30ddc7e2e5021a8cceee62d8a14
Summary:
This PR looks for a `constants.pkl` file at the top level in a zip file
in `torch.load`. If found, it calls `torch.jit.load` instead and issues
a warning to call `torch.jit.load` directly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29339
Differential Revision: D18611095
Pulled By: driazati
fbshipit-source-id: f070a02f6b5509054fc3876b3e8356bbbcc183e1
Summary:
Stacked PRs
* https://github.com/pytorch/pytorch/issues/29244 - Use custom CRC
* **https://github.com/pytorch/pytorch/issues/29232 - Add zipfile serialization**
This adds a serialization method that uses a zipfile (https://github.com/pytorch/pytorch/issues/26567). Right now it is
guarded behind a flag `_use_new_zipfile_serialization`. In release mode it seems to have performance about the same / slightly better than the current serialization in some simple benchmarks for large/small tensors.
Follow ups:
* Flip the `_use_new_zipfile_serialization` flag
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29232
Differential Revision: D18332036
Pulled By: driazati
fbshipit-source-id: 1bac0847c4d599612cba905f2cac8248783be2f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28965
Fixed the reference to correct object
Test Plan:
Added new unit test test_serialization_save_warnings in test_torch
Verified by running the test_torch tests
Imported from OSS
Differential Revision: D18306797
fbshipit-source-id: bbdc7a1aa59a395fcbb736bcc7c3f96db45454d3
Summary:
Default encoding when using torch.load to 'utf-8'
This commit provides changes for cases where user tries to torch.load
a pickled module with non-ASCII characters in the docstring as
discussed in https://github.com/pytorch/pytorch/issues/21743. The default encoding was changed from 'ascii'
to 'utf-8'. Documentation for `torch.load` was updated and two tests
(loading py2 unicode module with unicode in it; error throwing when
user explicitly sets wrong encoding) were written.
~~This commit provides changes for better error handling in cases
where user tries to `torch.load` a pickled module with non-ASCII
characters in the docstring as discussed in https://github.com/pytorch/pytorch/issues/21743.~~
Ping ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26421
Differential Revision: D17581633
Pulled By: yf225
fbshipit-source-id: f8e77dcf7907092771149aad8ede6cfb73c21620
Summary:
If source code is not available due to packaging (e.g. sources are compiled to .pyc), TorchScript produces very obscure error message. This tries to make it nicer and allow to customize message by overriding _utils_internal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25415
Test Plan: Really hard to unittest properly. Did one off testing by compiling to .pyc and checking the message.
Differential Revision: D17118238
Pulled By: dzhulgakov
fbshipit-source-id: 3cbfee0abddc8613000680548bfe0b8ed52a36b0
Summary:
This addresses #18436
The logic replicates the essence of closing file descriptors in numpy:
bf20e30340/numpy/core/include/numpy/npy_3kcompat.h (L278)
This stores the position of the file descriptor before resetting it to the Python handle offset, then resets to the original position before exit. The Python-side handle is then updated to reflect the new position. Also added somewhat more demanding tests to cover this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20270
Differential Revision: D15275902
Pulled By: soumith
fbshipit-source-id: 5ca8a52b61c7718d2e69571f72f80b1350b0acdb
Summary:
This will allow pathlib.Path object to the torch.load as an input argument.
Fixes#16607
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18562
Differential Revision: D14668255
Pulled By: soumith
fbshipit-source-id: 0ae4f7c210918582912f2d1ef2a98f1ab288c540
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17297
When `torch.load` needs to load a tensor, no matter which device it will be end up being loaded on, it first creates a CPU storage for it of the necessary size. This storage is allocated but it's not "set" yet, hence no data is written to it: it exists in the kernel's memory map, but it's not resident and doesn't take up physical pages. Then, this storage is passed to the `map_location` function (if the parameter is a string, a device or a map, PyTorch builds that function automatically). The default map for CUDA consists effectively in `lambda storage, _: storage.cuda()` (I omitted the code needed to pick the correct device). This creates a GPU storage and copies over the data of the CPU storage. *This step is unnecessary as we're copying uninitialized memory*. (Surprisingly enough, though, it appears the kernel is smart enough that reading from the unpaged CPU memory doesn't cause it to become paged.) Once `map_location` returns a storage residing on the correct target device, `torch.load` resumes reading the file and copying the tensor's content over into the storage. This will overwrite the content that had previously been written to it, which confirms that the above copy was pointless.
A way to avoid this useless copy is to just create and return a new empty storage on the target GPU, instead of "transforming" the original one.
This does indeed increase the performance:
```
In [5]: torch.save(torch.rand(100, 100, 100), "/tmp/tensor")
In [6]: %timeit torch.load("/tmp/tensor", map_location="cuda")
1.55 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [7]: %timeit torch.load("/tmp/tensor", map_location=lambda storage, _: torch.cuda.FloatStorage(storage.size()))
1.03 ms ± 44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Credit for this diff is shared with adamlerer and fmassa.
Differential Revision: D14147673
fbshipit-source-id: a58d4bc0d894ca03a008499334fc2cdd4cc91e9f
Summary:
We have:
- This is an initial stab at creating a type stub `torch/__init__.pyi` .
- This is only tested on Python 3, since that's the only Python version mypy
works on.
- So far, we only aim at doing this for torch functions and torch.Tensor.
- Quite a few methods and functions have to be typed manually. These are
done in `torch/__init__.pyi.in`
For me, PyCharm (the non-paid one) didn't seem to indicate errors in the .pyi when opening and seemed to be able to get the type hint for the few functions I tried, but I don't use PyCharm for my usual PyTorch activities, so I didn't extensively try this out.
An example of a generated PYI is at [this gist](https://gist.github.com/ezyang/bf9b6a5fa8827c52152858169bcb61b1).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12500
Differential Revision: D13695553
Pulled By: ezyang
fbshipit-source-id: 4566c71913ede4e4c23ebc4a72c17151f94e8e21
Summary:
Throw a warning when calling `torch.load` on a zip file
Fixes#15570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15578
Differential Revision: D13555954
Pulled By: driazati
fbshipit-source-id: a37ecdb3dd0c23eff809f86e2f8b74cd48ff7277
Summary:
We align the restore logic to `torch.load`, we try to restore to the right device, and if the device is not available, an exception is raised. We allow user to remap the device through a parameter `map_location`, it can be 1) a string like 'cuda:0`, `cpu`, 2) a device, torch.device('cpu'), 3) a dict, {'cuda:1', 'cuda:0'}, and a function, and its signature looks like string map_location(tensor, saved_device_string).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14454
Reviewed By: zrphercule
Differential Revision: D13271956
Pulled By: houseroad
fbshipit-source-id: dfd6b6049b0dc07549ddeddf2dea03ac53ba6d49
Summary:
In #9466 I got rid of storage views and eliminated all places where
they were used... OR SO I THOUGHT. In actuality, under certain
conditions (specifically, if you trained a CUDA multiprocessing model
shared over CUDA IPC and then serialized your parameters), you could
also serialize storage slices to the saved model format. In #9466,
I "fixed" the case when you loaded the legacy model format (really,
just unshared the storages--not strictly kosher but if you aren't
updating the parameters, shouldn't matter), but NOT the modern model format, so
such models would fail.
So, I could have applied the legacy model format fix too, but
hyperfraise remarked that he had applied a fix that was effectively
the same as unsharing the storages, but it had caused his model to
behave differently. So I looked into it again, and realized that
using a custom deleter, I could simulate the same behavior as old
storage slices. So back they come.
In principle, I could also reimplement storage views entirely using
our allocators, but I'm not going to do that unless someone really
really wants it.
Fixes#10120.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11314
Reviewed By: ailzhang
Differential Revision: D9671966
Pulled By: ezyang
fbshipit-source-id: fd863783d03b6a6421d6b9ae21ce2f0e44a0dcce
Summary:
Storage views were previously used to implement CUDA IPC sharing,
but they weren't necessary. The new strategy is described in
Note [CUDA IPC and the caching allocator].
This also fixes an unrelated bug, where we weren't actually using
the Tensor forking pickler, because we didn't register a pickler
for torch.Tensor.
Fixes#9447. Fixes#46.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
CC apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9466
Reviewed By: apaszke
Differential Revision: D8859698
Pulled By: ezyang
fbshipit-source-id: 3362cb92f6ae4aa37084c57d79b31004bd0b4a97
Summary:
Addresses #7415 . Adding a note first, will do the API change if there's a need in the future.
Closes https://github.com/pytorch/pytorch/pull/9019
Differential Revision: D8694056
Pulled By: ailzhang
fbshipit-source-id: 0b6fa43fa62ac55deff3b3b099d1bc9fee74a5f9
* Raise error when torch.load a storage on a non-existing device
Before, doing torch.load(...) on a CUDA tensor on a CPU-only machine
would raise an unreadable error:
```
~/pytorch/pytorch/torch/cuda/__init__.py in __enter__(self)
223 if self.idx is -1:
224 return
--> 225 self.prev_idx = torch._C._cuda_getDevice()
226 if self.prev_idx != self.idx:
227 torch._C._cuda_setDevice(self.idx)
AttributeError: module 'torch._C' has no attribute '_cuda_getDevice'
```
This PR makes it so that torch.load raises a hard error if one tries to
load a storage onto a non-existing device and suggests the user to use
torch.load's map_location feature.
* Address comments
* missing dep
* Fix some minor errors in existing docs.
* Fix Convolution and Pooling docs in torch.nn.functional
* Cleaned up torch.nn.functional docs
* Address @SsnL 's comments
* Add multiplication sign missing in docs
* Fix more typos, and clear some warnings
* Change infinity symbol in LPPool2d
* Revert some changes in torch.nn.functional
* Few more minor changes
* Improvize documentation
1. Add formula for erf, erfinv
2. Make exp, expm1 similar to log, log1p
3. Symbol change in ge, le, ne, isnan
* Fix minor nit in the docstring
* More doc improvements
1. Added some formulae
2. Complete scanning till "Other Operations" in Tensor docs
* Add more changes
1. Modify all torch.Tensor wherever required
* Fix Conv docs
1. Fix minor nits in the references for LAPACK routines
* Improve Pooling docs
1. Fix lint error
* Improve docs for RNN, Normalization and Padding
1. Fix flake8 error for pooling
* Final fixes for torch.nn.* docs.
1. Improve Loss Function documentation
2. Improve Vision Layers documentation
* Fix lint error
* Improve docstrings in torch.nn.init
* Fix lint error
* Fix minor error in torch.nn.init.sparse
* Fix Activation and Utils Docs
1. Fix Math Errors
2. Add explicit clean to Makefile in docs to prevent running graph generation script
while cleaning
3. Fix utils docs
* Make PYCMD a Makefile argument, clear up prints in the build_activation_images.py
* Fix batch norm doc error
* Allow torch.load to take pathlib.Path
pathlib has been python standard library for filesystem path since python 3.4
But `torch.load` currently cannot take `pathlib.Path` as its filename of state dictionary.
I changed `torch.load` and `_with_file_like` to check so that they can accept `pathlib.Path` typed filepath.
* Fix flake8: too long line & indentation
- BC BREAKING: export now also takes a mandatory file-ish argument, specifying
the file to export the protobuf to. I rewrote the tests to use BytesIO to
get out the string so they could parse it again.
- BC BREAKING: export no longer returns the tensors that were computed. To
get these, use the internal _export function.
- Multiple inputs to models are now supported by passing a tuple to input.
(Old API of a single Variable still works.)
- Keyword arguments to models are now supported via kwargs keyword arg.
- Renamed embed_params to export_params, and it now defaults to True.
- Toffee tests now live in their own test_toffee.py file. I had to
rename a pile of expect files for this.
- Removed defunct torch.toffee imports from autograd to solve module import
cycle.
- Helper function _with_file_like to abstract over opening file-ish arguments,
taken from torch.save()
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add more detail to CUDA documentation
Also adds better cross-linking to the pages that discuss relevant topics.
* Adds recommendation to torch.save docs
* Make the version numbers for the docs dynamic
Might need tweaks for beta, 1.0, etc.
Here's the command I used to invoke autopep8 (in parallel!):
git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i
Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.
Also configures flake8 to match pep8's behavior.
Also configures TravisCI to check the whole project for lint.
The __getstate__ and __setstate__ functions are called from copy.copy as
well as pickling. The source code inspection currently slows down the
data parallel code because it makes a copy of the object every
iteration.