62 Commits

Author SHA1 Message Date
peterjc123
d45f3d0d5c Skip cpp_extensions test when possible on Windows (#6423) 2018-04-12 12:12:39 +02:00
Peter Goldsborough
6f10978e7b
Skip C++ extensions test when ninja is not available (#6480) 2018-04-10 14:50:24 -07:00
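
A minimal sketch of how a test can be skipped when ninja is missing. This is not taken from the commit: the helper `ninja_available` and the test class name are hypothetical, and the check simply looks for a `ninja` executable on PATH.

```python
import shutil
import unittest


def ninja_available():
    """Return True if a `ninja` executable can be found on PATH (hypothetical helper)."""
    return shutil.which("ninja") is not None


# The whole test case is skipped, rather than failed, when ninja is absent.
@unittest.skipUnless(ninja_available(), "ninja is not available")
class CppExtensionSmokeTest(unittest.TestCase):
    def test_placeholder(self):
        # Stand-in for a real extension build; only reached when ninja exists.
        self.assertTrue(ninja_available())
```

Skipping keeps the test report honest: the run shows the test as skipped with a reason instead of reporting a build failure that has nothing to do with the code under test.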
Peter Goldsborough
c3f7e5ff55
Install signal handler for SIGCHLD in run_test.py (#6436)
Handle exit signal in run_test.py
2018-04-10 11:31:23 -07:00
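
A small sketch of the "handle exit signal" idea, under the assumption that the runner inspects the return code of each test subprocess. It does not install a SIGCHLD handler and is not the code from the commit; `describe_exit` is a hypothetical helper. It relies on the documented `subprocess` behaviour that, on POSIX, a negative return code `-N` means the child was terminated by signal `N`.

```python
import signal


def describe_exit(returncode):
    """Turn a subprocess return code into a readable message (hypothetical helper)."""
    if returncode == 0:
        return "passed"
    if returncode < 0:
        # On POSIX, a negative return code -N means the child died from signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return "killed by {} ({})".format(name, -returncode)
    return "failed with exit code {}".format(returncode)
```

A runner could print `describe_exit(process.returncode)` after each test so that a crash such as a segmentation fault is reported by signal name rather than as a bare negative number.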
peterjc123
63af898d46 Fix extension test on Windows (#5548)
* Change cpp_extensions.py to make it work on Windows

* Fix linting

* Show python paths

* Debug

* Debug 1

* set PYTHONPATH

* Add ATen into library

* expose essential libs and functions, and copy _C.lib

* Specify dir in header

* Update check_abi for MSVC

* Activate cl environment to compile cpp extensions

* change version string

* Redirect stderr to stdout

* Add monkey patch for windows

* Remove unnecessary self

* Fix various issues

* Append necessary flags

* add /MD flag to cuda

* Install ninja

* Use THP_API instead of THP_CLASS

* Beautify the paths

* Revert "Use THP_API instead of THP_CLASS"

This reverts commit dd7e74c44db48e4c5f85bb8e3c698ff9de71ba2d.

* Use THP_API instead of THP_CLASS(new)
2018-04-02 13:53:25 -04:00
Edward Z. Yang
2ad972c9eb
A complete revamp of our test scripts. (#5904)
- All of the scripts are based off of the idea that they should be as
  simple as possible, and all the heavy lifting done in the construction
  of the Docker file.  The scripts are really simple now.  A bigger
  philosophical discussion can be found in .jenkins/README.md

- build-asan.sh is split out of build.sh, as ASAN builds are a bit
  specialized and it's inappropriate to run many of the other builds
  as part of them.

- We now build and run with mkl/mkl-include on the CPU only builds

- We now report sccache and ccache stats at the end of all builds.

- run_test.py flushes stdout/stderr before making a subprocess call,
  which should solve our interleaving problems.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-22 16:31:50 -04:00
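
The last bullet of the commit above (flushing before a subprocess call) can be sketched as follows. This is an illustration, not the actual run_test.py code; `run_flushed` is a hypothetical name.

```python
import subprocess
import sys


def run_flushed(argv):
    """Flush our own buffered output, then run the child and return its exit code."""
    # Without these flushes, text still sitting in this process's buffers can
    # appear after the child's output, which makes logs look interleaved.
    sys.stdout.flush()
    sys.stderr.flush()
    return subprocess.call(argv)


if __name__ == "__main__":
    print("starting child")
    code = run_flushed([sys.executable, "-c", "print('hello from child')"])
    print("child exited with", code)
```

The flush matters most when stdout is redirected to a file or pipe, since Python then uses block buffering and the parent's lines would otherwise be written only when the buffer fills or the process exits.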
Peter Goldsborough
4613eef69e Simplify run_test.py and dont use shell=True (#5767)
* Simplify run_test.py and dont use shell=True

* Fix non-shell output for check_output and always print to stderr

* Use shlex.split instead of str.split

* s/log/print_to_stderr

* with_init -> with_init_file

* Remove bufsize argument
2018-03-15 01:12:51 -04:00
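
To illustrate why the commit above prefers `shlex.split` over `str.split` once `shell=True` is dropped, here is a short sketch. The command string is made up for the example and is not taken from run_test.py.

```python
import shlex
import subprocess
import sys

# An example command line with a quoted argument that contains spaces.
command = 'python -c "print(1 + 1)"'

# str.split breaks on every space and keeps the quote characters.
naive = command.split()
# naive == ['python', '-c', '"print(1', '+', '1)"']

# shlex.split follows shell quoting rules, so the quoted part stays one argument.
proper = shlex.split(command)
# proper == ['python', '-c', 'print(1 + 1)']

# With an argument list, no shell is needed to run the command.
# sys.executable replaces 'python' so the example does not depend on PATH.
output = subprocess.check_output([sys.executable] + proper[1:])
```

Passing a list without `shell=True` avoids a layer of shell interpretation, so arguments reach the child exactly as split.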
Edward Z. Yang
3f3b686056 Refactor run_test.py to pass all options, not just verbose. (#5760)
I need this because run_test is going to need to read other
options than just verbose when I implement JUnit XML dumping.
(JUnit XML dumping cannot be implemented solely by frobbing
--python because the XML file to dump to must vary based on the
test name.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-14 07:44:58 -04:00
Edward Z. Yang
cadeb0cb17
Revert "ATen ReduceOps (#5481)" (#5765)
* Revert "ATen ReduceOps (#5481)"

This reverts commit 310c3735b9.

* Revert "Check that new cpuinfo and tbb submodules exist (#5714)"

This reverts commit 1a23c9901d.
2018-03-13 23:50:16 -04:00
Peter Goldsborough
16fa12214d raise RuntimeError on test failure (#5754) 2018-03-13 18:53:43 -04:00
cpuhrsch
310c3735b9 ATen ReduceOps (#5481)
This diff adds vectorization to ATen. It uses Intel intrinsics to build a general vec256 class that represents types of 256-bit width. These can then be treated like regular variables. Using those, it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing, and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code depending on the host's capabilities.

The kernels are implemented under native/cpu. Each .cpp file is compiled with -avx, -avx2 and no additional flags. A macro is used to append AVX, AVX2 or NONE to the function name. The header then needs to define the functions three times, one for each capability. This could be improved by either changing the cmake file a bit or possibly generating source code using a Python script etc.

For the non-contiguous case this defaults to the current implementation within TH. For CUDA it defaults entirely to the implementation within THC.

There probably needs to be a bit of a debate around the design decisions here, the additional dependencies, parallelization strategy, clarity, etc. The numerical results also diverge from numpy with larger tensors, which is expected since we're summing, for example, 8 numbers and then adding the result to the running sum, instead of each number one by one. But there might be something to be said about accumulating into a double for floats or the degree of divergence, the behavior with respect to CUDA, etc.

I wrote a [small Python script](https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with numpy numerically as well as on timing. I ran this script to create timings both on master and this branch.

Here is the command for 1 core
`OMP_NUM_THREAD=1 taskset -c 0 python sum_bench.py --enable_numpy 200`

Here is the command for all cores
`python sum_bench.py --enable_numpy 200`

Here are the results of each:

[Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ)

[This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w)

[Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw)

[This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)

To test, the command is
`python sum_bench.py --test 200`

[This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw)

For this test we look at the average absolute value of the differences. This does not take into account the relative magnitude of the numbers. The numbers are sampled from a standard normal distribution.

In terms of performance, this diff should bring PyTorch on par with NumPy and usually exceed it by 1.5 to 2x.
2018-03-12 15:19:12 -04:00
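
The numerical divergence mentioned in the ATen ReduceOps message above (summing 8 numbers first, then adding that to the running sum) can be reproduced in plain Python. This is only a sketch with double-precision floats and hypothetical function names; it does not use vec256, TBB, or any code from the diff.

```python
import math
import random


def sequential_sum(values):
    """Add each number to the running sum one by one."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, width=8):
    """Sum each group of `width` numbers first, then add it to the running sum."""
    total = 0.0
    for start in range(0, len(values), width):
        chunk_total = 0.0
        for v in values[start:start + width]:
            chunk_total += v
        total += chunk_total
    return total


if __name__ == "__main__":
    random.seed(0)
    # Samples from a standard normal distribution, as in the benchmark description.
    data = [random.gauss(0.0, 1.0) for _ in range(100000)]
    exact = math.fsum(data)  # correctly rounded reference sum
    print("sequential error:", abs(sequential_sum(data) - exact))
    print("chunked error:   ", abs(chunked_sum(data) - exact))
```

Because floating-point addition is not associative, the two orderings can round differently, so small differences between them are expected and neither is exact in general.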
Peter Goldsborough
6404904d8a Fix run_test.py (#5693) 2018-03-10 19:16:40 -05:00
Peter Goldsborough
53876c4606 Rewrite run_test.sh in Python (#5615) 2018-03-09 22:02:02 +01:00