
13 Commits

Author SHA1 Message Date
Aapo Kyrola
d1def93166 [torch/debuggability] use log.info() in addition to print() in timeoutguard (#57296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57296

Many trainers seem to disable print(), so the thread dumps from CompleteInTimeOrDie() are not visible. Emit them with log.info() as well.

Test Plan: sandcastle

Reviewed By: aalmah

Differential Revision: D28098738

fbshipit-source-id: dfdca8801bacf5c7bccecc2387cb7ef41dadfa46
2021-04-29 15:23:35 -07:00
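The change in d1def93166 (emit the thread dump through log.info() as well as print()) can be sketched as below. This is a minimal illustration, not the timeout_guard code: `dump_thread_stacks` and the logger name are hypothetical, and it relies on `sys._current_frames()`, a CPython implementation detail.

```python
import logging
import sys
import threading
import traceback

log = logging.getLogger("timeout_guard_sketch")  # hypothetical logger name


def dump_thread_stacks():
    """Format the current stack of every Python thread and emit it twice:
    once with print() and once with log.info(), so the dump survives even
    when a trainer has silenced stdout."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        parts.append(f"Thread {names.get(ident, '<unknown>')} ({ident}):\n")
        parts.extend(traceback.format_stack(frame))
    text = "".join(parts)
    print(text, flush=True)  # visible when stdout is intact
    log.info(text)           # visible when print() is disabled but logging is routed
    return text
```

Emitting the same text through both channels costs little, and whichever one the trainer has left enabled will carry the dump.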
Bugra Akyildiz
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
`2to3` has a `future` fixer that specifically removes these imports; the `caffe2` directory has the most redundant ones:

```2to3 -f future -w caffe2```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* The LICENSE file contains the details, so the headers are removed from individual source files.
2018-03-27 13:10:18 -07:00
Andrew Dye
6ecaed5021 Generate a core dump when CompleteInTimeOrDie forcefully quits
Summary: CompleteInTimeOrDie was added to detect deadlocks and proactively exit. This change also calls os.abort() to generate a core dump, so that the error is actionable.

Reviewed By: bmaurer

Differential Revision: D6938343

fbshipit-source-id: 8bd36da4f4bb1195bd3398f25d133a6ebf1c66ad
2018-02-08 14:08:51 -08:00
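The idea in 6ecaed5021 (a watchdog that aborts instead of merely exiting) can be sketched as a context manager. This is a minimal sketch assuming a POSIX system; `abort_if_too_slow` is a hypothetical name and not the CompleteInTimeOrDie signature. os.abort() raises SIGABRT without running Python cleanup, and whether a core file is actually written depends on the system's core-size limit (`ulimit -c`).

```python
import contextlib
import os
import threading


@contextlib.contextmanager
def abort_if_too_slow(timeout_secs):
    """Hypothetical watchdog: if the with-body is still running after
    timeout_secs, a daemon thread calls os.abort() so the OS can write a
    core dump of the stuck process for later inspection."""
    finished = threading.Event()

    def watchdog():
        # Event.wait() returns False on timeout, i.e. the body has not finished.
        if not finished.wait(timeout_secs):
            os.abort()

    threading.Thread(target=watchdog, daemon=True).start()
    try:
        yield
    finally:
        finished.set()  # body done (or raised): disarm the watchdog
```

Compared with os._exit(1), the abort leaves a post-mortem artifact, which is what makes a detected deadlock actionable.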
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Aapo Kyrola
cffbbfa9e3 Revert D5655753: [Caffe2] better straggler exit procedure
Summary:
This reverts commit ad0c998feeb03bcb0cf4e5127fb3cc7bb00dcedb

bypass-lint

Differential Revision: D5655753

fbshipit-source-id: 2f1d350286d2ee31e8045c9bd03ef1235f1a93ec
2017-08-25 14:23:09 -07:00
Aapo Kyrola
4c9eff807b better straggler exit procedure
Differential Revision: D5655753

fbshipit-source-id: ad0c998feeb03bcb0cf4e5127fb3cc7bb00dcedb
2017-08-24 12:33:30 -07:00
Thomas Dudziak
5355634dac Dict fixes/improvements and unittest targets for Python 3 in caffe2 core
Summary: As the title says.

Reviewed By: salexspb

Differential Revision: D5316104

fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30
2017-06-29 17:05:41 -07:00
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Aapo Kyrola
2cddbc719c Euthanize a process with timeout
Summary: vigneshr has been randomly seeing the process fail to exit at the end. We don't know the cause, so this helps in two ways: (1) putting timeout_guard.EuthanizeIfNecessary(600) at the end of the operator ensures that the process is killed within 10 minutes, allowing for a retry; (2) the kill dumps Python stack traces, which helps debug the real issue.

Differential Revision: D4635781

fbshipit-source-id: b558418c80671c00effdd514e4ddc01e935c95df
2017-03-01 11:38:11 -08:00
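The behavior described in 2cddbc719c (kill the process if it is still alive after N seconds, dumping Python stacks on the way out) maps closely onto the standard library's `faulthandler.dump_traceback_later`. The sketch below swaps in that stdlib mechanism; it is not how timeout_guard implements EuthanizeIfNecessary, and `euthanize_if_necessary` / `cancel_euthanasia` are hypothetical names.

```python
import faulthandler
import sys


def euthanize_if_necessary(timeout_secs, file=None):
    """Hypothetical stand-in for timeout_guard.EuthanizeIfNecessary.

    Arms faulthandler's watchdog thread: if the process is still alive after
    timeout_secs seconds, the tracebacks of all threads are written to `file`
    (stderr by default) and the process exits via _exit(1). A process that
    finishes normally before the deadline is unaffected.
    """
    if file is None:
        file = sys.stderr
    faulthandler.dump_traceback_later(timeout_secs, repeat=False, file=file, exit=True)


def cancel_euthanasia():
    """Disarm the pending watchdog, e.g. if more work turns out to be needed."""
    faulthandler.cancel_dump_traceback_later()
```

Usage mirrors the commit: call `euthanize_if_necessary(600)` at the end of the job, and a hung exit turns into a traceback dump plus exit status 1 after 10 minutes, which a scheduler can retry.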
Aapo Kyrola
aa3156c235 Remove use of logging module and np.random.randint() due to deadlocks with forks
Summary: See http://bugs.python.org/issue6721. Everstore loaders use ProcessPoolExecutor, which is based on fork(), and after a possible update of numpy or some unrelated library, subprocesses started getting stuck at np.random.randint(). Also changed logging to print(), since logging is known to have issues with multiprocessing. See https://www.prod.facebook.com/groups/fbpython/permalink/1438647216176641/

Differential Revision: D4633725

fbshipit-source-id: ae948a1827c71a3a2119d6a3248706728984df31
2017-03-01 03:32:56 -08:00
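The general hazard behind issue 6721 can be shown in a few lines: a lock held by another thread at fork() time is inherited in the locked state by the child, where the owning thread does not exist, so nobody ever releases it. This is a minimal POSIX-only illustration, not a reproduction of the exact np.random.randint() hang; `lock_acquirable_in_forked_child` is a hypothetical helper, and recent Python versions may emit a DeprecationWarning about forking a multi-threaded process.

```python
import os
import threading


def lock_acquirable_in_forked_child(hold_during_fork, child_timeout=0.2):
    """Fork and report whether the child can acquire a lock.

    With hold_during_fork=True, a helper thread holds the lock while the
    parent forks; the child inherits the locked lock but not the thread.
    """
    lock = threading.Lock()
    locked = threading.Event()
    release = threading.Event()

    def holder():
        with lock:
            locked.set()
            release.wait()  # keep holding until the parent is done

    t = None
    if hold_during_fork:
        t = threading.Thread(target=holder)
        t.start()
        locked.wait()

    pid = os.fork()
    if pid == 0:
        code = 2
        try:
            # A timeout instead of a plain acquire(), so the demo cannot hang.
            code = 0 if lock.acquire(timeout=child_timeout) else 1
        finally:
            os._exit(code)

    _, status = os.waitpid(pid, 0)
    if t is not None:
        release.set()
        t.join()
    return os.WEXITSTATUS(status) == 0
```

Any library that keeps an internal lock (a logging handler, a random-number state guard) is exposed to the same pattern when worker processes are created with fork().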
Aapo Kyrola
0a060dae50 better killing after timeout, cleanup
Summary:
This partly fixes a recurring problem when using everstore data input (or any other data input that uses multiprocessing): if the main process dies violently, the child processes are not killed. One cause was TimeoutGuard(), which called os._exit(1), preventing any cleanup from happening. I changed it to send SIGINT to the PID and, if the process is still alive after 10 seconds, call os._exit(1). In my tests, this works well.

Did some other cleanup:
- improved logging of inputs/sec in data_workers
- removed redundant atexit() handling as the multiprocessing pool does it itself

Differential Revision: D4602550

fbshipit-source-id: 64d4526a2a3625d163d23f078286e719d56998f4
2017-02-23 13:16:19 -08:00
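The escalation described in 0a060dae50 (SIGINT first, os._exit(1) only if the process survives the grace period) can be sketched as below. This assumes a POSIX system and the default SIGINT handler, which raises KeyboardInterrupt in the main thread so that finally blocks and atexit handlers (for example a multiprocessing pool shutting down its workers) get a chance to run. `interrupt_then_force_exit` is a hypothetical name, not the TimeoutGuard API.

```python
import os
import signal
import threading


def interrupt_then_force_exit(grace_seconds=10.0, exit_code=1):
    """Hypothetical sketch: ask this process to stop, then force it.

    1. Arm a daemon timer that calls os._exit(exit_code) after grace_seconds.
    2. Send SIGINT to our own PID so the main thread can unwind and clean up.
    If the process exits on its own first, the daemon timer dies with it.
    """
    timer = threading.Timer(grace_seconds, os._exit, args=(exit_code,))
    timer.daemon = True
    timer.start()  # arm the fallback before signalling, in case cleanup hangs
    os.kill(os.getpid(), signal.SIGINT)
    return timer
```

Arming the timer before sending the signal guarantees the hard exit still happens if SIGINT is ignored or the cleanup itself deadlocks.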
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update.
2016-11-15 00:00:46 -08:00