Commit Graph

2 Commits

Aapo Kyrola
35fa9e9c5f a couple small reliability improvements
Summary:
A couple more misc changes:
- allow starting the coordinator multiple times -- this makes data parallel programming easier
- make the fetcher id a global sequence; previously each GPU had the same ids for its workers
- my flow jobs got stuck when joining the fetcher threads. I think there is actually a memory-fencing problem with the is_active boolean, but instead of adding proper condition variables there, this just adds a timeout to join() (see the sketch after this list). The timeout is needed anyway, since an I/O thread could get blocked.
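A minimal Python sketch of the join-with-timeout workaround described above. The class and method names (FetcherCoordinator, stop, _worker_loop) are illustrative assumptions, not the actual Caffe2 coordinator API; the point is only the bounded join on shutdown.

```python
import threading


class FetcherCoordinator(object):
    def __init__(self, fetch_fn, num_workers=4):
        # Plain boolean flag; without explicit synchronization its visibility
        # across threads is best-effort, which is the suspected issue above.
        self.is_active = True
        self._fetch_fn = fetch_fn
        self._workers = [
            threading.Thread(target=self._worker_loop, name="fetcher_%d" % i)
            for i in range(num_workers)
        ]

    def start(self):
        for w in self._workers:
            w.daemon = True
            w.start()

    def _worker_loop(self):
        while self.is_active:
            self._fetch_fn()  # may block on I/O

    def stop(self, timeout=5.0):
        self.is_active = False
        for w in self._workers:
            # Bounded join: if a worker never observes the flag change or is
            # stuck in blocking I/O, give up after the timeout instead of
            # hanging the whole job.
            w.join(timeout)
```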

Differential Revision: D4333381

fbshipit-source-id: 88226c8a9c9a5e05d771360a502a2ba21a6b9d76
2016-12-15 21:29:29 -08:00
Aapo Kyrola
0b52b3c79d Generalize threaded data input via queues + Everstore input
Summary:
The Xray sampler (originally by ajtulloch) and prigoyal's resnet trainer use variants of threaded data input: worker threads put batches into a Python queue, an enqueuer thread drains that queue and dumps the batches into a Caffe2 queue, and the net's DequeueBlobs operator then reads from the Caffe2 queue.
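A rough sketch of that pipeline topology, assuming a user-supplied fetch_batch function and a caffe2_enqueue callable that stands in for the actual step that pushes blobs into the Caffe2 queue (that callable is an assumption, not the real EnqueueBlobs wiring).

```python
import queue
import threading


def start_input_pipeline(fetch_batch, caffe2_enqueue, num_workers=4):
    # Bounded queue so slow consumers apply back-pressure to the workers.
    batch_queue = queue.Queue(maxsize=8)

    def worker_loop():
        while True:
            batch_queue.put(fetch_batch())  # e.g. (data, labels) numpy arrays

    def enqueuer_loop():
        while True:
            # Drain the Python queue and push each batch to the Caffe2 queue,
            # which the net later reads with its DequeueBlobs operator.
            caffe2_enqueue(batch_queue.get())

    for i in range(num_workers):
        threading.Thread(target=worker_loop, daemon=True,
                         name="worker_%d" % i).start()
    threading.Thread(target=enqueuer_loop, daemon=True,
                     name="enqueuer").start()
    return batch_queue
```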

There is a lot of boilerplate, which is also quite complicated.

This diff is an attempt to generalize that common machinery into a new module, "data_workers" (name could be improved). Basically, you pass it a function that returns chunks of data (usually data + labels).
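Illustrative only: a fetcher function of the kind described above, returning a chunk of data plus labels. The init_data_input_workers name and signature in the commented-out wiring are assumptions based on this description, not a documented API.

```python
import numpy as np


def fetch_random_batch(worker_id, batch_size):
    # In a real trainer this would read and decode the next chunk of
    # training data; here it fabricates random images and labels.
    data = np.random.rand(batch_size, 3, 224, 224).astype(np.float32)
    labels = np.random.randint(0, 1000, size=batch_size).astype(np.int32)
    return [data, labels]


# Hypothetical wiring: hand the fetcher to data_workers, which runs it on
# worker threads and feeds the results to the net's input blobs.
# coordinator = data_workers.init_data_input_workers(
#     train_net, ["data", "label"], fetch_random_batch, batch_size=32)
# coordinator.start()
```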

I also created a module 'everstore_data_input', which generalizes Everstore-sourced data input with a preprocessing function (image augmentation, for example). See how I refactored sampler.py for the usage.
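A hedged sketch of the kind of preprocessing hook mentioned above: a function that turns a decoded image into a training example with simple augmentation. The function name and the augmentation choices are assumptions for illustration, not what sampler.py actually does.

```python
import numpy as np


def augment_and_preprocess(img_hwc):
    """Random horizontal flip + random 224x224 crop, then scale to [0, 1]
    and convert HWC -> CHW, the layout Caffe2 image ops expect."""
    if np.random.rand() < 0.5:
        img_hwc = img_hwc[:, ::-1, :]
    h, w, _ = img_hwc.shape
    y = np.random.randint(0, h - 224 + 1)
    x = np.random.randint(0, w - 224 + 1)
    crop = img_hwc[y:y + 224, x:x + 224, :].astype(np.float32) / 255.0
    return np.ascontiguousarray(crop.transpose(2, 0, 1))
```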

Next we could create a fetcher function for Laser data.

Differential Revision: D4297667

fbshipit-source-id: 8d8a863b177784ae13940730a27dc76cd1dd3dac
2016-12-15 12:01:30 -08:00