pytorch/caffe2/distributed/store_ops_test_util.py
sf-wind 602a09dde7 Update caffe2 from facebook 4f527ef46abf (#2234)
* [GanH]: two_task_discriminator

as titled

and adding label smooth

* [Dper2] Simplified UI options needed for blob magnitude visualization

* [GanH]: fix tags

as titled

* Added type and shape inference for GatherRange operator

This helps with type / shape inference when using this operator in layers.
Also just a nice to have in general.

* Demonstrate Caffe2 exception handling with StoreHandlerTimeoutError in Python

We'd like to catch and recover from certain Caffe2 net exceptions. Use this diff to demonstrate a pattern of registering a pybind exception mapping and catching in Pythonusing caffe2::StoreHandlerTimeoutException.

* Bind Gloo IoException to IoError in Python

Allow peer failure handling and recovery using an exception based mechanism. This diff registers gloo::IoException with pybind.

* [GanH]: add label smoothing to softmax with loss

as titled

* [C2] Enable LARS in Adagrad and hook it to DPER

* [DPER] Don't pass LayerModelHelper in create_trainer_nodes

Since we're planning to get rid of it eventually and I want to get access to
NetDef only interface ASAP - I'm looking towards removing all references to
LMH, where we don't really need them.

* fix bugs in LambdaRankNdcgOp

the loss and gradient in LambdaRankNdcgOp are incorrect. The loss should be negative log of probs instead of log.

* Restrict thread pool on iOS to only big cores

Historically, iPhones exposed only one type of cores, and Caffe2 thread pool used all of them.
However, iPhone 8/iPhone X exposes 2 big + 4 LITTLE cores. As our thread pool doesn't support work stealing or other forms of load balancing, fast cores end up waiting for the slow ones, and it may be better to restrict execution to only 2 fast cores, like we do on Android.

* Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

* make clang happy and get fewer warnings

make clang happy and get fewer warnings

* [Personalization] Support add_output_schema() in layer_model_helper

Problem:
Currently the output_schema of sparse_nn can only be set once. https://fburl.com/efth5zer.

Solution:
For flexibility, we want to add fields to output_schema incrementally.

Plan:
Wrap the change of `model._output_schema` into a new function `add_output_schema()` for adding additional output_schema.

Callsite:
The add_output_schema() should be called instead at https://fburl.com/efth5zer

Reference:
The newly added `add_output_schema()` will be similar to `add_loss()` in https://fburl.com/t2ii8njh
2018-03-12 12:22:59 -07:00

92 lines
3.0 KiB
Python

# Copyright (c) 2016-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
## @package store_ops_test_util
# Module caffe2.distributed.store_ops_test_util
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from multiprocessing import Process, Queue
import numpy as np
from caffe2.python import core, workspace
class StoreOpsTests(object):
@classmethod
def _test_set_get(cls, queue, create_store_handler_fn, index, num_procs):
store_handler = create_store_handler_fn()
blob = "blob"
value = np.full(1, 1, np.float32)
# Use last process to set blob to make sure other processes
# are waiting for the blob before it is set.
if index == (num_procs - 1):
workspace.FeedBlob(blob, value)
workspace.RunOperatorOnce(
core.CreateOperator(
"StoreSet",
[store_handler, blob],
[],
blob_name=blob))
output_blob = "output_blob"
workspace.RunOperatorOnce(
core.CreateOperator(
"StoreGet",
[store_handler],
[output_blob],
blob_name=blob))
try:
np.testing.assert_array_equal(workspace.FetchBlob(output_blob), 1)
except AssertionError as err:
queue.put(err)
workspace.ResetWorkspace()
@classmethod
def test_set_get(cls, create_store_handler_fn):
# Queue for assertion errors on subprocesses
queue = Queue()
# Start N processes in the background
num_procs = 4
procs = []
for index in range(num_procs):
proc = Process(
target=cls._test_set_get,
args=(queue, create_store_handler_fn, index, num_procs, ))
proc.start()
procs.append(proc)
# Test complete, join background processes
for proc in procs:
proc.join()
# Raise first error we find, if any
if not queue.empty():
raise queue.get()
@classmethod
def test_get_timeout(cls, create_store_handler_fn):
store_handler = create_store_handler_fn()
net = core.Net('get_missing_blob')
net.StoreGet([store_handler], 1, blob_name='blob')
workspace.RunNetOnce(net)