Commit Graph

49 Commits

Author SHA1 Message Date
Dmytro Dzhulgakov
49457a7be7 Logging for ATen op subtype
Summary: ATenOp should go away, but before it does, it's important to understand what's going on inside of it. We already log `arguments`, but it's rather hard to parse in Scuba as it's a list, not a dictionary. Let's extract the operator name explicitly so that grouping works well.

Test Plan: unittest

Reviewed By: ngimel

Differential Revision: D21057966

fbshipit-source-id: 86be7cca39055620477a28bd5d8ab29e8edd2ff9
2020-04-19 23:02:50 -07:00
Dmytro Dzhulgakov
1f759936f0 Propagate model id used by Predictor to Caffe2 logging
Summary:
Does the same things as D19658565 but for Caffe2 models.

From the investigation in https://fb.quip.com/PbgsAEmoJVuf, the model id that Predictor uses and the model id saved inside the model don't match. A common reason is recurring fluent2 jobs, but there are others.

Since the model_id from Predictor is what the rest of the datasets use, it's far more useful imho. I've considered adding both ids, but it'd require additional piping and I don't think it's that useful.

Test Plan: unittests added

Reviewed By: houseroad

Differential Revision: D20630599

fbshipit-source-id: 3e6d0cb0b6f8c8b6ae5935138f55ae7a2ff60653
2020-03-29 23:07:32 -07:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Gemfield
6ed57e052d Fix the return value of ParseFromString (#19262)
Summary:
Fix the return value of ParseFromString.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19262

Differential Revision: D14937605

Pulled By: ezyang

fbshipit-source-id: 3f441086517186a075efb3d74f09160463b696b3
2019-04-15 12:39:29 -07:00
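As a quick illustration of why the return value in the commit above matters: protobuf's ParseFromString reports malformed input through its bool return rather than by throwing, so wrappers should propagate it instead of swallowing it. This is a hedged sketch only; the helper name TryParseNetDef is made up for this example and is not from the PR.

```
// Illustrative sketch only; TryParseNetDef is a hypothetical helper.
// Protobuf's ParseFromString signals failure via its bool return value,
// so callers should check or propagate it.
#include <string>
#include "caffe2/proto/caffe2_pb.h"

bool TryParseNetDef(const std::string& serialized, caffe2::NetDef* net) {
  // Returns false on malformed input instead of silently "succeeding".
  return net->ParseFromString(serialized);
}
```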
Duc Ngo
172ec4ace5 caffe2 - Util to cleanup external inputs and outputs from a NetDef (#18194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18194

Add a util method to clean up external inputs and outputs from a NetDef.

The following conditions will be met after the modification:
- No duplicate external inputs
- No duplicate external outputs
- Going through the list of ops in order, all op inputs must be outputs from other ops, or registered as external inputs.
- All external outputs must be outputs of some operator.

Reviewed By: ZolotukhinM

Differential Revision: D14528589

fbshipit-source-id: c8d82fda1946aa3696abcbec869a4a8bb22f09b6
2019-03-22 11:23:03 -07:00
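The conditions in the commit above map naturally onto a single pass over the ops. The sketch below is a rough reconstruction under those stated conditions, not the actual utility added in the PR; the function name and the exact ordering policy are assumptions.

```
// Minimal sketch, assuming the standard protobuf-generated NetDef accessors
// from caffe2.proto. Not the utility from the PR.
#include <set>
#include <string>
#include <vector>
#include "caffe2/proto/caffe2_pb.h"

void CleanupExternalInputsAndOutputs(caffe2::NetDef* net) {
  // Walk ops in order; any input not produced earlier must be external.
  std::set<std::string> produced;
  std::vector<std::string> needed_inputs;  // first-use order, deduplicated
  std::set<std::string> needed_set;
  for (const auto& op : net->op()) {
    for (const auto& in : op.input()) {
      if (!produced.count(in) && needed_set.insert(in).second) {
        needed_inputs.push_back(in);
      }
    }
    for (const auto& out : op.output()) {
      produced.insert(out);
    }
  }
  // Rebuild external inputs: exactly the needed blobs, no duplicates.
  net->clear_external_input();
  for (const auto& in : needed_inputs) {
    net->add_external_input(in);
  }
  // Rebuild external outputs: keep each once, only if some op produces it.
  std::vector<std::string> outputs(net->external_output().begin(),
                                   net->external_output().end());
  net->clear_external_output();
  std::set<std::string> seen;
  for (const auto& out : outputs) {
    if (produced.count(out) && seen.insert(out).second) {
      net->add_external_output(out);
    }
  }
}
```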
Junjie Bai
a682ce9144 Add back HIP support to async net (#13400)
Summary:
We lost HIP support in the last refactoring, 620ece2668
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13400

Differential Revision: D12868211

Pulled By: bddppq

fbshipit-source-id: 72dbfda105b826bee28ddf480e88fca7d63f93d8
2018-10-31 17:52:36 -07:00
Ilia Cherniavskii
620ece2668 Simplify thread pool creation logic (#13114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13114

Using one thread pool creator for all device types

Reviewed By: manojkris, wesolwsk

Differential Revision: D10851533

fbshipit-source-id: 32ca51d7932ba7faa8137df26315f52ecb4c6157
2018-10-26 16:02:08 -07:00
Yangqing Jia
28dba2f928 Unify all *_EXPORT and *_IMPORT macros across c++ backend (#12019)
Summary:
TSIA. Right now we should basically use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continuation of the C10 unification effort.

This is a codemod done mechanically with the following change:

CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019

Reviewed By: ezyang, teng-li

Differential Revision: D10016276

Pulled By: Yangqing

fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164
2018-09-25 17:41:05 -07:00
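For readers unfamiliar with these macros, here is a hedged sketch of how export/import macros of this kind are commonly defined and used. It is not the exact c10 definition, and the C10_BUILD_SHARED_LIBS guard is an assumed placeholder.

```
// Sketch only; not the actual c10 macro definitions.
// C10_BUILD_SHARED_LIBS is a placeholder for whatever guard the build uses.
#include <string>

#if defined(_WIN32)
#if defined(C10_BUILD_SHARED_LIBS)
#define C10_EXPORT __declspec(dllexport)
#define C10_IMPORT __declspec(dllimport)
#else
#define C10_EXPORT
#define C10_IMPORT
#endif
#else  // non-Windows: control ELF symbol visibility instead
#define C10_EXPORT __attribute__((__visibility__("default")))
#define C10_IMPORT C10_EXPORT
#endif

// Usage: annotate classes and functions that must be visible across the
// shared-library boundary.
class C10_EXPORT Blob { /* ... */ };
C10_EXPORT void RunNet(const std::string& name);
```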
Orion Reblitz-Richardson
6508db7421 Remove BUILD_CAFFE2 and build everything (#8338)
Summary:
This completely removes BUILD_CAFFE2 from CMake. There is still a little bit of "full build" stuff in setup.py that enables USE_CUDNN and BUILD_PYTHON, but otherwise everything should be enabled for PyTorch as well as Caffe2. This gets us a lot closer to full unification.

cc mingzhe09088, pjh5, ezyang, smessmer, Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8338

Reviewed By: mingzhe09088

Differential Revision: D9600513

Pulled By: orionr

fbshipit-source-id: 9f6ca49df35b920d3439dcec56e7b26ad4768b7d
2018-08-31 13:10:24 -07:00
Edward Yang
91797c0672 Replace direct include of caffe2.pb.h with an intermediary header caffe2_pb.h (#10946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10946

```
codemod -d . --extensions cc,cpp,cu,cuh,h caffe2/proto/caffe2.pb.h caffe2/proto/caffe2_pb.h
```

Reviewed By: houseroad

Differential Revision: D9539945

fbshipit-source-id: 497d04720e8e7e61c05ffe1b23733d0cb774de7e
2018-08-28 11:57:08 -07:00
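The intermediary header is conceptually tiny. The sketch below is illustrative rather than the actual file contents: downstream code includes the stable caffe2_pb.h path, and the protoc-generated caffe2.pb.h stays behind it.

```
// Illustrative sketch of caffe2/proto/caffe2_pb.h (not the actual file):
// an indirection header so that downstream includes do not depend directly
// on the generated caffe2.pb.h.
#pragma once
#include "caffe2/proto/caffe2.pb.h"
```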
Yangqing Jia
0a809fc8b1 build changes to make cpu unified build working. (#10504)
Summary:
Properly annotated all APIs for the CPU front end. Checked with cmake using

cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON

and the resulting libcaffe2.so has about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504

Reviewed By: ezyang

Differential Revision: D9316491

Pulled By: Yangqing

fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
2018-08-15 17:22:36 -07:00
Lu Fang
63233f98ad Bump up opset version to 7 in Caffe2 ONNX exporter (#8854)
Summary:
Will bump up to opset 8 in another PR to match the current opset version.

Already tested by generating the models in the current model zoo.
Closes https://github.com/pytorch/pytorch/pull/8854

Reviewed By: ezyang

Differential Revision: D8666437

Pulled By: houseroad

fbshipit-source-id: feffdf704dd3136aa59c0f1ff1830c14d1bd20aa
2018-06-28 07:39:02 -07:00
Duc Ngo
f52c2ca1c6 net_async tracing use enable_profile arg from NetDef (#8927)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8927

Closes https://github.com/pytorch/pytorch/pull/8855

- Add parameter `enable_tracing` to the Arg field of NetDef. `net_async_tracing` will only enable the Tracer for Net instances that have this field set (unless the command line argument also includes the net name).
- Append a unique id to the json profiling result file, because there could be multiple instances of the same net running.
- Dump the json profiling file regularly instead of only when the Tracer object is destroyed

Reviewed By: ilia-cher

Differential Revision: D8372378

fbshipit-source-id: 8adc9d59f48b67456beed2e3a88235c298fdfd01
2018-06-27 16:24:57 -07:00
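Per-net arguments like the one described above live in NetDef's repeated `arg` field. A hedged sketch of setting it from C++ follows; the helper name is hypothetical, and whether the argument is spelled `enable_tracing` (per the summary) or `enable_profile` (per the title) follows the commit text, not this example.

```
// Hypothetical helper, shown only to illustrate how a NetDef-level Argument
// like the one described above is set via the protobuf-generated API.
#include "caffe2/proto/caffe2_pb.h"

void EnableNetTracing(caffe2::NetDef* net) {
  auto* arg = net->add_arg();
  arg->set_name("enable_tracing");  // name taken from the commit summary
  arg->set_i(1);                    // non-zero integer value enables the tracer
}
```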
Orion Reblitz-Richardson
edd4e2c5d1 Expose proto utils and ONNX (#8073)
* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files
2018-06-13 10:25:32 -07:00
Yinghai Lu
ef8f556212 [Caffe2] Changes done inside Facebook (#6378)
* fix unit test for sqrt op

From the error logging:

[idx, grad, grad_estimate] are:
[[ 146.            0.5           0.45776367]
 [ 147.            0.5           0.45776367]

The gradient == 0.5 is correct, which means the SqrtOp and its gradient are doing the right job. (Because y = sqrt(x), loss = y^2/2 = x/2, and then d(loss)/dx = 1/2 = 0.5.)

The test failed because of a numerical problem with grad_estimate (in the unit test). It can happen because the step_size is small and float precision is not high (when there are multiple elements in the tensor, we do sum(y^2) to compute the loss).

This diff
- increases the step size, and also moves the test cases further away from 0 (where sqrt(x) is not well defined) to be safe :)
- also cleans up and merges the test cases for inplace vs. non-inplace

Tested with:

`CAFFE2_HYPOTHESIS_PROFILE=debug ai_bt caffe2/caffe2/python/operator_test:elementwise_ops_test -- "test_sqrt"`

* CompositeReader & CompositeReaderBuilder

A new type of reader gluing multiple readers together.

* Back out "Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid"

Original commit changeset: 9325a4356dbe

* [dai][WIP] convert params to int8 on ps before sending to trainer

Add float->uint8 conversion in addition to float->fp16 conversion in model_saver.

* [easy] improve unit test for sparse length sum ops

as desc.

#accept2ship

* Update GitHub upstream to 771fcb3455

* move sparse hash unique ops to OOS and add unit tests

- move the SparseHash version to OSS, since 'sparsehash' is already a dependency of caffe2 OSS: https://fburl.com/arssw4n1
- The 'SparseHash' engine is also being used in OSS, so the SparseHash version should live in OSS to reduce confusion: https://fburl.com/o5ea7ah2

- fix the CUDA UniqueOp for the case when the batch is empty.
- add a unit test

* group_norm_op for caffe2

This is the cuda op for Group Normalization (GN): https://arxiv.org/abs/1803.08494

This code implements GN in one op that computes Y=gamma * (X-mu) / sigma + beta and also its gradients. It is expected to have minimal memory consumption (similar to the BN op), without creating new blobs if GN were implemented as several ops (e.g., reshape, norm_mean/std, affine_channel).

* Resubmit D7405233: disappeared in D7464958

The OSS publish caused the op to go missing -- however, the test was still there

* [c2] add sparse hash engine for cuda unique op

The SparseHash version of UniqueOp copies the input tensor to CPU, uses a sparse hash map to compute the unique output, and then copies the result back to GPU.

* [dper][gpu] enable unit testing gpu trainer for sparse nn

To debug the GPU trainer using mock data in a unit test.

This makes it easier to develop the GPU trainer for new models.

* Reuse Gloo context for Synchronize() calls

Previously we were creating (and leaking) the Gloo context on each call to Synchronize(). Now we only run the common world op and create the barrier net once, then run the barrier net on each Synchronize() call. Since the timeout is associated with the Gloo context, we assert that the timeout is fixed instead of trying to handle the complexity of multiple timeouts (and associated contexts).

* [GanH/WGAN][1/n]: add FC param clipping

as titled

* [mobile] minimizing changes between caffe2_benchmark and speed_benchmark

* [GanH]: enable diagnose within model

Avoid finding blob names; instead, directly enable it inside the model.

* Add `net_transformer_fun` option to DPM

This callback allows for various transformations to be made to the
model after gradient operators have been added. The immediate motivation for
this is to allow transformations such as "checkpoint-and-recompute", which
allows trading off memory for additional compute.

Adding several callbacks like this has made DPM's API less than ideal at this
stage. However, I could not find any reasonable alternative.

* [DT] [33/n] Compile flow task groups

Task groups need to be compiled in order to pickle the object in fblearner. However, I also changed the Job's compile function, since creating a new object is not necessary.

* Initial commit for sparse_normalize vectorization and benchmark

* [GanH]: LB Calibration for JSD

as titled

* Tracing event in async executor

Adding event tracing through the TRACE_EVENT macro in the async executor

* [Resubmit] D7409751 Reseting book-keeping blobs when the reservoir is reset

D7409751 got lost in D7464958

* Visualizing realtime weights values

We want to visualize the weight values as the optimizer iterates. This diff adds support for visualizing the weights at an assigned index.
Currently, we assume the blob is 2-dimensional.

* [GanH][Easy]: Fix Homotopy Weighting

Apparently, there was a bug in the homotopy weight (alpha, beta) update.

* [c2] move sparse hash unique op out of oss

so that OSS does not need to depend on the Google hash map.

* Get rid of std::round as it's not supported on Android

* Revert changes on setup.py

* Skip shaky test on Dataio

* fix
2018-04-10 21:11:43 -07:00
Orion Reblitz-Richardson
dbac044759 Add protobuf wrapper functions to proto_utils.
* These will be used when we statically link libprotobuf.a inside libcaffe2.so
2018-03-28 10:05:20 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Yangqing Jia
2d03ae2f85 Move ParseProtobufFromLargeString to proto_utils (#2354)
* Move ParseProtobufFromLargeString to proto_utils

* ParseProtobuf -> ParseProto to be consistent in naming
2018-03-21 17:05:14 -07:00
Yangqing Jia
611a89c4b6 Remove more protobuf APIs. (#2348)
* Wrap ShutdownProtobufLibrary

* Remove text_format.h header and only put the function in proto_utils.h

* ParseFromString returns bool
2018-03-21 10:29:45 -07:00
Ilia Cherniavskii
1149b9bbb5 Polling async net executor
Summary:
Implementation of the polling async net executor.
Notes:
- New net executor, async_polling: schedules CPU and GPU ops asynchronously using a single polling thread
- Events: updates Caffe2 events to support async CPU events, adding new methods:
 Query() - non-blocking check of event state: INITIALIZED -> RECORDED -> SUCCESS/FAILED
 ErrorMessage() - when an operation runs asynchronously and fails, calling this on the event returns the error message
- Tasks: uses the existing DAGNet algorithm to compute CPU and GPU chains, with a separate task for each chain
- Polling: a single thread queries the state of events - for CPU tasks it atomically queries the task state, for GPU tasks it uses cudaEventQuery
- Scheduling of CPU ops: uses the global thread pools
- Scheduling of GPU ops: uses a per-device GPU thread pool

Reviewed By: dzhulgakov

Differential Revision: D5985110

fbshipit-source-id: a9de7fcbb71d046a3aa1b573072b89a65dfeee8c
2017-11-03 07:27:44 -07:00
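The polling structure described above can be sketched in a few lines. This toy version is illustrative only and is not the caffe2 async_polling executor; it shows just the event state machine and the single polling thread, with scheduling of dependents elided.

```
// Toy sketch of the polling pattern: one thread repeatedly takes
// non-blocking looks at task/event states and would schedule dependents
// as their predecessors finish. Not the caffe2 implementation.
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

enum class EventState { INITIALIZED, RECORDED, SUCCESS, FAILED };

struct Task {
  std::atomic<EventState> state{EventState::INITIALIZED};
};

void PollingLoop(std::vector<Task>& tasks) {
  for (;;) {
    bool all_done = true;
    for (auto& task : tasks) {
      const EventState s = task.state.load();
      if (s != EventState::SUCCESS && s != EventState::FAILED) {
        all_done = false;  // still pending; ready successors of finished
                           // tasks would be handed to thread pools here
      }
    }
    if (all_done) break;
    std::this_thread::sleep_for(std::chrono::microseconds(100));
  }
}
```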
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Alisson Gusatti Azzolini
68f358452b Add node_name to DeviceOption
Summary: Allow for generalizing net transforms.

Reviewed By: Yangqing

Differential Revision: D5812140

fbshipit-source-id: e3f30acad362ae1f0614ee218d331b525710b88e
2017-09-13 16:04:04 -07:00
Aapo Kyrola
631971e459 threaded RNN executor for CPU, multi-stream executor CUDA
Summary:
Special executor for RNNs that can exploit parallelism over timesteps. For CPU we use multi-threading, achieving roughly a 3x improvement on 4-layer LSTMs.
With CUDA, perf improvements are more modest, but the structure allows for further optimization. For CUDA, we use multiple streams and events when there is parallelism
over timesteps. In my experiments, it was not beneficial to use more than 2 streams, though.

The flag --caffe2_rnn_executor can be used to switch the executor off.

Reviewed By: salexspb

Differential Revision: D5749304

fbshipit-source-id: d6f76b3e16598be5b4e8188aff031671ebafaa4c
2017-09-06 12:26:30 -07:00
Yangqing Jia
65112f3865 code cleanup: separate the several net implementations into separate files.
Summary: TSIA.

Reviewed By: harouwu

Differential Revision: D5670906

fbshipit-source-id: 507e789978144341bf696fb20dc11f3c2d55493b
2017-08-21 22:07:48 -07:00
Aapo Kyrola
a53192e334 Revert D5001637: [Caffe2][RNN] Threaded dependency-aware RNNExecutor (frontier/diagonal execution).
Summary:
This reverts commit 3d0a71593d73a9ff22f4c1a5c9abf2a4a0c633c8

bypass-lint

Differential Revision: D5001637

fbshipit-source-id: 4d6250ae7e66ea0aa635a68d943d552e5db65b69
2017-08-16 03:21:49 -07:00
Aapo Kyrola
453c60ce28 Threaded dependency-aware RNNExecutor (frontier/diagonal execution).
Summary:
This diff adds dependency-aware concurrent/parallel execution of operators in stepnets. For CPU, we use multi-threaded execution. For CUDA, we use multiple streams and CUDA events for parallelism and dependency tracking.

Much of the diff is about computing the dependency graph, which was quite tricky because we also need to avoid write races between operators running in multiple timesteps in parallel. Also, recurrent blobs "change name" when passing over a timestep ("_prev"), so that needs to be handled as well.

This diff also restores the link ops that I unlanded earlier.

The performance gain of this diff is very good for CPU (same perf as with static_dag, even better for forward-only). On CUDA, the gains are modest, at least with the sizes I was testing with.

Reviewed By: salexspb

Differential Revision: D5001637

fbshipit-source-id: 3d0a71593d73a9ff22f4c1a5c9abf2a4a0c633c8
2017-08-15 23:55:15 -07:00
Andrew Tulloch
ac3a1328d5 Remove unnecessary .proto files
Reviewed By: Yangqing

Differential Revision: D5577595

fbshipit-source-id: cd234893a1be3807aca3195bb29aab7ecfee2d8a
2017-08-08 07:17:07 -07:00
Victor Gao
34be12353b comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 15:14:43 -07:00
Junjie Bai
44790697c7 Nuke arg_helper() in OperatorBase
Reviewed By: akyrola

Differential Revision: D5449624

fbshipit-source-id: 20ff6568fe3482af94d1d266e9b47a1709b5004e
2017-07-19 13:52:39 -07:00
Junjie Bai
5881aa0a78 Use shared_ptr to share OperatorDef across threads
Reviewed By: akyrola

Differential Revision: D5434291

fbshipit-source-id: 89f470d1e2dcde36c3273d86565b1952d7682808
2017-07-17 23:49:59 -07:00
Dmytro Dzhulgakov
f8089c789c One more proto_utils.h fix
Reviewed By: ajtulloch

Differential Revision: D5380322

fbshipit-source-id: b1aa445984bf87feb81dcf08f782f48777d359c5
2017-07-07 02:47:50 -07:00
Dmytro Dzhulgakov
87730360d1 Small improvements to CreateOperatorDef
Summary:
- allow initializer lists directly with the `vector<string>{}` part, thanks to default initialization
- reduce the number of instances

Reviewed By: nicolasvasilache

Differential Revision: D5370056

fbshipit-source-id: b8fae3b12144257644e098b284df7369d5bdb377
2017-07-05 11:50:01 -07:00
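As an illustration of the usage pattern the diff above streamlines, here is a hedged example. The exact overload set lives in caffe2/utils/proto_utils.h, so treat the signature shown here as approximate.

```
// Approximate usage sketch: constructing an OperatorDef with inputs/outputs
// passed as brace-initialized vectors. Exact overloads are defined in
// caffe2/utils/proto_utils.h and may differ slightly from this.
#include <string>
#include <vector>
#include "caffe2/utils/proto_utils.h"

caffe2::OperatorDef MakeRelu() {
  return caffe2::CreateOperatorDef(
      "Relu",                          // op type
      "",                              // op name
      std::vector<std::string>{"X"},   // inputs
      std::vector<std::string>{"Y"});  // outputs
}
```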
Fei Sun
39fa092a13 Constant string is generated from Protobuf instead of Thrift
Summary: To make the predictor open source, move the constants that are generated from Thrift to Protobuf.

Reviewed By: salexspb

Differential Revision: D4656884

fbshipit-source-id: d4dbb3416e8396185e0981fcd9a090fbb054a18a
2017-04-04 15:03:39 -07:00
Alexander Sidorov
fe9a243b83 Add default value for GetRepeatedField
Summary:
This is just by analogy with GetSingleArgument, which already
has default_value support

Reviewed By: Yangqing

Differential Revision: D4819789

fbshipit-source-id: cf271d9f345f14f3e373186365726c738c1c26f3
2017-04-03 12:04:22 -07:00
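A hedged usage sketch of the pattern this enables, mirroring the GetSingleArgument analogy the summary points to. The exact method names and signatures are in caffe2/utils/proto_utils.h; in particular, the repeated accessor is written here as GetRepeatedArgument and may not match the exact helper touched by this diff.

```
// Hedged usage sketch: reading arguments off an OperatorDef with defaults.
// Method names/signatures are approximate; see caffe2/utils/proto_utils.h.
#include <vector>
#include "caffe2/utils/proto_utils.h"

void ReadArgs(const caffe2::OperatorDef& op) {
  caffe2::ArgumentHelper helper(op);
  // Scalar argument with a fallback when the op does not set it.
  const int axis = helper.GetSingleArgument<int>("axis", 1);
  // Repeated argument with a fallback, analogous to the scalar case.
  const std::vector<int> shape =
      helper.GetRepeatedArgument<int>("shape", std::vector<int>{1, 1});
  (void)axis;
  (void)shape;
}
```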
Dmytro Dzhulgakov
8a35fea9eb Improve error message for not found operator
Summary: Seems like a lot of confusion in the group lately has been about missing CUDA operators. Let's make it clearer in the error message.

Reviewed By: azzolini

Differential Revision: D4737037

fbshipit-source-id: 56c7819df909bf954510296703bff5f221fa8ae7
2017-03-21 10:35:00 -07:00
Yangqing Jia
204867a884 in lite mode, return the non-readable string, better than nothing.
Summary: TSIA

Reviewed By: bwasti

Differential Revision: D4379950

fbshipit-source-id: 8a5d0b5454c2f1b874526f4393c4b575966bc889
2017-01-17 11:59:30 -08:00
Dmytro Dzhulgakov
4de888e167 Add optional gradient on weights for (Sparse)LengthsWeightedSum
Summary:
It ended up much messier than originally expected. Maybe we should have just hardcoded it, but I've tried to be "generic" so far at the expense of code readability.

The main issue is that for the weight computation we need access to the original embedding matrix, and in the sparse case we need to re-look up the embeddings to do the dot product with the output grads.

Thus I'm making weight grad computation optional, controlled by a flag; it triggers invocation of a different backward op that produces both grads at the same time.

So far it's implemented only for the 'Lengths' version. It'd be straightforward to implement the (Un)SortedSegment versions, but I haven't done that yet.

Reviewed By: kennyhorror

Differential Revision: D4388215

fbshipit-source-id: 23132ab7daa1f5eec49233f802af1fe75b469c2b
2017-01-11 11:44:38 -08:00
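For intuition on why the re-lookup is needed, here is the gradient for one segment written out. This is a sketch consistent with the description above, not taken from the diff itself.

```
% For a single segment with output o = \sum_j w_j x_j, where x_j are the
% looked-up embedding rows and w_j the per-row weights:
\[
  \frac{\partial L}{\partial x_j} = w_j\,\frac{\partial L}{\partial o},
  \qquad
  \frac{\partial L}{\partial w_j} =
      \Big\langle \frac{\partial L}{\partial o},\; x_j \Big\rangle .
\]
% The weight gradient is a dot product with the output grads that needs the
% embedding rows x_j again, hence the extra lookup in the sparse case.
```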
Aapo Kyrola
95b3309a87 Gradient Input memory sharing using memonger blob sharing
Summary:
This diff brings us roughly to par with Torch on ResNet memory usage. At batch size 32, ResNet-50 took 7497 MiB; after this, 5010 MiB. This will thus allow us to handle 64 images / GPU, or 256 images / 4 GPUs.

In addition, I added a special argument to DagNet that causes it to run only one thread for the first iteration. This is needed since there are allocations on the first iteration's backward pass due to gradient sharing, and these would cause NCCL to deadlock.

The sharing of gradient buffers requires inferring which gradients can share memory (i.e. that they are not used concurrently). The previous memonger code uses a topological sort, but rbgirshick showed that it does not work with tree-like models. Thus, I wrote a new optimization algorithm based on DFS. It takes about 0.25 secs / GPU on ResNet-50, so it is clearly fast enough.

Module data_parallel_model supports this feature natively.

Reviewed By: prigoyal

Differential Revision: D4363209

fbshipit-source-id: 73b11e7610438098bb11bff0af8075ab0cf2c0f1
2017-01-09 19:44:23 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
Yangqing Jia
d1e9215184 fbsync 2016-10-07 13:08:53 -07:00
Yangqing Jia
3d54e7b40e fbsync: changes to implement operator schema 2016-09-08 18:07:01 -07:00
Yangqing Jia
6463eebc7b chunky sync - build scripts to be written 2016-07-21 10:16:42 -07:00
Yangqing Jia
559053d3a8 chunky sync 2016-05-13 14:43:48 -07:00
Yangqing Jia
05465783c6 optionally use protobuf lite 2015-12-05 16:15:00 -08:00
Yangqing Jia
a71667859f I thought I removed this. Maybe on another machine? 2015-11-25 10:16:37 -08:00
Yangqing Jia
a74d606df7 A collection of changes:
(1) Registry now uses std::function for more flexible use cases.
(2) dropout adds an "is_test" keyword.
(3) Making all gradients registered via C++. Python still provides the gradient wrapper.

The TODO item is to make the autograd SSA in C++ if possible. The problem is that if we want to dynamically
register Python gradients, we will be sort of screwed, because in C++ things are registered
via static variables.
2015-11-07 16:12:18 -08:00
Yangqing Jia
648d1b101a A consolidation of a couple of random pieces of weekend work.
(1) various bugfixes.
(2) Tensor is now a class independent of its data type. This allows us
    to write type-independent operators more easily.
(3) code convention changes a bit: dtype -> T, Tensor<*Context> -> Tensor* alias.
(4) ParallelNet -> DAGNet to be more consistent with what it does.
(5) Caffe's own flags library instead of gflags.
(6) Caffe's own logging library instead of glog, but glog can be chosen with
    compile-time definition -DCAFFE2_USE_GOOGLE_GLOG. As a result, glog macros
    like CHECK, DCHECK now have prefix CAFFE_, and LOG(*) now becomes
    CAFFE_LOG_*.
(7) an optional protobuf inclusion, which can be chosen with USE_SYSTEM_PROTOBUF
    in build_env.py.
2015-10-11 23:14:06 -07:00
Yangqing Jia
a07c255d16 Some utility function changes 2015-07-29 09:21:02 -07:00
Yangqing Jia
2ed1077a83 A clean init for Caffe2, removing my earlier hacky
commits.
2015-06-25 16:26:01 -07:00