Summary:
Removes the `cppcoreguidelines-avoid-non-const-global-variables` check and its NOLINT stubs, as the GoogleTest `TEST` macro is non-compliant with the check, as is `DEFINE_DISPATCH`.
All changes except those to `.clang-tidy` were generated using the following script:
```
for i in $(find . -type f \( -iname "*.c*" -or -iname "*.h" \) \
           | xargs grep cppcoreguidelines-avoid-non-const-global-variables \
           | cut -f1 -d: | sort | uniq); do
  sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" "$i"
done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os


def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files


def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])


def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)


if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53319
Noticed these in profiles.
Also switch to `unordered_map`.
Test Plan: Unit tests.
Reviewed By: swolchok
Differential Revision: D26504408
fbshipit-source-id: 9e14d55909a4af019058b8c27c67ee2348cd02a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39493
Make sure we wait for all types, including async CPU ops
Test Plan: CI
Reviewed By: kennyhorror
Differential Revision: D21873540
fbshipit-source-id: 37875cade68e1b3323086833f8d4db79362a68e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10766
Added a `Workspace::ForEach(...)` API for accessing the global set of
existing Workspace instances. This is used in the signal handler to print blob
info on the thread receiving a fatal signal.
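A minimal usage sketch, assuming the callable receives each live `Workspace*` (the exact signature is an assumption based on current Caffe2 headers):
```cpp
#include <iostream>

#include "caffe2/core/workspace.h"

using caffe2::Workspace;

// Dump the blob names of every workspace that currently exists, e.g. from
// a fatal-signal handler as described above.
void DumpAllWorkspaces() {
  Workspace::ForEach([](Workspace* ws) {
    for (const auto& blob_name : ws->Blobs()) {
      std::cerr << blob_name << "\n";
    }
  });
}
```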
Reviewed By: mraway
Differential Revision: D9147768
fbshipit-source-id: a94d0b5e6c88390a969ef259ecb8790173af01a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10217
It's only used in debug printing and is not that reliable anyway. If we want to implement it later, we should do it with proper accounting for shared storages.
Reviewed By: jerryzh168
Differential Revision: D9155685
fbshipit-source-id: 48320d41a0c4155645f3ba622ef88730a4567895
Summary: Adds support for backprop to While op, fixes gradient computation for Pow
Reviewed By: azzolini
Differential Revision: D6456875
fbshipit-source-id: 9f660317ad6f3898ff7d8ce43098f85c3426409b
Summary:
Pretransposing FCs seems to offset the losses we get from low
batch sizes in AdIndexer. First I confirmed this on local benchmarks (see
previous diff). Then in https://fburl.com/yuo49onj I showed that this
change saves 19% of FC time on AdIndexer, which is already $0.4M in
cap. exp. and over 3 years gives 5x more ROI.
We can also reuse this code for later, more efficient gemm
implementations. E.g., msmelyan is working on a new fp16 gemm which
would cut bandwidth usage 2x; the code in this diff can be reused for
the repacking required by the new gemm.
In this diff I had to take care of memory usage. Here are several
possible approaches to the transformation:
1. Perform the transposition on the fly, copying the memory. This is what is done in
skinny gemm (FC with engine SKINNY).
Cons: slow first execution, memory is replicated for each thread.
2. Copy the weights in the operator constructor. On the fly, in dbg
mode, verify that the hash of the original weights is unchanged.
Cons: memory is still replicated for each thread.
3. Copy the weights in the Predictor constructor.
Cons: if we have 2 predictors sharing the same weight blob (via
PredictorContainer), we still get 3x more memory, i.e. the original
weights plus one copy for each of the predictors in the container.
4. Replace the weights in the Predictor constructor, taking care of the mapping to
support weight sharing within a PredictorContainer.
This is the approach taken in this diff; it solves the issues above and
doesn't create any memory overhead (see the sketch below).
Cons: the logic became complex and requires a mutex at initialization time.
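A minimal sketch of the idea behind approach 4 (all names here are hypothetical stand-ins, not the actual code in this diff): a process-wide map from the original weight blob to its pretransposed copy, guarded by a mutex, so predictors that share a weight blob also share a single repacked copy.
```cpp
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a pretransposed FC weight.
struct PackedWeight { /* repacked weight data */ };

// Hypothetical repacking routine (the actual transposition is elided).
PackedWeight Pretranspose(const void* /*original*/) { return PackedWeight{}; }

// One repacked copy per original weight blob, shared by all predictors that
// reference the same blob (e.g. via PredictorContainer).
std::shared_ptr<PackedWeight> GetOrCreatePacked(const void* original_blob) {
  static std::mutex mu;
  static std::map<const void*, std::weak_ptr<PackedWeight>> cache;
  std::lock_guard<std::mutex> guard(mu); // only contended at init time
  auto& slot = cache[original_blob];
  if (auto existing = slot.lock()) {
    return existing; // another predictor already repacked this weight
  }
  auto packed = std::make_shared<PackedWeight>(Pretranspose(original_blob));
  slot = packed;
  return packed;
}
```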
Reviewed By: akyrola
Differential Revision: D6214593
fbshipit-source-id: 25da6ba7bfd39fc8f4b578094d3f334c7957490d
Summary: Adds the ability to create a local blob in a workspace even if the blob exists in the parent workspace. This supports cases where a user wants to create a local copy of a blob, shadowing the one in the parent workspace.
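A sketch of the resulting behavior (assuming the shared-workspace constructor and `CreateLocalBlob` as they appear in current Caffe2 headers):
```cpp
#include "caffe2/core/workspace.h"

using caffe2::Workspace;

void Example() {
  Workspace parent;
  parent.CreateBlob("w");

  Workspace child(&parent);
  child.GetBlob("w");         // resolves to the parent's blob
  child.CreateLocalBlob("w"); // shadows it with a child-local blob
  // From now on, "w" in the child refers to the local copy; the
  // parent's "w" is untouched.
}
```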
Reviewed By: akyrola
Differential Revision: D6194386
fbshipit-source-id: 92c064159ac635ee76c211abc013b72bd8752447
Summary:
Adding backward pass support for If operator:
- Implemented necessary changes to Do operator and generation of gradient Do operator to properly forward gradient blobs in and out of subnet
- Using WorkspaceManager to keep track of workspaces used by Do, in case we need to have access to local blobs to compute gradients (also important for loop's backprop)
- Update to Workspace to handle blob binding from multiple parent workspaces
- Implemented generation of gradient If operator
- Unit test to build and train a net with If control op
Reviewed By: azzolini
Differential Revision: D5745096
fbshipit-source-id: 1023c90a2113716254424d1e50b9e560fe9083e5
Summary:
Better isolation for workspaces, allowing forwarding of selected blobs
from a parent to a child workspace, possibly under new names. Used for proper
isolation of subnets (loops, then/else branches, etc.) from the outer workspace.
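A sketch of the intended usage (the constructor signature and the direction of the name mapping are assumptions based on the description above):
```cpp
#include <string>
#include <unordered_map>

#include "caffe2/core/workspace.h"

using caffe2::Workspace;

void Example() {
  Workspace outer;
  outer.CreateBlob("outer/x");

  // Forward only "outer/x" into the child, renamed to "x"; all other outer
  // blobs stay invisible to the subnet running in the child workspace.
  std::unordered_map<std::string, std::string> forwarded = {{"x", "outer/x"}};
  Workspace child(&outer, forwarded);
}
```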
Reviewed By: azzolini
Differential Revision: D5681667
fbshipit-source-id: e61a2c7c98ee2abf1f0761905f4bfae47c201c32
Summary:
Running `xplat/caffe2/fb_sync.sh`.
Also adds two new core sources to the BUCK file, and adds `createSharedBuffer` to NNPACKConvOp.
Reviewed By: ajtulloch
Differential Revision: D5373061
fbshipit-source-id: c030b2629d2715e1d2776c98715f57e2650922c9
Summary: Rather chunky sync of changes made exclusively to mobile codebases back to fbcode.
Reviewed By: ajtulloch
Differential Revision: D5314405
fbshipit-source-id: c4d0a7244468f953eb63288306bc9bc78eb9e1be
Summary:
A quite common, hard-to-debug performance bug in multi-GPU training has been operators being passed tensors that reside on a different GPU than the one the op runs on. Since we have peer access enabled, this works, but it is just much slower. With data parallel model this problem arises rarely, as it has static analysis of the operators, but if someone bypasses DPM or uses FeedBlob with incorrect device options, this problem can happen.
To make debugging easier, I added a device field to the tensor that stores the device information recorded when the memory was allocated. In addition, I added a function that goes through operator inputs and outputs and compares each tensor's device to the operator's device. This check is run after the first iteration, and only with prof_dag.
Also renamed ShapeCall to TensorInfoFun, as it now returns much more info than just the shape.
I think this is a pretty safe diff, but do you find it problematic to add a new field to tensor?
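A self-contained sketch of the check (simplified stand-ins, not the actual Caffe2 types):
```cpp
#include <iostream>
#include <string>
#include <vector>

// Simplified stand-ins for the Caffe2 operator/tensor types.
struct TensorInfo {
  std::string name;
  int device_id; // GPU recorded at allocation time by the new device field
};
struct OpInfo {
  std::string type;
  int device_id; // GPU the operator runs on
  std::vector<TensorInfo> inputs, outputs;
};

// Warn whenever an op touches a tensor allocated on a different GPU: with
// peer access enabled this still works, just much slower.
void CheckTensorDevices(const std::vector<OpInfo>& ops) {
  for (const auto& op : ops) {
    auto check = [&op](const std::vector<TensorInfo>& tensors, const char* verb) {
      for (const auto& t : tensors) {
        if (t.device_id != op.device_id) {
          std::cerr << "Op " << op.type << " (GPU " << op.device_id << ") "
                    << verb << " blob " << t.name << " allocated on GPU "
                    << t.device_id << "\n";
        }
      }
    };
    check(op.inputs, "reads");
    check(op.outputs, "writes");
  }
}
```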
Reviewed By: dzhulgakov
Differential Revision: D5335505
fbshipit-source-id: 511b6c122dff9a205f43951984868ffd40f7ac30
Summary: This RunPlan is getting complex and confusing. The first step to clean it up is to move it out of workspace.cc to better mark separation of concerns.
Reviewed By: kennyhorror
Differential Revision: D5100721
fbshipit-source-id: 4be0559eba1abb8bb1ddc3818698763c2e014ef2
Summary:
This is a preamble for the "diagonal executor". Instead of creating a Net for each timestep, we have a single executor for the RecurrentNetworkOp that manages ops per timestep.
This will be used if net_type='rnn', so one can still use the old way by setting a net type of 'simple' or 'dag' (an effective kill switch if there are issues with this); see the sketch below.
Did this only for the forward model. The gradient op will follow later on; it is basically similar, just in reverse order.
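Opting in or out is just a matter of the step net's `type` field; a sketch (assuming the standard `NetDef` proto):
```cpp
#include "caffe2/proto/caffe2.pb.h"

void ConfigureStepNet(caffe2::NetDef* step_net) {
  step_net->set_type("rnn");       // use the new per-timestep executor
  // step_net->set_type("simple"); // kill switch: fall back to the old path
  // step_net->set_type("dag");    // ...or the dag executor
}
```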
Reviewed By: salexspb
Differential Revision: D4979933
fbshipit-source-id: bda77918ec518cb6b29d7021ee036d59eb2dd303
Summary:
This is from a discussion with dzhulgakov: as a step towards revisiting the
core.Net autonaming, we will first guard against accidental overwrites of
existing networks in the workspace.
ajtulloch, since we are doing Predictors on mobile, this should be safe, right?
azzolini - I assume this would be safe, but would love to get your approval.
akyrola - would this hurt xray?
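A sketch of the guarded behavior (the `overwrite` flag is how current Caffe2 exposes this escape hatch; treat the exact signature as an assumption):
```cpp
#include "caffe2/core/workspace.h"

void Example(caffe2::Workspace* ws, const caffe2::NetDef& net_def) {
  ws->CreateNet(net_def);                     // creates the net
  ws->CreateNet(net_def);                     // rejected by the new guard: name already exists
  ws->CreateNet(net_def, /*overwrite=*/true); // explicit overwrite allowed
}
```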
Reviewed By: dzhulgakov
Differential Revision: D4897725
fbshipit-source-id: aa41271927ad6671f07a53b9505283623f8c49e5
Summary: Instead of reporting the total number of elements of a tensor, report the number of bytes. Note that this reports the capacity of the tensor, not the number of bytes currently in use.
Reviewed By: jamesr66a, salexspb
Differential Revision: D4851633
fbshipit-source-id: 464d552f41f1b5f25753b0e7001d299b6dac1966
Summary:
Added the Caffe2 command-line option --caffe2_print_blob_sizes_at_exit=1 which, when enabled, prints all tensor sizes in the workspace destructor. Especially handy when using sub-workspaces, as with RNNs. Note that the sizes are numbers of elements, not bytes. The output is designed to be easily copy-pasteable into Excel.
TODO: add sorting
Reviewed By: jamesr66a
Differential Revision: D4844628
fbshipit-source-id: 11608a1710ae5c89bbd741edb506d25496606185
Summary:
When the execution step represents something like:
```
for loop
  execution_step
    net1
  execution_step
    net2
    net3
```
the preparation cost for the execution step is too high.
This diff moves most of the shared information into the CompiledExecutionStep to save time.
After the change, the benchmark results for the parameter server handler are as follows (be aware that the first two have some variance):
```
INFO:__main__:==Summary==
INFO:__main__:Time <function case_if at 0x7f7160c32938> 0.0752924203873
INFO:__main__:Time <function case_loop at 0x7f7160c329b0> 0.0677666187286
INFO:__main__:Time <function case_simple_net at 0x7f7160c32a28> 0.0605396509171
INFO:__main__:Time <function case_one_loop at 0x7f7160c32aa0> 0.0611681699753
```
Before the change:
```
INFO:__main__:==Summary==
INFO:__main__:Time <function case_if at 0x7f19d079f848> 0.100815701485
INFO:__main__:Time <function case_loop at 0x7f19d079f8c0> 0.0864136457443
INFO:__main__:Time <function case_simple_net at 0x7f19d079f938> 0.0614696979523
INFO:__main__:Time <function case_one_loop at 0x7f19d079f9b0> 0.0598972082138
```
Reviewed By: azzolini
Differential Revision: D4643926
fbshipit-source-id: 5a4b97230ba778e0ff5cbafc8a216335a191068a
Summary:
Previously we had several limitations for a reporter net:
- it needed to be a net, not an execution step
- only one was allowed per execution step, with a single interval
Now, "reporter nets" become reporter steps, and multiple of them can be specified with different timeouts.
Reviewed By: dzhulgakov
Differential Revision: D4583686
fbshipit-source-id: ad7266e16f96e7829fd24dcc1f165f39e9db573d
Summary:
We get flaky LSTM tests on a numerical gradient check. I
would like to improve the accuracy of the latter, but first I need an
example. After landing this, TestWarden would find a bad input for me.
Reviewed By: urikz
Differential Revision: D4467223
fbshipit-source-id: 68d4bf22af11190f39fa28332c6d99efbb192132
Summary:
The old heuristic functioned badly on octa-core phones (e.g., the S6). Limiting the number of threads to 4 in the 8-core case seemed to give optimal performance. For 4 cores, 3 threads still seems to yield the best performance, as do 2 threads for 2 cores on the iOS phones, though those cores are very different from the typical ARM cores in Android phones.
I figure that at the limit we should restrict ourselves to half the cores available, especially since in a big.LITTLE configuration only half the cores are likely to be big.
I need to get my hands on a deca-core phone or tablet to try out this heuristic, but I certainly figure that this will function better than what we had before (which would have been 9 threads on a 10-core device).
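A sketch of the heuristic as described above (illustrative only, not the literal code):
```cpp
// Hand-tuned thread counts for small core counts; at the limit, use half
// the cores, since in big.LITTLE only half are likely to be big.
int ChooseNumThreads(int num_cores) {
  switch (num_cores) {
    case 1:
      return 1;
    case 2:
      return 2; // iOS dual-core phones: both cores
    case 4:
      return 3; // quad-core: 3 threads performs best
    default:
      return num_cores / 2; // e.g. 8 cores -> 4 threads, 10 -> 5
  }
}
```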
Reviewed By: ajtulloch
Differential Revision: D4220341
fbshipit-source-id: 06fa7677789fcdbec03d98bb85a565f1d22099e1
(1) Various bugfixes.
(2) Tensor is now a class independent of its data type. This allows us
to write type-independent operators more easily.
(3) The code convention changes a bit: dtype -> T, Tensor<*Context> -> Tensor* alias.
(4) ParallelNet -> DAGNet, to be more consistent with what it does.
(5) Caffe's own flags library instead of gflags.
(6) Caffe's own logging library instead of glog, but glog can be chosen with the
compile-time definition -DCAFFE2_USE_GOOGLE_GLOG. As a result, glog macros
like CHECK and DCHECK now have the prefix CAFFE_, and LOG(*) becomes
CAFFE_LOG_* (see the sketch after this list).
(7) An optional protobuf inclusion, which can be chosen with USE_SYSTEM_PROTOBUF
in build_env.py.
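A minimal illustration of the renaming in (6); the fallback definitions here are simplified stand-ins for the built-in logging library, not its actual implementation:
```cpp
#ifdef CAFFE2_USE_GOOGLE_GLOG
// With glog chosen at compile time, the CAFFE_ spellings map back to glog.
#include <glog/logging.h>
#define CAFFE_CHECK(cond) CHECK(cond)
#define CAFFE_LOG_INFO LOG(INFO)
#else
// Stand-in definitions so the example is self-contained.
#include <cassert>
#include <iostream>
#define CAFFE_CHECK(cond) assert(cond)
#define CAFFE_LOG_INFO std::cout
#endif

int main() {
  int x = 42;
  CAFFE_CHECK(x > 0);                       // was: CHECK(x > 0)
  CAFFE_LOG_INFO << "value: " << x << "\n"; // was: LOG(INFO) << ...
  return 0;
}
```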