Summary:
The NSString writeToFile:atomically: method was deprecated in iOS 2.0.
This diff replaces it with a call to writeToFile:atomically:encoding:error:
duplicate of D51003188 to fix gh permissions
Test Plan: ci
Differential Revision: D51164941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113377
Approved by: https://github.com/kirklandsign
Summary: This can help debug issues esp fc/bc issues with coreml tools, when a model fails to load.
Test Plan:
On a macbook fbsource,
```
arc focus2 -b pp-ios -a ModelRunner -a //xplat/caffe2/c10:c10Apple -a //xplat/caffe2/fb/dynamic_pytorch:dynamic_pytorch_implApple -a //xplat/caffe2:coreml_delegateApple --auto-test-schemes --force-with-wrong-xcode
```
It builds and runs the Playground app using a bunch of coreml models on my iPhone. Here is one for example,
https://pxl.cl/3nSPn
Also forcefully triggering MLModel ctor failure to test this code by setting a `modelURL=nil`, and as expected got this,
```
libc++abi: terminating due to uncaught exception of type c10::Error: Error loading MLModel Error details: Localized_description: nil value for URL Domain: com.apple.CoreML Code: 3 User Info: {
NSLocalizedDescription = "nil value for URL";
} Input Shapes: N/A
Exception raised from compile at xplat/caffe2/torch/csrc/jit/backends/coreml/objc/PTMCoreMLBackend.mm:162 (most recent call first):
(no backtrace available)
```
Instead of a previous message would have been,
```
Loading MLModel failed
```
Unrelated issues
* P829736691 - with running MaskRCNN on Coreml with the Playground app. Only happens some times.
* P829741377 - with Metal Operator Tests with the Playground app.
Differential Revision: D49349726
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109444
Approved by: https://github.com/kimishpatel
This PR enables `-Winconsistent-missing-destructor-override` and `-Winconsistent-missing-override`
and fixes violations.
<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 47e904e</samp>
This pull request updates the code of various classes and operators in the `caffe2` and `aten` subdirectories to use the `override` specifier instead of the `virtual` keyword for destructors and other virtual functions that override a base class function. This improves the code readability, quality, and consistency with C++ best practices. It also modifies the `./CMakeLists.txt` file to enable warnings for these specifiers, but disable errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104032
Approved by: https://github.com/malfet
Summary:
https://www.internalfb.com/logview/details/instagram_ios_crashes/d5fd49a99f3ee21a82b66861de797711
CoreML is crashing in torch::jit::mobile::coreml::CoreMLBackend::compile(c10::IValue, c10::Dict<c10::IValue, c10::IValue>) (PTMCoreMLBackend.mm<175>)
This is related to the crash here https://www.internalfb.com/logview/details/instagram_ios_crashes/a8a317c8da13cd577529e1763364f496/?trace_key=8002f84f5ea00ac68b0dfb91878c754a&selected-logview-tab=shared
kimishpatel's original fix here D44386623 by passing modelID by value instead of reference, however I believe it just moved the error to loadModel invocation.
When we create a copy of modelID on loadModel invocation, it is a reference to the string within the preprocessed IValue payload. When the payload is deallocated, modelID is no longer valid and the dispatched thread still tries to use it causing the error
Test Plan:
```
Running with tpx session id: 2a77b7b1-7594-4479-8ac3-c01db29cf5cc
Trace available for this run at /tmp/tpx-20230407-173155.849234-2a77b7b1-7594-4479-8ac3-c01db29cf5cc/trace.log
RemoteExecution session id: reSessionID-2a77b7b1-7594-4479-8ac3-c01db29cf5cc-tpx
I0407 17:31:55.970502 780835 ConfigeratorDomainConfigs.cpp:177] Notify user with updated size: 92 removed size: 0
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/1970325002807752
✓ ListingSuccess: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests : 13 tests discovered (0.177)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - PyTorchBITests/testBITextModel (0.028)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - PyTorchBITests/testBIXRayModel (0.167)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - PyTorchCPUBlasTests/testGemmComplexDouble (0.001)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - PyTorchCPUBlasTests/testGemmComplexFloat (0.001)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - PyTorchCPUBlasTests/testGemmDouble (0.001)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - PyTorchCPUBlasTests/testGemmFloat (0.001)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - PyTorchCoreMLTests/testGanModel (0.303)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - PyTorchCoreMLTests/testMCSModel (0.395)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - PyTorchCoreMLTests/testMCSModelInvalidInputShape (0.305)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - PyTorchCoreMLTests/testXirpModel (0.110)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - PyTorchDynamicPyTorchTests/testDynamicPytorchFamFlDictModel (0.014)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - PyTorchDynamicPyTorchTests/testDynamicPytorchFamFlModel (0.005)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - PyTorchDynamicPyTorchTests/testDynamicPyTorchXirpModel (0.065)
✓ Pass: //fbobjc/Apps/Internal/PyTorchPlayground:PyTorchPlaygroundTests - main (13.177)
```
Differential Revision: D44808433
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98655
Approved by: https://github.com/SS-JIA, https://github.com/tiandiao123, https://github.com/kirklandsign
Summary:
We don't want to load when loading model on Core ML and `at::empty` is considered an op.
So replace it with from_blob.
Test Plan:
Run Core ML backend to ensure it works for existing use cases.
Also test running Core ML backend without any ops.
Differential Revision: D43961679
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96564
Approved by: https://github.com/f-meloni, https://github.com/kimishpatel
Summary:
When performing inference using the Core ML delegate, memory is increasing indefinitely. This is due to Core ML allocating memory within `predictionFromFeatures:error:`. Seems that the autorelease pool does not release the return values from the prediction method until inference is stopped completely. So we need to release with `autoreleasepool` manually ([per Apple guidance in the Apple Developer Forums](https://developer.apple.com/forums/thread/692425)).
This commit wraps `autoreleasepool` around the `execute` function of `PTMCoreMLBackend`, which is the scope of where the return values of `predictionFromFeatures:error:` are. Also added in `PTMCoreMLExecutor` for good measure.
Differential Revision: D43520767
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95384
Approved by: https://github.com/mcr229
Summary: This change adds input shape when CoreML throws an errors.
Test Plan: testMCSModelInvalidInputShape tests that the assert throws when invalid input shapes are provided.
Differential Revision: D43449112
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95249
Approved by: https://github.com/mcr229
Not only is this change usually shorter and more readable, it also can yield better performance. size() is not always a constant time operation (such as on LinkedLists), but empty() always is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236
Approved by: https://github.com/malfet
Handling constant data for xnnpack delegation. This allows us to handle new modules like such:
```
class Module(torch.nn.Module):
def __init__(self):
super().__init__()
self._constant = torch.ones(4, 4, 4)
def forward(self, x):
return x + self._constant
```
this is the precursor work to handling convolution, as we need to serialize constant data(weights)
Differential Revision: [D41050349](https://our.internmc.facebook.com/intern/diff/D41050349/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89445
Approved by: https://github.com/digantdesai
Here we pass XNNExecutor* to compile model so that XNNExecutor can be allocated by runtime. This signature change is for executorch:
```
XNNExecutor compileModel(void* buffer) --> void compileModel(void* buffer, XNNExecutor* executor)
```
The intended usecase for allocating Executor and Compiling the serialized flatbuffer:
```
XNNExecutor* executor = runtime_allocator->allocateList<jit::xnnpack::delegate::XNNExecutor>(1);
XNNCompiler::compileModel(processed.buffer, executor);
```
Differential Revision: [D41208387](https://our.internmc.facebook.com/intern/diff/D41208387/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89090
Approved by: https://github.com/digantdesai
As title, add three things to the schema
1. debug handle for each node
2. file identifier, so we can sanity check we are getting the xnnpack schema flatbuffers file, instead of other random binary
3. extension, so the dumped binary will end up with its own extension like `myschema.xnnpack` (maybe can have a better name) instead of the default extension `.bin`
Differential Revision: [D40906970](https://our.internmc.facebook.com/intern/diff/D40906970/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89033
Approved by: https://github.com/mcr229
This is the on-device runtime work. We modify the compile and execute from our hacky solution from before to what will actually be running at runtime.
First we rebuild our graph from the serialized flatbuffer string. We also introduce a runtime wrapper that inherits CustomClassHolder that allows us to forward along the built xnngraph runtime to our execute function
Once the subgraph object has been rebuilt by our we pass it along to the runtime wrapper for us to forward along to execute
At execute we prep the input/outputs and invoke the runtime using our runtime wrapper. Finally we forward those results to our execution
Differential Revision: [D39413031](https://our.internmc.facebook.com/intern/diff/D39413031/)
**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39413031/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88780
Approved by: https://github.com/digantdesai
# Executor Class
Executor object used to wrap our xnn_runtime object. The ideal flow of this object looks as such:
```
executor.set_inputs(vector<tensor> inputs, vector<tensor> outputs)
executor.forward()
```
This will likely be returned by our delegate compile and given over to execute in order to run inference using the xnn runtime
##### Executorch Considerations
```
#include <ATen/Functions.h>
#include <ATen/Utils.h>
```
These Aten functions are included in order to use at::Tensor when setting the inputs, this will change when used for Executorch because we will be switching from at::Tensor to whatever tensor abstraction is used for ET. Seems like they have the same call for `.data_ptr<float>()`, so realistically all logic here will be the same.
ATen/Utils is used for TORCH_CHECK. We will switch to ET_CHECK_MESSAGE for executorch.
Differential Revision: [D40733121](https://our.internmc.facebook.com/intern/diff/D40733121/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88778
Approved by: https://github.com/digantdesai
Summary:
https://www.internalfb.com/code/fbsource/[c0e4da0b5c7fff3b4e31e4611033c30cabdc6aef]/fbcode/caffe2/torch/csrc/jit/backends/backend_detail.cpp?lines=268-276
seems like the torchscript addition of
`$unpack, = self.__backend.execute( ... `
the comma after unpack forces the result of execute to have only one item. So for this fix now when the size of the outputs > 1, execute returns a List List of outputs (basically put the outputs in another list before putting it into the list we return)
```
[[output1, output2, output3, ...]]
```
instead of
```
[output1, output2, output3, ...]
```
Do we want to fix this in backend_detail? Or should we make the change in our delegate to accomadate the torchscript? Proposing this q here. Requesting cccclai, kimishpatel for approval here
Test Plan: unblocked models for chengxiangyin and models in pytorch playground all passing unit tests
Reviewed By: kimishpatel, cccclai
Differential Revision: D40328684
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88345
Approved by: https://github.com/jmdetloff, https://github.com/Skylion007
We introduced the serializer we created in the previous diff to our XNNGraph builder, the purpose of this is to serialize parts of the graph as we build this. At the end, we are able to finish and serialize the xnngraph into a std::string for use when we forward this along to on-device runtime.
The next diff will rebuild the xnngraph from the serialization we introduce here, so testing the serialization of the graph will be done in the next diff
Differential Revision: [D39335580](https://our.internmc.facebook.com/intern/diff/D39335580/)
**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39335580/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87908
Approved by: https://github.com/digantdesai
This point we perform conversion for Torchscript IR to XNNPack graph. Currently we only support converting Add Nodes and fp32 tensor values.
As a caveat, we are not building this at runtime. So for testing we just run the xnn graph once ahead of time and with sample inputs and forward it to execute. This is only for testing, and will be changed in a later diff. This will allow us to check that graph creation is sound.
Differential Revision: [D39838851](https://our.internmc.facebook.com/intern/diff/D39838851/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87824
Approved by: https://github.com/digantdesai, https://github.com/salilsdesai
Beginning of building the xnnpack graph from the torchscript IR. We first massage the torchscript graph using a few graph passes that perform things such as unused self argument removal and constant propagation.
This also performs tracing for us so that the model does not have to be prepped by tracing before being lowered by us.
The other check we perform is through the torchscript IR to identify any nodes that are not lowerable/supported, and throwing an error to spit out the specific nodes that are not lowerable.
Differential Revision: [D39838338](https://our.internmc.facebook.com/intern/diff/D39838338/)
**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39838338/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87128
Approved by: https://github.com/salilsdesai
Summary: It turns out disk cache space is more limited than I realized - Instagram starts evicting cached items at 10mb. We don't actually need to cache the model specs, once the model is compiled all we need is the compiled model. With this diff, after model compilation succeeds we cleanup the model specs from disk.
Test Plan: Delete instagram from device to ensure an empty cache, build, launch camera, open a MCS or Segmentation effect, confirm it loads and works correctly. Restart the app and launch again, to confirm it can load the compiled model from cache as well.
Differential Revision: D39562009
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85136
Approved by: https://github.com/kimishpatel
Summary:
Catch the error and throw an exception with PTMCoreMLBackend when inference fails. This way the error description will be available in the logged crash, as opposed to crashing with a less descriptive exception.
I'll be drafting follow up diffs to actually catch exceptions in the segmentation shim next. Ideally we would fail inference gracefully and not crash at all, but at least after this diff we'll have the full diagnostic info
Test Plan: Force an error, and confirm its description appears in the exception thrown via the console
Differential Revision: D39407865
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84804
Approved by: https://github.com/mcr229
Summary:
Delegated CoreML models with cpuAndGPU flag set does not properly run models on CPU
- Fix will allow us to target models on CPU
Test Plan: brymkowski can you test this on your performance benchmarks?
Reviewed By: salilsdesai
Differential Revision: D39361382
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84725
Approved by: https://github.com/jmdetloff
Summary: We added this observer to help us diagnose memory issues that have since resolved. It should be safe to clean this up.
Test Plan: Diff just removed logging, so just build IG and confirm no errors.
Differential Revision: D38843701
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83703
Approved by: https://github.com/mcr229
Summary: Ensure that all models are recompiled when OS updates, instead of just the first model loaded after the OS update. To do this we need to save the compilation os version for each model.
Test Plan: Build and run IG, launch a segmentation effect, confirm it renders correctly. To test the OS upgrade flow you could manually change '[UIDevice currentDevice].systemVersion' to a different string and launch again, and confirm you can still access effects.
Differential Revision: D38361641
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82661
Approved by: https://github.com/mcr229
### Description
The original intention of this code was to log whenever inference fails, not whenever it succeeds. This change updates the logic of should_log to match this intention.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82604
Approved by: https://github.com/SS-JIA
We define specializations for pybind11 defined templates
(in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently
it is important that these specializations *always* be #include'd
when making use of pybind11 templates whose behavior depends on
these specializations, otherwise we can cause an ODR violation.
The easiest way to ensure that all the specializations are always
loaded is to designate a header (in this case, torch/csrc/util/pybind.h)
that ensures the specializations are defined, and then add a lint
to ensure this header is included whenever pybind11 headers are
included.
The existing grep linter didn't have enough knobs to do this
conveniently, so I added some features. I'm open to suggestions
for how to structure the features better. The main changes:
- Added an --allowlist-pattern flag, which turns off the grep lint
if some other line exists. This is used to stop the grep
lint from complaining about pybind11 includes if the util
include already exists.
- Added --match-first-only flag, which lets grep only match against
the first matching line. This is because, even if there are multiple
includes that are problematic, I only need to fix one of them.
We don't /really/ need this, but when I was running lintrunner -a
to fixup the preexisting codebase it was annoying without this,
as the lintrunner overall driver fails if there are multiple edits
on the same file.
I excluded any files that didn't otherwise have a dependency on
torch/ATen, this was mostly caffe2 and the valgrind wrapper compat
bindings.
Note the grep replacement is kind of crappy, but clang-tidy lint
cleaned it up in most cases.
See also https://github.com/pybind/pybind11/issues/4099
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552
Approved by: https://github.com/albanD
Summary:
MLModel loads much faster when compute units are set to CPU only. It seems when loading with compute units set to all a large amount of preprocessing work is done during init.
So, in order to speed up our effect load time, load a cpu MLModel synchronously, and a configured MLModel asyncronously. When the second model finishes loading about 600 ms later, swap the models out.
So, for about half a second inference will occur on the cpu, but after that will kick over to gpu or npu.
On iPhone 12 I'm seeing a > 10x improvement in load time as recorded by RenderTimeLogger.cpp
Test Plan:
- Add an override to https://www.internalfb.com/intern/qe2/ig_ios_person_segmentation_universe to opt into the coreml segmentation model
- Launch IG camera and apply an effect that uses segmentation, such as green screen
- Confirm that segmentation works.
https://pxl.cl/277JL
Reviewed By: kimishpatel
Differential Revision: D37597965
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80941
Approved by: https://github.com/mcr229, https://github.com/kimishpatel
Summary: Error messages from `TORCH_CHECK` are stripped during production builds via `-DSTRIP_ERROR_MESSAGES`. This diff introduces a new macro `COREML_CHECK` which will always preserve the error message. This macro is used when encountering errors produced by CoreML API calls so that we can heve enough context to debug.
Test Plan:
Test in pytorch playground:
```
arc focus2 -b pp-ios -a ModelRunner -a //xplat/caffe2/c10:c10Apple -a //xplat/caffe2:torch_mobile_coreApple -a //xplat/caffe2/fb/dynamic_pytorch:dynamic_pytorch_implApple -a //xplat/caffe2:coreml_delegateApple -a ModelRunnerDevOps -a //xplat/caffe2:torch_mobile_all_opsApple -fd --force-with-wrong-xcode
```
Differential Revision: D36378286
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77429
Approved by: https://github.com/kimishpatel
Summary:
Add an observer to `PTMCoreMLExecutor` so we can inspect OOMs in production to help with T115554493.
The behaviour of the logger is as such:
1. Each time a model is compiled, there is a chance we publish all logs to QPL. This is determined by the randomly generated `_model_load_id` and `_sample_thresh`.
2. If we are publishing all logs, then every `_sample_every` inferences will be logged via QPL.
3. Every QPL log will collect memory metrics before and after model compilation/inference
4. If memory pressure is not normal (remaining mem < 400 MB) before or after compilation/inference, then that compilation/inference will be logged to QPL no matter what.
Previous diff got reverted due to OSS CI failures. Fixed those failures in this iteration.
Test Plan:
We can test in pytorch playground and inspect the QPL logs through Flipper:
```
arc focus2 -b pp-ios -a ModelRunner -a //xplat/caffe2/c10:c10Apple -a //xplat/caffe2:torch_mobile_coreApple -a //xplat/caffe2/fb/dynamic_pytorch:dynamic_pytorch_implApple -a //xplat/caffe2:coreml_delegateApple -a ModelRunnerDevOps -a //xplat/caffe2:torch_mobile_all_opsApple -a coreml_memory_observer -a //xplat/perflogger:perfloggerApple -fd --force-with-wrong-xcode
```
To check results in Hive/Scuba, test in instagram:
```
arc focus2 -b igios-no-extensions -a //fbobjc/Apps/Instagram/AppLibraries/Core/QPL/IGPerformanceLogging:IGPerformanceLogging -a //xplat/caffe2/c10:c10Apple -a //xplat/caffe2:torch_mobile_coreApple -a //xplat/caffe2/fb/dynamic_pytorch:dynamic_pytorch_implApple -a //xplat/caffe2:coreml_delegateApple -a //xplat/caffe2:torch_mobile_all_opsApple -a //xplat/perflogger:perfloggerApple -a coreml_memory_observerApple -c pt.enable_qpl=1 --force-with-wrong-xcode
```
Note that we need to change `_sample_thresh` to ensure logs show up.
Reviewed By: kimishpatel
Differential Revision: D35970823
fbshipit-source-id: 2cb4d73931f35a0fa7f362b9fb44b9f0a00aeb82
(cherry picked from commit 1e965acf72958fc59d0cd0b4958e54955cc1adf2)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76251
Add an observer to `PTMCoreMLExecutor` so we can inspect OOMs in production to help with T115554493.
The behaviour of the logger is as such:
1. Each time a model is compiled, there is a chance we publish all logs to QPL. This is determined by the randomly generated `_model_load_id` and `_sample_thresh`.
2. If we are publishing all logs, then every `_sample_every` inferences will be logged via QPL.
3. Every QPL log will collect memory metrics before and after model compilation/inference
4. If memory pressure is not normal (remaining mem < 400 MB) before or after compilation/inference, then that compilation/inference will be logged to QPL no matter what.
Test Plan:
We can test in pytorch playground and inspect the QPL logs through Flipper:
```
arc focus2 -b pp-ios -a ModelRunner -a //xplat/caffe2/c10:c10Apple -a //xplat/caffe2:torch_mobile_coreApple -a //xplat/caffe2/fb/dynamic_pytorch:dynamic_pytorch_implApple -a //xplat/caffe2:coreml_delegateApple -a ModelRunnerDevOps -a //xplat/caffe2:torch_mobile_all_opsApple -a coreml_memory_observer -a //xplat/perflogger:perfloggerApple -fd --force-with-wrong-xcode
```
To check results in Hive/Scuba, test in instagram:
```
arc focus2 -b igios-no-extensions -a //fbobjc/Apps/Instagram/AppLibraries/Core/QPL/IGPerformanceLogging:IGPerformanceLogging -a //xplat/caffe2/c10:c10Apple -a //xplat/caffe2:torch_mobile_coreApple -a //xplat/caffe2/fb/dynamic_pytorch:dynamic_pytorch_implApple -a //xplat/caffe2:coreml_delegateApple -a //xplat/caffe2:torch_mobile_all_opsApple -a //xplat/perflogger:perfloggerApple -a coreml_memory_observerApple -c pt.enable_qpl=1 --force-with-wrong-xcode
```
Note that we need to change `_sample_thresh` to ensure logs show up.
Reviewed By: kimishpatel
Differential Revision: D35511873
fbshipit-source-id: 59f2fa2d021178ceab1fcf5ee94b2f15ceca32ee
(cherry picked from commit 8b8af55410ea1231693ee980c80d8a749f5ad870)