I noticed that a lot of bugs are being suppressed by torchdynamo's default
error suppression, and worse yet, there's no way to unsuppress them. After
discussion with voz and soumith, we decided that we will unify error suppression
into a single option (suppress_errors) and default suppression to False.
If your model used to work and no longer works, try TORCHDYNAMO_SUPPRESS_ERRORS=1
to bring back the old suppression behavior.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87440
Approved by: https://github.com/voznesenskym, https://github.com/albanD
It seems like when popen.communicate() is used it waits for all the
desendents of popen to close the stdin/stderr. However, if we have
have worker processes running in the child, and the child segfaults,
those processes will stay alive until someone waitpid's the child.
Since those children have open handles to the stdin/stderr pipe,
communicate never returns.
This change just writes the output to temp files and directly calls
wait() on the child, which returns as soon as it dies.
cc @jansel @lezcano @fdrocha
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87335
Approved by: https://github.com/anijain2305, https://github.com/voznesenskym
Fixes https://github.com/pytorch/torchdynamo/issues/1690
This fixes the error seen in the minifiers. But does not repro the original issue that prompted the above issue.
Fx minifiers work at the level of Fx-graphs, and the original issue lies outside of the Fx graph and is only visible on the second iteration. Therefore, the original issue escapes the abstraction of our existing Fx-based minifiers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87062
Approved by: https://github.com/eellison