Commit Graph

374 Commits

vfdev-5
f727bed2e6 [inductor] Updated upsample_bilinear2d decomposition (#104182)
Description:
- Updated upsample_bilinear2d decomposition
  - added support for the uint8 dtype
  - code improvements
- Added uint8 dtype tests

Perf considerations:
- There is a minor perf regression (speed-up ~0.7x) for uint8 inputs with align_corners=True when the output is smaller than or equal to (256, 256)
- For uint8 inputs when the output is larger than (256, 256), the nightly output is wrong, so IMO the large perf regression there (speed-up around ~0.2x) should not be taken into account.
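For context, a minimal sketch (not code from the PR) of the call pattern these benchmarks exercise: a compiled bilinear resize on a uint8 tensor, which now routes through the updated decomposition:

```
import torch
import torch.nn.functional as F

def upsample(t):
    # bilinear resize; antialias=False matches the benchmarked configurations
    return F.interpolate(t, size=(256, 256), mode="bilinear", align_corners=False)

x = torch.randint(0, 256, (1, 3, 500, 400), dtype=torch.uint8)
out_eager = upsample(x)                     # eager baseline
out_compiled = torch.compile(upsample)(x)   # goes through the Inductor decomposition
```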

## Perfs benchmarks

```
[--------------------------------------------------------------------------------------------------------------------------------------------------------- Interpolate, cpu --------------------------------------------------------------------------------------------------------------------------------------------------------]
                                                                                                                                                    |  Eager (2.3.0a0+gitafcfdb1) PR  |  Compiled (2.3.0a0+gitafcfdb1) PR  |  Compiled (2.3.0a0+gitde89a53) Nightly  |  speed-up PR vs Nightly  |  Eager (2.3.0a0+gitde89a53) Nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Input (1, 3, 500, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)       |        565.212 (+-3.548)        |        1384.210 (+-10.798)         |           1230.996 (+-32.930)           |     0.889 (+-0.000)      |          566.253 (+-1.526)
      Input (1, 3, 500, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)      |        565.404 (+-1.614)        |         1491.649 (+-7.763)         |            2974.959 (+-6.006)           |     1.994 (+-0.000)      |          566.476 (+-1.742)
      Input (1, 3, 500, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)           |        270.761 (+-0.861)        |         1557.777 (+-4.699)         |            1080.919 (+-4.243)           |     0.694 (+-0.000)      |          269.829 (+-0.986)
      Input (1, 3, 500, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)          |        270.960 (+-0.995)        |        1723.913 (+-12.433)         |            3191.938 (+-6.194)           |     1.852 (+-0.000)      |          269.962 (+-1.657)
      Input (1, 3, 500, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)     |        1555.884 (+-5.169)       |         1178.753 (+-4.957)         |            1910.445 (+-5.988)           |     1.621 (+-0.000)      |          1560.804 (+-6.793)
      Input (1, 3, 500, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)    |        1651.193 (+-6.952)       |         1323.466 (+-6.059)         |            3374.842 (+-8.168)           |     2.550 (+-0.000)      |          1653.497 (+-8.018)
      Input (1, 3, 500, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)         |        978.482 (+-10.183)       |         1383.768 (+-4.341)         |            2147.841 (+-6.581)           |     1.552 (+-0.000)      |          979.983 (+-1.499)
      Input (1, 3, 500, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)        |        1074.472 (+-5.031)       |         1414.912 (+-5.754)         |           3590.968 (+-10.042)           |     2.538 (+-0.000)      |          1074.589 (+-3.948)
      Input (4, 3, 500, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)       |        2168.703 (+-8.964)       |        5400.528 (+-26.628)         |           4777.299 (+-11.891)           |     0.885 (+-0.000)      |          2168.133 (+-7.667)
      Input (4, 3, 500, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)      |       2169.132 (+-12.618)       |        6583.866 (+-28.959)         |           11986.894 (+-45.838)          |     1.821 (+-0.000)      |         2174.488 (+-10.317)
      Input (4, 3, 500, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)           |        992.808 (+-6.086)        |         5985.028 (+-9.532)         |            4334.158 (+-9.423)           |     0.724 (+-0.000)      |          989.604 (+-5.499)
      Input (4, 3, 500, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)          |        987.618 (+-6.350)        |        6963.044 (+-28.885)         |           15441.096 (+-55.324)          |     2.218 (+-0.000)      |          985.573 (+-5.159)
      Input (4, 3, 500, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)     |       6695.557 (+-35.067)       |        4657.603 (+-14.220)         |           8058.708 (+-41.684)           |     1.730 (+-0.000)      |         6714.996 (+-38.626)
      Input (4, 3, 500, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)    |       7040.481 (+-39.486)       |        5445.704 (+-16.659)         |           13906.618 (+-53.298)          |     2.554 (+-0.000)      |         7034.453 (+-44.626)
      Input (4, 3, 500, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)         |       3926.186 (+-10.660)       |        5741.433 (+-12.748)         |           9356.036 (+-40.848)           |     1.630 (+-0.000)      |         3930.598 (+-17.086)
      Input (4, 3, 500, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)        |        4308.536 (+-9.607)       |        6122.755 (+-47.278)         |           15637.567 (+-54.392)          |     2.554 (+-0.000)      |         4307.463 (+-11.268)
      Input (1, 3, 1200, 1300), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)     |       2512.740 (+-10.860)       |         1573.590 (+-5.061)         |            451.355 (+-1.210)            |     0.287 (+-0.000)      |         2511.727 (+-10.930)
      Input (1, 3, 1200, 1300), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)    |       2489.926 (+-11.915)       |         1537.233 (+-4.212)         |            2501.470 (+-7.446)           |     1.627 (+-0.000)      |         2500.000 (+-12.155)
      Input (1, 3, 1200, 1300), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)         |        632.032 (+-2.108)        |         1496.994 (+-4.194)         |            404.759 (+-1.064)            |     0.270 (+-0.000)      |          630.122 (+-4.086)
      Input (1, 3, 1200, 1300), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)        |        629.174 (+-4.386)        |         1708.935 (+-8.817)         |            2643.296 (+-9.723)           |     1.547 (+-0.000)      |          628.388 (+-1.326)
      Input (1, 3, 1200, 1300), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)   |        4409.941 (+-8.016)       |         1160.133 (+-4.698)         |            1897.089 (+-9.392)           |     1.635 (+-0.000)      |         4450.959 (+-10.438)
      Input (1, 3, 1200, 1300), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)  |       4493.427 (+-11.703)       |         1329.226 (+-4.740)         |           2835.872 (+-12.241)           |     2.133 (+-0.000)      |          4506.973 (+-9.914)
      Input (1, 3, 1200, 1300), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)       |        901.712 (+-4.071)        |         1320.739 (+-5.197)         |            2207.605 (+-8.219)           |     1.671 (+-0.000)      |          904.757 (+-4.558)
      Input (1, 3, 1200, 1300), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)      |        990.080 (+-3.922)        |         1702.563 (+-7.909)         |           3074.196 (+-10.478)           |     1.806 (+-0.000)      |          990.482 (+-4.444)
      Input (4, 3, 1200, 1300), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)     |       9785.550 (+-58.445)       |        6135.680 (+-33.569)         |           1628.572 (+-19.770)           |     0.265 (+-0.000)      |         9893.606 (+-62.377)
      Input (4, 3, 1200, 1300), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)    |       9710.191 (+-57.597)       |        6066.824 (+-36.364)         |           10469.110 (+-42.775)          |     1.726 (+-0.000)      |         9919.022 (+-72.190)
      Input (4, 3, 1200, 1300), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)         |       2790.356 (+-12.188)       |        6134.101 (+-28.694)         |            1576.832 (+-6.030)           |     0.257 (+-0.000)      |         2761.122 (+-11.503)
      Input (4, 3, 1200, 1300), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)        |       2778.711 (+-13.603)       |        6608.528 (+-37.776)         |           10841.549 (+-49.429)          |     1.641 (+-0.000)      |         2753.037 (+-10.995)
      Input (4, 3, 1200, 1300), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)   |      45533.868 (+-102.618)      |         4962.994 (+-8.215)         |           9003.968 (+-38.179)           |     1.814 (+-0.000)      |        43531.261 (+-102.951)
      Input (4, 3, 1200, 1300), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)  |       45932.699 (+-81.207)      |        5595.682 (+-11.482)         |           12302.907 (+-50.254)          |     2.199 (+-0.000)      |         43916.455 (+-80.468)
      Input (4, 3, 1200, 1300), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)       |        3827.804 (+-8.057)       |        6311.580 (+-25.021)         |           11760.614 (+-51.531)          |     1.863 (+-0.000)      |         3849.959 (+-10.848)
      Input (4, 3, 1200, 1300), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)      |        4169.007 (+-8.452)       |        6820.716 (+-35.310)         |           15264.633 (+-49.982)          |     2.238 (+-0.000)      |         4183.875 (+-19.104)
      Input (1, 3, 300, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)       |        1306.914 (+-7.470)       |        10598.101 (+-38.410)        |           2678.031 (+-11.051)           |     0.253 (+-0.000)      |          1307.470 (+-8.519)
      Input (1, 3, 300, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)      |        1307.268 (+-8.197)       |        10161.123 (+-45.643)        |           17148.842 (+-55.402)          |     1.688 (+-0.000)      |          1308.077 (+-8.553)
      Input (1, 3, 300, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)           |        548.574 (+-2.157)        |        10072.806 (+-41.368)        |            2408.971 (+-6.997)           |     0.239 (+-0.000)      |          547.726 (+-1.721)
      Input (1, 3, 300, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)          |        546.664 (+-1.484)        |        11123.694 (+-43.636)        |           18058.070 (+-48.552)          |     1.623 (+-0.000)      |          547.151 (+-1.627)
      Input (1, 3, 300, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)     |       7935.051 (+-71.022)       |        7654.533 (+-29.512)         |           12414.194 (+-87.450)          |     1.622 (+-0.000)      |         7900.056 (+-53.997)
      Input (1, 3, 300, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)    |       8546.732 (+-53.118)       |        8583.572 (+-35.656)         |          19111.824 (+-166.978)          |     2.227 (+-0.000)      |         8515.433 (+-63.300)
      Input (1, 3, 300, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)         |       6202.642 (+-34.355)       |        8915.622 (+-62.293)         |           14327.295 (+-52.188)          |     1.607 (+-0.000)      |         6213.329 (+-39.740)
      Input (1, 3, 300, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)        |       6811.128 (+-33.747)       |        9647.316 (+-50.837)         |           20830.594 (+-62.979)          |     2.159 (+-0.000)      |         6822.512 (+-37.092)
      Input (4, 3, 300, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)       |       5079.586 (+-19.067)       |        42238.442 (+-87.643)        |           11282.141 (+-42.477)          |     0.267 (+-0.000)      |         5104.234 (+-17.706)
      Input (4, 3, 300, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)      |       5079.575 (+-16.306)       |        41512.995 (+-83.710)        |          68789.816 (+-440.001)          |     1.657 (+-0.000)      |         5097.446 (+-21.724)
      Input (4, 3, 300, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)           |        2039.974 (+-8.614)       |       42322.773 (+-111.866)        |           10399.237 (+-43.140)          |     0.246 (+-0.000)      |         2043.808 (+-10.707)
      Input (4, 3, 300, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)          |       2036.214 (+-10.083)       |        44353.281 (+-71.548)        |          73340.412 (+-324.780)          |     1.654 (+-0.000)      |          2039.000 (+-9.554)
      Input (4, 3, 300, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)     |       33821.523 (+-96.639)      |        30552.094 (+-65.023)        |          49494.486 (+-872.916)          |     1.620 (+-0.000)      |         33844.404 (+-92.466)
      Input (4, 3, 300, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)    |      36196.104 (+-128.169)      |        34038.432 (+-79.697)        |          75761.226 (+-905.194)          |     2.226 (+-0.000)      |         36260.473 (+-94.642)
      Input (4, 3, 300, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)         |       24827.821 (+-77.335)      |        37006.218 (+-86.318)        |          61297.625 (+-898.192)          |     1.656 (+-0.000)      |         24823.275 (+-80.945)
      Input (4, 3, 300, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)        |       27266.138 (+-70.262)      |        40109.475 (+-94.248)        |          92086.075 (+-404.922)          |     2.296 (+-0.000)      |         27287.992 (+-89.507)

Times are in microseconds (us).

[--------------------------------------------------------------------------------------------------------------------------------------------------------- Interpolate, cuda ---------------------------------------------------------------------------------------------------------------------------------------------------------]
                                                                                                                                                      |  Eager (2.3.0a0+gitafcfdb1) PR  |  Compiled (2.3.0a0+gitafcfdb1) PR  |  Compiled (2.3.0a0+gitde89a53) Nightly  |  speed-up PR vs Nightly  |  Eager (2.3.0a0+gitde89a53) Nightly
1 threads: ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Input (1, 3, 2345, 2456), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (1234, 1345)   |         98.259 (+-0.014)        |          97.156 (+-0.008)          |             97.443 (+-0.031)            |     1.003 (+-0.000)      |           98.248 (+-0.021)
      Input (1, 3, 2345, 2456), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (1234, 1345)  |         97.048 (+-0.016)        |          97.480 (+-0.018)          |             96.819 (+-0.126)            |     0.993 (+-0.000)      |           97.045 (+-0.015)
      Input (1, 3, 2345, 2456), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (1234, 1345)       |         97.944 (+-0.028)        |          91.686 (+-0.411)          |             93.894 (+-1.011)            |     1.024 (+-0.000)      |           97.933 (+-0.008)
      Input (1, 3, 2345, 2456), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (1234, 1345)      |         98.008 (+-0.011)        |          91.205 (+-0.346)          |             96.854 (+-0.058)            |     1.062 (+-0.000)      |           97.203 (+-0.010)
      Input (4, 3, 2345, 2456), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (1234, 1345)   |        384.318 (+-0.011)        |         382.793 (+-0.007)          |            382.472 (+-0.011)            |     0.999 (+-0.000)      |          384.701 (+-0.012)
      Input (4, 3, 2345, 2456), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (1234, 1345)  |        384.266 (+-0.009)        |         385.333 (+-0.024)          |            382.554 (+-0.022)            |     0.993 (+-0.000)      |          384.386 (+-0.016)
      Input (4, 3, 2345, 2456), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (1234, 1345)       |        383.924 (+-0.011)        |         570.071 (+-0.030)          |            545.615 (+-0.051)            |     0.957 (+-0.000)      |          384.044 (+-0.012)
      Input (4, 3, 2345, 2456), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (1234, 1345)      |        384.184 (+-0.016)        |         560.857 (+-0.026)          |            552.447 (+-0.040)            |     0.985 (+-0.000)      |          384.063 (+-0.016)
      Input (1, 3, 1234, 1345), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (2345, 2456)   |        122.188 (+-0.053)        |         116.744 (+-1.006)          |            163.762 (+-0.015)            |     1.403 (+-0.000)      |          121.874 (+-0.015)
      Input (1, 3, 1234, 1345), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (2345, 2456)  |        122.156 (+-0.012)        |         182.692 (+-0.013)          |            161.653 (+-0.018)            |     0.885 (+-0.000)      |          121.926 (+-0.014)
      Input (1, 3, 1234, 1345), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (2345, 2456)       |        105.852 (+-0.324)        |         119.545 (+-0.294)          |            190.527 (+-0.023)            |     1.594 (+-0.000)      |          105.999 (+-0.446)
      Input (1, 3, 1234, 1345), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (2345, 2456)      |        106.507 (+-0.282)        |         120.060 (+-0.257)          |            162.330 (+-0.012)            |     1.352 (+-0.000)      |          106.567 (+-0.385)
      Input (4, 3, 1234, 1345), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (2345, 2456)   |        447.907 (+-0.015)        |         463.863 (+-1.779)          |            650.492 (+-0.331)            |     1.402 (+-0.000)      |          446.596 (+-0.017)
      Input (4, 3, 1234, 1345), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (2345, 2456)  |        447.750 (+-0.017)        |         723.832 (+-0.170)          |            641.539 (+-0.075)            |     0.886 (+-0.000)      |          446.467 (+-0.019)
      Input (4, 3, 1234, 1345), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (2345, 2456)       |        439.549 (+-0.031)        |         507.772 (+-2.879)          |            758.795 (+-0.482)            |     1.494 (+-0.000)      |          440.372 (+-0.025)
      Input (4, 3, 1234, 1345), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (2345, 2456)      |        439.538 (+-0.029)        |         509.260 (+-2.704)          |            654.195 (+-2.621)            |     1.285 (+-0.000)      |          440.362 (+-0.026)

Times are in microseconds (us).
```

[Source](f4751a3196/perf_interp_mode.py), [Output](899f34c024/output/20231213-214209-upsample-bilinear-pr_vs_nightly-speedup.md)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104182
Approved by: https://github.com/lezcano
2023-12-14 14:50:06 +00:00
angelayi
639060cb0b Use get_mkldnn_enabled for decompositions (#115448)
`torch._C.has_mkldnn` does not reflect cases where users disable mkldnn via `torch._C._set_mkldnn_enabled()`. This is relevant to edge use cases, which do not want decompositions to go to the ATen opset and do not want the mkldnn operator to appear in the graph.
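A rough sketch of the failure mode (the private toggle is named in the description above; treat the exact usage as an assumption):

```
import torch

# Runtime toggle that decompositions should consult (via get_mkldnn_enabled)
# rather than the static build flag
torch._C._set_mkldnn_enabled(False)

# torch._C.has_mkldnn stays True on mkldnn-enabled builds even after the
# call above, which is why checking it alone is wrong
print(torch._C.has_mkldnn)
```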
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115448
Approved by: https://github.com/jgong5, https://github.com/ydwu4
2023-12-12 22:42:51 +00:00
Isuru Fernando
505574c46a Add decomposition for torch.block_diag (#115096)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115096
Approved by: https://github.com/peterbell10
2023-12-11 20:04:22 +00:00
Isuru Fernando
d40a7c6026 Add decompositions for replication_pad (#115113)
Fixes #115395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115113
Approved by: https://github.com/peterbell10
2023-12-09 02:44:07 +00:00
Isuru Fernando
fb19947962 Add decompositions for reflection_pad{1, 2, 3}d (#115100)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115100
Approved by: https://github.com/peterbell10
2023-12-08 23:05:57 +00:00
Jason Ansel
7979ba7b43 [inductor] Add dropout type check to match eager (#115040)
Fixes #98970

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115040
Approved by: https://github.com/oulgen
2023-12-03 23:05:02 +00:00
Kurt Mohler
6f32eb7eef Add decomp for replication_pad2d and use for CUDA deterministic (#111590)
Fixes #95578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111590
Approved by: https://github.com/peterbell10
2023-12-01 18:56:09 +00:00
PyTorch MergeBot
013675ff59 Revert "Add decomp for replication_pad2d and use for CUDA deterministic (#111590)"
This reverts commit f1286161a6.

Reverted https://github.com/pytorch/pytorch/pull/111590 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing the XLA job. The job is also failing on the PR, but the log classifier failed to find the failing test, which led to the PR being wrongly marked as flaky ([comment](https://github.com/pytorch/pytorch/pull/111590#issuecomment-1833004794))
2023-11-30 02:28:14 +00:00
Kurt Mohler
f1286161a6 Add decomp for replication_pad2d and use for CUDA deterministic (#111590)
Fixes #95578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111590
Approved by: https://github.com/peterbell10
2023-11-29 21:50:46 +00:00
Antonio Kim
7fc292930c Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)
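A small usage sketch of the new `generator` argument (assuming the signatures described above):

```
import torch

g = torch.Generator().manual_seed(42)
w = torch.empty(3, 5)
# nn.init functions that call uniform_/normal_ now accept a generator
torch.nn.init.uniform_(w, a=0.0, b=1.0, generator=g)
```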

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-21 23:07:21 +00:00
vfdev-5
1f8d00c5a3 [inductor] Added decomposition for upsample_nearest_exact Nd (#113749)
Description:
- Added decomposition for upsample_nearest_exact: 1d, 2d, 3d

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113749
Approved by: https://github.com/lezcano
2023-11-21 13:03:47 +00:00
PyTorch MergeBot
fe428a284b Revert "Add torch._lazy_clone to create COW tensors (#113397)"
This reverts commit 9916d8a9ea.

Reverted https://github.com/pytorch/pytorch/pull/113397 on behalf of https://github.com/DanilBaibak due to Unfortunately, I need to revert your PR because the lower [PR in the stack](https://github.com/pytorch/pytorch/pull/113396) is failing a bunch of internal build jobs. ([comment](https://github.com/pytorch/pytorch/pull/113397#issuecomment-1818761224))
2023-11-20 10:21:09 +00:00
GD06
b30580e121 [PT] Include tensor shape info in the error messages of torch split (#113984)
Summary: Include tensor shape info in the error messages of torch split.

Test Plan: CI

Differential Revision: D51436684

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113984
Approved by: https://github.com/ezyang
2023-11-19 01:34:57 +00:00
Kurt Mohler
9916d8a9ea Add torch._lazy_clone to create COW tensors (#113397)
Part of #109833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113397
Approved by: https://github.com/ezyang
ghstack dependencies: #113396
2023-11-17 01:58:51 +00:00
PyTorch MergeBot
252e68a83b Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 54493fe8c4.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is, unfortunately, still breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1811625557))
2023-11-15 00:51:23 +00:00
Antonio Kim
54493fe8c4 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-13 23:18:14 +00:00
Mengwei Liu
5506b9db43 [decomp] Fix _scaled_dot_product_flash_attention decomposition bug (#113102)
For `_scaled_dot_product_flash_attention` we don't have

`Tensor? attn_mask=None`

but `scaled_dot_product_attention` does. In the original decomp there's a
mixup where I added this argument to
`_scaled_dot_product_flash_attention`.

Fix it so that `_scaled_dot_product_flash_attention` is being decomposed correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113102
Approved by: https://github.com/ezyang
2023-11-08 21:47:37 +00:00
PyTorch MergeBot
9a28a7b498 Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 27e31ab6e8.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/PaliC due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1799003164))
2023-11-07 15:53:32 +00:00
Antonio Kim
27e31ab6e8 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-06 21:27:02 +00:00
Han Qi
5a6f8014c4 Add a decomposition for _weight_norm_interface. (#112193)
Fixes #112086

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112193
Approved by: https://github.com/ezyang
2023-11-01 19:51:11 +00:00
Peter Bell
04024926f4 Use pytree.tree_map_ everywhere (#112417)
Wherever we discard the output of `tree_map`, it's better to call `tree_map_`,
which doesn't unflatten the mapped results and so is a lot cheaper.
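A quick illustration of the difference, using the `torch.utils._pytree` helpers:

```
import torch
import torch.utils._pytree as pytree

data = {"a": torch.ones(2), "b": [torch.zeros(3)]}

# tree_map_ applies the function to each leaf for its side effect and
# skips rebuilding the container, unlike tree_map
pytree.tree_map_(lambda t: t.add_(1.0), data)
```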
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112417
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393, #112394
2023-10-31 15:57:06 +00:00
Peter Bell
66c32d099a Use pytree.arg_tree_leaves everywhere (#112394)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393
2023-10-31 15:57:06 +00:00
Peter Bell
bbd5b935e4 Use pytree.tree_leaves everywhere (#112324)
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten` to use `tree_leaves`.
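For example (a sketch using `torch.utils._pytree`):

```
import torch.utils._pytree as pytree

nested = {"a": 1, "b": [2, 3]}
leaves = pytree.tree_leaves(nested)        # [1, 2, 3]
flat, _spec = pytree.tree_flatten(nested)  # same leaves, plus the TreeSpec
assert leaves == flat
```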

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327, #112323
2023-10-30 03:39:04 +00:00
lezcano
c8a5bb451e Do not import sympy within torch._prims_common (#112034)
This is the first of a few PRs that avoid importing SymPy at import time.
The pitch here is that we (almost!) do not have SymPy on our API, so
this should be feasible.

This should speed up torch imports by a good 15% as per
https://dev-discuss.pytorch.org/t/delving-into-what-happens-when-you-import-torch/1589

In this PR we just move a few global imports into local imports.
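The pattern, sketched on a hypothetical helper (not actual code from the PR):

```
# Before: a module-level import makes every `import torch` pay the SymPy cost
# import sympy

def is_sympy_expr(x):
    # After: the import cost is only paid if this code path actually runs
    import sympy
    return isinstance(x, sympy.Basic)
```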
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112034
Approved by: https://github.com/ezyang
2023-10-26 12:53:25 +00:00
PyTorch MergeBot
98c329b19e Revert "[core ATen IR] Add decompositions for max, min, var_mean (#110906)"
This reverts commit 9606cda64e.

Reverted https://github.com/pytorch/pytorch/pull/110906 on behalf of https://github.com/SS-JIA due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110906#issuecomment-1757490740))
2023-10-11 11:41:21 +00:00
SS-JIA
9606cda64e [core ATen IR] Add decompositions for max, min, var_mean (#110906)
## Context

Add decompositions for `aten.max`, `aten.min`, and `aten.var_mean`. These operators follow a pattern of returning a tuple of outputs from two component operators:

```
aten.max(x) -> return aten.amax(x), aten.argmax(x)
aten.min(x) -> return aten.amin(x), aten.argmin(x)
aten.var_mean(x) -> return aten.var(x), aten.mean(x)
```

For `var_mean`, the `refs` implementation was doing something similar, so I changed it to call `torch.` ops instead, as was done previously for other `refs` implementations. cc: @peterbell10 @lezcano

Note that Inductor lowers all these directly, so they are excluded from the Inductor decomp table.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110906
Approved by: https://github.com/manuelcandales
2023-10-11 00:06:24 +00:00
Kazuaki Ishizaki
fde28fdc8c Fix typo under torch/_decomp directory (#110821)
This PR fixes typo of comments in files under `torch/_decomp` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110821
Approved by: https://github.com/Skylion007
2023-10-08 20:33:49 +00:00
Stephen Jia
c2e7a0d689 [core IR] Add decomps for aten.sum and aten.squeeze variants (#110645)
Summary:
## Context

Both `aten.sum` and `aten.squeeze` have a "most generic" variant, in the form of `aten.sum.dim_IntList` and `aten.squeeze.dims` respectively. Add decompositions for the other, non-generic variants of these operators that express them using the most generic variant.
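For intuition, the non-generic variants are expressible through the generic ones, e.g. (a sketch, not the in-tree decomps):

```
import torch

x = torch.randn(2, 1, 3)

# aten.sum.default via aten.sum.dim_IntList: reduce over every dim
assert torch.allclose(x.sum(), x.sum(dim=list(range(x.dim()))))

# aten.squeeze.dim via aten.squeeze.dims: a single dim as a one-element tuple
assert torch.equal(x.squeeze(1), x.squeeze(dim=(1,)))
```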

Note that to register these decomps, the reference implementation under `_refs` had to be removed as registered decompositions. cc: @lezcano @peterbell10

Test Plan: Github CI + Meta Internal CI

Differential Revision: D49965952

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110645
Approved by: https://github.com/peterbell10, https://github.com/digantdesai, https://github.com/manuelcandales
2023-10-07 04:21:51 +00:00
cdzhan
7cc0020a80 [decomp] Fix different return type in threshold_backward vs. eager (#110689)
due to type promotion with floating point scalar in decompositions.py

Fixes part of #100838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110689
Approved by: https://github.com/ezyang
2023-10-06 20:59:58 +00:00
chilli
ceb773b68d Fix #110680 (requires_grad typo in decomp) (#110687)
Fixes https://github.com/pytorch/pytorch/issues/110680
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110687
Approved by: https://github.com/voznesenskym, https://github.com/lezcano
ghstack dependencies: #110501, #110504, #110591, #110668
2023-10-06 10:36:01 +00:00
Jerry Zhang
f2a1b93549 Back out "[quant] Support integer implementations for adaptive_avg_pool2d (#104226)" (#110316)
Summary:
Original commit changeset: acdb5b34e3aa

Original Phabricator Diff: D47321689

Test Plan: opinfo tests in CI

Differential Revision: D49789403

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110316
Approved by: https://github.com/kimishpatel
2023-10-03 16:59:23 +00:00
Stephen Jia
ff96f6d04f [core IR][reland] Add split.Tensor and unbind decompositions to core ATen decomp table (#110323)
Summary:
This is a reland of [github PR #110102]( https://github.com/pytorch/pytorch/pull/110102).

The original PR had to be unlanded due to internal CI failures. This diff applies some small fixes to the failing tests to adjust to the new decompositions.

Note that `lift_fresh` will not be decomposed for now, since it was found that [constant propagation looks specifically for `lift_fresh`](13af952f94/torch/fx/experimental/proxy_tensor.py (L381-L386)). Therefore decomposing `lift_fresh` would interfere with constant propagation during export.

Test Plan: Github CI and internal CI

Differential Revision: D49761321

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110323
Approved by: https://github.com/jansel
2023-10-03 14:35:04 +00:00
Peter Bell
be3b16daad [decomp] Fix baddbmm decomposition (#109714)
The decomposition is currently registered without the pw_cast_for_opmath
decorator, due to the ordering of decorators being meaningful.
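To see why ordering matters, a generic illustration (nothing PyTorch-specific):

```
def tag(name):
    def deco(fn):
        fn.tags = getattr(fn, "tags", []) + [name]
        return fn
    return deco

@tag("outer")
@tag("inner")
def f():
    pass

# Decorators apply bottom-up: @outer wraps the result of @inner, so a
# registration decorator placed in the wrong position sees (and registers)
# the wrong function.
print(f.tags)  # ['inner', 'outer']
```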
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109714
Approved by: https://github.com/lezcano
2023-09-28 21:23:44 +00:00
PyTorch MergeBot
e0b035c220 Revert "[core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table (#110102)"
This reverts commit 22e706f768.

Reverted https://github.com/pytorch/pytorch/pull/110102 on behalf of https://github.com/atalman due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110102#issuecomment-1739856671))
2023-09-28 19:03:25 +00:00
SS-JIA
22e706f768 [core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table (#110102)
## Context

Add existing decomps for `lift_fresh`, `split.Tensor`, and `unbind` to the core ATen decomposition table. Do not use them in inductor, since Inductor currently lowers these directly.

One note though is that `lift_fresh`'s decomposition has a note saying it's not correct under autograd. However, my understanding is that these decompositions are registered to the `"post_autograd"` decomposition table, meaning autograd wouldn't be a factor. I would like some confirmation that this premise is correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110102
Approved by: https://github.com/jansel
2023-09-28 01:21:45 +00:00
SS-JIA
dec140f1ea [core IR] Add a core decomposition for aten.all (#110093)
## Context

Change the ref implementation of `aten.all` to only use other `torch` operators such that we can use it for the core ATen decomposition table. This will replace the decomposition for `aten.all` that was used specifically by Inductor.
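A sketch of what a torch-only reference for `aten.all` can look like (the in-tree version may differ in details such as dim/keepdim handling):

```
import torch

def all_decomp(x):
    # all(x) == not any(not x); logical_not handles the bool conversion
    return torch.logical_not(torch.any(torch.logical_not(x)))
```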

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110093
Approved by: https://github.com/manuelcandales, https://github.com/peterbell10, https://github.com/lezcano
2023-09-27 01:31:41 +00:00
SS-JIA
9928c10e71 [core IR] Add glu as a core decomposition (#110043)
## Context

Add the decomposition for `aten.glu` as a decomposition in the core ATen decomposition table. Don't use it in the Inductor decomposition table since Inductor has a lowering for it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110043
Approved by: https://github.com/peterbell10, https://github.com/lezcano
ghstack dependencies: #110046
2023-09-27 00:23:05 +00:00
SS-JIA
5df8aca994 [core IR] Add a core decomposition for floor_divide (#110046)
## Context

Introduce a core decomposition for `aten.floor_divide` into other `aten` ops, and add it to the core ATen decomposition table.

This replaces the decomposition of `floor_divide` that was used by Inductor. I noticed there was a note on that decomposition

```
# TorchInductor-only decomposition. It should not be taken to core.
# See https://github.com/pytorch/torchdynamo/pull/1120
```

but couldn't discern the reason why this is the case. cc: @lezcano
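For reference, a sketch of the decomposition's shape (hypothetical function name; the in-tree version handles more edge cases):

```
import torch

def floor_divide_decomp(a, b):
    # floor_divide is division with floor rounding
    return torch.div(a, b, rounding_mode="floor")
```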

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110046
Approved by: https://github.com/peterbell10
2023-09-26 08:39:21 +00:00
Mwiza Kunda
5c4b5baf21 Fix python decomps for OpOverloadPackets and add tests (#107707)
- Extend `test_torch_dispatch_meta_outplace` to test torch ops that do not have an out parameter but have aten op overloads that do. Additionally, Python decompositions may register `OpOverloadPacket`s, so decompositions need to be tested to ensure all `OpOverload`s still function for the `Meta` key (e.g. if a python decomposition is registered for an aten op `aten.foo` with overloads `[default, out]`, the python function needs to support receiving out arguments)

- Add out parameter wrappers to python decomps for aten ops that have out overloads
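As a sketch of what such an out-parameter wrapper does (hypothetical name, simplified to a single out tensor):

```
def with_out_param(decomp):
    def wrapper(*args, out=None, **kwargs):
        result = decomp(*args, **kwargs)
        if out is None:
            return result
        # mirror the out-overload contract: write into the provided tensor
        out.copy_(result)
        return out
    return wrapper
```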

CC. @ezyang @albanD @lezcano

Fixes #107713

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107707
Approved by: https://github.com/lezcano
2023-09-25 20:53:30 +00:00
SS-JIA
7de669f2f9 [core IR] Remove trunc decomp and add trunc to core (#109902)
Following up from [this comment](https://github.com/pytorch/pytorch/pull/109319#discussion_r1330803226). Remove the decomposition for `trunc`, and add it as a core operator.

Going forward, provide similar treatment for operators that map cleanly to hardware instructions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109902
Approved by: https://github.com/peterbell10
2023-09-25 18:18:06 +00:00
Jijie Wei
334ead04a9 Back out "[decomp] Fix baddbmm decomposition (#109714)" (#109855)
Summary:
Original commit changeset: 95c462a380c9

Original Phabricator Diff: D49484954

This diff causes a test failure for the deterministic NE test; see: https://www.internalfb.com/sandcastle/job/18014399565419856/

Test Plan:
buck2 test 'fbcode//mode/opt' fbcode//aps_models/ads/icvr/tests:icvr_fm_e2e_deterministic_ne_test -- --exact 'aps_models/ads/icvr/tests:icvr_fm_e2e_deterministic_ne_test - aps_models.ads.icvr.tests.icvr_fm_e2e_deterministic_ne_test.ICVR_FM_E2EDeterministicNeTest: test_e2e_deterministic_icvr_fm_pt2_fsdp_multi_gpus'

https://www.internalfb.com/intern/testinfra/testrun/16888498605839953

Differential Revision: D49527271

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109855
Approved by: https://github.com/yanboliang
2023-09-22 22:01:38 +00:00
Mwiza Kunda
8dedc9dd9b Add meta tests for layer/group/batch norm backward (#109591)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109591
Approved by: https://github.com/ezyang
2023-09-21 18:58:51 +00:00
Mwiza Kunda
6b7b9c796e Fix registering jit decompositions for jvp for out wrapped decomps (#109367)
Python decompositions wrapped by `out_wrapper` need to be unwrapped before compiling with TorchScript since:
- `out_wrapper` extends the decomposition's signature with an out parameter; however, this `out` parameter is not present in the source code of the original decomposition, so the resulting `ScriptFunction` will not have an `out` parameter
- `out_wrapper` is in the `torch._prims_common.wrappers` module, so its `globals()` are different from the globals of the decomposition being wrapped. This may cause symbol resolution to fail in the TorchScript compiler, since it compiles the unwrapped decomp's source code rather than the wrapper's

The python decomposition for `aten.trace` is wrapped as an example; other decompositions are to be fixed in https://github.com/pytorch/pytorch/pull/107707
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109367
Approved by: https://github.com/lezcano
2023-09-21 16:36:51 +00:00
Peter Bell
6f0cf5a837 [decomp] Decompose unsafe_split{,_with_sizes} into safe variants (#109668)
The "safety" aspect refers to the output not being registered as aliasing the
input, but after AOTAutograd I don't think this distinction matters. However,
we shouldn't use the same decomposition as the safe variant in case the backend
doesn't want to decompose split.
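A sketch of the resulting decomposition (hypothetical name; the in-tree version may differ):

```
import torch

def unsafe_split_decomp(x, split_size, dim=0):
    # route the unsafe variant to the safe op instead of reusing the
    # safe variant's own decomposition
    return torch.split(x, split_size, dim)
```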

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109668
Approved by: https://github.com/lezcano
ghstack dependencies: #109667
2023-09-20 18:45:56 +00:00
Peter Bell
9e629dd73c [decomp] Add all std and std_mean overloads to core decompostions (#109667)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109667
Approved by: https://github.com/lezcano
2023-09-20 18:45:56 +00:00
Peter Bell
36a8105f54 [decomp] Fix baddbmm decomposition (#109714)
The decomposition is currently registered without the pw_cast_for_opmath
decorator, due to the ordering of decorators being meaningful.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109714
Approved by: https://github.com/lezcano
2023-09-20 18:40:21 +00:00
Salil Desai
40b2c796dc [Decomposition] baddbmm (#108534)
Summary:
Move the decomposition of baddbmm from _inductor/decomposition.py and include it in core_aten_decompositions:

ff38c0e2f9/torch/_inductor/decomposition.py (L203)
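The decomposition being moved is essentially (a simplified sketch; per the baddbmm fix above, the in-tree version also handles dtype casting via pw_cast_for_opmath):

```
import torch

def baddbmm_decomp(self, batch1, batch2, *, beta=1, alpha=1):
    return beta * self + alpha * torch.bmm(batch1, batch2)
```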

Test Plan: Phabricator + OSS Tests

Differential Revision: D48871741

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108534
Approved by: https://github.com/SherlockNoMad
2023-09-20 12:49:32 +00:00
Salil Desai
d0cc623192 [Decomposition] _unsafe_view (#108713)
Summary:
The decomp already exists, so just add it to core_aten_decompositions:

https://www.internalfb.com/code/fbsource/[9d5eabd7b213d1a356d4e7bb400355d574ea924b]/fbcode/caffe2/torch/_decomp/decompositions.py?lines=3091

Differential Revision: D48619079

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108713
Approved by: https://github.com/larryliu0820, https://github.com/SherlockNoMad
2023-09-19 13:37:35 +00:00
Salil Desai
2e721aab98 [Decomposition] Trunc (#109319)
Summary:
Add a decomp for `trunc` and include it in core_aten_decompositions

Differential Revision: D49042033

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109319
Approved by: https://github.com/SherlockNoMad
2023-09-19 13:30:13 +00:00
Salil Desai
ae66d0b3bf [Decomposition] clamp_max (#108718)
Summary:
The decomp already exists, so just add it to core_aten_decompositions:

https://www.internalfb.com/code/fbsource/[abda43a5a268e83fef6d62b49531a390ce915ad2]/fbcode/caffe2/torch/_refs/__init__.py?lines=1855

Differential Revision: D48880026

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108718
Approved by: https://github.com/SherlockNoMad
2023-09-19 13:25:35 +00:00