Commit Graph

374 Commits

vfdev-5
f727bed2e6 [inductor] Updated upsample_bilinear2d decomposition (#104182)
Description:
- Updated upsample_bilinear2d decomposition
  - added support for the uint8 dtype
  - code improvements
- Added uint8 dtype tests

Perf considerations:
- There is a minor perf regression (speed-up ~0.7x) for uint8 inputs with align_corners=True when the output is smaller than or equal to (256, 256)
- For uint8 inputs when the output is larger than (256, 256), the nightly output is wrong, so IMO the large perf regression there (speed-up around ~0.2x) should not be taken into account.
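For context, a minimal sketch (not code from the PR) of the call pattern these benchmarks exercise: a compiled bilinear resize on a uint8 tensor, which now routes through the updated decomposition:

```
import torch
import torch.nn.functional as F

def upsample(t):
    # bilinear resize; antialias=False matches the benchmarked configurations
    return F.interpolate(t, size=(256, 256), mode="bilinear", align_corners=False)

x = torch.randint(0, 256, (1, 3, 500, 400), dtype=torch.uint8)
out_eager = upsample(x)                     # eager baseline
out_compiled = torch.compile(upsample)(x)   # goes through the Inductor decomposition
```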

## Perfs benchmarks

```
[--------------------------------------------------------------------------------------------------------------------------------------------------------- Interpolate, cpu --------------------------------------------------------------------------------------------------------------------------------------------------------]
                                                                                                                                                    |  Eager (2.3.0a0+gitafcfdb1) PR  |  Compiled (2.3.0a0+gitafcfdb1) PR  |  Compiled (2.3.0a0+gitde89a53) Nightly  |  speed-up PR vs Nightly  |  Eager (2.3.0a0+gitde89a53) Nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Input (1, 3, 500, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)       |        565.212 (+-3.548)        |        1384.210 (+-10.798)         |           1230.996 (+-32.930)           |     0.889 (+-0.000)      |          566.253 (+-1.526)
      Input (1, 3, 500, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)      |        565.404 (+-1.614)        |         1491.649 (+-7.763)         |            2974.959 (+-6.006)           |     1.994 (+-0.000)      |          566.476 (+-1.742)
      Input (1, 3, 500, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)           |        270.761 (+-0.861)        |         1557.777 (+-4.699)         |            1080.919 (+-4.243)           |     0.694 (+-0.000)      |          269.829 (+-0.986)
      Input (1, 3, 500, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)          |        270.960 (+-0.995)        |        1723.913 (+-12.433)         |            3191.938 (+-6.194)           |     1.852 (+-0.000)      |          269.962 (+-1.657)
      Input (1, 3, 500, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)     |        1555.884 (+-5.169)       |         1178.753 (+-4.957)         |            1910.445 (+-5.988)           |     1.621 (+-0.000)      |          1560.804 (+-6.793)
      Input (1, 3, 500, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)    |        1651.193 (+-6.952)       |         1323.466 (+-6.059)         |            3374.842 (+-8.168)           |     2.550 (+-0.000)      |          1653.497 (+-8.018)
      Input (1, 3, 500, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)         |        978.482 (+-10.183)       |         1383.768 (+-4.341)         |            2147.841 (+-6.581)           |     1.552 (+-0.000)      |          979.983 (+-1.499)
      Input (1, 3, 500, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)        |        1074.472 (+-5.031)       |         1414.912 (+-5.754)         |           3590.968 (+-10.042)           |     2.538 (+-0.000)      |          1074.589 (+-3.948)
      Input (4, 3, 500, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)       |        2168.703 (+-8.964)       |        5400.528 (+-26.628)         |           4777.299 (+-11.891)           |     0.885 (+-0.000)      |          2168.133 (+-7.667)
      Input (4, 3, 500, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)      |       2169.132 (+-12.618)       |        6583.866 (+-28.959)         |           11986.894 (+-45.838)          |     1.821 (+-0.000)      |         2174.488 (+-10.317)
      Input (4, 3, 500, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)           |        992.808 (+-6.086)        |         5985.028 (+-9.532)         |            4334.158 (+-9.423)           |     0.724 (+-0.000)      |          989.604 (+-5.499)
      Input (4, 3, 500, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)          |        987.618 (+-6.350)        |        6963.044 (+-28.885)         |           15441.096 (+-55.324)          |     2.218 (+-0.000)      |          985.573 (+-5.159)
      Input (4, 3, 500, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)     |       6695.557 (+-35.067)       |        4657.603 (+-14.220)         |           8058.708 (+-41.684)           |     1.730 (+-0.000)      |         6714.996 (+-38.626)
      Input (4, 3, 500, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)    |       7040.481 (+-39.486)       |        5445.704 (+-16.659)         |           13906.618 (+-53.298)          |     2.554 (+-0.000)      |         7034.453 (+-44.626)
      Input (4, 3, 500, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)         |       3926.186 (+-10.660)       |        5741.433 (+-12.748)         |           9356.036 (+-40.848)           |     1.630 (+-0.000)      |         3930.598 (+-17.086)
      Input (4, 3, 500, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)        |        4308.536 (+-9.607)       |        6122.755 (+-47.278)         |           15637.567 (+-54.392)          |     2.554 (+-0.000)      |         4307.463 (+-11.268)
      Input (1, 3, 1200, 1300), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)     |       2512.740 (+-10.860)       |         1573.590 (+-5.061)         |            451.355 (+-1.210)            |     0.287 (+-0.000)      |         2511.727 (+-10.930)
      Input (1, 3, 1200, 1300), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)    |       2489.926 (+-11.915)       |         1537.233 (+-4.212)         |            2501.470 (+-7.446)           |     1.627 (+-0.000)      |         2500.000 (+-12.155)
      Input (1, 3, 1200, 1300), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)         |        632.032 (+-2.108)        |         1496.994 (+-4.194)         |            404.759 (+-1.064)            |     0.270 (+-0.000)      |          630.122 (+-4.086)
      Input (1, 3, 1200, 1300), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)        |        629.174 (+-4.386)        |         1708.935 (+-8.817)         |            2643.296 (+-9.723)           |     1.547 (+-0.000)      |          628.388 (+-1.326)
      Input (1, 3, 1200, 1300), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)   |        4409.941 (+-8.016)       |         1160.133 (+-4.698)         |            1897.089 (+-9.392)           |     1.635 (+-0.000)      |         4450.959 (+-10.438)
      Input (1, 3, 1200, 1300), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)  |       4493.427 (+-11.703)       |         1329.226 (+-4.740)         |           2835.872 (+-12.241)           |     2.133 (+-0.000)      |          4506.973 (+-9.914)
      Input (1, 3, 1200, 1300), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)       |        901.712 (+-4.071)        |         1320.739 (+-5.197)         |            2207.605 (+-8.219)           |     1.671 (+-0.000)      |          904.757 (+-4.558)
      Input (1, 3, 1200, 1300), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)      |        990.080 (+-3.922)        |         1702.563 (+-7.909)         |           3074.196 (+-10.478)           |     1.806 (+-0.000)      |          990.482 (+-4.444)
      Input (4, 3, 1200, 1300), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)     |       9785.550 (+-58.445)       |        6135.680 (+-33.569)         |           1628.572 (+-19.770)           |     0.265 (+-0.000)      |         9893.606 (+-62.377)
      Input (4, 3, 1200, 1300), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)    |       9710.191 (+-57.597)       |        6066.824 (+-36.364)         |           10469.110 (+-42.775)          |     1.726 (+-0.000)      |         9919.022 (+-72.190)
      Input (4, 3, 1200, 1300), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)         |       2790.356 (+-12.188)       |        6134.101 (+-28.694)         |            1576.832 (+-6.030)           |     0.257 (+-0.000)      |         2761.122 (+-11.503)
      Input (4, 3, 1200, 1300), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)        |       2778.711 (+-13.603)       |        6608.528 (+-37.776)         |           10841.549 (+-49.429)          |     1.641 (+-0.000)      |         2753.037 (+-10.995)
      Input (4, 3, 1200, 1300), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)   |      45533.868 (+-102.618)      |         4962.994 (+-8.215)         |           9003.968 (+-38.179)           |     1.814 (+-0.000)      |        43531.261 (+-102.951)
      Input (4, 3, 1200, 1300), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)  |       45932.699 (+-81.207)      |        5595.682 (+-11.482)         |           12302.907 (+-50.254)          |     2.199 (+-0.000)      |         43916.455 (+-80.468)
      Input (4, 3, 1200, 1300), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)       |        3827.804 (+-8.057)       |        6311.580 (+-25.021)         |           11760.614 (+-51.531)          |     1.863 (+-0.000)      |         3849.959 (+-10.848)
      Input (4, 3, 1200, 1300), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)      |        4169.007 (+-8.452)       |        6820.716 (+-35.310)         |           15264.633 (+-49.982)          |     2.238 (+-0.000)      |         4183.875 (+-19.104)
      Input (1, 3, 300, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)       |        1306.914 (+-7.470)       |        10598.101 (+-38.410)        |           2678.031 (+-11.051)           |     0.253 (+-0.000)      |          1307.470 (+-8.519)
      Input (1, 3, 300, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)      |        1307.268 (+-8.197)       |        10161.123 (+-45.643)        |           17148.842 (+-55.402)          |     1.688 (+-0.000)      |          1308.077 (+-8.553)
      Input (1, 3, 300, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)           |        548.574 (+-2.157)        |        10072.806 (+-41.368)        |            2408.971 (+-6.997)           |     0.239 (+-0.000)      |          547.726 (+-1.721)
      Input (1, 3, 300, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)          |        546.664 (+-1.484)        |        11123.694 (+-43.636)        |           18058.070 (+-48.552)          |     1.623 (+-0.000)      |          547.151 (+-1.627)
      Input (1, 3, 300, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)     |       7935.051 (+-71.022)       |        7654.533 (+-29.512)         |           12414.194 (+-87.450)          |     1.622 (+-0.000)      |         7900.056 (+-53.997)
      Input (1, 3, 300, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)    |       8546.732 (+-53.118)       |        8583.572 (+-35.656)         |          19111.824 (+-166.978)          |     2.227 (+-0.000)      |         8515.433 (+-63.300)
      Input (1, 3, 300, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)         |       6202.642 (+-34.355)       |        8915.622 (+-62.293)         |           14327.295 (+-52.188)          |     1.607 (+-0.000)      |         6213.329 (+-39.740)
      Input (1, 3, 300, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)        |       6811.128 (+-33.747)       |        9647.316 (+-50.837)         |           20830.594 (+-62.979)          |     2.159 (+-0.000)      |         6822.512 (+-37.092)
      Input (4, 3, 300, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)       |       5079.586 (+-19.067)       |        42238.442 (+-87.643)        |           11282.141 (+-42.477)          |     0.267 (+-0.000)      |         5104.234 (+-17.706)
      Input (4, 3, 300, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)      |       5079.575 (+-16.306)       |        41512.995 (+-83.710)        |          68789.816 (+-440.001)          |     1.657 (+-0.000)      |         5097.446 (+-21.724)
      Input (4, 3, 300, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)           |        2039.974 (+-8.614)       |       42322.773 (+-111.866)        |           10399.237 (+-43.140)          |     0.246 (+-0.000)      |         2043.808 (+-10.707)
      Input (4, 3, 300, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)          |       2036.214 (+-10.083)       |        44353.281 (+-71.548)        |          73340.412 (+-324.780)          |     1.654 (+-0.000)      |          2039.000 (+-9.554)
      Input (4, 3, 300, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)     |       33821.523 (+-96.639)      |        30552.094 (+-65.023)        |          49494.486 (+-872.916)          |     1.620 (+-0.000)      |         33844.404 (+-92.466)
      Input (4, 3, 300, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)    |      36196.104 (+-128.169)      |        34038.432 (+-79.697)        |          75761.226 (+-905.194)          |     2.226 (+-0.000)      |         36260.473 (+-94.642)
      Input (4, 3, 300, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)         |       24827.821 (+-77.335)      |        37006.218 (+-86.318)        |          61297.625 (+-898.192)          |     1.656 (+-0.000)      |         24823.275 (+-80.945)
      Input (4, 3, 300, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)        |       27266.138 (+-70.262)      |        40109.475 (+-94.248)        |          92086.075 (+-404.922)          |     2.296 (+-0.000)      |         27287.992 (+-89.507)

Times are in microseconds (us).

[--------------------------------------------------------------------------------------------------------------------------------------------------------- Interpolate, cuda ---------------------------------------------------------------------------------------------------------------------------------------------------------]
                                                                                                                                                      |  Eager (2.3.0a0+gitafcfdb1) PR  |  Compiled (2.3.0a0+gitafcfdb1) PR  |  Compiled (2.3.0a0+gitde89a53) Nightly  |  speed-up PR vs Nightly  |  Eager (2.3.0a0+gitde89a53) Nightly
1 threads: ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Input (1, 3, 2345, 2456), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (1234, 1345)   |         98.259 (+-0.014)        |          97.156 (+-0.008)          |             97.443 (+-0.031)            |     1.003 (+-0.000)      |           98.248 (+-0.021)
      Input (1, 3, 2345, 2456), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (1234, 1345)  |         97.048 (+-0.016)        |          97.480 (+-0.018)          |             96.819 (+-0.126)            |     0.993 (+-0.000)      |           97.045 (+-0.015)
      Input (1, 3, 2345, 2456), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (1234, 1345)       |         97.944 (+-0.028)        |          91.686 (+-0.411)          |             93.894 (+-1.011)            |     1.024 (+-0.000)      |           97.933 (+-0.008)
      Input (1, 3, 2345, 2456), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (1234, 1345)      |         98.008 (+-0.011)        |          91.205 (+-0.346)          |             96.854 (+-0.058)            |     1.062 (+-0.000)      |           97.203 (+-0.010)
      Input (4, 3, 2345, 2456), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (1234, 1345)   |        384.318 (+-0.011)        |         382.793 (+-0.007)          |            382.472 (+-0.011)            |     0.999 (+-0.000)      |          384.701 (+-0.012)
      Input (4, 3, 2345, 2456), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (1234, 1345)  |        384.266 (+-0.009)        |         385.333 (+-0.024)          |            382.554 (+-0.022)            |     0.993 (+-0.000)      |          384.386 (+-0.016)
      Input (4, 3, 2345, 2456), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (1234, 1345)       |        383.924 (+-0.011)        |         570.071 (+-0.030)          |            545.615 (+-0.051)            |     0.957 (+-0.000)      |          384.044 (+-0.012)
      Input (4, 3, 2345, 2456), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (1234, 1345)      |        384.184 (+-0.016)        |         560.857 (+-0.026)          |            552.447 (+-0.040)            |     0.985 (+-0.000)      |          384.063 (+-0.016)
      Input (1, 3, 1234, 1345), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (2345, 2456)   |        122.188 (+-0.053)        |         116.744 (+-1.006)          |            163.762 (+-0.015)            |     1.403 (+-0.000)      |          121.874 (+-0.015)
      Input (1, 3, 1234, 1345), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (2345, 2456)  |        122.156 (+-0.012)        |         182.692 (+-0.013)          |            161.653 (+-0.018)            |     0.885 (+-0.000)      |          121.926 (+-0.014)
      Input (1, 3, 1234, 1345), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (2345, 2456)       |        105.852 (+-0.324)        |         119.545 (+-0.294)          |            190.527 (+-0.023)            |     1.594 (+-0.000)      |          105.999 (+-0.446)
      Input (1, 3, 1234, 1345), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (2345, 2456)      |        106.507 (+-0.282)        |         120.060 (+-0.257)          |            162.330 (+-0.012)            |     1.352 (+-0.000)      |          106.567 (+-0.385)
      Input (4, 3, 1234, 1345), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (2345, 2456)   |        447.907 (+-0.015)        |         463.863 (+-1.779)          |            650.492 (+-0.331)            |     1.402 (+-0.000)      |          446.596 (+-0.017)
      Input (4, 3, 1234, 1345), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (2345, 2456)  |        447.750 (+-0.017)        |         723.832 (+-0.170)          |            641.539 (+-0.075)            |     0.886 (+-0.000)      |          446.467 (+-0.019)
      Input (4, 3, 1234, 1345), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (2345, 2456)       |        439.549 (+-0.031)        |         507.772 (+-2.879)          |            758.795 (+-0.482)            |     1.494 (+-0.000)      |          440.372 (+-0.025)
      Input (4, 3, 1234, 1345), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (2345, 2456)      |        439.538 (+-0.029)        |         509.260 (+-2.704)          |            654.195 (+-2.621)            |     1.285 (+-0.000)      |          440.362 (+-0.026)

Times are in microseconds (us).
```

[Source](f4751a3196/perf_interp_mode.py), [Output](899f34c024/output/20231213-214209-upsample-bilinear-pr_vs_nightly-speedup.md)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104182
Approved by: https://github.com/lezcano
2023-12-14 14:50:06 +00:00
angelayi
639060cb0b Use get_mkldnn_enabled for decompositions (#115448)
`torch._C.has_mkldnn` does not reflect cases where users disable mkldnn via `torch._C._set_mkldnn_enabled()`. This is relevant to edge use cases, which do not want decompositions to go to the ATen opset and do not want the mkldnn operator to appear in the graph.
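A rough sketch of the failure mode (the private toggle is named in the description above; treat the exact usage as an assumption):

```
import torch

# Runtime toggle that decompositions should consult (via get_mkldnn_enabled)
# rather than the static build flag
torch._C._set_mkldnn_enabled(False)

# torch._C.has_mkldnn stays True on mkldnn-enabled builds even after the
# call above, which is why checking it alone is wrong
print(torch._C.has_mkldnn)
```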
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115448
Approved by: https://github.com/jgong5, https://github.com/ydwu4
2023-12-12 22:42:51 +00:00
Isuru Fernando
505574c46a Add decomposition for torch.block_diag (#115096)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115096
Approved by: https://github.com/peterbell10
2023-12-11 20:04:22 +00:00
Isuru Fernando
d40a7c6026 Add decompositions for replication_pad (#115113)
Fixes #115395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115113
Approved by: https://github.com/peterbell10
2023-12-09 02:44:07 +00:00
Isuru Fernando
fb19947962 Add decompositions for reflection_pad{1, 2, 3}d (#115100)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115100
Approved by: https://github.com/peterbell10
2023-12-08 23:05:57 +00:00
Jason Ansel
7979ba7b43 [inductor] Add dropout type check to match eager (#115040)
Fixes #98970

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115040
Approved by: https://github.com/oulgen
2023-12-03 23:05:02 +00:00
Kurt Mohler
6f32eb7eef Add decomp for replication_pad2d and use for CUDA deterministic (#111590)
Fixes #95578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111590
Approved by: https://github.com/peterbell10
2023-12-01 18:56:09 +00:00
PyTorch MergeBot
013675ff59 Revert "Add decomp for replication_pad2d and use for CUDA deterministic (#111590)"
This reverts commit f1286161a6.

Reverted https://github.com/pytorch/pytorch/pull/111590 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing the XLA job. The job is also failing on the PR, but the log classifier failed to find the failing test, which led to the PR being wrongly marked as flaky ([comment](https://github.com/pytorch/pytorch/pull/111590#issuecomment-1833004794))
2023-11-30 02:28:14 +00:00
Kurt Mohler
f1286161a6 Add decomp for replication_pad2d and use for CUDA deterministic (#111590)
Fixes #95578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111590
Approved by: https://github.com/peterbell10
2023-11-29 21:50:46 +00:00
Antonio Kim
7fc292930c Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)
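A small usage sketch of the new `generator` argument (assuming the signatures described above):

```
import torch

g = torch.Generator().manual_seed(42)
w = torch.empty(3, 5)
# nn.init functions that call uniform_/normal_ now accept a generator
torch.nn.init.uniform_(w, a=0.0, b=1.0, generator=g)
```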

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-21 23:07:21 +00:00
vfdev-5
1f8d00c5a3 [inductor] Added decomposition for upsample_nearest_exact Nd (#113749)
Description:
- Added decomposition for upsample_nearest_exact: 1d, 2d, 3d

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113749
Approved by: https://github.com/lezcano
2023-11-21 13:03:47 +00:00
PyTorch MergeBot
fe428a284b Revert "Add torch._lazy_clone to create COW tensors (#113397)"
This reverts commit 9916d8a9ea.

Reverted https://github.com/pytorch/pytorch/pull/113397 on behalf of https://github.com/DanilBaibak due to Unfortunately, I need to revert your PR because the lower [PR in the stack](https://github.com/pytorch/pytorch/pull/113396) is failing a bunch of internal build jobs. ([comment](https://github.com/pytorch/pytorch/pull/113397#issuecomment-1818761224))
2023-11-20 10:21:09 +00:00
GD06
b30580e121 [PT] Include tensor shape info in the error messages of torch split (#113984)
Summary: Include tensor shape info in the error messages of torch split.

Test Plan: CI

Differential Revision: D51436684

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113984
Approved by: https://github.com/ezyang
2023-11-19 01:34:57 +00:00
Kurt Mohler
9916d8a9ea Add torch._lazy_clone to create COW tensors (#113397)
Part of #109833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113397
Approved by: https://github.com/ezyang
ghstack dependencies: #113396
2023-11-17 01:58:51 +00:00
PyTorch MergeBot
252e68a83b Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 54493fe8c4.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is, unfortunately, still breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1811625557))
2023-11-15 00:51:23 +00:00
Antonio Kim
54493fe8c4 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-13 23:18:14 +00:00
Mengwei Liu
5506b9db43 [decomp] Fix _scaled_dot_product_flash_attention decomposition bug (#113102)
For `_scaled_dot_product_flash_attention` we don't have

`Tensor? attn_mask=None`

but `scaled_dot_product_attention` does. In the original decomp there's a
mixup where I added this argument to
`_scaled_dot_product_flash_attention`.

Fix it so that `_scaled_dot_product_flash_attention` is being decomposed correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113102
Approved by: https://github.com/ezyang
2023-11-08 21:47:37 +00:00
PyTorch MergeBot
9a28a7b498 Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 27e31ab6e8.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/PaliC due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1799003164))
2023-11-07 15:53:32 +00:00
Antonio Kim
27e31ab6e8 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-06 21:27:02 +00:00
Han Qi
5a6f8014c4 Add a decomposition for _weight_norm_interface. (#112193)
Fixes #112086

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112193
Approved by: https://github.com/ezyang
2023-11-01 19:51:11 +00:00
Peter Bell
04024926f4 Use pytree.tree_map_ everywhere (#112417)
Wherever we discard the output of `tree_map`, it's better to call `tree_map_`,
which doesn't unflatten the mapped results and so is a lot cheaper.
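A quick illustration of the difference, using the `torch.utils._pytree` helpers:

```
import torch
import torch.utils._pytree as pytree

data = {"a": torch.ones(2), "b": [torch.zeros(3)]}

# tree_map_ applies the function to each leaf for its side effect and
# skips rebuilding the container, unlike tree_map
pytree.tree_map_(lambda t: t.add_(1.0), data)
```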
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112417
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393, #112394
2023-10-31 15:57:06 +00:00
Peter Bell
66c32d099a Use pytree.arg_tree_leaves everywhere (#112394)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393
2023-10-31 15:57:06 +00:00
Peter Bell
bbd5b935e4 Use pytree.tree_leaves everywhere (#112324)
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten` to use `tree_leaves`.
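For example (a sketch using `torch.utils._pytree`):

```
import torch.utils._pytree as pytree

nested = {"a": 1, "b": [2, 3]}
leaves = pytree.tree_leaves(nested)        # [1, 2, 3]
flat, _spec = pytree.tree_flatten(nested)  # same leaves, plus the TreeSpec
assert leaves == flat
```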

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327, #112323
2023-10-30 03:39:04 +00:00
lezcano
c8a5bb451e Do not import sympy within torch._prims_common (#112034)
This is the first of a few PRs that avoid importing SymPy at import time.
The pitch here is that we (almost!) do not have SymPy on our API, so
this should be feasible.

This should speed up torch imports by a good 15% as per
https://dev-discuss.pytorch.org/t/delving-into-what-happens-when-you-import-torch/1589

In this PR we just move a few global imports into local imports.
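The pattern, sketched on a hypothetical helper (not actual code from the PR):

```
# Before: a module-level import makes every `import torch` pay the SymPy cost
# import sympy

def is_sympy_expr(x):
    # After: the import cost is only paid if this code path actually runs
    import sympy
    return isinstance(x, sympy.Basic)
```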
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112034
Approved by: https://github.com/ezyang
2023-10-26 12:53:25 +00:00
PyTorch MergeBot
98c329b19e Revert "[core ATen IR] Add decompositions for max, min, var_mean (#110906)"
This reverts commit 9606cda64e.

Reverted https://github.com/pytorch/pytorch/pull/110906 on behalf of https://github.com/SS-JIA due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110906#issuecomment-1757490740))
2023-10-11 11:41:21 +00:00
SS-JIA
9606cda64e [core ATen IR] Add decompositions for max, min, var_mean (#110906)
## Context

Add decompositions for `aten.max`, `aten.min`, and `aten.var_mean`. These operators follow a pattern of returning a tuple of outputs from two component operators:

```
aten.max(x) -> return aten.amax(x), aten.argmax(x)
aten.min(x) -> return aten.amin(x), aten.argmin(x)
aten.var_mean(x) -> return aten.var(x), aten.mean(x)
```

For `var_mean`, the `refs` implementation was doing something similar, so I changed it to call `torch.` ops instead, as was done previously for other `refs` implementations. cc: @peterbell10 @lezcano

Note that Inductor lowers all these directly, so they are excluded from the Inductor decomp table.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110906
Approved by: https://github.com/manuelcandales
2023-10-11 00:06:24 +00:00
Kazuaki Ishizaki
fde28fdc8c Fix typo under torch/_decomp directory (#110821)
This PR fixes typo of comments in files under `torch/_decomp` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110821
Approved by: https://github.com/Skylion007
2023-10-08 20:33:49 +00:00
Stephen Jia
c2e7a0d689 [core IR] Add decomps for aten.sum and aten.squeeze variants (#110645)
Summary:
## Context

Both `aten.sum` and `aten.squeeze` have a "most generic" variant, in the form of `aten.sum.dim_IntList` and `aten.squeeze.dims` respectively. Add decompositions for the other, non-generic variants of these operators that express them using the most generic variant.
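For intuition, the non-generic variants are expressible through the generic ones, e.g. (a sketch, not the in-tree decomps):

```
import torch

x = torch.randn(2, 1, 3)

# aten.sum.default via aten.sum.dim_IntList: reduce over every dim
assert torch.allclose(x.sum(), x.sum(dim=list(range(x.dim()))))

# aten.squeeze.dim via aten.squeeze.dims: a single dim as a one-element tuple
assert torch.equal(x.squeeze(1), x.squeeze(dim=(1,)))
```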

Note that to register these decomps, the reference implementation under `_refs` had to be removed as registered decompositions. cc: @lezcano @peterbell10

Test Plan: Github CI + Meta Internal CI

Differential Revision: D49965952

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110645
Approved by: https://github.com/peterbell10, https://github.com/digantdesai, https://github.com/manuelcandales
2023-10-07 04:21:51 +00:00
cdzhan
7cc0020a80 [decomp] Fix different return type in threshold_backward vs. eager (#110689)
due to type promotion with floating point scalar in decompositions.py

Fixes part of #100838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110689
Approved by: https://github.com/ezyang
2023-10-06 20:59:58 +00:00
chilli
ceb773b68d Fix #110680 (requires_grad typo in decomp) (#110687)
Fixes https://github.com/pytorch/pytorch/issues/110680
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110687
Approved by: https://github.com/voznesenskym, https://github.com/lezcano
ghstack dependencies: #110501, #110504, #110591, #110668
2023-10-06 10:36:01 +00:00
Jerry Zhang
f2a1b93549 Back out "[quant] Support integer implementations for adaptive_avg_pool2d (#104226)" (#110316)
Summary:
Original commit changeset: acdb5b34e3aa

Original Phabricator Diff: D47321689

Test Plan: opinfo tests in CI

Differential Revision: D49789403

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110316
Approved by: https://github.com/kimishpatel
2023-10-03 16:59:23 +00:00
Stephen Jia
ff96f6d04f [core IR][reland] Add split.Tensor and unbind decompositions to core ATen decomp table (#110323)
Summary:
This is a reland of [github PR #110102]( https://github.com/pytorch/pytorch/pull/110102).

The original PR had to be unlanded due to internal CI failures. This diff applies some small fixes to the failing tests to adjust to the new decompositions.

Note that `lift_fresh` will not be decomposed for now, since it was found that [constant propagation looks specifically for `lift_fresh`](13af952f94/torch/fx/experimental/proxy_tensor.py (L381-L386)). Therefore decomposing `lift_fresh` would interfere with constant propagation during export.

Test Plan: Github CI and internal CI

Differential Revision: D49761321

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110323
Approved by: https://github.com/jansel
2023-10-03 14:35:04 +00:00
Peter Bell
be3b16daad [decomp] Fix baddbmm decomposition (#109714)
The decomposition is currently registered without the pw_cast_for_opmath
decorator, due to the ordering of decorators being meaningful.
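To see why ordering matters, a generic illustration (nothing PyTorch-specific):

```
def tag(name):
    def deco(fn):
        fn.tags = getattr(fn, "tags", []) + [name]
        return fn
    return deco

@tag("outer")
@tag("inner")
def f():
    pass

# Decorators apply bottom-up: @outer wraps the result of @inner, so a
# registration decorator placed in the wrong position sees (and registers)
# the wrong function.
print(f.tags)  # ['inner', 'outer']
```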
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109714
Approved by: https://github.com/lezcano
2023-09-28 21:23:44 +00:00
PyTorch MergeBot
e0b035c220 Revert "[core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table (#110102)"
This reverts commit 22e706f768.

Reverted https://github.com/pytorch/pytorch/pull/110102 on behalf of https://github.com/atalman due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110102#issuecomment-1739856671))
2023-09-28 19:03:25 +00:00
SS-JIA
22e706f768 [core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table (#110102)
## Context

Add existing decomps for `lift_fresh`, `split.Tensor`, and `unbind` to the core ATen decomposition table. Do not use them in inductor, since Inductor currently lowers these directly.

One note though is that `lift_fresh`'s decomposition has a note saying it's not correct under autograd. However, my understanding is that these decompositions are registered to the `"post_autograd"` decomposition table, meaning autograd wouldn't be a factor. I would like some confirmation that this premise is correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110102
Approved by: https://github.com/jansel
2023-09-28 01:21:45 +00:00
SS-JIA
dec140f1ea [core IR] Add a core decomposition for aten.all (#110093)
## Context

Change the ref implementation of `aten.all` to only use other `torch` operators such that we can use it for the core ATen decomposition table. This will replace the decomposition for `aten.all` that was used specifically by Inductor.
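A sketch of what a torch-only reference for `aten.all` can look like (the in-tree version may differ in details such as dim/keepdim handling):

```
import torch

def all_decomp(x):
    # all(x) == not any(not x); logical_not handles the bool conversion
    return torch.logical_not(torch.any(torch.logical_not(x)))
```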

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110093
Approved by: https://github.com/manuelcandales, https://github.com/peterbell10, https://github.com/lezcano
2023-09-27 01:31:41 +00:00
SS-JIA
9928c10e71 [core IR] Add glu as a core decomposition (#110043)
## Context

Add the decomposition for `aten.glu` as a decomposition in the core ATen decomposition table. Don't use it in the Inductor decomposition table since Inductor has a lowering for it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110043
Approved by: https://github.com/peterbell10, https://github.com/lezcano
ghstack dependencies: #110046
2023-09-27 00:23:05 +00:00
SS-JIA
5df8aca994 [core IR] Add a core decomposition for floor_divide (#110046)
## Context

Introduce a core decomposition for `aten.floor_divide` into other `aten` ops, and add it to the core ATen decomposition table.

This replaces the decomposition of `floor_divide` that was used by Inductor. I noticed there was a note on that decomposition

```
# TorchInductor-only decomposition. It should not be taken to core.
# See https://github.com/pytorch/torchdynamo/pull/1120
```

but couldn't discern the reason why this is the case. cc: @lezcano
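For reference, a sketch of the decomposition's shape (hypothetical function name; the in-tree version handles more edge cases):

```
import torch

def floor_divide_decomp(a, b):
    # floor_divide is division with floor rounding
    return torch.div(a, b, rounding_mode="floor")
```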

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110046
Approved by: https://github.com/peterbell10
2023-09-26 08:39:21 +00:00
Mwiza Kunda
5c4b5baf21 Fix python decomps for OpOverloadPackets and add tests (#107707)
- Extend `test_torch_dispatch_meta_outplace` to test torch ops that do not have an out parameter but have aten op overloads that do. Additionally, Python decompositions may register `OpOverloadPacket`s, so decompositions need to be tested to ensure all `OpOverload`s still function for the `Meta` key (e.g. if a python decomposition is registered for an aten op `aten.foo` with overloads `[default, out]`, the python function needs to support receiving out arguments)

- Add out parameter wrappers to python decomps for aten ops that have out overloads
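As a sketch of what such an out-parameter wrapper does (hypothetical name, simplified to a single out tensor):

```
def with_out_param(decomp):
    def wrapper(*args, out=None, **kwargs):
        result = decomp(*args, **kwargs)
        if out is None:
            return result
        # mirror the out-overload contract: write into the provided tensor
        out.copy_(result)
        return out
    return wrapper
```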

CC. @ezyang @albanD @lezcano

Fixes #107713

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107707
Approved by: https://github.com/lezcano
2023-09-25 20:53:30 +00:00
SS-JIA
7de669f2f9 [core IR] Remove trunc decomp and add trunc to core (#109902)
Following up from [this comment](https://github.com/pytorch/pytorch/pull/109319#discussion_r1330803226). Remove the decomposition for `trunc`, and add it as a core operator.

Going forward, provide similar treatment for operators that map cleanly to hardware instructions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109902
Approved by: https://github.com/peterbell10
2023-09-25 18:18:06 +00:00
Jijie Wei
334ead04a9 Back out "[decomp] Fix baddbmm decomposition (#109714)" (#109855)
Summary:
Original commit changeset: 95c462a380c9

Original Phabricator Diff: D49484954

This diff causes a test failure for the deterministic NE test; see: https://www.internalfb.com/sandcastle/job/18014399565419856/

Test Plan:
buck2 test 'fbcode//mode/opt' fbcode//aps_models/ads/icvr/tests:icvr_fm_e2e_deterministic_ne_test -- --exact 'aps_models/ads/icvr/tests:icvr_fm_e2e_deterministic_ne_test - aps_models.ads.icvr.tests.icvr_fm_e2e_deterministic_ne_test.ICVR_FM_E2EDeterministicNeTest: test_e2e_deterministic_icvr_fm_pt2_fsdp_multi_gpus'

https://www.internalfb.com/intern/testinfra/testrun/16888498605839953

Differential Revision: D49527271

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109855
Approved by: https://github.com/yanboliang
2023-09-22 22:01:38 +00:00
Mwiza Kunda
8dedc9dd9b Add meta tests for layer/group/batch norm backward (#109591)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109591
Approved by: https://github.com/ezyang
2023-09-21 18:58:51 +00:00
Mwiza Kunda
6b7b9c796e Fix registering jit decompositions for jvp for out wrapped decomps (#109367)
Python decompositions wrapped by `out_wrapper` need to be unwrapped before compiling with TorchScript since:
- `out_wrapper` extends the decomposition's signature with an out parameter; however, this `out` parameter is not present in the source code of the original decomposition, so the resulting `ScriptFunction` will not have an `out` parameter
- `out_wrapper` is in the `torch._prims_common.wrappers` module, so its `globals()` are different from the globals of the decomposition being wrapped. This may cause symbol resolution to fail in the TorchScript compiler, since it compiles the unwrapped decomp's source code rather than the wrapper's

The python decomposition for `aten.trace` is wrapped as an example; other decompositions are to be fixed in https://github.com/pytorch/pytorch/pull/107707
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109367
Approved by: https://github.com/lezcano
2023-09-21 16:36:51 +00:00
Peter Bell
6f0cf5a837 [decomp] Decompose unsafe_split{,_with_sizes} into safe variants (#109668)
The "safety" aspect refers to the output not being registered as aliasing the
input, but after AOTAutograd I don't think this distinction matters. However,
we shouldn't use the same decomposition as the safe variant in case the backend
doesn't want to decompose split.
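A sketch of the resulting decomposition (hypothetical name; the in-tree version may differ):

```
import torch

def unsafe_split_decomp(x, split_size, dim=0):
    # route the unsafe variant to the safe op instead of reusing the
    # safe variant's own decomposition
    return torch.split(x, split_size, dim)
```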

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109668
Approved by: https://github.com/lezcano
ghstack dependencies: #109667
2023-09-20 18:45:56 +00:00
Peter Bell
9e629dd73c [decomp] Add all std and std_mean overloads to core decompostions (#109667)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109667
Approved by: https://github.com/lezcano
2023-09-20 18:45:56 +00:00
Peter Bell
36a8105f54 [decomp] Fix baddbmm decomposition (#109714)
The decomposition is currently registered without the pw_cast_for_opmath
decorator, due to the ordering of decorators being meaningful.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109714
Approved by: https://github.com/lezcano
2023-09-20 18:40:21 +00:00
Salil Desai
40b2c796dc [Decomposition] baddbmm (#108534)
Summary:
Move the decomposition of baddbmm from _inductor/decomposition.py and include it in core_aten_decompositions:

ff38c0e2f9/torch/_inductor/decomposition.py (L203)
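The decomposition being moved is essentially (a simplified sketch; per the baddbmm fix above, the in-tree version also handles dtype casting via pw_cast_for_opmath):

```
import torch

def baddbmm_decomp(self, batch1, batch2, *, beta=1, alpha=1):
    return beta * self + alpha * torch.bmm(batch1, batch2)
```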

Test Plan: Phabricator + OSS Tests

Differential Revision: D48871741

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108534
Approved by: https://github.com/SherlockNoMad
2023-09-20 12:49:32 +00:00
Salil Desai
d0cc623192 [Decomposition] _unsafe_view (#108713)
Summary:
The decomp already exists, so just add it to core_aten_decompositions:

https://www.internalfb.com/code/fbsource/[9d5eabd7b213d1a356d4e7bb400355d574ea924b]/fbcode/caffe2/torch/_decomp/decompositions.py?lines=3091

Differential Revision: D48619079

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108713
Approved by: https://github.com/larryliu0820, https://github.com/SherlockNoMad
2023-09-19 13:37:35 +00:00
Salil Desai
2e721aab98 [Decomposition] Trunc (#109319)
Summary:
Add a decomp for `trunc` and include it in core_aten_decompositions

Differential Revision: D49042033

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109319
Approved by: https://github.com/SherlockNoMad
2023-09-19 13:30:13 +00:00
Salil Desai
ae66d0b3bf [Decomposition] clamp_max (#108718)
Summary:
The decomp already exists, so just add it to core_aten_decompositions:

https://www.internalfb.com/code/fbsource/[abda43a5a268e83fef6d62b49531a390ce915ad2]/fbcode/caffe2/torch/_refs/__init__.py?lines=1855

Differential Revision: D48880026

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108718
Approved by: https://github.com/SherlockNoMad
2023-09-19 13:25:35 +00:00