Mirror of https://github.com/zebrajr/pytorch.git
Synced 2025-12-07 12:21:27 +01:00 at commit 0d3d84d866 (78 commits)
45c5a23237: Revert "Add Intel GPU info collection to the collect env script (#137846)"
This reverts commit

5264f8cd8d: Add Intel GPU info collection to the collect env script (#137846)
As the title says, this adds Intel GPU info collection to the collect env script.
Output examples:
1. CPU on Windows
```
C:\Users\user\miniforge3\envs\py310\lib\site-packages\torch\_subclasses\functional_tensor.py:279: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\pytorch\torch\csrc\utils\tensor_numpy.cpp:81.)
cpu = _conversion_method_template(device=torch.device("cpu"))
Collecting environment information...
PyTorch version: 2.8.0.dev20250528+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Enterprise (10.0.22631 64-bit)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.10.17 | packaged by conda-forge | (main, Apr 10 2025, 22:06:35) [MSC v.1943 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Name: 12th Gen Intel(R) Core(TM) i7-1270P
Manufacturer: GenuineIntel
Family: 198
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 1711
MaxClockSpeed: 2200
L2CacheSize: 9216
L2CacheSpeed: None
Revision: None
Versions of relevant libraries:
[pip3] torch==2.8.0.dev20250528+cpu
[conda] torch 2.8.0.dev20250528+cpu pypi_0 pypi
```
2. XPU on Windows
```
Collecting environment information...
PyTorch version: 2.8.0a0+gitef6306e
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Pro (10.0.19045 64-bit)
GCC version: (GCC) 13.1.0
Clang version: Could not collect
CMake version: version 3.29.3
Libc version: N/A
Python version: 3.10.17 | packaged by conda-forge | (main, Apr 10 2025, 22:06:35) [MSC v.1943 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: True
XPU used to build PyTorch: 20250101
Intel GPU driver version:
* 32.0.101.6795 (20250520000000.******+***)
Intel GPU models onboard:
* Intel(R) Arc(TM) A770 Graphics
Intel GPU models detected:
* [0] _XpuDeviceProperties(name='Intel(R) Arc(TM) A770 Graphics', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.33184', total_memory=15915MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=128, sub_group_sizes=[8 16 32], has_fp16=1, has_fp64=0, has_atomic64=1)
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
----------------------
Name: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
Manufacturer: GenuineIntel
Family: 179
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 2401
MaxClockSpeed: 2401
L2CacheSize: 24576
L2CacheSpeed: None
Revision: 21767
----------------------
Name: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
Manufacturer: GenuineIntel
Family: 179
Architecture: 9
ProcessorType: 3
DeviceID: CPU1
CurrentClockSpeed: 2200
MaxClockSpeed: 2401
L2CacheSize: 24576
L2CacheSpeed: None
Revision: 21767
Versions of relevant libraries:
[pip3] intel_extension_for_pytorch==2.8.10+gitb3ea3a1
[pip3] numpy==2.1.2
[pip3] optree==0.13.1
[pip3] pytorch-triton-xpu==3.3.1+gitb0e26b73
[pip3] torch==2.8.0a0+gitef6306e
[conda] intel-extension-for-pytorch 2.8.10+gitb3ea3a1 pypi_0 pypi
[conda] mkl 2025.1.0 pypi_0 pypi
[conda] mkl-dpcpp 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-blas 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-datafitting 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-dft 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-lapack 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-rng 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-sparse 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-stats 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-vm 2025.1.0 pypi_0 pypi
[conda] pytorch-triton-xpu 3.3.1+gitb0e26b73 pypi_0 pypi
[conda] torch 2.8.0a0+gitef6306e pypi_0 pypi
```
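The `_XpuDeviceProperties` entries above come from PyTorch's XPU runtime. As a guarded sketch (the `detected_xpu_lines` helper name is mine, not the script's), the same properties can be queried through the public `torch.xpu` API:

```python
# Sketch: query the per-device properties shown above via torch.xpu.
# Guarded so it degrades to an empty list when torch or XPU support is absent.
def detected_xpu_lines():
    try:
        import torch
    except ImportError:
        return []
    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return []
    # str() of the properties object yields the "_XpuDeviceProperties(...)" form.
    return [str(xpu.get_device_properties(i)) for i in range(xpu.device_count())]

for line in detected_xpu_lines():
    print(line)
```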
3. CPU on Linux
```
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/_subclasses/functional_tensor.py:279: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
cpu = _conversion_method_template(device=torch.device("cpu"))
Collecting environment information...
PyTorch version: 2.8.0.dev20250528+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: AlmaLinux 8.10 (Cerulean Leopard) (x86_64)
GCC version: (GCC) 14.2.1 20250110 (Red Hat 14.2.1-7)
Clang version: Could not collect
CMake version: version 4.0.0
Libc version: glibc-2.28
Python version: 3.12.10 (main, Apr 19 2025, 05:03:56) [GCC 14.2.1 20250110 (Red Hat 14.2.1-7)] (64-bit runtime)
Python platform: Linux-6.8.0-40-generic-x86_64-with-glibc2.28
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 88
On-line CPU(s) list: 0-87
Thread(s) per core: 2
Core(s) per socket: 22
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz
Stepping: 7
CPU MHz: 1000.000
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 30976K
NUMA node0 CPU(s): 0-21,44-65
NUMA node1 CPU(s): 22-43,66-87
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Versions of relevant libraries:
[pip3] torch==2.8.0.dev20250528+cpu
[conda] Could not collect
```
4. XPU on Linux
```
Collecting environment information...
PyTorch version: 2.8.0.dev20250516+xpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.31.6
Libc version: glibc-2.35
Python version: 3.10.17 | packaged by conda-forge | (main, Apr 10 2025, 22:19:12) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-5.15.50-051550-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: True
XPU used to build PyTorch: 20250101
Intel GPU driver version:
* intel_opencl: 24.39.31294.21-1032~22.04
* level_zero: 1.17.44.0-1022~22.04
Intel GPU models onboard:
* Intel(R) Data Center GPU Max 1550
* Intel(R) Data Center GPU Max 1550
* Intel(R) Data Center GPU Max 1550
* Intel(R) Data Center GPU Max 1550
Intel GPU models detected:
* [0] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [1] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [2] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [3] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [4] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [5] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [6] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [7] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 224
On-line CPU(s) list: 0-223
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8480+
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 56
Socket(s): 2
Stepping: 6
CPU max MHz: 3800.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr avx512_fp16 flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 5.3 MiB (112 instances)
L1i cache: 3.5 MiB (112 instances)
L2 cache: 224 MiB (112 instances)
L3 cache: 210 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-55,112-167
NUMA node1 CPU(s): 56-111,168-223
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==2.2.5
[pip3] pytorch-triton-xpu==3.3.0+git0bcc8265
[pip3] torch==2.8.0.dev20250516+xpu
[conda] mkl 2025.1.0 pypi_0 pypi
[conda] numpy 2.2.5 pypi_0 pypi
[conda] onemkl-sycl-blas 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-dft 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-lapack 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-rng 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-sparse 2025.1.0 pypi_0 pypi
[conda] pytorch-triton-xpu 3.3.0+git0bcc8265 pypi_0 pypi
[conda] torch 2.8.0.dev20250516+xpu pypi_0 pypi
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137846
Approved by: https://github.com/guangyey, https://github.com/malfet
Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
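The reports in the examples above are produced by running `python -m torch.utils.collect_env`. For illustration only, here is a minimal stdlib sketch of the indexed-bullet layout used by the "Intel GPU models detected" section; the `format_xpu_devices` helper and the property keys passed to it are hypothetical, not the script's actual code:

```python
# Hypothetical formatter mirroring the "Intel GPU models detected" layout.
# The helper name and property keys are illustrative assumptions, not the
# implementation in torch/utils/collect_env.py.
def format_xpu_devices(devices):
    if not devices:
        return "Intel GPU models detected: N/A"
    lines = ["Intel GPU models detected:"]
    for i, props in enumerate(devices):
        fields = ", ".join(f"{k}={v}" for k, v in props.items())
        lines.append(f" * [{i}] _XpuDeviceProperties({fields})")
    return "\n".join(lines)

print(format_xpu_devices([{"name": "'Intel(R) Arc(TM) A770 Graphics'",
                           "total_memory": "15915MB"}]))
```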

6fb6293159: Revert "Add Intel GPU info collection to the collect env script (#137846)"
This reverts commit

c6b4f98625: Add Intel GPU info collection to the collect env script (#137846)
(commit message and output examples identical to 5264f8cd8d)

0db3e0cf29: Revert "Add Intel GPU info collection to the collect env script (#137846)"
This reverts commit

e1180c7228: Add Intel GPU info collection to the collect env script (#137846)
(commit message and output examples identical to 5264f8cd8d)
Clang version: Could not collect
CMake version: version 4.0.0
Libc version: glibc-2.28
Python version: 3.12.10 (main, Apr 19 2025, 05:03:56) [GCC 14.2.1 20250110 (Red Hat 14.2.1-7)] (64-bit runtime)
Python platform: Linux-6.8.0-40-generic-x86_64-with-glibc2.28
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 88
On-line CPU(s) list: 0-87
Thread(s) per core: 2
Core(s) per socket: 22
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz
Stepping: 7
CPU MHz: 1000.000
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 30976K
NUMA node0 CPU(s): 0-21,44-65
NUMA node1 CPU(s): 22-43,66-87
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Versions of relevant libraries:
[pip3] torch==2.8.0.dev20250528+cpu
[conda] Could not collect
```
4. XPU on Linux
```
Collecting environment information...
PyTorch version: 2.8.0.dev20250516+xpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.31.6
Libc version: glibc-2.35
Python version: 3.10.17 | packaged by conda-forge | (main, Apr 10 2025, 22:19:12) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-5.15.50-051550-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: True
XPU used to build PyTorch: 20250101
Intel GPU driver version:
* intel_opencl: 24.39.31294.21-1032~22.04
* level_zero: 1.17.44.0-1022~22.04
Intel GPU models onboard:
* Intel(R) Data Center GPU Max 1550
* Intel(R) Data Center GPU Max 1550
* Intel(R) Data Center GPU Max 1550
* Intel(R) Data Center GPU Max 1550
Intel GPU models detected:
* [0] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [1] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [2] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [3] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [4] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [5] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [6] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [7] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 224
On-line CPU(s) list: 0-223
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8480+
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 56
Socket(s): 2
Stepping: 6
CPU max MHz: 3800.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr avx512_fp16 flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 5.3 MiB (112 instances)
L1i cache: 3.5 MiB (112 instances)
L2 cache: 224 MiB (112 instances)
L3 cache: 210 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-55,112-167
NUMA node1 CPU(s): 56-111,168-223
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==2.2.5
[pip3] pytorch-triton-xpu==3.3.0+git0bcc8265
[pip3] torch==2.8.0.dev20250516+xpu
[conda] mkl 2025.1.0 pypi_0 pypi
[conda] numpy 2.2.5 pypi_0 pypi
[conda] onemkl-sycl-blas 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-dft 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-lapack 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-rng 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-sparse 2025.1.0 pypi_0 pypi
[conda] pytorch-triton-xpu 3.3.0+git0bcc8265 pypi_0 pypi
[conda] torch 2.8.0.dev20250516+xpu pypi_0 pypi
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137846
Approved by: https://github.com/guangyey, https://github.com/malfet
Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
|
||
|
|
1bd6bc7190 |
[BE]: Enable ruff YTT linter for Python version checks (#153547)
Adds ruff YTT checks to help future proof version checks and follow best practices here. Also makes it easier for static linters like mypy to detect python version branching. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153547 Approved by: https://github.com/albanD |
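The kind of check the YTT rules guard against can be sketched as follows (illustrative only, not code from the PR): slicing `sys.version` as a string silently misorders versions once minor releases reach two digits, while `sys.version_info` tuples compare correctly.

```python
import sys

# Fragile (flagged by ruff YTT): sys.version[:3] yields "3.1" on Python 3.12,
# so the string comparison against "3.8" gives the wrong answer.
fragile_ok = sys.version[:3] >= "3.8"

# Robust: compare sys.version_info as a tuple of integers.
robust_ok = sys.version_info >= (3, 8)
```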
||
|
|
e2f9759bd0 |
Fix broken URLs (#152237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237 Approved by: https://github.com/huydhn, https://github.com/malfet |
||
|
|
675f69f40f |
collect_env: gracefully handle no pip (#151607)
If pip is not installed:
### Before
```console
> python3 torch/utils/collect_env.py
Collecting environment information...
Traceback (most recent call last):
File "/Users/Adam/pytorch/torch/utils/collect_env.py", line 694, in <module>
main()
~~~~^^
File "/Users/Adam/pytorch/torch/utils/collect_env.py", line 677, in main
output = get_pretty_env_info()
File "/Users/Adam/pytorch/torch/utils/collect_env.py", line 672, in get_pretty_env_info
return pretty_str(get_env_info())
~~~~~~~~~~~~^^
File "/Users/Adam/pytorch/torch/utils/collect_env.py", line 497, in get_env_info
pip_version, pip_list_output = get_pip_packages(run_lambda)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^
File "/Users/Adam/pytorch/torch/utils/collect_env.py", line 450, in get_pip_packages
for line in out.splitlines()
^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'splitlines'
```
### After
```console
> python3 torch/utils/collect_env.py
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: macOS 15.4 (arm64)
GCC version: Could not collect
Clang version: 20.1.0
CMake version: version 3.31.6
Libc version: N/A
Python version: 3.13.2 (main, Apr 8 2025, 15:27:33) [Clang 17.0.0 (clang-1700.0.13.3)] (64-bit runtime)
Python platform: macOS-15.4-arm64-arm-64bit-Mach-O
Is CUDA available: N/A
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A
CPU:
Apple M2 Pro
Versions of relevant libraries:
[pip3] Could not collect
[conda] Could not collect
```
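The shape of the fix can be sketched like this (a minimal illustration, not the exact patch): when the `pip list` subprocess cannot run at all, the captured output is `None`, so guard before calling `.splitlines()` and report "Could not collect" instead of crashing.

```python
def format_pip_output(out):
    # `out` is None when pip could not be invoked (e.g. pip is not
    # installed); guard before calling .splitlines() on it.
    if out is None:
        return "Could not collect"
    return "\n".join(line for line in out.splitlines() if "==" in line)
```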
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151607
Approved by: https://github.com/malfet
|
||
|
|
6cbf97ede8 |
[ROCm] enable HIPMallocAsyncAllocator (#149145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149145 Approved by: https://github.com/izaitsevfb Co-authored-by: Jeff Daily <jeff.daily@amd.com> |
||
|
|
e1d143cb7b |
Revert "[ROCm] enable HIPMallocAsyncAllocator (#149145)"
This reverts commit
|
||
|
|
ee1a2b7810 |
[ROCm] enable HIPMallocAsyncAllocator (#149145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149145 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com> |
||
|
|
9d37b501db |
Revert "[ROCm] enable HIPMallocAsyncAllocator (#149145)"
This reverts commit
|
||
|
|
2e02c07a5d |
[ROCm] enable HIPMallocAsyncAllocator (#149145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149145 Approved by: https://github.com/jeffdaily |
||
|
|
0aa34e9591 |
Revert "Collect packages with importlib in collect_env (#144616)"
This reverts commit
|
||
|
|
3541d2a2aa |
Collect packages with importlib in collect_env (#144616)
If pytorch is installed systemwide (via os package manager) or by alternative package manager like `uv`, pip is not available, causing error in `collect_env`. However it is still possible to collect exactly the same list using `importlib` API, which is always available. Fixes #144615 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144616 Approved by: https://github.com/malfet |
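The importlib-based approach can be sketched as below (helper name and pattern list are illustrative, not the actual patch): `importlib.metadata` is in the standard library since Python 3.8, so the same package list is available even when pip itself is absent.

```python
from importlib import metadata

def list_relevant_packages(patterns=("torch", "numpy", "mypy")):
    # Walk every installed distribution via the stdlib; no pip required.
    found = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]  # None for broken/partial installs
        if name and any(p in name.lower() for p in patterns):
            found.append(f"{name}=={dist.version}")
    return sorted(found)
```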
||
|
|
67cf126cf8 |
Disable PIP version check in collect_env (#142308)
Disables version check which might require users to reach out to PyPI, reference: https://pip.pypa.io/en/latest/cli/pip/#cmdoption-disable-pip-version-check Switches pip to be used directly as a python module (`python3 -mpip`) instead of relying on `pip3` or `pip` Pull Request resolved: https://github.com/pytorch/pytorch/pull/142308 Approved by: https://github.com/seemethere |
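The resulting invocation can be sketched like this (function name is illustrative; the flag is pip's documented `--disable-pip-version-check` option): pip is run as a module of the current interpreter rather than via a `pip`/`pip3` executable.

```python
import subprocess
import sys

def run_pip_list():
    # `-mpip` uses the current interpreter's pip; the flag skips the
    # network round-trip that checks PyPI for a newer pip release.
    cmd = [sys.executable, "-mpip", "list", "--format=freeze",
           "--disable-pip-version-check"]
    try:
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=60, check=False).stdout
    except OSError:
        return None
```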
||
|
|
14e6624473 |
Update wmic command used in collect_env.py to its counterpart in powershell due to its deprecation (#138297)
As title. `wmic` is deprecated in Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138297 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> |
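A sketch of the replacement pattern (function name and selected properties are assumptions, not the exact patch): `Get-CimInstance Win32_Processor` is the PowerShell counterpart of the deprecated `wmic cpu get ...`, and its property names match the old WMI fields shown in the CPU sections above.

```python
import subprocess

def get_cpu_info_windows():
    # Get-CimInstance replaces the deprecated wmic; Format-List yields
    # "Name: ..." style lines like those in the reports above.
    cmd = ["powershell.exe", "-NoProfile", "-Command",
           "Get-CimInstance Win32_Processor | "
           "Select-Object Name,Manufacturer,Family,Architecture | Format-List"]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
        return proc.stdout.strip() or "Could not collect"
    except (OSError, subprocess.TimeoutExpired):
        return "Could not collect"
```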
||
|
|
0b168ceb6d |
Collect Nvidia libraries with collect_env.py (#138076)
Collect Nvidia libraries to diagnose issues like #133548. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138076 Approved by: https://github.com/malfet |
||
|
|
8db9dfa2d7 |
Flip default value for mypy disallow_untyped_defs [9/11] (#127846)
See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127846 Approved by: https://github.com/ezyang ghstack dependencies: #127842, #127843, #127844, #127845 |
||
|
|
117ab34891 |
Documenting the torch.utils.collect_env.get_pretty_env_info function (#128123)
Fixes #127888 This PR adds docstring to the `torch.utils.collect_env.get_pretty_env_info` function Pull Request resolved: https://github.com/pytorch/pytorch/pull/128123 Approved by: https://github.com/ezyang, https://github.com/malfet |
||
|
|
3acbfd602e |
Document torch.utils.collect_env.get_env_info function (#128021)
Fixes #127911 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128021 Approved by: https://github.com/malfet |
||
|
|
3fe437b24b |
[BE]: Update flake8 to v6.1.0 and fix lints (#116591)
Updates flake8 to v6.1.0 and fixes a few lints using sed and some ruff tooling.
- Replace `assert(0)` with `raise AssertionError()`
- Remove extraneous parentheses, i.e.
- `assert(a == b)` -> `assert a == b`
- `if(x > y or y < z):`->`if x > y or y < z:`
- And `return('...')` -> `return '...'`
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116591
Approved by: https://github.com/albanD, https://github.com/malfet
|
||
|
|
5e58be678c |
Make collect env BC compatible (#116532)
To avoid errors like the one in https://github.com/pytorch/pytorch/issues/116531 when the user tries to run collect_env Pull Request resolved: https://github.com/pytorch/pytorch/pull/116532 Approved by: https://github.com/malfet |
||
|
|
765d4599ee |
Give users control over packages in torch.utils.collect_env (#112993)
I'm looking to repurpose some logic in `torch.utils.collect_env` for the `geowatch` package. I'm mostly able to just use this script as a library, which is great because it reduces code in my package. However, the issue is that the package patterns that are relevant to torch are hard-coded inside of `get_conda_packages` and `get_pip_packages`. The changes I made are simple. I defined the default package patterns as two global sets, and I added an argument to each function that lets the user customize exactly what package patterns are relevant. If they are not specified the defaults are used. I was considering extending the power of the patterns by utilizing `fnmatch`, `re` (or [xdev.pattern](https://github.com/Erotemic/xdev/blob/main/xdev/patterns.py) which abstracts them both), but instead I opted to just use the existing `__contains__` test to keep things simple. From torch's perspective this should make maintaining this file slightly easier because to update relevant packages, the developer now updates two neighboring top-level globals instead of two separated local variables. However, it does add an argument to two functions, and that argument isn't used in torch itself, so there is an argument for removing that, and then users *could* still have some control by modifying globals, but I think the way I did it balances the tradeoffs well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112993 Approved by: https://github.com/zou3519 |
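The interface described can be sketched as follows (global and function names are illustrative): the default patterns move to a module-level set, and callers may pass their own; matching stays a plain substring (`__contains__`) test.

```python
# Module-level defaults a downstream caller can override or extend.
DEFAULT_PIP_PATTERNS = {"torch", "numpy", "mypy", "triton", "onnx", "flake8"}

def filter_packages(freeze_lines, patterns=None):
    # Substring containment, matching the script's existing behavior;
    # no fnmatch/re power, by design, to keep things simple.
    patterns = DEFAULT_PIP_PATTERNS if patterns is None else patterns
    return [line for line in freeze_lines
            if any(p in line.lower() for p in patterns)]
```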
||
|
|
5e10dd2c78 |
fix docstring issues in torch.utils (#113335)
Fixes #112634 Fixes all the issues listed except in `torch/utils/_pytree.py` as the file no longer exists. ### Error counts |File | Count Before | Count now| |---- | ---- | ---- | |`torch/utils/collect_env.py` | 39 | 25| |`torch/utils/cpp_extension.py` | 51 | 13| |`torch/utils/flop_counter.py` | 25 | 8| |`torch/utils/_foreach_utils.py.py` | 2 | 0| |`torch/utils/_python_dispatch.py.py` | 26 | 25| |`torch/utils/backend_registration.py` | 15 | 4| |`torch/utils/checkpoint.py` | 29 | 21| Pull Request resolved: https://github.com/pytorch/pytorch/pull/113335 Approved by: https://github.com/ezyang |
||
|
|
333d5821ee |
[ROCm] Add gcnArchName to collect_env and torch.cuda.get_device_properties (#107477)
Printing just the device name is not helpful when investigating PyTorch issues filed for specific AMD GPUs, as the support/issue might depend on the gfx arch, which is part of the gcnArchName property. `torch.cuda.get_device_properties(0).gcnArchName` will print the value of the `gcnArchName` property: eg. ``` >>> torch.cuda.get_device_properties(0).gcnArchName 'gfx906:sramecc+:xnack-' ``` ``` root@6f064e3c19fb:/data/pytorch/test# python ../torch/utils/collect_env.py ... GPU models and configuration: AMD Radeon Graphics(gfx906:sramecc+:xnack-) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/107477 Approved by: https://github.com/albanD |
||
|
|
d24e7be243 |
Include onnx and onnxscript information in collect_env.py (#110560)
`onnx` and `onnxscript` are used in torch.onnx.dynamo_export since 2.0. It would be helpful to collect version information in user issue reports. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110560 Approved by: https://github.com/albanD |
||
|
|
0bf30c140a |
[pytree] Use OpTree for PyTree manipulation (#93139)
Split from #92679. Use C++-based PyTree implementation. ## Highlights 1. High performance (20x speedup than the pure-Python implementation, 10%-20% overall speedup for `torch.fx`) 2. Multi-input tree-map support 3. Custom tree node registry with namespace isolation Refs: - #65761 - #91323 - #92679 From https://github.com/pytorch/pytorch/issues/65761#issuecomment-1334746366: > ### 0. Out-of-box compatible with JAX's pytree, provides the same interfaces and functions (and more). > > ### 1. High-performance: `optree` has comparable fast tree operations (~0.9x for `dict`s and ~2.5x for `OrderedDict`s) than JAX's pytree and it is 20x faster than `torch.utils._pytree`. > > `optree` implements some common Python container types in C++ (e.g., `OrderedDict`) and achieves 2.5x performance than JAX's pytree. Check out section [Built-in PyTree Node Types](https://github.com/metaopt/optree#built-in-pytree-node-types) and [Benchmark](https://github.com/metaopt/optree#benchmark) for more details. > > | Module | Nodes | OpTree (μs) | JAX XLA (μs) | PyTorch (μs) | DM-Tree (μs) | Speedup (J / O) | Speedup (P / O) | Speedup (D / O) | > | :-------- | ----: | ----------: | -----------: | -----------: | -----------: | --------------: | --------------: | --------------: | > | TinyMLP | 53 | 26.40 | 68.19 | 586.87 | 34.14 | 2.58 | 22.23 | 1.29 | > | AlexNet | 188 | 84.28 | 259.51 | 2182.07 | 125.12 | 3.08 | 25.89 | 1.48 | > | ResNet18 | 698 | 288.57 | 807.27 | 7881.69 | 429.39 | 2.80 | 27.31 | 1.49 | > | ResNet34 | 1242 | 580.75 | 1564.97 | 15082.84 | 819.02 | 2.69 | 25.97 | 1.41 | > | ResNet50 | 1702 | 791.18 | 2081.17 | 20982.82 | 1104.62 | 2.63 | 26.52 | 1.40 | > | ResNet101 | 3317 | 1603.93 | 3939.37 | 40382.14 | 2208.63 | 2.46 | 25.18 | 1.38 | > | ResNet152 | 4932 | 2446.56 | 6267.98 | 56892.36 | 3139.17 | 2.56 | 23.25 | 1.28 | > | ViT-H/14 | 3420 | 1681.48 | 4488.33 | 41703.16 | 2504.86 | 2.67 | 24.80 | 1.49 | > | Swin-B | 2881 | 1565.41 | 4091.10 | 34241.99 | 1936.75 | 2.61 | 21.87 
| 1.24 | > | | | | | | **Average** | **2.68** | **24.78** | **1.38** | > > <div align="center"> > <img src="https://user-images.githubusercontent.com/16078332/200494435-fd5bb385-59f7-4811-b520-98bf5763ccf3.png" width="90%" /> > </div> > > ### 2. Namespace Isolation for the PyTree Type Registry > > In addition to the JAX's pytree registry for custom node type registration, `optree` adds `namespace` isolation to the registry. Users can register the same type multiple times for different flatten/unflatten behavior. It also provides module-level isolation for safety reasons. For example, you can add a unique prefix to your namespace to isolate your registry with other modules (e.g., `torch.xxx`, `torch.functorch.xxx`): > > ```python > # Register a Python type into a namespace > import torch > > optree.register_pytree_node( > torch.Tensor, > # (tensor) -> (children, metadata) > flatten_func=lambda tensor: ( > (tensor.cpu().numpy(),), > dict(dtype=tensor.dtype, device=tensor.device, requires_grad=tensor.requires_grad), > ), > # (metadata, children) -> tensor > unflatten_func=lambda metadata, children: torch.tensor(children[0], **metadata), > namespace='torch.torch2numpy', > ) > ``` > > ```python > >>> tree = {'weight': torch.ones(size=(1, 2)).cuda(), 'bias': torch.zeros(size=(2,))} > >>> tree > {'weight': tensor([[1., 1.]], device='cuda:0'), 'bias': tensor([0., 0.])} > > # Flatten without specifying the namespace > >>> tree_flatten(tree) # `torch.Tensor`s are leaf nodes > ([tensor([0., 0.]), tensor([[1., 1.]], device='cuda:0')], PyTreeSpec({'bias': *, 'weight': *})) > > # Flatten with the namespace > >>> leaves, treespec = optree.tree_flatten(tree, namespace='torch.torch2numpy') > >>> leaves, treespec > ( > [array([0., 0.], dtype=float32), array([[1., 1.]], dtype=float32)], > PyTreeSpec( > { > 'bias': CustomTreeNode(Tensor[{'dtype': torch.float32, 'device': device(type='cpu'), 'requires_grad': False}], [*]), > 'weight': CustomTreeNode(Tensor[{'dtype': torch.float32, 
'device': device(type='cuda', index=0), 'requires_grad': False}], [*]) > }, > namespace='torch.torch2numpy' > ) > ) > > # `entries` are not defined and use `range(len(children))` > >>> optree.tree_paths(tree, namespace='torch.torch2numpy') > [('bias', 0), ('weight', 0)] > > # Unflatten back to a copy of the original object > >>> optree.tree_unflatten(treespec, leaves) > {'bias': tensor([0., 0.]), 'weight': tensor([[1., 1.]], device='cuda:0')} > ``` > > Check out section [Registering a Container-like Custom Type as Non-leaf Nodes](https://github.com/metaopt/optree#notes-about-the-pytree-type-registry) for more details. > > ### 3. Support both `None` as Non-leaf Node and `None` as Leaf > > In JAX's implementation, `None` is always an internal non-leaf node with an arity 0, which is like an empty tuple. This limits the usage of the JAX's pytree utilities for PyTorch. For example, the `nn.Module` uses `_parameters` and `_buffers` (`OrderedDict[str, Optional[Tensor]]`) to hold the tensors, while the value can be a tensor or `None`. > > `optree` supports both `None` as Non-leaf Node (JAX's default) and `None` as Leaf (PyTorch's default). Check out section [None is Non-leaf Node vs. None is Leaf](https://github.com/metaopt/optree#none-is-non-leaf-node-vs-none-is-leaf) for more details. > > ### 4. Some other improvements and bug fixes > > 1. Adds in-place version of treemap (`tree_map_`), which reduces redundant unflatten operation for better performance. > 2. Adds support for tree flatten and tree map with paths. (useful for `functorch` module extraction). > 3. Improves the JAX's pytree sorting support for `dict`s. > 4. Better string representation `repr(PyTreeSpec)`. > 5. Fixes some bugs for JAX's pytree of hashing, pickle serialization, segmentation fault for infinite recursion, and tree-compose/tree-transpose. 
From https://github.com/pytorch/pytorch/pull/92679#issuecomment-1398778481: > ```python > # pytree_make_fx_bench.py > import torch > from torch.fx.experimental.proxy_tensor import make_fx > import time > > def f(x): > for _ in range(10000): > x = x+x > return x > > import time > begin = time.time() > out = make_fx(f, tracing_mode="real")(torch.randn(20)) > begin = time.time() > print(f'tracing_mode="real" {time.time() - begin:.2f}') > out = make_fx(f, tracing_mode="fake")(torch.randn(20)) > print(f'tracing_mode="fake" {time.time() - begin:.2f}') > > out = make_fx(f, tracing_mode="symbolic")(torch.randn(20)) > print(f'tracing_mode="symbolic" {time.time() - begin:.2f}') > ``` > > This seems to run around 10-20% faster with the optree implementation: > > ``` > # Optree > python pytree_make_fx_bench.py > tracing_mode="real" 0.00 > tracing_mode="fake" 6.32 > tracing_mode="symbolic" 27.13 > ``` > > ``` > # torch.utils._pytree > python pytree_make_fx_bench.py > tracing_mode="real" 0.00 > tracing_mode="fake" 7.66 > tracing_mode="symbolic" 31.07 > ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/93139 Approved by: https://github.com/malfet |
||
|
|
2f95a3d0fc |
[BE]: Apply ruff PERF fixes to torch (#104917)
Applies automated ruff fixes in the PERF modules and enables all automatic ones. I also updated ruff which applied some additional fixes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104917 Approved by: https://github.com/ezyang, https://github.com/albanD |
||
|
|
1ac663d9f1 |
collect_env: parse HIP version exception free (#101844)
Should prevent broken collect_env reporting as shown in https://github.com/pytorch/vision/issues/7561#issue-1698000841 <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at 5204e0f</samp> > _`get_version_or_na`_ > _Helper function refactors_ > _Code like autumn leaves_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/101844 Approved by: https://github.com/kit1980, https://github.com/ZainRizvi |
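The exception-free helper hinted at by the generated poem can be sketched as (signature and parsing are assumptions): take the last token of the first matching line, and fall back to "N/A" instead of raising when nothing matches.

```python
def get_version_or_na(lines, prefix):
    # Last whitespace-separated token of each matching line; returning
    # "N/A" on no match means malformed HIP configs cannot crash the report.
    matches = [line.rsplit(None, 1)[-1] for line in lines if prefix in line]
    return matches[0] if matches else "N/A"
```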
||
|
|
bd78532020 |
[BE] Fix collect_env for python-path-with-space (#98415)
By invoking [`Popen`](https://docs.python.org/2.7/library/subprocess.html#popen-constructor) with list of command line arguments, rather than strings that would be parsed by shell. Test plan: ```shell % conda create -n py311 python=3.11 % cd ~/miniconda3/envs % cp -a py311 py\ 311 % ./py\ 311/bin/python -mtorch.utils.collect_env ``` Fixes https://github.com/pytorch/pytorch/issues/98385 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98415 Approved by: https://github.com/huydhn |
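The principle behind the fix, in miniature: passing a list of arguments bypasses shell parsing entirely, so an interpreter path containing spaces (like the `py 311` environment in the test plan) is handled safely.

```python
import subprocess
import sys

# A list of args is passed to the OS verbatim; no shell ever splits the
# command string, so spaces in sys.executable's path are harmless.
proc = subprocess.run([sys.executable, "-c", "print('ok')"],
                      capture_output=True, text=True)
```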
||
|
|
5e6e984835 |
flake8 version reporting in collect_env (#94573)
Fixes #94571 # Testing `[pip3] flake8==3.9.2` now appears under `Versions of relevant libraries:` when running: `python torch/utils/collect_env.py` ### Output with this change ``` Collecting environment information... PyTorch version: N/A Is debug build: N/A CUDA used to build PyTorch: N/A ROCM used to build PyTorch: N/A OS: macOS 13.1 (x86_64) GCC version: Could not collect Clang version: 14.0.0 (clang-1400.0.29.202) CMake version: Could not collect Libc version: N/A Python version: 3.9.12 (main, Apr 5 2022, 01:53:17) [Clang 12.0.0 ] (64-bit runtime) Python platform: macOS-10.16-x86_64-i386-64bit Is CUDA available: N/A CUDA runtime version: Could not collect CUDA_MODULE_LOADING set to: N/A GPU models and configuration: Could not collect Nvidia driver version: Could not collect cuDNN version: Could not collect HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: N/A CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz Versions of relevant libraries: [pip3] flake8==3.9.2 [pip3] mypy==0.971 [pip3] mypy-extensions==0.4.3 [pip3] numpy==1.21.5 [pip3] numpydoc==1.2 [conda] blas 1.0 mkl [conda] mkl 2021.4.0 hecd8cb5_637 [conda] mkl-service 2.4.0 py39h9ed2024_0 [conda] mkl_fft 1.3.1 py39h4ab4a9b_0 [conda] mkl_random 1.2.2 py39hb2f4e1b_0 [conda] numpy 1.21.5 py39h2e5f0a9_1 [conda] numpy-base 1.21.5 py39h3b1a694_1 [conda] numpydoc 1.2 pyhd3eb1b0_0 ``` ### Output before ``` Collecting environment information... 
PyTorch version: N/A Is debug build: N/A CUDA used to build PyTorch: N/A ROCM used to build PyTorch: N/A OS: macOS 13.1 (x86_64) GCC version: Could not collect Clang version: 14.0.0 (clang-1400.0.29.202) CMake version: Could not collect Libc version: N/A Python version: 3.9.12 (main, Apr 5 2022, 01:53:17) [Clang 12.0.0 ] (64-bit runtime) Python platform: macOS-10.16-x86_64-i386-64bit Is CUDA available: N/A CUDA runtime version: Could not collect CUDA_MODULE_LOADING set to: N/A GPU models and configuration: Could not collect Nvidia driver version: Could not collect cuDNN version: Could not collect HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: N/A CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz Versions of relevant libraries: [pip3] mypy==0.971 [pip3] mypy-extensions==0.4.3 [pip3] numpy==1.21.5 [pip3] numpydoc==1.2 [conda] blas 1.0 mkl [conda] mkl 2021.4.0 hecd8cb5_637 [conda] mkl-service 2.4.0 py39h9ed2024_0 [conda] mkl_fft 1.3.1 py39h4ab4a9b_0 [conda] mkl_random 1.2.2 py39hb2f4e1b_0 [conda] numpy 1.21.5 py39h2e5f0a9_1 [conda] numpy-base 1.21.5 py39h3b1a694_1 [conda] numpydoc 1.2 pyhd3eb1b0_0 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94573 Approved by: https://github.com/malfet, https://github.com/kit1980 |
||
|
|
4454655a4c |
Add triton to relevant packages (#96663)
Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96663 Approved by: https://github.com/janeyx99, https://github.com/malfet, https://github.com/atalman |
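The "relevant packages" report is essentially a name filter over `pip list`-style output; this PR adds `triton` to the set of names that pass the filter. A minimal sketch of the idea (the function name and prefix tuple are illustrative, not the actual collect_env.py code):

```python
# Illustrative: collect_env reports only pip entries whose package name
# matches a set of patterns of interest; "triton" is now one of them.
RELEVANT_PREFIXES = ("torch", "numpy", "mypy", "triton")

def is_relevant_package(pip_line: str) -> bool:
    """Return True if a `pip freeze`-style line names a package of interest."""
    name = pip_line.split("==")[0].strip().lower()
    return name.startswith(RELEVANT_PREFIXES)
```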
||
|
|
3ce1ebb6fb |
Apply some safe comprehension optimizations (#94323)
Optimize unnecessary collection cast calls, unnecessary calls to list, tuple, and dict, and simplify calls to the sorted builtin. This should strictly improve speed and improve readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94323 Approved by: https://github.com/albanD |
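Two of the patterns such a cleanup typically rewrites, shown as hypothetical before/after pairs (not taken from the diff itself):

```python
data = {"b": 2, "a": 1}

# Unnecessary collection cast: sorted() accepts any iterable, so the
# intermediate list() is pure overhead.
keys_before = sorted(list(data.keys()))
keys_after = sorted(data.keys())

# Unnecessary list comprehension: sum() can consume a generator directly,
# avoiding a throwaway list.
total_before = sum([v for v in data.values()])
total_after = sum(v for v in data.values())
```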
||
|
|
8fce9a09cd |
[BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes). This PR only does two things: removes the need to inherit from object and removes unused future imports. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308 Approved by: https://github.com/ezyang, https://github.com/albanD |
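The two rewrites, as a hypothetical before/after: on Python 3 every class inherits from `object` implicitly, and `__future__` imports like `print_function` are no-ops, so both can be removed without changing behavior.

```python
# Before (Python 2 compatible style):
#     from __future__ import print_function  # removed: no-op on Python 3
#     class Module(object):                  # removed: explicit object base
#         pass

# After:
class Module:
    pass
```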
||
|
|
79243516f6 |
collect CPU info with collect_env.py for new issues reporting (#93899)
Add CPU information collection feature to collect_env.py for new issues reporting. This helps us to triage issues on CPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93899 Approved by: https://github.com/malfet |
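A best-effort, cross-platform sketch of the kind of probe involved (the helper name is hypothetical and the real collect_env.py implementation differs in detail):

```python
import platform
import subprocess

def get_cpu_description() -> str:
    """Best-effort CPU model string for issue reports."""
    system = platform.system()
    if system == "Linux":
        try:
            with open("/proc/cpuinfo") as f:
                for line in f:
                    if line.startswith("model name"):
                        return line.split(":", 1)[1].strip()
        except OSError:
            pass
    elif system == "Darwin":
        try:
            return subprocess.check_output(
                ["sysctl", "-n", "machdep.cpu.brand_string"], text=True
            ).strip()
        except (OSError, subprocess.CalledProcessError):
            pass
    # Fallback that works everywhere, mirroring the script's habit of
    # printing "Could not collect" instead of failing.
    return platform.processor() or "Could not collect"
```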
||
|
|
77d94ac5ab |
Sets CUDA_MODULE_LOADING to LAZY when not set by the user (#85692)
This PR sets CUDA_MODULE_LOADING if it's not set by the user. By default, it sets it to "LAZY". It was tested using the following commands:
```
python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```
which shows a memory usage of 287,047,680 bytes, vs
```
CUDA_MODULE_LOADING="DEFAULT" python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```
which shows 666,632,192 bytes. A C++ implementation is needed for libtorch users (otherwise it could have been pure Python functionality). cc: @ptrblck @ngimel @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/85692 Approved by: https://github.com/malfet |
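The core behavior, modeled in Python for illustration (the actual change lives in C++ so that libtorch users get it too; the helper name here is hypothetical):

```python
import os

def apply_default_cuda_module_loading(env=None):
    """Set CUDA_MODULE_LOADING to "LAZY" only when the user hasn't set it."""
    if env is None:
        env = os.environ
    # setdefault leaves any user-provided value (e.g. "DEFAULT") untouched.
    env.setdefault("CUDA_MODULE_LOADING", "LAZY")
    return env["CUDA_MODULE_LOADING"]
```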
||
|
|
be25566d13 |
tools: Ensure compat for collect_env with python 3.5
Users were reporting errors when trying to use collect_env with older versions of Python. This adds a test to ensure that we maintain compat for this script with older versions of Python. Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/78946 Approved by: https://github.com/janeyx99 |
||
|
|
635aaa3d9d |
replace "grep" with Python processing in collect_env.py (#77148)
Fixes #77063. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77148 Approved by: https://github.com/ezyang |
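The general shape of such a fix: filter command output in pure Python instead of shelling out to `grep`, which also works on platforms where `grep` is unavailable (illustrative sketch, not the exact collect_env.py code):

```python
def filter_lines(text: str, pattern: str) -> str:
    """Pure-Python stand-in for piping output through `grep -i pattern`:
    keep only the lines containing the pattern, case-insensitively."""
    needle = pattern.lower()
    return "\n".join(
        line for line in text.splitlines() if needle in line.lower()
    )
```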
||
|
|
c170d395de |
utils: Only check for xnnpack if torch installed (#74342)
Summary: Fixes a bug where collect_env.py was not able to be run without having torch installed Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/74342 Reviewed By: malfet, janeyx99 Differential Revision: D34943464 Pulled By: seemethere fbshipit-source-id: dbaa0004b88cb643a9c6426c9ea7c5be3d3c9ef5 (cherry picked from commit 4f39ebb823f88df0c3902db15deaffc6ba481cb3) |
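The bug class and its fix, sketched: guard the `torch` import so the script still produces a report on machines where torch isn't installed (the helper name and guard structure are illustrative):

```python
def get_xnnpack_availability() -> str:
    """Report XNNPACK availability without assuming torch is installed."""
    try:
        import torch  # guarded: collect_env must run without torch, too
    except ImportError:
        return "N/A"
    try:
        return str(torch.backends.xnnpack.enabled)
    except AttributeError:
        # Older builds without the xnnpack backend attribute.
        return "N/A"
```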
||
|
|
b2054d3025 |
Prepare for an update to the XNNPACK submodule (#72642)
Summary:
- Target Sha1: ae108ef49aa5623b896fc93d4298c49d1750d9ba
- Make USE_XNNPACK a dependent option on cmake minimum version 3.12
- Print USE_XNNPACK under cmake options summary, and print the
availability from collet_env.py
- Skip XNNPACK based tests when XNNPACK is not available
- Add SkipIfNoXNNPACK wrapper to skip tests
- Update cmake version for xenial-py3.7-gcc5.4 image to 3.12.4
- This is required for the backwards compatibility test.
The PyTorch op schema is XNNPACK dependent. See,
aten/src/ATen/native/xnnpack/RegisterOpContextClass.cpp for
example. The nightly version is assumed to have USE_XNNPACK=ON,
so with this change we ensure that the test build can also
have XNNPACK.
- HACK: skipping test_xnnpack_integration tests on ROCM
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72642
Reviewed By: kimishpatel
Differential Revision: D34456794
Pulled By: digantdesai
fbshipit-source-id: 85dbfe0211de7846d8a84321b14fdb061cd6c037
(cherry picked from commit 6cf48e7b64d6979962d701b5d493998262cc8bfa)
|
||
|
|
3f06f29577 |
Improve pip package determination (#63321)
Summary: Invoking `pip` or `pip3` yields the list of packages for whichever `pip` alias is first on the PATH, rather than for the interpreter currently being executed. Changed `get_pip_packages` to use `sys.executable + '-mpip'`. Also, add mypy to the list of packages of interest. Discovered while looking at https://github.com/pytorch/pytorch/issues/63279 Pull Request resolved: https://github.com/pytorch/pytorch/pull/63321 Reviewed By: walterddr Differential Revision: D30342099 Pulled By: malfet fbshipit-source-id: fc8d17cf2ddcf18236cfde5c1b9edb4e72804ee0 |
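The fix in miniature: address pip through the running interpreter so the report matches the environment actually executing the script (illustrative sketch; the function name is hypothetical):

```python
import subprocess
import sys

def run_current_pip(args):
    """Run pip for *this* interpreter, not whatever `pip3` is on PATH."""
    return subprocess.run(
        [sys.executable, "-mpip"] + list(args),
        capture_output=True,
        text=True,
    )
```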
||
|
|
150c828803 |
Add lint rule to keep collect_env.py python2 compliant (#60946)
Summary: Fixes T94400857
- [x] Add lint rule
- [x] Verify lint rule works
- [x] Fix torch/utils/collect_env.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60946 Reviewed By: malfet, mruberry Differential Revision: D29457294 Pulled By: rsemenov fbshipit-source-id: 3c0670408d7aee1479e1de335291deb13a04ace9 |
||
|
|
4c00df12ec |
Include full Python version in collect_env.py output (#59632)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59632
Before:
```
Python version: 3.7 (64-bit runtime)
```
After:
```
Python version: 3.7.7 (default, Mar 23 2020, 17:31:31) [Clang 4.0.1 (tags/RELEASE_401/final)] (64-bit runtime)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: bdhirsh Differential Revision: D28961500 Pulled By: ezyang fbshipit-source-id: 0f95a49abf6977941f09a64243916576a820679f |
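The change boils down to reporting `sys.version` (which includes the patch level and build metadata) instead of just major.minor; a minimal sketch, with a hypothetical helper name:

```python
import sys

def get_python_version_string() -> str:
    # Before: "%d.%d" % sys.version_info[:2]; after: the full sys.version
    # string, with the pointer width appended as before.
    bits = 64 if sys.maxsize > 2 ** 32 else 32
    return "{} ({}-bit runtime)".format(sys.version.replace("\n", " "), bits)
```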
||
|
|
2f3be2735f |
Don't split oversize cached blocks (#44742)
Summary: Fixes https://github.com/pytorch/pytorch/issues/35901 This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator is holding 13 GB of 'split free blocks'. Approach:
- Large blocks above a certain size are designated "oversize". This limit is currently set 1 decade above large, at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriate size block to be allocated. This will be activated under memory pressure and will prevent _release_cached_blocks()_ from triggering
Initial performance tests show this is similar or quicker than the original strategy. Additional tests are ongoing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742 Reviewed By: zou3519 Differential Revision: D29186394 Pulled By: ezyang fbshipit-source-id: c88918836db3f51df59de6d1b3e03602ebe306a9 |
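A toy model of the matching rule described in that commit; the 200 MB threshold comes from the description, but the "close match" slack value and the function name are illustrative stand-ins, not the allocator's actual constants:

```python
MB = 1024 * 1024
OVERSIZE_THRESHOLD = 200 * MB   # "1 decade above large" per the description
MATCH_SLACK = 20 * MB           # hypothetical tolerance for a "close" match

def can_reuse_block(request_size: int, block_size: int) -> bool:
    """Decide whether a cached block may satisfy an allocation request."""
    if block_size < request_size:
        return False
    if request_size < OVERSIZE_THRESHOLD:
        return True  # ordinary blocks may be split down to size
    # Oversize requests must closely match: huge blocks are never split.
    return block_size - request_size <= MATCH_SLACK
```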
||
|
|
f7c15610aa |
Collect kernel version (#58485)
Summary: Collect env should collect kernel and glibc version Fixes https://github.com/pytorch/pytorch/issues/58387 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58485 Reviewed By: walterddr Differential Revision: D28510564 Pulled By: malfet fbshipit-source-id: ad3d4b93f51db052720bfaa4322138c55816921b |
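The stdlib already exposes both pieces of information; a minimal sketch of collecting them (the helper name is hypothetical):

```python
import platform

def get_os_runtime_info() -> dict:
    """Kernel release plus libc version, for richer environment reports."""
    libc, libc_version = platform.libc_ver()  # ("", "") on non-glibc systems
    return {
        "kernel": platform.release(),
        "libc": "{} {}".format(libc, libc_version) if libc else "N/A",
    }
```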
||
|
|
1ec12fd491 |
Add minidump collection via breakpad (#55647)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55647 This adds [breakpad](https://github.com/google/breakpad), which comes with out-of-the-box utilities to register a signal handler that writes out a minidump on an unhandled exception. Right now this is gated behind a flag in `torch.utils`, but in the future it could be on by default. Size-wise this adds about 500k to `libtorch_cpu.so` (187275968 B to 187810016 B).
```bash
$ cat <<EOF > test.py
import torch
torch.utils.enable_minidump_collection()
# temporary util that just segfaults
torch._C._crash()
EOF
$ python test.py
Wrote minidump to /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp
fish: “python test.py” terminated by signal SIGSEGV (Address boundary error)
$ minidump-2-core /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp -o core.dmp
$ gdb python core.dmp
... commence debugging ...
```
Right now exceptions that get passed up to Python don't trigger the signal handler (which by default only handles [these](https://github.com/google/breakpad/blob/main/src/client/linux/handler/exception_handler.cc#L115)). It would be possible for PyTorch exceptions to explicitly write a minidump when passed up to Python (maybe only when the exception is unhandled). Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D27679767 Pulled By: driazati fbshipit-source-id: 1ab3b5160b6dc405f5097eb25acc644d533358d7 |
||
|
|
f94c95a2dd |
Revert D23752058: [pytorch][PR] Don't split oversize cached blocks
Test Plan: revert-hammer
Differential Revision: D23752058 |
||
|
|
67dcd62310 |
Don't split oversize cached blocks (#44742)
Summary: Fixes https://github.com/pytorch/pytorch/issues/35901 This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator is holding 13 GB of 'split free blocks'. Approach:
- Large blocks above a certain size are designated "oversize". This limit is currently set 1 decade above large, at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriate size block to be allocated. This will be activated under memory pressure and will prevent _release_cached_blocks()_ from triggering
Initial performance tests show this is similar or quicker than the original strategy. Additional tests are ongoing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742 Reviewed By: ngimel Differential Revision: D23752058 Pulled By: ezyang fbshipit-source-id: ccb7c13e3cf8ef2707706726ac9aaac3a5e3d5c8 |