Mirror of https://github.com/zebrajr/pytorch.git
Synced 2025-12-07 12:21:27 +01:00 at commit 0d3d84d866 (78 commits)
45c5a23237: Revert "Add Intel GPU info collection to the collect env script (#137846)"
This reverts commit

5264f8cd8d: Add Intel GPU info collection to the collect env script (#137846)
As the title says, this adds Intel GPU info collection to the collect env script.
Output examples:
1. CPU on Windows
```
C:\Users\user\miniforge3\envs\py310\lib\site-packages\torch\_subclasses\functional_tensor.py:279: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\pytorch\torch\csrc\utils\tensor_numpy.cpp:81.)
cpu = _conversion_method_template(device=torch.device("cpu"))
Collecting environment information...
PyTorch version: 2.8.0.dev20250528+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Enterprise (10.0.22631 64-bit)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.10.17 | packaged by conda-forge | (main, Apr 10 2025, 22:06:35) [MSC v.1943 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Name: 12th Gen Intel(R) Core(TM) i7-1270P
Manufacturer: GenuineIntel
Family: 198
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 1711
MaxClockSpeed: 2200
L2CacheSize: 9216
L2CacheSpeed: None
Revision: None
Versions of relevant libraries:
[pip3] torch==2.8.0.dev20250528+cpu
[conda] torch 2.8.0.dev20250528+cpu pypi_0 pypi
```
2. XPU on Windows
```
Collecting environment information...
PyTorch version: 2.8.0a0+gitef6306e
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Pro (10.0.19045 64-bit)
GCC version: (GCC) 13.1.0
Clang version: Could not collect
CMake version: version 3.29.3
Libc version: N/A
Python version: 3.10.17 | packaged by conda-forge | (main, Apr 10 2025, 22:06:35) [MSC v.1943 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: True
XPU used to build PyTorch: 20250101
Intel GPU driver version:
* 32.0.101.6795 (20250520000000.******+***)
Intel GPU models onboard:
* Intel(R) Arc(TM) A770 Graphics
Intel GPU models detected:
* [0] _XpuDeviceProperties(name='Intel(R) Arc(TM) A770 Graphics', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.33184', total_memory=15915MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=128, sub_group_sizes=[8 16 32], has_fp16=1, has_fp64=0, has_atomic64=1)
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
----------------------
Name: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
Manufacturer: GenuineIntel
Family: 179
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 2401
MaxClockSpeed: 2401
L2CacheSize: 24576
L2CacheSpeed: None
Revision: 21767
----------------------
Name: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
Manufacturer: GenuineIntel
Family: 179
Architecture: 9
ProcessorType: 3
DeviceID: CPU1
CurrentClockSpeed: 2200
MaxClockSpeed: 2401
L2CacheSize: 24576
L2CacheSpeed: None
Revision: 21767
Versions of relevant libraries:
[pip3] intel_extension_for_pytorch==2.8.10+gitb3ea3a1
[pip3] numpy==2.1.2
[pip3] optree==0.13.1
[pip3] pytorch-triton-xpu==3.3.1+gitb0e26b73
[pip3] torch==2.8.0a0+gitef6306e
[conda] intel-extension-for-pytorch 2.8.10+gitb3ea3a1 pypi_0 pypi
[conda] mkl 2025.1.0 pypi_0 pypi
[conda] mkl-dpcpp 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-blas 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-datafitting 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-dft 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-lapack 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-rng 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-sparse 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-stats 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-vm 2025.1.0 pypi_0 pypi
[conda] pytorch-triton-xpu 3.3.1+gitb0e26b73 pypi_0 pypi
[conda] torch 2.8.0a0+gitef6306e pypi_0 pypi
```
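The `_XpuDeviceProperties` entries above come from PyTorch's XPU runtime. As a guarded sketch (the `detected_xpu_lines` helper name is mine, not the script's), the same properties can be queried through the public `torch.xpu` API:

```python
# Sketch: query the per-device properties shown above via torch.xpu.
# Guarded so it degrades to an empty list when torch or XPU support is absent.
def detected_xpu_lines():
    try:
        import torch
    except ImportError:
        return []
    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return []
    # str() of the properties object yields the "_XpuDeviceProperties(...)" form.
    return [str(xpu.get_device_properties(i)) for i in range(xpu.device_count())]

for line in detected_xpu_lines():
    print(line)
```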
3. CPU on Linux
```
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/_subclasses/functional_tensor.py:279: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
cpu = _conversion_method_template(device=torch.device("cpu"))
Collecting environment information...
PyTorch version: 2.8.0.dev20250528+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: AlmaLinux 8.10 (Cerulean Leopard) (x86_64)
GCC version: (GCC) 14.2.1 20250110 (Red Hat 14.2.1-7)
Clang version: Could not collect
CMake version: version 4.0.0
Libc version: glibc-2.28
Python version: 3.12.10 (main, Apr 19 2025, 05:03:56) [GCC 14.2.1 20250110 (Red Hat 14.2.1-7)] (64-bit runtime)
Python platform: Linux-6.8.0-40-generic-x86_64-with-glibc2.28
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 88
On-line CPU(s) list: 0-87
Thread(s) per core: 2
Core(s) per socket: 22
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz
Stepping: 7
CPU MHz: 1000.000
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 30976K
NUMA node0 CPU(s): 0-21,44-65
NUMA node1 CPU(s): 22-43,66-87
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Versions of relevant libraries:
[pip3] torch==2.8.0.dev20250528+cpu
[conda] Could not collect
```
4. XPU on Linux
```
Collecting environment information...
PyTorch version: 2.8.0.dev20250516+xpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.31.6
Libc version: glibc-2.35
Python version: 3.10.17 | packaged by conda-forge | (main, Apr 10 2025, 22:19:12) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-5.15.50-051550-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: True
XPU used to build PyTorch: 20250101
Intel GPU driver version:
* intel_opencl: 24.39.31294.21-1032~22.04
* level_zero: 1.17.44.0-1022~22.04
Intel GPU models onboard:
* Intel(R) Data Center GPU Max 1550
* Intel(R) Data Center GPU Max 1550
* Intel(R) Data Center GPU Max 1550
* Intel(R) Data Center GPU Max 1550
Intel GPU models detected:
* [0] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [1] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [2] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [3] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [4] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [5] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [6] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [7] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 224
On-line CPU(s) list: 0-223
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8480+
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 56
Socket(s): 2
Stepping: 6
CPU max MHz: 3800.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr avx512_fp16 flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 5.3 MiB (112 instances)
L1i cache: 3.5 MiB (112 instances)
L2 cache: 224 MiB (112 instances)
L3 cache: 210 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-55,112-167
NUMA node1 CPU(s): 56-111,168-223
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==2.2.5
[pip3] pytorch-triton-xpu==3.3.0+git0bcc8265
[pip3] torch==2.8.0.dev20250516+xpu
[conda] mkl 2025.1.0 pypi_0 pypi
[conda] numpy 2.2.5 pypi_0 pypi
[conda] onemkl-sycl-blas 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-dft 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-lapack 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-rng 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-sparse 2025.1.0 pypi_0 pypi
[conda] pytorch-triton-xpu 3.3.0+git0bcc8265 pypi_0 pypi
[conda] torch 2.8.0.dev20250516+xpu pypi_0 pypi
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137846
Approved by: https://github.com/guangyey, https://github.com/malfet
Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
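The reports in the examples above are produced by running `python -m torch.utils.collect_env`. For illustration only, here is a minimal stdlib sketch of the indexed-bullet layout used by the "Intel GPU models detected" section; the `format_xpu_devices` helper and the property keys passed to it are hypothetical, not the script's actual code:

```python
# Hypothetical formatter mirroring the "Intel GPU models detected" layout.
# The helper name and property keys are illustrative assumptions, not the
# implementation in torch/utils/collect_env.py.
def format_xpu_devices(devices):
    if not devices:
        return "Intel GPU models detected: N/A"
    lines = ["Intel GPU models detected:"]
    for i, props in enumerate(devices):
        fields = ", ".join(f"{k}={v}" for k, v in props.items())
        lines.append(f" * [{i}] _XpuDeviceProperties({fields})")
    return "\n".join(lines)

print(format_xpu_devices([{"name": "'Intel(R) Arc(TM) A770 Graphics'",
                           "total_memory": "15915MB"}]))
```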

6fb6293159: Revert "Add Intel GPU info collection to the collect env script (#137846)"
This reverts commit

c6b4f98625: Add Intel GPU info collection to the collect env script (#137846)
(commit message and output examples identical to 5264f8cd8d)

0db3e0cf29: Revert "Add Intel GPU info collection to the collect env script (#137846)"
This reverts commit

e1180c7228: Add Intel GPU info collection to the collect env script (#137846)
(commit message and output examples identical to 5264f8cd8d)
Clang version: Could not collect
CMake version: version 4.0.0
Libc version: glibc-2.28
Python version: 3.12.10 (main, Apr 19 2025, 05:03:56) [GCC 14.2.1 20250110 (Red Hat 14.2.1-7)] (64-bit runtime)
Python platform: Linux-6.8.0-40-generic-x86_64-with-glibc2.28
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 88
On-line CPU(s) list: 0-87
Thread(s) per core: 2
Core(s) per socket: 22
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz
Stepping: 7
CPU MHz: 1000.000
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 30976K
NUMA node0 CPU(s): 0-21,44-65
NUMA node1 CPU(s): 22-43,66-87
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Versions of relevant libraries:
[pip3] torch==2.8.0.dev20250528+cpu
[conda] Could not collect
```
4. XPU on Linux
```
Collecting environment information...
PyTorch version: 2.8.0.dev20250516+xpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.31.6
Libc version: glibc-2.35
Python version: 3.10.17 | packaged by conda-forge | (main, Apr 10 2025, 22:19:12) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-5.15.50-051550-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: True
XPU used to build PyTorch: 20250101
Intel GPU driver version:
* intel_opencl: 24.39.31294.21-1032~22.04
* level_zero: 1.17.44.0-1022~22.04
Intel GPU models onboard:
* Intel(R) Data Center GPU Max 1550
* Intel(R) Data Center GPU Max 1550
* Intel(R) Data Center GPU Max 1550
* Intel(R) Data Center GPU Max 1550
Intel GPU models detected:
* [0] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [1] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [2] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [3] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [4] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [5] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [6] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
* [7] _XpuDeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31294+21', total_memory=65536MB, max_compute_units=512, gpu_eu_count=512, gpu_subslice_count=64, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 224
On-line CPU(s) list: 0-223
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8480+
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 56
Socket(s): 2
Stepping: 6
CPU max MHz: 3800.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr avx512_fp16 flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 5.3 MiB (112 instances)
L1i cache: 3.5 MiB (112 instances)
L2 cache: 224 MiB (112 instances)
L3 cache: 210 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-55,112-167
NUMA node1 CPU(s): 56-111,168-223
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==2.2.5
[pip3] pytorch-triton-xpu==3.3.0+git0bcc8265
[pip3] torch==2.8.0.dev20250516+xpu
[conda] mkl 2025.1.0 pypi_0 pypi
[conda] numpy 2.2.5 pypi_0 pypi
[conda] onemkl-sycl-blas 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-dft 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-lapack 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-rng 2025.1.0 pypi_0 pypi
[conda] onemkl-sycl-sparse 2025.1.0 pypi_0 pypi
[conda] pytorch-triton-xpu 3.3.0+git0bcc8265 pypi_0 pypi
[conda] torch 2.8.0.dev20250516+xpu pypi_0 pypi
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137846
Approved by: https://github.com/guangyey, https://github.com/malfet
Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
|
||
|
|
1bd6bc7190 |
[BE]: Enable ruff YTT linter for Python version checks (#153547)
Adds ruff YTT checks to help future proof version checks and follow best practices here. Also makes it easier for static linters like mypy to detect python version branching. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153547 Approved by: https://github.com/albanD |
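The kind of check the YTT rules guard against can be sketched as follows (illustrative only, not code from the PR): slicing `sys.version` as a string silently misorders versions once minor releases reach two digits, while `sys.version_info` tuples compare correctly.

```python
import sys

# Fragile (flagged by ruff YTT): sys.version[:3] yields "3.1" on Python 3.12,
# so the string comparison against "3.8" gives the wrong answer.
fragile_ok = sys.version[:3] >= "3.8"

# Robust: compare sys.version_info as a tuple of integers.
robust_ok = sys.version_info >= (3, 8)
```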
||
|
|
e2f9759bd0 |
Fix broken URLs (#152237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237 Approved by: https://github.com/huydhn, https://github.com/malfet |
||
|
|
675f69f40f |
collect_env: gracefully handle no pip (#151607)
If pip is not installed:
### Before
```console
> python3 torch/utils/collect_env.py
Collecting environment information...
Traceback (most recent call last):
File "/Users/Adam/pytorch/torch/utils/collect_env.py", line 694, in <module>
main()
~~~~^^
File "/Users/Adam/pytorch/torch/utils/collect_env.py", line 677, in main
output = get_pretty_env_info()
File "/Users/Adam/pytorch/torch/utils/collect_env.py", line 672, in get_pretty_env_info
return pretty_str(get_env_info())
~~~~~~~~~~~~^^
File "/Users/Adam/pytorch/torch/utils/collect_env.py", line 497, in get_env_info
pip_version, pip_list_output = get_pip_packages(run_lambda)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^
File "/Users/Adam/pytorch/torch/utils/collect_env.py", line 450, in get_pip_packages
for line in out.splitlines()
^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'splitlines'
```
### After
```console
> python3 torch/utils/collect_env.py
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: macOS 15.4 (arm64)
GCC version: Could not collect
Clang version: 20.1.0
CMake version: version 3.31.6
Libc version: N/A
Python version: 3.13.2 (main, Apr 8 2025, 15:27:33) [Clang 17.0.0 (clang-1700.0.13.3)] (64-bit runtime)
Python platform: macOS-15.4-arm64-arm-64bit-Mach-O
Is CUDA available: N/A
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A
CPU:
Apple M2 Pro
Versions of relevant libraries:
[pip3] Could not collect
[conda] Could not collect
```
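The shape of the fix can be sketched like this (a minimal illustration, not the exact patch): when the `pip list` subprocess cannot run at all, the captured output is `None`, so guard before calling `.splitlines()` and report "Could not collect" instead of crashing.

```python
def format_pip_output(out):
    # `out` is None when pip could not be invoked (e.g. pip is not
    # installed); guard before calling .splitlines() on it.
    if out is None:
        return "Could not collect"
    return "\n".join(line for line in out.splitlines() if "==" in line)
```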
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151607
Approved by: https://github.com/malfet
|
||
|
|
6cbf97ede8 |
[ROCm] enable HIPMallocAsyncAllocator (#149145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149145 Approved by: https://github.com/izaitsevfb Co-authored-by: Jeff Daily <jeff.daily@amd.com> |
||
|
|
e1d143cb7b |
Revert "[ROCm] enable HIPMallocAsyncAllocator (#149145)"
This reverts commit
|
||
|
|
ee1a2b7810 |
[ROCm] enable HIPMallocAsyncAllocator (#149145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149145 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com> |
||
|
|
9d37b501db |
Revert "[ROCm] enable HIPMallocAsyncAllocator (#149145)"
This reverts commit
|
||
|
|
2e02c07a5d |
[ROCm] enable HIPMallocAsyncAllocator (#149145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149145 Approved by: https://github.com/jeffdaily |
||
|
|
0aa34e9591 |
Revert "Collect packages with importlib in collect_env (#144616)"
This reverts commit
|
||
|
|
3541d2a2aa |
Collect packages with importlib in collect_env (#144616)
If pytorch is installed systemwide (via os package manager) or by alternative package manager like `uv`, pip is not available, causing error in `collect_env`. However it is still possible to collect exactly the same list using `importlib` API, which is always available. Fixes #144615 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144616 Approved by: https://github.com/malfet |
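The importlib-based approach can be sketched as below (helper name and pattern list are illustrative, not the actual patch): `importlib.metadata` is in the standard library since Python 3.8, so the same package list is available even when pip itself is absent.

```python
from importlib import metadata

def list_relevant_packages(patterns=("torch", "numpy", "mypy")):
    # Walk every installed distribution via the stdlib; no pip required.
    found = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]  # None for broken/partial installs
        if name and any(p in name.lower() for p in patterns):
            found.append(f"{name}=={dist.version}")
    return sorted(found)
```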
||
|
|
67cf126cf8 |
Disable PIP version check in collect_env (#142308)
Disables version check which might require users to reach out to PyPI, reference: https://pip.pypa.io/en/latest/cli/pip/#cmdoption-disable-pip-version-check Switches pip to be used directly as a python module (`python3 -mpip`) instead of relying on `pip3` or `pip` Pull Request resolved: https://github.com/pytorch/pytorch/pull/142308 Approved by: https://github.com/seemethere |
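The resulting invocation can be sketched like this (function name is illustrative; the flag is pip's documented `--disable-pip-version-check` option): pip is run as a module of the current interpreter rather than via a `pip`/`pip3` executable.

```python
import subprocess
import sys

def run_pip_list():
    # `-mpip` uses the current interpreter's pip; the flag skips the
    # network round-trip that checks PyPI for a newer pip release.
    cmd = [sys.executable, "-mpip", "list", "--format=freeze",
           "--disable-pip-version-check"]
    try:
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=60, check=False).stdout
    except OSError:
        return None
```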
||
|
|
14e6624473 |
Update wmic command used in collect_env.py to its counterpart in powershell due to its deprecation (#138297)
As title. `wmic` is deprecated in Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138297 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> |
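A sketch of the replacement pattern (function name and selected properties are assumptions, not the exact patch): `Get-CimInstance Win32_Processor` is the PowerShell counterpart of the deprecated `wmic cpu get ...`, and its property names match the old WMI fields shown in the CPU sections above.

```python
import subprocess

def get_cpu_info_windows():
    # Get-CimInstance replaces the deprecated wmic; Format-List yields
    # "Name: ..." style lines like those in the reports above.
    cmd = ["powershell.exe", "-NoProfile", "-Command",
           "Get-CimInstance Win32_Processor | "
           "Select-Object Name,Manufacturer,Family,Architecture | Format-List"]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
        return proc.stdout.strip() or "Could not collect"
    except (OSError, subprocess.TimeoutExpired):
        return "Could not collect"
```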
||
|
|
0b168ceb6d |
Collect Nvidia libraries with collect_env.py (#138076)
Collect Nvidia libraries to diagnose issues like #133548. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138076 Approved by: https://github.com/malfet |
||
|
|
8db9dfa2d7 |
Flip default value for mypy disallow_untyped_defs [9/11] (#127846)
See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127846 Approved by: https://github.com/ezyang ghstack dependencies: #127842, #127843, #127844, #127845 |
||
|
|
117ab34891 |
Documenting the torch.utils.collect_env.get_pretty_env_info function (#128123)
Fixes #127888 This PR adds docstring to the `torch.utils.collect_env.get_pretty_env_info` function Pull Request resolved: https://github.com/pytorch/pytorch/pull/128123 Approved by: https://github.com/ezyang, https://github.com/malfet |
||
|
|
3acbfd602e |
Document torch.utils.collect_env.get_env_info function (#128021)
Fixes #127911 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128021 Approved by: https://github.com/malfet |
||
|
|
3fe437b24b |
[BE]: Update flake8 to v6.1.0 and fix lints (#116591)
Updates flake8 to v6.1.0 and fixes a few lints using sed and some ruff tooling.
- Replace `assert(0)` with `raise AssertionError()`
- Remove extraneous parentheses, i.e.
- `assert(a == b)` -> `assert a == b`
- `if(x > y or y < z):`->`if x > y or y < z:`
- And `return('...')` -> `return '...'`
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116591
Approved by: https://github.com/albanD, https://github.com/malfet
|
||
|
|
5e58be678c |
Make collect env BC compatible (#116532)
To avoid errors like the one in https://github.com/pytorch/pytorch/issues/116531 when the user tries to run collect_env Pull Request resolved: https://github.com/pytorch/pytorch/pull/116532 Approved by: https://github.com/malfet |
||
|
|
765d4599ee |
Give users control over packages in torch.utils.collect_env (#112993)
I'm looking to repurpose some logic in `torch.utils.collect_env` for the `geowatch` package. I'm mostly able to just use this script as a library, which is great because it reduces code in my package. However, the issue is that the package patterns that are relevant to torch are hard-coded inside of `get_conda_packages` and `get_pip_packages`. The changes I made are simple. I defined the default package patterns as two global sets, and I added an argument to each function that lets the user customize exactly what package patterns are relevant. If they are not specified the defaults are used. I was considering extending the power of the patterns by utilizing `fnmatch`, `re` (or [xdev.pattern](https://github.com/Erotemic/xdev/blob/main/xdev/patterns.py) which abstracts them both), but instead I opted to just use the existing `__contains__` test to keep things simple. From torch's perspective this should make maintaining this file slightly easier because to update relevant packages, the developer now updates two neighboring top-level globals instead of two separated local variables. However, it does add an argument to two functions, and that argument isn't used in torch itself, so there is an argument for removing that, and then users *could* still have some control by modifying globals, but I think the way I did it balances the tradeoffs well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112993 Approved by: https://github.com/zou3519 |
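The interface described can be sketched as follows (global and function names are illustrative): the default patterns move to a module-level set, and callers may pass their own; matching stays a plain substring (`__contains__`) test.

```python
# Module-level defaults a downstream caller can override or extend.
DEFAULT_PIP_PATTERNS = {"torch", "numpy", "mypy", "triton", "onnx", "flake8"}

def filter_packages(freeze_lines, patterns=None):
    # Substring containment, matching the script's existing behavior;
    # no fnmatch/re power, by design, to keep things simple.
    patterns = DEFAULT_PIP_PATTERNS if patterns is None else patterns
    return [line for line in freeze_lines
            if any(p in line.lower() for p in patterns)]
```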
||
|
|
5e10dd2c78 |
fix docstring issues in torch.utils (#113335)
Fixes #112634 Fixes all the issues listed except in `torch/utils/_pytree.py` as the file no longer exists. ### Error counts |File | Count Before | Count now| |---- | ---- | ---- | |`torch/utils/collect_env.py` | 39 | 25| |`torch/utils/cpp_extension.py` | 51 | 13| |`torch/utils/flop_counter.py` | 25 | 8| |`torch/utils/_foreach_utils.py.py` | 2 | 0| |`torch/utils/_python_dispatch.py.py` | 26 | 25| |`torch/utils/backend_registration.py` | 15 | 4| |`torch/utils/checkpoint.py` | 29 | 21| Pull Request resolved: https://github.com/pytorch/pytorch/pull/113335 Approved by: https://github.com/ezyang |
||
|
|
333d5821ee |
[ROCm] Add gcnArchName to collect_env and torch.cuda.get_device_properties (#107477)
Printing just the device name is not helpful when investigating PyTorch issues filed for specific AMD GPUs, as the support/issue might depend on the gfx arch, which is part of the gcnArchName property. `torch.cuda.get_device_properties(0).gcnArchName` will print the value of the `gcnArchName` property: eg. ``` >>> torch.cuda.get_device_properties(0).gcnArchName 'gfx906:sramecc+:xnack-' ``` ``` root@6f064e3c19fb:/data/pytorch/test# python ../torch/utils/collect_env.py ... GPU models and configuration: AMD Radeon Graphics(gfx906:sramecc+:xnack-) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/107477 Approved by: https://github.com/albanD |
||
|
|
d24e7be243 |
Include onnx and onnxscript information in collect_env.py (#110560)
`onnx` and `onnxscript` are used in torch.onnx.dynamo_export since 2.0. It would be helpful to collect version information in user issue reports. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110560 Approved by: https://github.com/albanD |
||
|
|
0bf30c140a |
[pytree] Use OpTree for PyTree manipulation (#93139)
Split from #92679. Use C++-based PyTree implementation. ## Highlights 1. High performance (20x speedup than the pure-Python implementation, 10%-20% overall speedup for `torch.fx`) 2. Multi-input tree-map support 3. Custom tree node registry with namespace isolation Refs: - #65761 - #91323 - #92679 From https://github.com/pytorch/pytorch/issues/65761#issuecomment-1334746366: > ### 0. Out-of-box compatible with JAX's pytree, provides the same interfaces and functions (and more). > > ### 1. High-performance: `optree` has comparable fast tree operations (~0.9x for `dict`s and ~2.5x for `OrderedDict`s) than JAX's pytree and it is 20x faster than `torch.utils._pytree`. > > `optree` implements some common Python container types in C++ (e.g., `OrderedDict`) and achieves 2.5x performance than JAX's pytree. Check out section [Built-in PyTree Node Types](https://github.com/metaopt/optree#built-in-pytree-node-types) and [Benchmark](https://github.com/metaopt/optree#benchmark) for more details. > > | Module | Nodes | OpTree (μs) | JAX XLA (μs) | PyTorch (μs) | DM-Tree (μs) | Speedup (J / O) | Speedup (P / O) | Speedup (D / O) | > | :-------- | ----: | ----------: | -----------: | -----------: | -----------: | --------------: | --------------: | --------------: | > | TinyMLP | 53 | 26.40 | 68.19 | 586.87 | 34.14 | 2.58 | 22.23 | 1.29 | > | AlexNet | 188 | 84.28 | 259.51 | 2182.07 | 125.12 | 3.08 | 25.89 | 1.48 | > | ResNet18 | 698 | 288.57 | 807.27 | 7881.69 | 429.39 | 2.80 | 27.31 | 1.49 | > | ResNet34 | 1242 | 580.75 | 1564.97 | 15082.84 | 819.02 | 2.69 | 25.97 | 1.41 | > | ResNet50 | 1702 | 791.18 | 2081.17 | 20982.82 | 1104.62 | 2.63 | 26.52 | 1.40 | > | ResNet101 | 3317 | 1603.93 | 3939.37 | 40382.14 | 2208.63 | 2.46 | 25.18 | 1.38 | > | ResNet152 | 4932 | 2446.56 | 6267.98 | 56892.36 | 3139.17 | 2.56 | 23.25 | 1.28 | > | ViT-H/14 | 3420 | 1681.48 | 4488.33 | 41703.16 | 2504.86 | 2.67 | 24.80 | 1.49 | > | Swin-B | 2881 | 1565.41 | 4091.10 | 34241.99 | 1936.75 | 2.61 | 21.87 
| 1.24 | > | | | | | | **Average** | **2.68** | **24.78** | **1.38** | > > <div align="center"> > <img src="https://user-images.githubusercontent.com/16078332/200494435-fd5bb385-59f7-4811-b520-98bf5763ccf3.png" width="90%" /> > </div> > > ### 2. Namespace Isolation for the PyTree Type Registry > > In addition to the JAX's pytree registry for custom node type registration, `optree` adds `namespace` isolation to the registry. Users can register the same type multiple times for different flatten/unflatten behavior. It also provides module-level isolation for safety reasons. For example, you can add a unique prefix to your namespace to isolate your registry with other modules (e.g., `torch.xxx`, `torch.functorch.xxx`): > > ```python > # Register a Python type into a namespace > import torch > > optree.register_pytree_node( > torch.Tensor, > # (tensor) -> (children, metadata) > flatten_func=lambda tensor: ( > (tensor.cpu().numpy(),), > dict(dtype=tensor.dtype, device=tensor.device, requires_grad=tensor.requires_grad), > ), > # (metadata, children) -> tensor > unflatten_func=lambda metadata, children: torch.tensor(children[0], **metadata), > namespace='torch.torch2numpy', > ) > ``` > > ```python > >>> tree = {'weight': torch.ones(size=(1, 2)).cuda(), 'bias': torch.zeros(size=(2,))} > >>> tree > {'weight': tensor([[1., 1.]], device='cuda:0'), 'bias': tensor([0., 0.])} > > # Flatten without specifying the namespace > >>> tree_flatten(tree) # `torch.Tensor`s are leaf nodes > ([tensor([0., 0.]), tensor([[1., 1.]], device='cuda:0')], PyTreeSpec({'bias': *, 'weight': *})) > > # Flatten with the namespace > >>> leaves, treespec = optree.tree_flatten(tree, namespace='torch.torch2numpy') > >>> leaves, treespec > ( > [array([0., 0.], dtype=float32), array([[1., 1.]], dtype=float32)], > PyTreeSpec( > { > 'bias': CustomTreeNode(Tensor[{'dtype': torch.float32, 'device': device(type='cpu'), 'requires_grad': False}], [*]), > 'weight': CustomTreeNode(Tensor[{'dtype': torch.float32, 
'device': device(type='cuda', index=0), 'requires_grad': False}], [*]) > }, > namespace='torch.torch2numpy' > ) > ) > > # `entries` are not defined and use `range(len(children))` > >>> optree.tree_paths(tree, namespace='torch.torch2numpy') > [('bias', 0), ('weight', 0)] > > # Unflatten back to a copy of the original object > >>> optree.tree_unflatten(treespec, leaves) > {'bias': tensor([0., 0.]), 'weight': tensor([[1., 1.]], device='cuda:0')} > ``` > > Check out section [Registering a Container-like Custom Type as Non-leaf Nodes](https://github.com/metaopt/optree#notes-about-the-pytree-type-registry) for more details. > > ### 3. Support both `None` as Non-leaf Node and `None` as Leaf > > In JAX's implementation, `None` is always an internal non-leaf node with an arity 0, which is like an empty tuple. This limits the usage of the JAX's pytree utilities for PyTorch. For example, the `nn.Module` uses `_parameters` and `_buffers` (`OrderedDict[str, Optional[Tensor]]`) to hold the tensors, while the value can be a tensor or `None`. > > `optree` supports both `None` as Non-leaf Node (JAX's default) and `None` as Leaf (PyTorch's default). Check out section [None is Non-leaf Node vs. None is Leaf](https://github.com/metaopt/optree#none-is-non-leaf-node-vs-none-is-leaf) for more details. > > ### 4. Some other improvements and bug fixes > > 1. Adds in-place version of treemap (`tree_map_`), which reduces redundant unflatten operation for better performance. > 2. Adds support for tree flatten and tree map with paths. (useful for `functorch` module extraction). > 3. Improves the JAX's pytree sorting support for `dict`s. > 4. Better string representation `repr(PyTreeSpec)`. > 5. Fixes some bugs for JAX's pytree of hashing, pickle serialization, segmentation fault for infinite recursion, and tree-compose/tree-transpose. 
From https://github.com/pytorch/pytorch/pull/92679#issuecomment-1398778481: > ```python > # pytree_make_fx_bench.py > import torch > from torch.fx.experimental.proxy_tensor import make_fx > import time > > def f(x): > for _ in range(10000): > x = x+x > return x > > import time > begin = time.time() > out = make_fx(f, tracing_mode="real")(torch.randn(20)) > begin = time.time() > print(f'tracing_mode="real" {time.time() - begin:.2f}') > out = make_fx(f, tracing_mode="fake")(torch.randn(20)) > print(f'tracing_mode="fake" {time.time() - begin:.2f}') > > out = make_fx(f, tracing_mode="symbolic")(torch.randn(20)) > print(f'tracing_mode="symbolic" {time.time() - begin:.2f}') > ``` > > This seems to run around 10-20% faster with the optree implementation: > > ``` > # Optree > python pytree_make_fx_bench.py > tracing_mode="real" 0.00 > tracing_mode="fake" 6.32 > tracing_mode="symbolic" 27.13 > ``` > > ``` > # torch.utils._pytree > python pytree_make_fx_bench.py > tracing_mode="real" 0.00 > tracing_mode="fake" 7.66 > tracing_mode="symbolic" 31.07 > ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/93139 Approved by: https://github.com/malfet |
||
|
|
2f95a3d0fc |
[BE]: Apply ruff PERF fixes to torch (#104917)
Applies automated ruff fixes in the PERF modules and enables all automatic ones. I also updated ruff which applied some additional fixes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104917 Approved by: https://github.com/ezyang, https://github.com/albanD |
||
|
|
1ac663d9f1 |
collect_env: parse HIP version exception free (#101844)
Should prevent broken collect_env reporting as shown in https://github.com/pytorch/vision/issues/7561#issue-1698000841 <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at 5204e0f</samp> > _`get_version_or_na`_ > _Helper function refactors_ > _Code like autumn leaves_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/101844 Approved by: https://github.com/kit1980, https://github.com/ZainRizvi |
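The exception-free helper hinted at by the generated poem can be sketched as (signature and parsing are assumptions): take the last token of the first matching line, and fall back to "N/A" instead of raising when nothing matches.

```python
def get_version_or_na(lines, prefix):
    # Last whitespace-separated token of each matching line; returning
    # "N/A" on no match means malformed HIP configs cannot crash the report.
    matches = [line.rsplit(None, 1)[-1] for line in lines if prefix in line]
    return matches[0] if matches else "N/A"
```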
||
|
|
bd78532020 |
[BE] Fix collect_env for python-path-with-space (#98415)
By invoking [`Popen`](https://docs.python.org/2.7/library/subprocess.html#popen-constructor) with list of command line arguments, rather than strings that would be parsed by shell. Test plan: ```shell % conda create -n py311 python=3.11 % cd ~/miniconda3/envs % cp -a py311 py\ 311 % ./py\ 311/bin/python -mtorch.utils.collect_env ``` Fixes https://github.com/pytorch/pytorch/issues/98385 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98415 Approved by: https://github.com/huydhn |
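The principle behind the fix, in miniature: passing a list of arguments bypasses shell parsing entirely, so an interpreter path containing spaces (like the `py 311` environment in the test plan) is handled safely.

```python
import subprocess
import sys

# A list of args is passed to the OS verbatim; no shell ever splits the
# command string, so spaces in sys.executable's path are harmless.
proc = subprocess.run([sys.executable, "-c", "print('ok')"],
                      capture_output=True, text=True)
```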
||
|
|
5e6e984835 |
flake8 version reporting in collect_env (#94573)
Fixes #94571 # Testing `[pip3] flake8==3.9.2` now appears under `Versions of relevant libraries:` when running: `python torch/utils/collect_env.py` ### Output with this change ``` Collecting environment information... PyTorch version: N/A Is debug build: N/A CUDA used to build PyTorch: N/A ROCM used to build PyTorch: N/A OS: macOS 13.1 (x86_64) GCC version: Could not collect Clang version: 14.0.0 (clang-1400.0.29.202) CMake version: Could not collect Libc version: N/A Python version: 3.9.12 (main, Apr 5 2022, 01:53:17) [Clang 12.0.0 ] (64-bit runtime) Python platform: macOS-10.16-x86_64-i386-64bit Is CUDA available: N/A CUDA runtime version: Could not collect CUDA_MODULE_LOADING set to: N/A GPU models and configuration: Could not collect Nvidia driver version: Could not collect cuDNN version: Could not collect HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: N/A CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz Versions of relevant libraries: [pip3] flake8==3.9.2 [pip3] mypy==0.971 [pip3] mypy-extensions==0.4.3 [pip3] numpy==1.21.5 [pip3] numpydoc==1.2 [conda] blas 1.0 mkl [conda] mkl 2021.4.0 hecd8cb5_637 [conda] mkl-service 2.4.0 py39h9ed2024_0 [conda] mkl_fft 1.3.1 py39h4ab4a9b_0 [conda] mkl_random 1.2.2 py39hb2f4e1b_0 [conda] numpy 1.21.5 py39h2e5f0a9_1 [conda] numpy-base 1.21.5 py39h3b1a694_1 [conda] numpydoc 1.2 pyhd3eb1b0_0 ``` ### Output before ``` Collecting environment information... 
PyTorch version: N/A Is debug build: N/A CUDA used to build PyTorch: N/A ROCM used to build PyTorch: N/A OS: macOS 13.1 (x86_64) GCC version: Could not collect Clang version: 14.0.0 (clang-1400.0.29.202) CMake version: Could not collect Libc version: N/A Python version: 3.9.12 (main, Apr 5 2022, 01:53:17) [Clang 12.0.0 ] (64-bit runtime) Python platform: macOS-10.16-x86_64-i386-64bit Is CUDA available: N/A CUDA runtime version: Could not collect CUDA_MODULE_LOADING set to: N/A GPU models and configuration: Could not collect Nvidia driver version: Could not collect cuDNN version: Could not collect HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: N/A CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz Versions of relevant libraries: [pip3] mypy==0.971 [pip3] mypy-extensions==0.4.3 [pip3] numpy==1.21.5 [pip3] numpydoc==1.2 [conda] blas 1.0 mkl [conda] mkl 2021.4.0 hecd8cb5_637 [conda] mkl-service 2.4.0 py39h9ed2024_0 [conda] mkl_fft 1.3.1 py39h4ab4a9b_0 [conda] mkl_random 1.2.2 py39hb2f4e1b_0 [conda] numpy 1.21.5 py39h2e5f0a9_1 [conda] numpy-base 1.21.5 py39h3b1a694_1 [conda] numpydoc 1.2 pyhd3eb1b0_0 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94573 Approved by: https://github.com/malfet, https://github.com/kit1980 |
||
|
|
4454655a4c |
Add triton to relevant packages (#96663)
Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96663 Approved by: https://github.com/janeyx99, https://github.com/malfet, https://github.com/atalman |
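The "relevant packages" report is essentially a name filter over `pip list`-style output; this PR adds `triton` to the set of names that pass the filter. A minimal sketch of the idea (the function name and prefix tuple are illustrative, not the actual collect_env.py code):

```python
# Illustrative: collect_env reports only pip entries whose package name
# matches a set of patterns of interest; "triton" is now one of them.
RELEVANT_PREFIXES = ("torch", "numpy", "mypy", "triton")

def is_relevant_package(pip_line: str) -> bool:
    """Return True if a `pip freeze`-style line names a package of interest."""
    name = pip_line.split("==")[0].strip().lower()
    return name.startswith(RELEVANT_PREFIXES)
```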
||
|
|
3ce1ebb6fb |
Apply some safe comprehension optimizations (#94323)
Optimize unnecessary collection cast calls, unnecessary calls to list, tuple, and dict, and simplify calls to the sorted builtin. This should strictly improve speed and improve readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94323 Approved by: https://github.com/albanD |
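Two of the patterns such a cleanup typically rewrites, shown as hypothetical before/after pairs (not taken from the diff itself):

```python
data = {"b": 2, "a": 1}

# Unnecessary collection cast: sorted() accepts any iterable, so the
# intermediate list() is pure overhead.
keys_before = sorted(list(data.keys()))
keys_after = sorted(data.keys())

# Unnecessary list comprehension: sum() can consume a generator directly,
# avoiding a throwaway list.
total_before = sum([v for v in data.values()])
total_after = sum(v for v in data.values())
```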
||
|
|
8fce9a09cd |
[BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes). This PR only does two things: removes the need to inherit from object and removes unused future imports. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308 Approved by: https://github.com/ezyang, https://github.com/albanD |
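The two rewrites, as a hypothetical before/after: on Python 3 every class inherits from `object` implicitly, and `__future__` imports like `print_function` are no-ops, so both can be removed without changing behavior.

```python
# Before (Python 2 compatible style):
#     from __future__ import print_function  # removed: no-op on Python 3
#     class Module(object):                  # removed: explicit object base
#         pass

# After:
class Module:
    pass
```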
||
|
|
79243516f6 |
collect CPU info with collect_env.py for new issues reporting (#93899)
Add CPU information collection feature to collect_env.py for new issues reporting. This helps us to triage issues on CPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93899 Approved by: https://github.com/malfet |
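A best-effort, cross-platform sketch of the kind of probe involved (the helper name is hypothetical and the real collect_env.py implementation differs in detail):

```python
import platform
import subprocess

def get_cpu_description() -> str:
    """Best-effort CPU model string for issue reports."""
    system = platform.system()
    if system == "Linux":
        try:
            with open("/proc/cpuinfo") as f:
                for line in f:
                    if line.startswith("model name"):
                        return line.split(":", 1)[1].strip()
        except OSError:
            pass
    elif system == "Darwin":
        try:
            return subprocess.check_output(
                ["sysctl", "-n", "machdep.cpu.brand_string"], text=True
            ).strip()
        except (OSError, subprocess.CalledProcessError):
            pass
    # Fallback that works everywhere, mirroring the script's habit of
    # printing "Could not collect" instead of failing.
    return platform.processor() or "Could not collect"
```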
||
|
|
77d94ac5ab |
Sets CUDA_MODULE_LOADING to LAZY when not set by the user (#85692)
This PR sets CUDA_MODULE_LOADING if it's not set by the user. By default, it sets it to "LAZY". It was tested using the following commands:
```
python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```
which shows a memory usage of 287,047,680 bytes, vs
```
CUDA_MODULE_LOADING="DEFAULT" python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```
which shows 666,632,192 bytes. A C++ implementation is needed for libtorch users (otherwise it could have been pure Python functionality). cc: @ptrblck @ngimel @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/85692 Approved by: https://github.com/malfet |
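The core behavior, modeled in Python for illustration (the actual change lives in C++ so that libtorch users get it too; the helper name here is hypothetical):

```python
import os

def apply_default_cuda_module_loading(env=None):
    """Set CUDA_MODULE_LOADING to "LAZY" only when the user hasn't set it."""
    if env is None:
        env = os.environ
    # setdefault leaves any user-provided value (e.g. "DEFAULT") untouched.
    env.setdefault("CUDA_MODULE_LOADING", "LAZY")
    return env["CUDA_MODULE_LOADING"]
```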
||
|
|
be25566d13 |
tools: Ensure compat for collect_env with python 3.5
Users were reporting errors when trying to use collect_env with older versions of Python. This adds a test to ensure that we maintain compat for this script with older versions of Python. Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/78946 Approved by: https://github.com/janeyx99 |
||
|
|
635aaa3d9d |
replace "grep" with Python processing in collect_env.py (#77148)
Fixes #77063. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77148 Approved by: https://github.com/ezyang |
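The general shape of such a fix: filter command output in pure Python instead of shelling out to `grep`, which also works on platforms where `grep` is unavailable (illustrative sketch, not the exact collect_env.py code):

```python
def filter_lines(text: str, pattern: str) -> str:
    """Pure-Python stand-in for piping output through `grep -i pattern`:
    keep only the lines containing the pattern, case-insensitively."""
    needle = pattern.lower()
    return "\n".join(
        line for line in text.splitlines() if needle in line.lower()
    )
```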
||
|
|
c170d395de |
utils: Only check for xnnpack if torch installed (#74342)
Summary: Fixes a bug where collect_env.py was not able to be run without having torch installed Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/74342 Reviewed By: malfet, janeyx99 Differential Revision: D34943464 Pulled By: seemethere fbshipit-source-id: dbaa0004b88cb643a9c6426c9ea7c5be3d3c9ef5 (cherry picked from commit 4f39ebb823f88df0c3902db15deaffc6ba481cb3) |
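The bug class and its fix, sketched: guard the `torch` import so the script still produces a report on machines where torch isn't installed (the helper name and guard structure are illustrative):

```python
def get_xnnpack_availability() -> str:
    """Report XNNPACK availability without assuming torch is installed."""
    try:
        import torch  # guarded: collect_env must run without torch, too
    except ImportError:
        return "N/A"
    try:
        return str(torch.backends.xnnpack.enabled)
    except AttributeError:
        # Older builds without the xnnpack backend attribute.
        return "N/A"
```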
||
|
|
b2054d3025 |
Prepare for an update to the XNNPACK submodule (#72642)
Summary:
- Target Sha1: ae108ef49aa5623b896fc93d4298c49d1750d9ba
- Make USE_XNNPACK a dependent option on cmake minimum version 3.12
- Print USE_XNNPACK under cmake options summary, and print the
availability from collet_env.py
- Skip XNNPACK based tests when XNNPACK is not available
- Add SkipIfNoXNNPACK wrapper to skip tests
- Update cmake version for xenial-py3.7-gcc5.4 image to 3.12.4
- This is required for the backwards compatibility test.
The PyTorch op schema is XNNPACK dependent. See,
aten/src/ATen/native/xnnpack/RegisterOpContextClass.cpp for
example. The nightly version is assumed to have USE_XNNPACK=ON,
so with this change we ensure that the test build can also
have XNNPACK.
- HACK: skipping test_xnnpack_integration tests on ROCM
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72642
Reviewed By: kimishpatel
Differential Revision: D34456794
Pulled By: digantdesai
fbshipit-source-id: 85dbfe0211de7846d8a84321b14fdb061cd6c037
(cherry picked from commit 6cf48e7b64d6979962d701b5d493998262cc8bfa)
|
||
|
|
3f06f29577 |
Improve pip package determination (#63321)
Summary: Invoking `pip` or `pip3` yields the list of packages for whichever `pip` alias is first on the PATH, rather than for the interpreter currently being executed. Changed `get_pip_packages` to use `sys.executable + '-mpip'`. Also, add mypy to the list of packages of interest. Discovered while looking at https://github.com/pytorch/pytorch/issues/63279 Pull Request resolved: https://github.com/pytorch/pytorch/pull/63321 Reviewed By: walterddr Differential Revision: D30342099 Pulled By: malfet fbshipit-source-id: fc8d17cf2ddcf18236cfde5c1b9edb4e72804ee0 |
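The fix in miniature: address pip through the running interpreter so the report matches the environment actually executing the script (illustrative sketch; the function name is hypothetical):

```python
import subprocess
import sys

def run_current_pip(args):
    """Run pip for *this* interpreter, not whatever `pip3` is on PATH."""
    return subprocess.run(
        [sys.executable, "-mpip"] + list(args),
        capture_output=True,
        text=True,
    )
```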
||
|
|
150c828803 |
Add lint rule to keep collect_env.py python2 compliant (#60946)
Summary: Fixes T94400857
- [x] Add lint rule
- [x] Verify lint rule works
- [x] Fix torch/utils/collect_env.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60946 Reviewed By: malfet, mruberry Differential Revision: D29457294 Pulled By: rsemenov fbshipit-source-id: 3c0670408d7aee1479e1de335291deb13a04ace9 |
||
|
|
4c00df12ec |
Include full Python version in collect_env.py output (#59632)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59632
Before:
```
Python version: 3.7 (64-bit runtime)
```
After:
```
Python version: 3.7.7 (default, Mar 23 2020, 17:31:31) [Clang 4.0.1 (tags/RELEASE_401/final)] (64-bit runtime)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: bdhirsh Differential Revision: D28961500 Pulled By: ezyang fbshipit-source-id: 0f95a49abf6977941f09a64243916576a820679f |
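The change boils down to reporting `sys.version` (which includes the patch level and build metadata) instead of just major.minor; a minimal sketch, with a hypothetical helper name:

```python
import sys

def get_python_version_string() -> str:
    # Before: "%d.%d" % sys.version_info[:2]; after: the full sys.version
    # string, with the pointer width appended as before.
    bits = 64 if sys.maxsize > 2 ** 32 else 32
    return "{} ({}-bit runtime)".format(sys.version.replace("\n", " "), bits)
```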
||
|
|
2f3be2735f |
Don't split oversize cached blocks (#44742)
Summary: Fixes https://github.com/pytorch/pytorch/issues/35901 This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator is holding 13 GB of 'split free blocks'. Approach:
- Large blocks above a certain size are designated "oversize". This limit is currently set 1 decade above large, at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriate size block to be allocated. This will be activated under memory pressure and will prevent _release_cached_blocks()_ from triggering
Initial performance tests show this is similar or quicker than the original strategy. Additional tests are ongoing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742 Reviewed By: zou3519 Differential Revision: D29186394 Pulled By: ezyang fbshipit-source-id: c88918836db3f51df59de6d1b3e03602ebe306a9 |
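A toy model of the matching rule described in that commit; the 200 MB threshold comes from the description, but the "close match" slack value and the function name are illustrative stand-ins, not the allocator's actual constants:

```python
MB = 1024 * 1024
OVERSIZE_THRESHOLD = 200 * MB   # "1 decade above large" per the description
MATCH_SLACK = 20 * MB           # hypothetical tolerance for a "close" match

def can_reuse_block(request_size: int, block_size: int) -> bool:
    """Decide whether a cached block may satisfy an allocation request."""
    if block_size < request_size:
        return False
    if request_size < OVERSIZE_THRESHOLD:
        return True  # ordinary blocks may be split down to size
    # Oversize requests must closely match: huge blocks are never split.
    return block_size - request_size <= MATCH_SLACK
```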
||
|
|
f7c15610aa |
Collect kernel version (#58485)
Summary: Collect env should collect kernel and glibc version Fixes https://github.com/pytorch/pytorch/issues/58387 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58485 Reviewed By: walterddr Differential Revision: D28510564 Pulled By: malfet fbshipit-source-id: ad3d4b93f51db052720bfaa4322138c55816921b |
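The stdlib already exposes both pieces of information; a minimal sketch of collecting them (the helper name is hypothetical):

```python
import platform

def get_os_runtime_info() -> dict:
    """Kernel release plus libc version, for richer environment reports."""
    libc, libc_version = platform.libc_ver()  # ("", "") on non-glibc systems
    return {
        "kernel": platform.release(),
        "libc": "{} {}".format(libc, libc_version) if libc else "N/A",
    }
```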
||
|
|
1ec12fd491 |
Add minidump collection via breakpad (#55647)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55647 This adds [breakpad](https://github.com/google/breakpad), which comes with out-of-the-box utilities to register a signal handler that writes out a minidump on an unhandled exception. Right now this is gated behind a flag in `torch.utils`, but in the future it could be on by default. Size-wise this adds about 500k to `libtorch_cpu.so` (187275968 B to 187810016 B).
```bash
$ cat <<EOF > test.py
import torch
torch.utils.enable_minidump_collection()
# temporary util that just segfaults
torch._C._crash()
EOF
$ python test.py
Wrote minidump to /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp
fish: “python test.py” terminated by signal SIGSEGV (Address boundary error)
$ minidump-2-core /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp -o core.dmp
$ gdb python core.dmp
... commence debugging ...
```
Right now exceptions that get passed up to Python don't trigger the signal handler (which by default only handles [these](https://github.com/google/breakpad/blob/main/src/client/linux/handler/exception_handler.cc#L115)). It would be possible for PyTorch exceptions to explicitly write a minidump when passed up to Python (maybe only when the exception is unhandled). Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D27679767 Pulled By: driazati fbshipit-source-id: 1ab3b5160b6dc405f5097eb25acc644d533358d7 |
||
|
|
f94c95a2dd |
Revert D23752058: [pytorch][PR] Don't split oversize cached blocks
Test Plan: revert-hammer
Differential Revision: D23752058 |
||
|
|
67dcd62310 |
Don't split oversize cached blocks (#44742)
Summary: Fixes https://github.com/pytorch/pytorch/issues/35901 This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator is holding 13 GB of 'split free blocks'. Approach:
- Large blocks above a certain size are designated "oversize". This limit is currently set 1 decade above large, at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriate size block to be allocated. This will be activated under memory pressure and will prevent _release_cached_blocks()_ from triggering
Initial performance tests show this is similar or quicker than the original strategy. Additional tests are ongoing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742 Reviewed By: ngimel Differential Revision: D23752058 Pulled By: ezyang fbshipit-source-id: ccb7c13e3cf8ef2707706726ac9aaac3a5e3d5c8 |