mirror of
https://github.com/zebrajr/faceswap.git
synced 2025-12-06 12:20:27 +01:00
* Core Updates
- Remove lib.utils.keras_backend_quiet and replace with get_backend() where relevant
- Document lib.gpu_stats and lib.sys_info
- Remove call to GPUStats.is_plaidml from convert and replace with get_backend()
- lib.gui.menu - typofix
* Update Dependencies
Bump Tensorflow Version Check
* Port extraction to tf2
* Add custom import finder for loading Keras or tf.keras depending on backend
* Add `tensorflow` to KerasFinder search path
* Basic TF2 training running
* model.initializers - docstring fix
* Fix and pass tests for tf2
* Replace Keras backend tests with faceswap backend tests
* Initial optimizers update
* Monkey patch tf.keras optimizer
* Remove custom Adam Optimizers and Memory Saving Gradients
* Remove multi-gpu option. Add Distribution to cli
* plugins.train.model._base: Add Mirror, Central and Default distribution strategies
* Update tensorboard kwargs for tf2
* Penalized Loss - Fix for TF2 and AMD
* Fix syntax for tf2.1
* requirements typo fix
* Explicit None for clipnorm if using a distribution strategy
* Fix penalized loss for distribution strategies
* Update Dlight
* typo fix
* Pin to TF2.2
* setup.py - Install tensorflow from pip if not available in Conda
* Add reduction options and set default for mirrored distribution strategy
* Explicitly use default strategy rather than nullcontext
* lib.model.backup_restore documentation
* Remove mirrored strategy reduction method and default based on OS
* Initial restructure - training
* Remove PingPong
Start model.base refactor
* Model saving and resuming enabled
* More tidying up of model.base
* Enable backup and snapshotting
* Re-enable state file
Remove loss names from state file
Fix print loss function
Set snapshot iterations correctly
* Revert original model to Keras Model structure rather than custom layer
Output full model and sub model summary
Change NNBlocks to callables rather than custom keras layers
* Apply custom Conv2D layer
* Finalize NNBlock restructure
Update Dfaker blocks
* Fix reloading model under a different distribution strategy
* Pass command line arguments through to trainer
* Remove training_opts from model and reference params directly
* Tidy up model __init__
* Re-enable tensorboard logging
Suppress "Model Not Compiled" warning
* Fix timelapse
* lib.model.nnblocks - Bugfix residual block
Port dfaker
bugfix original
* dfl-h128 ported
* DFL SAE ported
* IAE Ported
* dlight ported
* port lightweight
* realface ported
* unbalanced ported
* villain ported
* lib.cli.args - Update Batchsize + move allow_growth to config
* Remove output shape definition
Get image sizes per side rather than globally
* Strip mask input from encoder
* Fix learn mask and output learned mask to preview
* Trigger Allow Growth prior to setting strategy
* Fix GUI Graphing
* GUI - Display batchsize correctly + fix training graphs
* Fix penalized loss
* Enable mixed precision training
* Update analysis displayed batch to match input
* Penalized Loss - Multi-GPU Fix
* Fix all losses for TF2
* Fix Reflect Padding
* Allow different input size for each side of the model
* Fix conv-aware initialization on reload
* Switch allow_growth order
* Move mixed_precision to cli
* Remove distribution strategies
* Compile penalized loss sub-function into LossContainer
* Bump default save interval to 250
Generate preview on first iteration but don't save
Fix iterations to start at 1 instead of 0
Remove training deprecation warnings
Bump some scripts.train loglevels
* Add ability to refresh preview on demand on pop-up window
* Enable refresh of training preview from GUI
* Fix Convert
Debug logging in Initializers
* Fix Preview Tool
* Update Legacy TF1 weights to TF2
Catch stats error on loading stats with missing logs
* lib.gui.popup_configure - Make more responsive + document
* Multiple Outputs supported in trainer
Original Model - Mask output bugfix
* Make universal inference model for convert
Remove scaling from penalized mask loss (now handled at input to y_true)
* Fix inference model to work properly with all models
* Fix multi-scale output for convert
* Fix clipnorm issue with distribution strategies
Edit error message on OOM
* Update plaidml losses
* Add missing file
* Disable gmsd loss for plaidml
* PlaidML - Basic training working
* clipnorm rewriting for mixed-precision
* Inference model creation bugfixes
* Remove debug code
* Bugfix: Default clipnorm to 1.0
* Remove all mask inputs from training code
* Remove mask inputs from convert
* GUI - Analysis Tab - Docstrings
* Fix rate in totals row
* lib.gui - Only update display pages if they have focus
* Save the model on first iteration
* plaidml - Fix SSIM loss with penalized loss
* tools.alignments - Remove manual and fix jobs
* GUI - Remove case formatting on help text
* gui MultiSelect custom widget - Set default values on init
* vgg_face2 - Move to plugins.extract.recognition and use plugins._base base class
cli - Add global GPU Exclude Option
tools.sort - Use global GPU Exclude option for backend
lib.model.session - Exclude all GPUs when running in CPU mode
lib.cli.launcher - Set backend to CPU mode when all GPUs excluded
* Cascade excluded devices to GPU Stats
* Explicit GPU selection for Train and Convert
* Reduce Tensorflow Min GPU Multiprocessor Count to 4
* remove compat.v1 code from extract
* Force TF to skip mixed precision compatibility check if GPUs have been filtered
* Add notes to config for non-working AMD losses
* Raise error if forcing extract to CPU mode
* Fix loading of legacy dfl-sae weights + dfl-sae typo fix
* Remove unused requirements
Update sphinx requirements
Fix broken rst file locations
* docs: lib.gui.display
* clipnorm amd condition check
* documentation - gui.display_analysis
* Documentation - gui.popup_configure
* Documentation - lib.logger
* Documentation - lib.model.initializers
* Documentation - lib.model.layers
* Documentation - lib.model.losses
* Documentation - lib.model.nn_blocks
* Documentation - lib.model.normalization
* Documentation - lib.model.session
* Documentation - lib.plaidml_stats
* Documentation: lib.training_data
* Documentation: lib.utils
* Documentation: plugins.train.model._base
* GUI Stats: prevent stats from using GPU
* Documentation - Original Model
* Documentation: plugins.model.trainer._base
* linting
* unit tests: initializers + losses
* unit tests: nn_blocks
* bugfix - Exclude gpu devices in train, not include
* Enable Exclude-Gpus in Extract
* Enable exclude gpus in tools
* Disallow multiple plugin types in a single model folder
* Automatically add exclude_gpus argument in for cpu backends
* CPU backend fixes
* Relax optimizer test threshold
* Default Train settings - Set mask to Extended
* Update Extractor cli help text
Update to Python 3.8
* Fix FAN to run on CPU
* lib.plaidml_tools - typofix
* Linux installer - check for curl
* linux installer - typo fix
389 lines
15 KiB
Python
#!/usr/bin/env python3
""" Collects and returns information on available GPUs.

The information returned from this module covers both Nvidia and AMD GPUs. However, the
information available for Nvidia is far more thorough than what is available for AMD, where we
need to plug into plaidML to pull stats. The quality of this data will vary depending on the
OS' particular OpenCL implementation.
"""

import logging
import os
import platform

from lib.utils import get_backend

if platform.system() == 'Darwin':
    import pynvx  # pylint: disable=import-error
    IS_MACOS = True
else:
    import pynvml
    IS_MACOS = False

# Limited PlaidML/AMD Stats
try:
    from lib.plaidml_tools import PlaidMLStats as plaidlib  # pylint:disable=ungrouped-imports
except ImportError:
    plaidlib = None


_EXCLUDE_DEVICES = []

def set_exclude_devices(devices):
    """ Add any explicitly selected GPU devices to the global list of devices to be excluded
    from use by Faceswap.

    Parameters
    ----------
    devices: list
        List of indices corresponding to the GPU devices connected to the computer
    """
    logger = logging.getLogger(__name__)
    logger.debug("Excluding GPU indices: %s", devices)
    if not devices:
        return
    _EXCLUDE_DEVICES.extend(devices)

class GPUStats():
    """ Holds information and statistics about the GPU(s) available on the currently
    running system.

    Parameters
    ----------
    log: bool, optional
        Whether the class should output information to the logger. There may be occasions where
        the logger has not yet been set up when this class is queried. Attempting to log in
        these instances will raise an error. If GPU stats are being queried prior to the logger
        being available then this parameter should be set to ``False``. Otherwise set to
        ``True``. Default: ``True``
    """
    def __init__(self, log=True):
        # Logger is held internally, as we don't want to log when obtaining system stats on crash
        self._logger = logging.getLogger(__name__) if log else None
        self._log("debug", "Initializing {}".format(self.__class__.__name__))

        self._plaid = None
        self._initialized = False
        self._device_count = 0
        self._active_devices = list()
        self._handles = list()
        self._driver = None
        self._devices = list()
        self._vram = None

        self._initialize(log)

        self._driver = self._get_driver()
        self._devices = self._get_devices()
        self._vram = self._get_vram()
        if not self._active_devices:
            self._log("warning", "No GPU detected. Switching to CPU mode")
            return

        self._shutdown()
        self._log("debug", "Initialized {}".format(self.__class__.__name__))

    @property
    def device_count(self):
        """int: The number of GPU devices discovered on the system. """
        return self._device_count

    @property
    def cli_devices(self):
        """ list: List of available devices for use in faceswap's command line arguments """
        return ["{}: {}".format(idx, device) for idx, device in enumerate(self._devices)]

    @property
    def exclude_all_devices(self):
        """ bool: ``True`` if all GPU devices have been explicitly disabled otherwise ``False`` """
        return all(idx in _EXCLUDE_DEVICES for idx in range(len(self._devices)))

    @property
    def _is_plaidml(self):
        """ bool: ``True`` if the backend is plaidML otherwise ``False``. """
        return self._plaid is not None

    @property
    def sys_info(self):
        """ dict: GPU Stats that are required for system information logging.

        The dictionary contains the following data:

        **vram** (`list`): The total amount of VRAM in Megabytes for each GPU as pertaining to
        :attr:`_handles`

        **driver** (`str`): The GPU driver version that is installed on the OS

        **devices** (`list`): The device name of each GPU on the system as pertaining
        to :attr:`_handles`

        **devices_active** (`list`): The device name of each active GPU on the system as
        pertaining to :attr:`_handles`
        """
        return dict(vram=self._vram,
                    driver=self._driver,
                    devices=self._devices,
                    devices_active=self._active_devices)

    def _log(self, level, message):
        """ If the class has been initialized with :attr:`log` as `True` then log the message
        otherwise skip logging.

        Parameters
        ----------
        level: str
            The log level to log at
        message: str
            The message to log
        """
        if self._logger is None:
            return
        logger = getattr(self._logger, level.lower())
        logger(message)

    def _initialize(self, log=False):
        """ Initialize the library that will be returning stats for the system's GPU(s).
        For Nvidia (on Linux and Windows) the library is `pynvml`. For Nvidia (on macOS) the
        library is `pynvx`. For AMD `plaidML` is used.

        Parameters
        ----------
        log: bool, optional
            Whether the class should output information to the logger. There may be occasions
            where the logger has not yet been set up when this class is queried. Attempting to
            log in these instances will raise an error. If GPU stats are being queried prior to
            the logger being available then this parameter should be set to ``False``. Otherwise
            set to ``True``. Default: ``False``
        """
        if not self._initialized:
            if get_backend() == "amd":
                self._log("debug", "AMD Detected. Using plaidMLStats")
                loglevel = "INFO" if self._logger is None else self._logger.getEffectiveLevel()
                self._plaid = plaidlib(log_level=loglevel, log=log)
            elif IS_MACOS:
                self._log("debug", "macOS Detected. Using pynvx")
                try:
                    pynvx.cudaInit()
                except RuntimeError:
                    self._initialized = True
                    return
            else:
                try:
                    self._log("debug", "OS is not macOS. Trying pynvml")
                    pynvml.nvmlInit()
                except (pynvml.NVMLError_LibraryNotFound,  # pylint: disable=no-member
                        pynvml.NVMLError_DriverNotLoaded,  # pylint: disable=no-member
                        pynvml.NVMLError_NoPermission) as err:  # pylint: disable=no-member
                    if plaidlib is not None:
                        self._log("debug", "pynvml errored. Trying plaidML")
                        self._plaid = plaidlib(log=log)
                    else:
                        msg = ("There was an error reading from the Nvidia Machine Learning "
                               "Library. Either you do not have an Nvidia GPU (in which case "
                               "this warning can be ignored) or the most likely cause is "
                               "incorrectly installed drivers. If this is the case, please "
                               "remove and reinstall your Nvidia drivers before reporting. "
                               "Original error: {}".format(str(err)))
                        self._log("warning", msg)
                        self._initialized = True
                        return
                except Exception as err:  # pylint: disable=broad-except
                    msg = ("An unhandled exception occurred loading pynvml. "
                           "Original error: {}".format(str(err)))
                    if self._logger:
                        self._logger.error(msg)
                    else:
                        print(msg)
                    self._initialized = True
                    return
            self._initialized = True
            self._get_device_count()
            self._get_active_devices()
            self._get_handles()

    def _shutdown(self):
        """ Shutdown pynvml if it was the library used for obtaining stats and set
        :attr:`_initialized` back to ``False``. """
        if self._initialized:
            self._handles = list()
            if not IS_MACOS and not self._is_plaidml:
                pynvml.nvmlShutdown()
            self._initialized = False

    def _get_device_count(self):
        """ Detect the number of GPUs attached to the system and allocate to
        :attr:`_device_count`. """
        if self._is_plaidml:
            self._device_count = self._plaid.device_count
        elif IS_MACOS:
            self._device_count = pynvx.cudaDeviceGetCount(ignore=True)
        else:
            try:
                self._device_count = pynvml.nvmlDeviceGetCount()
            except pynvml.NVMLError:
                self._device_count = 0
        self._log("debug", "GPU Device count: {}".format(self._device_count))

    def _get_active_devices(self):
        """ Obtain the indices of active GPUs (those that have not been explicitly excluded by
        CUDA_VISIBLE_DEVICES, plaidML or command line arguments) and allocate to
        :attr:`_active_devices`. """
        if self._is_plaidml:
            self._active_devices = self._plaid.active_devices
        elif self._device_count == 0:
            self._active_devices = []
        else:
            devices = [idx for idx in range(self._device_count) if idx not in _EXCLUDE_DEVICES]
            env_devices = os.environ.get("CUDA_VISIBLE_DEVICES", "")
            if env_devices:
                env_devices = [int(i) for i in env_devices.split(",")]
                devices = [idx for idx in devices if idx in env_devices]
            self._active_devices = devices
        self._log("debug", "Active GPU Devices: {}".format(self._active_devices))

    def _get_handles(self):
        """ Obtain the internal handle identifiers for the system GPUs and allocate to
        :attr:`_handles`. """
        if self._is_plaidml:
            self._handles = self._plaid.devices
        elif IS_MACOS:
            self._handles = pynvx.cudaDeviceGetHandles(ignore=True)
        else:
            self._handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
                             for i in range(self._device_count)]
        self._log("debug", "GPU Handles found: {}".format(len(self._handles)))

    def _get_driver(self):
        """ Obtain and return the installed driver version for the system's GPUs.

        Returns
        -------
        str
            The currently installed GPU driver version
        """
        if self._is_plaidml:
            driver = self._plaid.drivers
        elif IS_MACOS:
            driver = pynvx.cudaSystemGetDriverVersion(ignore=True)
        else:
            try:
                driver = pynvml.nvmlSystemGetDriverVersion().decode("utf-8")
            except pynvml.NVMLError:
                driver = "No Nvidia driver found"
        self._log("debug", "GPU Driver: {}".format(driver))
        return driver

    def _get_devices(self):
        """ Obtain the name of the installed devices. The quality of this information depends on
        the backend and OS being used, but it should be sufficient for identifying cards.

        Returns
        -------
        list
            List of device names for connected GPUs as corresponding to the values in
            :attr:`_handles`
        """
        self._initialize()
        if self._device_count == 0:
            names = list()
        elif self._is_plaidml:
            names = self._plaid.names
        elif IS_MACOS:
            names = [pynvx.cudaGetName(handle, ignore=True)
                     for handle in self._handles]
        else:
            names = [pynvml.nvmlDeviceGetName(handle).decode("utf-8")
                     for handle in self._handles]
        self._log("debug", "GPU Devices: {}".format(names))
        return names

    def _get_vram(self):
        """ Obtain the total VRAM in Megabytes for each connected GPU.

        Returns
        -------
        list
            List of floats containing the total amount of VRAM in Megabytes for each connected
            GPU as corresponding to the values in :attr:`_handles`
        """
        self._initialize()
        if self._device_count == 0:
            vram = list()
        elif self._is_plaidml:
            vram = self._plaid.vram
        elif IS_MACOS:
            vram = [pynvx.cudaGetMemTotal(handle, ignore=True) / (1024 * 1024)
                    for handle in self._handles]
        else:
            vram = [pynvml.nvmlDeviceGetMemoryInfo(handle).total / (1024 * 1024)
                    for handle in self._handles]
        self._log("debug", "GPU VRAM: {}".format(vram))
        return vram

    def _get_free_vram(self):
        """ Obtain the amount of VRAM that is available, in Megabytes, for each connected GPU.

        Returns
        -------
        list
            List of floats containing the amount of VRAM available, in Megabytes, for each
            connected GPU as corresponding to the values in :attr:`_handles`

        Notes
        -----
        There is no useful way to get free VRAM on PlaidML. OpenCL loads and unloads VRAM as
        required, so this returns the total memory available per card for AMD cards, which is
        not particularly useful.
        """
        self._initialize()
        if self._is_plaidml:
            vram = self._plaid.vram
        elif IS_MACOS:
            vram = [pynvx.cudaGetMemFree(handle, ignore=True) / (1024 * 1024)
                    for handle in self._handles]
        else:
            vram = [pynvml.nvmlDeviceGetMemoryInfo(handle).free / (1024 * 1024)
                    for handle in self._handles]
        self._shutdown()
        self._log("debug", "GPU VRAM free: {}".format(vram))
        return vram

    def get_card_most_free(self):
        """ Obtain statistics for the GPU with the most available free VRAM.

        Returns
        -------
        dict
            The dictionary contains the following data:

            **card_id** (`int`): The index of the card as pertaining to :attr:`_handles`

            **device** (`str`): The name of the device

            **free** (`float`): The amount of available VRAM on the GPU

            **total** (`float`): The total amount of VRAM on the GPU

            If a GPU is not detected then the **card_id** is returned as ``-1`` and the amount
            of free and total VRAM available is fixed to 2048 Megabytes.
        """
        if not self._active_devices:
            return {"card_id": -1,
                    "device": "No GPU devices found",
                    "free": 2048,
                    "total": 2048}
        free_vram = self._get_free_vram()
        free_active = [free_vram[idx] for idx in self._active_devices]
        vram_free = max(free_active)
        card_id = self._active_devices[free_active.index(vram_free)]
        retval = {"card_id": card_id,
                  "device": self._devices[card_id],
                  "free": vram_free,
                  "total": self._vram[card_id]}
        self._log("debug", "Active GPU Card with most free VRAM: {}".format(retval))
        return retval
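As a usage sketch, the two pieces of selection logic in this module (the active-device filtering in `_get_active_devices` and the best-card choice in `get_card_most_free`) can be reproduced standalone without any GPU libraries. The helper names `active_devices` and `card_most_free` below are illustrative only, not part of the module:

```python
import os


def active_devices(device_count, exclude=()):
    """Mirrors GPUStats._get_active_devices: drop explicitly excluded
    indices, then intersect with CUDA_VISIBLE_DEVICES when it is set."""
    devices = [idx for idx in range(device_count) if idx not in exclude]
    env = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    if env:
        visible = [int(i) for i in env.split(",")]
        devices = [idx for idx in devices if idx in visible]
    return devices


def card_most_free(active, free_vram):
    """Mirrors GPUStats.get_card_most_free: pick the active device index
    with the most free VRAM, falling back to card_id -1 and a fixed
    2048MB when no GPUs are active."""
    if not active:
        return {"card_id": -1, "free": 2048}
    frees = [free_vram[idx] for idx in active]
    best = max(frees)
    return {"card_id": active[frees.index(best)], "free": best}


os.environ["CUDA_VISIBLE_DEVICES"] = "0,2,3"
print(active_devices(4, exclude=[2]))              # [0, 3]
print(card_most_free([0, 3], [1024, 0, 0, 4096]))  # {'card_id': 3, 'free': 4096}
```

Note how a device must survive both filters to remain active: excluding index 2 on the command line and hiding index 1 via `CUDA_VISIBLE_DEVICES` leaves only devices 0 and 3 for the free-VRAM comparison.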