faceswap/lib/plaidml_tools.py
torzdf d8557c1970
Faceswap 2.0 (#1045)
* Core Updates
    - Remove lib.utils.keras_backend_quiet and replace with get_backend() where relevant
    - Document lib.gpu_stats and lib.sys_info
    - Remove call to GPUStats.is_plaidml from convert and replace with get_backend()
    - lib.gui.menu - typofix

* Update Dependencies
Bump Tensorflow Version Check

* Port extraction to tf2

* Add custom import finder for loading Keras or tf.keras depending on backend

* Add `tensorflow` to KerasFinder search path

* Basic TF2 training running

* model.initializers - docstring fix

* Fix and pass tests for tf2

* Replace Keras backend tests with faceswap backend tests

* Initial optimizers update

* Monkey patch tf.keras optimizer

* Remove custom Adam Optimizers and Memory Saving Gradients

* Remove multi-gpu option. Add Distribution to cli

* plugins.train.model._base: Add Mirror, Central and Default distribution strategies

* Update tensorboard kwargs for tf2

* Penalized Loss - Fix for TF2 and AMD

* Fix syntax for tf2.1

* requirements typo fix

* Explicit None for clipnorm if using a distribution strategy

* Fix penalized loss for distribution strategies

* Update Dlight

* typo fix

* Pin to TF2.2

* setup.py - Install tensorflow from pip if not available in Conda

* Add reduction options and set default for mirrored distribution strategy

* Explicitly use default strategy rather than nullcontext

* lib.model.backup_restore documentation

* Remove mirrored strategy reduction method and default based on OS

* Initial restructure - training

* Remove PingPong
Start model.base refactor

* Model saving and resuming enabled

* More tidying up of model.base

* Enable backup and snapshotting

* Re-enable state file
Remove loss names from state file
Fix print loss function
Set snapshot iterations correctly

* Revert original model to Keras Model structure rather than custom layer
Output full model and sub model summary
Change NNBlocks to callables rather than custom keras layers

* Apply custom Conv2D layer

* Finalize NNBlock restructure
Update Dfaker blocks

* Fix reloading model under a different distribution strategy

* Pass command line arguments through to trainer

* Remove training_opts from model and reference params directly

* Tidy up model __init__

* Re-enable tensorboard logging
Suppress "Model Not Compiled" warning

* Fix timelapse

* lib.model.nnblocks - Bugfix residual block
Port dfaker
bugfix original

* dfl-h128 ported

* DFL SAE ported

* IAE Ported

* dlight ported

* port lightweight

* realface ported

* unbalanced ported

* villain ported

* lib.cli.args - Update Batchsize + move allow_growth to config

* Remove output shape definition
Get image sizes per side rather than globally

* Strip mask input from encoder

* Fix learn mask and output learned mask to preview

* Trigger Allow Growth prior to setting strategy

* Fix GUI Graphing

* GUI - Display batchsize correctly + fix training graphs

* Fix penalized loss

* Enable mixed precision training

* Update analysis displayed batch to match input

* Penalized Loss - Multi-GPU Fix

* Fix all losses for TF2

* Fix Reflect Padding

* Allow different input size for each side of the model

* Fix conv-aware initialization on reload

* Switch allow_growth order

* Move mixed_precision to cli

* Remove distribution strategies

* Compile penalized loss sub-function into LossContainer

* Bump default save interval to 250
Generate preview on first iteration but don't save
Fix iterations to start at 1 instead of 0
Remove training deprecation warnings
Bump some scripts.train loglevels

* Add ability to refresh preview on demand on pop-up window

* Enable refresh of training preview from GUI

* Fix Convert
Debug logging in Initializers

* Fix Preview Tool

* Update Legacy TF1 weights to TF2
Catch stats error on loading stats with missing logs

* lib.gui.popup_configure - Make more responsive + document

* Multiple Outputs supported in trainer
Original Model - Mask output bugfix

* Make universal inference model for convert
Remove scaling from penalized mask loss (now handled at input to y_true)

* Fix inference model to work properly with all models

* Fix multi-scale output for convert

* Fix clipnorm issue with distribution strategies
Edit error message on OOM

* Update plaidml losses

* Add missing file

* Disable gmsd loss for plaidml

* PlaidML - Basic training working

* clipnorm rewriting for mixed-precision

* Inference model creation bugfixes

* Remove debug code

* Bugfix: Default clipnorm to 1.0

* Remove all mask inputs from training code

* Remove mask inputs from convert

* GUI - Analysis Tab - Docstrings

* Fix rate in totals row

* lib.gui - Only update display pages if they have focus

* Save the model on first iteration

* plaidml - Fix SSIM loss with penalized loss

* tools.alignments - Remove manual and fix jobs

* GUI - Remove case formatting on help text

* gui MultiSelect custom widget - Set default values on init

* vgg_face2 - Move to plugins.extract.recognition and use plugins._base base class
cli - Add global GPU Exclude Option
tools.sort - Use global GPU Exclude option for backend
lib.model.session - Exclude all GPUs when running in CPU mode
lib.cli.launcher - Set backend to CPU mode when all GPUs excluded

* Cascade excluded devices to GPU Stats

* Explicit GPU selection for Train and Convert

* Reduce Tensorflow Min GPU Multiprocessor Count to 4

* remove compat.v1 code from extract

* Force TF to skip mixed precision compatibility check if GPUs have been filtered

* Add notes to config for non-working AMD losses

* Raise error if forcing extract to CPU mode

* Fix loading of legacy dfl-sae weights + dfl-sae typo fix

* Remove unused requirements
Update sphinx requirements
Fix broken rst file locations

* docs: lib.gui.display

* clipnorm amd condition check

* documentation - gui.display_analysis

* Documentation - gui.popup_configure

* Documentation - lib.logger

* Documentation - lib.model.initializers

* Documentation - lib.model.layers

* Documentation - lib.model.losses

* Documentation - lib.model.nn_blocks

* Documentation - lib.model.normalization

* Documentation - lib.model.session

* Documentation - lib.plaidml_stats

* Documentation: lib.training_data

* Documentation: lib.utils

* Documentation: plugins.train.model._base

* GUI Stats: prevent stats from using GPU

* Documentation - Original Model

* Documentation: plugins.model.trainer._base

* linting

* unit tests: initializers + losses

* unit tests: nn_blocks

* bugfix - Exclude gpu devices in train, not include

* Enable Exclude-Gpus in Extract

* Enable exclude gpus in tools

* Disallow multiple plugin types in a single model folder

* Automatically add exclude_gpus argument in for cpu backends

* Cpu backend fixes

* Relax optimizer test threshold

* Default Train settings - Set mask to Extended

* Update Extractor cli help text
Update to Python 3.8

* Fix FAN to run on CPU

* lib.plaidml_tools - typofix

* Linux installer - check for curl

* linux installer - typo fix
2020-08-12 10:36:41 +01:00
#!/usr/bin/env python3
""" PlaidML tools.

Statistics and setup for PlaidML on AMD devices.

This module must be kept separate from Keras, and be called prior to any Keras import, as the
PlaidML Keras backend is set from this module.
"""

import json
import logging
import os
import sys

import plaidml

_INIT = False
_LOGGER = None
_EXCLUDE_DEVICES = []


class PlaidMLStats():
    """ Handles the initialization of PlaidML and the returning of GPU information for connected
    cards from the PlaidML library.

    This class is initialized early in Faceswap's launch process from :func:`setup_plaidml`, with
    statistics made available from :class:`~lib.gpu_stats.GPUStats`.

    Parameters
    ----------
    log_level: str, optional
        The requested Faceswap log level. Also dictates the level that PlaidML logging is set at.
        Default: ``"INFO"``
    log: bool, optional
        Whether this class should output to the logger. If statistics are being accessed during a
        crash, then the logger may not be available, so this gives the option to turn logging off
        in those kinds of situations. Default: ``True``
    """
    def __init__(self, log_level="INFO", log=True):
        if not _INIT and log:
            # Logger held internally, as we don't want to log when obtaining system stats on crash
            global _LOGGER  # pylint:disable=global-statement
            _LOGGER = logging.getLogger(__name__)
            _LOGGER.debug("Initializing: %s: (log_level: %s, log: %s)",
                          self.__class__.__name__, log_level, log)
        self._initialize(log_level)
        self._ctx = plaidml.Context()
        self._supported_devices = self._get_supported_devices()
        self._devices = self._get_all_devices()
        self._device_details = [json.loads(device.details.decode())
                                for device in self._devices if device.details]
        if self._devices and not self.active_devices:
            self._load_active_devices()

        if _LOGGER:
            _LOGGER.debug("Initialized: %s", self.__class__.__name__)

    # PROPERTIES
    @property
    def devices(self):
        """ list: The :class:`plaidml._DeviceConfig` objects for GPUs that PlaidML has
        discovered. """
        return self._devices

    @property
    def active_devices(self):
        """ list: List of device indices for active GPU devices. """
        return [idx for idx, d_id in enumerate(self._ids)
                if d_id in plaidml.settings.device_ids and idx not in _EXCLUDE_DEVICES]

    @property
    def device_count(self):
        """ int: The total number of GPU devices discovered. """
        return len(self._devices)

    @property
    def drivers(self):
        """ list: The driver versions for each GPU device that PlaidML has discovered. """
        return [device.get("driverVersion", "No Driver Found")
                for device in self._device_details]

    @property
    def vram(self):
        """ list: The VRAM (in MiB) of each GPU device that PlaidML has discovered. """
        return [int(device.get("globalMemSize", 0)) / (1024 * 1024)
                for device in self._device_details]

    @property
    def names(self):
        """ list: The name of each GPU device that PlaidML has discovered. """
        return ["{} - {} ({})".format(
            device.get("vendor", "unknown"),
            device.get("name", "unknown"),
            "supported" if idx in self._supported_indices else "experimental")
                for idx, device in enumerate(self._device_details)]

    @property
    def _ids(self):
        """ list: The device identification for each GPU device that PlaidML has discovered. """
        return [device.id.decode() for device in self._devices]

    @property
    def _experimental_indices(self):
        """ list: The indices corresponding to :attr:`_ids` of GPU devices marked as
        "experimental". """
        retval = [idx for idx, device in enumerate(self._devices)
                  if device not in self._supported_devices]
        if _LOGGER:
            _LOGGER.debug(retval)
        return retval

    @property
    def _supported_indices(self):
        """ list: The indices corresponding to :attr:`_ids` of GPU devices marked as
        "supported". """
        retval = [idx for idx, device in enumerate(self._devices)
                  if device in self._supported_devices]
        if _LOGGER:
            _LOGGER.debug(retval)
        return retval

    # INITIALIZATION
    def _initialize(self, log_level):
        """ Initialize PlaidML.

        Set PlaidML to use Faceswap's logger, and set the logging level.

        Parameters
        ----------
        log_level: str
            The requested Faceswap log level. Also dictates the level that PlaidML logging is set
            at.
        """
        global _INIT  # pylint:disable=global-statement
        if _INIT:
            if _LOGGER:
                _LOGGER.debug("PlaidML already initialized")
            return
        if _LOGGER:
            _LOGGER.debug("Initializing PlaidML")
        self._set_plaidml_logger()
        self._set_verbosity(log_level)
        _INIT = True
        if _LOGGER:
            _LOGGER.debug("Initialized PlaidML")

    @classmethod
    def _set_plaidml_logger(cls):
        """ Set PlaidML's default logger to the Faceswap logger and prevent propagation. """
        if _LOGGER:
            _LOGGER.debug("Setting PlaidML Default Logger")
        plaidml.DEFAULT_LOG_HANDLER = logging.getLogger("plaidml_root")
        plaidml.DEFAULT_LOG_HANDLER.propagate = 0
        if _LOGGER:
            _LOGGER.debug("Set PlaidML Default Logger")

    @classmethod
    def _set_verbosity(cls, log_level):
        """ Set the PlaidML logging verbosity.

        Parameters
        ----------
        log_level: str or int
            The requested Faceswap log level. Also dictates the level that PlaidML logging is set
            at.
        """
        if _LOGGER:
            _LOGGER.debug("Setting PlaidML Loglevel: %s", log_level)
        if isinstance(log_level, int):
            numeric_level = log_level
        else:
            numeric_level = getattr(logging, log_level.upper(), None)
        if numeric_level < 10:
            # DEBUG Logging
            plaidml._internal_set_vlog(1)  # pylint:disable=protected-access
        elif numeric_level < 20:
            # INFO Logging
            plaidml._internal_set_vlog(0)  # pylint:disable=protected-access
        else:
            # WARNING Logging
            plaidml.quiet()

    def _get_supported_devices(self):
        """ Obtain GPU devices from PlaidML that are marked as "supported".

        Returns
        -------
        list
            The :class:`plaidml._DeviceConfig` objects for GPUs that PlaidML has discovered.
        """
        experimental_setting = plaidml.settings.experimental
        plaidml.settings.experimental = False
        devices = plaidml.devices(self._ctx, limit=100, return_all=True)[0]
        plaidml.settings.experimental = experimental_setting

        supported = [device for device in devices
                     if device.details
                     and json.loads(device.details.decode()).get("type", "cpu").lower() == "gpu"]
        if _LOGGER:
            _LOGGER.debug(supported)
        return supported

    def _get_all_devices(self):
        """ Obtain all available (experimental and supported) GPU devices from PlaidML.

        Returns
        -------
        list
            The :class:`plaidml._DeviceConfig` objects for GPUs that PlaidML has discovered.
        """
        experimental_setting = plaidml.settings.experimental
        plaidml.settings.experimental = True
        devices, _ = plaidml.devices(self._ctx, limit=100, return_all=True)
        plaidml.settings.experimental = experimental_setting

        experi = [device for device in devices
                  if device.details
                  and json.loads(device.details.decode()).get("type", "cpu").lower() == "gpu"]
        if _LOGGER:
            _LOGGER.debug("Experimental Devices: %s", experi)
        all_devices = experi + self._supported_devices
        if _LOGGER:
            _LOGGER.debug(all_devices)
        return all_devices

    def _load_active_devices(self):
        """ If the PlaidML user configuration settings exist, then set the default GPU from the
        settings file, otherwise set the GPU to be the one with the most VRAM. """
        if not os.path.exists(plaidml.settings.user_settings):  # pylint:disable=no-member
            if _LOGGER:
                _LOGGER.debug("Setting largest PlaidML device")
            self._set_largest_gpu()
        else:
            if _LOGGER:
                _LOGGER.debug("Setting PlaidML devices from user_settings")

    def _set_largest_gpu(self):
        """ Set the default GPU to be a supported device with the most available VRAM. If no
        supported device is available, then set the GPU to be the experimental device with the
        most VRAM available. """
        category = "supported" if self._supported_devices else "experimental"
        if _LOGGER:
            _LOGGER.debug("Obtaining largest %s device", category)
        indices = getattr(self, "_{}_indices".format(category))
        if not indices:
            _LOGGER.error("Failed to automatically detect your GPU.")
            _LOGGER.error("Please run `plaidml-setup` to set up your GPU.")
            sys.exit(1)

        max_vram = max([self.vram[idx] for idx in indices])
        if _LOGGER:
            _LOGGER.debug("Max VRAM: %s", max_vram)
        gpu_idx = min([idx for idx, vram in enumerate(self.vram)
                       if vram == max_vram and idx in indices])
        if _LOGGER:
            _LOGGER.debug("GPU IDX: %s", gpu_idx)

        selected_gpu = self._ids[gpu_idx]
        if _LOGGER:
            _LOGGER.info("Setting GPU to largest available %s device. If you want to override "
                         "this selection, run `plaidml-setup` from the command line.", category)

        plaidml.settings.experimental = category == "experimental"
        plaidml.settings.device_ids = [selected_gpu]


def setup_plaidml(log_level, exclude_devices):
    """ Setup PlaidML for AMD cards.

    Sets the Keras backend to PlaidML, loads the PlaidML backend and makes GPU device information
    from PlaidML available to :class:`~lib.gpu_stats.GPUStats`.

    Parameters
    ----------
    log_level: str
        Faceswap's log level. Used for setting the log level inside PlaidML
    exclude_devices: list
        A list of integers of device IDs that should not be used by Faceswap
    """
    logger = logging.getLogger(__name__)  # pylint:disable=invalid-name
    logger.info("Setting up for PlaidML")
    logger.verbose("Setting Keras Backend to PlaidML")
    # Add explicitly excluded devices to the list. The contents have already been checked in
    # GPUStats
    if exclude_devices:
        _EXCLUDE_DEVICES.extend(int(idx) for idx in exclude_devices)
    os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
    plaid = PlaidMLStats(log_level)
    logger.info("Using GPU(s): %s", [plaid.names[i] for i in plaid.active_devices])
    logger.info("Successfully set up for PlaidML")
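
The device selection performed by `_set_largest_gpu` above (pick the candidate category's maximum VRAM, then break ties toward the lowest device index) can be sketched in isolation. This is an illustrative, hypothetical re-implementation for clarity, not part of the module:

```python
def pick_largest_gpu(vram, indices):
    """Illustrative re-implementation of _set_largest_gpu's selection rule:
    among the candidate device ``indices``, find the maximum VRAM, then
    return the lowest device index holding that amount."""
    max_vram = max(vram[idx] for idx in indices)
    return min(idx for idx, amount in enumerate(vram)
               if amount == max_vram and idx in indices)

# Two devices tie at 8192 MiB; the lower index (1) wins.
print(pick_largest_gpu([4096, 8192, 8192], [0, 1, 2]))  # -> 1
```

Restricting `indices` is what lets the real method consider only "supported" (or only "experimental") devices while still indexing into the full `vram` list.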