Transforms
1. Problem Definition
A tool's configuration is sometimes constrained by the requirement that it must be buildable from CLI arguments.
For example, a tool may want to be configured with a set of file paths, but the CLI only supports building list types.
Therefore, there is often a desire to transform a configuration option from one value to another ahead of running the primary flow of the tool.
A common example of a transformation is reading the contents of a file.
Many tools accept a file path as a CLI argument. The file is immediately opened, its content read, and the file closed and never used again.
This is a common pattern and can be abstracted into a reusable transformation: file_content(path: str) -> str.
In more complex cases, a particular file format may be expected and may have its own transformation. For example, JSON files can use json_content(path: str) -> dict[str, Any]. Similar transformations can be defined for other file formats, such as CSV, YAML, and pickle.
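Below is a minimal sketch of the two transformations named above, using only the standard library. The names and signatures come from the text, but the function bodies are assumptions. One further choice is made here: rejecting JSON documents that are not objects, so that the result matches the dict[str, Any] return type.

```python
import json
from pathlib import Path
from typing import Any


def file_content(path: str) -> str:
    """Read the file named on the CLI and return its text (the file is closed on return)."""
    return Path(path).read_text()


def json_content(path: str) -> dict[str, Any]:
    """Read a JSON file named on the CLI and return its top-level object."""
    with open(path) as f:
        data = json.load(f)
    # A JSON document may also be a list or a scalar; this sketch only accepts objects.
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object, got {type(data).__name__}")
    return data
```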
A note on defaults, choices, etc.
These transformations operate only on the value read from the CLI. When an instance of the dataclass is constructed through other mechanisms (such as the constructor generated by dataclasses), the transformation is not applied.
Options that are checked against the CLI argument's value (e.g. choices) must therefore be typed to match the transformation's input type, not the field type.
For example, a transformation that accepts an int and returns a str requires the choices to be a list of int values, not their string representations.
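Plain argparse shows why the ordering matters: it converts a value with type= first and only then checks it against choices. In this sketch, level_name and LEVEL_NAMES are hypothetical. They stand in for a transformation that accepts an int and returns a str, so the choices have to be int values.

```python
import argparse

# Hypothetical transformation: the CLI accepts an int, the config field holds a str.
LEVEL_NAMES = {1: "low", 2: "medium", 3: "high"}


def level_name(level: int) -> str:
    return LEVEL_NAMES[level]


parser = argparse.ArgumentParser()
# argparse applies type=int before checking membership in choices,
# so choices must use the transformation's input type (int), not str.
parser.add_argument("--level", type=int, choices=[1, 2, 3], default=2)

namespace = parser.parse_args(["--level", "3"])
level = level_name(namespace.level)  # the transformation runs after parsing
```

Listing the choices as ["1", "2", "3"] would reject every input, because each parsed value is already an int by the time the membership check runs.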
For default and default_factory values this is further complicated by the fact that these values are stored directly in the dataclasses field.
To keep the typing correct, the default value must be transformed before it is assigned to the field.
Similarly, the default_factory must be stored in the dataclasses field as a function that returns the transformed value, i.e. default_factory = lambda: transform(factory()).
Unfortunately, this means that cfgclasses must store the raw default value separately, since the untransformed value is what must be passed to argparse.
Note, however, that a default_factory does not need to be stored, because argparse does not support one. If default_factory is used, it can be called and its result stored as the raw default instead.
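The sketch below shows the default handling described above, using plain dataclasses and argparse rather than cfgclasses. transformed_field, seconds, raw_default and from_cli are hypothetical helpers. The field stores the transformed default, or wraps the factory as lambda: transform(factory()). The raw default is kept in the field metadata for argparse, and a factory is called once to produce it.

```python
import argparse
from dataclasses import dataclass, field, fields
from datetime import timedelta
from typing import Any, Callable, Optional

_NO_DEFAULT: Any = object()  # sentinel: no plain default supplied


def transformed_field(
    transform: Callable[[Any], Any],
    *,
    default: Any = _NO_DEFAULT,
    default_factory: Optional[Callable[[], Any]] = None,
) -> Any:
    """Hypothetical helper, not the cfgclasses API.

    The dataclass field holds the transformed default; the raw (CLI-typed)
    default is kept in the field metadata so it can be handed to argparse.
    """
    if default_factory is not None:
        factory = default_factory
        # argparse has no default_factory: call the factory once for the raw default.
        return field(
            default_factory=lambda: transform(factory()),
            metadata={"raw_default": factory(), "transform": transform},
        )
    if default is _NO_DEFAULT:
        raise TypeError("transformed_field needs default or default_factory")
    return field(
        default=transform(default),
        metadata={"raw_default": default, "transform": transform},
    )


def seconds(value: int) -> timedelta:
    return timedelta(seconds=value)


@dataclass
class Config:
    timeout: timedelta = transformed_field(seconds, default=30)
    tags: frozenset[str] = transformed_field(frozenset, default_factory=lambda: ["a", "b"])


def raw_default(cls: type, name: str) -> Any:
    return next(f.metadata["raw_default"] for f in fields(cls) if f.name == name)


parser = argparse.ArgumentParser()
parser.add_argument("--timeout", type=int, default=raw_default(Config, "timeout"))
parser.add_argument("--tags", nargs="*", default=raw_default(Config, "tags"))


def from_cli(argv: list[str]) -> Config:
    namespace = parser.parse_args(argv)
    # Apply each field's transform to the raw value produced by argparse.
    return Config(
        **{f.name: f.metadata["transform"](getattr(namespace, f.name)) for f in fields(Config)}
    )
```

Both paths apply the same transform, so Config() and a CLI run with no arguments produce equal instances.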
2. Solution
cfgclasses allows the specification of a transformation function for a particular configuration option.
The transformation function accepts a single argument of type T and returns a value of type U.
The configuration option must also be of type U, and T must be a type supported by cfgclasses arguments (simple types like str or int, lists of simple types, or Optional of simple types).
Ideally a transformation function is strongly typed and cfgclasses and mypy would be capable of inferring the types T and U.
However, functions can be permissive in their inputs (e.g. accepting Iterable[str]) where cfgclasses requires a stricter type (list[str]).
Furthermore, lambdas and other untyped functions used as transforms carry no type annotations, so the input type cannot reliably be inferred by inspecting the transform function.
Therefore, a type for the input to the transformation must also be specified.
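Both inspection problems can be reproduced with the standard library alone. join_lines is a hypothetical transform with a permissive type. The lambda has no annotations at all.

```python
import inspect
from typing import Iterable, get_type_hints


def join_lines(lines: Iterable[str]) -> str:
    """Typed, but permissive: accepts any Iterable[str], not only list[str]."""
    return "\n".join(lines)


# Inspecting a typed function yields its declared input type, Iterable[str],
# which is not a type cfgclasses could build from CLI arguments (list[str]).
typed_hints = get_type_hints(join_lines)

# A lambda carries no annotations, so inspection finds no input type at all.
lambda_param = inspect.signature(lambda value: value.strip()).parameters["value"]
lambda_has_annotation = lambda_param.annotation is not inspect.Parameter.empty
```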
Class transformations
In addition to the above, it is also sometimes desirable to transform a group of configuration options into a single instance of another type.
For example, when specifying the logging level, it is common to have a --debug option and a mutually exclusive --quiet option.
These options can then be used to set the logging level to DEBUG or ERROR respectively, with the default being INFO.
Without the use of transforms, the config class definition would look like so:
```python
import logging
from dataclasses import dataclass

# arg() and mutually_exclusive are provided by cfgclasses.


@mutually_exclusive
@dataclass
class LoggingConfig:
    debug: bool = arg("Enable debug logging")
    quiet: bool = arg("Disable info logging")

    def log_level(self) -> int:
        if self.debug:
            return logging.DEBUG
        elif self.quiet:
            return logging.ERROR
        else:
            return logging.INFO


@dataclass
class Config:
    log_level: LoggingConfig
```
While this pattern is sufficient, it has one major drawback: when manually constructing an instance of Config, e.g. during testing, an invalid combination of debug and quiet can be used.
```python
config = Config(LoggingConfig(debug=True, quiet=True))
```
In this situation, the resulting log level depends on which of debug or quiet is checked first in the log_level() method (here, debug wins).
With the use of a class transform, this problem can be avoided.
```python
import logging
from dataclasses import dataclass

# arg(), cfgtransform() and mutually_exclusive are provided by cfgclasses.


@mutually_exclusive
@dataclass
class LoggingConfig:
    debug: bool = arg("Enable debug logging")
    quiet: bool = arg("Disable info logging")

    def log_level(self) -> int:
        if self.debug:
            return logging.DEBUG
        elif self.quiet:
            return logging.ERROR
        else:
            return logging.INFO


@dataclass
class Config:
    log_level: int = cfgtransform(LoggingConfig, LoggingConfig.log_level)
```
In this case, the log_level configuration option is transformed from a LoggingConfig instance to an int using the LoggingConfig.log_level method defined above.
While it is still possible to construct invalid instances of LoggingConfig, it is no longer possible to construct an invalid instance of Config itself.
Another pattern where these transforms are useful is constructing more complex classes from simple dataclass definitions. Often a class definition is outside the programmer's control (e.g. part of a third-party library), or the required functionality means a dataclass cannot be used. In these cases, the programmer can define a simple dataclass with options sufficient to build the more complex class.
```python
from dataclasses import dataclass

# arg() and cfgtransform() are provided by cfgclasses.


class NotADataClass:
    def __init__(self, a: int, b: str):
        self.a = a
        self.b = b


@dataclass
class DataClassConfig:
    a: int = arg("An int")
    b: str = arg("A string")

    def to_not_a_dataclass(self) -> NotADataClass:
        return NotADataClass(self.a, self.b)


@dataclass
class Config:
    not_a_dataclass: NotADataClass = cfgtransform(
        DataClassConfig,
        DataClassConfig.to_not_a_dataclass,
    )
```
3. Design amendments
The following changes are made to the design to support transformations:
- The arg() and optional() functions are updated to accept the transformation function as an optional argument, together with the transformation type. Their typing is updated to reflect the use of the transformation.
- The ConfigOpts class is updated to contain transformation function and transformation type members, and is made Generic over the appropriate types, so that type safety is maintained when using transformations.
- ConfigOpts is also used to store the default value, as this is now distinct from the value stored in the dataclasses field.
- When building the Specification for a dataclass, the transformation function and type are extracted from ConfigOpts and stored in the Specification. If not specified, the transformation function is set to the identity function and the transformation type is set to the type of the configuration option.
- To build the dataclass from the CLI arguments, the transformation function is invoked with the value from the argparse.Namespace, instead of that value being assigned directly.
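The sketch below is a condensed version of the changes to ConfigOpts, the Specification and dataclass construction listed above. It assumes two hypothetical details that are not taken from the cfgclasses source: the ConfigOpts members shown, and option metadata stored under a "cfgclasses" key in the field metadata. spec_for, build_parser and build are hypothetical stand-ins for building the Specification, the argparse parser, and the dataclass.

```python
import argparse
from dataclasses import dataclass, field, fields
from typing import Any, Callable, Generic, Optional, TypeVar

T = TypeVar("T")  # type produced by argparse (the transformation input)
U = TypeVar("U")  # type stored in the dataclass field (the transformation output)


@dataclass(frozen=True)
class ConfigOpts(Generic[T, U]):
    """Sketch only: member names are assumptions, not the cfgclasses source."""

    transform: Callable[[T], U]
    transform_type: type[T]
    raw_default: Optional[T] = None  # handed to argparse untransformed


def spec_for(cls: type) -> dict[str, ConfigOpts[Any, Any]]:
    """Collect one ConfigOpts per field, as the Specification would."""
    spec: dict[str, ConfigOpts[Any, Any]] = {}
    for f in fields(cls):
        opts = f.metadata.get("cfgclasses")
        if opts is None:
            # No transformation given: identity function and the field's own type.
            opts = ConfigOpts(transform=lambda value: value, transform_type=f.type)
        spec[f.name] = opts
    return spec


def build_parser(spec: dict[str, ConfigOpts[Any, Any]]) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    for name, opts in spec.items():
        if opts.transform_type is list:
            parser.add_argument(f"--{name}", nargs="*", default=opts.raw_default)
        else:
            parser.add_argument(f"--{name}", type=opts.transform_type, default=opts.raw_default)
    return parser


def build(cls: type, namespace: argparse.Namespace) -> Any:
    # Invoke each transformation on the Namespace value instead of assigning it directly.
    return cls(**{name: opts.transform(getattr(namespace, name)) for name, opts in spec_for(cls).items()})


@dataclass
class Config:
    count: int
    # The CLI builds a list[str]; the field holds a frozenset[str].
    paths: frozenset[str] = field(
        metadata={"cfgclasses": ConfigOpts(transform=frozenset, transform_type=list, raw_default=[])}
    )
```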
For the class transformation, the following changes are made:
- A new cfgtransform() function is added, which takes a type and a transform function.
- A new ConfigClassTransform type is defined to contain the type and transform function for classes.
- The Specification is updated to check for a ConfigClassTransform in the dataclass metadata and to store the transform information.
- When building the dataclass from the CLI arguments, the transforms of each of its subspecs are invoked to apply the transformations.
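The class transformation can be sketched in a similar way, again with plain dataclasses and argparse. Here cfgtransform is a hypothetical stand-in that records a ConfigClassTransform in the field metadata, and build is a hypothetical recursive builder. Mutual exclusion of --debug and --quiet is enforced with argparse's add_mutually_exclusive_group() instead of cfgclasses' mutually_exclusive. For brevity, sub-config options share a single flat namespace.

```python
import argparse
import logging
from dataclasses import dataclass, field, fields
from typing import Any, Callable, Generic, TypeVar

C = TypeVar("C")  # the dataclass parsed from the CLI
U = TypeVar("U")  # the type stored in the parent config field


@dataclass(frozen=True)
class ConfigClassTransform(Generic[C, U]):
    """Sketch: holds the sub-config class and the function that converts it."""

    cls: type[C]
    transform: Callable[[C], U]


def cfgtransform(cls: type[C], transform: Callable[[C], U]) -> Any:
    """Hypothetical stand-in: record the class transform in the field metadata."""
    return field(metadata={"class_transform": ConfigClassTransform(cls, transform)})


@dataclass
class LoggingConfig:
    debug: bool = False
    quiet: bool = False

    def log_level(self) -> int:
        if self.debug:
            return logging.DEBUG
        elif self.quiet:
            return logging.ERROR
        return logging.INFO


@dataclass
class Config:
    log_level: int = cfgtransform(LoggingConfig, LoggingConfig.log_level)


def build(cls: type, namespace: argparse.Namespace) -> Any:
    kwargs = {}
    for f in fields(cls):
        class_transform = f.metadata.get("class_transform")
        if class_transform is None:
            kwargs[f.name] = getattr(namespace, f.name)
        else:
            # Build the sub-config first, then convert it with the stored transform.
            sub = build(class_transform.cls, namespace)
            kwargs[f.name] = class_transform.transform(sub)
    return cls(**kwargs)


parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
group.add_argument("--debug", action="store_true")
group.add_argument("--quiet", action="store_true")
```

Config only ever holds an int, so an invalid debug/quiet combination can no longer reach it through its constructor.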
4. Implementation and testing
Type safety is a key point to maintain with this change. It could easily be lost during implementation and testing through excessive use of the Any type.
Additional tests are to be implemented to ensure that type safety is maintained. These include test cases that mypy is expected to reject because the transform function and transform type are incompatible. This is an unusual testing pattern, but it is required to verify that typing bugs are not introduced by this change or by any future additions.
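One possible way to write such negative tests is to run mypy on small snippets through its programmatic API, mypy.api.run, and check the exit status. In the sketch below, Transform is a hypothetical, simplified stand-in for a ConfigOpts that is generic over its input and output types. Pairing a str-accepting transform with an int transform type should be rejected. The checks only run when mypy is installed.

```python
import importlib.util
import textwrap

# Hypothetical stand-in for a ConfigOpts generic over input (T) and output (U) types.
PRELUDE = textwrap.dedent(
    """
    from dataclasses import dataclass
    from typing import Callable, Generic, TypeVar

    T = TypeVar("T")
    U = TypeVar("U")

    @dataclass
    class Transform(Generic[T, U]):
        func: Callable[[T], U]
        input_type: type[T]

    def upper(value: str) -> str:
        return value.upper()
    """
)

MYPY_AVAILABLE = importlib.util.find_spec("mypy") is not None


def mypy_accepts(snippet: str) -> bool:
    """Type-check PRELUDE + snippet with mypy; True if mypy reports no errors."""
    from mypy import api  # imported lazily so this module loads without mypy

    _stdout, _stderr, exit_status = api.run(["-c", PRELUDE + snippet])
    return exit_status == 0
```

Pairing each negative case with a matching positive case confirms that a failure comes from the intended incompatibility rather than from a broken prelude.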