Design¶
This page provides design details for the cfgclasses module.
1. Problem Definition¶
- This project -
cfgclasses- aims to solve several issues around the development of command line interfaces (CLIs) in Python using argparse. These issues are: The output from argparse is a typeless namespace allowing
mypyto miss typing bugsDefining a tools CLI using argparse’s
add_argument()method involves repetetive boilerplate code
The impact of the type safety issue can be minimised by carefully structuring a tools entry points, however for smaller tools this is often frustrating and distracts from the core functionality of the tool. cfgclasses provides a 100% type safe interface for defining a tools CLI, and allows the programmer to focus on the structure of the configuration rather than the structure of the CLI.
The boilerplate issue is more of a problem for larger tools, but can be a significant issue for smaller tools as well. Often the structure of this boilerplate code leads to focus on the tools interface, and less on what the configuration structure of the tool is. This can lead to maintainability issues where new options are added without consideration for the impact on the overall configuration structure, resulting in invalid configurations being possible but not being caught. This is a natural outcome of the structure of argparse: the programmer defines the CLI for the tool, and then the namespace is generated from this definition. cfgclasses reverses this focus: the configuration structure is defined first, and the CLI is generated from this.
Adding to the boilerplate issue is the fact there are common patterns that appear in the usage of argparse’s add_argument() method, yet no central utility methods exist to encapsulate these patterns. In larger projects, utilities can be specified within the project itself for common argument patterns (e.g. file paths, boolean flags, etc), however for smaller projects this is not practical. cfgclasses aims to provide centralised utilities for common argument patterns.
Similarly, groups of tools which accept similar groups of arguments often duplicate code as building an argparse.ArgumentParser does not lead to natural code reuse. cfgclasses uses nested dataclass definitions (encapsulation), aiming to make code reuse easier and more natural. This same encapsulation also allows for logical grouping of configuration items and methods associated with these items, which can be useful for tools with large numbers of options.
1.1 Inspiration - dataclasses¶
The dataclasses module was introduced in Python 3.7, and provides a decorator that allows the definition of classes with minimal boilerplate. The dataclasses module is not a replacement for classes, but rather a tool to reduce the boilerplate required to define classes whose primary focus is to define a collection of stored data.
cfgclasses aims to build upon the dataclasses module to provide a similar interface that allows the definition of classes that represent the configuration of a tool. In the same way that dataclasses for not replace regular classes, cfgclasses does not aim to replace argparse, but rather to provide a more configuration-structure-first focussed interface for programmers building CLI tools.
1.2 Summary¶
- In summary,
cfgclassesaims to provide the following: A 100% type safe interface for defining a tools CLI
A configuration-structure-first focussed interface for defining a tools CLI
Centralised utilities for common argument patterns
cfgclasses does not aim to replace argparse, but to build upon it with the use of the dataclasses module.
Note
The cfgclasses module is not a replacement for argparse and there are features of argparse which are not currently, and may never be, supported.
2. Design Overview¶
This sections describes the high-level flow of the cfgclasses module. The subsections then go on to describe each step in the flow in greater detail including descriptions of the key classes and methods involved.
2.1 High-level flow¶
- The high-level design of
cfgclassesis as follows: A programmer defines their tools configuration in a class using the
dataclasses.dataclassdecorator.The programmer calls the
cfgclasses.parse_args()method on this class with the CLI arguments.The
parse_args()method performs the following internal flow:Creates a
Specificationinstance from the class definition.Builds an
argparse.ArgumentParserinstance from theSpecificationinstance.Parses the CLI arguments using this
argparse.ArgumentParser.Reads from the
argparse.Namespaceoutput to create an instance of the programmersdataclass.
Note that step (3) is entirely handled by the argparse module. Steps (1), (2) and (4) are each discussed further in the following subsections.
- The
Specificationclass is a representation of thedataclassdefinition, where each item of config is expressed by aSpecificationItemwhich has attributes mapping directly to the arguments accepted byargparse.ArgumentParser.add_argument(). Creating this specification of the arguments separately from the creation and population of theSpecificationbenefits the code flow in a number of ways: keeps the most complex logic away from the interaction with
argparse, simplifying the code and the testingputs this complex logic in pure functions, further improving the testability
simplifies the code path for nested configuration definitions as no branching is required
In addition to this parse_args() definition, the public API also includes ConfigOpts and NonPositionalConfigOpts classes (where the latter is a subclass of the former) which can be used to customize the options for a field in the dataclass definition. These are stored in the dataclasses.field() metadata for the field.
Furthermore, a collection of helper functions for common patterns are provided, as well as an arg() function which creates the dataclasses field and inserts a ConfigOpts instance into the metadata in one convenient call. As such, it is expected that a programmer will not need to directly interact with the ConfigOpts classes, but rather will use the arg() function and the helper functions. Only when a programmer wants finer control over the dataclasses field definition will they need to manually create a ConfigOpts instance.
2.2. Building the specification¶
The Specification is built from the dataclass definition by using the dataclasses module to inspect the class definition and producing a SpecificationItem for each field in the class, or another Specification instance for each nested dataclass. @@@ Is this nesting a problem?
- The
SpecificationItemis an abstract base class consisting of: a name for the field
a type for the field
an optional help string for the field
an optional metavar name for the field
an optional list of valid choices for the field
an optional default value for the field
- It contains the the following methods:
get_optnames()- returns a list of option namesget_kwargs()- returns a dictionary of keyword argumentsadd_to_parser()- adds an argument to the given parser using theget_optnames()andget_kwargs()methods
- There are then several subclasses of
SpecificationItemwhich are used to represent the different types of fields that can be defined in adataclasssupported bycfgclasses. These subclasses are: StandardSpecItem- a standard CLI argument (non-positional)OptionalSpecItem- an optional argumentBoolSpecItem- a boolean flag such as--debugListSpecItem- an argument accepting a list of values
PositionalSpecItem- a positional CLI argumentListPositionalSpecItem- a positional argument that accepts a list
These classifications of arguments are based on the sets of options to argparse that are required to be set to achieve the appropriate typing and related behavior. See the following section for the details of how these options are set.
2.2.1 Programmer customization¶
The programmer can specify the type of the field in the dataclass definition, and can use the default and default_factory options to control the default value given to argparse. However, to customise any other attributes for an entry, or to set an argument as positional, the dataclasses.field() metadata is used.
The dataclasses metadata is an untyped dictionary stored on fields, that allows us to store any information we like on a field as well as the built-in values like defaults. cfgclasses uses a particular entry in this metadata cfgclasses.CFG_METADATA_FIELD to store a ConfigOpts or NonPositionalConfigOpts instance.
- These classes allow the following customization:
help- the help string for the argumentmetavar- the ‘metavar’ name to display for the argumentchoices- the list of valid choices for the argument
- Additionally, the
NonPositionalConfigOptsclass allows the user to specify: optnames- the list of option names for the argument (e.g.["-d", "--debug"])
If there is no entry in a fields metadata for cfgclasses.CFG_METADATA_FIELD then the default value of NonPositionalConfigOpts is used. If an instance of ConfigOpts is found, then the field is treated as a positional argument.
2.2.2 Programmer argument helper functions¶
Interacting with dataclasses and creating instances of ConfigOpts is more verbose than necessary in many cases, so cfgclasses provides several utilities to improve this.
The arg() function has the following interface:
def arg(
help: str,
*optnames: str,
positional: bool = False,
metavar: Optional[str] = None,
choices: Optional[list[Any]] = None,
default: Any = dataclasses.MISSING,
default_factory: Optional[Callable[[], Any]] = None,
) -> dataclasses.Field[Any]:
These arguments are mapped to the equivalent in either dataclasses.field() for default and default_factory, or to the ConfigOpts constructor for the other arguments. The positional argument is used to determine whether to create a ConfigOpts or NonPositionalConfigOpts instance, and is positional is provided then optnames cannot be specified.
2.3 Populating the ArgumentParser¶
From the Specification, each member SpecificationItem is added to the ArgumentParser using add_argument() and each nested Specification is added with add_argument_group(), recursively adding its own members and subspecs to this new group.
- The
ArgOptsinstance for eachSpecificationItemcontains information on the arguments to be passed toargparse. There are some common options which can be set for all arguments, and there are options which are used to control the typing behavior ofargparseto match the expected output types. The common options are: type- the type of the argument that argparse should parse the argument ashelp- the help string for the argumentmetavar- the ‘metavar’ name to display for the argumentchoices- the list of valid choices for the argumentdefault- the default value for the argument
Other than type all of these are optional and are set by the programmer when defining their dataclass fields.
The additional option dest is also commonly used with add_argument to set the name of the attribute in the argparse.Namespace that the value of the argument should be stored in. This value matches the name of she SpecificationItem.
- The typing considerations are:
For lists:
nargsis set to"+"[1]typeis the inner type of the list (e.g.list[int] -> int)requiredis set toTrueifdefaultis not given
For optional types:
typeis the inner type of the optional (e.g.Optional[int] -> int)requiredis set toFalse
For booleans
action- the action to be taken when the argument is parsedtype- is not givenrequiredis not given
actionis set tostore_trueunlessdefaultisTruein which case the value used isstore_false[2]For all other types
typeis the type of the argumentrequiredis set toTrueifdefaultis not given
2.3.1 Positional options¶
cfgclasses allows the programmer to specify that any argument is positional, allowing the cli to appear as:
$ tool.py positional_arg
instead of:
$ tool.py --positional_arg positional_arg
However, this is only possible for regular and list types, not for Optional types or booleans. This is because there is no way for a user to specify the None value to a positional argument for Optional and argparse does not correctly parse True or False strings into booleans. Furthermore, a positional boolean argument is not very useful as a boolean flag is more descriptive with --, e.g. tool.py --debug rather than the more mysterious tool.py True.
Finally, required cannot be specified for any positional argument created for argparse.
2.4 Creating the dataclass instance from the argparse.Namespace¶
Once the argparse.Namespace has been produced it will contain attributes for all the SpecificationItem instances in the Specification and its nested subspecs. The attribute names will have been set using the dest option to add_argument() to match the name attribute of the SpecificationItem. The values of these attributes will have appropriate types such that they can be directly assigned to the corresponding field in the dataclass definition.
As such, the dataclass can be instantiated by iterating over the SpecificationItem instances and assigning the value of the corresponding attribute in the argparse.Namespace to the field in the dataclass. For nested configuration, we simply need to perform this operation depth-first so that we build the nested configuration instances and assign them to the correct fields in the parent configuration instance.
2.5 Summary¶
- The
parse_args()method of thecfgclassesmodule performs the following steps: Creates a
Specificationinstance from thedataclassdefinitionBuilds an
argparse.ArgumentParserinstance from theSpecificationinstanceParses the CLI arguments using this
argparse.ArgumentParserReads from the
argparse.Namespaceoutput to create an instance of the programmersdataclass
The bulk of the complexity is in the creation of the SpecificationItem instances as there is a combination of logic based on the type of the fields and the users options for the field to be processed.
Additional Feature Designs¶
There are additional features of the module that have their own design pages as listed below. These pages are expansions on this original design that build upon the core principles and design aspects covered here.