A program's source code is inspected module by module. As imports of modules are encountered, other modules are added to the program. The inspection process for each module involves top-to-bottom traversal of the code using depth-first processing of the abstract syntax tree, with a stack of namespaces being employed to record definitions and names within namespaces are they are visited. Inspection continues until all modules in the program have been imported and inspected.
The inspection process is focused on two primary tasks:
The results of inspection are written out to cache files, one for each module in the program.
The inspector module performs much of the inspection work, relying on the common module for certain tasks, with the modules module providing the relevant module abstractions including those writing to and reading from cache files, and the resolving module completing each module's inspection. The importer module coordinates inspection across the whole program.
Names can be introduced to a namespace via several different mechanisms:
The origins of names are discovered by considering local, global and built-in namespaces. Where a name is local, it is defined in the same namespace. Where a name is global, it is defined in the same module at the module level. Where a name is built-in, it is defined in a module in the __builtins__ package and exposed via the __builtins__ namespace.
Initial tentative identification of names will sort names into two categories: locals and external names. Global variables employed in function (or method) namespaces may not be defined when a function body is inspected, either within a module or being imported from another module, and so it is not initially possible to more specifically determine the nature of a name.
These categories are later refined to distinguish between globals and built-in names as external names. The built-in origin of a name is only suggested when no locals or globals can provide the name, and the final identification of such names, plus other external names introduced as locals or globals via imports, will occur at the end of the inspection activity. Names that are unrecognised by then may be treated like unrecognised built-ins.
Static objects, forming the fundamental structure of the program, expose their names through the general structure hierarchy. Classes, which are defined as part of this structure, depend on well-defined names for any base classes they employ. For example:
class C(A): # depends on A being a name that can be resolved (and being a class) ... class D(A.B.C): # depends on A.B.C being resolvable (and being a class) ...
Base classes must be identifiable and unambiguous. Since base classes may be imported, their identification may not necessarily occur immediately, but it must be possible once all information about a program is known and when all such dependencies are resolved.
Attributes are introduced to namespaces as follows:
Attributes are only supported on program objects if they have been defined or bound as described above. Any attempt to access or set an attribute on an object using a name that was not determined through the above process is considered an invalid operation. Thus, augmenting the attributes available on an object (so-called "monkeypatching") is not possible.
When an attribute is accessed, the location of its use is recorded and ultimately associated with assignments of the name involved, just as is done for accesses of the plain name itself, but the attribute details are subsequently collected together for each assignment or version of the name. This is discussed below.
In order to support inheritance, a process of propagation makes class and instance attributes available to any given class from its ancestor classes according to a depth-first traversal of a class's base class hierarchy. Thus, for each class, given the definition of its base classes, a complete collection of class and instance attribute names can be determined. The actual mechanism of propagation occurs during the consolidation phase of inspection, principally because class bases are not generally immediately identifiable upon completing the inspection of any given class.
Each assignment of a name defines a version of the name within a namespace. The location of this definition consists of the following:
When a name is used, the location of its use is recorded and is ultimately associated with the assignments that may be providing the name at that location. This permits information about the type of the name to become available for each usage location. The location of each name usage consists of the following:
Name accesses are described as special cases of attribute accesses: where attributes would be indicated, none are specified.
Attribute accesses are operations involving attributes where those attributes are obtained or set in conjunction with an accessor expression. They are recorded in terms of location tuples, each describing an access as follows:
As with name accesses, the access instance number distinguishes between different accesses employing the same details.
Consider the following example:
def f(): p = ... # p version 0 p.a # p.a access 0 if fn().a: # anonymous a access 0 q = ... # q version 0 q.a # q.a access 0 p # p access 0 else: q = ... # q version 1 q.a # q.a access 1 q.b # q.b access 0 p # p access 1 |
Since the names involved may be provided by different name versions, accesses are counted independently. Meanwhile, a non-name or anonymous accessor may be involved in an access. Such anonymous accessors are independent and do not accumulate attribute usage because they potentially involve completely different objects.
Namespace | Name | Attribute | Access number |
module.f | p | a | 0 |
module.f | p | {} | 0 |
module.f | p | {} | 1 |
module.f | q | a | 0 |
module.f | q | a | 1 |
module.f | q | b | 0 |
module.f | {} | a | 0 |
Accessors may be recorded using a similar location scheme. For example:
Namespace | Name | Attribute | Name version |
module.f | p | {} | 0 |
module.f | q | {} | 0 |
module.f | q | {} | 1 |
Here, the attribute field is left empty and indicates that name definitions are being described. Although the data is superficially similar to name accesses, it should be remembered that accesses employ an access number whereas accessors employ a name version, with such identifiers being different things.
Within each scope, the names employed by the program and the attributes used with those names are tracked. As the path of execution diverges and converges again through control-flow constructs, the tracking attempts to maintain the attribute usage associated with each name, or specifically each version of each name.
y = ... # y version 0 while cond0: if cond1: y.a1 elif cond2: y = ... # y version 1 y.a2 else: y.a3 # y version 0 is used with a1 or a3 or neither # y version 1 is used with a2 |
The outcome of such tracking should be an indication of the attribute usage with each name based on the shortest routes that names can take through the control-flow structure. Such shortest-route usage defines the minimal selection of attributes that can be considered used with a name, and thus such usage defines the broadest selection of object types that can be identified as supporting such attributes. In the above example, the following minimal selections of attributes apply:
Name Version | Minimal Set of Attributes | Types |
y (version 0) | empty set | any object |
y (version 1) | a2 | only objects providing the single attribute |
The assumption is made that any attribute used with a name is always provided by the object referenced by that name and that the correct execution of the code does not rely on an attribute being absent (and thus raising an exception). By defining usage for each name, the toolchain can determine whether any type can provide the attributes used with a name, producing a compile-time error if no type supports such usage.
It is possible that certain routes taken by names might define attribute usage that cannot be supported by types that do support the shortest-route usage, yet it might not be appropriate to forbid usage of such types with such names: the program logic may intend that such types do not visit the regions of the code that employ the attributes that such types cannot support. However, as a consequence, run-time tests will be required to prevent attribute accesses that are inappropriate for such types from taking place. In the above example, the following maximal selections apply:
Name Version | Maximal Set of Attributes | Types |
y (version 0) | a1, a3 | only objects providing both attributes |
y (version 1) | a1, a2, a3 | only objects providing all three attributes |
Tracking occurs separately in function or method namespaces and at a level combining the static namespaces in a module. This latter combination of namespaces permits the flow of global name details through class namespaces. Tracking employs special tracking names with which usage is associated, with globals and class attributes employing complete object paths to describe their names, whereas locals merely employ the plain names defined and used in local namespaces. Some examples follow:
Namespace | Name | Name Scope | Tracking Name |
__main__ (module) | x | global | __main__.x |
__main__.C (class) | x | global | __main__.x |
__main__.C (class) | y | local (class attribute) | __main__.C.y |
__main__.f (function) | y | local | y |
Each name version may be associated with a value upon assignment provided by an expression known as an initialiser. Such values may be used to inform the interpretation of operations on names, restricting the types involved to the initialised values. Some examples are as follows:
x = 123 # initialiser is a constant, indicating a known type x = classname() # initialiser refers to a class, indicating instantiation
Names may also be assigned the values of other names, and such a name becomes an alias for such other names. Aliases can therefore be used to combine attribute usage observations across names. Other forms of aliases involve assigning attributes accessed via other names to any given name. Some examples are as follows:
y = x # y y.p # p can be assumed to apply to x z = x.q # z can be restricted to the types deduced for x.q
Initialising values are retained for later resolution because their identities are not generally known until certain name resolution activities have taken place.
During inspection only limited information is available about the nature of invocations. However, some information will already be available about global names and these may be defined by classes. Thus, invocations that cause the instantiation of classes may be determined even during the inspection phase.
Information about class instantiation is most useful in combination with the initialisation of names. When an assignment occurs, any initialising expression that provides enough information can be evaluated to see if it describes instantiation. If so, the nature of the instantiation can be associated with the name and used in the deduction process to constrain any usage of that name and its attributes. Such restrictions on the types associated with names are applied in the deduction phase.
Literal values or literals are specific values that appear in the program, typically representing numbers, character strings, mappings or collections. Literal numbers and strings are effectively constants, meaning that their values are unambiguously defined and can eventually be referenced using a unique identifier that applies throughout the program and which can refer to an initialised object in any generated program. Initially, they are recorded for the namespace in which they appear. For example:
Namespace | Identifier | Type | Encoding | Value |
__main__ | __main__.$c0 | __builtins__.str.string | iso-8859-15 | '\xc6\xd8\xc5' |
__main__ | __main__.$c1 | __builtins__.int.int | 123 |
Since values may appear again in other namespaces, a mapping is generated from such local identifiers to common global identifiers for constants. Where such an identifier is derived from the value content, this can potentially occur immediately, although in practice (and in general) such a mapping will be generated after all modules have been inspected.
Collections and mappings may also be initialised using literal syntax but may contain non-constant information such as names or expressions whose values are unknown before the program is run. Such values can be represented as instantiation operations of the appropriate type in any generated program, and the instances of the type concerned can be associated with any names to which such literals are assigned. Special names are used to refer to literal types for collections and mappings. For example:
Identifier | Type |
$Ldict | __builtins__.dict.dict |
$Llist | __builtins__.list.list |
$Ltuple | __builtins__.tuple.tuple |
Such special names merely serve as convenient placeholders for the types of any literals mentioned in a module.
As briefly discussed above, certain activities occur after information has been collected from an input program. Within each module, names that are external to namespaces are recorded in a name reference collection. These references identify unrecognised names but do not generally define their origins. Examples of name references are as follows:
Name | Identity |
__main__.repr | <deferred>:__builtins__.repr |
__main__.sys | <module>:sys |
In order to obtain the final identities, deferred references may need to be resolved, yielding concrete references:
Name | Identity |
__main__.repr | <function>:__builtins__.identity.repr |
__main__.sys | <module>:sys |
This process of resolution cannot be performed immediately with only the knowledge available in a single module. Only with all program modules loaded can such resolution occur.
After all modules are loaded, and with all objects present in the program at this point, each deferred reference defined within a module should ultimately yield an identity. Any failure to do so indicates usage of a name not defined in the program. The process of identifying them by resolving what each of them points to, is illustrated by the following example:
Reference | Object Path to Search |
<depends>:__builtins__.repr | __builtins__.repr |
<depends>:__builtins__.identity.repr | __builtins__.identity.repr |
<function>:__builtins__.identity.repr | identification has occurred |
Initially, the origin information from the reference is used to find the details of the referenced object. However, in this case, the details yield another deferred reference that has not yet been resolved. Consequently, the origin information must be used to find the details of the object referenced in this case, finally yielding a reference with concrete object information. Deferred references are mutated to concrete references, thus changing the nature of such references throughout the accumulated data.
During deferred reference resolution, relationships are catalogued between modules in which deferred references are found and those providing the corresponding referenced objects. In addition, module ordering dependencies are established on the basis that some kinds of objects must be initialised before being used in other parts of the program. As a result, the modules providing such objects must themselves be initialised before the modules referencing such objects. More information on this topic can be found in the import sequencing documentation.
With all dependencies resolved, further work can be done with the details within each module. The resolving module provides the functionality to each module instance to perform such work.
Thus, attribute details for each module and its classes are finalised. Meanwhile, modules that are still hidden are removed from the program.
External names will not have been resolved during the inspection process, but with information about the whole program, it becomes possible to identify names and to determine whether attribute accesses involving those names can also be identified. Thus, name references from each namespace are investigated in turn as follows.
Each name referencing a static object is considered a constant access of that object. However, given a number of accompanying attributes describing an access, each attribute is added in turn, and the resulting object path identified. If the path identifies a constant object, the next attribute in the access chain is added and the resulting object path identified, and so on. The maximal collection of attributes identifying a constant object is then recorded as a constant access, with any remaining attributes in the access chain retained for traversal in a later phase.
Names yielding constant accesses do not need to have their identity deduced through the application of attribute usage.
During inspection, initialisers will have been recorded in terms of expression results such as names, invocations, readily-identifiable instances, and attribute accesses. With details of the whole program plus constant accesses having been made available, such results may be definitively identified and associated with initialised names.
Alias information may also be refined at this point by identifying attribute accesses that are used to initialise names.
With class contents and relationships identified, class attribute inheritance and instance attribute availability can be defined. Some classes may depend on external state for their initialisation, and so additional module dependencies may be defined at this stage. The importer module contains the functionality to conduct these whole-program activities.
Once inspection is complete, the available knowledge concerns the whole program and not merely individual modules or parts of the program. However, only limited efforts have been made until this point, notably in the acquisition of statically-defined object details when referencing names or attributes, to integrate the different modules. The deduction process is concerned with such integration.