The goal of the Semantic UAST is to provide a set of UAST node types with a strictly defined semantic meaning that does not depend on the programming language.
Semantic UAST types are defined in the Babelfish SDK on top of the schema-less representation.
The @type field in Object nodes determines the exact type in the Semantic UAST type system. Besides the Semantic UAST types, drivers may emit language-dependent node types that are not yet covered by Semantic UAST concepts.
UAST node types can have a namespace similar to XML namespaces.
For example, the Java AST defines an Identifier node type, the Go AST defines a similar type called Ident, and the Semantic UAST has its own concept called Identifier. To distinguish between these node types, a lang: prefix (the language name, such as java: or go:) is added to each language-specific type, and the uast: prefix is added to types defined by the Semantic UAST. The prefix without the : is called a namespace.
With namespaces added, the types listed above are written as java:Identifier, go:Ident and uast:Identifier.
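The namespace convention can be illustrated with a small helper. This is only a sketch, not part of the SDK: the function names `split_type` and `is_semantic` are hypothetical, and it assumes a type string contains at most one meaningful `:` separator, with the namespace to its left.

```python
from typing import Tuple


def split_type(node_type: str) -> Tuple[str, str]:
    """Split a type such as 'go:Ident' into (namespace, name)."""
    namespace, sep, name = node_type.partition(":")
    if not sep:
        # No prefix present: report an empty namespace.
        return "", node_type
    return namespace, name


def is_semantic(node_type: str) -> bool:
    """True when the type lives in the 'uast' namespace."""
    return split_type(node_type)[0] == "uast"
```

With this helper, `split_type("java:Identifier")` yields `("java", "Identifier")`, while `is_semantic("go:Ident")` is false because the namespace is `go`.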
As described in the schema-less representation spec, object fields starting with @ are considered internal and may be present on any object regardless of the type (schema).
This UAST specification defines a few more special fields:
@token - a text representation of this node in the source file. This field is only available for compatibility reasons. If available, @pos should be used to get the source code corresponding to the UAST node.
@role - stores an array of role codes. This field can be used to interpret native AST types that are not yet covered by the Semantic UAST.
All other fields are defined by the Semantic UAST schema.
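The split between internal fields and schema fields can be sketched as follows. The node is modelled as a plain Python dict standing in for a schema-less object. The helper name `split_fields` and the sample node contents are assumptions made for illustration.

```python
from typing import Any, Dict, Tuple


def split_fields(node: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate internal '@' fields from fields defined by the schema."""
    internal = {k: v for k, v in node.items() if k.startswith("@")}
    schema = {k: v for k, v in node.items() if not k.startswith("@")}
    return internal, schema


# Hypothetical node used only to exercise the helper.
node = {"@type": "uast:Identifier", "@token": "x", "Name": "x"}
internal, schema = split_fields(node)
```

Here `internal` holds `@type` and `@token`, and `schema` holds only `Name`.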
Types are defined in the SDK. In case of doubt, use that source file as the reference.
Object that stores all positional information for a node. This node kind
Keys of this object can be arbitrary names for positional fields of the UAST node. Only two fields are defined, start and end, so that users can access the source snippet related to the node.
As an example of custom positional information, a ternary operator x ? y : z node may store the individual positions of the ? and : characters as separate then and else fields in the Positions node. This field will always be in the parent node under the @pos key.
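As a rough illustration of how positional information gives access to the source snippet, the sketch below slices the source by byte offsets. It rests on several assumptions that the text above does not confirm: the keys are spelled `start`, `end` and `offset`, the end offset is exclusive, and the source is UTF-8 encoded. The helper name `snippet` is hypothetical.

```python
from typing import Any, Dict


def snippet(source: str, positions: Dict[str, Any]) -> str:
    """Return the source text between the start and end byte offsets."""
    data = source.encode("utf-8")          # offsets are byte-based
    start = positions["start"]["offset"]   # assumed key names
    end = positions["end"]["offset"]       # assumed to be exclusive
    return data[start:end].decode("utf-8")


source = "a = cond ? yes : no"
# Hypothetical positions for the ternary expression inside `source`.
ternary_pos = {"start": {"offset": 4}, "end": {"offset": 19}}
```

Calling `snippet(source, ternary_pos)` returns `cond ? yes : no`. Slicing the encoded bytes rather than the string keeps the result correct when the source contains multi-byte characters.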
Represents a position in a source code file. Cannot have any fields except the ones defined below. Belongs to a Positions parent node.
Position as an absolute byte offset (0-based index).
Line number (1-based index).
Column number (1-based index). The byte offset of the position relative to a line.
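The relationship between the three values can be sketched by deriving the line and column from an absolute byte offset. The function name `position_at` and the lowercase key names are assumptions. The sketch also assumes that lines are separated by `\n`.

```python
from typing import Dict


def position_at(source: bytes, offset: int) -> Dict[str, int]:
    """Compute the 1-based line and column for a 0-based byte offset."""
    if not 0 <= offset <= len(source):
        raise ValueError("offset out of range")
    # Each newline before the offset starts a new line; lines are 1-based.
    line = source.count(b"\n", 0, offset) + 1
    # rfind returns -1 on the first line, so line_start becomes 0 there.
    line_start = source.rfind(b"\n", 0, offset) + 1
    # Column is the byte offset relative to the line, 1-based.
    col = offset - line_start + 1
    return {"offset": offset, "line": line, "col": col}


src = "a\nbé c".encode("utf-8")
```

For `src`, the letter `c` sits at byte offset 6, so `position_at(src, 6)` reports line 2 and column 5: the column counts bytes, and `é` occupies two of them.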
Identifier is a name for an entity. The name can be any valid UTF-8 string.
An identifier name.
A UTF-8 string literal.
Format parameter is a driver-specific string format that was used for the literal in the source file.
An unescaped and unquoted UTF-8 string value.
Driver-specific format that was used for the literal in the source file.
Qualified name consists of multiple identifiers organized in a hierarchy. Identifiers are stored starting from the root level of the hierarchy to the leaf. The closest analogy is a filesystem path.
Path elements, ordered from the root of the hierarchy to the leaf.
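The root-to-leaf ordering can be sketched with plain dicts. The field names `Names` and `Name`, the type string `uast:QualifiedName`, and both helper functions are assumptions made for illustration and should be checked against the SDK.

```python
from typing import Any, Dict


def qualified_name(*parts: str) -> Dict[str, Any]:
    """Build a qualified-name node from identifiers listed root first."""
    return {
        "@type": "uast:QualifiedName",  # assumed type name
        "Names": [{"@type": "uast:Identifier", "Name": p} for p in parts],
    }


def to_path(node: Dict[str, Any], sep: str = ".") -> str:
    """Join the identifiers in stored order, root to leaf."""
    return sep.join(ident["Name"] for ident in node["Names"])


qn = qualified_name("java", "util", "List")
```

Here `to_path(qn)` gives `java.util.List`, and `to_path(qn, "/")` gives `java/util/List`, which mirrors the filesystem-path analogy.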
Comments can span any number of lines.
Block flag indicates that the comment uses block syntax (/* ... */ in Go) instead of line-comment syntax (// in Go).
Comments might have a prefix and suffix for the whole comment, and each comment line may also be prefixed with a Tab to express the following pattern:
/** This is a multiline
 * block comment
 */
In this case the
Suffix will be set to
Tab would be set to
An unescaped comment text (UTF-8).
A prefix added to the first line of the comment.
A suffix added to the last line of the comment.
A prefix added to each line of the comment.
True if the comment uses block syntax rather than line-comment syntax.
Block groups multiple statements and enforces sequential execution of these statements.
Eventually, blocks will also include a reference to a scope if it defines one.
An ordered list of statements.
Aliases provide a way to assign a name to an entity or give it an alternative name in a specific scope. An alias is immutable: the only way to reassign the name used by an alias in a specific scope is to shadow it in a new child scope.
An alias should contain a reference to the scope where the name is defined. Since scopes are not covered by the current spec, the actual definition of this relation will be specified in the future.
Examples of aliases are names for types, constants, functions, local names for imports, local names for imported symbols, etc.
A name that is assigned to an entity.
An entity that will be aliased by a new name.
Imports are statements that can load external modules into a program or a library.
An import declaration is a static statement: its effect depends neither on code execution nor on the position of the node inside the UAST.
An import can either:
Register all exported symbols in the target scope (All == true).
Register specific symbols in the target scope (len(Names) != 0).
Act as a side-effect import (both the All and Names fields are not set).
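The three cases above can be sketched as a small classifier over a dict-shaped import node. The function name `import_kind` and the returned labels are hypothetical. Checking `All` before `Names` is also an assumption, since the text does not say which wins when both are set.

```python
from typing import Any, Dict


def import_kind(imp: Dict[str, Any]) -> str:
    """Classify an import node by its All and Names fields."""
    if imp.get("All"):
        # All == true: every exported symbol is registered.
        return "all"
    if imp.get("Names"):
        # len(Names) != 0: only the listed symbols are registered.
        return "names"
    # Neither field is set: the module is loaded for its side effects.
    return "side-effect"
```

For example, `import_kind({"All": True})` returns `"all"`, and an import with an empty or missing `Names` list and no `All` flag is reported as `"side-effect"`.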
A name that is assigned to an entity.
Import all definitions from the modules into the scope.
Import specific names from the module. Can refer to an
A runtime import has the same structure as an import declaration, but has slightly different semantics. A runtime import may appear anywhere in the code, so it may be affected by code execution.
A runtime re-import has the same semantics as a runtime import, but it re-executes the initialization code when the same package is imported a second time.
Generic grouping node containing other nodes, used as a common ancestor from which more specific types inherit.
Group containing the nodes for a function definition.
Node representing the type signature of a function definition.
Argument or return value, usually for a
Type specification (
Default value, if given.
True if it takes a variable number of arguments in a list-like format.
True if it takes a variable number of arguments in a map-like format.
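The two variadic flags map naturally onto Python's `*args` and `**kwargs`. The sketch below uses the standard `inspect` module to describe a function's parameters in an argument-like shape. The key names `Name`, `Init`, `Variadic` and `MapVariadic` are assumptions chosen for illustration, not confirmed field names.

```python
import inspect
from typing import Any, Callable, Dict, List


def describe_arguments(func: Callable[..., Any]) -> List[Dict[str, Any]]:
    """Describe each parameter of func as an argument-like dict."""
    result = []
    for p in inspect.signature(func).parameters.values():
        has_default = p.default is not inspect.Parameter.empty
        result.append({
            "Name": p.name,
            # Default value if given, otherwise None.
            "Init": p.default if has_default else None,
            # List-like variable arguments (*rest).
            "Variadic": p.kind is inspect.Parameter.VAR_POSITIONAL,
            # Map-like variable arguments (**options).
            "MapVariadic": p.kind is inspect.Parameter.VAR_KEYWORD,
        })
    return result


def example(a, b=2, *rest, **options):
    """Hypothetical function used only to exercise the helper."""


args = describe_arguments(example)
```

In `args`, only `rest` has `Variadic` set and only `options` has `MapVariadic` set. Note that this sketch cannot tell a missing default apart from an explicit default of `None`.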