datapyground.sql.parser

A basic SQL Parser for parsing SQL queries into Abstract Syntax Trees (AST).

Given a SQL Query like "SELECT id, name FROM users, orders WHERE users.id = orders.user_id", the parser will produce an AST like:

{
    "type": "select",
    "projections": [
        {"type": "identifier", "value": "id"},
        {"type": "identifier", "value": "name"},
    ],
    "from": [
        {"type": "identifier", "value": "users"},
        {"type": "identifier", "value": "orders"}
    ],
    "where": {
        "left": {"type": "identifier", "value": "users.id"},
        "op": {"type": "operator", "value": "="},
        "right": {"type": "identifier", "value": "orders.user_id"},
    },
    "group_by": None,
    "order_by": None,
    "limit": None,
    "offset": None,
}

At the moment only SELECT statements are handled because DataPyground is a read-only platform.

The parser is based on a simple recursive descent parsing approach, where each SQL clause is parsed by a dedicated method. The parser is implemented as a class with methods for each clause, and the parsing is done by advancing through the tokens and building the AST.
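
The approach described above can be illustrated with a minimal sketch. It is not DataPyground's implementation: `ToySelectParser`, its method names, and the pre-split string tokens are all assumptions made for illustration, and only plain identifier lists are handled.

```python
from typing import Optional


class ToySelectParser:
    """Hypothetical toy parser; not DataPyground's actual implementation."""

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.position = 0

    def current(self) -> Optional[str]:
        # None signals that every token has been consumed.
        if self.position < len(self.tokens):
            return self.tokens[self.position]
        return None

    def expect_keyword(self, keyword: str) -> None:
        token = self.current()
        if token is None or token.upper() != keyword:
            raise ValueError(f"expected {keyword}, got {token!r}")
        self.position += 1

    def parse_identifier_list(self) -> list[dict]:
        # One identifier, then zero or more ", identifier" pairs.
        items = []
        while True:
            token = self.current()
            if token is None:
                raise ValueError("unexpected end of input")
            items.append({"type": "identifier", "value": token})
            self.position += 1
            if self.current() != ",":
                return items
            self.position += 1  # skip the comma

    def parse(self) -> dict:
        # Each clause is handled by its own method, in grammar order.
        self.expect_keyword("SELECT")
        projections = self.parse_identifier_list()
        self.expect_keyword("FROM")
        tables = self.parse_identifier_list()
        return {"type": "select", "projections": projections, "from": tables}


# Pre-split tokens stand in for a real tokenizer's output.
toy_ast = ToySelectParser("SELECT id , name FROM users , orders".split()).parse()
```

Each grammar rule maps to one method, and the only state the methods share is the position in the token list.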

The parser is not a full SQL parser and supports only a subset of the language. For example, it does not support nested queries (subqueries) or complex expressions.

The parser is designed to be simple and educational, to showcase how SQL queries can be parsed and converted into an AST. In a production project, you would typically use a dedicated SQL parser library like SQLGlot or Calcite.

Classes

class datapyground.sql.parser.Parser(text: str)

A simple SQL Parser for parsing SQL queries into Abstract Syntax Trees (AST).

The Parser class identifies what type of query is being parsed and delegates the parsing to the appropriate parser for that query type. At the moment, only SELECT statements are supported and are handled by the SelectStatementParser class.

The parser relies on datapyground.sql.tokenize.Tokenizer to tokenize the input SQL query and convert it into a list of datapyground.sql.tokenize.Token objects that the parser will work with.

Parameters:

text – The input SQL query text to parse.

parse() → dict

Parse the query and return the Abstract Syntax Tree (AST).

The AST is always a dictionary describing the structure of the parsed SQL query:

{
    "type": "select",
    "projections": [...],
    "from": [...],
    "where": {...},
    "group_by": [...],
    "order_by": [...],
    "limit": ...,
    "offset": ...
}
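
Because the AST is a plain dictionary, a consumer can inspect it with ordinary dictionary access. The sketch below uses the example AST from the module overview; the helper names `referenced_tables` and `missing_clauses` are hypothetical and not part of DataPyground.

```python
# The AST literal is copied from the module overview example.
ast = {
    "type": "select",
    "projections": [
        {"type": "identifier", "value": "id"},
        {"type": "identifier", "value": "name"},
    ],
    "from": [
        {"type": "identifier", "value": "users"},
        {"type": "identifier", "value": "orders"},
    ],
    "where": {
        "left": {"type": "identifier", "value": "users.id"},
        "op": {"type": "operator", "value": "="},
        "right": {"type": "identifier", "value": "orders.user_id"},
    },
    "group_by": None,
    "order_by": None,
    "limit": None,
    "offset": None,
}


def referenced_tables(tree: dict) -> list[str]:
    """Hypothetical helper: names of the plain tables in the FROM clause."""
    if tree.get("type") != "select":
        raise ValueError("only SELECT statements are supported")
    # Entries of any other type (for example joins) are skipped in this sketch.
    return [t["value"] for t in tree["from"] if t.get("type") == "identifier"]


def missing_clauses(tree: dict) -> list[str]:
    """Hypothetical helper: optional clauses that the query did not specify."""
    optional = ("where", "group_by", "order_by", "limit", "offset")
    return [name for name in optional if tree.get(name) is None]


tables = referenced_tables(ast)
absent = missing_clauses(ast)
```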

class datapyground.sql.parser.SelectStatementParser(tokens: list[Token])

A parser for SELECT statements that converts SQL queries into Abstract Syntax Trees (AST).

The Parser class delegates the parsing of SELECT statements to this class, which handles no other statement type.

The main parser has already tokenized the input, so this parser works directly with the tokens.

Parameters:

tokens – A list of tokens representing the SQL query to parse.

advance(count: int = 1) → None

Advance the parser’s current_token to a subsequent token in the token list.

By default it moves to the next token, as that’s the most common use case.

When invoking subparsers like datapyground.sql.expressions.ExpressionParser, however, it is necessary to advance by as many tokens as the subparser consumed.

peek() → Token

Return the next token without advancing the parser.

When the way the current token is parsed depends on the token that follows it, this function lets the parser inspect that token without consuming it.

After the parser has looked at the next token and decided what to do with it, it still has to advance in order to consume the token.
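
The interplay between advance and peek can be sketched as follows. `Tok` and `Cursor` are hypothetical stand-ins for datapyground.sql.tokenize.Token and the parser, and the attribute names `kind`, `value`, and `position` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tok:
    """Hypothetical stand-in for datapyground.sql.tokenize.Token."""
    kind: str
    value: str


class Cursor:
    """Hypothetical token cursor with advance/peek semantics."""

    def __init__(self, tokens: list[Tok]) -> None:
        self.tokens = tokens
        self.position = 0

    @property
    def current_token(self) -> Optional[Tok]:
        if self.position < len(self.tokens):
            return self.tokens[self.position]
        return None

    def advance(self, count: int = 1) -> None:
        self.position += count

    def peek(self) -> Optional[Tok]:
        # Look one token ahead without moving the cursor.
        following = self.position + 1
        return self.tokens[following] if following < len(self.tokens) else None


cursor = Cursor([
    Tok("identifier", "salary"),
    Tok("keyword", "AS"),
    Tok("identifier", "total"),
    Tok("punctuation", ","),
])

column = cursor.current_token.value
alias = None
upcoming = cursor.peek()
if upcoming is not None and upcoming.value == "AS":
    cursor.advance(2)  # skip the column and the AS keyword
    alias = cursor.current_token.value
cursor.advance()  # consume the last token of the projection
```

Peeking lets the cursor decide whether an alias follows before committing to consume anything.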

parse() → dict

Parse the SELECT statement and return the Abstract Syntax Tree (AST).

If any optional part of the SELECT statement is missing, such as ORDER BY, GROUP BY, LIMIT, or OFFSET, the corresponding AST field is still present and set to None.

The only required parts are the SELECT projections and the FROM clause.

parse_projections() → list[dict]

Parse the SELECT projections from the SQL query.

Projections can be expressions too, like SUM(salary), ROUND(SUM(salary), 2), or even A + B. Aliases, as in SELECT SUM(salary) AS total_salary, are handled as well.

Returns a list of projections, each looking like:

{"type": "projection", "value": {"type": "identifier", "value": "id"}, "alias": "CustomerID"}
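
A consumer typically needs an output column name for each projection. The sketch below shows one way to derive it; `output_column_name` is a hypothetical helper, and both the absence of an alias being represented as None and the shape of the function node are assumptions.

```python
from typing import Optional


def output_column_name(projection: dict) -> Optional[str]:
    """Hypothetical helper: pick the name a projection shows up under."""
    # An explicit alias always wins.
    if projection.get("alias"):
        return projection["alias"]
    value = projection["value"]
    # A bare column keeps its own name.
    if value.get("type") == "identifier":
        return value["value"]
    # Unaliased expressions have no obvious name in this sketch.
    return None


aliased = {
    "type": "projection",
    "value": {"type": "identifier", "value": "id"},
    "alias": "CustomerID",
}
plain = {
    "type": "projection",
    "value": {"type": "identifier", "value": "id"},
    "alias": None,  # assumed representation of a missing alias
}
expression = {
    "type": "projection",
    "value": {"type": "function", "name": "SUM"},  # hypothetical node shape
    "alias": None,
}

names = [output_column_name(p) for p in (aliased, plain, expression)]
```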

parse_from_clause() → list[dict]

Parse the FROM clause of the SQL query.

The FROM clause can contain multiple tables, separated by commas.

Returns a list of table references, each a dictionary such as {"type": "identifier", "value": "users"}.

parse_join_clause(left_table: str) → dict

Parse a JOIN clause from the SQL query.

A JOIN clause starts with an optional join type: INNER, LEFT, RIGHT, FULL, CROSS, or NATURAL. If the join type is missing, INNER JOIN is assumed.

The JOIN clause is followed by the table name to join with, and optionally the ON keyword followed by an expression constituting the join condition.

Returns a dictionary representing the join clause, looking like:

{
    "type": "join",
    "join_type": "inner",
    "left_table": {"type": "identifier", "value": "users"},
    "right_table": {"type": "identifier", "value": "orders"},
    "join_condition": {
        "left": {"type": "identifier", "value": "users.id"},
        "op": {"type": "operator", "value": "="},
        "right": {"type": "identifier", "value": "orders.user_id"},
    }
}

Parameters:

left_table – The name of the left table in the join clause.
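
As an illustration of how a consumer might read the join dictionary, the sketch below renders it back into SQL-like text. `describe_join` is a hypothetical helper, and it assumes the join condition is a single comparison between two identifiers, as in the example above.

```python
def describe_join(join: dict) -> str:
    """Hypothetical helper: render a join AST node as SQL-like text."""
    left = join["left_table"]["value"]
    right = join["right_table"]["value"]
    text = f"{left} {join['join_type'].upper()} JOIN {right}"
    condition = join.get("join_condition")
    if condition is not None:
        # Assumes a simple `identifier op identifier` condition.
        text += (
            f" ON {condition['left']['value']}"
            f" {condition['op']['value']}"
            f" {condition['right']['value']}"
        )
    return text


join_node = {
    "type": "join",
    "join_type": "inner",
    "left_table": {"type": "identifier", "value": "users"},
    "right_table": {"type": "identifier", "value": "orders"},
    "join_condition": {
        "left": {"type": "identifier", "value": "users.id"},
        "op": {"type": "operator", "value": "="},
        "right": {"type": "identifier", "value": "orders.user_id"},
    },
}

rendered = describe_join(join_node)
```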

parse_where_clause() → dict

Parse the WHERE clause of the SQL query.

The WHERE clause is a conditional expression that filters the rows returned by the query.

This method delegates the parsing of the expression to the ExpressionParser and does little else. It expects the expression parser to return a boolean expression, but does not enforce that.

It is up to the consumer of the AST to interpret the WHERE clause correctly and to raise an error if it is not a valid boolean expression.
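
A consumer-side check could look like the sketch below. `ensure_boolean_expression` and the `BOOLEAN_OPERATORS` set are assumptions made for illustration; the operators that DataPyground actually emits may differ.

```python
# Assumed set of operators that yield a boolean result.
BOOLEAN_OPERATORS = {"=", "<>", "!=", "<", "<=", ">", ">=", "AND", "OR"}


def ensure_boolean_expression(where: dict) -> dict:
    """Hypothetical check: reject WHERE nodes that cannot be boolean."""
    op = where.get("op") if isinstance(where, dict) else None
    if not isinstance(op, dict) or op.get("value") not in BOOLEAN_OPERATORS:
        raise TypeError("WHERE clause is not a boolean expression")
    return where


valid_where = {
    "left": {"type": "identifier", "value": "users.id"},
    "op": {"type": "operator", "value": "="},
    "right": {"type": "identifier", "value": "orders.user_id"},
}
checked = ensure_boolean_expression(valid_where)

# A bare identifier parses fine but is not a condition.
try:
    ensure_boolean_expression({"type": "identifier", "value": "id"})
    rejected = False
except TypeError:
    rejected = True
```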

parse_group_by_clause() → list[dict]

Parse the GROUP BY clause of the SQL query.

Returns the list of columns to group by, looking like:

[{"type": "identifier", "value": "id"}, {"type": "identifier", "value": "name"}]

parse_order_by_clause() → list[dict]

Parse the ORDER BY clause of the SQL query.

Returns a list of columns to order by, with the sort order (ASC or DESC).

The result will look like:

[{"type": "ordering", "column": {"type": "identifier", "value": "id"}, "order": "ASC"},
 {"type": "ordering", "column": {"type": "identifier", "value": "name"}, "order": "DESC"}]
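
To show how such an ordering list might be consumed, the sketch below applies it to a list of row dictionaries. `apply_ordering` is a hypothetical helper, and it assumes every ordering column is a plain identifier that exists in each row.

```python
def apply_ordering(rows: list[dict], ordering: list[dict]) -> list[dict]:
    """Hypothetical helper: sort rows according to an ORDER BY AST."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key
    # first and the most significant key last yields a multi-key sort.
    for item in reversed(ordering):
        column = item["column"]["value"]
        descending = item["order"] == "DESC"
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


ordering = [
    {"type": "ordering", "column": {"type": "identifier", "value": "id"}, "order": "ASC"},
    {"type": "ordering", "column": {"type": "identifier", "value": "name"}, "order": "DESC"},
]
rows = [
    {"id": 2, "name": "a"},
    {"id": 1, "name": "a"},
    {"id": 1, "name": "b"},
]

sorted_rows = apply_ordering(rows, ordering)
```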

parse_limit_or_offset_clause() → int

Parse the LIMIT or OFFSET clauses of the SQL query.

Returns the value as an integer.

parse_expression() → dict

Parse an expression from the SQL query.

For the actual parsing it relies on the datapyground.sql.expressions.ExpressionParser, after which it advances the parser by the number of tokens consumed by the expression parser.
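
The delegate-then-advance pattern can be sketched as follows. How the real ExpressionParser reports the number of consumed tokens is not specified here, so the `(node, consumed)` return value, `parse_comparison`, and `OuterParser` are all assumptions.

```python
def parse_comparison(tokens: list[str]) -> tuple[dict, int]:
    """Hypothetical subparser: parse `left op right`, report tokens used."""
    if len(tokens) < 3:
        raise ValueError("incomplete comparison")
    left, op, right = tokens[:3]
    node = {
        "left": {"type": "identifier", "value": left},
        "op": {"type": "operator", "value": op},
        "right": {"type": "identifier", "value": right},
    }
    return node, 3


class OuterParser:
    """Hypothetical outer parser that owns the token position."""

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.position = 0

    def advance(self, count: int = 1) -> None:
        self.position += count

    def parse_expression(self) -> dict:
        # Hand the remaining tokens to the subparser, then skip what it used.
        node, consumed = parse_comparison(self.tokens[self.position:])
        self.advance(consumed)
        return node


outer = OuterParser(["users.id", "=", "orders.user_id", "LIMIT", "10"])
condition = outer.parse_expression()
next_token = outer.tokens[outer.position]
```

After the call, the outer parser resumes at the first token the subparser did not consume.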

consume_punctuation(*values: str) → bool

Consume a punctuation token with a specific value.

If the current token is a punctuation token with one of the specified values, it will consume the token and return True. Otherwise, it will return False.

Other parsing functions use this when there is a list of values to parse: it consumes punctuation tokens like ',' and returns True as long as there are more values to consume.

Parameters:

values – The list of valid punctuation characters to consume.
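
The list-parsing loop this method enables can be sketched as below. `ListParser`, its `(kind, value)` tuple tokens, and `parse_column_list` are hypothetical; only the behaviour of consume_punctuation follows the description above.

```python
class ListParser:
    """Hypothetical parser for a comma-separated list of identifiers."""

    def __init__(self, tokens: list[tuple[str, str]]) -> None:
        self.tokens = tokens  # (kind, value) pairs, an assumed token shape
        self.position = 0

    def consume_punctuation(self, *values: str) -> bool:
        # Consume the current token only if it is one of the given
        # punctuation values; otherwise leave the position untouched.
        if self.position < len(self.tokens):
            kind, value = self.tokens[self.position]
            if kind == "punctuation" and value in values:
                self.position += 1
                return True
        return False

    def parse_column_list(self) -> list[dict]:
        columns = []
        while True:
            if self.position >= len(self.tokens):
                raise ValueError("expected a column name")
            _, value = self.tokens[self.position]
            columns.append({"type": "identifier", "value": value})
            self.position += 1
            # A comma means another column follows; anything else ends the list.
            if not self.consume_punctuation(","):
                return columns


list_parser = ListParser([
    ("identifier", "id"),
    ("punctuation", ","),
    ("identifier", "name"),
    ("keyword", "ORDER"),
])
columns = list_parser.parse_column_list()
```

The loop stops at the ORDER keyword without consuming it, so the next clause parser can pick up from there.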

Exceptions

SQLParseError

An exception raised when an error occurs during SQL parsing.