SQLFLUFF(1) | SQLFluff | SQLFLUFF(1) |
sqlfluff - SQLFluff 2.3.5
Bored of not having a good SQL linter that works with whichever dialect you're working with? Fluff is an extensible and modular linter designed to help you write good SQL and catch errors and bad SQL before it hits your database.
Notable releases:
For more detail on other releases, see our Release Notes.
Want to see where and how people are using SQLFluff in their projects? Head over to SQLFluff in the Wild for inspiration.
To get started just install the package, make a sql file and then run SQLFluff and point it at the file. For more details or if you don't have python or pip already installed see Getting Started.
$ pip install sqlfluff $ echo " SELECT a + b FROM tbl; " > test.sql $ sqlfluff lint test.sql --dialect ansi == [test.sql] FAIL L: 1 | P: 1 | LT01 | Expected only single space before 'SELECT' keyword. | Found ' '. [layout.spacing] L: 1 | P: 1 | LT02 | First line should not be indented. | [layout.indent] L: 1 | P: 1 | LT13 | Files must not begin with newlines or whitespace. | [layout.start_of_file] L: 1 | P: 11 | LT01 | Expected only single space before binary operator '+'. | Found ' '. [layout.spacing] L: 1 | P: 14 | LT01 | Expected only single space before naked identifier. | Found ' '. [layout.spacing] L: 1 | P: 27 | LT01 | Unnecessary trailing whitespace at end of file. | [layout.spacing] L: 1 | P: 27 | LT12 | Files must end with a single trailing newline. | [layout.end_of_file] All Finished 📜 🎉!
To get started with SQLFluff you'll need python and pip installed on your machine, if you're already set up, you can skip straight to Installing sqlfluff.
How to install python and pip depends on what operating system you're using. In any case, the python wiki provides up to date instructions for all platforms here.
There's a chance that you'll be offered the choice between python versions. Support for python 2 was dropped in early 2020, so you should always opt for a version number starting with a 3. As for more specific options beyond that, SQLFluff aims to be compatible with all current python versions, and so it's best to pick the most recent.
You can confirm that python is working as expected by heading to your terminal or console of choice and typing python --version which should give you a sensible read out and not an error.
$ python --version Python 3.9.1
For most people, their installation of python will come with pip (the python package manager) preinstalled. To confirm this you can type pip --version similar to python above.
$ pip --version pip 21.3.1 from ...
If however, you do have python installed but not pip, then the best instructions for what to do next are on the python website.
Assuming that python and pip are already installed, then installing SQLFluff is straight forward.
$ pip install sqlfluff
You can confirm its installation by getting SQLFluff to show its version number.
$ sqlfluff version 2.3.5
To get a feel for how to use SQLFluff it helps to have a small .sql file which has a simple structure and some known issues for testing. Create a file called test.sql in the same folder that you're currently in with the following content:
SELECT a+b AS foo, c AS bar from my_table
You can then run sqlfluff lint test.sql --dialect ansi to lint this file.
$ sqlfluff lint test.sql --dialect ansi == [test.sql] FAIL L: 1 | P: 1 | LT09 | Select targets should be on a new line unless there is | only one select target. | [layout.select_targets] L: 1 | P: 1 | ST06 | Select wildcards then simple targets before calculations | and aggregates. [structure.column_order] L: 1 | P: 7 | LT02 | Expected line break and indent of 4 spaces before 'a'. | [layout.indent] L: 1 | P: 9 | LT01 | Expected single whitespace between naked identifier and | binary operator '+'. [layout.spacing] L: 1 | P: 10 | LT01 | Expected single whitespace between binary operator '+' | and naked identifier. [layout.spacing] L: 1 | P: 11 | LT01 | Expected only single space before 'AS' keyword. Found ' | '. [layout.spacing] L: 2 | P: 1 | LT02 | Expected indent of 4 spaces. | [layout.indent] L: 2 | P: 9 | LT02 | Expected line break and no indent before 'from'. | [layout.indent] L: 2 | P: 10 | CP01 | Keywords must be consistently upper case. | [capitalisation.keywords] All Finished 📜 🎉!
You'll see that SQLFluff has failed the linting check for this file. On each of the following lines you can see each of the problems it has found, with some information about the location and what kind of problem there is. One of the errors has been found on line 1, position * (as shown by :code:`L: 1 | P: 9`) and it's a problem with rule *LT01 (for a full list of rules, see Rules Reference). From this (and the following error) we can see that the problem is that there is no space either side of the + symbol in a+b. Head into the file, and correct this issue so that the file now looks like this:
SELECT a + b AS foo, c AS bar from my_table
Rerun the same command as before, and you'll see that the original error (violation of LT01) no longer shows up.
$ sqlfluff lint test.sql --dialect ansi == [test.sql] FAIL L: 1 | P: 1 | LT09 | Select targets should be on a new line unless there is | only one select target. | [layout.select_targets] L: 1 | P: 1 | ST06 | Select wildcards then simple targets before calculations | and aggregates. [structure.column_order] L: 1 | P: 7 | LT02 | Expected line break and indent of 4 spaces before 'a'. | [layout.indent] L: 1 | P: 13 | LT01 | Expected only single space before 'AS' keyword. Found ' | '. [layout.spacing] L: 2 | P: 1 | LT02 | Expected indent of 4 spaces. | [layout.indent] L: 2 | P: 9 | LT02 | Expected line break and no indent before 'from'. | [layout.indent] L: 2 | P: 10 | CP01 | Keywords must be consistently upper case. | [capitalisation.keywords]
To fix the remaining issues, we're going to use one of the more advanced features of SQLFluff, which is the fix command. This allows more automated fixing of some errors, to save you time in sorting out your sql files. Not all rules can be fixed in this way and there may be some situations where a fix may not be able to be applied because of the context of the query, but in many simple cases it's a good place to start.
For now, we only want to fix the following rules: LT02, LT12, CP01
$ sqlfluff fix test.sql --rules LT02,LT12,CP01 --dialect ansi ==== finding violations ==== == [test.sql] FAIL L: 1 | P: 7 | LT02 | Expected line break and indent of 4 spaces before 'a'. | [layout.indent] L: 2 | P: 1 | LT02 | Expected indent of 4 spaces. | [layout.indent] L: 2 | P: 9 | LT02 | Expected line break and no indent before 'FROM'. | [layout.indent] L: 2 | P: 10 | CP01 | Keywords must be consistently upper case. | [capitalisation.keywords] ==== fixing violations ==== 4 fixable linting violations found Are you sure you wish to attempt to fix these? [Y/n]
...at this point you'll have to confirm that you want to make the changes by pressing y on your keyboard...
Are you sure you wish to attempt to fix these? [Y/n] ... Attempting fixes... Persisting Changes... == [test.sql] PASS Done. Please check your files to confirm.
If we now open up test.sql, we'll see the content is now different.
SELECT a + b AS foo, c AS bar FROM my_table
In particular:
We could also fix all of the fixable errors by not specifying --rules.
$ sqlfluff fix test.sql --dialect ansi ==== finding violations ==== == [test.sql] FAIL L: 1 | P: 1 | ST06 | Select wildcards then simple targets before calculations | and aggregates. [structure.column_order] L: 2 | P: 10 | LT01 | Expected only single space before 'AS' keyword. Found ' | '. [layout.spacing] ==== fixing violations ==== 2 fixable linting violations found Are you sure you wish to attempt to fix these? [Y/n] ... Attempting fixes... Persisting Changes... == [test.sql] PASS Done. Please check your files to confirm.
If we now open up test.sql, we'll see the content has been updated again.
SELECT c AS bar, a + b AS foo FROM my_table
The SQL statement is now well formatted according to all the rules defined in SQLFluff.
The --rules argument is optional, and could be useful when you or your organisation follows a slightly different convention than what we have defined.
So far we've covered the stock settings of SQLFluff, but there are many different ways that people style their sql, and if you or your organisation have different conventions, then many of these behaviours can be configured. For example, given the example above, what if we actually think that indents should only be two spaces, and rather than uppercase keywords, they should all be lowercase?
To achieve this we create a configuration file named .sqlfluff and place it in the same directory as the current file. In that file put the following content:
[sqlfluff] dialect = ansi [sqlfluff:indentation] tab_space_size = 2 [sqlfluff:rules:capitalisation.keywords] capitalisation_policy = lower
Then rerun the same command as before.
$ sqlfluff fix test.sql --rules LT02,LT12,CP01,ST06,LT09,LT01
Then examine the file again, and you'll notice that the file has been fixed accordingly.
select c as bar, a + b as foo from my_table
For a full list of configuration options check out Default Configuration. Note that in our example here we've only set a few configuration values and any other configuration settings remain as per the default config. To see how these options apply to specific rules check out the "Configuration" section within each rule's documentation in Rules Reference.
From here, there are several more things to explore.
One last thing to note is that SQLFluff is a relatively new project and you may find bugs or strange things while using it. If you do find anything, the most useful thing you can do is to post the issue on GitHub where the maintainers of the project can work out what to do with it. The project is in active development and so updates and fixes may come out regularly.
SQL has been around for a long time, as a language for communicating with databases, like a communication protocol. More recently with the rise of data as a business function, or a domain in its own right SQL has also become an invaluable tool for defining the structure of data and analysis - not just as a one off but as a form of infrastructure as code.
As analytics transitions from a profession of people doing one-offs, and moves to building stable and reusable pieces of analytics, more and more principles from software engineering are moving in the analytics space. One of the best articulations of this is written in the viewpoint section of the docs for the open-source tool dbt. Two of the principles mentioned in that article are quality assurance and modularity.
The primary aim of SQLFluff as a project is in service of that first aim of quality assurance. With larger and larger teams maintaining large bodies of SQL code, it becomes more and more important that the code is not just valid but also easily comprehensible by other users of the same codebase. One way to ensure readability is to enforce a consistent style, and the tools used to do this are called linters.
Some famous linters which are well known in the software community are flake8 and jslint (the former is used to lint the SQLFluff project itself).
SQLFluff aims to fill this space for SQL.
SQL itself doesn't lend itself well to modularity, so to introduce some flexibility and reusability it is often templated. Typically this is done in the wild in one of the following ways:
"SELECT {foo} FROM {tbl}".format(foo="bar", tbl="mytable")
Which would evaluate to:
SELECT bar FROM mytable
All of these templating tools are great for modularity but they also mean that the SQL files themselves are no longer valid SQL code, because they now contain these configured placeholder values, intended to improve modularity.
SQLFluff supports both of the templating methods outlined above, as well as dbt projects, to allow you to still lint these "dynamic" SQL files as part of your CI/CD pipeline (which is great 🙌), rather than waiting until you're in production (which is bad 🤦, and maybe too late).
During the CI/CD pipeline (or any time that we need to handle templated code), SQLFluff needs additional info in order to interpret your templates as valid SQL code. You do so by providing dummy parameters in SQLFluff configuration files. When substituted into the template, these values should evaluate to valid SQL (so SQLFluff can check its style, formatting, and correctness), but the values don't need to match actual values used in production. This means that you can use much simpler dummy values than what you would really use. The recommendation is to use the simplest possible dummy value that still allows your code to evaluate to valid SQL so that the configuration values can be as streamlined as possible.
SQLFluff has a few components:
The core vision [1] for SQLFluff is to be really good at being the linter. The reasoning for this is outlined in SQL in the Wild.
Most of the codebase for SQLFluff is the parser, mostly because at the point of developing SQLFluff, there didn't appear to be a good option for a whitespace-aware parser that could be used instead.
With regards to the rules, SQLFluff aims to be opinionated but it also accepts that many organisations and groups have pre-existing strong conventions around how to write SQL and so ultimately SQLFluff should be flexible enough to support whichever rule set a user wishes to.
Notes
Rolling out SQLFluff, like rolling out any other linter or style guide, is not just about the technical rollout, but also how you introduce the tool to the team and organisation around you.
With that in mind, it's worth reminding ourselves what we're trying to achieve with a tool like this. A set of potential success criteria might be:
You like leading commas? Make a PR to .sqlfluff and let's discuss with the team what the implications would be.
Consider which of these success measures is most important and most desirable for your team. Write that down.
The following steps are a guide, which you should adapt to your organisation, and in particular its level of data maturity.
This step is done by you, or a small group of people who already think that linting is a good idea.
There are three sensible rollout phases:
In each of these phases you have three levers to play with:
Work out a sensible roadmap of how hard you want to go in each phase. Be clear who is responsible for changes at each phase. An example plan might look like this:
Bring your team together to introduce both linting as a concept and also SQLFluff as a tool. At this stage it's really important that the team understand *why* this is a good thing.
Consider whether to discuss the whole plan from step 2, or whether to only talk about the first few steps. Aim to make this an empowering experience that everyone can get involved with rather than another piece of admin they need to do.
At this stage, you might also want to consider other tools in the SQLFluff ecosystem such as the SQLFluff pre-commit hook and the SQLFluff VSCode plugin or SQLFluff online formatter.
Once the plan is in motion, make sure to start putting in place norms and rituals around how you change the rules. In particular:
It's normal for your usage of tools like SQLFluff to change and evolve over time. It's important to expect this change in advance, and welcome it when it happens. Always make sure you're driving toward the success measures you decided up front, rather than just resisting the change.
Did it work? If so, spread the word. Tell a friend about SQLFluff.
If you're lucky they might share your views on comma placement 🤷♀️.
If there is one part of building a linter that is going to be controversial it's going to be whitespace (closely followed by cApiTaLiSaTiOn 😁).
More specifically, whitespace divides into three key themes:
SQLFluff aims to be opinionated on this theme, but also configurable (see Configuring Layout). The tool will have a default viewpoint and will aim to have views on all of the important aspects of SQL layout, but if you (or your organisation) don't like those views then we aim to allow enough configuration that you can lint in line with your views, and still use SQLFluff. For more information on how to configure rules to your own viewpoint see Configuration.
NOTE:
Of the different elements of whitespace, spacing is likely the least controversial. By default, all elements are separated by a single space character. Except for very specific circumstances (see section on Aligned elements), any additional space between elements is usually unwanted and a distraction for the reader. There are however several common cases where no whitespace is more appropriate, which fall into two cases (for more details on where to configure these see Configuring layout and spacing).
SELECT col_a , col_b -- Newline present before column , col_c -- When inline, comma should still touch element before. , GREATEST(col_d, col_e) as col_f FROM tbl_a
A special case of spacing is where elements are set to be aligned within some limits. This is not enabled by default, but can be be configured to achieve layouts like:
SELECT a AS first_column, b AS second_column, (a + b) / 2 AS third_column FROM foo AS bar
In this example, the alias expressions are all aligned with each other. To configure this, SQLFluff needs to know what elements to align and how far to search to find elements which should be aligned with each other. The configuration to achieve this layout is:
[sqlfluff:layout:type:alias_expression] # We want non-default spacing _before_ the alias expressions. spacing_before = align # We want to align them within the next outer select clause. # This means for example that alias expressions within the FROM # or JOIN clause would _not_ be aligned with them. align_within = select_clause # The point at which to stop searching outward for siblings, which # in this example would likely be the boundary of a CTE. Stopping # when we hit brackets is usually a good rule of thumb for this # configuration. align_scope = bracketed
Of these configuration values, the align_scope is potentially the least obvious. The following example illustrates the impact it has.
-- With -- align_scope = bracketed -- align_within = select_clause WITH foo as ( SELECT a, b, c AS first_column d + e AS second_column ) SELECT a AS first_column, (a + b) / 2 AS third_column FROM foo AS bar; -- With -- align_scope = bracketed -- align_within = statement WITH foo as ( SELECT a, b, c AS first_column d + e AS second_column ) SELECT a AS first_column, (a + b) / 2 AS third_column FROM foo AS bar -- Now the FROM alias is also aligned. -- With -- align_scope = file -- align_within = select_clause WITH foo as ( SELECT a, b, c AS first_column -- Now the aliases here are aligned d + e AS second_column -- with the outer query. ) SELECT a AS first_column, (a + b) / 2 AS third_column FROM foo AS bar -- With -- align_scope = file -- align_within = statement WITH foo as ( SELECT a, b, c AS first_column d + e AS second_column ) SELECT a AS first_column, (a + b) / 2 AS third_column FROM foo AS bar
When controlling line breaks, we are trying to achieve a few different things:
Lastly, given we have multiple lines of SQL, to what extent should we indent some lines to provide visual cues to the structure of that SQL. It's important to note that SQL is not whitespace sensitive in its interpretation and that means that any principles we apply here are entirely for the benefit of humans. Your database doesn't care.
The indentation therefore should be treated as a hint to the reader of the structure of the code. This explains the common practice within most languages that nested elements (for example the contents of a set of brackets in a function call) should be indented one step from the outer elements. It's also convention that elements with the same level in a nested structure should have the same indentation, at least with regards to their local surroundings. As an example:
SELECT nested_within_select AS first_column, some_function( nested_within_function, also_nested_within_function ) AS indented_the_same_as_opening_bracket FROM indented_the_same_as_select
NOTE:
Comments are dealt with differently, depending on whether they're block comments (/* like this */), which might optionally include newlines, or inline comments (-- like this) which are necessarily only on one line.
SELECT /* This is a block comment starting on a new line which contains a newline (continuing with at least the same indent. - potentially containing greater indents - having no other code following it in the same line - and aligned with the line of code following it */ this_column as what_we_align_the_column_to FROM my_table
SELECT -- This is fine this_column as what_we_align_to, another_column as something_short, -- Is ok case -- This is aligned correctly with below when indented then take_care else try_harder end as the_general_guidance -- Even here we align with the line below FROM my_table
NOTE:
One approach to indenting nested elements is a layout called a hanging indent. In this layout, there is no line break before the first nested element, but subsequent elements are indented to match the line position of that first element. Two examples might be:
-- A select statement with two hanging indents: SELECT no_line_break_before_me, indented_to_match_the_first, 1 + (a + b) AS another_more_complex_example FROM my_table; -- This TSQL example is also in essence a hanging indent: DECLARE @prv_qtr_1st_dt DATETIME, @last_qtr INT, @last_qtr_first_mn INT, @last_qtr_yr INT;
In some circumstances this layout can be quite neat (the DECLARE statement is a good example of this), however once indents are nested or indentation styles are mixed it can rapidly become confusing (as partially shown in the first example). Additionally, unless the leading element of the first line is very short, hanging indents use much larger indents than a traditional simple indent where a line break is used before the first element.
Hanging indents have been supported in SQLFluff up to the 1.x versions, however they will no longer by supported from 2.0.0 onwards. This is due to the ambiguity which they bring to fixing poorly formatted SQL. Take the following code:
SELECT this_is, badly_formatted, code_and, not_obvious, what_was, intended FROM my_table
Given the lack of line break between SELECT and this_is, it would appear that the user is intending a hanging indent, however it is also plausible that they did not and they just forgot to add a line break between them. This ambiguity is unhelpful, both for SQLFluff as a tool, but also for people who write SQL that there two ways of indenting their SQL. Given SQLFluff aims to provide consistency in SQL layout and remove some of the burden of needing to make choices like this - and that it would be very unusual to keep only hanging indents and disable traditional ones - the only route left to consistency is to not allow hanging indents. Starting in 2.0.0, any hanging indents detected will be converted to traditional indents.
A close cousin of the hanging indent is the implicit indent. While it does look a little like a hanging indent, it's much more consistent in its behaviour and is supported from SQLFluff 2.0.0 onwards.
An implicit indent is exactly like a normal indent, but doesn't have to be actually taken to influence the indentation of lines after it - it just needs to be left un-closed before the end of the line. These are normally available in clauses which take the form of KEYWORD <expression>, like WHERE clauses or CASE expressions.
-- This WHERE clause here takes advantage of an implicit indent. SELECT * FROM my_table WHERE condition_a AND condition_b; -- With implicit indents disabled (which is currently the -- default), the above formulation is not allowed, and instead -- there should be a newline immediately after `WHERE` (which -- is the location of the _implicit_ indent). SELECT * FROM my_table WHERE condition_a AND condition_b;
When addressing both indentation and line-length, implicit indents allow a slightly more compact layout, without significant drawbacks in legibility. They also enable a style much closer to some established style guides.
They are however not recommended by many of the major style guides at time of writing (including the dbt Labs SQL style guide and the Mozilla SQL style guide), and so are disabled by default. To enable them, set the allow_implicit_indents flag in sqluff.indentation to True.
SQLFluff supports templated elements in code, such as those offered by jinja2 (or dbt which relies on it). For simple cases, templated elements are handled as you would expect by introducing additional indents into the layout.
SELECT a, {% for n in ['b', 'c', 'd'] %} -- This section is indented relative to 'a' because -- it is inside a jinja for loop. {{ n }}, {% endfor %} e FROM my_table
This functionality can be turned off if you wish using the template_blocks_indent option in your Configuration.
It's important to note here, that SQLFluff lints the code after it has been rendered, and so only has access to code which is still present after that process.
SELECT a, {% if False %} -- This section of the code cannot be linted because -- it is never rendered due to the `if False` condition. my + poorly + spaced - and/indented AS section_of_code {% endif %} e FROM my_table
More complex templated cases are usually characterised by templated tags cutting across the parse tree. This more formally is where the opening and closing tags of a templated section exist at different levels in the parsed structure. Starting in version 2.x, these will be treated differently (Prior to version 2.x, situations like this were sometimes handled inconsistently or incorrectly).
Indentation should act as a visual cue to the structure of the written SQL, and as such, the most important thing is that template tags belonging to the same block structure use the same indentation. In the example below, this is the opening and closing elements of the second if statement. If treated as a simple case, these tags would have different indents, because they are at different levels of the parse tree and so clearly there is a conflict to be resolved.
The view SQLFluff takes on how to resolve this conflict is to pull all of the tags in this section down to the indent of the least indented (in the example below that would be the closing endif tag). This is similar to the treatment of C Preprocessor Directives, which are treated somewhat as being outside the structure of the rest of the file. In these cases, the content is also not further indented as in the simple case because it makes it harder to line up elements within the affected section and outside (in the example below the SELECT and FROM are a good illustration).
SELECT a, {% if True %} -- This is a simple case. The opening and closing tag are -- both at the same level within the SELECT clause. simple_case AS example, {% endif %} b, {% if True %} -- This is a complex case. The opening tag is within the SELECT -- clause, but the closing tag is outside the statement -- entirely. complex_case AS example FROM table_option_one {% else %} complex_case_two AS example FROM table_option_two {% endif %}
Configuration for layout is spread across three places:
One of the key areas for this is the indentation of the JOIN expression, which we'll use as an example.
Semantically, a JOIN expression is part of the FROM expression and therefore would be expected to be indented. However according to many of the most common SQL style guides (including the dbt Labs SQL style guide and the Mozilla SQL style guide) the JOIN keyword is expected to at the same indent as the FROM keyword. By default, SQLFluff sides with the current consensus, which is to not indent the JOIN keyword, however this is one element which is configurable.
By setting values in the sqlfluff:indentation section of your config file you can control how this is parsed.
For example, the default indentation would be as follows:
SELECT a, b FROM my_table JOIN another_table ON condition1 AND condition2
By setting your config file to:
[sqlfluff:indentation] indented_joins = True
Then the expected indentation will be:
SELECT a, b FROM my_table JOIN another_table ON condition1 AND condition2
There is a similar indented_using_on config (defaulted to True) which can be set to False to prevent the USING or ON clause from being indented, in which case the original SQL would become:
SELECT a, b FROM my_table JOIN another_table ON condition1 AND condition2
It's worth noting at this point, that for some users, the additional line break after ON is unexpected, and this is a good example of an implicit indent. By setting your config to:
[sqlfluff:indentation] indented_using_on = False allow_implicit_indents = True
Then the expected indentation will be:
SELECT a, b FROM my_table JOIN another_table ON condition1 AND condition2
There is also a similar indented_on_contents config (defaulted to True) which can be set to False to align any AND subsections of an ON block with each other. If set to False (assuming implicit indents are still enabled) the original SQL would become:
SELECT a, b FROM my_table JOIN another_table ON condition1 AND condition2
These can also be combined, so if indented_using_on config is set to False, indented_on_contents is also set to False, and allow_implicit_indents is set tot True then the SQL would become:
SELECT a, b FROM my_table JOIN another_table ON condition1 AND condition2
There is also a similar indented_ctes config (defaulted to False) which can be set to True to enforce CTEs to be indented within the WITH clause:
WITH some_cte AS ( SELECT 1 FROM table1 ), some_other_cte AS ( SELECT 1 FROM table1 ) SELECT 1 FROM some_cte
There is also a similar indented_then config (defaulted to True) which can be set to False to allow THEN without an indent after WHEN:
SELECT a, CASE WHEN b >= 42 THEN 1 ELSE 0 END AS c FROM some_table
By default, SQLFluff aims to follow the most common approach to indentation. However, if you have other versions of indentation which are supported by published style guides, then please submit an issue on GitHub to have that variation supported by SQLFluff.
The [sqlfluff:layout] section of the config controls the treatment of spacing and line breaks across all rules. The syntax of this section is very expressive; however in normal use, only very small alterations should be necessary from the Default Configuration.
The syntax of the section headings here select by type, which corresponds to the type defined in the dialect. For example the following section applies to elements of the type comma, i.e. ,.
[sqlfluff:layout:type:comma] spacing_before = touch line_position = trailing
Within these configurable sections there are a few key elements which are available:
[sqlfluff:layout:type:comma] line_position = leading
-- Setting line_position to just `alone` -- within [sqlfluff:layout:type:set_operator] -- would not allow: SELECT a UNION SELECT b; -- ...or... SELECT a UNION SELECT b; -- but *would* allow both of the following: SELECT a UNION SELECT b; SELECT a UNION SELECT b; -- However the default is set to `alone:strict` -- then the *only* acceptable configuration is: SELECT a UNION SELECT b;
Rules in SQLFluff are implemented as crawlers. These are entities which work their way through the parsed structure of a query to evaluate a particular rule or set of rules. The intent is that the definition of each specific rule should be really streamlined and only contain the logic for the rule itself, with all the other mechanics abstracted away. To understand how rules are enabled and disabled see Enabling and Disabling Rules.
Certain rules belong to the core rule group. In order for a rule to be designated as core, it must meet the following criteria:
Core rules can also make it easier to roll out SQLFluff to a team by only needing to follow a 'common sense' subset of rules initially, rather than spending time understanding and configuring all the rules, some of which your team may not necessarily agree with.
We believe teams will eventually want to enforce more than just the core rules, and we encourage everyone to explore all the rules and customize a rule set that best suites their organization.
See the Configuration section for more information on how to enable only core rules by default.
SQLFluff features inline error ignoring. For example, the following will ignore the lack of whitespace surrounding the * operator.
a.a*a.b AS bad_1 -- noqa: LT01
Multiple rules can be ignored by placing them in a comma-delimited list.
a.a * a.b AS bad_2, -- noqa: LT01, LT03
It is also possible to ignore non-rule based errors, and instead opt to ignore templating (TMP) & parsing (PRS) errors.
WHERE col1 = 2 AND dt >= DATE_ADD(CURRENT_DATE(), INTERVAL -2 DAY) -- noqa: PRS
NOTE:
Should the need arise, not specifying specific rules to ignore will ignore all rules on the given line.
a.a*a.b AS bad_3 -- noqa
Similar to pylint's "pylint" directive", ranges of lines can be ignored by adding -- noqa:disable=<rule>[,...] | all to the line. Following this directive, specified rules (or all rules, if "all" was specified) will be ignored until a corresponding -- noqa:enable=<rule>[,...] | all directive.
-- Ignore rule AL02 from this line forward SELECT col_a a FROM foo -- noqa: disable=AL02 -- Ignore all rules from this line forward SELECT col_a a FROM foo -- noqa: disable=all -- Enforce all rules from this line forward SELECT col_a a FROM foo -- noqa: enable=all
SQLFluff is designed to be flexible in supporting a variety of dialects. Not all potential dialects are supported so far, but several have been implemented by the community. Below are a list of the currently available dialects. Each inherits from another, up to the root ansi dialect.
For a canonical list of supported dialects, run the sqlfluff dialects command, which will output a list of the current dialects available on your installation of SQLFluff.
NOTE:
Consider when adding new features to a dialect:
This is the base dialect which holds most of the definitions of common SQL commands and structures. If the dialect which you're actually using isn't specifically implemented by SQLFluff, using this dialect is a good place to start.
This dialect doesn't intend to be brutal in adhering to (and only to) the ANSI SQL spec (mostly because ANSI charges for access to that spec). It aims to be a representation of vanilla SQL before any other project adds their spin to it, and so may contain a slightly wider set of functions than actually available in true ANSI SQL.
The dialect for Amazon Athena.
The dialect for Google BigQuery.
The dialect for ClickHouse.
The dialect Databricks.
The dialect for Db2.
The dialect for DuckDB.
The dialect for Exasol.
The dialect for Greenplum.
The dialect for Hive.
The dialect for Materialize.
The dialect for MySQL.
The dialect for Oracle SQL. Note: this does not include PL/SQL.
This is based around the PostgreSQL spec. Many other SQL instances are often based on PostreSQL syntax. If you're running an unsupported dialect, then this is often the dialect to use (until someone makes a specific dialect).
The dialect for Amazon Redshift.
The dialect for Snowflake, which has much of its syntax inherited from PostgreSQL.
The dialect for SOQL (Salesforce Object Query Language).
The dialect for Apache Spark SQL. It inherits from ANSI and includes relevant syntax from Greenplum for commands that permit Hive Format. Spark SQL extensions provided by the Delta Lake project are also implemented in this dialect.
This implementation focuses on the Ansi Compliant Mode introduced in Spark3, instead of being Hive Compliant. The introduction of ANSI Compliance provides better data quality and easier migration from traditional DBMS.
Versions of Spark prior to 3.x will only support the Hive dialect.
The dialect for SQLite.
The dialect for T-SQL (aka Transact-SQL).
The dialect for Teradata.
The dialect for Trino.
SQLFluff is designed to be used both as a utility for developers but also to be part of CI/CD pipelines.
A full list of Security Advisories is available on GitHub.
Given the context of how SQLFluff is designed to be used, there are three different tiers of access which users may have access to manipulate how the tool functions in a secure environment.
To disable this option entirely via the CLI:
$ sqlfluff lint my_path --library-path none
To disable this option entirely via the python API:
"""This is an example of providing config overrides.""" from sqlfluff.core import FluffConfig, Linter sql = "SELECT 1\n" config = FluffConfig( overrides={ "dialect": "snowflake", # NOTE: We explicitly set the string "none" here rather # than a None literal so that it overrides any config # set by any config files in the path. "library_path": "none", } ) linted_file = Linter(config=config).lint_string(sql) assert linted_file.get_violations() == []
The exit code provided by SQLFluff when run as a command line utility is designed to assist usefulness in deployment pipelines. If no violations are found then the exit code will be 0. If violations are found then a non-zero code will be returned which can be interrogated to find out more.
For projects with large amounts of (potentially imperfect) SQL code, the full SQLFluff output could be very large, which can be distracting -- perhaps the CI build for a one-line SQL change shouldn't encourage the developer to fix lots of unrelated quality issues.
To support this use case, SQLFluff integrates with a quality checking tool called diff-quality. By running SQLFluff using diff-quality (rather than running it directly), you can limit the the output to the new or modified SQL in the branch (aka pull request or PR) containing the proposed changes.
Currently, diff-quality requires that you are using git for version control.
NOTE: Installing SQLFluff automatically installs the diff_cover package that provides the diff-quality tool.
In your CI build script:
1. Set the current working directory to the git repository containing the SQL code to be checked.
$ diff-quality --violations sqlfluff
The output will look something like:
------------- Diff Quality Quality Report: sqlfluff Diff: origin/master...HEAD, staged and unstaged changes ------------- sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql (0.0%): sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql:5: Unquoted Identifiers must be consistently upper case. ------------- Total: 1 line Violations: 1 line % Quality: 0% -------------
These messages are basically the same as those provided directly by SQLFluff, although the format is a little different. Note that diff-quality only lists the line _numbers_, not the character position. If you need the character position, you will need to run SQLFluff directly.
For more information on diff-quality, see the documentation. It covers topics such as:
pre-commit is a framework to manage git "hooks" triggered right before a commit is made.
A git hook is a git feature to "fire off custom scripts" when specific actions occur.
Using pre-commit with SQLFluff is a good way to provide automated linting to SQL developers.
With pre-commit, you also get the benefit of only linting/fixing the files that changed.
SQLFluff comes with two pre-commit hooks:
WARNING:
Although it is not advised, you can tell SQLFluff to try and fix these files by overriding the fix_even_unparsable setting in .sqlfluff config file or using the sqlfluff fix --FIX-EVEN-UNPARSABLE command line option.
Overriding this behavior may break your SQL. If you use this override, always be sure to review any fixes applied to files with templating or parse errors to verify they are okay.
You should create a file named .pre-commit-config.yaml at the root of your git project, which should look like this:
repos: - repo: https://github.com/sqlfluff/sqlfluff rev: 2.3.5 hooks: - id: sqlfluff-lint # For dbt projects, this installs the dbt "extras". # You will need to select the relevant dbt adapter for your dialect # (https://docs.getdbt.com/docs/available-adapters): # additional_dependencies: ['<dbt-adapter>', 'sqlfluff-templater-dbt'] - id: sqlfluff-fix # Arbitrary arguments to show an example # args: [--rules, "LT02,CP02"] # additional_dependencies: ['<dbt-adapter>', 'sqlfluff-templater-dbt']
When trying to use the dbt templater, uncomment the additional_dependencies to install the extras. This is equivalent to running pip install <dbt-adapter> sqlfluff-templater-dbt.
You can specify the version of dbt-adapter used in pre-commit, for example:
additional_dependencies : ['dbt-bigquery==1.0.0', 'sqlfluff-templater-dbt']
See the list of available dbt-adapters.
Note that you can pass the same arguments available through the CLI using args:.
There are two way to utilize SQLFluff to annotate Github PRs.
For more information and examples on using SQLFluff in GitHub Actions, see the sqlfluff-github-actions repository.
SQLFluff accepts configuration either through the command line or through configuration files. There is rough parity between the two approaches with the exception that templating configuration must be done via a file, because it otherwise gets slightly complicated.
For details of what's available on the command line check out the CLI Reference.
For file based configuration SQLFluff will look for the following files in order. Later files will (if found) will be used to overwrite any values read from earlier files.
Within these files, the first four will be read like a cfg file, and SQLFluff will look for sections which start with sqlfluff, and where subsections are delimited by a semicolon. For example the jinjacontext section will be indicated in the section started with [sqlfluff:jinjacontext].
For example, a snippet from a .sqlfluff file (as well as any of the supported cfg file types):
[sqlfluff] templater = jinja sql_file_exts = .sql,.sql.j2,.dml,.ddl [sqlfluff:indentation] indented_joins = False indented_using_on = True template_blocks_indent = False [sqlfluff:templater] unwrap_wrapped_queries = True [sqlfluff:templater:jinja] apply_dbt_builtins = True
For the pyproject.toml file, all valid sections start with tool.sqlfluff and subsections are delimited by a dot. For example the jinjacontext section will be indicated in the section started with [tool.sqlfluff.jinjacontext].
For example, a snippet from a pyproject.toml file:
[tool.sqlfluff.core] templater = "jinja" sql_file_exts = ".sql,.sql.j2,.dml,.ddl" [tool.sqlfluff.indentation] indented_joins = False indented_using_on = True template_blocks_indent = False [tool.sqlfluff.templater] unwrap_wrapped_queries = True [tool.sqlfluff.templater.jinja] apply_dbt_builtins = True # For rule specific configuration, use dots between the names exactly # as you would in .sqlfluff. In the background, SQLFluff will unpack the # configuration paths accordingly. [tool.sqlfluff.rules.capitalisation.keywords] capitalisation_policy = "upper"
When setting up a new project with SQLFluff, we recommend keeping your configuration file fairly minimal. The config file should act as a form of documentation for your team i.e. a record of what decisions you've made which govern how your format your SQL. By having a more concise config file, and only defining config settings where they differ from the defaults - you are more clearly stating to your team what choices you've made.
However, there are also a few places where the default configuration is designed more for existing projects, rather than fresh projects, and so there is an opportunity to be a little stricter than you might otherwise be with an existing codebase.
Here is a simple configuration file which would be suitable for a starter project:
[sqlfluff] # Supported dialects https://docs.sqlfluff.com/en/stable/dialects.html # Or run 'sqlfluff dialects' dialect = snowflake # One of [raw|jinja|python|placeholder] templater = jinja # Comma separated list of rules to exclude, or None # See https://docs.sqlfluff.com/en/stable/configuration.html#enabling-and-disabling-rules # AM04 (ambiguous.column_count) and ST06 (structure.column_order) are # two of the more controversial rules included to illustrate usage. exclude_rules = ambiguous.column_count, structure.column_order # The standard max_line_length is 80 in line with the convention of # other tools and several style guides. Many projects however prefer # something a little longer. # Set to zero or negative to disable checks. max_line_length = 120 # CPU processes to use while linting. # The default is "single threaded" to allow easy debugging, but this # is often undesirable at scale. # If positive, just implies number of processes. # If negative or zero, implies number_of_cpus - specified_number. # e.g. -1 means use all processors but one. 0 means all cpus. processes = -1 # If using the dbt templater, we recommend setting the project dir. [sqlfluff:templater:dbt] project_dir = ./ [sqlfluff:indentation] # While implicit indents are not enabled by default. Many of the # SQLFluff maintainers do use them in their projects. allow_implicit_indents = True # The default configuration for aliasing rules is "consistent" # which will auto-detect the setting from the rest of the file. This # is less desirable in a new project and you may find this (slightly # more strict) setting more useful. [sqlfluff:rules:aliasing.table] aliasing = explicit [sqlfluff:rules:aliasing.column] aliasing = explicit [sqlfluff:rules:aliasing.length] min_alias_length = 3 # The default configuration for capitalisation rules is "consistent" # which will auto-detect the setting from the rest of the file. This # is less desirable in a new project and you may find this (slightly # more strict) setting more useful. # Typically we find users rely on syntax highlighting rather than # capitalisation to distinguish between keywords and identifiers. # Clearly, if your organisation has already settled on uppercase # formatting for any of these syntax elements then set them to "upper". # See https://stackoverflow.com/questions/608196/why-should-i-capitalize-my-sql-keywords-is-there-a-good-reason [sqlfluff:rules:capitalisation.keywords] capitalisation_policy = lower [sqlfluff:rules:capitalisation.identifiers] capitalisation_policy = lower [sqlfluff:rules:capitalisation.functions] extended_capitalisation_policy = lower [sqlfluff:rules:capitalisation.literals] capitalisation_policy = lower [sqlfluff:rules:capitalisation.types] extended_capitalisation_policy = lower
SQLFluff uses nesting in its configuration files, with files closer overriding (or patching, if you will) values from other files. That means you'll end up with a final config which will be a patchwork of all the values from the config files loaded up to that path. The exception to this is the value for templater, which cannot be set in config files in subdirectories of the working directory. You don't need any config files to be present to make SQLFluff work. If you do want to override any values though SQLFluff will use files in the following locations in order, with values from later steps overriding those from earlier:
This whole structure leads to efficient configuration, in particular in projects which utilise a lot of complicated templating.
In addition to configuration files mentioned above, SQLFluff also supports comment based configuration switching in files. This allows specific SQL file to modify a default configuration if they have specific needs.
When used, these apply to the whole file, and are parsed from the file in an initial step before the rest of the file is properly parsed. This means they can be used for both rule configuration and also for parsing configuration.
To use these, the syntax must start as an inline sql comment beginning with sqlfluff (i.e. -- sqlfluff). The line is then interpreted as a colon-seperated address of the configuation value you wish to set. A few common examples are shown below:
-- Set Indented Joins -- sqlfluff:indentation:indented_joins:True -- Set a smaller indent for this file -- sqlfluff:indentation:tab_space_size:2 -- Set keywords to be capitalised -- sqlfluff:rules:capitalisation.keywords:capitalisation_policy:upper SELECT * FROM a JOIN b USING(c)
We recommend only using this configuration approach for configuration that applies to one file in isolation. For configuration changes for areas of a project or for whole projects we recommend Nesting of configuration files.
Rules can be configured with the .sqlfluff config files.
Common rule configurations can be set in the [sqlfluff:rules] section.
For example:
[sqlfluff:rules] allow_scalar = True single_table_references = consistent unquoted_identifiers_policy = all
Rule specific configurations are set in rule specific subsections.
For example, enforce that keywords are upper case by configuring the rule CP01:
[sqlfluff:rules:capitalisation.keywords] # Keywords capitalisation_policy = upper
All possible options for rule sections are documented in Rules Reference.
For an overview of the most common rule configurations that you may want to tweak, see Default Configuration (and use Rules Reference to find the available alternatives).
The decision as to which rules are applied to a given file is applied on a file by file basis, by the effective configuration for that file. There are two configuration values which you can use to set this:
Each of these two configuration values accept a comma separated list of references. Each of those references can be:
These different references can be mixed within a given expression, which results in a very powerful syntax for selecting exactly which rules are active for a given file.
NOTE:
When considering configuration inheritance, each of rules and exclude_rules will totally overwrite any values in parent config files if they are set in a child file. While the subtraction operation between both of them is calculated "per file", there is no combination operation between two definitions of rules (just one overwrites the other).
The effect of this is that we recommend one of two approaches:
For example, to disable the rules LT08 and RF02:
[sqlfluff] exclude_rules = LT08, RF02
To enable individual rules, configure rules, respectively.
For example, to enable RF02:
[sqlfluff] rules = RF02
Rules can also be enabled/disabled by their grouping. Right now, the only rule grouping is core. This will enable (or disable) a select group of rules that have been deemed 'core rules'.
[sqlfluff] rules = core
More information about 'core rules' can be found in the Rules Reference.
Additionally, some rules have a special force_enable configuration option, which allows to enable the given rule even for dialects where it is disabled by default. The rules that support this can be found in the Rules Reference.
The default values can be seen in Default Configuration.
See also: Ignoring Errors & Files.
To keep displaying violations for specific rules, but not have those issues lead to a failed run, rules can be downgraded to warnings. Rules set as warnings won't cause a file to fail, but will still be shown in the CLI to warn users of their presence.
The configuration of this behaves very like exclude_rules above:
[sqlfluff] warnings = LT01, LT04
With this configuration, files with no other issues (other than those set to warn) will pass. If there are still other issues, then the file will still fail, but will show both warnings and failures.
== [test.sql] PASS L: 2 | P: 9 | LT01 | WARNING: Missing whitespace before + == [test2.sql] FAIL L: 2 | P: 8 | CP02 | Unquoted identifiers must be consistently upper case. L: 2 | P: 11 | LT01 | WARNING: Missing whitespace before +
This is particularly useful as a transitional tool when considering the introduction on new rules on a project where you might want to make users aware of issues without blocking their workflow (yet).
The [sqlfluff:layout] section of the config controls the treatment of spacing and line breaks across all rules. To understand more about this section, see the section of the docs dedicated to layout: Configuring Layout.
This section explains how to configure templating for SQL files.
When writing SQL files, users might utilise some kind of templating. The SQL file itself is written with placeholders which get rendered to proper SQL at run time. This can range from very simple placeholder templating to complex Jinja templating.
SQLFluff supports templated sections in SQL, see Stage 1, the templater. This is achieved by the following set of operations:
SQLFluff does not automatically have access to the same environment used in production template setup. This means it is necessary to either provide that environment or provide dummy values to effectively render the template and generate valid SQL. Refer to the templater sections below for details.
SQLFluff natively supports the following templating engines
Also, SQLFluff has an integration to use dbt as a templater.
NOTE:
For example, if the raw SQL uses a {% if condition %} block, the rendered version of the template will only include either the {% then %} or the {% else %} block (depending on the provided configuration for the templater), but not both.
In this case, because SQLFluff linting can only operate on the output of the templater, some areas of the raw SQL will never be seen by the linter and will not be covered by lint rules.
This is functionality we hope to support in future.
Variables are available in all the templaters. By default the templating engine will expect variables for templating to be available in the config, and the templater will be look in the section corresponding to the context for that templater. By convention, the config for the jinja templater is found in the sqlfluff:templater:jinja:context section, the config for the python templater is found in the sqlfluff:templater:python:context section, the one for the placeholder templater is found in the sqlfluff:templater:placeholder:context section.
For example, if passed the following .sql file:
SELECT {{ num_things }} FROM {{ tbl_name }} WHERE id > 10 LIMIT 5
...and the following configuration in .sqlfluff in the same directory:
[sqlfluff:templater:jinja:context] num_things=456 tbl_name=my_table
...then before parsing, the sql will be transformed to:
SELECT 456 FROM my_table WHERE id > 10 LIMIT 5
NOTE:
The Jinja templater uses Jinja2 to render templates.
There are multiple, complementary ways of configuring the Jinja templater.
Configuration | Variables | Macros | Filters | Documentation |
Config file | ✅ | ✅ | ❌ | Complex Jinja Variable Templating and Jinja Macro Templating (from config) |
Macro Path | ❌ | ✅ | ❌ | Jinja Macro Templating (from file) |
Library | ✅ | ✅ | ✅ | Library Templating |
For example, a snippet from a .sqlfluff file that uses all config options:
[sqlfluff] templater = jinja [sqlfluff:templater:jinja] apply_dbt_builtins = True load_macros_from_path = my_macros library_path = sqlfluff_libs [sqlfluff:templater:jinja:context] my_list = ['a', 'b', 'c'] MY_LIST = ("d", "e", "f") my_where_dict = {"field_1": 1, "field_2": 2} [sqlfluff:templater:jinja:macros] a_macro_def = {% macro my_macro(n) %}{{ n }} + {{ n * 2 }}{% endmacro %}
Apart from the Generic variable templating that is supported for all templaters, two more advanced features of variable templating are available for Jinja.
case sensitivity and native python types. Both are illustrated in the following example:
[sqlfluff:templater:jinja:context] my_list = ['a', 'b', 'c'] MY_LIST = ("d", "e", "f") my_where_dict = {"field_1": 1, "field_2": 2}
SELECT {% for elem in MY_LIST %} '{{elem}}' {% if not loop.last %}||{% endif %} {% endfor %} as concatenated_list FROM tbl WHERE {% for field, value in my_where_dict.items() %} {{field}} = {{value}} {% if not loop.last %}and{% endif %} {% endfor %}
...will render as...
SELECT 'd' || 'e' || 'f' as concatenated_list FROM tbl WHERE field_1 = 1 and field_2 = 2
Note that the variable was replaced in a case sensitive way and that the settings in the config file were interpreted as native python types.
Macros (which also look and feel like functions are available only in the jinja templater. Similar to Generic Variable Templating, these are specified in config files, what's different in this case is how they are named. Similar to the context section above, macros are configured separately in the macros section of the config. Consider the following example.
If passed the following .sql file:
SELECT {{ my_macro(6) }} FROM some_table
...and the following configuration in .sqlfluff in the same directory (note the tight control of whitespace):
[sqlfluff:templater:jinja:macros] a_macro_def = {% macro my_macro(n) %}{{ n }} + {{ n * 2 }}{% endmacro %}
...then before parsing, the sql will be transformed to:
SELECT 6 + 12 FROM some_table
Note that in the code block above, the variable name in the config is a_macro_def, and this isn't apparently otherwise used anywhere else. Broadly this is accurate, however within the configuration loader this will still be used to overwrite previous values in other config files. As such this introduces the idea of config blocks which could be selectively overwritten by other configuration files downstream as required.
In addition to macros specified in the config file, macros can also be loaded from files or folders. This is specified in the config file:
[sqlfluff:templater:jinja] load_macros_from_path = my_macros
load_macros_from_path is a comma-separated list of .sql files or folders. Locations are relative to the config file. For example, if the config file above was found at /home/my_project/.sqlfluff, then SQLFluff will look for macros in the folder /home/my_project/my_macros/ (but not subfolders). Any macros defined in the config will always take precedence over a macro defined in the path.
Note: The load_macros_from_path setting also defines the search path for Jinja include or import. Unlike with macros (as noted above), subdirectories are supported. For example, if load_macros_from_path is set to my_macros, and there is a file my_macros/subdir/my_file.sql, you can do:
{% include 'subdir/include_comment.sql' %}
NOTE:
REMEMBER: The reason SQLFluff supports macros is to enable it to parse templated sql without it being a blocker. It shouldn't be a requirement that the templating is accurate - it only needs to work well enough that parsing and linting are helpful.
One of the main use cases which inspired SQLFluff as a project was dbt. It uses jinja templating extensively and leads to some users maintaining large repositories of sql files which could potentially benefit from some linting.
NOTE:
To use the dbt templater, go to dbt templater.
SQLFluff anticipates this use case and provides some built in macro blocks in the Default Configuration which assist in getting started with dbt projects. In particular it provides mock objects for:
NOTE:
If using SQLFluff with jinja as your templater, you may have library function calls within your sql files that can not be templated via the normal macro templating mechanisms:
SELECT foo, bar FROM baz {{ dbt_utils.group_by(2) }}
To template these libraries, you can use the sqlfluff:jinja:library_path config option:
[sqlfluff:templater:jinja] library_path = sqlfluff_libs
This will pull in any python modules from that directory and allow sqlfluff to use them in templates. In the above example, you might define a file at sqlfluff_libs/dbt_utils.py as:
def group_by(n): return "GROUP BY 1,2"
If an __init__.py is detected, it will be loaded alongside any modules and submodules found within the library path.
SELECT {{ custom_sum('foo', 'bar') }}, {{ foo.bar.another_sum('foo', 'bar') }} FROM baz
sqlfluff_libs/__init__.py:
def custom_sum(a: str, b: str) -> str: return a + b
sqlfluff_libs/foo/__init__.py:
# empty file
sqlfluff_libs/foo/bar.py:
def another_sum(a: str, b: str) -> str: return a + b
Additionally, the library can be used to expose Jinja Filters to the Jinja environment used by SQLFluff.
This is achieve by setting a global variable named SQLFLUFF_JINJA_FILTERS. SQLFLUFF_JINJA_FILTERS is a dictionary where
For example, to make the Airflow filter ds available to SQLFLuff, add the following to the __init__.py of the library:
# https://github.com/apache/airflow/blob/main/airflow/templates.py#L53 def ds_filter(value: datetime.date | datetime.time | None) -> str | None: """Date filter.""" if value is None: return None return value.strftime("%Y-%m-%d") SQLFLUFF_JINJA_FILTERS = {"ds": ds_filter}
Now, ds can be used in SQL
SELECT "{{ "2000-01-01" | ds }}";
Ignoring Jinja templating errors provides a way for users to use SQLFluff while reducing or avoiding the need to spend a lot of time adding variables to [sqlfluff:templater:jinja:context].
When --ignore=templating is enabled, the Jinja templater behaves a bit differently. This additional behavior is usually but not always helpful for making the file at least partially parsable and fixable. It definitely doesn’t guarantee that every file can be fixed, but it’s proven useful for some users.
Here's how it works:
For example:
select {{ my_variable }} from {% include "my_table.sql" %}
is interpreted as:
select my_variable from my_table
The values provided by the Jinja templater act a bit (not exactly) like a mixture of several types:
Because the values behave like Undefined, it's possible to replace them using Jinja's default() filter. For example:
select {{ my_variable | default("col_a") }} from my_table
is interpreted as:
select col_a from my_table
Libraries such as SQLAlchemy or Psycopg use different parameter placeholder styles to mark where a parameter has to be inserted in the query.
For example a query in SQLAlchemy can look like this:
SELECT * FROM table WHERE id = :myid
At runtime :myid will be replace by a value provided by the application and escaped as needed, but this is not standard SQL and cannot be parsed as is.
In order to parse these queries is then necessary to replace these placeholders with sample values, and this is done with the placeholder templater.
Placeholder templating can be enabled in the config using:
[sqlfluff] templater = placeholder
A few common styles are supported:
-- colon WHERE bla = :my_name -- colon_nospaces -- (use with caution as more prone to false positives) WHERE bla = table:my_name -- numeric_colon WHERE bla = :2 -- pyformat WHERE bla = %(my_name)s -- dollar WHERE bla = $my_name or WHERE bla = ${my_name} -- question_mark WHERE bla = ? -- numeric_dollar WHERE bla = $3 or WHERE bla = ${3} -- percent WHERE bla = %s -- ampersand WHERE bla = &s or WHERE bla = &{s} or USE DATABASE MARK_{ENV}
These can be configured by setting param_style to the names above:
[sqlfluff:templater:placeholder] param_style = colon my_name = 'john'
then you can set sample values for each parameter, like my_name above. Notice that the value needs to be escaped as it will be replaced as a string during parsing. When the sample values aren't provided, the templater will use parameter names themselves by default.
When parameters are positional, like question_mark, then their name is simply the order in which they appear, starting with 1.
[sqlfluff:templater:placeholder] param_style = question_mark 1 = 'john'
In case you need a parameter style different from the ones above, you can pass a custom regex.
[sqlfluff:templater:placeholder] param_regex = __(?P<param_name>[\w_]+)__ my_name = 'john'
N.B. quotes around param_regex in the config are interpreted literally by the templater. e.g. param_regex='__(?P<param_name>[w_]+)__' matches '__some_param__' not __some_param__
the named parameter param_name will be used as the key to replace, if missing, the parameter is assumed to be positional and numbers are used instead.
Also consider making a pull request to the project to have your style added, it may be useful to other people and simplify your configuration.
Uses native Python f-strings. As described in Generic Variable Templating, an example usage would look be configured as follows:
If passed the following .sql file:
SELECT * FROM {tbl_name}
...and the following configuration in .sqlfluff in the same directory:
[sqlfluff] templater = python [sqlfluff:templater:python:context] tbl_name = my_table
...then before parsing, the sql will be transformed to:
SELECT * FROM my_table
NOTE:
dbt templating is still a relatively new feature added in 0.4.0 and is still in very active development! If you encounter an issue, please let us know in a GitHub issue or on the SQLFluff slack workspace.
dbt is not the default templater for SQLFluff (it is jinja). dbt is a complex tool, so using the default jinja templater will be simpler. You should be aware when using the dbt templater that you will be exposed to some of the complexity of dbt. Users may wish to try both templaters and choose according to how they intend to use SQLFluff.
A simple rule of thumb might be:
Pros:
Cons:
In order to get started using SQLFluff with a dbt project you will first need to install the relevant dbt adapter for your dialect and the sqlfluff-templater-dbt package using your package manager of choice (e.g. pip install dbt-postgres sqlfluff-templater-dbt) and then will need the following configuration:
In .sqlfluff:
[sqlfluff] templater = dbt
In .sqlfluffignore:
target/ # dbt <1.0.0 dbt_modules/ # dbt >=1.0.0 dbt_packages/ macros/
You can set the dbt project directory, profiles directory and profile with:
[sqlfluff:templater:dbt] project_dir = <relative or absolute path to dbt_project directory> profiles_dir = <relative or absolute path to the directory that contains the profiles.yml file> profile = <dbt profile> target = <dbt target>
NOTE:
To use builtin dbt Jinja functions SQLFluff provides a configuration option that enables usage within templates.
[sqlfluff:templater:jinja] apply_dbt_builtins = True
This will provide dbt macros like ref, var, is_incremental(). If the need arises builtin dbt macros can be customised via Jinja macros in .sqlfluff configuration file.
[sqlfluff:templater:jinja:macros] # Macros provided as builtins for dbt projects dbt_ref = {% macro ref(model_ref) %}{{model_ref}}{% endmacro %} dbt_source = {% macro source(source_name, table) %}{{source_name}}_{{table}}{% endmacro %} dbt_config = {% macro config() %}{% for k in kwargs %}{% endfor %}{% endmacro %} dbt_var = {% macro var(variable, default='') %}item{% endmacro %} dbt_is_incremental = {% macro is_incremental() %}True{% endmacro %}
If your project requires that you pass variables to dbt through command line, you can specify them in template:dbt:context section of .sqlfluff. See below configuration and its equivalent dbt command:
[sqlfluff:templater:dbt:context] my_variable = 1
dbt run --vars '{"my_variable": 1}'
You already know you can pass arguments (--verbose, --exclude-rules, etc.) through the CLI commands (lint, fix, etc.):
$ sqlfluff lint my_code.sql -v --exclude-rules LT08,RF02
You might have arguments that you pass through every time, e.g rules you always want to ignore. These can also be configured:
[sqlfluff] verbose = 1 exclude_rules = LT08,RF02
Note that while the exclude_rules config looks similar to the above example, the verbose config has an integer value. This is because verbose is stackable meaning there are multiple levels of verbosity that are available for configuration. See CLI Reference for more details about the available CLI arguments. For more details about rule exclusion, see Enabling and Disabling Rules.
Similar to flake8's ignore, individual lines can be ignored by adding -- noqa to the end of the line. Additionally, specific rules can be ignored by quoting their code or the category.
-- Ignore all errors SeLeCt 1 from tBl ; -- noqa -- Ignore rule CP02 & rule CP03 SeLeCt 1 from tBl ; -- noqa: CP02,CP03 -- Ignore all parsing errors SeLeCt from tBl ; -- noqa: PRS
Similar to pylint's "pylint" directive", ranges of lines can be ignored by adding -- noqa:disable=<rule>[,...] | all to the line. Following this directive, specified rules (or all rules, if "all" was specified) will be ignored until a corresponding -- noqa:enable=<rule>[,...] | all directive.
-- Ignore rule AL02 from this line forward SELECT col_a a FROM foo -- noqa: disable=AL02 -- Ignore all rules from this line forward SELECT col_a a FROM foo -- noqa: disable=all -- Enforce all rules from this line forward SELECT col_a a FROM foo -- noqa: enable=all
General categories of errors can be ignored using the --ignore command line option or the ignore setting in .sqlfluffignore. Types of errors that can be ignored include:
Similar to Git's .gitignore and Docker's .dockerignore, SQLFluff supports a .sqlfluffignore file to control which files are and aren't linted. Under the hood we use the python pathspec library which also has a brief tutorial in their documentation.
An example of a potential .sqlfluffignore placed in the root of your project would be:
# Comments start with a hash. # Ignore anything in the "temp" path /temp/ # Ignore anything called "testing.sql" testing.sql # Ignore any ".tsql" files *.tsql
Ignore files can also be placed in subdirectories of a path which is being linted and the sub files will also be applied within that subdirectory.
The default configuration is as follows, note the Builtin Jinja Macro Blocks in section [sqlfluff:templater:jinja:macros] as referred to above.
NOTE:
This is for two reasons:
If you are starting a fresh project and are looking for a good starter config, check out the New Project Configuration section above.
[sqlfluff] # verbose is an integer (0-2) indicating the level of log output verbose = 0 # Turn off color formatting of output nocolor = False # Supported dialects https://docs.sqlfluff.com/en/stable/dialects.html # Or run 'sqlfluff dialects' dialect = None # One of [raw|jinja|python|placeholder] templater = jinja # Comma separated list of rules to check, default to all rules = all # Comma separated list of rules to exclude, or None exclude_rules = None # Below controls SQLFluff output, see max_line_length for SQL output output_line_length = 80 # Number of passes to run before admitting defeat runaway_limit = 10 # Ignore errors by category (one or more of the following, separated by commas: lexing,linting,parsing,templating) ignore = None # Warn only for rule codes (one of more rule codes, seperated by commas: e.g. LT01,LT02) # Also works for templating and parsing errors by using TMP or PRS warnings = None # Whether to warn about unneeded '-- noqa:' comments. warn_unused_ignores = False # Ignore linting errors found within sections of code coming directly from # templated code (e.g. from within Jinja curly braces. Note that it does not # ignore errors from literal code found within template loops. ignore_templated_areas = True # can either be autodetect or a valid encoding e.g. utf-8, utf-8-sig encoding = autodetect # Ignore inline overrides (e.g. to test if still required) disable_noqa = False # Comma separated list of file extensions to lint # NB: This config will only apply in the root folder sql_file_exts = .sql,.sql.j2,.dml,.ddl # Allow fix to run on files, even if they contain parsing errors # Note altering this is NOT RECOMMENDED as can corrupt SQL fix_even_unparsable = False # Very large files can make the parser effectively hang. # The more efficient check is the _byte_ limit check which # is enabled by default. The previous _character_ limit check # is still present for backward compatibility. This will be # removed in a future version. # Set either to 0 to disable. large_file_skip_char_limit = 0 large_file_skip_byte_limit = 20000 # CPU processes to use while linting. # If positive, just implies number of processes. # If negative or zero, implies number_of_cpus - specified_number. # e.g. -1 means use all processors but one. 0 means all cpus. processes = 1 # Max line length is set by default to be in line with the dbt style guide. # https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md # Set to zero or negative to disable checks. max_line_length = 80 [sqlfluff:indentation] # See https://docs.sqlfluff.com/en/stable/layout.html#configuring-indent-locations indent_unit = space tab_space_size = 4 indented_joins = False indented_ctes = False indented_using_on = True indented_on_contents = True indented_then = True indented_then_contents = True allow_implicit_indents = False template_blocks_indent = True # This is a comma seperated list of elements to skip # indentation edits to. skip_indentation_in = script_content # If comments are found at the end of long lines, we default to moving # them to the line _before_ their current location as the convention is # that a comment precedes the line it describes. However if you prefer # comments moved _after_, this configuration setting can be set to "after". trailing_comments = before # Layout configuration # See https://docs.sqlfluff.com/en/stable/layout.html#configuring-layout-and-spacing [sqlfluff:layout:type:comma] spacing_before = touch line_position = trailing [sqlfluff:layout:type:binary_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:statement_terminator] spacing_before = touch line_position = trailing [sqlfluff:layout:type:end_of_file] spacing_before = touch [sqlfluff:layout:type:set_operator] line_position = alone:strict [sqlfluff:layout:type:start_bracket] spacing_after = touch [sqlfluff:layout:type:end_bracket] spacing_before = touch [sqlfluff:layout:type:start_square_bracket] spacing_after = touch [sqlfluff:layout:type:end_square_bracket] spacing_before = touch [sqlfluff:layout:type:start_angle_bracket] spacing_after = touch [sqlfluff:layout:type:end_angle_bracket] spacing_before = touch [sqlfluff:layout:type:casting_operator] spacing_before = touch spacing_after = touch:inline [sqlfluff:layout:type:slice] spacing_before = touch spacing_after = touch [sqlfluff:layout:type:dot] spacing_before = touch spacing_after = touch [sqlfluff:layout:type:comparison_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:assignment_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:object_reference] spacing_within = touch:inline [sqlfluff:layout:type:numeric_literal] spacing_within = touch:inline [sqlfluff:layout:type:sign_indicator] spacing_after = touch:inline [sqlfluff:layout:type:tilde] spacing_after = touch:inline [sqlfluff:layout:type:function_name] spacing_within = touch:inline spacing_after = touch:inline [sqlfluff:layout:type:array_type] spacing_within = touch:inline [sqlfluff:layout:type:typed_array_literal] spacing_within = touch [sqlfluff:layout:type:sized_array_type] spacing_within = touch [sqlfluff:layout:type:struct_type] spacing_within = touch:inline [sqlfluff:layout:type:bracketed_arguments] spacing_before = touch:inline [sqlfluff:layout:type:typed_struct_literal] spacing_within = touch [sqlfluff:layout:type:semi_structured_expression] spacing_within = touch:inline spacing_before = touch:inline [sqlfluff:layout:type:array_accessor] spacing_before = touch:inline [sqlfluff:layout:type:colon] spacing_before = touch [sqlfluff:layout:type:colon_delimiter] spacing_before = touch spacing_after = touch [sqlfluff:layout:type:path_segment] spacing_within = touch [sqlfluff:layout:type:sql_conf_option] spacing_within = touch [sqlfluff:layout:type:sqlcmd_operator] # NOTE: This is the spacing between the operator and the colon spacing_before = touch [sqlfluff:layout:type:comment] spacing_before = any spacing_after = any [sqlfluff:layout:type:pattern_expression] # Snowflake pattern expressions shouldn't have their spacing changed. spacing_within = any [sqlfluff:layout:type:placeholder] # Placeholders exist "outside" the rendered SQL syntax # so we shouldn't enforce any particular spacing around # them. spacing_before = any spacing_after = any [sqlfluff:layout:type:common_table_expression] # The definition part of a CTE should fit on one line where possible. # For users which regularly define column names in their CTEs they # may which to relax this config to just `single`. spacing_within = single:inline # By setting a selection of clauses to "alone", we hint to the reflow # algorithm that in the case of a long single line statement, the # first place to add newlines would be around these clauses. # Setting this to "alone:strict" would always _force_ line breaks # around them even if the line isn't too long. [sqlfluff:layout:type:select_clause] line_position = alone [sqlfluff:layout:type:where_clause] line_position = alone [sqlfluff:layout:type:from_clause] line_position = alone [sqlfluff:layout:type:join_clause] line_position = alone [sqlfluff:layout:type:groupby_clause] line_position = alone [sqlfluff:layout:type:orderby_clause] # NOTE: Order by clauses appear in many places other than in a select # clause. To avoid unexpected behaviour we use `leading` in this # case rather than `alone`. line_position = leading [sqlfluff:layout:type:having_clause] line_position = alone [sqlfluff:layout:type:limit_clause] line_position = alone # Template loop tokens shouldn't dictate spacing around them. [sqlfluff:layout:type:template_loop] spacing_before = any spacing_after = any [sqlfluff:templater] unwrap_wrapped_queries = True [sqlfluff:templater:jinja] apply_dbt_builtins = True # Some rules can be configured directly from the config common to other rules [sqlfluff:rules] allow_scalar = True single_table_references = consistent unquoted_identifiers_policy = all [sqlfluff:rules:capitalisation.keywords] # Keywords capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:capitalisation.identifiers] # Unquoted identifiers extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:capitalisation.functions] # Function names extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:capitalisation.literals] # Null & Boolean Literals capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:capitalisation.types] # Data Types extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:ambiguous.join] # Fully qualify JOIN clause fully_qualify_join_types = inner [sqlfluff:rules:ambiguous.column_references] # GROUP BY/ORDER BY column references group_by_and_order_by_style = consistent [sqlfluff:rules:aliasing.table] # Aliasing preference for tables aliasing = explicit [sqlfluff:rules:aliasing.column] # Aliasing preference for columns aliasing = explicit [sqlfluff:rules:aliasing.length] min_alias_length = None max_alias_length = None [sqlfluff:rules:aliasing.forbid] # Avoid table aliases in from clauses and join conditions. # Disabled by default for all dialects unless explicitly enabled. # We suggest instead using aliasing.length (AL06) in most cases. force_enable = False [sqlfluff:rules:convention.select_trailing_comma] # Trailing commas select_clause_trailing_comma = forbid [sqlfluff:rules:convention.count_rows] # Consistent syntax to count all rows prefer_count_1 = False prefer_count_0 = False [sqlfluff:rules:convention.terminator] # Semi-colon formatting approach multiline_newline = False require_final_semicolon = False [sqlfluff:rules:convention.blocked_words] # Comma separated list of blocked words that should not be used blocked_words = None blocked_regex = None match_source = False [sqlfluff:rules:convention.quoted_literals] # Consistent usage of preferred quotes for quoted literals preferred_quoted_literal_style = consistent # Disabled for dialects that do not support single and double quotes for quoted literals (e.g. Postgres) force_enable = False [sqlfluff:rules:convention.casting_style] # SQL type casting preferred_type_casting_style = consistent [sqlfluff:rules:references.from] # References must be in FROM clause # Disabled for some dialects (e.g. bigquery) force_enable = False [sqlfluff:rules:references.qualification] # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:references.consistent] # References must be consistently used # Disabled for some dialects (e.g. bigquery) force_enable = False [sqlfluff:rules:references.keywords] # Keywords should not be used as identifiers. unquoted_identifiers_policy = aliases quoted_identifiers_policy = none # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:references.special_chars] # Special characters in identifiers unquoted_identifiers_policy = all quoted_identifiers_policy = all allow_space_in_identifier = False additional_allowed_characters = None ignore_words = None ignore_words_regex = None [sqlfluff:rules:references.quoting] # Policy on quoted and unquoted identifiers prefer_quoted_identifiers = False prefer_quoted_keywords = False ignore_words = None ignore_words_regex = None force_enable = False [sqlfluff:rules:layout.long_lines] # Line length ignore_comment_lines = False ignore_comment_clauses = False [sqlfluff:rules:layout.select_targets] wildcard_policy = single [sqlfluff:rules:structure.subquery] # By default, allow subqueries in from clauses, but not join clauses forbid_subquery_in = join [sqlfluff:rules:structure.join_condition_order] preferred_first_table_in_join_clause = earlier
SQLFluff exposes a public api for other python applications to use. A basic example of this usage is given here, with the documentation for each of the methods below.
"""This is an example of how to use the simple sqlfluff api.""" from typing import Any, Dict, Iterator, List, Union import sqlfluff # -------- LINTING ---------- my_bad_query = "SeLEct *, 1, blah as fOO from mySchema.myTable" # Lint the given string and return an array of violations in JSON representation. lint_result = sqlfluff.lint(my_bad_query, dialect="bigquery") # lint_result = # [ # { # "code": "CP01", # "line_no": 1, # "line_pos": 1, # "description": "Keywords must be consistently upper case.", # } # ... # ] # -------- FIXING ---------- # Fix the given string and get a string back which has been fixed. fix_result_1 = sqlfluff.fix(my_bad_query, dialect="bigquery") # fix_result_1 = 'SELECT *, 1, blah AS foo FROM myschema.mytable\n' # We can also fix just specific rules. fix_result_2 = sqlfluff.fix(my_bad_query, rules=["CP01"]) # fix_result_2 = 'SELECT *, 1, blah AS fOO FROM mySchema.myTable' # Or a subset of rules... fix_result_3 = sqlfluff.fix(my_bad_query, rules=["CP01", "CP02"]) # fix_result_3 = 'SELECT *, 1, blah AS fOO FROM myschema.mytable' # -------- PARSING ---------- # Parse the given string and return a JSON representation of the parsed tree. parse_result = sqlfluff.parse(my_bad_query) # parse_result = {'file': {'statement': {...}, 'newline': '\n'}} # This JSON structure can then be parsed as required. # An example usage is shown below: def get_json_segment( parse_result: Dict[str, Any], segment_type: str ) -> Iterator[Union[str, Dict[str, Any], List[Dict[str, Any]]]]: """Recursively search JSON parse result for specified segment type. Args: parse_result (Dict[str, Any]): JSON parse result from `sqlfluff.fix`. segment_type (str): The segment type to search for. Yields: Iterator[Union[str, Dict[str, Any], List[Dict[str, Any]]]]: Retrieves children of specified segment type as either a string for a raw segment or as JSON or an array of JSON for non-raw segments. """ for k, v in parse_result.items(): if k == segment_type: yield v elif isinstance(v, dict): yield from get_json_segment(v, segment_type) elif isinstance(v, list): for s in v: yield from get_json_segment(s, segment_type) # e.g. Retrieve array of JSON for table references. table_references = list(get_json_segment(parse_result, "table_reference")) print(table_references) # [[{'identifier': 'mySchema'}, {'dot': '.'}, {'identifier': 'myTable'}]] # Retrieve raw table name from last identifier in the table reference. for table_reference in table_references: table_name = list(get_json_segment(parse_result, "naked_identifier"))[-1] print(f"table_name: {table_name}") # table_name: myTable
The simple API presents only a fraction of the functionality present within the core SQLFluff library. For more advanced use cases, users can import the Linter() and FluffConfig() classes from sqlfluff.core. As of version 0.4.0 this is considered as experimental only as the internals may change without warning in any future release. If you come to rely on the internals of SQLFluff, please post an issue on GitHub to share what you're up to. This will help shape a more reliable, tidy and well documented public API for use.
This page aims to act as a guide for migrating between major SQLFluff releases. Necessarily this means that bugfix releases, or releases requiring no change for the user are not mentioned. For full details of each individual release, see the detailed changelog.
This release changes some of the interfaces between SQLFluff core and our plugin ecosystem. The only breaking change is in the interface between SQLFluff and templater plugins (which are not common in the ecosystem, hence why this is only a minor and not a major release).
For all plugins, we also recommend a different structure for their imports (especially for rule plugins which are more common in the ecosystem) - for performance and stability reasons. Some users had been experiencing very long import times with previous releases as a result of the layout of plugin imports. Users with affected plugins will begin to see a warning from this release onward, which can be resolved for their plugin by updating to a new version of that plugin which follows the guidelines.
Templaters before this version would pass a make_template() callable to the slicing methods as part of being able to map the source file. This method would accept a str and return a jinja2.environment.Template object to allow the templater to render multiple variants of the template to do the slicing operation (which allows linting issues found in templated files to be mapped accurately back to their position in the unrendered source file). This approach is not very generalisable, and did not support templating operations with libraries other than jinja2.
As a result, we have amended the interface to instead pass a render_func() callable, which accepts a str and returns a str. This works fine for the jinja templater (and by extension the dbt templater) as they can simply wrap the original callable with a method that calls render() on the original Template object. It also however opens up the door to other templating engines, and in particular to remote templaters which might pass unrendered code over a HTTP connection for rendering.
Specifically:
We recommend that the module in a plugin which defines all of the hook implementations (anything using the @hookimpl decorator) must be able to fully import before any rule implementations are imported. More specifically, SQLFluff must be able to both import and run any implementations of get_configs_info() before any plugin rules (i.e. any derivatives of BaseRule) are imported. Because of this, we recommend that rules are defined in a separate module to the root of the plugin and then only imported within the get_rules() method.
Importing in the main body of the module was previously our recommendation and so may be the case for versions of some plugins. If one of your plugins does use imports in this way, a warning will be presented from this version onward, recommending that you update your plugin.
See the Developing Plugins section of the docs for an example.
Upgrading to 2.0 brings several important breaking changes:
To upgrade smoothly between versions, we recommend the following sequence:
To illustrate the points above, this is an illustrative example config for a 2.0 compatible project. Note that the config is fairly brief and sets only the values which differ from the default config.
[sqlfluff] dialect = snowflake templater = dbt max_line_length = 120 # Exclude some specific rules based on a mixture of codes and names exclude_rules = RF02, RF03, RF04, ST06, ST07, AM05, AM06, convention.left_join, layout.select_targets [sqlfluff:indentation] # Enabling implicit indents for this project. # See https://docs.sqlfluff.com/en/stable/layout.html#configuring-indent-locations allow_implicit_indents = True # Add a few specific rule configurations, referenced by the rule names # and not by the rule codes. [sqlfluff:rules:capitalisation.keywords] capitalisation_policy = lower [sqlfluff:rules:capitalisation.identifiers] capitalisation_policy = lower [sqlfluff:rules:capitalisation.functions] extended_capitalisation_policy = lower # An example of setting a custom layout specification which # is more lenient than default config. [sqlfluff:layout:type:set_operator] line_position = alone
This release brings several internal changes, and acts as a prelude to 2.0.0. In particular, the following config values have changed:
If any of these values have been set in your config, they will be automatically translated to the new values at runtime, and a warning will be shown. To silence the warning, update your config file to the new values. For more details on configuring layout see Configuring Layout.
This release brings several potentially breaking changes to the underlying parse tree. For users of the cli tool in a linting context you should notice no change. If however your application relies on the structure of the SQLFluff parse tree or the naming of certain elements within the yaml format, then this may not be a drop-in replacement. Specifically:
If using the python api, the parent type (such as identifier) will still register if you call .is_type("identifier"), as this function checks all inherited types. However the eventual type returned by .get_type()` will now be (in most cases) what used to be accessible at .name. The name attribute will be deprecated in a future release.
This release introduces the capability to automatically skip large files, and sets default limits on the maximum file size before a file is skipped. Users should see a performance gain, but may experience warnings associated with these skipped files.
It is recommended that the following is read in conjunction with exploring the codebase. dialect_ansi.py in particular is helpful to understand the recursive structure of segments and grammars. Some more detail is also given on our Wiki including a Contributing Dialect Changes guide.
At a high level, the behaviour of SQLFluff is divided into a few key stages. Whether calling sqlfluff lint, sqlfluff fix or sqlfluff parse, the internal flow is largely the same.
This stage only applies to templated SQL, most commonly Jinja and dbt. Vanilla SQL is sent straight to stage 2, the lexer.
In order to lint templated SQL, SQLFluff must first convert the 'raw' or pre-templated code into valid SQL, which can then be parsed. The templater returns both the raw and post-templated SQL so that any rule violations which occur in templated sections can be ignored and the rest mapped to their original line location for user feedback.
SQLFluff supports two templating engines: Jinja and dbt.
Under the hood dbt also uses Jinja, but in SQLFluff uses a separate mechanism which interfaces directly with the dbt python package.
For more details on how to configure the templater see Templating Configuration.
The lexer takes SQL and separates it into segments of whitespace and code. Where we can impart some high level meaning to segments, we do, but the result of this operation is still a flat sequence of typed segments (all subclasses of RawSegment).
The parser is arguably the most complicated element of SQLFluff, and is relied on by all the other elements of the tool to do most of the heavy lifting.
When working on the parser there are a couple of design principles to keep in mind.
Given the complete parse tree, rule classes check for linting errors by traversing the tree, looking for segments and patterns of concern. If the rule discovers a violation, it returns a LintResult pointing to the segment which caused the violation.
Some rules are able to fix the problems they find. If this is the case, the rule will return a list of fixes, which describe changes to be made to the tree. This can include edits, inserts, or deletions. Once the fixes have been applied, the updated tree is written to the original file.
Many rules supported by SQLFluff involve the spacing and layout of different elements, either to enforce a particular layout or just to add or remove code elements in a way sensitive to the existing layout configuration. The way this is achieved is through some centralised utilities in the sqlfluff.utils.reflow module.
This module aims to achieve several things: * Less code duplication by implementing reflow logic in only one place.
To support this, the module provides a ReflowSequence class which allows access to all of the relevant operations which can be used to reformat sections of code, or even a whole file. Unless there is a very good reason, all rules should use this same approach to ensure consistent treatment of layout.
Rules in SQLFluff are implemented as classes inheriting from BaseRule. SQLFluff crawls through the parse tree of a SQL file, calling the rule's _eval() function for each segment in the tree. For many rules, this allows the rule code to be really streamlined and only contain the logic for the rule itself, with all the other mechanics abstracted away.
Some rules are a poor fit for the simple traversal pattern described above. Typical reasons include:
These rules can override BaseRule's recurse_into field, setting it to False. For these rules False, _eval() is only called once, with the root segment of the tree. This can be much more efficient, especially on large files. For example, see rules LT13 and LT12 , which only look at the beginning or end of the file, respectively.
By default, SQLFluff calls _eval() for all segments, even "unparsable" segments, i.e. segments that didn't match the parsing rules in the dialect. This causes issues for some rules. If so, setting _works_on_unparsable to False tells SQLFluff not to call _eval() for unparsable segments and their descendants.
These are other fields on BaseRule. Rules can override them.
needs_raw_stack defaults to False. Some rules use RuleContext.raw_stack property to access earlier segments in the traversal. This can be useful, but it adds significant overhead to the linting process. For this reason, it is disabled by default.
There are two phases of rule running.
1. The main phase is appropriate for most rules. These rules are assumed to interact and potentially cause a cascade of fixes requiring multiple passes. These rules run the runaway_limit number of times (default 10).
2. The post phase is for post-processing rules, not expected to trigger any downstream rules, e.g. capitalization fixes. They are run in a post-processing loop at the end. This loop is identical to the main loop, but is only run 2 times at the end (once to fix, and once again to confirm no remaining issues).
The two phases add complexity, but they also improve performance by allowing SQLFluff to run fewer rules during the main phase, which often runs several times.
NOTE: post rules also run on the first pass of the main phase so that any issues they find will be presented in the list of issues output by sqlfluff fix and sqlfluff lint.
These newer modules provide a higher-level API for rules working with segments and slices. Rules that need to navigate or search the parse tree may benefit from using these. Eventually, the plan is for all rules to use these modules. As of December 30, 2021, 17+ rules use these modules.
The modules listed below are submodules of sqlfluff.utils.functional.
SQLFluff is extensible through "plugins". We use the pluggy library to make linting Rules pluggable, which enable users to implement rules that are just too "organization specific" to be shared, or too platform specific to be included in the core library.
NOTE:
Importing in the main body of the module was previously our recommendation and so may be the case for versions of some plugins. If one of your plugins does use imports in this way, a warning will be presented, recommending that you update your plugin.
# The root module will need to import `hookimpl`, but # should not yet import the rule definitions for the plugin. from sqlfluff.core.plugin import hookimpl @hookimpl def get_rules(): # Rules should be imported within the `get_rules` method instead from my_plugin.rules import MyRule return [MyRule]
We have an example plugin in sqlfluff/plugins/sqlfluff-plugin-example which you can use as a template for rules, or the sqlfluff/plugins/sqlfluff-templater-dbt which you can use as a template for templater plugins.
Currently, only Rules and Templaters can be added through plugins. Over time we expect more elements of SQLFluff will be extensible with plugins. Each plugin can implement multiple Rules or Templaters.
We recommend that the name of a plugin should start with "sqlfluff-" to be clear on the purpose of your plugin.
A plugin may need to include a default configuration if its rules are configurable: use plugin default configurations only for that reason! We advise against overwriting core configurations by using a default plugin configuration, as there is no mechanism in place to enforce precedence between the core library configs and plugin configs, and multiple plugins could clash.
A plugin Rule class name should have the structure: "Rule_PluginName_L000". The 'L' can be any letter and is meant to categorize rules; you could use the letter 'S' to denote rules that enforce security checks for example.
An important thing to note when running custom implemented rules: Run pip install -e ., inside the plugin folder so custom rules in linting are included.
A plugin Rule code includes the PluginName, so a rule "Rule_L000" in core will have code "L000", while "Rule_PluginName_L000" will have code "PluginName_L000". Codes are used to display errors, they are also used as configuration keys.
We make it easy for plugin developers to test their rules by exposing a testing library in sqlfluff.utils.testing.
Would you like to have other parts of SQLFluff be "pluggable"? Tell us about it in a GitHub issue 😄.
Want to find other people who are using SQLFluff in production use cases? Want to brag about how you're using it? Just want to show solidarity with the project and provide a testimonial for it?
Just add a section below by raising a PR on GitHub by editing this file ✏️.
We have a fast-growing community on Slack, come and join us!
Follow us On Twitter @SQLFluff for announcements and other related posts.
Alan Cruickshank
2024, Alan Cruickshank
February 22, 2024 |