Case-sensitivity in Aurora PostgreSQL
By Brian
When moving from SQL Server to PostgreSQL a common issue is case sensitivity. PostgreSQL 12 adds support for nondeterministic collations. These are my notes from testing various scenarios in Aurora. In my opinion CITEXT is still the best option despite native support.
Create some test data
First, Create a new Case Sensitive and Case Insensitive collation. Technically these exist under other names, but I thought his was more clear in the examples. The important part is the deterministic flag. This is Unicode speak for compare the bytes (true) or compare the logical values (false).
|
|
Then, I use the collations in a table definition.
|
|
Finally, I insert some data with mixed case.
|
|
Look at the impact on comparisons
First, search for data without the custom collation. Note only one row is returned.
|
|
And, again with the custom collation. Note both rows are returned.
|
|
Alternatively, you could just specify the collation at the time of the search rather than defining the column with this collation.
|
|
Cool that seems to work as expected.
What about LIKE statements?
PSQL supports a LIKE for case-sensitive and ILIKE case-insensitive comparisons. This was true before v12 and nothing has changed. So, LIKE will return only 2 rows
|
|
While, ILIKE will return 4 rows.
|
|
However, you cannot use either LIKE or ILIKE on name 2 with the non-deterministic (i.e. case-insensitive) collation. It throws an error. Ugh.
|
|
Therefore, you must explicitly specify a deterministic collation.
|
|
That just feels clunky.
What about CITEXT?
Recent versions of Aurora (event before v12) include the CITEXT extension. You can enable it for a database like this.
|
|
And then you can create a field of type CITEXT.
|
|
Add some data
|
|
Then you can perform case-insensitive comparisons.
|
|
If also supports and use ILIKE as expected.
|
|
Generally, that just feels more natural. So in the case of text fields, I still prefer the CITEXT type.
What about schema objects?
PostgreSQL uses lowercase schema names by default. For example, let’s create a table with Pascal Case names common in SQL Server.
|
|
Note that PostgreSQL ignored the case, except where I quote the fields. Therefore, all three of these work.
|
|
Also notice that the SuperHero is stored as superhero in the result. But, SideKick was left as is because of the double quotes.
|
|
However, because I specified a specific case in the definition, I must query the table the same way. If I try to insert data without double quoting SideKick, PostgreSQL will assume lowercase and it will fail.
|
|
If I specify a schema object with a specific case, I must always refer to that object with quotes.
|
|
Therefore, it’s probably best to never use quotes. Some visual design tools always add the quotes so be careful of these. If you are using one of these tools try to always use lowercase schema object names.