Jon Morton's Blog

home

PostgreSQL: Don't Use CHAR or VARCHAR

20 Nov 2013

tl;dr: Use TEXT instead of VARCHAR or CHAR; TEXT has exactly the same performance and space usage characteristics while being much more flexible.


Earlier this week, while browsing some stack overflow questions about postgres' textual data types, I came across a link to a set of benchmarks that compare the performance of the CHAR, VARCHAR, and TEXT types in PostgreSQL — I highly recommend reading through the article if you have the time. It turns out that many variable length data types (and some that are fixed length, like CHAR) are represented internally with the same varlena data structure, which is simply a byte array prefixed with a 32-bit length field.

Traditionally, the CHAR(x) type, which stores strings of fixed length x, is to be used for data sets containing very short or known uniform length entries. In theory, this type could be implemented to provide better performance than variable length types in certain cases (like when scanning a table) because each entry resides at a multiple-of-x offset. The VARCHAR(x) type specifies a variable length string with with max length x. One benefit of requiring a max length is that the length of each entry could be represented by one byte (for VARCHAR(1) to VARCHAR(255)), or two bytes (for up to VARCHAR(65535)), which saves space - this is, in fact, exactly what MySQL does.

Clearly though, all of the above benefits do not apply to Postgres since we already know they're all represented by varlena, and the benchmarks I linked to above confirm that. Knowing this, it follows then that it's certainly never a good idea to use CHAR(x) over VARCHAR(x) — at best they behave exactly the same, and in most cases with CHAR you'll end up wasting a lot of space, since all data will be right-padded with spaces if it is shorter than the specified fixed length. This padding can also make certain string operations and search queries behave unexpectedly.

But what about VARCHAR vs TEXT?

Well, the obvious advantage of VARCHAR is that you can impose a DBMS-level length constraint on the data coming in — which, while very useful, also comes with signficant drawbacks. If you want to change the max length to be larger, postgres will have to rewrite the table, which can take a long time and requires an exclusive table lock for the entirety of the operation. Clearly not ideal in a live environment, especially for larger tables

EDIT: The above crossed out claims are not true since PG 9.2 if you are making the max larger, according to this HN comment. It's still true (without hacks) for making the constraint smaller, though.

If you were to impose these constraints in your application code instead, changing the max length would be a trivial commit.

However, it can still be useful to have lengths checked in postgres. Luckily there are plenty of other options that can be used with TEXT fields to acheive the same effect with similar performance: domains, constraints, and triggers come to mind.


EDIT: I suppose it'd probably be useful to provide an example of using TEXT and constraints to create fields with more flexibility than VARCHAR. Here's a code sample that creates a domain for storing usernames:

CREATE DOMAIN dom_username TEXT CHECK (
    LENGTH(VALUE) > 3 AND
    LENGTH(VALUE) < 200 AND
    VALUE ~ '^[A-Za-z][A-Za-z0-9]+$'
);

CREATE TABLE user (
  name dom_username,
  email TEXT
);

It should be obvious why this is leaps and bounds better than varchar: we get checks for min and max length, and we validate that the username is only alphanumeric starting with a letter. Also, it's super easy to change these constraints later.