tl;dr: Use TEXT instead of VARCHAR or CHAR; TEXT has exactly the same performance and space usage characteristics while being much more flexible.
Earlier this week, while browsing some Stack Overflow questions about
Postgres' textual data types, I came across a link to
a set of benchmarks that compare the performance of the
various textual types in PostgreSQL — I highly recommend reading through
the article if you have the time. It turns out that many variable length
data types (and some that are fixed length, like CHAR) are represented internally with the same
varlena data structure, which is simply a byte array prefixed with
a 32-bit length field.
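You can see this overhead directly with pg_column_size, which reports the stored size of a value including its varlena header (a quick sketch; note that short inline values actually use an abbreviated one-byte header rather than the full 32-bit field):

```sql
-- pg_column_size reports the total stored size, header included.
-- For a short string like 'abc', TEXT and VARCHAR typically report
-- 4 bytes (3 data bytes + 1 abbreviated header byte), while CHAR(10)
-- also stores the space padding.
SELECT pg_column_size('abc'::TEXT)        AS text_size,
       pg_column_size('abc'::VARCHAR(10)) AS varchar_size,
       pg_column_size('abc'::CHAR(10))    AS char_size;
```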
Postgres provides the CHAR(x) type, which stores strings of fixed length x and is meant
to be used for data sets containing very short or known uniform-length entries.
In theory, this type could be implemented to provide better performance than
variable length types in certain cases (like when scanning a table) because each
entry resides at a multiple-of-x offset.
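As a quick illustration of the fixed-length semantics (a sketch; the codes table and its column are made up for this example):

```sql
-- Hypothetical table with a fixed-length column.
CREATE TABLE codes (code CHAR(5));

INSERT INTO codes VALUES ('ab');  -- stored right-padded as 'ab   '

-- octet_length sees the stored padding (5 bytes here,
-- assuming a single-byte encoding).
SELECT code, OCTET_LENGTH(code) FROM codes;
```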
The VARCHAR(x) type specifies a variable-length string with
a max length of
x. One benefit of requiring a max length is that the length
of each entry could be represented by one byte (for
VARCHAR(255)), or two
bytes (for up to
VARCHAR(65535)), which saves space - this is, in fact,
exactly what MySQL does.
Clearly though, all of the above benefits do not apply to
Postgres, since we already know these types are all represented by
varlena, and the
benchmarks I linked to above confirm that. Knowing this, it
follows that it's certainly never a good idea to use
CHAR(x): at best it behaves exactly the same as TEXT, and in most cases
you'll end up wasting a lot of space, since all data will be
right-padded with spaces if it is shorter than the specified fixed length. This
padding can also make certain string operations and search queries behave
unexpectedly.
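Here's a quick sketch of the kind of surprise the padding causes (the results noted in comments reflect typical Postgres behavior):

```sql
SELECT
  'abc'::CHAR(10) = 'abc'       AS compares_equal, -- true: comparison ignores trailing spaces
  LENGTH('abc'::CHAR(10))       AS char_len,       -- 3: converting to text strips the spaces
  OCTET_LENGTH('abc'::CHAR(10)) AS stored_len;     -- 10: but the padding is really stored
```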
But what about VARCHAR(x)?
Well, the obvious advantage of
VARCHAR is that you can impose a DBMS-level
length constraint on the data coming in — which, while very useful, also
comes with significant drawbacks. If you want to change the max length,
postgres will have to rewrite the table, which can take a long time
and requires an exclusive table lock for the entirety of the operation. Clearly
not ideal in a live environment, especially for larger tables.
EDIT: The above claims about table rewrites are not true since PG 9.2 if you are making the max larger, according to this HN comment. It's still true (without hacks) for making the constraint smaller, though.
If you were to impose these constraints in your application code instead, changing the max length would be a trivial commit.
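Concretely, the change in question is a single ALTER (a sketch with hypothetical table and column names):

```sql
-- Widen the limit on an existing VARCHAR column.
-- On Postgres 9.2+ increasing the limit is a quick catalog change,
-- but decreasing it still forces the existing data to be re-checked.
ALTER TABLE users ALTER COLUMN name TYPE VARCHAR(500);
```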
However, it can still be useful to have lengths checked in postgres. Luckily
there are plenty of other options that can be used with
TEXT fields to
achieve the same effect with similar performance: domains, constraints, and
triggers come to mind.
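For instance, a named CHECK constraint on a TEXT column enforces the same limit a VARCHAR(x) would (a sketch; the posts table is made up):

```sql
CREATE TABLE posts (
  title TEXT CONSTRAINT title_length CHECK (LENGTH(title) <= 80)
);

-- Relaxing the limit later is just a drop-and-add of the constraint:
-- existing rows are re-validated, but the table is not rewritten.
ALTER TABLE posts DROP CONSTRAINT title_length;
ALTER TABLE posts ADD CONSTRAINT title_length CHECK (LENGTH(title) <= 120);
```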
EDIT: I suppose it'd probably be useful to provide an example of using TEXT and constraints to create fields with more flexibility than VARCHAR. Here's a code sample that creates a domain for storing usernames:
```sql
CREATE DOMAIN dom_username AS TEXT
  CHECK (
    LENGTH(VALUE) > 3 AND
    LENGTH(VALUE) < 200 AND
    VALUE ~ '^[A-Za-z][A-Za-z0-9]+$'
  );

CREATE TABLE users (
  name dom_username,
  email TEXT
);
```
It should be obvious why this is leaps and bounds better than varchar: we get checks for both min and max length, and we validate that the username is alphanumeric and starts with a letter. Also, it's super easy to change these constraints later.
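To make that last point concrete, here's what changing the domain's rules later might look like (a sketch; the anonymous CHECK above gets an auto-generated name, typically dom_username_check, which \dD in psql will confirm):

```sql
-- Swap the old rule for a looser one; existing rows are re-validated,
-- but no table rewrite is needed.
ALTER DOMAIN dom_username DROP CONSTRAINT dom_username_check;
ALTER DOMAIN dom_username ADD CONSTRAINT username_format
  CHECK (
    LENGTH(VALUE) > 3 AND
    LENGTH(VALUE) < 300 AND
    VALUE ~ '^[A-Za-z][A-Za-z0-9]+$'
  );
```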