Generating Custom Sequences in Postgres

Postgres, unlike MySQL, uses sequences for auto-incrementing IDs. It’s a very powerful feature, and it gives you a lot of control over how primary key values are generated and stored.

But say you want to generate a unique value for a field in a table, where this value is visible to the end user and always increments by a set amount, regardless of the primary key’s value. And, because this is a multi-tenant database, the primary key is guaranteed not to be sequential. You might also want to prefix this value with a short string indicating what it represents.

For example, you have an ecommerce platform and every order that is placed gets its own number. You don’t want to use a field like order_id directly as the order number, because you want to prefix each order number with the string ORD. A new account may be the millionth on the platform, and the order_id for its first order might be 1484002193, or a UUID, which is not something you would generally want to expose.

You can’t easily take advantage of Postgres’ native sequence handling to get this to work. To use a native sequence, you would have to create a new one for each new account, and presumably for each type of object you wanted to have a sequence for (orders, customer numbers, invoice numbers, etc). Postgres sequences are also not transactional, which is part of what guarantees they are globally unique: a sequence incremented by nextval() inside a transaction stays incremented even if the transaction is rolled back. And, finally, your userland code, not the database, would be responsible for turning that raw value into the prefixed, formatted number you actually want.
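
As a quick illustration of the transactional behavior, using a hypothetical sequence named order_number_seq:

CREATE SEQUENCE order_number_seq;

BEGIN;
SELECT nextval('order_number_seq');  -- returns 1
ROLLBACK;

SELECT nextval('order_number_seq');  -- returns 2; the rollback did not undo the increment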

Clearly, using Postgres sequences is not ideal. Let’s look at how we can create a small table and stored procedure to generate a sequence.

To begin, we’ll need a table to track sequence names, the account they are associated with, a prefix, and their next value.

CREATE TABLE account_sequence (
  sequence_id serial NOT NULL,
  account_id integer NOT NULL REFERENCES account (account_id) ON DELETE CASCADE,
  sequence text NOT NULL,
  prefix text NOT NULL DEFAULT ''::text,
  next_value integer NOT NULL DEFAULT 1,
  zero_pad integer,
  CONSTRAINT account_sequence_pkey PRIMARY KEY (sequence_id)
) WITH (OIDS=FALSE);

CREATE UNIQUE INDEX account_sequence_account_id_sequence_idx ON account_sequence (account_id, sequence);

This table, together with the unique index, forces each sequence name to be unique for each account. Thus, you could have one sequence named customer, one named order, and one named invoice. Additionally, it allows you to zero-pad the sequence value so you can generate sequences like 0001, 0002, 0003, 0004, etc.

You will notice we store the next value of the sequence rather than the current value. This makes it very easy to calculate the next sequence value, and lets you query the table to see what it will be without having to run the stored procedure that generates it. It also allows you to start the sequence at a value other than 1. A customer migrating to your platform could start their new order numbers at a much higher value to make it obvious which system an order was placed on.
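
For example, creating a new invoice sequence for an account is just an INSERT into this table. This is only a sketch: the account_id of 1039 and the starting value of 2005 are borrowed from the example output later in this post.

-- Create an "invoice" sequence for account 1039, prefixed with INV.
-- Starting next_value at 2005 is arbitrary; a migrating customer could start anywhere.
INSERT INTO account_sequence (account_id, sequence, prefix, next_value)
VALUES (1039, 'invoice', 'INV', 2005);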

Next, we’ll write a short stored procedure named generate_sequence() to generate the next value in a sequence.

CREATE OR REPLACE FUNCTION generate_sequence(_account_id integer, _sequence text) RETURNS TEXT AS $$
DECLARE
  _prefix text;
  _next_value text;
  _zero_pad integer;
  _value text;
BEGIN
  SELECT asq.prefix, asq.next_value::text, asq.zero_pad
  INTO _prefix, _next_value, _zero_pad
  FROM account_sequence asq
  WHERE asq.account_id = _account_id
    AND asq.sequence = _sequence
  FOR UPDATE;  -- lock the row so concurrent calls cannot read the same next_value

  -- Zero-pad the numeric portion before attaching the prefix.
  IF _zero_pad IS NOT NULL THEN
    _next_value := lpad(_next_value, _zero_pad, '0');
  END IF;

  _value := _prefix || _next_value;

  UPDATE account_sequence SET
    next_value = next_value + 1
  WHERE account_id = _account_id
    AND sequence = _sequence;

  RETURN _value;
END;
$$ LANGUAGE plpgsql;

The generate_sequence() stored procedure takes two arguments: an account_id and sequence. My preference is to prefix variable names in stored procedures with an underscore so you can easily differentiate them from column names.

The stored procedure begins by loading the necessary fields of the account_sequence record into several predeclared variables. The FOR UPDATE clause locks the row so two concurrent calls cannot hand out the same value.

Next, the return value is calculated. The next value is zero-padded if necessary, and the result, stored in _value, is the account_sequence record’s prefix concatenated with that padded value.

Finally, next_value is incremented and saved.

Using this sequence is simple.

SELECT generate_sequence(1039, 'invoice');
 generate_sequence
-------------------
 INV2005
(1 row)

Another nice property of the generate_sequence() stored procedure is that, unlike a native sequence, it is fully transactional. If you require that the sequence be truly incremental with no duplicates, you can generate a value inside a transaction: the row lock blocks other calls for the same sequence until the transaction is committed or rolled back, and rolling back does not consume the value.
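
Here is a rough sketch of how that plays out across two concurrent sessions, reusing the invoice sequence from earlier:

-- Session A
BEGIN;
SELECT generate_sequence(1039, 'invoice');  -- returns the next value and locks the row

-- Session B: blocks here until session A commits or rolls back
SELECT generate_sequence(1039, 'invoice');

-- Session A
COMMIT;  -- session B now unblocks and receives the following value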

This stored procedure simply returns NULL if the account or sequence could not be found. You could easily update it to RAISE an error if the sequence did not exist.
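
One way to do that, shown here as a small sketch rather than as part of the function above, is to check plpgsql’s FOUND flag immediately after the SELECT ... INTO:

  IF NOT FOUND THEN
    RAISE EXCEPTION 'No sequence named % exists for account %', _sequence, _account_id;
  END IF;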

And finally, because this code lives as a stored procedure, you can easily set it up as a pre-insert trigger on a table to generate the sequence value before a new row is inserted.
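
As a rough sketch of that approach (the invoice table and invoice_number column here are hypothetical, not part of the schema above), the trigger might look like this:

CREATE OR REPLACE FUNCTION set_invoice_number() RETURNS trigger AS $$
BEGIN
  -- Only generate a number when one was not supplied by the application.
  IF NEW.invoice_number IS NULL THEN
    NEW.invoice_number := generate_sequence(NEW.account_id, 'invoice');
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER invoice_number_before_insert
  BEFORE INSERT ON invoice
  FOR EACH ROW EXECUTE PROCEDURE set_invoice_number();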

Building your own sequence generator may seem clunky, but it gives you a great deal of flexibility when dealing with records that need a human-visible unique identifier.

Shoe Dog

I picked up the book Shoe Dog by Phil Knight near the end of 2016. I had seen it in bookstores, but ignored it until Bill Gates said it was one of the best books he had read all year.

Shoe Dog chronicles the rise and many near-falls of Nike. The book takes you from the very beginnings in the early 1960s to the mid-1980s, when Nike had cemented itself as one of the most dominant brands of all time.

Phil Knight started Nike with his running coach, Bill Bowerman, in the early 1960s. A single blog post can’t do justice to how fascinating the rise of Nike truly was.

For example, Knight would frequently travel to Japan to ensure the shoes were made properly (Nike started as a running shoe company only) and had to essentially bribe an employee of the shoe manufacturer (formerly Tiger, now Asics) to do business with him rather than a competitor. He also expanded his company much faster than it could handle. He let his employees buy a warehouse on the East Coast without having enough inventory to fill it or enough demand to justify it.

It seemed like the company was in a perpetual state of catastrophe, with Knight wanting to grow the company, and quickly finding there wasn’t enough money to pay his employees and suppliers. In one instance, he found his company was making over $70 million a year in revenue, and had less than $10,000 in the bank.

Knight is known for being very private, and the book didn’t cover much of his personal life. For example, only a page or two was dedicated to his late son Matt (with whom he appeared to have an estranged relationship). He discussed his first love, and his wife, but neither in much detail.

Unquestionably, Shoe Dog was my favorite book of 2016. It’s incredibly fast paced, the characters Knight assembled to build Nike were truly one of a kind, and regardless of what your aspirations are in life, it is well worth your time to read.

Day at the Cabin

My family and I spent the day at my in-laws’ cabin. It’s a relatively remote cabin backed by a large forest and several ponds for fishing.

It’s only 75 minutes outside of Dallas, yet provides a great escape from the world for a day. My boys haven’t stopped talking about it and can’t wait to go back. I love that we have a place they can visit to explore nature from time to time.

Unit Testing Your Service Layer is a Waste of Time

Writing unit tests for your application’s service layer is a waste of your time and won’t strengthen your application any more than functional tests would. Writing functional tests that thoroughly verify your application does what it says it will do is a much better use of your resources.

I first learned about test driven development (TDD) and automated testing in 2009 and was blown away. I immediately purchased the book xUnit Test Patterns and devoured it. I learned about code smells and fixtures and stubs and mocks and everything in between. It was amazing that I could actually verify that my application worked like I intended it to. Throw TDD into the mix, where I could guarantee I wrote only as much code as necessary to pass my tests, and I was hooked.

I immediately took an academic approach to testing. Invert every component’s control, mock every depended-on component (DOC), completely isolate and unit test everything: the database, other objects, even the file system.

For trivial applications, like a project hacked together on a weekend, this practice was fine, even if I felt like I spent as much time writing tests as I did writing the code itself.

Fast forward a few years and I have a successful consultancy and am working full time writing web and network applications. I’m still trying to write unit tests and follow TDD principles, but it just isn’t working.

For example, I would write both unit and functional tests for my application’s controllers. For a Symfony application, this is a non-trivial task, as the number of depended-on components in each controller can easily force your tests to be longer than the controller code itself. Breaking controllers into very small units is difficult, as they often handle UI interactions and must communicate with multiple components of the application.

Application Development Has Evolved

When TDD and automated testing became integrated with object-oriented programming, databases were big behemoths that required a full-time team to manage. File systems were very slow and not reliable. It made sense that your tests should be composed of small units that were tested in isolation: your components were unreliable!

Today, enterprise applications are contained in a single repository. You can mirror a production server on your insanely powerful laptop with a single Vagrantfile. Working with databases in your tests is a cinch (especially with tools to build out your fixtures in YAML). In all but a few instances, you generally don’t need to worry about a network device or disk drive failing during test suite execution. And modern frameworks employ multiple environments with a dependency container that makes it very simple to mock complex components in your test suite.

Application development has evolved, and the time spent writing unit tests that mock every component in your system does not offer the same dividends as writing a simple functional or integration test does.

Motivating Example

Let’s look at a small example of how writing a simple functional test can speed up development and still provide the same guarantees that your code works.

If you recall from my previous article on using the ORM, I am in the process of building an importer for an online store. I have the code written, and now I want to verify it is correct (I’ve thrown any notion of TDD out of the window).

In the first example, all depended-on components of the system under test (SUT) are mocked.

<?php

use MyApp\Library\Feeds\ProductImporter;

class ProductImporterTest extends PHPUnit_Framework_TestCase
{

    public function testImportingProducts()
    {
        // Imagine this was a sample array of data.
        $results = [];

        // Spend the next N lines mocking all of your DOCs.
        // getMock() returns the underlying mock object rather than the expectation.
        $db = Mockery::mock('Doctrine\DBAL\Connection')
            ->shouldReceive('fetchAll')
            ->andReturn($results)
            ->getMock();

        $importer = new ProductImporter($db);
        $importer->import('{"fake":"data"}');

        $this->assertEquals(4, $importer->getRecordCount());
        $this->assertFalse($importer->hasError());
    }

}

The first example is academic; it’s pure. It proves my code works, it’s fast, and it tests individual units. The second example is functional. I’m taking actual files, running them through my code, and seeing how the code responds.

<?php

use MyApp\Tests\TestCase;

class ProductImporterTest extends TestCase
{

    /**
     * @dataProvider providerProductFile
     */
    public function testImportingProducts($file, $recordCount, $hasError)
    {
        $fileContents = file_get_contents(__DIR__ . '/' . $file);

        $importer = $this->getContainer()
            ->get('my_app.product_importer');
        $importer->import($fileContents);

        $this->assertEquals($recordCount, $importer->getRecordCount());
        $this->assertEquals($hasError, $importer->hasError());
    }

    public function providerProductFile()
    {
        $provider = [
            ['products.invalid.json', 0, true],
            ['products.small.json', 1, false],
            ['products.large.json', 1000, false]
        ];

        return $provider;
    }

}

Another benefit of the second example is that the data provider can grow over time. Did a client deliver a malformed file and you want to see how the code responds? Throw it in the provider. Does a client think there’s an issue with your importer because their file matches your spec? Throw it in the provider. It makes it easy to find actual bugs that a unit test would completely miss.

When to Write Unit Tests

There are, of course, times when writing unit tests is necessary and writing functional tests may be impossible.

If you are writing a library or framework, it is wise to write unit tests. You don’t know how your code will actually be used, so having formal verification that your library classes do what they say they will is a must. It also adds a great deal of built-in documentation to your codebase, because a passing test, by definition, accurately documents your code.

Another time to write unit tests would be for a legacy codebase right before a refactoring. If the codebase is old enough, your DOCs may be very difficult to stand up in a functional test, so writing unit tests is the practical way to accurately capture the existing functionality of your application before you change it.

From the moment I started writing tests, I’ve attempted to get every developer at every company I’ve worked for to start writing tests without much luck. In retrospect, I feel if I had started with a “functional first” approach, I would have been more successful in my efforts. Introducing a developer to testing by way of writing simple functional tests may be the best bet to get all developers writing tests, period.

I Am a Great Programmer, But Horrible At Algorithms

This is an old post written in February 2013 and published on my old blog. I am publishing it here because it resonated with the community when it was first posted. The basic idea continues to flourish with the rise of sites like rejected.us and Max Howell’s very popular tweet.

I am a great programmer, but a horrible algorithmist. It is a thought that has been weighing on me heavily recently, and I’d like to gather other developers’ feelings on the subject as well.

I started what can be called my professional development career back in 1999. I was still in middle school, but my father hired me at his software company. My official duty was to make updates to our websites, but I mostly ended up bugging the other developers to help me learn.

From there I picked up Perl (somewhat) and then moved to PHP and front-end web development, where I have stayed comfortably for the last twelve years.

When it comes to building large scale systems, understanding the details of those systems, and actually writing them, I do very well. I can write elegant PHP code (believe me, it exists), and I really understand programming well. I do all the things a software craftsman does: write tests, automate as much as possible, learn new technologies, hone my craft with side work and open source work, and build systems that will scale with demand and customer requests.

I even have a degree in Computer Science from what I think is a great university.

However, I feel I am a horrible algorithmist.

Ask me to write a complex algorithm (even one that has already been discovered), and I start to get sweaty-palmed and nervous. Is this a symptom you have as well? To be able to express an algorithm in code, I really have to spend a lot of time understanding it first.

I understand that an algorithm is just a series of steps to solve a problem. I am referring to complex algorithms like sorting, recursive merging strategies, cryptography, and compression, to name a few.

My proudest college accomplishment was writing the A* algorithm for my first Data Structures and Algorithms class. I spent hours physically drawing graphs and keeping written tables of the heap that the nodes were being pushed onto and off of.

I even kept the drawings because I was so proud of them (click the links below to see the sketches).

What it boils down to is that I often have trouble seeing the underlying algorithm in a complex problem. I once interviewed with Amazon and did not make it past the second round because I could not see the underlying algorithm in one of the questions they asked me (the questions on overall architecture, however, I aced just fine). Fortunately, this is not something you either have or you don’t. Some programmers do have a natural ability to see the underlying algorithm in a problem, but if you cannot, it can be learned.

Am I alone in feeling this? Do other programmers struggle with this as well? Is this a manifestation of Imposter Syndrome? I thoroughly enjoyed college, but I did not study as hard as I should have. If you are a Computer Science major in college now and a lot of this does not come naturally, I urge you: please spend time on your studies. Really learn the algorithms presented in class. Even if you never actually use them during your career, they will at least help you feel more like a programmer.

Application Scalability Has Evolved

This is an old post written in November 2012 and published on my old blog. I am publishing it here because I believe most of the thoughts presented have come true. We have Vagrant and Docker for fast onboarding, modern frameworks allow applications to be built very rapidly, the rise of continuous integration tools shows that rapid release cycles and automated testing are popular, and the domination of Slack shows communication is as important as ever.

Application scalability has evolved. I think a lot of technologists, from developers to operations, believe that application scaling means to simply add more servers behind a load balancer and call it a day. That might have been the case when writing web applications was a new medium of software development in the late 1990s and early 2000s, but the last several years have completely changed what it means to write a scalable application.

Modern application scalability is all about speed. As competitive as writing software is today, speed is not only a feature, but a complete game changer.

On-Boarding New Developers

Writing a scalable application means you need to be able to allow new developers to work on it as quickly as possible. The delta from when a developer joins your organization or project to when they can begin hacking on it should be measured in minutes, not weeks, days, or even hours.

Have you ever started at a new company and not felt useful on the project you are working on until several weeks into working on it? I have. That application is not scalable. Ideally you should be able to arrive at work on your first day, open your new computer, clone the project you are working on, adjust some build settings, and immediately build, test, and deploy the project (locally).

A scalable application allows developers to work on their own local machines, running their own local databases, services, cron jobs, and other necessary applications. A scalable application allows developers to work entirely in their own sandbox.

Application Building

The ease with which you can build a new version of your application determines how scalable it is. A slow build time means you cannot respond well to increased load. Large applications (500,000 LOC or larger) should take several minutes to build. Smaller applications should take several seconds.

If you are constantly fighting the build process, and are afraid of deployments, your application is not scalable. Take the time to reduce the delta from when a change is made to when it is pushed to production. It is worth spending the time to build a configuration system so you can turn on and off features in production. This way, you can build out features, push them into production, and turn them on and off as needed. This is a powerful mechanism to reduce the number of errors you have after releasing code that has been under development for several months.

Release Cycles

A scalable application has rapid release cycles. Rapid release cycles surface bugs quickly and let you respond to new requirements faster. As soon as your code is written, has passed all of its tests, and has been reviewed, push it to production. Having tools already in place to monitor failures makes it easy to determine how your code is behaving in production. Your goal should be to release more high quality production code as fast as possible.

We now have better tools, development practices, and hardware; there is no reason to keep letting developers build in a silo for three months and then spend four sleepless days releasing their code.

Communication Styles

Teams must communicate well to build a scalable product. This includes non-development teams as well. Inefficient communication leads to poor product development. We have all been on email threads longer than a book. Information gets lost or disguised, and building a product that scales to your customers’ demands becomes more and more difficult.

Communication needs to be efficient to build a scalable product. Developers need to be able to communicate clearly and efficiently among themselves. That means quiet working conditions and the best tools possible. A developer can hopefully express herself most efficiently with code, so having good code review tools and practices in place is a must. A code review’s primary purpose is not to catch bugs; it is to spread knowledge.

How quickly can sales managers communicate production-ready features to your customers? The faster you can onboard new customers to your product, the more scalable your platform will become: you will find your pain points sooner and patch them faster. The Marketing and Development departments should work hand in hand. Everyone at the company is responsible for marketing the product, and everything they do markets the product.

Test Coverage

A scalable software platform has great test coverage. With a great suite of tests, developers can easily mimic different load scenarios with the product without having to release it into production first. They will immediately know if their changes affect the speed of their product negatively.

Customers pay your company for quality work. Releasing software without tests is not releasing quality work.

Spend the (short) amount of time it takes to have your developers learn testing. Spend the (short) amount of time it takes to write tests for your application. It will pay dividends in the end.

Ability to Change

Application scalability boils down to one thing: the ability of a team to change rapidly. Each of the sections above relates to rapid change. All of it can seem overwhelming if you are used to a slower development pace. As you add more of the above suggestions into your development flow, you will notice a snowball effect.

Each new change will piggyback on top of the previous one, and the time it takes to integrate it successfully into your team will be smaller than for the previous change. For example, getting your developers to write tests may take two weeks. But as soon as they are writing tests, their code reviews will increase in quality because they are reviewing both the code and the test changes. Suddenly, bugs that were not clear earlier scream off the screen. Knowledge will spread quicker, and developers will be on-boarded faster. Releases will take seconds rather than hours. Communication between developers and sales managers will increase because developers will be excited to show off their new working features.

Application scalability has evolved. To be a successful software engineer, you must evolve with it.

Customizing the precision in percent formatting in Sonata IntlBundle

By default, the number_format_percent filter in the Sonata IntlBundle formats numbers without a decimal fraction. For example, if the value passed to it is 0.1240, the resulting formatted string will be “12%”. If you want additional precision, you can customize it with the fraction_digits attribute like this:

{{ 0.1240|number_format_percent({'fraction_digits': 2}) }}

The resulting formatted string will be “12.40%”. This isn’t documented very well, and is a source of frustration if you want to add more precision.