Testing File Downloads with Guzzle

I have a class that does two things: it downloads a call recording as an MP3 from Twilio and then re-uploads it to Amazon S3 for permanent storage. This scenario happens frequently, and I wanted to be able to test it without making a real HTTP request to either Twilio or Amazon S3. In this article, I’ll explain the open source libraries I used, as well as how to configure them.

Libraries

To begin, I installed the following libraries (taken directly from my composer.json file):

  • "adlawson/vfs": "^0.12.1"
  • "guzzlehttp/guzzle": "^6.3"

The adlawson/vfs package is installed as part of the require-dev block rather than the require block because it is only necessary for your tests. VFS stands for virtual filesystem, and this library allows you to mount a virtual filesystem that your test code can write data to. The virtual filesystem can then be torn down at the end of the test without any impact on your actual filesystem. PHP has a powerful feature built into the language called “streams”. The VFS library makes use of streams to read and write files directly in memory, which mimics a real filesystem.
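As a quick illustration of streams themselves (plain PHP here, not the VFS library), the built-in php://temp stream behaves like a file but lives in memory:

```php
<?php

// Open an in-memory stream; it behaves exactly like a file handle.
$handle = fopen('php://temp', 'r+');

// Write to it and rewind, just as you would with a file on disk.
fwrite($handle, 'hello streams');
rewind($handle);

// Read the contents back out.
echo fgets($handle); // hello streams

fclose($handle);
```

The VFS library builds on this same stream-wrapper mechanism, registering its own vfs:// scheme so that ordinary filesystem functions like fopen(), is_file(), and unlink() transparently operate on in-memory nodes.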

Guzzle is an excellent HTTP client. It provides a clean wrapper for interacting with HTTP endpoints, but more importantly, it provides a way to mock HTTP requests and responses.

Code

Let’s start by looking at the code we need to test.

namespace MyApp\Library\Calls;

use MyApp\Entity\Call;
use MyApp\Library\Messengers\TwilioClient;
use MyApp\Library\Storage\Uploader;

use GuzzleHttp\Client as GuzzleClient;

class CallAudioProcessor
{

    /** @var GuzzleClient */
    private $guzzleClient;

    /** @var TwilioClient */
    private $twilioClient;

    /** @var Uploader */
    private $uploader;

    /** @var string */
    private $fileDirectory;

    public function __construct(
        TwilioClient $twilioClient,
        Uploader $uploader,
        GuzzleClient $guzzleClient,
        string $fileDirectory
    )
    {
        $this->twilioClient = $twilioClient;
        $this->uploader = $uploader;
        $this->guzzleClient = $guzzleClient;

        $this->fileDirectory = $fileDirectory;
    }

    public function processCallRecording(Call $call) : Call
    {
        // Attempt to get the recording from Twilio.
        $recording = $this->twilioClient->fetchCallRecording(
            $call->getObjectId()
        );

        // Generate the path where the recording file will be downloaded.
        $filePath = $this->fileDirectory . $recording['fileName'];

        $this->guzzleClient->request('GET', $recording['recordingUrl'], [
            'sink' => $filePath
        ]);

        if (is_file($filePath)) {
            $fileUrl = $this->uploader->uploadFile(
                $recording['fileName']
            );

            $call->setFileName($recording['fileName'])
                ->setFileSize(filesize($filePath))
                ->setFileUrl($fileUrl);

            unlink($filePath);
        }

        return $call;
    }

}

The test will also be responsible for mocking the TwilioClient and Uploader classes so they can return mocked data.

We’re interested in a test that exercises the processCallRecording() method. We want to test the successful path, that is: a recording was found and successfully uploaded to a remote storage location.

Test

Now that our system under test, or SUT, is properly defined, we can take a look at the unit test that runs it.

namespace MyApp\Tests\Calls;

use MyApp\Entity\Call;
use MyApp\Library\Messengers\TwilioClient;
use MyApp\Library\Storage\Uploader;

use GuzzleHttp\Client as GuzzleClient;
use GuzzleHttp\Handler\MockHandler as GuzzleMockHandler;
use GuzzleHttp\HandlerStack as GuzzleHandlerStack;
use GuzzleHttp\Psr7\Response as GuzzleResponse;

use Vfs\FileSystem as VirtualFileSystem;
use Vfs\Node\File as VirtualFile;

use Faker\Factory as Faker;

use PHPUnit\Framework\TestCase;

class CallAudioProcessorTest extends TestCase
{

    public function testProcessingCallRecording()
    {
        $faker = Faker::create();

        // Create the mocks for the TwilioClient and Uploader.
        $twilioClient = $this->createMock(TwilioClient::class);
        $twilioClient->expects($this->once())->method('fetchCallRecording')->willReturn([
            'fileName' => sprintf('%s.mp3', md5(uniqid())),
            'recordingUrl' => $faker->url
        ]);

        $uploader = $this->createMock(Uploader::class);
        $uploader->expects($this->once())->method('uploadFile')->willReturn($faker->url);

        // Mount the VFS. In a non-test environment, the $fileDirectory
        // might be something like /var/data/files, but in the test,
        // it is the URL to a stream created by the VFS.
        $fileDirectory = 'vfs://';

        $vfs = VirtualFileSystem::factory($fileDirectory);
        $vfs->mount();

        // Create the mock for the GuzzleClient.
        $guzzleMockHandler = new GuzzleMockHandler([
            new GuzzleResponse(200)
        ]);

        $guzzleClient = new GuzzleClient([
            'handler' => GuzzleHandlerStack::create(
                $guzzleMockHandler
            )
        ]);

        $callAudioProcessor = new CallAudioProcessor(
            $twilioClient, $uploader, $guzzleClient, $fileDirectory
        );

        $call = new Call;
        
        $this->assertNull($call->getFileName());
        $this->assertNull($call->getFileUrl());

        $call = $callAudioProcessor->processCallRecording($call);

        $this->assertNotNull($call->getFileName());
        $this->assertNotNull($call->getFileUrl());

        // Important to unmount the VFS
        // so other tests can use it.
        $vfs->unmount();
    }

}

The test starts by mocking the TwilioClient and Uploader classes. These are straightforward mocks that have methods expected to be called once and that will return some basic information.

After the mocks are created, the VFS is mounted. You can make up any mount point you want; I use vfs://, which I think makes sense. Once mounted, the mock GuzzleClient is created. The native PHPUnit mocking library is not used here; instead, the mock handler provided by Guzzle is used. Though it is not terribly important in this test, the mock handler from Guzzle provides greater flexibility in how your tests behave, which is why I prefer it over a PHPUnit mock.

At this point, our fixtures are defined properly and we’re ready to execute the test itself. You can see that a new instance of the CallAudioProcessor class is instantiated with the mocks we’ve built as arguments to the constructor.

Note: If this were a functional test and you were instantiating the CallAudioProcessor from a dependency injection container (like the one used in Symfony), you would have to provide setters that allow you to override the depended-on components (DoCs) passed into the constructor.

Finally, the method we’re testing, processCallRecording(), is called, and we assert that the code worked as intended.

Important: You must unmount the VFS at the end of the test! If you fail to do this, any subsequent test using the same VFS scheme (vfs:// in this example) will fail. Behind the scenes, the VFS library destroys the stream when unmount() is called. If this is not done, a subsequent test will attempt to re-create an identical stream, and PHP will complain.

Conclusion

Interacting with the filesystem is always a pain-point when writing tests (regardless of the test type: unit or functional). A virtual filesystem is a beautiful way to alleviate this pain, and it works great for all test types (my functional tests make use of them extensively too).

Generating Custom Sequences in Postgres

Postgres, unlike MySQL, uses sequences for auto-incrementing IDs. It’s a very powerful feature, and gives you a lot of control over how to store a primary key.

But say you want to generate a unique value for a field in a table, where this value is visible to the end user and always increments by a set amount, regardless of the primary key’s value. And, because this is a multi-tenant database, the primary key is guaranteed not to be sequential. You might also want to prefix this value with a short string indicating what it represents.

For example, say you have an ecommerce platform and every order that is placed gets its own number. You don’t want to just use a field like order_id as the order number, because you want to prefix each order number with the string ORD. A new account may be the millionth on the platform, so the order_id for their first order could be 1484002193, or a UUID: not something you would generally want to expose.

You can’t easily take advantage of Postgres’ native sequence handling to get this to work. To use a native sequence, you would have to create a new one for each new account, and ostensibly for each type of object you wanted to have a sequence for (orders, customer numbers, invoice numbers, etc). Postgres sequences are also not bound by a transaction, which is what ensures they are globally unique (a sequence incremented by nextval() in a transaction is still incremented even if the transaction is rolled back). And, finally, your userland code would be responsible for generating the next sequence value, not the database.
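You can see the non-transactional behavior for yourself in psql (a sketch, assuming a scratch database where you can create a throwaway sequence):

```sql
CREATE SEQUENCE demo_seq;

BEGIN;
SELECT nextval('demo_seq'); -- returns 1
ROLLBACK;

-- The rollback did not restore the sequence:
SELECT nextval('demo_seq'); -- returns 2, not 1
```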

Clearly, using native Postgres sequences is not ideal here. Let’s look at how we can create a small table and stored procedure to generate a sequence ourselves.

To begin, we’ll need a table to track sequence names, the account they are associated with, a prefix, and their next value.

CREATE TABLE account_sequence (
  sequence_id serial NOT NULL,
  account_id integer NOT NULL REFERENCES account (account_id) ON DELETE CASCADE,
  sequence text NOT NULL,
  prefix text NOT NULL DEFAULT ''::text,
  next_value integer NOT NULL DEFAULT 1,
  zero_pad integer,
  CONSTRAINT account_sequence_pkey PRIMARY KEY (sequence_id)
) WITH (OIDS=FALSE);

CREATE UNIQUE INDEX account_sequence_account_id_sequence_idx ON account_sequence (account_id, sequence);

This table forces each sequence name to be unique for each account. Thus, you could have one sequence named customer, one named order, and one named invoice. Additionally, it allows you to zero-pad the sequence value so you could generate sequences like 0001, 0002, 0003, 0004, etc.

You will notice we store the next value of the sequence rather than the current value. This makes it very easy to calculate the next sequence value, and allows you to easily query the table to see what it will be without having to run the stored procedure to generate the sequence. It also allows you to start the sequence at a value other than 1. A customer migrating to your platform could start their new order numbers at a much higher value to easily indicate where an order was placed.
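For example, seeding a new sequence for a migrating customer might look like this (a sketch; the account_id and starting value are made up):

```sql
INSERT INTO account_sequence (account_id, sequence, prefix, next_value, zero_pad)
VALUES (1039, 'order', 'ORD', 50000, NULL);
```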

Next, we’ll write a short stored procedure named generate_sequence() to generate the next value in a sequence.

CREATE OR REPLACE FUNCTION generate_sequence(_account_id integer, _sequence text) RETURNS TEXT AS $$
DECLARE
  _prefix text;
  _next_value text;
  _zero_pad integer;
  _value text;
BEGIN
  SELECT asq.prefix, asq.next_value::text, asq.zero_pad
  INTO _prefix, _next_value, _zero_pad
  FROM account_sequence asq
  WHERE asq.account_id = _account_id
    AND asq.sequence = _sequence
  FOR UPDATE;

  -- Zero-pad the numeric portion before attaching the prefix
  -- so the prefix is never truncated by lpad().
  IF _zero_pad IS NOT NULL THEN
    _next_value := lpad(_next_value, _zero_pad, '0');
  END IF;

  _value := _prefix || _next_value;

  UPDATE account_sequence SET
    next_value = next_value + 1
  WHERE account_id = _account_id
    AND sequence = _sequence;

  RETURN _value;
END;
$$ LANGUAGE plpgsql;

The generate_sequence() stored procedure takes two arguments: an account_id and sequence. My preference is to prefix variable names in stored procedures with an underscore so you can easily differentiate them from column names.

The stored procedure begins by loading in the necessary fields of the account_sequence record into several predefined variables.

Next, the return value is calculated and zero padded if necessary. The return value, stored in _value, is generated by concatenating the prefix of the account_sequence record with the next_value field.

Finally, the next value of the sequence is calculated and saved.

Using this sequence is simple.

SELECT generate_sequence(1039, 'invoice');
 generate_sequence
-------------------
 INV2005
(1 row)

Another nice property of the generate_sequence() stored procedure is that it is transactional. If you require that the sequence be truly incremental, you can generate a sequence value in a transaction, and the row lock taken on the account_sequence record blocks other sequence generation for that row until the transaction is committed or rolled back.

This stored procedure simply returns NULL if the account or sequence cannot be found. You could easily update it to RAISE an error instead.
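A sketch of that change: after the SELECT INTO, plpgsql sets the special FOUND variable, so a guard can raise immediately when no row matched.

```sql
  IF NOT FOUND THEN
    RAISE EXCEPTION 'No sequence named % for account %', _sequence, _account_id;
  END IF;
```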

And finally, because this code lives in the database as a stored procedure, you can easily set it up as a BEFORE INSERT trigger on a table to generate a sequence value before a new row is inserted.
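A sketch of such a trigger (the orders table, its account_id and order_number columns, and the trigger names are assumptions for illustration):

```sql
CREATE OR REPLACE FUNCTION set_order_number() RETURNS trigger AS $$
BEGIN
  -- Populate the human-visible order number before the row is written.
  NEW.order_number := generate_sequence(NEW.account_id, 'order');
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_number_trigger
BEFORE INSERT ON orders
FOR EACH ROW EXECUTE PROCEDURE set_order_number();
```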

Building your own sequence generator may seem clunky, but it gives you a great deal of flexibility when dealing with records that need a human-visible unique identifier.

Unit Testing Your Service Layer is a Waste of Time

Writing unit tests for your application’s service layer is a waste of your time and won’t strengthen your application any more than functional tests would. Writing functional tests that thoroughly verify your application does what it says it does is a much better use of your resources.

I first learned about test driven development (TDD) and automated testing in 2009 and was blown away. I immediately purchased the book xUnit Test Patterns and devoured it. I learned about code smells and fixtures and stubs and mocks and everything in-between. It was amazing that I could actually verify that my application works like I intended it to. Throw TDD in the mix where I could guarantee that I only write as much code as necessary to pass my tests and I was hooked.

I immediately took an academic approach to testing: invert every component’s control, mock every depended-on component (DoC), and completely isolate and unit test everything, including the database, other objects, even the file system.

For trivial applications like a project hacked together on a weekend, this practice was fine. I felt like I spent an equal amount of time writing tests as I did writing the code itself.

Fast forward a few years and I have a successful consultancy and am working full time writing web and network applications. I’m still trying to write unit tests and follow TDD principles, but it just isn’t working.

For example, I would write both unit and functional tests for my application’s controllers. For a Symfony application, this is a non-trivial task, as the number of depended-on components in each controller can easily force your tests to be longer than the controller code itself. Breaking controllers into very small units is difficult, as they often handle UI interactions and must communicate with multiple components of the application.

Application Development Has Evolved

When TDD and automated testing became integrated with object oriented programming, databases were big behemoths that required a full-time team to manage. File systems were slow and unreliable. It made sense that your tests should be composed of small units tested in isolation: your components were unreliable!

Today, an enterprise application is often contained in a single repository. You can mirror a production server on your insanely powerful laptop with a single Vagrantfile. Working with databases in your tests is a cinch (especially with tools to build out your fixtures in YAML). In all but a few instances, you generally don’t need to worry about a network device or disk drive failing during test suite execution. And modern frameworks employ multiple environments with a dependency container that makes it very simple to mock complex components in your test suite.

Application development has evolved, and the time spent writing unit tests that mock every component in your system does not offer the same dividends as writing a simple functional or integration test does.

Motivating Example

Let’s look at a small example of how writing a simple functional test can speed up development and still provide the same guarantees that your code works.

If you recall from my previous article on using the ORM, I am in the process of building an importer for an online store. I have the code written, and now I want to verify it is correct (I’ve thrown any notion of TDD out of the window).

In the first example, all depended on components of the system under test (SUT) are mocked.

<?php

use MyApp\Library\Feeds\ProductImporter;

class ProductImporterTest extends PHPUnit_Framework_TestCase
{

    public function testImportingProducts()
    {
        // Imagine this was a sample array of data.
        $results = [];

        // Spend the next N lines mocking all of your DoCs.
        $db = Mockery::mock('Doctrine\DBAL\Connection');
        $db->shouldReceive('fetchAll')
            ->andReturn($results);

        $importer = new ProductImporter($db);
        $importer->import('{"fake":"data"}');

        $this->assertEquals(4, $importer->getRecordCount());
        $this->assertFalse($importer->hasError());
    }

}

The first example is academic; it’s pure. It proves my code works, it’s fast, and it tests individual units. The second example is functional: I take actual files, run them through my code, and see how the code responds.

<?php

use MyApp\Tests\TestCase;

class ProductImporterTest extends TestCase
{

    /**
     * @dataProvider providerProductFile
     */
    public function testImportingProducts($file, $recordCount, $hasError)
    {
        $fileContents = file_get_contents(__DIR__ . '/' . $file);

        $importer = $this->getContainer()
            ->get('my_app.product_importer')
            ->import($fileContents);

        $this->assertEquals($recordCount, $importer->getRecordCount());
        $this->assertEquals($hasError, $importer->hasError());
    }

    public function providerProductFile()
    {
        $provider = [
            ['products.invalid.json', 0, true],
            ['products.small.json', 1, false],
            ['products.large.json', 1000, false]
        ];

        return $provider;
    }

}

Another benefit of the second example is that the data provider can grow over time. Client delivered a malformed file and you want to see how the code responds? Throw it in the provider. Client thinks there’s an issue with your importer because their file matches your spec? Throw it in the provider. It makes it easy to find actual bugs that a unit test would completely ignore.

When to Write Unit Tests

There are, of course, times when writing unit tests is necessary and writing functional tests may be impossible.

If you are writing a library or framework, it is wise to write unit tests. You don’t know how your code will actually be used, so having formal verification that your library classes do what they say they will is a must. It also adds a great deal of built-in documentation to your codebase, because a passing test, by definition, accurately documents your code.

Another time to write unit tests would be for a legacy codebase right before a refactoring. If the codebase is old enough, your DoCs may be very difficult to work with, and thus writing unit tests will accurately capture the existing functionality of your application.

From the moment I started writing tests, I’ve attempted to get every developer at every company I’ve worked for to start writing tests without much luck. In retrospect, I feel if I had started with a “functional first” approach, I would have been more successful in my efforts. Introducing a developer to testing by way of writing simple functional tests may be the best bet to get all developers writing tests, period.

I Am a Great Programmer, But Horrible At Algorithms

This is an old post written in February 2013 and published on my old blog. I am publishing it here because it resonated with the community when it was first posted. The basic idea continues to flourish with the rise of sites like rejected.us and Max Howell’s very popular tweet.

I am a great programmer, but a horrible algorithmist. It is a thought that has been weighing on me heavily recently, and I’d like to gather other developers’ feelings on the subject as well.

I started what can be called my professional development career back in 1999. I was still in middle school, but my father hired me at his software company. My official duty was to make updates to our websites, but I mostly ended up bugging the other developers to help me learn.

From there I picked up Perl (somewhat) and then moved to PHP and front end web development where I have stayed comfortably for the last twelve years.

When it comes to building large scale systems, understanding the details of those systems, and actually writing them, I do very well. I can write elegant PHP code (believe me, it exists), and I really understand programming well. I do all the things a software craftsman does: write tests, automate as much as possible, learn new technologies, hone my craft with side work and open source work, and build systems that will scale with demand and customer requests.

I even have a degree in Computer Science from what I think is a great university.

However, I feel I am a horrible algorithmist.

Ask me to write a complex algorithm (even one that has already been discovered), and I start to get sweaty-palmed and nervous. Is this a symptom you have as well? To be able to express an algorithm in code, I have to spend a lot of time understanding it first.

I understand that an algorithm is just a series of steps to complete a problem. I am referring to complex algorithms like sorting, recursive merging strategies, cryptography, and compression, to name a few.

My proudest college accomplishment was writing the A* algorithm for my first Data Structures and Algorithms class. I spent hours physically drawing graphs and keeping written tables of the heap that the nodes were being pushed onto and off of.

I even kept the drawings because I was so proud of them.

What it boils down to is that I often have trouble seeing the underlying algorithm in a complex problem. I once interviewed with Amazon and did not make it past the second round because I could not see the underlying algorithm in one of the questions they asked me (the questions on overall architecture, however, I aced just fine). Fortunately, this is not something you either have or do not have. Some programmers have a natural ability to see the underlying algorithm in a problem, but if you cannot, it can be learned.

Am I alone in feeling this? Do other programmers struggle with this as well? Is this a manifestation of Imposter Syndrome? I thoroughly enjoyed college, but I did not study as hard as I should have. If you are a Computer Science major in college now and a lot of this does not come naturally, I urge you: please spend time on your studies. Really learn the algorithms presented in class. If you never actually use them during your career, at least it will help you feel more like a programmer.

Application Scalability Has Evolved

This is an old post written in November 2012 and published on my old blog. I am publishing it here because I believe most of the thoughts presented have become true. We have Vagrant and Docker for fast onboarding, modern frameworks allow applications to be built very rapidly, the rise of continuous integration tools shows that rapid release cycles and automated testing are popular, and the domination of Slack shows communication is as important as ever.

Application scalability has evolved. I think a lot of technologists, from developers to operations, believe that application scaling means to simply add more servers behind a load balancer and call it a day. That might have been the case when writing web applications was a new medium of software development in the late 1990s and early 2000s, but the last several years have completely changed what it means to write a scalable application.

Modern application scalability is all about speed. With as competitive as writing software is today, speed is not only a feature, but a complete game changer.

On-Boarding New Developers

Writing a scalable application means you need to be able to allow new developers to work on it as quickly as possible. The delta between when a developer joins your organization or project to when they can begin hacking on it should be measured in minutes, not weeks, days, or even hours.

Have you ever started at a new company and not felt useful on the project you were working on until several weeks in? I have. That application is not scalable. Ideally you should be able to arrive at work on your first day, open your new computer, clone the project you are working on, adjust some build settings, and immediately build, test, and deploy the project (locally).

A scalable application allows developers to work on their own local machines, running their own local databases, services, cron jobs, and other necessary applications. A scalable application allows developers to work entirely in their own sandbox.

Application Building

The ease with which you can build a new version of your application determines how scalable it is. A slow build means you cannot respond well to increased load. Large applications (500,000 LOC or more) should take no more than several minutes to build. Smaller applications should take several seconds.

If you are constantly fighting the build process and are afraid of deployments, your application is not scalable. Take the time to reduce the delta between when a change is made and when it is pushed to production. It is also worth spending the time to build a configuration system so you can turn features on and off in production. This way, you can build out features, push them to production, and enable them as needed. This is a powerful mechanism for reducing the number of errors you see after releasing code that has been under development for several months.
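A minimal sketch of such a configuration system in PHP (the class, flag names, and in-memory store are assumptions for illustration; a real system would read flags from a database or configuration service):

```php
<?php

class FeatureFlags
{
    /** @var array Map of flag name => enabled state. */
    private $flags;

    public function __construct(array $flags)
    {
        $this->flags = $flags;
    }

    /** Unknown flags default to off, so new code ships dark. */
    public function isEnabled(string $name): bool
    {
        return $this->flags[$name] ?? false;
    }
}

$flags = new FeatureFlags(['new_checkout' => true]);

if ($flags->isEnabled('new_checkout')) {
    // Run the newly released code path.
}
```

Because the flag lookup defaults to off, a feature can be merged and deployed long before it is enabled, and disabled again instantly if it misbehaves in production.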

Release Cycles

A scalable application has rapid release cycles. Rapid release cycles find bugs quickly and respond to new requirements faster. As soon as your code is written, has passed all of its tests, and been reviewed, push it to production. Having tools already in place to monitor failures makes it easy to determine how your code is operating in production. Your goal should be to release more high quality production code as fast as possible.

We now have better tools, development practices, and hardware than to continue letting developers build in a silo for three months and then spend four sleepless days releasing their code.

Communication Styles

Teams must communicate well to build a scalable product. This includes non-development teams as well. Inefficient communication leads to poor product development. We have all been members of email threads longer than a book. Information is lost or disguised, and building a product that scales to your customers’ demands becomes more and more difficult.

Communication needs to be efficient to build a scalable product. Developers need to be able to communicate clearly and efficiently among themselves. That means quiet working conditions and the best tools possible. A developer can express herself most efficiently with code, so having good code review tools and practices in place is a must. A code review’s primary purpose is not to catch bugs; it is to spread knowledge.

How quickly can sales managers communicate production-ready features to your customers? The faster you can onboard new customers, the faster you will find your pain points and patch them, which helps you build a scalable platform. The Marketing and Development departments should work hand in hand. Everyone at the company is responsible for marketing the product, and everything they do markets it.

Test Coverage

A scalable software platform has great test coverage. With a great suite of tests, developers can easily mimic different load scenarios without having to release to production first. They will immediately know if their changes negatively affect the speed of the product.

Customers pay your company for quality work. Releasing software without tests is not releasing quality work.

Spend the (short) amount of time it takes to have your developers learn testing. Spend the (short) amount of time it takes to write tests for your application. It will pay dividends in the end.

Ability to Change

Application scalability boils down to one thing: the ability of a team to change rapidly. Each of the sections above relates to rapid change. All of it can seem overwhelming if you are used to a slower development pace. As you add more of the above suggestions into your development flow, you will notice a snowball effect.

Each new change will piggyback on top of the previous one, and the time it takes to integrate it into your team will be smaller than for the previous change. For example, getting your developers to write tests may take two weeks. But as soon as they are writing tests, their code reviews will increase in quality because they are reviewing both the code and the test changes. Suddenly, bugs that were not clear earlier scream out from the screen. Knowledge will spread quicker, and developers will be on-boarded faster. Releases will take seconds rather than hours. Communication between developers and sales managers will increase because developers will be excited to show off their new working features.

Application scalability has evolved. To be a successful software engineer, you must evolve with it.