The Performance Factor in Event Sourcing: What You Need to Know

One of the most frequently asked questions when discussing event sourcing is:
Doesn't it take ages to load all the events? This article answers that question in all its facets. First, we need to split it into two different cases: read and write.

The Writing Side: Aggregate

When events are created in the aggregate, they are stored in the event store, which operates as an append-only log. When an aggregate is loaded, all of its events are queried and replayed to reconstruct the aggregate's state. This often raises the question: will loading a long-lived aggregate with hundreds or even thousands of events become a performance bottleneck?
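
To make the replay mechanics concrete, here is a minimal sketch of an event-sourced aggregate. The ProfileId and ProfileCreated classes, the #[Apply] attribute and the recordThat() helper are assumptions for illustration; the key point is that every stored event is passed back through an apply method to rebuild the in-memory state.

use Patchlevel\EventSourcing\Aggregate\BasicAggregateRoot;
use Patchlevel\EventSourcing\Attribute\Aggregate;
use Patchlevel\EventSourcing\Attribute\Apply;
use Patchlevel\EventSourcing\Attribute\Id;

#[Aggregate('profile')]
final class Profile extends BasicAggregateRoot
{
    #[Id]
    private ProfileId $id;

    private string $name;

    public static function create(ProfileId $id, string $name): self
    {
        $profile = new self();

        // Recording an event appends it to the stream and applies it right away.
        $profile->recordThat(new ProfileCreated($id, $name));

        return $profile;
    }

    // On load, every stored event is replayed through its matching apply method
    // to rebuild the current in-memory state of the aggregate.
    #[Apply]
    public function applyProfileCreated(ProfileCreated $event): void
    {
        $this->id = $event->profileId;
        $this->name = $event->name;
    }
}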

Append-Only Log

Writes

In most cases, writes are initially much faster than with a traditional normalized table structure, because the event store table carries only a handful of indexes. Normalized tables often have unique keys and many foreign keys, and every one of these extra indexes and constraints must be checked on each write operation. On the other hand, because an event store accumulates a lot of entries - millions of rows - its unique constraint will slow writes down over time. It takes a long time to reach that threshold, and if you do hit it, you can still shorten your stream by splitting it.
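
For intuition, here is a simplified sketch of what such an append-only table could look like. The column names and types are illustrative assumptions, not the library's actual schema; the point is that each insert only has to maintain a primary key and a single unique constraint.

use Doctrine\DBAL\Connection;

// Illustrative only: a simplified append-only event store table, not the real schema.
function createIllustrativeEventStore(Connection $db): void
{
    $db->executeStatement(<<<'SQL'
        CREATE TABLE IF NOT EXISTS event_store (
            id BIGSERIAL PRIMARY KEY,                    -- global, ever-increasing position
            aggregate VARCHAR NOT NULL,                  -- e.g. "profile"
            aggregate_id UUID NOT NULL,
            playhead INT NOT NULL,                       -- position inside the aggregate stream
            event VARCHAR NOT NULL,                      -- event name, e.g. "profile.created"
            payload JSONB NOT NULL,
            recorded_on TIMESTAMPTZ NOT NULL,
            archived BOOLEAN NOT NULL DEFAULT FALSE,     -- used later for stream splitting
            UNIQUE (aggregate, aggregate_id, playhead)   -- the one expensive constraint on writes
        );
        SQL);
}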

Updates

Due to the immutability of the event stream, we normally do not perform any updates on the table. If we did, there would be no real difference from a classical table approach: updates are quite fast when indexes are used - otherwise, they can get slow.

Deletes

Deleting entries from the database almost always triggers a rebalancing of the B-tree, the default index structure in most databases. This costs time, and since we normally do not delete entries from our event store, it does not affect us when using event sourcing. You could even count this as a performance gain compared to the traditional approach.

Reads

Now to the most interesting part of this section: reading all events for an aggregate. Reading from the database is naturally fast; if you ever encounter slow queries, the cause is usually multiple joins or missing indexes.

But what can we do if an aggregate lives so long that it accumulates so many events that loading it becomes slow? Even though this is rarely the case, there are solutions to the problem. Let's dive into two of them.

Cache the Aggregate: Snapshots

The most frequently mentioned solution to this problem is probably snapshotting. This technique is a form of caching in which the current state of the aggregate is serialized and saved to a persistent store, together with the aggregate's current playhead. The snapshot is then loaded and deserialized, and afterward only the events that occurred after the snapshot was created are applied to the aggregate. For our library, the configuration can be set up quickly using the #[Snapshot] attribute, which requires the cache pool name. There is also the option to configure how many events should trigger a cache renewal by specifying a batch amount. In the example below, a snapshot is automatically generated and saved after 1000 events - you don't need to do this yourself.

use Patchlevel\EventSourcing\Aggregate\BasicAggregateRoot;
use Patchlevel\EventSourcing\Attribute\Aggregate;
use Patchlevel\EventSourcing\Attribute\Snapshot;

#[Aggregate('profile')]
#[Snapshot('default', batch: 1000)]
final class Profile extends BasicAggregateRoot
{
    // ...
}

As with every cache, there are situations where we need to invalidate it. This happens when we update the aggregate code, for example by removing or adding a property. Invalidation is necessary because the serialized aggregate stored in the cache is no longer in sync with the current class. It can be done by bumping the version, which can also be configured through the attribute. This makes it effortless to invalidate the cache during deployment.

use Patchlevel\EventSourcing\Aggregate\BasicAggregateRoot;
use Patchlevel\EventSourcing\Attribute\Aggregate;
use Patchlevel\EventSourcing\Attribute\Snapshot;

#[Aggregate('profile')]
#[Snapshot('default', version: '2')]
final class Profile extends BasicAggregateRoot
{
    // ...
}

To be honest, this technique is rarely used in real applications. Why? Loading 10,000 events for an aggregate takes only 50ms in our benchmarks. This is already quite fast, and an aggregate that accumulates so many events in its lifetime is rare. However, for these cases, you could use the snapshot cache to improve loading times.

Note

We benchmark every PR via GitHub Actions using PostgreSQL to ensure no performance degradation slips through. You can check an example here.

Aggregate Lifecycle: Split Stream

There is also a more natural way for an aggregate to reduce its loading time. In most businesses, certain events mark the beginning of a new cycle, such as a contract renewal. We can utilize these events to shorten the stream for the aggregate. This is done by marking the event with the #[SplitStream] attribute. When this event is recorded and saved, all past events are marked as archived. This results in loading only the events starting from the one that split the stream.

use Patchlevel\EventSourcing\Attribute\Event;
use Patchlevel\EventSourcing\Attribute\SplitStream;

#[Event('customer.contract_renewed')]
#[SplitStream]
final class ContractRenewed
{
    public function __construct(
        // contract renewal data
        public CustomerId $customerId,
        public \DateTimeImmutable $until,
        // other aggregate data
        public string $name,
    ) {
    }
}

As the comments in the code example indicate, the event must carry not only its own data but also all the data from the aggregate at that point in time. This is logical, because the aggregate will now start loading from this event onward. Therefore, all past information gathered up to the stream split must be present in this event - otherwise, the aggregate would operate with incomplete data.
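
To illustrate the point, here is a hedged sketch of how the aggregate side could consume such an event. The Customer aggregate and the #[Apply] method shown are assumptions for illustration only.

use Patchlevel\EventSourcing\Aggregate\BasicAggregateRoot;
use Patchlevel\EventSourcing\Attribute\Aggregate;
use Patchlevel\EventSourcing\Attribute\Apply;

#[Aggregate('customer')]
final class Customer extends BasicAggregateRoot
{
    private CustomerId $customerId;
    private \DateTimeImmutable $contractUntil;
    private string $name;

    // Because the stream now starts at ContractRenewed, this apply method must
    // restore all state the aggregate needs going forward - including data such
    // as the name, which was originally recorded by earlier (now archived) events.
    #[Apply]
    public function applyContractRenewed(ContractRenewed $event): void
    {
        $this->customerId = $event->customerId;
        $this->contractUntil = $event->until;
        $this->name = $event->name;
    }
}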

But don't worry — the past events are not lost. They are only marked as archived, ensuring the store does not load them anymore for recreating the aggregate state.

This solution achieves faster loading times by reducing the number of events that need to be loaded. The key difference from snapshotting is that this approach aligns with business logic. Split stream events are an integral part of the business and reflect how the business operates with its data. You can read more about split stream in our documentation.

The Application Side: Object Hydration

The next aspect to consider is the application side, in our case PHP. We often want to represent data as objects to simplify working with it, and for this purpose we commonly use an ORM like doctrine/orm. What many people don't know is that these ORMs perform a complex and time-intensive process: the hydration of data into objects. This process can become time-consuming, especially in complex structures involving multiple joins. Ocramius has written an excellent blog post on this topic, titled Doctrine ORM Hydration Performance Optimization.

This does not change when using event sourcing. Here, a hydration step is also needed for the events. However, the structures involved are typically much simpler, making the hydration process significantly faster and more straightforward. To further optimize this, we developed a hydrator tailored for this use case. It features a modern and intuitive configuration using #[Attributes] and includes built-in GDPR support, leveraging crypto shredding.
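
As a hedged sketch of what that can look like on an event class - the exact attribute names and namespaces should be treated as assumptions here; the documentation has the authoritative reference:

use Patchlevel\EventSourcing\Attribute\Event;
use Patchlevel\Hydrator\Attribute\DataSubjectId;
use Patchlevel\Hydrator\Attribute\PersonalData;

#[Event('profile.email_changed')]
final class EmailChanged
{
    public function __construct(
        // The subject id selects which encryption key is used for this person's data.
        #[DataSubjectId]
        public string $profileId,
        // Personal data is encrypted when stored; deleting the key "shreds" it (GDPR).
        #[PersonalData]
        public string $email,
    ) {
    }
}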

The Reading Side: Projections

Now to the reading side – the side where, most people agree, event sourcing is inherently more performant and flexible than the traditional ORM-based approach. With event sourcing, we can create highly optimized read models tailored to our specific needs, offering virtually unlimited possibilities in designing projections. This flexibility is one of the greatest advantages of event sourcing.

Here is an example of how a projector could look. The purpose of this projection is to display the number of guests currently checked in at different hotels.

use Doctrine\DBAL\Connection;
use Patchlevel\EventSourcing\Aggregate\Uuid;
use Patchlevel\EventSourcing\Attribute\Projector;
use Patchlevel\EventSourcing\Attribute\Setup;
use Patchlevel\EventSourcing\Attribute\Subscribe;
use Patchlevel\EventSourcing\Attribute\Teardown;
use Patchlevel\EventSourcing\Subscription\Subscriber\SubscriberUtil;

#[Projector('hotel')]
final class HotelProjector
{
    use SubscriberUtil;

    public function __construct(
        private readonly Connection $db,
    ) {
    }

    /** @return list<array{id: string, name: string, guests: int}> */
    public function getHotels(): array
    {
        return $this->db->fetchAllAssociative("SELECT id, name, guests FROM {$this->table()};");
    }

    #[Subscribe(HotelCreated::class)]
    public function handleHotelCreated(HotelCreated $event, Uuid $aggregateId): void
    {
        $this->db->insert(
            $this->table(),
            [
                'id' => $aggregateId->toString(),
                'name' => $event->hotelName,
                'guests' => 0,
            ],
        );
    }

    #[Subscribe(GuestIsCheckedIn::class)]
    public function handleGuestIsCheckedIn(Uuid $aggregateId): void
    {
        $this->db->executeStatement(
            "UPDATE {$this->table()} SET guests = guests + 1 WHERE id = ?;",
            [$aggregateId->toString()],
        );
    }

    #[Subscribe(GuestIsCheckedOut::class)]
    public function handleGuestIsCheckedOut(Uuid $aggregateId): void
    {
        $this->db->executeStatement(
            "UPDATE {$this->table()} SET guests = guests - 1 WHERE id = ?;",
            [$aggregateId->toString()],
        );
    }

    #[Setup]
    public function create(): void
    {
        $this->db->executeStatement("CREATE TABLE IF NOT EXISTS {$this->table()} (id VARCHAR PRIMARY KEY, name VARCHAR, guests INTEGER);");
    }

    #[Teardown]
    public function drop(): void
    {
        $this->db->executeStatement("DROP TABLE IF EXISTS {$this->table()};");
    }

    private function table(): string
    {
        return 'projection_' . $this->subscriberId();
    }
}

Each projector method with a #[Subscribe] attribute is called as soon as an event it subscribes to is recorded. With this, we can create a different read model for each use case, all populated by these events.
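
On the query side, the read model can then be consumed directly - sketched here as a hypothetical controller that simply delegates to the projector's query method:

// Hypothetical consumer of the read model defined above.
final class HotelListController
{
    public function __construct(
        private readonly HotelProjector $hotelProjector,
    ) {
    }

    /** @return list<array{id: string, name: string, guests: int}> */
    public function __invoke(): array
    {
        // One simple, join-less query against the projection table.
        return $this->hotelProjector->getHotels();
    }
}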

Pick the Best Tool for the Job

We can choose the most suitable database for each read model based on its specific requirements. This decision can be made independently for every read model, providing the flexibility to utilize specialized tools tailored to specific use cases. All factors can be considered in this decision, including performance, special features, or infrastructure concerns.

If we later realize that a different tool would be a better fit – for example, switching from MySQL to Elasticsearch for a read model to improve search capabilities – the migration process is straightforward. We update the projector to accommodate any necessary changes and deploy it. Data migration happens automatically by reprocessing all events and applying them to the new projection. This eliminates the need for a dedicated migration script to transfer data from one storage system to another.
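
As a rough, hedged sketch of that workflow - the subscription engine calls are simplified here, so treat them as assumptions and check the documentation for the real API:

use Patchlevel\EventSourcing\Subscription\Engine\SubscriptionEngine;

// Hedged sketch: after deploying the changed projector, set up its new storage
// and replay the full event history into it. No hand-written migration needed.
function rebuildHotelReadModel(SubscriptionEngine $engine): void
{
    // Create the new projection structure (e.g. the Elasticsearch index) via #[Setup].
    $engine->setup();

    // Boot the subscription: process all past events into the fresh read model.
    $engine->boot();
}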

Normalization & JOIN-less Queries

If we decide that a relational database is the right choice, we can optimize the table structure for performance. Joins are a common bottleneck in slow database queries, especially when many tables are involved. They are often necessary because traditional table designs prioritize reducing data redundancy and splitting data into its proper context across multiple tables.

However, for read models, we don't need to normalize tables in the same way. This allows us to consolidate all necessary data into a single, denormalized table if desired. The result is join-less queries, which significantly improve query performance.
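
As a hedged illustration with hypothetical table and column names: a booking overview read model can store hotel and guest details directly on each row, so reading it never needs a join.

use Doctrine\DBAL\Connection;

// Illustrative only: the table and columns are hypothetical.
// Everything the booking overview needs lives in one denormalized row.
final class BookingOverviewReadModel
{
    public function __construct(
        private readonly Connection $db,
    ) {
    }

    /** @return list<array<string, mixed>> */
    public function forHotel(string $hotelId): array
    {
        return $this->db->fetchAllAssociative(
            'SELECT booking_id, hotel_name, guest_name, check_in, check_out
             FROM projection_booking_overview
             WHERE hotel_id = ?',
            [$hotelId],
        );
    }
}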

If you're concerned that this approach might make things more complicated, don't worry! You can still design your read models with multiple tables that can be joined together if that suits your needs better. This approach simply adds more options without taking anything away.

Conclusion

Event sourcing offers a robust way to handle data by maintaining an immutable log of events. While concerns about performance - particularly around reading and writing aggregates - are valid when using long-living aggregates, the techniques discussed in this article demonstrate how those challenges can be mitigated.

  • Aggregate Writing Performance: The append-only nature of the event store ensures fast writes, particularly in the early stages. The absence of complex constraints like foreign keys further enhances this.
  • Aggregate Reading Performance: Advanced techniques like snapshots and split streams help optimize aggregate loading times, even for long-lived aggregates with thousands of events.
  • Projections: The flexibility of event sourcing shines through in the reading layer. With projections, you can design read-optimized models tailored to specific use cases, leveraging the best storage tools for each scenario. The ability to rebuild projections from events ensures adaptability without data migration headaches.

By leveraging these strategies, event sourcing not only retains its immutability and traceability benefits but also remains performant, even for complex or high-volume applications. Whether you're optimizing for aggregate performance or read-side flexibility, event sourcing provides powerful tools and patterns to meet your needs.

For more detailed guidance, check out our documentation, and feel free to share your thoughts or questions in the comments or on GitHub!
