In one of our recent blog posts, we showed the benefits that event sourcing can bring to your project and business. One
key takeaway was that we no longer lose any data. However, this can also lead to problems in specific situations, such
as when removing data sets. What do we do if we have data that needs to be removed? Is this possible with event
sourcing? How can this be done with an immutable event store? I will explain how you can overcome this problem and
how our PHP event sourcing library handles it.
Expected data removal
First, let us begin with the simpler case: we already know that the data will need to be removed in the future. A good example is personal data from our users. Under the EU's GDPR, we need to ensure that we can delete all personal data belonging to a user. Let's take a simple example: we have an aggregate where we save the name of a user, and this name can also be changed, which is handled by a dedicated event called NameChanged.
use Patchlevel\EventSourcing\Aggregate\Uuid;
use Patchlevel\EventSourcing\Attribute\Event;

#[Event('name_changed')]
final class NameChanged
{
    public function __construct(
        public Uuid $id,
        public string $name,
    ) {
    }
}
Now, if we save this event, it will be part of our immutable event store, so we cannot update it afterward. However,
since we already know that this could cause problems due to GDPR, we can take action to prevent issues. The solution is
to avoid saving sensitive data directly in our event store. We will discuss two options here: Crypto Shredding and
Tokenization.
Crypto Shredding
With Crypto Shredding, we save the data encrypted in our event store. This way, we don’t have the name in plaintext in
our database but instead as an encrypted string. The key used to encrypt the data is saved separately. This could be in
the same database in a different table, a separate database, or even on the filesystem. Why are we doing this? With this
setup, we can “delete” the data at any time. As soon as the user requests removal from our system, we delete the
encryption key for that user. When this happens, the encrypted data in our event store becomes unreadable: et voilà,
problem solved.
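To make the idea concrete, here is a minimal, self-contained sketch of the mechanism — not our library's implementation. The names KeyStore, encryptValue, and decryptValue are made up for illustration; only PHP's built-in openssl functions are used. Each subject gets its own key, and deleting the key renders the stored ciphertext unreadable:

```php
<?php

// Hypothetical per-subject key store: deleting a key "shreds" all data
// that was encrypted with it.
final class KeyStore
{
    /** @var array<string, string> */
    private array $keys = [];

    public function keyFor(string $subjectId): string
    {
        // Create a fresh 256-bit key on first use.
        return $this->keys[$subjectId] ??= random_bytes(32);
    }

    public function delete(string $subjectId): void
    {
        unset($this->keys[$subjectId]);
    }
}

function encryptValue(string $plaintext, string $key): string
{
    $iv = random_bytes(12); // 96-bit nonce, as recommended for GCM
    $ciphertext = openssl_encrypt($plaintext, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);

    return base64_encode($iv . $tag . $ciphertext);
}

function decryptValue(string $payload, string $key): string|false
{
    $raw = base64_decode($payload);
    $iv = substr($raw, 0, 12);
    $tag = substr($raw, 12, 16);
    $ciphertext = substr($raw, 28);

    // Returns false if the key is wrong or gone (GCM tag mismatch).
    return openssl_decrypt($ciphertext, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
}

$keys = new KeyStore();
// Only this opaque string would land in the event store:
$stored = encryptValue('Alice', $keys->keyFor('user-1'));

// User requests deletion: shred the key, and the ciphertext is unreadable forever.
$keys->delete('user-1');
```

Note that after `delete()`, even the key store itself can no longer recover the plaintext — a later `keyFor('user-1')` generates a brand-new key that fails to decrypt the old payload.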
Our library supports Crypto Shredding and is easy to use. For the implementation, we provide two attributes: PersonalData to mark sensitive data and DataSubjectId to identify the subject and look up the corresponding encryption key. For encrypted data, there is also the option to provide fallback data if desired.
use Patchlevel\EventSourcing\Aggregate\Uuid;
use Patchlevel\EventSourcing\Attribute\DataSubjectId;
use Patchlevel\EventSourcing\Attribute\Event;
use Patchlevel\EventSourcing\Attribute\PersonalData;

#[Event('name_changed')]
final class NameChanged
{
    public function __construct(
        #[DataSubjectId]
        public Uuid $id,
        #[PersonalData(fallback: 'anon')]
        public string $name,
    ) {
    }
}
Here, we would get anon for the name property if decryption is unsuccessful. With this fallback, our application can still function as expected without crashing due to missing data. Next, we have the configuration for the type of encryption to use. If you are using the Symfony bundle, the configuration is a breeze:
patchlevel_event_sourcing:
    cryptography:
        enabled: true
        algorithm: 'aes-256-gcm'
Now, our DoctrineCipherKeyStore will be used to store the encryption keys. Since it's based on doctrine/dbal, a wide range of databases is already supported. The algorithm is used by our openssl-based implementation to encrypt and decrypt the data.
Tokenization
Tokenization is another technique that can be used to prevent saving sensitive data in the event store. With this approach, we neither encrypt the data nor save it in the store ourselves. Instead, we send the data to a vault and receive a token in return. This token is then passed around in our application and stored in the event store. Whenever we need the real data, we query the vault, which returns the data we need.
One advantage of this solution is that we are now only handling tokens in our domain instead of sensitive data. This
reduces the potential issues you could encounter. What do I mean by that? Well, unintentionally leaking sensitive data
becomes highly unlikely, since you need to explicitly access the vault for that. Retrieving this data often requires a
valid reason, which also increases the auditability of the data.
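A minimal in-memory sketch shows the shape of the approach; the Vault class and its method names are invented for illustration, and a real vault would of course be a separate, access-controlled service. The domain only ever touches opaque tokens, and a deletion request simply makes the vault forget the value:

```php
<?php

// Hypothetical tokenization vault: maps opaque tokens to real values.
final class Vault
{
    /** @var array<string, string> */
    private array $values = [];

    public function tokenize(string $value): string
    {
        $token = 'tok_' . bin2hex(random_bytes(16));
        $this->values[$token] = $value;

        return $token;
    }

    public function detokenize(string $token): ?string
    {
        return $this->values[$token] ?? null;
    }

    public function forget(string $token): void
    {
        unset($this->values[$token]);
    }
}

$vault = new Vault();
// Only the token is stored in events and passed through the domain:
$token = $vault->tokenize('alice@example.com');

// On a deletion request, the vault forgets the value; the token remains
// in the event store but now resolves to nothing.
$vault->forget($token);
```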
Unfortunately, we don't provide a solution for tokenization in our library, as we believe that tokenization should be
done before the data touches the persistence layer. Therefore, tokenization is out of scope for the library. However, if
you disagree, don't hesitate
to open an issue or even submit a PR on GitHub!
Unexpected data removal
Now, let's talk about the more challenging part: deleting data we did not anticipate needing to remove. The event store is immutable, and this case is no exception, so manipulating the event store is still a no-go. This may seem like an impossible task, but don't worry, there is a solution.
Rewrite History
We cannot update the events in our store, but we can recreate our store. What do I mean by that? I mean reading all
of our events and writing them into a new store. Between these two operations, we can perform whatever changes we need.
This could involve dropping a complete stream, editing values for placeholders, or applying one of the previously
described solutions. The result will be a cleaned-up new event store without the data we needed to remove. We could also
get rid of some upcasters in this process if we change the
events in the same way our upcasters did.
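The read–translate–write loop can be sketched in plain PHP, independent of any library. Everything here is illustrative: a "translator" is modeled as a simple callable that takes one event and returns zero or more events, so dropping, rewriting, or splitting events all fall out of the same shape:

```php
<?php

/**
 * Conceptual rewrite loop: stream events through a chain of translators.
 * Each translator is a callable(object): list<object>.
 *
 * @param iterable<object>                     $events
 * @param list<callable(object): list<object>> $translators
 */
function rewriteHistory(iterable $events, array $translators): \Generator
{
    foreach ($events as $event) {
        $batch = [$event];

        foreach ($translators as $translate) {
            $next = [];
            foreach ($batch as $e) {
                foreach ($translate($e) as $out) {
                    $next[] = $out;
                }
            }
            $batch = $next; // may shrink (drop) or grow (split)
        }

        yield from $batch;
    }
}

// Two toy events for demonstration.
final class NameUpdated
{
    public function __construct(public string $name) {}
}

final class AgeUpdated
{
    public function __construct(public int $age) {}
}

// Translator that drops every NameUpdated event and keeps the rest.
$dropNames = fn (object $e): array => $e instanceof NameUpdated ? [] : [$e];

$cleaned = iterator_to_array(
    rewriteHistory([new NameUpdated('Alice'), new AgeUpdated(42)], [$dropNames]),
    false,
);
```

Writing `$cleaned` into a fresh store instead of the old one is what keeps the original store untouched until we deliberately switch over.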
We are working on a new feature that will simplify the task
considerably. With this, you can read the current store and execute a list
of translators on the messages. These can include
things like renaming, updating, filtering, or even creating new events. After that, the new message stream is written
into our new store. Once the new stream has been tested, we can switch our application to use the new event store.
Recreating the event store may take some time if the old store has grown large.
/** @var Store $oldStore */
/** @var Store $newStore */

$pipeline = new Pipe(
    $oldStore->load(),
    new ChainTranslator([
        new AnonymizeUserInformationTranslator(),
        new MapProfileAddressToProfileLocationTranslator(),
        new ExcludeEventTranslator([ProfileNameUpdated::class]),
        new RecalculatePlayheadTranslator(),
    ]),
);

$newStore->save(...$pipeline);
The example above shows how you could create a one-time command to test the process and migrate the old store to the new
one. I included multiple translators to demonstrate that there are many possible ways to handle these situations. One
option could be to anonymize the data using our crypto-shredding feature to remove plaintext data from the store.
Another solution is to map the event to a different event that excludes the sensitive data. The final approach is to
drop entire events. For each case, you should thoroughly test the application afterward to prevent failures.
For a more sustainable solution, we recommend using
the subscription engine to execute the migration for
several reasons. First, this allows us to batch saves to the new store easily if we use the BatchableSubscriber.
Second, we can run this in parallel within our application and recreate it easily if anything goes wrong. Lastly, schema
creation is also handled automatically.
#[Subscriber('migrate', RunMode::Once)]
final class MigrateStoreSubscriber implements BatchableSubscriber
{
    private readonly SchemaDirector $schemaDirector;
    /** @var list<Message> */
    private array $messages = [];
    private readonly array $translators;

    public function __construct(
        private readonly Store $targetStore,
    ) {
        $this->schemaDirector = new DoctrineSchemaDirector(
            $targetStore->connection(),
            new ChainDoctrineSchemaConfigurator([$targetStore]),
        );
        $this->translators = [
            new AnonymizeUserInformationTranslator(),
            new MapProfileAddressToProfileLocationTranslator(),
            new ExcludeEventTranslator([ProfileNameUpdated::class]),
            new RecalculatePlayheadTranslator(),
        ];
    }

    #[Subscribe('*')]
    public function handle(Message $message): void
    {
        $this->messages[] = $message;
    }

    public function beginBatch(): void
    {
        $this->messages = [];
    }

    public function commitBatch(): void
    {
        $pipeline = new Pipe($this->messages, $this->translators);
        $this->messages = [];
        $this->targetStore->save(...$pipeline);
    }

    public function rollbackBatch(): void
    {
        $this->messages = [];
    }

    public function forceCommit(): bool
    {
        return count($this->messages) >= 10_000;
    }

    #[Setup]
    public function setup(): void
    {
        $this->schemaDirector->create();
    }

    #[Teardown]
    public function teardown(): void
    {
        $this->schemaDirector->drop();
    }
}
Conclusion
In this post, we explored how to address the challenges of data removal in event-sourced applications, focusing on cases
where compliance and privacy laws require specific data to be deletable. With techniques like Crypto Shredding and
Tokenization, we can handle personal data securely, either by encrypting sensitive information with removable keys
or by storing tokens instead of actual data. These approaches support GDPR compliance by ensuring that sensitive data can be effectively deleted from an immutable store.
When unexpected data deletion is needed, we can Rewrite History to re-create the event store without sensitive data.
By reading, modifying, and then writing back events into a new store, developers can meet legal or business requirements
without altering the integrity of the event-based architecture. Together, these solutions allow applications based
on event sourcing to handle data removal securely and flexibly, ensuring
both regulatory compliance and system resilience.