Designing atomicity and resiliency when publishing to the event bus
When you publish integration events through a distributed messaging system like your event bus, you
have the problem of atomically updating the original database and publishing an event (that is, either
both operations complete or neither does). For instance, in the simplified example shown earlier, the
code commits data to the database when the product price is changed and then publishes a
ProductPriceChangedIntegrationEvent message. Initially, it might look essential that these two
operations be performed atomically. However, if you use a distributed transaction involving the
database and the message broker, as you do in older systems like Microsoft Message Queuing (MSMQ),
this approach is not recommended for the reasons described by the CAP theorem.
Basically, you use microservices to build scalable and highly available systems. Simplifying somewhat,
the CAP theorem says that you cannot build a (distributed) database (or a microservice that owns its
model) that is continually available, strongly consistent, and tolerant to any network partition. You
must choose two of these three properties.
In microservices-based architectures, you should choose availability and partition tolerance, and you
should de-emphasize strong consistency. Therefore, in most modern microservice-based applications,
you usually do not want to use distributed transactions in messaging, as you do when you implement
distributed transactions based on the Windows Distributed Transaction Coordinator (DTC) with MSMQ.
Let’s go back to the initial issue and its example. If the service crashes after the database is updated
(in this case, right after the line of code with _context.SaveChangesAsync()), but before the
integration event is published, the overall system could become inconsistent. Depending on the
specific business operation you are dealing with, that inconsistency might be business critical.
As mentioned earlier in the architecture section, you can have several approaches for dealing with this
issue:

• Using the full Event Sourcing pattern.

• Using transaction log mining.

• Using the Outbox pattern. This is a transactional table that stores the integration events,
extending the local transaction.
CHAPTER 5 | Designing and Developing Multi-Container and Microservice-Based .NET Applications

For this scenario, using the full Event Sourcing (ES) pattern is one of the best approaches, if not the
best. However, in many application scenarios, you might not be able to implement a full ES system. ES
means storing only domain events in your transactional database, instead of storing current state
data. Storing only domain events can have great benefits, such as having the history of your system
available and being able to determine the state of your system at any moment in the past. However,
implementing a full ES system requires you to rearchitect most of your system and introduces many
other complexities and requirements. For example, you would want to use a database specifically
made for event sourcing, such as Event Store, or a document-oriented database such as Azure
Cosmos DB, MongoDB, Cassandra, CouchDB, or RavenDB. ES is a great approach for this problem, but
not the easiest solution unless you are already familiar with event sourcing.
The option to use transaction log mining initially looks transparent. However, to use this approach,
the microservice has to be coupled to your RDBMS transaction log, such as the SQL Server transaction
log. This approach is probably not desirable. Another drawback is that the low-level updates recorded
in the transaction log might not be at the same level as your high-level integration events. If so, the
process of reverse-engineering those transaction log operations can be difficult.
A balanced approach is a mix of a transactional database table and a simplified ES pattern. You can
use a state such as “ready to publish the event,” which you set in the original event when you commit
it to the integration events table. You then try to publish the event to the event bus. If the
publish-event action succeeds, you start another transaction in the origin service and move the state
from “ready to publish the event” to “event already published.”

If the publish-event action in the event bus fails, the data will still not be inconsistent within the
origin microservice: it is still marked as “ready to publish the event,” and with respect to the rest
of the services, it will eventually be consistent. You can always have background jobs checking the
state of the transactions or integration events. If a job finds an event in the “ready to publish the
event” state, it can try to republish that event to the event bus.
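The background job described above can be sketched as follows. This is a minimal, hypothetical C# sketch: the IIntegrationEventLogService and IEventBus abstractions, and method names such as RetrievePendingEventsAsync and MarkEventAsPublishedAsync, are illustrative assumptions (loosely modeled on, but not identical to, the eShopOnContainers abstractions), not an actual API.

```csharp
// Hypothetical sketch of a background job that retries pending integration events.
public class PendingEventsRepublisherJob
{
    private readonly IIntegrationEventLogService _eventLog; // assumed abstraction
    private readonly IEventBus _eventBus;                   // assumed abstraction

    public PendingEventsRepublisherJob(IIntegrationEventLogService eventLog, IEventBus eventBus)
    {
        _eventLog = eventLog;
        _eventBus = eventBus;
    }

    public async Task RunOnceAsync()
    {
        // Find events still marked "ready to publish the event".
        var pending = await _eventLog.RetrievePendingEventsAsync();

        foreach (var evt in pending)
        {
            try
            {
                await _eventBus.PublishAsync(evt);
                // A second local transaction moves the state to "event already published".
                await _eventLog.MarkEventAsPublishedAsync(evt.EventId);
            }
            catch
            {
                // Leave the event in the "ready to publish" state; it will be
                // retried on the next run, so the system is eventually consistent.
            }
        }
    }
}
```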
Notice that with this approach, you are persisting only the integration events for each origin
microservice, and only the events that you want to communicate to other microservices or external
systems. In contrast, in a full ES system, you store all domain events as well.
Therefore, this balanced approach is a simplified ES system. You need a list of integration events with
their current state (“ready to publish” versus “published”), but you only need to implement these
states for the integration events. And in this approach, you do not need to store all your domain data
as events in the transactional database, as you would in a full ES system.
If you are already using a relational database, you can use a transactional table to store integration
events. To achieve atomicity in your application, you use a two-step process based on local
transactions. Basically, you have an IntegrationEvent table in the same database where you have your
domain entities. That table acts as insurance for achieving atomicity: you include the persisted
integration events in the same transaction that commits your domain data.
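A row in that table could look like the following sketch. The type and property names here are illustrative assumptions (inspired by, but not identical to, the eShopOnContainers IntegrationEventLogEntry), not a prescribed schema.

```csharp
// Hypothetical shape for an entry in the integration event log table, stored in
// the same relational database as the domain entities.
public enum EventState
{
    NotPublished = 0,   // "ready to publish the event"
    Published = 1       // "event already published"
}

public class IntegrationEventLogEntry
{
    public Guid EventId { get; set; }
    public string EventTypeName { get; set; }   // e.g. "ProductPriceChangedIntegrationEvent"
    public EventState State { get; set; }
    public DateTime CreationTime { get; set; }
    public string Content { get; set; }         // the event payload serialized as JSON
}
```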
Step by step, the process goes like this:

1. The application begins a local database transaction.

2. It then updates the state of your domain entities and inserts an event into the integration event
table.

3. Finally, it commits the transaction, so you get the desired atomicity.

4. You publish the event somehow (as described next).
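Steps 1 through 3 can be sketched with Entity Framework Core as shown below. This is a simplified, hypothetical sketch: the _catalogContext and _eventLogService fields, the SaveEventAsync signature, and the ProductPriceChangedIntegrationEvent constructor are assumptions for illustration; it also assumes the event log service writes to the same database using the shared transaction.

```csharp
// Sketch of steps 1-3: one local transaction covers both the domain update
// and the insert into the integration event table.
public async Task UpdatePriceAsync(int productId, decimal newPrice)
{
    var product = await _catalogContext.Products.FindAsync(productId);
    product.Price = newPrice;

    var evt = new ProductPriceChangedIntegrationEvent(productId, newPrice);

    // Step 1: begin a local database transaction.
    await using var tx = await _catalogContext.Database.BeginTransactionAsync();

    // Step 2: persist the domain change and the event row in the same transaction.
    await _catalogContext.SaveChangesAsync();
    await _eventLogService.SaveEventAsync(evt, tx); // hypothetical signature

    // Step 3: commit atomically; the event row is now "ready to publish".
    await tx.CommitAsync();

    // Step 4, publishing to the event bus, happens after the commit.
}
```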
When implementing the steps of publishing the events, you have these choices:

• Publish the integration event right after committing the transaction, and use another local
transaction to mark the events in the table as being published. Then, use the table just as an
artifact to track the integration events in case of issues in the remote microservices, and perform
compensatory actions based on the stored integration events.

• Use the table as a kind of queue. A separate application thread or process queries the
integration event table, publishes the events to the event bus, and then uses a local transaction
to mark the events as published.
Figure 6-22 shows the architecture for the first of these approaches.
Figure 6-22. Atomicity when publishing events to the event bus
The approach illustrated in Figure 6-22 is missing an additional worker microservice that is in charge
of checking and confirming the success of the published integration events. In case of failure, that
additional checker worker microservice can read events from the table and republish them, that is,
repeat step number 2.
About the second approach: you use the EventLog table as a queue and always use a worker
microservice to publish the messages. In that case, the process is like that shown in Figure 6-23,
which involves an additional microservice, with the table as the single source of events when
publishing.
Figure 6-23. Atomicity when publishing events to the event bus with a worker microservice
For simplicity, the eShopOnContainers sample uses the first approach (with no additional processes or
checker microservices) plus the event bus. However, the eShopOnContainers sample is not handling
all possible failure cases. In a real application deployed to the cloud, you must embrace the fact that
issues will arise eventually, and you must implement that check and resend logic. Using the table as a
queue can be more effective than the first approach if you have that table as a single source of events
when publishing them (with the worker) through the event bus.
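The worker in that second approach can be sketched as a simple polling loop. Again, the _eventLog and _eventBus abstractions and their method names are illustrative assumptions, and the polling interval is arbitrary; a real worker would also need error handling and deduplication on the consumer side.

```csharp
// Hypothetical sketch of the second approach: a worker microservice that treats
// the integration event log table as a queue.
public async Task PollAndPublishLoopAsync(CancellationToken cancellation)
{
    while (!cancellation.IsCancellationRequested)
    {
        // The table is the single source of events to publish.
        var pending = await _eventLog.RetrievePendingEventsAsync();

        foreach (var evt in pending)
        {
            await _eventBus.PublishAsync(evt);
            // A local transaction marks the row as published so it is not resent.
            await _eventLog.MarkEventAsPublishedAsync(evt.EventId);
        }

        // Arbitrary polling interval for illustration.
        await Task.Delay(TimeSpan.FromSeconds(5), cancellation);
    }
}
```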