A simple solution to scalability problems: Event Sourcing

A simple solution to scalability problems: Event Sourcing

In the past months I've been playing around with Kotlin and Spring's event sourcing engine. To get to know it better, I build a really simple clone of Untappd. If you don't know Untappd, a tl;dr is: Foursquare for beers. You can checkin a beer, share it, rate the beer and see what your friends are drinking.

Why Event Sourcing

One benefit of event sourcing is that you can separate your application into small, loosely coupled concerns (e.g. microservices) and make them communicate through events. On the other hand, you need to be comfortable with eventual consistency.
For this task, I decide to go with a modular monolithic.

Code organization

All code is hosted on Github

As you can see from the image below, I created a package called domains and one sub-package for each domain I was dealing with.

Screen-Shot-2018-06-29-at-17.54.17

Each sub-domain contains its own controller, models, repositories and services, pretty much like a total independent service.

User, Checkin and Beerare connected through user_id and beer_id on Checkin but
there is no bond between these two classes on the code (not even on the database).

User class:

@Entity
@Table(name = "users", indexes = [
  Index(name = "email_idx", columnList = "email", unique = true)
])
data class User(
        @Column(nullable = false, length = 50)
        var email: String = "",

        @Column(nullable = false)
        @JsonIgnore
        var password: String = ""

) : BaseEntity()

Checkin class:

@Entity
@Table(name = "checkins", indexes = [
  Index(name = "idx_user_id_on_checkin", columnList = "userId"),
  Index(name = "idx_beer_id_on_checkin", columnList = "beerId")
])
@EntityListeners(AuditingEntityListener::class)
data class Checkin(
        @Id
        @GeneratedValue(generator = "UUID")
        @GenericGenerator(
                name = "UUID",
                strategy = "org.hibernate.id.UUIDGenerator")
        var id: UUID? = null,

        @CreatedDate
        var createdAt: LocalDateTime? = null,

        @Column(nullable = false)
        var beerId: UUID? = null,

        @Column(nullable = false)
        var userId: UUID? = null,

        @Column(columnDefinition = "TEXT")
        var description: String = "",

        @Column
        var rate: Int = 5
) : AbstractAggregateRoot<Checkin>() {

  @PrePersist
  fun created() {
    registerEvent(CheckinCreated(this))
  }
}

With this approach, the application can be broken down without much hassle.

Event Sourcing in action

Notice that last method on Checkin? registerEvent is a method declared in AbstractAggregateRoot class. AbstractAggregateRoot is a convenience class that helps us dispatch domain events to our context. You can also dispatch events without using AbstractAggregateRoot (more on that later).
After a checkin in a beer, an event called CheckinCreated wraps the checkin and is enqueued to be dispatched. Spring will dispatch it after a save happens on the aggregate.

Capturing events

Spring has a nice approach to capture and process these events: @EventListener annotation.
At this point, after we save the checkin, Spring dispatches our CheckinCreated event to whoever is listening. We need to update the beer profile (increase checkin count, refresh beer rating, etc.) and update user's profile.
The beer domain has a listener called BeerCheckedInListener and handles it through the fun beerCheckedIn(event: CheckinCreated):

@EventListener
@Transactional
    fun beerCheckedIn(event: CheckinCreated) {
    logger.info("Updating beer rating")

    var beerId = event.checkin.beerId
    var averageRating = checkinRepository.findAverageRatingFromBeer(beerId!!)

    var beer = beerRepository.findById(beerId).get()
    beer.averageRating = BigDecimal(averageRating.toString())
    beer.totalCheckin += 1

    beerRepository.save(beer)
    }

what we are doing here is:

  1. get the beer id from the checkin
  2. calculate and set the new average rating for the beer
  3. increase the total checkin count
  4. save

and we follow a similar process on UserCheckinListener (you can see it here)

Dispatching events without AbstractAggregateRoot

It is possible to dispatch an event without rely on AbstractAggragateRoot. This class will dispatch events only after a save is perform. But how do we dispatch an event in a delete?
In that case we can use AbstractApplicationContext
First, we include AbstractApplicationContext in our controller

@RestController
@RequestMapping("/checkin")
class CreateCheckinController(
        val checkinRepository: CheckinRepository,
        val beerRepository: BeerRepository,
        val applicationContext: AbstractApplicationContext) {

then, during a delete call, we make use of it

@DeleteMapping(value = ["/{id}"])
  fun deleteCheckin(@PathVariable("id") checkinId: UUID): ResponseEntity<String> {
    val checkin = checkinRepository.findById(checkinId).orElseThrow { RuntimeException("checkin not found") }
    checkinRepository.delete(checkin)
    applicationContext.publishEvent(CheckinDeleted(checkin))

    return ResponseEntity.status(HttpStatus.NO_CONTENT).body("")
  }

publishEvent will dispatch our event to appropriate listeners.

Eventual consistency

Although this happens fast, it is not happening in a single transaction. The listeners are marked with @Async, which means that we will process then in a different thread. If the application is under load and all CPU cores are in use, it might take some time to update its profile and users might not see their checkin right away. Though its unlikely to happen, you will experience slowness if the beer has lots of checkins because the time to calculate the rating increases. Not a big deal in the beginning but something to keep in mind.

CQRS

Every time you see "event sourcing", it might be accompanied by "CQRS"

What is CQRS?

As Martin Fowler puts:

At its heart is the notion that you can use a different model to update information than the model you use to read information

In most of applications, you will have a one-to-one relationship between database tables and application models, for example, a class User maps to a database table called users.

For complex systems, this might not be ideal for many reasons. Lets use our application as example.

We have a class named Beer that maps to a table beers. This beer has thousands of checkins, has a certain rating and there are many venues selling it. So if you want to build a "beer profile" you have to:

  1. query the beers table
  2. calculate an average rating for the beer by querying checkins
  3. query the venues table to cross check which/how many venues are selling it.

while your application is small, this might not be a problem, because you have X beers, sold by Y venues. But if your application grow, and you have 10 times X beers and 1000 times Y venues, query all this data, run calculation over checkins, etc this can become your bottleneck real soon.

One might think that cache this information might be a good solution, and it might be for a period of time. Rely on cache for system stability is not a good idea. If you lose your cache (say your Redis instance had a network problem, because trust me, it WILL crash sooner or later) your application will go down. You now have a single point of failure: cache.

A better solution would be CQRS. You keep writing the checkin to checkins table, user information to users and so on, but you introduce a new model called BeerProfile.
By using event sourcing, every time a beer is checked in, you fire an event (CheckinCreatedEvent, as we did). Then, a listener will asynchronously process this event, find the BeerProfile corresponding to the beer, increase its checkin count, refresh the rating, updating the numbers on many distinct users drink it, and so on.

So from now on, when a user wants the beer profile, you don't need to cross check three different tables since all beer related information is in a single place: beer profile table.

Conclusion

CQRS and event sourcing might be good for your application but adding those concepts will also bring more complexity. New tables means more data, more expensive. Asynchronous method calls are harder to debug. Event sourcing might fail as anything else and eventual consistency can hit you hard.
Study and analyze if something is right for you, don't surf other people's wave just because everybody is doing.

Icons created by Freepik - Flaticon