Learning by Doing – BlockChain & Akka Tutorial

Intro

BlockChain is a really hot topic in the IT world. Most of the developers have no idea how it works. There are general ideas in the heads, but it’s full of black-boxes. There are several blog posts and toy-projects online to demonstrate some basic idea about Blockchain. This will be one of these. I think the best way to learn a new thing is by trying it out, or in this case, by trying to implement something. And then argue why it would be a bad solution in some real-life situations. (If you are interested, at the end of this post there will be a list of links about the posts I read and I think are worth reading.)

Why is this blockchain thing that interesting? Because it tries to solve problems that we didn’t have before, or we didn’t even know we had. What if we stop trusting? How can we know what the truth is? How can we agree on something if we don’t trust each other? We have a lot of room to play with ideas. We can come up with new ideas of mathematical or logical problems. It’s challenging (smile)

learn blockchain with akka tutorial

I work with a JVM-based stack, mostly with Scala. In this post I will use Scala too. I like the actor model and reactive things, so I will use Akka where I can. (And in the project there are communications, consensus, work-balancing, and other interesting stuff so I use these where I can as well.) If you don’t understand how the blockchain works but you know the Scala syntax, this post will help you understand the basics, show some of the problems and some way to resolve each of them. If you are familiar with the blockchain basics, you can learn some Scala, or see how actors can work together.

The code is available on GitHub. For some of the core classes, I have tests too. However, because of the “toy” part of the project, and the “minimalist sample implementation,” most of the classes didn’t get tests. (I shrank some of my code-blocks, left out some formatters and actor props.)

Now, let’s jump in.

 

What is BlockChain?

You will never figure this out, but it is literally a chain of blocks. It depends on the actual implementation, but you can think of it like a linked listEvery block is linked to a previous block(s). What is a block? Some wrapped data. The block is where we store the information, and the whole chain is what makes this information valid. If I want to store something, I need to make a block, and append it to the chain. I know it’s really abstract, but we need to start somewhere.

In this post I will show a blockchain solution which similar to bitcoin. I will not persist the chain (but will talk about serialization), and I will not show how the transactions work, just show the pure “chain” and its building and validating mechanism. (Most posts start with the transactions a.k.a. the data in a block. I think that is why people can’t separate the blockchain and crypto currencies in their heads. You don’t need to know what a transaction is (yet)!)

Let’s code something:

Done! Good job! There are two problems here. The first is that we didn’t solve any trust issue (which is the main problem we want to solve), and the second is that this cannot really leave our JVM (or if we try to move it, we need to move the whole chain).

Let’s make it more “movable”:

A bit better. An index is generally a good idea. The toString can be the json representation of this object (debugging and future serialization/store reasons, the JsObject is used here as a more structural Any alternative). And a hash function, using all of the fields is a pretty good start for a “pointer.” I should talk about the hash a bit, or rather why it’s better than just using the index? If you give me three blocks (1, 0, a), (2, 1, b), (3, 2, c), and I want to change the data in the second block, I can easily do so with (2, 1, e). So I can modify any of the blocks without breaking the chain-links. If the pointer is the full content of the previous block, I need to modify all the blocks built on that. For the prev. example (1, {}, a), (2, { (1, {}, a)}, b), (3, {(2, { (1, {}, a)}, b)}, c) is your three blocks, and if I want to modify the second to (2, { (1, [], a)}, e) I need to modify the third too. If I want to change a little in one of the early blocks; I need to rewrite the whole chain. (It will not stop me because I’m not lazy, but we will solve that situation later.) Of course I don’t want to store every block in a new block, so I need to come up with an algorithm. An algorithm that guarantees that if I use it on the same input, it will generate the same output. And it will be “really hard” to find an another input which has the same output. (The really hard part is guaranteed by mathematicians.) More about hashes and cryptographic hash functions.

Our chain is not so good, I can add independent blocks to it. Solve this issue and add some more logic:

With this implementation, if you give me a new block, I can decide whether or not it can be appended to my chain. And because of the hash function (if we want to stay consistent (with the same chain)), I can’t modify any block in my chain; the hashes will not mach. I need to recreate all of the blocks after the modified one. If I do this, the next block you give me will not fit to my chain, or the next block I try to send to you will not fit to your chain. (So we won’t agree on a general truth.)

But if it’s not just two of us, we need to consider two other things; concurrency, and communication failures. If there are three people, you, me and Alice, the network can ruin our chain state. What if Alice sends us a new block, but you never get it. And then I try to build a new block on top of Alice’s block and send it to you? You will not accept it, because you’ll see a missing block. But if you let me send both blocks (the missing one and mine), I can modify the block from Alice. And you can’t tell whether Alice or I’m messing with you later. (You can’t tell which blocks contain truth and which blocks are modified.) We need to solve this situation, and the method developed first is: try to restrict the block creation somehow.

Proof of Work

There are many algorithms to solve the block creation problem. I will show the bitcoin method, which is based on the ‘proof of work.’ The idea behind this is simple: every participant will try to solve a mathematical problem. Whoever solves it first broadcasts the new block (and the result in it), and the others can check whether the result was right. If the block is accepted, a new round starts. The winner block is appended to the others’ chain every time. If you want to come up with a new block, you need to solve the problem. If you want to modify a block, you need to solve that block’s problem, and all of the modified blocks’ problems too. Which means that if you want to modify a block, you need to solve problems faster than the others altogether. The given work (problem solving) verifies that the longest chain is the valid one. (Take a break here, and if you need to, read this section again.)

We have an interesting challenge with this idea! We need to create a “hard” problem with fast result-validation. And if we want some scaling in users but want to restrict the block creation intensity (for example one block every one minute), we need to make this problem scalable. (The more people join our chain, the harder the problem we need to create for them. Hence the bigger (sum) computing capacity we’ll need to solve the problem in (roughly) the same timeframe the lower computing capacity solved the easier problem before.)

One very good idea is: we can modify something in our new block freely and give some restrictions about the hash generated. For example, the hash could be a String, def isValid = newBlock.hash.startsWith("a"). You might be surprised, but bitcoin builds on the exact same idea. They extended this to a better scale. They tried to create a function with two input parameters, where one parameter is some kind of difficulty, and the other is a hash, and the output is the answer to the ultimate question: “isValid?”. One interesting part is that the difficulty needs to scale linearly. So if I want a block in 10 minutes, and new nodes come up, and the block generation speed drops to ~9 minutes, I can multiply my difficulty by 1.1 and the block generation time will go back to ~10 minutes. (When I talk about block generation time I mean the average time of the last N successfully created blocks.)

Take a look at a working concept:

We need a difficulty to carry on, and a nonce. The nonce is the part of the block that we can freely modify. The “mine” function will do exactly this. (When we reach the transaction part, I will explain the name of the function. But now you can think of it like creating a new block from nowhere.) Modify the nonce of the given block, and if it’s valid; return it, if not; call it recursively.

I added a time-stamp too. If we want to adjust the difficulty, we will need that.

The DifficultyHelper is a piece of art mathematical formula. If you want to understand how it works, I advise you to read this. (The difficulty adjust uses down rounding for a more consistent double value across nodes.)

So far It is just some basic mathematical concept. We have about 50 lines of code. We have blocks, and we can add them one-by-one to the chain. The fun part begins when we start to communicate with others!

 

Distributed blocks

As I said before, the whole blockchain thingy is trying to solve some kind of a distributed data storage problem with trust issues. It’s like a big, slow, distributed database. One big difference between here and a “simple” distributed database is that there, the participants don’t trust each other. What we are talking about here is not a 10-node-cluster created by one sysadmin (from the same source). It’s an N-node-cluster, where everybody can have a node (or at least bitcoin was build with this idea, there are some newer concepts, too). And If everybody can have a node, somebody will try to attack the network (or try to create a “tricky” node). So instead of restricting who can host a node, we restrict the chances of the system being attacked.

Okay, so we want to share our data with others. We want to get data from others. We want to communicate.

Our node has two fairly separated states. When we start a node, we want to synchronize the blocks from the other nodes to get the “latest general truth”, and when we have synchronized, we want to create a new block with our “truth” (faster than the others). When we find a valid block, we want to tell the others. So that the others can validate it and broadcast to even further nodes. If someone sends us a block, we can validate it, append to our chain and broadcast to some others. What happens if we get a block not valid? We drop it. What happens if the given block has a higher id then our next id should be? We missed some blocks. We went out of sync, so we need to synchronize again. Sounds easy! Do some implementation! Start with the supporting object and the classes inside it.

The messages are mostly clear. If this actor sends a SyncRequest (with the last id it has) to an other actor, that actor will respond with a SyncResponse in a list of blocks. When a new block is created, we broadcast NewBlockCreated with the new block. The only strange thing is the SyncDown, which is an inner communication message, signaling that this actor needs to be synced.

The states are separated to syncing or ready.

The data should be clear too, when we are in a stable state, we have one chain. When we are syncing, we need to remember the original chain (in case we are under attack and need to restore). And, we need to manage the operations on the new (inProgress) chain. The actualChain function will help us determinate the needed lastLocalBlockIndex to the SyncDown message.

Here, I assume a minor refactor in our BlockChain class. First, I cut it to a trait and an in-memory implementation. Second, I implemented a new function which can add a list of blocks to an existing chain.


(Okay, I could use some of the tricks from this blogpost, but now it is fine enough for demonstration. Later I can easily change my in-memory instance to a file-stored variation.)

I modified the constructor a bit with some implicit parameters, for easier instance creation (and testing). I encapsulated the config parameters of the difficulty to a new value holder class. Most of the interface implementation is really straightforward. The new things came with calculateNextDiff (which will modify the difficulty of the next empty block, and help check the block validation), the isNewBlockValid (which now uses the difficulty in the validation process) and the addBlocks. The add blocks function is not as clear at first. We need to sort the new blocks. If some of the new blocks can be added to the old chain, we add them one by one (and validate them while adding, this is the recursive function). When we have no intersection in the given blocks and our original ones, we drop some blocks from the start of the list, and return a smaller list. (So the caller will sync again maybe with more luck.) The return params are a bit messy too. If we return with true, that means the second param is valid and gets enhanced with the newly given blocks. If we return a false, we tell the caller that this operation has failed, and that the caller can try again: something else with the given second parameter. If the second parameter is empty that means that someone tried to attack us, and its genesis block is not matching to our genesis block, or its given blocks are not valid. (This part could use a refactor, if you have any modification idea pls comment or PR in the repo.)

Going back to our SyncActor:

Our actor is a finite state machine (FSM in akka). It starts with a syncing state, with the data we get from the actor creator.

When someone wants us to sync down, we send a sync request to a random node and keep our state. If we get a NewBlockCreated message, we redirect that message to ourselves (with the “tell” operator we keep the original sender), and try to handle that again in the Ready state. We do the same when we get a SyncRequest. (These two handlers are needed for a low number of node setups, so some nodes can leave the Syncing state.) When we get a SyncResponse with no data, we know that our state is the last state in the network so we can move to the Ready state. If we get a nonempty response, we try to add the new blocks to our chain. If we don’t need to retry the sync, we go to the Ready state with the original or the newly constructed chain (based on which is the longest). When we can’t apply the given blocks to the old chain, our returning “newChain” will be shorter, so we can sync down from a smaller index (and then we will get a longer response next time). If somehow our newly given blocks aren’t valid, the “newChain” will be empty and we can leave our Sync state.

When we are in the ready state three things can happen.

  1. We need to sync, so we go to the syncing state.
  2. We get a SyncRequest, so we respond with the requested blocks.
  3. We get a new block, so we check it. If it’s valid, we broadcast it. If it’s not valid, but it has a larger index than our last block, we try to sync. Otherwise the given block is just “bad,” so we do nothing.

(Ofc if sb found a new block, we need to mine that! See below.)

We will have some kind of timer, so we do a general sync time-to-time. We log the unhandled messages, just in case.


One really cool feature in the FSM actors is the onTransition handler. When we leave the Sync state, we start mining (and log the last block). When we start Syncing, we just simply log this.

Our private functions are mostly just call/message delegations. (Before we start mining with a new actor, we kill all of the previously created actors.) (If you are a good observer, you can already see some flaws in our logic. For example if we just change states to Sync and then back to Ready, we will start mining from the beginning again. But these “bugs” are not as harmful in an experimental piece of code. We will need to delegate some tasks from this actor to others on the long run anyway, so I will live with that now and resolve them later.)

Now we can synchronize. Or at least we will if we write our last (four) components.

I told you something about an external timer which will force the actor to Sync state from time to time.

Akka has a Timers trait, really handy in these cases.

We need a miner too:

This is exactly the same functionality as the previously shown miner function. It has some logging in it, and the recursion handled with actor messages.

These two were logical. But the communication actor will be strange…

First of all, I used akka clusters. Akka clusters are for trusted ~100 node problems. So this implementation happened because it’s one of the fastest implementations I could come up with, and is not a non-trusted environment. But it is just an actor. We let the code be flexible enough to change the communication component to REST or have a p2p implementation later.

Secondly, I used FSM to solve the egg vs chicken problem (the sync actor needs a communication actor, but the communication actor needs a target too). (TBH it’s an interesting and non-trivial problem to solve in akka, still I tried to find the best way to solve this.)

When we get our first SendToRandomNode message, we get initialized with the sender. And we just deliver messages.


We will send random messages to Up nodes (and only to others, so we filter out ourselves).

At this point, we don’t need any direct block serialization. All of the block movements are handled by akka. But if we want to make this chain implementation usable for others, we will need to change this layer to some p2p and rest based layer. (And for example json serialization will help us describe blocks outside of the JVM.)

What do we have?

We have some implementation! But we can’t try it out yet (sad) We need to write one last component; the main function(/class/object/whatever) (and an appconf)!

Now we can start our first node with ‘sbt “run 4500″‘ and a second node with ‘sbt “run 4501″‘ and we can scale our node number up. If we stop and start instances we will see that they synchronize up quickly and start the race for the new block.

A list of our features and a quick recap:

  • we can create blocks
  • the blocks are serializable
  • we have a consistent hash function
  • the blocks contain their parents’ hashes so it’s hard to modify only one block in the list
  • the list of blocks is called a chain and we can validate a new block and can add it to the chain
  • we can handle adding multiple blocks too
  • we have a PoW algorithm which can scale and slow down the block creation
  • with the PoW we can ensure that the longest valid chain is the “truth”
  • we can write our own data to a block and make that block valid on the actual chain with our “mining” mechanism
  • we can distribute the newly created block to other nodes
  • we can synchronize with other nodes

A list of missing features and flaws:

  • our communication protocol is not built for untrusted nodes
  • our synchronization protocol can be attacked (because of the current mining start/stop implementation)
  • our mining is one threaded and could be improved
  • we don’t store the chain
  • we don’t have any purpose yet… Why do I want to write my data into the chain? Why would anyone want to store data in the chain?
  • we solved a lot of problems but why?

Closing thoughts

What I learned from this (and hopefully you too)?

First of all I think this is far less of a rocket science than most of the IT world thinks. There are some really good ideas in it, the whole concept is really interesting (and more humanistic with the non trusting part), and if we learn the basics, we get a new perspective.

While I played with it, I learned some akka features, like FSM, and Clusters. I want to play more with the actor hierarchy and clusters too. So in the next part I will definitely try to implement a mining pool.

We barely touched the blocks. In the next episode I will implement transactions and block validation with transactions. I will write a digital wallet, and try to make a clearer picture as to why this non trusting thing is important in the banking and financial era.

While I implemented this, I read a bunch of blogposts, wiki pages, and other sources. I think implementing a problem will give way to more interesting questions than just reading about it. So if you are interested in this technology, I think you need to start implement some silly “will never work” solution to face the same problems the “big players” had. (Maybe writing a virtual machine for smart contract execution is a bit of an overkill for a petproject (smile) ) I really want to try some Byzantine algo and some DAG implementation too in the future.

Links

https://amarszalek.net/blog/2018/03/20/simple-blockchain-in-kotlin/

https://hackernoon.com/learn-blockchains-by-building-one-117428612f46

https://dzone.com/articles/blockchain-implementation-with-java-code

https://github.com/sungjk/s-blockchain

https://github.com/virattt/ScalaBlockchain

https://github.com/fluency03/blockchain-in-scala

https://medium.com/programmers-blockchain/create-simple-blockchain-java-tutorial-from-scratch-6eeed3cb03fa


Wanari is an 18-year-old custom software development company located at the heart of Budapest, Hungary. As you can see we keep up with the times and are pretty tech-savvy 😀 To learn more about us, check out our website. For more content like this, follow us on social media just like Fb or LinkedIn.

Gergő Törcsvári

Gergő Törcsvári

Software Developer at Wanari
I would love to change the world, but they won’t give me the source code (yet).