In the next part of understanding database series, we will continue with our previous example of building a database and explore concurrency in databases. Please take some time to read through earlier part of the series.
Atomicity & Durability (A & D)
Lets say if we are trying to transfer money from account A to account B, and our database looks like this.
|NAME||ACCOUNT BALANCE||TRANSFER ALLOWED|
To transfer 100 USD from John's account to Jane's account, we would have to fetch John's record and update the balance to 10000-100 and save it back. In next step we would fetch Jane's account and update the balance to 500+100 and save it back. However we failed to check Jane's account is not allowed for transfer, this would cause exception to be raised by our system, but we now have a situation where we have withdrawn 100 USD from John's account but not able to put it in Jane's account. This is a business validation but in real world there are a million things which could go wrong, the database could go down in middle of transfer or power outage, in those scenario we need to ensure that the whole transfer should either go through or fail not in middle.
We need a Unit of work which should be either in success of fail state and can not be in partial state, or a Transaction.
To implement such a scenario, we need to make sure that the data is not written to our primary storage, while we have to make sure that the changes are not lost in case of crash. So lets put the instructions in a separate file and when the user says Commit we would transfer to primary storage which is our CSV file in this case. The last point were the database has transferred to primary storage is called a Checkpoint, usually the rollback or recovery is done from last Checkpoint.
In case of crash we would simply pick up this log and replay it on primary storage for all unwritten changes. This concept is implemented in various different ways like Write Ahead Log, Undo/Redo logs but essentially the concept is fairly similar.
Consistency is essentially a promise that after our Unit of work is complete, the database would move from one consistent state to another, just so that we do not catch our database with its pants down.
In our example of primitive database, as long as one person is using it, it would work fine but what if multiple people are using it simultaneously. Lets explore what can go wrong in a multiple user scenario. Lets say I ask for a record which you are trying to update, which value should the database return the one which you updated but have not committed or the one which was earlier to the update you did. What if we both try to update the same record, so in our example if we both try to take out 10000 USD from John at precisely same instance, what would happen?
Damn! its getting more and more complicated, why cant we keep one guy out while one of them is updating the record. Actually we can, however it would be rude to keep some one out who is just trying to read the data, so lets allow him to read last consistent data which we had, however lets keep the update guy waiting till we are done.
Again, it would be really slow and rude to keep the next update guy out of entire database even if he is updating something else, so lets allow him to update unless he touches our Table, in other words lets put a Table Lock on our database. This is a sort of pessimistic approach, so we can allow the user to request for Row Lock, so that we can allow other people to update other rows in same table.
There are other forms of concurrency control which we would explore in other posts of understanding database series.