CS3223

Course Title

Database Systems Implementation

Grade

B+

Semester

AY23/24 S2

Review

Ah another B+ course for me. This course suffers the same fate as my CS2100 and CS2106. It is a VERY computational-heavy course, with lots of arbitrary calculations here and there. You are the computer. Be the computer. But then again, it is a pre-requisite for many of the other database courses, so you have no choice but to take this.

The content is so dry and so boring. It has little to do with SQL, and more to do with data structures and performance. You start off learning about B-trees, which allow the database to store data in a sorted manner. You go through how to search for specific data, how to evaluate queries, and how to rebalance the tree after insertions and deletions.

After B-trees, the next big topic is Hashing, either linear or dynamic hashing. In this topic, you focus more on how to build your index from scratch. The difference between the linear and dynamic hashing is quite significant, so it's best to fully understand them before moving on.
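To give a feel for what "building the index from scratch" involves, here is a minimal sketch of linear hashing as it is commonly described in textbooks. It is not the course's own code. The class name, the `capacity` parameter, and the rule of splitting whenever any bucket overflows are all assumptions made for illustration, and overflow pages are simply modelled as a bucket list growing past its capacity.

```python
class LinearHashIndex:
    """Toy linear hashing over non-negative integer keys (illustrative only)."""

    def __init__(self, initial_buckets=2, capacity=2):
        self.n0 = initial_buckets      # number of buckets at level 0
        self.capacity = capacity       # keys per bucket before it "overflows"
        self.level = 0                 # current round of doubling
        self.split_ptr = 0             # next bucket to be split
        self.buckets = [[] for _ in range(initial_buckets)]

    def _bucket_of(self, key):
        # Hash with the current level's function first.
        b = key % (self.n0 * 2 ** self.level)
        # Buckets before split_ptr were already split this round,
        # so they must use the next level's hash function.
        if b < self.split_ptr:
            b = key % (self.n0 * 2 ** (self.level + 1))
        return b

    def insert(self, key):
        b = self._bucket_of(key)
        self.buckets[b].append(key)
        # Assumed split policy: split whenever an insert overflows a bucket.
        if len(self.buckets[b]) > self.capacity:
            self._split()

    def _split(self):
        # Split the bucket at split_ptr, NOT necessarily the one that overflowed.
        old = self.buckets[self.split_ptr]
        modulus = self.n0 * 2 ** (self.level + 1)
        self.buckets.append([])  # its index is split_ptr + n0 * 2**level
        self.buckets[self.split_ptr] = [k for k in old if k % modulus == self.split_ptr]
        self.buckets[-1] = [k for k in old if k % modulus != self.split_ptr]
        self.split_ptr += 1
        # Once every bucket of this round has been split, start the next round.
        if self.split_ptr == self.n0 * 2 ** self.level:
            self.level += 1
            self.split_ptr = 0

    def contains(self, key):
        return key in self.buckets[self._bucket_of(key)]


index = LinearHashIndex()
for k in range(10):
    index.insert(k)
```

The part that tends to trip people up is that the bucket being split is the one at the split pointer, which is usually not the bucket that just overflowed.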

Then, you move on to the algorithms for handling operations like SELECT, PROJECT, MERGE and JOIN. But rather than showing up as explanation questions, their significance comes in the middle section of the course.

The entire middle section is dedicated to query optimisation and performance evaluation. When running these queries, the database can evaluate them using different strategies. Using different indices, reshuffling the search queries, those are some of the things the optimiser can do to figure out the optimal strategy. And guess what? That's your job too! You are the computer. Be the computer. Do all these calculations yourself and see what answers you get! The big thing you need to be able to count is the number of disk I/Os done by your program, which basically means how many times your code reads a page from (or writes a page to) disk. This one metric leads to so much pain. So many conditions, so many calculations.
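As a taste of the kind of counting involved, here is a small sketch using the standard textbook cost formulas for two nested loop join variants. The function names and the example page counts are made up for illustration, and the formulas assume the cost of writing the join output is ignored and that the block variant reserves one buffer page for the inner relation and one for output.

```python
import math


def page_nested_loop_cost(pages_r: int, pages_s: int) -> int:
    """I/O cost of a page-based nested loop join with R as the outer relation.

    Read every page of R once, and scan all of S once per page of R.
    """
    return pages_r + pages_r * pages_s


def block_nested_loop_cost(pages_r: int, pages_s: int, buffer_pages: int) -> int:
    """I/O cost of a block nested loop join with R as the outer relation.

    Assumes (buffer_pages - 2) pages hold a chunk of R, leaving one page
    for scanning S and one page for output.
    """
    if buffer_pages < 3:
        raise ValueError("need at least 3 buffer pages")
    chunks = math.ceil(pages_r / (buffer_pages - 2))
    # Read R once, and scan all of S once per chunk of R.
    return pages_r + chunks * pages_s


# Hypothetical sizes: R has 1000 pages, S has 500 pages, 102 buffer pages.
naive = page_nested_loop_cost(1000, 500)          # 1000 + 1000 * 500 = 501000
blocked = block_nested_loop_cost(1000, 500, 102)  # 1000 + 10 * 500 = 6000
```

Same query, same data, and the I/O count drops from 501000 to 6000 just by using the buffer pages better. Now imagine doing this by hand for several plans per question.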

The final section is a little more of a break. It's all about concurrency control. Basically how you make sure that data is always correct, yet allow many processes to access it at once. You can use locks, you can use timestamps, you can use scoped locks. All valid ways of implementing concurrency control. Much more straightforward than the computation questions, but still pretty dry and difficult.
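For the lock-based approach, the core idea can be sketched as a tiny lock table with shared (S) and exclusive (X) modes. This is only an illustration under my own assumptions: the `LockTable` class and its methods are hypothetical, requests that cannot be granted are simply refused instead of queued, and there is no deadlock handling.

```python
class LockTable:
    """Toy shared/exclusive lock table (illustrative only)."""

    def __init__(self):
        # item -> (mode, set of transaction ids holding the lock)
        self.holders = {}

    def try_acquire(self, txn, item, mode):
        """Return True if txn is granted a lock of `mode` ("S" or "X") on item."""
        if item not in self.holders:
            self.holders[item] = (mode, {txn})
            return True
        held_mode, txns = self.holders[item]
        if txns == {txn}:
            # Sole holder: re-acquiring or upgrading S to X is allowed.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.holders[item] = (new_mode, txns)
            return True
        if mode == "S" and held_mode == "S":
            # Shared locks are compatible with each other.
            txns.add(txn)
            return True
        # Any combination involving X across different transactions conflicts.
        return False

    def release_all(self, txn):
        """Drop every lock held by txn, e.g. when it commits or aborts."""
        for item in list(self.holders):
            _, txns = self.holders[item]
            txns.discard(txn)
            if not txns:
                del self.holders[item]


table = LockTable()
r1 = table.try_acquire("T1", "A", "S")  # granted: nobody holds A
r2 = table.try_acquire("T2", "A", "S")  # granted: S is compatible with S
r3 = table.try_acquire("T2", "A", "X")  # refused: T1 still holds S on A
table.release_all("T1")
r4 = table.try_acquire("T2", "A", "X")  # granted: T2 is now the sole holder
r5 = table.try_acquire("T1", "A", "S")  # refused: T2 holds X on A
```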

Workload is more on the consistent side. Every week, you have to present your answer at least once. This means you can't show up to the tutorial empty-handed. You must prepare your answers beforehand. All that for 3%, which is not worth it IMO. I probably got my B+ because of my finals, which had a lot of concepts that I didn't know very well.