Todo: https://www.youtube.com/watch?v=vdPALZ-GCfI&list=PLSE8ODhjZXjbj8BMuIrRcacnQh20hmY9g

F2023 #00 - Course Overview & Logistics (CMU Intro to Database Systems)

Also there is Adv Databases course

image

Types of Databases

Selecting the right database is crucial for project success. Here’s a summary of key points:

image

Vector DB

image

A vector database indexes and stores vector embeddings for fast retrieval and similarity search, with capabilities like CRUD operations, metadata filtering, and horizontal scaling.

A vector database stores high-dimensional vectors extracted from various unstructured data, like audio, video, image, and text. Then we can calculate the similarity among unstructured data. Typical use cases include:

CAP and Databases

image

Ref:

SQL

Atomicity (all-or-nothing guarantee)

Atomicity: The system can only be in the state it was before or after the operation, not something in between.

In the context of ACID, atomicity is not about concurrency. It does not describe what happens if several processes try to access the same data at the same time, because that is covered under the letter I, for isolation.

Atomicity simplifies this problem: if a transaction was aborted, the application can be sure it didn’t change anything, so it can safely be retried.

Consistency

The idea of ACID consistency is that you have certain statements about your data (invariants) that must always be true — for example, in an accounting system, credits and debits across all accounts must always be balanced.

However, this idea of consistency depends on the application’s notion of invariants, and it’s the application’s responsibility to define its transactions correctly so that they preserve consistency.

Atomicity, isolation, and durability are properties of the database, whereas consistency (in the ACID sense) is a property of the application.

Atomicity vs Isolation

image

Single-object writes

Isolation levels

Concurrently running transactions shouldn’t interfere with each other. For example, if one transaction makes several writes, then another transaction should see either all or none of those writes, but not some subset.

image

image

🔹 Serializalble: This is the highest isolation level. Concurrent transactions are guaranteed to be executed in sequence.

🔹 Repeatable Read: Data read during the transaction stays the same as the transaction starts.

🔹 Read Committed: Data modification can only be read after the transaction is committed.

🔹 Read Uncommitted: Other transactions can read the data modification before a transaction is committed.

image

Ref: https://blog.bytebytego.com/p/what-are-database-isolation-levels

Dirty Read

Violating isolation: one transaction reads another transaction’s uncommitted writes (a “dirty read”)

image Ref: https://maxnilz.com/docs/003-database/015-concurrency-control/

No need to even write, you can just read too and you might end up reading uncommitted write

image

Dirty Write

With dirty writes, conflicting writes from different transactions can be mixed up.

image

If a transaction can overwrite data written by another transaction that is not yet committed (or aborted), this is called a “Dirty Write”. If transactions update multiple objects, dirty writes can lead to a bad outcome.

Transactions running at the read committed isolation level must prevent dirty writes, usually by delaying the second write until the first write’s transaction has committed or aborted.

Implementation details preventing Dirty Writes and Dirty Reads

[!NOTE]
Most commonly, databases prevent dirty writes by using row-level locks:

  • When a transaction wants to modify a particular object (row or document), it must first acquire a lock on that object.
  • It must then hold that lock until the transaction is committed or aborted.
  • Only one transaction can hold the lock for any given object; if another transaction wants to write to the same object, it must wait until the first transaction is committed or aborted before it can acquire the lock and continue.
  • This locking is done automatically by databases in read committed mode (or stronger isolation levels).

Preventing Dirty reads:

  • For every object that is written, the database remembers both the old committed value and the new value set by the transaction that currently holds the write lock.
  • While the transaction is ongoing, any other transactions that read the object are simply given the old value.
  • Only when the new value is committed do transactions switch over to reading the new value.

Snapshot Isolation and Repeatable Read

Read Committed provides isolation again dirty reads and dirty writes, but still there are few issues.

image

image

Summary

image

SQL Databases

image

image

image

Nice read about Cockroach DB https://www.cockroachlabs.com/blog/distributed-sql-key-value-store/

image