Read “Understanding the Linux Kernel” page cache chapter


Just using LRU won’t help here.

Problem

There is another problem with the current Linux page cache: processes that read or write a lot of data, but only do so once, still have that data cached. Imagine a backup process: it reads nearly all data stored on disk (and perhaps writes it to an archive file), but caching the data from these reads and writes makes little sense, because it is unlikely to be accessed again soon (at least no more likely than if the backup had not run at all). The Linux page cache will nevertheless cache that data and push other data out of memory. Accessing that evicted data again then results in slow reads from the hardware, effectively slowing the system down, sometimes to a point worse than if the page cache had been disabled entirely.

How is this implemented?

The Linux page cache maintains a radix tree for every address space (one per cached file, via its `struct address_space`), mapping file offsets to cached pages. In addition, the kernel maintains two lists of pages.

We maintain both an active and an inactive list for scan resistance: with a single LRU list, scanning a large file once would flush the whole cache.


The active list holds the hot pages and the inactive list holds the cold pages.

Pages are initially inserted into the inactive list; when a page is accessed a second time, it is promoted to the head of the active list.
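The promotion rule above can be sketched in Python. This is a toy model only: the class name, the fixed capacity, and the demote-instead-of-evict fallback are my assumptions, not the kernel's (the real lists live in mm/vmscan.c).

```python
from collections import OrderedDict

class TwoListCache:
    """Toy model of the active/inactive-list idea (illustrative, not kernel code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.inactive = OrderedDict()  # cold pages, least recently used first
        self.active = OrderedDict()    # hot pages, least recently used first

    def access(self, page):
        if page in self.active:
            self.active.move_to_end(page)   # hot hit: refresh recency
        elif page in self.inactive:
            del self.inactive[page]         # second access: promote to active
            self.active[page] = True
        else:
            self.inactive[page] = True      # first access: insert cold
        while len(self.active) + len(self.inactive) > self.capacity:
            if self.inactive:
                self.inactive.popitem(last=False)          # evict coldest inactive page
            else:
                victim, _ = self.active.popitem(last=False)
                self.inactive[victim] = True               # demote instead of evicting

# Scan resistance: a one-pass scan cannot flush the hot pages.
cache = TwoListCache(capacity=4)
for p in ("a", "a", "b", "b"):     # "a" and "b" are accessed twice -> active
    cache.access(p)
for i in range(100):               # large single-pass scan, each page touched once
    cache.access(f"scan-{i}")
assert "a" in cache.active and "b" in cache.active
```

The scan pages only ever cycle through the inactive list, so the two hot pages survive the entire pass.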

Ref: https://arpitbhayani.me/blogs/2q-cache/ and https://arpitbhayani.me/blogs/midpoint-insertion-caching-strategy/

Simplified 2Q

The simplified 2Q algorithm works with two buffers: a primary LRU buffer, Am, and a secondary FIFO buffer, A1. Newly faulted pages first go into the secondary buffer A1; when a page is referenced again, it moves to the primary LRU buffer Am. This ensures that a page only reaches the primary LRU buffer once it has proven hot and is actually worth caching.
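A minimal sketch of simplified 2Q, assuming fixed sizes for both buffers (the Am/A1 names come from the text; the sizes and the class itself are illustrative):

```python
from collections import OrderedDict, deque

class Simplified2Q:
    """Toy sketch of simplified 2Q: FIFO A1 feeds LRU Am on second reference."""

    def __init__(self, am_size, a1_size):
        self.am_size, self.a1_size = am_size, a1_size
        self.am = OrderedDict()  # primary LRU buffer, least recently used first
        self.a1 = deque()        # secondary FIFO buffer, oldest on the left

    def access(self, page):
        if page in self.am:
            self.am.move_to_end(page)        # LRU hit: refresh recency
        elif page in self.a1:
            self.a1.remove(page)             # second reference: promote to Am
            self.am[page] = True
            if len(self.am) > self.am_size:
                self.am.popitem(last=False)  # evict least recently used from Am
        else:
            self.a1.append(page)             # first reference: FIFO insert
            if len(self.a1) > self.a1_size:
                self.a1.popleft()            # FIFO eviction: page was never re-referenced

cache = Simplified2Q(am_size=2, a1_size=2)
cache.access("hot")
cache.access("hot")          # second reference promotes "hot" into Am
for i in range(10):
    cache.access(i)          # one-pass scan flows through the FIFO only
assert "hot" in cache.am
```

Note how the scan never reaches Am: each scanned page enters A1 once and falls out of the FIFO without ever being promoted.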


MySQL's workaround instead of maintaining two lists

Midpoint insertion strategy

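InnoDB keeps a single LRU list split at a midpoint into a "new" and an "old" sublist: newly read pages enter at the head of the old sublist and move to the head of the new sublist only when they are accessed again, so a table scan churns only the old sublist. A toy sketch of that idea (the 37% old fraction mirrors the `innodb_old_blocks_pct` default; everything else here is my simplification, not MySQL code):

```python
from collections import deque

class MidpointLRU:
    """Toy model of midpoint insertion (illustrative, not InnoDB's buffer pool)."""

    def __init__(self, capacity, old_pct=37):
        self.capacity = capacity
        self.old_target = max(1, capacity * old_pct // 100)
        self.new = deque()  # "new" sublist: head holds the hottest page
        self.old = deque()  # "old" sublist: tail is the next eviction victim

    def access(self, page):
        if page in self.new:
            self.new.remove(page)
            self.new.appendleft(page)  # already hot: move to the list head
        elif page in self.old:
            self.old.remove(page)
            self.new.appendleft(page)  # re-access promotes across the midpoint
        else:
            self.old.appendleft(page)  # first access: head of the old sublist
        while len(self.new) > self.capacity - self.old_target:
            self.old.appendleft(self.new.pop())  # demote coldest "new" page
        while len(self.new) + len(self.old) > self.capacity:
            self.old.pop()                       # evict from the old sublist's tail

pool = MidpointLRU(capacity=8)
for p in ("a", "a", "b", "b"):      # accessed twice -> new sublist
    pool.access(p)
for i in range(100):                # one-pass scan churns only the old sublist
    pool.access(f"scan-{i}")
assert "a" in pool.new and "b" in pool.new
```

The effect is the same scan resistance as two separate lists, but implemented as one list with an insertion point partway down it.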

More examples with vmtouch


Ref: https://biriukov.dev/docs/page-cache/4-page-cache-eviction-and-page-reclaim/

Refault distance

When a page is evicted, it is removed from the inactive list and its reference is removed from the radix tree as well. However, in place of the removed page, shadow meta-information is stored. When a page is faulted in and a shadow entry exists for it, the page is inserted directly into the active list; otherwise, it starts out on the inactive list as usual. In this way, past access information is already used by the Linux page cache. However, once the page reaches the active list, that information is lost! So the next time around, the same page can be evicted again when evicting a different page would actually have been more beneficial. This is the first problem we try to solve.
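The flow above can be sketched as follows. This is a heavily simplified toy: the real code in mm/workingset.c stores an eviction timestamp in the shadow entry and activates the page only if the computed refault distance is small enough, rather than promoting unconditionally as here, and all names below are mine.

```python
from collections import OrderedDict

class ShadowAwareCache:
    """Toy model of shadow entries: evicted pages leave a marker behind,
    and a refault on a marked page skips the inactive list."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.inactive = OrderedDict()
        self.active = OrderedDict()
        self.shadows = set()  # evicted keys (the kernel packs an eviction time here)

    def access(self, page):
        if page in self.active:
            self.active.move_to_end(page)
        elif page in self.inactive:
            del self.inactive[page]
            self.active[page] = True    # second access: normal promotion
        elif page in self.shadows:
            self.shadows.discard(page)
            self.active[page] = True    # refault: go straight to the active list
        else:
            self.inactive[page] = True  # cold fault: start on the inactive list
        while len(self.active) + len(self.inactive) > self.capacity:
            if self.inactive:
                victim, _ = self.inactive.popitem(last=False)
            else:
                victim, _ = self.active.popitem(last=False)
            self.shadows.add(victim)    # leave a shadow entry behind

cache = ShadowAwareCache(capacity=2)
cache.access("p")                       # cold fault -> inactive
cache.access("q")
cache.access("r")                       # "p" is evicted, leaving a shadow
cache.access("p")                       # refault: inserted directly into active
assert "p" in cache.active
```

Without the shadow set, the refaulting page would have to climb through the inactive list again and could easily be evicted a second time before its second access.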

https://github.com/torvalds/linux/blob/master/mm/workingset.c