mirror of https://github.com/torvalds/linux.git
Sasha Levin has reported two THP BUGs[1][2]. I believe both of them
have the same root cause. Let's look to them one by one.
The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!". It's
BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page(). From my
testing I see that page_mapcount() is higher than mapcount here.
I think it happens due to race between zap_huge_pmd() and
page_check_address_pmd(). page_check_address_pmd() misses PMD which is
under zap:
CPU0 CPU1
zap_huge_pmd()
pmdp_get_and_clear()
__split_huge_page()
anon_vma_interval_tree_foreach()
__split_huge_page_splitting()
page_check_address_pmd()
mm_find_pmd()
/*
* We check if PMD present without taking ptl: no
* serialization against zap_huge_pmd(). We miss this PMD,
* it's not accounted to 'mapcount' in __split_huge_page().
*/
pmd_present(pmd) == 0
BUG_ON(mapcount != page_mapcount(page)) // CRASH!!!
page_remove_rmap(page)
atomic_add_negative(-1, &page->_mapcount)
The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!".
It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd().
This happens in similar way:
CPU0 CPU1
zap_huge_pmd()
pmdp_get_and_clear()
page_remove_rmap(page)
atomic_add_negative(-1, &page->_mapcount)
__split_huge_page()
anon_vma_interval_tree_foreach()
__split_huge_page_splitting()
page_check_address_pmd()
mm_find_pmd()
pmd_present(pmd) == 0 /* The same comment as above */
/*
* No crash this time since we already decremented page->_mapcount in
* zap_huge_pmd().
*/
BUG_ON(mapcount != page_mapcount(page))
/*
* We split the compound page here into small pages without
* serialization against zap_huge_pmd()
*/
__split_huge_page_refcount()
VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!!
So my understanding the problem is pmd_present() check in mm_find_pmd()
without taking page table lock.
The bug was introduced by me commit with commit
|
||
|---|---|---|
| .. | ||
| Kconfig | ||
| Kconfig.debug | ||
| Makefile | ||
| backing-dev.c | ||
| balloon_compaction.c | ||
| bootmem.c | ||
| bounce.c | ||
| cleancache.c | ||
| compaction.c | ||
| debug-pagealloc.c | ||
| dmapool.c | ||
| early_ioremap.c | ||
| fadvise.c | ||
| failslab.c | ||
| filemap.c | ||
| filemap_xip.c | ||
| fremap.c | ||
| frontswap.c | ||
| highmem.c | ||
| huge_memory.c | ||
| hugetlb.c | ||
| hugetlb_cgroup.c | ||
| hwpoison-inject.c | ||
| init-mm.c | ||
| internal.h | ||
| interval_tree.c | ||
| iov_iter.c | ||
| kmemcheck.c | ||
| kmemleak-test.c | ||
| kmemleak.c | ||
| ksm.c | ||
| list_lru.c | ||
| maccess.c | ||
| madvise.c | ||
| memblock.c | ||
| memcontrol.c | ||
| memory-failure.c | ||
| memory.c | ||
| memory_hotplug.c | ||
| mempolicy.c | ||
| mempool.c | ||
| migrate.c | ||
| mincore.c | ||
| mlock.c | ||
| mm_init.c | ||
| mmap.c | ||
| mmu_context.c | ||
| mmu_notifier.c | ||
| mmzone.c | ||
| mprotect.c | ||
| mremap.c | ||
| msync.c | ||
| nobootmem.c | ||
| nommu.c | ||
| oom_kill.c | ||
| page-writeback.c | ||
| page_alloc.c | ||
| page_cgroup.c | ||
| page_io.c | ||
| page_isolation.c | ||
| pagewalk.c | ||
| percpu-km.c | ||
| percpu-vm.c | ||
| percpu.c | ||
| pgtable-generic.c | ||
| process_vm_access.c | ||
| quicklist.c | ||
| readahead.c | ||
| rmap.c | ||
| shmem.c | ||
| slab.c | ||
| slab.h | ||
| slab_common.c | ||
| slob.c | ||
| slub.c | ||
| sparse-vmemmap.c | ||
| sparse.c | ||
| swap.c | ||
| swap_state.c | ||
| swapfile.c | ||
| truncate.c | ||
| util.c | ||
| vmacache.c | ||
| vmalloc.c | ||
| vmpressure.c | ||
| vmscan.c | ||
| vmstat.c | ||
| workingset.c | ||
| zbud.c | ||
| zsmalloc.c | ||
| zswap.c | ||