mirror of https://github.com/torvalds/linux.git
I don't usually pay much attention to the stale "? " addresses in
stack backtraces, but this lucky report from Pawel Sikora hints that
mremap's move_ptes() has inadequate locking against page migration.
3.0 BUG_ON(!PageLocked(p)) in migration_entry_to_page():
kernel BUG at include/linux/swapops.h:105!
RIP: 0010:[<ffffffff81127b76>] [<ffffffff81127b76>]
migration_entry_wait+0x156/0x160
[<ffffffff811016a1>] handle_pte_fault+0xae1/0xaf0
[<ffffffff810feee2>] ? __pte_alloc+0x42/0x120
[<ffffffff8112c26b>] ? do_huge_pmd_anonymous_page+0xab/0x310
[<ffffffff81102a31>] handle_mm_fault+0x181/0x310
[<ffffffff81106097>] ? vma_adjust+0x537/0x570
[<ffffffff81424bed>] do_page_fault+0x11d/0x4e0
[<ffffffff81109a05>] ? do_mremap+0x2d5/0x570
[<ffffffff81421d5f>] page_fault+0x1f/0x30
mremap's down_write of mmap_sem, together with i_mmap_mutex or lock,
and pagetable locks, were good enough before page migration (with its
requirement that every migration entry be found) came in, and enough
while migration always held mmap_sem; but not enough nowadays, when
there's memory hotremove and compaction.
The danger is that move_ptes() lets a migration entry dodge around
behind remove_migration_pte()'s back, so it's in the old location when
looking at the new, then in the new location when looking at the old.
Either mremap's move_ptes() must additionally take anon_vma lock(), or
migration's remove_migration_pte() must stop peeking for is_swap_entry()
before it takes pagetable lock.
Consensus chooses the latter: we prefer to add overhead to migration
than to mremapping, which gets used by JVMs and by exec stack setup.
Reported-and-tested-by: Paweł Sikora <pluto@agmk.net>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|---|---|---|
| .. | ||
| Kconfig | ||
| Kconfig.debug | ||
| Makefile | ||
| backing-dev.c | ||
| bootmem.c | ||
| bounce.c | ||
| cleancache.c | ||
| compaction.c | ||
| debug-pagealloc.c | ||
| dmapool.c | ||
| fadvise.c | ||
| failslab.c | ||
| filemap.c | ||
| filemap_xip.c | ||
| fremap.c | ||
| highmem.c | ||
| huge_memory.c | ||
| hugetlb.c | ||
| hwpoison-inject.c | ||
| init-mm.c | ||
| internal.h | ||
| kmemcheck.c | ||
| kmemleak-test.c | ||
| kmemleak.c | ||
| ksm.c | ||
| maccess.c | ||
| madvise.c | ||
| memblock.c | ||
| memcontrol.c | ||
| memory-failure.c | ||
| memory.c | ||
| memory_hotplug.c | ||
| mempolicy.c | ||
| mempool.c | ||
| migrate.c | ||
| mincore.c | ||
| mlock.c | ||
| mm_init.c | ||
| mmap.c | ||
| mmu_context.c | ||
| mmu_notifier.c | ||
| mmzone.c | ||
| mprotect.c | ||
| mremap.c | ||
| msync.c | ||
| nobootmem.c | ||
| nommu.c | ||
| oom_kill.c | ||
| page-writeback.c | ||
| page_alloc.c | ||
| page_cgroup.c | ||
| page_io.c | ||
| page_isolation.c | ||
| pagewalk.c | ||
| percpu-km.c | ||
| percpu-vm.c | ||
| percpu.c | ||
| pgtable-generic.c | ||
| prio_tree.c | ||
| quicklist.c | ||
| readahead.c | ||
| rmap.c | ||
| shmem.c | ||
| slab.c | ||
| slob.c | ||
| slub.c | ||
| sparse-vmemmap.c | ||
| sparse.c | ||
| swap.c | ||
| swap_state.c | ||
| swapfile.c | ||
| thrash.c | ||
| truncate.c | ||
| util.c | ||
| vmalloc.c | ||
| vmscan.c | ||
| vmstat.c | ||