Windows copes with this and even chkdsk does not detect or fix this
so we have to cope with it, too. Thanks to Pawel Kot for reporting
the problem.
- Miscellaneous updates to layout.h.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
this converts fs/jfs to kzalloc() usage.
compile tested with make allyesconfig
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Thanks to Adrian Bunk for debugging the problem and to Shaggy for
helping find the solution.
Also added a fix for 64K pages we found in loosely-related testing
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This change reverts the 033b96fd30 commit
from Kay Sievers that removed the mount/umount uevents from the kernel.
Some older versions of HAL still depend on these events to detect when a
new device has been mounted. These events are not correctly emitted,
and are broken by design, and so, should not be relied upon by any
future program. Instead, the /proc/mounts file should be polled to
properly detect this kind of event.
A feature-removal-schedule.txt entry has been added, noting when this
interface will be removed from the kernel.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Minor updates to the documentation to bring them into sync with current
websites and available features. The debug flag was switched back to hex
to match the documentation.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1) it should use nr_processes(), not nr_threads; otherwise we are getting
very confused find(1) and friends, among other things.
2) better do that at stat() time than at every damn lookup in procfs root.
Patch had been sitting in FC4 kernels for many months now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
I got all of these backwards. We want to return
min(input timeout, new timeout)
to userspace to prevent increasing the time-remaining value.
Thanks to Ernst Herzberg <earny@net4u.de> for reporting and diagnosing.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's a rather theoretical case of the BUG triggering in
fuse_reset_request():
- iget() fails because of OOM after a successful CREATE_OPEN request
- during IO on the resulting RELEASE request the connection is aborted
Fix and add warning to fuse_reset_request().
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a deadlock possible in the ext2 file system implementation. This
deadlock occurs when a file is removed from an ext2 file system which was
mounted with the "sync" mount option.
The problem is that ext2_xattr_delete_inode() was invoking the routine,
sync_dirty_buffer(), using a buffer head which was previously locked via
lock_buffer(). The first thing that sync_dirty_buffer() does is to lock
the buffer head that it was passed. It does this via lock_buffer(). Oops.
The solution is to unlock the buffer head in ext2_xattr_delete_inode()
before invoking sync_dirty_buffer(). This makes the code in
ext2_xattr_delete_inode() obey the same locking rules as all other callers
of sync_dirty_buffer() in the ext2 file system implementation.
Signed-off-by: Peter Staubach <staubach@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Disable automatic checkpointing of the journal - this is a relic from older
ocfs2 days. Worth quite a bit of performance on longer running single node
tests.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* fix a hang in recovery that occurred in dlmlock_remote. the $RECOVERY
lock was never moved to the granted queue even after getting DLM_NORMAL
back from the master node.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* add dlm_wait_for_node_death function to be used after receiving a network
error. this will wait for the given timeout to allow the heartbeat
callbacks to update the domain map. without this, some paths may spin
and consume enough cpu that the heartbeat gets starved and never updates.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* fix a bug in dlm_convert_lock_handler where dlm_lockres_release_ast was
being called even if no ast was ever reserved
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* after successfully taking the $RECOVERY lock in EX mode, recheck to make
sure that recovery has not already begun or completed on another node
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
1. The tracee can go from ptrace_stop() to do_signal_stop()
after __ptrace_unlink(p).
2. It is unsafe to __ptrace_unlink(p) while p->parent may wait
for tasklist_lock in ptrace_detach().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the kthread_ API instead of opencoding lots of hairy code for kernel
thread creation and teardown.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
This patch fixes an oops reported by Adrian Bunk in cifs_user_read when a null
read response is returned on a forcedirectio mount.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If 2 threads attached to the same process are blocking on different locks on
different files (maybe even on different servers) but have the same lock
arguments (i.e. same offset+length - actually quite common, since most
processes try to lock the entire file) then the first GRANTED call that wakes
one up will also wake the other.
Currently when the NLM_GRANTED callback comes in, lockd walks the list of
blocked locks in search of a match to the lock that the NLM server has
granted. Although it checks the lock pid, start and end, it fails to check
the filehandle and the server address.
By checking the filehandle and server IP address, we ensure that this only
happens if the locks truly are referencing the same file.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch reverts commit f93ea411b73594f7d144855fd34278bcf34a9afc:
[PATCH] jbd: split checkpoint lists
This broke journal_flush() for OCFS2, which is its method of being sure
that metadata is sent to disk for another node.
And two related commits 8d3c7fce2d and
43c3e6f5ab with the subjects:
[PATCH] jbd: log_do_checkpoint fix
[PATCH] jbd: remove_transaction fix
These seem to be incremental bugfixes on the original patch and as such are
no longer needed.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The cifs session setup code has three cases, and a fourth for backlevel
LANMAN2 style session setup needed to be added. This new session setup
implmentation will eventually replace the other three and should be
easier to read while fixing a few minor problems (not setting
the LARGE READ/WRITEX flags when NTLMSSP was negotiated for example) and
adding support for NTLMv2 (which will be added with the next patch. In the
meantime, this code is marked in an CONFIG_CIFS_EXPERIMENTAL block and will
not be turned on by default until it is tested against more server types.
Signed-off-by: Steve French <sfrench@us.ibm.com>
This fixes a potential oops if there is an error reported by
posix_acl_from_disk(). This is mostly theoretical due to the use of
magics and checksums in xattrs, but is still possible.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Unfortunately, the reiserfs_attrs_cleared bit in the superblock flag can
lie. File systems have been observed with the bit set, yet still contain
garbage in the stat data field, causing unpredictable results.
This patch backs out the enable-by-default behavior.
It eliminates the changes from: d50a5cd860,
and ef5e5414e7.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With David Woodhouse <dwmw2@infradead.org>
select() presently has a habit of increasing the value of the user's
`timeout' argument on return.
We were writing back a timeout larger than the original. We _deliberately_
round up, since we know we must wait at _least_ as long as the caller asks
us to.
The patch adds a couple of helper functions for magnitude comparison of
timespecs and of timevals, and uses them to prevent the various poll and
select functions from returning a timeout which is larger than the one which
was passed in.
The patch also fixes a bug in compat_sys_pselect7(): it was adding the new
timeout value to the old one and was returning that. It should just return
the new timeout value.
(We have various handy timespec/timeval-to-from-nsec conversion functions in
time.h. But this code open-codes it all).
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@muc.de>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: george anzinger <george@mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The *at patches introduced fstatat and, due to inusfficient research, I
used the newfstat functions generally as the guideline. The result is that
on 32-bit platforms we don't have all the information needed to implement
fstatat64.
This patch modifies the code to pass up 64-bit information if
__ARCH_WANT_STAT64 is defined. I renamed the syscall entry point to make
this clear. Other archs will continue to use the existing code. On x86-64
the compat code is implemented using a new sys32_ function. this is what
is done for the other stat syscalls as well.
This patch might break some other archs (those which define
__ARCH_WANT_STAT64 and which already wired up the syscall). Yet others
might need changes to accomodate the compatibility mode. I really don't
want to do that work because all this stat handling is a mess (more so in
glibc, but the kernel is also affected). It should be done by the arch
maintainers. I'll provide some stand-alone test shortly. Those who are
eager could compile glibc and run 'make check' (no installation needed).
The patch below has been tested on x86 and x86-64.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ext2 inode attributes with relevance for jfs:
'a' EXT2_APPEND_FL -> append only
'i' EXT2_IMMUTABLE_FL -> immutable file
's' EXT2_SECRM_FL -> zero file
'u' EXT2_UNRM_FL -> allow for unrm
'A' EXT2_NOATIME_FL -> no access time
'D' EXT2_DIRSYNC_FL -> dirsync
'S' EXT2_SYNC_FL -> sync
overview of jfs flags (partially for OS/2)
value (OS/2) Linux ext2 attrs
------------------------------------------------
0x00010000 IFJOURNAL -
0x00020000 ISPARSE used
0x00040000 INLINEEA used
0x00080000 - - JFS_NOATIME_FL
0x00100000 - - JFS_DIRSYNC_FL
0x00200000 - - JFS_SYNC_FL
0x00400000 - - JFS_SECRM_FL
0x00800000 ISWAPFILE - JFS_UNRM_FL
0x01000000 - - JFS_APPEND_FL
0x02000000 IREADONLY - JFS_IMMUTABLE_FL
0x04000000 IHIDDEN - -
0x08000000 ISYSTEM - -
0x10000000 - -
0x20000000 IDIRECTORY used
0x40000000 IARCHIVE -
0x80000000 INEWNAME -
the implementation is straight forward, except
for the fact that the attributes have to be mapped
to match with the ext2 ones to avoid a separate
tool for manipulating them (this could be avoided
when using a separate flag field in the on-disk
representation, but the overhead is minimal)
a special jfs_ioctl is added to allow for the new
JFS_IOC_GETFLAGS and JFS_IOC_SETFLAGS calls.
a helper function jfs_set_inode_flags() to transfer
the flags from the on-disk version to the inode
minor changes to allow flag inheritance on inode
creation, as well as a cleanup of the on-disk
flags (including the new ones)
beforementioned helper to map between ext2 and jfs
versions of the new flags ...
the JFS_SECRM_FL and JFS_UNRM_FL are not done yet
and I'm not 100% sure they are worth the effort,
the rest seems to work out of the box ...
Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Direct backport of 2.4 fix that didn't get propagated to 2.6; original
comment follows:
<quote>
When I specify the NFS port for nfsroot (e.g.,
nfsroot=<dir>,port=2049), the
kernel uses the wrong port. In my case it tries to use 264 (0x108)
instead
of 2049 (0x801).
This patch adds the missing htons().
Eric
</quote>
Patch got applied in 2.4.21-pre6. Author: Eric Lammerts (<eric@lammerts.org>,
AFAICS).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
A bunch of asm/bug.h includes are both not needed (since it will get
pulled anyway) and bogus (since they are done too early). Removed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If the namespace structure is being shared, allocate a new one and copy
information from the current, shared, structure.
Signed-off-by: Janak Desai <janak@us.ibm.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We had a user trigger this message on a box that had a lot of different
mounts, all with different options. It might help narrow down wtf happened
if we print out which device failed.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix one-shot support in inotify. We currently drop the IN_ONESHOT flag
during watch addition. Fix is to not do that.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix do_path_lookup() to avoid accessing invalid dentry or inode when the
link_path_walk() has failed. This should fix Bugme #5897.
Signed-off-by: Suzuki K P <suzuki@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I just noticed that my patch "don't create on open that fails due to
ERR_GRACE" (recently commited as fb553c0f17)
had an obvious problem that causes a deadlock on reboot recovery. Sending
in this now since it seems like a clear 2.6.16 candidate.--b.
We're returning with a lock held in some error cases.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
the clustering of extra pages in a buffered write.
SGI-PV: 949210
SGI-Modid: xfs-linux-melb:xfs-kern:25130a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
Fix trivial type mixup in the debugfs function comments.
Signed-off-by: Vincent Hanquez <vincent@snarc.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When walking a path, the LOOKUP_CONTINUE flag is used by some filesystems
(for instance NFS) in order to determine whether or not it is looking up
the last component of the path. It this is the case, it may have to look
at the intent information in order to perform various tasks such as atomic
open.
A problem currently occurs when link_path_walk() hits a symlink. In this
case LOOKUP_CONTINUE may be cleared prematurely when we hit the end of the
path passed by __vfs_follow_link() (i.e. the end of the symlink path)
rather than when we hit the end of the path passed by the user.
The solution is to have link_path_walk() clear LOOKUP_CONTINUE if and only
if that flag was unset when we entered the function.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ben points out that:
When writing files out using O_SYNC, jbd's 1 jiffy delay results in a
significant drop in throughput as the disk sits idle. The patch below
results in a 4-5x performance improvement (from 6.5MB/s to ~24-30MB/s on my
IDE test box) when writing out files using O_SYNC.
So optimise the batching code by omitting it entirely if the process which is
doing a sync write is the same as the one which did the most recent sync
write. If that's true, we're unlikely to get any other processes joining the
transaction.
(Has been in -mm for ages - it took me a long time to get on to performance
testing it)
Numbers, on write-cache-disabled IDE:
/usr/bin/time -p synctest -n 10 -uf -t 1 -p 1 dir-name
Unpatched:
40 seconds
Patched:
35 seconds
Batching disabled:
35 seconds
This is the problematic single-process-doing-fsync case. With multiple
fsyncing processes the numbers are AFACIT unaltered by the patch.
Aside: performance testing and instrumentation shows that the transaction
batching almost doesn't help (testing with synctest -n 1 -uf -t 100 -p 10
dir-name on non-writeback-caching IDE). This is because by the time one
process is running a synchronous commit, a bunch of other processes already
have a transaction handle open, so they're all going to batch into the same
transaction anyway.
The batching seems to offer maybe 5-10% speedup with this workload, but I'm
pretty sure it was more important than that when it was first developed 4-odd
years ago...
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The last fix for this function in fact opened up a much more often
triggering race.
It was uncommented tricky code, that was buggy. Add comment, make it less
tricky and fix bug.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.
As a preparation for changing that, we need to convert various 0 -> NR_CPUS
loops to use for_each_cpu().
(The above only applies to users of asm-generic/percpu.h. powerpc has gone it
alone and is presently only allocating memory for present CPUs, so it's
currently corrupting memory).
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The mount path had incorrectly asked the locking code to wait for recovery
completion, which deadlocks things because recovery waits for mount to
complete first.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
configfs always made item and attribute ownership root.root and
permissions based on a umask of 022. Add ->setattr() to allow
chown(2)/chmod(2), and persist the changes for the lifetime of the
items and attributes.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
fs/ocfs2/dlm/dlmrecovery.c does now use msleep(), and does therefore
need to #include <linux/delay.h> for getting the prototype of this
function.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* fix a hang which can occur during shutdown migration
* do not allow nodes to join during recovery
* when restarting lock mastery, do not ignore nodes which come up
* more than one node could become recovery master, fix this
* sleep to allow some time for heartbeat state to catch up to network
* extra debug info for bad recovery state problems
* make DLM_RECO_NODE_DATA_DONE a valid state for non-master recovery nodes
* prune all locks from dead nodes on $RECOVERY lock resources
* do NOT automatically add new nodes to mle nodemaps until they have properly
joined the domain
* make sure dlm_pick_recovery_master only exits when all nodes have synced
* properly handle dlmunlock errors in dlm_pick_recovery_master
* do not propagate network errors in dlm_send_begin_reco_message
* dead nodes were not being put in the recovery map sometimes, fix this
* dlmunlock was failing to clear the unlock actions on DLM_DENIED
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Including <asm/signal.h> results in compilation failure on ia64 due to
not including <linux/compiler.h>
Including <linux/signal.h> corrects the problem.
Please apply.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Functions called by __init funtions mustn't be __exit.
Reported by Jan-Benedict Glaw <jbglaw@lug-owl.de>.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch fixes an issue in fs/udf/namei.c reported by Coverity:
Error reported(1776)
CID: 1776
Checker: UNUSED_VALUE (help)
File: fs/udf/namei.c
Function: udf_lookup
Description: Pointer returned from "udf_find_entry" is never used
Patch description:
remove unused variable fi.
Signed-off-by: Jayachandran C. <c.jayachandran@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's doing
if (obh)
<stuff>
else
dereference obh
So presumably `obh' is never null in there.
This defect was found automatically by Coverity Prevent, a static analysis
tool.
Signed-off-by: Zaur Kambarov <zkambarov@coverity.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix incorrect nlink of root inode for filesystems that use
simple_fill_super().
Signed-off-by: Vincent Hanquez <vincent@snarc.org>
Cc: Greg KH <gregkh@suse.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The truncate() should write the file size before writing the new EOF entry.
This patch fixes it.
This bug was pointed out by Machida Hiroyuki.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The ll_rw_block() needs to get ref-count only if it submits a buffer(). This
patch avoids the needless get/put of ref-count.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch replaces an own implementation with LL_RW_BLOCK(SWRITE,) which was
newly added.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The quota check in ext2_new_inode() returns ENOSPC where it should return
EDQUOT instead.
Signed-off-by: Herbert Pötzl <herbert@13thfloor.at>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is code in setfl() which attempts to preserve the O_APPEND flag on
IS_APPEND files... however IS_APPEND files could also be opened O_RDONLY
and in that case setfl() should not require O_APPEND...
coreutils 5.93 tail -f attempts to set O_NONBLOCK even on regular files...
unfortunately if you try this on an append-only log file the result is
this:
fcntl64(3, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl64(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = -1 EPERM (Operation not permitted)
I offer up the patch below as one way of fixing the problem... i've tested
it fixes the problem with tail -f but haven't really tested beyond that.
(I also reported the coreutils bug upstream... it shouldn't fail imho...
<https://savannah.gnu.org/bugs/index.php?func=detailitem&item_id=15473>)
Signed-off-by: dean gaudet <dean@arctic.org>
Cc: Al Viro <viro@ftp.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, if you open a file O_DIRECT, truncate it to a size that is not a
multiple of the disk block size, and then try to read the last block in the
file, the read will return 0. The problem is in do_direct_IO, here:
/* Handle holes */
if (!buffer_mapped(map_bh)) {
char *kaddr;
...
if (dio->block_in_file >=
i_size_read(dio->inode)>>blkbits) {
/* We hit eof */
page_cache_release(page);
goto out;
}
We shift off any remaining bytes in the final block of the I/O, resulting
in a 0-sized read. I've attached a patch that fixes this. I'm not happy
about how ugly the math is getting, so suggestions are more than welcome.
I've tested this with a simple program that performs the steps outlined for
reproducing the problem above. Without the patch, we get a 0-sized result
from read. With the patch, we get the correct return value from the short
read.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Joel Becker <Joel.Becker@oracle.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In case we have CONFIG_FS_XIP, ext2_show_options shows "xip" if
EXT2_MOUNT_XIP mount flag is set.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't compute and display the per-irq sums on ia64 either, too much
overhead for mostly useless figures.
Cc: Olaf Hering <olh@suse.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When v9fs_mux_rpc sends a 9P message, it may be put in the queue of unsent
request. If the user process receives a signal, v9fs_mux_rpc sets the
request error to ERREQFLUSH and assigns NULL to request's send message. If
the message was still in the unsent queue, v9fs_write_work would produce an
oops while processing it.
The patch makes sure that requests that are being flushed are moved to the
pending requests queue safely.
If a request is being flushed, don't remove it from the list of pending
requests even if it receives a reply before the flush is acknoledged. The
request will be removed during from the Rflush handler (v9fs_mux_flush_cb).
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
v9fs_put_str used to store pointer to the source string, instead of the
cbuf copy. This patch corrects it.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Two symlink fixes, v9fs_readlink didn't copy the last character of the
symlink name, v9fs_vfs_follow_link incorrectly called strlen of newly
allocated buffer instead of PATH_MAX.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a code path that passed size to ext2_xattr_set
(ext3_xattr_set_handle) before initializing it. The callees don't use the
value in that case, but gcc cannot tell. Always initialize size to get rid
of the warnings.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Shrinks "struct dentry" from 128 bytes to 124 on x86, allowing 31 objects
per slab instead of 30.
Cc: John Levon <levon@movementarian.org>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes the code like this:
bh = sb_find_get_block (sb, tmp + j);
if ((bh && DATA_BUFFER_USED(bh)) || tmp != fs32_to_cpu(sb, *p)) {
retry = 1;
brelse (bh);
goto next1;
}
bforget (bh);
sb_find_get_block() ordinarily returns a buffer_head with b_count>=2, and
this code assume that in case if "b_count>1" buffer is used, so this caused
infinite loop.
(akpm: that is-the-buffer-busy code is incomprehensible. Good riddance. Use
of block_truncate_page() seems sane).
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
"rm" command, on file system with "ufs1" type cause system hang up. This
is, in fact, not so bad as it seems to be, because of after that in "kernel
control path" there are 3-4 places which may cause "oops".
So the first patch fix oopses, and the second patch fix "kernel hang up".
"oops" appears because of reading of group's summary info partly wrong, and
access to not first group's summary info cause "oops".
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fs/quota_v2.c: In function `v2_check_quota_file':
fs/quota_v2.c:39: warning: int format, different type arg (arg 2)
fs/quota_v2.c:39: warning: int format, different type arg (arg 3)
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Most of the 64 bit architectures will zero extend the first argument to
compat_sys_{openat,newfstatat,futimesat} which will fail if the 32 bit
syscall was passed AT_FDCWD (which is a small negative number). Declare
the first argument to be an unsigned int which will force the correct
sign extension when the internal functions are called in each case.
Also, do some small white space cleanups in fs/compat.c.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the server returns NLM_LCK_DENIED_NOLOCKS, we currently retry the
entire NLM_CANCEL request. This may end up looping forever unless the
server changes its mind (why would it do that, though?).
Ensure that we limit the number of retries (to 3).
See bug# 5957 in bugzilla.kernel.org.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The OpenGroup docs state that the arguments "block", "exclusive" and
"alock" must exactly match the arguments for the lock call that we are
trying to cancel.
Currently, "block" is always set to false, which is wrong.
See bug# 5956 on bugzilla.kernel.org.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Update some parameter descriptions to actually match the code.
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When a filesystem has been converted from 3.5.x to 3.6.x, we need an extra
check during file write to make sure we are not trying to make a 3.5.x file
> 2GB.
Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
reiserfs: journal_transaction_should_end should increase the count of
blocks allocated so the transaction subsystem can keep new writers from
creating a transaction that is too large.
Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
write_ordered_buffers should handle dirty non-uptodate buffers without a
BUG()
Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In data=journal mode, reiserfs writepage needs to make sure not to trigger
transactions while being run under PF_MEMALLOC. This patch makes sure to
redirty the page instead of forcing a transaction start in this case.
Also, calling filemap_fdata* in order to trigger io on the block device can
cause lock inversions on the page lock. Instead, do simple batching from
flush_commit_list.
Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The b_private field in buffer heads needs to be zero filled when the
buffers are allocated. Thanks to Nathan Scott for finding this. It was
causing problems on systems with both XFS and reiserfs.
Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
After a transaction has closed but before it has finished commit, there is
a window where data=ordered mode requires invalidatepage to pin pages
instead of freeing them. This patch fixes a race between the
invalidatepage checks and data=ordered writeback, and it also adds a check
to the reiserfs write_ordered_buffers routines to write any anonymous
buffers that were dirtied after its first writeback loop.
That bug works like this:
proc1: transaction closes and a new one starts
proc1: write_ordered_buffers starts processing data=ordered list
proc1: buffer A is cleaned and written
proc2: buffer A is dirtied by another process
proc2: File is truncated to zero, page A goes through invalidatepage
proc2: reiserfs_invalidatepage sees dirty buffer A with reiserfs
journal head, pins it
proc1: write_ordered_buffers frees the journal head on buffer A
At this point, buffer A stays dirty forever
Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the generic_permission code with a proper wrapper and callback instead
of having a local copy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This function is completely unused since the xattr permission checking
changes. Remove it and fold __reiserfs_permission into
reiserfs_permission.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
According to http://bugzilla.kernel.org/show_bug.cgi?id=5778
fs/reiserfs/file.c is missing this check.
Signed-off-by: Diego Calleja <diegocg@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch replaces yield and retry loop with __GFP_NOFAIL in
alloc_journal_list().
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove kmalloc() wrapper from fs/reiserfs/. Please note that a reiserfs
/proc entry format is changed because kmalloc statistics is removed.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Migrate a page with buffers without requiring writeback
This introduces a new address space operation migratepage() that may be used
by a filesystem to implement its own version of page migration.
A version is provided that migrates buffers attached to pages. Some
filesystems (ext2, ext3, xfs) are modified to utilize this feature.
The swapper address space operation are modified so that a regular
migrate_page() will occur for anonymous pages without writeback (migrate_pages
forces every anonymous page to have a swap entry).
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2.6.15's hugepage faulting introduced huge_pages_needed accounting into
hugetlbfs: to count how many pages are already in cache, for spot check on
how far a new mapping may be allowed to extend the file. But it's muddled:
each hugepage found covers HPAGE_SIZE, not PAGE_SIZE. Once pages were
already in cache, it would overshoot, wrap its hugepages count backwards,
and so fail a harmless repeat mapping with -ENOMEM. Fixes the problem
found by Don Dupuis.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-By: Adam Litke <agl@us.ibm.com>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
OpenBSD doesn't see "." correctly in directories created by Linux. Copying
files over several KB will buy you infinite loop in __getblk_slow().
Copying files smaller than 1 KB seems to be OK. Sometimes files will be
filled with zeros. Sometimes incorrectly copied file will reappear after
next file with truncated size.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fs/compat.c: In function `compat_sys_pselect7':
fs/compat.c:1820: warning: passing arg 5 of `compat_core_sys_select' from incompatible pointer type
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While asynchronous reads mean a performance improvement in most cases, if
the filesystem assumed that reads are synchronous, then async reads may
degrade performance (filesystem may receive reads out of order, which can
confuse it's own readahead logic).
With sshfs a 1.5 to 4 times slowdown can be measured.
There's also a need for userspace filesystems to know whether asynchronous
reads are supported by the kernel or not.
To achive these, negotiate in the INIT request whether async reads will be
used and the maximum readahead value. Update interface version to 7.6
If userspace uses a version earlier than 7.6, then disable async reads, and
set maximum readahead value to the maximum read size, as done in previous
versions.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An old patch designed to fix http://bugme.osdl.org/show_bug.cgi?id=4497,
"getdents gives empty/random result upon signal".
If smbfs's readdir() is interupted by a signal, smb_readdir() failed to
noticed that and proceeded to treat the unread-into page as valid directory
contents. Fix that up by handling the -ERESTARTSYS.
Thanks to Stian Skjelstad for reporting and testing.
Cc: Stian Skjelstad <stian@nixia.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A recent patch to
Allow run-time selection of NFS versions to export
meant that NO nfsacl service versions were exported. This patch restored
that functionality.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
creation due to ENOSPC. The current solution removes the inode when the
attribute insertion fails. Long term solution would be to make the inode
creation and attribute insertion atomic.
SGI-PV: 947610
SGI-Modid: xfs-linux-melb:xfs-kern:205193a
Signed-off-by: Yingping Lu <yingping@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
typically - header + 13 pages).
Forgetting to set wsize on the mount command costs more than 10% on large
write (can be much more) so this makes a saner default. We still shrink
this default smaller if server can not support it.
Signed-off-by: Steve French <sfrench@us.ibm.com>
the conversion was generated via scripts, and the result was validated
automatically via a script as well.
build and boot tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
The compat layer timeout handling changes in:
9f72949f67
are busted. This is most easily seen with an X application
that uses sub-second select/poll timeout such as emacs. You
hit a key and it takes a second or so before the app responds.
The two ROUND_UP() calls upon entry are using {tv,ts}_sec where it
should instead be using {tv_usec,ts_nsec}, which perfectly explains
the observed incorrect behavior.
Another bug shot down with git bisect.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
EDAC requires a way to scrub memory if an ECC error is found and the chipset
does not do the work automatically. That means rewriting memory locations
atomically with respect to all CPUs _and_ bus masters. That means we can't
use atomic_add(foo, 0) as it gets optimised for non-SMP
This adds a function to include/asm-foo/atomic.h for the platforms currently
supported which implements a scrub of a mapped block.
It also adjusts a few other files include order where atomic.h is included
before types.h as this now causes an error as atomic_scrub uses u32.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following implementation of ppoll() and pselect() system calls
depends on the architecture providing a TIF_RESTORE_SIGMASK flag in the
thread_info.
These system calls have to change the signal mask during their
operation, and signal handlers must be invoked using the new, temporary
signal mask. The old signal mask must be restored either upon successful
exit from the system call, or upon returning from the invoked signal
handler if the system call is interrupted. We can't simply restore the
original signal mask and return to userspace, since the restored signal
mask may actually block the signal which interrupted the system call.
The TIF_RESTORE_SIGMASK flag deals with this by causing the syscall exit
path to trap into do_signal() just as TIF_SIGPENDING does, and by
causing do_signal() to use the saved signal mask instead of the current
signal mask when setting up the stack frame for the signal handler -- or
by causing do_signal() to simply restore the saved signal mask in the
case where there is no handler to be invoked.
The first patch implements the sys_pselect() and sys_ppoll() system
calls, which are present only if TIF_RESTORE_SIGMASK is defined. That
#ifdef should go away in time when all architectures have implemented
it. The second patch implements TIF_RESTORE_SIGMASK for the PowerPC
kernel (in the -mm tree), and the third patch then removes the
arch-specific implementations of sys_rt_sigsuspend() and replaces them
with generic versions using the same trick.
The fourth and fifth patches, provided by David Howells, implement
TIF_RESTORE_SIGMASK for FR-V and i386 respectively, and the sixth patch
adds the syscalls to the i386 syscall table.
This patch:
Add the pselect() and ppoll() system calls, providing core routines usable by
the original select() and poll() system calls and also the new calls (with
their semantics w.r.t timeouts).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is a series of patches which introduce in total 13 new system calls
which take a file descriptor/filename pair instead of a single file
name. These functions, openat etc, have been discussed on numerous
occasions. They are needed to implement race-free filesystem traversal,
they are necessary to implement a virtual per-thread current working
directory (think multi-threaded backup software), etc.
We have in glibc today implementations of the interfaces which use the
/proc/self/fd magic. But this code is rather expensive. Here are some
results (similar to what Jim Meyering posted before).
The test creates a deep directory hierarchy on a tmpfs filesystem. Then
rm -fr is used to remove all directories. Without syscall support I get
this:
real 0m31.921s
user 0m0.688s
sys 0m31.234s
With syscall support the results are much better:
real 0m20.699s
user 0m0.536s
sys 0m20.149s
The interfaces are for obvious reasons currently not much used. But they'll
be used. coreutils (and Jeff's posixutils) are already using them.
Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
them. I expect a patch to make follow soon. Every program which is walking
the filesystem tree will benefit.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ftp.linux.org.uk>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
find_exported_dentry contains two duplicate loops to find an alias that the
acceptable callback likes. Split this out to a new helper and switch from
list_for_each to list_for_each_entry to make it more readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A recent patch which checked the return status of vfs_getattr in nfsd,
completely missed the nfsproc.c (NFSv2) part. Here is it.
This patch moved the call to vfs_getattr from the xdr encoding (at which point
it is too late to return an error) to the call handling. This means several
calls to vfs_getattr are needed in nfsproc.c. Many are encapsulated in
nfsd_return_attrs and nfsd_return_dirop.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nfsd_sync* return an errno, which usually needs to be converted to an errno,
sometimes immediately, sometimes a little later.
Also, nfsd_setattr returns an nfserr which SHOULDN'T be converted from
an errno (because it isn't one).
Also some tidyups of the form:
err = XX
err = nfserrno(err)
and
err = XX
if (err)
err = nfserrno(err)
become
err = nfserrno(XX)
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
missing nfserrno() in default case of a switch by return value of
posix_lock_file(); as the result we send negative host-endian to clients that
expect positive network-endian, preferably mentioned in RFC... BTW, that case
is not impossible - posix_lock_file() can return -ENOLCK and we do not handle
that one explicitly.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
->rp_status is network-endian and nobody byteswaps it before sending to
client; putting NFSERR_SERVERFAULT instead of nfserr_serverfault in there is
not nice...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-EINVAL (in host order, no less) is not a good thing to return to client.
nfsd4_truncate() returns it in one case and its callers expect nfs_.... from
it. AFAICS, it should be nfserr_inval
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Several failure exits return -E<something> instead of nfserr_<something> and
vice versa.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up some unnecessary special-casing in the setattr code..
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix bug in rdattr_error return which causes correct error code to be
overwritten by nfserr_toosmall.
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Bad bookkeeping of the share reservations when handling open upgrades was
causing open downgrade to fail.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In an earlier patch (commit b648330a1d) I noted
that a too-early grace-period check was preventing us from bumping the
sequence id on open. Unfortunately in that patch I stupidly moved the
grace-period check back too far, so now an open for create can succesfully
create the file while still returning ERR_GRACE.
The correct place for that check is after we've set the open_owner and handled
any replays, but before we actually start mucking with the filesystem.
Thanks to Avishay Traeger for reporting the bug.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nfsd4_process_open1 is very highly nested; flatten it out a bit.
Also, the preceding comment, which just outlines the logic, seems redundant.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove some goto's that made the logic here a little more tortuous than
necessary.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We shouldn't check for replays until after checking whether the open owner is
confirmed. Clients are allowed to reuse openowners without bumping the seqid.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We need to make sure open reclaims are marked confirmed immediately so that we
can handle replays even if they fail (e.g. with a seqid-incrementing error).
(See 8.1.8.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make sure we get a directory when we look up the recovery directory.
Thanks to Christoph Hellwig for the bug report.
Based on feedback from Christoph and others, we may remove the need for this
lookup and just pass in a file descriptor from userspace instead, and/or
completely move the directory handling to userspace. For now we're just
fixing the obvious bugs.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We should be opening this directory RDONLY, not RDWR.
Thanks to Christoph Hellwig for the bug report.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Simple, useful debugging printk: print the number of each op as we process it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix some bad logic.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's confusing having both release_stateowner() and release_state_owner().
And as it turns out, release_state_owner() is short and only called from one
place; so just remove it.
Also note the confirmed check is superfluous there--preprocess_seqid_op
already check this.
And remove a redundant comment and a superfluous line assignment while we're
at it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
One of the things that's confusing about nfsd4_lock is that the lk_stateowner
field could be set to either of two different lockowners: the open owner or
the lock owner. Rename to lk_replay_owner and add a comment to make it clear
that it's used for whichever stateowner has its sequence id bumped for replay
detection.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
release_state_owner also puts the lock owner on the close_lru. There's no
need for that, though; replays of the failed lock would be handled by the
openowner not the lockowner.
Also consolidate the cleanup a bit, fixing leaks that can happen if errors
occur between the time a new lock owner is allocated and the lock is done.
Remove a comment and dprintk that look a little redundant.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Logic fixes for LOCK and UNLOCK.
- Move the permission check on the current file handle outside of
nfs4_lock_state()
- remove the file manager fl_release_private calls; fl_ops is not set.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These are both called from two places close together. I could rearrange that
code so there is only one call site, but just removing the 'inline' is
probably best.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change nfsd_sync_dir to return an error if ->sync fails, and pass that error
up through the stack. This involves a number of rearrangements of error
paths, and care to distinguish between Linux -errno numbers and NFSERR
numbers.
In the 'create' routines, we continue with the 'setattr' even if a previous
sync_dir failed.
This patch is quite different from Takashi's in a few ways, but there is still
a strong lineage.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set the correct type and creator for symlinks.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
HFS+ also requires the correct creation date so recent version of OS X
recognize it as link.
Improve link handling:
- if something is wrong with the link, ignore the link attribute and treat
it as regular file (this also fixes a missing unlock during lookup).
- check for incorrect link counts during unlink.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Read the correct ctime from disk (it was written but never read for some
reason). Read also creation date, which is used in the next patch. (Problem
found by Olivier Castan <olivier.castan@certa.ssi.gouv.fr>)
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for HFSX, which allows for case-sensitive filenames.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the log level and a "hfs: " prefix to all kernel prints.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the log level and a "hfs: " prefix to all kernel prints. (HFS and HFS+
will use the same prefix, as they share some code and could be merged at some
point.)
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All standard system calls should be declared in include/linux/syscalls.h.
Add some of the new additions that were previously missed.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
v9fs mmap support was originally removed from v9fs at Al Viro's request,
but recently there have been requests from folks who want readpage
functionality (primarily to enable execution of files mounted via 9P).
This patch adds readpage support (but not writepage which contained most of
the objectionable code). It passes fsx-linux (and other regressions) so it
should be relatively safe.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We have to check that also the second checkpoint list is non-empty before
dropping the transaction.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While checkpointing we have to check that our transaction still is in the
checkpoint list *and* (not or) that it's not just a different transaction
with the same address.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
a page while we are still submitting other buffers on the same page for
I/O.
SGI-PV: 948197
SGI-Modid: xfs-linux-melb:xfs-kern:25004a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
Fix race in setting bitfields of fuse_conn. Spotted by Andrew Morton.
The two fields ->connected and ->mounted were always changed with the
fuse_lock held. But other bitfields in the same structure were changed
without the lock. In theory this could lead to losing the assignment of
even the ones under lock. The chosen solution is to change these two
fields to be a full unsigned type. The other bitfields aren't "important"
enough to warrant the extra complexity of full locking or changing them to
bitops.
For all bitfields document why they are safe wrt. concurrent
assignments.
Also make the initialization of the 'num_waiting' atomic counter explicit.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch changes fuse_readpages() to send READ requests asynchronously.
This makes it possible for userspace filesystems to utilize the kernel
readahead logic instead of having to implement their own (resulting in double
caching).
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a separate function for filling in the READ request. This will make it
possible to send asynchronous READ requests as well as synchronous ones.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now the INIT requests can be completely handled in inode.c and the
fuse_send_init() function need not be global any more.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add possibility for requests to run asynchronously and call an 'end' callback
when finished.
With this, the special handling of the INIT and RELEASE requests can be
cleaned up too.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add ability to abort a filesystem connection.
With the introduction of asynchronous reads, the ability to interrupt any
request is not enough to dissolve deadlocks, since now waiting for the request
completion (page unlocked) is independent of the actual request, so in a
deadlock all threads will be uninterruptible.
The solution is to make it possible to abort all requests, even those
currently undergoing I/O to/from userspace. The natural interface for this is
'mount -f mountpoint', but that only works as long as the filesystem is
attached. So also add an 'abort' attribute to the sysfs view of the
connection.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the 'waiting' attribute which indicates how many filesystem
requests are currently waiting to be completed. A non-zero value without any
filesystem activity indicates a hung or deadlocked filesystem.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kobjectify fuse_conn, and make it visible under /sys/fs/fuse/connections.
Lacking any natural naming, connections are numbered.
This patch doesn't add any attributes, just the infrastructure.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The ->connected flag for a fuse_conn object previously only indicated whether
the device file for this connection is currently open or not.
Change it's meaning so that it indicates whether the connection is active or
not: now either umount or device release will clear the flag.
The separate ->mounted flag is still needed for handling background requests.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create a new list for requests in the process of being transfered to/from
userspace. This will be needed to be able to abort all requests even those
currently under I/O
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The state of request was made up of 2 bitfields (->sent and ->finished) and of
the fact that the request was on a list or not.
Unify this into a single state field.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Inline keyword is unnecessary in most cases. Clean them up.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Handle the case when the INIT request is answered with an error.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This function used the request object after decrementing its reference count
and releasing the lock. This could in theory lead to all sorts of problems.
Fix and simplify at the same time.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fuse_copy_finish() must be called before request_end(), since the later might
sleep, and no sleeping is allowed between fuse_copy_one() and
fuse_copy_finish() because of kmap_atomic()/kunmap_atomic() used in them.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The Rio Karma portable MP3 player has its own proprietary partition table.
The partition layout is similar to a DOS boot sector but it begins at a
different offset and uses a different magic number (0xAB56 instead of
0xAA55). Add support for it to enable mounting the device.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
like other routines here, to ensure buffers are correctly initialised
with respect to b_private/b_end_io. Fixes an odd interaction between
XFS and reiserfs.
Signed-off-by: Nathan Scott <nathans@sgi.com>
Mark a few VFS functions as mandatory inline (based on Al Viro's request);
these must be inline due to stack usage issues during a recursive loop that
happens during the recursive symlink resolution (symlink to a symlink to a
symlink ..)
This patch at this point does not change behavior and is for documentation
purposes only (but this changes later in the series)
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the remaining kmalloc() wrapper bits from fs/smbfs/.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fs/quota_v2.c can, under some conditions, issue a kernel message that says,
in totality, 'failed read'. This patch does the following:
1) Gives a hint who issued the error message, so people reading the logs
don't have to go grepping the entire kernel tree (with 11 false
positives).
2) Say what amount of data we expected, and actually got.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove redundant NULL check in reiserfs_lookup() as d_splice_alias() can take
NULL inode as input.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove redundant NULL check in isofs_lookup() as d_splice_alias() can take
NULL inode as input.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove redundant NULL check in ext3_lookup() as d_splice_alias() can take NULL
inode as input.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove redundant NULL check in ext2_lookup() as d_splice_alias() can take NULL
inode as input.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Anything that writes into a tmpfs filesystem is liable to disproportionately
decrease the available memory on a particular node. Since there's no telling
what sort of application (e.g. dd/cp/cat) might be dropping large files
there, this lets the admin choose the appropriate default behavior for their
site's situation.
Introduce a tmpfs mount option which allows specifying a memory policy and
a second option to specify the nodelist for that policy. With the default
policy, tmpfs will behave as it does today. This patch adds support for
preferred, bind, and interleave policies.
The default policy will cause pages to be added to tmpfs files on the node
which is doing the writing. Some jobs expect a single process to create
and manage the tmpfs files. This results in a node which has a
significantly reduced number of free pages.
With this patch, the administrator can specify the policy and nodes for
that policy where they would prefer allocations.
This patch was originally written by Brent Casavant and Hugh Dickins. I
added support for the bind and preferred policies and the mpol_nodelist
mount option.
Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A Christoph suggested that the /proc/devices file be converted to use the
seq_file interface. This patch does that.
I've obxerved one or two installation that had sufficiently large sans that
they overran the 4k limit on /proc/devices.
Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I tried to send the forcedeth maintainer an email, but it came back with:
"The mail address manfreds@colorfullife.com is not read anymore.
Please resent your mail to manfred@ instead of manfreds@."
This patch fixes this.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Here is update of ufs cleanup patch, brought on by the recently fixed
ubh_get_usb_second() bug that made some ugly code rather painfully
obvious. It also includes
- fix compilation warnings which appears if debug mode turn on
- remove unnecessary duplication of code to support UFS2
I tested it on ufs1 and ufs2 file-systems.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix Samba bugzilla bug 3301
In share mode encrypted password must be sent on tree connection (in our
case only the NTLM password is sent, not the older LANMAN one).
Signed-off-by: Steve French <sfrench@us.ibm.com>
There's a lack of parenthesis in fs/ufs/utils.h, so instead of the 512th
byte of buffer, the usb2 pointer will point to the nth structure of type
ufs_super_block_second.
This can cause a mount-time oops if you're unlucky (especially with
DEBUG_PAGEALLOC, which is how Alexey Dobriyan saw this problem)
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support to the proc_device_tree file for removing
and updating properties. Remove just removes the
proc file, update changes the data pointer within
the proc file. The remainder of the device-tree
changes occur elsewhere.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
from server when mount forcedirectio.
Allowing update of file size with non forcedirectio mounts should be
allowed in the fiture but requires carefully writing out the
last page in the local file if it is a partial page in order to
avoid corruption and careful serialization
Thanks to Maximiliano Curia who suggested similar changes and provided
a testcase.
Signed-off-by: Steve French <sfrench@us.ibm.com>
fs: Use <linux/capability.h> where capable() is used.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Move capable() from sched.h to capability.h;
- Use <linux/capability.h> where capable() is used
(in include/, block/, ipc/, kernel/, a few drivers/,
mm/, security/, & sound/;
many more drivers/ to go)
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sometimes we call do_journal_end() with t_refcount == 0. If quota is
turned on and we happen to have some inode with preallocation bad things
happen as we try to use the current handle for quota operations. Checks
for t_refcount in journal_begin() fail and we Oops. We raise t_refcount to
make those checks happy. We should not cause any bad as all the needed
quota blocks should be already attached to the transaction (they were
attached to the transaction when we allocated those preallocation blocks).
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o fs/proc/vmcore.c compilation gives warnings on ppc64. The reason being
that u64 is defined as unsigned long hence u64* is not same as loff_t*
and compiler cribs.
o Changed the parameter type to u64* instead of loff_t* to resolve the
conflict.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
get more useful error info on space for trans items
SGI-PV: 947110
SGI-Modid: xfs-linux-melb:xfs-kern:24886a
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
extracting the work from its queue. In addition, this processing also
decrement the inode's i_count. If there are any remaining works in queue
before this process terminates, we have unbalanced increment and decrement
of i_count. Thus it can cause assertion failure of vn_count. The fix
allows xyssyncd to process any remaining work before it is shutdown.
SGI-PV: 945935
SGI-Modid: xfs-linux-melb:xfs-kern:203970a
Signed-off-by: Yingping Lu <yingping@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
trylock and deal with block layer congestion properly. Patch from David
Chinner and Christoph Hellwig.
SGI-PV: 947118
SGI-Modid: xfs-linux-melb:xfs-kern:203830a
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
performances on rewrites since we can reduce the number of allocator
calls.
SGI-PV: 947118
SGI-Modid: xfs-linux-melb:xfs-kern:203829a
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
case is very similar to delayed and unwritten extends. Reorganize the code
to share some code for these cases.
SGI-PV: 947118
SGI-Modid: xfs-linux-melb:xfs-kern:203827a
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
handling offets From David Chinner and Christoph Hellwig
SGI-PV: 947118
SGI-Modid: xfs-linux-melb:xfs-kern:203826a
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
page and a relative offset into that page around, and returns the current
xfs_iomap_t if the block at the specified offset fits into it, or a NULL
pointer otherwise. This patch passed the full 64bit offset into the inode
that all callers have anyway, and changes the return value to a simple
boolean. Also the function gets a more descriptive name: xfs_iomap_valid.
SGI-PV: 947118
SGI-Modid: xfs-linux-melb:xfs-kern:203825a
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
allows us to submit much larger I/Os instead of sending down lots of small
buffer_heads. To do this we need to have a rather complicated I/O
submission and completion tracking infrastructure. Part of the latter has
been merged already a long time ago for direct I/O support. Part of the
problem is that we need to track sub-pagesize regions and for that we
still need buffer_heads for the time beeing. Long-term I hope we can move
to better data strucutures and/or maybe move this to fs/mpage.c instead of
having it in XFS. Original patch from Nathan Scott with various updates
from David Chinner and Christoph Hellwig.
SGI-PV: 947118
SGI-Modid: xfs-linux-melb:xfs-kern:203822a
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
was caused by ENOSPC but not Rreclaimed by xfs_release or xfs_inactive.
The fix changed the condition in xfs_release and xfs_inactive to invoke
xfs_inactive_free_eofblocks for this special case, changed
xfs_inactive_free_eofblocks to clean the delayed blks after eof. It also
changed xfs_write to set correct eof when ENOSPC occurs.
SGI-PV: 946267
SGI-Modid: xfs-linux-melb:xfs-kern:203788a
Signed-off-by: Yingping Lu <yingping@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
contention between filesystems and prevent deadlocks between filesystems
when a flush dependency exists between them.
SGI-PV: 947098
SGI-Modid: xfs-linux-melb:xfs-kern:24844a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
it needs to back out the inode creation. Tested by xfs_tests/077.
SGI-PV: 930841
SGI-Modid: xfs-linux-melb:xfs-kern:24842a
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
updates and only sync back to the xfs inode when nessecary
SGI-PV: 946679
SGI-Modid: xfs-linux-melb:xfs-kern:203362a
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
incorrectly limit people using this interface to size IO buffers.
SGI-PV: 910890
SGI-Modid: xfs-linux-melb:xfs-kern:24657a
Signed-off-by: Nathan Scott <nathans@sgi.com>
The assertion failure came from XFS QA41. The fix is done by enabling
truncate for delayed block in xfs_inactive.
SGI-PV: 945412
SGI-Modid: xfs-linux-melb:xfs-kern:202521a
Signed-off-by: Yingping Lu <yingping@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
well. Also provides a mechanism for inheriting this property from the
parent directory for new files.
SGI-PV: 945264
SGI-Modid: xfs-linux-melb:xfs-kern:24367a
Signed-off-by: Nathan Scott <nathans@sgi.com>
This is a small patch that adds loglevel to a printk in
fs/afs/cmservice.c
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Patch adds a mising printk loglevel (I think KERN_WARNING is appropriate
here) in fs/binfmt_elf.c, and while I was there I made some tiny tiny tiny
adjustments to whitespacing in the neighborhood.
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
This memset() line was indented with seven spaces, this patch fixes
it to use a tab instead. Yes, very trivial but it's the third time
I have to look at this line..
Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Modify the initial trace output (which is based on flags in the binary
header) so that it is not done until after the magic number check. This
may well not be a flat format binary, so the flags could be invalid.
(Prime example, running a script).
Changes prompted by patches from Stuart Hughs.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes the hfsplus_inode_check() debug function.
It also removes the now obsolete last_inode_cnt and inode_cnt from struct
hfsplus_sb_info.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the following cleanups:
- there's no need for ext3_count_free() #ifndef EXT3FS_DEBUG
- having prototypes for ext3_count_free() in two different headers is
nonsense
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's no need for ext2_count_free() #ifndef EXT2FS_DEBUG.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ICC likes to complain about storage class not being first, GCC doesn't
care much (except for cases like "inline static").
have a hard time seeing how it could break anything.
Thanks to Gabriel A. Devenyi for pointing out
http://linuxicc.sourceforge.net/ which is what made me create this patch.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If we have found aliased dentry that we return, inode reference is not
dropped and inode is not attached anywhere, so it seems the reference to
inode is leaked in that case.
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>,
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These days ioctl32.h is only used for communication of fs/compat.c and
fs/compat_ioctl.c and doesn't contain anything of interest to drivers.
Remove inclusion in various drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Turn noatime and nodiratime into per-mount instead of per-sb flags.
After all the preparations this is a rather trivial patch. The mount code
needs to treat the two options as per-mount instead of per-superblock, and
touch_atime needs to be changed to check the new MNT_ flags in addition to
the MS_ flags that are kept for filesystems that are always
noatime/nodiratime but not user settable anymore. Besides that core code
only nfs needed an update because it's leaving atime updates to the server
and thus sets the S_NOATIME flag on every inode, but needs to know whether
it's a real noatime mount for an getattr optimization.
While we're at it I've killed the IS_NOATIME/IS_NODIRATIME macros that were
only used by touch_atime.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
MS_NOATIME implies MS_NODIRATIME
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
MS_RDONLU implies not atime updates at all, no need for the MS_NOATIME and
MS_NODIRATIME flags.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that all these entries in the arch ioctl32.c files are gone [1], we can
build fs/compat_ioctl.c as a normal object and kill tons of cruft. We need a
special do_ioctl32_pointer handler for s390 so the compat_ptr call is done.
This is not needed but harmless on all other architectures. Also remove some
superflous includes in fs/compat_ioctl.c
Tested on ppc64.
[1] parisc still had it's PPP handler left, which is not fully correct
for ppp and besides that ppp uses the generic SIOCPRIV ioctl so it'd
kick in for all netdevice users. We can introduce a proper handler
in one of the next patch series by adding a compat_ioctl method to
struct net_device but for now let's just kill it - parisc doesn't
compile in mainline anyway and I don't want this to block this
patchset.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch implements generic handling of RTC_IRQP_READ32, RTC_IRQP_SET32,
RTC_EPOCH_READ32 and RTC_EPOCH_SET32 in fs/compat_ioctl.c. It's based on the
x86_64 code which needed a little massaging to be endian-clean.
parisc used COMPAT_IOCTL or generic w_long handlers for these whichce is wrong
and can't work because the ioctls encode sizeof(unsigned long) in their ioctl
number. parisc also duplicated COMPAT_IOCTL entries for other rtc ioctls
which I remove in this patch, too.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All callers use touch_atime now which takes a vfsmount and allows us to
implement per-mount noatime.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
After my lookup_hash patch ->d_revalidate always gets a valid struct nameidata
passed (unless you use lookup_one_len which autofs4 doesn't), so we can switch
it from update_atime to touch_atime. This is a bit of an academic excercise
because autofs has a 1:1 vfsmount superblock relation, but I want to get rid
of update_atime so filesystems authors can't easily screw up per-mountpoint
noatime support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To allow various options to work per-mount instead of per-sb we need a
struct vfsmount when updating ctime and mtime. This preparation patch
replaces the inode_update_time routine with a file_update_atime routine so
we can easily get at the vfsmount. (and the file makes more sense in this
context anyway). Also get rid of the unused second argument - we always
want to update the ctime when calling this routine.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
remove checks now in the VFS
XFS has an additional xattr interface through obscure ioctl. it requires
raised capabilities but we need to add some read-only/immutable checks anyway
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
remove checks now in the VFS
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Christoph Hellwig <hch@lst.de>
remove checks now in the VFS
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Christoph Hellwig <hch@lst.de>
remove checks now in the VFS
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
remove checks now in the VFS
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Christoph Hellwig <hch@lst.de>
The xattr code has rather complex permission checks because the rules are very
different for different attribute namespaces. This patch moves as much as we
can into the generic code. Currently all the major disk based filesystems
duplicate these checks, while many minor filesystems or network filesystems
lack some or all of them.
To do this we need defines for the extended attribute names in common code, I
moved them up from JFS which had the nicest defintions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add vfs_getxattr, vfs_setxattr and vfs_removexattr helpers for common checks
around invocation of the xattr methods. NFSD already was missing some of the
checks and there will be more soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: James Morris <jmorris@namei.org>
(James, I haven't touched selinux yet because it's doing various odd things
and I'm not sure how it would interact with the security attribute fallbacks
you added. Could you investigate whether it could use vfs_getxattr or if not
add a __vfs_getxattr helper to share the bits it is fine with?)
For NFSv4: instead of just converting it add an nfsd_getxattr helper for the
code shared by NFSv2/3 and NFSv4 ACLs. In fact that code isn't even
NFS-specific, but I'll wait for more users to pop up first before moving it to
common code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I have heard some complaints about people not finding CONFIG_CRASH_DUMP
option and also some objections about its dependency on CONFIG_EMBEDDED.
The following patch ends that dependency. I thought of hiding it under
CONFIG_KEXEC, but CONFIG_PHYSICAL_START could also be used for some reasons
other than kexec/kdump and hence left it visible. I will also update the
documentation accordingly.
o Following patch removes the config dependency of CONFIG_PHYSICAL_START
on CONFIG_EMBEDDED. The reason being CONFIG_CRASH_DUMP option for
kdump needs CONFIG_PHYSICAL_START which makes CONFIG_CRASH_DUMP depend
on CONFIG_EMBEDDED. It is not always obvious for kdump users to choose
CONFIG_EMBEDDED.
o It also shifts the palce where this option appears, to make it closer
to kexec and kdump options.
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Moving the crash_dump.c file to arch dependent part as kmap_atomic_pfn is
specific to i386 and highmem may not exist in other archs.
- Use ioremap for x86_64 to map the previous kernel memory.
- In copy_oldmem_page(), we now directly copy to the user/kernel buffer and
avoid the unneccesary copy to a kmalloc'd page.
Signed-off-by: Rachita Kothiyal <rachita@in.ibm.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Adrian Bunk <bunk@stusta.de>
- create one common dump_thread() prototype in kernel.h
- dump_thread() is only used in fs/binfmt_aout.c and can therefore be
removed on all architectures where CONFIG_BINFMT_AOUT is not
available
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch converts the superblock-lock semaphore to a mutex, affecting
lock_super()/unlock_super(). Tested on ext3 and XFS.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.
Modified-by: Ingo Molnar <mingo@elte.hu>
(finished the conversion)
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch switches XFS over to use the new mutex code directly as
opposed to the previous workaround patch I posted earlier that avoided
the namespace clash by forcing it back to semaphores. This falls in the
'works for me<tm>' category.
Signed-off-by: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In kbuild the file named 'Kbuild' has precedence over the file named
Makefile. Utilise a file named Kbuild to include the 2.6 Makefile for xfs
- since the xfs people likes to keep their arch specific Makefiles separate.
With this patch xfs does no longer rely on the KERNELRELEASE components to be global.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
This moves the 32 bit ioctl compatibility handlers for
Video4Linux into a new file and adds explicit calls to them
to each v4l device driver.
Unfortunately, there does not seem to be any code handling
the v4l2 ioctls, so quite often the code goes through two
separate conversions, first from 32 bit v4l to 64 bit v4l,
and from there to 64 bit v4l2. My patch does not change
that, so there is still much room for improvement.
Also, some drivers have additional ioctl numbers, for
which the conversion should be handled internally to
that driver.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
balamurugan reported a problem where it was possible to have
the various Acorn partition types selected in the configuration,
but ACORN_PARTITION disabled. Since ACORN_PARTITION controls
whether we build fs/partitions/acorn.c, this lead to undefined
references to the adfspart_check_TYPE symbols.
Fix this by making the Acorn partition type symbols depend on
ACORN_PARTITION.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
For SG_IO requests, bio->bi_bdev may not be explicitly initialized. So make
bio_init() clear the field to make sure it's always NULL or valid.
Signed-off-by: Jens Axboe <axboe@suse.de>
Remove the unnecessary __attribute__((packed)) since the enum itself is packed
and not the location of it in the structure.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- remove unnecessary -ENOMEM assignments
- return correct value when buf_check_size for second time in a buffer
- handle failures when create_workqueue and kthread_create are called
- use kzalloc instead of kmalloc/memset 0
- v9fs_str_copy and v9fs_str_compare were buggy, were used only in one
place, correct the logic and move it to the place it is used.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Performance enhancement reducing the number of copies in the data and
stat paths.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
v9fs_create doesn't manage correctly the fids when it is called to create a
directory.. The fid created by the create 9P call (newfid) and the one
created by walking to already created file (wfidno) are not used
consistently.
This patch cleans up the usage of newfid and wfidno.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
New multiplexer implementation. Decreases the number of kernel threads
required. Better handling when the user process receives a signal.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a 9pfs server crashes, v9fs_fd_close() is called. Subsequently, in
cleaning up by performing a umount() on the FS that was provided by this
server v9fs_fd_close() is called again, and uses the old, freed valus of
trans->priv. This patch ensures that trans->priv can be freed only once,
otherwise this function bails early.
Signed-off-by: Michal Ostrowski <mostrows@watson.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a data corruption in smb_proc_setattr_unix()
(smb_filetype_from_mode() returns an u32, and there are only four bytes
reserved for it in data.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Function prototypes belong into header files.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove various things which were checking for gcc-1.x and gcc-2.x compilers.
From: Adrian Bunk <bunk@stusta.de>
Some documentation updates and removes some code paths for gcc < 3.2.
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
"extern inline" doesn't make much sense.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch inlines the single user of struct super_block field
s_old_blocksize and removes the field.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
mmap() returns -EINVAL if given a zero length, and thus elf_map() in
binfmt_elf.c does likewise if it attempts to map a (page-aligned) ELF
segment with zero filesize. Such a situation never arises with the default
linker scripts, but there's nothing inherently wrong with zero-filesize
(but non-zero memsize) ELF segments. Custom linker scripts can generate
them, and the kernel should be able to map them; this patch makes it so.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some long time ago, dentry struct was carefully tuned so that on 32 bits
UP, sizeof(struct dentry) was exactly 128, ie a power of 2, and a multiple
of memory cache lines.
Then RCU was added and dentry struct enlarged by two pointers, with nice
results for SMP, but not so good on UP, because breaking the above tuning
(128 + 8 = 136 bytes)
This patch reverts this unwanted side effect, by using an union (d_u),
where d_rcu and d_child are placed so that these two fields can share their
memory needs.
At the time d_free() is called (and d_rcu is really used), d_child is known
to be empty and not touched by the dentry freeing.
Lockless lookups only access d_name, d_parent, d_lock, d_op, d_flags (so
the previous content of d_child is not needed if said dentry was unhashed
but still accessed by a CPU because of RCU constraints)
As dentry cache easily contains millions of entries, a size reduction is
worth the extra complexity of the ugly C union.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Paul Jackson <pj@sgi.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@epoch.ncsc.mil>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It would be helpful if the kernel did not silently stop parsing
nfs options, but instead warned about any he does not recognize. The
attached patch adds one printk to do just that.
It took me a couple of hours to find my configuration mistake.
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The only user of send_sigio_to_task() already holds tasklist_lock, so it is
better not to send the signal via send_group_sig_info() (which takes
tasklist recursively) but use group_send_sig_info().
The same change in send_sigurg()->send_sigurg_to_task().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are places in the resize code in which EXT3_SB() macro is used after
an statement like sbi = EXT3_SB(sb) is done. Inside the same function,
both sbi and EXT3_SB() are used to reference the super block Altough it is
not wrong, keeping it coherent increases legibility, IMHO.
Signed-off-by: Glauber de Oliveira Costa <glommer@br.ibm.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the trailing newlines in calls to ext3_warning(). This function
already adds a trailing newline to the end of messages.
Signed-off-by: Glauber de Oliveira Costa <glommer@br.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The patch below adds a new mount option to allow the external journal
device to be specified.
The syntax is as follows:
# mount -t ext3 -o journal_dev=0x0820 ...
where 0x0820 means major=8 and minor=32.
Signed-off-by: Johann Lombardi <johann.lombardi@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
__group_complete_signal() sets ->group_stop_count in sig_kernel_coredump()
path and marks the target thread as ->group_exit_task. So any thread
except group_exit_task will go to handle_group_stop()->finish_stop().
However, when group_exit_task actually starts do_coredump(), it sets
SIGNAL_GROUP_EXIT, but does not reset ->group_stop_count while killing
other threads. If we have not yet stopped threads in the same thread
group, they all will spin in kernel mode until group_exit_task sends them
SIGKILL, because ->group_stop_count > 0 means:
recalc_sigpending_tsk() never clears TIF_SIGPENDING
get_signal_to_deliver() goes to handle_group_stop()
handle_group_stop() returns when SIGNAL_GROUP_EXIT set
syscall_exit/resume_userspace notice TIF_SIGPENDING,
call get_signal_to_deliver() again.
So we are wasting cpu cycles, and if one of these threads is rt_task() this
may be a serious problem.
NOTE: do_coredump() holds ->mmap_sem, so not stopped threads can't escape
coredumping after clearing ->group_stop_count.
See also this thread: http://marc.theaimsgroup.com/?t=112739139900002
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We've had two instances recently of overflows when doing
64_bit_value = (32_bit_value << PAGE_CACHE_SHIFT)
I did a tree-wide grep of `<<.*PAGE_CACHE_SHIFT' and this is the result.
- afs_rxfs_fetch_descriptor.offset is of type off_t, which seems broken.
- jfs and jffs are limited to 4GB anyway.
- reiserfs map_block_for_writepage() takes an unsigned long for the block -
it should take sector_t. (It'll fail for huge filesystems with
blocksize<PAGE_CACHE_SIZE)
- cramfs_read() needs to use sector_t (I think cramsfs is busted on large
filesystems anyway)
- affs is limited in file size anyway.
- I generally didn't fix 32-bit overflows in directory operations.
- arm's __flush_dcache_page() is peculiar. What if the page lies beyond 4G?
- gss_wrap_req_priv() needs checking (snd_buf->page_base)
Cc: Oleg Drokin <green@linuxhacker.ru>
Cc: David Howells <dhowells@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: <reiserfs-dev@namesys.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When making an fctl locking call through compat_sys_fcntl64 (i.e. a 32bit
app on a 64bit kernel), the syscall can return a locking range that is in
conflict with the queried lock.
If some aspect of this range does not fit in the 32bit structure, something
needs to be done.
The current code is wrong in several respects:
- It returns data to userspace even if no conflict was found
i.e. it should check l_type for F_UNLCK
- It returns -EOVERFLOW too agressively. A lock range covering
the last possible byte of the file (start = COMPAT_OFF_T_MAX,
len = 1) should be possible, but is rejected with the current test.
- A extra-long 'len' should not be a problem. If only that part
of the conflicting lock that would be visible to the 32bit
app needs to be reported to the 32bit app anyway.
This patch addresses those three issues and adds a comment to (hopefully)
record it for posterity.
Note: this patch mainly affects test-cases. Real applications rarely is
ever see the problems.
This patch has been tested (LSB test suite), and works.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SUS requires that when truncating a file to the size that it currently
is:
truncate and ftruncate should NOT modify ctime or mtime
O_TRUNC SHOULD modify ctime and mtime.
Currently mtime and ctime are always modified on most local
filesystems (side effect of ->truncate) or never modified (on NFS).
With this patch:
ATTR_CTIME|ATTR_MTIME are sent with ATTR_SIZE precisely when
an update of these times is required whether size changes or not
(via a new argument to do_truncate). This allows NFS to do
the right thing for O_TRUNC.
inode_setattr nolonger forces ATTR_MTIME|ATTR_CTIME when the ATTR_SIZE
sets the size to it's current value. This allows local filesystems
to do the right thing for f?truncate.
Also, the logic in inode_setattr is changed a bit so there are two return
points. One returns the error from vmtruncate if it failed, the other
returns 0 (there can be no other failure).
Finally, if vmtruncate succeeds, and ATTR_SIZE is the only change
requested, we now fall-through and mark_inode_dirty. If a filesystem did
not have a ->truncate function, then vmtruncate will have changed i_size,
without marking the inode as 'dirty', and I think this is wrong.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
inode can never be NULL when calling this function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch renames relayfs_file_operations to relay_file_operations, and the
file operations themselves from relayfs_XXX to relay_file_XXX, to make it more
clear that they refer to relay files.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the optional is_global outparam to the create_buf_file()
callback. This can be used by clients to create a single global relayfs
buffer instead of the default per-cpu buffers. This was suggested as being
useful for certain debugging applications where it's more convenient to be
able to get all the data from a single channel without having to go to the
bother of dealing with per-cpu files.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a couple of callback functions that allow a client to hook
into relay_open()/close() and supply the files that will be used to represent
the channel buffers; the default implementation if no callbacks are defined is
to create the files in relayfs. This is to support the creation and use of
relay files in other filesystems such as debugfs, as implied by the fact that
relayfs_file_operations are exported.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since we're no longer using relayfs_inode_info, remove relayfs_alloc_inode()
and relayfs_destroy_inode() along with the relayfs inode cache.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use inode->u.generic_ip instead of relayfs_inode_info to store pointer to user
data. Clients using relayfs_file_create() to create their own files would
probably more expect their data to be stored in generic_ip; we also intend in
the next set of patches to get rid of relayfs-specific stuff in the file
operations, so we might as well do it here.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds and exports relayfs_remove_file(), for API symmetry (with
relayfs_create_file()).
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a mandatory fileops param to relayfs_create_file() and exports
that function so that clients can use it to create files defined by their own
set of file operations, in relayfs. The purpose is to allow relayfs
applications to create their own set of 'control' files alongside their relay
files in relayfs rather than having to create them in /proc or debugfs for
instance. relayfs_create_file() is also used by relay_open_buf() to create
the relay files for a channel. In this case, a pointer to
relayfs_file_operations is passed in, along with a pointer to the buffer
associated with the file.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The patch series implementa or fixes 3 things that were specifically requested
or suggested by relayfs users:
- support for non-relay files (patches 1-6)
Currently, the relayfs API only supports the creation of directories
(relayfs_create_dir()) and relay files (relay_open()). These patches adds
support for non-relay files (relayfs_create_file()). This is so relayfs
applications can create 'control files' in relayfs itself rather than in /proc
or via a netlink channel, as is currently done in the relay-app examples.
Basically what this amounts to is exporting relayfs_create_file() with an
additional file_ops param that clients can use to supply file operations for
their own special-purpose files in relayfs.
- make exported relay file ops useful (patches 7-8)
The relayfs relay_file_operations have always been exported, the intent being
to make it possible to create relay files in other filesystems such as
debugfs. The problem, though, is that currently the file operations are too
tightly coupled to relayfs to actually be used for this purpose. This patch
fixes that by adding a couple of callback functions that allow a client to
hook into relay_open()/close() and supply the files that will be used to
represent the channel buffers; the default implementation if no callbacks are
defined is to create the files in relayfs.
- add an option to create global relay buffer (patches 9-10) The file creation
callback also supplies an optional param, is_global, that can be used by
clients to create a single global relayfs buffer instead of the default
per-cpu buffers. This was suggested as being useful for certain debugging
applications where it's more convenient to be able to get all the data from a
single channel without having to go to the bother of dealing with per-cpu
files.
- cleanup, some renaming and Documentation updates (patches 11-12)
There were several comments that the use of netlink in the example code was
non-intuitive and in fact the whole relay-app business was needlessly
confusing. Based on that feedback, the example code has been completely
converted over to relayfs control files as supported by this patch, and have
also been made completely self-contained.
The converted examples along with a couple of new examples that demonstrate
using exported relay files can be found in relay-apps tarball:
http://prdownloads.sourceforge.net/relayfs/relay-apps-0.9.tar.gz?download
This patch:
Separate buffer create/destroy from inode create/destroy. We want to be able
to associate other data and not just relay buffers with inodes. Buffer
create/destroy is moved out of inode.c and into relayfs core code.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use atomic_inc_not_zero for rcu files instead of special case rcuref.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch add EXPORT_SYMBOL(filemap_write_and_wait) and use it.
See mm/filemap.c:
And changes the filemap_write_and_wait() and filemap_write_and_wait_range().
Current filemap_write_and_wait() doesn't wait if filemap_fdatawrite()
returns error. However, even if filemap_fdatawrite() returned an
error, it may have submitted the partially data pages to the device.
(e.g. in the case of -ENOSPC)
<quotation>
Andrew Morton writes,
If filemap_fdatawrite() returns an error, this might be due to some
I/O problem: dead disk, unplugged cable, etc. Given the generally
crappy quality of the kernel's handling of such exceptions, there's a
good chance that the filemap_fdatawait() will get stuck in D state
forever.
</quotation>
So, this patch doesn't wait if filemap_fdatawrite() returns the -EIO.
Trond, could you please review the nfs part? Especially I'm not sure,
nfs must use the "filemap_fdatawrite(inode->i_mapping) == 0", or not.
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch changes generic_cont_expand(), in order to share the code
with fatfs.
- Use vmtruncate() if ->prepare_write() returns a error.
Even if ->prepare_write() returns an error, it may already have added some
blocks. So, this truncates blocks outside of ->i_size by vmtruncate().
- Add generic_cont_expand_simple().
The generic_cont_expand_simple() assumes that ->prepare_write() can handle
the block boundary. With this, we don't need to care the extra byte.
And for expanding a file size by truncate(), fatfs uses the
added generic_cont_expand_simple().
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch add to support of ->direct_IO() for mostly read.
The user of this seems to want to use for streaming read. So, current direct
I/O has limitation, it can only overwrite. (For write operation, mainly we
need to handle the hole etc..)
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All EXPORT_SYMBOL of fatfs is only for vfat/msdos. _GPL would be proper.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We don't need to allocate buffer for checking the buffer is uptodate. This
use sb_find_get_block() instead, and if it returns NULL it's not uptodate.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is overkill to update the FS_INFO whenever modifying
prev_free/free_clusters, because those are just a hint.
So, this patch uses ->write_super() for updating FS_INFO instead.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked
instead of tasklist_lock read-locked. This is a scalability improvement on
SMP and a preemption-latency improvement under PREEMPT_RCU.
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Suppress configuration of certain features for the FRV arch as they can't be
built for FRV at the moment:
(*) RTC
(*) HISAX_*
(*) PARPORT_PC
(*) VGA_CONSOLE
(*) BINFMT_ELF
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
patch 2ea55c01e0 fixed CIFS clobbering the
global fops structure for some per mount setting, by duplicating and having
2 fops structs. However the write to the fops was left behind, which is a
NOP in practice (due to the fact that we KNOW the fops has that field set
to NULL already due to the duplication). So remove it... In addition, another
instance of the same bug was forgotten in november.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
First discussed at http://marc.theaimsgroup.com/?t=113149255100001&r=1&w=2
- Use the check_range() in mempolicy.c to gather statistics.
- Improve the numa_maps code in general and fix some comments.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to
discard as much pagecache and/or reclaimable slab objects as it can. THis
operation requires root permissions.
It won't drop dirty data, so the user should run `sync' first.
Caveats:
a) Holds inode_lock for exorbitant amounts of time.
b) Needs to be taught about NUMA nodes: propagate these all the way through
so the discarding can be controlled on a per-node basis.
This is a debugging feature: useful for getting consistent results between
filesystem benchmarks. We could possibly put it under a config option, but
it's less than 300 bytes.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch should fix compilation failure of fs/ufs/dir.c with defined UFS_DIR_DEBUG
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If someone changes the uid/gid mapping in userland, then we do eventually
want those changes to be propagated to the kernel. Currently the kernel
assumes that it may cache entries forever.
Add an expiration time + garbage collector for idmap entries.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
inode->i_mode contains a lot more than just the mode bits. Make sure that
we mask away this extra stuff in SETATTR calls to the server.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Every ULP that uses the in-kernel RPC client, except the NLM
client, sets cl_chatty. There's no reason why NLM shouldn't set it, so
just get rid of cl_chatty and always be verbose.
Test-plan:
Compile with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We'd like to hide fields in rpc_xprt and rpc_clnt from upper layer protocols.
Start by creating an API to force RPC rebind, replacing logic that simply
sets cl_port to zero.
Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Thanks to Ed Keizer for bug and root cause. He says: "... we could only mount
the top-level Solaris share. We could not mount deeper into the tree.
Investigation showed that Solaris allows UNIX authenticated FSINFO only on the
top level of the share. This is a problem because we share/export our home
directories one level higher than we mount them. I.e. we share the partition
and not the individual home directories. This prevented access to home
directories."
We still may need to try auth_sys for the case where the client doesn't have
appropriate credentials.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The procedure that decodes statd sm_notify call seems to be skipping a
few arguments. How did this ever work?
>From folks at Polyserve.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the server receives an NLM cancel call and finds no waiting lock to
cancel, then chances are the lock has already been applied, and the client
just hadn't yet processed the NLM granted callback before it sent the
cancel.
The Open Group text, for example, perimts a server to return either success
(LCK_GRANTED) or failure (LCK_DENIED) in this case. But returning an error
seems more helpful; the client may be able to use it to recognize that a
race has occurred and to recover from the race.
So, modify the relevant functions to return an error in this case.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The fl_next check here is superfluous (and possibly a layering violation).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently when lockd gets an NLM_CANCEL request, it also does an unlock for
the same range. This is incorrect.
The Open Group documentation says that "This procedure cancels an
*outstanding* blocked lock request." (Emphasis mine.)
Also, consider a client that holds a lock on the first byte of a file, and
requests a lock on the entire file. If the client cancels that request
(perhaps because the requesting process is signalled), the server shouldn't
apply perform an unlock on the entire file, since that will also remove the
previous lock that the client was already granted.
Or consider a lock request that actually *downgraded* an exclusive lock to
a shared lock.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Slightly simpler logic here makes it more trivial to verify that the up's
and down's are balanced here. Break out an assignment from a conditional
while we're at it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Upon return of a write delegation, the server will almost always bump the
change attribute. Ensure that we pick up that change so that we don't
invalidate our data cache unnecessarily.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
According to RFC3530 we're supposed to cache the change attribute
at the time the client receives a write delegation.
If the inode is clean, a CB_GETATTR callback by the server to the
client is supposed to return the cached change attribute.
If, OTOH, the inode is dirty, the client should bump the cached
change attribute by 1.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The SuS states that a call to write() will cause mtime to be updated on
the file. In order to satisfy that requirement, we need to flush out
any cached writes in nfs_getattr().
Speed things up slightly by not committing the writes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Most NFS server implementations allow up to 64KB reads and writes on the
wire. The Solaris NFS server allows up to a megabyte, for instance.
Now the Linux NFS client supports transfer sizes up to 1MB, too. This will
help reduce protocol and context switch overhead on read/write intensive NFS
workloads, and support larger atomic read and write operations on servers
that support them.
Test-plan:
Connectathon and iozone on mount point with wsize=rsize>32768 over TCP.
Tests with NFS over UDP to verify the maximum RPC payload size cap.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
To help NFS users and server developers, make the "inode number mismatch"
message display more useful information.
Test-plan:
None.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs_statfs() generates a log message when GETATTR returns an error. This
is usually a useless message. Make it a dprintk.
Test plan:
None
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Red Hat found a problem in the error recovery logic in __init_nfs.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace ad hoc write parameter sanity checking in nfs_file_direct_write()
with a call to generic_write_checks(). This should make the proper checks
modulo the O_LARGEFILE flag, and should catch NFSv2-specific limitations by
virtue of i_sb->s_maxbytes.
Test plan:
Posix compliance testing with both NFSv2 and NFSv3.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In RFC3530, the RENEW operation is allowed to use either
the same principal, RPC security flavour and (if RPCSEC_GSS), the same
mechanism and service that was used for SETCLIENTID_CONFIRM
OR
Any principal, RPC security flavour and service combination that
currently has an OPEN file on the server.
Choose the latter since that doesn't require us to keep credentials for
the same principal for the entire duration of the mount.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Convert private implementations in NFSv4 state recovery and delegation
code to use kthreads.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
...and make sure that the "intr" flag also enables SIGHUP and SIGTERM to
interrupt RPC calls too (as per the Solaris implementation).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When recovering from a delegation recall or a network partition, we need
to replay open(O_RDWR), open(O_RDONLY) and open(O_WRONLY) separately.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A closer reading of RFC3530 reveals that OPEN_DOWNGRADE must always
specify a access modes that have been the argument of a previous OPEN
operation.
IOW: doing OPEN(O_RDWR) and then OPEN_DOWNGRADE(O_WRONLY) is forbidden
unless the user called OPEN(O_WRONLY)
In order to fix that, we really need to track the three possible open
states separately.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
OPEN is a stateful operation, so we must ensure that it always
completes. In order to allow users to interrupt the operation,
we need to make the RPC call asynchronous, and then wait on
completion (or cancel).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The NFSv4 model requires us to complete all RPC calls that might
establish state on the server whether or not the user wants to
interrupt it. We may also need to schedule new work (including
new RPC calls) in order to cancel the new state.
The asynchronous RPC model will allow us to ensure that RPC calls
always complete, but in order to allow for "synchronous" RPC, we
want to add the ability to wait for completion.
The waits are, of course, interruptible.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Shrink the RPC task structure. Instead of storing separate pointers
for task->tk_exit and task->tk_release, put them in a structure.
Also pass the user data pointer as a parameter instead of passing it via
task->tk_calldata. This enables us to nest callbacks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that we always initiate flushing of data before we exit
a single-page ->writepage() call.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A typical nfsd call trace is
nfsd -> svc_process -> nfsd_dispatch -> nfsd3_proc_write ->
nfsd_write ->nfsd_vfs_write -> vfs_writev
These add up to over 300 bytes on the stack.
Looking at each of these, I see that nfsd_write (which includes
nfsd_vfs_write) contributes 0x8c to stack usage itself!!
It turns out this is because it puts a 'struct iattr' on the stack so
it can kill suid if needed. The following patch saves about 50 bytes
off the stack in this call path.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Both vfs_getattr and i_op->fsync return error statuses which nfsd was
largely ignoring. This as noticed when exporting directories using fuse.
This patch cleans up most of the offences, which involves moving the call
to vfs_getattr out of the xdr encoding routines (where it is too late to
report an error) into the main NFS procedure handling routines.
There is still a called to vfs_gettattr (related to the ACL code) where the
status is ignored, and called to nfsd_sync_dir don't check return status
either.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Split the checkpoint list of the transaction into two lists. In the first
list we keep the buffers that need to be submitted for IO. In the second
list are kept buffers that were already submitted and we just have to wait
for the IO to complete. This should simplify a handling of checkpoint
lists a bit and can eventually be also a performance gain.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Previously invalid types were quietly changed to regular files, but at
revalidation the inode was changed to bad. This was rather inconsistent
behavior.
Now check if the type is valid on initial lookup, and return -EIO if not.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In direct_io mode, send at least one page per reqest. Previously it was
possible that reqests with zero data were sent, and hence the read/write
didn't make any progress, resulting in an infinite (though interruptible)
loop.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the maximum size of write data configurable by the filesystem. The
previous fixed 4096 limit only worked on architectures where the page size is
less or equal to this. This change make writing work on other architectures
too, and also lets the filesystem receive bigger write requests in direct_io
mode.
Normal writes which go through the page cache are still limited to a page
sized chunk per request.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the way a too large request is handled. Until now in this case the
device read returned -EINVAL and the operation returned -EIO.
Make it more flexibible by not returning -EINVAL from the read, but restarting
it instead.
Also remove the fixed limit on setxattr data and let the filesystem provide as
large a read buffer as it needs to handle the extended attribute data.
The symbolic link length is already checked by VFS to be less than PATH_MAX,
so the extra check against FUSE_SYMLINK_MAX is not needed.
The check in fuse_create_open() against FUSE_NAME_MAX is not needed, since the
dentry has already been looked up, and hence the name already checked.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make file operations on a bad inode fail. This just makes things a
bit more consistent.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for caching negative dentries.
Up till now, ->d_revalidate() always forced a new lookup on these. Now let
the lookup method return a zero node ID (not used for anything else) meaning a
negative entry, but with a positive cache timeout. The old way of signaling
negative entry (replying ENOENT) still works.
Userspace should check the ABI minor version to see whether sending a zero ID
is allowed by the kernel or not.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add 'frsize' member to the statfs reply.
I'm not sure if sending f_fsid will ever be needed, but just in case leave
some space at the end of the structure, so less compatibility mess would be
required.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change interface version to 7.4.
Following changes will need backward compatibility support, so store the minor
version returned by userspace.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sanitize some s390 Kconfig options. We have ARCH_S390, ARCH_S390X,
ARCH_S390_31, 64BIT, S390_SUPPORT and COMPAT. Replace these 6 options by
S390, 64BIT and COMPAT.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Moved definition of CMS volume label to vtoc.h and modify partitions/ibm.c to
use this volume label definition instead of anonymous array.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch makes ramfs support shared-writable mmaps by:
(1) Attempting to perform a contiguous block allocation to the requested size
when truncate attempts to increase the file from zero size, such as
happens when:
fd = shm_open("/file/on/ramfs", ...):
ftruncate(fd, size_requested);
addr = mmap(NULL, subsize, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED,
fd, offset);
(2) Permitting any shared-writable mapping over any contiguous set of extant
pages. get_unmapped_area() will return the address into the actual ramfs
pages. The mapping may start anywhere and be of any size, but may not go
over the end of file. Multiple mappings may overlap in any way.
(3) Not permitting a file to be shrunk if it would truncate any shared
mappings (private mappings are copied).
Thus this patch provides support for POSIX shared memory on NOMMU kernels,
with certain limitations such as there being a large enough block of pages
available to support the allocation and it only working on directly mappable
filesystems.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Optimise rmap functions by minimising atomic operations when we know there
will be no concurrent modifications.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement copy-on-write support for hugetlb mappings so MAP_PRIVATE can be
supported. This helps us to safely use hugetlb pages in many more
applications. The patch makes the following changes. If needed, I also have
it broken out according to the following paragraphs.
1. Add a pair of functions to set/clear write access on huge ptes. The
writable check in make_huge_pte is moved out to the caller for use by COW
later.
2. Hugetlb copy-on-write requires special case handling in the following
situations:
- copy_hugetlb_page_range() - Copied pages must be write protected so
a COW fault will be triggered (if necessary) if those pages are written
to.
- find_or_alloc_huge_page() - Only MAP_SHARED pages are added to the
page cache. MAP_PRIVATE pages still need to be locked however.
3. Provide hugetlb_cow() and calls from hugetlb_fault() and
hugetlb_no_page() which handles the COW fault by making the actual copy.
4. Remove the check in hugetlbfs_file_map() so that MAP_PRIVATE mmaps
will be allowed. Make MAP_HUGETLB exempt from the depricated VM_RESERVED
mapping check.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nls_utf8 is available, and the check in hfsplus_fill_super checks the wrong
pointer for NULLness (it checks the saved nls, not the new one that it
needs to use.)
Signed-off-by: Joshua Kwan <joshk@triplehelix.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For filesystems with a blocksize < page size, we can merge same page
calls into the bio_vec at the end of the bio. This saves segments
on systems with a page size > the "normal" 4kb fs block size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@suse.de>
In particular, allow over-large read- or write-requests to be downgraded
to a more reasonable range, rather than considering them outright errors.
We want to protect lower layers from (the sadly all too common) overflow
conditions, but prefer to do so by chopping the requests up, rather than
just refusing them outright.
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I noticed that if sysfs_make_dirent fails to allocate the sd, then a
null will be passed to sysfs_put.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Block devices need to add the block device name to the symlink they put
in the device directory, otherwise multiple symlinks of the same name
can be created. This matches the class system, which works the same
way, we just forgot to convert block at the same time.
Cc: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Leave the overloaded "hotplug" word to susbsystems which are handling
real devices. The driver core does not "plug" anything, it just exports
the state to userspace and generates events.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The names of these events have been confusing from the beginning
on, as they have been more like claim/release events. We needed these
events for noticing HAL if storage devices have been mounted.
Thanks to Al, we have the proper solution now and can poll()
/proc/mounts instead to get notfied about mount tree changes.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.
Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- cluster/sys.c: make needlessly global code static
- dlm/: "extern" declarations for variables belong into header files
(and in this case, they are already in dlmdomain.h)
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Link the code into the kernel build system. OCFS2 is marked as
experimental.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
dlmfs: A minimal dlm userspace interface implemented via a virtual
file system.
Most of the OCFS2 tools make use of this to take cluster locks when
doing operations on the file system.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
A distributed lock manager built with the cluster file system use case
in mind. The OCFS2 dlm exposes a VMS style API, though things have
been simplified internally. The only lock levels implemented currently
are NLMODE, PRMODE and EXMODE.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Node messaging via tcp. Used by the dlm and the file system for point
to point communication between nodes.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Disk based heartbeat. Configured and started from userspace, the
kernel component handles I/O submission and event generation via
callback mechanism.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
A simple node information service, filled and updated from
userspace. The rest of the stack queries this service for simple node
information.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Very simple printk wrapper which adds the ability to enable various
sets of debug messages at run-time.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
readpage(), prepare_write(), and commit_write() callers are updated to
understand the special return code AOP_TRUNCATED_PAGE in the style of
writepage() and WRITEPAGE_ACTIVATE. AOP_TRUNCATED_PAGE tells the caller that
the callee has unlocked the page and that the operation should be tried again
with a new page. OCFS2 uses this to detect and work around a lock inversion in
its aop methods. There should be no change in behaviour for methods that don't
return AOP_TRUNCATED_PAGE.
WRITEPAGE_ACTIVATE is also prepended with AOP_ for consistency and they are
made enums so that kerneldoc can be used to document their semantics.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Configfs, a file system for userspace-driven kernel object configuration.
The OCFS2 stack makes extensive use of this for propagation of cluster
configuration information into kernel.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
This patch removes all references to the bouncing address
rddunlap@osdl.org and one dead web page from the kernel.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
The old /proc interfaces were never updated to use loff_t, and are just
generally broken. Now, we should be using the seq_file interface for
all of the proc files, but converting the legacy functions is more work
than most people care for and has little upside..
But at least we can make the non-LFS rules explicit, rather than just
insanely wrapping the offset or something.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Prevent page->index << PAGE_CACHE_SHIFT from overflowing.
There is a casting there, but was added without care, so it's at the wrong
place. Note the extra parens around the shift - "+" is higher precedence than
"<<", leading to a GCC warning which saved all us.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trivial removal of unused variable from this file - doesn't even change the
generated assembly code, in fact (gcc should trigger a warning for unused value
here).
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce a Kconfig symbol SPARC that is defined on both the sparc and
sparc64 architectures.
This symbol makes some dependencies more readable.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFS client prevents mandatory lock, but there is a flaw on it; Locks are
possibly left if the mode is changed while locking.
This permits unlocking even if the mandatory lock bits are set.
Signed-off-by: ASANO Masahiro <masano@tnes.nec.co.jp>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's currently a diagnostic printk in relay_switch_subbuf() meant as
a warning if you accidentally try to log an event larger than the
sub-buffer size.
The problem is if this happens while logging from somewhere it's not
safe to be doing printks, such as in the scheduler, you can end up with
a deadlock. This patch removes the warning from relay_switch_subbuf()
and instead prints some diagnostic info when the channel is closed.
Thanks to Mathieu Desnoyers for pointing out the problem and
suggesting a fix.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We must check for MAY_SATTR before setting acls, which includes checking
for read-only exports: the lower-level setxattr operation that
eventually sets the acl cannot check export-level restrictions.
Bug reported by Martin Walter <mawa@uni-freiburg.de>.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When mixing -olock and -onolock mounts on the same client, we have to
check that fl->fl_u.nfs_fl.owner is set before dereferencing it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure we call unmap_mapping_range() and sync dirty pages to disk before
doing an NFS direct write.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This merge is pretty extensive. The conflict is over the new
req->retries parameter, so I had to change the prototype to
scsi_setup_blk_pc_cmnd() and the usage in sd, sr and st.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
- export __blk_put_request and blk_execute_rq_nowait
needed for async REQ_BLOCK_PC requests
- seperate max_hw_sectors and max_sectors for block/scsi_ioctl.c and
SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c SG_IO was
already testing against max_sectors and SCSI-ml was setting max_sectors and
max_hw_sectors to the same value this does not change any scsi SG_IO behavior. It only
prepares ll_rw_blk.c, scsi_ioctl.c and bio.c for when SCSI-ml begins to set
a valid max_hw_sectors for all LLDs. Today if a LLD does not set it
SCSI-ml sets it to a safe default and some LLDs set it to a artificial low
value to overcome memory and feedback issues.
Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024,
drivers that used to call blk_queue_max_sectors with a large value of
max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add scsi helpers to create really-large-requests and convert
scsi-ml to scsi_execute_async().
Per Jens's previous comments, I placed this function in scsi_lib.c.
I made it follow all the queue's limits - I think I did at least :), so
I removed the warning on the function header.
I think the scsi_execute_* functions should eventually take a request_queue
and be placed some place where the dm-multipath hw_handler can use them
if that failover code is going to stay in the kernel. That conversion
patch will be sent in another mail though.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The following patch fixes a bug where if the journal is aborted, it can
leave a transaction open. The result will be a BUG when another code
path attempts to start a transaction and will get a "nesting into
different fs" error, since current->journal_info will be left non-NULL.
Original fix against SUSE kernel by Chris Mason <mason@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This should have been part of the original io error patch, but got
dropped somewhere along the way.
It's extremely important when handling the i/o error in the journal to
not commit the transaction with corrupt data. This patch adds that code
back in.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The below patch lets userspace have more control over the inodes that
inotify will watch. It introduces two new flags.
IN_ONLYDIR -- only watch the inode if it is a directory.
This is needed to avoid the race that can occur when we want to be
sure that we are watching a directory.
IN_DONT_FOLLOW -- don't follow a symlink. In combination
with IN_ONLYDIR we can make sure that we don't watch the target of
symlinks.
The issues the flags fix came up when writing the gnome-vfs inotify
backend. Default behaviour is unchanged.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Acked-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit f549d6c18c introduced a generic
fallback for security xattrs, but appears to include a subtle bug.
Gentoo users with kernels with selinux compiled in, and coreutils compiled
with acl support, noticed that they could not copy files on tmpfs using
'cp'.
cp (compiled with acl support) copies the file, lists the extended
attributes on the old file, copies them all to the new file, and then
exits. However the listxattr() calls were failing with this odd behaviour:
llistxattr("a.out", (nil), 0) = 17
llistxattr("a.out", 0x7fffff8c6cb0, 17) = -1 ERANGE (Numerical result out of
range)
I believe this is a simple problem in the logic used to check the buffer
sizes; if the user sends a buffer the exact size of the data, then its ok
:)
This change solves the problem.
More info can be found at http://bugs.gentoo.org/113138
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Missing initialisation of attribute bitmask in _nfs4_proc_write()
- On success, _nfs4_proc_write() must return number of bytes written.
- Missing post_op_update_inode() in _nfs4_proc_write()
- Missing initialisation of attribute bitmask in _nfs4_proc_commit()
- Missing post_op_update_inode() in _nfs4_proc_commit()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that we use set_page_writeback() in the appropriate places
to help the VM in keeping its page radix_tree in sync.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Steve Dickson writes:
Doing the following:
1. On server:
$ mkdir ~/t
$ echo Hello > ~/t/tmp
2. On client, wait for a string to appear in this file:
$ until grep -q foo t/tmp ; do echo -n . ; sleep 1 ; done
3. On server, create a *new* file with the same name containing that
string:
$ mv ~/t/tmp ~/t/tmp.old; echo foo > ~/t/tmp
will show how the client will never (and I mean never ;-) ) see
the updated file.
The problem is that we do not update nfsi->cache_change_attribute when the
file changes on the server (we only update it when our client makes the
changes). This again means that functions like nfs_check_verifier() will
fail to register when the parent directory has changed and should trigger
a dentry lookup revalidation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make sure cache_change_attribute is initialized to jiffies
so when the mtime changes on directory, the directory
will be refreshed.
Signed-off by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
assembling smb requests when setuids and Linux protocol extensions enabled
and in checking more matching sessions in multiuser mount mode.
Pointed out by Shaggy.
Signed-off-by: Steve French <sfrench@us.ibm.com>
the request queue. Also periodically wakeup response_q so threads can
check if stuck requests have timed out. Workaround Windows server illegal smb
length on transact2 findfirst response.
Signed-off-by: Steve French <sfrench@us.ibm.com>
disabled. Also set mode, uid, gid better on mkdir and create for the
case when Unix Extensions is not enabled and setuids is enabled. This is
necessary to fix the hole in which chown could be allowed for non-root
users in some cases if root mounted, and also to display the mode and uid
properly in some cases.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Access to a journaled HFS+ volume is not officially supported under Linux, so
mount such a volume read-only, but users can override this behaviour using the
"force" mount option.
The minimum requirement to relax this check is to at least check that the
journal is empty and so nothing needs to be replayed to make sure the volume
is consistent.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If an external device is used for a journal, by default it will use the
entire device. The reiserfs journal code allocates structures per journal
block when it mounts the file system. If the journal device is too large,
and memory cannot be allocated for the structures, it will continue and
ultimately panic when it can't pull one off the free list.
This patch handles the allocation failure gracefully and prints an error
message at mount time.
Changes: Updated error message to be more descriptive to the user.
Discussed and approved on ReiserFS Mailing List, Nov 28.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
JFFS2 initialize f->sem mutex as "locked" in the slab constructor which is a
bug. Objects are freed with unlocked f->sem mutex. So, when they allocated
again, f->sem is unlocked because the slab cache constructor is not called for
them. The constructor is called only once when memory pages are allocated for
objects (namely, when the slab layer allocates new slabs). So, sometimes
'struct jffs2_inode_info' are allocated with unlocked f->sem, sometimes with
locked. This is a bug. Instead, initialize f->sem as unlocked in the
constructor. I.e., in the "constructed" state f->sem must be unlocked.
From: Keijiro Yano <keijiro_yano@yahoo.co.jp>
Acked-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Check for invalid node ID values in the new atomic create+open method.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Check the created directory inode for aliases in the mkdir() method.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When quota file specified in mount options did not exist, we tried to
dereference NULL pointer later. Fix it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Assign the appropriate dentry operations to the dentry. Fixes memory leak.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch corrects the return value for the EXT3_IOC_GROUP_ADD in case it
fails due to the presence of multiple resizers at the filesystem.
The problem is a little bit more serious than a wrong return value in this
case, since the clause err=0 in the exit_journal path will lead to a call
to update_backups which in turns causes a NULL pointer dereference.
Signed-off-by: Glauber de Oliveira Costa <glommer@br.ibm.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I now see another overflow in reiserfs that should lead to data corruptions
with files that are bigger than 4G under certain circumstances when using
mmap.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
explicit support for a "remapped page range" aka VM_PFNMAP. It allows a
VM area to contain an arbitrary range of page table entries that the VM
never touches, and never considers to be normal pages.
Any user of "remap_pfn_range()" automatically gets this new
functionality, and doesn't even have to mark the pages reserved or
indeed mark them any other way. It just works. As a side effect, doing
mmap() on /dev/mem works for arbitrary ranges.
Sparc update from David in the next commit.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fix cifs negative dentries so they are freed faster (not requiring
umount or readdir e.g.) so the client recognizes the new file on
the server more quickly.
Signed-off-by: Steve French <sfrench@us.ibm.com>
In cases where the server has gone insane, nfs_update_inode() may end
up calling nfs_invalidate_inode(), which again calls stuff that takes
the inode->i_lock that we're already holding.
In addition, given the sort of things we have in NFS these days that
need to be cleaned up on inode release, I'm not sure we should ever
be calling make_bad_inode().
Fix up spinlock recursion, and limit nfs_invalidate_inode() to clearing
the caches, and marking the inode as being stale.
Thanks to Steve Dickson <SteveD@redhat.com> for spotting this.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When caching locks due to holding a file delegation, we must always
check against local locks before sending anything to the server.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
its queue of IO completion callbacks, thus creating the deadlock between
umount and xfslogd. Breaking the loop solves the problem.
SGI-PV: 943821
SGI-Modid: xfs-linux-melb:xfs-kern:202363a
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
left shift using PAGE_CACHE_SHIFT in fs/ntfs/file.c. Thanks to Andrew
Morton pointing this out to.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Work around gcc-2.95.x macro expansion bug.
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When non-leader thread does exec, de_thread adds old leader to the init's
->children list in EXIT_ZOMBIE state and drops tasklist_lock.
This means that release_task(leader) in de_thread() is racy vs do_wait()
from init task.
I think de_thread() should set old leader's state to EXIT_DEAD instead.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: george anzinger <george@mvista.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, if a hugetlbfs is mounted without limits (the default), statfs()
will return -1 for max/free/used blocks. This does not appear to be in
line with normal convention: simple_statfs() and shmem_statfs() both return
0 in similar cases. Worse, it confuses the translation logic in
put_compat_statfs(), causing it to return -EOVERFLOW on such a mount.
This patch alters hugetlbfs_statfs() to return 0 for max/free/used blocks
on a mount without limits. Note that we need the test in the patch below,
rather than just using 0 in the sbinfo structure, because the -1 marked in
the free blocks field is used internally to tell the
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In fs/compat.c, whenever put_compat_statfs() returns an error, the
containing syscall returns -EFAULT. This is presumably by analogy with the
non-compat case, where any non-zero code from copy_to_user() should be
translated into an EFAULT. However, put_compat_statfs() is also return
-EOVERFLOW. The same applies for put_compat_statfs64().
This bug can be observed with a statfs() on a hugetlbfs directory.
hugetlbfs, when mounted without limits reports available, free and total
blocks as -1 (itself a bug, another patch coming). statfs() will
mysteriously return EFAULT although it's parameters are perfectly valid
addresses.
This patch causes the compat versions of statfs() and statfs64() to
correctly propogate the return values from put_compat_statfs() and
put_compat_statfs64().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Alexandra Kossovsky <Alexandra.Kossovsky@oktetlabs.ru>
From http://bugzilla.kernel.org/show_bug.cgi?id=4746
There is user data corruption when using ioctl(SIOCGIFCONF) in 32-bit
application running amd64 kernel. I do not think that this problem is
exploitable, but any data corruption may lead to security problems.
Following code demonstrates the problem
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <net/if.h>
#include <sys/ioctl.h>
char buf[256];
main()
{
int s = socket(AF_INET, SOCK_DGRAM, 0);
struct ifconf req;
int i;
req.ifc_buf = buf;
req.ifc_len = 41;
printf("Result %d\n", ioctl(s, SIOCGIFCONF, &req));
printf("Len %d\n", req.ifc_len);
for (i = 41; i < 256; i++)
if (buf[i] != 0)
printf("Byte %d is corrupted\n", i);
}
Steps to reproduce:
Compile the code above into 32-bit elf and run it. You'll get
Result 0
Len 32
Byte 48 is corrupted
Byte 52 is corrupted
Byte 53 is corrupted
Byte 54 is corrupted
Byte 55 is corrupted
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally for 2.6.16, but the semaphore causes problems for some
people so get rid of it now.
It's not needed anymore because the ioctl hash table is never changed
at run time now.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the case in which readdir reset file type when SFU mount option
specified.
Also fix sfu related functions to not request EAs (xattrs) when not
configured in Kconfig
Signed-off-by: Steve French <sfrench@us.ibm.com>
writev and aio_write to flush properly.
This is Christoph's patch merged with the new nobrl file operations
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
From: Christoph Hellwig <hch@lst.de>
- support vectored and async aio ops unconditionally - this is above
the pagecache and transparent to the fs
- remove cifs_read_wrapper. it was only doing silly checks and
calling generic_file_write in all cases.
- use do_sync_read/do_sync_write as read/write operations. They call
->readv/->writev which we now always implemente.
- add the filemap_fdatawrite calls to writev/aio_write which were
missing previously compared to plain write. no idea what the point
behind them is, but let's be consistent at least..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
So things like on-line resizing et al. work.
Based almost entirely upon a patch by Guido Gnther <agx@sigxcpu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based upon a patch by Guido Guenther <agx@sigxcpu.org>.
Some of these ioctls had embedded time_t objects
or pointers, so needed translation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Sync iocbs have a life cycle that don't need a kioctx. Their retrying, if
any, is done in the context of their owner who has allocated them on the
stack.
The sole user of a sync iocb's ctx reference was aio_complete() checking for
an elevated iocb ref count that could never happen. No path which grabs an
iocb ref has access to sync iocbs.
If we were to implement sync iocb cancelation it would be done by the owner of
the iocb using its on-stack reference.
Removing this chunk from aio_complete allows us to remove the entire kioctx
instance from mm_struct, reducing its size by a third. On a i386 testing box
the slab size went from 768 to 504 bytes and from 5 to 8 per page.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes lost referrence on ext3 current handle in
ext3_journalled_writepage().
Signed-Off-By: Denis Lunev <den@sw.ru>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The patch
http://linux.bkbits.net:8080/linux-2.6/diffs/fs/locks.c@1.70??nav=index.html
introduced a pretty nasty memory leak in the lease code. When freeing
the lease, the code in locks_delete_lock() will correctly clean up
the fasync queue, but when we return to fcntl_setlease(), the freed
fasync entry will be reinstated.
This patch ensures that we skip the call to fasync_helper() when we're
freeing up the lease.
Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Following Shaggy's suggestion, do a better job on the unicode string
handling routines in cifs in specifying that the wchar_t are really
little endian widechars (__le16).
Signed-off-by: Steve French <sfrench@us.ibm.com>
Linux-formatted jfs partitions have a different idea about what i_size
represents than partitions formatted on OS/2. The i_size calculation is
now based on the size of the directory index. For legacy partitions, which
have no directory index, the i_size is never being updated.
This patch adds back the original i_size calculations for legacy partitions.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
This patch makes a needlessly global function static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes the needlessly global function path_lookup_create()
static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The frame buffer layer already had some code dealing with compat ioctls, this
patch moves over the remaining code from fs/compat_ioctl.c
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We don't implement these ioctls, but some architectures define them in the
headers. Bash picks them up and issues them frequently. Add compat_ioctl
handlers to silence warnings about unhandled copat ioctls.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
->permission and ->lookup have a struct nameidata * argument these days to
pass down lookup intents. Unfortunately some callers of lookup_hash don't
actually pass this one down. For lookup_one_len() we don't have a struct
nameidata to pass down, but as this function is a library function only
used by filesystem code this is an acceptable limitation. All other
callers should pass down the nameidata, so this patch changes the
lookup_hash interface to only take a struct nameidata argument and derives
the other two arguments to __lookup_hash from it. All callers already have
the nameidata argument available so this is not a problem.
At the same time I'd like to deprecate the lookup_hash interface as there
are better exported interfaces for filesystem usage. Before it can
actually be removed I need to fix up rpc_pipefs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A few more callers of permission() just want to check for a different access
pattern on an already open file. This patch adds a wrapper for permission()
that takes a file in preparation of per-mount read-only support and to clean
up the callers a little. The helper is not intended for new code, everything
without the interface set in stone should use vfs_permission()
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Most permission() calls have a struct nameidata * available. This helper
takes that as an argument and thus makes sure we pass it down for lookup
intents and prepares for per-mount read-only support where we need a struct
vfsmount for checking whether a file is writeable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes an ancient changelog file.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The CONFIG_EXT{2,3}_CHECK options where were never available, and all they
did was to implement a subset of e2fsck in the kernel.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make reiserfs correctly return EDQUOT when the allocation failed due to
quotas (so far we just returned ENOSPC).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass down the silent flag to parse_options(). Without this fat gives
warnings when mounting some non-fat rootfs with options.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes almost all inclusions of linux/version.h. The 3
#defines are unused in most of the touched files.
A few drivers use the simple KERNEL_VERSION(a,b,c) macro, which is
unfortunatly in linux/version.h.
There are also lots of #ifdef for long obsolete kernels, this was not
touched. In a few places, the linux/version.h include was move to where
the LINUX_VERSION_CODE was used.
quilt vi `find * -type f -name "*.[ch]"|xargs grep -El '(UTS_RELEASE|LINUX_VERSION_CODE|KERNEL_VERSION|linux/version.h)'|grep -Ev '(/(boot|coda|drm)/|~$)'`
search pattern:
/UTS_RELEASE\|LINUX_VERSION_CODE\|KERNEL_VERSION\|linux\/\(utsname\|version\).h
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When non-leader thread does exec, de_thread calls release_task(leader) before
calling exit_itimers(). If local timer interrupt happens in between, it can
oops in send_group_sigqueue() while taking ->sighand->siglock == NULL.
However, we can't change send_group_sigqueue() to check p->signal != NULL,
because sys_timer_create() does get_task_struct() only in SIGEV_THREAD_ID
case. So it is possible that this task_struct was already freed and we can't
trust p->signal.
This patch changes de_thread() so that leader released after exit_itimers()
call.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If anyone needs a fully-functional befs driver, the easiest route to
that would probably be getting Haiku's befs driver to compile in
userland as a FUSE fs.
At any rate, attribute.c can go. It is easy enough to add back in if
anyone ever wants to do the (relativly minor) refactoring nessisary to
get it working.
Signed-off-by: Will Dyson <will.dyson@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
An unbindable mount does not forward or receive propagation. Also
unbindable mount disallows bind mounts. The semantics is as follows.
Bind semantics:
It is invalid to bind mount an unbindable mount.
Move semantics:
It is invalid to move an unbindable mount under shared mount.
Clone-namespace semantics:
If a mount is unbindable in the parent namespace, the corresponding
cloned mount in the child namespace becomes unbindable too. Note:
there is subtle difference, unbindable mounts cannot be bind mounted
but can be cloned during clone-namespace.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes bind, rbind, move, clone namespace and umount operations
aware of the semantics of slave mount (see Documentation/sharedsubtree.txt
in the last patch of the series for detailed description).
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A slave mount always has a master mount from which it receives
mount/umount events. Unlike shared mount the event propagation does not
flow from the slave mount to the master.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An unmount of a mount creates a umount event on the parent. If the
parent is a shared mount, it gets propagated to all mounts in the peer
group.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement handling of mount --move in presense of shared mounts (see
Documentation/sharedsubtree.txt in the end of patch series for detailed
description).
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement handling of MS_BIND in presense of shared mounts (see
Documentation/sharedsubtree.txt in the end of patch series for detailed
description).
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This creates shared mounts. A shared mount when bind-mounted to some
mountpoint, propagates mount/umount events to each other. All the
shared mounts that propagate events to each other belong to the same
peer-group.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A private mount does not forward or receive propagation. This patch
provides user the ability to convert any mount to private.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This removes the per-namespace semaphore in favor of a global semaphore.
This can have an effect on namespace scalability.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- clean up the ugliness in may_umount_tree()
- fix a bug in do_loopback(). after cloning a tree, do_loopback()
unlinks only the topmost mount of the cloned tree, leaving behind the
children mounts on their corresponding expiry list.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
umount is done under the protection of the namespace semaphore. This
can lead to intresting deadlocks when the last reference to a mount is
released, if filesystem code is in sufficiently nasty state.
This collects all the to-be-released-mounts and releases them after
releasing the namespace semaphore. That both reduces the time we are
holding namespace semaphore and gets the things more robust.
Idea proposed by Al Viro.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Old semantics: graft_tree() grabs a reference on the vfsmount before
returning success.
New one: graft_tree() leaves that to caller.
All the callers of graft_tree() immediately dropped that reference
anyway. Changing the interface takes care of this unnecessary overhead.
Idea proposed by Al Viro.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow caller of seq_open() to kmalloc() seq_file + whatever else they
want and set ->private_data to it. seq_open() will then abstain from
doing allocation itself.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- check_mnt() on the source of binding should've been unconditional
from the very beginning. My fault - as far I could've trace it,
that's an old thinko made back in 2001. Kudos to Miklos for spotting
it...
Fixed.
- code cleaned up.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The way we currently deal with quota and process accounting that might
keep vfsmount busy at umount time is inherently broken; we try to turn
them off just in case (not quite correctly, at that) and
a) pray umount doesn't fail (otherwise they'll stay turned off)
b) pray nobody doesn anything funny just as we turn quota off
Moreover, LSM provides hooks for doing the same sort of broken logics.
The proper way to deal with that is to introduce the second kind of
reference to vfsmount. Semantics:
- when the last normal reference is dropped, all special ones are
converted to normal ones and if there had been any, cleanup is done.
- normal reference can be cloned into a special one
- special reference can be converted to normal one; that's a no-op if
we'd already passed the point of no return (i.e. mntput() had
converted special references to normal and started cleanup).
The way it works: e.g. starting process accounting converts the vfsmount
reference pinned by the opened file into special one and turns it back
to normal when it gets shut down; acct_auto_close() is done when no
normal references are left. That way it does *not* obstruct umount(2)
and it silently gets turned off when the last normal reference to
vfsmount is gone. Which is exactly what we want...
The same should be done by LSM module that holds some internal
references to vfsmount and wants to shut them down on umount - it should
make them special and security_sb_umount_close() will be called exactly
when the last normal reference to vfsmount is gone.
quota handling is even simpler - we don't use normal file IO anymore, so
there's no need to hold vfsmounts at all. DQUOT_OFF() is done from
deactivate_super(), where it really belongs.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the ability to the SMU driver to recover missing
calibration partitions from the SMU chip itself. It also adds some
dynamic mecanism to /proc/device-tree so that new properties are visible
to userland.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There's no modular usage in the kernel and modules shouldn't use this
symbol.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the fs/ part of the big kfree cleanup patch.
Remove pointless checks for NULL prior to calling kfree() in fs/.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert to proper kernel-doc format.
Some have extra blank lines (not allowed immed. after the function name)
or need blank lines (after all parameters). Function summary must be only
one line.
Colon (":") in a function description does weird things (causes kernel-doc
to think that it's a new section head sadly).
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add structure fields kernel-doc for 2 fields in struct journal_s.
Warning(/var/linsrc/linux-2614-rc4//include/linux/jbd.h:808): No description found for parameter 'j_wbuf'
Warning(/var/linsrc/linux-2614-rc4//include/linux/jbd.h:808): No description found for parameter 'j_wbufsize'
Convert fs/jbd/recovery.c non-static functions to kernel-doc format.
fs/jbd/recovery.c doesn't export any symbols, so it should use
!I instead of !E to eliminate this warning message:
Warning(/var/linsrc/linux-2614-rc4//fs/jbd/recovery.c): no structured comments found
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If an RPC socket is serving multiple programs, then the pg_authenticate of
the first program in the list is called, instead of pg_authenticate for the
program to be run.
This does not cause a problem with any programs in the current kernel, but
could confuse future code.
Also set pg_authenticate for nfsd_acl_program incase it ever gets used.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are a couple of tests which could possibly be confused by extremely
large numbers appearing in 'xdr' packets. I think the closest to an exploit
you could get would be writing random data from a free page into a file - i.e.
leak data out of kernel space.
I'm fairly sure they cannot be used for remote compromise.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide a file in the NFSD filesystem that allows setting and querying of
which version of NFS are being exported. Changes are only allowed while no
server is running.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Most files in the nfsd filesystems are transaction files. You write a
request, and read a response.
For some (e.g. 'threads') it makes sense to just be able to read and get the
current value.
This functionality did exist but was broken recently when someone modified
nfsctl.c without going through the maintainer. This patch fixes the
regression.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a somewhat cosmetic fix to keep the SpecFS validation test from
complaining.
SpecFS want's to try chmod on symlinks, and ext3 and reiser (at least) return
ENOTSUPP.
Probably both sides are being silly, but it is easiest to simply make it a
non-issue and filter out chmod requests on symlinks at the nfsd level.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch passes the file handle supplied in iattr to userspace, in case the
->setattr() was invoked from sys_ftruncate(). This solves the permission
checking (or lack thereof) in ftruncate() for the class of filesystems served
by an unprivileged userspace process.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds an atomic create+open operation. This does not yet work if
the file type changes between lookup and create+open, but solves the
permission checking problems for the separte create and open methods.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new access call, which will only be called if ->permission is invoked
from sys_access(). In all other cases permission checking is delayed until
the actual filesystem operation.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch extends the iattr structure with a file pointer memeber, and adds
an ATTR_FILE validity flag for this member.
This is set if do_truncate() is invoked from ftruncate() or from
do_coredump().
The change is source and binary compatible.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Both AFFS and HPFS update the ctime and mtime in the write path, after
generic_file_write returned and marked the inode dirty.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
afs actually had a write method that returned different errors depending on
whether some flag was set - better return the standard EINVAL errno.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
No need to duplicate a generic readonly file ops table in befs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
No need to duplicate a generic readonly file ops table in befs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix more include file problems that surfaced since I submitted the previous
fix-missing-includes.patch. This should now allow not to include sched.h
from module.h, which is done by a followup patch.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a memory leak possible in dentry_open(). If get_empty_filp()
fails, then the references to dentry and mnt need to be released. The
attached patch adds the calls to dput() and mntput() to release these two
references.
Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Get rid of the `int unused' parameter of __find_get_block_slow().
Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Data allocated with "__getname()" should always be free'd with "__putname()"
because of the AUDITSYSCALL code.
Signed-off-by: Davi Arnaut <davi.arnaut@gmail.com>
Cc: Urban Widmark <urban@teststation.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Data allocated with "__getname()" should always be free'd with "__putname()"
because of the AUDITSYSCALL code.
Signed-off-by: Davi Arnaut <davi.arnaut@gmail.com>
Cc: <rminnich@lanl.gov>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- "extern inline" -> "static inline"
- every file should #include the headers containing the prototypes for
it's global functions
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
AIO was adding a new context's max requests to the global total before
testing if that resulting total was over the global limit. This let
innocent tasks get their new limit tested along with a racing guilty task
that was crossing the limit. This serializes the _nr accounting with a
spinlock It also switches to using unsigned long for the global totals.
Individual contexts are still limited to an unsigned int's worth of
requests by the syscall interface.
The problem and fix were verified with a simple program that spun creating
and destroying a context while holding on to another long lived context.
Before the patch a task creating a tiny context could get a spurious EAGAIN
if it raced with a task creating a very large context that overran the
limit.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The -EROFS check has moved up to permission() in the VFS a while ago.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In a case documented as
We should never be called with any of these states
BUG() in a case that would later result in a NULL pointer dereference.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reported by Eddy Petrisor <eddy.petrisor@gmail.com>
fs/built-in.o(.text+0x35fdc): In function `hfs_mdb_put':
: undefined reference to `unload_nls'
fs/built-in.o(.text+0x35ff1): In function `hfs_mdb_put':
: undefined reference to `unload_nls'
fs/built-in.o(.text+0x367a5): In function `parse_options':
super.c: undefined reference to `load_nls'
fs/built-in.o(.text+0x367db):super.c: undefined reference to `load_nls'
fs/built-in.o(.text+0x36938):super.c: undefined reference to `load_nls_default'
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Acked-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the hlist_for_each_rcu() API, which is used only in one place, and
is trivially converted to hlist_for_each_entry_rcu(), making the code
shorter and more readable. Any out-of-tree uses may be similarly
converted.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a connector that reports fork, exec, id change, and exit
events for all processes to userspace. It replaces the fork_advisor patch
that ELSA is currently using. Applications that may find these events
useful include accounting/auditing (e.g. ELSA), system activity monitoring
(e.g. top), security, and resource management (e.g. CKRM).
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the backing_dev_info doesn't have BDI_CAP_NO_WRITEBACK we're not supposed
to write back an inode's pages. But in this situation write_inode_now()
refuses to write the inode itself as well. Fix.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch renames struct kmem_cache_s to kmem_cache so we can start using
it instead of kmem_cache_t typedef.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some callers to block-layer commit_write function treat non-zero return as
error, notably the loopback mount driver sometimes used in conjunction with
JFFS2 on NAND flash for bad block avoidance, etc. Return zero for success
as do various other commit_write functions.
Signed-off-by: Todd Poynor <tpoynor@mvista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
- assume wbuf may be of size which is not power of 2
- don't make strange assumption about not padding wbuf for DataFlash
- use wbuf = DataFlash page and eraseblock >= 8 Dataflash pages
From: Peter Menzebach <pm-mtd@mw-itcon.de>
Acked-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Simplify the debugging code further.
Update the TODO list
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Always keep valid data in reserved_size.
It did not cause problems, but the reservation code was unoptimal
when centralized summary was active or the size of the erase block
was very small.
Signed-off-by: Ferenc Havasi <havasi@inf.u-szeged.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Do the summary collection in the right place. If the device
was not writebuffered but had c->mtd->writev function
(e.g. blkmtd) the summary collector function was not called.
Signed-off-by: Ferenc Havasi <havasi@inf.u-szeged.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The goal of summary is to speed up the mount time. Erase block summary (EBS)
stores summary information at the end of every (closed) erase block. It is
no longer necessary to scan all nodes separetly (and read all pages of them)
just read this "small" summary, where every information is stored which is
needed at mount time.
This summary information is stored in a JFFS2_FEATURE_RWCOMPAT_DELETE. During
the mount process if there is no summary info the orignal scan process will
be executed. EBS works with NAND and NOR flashes, too.
There is a user space tool called sumtool to generate this summary
information for a JFFS2 image.
Signed-off-by: Ferenc Havasi <havasi@inf.u-szeged.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove support for virtual blocks, which are build by
concatenation of multiple physical erase blocks.
For more information please read the MTD mailing list thread
"[PATCH] remove support for virtual blocks"
Signed-off-by: Ferenc Havasi <havasi@inf.u-szeged.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When data starts from the beginning of NAND page, 'len' must be zero, not
c->wbuf_page.
Thanks to Zoltan Sogor for reporting this problem.
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
From: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Instead of building fragtree starting from node with the smallest version
number, start from the highest. This helps to avoid reading and checking
obsolete nodes.
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Replace the D1(printk()) style debugging with the new debug macros
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move functions to read inodes into readinode.c
Move functions to handle fragtree and dentry lists into nodelist.[ch]
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Small comment cleanups. Remove a unused macro
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Rename functions to a name matching the functionality.
Remove stall debug code
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Various simplifiactions. printk format corrections.
Convert more code to use the new debug functions.
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When JFFS22 is unable to read the root inode, the bad root inode object is not
freed and remains sticked in the jffs2_i slab cache. When we further try to
free the slab cache (e.g., on rmmod jffs2), slab allocator subsystem panics.
Fix this bug.
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If debugging is disabled, define debugging functions as empty macros, instead
of using Dx() explicitly.
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
JFFS2 uses f->dents to store the pointer to the symlink target string (in case
the inode is symlink). This is somewhat ugly to use the same field for
different reasons. Introduce distinct field f->target for this purpose.
Note, f->fragtree, f->dents, f->target may probably be put in a union.
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move debug functions into a seperate source file
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix some dprintk's so that NLM, NFS client, and RPC client compile
cleanly if CONFIG_SYSCTL is disabled.
Test plan:
Compile kernel with CONFIG_NFS enabled and CONFIG_SYSCTL disabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that we have a method of dealing with delegation recalls, actually
enable the caching of posix and BSD locks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Delegations allow us to cache posix and BSD locks, however when the
delegation is recalled, we need to "flush the cache" and send
the cached LOCK requests to the server.
This patch sets up the mechanism for doing so.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I missed this one... Any form of rename will result in a delegation
recall, so it is more efficient to return the one we hold before
trying the rename.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
RFC 3530 states that for OPEN_DOWNGRADE "The share_access and share_deny
bits specified must be exactly equal to the union of the share_access and
share_deny bits specified for some subset of the OPENs in effect for
current openowner on the current file.
Setattr is currently violating the NFSv4 rules for OPEN_DOWNGRADE in that
it may cause a downgrade from OPEN4_SHARE_ACCESS_BOTH to
OPEN4_SHARE_ACCESS_WRITE despite the fact that there exists no open file
with O_WRONLY access mode.
Fix the problem by replacing nfs4_find_state() with a modified version of
nfs_find_open_context().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We must not remove the nfs4_state structure from the inode open lists
before we are in sequence lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cannot build XFS filesystem support as module with quota support. It
works only when the XFS filesystem support is compiled into the kernel.
Menuconfig prevents from setting CONFIG_XFS_FS=m and CONFIG_XFS_QUOTA=y.
How to reproduce: configure the XFS filesystem with quota support as
module. The resulting kernel won't have quota support compiled into
xfs.ko.
Fix: Changing the fs/xfs/Kconfig file from tristate to bool lets you
configure the quota support to be compiled into the XFS module. The
Makefile-linux-2.6 checks only for CONFIG_XFS_QUOTA=y.
Signed-off-by: Dimitri Puzin <tristan-777@ddkom-online.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Nathan Scott <nathans@sgi.com>
This is now used to issue a delayed allocation flush before reporting
quota, which allows the used space quota report to match reality.
Signed-off-by: Nathan Scott <nathans@sgi.com>
and leaf blocks. The problem cam from xfsqa test 117.
SGI-PV: 940655
SGI-Modid: xfs-linux:xfs-kern:201527a
Signed-off-by: Yingping Lu <yingping@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
It seems logical.
Note that CONFIG_EXPERIMENTAL itself doesn't enable any code.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Spotted by Roger Willcocks <willcor @at@ gmail.com>
SGI-PV: 944858
SGI-Modid: xfs-linux:xfs-kern:201213a
Signed-off-by: Eric Sandeen <sandeen@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
using xfs rt
SGI-PV: 944632
SGI-Modid: xfs-linux:xfs-kern:200983a
Signed-off-by: Eric Sandeen <sandeen@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
anymore and simplify the final put path a little
SGI-PV: 908809
SGI-Modid: xfs-linux:xfs-kern:200790a
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
information gcc could not find out (that a directory always has a ..
entry), the others are outright gcc bugs.
SGI-PV: 943511
SGI-Modid: xfs-linux:xfs-kern:200055a
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
the data/attr forks now grow up/down from either end of the literal area,
rather than dividing the literal area into two chunks and growing both
upward. Means we can now make much more efficient use of the attribute
space, incl. fitting DMF attributes inline in 256 byte inodes, and large
jumps in dbench3 performance numbers. It is self enabling, but can be
forced on/off via the attr2/noattr2 mount options.
SGI-PV: 941645
SGI-Modid: xfs-linux:xfs-kern:23837a
Signed-off-by: Nathan Scott <nathans@sgi.com>
the data/attr forks now grow up/down from either end of the literal area,
rather than dividing the literal area into two chunks and growing both
upward. Means we can now make much more efficient use of the attribute
space, incl. fitting DMF attributes inline in 256 byte inodes, and large
jumps in dbench3 performance numbers. It is self enabling, but can be
forced on/off via the attr2/noattr2 mount options.
SGI-PV: 941645
SGI-Modid: xfs-linux:xfs-kern:23836a
Signed-off-by: Nathan Scott <nathans@sgi.com>
the data/attr forks now grow up/down from either end of the literal area,
rather than dividing the literal area into two chunks and growing both
upward. Means we can now make much more efficient use of the attribute
space, incl. fitting DMF attributes inline in 256 byte inodes, and large
jumps in dbench3 performance numbers. It is self enabling, but can be
forced on/off via the attr2/noattr2 mount options.
SGI-PV: 941645
SGI-Modid: xfs-linux:xfs-kern:23835a
Signed-off-by: Nathan Scott <nathans@sgi.com>
filesystems to expose the filesystem stripe width in stat(2) rather than
the page cache size. This allows applications requiring high bandwidth to
easily determine the optimum I/O size for the underlying filesystem. The
default is to report the page cache size (i.e. "nolargeio").
SGI-PV: 942818
SGI-Modid: xfs-linux:xfs-kern:23830a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
replace PBF_NONE with an inverted PBF_DONE, so it's like all the other
flags.
SGI-PV: 942609
SGI-Modid: xfs-linux:xfs-kern:199136a
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
writes. In addition flush the disk cache on fsync if the sync cached
operation didn't sync the log to disk (this requires some additional
bookeping in the transaction and log code). If the device doesn't claim to
support barriers, the filesystem has an extern log volume or the trial
superblock write with barriers enabled failed we disable barriers and
print a warning. We should probably fail the mount completely, but that
could lead to nasty boot failures for the root filesystem. Not enabled by
default yet, needs more destructive testing first.
SGI-PV: 912426
SGI-Modid: xfs-linux:xfs-kern:198723a
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
reverse startup order
SGI-PV: 942063
SGI-Modid: xfs-linux:xfs-kern:198651a
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
Instead of having ->read_sectors and ->write_sectors, combine the two
into ->sectors[2] and similar for the other fields. This saves a branch
several places in the io path, since we don't have to care for what the
actual io direction is. On my x86-64 box, that's 200 bytes less text in
just the core (not counting the various drivers).
Signed-off-by: Jens Axboe <axboe@suse.de>
jfs has never been setting i_ctime or i_mtime when creating either hard
or symbolic links. I'm surprised nobody had noticed until now.
Thanks to Chris Spiegel for reporting the problem.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
When the inode count is zero in inode writeback, the
WARN_ON(!(inode->i_state & I_WILL_FREE));
is broken, and needs to test for either I_WILL_FREE|I_FREEING.
When the inode is in I_FREEING state, it's already out of the visibility
of the vm so it can't be freed so it doesn't require the __iget and the
generic_delete_inode path can call the sync internally to the lowlevel
fs callback during the last iput. So the inode being in I_FREEING is
also a valid condition for calling the sync with i_count == 0.
The specific stack trace is this:
0xc00000007b8fb6e0 0xc00000000010118c .__writeback_single_inode +0x5c
0xc00000007b8fb6e0 0xc0000000001014dc (lr) .sync_inode +0x3c
0xc00000007b8fb790 0xc0000000001014dc .sync_inode +0x3c
0xc00000007b8fb820 0xc0000000001a5020 .ext2_sync_inode +0x64
0xc00000007b8fb8f0 0xc0000000001a65b4 .ext2_truncate +0x3f8
0xc00000007b8fba40 0xc0000000001a6940 .ext2_delete_inode +0xdc
0xc00000007b8fbac0 0xc0000000000f7a5c .generic_delete_inode +0x124
0xc00000007b8fbb50 0xc0000000000f5fe0 .iput +0xb8
0xc00000007b8fbbe0 0xc0000000000e9fd4 .sys_unlink +0x2a8
0xc00000007b8fbd10 0xc00000000001048c .ret_from_syscall_1 +0x0
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes duplicate directory scanning code from fs/fat/dir.c. The
two functions that share identical code are fat_readdirx() and
fat_search_long(). This patch also renames fat_readdirx to __fat_readdir().
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now, vfat_rename() is using vfat_find() for sanity check. This removes that
sanity check, the cost of sanity check is too high.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I recently picked up my older work to remove unnecessary #includes of
sched.h, starting from a patch by Dave Jones to not include sched.h
from module.h. This reduces the number of indirect includes of sched.h
by ~300. Another ~400 pointless direct includes can be removed after
this disentangling (patch to follow later).
However, quite a few indirect includes need to be fixed up for this.
In order to feed the patches through -mm with as little disturbance as
possible, I've split out the fixes I accumulated up to now (complete for
i386 and x86_64, more archs to follow later) and post them before the real
patch. This way this large part of the patch is kept simple with only
adding #includes, and all hunks are independent of each other. So if any
hunk rejects or gets in the way of other patches, just drop it. My scripts
will pick it up again in the next round.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a filesystem passes an idiotic blocksize into bread(), __getblk_slow() will
warn and will return NULL. We have a report (from Hubert Tonneau
<hubert.tonneau@fullpliant.org>) of isofs_fill_super() doing this (passing in
a silly block size) against an unplugged CDROM drive.
But a couple of __getblk_slow() callers forgot to check for the NULL bh, hence
oops.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds tests for the return value of sb_getblk() in the ext2/3
filesystems. In fs/buffer.c it is stated that the getblk() function never
fails. However, it does can return NULL in some situations due to I/O
errors, which may lead us to NULL pointer dereferences
Signed-off-by: Glauber de Oliveira Costa <glommer@br.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
list_move(&inode->i_list, &inode_in_use);
} else {
list_move(&inode->i_list, &inode_unused);
+ inodes_stat.nr_unused++;
}
}
wake_up_inode(inode);
Are you sure the above diff is correct? It was added somewhere between
2.6.5 and 2.6.8. I think it's wrong.
The only way I can imagine the i_count to be zero in the above path, is
that I_WILL_FREE is set. And if I_WILL_FREE is set, then we must not
increase nr_unused. So I believe the above change is buggy and it will
definitely overstate the number of unused inodes and it should be backed
out.
Note that __writeback_single_inode before calling __sync_single_inode, can
drop the spinlock and we can have both the dirty and locked bitflags clear
here:
spin_unlock(&inode_lock);
__wait_on_inode(inode);
iput(inode);
XXXXXXX
spin_lock(&inode_lock);
}
use inode again here
a construct like the above makes zero sense from a reference counting
standpoint.
Either we don't ever use the inode again after the iput, or the
inode_lock should be taken _before_ executing the iput (i.e. a __iput
would be required). Taking the inode_lock after iput means the iget was
useless if we keep using the inode after the iput.
So the only chance the 2.6 was safe to call __writeback_single_inode
with the i_count == 0, is that I_WILL_FREE is set (I_WILL_FREE will
prevent the VM to free the inode in XXXXX).
Potentially calling the above iput with I_WILL_FREE was also wrong
because it would recurse in iput_final (the second mainline bug).
The below (untested) patch fixes the nr_unused accounting, avoids recursing
in iput when I_WILL_FREE is set and makes sure (with the BUG_ON) that we
don't corrupt memory and that all holders that don't set I_WILL_FREE, keeps
a reference on the inode!
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix warnings from sparse due to un-declared functions that should either
have a header file or have been declared static
fs/ext2/bitmap.c:14:15: warning: symbol 'ext2_count_free' was not declared. Should it be static?
fs/ext2/namei.c:92:15: warning: symbol 'ext2_get_parent' was not declared. Should it be static?
fs/ext3/bitmap.c:15:15: warning: symbol 'ext3_count_free' was not declared. Should it be static?
fs/ext3/namei.c:1013:15: warning: symbol 'ext3_get_parent' was not declared. Should it be static?
fs/ext3/xattr.c:214:1: warning: symbol 'ext3_xattr_block_get' was not declared. Should it be static?
fs/ext3/xattr.c:358:1: warning: symbol 'ext3_xattr_block_list' was not declared. Should it be static?
fs/ext3/xattr.c:630:1: warning: symbol 'ext3_xattr_block_find' was not declared. Should it be static?
fs/ext3/xattr.c:863:1: warning: symbol 'ext3_xattr_ibody_find' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
de_thread() sends SIGKILL to all sub-threads and waits them to die in 'D'
state. It is possible that one of the threads already dequeued coredump
signal. When de_thread() unlocks ->sighand->lock that thread can enter
do_coredump()->coredump_wait() and cause a deadlock.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Correct some typos and inconsistent use of "initialise" vs "initialize" in
comments. Reported by Ioannis Barkas.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I noticed some problems while running ext3 with the debug flag set on.
More precisely, I was unable to umount the filesystem. Some investigation
took me to the patch that follows.
At a first glance , the lock/unlock I've taken out seems really not
necessary, as the main code (outside debug) does not lock the super. The
only additional danger operations that debug code introduces seems to be
related to bitmap, but bitmap operations tends to be all atomic anyway.
I also took the opportunity to fix 2 spelling errors.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch deletes pointless code from coredump_wait().
1. It does useless mm->core_waiters inc/dec under mm->mmap_sem,
but any changes to ->core_waiters have no effect until we drop
->mmap_sem.
2. It calls yield() for absolutely unknown reason.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes incorrect error path in proc_get_inode(), when module
can't be get due to being unloaded. When try_module_get() fails, this
function puts de(!) and still returns inode with non-getted de.
There are still unresolved known bugs in proc yet to be fixed:
- proc_dir_entry tree is managed without any serialization
- create_proc_entry() doesn't setup de->owner anyhow,
so setting it later manually is inatomic.
- looks like almost all modules do not care whether
it's de->owner is set...
Signed-Off-By: Denis Lunev <den@sw.ru>
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove last remains of NFS exportability support.
The code is actually buggy (as reported by Akshat Aranya), since 'alias'
will be leaked if it's non-null and alias->d_flags has DCACHE_DISCONNECTED.
This is not an active bug, since there will never be any disconnected
dentries. But it's better to get rid of the unnecessary complexity anyway.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
de_thread() calls del_timer_sync(->real_timer) under ->sighand->siglock.
This is deadlockable, it_real_fn sends a signal and needs this lock too.
Also, delete unneeded ->real_timer.data assignment.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that RCU applied on 'struct file' seems stable, we can place f_rcuhead
in a memory location that is not anymore used at call_rcu(&f->f_rcuhead,
file_free_rcu) time, to reduce the size of this critical kernel object.
The trick I used is to move f_rcuhead and f_list in an union called f_u
The callers are changed so that f_rcuhead becomes f_u.fu_rcuhead and f_list
becomes f_u.f_list
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
lookup_flags() is only called from the non-create case, so it needn't check
for O_CREAT|O_EXCL.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
task_struct is an internal structure to the kernel with a lot of good
information, that is probably interesting in core dumps. However there is
no way for user space to know what format that information is in making it
useless.
I grepped the GDB 6.3 source code and NT_TASKSTRUCT while defined is not
used anywhere else. So I would be surprised if anyone notices it is
missing.
In addition exporting kernel pointers to all the interesting kernel data
structures sounds like the very definition of an information leak. I
haven't a clue what someone with evil intentions could do with that
information, but in any attack against the kernel it looks like this is the
perfect tool for aiming that attack.
So since NT_TASKSTRUCT is useless as currently defined and is potentially
dangerous, let's just not export it.
(akpm: Daniel Jacobowitz <dan@debian.org> "would be amazed" if anything was
using NT_TASKSTRUCT).
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
TIOCSTART and TIOCSTOP are defined in asm/ioctls.h and asm/termios.h by
various architectures but not actually implemented anywhere but in the IRIX
compatibility layer, so remove their COMPATIBLE_IOCTL from parisc, ppc64
and sparc64.
Move the TIOCSLTC COMPATIBLE_IOCTL to common code, guided by an ifdef to
only show up on architectures that support it (same as the code handling it
in tty_ioctl.c), aswell as it's brother TIOCGLTC that wasn't handled so
far.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: James Lamanna <jlamanna@gmail.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the problem (BUG 4964) with unmapped buffers in transaction's
t_sync_data list. The problem is we need to call filesystem's own
invalidatepage() from block_write_full_page().
block_write_full_page() must call filesystem's invalidatepage(). Otherwise
following nasty race can happen:
proc 1 proc 2
------ ------
- write some new data to 'offset'
=> bh gets to the transactions data list
- starts truncate
=> i_size set to new size
- mpage_writepages()
- ext3_ordered_writepage() to 'offset'
- block_write_full_page()
- page->index > end_index+1
- block_invalidatepage()
- discard_buffer()
- clear_buffer_mapped()
- commit triggers and finds unmapped buffer - BOOM!
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch allows SELinux to canonicalize the value returned from
getxattr() via the security_inode_getsecurity() hook, which is called after
the fs level getxattr() function.
The purpose of this is to allow the in-core security context for an inode
to override the on-disk value. This could happen in cases such as
upgrading a system to a different labeling form (e.g. standard SELinux to
MLS) without needing to do a full relabel of the filesystem.
In such cases, we want getxattr() to return the canonical security context
that the kernel is using rather than what is stored on disk.
The implementation hooks into the inode_getsecurity(), adding another
parameter to indicate the result of the preceding fs-level getxattr() call,
so that SELinux knows whether to compare a value obtained from disk with
the kernel value.
We also now allow getxattr() to work for mountpoint labeled filesystems
(i.e. mount with option context=foo_t), as we are able to return the
kernel value to the user.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add CONFIG_X86_32 for i386. This allows selecting options that only apply
to 32-bit systems.
(X86 && !X86_64) becomes X86_32
(X86 || X86_64) becomes X86
Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Basic overcommit checking for hugetlb_file_map() based on an implementation
used with demand faulting in SLES9.
Since demand faulting can't guarantee the availability of pages at mmap
time, this patch implements a basic sanity check to ensure that the number
of huge pages required to satisfy the mmap are currently available.
Despite the obvious race, I think it is a good start on doing proper
accounting. I'd like to work towards an accounting system that mimics the
semantics of normal pages (especially for the MAP_PRIVATE/COW case). That
work is underway and builds on what this patch starts.
Huge page shared memory segments are simpler and still maintain their
commit on shmget semantics.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Below is a patch to implement demand faulting for huge pages. The main
motivation for changing from prefaulting to demand faulting is so that huge
page memory areas can be allocated according to NUMA policy.
Thanks to consolidated hugetlb code, switching the behavior requires changing
only one fault handler. The bulk of the patch just moves the logic from
hugelb_prefault() to hugetlb_pte_fault() and find_get_huge_page().
Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reformat hugelbfs_forget_inode and add the missing but harmless
write_inode_now call. It looks the same as generic_forget_inode now except
for the call to truncate_hugepages instead of truncate_inode_pages.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
hugetlbfs_do_delete_inode is the same as generic_delete_inode now, so remove
it in favour of the latter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make hugetlbfs looks the same as generic_detelte_inode, fixing a bunch of
missing updates to it at the same time. Rename it to
hugetlbfs_do_delete_inode and add a real hugetlbfs_delete_inode that
implements ->delete_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move hugetlbfs accounting into ->alloc_inode / ->destroy_inode. This keeps
the code simpler, fixes a loeak where a failing inode allocation wouldn't
decrement the counter and moves hugetlbfs_delete_inode and
hugetlbfs_forget_inode closer to their generic counterparts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.
This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock. (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)
In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.
Splitting the lock is not quite for free: another cacheline access. Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS. But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.
There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Final step in pushing down common core's page_table_lock. follow_page no
longer wants caller to hold page_table_lock, uses pte_offset_map_lock itself;
and so no page_table_lock is taken in get_user_pages itself.
But get_user_pages (and get_futex_key) do then need follow_page to pin the
page for them: take Daniel's suggestion of bitflags to follow_page.
Need one for WRITE, another for TOUCH (it was the accessed flag before:
vanished along with check_user_page_readable, but surely get_numa_maps is
wrong to mark every page it finds as accessed), another for GET.
And another, ANON to dispose of untouched_anonymous_page: it seems silly for
that to descend a second time, let follow_page observe if there was no page
table and return ZERO_PAGE if so. Fix minor bug in that: check VM_LOCKED -
make_pages_present ought to make readonly anonymous present.
Give get_numa_maps a cond_resched while we're there.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the page_table_lock from around the calls to unmap_vmas, and replace
the pte_offset_map in zap_pte_range by pte_offset_map_lock: all callers are
now safe to descend without page_table_lock.
Don't attempt fancy locking for hugepages, just take page_table_lock in
unmap_hugepage_range. Which makes zap_hugepage_range, and the hugetlb test in
zap_page_range, redundant: unmap_vmas calls unmap_hugepage_range anyway. Nor
does unmap_vmas have much use for its mm arg now.
The tlb_start_vma and tlb_end_vma in unmap_page_range are now called without
page_table_lock: if they're implemented at all, they typically come down to
flush_cache_range (usually done outside page_table_lock) and flush_tlb_range
(which we already audited for the mprotect case).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert those common loops using page_table_lock on the outside and
pte_offset_map within to use just pte_offset_map_lock within instead.
These all hold mmap_sem (some exclusively, some not), so at no level can a
page table be whipped away from beneath them. But whereas pte_alloc loops
tested with the "atomic" pmd_present, these loops are testing with pmd_none,
which on i386 PAE tests both lower and upper halves.
That's now unsafe, so add a cast into pmd_none to test only the vital lower
half: we lose a little sensitivity to a corrupt middle directory, but not
enough to worry about. It appears that i386 and UML were the only
architectures vulnerable in this way, and pgd and pud no problem.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Second step in pushing down the page_table_lock. Remove the temporary
bridging hack from __pud_alloc, __pmd_alloc, __pte_alloc: expect callers not
to hold page_table_lock, whether it's on init_mm or a user mm; take
page_table_lock internally to check if a racing task already allocated.
Convert their callers from common code. But avoid coming back to change them
again later: instead of moving the spin_lock(&mm->page_table_lock) down,
switch over to new macros pte_alloc_map_lock and pte_unmap_unlock, which
encapsulate the mapping+locking and unlocking+unmapping together, and in the
end may use alternatives to the mm page_table_lock itself.
These callers all hold mmap_sem (some exclusively, some not), so at no level
can a page table be whipped away from beneath them; and pte_alloc uses the
"atomic" pmd_present to test whether it needs to allocate. It appears that on
all arches we can safely descend without page_table_lock.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
update_mem_hiwater has attracted various criticisms, in particular from those
concerned with mm scalability. Originally it was called whenever rss or
total_vm got raised. Then many of those callsites were replaced by a timer
tick call from account_system_time. Now Frank van Maarseveen reports that to
be found inadequate. How about this? Works for Frank.
Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
update_hiwater_rss and update_hiwater_vm. Don't attempt to keep
mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
by 1): those are hot paths. Do the opposite, update only when about to lower
rss (usually by many), or just before final accounting in do_exit. Handle
mm->hiwater_vm in the same way, though it's much less of an issue. Demand
that whoever collects these hiwater statistics do the work of taking the
maximum with rss or total_vm.
And there has been no collector of these hiwater statistics in the tree. The
new convention needs an example, so match Frank's usage by adding a VmPeak
line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
(High-Water-Mark or High-Water-Memory).
There was a particular anomaly during mremap move, that hiwater_vm might be
captured too high. A fleeting such anomaly remains, but it's quickly
corrected now, whereas before it would stick.
What locking? None: if the app is racy then these statistics will be racy,
it's not worth any overhead to make them exact. But whenever it suits,
hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
page_table_lock (for now) or with preemption disabled (later on): without
going to any trouble, minimize the time between reading current values and
updating, to minimize those occasions when a racing thread bumps a count up
and back down in between.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove PageReserved() calls from core code by tightening VM_RESERVED
handling in mm/ to cover PageReserved functionality.
PageReserved special casing is removed from get_page and put_page.
All setting and clearing of PageReserved is retained, and it is now flagged
in the page_alloc checks to help ensure we don't introduce any refcount
based freeing of Reserved pages.
MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being
deprecated. We never completely handled it correctly anyway, and is be
reintroduced in future if required (Hugh has a proof of concept).
Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
be trivially removed.
Last real user of PageReserved is swsusp, which uses PageReserved to
determine whether a struct page points to valid memory or not. This still
needs to be addressed (a generic page_is_ram() should work).
A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and
thus mapcounted and count towards shared rss). These writes to the struct
page could cause excessive cacheline bouncing on big systems. There are a
number of ways this could be addressed if it is an issue.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Refcount bug fix for filemap_xip.c
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I was lazy when we added anon_rss, and chose to change as few places as
possible. So currently each anonymous page has to be counted twice, in rss
and in anon_rss. Which won't be so good if those are atomic counts in some
configurations.
Change that around: keep file_rss and anon_rss separately, and add them
together (with get_mm_rss macro) when the total is needed - reading two
atomics is much cheaper than updating two atomics. And update anon_rss
upfront, typically in memory.c, not tucked away in page_add_anon_rmap.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
How is anon_rss initialized? In dup_mmap, and by mm_alloc's memset; but
that's not so good if an mm_counter_t is a special type. And how is rss
initialized? By set_mm_counter, all over the place. Come on, we just need to
initialize them both at once by set_mm_counter in mm_init (which follows the
memcpy when forking).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The NUMA policy code predated nodemask_t so it used open coded bitmaps.
Convert everything to nodemask_t. Big patch, but shouldn't have any actual
behaviour changes (except I removed one unnecessary check against
node_online_map and one unnecessary BUG_ON)
Signed-off-by: "Andi Kleen" <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The xtTruncate code was only doing this for leaf pages. When a file is
horribly fragmented, we may truncate a file leaving an internal page with
an invalid head.next field, which may cause a stale page to be referenced.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
The previous patch adding the ability to nest struct class_device
changed the paramaters to the call class_device_create(). This patch
fixes up all in-kernel users of the function.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A "coldplug + udevstart" can be simple like this:
for i in /sys/block/*/*/uevent; do echo 1 > $i; done
for i in /sys/class/*/*/uevent; do echo 1 > $i; done
for i in /sys/bus/*/devices/*/uevent; do echo 1 > $i; done
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- ->releasepage() annotated (s/int/gfp_t), instances updated
- missing gfp_t in fs/* added
- fixed misannotation from the original sweep caught by bitwise checks:
XFS used __nocast both for gfp_t and for flags used by XFS allocator.
The latter left with unsigned int __nocast; we might want to add a
different type for those but for now let's leave them alone. That,
BTW, is a case when __nocast use had been actively confusing - it had
been used in the same code for two different and similar types, with
no way to catch misuses. Switch of gfp_t to bitwise had caught that
immediately...
One tricky bit is left alone to be dealt with later - mapping->flags is
a mix of gfp_t and error indications. Left alone for now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Beginning of gfp_t annotations:
- -Wbitwise added to CHECKFLAGS
- old __bitwise renamed to __bitwise__
- __bitwise defined to either __bitwise__ or nothing, depending on
__CHECK_ENDIAN__ being defined
- gfp_t switched from __nocast to __bitwise__
- force cast to gfp_t added to __GFP_... constants
- new helper - gfp_zone(); extracts zone bits out of gfp_t value and casts
the result to int
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
struct gendisk has these two fields: stamp, stamp_idle. Update to
stamp_idle is always in sync with stamp and they are always the same.
Therefore, it does not add any value in having two fields tracking
same timestamp. Suggest to remove it.
Also, we should only update gendisk stats with non-zero value.
Advantage is that we don't have to needlessly calculate memory address,
and then add zero to the content.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Optimise attribute revalidation when hardlinking. Add post-op attributes
for the directory and the original inode.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
"Optional" means that the close call will not fail if the getattr
at the end of the compound fails.
If it does succeed, try to refresh inode attributes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since the directory attributes change every time we CREATE a file,
we might as well pick up the new directory attributes in the same
compound.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs_lookup() used to consult a lookup cache before trying an actual wire
lookup operation. The lookup cache would be invalid, of course, if the
parent directory's mtime had changed, so nfs_lookup performed an inode
revalidation on the parent.
Since nfs_lookup() doesn't use a cache anymore, the revalidation is no
longer necessary. There are cases where it will generate a lot of
unnecessary GETATTR traffic.
See http://bugzilla.linux-nfs.org/show_bug.cgi?id=9
Test-plan:
Use lndir and "rm -rf" and watch for excess GETATTR traffic or application
level errors.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since we almost always call nfs_end_data_update() after we called
nfs_refresh_inode(), we now end up marking the inode metadata
as needing revalidation immediately after having updated it.
This patch rearranges things so that we mark the inode as needing
revalidation _before_ we call nfs_refresh_inode() on those operations
that need it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Allow nfs_refresh_inode() also to update attributes on the inode if the
RPC call was sent after the last call to nfs_update_inode().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
unaligned structures coming in off the wire
gcc on arm processors generates very odd code with pragma pack specified -
although it does pack the structures in some sense - it does not allow you
to access unaligned elements in nested structures at the right offset as other
architectures do. Oddly enough though, specifying the structures as packed
the long way - one by one with the packed attribute does work. Rather than
fighting over whether this is a gcc bug or some obscure side effect
of pragma pack, it is easier to do what most (all but 96 other places in
the kernel) do - and replace pragma pack with dozens of attribute(packed)
structure qualifiers. Much more verbose ... but at least it works.
Signed-off-by: David Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com> CG: -----------------------------------------------------------------------
fsck_hfs reveals lots of temporary files accumulating in the hidden
directory "\000\000\000HFS+ Private Data". According to the HFS+
documentation these are files which are unlinked while in use. However,
there may be a bug in the Linux hfsplus implementation which causes this to
happen even when the files are not in use. It looks like the "opencnt"
field is never initialized as (I think) it should be in hfsplus_read_inode.
This means that a file can appear to be still in use when in fact it has
been closed. This patch seems to fix it for me.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a bug which was reported and diagnosed by
Stefan Jones <stefan.jones@churchillrandoms.co.uk>
IDR trees include a cache of idr_layer objects. There's no way to destroy
this cache, so when we discard an overall idr tree we end up leaking some
memory.
Add and use idr_destroy() for this. v9fs and infiniband also need to use
idr_destroy() to avoid leaks.
Or, we make the cache global, like radix_tree_preload(). Which is probably
better. Later.
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Another case of missing call to security_file_permission: aio functions
(namely, io_submit) does not check credentials with security modules.
Below is the simple patch to the problem. It seems that it is enough to
check for rights at the request submission time.
Signed-off-by: Kostik Belousov <kostikbel@gmail.com>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
need to get in ahead of it that depend on that file handle. Fixes
occassional bad file handle errors on write with heavy use multiple process
cases.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Many thanks to Alberto Patino for testing and reporting the data
corruption. And many apologies for corrupting his partition.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
resp_len is passed in as buffer size to decode routine; make sure it's
set right in case where userspace provides less than a page's worth of
buffer.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Stop handing garbage to userspace in the case where a weird server clears the
acl bit in the getattr return (despite the fact that they've already claimed
acl support.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Storing a pointer to the struct rpc_task in the nfs_seqid is broken
since the nfs_seqid may be freed well after the task has been destroyed.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If someone tries to rename a directory onto an empty directory, we
currently fail and return EBUSY.
This patch ensures that we try the rename if both source and target
are directories, and that we fail with a correct error of EISDIR if
the source is not a directory.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We currently fail Connectathon test 6.10 in the case of 32-bit locks due
to incorrect error checking.
Also add support for l->l_len < 0 to 64-bit locks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the server is in the unconfirmed OPEN state for a given open owner
and receives a second OPEN for the same open owner, it will cancel the
state of the first request and set up an OPEN_CONFIRM for the second.
This can cause a race that is discussed in rfc3530 on page 181.
The following patch allows the client to recover by retrying the
original open request.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This will allow nfs_permission() to perform additional optimizations when
walking the path, by folding the ACCESS(MAY_EXEC) call on the directory
into the lookup revalidation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make NFSv4 return the fully initialized file pointer with the
stateid that it created in the lookup w/intent.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This is needed by NFSv4 for atomicity reasons: our open command is in
fact a lookup+open, so we need to be able to propagate open context
information from lookup() into the resulting struct file's
private_data field.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We no longer need to worry about collisions between close() and the state
recovery code, since the new close will automatically recheck the
file state once it is done waiting on its sequence slot.
Ditto for the nfs4_proc_locku() procedure.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Once the state_owner and lock_owner semaphores get removed, it will be
possible for other OPEN requests to reopen the same file if they have
lower sequence ids than our CLOSE call.
This patch ensures that we recheck the file state once
nfs_wait_on_sequence() has completed waiting.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv4 file state-changing functions such as OPEN, CLOSE, LOCK,... are all
labelled with "sequence identifiers" in order to prevent the server from
reordering RPC requests, as this could cause its file state to
become out of sync with the client.
Currently the NFS client code enforces this ordering locally using
semaphores to restrict access to structures until the RPC call is done.
This, of course, only works with synchronous RPC calls, since the
user process must first grab the semaphore.
By dropping semaphores, and instead teaching the RPC engine to hold
the RPC calls until they are ready to be sent, we can extend this
process to work nicely with asynchronous RPC calls too.
This patch adds a new list called "rpc_sequence" that defines the order
of the RPC calls to be sent. We add one such list for each state_owner.
When an RPC call is ready to be sent, it checks if it is top of the
rpc_sequence list. If so, it proceeds. If not, it goes back to sleep,
and loops until it hits top of the list.
Once the RPC call has completed, it can then bump the sequence id counter,
and remove itself from the rpc_sequence list, and then wake up the next
sleeper.
Note that the state_owner sequence ids and lock_owner sequence ids are
all indexed to the same rpc_sequence list, so OPEN, LOCK,... requests
are all ordered w.r.t. each other.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
lock_kiocb() was introduced to serialize retrying and cancellation. In the
process of doing so it tried to sleep waiting for KIF_LOCKED while holding
the ctx_lock spinlock. Recent fixes have ensured that multiple concurrent
retries won't be attempted for a given iocb. Cancel has other problems and
has no significant in-tree users that have been complaining about it. So
for the immediate future we'll revert sleeping with the lock held and will
address proper cancellation and retry serialization in the future.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently you do not get all the map entries on nommu systems because the
start function doesn't index into the list using the value of "pos".
Signed-off-by: David McCullough <davidm@snapgear.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oopsable since nfs_wait_on_inode() can get called as part of iput_final().
Unnecessary since the caller had better be damned sure that the inode won't
disappear from underneath it anyway.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the data cache has been marked as potentially invalid by nfs_refresh_inode,
we should invalidate it rather than assume that changes are due to our own
activity.
Also ensure that we always start with a valid cache before declaring it
to be protected by a delegation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
"proc_smaps_operations" is not defined in case of "CONFIG_MMU=n".
Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
New cifs_writepages routine was not updated bytes written in cifs stats.
Also added ability to clear /proc/fs/cifs/Stats by writing (0 or 1) to it.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Nir Tzachar <tzachar@cs.bgu.ac.il> points out that if an ELF file specifies a
zero-length bss at a whacky address, we cannot load that binary because
padzero() tries to zero out the end of the page at the whacky address, and
that may not be writeable.
See also http://bugzilla.kernel.org/show_bug.cgi?id=5411
So teach load_elf_binary() to skip the bss settng altogether if the elf file
has a zero-length bss segment.
Cc: Roland McGrath <roland@redhat.com>
Cc: Daniel Jacobowitz <dan@debian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is a compatibility fix between Linux and Solaris when used with VxFS
filesystems: Solaris usually accepts acl entries in any order, but with
VxFS it replies with NFSERR_INVAL when it sees a four-entry acl that is not
in canonical form. It may also fail with other non-canonical acls -- I
can't tell, because that case never triggers: We only send non-canonical
acls when we fake up an ACL_MASK entry.
Instead of adding fake ACL_MASK entries at the end, inserting them in the
correct position makes Solaris+VxFS happy. The Linux client and server
sides don't care about entry order. The three-entry-acl special case in
which we need a fake ACL_MASK entry was handled in xdr_nfsace_encode. The
patch moves this into nfsacl_encode.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
v9fs_file_read and v9fs_file_write use kmalloc to allocate buffers as big
as the data buffer received as parameter. kmalloc cannot be used to
allocate buffers bigger than 128K, so reading/writing data in chunks bigger
than 128k fails.
This patch reorganizes v9fs_file_read and v9fs_file_write to allocate only
buffers as big as the maximum data that can be sent in one 9P message.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
file operations ->write(), ->aio_write(), and ->writev() for regular
files. This replaces the old use of generic_file_write(), et al and
the address space operations ->prepare_write and ->commit_write.
This means that both sparse and non-sparse (unencrypted and
uncompressed) files can now be extended using the normal write(2)
code path. There are two limitations at present and these are that
we never create sparse files and that we only have limited support
for highly fragmented files, i.e. ones whose data attribute is split
across multiple extents. When such a case is encountered,
EOPNOTSUPP is returned.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
and cond_resched() in the main loop as we could be dirtying a lot of
pages and this ensures we play nice with the VM and the system as a
whole.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>