commit 089d7720383d7bc9ca6b8824a05dfa66f80d1f41
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Wed Sep 20 08:20:15 2017 +0200

    Linux 4.9.51

commit 7829684088a216b8b53894768cd4f483c246cb94
Author: Steffen Klassert <steffen.klassert@secunet.com>
Date:   Fri Aug 25 09:05:42 2017 +0200

    ipv6: Fix may be used uninitialized warning in rt6_check
    
    commit 3614364527daa870264f6dde77f02853cdecd02c upstream.
    
    rt_cookie might be used uninitialized; fix this by
    initializing it.
    
    Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
    Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
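
A minimal sketch of the warning pattern and the fix, in plain C rather than kernel code. It assumes a hypothetical helper get_cookie_safe() that writes its out-parameter only when the route still has a node. The struct and function names are illustrative and are not the actual ipv6 routing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the fib6 node and route structures. */
struct node  { uint32_t sernum; };
struct route { const struct node *node; };

/* Writes *cookie only when the route still has a node. */
static bool get_cookie_safe(const struct route *rt, uint32_t *cookie)
{
	if (rt->node == NULL)
		return false;
	*cookie = rt->node->sernum;
	return true;
}

/* Returns true if the cached route is still valid for the given cookie. */
static bool route_check(const struct route *rt, uint32_t cookie)
{
	/*
	 * Initialized so every path assigns it; the compiler cannot always
	 * prove that the short-circuit below protects the read.
	 */
	uint32_t rt_cookie = 0;

	if (!get_cookie_safe(rt, &rt_cookie) || rt_cookie != cookie)
		return false;
	return true;
}
```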

commit ae04a8c4c6fc5b4aabfb166588045e2845b4d4e7
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Thu Aug 31 15:11:06 2017 -0700

    xfs: fix compiler warnings
    
    commit 7bf7a193a90cadccaad21c5970435c665c40fe27 upstream.
    
    Fix up all the compiler warnings that have crept in.
    
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7b5fcb7fc05bdbce87e5bec9e358b059317ffb5f
Author: Song Liu <songliubraving@fb.com>
Date:   Thu Aug 24 09:53:59 2017 -0700

    md/raid5: release/flush io in raid5_do_work()
    
    commit 9c72a18e46ebe0f09484cce8ebf847abdab58498 upstream.
    
    In raid5, there are scenarios where some IOs are deferred to a later
    time, and some IOs need a flush to complete. To make sure we make
    progress with these IOs, we need to call the following functions:
    
        flush_deferred_bios(conf);
        r5l_flush_stripe_to_raid(conf->log);
    
    Both of these functions are called in raid5d(), but are missing from
    raid5_do_work(). As a result, these functions are not called
    when multi-threading (group_thread_cnt > 0) is enabled. This patch
    adds calls to these functions to raid5_do_work().
    
    Note for stable branches:
    
      r5l_flush_stripe_to_raid(conf->log) is needed for 4.4+
      flush_deferred_bios(conf) is only needed for 4.11+
    
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Shaohua Li <shli@fb.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 81cb6f1a2a1964ed4d93604d1a3d49d92db2a01b
Author: Pan Bian <bianpan2016@163.com>
Date:   Sun Sep 17 14:07:12 2017 -0700

    xfs: use kmem_free to free return value of kmem_zalloc
    
    commit 6c370590cfe0c36bcd62d548148aa65c984540b7 upstream.
    
    In function xfs_test_remount_options(), kfree() is used to free memory
    allocated by kmem_zalloc(). But it is better to use kmem_free().
    
    Signed-off-by: Pan Bian <bianpan2016@163.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 772003c6a4282211487c9d33958594d7f2be7dd2
Author: Christoph Hellwig <hch@lst.de>
Date:   Sun Sep 17 14:07:11 2017 -0700

    xfs: open code end_buffer_async_write in xfs_finish_page_writeback
    
    commit 8353a814f2518dcfa79a5bb77afd0e7dfa391bb1 upstream.
    
    Our loop in xfs_finish_page_writeback iterates over all buffer heads
    in a page and calls end_buffer_async_write for each, which in turn
    iterates over all buffers in the page to check whether any I/O is in
    flight. This is not only inefficient but also potentially dangerous,
    as end_buffer_async_write can cause the page and all buffers to be freed.
    
    Replace it with a single loop that does the work of end_buffer_async_write
    on a per-page basis.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
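
A simplified, single-threaded model of the single-pass completion described above. The bh and page structs and finish_page_writeback() are hypothetical stand-ins for struct buffer_head, struct page and the XFS function. Locking, buffer unlocking and page error marking are omitted. The point is that the page is only touched once, after the loop, instead of from inside a per-buffer callback that may free it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUFS_PER_PAGE 4

/* Hypothetical stand-ins for struct buffer_head and struct page. */
struct bh   { size_t off, len; bool async_write; bool uptodate; };
struct page { struct bh bufs[BUFS_PER_PAGE]; bool writeback; };

/*
 * Complete I/O for the buffers starting inside [off, off + len) in one pass
 * and end page writeback only if no other buffer still has async I/O in
 * flight. Returns true if page writeback was ended.
 */
static bool finish_page_writeback(struct page *pg, size_t off, size_t len,
				  bool error)
{
	size_t end = off + len;
	int busy = 0;

	for (int i = 0; i < BUFS_PER_PAGE; i++) {
		struct bh *bh = &pg->bufs[i];

		if (bh->off >= off && bh->off < end) {
			/* this buffer's I/O is completing now */
			bh->uptodate = !error;
			bh->async_write = false;
		} else if (bh->async_write) {
			busy++;	/* another ioend still owns this buffer */
		}
	}
	if (busy)
		return false;
	pg->writeback = false;	/* page touched once, after the loop */
	return true;
}
```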

commit bb69e8a228a74c9aa7b70f6624e5c4fa1af70533
Author: Christoph Hellwig <hch@lst.de>
Date:   Sun Sep 17 14:07:10 2017 -0700

    xfs: don't set v3 xflags for v2 inodes
    
    commit dd60687ee541ca3f6df8758f38e6f22f57c42a37 upstream.
    
    Reject attempts to set XFLAGS that correspond to di_flags2 inode flags
    if the inode isn't a v3 inode, because di_flags2 only exists on v3.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
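
A minimal sketch of the check, assuming a simplified in-core inode with a version field and a flags2 word. The XF_* bit values and set_xflags2() are illustrative assumptions, not the real FS_XFLAG_* constants or the XFS ioctl path.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative bits that would be stored in di_flags2 (values assumed). */
#define XF_DAX         0x00008000u
#define XF_COWEXTSIZE  0x00010000u
#define XF_FLAGS2_MASK (XF_DAX | XF_COWEXTSIZE)

struct inode_core { int version; uint64_t flags2; };

/* Reject xflags that map to di_flags2 unless the inode is v3. */
static int set_xflags2(struct inode_core *ic, uint32_t xflags)
{
	uint64_t flags2 = xflags & XF_FLAGS2_MASK;

	if (flags2 && ic->version < 3)
		return -EINVAL;	/* v2 inodes have no di_flags2 on disk */
	if (ic->version >= 3)
		ic->flags2 = flags2;
	return 0;
}
```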

commit f46a61f686b0a8042ee4b7cb108ece81e3fb9401
Author: Amir Goldstein <amir73il@gmail.com>
Date:   Sun Sep 17 14:07:09 2017 -0700

    xfs: fix incorrect log_flushed on fsync
    
    commit 47c7d0b19502583120c3f396c7559e7a77288a68 upstream.
    
    When calling into _xfs_log_force{,_lsn}() with a pointer
    to log_flushed variable, log_flushed will be set to 1 if:
    1. xlog_sync() is called to flush the active log buffer
    AND/OR
    2. xlog_wait() is called to wait on a syncing log buffer
    
    xfs_file_fsync() checks the value of log_flushed after
    _xfs_log_force_lsn() call to optimize away an explicit
    PREFLUSH request to the data block device after writing
    out all the file's pages to disk.
    
    This optimization is incorrect in the following sequence of events:
    
     Task A                    Task B
     -------------------------------------------------------
     xfs_file_fsync()
       _xfs_log_force_lsn()
         xlog_sync()
            [submit PREFLUSH]
                               xfs_file_fsync()
                                 file_write_and_wait_range()
                                   [submit WRITE X]
                                   [endio  WRITE X]
                                 _xfs_log_force_lsn()
                                   xlog_wait()
            [endio  PREFLUSH]
    
    Write X is not guaranteed to be on persistent storage
    when the PREFLUSH request is completed, because write X was submitted
    after the PREFLUSH request, but xfs_file_fsync() of task B will
    be notified of log_flushed=1 and will skip the explicit flush.
    
    If the system crashes after fsync of task B, write X may not be
    present on disk after reboot.
    
    This bug was discovered and demonstrated using Josef Bacik's
    dm-log-writes target, which can be used to record block io operations
    and then replay a subset of these operations onto the target device.
    The test goes something like this:
    - Use fsx to execute operations on a file and record them on the log device
    - Every now and then fsync the file, store md5 of file and mark
      the location in the log
    - Then replay log onto device for each mark, mount fs and compare
      md5 of file to stored value
    
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Josef Bacik <jbacik@fb.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
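
A minimal ordering model of the hazard, assuming every event is stamped with a global sequence number (a simplification; the kernel keeps no such counter, and both helper names are hypothetical). It shows why waiting on another task's in-flight PREFLUSH is not enough: the flush only covers writes that completed before it was submitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A PREFLUSH only covers writes that completed before it was submitted. */
static bool flush_covers_write(uint64_t flush_submit_seq,
			       uint64_t write_complete_seq)
{
	return write_complete_seq < flush_submit_seq;
}

/*
 * fsync may skip its own data-device flush only if a log flush happened
 * AND that flush was submitted after the file's data writes completed.
 */
static bool fsync_may_skip_flush(bool log_flushed, uint64_t flush_submit_seq,
				 uint64_t last_data_write_seq)
{
	return log_flushed &&
	       flush_covers_write(flush_submit_seq, last_data_write_seq);
}
```

With task A's PREFLUSH submitted at sequence 1 and write X completing at sequence 3, task B sees log_flushed=1 after xlog_wait(), yet fsync_may_skip_flush(true, 1, 3) is false.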

commit 0e8d7e364ec546c44762664d30f4b1f6fd912197
Author: Christoph Hellwig <hch@lst.de>
Date:   Sun Sep 17 14:07:08 2017 -0700

    xfs: disable per-inode DAX flag
    
    commit 742d84290739ae908f1b61b7d17ea382c8c0073a upstream.
    
    Currently, toggling the per-inode DAX flag can easily crash the
    kernel.  Disable the per-inode DAX flag until that is sorted out.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a46cf59265cf5282be0a488abc913e94db924e87
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:07:07 2017 -0700

    xfs: relog dirty buffers during swapext bmbt owner change
    
    commit 2dd3d709fc4338681a3aa61658122fa8faa5a437 upstream.
    
    The owner change bmbt scan that occurs during extent swap operations
    does not handle ordered buffer failures. Buffers that cannot be
    marked ordered must be physically logged so previously dirty ranges
    of the buffer can be relogged in the transaction.
    
    Since the bmbt scan may need to process and potentially log a large
    number of blocks, we can't expect to complete this operation in a
    single transaction. Update extent swap to use a permanent
    transaction with enough log reservation to physically log a buffer.
    Update the bmbt scan to physically log any buffers that cannot be
    ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
    caller rolls the transaction and restarts the scan. Finally, update
    the bmbt scan helper function to skip bmbt blocks that already match
    the expected owner so they are not reprocessed after scan restarts.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    [darrick: fix the xfs_trans_roll call]
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
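
A simplified model of the restartable owner-change scan described above. The blk struct, change_owner_scan() and swap_owner() are hypothetical; "logging" a buffer is just a flag, and the loop in swap_owner() stands in for rolling the transaction with xfs_trans_roll(). Skipping blocks that already match the new owner is what guarantees forward progress across restarts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a bmbt block during an owner-change scan. */
struct blk { uint64_t owner; bool dirty; bool logged; };

/*
 * Set each block's owner to new_owner. Blocks that already match are
 * skipped so a restarted scan does not reprocess them. A clean block can
 * be ordered (no log space needed); a dirty block must be physically
 * logged, which uses this transaction's reservation, so stop with -EAGAIN.
 */
static int change_owner_scan(struct blk *b, int n, uint64_t new_owner)
{
	for (int i = 0; i < n; i++) {
		if (b[i].owner == new_owner)
			continue;
		b[i].owner = new_owner;
		if (b[i].dirty) {
			b[i].logged = true;	/* physically log the buffer */
			return -EAGAIN;		/* caller rolls and restarts */
		}
		/* clean: treated as an ordered buffer */
	}
	return 0;
}

/* Roll and restart until the scan completes; returns the number of rolls. */
static int swap_owner(struct blk *b, int n, uint64_t new_owner)
{
	int rolls = 0;
	int error;

	while ((error = change_owner_scan(b, n, new_owner)) == -EAGAIN)
		rolls++;	/* stands in for rolling the transaction */
	assert(error == 0);
	return rolls;
}
```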

commit e2bb92633615ad801c4ab56fdb3eba3c701b2a3c
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:07:06 2017 -0700

    xfs: disallow marking previously dirty buffers as ordered
    
    commit a5814bceea48ee1c57c4db2bd54b0c0246daf54a upstream.
    
    Ordered buffers are used in situations where the buffer is not
    physically logged but must pass through the transaction/logging
    pipeline for a particular transaction. As a result, ordered buffers
    are not unpinned and written back until the transaction commits to
    the log. Ordered buffers have a strict requirement that the target
    buffer must not be currently dirty and resident in the log pipeline
    at the time it is marked ordered. If a dirty+ordered buffer is
    committed, the buffer is reinserted into the AIL but not physically
    relogged at the LSN of the associated checkpoint. The buffer log
    item is assigned the LSN of the latest checkpoint and the AIL
    effectively releases the previously logged buffer content from the
    active log before the buffer has been written back. If the tail
    pushes forward and a filesystem crash occurs while in this state, an
    inconsistent filesystem could result.
    
    It is currently the caller's responsibility to ensure an ordered
    buffer is not already dirty from a previous modification. This is
    unclear and error-prone when not used in situations where it is
    guaranteed a buffer has not been previously modified (such as new
    metadata allocations).
    
    To facilitate general purpose use of ordered buffers, update
    xfs_trans_ordered_buf() to conditionally order the buffer based on
    the state of the log item and return the status of the result. If the
    bli is dirty, do not order the buffer and return false. The caller
    must either physically log the buffer (having acquired the
    appropriate log reservation) or push it from the AIL to clean it
    before it can be marked ordered in the current transaction.
    
    Note that ordered buffers are currently only used in two situations:
    1.) inode chunk allocation where previously logged buffers are not
    possible and 2.) extent swap which will be updated to handle ordered
    buffer failures in a separate patch.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
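
A minimal model of the new contract, assuming a buffer log item reduced to a flags word plus a bitmap of logged segments; "dirty" here means the item already carries logged ranges. The struct, flag values and trans_ordered_buf() are hypothetical simplifications of xfs_trans_ordered_buf().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for the model. */
#define BLI_DIRTY   0x1u
#define BLI_ORDERED 0x2u

struct bli {
	unsigned flags;
	unsigned dirty_map;	/* one bit per logged segment */
};

/*
 * Mark the buffer ordered only if it carries no previously logged content.
 * On false, the caller must physically log the buffer (with a suitable
 * reservation) or push it from the AIL before ordering it.
 */
static bool trans_ordered_buf(struct bli *bip)
{
	if (bip->dirty_map != 0)
		return false;
	/* ordered buffers are dirty in the transaction but log no ranges */
	bip->flags |= BLI_ORDERED | BLI_DIRTY;
	return true;
}
```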

commit a51e3e2cf3cbb306faa16784fd4f1791ee304816
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:07:05 2017 -0700

    xfs: move bmbt owner change to last step of extent swap
    
    commit 6fb10d6d22094bc4062f92b9ccbcee2f54033d04 upstream.
    
    The extent swap operation currently resets bmbt block owners before
    the inode forks are swapped. The bmbt buffers are marked as ordered
    so they do not have to be physically logged in the transaction.
    
    This use of ordered buffers is not safe as bmbt buffers may have
    been previously physically logged. The bmbt owner change algorithm
    needs to be updated to physically log buffers that are already dirty
    when/if they are encountered. This means that an extent swap will
    eventually require multiple rolling transactions to handle large
    btrees. In addition, all inode related changes must be logged before
    the bmbt owner change scan begins and can roll the transaction for
    the first time to preserve fs consistency via log recovery.
    
    In preparation for such fixes to the bmbt owner change algorithm,
    refactor the bmbt scan out of the extent fork swap code to the last
    operation before the transaction is committed. Update
    xfs_swap_extent_forks() to only set the inode log flags when an
    owner change scan is necessary. Update xfs_swap_extents() to trigger
    the owner change based on the inode log flags. Note that since the
    owner change now occurs after the extent fork swap, the inode btrees
    must be fixed up with the inode number of the current inode (similar
    to log recovery).
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f9e583edf1a71b7b40d5c5c492319a07ebe82d71
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:07:04 2017 -0700

    xfs: skip bmbt block ino validation during owner change
    
    commit 99c794c639a65cc7b74f30a674048fd100fe9ac8 upstream.
    
    Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
    owners on v5 (!rmapbt) filesystems. The bmbt scan uses
    xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
    current owner of the block against the parent inode of the bmbt.
    This works during extent swap because the bmbt owners are updated to
    the opposite inode number before the inode extent forks are swapped.
    
    The modified bmbt blocks are marked as ordered buffers which allows
    everything to commit in a single transaction. If the transaction
    commits to the log and the system crashes such that recovery of the
    extent swap is required, log recovery restarts the bmbt scan to fix
    up any bmbt blocks that may not have been written back before the
    crash. The log recovery bmbt scan occurs after the inode forks have
    been swapped, however. This causes the bmbt block owner verification
    to fail, which leads to log recovery failure and requires xfs_repair
    to zap the log to recover.
    
    Define a new invalid inode owner flag to inform the btree block
    lookup mechanism that the current inode may be invalid with respect
    to the current owner of the bmbt block. Set this flag on the cursor
    used for change owner scans to allow this operation to work at
    runtime and during log recovery.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Fixes: bb3be7e7c ("xfs: check for bogus values in btree block headers")
    Cc: stable@vger.kernel.org
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
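
A minimal sketch of an owner check that a cursor flag can bypass, assuming a cursor that carries the inode number and a flags word. CUR_IGNORE_OWNER and block_owner_ok() are hypothetical names chosen for illustration, not the flag this patch actually defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CUR_IGNORE_OWNER 0x1u	/* hypothetical cursor flag */

struct cursor     { uint64_t ino; unsigned flags; };
struct bmbt_block { uint64_t owner; };

/* Verify a bmbt block's owner unless the cursor says it may be stale. */
static bool block_owner_ok(const struct cursor *cur,
			   const struct bmbt_block *blk)
{
	if (cur->flags & CUR_IGNORE_OWNER)
		return true;	/* owner-change scan: may name the other inode */
	return blk->owner == cur->ino;
}
```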

commit fe211e1744db41fb23b0a85f7cda87de8fab5ea2
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:07:03 2017 -0700

    xfs: don't log dirty ranges for ordered buffers
    
    commit 8dc518dfa7dbd079581269e51074b3c55a65a880 upstream.
    
    Ordered buffers are attached to transactions and pushed through the
    logging infrastructure just like normal buffers with the exception
    that they are not actually written to the log. Therefore, we don't
    need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is
    called on ordered buffers to set up all of the dirty state on the
    transaction, buffer and log item and prepare the buffer for I/O.
    
    Now that xfs_trans_dirty_buf() is available, call it from
    xfs_trans_ordered_buf() so the latter is now mutually exclusive with
    xfs_trans_log_buf(). This reflects the implementation of ordered
    buffers and helps eliminate confusion over the need to log ranges of
    ordered buffers just to set up internal log state.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 19a87a9407654b6e46fff9f325cac0a11dec75f7
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:07:02 2017 -0700

    xfs: refactor buffer logging into buffer dirtying helper
    
    commit 9684010d38eccda733b61106765e9357cf436f65 upstream.
    
    xfs_trans_log_buf() is responsible for logging the dirty segments of
    a buffer along with setting all of the necessary state on the
    transaction, buffer, bli, etc., to ensure that the associated items
    are marked as dirty and prepared for I/O. We have a couple of use cases
    that need to dirty a buffer in a transaction without actually
    logging dirty ranges of the buffer.  One existing use case is
    ordered buffers, which are currently logged with arbitrary ranges to
    accomplish this even though the content of ordered buffers is never
    written to the log. Another pending use case is to relog an already
    dirty buffer across rolled transactions within the deferred
    operations infrastructure. This is required to prevent a held
    (XFS_BLI_HOLD) buffer from pinning the tail of the log.
    
    Refactor xfs_trans_log_buf() into a new function that contains all
    of the logic responsible for dirtying the transaction, lidp, buffer and
    bli. This new function can be used in the future for the use cases
    outlined above. This patch does not introduce functional changes.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
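
A simplified sketch of the split, assuming state reduced to a dirty flag on the transaction and log item plus a bitmap of logged chunks (real XFS logs byte ranges mapped onto fixed-size chunks). trans_dirty_buf() and trans_log_buf() are hypothetical models of xfs_trans_dirty_buf() and xfs_trans_log_buf().

```c
#include <assert.h>
#include <stdbool.h>

struct trans { bool dirty; };
struct bli   { bool dirty; unsigned logged_map; };	/* bit per chunk */

/* Everything except range logging: mark transaction and log item dirty. */
static void trans_dirty_buf(struct trans *tp, struct bli *bip)
{
	tp->dirty = true;
	bip->dirty = true;
}

/* Record chunks [first, last] as logged, then dirty the buffer. */
static void trans_log_buf(struct trans *tp, struct bli *bip,
			  unsigned first, unsigned last)
{
	assert(first <= last && last < 32);
	for (unsigned i = first; i <= last; i++)
		bip->logged_map |= 1u << i;
	trans_dirty_buf(tp, bip);
}
```

An ordered buffer would call only trans_dirty_buf(), so it is dirty in the transaction without any logged chunks.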

commit 93b64516019249fa196cc3cf4c9040270cf4106f
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:07:01 2017 -0700

    xfs: ordered buffer log items are never formatted
    
    commit e9385cc6fb7edf23702de33a2dc82965d92d9392 upstream.
    
    Ordered buffers pass through the logging infrastructure without ever
    being written to the log. The way this works is that the ordered
    buffer status is transferred to the log vector at commit time via
    the ->iop_size() callback. In xlog_cil_insert_format_items(),
    ordered log vectors bypass ->iop_format() processing altogether.
    
    Therefore it is unnecessary for xfs_buf_item_format() to handle
    ordered buffers. Remove the unnecessary logic and assert that an
    ordered buffer never reaches this point.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ba986b3c84987bbc5e52d8ab83a851e613ce4001
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:07:00 2017 -0700

    xfs: remove unnecessary dirty bli format check for ordered bufs
    
    commit 6453c65d3576bc3e602abb5add15f112755c08ca upstream.
    
    xfs_buf_item_unlock() historically checked the dirty state of the
    buffer by manually checking the buffer log formats for dirty
    segments. The introduction of ordered buffers invalidated this check
    because ordered buffers have dirty bli's but no dirty (logged)
    segments. The check was updated to accommodate ordered buffers by
    looking at the bli state first and considering the blf only if the
    bli is clean.
    
    This logic is safe but unnecessary. There is no valid case where the
    bli is clean yet the blf has dirty segments. The bli is set dirty
    whenever the blf is logged (via xfs_trans_log_buf()) and the blf is
    cleared in the only place BLI_DIRTY is cleared (xfs_trans_binval()).
    
    Remove the conditional blf dirty checks and replace with an assert
    that should catch any discrepancies between bli and blf dirty
    states. Refactor the old blf dirty check into a helper function to
    be used by the assert.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
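
A minimal sketch of the helper-plus-assert shape, assuming a buffer log format reduced to a small array of bitmap words. blf_dirty() and bli_is_dirty() are hypothetical models; the invariant asserted is the one stated above: a clean bli never has logged segments.

```c
#include <assert.h>
#include <stdbool.h>

#define MAP_WORDS 4

struct blf { unsigned data_map[MAP_WORDS]; };	/* bit per logged chunk */
struct bli { bool dirty; struct blf format; };

/* Helper: does the buffer log format record any dirty (logged) segments? */
static bool blf_dirty(const struct blf *f)
{
	for (int i = 0; i < MAP_WORDS; i++)
		if (f->data_map[i])
			return true;
	return false;
}

/* Trust the bli state; assert that it agrees with the blf. */
static bool bli_is_dirty(const struct bli *bip)
{
	assert(bip->dirty || !blf_dirty(&bip->format));
	return bip->dirty;
}
```

An ordered buffer is the case the old check had to special-case: its bli is dirty while its blf has no logged segments, and the assert still holds.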

commit 0f5af7eae8846fd73d01ecbe0d60309560084a74
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:06:59 2017 -0700

    xfs: open-code xfs_buf_item_dirty()
    
    commit a4f6cf6b2b6b60ec2a05a33a32e65caa4149aa2b upstream.
    
    It checks a single flag and has one caller. It probably isn't worth
    its own function.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 81286ade81f73e895fe2edf89f3e8054a595ebe5
Author: Omar Sandoval <osandov@fb.com>
Date:   Sun Sep 17 14:06:58 2017 -0700

    xfs: check for race with xfs_reclaim_inode() in xfs_ifree_cluster()
    
    commit f2e9ad212def50bcf4c098c6288779dd97fff0f0 upstream.
    
    After xfs_ifree_cluster() finds an inode in the radix tree and verifies
    that the inode number is what it expected, xfs_reclaim_inode() can swoop
    in and free it. xfs_ifree_cluster() will then happily continue working
    on the freed inode. Most importantly, it will mark the inode stale,
    which will probably be overwritten when the inode slab object is
    reallocated, but if it has already been reallocated then we can end up
    with an inode spuriously marked stale.
    
    In 8a17d7ddedb4 ("xfs: mark reclaimed inodes invalid earlier") we added
    a second check to xfs_iflush_cluster() to detect this race, but the
    similar RCU lookup in xfs_ifree_cluster() needs the same treatment.
    
    Signed-off-by: Omar Sandoval <osandov@fb.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
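
A user-space sketch of the recheck-under-lock pattern, assuming reclaim invalidates the inode number under the inode's flags lock before freeing it. The struct and both functions are hypothetical. A pthread mutex stands in for the spinlock, and nothing is actually freed, since in the kernel RCU keeps the memory valid during the lookup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-core inode; reclaim clears ino under flags_lock. */
struct inode {
	pthread_mutex_t flags_lock;
	uint64_t ino;	/* 0 once reclaimed */
	bool stale;
};

/* Reclaim side: invalidate the inode number under the lock. */
static void reclaim_inode(struct inode *ip)
{
	pthread_mutex_lock(&ip->flags_lock);
	ip->ino = 0;
	pthread_mutex_unlock(&ip->flags_lock);
}

/*
 * Free-cluster side: the earlier lookup may have raced with reclaim, so
 * recheck the inode number under the lock before marking it stale.
 */
static bool mark_stale_if_ours(struct inode *ip, uint64_t inum)
{
	bool ours;

	pthread_mutex_lock(&ip->flags_lock);
	ours = (ip->ino == inum);
	if (ours)
		ip->stale = true;
	pthread_mutex_unlock(&ip->flags_lock);
	return ours;
}
```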

commit 63d184d2955bab0584acc10b502e415ce23394b1
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Sun Sep 17 14:06:57 2017 -0700

    xfs: evict all inodes involved with log redo item
    
    commit 799ea9e9c59949008770aab4e1da87f10e99dbe4 upstream.
    
    When we introduced the bmap redo log items, we set MS_ACTIVE on the
    mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
    from being truncated prematurely during log recovery.  This also had the
    effect of putting linked inodes on the lru instead of evicting them.
    
    Unfortunately, we neglected to find all those unreferenced lru inodes
    and evict them after finishing log recovery, which means that we leak
    them if anything goes wrong in the rest of xfs_mountfs, because the lru
    is only cleaned out on unmount.
    
    Therefore, evict unreferenced inodes in the lru list immediately
    after clearing MS_ACTIVE.
    
    Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Cc: viro@ZenIV.linux.org.uk
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 536932f39e93411c48a165c9c859e806c8989301
Author: Carlos Maiolino <cmaiolino@redhat.com>
Date:   Sun Sep 17 14:06:56 2017 -0700

    xfs: stop searching for free slots in an inode chunk when there are none
    
    commit 2d32311cf19bfb8c1d2b4601974ddd951f9cfd0b upstream.
    
    In a filesystem without finobt, the space manager selects an AG in which
    to allocate a new inode, and xfs_dialloc_ag_inobt() then searches that AG
    for an inode chunk with a free slot.
    
    When the new inode is in the same AG as its parent, the btree is searched
    starting at the parent's record, and the search is retried from the top
    if no slot is available beyond the parent's record.
    
    To exit this loop, however, xfs_dialloc_ag_inobt() relies on the btree
    having a free slot available, since its callers relied on
    agi->freecount when deciding how/where to allocate the new inode.
    
    When agi->freecount is corrupted and shows available inodes in an AG
    that in fact has none, this becomes an infinite loop.
    
    Add a way to stop the loop when a free slot is not found in the btree,
    making the function fall back to the whole AG scan, which will then be
    able to detect the corruption and shut the filesystem down.
    
    As pointed out by Brian, this might impact performance, given that we
    no longer reset the search distance when we reach the end of the
    tree, leaving fewer tries before falling back to the whole AG search,
    but it will only affect searches that start within 10 records of the
    end of the tree.
    
    Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
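
A simplified model of why a search budget that is not reset on restart guarantees termination. The inobt is reduced to an array of per-record free counts; near_search(), the alternating left/right walk and SEARCH_DISTANCE are illustrative assumptions, not the actual xfs_dialloc_ag_inobt() code. A return of -1 stands for falling back to the whole AG scan.

```c
#include <assert.h>

#define SEARCH_DISTANCE 10	/* illustrative search budget */

/*
 * Search outward from record `start` (0 <= start < nrecs) for a record with
 * free inodes, restarting from the parent's record when both ends of the
 * tree are reached. The budget is NOT reset on restart, so a corrupted
 * freecount cannot cause an endless loop. Returns a record index, or -1
 * to make the caller fall back to a full AG scan.
 */
static int near_search(const int *freecount, int nrecs, int start)
{
	int budget = SEARCH_DISTANCE;

	for (;;) {
		int left = start;
		int right = start + 1;

		while (left >= 0 || right < nrecs) {
			if (left >= 0) {
				if (budget-- == 0)
					return -1;
				if (freecount[left] > 0)
					return left;
				left--;
			}
			if (right < nrecs) {
				if (budget-- == 0)
					return -1;
				if (freecount[right] > 0)
					return right;
				right++;
			}
		}
		/* reached both ends: retry from the parent's record */
	}
}
```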

commit 6b6505d90b77f98b0ce08a8332f03cb62f97c78f
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:06:55 2017 -0700

    xfs: add log recovery tracepoint for head/tail
    
    commit e67d3d4246e5fbb0c7c700426d11241ca9c6f473 upstream.
    
    Torn write detection and tail overwrite detection can shift the log
    head and tail respectively in the event of CRC mismatch or
    corruption errors. Add a high-level log recovery tracepoint to dump
    the final log head/tail and make those values easily attainable in
    debug/diagnostic situations.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7549e7c01fb0220e47515ad3ee52f46e2742f178
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:06:54 2017 -0700

    xfs: handle -EFSCORRUPTED during head/tail verification
    
    commit a4c9b34d6a17081005ec459b57b8effc08f4c731 upstream.
    
    Torn write and tail overwrite detection both trigger only on
    -EFSBADCRC errors. While this is the most likely failure scenario
    for each condition, -EFSCORRUPTED is still possible in certain cases
    depending on what ends up on disk when a torn write or partial tail
    overwrite occurs. For example, an invalid log record h_len can lead
    to an -EFSCORRUPTED error when running the log recovery CRC pass.
    
    Therefore, update log head and tail verification to trigger the
    associated head/tail fixups in the event of -EFSCORRUPTED errors
    along with -EFSBADCRC. Also, -EFSCORRUPTED can currently be returned
    from xlog_do_recovery_pass() before rhead_blk is initialized if the
    first record encountered happens to be corrupted. This leads to an
    incorrect 'first_bad' return value. Initialize rhead_blk earlier in
    the function to address that problem as well.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
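
A minimal sketch of the widened trigger condition. XFS defines its filesystem error codes as aliases of existing Linux errno values (EFSBADCRC as EBADMSG, EFSCORRUPTED as EUCLEAN), reproduced here. needs_head_tail_fixup() is a hypothetical name for the check, and EUCLEAN assumes a Linux <errno.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* XFS aliases for existing Linux errno values. */
#define EFSBADCRC    EBADMSG	/* bad CRC detected */
#define EFSCORRUPTED EUCLEAN	/* filesystem is corrupted */

/* Head/tail fixups now trigger on corruption as well as CRC errors. */
static bool needs_head_tail_fixup(int error)
{
	return error == -EFSBADCRC || error == -EFSCORRUPTED;
}
```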

commit 47db1fc608b89820f712ab7806b0bd4d4ed69c16
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:06:53 2017 -0700

    xfs: fix log recovery corruption error due to tail overwrite
    
    commit 4a4f66eac4681378996a1837ad1ffec3a2e2981f upstream.
    
    If we consider the case where the tail (T) of the log is pinned long
    enough for the head (H) to push and block behind the tail, we can
    end up blocked in the following state without enough free space (f)
    in the log to satisfy a transaction reservation:
    
            0       phys. log       N
            [-------HffT---H'--T'---]
    
    The last good record in the log (before H) refers to T. The tail
    eventually pushes forward (T') leaving more free space in the log
    for writes to H. At this point, suppose space frees up in the log
    for the maximum of 8 in-core log buffers to start flushing out to
    the log. If this pushes the head from H to H', these next writes
    overwrite the previous tail T. This is safe because the items logged
    from T to T' have been written back and removed from the AIL.
    
    If the next log writes (H -> H') happen to fail and result in
    partial records in the log, the filesystem shuts down having
    overwritten T with invalid data. Log recovery correctly locates H on
    the subsequent mount, but H still refers to the now corrupted tail
    T. This results in log corruption errors and recovery failure.
    
    Since the tail overwrite results from otherwise correct runtime
    behavior, it is up to log recovery to try and deal with this
    situation. Update log recovery tail verification to run a CRC pass
    from the first record past the tail to the head. This facilitates
    error detection at T and moves the recovery tail to the first good
    record past H' (similar to truncating the head on torn write
    detection). If corruption is detected beyond the range possibly
    affected by the max number of iclogs, the log is legitimately
    corrupted and log recovery failure is expected.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e34b72a2381e6432b9eab07a3ec285b59a80e45f
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:06:52 2017 -0700

    xfs: always verify the log tail during recovery
    
    commit 5297ac1f6d7cbf45464a49b9558831f271dfc559 upstream.
    
    Log tail verification currently only occurs when torn writes are
    detected at the head of the log. This was introduced because a
    change in the head block due to torn writes can lead to a change in
    the tail block (each log record header references the current tail)
    and the tail block should be verified before log recovery proceeds.
    
    Tail corruption is possible outside of torn write scenarios,
    however. For example, partial log writes can be detected and cleared
    during the initial head/tail block discovery process. If the partial
    write coincides with a tail overwrite, the log tail is corrupted and
    recovery fails.
    
    To facilitate correct handling of log tail overwrites, update log
    recovery to always perform tail verification. This is necessary to
    detect potential tail overwrite conditions when torn writes may not
    have occurred. This changes normal (i.e., no torn writes) recovery
    behavior slightly to detect and return CRC related errors near the
    tail before actual recovery starts.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 35093926c2f8bd259e50b73685f638095cc59c89
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:06:51 2017 -0700

    xfs: fix recovery failure when log record header wraps log end
    
    commit 284f1c2c9bebf871861184b0e2c40fa921dd380b upstream.
    
    The high-level log recovery algorithm consists of two loops that
    walk the physical log and process log records from the tail to the
    head. The first loop handles the case where the tail is beyond the
    head and processes records up to the end of the physical log. The
    subsequent loop processes records from the beginning of the physical
    log to the head.
    
    Because log records can wrap around the end of the physical log, the
    first loop mentioned above must handle this case appropriately.
    Records are processed from in-core buffers, which means that this
    algorithm must split the reads of such records into two partial
    I/Os: 1.) from the beginning of the record to the end of the log and
    2.) from the beginning of the log to the end of the record. This is
    further complicated by the fact that the log record header and log
    record data are read into independent buffers.
    
    The current handling of each buffer correctly splits the reads when
    either the header or data starts before the end of the log and wraps
    around the end. The data read does not correctly handle the case
    where the prior header read wrapped or ended on the physical log end
    boundary. After the header read, blk_no is incremented to or beyond
    the log end to point to the record data, but the split data read
    logic still triggers, attempts to read from an invalid log block and
    ultimately causes log recovery to fail. This can be reproduced
    fairly reliably via xfstests tests generic/047 and generic/388 with
    large iclog sizes (256k) and small (10M) logs.
    
    If the record header read has pushed beyond the end of the physical
    log, the subsequent data read is actually contiguous. Update the
    data read logic to detect the case where blk_no has wrapped, mod it
    against the log size to read from the correct address and issue one
    contiguous read for the log data buffer. The log record is processed
    as normal from the buffer(s), the loop exits after the current
    iteration and the subsequent loop picks up with the first new record
    after the start of the log.
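    
    A minimal user-space sketch of the read-splitting decision, assuming a
    circular log of log_size blocks. The names plan_read() and struct
    read_seg are hypothetical stand-ins, not the xlog_do_recovery_pass()
    code; only the decision logic is modelled: a start block already at or
    past the log end is wrapped with a modulo and read contiguously, while
    a read that crosses the end is split in two.

```c
#include <assert.h>

/* Hypothetical description of one contiguous read from the log. */
struct read_seg {
	long start;	/* first block to read */
	long len;	/* number of blocks */
};

/*
 * Plan the reads for `len` blocks starting at `blk_no` in a circular log
 * of `log_size` blocks. Returns the number of segments (1 or 2).
 * Assumes 0 < len <= log_size and 0 <= blk_no < 2 * log_size.
 */
static int plan_read(long blk_no, long len, long log_size,
		     struct read_seg segs[2])
{
	if (blk_no >= log_size) {
		/* The header read already wrapped: the data is contiguous. */
		segs[0].start = blk_no % log_size;
		segs[0].len = len;
		return 1;
	}
	if (blk_no + len <= log_size) {
		/* Fits entirely before the physical end of the log. */
		segs[0].start = blk_no;
		segs[0].len = len;
		return 1;
	}
	/* Split: up to the end of the log, then from the beginning. */
	segs[0].start = blk_no;
	segs[0].len = log_size - blk_no;
	segs[1].start = 0;
	segs[1].len = len - segs[0].len;
	return 2;
}
```

    With a 100-block log, plan_read(98, 5, ...) yields blocks 98-99 and
    0-2, while plan_read(102, 5, ...) yields a single read of blocks 2-6.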
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 0800356def7f3ede34986adeeb03235176297eb0
Author: Carlos Maiolino <cmaiolino@redhat.com>
Date:   Sun Sep 17 14:06:50 2017 -0700

    xfs: Properly retry failed inode items in case of error during buffer writeback
    
    commit d3a304b6292168b83b45d624784f973fdc1ca674 upstream.
    
    When a buffer fails during writeback, the inode items attached to it
    are kept flush locked and are never resubmitted because of the flush
    lock. So, if any buffer fails to be written, the items in the AIL are
    never written to disk and never unlocked.
    
    This causes the unmount operation to hang because of these flush-locked
    items in the AIL, and it also means the items in the AIL are never
    written back, even after the I/O device returns to normal.
    
    I've been testing this patch with a DM-thin device, creating a
    filesystem larger than the real device.
    
    When writing enough data to fill the DM-thin device, XFS receives ENOSPC
    errors from the device and keeps spinning in xfsaild (when the 'retry
    forever' configuration is set).
    
    At this point, the filesystem cannot be unmounted because of the
    flush-locked items in the AIL. Worse, the items in the AIL are never
    retried at all (since xfs_inode_item_push() skips items that are flush
    locked), even if the underlying DM-thin device is expanded to the
    proper size.
    
    This patch fixes both cases by retrying any item that previously
    failed, using the infrastructure provided by the previous patch.
    
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7942f605c3086abe6c9f61f29e9326c48d5c8095
Author: Carlos Maiolino <cmaiolino@redhat.com>
Date:   Sun Sep 17 14:06:49 2017 -0700

    xfs: Add infrastructure needed for error propagation during buffer IO failure
    
    commit 0b80ae6ed13169bd3a244e71169f2cc020b0c57a upstream.
    
    With the current code, XFS never resubmits a failed buffer for I/O,
    because the failed item in the buffer is kept in the flush locked state
    forever.
    
    To be able to resubmit a log item for I/O, we need a way to mark an item
    as failed if, for any reason, the buffer to which the item belonged
    failed during writeback.
    
    Add a new log item callback to be used after an I/O completion failure
    and make the needed cleanups.
    
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1ba04933408e4b4567f557d363f7bdecfabe9399
Author: Christoph Hellwig <hch@lst.de>
Date:   Sun Sep 17 14:06:48 2017 -0700

    xfs: remove xfs_trans_ail_delete_bulk
    
    commit 27af1bbf524459962d1477a38ac6e0b7f79aaecc upstream.
    
    xfs_iflush_done uses an on-stack variable length array to pass the log
    items to be deleted to xfs_trans_ail_delete_bulk.  On-stack VLAs are a
    nasty gcc extension that can lead to unbounded stack allocations, but
    fortunately we can easily avoid them by simply open coding
    xfs_trans_ail_delete_bulk in xfs_iflush_done, which is the only caller
    of it except for the single-item xfs_trans_ail_delete.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9a3f752290907e7bfa80a333e4965574932f5670
Author: Eric Sandeen <sandeen@sandeen.net>
Date:   Sun Sep 17 14:06:47 2017 -0700

    xfs: toggle readonly state around xfs_log_mount_finish
    
    commit 6f4a1eefdd0ad4561543270a7fceadabcca075dd upstream.
    
    When we do log recovery on a readonly mount, unlinked inode
    processing does not happen due to the readonly checks in
    xfs_inactive(), which are trying to prevent any I/O on a
    readonly mount.
    
    This is misguided - we do I/O on readonly mounts all the time,
    for consistency; for example, log recovery.  So do the same
    RDONLY flag twiddling around xfs_log_mount_finish() as we
    do around xfs_log_mount(), for the same reason.
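    
    The flag twiddling amounts to a save/clear/restore pattern around the
    recovery step. A minimal sketch with hypothetical names (struct mount,
    MNT_RDONLY, process_unlinked(), finish_recovery()), not the real XFS
    symbols:

```c
#include <assert.h>
#include <stdbool.h>

#define MNT_RDONLY 0x1u	/* hypothetical read-only flag */

struct mount {
	unsigned int flags;
	int unlinked_processed;	/* counts unlinked inodes cleaned up */
};

/* Stand-in for unlinked-inode processing that refuses to run read-only. */
static void process_unlinked(struct mount *mp, int nr_unlinked)
{
	if (mp->flags & MNT_RDONLY)
		return;	/* the check that previously skipped the work */
	mp->unlinked_processed += nr_unlinked;
}

/* Temporarily drop the read-only flag around the end of recovery. */
static void finish_recovery(struct mount *mp, int nr_unlinked)
{
	bool readonly = mp->flags & MNT_RDONLY;

	if (readonly)
		mp->flags &= ~MNT_RDONLY;
	process_unlinked(mp, nr_unlinked);
	if (readonly)
		mp->flags |= MNT_RDONLY;	/* restore the caller's view */
}
```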
    
    This all cries out for a big rework but for now this is a
    simple fix to an obvious problem.
    
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 01d38e380746e5880d744c634f0c459ea6646dd9
Author: Eric Sandeen <sandeen@sandeen.net>
Date:   Sun Sep 17 14:06:46 2017 -0700

    xfs: write unmount record for ro mounts
    
    commit 757a69ef6cf2bf839bd4088e5609ddddd663b0c4 upstream.
    
    There are dueling comments in the xfs code about intent
    for log writes when unmounting a readonly filesystem.
    
    In xfs_mountfs, we see the intent:
    
    /*
     * Now the log is fully replayed, we can transition to full read-only
     * mode for read-only mounts. This will sync all the metadata and clean
     * the log so that the recovery we just performed does not have to be
     * replayed again on the next mount.
     */
    
    and it calls xfs_quiesce_attr(), but by the time we get to
    xfs_log_unmount_write(), it returns early for a RDONLY mount:
    
     * Don't write out unmount record on read-only mounts.
    
    Because of this, sequential ro mounts of a filesystem with
    a dirty log will replay the log each time, which seems odd.
    
    Fix this by writing an unmount record even for RO mounts, as long
    as norecovery wasn't specified (don't write a clean log record
    if a dirty log may still be there!) and the log device is
    writable.
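    
    The conditions in the last paragraph reduce to a small predicate. A
    sketch, assuming a hypothetical struct mount_state whose fields stand
    in for the real mount options and log device state:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mount state relevant to the unmount-record decision. */
struct mount_state {
	bool readonly;		/* filesystem mounted read-only */
	bool norecovery;	/* mounted with -o norecovery */
	bool logdev_readonly;	/* log device itself is not writable */
};

/*
 * Read-only mounts no longer skip the unmount record outright; they only
 * skip it when the log may still be dirty or cannot be written at all.
 */
static bool should_write_unmount_record(const struct mount_state *ms)
{
	if (ms->norecovery)
		return false;	/* never mark a possibly dirty log clean */
	if (ms->logdev_readonly)
		return false;	/* nowhere to write the record */
	return true;		/* read-write and read-only alike */
}
```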
    
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ec0d46ef8b7e35b4f7c82bcf12afbe96b711350f
Author: Christoph Hellwig <hch@lst.de>
Date:   Sun Sep 17 14:06:45 2017 -0700

    iomap: fix integer truncation issues in the zeroing and dirtying helpers
    
    commit e28ae8e428fefe2facd72cea9f29906ecb9c861d upstream.
    
    Fix the min_t calls in the zeroing and dirtying helpers to perform the
    comparisons on 64-bit types, which prevents the values from being
    incorrectly truncated and larger zeroing operations from getting stuck
    in a never-ending loop.
    
    Special thanks to Markus Stockhausen for spotting the bug.
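    
    The truncation can be reproduced in user space. This sketch assumes a
    simplified MIN_T macro that, like the kernel's min_t(), casts both
    operands to the named type; CHUNK and the two helper functions are
    hypothetical. Casting a 4 GiB remaining length to a 32-bit type yields
    0, so a loop advancing by the returned amount would never finish:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for min_t(): cast both operands, then compare.
 * Unlike the kernel macro, this evaluates its arguments twice. */
#define MIN_T(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))

#define CHUNK 4096	/* hypothetical per-iteration limit (a page) */

/* Buggy: the 64-bit remaining length is truncated to 32 bits. */
static uint64_t chunk_truncating(uint64_t offset_in_page, uint64_t count)
{
	return MIN_T(uint32_t, CHUNK - offset_in_page, count);
}

/* Fixed: the comparison is done on 64-bit values. */
static uint64_t chunk_64bit(uint64_t offset_in_page, uint64_t count)
{
	return MIN_T(uint64_t, CHUNK - offset_in_page, count);
}
```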
    
    Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
    Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e1a7b7e1f6c294f64602b9cb1c15d44432f48561
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Sun Sep 17 14:06:44 2017 -0700

    xfs: don't leak quotacheck dquots when cow recovery
    
    commit 77aff8c76425c8f49b50d0b9009915066739e7d2 upstream.
    
    If we fail a mount on account of cow recovery errors, it's possible that
    a previous quotacheck left some dquots in memory.  The bailout clause of
    xfs_mountfs forgets to purge these, and so we leak them.  Fix that.
    
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7fb3e5e373bb45342c6909ea8320010c461b4082
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Sun Sep 17 14:06:43 2017 -0700

    xfs: clear MS_ACTIVE after finishing log recovery
    
    commit 8204f8ddaafafcae074746fcf2a05a45e6827603 upstream.
    
    Way back when we established inode block-map redo log items, it was
    discovered that we needed to prevent the VFS from evicting inodes during
    log recovery because any given inode might have bmap redo items to
    replay even if the inode has no link count and is ultimately deleted,
    and any eviction of an unlinked inode causes the inode to be truncated
    and freed too early.
    
    To make this possible, we set MS_ACTIVE so that inodes would not be torn
    down immediately upon release.  Unfortunately, this also results in the
    quota inodes not being released at all if a later part of the mount
    process should fail, because we never reclaim the inodes.  So, set
    MS_ACTIVE right before we do the last part of log recovery and clear it
    immediately after we finish the log recovery so that everything
    will be torn down properly if we abort the mount.
    
    Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 8edd73a13dc03d4bdcb25d9273908a901f880d09
Author: Omar Sandoval <osandov@fb.com>
Date:   Sun Sep 17 14:06:42 2017 -0700

    xfs: fix inobt inode allocation search optimization
    
    commit c44245b3d5435f533ca8346ece65918f84c057f9 upstream.
    
    When we try to allocate a free inode by searching the inobt, we try to
    find the inode nearest the parent inode by searching chunks both left
    and right of the chunk containing the parent. As an optimization, we
    cache the leftmost and rightmost records that we previously searched; if
    we do another allocation with the same parent inode, we'll pick up the
    search where it last left off.
    
    There's a bug in the case where we found a free inode to the left of the
    parent's chunk: we need to update the cached left and right records, but
    because we already reassigned the right record to point to the left, we
    end up assigning the left record to both the cached left and right
    records.
    
    This isn't strictly a correctness problem, but it can result in the next
    allocation rechecking chunks unnecessarily or allocating inodes further
    away from the parent than it needs to. Fix it by swapping the record
    pointer after we update the cached left and right records.
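    
    The ordering problem can be shown with plain structs. In this sketch
    (hypothetical names; rec plays the right-hand record and trec the
    left-hand one), the buggy version overwrites rec before caching it:

```c
#include <assert.h>

/* Hypothetical chunk record: just the starting inode number. */
struct chunk_rec { unsigned long start; };

/* Hypothetical cache of the last left/right search positions. */
struct search_cache { unsigned long left; unsigned long right; };

/* Buggy order: `rec` is overwritten with the left record first. */
static void found_left_buggy(struct search_cache *c,
			     struct chunk_rec *rec, struct chunk_rec *trec)
{
	*rec = *trec;		/* rec now also holds the left record */
	c->left = trec->start;
	c->right = rec->start;	/* wrong: caches the left record twice */
}

/* Fixed order: cache both cursors, then reuse `rec` for the result. */
static void found_left_fixed(struct search_cache *c,
			     struct chunk_rec *rec, struct chunk_rec *trec)
{
	c->left = trec->start;
	c->right = rec->start;
	*rec = *trec;
}
```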
    
    Fixes: bd169565993b ("xfs: speed up free inode search")
    Signed-off-by: Omar Sandoval <osandov@fb.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f90756d75d69cb05d82a061c799c54dc46e1db1b
Author: Lukas Czerner <lczerner@redhat.com>
Date:   Sun Sep 17 14:06:41 2017 -0700

    xfs: Fix per-inode DAX flag inheritance
    
    commit 56bdf855e676f1f2ed7033f288f57dfd315725ba upstream.
    
    According to the commit that implemented the per-inode DAX flag,
    commit 58f88ca2df72 ("xfs: introduce per-inode DAX enablement"),
    the flag is supposed to act as an "inherit" flag.
    
    Currently this only works when the parent directory already has some
    flag set in di_flags; otherwise inheritance does not work. This is
    because the XFS_DIFLAG2_DAX flag is set in the wrong branch, the one
    designated for di_flags rather than di_flags2.
    
    Fix this by moving the code to the branch designated for setting
    di_flags2, which does test for flags in di_flags2.
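    
    A sketch of the branch structure, assuming hypothetical flag values
    and a reduced struct inode_flags rather than the real on-disk inode
    fields. When the parent has DAX set in di_flags2 but nothing in
    di_flags, only the fixed version propagates it:

```c
#include <assert.h>

/* Hypothetical flag bits; values do not match the on-disk XFS format. */
#define DIFLAG_SYNC	0x1u	/* an example di_flags bit */
#define DIFLAG2_DAX	0x1u	/* the DAX bit lives in di_flags2 */

struct inode_flags {
	unsigned int di_flags;
	unsigned int di_flags2;
};

/* Buggy: DAX inheritance is nested in the di_flags branch. */
static void inherit_buggy(struct inode_flags *child,
			  const struct inode_flags *parent)
{
	if (parent->di_flags) {
		child->di_flags |= parent->di_flags & DIFLAG_SYNC;
		if (parent->di_flags2 & DIFLAG2_DAX)
			child->di_flags2 |= DIFLAG2_DAX;
	}
}

/* Fixed: DAX inheritance lives in the di_flags2 branch. */
static void inherit_fixed(struct inode_flags *child,
			  const struct inode_flags *parent)
{
	if (parent->di_flags)
		child->di_flags |= parent->di_flags & DIFLAG_SYNC;
	if (parent->di_flags2) {
		if (parent->di_flags2 & DIFLAG2_DAX)
			child->di_flags2 |= DIFLAG2_DAX;
	}
}
```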
    
    Fixes: 58f88ca2df72 ("xfs: introduce per-inode DAX enablement")
    Signed-off-by: Lukas Czerner <lczerner@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 229980158f95098ba82e7bec91ce8ada18335bdc
Author: Christoph Hellwig <hch@lst.de>
Date:   Sun Sep 17 14:06:40 2017 -0700

    xfs: fix multi-AG deadlock in xfs_bunmapi
    
    commit 5b094d6dac0451ad89b1dc088395c7b399b7e9e8 upstream.
    
    Just like in the allocator we must avoid touching multiple AGs out of
    order when freeing blocks, as freeing still locks the AGF and can cause
    the same AB-BA deadlocks as in the allocation path.
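    
    A minimal sketch of the ordering rule, assuming each extent has already
    been reduced to its AG number. The function stops before an AG lower
    than one already locked in the transaction; continuing with the
    remaining extents in a new transaction is left to the (unmodelled)
    caller, and all names are hypothetical:

```c
#include <assert.h>

#define NO_AG (-1)	/* hypothetical "no AG locked yet" marker */

/*
 * Handle extents in the given order, but stop before taking an AGF lock
 * that is lower-numbered than one already held. Returns how many
 * extents were handled in this "transaction".
 */
static int free_extents_in_order(const int *agnos, int n)
{
	int prev_agno = NO_AG;
	int i;

	for (i = 0; i < n; i++) {
		if (prev_agno != NO_AG && agnos[i] < prev_agno)
			break;	/* would lock AGFs out of order */
		prev_agno = agnos[i];	/* "lock" this AG's AGF */
	}
	return i;
}
```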
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reported-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 81e27c94f9ab86c04ba4ca5f1d2bcf9e61f7b5af
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:06:39 2017 -0700

    xfs: fix quotacheck dquot id overflow infinite loop
    
    commit cfaf2d034360166e569a4929dd83ae9698bed856 upstream.
    
    If a dquot has an id of U32_MAX, the next lookup index increment
    overflows the uint32_t back to 0. This starts the lookup sequence
    over from the beginning, repeats indefinitely and results in a
    livelock.
    
    Update xfs_qm_dquot_walk() to explicitly check for the lookup
    overflow and exit the loop.
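    
    A sketch of the fixed walk. The kernel walks an in-core tree of
    dquots; this version swaps in a sorted array and a hypothetical
    lookup_batch() helper, but keeps the essential check: stop when
    next_index wraps to 0 after processing id U32_MAX:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical batched lookup over a sorted array of dquot ids. */
static int lookup_batch(const uint32_t *cache, int n, uint32_t next_index,
			uint32_t *ids, int max)
{
	int found = 0;

	for (int i = 0; i < n && found < max; i++)
		if (cache[i] >= next_index)
			ids[found++] = cache[i];
	return found;
}

/* Walk all ids in batches of two; returns the number of ids visited. */
static int walk_ids(const uint32_t *cache, int n)
{
	uint32_t next_index = 0;
	uint32_t ids[2];
	int visited = 0;

	for (;;) {
		int nr = lookup_batch(cache, n, next_index, ids, 2);

		if (nr == 0)
			break;
		visited += nr;
		next_index = ids[nr - 1] + 1;
		/* Without this check, id UINT32_MAX restarts the walk at 0. */
		if (next_index == 0)
			break;
	}
	return visited;
}
```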
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 01bc132048cf9505ed49152cc82e583b18c5538d
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Sun Sep 17 14:06:38 2017 -0700

    xfs: check _alloc_read_agf buffer pointer before using
    
    commit 10479e2dea83d4c421ad05dfc55d918aa8dfc0cd upstream.
    
    In some circumstances, _alloc_read_agf can return an error code of zero
    but also a null AGF buffer pointer.  Check for this and jump out.
    
    Fixes-coverity-id: 1415250
    Fixes-coverity-id: 1415320
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c32b1ec8a266476494f04843434538cdb25d9190
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Sun Sep 17 14:06:37 2017 -0700

    xfs: set firstfsb to NULLFSBLOCK before feeding it to _bmapi_write
    
    commit 4c1a67bd3606540b9b42caff34a1d5cd94b1cf65 upstream.
    
    We must initialize the firstfsb parameter to _bmapi_write so that it
    doesn't incorrectly treat stack garbage as a restriction on which AGs
    it can search for free space.
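    
    The role of the sentinel can be sketched as follows. NULLFSBLOCK here
    is the all-ones value meaning "no restriction"; BLOCKS_PER_AG and
    min_search_ag() are hypothetical simplifications of how a first block
    constrains the AG search. An uninitialized firstfsb would feed
    arbitrary stack contents into that calculation:

```c
#include <assert.h>
#include <stdint.h>

#define NULLFSBLOCK ((uint64_t)-1)	/* sentinel: no restriction */
#define BLOCKS_PER_AG 1000u		/* hypothetical AG size */

/* Lowest AG the allocator may search, given the transaction's first block. */
static unsigned int min_search_ag(uint64_t firstblock)
{
	if (firstblock == NULLFSBLOCK)
		return 0;	/* free to search from the first AG */
	return (unsigned int)(firstblock / BLOCKS_PER_AG);
}
```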
    
    Fixes-coverity-id: 1402025
    Fixes-coverity-id: 1415167
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a6247b0189fab0edbe065ab42e76eddb2a03a631
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Sun Sep 17 14:06:36 2017 -0700

    xfs: check _btree_check_block value
    
    commit 1e86eabe73b73c82e1110c746ed3ec6d5e1c0a0d upstream.
    
    Check the _btree_check_block return value for the firstrec and lastrec
    functions, since we have the ability to signal that the repositioning
    did not succeed.
    
    Fixes-coverity-id: 114067
    Fixes-coverity-id: 114068
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e76496fa85543c48858c537c1a6465068e18db8b
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Sun Sep 17 14:06:35 2017 -0700

    xfs: don't crash on unexpected holes in dir/attr btrees
    
    commit cd87d867920155911d0d2e6485b769d853547750 upstream.
    
    In quite a few places we call xfs_da_read_buf with a mappedbno that we
    don't control, then assume that the function passes back either an error
    code or a buffer pointer.  Unfortunately, if mappedbno == -2 and bno
    maps to a hole, we get a return code of zero and a NULL buffer, which
    means that we crash if we actually try to use that buffer pointer.  This
    happens immediately when we set the buffer type for transaction context.
    
    Therefore, check that we have no error code and a non-NULL bp before
    trying to use bp.  This patch is a follow-up to an incomplete fix in
    96a3aefb8ffde231 ("xfs: don't crash if reading a directory results in an
    unexpected hole").
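    
    The caller-side pattern can be sketched with a hypothetical reader
    that, like xfs_da_read_buf() with mappedbno == -2, returns 0 and a NULL
    buffer for a hole. ERR_IO and ERR_HOLE are placeholders, not the error
    codes the fix actually returns:

```c
#include <assert.h>
#include <stddef.h>

#define ERR_IO   (-5)	/* hypothetical error codes */
#define ERR_HOLE (-117)

struct buf { int data; };	/* hypothetical buffer */

static struct buf the_block = { 42 };

/* Returns 0 with *bpp == NULL when `bno` is a hole and holes are allowed. */
static int read_buf(long bno, int hole_ok, struct buf **bpp)
{
	*bpp = NULL;
	if (bno < 0)	/* treat negative block numbers as holes */
		return hole_ok ? 0 : ERR_IO;
	*bpp = &the_block;
	return 0;
}

/* Caller pattern: check both the error code and the buffer pointer. */
static int use_block(long bno, int *out)
{
	struct buf *bp;
	int error = read_buf(bno, 1, &bp);

	if (error)
		return error;
	if (!bp)
		return ERR_HOLE;	/* unexpected hole: do not dereference */
	*out = bp->data;
	return 0;
}
```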
    
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b46382f02aff8d9ac141714bc6ae4f972836816f
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:06:34 2017 -0700

    xfs: free cowblocks and retry on buffered write ENOSPC
    
    commit cf2cb7845d6e101cb17bd62f8aa08cd514fc8988 upstream.
    
    XFS runs an eofblocks reclaim scan before returning an ENOSPC error to
    userspace for buffered writes. This facilitates aggressive speculative
    preallocation without causing user visible side effects such as
    premature ENOSPC.
    
    Run a cowblocks scan in the same situation to reclaim lingering COW fork
    preallocation throughout the filesystem.
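    
    A simplified model of the retry, assuming hypothetical scan_eofblocks()
    and scan_cowblocks() helpers that simply return reclaimable
    preallocation to the free pool. The write is retried once after both
    scans before ENOSPC is reported:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical filesystem state: free blocks plus reclaimable prealloc. */
struct fs {
	long free;
	long eof_prealloc;	/* speculative post-EOF preallocation */
	long cow_prealloc;	/* lingering COW fork preallocation */
};

static void scan_eofblocks(struct fs *fs)
{
	fs->free += fs->eof_prealloc;
	fs->eof_prealloc = 0;
}

static void scan_cowblocks(struct fs *fs)
{
	fs->free += fs->cow_prealloc;
	fs->cow_prealloc = 0;
}

static int try_write(struct fs *fs, long blocks)
{
	if (blocks > fs->free)
		return -ENOSPC;
	fs->free -= blocks;
	return 0;
}

/* Reclaim preallocation once before reporting ENOSPC to the caller. */
static int buffered_write(struct fs *fs, long blocks)
{
	bool retried = false;
	int error;

again:
	error = try_write(fs, blocks);
	if (error == -ENOSPC && !retried) {
		scan_eofblocks(fs);
		scan_cowblocks(fs);	/* the scan this change adds */
		retried = true;
		goto again;
	}
	return error;
}
```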
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 171192c92da616d5848e0e70c6cab4f14351d275
Author: Brian Foster <bfoster@redhat.com>
Date:   Sun Sep 17 14:06:33 2017 -0700

    xfs: free uncommitted transactions during log recovery
    
    commit 39775431f82f890f4aaa08860a30883d081bffc7 upstream.
    
    Log recovery allocates in-core transaction and member item data
    structures on demand as it processes the on-disk log. Transactions
    are allocated when first encountered on disk and stored in a hash
    table structure where they are easily accessible for subsequent
    lookups. Transaction items are also allocated on demand and are
    attached to the associated transactions.
    
    When a commit record is encountered in the log, the transaction is
    committed to the fs and the in-core structures are freed. If a
    filesystem crashes or shuts down before all in-core log buffers are
    flushed to the log, however, not all transactions may have commit
    records in the log. As expected, the modifications in such an
    incomplete transaction are not replayed to the fs. The in-core data
    structures for the partial transaction are never freed, however,
    resulting in a memory leak.
    
    Update xlog_do_recovery_pass() to first correctly initialize the
    hash table array so empty lists can be distinguished from populated
    lists on function exit. Update xlog_recover_free_trans() to always
    remove the transaction from the list prior to freeing the associated
    memory. Finally, walk the hash table of transaction lists as the
    last step before it goes out of scope and free any transactions that
    may remain on the lists. This prevents a memory leak of partial
    transactions in the log.
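    
    A user-space sketch of the three steps, using a minimal circular list
    in place of the kernel's hash list helpers; struct rtrans, TABLE_SIZE
    and the function names are hypothetical. Every bucket is initialized,
    free_trans() always unlinks before freeing, and a final walk frees
    whatever was never committed:

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 4	/* hypothetical hash table size */

/* Minimal circular doubly linked list in the style of list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Hypothetical in-core transaction; `list` must stay the first member. */
struct rtrans { struct list_head list; unsigned int tid; };

static int nr_freed;

/* Always unlink before freeing so no bucket points at freed memory. */
static void free_trans(struct rtrans *t)
{
	list_del(&t->list);
	free(t);
	nr_freed++;
}

static void add_trans(struct list_head *table, unsigned int tid)
{
	struct rtrans *t = malloc(sizeof(*t));

	if (!t)
		return;	/* allocation failure: nothing to track */
	t->tid = tid;
	list_add(&t->list, &table[tid % TABLE_SIZE]);
}

/* Last step of the pass: free transactions without a commit record. */
static void free_uncommitted(struct list_head *table)
{
	for (int i = 0; i < TABLE_SIZE; i++)
		while (!list_empty(&table[i]))
			free_trans((struct rtrans *)table[i].next);
}
```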
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 621d0b75a3476bce5f1d4e13bb99deaf57b9289d
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Mon Jun 19 13:19:08 2017 -0700

    xfs: don't allow bmap on rt files
    
    commit 61d819e7bcb7f33da710bf3f5dcb2bcf1e48203c upstream.
    
    bmap returns a dumb LBA address but not the block device that goes with
    that LBA.  Swapfiles don't care about this and will blindly assume that
    the data volume is the correct blockdev, which is totally bogus for
    files on the rt subvolume.  This results in the swap code doing IOs to
    arbitrary locations on the data device(!) if the passed in mapping is a
    realtime file, so just turn off bmap for rt files.
    
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 8913492d12b1e71bd89bb234408483b7c56700e0
Author: Brian Foster <bfoster@redhat.com>
Date:   Wed Jun 14 21:35:35 2017 -0700

    xfs: remove bli from AIL before release on transaction abort
    
    commit 3d4b4a3e30ae7a949c31e1e10268a3da4723d290 upstream.
    
    When a buffer is modified, logged and committed, it ultimately ends
    up sitting on the AIL with a dirty bli waiting for metadata
    writeback. If another transaction locks and invalidates the buffer
    (freeing an inode chunk, for example) in the meantime, the bli is
    flagged as stale, the dirty state is cleared and the bli remains in
    the AIL.
    
    If a shutdown occurs before the transaction that has invalidated the
    buffer is committed, the transaction is ultimately aborted. The log
    items are flagged as such and ->iop_unlock() handles the aborted
    items. Because the bli is clean (due to the invalidation),
    ->iop_unlock() unconditionally releases it. The log item may still
    reside in the AIL, however, which means the I/O completion handler
    may still run and attempt to access it. This results in an assert
    failure due to the release of the bli while it is still present in
    the AIL, and a subsequent NULL dereference and panic in the buffer
    I/O completion handling. This can be reproduced by running
    generic/388 repeatedly.
    
    To avoid this problem, update xfs_buf_item_unlock() to first check
    whether the bli is aborted and if so, remove it from the AIL before
    it is released. This ensures that the bli is no longer accessed
    during the shutdown sequence after it has been freed.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 6c0ecde201d796363b92de79553b75089760d9a4
Author: Brian Foster <bfoster@redhat.com>
Date:   Wed Jun 14 21:35:35 2017 -0700

    xfs: release bli from transaction properly on fs shutdown
    
    commit 79e641ce29cfae5b8fc55fb77ac62d11d2d849c0 upstream.
    
    If a filesystem shutdown occurs with a buffer log item in the CIL
    and a log force occurs, the ->iop_unpin() handler is generally
    expected to tear down the bli properly. This entails freeing the bli
    memory and releasing the associated hold on the buffer so it can be
    released and the filesystem unmounted.
    
    If this sequence occurs while ->bli_refcount is elevated (i.e.,
    another transaction is open and attempting to modify the buffer),
    however, ->iop_unpin() may not be responsible for releasing the bli.
    Instead, the transaction may release the final ->bli_refcount
    reference and thus xfs_trans_brelse() is responsible for tearing
    down the bli.
    
    While xfs_trans_brelse() does drop the reference count, it only
    attempts to release the bli if it is clean (i.e., not in the
    CIL/AIL). If the filesystem is shutdown and the bli is sitting dirty
    in the CIL as noted above, this ends up skipping the last
    opportunity to release the bli. In turn, this leaves the hold on the
    buffer and causes an unmount hang. This can be reproduced by running
    generic/388 repeatedly.
    
    Update xfs_trans_brelse() to handle this shutdown corner case
    correctly. If the final bli reference is dropped and the filesystem
    is shutdown, remove the bli from the AIL (if necessary) and release
    the bli to drop the buffer hold and ensure an unmount does not hang.
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ce83e494d1bbbdd045aae236dcbb412cdd721319
Author: Darrick J. Wong <darrick.wong@oracle.com>
Date:   Wed Jun 14 21:25:57 2017 -0700

    xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent
    
    commit e1a4e37cc7b665b6804fba812aca2f4d7402c249 upstream.
    
    In a pathological scenario where we are trying to bunmapi a single
    extent in which every other block is shared, it's possible that trying
    to unmap the entire large extent in a single transaction can generate so
    many EFIs that we overflow the transaction reservation.
    
    Therefore, use a heuristic to guess at the number of blocks we can
    safely unmap from a reflink file's data fork in a single transaction.
    This should prevent problems such as the log head slamming into the tail
    and ASSERTs that trigger because we've exceeded the transaction
    reservation.
    
    Note that since bunmapi can fail to unmap the entire range, we must also
    teach the deferred unmap code to roll into a new transaction whenever we
    get low on reservation.
    
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    [hch: random edits, all bugs are my fault]
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7cb011bbacef6fcf1d26fe8cd8cc8079404b01f8
Author: Brian Foster <bfoster@redhat.com>
Date:   Wed Jun 14 21:21:45 2017 -0700

    xfs: push buffer of flush locked dquot to avoid quotacheck deadlock
    
    commit 7912e7fef2aebe577f0b46d3cba261f2783c5695 upstream.
    
    Reclaim during quotacheck can lead to deadlocks on the dquot flush
    lock:
    
     - Quotacheck populates a local delwri queue with the physical dquot
       buffers.
     - Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and
       dirties all of the dquots.
     - Reclaim kicks in and attempts to flush a dquot whose buffer is
       already queued on the quotacheck queue. The flush succeeds but
       queueing to the reclaim delwri queue fails as the backing buffer is
       already queued. The flush unlock is now deferred to I/O completion
       of the buffer from the quotacheck queue.
     - The dqadjust bulkstat continues and dirties the recently flushed
       dquot once again.
     - Quotacheck proceeds to the xfs_qm_flush_one() walk which requires
       the flush lock to update the backing buffers with the in-core
       recalculated values. It deadlocks on the redirtied dquot as the
       flush lock was already acquired by reclaim, but the buffer resides
       on the local delwri queue which isn't submitted until the end of
       quotacheck.
    
    This is reproduced by running quotacheck on a filesystem with a
    couple million inodes in low memory (512MB-1GB) situations. This is
    a regression as of commit 43ff2122e6 ("xfs: on-stack delayed write
    buffer lists"), which removed a trylock and buffer I/O submission
    from the quotacheck dquot flush sequence.
    
    Quotacheck first resets and collects the physical dquot buffers in a
    delwri queue. Then, it traverses the filesystem inodes via bulkstat,
    updates the in-core dquots, flushes the corrected dquots to the
    backing buffers and finally submits the delwri queue for I/O. Since
    the backing buffers are queued across the entire quotacheck
    operation, dquot reclaim cannot possibly complete a dquot flush
    before quotacheck completes.
    
    Therefore, quotacheck must submit the buffer for I/O in order to
    cycle the flush lock and flush the dirty in-core dquot to the
    buffer. Add a delwri queue buffer push mechanism to submit an
    individual buffer for I/O without losing the delwri queue status and
    use it from quotacheck to avoid the deadlock. This restores
    quotacheck behavior to what it was before the regression was introduced.
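    
    A heavily simplified, single-threaded C model of the deadlock and of
    the buffer push idea may help when reading the sequence above. All
    names below (struct buf, delwri_queue(), delwri_pushbuf(), ...) are
    hypothetical stand-ins rather than the XFS API, and I/O completion is
    modelled as a synchronous call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for an xfs_buf and a dquot; not the XFS API. */
struct buf {
	bool on_delwri_queue;	/* buffer already sits on some delwri queue */
	int completions;	/* number of times I/O completion has run */
};

struct dquot {
	bool flush_locked;	/* flush lock, released only at I/O completion */
	struct buf *bp;		/* backing buffer */
};

/* Queueing fails when the buffer is already on another delwri queue. */
static bool delwri_queue(struct buf *bp)
{
	if (bp->on_delwri_queue)
		return false;
	bp->on_delwri_queue = true;
	return true;
}

/* I/O completion is the only place that drops the dquot flush lock. */
static void buf_io_complete(struct dquot *dq)
{
	dq->bp->completions++;
	dq->flush_locked = false;
}

/*
 * Take the flush lock. Returns false where the real code would sleep;
 * in the deadlock scenario nothing ever wakes it up.
 */
static bool dquot_flush_trylock(struct dquot *dq)
{
	if (dq->flush_locked)
		return false;
	dq->flush_locked = true;
	return true;
}

/*
 * The push idea: submit one queued buffer for I/O now, while keeping it
 * on the caller's delwri queue, so the flush lock can be cycled.
 */
static void delwri_pushbuf(struct dquot *dq)
{
	assert(dq->bp->on_delwri_queue);
	buf_io_complete(dq);	/* synchronous stand-in for real I/O */
}
```

    In this model the second dquot_flush_trylock() fails (the deadlock)
    and succeeds again once delwri_pushbuf() has run the buffer's
    completion, while the buffer stays on the quotacheck queue.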
    
    Reported-by: Martin Svec <martin.svec@zoner.cz>
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 85ab1b23d2d865049299f3c197ce550e80228fac
Author: Brian Foster <bfoster@redhat.com>
Date:   Thu Jun 8 08:23:07 2017 -0700

    xfs: fix spurious spin_is_locked() assert failures on non-smp kernels
    
    commit 95989c46d2a156365867b1d795fdefce71bce378 upstream.
    
    The 0-day kernel test robot reports assertion failures on
    !CONFIG_SMP kernels due to failed spin_is_locked() checks. As it
    turns out, spin_is_locked() is hardcoded to return zero on
    !CONFIG_SMP kernels and so this function cannot be relied on to
    verify spinlock state in this configuration.
    
    To avoid this problem, replace the associated asserts with lockdep
    variants that do the right thing regardless of kernel configuration.
    Drop the one assert that checks for an unlocked lock as there is no
    suitable lockdep variant for that case. This moves the spinlock
    checks from XFS debug code to lockdep, but generally provides the
    same level of protection.
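    
    The difference can be sketched with a hypothetical userspace model:
    a uniprocessor "spinlock" has no lock word to inspect, so an
    is-locked query (like spin_is_locked() on !CONFIG_SMP) always returns
    0, while a lockdep-style check consults held-lock bookkeeping that is
    maintained independently of the lock implementation. The types and
    functions below are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical uniprocessor lock: no lock word exists to inspect. */
struct up_spinlock {
	bool debug_held;	/* lockdep-style bookkeeping, kept separately */
};

static void up_spin_lock(struct up_spinlock *l)
{
	l->debug_held = true;	/* nothing to spin on; only bookkeeping changes */
}

static void up_spin_unlock(struct up_spinlock *l)
{
	l->debug_held = false;
}

/* Like spin_is_locked() on !CONFIG_SMP: hardcoded to 0. */
static int up_spin_is_locked(const struct up_spinlock *l)
{
	(void)l;
	return 0;
}

/* Like the idea behind a lockdep assertion: ask the bookkeeping. */
static bool up_lock_held(const struct up_spinlock *l)
{
	return l->debug_held;
}
```

    An ASSERT(up_spin_is_locked(&l)) right after up_spin_lock(&l) would
    fire even though the lock is held, which is the spurious failure
    described above.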
    
    Reported-by: kbuild test robot <fengguang.wu@intel.com>
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4c1d33c4cf864cd1fa14868440daa300a8494900
Author: Jan Kara <jack@suse.cz>
Date:   Thu May 18 16:36:24 2017 -0700

    xfs: Move handling of missing page into one place in xfs_find_get_desired_pgoff()
    
    commit a54fba8f5a0dc36161cacdf2aa90f007f702ec1a upstream.
    
    Currently, several places in xfs_find_get_desired_pgoff() handle the case
    of a missing page. Handle them all in one place after the loop has
    terminated.
    
    Signed-off-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 3fddeb80034b2be27179cdc4e23167bc78d304d1
Author: Andy Lutomirski <luto@kernel.org>
Date:   Tue Aug 1 07:11:37 2017 -0700

    x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
    
    commit e137a4d8f4dd2e277e355495b6b2cb241a8693c3 upstream.
    
    Switching FS and GS is a mess, and the current code is still subtly
    wrong: it assumes that "Loading a nonzero value into FS sets the
    index and base", which is false on AMD CPUs if the value being
    loaded is 1, 2, or 3.
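    
    Selector values 0 through 3 all have a zero index and table
    indicator, so they are null selectors that differ only in RPL. The
    hypothetical model below contrasts a CPU modelled as clearing the
    segment base when a null selector is loaded with an AMD-style CPU
    that preserves it, showing why a nonzero value such as 3 need not set
    the base. It is an illustration only, not the context switch code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Selector layout: bits 0-1 RPL, bit 2 table indicator, bits 3-15 index. */
static bool selector_is_null(uint16_t sel)
{
	return (sel & 0xFFFC) == 0;	/* true for 0, 1, 2 and 3 */
}

/* Hypothetical CPU model: just the FS selector and base. */
struct cpu_model {
	bool null_load_preserves_base;	/* true models the AMD behavior */
	uint16_t fs_sel;
	uint64_t fs_base;
};

/*
 * Load a selector into FS. For non-null selectors the base would come
 * from the descriptor table; 'desc_base' stands in for that lookup.
 */
static void load_fs(struct cpu_model *cpu, uint16_t sel, uint64_t desc_base)
{
	cpu->fs_sel = sel;
	if (!selector_is_null(sel))
		cpu->fs_base = desc_base;
	else if (!cpu->null_load_preserves_base)
		cpu->fs_base = 0;
	/* else: the AMD-style CPU keeps whatever base was there before */
}
```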
    
    (The current code came from commit 3e2b68d752c9 ("x86/asm,
    sched/x86: Rewrite the FS and GS context switch code"), which made
    it better but didn't fully fix it.)
    
    Rewrite it to be much simpler and more obviously correct.  This
    should fix it fully on AMD CPUs and shouldn't adversely affect
    performance.
    
    Signed-off-by: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Borislav Petkov <bpetkov@suse.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Chang Seok <chang.seok.bae@intel.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 0caec70692a0f19538ed4ebb816df0d5585c8bd0
Author: Andy Lutomirski <luto@kernel.org>
Date:   Tue Aug 1 07:11:35 2017 -0700

    x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
    
    commit 9584d98bed7a7a904d0702ad06bbcc94703cb5b4 upstream.
    
    In ELF_COPY_CORE_REGS, we're copying from the current task, so
    accessing thread.fsbase and thread.gsbase makes no sense.  Just read
    the values from the CPU registers.
    
    In practice, the old code would have been correct most of the time
    simply because thread.fsbase and thread.gsbase usually matched the
    CPU registers.
    
    Signed-off-by: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Borislav Petkov <bpetkov@suse.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Chang Seok <chang.seok.bae@intel.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c7d1ddec251d39415cd488c29e9d60b22d4b61b7
Author: Andy Lutomirski <luto@kernel.org>
Date:   Tue Aug 1 07:11:34 2017 -0700

    x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
    
    commit 767d035d838f4fd6b5a5bbd7a3f6d293b7f65a49 upstream.
    
    execve used to leak FSBASE and GSBASE on AMD CPUs.  Fix it.
    
    The security impact of this bug is small but not quite zero -- it
    could weaken ASLR when a privileged task execs a less privileged
    program, but only if the program changed bitness across the exec, or the
    child binary was highly unusual or actively malicious.  A child
    program that was compromised after the exec would not have access to
    the leaked base.
    
    Signed-off-by: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Borislav Petkov <bpetkov@suse.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Chang Seok <chang.seok.bae@intel.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit cc9618c9fffe6bd362f048928e15effe04e5b6cd
Author: Jaegeuk Kim <jaegeuk@kernel.org>
Date:   Sat Aug 12 21:33:23 2017 -0700

    f2fs: check hot_data for roll-forward recovery
    
    commit 125c9fb1ccb53eb2ea9380df40f3c743f3fb2fed upstream.
    
    We need to check HOT_DATA to truncate any previous data block when doing
    roll-forward recovery.
    
    Reviewed-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 0f90297cba9ba37eb37723423c2df022ce77704a
Author: Jaegeuk Kim <jaegeuk@kernel.org>
Date:   Thu Aug 10 17:35:04 2017 -0700

    f2fs: let fill_super handle roll-forward errors
    
    commit afd2b4da40b3b567ef8d8e6881479345a2312a03 upstream.
    
    If we set CP_ERROR_FLAG on a roll-forward error, f2fs can no longer
    proceed with any I/O due to f2fs_cp_error(). But if, for example, some
    stale data is involved in the roll-forward process, we can get -ENOENT,
    leaving the filesystem stuck. If we get any error, let fill_super set
    SBI_NEED_FSCK and try to recover back to a stable point.
    
    Reviewed-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 60b94125a1fe4988f5392d8537305dad441ef43d
Author: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date:   Thu Sep 7 14:08:34 2017 +0800

    ip_tunnel: fix setting ttl and tos value in collect_md mode
    
    
    [ Upstream commit 0f693f1995cf002432b70f43ce73f79bf8d0b6c9 ]
    
    The ttl and tos variables are declared and assigned, but are not used
    in the iptunnel_xmit() call.
    
    Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
    Cc: Alexei Starovoitov <ast@fb.com>
    Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 3f60dadbe1781e292b560dd353d4a5a637ed192d
Author: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date:   Fri Sep 8 11:35:21 2017 -0300

    sctp: fix missing wake ups in some situations
    
    
    [ Upstream commit 7906b00f5cd1cd484fced7fcda892176e3202c8a ]
    
    Commit fb586f25300f ("sctp: delay calls to sk_data_ready() as much as
    possible") minimized the number of wake ups that are triggered in case
    the association receives a packet with multiple data chunks on it and/or
    when io_events are enabled and then commit 0970f5b36659 ("sctp: signal
    sk_data_ready earlier on data chunks reception") moved the wake up to as
    soon as possible. It thus relies on the state machine running later to
    clear the flag indicating that the event was already generated.
    
    The issue is that there are 2 call paths that call
    sctp_ulpq_tail_event() outside of the state machine, causing the flag to
    linger and possibly omitting a needed wake up in the sequence.
    
    One of the call paths is when enabling SCTP_SENDER_DRY_EVENTS via
    setsockopt(SCTP_EVENTS), as noticed by Harald Welte. The other is when
    partial reliability triggers removal of chunks from the send queue when
    the application calls sendmsg().
    
    This commit fixes it by not setting the flag in case the socket is not
    owned by the user, as it won't be cleaned later. This works for
    user-initiated calls and also for rx path processing.
    
    Fixes: fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible")
    Reported-by: Harald Welte <laforge@gnumonks.org>
    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit bf8ed95d2ca9c99f0237fb3cf56c381b19130610
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Sep 8 15:48:47 2017 -0700

    ipv6: fix typo in fib6_net_exit()
    
    
    [ Upstream commit 32a805baf0fb70b6dbedefcd7249ac7f580f9e3b ]
    
    IPv6 FIB should use FIB6_TABLE_HASHSZ, not FIB_TABLE_HASHSZ.
    
    Fixes: ba1cc08d9488 ("ipv6: fix memory leak with multiple tables during netns destruction")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c9335db792c04be68e553c6d0537c9df8b20e557
Author: Sabrina Dubroca <sd@queasysnail.net>
Date:   Fri Sep 8 10:26:19 2017 +0200

    ipv6: fix memory leak with multiple tables during netns destruction
    
    
    [ Upstream commit ba1cc08d9488c94cb8d94f545305688b72a2a300 ]
    
    fib6_net_exit only frees the main and local tables. If another table was
    created with fib6_alloc_table, we leak it when the netns is destroyed.
    
    Fix this in the same way ip_fib_net_exit cleans up tables, by walking
    through the whole hashtable of fib6_tables. We can get rid of the
    special cases for local and main, since they're also part of the
    hashtable.
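    
    The cleanup pattern, walking every hash bucket and freeing every
    table found there instead of special-casing the main and local
    tables, can be sketched as follows. The structure, the tiny hash size
    and the helper names are hypothetical; TABLE_HASHSZ merely stands in
    for the IPv6 table hash size.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_HASHSZ 4	/* hypothetical stand-in for FIB6_TABLE_HASHSZ */

struct fib_table {
	unsigned int id;
	struct fib_table *next;	/* bucket chain */
};

/* Insert a new table into its hash bucket; returns NULL on OOM. */
static struct fib_table *table_add(struct fib_table *hash[], unsigned int id)
{
	struct fib_table *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->id = id;
	t->next = hash[id % TABLE_HASHSZ];
	hash[id % TABLE_HASHSZ] = t;
	return t;
}

/* Free every table in every bucket; returns how many were freed. */
static unsigned int tables_exit(struct fib_table *hash[])
{
	unsigned int h, freed = 0;

	for (h = 0; h < TABLE_HASHSZ; h++) {
		struct fib_table *t = hash[h];

		while (t) {
			struct fib_table *next = t->next;

			free(t);
			freed++;
			t = next;
		}
		hash[h] = NULL;
	}
	return freed;
}
```

    With main (254), local (255) and table 100 added, tables_exit()
    frees all three, including the one a main/local-only cleanup would
    leak.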
    
    Reproducer:
        ip netns add x
        ip -net x -6 rule add from 6003:1::/64 table 100
        ip netns del x
    
    Reported-by: Jianlin Shi <jishi@redhat.com>
    Fixes: 58f09b78b730 ("[NETNS][IPV6] ip6_fib - make it per network namespace")
    Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ca7d8a337bd3e3eda49ab1b4dfa09ac9b335a56b
Author: Xin Long <lucien.xin@gmail.com>
Date:   Tue Sep 5 17:26:33 2017 +0800

    ip6_gre: update mtu properly in ip6gre_err
    
    
    [ Upstream commit 5c25f30c93fdc5bf25e62101aeaae7a4f9b421b3 ]
    
    Currently, when processing ICMPV6_PKT_TOOBIG, ip6gre_err() only subtracts
    the offset of the GRE header from the MTU info. The expected MTU of the
    GRE device should also subtract the GRE header itself. Otherwise,
    subsequent packets still can't be sent out.
    
    Jianlin found this issue when using the topo:
      client(ip6gre)<---->(nic1)route(nic2)<----->(ip6gre)server
    
    and reducing nic2's MTU: both TCP and SCTP throughput with large
    payloads then dropped to 0.
    
    This patch fixes it by also subtracting the GRE header length
    (tun->tun_hlen) from the MTU info when updating the GRE device's MTU in
    ip6gre_err(). It also needs to subtract ETH_HLEN if the GRE device's
    type is ARPHRD_ETHER.
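    
    The corrected arithmetic can be sketched as a small helper. The
    function name and parameters are hypothetical; 'offset' is the offset
    of the GRE header reported to ip6gre_err(), 'tun_hlen' the GRE header
    length, and the clamp to the IPv6 minimum MTU of 1280 is an
    assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <stdbool.h>

#define ETH_HLEN 14		/* Ethernet header length */
#define IPV6_MIN_MTU 1280	/* minimum IPv6 link MTU (RFC 8200) */

/* Hypothetical helper: new GRE device MTU from an ICMPv6 Packet Too Big. */
static int gre_dev_mtu(int mtu_info, int offset, int tun_hlen, bool is_ether)
{
	int mtu = mtu_info - offset - tun_hlen;	/* also drop the GRE header */

	if (is_ether)		/* ARPHRD_ETHER devices carry an inner Ethernet header */
		mtu -= ETH_HLEN;
	if (mtu < IPV6_MIN_MTU)	/* assumed clamp, see above */
		mtu = IPV6_MIN_MTU;
	return mtu;
}
```

    For example, with a 1400-byte MTU in the ICMPv6 message, a 40-byte
    offset and a 4-byte GRE header, the device MTU becomes 1356 (1342 for
    an ARPHRD_ETHER device).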
    
    Reported-by: Jianlin Shi <jishi@redhat.com>
    Signed-off-by: Xin Long <lucien.xin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f5755c0e870056dd35c95a0b5c0a038cdb4382ee
Author: Jason Wang <jasowang@redhat.com>
Date:   Tue Sep 5 09:22:05 2017 +0800

    vhost_net: correctly check tx avail during rx busy polling
    
    
    [ Upstream commit 8b949bef9172ca69d918e93509a4ecb03d0355e0 ]
    
    In the past we checked tx avail through vhost_enable_notify(), which is
    wrong since it only checks whether or not the guest has filled more
    available buffers since the last avail idx synchronization, which had
    just been done by vhost_vq_avail_empty(). What we really want is to
    check for pending buffers in the avail ring. Fix this by calling
    vhost_vq_avail_empty() instead.
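    
    The difference between the two checks can be modelled with three
    ring indices: the index the guest has published, vhost's cached
    (shadow) copy of it, and the index vhost has consumed up to. The
    model is hypothetical and ignores memory barriers and wrap-around;
    new_since_sync() plays the role described above for
    vhost_enable_notify(), and avail_empty() that of
    vhost_vq_avail_empty().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical avail-ring state; not the vhost data structures. */
struct vq_model {
	unsigned int guest_avail_idx;	/* published by the guest */
	unsigned int shadow_avail_idx;	/* last value vhost read */
	unsigned int last_avail_idx;	/* next entry vhost will consume */
};

/* Only answers "did the guest add buffers since the last read?" */
static bool new_since_sync(const struct vq_model *vq)
{
	return vq->guest_avail_idx != vq->shadow_avail_idx;
}

/* Answers "is anything pending?", refreshing the shadow copy if needed. */
static bool avail_empty(struct vq_model *vq)
{
	if (vq->shadow_avail_idx != vq->last_avail_idx)
		return false;
	vq->shadow_avail_idx = vq->guest_avail_idx;
	return vq->shadow_avail_idx == vq->last_avail_idx;
}
```

    With the shadow index just synchronized (5) and two buffers still
    pending (consumed up to 3), new_since_sync() reports nothing new while
    avail_empty() correctly reports pending work.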
    
    This issue can be noticed by running the netperf TCP_RR benchmark with
    the guest (but not the host) as client. With this fix, TCP_RR from guest
    to localhost recovers from 1375.91 trans per sec to 55235.28 trans per
    sec on my laptop (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz).
    
    Fixes: 030881372460 ("vhost_net: basic polling support")
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 90406e68e42fa50c41b69a5d607fa979d0ab562b
Author: Claudiu Manoil <claudiu.manoil@nxp.com>
Date:   Mon Sep 4 10:45:28 2017 +0300

    gianfar: Fix Tx flow control deactivation
    
    
    [ Upstream commit 5d621672bc1a1e5090c1ac5432a18c79e0e13e03 ]
    
    The wrong register is checked for the Tx flow control bit;
    it should have been maccfg1, not maccfg2.
    This went unnoticed for so long probably because the impact is
    hardly visible, not to mention the tangled code from adjust_link().
    First, link flow control (i.e. handling of Rx/Tx link level pause frames)
    is disabled by default (needs to be enabled via 'ethtool -A').
    Secondly, maccfg2 always returns 0 for tx_flow_oldval (except for a few
    old boards), which results in Tx flow control remaining always on
    once activated.
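    
    A hypothetical model of the read-compare-write logic shows why the
    bit stays set: if the old value is read from a register that never
    has the bit, disabling flow control looks like "no change" and the
    register that does hold the bit is never cleared. The bit position
    and helper are illustrative, not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_FLOW_BIT (1u << 4)	/* hypothetical bit position */

struct mac_regs {
	unsigned int maccfg1;	/* holds the Tx flow control bit */
	unsigned int maccfg2;	/* never has that bit set in this model */
};

/* Update Tx flow control only if the requested state differs. */
static void set_tx_flow(struct mac_regs *r, bool enable, bool read_wrong_reg)
{
	unsigned int oldval = (read_wrong_reg ? r->maccfg2 : r->maccfg1) & TX_FLOW_BIT;
	unsigned int newval = enable ? TX_FLOW_BIT : 0;

	if (oldval != newval)
		r->maccfg1 = (r->maccfg1 & ~TX_FLOW_BIT) | newval;
}
```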
    
    Fixes: 45b679c9a3ccd9e34f28e6ec677b812a860eb8eb ("gianfar: Implement PAUSE frame generation support")
    Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1bcf18718ec63ad5fb025b75a5d2439e1dcf1213
Author: Jesper Dangaard Brouer <brouer@redhat.com>
Date:   Fri Sep 1 11:26:13 2017 +0200

    Revert "net: fix percpu memory leaks"
    
    
    [ Upstream commit 5a63643e583b6a9789d7a225ae076fb4e603991c ]
    
    This reverts commit 1d6119baf0610f813eb9d9580eb4fd16de5b4ceb.
    
    After reverting commit 6d7b857d541e ("net: use lib/percpu_counter API
    for fragmentation mem accounting"), there is no need for this
    fix-up patch.  As percpu_counter is no longer used, it can no
    longer leak memory.
    
    Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting")
    Fixes: 1d6119baf061 ("net: fix percpu memory leaks")
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5a7a40bad254d2571d93059ba4b3963dc448cdb0
Author: Jesper Dangaard Brouer <brouer@redhat.com>
Date:   Fri Sep 1 11:26:08 2017 +0200

    Revert "net: use lib/percpu_counter API for fragmentation mem accounting"
    
    
    [ Upstream commit fb452a1aa3fd4034d7999e309c5466ff2d7005aa ]
    
    This reverts commit 6d7b857d541ecd1d9bd997c97242d4ef94b19de2.
    
    There is a bug in the fragmentation code's use of the percpu_counter
    API that can cause issues on systems with many CPUs.
    
    frag_mem_limit() just reads the global counter (fbc->count),
    without considering that other CPUs can each hold up to the batch size
    (130K) that hasn't been subtracted yet.  Due to the 3MBytes lower thresh
    limit, this becomes dangerous at >=24 CPUs (3*1024*1024/130000=24).
    
    The correct API usage would be to use __percpu_counter_compare() which
    does the right thing, and takes into account the number of (online)
    CPUs and batch size, to account for this and call __percpu_counter_sum()
    when needed.
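    
    The arithmetic and the comparison rule described above can be
    sketched in C. counter_compare() below follows the idea of
    __percpu_counter_compare(): trust the cheap global count only when it
    is farther from the threshold than the worst-case unsynced error of
    batch * online CPUs, and otherwise fall back to an exact sum. It is a
    hypothetical model; the exact sum is passed in rather than computed.

```c
#include <assert.h>

/* Worst-case error of the global count: each CPU may hold up to 'batch'. */
static long max_unsynced(long batch, int ncpus)
{
	return batch * ncpus;
}

/*
 * Compare a percpu-style counter against 'rhs'. 'global' is the cheap
 * fbc->count-like value, 'exact' the result of a full (locked) sum.
 * Returns 1, 0 or -1 like a comparison function.
 */
static int counter_compare(long global, long exact, long rhs, long batch, int ncpus)
{
	long err = max_unsynced(batch, ncpus);

	if (global - rhs > err)
		return 1;
	if (rhs - global > err)
		return -1;
	/* Too close to call from the global count: use the exact sum. */
	if (exact > rhs)
		return 1;
	return exact < rhs ? -1 : 0;
}
```

    With 32 CPUs, a global count of 3000000 is below the 3 MiB threshold
    even though the exact sum (3200000 in this example) is above it;
    counter_compare() notices that the difference is within the possible
    error and consults the exact sum, which a plain read of fbc->count
    does not.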
    
    We choose to revert the use of the lib/percpu_counter API for frag
    memory accounting for several reasons:
    
    1) On systems with more than 24 CPUs, the heavier fully locked
       __percpu_counter_sum() is always invoked, which will be more
       expensive than the atomic_t that is reverted to.
    
    Given systems with more than 24 CPUs are becoming common this doesn't
    seem like a good option.  To mitigate this, the batch size could be
    decreased and thresh be increased.
    
    2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX
       CPU, before SKBs are pushed into sockets on remote CPUs.  Given that
       NICs can only hash on the L2 part of the IP header, the number of
       NIC RX queues involved will likely be limited.  Thus, there is a fair
       chance that the atomic add+dec happen on the same CPU.
    
    Revert note: commit 1d6119baf061 ("net: fix percpu memory leaks")
    removed init_frag_mem_limit() and instead used inet_frags_init_net().
    After this revert, inet_frags_uninit_net() becomes empty.
    
    Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting")
    Fixes: 1d6119baf061 ("net: fix percpu memory leaks")
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b5a3ae8b127e692d6ebf4707c4ec6db68c413024
Author: Ido Schimmel <idosch@mellanox.com>
Date:   Fri Sep 1 12:22:25 2017 +0300

    bridge: switchdev: Clear forward mark when transmitting packet
    
    
    [ Upstream commit 79e99bdd60b484af9afe0147e85a13e66d5c1cdb ]
    
    Commit 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for
    stacked devices") added the 'offload_fwd_mark' bit to the skb in order
    to allow drivers to indicate to the bridge driver that they already
    forwarded the packet in L2.
    
    In case the bit is set, before transmitting the packet from each port,
    the port's mark is compared with the mark stored in the skb's control
    block. If both marks are equal, we know the packet arrived from a switch
    device that already forwarded the packet and it's not re-transmitted.
    
    However, if the packet is transmitted from the bridge device itself
    (e.g., br0), we should clear the 'offload_fwd_mark' bit as the mark
    stored in the skb's control block isn't valid.
    
    This scenario can happen in rare cases where a packet was trapped during
    L3 forwarding and forwarded by the kernel to a bridge device.
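    
    The decision described above can be modelled as follows. The field
    and function names are hypothetical simplifications of the skb bit
    and the control-block mark, not the bridge code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical packet metadata. */
struct pkt {
	bool offload_fwd_mark;	/* hardware already forwarded this packet */
	int cb_fwd_mark;	/* mark of the ingress switch, from the control block */
};

/* Skip a port only if the hardware already forwarded to it. */
static bool should_deliver(const struct pkt *p, int port_mark)
{
	return !(p->offload_fwd_mark && p->cb_fwd_mark == port_mark);
}

/* Transmit from the bridge device itself: the stored mark is not valid. */
static void bridge_dev_xmit(struct pkt *p)
{
	p->offload_fwd_mark = false;
}
```

    A stale mark that happens to equal a port's mark suppresses delivery
    to that port until bridge_dev_xmit() clears the bit.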
    
    Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices")
    Signed-off-by: Ido Schimmel <idosch@mellanox.com>
    Reported-by: Yotam Gigi <yotamg@mellanox.com>
    Tested-by: Yotam Gigi <yotamg@mellanox.com>
    Reviewed-by: Jiri Pirko <jiri@mellanox.com>
    Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 73ee5a73e75f3c0e5d4ca0c5a362424e93413bb0
Author: Ido Schimmel <idosch@mellanox.com>
Date:   Fri Sep 1 10:52:31 2017 +0200

    mlxsw: spectrum: Forbid linking to devices that have uppers
    
    
    [ Upstream commit 25cc72a33835ed8a6f53180a822cadab855852ac ]
    
    The mlxsw driver relies on NETDEV_CHANGEUPPER events to configure the
    device in case a port is enslaved to a master netdev such as bridge or
    bond.
    
    Since the driver ignores events unrelated to its ports and their
    uppers, it's possible to engineer situations in which the device's data
    path differs from the kernel's.
    
    One example of such a situation is when a port is enslaved to a bond
    that is already enslaved to a bridge. When the bond was enslaved the
    driver ignored the event - as the bond wasn't one of its uppers - and
    therefore a bridge port instance isn't created in the device.
    
    Until such configurations are supported, forbid them by checking that the
    upper device doesn't have uppers of its own.
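    
    The added restriction amounts to one extra check when a port is
    linked to an upper device. The sketch below models devices with a
    single hypothetical 'upper' pointer and an assumed -EINVAL return; it
    illustrates the rule, not the mlxsw notifier code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical netdev model: at most one upper (master) device. */
struct netdev {
	const char *name;
	struct netdev *upper;
};

/* Refuse to link a port to an upper that already has an upper itself. */
static int port_prechangeupper(struct netdev *port, struct netdev *upper)
{
	(void)port;
	if (upper->upper != NULL)
		return -EINVAL;
	return 0;
}
```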
    
    Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave")
    Signed-off-by: Ido Schimmel <idosch@mellanox.com>
    Reported-by: Nogah Frankel <nogahf@mellanox.com>
    Tested-by: Nogah Frankel <nogahf@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a10c510179b369f7d1e8cf77f43ee2db900c1ac9
Author: Wei Wang <weiwan@google.com>
Date:   Thu May 18 11:22:33 2017 -0700

    tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
    
    
    [ Upstream commit 499350a5a6e7512d9ed369ed63a4244b6536f4f8 ]
    
    When tcp_disconnect() is called, inet_csk_delack_init() sets
    icsk->icsk_ack.rcv_mss to 0.
    This could potentially cause tcp_recvmsg() => tcp_cleanup_rbuf() =>
    __tcp_select_window() call path to have division by 0 issue.
    So this patch initializes rcv_mss to TCP_MIN_MSS instead of 0.
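    
    A minimal sketch of why the value matters, assuming the window is
    rounded down to a whole number of MSS-sized segments by integer
    division (the helper is hypothetical, not __tcp_select_window()):
    with rcv_mss == 0 the division traps, while TCP_MIN_MSS (88 in the
    kernel headers) keeps it well defined.

```c
#include <assert.h>

#define TCP_MIN_MSS 88u	/* value from include/net/tcp.h */

/* Hypothetical: round free receive space down to whole segments. */
static unsigned int round_window(unsigned int free_space, unsigned int rcv_mss)
{
	assert(rcv_mss != 0);	/* dividing by 0 here is the reported bug */
	return (free_space / rcv_mss) * rcv_mss;
}

/* The value a disconnect leaves in rcv_mss after the fix. */
static unsigned int rcv_mss_after_disconnect(void)
{
	return TCP_MIN_MSS;
}
```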
    
    Reported-by: Andrey Konovalov  <andreyknvl@google.com>
    Signed-off-by: Wei Wang <weiwan@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: Yuchung Cheng <ycheng@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a6e51fda71a205fbd8f7b98da799c46e563c3db1
Author: Florian Fainelli <f.fainelli@gmail.com>
Date:   Wed Aug 30 17:49:29 2017 -0700

    Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
    
    
    [ Upstream commit ebc8254aeae34226d0bc8fda309fd9790d4dccfe ]
    
    This reverts commit 7ad813f208533cebfcc32d3d7474dc1677d1b09a ("net: phy:
    Correctly process PHY_HALTED in phy_stop_machine()") because it is
    creating the possibility for a NULL pointer dereference.
    
    David Daney provided the following call trace and diagram of events:
    
    When ndo_stop() is called we call:
    
     phy_disconnect()
        +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
        +---> phy_stop_machine()
        |      +---> phy_state_machine()
        |              +----> queue_delayed_work(): Work queued.
        +--->phy_detach() implies: phydev->attached_dev = NULL;
    
    Now at a later time the queued work does:
    
     phy_state_machine()
        +---->netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:
    
     CPU 12 Unable to handle kernel paging request at virtual address
    0000000000000048, epc == ffffffff80de37ec, ra == ffffffff80c7c
    Oops[#1]:
    CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
    Workqueue: events_power_efficient phy_state_machine
    task: 80000004021ed100 task.stack: 8000000409d70000
    $ 0   : 0000000000000000 ffffffff84720060 0000000000000048 0000000000000004
    $ 4   : 0000000000000000 0000000000000001 0000000000000004 0000000000000000
    $ 8   : 0000000000000000 0000000000000000 00000000ffff98f3 0000000000000000
    $12   : 8000000409d73fe0 0000000000009c00 ffffffff846547c8 000000000000af3b
    $16   : 80000004096bab68 80000004096babd0 0000000000000000 80000004096ba800
    $20   : 0000000000000000 0000000000000000 ffffffff81090000 0000000000000008
    $24   : 0000000000000061 ffffffff808637b0
    $28   : 8000000409d70000 8000000409d73cf0 80000000271bd300 ffffffff80c7804c
    Hi    : 000000000000002a
    Lo    : 000000000000003f
    epc   : ffffffff80de37ec netif_carrier_off+0xc/0x58
    ra    : ffffffff80c7804c phy_state_machine+0x48c/0x4f8
    Status: 14009ce3        KX SX UX KERNEL EXL IE
    Cause : 00800008 (ExcCode 02)
    BadVA : 0000000000000048
    PrId  : 000d9501 (Cavium Octeon III)
    Modules linked in:
    Process kworker/12:1 (pid: 1502, threadinfo=8000000409d70000,
    task=80000004021ed100, tls=0000000000000000)
    Stack : 8000000409a54000 80000004096bab68 80000000271bd300 80000000271c1e00
            0000000000000000 ffffffff808a1708 8000000409a54000 80000000271bd300
            80000000271bd320 8000000409a54030 ffffffff80ff0f00 0000000000000001
            ffffffff81090000 ffffffff808a1ac0 8000000402182080 ffffffff84650000
            8000000402182080 ffffffff84650000 ffffffff80ff0000 8000000409a54000
            ffffffff808a1970 0000000000000000 80000004099e8000 8000000402099240
            0000000000000000 ffffffff808a8598 0000000000000000 8000000408eeeb00
            8000000409a54000 00000000810a1d00 0000000000000000 8000000409d73de8
            8000000409d73de8 0000000000000088 000000000c009c00 8000000409d73e08
            8000000409d73e08 8000000402182080 ffffffff808a84d0 8000000402182080
            ...
    Call Trace:
    [<ffffffff80de37ec>] netif_carrier_off+0xc/0x58
    [<ffffffff80c7804c>] phy_state_machine+0x48c/0x4f8
    [<ffffffff808a1708>] process_one_work+0x158/0x368
    [<ffffffff808a1ac0>] worker_thread+0x150/0x4c0
    [<ffffffff808a8598>] kthread+0xc8/0xe0
    [<ffffffff808617f0>] ret_from_kernel_thread+0x14/0x1c
    
    The original motivation for this change was Marc Gonzales reporting
    that his network driver's adjust_link callback was not executed with
    phydev->link = 0 when he expected it to be.
    
    PHYLIB has never made any such guarantee, because phy_stop() merely
    tells the workqueue to move into the PHY_HALTED state, which will
    happen asynchronously.
    
    Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Reported-by: David Daney <ddaney.cavm@gmail.com>
    Fixes: 7ad813f20853 ("net: phy: Correctly process PHY_HALTED in phy_stop_machine()")
    Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit af33da0ed95f6a7b652f774fbb07fb52d2c21a97
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Aug 30 09:29:31 2017 -0700

    kcm: do not attach PF_KCM sockets to avoid deadlock
    
    
    [ Upstream commit 351050ecd6523374b370341cc29fe61e2201556b ]
    
    syzkaller had no problem triggering a deadlock by attaching a KCM
    socket to another one (or to itself). (The original syzkaller report
    was a very confusing lockdep splat during a sendmsg().)
    
    It seems KCM claims to only support TCP, but no enforcement is done,
    so we might need to add additional checks.
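    
    A check of the kind the fix adds can be sketched as below: refuse
    to attach a socket whose family is PF_KCM. The struct, the function
    and the -EOPNOTSUPP return are hypothetical choices for illustration,
    and the family values are defined locally so the sketch does not
    depend on system headers.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_PF_KCM 41		/* PF_KCM value in Linux's <linux/socket.h> */
#define MODEL_PF_INET 2		/* PF_INET */

/* Hypothetical socket model: only the address family matters here. */
struct sock_model {
	int family;
};

/* Reject attaching a KCM socket (possibly itself) as a lower socket. */
static int kcm_attach_check(const struct sock_model *csk)
{
	if (csk->family == MODEL_PF_KCM)
		return -EOPNOTSUPP;
	return 0;
}
```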
    
    Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Acked-by: Tom Herbert <tom@quantonium.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 8c623e5d03692dc478277185a0b907d53aea1b43
Author: Benjamin Poirier <bpoirier@suse.com>
Date:   Mon Aug 28 14:29:41 2017 -0400

    packet: Don't write vnet header beyond end of buffer
    
    
    [ Upstream commit edbd58be15a957f6a760c4a514cd475217eb97fd ]
    
    ... which may happen with certain values of tp_reserve and maclen.
    
    Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv")
    Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
    Cc: Willem de Bruijn <willemb@google.com>
    Acked-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 2b3bd5972a5ce434b3f2211181e72033efe018d9
Author: Stefano Brivio <sbrivio@redhat.com>
Date:   Fri Aug 25 22:48:48 2017 +0200

    cxgb4: Fix stack out-of-bounds read due to wrong size to t4_record_mbox()
    
    
    [ Upstream commit 0f3086868e8889a823a6e0f3d299102aa895d947 ]
    
    Passing commands for logging to t4_record_mbox() with size
    MBOX_LEN, when the actual command size is smaller,
    causes out-of-bounds stack accesses in t4_record_mbox() while
    copying command words here:
    
            for (i = 0; i < size / 8; i++)
                    entry->cmd[i] = be64_to_cpu(cmd[i]);
    
    Up to 48 bytes from the stack are then leaked to debugfs.
    
    This happens whenever we send (and log) commands described by
    structs fw_sched_cmd (32 bytes leaked), fw_vi_rxmode_cmd (48),
    fw_hello_cmd (48), fw_bye_cmd (48), fw_initialize_cmd (48),
    fw_reset_cmd (48), fw_pfvf_cmd (32), fw_eq_eth_cmd (16),
    fw_eq_ctrl_cmd (32), fw_eq_ofld_cmd (32), fw_acl_mac_cmd (16),
    fw_rss_glb_config_cmd (32), fw_rss_vi_config_cmd (32),
    fw_devlog_cmd (32), fw_vi_enable_cmd (48) and fw_port_cmd (32).
    
    The cxgb4vf driver got this right instead.
    
    When we call t4_record_mbox() to log a command reply, a MBOX_LEN
    size can be used though, as get_mbox_rpl() will fill cmd_rpl up
    completely.
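    
    The logging loop quoted above copies size / 8 words, so the caller
    must pass the size of the command actually built. A host-side sketch,
    with hypothetical struct names, a 64-byte MBOX_LEN assumed (consistent
    with the 48-byte maximum leak quoted above for 16-byte commands), and
    be64_to_cpu() omitted:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MBOX_LEN 64	/* assumed mailbox size in bytes */

struct mbox_log_entry {
	uint64_t cmd[MBOX_LEN / 8];
};

/* Copy 'size' bytes' worth of 64-bit command words into the log. */
static void record_mbox(struct mbox_log_entry *entry, const uint64_t *cmd, size_t size)
{
	size_t i;

	assert(size <= MBOX_LEN);
	memset(entry, 0, sizeof(*entry));
	for (i = 0; i < size / 8; i++)
		entry->cmd[i] = cmd[i];
}

/* A hypothetical 16-byte command, the smallest size listed above. */
struct small_cmd {
	uint64_t words[2];
};
```

    Passing MBOX_LEN instead of sizeof(struct small_cmd) would read six
    words past the end of the command, the 48-byte over-read described
    above.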
    
    Fixes: 7f080c3f2ff0 ("cxgb4: Add support to enable logging of firmware mailbox commands")
    Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit de2ecec26dba848c729e51faaf2b4daf35096330
Author: stephen hemminger <stephen@networkplumber.org>
Date:   Thu Aug 24 16:49:16 2017 -0700

    netvsc: fix deadlock betwen link status and removal
    
    
    [ Upstream commit 9b4e946ce14e20d7addbfb7d9139e604f9fda107 ]
    
    A deadlock is possible when canceling the link status delayed
    work queue. The removal process runs with RTNL held, and the link
    status callback acquires RTNL.
    
    Resolve the issue by using trylock and rescheduling.
    If a cancel is in progress, that blocks the rescheduling from
    happening.
    
    Fixes: 122a5f6410f4 ("staging: hv: use delayed_work for netvsc_send_garp()")
    Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 64dfc67548da52fe7891decf725342a8e87e32d8
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Wed Aug 23 15:59:49 2017 +0200

    qlge: avoid memcpy buffer overflow
    
    
    [ Upstream commit e58f95831e7468d25eb6e41f234842ecfe6f014f ]
    
    gcc-8.0.0 (snapshot) points out that we copy a variable-length string
    into a fixed length field using memcpy() with the destination length,
    and that ends up copying whatever follows the string:
    
        inlined from 'ql_core_dump' at drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:1106:2:
    drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:708:2: error: 'memcpy' reading 15 bytes from a region of size 14 [-Werror=stringop-overflow=]
      memcpy(seg_hdr->description, desc, (sizeof(seg_hdr->description)) - 1);
    
    Changing it to use strncpy() will instead zero-pad the destination,
    which seems to be the right thing to do here.
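    A small userspace illustration of the difference, assuming a
    hypothetical 16-byte description field (consistent with the 15-byte
    read in the gcc message above). strncpy() stops reading at the
    source's NUL and zero-pads the rest of the field; the explicit
    terminator on the last byte is an addition of this sketch, not
    necessarily part of the driver.

    ```c
    #include <string.h>

    /* Hypothetical segment header with a fixed-size description. */
    struct seg_hdr {
    	char description[16];
    };

    static void set_description(struct seg_hdr *h, const char *desc)
    {
    	/* memcpy(h->description, desc, sizeof(h->description) - 1) would
    	 * read 15 bytes from desc even when desc is a shorter literal.
    	 * strncpy() reads up to desc's NUL and zero-fills the remainder. */
    	strncpy(h->description, desc, sizeof(h->description) - 1);
    	h->description[sizeof(h->description) - 1] = '\0';
    }
    ```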
    
    The bug is probably harmless, but it seems like a good idea to address
    it in stable kernels as well, if only for the purpose of building with
    gcc-8 without warnings.
    
    Fixes: a61f80261306 ("qlge: Add ethtool register dump function.")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 08d56d8a99bb82e134ba7704e4cfdabbcc16fc4f
Author: Stefano Brivio <sbrivio@redhat.com>
Date:   Wed Aug 23 13:27:13 2017 +0200

    sctp: Avoid out-of-bounds reads from address storage
    
    
    [ Upstream commit ee6c88bb754e3d363e568da78086adfedb692447 ]
    
    inet_diag_msg_sctp{,l}addr_fill() and sctp_get_sctp_info() copy
    sizeof(sockaddr_storage) bytes to fill in sockaddr structs used
    to export diagnostic information to userspace.
    
    However, the memory allocated to store sockaddr information is
    smaller than that and depends on the address family, so we leak
    up to 100 uninitialized bytes to userspace. Just use the size of
    the source structs instead; in all three cases this is what
    userspace expects. Zero out the remaining memory.
    
    Unused bytes (i.e. when IPv4 addresses are used) in source
    structs sctp_sockaddr_entry and sctp_transport are already
    cleared by sctp_add_bind_addr() and sctp_transport_new(),
    respectively.
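    A userspace sketch of the copy-then-zero approach follows. It assumes
    a hypothetical union addr that plays the role of the kernel's union
    sctp_addr (large enough for either IPv4 or IPv6) and a hypothetical
    fill_export_addr() helper. Only the bytes of the source object are
    read, and the rest of the sockaddr_storage is cleared.

    ```c
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    /* Hypothetical source object, sized for the largest family it holds. */
    union addr {
    	struct sockaddr_in  v4;
    	struct sockaddr_in6 v6;
    };

    _Static_assert(sizeof(union addr) <= sizeof(struct sockaddr_storage),
    	       "source must fit in the exported storage");

    static void fill_export_addr(struct sockaddr_storage *out,
    			     const union addr *src)
    {
    	/* Copy only what the source object actually contains... */
    	memcpy(out, src, sizeof(*src));
    	/* ...and zero the remainder instead of reading past the object. */
    	memset((char *)out + sizeof(*src), 0, sizeof(*out) - sizeof(*src));
    }
    ```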
    
    Noticed while testing a KASAN-enabled kernel with 'ss':
    
    [ 2326.885243] BUG: KASAN: slab-out-of-bounds in inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] at addr ffff881be8779800
    [ 2326.896800] Read of size 128 by task ss/9527
    [ 2326.901564] CPU: 0 PID: 9527 Comm: ss Not tainted 4.11.0-22.el7a.x86_64 #1
    [ 2326.909236] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
    [ 2326.917585] Call Trace:
    [ 2326.920312]  dump_stack+0x63/0x8d
    [ 2326.924014]  kasan_object_err+0x21/0x70
    [ 2326.928295]  kasan_report+0x288/0x540
    [ 2326.932380]  ? inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
    [ 2326.938500]  ? skb_put+0x8b/0xd0
    [ 2326.942098]  ? memset+0x31/0x40
    [ 2326.945599]  check_memory_region+0x13c/0x1a0
    [ 2326.950362]  memcpy+0x23/0x50
    [ 2326.953669]  inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
    [ 2326.959596]  ? inet_diag_msg_sctpasoc_fill+0x460/0x460 [sctp_diag]
    [ 2326.966495]  ? __lock_sock+0x102/0x150
    [ 2326.970671]  ? sock_def_wakeup+0x60/0x60
    [ 2326.975048]  ? remove_wait_queue+0xc0/0xc0
    [ 2326.979619]  sctp_diag_dump+0x44a/0x760 [sctp_diag]
    [ 2326.985063]  ? sctp_ep_dump+0x280/0x280 [sctp_diag]
    [ 2326.990504]  ? memset+0x31/0x40
    [ 2326.994007]  ? mutex_lock+0x12/0x40
    [ 2326.997900]  __inet_diag_dump+0x57/0xb0 [inet_diag]
    [ 2327.003340]  ? __sys_sendmsg+0x150/0x150
    [ 2327.007715]  inet_diag_dump+0x4d/0x80 [inet_diag]
    [ 2327.012979]  netlink_dump+0x1e6/0x490
    [ 2327.017064]  __netlink_dump_start+0x28e/0x2c0
    [ 2327.021924]  inet_diag_handler_cmd+0x189/0x1a0 [inet_diag]
    [ 2327.028045]  ? inet_diag_rcv_msg_compat+0x1b0/0x1b0 [inet_diag]
    [ 2327.034651]  ? inet_diag_dump_compat+0x190/0x190 [inet_diag]
    [ 2327.040965]  ? __netlink_lookup+0x1b9/0x260
    [ 2327.045631]  sock_diag_rcv_msg+0x18b/0x1e0
    [ 2327.050199]  netlink_rcv_skb+0x14b/0x180
    [ 2327.054574]  ? sock_diag_bind+0x60/0x60
    [ 2327.058850]  sock_diag_rcv+0x28/0x40
    [ 2327.062837]  netlink_unicast+0x2e7/0x3b0
    [ 2327.067212]  ? netlink_attachskb+0x330/0x330
    [ 2327.071975]  ? kasan_check_write+0x14/0x20
    [ 2327.076544]  netlink_sendmsg+0x5be/0x730
    [ 2327.080918]  ? netlink_unicast+0x3b0/0x3b0
    [ 2327.085486]  ? kasan_check_write+0x14/0x20
    [ 2327.090057]  ? selinux_socket_sendmsg+0x24/0x30
    [ 2327.095109]  ? netlink_unicast+0x3b0/0x3b0
    [ 2327.099678]  sock_sendmsg+0x74/0x80
    [ 2327.103567]  ___sys_sendmsg+0x520/0x530
    [ 2327.107844]  ? __get_locked_pte+0x178/0x200
    [ 2327.112510]  ? copy_msghdr_from_user+0x270/0x270
    [ 2327.117660]  ? vm_insert_page+0x360/0x360
    [ 2327.122133]  ? vm_insert_pfn_prot+0xb4/0x150
    [ 2327.126895]  ? vm_insert_pfn+0x32/0x40
    [ 2327.131077]  ? vvar_fault+0x71/0xd0
    [ 2327.134968]  ? special_mapping_fault+0x69/0x110
    [ 2327.140022]  ? __do_fault+0x42/0x120
    [ 2327.144008]  ? __handle_mm_fault+0x1062/0x17a0
    [ 2327.148965]  ? __fget_light+0xa7/0xc0
    [ 2327.153049]  __sys_sendmsg+0xcb/0x150
    [ 2327.157133]  ? __sys_sendmsg+0xcb/0x150
    [ 2327.161409]  ? SyS_shutdown+0x140/0x140
    [ 2327.165688]  ? exit_to_usermode_loop+0xd0/0xd0
    [ 2327.170646]  ? __do_page_fault+0x55d/0x620
    [ 2327.175216]  ? __sys_sendmsg+0x150/0x150
    [ 2327.179591]  SyS_sendmsg+0x12/0x20
    [ 2327.183384]  do_syscall_64+0xe3/0x230
    [ 2327.187471]  entry_SYSCALL64_slow_path+0x25/0x25
    [ 2327.192622] RIP: 0033:0x7f41d18fa3b0
    [ 2327.196608] RSP: 002b:00007ffc3b731218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    [ 2327.205055] RAX: ffffffffffffffda RBX: 00007ffc3b731380 RCX: 00007f41d18fa3b0
    [ 2327.213017] RDX: 0000000000000000 RSI: 00007ffc3b731340 RDI: 0000000000000003
    [ 2327.220978] RBP: 0000000000000002 R08: 0000000000000004 R09: 0000000000000040
    [ 2327.228939] R10: 00007ffc3b730f30 R11: 0000000000000246 R12: 0000000000000003
    [ 2327.236901] R13: 00007ffc3b731340 R14: 00007ffc3b7313d0 R15: 0000000000000084
    [ 2327.244865] Object at ffff881be87797e0, in cache kmalloc-64 size: 64
    [ 2327.251953] Allocated:
    [ 2327.254581] PID = 9484
    [ 2327.257215]  save_stack_trace+0x1b/0x20
    [ 2327.261485]  save_stack+0x46/0xd0
    [ 2327.265179]  kasan_kmalloc+0xad/0xe0
    [ 2327.269165]  kmem_cache_alloc_trace+0xe6/0x1d0
    [ 2327.274138]  sctp_add_bind_addr+0x58/0x180 [sctp]
    [ 2327.279400]  sctp_do_bind+0x208/0x310 [sctp]
    [ 2327.284176]  sctp_bind+0x61/0xa0 [sctp]
    [ 2327.288455]  inet_bind+0x5f/0x3a0
    [ 2327.292151]  SYSC_bind+0x1a4/0x1e0
    [ 2327.295944]  SyS_bind+0xe/0x10
    [ 2327.299349]  do_syscall_64+0xe3/0x230
    [ 2327.303433]  return_from_SYSCALL_64+0x0/0x6a
    [ 2327.308194] Freed:
    [ 2327.310434] PID = 4131
    [ 2327.313065]  save_stack_trace+0x1b/0x20
    [ 2327.317344]  save_stack+0x46/0xd0
    [ 2327.321040]  kasan_slab_free+0x73/0xc0
    [ 2327.325220]  kfree+0x96/0x1a0
    [ 2327.328530]  dynamic_kobj_release+0x15/0x40
    [ 2327.333195]  kobject_release+0x99/0x1e0
    [ 2327.337472]  kobject_put+0x38/0x70
    [ 2327.341266]  free_notes_attrs+0x66/0x80
    [ 2327.345545]  mod_sysfs_teardown+0x1a5/0x270
    [ 2327.350211]  free_module+0x20/0x2a0
    [ 2327.354099]  SyS_delete_module+0x2cb/0x2f0
    [ 2327.358667]  do_syscall_64+0xe3/0x230
    [ 2327.362750]  return_from_SYSCALL_64+0x0/0x6a
    [ 2327.367510] Memory state around the buggy address:
    [ 2327.372855]  ffff881be8779700: fc fc fc fc 00 00 00 00 00 00 00 00 fc fc fc fc
    [ 2327.380914]  ffff881be8779780: fb fb fb fb fb fb fb fb fc fc fc fc 00 00 00 00
    [ 2327.388972] >ffff881be8779800: 00 00 00 00 fc fc fc fc fb fb fb fb fb fb fb fb
    [ 2327.397031]                                ^
    [ 2327.401792]  ffff881be8779880: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
    [ 2327.409850]  ffff881be8779900: 00 00 00 00 00 04 fc fc fc fc fc fc 00 00 00 00
    [ 2327.417907] ==================================================================
    
    This fixes CVE-2017-7558.
    
    References: https://bugzilla.redhat.com/show_bug.cgi?id=1480266
    Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
    Cc: Xin Long <lucien.xin@gmail.com>
    Cc: Vlad Yasevich <vyasevich@gmail.com>
    Cc: Neil Horman <nhorman@tuxdriver.com>
    Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Reviewed-by: Xin Long <lucien.xin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4d8ee1935bcd666360311dfdadeee235d682d69a
Author: Florian Fainelli <f.fainelli@gmail.com>
Date:   Tue Aug 22 15:24:47 2017 -0700

    fsl/man: Inherit parent device and of_node
    
    
    [ Upstream commit a1a50c8e4c241a505b7270e1a3c6e50d94e794b1 ]
    
    Junote Cai reported that he was not able to get a DSA setup involving the
    Freescale DPAA/FMAN driver to work and narrowed it down to
    of_find_net_device_by_node(). This function requires the network device's
    device reference to be set correctly, which is the case here, but any
    device_node association has been lost.
    
    The problem is that dpaa_eth_add_device() allocates a "dpaa-ethernet" platform
    device, and later on dpaa_eth_probe() is called, but SET_NETDEV_DEV() does not
    propagate &pdev->dev.of_node properly. Fix this by inheriting both the parent
    device and the of_node when dpaa_eth_add_device() creates the platform device.
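    A heavily simplified model of why the of_node matters follows. It
    assumes, without this log confirming it, that
    of_find_net_device_by_node() matches a net device when its parent's
    of_node (or its own) equals the requested node; the struct layouts
    and dev_matches_node() are hypothetical. SET_NETDEV_DEV() makes the
    platform device the net device's parent, so the lookup only succeeds
    once that platform device carries the MAC's of_node.

    ```c
    #include <stddef.h>

    /* Hypothetical, minimal stand-ins for device-tree and device links. */
    struct of_node {
    	const char *name;
    };

    struct device {
    	struct device  *parent;
    	struct of_node *of_node;
    };

    /* Assumed matching rule: parent's of_node first, then the device's own. */
    static int dev_matches_node(const struct device *netdev_dev,
    			    const struct of_node *np)
    {
    	if (netdev_dev->parent && netdev_dev->parent->of_node == np)
    		return 1;
    	return netdev_dev->of_node == np;
    }
    ```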
    
    Fixes: 3933961682a3 ("fsl/fman: Add FMan MAC driver")
    Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1e39e5c6a2ea1f488ad13d351d6c55a5ef530666
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Aug 22 09:39:28 2017 -0700

    udp: on peeking bad csum, drop packets even if not at head
    
    
    [ Upstream commit fd6055a806edc4019be1b9fb7d25262599bca5b1 ]
    
    When peeking, if a bad csum is discovered, the skb is unlinked from
    the queue with __sk_queue_drop_skb and the peek operation restarted.
    
    __sk_queue_drop_skb only drops packets that match the queue head.
    
    This fails if the skb was found after the head using the SO_PEEK_OFF
    socket option, and causes an infinite loop.
    
    We MUST drop this problematic skb, and we can simply check whether it
    was already removed by another thread by looking at skb->next:
    
    this pointer is set to NULL by the __skb_unlink() operation, which
    can only have happened under the spinlock protection.
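    A single-threaded sketch of that drop rule follows, on a hypothetical
    circular queue with a sentinel head modelled loosely on sk_buff_head.
    unlink_buf() plays the part of __skb_unlink() by clearing next/prev,
    and drop_bad_buf() is a hypothetical helper; the caller is assumed to
    hold the queue lock, as the kernel code does.

    ```c
    #include <stdbool.h>
    #include <stddef.h>

    struct buf {
    	struct buf *next, *prev;
    };

    /* Circular list with a sentinel head, like sk_buff_head. */
    struct queue {
    	struct buf head;
    	unsigned int len;
    };

    static void queue_init(struct queue *q)
    {
    	q->head.next = q->head.prev = &q->head;
    	q->len = 0;
    }

    static void queue_tail(struct queue *q, struct buf *b)
    {
    	b->next = &q->head;
    	b->prev = q->head.prev;
    	q->head.prev->next = b;
    	q->head.prev = b;
    	q->len++;
    }

    /* Like __skb_unlink(): clearing next/prev marks the buffer as unlinked. */
    static void unlink_buf(struct queue *q, struct buf *b)
    {
    	b->prev->next = b->next;
    	b->next->prev = b->prev;
    	b->next = b->prev = NULL;
    	q->len--;
    }

    /* Drop a bad buffer wherever it sits, not only at the head. A NULL
     * next means another thread already unlinked it. */
    static bool drop_bad_buf(struct queue *q, struct buf *b)
    {
    	if (b->next == NULL)
    		return false;
    	unlink_buf(q, b);
    	return true;
    }
    ```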
    
    Many thanks to syzkaller team (and particularly Dmitry Vyukov who
    provided us nice C reproducers exhibiting the lockup) and Willem de
    Bruijn who provided first version for this patch and a test program.
    
    Fixes: 627d2d6b5500 ("udp: enable MSG_PEEK at non-zero offset")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Cc: Willem de Bruijn <willemb@google.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Acked-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4b4a194a10e2a2dd7bf3f90016b56ac495a1d37e
Author: Sabrina Dubroca <sd@queasysnail.net>
Date:   Tue Aug 22 15:36:08 2017 +0200

    macsec: add genl family module alias
    
    
    [ Upstream commit 78362998f58c7c271e2719dcd0aaced435c801f9 ]
    
    This allows tools such as wpa_supplicant to start even if the macsec
    module isn't loaded yet, since the generic netlink family lookup can
    then load the module on demand.
    
    Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
    Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 43c792a8488087668f7e1052201e2eeb32150141
Author: Wei Wang <weiwan@google.com>
Date:   Fri Aug 25 15:03:10 2017 -0700

    ipv6: fix sparse warning on rt6i_node
    
    
    [ Upstream commit 4e587ea71bf924f7dac621f1351653bd41e446cb ]
    
    Commit c5cff8561d2d adds an rcu grace period before freeing fib6_node.
    This generates a new sparse warning on rt->rt6i_node related code:
      net/ipv6/route.c:1394:30: error: incompatible types in comparison
      expression (different address spaces)
      ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
      expression (different address spaces)
    
    This commit adds the "__rcu" tag to rt6i_node and makes sure the
    corresponding rcu API is used for it.
    After this fix, sparse no longer generates the above warnings.
    
    Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
    Signed-off-by: Wei Wang <weiwan@google.com>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7f8f23fc8026a7a4f29f49c18a2ebbb529ee3916
Author: Wei Wang <weiwan@google.com>
Date:   Mon Aug 21 09:47:10 2017 -0700

    ipv6: add rcu grace period before freeing fib6_node
    
    
    [ Upstream commit c5cff8561d2d0006e972bd114afd51f082fee77c ]
    
    We currently keep rt->rt6i_node pointing to the fib6_node for the route,
    and some functions use this pointer to dereference the fib6_node
    from the rt structure, e.g. rt6_check(). However, as neither a
    refcount nor rcu is taken when dereferencing rt->rt6i_node, it could
    potentially cause crashes, as rt->rt6i_node could be set to NULL by
    other CPUs during a route deletion.
    This patch introduces an rcu grace period before freeing fib6_node and
    makes sure the functions that dereference it take rcu_read_lock().
    
    Note: there is no "Fixes" tag because this bug has been present since
    a very early stage.
    
    Signed-off-by: Wei Wang <weiwan@google.com>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit dccb31be7ef8984b8fa636b65f74b662db6b3cb3
Author: Stefano Brivio <sbrivio@redhat.com>
Date:   Fri Aug 18 14:40:53 2017 +0200

    ipv6: accept 64k - 1 packet length in ip6_find_1stfragopt()
    
    
    [ Upstream commit 3de33e1ba0506723ab25734e098cf280ecc34756 ]
    
    A packet length of exactly IPV6_MAXPLEN is allowed, so we should
    refuse to parse options only if the size is 64KiB or more.
    
    While at it, remove one extra variable and one assignment which
    were also introduced by the commit that introduced the size
    check. Checking the sum 'offset + len' and only later adding
    'len' to 'offset' doesn't provide any advantage over directly
    summing to 'offset' and checking it.
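    A sketch of the fixed bounds check next to the old one follows. The
    helper names are hypothetical, and they return -1 where the kernel
    returns -EINVAL; IPV6_MAXPLEN is 65535 (64k - 1). Adding first and
    rejecting only a sum strictly above IPV6_MAXPLEN accepts an offset of
    exactly 65535, which the old ">=" test refused.

    ```c
    #define IPV6_MAXPLEN 65535

    /* Fixed rule: sum directly into offset, then reject only > 65535. */
    static int advance_offset(unsigned int *offset, unsigned int optlen)
    {
    	*offset += optlen;
    	if (*offset > IPV6_MAXPLEN)
    		return -1;
    	return 0;
    }

    /* Pre-fix rule, for comparison: an extra variable and a ">=" test. */
    static int advance_offset_old(unsigned int *offset, unsigned int optlen)
    {
    	unsigned int len = optlen;

    	if (len + *offset >= IPV6_MAXPLEN)
    		return -1;
    	*offset += len;
    	return 0;
    }
    ```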
    
    Fixes: 6399f1fae4ec ("ipv6: avoid overflow of offset in ip6_find_1stfragopt")
    Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
