Introduce nr_busy_cpus in the struct sched_group_power [Not in sched_group
because sched groups are duplicated for the SD_OVERLAP scheduler domain]
and for each cpu that enters and exits idle, this parameter will
be updated in each scheduler group of the scheduler domain that this cpu
belongs to.
To avoid the frequent update of this state as the cpu enters
and exits idle, the update of the stat during idle exit is
delayed to the first timer tick that happens after the cpu becomes busy.
This is done using NOHZ_IDLE flag in the struct rq's nohz_flags.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20111202010832.555984323@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce nohz_flags in the struct rq, which will track these two flags
for now.
NOHZ_TICK_STOPPED keeps track of the tick stopped status that gets set when
the tick is stopped. It will be used to update the nohz idle load balancer data
structures during the first busy tick after the tick is restarted. At this
first busy tick after tickless idle, NOHZ_TICK_STOPPED flag will be reset.
This will minimize the nohz idle load balancer status updates that currently
happen for every tickless exit, making it more scalable when there
are many logical cpu's that enter and exit idle often.
NOHZ_BALANCE_KICK will track the need for nohz idle load balance
on this rq. This will replace the nohz_balance_kick in the rq, which was
not being updated atomically.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20111202010832.499438999@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The second call to sched_rt_period() is redundant, because the value of the
rt_runtime was already read and it was protected by the ->rt_runtime_lock.
Signed-off-by: Shan Hai <haishan.bai@gmail.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322535836-13590-2-git-send-email-haishan.bai@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For the SD_OVERLAP domain, sched_groups for each CPU's sched_domain are
privately allocated and not shared with any other cpu. So the
sched group allocation should come from the cpu's node for which
SD_OVERLAP sched domain is being setup.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111118230554.164910950@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is another case where we are on our way to schedule(),
so can save a useless clock update and resulting microscopic
vruntime update.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1321971686.6855.18.camel@marge.simson.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Instead of going through the scheduler domain hierarchy multiple times
(for giving priority to an idle core over an idle SMT sibling in a busy
core), start with the highest scheduler domain with the SD_SHARE_PKG_RESOURCES
flag and traverse the domain hierarchy down till we find an idle group.
This cleanup also addresses an issue reported by Mike where the recent
changes returned the busy thread even in the presence of an idle SMT
sibling in single socket platforms.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1321556904.15339.25.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This tracepoint shows how long a task is sleeping in uninterruptible state.
E.g. it may show how long and where a mutex is waited for.
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322471015-107825-8-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There's too many sched*.[ch] files in kernel/, give them their own
directory.
(No code changed, other than Makefile glue added.)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since once needs to do something at conferences and fixing compile
warnings doesn't actually require much if any attention I decided
to break up the sched.c #include "*.c" fest.
This further modularizes the scheduler code.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-x0fcd3mnp8f9c99grcpewmhi@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It improves perfomance, especially if autogroup is enabled.
The size of set_task_rq() was 0x180 and now it is 0xa0.
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1321020240-3874331-1-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Even though there are no siblings, the list should be
initialized to not contain bogus values.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Acked-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1320182360-20043-2-git-send-email-glommer@parallels.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that the linkage of jump-labels has been fixed they show a measurable
improvement in overhead for the enabled-but-unused case.
Workload is:
'taskset -c 0 perf stat --repeat 50 -e instructions,cycles,branches
bash -c "for ((i=0;i<5;i++)); do $(dirname $0)/pipe-test 20000; done"'
There's a speedup for all situations:
instructions cycles branches
-------------------------------------------------------------------------
Intel Westmere
base 806611770 745895590 146765378
+jumplabel 803090165 (-0.44%) 713381840 (-4.36%) 144561130
AMD Barcelona
base 824657415 740055589 148855354
+jumplabel 821056910 (-0.44%) 737558389 (-0.34%) 146635229
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111108042736.560831357@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In return_cfs_rq_runtime() we want to return bandwidth when there are no
remaining tasks, not "return" when this is the case.
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111108042736.623812423@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Avoid select_idle_sibling() from picking a sibling thread if there's
an idle core that shares cache.
This fixes SMT balancing in the increasingly common case where there's
a shared cache core available to balance to.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1321350377.1421.55.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In UP systems, the idle task is initialized using the init_task
structure from which the command name is taken (currently "swapper").
In SMP systems, one idle task per CPU is forked by the worker thread
from which the task structure is copied. The command name is, therefore,
"kworker/0:0" or "kworker/0:1", if not updated. Since such update was
lacking, all idle tasks in SMP systems were incorrectly named. This
longtime bug was not discovered immediately, because there is no /proc/0
entry - the bug only becomes apparent when tracing is enabled.
This patch sets the command name of the idle tasks in SMP systems to the
name that is used in the INIT_TASK structure suffixed by a slash and the
number of the CPU.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111026211708.768925506@osadl.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Normally the RT bandwidth scheme will share bandwidth across the
entire root_domain. However sometimes its convenient to disable this
sharing for debug purposes. Provide a simple feature switch to this
end.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The return-value convention for these functions varies depending on
whether they're interruptible or can timeout. It can be a little
confusing--document it.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111006192246.GB28026@fieldses.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Every time I have to stare at this function I need to completely
reverse engineer its workings, about time I write a comment
explaining the thing.
Collected bits and pieces from previous changelogs, mostly:
4be9daaa1b83378269a5
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1318518057.27731.2.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'rmobile-fixes-for-linus' of git://github.com/pmundt/linux-sh:
ARM: mach-shmobile: cpuidle single/global and last_state fixes
ARM: mach-shmobile: move helper macro PORTCR to sh_pfc.h
ARM: mach-shmobile: move helper macro PORT_xx to sh_pfc.h
ARM: mach-shmobile: move helper macro PORT_DATA_xx to sh_pfc.h
ARM: mach-shmobile: ap4evb: remove white space from end of line
ARM: mach-shmobile: clock-sh7372: remove un-necessary index
ARM: mach-shmobile: kota2: add comment out separator
ARM: mach-shmobile: sh73a0: add MMC data pin pull-up
Commit 31a3ddda16 introduced
a use after free in virtio-pci. The main issue is
that the release method signals removal of the virtio device,
while remove signals removal of the pci device.
For example, on driver removal or hot-unplug,
virtio_pci_release_dev is called before virtio_pci_remove.
We then might get a crash as virtio_pci_remove tries to use the
device freed by virtio_pci_release_dev.
We allocate/free all resources together with the
pci device, so we can leave the release method empty.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
After commit e978aa7d7d ("cpuidle: Move dev->last_residency update to
driver enter routine; remove dev->last_state") setting acpi_idle_suspend
to 1 by acpi_processor_suspend() causes the ACPI cpuidle routines to
return error codes continuously, which in turn causes cpuidle to lock up
(hard).
However, acpi_idle_suspend doesn't appear to be useful for any
particular purpose (it's racy and doesn't really provide any real
protection), so it can be removed, which makes the problem go away.
Reported-and-tested-by: Tomas M. <tmezzadra@gmail.com>
Reported-and-tested-by: Ferenc Wagner <wferi@niif.hu>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I missed the combios path when I updated the atombios pm code.
Reported by amarsh04 on IRC.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The variable i is removed by commit ded8433
"[CPUFREQ] db8500: remove unneeded for loop iteration over freq_table",
but current code to print available frequencies still uses the i variable.
Thus add the i variable back to fix below buld error:
CC drivers/cpufreq/db8500-cpufreq.o
drivers/cpufreq/db8500-cpufreq.c: In function 'db8500_cpufreq_init':
drivers/cpufreq/db8500-cpufreq.c:123: error: 'i' undeclared (first use in this function)
drivers/cpufreq/db8500-cpufreq.c:123: error: (Each undeclared identifier is reported only once
drivers/cpufreq/db8500-cpufreq.c:123: error: for each function it appears in.)
make[2]: *** [drivers/cpufreq/db8500-cpufreq.o] Error 1
make[1]: *** [drivers/cpufreq] Error 2
make: *** [drivers] Error 2
This patch also fixes using uninitialized i variable as array index.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Dave Jones <davej@redhat.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: (29 commits)
m68k/mac: Remove mac_irq_{en,dis}able() wrappers
m68k/irq: Remove obsolete support for user vector interrupt fixups
m68k/irq: Remove obsolete m68k irq framework
m68k/q40: Convert Q40/Q60 to genirq
m68k/sun3: Convert Sun3/3x to genirq
m68k/sun3: Use the kstat_irqs_cpu() wrapper
m68k/apollo: Convert Apollo to genirq
m68k/vme: Convert VME to genirq
m68k/hp300: Convert HP9000/300 and HP9000/400 to genirq
m68k/mac: Optimize interrupts using chain handlers
m68k/mac: Convert Mac to genirq
m68k/amiga: Optimize interrupts using chain handlers
m68k/amiga: Convert Amiga to genirq
m68k/amiga: Refactor amiints.c
m68k/atari: Remove code and comments about different irq types
m68k/atari: Convert Atari to genirq
m68k/irq: Add genirq support
m68k/irq: Remove obsolete IRQ_FLG_* users
m68k/irq: Rename {,__}m68k_handle_int()
m68k/irq: Add m68k_setup_irq_controller()
...
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] v4l2-ctrl: Send change events to all fh for auto cluster slave controls
[media] v4l2-event: Don't set sev->fh to NULL on unsubscribe
[media] v4l2-event: Remove pending events from fh event queue when unsubscribing
[media] v4l2-event: Deny subscribing with a type of V4L2_EVENT_ALL
[media] MAINTAINERS: add a maintainer for s5p-mfc driver
[media] v4l: s5p-mfc: fix reported capabilities
[media] media: vb2: reset queued list on REQBUFS(0) call
[media] media: vb2: set buffer length correctly for all buffer types
[media] media: vb2: add a check for uninitialized buffer
[media] mxl111sf: fix build warning
[media] mxl111sf: remove pointless if condition in mxl111sf_config_spi
[media] mxl111sf: check for errors after mxl111sf_write_reg in mxl111sf_idac_config
[media] mxl111sf: fix return value of mxl111sf_idac_config
[media] uvcvideo: GET_RES should only be checked for BITMAP type menu controls
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/kvm: Fix build failure with HV KVM and CBE
powerpc/ps3: Fix lv1_gpu_attribute hcall
powerpc/ps3: Fix PS3 repository build warnings
powerpc/ps3: irq: Remove IRQF_DISABLED
powerpc/irq: Remove IRQF_DISABLED
powerpc/numa: NUMA topology support for PowerNV
powerpc: Add System RAM to /proc/iomem
powerpc: Add KVM as module to defconfigs
powerpc/kvm: Fix build with older toolchains
powerpc, tqm5200: update tqm5200_defconfig to fit for charon board.
powerpc/5200: add support for charon board
This needed the sfi IRQ 0xFF fix to go in first. It simply plumbs in the
bma023 driver with the firmware naming of it.
Signed-off-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Real world year equals the value in vrtc YEAR register plus an offset.
We used 1960 as the offset to make leap year consistent, but for a
device's first use, its YEAR register is 0 and the system year will
be parsed as 1960 which is not a valid UNIX time and will cause many
applications to fail mysteriously. So we use 1972 instead to fix this
issue.
Updated patch which adds a sanity check suggested by Mathias
This isn't a change in behaviour for systems, because 1972 is the one we
actually use. It's the old version in upstream which is out of sync with
all devices.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a build error. CE4100 with no serial errors because the alternate
function is only a prototype not a null function as intended.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: rename the option to nospace_cache
Btrfs: handle bio_add_page failure gracefully in scrub
Btrfs: fix deadlock caused by the race between relocation
Btrfs: only map pages if we know we need them when reading the space cache
Btrfs: fix orphan backref nodes
Btrfs: Abstract similar code for btrfs_block_rsv_add{, _noflush}
Btrfs: fix unreleased path in btrfs_orphan_cleanup()
Btrfs: fix no reserved space for writing out inode cache
Btrfs: fix nocow when deleting the item
Btrfs: tweak the delayed inode reservations again
Btrfs: rework error handling in btrfs_mount()
Btrfs: close devices on all error paths in open_ctree()
Btrfs: avoid null dereference and leaks when bailing from open_ctree()
Btrfs: fix subvol_name leak on error in btrfs_mount()
Btrfs: fix memory leak in btrfs_parse_early_options()
Btrfs: fix our reservations for updating an inode when completing io
Btrfs: fix oops on NULL trans handle in btrfs_truncate
btrfs: fix double-free 'tree_root' in 'btrfs_mount()'
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Use kmemdup rather than duplicating its implementation
ALSA: hda - Re-enable the check NO_PRESENCE misc bit
ALSA: vmaster - Free slave-links when freeing the master element
ALSA: hda - Don't add elements of other codecs to vmaster slave
ALSA: intel8x0: improve virtual environment detection
ALSA: intel8x0: move virtual environment detection code into one place
ALSA: snd_usb_audio: add Logitech HD Webcam c510 to quirk-384
ALSA: hda - fix internal mic on Dell Vostro 3500 laptop
ALSA: HDA: Remove quirk for Toshiba T110
ALSA: usb-audio - Fix the missing volume quirks at delayed init
ALSA: hda - Mute unused capture sources for Realtek codecs
ALSA: intel8x0: Improve comments for VM optimization
ASoC: Ensure we get an impedence reported for WM8958 jack detect
ASoC: Don't use wm8994->control_data when requesting IRQs
ASoC: Don't use wm8994->control_data in wm8994_readable_register()
ASoC: Update git repository URL
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (42 commits)
drm/radeon/kms/pm: switch to dynamically allocating clock mode array
drm/radeon/kms: optimize r600_pm_profile_init
drm/radeon/kms/pm: add a proper pm profile init function for fusion
drm/radeon/kms: remove extraneous calls to radeon_pm_compute_clocks()
drm/exynos: added padding to be 64-bit align.
drm: fix kconfig unmet dependency warning
drm: add some comments to drm_wait_vblank and drm_queue_vblank_event
drm/radeon/benchmark: signedness bug in radeon_benchmark_move()
drm: do not sleep on vblank while holding a mutex
MAINTAINERS: exynos: Add EXYNOS DRM maintainer entry
drm: try to restore previous CRTC config if mode set fails
drm/radeon/kms: make an aux failure debug only
drm: drop select of SLOW_WORK
drm: serialize access to list of debugfs files
drm/radeon/kms: fix use of vram scratch page on evergreen/ni
drm/radeon: Make sure CS mutex is held across GPU reset.
drm: Ensure string is null terminated.
vmwgfx: Only allow 64x64 cursors
vmwgfx: Initialize clip rect loop correctly in surface dirty
vmwgfx: Close screen object system
...
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix force shutdown handling in xfs_end_io
xfs: constify xfs_item_ops
xfs: Fix possible memory corruption in xfs_readlink
The following error is seen in some case when mounting rootfs from
SD/MMC cards.
Waiting for root device /dev/mmcblk0p1...
mmc1: host does not support reading read-only switch. assuming write-enable.
mmc1: new high speed SDHC card at address b368
mmcblk0: mmc1:b368 SDC 3.74 GiB
mmcblk0: p1
mmc1: Timeout waiting for hardware interrupt.
mmcblk0: error -110 transferring data, sector 3678224, nr 40, cmd response 0x900, card status 0xc00
end_request: I/O error, dev mmcblk0, sector 3678225
Buffer I/O error on device mmcblk0p1, logical block 458754
lost page write due to I/O error on mmcblk0p1
This patch fixes the problem by lowering the usdhc clock and correcting
watermark configuration.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Chris Ball <cjb@laptop.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
The AUTO_ZRELADDR selection for ARCH_IMX_V4_V5 and ARCH_MX5 should
really be mutually exclusive to ZBOOT_ROM just like what ARCH_IMX_V6_V7
does.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
The config symbol ARCH_MX3 has been removed by commit 'a89cf59
arm/imx: merge i.MX3 and i.MX6', and it should not be referenced
any more.
The patch also change ARCH_MX* to SOC_IMX* for other platforms.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Rename no_space_cache option to nospace_cache to be more consistent with
the rest, where the simple prefix 'no' is used to negate an option.
The option has been introduced during the -rc1 cycle and there are has not been
widely used, so it's safe.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>