kernel_optimize_test

Author	SHA1	Message	Date
Max Filippov	da844a8177	xtensa: add device trees support Device trees allow specification of hardware topology and device parameters at runtime instead of hard-coding them in platform setup code. This allows running single binary kernel on a range of compatible boards. New boot parameters tag BP_TAG_FDT is allocated and a pointer to flat device tree is passed in it. Note that current interrupt mapping scheme uses single cell for interrupt identification. That means that IRQ numbers used in DTS must be CPU internal IRQ numbers, not external. It is possible to extend interrupt identification to two cells, and use second cell to tell external IRQ numbers form internal. That would allow to use single DTS on multiple boards with different mapping of external IRQ numbers. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:23 -08:00
Max Filippov	2206d5dd9a	xtensa: add IRQ domains support IRQ domains provide a mechanism for conversion of linux IRQ numbers to hardware IRQ numbers and vice versus. It is used by OpenFirmware for linking device tree objects to their respective interrupt controllers. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:23 -08:00
Max Filippov	0322cabd39	xtensa: add U-Boot image support (uImage). Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:23 -08:00
Max Filippov	3f5ec298e5	xtensa: clean up boot make rules - remove duplicate rules for binary and packed image - use predefined macros for ld/objcopy/gzip - remove build-id section from bootable elf image Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:23 -08:00
Max Filippov	599bf77a0d	xtensa: fix mb and wmb definitions Define mb and wmb as memw to force memory barrier. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:23 -08:00
Max Filippov	71872b5fb2	xtensa: add s32c1i-based spinlock implementations Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:22 -08:00
Max Filippov	e5a9f6adba	xtensa: add s32c1i-based bitops implementations Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:22 -08:00
Max Filippov	219b1e4c61	xtensa: add s32c1i-based atomic ops implementations Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:22 -08:00
Max Filippov	00273125c3	xtensa: add s32c1i sanity check Add a brief sanity test of S32C1I functionality. This instruction is needed by the kernel and userland as part of the base ABI (including GCC atomic builtins, certain threading packages, future atomic support in the C++ standard, etc). However, correct operation of this instruction requires some cooperation by hardware external to the processor (such as bus bridge, bus fabric, or memory controller). Minimally exercising this mechanism and reporting explicit status early in the boot process is helpful to chip vendors using the Linux kernel as a benchmark of correctness of hardware. As it turns out, S32C1I is not exercised by the kernel and by uClibc based userland as of early June 2008. This is expected to change soon as both incorporate more recent open source developments. Signed-off-by: Marc Gauthier <marc@tensilica.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:22 -08:00
Max Filippov	28570e8dac	xtensa: add trap_set_handler function trap_set_handler sets new C-handler in the exception table and returns previous handler. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:22 -08:00
Max Filippov	c622b29d1f	xtensa: initialize atomctl SR In order to use S32C1I instruction on cores with ATOMCTL SR the register must be properly initialized. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:22 -08:00
Max Filippov	733536b865	xtensa: save and restore scompare1 SR on kernel entry Although scompare1 may be saved/restored by xchal_ncp_{load,store} macros, explicit save/restore of registers manipulated by the kernel itself is considered more correct. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:22 -08:00
Max Filippov	2f6ea6a767	xtensa: display s32c1i feature flag in cpuinfo Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:21 -08:00
Max Filippov	415217efc1	xtensa: fix CPU cache flags formatting Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:21 -08:00
Max Filippov	288dc2b68c	xtensa: properly fix missing compiler barrier in simcall Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:21 -08:00
Max Filippov	382cb5b917	xtensa: fix build warning for arch/xtensa/mm/tlb.c Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:21 -08:00
Max Filippov	35b16a9a09	xtensa: provide DMA_ERROR_CODE definition This fixes the following allmodconfig build error: drivers/uio/uio_dmem_genirq.c:95:18: error: 'DMA_ERROR_CODE' undeclared (first use in this function) drivers/uio/uio_dmem_genirq.c:238:18: error: 'DMA_ERROR_CODE' undeclared (first use in this function) make[3]: *** [drivers/uio/uio_dmem_genirq.o] Error 1 Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:21 -08:00
Max Filippov	94d6c61b97	xtensa: ISS: add BASE_BAUD definition to serial.h This fixes the following build error in allyesconfig: drivers/tty/serial/8250/8250_early.c: In function 'parse_options': drivers/tty/serial/8250/8250_early.c:160:18: error: 'BASE_BAUD' undeclared (first use in this function) Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:21 -08:00
Chris Zankel	d1538c4675	xtensa: provide proper assembler function boundaries with ENDPROC() Use ENDPROC() to mark the end of assembler functions. Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:20 -08:00
Max Filippov	c0226e34a4	xtensa: make DoubleExceptionVector literals fit the gap Manually load references to exc_table from the explicit literal in order to fit DoubleExceptionVector.literals into the available 16-byte gap before DoubleExceptionVector.text in the absence of link time relaxation. Without this fix DoubleExceptionVector.literal section overlaps DoubleExceptionVector.text section in the linked vmlinux image. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:20 -08:00
Chris Zankel	6550162200	xtensa: add config option to disable linker relaxation The default linker behavior is to optimize identical literal values and remove unnecessary overhead from assembler-generated "longcall" sequences to reduce code size. Provide an option to disable this behavior to improve compile time. Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:20 -08:00
Nicolas Kaiser	02b25d811f	xtensa: unbalanced parentheses Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:20 -08:00
Wanlong Gao	09378d7c21	xtensa:fix the incompatible pointer type warning in time.c Fix the definition of the function ccount_read to be compatible to the member read of the structure clocksource. Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>	2012-12-18 21:10:20 -08:00
Linus Torvalds	752451f01c	Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux Pull i2c-embedded changes from Wolfram Sang: - CBUS driver (an I2C variant) - continued rework of the omap driver - s3c2410 gets lots of fixes and gains pinctrl support - at91 gains DMA support - the GPIO muxer gains devicetree probing - typical fixes and additions all over * 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux: (45 commits) i2c: omap: Remove the OMAP_I2C_FLAG_RESET_REGS_POSTIDLE flag i2c: at91: add dma support i2c: at91: change struct members indentation i2c: at91: fix compilation warning i2c: mxs: Do not disable the I2C SMBus quick mode i2c: mxs: Handle i2c DMA failure properly i2c: s3c2410: Remove recently introduced performance overheads i2c: ocores: Move grlib set/get functions into #ifdef CONFIG_OF block i2c: s3c2410: Add fix for i2c suspend/resume i2c: s3c2410: Fix code to free gpios i2c: i2c-cbus-gpio: introduce driver i2c: ocores: Add support for the GRLIB port of the controller and use function pointers for getreg and setreg functions i2c: ocores: Add irq support for sparc i2c: omap: Move the remove constraint ARM: dts: cfa10049: Add the i2c muxer buses to the CFA-10049 i2c: s3c2410: do not special case HDMIPHY stuck bus detection i2c: s3c2410: use exponential back off while polling for bus idle i2c: s3c2410: do not generate STOP for QUIRK_HDMIPHY i2c: s3c2410: grab adapter lock while changing i2c clock i2c: s3c2410: Add support for pinctrl ...	2012-12-18 16:51:10 -08:00
Shawn Guo	c1e37ea287	net: fec: forbid FEC_PTP on SoCs that do not support Beside imx6q, the kernel built from imx_v6_v7_defconfig is also supposed to be running on other IMX SoCs that do not have the PTP block. Before fec driver gets fixed to run-time detect target hardware rather than conditional compiling with #ifdef CONFIG_FEC_PTP, let's give it a quick fix in Kconfig to forbid FEC_PTP on those IMX SoCs that do not support PTP. Reported-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-18 16:26:18 -08:00
Sathya Perla	d23e946cb6	be2net: fix wrong frag_idx reported by RX CQ The RX CQ can report completions with invalid frag_idx when the RXQ that was previously using it, was not cleaned up properly. This hits a BUG_ON() in be2net. When completion coalescing is enabled on a CQ, an explicit CQ-notify (with rearm) is needed for each compl, to flush partially coalesced CQ entries that are pending DMA. In be_close(), this fix now notifies CQ for each compl, waits explicitly for the flush compl to arrive and complains if it doesn't arrive. Also renaming be_crit_error() to be_hw_error() as it's the more appropriate name and to convey that we don't wait for the flush compl only when a HW error has occurred. Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-18 16:18:39 -08:00
Sathya Perla	a323d9bf83	be2net: fix be_close() to ensure all events are ack'ed In be_close(), be_eq_clean() must be called after all RX/TX/MCC queues have been cleaned to ensure that any events caused while cleaning up completions are notified/acked. Not clearing all events can cause upredictable behaviour when RX rings are re-created in the subsequent be_open(). Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-18 16:18:39 -08:00
David S. Miller	4a9d1946b0	sparc64: Define pte_accessible() We can elide flush_tlb_*() calls when _PAGE_VALID is clear as that is the test used to determine whether or not to queue up a TLB flush in set_pte_at(). Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-18 16:06:16 -08:00
Dave Kleikamp	6cb9c36975	sparc: huge_ptep_set_* functions need to call set_huge_pte_at() Modifying the huge pte's requires that all the underlying pte's be modified. Version 2: added missing flush_tlb_page() Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-18 15:36:20 -08:00
Linus Torvalds	673ab8783b	Merge branch 'akpm' (more patches from Andrew) Merge patches from Andrew Morton: "Most of the rest of MM, plus a few dribs and drabs. I still have quite a few irritating patches left around: ones with dubious testing results, lack of review, ones which should have gone via maintainer trees but the maintainers are slack, etc. I need to be more activist in getting these things wrapped up outside the merge window, but they're such a PITA." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (48 commits) mm/vmscan.c: avoid possible deadlock caused by too_many_isolated() vmscan: comment too_many_isolated() mm/kmemleak.c: remove obsolete simple_strtoul mm/memory_hotplug.c: improve comments mm/hugetlb: create hugetlb cgroup file in hugetlb_init mm/mprotect.c: coding-style cleanups Documentation: ABI: /sys/devices/system/node/ slub: drop mutex before deleting sysfs entry memcg: add comments clarifying aspects of cache attribute propagation kmem: add slab-specific documentation about the kmem controller slub: slub-specific propagation changes slab: propagate tunable values memcg: aggregate memcg cache values in slabinfo memcg/sl[au]b: shrink dead caches memcg/sl[au]b: track all the memcg children of a kmem_cache memcg: destroy memcg caches sl[au]b: allocate objects from memcg cache sl[au]b: always get the cache from its page in kmem_cache_free() memcg: skip memcg kmem allocations in specified code regions memcg: infrastructure to match an allocation to the right cache ...	2012-12-18 15:08:12 -08:00
Linus Torvalds	d7b96ca5d0	Fix fallout from __devexit removal -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJQzoH6AAoJEMsfJm/On5mBk8MQAJgBifZmikVrhleYIaCW7p4g MwyL3+2TFE1Hdgr8j2cblaOm4rfSbmAAOrPHcZQHY6lAeEJpwClce1SzyAL7C97T GLYfPegUjN1Itvs+jL3dEv9Y9xmIcIvSG18IScjLhaazIYz8sWprsGs62BVX9BnC UZMZBHqSpyOuh2NZSJSGj2J3ZGkDBeHI82N2FidLP0Xh2bki/Q/H4AY+f3+oomLC DTXiF9URH2qbS0oSpKgpfVzxA2sS4JF/i70AvBLYFlG5YjtGOQBtPxoPHNAgBdLH vZ7a/hEdExhqP/2l7m7yRaLVwCKi5o/Mf8xCFpygNu/vo66u2sw7Mo3TjFQ18iVl 9hDwTz1Uq7TlBsORHx6cngHIq8V6s6dxt1zOe68txjdH82LFEJEB4ybRdXoAPHc/ GRuCe2RnYTmMLwRnVmAuNTfq32fXPu6lq4zwFrK5jxMSMBkqsQWtBkJRKOy09vGb mp71PUnv5N1/SEcS4+ddjKnMQ/WBl4pB5qkkGer/vSxeQSWbyfaLfruR0KdEBhnY bK/JFZVN9jII040yk3gxQmOmZQLnsHy6U8mOVDAeDAIII2gpPL2X+IQOmmnHY47g pUF6zh70eRMD37+OzB4VuxmR9zS7PDGWPO3i1VuAGLN0FNdD1b9Hhdnmmf1xQM7a 7gQL4V/S3epB3GtvXJO7 =uh1S -----END PGP SIGNATURE----- Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixlet from Guenter Roeck: "Fix fallout from __devexit removal" * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (twl4030-madc-hwmon) Fix warning message caused by removal of __devexit	2012-12-18 15:06:11 -08:00
Linus Torvalds	5b3040a48b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile Pull tile updates from Chris Metcalf: "These are a smattering of minor changes from Tilera and other folks, mostly in the ptrace area." * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: arch/tile: set CORE_DUMP_USE_REGSET on tile arch/tile: implement arch_ptrace using user_regset on tile arch/tile: implement user_regset interface on tile arch/tile: clean up tile-specific PTRACE_SETOPTIONS arch/tile: provide PT_FLAGS_COMPAT value in pt_regs tile/PCI: use for_each_pci_dev to simplify the code tilegx: remove __init from pci fixup hook	2012-12-18 15:05:30 -08:00
Fengguang Wu	3cf23841b4	mm/vmscan.c: avoid possible deadlock caused by too_many_isolated() Neil found that if too_many_isolated() returns true while performing direct reclaim we can end up waiting for other threads to complete their direct reclaim. If those threads are allowed to enter the FS or IO to free memory, but this thread is not, then it is possible that those threads will be waiting on this thread and so we get a circular deadlock. some task enters direct reclaim with GFP_KERNEL => too_many_isolated() false => vmscan and run into dirty pages => pageout() => take some FS lock => fs/block code does GFP_NOIO allocation => enter direct reclaim again => too_many_isolated() true => waiting for others to progress, however the other tasks may be circular waiting for the FS lock.. The fix is to let !__GFP_IO and !__GFP_FS direct reclaims enjoy higher priority than normal ones, by lowering the throttle threshold for the latter. Allowing ~1/8 isolated pages in normal is large enough. For example, for a 1GB LRU list, that's ~128MB isolated pages, or 1k blocked tasks (each isolates 32 4KB pages), or 64 blocked tasks per logical CPU (assuming 16 logical CPUs per NUMA node). So it's not likely some CPU goes idle waiting (when it could make progress) because of this limit: there are much more sleeping reclaim tasks than the number of CPU, so the task may well be blocked by some low level queue/lock anyway. Now !GFP_IOFS reclaims won't be waiting for GFP_IOFS reclaims to progress. They will be blocked only when there are too many concurrent !GFP_IOFS reclaims, however that's very unlikely because the IO-less direct reclaims is able to progress much more faster, and they won't deadlock each other. The threshold is raised high enough for them, so that there can be sufficient parallel progress of !GFP_IOFS reclaims. [akpm@linux-foundation.org: tweak comment] Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Torsten Kaiser <just.for.lkml@googlemail.com> Tested-by: NeilBrown <neilb@suse.de> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:15 -08:00
Fengguang Wu	d37dd5dcb9	vmscan: comment too_many_isolated() Comment "Why it's doing so" rather than "What it does" as proposed by Andrew Morton. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:15 -08:00
Abhijit Pawar	dc053733ea	mm/kmemleak.c: remove obsolete simple_strtoul Replace the obsolete simple_strtoul() with kstrtoul(). Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:15 -08:00
Tang Chen	79a4dcefd3	mm/memory_hotplug.c: improve comments Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:15 -08:00
Jianguo Wu	7179e7bf45	mm/hugetlb: create hugetlb cgroup file in hugetlb_init Build kernel with CONFIG_HUGETLBFS=y,CONFIG_HUGETLB_PAGE=y and CONFIG_CGROUP_HUGETLB=y, then specify hugepagesz=xx boot option, system will fail to boot. This failure is caused by following code path: setup_hugepagesz hugetlb_add_hstate hugetlb_cgroup_file_init cgroup_add_cftypes kzalloc <--slab is not available yet For this path, slab is not available yet, so memory allocated will be failed, and cause WARN_ON() in hugetlb_cgroup_file_init(). So I move hugetlb_cgroup_file_init() into hugetlb_init(). [akpm@linux-foundation.org: tweak coding-style, remove pointless __init on inlined function] [akpm@linux-foundation.org: fix warning] Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:15 -08:00
Andrew Morton	7d12efaea7	mm/mprotect.c: coding-style cleanups A few gremlins have recently crept in. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:15 -08:00
Davidlohr Bueso	5bbe1ec11f	Documentation: ABI: /sys/devices/system/node/ Describe NUMA node sysfs files/attributes. Note that for the specific dates and contacts I couldn't find, I left it as default for Oct 2002 and linux-mm. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:15 -08:00
Glauber Costa	5413dfba88	slub: drop mutex before deleting sysfs entry Sasha Levin recently reported a lockdep problem resulting from the new attribute propagation introduced by kmemcg series. In short, slab_mutex will be called from within the sysfs attribute store function. This will create a dependency, that will later be held backwards when a cache is destroyed - since destruction occurs with the slab_mutex held, and then calls in to the sysfs directory removal function. In this patch, I propose to adopt a strategy close to what __kmem_cache_create does before calling sysfs_slab_add, and release the lock before the call to sysfs_slab_remove. This is pretty much the last operation in the kmem_cache_shutdown() path, so we could do better by splitting this and moving this call alone to later on. This will fit nicely when sysfs handling is consistent between all caches, but will look weird now. Lockdep info: ====================================================== [ INFO: possible circular locking dependency detected ] 3.7.0-rc4-next-20121106-sasha-00008-g353b62f #117 Tainted: G W ------------------------------------------------------- trinity-child13/6961 is trying to acquire lock: (s_active#43){++++.+}, at: sysfs_addrm_finish+0x31/0x60 but task is already holding lock: (slab_mutex){+.+.+.}, at: kmem_cache_destroy+0x22/0xe0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (slab_mutex){+.+.+.}: lock_acquire+0x1aa/0x240 __mutex_lock_common+0x59/0x5a0 mutex_lock_nested+0x3f/0x50 slab_attr_store+0xde/0x110 sysfs_write_file+0xfa/0x150 vfs_write+0xb0/0x180 sys_pwrite64+0x60/0xb0 tracesys+0xe1/0xe6 -> #0 (s_active#43){++++.+}: __lock_acquire+0x14df/0x1ca0 lock_acquire+0x1aa/0x240 sysfs_deactivate+0x122/0x1a0 sysfs_addrm_finish+0x31/0x60 sysfs_remove_dir+0x89/0xd0 kobject_del+0x16/0x40 __kmem_cache_shutdown+0x40/0x60 kmem_cache_destroy+0x40/0xe0 mon_text_release+0x78/0xe0 __fput+0x122/0x2d0 ____fput+0x9/0x10 task_work_run+0xbe/0x100 do_exit+0x432/0xbd0 do_group_exit+0x84/0xd0 get_signal_to_deliver+0x81d/0x930 do_signal+0x3a/0x950 do_notify_resume+0x3e/0x90 int_signal+0x12/0x17 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(slab_mutex); lock(s_active#43); lock(slab_mutex); lock(s_active#43); * DEADLOCK * 2 locks held by trinity-child13/6961: #0: (mon_lock){+.+.+.}, at: mon_text_release+0x25/0xe0 #1: (slab_mutex){+.+.+.}, at: kmem_cache_destroy+0x22/0xe0 stack backtrace: Pid: 6961, comm: trinity-child13 Tainted: G W 3.7.0-rc4-next-20121106-sasha-00008-g353b62f #117 Call Trace: print_circular_bug+0x1fb/0x20c __lock_acquire+0x14df/0x1ca0 lock_acquire+0x1aa/0x240 sysfs_deactivate+0x122/0x1a0 sysfs_addrm_finish+0x31/0x60 sysfs_remove_dir+0x89/0xd0 kobject_del+0x16/0x40 __kmem_cache_shutdown+0x40/0x60 kmem_cache_destroy+0x40/0xe0 mon_text_release+0x78/0xe0 __fput+0x122/0x2d0 ____fput+0x9/0x10 task_work_run+0xbe/0x100 do_exit+0x432/0xbd0 do_group_exit+0x84/0xd0 get_signal_to_deliver+0x81d/0x930 do_signal+0x3a/0x950 do_notify_resume+0x3e/0x90 int_signal+0x12/0x17 Signed-off-by: Glauber Costa <glommer@parallels.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:15 -08:00
Glauber Costa	ebe945c276	memcg: add comments clarifying aspects of cache attribute propagation This patch clarifies two aspects of cache attribute propagation. First, the expected context for the for_each_memcg_cache macro in memcontrol.h. The usages already in the codebase are safe. In mm/slub.c, it is trivially safe because the lock is acquired right before the loop. In mm/slab.c, it is less so: the lock is acquired by an outer function a few steps back in the stack, so a VM_BUG_ON() is added to make sure it is indeed safe. A comment is also added to detail why we are returning the value of the parent cache and ignoring the children's when we propagate the attributes. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:15 -08:00
Glauber Costa	92e7934955	kmem: add slab-specific documentation about the kmem controller Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:15 -08:00
Glauber Costa	107dab5c92	slub: slub-specific propagation changes SLUB allows us to tune a particular cache behavior with sysfs-based tunables. When creating a new memcg cache copy, we'd like to preserve any tunables the parent cache already had. This can be done by tapping into the store attribute function provided by the allocator. We of course don't need to mess with read-only fields. Since the attributes can have multiple types and are stored internally by sysfs, the best strategy is to issue a ->show() in the root cache, and then ->store() in the memcg cache. The drawback of that, is that sysfs can allocate up to a page in buffering for show(), that we are likely not to need, but also can't guarantee. To avoid always allocating a page for that, we can update the caches at store time with the maximum attribute size ever stored to the root cache. We will then get a buffer big enough to hold it. The corolary to this, is that if no stores happened, nothing will be propagated. It can also happen that a root cache has its tunables updated during normal system operation. In this case, we will propagate the change to all caches that are already active. [akpm@linux-foundation.org: tweak code to avoid __maybe_unused] Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:14 -08:00
Glauber Costa	943a451a87	slab: propagate tunable values SLAB allows us to tune a particular cache behavior with tunables. When creating a new memcg cache copy, we'd like to preserve any tunables the parent cache already had. This could be done by an explicit call to do_tune_cpucache() after the cache is created. But this is not very convenient now that the caches are created from common code, since this function is SLAB-specific. Another method of doing that is taking advantage of the fact that do_tune_cpucache() is always called from enable_cpucache(), which is called at cache initialization. We can just preset the values, and then things work as expected. It can also happen that a root cache has its tunables updated during normal system operation. In this case, we will propagate the change to all caches that are already active. This change will require us to move the assignment of root_cache in memcg_params a bit earlier. We need this to be already set - which memcg_kmem_register_cache will do - when we reach __kmem_cache_create() Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:14 -08:00
Glauber Costa	749c54151a	memcg: aggregate memcg cache values in slabinfo When we create caches in memcgs, we need to display their usage information somewhere. We'll adopt a scheme similar to /proc/meminfo, with aggregate totals shown in the global file, and per-group information stored in the group itself. For the time being, only reads are allowed in the per-group cache. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:14 -08:00
Glauber Costa	2293315293	memcg/sl[au]b: shrink dead caches This means that when we destroy a memcg cache that happened to be empty, those caches may take a lot of time to go away: removing the memcg reference won't destroy them - because there are pending references, and the empty pages will stay there, until a shrinker is called upon for any reason. In this patch, we will call kmem_cache_shrink() for all dead caches that cannot be destroyed because of remaining pages. After shrinking, it is possible that it could be freed. If this is not the case, we'll schedule a lazy worker to keep trying. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:14 -08:00
Glauber Costa	7cf2798240	memcg/sl[au]b: track all the memcg children of a kmem_cache This enables us to remove all the children of a kmem_cache being destroyed, if for example the kernel module it's being used in gets unloaded. Otherwise, the children will still point to the destroyed parent. Signed-off-by: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:14 -08:00
Glauber Costa	1f458cbf12	memcg: destroy memcg caches Implement destruction of memcg caches. Right now, only caches where our reference counter is the last remaining are deleted. If there are any other reference counters around, we just leave the caches lying around until they go away. When that happens, a destruction function is called from the cache code. Caches are only destroyed in process context, so we queue them up for later processing in the general case. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:14 -08:00
Glauber Costa	d79923fad9	sl[au]b: allocate objects from memcg cache We are able to match a cache allocation to a particular memcg. If the task doesn't change groups during the allocation itself - a rare event, this will give us a good picture about who is the first group to touch a cache page. This patch uses the now available infrastructure by calling memcg_kmem_get_cache() before all the cache allocations. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:14 -08:00
Glauber Costa	b9ce5ef49f	sl[au]b: always get the cache from its page in kmem_cache_free() struct page already has this information. If we start chaining caches, this information will always be more trustworthy than whatever is passed into the function. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 15:02:14 -08:00

1 2 3 4 5 ...

347486 Commits