kernel_optimize_test/arch
Cyrill Gorcunov a072738e04 perf, x86: Implement initial P4 PMU driver
The netburst PMU is way different from the "architectural
perfomance monitoring" specification that current CPUs use.
P4 uses a tuple of ESCR+CCCR+COUNTER MSR registers to handle
perfomance monitoring events.

A few implementational details:

1) We need a separate x86_pmu::hw_config helper in struct
   x86_pmu since register bit-fields are quite different from P6,
   Core and later cpu series.

2) For the same reason is a x86_pmu::schedule_events helper
   introduced.

3) hw_perf_event::config consists of packed ESCR+CCCR values.
   It's allowed since in reality both registers only use a half
   of their size. Of course before making a real write into a
   particular MSR we need to unpack the value and extend it to
   a proper size.

4) The tuple of packed ESCR+CCCR in hw_perf_event::config
   doesn't describe the memory address of ESCR MSR register
   so that we need to keep a mapping between these tuples
   used and available ESCR (various P4 events may use same
   ESCRs but not simultaneously), for this sake every active
   event has a per-cpu map of hw_perf_event::idx <--> ESCR
   addresses.

5) Since hw_perf_event::idx is an offset to counter/control register
   we need to lift X86_PMC_MAX_GENERIC up, otherwise kernel
   strips it down to 8 registers and event armed may never be turned
   off (ie the bit in active_mask is set but the loop never reaches
   this index to check), thanks to Peter Zijlstra

Restrictions:

 - No cascaded counters support (do we ever need them?)
 - No dependent events support (so PERF_COUNT_HW_INSTRUCTIONS
   doesn't work for now)
 - There are events with same counters which can't work simultaneously
   (need to use intersected ones due to broken counter 1)
 - No PERF_COUNT_HW_CACHE_ events yet

Todo:

 - Implement dependent events
 - Need proper hashing for event opcodes (no linear search, good for
   debugging stage but not in real loads)
 - Some events counted during a clock cycle -- need to set threshold
   for them and count every clock cycle just to get summary statistics
   (ie to behave the same way as other PMUs do)
 - Need to swicth to use event_constraints
 - To support RAW events we need to encode a global list of P4 events
   into p4_templates
 - Cache events need to be added

Event support status matrix:

 Event			status
 -----------------------------
 cycles			works
 cache-references	works
 cache-misses		works
 branch-misses		works
 bus-cycles		partially (does not work on 64bit cpu with HT enabled)
 instruction		doesnt work (needs dependent event [mop tagging])
 branches		doesnt work

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100311165439.GB5129@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-11 18:51:08 +01:00
..
alpha alpha: PTR_ERR overwrites -EINVAL in syscall osf_mount 2010-03-06 11:26:27 -08:00
arm perf: Provide generic perf_sample_data initialization 2010-03-10 13:22:23 +01:00
avr32 USB: atmel uaba: Adding invert vbus_pin 2010-03-02 14:54:57 -08:00
blackfin
cris cris v32: typo in crisv32_arbiter_unwatch()? 2010-03-06 11:26:28 -08:00
frv frv: remove pci_dma_sync_single() and pci_dma_sync_sg() 2010-03-06 11:26:27 -08:00
h8300
ia64 Driver core: Constify struct sysfs_ops in struct kobj_type 2010-03-07 17:04:49 -08:00
m32r Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2010-03-03 07:34:18 -08:00
m68k Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm 2010-03-01 09:15:15 -08:00
m68knommu
microblaze Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2010-03-03 07:34:18 -08:00
mips sysdev: Pass attribute in sysdev_class attributes show/store 2010-03-07 17:04:47 -08:00
mn10300 Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm 2010-03-01 09:15:15 -08:00
parisc parisc: use __ratelimit in unaligned.c 2010-03-06 22:54:59 +00:00
powerpc perf: Rework and fix the arch CPU-hotplug hooks 2010-03-10 13:22:24 +01:00
s390 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 2010-03-08 10:17:20 -08:00
score Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm 2010-03-01 09:15:15 -08:00
sh perf: Rework and fix the arch CPU-hotplug hooks 2010-03-10 13:22:24 +01:00
sparc perf: Provide generic perf_sample_data initialization 2010-03-10 13:22:23 +01:00
um elf coredump: add extended numbering support 2010-03-06 11:26:46 -08:00
x86 perf, x86: Implement initial P4 PMU driver 2010-03-11 18:51:08 +01:00
xtensa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2010-03-02 07:55:08 -08:00
.gitignore
Kconfig Merge branch 'perf-probes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-05 10:50:22 -08:00