Go to file
Vijay Thakkar 2079f7aa0a perf vendor events amd: Add Zen2 events
This patch adds PMU events for AMD Zen2 core based processors, namely,
Matisse (model 71h), Castle Peak (model 31h) and Rome (model 2xh), as
documented in the AMD Processor Programming Reference for Matisse [1].
The model number regex has been set to detect all the models under
family 17 that do not match those of Zen1, as the range is larger for
zen2.

Zen2 adds some additional counters that are not present in Zen1 and
events for them have been added in this patch. Some counters have also
been removed for Zen2 thatwere previously present in Zen1 and have been
confirmed to always sample zero on zen2. These added/removed counters
have been omitted for brevity but can be found here:
https://gist.github.com/thakkarV/5b12ca5fd7488eb2c42e451e40bdd5f3

Note that PPR for Zen2 [1] does not include some counters that were
documented in the PPR for Zen1 based processors [2]. After having tested
these counters, some of them that still work for zen2 systems have been
preserved in the events for zen2. The counters that are omitted in [1]
but are still measurable and non-zero on zen2 (tested on a Ryzen 3900X
system) are the following:

  PMC 0x000 fpu_pipe_assignment.{total|total0|total1|total2|total3}
  PMC 0x004 fp_num_mov_elim_scal_op.*
  PMC 0x046 ls_tablewalker.*
  PMC 0x062 l2_latency.l2_cycles_waiting_on_fills
  PMC 0x063 l2_wcb_req.*
  PMC 0x06D l2_fill_pending.l2_fill_busy
  PMC 0x080 ic_fw32
  PMC 0x081 ic_fw32_miss
  PMC 0x086 bp_snp_re_sync
  PMC 0x087 ic_fetch_stall.*
  PMC 0x08C ic_cache_inval.*
  PMC 0x099 bp_tlb_rel
  PMC 0x0C7 ex_ret_brn_resync
  PMC 0x28A ic_oc_mode_switch.*
  L3PMC 0x001 l3_request_g1.*
  L3PMC 0x006 l3_comb_clstr_state.*

[1]: Processor Programming Reference (PPR) for AMD Family 17h Model 71h,
Revision B0 Processors, 56176 Rev 3.06 - Jul 17, 2019

[2]: Processor Programming Reference (PPR) for AMD Family 17h Models
01h,08h, Revision B2 Processors, 54945 Rev 3.03 - Jun 14, 2019

All of the PPRs can be found at:

https://bugzilla.kernel.org/show_bug.cgi?id=206537

Here are the results of running "fpu_pipe_assignment.total" events on my
Ryzen 3900X family 17h model 71h system:

Before this patch:

  $> perf list *fpu_pipe_assignment*

List of pre-defined events (to be used in -e):

After:

  $> perf list *fpu_pipe_assignment*

  floating point:
  fpu_pipe_assignment.total
      [Total number of fp uOps]
  fpu_pipe_assignment.total0
      [Total number uOps assigned to pipe 0]
  fpu_pipe_assignment.total1
      [Total number uOps assigned to pipe 1]
  fpu_pipe_assignment.total2
      [Total number uOps assigned to pipe 2]
  fpu_pipe_assignment.total3
      [Total number uOps assigned to pipe 3]

  Metric Groups:

  $> perf stat -e fpu_pipe_assignment.total sleep 1

  Performance counter stats for 'sleep 1':

              25,883      fpu_pipe_assignment.total

         1.004145868 seconds time elapsed

         0.001805000 seconds user
         0.000000000 seconds sys

Usage tests while running Linpackin the background:

  $> perf stat -I1000 -e fpu_pipe_assignment.total
       1.000266796     79,313,191,516      fpu_pipe_assignment.total
       2.000809630     68,091,474,430      fpu_pipe_assignment.total
       3.001028115     52,925,023,174      fpu_pipe_assignment.total

  $> perf record -e fpu_pipe_assignment.total,fpu_pipe_assignment.total0 -a sleep 1
  [ perf record: Woken up 9 times to write data ]
  [ perf record: Captured and wrote 4.031 MB perf.data (64764 samples) ]

  $> perf report --stdio --no-header | head -30
      98.33%  xhpl             xhpl                          [.] dgemm_kernel
       0.28%  xhpl             xhpl                          [.] dtrsm_kernel_LT
       0.10%  xhpl             [kernel.kallsyms]             [k] entry_SYSCALL_64
       0.08%  xhpl             xhpl                          [.] idamax_k
       0.07%  baloo_file_extr  liblmdb.so                    [.] mdb_mid2l_insert
       0.06%  xhpl             xhpl                          [.] dgemm_itcopy
       0.06%  xhpl             xhpl                          [.] dgemm_oncopy
       0.06%  xhpl             [kernel.kallsyms]             [k] __schedule
       0.06%  xhpl             [kernel.kallsyms]             [k] syscall_trace_enter
       0.06%  xhpl             [kernel.kallsyms]             [k] native_sched_clock
       0.06%  xhpl             [kernel.kallsyms]             [k] pick_next_task_fair
       0.05%  xhpl             xhpl                          [.] blas_thread_server.llvm.15009391670273914865
       0.04%  xhpl             [kernel.kallsyms]             [k] do_syscall_64
       0.04%  xhpl             [kernel.kallsyms]             [k] yield_task_fair
       0.04%  xhpl             libpthread-2.31.so            [.] __pthread_mutex_unlock_usercnt
       0.03%  xhpl             [kernel.kallsyms]             [k] cpuacct_charge
       0.03%  xhpl             [kernel.kallsyms]             [k] syscall_return_via_sysret
       0.03%  xhpl             libc-2.31.so                  [.] __sched_yield
       0.03%  xhpl             [kernel.kallsyms]             [k] __calc_delta

  $> perf annotate --stdio2 dgemm_kernel | egrep '^ {0,2}[0-9]+' -B2 -A2
                  sub          $0x60,%rsp
                  mov          %rbx,(%rsp)
    0.00          mov          %rbp,0x8(%rsp)
                  mov          %r12,0x10(%rsp)
    0.00          mov          %r13,0x18(%rsp)
                  mov          %r14,0x20(%rsp)
                  mov          %r15,0x28(%rsp)
  --
                  mov          %rdi,%r13
                  mov          %rsi,0x28(%rsp)
    0.00          mov          %rdx,%r12
                  vmovsd       %xmm0,0x30(%rsp)
                  shl          $0x3,%r10
                  mov          0x28(%rsp),%rax
    0.00          xor          %rdx,%rdx
                  mov          $0x18,%rdi
                  div          %rdi
  --
                  nop
            a0:   mov          %r12,%rax
    0.00          shl          $0x3,%rax
                  mov          %r8,%rdi
                  lea          (%r8,%rax,8),%r15
  --
                  mov          %r12,%rax
                  nop
    0.00    c0:   vmovups      (%rdi),%ymm1
    0.09          vmovups      0x20(%rdi),%ymm2
    0.02          vmovups      (%r15),%ymm3
    0.10          vmovups      %ymm1,(%rsi)
    0.07          vmovups      %ymm2,0x20(%rsi)
    0.07          vmovups      %ymm3,0x40(%rsi)
    0.06          add          $0x40,%rdi
                  add          $0x40,%r15
                  add          $0x60,%rsi
    0.00          dec          %rax
                ↑ jne          c0
                  mov          %r9,%r15
  --
                  nop
           110:   lea          0x80(%rsp),%rsi
    0.01          add          $0x60,%rsi
    0.03          mov          %r12,%rax
    0.00          sar          $0x3,%rax
                  cmp          $0x2,%rax
                ↓ jl           d26
                  prefetcht0   0x200(%rdi)
    0.01          vmovups      -0x60(%rsi),%ymm1
    0.02          prefetcht0   0xa0(%rsi)
    0.00          vbroadcastsd -0x80(%rdi),%ymm0
    0.00          prefetcht0   0xe0(%rsi)
    0.03          vmovups      -0x40(%rsi),%ymm2
    0.00          prefetcht0   0x120(%rsi)
                  vmovups      -0x20(%rsi),%ymm3
                  vmulpd       %ymm0,%ymm1,%ymm4
    0.01          prefetcht0   0x160(%rsi)
                  vmulpd       %ymm0,%ymm2,%ymm8
    0.01          vmulpd       %ymm0,%ymm3,%ymm12
    0.02          prefetcht0   0x1a0(%rsi)
    0.01          vbroadcastsd -0x78(%rdi),%ymm0
                  vmulpd       %ymm0,%ymm1,%ymm5
    0.01          vmulpd       %ymm0,%ymm2,%ymm9
                  vmulpd       %ymm0,%ymm3,%ymm13
    0.01          vbroadcastsd -0x70(%rdi),%ymm0
                  vmulpd       %ymm0,%ymm1,%ymm6
    0.00          vmulpd       %ymm0,%ymm2,%ymm10
    0.00          add          $0x60,%rsi

  ... snip ...

                  nop
          65e0:   vmovddup     -0x60(%rsi),%xmm2
    0.00          vmovups      -0x80(%rdi),%xmm0
                  vmovups      -0x70(%rdi),%xmm1
    0.00          vmovddup     -0x58(%rsi),%xmm3
                  vfmadd231pd  %xmm0,%xmm2,%xmm4
    0.00          vfmadd231pd  %xmm1,%xmm2,%xmm5
    0.00          vfmadd231pd  %xmm0,%xmm3,%xmm6
    0.00          vfmadd231pd  %xmm1,%xmm3,%xmm7
    0.00          add          $0x10,%rsi
                  add          $0x20,%rdi
    0.00          dec          %rax
                ↑ jne          65e0
                  nop
                  nop
          6620:   vmovddup     0x30(%rsp),%xmm0
    0.00          vmulpd       %xmm0,%xmm4,%xmm4
    0.00          vmulpd       %xmm0,%xmm5,%xmm5
                  vmulpd       %xmm0,%xmm6,%xmm6
                  vmulpd       %xmm0,%xmm7,%xmm7
                  vaddpd       (%r15),%xmm4,%xmm4
                  vaddpd       0x10(%r15),%xmm5,%xmm5
    0.00          vaddpd       (%r15,%r10,1),%xmm6,%xmm6
    0.00          vaddpd       0x10(%r15,%r10,1),%xmm7,%xmm7
    0.00          vmovups      %xmm4,(%r15)
                  vmovups      %xmm5,0x10(%r15)
    0.00          vmovups      %xmm6,(%r15,%r10,1)
                  vmovups      %xmm7,0x10(%r15,%r10,1)
                  add          $0x20,%r15
  --
                  lea          (%r8,%rax,8),%r8
          69d8:   mov          0x20(%rsp),%r14
    0.00          test         $0x1,%r14
                ↓ je           6d84
                  mov          %r9,%r15
  --
                  vbroadcastsd -0x28(%rsi),%ymm3
                  vfmadd231pd  (%rdi),%ymm0,%ymm4
    0.00          vfmadd231pd  0x20(%rdi),%ymm1,%ymm5
                  vfmadd231pd  0x40(%rdi),%ymm2,%ymm6
                  vfmadd231pd  0x60(%rdi),%ymm3,%ymm7
  --
                  vmulpd       %ymm0,%ymm4,%ymm4
                  vaddpd       (%r15),%ymm4,%ymm4
    0.00          vmovups      %ymm4,(%r15)
                  add          $0x20,%r15
                  dec          %r11
  --
                  mov          %rbx,%rsp
                  mov          (%rsp),%rbx
    0.01          mov          0x8(%rsp),%rbp
                  mov          0x10(%rsp),%r12
                  mov          0x18(%rsp),%r13

Signed-off-by: Vijay Thakkar <vijaythakkar@me.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Kim Phillips <kim.phillips@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jon Grimm <jon.grimm@amd.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200318190002.307290-3-vijaythakkar@me.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24 10:35:58 -03:00
arch perf/x86/intel/uncore: Factor out __snr_uncore_mmio_init_box 2020-03-20 13:06:23 +01:00
block block: Fix partition support for host aware zoned block devices 2020-03-12 07:54:39 -06:00
certs
crypto
Documentation A single commit to handle an erratum in Cavium ThunderX to prevent access 2020-03-15 13:15:16 -07:00
drivers Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid 2020-03-17 09:38:03 -07:00
fs Fix for yet another subtle futex issue. The futex code used ihold() to 2020-03-15 12:55:52 -07:00
include Merge branch 'perf/urgent' into perf/core, to pick up fixes 2020-03-19 15:01:45 +01:00
init
ipc
kernel perf/core: Fix reversed NULL check in perf_event_groups_less() 2020-03-20 13:06:22 +01:00
lib
LICENSES
mm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 16:19:19 -07:00
net taprio: Fix sending packets without dequeueing them 2020-03-12 11:25:08 -07:00
samples
scripts
security
sound ASoC: Fixes for v5.6 2020-03-07 07:24:36 +01:00
tools perf vendor events amd: Add Zen2 events 2020-03-24 10:35:58 -03:00
usr
virt
.clang-format
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore
.mailmap
COPYING
CREDITS
Kbuild
Kconfig
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 16:19:19 -07:00
Makefile Linux 5.6-rc6 2020-03-15 15:01:23 -07:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.