diff --git a/lib/libpmc/pmc.k7.3 b/lib/libpmc/pmc.k7.3 new file mode 100644 index 000000000000..745a56bc4cfe --- /dev/null +++ b/lib/libpmc/pmc.k7.3 @@ -0,0 +1,239 @@ +.\" Copyright (c) 2003-2008 Joseph Koshy. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" This software is provided by Joseph Koshy ``as is'' and +.\" any express or implied warranties, including, but not limited to, the +.\" implied warranties of merchantability and fitness for a particular purpose +.\" are disclaimed. in no event shall Joseph Koshy be liable +.\" for any direct, indirect, incidental, special, exemplary, or consequential +.\" damages (including, but not limited to, procurement of substitute goods +.\" or services; loss of use, data, or profits; or business interruption) +.\" however caused and on any theory of liability, whether in contract, strict +.\" liability, or tort (including negligence or otherwise) arising in any way +.\" out of the use of this software, even if advised of the possibility of +.\" such damage. +.\" +.\" $FreeBSD$ +.\" +.Dd September 16, 2008 +.Os +.Dt PMC.K7 3 +.Sh NAME +.Nm pmc.k7 +.Nd measurement events for +.Tn AMD +.Tn Athlon +(K7 family) CPUs +.Sh LIBRARY +.Lb libpmc +.Sh SYNOPSIS +.In pmc.h +.Sh DESCRIPTION +AMD K7 PMCs are present in the +.Tn "AMD Athlon" +series of CPUs and are documented in: +.Rs +.%B "AMD Athlon Processor x86 Code Optimization Guide" +.%N "Publication No. 22007" +.%D "February 2002" +.%Q "Advanced Micro Devices, Inc." +.Re +.Ss PMC Features +AMD K7 PMCs are 48 bits wide. +Each K7 CPU contains 4 PMCs with the following capabilities: +.Bl -column "PMC_CAP_INTERRUPT" "Support" +.It Em Capability Ta Em Support +.It PMC_CAP_CASCADE Ta \&No +.It PMC_CAP_EDGE Ta Yes +.It PMC_CAP_INTERRUPT Ta Yes +.It PMC_CAP_INVERT Ta Yes +.It PMC_CAP_READ Ta Yes +.It PMC_CAP_PRECISE Ta \&No +.It PMC_CAP_SYSTEM Ta Yes +.It PMC_CAP_TAGGING Ta \&No +.It PMC_CAP_THRESHOLD Ta Yes +.It PMC_CAP_USER Ta Yes +.It PMC_CAP_WRITE Ta Yes +.El +.Ss Event Qualifiers +.Pp +Event specifiers for AMD K7 PMCs can have the following optional +qualifiers: +.Bl -tag -width indent +.It Li count= Ns Ar value +Configure the counter to increment only if the number of configured +events measured in a cycle is greater than or equal to +.Ar value . +.It Li edge +Configure the counter to only count negated-to-asserted transitions +of the conditions expressed by the other qualifiers. +In other words, the counter will increment only once whenever a given +condition becomes true, irrespective of the number of clocks during +which the condition remains true. +.It Li inv +Invert the sense of comparision when the +.Dq Li count +qualifier is present, making the counter to increment when the +number of events per cycle is less than the value specified by +the +.Dq Li count +qualifier. +.It Li os +Configure the PMC to count events happening at privilege level 0. +.It Li unitmask= Ns Ar mask +This qualifier is used to further qualify a select few events, +.Dq Li k7-dc-refills-from-l2 , +.Dq Li k7-dc-refills-from-system +and +.Dq Li k7-dc-writebacks . +Here +.Ar mask +is a string of the following characters optionally separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li m +Count operations for lines in the +.Dq Modified +state. +.It Li o +Count operations for lines in the +.Dq Owner +state. +.It Li e +Count operations for lines in the +.Dq Exclusive +state. +.It Li s +Count operations for lines in the +.Dq Shared +state. +.It Li i +Count operations for lines in the +.Dq Invalid +state. +.El +.Pp +If no +.Dq Li unitmask +qualifier is specified, the default is to count events for caches +lines in any of the above states. +.It Li usr +Configure the PMC to count events occurring at privilege levels 1, 2 +or 3. +.El +.Pp +If neither of the +.Dq Li os +or +.Dq Li usr +qualifiers were specified, the default is to enable both. +.Ss AMD K7 Event Specifiers +The event specifiers supported on AMD K7 PMCs are: +.Bl -tag -width indent +.It Li k7-dc-accesses +Count data cache accesses. +.It Li k7-dc-misses +Count data cache misses. +.It Li k7-dc-refills-from-l2 Op Li ,unitmask= Ns Ar mask +Count data cache refills from L2 cache. +This event may be further qualified using the +.Dq Li unitmask +qualifier. +.It Li k7-dc-refills-from-system Op Li ,unitmask= Ns Ar mask +Count data cache refills from system memory. +This event may be further qualified using the +.Dq Li unitmask +qualifier. +.It Li k7-dc-writebacks Op Li ,unitmask= Ns Ar mask +Count data cache writebacks. +This event may be further qualified using the +.Dq Li unitmask +qualifier. +.It Li k7-l1-dtlb-miss-and-l2-dtlb-hits +Count L1 DTLB misses and L2 DTLB hits. +.It Li k7-l1-and-l2-dtlb-misses +Count L1 and L2 DTLB misses. +.It Li k7-misaligned-references +Count misaligned data references. +.It Li k7-ic-fetches +Count instruction cache fetches. +.It Li k7-ic-misses +Count instruction cache misses. +.It Li k7-l1-itlb-misses +Count L1 ITLB misses that are L2 ITLB hits. +.It Li k7-l1-l2-itlb-misses +Count L1 (and L2) ITLB misses. +.It Li k7-retired-instructions +Count all retired instructions. +.It Li k7-retired-ops +Count retired ops. +.It Li k7-retired-branches +Count all retired branches (conditional, unconditional, exceptions +and interrupts). +.It Li k7-retired-branches-mispredicted +Count all misprediced retired branches. +.It Li k7-retired-taken-branches +Count retired taken branches. +.It Li k7-retired-taken-branches-mispredicted +Count mispredicted taken branches that were retired. +.It Li k7-retired-far-control-transfers +Count retired far control transfers. +.It Li k7-retired-resync-branches +Count retired resync branches (non control transfer branches). +.It Li k7-interrupts-masked-cycles +Count the number of cycles when the processor's +.Va IF +flag was zero. +.It Li k7-interrupts-masked-while-pending-cycles +Count the number of cycles interrupts were masked while pending due +to the processor's +.Va IF +flag being zero. +.It Li k7-hardware-interrupts +Count the number of taken hardware interrupts. +.El +.Ss Event Name Aliases +The following table shows the mapping between the PMC-independent +aliases supported by +.Lb libpmc +and the underlying hardware events used. +.Bl -column "branch-mispredicts" "Description" +.It Em Alias Ta Em Event +.It Li branches Ta Li k7-retired-branches +.It Li branch-mispredicts Ta Li k7-retired-branches-mispredicted +.It Li dc-misses Ta Li k7-dc-misses +.It Li ic-misses Ta Li k7-ic-misses +.It Li instructions Ta Li k7-retired-instructions +.It Li interrupts Ta Li k7-hardware-interrupts +.It Li unhalted-cycles Ta (unsupported) +.El +.Sh SEE ALSO +.Xr pmc 3 , +.Xr pmc.k8 3 , +.Xr pmc.p4 3 , +.Xr pmc.p5 3 , +.Xr pmc.p6 3 , +.Xr pmc.tsc 3 , +.Xr pmclog 3 , +.Xr hwpmc 4 +.Sh HISTORY +The +.Nm pmc +library first appeared in +.Fx 6.0 . +.Sh AUTHORS +The +.Lb libpmc +library was written by +.An "Joseph Koshy" +.Aq jkoshy@FreeBSD.org . diff --git a/lib/libpmc/pmc.k8.3 b/lib/libpmc/pmc.k8.3 new file mode 100644 index 000000000000..229f2a1ca8f2 --- /dev/null +++ b/lib/libpmc/pmc.k8.3 @@ -0,0 +1,703 @@ +.\" Copyright (c) 2003-2008 Joseph Koshy. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" This software is provided by Joseph Koshy ``as is'' and +.\" any express or implied warranties, including, but not limited to, the +.\" implied warranties of merchantability and fitness for a particular purpose +.\" are disclaimed. in no event shall Joseph Koshy be liable +.\" for any direct, indirect, incidental, special, exemplary, or consequential +.\" damages (including, but not limited to, procurement of substitute goods +.\" or services; loss of use, data, or profits; or business interruption) +.\" however caused and on any theory of liability, whether in contract, strict +.\" liability, or tort (including negligence or otherwise) arising in any way +.\" out of the use of this software, even if advised of the possibility of +.\" such damage. +.\" +.\" $FreeBSD$ +.\" +.Dd September 17, 2008 +.Os +.Dt PMC.K8 3 +.Sh NAME +.Nm pmc.k8 +.Nd measurement events for +.Tn AMD +.Tn Athlon 64 +(K8 family) CPUs +.Sh LIBRARY +.Lb libpmc +.Sh SYNOPSIS +.In pmc.h +.Sh DESCRIPTION +AMD K8 PMCs are present in the +.Tn "AMD Athlon64" +and +.Tn "AMD Opteron" +series of CPUs. +They are documented in the +.Rs +.%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors" +.%N "Publication No. 26094" +.%D "April 2004" +.%Q "Advanced Micro Devices, Inc." +.Re +.Ss PMC Features +AMD K8 PMCs are 48 bits wide. +Each CPU contains 4 PMCs with the following capabilities: +.Bl -column "PMC_CAP_INTERRUPT" "Support" +.It Em Capability Ta Em Support +.It PMC_CAP_CASCADE Ta \&No +.It PMC_CAP_EDGE Ta Yes +.It PMC_CAP_INTERRUPT Ta Yes +.It PMC_CAP_INVERT Ta Yes +.It PMC_CAP_READ Ta Yes +.It PMC_CAP_PRECISE Ta \&No +.It PMC_CAP_SYSTEM Ta Yes +.It PMC_CAP_TAGGING Ta \&No +.It PMC_CAP_THRESHOLD Ta Yes +.It PMC_CAP_USER Ta Yes +.It PMC_CAP_WRITE Ta Yes +.El +.Ss Event Qualifiers +.Pp +Event specifiers for AMD K8 PMCs can have the following optional +qualifiers: +.Bl -tag -width indent +.It Li count= Ns Ar value +Configure the counter to increment only if the number of configured +events measured in a cycle is greater than or equal to +.Ar value . +.It Li edge +Configure the counter to only count negated-to-asserted transitions +of the conditions expressed by the other fields. +In other words, the counter will increment only once whenever a given +condition becomes true, irrespective of the number of clocks during +which the condition remains true. +.It Li inv +Invert the sense of comparision when the +.Dq Li count +qualifier is present, making the counter to increment when the +number of events per cycle is less than the value specified by +the +.Dq Li count +qualifier. +.It Li mask= Ns Ar qualifier +Many event specifiers for AMD K8 PMCs need to be additionally +qualified using a mask qualifier. +These additional qualifiers are event-specific and are documented +along with their associated event specifiers below. +.It Li os +Configure the PMC to count events happening at privilege level 0. +.It Li usr +Configure the PMC to count events occurring at privilege levels 1, 2 +or 3. +.El +.Pp +If neither of the +.Dq Li os +or +.Dq Li usr +qualifiers were specified, the default is to enable both. +.Ss AMD K8 Event Specifiers +The event specifiers supported on AMD K8 PMCs are: +.Bl -tag -width indent +.It Li k8-bu-cpu-clk-unhalted +Count the number of clock cycles when the CPU is not in the HLT or +STPCLK states. +.It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier +Count fill requests that missed in the L2 cache. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li dc-fill +Count data cache fill requests. +.It Li ic-fill +Count instruction cache fill requests. +.It Li tlb-reload +Count TLB reloads. +.El +.Pp +The default is to count all types of requests. +.It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier +Count internally generated requests to the L2 cache. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li cancelled +Count cancelled requests. +.It Li dc-fill +Count data cache fill requests. +.It Li ic-fill +Count instruction cache fill requests. +.It Li tag-snoop +Count tag snoop requests. +.It Li tlb-reload +Count TLB reloads. +.El +.Pp +The default is to count all types of requests. +.It Li k8-dc-access +Count data cache accesses including microcode scratchpad accesses. +.It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier +Count data cache copyback operations. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li exclusive +Count operations for lines in the +.Dq exclusive +state. +.It Li invalid +Count operations for lines in the +.Dq invalid +state. +.It Li modified +Count operations for lines in the +.Dq modified +state. +.It Li owner +Count operations for lines in the +.Dq owner +state. +.It Li shared +Count operations for lines in the +.Dq shared +state. +.El +.Pp +The default is to count operations for lines in all the +above states. +.It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier +Count data cache accesses by lock instructions. +This event is only available on processors of revision C or later +vintage. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li accesses +Count data cache accesses by lock instructions. +.It Li misses +Count data cache misses by lock instructions. +.El +.Pp +The default is to count all accesses. +.It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier +Count the number of dispatched prefetch instructions. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li load +Count load operations. +.It Li nta +Count non-temporal operations. +.It Li store +Count store operations. +.El +.Pp +The default is to count all operations. +.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit +Count L1 DTLB misses that are L2 DTLB hits. +.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss +Count L1 DTLB misses that are also misses in the L2 DTLB. +.It Li k8-dc-microarchitectural-early-cancel-of-an-access +Count microarchitectural early cancels of data cache accesses. +.It Li k8-dc-microarchitectural-late-cancel-of-an-access +Count microarchitectural late cancels of data cache accesses. +.It Li k8-dc-misaligned-data-reference +Count misaligned data references. +.It Li k8-dc-miss +Count data cache misses. +.It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier +Count one bit ECC errors found by the scrubber. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li scrubber +Count scrubber detected errors. +.It Li piggyback +Count piggyback scrubber errors. +.El +.Pp +The default is to count both kinds of errors. +.It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier +Count data cache refills from L2 cache. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li exclusive +Count operations for lines in the +.Dq exclusive +state. +.It Li invalid +Count operations for lines in the +.Dq invalid +state. +.It Li modified +Count operations for lines in the +.Dq modified +state. +.It Li owner +Count operations for lines in the +.Dq owner +state. +.It Li shared +Count operations for lines in the +.Dq shared +state. +.El +.Pp +The default is to count operations for lines in all the +above states. +.It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier +Count data cache refills from system memory. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li exclusive +Count operations for lines in the +.Dq exclusive +state. +.It Li invalid +Count operations for lines in the +.Dq invalid +state. +.It Li modified +Count operations for lines in the +.Dq modified +state. +.It Li owner +Count operations for lines in the +.Dq owner +state. +.It Li shared +Count operations for lines in the +.Dq shared +state. +.El +.Pp +The default is to count operations for lines in all the +above states. +.It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier +Count the number of dispatched FPU ops. +This event is supported in revision B and later CPUs. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li add-pipe-excluding-junk-ops +Count add pipe ops excluding junk ops. +.It Li add-pipe-junk-ops +Count junk ops in the add pipe. +.It Li multiply-pipe-excluding-junk-ops +Count multiply pipe ops excluding junk ops. +.It Li multiply-pipe-junk-ops +Count junk ops in the multiply pipe. +.It Li store-pipe-excluding-junk-ops +Count store pipe ops excluding junk ops +.It Li store-pipe-junk-ops +Count junk ops in the store pipe. +.El +.Pp +The default is to count all types of ops. +.It Li k8-fp-cycles-with-no-fpu-ops-retired +Count cycles when no FPU ops were retired. +This event is supported in revision B and later CPUs. +.It Li k8-fp-dispatched-fpu-fast-flag-ops +Count dispatched FPU ops that use the fast flag interface. +This event is supported in revision B and later CPUs. +.It Li k8-fr-decoder-empty +Count cycles when there was nothing to dispatch (i.e., the decoder +was empty). +.It Li k8-fr-dispatch-stalls +Count all dispatch stalls. +.It Li k8-fr-dispatch-stall-for-segment-load +Count dispatch stalls for segment loads. +.It Li k8-fr-dispatch-stall-for-serialization +Count dispatch stalls for serialization. +.It Li k8-fr-dispatch-stall-from-branch-abort-to-retire +Count dispatch stalls from branch abort to retiral. +.It Li k8-fr-dispatch-stall-when-fpu-is-full +Count dispatch stalls when the FPU is full. +.It Li k8-fr-dispatch-stall-when-ls-is-full +Count dispatch stalls when the load/store unit is full. +.It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full +Count dispatch stalls when the reorder buffer is full. +.It Li k8-fr-dispatch-stall-when-reservation-stations-are-full +Count dispatch stalls when reservation stations are full. +.It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet +Count dispatch stalls when waiting for all to be quiet. +.\" XXX What does "waiting for all to be quiet" mean? +.It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending +Count dispatch stalls when a far control transfer or a resync branch +is pending. +.It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier +Count FPU exceptions. +This event is supported in revision B and later CPUs. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li sse-and-x87-microtraps +Count SSE and x87 microtraps. +.It Li sse-reclass-microfaults +Count SSE reclass microfaults +.It Li sse-retype-microfaults +Count SSE retype microfaults +.It Li x87-reclass-microfaults +Count x87 reclass microfaults. +.El +.Pp +The default is to count all types of exceptions. +.It Li k8-fr-interrupts-masked-cycles +Count cycles when interrupts were masked (by CPU RFLAGS field IF was zero). +.It Li k8-fr-interrupts-masked-while-pending-cycles +Count cycles while interrupts were masked while pending (i.e., cycles +when INTR was asserted while CPU RFLAGS field IF was zero). +.It Li k8-fr-number-of-breakpoints-for-dr0 +Count the number of breakpoints for DR0. +.It Li k8-fr-number-of-breakpoints-for-dr1 +Count the number of breakpoints for DR1. +.It Li k8-fr-number-of-breakpoints-for-dr2 +Count the number of breakpoints for DR2. +.It Li k8-fr-number-of-breakpoints-for-dr3 +Count the number of breakpoints for DR3. +.It Li k8-fr-retired-branches +Count retired branches including exceptions and interrupts. +.It Li k8-fr-retired-branches-mispredicted +Count mispredicted retired branches. +.It Li k8-fr-retired-far-control-transfers +Count retired far control transfers (which are always mispredicted). +.It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier +Count retired fastpath double op instructions. +This event is supported in revision B and later CPUs. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li low-op-pos-0 +Count instructions with the low op in position 0. +.It Li low-op-pos-1 +Count instructions with the low op in position 1. +.It Li low-op-pos-2 +Count instructions with the low op in position 2. +.El +.Pp +The default is to count all types of instructions. +.It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier +Count retired FPU instructions. +This event is supported in revision B and later CPUs. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li mmx-3dnow +Count MMX and 3DNow!\& instructions. +.It Li packed-sse-sse2 +Count packed SSE and SSE2 instructions. +.It Li scalar-sse-sse2 +Count scalar SSE and SSE2 instructions +.It Li x87 +Count x87 instructions. +.El +.Pp +The default is to count all types of instructions. +.It Li k8-fr-retired-near-returns +Count retired near returns. +.It Li k8-fr-retired-near-returns-mispredicted +Count mispredicted near returns. +.It Li k8-fr-retired-resyncs +Count retired resyncs (non-control transfer branches). +.It Li k8-fr-retired-taken-hardware-interrupts +Count retired taken hardware interrupts. +.It Li k8-fr-retired-taken-branches +Count retired taken branches. +.It Li k8-fr-retired-taken-branches-mispredicted +Count retired taken branches that were mispredicted. +.It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare +Count retired taken branches that were mispredicted only due to an +address miscompare. +.It Li k8-fr-retired-uops +Count retired uops. +.It Li k8-fr-retired-x86-instructions +Count retired x86 instructions including exceptions and interrupts. +.It Li k8-ic-fetch +Count instruction cache fetches. +.It Li k8-ic-instruction-fetch-stall +Count cycles in stalls due to instruction fetch. +.It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit +Count L1 ITLB misses that are L2 ITLB hits. +.It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss +Count ITLB misses that miss in both L1 and L2 ITLBs. +.It Li k8-ic-microarchitectural-resync-by-snoop +Count microarchitectural resyncs caused by snoops. +.It Li k8-ic-miss +Count instruction cache misses. +.It Li k8-ic-refill-from-l2 +Count instruction cache refills from L2 cache. +.It Li k8-ic-refill-from-system +Count instruction cache refills from system memory. +.It Li k8-ic-return-stack-hits +Count hits to the return stack. +.It Li k8-ic-return-stack-overflow +Count overflows of the return stack. +.It Li k8-ls-buffer2-full +Count load/store buffer2 full events. +.It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier +Count locked operations. +For revision C and later CPUs, the following qualifiers are supported: +.Pp +.Bl -tag -width indent -compact +.It Li cycles-in-request +Count the number of cycles in the lock request/grant stage. +.It Li cycles-to-complete +Count the number of cycles a lock takes to complete once it is +non-speculative and is the older load/store operation. +.It Li locked-instructions +Count the number of lock instructions executed. +.El +.Pp +The default is to count the number of lock instructions executed. +.It Li k8-ls-microarchitectural-late-cancel +Count microarchitectural late cancels of operations in the load/store +unit. +.It Li k8-ls-microarchitectural-resync-by-self-modifying-code +Count microarchitectural resyncs caused by self-modifying code. +.It Li k8-ls-microarchitectural-resync-by-snoop +Count microarchitectural resyncs caused by snoops. +.It Li k8-ls-retired-cflush-instructions +Count retired CFLUSH instructions. +.It Li k8-ls-retired-cpuid-instructions +Count retired CPUID instructions. +.It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier +Count segment register loads. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Bl -tag -width indent -compact +.It Li cs +Count CS register loads. +.It Li ds +Count DS register loads. +.It Li es +Count ES register loads. +.It Li fs +Count FS register loads. +.It Li gs +Count GS register loads. +.\" .It Li hs +.\" Count HS register loads. +.\" XXX "HS" register? +.It Li ss +Count SS register loads. +.El +.Pp +The default is to count all types of loads. +.It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier +Count memory controller bypass counter saturation events. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li dram-controller-interface-bypass +Count DRAM controller interface bypass. +.It Li dram-controller-queue-bypass +Count DRAM controller queue bypass. +.It Li memory-controller-hi-pri-bypass +Count memory controller high priority bypasses. +.It Li memory-controller-lo-pri-bypass +Count memory controller low priority bypasses. +.El +.Pp +.It Li k8-nb-memory-controller-dram-slots-missed +Count memory controller DRAM command slots missed (in MemClks). +.It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier +Count memory controller page access events. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li page-conflict +Count page conflicts. +.It Li page-hit +Count page hits. +.It Li page-miss +Count page misses. +.El +.Pp +The default is to count all types of events. +.It Li k8-nb-memory-controller-page-table-overflow +Count memory control page table overflow events. +.It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier +Count probe events. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li probe-hit +Count all probe hits. +.It Li probe-hit-dirty-no-memory-cancel +Count probe hits without memory cancels. +.It Li probe-hit-dirty-with-memory-cancel +Count probe hits with memory cancels. +.It Li probe-miss +Count probe misses. +.El +.It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier +Count sized commands issued. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li nonpostwrszbyte +.It Li nonpostwrszdword +.It Li postwrszbyte +.It Li postwrszdword +.It Li rdszbyte +.It Li rdszdword +.It Li rdmodwr +.El +.Pp +The default is to count all types of commands. +.It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier +Count memory control turnaround events. +This event may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.\" XXX doc is unclear whether these are cycle counts or event counts +.It Li dimm-turnaround +Count DIMM turnarounds. +.It Li read-to-write-turnaround +Count read to write turnarounds. +.It Li write-to-read-turnaround +Count write to read turnarounds. +.El +.Pp +The default is to count all types of events. +.It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier +.It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier +.It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier +Count events on the HyperTransport(tm) buses. +These events may be further qualified using +.Ar qualifier , +which is a +.Ql + +separated set of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li buffer-release +Count buffer release messages sent. +.It Li command +Count command messages sent. +.It Li data +Count data messages sent. +.It Li nop +Count nop messages sent. +.El +.Pp +The default is to count all types of messages. +.El +.Ss Event Name Aliases +The following table shows the mapping between the PMC-independent +aliases supported by +.Lb libpmc +and the underlying hardware events used. +.Bl -column "branch-mispredicts" "Description" +.It Em Alias Ta Em Event +.It Li branches Ta Li k8-fr-retired-taken-branches +.It Li branch-mispredicts Ta Li k8-fr-retired-taken-branches-mispredicted +.It Li dc-misses Ta Li k8-dc-miss +.It Li ic-misses Ta Li k8-ic-miss +.It Li instructions Ta Li k8-fr-retired-x86-instructions +.It Li interrupts Ta Li k8-fr-taken-hardware-interrupts +.It Li unhalted-cycles Ta Li k8-by-cpu-clk-unhalted +.El +.Sh SEE ALSO +.Xr pmc 3 , +.Xr pmc.k7 3 , +.Xr pmc.p4 3 , +.Xr pmc.p5 3 , +.Xr pmc.p6 3 , +.Xr pmc.tsc 3 , +.Xr pmclog 3 , +.Xr hwpmc 4 +.Sh HISTORY +The +.Nm pmc +library first appeared in +.Fx 6.0 . +.Sh AUTHORS +The +.Lb libpmc +library was written by +.An "Joseph Koshy" +.Aq jkoshy@FreeBSD.org . diff --git a/lib/libpmc/pmc.p4.3 b/lib/libpmc/pmc.p4.3 new file mode 100644 index 000000000000..e8390f9a10e1 --- /dev/null +++ b/lib/libpmc/pmc.p4.3 @@ -0,0 +1,1222 @@ +.\" Copyright (c) 2003-2008 Joseph Koshy. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" This software is provided by Joseph Koshy ``as is'' and +.\" any express or implied warranties, including, but not limited to, the +.\" implied warranties of merchantability and fitness for a particular purpose +.\" are disclaimed. in no event shall Joseph Koshy be liable +.\" for any direct, indirect, incidental, special, exemplary, or consequential +.\" damages (including, but not limited to, procurement of substitute goods +.\" or services; loss of use, data, or profits; or business interruption) +.\" however caused and on any theory of liability, whether in contract, strict +.\" liability, or tort (including negligence or otherwise) arising in any way +.\" out of the use of this software, even if advised of the possibility of +.\" such damage. +.\" +.\" $FreeBSD$ +.\" +.Dd September 17, 2008 +.Os +.Dt PMC.P4 3 +.Sh NAME +.Nm pmc.p4 +.Nd measurement events for +.Tn "Intel Pentium 4" +and other +.Tn Netburst +architecture CPUs +.Sh LIBRARY +.Lb libpmc +.Sh SYNOPSIS +.In pmc.h +.Sh DESCRIPTION +Intel P4 PMCs are present in Intel +.Tn "Pentium 4" +and +.Tn Xeon +processors that use the +.Tn Netburst +CPU architecture. +.Pp +These PMCs are documented in +.Rs +.%B "IA-32 Intel(R) Architecture Software Developer's Manual" +.%T "Volume 3: System Programming Guide" +.%N "Order Number 245472-012" +.%D 2003 +.%Q "Intel Corporation" +.Re +Further information about using these PMCs may be found in +.Rs +.%B "IA-32 Intel(R) Architecture Optimization Guide" +.%D 2003 +.%N "Order Number 248966-009" +.%Q "Intel Corporation" +.Re +Some of these events are affected by processor errata described in +.Rs +.%B "Intel(R) Pentium(R) 4 Processor Specification Update" +.%N "Document Number: 249199-059" +.%D "April 2005" +.%Q "Intel Corporation" +.Re +.Ss PMC Features +Intel Pentium 4 PMCs are 40 bits wide. +Each CPU contains 18 PMCs, divided into 4 groups with 4, 4, 4 and 6 +PMCs respectively. +On processors with hyperthreading support, PMC resources are shared +between logical processors. +These PMCs support the following capabilities: +.Bl -column "PMC_CAP_INTERRUPT" "Support" +.It Em Capability Ta Em Support +.It PMC_CAP_CASCADE Ta Yes +.It PMC_CAP_EDGE Ta Yes +.It PMC_CAP_INTERRUPT Ta Yes +.It PMC_CAP_INVERT Ta Yes +.It PMC_CAP_READ Ta Yes +.It PMC_CAP_PRECISE Ta Unimplemented +.It PMC_CAP_SYSTEM Ta Yes +.It PMC_CAP_TAGGING Ta Yes +.It PMC_CAP_THRESHOLD Ta Yes +.It PMC_CAP_USER Ta Yes +.It PMC_CAP_WRITE Ta Yes +.El +.Ss Event Qualifiers +.Pp +Event specifiers for Intel P4 PMCs can have the following common +qualifiers: +.Bl -tag -width indent +.It Li active= Ns Ar choice +(On P4 HTT CPUs) Filter event counting based on which logical +processors are active. +The allowed values of +.Ar choice +are: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count when either logical processor is active. +.It Li both +Count when both logical processors are active. +.It Li none +Count only when neither logical processor is active. +.It Li single +Count only when one logical processor is active. +.El +.Pp +The default is +.Dq Li both . +.It Li cascade +Configure the PMC to cascade onto its partner. +See +.Sx "Cascading P4 PMCs" +below for more information. +.It Li edge +Configure the counter to count false to true transitions of the threshold +comparision output. +This qualifier only takes effect if a threshold qualifier has also been +specified. +.It Li complement +Configure the counter to increment only when the event count seen is +less than the threshold qualifier value specified. +.It Li mask= Ns Ar qualifier +Many event specifiers for Intel P4 PMCs need to be additionally +qualified using a mask qualifier. +The allowed syntax for these qualifiers is event specific and is +described along with the events. +.It Li os +Configure the PMC to count when the CPL of the processor is 0. +.It Li precise +Select precise event based sampling. +Precise sampling is supported by the hardware for a limited set of +events. +.It Li tag= Ns Ar value +Configure the PMC to tag the internal uop selected by the other +fields in this event specifier with value +.Ar value . +This feature is used when cascading PMCs. +.It Li threshold= Ns Ar value +Configure the PMC to increment only when the event counts seen are +greater than the specified threshold value +.Ar value . +.It Li usr +Configure the PMC to count when the CPL of the processor is 1, 2 or 3. +.El +.Pp +If neither of the +.Dq Li os +or +.Dq Li usr +qualifiers are specified, the default is to enable both. +.Pp +On Intel Pentium 4 processors with HTT, events are +divided into two classes: +.Pp +.Bl -tag -width indent -compact +.It "TS Events" +are those where hardware can differentiate between events +generated on one logical processor from those generated on the +other. +.It "TI Events" +are those where hardware cannot differentiate between events +generated by multiple logical processors in a package. +.El +.Pp +Only TS events are allowed for use with process-mode PMCs on +Pentium-4/HTT CPUs. +.Pp +The event specifiers supported by Intel P4 PMCs are: +.Pp +.Bl -tag -width indent +.It Li p4-128bit-mmx-uop Op Li ,mask= Ns Ar flags +.Pq "TI event" +Count integer SIMD SSE2 instructions that operate on 128 bit SIMD +operands. +Qualifier +.Ar flags +can take the following value (which is also the default): +.Pp +.Bl -tag -width indent -compact +.It Li all +Count all uops operating on 128 bit SIMD integer operands in memory or +XMM register. +.El +.Pp +If an instruction contains more than one 128 bit MMX uop, then each +uop will be counted. +.It Li p4-64bit-mmx-uop Op Li ,mask= Ns Ar flags +.Pq "TI event" +Count MMX instructions that operate on 64 bit SIMD operands. +Qualifier +.Ar flags +can take the following value (which is also the default): +.Pp +.Bl -tag -width indent -compact +.It Li all +Count all uops operating on 64 bit SIMD integer operands in memory or +in MMX registers. +.El +.Pp +If an instruction contains more than one 64 bit MMX uop, then each +uop will be counted. +.It Li p4-b2b-cycles +.Pq "TI event" +Count back-to-back bus cycles. +Further documentation for this event is unavailable. +.It Li p4-bnr +.Pq "TI event" +Count bus-not-ready conditions. +Further documentation for this event is unavailable. +.It Li p4-bpu-fetch-request Op Li ,mask= Ns Ar qualifier +.Pq "TS event" +Count instruction fetch requests qualified by additional +flags specified in +.Ar qualifier . +At this point only one flag is supported: +.Pp +.Bl -tag -width indent -compact +.It Li tcmiss +Count trace cache lookup misses. +.El +.Pp +The default qualifier is also +.Dq Li mask=tcmiss . +.It Li p4-branch-retired Op Li ,mask= Ns Ar flags +.Pq "TS event" +Counts retired branches. +Qualifier +.Ar flags +is a list of the following +.Ql + +separated strings: +.Pp +.Bl -tag -width indent -compact +.It Li mmnp +Count branches not-taken and predicted. +.It Li mmnm +Count branches not-taken and mis-predicted. +.It Li mmtp +Count branches taken and predicted. +.It Li mmtm +Count branches taken and mis-predicted. +.El +.Pp +The default qualifier counts all four kinds of branches. +.It Li p4-bsq-active-entries Op Li ,mask= Ns Ar qualifier +.Pq "TS event" +Count the number of entries (clipped at 15) currently active in the +BSQ. +Qualifier +.Ar qualifier +is a +.Ql + +separated set of the following flags: +.Pp +.Bl -tag -width indent -compact +.It Li req-type0 , Li req-type1 +Forms a 2-bit number used to select the request type encoding: +.Pp +.Bl -tag -width indent -compact +.It Li 0 +reads excluding read invalidate +.It Li 1 +read invalidates +.It Li 2 +writes other than writebacks +.It Li 3 +writebacks +.El +.Pp +Bit +.Dq Li req-type1 +is the MSB for this two bit number. +.It Li req-len0 , Li req-len1 +Forms a two-bit number that specifies the request length encoding: +.Pp +.Bl -tag -width indent -compact +.It Li 0 +0 chunks +.It Li 1 +1 chunk +.It Li 3 +8 chunks +.El +.Pp +Bit +.Dq Li req-len1 +is the MSB for this two bit number. +.It Li req-io-type +Count requests that are input or output requests. +.It Li req-lock-type +Count requests that lock the bus. +.It Li req-lock-cache +Count requests that lock the cache. +.It Li req-split-type +Count requests that is a bus 8-byte chunk that is split across an +8-byte boundary. +.It Li req-dem-type +Count requests that are demand (not prefetches) if set. +Count requests that are prefetches if not set. +.It Li req-ord-type +Count requests that are ordered. +.It Li mem-type0 , Li mem-type1 , Li mem-type2 +Forms a 3-bit number that specifies a memory type encoding: +.Pp +.Bl -tag -width indent -compact +.It Li 0 +UC +.It Li 1 +USWC +.It Li 4 +WT +.It Li 5 +WP +.It Li 6 +WB +.El +.Pp +Bit +.Dq Li mem-type2 +is the MSB of this 3-bit number. +.El +.Pp +The default qualifier has all the above bits set. +.Pp +Edge triggering using the +.Dq Li edge +qualifier should not be used with this event when counting cycles. +.It Li p4-bsq-allocation Op Li ,mask= Ns Ar qualifier +.Pq "TS event" +Count allocations in the bus sequence unit according to the flags +specified in +.Ar qualifier , +which is a +.Ql + +separated set of the following flags: +.Pp +.Bl -tag -width indent -compact +.It Li req-type0 , Li req-type1 +Forms a 2-bit number used to select the request type encoding: +.Pp +.Bl -tag -width indent -compact +.It Li 0 +reads excluding read invalidate +.It Li 1 +read invalidates +.It Li 2 +writes other than writebacks +.It Li 3 +writebacks +.El +.Pp +Bit +.Dq Li req-type1 +is the MSB for this two bit number. +.It Li req-len0 , Li req-len1 +Forms a two-bit number that specifies the request length encoding: +.Pp +.Bl -tag -width indent -compact +.It Li 0 +0 chunks +.It Li 1 +1 chunk +.It Li 3 +8 chunks +.El +.Pp +Bit +.Dq Li req-len1 +is the MSB for this two bit number. +.It Li req-io-type +Count requests that are input or output requests. +.It Li req-lock-type +Count requests that lock the bus. +.It Li req-lock-cache +Count requests that lock the cache. +.It Li req-split-type +Count requests that is a bus 8-byte chunk that is split across an +8-byte boundary. +.It Li req-dem-type +Count requests that are demand (not prefetches) if set. +Count requests that are prefetches if not set. +.It Li req-ord-type +Count requests that are ordered. +.It Li mem-type0 , Li mem-type1 , Li mem-type2 +Forms a 3-bit number that specifies a memory type encoding: +.Pp +.Bl -tag -width indent -compact +.It Li 0 +UC +.It Li 1 +USWC +.It Li 4 +WT +.It Li 5 +WP +.It Li 6 +WB +.El +.Pp +Bit +.Dq Li mem-type2 +is the MSB of this 3-bit number. +.El +.Pp +The default qualifier has all the above bits set. +.Pp +This event is usually used along with the +.Dq Li edge +qualifier to avoid multiple counting. +.It Li p4-bsq-cache-reference Op Li ,mask= Ns Ar qualifier +.Pq "TS event" +Count cache references as seen by the bus unit (2nd or 3rd level +cache references). +Qualifier +.Ar qualifier +is a +.Ql + +separated list of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li rd-2ndl-hits +Count 2nd level cache hits in the shared state. +.It Li rd-2ndl-hite +Count 2nd level cache hits in the exclusive state. +.It Li rd-2ndl-hitm +Count 2nd level cache hits in the modified state. +.It Li rd-3rdl-hits +Count 3rd level cache hits in the shared state. +.It Li rd-3rdl-hite +Count 3rd level cache hits in the exclusive state. +.It Li rd-3rdl-hitm +Count 3rd level cache hits in the modified state. +.It Li rd-2ndl-miss +Count 2nd level cache misses. +.It Li rd-3rdl-miss +Count 3rd level cache misses. +.It Li wr-2ndl-miss +Count write-back lookups from the data access cache that miss the 2nd +level cache. +.El +.Pp +The default is to count all the above events. +.It Li p4-execution-event Op Li ,mask= Ns Ar flags +.Pq "TS event" +Count the retirement of tagged uops selected through the execution +tagging mechanism. +Qualifier +.Ar flags +can contain the following strings separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li nbogus0 , Li nbogus1 , Li nbogus2 , Li nbogus3 +The marked uops are not bogus. +.It Li bogus0 , Li bogus1 , Li bogus2 , Li bogus3 +The marked uops are bogus. +.El +.Pp +This event requires additional (upstream) events to be allocated to +perform the desired uop tagging. +The default is to set all the above flags. +This event can be used for precise event based sampling. +.It Li p4-front-end-event Op Li ,mask= Ns Ar flags +.Pq "TS event" +Count the retirement of tagged uops selected through the front-end +tagging mechanism. +Qualifier +.Ar flags +can contain the following strings separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li nbogus +The marked uops are not bogus. +.It Li bogus +The marked uops are bogus. +.El +.Pp +This event requires additional (upstream) events to be allocated to +perform the desired uop tagging. +The default is to select both kinds of events. +This event can be used for precise event based sampling. +.It Li p4-fsb-data-activity Op Li ,mask= Ns Ar flags +.Pq "TI event" +Count each DBSY or DRDY event selected by qualifier +.Ar flags . +Qualifier +.Ar flags +is a +.Ql + +separated set of the following flags: +.Pp +.Bl -tag -width indent -compact +.It Li drdy-drv +Count when this processor is driving data onto the bus. +.It Li drdy-own +Count when this processor is reading data from the bus. +.It Li drdy-other +Count when data is on the bus but not being sampled by this processor. +.It Li dbsy-drv +Count when this processor reserves the bus for use in the next cycle +in order to drive data. +.It Li dbsy-own +Count when some agent reserves the bus for use in the next bus cycle +to drive data that this processor will sample. +.It Li dbsy-other +Count when some agent reserves the bus for use in the next bus cycle +to drive data that this processor will not sample. +.El +.Pp +Flags +.Dq Li drdy-own +and +.Dq Li drdy-other +are mutually exclusive. +Flags +.Dq Li dbsy-own +and +.Dq Li dbsy-other +are mutually exclusive. +The default value for +.Ar qualifier +is +.Dq Li drdy-drv+drdy-own+dbsy-drv+dbsy-own . +.It Li p4-global-power-events Op Li ,mask= Ns Ar flags +.Pq "TS event" +Count cycles during which the processor is not stopped. +Qualifier +.Ar flags +can take the following value (which is also the default): +.Pp +.Bl -tag -width indent -compact +.It Li running +Count cycles when the processor is active. +.El +.Pp +.It Li p4-instr-retired Op Li ,mask= Ns Ar flags +.Pq "TS event" +Count instructions retired during a clock cycle. +Qualifer +.Ar flags +comprises of the following strings separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li nbogusntag +Count non-bogus instructions that are not tagged. +.It Li nbogustag +Count non-bogus instructions that are tagged. +.It Li bogusntag +Count bogus instructions that are not tagged. +.It Li bogustag +Count bogus instructions that are tagged. +.El +.Pp +The default qualifier counts all the above kinds of instructions. +.It Li p4-ioq-active-entries Xo +.Op Li ,mask= Ns Ar qualifier +.Op Li ,busreqtype= Ns Ar req-type +.Xc +.Pq "TS event" +Count the number of entries (clipped at 15) in the IOQ that are +active. +The event masks are specified by qualifier +.Ar qualifier +and +.Ar req-type . +.Pp +Qualifier +.Ar qualifier +is a +.Ql + +separated set of the following flags: +.Pp +.Bl -tag -width indent -compact +.It Li all-read +Count read entries. +.It Li all-write +Count write entries. +.It Li mem-uc +Count entries accessing uncacheable memory. +.It Li mem-wc +Count entries accessing write-combining memory. +.It Li mem-wt +Count entries accessing write-through memory. +.It Li mem-wp +Count entries accessing write-protected memory +.It Li mem-wb +Count entries accessing write-back memory. +.It Li own +Count store requests driven by the processor (i.e., not by other +processors or by DMA). +.It Li other +Count store requests driven by other processors or by DMA. +.It Li prefetch +Include hardware and software prefetch requests in the count. +.El +.Pp +The default value for +.Ar qualifier +is to enable all the above flags. +.Pp +The +.Ar req-type +qualifier is a 5-bit number can be additionally used to select a +specific bus request type. +The default is 0. +.Pp +The +.Dq Li edge +qualifier should not be used when counting cycles with this event. +The exact behaviour of this event depends on the processor revision. +.It Li p4-ioq-allocation Xo +.Op Li ,mask= Ns Ar qualifier +.Op Li ,busreqtype= Ns Ar req-type +.Xc +.Pq "TS event" +Count various types of transactions on the bus matching the flags set +in +.Ar qualifier +and +.Ar req-type . +.Pp +Qualifier +.Ar qualifier +is a +.Ql + +separated set of the following flags: +.Pp +.Bl -tag -width indent -compact +.It Li all-read +Count read entries. +.It Li all-write +Count write entries. +.It Li mem-uc +Count entries accessing uncacheable memory. +.It Li mem-wc +Count entries accessing write-combining memory. +.It Li mem-wt +Count entries accessing write-through memory. +.It Li mem-wp +Count entries accessing write-protected memory +.It Li mem-wb +Count entries accessing write-back memory. +.It Li own +Count store requests driven by the processor (i.e., not by other +processors or by DMA). +.It Li other +Count store requests driven by other processors or by DMA. +.It Li prefetch +Include hardware and software prefetch requests in the count. +.El +.Pp +The default value for +.Ar qualifier +is to enable all the above flags. +.Pp +The +.Ar req-type +qualifier is a 5-bit number can be additionally used to select a +specific bus request type. +The default is 0. +.Pp +The +.Dq Li edge +qualifier is normally used with this event to prevent multiple +counting. +The exact behaviour of this event depends on the processor revision. +.It Li p4-itlb-reference Op mask= Ns Ar qualifier +.Pq "TS event" +Count translations using the intruction translation look-aside +buffer. +The +.Ar qualifier +argument is a list of the following strings separated by +.Ql + +characters. +.Pp +.Bl -tag -width indent -compact +.It Li hit +Count ITLB hits. +.It Li miss +Count ITLB misses. +.It Li hit-uc +Count uncacheable ITLB hits. +.El +.Pp +If no +.Ar qualifier +is specified the default is to count all the three kinds of ITLB +translations. +.It Li p4-load-port-replay Op Li ,mask= Ns Ar qualifier +.Pq "TS event" +Count replayed events at the load port. +Qualifier +.Ar qualifier +can take on one value: +.Pp +.Bl -tag -width indent -compact +.It Li split-ld +Count split loads. +.El +.Pp +The default value for +.Ar qualifier +is +.Dq Li split-ld . +.It Li p4-mispred-branch-retired Op Li ,mask= Ns Ar flags +.Pq "TS event" +Count mispredicted IA-32 branch instructions. +Qualifier +.Ar flags +can take the following value (which is also the default): +.Pp +.Bl -tag -width indent -compact +.It Li nbogus +Count non-bogus retired branch instructions. +.El +.It Li p4-machine-clear Op Li ,mask= Ns Ar flags +.Pq "TS event" +Count the number of pipeline clears seen by the processor. +Qualifer +.Ar flags +is a list of the following strings separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li clear +Count for a portion of the many cycles when the machine is being +cleared for any reason. +.It Li moclear +Count machine clears due to memory ordering issues. +.It Li smclear +Count machine clears due to self-modifying code. +.El +.Pp +Use qualifier +.Dq Li edge +to get a count of occurrences of machine clears. +The default qualifier is +.Dq Li clear . +.It Li p4-memory-cancel Op Li ,mask= Ns Ar event-list +.Pq "TS event" +Count the cancelling of various kinds of requests in the data cache +address control unit of the CPU. +The qualifier +.Ar event-list +is a list of the following strings separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li st-rb-full +Requests cancelled because no store request buffer was available. +.It Li 64k-conf +Requests that conflict due to 64K aliasing. +.El +.Pp +If +.Ar event-list +is not specified, then the default is to count both kinds of events. +.It Li p4-memory-complete Op Li ,mask= Ns Ar event-list +.Pq "TS event" +Count the completion of load split, store split, uncacheable split and +uncacheable load operations selected by qualifier +.Ar event-list . +The qualifier +.Ar event-list +is a +.Ql + +separated list of the following flags: +.Pp +.Bl -tag -width indent -compact +.It Li lsc +Count load splits completed, excluding loads from uncacheable or +write-combining areas. +.It Li ssc +Count any split stores completed. +.El +.Pp +The default is to count both kinds of operations. +.It Li p4-mob-load-replay Op Li ,mask= Ns Ar qualifier +.Pq "TS event" +Count load replays triggered by the memory order buffer. +Qualifier +.Ar qualifier +can be a +.Ql + +separated list of the following flags: +.Pp +.Bl -tag -width indent -compact +.It Li no-sta +Count replays because of unknown store addresses. +.It Li no-std +Count replays because of unknown store data. +.It Li partial-data +Count replays because of partially overlapped data accesses between +load and store operations. +.It Li unalgn-addr +Count replays because of mismatches in the lower 4 bits of load and +store operations. +.El +.Pp +The default qualifier is +.Ar no-sta+no-std+partial-data+unalgn-addr . +.It Li p4-packed-dp-uop Op Li ,mask= Ns Ar flags +.Pq "TI event" +Count packed double-precision uops. +Qualifier +.Ar flags +can take the following value (which is also the default): +.Pp +.Bl -tag -width indent -compact +.It Li all +Count all uops operating on packed double-precision operands. +.El +.It Li p4-packed-sp-uop Op Li ,mask= Ns Ar flags +.Pq "TI event" +Count packed single-precision uops. +Qualifier +.Ar flags +can take the following value (which is also the default): +.Pp +.Bl -tag -width indent -compact +.It Li all +Count all uops operating on packed single-precision operands. +.El +.It Li p4-page-walk-type Op Li ,mask= Ns Ar qualifier +.Pq "TI event" +Count page walks performed by the page miss handler. +Qualifier +.Ar qualifier +can be a +.Ql + +separated list of the following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li dtmiss +Count page walks for data TLB misses. +.It Li itmiss +Count page walks for instruction TLB misses. +.El +.Pp +The default value for +.Ar qualifier +is +.Dq Li dtmiss+itmiss . +.It Li p4-replay-event Op Li ,mask= Ns Ar flags +.Pq "TS event" +Count the retirement of tagged uops selected through the replay +tagging mechanism. +Qualifier +.Ar flags +contains a +.Ql + +separated set of the following strings: +.Pp +.Bl -tag -width indent -compact +.It Li nbogus +The marked uops are not bogus. +.It Li bogus +The marked uops are bogus. +.El +.Pp +This event requires additional (upstream) events to be allocated to +perform the desired uop tagging. +The default qualifier counts both kinds of uops. +This event can be used for precise event based sampling. +.It Li p4-resource-stall Op Li ,mask= Ns Ar flags +.Pq "TS event" +Count the occurrence or latency of stalls in the allocator. +Qualifier +.Ar flags +can take the following value (which is also the default): +.Pp +.Bl -tag -width indent -compact +.It Li sbfull +A stall due to the lack of store buffers. +.El +.It Li p4-response +.Pq "TI event" +Count different types of responses. +Further documentation on this event is not available. +.It Li p4-retired-branch-type Op Li ,mask= Ns Ar flags +.Pq "TS event" +Count branches retired. +Qualifier +.Ar flags +contains a +.Ql + +separated list of strings: +.Pp +.Bl -tag -width indent -compact +.It Li conditional +Count conditional jumps. +.It Li call +Count direct and indirect call branches. +.It Li return +Count return branches. +.It Li indirect +Count returns, indirect calls or indirect jumps. +.El +.Pp +The default qualifier counts all the above branch types. +.It Li p4-retired-mispred-branch-type Op Li ,mask= Ns Ar flags +.Pq "TS event" +Count mispredicted branches retired. +Qualifier +.Ar flags +contains a +.Ql + +separated list of strings: +.Pp +.Bl -tag -width indent -compact +.It Li conditional +Count conditional jumps. +.It Li call +Count indirect call branches. +.It Li return +Count return branches. +.It Li indirect +Count returns, indirect calls or indirect jumps. +.El +.Pp +The default qualifier counts all the above branch types. +.It Li p4-scalar-dp-uop Op Li ,mask= Ns Ar flags +.Pq "TI event" +Count the number of scalar double-precision uops. +Qualifier +.Ar flags +can take the following value (which is also the default): +.Pp +.Bl -tag -width indent -compact +.It Li all +Count the number of scalar double-precision uops. +.El +.It Li p4-scalar-sp-uop Op Li ,mask= Ns Ar flags +.Pq "TI event" +Count the number of scalar single-precision uops. +Qualifier +.Ar flags +can take the following value (which is also the default): +.Pp +.Bl -tag -width indent -compact +.It Li all +Count all uops operating on scalar single-precision operands. +.El +.It Li p4-snoop +.Pq "TI event" +Count snoop traffic. +Further documentation on this event is not available. +.It Li p4-sse-input-assist Op Li ,mask= Ns Ar flags +.Pq "TI event" +Count the number of times an assist is required to handle problems +with the operands for SSE and SSE2 operations. +Qualifier +.Ar flags +can take the following value (which is also the default): +.Pp +.Bl -tag -width indent -compact +.It Li all +Count assists for all SSE and SSE2 uops. +.El +.It Li p4-store-port-replay Op Li ,mask= Ns Ar qualifier +.Pq "TS event" +Count events replayed at the store port. +Qualifier +.Ar qualifier +can take on one value: +.Pp +.Bl -tag -width indent -compact +.It Li split-st +Count split stores. +.El +.Pp +The default value for +.Ar qualifier +is +.Dq Li split-st . +.It Li p4-tc-deliver-mode Op Li ,mask= Ns Ar qualifier +.Pq "TI event" +Count the duration in cycles of operating modes of the trace cache and +decode engine. +The desired operating mode is selected by +.Ar qualifier , +which is a list of the following strings separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li DD +Both logical processors are in deliver mode. +.It Li DB +Logical processor 0 is in deliver mode while logical processor 1 is in +build mode. +.It Li DI +Logical processor 0 is in deliver mode while logical processor 1 is +halted, or in machine clear, or transitioning to a long microcode +flow. +.It Li BD +Logical processor 0 is in build mode while logical processor 1 is in +deliver mode. +.It Li BB +Both logical processors are in build mode. +.It Li BI +Logical processor 0 is in build mode while logical processor 1 is +halted, or in machine clear or transitioning to a long microcode +flow. +.It Li ID +Logical processor 0 is halted, or in machine clear or transitioning to +a long microcode flow while logical processor 1 is in deliver mode. +.It Li IB +Logical processor 0 is halted, or in machine clear or transitioning to +a long microcode flow while logical processor 1 is in build mode. +.El +.Pp +If there is only one logical processor in the processor package then +the qualifier for logical processor 1 is ignored. +If no qualifier is specified, the default qualifier is +.Dq Li DD+DB+DI+BD+BB+BI+ID+IB . +.It Li p4-tc-ms-xfer Op Li ,mask= Ns Ar flags +.Pq "TI event" +Count the number of times uop delivery changed from the trace cache to +MS ROM. +Qualifier +.Ar flags +can take the following value (which is also the default): +.Pp +.Bl -tag -width indent -compact +.It Li cisc +Count TC to MS transfers. +.El +.It Li p4-uop-queue-writes Op Li ,mask= Ns Ar flags +.Pq "TS event" +Count the number of valid uops written to the uop queue. +Qualifier +.Ar flags +is a list of the following strings, separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li from-tc-build +Count uops being written from the trace cache in build mode. +.It Li from-tc-deliver +Count uops being written from the trace cache in deliver mode. +.It Li from-rom +Count uops being written from microcode ROM. +.El +.Pp +The default qualifier counts all the above kinds of uops. +.It Li p4-uop-type Op Li ,mask= Ns Ar flags +.Pq "TS event" +This event is used in conjunction with the front-end at-retirement +mechanism to tag load and store uops. +Qualifer +.Ar flags +comprises the following strings separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li tagloads +Mark uops that are load operations. +.It Li tagstores +Mark uops that are store operations. +.El +.Pp +The default qualifier counts both kinds of uops. +.It Li p4-uops-retired Op Li ,mask= Ns Ar flags +.Pq "TS event" +Count uops retired during a clock cycle. +Qualifier +.Ar flags +comprises the following strings separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li nbogus +Count marked uops that are not bogus. +.It Li bogus +Count marked uops that are bogus. +.El +.Pp +The default qualifier counts both kinds of uops. +.It Li p4-wc-buffer Op Li ,mask= Ns Ar flags +.Pq "TI event" +Count write-combining buffer operations. +Qualifier +.Ar flags +contains the following strings separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li wcb-evicts +WC buffer evictions due to any cause. +.It Li wcb-full-evict +WC buffer evictions due to no WC buffer being available. +.El +.Pp +The default qualifer counts both kinds of evictions. +.It Li p4-x87-assist Op Li ,mask= Ns Ar flags +.Pq "TS event" +Count the retirement of x87 instructions that required special +handling. +Qualifier +.Ar flags +contains the following strings separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li fpsu +Count instructions that saw an FP stack underflow. +.It Li fpso +Count instructions that saw an FP stack overflow. +.It Li poao +Count instructions that saw an x87 output overflow. +.It Li poau +Count instructions that saw an x87 output underflow. +.It Li prea +Count instructions that needed an x87 input assist. +.El +.Pp +The default qualifier counts all the above types of instruction +retirements. +.It Li p4-x87-fp-uop Op Li ,mask= Ns Ar flags +.Pq "TI event" +Count x87 floating-point uops. +Qualifier +.Ar flags +can take the following value (which is also the default): +.Pp +.Bl -tag -width indent -compact +.It Li all +Count all x87 floating-point uops. +.El +.Pp +If an instruction contains more than one x87 floating-point uops, then +all x87 floating-point uops will be counted. +This event does not count x87 floating-point data movement operations. +.It Li p4-x87-simd-moves-uop Op Li ,mask= Ns Ar flags +.Pq "TI event" +Count each x87 FPU, MMX, SSE, or SSE2 uops that load data or store +data or perform register-to-register moves. +This event does not count integer move uops. +Qualifier +.Ar flags +may contain the following keywords separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li allp0 +Count all x87 and SIMD store and move uops. +.It Li allp2 +Count all x87 and SIMD load uops. +.El +.Pp +The default is to count all uops. +.Pq Errata +This event may be affected by processor errata N43. +.El +.Ss "Cascading P4 PMCs" +PMC cascading support is currently poorly implemented. +While individual event counters may be allocated with a +.Dq Li cascade +qualifier, the current API does not offer the ability +to name and allocate all the resources needed for a +cascaded event counter pair in a single operation. +.Ss "Precise Event Based Sampling" +Support for precise event based sampling is currently +unimplemented. +.Ss Event Name Aliases +The following table shows the mapping between the PMC-independent +aliases supported by +.Lb libpmc +and the underlying hardware events used. +.Bl -column "branch-mispredicts" "Description" +.It Em Alias Ta Em Event +.It Li branches Ta Li p4-branch-retired,mask=mmtp+mmtm +.It Li branch-mispredicts Ta Li p4-mispred-branch-retired +.It Li dc-misses Ta (unsupported) +.It Li ic-misses Ta (unsupported) +.It Li instructions Ta Li p4-instr-retired,mask=nbogusntag+nbogustag +.It Li interrupts Ta Li (unsupported) +.It Li unhalted-cycles Ta Li p4-global-power-events +.El +.Sh SEE ALSO +.Xr pmc 3 , +.Xr pmc.k7 3 , +.Xr pmc.k8 3 , +.Xr pmc.p5 3 , +.Xr pmc.p6 3 , +.Xr pmc.tsc 3 , +.Xr pmclog 3 , +.Xr hwpmc 4 +.Sh HISTORY +The +.Nm pmc +library first appeared in +.Fx 6.0 . +.Sh AUTHORS +The +.Lb libpmc +library was written by +.An "Joseph Koshy" +.Aq jkoshy@FreeBSD.org . diff --git a/lib/libpmc/pmc.p5.3 b/lib/libpmc/pmc.p5.3 new file mode 100644 index 000000000000..aaf0a8f538cd --- /dev/null +++ b/lib/libpmc/pmc.p5.3 @@ -0,0 +1,402 @@ +.\" Copyright (c) 2003-2008 Joseph Koshy. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" This software is provided by Joseph Koshy ``as is'' and +.\" any express or implied warranties, including, but not limited to, the +.\" implied warranties of merchantability and fitness for a particular purpose +.\" are disclaimed. in no event shall Joseph Koshy be liable +.\" for any direct, indirect, incidental, special, exemplary, or consequential +.\" damages (including, but not limited to, procurement of substitute goods +.\" or services; loss of use, data, or profits; or business interruption) +.\" however caused and on any theory of liability, whether in contract, strict +.\" liability, or tort (including negligence or otherwise) arising in any way +.\" out of the use of this software, even if advised of the possibility of +.\" such damage. +.\" +.\" $FreeBSD$ +.\" +.Dd September 16, 2008 +.Os +.Dt PMC 3 +.Sh NAME +.Nm pmc +.Nd library for accessing hardware performance monitoring counters +.Sh LIBRARY +.Lb libpmc +.Sh SYNOPSIS +.In pmc.h +.Sh DESCRIPTION +Intel Pentium PMCs are present in Intel +.Tn Pentium +and +.Tn "Pentium MMX" +processors. +These PMCs are documented in the +.Rs +.%B "Intel 64 and IA-32 Intel(R) Architectures Software Developer's Manual" +.%T "Volume 3B: System Programming Guide, Part 2" +.%N "Order Number 253669-024US" +.%D "August 2007" +.%Q "Intel Corporation" +.Re +.Ss PMC Features +These CPUs contain two PMCs, each 40 bits wide. +These PMCs support the following capabilities: +.Bl -column "PMC_CAP_INTERRUPT" "Support" +.It Em Capability Ta Em Support +.It PMC_CAP_CASCADE Ta \&No +.It PMC_CAP_EDGE Ta \&No +.It PMC_CAP_INTERRUPT Ta \&No +.It PMC_CAP_INVERT Ta \&No +.It PMC_CAP_READ Ta Yes +.It PMC_CAP_PRECISE Ta \&No +.It PMC_CAP_SYSTEM Ta Yes +.It PMC_CAP_TAGGING Ta \&No +.It PMC_CAP_THRESHOLD Ta \&No +.It PMC_CAP_USER Ta Yes +.It PMC_CAP_WRITE Ta Yes +.El +.Ss Event Qualifiers +Event specifiers for Intel Pentium PMCs can have the following common +qualifiers: +.Bl -tag -width indent +.It Li duration +Count duration (in clocks) of events. +The default is to count events. +.It Li os +Measure events at privilege levels 0, 1 and 2. +.It Li overflow +Assert the external processor pin associated with a counter on counter +overflow. +.It Li usr +Measure events at privilege level 3. +.El +.Pp +If neither of the +.Dq Li os +or +.Dq Li usr +qualifiers are specified, the default is to enable both. +.Pp +Some events may only be used on specific counters and some events +are defined only on processors supporting the MMX instruction set. +Note that these PMCs do not have the ability to interrupt the CPU. +.Ss Intel Pentium Event Specifiers +The event specifiers supported by Intel Pentium PMCs are: +.Bl -tag -width indent +.It Li p5-any-segment-register-loaded +The number of writes to any segment register, including the LDTR, +GDTR, TR and IDTR. +Far control transfers and task switches that involve privilege +level changes will count this event twice. +.It Li p5-bank-conflicts +The number of actual bank conflicts. +.It Li p5-branches +The number of taken and not taken branches including branches, jumps, calls, +software interrupts and interrupt returns. +.It Li p5-breakpoint-match-on-dr0-register +The number of matches on the DR0 breakpoint register. +.It Li p5-breakpoint-match-on-dr1-register +The number of matches on the DR1 breakpoint register. +.It Li p5-breakpoint-match-on-dr2-register +The number of matches on the DR2 breakpoint register. +.It Li p5-breakpoint-match-on-dr3-register +The number of matches on the DR3 breakpoint register. +.It Li p5-btb-false-entries +.Pq Tn Pentium MMX +The number of false entries in the BTB. +This event is only allocated on counter 0. +.It Li p5-btb-hits +The number of branches executed that hit in the branch table buffer. +.It Li p5-btb-miss-prediction-on-not-taken-branch +.Pq Tn Pentium MMX +The number of times the BTB predicted a not-taken branch as taken. +This event is only allocated on counter 1. +.It Li p5-bus-cycle-duration +The number of cycles while a bus cycle was in progress. +.It Li p5-bus-ownership-latency +.Pq Tn Pentium MMX +The time from bus ownership being requested to ownership being granted. +This event is only allocated on counter 0. +.It Li p5-bus-ownership-transfers +.Pq Tn Pentium MMX +The number of bus ownership transfers. +This event is only allocated on counter 1. +.It Li p5-bus-utilization-due-to-processor-activity +.Pq Tn Pentium MMX +The number of clocks the bus is busy due to the processor's own +activity. +This event is only allocated on counter 0. +.It Li p5-cache-line-sharing +.Pq Tn Pentium MMX +The number of shared data lines in L1 cache. +This event is only allocated on counter 1. +.It Li p5-cache-m-state-line-sharing +.Pq Tn Pentium MMX +The number of hits to an M- state line due to a memory access by +another processor. +This event is only allocated on counter 0. +.It Li p5-code-cache-miss +The number of instruction reads that miss the internal code cache. +Both cacheable and uncacheable misses are counted. +.It Li p5-code-read +The number of instruction reads to both cacheable and uncacheable regions. +.It Li p5-code-tlb-miss +The number of instruction reads that miss the instruction TLB. +Both cacheable and uncacheable unreads are counted. +.It Li p5-d1-starvation-and-fifo-is-empty +.Pq Tn Pentium MMX +The number of times the D1 stage cannot issue any instructions because +the FIFO was empty. +This event is only allocated on counter 0. +.It Li p5-d1-starvation-and-only-one-instruction-in-fifo +.Pq Tn Pentium MMX +The number of times the D1 stage could issue only one instruction +because the FIFO had one instruction ready. +This event is only allocated on counter 1. +.It Li p5-data-cache-lines-written-back +The number of data cache lines that are written back, including +those caused by internal and external snoops. +.It Li p5-data-cache-tlb-miss-stall-duration +.Pq Tn Pentium MMX +The number of clocks the pipeline is stalled due to a data cache +TLB miss. +This event is only allocated on counter 1. +.It Li p5-data-read +The number of memory data reads, counting internal data cache hits and +misses. +I/O and data memory accesses due to TLB miss processing are +not included. +Split cycle reads are counted individually. +.It Li p5-data-read-miss +The number of memory read accesses that miss the data cache, counting +both cacheable and uncacheable accesses. +Data accesses that are part of TLB miss processing are not included. +I/O accesses are not included. +.It Li p5-data-read-miss-or-write-miss +The number of data reads and writes that miss the internal data cache, +counting uncacheable accesses. +Data accesses due to TLB miss processing are not counted. +.It Li p5-data-read-or-write +The number of data reads and writes including internal data cache hits +and misses. +Data reads due to TLB miss processing are not counted. +.It Li p5-data-tlb-miss +The number of misses to the data cache translation lookaside buffer. +.It Li p5-data-write +The number of memory data writes, counting internal data cache hits +and misses. +I/O is not included and split cycle writes are counted individually. +.It Li p5-data-write-miss +The number of memory write accesses that miss the data cache, counting +both cacheable and uncacheable accesses. +I/O accesses are not counted. +.It Li p5-emms-instructions-executed +.Pq Tn Pentium MMX +The number of EMMS instructions executed. +This event is only allocated on counter 0. +.It Li p5-external-data-cache-snoop-hits +The number of external snoops to the data cache that hit a valid line, +or the data line fill buffer, or one of the write back buffers. +.It Li p5-external-snoops +The number of external snoop requests accepted, including snoops that +hit in the code cache, the data cache and that hit in neither. +.It Li p5-floating-point-stalls-duration +.Pq Tn Pentium MMX +The number of cycles the pipeline is stalled due to a floating point +freeze. +This event is only allocated on counter 0. +.It Li p5-flops +The number of floating point adds, subtracts, multiples, divides and +square roots. +Transcendental instructions trigger this event multiple times. +Instructions generating divide-by-zero, negative square root, special +operand and stack exceptions are not counted. +Integer multiply instructions that use the x87 FPU are counted. +.It Li p5-full-write-buffer-stall-duration-while-executing-mmx-instructions +.Pq Tn Pentium MMX +The number of clocks the pipeline has stalled due to full write +buffers when executing MMX instructions. +This event is only allocated on counter 0. +.It Li p5-hardware-interrupts +The number of taken INTR and NMI interrupts. +.It Li p5-instructions-executed +The number of instructions executed. +Repeat prefixed instructions are counted only once. +The HLT instruction is counted only once, irrespective of the number +of cycles spent in the halted state. +All hardware and software exceptions are counted as instructions, and +fault handler invocations are also counted as instructions. +.It Li p5-instructions-executed-v-pipe +The number of instructions that executed in the V pipe. +.It Li p5-io-read-or-write-cycle +The number of bus cycles directed to I/O space. +.It Li p5-locked-bus-cycle +The number of locked bus cycles that occur on account of the lock +prefixes, LOCK instructions, page table updates and descriptor table +updates. +.It Li p5-memory-accesses-in-both-pipes +The number of data memory reads or writes that are paired in both pipes. +.It Li p5-misaligned-data-memory-on-mmx-instructions +.Pq Tn Pentium MMX +The number of misaligned data memory references when executing MMX +instructions. +This event is only allocated on counter 0. +.It Li p5-misaligned-data-memory-or-io-references +The number of memory or I/O reads or writes that are not aligned on +natural boundaries. +2- and 4-byte accesses are counted as misaligned if they cross a 4 +byte boundary. +.It Li p5-mispredicted-or-unpredicted-returns +.Pq Tn Pentium MMX +The number of returns predicted incorrectly or not at all, only +counting RET instructions. +This event is only allocated on counter 0. +.It Li p5-mmx-instruction-data-read-misses +.Pq Tn Pentium MMX +The number of MMX instruction data read misses. +This event is only allocated on counter 1. +.It Li p5-mmx-instruction-data-reads +.Pq Tn Pentium MMX +The number of MMX instruction data reads. +This event is only allocated on counter 0. +.It Li p5-mmx-instruction-data-write-misses +.Pq Tn Pentium MMX +The number of data write misses caused by MMX instructions. +This event is only allocated on counter 1. +.It Li p5-mmx-instruction-data-writes +.Pq Tn Pentium MMX +The number of data writes caused by MMX instructions. +This event is only allocated on counter 0. +.It Li p5-mmx-instructions-executed-u-pipe +.Pq Tn Pentium MMX +The number of MMX instructions executed in the U pipe. +This event is only allocated on counter 0. +.It Li p5-mmx-instructions-executed-v-pipe +The number of MMX instructions executed in the V pipe. +This event is only allocated on counter 1. +.It Li p5-mmx-multiply-unit-interlock +.Pq Tn Pentium MMX +The number of clocks the pipeline is stalled because the destination +of a prior MMX multiply is not ready. +This event is only allocated on counter 0. +.It Li p5-movd-movq-store-stall-due-to-previous-mmx-operation +.Pq Tn Pentium MMX +The number of clocks a MOVD/MOVQ instruction stalled in the D2 stage +of the pipeline due to a previous MMX instruction. +This event is only allocated on counter 1. +.It Li p5-noncacheable-memory-reads +The number of bus cycles for non-cacheable instruction or data reads, +including cycles caused by TLB misses. +.It Li p5-number-of-cycles-not-in-halt-state +.Pq Tn Pentium MMX +The number of cycles the processor is not idle due to the HLT +instruction. +This event is only allocated on counter 0. +.It Li p5-pipeline-agi-stalls +The number of address generation interlock stalls. +An AGI that occurs in both the U and V pipelines in the same clock +signals the event twice. +.It Li p5-pipeline-flushes +The number of pipeline flushes that occur. +Pipeline flushes are caused by branch mispredicts, exceptions, +interrupts, some segment register loads, and BTB misses. +Prefetch queue flushes due to serializing instructions are not +counted. +.It Li p5-pipeline-flushes-due-to-wrong-branch-predictions +.Pq Tn Pentium MMX +The number of pipeline flushes due to wrong branch predictions +resolved in either the E- or WB- stage of the pipeline. +This event is only allocated on counter 0. +.It Li p5-pipeline-flushes-due-to-wrong-branch-predictions-resolved-in-wb-stage +.Pq Tn Pentium MMX +The number of pipeline flushes due to wrong branch predictions +resolved in the stage of the pipeline. +This event is only allocated on counter 1. +.It Li p5-pipeline-stall-for-mmx-instruction-data-memory-reads +.Pq Tn Pentium MMX +The number of clocks during pipeline stalls caused by waiting MMX data +memory reads. +This event is only allocated on counter 0. +.It Li p5-predicted-returns +.Pq Tn Pentium MMX +The number of predicted returns, whether correct or incorrect. +This counter only counts RET instructions. +This event is only allocated on counter 1. +.It Li p5-returns +.Pq Tn Pentium MMX +The number of RET instructions executed. +This event is only allocated on counter 0. +.It Li p5-saturating-mmx-instructions-executed +.Pq Tn Pentium MMX +The number of saturating MMX instructions executed. +This event is only allocated on counter 0. +.It Li p5-saturations-performed +.Pq Tn Pentium MMX +The number of saturating MMX instructions executed when at least one +of its results were actually saturated. +This event is only allocated on counter 1. +.It Li p5-stall-on-mmx-instruction-write-to-e-o-m-state-line +.Pq Tn Pentium MMX +The number of clocks during stalls on MMX instructions writing to +E- or M- state cache lines. +This event is only allocated on counter 1. +.It Li p5-stall-on-write-to-an-e-or-m-state-line +The number of stalls on a write to an exclusive or modified data cache +line. +.It Li p5-taken-branch-or-btb-hit +The number of events that may cause a hit in the BTB, namely either +taken branches or BTB hits. +.It Li p5-taken-branches +.Pq Tn Pentium MMX +The number of taken branches. +This event is only allocated on counter 1. +.It Li p5-transitions-between-mmx-and-fp-instructions +.Pq Tn Pentium MMX +The number of transitions between MMX and floating-point instructions +and vice-versa. +This event is only allocated on counter 1. +.It Li p5-waiting-for-data-memory-read-stall-duration +The number of clocks the pipeline was stalled waiting for data +memory reads. +Data TLB misses processing is included in this count. +.It Li p5-write-buffer-full-stall-duration +The number of clocks while the pipeline was stalled due to write +buffers being full. +.It Li p5-write-hit-to-m-or-e-state-lines +The number of writes that hit exclusive or modified lines in the data +cache. +.It Li p5-writes-to-noncacheable-memory +.Pq Tn Pentium MMX +The number of writes to non-cacheable memory, including write cycles +caused by TLB misses and I/O writes. +This event is only allocated on counter 1. +.El +.Sh SEE ALSO +.Xr pmc 3 , +.Xr pmc.k7 3 , +.Xr pmc.k8 3 , +.Xr pmc.p4 3 , +.Xr pmc.p6 3 , +.Xr pmc.tsc 3 , +.Xr pmclog 3 , +.Xr hwpmc 4 +.Sh HISTORY +The +.Nm pmc +library first appeared in +.Fx 6.0 . +.Sh AUTHORS +The +.Lb libpmc +library was written by +.An "Joseph Koshy" +.Aq jkoshy@FreeBSD.org . diff --git a/lib/libpmc/pmc.p6.3 b/lib/libpmc/pmc.p6.3 new file mode 100644 index 000000000000..992a8acd2646 --- /dev/null +++ b/lib/libpmc/pmc.p6.3 @@ -0,0 +1,954 @@ +.\" Copyright (c) 2003-2008 Joseph Koshy. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" This software is provided by Joseph Koshy ``as is'' and +.\" any express or implied warranties, including, but not limited to, the +.\" implied warranties of merchantability and fitness for a particular purpose +.\" are disclaimed. in no event shall Joseph Koshy be liable +.\" for any direct, indirect, incidental, special, exemplary, or consequential +.\" damages (including, but not limited to, procurement of substitute goods +.\" or services; loss of use, data, or profits; or business interruption) +.\" however caused and on any theory of liability, whether in contract, strict +.\" liability, or tort (including negligence or otherwise) arising in any way +.\" out of the use of this software, even if advised of the possibility of +.\" such damage. +.\" +.\" $FreeBSD$ +.\" +.Dd September 16, 2008 +.Os +.Dt PMC.P6 3 +.Sh NAME +.Nm pmc.p6 +.Nd measurement events for +.Tn Intel +Pentium Pro, P-II, P-III family CPUs +.Sh LIBRARY +.Lb libpmc +.Sh SYNOPSIS +.In pmc.h +.Sh DESCRIPTION +Intel P6 PMCs are present in Intel +.Tn "Pentium Pro" , +.Tn "Pentium II" , +.Tn Celeron , +.Tn "Pentium III" +and +.Tn "Pentium M" +processors. +.Pp +They are documented in +.Rs +.%B "IA-32 Intel(R) Architecture Software Developer's Manual" +.%T "Volume 3: System Programming Guide" +.%N "Order Number 245472-012" +.%D 2003 +.%Q "Intel Corporation" +.Re +.Pp +Some of these events are affected by processor errata described in +.Rs +.%B "Intel(R) Pentium(R) III Processor Specification Update" +.%N "Document Number: 244453-054" +.%D "April 2005" +.%Q "Intel Corporation" +.Re +.Ss PMC Features +These CPUs have two counters, each 40 bits wide. +Some events may only be used on specific counters and some events are +defined only on specific processor models. +These PMCs support the following capabilities: +.Bl -column "PMC_CAP_INTERRUPT" "Support" +.It Em Capability Ta Em Support +.It PMC_CAP_CASCADE Ta \&No +.It PMC_CAP_EDGE Ta Yes +.It PMC_CAP_INTERRUPT Ta Yes +.It PMC_CAP_INVERT Ta Yes +.It PMC_CAP_READ Ta Yes +.It PMC_CAP_PRECISE Ta \&No +.It PMC_CAP_SYSTEM Ta Yes +.It PMC_CAP_TAGGING Ta \&No +.It PMC_CAP_THRESHOLD Ta Yes +.It PMC_CAP_USER Ta Yes +.It PMC_CAP_WRITE Ta Yes +.El +.Ss Event Qualifiers +Event specifiers for Intel P6 PMCs can have the following common +qualifiers: +.Bl -tag -width indent +.It Li cmask= Ns Ar value +Configure the PMC to increment only if the number of configured +events measured in a cycle is greater than or equal to +.Ar value . +.It Li edge +Configure the PMC to count the number of deasserted to asserted +transitions of the conditions expressed by the other qualifiers. +If specified, the counter will increment only once whenever a +condition becomes true, irrespective of the number of clocks during +which the condition remains true. +.It Li inv +Invert the sense of comparision when the +.Dq Li cmask +qualifier is present, making the counter increment when the number of +events per cycle is less than the value specified by the +.Dq Li cmask +qualifier. +.It Li os +Configure the PMC to count events happening at processor privilege +level 0. +.It Li umask= Ns Ar value +This qualifier is used to further qualify the event selected (see +below). +.It Li usr +Configure the PMC to count events occurring at privilege levels 1, 2 +or 3. +.El +.Pp +If neither of the +.Dq Li os +or +.Dq Li usr +qualifiers are specified, the default is to enable both. +.Pp +The event specifiers supported by Intel P6 PMCs are: +.Bl -tag -width indent +.It Li p6-baclears +Count the number of times a static branch prediction was made by the +branch decoder because the BTB did not have a prediction. +.It Li p6-br-bac-missp-exec +.Pq Tn "Pentium M" +Count the number of branch instructions executed that where +mispredicted at the Front End (BAC). +.It Li p6-br-bogus +Count the number of bogus branches. +.It Li p6-br-call-exec +.Pq Tn "Pentium M" +Count the number of call instructions executed. +.It Li p6-br-call-missp-exec +.Pq Tn "Pentium M" +Count the number of call instructions executed that were mispredicted. +.It Li p6-br-cnd-exec +.Pq Tn "Pentium M" +Count the number of conditional branch instructions executed. +.It Li p6-br-cnd-missp-exec +.Pq Tn "Pentium M" +Count the number of conditional branch instructions executed that were +mispredicted. +.It Li p6-br-ind-call-exec +.Pq Tn "Pentium M" +Count the number of indirect call instructions executed. +.It Li p6-br-ind-exec +.Pq Tn "Pentium M" +Count the number of indirect branch instructions executed. +.It Li p6-br-ind-missp-exec +.Pq Tn "Pentium M" +Count the number of indirect branch instructions executed that were +mispredicted. +.It Li p6-br-inst-decoded +Count the number of branch instructions decoded. +.It Li p6-br-inst-exec +.Pq Tn "Pentium M" +Count the number of branch instructions executed but necessarily retired. +.It Li p6-br-inst-retired +Count the number of branch instructions retired. +.It Li p6-br-miss-pred-retired +Count the number of mispredicted branch instructions retired. +.It Li p6-br-miss-pred-taken-ret +Count the number of taken mispredicted branches retired. +.It Li p6-br-missp-exec +.Pq Tn "Pentium M" +Count the number of branch instructions executed that were +mispredicted at execution. +.It Li p6-br-ret-bac-missp-exec +.Pq Tn "Pentium M" +Count the number of return instructions executed that were +mispredicted at the Front End (BAC). +.It Li p6-br-ret-exec +.Pq Tn "Pentium M" +Count the number of return instructions executed. +.It Li p6-br-ret-missp-exec +.Pq Tn "Pentium M" +Count the number of return instructions executed that were +mispredicted at execution. +.It Li p6-br-taken-retired +Count the number of taken branches retired. +.It Li p6-btb-misses +Count the number of branches for which the BTB did not produce a +prediction. +.It Li p6-bus-bnr-drv +Count the number of bus clock cycles during which this processor is +driving the BNR# pin. +.It Li p6-bus-data-rcv +Count the number of bus clock cycles during which this processor is +receiving data. +.It Li p6-bus-drdy-clocks Op Li ,umask= Ns Ar qualifier +Count the number of clocks during which DRDY# is asserted. +An additional qualifier may be specified, and comprises one of the +following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-bus-hit-drv +Count the number of bus clock cycles during which this processor is +driving the HIT# pin. +.It Li p6-bus-hitm-drv +Count the number of bus clock cycles during which this processor is +driving the HITM# pin. +.It Li p6-bus-lock-clocks Op Li ,umask= Ns Ar qualifier +Count the number of clocks during with LOCK# is asserted on the +external system bus. +An additional qualifier may be specified and comprises one of the following +keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-bus-req-outstanding +Count the number of bus requests outstanding in any given cycle. +.It Li p6-bus-snoop-stall +Count the number of clock cycles during which the bus is snoop stalled. +.It Li p6-bus-tran-any Op Li ,umask= Ns Ar qualifier +Count the number of completed bus transactions of any kind. +An additional qualifier may be specified and comprises one of the following +keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-bus-tran-brd Op Li ,umask= Ns Ar qualifier +Count the number of burst read transactions. +An additional qualifier may be specified and comprises one of the following +keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-bus-tran-burst Op Li ,umask= Ns Ar qualifier +Count the number of completed burst transactions. +An additional qualifier may be specified and comprises one of the following +keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-bus-tran-def Op Li ,umask= Ns Ar qualifier +Count the number of completed deferred transactions. +An additional qualifier may be specified and comprises one of the following +keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-bus-tran-ifetch Op Li ,umask= Ns Ar qualifier +Count the number of completed instruction fetch transactions. +An additional qualifier may be specified and comprises one of the following +keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-bus-tran-inval Op Li ,umask= Ns Ar qualifier +Count the number of completed invalidate transactions. +An additional qualifier may be specified and comprises one of the following +keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-bus-tran-mem Op Li ,umask= Ns Ar qualifier +Count the number of completed memory transactions. +An additional qualifier may be specified and comprises one of the following +keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-bus-tran-pwr Op Li ,umask= Ns Ar qualifier +Count the number of completed partial write transactions. +An additional qualifier may be specified and comprises one of the following +keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-bus-tran-rfo Op Li ,umask= Ns Ar qualifier +Count the number of completed read-for-ownership transactions. +An additional qualifier may be specified and comprises one of the following +keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-bus-trans-io Op Li ,umask= Ns Ar qualifier +Count the number of completed I/O transactions. +An additional qualifier may be specified and comprises one of the following +keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-bus-trans-p Op Li ,umask= Ns Ar qualifier +Count the number of completed partial transactions. +An additional qualifier may be specified and comprises one of the following +keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-bus-trans-wb Op Li ,umask= Ns Ar qualifier +Count the number of completed write-back transactions. +An additional qualifier may be specified and comprises one of the following +keywords: +.Pp +.Bl -tag -width indent -compact +.It Li any +Count transactions generated by any agent on the bus. +.It Li self +Count transactions generated by this processor. +.El +.Pp +The default is to count operations generated by this processor. +.It Li p6-cpu-clk-unhalted +Count the number of cycles during with the processor was not halted. +.Pp +.Pq Tn "Pentium M" +Count the number of cycles during with the processor was not halted +and not in a thermal trip. +.It Li p6-cycles-div-busy +Count the number of cycles during which the divider is busy and cannot +accept new divides. +This event is only allocated on counter 0. +.It Li p6-cycles-in-pending-and-masked +Count the number of processor cycles for which interrupts were +disabled and interrupts were pending. +.It Li p6-cycles-int-masked +Count the number of processor cycles for which interrupts were +disabled. +.It Li p6-data-mem-refs +Count all loads and all stores using any memory type, including +internal retries. +Each part of a split store is counted separately. +.It Li p6-dcu-lines-in +Count the total lines allocated in the data cache unit. +.It Li p6-dcu-m-lines-in +Count the number of M state lines allocated in the data cache unit. +.It Li p6-dcu-m-lines-out +Count the number of M state lines evicted from the data cache unit. +.It Li p6-dcu-miss-outstanding +Count the weighted number of cycles while a data cache unit miss is +outstanding, incremented by the number of outstanding cache misses at +any time. +.It Li p6-div +Count the number of integer and floating-point divides including +speculative divides. +This event is only allocated on counter 1. +.It Li p6-emon-esp-uops +.Pq Tn "Pentium M" +Count the total number of micro-ops. +.It Li p6-emon-est-trans Op Li ,umask= Ns Ar qualifier +.Pq Tn "Pentium M" +Count the number of +.Tn "Enhanced Intel SpeedStep" +transitions. +An additional qualifier may be specified, and can be one of the +following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li all +Count all transitions. +.It Li freq +Count only frequency transitions. +.El +.Pp +The default is to count all transitions. +.It Li p6-emon-fused-uops-ret Op Li ,umask= Ns Ar qualifier +.Pq Tn "Pentium M" +Count the number of retired fused micro-ops. +An additional qualifier may be specified, and may be one of the +following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li all +Count all fused micro-ops. +.It Li loadop +Count only load and op micro-ops. +.It Li stdsta +Count only STD/STA micro-ops. +.El +.Pp +The default is to count all fused micro-ops. +.It Li p6-emon-kni-comp-inst-ret +.Pq Tn "Pentium III" +Count the number of SSE computational instructions retired. +An additional qualifier may be specified, and comprises one of the +following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li packed-and-scalar +Count packed and scalar operations. +.It Li scalar +Count scalar operations only. +.El +.Pp +The default is to count packed and scalar operations. +.It Li p6-emon-kni-inst-retired Op Li ,umask= Ns Ar qualifier +.Pq Tn "Pentium III" +Count the number of SSE instructions retired. +An additional qualifier may be specified, and comprises one of the +following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li packed-and-scalar +Count packed and scalar operations. +.It Li scalar +Count scalar operations only. +.El +.Pp +The default is to count packed and scalar operations. +.It Li p6-emon-kni-pref-dispatched Op Li ,umask= Ns Ar qualifier +.Pq Tn "Pentium III" +Count the number of SSE prefetch or weakly ordered instructions +dispatched (including speculative prefetches). +An additional qualifier may be specified, and comprises one of the +following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li nta +Count non-temporal prefetches. +.It Li t1 +Count prefetches to L1. +.It Li t2 +Count prefetches to L2. +.It Li wos +Count weakly ordered stores. +.El +.Pp +The default is to count non-temporal prefetches. +.It Li p6-emon-kni-pref-miss Op Li ,umask= Ns Ar qualifier +.Pq Tn "Pentium III" +Count the number of prefetch or weakly ordered instructions that miss +all caches. +An additional qualifier may be specified, and comprises one of the +following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li nta +Count non-temporal prefetches. +.It Li t1 +Count prefetches to L1. +.It Li t2 +Count prefetches to L2. +.It Li wos +Count weakly ordered stores. +.El +.Pp +The default is to count non-temporal prefetches. +.It Li p6-emon-pref-rqsts-dn +.Pq Tn "Pentium M" +Count the number of downward prefetches issued. +.It Li p6-emon-pref-rqsts-up +.Pq Tn "Pentium M" +Count the number of upward prefetches issued. +.It Li p6-emon-simd-instr-retired +.Pq Tn "Pentium M" +Count the number of retired +.Tn MMX +instructions. +.It Li p6-emon-sse-sse2-comp-inst-retired Op Li ,umask= Ns Ar qualifier +.Pq Tn "Pentium M" +Count the number of computational SSE instructions retired. +An additional qualifier may be specified and can be one of the +following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li sse-packed-single +Count SSE packed-single instructions. +.It Li sse-scalar-single +Count SSE scalar-single instructions. +.It Li sse2-packed-double +Count SSE2 packed-double instructions. +.It Li sse2-scalar-double +Count SSE2 scalar-double instructions. +.El +.Pp +The default is to count SSE packed-single instructions. +.It Li p6-emon-sse-sse2-inst-retired Op Li ,umask= Ns Ar qualifer +.Pp +.Pq Tn "Pentium M" +Count the number of SSE instructions retired. +An additional qualifier can be specified, and can be one of the +following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li sse-packed-single +Count SSE packed-single instructions. +.It Li sse-packed-single-scalar-single +Count SSE packed-single and scalar-single instructions. +.It Li sse2-packed-double +Count SSE2 packed-double instructions. +.It Li sse2-scalar-double +Count SSE2 scalar-double instructions. +.El +.Pp +The default is to count SSE packed-single instructions. +.It Li p6-emon-synch-uops +.Pq Tn "Pentium M" +Count the number of sync micro-ops. +.It Li p6-emon-thermal-trip +.Pq Tn "Pentium M" +Count the duration or occurrences of thermal trips. +Use the +.Dq Li edge +qualifier to count occurrences of thermal trips. +.It Li p6-emon-unfusion +.Pq Tn "Pentium M" +Count the number of unfusion events in the reorder buffer. +.It Li p6-flops +Count the number of computational floating point operations retired. +This event is only allocated on counter 0. +.It Li p6-fp-assist +Count the number of floating point exceptions handled by microcode. +This event is only allocated on counter 1. +.It Li p6-fp-comps-ops-exe +Count the number of computation floating point operations executed. +This event is only allocated on counter 0. +.It Li p6-fp-mmx-trans Op Li ,umask= Ns Ar qualifier +.Pq Tn "Pentium II" , Tn "Pentium III" +Count the number of transitions between MMX and floating-point +instructions. +An additional qualifier may be specified, and comprises one of the +following keywords: +.Pp +.Bl -tag -width indent -compact +.It Li mmxtofp +Count transitions from MMX instructions to floating-point instructions. +.It Li fptommx +Count transitions from floating-point instructions to MMX instructions. +.El +.Pp +The default is to count MMX to floating-point transitions. +.It Li p6-hw-int-rx +Count the number of hardware interrupts received. +.It Li p6-ifu-fetch +Count the number of instruction fetches, both cacheable and non-cacheable. +.It Li p6-ifu-fetch-miss +Count the number of instruction fetch misses (i.e., those that produce +memory accesses). +.It Li p6-ifu-mem-stall +Count the number of cycles instruction fetch is stalled for any reason. +.It Li p6-ild-stall +Count the number of cycles the instruction length decoder is stalled. +.It Li p6-inst-decoded +Count the number of instructions decoded. +.It Li p6-inst-retired +Count the number of instructions retired. +.It Li p6-itlb-miss +Count the number of instruction TLB misses. +.It Li p6-l2-ads +Count the number of L2 address strobes. +.It Li p6-l2-dbus-busy +Count the number of cycles during which the L2 cache data bus was busy. +.It Li p6-l2-dbus-busy-rd +Count the number of cycles during which the L2 cache data bus was busy +transferring read data from L2 to the processor. +.It Li p6-l2-ifetch Op Li ,umask= Ns Ar qualifier +Count the number of L2 instruction fetches. +An additional qualifier may be specified and comprises a list of the following +keywords separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li e +Count operations affecting E (exclusive) state lines. +.It Li i +Count operations affecting I (invalid) state lines. +.It Li m +Count operations affecting M (modified) state lines. +.It Li s +Count operations affecting S (shared) state lines. +.El +.Pp +The default is to count operations affecting all (MESI) state lines. +.It Li p6-l2-ld Op Li ,umask= Ns Ar qualifier +Count the number of L2 data loads. +An additional qualifier may be specified and comprises a list of the following +keywords separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li both +.Pq Tn "Pentium M" +Count both hardware-prefetched lines and non-hardware-prefetched lines. +.It Li e +Count operations affecting E (exclusive) state lines. +.It Li hw +.Pq Tn "Pentium M" +Count hardware-prefetched lines only. +.It Li i +Count operations affecting I (invalid) state lines. +.It Li m +Count operations affecting M (modified) state lines. +.It Li nonhw +.Pq Tn "Pentium M" +Exclude hardware-prefetched lines. +.It Li s +Count operations affecting S (shared) state lines. +.El +.Pp +The default on processors other than +.Tn "Pentium M" +processors is to count operations affecting all (MESI) state lines. +The default on +.Tn "Pentium M" +processors is to count both hardware-prefetched and +non-hardware-prefetch operations on all (MESI) state lines. +.Pq Errata +This event is affected by processor errata E53. +.It Li p6-l2-lines-in Op Li ,umask= Ns Ar qualifier +Count the number of L2 lines allocated. +An additional qualifier may be specified and comprises a list of the following +keywords separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li both +.Pq Tn "Pentium M" +Count both hardware-prefetched lines and non-hardware-prefetched lines. +.It Li e +Count operations affecting E (exclusive) state lines. +.It Li hw +.Pq Tn "Pentium M" +Count hardware-prefetched lines only. +.It Li i +Count operations affecting I (invalid) state lines. +.It Li m +Count operations affecting M (modified) state lines. +.It Li nonhw +.Pq Tn "Pentium M" +Exclude hardware-prefetched lines. +.It Li s +Count operations affecting S (shared) state lines. +.El +.Pp +The default on processors other than +.Tn "Pentium M" +processors is to count operations affecting all (MESI) state lines. +The default on +.Tn "Pentium M" +processors is to count both hardware-prefetched and +non-hardware-prefetch operations on all (MESI) state lines. +.Pq Errata +This event is affected by processor errata E45. +.It Li p6-l2-lines-out Op Li ,umask= Ns Ar qualifier +Count the number of L2 lines evicted. +An additional qualifier may be specified and comprises a list of the following +keywords separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li both +.Pq Tn "Pentium M" +Count both hardware-prefetched lines and non-hardware-prefetched lines. +.It Li e +Count operations affecting E (exclusive) state lines. +.It Li hw +.Pq Tn "Pentium M" +Count hardware-prefetched lines only. +.It Li i +Count operations affecting I (invalid) state lines. +.It Li m +Count operations affecting M (modified) state lines. +.It Li nonhw +.Pq Tn "Pentium M" only +Exclude hardware-prefetched lines. +.It Li s +Count operations affecting S (shared) state lines. +.El +.Pp +The default on processors other than +.Tn "Pentium M" +processors is to count operations affecting all (MESI) state lines. +The default on +.Tn "Pentium M" +processors is to count both hardware-prefetched and +non-hardware-prefetch operations on all (MESI) state lines. +.Pq Errata +This event is affected by processor errata E45. +.It Li p6-l2-m-lines-inm +Count the number of modified lines allocated in L2 cache. +.It Li p6-l2-m-lines-outm Op Li ,umask= Ns Ar qualifier +Count the number of L2 M-state lines evicted. +.Pp +.Pq Tn "Pentium M" +On these processors an additional qualifier may be specified and +comprises a list of the following keywords separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li both +Count both hardware-prefetched lines and non-hardware-prefetched lines. +.It Li hw +Count hardware-prefetched lines only. +.It Li nonhw +Exclude hardware-prefetched lines. +.El +.Pp +The default is to count both hardware-prefetched and +non-hardware-prefetch operations. +.Pq Errata +This event is affected by processor errata E53. +.It Li p6-l2-rqsts Op Li ,umask= Ns Ar qualifier +Count the total number of L2 requests. +An additional qualifier may be specified and comprises a list of the following +keywords separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li e +Count operations affecting E (exclusive) state lines. +.It Li i +Count operations affecting I (invalid) state lines. +.It Li m +Count operations affecting M (modified) state lines. +.It Li s +Count operations affecting S (shared) state lines. +.El +.Pp +The default is to count operations affecting all (MESI) state lines. +.It Li p6-l2-st +Count the number of L2 data stores. +An additional qualifier may be specified and comprises a list of the following +keywords separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li e +Count operations affecting E (exclusive) state lines. +.It Li i +Count operations affecting I (invalid) state lines. +.It Li m +Count operations affecting M (modified) state lines. +.It Li s +Count operations affecting S (shared) state lines. +.El +.Pp +The default is to count operations affecting all (MESI) state lines. +.It Li p6-ld-blocks +Count the number of load operations delayed due to store buffer blocks. +.It Li p6-misalign-mem-ref +Count the number of misaligned data memory references (crossing a 64 +bit boundary). +.It Li p6-mmx-assist +.Pq Tn "Pentium II" , Tn "Pentium III" +Count the number of MMX assists executed. +.It Li p6-mmx-instr-exec +.Pq Tn Celeron , Tn "Pentium II" +Count the number of MMX instructions executed, except MOVQ and MOVD +stores from register to memory. +.It Li p6-mmx-instr-ret +.Pq Tn "Pentium II" +Count the number of MMX instructions retired. +.It Li p6-mmx-instr-type-exec Op Li ,umask= Ns Ar qualifier +.Pq Tn "Pentium II" , Tn "Pentium III" +Count the number of MMX instructions executed. +An additional qualifier may be specified and comprises a list of +the following keywords separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li pack +Count MMX pack operation instructions. +.It Li packed-arithmetic +Count MMX packed arithmetic instructions. +.It Li packed-logical +Count MMX packed logical instructions. +.It Li packed-multiply +Count MMX packed multiply instructions. +.It Li packed-shift +Count MMX packed shift instructions. +.It Li unpack +Count MMX unpack operation instructions. +.El +.Pp +The default is to count all operations. +.It Li p6-mmx-sat-instr-exec +.Pq Tn "Pentium II" , Tn "Pentium III" +Count the number of MMX saturating instructions executed. +.It Li p6-mmx-uops-exec +.Pq Tn "Pentium II" , Tn "Pentium III" +Count the number of MMX micro-ops executed. +.It Li p6-mul +Count the number of integer and floating-point multiplies, including +speculative multiplies. +This event is only allocated on counter 1. +.It Li p6-partial-rat-stalls +Count the number of cycles or events for partial stalls. +.It Li p6-resource-stalls +Count the number of cycles there was a resource related stall of any kind. +.It Li p6-ret-seg-renames +.Pq Tn "Pentium II" , Tn "Pentium III" +Count the number of segment register rename events retired. +.It Li p6-sb-drains +Count the number of cycles the store buffer is draining. +.It Li p6-seg-reg-renames Op Li ,umask= Ns Ar qualifier +.Pq Tn "Pentium II" , Tn "Pentium III" +Count the number of segment register renames. +An additional qualifier may be specified, and comprises a list of the +following keywords separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li ds +Count renames for segment register DS. +.It Li es +Count renames for segment register ES. +.It Li fs +Count renames for segment register FS. +.It Li gs +Count renames for segment register GS. +.El +.Pp +The default is to count operations affecting all segment registers. +.It Li p6-seg-rename-stalls +.Pq Tn "Pentium II" , Tn "Pentium III" +Count the number of segment register renaming stalls. +An additional qualifier may be specified, and comprises a list of the +following keywords separated by +.Ql + +characters: +.Pp +.Bl -tag -width indent -compact +.It Li ds +Count stalls for segment register DS. +.It Li es +Count stalls for segment register ES. +.It Li fs +Count stalls for segment register FS. +.It Li gs +Count stalls for segment register GS. +.El +.Pp +The default is to count operations affecting all the segment registers. +.It Li p6-segment-reg-loads +Count the number of segment register loads. +.It Li p6-uops-retired +Count the number of micro-ops retired. +.El +.Ss Event Name Aliases +The following table shows the mapping between the PMC-independent +aliases supported by +.Lb libpmc +and the underlying hardware events used. +.Bl -column "branch-mispredicts" "Description" +.It Em Alias Ta Em Event +.It Li branches Ta Li p6-br-inst-retired +.It Li branch-mispredicts Ta Li p6-br-miss-pred-retired +.It Li dc-misses Ta Li p6-dcu-lines-in +.It Li ic-misses Ta Li p6-ifu-fetch-miss +.It Li instructions Ta Li p6-inst-retired +.It Li interrupts Ta Li p6-hw-int-rx +.It Li unhalted-cycles Ta Li p6-cpu-clk-unhalted +.El +.Sh SEE ALSO +.Xr pmc 3 , +.Xr pmc.k7 3 , +.Xr pmc.k8 3 , +.Xr pmc.p4 3 , +.Xr pmc.p5 3 , +.Xr pmc.tsc 3 , +.Xr pmclog 3 , +.Xr hwpmc 4 +.Sh HISTORY +The +.Nm pmc +library first appeared in +.Fx 6.0 . +.Sh AUTHORS +The +.Lb libpmc +library was written by +.An "Joseph Koshy" +.Aq jkoshy@FreeBSD.org .