Name

    AMD_performance_monitor
    
Name Strings

    GL_AMD_performance_monitor
    
Contributors

    Dan Ginsburg
    Aaftab Munshi
    Dave Oldcorn
    Maurice Ribble
    Jonathan Zarge

Contact

    Dan Ginsburg (dan.ginsburg 'at' amd.com)

Status

    ???

Version

    Last Modified Date: 11/29/2007

Number

    OpenGL Extension #360
    OpenGL ES Extension #50

Dependencies

    None

Overview

    This extension enables the capture and reporting of performance monitors.
    Performance monitors contain groups of counters which hold arbitrary counted 
    data.  Typically, the counters hold information on performance-related
    counters in the underlying hardware.  The extension is general enough to
    allow the implementation to choose which counters to expose and pick the
    data type and range of the counters.  The extension also allows counting to 
    start and end on arbitrary boundaries during rendering.

Issues

    1.  Should this be an EGL or OpenGL/OpenGL ES extension?

        Decision - Make this an OpenGL/OpenGL ES extension
        
        Reason - We would like to expose this extension in both OpenGL and 
        OpenGL ES which makes EGL an unsuitable choice.  Further, support for 
        EGL is not a requirement and there are platforms that support OpenGL ES 
        but not EGL, making it difficult to make this an EGL extension.
        
    2.  Should the API support multipassing?
    
        Decision - No.
        
        Reason - Multipassing should be left to the application.  Supporting
        it in the API would make the API unnecessarily complicated.  A major
        issue is that, depending on which counters are to be sampled, the
        number of passes and the counters selected in each pass can be
        difficult to determine.  It is much easier to give a list of counters
        categorized by groups, with specific information on the number of
        counters that can be selected from each group.
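
        As a rough illustration of application-side multipassing, the sketch
        below splits the counters wanted from one group into passes of at
        most <maxActiveCounters> each.  The helper names passesNeeded and
        countersInPass are hypothetical and are not part of this extension.

```c
#include <assert.h>

/* Hypothetical helper: number of passes needed to sample `wanted` counters
 * from one group when at most `maxActive` can be active at once. */
static int passesNeeded(int wanted, int maxActive)
{
    assert(wanted >= 0 && maxActive > 0);
    return (wanted + maxActive - 1) / maxActive;   /* ceiling division */
}

/* Hypothetical helper: how many counters fall into pass `pass` (0-based). */
static int countersInPass(int wanted, int maxActive, int pass)
{
    int remaining = wanted - pass * maxActive;

    if (remaining <= 0)
        return 0;
    return remaining < maxActive ? remaining : maxActive;
}
```

        An application would typically select the counters for one pass,
        render the same workload, read the results, and repeat for the
        next pass.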

    3.  Should we define a 64-bit data type for UNSIGNED_INT64_AMD?

        Decision - No.

        Reason - While counters can be returned as 64-bit unsigned integers, the
        data is passed back to the application inside of a void*.  Therefore,
        there is no need in this extension to define a 64-bit data type (e.g.,
        GLuint64).  It will be up to the application to declare a native 64-bit
        unsigned integer and cast the returned data to that type.
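
        One possible way for an application to recover a 64-bit value from
        the returned data is sketched below.  It assumes the value occupies
        two consecutive 32-bit words in native byte order, and it uses
        memcpy rather than a pointer cast to sidestep alignment and aliasing
        concerns.  The name readCounter64 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef unsigned int GLuint;   /* stand-in for the GL typedef */

/* Copy a 64-bit counter value out of two consecutive 32-bit words.
 * Assumes the value is stored in the platform's native byte order. */
static uint64_t readCounter64(const GLuint *words)
{
    uint64_t value;

    assert(words != NULL);
    assert(sizeof(GLuint) == 4);
    memcpy(&value, words, sizeof value);
    return value;
}
```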


New Procedures and Functions

    void GetPerfMonitorGroupsAMD(int *numGroups, sizei groupsSize, 
                                 uint *groups)
    
    void GetPerfMonitorCountersAMD(uint group, int *numCounters, 
                                   int *maxActiveCounters, sizei countersSize, 
                                   uint *counters)

    void GetPerfMonitorGroupStringAMD(uint group, sizei bufSize, sizei *length, 
                                      char *groupString)

    void GetPerfMonitorCounterStringAMD(uint group, uint counter, sizei bufSize,
                                        sizei *length, char *counterString)
 
    void GetPerfMonitorCounterInfoAMD(uint group, uint counter, 
                                      enum pname, void *data)
    
    void GenPerfMonitorsAMD(sizei n, uint *monitors)
    
    void DeletePerfMonitorsAMD(sizei n, uint *monitors)
    
    void SelectPerfMonitorCountersAMD(uint monitor, boolean enable, 
                                      uint group, int numCounters, 
                                      uint *counterList)

    void BeginPerfMonitorAMD(uint monitor)
        
    void EndPerfMonitorAMD(uint monitor)

    void GetPerfMonitorCounterDataAMD(uint monitor, enum pname, sizei dataSize, 
                                      uint *data, int *bytesWritten)


New Tokens

    Accepted by the <pname> parameter of GetPerfMonitorCounterInfoAMD
    
        COUNTER_TYPE_AMD                           0x8BC0
        COUNTER_RANGE_AMD                          0x8BC1
        
    Returned as a valid value in <data> parameter of
    GetPerfMonitorCounterInfoAMD if <pname> = COUNTER_TYPE_AMD
        
        UNSIGNED_INT                               0x1405
        FLOAT                                      0x1406
        UNSIGNED_INT64_AMD                         0x8BC2
        PERCENTAGE_AMD                             0x8BC3
        
    Accepted by the <pname> parameter of GetPerfMonitorCounterDataAMD
        
        PERFMON_RESULT_AVAILABLE_AMD               0x8BC4
        PERFMON_RESULT_SIZE_AMD                    0x8BC5
        PERFMON_RESULT_AMD                         0x8BC6

Addition to the GL specification

    Add a new section called Performance Monitoring
    
    A performance monitor consists of a number of hardware and software counters
    that can be sampled by the GPU and reported back to the application.
    Performance counters are organized as a single hierarchy where counters are
    categorized into groups.  Each group has a list of counters that belong to
    the group and can be sampled, and a maximum number of counters that can be
    sampled at the same time.
    
    The command
    
        void GetPerfMonitorGroupsAMD(int *numGroups, sizei groupsSize, 
                                     uint *groups);
        
    returns the number of available groups in <numGroups>, if <numGroups> is
    not NULL.  If <groupsSize> is not 0 and <groups> is not NULL, then the list 
    of available groups is returned.  The number of entries that will be 
    returned in <groups> is determined by <groupsSize>.  If <groupsSize> is 0, 
    no information is copied.  Each group is identified by a unique unsigned int 
    identifier.
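
    The usual two-call pattern (query the count, allocate, then fetch the
    IDs) might look like the sketch below.  So that it runs without a GL
    context, it calls stubGetPerfMonitorGroupsAMD, a hypothetical stand-in
    for glGetPerfMonitorGroupsAMD that reports three made-up group IDs.  The
    helper queryGroups is likewise an illustration only.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int GLuint;
typedef int GLint;
typedef int GLsizei;

/* Stand-in for glGetPerfMonitorGroupsAMD; the group IDs are made up. */
static void stubGetPerfMonitorGroupsAMD(GLint *numGroups, GLsizei groupsSize,
                                        GLuint *groups)
{
    static const GLuint fake[] = { 10, 11, 12 };
    const GLsizei total = 3;

    if (numGroups != NULL)
        *numGroups = total;
    if (groupsSize > 0 && groups != NULL) {
        GLsizei n = groupsSize < total ? groupsSize : total;
        for (GLsizei i = 0; i < n; i++)
            groups[i] = fake[i];
    }
}

/* Ask for the count, allocate, then fetch the IDs.  Returns a malloc'd
 * array that the caller frees, or NULL; the count goes in *outCount. */
static GLuint *queryGroups(GLint *outCount)
{
    GLint n = 0;
    GLuint *ids;

    assert(outCount != NULL);
    *outCount = 0;
    stubGetPerfMonitorGroupsAMD(&n, 0, NULL);
    if (n <= 0)
        return NULL;
    ids = (GLuint *)malloc((size_t)n * sizeof(GLuint));
    if (ids == NULL)
        return NULL;
    stubGetPerfMonitorGroupsAMD(NULL, n, ids);
    *outCount = n;
    return ids;
}
```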
    
    The command
    
        void GetPerfMonitorCountersAMD(uint group, int *numCounters, 
                                       int *maxActiveCounters, 
                                       sizei countersSize, 
                                       uint *counters);
        
    returns information about the group identified by <group>: the number of
    available counters in <numCounters>, the maximum number of counters that
    can be active at any time in <maxActiveCounters>, and the list of counters
    in <counters>.  The number of entries that can be returned in <counters> is
    determined by <countersSize>.  If <countersSize> is 0, no information is
    copied. Each counter in a group is identified by a unique unsigned int
    identifier.  If <group> does not reference a valid group ID, an 
    INVALID_VALUE error is generated.

    
    The command
    
        void GetPerfMonitorGroupStringAMD(uint group, sizei bufSize, 
                                          sizei *length, char *groupString)

        
    returns the string that describes the group name identified by <group> in 
    <groupString>.  The actual number of characters written to <groupString>,
    excluding the null terminator, is returned in <length>.  If <length> is 
    NULL, then no length is returned.  The maximum number of characters that
    may be written into <groupString>, including the null terminator, is 
    specified by <bufSize>.  If <bufSize> is 0 and <groupString> is NULL, the 
    number of characters that would be required to hold the group string,
    excluding the null terminator, is returned in <length>.  If <group> 
    does not reference a valid group ID, an INVALID_VALUE error is generated.
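
    Rather than relying on a fixed-size buffer, an application can first
    query the required length and then allocate exactly that much.  The
    sketch below assumes a hypothetical stand-in,
    stubGetPerfMonitorGroupStringAMD, that follows the rules above and
    always reports the made-up name "HW".  queryGroupName is an illustrative
    helper, not part of the extension.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int GLuint;
typedef int GLsizei;

/* Stand-in for glGetPerfMonitorGroupStringAMD; the name is made up. */
static void stubGetPerfMonitorGroupStringAMD(GLuint group, GLsizei bufSize,
                                             GLsizei *length,
                                             char *groupString)
{
    static const char name[] = "HW";
    const GLsizei full = (GLsizei)strlen(name);

    (void)group;
    if (bufSize == 0 && groupString == NULL) {
        if (length != NULL)
            *length = full;          /* required size, excluding terminator */
        return;
    }
    if (bufSize > 0 && groupString != NULL) {
        GLsizei n = full < bufSize - 1 ? full : bufSize - 1;
        memcpy(groupString, name, (size_t)n);
        groupString[n] = '\0';
        if (length != NULL)
            *length = n;
    }
}

/* Query the length, allocate length + 1 bytes, then fetch the string.
 * Returns a malloc'd string that the caller frees, or NULL. */
static char *queryGroupName(GLuint group)
{
    GLsizei len = 0;
    char *buf;

    stubGetPerfMonitorGroupStringAMD(group, 0, &len, NULL);
    assert(len >= 0);
    buf = (char *)malloc((size_t)len + 1);    /* +1 for the terminator */
    if (buf == NULL)
        return NULL;
    stubGetPerfMonitorGroupStringAMD(group, len + 1, NULL, buf);
    return buf;
}
```

    The same pattern applies to GetPerfMonitorCounterStringAMD, with the
    additional <counter> argument.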
    
    
    The command
    
        void GetPerfMonitorCounterStringAMD(uint group, uint counter, 
                                            sizei bufSize, sizei *length, 
                                            char *counterString);

    
    returns the string that describes the counter name identified by <group> 
    and <counter> in <counterString>.  The actual number of characters written 
    to <counterString>, excluding the null terminator, is returned in <length>.  
    If <length> is NULL, then no length is returned.  The maximum number of 
    characters that may be written into <counterString>, including the null 
    terminator, is specified by <bufSize>.  If <bufSize> is 0 and 
    <counterString> is NULL, the number of characters that would be required to 
    hold the counter string, excluding the null terminator, is returned in 
    <length>.  If <group> does not reference a valid group ID, or <counter> 
    does not reference a valid counter within the group ID, an INVALID_VALUE 
    error is generated.
       
    The command
    
        void GetPerfMonitorCounterInfoAMD(uint group, uint counter, 
                                          enum pname, void *data);
        
    returns the following information about a counter.  For a <counter> 
    belonging to <group>, we can query the counter type and counter range.  If 
    <pname> is COUNTER_TYPE_AMD, then <data> returns the type.  Valid type
    values returned are UNSIGNED_INT, UNSIGNED_INT64_AMD, PERCENTAGE_AMD, FLOAT.
    If the type value returned is PERCENTAGE_AMD, then this describes a float
    value that is in the range [0.0 .. 100.0].  If <pname> is COUNTER_RANGE_AMD,
    <data> returns two values representing a minimum and a maximum. The 
    counter's type is used to determine the format in which the range values 
    are returned.  If <group> does not reference a valid group ID, or <counter> 
    does not reference a valid counter within the group ID, an INVALID_VALUE 
    error is generated.
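
    Because the format of the range depends on the counter type, an
    application has to branch on the type before reading the two values.
    The sketch below converts a range to doubles for display.  It assumes
    the minimum and maximum are stored back to back in native byte order,
    which the text above does not spell out, and rangeToDouble is a
    hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Token values from the New Tokens section. */
#define GL_UNSIGNED_INT        0x1405
#define GL_FLOAT               0x1406
#define GL_UNSIGNED_INT64_AMD  0x8BC2
#define GL_PERCENTAGE_AMD      0x8BC3

/* Convert the two range values in `data` to doubles according to `type`.
 * Returns 0 on success, -1 if the type is not recognized.
 * Assumption: min and max are packed consecutively in native byte order. */
static int rangeToDouble(unsigned int type, const void *data,
                         double *minOut, double *maxOut)
{
    assert(data != NULL && minOut != NULL && maxOut != NULL);

    if (type == GL_UNSIGNED_INT) {
        uint32_t v[2];
        memcpy(v, data, sizeof v);
        *minOut = (double)v[0];
        *maxOut = (double)v[1];
    } else if (type == GL_UNSIGNED_INT64_AMD) {
        uint64_t v[2];
        memcpy(v, data, sizeof v);
        *minOut = (double)v[0];
        *maxOut = (double)v[1];
    } else if (type == GL_FLOAT || type == GL_PERCENTAGE_AMD) {
        float v[2];
        memcpy(v, data, sizeof v);
        *minOut = (double)v[0];
        *maxOut = (double)v[1];
    } else {
        return -1;
    }
    return 0;
}
```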

    
    The command
    
        void GenPerfMonitorsAMD(sizei n, uint *monitors)
        
    returns a list of monitors.  These monitors can then be used to select 
    groups/counters to be sampled, to start multiple monitoring sessions and to 
    return counter information sampled by the GPU.  At creation time, the 
    performance monitor object has all counters disabled.  The value of the
    PERFMON_RESULT_AVAILABLE_AMD, PERFMON_RESULT_AMD, and 
    PERFMON_RESULT_SIZE_AMD queries will all initially be 0.
    
    The command
    
        void DeletePerfMonitorsAMD(sizei n, uint *monitors)
        
    is used to delete the list of monitors created by a previous call to 
    GenPerfMonitorsAMD.  If a monitor ID in the list <monitors> does not 
    reference a previously generated performance monitor, an INVALID_VALUE
    error is generated.
    
    The command 
    
        void SelectPerfMonitorCountersAMD(uint monitor, boolean enable, 
                                          uint group, int numCounters, 
                                          uint *counterList);
        
    is used to enable or disable a list of counters from a group to be monitored 
    as identified by <monitor>.  The <enable> argument determines whether the
    counters should be enabled or disabled.  <group> specifies the group
    ID under which counters will be enabled or disabled.  The <numCounters>
    argument gives the number of counters to be selected from the list 
    <counterList>.  If <monitor> is not a valid monitor created by 
    GenPerfMonitorsAMD, an INVALID_VALUE error will be generated.  If <group>
    is not a valid group, an INVALID_VALUE error will be generated.  If 
    <numCounters> is less than 0, an INVALID_VALUE error will be generated. 

    When SelectPerfMonitorCountersAMD is called on a monitor, any outstanding 
    results for that monitor become invalidated and the result queries 
    PERFMON_RESULT_SIZE_AMD and PERFMON_RESULT_AVAILABLE_AMD are reset to 0.
    
    The command
    
        void BeginPerfMonitorAMD(uint monitor);
        
    is used to start a monitor session.  Note that BeginPerfMonitorAMD calls
    cannot be nested.  In addition, given the groups and counters enabled for
    a monitor, the implementation may be unable to sample all of the selected
    counters, in which case the monitor session fails and an INVALID_OPERATION
    error will be generated.

    While BeginPerfMonitorAMD marks the beginning of performance counter
    collection, the counters do not begin collecting immediately.  The API is
    asynchronous: collection begins only when the graphics hardware processes
    the BeginPerfMonitorAMD command.
    
    The command
    
        void EndPerfMonitorAMD(uint monitor);
        
    ends a monitor session started by BeginPerfMonitorAMD.  If a performance 
    monitor is not currently started, an INVALID_OPERATION error will be 
    generated.
    
    Note that there is an implied overhead to collecting performance counters
    that may or may not distort performance depending on the implementation.  
    For example, some counters may require a pipeline flush thereby causing a
    change in the performance of the application.  Further, the frequency at 
    which an application samples may distort the accuracy of counters which are 
    variant (e.g., non-deterministic based on the input).  While the effects 
    of sampling frequency are implementation dependent, general guidance can
    be given that sampling at a high frequency may distort both performance
    of the application and the accuracy of variant counters.

    The command
    
        void GetPerfMonitorCounterDataAMD(uint monitor, enum pname, 
                                          sizei dataSize, 
                                          uint *data, int *bytesWritten);
        
    is used to return counter values that have been sampled for a monitor
    session.  If <pname> is PERFMON_RESULT_AVAILABLE_AMD, then <data> will
    indicate whether the result is available or not.  If <pname> is 
    PERFMON_RESULT_SIZE_AMD, <data> will contain the actual size, in bytes, of
    all counter results being sampled.  If <pname> is PERFMON_RESULT_AMD,
    <data> will contain the results.  For each counter of a group that was
    selected to be sampled, the information is returned as the group ID,
    followed by the counter ID, followed by the counter value.  The size of
    the counter value returned depends on the counter value type.  The
    argument <dataSize> specifies the number of
    bytes available in the <data> buffer for writing.  If <bytesWritten> is not 
    NULL, it gives the number of bytes written into the <data> buffer.  It is an 
    INVALID_OPERATION error for <data> to be NULL.  If <pname> is 
    PERFMON_RESULT_AMD and <dataSize> is less than the number of bytes required 
    to store the results as reported by a PERFMON_RESULT_SIZE_AMD query, then 
    results will be written only up to the number of bytes specified by 
    <dataSize>.
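
    A sketch of walking the result layout described above (group ID,
    counter ID, then a value whose width depends on the counter type)
    follows.  It assumes 32-bit words, with UNSIGNED_INT64_AMD values taking
    two words and all other types one.  CounterTypeFn, exampleTypeOf, and
    countRecords are hypothetical names; a real application would obtain the
    type from GetPerfMonitorCounterInfoAMD with COUNTER_TYPE_AMD.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int GLuint;

/* Token values from the New Tokens section. */
#define GL_UNSIGNED_INT        0x1405
#define GL_FLOAT               0x1406
#define GL_UNSIGNED_INT64_AMD  0x8BC2
#define GL_PERCENTAGE_AMD      0x8BC3

/* Hypothetical lookup of a counter's type. */
typedef GLuint (*CounterTypeFn)(GLuint group, GLuint counter);

/* Made-up lookup for the example: group 1 holds 64-bit counters. */
static GLuint exampleTypeOf(GLuint group, GLuint counter)
{
    (void)counter;
    return group == 1 ? GL_UNSIGNED_INT64_AMD : GL_FLOAT;
}

/* Walk a result buffer and return the number of complete records,
 * or -1 if a counter reports an unrecognized type. */
static int countRecords(const GLuint *data, size_t bytesWritten,
                        CounterTypeFn typeOf)
{
    size_t words = bytesWritten / sizeof(GLuint);
    size_t pos = 0;
    int records = 0;

    assert(data != NULL && typeOf != NULL);
    while (pos + 2 < words) {          /* two IDs plus at least one value */
        GLuint type = typeOf(data[pos], data[pos + 1]);
        size_t valueWords;

        if (type == GL_UNSIGNED_INT64_AMD)
            valueWords = 2;
        else if (type == GL_UNSIGNED_INT || type == GL_FLOAT ||
                 type == GL_PERCENTAGE_AMD)
            valueWords = 1;
        else
            return -1;

        if (pos + 2 + valueWords > words)
            break;                     /* truncated record: stop here */
        pos += 2 + valueWords;
        records++;
    }
    return records;
}
```

    Stopping on an unrecognized type, instead of skipping it, avoids
    misreading the rest of the buffer when the value width is unknown.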

    If no BeginPerfMonitorAMD/EndPerfMonitorAMD has been issued for a monitor,
    then the result of querying for PERFMON_RESULT_AVAILABLE_AMD and
    PERFMON_RESULT_SIZE_AMD will be 0.  When SelectPerfMonitorCountersAMD is
    called on a monitor, the results stored for the monitor become invalidated
    and the PERFMON_RESULT_AVAILABLE_AMD and PERFMON_RESULT_SIZE_AMD queries
    should behave as if no BeginPerfMonitorAMD/EndPerfMonitorAMD has been
    issued for the monitor.
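
    Since collection is asynchronous, an application will usually want to
    check availability before reading results.  The sketch below polls with
    a bounded number of attempts.  It assumes a nonzero value in <data>
    means the result is available, and it runs against
    stubGetPerfMonitorCounterDataAMD, a hypothetical stand-in that reports
    "not available" for its first two polls.  waitForResult is an
    illustrative helper name.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int GLuint;
typedef unsigned int GLenum;
typedef int GLint;
typedef int GLsizei;

#define GL_PERFMON_RESULT_AVAILABLE_AMD 0x8BC4

/* Stand-in state: number of polls that still report "not available". */
static int stubPollsRemaining = 2;

/* Stand-in for glGetPerfMonitorCounterDataAMD (availability query only). */
static void stubGetPerfMonitorCounterDataAMD(GLuint monitor, GLenum pname,
                                             GLsizei dataSize, GLuint *data,
                                             GLint *bytesWritten)
{
    (void)monitor;
    if (pname == GL_PERFMON_RESULT_AVAILABLE_AMD && data != NULL &&
        dataSize >= (GLsizei)sizeof(GLuint)) {
        *data = (stubPollsRemaining > 0) ? 0u : 1u;
        if (stubPollsRemaining > 0)
            stubPollsRemaining--;
        if (bytesWritten != NULL)
            *bytesWritten = (GLint)sizeof(GLuint);
    }
}

/* Poll until the result is available or maxPolls attempts have been made.
 * Returns the number of polls used, or -1 if the result never arrived. */
static int waitForResult(GLuint monitor, int maxPolls)
{
    assert(maxPolls > 0);
    for (int i = 1; i <= maxPolls; i++) {
        GLuint available = 0;

        stubGetPerfMonitorCounterDataAMD(monitor,
                                         GL_PERFMON_RESULT_AVAILABLE_AMD,
                                         (GLsizei)sizeof(GLuint),
                                         &available, NULL);
        if (available)
            return i;
    }
    return -1;
}
```

    A real application would more likely do other work between polls, or
    read the result a frame later, than spin in a tight loop.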

Errors

    INVALID_OPERATION error will be generated if BeginPerfMonitorAMD is unable
    to begin monitoring with the currently selected counters.  

    INVALID_OPERATION error will be generated if BeginPerfMonitorAMD is called
    when a performance monitor is already active.

    INVALID_OPERATION error will be generated if EndPerfMonitorAMD is called
    when a performance monitor is not currently started.

    INVALID_VALUE error will be generated if the <group> parameter to 
    GetPerfMonitorCountersAMD, GetPerfMonitorGroupStringAMD,
    GetPerfMonitorCounterStringAMD, GetPerfMonitorCounterInfoAMD, or
    SelectPerfMonitorCountersAMD does not reference a valid group ID.

    INVALID_VALUE error will be generated if the <counter> parameter to
    GetPerfMonitorCounterStringAMD or GetPerfMonitorCounterInfoAMD does not
    reference a valid counter ID in the group specified by <group>.

    INVALID_VALUE error will be generated if any of the monitor IDs
    in the <monitors> parameter to DeletePerfMonitorsAMD do not reference
    a valid generated monitor ID.
   
    INVALID_VALUE error will be generated if the <monitor> parameter to
    SelectPerfMonitorCountersAMD does not reference a monitor created by
    GenPerfMonitorsAMD.

    INVALID_VALUE error will be generated if the <numCounters> parameter to
    SelectPerfMonitorCountersAMD is less than 0.

     

New State

Sample Usage

    typedef struct 
    {
            GLuint       *counterList;
            int         numCounters;
            int         maxActiveCounters;
    } CounterInfo;

    void
    getGroupAndCounterList(GLuint **groupsList, int *numGroups, 
                           CounterInfo **counterInfo)
    {
        GLint          n;
        GLuint        *groups;
        CounterInfo   *counters;

        glGetPerfMonitorGroupsAMD(&n, 0, NULL);
        groups = (GLuint*) malloc(n * sizeof(GLuint));
        glGetPerfMonitorGroupsAMD(NULL, n, groups);
        *numGroups = n;

        *groupsList = groups;
        counters = (CounterInfo*) malloc(sizeof(CounterInfo) * n);
        for (int i = 0 ; i < n; i++ )
        {
            glGetPerfMonitorCountersAMD(groups[i], &counters[i].numCounters,
                                     &counters[i].maxActiveCounters, 0, NULL);

            counters[i].counterList = (GLuint*)malloc(counters[i].numCounters * 
                                                      sizeof(GLuint));

            glGetPerfMonitorCountersAMD(groups[i], NULL, NULL,
                                        counters[i].numCounters, 
                                        counters[i].counterList);
        }

        *counterInfo = counters;
    }
    
    static int  countersInitialized = 0;
        
    int
    getCounterByName(char *groupName, char *counterName, GLuint *groupID, 
                     GLuint *counterID)
    {
        // Cached across calls; filled in on the first call only
        static int          numGroups;
        static GLuint       *groups;
        static CounterInfo  *counters;
        int          i = 0;

        if (!countersInitialized)
        {
            getGroupAndCounterList(&groups, &numGroups, &counters);
            countersInitialized = 1;
        }

        for ( i = 0; i < numGroups; i++ )
        {
           char curGroupName[256];
           glGetPerfMonitorGroupStringAMD(groups[i], 256, NULL, curGroupName);
           if (strcmp(groupName, curGroupName) == 0)
           {
               *groupID = groups[i];
               break;
           }
        }

        if ( i == numGroups )
            return -1;           // error - could not find the group name

        for ( int j = 0; j < counters[i].numCounters; j++ )
        {
            char curCounterName[256];
            
            glGetPerfMonitorCounterStringAMD(groups[i],
                                             counters[i].counterList[j], 
                                             256, NULL, curCounterName);
            if (strcmp(counterName, curCounterName) == 0)
            {
                *counterID = counters[i].counterList[j];
                return 0;
            }
        }

        return -1;           // error - could not find the counter name
    }

    void
    drawFrameWithCounters(void)
    {
        GLuint group[2];
        GLuint counter[2];
        GLuint monitor;
        GLuint *counterData;

        // Get group/counter IDs by name.  Note that normally the
        // counter and group names need to be queried for because
        // each implementation of this extension on different hardware
        // could define different names and groups.  This is just provided
        // to demonstrate the API.
        getCounterByName("HW", "Hardware Busy", &group[0],
                         &counter[0]);
        getCounterByName("API", "Draw Calls", &group[1], 
                         &counter[1]);
                
        // create perf monitor ID
        glGenPerfMonitorsAMD(1, &monitor);

        // enable the counters
        glSelectPerfMonitorCountersAMD(monitor, GL_TRUE, group[0], 1,
                                       &counter[0]);
        glSelectPerfMonitorCountersAMD(monitor, GL_TRUE, group[1], 1, 
                                       &counter[1]);

        glBeginPerfMonitorAMD(monitor);

        // RENDER FRAME HERE
        // ...
        
        glEndPerfMonitorAMD(monitor);

        // read the counters
        GLuint resultSize;
        glGetPerfMonitorCounterDataAMD(monitor, GL_PERFMON_RESULT_SIZE_AMD, 
                                       sizeof(GLuint), &resultSize, NULL);

        counterData = (GLuint*) malloc(resultSize);

        GLsizei bytesWritten;
        glGetPerfMonitorCounterDataAMD(monitor, GL_PERFMON_RESULT_AMD,  
                                       resultSize, counterData, &bytesWritten);

        // display or log counter info
        GLsizei wordCount = 0;

        while ( (4 * wordCount) < bytesWritten )
        {
            GLuint groupId = counterData[wordCount];
            GLuint counterId = counterData[wordCount + 1];

            // Determine the counter type
            GLuint counterType;
            glGetPerfMonitorCounterInfoAMD(groupId, counterId, 
                                           GL_COUNTER_TYPE_AMD, &counterType);
 
            if ( counterType == GL_UNSIGNED_INT64_AMD )
            {
                unsigned __int64 counterResult = 
                           *(unsigned __int64*)(&counterData[wordCount + 2]);

                // Print counter result

                wordCount += 4;
            }
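            else if ( counterType == GL_FLOAT ||
                      counterType == GL_PERCENTAGE_AMD )
            {
                float counterResult = *(float*)(&counterData[wordCount + 2]);

                // Print counter result

                wordCount += 3;
            } 
            else if ( counterType == GL_UNSIGNED_INT )
            {
                GLuint counterResult = counterData[wordCount + 2];

                // Print counter result

                wordCount += 3;
            }
            else
            {
                break;       // unknown counter type - stop parsing
            }
        }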
    }
 
Revision History
    11/29/2007 - dginsburg
       + Clarified the default state of a performance monitor object on creation

    11/09/2007 - dginsbur
       + Clarify what happens if SelectPerfMonitorCountersAMD is called on
         a monitor with outstanding query results.
       + Rename counterSize to countersSize
       + Remove some ';' typos

    06/13/2007 - dginsbur
       + Add language on the asynchronous nature of the API and 
         counter accuracy/performance distortion.
       + Add myself as the contact
       + Remove INVALID_OPERATION error when countersList is NULL
       + Clarify 64-bit issue
       + Make PERCENTAGE_AMD counters float rather than uint
       + Clarify accuracy distortion on variant counters only
       + Tweak to overview language

    06/09/2007 - dginsbur
       + Fill in errors section and make many more errors explicit
       + Fix the example code so it compiles

    06/08/2007 - dginsbur
       + Modified GetPerfMonitorGroupString and GetPerfMonitorCounterString to
         be more client/server friendly.  
       + Modified example.
       + Renamed parameters/variables to follow GL conventions.
       + Modified several 'int' param types to 'sizei'
       + Modified counters type from 'int' to 'uint'
       + Renamed argument 'cb' and 'cbret'
       + Better documented GetPerfMonitorCounterData 
       + Add AMD adornment in many places that were missing it
 
    06/07/2007 - dginsbur
       + Cleanup formatting, remove tabs, make fit in proper page width
       + Add FLOAT and UNSIGNED_INT to list of COUNTER_TYPEs
       + Fix some bugs in the example code
       + Rewrite introduction
       + Clarified Issue 1 reasoning
       + Added Issue 3 regarding use of 64-bit data types
       + Added revision history

    03/21/2007 - Initial version written.  Written by amunshi.

        
