Name

    NV_vertex_program3

Name Strings

    GL_NV_vertex_program3

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Data:         10/12/2009
    NVIDIA Revision:            7

Number

    306

Dependencies

    ARB_vertex_program is required.
    NV_vertex_program2_option is required.
    This extension interacts with ARB_fragment_program_shadow.

Overview

    This extension, like the NV_vertex_program2_option extension,
    provides additional vertex program functionality to extend the
    standard ARB_vertex_program language and execution environment.
    ARB programs wishing to use this added functionality need only add:

        OPTION NV_vertex_program3;

    to the beginning of their vertex programs.

    New functionality provided by this extension, above and beyond that
    already provided by NV_vertex_program2_option extension, includes: 

        * texture lookups in vertex programs,

        * ability to push and pop address registers on the stack,

        * address register-relative addressing for vertex attribute and
          result arrays, and

        * a second four-component condition code.

Issues

    Should we provided a separate "!!VP3.0" program type, like the
    "!!VP2.0" type defined in NV_vertex_program2?

      RESOLVED:  No.  Since ARB_vertex_program has been fully defined
      (it wasn't in the !!VP2.0 time-frame), we will simply define
      language extensions to !!ARBvp1.0 that expose new functionality.
      The NV_vertex_program2_option specification followed this same
      pattern for the NV3X family (GeForce FX, Quadro FX).

    Should this be called "NV_vertex_program3_option"?

      RESOLVED:  No.  The similar extension to !!ARBvp1.0 called
      "NV_vertex_program2_option" got that name only because the simpler
      "NV_vertex_program2" name had already been used.

    Is there a limit on the number of texture units that can be accessed
    by a vertex program?

      RESOLVED:  Yes.  The limit may be lower than the total number of texture
      image units available and is given by the implementation-dependent
      constant MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB.  Any program that attempts
      to use more unique texture image units will fail to load.  Programs can
      use any texture image unit number, as long as they don't use too many
      simultaneously.  As an example, the GeForce 6 series of GPUs provides 16
      texture image units accessible to vertex programs, but no more than four
      can be used simultaneously.  It is not an error to use texture image
      units 12-15 in a program.

      This limitation is identical to the one in the ARB_vertex_shader
      extensions -- both extensions use the same enum to query the number of
      available image units.  Violating this limit in GLSL results in a link
      error.

    Is there a restriction on the texture targets that can be accessed by a
    vertex program?

      RESOLVED:  Yes -- for any texture image unit, vertex and fragment
      processing can not use different targets.  If they do, an
      INVALID_OPERATION is generated at Begin-time.  This resolution is
      consistent with resultion of the same issue in the ARB_vertex_shader
      extension and OpenGL 2.0.

    Since vertices don't have screen space partial derivatives, how is
    the LOD used for texture accesses defined?

      RESOLVED:  The TXL instruction allows a program to explicitly
      set an LOD; the LOD for all other texture instructions is zero.
      The texture LOD bias specified in the texture object and environment
      do apply to all vertex texture lookups.


New Procedures and Functions

    None.

New Tokens

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
    GetFloatv, and GetDoublev:

        MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB              0x8B4C

Additions to Chapter 2 of the OpenGL 1.4 Specification (OpenGL Operation)

    Modify Section 2.14.2, Vertex Program Grammar and Restrictions

    (mostly add to existing grammar rules, as extended by
    NV_vertex_program2_option)

    <optionName>            ::= "NV_vertex_program3"

    <instruction>           ::= <TexInstruction>

    <ALUInstruction>        ::= <ASTACKop_instruction>

    <TexInstruction>        ::= <TEXop_instruction>

    <ASTACKop_instruction>  ::= <PUSHAop> <instOperandAddrVNS>
                              | <POPAop> <instResultAddr>

    <PUSHAop>               ::= "PUSHA"

    <POPAop>                ::= "POPA"

    <TEXop_instruction>     ::= <TEXop> <instResult> "," <instOperandV> "," 
                                <texTarget>

    <TEXop>                 ::= "TEX"
                              | "TXP"
                              | "TXB"
                              | "TXL"

    <texTarget>             ::= <texImageUnit> "," <texTargetType>

    <texImageUnit>          ::= "texture" <optTexImageUnitNum>

    <optTexImageUnitNum>    ::= /* empty */
                              | "[" <texImageUnitNum> "]"

    <texImageUnitNum>       ::= <integer> 
                                /*[0,MAX_TEXTURE_IMAGE_UNITS_ARB-1]*/

    <texTargetType>         ::= "1D"
                              | "2D"
                              | "3D"
                              | "CUBE"
                              | "RECT"

    <attribVtxBasic>        ::= "texcoord" "[" <arrayMemRel> "]"
                              | "attrib" "[" <arrayMemRel> "]"

    <resultVtxBasic>        ::= "texcoord" "[" <arrayMemRel> "]"

    <ccMaskRule>            ::= "EQ0"
                              | "GE0"
                              | "GT0"
                              | "LE0"
                              | "LT0"
                              | "NE0"
                              | "TR0"
                              | "FL0"
                              | "EQ1"
                              | "GE1"
                              | "GT1"
                              | "LE1"
                              | "LT1"
                              | "NE1"
                              | "TR1"
                              | "FL1"

    (modify description of reserved identifiers)
    
    ... The following strings are reserved keywords and may not be used
    as identifiers:

        ABS, ADD, ADDRESS, ALIAS, ARA, ARL, ARR, ATTRIB, BRA, CAL, COS,
        DP3, DP4, DPH, DST, END, EX2, EXP, FLR, FRC, LG2, LIT, LOG, MAD,
        MAX, MIN, MOV, MUL, OPTION, OUTPUT, PARAM, POPA, POW, PUSHA, RCC,
        RCP, RET, RSQ, SEQ, SFL, SGE, SGT, SIN, SLE, SLT, SNE, SUB, SSG,
        STR, SWZ, TEMP, TEX, TXB, TXL, TXP, XPD, program, result, state,
        and vertex.

    Modify Section 2.14.3.1, Vertex Attributes

    (add new bindings to binding table)

      Vertex Attribute Binding  Components  Underlying State
      ------------------------  ----------  --------------------------------
      ...
      vertex.texcoord[A+n]      (s,t,r,q)   indexed texture coordinate
      vertex.attrib[A+n]        (x,y,z,w)   indexed generic vertex attribute

    If a vertex attribute binding matches "vertex.texcoord[A+n]", where
    "A" is a component of an address register (Section 2.14.3.5), a
    texture coordinate number <c> is computed by adding the current
    value of the address register component and <n>.  The "x", "y",
    "z", and "w" components of the vertex attribute variable are
    filled with the "s", "t", "r", and "q" components, respectively,
    of the vertex texture coordinates for texture unit <c>.  If <c>
    is negative or greater than or equal to MAX_TEXTURE_COORDS_ARB,
    the vertex attribute variable is undefined.

    If a vertex attribute binding matches "vertex.attrib[A+n]", where
    "A" is a component of an address register (Section 2.14.3.5), a
    vertex attribute number <a> is computed by adding the current value
    of the address register component and <n>.  The "x", "y", "z", and
    "w" components of the vertex attribute variable are filled with the
    "x", "y", "z", and "w" components, respectively, of generic vertex
    attribute <a>.  If <a> is negative or greater than or equal to
    MAX_VERTEX_ATTRIBS_ARB, the vertex attribute variable is undefined.

    Modify Section 2.14.3.4, Vertex Program Results

    (add new binding to binding table)

      Binding                        Components  Description
      -----------------------------  ----------  ----------------------------
      ...
      result.texcoord[A+n]           (s,t,r,q)   indexed texture coordinate

    If a result variable binding matches "result.texcoord[A+n]", where "A"
    is a component of an address register (Section 2.14.3.5), a texture
    coordinate number <c> is computed by adding the current value of
    the address register component and <n>.  Updates to the "x", "y",
    "z", and "w" components of the result variable set the "s", "t",
    "r" and "q" components, respectively, of the transformed vertex's
    texture coordinates for texture unit <c>.  If <c> is negative or
    greater than or equal to MAX_TEXTURE_COORDS_ARB, the effects of
    updates to vertex attribute variable are undefined and may overwrite
    other programs results.

    Modify Section 2.14.3.X, Condition Code Registers (added in
    NV_Vertex_program2_option)

    The vertex program condition code registers are two four-component
    vectors, called CC0 and CC1.  Each component of this register is one
    of four enumerated values: GT (greater than), EQ (equal), LT (less
    than), or UN (unordered).  The condition code register can be used
    to mask writes to registers and to evaluate conditional branches.

    Most vertex program instructions can optionally update one of the
    two condition code registers.  When a vertex program instruction
    updates a condition code register, a condition code component is set
    to LT if the corresponding component of the result is less than zero,
    EQ if it is equal to zero, GT if it is greater than zero, and UN if
    it is NaN (not a number).

    The condition code registers are initialized to vectors of EQ values
    each time a vertex program executes.

    Modify Section 2.14.3.7, Vertex Program Resource Limits

    (add new paragraph to end of section) In addition to the previous limits,
    the number of unique texture image units that can be accessed
    simultaneously by a vertex program is limited.  The limit is given by the
    implementation-dependent constant MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB, and
    may be lower than the total number of texture image units provided.  If
    the number of texture image units referenced by a vertex program exceeds
    this limit, the program will fail to load.
    
    Modify Section 2.14.4, Vertex Program Execution Environment

    (modify Begin-time error language for vertex program execution to cover
    invalid texture uses)

    If vertex program mode is enabled and the currently bound program object
    does not contain a valid vertex program, the error INVALID_OPERATION will
    be generated by Begin, RasterPos, and any command that implicitly calls
    Begin (e.g., DrawArrays).

    If vertex program mode is enabled and the currently bound program object
    accesses a texture image unit, the texture target used must be consistent
    with the target (if any) used for fragment processing.  If vertex and
    fragment processing require the use of different texture targets on the
    same texture image unit, the error INVALID_OPERATION will be generated by
    Begin, RasterPos, and any command that implicitly calls Begin.

    (modify instruction table) There are forty-eight vertex program
    instructions.  Vertex program instructions may have up to eight
    variants, including a suffix of "C" or "C0" to allow an update of
    condition code register zero (section 2.14.3.X), a suffix of "C1"
    to allow an update of condition code register one, and a suffix of
    "_SAT" to clamp the result vector components to the range [0,1].
    For example, the eight forms of the "ADD" instruction are "ADD",
    "ADDC", "ADDC0", "ADDC1", "ADD_SAT", "ADDC_SAT", "ADDC0_SAT", and
    "ADDC1_SAT".  The instructions and their respective input and output
    parameters are summarized in Table X.5.

                  Modifiers
      Instruction   C S   Inputs  Output   Description
      -----------   - -   ------  ------   --------------------------------
      ABS           X X   v       v        absolute value
      ADD           X X   v,v     v        add
      ARA           X -   a       a        address register add
      ARL           X -   s       a        address register load
      ARR           X -   v       a        address register load (round)
      BRA           - -   c       -        branch
      CAL           - -   c       -        subroutine call
      COS           X X   s       ssss     cosine
      DP3           X X   v,v     ssss     3-component dot product
      DP4           X X   v,v     ssss     4-component dot product
      DPH           X X   v,v     ssss     homogeneous dot product
      DST           X X   v,v     v        distance vector
      EX2           X X   s       ssss     exponential base 2
      EXP           X X   s       v        exponential base 2 (approximate)
      FLR           X X   v       v        floor
      FRC           X X   v       v        fraction
      LG2           X X   s       ssss     logarithm base 2
      LIT           X X   v       v        compute light coefficients
      LOG           X X   s       v        logarithm base 2 (approximate)
      MAD           X X   v,v,v   v        multiply and add
      MAX           X X   v,v     v        maximum
      MIN           X X   v,v     v        minimum
      MOV           X X   v       v        move
      MUL           X X   v,v     v        multiply
      POPA          - -   -       a        pop address register
      POW           X X   s,s     ssss     exponentiate
      PUSHA         - -   a       -        push address register
      RCC           X X   s       ssss     reciprocal (clamped)
      RCP           X X   s       ssss     reciprocal
      RET           - -   c       -        subroutine return
      RSQ           X X   s       ssss     reciprocal square root
      SEQ           X X   v,v     v        set on equal
      SFL           X X   v,v     v        set on false
      SGE           X X   v,v     v        set on greater than or equal
      SGT           X X   v,v     v        set on greater than
      SIN           X X   s       ssss     sine
      SLE           X X   v,v     v        set on less than or equal
      SLT           X X   v,v     v        set on less than
      SNE           X X   v,v     v        set on not equal
      SSG           X X   v       v        set sign
      STR           X X   v,v     v        set on true
      SUB           X X   v,v     v        subtract
      SWZ           X X   v       v        extended swizzle
      TEX           X X   v       v        texture lookup
      TXB           X X   v       v        texture lookup with LOD bias
      TXL           X X   v       v        texture lookup with explicit LOD
      TXP           X X   v       v        projective texture lookup
      XPD           X X   v,v     v        cross product

      Table X.5:  Summary of vertex program instructions.  The columns
      "C" and "S" indicate whether the "C", "C0", and "C1" condition code
      update modifiers, and the "_SAT" saturation modifiers, respectively,
      are supported for the opcode.  "v" indicates a floating-point vector
      input or output, "s" indicates a floating-point scalar input,
      "ssss" indicates a scalar output replicated across a 4-component
      result vector, "a" indicates a vector address register, and "c"
      indicates a condition code test.

    Rewrite Section 2.14.4.3,  Vertex Program Destination Register Update

    A vertex program instruction can optionally clamp the results of
    a floating-point result vector to the range [0,1].  The components
    of the result vector are clamped to [0,1] if the saturation suffix
    "_SAT" is present in the instruction.

    Most vertex program instructions write a 4-component result vector to
    a single temporary or vertex result register.  Writes to individual
    components of the destination register are controlled by individual
    component write masks specified as part of the instruction.

    The component write mask is specified by the <optionalMask> rule
    found in the <maskedDstReg> rule.  If the optional mask is "",
    all components are enabled.  Otherwise, the optional mask names
    the individual components to enable.  The characters "x", "y",
    "z", and "w" match the x, y, z, and w components respectively.
    For example, an optional mask of ".xzw" indicates that the x, z,
    and w components should be enabled for writing but the y component
    should not.  The grammar requires that the destination register mask
    components must be listed in "xyzw" order.  The condition code write
    mask is specified by the <ccMask> rule found in the <instResultCC>
    and <instResultAddrCC> rules.  Otherwise, the selected condition
    code register is loaded and swizzled according to the swizzle
    codes specified by <swizzleSuffix>.  Each component of the swizzled
    condition code is tested according to the rule given by <ccMaskRule>.
    <ccMaskRule> may have the values "EQ", "NE", "LT", "GE", LE", or "GT",
    which mean to enable writes if the corresponding condition code field
    evaluates to equal, not equal, less than, greater than or equal, less
    than or equal, or greater than, respectively.  Comparisons involving
    condition codes of "UN" (unordered) evaluate to true for "NE" and
    false otherwise.  For example, if the condition code is (GT,LT,EQ,GT)
    and the condition code mask is "(NE.zyxw)", the swizzle operation
    will load (EQ,LT,GT,GT) and the mask will thus will enable writes on
    the y, z, and w components.  In addition, "TR" always enables writes
    and "FL" always disables writes, regardless of the condition code.
    If the condition code mask is empty, it is treated as "(TR)".

    Each component of the destination register is updated with the result
    of the vertex program instruction if and only if the component is
    enabled for writes by both the component write mask and the condition
    code write mask.  Otherwise, the component of the destination register
    remains unchanged.

    A vertex program instruction can also optionally update the condition
    code register.  The condition code is updated if the condition
    code register update suffix "C" is present in the instruction.
    The instruction "ADDC" will update the condition code; the otherwise
    equivalent instruction "ADD" will not.  If condition code updates
    are enabled, each component of the destination register enabled
    for writes is compared to zero.  The corresponding component of
    the condition code is set to "LT", "EQ", or "GT", if the written
    component is less than, equal to, or greater than zero, respectively.
    Condition code components are set to "UN" if the written component is
    NaN (not a number).  Values of -0.0 and +0.0 both evaluate to "EQ".
    If a component of the destination register is not enabled for writes,
    the corresponding condition code component is also unchanged.

    In the following example code,

        # R1=(-2, 0, 2, NaN)              R0                  CC
        MOVC R0, R1;               # ( -2,  0,   2, NaN) (LT,EQ,GT,UN)
        MOVC R0.xyz, R1.yzwx;      # (  0,  2, NaN, NaN) (EQ,GT,UN,UN)
        MOVC R0 (NE), R1.zywx;     # (  0,  0, NaN,  -2) (EQ,EQ,UN,LT)

    the first instruction writes (-2,0,2,NaN) to R0 and updates the
    condition code to (LT,EQ,GT,UN).  The second instruction, only the
    "x", "y", and "z" components of R0 and the condition code are updated,
    so R0 ends up with (0,2,NaN,NaN) and the condition code ends up with
    (EQ,GT,UN,UN).  In the third instruction, the condition code mask
    disables writes to the x component (its condition code field is "EQ"),
    so R0 ends up with (0,0,NaN,-2) and the condition code ends up with
    (EQ,EQ,UN,LT).

    The following pseudocode illustrates the process of writing a
    result vector to the destination register.  In the pseudocode,
    "instrSaturate" is TRUE if and only if result saturation is
    enabled, "instrMask" refers to the component write mask given by
    the <optWriteMask> rule.  "ccMaskRule" refers to the condition code
    mask rule given by <ccMask> and "updatecc" is TRUE if and only if
    condition code updates are enabled.  "result", "destination", and "cc"
    refer to the result vector, the register selected by <dstRegister>
    and the condition code, respectively.  Condition codes do not exist
    in the VP1 execution environment.

      boolean TestCC(CondCode field) {
          switch (ccMaskRule) {
          case "EQ":  return (field == "EQ");
          case "NE":  return (field != "EQ");
          case "LT":  return (field == "LT");
          case "GE":  return (field == "GT" || field == "EQ");
          case "LE":  return (field == "LT" || field == "EQ");
          case "GT":  return (field == "GT");
          case "TR":  return TRUE;
          case "FL":  return FALSE;
          case "":    return TRUE;
          }
      }

      enum GenerateCC(float value) {
        if (value == NaN) {
          return UN;
        } else if (value < 0) {
          return LT;
        } else if (value == 0) {
          return EQ;
        } else {
          return GT;
        }
      }

      void UpdateDestination(floatVec destination, floatVec result)
      {
          floatVec merged;
          ccVec    mergedCC;

          // Clamp result components to [0,1] if requested in the instruction.
          if (instrSaturate) {
              if (result.x < 0)      result.x = 0;
              else if (result.x > 1) result.x = 1;
              if (result.y < 0)      result.y = 0;
              else if (result.y > 1) result.y = 1;
              if (result.z < 0)      result.z = 0;
              else if (result.z > 1) result.z = 1;
              if (result.w < 0)      result.w = 0;
              else if (result.w > 1) result.w = 1;
          }

          // Merge the converted result into the destination register, under
          // control of the compile- and run-time write masks.
          merged = destination;
          mergedCC = cc;
          if (instrMask.x && TestCC(cc.c***)) {
              merged.x = result.x;
              if (updatecc) mergedCC.x = GenerateCC(result.x);
          }
          if (instrMask.y && TestCC(cc.*c**)) {
              merged.y = result.y;
              if (updatecc) mergedCC.y = GenerateCC(result.y);
          }
          if (instrMask.z && TestCC(cc.**c*)) {
              merged.z = result.z;
              if (updatecc) mergedCC.z = GenerateCC(result.z);
          }
          if (instrMask.w && TestCC(cc.***c)) {
              merged.w = result.w;
              if (updatecc) mergedCC.w = GenerateCC(result.w);
          }

          // Write out the new destination register and condition code.
          destination = merged;
          cc = mergedCC;
      }

    While this rule describes floating-point results, the same logic
    applies to the integer results generated by the ARA, ARL, and ARR
    instructions.

    Add to Section 2.14.4.5, Vertex Program Options

    Section 2.14.4.5.3, NV_vertex_program3 Program Option

    If a vertex program specifies the "NV_vertex_program3" option, the
    ARB_vertex_program grammar and execution environment are extended
    to take advantage of all the features of the "NV_vertex_program2"
    option, plus the following features:

        * several new instructions:

          * POPA -- pop address register off stack        
          * PUSHA -- push address register onto stack
          * TEX -- texture lookup
          * TXB -- texture lookup w/LOD bias
          * TXL -- texture lookup w/explicit LOD
          * TXP -- projective texture lookup

        * address register-relative addressing for vertex texture
          coordinate and generic attribute arrays,

        * address register-relative addressing for vertex texture
          coordinate result array, and

        * a second four-component condition code.


    Modify Section 2.14.5.34,  RET:  Subroutine Call Return

    The RET instruction conditionally returns from a subroutine initiated
    by a CAL instruction by popping an instruction reference off the
    top of the call stack and transferring control to the referenced
    instruction.  The following pseudocode describes the operation of
    the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) || 
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        if (callStackDepth <= 0) {
          // terminate vertex program normally
        } else {
          callStackDepth--;
          if (callStack[callStackDepth] is a instruction reference) {
            instruction = callStack[callStackDepth];
          } else {
            // terminate vertex program abnormally
          }
        }

        // continue execution at <instruction>
      } else {
        // do nothing
      }

    In the pseudocode, <callStackDepth> is the depth of the call stack,
    <callStack> is an array holding the call stack, and <instruction> is
    a reference to an instruction previously pushed onto the call stack.

    If the call stack is empty when RET executes, the vertex program
    terminates normally.

    The vertex program terminates abnormally if the entry at the top of the
    call stack is not an instruction reference pushed by CAL.  When a vertex
    program terminates abnormally, all of the vertex program results are
    undefined.

    Add to Section 2.14.5,  Vertex Program Instruction Set

    Section 2.14.5.43, POPA:  Pop Address Register Stack    

    The POPA instruction generates a integer result vector by popping
    an entry off of the call stack.

      if (callStackDepth <= 0) {
        terminate vertex program;
      } else {
        callStackDepth--;
        if (callStack[callStackDepth] is an address register) {
          iresult = callStack[callStackDepth];
        } else {
          terminate vertex program;
        }
      }

    POPA does not support non-default write masks; a program will fail to load
    if it includes a component write mask other than ".xyzw" or a condition
    code write mask test other than "TR".

    In the pseudocode, <callStackDepth> is the current depth of the call
    stack and <callStack> is an array holding the call stack.  

    The vertex program terminates abnormally if it executes a POPA instruction
    when the call stack is empty, or when the entry at the top of the call
    stack is not an address register pushed by PUSHA.  When a vertex program
    terminates abnormally, all of the vertex program results are undefined.

    Section 2.14.5.44, PUSHA:  Push Address Register Stack    

    The PUSHA instruction pushes the address register operand onto the
    call stack, which is also used for subroutine calls.  The PUSHA
    instruction does not generate a result vector.

      tmp = AddrVectorLoad(op0);
      if (callStackDepth >= MAX_PROGRAM_CALL_DEPTH_NV) {
        terminate vertex program;
      } else {
        callStack[callStackDepth] = tmp;
        callStackDepth++;
      }

    In the pseudocode, <callStackDepth> is the current depth of the call
    stack and <callStack> is an array holding the call stack.

    The vertex program terminates abnormally if it executes a PUSHA
    instruction when the call stack is full.  When a vertex program terminates
    abnormally, all of the vertex program results are undefined.

    Component swizzling is not supported when the operand is loaded.

    Section 2.14.5.45, TEX:  Texture Lookup

    The TEX instruction uses the single vector operand to perform a
    lookup in the specified texture map, yielding a 4-component result
    vector containing filtered texel values.  The (s,t,r,q) coordinates
    used for the texture lookup are (x,y,z,1), where x, y, and z are
    components of the vector operand.

      tmp = VectorLoad(op0);
      result = TextureSample(tmp.x, tmp.y, tmp.z, 1.0, 0.0, unit, target);

    where <unit> and <target> are the texture image unit number and
    target type, matching the <texImageUnitNum> and <texTargetType>
    grammar rules.

    The resulting sample is mapped to RGBA as described in Table 3.21,
    and the R, G, B, and A values are written to the x, y, z, and w
    components, respectively, of the result vector.

    Since partial derivatives of the texture coordinates are not defined,
    the base LOD value for vertex texture lookups is defined to be
    zero.  The value of lambda' used in equation 3.16 will be simply
    clamp(texobj_bias + texunit_bias).

    Section 2.14.5.46, TXB:  Texture Lookup (With LOD Bias)

    The TXB instruction uses the single vector operand to perform a
    lookup in the specified texture map, yielding a 4-component result
    vector containing filtered texel values.  The (s,t,r,q) coordinates
    used for the texture lookup are (x,y,z,1), where x, y, and z are
    components of the vector operand.  The w component of the operand
    is used as an additional LOD bias.

      tmp = VectorLoad(op0);
      result = TextureSample(tmp.x, tmp.y, tmp.z, 1.0, tmp.w, unit, target);

    where <unit> and <target> are the texture image unit number and
    target type, matching the <texImageUnitNum> and <texTargetType>
    grammar rules.

    The resulting sample is mapped to RGBA as described in Table 3.21,
    and the R, G, B, and A values are written to the x, y, z, and w
    components, respectively, of the result vector.

    Since partial derivatives of the texture coordinates are not defined,
    the base LOD value for vertex texture lookups is defined to be
    zero.  The value of lambda' used in equation 3.16 will be simply
    clamp(texobj_bias + texunit_bias + tmp.w).  

    Since the base LOD value is zero, the TXB instruction is completely
    equivalent to the TXL instruction, where the w component contains
    an explicit base LOD value.

    Section 2.14.5.47, TXL:  Texture Lookup (With Explicit LOD)

    The TXL instruction uses the single vector operand to perform a
    lookup in the specified texture map, yielding a 4-component result
    vector containing filtered texel values.  The (s,t,r,q) coordinates
    used for the texture lookup are (x,y,z,1), where x, y, and z are
    components of the vector operand.  The w component of the operand
    is used as the base LOD for the texture lookup.

      tmp = VectorLoad(op0); 
      result = TextureSampleLOD(tmp.x, tmp.y, tmp.z, 1.0, tmp.w, unit, target);

    where <unit> and <target> are the texture image unit number and
    target type, matching the <texImageUnitNum> and <texTargetType>
    grammar rules.

    The resulting sample is mapped to RGBA as described in Table 3.21,
    and the R, G, B, and A values are written to the x, y, z, and w
    components, respectively, of the result vector.

    The value of lambda' used in equation 3.16 will be simply tmp.w +
    clamp(texobj_bias + texunit_bias), where tmp.w is the base LOD.

    Section 2.14.5.48, TXP:  Texture Lookup (Projective)

    The TXP instruction uses the single vector operand to perform a
    lookup in the specified texture map, yielding a 4-component result
    vector containing filtered texel values.  The (s,t,r,q) coordinates
    used for the texture lookup are (x,y,z,w), where x, y, z, and w are
    the four components of the vector operand.

      tmp = VectorLoad(op0);
      result = TextureSample(tmp.x, tmp.y, tmp.z, tmp.w, 0.0, unit, target);

    where <unit> and <target> are the texture image unit number and
    target type, matching the <texImageUnitNum> and <texTargetType>
    grammar rules.

    The resulting sample is mapped to RGBA as described in Table 3.21,
    and the R, G, B, and A values are written to the x, y, z, and w
    components, respectively, of the result vector.

    Since partial derivatives of the texture coordinates are not defined,
    the base LOD value for vertex texture lookups is defined to be
    zero.  The value of lambda' used in equation 3.16 will be simply
    clamp(texobj_bias + texunit_bias).

Additions to Chapter 3 of the OpenGL 1.4 Specification (Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 1.4 Specification (Per-Fragment
Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 1.4 Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 1.4 Specification (State and
State Requests)

    None.

Additions to Appendix A of the OpenGL 1.4 Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Dependencies on ARB_vertex_program

    ARB_vertex_program is required.

    This specification and NV_vertex_program2_option are based on a
    modified version of the grammar published in the ARB_vertex_program
    specification.  This modified grammar includes a few structural
    changes to better accommodate new functionality from this and
    other extensions, but should be functionally equivalent to the
    ARB_vertex_program grammar.  See NV_vertex_program2_option for
    details on the base grammar.

Dependencies on NV_vertex_program2_option

    NV_vertex_program2_option is required.  

    If the NV_vertex_program3 program option is specified, all
    the functionality described in both this extension and the
    NV_vertex_program2_option specification is available.

Dependencies on ARB_fragment_program_shadow

    If this extension and ARB_fragment_program shadow are both supported,
    vertex programs may include the option statement:

      OPTION ARB_fragment_program_shadow;

    which enables the use of SHADOW1D, SHADOW2D, and SHADOWRECT texture
    targets in texture lookup instructions, as described in the
    ARB_fragment_program_shadow specification.

    NVIDIA NOTE:  Drivers prior to September 2006 do not support the use of
    this option, and will not accept texture lookups with SHADOW1D, SHADOW2D,
    and SHADOWRECT targets.  Shadow mapping in vertex programs will result in
    software fallbacks on GeForce 6 and GeForce 7 series GPUs, but may be done
    in hardware on future GPUs.

Errors

    None.

New State

    None.

New Implementation Dependent State:

                                             Minimum
    Get Value             Type  Get Command   Value   Description                 Section   Attr.
    ---------             ----  -----------  -------  --------------------------  --------  -----
    MAX_VERTEX_TEXTURE_    Z+   GetIntegerv     1     Number of separate texture  2.14.3.7  -
      IMAGE_UNITS_ARB                                 image units that can be
                                                      accessed by a vertex program

Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  --------------------------------------------
    7     10/12/09  pbrown    Update grammar/documentation of PUSHA/POPA to
                              reflect the implementation.  <instResultAddr> is
                              used for POPA with some semantic checks.  Note
                              that some driver versions erroneously allowed
                              conditional write masks on POPA.  Also clarify
                              that ARB_fragment_program_shadow includes
                              support for "SHADOWRECT".

    6     09/27/06  pbrown    Document that ARB_fragment_program_shadow is
                              allowed, to enable the use of "SHADOW1D" and
                              "SHADOW2D" targets for texture lookups.

    5     11/07/05  pbrown    Fix PUSHA documentation to specify the right
                              constant name used for overflow testing.

    4     09/01/05  pbrown    Fix spec language to document that a vertex
                              program will fail to compile if it uses "too
                              many" textures -- previously only documented
                              in the issues section.

    3     08/25/05  pbrown    Document that using a different texture target
                              than fragment processing on the same texture
                              unit results in an INVALID_OPERATION error at
                              Begin time.  This is consistent with GLSL
                              language in the ARB_shader_objects and OpenGL
                              2.0 specifications.  The implementation has
                              always done this, but it was overlooked in
                              the spec language.

    2     06/23/04  pbrown    Documented that vertex results are undefined
                              when a vertex program terminates abnormally
                              (e.g., PUSHA/POPA stack overflow/underflow).
                              Documented error in RET if the top of the call
                              stack contains a value written by PUSHA.

    1     --------  pbrown    Initial pre-release revisions.

