HLSL Intrinsic Functions
--------------------------------------------------------------------------------
HLSL intrinsic functions are made up of:
A return type.
A function name.
One or more arguments.
For example, the abs intrinsic function could be called like this:
float3 retValue;
retValue = abs(float3(0,0,1));
Where:
retValue is a float3 return type.
abs is the function name.
The input argument is a float3(0,0,1).
The input argument is made up of one or more template types, where each template type is made up of one or more data types. The return value is also made up of a template type and a data type, and the return value is dependent on the choices made for the input argument and the operation of the function.
Template Type
These are the template types for input arguments and return values:
scalar (1 component)
vector (up to 4 components)
matrix (up to 16 components)
object (sampler object)
Component Type
The component data types for input arguments and return values are the high-level shader language (HLSL) types:
Boolean: bool
Integer: int
Float types: half, float, or double
Numeric: int, half, float, or double
Sampler: sampler, sampler1D, sampler2D, sampler3D, samplerCUBE
The template type must be compatible with the component type:
The scalar, vector, and matrix templates can use any component type except the Sampler type.
The object template can only use the Sampler types.
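The pairing of template types and component types can be sketched with a few declarations. This is only an illustration: the variable names and the helper function SampleDiffuse are hypothetical and do not come from the reference above.

```hlsl
// Hypothetical declarations, one per template type.
float     brightness;   // scalar template, float component
int3      cellIndex;    // vector template, int component
half2x2   rotation;     // matrix template, half component
sampler2D diffuseMap;   // object template, Sampler component

// A sampler is consumed through a texture intrinsic such as tex2D;
// uv is a 2-D texture coordinate.
float4 SampleDiffuse(float2 uv)
{
    return tex2D(diffuseMap, uv) * brightness;
}
```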
Intrinsic Function Listing
The following table lists the intrinsic functions available in HLSL. Each function has a brief description, and a link to a reference page that has more detail about the input argument and return type.
Name Syntax Description
abs abs(x) Returns the absolute value of x (per component).
acos acos(x) Returns the arccosine of each component of x. Each component should be in the range [-1, 1].
all all(x) Test if all components of x are nonzero.
any any(x) Test if any component of x is nonzero.
asin asin(x) Returns the arcsine of each component of x. Each component should be in the range [-1, 1]. The return values are in the range [-pi/2, pi/2].
atan atan(x) Returns the arctangent of x. The return values are in the range [-pi/2, pi/2].
atan2 atan2(y, x) Returns the arctangent of y/x. The signs of y and x are used to determine the quadrant of the return values in the range [-pi, pi]. atan2 is well-defined for every point other than the origin, even if x equals 0 and y does not equal 0.
ceil ceil(x) Returns the smallest integer which is greater than or equal to x.
clamp clamp(x, min, max) Clamps x to the range [min, max].
clip clip(x) Discards the current pixel, if any component of x is less than zero. This can be used to simulate clip planes, if each component of x represents the distance from a plane.
cos cos(x) Returns the cosine of x.
cosh cosh(x) Returns the hyperbolic cosine of x.
cross cross(a, b) Returns the cross product of two 3-D vectors a and b.
D3DCOLORtoUBYTE4 D3DCOLORtoUBYTE4(x) Swizzles and scales components of the 4-D vector x to compensate for the lack of UBYTE4 support in some hardware.
ddx ddx(x) Returns the partial derivative of x with respect to the screen-space x-coordinate.
ddy ddy(x) Returns the partial derivative of x with respect to the screen-space y-coordinate.
degrees degrees(x) Converts x from radians to degrees.
determinant determinant(m) Returns the determinant of the square matrix m.
distance distance(a, b) Returns the distance between two points, a and b.
dot dot(a, b) Returns the dot product of two vectors, a and b.
exp exp(x) Returns the base-e exponent.
exp2 exp2(x) Returns the base-2 exponent (per component).
faceforward faceforward(n, i, ng) Returns -n * sign(dot(i, ng)).
floor floor(x) Returns the greatest integer which is less than or equal to x.
fmod fmod(a, b) Returns the floating point remainder f of a / b such that a = i * b + f, where i is an integer, f has the same sign as a, and the absolute value of f is less than the absolute value of b.
frac frac(x) Returns the fractional part f of x, such that f is a value greater than or equal to 0, and less than 1.
frexp frexp(x, out exp) Returns the mantissa and exponent of x. frexp returns the mantissa, and the exponent is stored in the output parameter exp. If x is 0, the function returns 0 for both the mantissa and the exponent.
fwidth fwidth(x) Returns abs(ddx(x)) + abs(ddy(x)).
isfinite isfinite(x) Returns true if x is finite, false otherwise.
isinf isinf(x) Returns true if x is +INF or -INF, false otherwise.
isnan isnan(x) Returns true if x is NAN or QNAN, false otherwise.
ldexp ldexp(x, exp) Returns x * 2^exp.
length length(v) Returns the length of the vector v.
lerp lerp(a, b, s) Returns a + s * (b - a). This linearly interpolates between a and b, such that the return value is a when s is 0, and b when s is 1.
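lit lit(n_dot_l, n_dot_h, m) Returns a lighting vector (ambient, diffuse, specular, 1), where n_dot_l is the dot product of the normal and the light direction, n_dot_h is the dot product of the normal and the half vector, and m is the specular exponent: ambient = 1; diffuse = (n_dot_l < 0) ? 0 : n_dot_l; specular = (n_dot_l < 0) || (n_dot_h < 0) ? 0 : pow(n_dot_h, m);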
log log(x) Returns the base-e logarithm of x. If x is negative, the function returns indefinite. If x is 0, the function returns -INF.
log10 log10(x) Returns the base-10 logarithm of x. If x is negative, the function returns indefinite. If x is 0, the function returns -INF.
log2 log2(x) Returns the base-2 logarithm of x. If x is negative, the function returns indefinite. If x is 0, the function returns -INF.
max max(a, b) Selects the greater of a and b.
min min(a, b) Selects the lesser of a and b.
modf modf(x, out ip) Splits the value x into fractional and integer parts, each of which has the same sign as x. The signed fractional portion of x is returned. The integer portion is stored in the output parameter ip.
mul mul(a, b) Performs matrix multiplication between a and b. If a is a vector, it is treated as a row vector. If b is a vector, it is treated as a column vector. The inner dimensions a_columns and b_rows must be equal. The result has the dimension a_rows x b_columns.
noise noise(x) Not yet implemented.
normalize normalize(v) Returns the normalized vector v / length(v). If the length of v is 0, the result is indefinite.
pow pow(x, y) Returns x^y.
radians radians(x) Converts x from degrees to radians.
reflect reflect(i, n) Returns the reflection vector v, given the entering ray direction i, and the surface normal n, such that v = i - 2 * dot(i, n) * n.
refract refract(i, n, eta) Returns the refraction vector v, given the entering ray direction i, the surface normal n, and the relative index of refraction eta. If the angle between i and n is too great for a given eta, refract returns (0,0,0).
round round(x) Rounds x to the nearest integer.
rsqrt rsqrt(x) Returns 1 / sqrt(x).
saturate saturate(x) Clamps x to the range [0, 1].
sign sign(x) Computes the sign of x. Returns -1 if x is less than 0, 0 if x equals 0, and 1 if x is greater than 0.
sin sin(x) Returns the sine of x.
sincos sincos(x, out s, out c) Returns the sine and cosine of x. sin(x) is stored in the output parameter s. cos(x) is stored in the output parameter c.
sinh sinh(x) Returns the hyperbolic sine of x.
smoothstep smoothstep(min, max, x) Returns 0 if x < min. Returns 1 if x > max. Returns a smooth Hermite interpolation between 0 and 1, if x is in the range [min, max].
sqrt sqrt(x) Returns the square root of x (per component).
step step(a, x) Returns (x >= a) ? 1 : 0.
tan tan(x) Returns the tangent of x.
tanh tanh(x) Returns the hyperbolic tangent of x.
tex1D tex1D(s, t) 1-D texture lookup. s is a sampler or a sampler1D object. t is a scalar.
tex1D tex1D(s, t, ddx, ddy) 1-D texture lookup, with derivatives. s is a sampler or sampler1D object. t, ddx, and ddy are scalars.
tex1Dproj tex1Dproj(s, t) 1-D projective texture lookup. s is a sampler or sampler1D object. t is a 4-D vector. t is divided by its last component before the lookup takes place.
tex1Dbias tex1Dbias(s, t) 1-D biased texture lookup. s is a sampler or sampler1D object. t is a 4-D vector. The mip level is biased by t.w before the lookup takes place.
tex2D tex2D(s, t) 2-D texture lookup. s is a sampler or a sampler2D object. t is a 2-D texture coordinate.
tex2D tex2D(s, t, ddx, ddy) 2-D texture lookup, with derivatives. s is a sampler or sampler2D object. t, ddx, and ddy are 2-D vectors.
tex2Dproj tex2Dproj(s, t) 2-D projective texture lookup. s is a sampler or sampler2D object. t is a 4-D vector. t is divided by its last component before the lookup takes place.
tex2Dbias tex2Dbias(s, t) 2-D biased texture lookup. s is a sampler or sampler2D object. t is a 4-D vector. The mip level is biased by t.w before the lookup takes place.
tex3D tex3D(s, t) 3-D volume texture lookup. s is a sampler or a sampler3D object. t is a 3-D texture coordinate.
tex3D tex3D(s, t, ddx, ddy) 3-D volume texture lookup, with derivatives. s is a sampler or sampler3D object. t, ddx, and ddy are 3-D vectors.
tex3Dproj tex3Dproj(s, t) 3-D projective volume texture lookup. s is a sampler or sampler3D object. t is a 4-D vector. t is divided by its last component before the lookup takes place.
tex3Dbias tex3Dbias(s, t) 3-D biased texture lookup. s is a sampler or sampler3D object. t is a 4-D vector. The mip level is biased by t.w before the lookup takes place.
texCUBE texCUBE(s, t) 3-D cube texture lookup. s is a sampler or a samplerCUBE object. t is a 3-D texture coordinate.
texCUBE texCUBE(s, t, ddx, ddy) 3-D cube texture lookup, with derivatives. s is a sampler or samplerCUBE object. t, ddx, and ddy are 3-D vectors.
texCUBEproj texCUBEproj(s, t) 3-D projective cube texture lookup. s is a sampler or samplerCUBE object. t is a 4-D vector. t is divided by its last component before the lookup takes place.
texCUBEbias texCUBEbias(s, t) 3-D biased cube texture lookup. s is a sampler or samplerCUBE object. t is a 4-D vector. The mip level is biased by t.w before the lookup takes place.
transpose transpose(m) Returns the transpose of the matrix m. If the source is dimension mrows x mcolumns, the result is dimension mcolumns x mrows.
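Several of the intrinsics listed above are commonly combined in a single shader. The following pixel shader is a minimal sketch of that, assuming a simple diffuse lighting setup; the uniform names (baseMap, lightDir, shadowColor) and the input semantics are hypothetical choices made for illustration.

```hlsl
sampler2D baseMap;      // hypothetical base texture
float3    lightDir;     // hypothetical direction toward the light
float4    shadowColor;  // hypothetical color used where no light arrives

float4 main(float2 uv : TEXCOORD0, float3 normal : TEXCOORD1) : COLOR
{
    // normalize: rescale both vectors to unit length
    float3 n = normalize(normal);
    float3 l = normalize(lightDir);

    // dot gives the cosine of the angle; saturate clamps it to [0, 1]
    float ndotl = saturate(dot(n, l));

    // tex2D: 2-D texture lookup
    float4 base = tex2D(baseMap, uv);

    // lerp: shadowColor when ndotl is 0, base when ndotl is 1
    return lerp(shadowColor, base, ndotl);
}
```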
--------------------------------------------------------------------------------
HLSL Language Basics
--------------------------------------------------------------------------------
HLSL Implements Per-Component Math Operations
The Vector Type
The Matrix Type
With high-level shader language (HLSL), you can program shaders at an algorithm level. The time spent considering hardware details such as register allocation, co-issuing instructions, and register read-port limits is greatly reduced. HLSL also has the advantages of other high level languages such as code re-use, improved readability, and a compiler that will optimize the code.
To write HLSL shaders, you must first learn the language. Once you understand it, you will need to know how to declare variables and functions, use intrinsic functions, define custom data types, and use semantics to connect shader arguments to other shaders and to the pipeline.
Once you learn how to author shaders in HLSL, you will need to learn about API calls so that you can: compile a shader for particular hardware, initialize shader constants, and initialize other pipeline state if necessary. These topics are covered in Writing HLSL Shaders. Beyond this, you can reuse your shader code by learning how to:
Write shader fragments. These independent shader functions can be compiled at design time and linked at runtime.
Use effects to manage state. Effects encapsulate pipeline state (including shader state) which makes managing pipeline state easier and more efficient.
HLSL Implements Per-Component Math Operations
HLSL uses two special types, a vector type and a matrix type, to make programming 2-D and 3-D graphics easier. Each of these types contains more than one component; a vector contains up to four components, and a matrix contains up to 16 components. When vectors and matrices are used in standard HLSL equations, the math performed is designed to work per-component. For instance, HLSL implements this multiply:
float4 v = a*b;
as a four-component multiply. The result is four scalars:
float4 v = a*b;
v.x = a.x*b.x;
v.y = a.y*b.y;
v.z = a.z*b.z;
v.w = a.w*b.w;
This is four multiplications where each result is stored in a separate component of v. This is called a four-component multiply. HLSL uses component math which makes writing shaders in HLSL very efficient.
This is very different from a dot product, which sums the per-component products to generate a single scalar:
v = a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;
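The difference between the two operations can be sketched as a pair of small functions. The function names are hypothetical; the * operator and the dot intrinsic are the ones described above.

```hlsl
// Per-component multiply: the result is a vector of four products.
float4 ComponentProduct(float4 a, float4 b)
{
    return a * b;      // (a.x*b.x, a.y*b.y, a.z*b.z, a.w*b.w)
}

// Dot product: the four products are summed into one scalar.
float DotProduct(float4 a, float4 b)
{
    return dot(a, b);  // a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w
}
```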
Matrices also use per-component operations in HLSL:
float3x3 mat1,mat2;
...
float3x3 mat3 = mat1*mat2;
The result is a per-component multiply of the two matrices (as opposed to a standard 3x3 matrix multiply). A per component matrix multiply yields this first term:
mat3._m00 = mat1._m00 * mat2._m00;
This is different from a 3x3 matrix multiply which would yield this first term:
// First component of a 3x3 matrix multiply
mat3._m00 = mat1._m00 * mat2._m00 +
            mat1._m01 * mat2._m10 +
            mat1._m02 * mat2._m20;
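As a rough sketch of the contrast, the two functions below take the same operands but produce different results: the * operator multiplies matching components, while the mul intrinsic from the function listing performs a true matrix multiply. The function names are hypothetical.

```hlsl
// Per-component product: result._m00 = m1._m00 * m2._m00, and so on.
float3x3 PerComponentProduct(float3x3 m1, float3x3 m2)
{
    return m1 * m2;
}

// Matrix product: each result component is a row of m1
// multiplied by a column of m2 and summed.
float3x3 MatrixProduct(float3x3 m1, float3x3 m2)
{
    return mul(m1, m2);
}
```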
Overloaded versions of the mul intrinsic function handle cases where the operands are vectors, matrices, or one of each, such as vector * vector, vector * matrix, matrix * vector, and matrix * matrix. For instance:
float4x3 World;
float4 main(float4 pos : POSITION) : POSITION
{
float4 val;
val.xyz = mul(pos,World);
val.w = 0;
return val;
}
produces the same result as:
float4x3 World;
float4 main(float4 pos : POSITION) : POSITION
{
float4 val;
val.xyz = (float3) mul((float1x4)pos,World);
val.w = 0;
return val;
}
This example casts the "pos" vector to a row vector (a matrix with one row and four columns) using the "(float1x4)" cast. Changing a vector by casting, or swapping the order of the arguments supplied to mul, is equivalent to transposing the matrix.
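The relationship between argument order and transposition can be sketched as follows. The function names are hypothetical, and the sketch assumes the same float4x3 World matrix used in the examples above; both functions should compute the same three values.

```hlsl
float4x3 World;

// pos is treated as a row vector: (1x4) * (4x3) = (1x3)
float3 TransformRowVector(float4 pos)
{
    return mul(pos, World);
}

// pos is treated as a column vector, so the matrix is transposed
// to keep the result the same: (3x4) * (4x1) = (3x1)
float3 TransformColumnVector(float4 pos)
{
    return mul(transpose(World), pos);
}
```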
Automatic cast conversion causes the mul and dot intrinsic functions to return the same results as used here:
{
float4 val;
return mul(val,val);
}
The result of the mul is a 1x4 * 4x1 = 1x1 vector. This is equivalent to a dot product:
{
float4 val;
return dot(val,val);
}
which returns a single scalar value.
The Vector Type
A vector is a data structure that contains between one and four components.
bool bVector; // scalar containing 1 Boolean
bool1 bVector; // vector containing 1 Boolean
int1 iVector; // vector containing 1 int
half2 hVector; // vector containing 2 halfs
float3 fVector; // vector containing 3 floats
double4 dVector; // vector containing 4 doubles
The integer immediately following the data type is the number of components in the vector.
Initializers can also be included in the declarations.
bool bVector = false;
int1 iVector = 1;
half2 hVector = { 0.2, 0.3 };
float3 fVector = { 0.2f, 0.3f, 0.4f };
double4 dVector = { 0.2, 0.3, 0.4, 0.5 };
Alternatively, the vector type can be used to make the same declarations:
vector <bool, 1> bVector = false;
vector <int, 1> iVector = 1;
vector <half, 2> hVector = { 0.2, 0.3 };
vector <float, 3> fVector = { 0.2f, 0.3f, 0.4f };
vector <double, 4> dVector = { 0.2, 0.3, 0.4, 0.5 };
The vector type uses angle brackets to specify the type and number of components.
Vectors contain up to four components, each of which can be accessed using one of two naming sets:
The position set: x,y,z,w
The color set: r,g,b,a
These statements both return the value in the third component.
// Given
float4 pos = float4(0,0,2,1);
pos.z // value is 2
pos.b // value is 2
Naming sets can use one or more components, but they cannot be mixed.
// Given
float4 pos = float4(0,0,2,1);
float2 temp;
temp = pos.xy; // valid
temp = pos.rg; // valid
temp = pos.xg; // NOT VALID because the position and color sets were mixed.
Specifying one or more vector components when reading components is called swizzling. For example:
float4 pos = float4(0,0,2,1);
float2 f_2D;
f_2D = pos.xy; // read two components
f_2D = pos.xz; // read components in any order
f_2D = pos.zx;
f_2D = pos.xx; // components can be read more than once
f_2D = pos.yy;
Masking controls how many components are written.
float4 pos = float4(0,0,2,1);
float4 f_4D;
f_4D = pos; // write four components
f_4D.xz = pos.xz; // write two components
f_4D.zx = pos.xz; // change the write order
f_4D.xzyw = pos.w; // write one component to more than one component
f_4D.wzyx = pos;
Assignments cannot be written to the same component more than once. So the left side of this statement is invalid:
f_4D.xx = pos.xy; // cannot write to the same destination components
Also, the component name spaces cannot be mixed. This is an invalid component write:
f_4D.xg = pos.rgrg; // invalid write: cannot mix component name spaces
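Swizzling on the read side and masking on the write side are often used together. The function below is a minimal sketch with a hypothetical name; it swaps the red and blue channels of a color.

```hlsl
float4 SwapRedBlue(float4 color)
{
    float4 result;
    result.rgb = color.bgr;  // mask writes r,g,b; swizzle reads b,g,r
    result.a   = color.a;    // single-component write and read
    return result;
}
```

Only the color set is used on both sides of each assignment, and no destination component is written twice, so both rules described above are respected.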
The Matrix Type
A matrix is a data structure that contains rows and columns of data. The data can be any of the scalar data types; however, every element of a matrix is the same data type. The number of rows and columns is specified with the "row by column" string that is appended to the data type.
int1x1 iMatrix; // integer matrix with 1 row, 1 column
int2x1 iMatrix; // integer matrix with 2 rows, 1 column
...
int4x1 iMatrix; // integer matrix with 4 rows, 1 column
...
int1x4 iMatrix; // integer matrix with 1 row, 4 columns
double1x1 dMatrix; // double matrix with 1 row, 1 column
double2x2 dMatrix; // double matrix with 2 rows, 2 columns
double3x3 dMatrix; // double matrix with 3 rows, 3 columns
double4x4 dMatrix; // double matrix with 4 rows, 4 columns
The maximum number of rows or columns is 4; the minimum number is 1.
A matrix can be initialized when it is declared:
float2x2 fMatrix = { 0.0f, 0.1, // row 1
2.1f, 2.2f // row 2
};
Or, the matrix type can be used to make the same declarations:
matrix <float, 2, 2> fMatrix = { 0.0f, 0.1, // row 1
2.1f, 2.2f // row 2
};
The matrix type uses the angle brackets to specify the type, the number of rows, and the number of columns. This example creates a floating-point matrix, with two rows and two columns. Any of the scalar data types can be used.
This declaration defines a matrix of half values (16-bit floating-point numbers) with two rows and three columns:
matrix <half, 2, 3> fHalfMatrix;
A matrix contains values organized in rows and columns, which can be accessed using the structure operator "." followed by one of two naming sets:
The zero-based row-column position:
_m00, _m01, _m02, _m03
_m10, _m11, _m12, _m13
_m20, _m21, _m22, _m23
_m30, _m31, _m32, _m33
The one-based row-column position:
_11, _12, _13, _14
_21, _22, _23, _24
_31, _32, _33, _34
_41, _42, _43, _44
Each naming set starts with an underscore followed by the row number and the column number. The zero-based convention also includes the letter "m" before the row and column number. Here's an example that uses the two naming sets to access a matrix:
// given
float2x2 fMatrix = { 1.0f, 1.1f, // row 1
2.0f, 2.1f // row 2
};
float f_1D;
f_1D = fMatrix._m00; // read the value in row 1, column 1: 1.0
f_1D = fMatrix._m11; // read the value in row 2, column 2: 2.1
f_1D = fMatrix._11; // read the value in row 1, column 1: 1.0
f_1D = fMatrix._22; // read the value in row 2, column 2: 2.1
Just like vectors, naming sets can use one or more components from either naming set.
// Given
float2x2 fMatrix = { 1.0f, 1.1f, // row 1
2.0f, 2.1f // row 2
};
float2 temp;
temp = fMatrix._m00_m11; // valid
temp = fMatrix._m11_m00; // valid
temp = fMatrix._11_22; // valid
temp = fMatrix._22_11; // valid
A matrix can also be accessed using array access notation, which is a zero-based set of indices. Each index is inside of square brackets. A 4x4 matrix is accessed with the following indices:
[0][0], [0][1], [0][2], [0][3]
[1][0], [1][1], [1][2], [1][3]
[2][0], [2][1], [2][2], [2][3]
[3][0], [3][1], [3][2], [3][3]
Here is an example of accessing a matrix:
float2x2 fMatrix = { 1.0f, 1.1f, // row 1
2.0f, 2.1f // row 2
};
float temp;
temp = fMatrix[0][0]; // single component read
temp = fMatrix[0][1]; // single component read
Notice that the structure operator "." is not used to access an array. Array access notation cannot use swizzling to read more than one component.
float2 temp;
temp = fMatrix[0][0]_[0][1] // invalid, cannot read two components
However, array accessing can read a multi-component vector.
float2 temp;
float2x2 fMatrix;
temp = fMatrix[0]; // read the first row
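Because array access reads a whole row with a single index, reading a column takes one indexed read per row. The two helper functions below are a hypothetical sketch of both cases.

```hlsl
// A single index returns the whole row as a vector.
float2 GetFirstRow(float2x2 m)
{
    return m[0];
}

// A column is assembled from one component of each row.
float2 GetFirstColumn(float2x2 m)
{
    return float2(m[0][0], m[1][0]);
}
```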
As with vectors, reading more than one matrix component is called swizzling. More than one component can be assigned, assuming only one name space is used. These are all valid assignments:
// Given these variables
float4x4 worldMatrix = { {0,0,0,0}, {1,1,1,1}, {2,2,2,2}, {3,3,3,3} };
float4x4 tempMatrix;
tempMatrix._m00_m11 = worldMatrix._m00_m11; // multiple components
tempMatrix._m00_m11 = worldMatrix._m13_m23;
tempMatrix._11_22_33 = worldMatrix._11_22_33; // any order on swizzles
tempMatrix._11_22_33 = worldMatrix._24_23_22;
Masking controls how many components are written.
// Given
float4x4 worldMatrix = { {0,0,0,0}, {1,1,1,1}, {2,2,2,2}, {3,3,3,3} };
float4x4 tempMatrix;
tempMatrix._m00_m11 = worldMatrix._m00_m11; // write two components
tempMatrix._m23_m00 = worldMatrix._m00_m11;
Assignments cannot be written to the same component more than once. So the left side of this statement is invalid:
// cannot write to the same component more than once
tempMatrix._m00_m00 = worldMatrix._m00_m11;
Also, the component name spaces cannot be mixed. This is an invalid component write:
// cannot mix the one-based and zero-based naming sets
tempMatrix._11_m23 = worldMatrix._11_22;
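Matrix swizzles and masks can be wrapped in small helpers. The sketch below uses hypothetical function names and the one-based naming set to read and write the first three diagonal components of a 4x4 matrix.

```hlsl
// Swizzle: read three components into a float3.
float3 GetDiagonal(float4x4 m)
{
    return m._11_22_33;
}

// Mask: write three components and leave the rest unchanged.
// m is a local copy of the argument, so the caller's matrix is not modified.
float4x4 SetDiagonal(float4x4 m, float3 d)
{
    m._11_22_33 = d;
    return m;
}
```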
Matrix Ordering
Matrix packing order for uniform parameters is set to column-major by default. This means each column of the matrix is stored in a single constant register. On the other hand, a row-major matrix packs each row of the matrix in a single constant register. Matrix packing can be changed with the "#pragma pack_matrix" directive, or with the "row_major" or the "column_major" keywords.
In general, column-major matrices are more efficient than row-major matrices. Here is an example that compares the number of instructions used for both column-major and row-major matrices:
// column-major matrix packing
float4x3 World;
float4 main(float4 pos : POSITION) : POSITION
{
float4 val;
val.xyz = mul(pos,World);
val.w = 0;
return val;
}
If you look at the assembly code generated from the HLSL compiler, you will see these instructions:
vs_2_0
def c3, 0, 0, 0, 0
dcl_position v0
m4x3 oPos.xyz, v0, c0
mov oPos.w, c3.x
// approximately four instruction slots used
Using a column-major matrix in this example used approximately four instruction slots.
Here is the same example using a row-major matrix:
// row-major matrix packing
#pragma pack_matrix(row_major)
float4x3 World;
float4 main(float4 pos : POSITION) : POSITION
{
float4 val;
val.xyz = mul(pos,World);
val.w = 0;
return val;
}
The assembly code generated from compiling this HLSL code is:
vs_2_0
def c4, 0, 0, 0, 0
dcl_position v0
mul r0.xyz, v0.x, c0
mad r2.xyz, v0.y, c1, r0
mad r4.xyz, v0.z, c2, r2
mad oPos.xyz, v0.w, c3, r4
mov oPos.w, c4.x
// approximately five instruction slots used
This code uses approximately five instruction slots. In this example, writing the same code with a column-major packing order saved one instruction slot out of five. In addition to saving instruction slots, column-major packing usually saves constant register space.
The data in a matrix is loaded into shader constant registers before a shader runs. There are two choices for how the matrix data is read: in row-major order or in column-major order. Column-major order means that each matrix column will be stored in a single constant register, and row-major order means that each row of the matrix will be stored in a single constant register. This is an important consideration for how many constant registers are used for a matrix.
A row-major matrix is laid out like this:
11 12 13 14
21 22 23 24
31 32 33 34
41 42 43 44
A column-major matrix is laid out like this:
11 21 31 41
12 22 32 42
13 23 33 43
14 24 34 44
Row-major and column-major matrix ordering determine the order the matrix components are read from the constant table or from shader inputs. Once the data is written into constant registers, matrix order has no effect on how the data is used or accessed from within shader code. Also, matrices declared in a shader body do not get packed into constant registers. Row-major and column-major packing order has no influence on the packing order of constructors (which always follows row-major ordering).
The order of the data in a matrix can be declared at compile time (see Type Modifiers), or the compiler will order the data at runtime for the most efficient use.
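Declaring the packing order per matrix can be sketched with the type modifier keywords. This example assumes the modifiers are spelled row_major and column_major; the matrix names are hypothetical. The modifiers affect only how the uniform data is packed into constant registers, not how the matrices are used in the shader body.

```hlsl
// Each row of this matrix is packed into one constant register.
row_major    float4x4 WorldRows;

// Each column of this matrix is packed into one constant register.
column_major float4x4 ViewProjColumns;

float4 main(float4 pos : POSITION) : POSITION
{
    // mul is written the same way regardless of packing order.
    float4 worldPos = mul(pos, WorldRows);
    return mul(worldPos, ViewProjColumns);
}
```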
--------------------------------------------------------------------------------