This tutorial will create an application that loads and draws textured squares with both transparent and opaque textures. This code references callbacks and functions which were set up in previous tutorials. The source for this tutorial can be found here. There is also a pure sceGu implementation there.

#include "../../common/callbacks.h"
#include <gu2gl.h>
// PSP Module Info
PSP_MODULE_INFO("Triangle Sample", 0, 1, 1);
// Global variables
int running = 1;
static unsigned int __attribute__((aligned(16))) list[262144];
// Vertex layout: 32-bit color followed by a 3D position
struct Vertex
{
	unsigned int color;
	float x, y, z;
};
struct Vertex __attribute__((aligned(16))) square_indexed[4] = {
    {0xFF00FFFF, -0.25f, -0.25f, -1.0f},
    {0xFFFF00FF, -0.25f, 0.25f, -1.0f},
    {0xFFFFFF00, 0.25f, 0.25f, -1.0f},
    {0xFF000000, 0.25f, -0.25f, -1.0f},
};
// Two triangles that share vertices 0 and 2
unsigned short __attribute__((aligned(16))) indices[6] = {
    0, 1, 2, 2, 3, 0
};
/**
 * @brief Resets matrix and applies transform
 */
void reset_transform(float x, float y, float z){
    ScePspFVector3 v = {x, y, z};
    glLoadIdentity();   // reset the current (model) matrix
    glTranslatef(&v);   // then translate it to (x, y, z)
}
int main() {
    // Boilerplate
    // Initialize Graphics
    // Initialize Matrices
    glOrtho(-16.0f / 9.0f, 16.0f / 9.0f, -1.0f, 1.0f, -10.0f, 10.0f);
    // Main program loop
    while (running) {
        guglStartFrame(list, GL_FALSE);
        // We're doing a 2D, Textured render
        // Clear background to black
        // Move this left
        reset_transform(-0.5f, 0.0f, 0.0f);
        glDrawElements(GL_TRIANGLES, GL_INDEX_16BIT | GL_COLOR_8888 | GL_VERTEX_32BITF | GL_TRANSFORM_3D, 6, indices, square_indexed);
        // Move this right
        reset_transform(0.5f, 0.0f, 0.0f);
        glDrawElements(GL_TRIANGLES, GL_INDEX_16BIT | GL_COLOR_8888 | GL_VERTEX_32BITF | GL_TRANSFORM_3D, 6, indices, square_indexed);
        guglSwapBuffers(GL_TRUE, GL_FALSE);
    }
    // Terminate Graphics
    // Exit Game
    return 0;
}

This is our starting code for the application. The main changes from the previous tutorial are that the call disabling 2D texturing is gone, the data for rendering the other shapes has been removed, and a reset_transform function has been added, which resets the model matrix and then translates it to a given position.

STB is a collection of cross-platform, header-only C libraries. They are public domain code and cover many tasks. You can find the source here. We'll be using stb_image.h as our image loading library. I am aware that some recent versions of stb_image did not work earlier this year, though this may have been fixed by now. If you need a confirmed working copy of stb_image, check here.

To add this to your application, `#define STB_IMAGE_IMPLEMENTATION` in exactly one source file, before including the header in that file; every other file just includes the header. With the define missing everywhere you will get undefined-reference errors at link time, and with it in more than one file you will get duplicate definitions. If you're using the confirmed working stb_image, it will most likely generate warnings that trip -Werror, so you'll likely have to disable -Werror or ignore warnings from that file.

#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"

In order to map textures to geometry, we need UV coordinates to read. To add these to our current Vertex, we'll need to add the UV values in the struct and the VType parameter.

struct Vertex
{
	float u, v;
	unsigned int color;
	float x, y, z;
};
struct Vertex __attribute__((aligned(16))) square_indexed[4] = {
    {0.0f, 0.0f, 0xFF00FFFF, -0.25f, -0.25f, -1.0f},
    {1.0f, 0.0f, 0xFFFF00FF, -0.25f, 0.25f, -1.0f},
    {1.0f, 1.0f, 0xFFFFFF00, 0.25f, 0.25f, -1.0f},
    {0.0f, 1.0f, 0xFF000000, 0.25f, -0.25f, -1.0f},
};

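As we can see, each vertex now starts with its UV pair. The members are ordered texture coordinates, then color, then position, which is the order the GE expects to find them in.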

        glDrawElements(GL_TRIANGLES, GL_INDEX_16BIT | GL_TEXTURE_32BITF | GL_COLOR_8888 | GL_VERTEX_32BITF | GL_TRANSFORM_3D, 6, indices, square_indexed);

And the GL_TEXTURE_32BITF flag was added to the vertex type, so the GE knows each vertex begins with two 32-bit float texture coordinates.
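Because the GE reads each vertex as raw bytes, the struct layout has to match the flags exactly. As a small sanity check, the layout can be verified at compile time. This is a sketch that builds with any C11 compiler rather than PSP-specific code, and the byte offsets assume 4-byte float and unsigned int, which holds on the PSP and on common desktop targets.

```c
#include <assert.h>
#include <stddef.h>

/* Same member order as the tutorial's Vertex: UV, color, position. */
struct Vertex {
    float u, v;
    unsigned int color;
    float x, y, z;
};

/* Assumes 4-byte float and unsigned int. */
static_assert(offsetof(struct Vertex, u) == 0, "UV comes first");
static_assert(offsetof(struct Vertex, color) == 8, "color follows the two UV floats");
static_assert(offsetof(struct Vertex, x) == 12, "position comes last");
static_assert(sizeof(struct Vertex) == 24, "no padding between members");
```

If one of these assertions ever fails after you change the struct, the vertex type flags passed to glDrawElements need to change with it.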

First, we'll need some helper functions. We need to compute the nearest power of 2, copy texture data to an oversized buffer, and swizzle the texture.

unsigned int pow2(const unsigned int value) {
    unsigned int poweroftwo = 1;
    while (poweroftwo < value) {
        poweroftwo <<= 1;
    }
    return poweroftwo;
}

This code finds the smallest power of 2 that is greater than or equal to the value by repeatedly shifting left by 1, which is functionally equal to *= 2. The result is always the smallest power-of-2 size that the texture fits into. We need this because, as explained briefly in the PSP Graphics Context tutorial, the PSP GE can only use textures with power-of-2 dimensions; otherwise it will fail or produce invalid results.
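To make the rounding concrete, here is the same loop with one guard added. The guard is an addition for this sketch and not part of the tutorial's code: for inputs above 2^31 (assuming a 32-bit unsigned int) the shift would wrap to 0 and the loop would never end.

```c
#include <assert.h>

/* Smallest power of two that is >= value (same loop as the tutorial's pow2). */
unsigned int pow2(const unsigned int value) {
    /* Guard added for this sketch: larger inputs would overflow the shift. */
    assert(value <= 0x80000000u);
    unsigned int poweroftwo = 1;
    while (poweroftwo < value) {
        poweroftwo <<= 1;
    }
    return poweroftwo;
}
```

For a screen-sized 480x272 image, pow2(480) and pow2(272) both give 512, so the padded texture is 512x512. A size that is already a power of 2, such as 64, is returned unchanged.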

void copy_texture_data(void* dest, const void* src, const int pW, const int width, const int height){
    // Copy row by row: dest rows are pW pixels wide, src rows are width pixels wide
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            ((unsigned int*)dest)[x + y * pW] = ((const unsigned int *)src)[x + y * width];
        }
    }
}

This code performs a copy from src → dest, where the destination is a power-of-2 buffer pW pixels wide and the source may or may not be a power of 2. The function is a basic nested loop over every X and Y of the source, so each source row lands at the start of a pW-wide row in dest. The remaining unfilled pixels of dest are left untouched; what they contain is up to you. Ideally, you'll never be reading from this area anyway.
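The row stride is the only subtle part, so here is a small variant that can be run on a desktop to see where each row lands. The name copy_rows is made up for this sketch, and it uses one memcpy per row instead of the per-pixel loop; the resulting layout is the same.

```c
#include <assert.h>
#include <string.h>

/* Copy a width x height image of 32-bit pixels into a buffer whose rows
   are pW pixels wide. Pixels past `width` in each dest row are untouched. */
void copy_rows(unsigned int *dest, const unsigned int *src,
               unsigned int pW, unsigned int width, unsigned int height) {
    assert(pW >= width);
    for (unsigned int y = 0; y < height; y++) {
        /* Each source row is contiguous, so one memcpy per row is enough. */
        memcpy(dest + (size_t)y * pW, src + (size_t)y * width,
               width * sizeof *src);
    }
}
```

Copying a 3x2 image {1,2,3, 4,5,6} into a zeroed 4-wide buffer gives {1,2,3,0, 4,5,6,0}: the second row starts at index 4 (one full pW stride), not at index 3.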

void swizzle_fast(unsigned char *out, const unsigned char *in, const unsigned int width, const unsigned int height) {
    // width is the row size in bytes; blocks are 16 bytes wide and 8 rows tall
    unsigned int blockx, blocky;
    unsigned int j;
    unsigned int width_blocks = (width / 16);
    unsigned int height_blocks = (height / 8);
    unsigned int src_pitch = (width - 16) / 4; // words to skip to reach the next row of the same block
    unsigned int src_row = width * 8;          // bytes in one row of blocks
    const u8 *ysrc = in;
    u32 *dst = (u32 *)out;
    for (blocky = 0; blocky < height_blocks; ++blocky) {
        const unsigned char *xsrc = ysrc;
        for (blockx = 0; blockx < width_blocks; ++blockx) {
            const unsigned int *src = (unsigned int *)xsrc;
            for (j = 0; j < 8; ++j) {
                // Copy one 16-byte row of the block (4 words)
                *(dst++) = *(src++);
                *(dst++) = *(src++);
                *(dst++) = *(src++);
                *(dst++) = *(src++);
                src += src_pitch;
            }
            xsrc += 16;
        }
        ysrc += src_row;
    }
}

This code is a bit cryptic, and I would recommend reading up on it. Basically, we swizzle the texture on load to rearrange it into an order that is faster to read: the image is cut into blocks 16 bytes wide and 8 rows tall, and each block is stored contiguously. This lets us avoid cache misses when the texture is read. Cache misses are extremely expensive events where the data being read is not in the cache, so the read stalls while the data is fetched from RAM. RAM is hundreds or thousands of times slower than cache, and performance degrades severely in high cache-miss scenarios (like traversing linked lists). All you need to know about this code is that swizzling is cheap to do and makes texture reads several times faster.
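One way to make the block layout less cryptic is to compute where a single byte ends up. The helper below is a sketch derived from reading swizzle_fast, not from official documentation, and the name swizzled_offset is made up for illustration. It maps a byte position in the linear image to its offset in the swizzled output.

```c
#include <assert.h>
#include <stddef.h>

/* Offset in the swizzled buffer of the byte at column x_bytes, row y of the
   linear image. width_bytes is the row length in bytes (pW * 4 for 8888). */
size_t swizzled_offset(size_t x_bytes, size_t y, size_t width_bytes) {
    assert(width_bytes % 16 == 0 && x_bytes < width_bytes);
    size_t blocks_per_row = width_bytes / 16; /* blocks are 16 bytes wide */
    size_t block_x = x_bytes / 16;
    size_t block_y = y / 8;                   /* and 8 rows tall */
    size_t block_index = block_y * blocks_per_row + block_x;
    /* Each block occupies 16 * 8 = 128 contiguous bytes in the output. */
    return block_index * 128 + (y % 8) * 16 + (x_bytes % 16);
}
```

For a row of 32 bytes (two blocks across), the byte at (0, 1) moves to offset 16 because it is the second row of the first block, while the byte at (16, 0) moves to offset 128 because it starts the second block. Pixels that are vertical neighbours in the image therefore sit only 16 bytes apart in memory instead of a full row apart.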

Let's start off by defining a Texture object.

typedef struct {
    unsigned int width, height;
    unsigned int pW, pH;
    void* data;
} Texture;

This texture object simply holds the image's width and height, the power-of-2 width and height (pW and pH), and the pixel data behind an opaque void*.

Texture* load_texture(const char* filename, const int vram) {
    int width, height, nrChannels;
    // Flip vertically on load; whether you need this depends on your image
    stbi_set_flip_vertically_on_load(GL_TRUE);
    unsigned char* data = stbi_load(filename, &width, &height, &nrChannels, STBI_rgb_alpha);
    if (!data)
        return NULL;
    Texture* tex = (Texture*)malloc(sizeof(Texture));
    // FIXME: Allocation could fail
    tex->width = width;
    tex->height = height;
    tex->pW = pow2(width);
    tex->pH = pow2(height);
    size_t size = tex->pH * tex->pW * 4;
    unsigned int *dataBuffer = (unsigned int*)memalign(16, size);
    //FIXME: Allocation could fail -- release resources, return NULL
    copy_texture_data(dataBuffer, data, tex->pW, tex->width, tex->height);
    stbi_image_free(data);
    unsigned int* swizzled_pixels = NULL;
    if (vram) {
        swizzled_pixels = getStaticVramTexture(tex->pW, tex->pH, GU_PSM_8888);
    } else {
        swizzled_pixels = (unsigned int *)memalign(16, size);
    }
    //FIXME: Allocation could fail -- release resources, return NULL
    swizzle_fast((unsigned char*)swizzled_pixels, (const unsigned char*)dataBuffer, tex->pW * 4, tex->pH);
    free(dataBuffer);
    tex->data = swizzled_pixels;
    sceKernelDcacheWritebackInvalidateAll(); // Technically you could InvalidateRange() over the tex->data buffer
    return tex;
}

There's a lot to break down here. Our function takes the file to load and a flag for whether or not to load it into VRAM. First, we create some temporary variables for stb_image to write into. We also set vertical flip to true. Depending on your file and its encoding you may or may not need to flip the texture, so it can be worth adding an argument to our function to enable or disable this. We then call stbi_load() to load the image file into memory. Passing STBI_rgb_alpha forces the result to 4 channels per pixel regardless of the file's own format. The call returns a pointer to a buffer containing the raw pixel data decoded from our file, or NULL if it failed at some point, in which case we return NULL as well.

Next, we allocate a new Texture structure on the heap. This is helpful for applications that pass many textures around, since you hand over a pointer instead of copying the struct. (Imagine if you had more image metadata in an implementation, and you're passing around hundreds of textures: you'd be sinking a lot of time just copying your structure around.) This might fail to allocate, so be careful! We then initialize the members of the Texture and create dataBuffer, a 16-byte-aligned buffer with power-of-2 dimensions, which the GE requires as explained above. We use the copy_texture_data function to copy STB's returned data into the new buffer, and then free the STB data. We could stop here, which is valid, and specify when binding our texture that it is not swizzled. This is NOT recommended, because texture reads will be significantly slower, especially from main RAM compared to VRAM.

Finally, we create the buffer that receives the swizzled pixels. This is the texture's final destination, so we must decide whether it lives in VRAM or main RAM. VRAM will always be faster, but it is limited compared to the relatively vast pool of main RAM. Once again, allocations can fail, and you should deal with that instead of crashing. Next, we call the swizzle function to swizzle and copy from dataBuffer into the swizzled buffer. We can then free dataBuffer and store the swizzled pointer in the texture. Now we need to flush the CPU data cache so the pixels actually reach memory. Writes can linger in the PSP's CPU cache for a while, and because the GE reads memory directly, pixels that have not been written back show up as very apparent graphical glitches. We force the write-back with `sceKernelDcacheWritebackInvalidateAll()`, which writes back and invalidates the entire data cache to guarantee our results are normal. We then return the texture.
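The FIXME comments in load_texture all come down to the same pattern: when a later allocation fails, release whatever was allocated earlier and return NULL. Below is a minimal sketch of that pattern, runnable on a desktop. The names texture_alloc, texture_free and next_pow2 are hypothetical, and plain malloc stands in for memalign(16, size) so the example does not depend on the PSP toolchain.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    unsigned int width, height;
    unsigned int pW, pH;
    void *data;
} Texture;

static unsigned int next_pow2(unsigned int value) {
    unsigned int p = 1;
    while (p < value) p <<= 1;
    return p;
}

/* Hypothetical helper: returns a Texture with pixel storage attached,
   or NULL with nothing leaked. */
Texture *texture_alloc(unsigned int width, unsigned int height) {
    assert(width > 0 && height > 0);
    Texture *tex = malloc(sizeof *tex);
    if (tex == NULL)
        return NULL;
    tex->width = width;
    tex->height = height;
    tex->pW = next_pow2(width);
    tex->pH = next_pow2(height);
    size_t size = (size_t)tex->pW * tex->pH * 4; /* 4 bytes per 8888 pixel */
    tex->data = malloc(size); /* stand-in for memalign(16, size) on the PSP */
    if (tex->data == NULL) {
        free(tex); /* release what was already allocated */
        return NULL;
    }
    return tex;
}

/* Only valid for textures whose data came from the heap, not from VRAM. */
void texture_free(Texture *tex) {
    if (tex == NULL)
        return;
    free(tex->data);
    free(tex);
}
```

A 480x272 image padded to 512x512 needs 512 * 512 * 4 = 1,048,576 bytes, which shows how quickly padded textures eat into a limited pool and why the failure path is worth handling.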

Okay, now we have a texture, we have geometry that uses that texture… how do we use that texture? Well, we have to bind it. In this process we set a bunch of information about the texture for the PSP GE to use to render.

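void bind_texture(Texture* tex) {
    if (tex == NULL)
        return;
    glTexMode(GU_PSM_8888, 0, 0, 1);              // 8888 format, no mipmaps, swizzled
    glTexFunc(GU_TFX_MODULATE, GU_TCC_RGBA);      // texture color * vertex color, all channels
    glTexFilter(GL_NEAREST, GL_NEAREST);
    glTexWrap(GL_REPEAT, GL_REPEAT);
    glTexImage(0, tex->pW, tex->pH, tex->pW, tex->data);
}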

bind_texture first specifies the texture mode: an RGBA texture with 8 bits per channel (GU_PSM_8888), the number of mipmaps, an unknown parameter that must be set to 0, and whether or not the data is swizzled. Since we didn't create mipmaps, and sceGu has no easy way to generate them, we pass 0 for the mipmap count and 1 to enable swizzling. We then set the texture function to modulate all channels. This is the function used when combining the texture with the vertex color: output_color = texture_color * vertex_color. Next comes the texture filter, which in this case uses nearest-neighbor sampling instead of bilinear sampling. You can read more about this online. We also set the texture wrapping mode to repeat. If you set this to clamp and your UV values fall outside [0, 1], the last pixel inside the range is stretched across the rest of the surface. Finally, we send the texture to the GE: the mipmap level (0, the base image), the power-of-2 width and height, the texture buffer width (also the power-of-2 width), and the pointer to the data.
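To see what modulate does to actual color values, here is a CPU-side sketch of the formula on packed 32-bit colors. It is only an illustration: the function names are made up, and the rounding is an assumption, so the GE's exact arithmetic may differ slightly.

```c
#include <assert.h>
#include <stdint.h>

/* Extract one 8-bit channel from a packed color (shift is 0, 8, 16 or 24). */
static uint32_t channel(uint32_t color, unsigned shift) {
    assert(shift % 8 == 0 && shift <= 24);
    return (color >> shift) & 0xFFu;
}

/* Modulate: each output channel is texture * vertex, scaled back to 0..255.
   The rounding is an assumption made for this sketch. */
uint32_t modulate(uint32_t texel, uint32_t vertex_color) {
    uint32_t out = 0;
    for (unsigned shift = 0; shift <= 24; shift += 8) {
        uint32_t c = (channel(texel, shift) * channel(vertex_color, shift)
                      + 127u) / 255u;
        out |= c << shift;
    }
    return out;
}
```

A white vertex color (0xFFFFFFFF) leaves the texel unchanged, while the 0xFF000000 vertex in square_indexed zeroes the color channels, which is why the texture darkens toward that corner. Setting every vertex color to 0xFFFFFFFF shows the texture unmodified.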

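int main() {
    // ... setup as before ...
    Texture* tex = load_texture("container.jpg", GL_TRUE);
    while (running) {
        guglStartFrame(list, GL_FALSE);
        // ... render state, clear and transform as before ...
        bind_texture(tex);
        glDrawElements(GL_TRIANGLES, GL_INDEX_16BIT | GL_TEXTURE_32BITF | GL_COLOR_8888 | GL_VERTEX_32BITF | GL_TRANSFORM_3D, 6, indices, square_indexed);
        guglSwapBuffers(GL_TRUE, GL_FALSE);
    }
    // ... teardown as before ...
    return 0;
}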

If you've set everything up properly (and provided a texture) this should render to the screen!

In order to enable transparent images to draw, we need to enable blending – this allows us to blend our transparent areas with opaque ones, resulting in the desired effects of transparency. Thankfully, this is relatively easy to enable.

        glBlendFunc(GU_ADD, GU_SRC_ALPHA, GU_ONE_MINUS_SRC_ALPHA, 0, 0);
        glEnable(GL_BLEND);

Make sure this code is within the render loop, between the start of the frame and the buffer swap. First, we set the blend function. The typical blend equation is color_result = color_source * source_factor + color_destination * destination_factor. sceGu lets you customize this directly in the same call: you can change the operator from addition to other operations, and supply fixed factor values if you choose the GU_FIX mode. We'll stick with the tried and true blending method of color_source * alpha + color_destination * (1 - alpha). We then enable blending. If you choose a transparent or translucent image, it will now render with transparency!
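The blend equation is easy to check by hand on a single pixel. The sketch below applies color_source * alpha + color_destination * (1 - alpha) to the color channels of packed 0xAABBGGRR values. The function names are made up, and the rounding and the choice to keep the destination's alpha are assumptions of this example, not documented GE behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Extract one 8-bit channel from a packed color (shift is 0, 8, 16 or 24). */
static uint32_t channel(uint32_t color, unsigned shift) {
    assert(shift % 8 == 0 && shift <= 24);
    return (color >> shift) & 0xFFu;
}

/* result = src * alpha + dst * (1 - alpha) for R, G and B, with alpha taken
   from the source pixel (bits 24..31). */
uint32_t blend_src_alpha(uint32_t src, uint32_t dst) {
    uint32_t alpha = channel(src, 24);
    uint32_t out = dst & 0xFF000000u; /* assumption: destination alpha is kept */
    for (unsigned shift = 0; shift <= 16; shift += 8) {
        uint32_t c = (channel(src, shift) * alpha
                      + channel(dst, shift) * (255u - alpha) + 127u) / 255u;
        out |= c << shift;
    }
    return out;
}
```

A fully opaque source pixel replaces the destination color, a fully transparent one leaves it alone, and a half-transparent white pixel (0x80FFFFFF) over black gives mid grey.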

Congratulations, you've finally drawn a texture onto the screen and can use your own images. In the next tutorial we'll cover matrices and coordinate spaces and how to make a basic 2D camera (though the concept can be applied into 3D).

  • tutorial/texture.txt
  • Last modified: 2022/09/14 05:17
  • by iridescence