Wednesday 22 March 2023

Exploring AI for Texture Generation

By now I'm sure everyone reading this has heard about AI and the potential it has to disrupt everything. I thought I would put it to the test and try it out for texture generation.

Texture generation is an area I have done a lot of work in over the years, having used Substance Designer/Painter, Krita, GIMP, Blender and Unreal material nodes, as well as many other programs I'm probably forgetting.

It is a task that takes a significant amount of time, especially with the four or five different maps PBR requires. Artists will often simply skip many of these steps; I have seen a lot of albedo maps plugged into the roughness slot in a material node setup.

AI promises to dramatically improve the speed of texture and material generation, so I decided to try it out and see how good a result I could get.


Stable Diffusion

To begin with we need an AI solution. Stable Diffusion is free, open source and can be run locally, which is why we're going to use it.

https://github.com/Stability-AI/StableDiffusion

Rather than wrangling Python libraries, we can clone the stable-diffusion-webui project to use the full suite of Stable Diffusion tools from a GUI that runs locally.

https://github.com/AUTOMATIC1111/stable-diffusion-webui

These repos contain documentation for how to get them running, but if you're familiar with git it's pretty easy.
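
If you would rather script things than use the web UI, roughly the same flow can be driven from Python with Hugging Face's diffusers library. This is only a minimal sketch, assuming diffusers and torch are installed and a CUDA-capable GPU is available; none of this is needed if you stick with the web UI.

    # Minimal text-to-image sketch using the diffusers library.
    # Assumes diffusers + torch are installed and a CUDA GPU is available.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",   # base Stable Diffusion checkpoint
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")

    prompt = "cobblestones, game texture, game asset, photorealistic, photography, 8K uhd"
    image = pipe(prompt, width=512, height=512).images[0]
    image.save("cobblestones.png")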


Texture Generation

Out of the gate, Stable Diffusion does not do a great job of generating textures. Even with some prompt wrangling I couldn't seem to get anything resembling usable textures.

cobblestones, game texture, game asset, photorealistic, photography, 8K uhd

Switching to a fine-tuned model improves the results somewhat. In this case I used the Epic Diffusion model, which I had previously used for generating images for TTRPGs.


Results on the fine-tune are definitely better than those from the raw model. However, anyone who understands PBR and albedo will very quickly see these images are not usable. Each contains a lot of lighting information.

  • The highlights (blue circles) could be removed in software relatively easily (a naive sketch of this follows the list), but detail will be lost in those areas.
  • There are very obvious directional light cues (green circles). These are difficult to remove without normal information and will often be interpreted as height differences by texture generation tools. Whilst they are theoretically removable with inverse rendering techniques, we are working with arbitrary textures, so approximating normal maps for the purpose of light removal is unlikely to provide good results.
  • The worst problem with these images is the hard shadows most of them exhibit. These are pretty much impossible to remove without knowing the underlying geometry and lighting direction.
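
As an illustration of the first point, here is a naive sketch of the kind of highlight clean-up that can be done in software, assuming OpenCV and NumPy are available and using a placeholder file name. It is just a threshold-and-inpaint pass, not proper delighting, and detail under the masked highlights is lost.

    # Naive highlight removal: mask near-white pixels and inpaint over them.
    # Assumes OpenCV (cv2) and NumPy; "texture.png" is a placeholder file name.
    import cv2
    import numpy as np

    img = cv2.imread("texture.png")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Treat anything close to pure white as a specular highlight.
    _, mask = cv2.threshold(gray, 235, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=1)

    # Fill the masked areas from their surroundings; detail in those areas is lost.
    result = cv2.inpaint(img, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
    cv2.imwrite("texture_delit.png", result)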

 

Texture-diffusion

It turns out others have run into the same issues with texture generation as I did and have trained a model that is fine-tuned on albedo textures.

Here is a link to the model card

If you are using the AUTOMATIC1111 UI then you will need to download the ckpt version of the texture-diffusion model. It is available from here. If you have issues getting the texture-diffusion model to work with the UI, updating the repo and submodules should fix this.

The results are significantly better than the other options:

pbr, castle brick wall


By default Stable Diffusion generates textures at 512x512. There are three options for upscaling: generating at a higher resolution, using the hi-res fix option, or upscaling the result.

pbr, brick wall
 

The first two options tended to create results like the one shown above. Whilst not bad by any means, the detail doesn't really improve; you just get more of it.

Upscaling the resultant 512x512 to 2048x2048 seems to be the best option. As with other applications, the LDSR upscaler tends to provide the best results; it is a slow model but well worth the wait.
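
For reference, this kind of latent-diffusion upscaling can also be scripted. Below is a minimal sketch using the CompVis 4x super-resolution checkpoint through diffusers; this may not be the exact model the web UI's LDSR option uses, so treat it as an approximation of the workflow rather than a one-to-one replacement.

    # 4x latent-diffusion upscale of a 512x512 texture to 2048x2048.
    # Assumes diffusers is installed; the checkpoint may differ from the web UI's LDSR.
    from diffusers import LDMSuperResolutionPipeline
    from PIL import Image

    pipe = LDMSuperResolutionPipeline.from_pretrained(
        "CompVis/ldm-super-resolution-4x-openimages"
    )
    pipe = pipe.to("cuda")

    low_res = Image.open("brick_wall_512.png").convert("RGB")
    upscaled = pipe(low_res, num_inference_steps=100).images[0]  # slow, but worth the wait
    upscaled.save("brick_wall_2048.png")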

Original 512x512

Upscaled 2048x2048 using LDSR

 

Materials

Out of the box, texture-diffusion provides albedo maps. However, for PBR rendering we need more than that: preferably height, normal and roughness maps, and metalness too if we're generating mixed-material texture sets.
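
Before reaching for a dedicated tool, it is worth seeing how crude the simplest possible conversion is. The sketch below treats albedo luminance as a height map and derives a tangent-space normal map from its gradients; this is only an illustration of the relationship between the maps, not what any particular tool does internally, and the file names are placeholders.

    # Crude albedo -> height -> normal sketch. Treating luminance as height is a
    # rough assumption; dedicated tools do far better. Assumes OpenCV + NumPy.
    import cv2
    import numpy as np

    albedo = cv2.imread("albedo.png")
    height = cv2.cvtColor(albedo, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    height = cv2.GaussianBlur(height, (5, 5), 0)   # suppress high-frequency noise

    strength = 2.0
    dx = cv2.Sobel(height, cv2.CV_32F, 1, 0, ksize=3) * strength
    dy = cv2.Sobel(height, cv2.CV_32F, 0, 1, ksize=3) * strength

    # Tangent-space normal = normalize(-dh/dx, -dh/dy, 1), remapped to [0, 255].
    normal = np.dstack((-dx, -dy, np.ones_like(height)))
    normal /= np.linalg.norm(normal, axis=2, keepdims=True)
    normal_rgb = ((normal * 0.5 + 0.5) * 255).astype(np.uint8)

    cv2.imwrite("height.png", (height * 255).astype(np.uint8))
    cv2.imwrite("normal.png", normal_rgb[:, :, ::-1])   # store as RGB = (X, Y, Z)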

I haven't tested using texture-diffusion to generate these maps; it may work. However, there is an easier solution that is guaranteed to work and is easy to use.

 

Materialize by Bounding Box Software

Materialize is a tool for generating PBR material maps from photos; for our use case we can use it to generate the remaining maps from our albedo.

The software is free to use and open source too, which is bonkers considering the power this tool provides.

The Materialize software interface

The results this tool provides are amazing. From importing the albedo image into Materialize to the final texture set took maybe five minutes. It could take less than that if you don't fiddle with every slider to see what they do.

That's insane! 

The longest part of creating the above material was waiting for the AI upscaler to finish during the Stable Diffusion step.

 

Conclusion

Pending the outcome of any lawsuits, this is the future of 99% of texturing. Old tools like Substance Designer will have their place for creating very specific textures, such as those containing specific iconography and details that an AI is unlikely to be able to produce, or if an art director has a very particular idea of what a surface should look like.

This will certainly be my new workflow when I'm looking for a generic texture and I can't find it on FreePBR ;p 
 

Results






 





Wednesday 1 March 2023

[OpenCV / OpenGL] Facial Albedo Approximation in Constrained Environments

This is a writeup of some of the work I did during my postgraduate studies. The purpose of this research was to find solutions to the problem of facial reflectance capture in real time on constrained hardware, where facial reflectance means the albedo and roughness textures for physically based BRDFs.

Constrained hardware refers to the use of webcams or phone cameras, in addition to laptop hardware or low-power mobile chips with limited compute and graphics processing capability.

Albedo Approximation

The first step was to extract albedo information from images. I used pre-existing renders for this process, with accurate normals rendered for each face.  
Test images were rendered in Blender using HDRIs for lighting.

Using the normal and color data, spherical harmonics can be extracted from the image. Lighting extraction can then be performed by using inverse rendering (Marschner, Guenter and Raghupathy, 2000).

Image = Diffuse * Albedo + Specular.

Rearranging the equation by dividing by the diffuse component results in an image that contains only the albedo and a specular term. Spherical harmonics are used to approximate the diffuse component and have been shown to be up to 98% accurate for this task (Ramamoorthi, 2006).

 Image / Diffuse = Albedo + Specular / Diffuse.
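
A minimal sketch of that diffuse estimation, assuming per-pixel unit normals and a single-channel intensity image are already available as flat NumPy arrays: the nine order-2 spherical harmonic coefficients are fit to the image by least squares, the diffuse shading is reconstructed from them, and the image is divided through by it.

    # Fit order-2 spherical harmonic lighting to the image and divide it out.
    # Assumes per-pixel unit normals of shape (N, 3) and intensities of shape (N,).
    import numpy as np

    def sh_basis(normals):
        # The nine real SH basis functions evaluated at each normal
        # (constants follow Ramamoorthi & Hanrahan).
        x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
        return np.stack([
            np.full_like(x, 0.282095),                      # Y_0,0
            0.488603 * y, 0.488603 * z, 0.488603 * x,       # Y_1,-1  Y_1,0  Y_1,1
            1.092548 * x * y, 1.092548 * y * z,             # Y_2,-2  Y_2,-1
            0.315392 * (3.0 * z * z - 1.0),                 # Y_2,0
            1.092548 * x * z, 0.546274 * (x * x - y * y),   # Y_2,1   Y_2,2
        ], axis=1)

    def remove_diffuse(intensity, normals, eps=1e-4):
        B = sh_basis(normals)
        coeffs, *_ = np.linalg.lstsq(B, intensity, rcond=None)   # inverse-rendering fit
        diffuse = B @ coeffs                                      # approximate diffuse shading
        return intensity / np.maximum(diffuse, eps)               # ~ albedo + scaled specular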



The result of diffuse removal is shown above. The bright highlights at grazing angles can be removed by accounting for the Fresnel effect manually.

However, a more powerful technique is to combine highlight removal into a single step with a process called corrective fields (Ichim et al., 2015).


Corrective fields remove a lot of the artifacts introduced by grazing angles and as a bonus help to reduce the specular component that is still present in the image. 

There are still two issues with the calculated result.  

  1. The image still contains specular information. 
  2. The image is limited to a single angle so the texture is stretched on the sides of faces, or missing if part of the face is occluded.

Multi-view Merging

The issues resulting from simple inverse rendering can both be solved by combining the results of multiple viewpoints. 
For the second issue, multi-view merging means more areas of the face are visible to the system, and thus areas are less likely to be missed.

An example of multiple viewpoints merged into a single texture.

The above image combines many angles to obtain the final result. Diffuse lighting has been removed from the image, however specular highlights are still clearly visible. 

Specular Removal

An interesting fact about specular highlights is that they are view dependent, whereas albedo is not. Using the inverse rendering via spherical harmonics technique described above, a texture containing albedo + specular can be obtained for each view. Because albedo is not affected by viewing angle while specular is, any variation in luminosity between views can be attributed to a change in specular intensity.

In theory, by choosing the minimum value of a point on the surface over multiple viewing angles, we can find the angle with the lowest specular response and use that value to get the most accurate estimation of the surface's albedo.
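
In code terms the naive version is just a per-texel minimum over the stack of per-view albedo + specular textures. A small sketch, assuming each view has already been reprojected into a shared UV space and comes with a visibility mask:

    # Naive specular suppression: per-texel minimum across reprojected views.
    # albedo_spec_stack and valid_mask both have shape (views, height, width).
    import numpy as np

    def min_across_views(albedo_spec_stack, valid_mask):
        # Push texels a view never saw to +inf so they can never win the minimum.
        stack = np.where(valid_mask, albedo_spec_stack, np.inf)
        return stack.min(axis=0)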

Choosing a minimum is technically correct, but in practice limitations of the capture process and errors in the spherical harmonic estimation mean that naively choosing the darkest pixel often results in very visible seams in the image.

Seams appear when naively sampling based on minimum pixel intensity.

Combining techniques

The trick to obtaining the best result and removing the specular response is reversing the above process and merging images before attempting to remove lighting information from them. 
Whilst this makes little sense from a theoretical standpoint, it ultimately produces better results that are free from seams and specular highlights.

Pre-merge extraction (top) vs post-merge extraction (bottom)

 

Future work

I obtained some very good results when changing the problem from one of inverse rendering to a linear regression problem. However, it is difficult to optimise this kind of solution for low-power devices, as a sufficiently powerful CPU is required.
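
For completeness, a hedged sketch of a weighted least squares solve, assuming the design matrix A, observations b and per-sample confidence weights w have already been assembled; the actual terms used in the research are not reproduced here.

    # Weighted least squares: minimise ||sqrt(W) (A x - b)||^2 by scaling the rows
    # of A and b with sqrt(w) and solving the resulting ordinary least squares problem.
    import numpy as np

    def weighted_least_squares(A, b, w):
        sw = np.sqrt(w)
        x, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
        return x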

Weighted least squares linear solve

 

 


References 

Ghosh, A. et al. (2011) ‘Multiview face capture using polarized spherical gradient illumination’, Proceedings of the 2011 SIGGRAPH Asia Conference on - SA ’11, 30(6), p. 1. doi: 10.1145/2024156.2024163.

Ramamoorthi, R. (2006) ‘Modeling Illumination Variation with Spherical Harmonics’, Face Processing: Advanced Modeling Methods, pp. 385–424.

Ichim, A. E., Bouaziz, S. and Pauly, M. (2015) ‘Dynamic 3D avatar creation from hand-held video input’, ACM Transactions on Graphics, 34(4), pp. 45:1–45:14. doi: 10.1145/2766974.