ARM Unveils Details of ASTC Texture Compression at HPG Conference - Part 2

In part one of this two-part post, I wrote about the first part of Tom Olson's paper from High Performance Graphics 2012. We saw how Bounded Integer Sequence Encoding allows ASTC to have a finely graded tradeoff between the roles of bits in a block. In this part, we will learn about the other techniques that contribute to ASTC’s flexibility and quality.

Partitioning the Color Space
Sometimes, the colors in a block do not fit neatly along one line. In an image of a red ball resting on green grass, blocks at the edge of the ball may contain some green pixels and some red pixels. The encoder will recognize this, partition the texels into two sets, and assign two separate sets of color endpoints, one for the reds and one for the greens. It will then assign each texel to one of the two color partitions.

Rather than storing an extra bit for each texel, the block stores a single index that selects a partition pattern from a large table. As there are 2048 patterns to choose from, the chances are that there will be a good match somewhere in the table.

In fact, the table is not stored as a table, but instead generated procedurally from a function. This saves a lot of space, as the table would otherwise be quite large. It can also be generalized to define 2-partition, 3-partition or 4-partition patterns with virtually no extra space. (There is another advantage which we will come back to later.)
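
To make the idea of a procedurally generated table concrete, here is a minimal sketch. It is not the partition function from the ASTC specification: the hash `_mix`, the coefficient ranges and the name `partition_pattern` are all assumptions made for illustration. It only shows how a small seed can stand in for a stored pattern, and how the same code covers 2, 3 or 4 partitions.

```python
def _mix(value: int) -> int:
    """Small 32-bit integer hash (hypothetical, not taken from the ASTC spec)."""
    value &= 0xFFFFFFFF
    value ^= value >> 16
    value = (value * 0x45D9F3B) & 0xFFFFFFFF
    value ^= value >> 16
    return value


def partition_pattern(seed: int, width: int, height: int, partition_count: int):
    """Generate a partition pattern for one block from a seed.

    Each partition gets a pseudo-random linear function a*x + b*y + c
    derived from the seed; a texel belongs to the partition whose function
    scores highest at that position. Nothing is stored except the seed.
    """
    coefficients = []
    for p in range(partition_count):
        h = _mix(seed * 4 + p + 1)
        a = (h & 0xF) - 8          # slope in x, range -8..7
        b = ((h >> 4) & 0xF) - 8   # slope in y, range -8..7
        c = (h >> 8) & 0x3F        # offset, range 0..63
        coefficients.append((a, b, c))

    pattern = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = [a * x + b * y + c for a, b, c in coefficients]
            row.append(scores.index(max(scores)))
        pattern.append(row)
    return pattern


# Example: print one 6x6 two-partition pattern.
for row in partition_pattern(seed=37, width=6, height=6, partition_count=2):
    print(row)
```

Some seeds in a toy generator like this give degenerate patterns in which every texel lands in the same partition. A real encoder would simply never pick those seeds.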

A different color endpoint mode can be chosen for each partition, if this would be more efficient.

Dual Plane Mode
The features described so far work well when the channels in the texture are well correlated with each other, as is usually the case in images of real objects. However, we don’t always use textures just for surface color.

In the case of masked textures, which contain transparent “holes”, the alpha (transparency) channel often has little correlation with the colors in the image. Similarly, for normal maps, two texture channels are often interpreted as the X and Y coordinates of a surface normal at a particular point, and these two exhibit less correlation than actual images.

In this case, ASTC allows the encoder to store two color weights per texel, and then apply the second weight to a selected channel from the image. So, in the mask case, we would use weight 0 to interpolate the RGB components, and weight 1 to interpolate alpha.
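
The mask case can be sketched as follows. This is a simplified illustration that assumes floating-point endpoints and weights in the range 0..1 (the real format works on quantized integers), and the function names are hypothetical.

```python
from typing import Tuple

Color = Tuple[float, float, float, float]  # (R, G, B, A)


def lerp(a: float, b: float, w: float) -> float:
    """Linear interpolation between a and b with weight w in 0..1."""
    return a + (b - a) * w


def decode_texel_dual_plane(endpoint0: Color, endpoint1: Color,
                            weight0: float, weight1: float,
                            second_plane_channel: int) -> Color:
    """Interpolate one texel using two weights.

    weight1 drives the single channel selected by second_plane_channel;
    weight0 drives every other channel.
    """
    return tuple(
        lerp(e0, e1, weight1 if channel == second_plane_channel else weight0)
        for channel, (e0, e1) in enumerate(zip(endpoint0, endpoint1))
    )


# RGB follows weight 0, alpha (channel 3) follows weight 1.
texel = decode_texel_dual_plane(
    (0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0),
    weight0=0.25, weight1=1.0, second_plane_channel=3)
print(texel)  # (0.25, 0.25, 0.25, 1.0)
```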

144 Texels Into 128 Bits Doesn’t Go
For the larger block sizes, it is not possible to dedicate even 1 bit to every texel weight. The way around this is traditionally to omit values from the weight grid (for example in a checkerboard pattern), and reconstruct the missing weights by averaging the nearest stored values.
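
The arithmetic behind the heading is simple: every block occupies 128 bits, so the bit rate is 128 divided by the number of texels in the footprint. A small sketch, with `bits_per_texel` as an illustrative helper name:

```python
BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint


def bits_per_texel(width: int, height: int, depth: int = 1) -> float:
    """Average number of bits available per texel for a given block footprint."""
    return BLOCK_BITS / (width * height * depth)


for footprint in [(4, 4), (6, 6), (8, 8), (12, 12)]:
    print(footprint, round(bits_per_texel(*footprint), 2))
# (4, 4) 8.0
# (6, 6) 3.56
# (8, 8) 2.0
# (12, 12) 0.89
```

At 12x12 the whole block, including mode bits and color endpoints, has less than one bit per texel to spend.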

Andrew Pomianowski, from our partners AMD, contributed a valuable insight into this process. We can allow the weight grid to be independent in size from the texel grid, and interpolate the weights bilinearly to calculate the effective weight at each texel position. This is actually easier to implement than traditional infill mechanisms, as there are no special rules for edge texels, and it produces good results for any combination of grid size and texel footprint.
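
The bilinear infill can be sketched like this. It is a floating-point simplification written for illustration: the function names are hypothetical, and a hardware decoder would use fixed-point arithmetic as laid down in the specification.

```python
def _grid_position(texel: int, texel_count: int, grid_count: int) -> float:
    """Map a texel index to a fractional position in the weight grid,
    so that the first and last texels land on the first and last grid points."""
    if texel_count == 1:
        return 0.0
    return texel * (grid_count - 1) / (texel_count - 1)


def infill_weights(grid, texels_x: int, texels_y: int):
    """Bilinearly interpolate a coarse weight grid up to the texel footprint.

    grid is a list of rows of stored weights; the result has one weight per texel.
    """
    grid_h, grid_w = len(grid), len(grid[0])
    result = []
    for ty in range(texels_y):
        gy = _grid_position(ty, texels_y, grid_h)
        y0 = int(gy)
        y1 = min(y0 + 1, grid_h - 1)
        fy = gy - y0
        row = []
        for tx in range(texels_x):
            gx = _grid_position(tx, texels_x, grid_w)
            x0 = int(gx)
            x1 = min(x0 + 1, grid_w - 1)
            fx = gx - x0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        result.append(row)
    return result


# A 2x2 weight grid stretched over a 3x3 texel footprint.
for row in infill_weights([[0.0, 1.0], [0.0, 1.0]], 3, 3):
    print(row)  # each row is [0.0, 0.5, 1.0]
```

Edge texels go through exactly the same code path as interior ones, which is the simplification the paragraph above describes.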

So, for smoothly changing areas of an image, it is often possible to encode a coarse weight grid, which allows use of finer quantization on the color endpoints or the weights themselves. This is another example of the tradeoffs available in the block encoding.

Dynamic Range
Most existing texture compression methods encode images which have low dynamic range. Each of the red, green and blue components for a texel lies in the range 0..1, and the values are linearly spaced.

Modern applications may also make use of high dynamic range (HDR) content, where the brightness of components can go higher than 1.

ASTC allows encoding of HDR images, with a greatly extended dynamic range and near-logarithmic interpolation of color values to match the human visual system. At 8 bits per pixel, it is comparable to the current de facto HDR encoding scheme, BC6H.
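
To see why logarithmic spacing suits a wide dynamic range, compare linear interpolation with interpolation in the log2 domain. This sketch does not reproduce the HDR encoding defined in the ASTC specification; it is only an illustration of the principle, and both function names are hypothetical.

```python
import math


def lerp_linear(low: float, high: float, w: float) -> float:
    """Equal steps in w give equal differences in brightness."""
    return low + (high - low) * w


def lerp_log(low: float, high: float, w: float) -> float:
    """Equal steps in w give equal ratios in brightness (low and high must be > 0)."""
    log_low, log_high = math.log2(low), math.log2(high)
    return 2.0 ** (log_low + (log_high - log_low) * w)


# Halfway between brightness 1 and brightness 16:
print(lerp_linear(1.0, 16.0, 0.5))  # 8.5
print(lerp_log(1.0, 16.0, 0.5))     # 4.0, two stops above 1 and two stops below 16
```

With logarithmic spacing, the interpolation steps are distributed evenly across exposure stops instead of being crowded towards the bright end.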

Unlike BC6H, however, it offers HDR as another independent choice in its palette of options. Content creators have exactly the same range of color formats and bit rates as they do with LDR images. Even at lower bit rates, the results are still impressive:

[Image: HDR image compressed with ASTC at 3.56bpp, displayed using three different exposures]

Adaptive Encoding
We have already seen that ASTC allows different bit tradeoffs in each block, but the dynamic adaptation goes a lot further than that.

Of all the options available: block footprint size, number of partitions, partition index, color endpoint modes, quantization levels, weight grid size, single or dual plane, and choice of LDR/HDR, only the block footprint is global. Everything else can vary from block to block.

This means that grey areas in a color image will be encoded using the more efficient luminance-only data, opaque areas of an RGBA image will be encoded without spending bits on the alpha channel, and smooth areas will be encoded with fewer (but more accurate) color weights. This freedom of choice increases the quality of the image as a whole, which would otherwise have to be encoded using the same worst-case parameters for every block.

While this is undoubtedly good for quality, it brings with it a problem of speed. With the large number of choices available for each parameter, the encoder has a vast configuration space in which to search when encoding a block. Doing this naively would lead to very long encoding times.

Developing good heuristics, search algorithms and error metrics has been the main thrust of encoder technology development. Using ARM’s encoding tools, the user can now trade off encoding time against quality according to their needs.

World-Beating Performance
Having described all of this, the results speak for themselves. Despite using some bits in each block to indicate the encoding mode, ASTC performs as well as, and usually much better than, the currently available texture compression schemes.
  • At 2 bits per pixel, it outperforms PVRTC by more than 2dB.
  • At 3.56 bits per pixel, ASTC does better than PVRTC and DXT1 at 4 bits per pixel by around 1.5dB. It also beats ETC2 at 4 bits per pixel by about 0.7dB. This is despite a 10% bit rate disadvantage.
  • At 8 bits per pixel for HDR, ASTC and BC6H are roughly equivalent, both producing very good mPSNR ratios of 40-50 dB in the tested images.

For comparison, it is worth noting that differences as small as 0.25dB are visible to the human eye. The HPG paper goes into detail on the measurement techniques, and the significance of the results.
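
For readers unfamiliar with decibel figures, the sketch below uses the standard definition of PSNR for 8-bit data, and converts a PSNR gain into the equivalent reduction in mean squared error. The HPG paper's exact methodology (including mPSNR for HDR) may differ in detail, and the function names here are illustrative only.

```python
import math


def psnr(original, compressed, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which mean squared error shrinks for a given PSNR gain."""
    return 10.0 ** (gain_db / 10.0)


for gain in (0.25, 0.7, 1.5, 2.0):
    print(f"{gain} dB -> MSE reduced by a factor of {mse_ratio_for_gain(gain):.2f}")
```

Because the scale is logarithmic, a gain of 1.5 dB corresponds to roughly 1.4 times less squared error at the same or lower bit rate.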

3D Textures
One thing that Tom does not discuss in his paper, for reasons of space, is ASTC’s support for 3D textures. The partition function generalizes nicely to 3D, allowing ASTC to define a cuboidal block footprint from 3x3x3 texels up to 6x6x6.

The advantage of a true 3D encoder, as opposed to separately encoding 2D slices of a 3D image, is that it can take advantage of similarity between the slices. When comparing the two approaches, our experiments show a significant increase in measured quality for true 3D encoding at the same bit rate.

Allowing flexible bit rate selection for 3D images is revolutionary. Until now, their expense has meant that they were used only rarely. ASTC does not just support using 3D images in the mobile space, it enables them, handing content developers a whole new technique to use. Once they get their hands on this new tool, who knows what innovative applications they will discover?

Conclusion
ASTC is all about giving content developers what they want – a wide range of supported color formats, together with fine control over the space/time/quality tradeoff. And it does all of this while comfortably outperforming the best-in-class LDR compression schemes, and matching current best performance for HDR.

With ASTC, I think it’s no exaggeration to say that ARM has created the biggest leap forward in texture compression technology for years.

Sean Ellis is part of the Technical Staff in ARM's Media Processing Division. He has been working with 3D graphics since 1988, including on ARM's original machine, the Archimedes. He was a key player in the specification of the Java Mobile 3D Graphics standards (M3G and M3G2) and is currently working to define the architecture of the next generation of ARM GPUs. He is also involved in patent work, helping to protect ARM's innovations in the multimedia arena.
All company and product names appearing in the ARM Blogs are trademarks and/or registered trademarks of ARM Limited per ARM’s official trademark list. All other product or service names mentioned herein are the trademarks of their respective owners.

4 Comments On This Entry

Sean Lumly 

08 August 2012 - 04:04 PM
Sean, first off, kudos on a FANTASTIC implementation. The flexibility of this format is truly inspiring and as such, this is a tremendous example of computer science in action. I feel that this format will be ground breaking in that it should unite the desktop and the embedded space, which seems like a glove fit with the newly publicized OpenGL 4.3/ES3 specs. Kudos on a terrific job!

I have a question regarding encoding performance. How fast in general is encoding on current test hardware? I notice that you have a desktop (x86) implementation of the encoder, but are there any plans for a native ARM encoder (ARMv7/NEON/etc)? Additionally, the block structure of this format seems as though it would lend itself quite well to het-compute that [more] fully utilize CPU cores and GPU cores in the near future (ie. Renderscript/OpenCL/etc).

I ask this question because while ASTC is very nice, it will likely compete with a slew of other formats until it is adopted en masse. Additionally, more linear schemes (like WebP/JPG) may provide superior compression ratios for asset storage on disk. My intuition tells me that a viable strategy would be to stream the resources from disk and compress them to a format (ASTC for example) before putting them in memory. The latencies of disk reads should provide ample time to compress the assets for memory with minimal (if any) added latency, and above all, it allows developers to potentially use a single graphics format (e.g. WebP that provides lossy/lossless compression) for a great many textures, target the specific requirements of hardware in an on-demand fashion, and allow for generally higher quality in the same package size. It could also offer a potentially simplified workflow with a simple API for streaming/compressing texture assets without the need to form a per-device strategy for asset storage. Dreaming ahead, it may be possible to optionally store small "compression hints" with each asset to speed up on-device encoding.

Just a thought. Please let me know if it's unrealistic! I would do my own experimentation if I just had a bit more time... :)

Sean Ellis 

08 August 2012 - 04:31 PM
Thanks for the comment, Sean. We've been having a lot of positive feedback on ASTC since the HPG conference, and the Khronos announcement.

Speed is tricky, so here comes the standard engineer's answer - "it depends..."

The encoder explores a "space" of possible encodings for each block. The reference encoder has a variety of encoding speeds, which trade off the depth to which the search space is explored, against the quality available within that part of the space explored. Using the "kodak" image suite as a reference, we see encoding speeds in the range of around 1Mpx/thread for "very fast" encoding, down to 5kpx/thread for exhaustive mode. But then, exhaustive explores much, much farther into the encoding space.

While the encoder we have is designed for the best possible general compression quality, specialized encoders will only explore much smaller parts of the encoding space, and thus be able to go much faster than this. This makes them more suitable for on-device encoding of specific types of content. One of the most interesting features of the ASTC technology is that there is a lot of room to innovate on the encoder side of things.

On-device recompression of highly compressed delivery formats is already used for some assets, and it makes sense if the encoding is fast enough and creates high enough quality. The difficulty here is that the artist will need to take into account artefacts introduced by both compression schemes. One way to get some way towards this is to use a lossless compression (e.g. zip) on raw ASTC data. This will eliminate some of the remaining entropy in the ASTC data, and save space while eliminating the recompression overhead.

Sean Lumly 

09 August 2012 - 04:20 PM
Ah, that makes a lot of sense, and I'm looking forward to seeing how encoders evolve to enhance the format. And thank you very much for the link to the encoder source!

Regarding the 144 texel blocks: since you're using interpolation to derive the individual texel weights on a per-block basis, I'm guessing that this would provide quality very similar to if the block had been scaled up from a block with a lower number of texels. Would I be correct in assuming that the 'adaptive' part of the format (ie. the per-block encoded result) is the saving grace for these large blocks -- certain blocks can tradeoff weight precision for colour accuracy prior to being scaled up, or utilize the procedural partitions and get sharp results in some cases? In this way, I'm guessing it would be superior to a lowest-common-denominator approach in which a smaller texture is substituted for a larger one...

Also, I'm reading that ASTC is an option for Mali T600 GPUs. Does this include the entire family (T601-T678), or the more recently announced T6x8 midgard variants?

Sean Ellis 

24 August 2012 - 09:52 AM
Sean, sorry for the late reply. You are right that the 144-texel blocks always have to be interpolated. But the number of weights can be allocated in each block to best match the features in that block. For example, in blocks with strong horizontal features, the weights would be allocated on a grid with coarser horizontal spacing.

With any compression scheme, at very low bitrates you always have the problem of loss of information. In ASTC the encoder is free to adjust its encodings on a block-by-block basis, which gives a better chance of preserving the significant information in a block. Whether this serves any particular image better than simply scaling the image down and encoding at a higher bit rate is a choice that the content developer has to make.

As for availability, ASTC will be supported on the new ARM Mali-T624, Mali-T628 and Mali-T678 GPUs, but it will not be added onto the previous Mali-T604 and T658 GPUs.