zeux / meshoptimizer

This release features many library changes including a few new algorithms and substantial improvements in existing algorithms, notably making mesh simplification and clusterization better, as well as gltfpack fixes and improvements.

Library improvements

meshopt_simplify now has an extra output parameter, result_error, which will contain the relative simplification error (which can be converted to absolute with meshopt_simplifyScale)
meshopt_simplifySloppy interface has changed to align with meshopt_simplify: the function now expects a larger output index buffer size and accepts an input error that restricts simplification as well as an output error.
meshopt_simplify now tries to avoid simplifications that result in triangle flips; this substantially improves triangulation quality at a moderate performance cost.
meshopt_buildMeshlets interface has changed to allow for almost arbitrary meshlet vertex/triangle limits by outputting three separate arrays of meshlet data instead of one; the resulting layout is also more compact and is often more GPU-friendly.
meshopt_buildMeshlets now implements a new, more expensive algorithm that generates meshlets that are optimized for a balance of vertex reuse, spatial coherency and cone culling efficiency, controlled with an extra cone_weight parameter. The old linear-time algorithm is still available as meshopt_buildMeshletsScan.
Implement a new algorithm, meshopt_generateTessellationIndexBuffer, that can be used to generate a special index buffer that, together with hardware tessellation stage, can efficiently implement crack-free PN-AEN tessellation for arbitrary meshes.
Optimize Wasm SIMD variant of meshopt_decodeVertexBuffer, making it ~5% faster
Fix SIMD decoder filters (meshopt_decodeFilter*) when vertex count wasn't aligned by 4
Fix undefined behavior when decoding some invalid compressed index buffers with meshopt_decodeIndexBuffer
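
The three-array meshlet layout mentioned above can be pictured roughly as follows. This is a minimal sketch rather than the library's header: the Meshlet struct, its field names and expandMeshlet are hypothetical stand-ins, assuming each meshlet stores offsets into a shared array of vertex indices and a shared array of byte-sized local triangle indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical meshlet record: offsets and counts into two shared arrays.
struct Meshlet
{
    unsigned int vertex_offset;   // first entry in meshlet_vertices
    unsigned int triangle_offset; // first byte in meshlet_triangles
    unsigned int vertex_count;
    unsigned int triangle_count;
};

// Expands one meshlet into a plain index list that references the original vertex buffer.
std::vector<unsigned int> expandMeshlet(const Meshlet& m,
    const std::vector<unsigned int>& meshlet_vertices,
    const std::vector<unsigned char>& meshlet_triangles)
{
    assert(m.vertex_offset + m.vertex_count <= meshlet_vertices.size());
    assert(m.triangle_offset + m.triangle_count * 3 <= meshlet_triangles.size());

    std::vector<unsigned int> indices;
    indices.reserve(m.triangle_count * 3);

    for (unsigned int i = 0; i < m.triangle_count * 3; ++i)
    {
        // Each triangle corner is a small local index into this meshlet's vertex list.
        unsigned char local = meshlet_triangles[m.triangle_offset + i];
        assert(local < m.vertex_count);
        indices.push_back(meshlet_vertices[m.vertex_offset + local]);
    }

    return indices;
}
```

Because triangle corners are stored as single bytes, a layout like this stays compact as long as each meshlet references at most 256 vertices.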

gltfpack improvements

Implement support for KHR_materials_variants
Implement support for recent versions of PBR-next extensions, including KHR_materials_volume and KHR_materials_specular
Fix issues with running texture compression tools (toktx, basisu) in various environments
Fix support for older versions of Node.js
Fix processing for some scenes with clearcoat materials that didn't have a diffuse texture
Fix handling of absolute paths in Node.js builds
Fix processing for scenes with KHR_texture_transform extension when quantization is disabled

This release focuses on gltfpack improvements and also features small improvements to simplifier to improve quality for some edge cases.

gltfpack highlights

gltfpack substantially improves support for instanced meshes in this release. Previously, all instances of the same mesh were merged together unconditionally, which could result in large file sizes and/or memory consumption; instances are now kept as is by default. -mm can be used to merge the geometry of the instances together; alternatively, -mi can be used to encode the instance data using EXT_mesh_gpu_instancing which, given a compatible loader, can significantly reduce the transmission size and improve loading and rendering performance.

To improve support for large scenes even further, gltfpack is now much more memory efficient, requiring ~40% less memory for processing on average.

The extension used by gltfpack to compress geometry, animation and instance data is now part of glTF and is called EXT_meshopt_compression; gltfpack was changed accordingly to output compressed files conforming to the up-to-date specification. This requires loaders to update to the new extension; https://github.com/zeux/meshoptimizer/tree/master/js contains plugins for three.js and Babylon.js, and work is underway to integrate these directly upstream.

For texture compression, gltfpack is switching to toktx from KTX-Software; this enables support for super-compressed UASTC textures and support for texture scaling during encoding (via -ts option) which can further reduce the file size. Additionally when using toktx, gltfpack now pads the textures to a multiple of 4 to ensure compatibility with WebGL, and can optionally (via -tp option) pad to a power of 2 for older browsers. basisu command-line tool is still supported for now and automatically used if toktx is not available.
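
The two padding rules can be illustrated with a pair of small helpers. This is a sketch only: the function names are hypothetical, and rounding up (rather than to the nearest size) is an assumption about how the padding behaves, not something taken from the gltfpack source.

```cpp
#include <cassert>

// Rounds a texture dimension up to the next multiple of 4.
unsigned int padToMultipleOf4(unsigned int size)
{
    return (size + 3u) & ~3u;
}

// Rounds a texture dimension up to the next power of two.
unsigned int padToPowerOfTwo(unsigned int size)
{
    assert(size > 0 && size <= 0x80000000u);

    unsigned int result = 1;
    while (result < size)
        result <<= 1;

    return result;
}
```

Under these assumptions a 300x13 image would become 300x16 with multiple-of-4 padding and 512x16 with power-of-two padding.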

Finally, gltfpack is now available as a JS library in addition to having command-line executables; the library uses a filesystem-like interface. Please refer to gltf/library.js for documentation on the two exposed functions.

gltfpack improvements

Improve support for scenes with many instances of the same mesh; -mm is now required to merge these instances together
Implement support for EXT_mesh_gpu_instancing via -mi command line option
-km can now be used to keep unused materials
-ke now keeps extras on nodes in addition to materials
Improve memory consumption when packing large scenes by 40% on average
Node.js version of gltfpack now supports texture compression if basisu or toktx are available
Update KTX2 support to track latest KTX2 specification, including DFD changes for ETC1S/UASTC
Implement support for various PBR.Next extensions including KHR_materials_transmission, KHR_materials_ior, KHR_materials_specular and KHR_materials_sheen
Implement support for toktx when compressing textures
Implement support for -ts that can be used to rescale textures to reduce transmission and memory size
Instead of using 1-255 range for texture quality, -tq now accepts a level from 1 to 10, which is tuned to balance compression ratio vs quality for both ETC1S and UASTC
Fix processing for files with unused texture coordinate 0
Implement support for -tp that can be used to rescale images to power-of-two when using texture compression
Remove command line option -tb in favor of -tc; the latter uses KHR_texture_basisu which should be more widely supported
Remove command line option -te; textures are now automatically embedded into .glb files
Implement JSON report via -r option which contains various stats about the resulting glTF scene
Fix texture embedding for images with spaces in the URI
Fix issues with non-uniform and negative mesh scale
Implement support for multiple scenes; all scenes are now preserved along with their own node hierarchy
Implement support for higher bitrate colors via -vc option
Fix animation range in some cases, in particular starting time is now preserved when it's not 0, and ending time is preserved when animation doesn't have motion

Miscellaneous improvements

Improve meshopt_simplify edge analysis to track edge loops more carefully; this fixes simplification for some cases where an open border would previously get collapsed incorrectly
Fix a few issues with CMake configuration when meshoptimizer is used as a dependent library
Fix compilation for old Apple Clang versions
Reduce size of meshopt_decoder.js by 40% before gzip and 5% after gzip
meshopt_decoder.js now has an ES6-friendly variant, meshopt_decoder.module.js, that can be imported.

This release features several new algorithms, mainly aimed at improving the geometry compression, as well as many gltfpack changes with the same goal.

New algorithms

meshopt_optimizeVertexCacheStrip optimizes triangle lists for vertex cache, favoring long triangle strips over vertex transform efficiency. This function is recommended as a replacement for meshopt_optimizeVertexCache when reducing the compressed geometry size is more valuable than reducing vertex transform cost, or when using meshopt_stripify to produce shorter triangle strip sequences.
meshopt_encodeIndexBuffer now supports the new strip-optimized order better; this required some bitstream changes that can be enabled with meshopt_encodeIndexVersion(1). Version 1 will become the default encoding version in a later release.
meshopt_encodeIndexSequence can be used to compress index buffer data that doesn't represent triangle lists; the encoding is recommended for triangle strip or line lists, but can work with any index sequence (it's less efficient than meshopt_encodeIndexBuffer at compressing triangle lists)

When compressing geometry, using meshopt_optimizeVertexCacheStrip and meshopt_encodeIndexVersion(1) is recommended to minimize the distribution size of the resulting meshes; this can make the encoded data ~10% smaller before gzip/zstd compression and up to 20% smaller after gzip/zstd.

Additionally, a set of vertex filters (meshopt_decodeFilterOct, meshopt_decodeFilterQuat, meshopt_decodeFilterExp) was added to support MESHOPT_compression glTF extension; these are not as useful outside of glTF, and are described in detail in the extension draft. Cumulatively these can substantially reduce the geometry and animation data in glTF files compressed using the extension.
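
For readers unfamiliar with the idea behind an octahedral filter, the sketch below shows the textbook octahedral mapping of a unit vector to two coordinates and back, in floating point. It is not the filter's actual data layout, which works on quantized integer data as defined by the extension draft; Vec2, Vec3, octEncode and octDecode are hypothetical names used for illustration only.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

static float signNotZero(float v)
{
    return v >= 0.f ? 1.f : -1.f;
}

// Maps a non-zero direction to two coordinates in [-1, 1] (octahedral mapping).
Vec2 octEncode(Vec3 n)
{
    float s = std::fabs(n.x) + std::fabs(n.y) + std::fabs(n.z);
    assert(s > 0.f);

    float x = n.x / s, y = n.y / s;

    if (n.z < 0.f)
    {
        // Fold the lower hemisphere over the diagonals of the square.
        float fx = (1.f - std::fabs(y)) * signNotZero(x);
        float fy = (1.f - std::fabs(x)) * signNotZero(y);
        x = fx;
        y = fy;
    }

    return {x, y};
}

// Reconstructs a unit vector from its octahedral coordinates.
Vec3 octDecode(Vec2 e)
{
    float z = 1.f - std::fabs(e.x) - std::fabs(e.y);
    float x = e.x, y = e.y;

    if (z < 0.f)
    {
        // Undo the fold applied by the encoder.
        x = (1.f - std::fabs(e.y)) * signNotZero(e.x);
        y = (1.f - std::fabs(e.x)) * signNotZero(e.y);
    }

    float len = std::sqrt(x * x + y * y + z * z);
    return {x / len, y / len, z / len};
}
```

Storing two coordinates instead of three is what makes this kind of representation attractive for normals and tangents once the values are quantized.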

gltfpack highlights

gltfpack incorporates the new algorithms and filters to substantially improve the compression ratios for geometry and animation data. For example, Corset model from glTF-Sample-Models repository is 20% smaller, BrainStem model from the same repository is 30% smaller. Most of the changes currently require using a higher compression mode, activated via -cc command-line option; in a future release -cc may replace -c.

The texture compression support was updated to incorporate latest changes in KTX2 / KHR_texture_basisu specification; additionally, gltfpack now supports Basis UASTC encoding via -tu flag. Note that since gltfpack doesn't support UASTC RDO yet, the UASTC compressed files will be much larger (but much higher quality) compared to ETC1S encoded files.

For easier distribution, gltfpack is now available as an npm package.

gltfpack improvements

Support all primitive topology modes, except indexed point lists, as an input
Support for line lists as an output; line meshes were previously discarded
Improve filtering of redundant geometry streams (removing color/morph delta streams as necessary)
Implement support for KHR_materials_clearcoat extension
Preserve extras data on material instances when -ke flag is used
Add fine-grained control over quantization parameters for animations (-at, -ar, -as)
Add -noq option that can be used to disable quantization (resulting in much larger files)
Improve performance on large scenes with lots of mesh instances
Improve validation and error messages for invalid input files
Fix invalid output for files with meshes that don't produce any geometry

Miscellaneous improvements

meshopt_decodeVertexBuffer now automatically enables SSSE3 SIMD implementation for clang/gcc using __cpuid-based runtime detection without the need to use extra compile flags
meshopt_encodeVertexBuffer now works correctly on empty inputs (count = 0)
CMake scripts now support CMake versions older than 3.7
CMake options are now prefixed with MESHOPT_ (note: this breaks shared library builds, fixed in #129)

This release has several new algorithms, SIMD improvements for vertex codec and a lot of gltfpack changes including Basis support.

New algorithms

meshopt_simplifyPoints can be used to simplify point clouds. The algorithm is a variant of sloppy simplifier, which means it's fast and not attribute-aware (for now).
meshopt_spatialSortRemap and meshopt_spatialSortTriangles can be used to reorder vertices or triangles to increase spatial locality. This is helpful when working with point clouds and triangle meshes with redundant connectivity, and can improve clusterization results.
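
One common way to increase spatial locality is to sort points along a Morton (Z-order) curve. The sketch below illustrates that idea under stated assumptions; it is not claimed to match the library's implementation, and Point, part1By2 and spatialOrder are hypothetical names. Unlike a remap table, the result lists the original point indices in their new order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

struct Point { float x, y, z; };

// Spreads the low 10 bits of v so that two zero bits separate each original bit.
static uint32_t part1By2(uint32_t v)
{
    v &= 0x000003ff;
    v = (v ^ (v << 16)) & 0xff0000ff;
    v = (v ^ (v << 8)) & 0x0300f00f;
    v = (v ^ (v << 4)) & 0x030c30c3;
    v = (v ^ (v << 2)) & 0x09249249;
    return v;
}

// Returns point indices sorted by 30-bit Morton code of their quantized positions.
std::vector<unsigned int> spatialOrder(const std::vector<Point>& points)
{
    std::vector<unsigned int> order(points.size());
    std::iota(order.begin(), order.end(), 0u);
    if (points.empty())
        return order;

    // Bounding box of the input.
    float lo[3] = {points[0].x, points[0].y, points[0].z};
    float hi[3] = {lo[0], lo[1], lo[2]};
    for (const Point& p : points)
    {
        const float v[3] = {p.x, p.y, p.z};
        for (int k = 0; k < 3; ++k)
        {
            lo[k] = std::min(lo[k], v[k]);
            hi[k] = std::max(hi[k], v[k]);
        }
    }

    // Uniform scale that maps the largest extent to the 10-bit range.
    float extent = std::max(hi[0] - lo[0], std::max(hi[1] - lo[1], hi[2] - lo[2]));
    float scale = extent > 0.f ? 1023.f / extent : 0.f;

    std::vector<uint32_t> codes(points.size());
    for (size_t i = 0; i < points.size(); ++i)
    {
        uint32_t qx = uint32_t((points[i].x - lo[0]) * scale + 0.5f);
        uint32_t qy = uint32_t((points[i].y - lo[1]) * scale + 0.5f);
        uint32_t qz = uint32_t((points[i].z - lo[2]) * scale + 0.5f);
        codes[i] = part1By2(qx) | (part1By2(qy) << 1) | (part1By2(qz) << 2);
    }

    std::stable_sort(order.begin(), order.end(),
        [&](unsigned int a, unsigned int b) { return codes[a] < codes[b]; });

    assert(order.size() == points.size());
    return order;
}
```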

Performance improvements

meshopt_decodeVertexBuffer now has an experimental AVX512 implementation, which is ~10% faster than SSSE3 implementation (it uses 128b vectors and as such carries no extra power cost). It requires AVX512-VBMI2 and AVX512-VL (available on Ice Lake CPUs).
meshopt_decodeVertexBuffer now has an experimental WebAssembly SIMD implementation, which is ~3x faster than scalar implementation. It requires a compatible WebAssembly implementation with SIMD enabled (Chrome Canary was used for testing).
WebAssembly decoders are now compiled using upstream Emscripten compiler backend, which results in ~5% faster decoding across the board.

Miscellaneous improvements

All allocations now use allocation callbacks that can be set through meshopt_setAllocator; previously, allocations from meshopt_IndexAdapter were using global operator new/delete.
CMake build system now supports BUILD_SHARED_LIBS
CMake build system now can install gltfpack and libmeshoptimizer upon request

gltfpack highlights

This change includes a lot of work on extension specification. As a result, MESHOPT_quantized_geometry extension that was being used before got replaced with a new KHR_mesh_quantization extension (extension PR), and the details of MESHOPT_compression extension have changed substantially to allow for fallback data (extension PR), requiring updates to glTF loaders. Both three.js (r111) and Babylon.JS (4.1) can be used to load these files, with a custom demo/GLTFLoader.js for three.js and an extension demo/babylon.MESHOPT_compression.js for Babylon.JS.

As a result, gltfpack-produced files now validate cleanly with the most recent glTF validator build (2.0.0-dev.3.0 (November 2019)).

gltfpack also now supports Basis Universal texture supercompression. Encoding files with these textures requires basisu executable which can be built from the official repository. Two container format options are provided:

.basis - native container format for Basis; this is supported by three.js and Babylon.JS today, but is likely to be removed in the future because this is not compatible with glTF specification
.ktx - KTX2 container format from Khronos that supports Basis supercompression; this is not supported by any renderer at the time of this writing, but this is the route that is being specified (spec PR).

In addition, there were a lot of changes aimed at increasing efficiency and extending feature support, with the full list below.

gltfpack improvements

Switch from MESHOPT_quantized_geometry to KHR_mesh_quantization
gltfpack-produced files now validate cleanly with the most recent build of glTF validator (PR)
Update MESHOPT_compression specification, requires updating JSON loaders (GLTFLoader.js)
Implement support for arbitrary number of input bone influences (largest 4 weights are preserved)
Implement degenerate triangle filtering (5% triangle/size savings on some models)
Use 8-bit morph target deltas when possible (depending on the model, up to 2x memory savings, ~3% size savings); requires three.js r111 to work correctly
Add -cf command line option to support compressed data fallback; files produced with this option don't require MESHOPT_compression extension, but loaders that support it will not need to load uncompressed data
Add -si R and -sa flags that simplify the meshes using default/aggressive (sloppy) simplification
By default, gltfpack now produces normalized normals/tangents; this results in larger but specification-compliant files. This will be improved later; for now you can use -vu to get better compression by using unnormalized normals/tangents.
Implement support for Basis / KTX2 compression (-tb to compress textures using basisu into .basis container; -tc to compress textures using KTX2 container which requires extra extensions and isn't supported by renderers yet)
Implement support for embedding texture files into buffers (-te flag)
Implement support for point clouds
Improve animation compression efficiency for translation/scale data by reducing output precision slightly.
Improve efficiency of bone influence encoding (~1% size savings)
A few correctness fixes, including non-uniform scale handling and quantized color/weight data parsing
Morph target names are now preserved using extras.targetNames JSON array
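
As an illustration of keeping the largest 4 weights out of an arbitrary number of bone influences, a reduction step might look like the sketch below. Influence and keepLargestInfluences are hypothetical names, and renormalizing the surviving weights so they sum to 1 is an assumption made here, not a documented detail of gltfpack.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Influence
{
    unsigned int joint;
    float weight;
};

// Keeps the max_count heaviest influences and rescales them to sum to 1.
std::vector<Influence> keepLargestInfluences(std::vector<Influence> influences, size_t max_count = 4)
{
    assert(max_count > 0);

    // Heaviest first; stable sort keeps the input order among equal weights.
    std::stable_sort(influences.begin(), influences.end(),
        [](const Influence& a, const Influence& b) { return a.weight > b.weight; });

    if (influences.size() > max_count)
        influences.resize(max_count);

    float sum = 0.f;
    for (const Influence& i : influences)
        sum += i.weight;

    // Assumption: renormalize so the remaining weights still sum to 1.
    if (sum > 0.f)
        for (Influence& i : influences)
            i.weight /= sum;

    return influences;
}
```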

This release contains a few improvements for various algorithms, introduces support for triangle strips with degenerate triangles and adds gltfpack (alpha).

Interface changes:

meshopt_stripify and meshopt_unstripify now require an extra argument, restart_index

Improvements:

Improve meshopt_simplifySloppy performance by up to 10% by using three-point interpolation search
Improve results of meshopt_optimizeVertexCache by up to 0.5% by using a new data set obtained with differential evolution
meshopt_stripify now supports stitching strips using degenerate triangles instead of restart indices; this typically results in a 10% larger index buffer compared to restart indices, but on some GPUs it can be substantially faster to render
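
To make the two stitching styles concrete, the sketch below converts an indexed triangle strip back to a triangle list. It is an illustration rather than the library's meshopt_unstripify: stripToTriangles is a hypothetical helper that starts a new strip at every restart index and drops degenerate triangles, so it handles both styles.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Converts a triangle strip to a triangle list, honoring restart indices
// and skipping degenerate triangles used to stitch strips together.
std::vector<unsigned int> stripToTriangles(const std::vector<unsigned int>& strip, unsigned int restart_index)
{
    std::vector<unsigned int> triangles;
    size_t start = 0; // position where the current strip began

    for (size_t i = 0; i < strip.size(); ++i)
    {
        if (strip[i] == restart_index)
        {
            start = i + 1;
            continue;
        }

        // A triangle needs three indices since the last restart.
        if (i - start < 2)
            continue;

        unsigned int a = strip[i - 2], b = strip[i - 1], c = strip[i];

        // Every other triangle in a strip has flipped winding; swap to restore it.
        if ((i - start) & 1)
            std::swap(a, b);

        // Degenerate triangles have no area and only serve as stitching.
        if (a != b && a != c && b != c)
        {
            triangles.push_back(a);
            triangles.push_back(b);
            triangles.push_back(c);
        }
    }

    assert(triangles.size() % 3 == 0);
    return triangles;
}
```

In this example the strip 0 1 2 3 followed by 4 5 6 takes 8 indices with one restart index and 9 indices when stitched as 0 1 2 3 3 4 4 5 6, which shows where the size overhead of degenerate stitching comes from.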

gltfpack:

This release introduces an alpha version of gltfpack. gltfpack is a command-line tool that converts .obj or .gltf files to glTF files that are optimized for render performance and transmission time. gltfpack merges meshes and materials to reduce draw call count, merges buffers to reduce draw setup cost, quantizes vertex attributes to reduce GPU memory footprint, optimizes vertex and index data for more efficient GPU rendering, resamples and quantizes animation data to reduce memory footprint, and can optionally compress the vertex/index/animation buffers in the output using meshoptimizer codecs to further reduce the file size.

The resulting files rely on two not-yet-standardized extensions; when compression is not used, the resulting files can be loaded using three.js (r107+) and Babylon.js (4.1+) glTF loaders. Loading compressed files requires integrating JavaScript decoders (js/meshopt_decoder.js); demo/GLTFLoader.js contains a custom version of three.js loader that can be used to load them.

This release contains a few improvements for simplifier, introduces a new simplification algorithm, adds support for custom allocators and improves performance and code size of JavaScript decoders.

Interface changes:

meshopt_computeMeshletBounds now passes meshlet parameter by pointer instead of by value.

New algorithms:

Introduce a new simplification algorithm, meshopt_simplifySloppy, that performs decimation without concerns for topological integrity. The algorithm can and will merge small disjoint features together, and is extremely fast at ~20M triangles/sec on large meshes on modern desktop CPUs.
Memory allocation can now be configured to use custom allocation callbacks using meshopt_setAllocator.

Improvements:

Default simplifier now uses normalized error metric, which makes it much easier to consistently configure target_error parameter - it now corresponds to linear error, normalized to mesh radius (0.01 means 1% deviation).
Fix edge cases when default simplifier could run many passes in vain, resulting in poor performance.
Improve JavaScript decoder performance: vertex decoding is 17% faster, index decoding is 1.7x faster.
Improve JavaScript decoder size: decoder.js is now 2.4x smaller (3.5 KB after gzip)
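
The normalized error metric can be related to world-space units with a small conversion. The sketch below assumes mesh size is measured as the largest bounding-box extent; the library may use a different measure (the note above speaks of a radius), and meshScale, toAbsoluteError and toRelativeError are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Assumed measure of mesh size: the largest extent of the bounding box.
float meshScale(const std::vector<Vec3>& positions)
{
    assert(!positions.empty());

    Vec3 lo = positions[0], hi = positions[0];
    for (const Vec3& p : positions)
    {
        lo.x = std::min(lo.x, p.x); hi.x = std::max(hi.x, p.x);
        lo.y = std::min(lo.y, p.y); hi.y = std::max(hi.y, p.y);
        lo.z = std::min(lo.z, p.z); hi.z = std::max(hi.z, p.z);
    }

    return std::max({hi.x - lo.x, hi.y - lo.y, hi.z - lo.z});
}

// Relative error (0.01 means 1%) to world-space units.
float toAbsoluteError(float relative_error, float scale)
{
    return relative_error * scale;
}

// World-space error to the normalized form expected as a target.
float toRelativeError(float absolute_error, float scale)
{
    assert(scale > 0.f);
    return absolute_error / scale;
}
```

Under this assumption, a target of 0.01 on a mesh that spans 200 units along its longest axis allows a deviation of about 2 units.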

Compatibility:

Fix gcc -Wshadow warnings
Work around a bug in Edge ChakraCore compiler that could result in indices being incorrectly decoded with decoder.js.

This release contains a number of fixes and improvements for vertex codec, substantially improves performance of several algorithms in Debug builds and introduces support for decompressing vertex/index data from JavaScript.

New algorithms:

Introduce an experimental algorithm, meshopt_generateVertexRemapMulti, that generates the same remap table as meshopt_generateVertexRemap for indexing a mesh, but supports vertex data stored as multiple independent streams (deinterleaved)
Introduce an experimental algorithm, meshopt_generateShadowIndexBufferMulti, that can generate a second index buffer that shares the vertex data with the original index buffer, but supports vertex data stored as multiple independent streams (deinterleaved)

Improvements:

Optimize NEON code in meshopt_decodeVertexBuffer, making it 1-2% faster
Improve compatibility of SIMD code in meshopt_decodeVertexBuffer, fixing compilation issues on ARM64, MSVC ARM, and clang for Windows
Fix a bug in meshopt_encodeVertexBuffer that resulted in incorrectly encoded data on platforms where char is unsigned (this mostly affected ARM hosts such as Android)
Substantially improve performance of multiple algorithms in Debug:
- meshopt_analyzeVertexCache is 6x faster
- meshopt_optimizeVertexCache is 4.7x faster
- meshopt_analyzeOverdraw is 3.9x faster
- meshopt_optimizeOverdraw is 1.4x faster
- meshopt_simplify is 1.3x faster

JavaScript support:

Introduce js/decoder.js that contains a WebAssembly version of vertex and index decoders with a JavaScript-friendly interface. The decoders run at 200-400 MB/s on modern desktop CPUs.
Introduce tools/OptMeshLoader.js that contains an example mesh loader for THREE.js that uses vertex/index codecs for compression and quantizes vertex data for efficient storage; the meshes for this loader can be produced by tools/meshencoder.cpp using .OBJ files as an input.

This release substantially improves mesh simplification and introduces experimental algorithms for advanced GPU mesh rendering (cone culling, meshlet construction). The library can also now be used from Rust via https://crates.io/crates/meshopt.

Interface changes:

meshopt_simplify has an extra argument, target_error, that can be used to limit the geometric error introduced by the simplifier

New algorithms:

Introduce an experimental algorithm, meshopt_buildMeshlets, that can create meshlet data from index buffer that can be used to efficiently drive the mesh shading pipeline in NVidia RTX GPUs
Introduce experimental algorithms, meshopt_computeClusterBounds and meshopt_computeMeshletBounds, that can compute bounding sphere and bounding normal cone for use in GPU cluster culling.
Introduce an experimental algorithm, meshopt_generateShadowIndexBuffer, that can generate a second index buffer that shares the vertex data with the original index buffer, but is more efficient when a subset of vertex attributes is needed.
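
A bounding normal cone is typically used to reject clusters whose triangles all face away from the camera. The sketch below shows one commonly used form of that test; ClusterCone and its fields are hypothetical and only loosely modeled on the kind of data such a bounds function returns, so treat the names and the exact inequality as assumptions.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical cone description: all triangle normals in the cluster lie
// within a cone around `axis`, and `cutoff` encodes its angular width.
struct ClusterCone
{
    Vec3 apex;
    Vec3 axis; // assumed to be normalized
    float cutoff;
};

// Returns true when every triangle in the cluster faces away from the camera.
bool isClusterBackfacing(const ClusterCone& cone, Vec3 camera)
{
    assert(cone.cutoff <= 1.f);

    Vec3 d = {cone.apex.x - camera.x, cone.apex.y - camera.y, cone.apex.z - camera.z};
    float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    if (len == 0.f)
        return false; // camera sits on the apex; keep the cluster

    // Same as dot(normalize(d), axis) >= cutoff, without the division.
    float dp = d.x * cone.axis.x + d.y * cone.axis.y + d.z * cone.axis.z;
    return dp >= cone.cutoff * len;
}
```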

Improvements:

Significantly rework meshopt_simplify to improve simplification quality, including error metric improvements, attribute-guided collapse that preserves UV seam structure better, and other tweaks
Significantly rework and optimize meshopt_simplify, making it ~4x faster
Optimize meshopt_generateVertexRemap, making it 1.25x faster
Optimize meshopt_decodeVertexBuffer for platforms without SIMD support, making it 1.1x faster
Fix undefined behavior (left shift of negative integer) in meshopt_encodeVertexBuffer

This release introduces vertex buffer encoder and a stable version of index buffer encoder.

New algorithms:

Introduce vertex encoder that compresses vertex buffers; it can be invoked using meshopt_encodeVertexBuffer and meshopt_decodeVertexBuffer. The algorithm typically provides 1.5-2x compression ratio for quantized vertex data, and the resulting data can be compressed further by a general purpose compressor like zstd. Decoding is highly optimized using SSSE3/NEON and runs at 2 GB/s on a modern desktop CPU.
Introduce a stable index encoder that compresses index buffers; it can be invoked using meshopt_encodeIndexBuffer and meshopt_decodeIndexBuffer. The algorithm typically encodes index buffers using ~3-4 bits per index, and the resulting data can be compressed further by a general purpose compressor like zstd, yielding ~2-3 bits per index for most meshes. Decoding is highly optimized and runs at 2 GB/s on a modern desktop CPU for 32-bit indices (1 GB/s for 16-bit indices).
Introduce a new algorithm to optimize for vertex fetch, meshopt_optimizeVertexFetchRemap; it generates a remap table that can be used with meshopt_remapVertexBuffer/meshopt_remapIndexBuffer and helps optimizing meshes with several vertex streams.

Improvements:

Optimize cluster sorting in meshopt_optimizeOverdraw, making the function 10% faster
Optimize index decoder, making it 15% faster for 32-bit indices and 40% faster for 16-bit indices
Fix meshopt_analyzeVertexCache and meshopt_analyzeVertexFetch results for sparse vertex buffers (with unused vertices)
Support in-place optimization in meshopt_remapVertexBuffer
Improve CMake build files to make the library easier to integrate

This release has large interface changes and introduces several new algorithms and tweaks to existing algorithms.

Interface:

All C++ function wrappers have been moved out of meshopt namespace and gained meshopt_ prefix to simplify documentation & interface
All structs used by the interface have been renamed and now also have meshopt_ prefix to avoid name conflicts
meshopt_quantizeX functions now use function arguments instead of template parameters for better compatibility
cache_size argument has been removed from meshopt_optimizeVertexCache and meshopt_optimizeOverdraw; to perform optimization for a FIFO cache of a fixed size, use meshopt_optimizeVertexCacheFifo
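
To make the notion of a fixed-size FIFO cache concrete, the sketch below simulates one and reports the average cache miss ratio (ACMR, misses per triangle). It is a simplified model for illustration, not the library's analyzer, and computeAcmrFifo is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Simulates a FIFO vertex cache and returns transformed vertices per triangle.
double computeAcmrFifo(const std::vector<unsigned int>& indices, size_t cache_size)
{
    assert(indices.size() % 3 == 0);
    assert(cache_size > 0);

    if (indices.empty())
        return 0.0;

    std::deque<unsigned int> cache;
    size_t misses = 0;

    for (unsigned int index : indices)
    {
        // A hit leaves a FIFO cache unchanged (unlike LRU, nothing is reordered).
        if (std::find(cache.begin(), cache.end(), index) != cache.end())
            continue;

        ++misses;
        cache.push_back(index);

        // Evict the oldest entry once the cache is over capacity.
        if (cache.size() > cache_size)
            cache.pop_front();
    }

    return double(misses) / double(indices.size() / 3);
}
```

A value of 3.0 means no vertex was ever reused from the cache, while values near 0.5 are the theoretical best for large regular meshes.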

New algorithms:

Introduce an algorithm that compresses index buffers; it can be invoked using meshopt_encodeIndexBuffer and meshopt_decodeIndexBuffer. The algorithm typically encodes index buffers using ~3-4 bits per index, and the resulting data can be compressed further by a general purpose compressor like zstd, yielding ~2-3 bits per index for most meshes.
Introduce an algorithm that can convert an index buffer to a triangle strip that is still reasonably cache efficient; indexed triangle strips are faster to render on some hardware and can reduce the index buffer size. The algorithm can be invoked using meshopt_stripify and typically produces buffers with around 60-65% indices compared to triangle lists, and a 5-10% ACMR penalty on GPUs with small caches.
Introduce a new quantization function, meshopt_quantizeFloat, that can reduce the precision of a floating-point number while keeping the floating-point representation. This can be useful to generate vertex data that can be compressed more effectively using a general purpose compression algorithm.
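
The idea of reducing precision while keeping the floating-point representation can be sketched as rounding away low mantissa bits. The helper below is a minimal illustration under that assumption; quantizeMantissa is a hypothetical name and the code is not claimed to match meshopt_quantizeFloat bit for bit (for instance, it does nothing special for denormals).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Rounds a 32-bit float to `bits` explicit mantissa bits (1..23);
// the result is still an ordinary float with zeroed low mantissa bits.
float quantizeMantissa(float value, int bits)
{
    assert(bits >= 1 && bits <= 23);

    uint32_t ui;
    std::memcpy(&ui, &value, sizeof(ui));

    const uint32_t mask = (1u << (23 - bits)) - 1;
    const uint32_t round = (1u << (23 - bits)) >> 1;
    const uint32_t exponent = ui & 0x7f800000u;

    // Leave infinities and NaNs untouched so rounding cannot corrupt them.
    if (exponent != 0x7f800000u)
        ui = (ui + round) & ~mask;

    std::memcpy(&value, &ui, sizeof(value));
    return value;
}
```

The zeroed trailing bits are what a general purpose compressor can exploit: many vertices end up sharing identical low bytes.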

Improvements:

Overdraw analyzer (meshopt_analyzeOverdraw) now uses a pixel center fill convention to match hardware rendering more closely.
Vertex cache analyzer (meshopt_analyzeVertexCache) now models cache that matches real hardware a bit more closely, and requires additional parameters to configure (namely, primitive group size and warp/wavefront size).
Vertex cache optimizer (meshopt_optimizeVertexCache) has been tuned to generate better output that performs well on real hardware, especially given meshes that have topology similar to that of a uniform grid as an input.
Various algorithms have been optimized for performance and memory consumption.
