Have you read Part 1? As a recap, this series is on using SIMD (specifically, 128-bit SSE4 float vectors) to optimize batch normalization of vectors. While the end result may not be the most super-useful-awesomest-thing-in-the-world, it is
Part 1: Vector3 Batch Normalization – FPU vs SIMD
Recently, I wanted to see how fast I could make a function that normalizes every vector in an array. The goal was to take “natural” data and process it as fast as possible without putting any undue restrictions on the