Software rendering

I need help with software rendering in general. Are there any recommended documents to look at? And for the Raspberry Pi, should I code for NEON or VFP? VFP supports the old Pis, but using VFP means assembly, and I am not comfortable with assembly. Any help would be appreciated.
Help with 2D rendering would be the most useful, because 3D is hard and I don't know much maths.

Edited by Shazan Shums on
I don't know if it's what you're looking for, but this may help: How OpenGL works: software renderer in 500 lines of code
You should code for NEON, not VFP. Of course that means the RPi 1 and Zero will not be supported, because NEON is not available there.

Why? Because "vector" floating point offers no big advantage. While VFP lets you operate on registers that contain multiple floats, it is actually implemented to process them sequentially. So there is no performance gain over simply doing scalar operations. For real SIMD operations, where one instruction executes over multiple data faster than doing the same operation multiple times, you need to use NEON.

As a fallback, just run a plain C version of the code to support the RPi 1 and Zero. If you need floats, use regular floats in C and let the compiler optimize; it will use VFP instructions whenever it can.

For NEON, don't go into assembly; use intrinsics, the same as HH does with SSE2. While NEON intrinsics can't emit every possible NEON assembly instruction, they are good enough, especially if you use the clang compiler (I find it generates better ARM asm than gcc). Only when you have profiled the complete code and determined that some function that uses NEON is a bottleneck should you go into asm.
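To make the intrinsics-plus-fallback idea concrete, here is a minimal sketch, not a definitive implementation. The helper name `brighten_u8` and the `HAVE_NEON` macro are made up for illustration. It assumes the compiler defines `__ARM_NEON` (or the older `__ARM_NEON__`) when NEON is enabled, which on 32-bit ARM with gcc or clang usually means building with something like `-mfpu=neon`. Without NEON the same function compiles to plain C, which is what an RPi 1 / Zero build would get.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the compiler defines one of these macros when NEON is enabled. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical macro, used only in this sketch */
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: add `amount` to `count` 8-bit channel values,
 * clamping each result at 255 instead of wrapping around. */
static void brighten_u8(uint8_t *pixels, size_t count, uint8_t amount)
{
    assert(pixels != NULL || count == 0);
    size_t i = 0;

#if HAVE_NEON
    /* NEON path: 16 channel values per iteration. */
    uint8x16_t add = vdupq_n_u8(amount);       /* broadcast amount to 16 lanes */
    for (; i + 16 <= count; i += 16) {
        uint8x16_t v = vld1q_u8(pixels + i);   /* load 16 bytes */
        v = vqaddq_u8(v, add);                 /* saturating add */
        vst1q_u8(pixels + i, v);               /* store 16 bytes */
    }
#endif

    /* Plain C path: handles the tail after the NEON loop, or the whole
     * buffer when NEON is not available. */
    for (; i < count; ++i) {
        unsigned sum = (unsigned)pixels[i] + amount;
        pixels[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Calling `brighten_u8(framebuffer, width * height * 4, 50)` would brighten every channel of a 32-bit framebuffer, alpha included. Note that the choice here is made at compile time, so supporting both old and new Pis this way means two builds; picking the path at runtime is a separate problem.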

Another option for the Raspberry Pi is to use fixed point / integer math for rendering. On older models it may give you much better performance than float math. For simple 2D stuff it might be good enough. NEON has support for integer math too.
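For anyone who hasn't seen fixed point before, here is a rough sketch of the idea using a 16.16 format (16 integer bits, 16 fractional bits in an `int32_t`). All the names (`fix16`, `fix16_mul`, `gradient_row` and so on) are made up for this example, and the 16.16 split is just one common choice, not something the Pi requires. The row fill shows the typical 2D use: compute a per-pixel step once, then only do integer adds in the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t fix16;                  /* 16.16 fixed point (hypothetical type) */
#define FIX16_ONE ((fix16)65536)        /* 1.0 in 16.16 */

static fix16 fix16_from_int(int v) { return (fix16)(v * 65536); }

/* Drops the fractional bits. Only used on non-negative values here;
 * right-shifting a negative value is implementation-defined in C. */
static int fix16_to_int(fix16 v) { return (int)(v >> 16); }

/* Multiply in 64 bits so the intermediate product cannot overflow. */
static fix16 fix16_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) >> 16);
}

static fix16 fix16_div(fix16 a, fix16 b)
{
    assert(b != 0);
    return (fix16)(((int64_t)a * 65536) / b);
}

/* Fill one row of 8-bit pixels with a linear gradient from `from` to `to`. */
static void gradient_row(uint8_t *row, int width, uint8_t from, uint8_t to)
{
    assert(row != NULL && width > 0);
    fix16 value = fix16_from_int(from);
    fix16 step = 0;
    if (width > 1) {
        step = fix16_div(fix16_from_int((int)to - (int)from),
                         fix16_from_int(width - 1));
    }
    for (int x = 0; x < width; ++x) {
        /* Add 0.5 before truncating so the result rounds to nearest. */
        row[x] = (uint8_t)fix16_to_int(value + FIX16_ONE / 2);
        value += step;                  /* integer add only, no floats */
    }
}
```

The same step-and-accumulate pattern works for interpolating texture coordinates or colors along a scanline, which covers a lot of simple 2D work.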

Another consideration is that the Raspberry Pi 3 can actually run in 64-bit mode (just like x86 can run in x86_64 mode), and NEON instructions are a bit different there. While the regular add/mul/sub instructions work similarly, they are encoded differently, and there are additional instructions that you can benefit from. So writing asm directly will be more painful, because you'll need to maintain completely different code bases. Unfortunately there is currently no official support from the Linux kernel, so it runs only in 32-bit mode. But people are working on it, and I expect that very soon we'll be able to officially run 64-bit code on the RPi 3.
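As an illustration of why intrinsics soften the 32-bit / 64-bit split, here is a small sketch that sums an array of 32-bit integers. The function name `sum_u32` and the `HAVE_NEON` macro are invented for this example. The vector loop is identical in both modes; only the final horizontal add differs, because `vaddvq_u32` is one of the extra operations available only in 64-bit (AArch64) mode, while 32-bit code has to reduce the lanes with pairwise adds. It assumes the compiler defines `__aarch64__` for 64-bit ARM builds, which gcc and clang do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the compiler defines one of these macros when NEON is enabled. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical macro, used only in this sketch */
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: sum `count` values (wraps on overflow, like
 * ordinary unsigned arithmetic). */
static uint32_t sum_u32(const uint32_t *values, size_t count)
{
    assert(values != NULL || count == 0);
    uint32_t total = 0;
    size_t i = 0;

#if HAVE_NEON
    /* Shared by 32-bit and 64-bit builds: four lanes per iteration. */
    uint32x4_t acc = vdupq_n_u32(0);
    for (; i + 4 <= count; i += 4) {
        acc = vaddq_u32(acc, vld1q_u32(values + i));
    }
#if defined(__aarch64__)
    /* 64-bit only: a single across-lanes add. */
    total = vaddvq_u32(acc);
#else
    /* 32-bit: fold the four lanes down with two adds. */
    uint32x2_t pair = vadd_u32(vget_low_u32(acc), vget_high_u32(acc));
    pair = vpadd_u32(pair, pair);
    total = vget_lane_u32(pair, 0);
#endif
#endif

    /* Plain C: the tail, or everything when NEON is not available. */
    for (; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

With hand-written asm the two branches would be two separate source files; here the difference stays contained in a few lines behind one `#if`.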