a few more thoughts - depending on what other gear you have - a poly synth for example...

I would swap the chainsaw for a general cv - it's a eurorack implementation of general midi - so has loads of samples of real instruments - plus it can be expanded to be used as a polyphonic cv-> midi converter (possibly the best one out there for interfacing sinfonion with external poly synths) - & general cv will also double up as a percussion source!

agree with farkas - get vcv rack (although personally I dislike it) and consider a dc-coupled audio interface to interface the modular with it - these can be had for cheap - used RME interfaces, or a bit more expensive and in-rack, the es8/9

are you thinking of using the bitbox for drums? if not how are you thinking of using the metronome to 'add a cool drum transition at the end'?

building a mixer from scratch is a real pita - seriously check out the tex-mix - I really like mine!

I have something similar in terms of generative (marbles) and programmable (step fader, BSP - & thinking hard about a black sequencer) feeding a sinfonion (plus a pico seq - which I'm not at all keen on - for swapping song parts) for sequencing and maybe a couple more voices than necessary

you can check out my rack
ModularGrid Rack
which is something similar to what you are trying to do (NB left most section is mostly video synthesizer - some of the doepfer modules are used for both though) although this is slightly out of date - it's now spread over 8 cases not 6 as depicted

I agree with @farkas about going back to the drawing board too... step back as far as possible and start with what your requirements are - how many voices and what type - how do you sequence that, do you want programmatic or random or both - do you need percussion - much better in terms of price/features to go outside the box...

I'm looking forward to @Lugia's take - they are often very interesting!

"some of the best base-level info to remember can be found in Jim's sigfile" @Lugia

Utility modules are the dull polish that makes the shiny modules actually shine!!!

sound sources < sound modifiers < modulation sources < utilities