Some would say that building a subtractive monophonic synth with modular gear almost misses the point. I don’t necessarily agree, but it raises the central and FIRST question you should ask: what do you want your modular to do? No one can tell you that; it is for you to figure out, and the answer largely determines which build you should explore.
I think the extensive modulation options for pretty much every parameter in many modules offer unique sonic possibilities, even in a subtractive synth. In addition to the excellent points that Lugia made, consider that many well-revered standard hardware synths use multiple envelopes and LFOs per voice (at a minimum for pitch, volume, and filter), and that some of those envelopes have five or six stages (delay, attack, hold, decay, sustain, release). A module with four multi-stage envelopes is ideal, and XAOC Zadar reigns supreme here. The same goes for LFOs: get them in fours (XAOC Batumi) so you can add tremolo and movement to pitch, volume, PHASE, PWM, etc.
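To make those stages concrete, here is a minimal sketch of a six-stage (DAHDSR) envelope with straight-line segments. The class and parameter names are hypothetical, times are in seconds, and real modules such as Zadar offer their own curve shapes; this only shows how the stages follow one another while a gate is held and after it is released.

```python
from dataclasses import dataclass


@dataclass
class DAHDSR:
    """Hypothetical six-stage envelope with linear segments (times in seconds, levels 0..1)."""
    delay: float
    attack: float
    hold: float
    decay: float
    sustain: float  # a level, not a time
    release: float

    def level(self, t: float, gate_len: float) -> float:
        """Envelope output at time t for a gate held high for gate_len seconds."""
        if t < gate_len:
            return self._gated_level(t)
        # Release: ramp linearly from whatever level we reached at gate-off down to 0.
        start = self._gated_level(gate_len)
        if self.release <= 0:
            return 0.0
        return max(0.0, start * (1.0 - (t - gate_len) / self.release))

    def _gated_level(self, t: float) -> float:
        """Level while the gate is high: delay, attack, hold, decay, then sustain."""
        if t < self.delay:
            return 0.0
        t -= self.delay
        if t < self.attack:
            return t / self.attack
        t -= self.attack
        if t < self.hold:
            return 1.0
        t -= self.hold
        if t < self.decay:
            return 1.0 - (1.0 - self.sustain) * (t / self.decay)
        return self.sustain


env = DAHDSR(delay=0.1, attack=0.05, hold=0.2, decay=0.3, sustain=0.5, release=0.4)
for t in (0.05, 0.125, 0.25, 0.5, 1.0, 2.2, 3.0):
    print(t, round(env.level(t, gate_len=2.0), 3))
```

Because the release starts from the level reached at gate-off, a short gate that ends mid-attack releases from that partial level rather than jumping to the sustain level.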
That said, I personally would consider Zadar and Batumi a must for any monophonic subtractive build. I’d also second the SVF comment, although I would advise also including one of the really interesting multi-mode filters, such as Three Sisters, QPAS, or Belgrade, as they REALLY open up sound sculpting.
Lastly, reverb and delay are must-have effects, followed by chorus and phaser. They are classics for a reason. This is also where things get interesting, because you can do neat things by modulating those parameters with a sequencer. At a minimum, add a decent multi-FX module (Black Hole DSP2) and Pamela’s New Workout. Sending a Euclidean-sequenced saw to the reverb size, for example, can make for some interesting effects.
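A Euclidean rhythm spreads k hits as evenly as possible across n steps. The sketch below uses a simple Bresenham-style formula, which yields the same patterns as Bjorklund’s algorithm up to a rotation, and then derives a ramp that resets on every hit as a rough stand-in for a retriggered saw you might send to a reverb’s size parameter. The function names are hypothetical, and this is not a description of how Pamela’s New Workout works internally.

```python
def euclid(k: int, n: int, rotate: int = 0) -> list[bool]:
    """Spread k hits as evenly as possible over n steps (Bresenham-style)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    pattern = [(i * k) % n < k for i in range(n)]
    r = rotate % n
    return pattern[r:] + pattern[:r]


def retriggered_ramp(pattern: list[bool]) -> list[float]:
    """Hypothetical saw-like CV: resets to 0 on each hit, rises by 1/n per step otherwise."""
    step = 1.0 / len(pattern)
    level, out = 0.0, []
    for hit in pattern:
        level = 0.0 if hit else min(1.0, level + step)
        out.append(level)
    return out


tresillo = euclid(3, 8)
print("".join("x" if hit else "." for hit in tresillo))  # x..x..x.
print(retriggered_ramp(tresillo))
```

Changing `k`, `n`, or `rotate` reshapes the ramp, so the reverb size swells and collapses in step with the rhythm.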
Now for the interesting and unique:
Sample-based sound engines: 4ms Stereo Triggered Sampler and Make Noise Morphagene. The former is better suited to pre-sliced samples (although you can experiment with the start and length parameters), while the latter is just awesome for all sorts of sonic mayhem involving variable playback speed, slicing (including triggered slicing), and crossfading. Both are excellent choices, and both can do granular synthesis, although the Morphagene, with its dedicated CV control, has a bit of an edge here.
Additive synthesis:
Audiospektri HG-16 is high on my want list for this purpose, although I could see some of the FPGA-based technology really taking hold here. XAOC Odessa should be interesting, but I’m really hoping for a true FFT-based (Fast Fourier Transform) engine at some point, with full control over the resulting harmonics. Any sound source could go in, as with a sampler, but it would actually be re-created using additive synthesis.
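Roughly, an FFT-based additive engine would analyze a frame of audio into partials (frequency bin, amplitude, phase), let you reshape them, and rebuild the sound as a sum of sine oscillators. The sketch below uses a naive DFT from the standard library in place of an FFT (same result, just slower) and assumes a single frame whose partials sit exactly on bin frequencies; a real engine would use windowed, overlapping frames. All function names and the per-bin `gains` parameter are hypothetical.

```python
import cmath
import math
from typing import Optional


def dft(frame: list[float]) -> list[complex]:
    """Naive discrete Fourier transform (an FFT computes the same thing faster)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def analyze(frame: list[float], max_partials: int) -> list[tuple[int, float, float]]:
    """Return the strongest (bin, amplitude, phase) partials below Nyquist."""
    n = len(frame)
    spectrum = dft(frame)
    partials = [
        (k, 2 * abs(spectrum[k]) / n, cmath.phase(spectrum[k]))
        for k in range(1, n // 2)  # skip DC and Nyquist for simplicity
    ]
    partials.sort(key=lambda p: p[1], reverse=True)
    return partials[:max_partials]


def resynthesize(partials, n: int, gains: Optional[dict[int, float]] = None) -> list[float]:
    """Additive resynthesis: one cosine oscillator per partial, with an optional gain per bin."""
    gains = gains if gains is not None else {}
    return [
        sum(
            gains.get(k, 1.0) * amp * math.cos(2 * math.pi * k * t / n + phase)
            for k, amp, phase in partials
        )
        for t in range(n)
    ]


N = 64
frame = [0.8 * math.cos(2 * math.pi * 3 * t / N) + 0.3 * math.sin(2 * math.pi * 7 * t / N)
         for t in range(N)]
partials = analyze(frame, max_partials=2)
print([(k, round(a, 3)) for k, a, _ in partials])  # [(3, 0.8), (7, 0.3)]
rebuilt = resynthesize(partials, N)                  # matches the input frame
no_7th = resynthesize(partials, N, gains={7: 0.0})  # same sound with the bin-7 partial muted
```

The `gains` dictionary is where the "full control over the resulting harmonics" would live: every partial becomes its own oscillator that can be scaled, muted, or (with more work) retuned.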
Wavetable synthesis:
Preferably with support for loading custom wavetables. The Piston Honda MKIII is a nice example of what this approach can do, particularly with its CV-controlled morphing. The more you like percussive noise, the more you need to look into this. Manis Iteritas and the other wavetable oscillators from Noise Engineering reign here.
FM synthesis:
Bring on the noise! Akemie’s Taiko and Akemie’s Castle. Harsh, metallic, mental. Love it, but I’m out of space.
Sequencers and clock:
The beating heart of your setup. Nothing comes close to Tempi and René MKII, although for percussion I’d suggest a Trigger Riot. These can easily be replaced by a MIDI-to-CV module and whatever other sequencer you have; the Mutant Brain is a decent choice.
The unsung heroes:
It is ALL about generating and mangling CV. That became clear to me VERY fast. Hence, modules that attenuate, modulate, cross-modulate, and otherwise combine CV are the heart of what makes a modular unique. This has been my biggest epiphany since I dove in head first. Those types of modules will easily make up half of your setup, and as much as two-thirds of it will generate and mangle CV without ever making a sound itself.
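Stripped of the hardware, most of these CV utilities come down to two operations: scaling a signal by a gain that can go negative (attenuverting) and summing signals together. Here is a minimal sketch, assuming CV is just a list of sample values in arbitrary units; the function names `attenuvert` and `cv_mix` are hypothetical, not any module’s firmware.

```python
def attenuvert(signal: list[float], gain: float) -> list[float]:
    """Scale a CV signal by a gain in [-1, 1]; negative gains invert it."""
    if not -1.0 <= gain <= 1.0:
        raise ValueError("gain must be between -1 and 1")
    return [gain * v for v in signal]


def cv_mix(*channels: list[float]) -> list[float]:
    """Sum several CV channels sample by sample (channels should be equal length)."""
    return [sum(samples) for samples in zip(*channels)]


# Two hypothetical envelope shapes, the second starting after the first has finished.
env_a = [0.0, 1.0, 0.5, 0.0, 0.0, 0.0]
env_b = [0.0, 0.0, 0.0, 1.0, 0.5, 0.0]

# Opposite signs turn two unipolar envelopes into one bidirectional CV.
bipolar = cv_mix(attenuvert(env_a, 1.0), attenuvert(env_b, -1.0))
print(bipolar)
```

Sent to a playback-speed input, a CV like this would push speed up and then down past zero, which is the kind of bidirectional envelope trick mentioned for the Roti Polar.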
Modules I love:
WMD Triple Bipolar - CV-controlled attenuverter with three individual channels and a SUMMED output; also does fine work on audio signals
Noise Engineering Roti Polar - four-input attenuverting CV mixer (e.g. sum two envelopes with opposite signs to get a bidirectional envelope that controls the playback speed of a Morphagene)
4ms VCA Matrix - it’s big, yes, but it’s a CV-modulated, four-channel CV mixer for building all sorts of weird, complex waveforms
Ornament & Crime - for the ASR (analog shift register) alone
Branches - a dual Bernoulli gate: if you have two timbral versions of a sound (two oscillators or two signal paths), randomly routing each trigger to one or the other can be fun. Example: open and closed hi-hats in a percussive sequence
Marbles - I’m still very much learning how to use it, but we all love happy accidents, don’t we?
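A Bernoulli gate is a coin toss per trigger: each incoming trigger goes to output A with probability p and otherwise to output B. Here is a minimal sketch, assuming triggers are a list of booleans (one per sequencer step) and mapping A to a closed hi-hat and B to an open one, as in the Branches example; the function name and the 0.8 probability are hypothetical.

```python
import random
from typing import Optional


def bernoulli_gate(triggers: list[bool], p_a: float,
                   rng: Optional[random.Random] = None) -> tuple[list[bool], list[bool]]:
    """Route each trigger to output A with probability p_a, otherwise to output B."""
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must be between 0 and 1")
    rng = rng if rng is not None else random.Random()
    out_a, out_b = [], []
    for trig in triggers:
        # Rests produce no output and do not consume a random number.
        to_a = bool(trig) and rng.random() < p_a
        out_a.append(to_a)
        out_b.append(bool(trig) and not to_a)
    return out_a, out_b


# Sixteen-step hi-hat pattern: a trigger on every other step.
steps = [i % 2 == 0 for i in range(16)]
closed, open_ = bernoulli_gate(steps, p_a=0.8, rng=random.Random(42))
print("closed:", "".join("x" if t else "." for t in closed))
print("open:  ", "".join("x" if t else "." for t in open_))
```

Every trigger lands on exactly one output, so the groove stays intact while the timbre shifts; seeding the generator makes a happy accident repeatable once you find one you like.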