Are many people using Distributed Polling? We are just playing around with it now.
How many devices can you typically poll per thread in under 5 minutes? Some of this is also a function of how much latency you have between your pollers and your devices, and how "slow" your devices are. But we are having to throw a lot of threads (>200) at this to ensure our 1200+ devices get polled in under 5 minutes.
The RRD and DB backends are not the bottleneck; it is purely the latency of the SNMP polling that we need to overcome. The slowest devices take between 150 and 200s, depending on what else is hitting them at the time (e.g. discovery + poller). So we need a fair number of threads just to get these out of the way.
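As a rough back-of-the-envelope check on those numbers, here is a small sketch. It only uses the figures quoted above (1200 devices, 200 threads, a 300s window); the function names are made up for illustration and are not part of any poller.

```python
import math

WINDOW_S = 300   # 5-minute polling window
DEVICES = 1200   # "1200+ devices"
THREADS = 200    # ">200" threads

def devices_per_thread(devices, threads):
    """Devices each thread must handle per window (rounded up)."""
    return math.ceil(devices / threads)

def mean_budget_s(devices, threads, window_s):
    """Average time a thread can spend on each of its devices."""
    return window_s / devices_per_thread(devices, threads)

def min_threads(total_poll_seconds, window_s):
    """Lower bound on threads if work could be split perfectly."""
    return math.ceil(total_poll_seconds / window_s)

print(devices_per_thread(DEVICES, THREADS))       # devices per thread
print(mean_budget_s(DEVICES, THREADS, WINDOW_S))  # seconds per device
```

A thread that picks up a 200s device has only 100s left in the window, which is why the slow devices drive the thread count more than the average does.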
I was wondering if sorting the most expensive devices first could help avoid getting two slow devices back to back.
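To illustrate the idea, here is a minimal simulation, assuming threads pull the next device from a shared list as soon as they are free. The durations are invented for the example, and `makespan` is a hypothetical helper, not something from the poller itself.

```python
import heapq

def makespan(durations, threads):
    """Wall time to finish all devices when each free thread
    takes the next device in list order."""
    finish = [0.0] * threads          # time at which each thread is free
    heapq.heapify(finish)
    for d in durations:
        t = heapq.heappop(finish)     # earliest-free thread takes the device
        heapq.heappush(finish, t + d)
    return max(finish)

# Invented example: eight fast devices and one slow one, two threads.
durations = [10] * 8 + [200]

in_order = makespan(durations, 2)
slow_first = makespan(sorted(durations, reverse=True), 2)

print(in_order, slow_first)
```

In this toy case the slow device started last pushes the run to 240s, while starting it first lets the fast devices fill the other thread and the run ends at 200s. Sorting would need a per-device estimate, for instance the duration of the previous poll.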
Also, it would be nice to have the system use a queue to schedule the work rather than a hard partition. This avoids the problem of losing 1/Nth of your devices when you lose a poller. With a queue you would just lose 1/Nth of the threads. If you had enough spare capacity you could ride through the downtime of a single poller.
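A small sketch of the difference, under stated assumptions: the hard partition is modelled here as a simple modulo assignment (the real scheme may differ), the SNMP poll is replaced by a placeholder, and all names are hypothetical.

```python
import queue
import threading

def hard_partition(devices, pollers, dead_pollers=()):
    """Devices still polled when each device is pinned to one poller
    (assumed modulo assignment, for illustration only)."""
    return [d for i, d in enumerate(devices) if i % pollers not in dead_pollers]

def shared_queue(devices, pollers, threads_per_poller, dead_pollers=()):
    """Devices polled when all surviving threads pull from one queue."""
    work = queue.Queue()
    for dev in devices:
        work.put(dev)
    polled = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                dev = work.get_nowait()
            except queue.Empty:
                return                # queue drained, thread exits
            # Placeholder: a real worker would run the SNMP poll here.
            with lock:
                polled.append(dev)

    workers = [threading.Thread(target=worker)
               for p in range(pollers) if p not in dead_pollers
               for _ in range(threads_per_poller)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return polled

devices = list(range(1200))
print(len(hard_partition(devices, 4, dead_pollers={0})))   # a quarter is lost
print(len(shared_queue(devices, 4, 8, dead_pollers={0})))  # all still polled
```

With one of four pollers down, the partitioned run covers 900 devices and the queued run covers all 1200, just with fewer threads. Whether the queued run still finishes within 5 minutes depends on the spare capacity, which this sketch does not model.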
Cheers
Milton