Hi Stuart,
I ran into an issue similar to what you're describing.
We have 5000+ devices, and some of them would sporadically be reported as down (DNS) and then come back on the next poll.
DNS was not the culprit, well, not directly.
We run Ubuntu, and we found that the default local DNS cache daemon (systemd-resolved) had issues, besides its cache being limited to a static value of 4096 entries.
We replaced the host's local cache with BIND (very simple to set up) and haven't had any similar issues since.
If you indeed run a local DNS cache, you can try disabling it altogether and see if things get better.
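If you want numbers for the before/after comparison, here is a rough Python sketch (not what we ran, just an illustration). It resolves one name repeatedly through the system resolver and counts failures. The function name probe_resolution and its parameters are made up for this example, and it assumes your poller resolves names through the same system resolver path as socket.getaddrinfo.

```python
import socket
import time
from collections import Counter


def probe_resolution(hostname, attempts=20, delay=0.5, resolve=socket.getaddrinfo):
    """Resolve `hostname` repeatedly and tally successes and failures.

    `resolve` defaults to socket.getaddrinfo, which goes through the
    system resolver (and so through a local cache, if one is configured).
    It is a parameter only so that a stand-in can be swapped in for testing.
    """
    results = Counter()
    for i in range(attempts):
        try:
            resolve(hostname, None)
            results["ok"] += 1
        except socket.gaierror as exc:
            # A sporadic failure here looks like what the poller would see.
            results["failed"] += 1
            print(f"attempt {i + 1}: {exc}")
        if i < attempts - 1:
            time.sleep(delay)
    return results
```

Call it with one of the affected devices, e.g. probe_resolution("device01.example.net") (a placeholder name), once with the local cache enabled and once with it disabled, and compare the "failed" counts.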
HTH,
Ahmed.