Our company runs two separate internal networks: one for testing and one for production. The production network has access to the public internet while the testing network does not. We have identical Gemfire grids set up on both networks, for testing and production support respectively. The grids are mirror images of each other in terms of hostnames, configuration, and locators, to the point that we could copy the deployed .war file (we host each Gemfire node in a webapp inside Tomcat) from the test network to the production network with zero configuration changes.
Recently, we have been observing some odd behavior on the test network: Gemfire will intermittently become partially unresponsive and will require a reboot to start working properly again. In particular, one of our CacheListeners stops firing. We have not seen this at all on the production network in the few months that the Gemfire application has been live there; on the test network, however, it happens fairly frequently.
I know this is a very vague description of the problem, but I haven't been able to isolate it any further yet, and it is very difficult to reproduce. Is there a possibility that something like this could be related to the limited public internet access on the test network? Does Gemfire require a public internet connection for its licensing? That is the only difference between the two installations, so it is naturally what we thought to look at first. Any other ideas?
Tom