Consul
Connection aborted issues
Problem
Error while running playbook (deploying services or infrastructure)
TASK [consul-env : register instance type] *************************************
fatal: [52.49.58.215]: FAILED! => {"changed": false, "failed": true, "msg": "Could not connect to consul agent at localhost:11580, error was ('Connection aborted.', BadStatusLine(\"''\",))"}
Solution
This normally means that consul is experiencing cluster rift due to docker conntrack issue. To solve this issue it is necessary to upgrade to latest docker-agent and consul and redeploy these services.
- Install latest docker agent (Athena shell)
athena-services docker-agent
- Install latest consul (Athena shell)
athena-services consul
- Stop consul in all machines except for AccessGateway instance. In a particular stop in (Backoffice, Public, Internal, Exchange) machine (Instance shell)
sudo docker stop consul
-
Try to recover consul nodes by following the consul outage recovery guide
-
If it was not possible to recover by doing steps described in consul outage recovery guide, log-in into access gateway and delete corrupted consul data (Instance shell)
sudo docker stop consul && sudo docker run -it --volumes-from consul-data busybox sh -c 'rm -rf /var/consul/*' && sudo docker start consul
- Initialize AccessGateway consul (Athena shell)
athena-infrastructure vpc,consul
- Restart consul on all nodes (Athena shell)
athena-services consul
- To make sure that all services get properly re-registered run (assuming that all used services are listed in services-
.roles) (Athena shell)
athena-services