Connection aborted issues
Error while running a playbook (deploying services or infrastructure):
TASK [consul-env : register instance type] *************************************
fatal: []: FAILED! => {"changed": false, "failed": true, "msg": "Could not connect to consul agent at localhost:11580, error was ('Connection aborted.', BadStatusLine(\"''\",))"}
This normally means that consul is experiencing a cluster rift caused by a Docker conntrack issue. To solve it, upgrade to the latest docker-agent and consul, then redeploy both services.
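Before upgrading, it can help to confirm the symptom from the affected instance. The following is a minimal sketch, not part of Athena: it assumes `curl` is installed and that the agent's HTTP API listens on the `localhost:11580` address shown in the error message. It relies on consul's `/v1/status/leader` endpoint, which to my understanding returns the leader address as a JSON string and an empty JSON string when no leader is elected. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: probe the local consul agent and report whether the cluster has a leader.
# CONSUL_ADDR defaults to the address from the error message (assumption: yours matches).
CONSUL_ADDR="${CONSUL_ADDR:-localhost:11580}"

# Classify the body returned by /v1/status/leader.
leader_state() {
    case "$1" in
        '')        echo "unreachable" ;;  # no response body at all
        '""')      echo "no-leader" ;;    # empty JSON string: no leader elected
        '"'*:*'"') echo "leader" ;;       # quoted host:port of the current leader
        *)         echo "unknown" ;;      # anything else, e.g. an error page
    esac
}

# Query the agent; a failed connection is treated as an empty body.
check_consul() {
    body="$(curl -s --max-time 5 "http://${CONSUL_ADDR}/v1/status/leader")" || body=''
    leader_state "$body"
}
```

Running `check_consul` on an instance prints one of the four states. `unreachable` or `no-leader` is consistent with the failure described above, although neither proves that conntrack is the cause.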
- Install latest docker agent (Athena shell)
athena-services docker-agent
- Install latest consul (Athena shell)
athena-services consul
- Stop consul on all machines except the AccessGateway instance, in particular on the Backoffice, Public, Internal, and Exchange machines (Instance shell)
sudo docker stop consul
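If the instances are reachable over SSH from one place, the stop step can be scripted instead of repeated by hand. This is only a sketch under that assumption: the host names passed in are placeholders for your own inventory, and `hosts_except` and `stop_consul_everywhere_except` are hypothetical helpers, not Athena commands.

```shell
#!/bin/sh
# Sketch: stop consul on every instance except the AccessGateway.
# Assumption: each instance is reachable with plain `ssh <host>`.

# Print every host from the argument list except the one named first.
hosts_except() {
    skip="$1"; shift
    for host in "$@"; do
        [ "$host" = "$skip" ] || printf '%s\n' "$host"
    done
}

# Usage: stop_consul_everywhere_except <host-to-skip> <host>...
stop_consul_everywhere_except() {
    for host in $(hosts_except "$@"); do
        echo "Stopping consul on $host"
        # Same command as the manual step, run remotely; keep going on failure.
        ssh "$host" 'sudo docker stop consul' || echo "failed on $host" >&2
    done
}
```

For example, `stop_consul_everywhere_except accessgateway backoffice public internal exchange accessgateway` would skip `accessgateway` and stop consul on the other four hosts.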
Try to recover the consul nodes by following the consul outage recovery guide.
If recovery is not possible using the steps described in the consul outage recovery guide, log in to the AccessGateway and delete the corrupted consul data (Instance shell)
sudo docker stop consul && sudo docker run -it --volumes-from consul-data busybox sh -c 'rm -rf /var/consul/*' && sudo docker start consul
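Because this command is destructive, it may be worth running it as separate steps with a dry-run default. The sketch below uses the same container names as the one-liner above (`consul`, `consul-data`) and the same `/var/consul` path. `DRY_RUN`, `run`, and `wipe_consul_data` are hypothetical helpers. It also drops `-it` and adds `--rm` so the helper container works without a TTY and is removed afterwards, which is a deviation from the original command.

```shell
#!/bin/sh
# Sketch: the data wipe split into explicit steps, printing commands by default.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

wipe_consul_data() {
    # Stop the agent so nothing writes to the data volume during the wipe.
    run sudo docker stop consul || return 1
    # Delete the data via a throwaway busybox container sharing consul-data's volumes.
    run sudo docker run --rm --volumes-from consul-data busybox sh -c 'rm -rf /var/consul/*' || return 1
    # Start the agent again on the now-empty data directory.
    run sudo docker start consul
}
```

Calling `wipe_consul_data` prints the three commands for review. `DRY_RUN=0 wipe_consul_data` executes them and stops at the first failure, so the agent is not restarted on top of a half-deleted data directory.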
- Initialize AccessGateway consul (Athena shell)
athena-infrastructure vpc,consul
- Restart consul on all nodes (Athena shell)
athena-services consul
- To make sure that all services are properly re-registered, run (assuming that all services in use are listed in services-
.roles) (Athena shell)