Version 3 is a major upgrade of the platform.
Monitoring- Stability fix (cAdvisor caused unwanted container file system locks and did not report process memory usage accurately). Replaced cAdvisor with Glances and Prometheus Node Exporter, both installed as host services.
Service discovery- Stability fix (a UDP packet masquerading issue prevented Consul from sending UDP packets to restarted containers). Moved Consul to a separate Docker network UDP with a fixed IP address.
Authorization- Security improvement (WAF services were previously only authenticated). Athena services available through WAF are now also authorized.
Docker engine- Feature upgrade (see the v0.12 release notes for new features and changes). To improve the network security of container deployments, Athena services are now deployed in a separate network, “athena”. This is a major host infrastructure upgrade and may require a machine restart, depending on which platform components are installed. Upgrade critical infrastructure in an appropriate order using “serial 1” in Ansible playbooks!
Monitoring- Feature upgrade. Kibana now has a set of default dashboards: one for all services, one per service, and one per host.
Monitoring- Feature upgrade. fluentd td-agent now supports a set of common logging formats. Service playbooks must now specify the AthenaLogType label when launching a container.
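The two playbook-related changes above (rolling infrastructure upgrades with "serial 1" and the AthenaLogType container label) could look roughly like the sketch below. This is a minimal illustration, not the platform's actual playbooks: the inventory groups, role name, container name, image, and label value are all hypothetical, and it assumes the standard Ansible docker_container module is used to launch containers and that the "athena" network already exists on the host.

```yaml
# Sketch only. Inventory groups, role name, container name, image, and the
# label value are hypothetical placeholders, not taken from the platform.

# Play 1: upgrade critical infrastructure one host at a time.
- hosts: athena_infrastructure        # hypothetical inventory group
  become: true
  serial: 1                           # process a single host per batch
  roles:
    - docker-engine                   # hypothetical role name

# Play 2: launch a service container with the required log-type label.
- hosts: athena_services              # hypothetical inventory group
  become: true
  tasks:
    - name: Launch service container with the AthenaLogType label
      docker_container:
        name: example-service         # hypothetical
        image: example/service:1.0    # hypothetical
        state: started
        networks:
          - name: athena              # assumed to exist already
        labels:
          AthenaLogType: "json"       # hypothetical value; use a format td-agent supports
```

With serial set to 1, each batch contains a single host, so a failure on one host should stop the play before the next host is touched. That is the usual reason to prefer it for infrastructure that cannot all be offline at once.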
Create backups of the data of all running services for emergency restore. These backups are hot: they will not restore the data state exactly and should be used for emergency restore only. If greater data consistency is necessary, use a cold snapshot strategy instead.
Create hot snapshots for all running instances. Hot snapshots are used to roll back to the host service binary state if an instance host service upgrade fails. Hot snapshots do not provide data consistency. Use cold snapshots and/or make the instance unavailable during the upgrade to ensure data consistency.
To find out which Athena container services will be upgraded, run
Run the athena-upgrade command, which upgrades all services listed in the
services-<env>.roles file as well as all host-installed platform subsystems.
This will terminate the currently running instances. Make sure recent environment snapshots are available.
To roll back instance state to the last known environment snapshot, run
To roll back instance state to a specific environment snapshot, run
athena-snapshots -r -d 2016-06-10-10-18-58