CNaaS NMS Synchronization

Every device in CNaaS NMS has a synchronization status that is either true or false. This value represents whether the NMS considers the device to have the latest, up-to-date configuration. It does not necessarily reflect the actual state, because someone could log in to a device and make local changes without the NMS noticing.

Devices move in and out of synchronization based on a number of different events. Here is an overview:

Becomes unsynchronized when...
  1. Settings repo is updated:
    global → all devices
    access/dist/core → all devices of this type
    device/<hostname> → that hostname
  2. Templates repo is updated:
    affected devices are determined by the dependencies in mapping.yml
  3. syncto touches a device and discovers it has been modified outside of NMS
  4. Interfaces are updated via device interfaces API
  5. ZTP of a new DIST/CORE device (its neighbors need new linknets)
  6. Device is moved to UNMANAGED state
  7. Device upgrade (post_flight step) is performed (since v1.2.0)
  8. Mgmtdomain is added/removed/changed (since v1.2.0)
  9. Linknet added/removed via API (since v1.3.0)

Becomes synchronized when...
  1. syncto job pushes new config
  2. syncto discovers this device does not need any changes

Does not affect synchronization
  1. Local changes made to device outside of NMS
  2. Bouncing interface via interface status API
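
As a rough illustration of the settings repo rule (item 1 under "Becomes unsynchronized when..."), the following Python sketch maps a changed path to the devices that would lose their synchronized status. The Device class, the function name and the example file names are assumptions made for this illustration; they are not taken from the CNaaS NMS code base, and the path layout simply follows the overview above.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical, simplified device record for illustration only.
    hostname: str
    device_type: str  # "ACCESS", "DIST" or "CORE"
    synchronized: bool = True


def mark_unsynchronized(devices, changed_path):
    """Mark devices affected by a changed settings repo path as unsynchronized.

    Returns the hostnames of the affected devices.
    """
    parts = changed_path.strip("/").split("/")
    top = parts[0]
    if top == "global":
        # Global settings affect every device.
        affected = list(devices)
    elif top in ("access", "dist", "core"):
        # Per-type settings affect all devices of that type.
        affected = [d for d in devices if d.device_type.lower() == top]
    elif top == "device" and len(parts) > 1:
        # Per-device settings affect only that hostname.
        affected = [d for d in devices if d.hostname == parts[1]]
    else:
        # Unknown path: nothing is marked in this sketch.
        affected = []
    for d in affected:
        d.synchronized = False
    return [d.hostname for d in affected]
```

For example, calling mark_unsynchronized(devices, "access/base_system.yml") in this model would flag only the ACCESS devices in the list and leave DIST and CORE devices untouched.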

How syncto selects devices for synchronization:

By default, syncto runs on "all" devices. What this actually means is that every device not already marked as synchronized gets new config sent to it. So if you update settings for access devices and run a syncto "all" job, only access devices will be touched (assuming all other devices are already synchronized). If you suspect that someone has made local changes to a device without making it unmanaged, you can run syncto "all" with the extra option "resync" set to true to contact all devices regardless of their synchronization status. The same rules apply if you synchronize a group or device type instead of "all". If, however, you select a single hostname, that device will always be contacted regardless of its previous synchronization status.
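
The selection rules above can be summarized in a small Python model. This is only a sketch of the behaviour described in this section, not the actual syncto implementation: the Device class, the select_for_syncto function and its parameter names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical, simplified device record for illustration only.
    hostname: str
    device_type: str
    synchronized: bool
    groups: tuple = ()


def select_for_syncto(devices, *, all_devices=False, device_type=None,
                      group=None, hostname=None, resync=False):
    """Return the devices a syncto job would contact, per the rules above."""
    if hostname is not None:
        # A single hostname is always contacted, whatever its status.
        return [d for d in devices if d.hostname == hostname]
    if all_devices:
        candidates = list(devices)
    elif device_type is not None:
        candidates = [d for d in devices if d.device_type == device_type]
    elif group is not None:
        candidates = [d for d in devices if group in d.groups]
    else:
        raise ValueError("no target given")
    if resync:
        # resync=True contacts every candidate regardless of status.
        return candidates
    # Default: only devices not already marked as synchronized.
    return [d for d in candidates if not d.synchronized]
```

In this model, select_for_syncto(devices, all_devices=True) skips every device already marked as synchronized, while adding resync=True returns the full list.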

The intention is that all devices should always be marked as synchronized unless someone is actively working on something. If you have to leave a device unsynchronized and go do something else, move the device to UNMANAGED state until you can get it back into synchronization, so that it does not stop other people from working in the system. For SUNET there is also a Nagios alarm that triggers if devices are left unsynchronized for more than an hour, so make sure to notify NOC (or make the device UNMANAGED to stop the alarm) if you leave something unsynchronized. If you move devices to UNMANAGED, it is probably a good idea to make a note to come back and make them MANAGED again at some later point.

Unable to synchronize Arista device

If you get an error message like "napalm.base.exceptions.SessionLockedException: Session is already in use", it means that some other configuration session is pending on the device; maybe someone was trying to make local configuration changes on it. To clear these config sessions, SSH into the device and run "show configuration sessions" to see a list of any pending sessions. Then enter the config sessions one by one using "configure session <name>" and run "abort" to abort each one.
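
If several sessions are pending, it can help to prepare the command sequence up front. The Python sketch below only builds the list of CLI commands described above from session names that you have read from "show configuration sessions" yourself; it does not connect to the device. The function name and the example session name are hypothetical.

```python
def abort_session_commands(session_names):
    """Build the CLI commands that abort each pending config session."""
    commands = []
    for name in session_names:
        # Enter the pending session, then abort it.
        commands.append(f"configure session {name}")
        commands.append("abort")
    return commands


# "example-session" is a made-up name used purely for illustration.
for line in abort_session_commands(["example-session"]):
    print(line)
```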
