subject-id - SAML2 General Purpose Subject Identifier in SWAMID

SWAMID and all other academic identity federations within eduGAIN introduce a new long-term unique identifier, subject-id, which is planned to eventually replace eduPersonPrincipalName (ePPN). This wiki page describes both why a new long-term unique identifier is needed and how we transition in an orderly manner.

Background

Since SWAMID was formed in 2007, the eduPersonPrincipalName (ePPN) has been the recommended user name released from identity providers (IdPs) to services (SPs). A majority of the services in SWAMID use ePPN as the user identifier. To ensure that organisations comply with Swedish legislation on access to and handling of personal and sensitive data, e.g. the General Data Protection Regulation (GDPR), SWAMID requires in its Identity Assurance Profiles and in the SAML WebSSO Technology Profile that the ePPN is globally unique and never reassigned to another individual:

SWAMID Identity Assurance profiles:

5.2.3 Each Subject MUST be represented by one or more globally unique identifiers.

Subject identifiers MUST NOT be re-assigned.

SWAMID SAML WebSSO Technology Profile:

5.5.3 An Identity Provider MUST support release of the attribute eduPersonPrincipalName. The value of the attribute for a Subject MUST NOT be reassigned to another Subject.

Guidance: The e-mail address of a Subject is not suitable as value for the attribute eduPersonPrincipalName due to name changes and later reassignments to other Subjects.

One problem with ePPN is that some identity federations within eduGAIN allow the attribute to be reused for other individuals. This makes the attribute inappropriate as an identifier internationally. When an individual leaves an organisation, it is highly likely that the individual's authorisations remain in federated services, linked to the individual's user name (ePPN). If a new individual at the same organisation receives the same username, the new individual will automatically have the same permissions and access to the same data as the previous holder of the username. For this reason, it is an absolute requirement that usernames are not reused for other individuals.

To handle this, a new identifier, subject-id, has been developed, defined as the General Purpose Subject Identifier (section 3.3) in the SAML V2.0 Subject Identifier Attributes Profile Version 1.0.

From the SWAMID SAML WebSSO Technology Profile:

5.5.4 An Identity Provider MUST support the release of the attribute subject-id. The value of the attribute for a Subject MUST NOT be reassigned to another Subject.

Guidance: The subject-id is a globally unique identifier identical for all Relying Parties for a given Subject. SWAMID recommends that the value of eduPersonPrincipalName is used for subject-id since it is already defined for all Subjects, widely used as identifier in Relying Parties in SWAMID, unique and non-reassigned for all Identity Providers in SWAMID. The subject-id should not be changed as a result of a change to any other data associated with the Subject (e.g., name, email address, organisational role).

Differences between ePPN and subject-id

subject-id has the same properties as ePPN has in SWAMID:

  • Includes a scope (e.g. @org.se)
  • May never be assigned to another individual (ever)
  • Similar to an email address (but should not be an email address)
  • Should be treated case-insensitively, i.e. AnvandarNamn@ORG.SE shall be counted as the same value as anvandarnamn@org.se

However, there is one important difference between ePPN and subject-id, namely the allowed characters:

  • ePPN: A-Z, a-z, 0-9, hyphen (-), underscore (_), period (.)
  • subject-id: A-Z, a-z, 0-9, hyphen (-), equal sign (=)
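
The character rules above can be checked mechanically. The following is a minimal Python sketch, assuming that the listed characters apply to the part before the @ sign and that the scope is not checked beyond being present. The function name eppn_usable_as_subject_id is hypothetical and not part of any SWAMID tooling.

```python
import re

# Characters the list above allows in the part before "@" of a subject-id.
SUBJECT_ID_LOCAL = re.compile(r"[A-Za-z0-9=-]+")


def eppn_usable_as_subject_id(eppn: str) -> bool:
    """Return True if the part before "@" uses only subject-id characters."""
    local, sep, scope = eppn.partition("@")
    if not sep or not scope:
        return False  # a scope is required
    return SUBJECT_ID_LOCAL.fullmatch(local) is not None


for value in ("andber01@org.se", "anna_b@org.se", "fornamn.efternamn@org.se"):
    print(value, eppn_usable_as_subject_id(value))
```

Only the first value passes, since the other two contain an underscore or a period before the @ sign.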

Other things to consider

  • It is advisable that the same case is used for ePPN and subject-id to minimise the risk of mismatch in services.

Plan and implement the switch in an Identity Provider (IdP)

There are four main options for an IdP:

  • Option 1 - Use ePPN as subject-id (if all requirements are met)
  • Option 2 - Change the value of the ePPN (if reuse requirements are not met)
  • Option 3 - Translate ePPN to subject-id in a specific way (if ePPN is never reused and period or underscore appears in ePPN)
  • Option 4 - Keep value of ePPN, choose a new value for subject-id

The alternatives have both advantages and disadvantages.

Option 1 - Use ePPN as subject-id if all requirements are met

If ePPNs are chosen in a way that ensures they are never reused for another individual, and neither underscore (_) nor period (.) occurs in them, it is valid to use the same value for subject-id.

Advantages:

  • Services can replace ePPN with subject-id without change
  • Usernames are well established and recognised by users

Disadvantages:

  • None

Option 2 - Change the value of the ePPN (if reuse requirements are not met)

For organisations that currently use the user's e-mail address, or something based on the e-mail address, for ePPN, it may be appropriate to take the opportunity to choose a new value for ePPN which is then also used for subject-id.

  • Choose a new subject-id value that does not risk being reused (and has not been used as an ePPN by another individual before)
  • Do not use period (.) or underscore (_) in new ePPN values
  • Choose something that is relatively easy for people to handle
  • Use the same value for subject-id as for new ePPN value

Advantages:

  • Chance to start over with unique identifiers that are not at risk of reuse (unlike email addresses used as ePPNs by some organisations within SWAMID today)

Disadvantages:

  • Users need to manage new usernames (at login or used in user interface)
  • Services need to handle user name changes via, for example, one of these methods:
    • Using ePPN but saving received subject-id for a transition period and then switching to only subject-id values
    • Change all usernames in the service at the same time
    • Create new user identities without connection to the old ones
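
The first method above (keep using ePPN while saving the received subject-id) could be sketched as follows in Python. This is an illustration under stated assumptions, not a description of how any particular service does it: the User class, the find_user function and the in-memory list are all hypothetical stand-ins for a real user database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class User:
    eppn: str
    subject_id: Optional[str] = None  # filled in during the transition period


def find_user(users: List[User], eppn: Optional[str],
              subject_id: Optional[str]) -> Optional[User]:
    """Look up a user at login, preferring subject-id over ePPN."""
    if subject_id:
        wanted = subject_id.lower()
        for user in users:
            if user.subject_id is not None and user.subject_id.lower() == wanted:
                return user
    if eppn:
        wanted = eppn.lower()
        for user in users:
            if user.eppn.lower() != wanted:
                continue
            if subject_id and user.subject_id is not None:
                # Same ePPN but a different stored subject-id: possibly a
                # reassigned ePPN, so refuse the match.
                return None
            if subject_id:
                user.subject_id = subject_id  # remember it for the later switch
            return user
    return None
```

Once every active user has a stored subject-id, the ePPN branch can be dropped and the service switches to subject-id only.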

Option 3.1 - Translate ePPN to subject-id by removing unwanted characters

If ePPNs are chosen in a way that ensures they are never reused for another individual, but underscores (_) or periods (.) occur in them, it is possible to remove the unwanted characters from the ePPN to form the subject-id.

  • Use the ePPN as the basis for the value of subject-id
    • Delete periods (.) and underscores (_)
    • Examples:
      • anna_b@org.se → annab@org.se
      • fornamn.efternamn1_efternamn2@org.se → fornamnefternamn1efternamn2@org.se
  • Ensure that the ePPN (and subject-id) is never reused
    • This follows from the identity assurance profiles, the technology profile as well as GDPR and security reasons as there is a risk of unauthorised access to someone else's data
    • If e-mail addresses are used as ePPN today, make sure they are never reused, and create an identifier other than the e-mail address for new users (without period and underscore), see Option 4 below
    • E-mail addresses are in practice sometimes reused today, in some cases after an explicit decision by the board
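
The removal rule above can be sketched in a few lines of Python. This is only an illustration; the function name strip_to_subject_id is hypothetical. Note that the rule is applied to the part before the @ sign only, since the scope (for example org.se) legitimately contains periods.

```python
def strip_to_subject_id(eppn: str) -> str:
    """Remove periods and underscores from the part before "@"."""
    local, sep, scope = eppn.partition("@")
    local = local.replace(".", "").replace("_", "")
    return local + sep + scope


print(strip_to_subject_id("anna_b@org.se"))  # prints annab@org.se
```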

Advantages:

  • Services can replace the ePPN with subject-id without change, except for users whose ePPN contains an underscore or period; for those users the service either applies the same removal of underscores and periods to its stored values or requires new user identities
  • The usernames are well established and in most cases are recognised by the users
  • The usernames do not contain any unexpected character combinations

Disadvantages:

  • There can be some confusion as to which username applies, and it differs between services that use ePPN as an identifier and those that use subject-id
  • There is a risk of conflict between usernames. In Ladok, there are roughly 100,000 SWAMID ePPNs, of which just under 1,000 ePPNs contain periods or underscores. Of these, seven conflicts arise, of which four are likely the result of misspellings when creating user identities in Ladok that are therefore not used.
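
The risk of conflict can be assessed before any values are released by grouping existing ePPNs on the subject-id they would produce. The Python sketch below assumes the removal rule from this option and case-insensitive comparison; find_conflicts is a hypothetical helper, and the input list stands in for an export from the identity management system.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_conflicts(eppns: Iterable[str]) -> Dict[str, List[str]]:
    """Map each colliding subject-id candidate to the ePPNs that produce it."""
    groups = defaultdict(set)
    for eppn in eppns:
        local, sep, scope = eppn.lower().partition("@")
        candidate = local.replace(".", "").replace("_", "") + sep + scope
        groups[candidate].add(eppn.lower())
    # Only candidates produced by more than one distinct ePPN are conflicts.
    return {sid: sorted(srcs) for sid, srcs in groups.items() if len(srcs) > 1}


print(find_conflicts(["anna_b@org.se", "annab@org.se", "bo@org.se"]))
```

An empty result means the removal rule is safe for the current set of ePPNs; it says nothing about ePPNs created later, so the same check is needed when new identities are provisioned.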

Implementation in Shibboleth

Replacing periods and underscores in the Shibboleth Identity Provider is described in Example of a standard attribute resolver for Shibboleth IdP v5 and above. With minor adjustments, the characters can be removed instead. Before doing so, ensure that two individuals cannot get the same subject-id.

Implementation in ADFS

Support for changing value from ePPN to subject-id is available in version 2.3 of ADFSToolkit.

Option 3.2 - Translate ePPN to subject-id with replacement characters

If ePPNs are chosen in a way that ensures they are never reused for another individual, but underscores (_) or periods (.) occur in them, it is possible to replace the unwanted characters in the ePPN to form the subject-id.

  • Use the ePPN as the basis for the value of subject-id
    • Replace periods (.) with =2E
    • Replace underscores (_) with =5F
    • Alternatively, replace periods and underscores with hyphens (-) if there is no risk of conflicts
    • Examples:
      • anna_b@org.se → anna=5Fb@org.se
      • Alternatively anna_b@org.se → anna-b@org.se
      • fornamn.efternamn1_efternamn2@org.se → fornamn=2Eefternamn1=5Fefternamn2@org.se
      • Alternatively fornamn.efternamn1_efternamn2@org.se → fornamn-efternamn1-efternamn2@org.se
  • Ensure that the ePPN (and subject-id) is never reused
    • This follows from the identity assurance profiles, the technology profile as well as GDPR and security reasons as there is a risk of unauthorised access to someone else's data
    • If e-mail addresses are used as ePPN today, make sure they are never reused, and create an identifier other than the e-mail address for new users (without period and underscore), see Option 4 below
    • E-mail addresses are in practice sometimes reused today, in some cases after an explicit decision by the board
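
Both replacement variants above can be sketched in Python as follows. The function name encode_to_subject_id and its use_hyphen parameter are hypothetical; as with removal, only the part before the @ sign is rewritten.

```python
def encode_to_subject_id(eppn: str, use_hyphen: bool = False) -> str:
    """Replace periods and underscores in the part before "@"."""
    local, sep, scope = eppn.partition("@")
    if use_hyphen:
        # Only safe when no two ePPNs end up with the same value.
        local = local.replace(".", "-").replace("_", "-")
    else:
        local = local.replace(".", "=2E").replace("_", "=5F")
    return local + sep + scope


print(encode_to_subject_id("anna_b@org.se"))                   # prints anna=5Fb@org.se
print(encode_to_subject_id("anna_b@org.se", use_hyphen=True))  # prints anna-b@org.se
```

The =2E and =5F variant is reversible and cannot create conflicts between existing ePPNs, whereas the hyphen variant maps several different ePPNs (anna_b, anna.b, anna-b) to the same value.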

Advantages:

  • Services can replace the ePPN with subject-id without change, except for users whose ePPN contains underscores or periods; for those users the service either translates its stored values using =5F, =2E or - or requires new user identities
  • The usernames are well established and in most cases are recognised by the users

Disadvantages:

  • The ePPNs that contain underscores or periods are given a confusing new value as subject-id with an equal sign in it
  • Some well-established usernames are no longer recognised in the system and differ from what is used at login
  • There can be some confusion as to which username applies, and it differs between services that use ePPN as an identifier and those that use subject-id

Implementation in Shibboleth

Replacing periods and underscores in the Shibboleth Identity Provider is described in Example of a standard attribute resolver for Shibboleth IdP v5 and above.

Implementation in ADFS

Support for changing value from ePPN to subject-id is available in version 2.3 of ADFSToolkit.

Option 4 - Choose new values for subject-id

If ePPNs are chosen in a way that ensures they are never reused for another individual, but underscores (_) or periods (.) occur in them, new values can be chosen for these users, or for all users, to form the subject-id.

  • Choose a new subject-id value that has no direct connection to the ePPN (and has not been used as an ePPN by any other individual before)
  • Do not use period (.) or underscore (_) in new subject-id values
  • Choose something that is relatively easy for people to handle
    • Bad example: 488d2f98-b670-4c13-aedf-c5b4d0783efb@org.se - difficult to handle in administration
    • Good example: andber01@org.se - for Anders Bertilsson; easy for users to handle and remember, but note the problems that arise when a user changes name
    • Good example: lusab-babad@org.se - a unique 32-bit integer encoded as a Proquint (the linked Proquints page contains interesting reasoning about identifiers), used by eduID and Antagning.se
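
To illustrate the Proquint example above, the following Python sketch encodes a 32-bit integer using the published Proquint alphabet (16 consonants carrying 4 bits each and 4 vowels carrying 2 bits each). How eduID or Antagning.se actually allocate and encode their values is not described here; the function names and the input integer are assumptions, and the integer is chosen only because it reproduces the example value.

```python
CONSONANTS = "bdfghjklmnprstvz"  # 16 consonants, 4 bits each
VOWELS = "aiou"                  # 4 vowels, 2 bits each


def _quint(word: int) -> str:
    """Encode one 16-bit value as consonant-vowel-consonant-vowel-consonant."""
    return (
        CONSONANTS[(word >> 12) & 0xF]
        + VOWELS[(word >> 10) & 0x3]
        + CONSONANTS[(word >> 6) & 0xF]
        + VOWELS[(word >> 4) & 0x3]
        + CONSONANTS[word & 0xF]
    )


def proquint32(value: int) -> str:
    """Encode a 32-bit unsigned integer as two hyphen-separated quints."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    return _quint(value >> 16) + "-" + _quint(value & 0xFFFF)


print(proquint32(0x7F000001) + "@org.se")  # prints lusab-babad@org.se
```

The result contains only lower-case letters and a hyphen, so it satisfies the character rules for both ePPN and subject-id.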

Advantages:

  • Chance to start over with unique identifiers that are not at risk of reuse (unlike email addresses used as ePPNs by many organisations within SWAMID today)

Disadvantages:

  • Users need to manage new usernames (at login or used in user interface)
  • Services need to handle user name changes via, for example, one of these methods:
    • Using ePPN but saving received subject-id for a transition period and then switching to only subject-id values
    • Change all usernames in the service at the same time
    • Create new user identities without connection to the old ones

Plan and implement the change in a service (SP)

Remember to always compare the user identifiers ePPN and subject-id case-insensitively, i.e. AnvandarNamn@ORG.SE shall be counted as the same value as anvandarnamn@org.se.
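
A case-insensitive comparison can be as small as the Python sketch below. The helper name same_identifier is hypothetical; a service that stores identifiers could equally normalise them once at write time and compare the stored form.

```python
def same_identifier(a: str, b: str) -> bool:
    """Compare two ePPN or subject-id values without regard to case."""
    return a.casefold() == b.casefold()


print(same_identifier("AnvandarNamn@ORG.SE", "anvandarnamn@org.se"))  # prints True
```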

Proposed process for switching from ePPN as user identifier to subject-id

Services with few users or without the possibility of automating the change of user name

  1. Collect new usernames for users (subject-id)
    1. Change manually in the system
    2. Change to subject-id as identifier

Services with many users and the possibility to automate changing usernames

  1. Determine which users use the service as well as values for ePPN and subject-id for these.
    1. Mark the service with the entity category REFEDS Personalized Access to receive subject-id from IdPs
  2. Compare ePPN with subject-id
    1. If they are the same for all users
      1. Change to subject-id as identifier
    2. If they differ in a deterministic way (if, for example, all periods and underscores are removed)
      1. Develop translation rules in the service
      2. Update the user database to the subject-id values
      3. Change to subject-id as identifier
    3. If they differ in a non-deterministic way (for example, if many users have completely different values of subject-id than ePPN)
      1. Save subject-id for users during a transition period (for example using an extra column in the user table in the database or logging)
      2. Inform users via a secure channel that they need to log in by a certain date to maintain access to the system
      3. Change to subject-id as identifier
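
The comparison in step 2 above can be automated once ePPN and subject-id have been collected for each user. The Python sketch below is one possible way to do it, assuming that the only deterministic rule to test for is removal of periods and underscores; other rules (such as the =2E and =5F replacement) would need their own check. The function name classify_migration and the returned labels are hypothetical.

```python
from typing import Iterable, Tuple


def _strip(eppn: str) -> str:
    """Apply the removal rule to the part before "@"."""
    local, sep, scope = eppn.partition("@")
    return local.replace(".", "").replace("_", "") + sep + scope


def classify_migration(pairs: Iterable[Tuple[str, str]]) -> str:
    """Classify how the collected (ePPN, subject-id) pairs relate."""
    lowered = [(eppn.lower(), sid.lower()) for eppn, sid in pairs]
    if all(eppn == sid for eppn, sid in lowered):
        return "identical"          # step 2.1: switch identifier directly
    if all(_strip(eppn) == sid for eppn, sid in lowered):
        return "deterministic"      # step 2.2: translate, then switch
    return "non-deterministic"      # step 2.3: transition period needed


print(classify_migration([("anna_b@org.se", "annab@org.se"),
                          ("bo@org.se", "bo@org.se")]))  # prints deterministic
```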
