Conversational Biometrics Group - CBS Project

The Conversational Biometrics Server (CBS)


The CBS provides the following types of functionality:

  1. Acoustic text-independent speaker recognition (acoustic verification)
  2. CB Policy Manager (CBPM)

CBPM can be used to combine acoustic verification with knowledge verification. More broadly, it provides a framework for dynamically combining multiple verification sources, such as possession-based verification (e.g. key, caller-ID) and other types of biometrics (e.g. fingerprint, iris).


The server is currently designed for integration into telephony environments, where speech applications are becoming prevalent. However, nothing in its design or implementation restricts its use to telephony. The CBS can be used wherever security is a concern and user verification needs to be performed. A high-level block diagram of a system using the CBS is shown in figure 1.



1. Acoustic Verification

The CBS uses an acoustic verification subsystem that implements algorithms for text-independent speaker recognition (sometimes known as voiceprint matching). This capability allows a person's voice to be used as a biometric. As mentioned above, the recognition is text-independent, meaning that a user's identity can be determined from a given speech sample regardless of what is being said. The user provides a claim of his/her identity and speaks, and the system matches the speech to the speaker model (voiceprint) of the claimed user, trained in an earlier session. This match produces a similarity score, and a binary acceptance/rejection decision is produced by thresholding the score.
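
As an illustration of the thresholding step, the following minimal Python sketch turns a similarity score into an accept/reject decision. The function name, the score scale, and the convention that a score equal to the threshold is accepted are assumptions made for this example; they are not taken from the CBS itself.

```python
def verify(score: float, threshold: float) -> bool:
    """Accept the identity claim when the similarity score reaches the threshold.

    Assumption: higher scores mean the speech is closer to the claimed
    speaker's model, and a score equal to the threshold is accepted.
    """
    return score >= threshold


# Hypothetical scores for two verification attempts against the same threshold.
print(verify(0.82, threshold=0.5))
print(verify(0.31, threshold=0.5))
```

Raising the threshold makes false acceptances less likely at the cost of more false rejections, which is why the threshold is a natural parameter for a security policy to control.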

The acoustic verification subsystem works in real time, and is capable of producing scores while speech is being gathered by an audio collection device (e.g. telephone, microphone). The interface to the acoustic subsystem includes two main functions: enrollment (creating a speaker model based on given speech) and scoring (measuring the similarity between a given speech sample and a speaker model). Existing speaker models may also be adapted using new speech data.
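
The three operations named above (enrollment, scoring, and adaptation) can be pictured with a toy model. The sketch below is not the CBS algorithm: it substitutes a running mean of feature vectors for the speaker model and a negative mean squared distance for the similarity score, purely to show the shape of such an interface. All names and the feature representation are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = Sequence[float]


@dataclass
class SpeakerModel:
    mean: List[float]  # mean feature vector (toy stand-in for a voiceprint)
    count: int         # number of feature vectors the mean is based on


def enroll(features: Sequence[Vector]) -> SpeakerModel:
    """Create a speaker model from enrollment speech features."""
    dim = len(features[0])
    mean = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    return SpeakerModel(mean=mean, count=len(features))


def score(model: SpeakerModel, features: Sequence[Vector]) -> float:
    """Similarity between speech features and a model; higher is more similar."""
    total = sum(
        sum((x - m) ** 2 for x, m in zip(f, model.mean)) for f in features
    )
    return -total / len(features)


def adapt(model: SpeakerModel, features: Sequence[Vector]) -> SpeakerModel:
    """Fold new speech features into an existing model's running mean."""
    n = model.count + len(features)
    mean = [
        (m * model.count + sum(f[i] for f in features)) / n
        for i, m in enumerate(model.mean)
    ]
    return SpeakerModel(mean=mean, count=n)


model = enroll([[1.0, 2.0], [3.0, 4.0]])
# Speech close to the model scores higher than speech far from it.
print(score(model, [[2.0, 3.0]]) > score(model, [[9.0, 9.0]]))
model = adapt(model, [[5.0, 6.0]])
```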



2. CB Policy Manager (CBPM)

CBPM is a framework that generates verification challenges dynamically during a verification session, such that the session obeys a predefined security policy.


The server has access to a pool of possible verification challenges, or Verification Objects (VOs), configured by the clients. Examples of VOs include knowledge topics (mother's maiden name, favorite color, application-specific topic), possession-based VOs (caller-ID or key), and different biometrics (fingerprint, keyboard stroke, retina). A security policy is a Finite State Machine (FSM) in which each state includes either a specific VO or a list of VOs from which one is chosen at random at run time. The state machine always ends in either acceptance or rejection of the user. A very simple example policy is shown in figure 2.
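
To make the policy structure concrete, here is a minimal Python sketch of a policy represented as a state machine. It does not reproduce figure 2 or the XML format used by the CBS; the state names, VO names, and dictionary layout are all hypothetical. Each state lists one or more candidate VOs, and every transition leads either to another state or to one of the two terminal outcomes.

```python
import random

TERMINALS = {"ACCEPT", "REJECT"}

# Hypothetical policy: each state names its candidate VOs and where to go next.
POLICY = {
    "initial": "s1",
    "states": {
        "s1": {"vos": ["caller_id"],
               "on_success": "s2", "on_failure": "s3"},
        "s2": {"vos": ["favorite_color", "mothers_maiden_name"],
               "on_success": "ACCEPT", "on_failure": "s3"},
        "s3": {"vos": ["fingerprint"],
               "on_success": "ACCEPT", "on_failure": "REJECT"},
    },
}


def choose_vo(state: dict, rng: random.Random) -> str:
    """Pick the VO for a state; a single-entry list is a fixed VO."""
    return rng.choice(state["vos"])


def is_well_formed(policy: dict) -> bool:
    """Check that every transition targets a defined state or a terminal."""
    states = policy["states"]
    targets = set(states) | TERMINALS
    if policy["initial"] not in states:
        return False
    return all(
        len(s["vos"]) > 0
        and s["on_success"] in targets
        and s["on_failure"] in targets
        for s in states.values()
    )


print(is_well_formed(POLICY))
print(choose_vo(POLICY["states"]["s2"], random.Random()))
```

Note that this check only confirms that transitions point somewhere valid; guaranteeing that every path reaches acceptance or rejection would additionally require ruling out cycles.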




When a user enrolls in the system, a user profile is created. The user profile specifies the different ways in which the user can be authenticated. For example, if the user provided knowledge data such as mother's maiden name or favorite color, then both the questions and the answers will be in the profile. If an application-specific value such as a balance can be obtained for this user at run time, then the profile will include an application-specific field indicating that the user may be prompted for the balance. In other words, the profile contains a list of all possible VOs for the user, with the exception of acoustic verification, which does not need to be explicitly specified in the profile. The server generates a biometric score every time a speech- or knowledge-related VO is invoked.


When the CBS receives a request to authenticate a user based on a policy, it loads the policy and invokes the first VO in the policy state machine. Based on the result of the first invoked VO (level of acoustic score, match of knowledge value, etc.), the CBS determines which state of the policy to branch to, and consequently which VO to invoke next. This process iterates until either an acceptance or a rejection state is reached. Since the decision is made at run time, based on the actual result of each invoked VO, the VOs are said to be generated dynamically.
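
The iteration described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the CBS implementation: each state holds a single VO, each VO result is reduced to satisfied or not satisfied, and the policy layout and the `invoke_vo` callback are hypothetical.

```python
from typing import Callable, Dict

TERMINALS = {"ACCEPT", "REJECT"}


def run_policy(policy: Dict, invoke_vo: Callable[[str], bool]) -> str:
    """Walk the policy state machine until ACCEPT or REJECT is reached.

    invoke_vo(name) presents the challenge and reports whether it was
    satisfied (hypothetical callback standing in for the real VO logic).
    """
    state_id = policy["initial"]
    while state_id not in TERMINALS:
        state = policy["states"][state_id]
        satisfied = invoke_vo(state["vo"])
        # The next VO is only known once the current result is in.
        state_id = state["on_success"] if satisfied else state["on_failure"]
    return state_id


# Hypothetical two-step policy: try the voice first, fall back to a question.
policy = {
    "initial": "s1",
    "states": {
        "s1": {"vo": "acoustic", "on_success": "ACCEPT", "on_failure": "s2"},
        "s2": {"vo": "favorite_color", "on_success": "ACCEPT", "on_failure": "REJECT"},
    },
}

# A caller whose voice score is too low but who answers the question correctly.
print(run_policy(policy, lambda vo: vo == "favorite_color"))
```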


The accumulated results of previous VOs in the current verification session are called a context. The context stores the following information about the session that is in progress:

  1. Accumulated acoustic score for the session.
  2. Number of VOs that were satisfied (e.g. answer matches the answer in the profile).
  3. Total number of VOs already invoked.
  4. List of the VOs already invoked.
  5. Value returned by the last VO, in three formats: string, float, and Boolean.
  6. The user name (so that the user profile can be retrieved).
  7. Policy name (so that the policy can be retrieved).
  8. Current state ID.
  9. Acoustic threshold (default value).
  10. Total number of VOs in the user profile.
  11. Default maximum number of dissatisfied VOs.
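
The context fields listed above map naturally onto a small record type. The Python sketch below is one possible representation, not the CBS data model: the field names, the default values, the choice to accumulate the acoustic score by summation, and the way the last returned value is converted to its three formats are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Context:
    user_name: str                      # item 6
    policy_name: str                    # item 7
    state_id: str                       # item 8
    total_profile_vos: int              # item 10
    acoustic_threshold: float = 0.5     # item 9 (hypothetical default)
    max_dissatisfied_vos: int = 2       # item 11 (hypothetical default)
    acoustic_score: float = 0.0         # item 1
    satisfied_vos: int = 0              # item 2
    invoked_vos: List[str] = field(default_factory=list)  # item 4
    last_value_str: Optional[str] = None      # item 5, string form
    last_value_float: Optional[float] = None  # item 5, float form
    last_value_bool: Optional[bool] = None    # item 5, Boolean form

    @property
    def total_invoked(self) -> int:
        """Item 3, derived from the list of invoked VOs."""
        return len(self.invoked_vos)

    def record(self, vo: str, satisfied: bool, value: str,
               acoustic_delta: float = 0.0) -> None:
        """Update the context after one VO (assumed update rule)."""
        self.invoked_vos.append(vo)
        self.satisfied_vos += int(satisfied)
        self.acoustic_score += acoustic_delta  # assumption: scores are summed
        self.last_value_str = value
        try:
            self.last_value_float = float(value)
        except ValueError:
            self.last_value_float = None
        self.last_value_bool = satisfied


ctx = Context("alice", "default_policy", "s1", total_profile_vos=3)
ctx.record("favorite_color", satisfied=True, value="blue", acoustic_delta=0.5)
print(ctx.total_invoked, ctx.satisfied_vos, ctx.acoustic_score)
```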


At every turn/iteration the CBS updates the context based on the user's input. The CBPM checks which of the conditions of the current state in the policy FSM are met (conditions on the values of the context). Accordingly, the CBPM decides which next state to branch to.
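
The branching step can be illustrated by attaching a list of conditions to each state and evaluating them against the context. In the Python sketch below, the context is a plain dictionary and the conditions are ordinary functions tried in order; the key names, the specific conditions, and the first-match rule are assumptions for this example, since the actual condition language of the CBPM is not described here.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, float]
Condition = Callable[[Context], bool]


def next_state(transitions: List[Tuple[Condition, str]], context: Context) -> str:
    """Return the target of the first transition whose condition holds."""
    for condition, target in transitions:
        if condition(context):
            return target
    raise ValueError("no transition condition is met for this context")


# Hypothetical transitions for one policy state, tried in order.
TRANSITIONS: List[Tuple[Condition, str]] = [
    # Too many unsatisfied VOs: reject.
    (lambda c: c["invoked"] - c["satisfied"] > c["max_dissatisfied"], "REJECT"),
    # Strong voice match plus at least two satisfied VOs: accept.
    (lambda c: c["acoustic_score"] >= c["acoustic_threshold"]
               and c["satisfied"] >= 2, "ACCEPT"),
    # Otherwise keep asking.
    (lambda c: True, "ask_another"),
]

context = {"invoked": 2, "satisfied": 2, "max_dissatisfied": 1,
           "acoustic_score": 0.9, "acoustic_threshold": 0.5}
print(next_state(TRANSITIONS, context))
```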


3. Implementation Details

The CBS currently runs on Linux/UNIX and Windows. It was recently load-tested and demonstrated high stability, resource efficiency, and scalability with the number of concurrent channels. The CBS works in a stateless mode: multiple servers can run concurrently, and each individual turn can be directed to a different server. This makes the system robust, since no turn depends on a particular server remaining available. Policies, profiles, and VOs are scripted in XML.
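
One way to picture the stateless mode is a turn handler that receives the whole session context with each request and returns the updated context, keeping nothing in server memory between turns. The Python sketch below assumes the context travels with the request as a JSON string; how the CBS actually carries the context between turns is not specified here, and the function and field names are hypothetical.

```python
import json


def handle_turn(context_json: str, vo_satisfied: bool) -> str:
    """Process one turn using only the data passed in with the request.

    Because nothing is read from or written to server-side session state,
    any server instance can handle any turn of any session.
    """
    context = json.loads(context_json)
    context["invoked"] += 1
    if vo_satisfied:
        context["satisfied"] += 1
    return json.dumps(context, sort_keys=True)


# Two turns of one session; each call could run on a different server.
start = json.dumps({"invoked": 0, "satisfied": 0}, sort_keys=True)
after_first = handle_turn(start, vo_satisfied=True)
after_second = handle_turn(after_first, vo_satisfied=False)
print(after_second)
```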