Voice recording system

The voice recording system lets you record a call as soon as the customer connects. It’s modelled as a workflow system in which a coordinator (orchestrator) manages the flow of data among the workers that process the recorded media. Communication between components is asynchronous and made reliable by Kafka (CPCE Eventing).

Components

Workflow manager service

With the voice processing system, you can currently perform four processing steps on recorded media: copying the recording file from Twilio, decryption, redaction, and transcription. More steps may be added in the future. The workflow manager service coordinates these steps from start to finish.

How it works

The workflow manager acts as the orchestration layer of the complete system. It’s built on a state machine that manages the execution and state of each workflow using messages, which are persisted so that the execution logic can be traced.

The workflow service runs on a JSON-based state machine: when a message of a certain type arrives, the workflow moves to the next state and new messages are generated. To do this, the workflow manager interacts with different voice processors (recording worker, transcription worker, or redaction worker) based on the type of incoming message it receives.
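As a rough illustration of a message-driven, JSON-defined state machine, here is a minimal Python sketch. The state names, message types, JSON layout, and the Workflow class are all hypothetical, and the step order (copy, decrypt, redact, transcribe) is assumed from the list of processing steps; the real workflow definition may look quite different.

```python
import json

# Hypothetical workflow definition. State names, message types, and the
# step order (copy, decrypt, redact, transcribe) are assumptions.
WORKFLOW_JSON = """
{
  "start": "COPYING",
  "states": {
    "COPYING": {"on": {"RecordingCopied": {"next": "DECRYPTING", "emit": "DecryptRecording"}}},
    "DECRYPTING": {"on": {"RecordingDecrypted": {"next": "REDACTING", "emit": "RedactRecording"}}},
    "REDACTING": {"on": {"RecordingRedacted": {"next": "TRANSCRIBING", "emit": "TranscribeRecording"}}},
    "TRANSCRIBING": {"on": {"RecordingTranscribed": {"next": "COMPLETED", "emit": null}}},
    "COMPLETED": {"on": {}}
  }
}
"""


class Workflow:
    """Message-driven state machine loaded from a JSON definition."""

    def __init__(self, definition):
        self.states = definition["states"]
        self.state = definition["start"]
        # In-memory stand-in for the persisted message log used for tracing.
        self.history = []

    def handle(self, message_type):
        """Apply an incoming message and return the message type to publish next (or None)."""
        transition = self.states[self.state]["on"].get(message_type)
        if transition is None:
            raise ValueError(f"{message_type!r} is not valid in state {self.state!r}")
        self.history.append((self.state, message_type))
        self.state = transition["next"]
        return transition["emit"]


workflow = Workflow(json.loads(WORKFLOW_JSON))
next_message = workflow.handle("RecordingCopied")  # moves COPYING -> DECRYPTING
```

Keeping the transitions in data rather than code means a new processing step could be added by editing the JSON, which fits the note that more steps may be added in the future.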

Voice processors

Each voice processor performs one type of activity, abstracting a single message-based process. The processors are stateless, so they can scale out as your voice processing volume increases. They interact with in-house or third-party providers.

Examples:

  • Recording worker interacts with Twilio and S3 to copy voice recordings to an Expedia-owned S3 storage system.
  • Transcription worker interacts with Google Transcribe service to get transcriptions of audio recordings.
  • Redaction worker interacts with VoiceBase (third party) to get the PCI-compliant, redacted file.
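The stateless, one-activity-per-processor idea can be sketched in Python as follows. The message types, the dictionary-based message shape, and the function names are hypothetical, and the provider calls are replaced with comments; this only shows how a worker's output depends solely on its input message.

```python
from typing import Callable, Dict

Message = Dict[str, str]


def recording_worker(message: Message) -> Message:
    # A real worker would copy the file from Twilio to S3 here.
    return {"type": "RecordingCopied", "recordingId": message["recordingId"]}


def redaction_worker(message: Message) -> Message:
    # A real worker would call the redaction provider here.
    return {"type": "RecordingRedacted", "recordingId": message["recordingId"]}


def transcription_worker(message: Message) -> Message:
    # A real worker would call the transcription provider here.
    return {"type": "RecordingTranscribed", "recordingId": message["recordingId"]}


# Hypothetical routing table: one worker per incoming message type.
PROCESSORS: Dict[str, Callable[[Message], Message]] = {
    "CopyRecording": recording_worker,
    "RedactRecording": redaction_worker,
    "TranscribeRecording": transcription_worker,
}


def process(message: Message) -> Message:
    """Route a message to the worker registered for its type."""
    return PROCESSORS[message["type"]](message)
```

Because no worker keeps state between calls, any number of identical instances could consume from the same topic.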

Block diagram

The diagram below shows how the workflow manager interacts with various voice processors using the Kafka messaging system.

[Diagram: voice recording]

Events generated from voice recording systems

The voice recording system publishes each completed event within a recording as a “VoiceRecordingEvent,” which gets consumed by DI for analysis.

Schema - VoiceRecordingEvent

{
   "name": "VoiceRecordingEvent",
   "namespace": "com.expedia.cpce.conversation.model.v2",
   "type": "record",
   "fields": [
       {
           "name": "partnerId",
           "type": "string"
       },
       {
           "name": "conversationId",
           "type": "string"
       },
       {
           "name": "recordingId",
           "type": "string"
       },
       {
           "name": "callId",
           "type": "string"
       },
       {
           "name": "participantId",
           "type": ["string", "null"]
       },
       {
           "name": "participantType",
           "type": ["string", "null"]
       },
       {
           "name": "participantSsoLogin",
           "type": ["string", "null"]
       },
       {
           "name": "recordingLocation",
           "type": "string"
       },
       {
           "name": "recordingLocationType",
           "type": "string"
       },
       {
           "name": "recordingDurationSec",
           "type": "string"
       },
       {
           "name": "recordingChannels",
           "type": "int"
       },
       {
           "name": "recordingProcessingType",
           "type": "string"
       },
       {
           "name": "recordingStartTime",
           "type": "string"
       },
       {
           "name": "isTranscriptionPresent",
           "type": "boolean"
       }
   ]
}

Sample event

{
	"partnerId": "33bea4ef-93c2-4166-b297-2b5e07b35235",
	"conversationId": "d65e4bd0-aa0b-403c-8291-1860ed6b1848",
	"recordingId": "5b9951fe-02f7-4bd6-af1d-a0f32e398893",
	"callId": "479eed3e-f314-47ac-b267-ad577c17ad95",
	"participantId": "067c2e79-6e96-4480-8727-7a67afe25e8c",
	"participantType": "traveler",
	"participantSsoLogin": null,
	"recordingLocation": "recordings/33bea4ef-93c2-4166-b297-2b5e07b35235/d65e4bd0-aa0b-403c-8291-1860ed6b1848/5b9951fe-02f7-4bd6-af1d-a0f32e398893.wav",
	"recordingLocationType": "S3",
	"recordingDurationSec": "12",
	"recordingChannels": 2,
	"recordingProcessingType": "PostCall",
	"recordingStartTime": "1612364436000",
	"isTranscriptionPresent": false
}
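A consumer of this event has to convert the string-typed fields before using them. The Python sketch below assumes that recordingStartTime is a Unix timestamp in milliseconds (UTC) and that recordingDurationSec is a whole number of seconds; neither is stated by the schema, which only declares both as strings. The function name recording_window is hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

# Trimmed copy of the sample event above.
SAMPLE_EVENT = json.loads("""
{
  "recordingId": "5b9951fe-02f7-4bd6-af1d-a0f32e398893",
  "recordingDurationSec": "12",
  "recordingStartTime": "1612364436000"
}
""")


def recording_window(event):
    """Return (start, end) of the recording as timezone-aware UTC datetimes."""
    # Assumption: recordingStartTime holds epoch milliseconds.
    start = datetime.fromtimestamp(int(event["recordingStartTime"]) / 1000, tz=timezone.utc)
    # Assumption: recordingDurationSec holds whole seconds.
    end = start + timedelta(seconds=int(event["recordingDurationSec"]))
    return start, end


start, end = recording_window(SAMPLE_EVENT)
print(start.isoformat())  # 2021-02-03T15:00:36+00:00
```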

Resource Manager integration

Once the workflow completes successfully, the recording is stored in the Resource Manager (RM). Within RM, recording resources are stored under the conversation container as sub-resources. Each conversation can have multiple recordings.

Resource endpoints

Recordings are stored in RM under the hierarchy: partner > conversations > recordings. The API endpoints to fetch partner recordings for a conversation are below.

  1. To fetch all recordings for a conversation:

    GET → /partners/{pguid}/conversations/{cguid}/recordings

    Request : {base_url}/partners/d84a7cfd-add2-40bf-9f5f-04622ba0d8fd/conversations/c8cc3508-b5dc-4de4-86c0-b519d58be66e/recordings

    Response :

    [
       {
           "metadata": {
               "id": "d9c2264d-09b6-4dbd-ab7b-2dba8f944a88",
               "resourceType": "voicerecording",
               "schemaUri": "/schemas/cp.voicerecording_v4.VoiceRecording",
               "container": {
                   "resourceUri": "/partners/d84a7cfd-add2-40bf-9f5f-04622ba0d8fd/conversations/c8cc3508-b5dc-4de4-86c0-b519d58be66e/"
               },
               "audit": {
                   "createdOn": "2021-02-01T22:49:44.743Z",
                   "createdBy": "SERVICE : 2031ec1lcs5182vkd7hko5aael",
                   "updatedOn": "2021-02-01T22:49:44.743Z",
                   "updatedBy": "SERVICE : 2031ec1lcs5182vkd7hko5aael"
               },
               "state": "ACTIVE",
               "resourceUri": "/partners/d84a7cfd-add2-40bf-9f5f-04622ba0d8fd/conversations/c8cc3508-b5dc-4de4-86c0-b519d58be66e/recordings/d9c2264d-09b6-4dbd-ab7b-2dba8f944a88"
           },
           "id": "d9c2264d-09b6-4dbd-ab7b-2dba8f944a88",
           "partnerId": "d84a7cfd-add2-40bf-9f5f-04622ba0d8fd",
           "conversationId": "c8cc3508-b5dc-4de4-86c0-b519d58be66e",
           "recordingId": "b9e9abca-e33f-4a0f-b20d-d996d7732e47",
           "callId": "ea364eb8-d7f5-4dc7-971b-f70c6ca52635",
           "participantId": "e0df1008-e4ea-4788-962c-933911881f3b",
           "participantType": "traveler",
           "recordingLocation": "recordings/d84a7cfd-add2-40bf-9f5f-04622ba0d8fd/c8cc3508-b5dc-4de4-86c0-b519d58be66e/b9e9abca-e33f-4a0f-b20d-d996d7732e47.wav",
           "recordingLocationType": "S3",
           "recordingDurationSec": "213",
           "recordingChannels": 2,
           "recordingProcessingType": "PostCall",
           "recordingStartTime": "1612219563000",
           "isTranscriptionPresent": false
       }
    ]

  2. To fetch a particular recording for a conversation:

    GET → /partners/{pguid}/conversations/{cguid}/recordings/{rguid}

    Request : {base_url}/partners/d84a7cfd-add2-40bf-9f5f-04622ba0d8fd/conversations/c8cc3508-b5dc-4de4-86c0-b519d58be66e/recordings/d9c2264d-09b6-4dbd-ab7b-2dba8f944a88

    Response :

    {
       "metadata": {
           "id": "d9c2264d-09b6-4dbd-ab7b-2dba8f944a88",
           "resourceType": "voicerecording",
           "schemaUri": "/schemas/cp.voicerecording_v4.VoiceRecording",
           "container": {
               "resourceUri": "/partners/d84a7cfd-add2-40bf-9f5f-04622ba0d8fd/conversations/c8cc3508-b5dc-4de4-86c0-b519d58be66e/"
           },
           "audit": {
               "createdOn": "2021-02-01T22:49:44.743Z",
               "createdBy": "SERVICE : 2031ec1lcs5182vkd7hko5aael",
               "updatedOn": "2021-02-01T22:49:44.743Z",
               "updatedBy": "SERVICE : 2031ec1lcs5182vkd7hko5aael"
           },
           "state": "ACTIVE",
           "resourceUri": "/partners/d84a7cfd-add2-40bf-9f5f-04622ba0d8fd/conversations/c8cc3508-b5dc-4de4-86c0-b519d58be66e/recordings/d9c2264d-09b6-4dbd-ab7b-2dba8f944a88"
       },
       "id": "d9c2264d-09b6-4dbd-ab7b-2dba8f944a88",
       "partnerId": "d84a7cfd-add2-40bf-9f5f-04622ba0d8fd",
       "conversationId": "c8cc3508-b5dc-4de4-86c0-b519d58be66e",
       "recordingId": "b9e9abca-e33f-4a0f-b20d-d996d7732e47",
       "callId": "ea364eb8-d7f5-4dc7-971b-f70c6ca52635",
       "participantId": "e0df1008-e4ea-4788-962c-933911881f3b",
       "participantType": "traveler",
       "recordingLocation": "recordings/d84a7cfd-add2-40bf-9f5f-04622ba0d8fd/c8cc3508-b5dc-4de4-86c0-b519d58be66e/b9e9abca-e33f-4a0f-b20d-d996d7732e47.wav",
       "recordingLocationType": "S3",
       "recordingDurationSec": "213",
       "recordingChannels": 2,
       "recordingProcessingType": "PostCall",
       "recordingStartTime": "1612219563000",
       "isTranscriptionPresent": false
    }
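A client for the two endpoints above might look like the Python sketch below. The base URL and any authentication headers are assumptions, since this page does not specify them, and the function names are hypothetical. Note that in the sample request the last path segment matches the resource "id" in the response, not its "recordingId".

```python
import json
import urllib.request
from typing import Dict, Optional


def recordings_url(base_url: str, partner_id: str, conversation_id: str,
                   resource_id: Optional[str] = None) -> str:
    """Build the URL for all recordings of a conversation, or for one recording."""
    url = (f"{base_url.rstrip('/')}/partners/{partner_id}"
           f"/conversations/{conversation_id}/recordings")
    if resource_id is not None:
        url += f"/{resource_id}"
    return url


def fetch_recordings(base_url: str, partner_id: str, conversation_id: str,
                     resource_id: Optional[str] = None,
                     headers: Optional[Dict[str, str]] = None):
    """GET the recording resource(s) and decode the JSON body.

    `headers` is where credentials would go; the scheme is an assumption.
    """
    request = urllib.request.Request(
        recordings_url(base_url, partner_id, conversation_id, resource_id),
        headers=headers or {},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Without a resource ID the call returns a list of recordings; with one it returns a single object, matching the two sample responses.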

Voice transcription

The voice recording system stores recorded customer calls with Expedia. You can transcribe these recordings and use the transcripts to identify customer intents or support on-call agents. To do so, use a third-party voice provider like Google, Amazon, or Deepgram.

Voice recording playback

Playback lets you serve the voice recording for a particular conversation for internal quality and monitoring purposes.

Plug the streaming API below into the HTML audio tag of your web page. Currently, this API is consumed by the analytics console for playback. It’s protected by RBAC and business location context.

API

  • GET → /partners/{partnerId}/conversations/{conversationId}/recordings/{recordingId}/play
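As a small sketch of wiring the playback endpoint into an audio tag, the Python helpers below build the play URL and render the tag server-side. The base URL and both function names are hypothetical, and the sketch assumes the browser session already carries whatever credentials the RBAC check requires.

```python
from html import escape


def play_url(base_url: str, partner_id: str, conversation_id: str, recording_id: str) -> str:
    """Build the streaming URL for one recording."""
    return (f"{base_url.rstrip('/')}/partners/{partner_id}"
            f"/conversations/{conversation_id}/recordings/{recording_id}/play")


def audio_tag(src: str) -> str:
    """Render an HTML audio element that streams from `src` on demand."""
    return f'<audio controls preload="none" src="{escape(src, quote=True)}"></audio>'
```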

Configurations in voice recording systems

Clean up expired recordings

The voice recording system supports separate expiry times for each partner, which are configurable as resources inside RM. A cleanup job runs on a schedule to delete any expired recordings for each partner.
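The selection step of such a cleanup job could be sketched in Python as follows. This assumes expiry is expressed as a retention period in days per partner and is measured from recordingStartTime (taken to be epoch milliseconds); the page states neither, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List


def expired_recordings(recordings: Iterable[dict],
                       retention_days_by_partner: Dict[str, int],
                       now: datetime) -> List[dict]:
    """Return the recordings that are older than their partner's retention period."""
    expired = []
    for recording in recordings:
        retention_days = retention_days_by_partner.get(recording["partnerId"])
        if retention_days is None:
            continue  # no expiry configured for this partner, so keep the recording
        # Assumption: recordingStartTime holds epoch milliseconds.
        started = datetime.fromtimestamp(
            int(recording["recordingStartTime"]) / 1000, tz=timezone.utc)
        if now - started > timedelta(days=retention_days):
            expired.append(recording)
    return expired
```

Passing the current time in as a parameter keeps the function deterministic, so the same selection logic can be exercised with fixed dates.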

Upcoming configuration capabilities

  • Partner-specific action on recordings - The current design will be extended so that the recording pipeline can be configured for your specific requirements.

Examples:

  • You can opt not to use steps of the default recording pipeline like redaction, transcription, or sentiment analysis.
  • You can choose where recordings are stored.
  • You can configure notifications or events to send at different stages of the pipeline. For example, you can send a notification to do further processing when transcription is completed.
  • To support this extension, the workflow JSON will include a condition that checks whether the current step needs to be executed, and a notification webhook for partner-level processing beyond the default pipeline.

  • Recording skills configurability - Currently, recording is enabled by default for voice calls. In the future, you’ll be able to toggle recording on or off using the following options within the recording skill:

    1. Customer profile: Configure 'other rules' of recording based on caller information.
    2. Agent profile: Configure 'other rules' of recording based on agent.
    3. Location: Opt in or out of recordings at the start of a call based on location. For example, you may wish to skip the welcome message for a certain locale.
    4. Reduce recordings: For example, you may want to decrease the share of calls recorded in a country from 100 percent (default) to 30 percent.
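Since these capabilities are not yet available, the following Python sketch is purely illustrative. It shows one way a partner-level configuration could skip pipeline steps, and one way to record only a percentage of calls. The configuration key disabledSteps, the step names, and the hash-based sampling approach are all assumptions, not the planned design.

```python
import hashlib
from typing import List

# Assumed names for the default pipeline steps.
DEFAULT_STEPS = ["copy", "decryption", "redaction", "transcription"]


def enabled_steps(partner_config: dict) -> List[str]:
    """Return the default steps minus those the partner opted out of."""
    disabled = set(partner_config.get("disabledSteps", []))
    return [step for step in DEFAULT_STEPS if step not in disabled]


def should_record(call_id: str, percent: int) -> bool:
    """Deterministically select roughly `percent` percent of calls for recording.

    Hashing the call ID gives the same decision every time for the same call,
    unlike a random draw.
    """
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
    return bucket < percent
```

With percent set to 100 every call is recorded, which corresponds to the current default.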