System Architecture Diagram

When a customer call comes in, the call enters the SoftSwitch dialplan. Within the dialplan, a curl command is executed to call an API endpoint provided by EasyCallCenter365, sending the call UUID, caller number, callee number, and other related information.
At this point, the call is taken over and managed by EasyCallCenter365.
EasyCallCenter365 starts call recording and attempts to establish a connection with the large language model (LLM).
The LLM returns responses through a streaming HTTP response. While EasyCallCenter365 receives the text stream, it simultaneously sends the text to the speak command for text-to-speech synthesis.
After the SoftSwitch TTS module receives the speech synthesis command, it extracts the text parameters, connects to the TTS server, and sends a speech synthesis request.
Since text is continuously generated throughout the process, the TTS module sends text for synthesis while simultaneously receiving the synthesized audio stream, decoding the audio, and playing it back in real time.
Throughout the entire call, the SoftSwitch ASR module continuously performs real-time speech recognition.

The ASR module is responsible for sending the audio stream and receiving the speech recognition result text.
Through Event Socket messages, SoftSwitch sends the recognized speech text back to EasyCallCenter365.
EasyCallCenter365 packages the speech recognition result together with the previous conversation context and sends it to the LLM.
The process then continues in a loop until the phone call ends.

No comments to display

Back to top