Text Conversations with Amazon's Alexa

/media/uploads/amussa/flow_diagram1.png

Introduction

Project for ECE 4180, Georgia Institute of Technology.

Project by: Allie Alexander Mussa and Jonathan Osei-Owusu

The Amazon Echo and Echo Dot devices host an AI, Alexa, that serves as a personal assistant. Alexa's only form of communication is voice commands received through the system's microphone array, which can be a problem if the user is not near the Amazon Echo device. A solution to this problem is text conversations. Being able to send Alexa a text message from a cell phone or an instant message from a PC allows for constant communication with Alexa, increasing the productivity and usability of the Amazon Echo. This form of communication is also considerably more secure, as microphones are not actively listening for commands and will not be able to "adapt(s) to your speech patterns" (Alexa App Description, Apple App Store).

As a proof of concept, the goal was to develop a system that takes raw text data, passes it to Alexa for processing, and returns Alexa's response as a text string. Research quickly showed that Alexa is built entirely around voice recognition software that converts speech to text, so it expects spoken input. To work around this, a text-to-speech module was used, the EMIC 2: https://developer.mbed.org/users/4180_1/notebook/emic-2-text-to-speech-engine/ .

The Adafruit FONA 800: https://developer.mbed.org/users/jbaker66/notebook/adafruit-fona-800-minigsm/ was used to send and receive text messages.

A Windows application was developed for PC chat with Alexa. Communication between the LPC1768 and the application ran over a local HTTP server. To connect to the HTTP server, the mbed used the Adafruit HUZZAH ESP8266: https://developer.mbed.org/users/ausdong/notebook/using-the-adafruit-huzzah-esp8266-to-add-wi-fi-to-/ .

A demonstration video of the project can be seen below.

NOTE: This mbed project is a proof of concept and is not finalized. Areas for improvement are listed in the "Areas for Improvement" section below. The temporary workarounds used to demonstrate the proof of concept are explained in the comments of the code linked below.

Code

Import program: Alexa_Text

Allows SMS to be sent to query Alexa. Also allows HTTP requests to be sent to Alexa via an ESP8266-hosted web server.
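
As a rough illustration of the text-to-speech step, the sketch below frames an incoming text query as a single EMIC 2 speak command. It is not taken from the Alexa_Text program: makeEmicSpeakCommand and kEmicMaxChars are hypothetical names, and the sketch assumes the EMIC 2 serial protocol in which "S" followed by the text and a newline speaks that text, with a limit of 1023 characters per message. On the mbed, the resulting string would be written to the serial port wired to SIN.

```cpp
#include <cstddef>
#include <string>

// Assumption: the EMIC 2 "S" command speaks the text that follows it,
// ends at a newline, and accepts at most 1023 characters of text.
const std::size_t kEmicMaxChars = 1023;

// Hypothetical helper: turn a raw text query into one EMIC 2 command line.
std::string makeEmicSpeakCommand(const std::string& query) {
    std::string text;
    for (char c : query) {
        // Replace line breaks so the query stays on a single command line.
        text += (c == '\r' || c == '\n') ? ' ' : c;
    }
    if (text.size() > kEmicMaxChars) {
        text.resize(kEmicMaxChars);  // drop anything past the assumed limit
    }
    return "S" + text + "\n";
}
```

For example, makeEmicSpeakCommand("Alexa, what time is it?") yields the line "SAlexa, what time is it?" followed by a newline.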


Pin Assignments


Overview Photo


/media/uploads/amussa/img_0364.jpg

Adafruit FONA 800**

LPC1768  | FONA 800 | EXT. Device
N/C      | Bat      | Li Rechg Bat*
N/C      | VIO      |
N/C      | MicroUSB | USB Pwr Trnsfrm*
GND      | GND      |
N/C      | SPKR+    |
N/C      | SPKR-    |
P12      | RST      |
N/C      | PS       |
GND      | KEY      |
P11      | RI       |
P14 (RX) | TX       |
P13 (TX) | RX       |
N/C      | NS       |

* NOTE: The device was powered via the MicroUSB port for battery charging. A 5V power transformer/converter was used with the MicroUSB cable.

** For more information on the device, visit the link in the "Introduction" section above.


Adafruit HUZZAH ESP8266**

LPC1768  | HUZZAH | EXT. 5V DC SUPPLY*
GND      | GND    | GND
P27 (RX) | TX     |
P28 (TX) | RX     |
         | V+     | 5V DC

* NOTE: The device was powered via an external 5V DC power supply with a current rating greater than 500 mA.

** For more information on the device, visit the link in the "Introduction" section above.


EMIC 2**

LPC1768  | EMIC 2 | 3.5mm SparkFun** | EXT. SPEAKER        | EXT. 5V DC SUPPLY*
         | 5V     |                  |                     | 5V DC
GND      | GND    | SLEEVE, TIP      | GND (through 3.5mm) | GND
P9 (TX)  | SIN    |                  |                     |
P10 (RX) | SOUT   |                  |                     |
         | SP+    | RING1            | LEFT (3.5mm cable)  |
         | SP-    | RING2            | RIGHT (3.5mm cable) |
  • An external speaker was used to improve the audio quality of the EMIC 2 voice, for clearer communication with Alexa.

* NOTE: The device was powered via an external 5V DC power supply with a current rating greater than 500 mA.

** For more information on the EMIC 2, visit the link in the "Introduction" section above. For more information on the 3.5 mm SparkFun breakout, visit https://www.sparkfun.com/products/11570

Areas for Improvement

Elimination of Text to Speech

The original plan was to pass the text commands, from either the Windows application or a text message, to the LPC1768 and then directly to Amazon Alexa via a custom Alexa skill in conjunction with a Lambda function. This, however, proved difficult, as Alexa revolves around its speech-to-text signal processing algorithm and expects an audio file.

If no way was found to pass text, one idea was to sum the EMIC 2 outputs SP+ and SP-, sample the result on the LPC1768 through an AnalogIn pin, save the samples in an appropriate audio file format, and send the saved file to Alexa for processing.
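
The "save as an audio file" step could look roughly like the sketch below, which wraps a buffer of unsigned 16-bit ADC readings in a mono 16-bit PCM WAV container. This is only an illustration under stated assumptions: makeWav, appendLE and appendTag are hypothetical helpers, the samples are assumed to be 16-bit readings centered at mid-scale, and the sample rate and the audio format Alexa would actually accept are not specified by this project.

```cpp
#include <cstdint>
#include <vector>

// Append the low `bytes` bytes of `value` in little-endian order.
static void appendLE(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Append a four-character chunk tag such as "RIFF".
static void appendTag(std::vector<std::uint8_t>& out, const char* tag) {
    out.insert(out.end(), tag, tag + 4);
}

// Hypothetical helper: build a mono, 16-bit PCM WAV image in memory.
// adcSamples are assumed to be unsigned readings with silence at 32768.
std::vector<std::uint8_t> makeWav(const std::vector<std::uint16_t>& adcSamples,
                                  std::uint32_t sampleRateHz) {
    const std::uint32_t dataBytes =
        static_cast<std::uint32_t>(adcSamples.size()) * 2;
    std::vector<std::uint8_t> wav;
    appendTag(wav, "RIFF");
    appendLE(wav, 36 + dataBytes, 4);    // size of everything after this field
    appendTag(wav, "WAVE");
    appendTag(wav, "fmt ");
    appendLE(wav, 16, 4);                // fmt chunk size for plain PCM
    appendLE(wav, 1, 2);                 // audio format: 1 = PCM
    appendLE(wav, 1, 2);                 // channels: mono
    appendLE(wav, sampleRateHz, 4);
    appendLE(wav, sampleRateHz * 2, 4);  // byte rate = rate * block align
    appendLE(wav, 2, 2);                 // block align: 2 bytes per frame
    appendLE(wav, 16, 2);                // bits per sample
    appendTag(wav, "data");
    appendLE(wav, dataBytes, 4);
    for (std::uint16_t s : adcSamples) {
        // Shift mid-scale to zero so the value reads as signed 16-bit PCM.
        appendLE(wav, static_cast<std::uint16_t>(s - 32768u), 2);
    }
    return wav;
}
```

The returned bytes could then be written to local storage or sent over the network; which of these fits depends on how the audio would be delivered to Alexa, which remains an open question here.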

C# GUI

Upon typing a recognizable command in the C# GUI's chatbox, a web browser (e.g. Google Chrome) opens and submits a URL query. The GUI is currently incapable of sending a POST request to the ESP8266-hosted web server without opening a dedicated browser. As an improvement, text strings should trigger a POST request directly, without the need for a web browser.
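
To make the proposed improvement concrete, the sketch below builds the raw HTTP/1.1 POST request that a client could send over a TCP socket instead of launching a browser. It is written in C++ to match the firmware rather than the C# GUI, and it is only an illustration: makePostRequest is a hypothetical helper, and the host, path and text/plain body format are assumptions, since the request format expected by the ESP8266-hosted server is not described here.

```cpp
#include <string>

// Hypothetical helper: build a raw HTTP/1.1 POST request carrying the chat text.
// host and path are placeholders for wherever the ESP8266 server listens.
std::string makePostRequest(const std::string& host,
                            const std::string& path,
                            const std::string& body) {
    std::string req;
    req += "POST " + path + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    req += "Content-Type: text/plain\r\n";  // assumed body format
    req += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    req += "Connection: close\r\n";
    req += "\r\n";                          // blank line ends the headers
    req += body;
    return req;
}
```

A GUI would open a socket to the server, write this string, and read the reply, so no browser window is needed. Most HTTP client libraries produce an equivalent request internally.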

Actual text responses

For this proof of concept, some pre-programmed responses were stored in the LPC1768 code and sent as the response to the question asked. Ideally, a custom Alexa skill in conjunction with a Lambda function would gather Alexa's response as a text string and send it directly to the LPC1768.
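
A minimal sketch of the pre-programmed response idea is a lookup table keyed on a normalized form of the question. The names normalize and lookupResponse, the table entries, and the fallback text are all hypothetical and are not the responses stored in the actual project code.

```cpp
#include <cctype>
#include <map>
#include <string>

// Lowercase the query, drop punctuation, and trim surrounding spaces so that
// "What is your name?" and "what is your name" map to the same key.
std::string normalize(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        if (std::isalnum(c) || c == ' ') {
            out += static_cast<char>(std::tolower(c));
        }
    }
    const std::string::size_type first = out.find_first_not_of(' ');
    if (first == std::string::npos) {
        return "";
    }
    const std::string::size_type last = out.find_last_not_of(' ');
    return out.substr(first, last - first + 1);
}

// Hypothetical canned responses; the real project stores its own set.
std::string lookupResponse(const std::string& query) {
    static const std::map<std::string, std::string> kResponses = {
        {"what is your name", "My name is Alexa."},
        {"hello", "Hi there."},
    };
    const auto it = kResponses.find(normalize(query));
    return it != kResponses.end() ? it->second : "Sorry, I don't know that one.";
}
```

With this table, lookupResponse("What is your name?") returns "My name is Alexa.", and any unrecognized question returns the fallback string.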

