Repository: Olney1/ChatGPT-OpenAI-Smart-Speaker Branch: main Commit: c00deccbefd5 Files: 17 Total size: 62.7 KB Directory structure: gitextract_x984fnsf/ ├── .gitattributes ├── .github/ │ └── FUNDING.yml ├── .gitignore ├── LICENSE ├── README.md ├── alexa_led_pattern.py ├── apa102.py ├── chat.py ├── create_messages.py ├── deprecated/ │ └── smart_speaker.py ├── pi.py ├── requirements.txt ├── requirements_mac.txt ├── test_agent.py └── wake_words/ └── custom_model/ ├── Jeffers_Mac.ppn ├── Jeffers_Pi.ppn └── LICENSE.txt ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitattributes ================================================ *.mp4 filter=lfs diff=lfs merge=lfs -text videos/long_demo.mp4 filter=lfs diff=lfs merge=lfs -text ================================================ FILE: .github/FUNDING.yml ================================================ # These are supported funding model platforms github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2] patreon: # Replace with a single Patreon username open_collective: # Replace with a single Open Collective username ko_fi: # Replace with a single Ko-fi username tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry liberapay: # Replace with a single Liberapay username issuehunt: # Replace with a single IssueHunt username lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry polar: # Replace with a single Polar username buy_me_a_coffee: olney1 custom: ['https://ai-solutions.ai'] ================================================ FILE: .gitignore ================================================ Credit to: https://djangowaves.com/tips-tricks/gitignore-for-a-django-project/ .DS_Store # Ignore Pipfile and Pipfile.lock Pipfile 
Pipfile.lock # Django # *.log *.pot *.pyc __pycache__ db.sqlite3 # Backup files # *.bak # If you are using PyCharm # # User-specific stuff .idea/**/workspace.xml .idea/**/tasks.xml .idea/**/usage.statistics.xml .idea/**/dictionaries .idea/**/shelf # AWS User-specific .idea/**/aws.xml # Generated files .idea/**/contentModel.xml # Sensitive or high-churn files .idea/**/dataSources/ .idea/**/dataSources.ids .idea/**/dataSources.local.xml .idea/**/sqlDataSources.xml .idea/**/dynamic.xml .idea/**/uiDesigner.xml .idea/**/dbnavigator.xml # Gradle .idea/**/gradle.xml .idea/**/libraries # File-based project format *.iws # IntelliJ out/ # JIRA plugin atlassian-ide-plugin.xml # Python # *.py[cod] *$py.class # Distribution / packaging .Python build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib/ lib64/ parts/ sdist/ var/ wheels/ *.egg-info/ .installed.cfg *.egg *.manifest *.spec # Installer logs pip-log.txt pip-delete-this-directory.txt # Unit test / coverage reports htmlcov/ .tox/ .coverage .coverage.* .cache .pytest_cache/ nosetests.xml coverage.xml *.cover .hypothesis/ # Jupyter Notebook .ipynb_checkpoints # pyenv .python-version # celery celerybeat-schedule.* # SageMath parsed files *.sage.py # Environments .env .venv env/ venv/ ENV/ env.bak/ venv.bak/ # mkdocs documentation /site # mypy .mypy_cache/ # Sublime Text # *.tmlanguage.cache *.tmPreferences.cache *.stTheme.cache *.sublime-workspace *.sublime-project # sftp configuration file sftp-config.json # Package control specific files Package Control.last-run Control.ca-list Control.ca-bundle Control.system-ca-bundle GitHub.sublime-settings # Visual Studio Code # .vscode/* !.vscode/settings.json !.vscode/tasks.json !.vscode/launch.json !.vscode/extensions.json .history # Additional Removals response.mp3 test.py ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) 2023 Ben Permission is hereby granted, free of charge, to any person 
obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: README.md ================================================ # ChatGPT Smart Speaker (speech recognition and text-to-speech using OpenAI and Google Speech Recognition) ![Jeff the smart speaker](images/smart_speaker_pi.png) ![Jeff the smart speaker](images/v2.jpg) ## Video Demo using activation word "Jeffers" - [Demo](https://vimeo.com/1029160996?share=copy#t=0)

## Equipment List

- [Raspberry Pi 4b 4GB](https://www.amazon.co.uk/Raspberry-Pi-Model-4GB/dp/B09TTNF8BT?_encoding=UTF8&tag=olney104-21 "Raspberry Pi 4b 4GB")
- [Mini External USB Stereo Speaker](https://www.amazon.co.uk/Speakers-Computer-Speaker-Soundbar-Checkout/dp/B08NDJDFPS?_encoding=UTF8&tag=olney104-21 "Mini External USB Stereo Speaker")
- [ReSpeaker 4-Mic Array](https://www.amazon.co.uk/Seeed-ReSpeaker-4-Mic-Array-Raspberry/dp/B076SSR1W1?&_encoding=UTF8&tag=olney104-21 "ReSpeaker 4-Mic Array")
- [ANSMANN 10,000mAh Type-C 20W PD Power Bank](https://www.amazon.co.uk/Powerbank-10000mAh-capacity-Smartphones-rechargeable-Black/dp/B01NBNH2AL/?_encoding=UTF8&tag=olney104-21 "ANSMANN 10,000mAh Type-C 20W PD Power Bank")
## Running on your PC/Mac (use the chat.py or test.py script)

The `chat.py` and `test.py` scripts run directly on your PC/Mac. Both use speech recognition to capture a prompt, send the prompt to OpenAI to generate a response, convert the response to an audio file with gTTS, and play the audio file. Your PC/Mac must have a working default microphone and speakers for these scripts to work. Please note that these scripts were designed on a Mac, so additional dependencies may be required on Windows and Linux. The difference between them is that `chat.py` is faster and always on, while `test.py` behaves like a standard smart speaker, only responding once it hears the activation word (currently set to 'Jeffers').
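At a high level, both scripts run the same three-stage pipeline: listen, think, speak. The sketch below stubs out the I/O stages so only the control flow is visible; the stub bodies are placeholders for illustration, not the real implementations:

```python
def recognise_speech():
    # Stub: in chat.py this captures microphone audio and sends it
    # to Google Speech Recognition.
    return "what is the weather today"

def chatgpt_response(prompt):
    # Stub: in chat.py this calls the OpenAI chat completions API.
    return f"Here is an answer to: {prompt}"

def speak(message):
    # Stub: in chat.py this renders the text with TTS and plays the mp3.
    print(message)

def main():
    prompt = recognise_speech()
    speak(chatgpt_response(prompt))
```

The real scripts differ only in what each stage is wired to, which is why the `chat.py`, `test.py` and `pi.py` variants look so similar.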
## Running on Raspberry Pi (use the pi.py script) ![New](https://img.shields.io/badge/-NEW-green)

The `pi.py` script is a more advanced custom version of the deprecated `smart_speaker.py` script and is the closest to a real smart speaker. Its purpose is to offload wake-word detection to a custom model built via PicoVoice (`https://console.picovoice.ai/`), which improves efficiency and long-term reliability. This script will be the main script for development moving forward, with more advanced features to be added regularly.
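The wake-word stage boils down to feeding fixed-size audio frames to a Porcupine detector until it reports a keyword (Porcupine's `process()` returns the index of the detected keyword, or -1 for no match). The sketch below uses a hypothetical stub in place of the real `pvporcupine` handle, purely to show the loop structure:

```python
class StubPorcupine:
    """Hypothetical stand-in for a pvporcupine detector, for illustration only."""

    def __init__(self, trigger_on_frame):
        self.trigger_on_frame = trigger_on_frame
        self.frames_seen = 0

    def process(self, frame):
        # The real detector returns the detected keyword index, or -1 for no match.
        self.frames_seen += 1
        return 0 if self.frames_seen == self.trigger_on_frame else -1


def wait_for_wake_word(detector, frames):
    """Consume audio frames until the detector fires; True if the wake word was heard."""
    for frame in frames:
        if detector.process(frame) >= 0:
            return True
    return False
```

In the real script the frames come from a PyAudio input stream, and a detection triggers the LED pattern and the recording stage.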
## Prerequisites - chat.py

- You need a valid OpenAI API key. You can sign up for one at https://platform.openai.com.
- You'll need to be running Python version 3.7.3 or higher. I am using 3.11.4 on a Mac and 3.7.3 on Raspberry Pi.
- Run `brew install portaudio` after installing Homebrew: `/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"`
- You need to install the following packages: `openai`, `gTTS`, `pyaudio`, `SpeechRecognition`, `playsound`, `python-dotenv` and, if you are on a Mac, `pyobjc`. You can install these packages using pip, or use pipenv if you wish to contain a virtual environment.
- Firstly, update your tools: `pip install --upgrade pip setuptools`, then `pip install openai pyaudio SpeechRecognition gTTS playsound python-dotenv apa102-pi gpiozero pyobjc`
## Prerequisites - pi.py ![New](https://img.shields.io/badge/-NEW-green)

To run `pi.py` you will need a Raspberry Pi 4b (I'm using the 4GB model but 2GB should be enough), a ReSpeaker 4-Mic Array for Raspberry Pi and USB speakers. You will also need a developer account and API key with OpenAI (`https://platform.openai.com/overview`), a Tavily Search agent API key (`https://app.tavily.com/sign-in`), and an Access Key (`https://console.picovoice.ai/`) plus a custom voice model (`https://console.picovoice.ai/ppn`) with PicoVoice. Please create your own voice model and download the correct version for use on a Raspberry Pi.

Now on to the Pi setup. Let's get started! Run the following on your Raspberry Pi terminal:

1. `sudo apt update`
2. `sudo apt install python3-gpiozero`
3. `git clone https://github.com/Olney1/ChatGPT-OpenAI-Smart-Speaker`
4. Firstly, update your tools: `pip install --upgrade pip setuptools`, then `pip install openai pyaudio SpeechRecognition gTTS pydub python-dotenv apa102-pi gpiozero`. Next, install the dependencies: `pip install -r requirements.txt`. I am using Python 3.9 (`#!/usr/bin/env python3.9`). You can install these packages using pip, or use pipenv if you wish to contain a virtual environment.
5. PyAudio relies on PortAudio as a dependency. You can install it using the following command: `sudo apt-get install portaudio19-dev`
6. Pydub dependencies: you need to have ffmpeg installed on your system. On a Raspberry Pi you can install it using `sudo apt-get install ffmpeg`. You may also need simpleaudio if you run into issues with the script hanging when finding the wake word, so it's best to install these packages just in case: `sudo apt-get install python3-dev` (development headers needed to compile), `pip install simpleaudio` (a different backend to play mp3 files) and `sudo apt-get install libasound2-dev` (necessary dependencies).
7. If you are using the ReSpeaker, follow this guide to install the required dependencies: `https://wiki.seeedstudio.com/ReSpeaker_4_Mic_Array_for_Raspberry_Pi/#getting-started`. Then install support for the lights on the ReSpeaker board. You'll need the APA102 LED driver: `sudo apt install -y python3-rpi.gpio` and then `sudo pip3 install apa102-pi`.
8. Activate SPI: `sudo raspi-config`; go to "Interface Options"; go to "SPI"; enable SPI. While you are at it, do change the default password! Exit the tool and reboot.
9. Get the Seeed voice card source code, install it and reboot: `git clone https://github.com/HinTak/seeed-voicecard.git`, `cd seeed-voicecard`, `sudo ./install.sh`, `sudo reboot now`
10. Finally, configure audio output on the Raspberry Pi: `sudo raspi-config`, select 1 System Options, select S2 Audio, select your preferred audio output device, then select Finish.
## Usage - applies to chat.py

1. You'll need to set up the environment variable for your OpenAI API key. To do this, create a `.env` file in the same directory and add your API key to the file like this: `OPENAI_API_KEY="API KEY GOES HERE"`. This is safer than hard-coding your API key into the program. You must not change the name of the variable `OPENAI_API_KEY`.
2. Run the script using `python chat.py`.
3. The script will prompt you to say something. Speak a sentence into your microphone. You may need to allow the program permission to access your microphone on a Mac; a prompt should appear when running the program.
4. The script will send the spoken sentence to OpenAI, generate a response, convert it with the text-to-speech model, and play the response as an audio file.
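A missing or misnamed variable in `.env` only surfaces later as an opaque authentication error, so it can help to fail fast right after `load_dotenv()`. This hypothetical helper (not part of the repository) simply checks the variable is set before the OpenAI client is created:

```python
import os

def require_env(name="OPENAI_API_KEY"):
    """Return the named environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Create a .env file containing "
            f'{name}="API KEY GOES HERE" and load it with load_dotenv().'
        )
    return value
```

You could then build the client with `client = OpenAI(api_key=require_env())` instead of passing `os.environ.get(...)` directly.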
## Usage - applies to pi.py

1. You'll need to set up the environment variables for your OpenAI API key, PicoVoice Access Key and Tavily API key for agent searches. To do this, create a `.env` file in the same directory and add your API keys to the file like this: `OPENAI_API_KEY="API KEY GOES HERE"`, `ACCESS_KEY="PICOVOICE ACCESS KEY GOES HERE"` and `TAVILY_API_KEY="API KEY GOES HERE"`. This is safer than hard-coding your API keys into the program.
2. Ensure that you have the `pi.py` script along with the `apa102.py` and `alexa_led_pattern.py` scripts saved in the same folder on your Pi if using the ReSpeaker.
3. Run the script using `python3 pi.py`, or `python3 pi.py 2> /dev/null` on the Raspberry Pi. The second option suppresses all developer warnings and errors to keep the console focused purely on the print statements.
4. The script will prompt you to say the wake word, which is programmed into the custom wake-word model by PicoVoice as 'Jeffers'. You can change this to any name you want. Once the wake word has been detected, the lights will light up blue and the speaker is ready for your question. When you have asked your question, or when the microphone picks up and processes noise, the lights will rotate blue, meaning that your recording sample/question is being sent to OpenAI.
5. The script will then generate a response, convert it with the text-to-speech model, and play the response as an audio file.

## Customisation

- You can change the OpenAI model engine by modifying the value of `model_engine`. For example, to use the "gpt-3.5-turbo" model for a cheaper and quicker response, but with a knowledge cut-off of September 2021, set `model_engine = "gpt-3.5-turbo"`.
- You can change the language of the generated audio file by modifying the value of `language`. For example, to generate audio in French, set `language = 'fr'`.
- You can adjust the `temperature` parameter in the following call to control the randomness of the generated response:

```
response = client.chat.completions.create(
    model=model_engine,
    messages=[{"role": "system", "content": "You are a helpful smart speaker called Jeffers!"},  # Play about with more context here.
              {"role": "user", "content": prompt}],
    max_tokens=1024,
    n=1,
    temperature=0.7,
)
return response
```

Higher values of `temperature` will result in more diverse and random responses, while lower values will result in more deterministic responses.
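If you want to experiment with these settings in one place, you could collect the request parameters in a small helper like this (a hypothetical refactor, not code from the repository); only where the keyword arguments are assembled changes, the API call itself stays the same:

```python
def build_chat_request(prompt, model_engine="gpt-4o", temperature=0.7, max_tokens=1024):
    """Assemble the keyword arguments for client.chat.completions.create()."""
    return {
        "model": model_engine,
        "messages": [
            {"role": "system", "content": "You are a helpful smart speaker called Jeffers!"},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        "n": 1,
        "temperature": temperature,
    }
```

Usage would then be `response = client.chat.completions.create(**build_chat_request("What time is it?", temperature=0.2))`, which makes it easy to try different temperatures or models without touching the call site.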
## Important notes for Raspberry Pi Installation

As of May 2024, Seeed Studio has listed the ReSpeaker series among its [retired products](https://wiki.seeedstudio.com/discontinuedproducts/). It may not be compatible with the Raspberry Pi 5 due to hardware changes. If you have a ReSpeaker, it is highly recommended to install the legacy version of Raspberry Pi OS on a Raspberry Pi 4b. You can also simply buy a micro USB microphone, configure it as the input source using alsamixer, and still use the ReSpeaker for the lighting pattern. If you are using the same USB speaker as in my video, you will need to run `sudo apt-get install pulseaudio` to install support for it. This may also require you to start pulseaudio on every boot: `pulseaudio --start`.

### Adding a Start Command on Boot

Open the terminal and type: `sudo nano /etc/rc.local`

After the important network/start commands, add this line:

`su -l pi -c 'cd /home/pi/ChatGPT-OpenAI-Smart-Speaker && pulseaudio --start && python3 pi.py 2> /dev/null'`

Be sure to leave the line `exit 0` at the end, then save the file and exit. In nano, to exit, type Ctrl-X and then Y.

### ReSpeaker

If you want to use the ReSpeaker for the lights, you can purchase it from most of the major online stores that stock Raspberry Pi. Here is the online guide: https://wiki.seeedstudio.com/ReSpeaker_4_Mic_Array_for_Raspberry_Pi/

To test your microphone and speakers, install Audacity on your Raspberry Pi: `sudo apt update`, `sudo apt install audacity`, then run `audacity`.

### Other Possible Issues

On the Raspberry Pi you may encounter an error regarding the installation of `flac`. See here for the resolution: https://raspberrypi.stackexchange.com/questions/137630/im-unable-to-install-flac-on-my-raspberry-pi-3

The files you will need are here: https://archive.raspbian.org/raspbian/pool/main/f/flac/
Please note the links below may have changed or been updated, so please refer back to the link above for the latest file names and then update the commands below accordingly.

`sudo apt-get install libogg0`
`wget https://archive.raspbian.org/raspbian/pool/main/f/flac/libflac8_1.3.2-3+deb10u3_armhf.deb`
`wget https://archive.raspbian.org/raspbian/pool/main/f/flac/flac_1.3.2-3+deb10u3_armhf.deb`
`sudo dpkg -i libflac8_1.3.2-3+deb10u3_armhf.deb`
`sudo dpkg -i flac_1.3.2-3+deb10u3_armhf.deb`
`which flac` (should print `/usr/bin/flac`)
`sudo reboot`
`flac --version` (should print `flac 1.3.2`)

You may find you need to install GStreamer if you encounter errors regarding Gst.

Install GStreamer: open a terminal and run the following command to install GStreamer and its base plugins: `sudo apt-get install gstreamer1.0-tools gstreamer1.0-plugins-base gstreamer1.0-plugins-good`. This installs the GStreamer core, along with a set of essential and good-quality plugins.

Next, install the Python bindings for GStreamer: `sudo apt-get install python3-gst-1.0`. This command installs the GStreamer bindings for Python 3.

Install additional GStreamer plugins (if needed): depending on the audio formats you need to work with, you might need additional GStreamer plugins. For example, to install plugins for MP3 playback, use `sudo apt-get install gstreamer1.0-plugins-ugly`.

To quit a running script on the Pi from boot: `ALT + PrtScSysRq (or Print button) + K`
## Credit to: https://github.com/tinue/apa102-pi & Seeed Technology Limited for supplementary code.
## Read more about what is next for the project https://medium.com/@ben_olney/openai-smart-speaker-with-raspberry-pi-5e284d21a53e ================================================ FILE: alexa_led_pattern.py ================================================ #!/usr/bin/env python # Copyright (C) 2017 Seeed Technology Limited # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. import time class AlexaLedPattern(object): def __init__(self, show=None, number=12): self.pixels_number = number self.pixels = [0] * 4 * number if not show or not callable(show): def dummy(data): pass show = dummy self.show = show self.stop = False def wakeup(self, direction=0): position = int((direction + 15) / (360 / self.pixels_number)) % self.pixels_number pixels = [0, 0, 0, 24] * self.pixels_number pixels[position * 4 + 2] = 48 self.show(pixels) def listen(self): pixels = [0, 0, 0, 24] * self.pixels_number self.show(pixels) def think(self): pixels = [0, 0, 12, 12, 0, 0, 0, 24] * self.pixels_number while not self.stop: self.show(pixels) time.sleep(0.2) pixels = pixels[-4:] + pixels[:-4] def speak(self): step = 1 position = 12 while not self.stop: pixels = [0, 0, position, 24 - position] * self.pixels_number self.show(pixels) time.sleep(0.01) if position <= 0: step = 1 time.sleep(0.4) elif position >= 12: step = -1 time.sleep(0.4) position += step def off(self): self.show([0] * 4 * 12) ================================================ FILE: apa102.py ================================================ """ from 
https://github.com/tinue/APA102_Pi This is the main driver module for APA102 LEDs """ import spidev from math import ceil RGB_MAP = { 'rgb': [3, 2, 1], 'rbg': [3, 1, 2], 'grb': [2, 3, 1], 'gbr': [2, 1, 3], 'brg': [1, 3, 2], 'bgr': [1, 2, 3] } class APA102: """ Driver for APA102 LEDS (aka "DotStar"). (c) Martin Erzberger 2016-2017 My very first Python code, so I am sure there is a lot to be optimized ;) Public methods are: - set_pixel - set_pixel_rgb - show - clear_strip - cleanup Helper methods for color manipulation are: - combine_color - wheel The rest of the methods are used internally and should not be used by the user of the library. Very brief overview of APA102: An APA102 LED is addressed with SPI. The bits are shifted in one by one, starting with the least significant bit. An LED usually just forwards everything that is sent to its data-in to data-out. While doing this, it remembers its own color and keeps glowing with that color as long as there is power. An LED can be switched to not forward the data, but instead use the data to change it's own color. This is done by sending (at least) 32 bits of zeroes to data-in. The LED then accepts the next correct 32 bit LED frame (with color information) as its new color setting. After having received the 32 bit color frame, the LED changes color, and then resumes to just copying data-in to data-out. The really clever bit is this: While receiving the 32 bit LED frame, the LED sends zeroes on its data-out line. Because a color frame is 32 bits, the LED sends 32 bits of zeroes to the next LED. As we have seen above, this means that the next LED is now ready to accept a color frame and update its color. So that's really the entire protocol: - Start by sending 32 bits of zeroes. This prepares LED 1 to update its color. - Send color information one by one, starting with the color for LED 1, then LED 2 etc. 
- Finish off by cycling the clock line a few times to get all data to the very last LED on the strip The last step is necessary, because each LED delays forwarding the data a bit. Imagine ten people in a row. When you yell the last color information, i.e. the one for person ten, to the first person in the line, then you are not finished yet. Person one has to turn around and yell it to person 2, and so on. So it takes ten additional "dummy" cycles until person ten knows the color. When you look closer, you will see that not even person 9 knows its own color yet. This information is still with person 2. Essentially the driver sends additional zeroes to LED 1 as long as it takes for the last color frame to make it down the line to the last LED. """ # Constants MAX_BRIGHTNESS = 31 # Safeguard: Set to a value appropriate for your setup LED_START = 0b11100000 # Three "1" bits, followed by 5 brightness bits def __init__(self, num_led, global_brightness=MAX_BRIGHTNESS, order='rgb', bus=0, device=1, max_speed_hz=8000000): self.num_led = num_led # The number of LEDs in the Strip order = order.lower() self.rgb = RGB_MAP.get(order, RGB_MAP['rgb']) # Limit the brightness to the maximum if it's set higher if global_brightness > self.MAX_BRIGHTNESS: self.global_brightness = self.MAX_BRIGHTNESS else: self.global_brightness = global_brightness self.leds = [self.LED_START,0,0,0] * self.num_led # Pixel buffer self.spi = spidev.SpiDev() # Init the SPI device self.spi.open(bus, device) # Open SPI port 0, slave device (CS) 1 # Up the speed a bit, so that the LEDs are painted faster if max_speed_hz: self.spi.max_speed_hz = max_speed_hz def clock_start_frame(self): """Sends a start frame to the LED strip. This method clocks out a start frame, telling the receiving LED that it must update its own color now. """ self.spi.xfer2([0] * 4) # Start frame, 32 zero bits def clock_end_frame(self): """Sends an end frame to the LED strip. 
As explained above, dummy data must be sent after the last real colour information so that all of the data can reach its destination down the line. The delay is not as bad as with the human example above. It is only 1/2 bit per LED. This is because the SPI clock line needs to be inverted. Say a bit is ready on the SPI data line. The sender communicates this by toggling the clock line. The bit is read by the LED and immediately forwarded to the output data line. When the clock goes down again on the input side, the LED will toggle the clock up on the output to tell the next LED that the bit is ready. After one LED the clock is inverted, and after two LEDs it is in sync again, but one cycle behind. Therefore, for every two LEDs, one bit of delay gets accumulated. For 300 LEDs, 150 additional bits must be fed to the input of LED one so that the data can reach the last LED. Ultimately, we need to send additional numLEDs/2 arbitrary data bits, in order to trigger numLEDs/2 additional clock changes. This driver sends zeroes, which has the benefit of getting LED one partially or fully ready for the next update to the strip. An optimized version of the driver could omit the "clockStartFrame" method if enough zeroes have been sent as part of "clockEndFrame". """ # Round up num_led/2 bits (or num_led/16 bytes) for _ in range((self.num_led + 15) // 16): self.spi.xfer2([0x00]) def clear_strip(self): """ Turns off the strip and shows the result right away.""" for led in range(self.num_led): self.set_pixel(led, 0, 0, 0) self.show() def set_pixel(self, led_num, red, green, blue, bright_percent=100): """Sets the color of one pixel in the LED stripe. The changed pixel is not shown yet on the Stripe, it is only written to the pixel buffer. Colors are passed individually. If brightness is not set the global brightness setting is used. 
""" if led_num < 0: return # Pixel is invisible, so ignore if led_num >= self.num_led: return # again, invisible # Calculate pixel brightness as a percentage of the # defined global_brightness. Round up to nearest integer # as we expect some brightness unless set to 0 brightness = ceil(bright_percent*self.global_brightness/100.0) brightness = int(brightness) # LED startframe is three "1" bits, followed by 5 brightness bits ledstart = (brightness & 0b00011111) | self.LED_START start_index = 4 * led_num self.leds[start_index] = ledstart self.leds[start_index + self.rgb[0]] = red self.leds[start_index + self.rgb[1]] = green self.leds[start_index + self.rgb[2]] = blue def set_pixel_rgb(self, led_num, rgb_color, bright_percent=100): """Sets the color of one pixel in the LED stripe. The changed pixel is not shown yet on the Stripe, it is only written to the pixel buffer. Colors are passed combined (3 bytes concatenated) If brightness is not set the global brightness setting is used. """ self.set_pixel(led_num, (rgb_color & 0xFF0000) >> 16, (rgb_color & 0x00FF00) >> 8, rgb_color & 0x0000FF, bright_percent) def rotate(self, positions=1): """ Rotate the LEDs by the specified number of positions. Treating the internal LED array as a circular buffer, rotate it by the specified number of positions. The number could be negative, which means rotating in the opposite direction. """ cutoff = 4 * (positions % self.num_led) self.leds = self.leds[cutoff:] + self.leds[:cutoff] def show(self): """Sends the content of the pixel buffer to the strip. Todo: More than 1024 LEDs requires more than one xfer operation. """ self.clock_start_frame() # xfer2 kills the list, unfortunately. So it must be copied first # SPI takes up to 4096 Integers. So we are fine for up to 1024 LEDs. 
        self.spi.xfer2(list(self.leds))
        self.clock_end_frame()

    def cleanup(self):
        """Release the SPI device; call this method at the end."""
        self.spi.close()  # Close SPI port

    @staticmethod
    def combine_color(red, green, blue):
        """Make one 3*8 byte color value."""
        return (red << 16) + (green << 8) + blue

    def wheel(self, wheel_pos):
        """Get a color from a color wheel; Green -> Red -> Blue -> Green"""
        if wheel_pos > 255:
            wheel_pos = 255  # Safeguard
        if wheel_pos < 85:  # Green -> Red
            return self.combine_color(wheel_pos * 3, 255 - wheel_pos * 3, 0)
        if wheel_pos < 170:  # Red -> Blue
            wheel_pos -= 85
            return self.combine_color(255 - wheel_pos * 3, 0, wheel_pos * 3)
        # Blue -> Green
        wheel_pos -= 170
        return self.combine_color(0, wheel_pos * 3, 255 - wheel_pos * 3)

    def dump_array(self):
        """For debug purposes: dump the LED array onto the console."""
        print(self.leds)

================================================
FILE: chat.py
================================================
from openai import OpenAI
import os
import speech_recognition as sr
from gtts import gTTS
from playsound import playsound
from dotenv import load_dotenv
from pathlib import Path

# Load the environment variables
load_dotenv()

# Create an OpenAI API client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Model name and language
model_engine = "gpt-4o"
language = 'en'

def recognise_speech():
    # obtain audio from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)
    speech = None
    # recognise speech using Google Speech Recognition
    try:
        # for testing purposes, we're just using the default API key
        # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
        # instead of `r.recognize_google(audio)`
        # convert the audio to text
        speech = r.recognize_google(audio)
        print("This is what we think was said: " + speech)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
    # Add a holding message like the one below to deal with current TTS delays until such time that TTS can be streamed.
    playsound("sounds/holding.mp3")  # There's an optional second argument, block, which is set to True by default. Setting it to False makes the function run asynchronously.
    return speech

def chatgpt_response(prompt):
    # send the converted audio text to chatgpt
    response = client.chat.completions.create(
        model=model_engine,
        messages=[{"role": "system", "content": "You are a helpful smart speaker called Jeffers!"},
                  {"role": "user", "content": prompt}],
        max_tokens=300,
        n=1,
        temperature=0.7,
    )
    return response

def generate_audio_file(message):
    speech_file_path = Path(__file__).parent / "response.mp3"
    response = client.audio.speech.create(
        model="tts-1",
        voice="fable",
        input=message
    )
    # response.content contains the binary audio data which we can write to a file and play
    with open(speech_file_path, 'wb') as f:
        f.write(response.content)

def play_audio_file():
    # play the audio file
    playsound("response.mp3")  # There's an optional second argument, block, which is set to True by default. Setting it to False makes the function run asynchronously.
def main(): # run the program prompt = recognise_speech() print(f"This is the prompt being sent to OpenAI: " + prompt) responses = chatgpt_response(prompt) message = responses.choices[0].message.content print(message) generate_audio_file(message) play_audio_file() if __name__ == "__main__": main() ================================================ FILE: create_messages.py ================================================ from openai import OpenAI import os from dotenv import load_dotenv """Create your own professional messages with OpenAI for your speaker""" # Load the environment variables load_dotenv() client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY")) def create_holding_message(): message = "One moment please" response = client.audio.speech.create( model="tts-1", voice="fable", input=message, ) response.stream_to_file("sounds/holding.mp3") def create_google_speech_issue(): message = "Sorry, there was an issue reaching Google Speech Recognition, please try again." response = client.audio.speech.create( model="tts-1", voice="fable", input=message, ) response.stream_to_file("sounds/google_issue.mp3") def understand_speech_issue(): message = "Sorry, I didn't quite get that." response = client.audio.speech.create( model="tts-1", voice="fable", input=message, ) response.stream_to_file("sounds/understand.mp3") def stop(): message = "No worries, I'll be here when you need me." response = client.audio.speech.create( model="tts-1", voice="fable", input=message, ) response.stream_to_file("sounds/stop.mp3") def hello(): message = "Welcome, my name is Jeffers, I'm your helpful smart speaker. Just say my name and ask me anything." response = client.audio.speech.create( model="tts-1", voice="fable", input=message, ) response.stream_to_file("sounds/hello.mp3") def create_picovoice_issue(): message = "Sorry, there was an issue with the PicoVoice Service." 
    response = client.audio.speech.create(
        model="tts-1",
        voice="fable",
        input=message,
    )
    response.stream_to_file("sounds/picovoice_issue.mp3")

def create_picture_message():
    message = "Let me take a look through the camera."
    response = client.audio.speech.create(
        model="tts-1",
        voice="fable",
        input=message,
    )
    response.stream_to_file("sounds/start_camera.mp3")

def start_picture_message():
    message = "Hold steady....... I'm taking a photo now...... in ....... 3 ...... 2 ......... 1"
    response = client.audio.speech.create(
        model="tts-1",
        voice="fable",
        input=message,
    )
    response.stream_to_file("sounds/take_photo.mp3")

def agent_search():
    message = "Let me do a quick search for you."
    response = client.audio.speech.create(
        model="tts-1",
        voice="fable",
        input=message,
    )
    response.stream_to_file("sounds/agent.mp3")

def audio_issue():
    message = "There was an issue opening the PyAudio stream on the device."
    response = client.audio.speech.create(
        model="tts-1",
        voice="fable",
        input=message,
    )
    response.stream_to_file("sounds/audio_issue.mp3")

def tavily_key_error():
    message = "I could not find your API key for the Tavily Search Service. Please ensure you update your .env file with a Tavily Search API key in order to use the agent."
    response = client.audio.speech.create(
        model="tts-1",
        voice="fable",
        input=message,
    )
    response.stream_to_file("sounds/tavily_key_error.mp3")

def camera_issue():
    message = "Sorry, there was an issue opening Pi Camera."
    response = client.audio.speech.create(
        model="tts-1",
        voice="fable",
        input=message,
    )
    response.stream_to_file("sounds/camera_issue.mp3")

camera_issue()


================================================
FILE: deprecated/smart_speaker.py
================================================
import os
from openai import OpenAI
import pyaudio
import speech_recognition as sr
from gtts import gTTS
from dotenv import load_dotenv
import apa102
import threading
from gpiozero import LED
try:
    import queue as Queue
except ImportError:
    import Queue as Queue
from alexa_led_pattern import AlexaLedPattern
from pathlib import Path
from pydub import AudioSegment
from pydub.playback import play
import time

# Set the working directory for Pi if you want to run this code via an rc.local script so that it runs automatically on Pi startup. Remove this line if you have installed this project in a different directory.
os.chdir('/home/pi/ChatGPT-OpenAI-Smart-Speaker')

# Set the pre-prompt configuration here to precede the user's question, so that OpenAI understands it is acting as a smart speaker; add any other required information here. We send this in the OpenAI call as the system content in messages.
pre_prompt = "You are a helpful smart speaker called Jeffers! Please respond with short and concise answers to the following user question and always remind the user at the end to say your name again to continue the conversation:"

# Load the environment variables
load_dotenv()
# Create an OpenAI API client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Add 1 second of silence globally, due to initial buffering in how pydub handles audio in memory
silence = AudioSegment.silent(duration=1000)

# load pixels Class
class Pixels:
    PIXELS_N = 12

    def __init__(self, pattern=AlexaLedPattern):
        self.pattern = pattern(show=self.show)
        self.dev = apa102.APA102(num_led=self.PIXELS_N)
        self.power = LED(5)
        self.power.on()
        self.queue = Queue.Queue()
        self.thread = threading.Thread(target=self._run)
        self.thread.daemon = True
        self.thread.start()
        self.last_direction = None

    def wakeup(self, direction=0):
        self.last_direction = direction
        def f():
            self.pattern.wakeup(direction)
        self.put(f)

    def listen(self):
        if self.last_direction:
            def f():
                self.pattern.wakeup(self.last_direction)
            self.put(f)
        else:
            self.put(self.pattern.listen)

    def think(self):
        self.put(self.pattern.think)

    def speak(self):
        self.put(self.pattern.speak)

    def off(self):
        self.put(self.pattern.off)

    def put(self, func):
        self.pattern.stop = True
        self.queue.put(func)

    def _run(self):
        while True:
            func = self.queue.get()
            self.pattern.stop = False
            func()

    def show(self, data):
        for i in range(self.PIXELS_N):
            self.dev.set_pixel(i, int(data[4*i + 1]), int(data[4*i + 2]), int(data[4*i + 3]))
        self.dev.show()

pixels = Pixels()

# settings and keys
model_engine = "gpt-4o"
language = 'en'

def recognise_speech():
    # obtain audio from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        try:
            pixels.off()
            print("Listening...")
            audio_stream = r.listen(source)
            print("Waiting for wake word...")
            # recognize speech using Google Speech Recognition
            try:
                # convert the audio to text
                speech = r.recognize_google(audio_stream)
                print("Google Speech Recognition thinks you said " + speech)
                print("Recognized Speech:", speech)  # Print the recognized speech for debugging
                words = speech.lower().split()  # Split the speech into words
                if "jeffers" not in words:
                    print("Wake word not detected in the speech")
                    return False
                else:
                    print("Found wake word!")
                    # Add 1 second of silence, due to initial buffering in how pydub handles audio in memory
                    silence = AudioSegment.silent(duration=1000)
                    start_audio_response = silence + AudioSegment.from_mp3("sounds/start.mp3")
                    play(start_audio_response)
                    return True
            except sr.UnknownValueError:
                print("Google Speech Recognition could not understand audio")
            except sr.RequestError as e:
                print("Could not request results from Google Speech Recognition service; {0}".format(e))
        except KeyboardInterrupt:
            print("Interrupted by User Keyboard")
            pass

def speech():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        while True:
            # Now we wake the LEDs to indicate the optimum moment for the user to speak
            pixels.wakeup()
            try:
                r.adjust_for_ambient_noise(source)
                audio_stream = r.listen(source)
                print("Waiting for user to speak...")
                try:
                    speech_text = r.recognize_google(audio_stream)
                    pixels.off()
                    print("Google Speech Recognition thinks you said " + speech_text)
                    pixels.think()
                    return speech_text
                except sr.UnknownValueError:
                    pixels.think()
                    print("Google Speech Recognition could not understand audio")
                    understand_error = AudioSegment.silent(duration=1000) + AudioSegment.from_mp3("sounds/understand.mp3")
                    play(understand_error)
                    time.sleep(4)
                except sr.RequestError as e:
                    pixels.think()
                    print(f"Could not request results from Google Speech Recognition service; {e}")
                    audio_response = AudioSegment.silent(duration=1000) + AudioSegment.from_mp3("sounds/google_issue.mp3")
                    play(audio_response)
            except KeyboardInterrupt:
                print("Interrupted by User Keyboard")
                break  # This allows the user to still manually exit the loop with a keyboard interrupt

def chatgpt_response(prompt):
    if prompt is not None:
        # Add a holding message like the one below to deal with current TTS delays until such time that TTS can be streamed, due to initial buffering in how pydub handles audio in memory
        silence = AudioSegment.silent(duration=1000)
        holding_audio_response = silence + AudioSegment.from_mp3("sounds/holding.mp3")
        play(holding_audio_response)
        # send the converted audio text to chatgpt
        response = client.chat.completions.create(
            model=model_engine,
            messages=[{"role": "system", "content": pre_prompt},
                      {"role": "user", "content": prompt}],
            max_tokens=400,
            n=1,
            temperature=0.7,
        )
        # Whilst we are waiting for the response, we can play a checking message to improve the user experience.
        checking_on_that = silence + AudioSegment.from_mp3("sounds/checking.mp3")
        play(checking_on_that)
        return response
    else:
        return None

def generate_audio_file(message):
    speech_file_path = Path(__file__).parent / "response.mp3"
    response = client.audio.speech.create(
        model="tts-1",
        voice="fable",
        input=message
    )
    response.stream_to_file(speech_file_path)

def play_wake_up_audio():
    # play the audio file and wake speaking LEDs
    pixels.speak()
    audio_response = silence + AudioSegment.from_mp3("response.mp3")
    play(audio_response)

def main():
    # run the program
    # Indicate to the user that the device is ready
    pixels.wakeup()
    device_on = silence + AudioSegment.from_mp3("sounds/on.mp3")
    play(device_on)
    # Play the "Hello" audio file to welcome the user
    hello = silence + AudioSegment.from_mp3("sounds/hello.mp3")
    play(hello)
    while True:
        if recognise_speech():
            prompt = speech()
            print(f"This is the prompt being sent to OpenAI: {prompt}")
            response = chatgpt_response(prompt)
            if response is not None:
                message = response.choices[0].message.content
                print(message)
                generate_audio_file(message)
                play_wake_up_audio()
                pixels.off()
            else:
                print("No prompt to send to OpenAI")
                # We continue to listen for the wake word
        else:
            print("Speech was not recognised")
            pixels.off()

if __name__ == "__main__":
    main()


================================================
FILE: pi.py
================================================
#!/usr/bin/env python3.9
import os
import subprocess
from openai import OpenAI
import pyaudio
import alsaaudio
from datetime import datetime
import speech_recognition as sr
from gtts import gTTS
from dotenv import load_dotenv
import apa102
import threading
from gpiozero import LED
try:
    import queue as Queue
except ImportError:
    import Queue as Queue
from alexa_led_pattern import AlexaLedPattern
from pathlib import Path
from pydub import AudioSegment
from pydub.playback import play as pydub_play
import time
import pvporcupine
import struct
from picamera2 import Picamera2
import base64
from langchain_community.tools import TavilySearchResults
from langchain.agents import AgentType, initialize_agent
from langchain_openai import ChatOpenAI
from langchain.schema import SystemMessage

# Set the working directory for Pi if you want to run this code via an rc.local script so that it runs automatically on Pi startup. Remove this line if you have installed this project in a different directory.
os.chdir('/home/pi/ChatGPT-OpenAI-Smart-Speaker')

# We add 0.5 seconds of silence globally, due to initial buffering in how pydub handles audio in memory
silence = AudioSegment.silent(duration=500)

# This is our pre-prompt configuration to precede the user's question, so that OpenAI understands it is acting as a smart speaker; add any other required information here. We send this in the OpenAI call as the system content in messages.
pre_prompt = "You are a helpful smart speaker called Jeffers! Please respond with short and concise answers to the following user question and always remind the user at the end to say your name again to continue the conversation:"

# Load your keys and tokens here
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Note: os.environ.get never raises, so we check the value directly rather than wrapping it in try/except.
TAVILY_API_KEY = os.environ.get("TAVILY_API_KEY")
if TAVILY_API_KEY:
    print("Tavily search API key found")
else:
    print("Tavily search API key not found.")
    tavily_key_not_found = silence + AudioSegment.from_mp3("sounds/tavily_key_error.mp3")

# We set the OpenAI model and language settings here for the route that handles general questions and questions with images. This is not for the agent route.
model_engine = "chatgpt-4o-latest"
language = 'en'

# Load the Tavily Search tool which the agent will use to answer questions about weather, news, and recent events.
tool = TavilySearchResults(
    max_results=20,
    include_answer=True,
    include_raw_content=True,
    include_images=False,
    search_depth="advanced",
    # include_domains = []
    # exclude_domains = []
)

class Pixels:
    PIXELS_N = 12

    def __init__(self, pattern=AlexaLedPattern):
        self.pattern = pattern(show=self.show)
        self.dev = apa102.APA102(num_led=self.PIXELS_N)
        self.power = LED(5)
        self.power.on()
        self.queue = Queue.Queue()
        self.thread = threading.Thread(target=self._run)
        self.thread.daemon = True
        self.thread.start()
        self.last_direction = None

    def wakeup(self, direction=0):
        self.last_direction = direction
        def f():
            self.pattern.wakeup(direction)
        self.put(f)

    def listen(self):
        if self.last_direction:
            def f():
                self.pattern.wakeup(self.last_direction)
            self.put(f)
        else:
            self.put(self.pattern.listen)

    def think(self):
        self.put(self.pattern.think)

    def speak(self):
        self.put(self.pattern.speak)

    def off(self):
        self.put(self.pattern.off)

    def put(self, func):
        self.pattern.stop = True
        self.queue.put(func)

    def _run(self):
        while True:
            func = self.queue.get()
            self.pattern.stop = False
            func()

    def show(self, data):
        for i in range(self.PIXELS_N):
            self.dev.set_pixel(i, int(data[4*i + 1]), int(data[4*i + 2]), int(data[4*i + 3]))
        self.dev.show()

# Instantiate the Pixels class
pixels = Pixels()

# Wrapper around pydub's playback so the rest of the file can simply call play(audio_segment)
def play(audio_segment):
    pydub_play(audio_segment)

# This function is called first to detect the wake word "Jeffers" and then proceed to listen for the user's question.
def detect_wake_word():
    # Here we use the Porcupine wake word detection engine to detect the wake word "Jeffers" and then proceed to listen for the user's question.
    porcupine = None
    pa = None
    audio_stream = None
    try:
        # Path to the custom wake word .ppn file
        custom_wake_word_path = os.path.join(os.path.dirname(__file__), 'wake_words', 'custom_model/Jeffers_Pi.ppn')
        print(f"Wake word file path: {custom_wake_word_path}")
        if not os.path.exists(custom_wake_word_path):
            print(f"Error: Wake word file not found at {custom_wake_word_path}")
        # Initialize Porcupine with the custom wake word
        # You will need to obtain an access key from Picovoice to use Porcupine (https://console.picovoice.ai/). You can also create your own custom wake word model using the Picovoice Console.
        try:
            porcupine = pvporcupine.create(access_key=os.environ.get("ACCESS_KEY"), keyword_paths=[custom_wake_word_path])
        except pvporcupine.PorcupineInvalidArgumentError as e:
            print(f"Error creating Porcupine instance: {e}")
            # Handle the error here
        try:
            pa = pyaudio.PyAudio()
            audio_stream = pa.open(
                rate=porcupine.sample_rate,
                channels=1,
                format=pyaudio.paInt16,
                output_device_index=1,
                input=True,
                input_device_index=pa.get_default_input_device_info()["index"],
                frames_per_buffer=porcupine.frame_length)
        except Exception as e:
            print(f"Error with audio stream setup: {e}")
            error_response = silence + AudioSegment.from_mp3("sounds/audio_issue.mp3")
            play(error_response)
        while True:
            pcm = audio_stream.read(porcupine.frame_length)
            pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)
            result = porcupine.process(pcm)
            if result >= 0:
                print("Wake word detected")
                return True
    except Exception as e:
        # Deal with any errors that may occur from using the PicoVoice Service (https://console.picovoice.ai/)
        print(f"Error with wake word detection, Porcupine or the PicoVoice Service: {e}")
        error_response = silence + AudioSegment.from_mp3("sounds/picovoice_issue.mp3")
        play(error_response)
    finally:
        if audio_stream is not None:
            audio_stream.close()
        if pa is not None:
            pa.terminate()
        if porcupine is not None:
            porcupine.delete()
    return False

# This function is called to use the Langchain search agent with the TavilySearchResults tool to answer questions about weather, news, and recent events.
def search_agent(speech_text):
    today = datetime.today()
    #! Update this location to your location
    location = "Colchester, UK"
    print(f"Today's date: {today}")
    print(f"User's question understood via the search_agent function: {speech_text}")
    search_results = tool.invoke({
        'query': f"The current date is {today}, the user is based in {location} and the user wants to know {speech_text}. Keep responses short and concise. Do not respond with links to websites and do not read out website links, search deeper to find the answer. If the question is about weather, please use Celsius as a metric."
    })
    # Process the search results
    llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
    # Prepare the content for the LLM
    content = "\n".join([result['content'] for result in search_results])
    # Use the LLM to summarise and extract relevant information
    response = llm.invoke(f"""
    Based on the following search results, provide a concise and relevant answer to the user's question: "{speech_text}"

    Search results:
    {content}

    Please keep the response short, informative, and directly addressing the user's question. Do not mention sources or include any URLs.
    """)
    return response.content

# This function is called after the wake word is detected to listen for the user's question and then convert the speech to text.
def recognise_speech():
    # Here we use the Google Speech Recognition engine to convert the user's question into text and then send it to OpenAI for a response.
    r = sr.Recognizer()
    with sr.Microphone() as source:
        start_camera = silence + AudioSegment.from_mp3("sounds/start_camera.mp3")
        take_photo = silence + AudioSegment.from_mp3("sounds/take_photo.mp3")
        camera_shutter = silence + AudioSegment.from_mp3("sounds/camera_shutter.mp3")
        agent_search = silence + AudioSegment.from_mp3("sounds/agent.mp3")
        camera_issue = silence + AudioSegment.from_mp3("sounds/camera_issue.mp3")
        print("Listening for your question...")
        audio_stream = r.listen(source, timeout=5, phrase_time_limit=10)
        print("Processing your question...")
        try:
            speech_text = r.recognize_google(audio_stream)
            print("Google Speech Recognition thinks you said: " + speech_text)
            # 1. Agent search route
            if any(keyword in speech_text.lower() for keyword in ["activate search", "weather like today", "will it rain today", "latest news", "events are on"]):
                print("Phrase 'activate search', 'weather like today', 'will it rain today', 'latest news', or 'events are on' detected. Using search agent.")
                play(agent_search)
                agent_response = search_agent(speech_text)
                print("Agent response:", agent_response)
                return agent_response, None, None
            # 2. Image capture route
            if "take a look" in speech_text.lower() or "turn on camera" in speech_text.lower() or "on the camera" in speech_text.lower():
                print("Phrase 'take a look', 'turn on camera', or 'on the camera' detected.")
                play(start_camera)
                print("Getting ready to capture an image...")
                play(take_photo)
                try:
                    # Updated to use Picamera2; if you want to revert to PiCamera, please follow a previous version of this code and file on our GitHub repository.
                    camera = Picamera2()
                    # Configure the camera
                    camera_config = camera.create_still_configuration(main={"size": (640, 480)})
                    camera.configure(camera_config)
                    camera.start()
                    time.sleep(1)  # Give the camera time to adjust
                    play(camera_shutter)
                    image_path = "captured_image.jpg"
                    camera.capture_file(image_path)
                    camera.stop()
                    camera.close()
                    print("Photo captured and saved as captured_image.jpg")
                    return None, image_path, speech_text
                except Exception as e:
                    print(f"Pi camera error: {e}")
                    play(camera_issue)
                    return None, None, None
            # 3. General speech route - no agent or image capture
            return None, None, speech_text
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError as e:
            print(f"Could not request results from Google Speech Recognition service; {e}")
        return None, None, None

# This route is called to send the user's general question to OpenAI's ChatGPT model and then play the response to the user.
def chatgpt_response(prompt):
    # Here we send the user's question to OpenAI's ChatGPT model and then play the response to the user.
    if prompt is not None:
        try:
            # Add a holding message like the one below to deal with current TTS delays until such time that TTS can be streamed, due to initial buffering in how pydub handles audio in memory
            silence = AudioSegment.silent(duration=1000)
            holding_audio_response = silence + AudioSegment.from_mp3("sounds/holding.mp3")
            play(holding_audio_response)
            # send the converted audio text to chatgpt
            response = client.chat.completions.create(
                model=model_engine,
                messages=[{"role": "system", "content": pre_prompt},
                          {"role": "user", "content": prompt + " If the user's question involves browsing the web, local or national current or future events, an event that you are unaware of, or news or weather, ALWAYS respond telling them to use the phrase 'activate search' before asking a question. If the user's request is to take a photo, ALWAYS respond telling them to use the phrase 'take a look' followed by their request."}],
                max_tokens=400,
                n=1,
                temperature=0.7,
            )
            # Whilst we are waiting for the response, we can play a checking message to improve the user experience.
            checking_on_that = silence + AudioSegment.from_mp3("sounds/checking.mp3")
            play(checking_on_that)
            return response
        except Exception as e:
            # If there is an error, we can play a message to the user to indicate that there was an issue with the API call.
            print(f"An API error occurred: {str(e)}")
            error_message = silence + AudioSegment.from_mp3("sounds/openai_issue.mp3")
            play(error_message)
            return None
    else:
        return None

# This function is called to encode the image as base64 when an image is taken.
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# This route is called if the user's question also includes an image to send to OpenAI's ChatGPT model.
def chatgpt_response_with_image(prompt, image_path):
    if prompt is not None:
        try:
            # Add a holding message like the one below to deal with current TTS delays until such time that TTS can be streamed, due to initial buffering in how pydub handles audio in memory
            silence = AudioSegment.silent(duration=1000)
            holding_audio_response = silence + AudioSegment.from_mp3("sounds/holding.mp3")
            play(holding_audio_response)
            # Encode the image as base64
            base64_image = encode_image(image_path)
            # Send the converted audio text and image to ChatGPT
            response = client.chat.completions.create(
                model=model_engine,
                messages=[
                    {"role": "system", "content": pre_prompt},
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": prompt
                            },
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64_image}"
                                }
                            }
                        ]
                    }
                ],
                max_tokens=400,
                n=1,
                temperature=0.7,
            )
            # Whilst we are waiting for the response, we can play a checking message to improve the user experience.
            checking_on_that = silence + AudioSegment.from_mp3("sounds/checking.mp3")
            play(checking_on_that)
            return response
        except Exception as e:
            # If there is an error, we can play a message to the user to indicate that there was an issue with the API call.
            print(f"An API error occurred: {str(e)}")
            error_message = silence + AudioSegment.from_mp3("sounds/openai_issue.mp3")
            play(error_message)
            return None
    else:
        return None

# This route is called to generate an audio file on demand from the response from OpenAI's ChatGPT model.
def generate_audio_file(message):
    # This is a standalone function to generate an audio file from the response from OpenAI's ChatGPT model.
    speech_file_path = Path(__file__).parent / "response.mp3"
    response = client.audio.speech.create(
        model="tts-1",
        voice="fable",
        input=message
    )
    response.stream_to_file(speech_file_path)

# This is a standalone function which we can call to play the audio file and wake the speaking LEDs to indicate that the smart speaker is responding to the user.
def play_response():
    pixels.speak()
    audio_response = silence + AudioSegment.from_mp3("response.mp3")
    play(audio_response)

# This is the main function that runs the program and controls the flow.
def main():
    pixels.wakeup()
    device_on = silence + AudioSegment.from_mp3("sounds/on.mp3")
    play(device_on)
    hello = silence + AudioSegment.from_mp3("sounds/hello.mp3")
    play(hello)
    pixels.off()
    while True:
        print("Waiting for wake word...")
        if detect_wake_word():
            pixels.listen()  # Indicate that the speaker is listening
            agent_response, image_path, speech_text = recognise_speech()
            if agent_response:
                print(f"Processed agent response: {agent_response}")  # For debugging
                generate_audio_file(agent_response)
                play_response()
                pixels.off()
            elif speech_text:
                if image_path:
                    response = chatgpt_response_with_image(speech_text, image_path)
                else:
                    response = chatgpt_response(speech_text)
                if response:
                    message = response.choices[0].message.content
                    print(message)
                    generate_audio_file(message)
                    play_response()
                    pixels.off()
                else:
                    print("No prompt to send to OpenAI")
                    pixels.off()
            else:
                print("Speech was not recognised or there was an error.")
                pixels.off()
        # After processing (or failure to process), the loop will continue, returning to wake word detection.
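The three-way routing in `main` above can be sketched as a pure function for clarity (a hypothetical `route` helper, not part of pi.py; it only mirrors the order of the checks on the tuple returned by `recognise_speech`):

```python
def route(agent_response, image_path, speech_text):
    # Agent answers are spoken directly; a captured image goes to the vision
    # route; plain speech goes to the text-only chat route; otherwise nothing.
    if agent_response:
        return "agent"
    if speech_text and image_path:
        return "vision"
    if speech_text:
        return "chat"
    return "none"

print(route("It is sunny.", None, None))                    # agent
print(route(None, "captured_image.jpg", "what is this?"))   # vision
print(route(None, None, "tell me a joke"))                  # chat
```

Keeping the branches in this order matters: `recognise_speech` returns at most one populated slot, and the fall-through `"none"` case corresponds to the "Speech was not recognised" path.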
if __name__ == "__main__":
    main()


================================================
FILE: requirements.txt
================================================
openai
pyaudio
python-alsa-audio
SpeechRecognition
gTTS
python-dotenv
apa102-pi
gpiozero
RPi.GPIO
alexa-led-pattern
pydub
pvporcupine
picamera2
langchain-community
langchain
langchain-openai
langchainhub


================================================
FILE: requirements_mac.txt
================================================
# Requirements for simply testing the chat.py script on a Mac
openai
pyaudio
SpeechRecognition
gTTS
python-dotenv
pydub
pvporcupine
langchain-community
langchain
langchain-openai
langchainhub
PyObjC
ffmpeg


================================================
FILE: test_agent.py
================================================
from langchain_community.tools.tavily_search import TavilySearchResults
from datetime import datetime
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from openai import OpenAI
import os

load_dotenv()
model = ChatOpenAI(model="gpt-4")
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

location = "Colchester, UK"
today = datetime.today().strftime('%A, %B %d, %Y')
print(f"Today is {today}")

search = TavilySearchResults(max_results=6)
search_results = search.invoke(f"What local events are not to be missed next week in {location}? The date is {today}.")
print(search_results)

# Now send the results to OpenAI for further processing
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Summarise the most up-to-date and applicable information from these search results."},
        {"role": "user", "content": str(search_results)}  # Convert search_results to a string
    ],
    max_tokens=600,
    n=1,
    temperature=0.7,
)
print(response.choices[0].message.content)


================================================
FILE: wake_words/custom_model/LICENSE.txt
================================================
A copy of license terms is available at https://picovoice.ai/docs/terms-of-use/