Repository: Olney1/ChatGPT-OpenAI-Smart-Speaker
Branch: main
Commit: c00deccbefd5
Files: 17
Total size: 62.7 KB
Directory structure:
gitextract_x984fnsf/
├── .gitattributes
├── .github/
│ └── FUNDING.yml
├── .gitignore
├── LICENSE
├── README.md
├── alexa_led_pattern.py
├── apa102.py
├── chat.py
├── create_messages.py
├── deprecated/
│ └── smart_speaker.py
├── pi.py
├── requirements.txt
├── requirements_mac.txt
├── test_agent.py
└── wake_words/
└── custom_model/
├── Jeffers_Mac.ppn
├── Jeffers_Pi.ppn
└── LICENSE.txt
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitattributes
================================================
*.mp4 filter=lfs diff=lfs merge=lfs -text
videos/long_demo.mp4 filter=lfs diff=lfs merge=lfs -text
================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms
github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
polar: # Replace with a single Polar username
buy_me_a_coffee: olney1
custom: ['https://ai-solutions.ai']
================================================
FILE: .gitignore
================================================
# Credit to: https://djangowaves.com/tips-tricks/gitignore-for-a-django-project/
.DS_Store
# Ignore Pipfile and Pipfile.lock
Pipfile
Pipfile.lock
# Django #
*.log
*.pot
*.pyc
__pycache__
db.sqlite3
# Backup files #
*.bak
# If you are using PyCharm #
# User-specific stuff
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/**/usage.statistics.xml
.idea/**/dictionaries
.idea/**/shelf
# AWS User-specific
.idea/**/aws.xml
# Generated files
.idea/**/contentModel.xml
# Sensitive or high-churn files
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
.idea/**/dbnavigator.xml
# Gradle
.idea/**/gradle.xml
.idea/**/libraries
# File-based project format
*.iws
# IntelliJ
out/
# JIRA plugin
atlassian-ide-plugin.xml
# Python #
*.py[cod]
*$py.class
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
.pytest_cache/
nosetests.xml
coverage.xml
*.cover
.hypothesis/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery
celerybeat-schedule.*
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# mkdocs documentation
/site
# mypy
.mypy_cache/
# Sublime Text #
*.tmlanguage.cache
*.tmPreferences.cache
*.stTheme.cache
*.sublime-workspace
*.sublime-project
# sftp configuration file
sftp-config.json
# Package control specific files
Package Control.last-run
Package Control.ca-list
Package Control.ca-bundle
Package Control.system-ca-bundle
GitHub.sublime-settings
# Visual Studio Code #
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
.history
# Additional Removals
response.mp3
test.py
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2023 Ben
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# ChatGPT Smart Speaker (speech recognition and text-to-speech using OpenAI and Google Speech Recognition)


## Video Demo using activation word "Jeffers" - [Demo](https://vimeo.com/1029160996?share=copy#t=0)
<br>
<br>
## Equipment List:
## - [Raspberry Pi 4b 4GB](https://www.amazon.co.uk/Raspberry-Pi-Model-4GB/dp/B09TTNF8BT?_encoding=UTF8&tag=olney104-21 "Raspberry Pi 4b 4GB")
## - [Mini External USB Stereo Speaker](https://www.amazon.co.uk/Speakers-Computer-Speaker-Soundbar-Checkout/dp/B08NDJDFPS?_encoding=UTF8&tag=olney104-21 "Mini External USB Stereo Speaker")
## - [ReSpeaker 4-Mic Array](https://www.amazon.co.uk/Seeed-ReSpeaker-4-Mic-Array-Raspberry/dp/B076SSR1W1?&_encoding=UTF8&tag=olney104-21 "ReSpeaker 4-Mic Array")
## - [ANSMANN 10,000mAh Type-C 20W PD Power Bank](https://www.amazon.co.uk/Powerbank-10000mAh-capacity-Smartphones-rechargeable-Black/dp/B01NBNH2AL/?_encoding=UTF8&tag=olney104-21 "ANSMANN 10,000mAh Type-C 20W PD Power Bank")
<br>
## Running on your PC/Mac (use the chat.py or test.py script)
The `chat.py` and `test.py` scripts run directly on your PC/Mac. Both let you speak a prompt via speech recognition, send the prompt to OpenAI to generate a response, and then use gTTS to convert the response to an audio file and play it back. Your machine must have a working default microphone and speakers for these scripts to work. Please note that these scripts were designed on a Mac, so additional dependencies may be required on Windows and Linux. The difference between them is that `chat.py` is faster and always on, while `test.py` acts like a standard smart speaker, only responding once it hears the activation word (currently set to 'Jeffers').
<br>
## Running on Raspberry Pi (use the pi.py script)

The `pi.py` script is a newer, more advanced custom version of the deprecated `smart_speaker.py` script and behaves the most like a real smart speaker. Its purpose is to offload wake-word detection to a custom model built via PicoVoice (`https://console.picovoice.ai/`), which improves efficiency and long-term reliability. This will be the main script for development going forward, with more advanced features added regularly.
<br>
## Prerequisites - chat.py
- You need to have a valid OpenAI API key. You can sign up for a free API key at https://platform.openai.com.
- You'll need to be running Python version 3.7.3 or higher. I am using 3.11.4 on a Mac and 3.7.3 on Raspberry Pi.
- Run `brew install portaudio` after installing Homebrew: `/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"`
- You need to install the following packages: `openai`, `gTTS`, `pyaudio`, `SpeechRecognition`, `playsound`, `python-dotenv`, and `pyobjc` if you are on a Mac. You can install these packages using pip, or use pipenv if you wish to contain a virtual environment.
- Firstly, update your tools: `pip install --upgrade pip setuptools` then `pip install openai pyaudio SpeechRecognition gTTS playsound python-dotenv apa102-pi gpiozero pyobjc`
<br>
## Prerequisites - pi.py 
To run pi.py you will need a Raspberry Pi 4b (I'm using the 4GB model but 2GB should be enough), ReSpeaker 4-Mic Array for Raspberry Pi and USB speakers.
You will also need an OpenAI developer account and API key (`https://platform.openai.com/overview`), a Tavily Search agent API key (`https://app.tavily.com/sign-in`), and a PicoVoice Access Key (`https://console.picovoice.ai/`) plus a custom voice model (`https://console.picovoice.ai/ppn`). Please create your own voice model and download the version built for the Raspberry Pi.
Now on to the Pi setup. Let's get started!
Run the following on your Raspberry Pi terminal:
1. `sudo apt update`
2. `sudo apt install python3-gpiozero`
3. `git clone https://github.com/Olney1/ChatGPT-OpenAI-Smart-Speaker`
4. Firstly, update your tools: `pip install --upgrade pip setuptools`, then `pip install openai pyaudio SpeechRecognition gTTS pydub python-dotenv apa102-pi gpiozero`. Next, install the dependencies: `pip install -r requirements.txt`. I am using Python 3.9 (`#!/usr/bin/env python3.9`). You can install these packages using pip, or use pipenv if you wish to contain a virtual environment.
5. PyAudio relies on PortAudio as a dependency. You can install it using the following command: `sudo apt-get install portaudio19-dev`
6. Pydub dependencies: You need to have ffmpeg installed on your system. On a Raspberry Pi you can install it using: `sudo apt-get install ffmpeg`. You may also need simpleaudio if you run into issues with the script hanging when finding the wake word, so it's best to install these packages just in case: `sudo apt-get install python3-dev` (development headers needed to compile), `pip install simpleaudio` (a different backend to play mp3 files) and `sudo apt-get install libasound2-dev` (necessary dependencies).
7. If you are using the RESPEAKER, follow this guide to install the required dependencies: (`https://wiki.seeedstudio.com/ReSpeaker_4_Mic_Array_for_Raspberry_Pi/#getting-started`). Then install support for the lights on the RESPEAKER board. You'll need APA102 LED: `sudo apt install -y python3-rpi.gpio` and then `sudo pip3 install apa102-pi`.
8. Activate SPI: run `sudo raspi-config`, go to "Interface Options", then "SPI", and enable SPI. While you are at it, do change the default password! Exit the tool and reboot.
9. Get the Seeed voice card source code, install and reboot:
`git clone https://github.com/HinTak/seeed-voicecard.git`
`cd seeed-voicecard`
`sudo ./install.sh`
`sudo reboot now`
10. Finally, load audio output on the Raspberry Pi with `sudo raspi-config`:
- Select 1 System Options
- Select S2 Audio
- Select your preferred audio output device
- Select Finish
<br>
## Usage - applies to chat.py:
1. You'll need to set up the environment variable for your OpenAI API key. To do this, create a `.env` file in the same directory and add your API key to the file like this: `OPENAI_API_KEY="API KEY GOES HERE"`. This is safer than hard-coding your API key into the program. You must not change the name of the variable `OPENAI_API_KEY`.
2. Run the script using `python chat.py`.
3. The script will prompt you to say something. Speak a sentence into your microphone. You may need to allow the program permission to access your microphone on a Mac, a prompt should appear when running the program.
4. The script will send the spoken sentence to OpenAI, generate a response using the text-to-speech model, and play the response as an audio file.
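The `.env` pattern in step 1 is what `python-dotenv`'s `load_dotenv()` does under the hood: it reads `KEY="value"` pairs from the file and exports them into the process environment, where `os.environ` can read them. A minimal stdlib-only sketch of that behaviour (the key value here is a placeholder, not a real key; real projects should use `python-dotenv` as the scripts in this repo do):

```python
import os

def load_env_line(line):
    # Parse a single KEY="value" pair, as found in a .env file,
    # and export it into the process environment.
    key, _, value = line.partition("=")
    os.environ[key.strip()] = value.strip().strip('"')

load_env_line('OPENAI_API_KEY="API KEY GOES HERE"')
print(os.environ["OPENAI_API_KEY"])  # → API KEY GOES HERE
```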
<br>
## Usage - applies to pi.py
1. You'll need to set up the environment variables for your OpenAI API key, PicoVoice Access Key and Tavily API key (used for agent searches). To do this, create a `.env` file in the same directory and add your keys to the file like this: `OPENAI_API_KEY="API KEY GOES HERE"`, `ACCESS_KEY="PICOVOICE ACCESS KEY GOES HERE"` and `TAVILY_API_KEY="API KEY GOES HERE"`. This is safer than hard-coding your API keys into the program.
2. Ensure that you have the `pi.py` script along with `apa102.py` and `alexa_led_pattern.py` scripts in the same folder saved on your Pi if using ReSpeaker.
3. Run the script using `python3 pi.py` or `python3 pi.py 2> /dev/null` on the Raspberry Pi. The second option omits all developer warnings and errors to keep the console focused purely on the print statements.
4. The script will prompt you to say the wake word, which is programmed into the custom wake-word model by Picovoice as 'Jeffers'. You can change this to any name you want. Once the wake word has been detected, the lights will light up blue and the speaker is ready for your question. When you have asked your question (or when the microphone picks up and processes noise), the lights will rotate blue, meaning your recording sample/question is being sent to OpenAI.
5. The script will then generate a response using the text-to-speech model, and play the response as an audio file.
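The "rotating blue" effect described in step 4 comes from `alexa_led_pattern.py`, which treats the LED buffer as a flat list of 4 values per pixel and shifts it by one whole pixel per frame (`pixels[-4:] + pixels[:-4]`). A minimal sketch of that rotation, independent of any hardware:

```python
def rotate_pixels(pixels, positions=1):
    # Rotate a flat 4-values-per-pixel buffer by whole pixels,
    # treating it as a circular buffer (as the think() animation does).
    cutoff = 4 * positions
    return pixels[-cutoff:] + pixels[:-cutoff]

# Two pixels: a dim "blue" pixel followed by a brighter one.
buf = [0, 0, 12, 12, 0, 0, 0, 24]
print(rotate_pixels(buf))  # → [0, 0, 0, 24, 0, 0, 12, 12]
```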
## Customisation
- You can change the OpenAI model engine by modifying the value of `model_engine`. For example, to use the "gpt-3.5-turbo" model for a cheaper and quicker response (but with a knowledge cut-off of September 2021), set `model_engine = "gpt-3.5-turbo"`.
- You can change the language of the generated audio file by modifying the value of `language`. For example, to generate audio in French, set `language = 'fr'`.
- You can adjust the `temperature` parameter in the following line to control the randomness of the generated response:
```python
response = client.chat.completions.create(
    model=model_engine,
    messages=[{"role": "system", "content": "You are a helpful smart speaker called Jeffers!"},  # Play about with more context here.
              {"role": "user", "content": prompt}],
    max_tokens=1024,
    n=1,
    temperature=0.7,
)
return response
```
Higher values of `temperature` will result in more diverse and random responses, while lower values will result in more deterministic responses.
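Conceptually, `temperature` divides the model's logits before the softmax, so higher values flatten the distribution the next token is sampled from, and lower values sharpen it. A rough illustration with toy logits (not the real model's values):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature, then normalise into probabilities.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(logits, 0.2)  # sharply peaked: near-deterministic
warm = softmax_with_temperature(logits, 2.0)  # flatter: more random sampling
print(max(cool) > max(warm))  # → True
```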
<br>
## Important notes for Raspberry Pi Installation
As of May 2024, Seeed Studio has listed the ReSpeaker series among its [retired products](https://wiki.seeedstudio.com/discontinuedproducts/). It may not be compatible with the Raspberry Pi 5 due to hardware changes.
It is highly recommended to install the legacy version of Raspberry Pi OS on a Raspberry Pi 4b model if you have a ReSpeaker. You can also simply buy a micro-USB microphone, configure it as the input source using alsamixer, and still use the ReSpeaker for the lighting pattern.
If you are using the same USB speaker in my video you will need to run `sudo apt-get install pulseaudio` to install support for this. This may also require you to set a command to start pulseaudio on every boot: `pulseaudio --start`.
### Adding a Start Command on Boot
Open the terminal and type: `sudo nano /etc/rc.local`
After the important network/start commands, add this: `su -l pi -c '/usr/bin/python3 /home/pi/ChatGPT-OpenAI-Smart-Speaker/ && pulseaudio --start && python3 pi.py 2> /dev/null'`
Be sure to leave the line `exit 0` at the end, then save the file and exit. In nano, press Ctrl-X and then Y to exit.
### ReSpeaker
If you want to use ReSpeaker for the lights, you can purchase this from most of the major online stores that stock Raspberry Pi.
Here is the online guide: https://wiki.seeedstudio.com/ReSpeaker_4_Mic_Array_for_Raspberry_Pi/
To test your microphone and speakers install Audacity on your Raspberry Pi:
`sudo apt update`
`sudo apt install audacity`
`audacity`
### Other Possible Issues
On the Raspberry Pi you may encounter an error regarding the installation of `flac`.
See here for the resolution: https://raspberrypi.stackexchange.com/questions/137630/im-unable-to-install-flac-on-my-raspberry-pi-3
The files you will need are here: https://archive.raspbian.org/raspbian/pool/main/f/flac/
<br>Please note the file names below may have changed or been updated, so refer back to the link above for the latest names and update the commands accordingly.
`sudo apt-get install libogg0`
`wget https://archive.raspbian.org/raspbian/pool/main/f/flac/libflac8_1.3.2-3+deb10u3_armhf.deb`
`wget https://archive.raspbian.org/raspbian/pool/main/f/flac/flac_1.3.2-3+deb10u3_armhf.deb`
`sudo dpkg -i libflac8_1.3.2-3+deb10u3_armhf.deb`
`sudo dpkg -i flac_1.3.2-3+deb10u3_armhf.deb`
`which flac` (should print `/usr/bin/flac`)
`sudo reboot`
`flac --version` (should print `flac 1.3.2`)
You may find you need to install GStreamer if you encounter errors regarding Gst.
Install GStreamer: Open a terminal and run the following command to install GStreamer and its base plugins:
`sudo apt-get install gstreamer1.0-tools gstreamer1.0-plugins-base gstreamer1.0-plugins-good`
This installs the GStreamer core, along with a set of essential and good-quality plugins.
Next, you need to install the Python bindings for GStreamer. Use this command:
`sudo apt-get install python3-gst-1.0`
This command installs the GStreamer bindings for Python 3.
Install Additional GStreamer Plugins (if needed): Depending on the audio formats you need to work with, you might need additional GStreamer plugins. For example, to install plugins for MP3 playback, use:
`sudo apt-get install gstreamer1.0-plugins-ugly`
To quit a running script on Pi from boot: `ALT + PrtScSysRq (or Print button) + K`
<br>
## Credit to:
https://github.com/tinue/apa102-pi & Seeed Technology Limited for supplementary code.
<br>
## Read more about what is next for the project
https://medium.com/@ben_olney/openai-smart-speaker-with-raspberry-pi-5e284d21a53e
================================================
FILE: alexa_led_pattern.py
================================================
#!/usr/bin/env python
# Copyright (C) 2017 Seeed Technology Limited
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time


class AlexaLedPattern(object):
    def __init__(self, show=None, number=12):
        self.pixels_number = number
        self.pixels = [0] * 4 * number
        if not show or not callable(show):
            def dummy(data):
                pass
            show = dummy
        self.show = show
        self.stop = False

    def wakeup(self, direction=0):
        position = int((direction + 15) / (360 / self.pixels_number)) % self.pixels_number
        pixels = [0, 0, 0, 24] * self.pixels_number
        pixels[position * 4 + 2] = 48
        self.show(pixels)

    def listen(self):
        pixels = [0, 0, 0, 24] * self.pixels_number
        self.show(pixels)

    def think(self):
        pixels = [0, 0, 12, 12, 0, 0, 0, 24] * self.pixels_number
        while not self.stop:
            self.show(pixels)
            time.sleep(0.2)
            pixels = pixels[-4:] + pixels[:-4]

    def speak(self):
        step = 1
        position = 12
        while not self.stop:
            pixels = [0, 0, position, 24 - position] * self.pixels_number
            self.show(pixels)
            time.sleep(0.01)
            if position <= 0:
                step = 1
                time.sleep(0.4)
            elif position >= 12:
                step = -1
                time.sleep(0.4)
            position += step

    def off(self):
        self.show([0] * 4 * 12)
================================================
FILE: apa102.py
================================================
"""
from https://github.com/tinue/APA102_Pi
This is the main driver module for APA102 LEDs
"""
import spidev
from math import ceil
RGB_MAP = { 'rgb': [3, 2, 1], 'rbg': [3, 1, 2], 'grb': [2, 3, 1],
'gbr': [2, 1, 3], 'brg': [1, 3, 2], 'bgr': [1, 2, 3] }
class APA102:
"""
Driver for APA102 LEDS (aka "DotStar").
(c) Martin Erzberger 2016-2017
My very first Python code, so I am sure there is a lot to be optimized ;)
Public methods are:
- set_pixel
- set_pixel_rgb
- show
- clear_strip
- cleanup
Helper methods for color manipulation are:
- combine_color
- wheel
The rest of the methods are used internally and should not be used by the
user of the library.
Very brief overview of APA102: An APA102 LED is addressed with SPI. The bits
are shifted in one by one, starting with the least significant bit.
An LED usually just forwards everything that is sent to its data-in to
data-out. While doing this, it remembers its own color and keeps glowing
with that color as long as there is power.
An LED can be switched to not forward the data, but instead use the data
to change it's own color. This is done by sending (at least) 32 bits of
zeroes to data-in. The LED then accepts the next correct 32 bit LED
frame (with color information) as its new color setting.
After having received the 32 bit color frame, the LED changes color,
and then resumes to just copying data-in to data-out.
The really clever bit is this: While receiving the 32 bit LED frame,
the LED sends zeroes on its data-out line. Because a color frame is
32 bits, the LED sends 32 bits of zeroes to the next LED.
As we have seen above, this means that the next LED is now ready
to accept a color frame and update its color.
So that's really the entire protocol:
- Start by sending 32 bits of zeroes. This prepares LED 1 to update
its color.
- Send color information one by one, starting with the color for LED 1,
then LED 2 etc.
- Finish off by cycling the clock line a few times to get all data
to the very last LED on the strip
The last step is necessary, because each LED delays forwarding the data
a bit. Imagine ten people in a row. When you yell the last color
information, i.e. the one for person ten, to the first person in
the line, then you are not finished yet. Person one has to turn around
and yell it to person 2, and so on. So it takes ten additional "dummy"
cycles until person ten knows the color. When you look closer,
you will see that not even person 9 knows its own color yet. This
information is still with person 2. Essentially the driver sends additional
zeroes to LED 1 as long as it takes for the last color frame to make it
down the line to the last LED.
"""
# Constants
MAX_BRIGHTNESS = 31 # Safeguard: Set to a value appropriate for your setup
LED_START = 0b11100000 # Three "1" bits, followed by 5 brightness bits
def __init__(self, num_led, global_brightness=MAX_BRIGHTNESS,
order='rgb', bus=0, device=1, max_speed_hz=8000000):
self.num_led = num_led # The number of LEDs in the Strip
order = order.lower()
self.rgb = RGB_MAP.get(order, RGB_MAP['rgb'])
# Limit the brightness to the maximum if it's set higher
if global_brightness > self.MAX_BRIGHTNESS:
self.global_brightness = self.MAX_BRIGHTNESS
else:
self.global_brightness = global_brightness
self.leds = [self.LED_START,0,0,0] * self.num_led # Pixel buffer
self.spi = spidev.SpiDev() # Init the SPI device
self.spi.open(bus, device) # Open SPI port 0, slave device (CS) 1
# Up the speed a bit, so that the LEDs are painted faster
if max_speed_hz:
self.spi.max_speed_hz = max_speed_hz
def clock_start_frame(self):
"""Sends a start frame to the LED strip.
This method clocks out a start frame, telling the receiving LED
that it must update its own color now.
"""
self.spi.xfer2([0] * 4) # Start frame, 32 zero bits
def clock_end_frame(self):
"""Sends an end frame to the LED strip.
As explained above, dummy data must be sent after the last real colour
information so that all of the data can reach its destination down the line.
The delay is not as bad as with the human example above.
It is only 1/2 bit per LED. This is because the SPI clock line
needs to be inverted.
Say a bit is ready on the SPI data line. The sender communicates
this by toggling the clock line. The bit is read by the LED
and immediately forwarded to the output data line. When the clock goes
down again on the input side, the LED will toggle the clock up
on the output to tell the next LED that the bit is ready.
After one LED the clock is inverted, and after two LEDs it is in sync
again, but one cycle behind. Therefore, for every two LEDs, one bit
of delay gets accumulated. For 300 LEDs, 150 additional bits must be fed to
the input of LED one so that the data can reach the last LED.
Ultimately, we need to send additional numLEDs/2 arbitrary data bits,
in order to trigger numLEDs/2 additional clock changes. This driver
sends zeroes, which has the benefit of getting LED one partially or
fully ready for the next update to the strip. An optimized version
of the driver could omit the "clockStartFrame" method if enough zeroes have
been sent as part of "clockEndFrame".
"""
# Round up num_led/2 bits (or num_led/16 bytes)
for _ in range((self.num_led + 15) // 16):
self.spi.xfer2([0x00])
def clear_strip(self):
""" Turns off the strip and shows the result right away."""
for led in range(self.num_led):
self.set_pixel(led, 0, 0, 0)
self.show()
def set_pixel(self, led_num, red, green, blue, bright_percent=100):
"""Sets the color of one pixel in the LED stripe.
The changed pixel is not shown yet on the Stripe, it is only
written to the pixel buffer. Colors are passed individually.
If brightness is not set the global brightness setting is used.
"""
if led_num < 0:
return # Pixel is invisible, so ignore
if led_num >= self.num_led:
return # again, invisible
# Calculate pixel brightness as a percentage of the
# defined global_brightness. Round up to nearest integer
# as we expect some brightness unless set to 0
brightness = ceil(bright_percent*self.global_brightness/100.0)
brightness = int(brightness)
# LED startframe is three "1" bits, followed by 5 brightness bits
ledstart = (brightness & 0b00011111) | self.LED_START
start_index = 4 * led_num
self.leds[start_index] = ledstart
self.leds[start_index + self.rgb[0]] = red
self.leds[start_index + self.rgb[1]] = green
self.leds[start_index + self.rgb[2]] = blue
def set_pixel_rgb(self, led_num, rgb_color, bright_percent=100):
"""Sets the color of one pixel in the LED stripe.
The changed pixel is not shown yet on the Stripe, it is only
written to the pixel buffer.
Colors are passed combined (3 bytes concatenated)
If brightness is not set the global brightness setting is used.
"""
self.set_pixel(led_num, (rgb_color & 0xFF0000) >> 16,
(rgb_color & 0x00FF00) >> 8, rgb_color & 0x0000FF,
bright_percent)
def rotate(self, positions=1):
""" Rotate the LEDs by the specified number of positions.
Treating the internal LED array as a circular buffer, rotate it by
the specified number of positions. The number could be negative,
which means rotating in the opposite direction.
"""
cutoff = 4 * (positions % self.num_led)
self.leds = self.leds[cutoff:] + self.leds[:cutoff]
def show(self):
"""Sends the content of the pixel buffer to the strip.
Todo: More than 1024 LEDs requires more than one xfer operation.
"""
self.clock_start_frame()
# xfer2 kills the list, unfortunately. So it must be copied first
# SPI takes up to 4096 Integers. So we are fine for up to 1024 LEDs.
self.spi.xfer2(list(self.leds))
self.clock_end_frame()
def cleanup(self):
"""Release the SPI device; Call this method at the end"""
self.spi.close() # Close SPI port
@staticmethod
def combine_color(red, green, blue):
"""Make one 3*8 byte color value."""
return (red << 16) + (green << 8) + blue
def wheel(self, wheel_pos):
"""Get a color from a color wheel; Green -> Red -> Blue -> Green"""
if wheel_pos > 255:
wheel_pos = 255 # Safeguard
if wheel_pos < 85: # Green -> Red
return self.combine_color(wheel_pos * 3, 255 - wheel_pos * 3, 0)
if wheel_pos < 170: # Red -> Blue
wheel_pos -= 85
return self.combine_color(255 - wheel_pos * 3, 0, wheel_pos * 3)
# Blue -> Green
wheel_pos -= 170
return self.combine_color(0, wheel_pos * 3, 255 - wheel_pos * 3)
def dump_array(self):
"""For debug purposes: Dump the LED array onto the console."""
print(self.leds)
================================================
FILE: chat.py
================================================
from openai import OpenAI
import os
import speech_recognition as sr
from gtts import gTTS
from playsound import playsound
from dotenv import load_dotenv
from pathlib import Path

# Load the environment variables
load_dotenv()

# Create an OpenAI API client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Model name and language
model_engine = "gpt-4o"
language = 'en'


def recognise_speech():
    # obtain audio from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)
    # recognise speech using Google Speech Recognition
    speech = None
    try:
        # for testing purposes, we're just using the default API key
        # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
        # instead of `r.recognize_google(audio)`
        # convert the audio to text
        speech = r.recognize_google(audio)
        print("This is what we think was said: " + speech)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
    # Add a holding message like the one below to deal with current TTS delays until such time that TTS can be streamed.
    playsound("sounds/holding.mp3")  # There's an optional second argument, block, which is set to True by default. Setting it to False makes the function run asynchronously.
    return speech


def chatgpt_response(prompt):
    # send the converted audio text to chatgpt
    response = client.chat.completions.create(
        model=model_engine,
        messages=[{"role": "system", "content": "You are a helpful smart speaker called Jeffers!"},
                  {"role": "user", "content": prompt}],
        max_tokens=300,
        n=1,
        temperature=0.7,
    )
    return response


def generate_audio_file(message):
    speech_file_path = Path(__file__).parent / "response.mp3"
    response = client.audio.speech.create(
        model="tts-1",
        voice="fable",
        input=message
    )
    # response.content contains the binary audio data which we can write to a file and play
    with open(speech_file_path, 'wb') as f:
        f.write(response.content)


def play_audio_file():
    # play the audio file
    playsound("response.mp3")


def main():
    # run the program
    prompt = recognise_speech()
    if not prompt:
        return  # speech recognition failed, so there is nothing to send
    print("This is the prompt being sent to OpenAI: " + prompt)
    responses = chatgpt_response(prompt)
    message = responses.choices[0].message.content
    print(message)
    generate_audio_file(message)
    play_audio_file()


if __name__ == "__main__":
    main()
================================================
FILE: create_messages.py
================================================
from openai import OpenAI
import os
from dotenv import load_dotenv
"""Create your own professional messages with OpenAI for your speaker"""
# Load the environment variables
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
def create_holding_message():
message = "One moment please"
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message,
)
response.stream_to_file("sounds/holding.mp3")
def create_google_speech_issue():
message = "Sorry, there was an issue reaching Google Speech Recognition, please try again."
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message,
)
response.stream_to_file("sounds/google_issue.mp3")
def understand_speech_issue():
message = "Sorry, I didn't quite get that."
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message,
)
response.stream_to_file("sounds/understand.mp3")
def stop():
message = "No worries, I'll be here when you need me."
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message,
)
response.stream_to_file("sounds/stop.mp3")
def hello():
message = "Welcome, my name is Jeffers, I'm your helpful smart speaker. Just say my name and ask me anything."
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message,
)
response.stream_to_file("sounds/hello.mp3")
def create_picovoice_issue():
message = "Sorry, there was an issue with the PicoVoice Service."
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message,
)
response.stream_to_file("sounds/picovoice_issue.mp3")
def create_picture_message():
message = "Let me take a look through the camera."
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message,
)
response.stream_to_file("sounds/start_camera.mp3")
def start_picture_message():
message = "Hold steady....... I'm taking a photo now...... in ....... 3 ...... 2 ......... 1"
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message,
)
response.stream_to_file("sounds/take_photo.mp3")
def agent_search():
message = "Let me do a quick search for you."
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message,
)
response.stream_to_file("sounds/agent.mp3")
def audio_issue():
message = "There was an issue opening the PyAudio stream on the device."
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message,
)
response.stream_to_file("sounds/audio_issue.mp3")
def tavily_key_error():
message = "I could not find your API key for the Tavily Search Service. Please ensure you update your .env file with a Tavily Search API key in order to use the agent."
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message,
)
response.stream_to_file("sounds/tavily_key_error.mp3")
def camera_issue():
message = "Sorry, there was an issue opening Pi Camera."
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message,
)
response.stream_to_file("sounds/camera_issue.mp3")
camera_issue()
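The twelve functions above differ only in their message text and output path. A data-driven sketch shows one way they could be collapsed; `MESSAGES` and the `synthesize` callable are hypothetical stand-ins for the repeated `client.audio.speech.create(...)` / `stream_to_file(...)` pair so the loop can be exercised without the API:

```python
# Sketch only: MESSAGES maps output paths to spoken text; synthesize is a
# hypothetical stand-in for the OpenAI TTS call, injected so this is testable offline.
MESSAGES = {
    "sounds/holding.mp3": "One moment please",
    "sounds/stop.mp3": "No worries, I'll be here when you need me.",
    "sounds/camera_issue.mp3": "Sorry, there was an issue opening Pi Camera.",
}

def create_all(synthesize):
    for path, text in MESSAGES.items():
        synthesize(text, path)

# Example: record which files would be synthesised instead of calling the API
calls = []
create_all(lambda text, path: calls.append(path))
```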
================================================
FILE: deprecated/smart_speaker.py
================================================
import os
from openai import OpenAI
import pyaudio
import speech_recognition as sr
from gtts import gTTS
from dotenv import load_dotenv
import apa102
import threading
from gpiozero import LED
try:
import queue as Queue
except ImportError:
import Queue as Queue
from alexa_led_pattern import AlexaLedPattern
from pathlib import Path
from pydub import AudioSegment
from pydub.playback import play
import time
# Set the working directory for Pi if you want to run this code via rc.local script so that it is automatically running on Pi startup. Remove this line if you have installed this project in a different directory.
os.chdir('/home/pi/ChatGPT-OpenAI-Smart-Speaker')
# Set the pre-prompt configuration here to precede the user's question to enable OpenAI to understand that it's acting as a smart speaker and add any other required information. We will send this in the OpenAI call as part of the system content in messages.
pre_prompt = "You are a helpful smart speaker called Jeffers! Please respond with short and concise answers to the following user question and always remind the user at the end to say your name again to continue the conversation:"
# Load the environment variables
load_dotenv()
# Create an OpenAI API client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
# Add 1 second of silence globally to account for how pydub initially buffers audio in memory
silence = AudioSegment.silent(duration=1000)
# load pixels Class
class Pixels:
PIXELS_N = 12
def __init__(self, pattern=AlexaLedPattern):
self.pattern = pattern(show=self.show)
self.dev = apa102.APA102(num_led=self.PIXELS_N)
self.power = LED(5)
self.power.on()
self.queue = Queue.Queue()
self.thread = threading.Thread(target=self._run)
self.thread.daemon = True
self.thread.start()
self.last_direction = None
def wakeup(self, direction=0):
self.last_direction = direction
def f():
self.pattern.wakeup(direction)
self.put(f)
def listen(self):
if self.last_direction:
def f():
self.pattern.wakeup(self.last_direction)
self.put(f)
else:
self.put(self.pattern.listen)
def think(self):
self.put(self.pattern.think)
def speak(self):
self.put(self.pattern.speak)
def off(self):
self.put(self.pattern.off)
def put(self, func):
self.pattern.stop = True
self.queue.put(func)
def _run(self):
while True:
func = self.queue.get()
self.pattern.stop = False
func()
def show(self, data):
for i in range(self.PIXELS_N):
self.dev.set_pixel(i, int(data[4*i + 1]), int(data[4*i + 2]), int(data[4*i + 3]))
self.dev.show()
pixels = Pixels()
# settings and keys
model_engine = "gpt-4o"
language = 'en'
def recognise_speech():
# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
try:
pixels.off()
print("Listening...")
audio_stream = r.listen(source)
print("Waiting for wake word...")
# recognize speech using Google Speech Recognition
try:
# convert the audio to text
speech = r.recognize_google(audio_stream)
print("Google Speech Recognition thinks you said " + speech)
print("Recognized Speech:", speech) # Print the recognized speech for debugging
words = speech.lower().split() # Split the speech into words
if "jeffers" not in words:
print("Wake word not detected in the speech")
return False
else:
print("Found wake word!")
# Add 1 second of silence to account for how pydub initially buffers audio in memory
silence = AudioSegment.silent(duration=1000)
start_audio_response = silence + AudioSegment.from_mp3("sounds/start.mp3")
play(start_audio_response)
return True
except sr.UnknownValueError:
print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
print("Could not request results from Google Speech Recognition service; {0}".format(e))
except KeyboardInterrupt:
print("Interrupted by User Keyboard")
pass
def speech():
r = sr.Recognizer()
with sr.Microphone() as source:
while True:
# Now we wake the LEDs to indicate the optimum moment now when the user can speak
pixels.wakeup()
try:
r.adjust_for_ambient_noise(source)
audio_stream = r.listen(source)
print("Waiting for user to speak...")
try:
speech_text = r.recognize_google(audio_stream)
pixels.off()
print("Google Speech Recognition thinks you said " + speech_text)
pixels.think()
return speech_text
except sr.UnknownValueError:
pixels.think()
print("Google Speech Recognition could not understand audio")
understand_error = AudioSegment.silent(duration=1000) + AudioSegment.from_mp3("sounds/understand.mp3")
play(understand_error)
time.sleep(4)
except sr.RequestError as e:
pixels.think()
print(f"Could not request results from Google Speech Recognition service; {e}")
audio_response = AudioSegment.silent(duration=1000) + AudioSegment.from_mp3("sounds/google_issue.mp3")
play(audio_response)
except KeyboardInterrupt:
print("Interrupted by User Keyboard")
break # This allows the user to still manually exit the loop with a keyboard interrupt
def chatgpt_response(prompt):
if prompt is not None:
# Add a holding message like the one below to cover current TTS delays (until TTS can be streamed), which stem from how pydub initially buffers audio in memory
silence = AudioSegment.silent(duration=1000)
holding_audio_response = silence + AudioSegment.from_mp3("sounds/holding.mp3")
play(holding_audio_response)
# send the converted audio text to chatgpt
response = client.chat.completions.create(
model=model_engine,
messages=[{"role": "system", "content": pre_prompt},
{"role": "user", "content": prompt}],
max_tokens=400,
n=1,
temperature=0.7,
)
# The completion has already returned at this point, so play a short "checking on that" message to smooth the transition before the spoken answer.
checking_on_that = silence + AudioSegment.from_mp3("sounds/checking.mp3")
play(checking_on_that)
return response
else:
return None
def generate_audio_file(message):
speech_file_path = Path(__file__).parent / "response.mp3"
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message
)
response.stream_to_file(speech_file_path)
def play_wake_up_audio():
# play the audio file and wake speaking LEDs
pixels.speak()
audio_response = silence + AudioSegment.from_mp3("response.mp3")
play(audio_response)
def main():
# run the program
# Indicate to the user that the device is ready
pixels.wakeup()
device_on = silence + AudioSegment.from_mp3("sounds/on.mp3")
play(device_on)
# Play the "Hello" audio file to welcome the user
hello = silence + AudioSegment.from_mp3("sounds/hello.mp3")
play(hello)
while True:
if recognise_speech():
prompt = speech()
print(f"This is the prompt being sent to OpenAI: {prompt}")
response = chatgpt_response(prompt)
if response is not None:
message = response.choices[0].message.content
print(message)
generate_audio_file(message)
play_wake_up_audio()
pixels.off()
else:
print("No prompt to send to OpenAI")
# We continue to listen for the wake word
else:
print("Speech was not recognised")
pixels.off()
if __name__ == "__main__":
main()
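The Pixels class above relies on a queue drained by a daemon worker thread so LED pattern changes never block the caller. A minimal standalone sketch of the same pattern (with list appends standing in for LED updates):

```python
# Minimal sketch of the worker pattern Pixels uses: a daemon thread pulls
# callables off a queue and runs them, so enqueueing never blocks the caller.
import queue
import threading

q = queue.Queue()
results = []

def _run():
    while True:
        func = q.get()
        func()          # run the queued pattern function
        q.task_done()

threading.Thread(target=_run, daemon=True).start()
q.put(lambda: results.append("wakeup"))
q.put(lambda: results.append("off"))
q.join()                # wait until both queued functions have run
```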
================================================
FILE: pi.py
================================================
#!/usr/bin/env python3.9
import os
import subprocess
from openai import OpenAI
import pyaudio
import alsaaudio
from datetime import datetime
import speech_recognition as sr
from gtts import gTTS
from dotenv import load_dotenv
import apa102
import threading
from gpiozero import LED
try:
import queue as Queue
except ImportError:
import Queue as Queue
from alexa_led_pattern import AlexaLedPattern
from pathlib import Path
from pydub import AudioSegment
from pydub.playback import play as pydub_play
import time
import pvporcupine
import struct
from picamera2 import Picamera2
import base64
from langchain_community.tools import TavilySearchResults
from langchain.agents import AgentType, initialize_agent
from langchain_openai import ChatOpenAI
from langchain.schema import SystemMessage
# Set the working directory for Pi if you want to run this code via rc.local script so that it is automatically running on Pi startup. Remove this line if you have installed this project in a different directory.
os.chdir('/home/pi/ChatGPT-OpenAI-Smart-Speaker')
# We add 0.5 seconds of silence globally to account for how pydub initially buffers audio in memory
silence = AudioSegment.silent(duration=500)
# This is our pre-prompt configuration to precede the user's question to enable OpenAI to understand that it's acting as a smart speaker and add any other required information. We will send this in the OpenAI call as part of the system content in messages.
pre_prompt = "You are a helpful smart speaker called Jeffers! Please respond with short and concise answers to the following user question and always remind the user at the end to say your name again to continue the conversation:"
# Load your keys and tokens here
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
# os.environ.get never raises, so a try/except cannot detect a missing key; check the value instead
TAVILY_API_KEY = os.environ.get("TAVILY_API_KEY")
if TAVILY_API_KEY:
    print("Tavily search API key found")
else:
    print("Tavily search API key not found.")
    tavily_key_not_found = silence + AudioSegment.from_mp3("sounds/tavily_key_error.mp3")
# We set the OpenAI model and language settings here for the route that follows general questions and questions with images. This is not for the agent route.
model_engine = "chatgpt-4o-latest"
language = 'en'
# Load the Tavily Search tool which the agent will use to answer questions about weather, news, and recent events.
tool = TavilySearchResults(
max_results=20,
include_answer=True,
include_raw_content=True,
include_images=False,
search_depth="advanced",
# include_domains = []
# exclude_domains = []
)
class Pixels:
PIXELS_N = 12
def __init__(self, pattern=AlexaLedPattern):
self.pattern = pattern(show=self.show)
self.dev = apa102.APA102(num_led=self.PIXELS_N)
self.power = LED(5)
self.power.on()
self.queue = Queue.Queue()
self.thread = threading.Thread(target=self._run)
self.thread.daemon = True
self.thread.start()
self.last_direction = None
def wakeup(self, direction=0):
self.last_direction = direction
def f():
self.pattern.wakeup(direction)
self.put(f)
def listen(self):
if self.last_direction:
def f():
self.pattern.wakeup(self.last_direction)
self.put(f)
else:
self.put(self.pattern.listen)
def think(self):
self.put(self.pattern.think)
def speak(self):
self.put(self.pattern.speak)
def off(self):
self.put(self.pattern.off)
def put(self, func):
self.pattern.stop = True
self.queue.put(func)
def _run(self):
while True:
func = self.queue.get()
self.pattern.stop = False
func()
def show(self, data):
for i in range(self.PIXELS_N):
self.dev.set_pixel(i, int(data[4*i + 1]), int(data[4*i + 2]), int(data[4*i + 3]))
self.dev.show()
# Instantiate the Pixels class
pixels = Pixels()
# Thin wrapper around pydub's playback so call sites can simply use play(audio_segment)
def play(audio_segment):
    pydub_play(audio_segment)
# This function is called first to detect the wake word "Jeffers" and then proceed to listen for the user's question.
def detect_wake_word():
# Here we use the Porcupine wake word detection engine to detect the wake word "Jeffers" and then proceed to listen for the user's question.
porcupine = None
pa = None
audio_stream = None
try:
# Path to the custom wake word .ppn file
custom_wake_word_path = os.path.join(os.path.dirname(__file__), 'wake_words', 'custom_model/Jeffers_Pi.ppn')
print(f"Wake word file path: {custom_wake_word_path}")
if not os.path.exists(custom_wake_word_path):
    print(f"Error: Wake word file not found at {custom_wake_word_path}")
    return False
# Initialize Porcupine with the custom wake word
# You will need to obtain an access key from Picovoice to use Porcupine (https://console.picovoice.ai/). You can also create your own custom wake word model using the Picovoice Console.
try:
porcupine = pvporcupine.create(access_key=os.environ.get("ACCESS_KEY"), keyword_paths=[custom_wake_word_path])
except pvporcupine.PorcupineInvalidArgumentError as e:
    print(f"Error creating Porcupine instance: {e}")
    return False
try:
pa = pyaudio.PyAudio()
audio_stream = pa.open(
rate=porcupine.sample_rate,
channels=1,
format=pyaudio.paInt16,
output_device_index=1,
input=True,
input_device_index=pa.get_default_input_device_info()["index"],
frames_per_buffer=porcupine.frame_length)
except Exception as e:
    print(f"Error with audio stream setup: {e}")
error_response = silence + AudioSegment.from_mp3("sounds/audio_issue.mp3")
play(error_response)
while True:
pcm = audio_stream.read(porcupine.frame_length)
pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)
result = porcupine.process(pcm)
if result >= 0:
print("Wake word detected")
return True
except Exception as e:
    # Deal with any errors that may occur from using the PicoVoice Service (https://console.picovoice.ai/)
    print(f"Error with wake word detection, Porcupine or the PicoVoice Service: {e}")
error_response = silence + AudioSegment.from_mp3("sounds/picovoice_issue.mp3")
play(error_response)
finally:
if audio_stream is not None:
audio_stream.close()
if pa is not None:
pa.terminate()
if porcupine is not None:
porcupine.delete()
return False
# This function is called to use the Langchain search agent using the TavilySearchResults tool to answer questions about weather, news, and recent events.
def search_agent(speech_text):
today = datetime.today()
#! Update this location to your location
location = "Colchester, UK"
print(f"Today's date: {today}")
print(f"User's question understood via the search_agent function: {speech_text}")
search_results = tool.invoke({
'query': f"The current date is {today}, the user is based in {location} and the user wants to know {speech_text}. Keep responses short and concise. Do not respond with links to websites and do not read out website links, search deeper to find the answer. If the question is about weather, please use Celsius as a metric."
})
# Process the search results
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
# Prepare the content for the LLM
content = "\n".join([result['content'] for result in search_results])
# Use the LLM to summarise and extract relevant information
response = llm.invoke(f"""
Based on the following search results, provide a concise and relevant answer to the user's question: "{speech_text}"
Search results:
{content}
Please keep the response short, informative, and directly addressing the user's question. Do not mention sources or include any URLs.
""")
return response.content
# This function is called after the wake word is detected to listen for the user's question and then proceed to convert the speech to text.
def recognise_speech():
# Here we use the Google Speech Recognition engine to convert the user's question into text and then send it to OpenAI for a response.
r = sr.Recognizer()
with sr.Microphone() as source:
start_camera = silence + AudioSegment.from_mp3("sounds/start_camera.mp3")
take_photo = silence + AudioSegment.from_mp3("sounds/take_photo.mp3")
camera_shutter = silence + AudioSegment.from_mp3("sounds/camera_shutter.mp3")
agent_search = silence + AudioSegment.from_mp3("sounds/agent.mp3")
camera_issue = silence + AudioSegment.from_mp3("sounds/camera_issue.mp3")
print("Listening for your question...")
try:
    audio_stream = r.listen(source, timeout=5, phrase_time_limit=10)
except sr.WaitTimeoutError:
    print("No speech detected before the timeout.")
    return None, None, None
print("Processing your question...")
try:
speech_text = r.recognize_google(audio_stream)
print("Google Speech Recognition thinks you said: " + speech_text)
# 1. Agent search route
if any(keyword in speech_text.lower() for keyword in ["activate search", "weather like today", "will it rain today", "latest news", "events are on"]):
print("Phrase 'activate search', 'weather like today', 'will it rain today', 'latest news', or 'events are on' detected. Using search agent.")
play(agent_search)
agent_response = search_agent(speech_text)
print("Agent response:", agent_response)
return agent_response, None, None
# 2. Image capture route
if "take a look" in speech_text.lower() or "turn on camera" in speech_text.lower() or "on the camera" in speech_text.lower():
print("Phrase 'take a look', 'turn on camera', or 'on the camera' detected.")
play(start_camera)
print("Getting ready to capture an image...")
play(take_photo)
try:
# Updated to use Picamera2, if you want to revert to PiCamera, please follow a previous version of this code and file on our GitHub repository.
camera = Picamera2()
# Configure the camera
camera_config = camera.create_still_configuration(main={"size": (640, 480)})
camera.configure(camera_config)
camera.start()
time.sleep(1) # Give the camera time to adjust
play(camera_shutter)
image_path = "captured_image.jpg"
camera.capture_file(image_path)
camera.stop()
camera.close()
print("Photo captured and saved as captured_image.jpg")
return None, image_path, speech_text
except Exception as e:
print(f"Pi camera error: {e}")
play(camera_issue)
return None, None, None
# 3. General speech route - no agent or image capture
return None, None, speech_text
except sr.UnknownValueError:
print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
print(f"Could not request results from Google Speech Recognition service; {e}")
return None, None, None
# This route is called to send the user's general question to OpenAI's ChatGPT model and then play the response to the user.
def chatgpt_response(prompt):
# Here we send the user's question to OpenAI's ChatGPT model and then play the response to the user.
if prompt is not None:
try:
# Add a holding message like the one below to cover current TTS delays (until TTS can be streamed), which stem from how pydub initially buffers audio in memory
silence = AudioSegment.silent(duration=1000)
holding_audio_response = silence + AudioSegment.from_mp3("sounds/holding.mp3")
play(holding_audio_response)
# send the converted audio text to chatgpt
response = client.chat.completions.create(
model=model_engine,
messages=[{"role": "system", "content": pre_prompt}, {"role": "user", "content": prompt + " If the user's question involves browsing the web, local or national current or future events, events that you are unaware of, news or weather, ALWAYS respond telling them to use the phrase 'activate search' before asking a question. If the user's request is to take a photo, ALWAYS respond telling them to use the phrase 'take a look' followed by their request."}],
max_tokens=400,
n=1,
temperature=0.7,
)
# The completion has already returned at this point, so play a short "checking on that" message to smooth the transition before the spoken answer.
checking_on_that = silence + AudioSegment.from_mp3("sounds/checking.mp3")
play(checking_on_that)
return response
except Exception as e:
# If there is an error, we can play a message to the user to indicate that there was an issue with the API call.
print(f"An API error occurred: {str(e)}")
error_message = silence + AudioSegment.from_mp3("sounds/openai_issue.mp3")
play(error_message)
return None
else:
return None
# This route is called to encode the image as base64 when an image is taken.
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
# This route is called if the user's question also includes an image to send to OpenAI's ChatGPT model.
def chatgpt_response_with_image(prompt, image_path):
if prompt is not None:
try:
# Add a holding message like the one below to cover current TTS delays (until TTS can be streamed), which stem from how pydub initially buffers audio in memory
silence = AudioSegment.silent(duration=1000)
holding_audio_response = silence + AudioSegment.from_mp3("sounds/holding.mp3")
play(holding_audio_response)
# Encode the image as base64
base64_image = encode_image(image_path)
# Send the converted audio text and image to ChatGPT
response = client.chat.completions.create(
model=model_engine,
messages=[
{"role": "system", "content": pre_prompt},
{
"role": "user",
"content": [
{
"type": "text",
"text": prompt
},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}"
}
}
]
}
],
max_tokens=400,
n=1,
temperature=0.7,
)
# The completion has already returned at this point, so play a short "checking on that" message to smooth the transition before the spoken answer.
checking_on_that = silence + AudioSegment.from_mp3("sounds/checking.mp3")
play(checking_on_that)
return response
except Exception as e:
# If there is an error, we can play a message to the user to indicate that there was an issue with the API call.
print(f"An API error occurred: {str(e)}")
error_message = silence + AudioSegment.from_mp3("sounds/openai_issue.mp3")
play(error_message)
return None
else:
return None
# This route is called to generate an audio file on demand from the response from OpenAI's ChatGPT model.
def generate_audio_file(message):
# This is a standalone function to generate an audio file from the response from OpenAI's ChatGPT model.
speech_file_path = Path(__file__).parent / "response.mp3"
response = client.audio.speech.create(
model="tts-1",
voice="fable",
input=message
)
response.stream_to_file(speech_file_path)
# This is a standalone function to which we can call to play the audio file and wake speaking LEDs to indicate that the smart speaker is responding to the user.
def play_response():
pixels.speak()
audio_response = silence + AudioSegment.from_mp3("response.mp3")
play(audio_response)
# This is the main function that runs the program and controls the flow.
def main():
# This is the main function that runs the program.
pixels.wakeup()
device_on = silence + AudioSegment.from_mp3("sounds/on.mp3")
play(device_on)
hello = silence + AudioSegment.from_mp3("sounds/hello.mp3")
play(hello)
pixels.off()
while True:
print("Waiting for wake word...")
if detect_wake_word():
pixels.listen() # Indicate that the speaker is listening
agent_response, image_path, speech_text = recognise_speech()
if agent_response:
print(f"Processed agent response: {agent_response}") # For debugging
generate_audio_file(agent_response)
play_response()
pixels.off()
if speech_text:
if image_path:
response = chatgpt_response_with_image(speech_text, image_path)
else:
response = chatgpt_response(speech_text)
if response:
message = response.choices[0].message.content
print(message)
generate_audio_file(message)
play_response()
pixels.off()
else:
print("No prompt to send to OpenAI")
pixels.off()
else:
print("Speech was not recognised or there was an error.")
pixels.off()
# After processing (or failure to process), the loop will continue, returning to wake word detection.
if __name__ == "__main__":
main()
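The `struct.unpack_from` line in `detect_wake_word` converts each raw PyAudio buffer into the flat sequence of signed 16-bit samples that `porcupine.process` expects. The conversion in isolation, using a tiny fabricated frame rather than Porcupine's real frame length:

```python
# Standalone illustration of the PCM conversion used in detect_wake_word:
# "h" is a signed 16-bit integer in native byte order, so "h" * n unpacks
# n samples from the raw buffer PyAudio returns for paInt16 audio.
import struct

frame_length = 4  # Porcupine's real frame_length is larger; 4 keeps the demo short
raw = struct.pack("h" * frame_length, 0, 100, -100, 32767)  # fabricated buffer
pcm = struct.unpack_from("h" * frame_length, raw)
```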
================================================
FILE: requirements.txt
================================================
openai
pyaudio
pyalsaaudio
SpeechRecognition
gTTS
python-dotenv
apa102-pi
gpiozero
RPi.GPIO
alexa-led-pattern
pydub
pvporcupine
picamera2
langchain-community
langchain
langchain-openai
langchainhub
================================================
FILE: requirements_mac.txt
================================================
# Requirements for simply testing the chat.py script on a Mac
openai
pyaudio
SpeechRecognition
gTTS
python-dotenv
pydub
pvporcupine
langchain-community
langchain
langchain-openai
langchainhub
PyObjC
ffmpeg
================================================
FILE: test_agent.py
================================================
from langchain_community.tools.tavily_search import TavilySearchResults
from datetime import datetime
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from openai import OpenAI
import os
load_dotenv()
model = ChatOpenAI(model="gpt-4")
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
location = "Colchester, UK"
today = datetime.today().strftime('%A, %B %d, %Y')
print(f"Today is {today}")
search = TavilySearchResults(max_results=6)
search_results = search.invoke(f"What local events are not to be missed next week in {location}? The date is {today}.")
print(search_results)
# Now send the results to OpenAI for further processing
response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "Summarise the most up-to-date and applicable information from these search results."},
{"role": "user", "content": str(search_results)} # Convert search_results to a string
],
max_tokens=600,
n=1,
temperature=0.7,
)
print(response.choices[0].message.content)
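Passing `str(search_results)` works, but Python's `repr` formatting is awkward input for a model. A hedged alternative is serialising the results with `json.dumps` before sending them; the sample result shape below is illustrative, not Tavily's exact schema:

```python
# Sketch: json.dumps gives the model cleaner, unambiguous text than str(),
# and round-trips losslessly if the results need to be parsed again later.
import json

sample_results = [{"url": "https://example.com", "content": "An event snippet"}]
payload = json.dumps(sample_results, indent=2)
```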
================================================
FILE: wake_words/custom_model/LICENSE.txt
================================================
A copy of license terms is available at https://picovoice.ai/docs/terms-of-use/
SYMBOL INDEX (73 symbols across 6 files)
FILE: alexa_led_pattern.py
class AlexaLedPattern (line 19) | class AlexaLedPattern(object):
method __init__ (line 20) | def __init__(self, show=None, number=12):
method wakeup (line 32) | def wakeup(self, direction=0):
method listen (line 40) | def listen(self):
method think (line 45) | def think(self):
method speak (line 53) | def speak(self):
method off (line 69) | def off(self):
FILE: apa102.py
class APA102 (line 11) | class APA102:
method __init__ (line 77) | def __init__(self, num_led, global_brightness=MAX_BRIGHTNESS,
method clock_start_frame (line 95) | def clock_start_frame(self):
method clock_end_frame (line 104) | def clock_end_frame(self):
method clear_strip (line 136) | def clear_strip(self):
method set_pixel (line 144) | def set_pixel(self, led_num, red, green, blue, bright_percent=100):
method set_pixel_rgb (line 172) | def set_pixel_rgb(self, led_num, rgb_color, bright_percent=100):
method rotate (line 185) | def rotate(self, positions=1):
method show (line 196) | def show(self):
method cleanup (line 208) | def cleanup(self):
method combine_color (line 214) | def combine_color(red, green, blue):
method wheel (line 220) | def wheel(self, wheel_pos):
method dump_array (line 235) | def dump_array(self):
FILE: chat.py
function recognise_speech (line 18) | def recognise_speech():
function chatgpt_response (line 44) | def chatgpt_response(prompt):
function generate_audio_file (line 56) | def generate_audio_file(message):
function play_audio_file (line 67) | def play_audio_file():
function main (line 71) | def main():
FILE: create_messages.py
function create_holding_message (line 12) | def create_holding_message():
function create_google_speech_issue (line 26) | def create_google_speech_issue():
function understand_speech_issue (line 39) | def understand_speech_issue():
function stop (line 52) | def stop():
function hello (line 65) | def hello():
function create_picovoice_issue (line 78) | def create_picovoice_issue():
function create_picture_message (line 91) | def create_picture_message():
function start_picture_message (line 104) | def start_picture_message():
function agent_search (line 117) | def agent_search():
function audio_issue (line 129) | def audio_issue():
function tavily_key_error (line 141) | def tavily_key_error():
function camera_issue (line 153) | def camera_issue():
FILE: deprecated/smart_speaker.py
class Pixels (line 35) | class Pixels:
method __init__ (line 38) | def __init__(self, pattern=AlexaLedPattern):
method wakeup (line 49) | def wakeup(self, direction=0):
method listen (line 56) | def listen(self):
method think (line 64) | def think(self):
method speak (line 67) | def speak(self):
method off (line 70) | def off(self):
method put (line 73) | def put(self, func):
method _run (line 77) | def _run(self):
method show (line 83) | def show(self, data):
function recognise_speech (line 96) | def recognise_speech():
function speech (line 131) | def speech():
function chatgpt_response (line 163) | def chatgpt_response(prompt):
function generate_audio_file (line 185) | def generate_audio_file(message):
function play_wake_up_audio (line 194) | def play_wake_up_audio():
function main (line 200) | def main():
FILE: pi.py
class Pixels (line 66) | class Pixels:
method __init__ (line 69) | def __init__(self, pattern=AlexaLedPattern):
method wakeup (line 80) | def wakeup(self, direction=0):
method listen (line 87) | def listen(self):
method think (line 95) | def think(self):
method speak (line 98) | def speak(self):
method off (line 101) | def off(self):
method put (line 104) | def put(self, func):
method _run (line 108) | def _run(self):
method show (line 114) | def show(self, data):
function play (line 124) | def play(audio_segment):
function detect_wake_word (line 128) | def detect_wake_word():
function search_agent (line 186) | def search_agent(speech_text):
function recognise_speech (line 216) | def recognise_speech():
function chatgpt_response (line 278) | def chatgpt_response(prompt):
function encode_image (line 311) | def encode_image(image_path):
function chatgpt_response_with_image (line 316) | def chatgpt_response_with_image(prompt, image_path):
function generate_audio_file (line 368) | def generate_audio_file(message):
function play_response (line 379) | def play_response():
function main (line 385) | def main():
About this extraction
This page contains the full source code of the Olney1/ChatGPT-OpenAI-Smart-Speaker GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction covers 17 files (62.7 KB, approximately 15.3k tokens) and includes a symbol index of 73 extracted functions, classes, methods, constants, and types.
Extracted by GitExtract, a free GitHub-repo-to-text converter for AI, built by Nikandr Surkov.