
# tactiq.io free youtube transcript


# Build Your Own GPT-4o Voice Assistant in Python with Groq, Llama3, OpenAI-TTS &
Faster-Whisper
# https://www.youtube.com/watch/pi6gr_YHSuc

00:00:00.120 voice assistants are the hottest thing


00:00:02.240 in AI right now OpenAI has just
00:00:04.799 strategically released their new GPT-4o
00:00:08.119 multimodal Voice Assistant preview one
00:00:10.679 day before they knew Google was going to
00:00:12.880 release theirs it is very clear they are
00:00:15.080 hungry for it
00:00:17.400 Savage while we wait for the bureaucracy
00:00:20.240 of these Tech Giants to finish up their
00:00:22.359 code I'm going to show you how to code
00:00:24.400 your own low-latency multimodal AI Voice
00:00:28.080 Assistant but before we get into the
00:00:29.720 code let's introduce you to your new AI
00:00:33.000 Voice
00:00:36.200 Assistant Jarvis I'm home welcome home
00:00:39.760 how was your day Jarvis it was great I
00:00:41.640 got some code pulled up on my screen can
00:00:43.360 you tell me briefly what this program is
00:00:46.640 it looks like you're working on a voice
00:00:48.399 assistant project in Python the code
00:00:50.559 appears to be a comprehensive framework
00:00:52.239 for building a multimodal AI Voice
00:00:54.239 Assistant incorporating various
00:00:56.320 functionalities such as processing voice
00:00:58.320 prompts image and webcam input clipboard
00:01:01.719 text audio input and speech output is
00:01:05.920 there a specific part of the code you'd
00:01:07.799 like me to explain or help you
00:01:11.159 with Jarvis yes I copied the program's
00:01:14.400 entire code can you tell me what library
00:01:17.159 this program is using for text to
00:01:20.520 speech the program is using OpenAI's tts-1
00:01:24.439 model for text to speech functionality
00:01:27.000 which is called through the speak
00:01:28.360 function Jarvis what do you see in my
00:01:31.520 webcam it looks like I see a webcam
00:01:34.200 capture of a male adult smiling and
00:01:36.439 giving the middle finger wearing a brown
00:01:38.439 shirt and gray beanie the background
00:01:40.720 appears to be an attic with various
00:01:42.439 objects like a microphone cables tape
00:01:44.799 and
00:01:45.560 insulation the overall tone of the image
00:01:48.159 is coming across as rude and possibly
00:01:50.320 intended to be disrespectful yo don't
00:01:52.399 ever disrespect me looking like Blanche
00:01:54.119 from The Golden Girls now that Jarvis
00:01:56.079 has introduced herself let's jump
00:01:58.320 straight into the code tutorial first
00:02:00.600 create a folder to put your voice
00:02:02.399 assistant files open your voice
00:02:04.680 assistant folder in a code editor I will
00:02:07.240 be using VS Code for this video inside
00:02:09.959 your projects folder start by creating a
00:02:12.640 file called
00:02:14.120 requirements.txt open that file and copy
00:02:17.519 the list of dependencies the same as
00:02:19.720 mine these will be the dependencies that
00:02:21.760 your Python program will need for this
00:02:23.879 voice assistant save this file and exit
00:02:26.200 out of it
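The video shows the dependency list on screen rather than reading it out, so here is a plausible requirements.txt based on the libraries this tutorial actually uses; treat the exact entries and any version pins as an assumption:

```
# assumed requirements.txt for this tutorial; the video's on-screen list may differ
groq
openai
google-generativeai
Pillow
opencv-python
pyperclip
SpeechRecognition
PyAudio
faster-whisper
pyttsx3
```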
00:02:27.800 for those of you that do not have the patience or the desire to go
00:02:30.040 through a Python Programming tutorial
00:02:32.160 video which is exactly what I'm aiming
00:02:34.400 to make with this video you can access
00:02:36.640 the source code to all of my videos in
00:02:39.000 the Pro membership channels of my
00:02:41.040 Discord having access to the pro
00:02:43.120 channels is not necessary to go through
00:02:45.560 this video tutorial as I will be showing
00:02:47.920 all of the code for the voice assistant
00:02:49.920 in the demo in this tutorial anyone a
00:02:52.239 part of my Pro membership not only has
00:02:54.599 access to my source code from this video
00:02:56.959 and all of my other tutorials but also a
00:02:59.840 Pro written tutorial before I even
00:03:02.080 release the videos videos like this take
00:03:04.440 a lot of time to create with this much
00:03:06.400 editing which is why I am able to
00:03:08.319 release Pro written tutorials like this
00:03:10.480 one before I can get a full video
00:03:12.720 tutorial on my YouTube channel to join
00:03:14.879 the AI Austin Pro membership click the
00:03:17.480 buy me a coffee Link in this video's
00:03:19.560 description if you don't already have
00:03:21.799 Python 3.11 installed you'll need to
00:03:24.720 install that on your computer for this
00:03:26.599 voice assistant with Python 3.11
00:03:29.319 installed
00:03:30.280 I can now open a new terminal inside of
00:03:32.599 my project folder and run this command
00:03:35.120 to install every library in the
00:03:37.519 requirements.txt file
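The command itself is only shown on screen; the standard pip invocation for installing from a requirements file is:

```
pip install -r requirements.txt
```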
00:03:40.360 make sure you don't get any error messages in this
00:03:42.879 important step if you do get any error
00:03:45.599 messages and don't understand them just
00:03:48.280 paste them into ChatGPT and ask for an
00:03:51.319 explanation of what is causing the error
00:03:53.920 and what you need to fix it if you get
00:03:56.000 any errors specifically related to
00:03:58.480 PyAudio on a Windows machine you're going
00:04:00.920 to need to download the wheel file and
00:04:03.120 install it with that instead these wheel
00:04:05.400 files can be found at pypi.org which is
00:04:08.799 the official python package index click
00:04:11.439 on the download files then find the
00:04:13.560 files with CP 311 in the name these are
00:04:16.918 the correct wheel files for python 3.11
00:04:20.639 now figure out if your Windows machine
00:04:22.919 is a 64-bit or 32-bit processor and
00:04:26.600 download the file with a correlating
00:04:28.479 number at the end of the file name once
00:04:31.039 the wheel file has been downloaded copy
00:04:33.440 the entire file path to that wheel file
00:04:36.720 now open command prompt as administrator
00:04:39.680 and type pip install and then paste in
00:04:42.240 the full path to your wheel file for
00:04:44.800 PyAudio
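For reference, the wheel install looks like this; the filename below is illustrative only, since the real name depends on the PyAudio version and whether your machine is 64-bit (amd64) or 32-bit (win32):

```
pip install C:\Users\you\Downloads\PyAudio-0.2.14-cp311-cp311-win_amd64.whl
```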
00:04:47.720 we now need three API keys get your Groq
00:04:51.400 OpenAI and Google gen AI API keys at these links and store them
00:04:53.440 somewhere secure the OpenAI API key
00:04:56.680 will be used to create text to speech
00:04:58.960 with their API if you don't want to
00:05:01.639 pay to use this program you can skip
00:05:04.280 this and I'd recommend using the pyttsx3
00:05:07.800 library to get free local text to speech
00:05:11.600 using pyttsx3 will not give you as
00:05:15.160 high quality of a voice as paying for
00:05:17.160 OpenAI which is why I am showing OpenAI's TTS in
00:05:19.639 this video because a lot of people are
00:05:21.919 demanding a voice assistant with a
00:05:23.960 realistic human-sounding voice
00:05:26.479 personally I use pyttsx3 for my own
00:05:30.160 voice assistant because it gives me
00:05:31.919 faster responses and I don't need that
00:05:34.919 high quality of a voice if you just want
00:05:36.479 a clear voice go with pyttsx3
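If you take the free route, a minimal speak function with pyttsx3 might look like this (a sketch of the alternative, not the OpenAI code this tutorial builds later):

```python
# minimal free local text-to-speech sketch using pyttsx3 (offline, lower quality than tts-1)
import pyttsx3

engine = pyttsx3.init()

def speak(text):
    engine.say(text)     # queue the text for speech
    engine.runAndWait()  # block until playback finishes
```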
00:05:40.560 inside your project folder create a file named
00:05:43.360 assistant.py the first step we want to
00:05:46.120 accomplish inside of our Python program
00:05:48.560 is creating a function to generate
00:05:50.600 responses from Llama 3 70B with Groq's
00:05:54.120 API we'll start by importing the Groq
00:05:57.160 class from groq then initiate the
00:06:00.120 Groq client by passing your API key to
00:06:03.440 the Groq class inside quotes define a
00:06:06.280 function called groq_prompt that takes
00:06:08.560 prompt as input inside the function
00:06:11.400 create a list called convo with the
00:06:13.919 prompt formatted correctly for Llama in
00:06:16.560 another variable called chat_completion
00:06:18.919 call the Groq client to create a
00:06:20.919 response passing the convo list and
00:06:23.720 specify the Groq API to use the Llama
00:06:26.680 3 70B model inside a variable called
00:06:29.639 response I will select the last message
00:06:32.280 from Groq then we return the
00:06:35.520 response.content from the groq_prompt outside of
00:06:37.960 the groq_prompt you can create a
00:06:39.599 variable called prompt and request a
00:06:41.960 text input the program will store the
00:06:44.319 return value from running our
00:06:46.479 groq_prompt with our text prompt input and
00:06:49.400 now print the response content to the
00:06:51.720 terminal if you've installed groq
00:06:53.680 correctly and set your API key in the
00:06:56.280 parameters this script should allow you
00:06:58.319 to prompt Llama 3 70B from the Groq API
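Pieced together from the steps above, this first version of the script might look like the following sketch; the model identifier llama3-70b-8192 is the Groq API's name for Llama 3 70B, and the key handling is an assumption:

```python
# sketch of the first groq_prompt version described above
from groq import Groq

groq_client = Groq(api_key='your Groq API key here')

def groq_prompt(prompt):
    # format the user prompt the way the Llama chat API expects
    convo = [{'role': 'user', 'content': prompt}]
    chat_completion = groq_client.chat.completions.create(
        messages=convo, model='llama3-70b-8192')
    response = chat_completion.choices[0].message
    return response.content

prompt = input('USER: ')
response = groq_prompt(prompt)
print(response)
```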
00:07:02.319 this groq_prompt will be the main
00:07:04.400 function for our conversation with our
00:07:06.520 voice assistant if you have not already
00:07:08.400 heard of Groq yet Groq is a company that
00:07:11.000 invented a new piece of Hardware that is
00:07:13.360 like a GPU but specifically made for
00:07:16.240 inferencing language models this allows
00:07:18.599 for us to get high-speed AKA low latency
00:07:22.080 responses from the Groq API even using
00:07:25.160 a 70 billion parameter model but we're
00:07:27.520 going to need much more than a
00:07:28.800 conversation with the language
00:07:30.120 model to make this AI agent fully
00:07:32.840 multimodal we now need to create a
00:07:34.919 function to call functions in our
00:07:37.039 program from the language model define
00:07:39.800 function_call that also takes
00:07:42.759 prompt as input inside this function we
00:07:45.120 will create a system message variable
00:07:47.400 write the system message string the same
00:07:49.720 as mine here if you want to add any new
00:07:52.159 function calls specify the function in
00:07:54.639 the first sentence and the list at the
00:07:56.800 end of the system message text string on
00:07:59.479 underneath system message we'll
00:08:01.080 structure the function conversation with
00:08:03.319 the system message and the prompt from
00:08:05.319 the user back up at the groq_prompt we
00:08:07.479 can copy chat_completion response and
00:08:10.120 return response.content then paste that
00:08:12.840 into function_call then change convo in
00:08:15.319 the chat_completion input parameters to
00:08:18.280 function_convo above the response
00:08:21.000 variable outside of our functions we
00:08:22.960 will create a function_response that
00:08:24.840 sends the prompt input to our new
00:08:27.039 function and we can print that function
00:08:29.560 response if you did add any new function
00:08:31.800 calls to your program now would be a
00:08:33.760 good place in the tutorial to test your
00:08:36.080 system message is actually getting Llama
00:08:38.279 to call your functions reliably
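Based on that description, function_call could be sketched like this; the sys_msg wording is an assumed stand-in for the one shown on screen, so adjust it until Llama makes reliable selections:

```python
# sketch of function_call; the sys_msg text is an assumed stand-in for the video's
def function_call(prompt):
    sys_msg = ('You are an AI function-calling model. Determine whether the best action for '
               'the user prompt is "take screenshot", "capture webcam", "extract clipboard", '
               'or "None". Respond with only one selection from that list, exactly as written.')
    function_convo = [{'role': 'system', 'content': sys_msg},
                      {'role': 'user', 'content': prompt}]
    chat_completion = groq_client.chat.completions.create(
        messages=function_convo, model='llama3-70b-8192')
    response = chat_completion.choices[0].message
    return response.content

prompt = input('USER: ')
function_response = function_call(prompt)
print(function_response)
```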
00:08:40.719 now we have our code to communicate with the
00:08:42.440 main brain for our assistant through the
00:08:44.720 grock API let's add our three functions
00:08:47.560 to take screenshots take photos and
00:08:50.080 extract text we copy for the assistant
00:08:52.399 in our conversations the take_screenshot
00:08:55.240 function will need us to import the
00:08:57.320 ImageGrab module from PIL inside the
00:09:00.800 function we will set the file name for
00:09:02.680 our screenshot captures create a
00:09:04.920 variable screenshot that stores the raw
00:09:07.519 image data from calling ImageGrab's grab
00:09:10.320 function create rgb_screenshot to
00:09:13.360 convert the raw image data to RGB format
00:09:16.399 with the screenshot's convert function
00:09:19.040 lastly for this function we can call the
00:09:21.040 save function on rgb_screenshot by
00:09:24.040 passing the path and setting the quality
00:09:26.519 to 15% dropping the image quality to 15%
00:09:30.279 will allow for lower latency when
00:09:32.279 prompting the image processing language
00:09:34.399 model we will implement later and allow
00:09:36.600 the language model to process these
00:09:38.519 images faster
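Following those steps, take_screenshot might look like this sketch (quality=15 matches the 15% mentioned above; the filename is an assumption):

```python
# sketch of take_screenshot as described above
from PIL import ImageGrab

def take_screenshot():
    path = 'screenshot.jpg'
    screenshot = ImageGrab.grab()               # raw capture of the screen
    rgb_screenshot = screenshot.convert('RGB')  # JPEG needs RGB mode
    rgb_screenshot.save(path, quality=15)       # low quality = lower latency for the vision model
```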
00:09:40.680 if you want to test the take_screenshot function create a new
00:09:42.920 file called test.py paste the function
00:09:45.839 and import statement for ImageGrab call
00:09:48.480 the function on the last line of your
00:09:50.279 program and once it finishes running you
00:09:52.560 should have a screenshot JPEG file in
00:09:54.959 your voice assistant folder once your
00:09:57.120 screenshot function is working properly
00:09:59.399 in your test file make sure the code is
00:10:01.560 the same in your assistant file minus
00:10:03.839 the last line in test.py delete all of
00:10:06.680 the code in test.py and import cv2 that
00:10:10.360 we will use to allow our program to take
00:10:12.800 webcam photos we'll write this code out
00:10:15.200 in the test function to make sure we are
00:10:17.560 calling the correct camera on our
00:10:19.680 computer create web_cam that equals
00:10:23.079 cv2.VideoCapture pass zero as an
00:10:27.040 input variable zero is specifying
00:10:29.760 the camera number in the list of
00:10:31.800 available cameras on your computer even
00:10:34.279 if your computer only has one camera cv2
00:10:37.440 might think there is more so let's first
00:10:39.760 test it at zero which is most likely
00:10:41.959 your device's webcam zero is specifying
00:10:44.760 the first camera in the list define
00:10:47.240 web_cam_capture and on the first line
00:10:49.839 check if not web_cam.isOpened which in
00:10:52.880 English means the program has not
00:10:55.240 successfully opened a camera on line
00:10:57.399 three print a message letting us know
00:10:59.720 the camera was not successfully opened
00:11:02.079 and exit the program on Mac or Linux
00:11:04.639 this will not occur if you have
00:11:06.240 installed cv2 correctly and when first
00:11:08.920 running the program grant your
00:11:10.760 command-line app permissions to access
00:11:13.040 the webcam for Windows users it is
00:11:15.480 possible you will need to manually grant
00:11:17.600 permissions for command line to access
00:11:19.800 the web camera if the program doesn't
00:11:22.120 quit our program is successfully opening
00:11:24.720 a camera so outside of the if statement
00:11:27.240 we can create a path variable to store
00:11:29.639 the name of our webcam photo we'll store
00:11:32.079 ret and frame from calling cv2's read
00:11:35.440 function on our webcam we can now save
00:11:37.800 the webcam photo with cv2's write
00:11:40.320 function and call the function on the
00:11:42.440 last line in our program
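Assembled from the description, the webcam test might look like this sketch (index 0 is just the first camera in cv2's list, as noted above; the filename is an assumption):

```python
# sketch of the webcam capture described above
import cv2

web_cam = cv2.VideoCapture(0)  # 0 = first camera cv2 can see

def web_cam_capture():
    if not web_cam.isOpened():
        print('Error: camera did not open successfully')
        exit()
    path = 'webcam.jpg'
    ret, frame = web_cam.read()  # grab one frame from the camera
    cv2.imwrite(path, frame)     # save it as a JPEG

web_cam_capture()
```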
00:11:45.519 run test.py and make sure the webcam.jpg file is in
00:11:48.560 your voice assistant folder and
00:11:50.560 capturing the webcam if the webcam photo
00:11:53.279 is not actually your webcam try changing
00:11:55.839 the number input to the VideoCapture
00:11:58.240 function plus one until you find out
00:12:01.040 which number in the list of cv2 cameras
00:12:03.760 is actually your webcam once your code
00:12:06.200 is functioning copy it into
00:12:09.120 assistant.py and make sure you import cv2 then
00:12:11.880 save assistant.py our get_clipboard_text
00:12:15.320 function will need pyperclip imported
00:12:18.120 inside the function create
00:12:20.480 clipboard_content that will store the
00:12:22.560 results from calling pyperclip.paste
00:12:25.480 we can now use an if statement to check
00:12:27.399 the results with Python's built-in
00:12:29.880 isinstance function checking clipboard
00:12:32.760 content specifying the variable should
00:12:35.120 be a string meaning text content if
00:12:38.040 there is a string we will return
00:12:40.160 clipboard_content else clipboard_content
00:12:43.040 is not a string so we will print to the
00:12:45.399 terminal letting us know we probably
00:12:47.440 didn't copy the text correctly and
00:12:49.680 return None as the context for our
00:12:52.079 prompt
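Put together, get_clipboard_text looks like this sketch:

```python
# sketch of get_clipboard_text as described above
import pyperclip

def get_clipboard_text():
    clipboard_content = pyperclip.paste()
    if isinstance(clipboard_content, str):
        return clipboard_content
    else:
        print('No clipboard text to copy')  # probably nothing was copied
        return None
```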
00:12:53.880 to test this function you can delete all the code in test.py and
00:12:56.240 replace it with import pyperclip and
00:12:58.519 your get_clipboard_text function then
00:13:01.040 you can print the results of
00:13:03.079 get_clipboard_text which should display
00:13:05.360 the text that you copied before running
00:13:08.040 test.py now we can implement Google's
00:13:10.600 new Gemini 1.5 flash model for low
00:13:13.800 latency image processing as stated
00:13:16.079 earlier we'll be using Llama 3 through
00:13:18.199 the Groq API as our main model for our
00:13:20.880 voice assistant we'll simply use this
00:13:22.800 new flash model to provide low latency
00:13:25.959 highly relevant context to our voice
00:13:28.440 assistant first import Google
00:13:30.279 generativeai as genai also PIL's Image
00:13:34.519 class next configure your connection to
00:13:37.320 genai by passing your Google gen AI API key
00:13:40.920 we can now set up our LLM configuration
00:13:43.360 settings for Gemini then turn off all
00:13:45.560 safety settings by setting them to
00:13:47.720 BLOCK_NONE in all caps then we can set
00:13:50.880 a model variable to load the Gemini
00:13:53.079 Flash model with our config and safety
00:13:55.399 settings define vision_prompt that takes
00:13:58.800 prompt and photo_path as input
00:14:01.440 inside img open the image at
00:14:04.519 photo_path then we will format our prompt
00:14:06.560 with a prompt heading and add the prompt
00:14:08.759 input to the end inside response we can
00:14:11.399 generate a response from Gemini Flash by
00:14:13.959 passing the prompt and image input
00:14:16.279 formatted in a list finally return
00:14:19.120 response.text
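A sketch of that setup with the google-generativeai SDK; the model name, generation settings, and prompt heading here are illustrative, not the video's exact values:

```python
# sketch of the Gemini Flash vision setup; config values are illustrative
import google.generativeai as genai
from PIL import Image

genai.configure(api_key='your Google gen AI API key here')

generation_config = {'temperature': 0.7, 'max_output_tokens': 2048}
safety_settings = [
    {'category': 'HARM_CATEGORY_HARASSMENT', 'threshold': 'BLOCK_NONE'},
    {'category': 'HARM_CATEGORY_HATE_SPEECH', 'threshold': 'BLOCK_NONE'},
    {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'threshold': 'BLOCK_NONE'},
    {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'threshold': 'BLOCK_NONE'},
]
model = genai.GenerativeModel('gemini-1.5-flash-latest',
                              generation_config=generation_config,
                              safety_settings=safety_settings)

def vision_prompt(prompt, photo_path):
    img = Image.open(photo_path)
    # heading tells the vision model its output is context for another model
    prompt = 'USER PROMPT: ' + prompt
    response = model.generate_content([prompt, img])
    return response.text
```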
00:14:21.959 now let's start wrapping everything together and make the program
00:14:24.120 conversational with the ability to call
00:14:26.279 our functions and send images to Gemini
00:14:28.720 Flash if necessary first create a system
00:14:31.639 message for our Llama 3 conversation
00:14:34.120 then we can format the system message in
00:14:36.320 a list called convo now delete these
00:14:38.839 lines after the functions and create a
00:14:41.240 while True loop we can request the text
00:14:43.880 prompt input then inside call store the
00:14:46.560 response from function_call if take
00:14:48.639 screenshot is in call print taking
00:14:50.480 screenshot and call the take_screenshot
00:14:52.720 function inside a variable called visual
00:14:55.160 context store the response from sending
00:14:57.519 the prompt and image path to
00:15:00.399 vision_prompt else if capture webcam in call
00:15:03.240 print capturing webcam and call the
00:15:05.399 web_cam_capture function then again send
00:15:08.079 the prompt and photo path to
00:15:10.519 vision_prompt else if extract clipboard in
00:15:12.839 call print extracting clipboard text
00:15:15.440 inside paste store the return text from
00:15:18.240 calling get_clipboard_text then modify
00:15:20.959 prompt to have two line breaks after the
00:15:23.399 current prompt a tab then all caps write
00:15:26.519 CLIPBOARD CONTENT and add the pasted text
00:15:29.800 this format will help Llama better
00:15:32.079 understand the prompt from the pasted
00:15:33.920 text then we can modify the groq_prompt
00:15:36.560 function a bit to handle a conversation
00:15:39.079 instead of a single prompt first let's
00:15:41.199 add img_context as an input parameter
00:15:44.440 delete the current convo since we
00:15:46.199 created one outside the function to
00:15:48.240 track our Groq conversation check if
00:15:51.000 image context then format The Prompt
00:15:53.759 with the image context again putting two
00:15:56.680 line breaks a tab then writing an all
00:15:59.759 caps IMAGE CONTEXT before adding the
00:16:02.360 image context to The Prompt now we can
00:16:04.720 add that formatted user prompt to the
00:16:07.160 convo list with Python's append function
00:16:10.160 after response add the generated
00:16:12.319 response to the conversation back down
00:16:14.959 inside of the while True loop create
00:16:17.000 response that will equal the results
00:16:18.959 from sending the prompt and image
00:16:21.040 context to groq_prompt then print the
00:16:23.800 response at the end of the while True
00:16:25.720 loop
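Putting those pieces together, the revised groq_prompt and the conversational loop might be sketched as follows; the system message is a stand-in, and passing the Groq message object straight back into convo is an assumption that matches the SDK's accepted message format:

```python
# sketch of the revised groq_prompt plus the main loop described above
sys_msg = 'You are a multimodal AI voice assistant...'  # stand-in for the video's system message
convo = [{'role': 'system', 'content': sys_msg}]

def groq_prompt(prompt, img_context):
    if img_context:
        # two line breaks, a tab, then an all-caps heading separate the context for Llama
        prompt = f'{prompt}\n\n    IMAGE CONTEXT: {img_context}'
    convo.append({'role': 'user', 'content': prompt})
    chat_completion = groq_client.chat.completions.create(
        messages=convo, model='llama3-70b-8192')
    response = chat_completion.choices[0].message
    convo.append(response)  # keep the assistant turn in the conversation history
    return response.content

while True:
    prompt = input('USER: ')
    call = function_call(prompt)
    visual_context = None
    if 'take screenshot' in call:
        print('Taking screenshot.')
        take_screenshot()
        visual_context = vision_prompt(prompt=prompt, photo_path='screenshot.jpg')
    elif 'capture webcam' in call:
        print('Capturing webcam.')
        web_cam_capture()
        visual_context = vision_prompt(prompt=prompt, photo_path='webcam.jpg')
    elif 'extract clipboard' in call:
        print('Extracting clipboard text.')
        paste = get_clipboard_text()
        prompt = f'{prompt}\n\n    CLIPBOARD CONTENT: {paste}'
    response = groq_prompt(prompt=prompt, img_context=visual_context)
    print(response)
```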
00:16:27.519 now we've been through a lot of code this is a good point to check
00:16:29.360 that all of your code is the same as
00:16:31.240 mine at this state in the tutorial then
00:16:33.639 once you have all your code matching
00:16:35.319 mine you should be able to have a
00:16:37.040 conversation with your multimodal AI
00:16:39.440 agent with that working let's give our
00:16:41.279 voice assistant a realistic human voice
00:16:43.759 for converting text to speech we will
00:16:45.959 import the OpenAI class from the openai
00:16:48.639 library and also import PyAudio
00:16:52.519 below our Groq and Gemini API
00:16:54.800 configurations we will create
00:16:57.360 openai_client and initiate a connection to the
00:16:59.839 OpenAI API by passing our OpenAI API key
00:17:04.359 we can now define speak that takes text
00:17:07.119 as input since I have explained this
00:17:09.319 code already in another Voice Assistant
00:17:11.520 video I won't go too in depth on this
00:17:14.079 but this is the optimal code for
00:17:15.880 streaming text to speech from the
00:17:18.280 OpenAI API in Python what you do need to
00:17:21.000 know about this code is the voice
00:17:23.000 parameter is where you can choose one of
00:17:25.280 six OpenAI voices that you prefer for
00:17:28.160 your voice assistant here's a quick
00:17:30.000 sample of each
00:17:31.720 voice hello world I am Shimmer hello
00:17:34.720 world I am Nova hello world I am Onyx
00:17:38.000 hello world I am Fable hello world I am
00:17:40.559 Echo hello world I am Alloy once you
00:17:42.880 have your speak function complete inside
00:17:45.200 of the while True loop you can call the
00:17:47.120 speak function after printing your
00:17:49.160 response and your program will now have
00:17:51.360 a high-quality low-latency AI generated
00:17:54.480 voice using the OpenAI API
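For reference, the streaming pattern alluded to above looks roughly like this with the current openai Python SDK; the voice choice and chunk size are illustrative (PCM from tts-1 is 24 kHz, 16-bit, mono):

```python
# sketch of streaming OpenAI tts-1 audio straight to the speakers with PyAudio
import pyaudio
from openai import OpenAI

openai_client = OpenAI(api_key='your OpenAI API key here')

def speak(text):
    player_stream = pyaudio.PyAudio().open(
        format=pyaudio.paInt16, channels=1, rate=24000, output=True)
    with openai_client.audio.speech.with_streaming_response.create(
            model='tts-1', voice='nova', response_format='pcm', input=text) as response:
        for chunk in response.iter_bytes(chunk_size=1024):
            player_stream.write(chunk)  # play each PCM chunk as it arrives
```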
00:17:57.400 we now have given our assistant multimodality and a
00:18:00.120 voice the last step is to allow our
00:18:02.520 program to efficiently handle voice
00:18:04.559 prompts and a wake word unlike OpenAI
00:18:07.159 we are going to set up a wake word for
00:18:08.760 our voice assistant because unlike the
00:18:10.760 GPT-4o voice assistant we want this
00:18:13.240 program as something we can start up
00:18:14.840 when we are next to our computer and the
00:18:16.679 program is always waiting for us to
00:18:18.679 intentionally send a prompt a voice
00:18:21.120 assistant shouldn't need us to open the
00:18:23.280 program click start listening wait 8
00:18:25.919 seconds for the GPT-4o model to load then
00:18:29.000 start speaking a prompt instead of
00:18:30.840 having to wait 10 seconds to talk to our
00:18:33.159 AI friend we're trying to create a voice
00:18:35.679 assistant that is always ready to
00:18:37.480 respond to our prompts when we need it
00:18:39.480 and not making noise in the background
00:18:41.400 when we don't at the top we import os
00:18:44.440 and from faster_whisper import the
00:18:47.039 WhisperModel class we will set num
00:18:49.520 cores to check how many CPU cores your
00:18:52.280 device has then we will set whisper size
00:18:55.200 as base in whisper_model we will
00:18:57.480 initiate the WhisperModel class on CPU
00:19:00.559 with int8 and set the cpu_threads to
00:19:03.640 equal half our total CPU cores and the
00:19:06.600 num_workers to equal the same define
00:19:09.240 wav_to_text that takes audio_path as
00:19:11.799 input set segments and underscore to
00:19:14.840 store the results from calling
00:19:16.960 whisper_model's transcribe function on our audio
00:19:19.880 path inside text we can join the
00:19:22.400 streamed results from faster-whisper and
00:19:25.159 lastly return all the transcribed text
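Those steps translate to something like this sketch of the transcription setup:

```python
# sketch of the faster-whisper transcription setup described above
import os
from faster_whisper import WhisperModel

num_cores = os.cpu_count()
whisper_size = 'base'
whisper_model = WhisperModel(
    whisper_size,
    device='cpu',
    compute_type='int8',
    cpu_threads=num_cores // 2,
    num_workers=num_cores // 2,
)

def wav_to_text(audio_path):
    segments, _ = whisper_model.transcribe(audio_path)
    # faster-whisper streams segments; join them into one string
    text = ''.join(segment.text for segment in segments)
    return text
```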
00:19:28.400 now that our program can transcribe
00:19:30.400 audio let's implement some code for the
00:19:32.640 program to process audio in the
00:19:34.679 background process to keep latency as
00:19:37.159 low as possible import speech
00:19:39.159 recognition as sr and the time library
00:19:42.679 set wake_word as your wake word I'll use
00:19:46.280 Jarvis as mine set r to initiate sr's
00:19:50.240 Recognizer class then set source to
00:19:53.200 initiate your Microphone define
00:19:55.600 start_listening and with source as s call the
00:19:58.360 recognizer's adjust_for_ambient_noise
00:20:01.000 function to analyze microphone input for
00:20:03.640 2 seconds when the program is starting
00:20:06.080 print a message to the terminal telling
00:20:07.960 us we can say the Wake word followed
00:20:09.880 with our prompt to notify us the program
00:20:12.240 has finished starting up then call
00:20:14.320 recognizer's listen_in_background
00:20:16.320 function passing our microphone source
00:20:18.720 and callback as input which is a
00:20:20.880 function we will define later to control
00:20:23.280 what the program does based on the audio
00:20:25.720 input then in a while True loop we will
00:20:27.919 sleep the program every half second
00:20:30.120 which stops the callback function
00:20:31.640 running in the background from
00:20:33.080 overworking our CPU
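A sketch of that background-listening setup with the SpeechRecognition library (the printed startup message is paraphrased):

```python
# sketch of the background listening setup described above
import time
import speech_recognition as sr

wake_word = 'jarvis'
r = sr.Recognizer()
source = sr.Microphone()

def start_listening():
    with source as s:
        r.adjust_for_ambient_noise(s, duration=2)  # sample background noise for 2 seconds
    print('\nSay', wake_word, 'followed with your prompt.\n')
    r.listen_in_background(source, callback)       # callback is defined further below
    while True:
        time.sleep(0.5)  # keep the main thread idle while the background thread listens
```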
00:20:35.600 in all of my previous voice assistant tutorials we
00:20:37.559 have handled wake word detection by
00:20:39.320 having the program wait for the user to
00:20:41.400 speak just the Wake word respond to the
00:20:43.520 user that the program is now ready to
00:20:45.520 record a prompt and then start recording
00:20:47.799 a prompt the previous method was easy to
00:20:50.440 code but honestly annoying to use in
00:20:53.200 this assistant we will Implement a more
00:20:55.200 modern approach of having the user just
00:20:57.240 speak the wake word immediately
00:20:59.080 followed by the prompt this will bypass
00:21:01.360 the wait period between recording a
00:21:03.280 wake word and then the prompt which is a
00:21:05.440 much more intuitive way of interacting
00:21:07.400 with an AI voice assistant in 2024 now
00:21:10.640 that's the theory behind how we will
00:21:12.480 handle wake word detection to implement
00:21:14.799 this in our code we will import re which
00:21:17.480 is Python's regex library then define
00:21:20.720 extract_prompt that takes transcribed
00:21:23.240 text and wake word as input create a
00:21:26.000 variable called pattern where we will
00:21:27.960 write this cryptic regex code setting the
00:21:31.000 format to search for in our transcribed
00:21:33.360 text to find the wake word and the text
00:21:35.720 following while regex notoriously has the
00:21:38.919 ugliest syntax in anything you could
00:21:41.559 ever write with code it is a highly
00:21:43.640 efficient way to handle string search in
00:21:45.880 Python inside match we can call re's
00:21:48.960 search function passing the pattern
00:21:51.159 transcribed text and re.IGNORECASE as
00:21:55.120 inputs we can check if a match was found
00:21:57.960 in the search function if so we can set
00:22:00.559 prompt to select the text after the wake
00:22:02.960 word in our match and return the clean
00:22:05.120 text prompt else there was not a match
00:22:07.559 in the transcribed text so we will
00:22:10.000 return None
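Here is one plausible form of that pattern and function; the regex itself is an assumed reconstruction, and the video's exact pattern may differ slightly:

```python
# sketch of extract_prompt; the regex pattern is an assumed reconstruction
import re

def extract_prompt(transcribed_text, wake_word):
    # match the wake word, skip any punctuation after it, then capture the rest
    pattern = rf'\b{re.escape(wake_word)}[\s,.?!]*([A-Za-z0-9].*)'
    match = re.search(pattern, transcribed_text, re.IGNORECASE)
    if match:
        prompt = match.group(1).strip()
        return prompt
    else:
        return None
```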
00:22:12.240 now we just need to set up a callback function to control the logic
00:22:14.640 of our voice assistant and how it will
00:22:16.679 act depending on what we give it as
00:22:18.600 voice input first delete this current
00:22:20.799 while True loop in our program define
00:22:23.080 callback that takes recognizer and audio
00:22:26.000 as input set prompt_audio_path to
00:22:29.320 prompt.wav now we can open the
00:22:32.640 prompt.wav file whether or not it exists yet
00:22:35.480 write the audio input to the wave file
00:22:38.159 in prompt_text call wav_to_text on the
00:22:41.279 new audio file to transcribe the voice
00:22:43.919 input to text with faster-whisper with
00:22:46.440 clean_prompt we can store the results
00:22:48.760 from extract_prompt if clean_prompt
00:22:51.640 which basically tells us there was a
00:22:53.400 wake word and following prompt found in
00:22:56.400 the extract_prompt call then print the
00:22:58.799 user's clean transcribed prompt to the
00:23:01.360 terminal for visual feedback that the
00:23:03.640 program understood us inside of call
00:23:06.240 pass the clean_prompt to function_call
00:23:08.640 then we will run through this if
00:23:10.200 sequence to check whether a function
00:23:12.000 call needs to be performed and handle
00:23:14.120 the context from a function call
00:23:16.120 accordingly after checking for function
00:23:18.360 calls we can generate a response with
00:23:20.520 the visual context and print the
00:23:22.679 assistant response to terminal finally
00:23:24.960 we can speak the response from our
00:23:26.840 multimodal voice assistant now our
00:23:29.000 program has all of the code it needs we
00:23:31.200 just need to call start_listening to
00:23:33.120 effectively start up all the processes
00:23:35.480 and have our voice assistant ready to
00:23:37.400 respond whenever we speak the wake word
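The callback described above could be sketched as follows, tying the whole pipeline together; writing the wav uses SpeechRecognition's get_wav_data helper on the captured audio:

```python
# sketch of the callback tying the whole pipeline together
def callback(recognizer, audio):
    prompt_audio_path = 'prompt.wav'
    with open(prompt_audio_path, 'wb') as f:
        f.write(audio.get_wav_data())  # save the captured audio to a wav file
    prompt_text = wav_to_text(prompt_audio_path)
    clean_prompt = extract_prompt(prompt_text, wake_word)
    if clean_prompt:
        print(f'USER: {clean_prompt}')
        call = function_call(clean_prompt)
        visual_context = None
        if 'take screenshot' in call:
            take_screenshot()
            visual_context = vision_prompt(clean_prompt, photo_path='screenshot.jpg')
        elif 'capture webcam' in call:
            web_cam_capture()
            visual_context = vision_prompt(clean_prompt, photo_path='webcam.jpg')
        elif 'extract clipboard' in call:
            paste = get_clipboard_text()
            clean_prompt = f'{clean_prompt}\n\n    CLIPBOARD CONTENT: {paste}'
        response = groq_prompt(prompt=clean_prompt, img_context=visual_context)
        print(f'ASSISTANT: {response}')
        speak(response)

start_listening()
```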
00:23:39.960 if you guys appreciated this Python
00:23:41.679 tutorial don't forget to hit the like
00:23:43.640 button on this video and subscribe for
00:23:45.720 more practical AI videos like this
