# Build Your Own GPT-4o Voice Assistant in Python with Groq, Llama3, OpenAI-TTS & Faster-Whisper
# https://www.youtube.com/watch/pi6gr_YHSuc
Voice assistants are the hottest thing in AI right now. OpenAI has just strategically released their new GPT-4o multimodal voice assistant preview one day before they knew Google was going to release theirs. It is very clear they are hungry for it. Savage. While we wait for the bureaucracy of these tech giants to finish up their code, I'm going to show you how to code your own low-latency multimodal AI voice assistant. But before we get into the code, let's introduce you to your new AI voice assistant.

Jarvis, I'm home. "Welcome home! How was your day?" Jarvis, it was great. I got some code pulled up on my screen; can you tell me briefly what this program is? "It looks like you're working on a voice assistant project in Python. The code appears to be a comprehensive framework for building a multimodal AI voice assistant, incorporating various functionalities such as processing voice prompts, image and webcam input, clipboard text, audio input, and speech output. Is there a specific part of the code you'd like me to explain or help you with?" Jarvis, yes, I copied the program's entire code. Can you tell me what library this program is using for text to speech? "The program is using OpenAI's tts-1 model for text-to-speech functionality, which is called through the speak function." Jarvis, what do you see on my webcam? "It looks like I see a webcam capture of a male adult smiling and giving the middle finger, wearing a brown shirt and gray beanie. The background appears to be an attic with various objects like a microphone, cables, tape, and insulation. The overall tone of the image is coming across as rude and possibly intended to be disrespectful." Y'all don't ever disrespect me, looking like Blanche from The Golden Girls.

Now that Jarvis has introduced herself, let's jump straight into the code tutorial. First, create a folder to put your voice assistant files in, then open your voice assistant folder in a code editor; I will be using VS Code for this video. Inside your project's folder, start by creating a file called requirements.txt. Open that file and copy the list of dependencies the same as mine; these are the dependencies your Python program will need for this voice assistant. Save this file and exit out of it.
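The dependency list itself isn't reproduced legibly in this transcript, but based on the libraries the video uses, a requirements.txt along these lines should cover it (unpinned versions are an assumption; pyttsx3 is only needed if you opt for the free local text to speech mentioned below):

```text
groq
openai
google-generativeai
faster-whisper
SpeechRecognition
PyAudio
Pillow
opencv-python
pyperclip
pyttsx3
```

Everything then installs in one step with pip install -r requirements.txt from the project folder.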
For those of you who do not have the patience or the desire to go through a Python programming tutorial video, which is exactly what I'm aiming to make with this video, you can access the source code to all of my videos in the Pro membership channels of my Discord. Having access to the Pro channels is not necessary to go through this video tutorial, as I will be showing all of the code for the voice assistant in this tutorial. Anyone who is part of my Pro membership not only has access to my source code from this video and all of my other tutorials, but also a Pro written tutorial before I even release the videos. Videos like this take a lot of time to create with this much editing, which is why I am able to release Pro written tutorials like this one before I can get a full video tutorial onto my YouTube channel. To join the AI Austin Pro membership, click the Buy Me a Coffee link in this video's description.

If you don't already have Python 3.11 installed, you'll need to install that on your computer for this voice assistant. With Python 3.11 installed, I can now open a new terminal inside of my project folder and run this command to install every library in the requirements.txt file. Make sure you don't get any error messages in this important step. If you do get any error messages and don't understand them, just paste them into ChatGPT and ask for an explanation of what is causing the error and what you need to fix it. If you get any errors specifically related to PyAudio on a Windows machine, you're going to need to download the wheel file and install with that instead. These wheel files can be found at pypi.org, which is the official Python Package Index. Click on "Download files", then find the files with cp311 in the name; these are the correct wheel files for Python 3.11. Now figure out whether your Windows machine has a 64-bit or 32-bit processor and download the file with the correlating number at the end of the file name. Once the wheel file has been downloaded, copy the entire file path to that wheel file, open Command Prompt as administrator, type pip install, and then paste in the full path to your PyAudio wheel file.

We now need three API keys. Get your Groq, OpenAI, and Google generative AI API keys at these links and store them somewhere secure. The OpenAI API key will be used to create text to speech with their API. If you don't want to pay to use this program, you can skip this, and I'd recommend using the pyttsx3 library to get free local text to speech. Using pyttsx3 will not give you as high quality of a voice as paying for OpenAI, which is why I am showing OpenAI in this video: a lot of people are demanding a voice assistant with a realistic, human-sounding voice. Personally, I use pyttsx3 for my own voice assistant because it gives me faster responses and I don't need that high quality of a voice. If you just want a clear voice, go with pyttsx3.
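The video doesn't show the pyttsx3 code itself, but a minimal free, local speak replacement would look something like this:

```python
import pyttsx3

# Initialize the local text-to-speech engine once at startup.
engine = pyttsx3.init()

def speak(text):
    # Queue the text and block until it has been spoken aloud.
    engine.say(text)
    engine.runAndWait()
```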
Inside your project's folder, create a file named assistant.py. The first step we want to accomplish inside of our Python program is creating a function to generate responses from Llama 3 70B with Groq's API. We'll start by importing the Groq class from groq, then initiate the Groq client by passing your API key to the Groq class inside quotes. Define a function called groq_prompt that takes prompt as input. Inside the function, create a list called convo with the prompt formatted correctly for Llama. In another variable called chat_completion, call the Groq client to create a response, passing the convo list and specifying that the Groq API should use the Llama 3 70B model. Inside a variable called response, select the last message from Groq, then return the response.content from groq_prompt. Outside of groq_prompt, you can create a variable called prompt and request a text input. The program will store the return value from running groq_prompt with our text prompt input, and we now print the response content to the terminal. If you've installed groq correctly and set your API key in the parameters, this script should allow you to prompt Llama 3 70B from the Groq API. This groq_prompt function will be the main function for our conversation with our voice assistant.
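A minimal sketch of this first version of groq_prompt (the exact model identifier string and message format are assumptions based on Groq's standard chat completions API):

```python
from groq import Groq

# Initiate the Groq client by passing your API key.
groq_client = Groq(api_key='YOUR_GROQ_API_KEY')

def groq_prompt(prompt):
    # Format the prompt as a single user message for Llama 3.
    convo = [{'role': 'user', 'content': prompt}]
    chat_completion = groq_client.chat.completions.create(
        messages=convo,
        model='llama3-70b-8192',
    )
    # Select the last message from Groq and return its text content.
    response = chat_completion.choices[0].message
    return response.content

prompt = input('USER: ')
print(groq_prompt(prompt))
```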
If you have not already heard of Groq yet, Groq is a company that invented a new piece of hardware that is like a GPU but specifically made for inferencing language models. This allows us to get high-speed, a.k.a. low-latency, responses from the Groq API, even using a 70-billion-parameter model. But we're going to need much more than a conversation with the language model to make this AI agent fully multimodal.

We now need to create a function that lets the language model call functions in our program. Define function_call, which also takes prompt as input. Inside this function we will create a system_message variable; write the system message string the same as mine here. If you want to add any new function calls, specify the function in the first sentence and in the list at the end of the system message text string. Underneath system_message, we'll structure the function conversation with the system message and the prompt from the user. Back up at groq_prompt we can copy the chat_completion call and the return response.content lines, then paste them into function_call, changing convo in the chat_completion input parameters to function_convo above the response variable. Outside of our functions we will create a function_response that sends the prompt input to our new function, and we can print that function response. If you did add any new function calls to your program, now would be a good place in the tutorial to test that your system message is actually getting Llama to call your functions reliably.

Now we have our code to communicate with the main brain of our assistant through the Groq API. Let's add our three functions to take screenshots, take photos, and extract text we copy for the assistant in our conversations. The take_screenshot function will need us to import the ImageGrab class from PIL. Inside the function we will set the file name for our screenshot captures, then create a variable screenshot that stores the raw image data from calling ImageGrab's grab function. Create rgb_screenshot to convert the raw image data to RGB format with the convert function. Lastly for this function, we can call the save function on rgb_screenshot, passing the path and setting the quality to 15%. Dropping the image quality to 15% will allow for lower latency when prompting the image-processing language model we will implement later, and allow the language model to process these images faster. If you want to test the take_screenshot function, create a new file called test.py, paste in the function and the import statement for ImageGrab, and call the function on the last line of your program. Once it finishes running, you should have a screenshot JPEG file in your voice assistant folder. Once your screenshot function is working properly in your test file, make sure the code is the same in your assistant file, minus the last line in test.py.
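A minimal sketch of take_screenshot as described (the exact file name is an assumption):

```python
from PIL import ImageGrab

def take_screenshot():
    path = 'screenshot.jpg'
    # Grab the raw screen contents, convert to RGB, and save at 15% quality
    # so the vision model can process the image with lower latency.
    screenshot = ImageGrab.grab()
    rgb_screenshot = screenshot.convert('RGB')
    rgb_screenshot.save(path, quality=15)
```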
Now delete all of the code in test.py and import cv2, which we will use to allow our program to take webcam photos. We'll write this code out in the test file to make sure we are calling the correct camera on our computer. Create web_cam and set it equal to cv2.VideoCapture, passing zero as an input variable. Zero specifies the first camera in the list of available cameras on your computer; even if your computer only has one camera, cv2 might think there are more, so let's first test it at zero, which is most likely your device's webcam. Define webcam_capture, and on the first line check if not web_cam.isOpened(), which in English means the program has not successfully opened a camera. On line three, print a message letting us know the camera was not successfully opened, and exit the program. On Mac or Linux this will not occur if you have installed cv2 correctly and, when first running the program, grant your command-line app permissions to access the webcam. For Windows users, it is possible you will need to manually grant permissions for the command line to access the web camera. If the program doesn't quit, our program is successfully opening a camera, so outside of the if statement we can create a path variable to store the name of our webcam photo. We'll store ret and frame from calling cv2's read function on our webcam, and we can now save the webcam photo with cv2's imwrite function and call the function on the last line of our program. Run test.py and make sure the webcam.jpg file is in your voice assistant folder and is actually capturing the webcam. If the webcam photo is not from your webcam, keep increasing the number passed to the VideoCapture function by one until you find which number in the list of cv2 cameras is actually your webcam. Once your code is functioning, copy it into assistant.py, make sure you import cv2, then save assistant.py.

Our get_clipboard_text function will need pyperclip imported. Inside the function, create clipboard_content, which will store the result from calling pyperclip.paste. We can now use an if statement with Python's built-in isinstance function to check that clipboard_content is a string, meaning text content. If it is a string, we will return clipboard_content; else clipboard_content is not a string, so we will print to the terminal letting us know we probably didn't copy the text correctly, and return None as the context for our prompt. To test this function, you can delete all the code in test.py and replace it with import pyperclip and your get_clipboard_text function; then you can print the result of get_clipboard_text, which should display the text that you copied before running test.py.
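A minimal sketch of the webcam and clipboard helpers as described (the file name and printed messages are assumptions):

```python
import cv2
import pyperclip

# Zero selects the first camera in the list of available cameras.
web_cam = cv2.VideoCapture(0)

def webcam_capture():
    if not web_cam.isOpened():
        print('Error: could not open webcam')
        exit()
    path = 'webcam.jpg'
    # Read a single frame from the webcam and write it to disk.
    ret, frame = web_cam.read()
    cv2.imwrite(path, frame)

def get_clipboard_text():
    clipboard_content = pyperclip.paste()
    if isinstance(clipboard_content, str):
        return clipboard_content
    print('No clipboard text was copied')
    return None
```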
Now we can implement Google's new Gemini 1.5 Flash model for low-latency image processing. As stated earlier, we'll be using Llama 3 through the Groq API as the main model for our voice assistant; we'll simply use this new Flash model to provide low-latency, highly relevant context to our voice assistant. First import google.generativeai as genai and also PIL's Image class. Next, configure your connection to genai by passing your Google generative AI API key. We can now set up our LLM configuration settings for Gemini, then turn off all safety settings by setting them to BLOCK_NONE in all caps. Then we can set a model variable to load the Gemini Flash model with our config and safety settings. Define vision_prompt, which takes prompt and photo_path as input. Inside img, open the image at photo_path, then format the prompt with a prompt heading and add the prompt input to the end. Inside response, we can generate a response from Gemini Flash by passing the prompt and image input formatted in a list. Finally, return response.text.

Now let's start wrapping everything together and make the program conversational, with the ability to call our functions and send images to Gemini Flash if necessary. First, create a system message for our Llama 3 conversation; then we can format the system message in a list called convo. Now delete these lines after the functions and create a while True loop. We can request the text prompt input, then inside call, store the response from function_call. If 'take screenshot' is in call, print "Taking screenshot", call the take_screenshot function, and, inside a variable called visual_context, store the response from sending the prompt and image path to vision_prompt. Else, if 'capture webcam' is in call, print "Capturing webcam", call the webcam_capture function, and again send the prompt and photo path to vision_prompt. Else, if 'extract clipboard' is in call, print "Extracting clipboard text"; inside paste, store the returned text from calling get_clipboard_text, then modify prompt to have two line breaks after the current prompt, a tab, then, written in all caps, CLIPBOARD CONTENT, followed by the pasted text. This format will help Llama better understand the prompt from the pasted text.

Then we can modify the groq_prompt function a bit to handle a conversation instead of a single prompt. First, let's add img_context as an input parameter. Delete the current convo, since we created one outside the function to track our Groq conversation. Check if img_context; if so, format the prompt with the image context, again putting two line breaks, a tab, then, in all caps, IMAGE CONTEXT before adding the image context to the prompt. Now we can add that formatted user prompt to the convo list with Python's append function, and after response, add the generated response to the conversation. Back down inside of the while True loop, create response, which will equal the result from sending the prompt and image context to groq_prompt, then print the response at the end of the while True loop. Now, we've been through a lot of code; this is a good point to check that all of your code is the same as mine at this stage in the tutorial. Once you have all your code matching mine, you should be able to have a conversation with your multimodal AI agent.
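A minimal sketch of the Gemini setup and vision_prompt described above (the generation config values and the prompt-heading wording are assumptions, as the transcript doesn't reproduce them; the model name follows the video's Gemini 1.5 Flash choice):

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key='YOUR_GOOGLE_GENAI_API_KEY')

# Turn off all safety settings by setting each category to BLOCK_NONE.
safety_settings = [
    {'category': 'HARM_CATEGORY_HARASSMENT', 'threshold': 'BLOCK_NONE'},
    {'category': 'HARM_CATEGORY_HATE_SPEECH', 'threshold': 'BLOCK_NONE'},
    {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'threshold': 'BLOCK_NONE'},
    {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'threshold': 'BLOCK_NONE'},
]

generation_config = {'temperature': 0.7, 'top_p': 1, 'max_output_tokens': 2048}

model = genai.GenerativeModel('gemini-1.5-flash-latest',
                              generation_config=generation_config,
                              safety_settings=safety_settings)

def vision_prompt(prompt, photo_path):
    img = Image.open(photo_path)
    # Prefix the user's prompt with a heading so Gemini knows it is
    # generating context for another model rather than answering directly.
    prompt = ('You are a vision analysis AI that provides context for another AI. '
              f'Generate relevant context from this image for the user prompt: {prompt}')
    # Pass the prompt and image input formatted in a list.
    response = model.generate_content([prompt, img])
    return response.text
```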
With that working, let's give our voice assistant a realistic human voice. For converting text to speech, we will import the OpenAI class from the openai library and also import pyaudio. Below our Groq and Gemini API configurations we will create openai_client and initiate a connection to the OpenAI API by passing our OpenAI API key. We can now define speak, which takes text as input. Since I have explained this code already in another voice assistant video, I won't go too in depth on it, but this is the optimal code for streaming text to speech from the OpenAI API in Python. What you do need to know about this code is that the voice parameter is where you can choose whichever of the six OpenAI voices you prefer for your voice assistant. Here's a quick sample of each voice: "Hello world, I am Shimmer." "Hello world, I am Nova." "Hello world, I am Onyx." "Hello world, I am Fable." "Hello world, I am Echo." "Hello world, I am Alloy." Once you have your speak function complete, inside of the while True loop you can call the speak function after printing your response, and your program will now have a high-quality, low-latency AI-generated voice using the OpenAI API.

We have now given our assistant multimodality and a voice. The last step is to allow our program to efficiently handle voice prompts and a wake word. Unlike OpenAI, we are going to set up a wake word for our voice assistant, because unlike the GPT-4o voice assistant, we want this program to be something we can start up when we are next to our computer, with the program always waiting for us to intentionally send a prompt. A voice assistant shouldn't need us to open the program, click "start listening", wait eight seconds for the GPT-4o model to load, and then start speaking a prompt. Instead of having to wait ten seconds to talk to our AI friend, we're trying to create a voice assistant that is always ready to respond to our prompts when we need it, and not making noise in the background when we don't.

At the top, import os, and from faster_whisper import the WhisperModel class. We will set num_cores to check how many CPU cores your device has, then we will set whisper_size to 'base'. In whisper_model we will initiate the WhisperModel class on CPU with int8, setting cpu_threads to equal half our total CPU cores and num_workers to equal the same. Define wav_to_text, which takes audio_path as input. Set segments and underscore to store the results from calling whisper_model's transcribe function on our audio path. Inside text we can join the streamed results from faster-whisper, and lastly return all of the transcribed text.

Now that our program can transcribe audio, let's implement some code for the program to process audio in a background process, to keep latency as low as possible. Import speech_recognition as sr and the time library.
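Minimal sketches of the speak function and the faster-whisper transcription pieces just described (streaming PCM at 24 kHz through with_streaming_response is my assumption for the "optimal streaming code" the video references; the whisper portion follows the description directly):

```python
import os
import pyaudio
from openai import OpenAI
from faster_whisper import WhisperModel

openai_client = OpenAI(api_key='YOUR_OPENAI_API_KEY')

def speak(text):
    # Open a raw 24 kHz mono PCM output stream and play audio chunks
    # as they arrive from the OpenAI TTS API.
    player = pyaudio.PyAudio().open(format=pyaudio.paInt16, channels=1,
                                    rate=24000, output=True)
    with openai_client.audio.speech.with_streaming_response.create(
        model='tts-1',
        voice='nova',          # one of: alloy, echo, fable, onyx, nova, shimmer
        response_format='pcm',
        input=text,
    ) as response:
        for chunk in response.iter_bytes(chunk_size=1024):
            player.write(chunk)

num_cores = os.cpu_count()
whisper_size = 'base'
# Run the base model on CPU with int8, using half the CPU cores.
whisper_model = WhisperModel(
    whisper_size,
    device='cpu',
    compute_type='int8',
    cpu_threads=num_cores // 2,
    num_workers=num_cores // 2,
)

def wav_to_text(audio_path):
    # Transcribe the wave file and join the streamed segments into one string.
    segments, _ = whisper_model.transcribe(audio_path)
    return ''.join(segment.text for segment in segments)
```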
Set wake_word to your wake word; I'll use 'jarvis' as mine. Set r to initiate sr's Recognizer class, then set source to initiate your Microphone. Define start_listening, and with source as s, call the recognizer's adjust_for_ambient_noise function to analyze microphone input for two seconds when the program is starting. Print a message to the terminal telling us we can say the wake word followed by our prompt, to notify us that the program has finished starting up. Then call the recognizer's listen_in_background function, passing our microphone source and callback as input; callback is a function we will define later to control what the program does based on the audio input. Then, in a while True loop, we will sleep the program every half second, which stops the callback function running in the background from overworking our CPU.

In all of my previous voice assistant tutorials, we have handled wake word detection by having the program wait for the user to speak just the wake word, respond to the user that the program is now ready to record a prompt, and then start recording a prompt. The previous method was easy to code but honestly annoying to use. In this assistant, we will implement a more modern approach of having the user speak the wake word immediately followed by the prompt. This bypasses the wait period between recording a wake word and then the prompt, which is a much more intuitive way of interacting with an AI voice assistant in 2024.

Now that's the theory behind how we will handle wake word detection; to implement it in our code, we will import re, which is Python's regex library. Then define extract_prompt, which takes transcribed_text and wake_word as input. Create a variable called pattern, where we will write this cryptic regex code setting the format to search for in our transcribed text to find the wake word and the text following it. While regex notoriously has the ugliest syntax of anything you could ever write in code, it is a highly efficient way to handle string search in Python. Inside match, we can call regex's search function, passing the pattern, the transcribed text, and re.IGNORECASE as inputs.
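A minimal sketch of this listening and wake-word extraction logic (the exact regex in the video isn't legible from the transcript, so this pattern is an assumption that captures everything after the wake word):

```python
import re
import time
import speech_recognition as sr

wake_word = 'jarvis'
r = sr.Recognizer()
source = sr.Microphone()

def start_listening():
    with source as s:
        # Calibrate for background noise for two seconds at startup.
        r.adjust_for_ambient_noise(s, duration=2)
    print('\nSay', wake_word, 'followed with your prompt.\n')
    # Run callback on each captured utterance in a background thread.
    r.listen_in_background(source, callback)
    while True:
        time.sleep(0.5)   # keep the main thread alive without busy-waiting

def extract_prompt(transcribed_text, wake_word):
    # Match the wake word, then capture the prompt text that follows it.
    pattern = rf'\b{re.escape(wake_word)}[\s,.?!]*([A-Za-z0-9].*)'
    match = re.search(pattern, transcribed_text, re.IGNORECASE)
    if match:
        return match.group(1).strip()
    return None
```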
We can check whether a match was found by the search function. If so, we can set prompt to select the text after the wake word in our match and return the clean text prompt; else there was not a match in the transcribed text, so we will return None.

Now we just need to set up a callback function to control the logic of our voice assistant and how it will act depending on what we give it as voice input. First, delete the current while True loop in our program. Define callback, which takes recognizer and audio as input. Set prompt_audio_path to 'prompt.wav'. Now we can open the prompt.wav file, whether or not it exists yet, and write the audio input to the wave file. In prompt_text, call wav_to_text on the new audio file to transcribe the voice input to text with faster-whisper. With clean_prompt we can store the result from extract_prompt. If clean_prompt, which basically tells us a wake word and a following prompt were found in the extract_prompt call, then print the user's clean transcribed prompt to the terminal for visual feedback that the program understood us. Inside call, pass the clean prompt to function_call; then we run through the same if sequence as before to check whether a function call needs to be performed and to handle the context from a function call accordingly. After checking for function calls, we can generate a response with the visual context and print the assistant's response to the terminal. Finally, we can speak the response from our multimodal voice assistant.

Now our program has all of the code it needs; we just need to call start_listening to effectively start up all the processes and have our voice assistant ready to respond whenever we speak the wake word. If you guys appreciated this Python tutorial, don't forget to hit the like button on this video and subscribe for more practical AI videos like this.
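Putting that callback together with the helpers defined earlier, a minimal sketch looks like this (writing the wave file via audio.get_wav_data() is my assumption for how the recorded audio is saved):

```python
def callback(recognizer, audio):
    prompt_audio_path = 'prompt.wav'
    # Write the captured utterance to disk so faster-whisper can transcribe it.
    with open(prompt_audio_path, 'wb') as f:
        f.write(audio.get_wav_data())

    prompt_text = wav_to_text(prompt_audio_path)
    clean_prompt = extract_prompt(prompt_text, wake_word)

    if clean_prompt:
        print(f'USER: {clean_prompt}')
        call = function_call(clean_prompt)

        visual_context = None
        if 'take screenshot' in call:
            print('Taking screenshot.')
            take_screenshot()
            visual_context = vision_prompt(prompt=clean_prompt, photo_path='screenshot.jpg')
        elif 'capture webcam' in call:
            print('Capturing webcam.')
            webcam_capture()
            visual_context = vision_prompt(prompt=clean_prompt, photo_path='webcam.jpg')
        elif 'extract clipboard' in call:
            print('Extracting clipboard text.')
            paste = get_clipboard_text()
            clean_prompt = f'{clean_prompt}\n\n    CLIPBOARD CONTENT: {paste}'

        response = groq_prompt(prompt=clean_prompt, img_context=visual_context)
        print(f'ASSISTANT: {response}')
        speak(response)

start_listening()
```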