# speech-processing

Here are 342 public repositories matching this topic...

A PyTorch-based Speech Toolkit
Topics: audio, transformers, pytorch, voice-recognition, speech-recognition, speech-to-text, language-model, speaker-recognition, speaker-verification, speech-processing, audio-processing, asr, speaker-diarization, speechrecognition, speech-separation, speech-enhancement, spoken-language-understanding, huggingface, speech-toolkit, speechbrain
Updated Nov 11, 2021 - Python
Reading list for research topics in multimodal machine learning
Topics: machine-learning, natural-language-processing, reinforcement-learning, computer-vision, deep-learning, robotics, healthcare, reading-list, representation-learning, speech-processing, multimodal-learning
Updated Nov 5, 2021
WaveNet vocoder
Updated Nov 2, 2020 - Python
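WaveNet-style vocoders conventionally quantize the waveform with mu-law companding before autoregressive modeling. As a minimal numpy sketch of that encode/decode step (mu=255 is the standard choice for 8-bit audio; this is an illustration, not code from the repo above):

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """Compress a waveform in [-1, 1] with mu-law, then quantize to mu+1 levels."""
    x = np.clip(x, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Map [-1, 1] onto the integer bins {0, ..., mu}
    return ((compressed + 1) / 2 * mu + 0.5).astype(np.int64)

def mu_law_decode(q, mu=255):
    """Invert the quantization and the companding."""
    c = 2 * (q.astype(np.float64) / mu) - 1
    return np.sign(c) * ((1 + mu) ** np.abs(c) - 1) / mu
```

Mu-law spends quantization levels roughly logarithmically, so quiet samples keep more resolution than a uniform 8-bit quantizer would give them.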
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Topics: tutorial, detection, extraction, citation, pytorch, pretrained-models, speaker-recognition, speaker-verification, speech-processing, speaker-diarization, voice-activity-detection, speech-activity-detection, speaker-change-detection, speaker-embedding, pyannote-audio, overlapped-speech-detection, speaker-diarization-pipeline
Updated Nov 9, 2021 - Python
SincNet is a neural architecture for efficiently processing raw audio samples.
Topics: audio, python, deep-learning, signal-processing, waveform, cnn, pytorch, artificial-intelligence, speech-recognition, neural-networks, convolutional-neural-networks, digital-signal-processing, filtering, speaker-recognition, speaker-verification, speech-processing, audio-processing, asr, timit, speaker-identification
Updated Apr 28, 2021 - Python
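SincNet's first convolutional layer constrains each kernel to a parametrized band-pass filter built as the difference of two windowed sinc low-pass filters, so only the two cutoff frequencies are learned per kernel. A rough numpy sketch of that kernel construction (cutoffs, kernel size, and window choice here are illustrative, not the project's defaults):

```python
import numpy as np

def sinc_bandpass_kernel(f1, f2, kernel_size=101, fs=16000):
    """Band-pass FIR kernel: difference of two windowed low-pass sinc filters.
    In SincNet, f1 and f2 are the only learned parameters of each kernel."""
    t = (np.arange(kernel_size) - (kernel_size - 1) / 2) / fs  # time axis, seconds
    low = (2 * f1 / fs) * np.sinc(2 * f1 * t)    # low-pass with cutoff f1
    high = (2 * f2 / fs) * np.sinc(2 * f2 * t)   # low-pass with cutoff f2
    return (high - low) * np.hamming(kernel_size)  # window to reduce ripple
```

Compared with a free-form first convolution layer, this leaves far fewer parameters to learn from raw waveforms and yields filters that are interpretable as frequency bands.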
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
Topics: machine-learning, awesome, deep-learning, speech-recognition, awesome-list, speech-processing, speaker-diarization
Updated Sep 27, 2021
manrajgrover commented Jul 16, 2020:
Currently, the API manually constructs its own messages and errors. We should move them to werkzeug exceptions.
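As a rough sketch of what that change could look like (the endpoint and data below are made up for illustration): raising werkzeug's `HTTPException` subclasses lets the framework render consistent HTTP error responses instead of hand-built message dicts and status codes.

```python
# Hypothetical endpoint logic: replace manual error construction with
# werkzeug's exception hierarchy.
from werkzeug.exceptions import BadRequest, NotFound

def get_user(users, user_id):
    # Raise the matching werkzeug exception instead of assembling an
    # error payload and status code by hand; the WSGI layer turns these
    # into proper 400/404 responses.
    if not isinstance(user_id, int):
        raise BadRequest("user_id must be an integer")
    if user_id not in users:
        raise NotFound(f"no user with id {user_id}")
    return users[user_id]
```

Each exception class carries its status code (`BadRequest.code == 400`, `NotFound.code == 404`), so error semantics live in one place rather than being repeated at every call site.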
A neural network for end-to-end speech denoising
Topics: machine-learning, deep-learning, end-to-end, speech, neural-networks, wavenet, speech-processing, speech-denoising
Updated Jul 24, 2019 - Python
Speech recognition toolkit for Arduino
Updated May 5, 2021 - C++
A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world's resources for speech enhancement and make them universally accessible and useful.
Topics: deep-neural-networks, signal-processing, machine-learning-algorithms, speech-processing, speech-enhancement
Updated Dec 1, 2020 - MATLAB
Problem Agnostic Speech Encoder
Topics: deep-learning, pytorch, unsupervised-learning, speech-processing, multi-task-learning, waveform-analysis, self-supervised-learning
Updated May 20, 2020 - Python
Novoic's audio feature extraction library
Topics: audio, python, machine-learning, statistics, signal-processing, waveform, healthcare, feature-extraction, dimension, speech-processing, audio-processing, docstrings, alzheimers-disease, parkinsons-disease
Updated Oct 19, 2020 - Python
Library to build speech synthesis systems designed for easy and fast prototyping.
Updated Aug 11, 2021 - Python

A Python wrapper for the Speech Signal Processing Toolkit (SPTK).
Updated May 22, 2021 - Python
This repository contains an implementation of "Neural Voice Cloning With Few Samples"
Topics: deep-learning, voice, tts, speech-processing, voice-synthesis, saidl, speaker-adaptation, voice-cloning, speaker-encodings, mel-spectogram
Updated Feb 23, 2021 - Python
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems, including speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.
Topics: deep-learning, neural-network, speech, speech-recognition, neural-networks, deeplearning, speech-to-text, speaker-recognition, speaker-verification, speech-processing, speech-recognizer, beamforming, speech-analysis, timit, speechrecognition, speech-api, speech-separation, librispeech, speech-emotion-recognition, speaker-identification
Updated Nov 3, 2021 - HTML
TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TF-Lite, ONNX, and real-time audio processing support.
Topics: audio, raspberry-pi, deep-learning, tensorflow, keras, speech-processing, dns-challenge, noise-reduction, audio-processing, real-time-audio, speech-enhancement, speech-denoising, onnx, tf-lite, noise-suppression, dtln-model
Updated Nov 5, 2020 - Python
Real-time GCC-NMF Blind Speech Separation and Enhancement
Topics: machine-learning, real-time, gcc, speech, ipython-notebook, low-latency, dictionary-learning, speaker, speech-processing, cross-correlation, nmf, real-time-processing, unsupervised-machine-learning, speech-separation, speech-enhancement, gcc-nmf, generalized-cross-correlation, tdoa
Updated Apr 8, 2019 - Python
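The "GCC" above is the generalized cross-correlation; with PHAT weighting, it keeps only the phase of the cross-spectrum, which sharpens the delay peak used for TDOA estimation. A minimal numpy sketch of GCC-PHAT delay estimation between two channels (an illustration of the technique, not code from this repo):

```python
import numpy as np

def gcc_phat(x, y, fs=1):
    """Estimate the delay of y relative to x (in samples / fs) via GCC-PHAT."""
    n = len(x) + len(y)                       # zero-pad against circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = Y * np.conj(X)                        # cross-spectrum
    R /= np.abs(R) + 1e-12                    # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -m..+m
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```

Because the magnitude is whitened away, the peak location depends only on inter-channel phase, which makes the estimate robust to the source's spectral shape.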
Implementation of the "Neural Voice Cloning with Few Samples" research paper by Baidu
Topics: speech, speech-synthesis, encodings, speech-processing, speaker-embeddings, mel-spectrogram, voice-cloning, speaker-encodings
Updated Feb 23, 2021 - Python
This repo summarizes the tutorials, datasets, papers, code, and tools for the speech separation and speaker extraction tasks. Pull requests are welcome.
Topics: deep-neural-networks, deep-learning, signal-processing, speech-processing, speech-analysis, speech-separation
Updated Jan 9, 2021 - MATLAB
PyTorch implementation of VQ-VAE + WaveNet by [Chorowski et al., 2019] and VQ-VAE on speech signals by [van den Oord et al., 2017]
Updated Aug 13, 2019 - Python
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Updated Oct 8, 2021 - Python
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)
Topics: natural-language-processing, machine-translation, artificial-intelligence, speech-recognition, natural-language-generation, speech-processing
Updated Oct 29, 2021
A React-Native Bridge for the Google Dialogflow (API.AI) SDK
Topics: google, react-native, voice, speech, text-recognition, apiai, api-ai, speech-processing, speak, speech-to-function, dialogflow
Updated Jun 3, 2021 - JavaScript
Deep neural network based speech enhancement toolkit
Updated Jun 14, 2019 - MATLAB
Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification; that is, developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject.

The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) can also be derived.

The aforementioned traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian mixture model based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].

Topics: nlp, classifier, natural-language-processing, feature-extraction, nltk, gaussian-mixture-models, support-vector-machines, mfcc, principal-component-analysis, speech-processing, linear-discriminant-analysis, isomap, spectral-clustering, long-short-term-memory, kernel-pca, spectral-embedding, locally-linear-embedding, linear-prediction-coefficients, speech-utterance
Updated Jul 15, 2020 - Python
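The MFCC pipeline the description refers to (power spectrum, then a mel filterbank, then a log, then a DCT) can be sketched for a single frame in plain numpy. The 512-sample frame, 26-filter, 13-coefficient choices below are conventional defaults, not values taken from this project:

```python
import numpy as np

def mfcc(signal, fs, n_fft=512, n_mels=26, n_ceps=13):
    """Minimal single-frame MFCC: power spectrum -> mel filterbank -> log -> DCT-II."""
    # Power spectrum of one Hamming-windowed frame
    frame = signal[:n_fft] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frame)) ** 2

    # Triangular mel filterbank between 0 Hz and the Nyquist frequency
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge

    log_mel = np.log(fbank @ power + 1e-10)

    # DCT-II of the log mel energies, keeping the first n_ceps coefficients
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * k + 1) / (2 * n_mels))
    return dct @ log_mel
```

The final DCT decorrelates the log filterbank energies, which is what makes MFCCs well suited to the diagonal-covariance Gaussian mixture models mentioned above.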
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
Topics: audio, reproducible-research, paper, speech, pytorch, band, speech-processing, noise-reduction, denoising, speech-separation, speech-enhancement, narrow-band, single-channel, pretrained-model, band-fusion-model, full-band, sub-band
Updated Nov 1, 2021 - Python
ngragaei commented Jul 27, 2020:
frames[-1] = np.append(frames[-1], np.array([0]*(frame_length - len(frames[0]))))
TypeError: can't multiply sequence by non-int of type 'float'
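The traceback indicates that `frame_length` reached the padding line as a float (e.g. computed as `0.025 * sample_rate`), so the list repetition `[0] * (...)` fails. A hypothetical reconstruction of that padding step with the cast applied (it also pads relative to the last frame's own length rather than `len(frames[0])`, which is what the padding actually needs):

```python
import numpy as np

def pad_last_frame(frames, frame_length):
    """Zero-pad the final frame up to frame_length samples."""
    frame_length = int(round(frame_length))   # the fix: [0] * n needs an int n
    pad = frame_length - len(frames[-1])
    if pad > 0:
        frames[-1] = np.append(frames[-1], np.zeros(pad, dtype=frames[-1].dtype))
    return frames
```

Casting at the point where the length is first computed (rather than at every use) would be the more robust variant of the same fix.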