Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.
I'm playing around with this wonderful code but I'm running into a curious issue when I try to train the model with my own data.
I replicated the personachat_self_original.json file structure and added my own data. I deleted dataset_cache_OpenAIGPTTokenizer file but when I try to train, I get this error:
INFO:train.py:Pad inputs and convert to Tensor
Traceback (most recent call last)
setting pretrained_model_name will not only define the model arch but also load the pre-trained checkpoint. We should have another hparam to control whether to load pre-trained checkpoint or not.
I'm playing around with this wonderful code but I'm running into a curious issue when I try to train the model with my own data.
I replicated the
personachat_self_original.json
file structure and added my own data. I deleteddataset_cache_OpenAIGPTTokenizer
file but when I try to train, I get this error: