Demonstration tutorial of retraining OpenAI’s GPT-2-small (a text-generating Transformer neural network) on a large public domain Project Gutenberg poetry corpus to generate high-quality English verse.
https://jalammar.github.io/illustrated-gpt2/
Another tutorial: https://medium.com/@ngwaifoong92/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f
https://github.com/minimaxir/gpt-2-simple
Example: http://textsynth.org/
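The gpt-2-simple workflow from the repo above boils down to three calls: download the pretrained weights, fine-tune on a plain-text corpus, then generate. A minimal sketch (the corpus file name `poetry.txt`, the step count, and the prompt are assumptions; fine-tuning realistically needs a GPU and the `gpt-2-simple` package installed):

```python
import gpt_2_simple as gpt2

model_name = "124M"  # GPT-2 "small"; older tutorials refer to it as 117M
gpt2.download_gpt2(model_name=model_name)  # fetches pretrained weights once

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="poetry.txt",   # assumed: one plain-text poetry corpus file
              model_name=model_name,
              steps=1000)             # assumed step count; tune to taste

# Sample from the fine-tuned model, seeded with an assumed prompt.
gpt2.generate(sess, prefix="Shall I compare thee", length=100)
```

Checkpoints land in `checkpoint/run1` by default, so a later session can reload them with `gpt2.load_gpt2(sess)` instead of re-training.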
Datasets:
https://www.kaggle.com/datasets
https://github.com/awesomedata/awesome-public-datasets
Scraping webpages with Python:
https://www.crummy.com/software/BeautifulSoup/
https://github.com/EugenHotaj/beatles/blob/master/scraper.py
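The BeautifulSoup half of scraping is just parsing fetched HTML and pulling text out of the right tags. A minimal sketch using an inline sample page (the `div.poem p` selector is an assumption; real pages need their own selector, and fetching would typically use `requests`):

```python
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<html><body>
  <div class="poem">
    <p>Shall I compare thee to a summer's day?</p>
    <p>Thou art more lovely and more temperate:</p>
  </div>
</body></html>
"""

def extract_poem_lines(html: str, selector: str = "div.poem p") -> list:
    # Parse the page and return the stripped text of each matching tag.
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(selector)]

lines = extract_poem_lines(SAMPLE_HTML)
```

The extracted lines can then be written straight into the plain-text training corpus.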
Tracking the way we use language