Demonstration tutorial of retraining OpenAI’s GPT-2-small (a text-generating Transformer neural network) on a large public domain Project Gutenberg poetry corpus to generate high-quality English verse.
https://jalammar.github.io/illustrated-gpt2/
Other tutorial: https://medium.com/@ngwaifoong92/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f
https://github.com/minimaxir/gpt-2-simple
Example: http://textsynth.org/
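A minimal sketch of how a retraining run might be set up with the gpt-2-simple library linked above. The helper names (build_corpus, write_corpus, finetune_on), the file path, the step count and the run name are hypothetical and chosen only for illustration. It assumes each poem is separated by GPT-2's <|endoftext|> token, and that the small model is requested as "124M" (the checkpoint earlier published as 117M). The fine-tuning function needs gpt-2-simple and TensorFlow installed and downloads the model on first use, so it is imported lazily and not called here.

```python
from pathlib import Path

# GPT-2's document separator token; used here to mark where one poem ends.
END_OF_TEXT = "<|endoftext|>"


def build_corpus(texts):
    """Join non-empty texts into one training string, one separator per text."""
    cleaned = [t.strip() for t in texts if t.strip()]
    return "".join(f"{t}\n{END_OF_TEXT}\n" for t in cleaned)


def write_corpus(texts, path):
    """Write the joined corpus to a UTF-8 text file and return its path."""
    out = Path(path)
    out.write_text(build_corpus(texts), encoding="utf-8")
    return out


def finetune_on(corpus_path, model_name="124M", steps=1000, run_name="run1"):
    """Fine-tune GPT-2 on the corpus and return a list of generated samples.

    Assumption: gpt-2-simple and a compatible TensorFlow are installed.
    The import is inside the function so the helpers above work without them.
    """
    import gpt_2_simple as gpt2

    gpt2.download_gpt2(model_name=model_name)  # fetches the pretrained weights
    sess = gpt2.start_tf_sess()
    gpt2.finetune(
        sess,
        dataset=str(corpus_path),
        model_name=model_name,
        steps=steps,
        run_name=run_name,
    )
    return gpt2.generate(sess, run_name=run_name, return_as_list=True)
```

For example, write_corpus(poems, "poems.txt") followed by finetune_on("poems.txt") would train on a list of poem strings. A GPU runtime such as Colab is the practical place to run the second call.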
Datasets:
https://www.kaggle.com/datasets
https://github.com/awesomedata/awesome-public-datasets
Scrape web pages with Python:
https://www.crummy.com/software/BeautifulSoup/
https://github.com/EugenHotaj/beatles/blob/master/scraper.py
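A rough sketch of the scraping step, pulling paragraph text out of a page so it can be used as training data. To keep it dependency-free, this version swaps in the standard library's html.parser rather than BeautifulSoup, and it parses an inline sample page instead of fetching a URL. The class name, function name and sample HTML are made up for illustration and are not taken from the linked scraper.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text content of every <p> element in a page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraphs, in document order
        self._buffer = None   # pieces of the <p> being read, None outside one

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Collapse runs of whitespace and drop empty paragraphs.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return the cleaned text of each <p> element in the given HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Hypothetical page content standing in for a downloaded web page.
SAMPLE_HTML = """
<html><body>
  <h1>Lyrics</h1>
  <p>First verse</p>
  <div>not a paragraph</div>
  <p>  Another   verse </p>
</body></html>
"""

print(extract_paragraphs(SAMPLE_HTML))
```

With BeautifulSoup installed, a similar result should come from BeautifulSoup(html, "html.parser").find_all("p") and calling get_text() on each tag, although its whitespace handling differs from the normalization done here.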
https://github.com/shawwn/colab-tricks
This article was researched by George Weber in the early 1990s and written up in 1995. It was first published in the now sadly defunct and still missed Language Today by the magazine's editor, Geoffrey Kingscott (a founder member of the Andaman Association), in December 1997 (Language Monthly, 3: 12-18, 1997, ISSN 1369-9733).
Luis von Ahn shares how his ambitious new project, Duolingo, will help millions learn a new language while translating the Web quickly and accurately. Learn a language for free, and simultaneously translate the Web.
All online newspapers in the world, translated with one click
Internet Slang Words and Computer Slang
Open-source Smalltalk with powerful multimedia
On-the-fly audio programming language
Tracking the way we use language