Teachable NLP : The Finetuned GPT-2 Model with Pride and Prejudice
TabTab : TabTab!
‘Pride & Prejudice’ is one of the most-loved romantic novels. In the early 19th century, Elizabeth Bennet meets Mr. Darcy at a ball and immediately feels that he is arrogant. This prejudice grows and drives them apart. However, after Mr. Darcy and Elizabeth manage to overcome his pride and her prejudice, they finally come together.
But what if the story had gone differently?
- What if they had fallen in love at first sight?
- What if Elizabeth hadn’t believed Mr. Wickham, who slanders Mr. Darcy?
- What if Mr. Darcy hadn’t sent a letter containing the truth to convince her?
- What if Mr. Wickham and Lydia (Elizabeth’s younger sister) hadn’t run away in the night?
Thanks to Teachable NLP, I can fine-tune the model without writing any training code. You can use the text file of ‘Pride & Prejudice’ to train an NLP model and test it. Feel free to try it here!
A. Acquiring the Data
I got the original text of ‘Pride & Prejudice’ from Project Gutenberg. It is free to use for machine learning or editing because the text is in the public domain.
B. Preprocessing the data
During preprocessing, I removed 1) the headings and 2) the blank lines and line breaks. I kept the double quotation marks because they are common in the novel.
```python
file_name = "Pride and Prejudice.txt"

with open(file_name, "rt", encoding="utf-8") as f:
    lines = f.readlines()

sentences = []
in_volume = False
for line in lines:
    # The title line marks the beginning of a volume
    if line == "PRIDE & PREJUDICE.\n":
        in_volume = True
        continue
    elif not in_volume:
        continue
    # Remove CHAPTER headings
    elif line.startswith("CHAPTER"):
        continue
    # Remove blank lines
    elif line == "\n":
        continue
    # An "END OF ..." line marks the end of a volume
    elif line.startswith("END OF"):
        in_volume = False
        continue
    # The row of asterisks marks the end of the data
    # (compare after strip() so the trailing line break does not block the match)
    elif line.strip() == "* * * * *":
        break
    # Remove emphasis underscores and replace double hyphens with a space
    line = line.replace("_", "")
    line = line.replace("--", " ")
    # Remove surrounding whitespace, including the line break
    line = line.strip()
    # With the line breaks gone, add a space so lines do not run together
    sentences.append(line + " ")

training_data = "".join(sentences)
with open("preprocess_pp.txt", "w", encoding="utf-8") as training_file:
    training_file.write(training_data)
```
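To see what these rules do before running them on the whole book, here is a minimal sketch that applies the same filtering to a few lines held in memory. The function name `preprocess_lines` and the `sample` lines are my own illustration, not part of the original script, and the sample text is only loosely adapted from the novel to show the underscore and double-hyphen handling.

```python
def preprocess_lines(lines):
    """Apply the same line-by-line rules as the script above (illustrative helper)."""
    sentences = []
    in_volume = False
    for line in lines:
        stripped = line.strip()
        # The title line marks the beginning of a volume
        if stripped == "PRIDE & PREJUDICE.":
            in_volume = True
            continue
        if not in_volume:
            continue
        # The row of asterisks marks the end of the data
        if stripped == "* * * * *":
            break
        # An "END OF ..." line closes the current volume
        if stripped.startswith("END OF"):
            in_volume = False
            continue
        # Skip blank lines and CHAPTER headings
        if stripped == "" or stripped.startswith("CHAPTER"):
            continue
        # Drop emphasis underscores, turn "--" into a space, trim whitespace
        cleaned = line.replace("_", "").replace("--", " ").strip()
        sentences.append(cleaned + " ")
    return "".join(sentences)


# Hypothetical sample input, shaped like the Project Gutenberg text
sample = [
    "The Project Gutenberg header (skipped)\n",
    "PRIDE & PREJUDICE.\n",
    "\n",
    "CHAPTER I.\n",
    "\n",
    "It is a truth universally acknowledged, that a single man in\n",
    "possession of a good fortune, must be in want of a wife.\n",
    "\n",
    "\"_You_ want to tell me--and I have no objection to hearing it.\"\n",
    " * * * * *\n",
    "Transcriber's notes (never reached)\n",
]

print(preprocess_lines(sample))
```

The headings, blank lines, and everything outside the volume disappear, while the quotation marks stay in place, so the result is one continuous paragraph ready to upload as training text.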
C. Teachable NLP
I fine-tuned the pre-trained GPT-2 small model for 5 epochs. After training GPT-2 in Teachable NLP, please click ‘Test your model’. It shows you TabTab linked to your own fine-tuned model. In TabTab, you can try ‘Pride & Prejudice’.
I tested the model by writing about the case of Elizabeth falling in love with Mr. Darcy right after dancing at the ball.
There are many different ways to rewrite ‘Pride & Prejudice’ with your own creative ideas! I wonder how you’d change the story. Please share the story you write with your NLP model in the forum by clicking ‘See what your friends made’ at the bottom left side of TabTab.