Where do I get the training data? It isn't in the repository. #38
-
Hi, this is Sheryl. I am applying for GSoC with Unicode and am interested in improving the LSTM model for word segmentation. However, I have hit a roadblock running the "Thai_graphclust_model4_heavy" model with train_data = "BEST": the file '/content/lstm_word_segmentation/Data/Best/news/news_00040.txt' seems to have been removed. Let me know how I should remedy this, as I'd love to run the model and note any room for improvement.
-
The data sets are not checked in to the repository. The README contains instructions on how to obtain them: https://github.com/unicode-org/lstm_word_segmentation?tab=readme-ov-file#data-sets
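Once the data sets are downloaded, a quick check can confirm that the file the model tried to open is actually in place. This is only a sketch and not part of the repository. It assumes the Colab checkout location /content/lstm_word_segmentation and the Data/Best/news/... layout implied by the path in the question. `missing_data_files` is a hypothetical helper, and only the one file named in the question is listed.

```python
from pathlib import Path

# Assumption: repository root taken from the Colab path in the question.
REPO_ROOT = Path("/content/lstm_word_segmentation")

# Assumption: only the file named in the question; the full BEST layout is not listed here.
EXPECTED = [Path("Data/Best/news/news_00040.txt")]


def missing_data_files(repo_root: Path, relative_paths: list[Path]) -> list[Path]:
    """Hypothetical helper: return the expected data files that do not exist under repo_root."""
    return [repo_root / rel for rel in relative_paths if not (repo_root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_data_files(REPO_ROOT, EXPECTED)
    if missing:
        for path in missing:
            print(f"Missing: {path}")
        print("Obtain the data sets as described in the README before training.")
    else:
        print("BEST data file found.")
```

If anything is reported missing, the data has not been placed where the training code looks for it. Follow the README instructions before re-running the model.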