torchtext is a great library: it puts a layer of abstraction over what is usually a very heavy component of NLP projects, making work with complex datasets a breeze. Sadly, as it is based on and built for PyTorch, using it with Keras is not directly possible.

I wrote a little wrapper library called Keras ❤ torchtext (keras-loves-torchtext) to make torchtext work with Keras.

The approach may be considered a bit dirty and inefficient, as it requires converting torch tensors to numpy arrays, but the gain is a huge increase in productivity when working with NLP datasets in Keras.
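The core idea of such a wrapper can be sketched in a few lines of plain Python (this is illustrative only, not kltt's actual implementation, and it skips the tensor-to-numpy conversion and axis permutation the real WrapIterator performs):

```python
from collections import namedtuple

def wrap_batches(batches, x_field, y_field):
    # pull the named fields off each torchtext-style batch and yield
    # (inputs, targets) pairs, the shape Keras generators expect;
    # kltt's real WrapIterator additionally converts torch tensors
    # to numpy arrays and can permute axes -- this sketch skips that
    for batch in batches:
        yield getattr(batch, x_field), getattr(batch, y_field)

# stand-in for a torchtext Batch object, just for demonstration
Batch = namedtuple('Batch', ['text', 'label'])
batches = [Batch(text=[[1, 2], [3, 4]], label=[0, 1])]

x, y = next(wrap_batches(iter(batches), 'text', 'label'))
print(x, y)  # [[1, 2], [3, 4]] [0, 1]
```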

Ultimately, loading and processing a dataset like the IMDB movie review one can be as simple as the following:*

from keras.layers import Dense, Embedding, LSTM
from keras.models import Sequential
from torchtext import data, datasets

from kltt import WrapIterator

# pad/truncate every review to a fixed length (value is illustrative)
text_field = data.Field(fix_length=100)
label_field = data.Field(sequential=False, unk_token=None)

train_set, test_set = datasets.IMDB.splits(text_field, label_field)
train_it, test_it = data.BucketIterator.splits(
    (train_set, test_set), batch_sizes=(32, 32), repeat=True)

# cap the vocabulary size (value is illustrative)
text_field.build_vocab(train_set, max_size=25000)
label_field.build_vocab(train_set)

# wrap the torchtext iterators so they yield numpy arrays;
# permute swaps the (sequence, batch) axes into (batch, sequence)
train_data, test_data = WrapIterator.wraps(
    [train_it, test_it], ['text'], ['label'], permute={'text': (1, 0)})

model = Sequential()
model.add(Embedding(len(text_field.vocab), 300))
model.add(LSTM(8))
model.add(Dense(1, activation='sigmoid'))

model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
model.fit_generator(iter(train_data), steps_per_epoch=len(train_data), epochs=3)
loss, acc = model.evaluate_generator(iter(test_data), steps=len(test_data))

Have fun!

* Using the Keras loading functions for the IMDB dataset would not be much more complicated, but those functions rely on pre-processed data, while torchtext generalizes much better because it lets you define complex processing pipelines on raw text data.
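What such a pipeline does can be roughly illustrated in plain Python (this is a simplified sketch, not torchtext's actual implementation): tokenize raw text, normalize it, and numericalize it against a vocabulary.

```python
def preprocess(text, vocab):
    # tokenize on whitespace and lowercase -- roughly what
    # data.Field(lower=True) does with its default tokenizer
    tokens = text.lower().split()
    # numericalize against the vocabulary, mapping unknown
    # tokens to index 0 (the <unk> slot)
    return [vocab.get(tok, 0) for tok in tokens]

vocab = {'<unk>': 0, 'great': 1, 'movie': 2}
print(preprocess('A great movie', vocab))  # [0, 1, 2]
```

A torchtext Field bundles these steps (plus padding and batching) behind one object, which is what makes it reusable across datasets of raw text.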



Source: https://www.reddit.com/r/MachineLearning/comments/9jvlmc/p_use_torchtext_with_keras/
