Text Classification Using Recurrent Neural Networks

Supervised learning problems are one of the core areas of interest in machine learning, and a range of approaches are available to solve them. In this article, we will delve into the details of a text classification problem and explore how Recurrent Neural Networks can be used to address it.

Introduction

Text classification is one of the most popular problems in natural language processing (NLP) and machine learning. Given some text data, the goal is to assign it to one of a set of predefined categories or labels. This has a myriad of uses, such as organizing documents and commenting on data without reading the whole text. It can also support business decisions, such as routing a customer query or issue to the correct department. The problem also finds applications in areas such as spam detection and topic categorization.

At a high level, all such categorization tasks involve three components:

  • Text Data (Input): Raw text data such as movie reviews or emails.
  • Model (Machine Learning/Deep Learning): The model learns patterns from the text data.
  • Output (Model-assigned categories): Categorized text, such as positive or negative sentiment, or spam versus non-spam.

IMDB Dataset

The IMDB reviews dataset is a popular resource for text classification tasks. It contains movie reviews from IMDB, each labeled as either "positive" or "negative." We will use this dataset in this article to design and evaluate our text classification model.

Example: a review such as "An absolute masterpiece, I loved every minute of it" would be labeled positive, while "A complete waste of two hours" would be labeled negative.

Data Processing

Loading data: To begin with, we load the IMDB reviews dataset using TensorFlow Datasets (TFDS). This dataset provides a collection of movie reviews along with sentiment labels (positive or negative); a loading sketch is shown after the list below.

After loading the dataset, it is divided into training and testing sets:

  • Training and Testing Sets: The dataset is split into train_dataset for training the model and test_dataset for evaluating the model's performance.
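
A minimal sketch of this step, assuming the standard imdb_reviews dataset from TFDS (the exact arguments in the original notebook may differ):

```python
import tensorflow_datasets as tfds

# Load the IMDB reviews dataset; as_supervised=True yields (text, label) pairs.
dataset, info = tfds.load('imdb_reviews', with_info=True, as_supervised=True)

# The dataset ships with predefined train and test splits.
train_dataset, test_dataset = dataset['train'], dataset['test']
```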

Processing & associated optimization: to ensure efficient and effective training, we define a buffer size and a batch size (a sketch of the complete input pipeline follows the lists below):

  • Buffer Size: Set using BUFFER_SIZE = 10000. The buffer size defines the size of the buffer used for shuffling the dataset. A larger buffer improves the randomness of shuffling but requires more memory.
  • Batch Size: BATCH_SIZE = 64. The batch size determines how many examples are processed together in one batch during training.

Training dataset prep: the training data is prepared using the following steps:

  • Shuffle: the dataset is shuffled to randomize the order of examples.
  • Batch: it is then divided into batches of the specified size.
  • Prefetch: to improve performance, the next batch is loaded while the current batch is being processed.

Test dataset prep: the test dataset follows the same steps except shuffling, which is not required here because the order of examples does not affect evaluation metrics:

  • Batch: The test dataset is batched into chunks of the specified size.
  • Prefetch: Prefetching is also applied to enhance performance.
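
Continuing from the loading sketch above, the input pipeline could look like this; the buffer and batch sizes come from the article, while the use of AUTOTUNE prefetching is an assumption about the original notebook:

```python
import tensorflow as tf

BUFFER_SIZE = 10000  # shuffle buffer size
BATCH_SIZE = 64      # number of examples per batch

# Training pipeline: shuffle, batch, and prefetch the next batch in the background.
train_dataset = (train_dataset
                 .shuffle(BUFFER_SIZE)
                 .batch(BATCH_SIZE)
                 .prefetch(tf.data.AUTOTUNE))

# Test pipeline: no shuffling is needed for evaluation, only batching and prefetching.
test_dataset = (test_dataset
                .batch(BATCH_SIZE)
                .prefetch(tf.data.AUTOTUNE))
```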

Text Vectorization and Encoding

We need to convert the text data to numerical data so that it can be fed to the model. We start by defining the number of unique tokens (words) allowed in the text vectorization process, then pass the text through this vectorizer to convert it into arrays of integers.

Steps

  • Define the maximum vocabulary size (the number of unique tokens the vectorizer keeps).
  • Adapt the text vectorization layer on the training text so it builds its vocabulary.
  • Pass raw text through the adapted layer to obtain arrays of integers.

Example

The example below shows an input review and its text after a round trip. A round trip refers to passing the input through the text vectorizer and then converting the resulting integers back to text.

Code
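
Here is a minimal sketch of the vectorization and round trip, assuming a vocabulary size of 1000 and the train_dataset built above; the sample review is illustrative, not taken from the dataset:

```python
import numpy as np
import tensorflow as tf

VOCAB_SIZE = 1000  # assumed number of unique tokens kept by the vectorizer

# Build the vectorizer and learn its vocabulary from the training text only.
encoder = tf.keras.layers.TextVectorization(max_tokens=VOCAB_SIZE)
encoder.adapt(train_dataset.map(lambda text, label: text))

# Encode a sample review into an array of integer token ids.
sample_text = "The movie was cool. The animation and the graphics were out of this world."
encoded = encoder(tf.constant([sample_text]))[0].numpy()
print(encoded)

# Round trip: map the integer ids back to words using the learned vocabulary.
vocab = np.array(encoder.get_vocabulary())
print(" ".join(vocab[encoded]))  # out-of-vocabulary words come back as [UNK]
```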

Model

As the article title states, we are going to use a recurrent neural network to perform text classification. Let's first understand what an RNN is:

Recurrent neural networks (RNNs) are neural networks specifically designed to handle sequential information. Unlike traditional feedforward networks, RNNs have recurrent (feedback) connections, which allow them to retain a memory of previously seen inputs. This makes RNNs particularly effective for sequential problems such as language modeling, time series prediction, and speech recognition.

However, RNNs struggle with issues such as vanishing and exploding gradients, as well as difficulty in retaining long-term dependencies. Advanced architectures, such as long short-term memory (LSTM) networks and gated recurrent units (GRUs), have been developed to overcome these challenges and improve performance.

In our modeling, we will use a Bidirectional LSTM (BiLSTM), a variant of the RNN that handles the long-term dependency problem particularly well because it retains context from reading the input both forward and backward.

We use the model architecture below as our first model. The text data is passed through the vectorizer and then fed into an embedding layer. The embedding layer generates semantic representations, speeds up training, and helps with generalization. This is followed by a BiLSTM layer with 128 units and then a dense layer.

In the code below, we also use padding and masking together to handle reviews of different lengths.
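
A minimal sketch of this architecture, reusing the adapted encoder from the vectorization step; the 128-unit BiLSTM matches the article, while the embedding and dense layer sizes are assumptions:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    encoder,  # TextVectorization layer: raw strings -> padded integer sequences
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,        # assumed embedding dimension
        mask_zero=True),      # masking: padded positions are ignored downstream
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
    tf.keras.layers.Dense(64, activation='relu'),  # assumed dense layer size
    tf.keras.layers.Dense(1)  # single logit for positive vs. negative
])
```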

Training & Results
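
Below is a minimal sketch of the training step, assuming the standard Keras compile/fit workflow with the optimizer, loss, and metric described next; the learning rate is an assumption, while the 10 epochs match the results discussed below:

```python
import tensorflow as tf

# Compile with the Adam optimizer, binary cross-entropy on logits, and accuracy.
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),  # assumed learning rate
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=['accuracy'])

# Train for 10 epochs, using the test split for validation.
history = model.fit(
    train_dataset,
    epochs=10,
    validation_data=test_dataset)
```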

As shown in the sketch above, we train the model with the Adam optimizer and a BinaryCrossentropy loss function. Since it is a classification problem, we use "accuracy" as the metric.

We achieve about 88% validation accuracy after 10 epochs. This performance is quite good given that the reviews can be very long and the language can be confounding. The high accuracy is primarily due to the BiLSTM layer, which is able to learn much more context than a traditional RNN.

Model with two BiLSTM layers

Let us explore whether we can do better by adding one more BiLSTM layer. The updated architecture is shown below. Notice the second bidirectional layer; with this addition we have two BiLSTM layers with 128 units each.
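
A sketch of the two-layer variant; note that the first BiLSTM must return its full sequence (return_sequences=True) so the second BiLSTM receives one vector per time step. Layer sizes other than the two 128-unit BiLSTMs are assumptions:

```python
import tensorflow as tf

stacked_model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        mask_zero=True),
    # First BiLSTM returns the whole sequence so it can feed the second BiLSTM.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
```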

The results below show that model performance has not improved by adding another 128-unit BiLSTM layer. Validation accuracy hovers around 85%, which is lower than our initial model.

Future Work & Conclusion

In this article, we went through the text classification problem and how we can model it with RNNs using TensorFlow. We observed that the BiLSTM is indeed good at learning context, which results in a high validation accuracy for the single-layer model.

We believe the model can be improved further with more involved text processing (for example, a larger vocabulary size) and hyperparameter tuning (such as the number of units in the BiLSTM layers, the number of layers, the addition of dropout layers, etc.). You can read more about how to perform such tuning in my other article on predicting missing characters in a word using a BiLSTM (Link).

If you liked the explanation, follow me for more! Feel free to leave your comments if you have any queries or suggestions.

You can also check out my other articles on data science and computing on Medium. If you like my work and want to contribute to my journey, you can always buy me a coffee :)

References

  1. [Datasets and TensorFlow APIs]: https://www.tensorflow.org/resources/models-datasets
  2. [BiLSTM]: https://www.youtube.com/watch?v=_bt9OaavkT4
  3. [BiLSTM vs LSTM]: https://medium.com/@souro400.nath/why-is-bilstm-better-than-lstm-a7eb0090c1e4
  4. [Text Classification using BERT]: https://medium.com/@girish9851/sentiment-analysis-using-deep-learning-bert-adf975232da2
  5. [GitHub Link to Notebook]: https://github.com/girish9851/Text-Classification-Using-RNN/blob/main/text-classification-using-rnn-2.ipynb
