Scikit-Learn: Generate One-Hot Encoding Feature Using MultiLabelBinarizer

One-hot encoding is widely used in NLP. In this tutorial, we will show how to create one-hot encoded features using scikit-learn's MultiLabelBinarizer.

1. Import the library

from sklearn.preprocessing import MultiLabelBinarizer
import numpy as np

2. Prepare the text data

y = [('Texas', 'Florida'),
     ('California', 'Alabama'),
     ('Texas', 'Florida'),
     ('Delware', 'Florida'),
     ('Texas', 'Alabama')]

3. Create the one-hot encoding using MultiLabelBinarizer()

one_hot = MultiLabelBinarizer()

# One-hot encode data
one_hot.fit_transform(y)

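Run this code and you will get the one-hot encoding as follows. Each row corresponds to one sample and has a 1 in the column of every label that sample contains: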

array([[0, 0, 0, 1, 1],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 1, 1],
       [0, 0, 1, 1, 0],
       [1, 0, 0, 0, 1]])
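
To make the array above easier to read, here is a minimal sketch that prints each sample next to its encoded row and the labels that row switches on. It assumes the same data as step 2 and uses the fitted binarizer's classes_ attribute for the column order; the variable names encoded and active are only illustrative.

```python
from sklearn.preprocessing import MultiLabelBinarizer

y = [('Texas', 'Florida'),
     ('California', 'Alabama'),
     ('Texas', 'Florida'),
     ('Delware', 'Florida'),
     ('Texas', 'Alabama')]

one_hot = MultiLabelBinarizer()
encoded = one_hot.fit_transform(y)

# One column per distinct label; a 1 means the sample contains that label.
for labels, row in zip(y, encoded):
    # Collect the labels whose column is set to 1 in this row
    active = [cls for cls, flag in zip(one_hot.classes_, row) if flag == 1]
    print(labels, '->', row.tolist(), '->', active)
```

Since every sample here holds two labels, every row contains exactly two 1s.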

4. View the label for each one-hot column

print(one_hot.classes_)

Run this code and you will see:

['Alabama' 'California' 'Delware' 'Florida' 'Texas']
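
Because the column order is stored in classes_, the fitted binarizer can also go the other way. The sketch below, which assumes the same data as step 2, uses inverse_transform() to recover the labels from the encoded rows and transform() to encode new samples with the columns learned earlier. The new samples are made up for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

y = [('Texas', 'Florida'),
     ('California', 'Alabama'),
     ('Texas', 'Florida'),
     ('Delware', 'Florida'),
     ('Texas', 'Alabama')]

one_hot = MultiLabelBinarizer()
encoded = one_hot.fit_transform(y)

# Recover the labels of each row; they come back in classes_ order,
# not in the order they were originally written.
decoded = one_hot.inverse_transform(encoded)
print(decoded[0])

# Encode new samples using the columns learned during fit
# (these samples are hypothetical and only use labels seen in y).
new_encoded = one_hot.transform([('Texas',), ('Alabama', 'Florida')])
print(new_encoded)
```

Note that transform() only knows the labels seen during fitting, so new samples should be limited to those labels if every label must be represented in the output.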