[Deep Learning] Word Embedding

Word Embedding with Keras Embedding Layer

  • A dictionary-like structure that maps integer indices to vectors, with shape (number of indices, vector size)
  • Different training data produces different embeddings.
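
A minimal NumPy sketch of this lookup-table view, not the Keras implementation: the table has one row per index, and looking up an index returns its row. The sizes (7 indices, 2-dimensional vectors) and the random initial values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embed_dim = 7, 2                       # illustrative sizes (assumed)
table = rng.normal(size=(vocab_size, embed_dim))   # one row per integer index

word_id = 3
vector = table[word_id]                            # "dictionary" lookup: index -> vector

# The same lookup written as a one-hot vector times the table.
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0
same_vector = one_hot @ table

print(vector.shape)                        # (2,)
print(np.allclose(vector, same_vector))    # True
```

Training adjusts the rows of the table, which is why different training data leads to different embeddings.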

  • An embedding example using the IMDB movie review data
    • IMDB (Internet Movie Database): the world's most popular and authoritative source for movie, TV and celebrity content
import tensorflow as tf
import tensorflow.keras as keras 
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Embedding, Dropout
from tensorflow.keras.layers import Conv1D, MaxPooling1D, GlobalMaxPooling1D
import os, os.path
import zipfile
from tensorflow.keras.datasets import imdb
from tensorflow.keras import preprocessing
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
  • Use only the 5,000 most frequent words, and only the last 500 words of each review.
max_features = 5000
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
y_train[:1000].sum(), y_train[-1000:].sum()   # can assume equally distributed
(494, 498)
x_train.shape, x_test.shape, y_train.shape, y_test.shape
((25000,), (25000,), (25000,), (25000,))
[1, 2, 365, 1234, 5, 1156, 354, 11, 14, 2, 2, 7, 1016, 2, 2, 356, 44, 4, 1349, 500, 746, 5, 200, 4, 4132, 11, 2, 2, 1117, 1831, 2, 5, 4831, 26, 6, 2, 4183, 17, 369, 37, 215, 1345, 143, 2, 5, 1838, 8, 1974, 15, 36, 119, 257, 85, 52, 486, 9, 6, 2, 2, 63, 271, 6, 196, 96, 949, 4121, 4, 2, 7, 4, 2212, 2436, 819, 63, 47, 77, 2, 180, 6, 227, 11, 94, 2494, 2, 13, 423, 4, 168, 7, 4, 22, 5, 89, 665, 71, 270, 56, 5, 13, 197, 12, 161, 2, 99, 76, 23, 2, 7, 419, 665, 40, 91, 85, 108, 7, 4, 2084, 5, 4773, 81, 55, 52, 1901]

word2id = imdb.get_word_index()
id2word = {i: word for word, i in word2id.items()}
print('---review with words---')
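# Note: load_data() shifts word ids by 3 by default (index_from=3), so an exact
# decode would use id2word.get(i - 3, '?'); the output below uses the raw lookup.
print([id2word.get(i, ' ') for i in x_train[6]])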
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1646592/1641221 [==============================] - 0s 0us/step
---review with words---
['the', 'and', 'full', 'involving', 'to', 'impressive', 'boring', 'this', 'as', 'and', 'and', 'br', 'villain', 'and', 'and', 'need', 'has', 'of', 'costumes', 'b', 'message', 'to', 'may', 'of', 'props', 'this', 'and', 'and', 'concept', 'issue', 'and', 'to', "god's", 'he', 'is', 'and', 'unfolds', 'movie', 'women', 'like', "isn't", 'surely', "i'm", 'and', 'to', 'toward', 'in', "here's", 'for', 'from', 'did', 'having', 'because', 'very', 'quality', 'it', 'is', 'and', 'and', 'really', 'book', 'is', 'both', 'too', 'worked', 'carl', 'of', 'and', 'br', 'of', 'reviewer', 'closer', 'figure', 'really', 'there', 'will', 'and', 'things', 'is', 'far', 'this', 'make', 'mistakes', 'and', 'was', "couldn't", 'of', 'few', 'br', 'of', 'you', 'to', "don't", 'female', 'than', 'place', 'she', 'to', 'was', 'between', 'that', 'nothing', 'and', 'movies', 'get', 'are', 'and', 'br', 'yes', 'female', 'just', 'its', 'because', 'many', 'br', 'of', 'overly', 'to', 'descent', 'people', 'time', 'very', 'bland']
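
One caveat on the decoded review above: with its default arguments, `imdb.load_data()` reserves 0 for padding, 1 for the start marker and 2 for out-of-vocabulary words, and shifts every word index by 3 (`index_from=3`). Looking up an id directly in `get_word_index()` therefore returns a shifted word. A sketch of an offset-aware decoder follows; the tiny `toy_word_index` and the `decode` helper are hypothetical stand-ins so the example runs without downloading the data.

```python
# Hypothetical miniature of imdb.get_word_index(): word -> rank (1 = most frequent).
toy_word_index = {"the": 1, "and": 2, "a": 3, "movie": 4, "great": 5}

INDEX_FROM = 3                                      # default shift applied by imdb.load_data
SPECIAL = {0: "<pad>", 1: "<start>", 2: "<oov>"}    # reserved ids under the defaults

id2word = {rank + INDEX_FROM: word for word, rank in toy_word_index.items()}
id2word.update(SPECIAL)

def decode(ids):
    """Map encoded ids back to tokens, honouring the reserved ids and the shift."""
    return [id2word.get(i, "<oov>") for i in ids]

encoded = [1, 6, 8, 7, 2]
print(decode(encoded))
# ['<start>', 'a', 'great', 'movie', '<oov>']
```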

# Check how many words each review contains
[len(x_train[i]) for i in range(10)]
[218, 189, 141, 550, 147, 43, 123, 562, 233, 130]
print(max([len(x_train[i]) for i in range(25000)]), min([len(x_train[i]) for i in range(25000)]))
print(max([len(x_test[i]) for i in range(25000)]), min([len(x_test[i]) for i in range(25000)]))
2494 11
2315 7
x_train[0:2]   # words tokenized and expressed by (word) numbers
array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 2, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 2, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 2, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 2, 19, 178, 32]),
       list([1, 194, 1153, 194, 2, 78, 228, 5, 6, 1463, 4369, 2, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 2, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 2, 5, 163, 11, 3215, 2, 4, 1153, 9, 194, 775, 7, 2, 2, 349, 2637, 148, 605, 2, 2, 15, 123, 125, 68, 2, 2, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 2, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 2, 5, 2, 656, 245, 2350, 5, 4, 2, 131, 152, 491, 18, 2, 32, 2, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95])],

# Use only the last 500 words; reviews shorter than 500 words are padded to the same length.
maxlen = 500
x_train_p=preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_test_p=preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)
print(x_train_p.shape, x_test_p.shape)
(25000, 500) (25000, 500)
y_train.shape, y_test.shape
((25000,), (25000,))
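
The padding step can be pictured with a small pure-Python sketch. It mimics the default behaviour of `pad_sequences` (`padding='pre'`, `truncating='pre'`, pad value 0): long sequences keep their last `maxlen` items and short ones are left-padded with zeros. The helper name `pad_pre` is hypothetical, and this is an illustration rather than the Keras implementation.

```python
def pad_pre(sequences, maxlen, value=0):
    """Keep the last `maxlen` items; left-pad shorter sequences with `value`."""
    padded = []
    for seq in sequences:
        tail = list(seq)[-maxlen:]                        # drop items from the front
        padded.append([value] * (maxlen - len(tail)) + tail)
    return padded

toy = [[1, 2, 3, 4, 5, 6, 7], [8, 9]]
print(pad_pre(toy, maxlen=5))
# [[3, 4, 5, 6, 7], [0, 0, 0, 8, 9]]
```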
  • Embedding() takes a 2D integer tensor of shape (number of samples, input_length) as input. Each sample is an integer-encoded sequence. Embedding() performs the word embedding and returns a 3D tensor of shape (number of samples, input_length, embedding dimensionality).

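model = Sequential()
model.add(Embedding(5000, 32, input_length=maxlen))  # embed each input word as a 32-dimensional vector
model.add(Flatten())  # (500, 32) -> 16000, as in the summary below
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])  # settings assumed to match the later models
model.summary()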

Model: "sequential"
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 500, 32)           160000    
flatten (Flatten)            (None, 16000)             0         
dense (Dense)                (None, 1)                 16001     
Total params: 176,001
Trainable params: 176,001
Non-trainable params: 0
model.input_shape, model.output_shape
((None, 500), (None, 1))
x_train_p.shape, y_train.shape
((25000, 500), (25000,))
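
The parameter counts in the summary above can be checked by hand. The arithmetic below assumes the layer sizes shown there: a 5000 x 32 embedding table, a Flatten over 500 positions, and a single-unit Dense layer with a bias.

```python
vocab_size, embed_dim, seq_len = 5000, 32, 500

embedding_params = vocab_size * embed_dim     # one 32-vector per word index
flatten_units = seq_len * embed_dim           # Flatten has no weights, it only reshapes
dense_params = flatten_units * 1 + 1          # weights plus one bias

print(embedding_params, flatten_units, dense_params)   # 160000 16000 16001
print(embedding_params + dense_params)                 # 176001
```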

history = model.fit(x_train_p, y_train,
                    epochs=10, batch_size=500,
                    validation_split=0.2)
Epoch 1/10
40/40 [==============================] - 2s 28ms/step - loss: 0.6844 - acc: 0.5545 - val_loss: 0.6608 - val_acc: 0.6428
Epoch 10/10
40/40 [==============================] - 1s 22ms/step - loss: 0.1239 - acc: 0.9635 - val_loss: 0.2851 - val_acc: 0.8860
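
The `40/40` in the training log follows from the data split, assuming 20% of the 25,000 training reviews are held out for validation (the `validation_split=0.2` used in the later `fit` calls):

```python
import math

n_samples, validation_split, batch_size = 25000, 0.2, 500

n_val = round(n_samples * validation_split)          # reviews held out for validation
n_train = n_samples - n_val                          # reviews actually used for training
steps_per_epoch = math.ceil(n_train / batch_size)    # batches per epoch

print(n_train, n_val, steps_per_epoch)   # 20000 5000 40
```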
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, '--')
plt.plot(epochs, val_acc)
plt.title('Training(--) and validation accuracy')

plt.figure()  # new figure so the loss curves are not drawn over the accuracy plot
plt.plot(epochs, loss,  '--')
plt.plot(epochs, val_loss)
plt.title('Training(--) and validation loss')


# test score
scores = model.evaluate(x_test_p, y_test, verbose=0)
print('Test accuracy:', scores[1])
Test accuracy: 0.8815600275993347
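  • The result above considers only the last 500 words of each review.
  • Each word is treated independently; the structure of the sentence is not taken into account.
  • To take sentence structure into account, add convolutional or recurrent layers on top of the embedding layer.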
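
To illustrate how a convolution layer brings word order into play, here is a NumPy sketch of a single "valid" 1D convolution filter sliding over an embedded sequence. Each output position mixes 5 consecutive word vectors, so the result depends on which words sit next to each other. The toy sizes and random values are assumptions, and this is not how Keras implements `Conv1D` internally.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, embed_dim, kernel_size = 12, 4, 5            # toy sizes (assumed)
embedded = rng.normal(size=(seq_len, embed_dim))      # one row per word
kernel = rng.normal(size=(kernel_size, embed_dim))    # a single filter
bias = 0.1

# Slide the filter over every window of 5 consecutive words, then apply ReLU.
out_len = seq_len - kernel_size + 1
features = np.array([
    np.sum(embedded[t:t + kernel_size] * kernel) + bias
    for t in range(out_len)
])
features = np.maximum(features, 0.0)

print(features.shape)    # (8,)
```

A real `Conv1D(128, 5)` layer applies 128 such filters in parallel, giving one feature per filter at each position.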

CNN (1D)

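model = Sequential()
model.add(Embedding(input_dim=5000, output_dim=32, input_length=maxlen))
model.add(Conv1D(128, 5, activation="relu"))
model.add(MaxPooling1D(5))  # 496 -> 99, as in the summary below
model.add(Conv1D(128, 5, activation="relu"))
model.add(MaxPooling1D(5))  # 95 -> 19
model.add(Conv1D(128, 5, activation="relu"))
model.add(GlobalMaxPooling1D())
model.add(Dense(128, activation="relu"))
model.add(Dropout(0.5))  # rate not shown in the source; 0.5 is an assumption
model.add(Dense(1, activation="sigmoid"))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])  # assumed, as for the later models
model.summary()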
Model: "sequential_1"
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 500, 32)           160000    
conv1d (Conv1D)              (None, 496, 128)          20608     
max_pooling1d (MaxPooling1D) (None, 99, 128)           0         
conv1d_1 (Conv1D)            (None, 95, 128)           82048     
max_pooling1d_1 (MaxPooling1 (None, 19, 128)           0         
conv1d_2 (Conv1D)            (None, 15, 128)           82048     
global_max_pooling1d (Global (None, 128)               0         
dense_1 (Dense)              (None, 128)               16512     
dropout (Dropout)            (None, 128)               0         
dense_2 (Dense)              (None, 1)                 129       
Total params: 361,345
Trainable params: 361,345
Non-trainable params: 0
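
The output lengths and parameter counts in the summary above can be reproduced with a little arithmetic. The sketch assumes "valid" convolutions with kernel size 5 and a pooling size of 5, which is what the listed shapes (496 to 99, 95 to 19) imply; the helper names are hypothetical.

```python
def conv_len(length, kernel):
    """Output length of a 'valid' convolution with stride 1."""
    return length - kernel + 1

def pool_len(length, pool):
    """Output length of non-overlapping max pooling."""
    return length // pool

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel * in_channels + 1) * filters

length = 500
lengths = []
for stage in ("conv", "pool", "conv", "pool", "conv"):
    length = conv_len(length, 5) if stage == "conv" else pool_len(length, 5)
    lengths.append(length)

print(lengths)                                            # [496, 99, 95, 19, 15]
print(conv_params(5, 32, 128), conv_params(5, 128, 128))  # 20608 82048
print(128 * 128 + 128, 128 + 1)                           # 16512 129
```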

history = model.fit(x_train_p, y_train,
                    epochs=10, batch_size=500,
                    validation_split=0.2)
Epoch 1/10
40/40 [==============================] - 5s 57ms/step - loss: 0.6925 - acc: 0.5101 - val_loss: 0.6844 - val_acc: 0.6734
Epoch 10/10
40/40 [==============================] - 2s 54ms/step - loss: 0.0622 - acc: 0.9817 - val_loss: 0.5348 - val_acc: 0.8644

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, '--')
plt.plot(epochs, val_acc)
plt.title('Training(--) and validation accuracy')

plt.figure()  # new figure so the loss curves are not drawn over the accuracy plot
plt.plot(epochs, loss,  '--')
plt.plot(epochs, val_loss)
plt.title('Training(--) and validation loss')


# test score
scores = model.evaluate(x_test_p, y_test, verbose=0)
print('Test accuracy:', scores[1])
Test accuracy: 0.8475599884986877
# prediction
       [0.9896063 ]], dtype=float32)


(25000, 500)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, LSTM, GRU

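model = Sequential()
model.add(Embedding(input_dim=5000, output_dim=32, input_length=maxlen))
model.add(GRU(32))  # 32 units, as in the summary below
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))  # rate not shown in the source; 0.5 is an assumption
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
model.summary()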
Model: "sequential_2"
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 500, 32)           160000    
gru (GRU)                    (None, 32)                6336      
dense_3 (Dense)              (None, 32)                1056      
dropout_1 (Dropout)          (None, 32)                0         
dense_4 (Dense)              (None, 1)                 33        
Total params: 167,425
Trainable params: 167,425
Non-trainable params: 0
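
The 6,336 parameters reported for the GRU layer match the usual count for a 32-unit GRU on 32-dimensional inputs, assuming the TensorFlow 2 default `reset_after=True`, which keeps two bias vectors per gate. The helper `gru_params` is hypothetical.

```python
def gru_params(input_dim, units, reset_after=True):
    """Three gate blocks, each with input weights, recurrent weights and bias(es)."""
    biases = 2 * units if reset_after else units
    return 3 * (input_dim * units + units * units + biases)

print(gru_params(32, 32))      # 6336
print(32 * 32 + 32, 32 + 1)    # the two Dense layers: 1056 33
```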

%time history = model.fit(x_train_p, y_train, epochs=10, batch_size=500, validation_split=0.2)
Epoch 1/10
40/40 [==============================] - 4s 50ms/step - loss: 0.6888 - acc: 0.5610 - val_loss: 0.6766 - val_acc: 0.6250
Epoch 10/10
40/40 [==============================] - 2s 43ms/step - loss: 0.1485 - acc: 0.9554 - val_loss: 0.3976 - val_acc: 0.8702
Wall time: 19.4 s

# test score
scores = model.evaluate(x_test_p, y_test, verbose=0)
print('Test accuracy:', scores[1])
Test accuracy: 0.8632000088691711
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, '--')
plt.plot(epochs, val_acc)
plt.title('Training(--) and validation accuracy')

plt.figure()  # new figure so the loss curves are not drawn over the accuracy plot
plt.plot(epochs, loss,  '--')
plt.plot(epochs, val_loss)
plt.title('Training(--) and validation loss')


# prediction
       [0.9718255 ],
       [0.99939275]], dtype=float32)
  • Which review does y_test[2] belong to?
word2id = imdb.get_word_index()
id2word = {i: word for word, i in word2id.items()}
print([id2word.get(i, ' ') for i in x_test[2]])
['the', 'plot', 'near', 'ears', 'recent', 'and', 'and', 'of', 'him', 'flicks', 'frank', 'br', 'by', 'excellent', 'and', 'br', 'of', 'past', 'and', 'near', 'really', 'all', 'and', 'family', 'four', 'and', 'to', 'movie', 'that', 'obvious', 'family', 'brave', 'movie', 'is', 'got', 'say', 'and', 'with', 'up', 'comment', 'this', 'and', 'been', 'of', 'entertaining', 'not', 'be', 'and', 'james', 'in', 'you', 'seen', 'and', 'and', 'portrayed', 'dirty', 'in', 'so', 'washington', 'and', 'this', 'you', 'minutes', 'no', 'all', 'station', 'all', 'after', 'and', 'promising', 'who', 'and', 'and', 'and', 'to', 'and', 'any', 'by', 'speed', 'they', 'is', 'my', 'as', 'screams', 'dirty', 'in', 'of', 'full', 'br', 'pacino', 'dignity', 'need', 'men', 'of', 'and', 'popular', 'really', 'all', 'way', 'this', 'and', 'this', 'and', 'they', 'is', 'my', 'no', 'standard', 'certainly', 'near', 'br', 'an', 'beach', 'with', 'this', 'make', 'and', 'i', 'i', 'of', 'fails', 'and', 'br', 'of', 'finished', 'wear', 'psycho', 'and', 'in', 'learn', 'in', 'twice', 'know', 'by', 'br', 'be', 'how', 'rings', 'and', 'with', 'is', 'seemed', 'fails', 'visually', 'and', 'extremely', 'movie', 'and', "it's", 'of', 'and', 'like', 'children', 'is', 'easily', 'is', 'and', 'br', 'simply', 'must', 'well', 'at', 'although', 'this', 'family', 'an', 'br', 'many', 'not', 'scene', 'that', 'it', 'time', 'seemed', 'de', 'ignored', 'up', 'they', 'boat', 'morning', 'like', 'well', 'force', 'of', 'and', 'sent', 'been', 'history', 'like', 'story', 'its', 'disappointing', 'same', 'of', 'club', 'and', 'watching', 'husband', 'reviewer', 'to', 'although', 'that', 'around', 'and', 'except', 'to', 'de', 'and', 'br', 'of', 'you', 'available', 'but', 'hours', 'animals', 'showing', 'br', 'of', 'and', 'than', 'dead', 'white', 'splatter', 'waiting', 'film', 'and', 'to', 'and', 'this', 'documentary', 'in', '3', 'and', 'of', 'accents', 'and', 'br', 'of', 'ann', 'i', 'i', 'comes', '9', 'it', 'place', 'this', 'is', 'and', 'of', 'and', 'and', 
'know', 'of', 'and', 'he', 'bonus', 'film', 'were', 'central', 'to', 'one', 'oh', 'is', 'excellent', 'and', 'in', 'can', 'when', 'from', 'well', 'people', 'in', "characters'", 'chief', 'from', 'leaving', 'in', 'and', 'and', 'but', 'is', 'easily', 'of', 'and', 'he', 'and', 'speak', 'this', 'as', 'today', 'paul', 'that', 'against', 'one', 'will', 'actual', 'in', 'could', 'her', 'plot', 'and', 'and', 'few', 'grade', 'and', 'go', 'and', 'but', 'be', 'lot', 'it', 'oliver', 'movie', 'is', 'and', 'picture', 'and', 'feel', 'this', 'of', 'and', 'like', 'different', 'just', 'clichéd', 'girl', 'at', 'finds', 'is', 'and', 'no', 'and', 'glory', 'any', 'is', "children's", 'just', 'moment', 'like', 'and', 'any', 'of', 'and', 'leaving', 'for', 'as', 'it', 'even', 'cliche', 'to', 'purchased', 'is', 'money', 'easily', 'and', 'and', 'glory', 'any', 'is', 'and', 'i', 'i', 'and', 'film', 'as', 'and', 'set', 'actually', 'easily', 'like', 'and', 'sequel', 'any', 'of', 'and', 'ryan', 'made', 'film', 'is', 'and', 'br', 'and', 'constant', 'and', 'of', '90s', 'letting', 'deep', 'in', 'act', 'made', 'of', 'road', 'in', 'of', 'and', 'movie', 'and', 'rural', 'vhs', 'of', 'share', 'in', 'reaching', 'fact', 'of', 'and', 'and', 'and', 'of', '90s', 'to', 'them', 'book', 'are', 'is', 'and', 'and', 'and', 'and', 'they', 'funniest', 'is', 'white', 'courage', 'and', 'vegas', 'wooden', 'br', 'of', 'gender', 'and', 'unfortunately', 'of', '1968', 'no', 'of', 'years', 'and', 'and', 'true', 'up', 'and', 'and', 'but', '3', 'all', 'ordinary', 'be', 'and', 'to', 'and', 'were', 'deserve', 'film', 'and', 'and', 'of', 'creative', 'br', 'comes', 'their', 'kung', 'who', 'is', 'and', 'and', 'out', 'new', 'all', 'it', 'incomprehensible', 'it', 'episode', 'much', "that's", 'including', 'i', 'i', 'cartoon', 'of', 'my', 'certain', 'no', 'as', 'and', 'over', 'you', 'with', 'way', 'to', 'cartoon', 'of', 'enough', 'for', 'that', 'with', 'way', 'who', 'is', 'finished', 'and', 'they', 'of', 'and', 'br', 'for', 'and', 'and', 
'stunts', 'black', 'that', 'story', 'at', 'actual', 'in', 'can', 'as', 'movie', 'is', 'and', 'has', 'though', 'songs', 'and', 'action', "it's", 'action', 'his', 'one', 'me', 'and', 'and', 'this', 'second', 'no', 'all', 'way', 'and', 'not', 'lee', 'and', 'be', 'moves', 'br', 'figure', 'of', 'you', 'boss', 'movie', 'is', 'and', '9', 'br', 'propaganda', 'and', 'and', 'after', 'at', 'of', 'smoke', 'splendid', 'snow', 'saturday', "it's", 'results', 'this', 'of', 'load', "it's", 'think', 'class', 'br', 'think', 'cop', 'for', 'games', 'make', 'southern', 'things', 'to', 'it', 'and', 'who', 'and', 'if', 'is', 'boyfriend', 'you', 'which', 'is', 'tony', 'by', 'this', 'make', 'and', 'too', 'not', 'make', 'above', 'it', 'even', 'background']

Combine CNN and RNN together

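model = Sequential()
model.add(Embedding(input_dim=5000, output_dim=32, input_length=maxlen))
model.add(Dropout(0.2))  # rate not shown in the source; 0.2 is an assumption
model.add(Conv1D(64, 5, padding='valid', activation='relu',strides=1))
model.add(MaxPooling1D(pool_size=4))  # 496 -> 124, as in the summary below
model.add(LSTM(55))  # 55 units, as in the summary below
model.add(Dense(1, activation='sigmoid'))
model.summary()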
Model: "sequential_3"
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 500, 32)           160000    
dropout_2 (Dropout)          (None, 500, 32)           0         
conv1d_3 (Conv1D)            (None, 496, 64)           10304     
max_pooling1d_2 (MaxPooling1 (None, 124, 64)           0         
lstm (LSTM)                  (None, 55)                26400     
dense_5 (Dense)              (None, 1)                 56        
Total params: 196,760
Trainable params: 196,760
Non-trainable params: 0
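
As a check on the summary above, the shapes and parameter counts of the combined model can be recomputed. The sketch assumes a pooling size of 4 (implied by 496 to 124) and a 55-unit LSTM reading 64 channels; the helper `lstm_params` is hypothetical.

```python
def lstm_params(input_dim, units):
    """Four gate blocks, each with input weights, recurrent weights and a bias."""
    return 4 * (input_dim * units + units * units + units)

conv_out = 500 - 5 + 1              # 'valid' Conv1D with kernel size 5
pooled = conv_out // 4              # pooling size 4 (assumed from the shapes)
conv_weights = (5 * 32 + 1) * 64    # 64 filters over 32-dimensional embeddings

print(conv_out, pooled)             # 496 124
print(conv_weights)                 # 10304
print(lstm_params(64, 55))          # 26400
print(55 * 1 + 1)                   # 56
```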

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
%time history = model.fit(x_train_p, y_train, epochs=10, batch_size=500, validation_split=0.2)
Epoch 1/10
40/40 [==============================] - 3s 54ms/step - loss: 0.6757 - acc: 0.5888 - val_loss: 0.6171 - val_acc: 0.6452
Epoch 10/10
40/40 [==============================] - 2s 46ms/step - loss: 0.1449 - acc: 0.9482 - val_loss: 0.3144 - val_acc: 0.8800
Wall time: 19.7 s

# test score
scores = model.evaluate(x_test_p, y_test, verbose=0)
print('Test accuracy:', scores[1])
Test accuracy: 0.8714799880981445
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, '--')
plt.plot(epochs, val_acc)
plt.title('Training(--) and validation accuracy')

plt.figure()  # new figure so the loss curves are not drawn over the accuracy plot
plt.plot(epochs, loss,  '--')
plt.plot(epochs, val_loss)
plt.title('Training(--) and validation loss')


Exercise

  • By default, if a GPU is available, the embedding matrix will be placed on the GPU. This achieves the best performance.
  • If the embedding matrix is too big to fit on the GPU, place it on the CPU instead:
    • with tf.device('cpu:0'): embedding_layer = Embedding(…)
    • embedding_layer.build()
import tensorflow as tf
# Sentence tokenization and word tokenization
text=[['Hope', 'to', 'see', 'you', 'soon'],
      ['Nice', 'to', 'see', 'you', 'again']]

# Integer encoding of each word
text=[[0, 1, 2, 3, 4],[5, 1, 2, 3, 6]]

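# The data above is the input to the embedding layer below (no training, just to inspect the shapes)
embedding_layer = Embedding(7, 2, input_length=5)
result = embedding_layer(tf.constant([0, 1, 2, 3, 4, 5, 6]))
print(result.numpy())  # one 2-dimensional vector per word index

# 7 is the number of words, i.e. the size of the vocabulary.
# 2 is the size of each embedded vector.
# 5 is the length of each input sequence, i.e. input_length.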

[[ 0.01560372 -0.04292933]
 [ 0.0303275  -0.00451677]
 [-0.00240674 -0.01646507]
 [ 0.01426259 -0.02349484]
 [ 0.00645077 -0.04100402]
 [-0.01287322 -0.01720787]
 [ 0.02342038 -0.00832408]]
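
The integer encoding used in the exercise above can be reproduced by numbering each word in order of first appearance. This is one simple scheme chosen for illustration (the `encode` helper is hypothetical); Keras's `Tokenizer`, for instance, numbers words by frequency and starts at 1.

```python
text = [['Hope', 'to', 'see', 'you', 'soon'],
        ['Nice', 'to', 'see', 'you', 'again']]

def encode(sentences):
    """Assign ids 0, 1, 2, ... to words in order of first appearance."""
    vocab = {}
    encoded = [[vocab.setdefault(word, len(vocab)) for word in sentence]
               for sentence in sentences]
    return encoded, vocab

encoded, vocab = encode(text)
print(encoded)       # [[0, 1, 2, 3, 4], [5, 1, 2, 3, 6]]
print(len(vocab))    # 7
```

The vocabulary size of 7 is the first argument of `Embedding(7, 2, ...)`.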

# When input_length is not specified (variable-length sentences, None)
model = Sequential()
model.add(Embedding(7, 2))
model.add(Flatten())
model.summary()
Model: "sequential_7"
Layer (type)                 Output Shape              Param #   
embedding_8 (Embedding)      (None, None, 2)           14        
flatten_3 (Flatten)          (None, None)              0         
Total params: 14
Trainable params: 14
Non-trainable params: 0
# When input_length is specified (fixed-length sentences, input_length)
model = Sequential()
model.add(Embedding(7, 2, input_length=5)) # need input_length to be connected to Flatten then Dense layers
model.add(Flatten())
model.summary()
Model: "sequential_8"
Layer (type)                 Output Shape              Param #   
embedding_9 (Embedding)      (None, 5, 2)              14        
flatten_4 (Flatten)          (None, 10)                0         
Total params: 14
Trainable params: 14
Non-trainable params: 0

