Answer: Text embeddings!
```mermaid
flowchart LR
    A(A cat does cat things) --> B{" "}
    B --> C(A)
    B --> D(cat)
    B --> E(does)
    B --> F(cat)
    B --> G(things)
    D --> H(cat)
    E --> I(do)
    F --> J(cat)
    G --> K(thing)
    H --> L(cat: 2)
    J --> L
    I --> M(do: 1)
    K --> N(thing: 1)
    L --> O(2)
    M --> P(1)
    N --> Q(1)
```
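As a minimal sketch of the counting pipeline in the diagram (the lemma table and stop-word list here are toy assumptions standing in for a real lemmatizer):

```python
from collections import Counter

# Toy lemma table and stop-word list; a real pipeline would use a proper
# lemmatizer and stop-word set (these values are illustrative assumptions).
LEMMAS = {"does": "do", "things": "thing"}
STOPWORDS = {"a"}

def bag_of_words(text: str) -> Counter:
    """Tokenize on whitespace, drop stop words, lemmatize, and count."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return Counter(LEMMAS.get(t, t) for t in tokens)

print(bag_of_words("A cat does cat things"))
# Counter({'cat': 2, 'do': 1, 'thing': 1})
```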
```mermaid
flowchart LR
    A(Input Text) --> B(Tokenization)
    B --> C(Token processing)
    C --> D(Embedding Layer)
    D --> E(Hidden Layers)
    E --> F(Output Layer)
```
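A minimal sketch of that pipeline in PyTorch; the layer sizes, class name, and single hidden layer are illustrative assumptions, not any specific model:

```python
import torch
import torch.nn as nn

class TinyTextModel(nn.Module):
    """Toy model mirroring the pipeline: token ids -> embeddings -> hidden -> output."""
    def __init__(self, vocab_size: int = 1000, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # Embedding Layer
        self.hidden = nn.Sequential(                          # Hidden Layers
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
        )
        self.output = nn.Linear(hidden_dim, vocab_size)       # Output Layer

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.output(self.hidden(self.embedding(token_ids)))

# Pretend tokenization has already mapped the input text to integer ids.
token_ids = torch.tensor([[3, 17, 42, 17, 99]])  # shape: (batch, seq_len)
logits = TinyTextModel()(token_ids)              # shape: (1, 5, vocab_size)
print(logits.shape)
```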
Idea: Train a model to predict the probability distribution of the next word (or token) in a sequence, given the preceding context!
```mermaid
flowchart LR
    A(The) --> B(bird)
    B --> C(flew)
    C --> D(over)
    D --> E(the)
    E --> F{?}
    F --> G("p(rooftops)=0.31")
    F --> H("p(trees)=0.14")
    F --> J("p(guitar)=0.001")
```
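One minimal way to make the idea concrete is a bigram count model, where the probability of the next word is the normalized count of what followed the context word in training text (the tiny corpus below is a made-up assumption; real models use far more context and data):

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; real language models train on vastly more text.
corpus = "the bird flew over the trees . the bird flew over the rooftops".split()

# Count which word follows each word (a bigram model: context = 1 word).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_distribution(context: str) -> dict[str, float]:
    """Normalize follower counts into a probability distribution."""
    counts = follows[context]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_distribution("the"))
# {'bird': 0.5, 'trees': 0.25, 'rooftops': 0.25}
```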
BUT: people did exactly this, and it works!