Crucial for language understanding, information retrieval, and machine translation
The sun sets behind the mountains, casting a golden glow across the sky.
import spacy

# Load the English language model
nlp = spacy.load("en_core_web_sm")

# Example text
text = "The sun sets behind the mountains, casting a golden glow across the sky."

# Process the text with spaCy
doc = nlp(text)

# Find the maximum length of token text and POS tag for aligned output
max_token_length = max(len(token.text) for token in doc)
max_pos_length = max(len(token.pos_) for token in doc)

# Print each token along with its part-of-speech tag
for token in doc:
    print(f"Token: {token.text.ljust(max_token_length)} | POS Tag: {token.pos_.ljust(max_pos_length)}")
Token: The | POS Tag: DET
Token: sun | POS Tag: NOUN
Token: sets | POS Tag: VERB
Token: behind | POS Tag: ADP
Token: the | POS Tag: DET
Token: mountains | POS Tag: NOUN
Token: , | POS Tag: PUNCT
Token: casting | POS Tag: VERB
Token: a | POS Tag: DET
Token: golden | POS Tag: ADJ
Token: glow | POS Tag: NOUN
Token: across | POS Tag: ADP
Token: the | POS Tag: DET
Token: sky | POS Tag: NOUN
Token: . | POS Tag: PUNCT
Named-Entity Recognition (NER)
Identifying and classifying named entities in text
Essential for information retrieval, document summarization, and question-answering systems
Apple is considering buying a U.K. based startup called LanguageHero located in London for $1 billion.
import spacy

# Load the English language model
nlp = spacy.load("en_core_web_sm")

# Example text
text = "Apple is considering buying a U.K. based startup called LanguageHero located in London for $1 billion."

# Process the text with spaCy
doc = nlp(text)

# Print each named entity along with its label
for ent in doc.ents:
    print(f"Entity: {ent.text.ljust(20)} | Label: {ent.label_}")
Sentiment Analysis
Analyzing text to determine sentiment (e.g., positive, negative, neutral)
Used for gauging customer satisfaction, monitoring social media sentiment, etc.
I love TextBlob! It’s an amazing library for natural language processing.
# python -m textblob.download_corpora
from textblob import TextBlob

# Example text
text = "I love TextBlob! It's an amazing library for natural language processing."

# Perform sentiment analysis with TextBlob
blob = TextBlob(text)
sentiment_score = blob.sentiment.polarity

# Determine sentiment label based on the sentiment score
if sentiment_score > 0:
    sentiment_label = "Positive"
elif sentiment_score < 0:
    sentiment_label = "Negative"
else:
    sentiment_label = "Neutral"

# Print sentiment analysis results
print(f"Text: {text}")
print(f"Sentiment Score: {sentiment_score:.2f}")
print(f"Sentiment Label: {sentiment_label}")
Text: I love TextBlob! It's an amazing library for natural language processing.
Sentiment Score: 0.44
Sentiment Label: Positive
Text Classification
Categorizing text documents into predefined classes
Widely used in email spam detection, sentiment analysis, and content categorization
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder

# Example labeled dataset
texts = [
    "I love this product!",
    "This product is terrible.",
    "Great service, highly recommended.",
    "I had a bad experience with this company.",
]
labels = [
    "Positive",
    "Negative",
    "Positive",
    "Negative",
]

# Create a TF-IDF vectorizer
vectorizer = TfidfVectorizer()

# Encode labels as integers
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(labels)

# Create a pipeline with the TF-IDF vectorizer and an SVM classifier
classifier = make_pipeline(vectorizer, SVC(kernel='linear'))

# Train the classifier
classifier.fit(texts, encoded_labels)

# Example test text
test_text = "I love what this product can do."

# Predict the label for the test text
predicted_label = classifier.predict([test_text])[0]

# Decode the predicted label back to the original string
predicted_label_text = label_encoder.inverse_transform([predicted_label])[0]

# Print the predicted label
print(f"Text: {test_text}")
print(f"Predicted Label: {predicted_label_text}")
Text: I love what this product can do.
Predicted Label: Positive
Information Extraction
Extracting structured information from unstructured text data
Crucial for knowledge base construction, data integration, and business intelligence
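As a minimal sketch of what "structured information from unstructured text" can mean in the simplest case, the regex-based extractor below pulls a few typed fields out of a sentence. The example text, field names, and patterns are illustrative assumptions, not a production pipeline; real systems typically combine NER, relation extraction, and normalization.

```python
import re

# Unstructured text containing structured facts (made-up example)
text = "Contact Jane Doe at jane.doe@example.com before 2024-05-01 regarding invoice #4521."

# Hypothetical patterns for a few common field types
patterns = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "date": r"\d{4}-\d{2}-\d{2}",
    "invoice_id": r"#(\d+)",
}

# Extract the first match for each field into a structured record
record = {}
for field, pattern in patterns.items():
    match = re.search(pattern, text)
    if match:
        # Use the capture group when the pattern defines one
        record[field] = match.group(1) if match.groups() else match.group(0)

print(record)
# -> {'email': 'jane.doe@example.com', 'date': '2024-05-01', 'invoice_id': '4521'}
```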
Question-Answering
Generating accurate answers to user queries in natural language
Essential for information retrieval, virtual assistants, and educational applications
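To make the extractive flavor of question-answering concrete, here is a deliberately tiny retriever that returns the passage sentence sharing the most content words with the question. The passage, stopword list, and scoring rule are toy assumptions for illustration; modern systems use pretrained neural readers instead of word overlap.

```python
import re

def answer_question(question: str, passage: str) -> str:
    """Return the passage sentence sharing the most content words with the question."""
    stopwords = {"the", "a", "an", "is", "was", "of", "in",
                 "what", "who", "when", "where", "did", "does"}
    q_words = set(re.findall(r"\w+", question.lower())) - stopwords
    # Split the passage into sentences at terminal punctuation
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    # Score each sentence by how many question words it contains
    return max(sentences, key=lambda s: len(q_words & set(re.findall(r"\w+", s.lower()))))

passage = (
    "The Eiffel Tower was completed in 1889. "
    "The tower was designed by Gustave Eiffel's engineering company. "
    "The tower is located on the Champ de Mars in Paris."
)
print(answer_question("Who designed the Eiffel Tower?", passage))
# -> The tower was designed by Gustave Eiffel's engineering company.
```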
Machine Translation
Automatically translating text from one language to another
Facilitates communication across language barriers
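A deliberately naive word-for-word sketch shows why machine translation is harder than dictionary lookup. The English-to-Spanish glossary below is a made-up toy, and the output it produces ("el negro gato duerme") has the wrong Spanish word order (it should be "el gato negro duerme") -- exactly the kind of structural difference that rule-based, statistical, and neural systems exist to model.

```python
# Toy English-to-Spanish glossary, assumed for illustration only
glossary = {
    "the": "el",
    "cat": "gato",
    "black": "negro",
    "sleeps": "duerme",
}

def translate(sentence: str) -> str:
    # Look each word up independently; unknown words pass through unchanged
    return " ".join(glossary.get(word, word) for word in sentence.lower().split())

print(translate("The black cat sleeps"))
# -> el negro gato duerme  (wrong word order: correct Spanish is "el gato negro duerme")
```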
Early Days: Rule-Based Approaches (1960s-1980s)
Systems relied heavily on hand-crafted linguistic rules
Significant efforts in tasks like part-of-speech tagging, named entity recognition, and machine translation
Struggled with ambiguity and complexity of natural language
Rise of Statistical Methods (1990s-2000s)
Emergence of statistical methods
Techniques like Hidden Markov Models and Conditional Random Fields gained prominence
Improved performance in tasks such as text classification, sentiment analysis, and information extraction
Machine Learning Revolution (2010s)
Rise of machine learning, particularly deep learning
Exploration of neural network architectures tailored for NLP tasks