paragraph = "The distant planet, its surface shrouded in mystery and intrigue! With its swirling clouds and alien landscapes, the planet: a tantalizing enigma to explorers and scientists alike? Oh, the wonders it conceals: ancient ruins and extraterrestrial life forms, waiting to be discovered! As the spacecraft descended through the atmosphere, anticipation filled the hearts of the crew. Little did they know, their journey was about to unveil secrets beyond their wildest imagination."
Exercise: Sentence tokenization
Task: Write a sentence tokenizer that takes the given paragraph and tokenizes it into sentences. Then, count the number of sentences and display the result.
Instructions:
- Start with just a simple punctuation mark (`.`) as the sentence delimiter.
- Check out the regex library `re` and the function `re.split` to include other delimiters as well. Try out the following regex: `r'[.:;!?]\s*'`.
from typing import List

def tokenize_sentences_at_dot(paragraph: str) -> List[str]:
    # Split at every "." character
    sentence_tokens = paragraph.split(".")
    # Strip the white space left after each "." and drop empty strings (e.g. after the final ".")
    sentence_tokens = [s.strip() for s in sentence_tokens if s.strip() != ""]
    return sentence_tokens

tokenized_sentence = tokenize_sentences_at_dot(paragraph=paragraph)
print(f"The paragraph contains {len(tokenized_sentence)} sentences.")
The paragraph contains 2 sentences.
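A quick way to see why only two sentences are found is to count each candidate delimiter in the paragraph. This check is a small addition to the exercise, not part of it; it uses `collections.Counter` from the standard library, and the name `delimiter_counts` is just illustrative.

```python
from collections import Counter

# The exercise paragraph, repeated here so the block runs on its own
paragraph = "The distant planet, its surface shrouded in mystery and intrigue! With its swirling clouds and alien landscapes, the planet: a tantalizing enigma to explorers and scientists alike? Oh, the wonders it conceals: ancient ruins and extraterrestrial life forms, waiting to be discovered! As the spacecraft descended through the atmosphere, anticipation filled the hearts of the crew. Little did they know, their journey was about to unveil secrets beyond their wildest imagination."

# Tally every character that the regex r'[.:;!?]\s*' treats as a delimiter
delimiter_counts = Counter(ch for ch in paragraph if ch in ".:;!?")

for delimiter in ".:;!?":
    # Counter returns 0 for characters that never occur (here: ';')
    print(f"{delimiter!r}: {delimiter_counts[delimiter]}")
# prints '.': 2, ':': 2, ';': 0, '!': 2, '?': 1 (one per line)

print(f"total: {sum(delimiter_counts.values())}")
# prints total: 7
```

Only the two periods act as boundaries for the dot-only tokenizer; the other five marks (two `:`, two `!`, one `?`) are picked up only once the regex delimiters are used.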
import re

def tokenize_sentences_at_punctuation(paragraph: str) -> List[str]:
    # Split at any of . : ; ! ? and consume the white space that follows it
    sentence_tokens = re.split(r'[.:;!?]\s*', paragraph)
    # Drop empty strings (e.g. after the final ".") and strip any remaining white space
    sentence_tokens = [s.strip() for s in sentence_tokens if s.strip() != ""]
    return sentence_tokens

tokenized_sentence = tokenize_sentences_at_punctuation(paragraph=paragraph)
print(f"The paragraph contains {len(tokenized_sentence)} sentences.")
The paragraph contains 7 sentences.
for sentence in tokenized_sentence:
    print(sentence)
The distant planet, its surface shrouded in mystery and intrigue
With its swirling clouds and alien landscapes, the planet
a tantalizing enigma to explorers and scientists alike
Oh, the wonders it conceals
ancient ruins and extraterrestrial life forms, waiting to be discovered
As the spacecraft descended through the atmosphere, anticipation filled the hearts of the crew
Little did they know, their journey was about to unveil secrets beyond their wildest imagination
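The regex `r'[.:;!?]\s*'` treats `:` as a sentence boundary, so "the planet" and "a tantalizing enigma to explorers and scientists alike" come out as two separate tokens, and the punctuation itself is discarded. As a hedged variant (not part of the original exercise), assume that only `.`, `!` and `?` end a sentence and split with a lookbehind, which keeps each mark attached to its sentence. The function name `tokenize_sentences_keep_punctuation` is hypothetical.

```python
import re
from typing import List

# The exercise paragraph, repeated here so the block runs on its own
paragraph = "The distant planet, its surface shrouded in mystery and intrigue! With its swirling clouds and alien landscapes, the planet: a tantalizing enigma to explorers and scientists alike? Oh, the wonders it conceals: ancient ruins and extraterrestrial life forms, waiting to be discovered! As the spacecraft descended through the atmosphere, anticipation filled the hearts of the crew. Little did they know, their journey was about to unveil secrets beyond their wildest imagination."

def tokenize_sentences_keep_punctuation(paragraph: str) -> List[str]:
    # Split on white space that directly follows '.', '!' or '?'.
    # The lookbehind (?<=[.!?]) matches without consuming the punctuation,
    # so each sentence keeps its terminal mark; ':' and ';' are not boundaries.
    return [s for s in re.split(r'(?<=[.!?])\s+', paragraph.strip()) if s]

sentences = tokenize_sentences_keep_punctuation(paragraph)
print(f"The paragraph contains {len(sentences)} sentences.")
# prints The paragraph contains 5 sentences.
for sentence in sentences:
    print(sentence)
```

Under this assumption the paragraph yields five sentences instead of seven. Any period followed by white space still counts as a boundary, so abbreviations such as "e.g." would be split incorrectly.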