Exercise: Sentence tokenization

Task: Write a sentence tokenizer that takes the given paragraph and tokenizes it into sentences. Then, count the number of sentences and display the result.

Instructions:

paragraph = "The distant planet, its surface shrouded in mystery and intrigue! With its swirling clouds and alien landscapes, the planet: a tantalizing enigma to explorers and scientists alike? Oh, the wonders it conceals: ancient ruins and extraterrestrial life forms, waiting to be discovered! As the spacecraft descended through the atmosphere, anticipation filled the hearts of the crew. Little did they know, their journey was about to unveil secrets beyond their wildest imagination."
from typing import List

def tokenize_sentences_at_dot(paragraph: str) -> List[str]:
    """Split the paragraph at every '.' and return the non-empty pieces."""
    sentence_tokens = paragraph.split(".")
    # strip surrounding whitespace and drop empty pieces (e.g. the "" after the final ".")
    sentence_tokens = [s.strip() for s in sentence_tokens if s.strip() != ""]
    return sentence_tokens

tokenized_sentence = tokenize_sentences_at_dot(paragraph=paragraph)
print(f"The paragraph contains {len(tokenized_sentence)} sentences.")
The paragraph contains 2 sentences.
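Only 2 sentences are found because `str.split(".")` treats `.` as the sole boundary, so the `!` and `?` that end the first three sentences stay inside the first token. A quick check, as a small sketch that repeats the exercise paragraph so it runs on its own (the names `first_chunk` and `marks_inside` are illustrative only):

```python
# Self-contained copy of the exercise paragraph.
paragraph = (
    "The distant planet, its surface shrouded in mystery and intrigue! "
    "With its swirling clouds and alien landscapes, the planet: a tantalizing "
    "enigma to explorers and scientists alike? Oh, the wonders it conceals: "
    "ancient ruins and extraterrestrial life forms, waiting to be discovered! "
    "As the spacecraft descended through the atmosphere, anticipation filled "
    "the hearts of the crew. Little did they know, their journey was about to "
    "unveil secrets beyond their wildest imagination."
)

# Everything before the first "." ends up in a single token.
first_chunk = paragraph.split(".")[0]

# Collect the sentence-ending marks that the "."-only split missed.
marks_inside = [ch for ch in first_chunk if ch in "!?"]
print(marks_inside)  # ['!', '?', '!']
```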
import re

def tokenize_sentences_at_punctuation(paragraph: str) -> List[str]:
    """Split the paragraph at '.', ':', ';', '!' or '?' and return the non-empty pieces."""
    # \s* also consumes the whitespace that follows each delimiter
    sentence_tokens = re.split(r'[.:;!?]\s*', paragraph)
    # strip any leftover whitespace and drop the empty piece after the final "."
    sentence_tokens = [s.strip() for s in sentence_tokens if s.strip() != ""]
    return sentence_tokens

tokenized_sentence = tokenize_sentences_at_punctuation(paragraph=paragraph)
print(f"The paragraph contains {len(tokenized_sentence)} sentences.")
The paragraph contains 7 sentences.
for sentence in tokenized_sentence:
    print(sentence)
The distant planet, its surface shrouded in mystery and intrigue
With its swirling clouds and alien landscapes, the planet
a tantalizing enigma to explorers and scientists alike
Oh, the wonders it conceals
ancient ruins and extraterrestrial life forms, waiting to be discovered
As the spacecraft descended through the atmosphere, anticipation filled the hearts of the crew
Little did they know, their journey was about to unveil secrets beyond their wildest imagination
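The count of 7 comes from treating `:` as a boundary, which cuts "the planet: a tantalizing enigma..." and "it conceals: ancient ruins..." in two. If colons are read as clause separators rather than sentence ends (an assumption about the intended definition of a sentence), a variant could split only after `.`, `!` or `?` and keep that punctuation attached. The function name below is hypothetical; this is a sketch, not the exercise's reference solution:

```python
import re
from typing import List

# Self-contained copy of the exercise paragraph.
paragraph = (
    "The distant planet, its surface shrouded in mystery and intrigue! "
    "With its swirling clouds and alien landscapes, the planet: a tantalizing "
    "enigma to explorers and scientists alike? Oh, the wonders it conceals: "
    "ancient ruins and extraterrestrial life forms, waiting to be discovered! "
    "As the spacecraft descended through the atmosphere, anticipation filled "
    "the hearts of the crew. Little did they know, their journey was about to "
    "unveil secrets beyond their wildest imagination."
)

def tokenize_sentences_at_terminal_punctuation(paragraph: str) -> List[str]:
    """Hypothetical variant: split at whitespace that follows '.', '!' or '?'.

    The lookbehind (?<=[.!?]) keeps the punctuation attached to its sentence;
    ':' and ';' are not treated as sentence boundaries.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]

sentences = tokenize_sentences_at_terminal_punctuation(paragraph)
print(f"The paragraph contains {len(sentences)} sentences.")  # 5
for sentence in sentences:
    print(sentence)
```

Because the split happens on the whitespace after the mark rather than on the mark itself, each sentence keeps its `!`, `?` or `.`, and no empty trailing token needs filtering for this paragraph.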