Voice and AI — The Complete Chapter-by-Chapter Series (Index)

An index to the full Voice and AI blog series: 34 chapter previews tracing voice from physics and biology through recognition, synthesis, cloning, systems, design, ethics, and the future.

Last updated on: Sho Shimoda

Welcome to the companion blog series for my book Voice and AI. Voice technology is usually explained either as low-level signal processing or as high-level product hype, with little connecting the two. This book, and this series with it, treats voice as a single end-to-end system: rooted in physics and biology, shaped by data and models, and ultimately constrained by design, ethics, and trust.


How the series works

Each post previews one chapter of the book, with enough to genuinely shift how you think about a topic, while the full depth lives in the book itself. The 34 chapters are organized into ten parts that move from the most physical aspects of voice all the way to its social and ethical future. You can read straight through in order, or jump to whatever interests you most. Below is the complete map.


Part I - Foundations of Voice

Chapter 1: What Is "Voice"? - Why voice is neither just sound nor just text, and why that distinction underlies everything.

Chapter 2: The Physics of Voice - Frequency, harmonics, formants, and timbre: the structure beneath every voice.

Chapter 3: The Biology of the Human Voice - The source–filter model and why individuality is built in.

Part II - From Voice to Data

Chapter 4: Digitizing Voice - Sampling, bit depth, and the performance ceiling set at capture time.

Chapter 5: Signal Processing Fundamentals - Seeing sound with the Fourier transform and spectrograms.

Chapter 6: Classical Speech Processing - MFCC, LPC, and why they still echo inside modern models.

Part III - Understanding Speech

Chapter 7: The Problem of Understanding Speech - Why turning sound into meaning is so hard.

Chapter 8: The Evolution of ASR - From rules to hidden Markov models to deep learning.

Chapter 9: Modern ASR Architectures - CTC, attention, and transformers.

Chapter 10: Multilingual and Low-Resource Speech - The long tail of human language.

Part IV - Generating Speech

Chapter 11: The Goal of Synthetic Speech - Why "sounding human" is a moving target.

Chapter 12: Classical Text-to-Speech - Concatenative, unit selection, and parametric synthesis.

Chapter 13: The Neural TTS Revolution - Tacotron, WaveNet, and FastSpeech.

Chapter 14: Vocoders - Why waveform generation decides perceived quality.

Chapter 15: Prosody and Emotion - Where synthetic speech becomes believable.

Part V - Voice Cloning and Personal Voices

Chapter 16: What Is Voice Cloning? - Similarity, adaptation, and identity.

Chapter 17: Speaker Embeddings - Turning a voice into a reusable identity.

Chapter 18: Building Custom Voices - Data, fine-tuning, and real trade-offs.

Chapter 19: Limitations and Risks - What personal voices can't do, and what goes wrong.

Part VI - Voice AI Systems

Chapter 20: Voice Pipelines - ASR, LLM, and TTS as one system.

Chapter 21: Conversational Voice AI - Why timing is everything.

Chapter 22: Multimodal Voice AI - When voice is one channel among many.

Part VII - Scaling and Platforms

Chapter 23: Voice AI at Scale - Why a demo isn't a product.

Chapter 24: APIs and Platforms - Cloud, edge, and hybrid voice AI.

Chapter 25: Cost Models - Designing voice AI that's sustainable.

Part VIII - Designing with Voice

Chapter 26: Designing for Voice - A UX discipline of its own.

Chapter 27: Voice Personas - The voice is the product.

Chapter 28: Localization and Cultural Voice - Why voice doesn't travel by default.

Part IX - Ethics, Law, and Risk

Chapter 29: Voice as Personal Data - The most sensitive signal you collect.

Chapter 30: Consent and Regulation - Architecture, not afterthought.

Chapter 31: Deepfakes and Misuse - Defending trust in synthetic voice.

Part X - The Future of Voice

Chapter 32: Voice-Native Computing - When speech becomes the OS.

Chapter 33: Synthetic Humans and Digital Beings - When voice becomes presence.

Chapter 34: Where Voice AI Is Headed - From accuracy to trust.


Start at Chapter 1 and read straight through, or pick the part that pulls at you. You can also experience these ideas in practice through Fresvia.com, a text-to-speech and voice AI service where you can experiment with real voice systems rather than treating them as abstractions.

Want the full picture? Grab Voice and AI here for the complete, end-to-end treatment behind every chapter in this series.