Tier 05 / Adjacent AI

The rest of the AI map. So nothing surprises you.

Chat is the front door. The rest of AI is image and video generation, voice synthesis, transcription, agents, coding assistants, and source-grounded search. You don't need to master each one. You need to know they exist, what they do, and when to reach for them, so the next product launch doesn't disorient you.

Approx. 25 min read · Tour, not deep dive · Pick one or two to try

01 · The map

Six adjacent surfaces.

Each block below is a category, not a single product. Skim them all. Bookmark the one or two that fit how you work.

Image generation

From text, get a picture. Photos, illustrations, logos, mockups.

Video generation

Same idea, moving. Still rougher than image. Improving fast.

Voice & audio

Synthetic speech, voice cloning, music. Replace your "I'll record a voiceover later."

Transcription

Audio in, text out. Meetings, calls, voice notes, interviews.

Agents

AI that takes actions on your behalf: browsing, clicking, filling forms, running jobs.

Coding assistants

AI inside the developer's editor (or terminal). Even non-coders benefit indirectly.

02 · Image generation

From a sentence to a picture.

The most accessible adjacent skill. Free tools exist. Paid tools are still cheap by any pre-AI standard.

ChatGPT (DALL-E + GPT-4o) built in

Inside the same chat box
Best for conversational editing ("now make the sky orange")
Good at text inside images

Gemini (Imagen / Nano Banana) built in

Inside Gemini app + Workspace
Excellent photorealism
Consistent characters across multiple images

Midjourney separate product

The aesthete's favorite
Best stylized / artistic output
Lives in a web app (formerly Discord-only)

Flux (Black Forest Labs) open model

Open-weights model behind many third-party image apps
Strong quality, runs anywhere
You'll meet it inside products like Krea and Replicate

Adobe Firefly creative pro

Inside Photoshop, Illustrator, Express
Trained on Adobe Stock, commercially safe
If you're already in the Adobe stack, the easiest entry

Ideogram design-friendly

Best in class at text inside images
Posters, logos, ads with readable copy
Free tier is generous

Prompting an image model is different

Describe the picture, not the task

"A photo of an empty desk by a window, morning light, shallow depth of field, 35mm" beats "make me a desk image."

Style words matter

"Watercolor," "ink wash," "Polaroid," "studio lighting," "isometric vector." Concrete style words drive the look.

Iterate by editing, not restarting

Most modern image tools let you say "same, but X" and keep continuity. Use that.

Rights matter

Commercial use rules vary by tool. Adobe says Firefly is trained on licensed content and positions it as safe for commercial work. The others are more ambiguous: generally fine for personal or internal use, riskier for ads or anything you'd sell. When in doubt, check the tool's terms.
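The prompting advice above (describe the picture, add concrete style words, iterate with "same, but X") can be sketched as a tiny prompt builder. This is an illustration only: the function names build_image_prompt and build_edit_prompt are hypothetical, and no image tool requires prompts in exactly this shape.

```python
def build_image_prompt(subject, details=(), style_words=()):
    """Join a concrete subject, scene details, and style words into one prompt."""
    parts = [subject.strip()]
    parts.extend(d.strip() for d in details if d.strip())
    parts.extend(s.strip() for s in style_words if s.strip())
    return ", ".join(parts)


def build_edit_prompt(change):
    """Phrase a follow-up as an edit so the tool keeps continuity with the last image."""
    return f"Same image, but {change.strip()}"


# Hypothetical example, reusing the desk prompt from the text above.
prompt = build_image_prompt(
    "A photo of an empty desk by a window",
    details=["morning light", "shallow depth of field"],
    style_words=["35mm"],
)
print(prompt)
print(build_edit_prompt("make the sky orange"))
```

Keeping the subject, the details, and the style words separate makes it easy to swap one style word and compare results without rewriting the whole description.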

03 · Video generation

From a sentence to moving pictures. Still rough, improving fast.

Video gen is roughly where image gen was two years ago: amazing demos, awkward production reality. Worth tracking, not yet a daily tool for most beginners.

OpenAI Sora

Bundled into ChatGPT for paid users
Strong at short, cinematic clips
Currently best for B-roll and concept videos

Google Veo

Inside Gemini and dedicated Google products
Best multimodal understanding (knows what you're asking for)
Tight integration with Workspace and YouTube

Runway

Independent video AI shop
Most professional editing surface: masks, motion, style transfer
Where many real creators are doing real work

Pika / Luma / Kling

A bench of fast-moving smaller players
Worth scanning quarterly; the leader changes

AiAi Bro

Don't subscribe to a video AI yet unless you have a specific job in mind. The bundled or trial access to Sora in ChatGPT or Veo in Gemini is enough to learn the shape. Real production workflows are still finicky (coherent characters, lip sync, anything longer than 10 seconds), so the best practice is to wait until you have a project that demands it.

04 · Voice & audio

Synthetic speech that's actually convincing.

Two separate things live under "voice AI": the voice mode inside ChatGPT/Claude/Gemini (Tier 4) and standalone voice generation tools that turn your text into a custom-sounding audio file. This section is about the second.

ElevenLabs market leader

The gold standard for synthetic voice
Voice cloning from about 30 seconds of sample audio
Hundreds of preset voices in dozens of languages
What podcasters, audiobook narrators, video producers use

OpenAI TTS

Available via API; a few preset voices
Powers ChatGPT's voice mode
Good enough for most use cases; less expressive than ElevenLabs

Google TTS / Chirp

Inside Google Cloud + Workspace
Strong on multilingual
What you'll use if you're building inside Google's stack

Suno / Udio music

Different category: text-to-music
Type a song prompt, get a full track with vocals
Worth knowing it exists; mostly a novelty for non-music people

Common uses for synthetic voice

Voiceovers for video without recording yourself.
Audiobook-style narration of your own writing for re-reading on a walk.
Multilingual content: write once, narrate in 30 languages.
Phone IVRs and customer-service voices for small businesses.
Voice clones of yourself for content you can't physically record fast enough.

05 · Transcription

Audio in, accurate text out.

The unsexy adjacent skill that pays back the fastest. Anyone with meetings, calls, or voice notes saves real hours.

OpenAI Whisper

Open-source model; near-state-of-the-art accuracy
Powers many of the tools below
You can run it yourself; most people use it via a product

Otter.ai

Real-time meeting transcription + summaries
Integrates with Zoom, Meet, Teams
Mainstream choice for business users

Granola

Lighter, less intrusive; runs in the background of any call
Auto-summary, action items, your typed notes blended in
Increasingly the operator favorite

AssemblyAI / Deepgram

Developer-facing transcription APIs
What many of the products you use are built on
Mentioned only so the names aren't unfamiliar

Built-in (Apple, Google)

Voice Memos transcribes natively on iOS
Pixel's Recorder app transcribes on-device
Free, instant, surprisingly good

The transcription → LLM workflow

Record a 10-minute voice note → transcribe → paste into an LLM → ask it to extract structure ("decisions made, follow-ups, open questions"). This is the single highest-leverage adjacent-AI move for operators. Costs nothing, saves hours weekly.
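The "paste into an LLM and ask for structure" step can be made repeatable with a small prompt template. The sketch below only builds the text you would paste. It calls no transcription or LLM service, and the function name build_extraction_prompt and the exact wording of the instruction are assumptions made for illustration.

```python
# Headings taken from the example in the text; change them to fit your notes.
DEFAULT_SECTIONS = ("Decisions made", "Follow-ups", "Open questions")


def build_extraction_prompt(transcript, sections=DEFAULT_SECTIONS):
    """Wrap a raw transcript in an instruction asking for structured notes."""
    headings = "\n".join(f"- {name}" for name in sections)
    return (
        "Below is a transcript of a voice note. "
        "Organize its content under these headings, "
        "using only what the transcript says:\n"
        f"{headings}\n\n"
        "Transcript:\n"
        f"{transcript.strip()}"
    )


# Hypothetical transcript text standing in for real transcription output.
example = build_extraction_prompt(
    "We agreed to ship Friday. Sam will email the vendor. Pricing is still unclear."
)
print(example)
```

Saving the template once means every voice note gets the same headings, which makes a week of notes easier to scan.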

06 · Agents

AI that takes actions, not just answers.

An agent is a system that decides which step to take next without you telling it each one. Typical jobs: booking a flight, filling a form, scraping a site, sending emails, running a workflow. As a beginner, you'll mostly meet agents through the four doors below.
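For readers who like to see the shape in code, the "decides which step to take next" idea is a loop: ask a decision-maker for the next step, run it, repeat until it says stop. The toy sketch below is not how any named product works. The function choose_next_step stands in for the model with a fixed, hypothetical plan, and the tools are stubs that only return strings.

```python
def choose_next_step(goal, done):
    """Stand-in for the model's decision: return the next tool name, or None to stop.

    A real agent would ask an LLM here; this toy version walks a fixed plan.
    """
    plan = ["search_restaurant", "pick_time", "submit_booking"]  # hypothetical steps
    remaining = [step for step in plan if step not in done]
    return remaining[0] if remaining else None


# Stub tools: each takes the goal and returns a short description of what it "did".
TOOLS = {
    "search_restaurant": lambda goal: "found the restaurant page",
    "pick_time": lambda goal: "selected Friday at 7",
    "submit_booking": lambda goal: "booking form submitted",
}


def run_agent(goal, max_steps=10):
    """Run the decide-then-act loop, capped at max_steps so it cannot run forever."""
    done, log = [], []
    for _ in range(max_steps):
        step = choose_next_step(goal, done)
        if step is None:  # the decision-maker says the goal is complete
            break
        result = TOOLS[step](goal)
        done.append(step)
        log.append((step, result))
    return log


for step, result in run_agent("Book me a table for Friday at 7"):
    print(step, "->", result)
```

The step cap matters in practice: an agent that keeps choosing steps without finishing should stop and hand control back to you.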

OpenAI Operator computer use

An AI that drives a virtual browser for you
"Book me a table at X for Friday at 7"
Available on Pro plans; promising, still flaky on hard tasks

Claude with computer use / Claude Code

Claude Code runs in your terminal and acts on real files
Claude's computer-use API lets it click around a screen
The most capable developer-facing agent today

Gemini Deep Research / Agent surfaces

Multi-step research that browses the web for you and produces a report
Lives inside Gemini's app
Best for "research this topic for 20 minutes and come back to me"

n8n / Zapier / Make + AI

Workflow automation tools, now AI-aware
You wire steps together visually; LLMs handle the "thinking" steps
Where most real business agents actually live today

Agents are not yet a beginner sport

Setting up real agents is more involved than building a Custom GPT. As a beginner, treat agents as "interesting, watch the space." Once you're confident at Tier 4 across a couple of LLMs, then consider taking one workflow and turning it into an actual agent.

07 · Coding assistants

Even if you don't write code, know the names.

"AI-powered coding tool" is one of the biggest product categories in tech right now. Beginners can do real, useful things with these tools even without traditional programming skills.

Claude Code terminal

Runs in your terminal; acts on your real files
The most capable agentic coder as of this guide
Surprisingly approachable for non-developers

Cursor IDE

A code editor (forked from VS Code) with AI deeply built in
Industry favorite among professional developers
Pulls from multiple models; you choose

GitHub Copilot

Inside VS Code, JetBrains, and other editors
Microsoft + OpenAI partnership
The original mainstream coding AI; still solid

Windsurf / Cline / Continue

A growing field of agentic coders
Worth a scan if Cursor and Claude Code don't fit

Lovable / Bolt / v0 no-code app builders

Generate a working web app from a sentence
Best for prototypes, landing pages, internal tools
True entry point for non-coders who want to build software

AiAi Bro

If you have ever wanted "a little tool that does X" and stopped because you don't code: try Lovable or Bolt this week. You will be shocked at what a non-developer can ship in an evening. The bar to build software has moved. The bar to build useful software has moved more.

08 · Search & research

AI that grounds answers in real sources.

A specialized adjacent category: tools that don't just generate, they go look something up first. Lower hallucination rate. Better for facts.

Perplexity

Routes through multiple models, grounds every answer in cited sources
The "research" search engine
Free tier is excellent; paid unlocks Pro Search and Spaces

ChatGPT Search

Built into ChatGPT
Pulls live web results, cites them inline
Solid for everyday quick research

Google AI Overviews / Gemini Grounding

The AI answer at the top of Google results
Grounding option inside Gemini for verifiable answers
Tied to the world's largest index

NotebookLM

Discussed in Tier 4, worth re-mentioning
Grounds answers in a finite set of sources you provide
Best for due diligence and study
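Grounding in "a finite set of sources you provide" can be approximated by hand in any chat tool: number your sources, paste them in, and ask for answers that cite those numbers. The sketch below builds such a prompt. It is an illustration of the idea, not how NotebookLM or Perplexity work internally, and build_grounded_prompt is a hypothetical name.

```python
def build_grounded_prompt(question, sources):
    """Number the sources and ask for an answer that cites only them."""
    if not sources:
        raise ValueError("grounding needs at least one source")
    numbered = "\n\n".join(
        f"[{i}] {text.strip()}" for i, text in enumerate(sources, start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite the source number after each claim, "
        "and say so if the sources do not cover it.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question.strip()}"
    )


# Hypothetical sources standing in for documents you would paste in.
grounded = build_grounded_prompt(
    "When was the company founded?",
    ["The company was founded in 1999.", "Its headquarters are in Oslo."],
)
print(grounded)
```

Asking the model to say when the sources do not cover a question is the part that helps keep invented answers down.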

09 · If you only do four things

The minimum-viable adjacent AI stack.

You don't need to subscribe to ten products. Most operators benefit from exactly four of the categories above. Pick one tool from each.

Image

One image generator

Whichever is built into your main LLM is usually enough. Add Midjourney only if you care about aesthetics.

Transcription

One transcription tool

Granola or Otter for meetings. Apple/Pixel voice memos for solo notes. Free options are great. Don't overthink.

Voice gen (optional)

ElevenLabs (free tier)

Set up an account. Clone your voice. You'll find uses for it over the next year.

Search

Perplexity

Pin it as a tab. Use it whenever you'd reach for Google for a factual question. Single biggest research upgrade.

10 · Where you've landed

What you can do now.

If you've worked through all five tiers:

You can explain what an LLM is, in plain English, to someone who's never used one.

You can write a prompt that gets a useful answer on the first try.

You can pick between ChatGPT, Claude, and Gemini on purpose, based on the job.

You can build a Custom GPT, a Project, or a Gem for any recurring workflow.

You know what image gen, voice, transcription, agents, and coding tools are and when to reach for them.

You have the glossary as a permanent reference for anything you forgot.

AiAi Bro

You are no longer a beginner. You're a competent intermediate user across the three flagship LLMs. The next level (building agents, training your own models, integrating AI into business systems) is real, but it isn't this guide. Stay here a season. Get fluent. The depth comes from repetition, not from the next course.