Chat is the front door. The rest of AI is image generation, voice synthesis, transcription, agents, and coding assistants. You don't need to master each one. You need to know they exist, what they do, and when to reach for them so the next product launch doesn't disorient you.
Approx 25 min read · Tour, not deep dive · Pick one or two to try
01 · The map
Six adjacent surfaces.
Each block below is a category, not a single product. Skim them all. Bookmark the one or two that fit how you work.
1
Image generation
From text, get a picture. Photos, illustrations, logos, mockups.
2
Video generation
Same idea, moving. Still rougher than image. Improving fast.
3
Voice & audio
Synthetic speech, voice cloning, music. Replace your "I'll record a voiceover later."
4
Transcription
Audio in, text out. Meetings, calls, voice notes, interviews.
5
Agents
AI that takes actions on your behalf: browse, click, fill forms, run jobs.
6
Coding assistants
AI inside the developer's editor (or terminal). Even non-coders benefit indirectly.
02 · Image generation
From a sentence to a picture.
The most accessible adjacent skill. Free tools exist. Paid tools are still cheap by any pre-AI standard.
ChatGPT (DALL-E + GPT-4o) built in
Inside the same chat box
Best for conversational editing ("now make the sky orange")
Good at text inside images
Gemini (Imagen / Nano Banana) built in
Inside Gemini app + Workspace
Excellent photorealism
Consistent characters across multiple images
Midjourney separate product
The aesthete's favorite
Best stylized / artistic output
Lives in a web app (formerly Discord)
Flux (Black Forest Labs) open model
Open-weights model behind many third-party image apps
Strong quality, runs anywhere
You'll meet it inside products like Ideogram, Krea, Replicate
Adobe Firefly creative pro
Inside Photoshop, Illustrator, Express
Trained on Adobe Stock, commercially safe
If you're already in the Adobe stack, the easiest entry
Ideogram design-friendly
Best in class at text inside images
Posters, logos, ads with readable copy
Free tier is generous
Prompting an image model is different
Describe the picture, not the task
"A photo of an empty desk by a window, morning light, shallow depth of field, 35mm" beats "make me a desk image."
Style words matter
"Watercolor," "ink wash," "Polaroid," "studio lighting," "isometric vector." Concrete style words drive the look.
Iterate by editing, not restarting
Most modern image tools let you say "same, but X" and keep continuity. Use that.
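The three habits above can be sketched as a small prompt builder. This is only an illustration, assuming you keep prompts as plain text; the function names `build_image_prompt` and `edit_instruction` are made up for this example and do not belong to any image tool.

```python
def build_image_prompt(subject, setting="", lighting="", style_words=()):
    """Join a concrete picture description with explicit style words."""
    parts = [subject, setting, lighting, *style_words]
    # Drop empty pieces so optional fields don't leave stray commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())


def edit_instruction(change):
    """Phrase a follow-up as an edit ("same, but X") instead of a restart."""
    return f"Same image, but {change}"


prompt = build_image_prompt(
    subject="A photo of an empty desk by a window",
    lighting="morning light",
    style_words=("shallow depth of field", "35mm"),
)
print(prompt)
print(edit_instruction("make the sky orange"))
```

The point is the shape, not the code: subject first, then concrete style words, then small edits instead of a fresh prompt.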
Rights matter
Commercial use rules vary by tool. Adobe says Firefly is trained on licensed data and positions it as safe for commercial work. The others are more ambiguous: generally fine for personal or internal use, riskier for ads or anything you'd sell. When in doubt, check the tool's terms.
03 · Video generation
From a sentence to moving pictures. Still rough, improving fast.
Video gen is roughly where image gen was two years ago: amazing demos, awkward production reality. Worth tracking, not yet a daily tool for most beginners.
OpenAI Sora
Bundled into ChatGPT for paid users
Strong at short, cinematic clips
Currently best for B-roll and concept videos
Google Veo
Inside Gemini and dedicated Google products
Best multimodal understanding (knows what you're asking for)
Tight integration with Workspace and YouTube
Runway
Independent video AI shop
Most professional editing surface: masks, motion, style transfer
Where many real creators are doing real work
Pika / Luma / Kling
A bench of fast-moving smaller players
Worth scanning quarterly; the leader changes
AiAi Bro
Don't subscribe to a video AI yet unless you have a specific job in mind. The free trials of Sora-in-ChatGPT or Veo-in-Gemini are enough to learn the shape. Real production workflows are still finicky (coherent characters, lip sync, anything longer than 10 seconds), so the best practice is to wait until you have a project that demands it.
04 · Voice & audio
Synthetic speech that's actually convincing.
Two separate things live under "voice AI": the voice mode inside ChatGPT/Claude/Gemini (Tier 4) and standalone voice generation tools that turn your text into a custom-sounding audio file. This section is about the second.
ElevenLabs market leader
The gold standard for synthetic voice
Voice cloning from about 30 seconds of sample audio
Hundreds of preset voices in dozens of languages
What podcasters, audiobook narrators, video producers use
OpenAI TTS
Available via API; a few preset voices
Powers ChatGPT's voice mode
Good enough for most use cases; less expressive than ElevenLabs
Google TTS / Chirp
Inside Google Cloud + Workspace
Strong on multilingual
What you'll use if you're building inside Google's stack
Suno / Udio music
Different category: text-to-music
Type a song prompt, get a full track with vocals
Worth knowing it exists; mostly a novelty for non-music people
Common uses for synthetic voice
Voiceovers for video without recording yourself.
Audiobook-style narration of your own writing for re-reading on a walk.
Multilingual content: write once, narrate in 30 languages.
Phone IVRs and customer-service voices for small businesses.
Voice clones of yourself for content you can't physically record fast enough.
05 · Transcription
Audio in, accurate text out.
The unsexy adjacent skill that pays back the fastest. Anyone with meetings, calls, or voice notes saves real hours.
OpenAI Whisper
Open-source model; near-state-of-the-art accuracy
Powers many of the tools below
You can run it yourself; most people use it via a product
Otter.ai
Real-time meeting transcription + summaries
Integrates with Zoom, Meet, Teams
Mainstream choice for business users
Granola
Lighter, less intrusive; runs in the background of any call
Auto-summary, action items, your typed notes blended in
Increasingly the operator favorite
AssemblyAI / Deepgram
Developer-facing transcription APIs
What many of the products you use are built on
Mentioned only so the names aren't unfamiliar
Built-in (Apple, Google)
Voice Memos transcribes natively on iOS
Pixel's Recorder app transcribes on-device
Free, instant, surprisingly good
The transcription → LLM workflow
Record a 10-minute voice note → transcribe → paste into an LLM → ask it to extract structure ("decisions made, follow-ups, open questions"). This is the single highest-leverage adjacent-AI move for operators. Costs nothing, saves hours weekly.
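For readers who like to see the "extract structure" step written down, here is a minimal sketch. It only builds the text you would paste into an LLM and calls no transcription or chat service. The function name `build_extraction_prompt` and the sample transcript are made up for illustration.

```python
SECTIONS = ("Decisions made", "Follow-ups", "Open questions")


def build_extraction_prompt(transcript, sections=SECTIONS):
    """Wrap a raw transcript in an instruction asking for a fixed structure."""
    headings = "\n".join(f"- {name}" for name in sections)
    return (
        "Below is a transcript of a voice note. "
        "Extract its structure under these headings, using short bullet points:\n"
        f"{headings}\n\n"
        "If a heading has nothing to report, write 'None'.\n\n"
        f"Transcript:\n{transcript.strip()}"
    )


# Hypothetical transcript text, standing in for your transcription tool's output.
transcript = (
    "We agreed to ship on Friday. Sam will email the client. "
    "Pricing is still undecided."
)
prompt = build_extraction_prompt(transcript)
print(prompt)
```

Keeping the headings fixed from week to week makes the summaries easy to compare and to skim later.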
06 · Agents
AI that takes actions, not just answers.
An agent is a system that decides which step to take next without you telling it each one. Typical jobs: booking a flight, filling a form, scraping a site, sending emails, running a workflow. As a beginner, you'll mostly meet agents through the four doors below.
OpenAI Operator computer use
An AI that drives a virtual browser for you
"Book me a table at X for Friday at 7"
Available in Pro plans; promising, still flaky for hard tasks
Claude with computer use / Claude Code
Claude Code runs in your terminal and acts on real files
Claude's computer-use API lets it click around a screen
The most capable developer-facing agent today
Gemini Deep Research / Agent surfaces
Multi-step research that browses the web for you and produces a report
Lives inside Gemini's app
Best for "research this topic for 20 minutes and come back to me"
n8n / Zapier / Make + AI
Workflow automation tools, now AI-aware
You wire steps together visually; LLMs handle the "thinking" steps
Where most real business agents actually live today
Agents are not yet a beginner sport
Setting up real agents is more involved than building a Custom GPT. As a beginner, treat agents as "interesting, watch the space." Once you're confident at Tier 4 across a couple of LLMs, then consider taking one workflow and turning it into an actual agent.
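To make "decides which step to take next" concrete, here is a toy loop. It is a sketch under heavy assumptions: `choose_next_step` is a hard-coded stand-in for the model, the plan is invented, and no browser or API is touched. Real agent products are far more involved.

```python
def choose_next_step(goal, done):
    """Stand-in for the model: return the first planned step not yet done.

    A real agent would ask an LLM to make this decision.
    """
    plan = {
        "book a table": [
            "open booking site",
            "pick Friday 7pm",
            "fill in name",
            "confirm",
        ],
    }
    for step in plan.get(goal, []):
        if step not in done:
            return step
    return None  # nothing left: the goal is complete


def run_agent(goal, max_steps=10):
    """Keep asking for the next step until the goal is done or the cap is hit."""
    done = []
    for _ in range(max_steps):  # hard cap so the loop always stops
        step = choose_next_step(goal, done)
        if step is None:
            break
        # A real agent would perform the action here (click, type, call an API).
        done.append(step)
    return done


steps = run_agent("book a table")
print(steps)
```

The step cap matters: an agent with no limit on how long it can keep acting is the kind of flakiness that makes this space hard for beginners.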
07 · Coding assistants
Even if you don't write code, know the names.
"AI-powered coding tool" is one of the biggest product categories in tech right now. Beginners can do real, useful things with these tools even without traditional programming skills.
Claude Code terminal
Runs in your terminal; acts on your real files
The most capable agentic coder as of this guide
Surprisingly approachable for non-developers
Cursor IDE
A code editor (forked from VS Code) with AI deeply built in
Industry favorite among professional developers
Pulls from multiple models; you choose
GitHub Copilot
Inside VS Code, JetBrains, and other editors
Microsoft + OpenAI partnership
The original mainstream coding AI; still solid
Windsurf / Cline / Continue
A growing field of agentic coders
Worth a scan if Cursor and Claude Code don't fit
Lovable / Bolt / v0 no-code app builders
Generate a working web app from a sentence
Best for prototypes, landing pages, internal tools
True entry point for non-coders who want to build software
AiAi Bro
If you have ever wanted "a little tool that does X" and stopped because you don't code: try Lovable or Bolt this week. You will be shocked what a non-developer can ship in an evening. The bar to build software has moved. The bar to build useful software has moved more.
08 · Search & research
AI that grounds answers in real sources.
A specialized adjacent category: tools that look something up before they generate an answer. That tends to mean a lower hallucination rate and better results for factual questions.
Perplexity
Routes through multiple models, grounds every answer in cited sources
The "research" search engine
Free tier is excellent; paid unlocks Pro Search and Spaces
ChatGPT Search
Built into ChatGPT
Pulls live web results, cites them inline
Solid for everyday quick research
Google AI Overviews / Gemini Grounding
The AI answer at the top of Google results
Grounding option inside Gemini for verifiable answers
Tied to the world's largest index
NotebookLM
Discussed in Tier 4, worth re-mentioning
Grounds answers in a finite set of sources you provide
Best for due diligence and study
09 · If you only do four things
The minimum-viable adjacent AI stack.
You don't need to subscribe to ten products. Most operators benefit from exactly four of the categories above. Pick one tool from each.
1
Image
One image generator
Whichever is built into your main LLM is usually enough. Add Midjourney only if you care about aesthetics.
2
Transcription
One transcription tool
Granola or Otter for meetings. Apple/Pixel voice memos for solo notes. Free options are great. Don't overthink.
3
Voice gen (optional)
ElevenLabs (free tier)
Set up an account. Clone your voice. You'll find uses for it over the next year.
4
Search
Perplexity
Pin it as a tab. Use it whenever you'd reach for Google for a factual question. Single biggest research upgrade.
10 · Where you've landed
What you can do now.
If you've worked through all five tiers:
You can explain what an LLM is, in plain English, to someone who's never used one.
You can write a prompt that gets a useful answer on the first try.
You can pick between ChatGPT, Claude, and Gemini on purpose, based on the job.
You can build a Custom GPT, a Project, or a Gem for any recurring workflow.
You know what image gen, voice, transcription, agents, and coding tools are and when to reach for them.
You have the glossary as a permanent reference for anything you forgot.
AiAi Bro
You are no longer a beginner. You're a competent intermediate user across the three flagship LLMs. The next level (building agents, training your own models, integrating AI into business systems) is real, but it isn't this guide. Stay here a season. Get fluent. The depth comes from repetition, not from the next course.