What the paper is
Neural Content Intelligence (NCI) scores how a viewer's brain would respond to a video before the video ever ships. It takes the raw output of TRIBE v2, Meta's tri-modal brain encoding model trained on over 1,000 hours of fMRI, and turns predicted cortical activation into engagement signals a content team can actually read.
The pipeline is three steps. First, TRIBE v2 predicts BOLD response across roughly 70,000 cortical voxels for every second of a clip. Second, those voxels are aggregated onto the Yeo seven-network parcellation, a standard atlas that splits the cortex into functional systems: visual, somatomotor, dorsal attention, ventral attention (salience), limbic, frontoparietal control, and default mode. Third, those seven networks collapse into five composite engagement metrics. A traditional fMRI study costs $15,000 to $150,000 and takes weeks. NCI runs as a forward pass in under two minutes on a single GPU.
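The second step, collapsing voxels into networks, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the aggregation is a plain mean over each network's voxels, that network labels run 0 to 6 in the usual Yeo order, and that the prediction arrives as one list of voxel values per second. The function name and the toy input are hypothetical.

```python
from statistics import fmean

# Yeo seven-network names. Indexing them 0-6 is an assumption for this sketch.
YEO7 = ["visual", "somatomotor", "dorsal_attention", "ventral_attention",
        "limbic", "frontoparietal", "default_mode"]

def aggregate_to_networks(bold, voxel_labels):
    """Average predicted BOLD per network for each time step.

    bold: list of time steps, each a list of per-voxel predicted values.
    voxel_labels: network index (0-6) for each voxel.
    Returns {network_name: [mean activation at t0, t1, ...]}.
    """
    # Group voxel indices by network once, then reuse for every time step.
    members = {name: [] for name in YEO7}
    for voxel, label in enumerate(voxel_labels):
        members[YEO7[label]].append(voxel)
    # Networks with no voxels in the input are simply left out.
    return {
        name: [fmean(frame[v] for v in idx) for frame in bold]
        for name, idx in members.items() if idx
    }

# Toy input: 2 seconds, 4 voxels (a real prediction has roughly 70,000 per second).
bold = [[0.25, 0.75, 1.0, 0.0],
        [0.5, 1.0, 0.0, 1.0]]
labels = [0, 0, 2, 6]  # two visual voxels, one dorsal attention, one default mode
networks = aggregate_to_networks(bold, labels)
```

Each entry in `networks` is a per-second time course for one network, which is the form the engagement metrics are computed from.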
NCI is the only approach that evaluates the video itself, before publication, at computational scale, with the explanation grounded in neuroscience rather than metadata.
Five engagement metrics
Each metric is computed from the network-level activation time course, so every score traces back to a specific cognitive system rather than a black-box number.
- Attention Retention Score (ARS). Sustained attention across the full clip, weighted toward dorsal attention and penalized for variance that signals an inconsistent hold.
- Emotional Impact Index (EII). Affective depth from limbic and default-mode activation, with a bonus for peak emotional moments. High EII tracks with sharing and recall.
- Hook Strength Score (HSS). The potency of the opening three seconds, weighted toward salience and visual intensity. A score above 0.7 stops the scroll; a score below 0.3 gets passed over in a competitive feed.
- CTA Activation Score (CAS). Whether the content has primed decision-making circuits (frontoparietal control) at the moment a call to action lands.
- Neural Engagement Score (NES). A single composite that ranks a batch of assets or compares variants, reweightable for brand awareness versus direct response.
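To make two of these definitions concrete, here is a hedged sketch of ARS and HSS computed from network time courses. The formulas, weights, and function names are assumptions chosen for illustration; the paper's actual network-to-engagement mappings may differ. The sketch assumes one sample per second and activations already normalized to the 0 to 1 range. Only the 0.7 and 0.3 thresholds come from the text above.

```python
from statistics import fmean, pstdev

def attention_retention(dorsal, variance_penalty=0.5):
    """Mean dorsal-attention level minus a penalty for fluctuation.

    The penalty weight is a hypothetical value, not one taken from the paper.
    """
    return fmean(dorsal) - variance_penalty * pstdev(dorsal)

def hook_strength(salience, visual, window_s=3, w_salience=0.5, w_visual=0.5):
    """Weighted mean of salience and visual activation over the opening window.

    Equal weights are an assumption made for this sketch.
    """
    return (w_salience * fmean(salience[:window_s])
            + w_visual * fmean(visual[:window_s]))

def hook_verdict(hss):
    """Map an HSS value onto the thresholds quoted in the text."""
    if hss > 0.7:
        return "stops the scroll"
    if hss < 0.3:
        return "likely passed over"
    return "in between"

# A steady hold scores higher than an erratic one with the same mean.
steady = attention_retention([0.5, 0.5, 0.5, 0.5])
erratic = attention_retention([0.0, 1.0, 0.0, 1.0])
```

Penalizing the standard deviation is one simple way to express "penalized for variance"; any monotone penalty on fluctuation would serve the same purpose.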
Five videos, five neural fingerprints
The proof of concept runs five real short-form videos spanning the major content archetypes. The headline result is that formats do not merely differ by degree; they engage qualitatively different brain systems, so each needs its own optimization strategy.
- Leila Hormozi, business education (49s). Somatomotor 25% plus default mode 18%. Voice, cadence, and delivery drive engagement, not production value. Hook moments are distributed (14s, 19s, 30s), not front-loaded.
- Elon AI, tech commentary (60s). Same talking-head base, plus elevated frontoparietal (16%) as viewers critically evaluate claims. That evaluative state opens a natural CTA window.
- Perfume UGC street interview (35s). Highest somatomotor (27%) and elevated limbic (9%), the embodied, emotionally connected signature, with a strong 4s opening hook.
- Sanitary pad product demo (25s). Visual plus dorsal attention at 63% combined. Pure show-don't-tell. Frontoparietal is so low (5%) that viewers are watching, not evaluating, so a CTA needs a deliberate cognitive nudge first.
- Japanese ice cutter, viral satisfying (48s). The highest ventral attention (17%) of all five. That is the neural fingerprint of satisfying content: repeated surprise spikes that build an addictive watch-through loop.
Two clusters fall out cleanly. Talking-head formats run on somatomotor plus default mode (speech and narrative). Visual formats run on visual plus dorsal attention (sustained tracking). Metadata tools cannot see this distinction. Activation profiles make it obvious.
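The two-cluster split can be sketched as a comparison of network shares. This is an illustrative rule, not the paper's method: it assumes each percentage is a network's share of total mean activation, that activations are non-negative, and that a format belongs to whichever pair of networks carries more of the total. The function names and input values are hypothetical toy numbers, not figures from the five videos.

```python
def network_shares(mean_activation):
    """Convert mean activation per network into shares that sum to 1."""
    total = sum(mean_activation.values())
    return {name: value / total for name, value in mean_activation.items()}

def format_cluster(shares):
    """Label a clip by which network pair dominates its activation profile."""
    talking = shares.get("somatomotor", 0.0) + shares.get("default_mode", 0.0)
    visual = shares.get("visual", 0.0) + shares.get("dorsal_attention", 0.0)
    return "talking-head" if talking > visual else "visual"

# Toy profiles, invented for illustration only.
speech_like = network_shares({
    "somatomotor": 3.0, "default_mode": 2.0,
    "visual": 1.0, "dorsal_attention": 1.0, "frontoparietal": 1.0,
})
demo_like = network_shares({
    "visual": 4.0, "dorsal_attention": 3.0, "somatomotor": 1.0,
})
clusters = (format_cluster(speech_like), format_cluster(demo_like))
```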
What the breakdown reveals
- Hook timing is format-dependent. The front-load-the-hook dogma holds for UGC and tech commentary (hooks at 4s and 7s) and breaks for education and satisfying content, where the strongest moments land mid-to-late.
- Satisfying content has a measurable signature. The ice cutter's 17% ventral attention versus 8 to 12% elsewhere gives organic-reach creators a target they can check before publishing.
- CTA readiness varies wildly. Frontoparietal activation ranged from 5% (passive product demo) to 16% (evaluative tech commentary). Placing a CTA by the clock instead of by neural readiness leaves conversion on the table.
- The loop effect is visible. Sustained activation through the final seconds keeps a viewer primed when the platform auto-replays, inflating the watch-time metrics that drive distribution.
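Placing a CTA by neural readiness rather than by the clock can be sketched as a search for the window where frontoparietal activation is highest. This is a minimal illustration under stated assumptions: one sample per second, a fixed window width, and mean activation as the readiness signal. The function name, the default width, and the toy time course are hypothetical.

```python
from statistics import fmean

def best_cta_window(frontoparietal, width_s=3):
    """Return (start_second, mean_activation) of the most CTA-ready window."""
    if width_s > len(frontoparietal):
        raise ValueError("window is longer than the clip")
    best_start, best_mean = 0, float("-inf")
    # Slide a fixed-width window across the clip and keep the highest mean.
    for start in range(len(frontoparietal) - width_s + 1):
        window_mean = fmean(frontoparietal[start:start + width_s])
        if window_mean > best_mean:
            best_start, best_mean = start, window_mean
    return best_start, best_mean

# Toy frontoparietal time course, invented for illustration.
toy = [0.1, 0.1, 0.2, 0.6, 0.7, 0.6, 0.2]
start, level = best_cta_window(toy)
```

On the toy series the window starting at second 3 wins, which is where a CTA would go under this rule.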
Read the full paper
The complete 41-page paper includes the full methodology, the network-to-engagement mappings, deep per-video analyses with activation tables, and the limitations section. Read it inline below, or open the PDF.