Anthrogen has launched Odyssey, a family of protein language models for sequence and structure generation, protein editing, and conditional design. The production models range from 1.2B to 102B parameters. Anthrogen's research team positions Odyssey as a frontier, multimodal model for real protein design workloads and notes that an API is in early access.
https://www.biorxiv.org/content/10.1101/2025.10.15.682677v1.full.pdf
What problem does Odyssey target?
Protein design couples amino acid sequence with 3D structure and with functional context. Many prior models adopt self attention, which mixes information across the entire sequence at once. Proteins follow geometric constraints, so long range effects travel through local neighborhoods in 3D. Anthrogen frames this as a locality problem and proposes a new propagation rule, called Consensus, that better matches the domain.
Input representation and tokenization
Odyssey is multimodal. It embeds sequence tokens, structure tokens, and lightweight functional cues, then fuses them into a shared representation. For structure, Odyssey uses a finite scalar quantizer (FSQ) to convert 3D geometry into compact tokens. Think of FSQ as an alphabet for shapes that lets the model read structure as easily as sequence. Functional cues can include domain tags, secondary structure hints, orthologous group labels, or short text descriptors. This joint view gives the model access to local sequence patterns and long range geometric relations in a single latent space.
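The article does not describe Odyssey's structure encoder. The sketch below therefore illustrates only the generic FSQ step the article names. It bounds each latent dimension, rounds it to a small set of integer levels, and combines the per-dimension digits into a single structure token id. The sketch assumes NumPy and a hypothetical per-residue latent `z` produced by some upstream encoder of 3D geometry. The level choice `[7, 5, 5, 5]` (875 possible tokens) is also illustrative. None of these values come from Anthrogen.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Map continuous per-residue latents to discrete structure token ids.

    z:      (n_residues, d) array, hypothetical encoder output for 3D geometry
    levels: d odd integers, number of quantization levels per latent dimension
    """
    levels = np.asarray(levels)
    if np.any(levels % 2 == 0):
        raise ValueError("this sketch only handles odd level counts")
    half = (levels - 1) // 2
    # Squash each dimension into (-half, half), then snap to the nearest integer level
    q = np.round(np.tanh(z) * half).astype(int)
    digits = q + half  # shift from [-half, half] to [0, levels - 1]
    # Mixed-radix encoding: dimension k contributes digits[:, k] * prod(levels[:k])
    radix = np.concatenate(([1], np.cumprod(levels[:-1])))
    return digits @ radix

levels = [7, 5, 5, 5]
vocab_size = int(np.prod(levels))          # 875 possible structure tokens
rng = np.random.default_rng(0)
z = rng.normal(size=(6, len(levels)))      # 6 residues, 4 latent dims (random stand-in)
tokens = fsq_quantize(z, levels)
print(tokens, vocab_size)
```

Every residue ends up with one integer drawn from a fixed, small vocabulary, which is what lets structure sit next to amino acid tokens in a single model. In a trained system the rounding is usually paired with a straight-through gradient estimator so the encoder can learn; that detail is omitted here.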
Backbone change, Consensus instead of self attention
Consensus replaces global self attention with iterative, locality aware updates on a sparse contact or sequence graph. Each layer encourages nearby neighborhoods to agree first, then spreads that agreement outward across the chain and contact graph. This change alters compute. Self attention scales as O(L²) with sequence length L. Anthrogen reports that Consensus scales as O(L), which keeps long sequences and multi domain constructs affordable. The company also reports improved robustness to learning rate choices at larger scales, which reduces brittle runs and restarts.
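The article does not give Anthrogen's exact Consensus update rule. The sketch below is a hypothetical stand-in that only illustrates the locality argument. Each residue repeatedly blends its state with the mean of its neighbors on a sparse graph made of sequence neighbors plus a few 3D contacts. The graph construction, the mixing weight `alpha`, and the contact list are all assumptions. Each update touches every edge once, so its cost is O(E). When each residue has a bounded number of neighbors, E grows linearly with L, unlike the L² pairwise scores of self attention.

```python
import numpy as np

def build_sparse_graph(L, contacts, window=1):
    """Directed edges: sequence neighbors within `window`, plus symmetric 3D contacts."""
    edges = set()
    for i in range(L):
        for j in range(max(0, i - window), min(L, i + window + 1)):
            if i != j:
                edges.add((i, j))
    for i, j in contacts:
        edges.add((i, j))
        edges.add((j, i))
    return np.array(sorted(edges))  # shape (E, 2)

def consensus_like_step(h, edges, alpha=0.5):
    """Hypothetical locality-aware update: move each node toward its neighbors' mean.

    One pass over the edge list, so the cost is O(E); no L x L matrix is formed.
    """
    src, dst = edges[:, 0], edges[:, 1]
    agg = np.zeros_like(h)
    np.add.at(agg, dst, h[src])                      # sum neighbor states per node
    deg = np.bincount(dst, minlength=h.shape[0])[:, None]
    neighbor_mean = agg / np.maximum(deg, 1)
    return (1 - alpha) * h + alpha * neighbor_mean

L, d = 200, 16
rng = np.random.default_rng(0)
h = rng.normal(size=(L, d))
edges = build_sparse_graph(L, contacts=[(10, 150), (40, 120)])
for _ in range(8):  # stacked layers spread local agreement outward step by step
    h = consensus_like_step(h, edges)
print(len(edges), L * L)  # 402 40000: sparse edges vs dense attention pairs
```

Information moves at most one hop per step along the chain, and the contact edges act as shortcuts between residues that are far apart in sequence but close in 3D. This mirrors the "agree locally, then spread" description in spirit only.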
Training objective and generation, discrete diffusion
Odyssey trains with discrete diffusion on sequence and structure tokens. The forward process applies masking noise that mimics mutation. The reverse time denoiser learns to reconstruct a sequence and coordinates that are consistent with each other. At inference, the same reverse process supports conditional generation and editing. You can hold a scaffold, fix a motif, mask a loop, or add a functional tag, and then let the model complete the rest while keeping sequence and structure in sync.
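The masking forward process and the editing workflow described above can be sketched with a generic mask-based (absorbing-state) discrete diffusion loop. This is not Odyssey's sampler. The `MASK` id, the unmasking schedule, and the `denoiser` callable are hypothetical. The placeholder denoiser returns uniform probabilities instead of calling a trained model. For brevity the sketch uses a single token track, whereas Odyssey applies the process to sequence and structure tokens together. The point is structural: positions the user holds fixed are never noised or resampled, and only the masked loop is filled in over several reverse steps.

```python
import numpy as np

MASK = -1  # hypothetical id for the absorbing "masked" state

def forward_mask(tokens, t, rng, frozen=None):
    """Forward noising: hide each token independently with probability t.

    Positions flagged in `frozen` (e.g. a fixed motif) are never masked.
    """
    tokens = np.asarray(tokens)
    hide = rng.random(tokens.shape) < t
    if frozen is not None:
        hide &= ~frozen
    return np.where(hide, MASK, tokens)

def reverse_unmask(x, denoiser, steps, rng):
    """Reverse process: fill the masked positions over `steps` rounds.

    `denoiser(x)` stands in for a trained model and must return an (L, V)
    array of per-position probabilities over the vocabulary.
    """
    x = x.copy()
    for s in range(steps):
        masked = np.flatnonzero(x == MASK)
        if masked.size == 0:
            break
        # Reveal an even share of the remaining masks so the last step finishes
        k = int(np.ceil(masked.size / (steps - s)))
        chosen = rng.choice(masked, size=k, replace=False)
        probs = denoiser(x)
        for i in chosen:
            x[i] = rng.choice(probs.shape[1], p=probs[i])
    return x

rng = np.random.default_rng(0)
L, V = 30, 20
seq = rng.integers(0, V, size=L)
noisy = forward_mask(seq, t=rng.random(), rng=rng)  # one training-style corruption

def uniform_denoiser(state):
    """Placeholder for a trained model: uniform over the V tokens at every position."""
    return np.full((len(state), V), 1.0 / V)

# Editing example: keep the scaffold, redesign a loop (residues 12-17)
loop = np.zeros(L, dtype=bool)
loop[12:18] = True
x = np.where(loop, MASK, seq)  # only the loop is masked
design = reverse_unmask(x, uniform_denoiser, steps=3, rng=rng)
```

Swapping `uniform_denoiser` for a real model conditioned on the unmasked context is what would make the filled loop fit the scaffold. The conditioning itself comes entirely from which positions are left unmasked.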
Anthrogen reports matched comparisons in which diffusion outperforms masked language modeling during evaluation. The page notes lower training perplexities for diffusion versus complex masking, and lower or comparable training perplexities versus simple masking. In validation, diffusion models outperform their masked counterparts, while a 1.2B masked model tends to overfit to its own masking schedule. The company argues that diffusion learns the joint distribution of the full protein, which aligns with sequence plus structure co-design.
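Two quantities in this comparison are easy to pin down in code. The first is perplexity on masked positions: the exponential of the mean negative log-likelihood of the hidden tokens. The second is the difference between the objectives, which shows up in how much of the input is hidden during training. The sketch assumes a fixed BERT-style rate of 0.15 for the masked language model and a uniformly sampled rate t for diffusion. Both are illustrative choices, not the schedules Anthrogen used, and the article does not say exactly how its perplexities were computed.

```python
import numpy as np

def masked_perplexity(log_probs, targets, mask):
    """exp(mean NLL) over the positions that were hidden from the model.

    log_probs: (L, V) per-position log-probabilities over the vocabulary
    targets:   (L,) true token ids
    mask:      (L,) bool, True where the token was masked and must be predicted
    """
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(np.exp(nll[mask].mean()))

def sample_mask_rate(objective, rng):
    """Illustrative schedules: one fixed rate for MLM, t ~ U(0, 1) for diffusion."""
    if objective == "mlm":
        return 0.15  # BERT-style constant, assumed for illustration
    if objective == "diffusion":
        return float(rng.random())
    raise ValueError(f"unknown objective: {objective}")

rng = np.random.default_rng(0)
L, V = 50, 20
targets = rng.integers(0, V, size=L)
mask = rng.random(L) < sample_mask_rate("diffusion", rng)
mask[rng.integers(L)] = True  # ensure at least one position is scored
log_probs = np.full((L, V), -np.log(V))  # a model that guesses uniformly
ppl = masked_perplexity(log_probs, targets, mask)
print(ppl)  # about 20, the vocabulary size, for a uniform guesser
```

A model trained at one fixed rate only sees a narrow band of corruption levels. That is one plausible reading of the overfitting to the masking schedule that the article reports. Sampling t across (0, 1) exposes the model to everything from nearly clean to fully masked inputs, and generation from scratch starts from the fully masked end.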
Key takeaways
Odyssey is a multimodal protein model family that fuses sequence, structure, and functional context, with production models at 1.2B, 8B, and 102B parameters.
Consensus replaces self attention with locality aware propagation that scales as O(L) and shows robust learning rate behavior at larger scales.
FSQ converts 3D coordinates into discrete structure tokens for joint sequence and structure modeling.
Discrete diffusion trains a reverse time denoiser and, in matched comparisons, outperforms masked language modeling during evaluation.
Anthrogen reports better performance with about 10x less data than competing models, which addresses data scarcity in protein modeling.
Odyssey is an impressive model because it operationalizes joint sequence and structure modeling with FSQ, Consensus, and discrete diffusion, enabling conditional design and editing under realistic constraints. Odyssey scales to 102B parameters with O(L) complexity for Consensus, which lowers cost for long proteins, and Anthrogen reports improved learning rate robustness at scale. Anthrogen also reports diffusion outperforming masked language modeling in matched evaluations, which aligns with co-design objectives. The system targets multi-objective design, including potency, specificity, stability, and manufacturability. The research team emphasizes data efficiency of roughly 10x versus competing models, which matters in domains with scarce labeled data.
Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.

