Chapter 2: Generative AI Fundamentals

Introduction: The Building Blocks of Generation

To harness generative AI for scientific discovery, we must understand how these models work: not as black boxes, but as computational systems with clear mathematical foundations and architectural principles. This chapter demystifies, in plain English, the core technologies powering modern generative AI: transformers, diffusion models, flow matching, and variational approaches. We’ll explore their mechanisms, understand when to use each, and see how they can be adapted for scientific applications.

While the mathematics can become complex, our focus is on building intuition. Scientists don’t need to be machine learning experts to use these tools effectively, but understanding the fundamentals will help you choose the right model architecture, diagnose failures, adapt pre-trained models, collaborate with computational researchers, and evaluate limitations.

* * *

The Three Pillars of Generative AI

Modern generative AI rests on three major architectural families:

Architecture Core Mechanism Best For Example Applications
Transformers & LLMs Self-attention over sequences Text, code, sequences Literature synthesis, protein sequences, code generation
Diffusion & Flow Models Iterative denoising / flow matching Structured outputs, images Molecular structures, protein folding, climate data
VAEs & GANs Latent space learning Data generation, interpolation Synthetic data, anomaly detection, compression
* * *

Part I: Transformers and Large Language Models

The Attention Revolution

Traditional neural networks process sequences step-by-step. Transformers changed this with attention: every element attends directly to every other element simultaneously [1]. The breakthrough “Attention Is All You Need” paper introduced self-attention mechanisms that model long-range dependencies in parallel, eliminating the sequential bottleneck of recurrent networks.

The Attention Mechanism

Attention computes three quantities for each sequence element [1]:

Component Role Intuition
Query (Q) What am I looking for? The question each element asks
Key (K) What do I contain? How each element describes itself
Value (V) What do I contribute? The information each element provides

Attention formula:

Attention(Q, K, V) = softmax(Q·K^T / √d_k) · V

Here, “softmax” turns raw similarity scores into a meaningful probability distribution that tells the model how much attention to pay to each token.

Why This Works for Science:

  • Protein Sequences: Connect distant amino acids that interact in 3D
  • Scientific Literature: Link concepts mentioned paragraphs apart
  • Chemical Reactions: Identify relationships between reactants and products
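
The attention formula above can be sketched in a few lines of NumPy. This is a minimal single-head illustration (real implementations add masking, batching, and multiple heads):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q·K^T / √d_k)·V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise similarity, shape (seq, seq)
    weights = softmax(scores, axis=-1)   # each row is a probability distribution
    return weights @ V                   # weighted mix of value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention(Q, K, V)   # one output vector per token, shape (4, 8)
```

Each output row blends the value vectors of all tokens, weighted by how well that token's query matches the other tokens' keys.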

The Transformer Architecture

The original encoder-decoder architecture [1] consists of stacked attention and feed-forward layers:

Input → Embedding → Positional Encoding
    ↓
[ Multi-Head Attention → Add & Norm
    ↓
  Feed-Forward → Add & Norm ] × N layers
    ↓
Output Probabilities

Key Components:

Component Purpose Scientific Benefit
Multi-Head Attention Learn different patterns Capture multiple relationship types
Residual Connections Enable deep networks Scale to billions of parameters
Layer Normalization Stabilize training Faster convergence
Positional Encoding Track sequence order Understand structure (DNA direction, time series)
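
The positional encoding row of the table can be made concrete with the sinusoidal scheme from the original transformer paper [1]. A minimal sketch (assumes an even d_model):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal encoding: even dimensions use sin, odd dimensions use cos,
    # at geometrically spaced frequencies so each position gets a unique pattern
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(seq_len=10, d_model=16)
# pe is added to the token embeddings so attention can distinguish positions
```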

From Transformers to Large Language Models

1. Pre-Training: Learning General Patterns

Objective: Predict next token given context.

Input:  "The protein binds to the"
Target: "receptor"

The GPT series demonstrated the power of autoregressive language modeling at scale [2, 3], while BERT showed bidirectional pre-training for understanding tasks [4]. More recent models like LLaMA [5], Gemini [6], and Claude push the boundaries with trillions of tokens and multimodal capabilities.

Scientific Pre-Trained Models:

  • SciBERT [7]: Scientific papers
  • ESM-2 [8]: Protein sequences (750M sequences)
  • ESM-3 [9]: Multimodal protein model (sequence, structure, function) with 98B parameters (2024/2025)
  • Llama 3.1/3.2/3.3 [10]: Open foundation models with 128K context (2024)

2. Fine-Tuning: Domain Specialization

Full fine-tuning updates all model parameters, but parameter-efficient methods like LoRA [11] enable adaptation with minimal computational cost by learning low-rank updates to weight matrices.

Method Trainable Params Data Needed Use Case
Full Fine-Tuning 100% 10K-100K Maximum adaptation
LoRA [11] 0.1-1% 100-10K Limited compute
Adapters [12] 1-5% 1K-10K Task-specific layers
QLoRA [13] 0.1-1% 100-10K Quantized + LoRA (fine-tune 70B on consumer GPU)

3. Prompting: Zero-Shot and Few-Shot

Large language models demonstrate remarkable few-shot learning capabilities [3], enabling scientific applications without task-specific training.

Zero-Shot:

Summarize this oceanography paper: [text]

Few-Shot:

Examples:
SMILES: CCO → Name: Ethanol
SMILES: CC(C)O → Name: Propan-2-ol

SMILES: CCCO → Name: ?

Reasoning Models: A New Paradigm (2024–2025)

A significant development in 2024–2025 is the emergence of reasoning models that “think before they respond” [14, 15]. Unlike standard LLMs that generate answers in a single pass, reasoning models like OpenAI’s o1/o3 series and Gemini Deep Think produce an internal chain of thought before responding.

Key Characteristics:

  • Extended reasoning time for complex problems
  • Multi-step planning and self-verification
  • Superior performance on math, coding, and scientific reasoning
  • Configurable “reasoning effort” levels

Scientific Applications:

  • Complex mathematical proofs
  • Multi-step experimental design
  • Code debugging and algorithm implementation
  • Hypothesis evaluation requiring logical chains

Limitations for Science

Challenge Impact Mitigation
Hallucination False information RAG [16], verification, reasoning models
Lack of Uncertainty Overconfidence Ensemble methods [17]
Data Cutoff Outdated knowledge Fine-tuning, RAG [16]
Context Limits Long document handling Extended context models (128K+ tokens)
* * *

Part II: Diffusion Models and Flow Matching

The Denoising Paradigm

Diffusion models learn to generate by reversing a noise-addition process [18, 19]. The foundational work by Sohl-Dickstein et al. [18] established the thermodynamic interpretation, while Ho et al. [19] simplified training with the denoising objective.

Forward Process: Adding Noise

x₀ (real data) → x₁ → x₂ → ... → x_T (pure noise)

At each step: xₜ = √(1-βₜ)·xₜ₋₁ + √βₜ·ε where ε ~ N(0, I)

Reverse Process: Learning to Denoise

Train a network to predict the noise [19]:

Loss = E[||ε - ε_pred(xₜ, t)||²]

Generation: Sampling

  1. Start with noise: x_T ~ N(0, I)
  2. Iteratively denoise: x_T → ... → x₁ → x₀
  3. Result: novel sample
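
Because the noising steps compose in closed form, any noisy xₜ can be sampled directly from x₀ during training. A minimal NumPy sketch (the linear beta schedule here is a common illustrative choice, not the only option):

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    # Closed form of the forward process:
    # x_t = √(alpha_bar_t)·x0 + √(1 - alpha_bar_t)·ε,  alpha_bar_t = Π(1 - β_s)
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # linear noise schedule
x0 = rng.standard_normal(16)            # stand-in for a real data sample
xt, eps = forward_noise(x0, t=999, betas=betas, rng=rng)
# At t = 999 almost all signal is gone: xt is approximately pure noise
```

The denoising network is trained to recover eps from (xt, t), exactly the loss shown above.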

Flow Matching: A Powerful Alternative (2023–2025)

Flow Matching [20] has emerged as a powerful and efficient alternative to diffusion-based generative modeling, with growing interest in scientific applications [21]. Rather than learning to reverse a noising process, flow matching learns a velocity field that transports samples from noise to data along continuous paths.

Key Advantages:

  • Straighter trajectories: Fewer sampling steps required
  • Stable training: Simpler objectives, less hyperparameter tuning
  • Faster inference: Often 2-4x speedup over diffusion
  • Flexible conditioning: Natural incorporation of constraints
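
A training pair for the simplest (linear-path, rectified-flow style) variant of flow matching takes only a few lines; the velocity network is then regressed onto v_target. A sketch under that assumption:

```python
import numpy as np

def fm_training_pair(x0, x1, rng):
    # Linear path x_t = (1 - t)·x0 + t·x1 from noise (x0) to data (x1);
    # its velocity dx_t/dt = x1 - x0 is the regression target for the network
    t = rng.uniform(size=(x0.shape[0], 1))
    xt = (1 - t) * x0 + t * x1
    v_target = x1 - x0
    return xt, t, v_target

rng = np.random.default_rng(0)
noise = rng.standard_normal((8, 2))
data = rng.standard_normal((8, 2))   # stand-in for a data batch
xt, t, v = fm_training_pair(noise, data, rng)
# Training loss would be mean((model(xt, t) - v)**2);
# sampling integrates the learned velocity field from noise to data
```

Because these paths are straight by construction, few integration steps suffice at sampling time, which is where the speedup over diffusion comes from.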

Flow Matching in Biology (2024–2025):

  • Molecule generation (NeurIPS 2023) [21]
  • Protein backbone generation with SE(3)-equivariant flows (ICLR 2024) [21]
  • Antibody design with IgFlow and dyAb [21]
  • Biological sequence and peptide generation (ICML 2024) [21]

Why Diffusion/Flow Works for Science

Score-based generative modeling [22] provides a continuous-time perspective, while empirical results show diffusion models produce higher-quality samples than GANs [23].

Feature Scientific Benefit
High Quality Realistic structures
Stable Training Easier than GANs [23]
Interpretable Visualize generation
Conditional Incorporate constraints [24]
Uncertainty Multiple samples

Applications:

  • Molecular conformations with valency constraints [25]
  • Protein structures respecting physics [26]
  • Climate data filling gaps while conserving energy
  • High-resolution image synthesis with latent diffusion [27]
  • Probabilistic weather forecasting with GenCast [28]

Conditional Diffusion

Generate data with specific properties using classifier guidance or classifier-free guidance [24]:

p(xₜ₋₁ | xₜ, condition)

Conditioning Methods:

Method Example
Classifier Guidance [23] “Molecule binds to protein X”
Classifier-Free [24] Train with/without conditions
Inpainting Fill missing climate data
* * *

Part III: VAEs and GANs

Variational Autoencoders (VAEs)

VAEs learn probabilistic latent representations through variational inference [29]:

Architecture:

Encoder: x → [μ(x), σ(x)]
Sample: z ~ N(μ, σ²)
Decoder: z → x̂

Loss:

Total = Reconstruction + KL_Divergence
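
Both terms are cheap to compute: the KL term has a closed form for Gaussian encoders, and the reparameterization trick (z = μ + σ·ε) keeps sampling differentiable. A NumPy sketch:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    # z = mu + sigma·eps: the noise is external, so gradients flow through mu and sigma
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    # Closed-form KL( N(mu, sigma²) || N(0, I) ), summed over latent dimensions
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)
mu, log_var = np.zeros(4), np.zeros(4)   # encoder output for one example
z = reparameterize(mu, log_var, rng)
# Here the posterior already equals the prior, so the KL term is exactly 0
```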

Scientific Uses:

  • Chemistry: Interpolate between molecules [30]
  • Materials: Optimize in latent space
  • Genomics: Compress gene expression data

Generative Adversarial Networks (GANs)

GANs introduced adversarial training between generator and discriminator [31]:

Two Networks in Competition:

Network Role Goal
Generator Create fakes Fool discriminator
Discriminator Judge real/fake Detect fakes

Challenges:

  • Mode collapse (limited diversity)
  • Training instability
  • Difficult evaluation

Scientific Uses:

  • Data augmentation
  • Super-resolution
  • Image-to-image translation
  • Molecular graph generation [32]

Comparison

Criterion VAE GAN Diffusion Flow Matching
Quality Good Excellent Excellent Excellent
Stability Stable Unstable Stable Very Stable
Diversity Good Poor Excellent Excellent
Speed Fast Fast Slow Moderate
Latent Space Interpretable Opaque Implicit Implicit
* * *

Part IV: Pre-Training and Fine-Tuning

The Transfer Learning Pipeline

Pre-Training (general data, millions of examples)
    ↓
Fine-Tuning (domain data, thousands of examples)
    ↓
Prompting (task-specific, zero examples)

Foundation models [33] trained on massive corpora can be adapted to downstream scientific tasks with limited data.

Parameter-Efficient Fine-Tuning (PEFT)

LoRA Example [11]:

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,  # Adaptation rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"]
)

model = get_peft_model(base_model, config)
# Now only 0.5% of parameters are trainable!

QLoRA [13] combines quantization with LoRA, enabling fine-tuning of 70B models on consumer GPUs: a game-changer for scientific labs with limited compute budgets.

Benefits:

  • Train on single GPU
  • Fast iteration
  • Multiple task-specific versions

Open vs. Closed Models for Science

The 2024–2025 landscape offers scientists unprecedented choice:

Model Family Access Parameters Best For
Llama 3.1 / 3.2 / 3.3 Open weights 1B–405B (text); 11B / 90B (vision variants) Local deployment, fine-tuning, on-prem and private customization
OpenAI GPT-4/4o/5.1/5.2 API (and ChatGPT) Not disclosed Advanced reasoning, multimodal understanding, coding, tool-using agents
Gemini 2.0/3.0 (Pro, Flash) API (Gemini API, Vertex AI) Not disclosed Multimodal tasks, very long context (up to ~1M tokens on supported models)
Claude (Claude 3.5, Sonnet/Opus 4.5) API Not disclosed Deep analysis, coding, safety-oriented assistants, long-context workflows
ESM-3 API + open weights (ESM3-open) 1.4B (open); up to ~98B (largest) Protein modeling, structure/function prediction, generative protein design
* * *

Part V: Mathematical Foundations

Low-Rank Approximation

Large models use low-rank structures for efficiency [11].

Matrix Factorization:

W ∈ R^(m×n) ≈ A·B where A ∈ R^(m×r), B ∈ R^(r×n), r << min(m,n)

Applications:

  • LoRA adaptation [11]
  • Model compression
  • Fast inference
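
The factorization above is easy to demonstrate with a truncated SVD, which gives the best rank-r approximation in the Frobenius norm (the Eckart–Young theorem). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))   # a weight matrix to compress

U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 8
A = U[:, :r] * s[:r]   # (64, r): left factor, scaled by the top singular values
B = Vt[:r]             # (r, 32): right factor
W_r = A @ B            # best rank-8 approximation of W

# Storage drops from 64*32 = 2048 values to 64*8 + 8*32 = 768
```

LoRA exploits the same structure in reverse: instead of compressing an existing matrix, it learns only the small factors A and B as the weight update.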

Quantization

Reduce precision for efficiency [34, 35]:

Precision Bits Range Use Case
FP32 32 Full Training
FP16/BF16 16 Mixed Fast training
INT8 8 Limited Inference
INT4 4 Very limited Edge deployment, QLoRA

LLM.int8() [34] and GPTQ [35] enable accurate quantization without retraining.

Benefits:

  • 4× smaller models (FP32 → INT8)
  • 2-4Γ— faster inference
  • Run large models on consumer GPUs
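
The core idea can be illustrated with symmetric per-tensor INT8 quantization. Production methods like LLM.int8() [34] add per-channel scales and outlier handling; this is only a sketch:

```python
import numpy as np

def quantize_int8(w):
    # Map [-max|w|, +max|w|] onto the integer range [-127, 127] with one scale
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).clip(-127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()   # rounding error, at most scale / 2
```

Each weight now occupies 1 byte instead of 4, at the cost of a bounded rounding error per value.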

Optimization Theory

Key Algorithms:

Optimizer Mechanism When to Use
SGD Basic gradient descent Small models, well-understood problems
Adam [36] Adaptive learning rates Default choice, most robust
AdamW [37] Adam + weight decay Large language models

Learning Rate Schedules:

Warmup → Constant → Decay
  • Warmup: Start small, gradually increase (stabilize training)
  • Constant: Maintain learning rate (main training)
  • Decay: Reduce toward end (fine-tune solution)
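
The warmup-constant-decay pattern is straightforward to implement. The step counts and learning rates below are illustrative placeholders, not recommended values:

```python
import math

def lr_schedule(step, warmup=1_000, constant=10_000, total=100_000,
                peak=3e-4, floor=3e-5):
    if step < warmup:                # linear warmup from 0 to peak
        return peak * step / warmup
    if step < warmup + constant:     # hold at peak for the main training phase
        return peak
    # cosine decay from peak down to floor for the remaining steps
    progress = (step - warmup - constant) / (total - warmup - constant)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * min(progress, 1.0)))
```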
* * *

Part VI: Types of Generative AI by Modality

Text Generation

Capabilities:

  • Scientific writing
  • Code generation
  • Literature synthesis
  • Hypothesis generation

Models: GPT-4o/4.5/o1/o3 [14, 15], Llama 3.1/3.2/3.3/4 [10], Gemini 2.0/3.0 [6], Claude.

Image Generation

Capabilities:

  • Synthetic microscopy
  • Medical imaging
  • Scientific visualization
  • Data augmentation

Models: Stable Diffusion 3/3.5 [38], SDXL [27], DALL-E 3

Molecular Generation

Capabilities:

  • Drug discovery
  • Material design
  • Reaction prediction

Representations:

  • SMILES strings (text-like)
  • Molecular graphs [32]
  • 3D conformations [25]

Models: MolGAN [32], Diffusion-based generators [25], VAEs [30], Flow matching models [21]

Protein Generation

Capabilities:

  • Structure prediction (AlphaFold 3 [39], ESMFold [8])
  • Sequence design
  • Function prediction
  • Multimodal generation (ESM-3) [9]

Models: ESM-2 [8], ESM-3 [9], ProtGPT2 [40], RFdiffusion [26]

2024–2025 Highlight: ESM-3 [9]

ESM-3, published in Science (January 2025), is a 98B-parameter multimodal generative model that reasons over protein sequence, structure, and function simultaneously. It generated a novel green fluorescent protein (GFP) with only 58% sequence identity to known GFPs, equivalent to simulating 500 million years of evolution.

Graph Generation

Capabilities:

  • Molecular graphs
  • Protein-protein interaction networks
  • Knowledge graphs

Models: GraphRNN, MolGAN [32], GraphVAE

Multimodal Generation

Capabilities:

  • Text → Image (CLIP [41] + diffusion)
  • Text → Molecule (MolT5)
  • Image → Text (scientific figure captioning)
  • Cross-modal retrieval

Models: CLIP [41], GPT-4o/4V [15], Gemini 2.0 [6]

* * *

Design Principles for Scientific Applications

1. Incorporate Domain Knowledge

Physics-Informed Neural Networks (PINNs) [42]: Embed differential equations in loss function:

Loss = Data_Loss + λ·Physics_Loss

Physics_Loss = ||βˆ‚u/βˆ‚t - Ξ±βˆ‡Β²u||Β²  (heat equation)

Benefits:

  • Better generalization
  • Physical consistency
  • Data efficiency
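
For the heat equation above, the physics term penalizes how badly a candidate field violates ∂u/∂t = α∇²u. Real PINNs evaluate this residual with automatic differentiation at sampled collocation points; the finite-difference grid version below is only an illustration of the loss structure:

```python
import numpy as np

def heat_residual(u, dx, dt, alpha):
    # Residual of du/dt - alpha·d2u/dx2 at interior grid points,
    # with u sampled on a grid u[time_index, space_index]
    du_dt = (u[1:, 1:-1] - u[:-1, 1:-1]) / dt
    d2u_dx2 = (u[:-1, 2:] - 2 * u[:-1, 1:-1] + u[:-1, :-2]) / dx ** 2
    return du_dt - alpha * d2u_dx2

def pinn_loss(data_loss, u, dx, dt, alpha, lam=1.0):
    # Total = Data_Loss + lambda·Physics_Loss
    physics_loss = np.mean(heat_residual(u, dx, dt, alpha) ** 2)
    return data_loss + lam * physics_loss
```

A field that satisfies the PDE drives the physics term toward zero, so the model is rewarded for physical consistency even where data is missing.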

2. Quantify Uncertainty

Methods:

  • Ensemble [17]: Train multiple models, measure spread
  • Bayesian [43]: Maintain distributions over parameters
  • Conformal [44]: Finite-sample coverage guarantees
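
The ensemble approach needs no special machinery: train several models independently and treat their disagreement as uncertainty. A minimal sketch with toy stand-in models:

```python
import numpy as np

def ensemble_predict(models, x):
    # Stack predictions from independently trained models;
    # the spread (std) flags inputs where the ensemble disagrees
    preds = np.stack([m(x) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)

# Toy 'ensemble': three models that agree at x = 0 and diverge far from it
models = [lambda x, a=a: a * x for a in (0.9, 1.0, 1.1)]
mean, spread = ensemble_predict(models, np.array([0.0, 10.0]))
# spread is 0 at x = 0 and grows with |x|: low- vs. high-confidence regions
```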

Why Essential: Scientific decisions have consequences; we must know when models are uncertain.

3. Ensure Reproducibility

Checklist:

  • Set random seeds
  • Log hyperparameters
  • Version control data and code
  • Document environment
  • Share trained models
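
The first checklist item can be wrapped in a single helper. This sketch seeds Python and NumPy; extend it with the equivalent call in your framework (e.g. torch.manual_seed) when applicable:

```python
import random
import numpy as np

def set_seed(seed: int):
    # Seed every RNG the pipeline touches; add framework-specific calls as needed
    random.seed(seed)
    np.random.seed(seed)

set_seed(42)
a = np.random.rand(3)
set_seed(42)
b = np.random.rand(3)
assert np.allclose(a, b)   # identical seeds give identical runs
```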

4. Validate Rigorously

Validation Type Purpose
Hold-out Test Independent performance
Cross-validation Robust estimates
Domain Expert Scientific plausibility
Physical Constraints Obey natural laws
Out-of-distribution Generalization limits
* * *

Practical Considerations

Model Selection Guide

Your Goal Recommended Architecture Rationale
Generate scientific text Transformer LLM [2, 3, 10] Excellent at language
Complex reasoning/math Reasoning models (o1/o3) [14, 15] Multi-step verification
Design molecules Diffusion [25] or Flow Matching [21] High-quality structures
Predict protein structure ESM-3 [9], AlphaFold 3 [39] State-of-the-art specialized models
Fill missing data Conditional diffusion [24] Respects constraints
Synthetic data augmentation GAN [31] or VAE [29] Fast generation
Explore design space VAE [29] Interpretable latent space

Computational Requirements

Task Typical Resources Alternative
Fine-tune small LLM Single GPU, 1-2 days Use LoRA/QLoRA [11, 13] for efficiency
Train diffusion model 4-8 GPUs, 3-7 days Use pre-trained models [27]
Inference (LLM) 1 GPU or CPU Quantization [34, 35] for speed
Protein structure 1 GPU, minutes Use hosted APIs

Common Pitfalls

Pitfall Consequence Solution
Overfitting Poor generalization Regularization, more data
Underfitting Poor performance Larger model, more training
Data leakage Inflated metrics Careful splitting
Ignoring uncertainty Overconfident predictions Uncertainty quantification [17, 43, 44]
Black-box use Unexplainable failures Understand fundamentals
* * *

Summary

This chapter introduced the core architectures powering generative AI:

Transformers [1] excel at sequential data through attention mechanisms, enabling scientific text generation [2, 3, 10], code synthesis, and protein language models [8, 9]. The emergence of reasoning models [14, 15] in 2024–2025 represents a paradigm shift for complex scientific problem-solving.

Diffusion models [18, 19, 22] and flow matching [20, 21] generate high-quality structured outputs by learning to reverse noise or transport samples along continuous paths, ideal for molecular design [25], protein structures [26], and climate data reconstruction [28].

VAEs [29] and GANs [31] provide complementary approaches for data generation, exploration, and augmentation.

Transfer learning [33] (pre-training followed by fine-tuning [11, 12, 13]) allows scientists to leverage massive general models with modest domain-specific data. Open models like Llama 3/4 [10] democratize access for scientific labs.

Mathematical foundations like low-rank approximation [11], quantization [34, 35], and optimization theory [36, 37] enable efficient training and deployment.

Understanding these fundamentals empowers you to:

  • Choose appropriate models for your scientific problems
  • Adapt existing models to new domains
  • Diagnose and fix failures
  • Collaborate effectively with AI researchers
  • Push the boundaries of what’s possible in your field
* * *

References

  1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1706.03762

  2. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners (GPT-2). OpenAI Blog. https://openai.com/research/better-language-models

  3. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., et al. (2020). Language models are few-shot learners (GPT-3). Advances in Neural Information Processing Systems, 33, 1877–1901. https://arxiv.org/abs/2005.14165

  4. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT, 4171–4186. https://arxiv.org/abs/1810.04805

  5. Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M. A., Lacroix, T., et al. (2023). LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971. https://arxiv.org/abs/2302.13971

  6. Gemini Team, Google. (2024). Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530. https://arxiv.org/abs/2403.05530

  7. Beltagy, I., Lo, K., & Cohan, A. (2019). SciBERT: A pretrained language model for scientific text. Proceedings of EMNLP-IJCNLP, 3615–3620. https://arxiv.org/abs/1903.10676

  8. Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., et al. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model (ESM-2/ESMFold). Science, 379(6637), 1123–1130. https://doi.org/10.1126/science.ade2574

  9. Hayes, T., Rao, R., Akin, H., Sofroniew, N. J., Oktay, D., Lin, Z., et al. (2025). Simulating 500 million years of evolution with a language model (ESM-3). Science, 387(6736), 850–858. https://doi.org/10.1126/science.ads0018

  10. Meta AI. (2024). Llama 3.1, 3.2, and 3.3: The most capable openly available LLMs. Meta AI Blog. https://ai.meta.com/blog/meta-llama-3-1/

  11. Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., et al. (2022). LoRA: Low-rank adaptation of large language models. Proceedings of ICLR. https://arxiv.org/abs/2106.09685

  12. Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., et al. (2019). Parameter-efficient transfer learning for NLP. Proceedings of ICML, 2790–2799. https://arxiv.org/abs/1902.00751

  13. Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2024). QLoRA: Efficient finetuning of quantized LLMs. Proceedings of NeurIPS, 36. https://arxiv.org/abs/2305.14314

  14. OpenAI. (2024). Introducing OpenAI o1. OpenAI Blog. https://openai.com/index/introducing-openai-o1-preview/

  15. OpenAI. (2024). GPT-4o and reasoning models. OpenAI Platform. https://platform.openai.com/docs/models

  16. Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Proceedings of NeurIPS, 33, 9459–9474. https://arxiv.org/abs/2005.11401

  17. Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. Proceedings of NeurIPS, 30. https://arxiv.org/abs/1612.01474

  18. Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. Proceedings of ICML, 2256–2265. https://arxiv.org/abs/1503.03585

  19. Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851. https://arxiv.org/abs/2006.11239

  20. Lipman, Y., Chen, R. T. Q., Ben-Hamu, H., Nickel, M., & Le, M. (2023). Flow matching for generative modeling. Proceedings of ICLR. https://arxiv.org/abs/2210.02747

  21. Li, Z., Zeng, Z., Lin, X., Fang, F., Qu, Y., Xu, Z., et al. (2025). Flow matching meets biology and life science: A survey. arXiv preprint arXiv:2507.17731. https://arxiv.org/abs/2507.17731

  22. Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021). Score-based generative modeling through stochastic differential equations. Proceedings of ICLR. https://arxiv.org/abs/2011.13456

  23. Dhariwal, P., & Nichol, A. (2021). Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34, 8780–8794. https://arxiv.org/abs/2105.05233

  24. Ho, J., & Salimans, T. (2022). Classifier-free diffusion guidance. NeurIPS Workshop on Score-Based Methods. https://arxiv.org/abs/2207.12598

  25. Hoogeboom, E., Satorras, V. G., Vignac, C., & Welling, M. (2022). Equivariant diffusion for molecule generation in 3D. Proceedings of ICML, 8867–8887. https://arxiv.org/abs/2203.17003

  26. Watson, J. L., Juergens, D., Bennett, N. R., Trippe, B. L., Yim, J., Eisenach, H. E., et al. (2023). De novo design of protein structure and function with RFdiffusion. Nature, 620(7976), 1089–1100. https://doi.org/10.1038/s41586-023-06415-8

  27. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models (Stable Diffusion). Proceedings of CVPR, 10684–10695. https://arxiv.org/abs/2112.10752

  28. Price, I., Sanchez-Gonzalez, A., Alet, F., Andersson, T. R., El-Kadi, A., Masters, D., et al. (2024). Probabilistic weather forecasting with machine learning (GenCast). Nature, 636, 84–90. https://doi.org/10.1038/s41586-024-08252-9

  29. Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. Proceedings of ICLR. https://arxiv.org/abs/1312.6114

  30. GΓ³mez-Bombarelli, R., Wei, J. N., Duvenaud, D., HernΓ‘ndez-Lobato, J. M., SΓ‘nchez-Lengeling, B., Sheberla, D., et al. (2018). Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2), 268–276. https://doi.org/10.1021/acscentsci.7b00572

  31. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial networks. Advances in Neural Information Processing Systems, 27. https://arxiv.org/abs/1406.2661

  32. De Cao, N., & Kipf, T. (2018). MolGAN: An implicit generative model for small molecular graphs. ICML Workshop on Theoretical Foundations and Applications of Deep Generative Models. https://arxiv.org/abs/1805.11973

  33. Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., et al. (2021). On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258. https://arxiv.org/abs/2108.07258

  34. Dettmers, T., Lewis, M., Belkada, Y., & Zettlemoyer, L. (2022). LLM.int8(): 8-bit matrix multiplication for transformers at scale. Proceedings of NeurIPS, 35, 30318–30332. https://arxiv.org/abs/2208.07339

  35. Frantar, E., Ashkboos, S., Hoefler, T., & Alistarh, D. (2023). GPTQ: Accurate post-training quantization for generative pre-trained transformers. Proceedings of ICLR. https://arxiv.org/abs/2210.17323

  36. Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. Proceedings of ICLR. https://arxiv.org/abs/1412.6980

  37. Loshchilov, I., & Hutter, F. (2019). Decoupled weight decay regularization (AdamW). Proceedings of ICLR. https://arxiv.org/abs/1711.05101

  38. Stability AI. (2024). Stable Diffusion 3: Scaling rectified flow transformers for high-resolution image synthesis. https://stability.ai/stable-image

  39. Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A., et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630(8016), 493–500. https://doi.org/10.1038/s41586-024-07487-w

  40. Ferruz, N., Schmidt, S., & HΓΆcker, B. (2022). ProtGPT2 is a deep unsupervised language model for protein design. Nature Communications, 13(1), 4348. https://doi.org/10.1038/s41467-022-32007-7

  41. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., et al. (2021). Learning transferable visual models from natural language supervision (CLIP). Proceedings of ICML, 8748–8763. https://arxiv.org/abs/2103.00020

  42. Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686–707. https://doi.org/10.1016/j.jcp.2018.10.045

  43. Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. Proceedings of ICML, 1050–1059. https://arxiv.org/abs/1506.02142

  44. Angelopoulos, A. N., & Bates, S. (2021). A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511. https://arxiv.org/abs/2107.07511


* * *

This chapter provides the foundational knowledge needed to understand and apply generative AI in scientific contexts. For the latest developments, regularly check arXiv (cs.LG, cs.CL, q-bio sections) and major conference proceedings (NeurIPS, ICML, ICLR, CVPR).

* * *

Next → Chapter 3: Scientific Data & Workflows, navigating the unique challenges of scientific datasets and integrating AI into research pipelines.