{"api_version": 1, "episode_id": "ep_latent_space_the_ai_engineer_podcast_16b3920a36ea", "title": "\ud83d\udd2c Training Transformers to solve 95% failure rate of Cancer Trials \u2014 Ron Alfa & Daniel Bear, Noetik", "podcast": "Latent Space: The AI Engineer Podcast", "podcast_slug": "latent_space_the_ai_engineer_podcast", "category": "tech", "publish_date": "2026-04-20T16:17:17+00:00", "audio_url": "https://api.substack.com/feed/podcast/194810752/1b92bce4d49858354007a47c48e4e6d4.mp3", "source_link": "https://www.latent.space/p/noetik", "cover_image_url": "https://substackcdn.com/feed/podcast/1084089/post/194810752/3a87f1aa9c0d9002c08d675c5183aae1.jpg", "summary": "Noetik is building foundation models trained on multimodal patient data to solve the 90-95% failure rate of cancer trials by rethinking patient selection, not drug design. The company argues that most cancer drugs fail not due to poor pharmacology but because they're tested on unselected patient populations, and that traditional preclinical models like immortalized cell lines and mouse models poorly reflect human biology. 
Their approach uses transformer models trained on real human tumor samples to identify functional disease subtypes and match drugs to patients based on underlying biology.", "key_takeaways": ["Cancer drug failures are primarily due to poor patient selection, not flawed drug design, as current preclinical models (cell lines, animal models) do not reflect real human tumor biology.", "Noetik generates multimodal data from real human tumor samples to train transformer-based foundation models that can reverse-translate from patient response to discover targets and define responsive subpopulations.", "The company treats diseased tissue as a functional system\u2014similar to how neural networks abstract neurons\u2014rather than trying to simulate individual cells, enabling faster, more accurate prediction of patient-level drug response."], "best_for": ["AI engineers", "researchers", "curious generalists"], "why_listen": "You'll understand how foundation models applied to human tissue data could fix the broken paradigm of cancer drug development and why most preclinical models are biologically misleading.", "verdict": "must_listen", "guests": [{"name": "Ron Alfa", "role": "Co-founder and CEO of Noetik", "bio_hint": "Physician-scientist leading the development of AI models to understand patient biology and improve cancer drug trials"}, {"name": "Daniel Bear", "role": "Co-founder of Noetik", "bio_hint": "Neuroscientist and computational biologist applying neural network abstractions to model tissue-level disease dynamics"}], "entities": {"people": [{"name": "Ron Alfa", "mentions": 4}, {"name": "Daniel Bear", "mentions": 3}], "places": [], "products": [], "companies": [{"name": "Noetik", "mentions": 7}]}, "quotes": [{"text": "Most of those drugs fail... because we're bad at selecting which patients those drugs are gonna work in.", "speaker": "Ron Alfa", "timestamp_seconds": 120.0}, {"text": "These are sort of Frankensteinian cells. They're cancer and dry. 
They're mostly cancer.", "speaker": "Ron Alfa", "timestamp_seconds": 420.0}, {"text": "We're in the first inkling of the ChatGPT moment for bio, but it's very much just the very beginning.", "speaker": "Ron Alfa", "timestamp_seconds": 980.0}], "chapters": [{"title": "The Problem with Cancer Drug Trials", "summary": "Ron Alfa explains why 90-95% of cancer drugs fail in clinical trials, not due to poor pharmacology but because of inaccurate patient selection.", "end_seconds": 180.0, "start_seconds": 0.0}, {"title": "Building Noetik's Data Foundation", "summary": "The team describes launching a wet lab, generating multimodal tumor data, and building processing pipelines to capture real patient biology.", "end_seconds": 360.0, "start_seconds": 180.0}, {"title": "Why Traditional Models Fail", "summary": "Critique of using immortalized cell lines and animal models that poorly represent human cancer biology, leading to non-translatable results.", "end_seconds": 540.0, "start_seconds": 360.0}, {"title": "A New Approach with Transformers", "summary": "Noetik\u2019s use of transformer models trained on patient tissue data to understand functional disease biology rather than relying on outdated abstractions.", "end_seconds": 720.0, "start_seconds": 540.0}, {"title": "Rescuing Failed Trials", "summary": "Application of Noetik\u2019s models to analyze Phase II/III trial data and identify hidden biological predictors of drug response.", "end_seconds": 840.0, "start_seconds": 720.0}, {"title": "Redefining Cancer Subtypes", "summary": "The belief that rich multimodal data will reveal new, functionally distinct cancer subtypes beyond current pathological classifications.", "end_seconds": 960.0, "start_seconds": 840.0}, {"title": "Call to Action for AI in Biology", "summary": "A plea for more talent and excitement around applying machine learning to solve fundamental problems in biology and drug development.", "end_seconds": 1080.0, "start_seconds": 960.0}], "overall_score": 84.2, 
"score_breakdown": {"clarity": 85.0, "originality": 94.0, "hype_penalty": 2.0, "actionability": 75.0, "technical_depth": 82.0, "information_density": 75.0}, "score_evidence": {"clarity": "So our thesis is they fail not because we're bad at pharmacology, not because we're bad at target selection...", "originality": "Our thesis is they fail not because we're bad at pharmacology... we're bad at selecting which patients those drugs are gonna work in.", "hype_penalty": "Everyone should be excited about biology... we're just really the very beginning.", "actionability": "You can use these models to understand which patients or what underlying biology of the patients in the trial is a predictor of response.", "technical_depth": "These cell lines have genomes that allow them to persist that have abnormal numbers of chromosomes.", "information_density": "We built this whole processing pipeline to to get the tumors into, like, these arrays and the formats."}, "score_reasoning": {"clarity": "The discussion is well-structured, moving from problem to solution with clear examples and analogies.", "originality": "Introduces a novel contrarian thesis that cancer drug failures stem from poor patient selection rather than pharmacology, backed by a proprietary multimodal data and foundation modeling approach.", "hype_penalty": "Some aspirational language about AI transforming drug development, but largely grounded in specific technical and operational challenges.", "actionability": "Listeners gain a concrete understanding of patient stratification via multimodal models and how it applies to trial redesign.", "technical_depth": "The discussion demonstrates deep domain expertise in oncology drug development, critiquing cell lines and animal models with concrete biological and technical reasoning.", "information_density": "The episode provides specific details about Noetik's lab operations, data generation pipeline, and the limitations of current preclinical models in oncology."}, 
"scoring_confidence": 0.9, "transcript_available": true, "transcript_chars": 83556, "transcript_provider": "deepgram"}