Published in Towards Data Science

Choosing Between LLM Agent Frameworks
The tradeoffs between building bespoke code-based agents and the major agent frameworks.
Sep 21, 2024

Navigating the New Types of LLM Agents and Architectures
The failure of ReAct agents gives way to a new generation of agents — and possibilities
Aug 30, 2024

Evaluating SQL Generation with LLM as a Judge
Results point to a promising approach
Jul 31, 2024

Large Language Model Performance in Time Series Analysis
How do major LLMs stack up at detecting anomalies or movements in the data when given a large set of time series data within the context…
May 1, 2024

Tips for Getting the Generation Part Right in Retrieval Augmented Generation
Results from experiments to evaluate and compare GPT-4, Claude 2.1, and Claude 3.0 Opus
Apr 6, 2024

Model Evaluations Versus Task Evaluations
Understanding the difference for LLM applications
Mar 26, 2024

Why You Should Not Use Numeric Evals For LLM As a Judge
Testing major LLMs on how well they conduct numeric evaluations
Mar 8, 2024

The Needle In a Haystack Test
Evaluating the performance of RAG systems
Feb 15, 2024

LLM Evals: Setup and the Metrics That Matter
How to build and run LLM evals — and why you should use precision and recall when benchmarking your LLM prompt template
Oct 13, 2023

Safeguarding LLMs with Guardrails
A pragmatic guide to implementing guardrails, covering both Guardrails AI and NVIDIA’s NeMo Guardrails
Sep 1, 2023