Sharely.ai Agents
AI Agent Builder & Evaluation Platform

Build, Compose, and Prove Your AI Agents Work

Design agents visually, compose them with tools and subagents, trace every execution step, then validate with automated evaluation and human expert review — a complete platform for building and proving your AI agents.

Real-World Scenario

PrecisionCare Biomedicals

PrecisionCare Biomedicals is a pharmaceutical company with a comprehensive knowledge base of 300+ documents spanning SOPs, regulatory filings, clinical trial reports, and training materials.

They need to build intelligent AI agents that help employees find answers, check regulatory compliance, and navigate complex procedures. But they also need to compose agents with enterprise tools, trace every decision, and systematically validate that their agents work correctly.

Here's how they use Sharely Agents to design, compose, trace, evaluate, and continuously improve their AI agents.

PrecisionCare Biomedicals: Enterprise Knowledge Base

  • Documents: 300+
  • Categories: 6
  • AI Model: GPT-4o
  • Reviewers: 12 SMEs

Agentic Workflow Composition

Compose Agents with Workflow-Tools

PrecisionCare doesn't just connect single search nodes to their agent. They compose tools as workflows — each tool is a mini-flow with conditional logic, specialized retrieval steps, and result merging. The agent's model decides which workflow-tool to invoke based on the question and its current state.

1. Connect Workflow-Tools to the Agent

PrecisionCare's Regulatory Compliance Agent has two workflow-tools registered: a Regulatory Search tool (SOPs, FDA filings) and a Clinical Trials Search tool (protocols, phase reports). The model decides which tool to call based on the user's question — and can call both iteratively to build a complete answer.
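As a rough illustration of this pattern, the sketch below registers two tools by name and description and loops until the decision step has nothing left to call. It is not Sharely's API: `WorkflowTool`, `choose_tool`, and `run_agent` are hypothetical names, and the keyword rules in `choose_tool` merely stand in for the model's decision.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class WorkflowTool:
    """A tool as the agent sees it: a name, a description, and something to call."""
    name: str
    description: str
    run: Callable[[str], list]


def regulatory_search(query: str) -> list:
    # Placeholder for the real workflow; returns a labelled snippet.
    return [f"regulatory result for: {query}"]


def clinical_trials_search(query: str) -> list:
    # Placeholder for the real workflow; returns a labelled snippet.
    return [f"clinical result for: {query}"]


TOOLS = {
    tool.name: tool
    for tool in (
        WorkflowTool(
            "regulatory_search",
            "Search SOPs, regulatory filings, and FDA guidance documents",
            regulatory_search,
        ),
        WorkflowTool(
            "clinical_trials_search",
            "Search clinical trial protocols and phase reports",
            clinical_trials_search,
        ),
    )
}


def choose_tool(question: str, already_called: set) -> Optional[str]:
    """Assumption: keyword rules standing in for the model's tool choice."""
    q = question.lower()
    if "regulatory_search" not in already_called and any(
        k in q for k in ("sop", "fda", "filing")
    ):
        return "regulatory_search"
    if "clinical_trials_search" not in already_called and any(
        k in q for k in ("trial", "protocol", "phase")
    ):
        return "clinical_trials_search"
    return None


def run_agent(question: str, max_iterations: int = 4) -> list:
    """Call tools one at a time until the decision step returns None."""
    evidence, called = [], set()
    for _ in range(max_iterations):
        name = choose_tool(question, called)
        if name is None:
            break
        called.add(name)
        evidence.extend(TOOLS[name].run(question))
    return evidence
```

A question that touches both SOPs and trials ends up calling both tools in turn, which mirrors the iterative behaviour described above.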

Sharely Agents - Agentflow: Regulatory Compliance Agent (Agentic RAG)

Flow Assistant

What would you like to do with your flow?

  • Validate Flow: Check for missing connections, configuration issues, and best practices
  • Modify Flow: Get help adding nodes, changing prompts, adjusting retrieval settings, or rewiring connections
  • Test Flow: Run your agent with a question and see live execution on the canvas
  • Explain Flow: Understand what each node does and how data flows through your agent

2. Design Tools as Workflows

Each tool connected to the agent is itself a workflow. Here’s the inside of PrecisionCare’s Regulatory Search Tool — it classifies the query by document category (SOP, Regulatory Filing, or Clinical Protocol), routes to specialized search nodes with different RBAC roles and top-K settings, then merges and ranks the results before returning them to the agent.
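The classify, route, search, and merge steps described above can be sketched in a few functions. This is a toy model under stated assumptions, not the product's implementation: the keyword classifier, the term-overlap scoring, and every name here (`Document`, `Chunk`, `BRANCHES`, `regulatory_search_tool`) are hypothetical. Only the SOP branch settings (role Quality Assurance, top-K 10) come from the trace on this page; the other two branch rows are invented placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    source: str
    text: str
    category: str
    allowed_roles: frozenset


@dataclass(frozen=True)
class Chunk:
    source: str
    text: str
    score: float


# Assumption: only the SOP row reflects values shown on this page.
BRANCHES = {
    "SOP": {"role": "Quality Assurance", "top_k": 10},
    "Regulatory Filing": {"role": "Regulatory Affairs", "top_k": 5},
    "Clinical Protocol": {"role": "Clinical Research", "top_k": 5},
}


def classify(query: str) -> str:
    """Keyword stand-in for the document classifier."""
    q = query.lower()
    if "sop" in q:
        return "SOP"
    if "trial" in q or "protocol" in q:
        return "Clinical Protocol"
    return "Regulatory Filing"


def search(docs, query, category, role, top_k):
    """Toy retrieval: score is the fraction of query terms found in the text."""
    terms = set(query.lower().split())
    if not terms:
        return []
    hits = []
    for doc in docs:
        # Enforce the branch's category and its RBAC role.
        if doc.category != category or role not in doc.allowed_roles:
            continue
        score = len(terms & set(doc.text.lower().split())) / len(terms)
        if score > 0:
            hits.append(Chunk(doc.source, doc.text, score))
    return sorted(hits, key=lambda c: c.score, reverse=True)[:top_k]


def merge_and_rank(chunks):
    """Drop duplicate chunks (keeping the best score) and sort by relevance."""
    best = {}
    for chunk in chunks:
        key = (chunk.source, chunk.text)
        if key not in best or chunk.score > best[key].score:
            best[key] = chunk
    return sorted(best.values(), key=lambda c: c.score, reverse=True)


def regulatory_search_tool(query, docs):
    """Classify the query, run the matching branch, then merge and rank."""
    category = classify(query)
    branch = BRANCHES[category]
    hits = search(docs, query, category, branch["role"], branch["top_k"])
    return merge_and_rank(hits)
```

Keeping the per-branch role and top-K in one table makes it easy to see, and audit, which role each search branch runs under.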

Agentflow - Regulatory Search Tool (Workflow-Tool)

Used by: Regulatory Compliance Agent · Invoked when the model needs regulatory or SOP information

How Workflow-Tools Work

The agent doesn't know or care about the internal complexity of its tools. It sees a tool with a name and description (“Search SOPs, regulatory filings, and FDA guidance documents”) and decides when to call it based on the user's question and the conversation state. When the agent invokes the tool, the workflow executes — classifying the query, routing to the right search branch, and returning merged results. The agent then uses those results to continue reasoning.

3. Trace Across Agent and Workflow-Tools

When the agent handles a question, PrecisionCare can inspect the full execution trace — the agent’s reasoning, its decision to invoke a workflow-tool, the workflow expanding into its internal steps (classification, branching, search), and the agent continuing with the results. Every step across both levels is visible.
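One way to picture a two-level trace is as a tree of spans, where a tool span holds the workflow's internal steps as children. The sketch below assumes that shape; `Span`, `summarize`, and `render` are hypothetical names, and the timings and token counts are illustrative rather than a reproduction of any real run.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    kind: str  # "run", "llm", "tool", or "step"
    seconds: float = 0.0
    tokens: int = 0
    children: list = field(default_factory=list)


def walk(span):
    """Yield the span and all of its descendants, depth first."""
    yield span
    for child in span.children:
        yield from walk(child)


def summarize(root):
    """Count spans by kind and add up tokens recorded on individual spans."""
    spans = list(walk(root))
    return {
        "llm_calls": sum(s.kind == "llm" for s in spans),
        "tool_invocations": sum(s.kind == "tool" for s in spans),
        "workflow_steps": sum(s.kind == "step" for s in spans),
        "total_tokens": sum(s.tokens for s in spans),
    }


def render(span, depth=0):
    """Return indented lines so nested workflow steps sit under their tool."""
    lines = ["  " * depth + f"{span.name} [{span.kind}] {span.seconds:.1f}s"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# Illustrative values only; tokens are recorded on the LLM spans alone,
# so nothing is double counted when summing.
trace = Span("run", "run", children=[
    Span("Agent Reasoning (Iteration 1)", "llm", 0.4, 900),
    Span("Regulatory Search Tool", "tool", 1.2, children=[
        Span("Document Classifier", "step", 0.1),
        Span("SOP Search", "step", 0.6),
        Span("Merge & Rank", "step", 0.1),
    ]),
    Span("Agent Reasoning (Iteration 2)", "llm", 0.8, 2400),
])
```

Calling `"\n".join(render(trace))` prints the agent-level spans with the workflow steps indented beneath the tool that ran them.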

Composition Trace - "What are the storage requirements for mAb samples?"

Run #1,847 · Regulatory Compliance Agent v2.1 · 2.8s total · Status: Success

  • LLM Calls: 3
  • Tool Invocations: 1
  • Workflow Steps: 4
  • Total Tokens: 4,216
  • Total Time: 2.8s

User Query
What are the storage requirements for reconstituted mAb samples per SOP-042?

Agent Reasoning (Iteration 1) · 892 tok · 0.4s
This question is about SOP storage requirements. I should search the regulatory knowledge base for SOP-042. → Decided to call: Regulatory Search Tool

Workflow-Tool: Regulatory Search Tool · 124 tok · 1.2s
Invoked with query: "storage requirements reconstituted mAb samples SOP-042"

  Document Classifier · 0.1s
  Classified as "SOP" (confidence: 0.96) → routing to SOP Search branch

  SOP Search · 0.6s
  Role: Quality Assurance, Top-K: 10 → 10 chunks retrieved, top score 0.94 (SOP-042 Section 4.3)

  Merge & Rank · 0.1s
  De-duplicated 10 results → 8 unique chunks, ranked by relevance. Top: SOP-042 §4.3 (0.94), SOP-042 §4.1 (0.89)

  Tool Output
  Returned 8 ranked results with source citations to the agent

Agent Reasoning (Iteration 2) · 2,412 tok · 0.8s
Found SOP-042 §4.3 with storage requirements. I have enough information to answer the question directly. → Generating final response.

Final Response
Per SOP-042 §4.3: reconstituted monoclonal antibody samples must be stored at 2-8°C (refrigerated) and used within 24 hours of reconstitution. Samples must not be frozen after reconstitution. Storage must be in amber glass vials to protect from light degradation.

Sources: SOP-042 Section 4.3 · SOP-042 Section 4.1 · SOP-042 Section 3.2
Total: 4,216 tok · 2.8s

Conversation Insights
Conversation Insights

See How Employees Use the AI Agent

PrecisionCare's AI agent handles hundreds of questions weekly from lab technicians, regulatory staff, and quality engineers. Sharely indexes every conversation and classifies each Q&A by type and quality.

1. Index & Classify Conversations

Sharely automatically pulls conversations from PrecisionCare's deployed AI agents. Each Q&A pair is classified by question type (FAQ, Technical, How-To, etc.) and answer quality (Answered, Partial, Failed, etc.).

Conversation Indexing - PrecisionCare Biomedicals

Indexing Session: Classifying 2,841 conversations from PrecisionCare agents... (Complete)

  • Total Q&As: 2,841
  • Classified: 2,756
  • Question Types: 9
  • Quality Levels: 5

Classification Progress: 2,756 / 2,841 (97%)

2. Visualize Trends & Patterns

PrecisionCare's quality team sees conversation volume over time, question type distribution, and a heatmap crossing question types with answer quality — immediately revealing that Technical and Troubleshooting questions have the highest failure rates.
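The heatmap is, in essence, a cross-tabulation of question type against answer quality. A minimal sketch, assuming each classified Q&A is a dict with `type` and `quality` keys (a hypothetical record shape, with made-up sample rows):

```python
from collections import Counter


def heatmap_counts(pairs):
    """Count Q&A pairs per (question type, answer quality) cell."""
    return Counter((p["type"], p["quality"]) for p in pairs)


def failure_rates(pairs):
    """Share of each question type whose answer quality is 'Failed'."""
    totals = Counter(p["type"] for p in pairs)
    failed = Counter(p["type"] for p in pairs if p["quality"] == "Failed")
    return {qtype: failed[qtype] / totals[qtype] for qtype in totals}


# Made-up sample rows for illustration.
pairs = [
    {"type": "FAQ", "quality": "Answered"},
    {"type": "FAQ", "quality": "Answered"},
    {"type": "Technical", "quality": "Failed"},
    {"type": "Technical", "quality": "Partial"},
    {"type": "Technical", "quality": "Failed"},
    {"type": "How-To", "quality": "Answered"},
]

# The question type with the highest failure rate in the sample.
worst_type, worst_rate = max(failure_rates(pairs).items(), key=lambda kv: kv[1])
```

Sorting question types by failure rate is what surfaces the weakest segments first.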

[Charts: Questions Over Time · Answer Quality Distribution · Question Type vs Answer Quality Heatmap]

3. Drill Down & Create Evaluation Datasets

Click any cell in the heatmap to see the actual Q&A pairs. PrecisionCare notices 74 Technical questions that Failed — they create an evaluation dataset directly from this segment to systematically fix the issues.
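Turning a heatmap cell into a dataset amounts to filtering by type and quality and keeping only the questions. The sketch below assumes a simple dict-based record shape; `dataset_from_segment` is a hypothetical helper, not a Sharely function. The failed answers are deliberately dropped so they cannot be mistaken for expected answers.

```python
def dataset_from_segment(name, pairs, question_type, quality):
    """Build an evaluation dataset from one (type, quality) heatmap cell."""
    questions = [
        {
            "question": p["question"],
            "expected_answer": None,  # filled in later, never copied from the failed answer
            "origin": "From Insights",
            "type": p["type"],
        }
        for p in pairs
        if p["type"] == question_type and p["quality"] == quality
    ]
    return {"name": name, "questions": questions}


# Two questions from the table on this page plus one made-up FAQ row.
pairs = [
    {
        "question": "What are the acceptance criteria for endotoxin testing on batch release?",
        "answer": "I don't have information about the specific acceptance criteria...",
        "type": "Technical",
        "quality": "Failed",
    },
    {
        "question": "How do we handle protocol deviations during Phase II trials per SOP-118?",
        "answer": "I'm sorry, but I don't have the information...",
        "type": "Technical",
        "quality": "Failed",
    },
    {
        "question": "Where is the training portal?",
        "answer": "(made-up answered row)",
        "type": "FAQ",
        "quality": "Answered",
    },
]

dataset = dataset_from_segment("Technical - Failed (Mar 2026)", pairs, "Technical", "Failed")
```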

Technical - Failed (74 items)

Create Evaluation Dataset: Technical - Failed (Mar 2026)

Questions will be imported without expected answers. Generate expected answers using AI from the dataset page.

| Date | Space | Question | Answer | Type | Quality |
|---|---|---|---|---|---|
| Mar 10 | Lab Operations | What is the required temperature range for storing reconstituted mAb samples per SOP-042? | I'm sorry, but I couldn't find specific temperature requirements for reconstituted mAb samples... | Technical | Failed |
| Mar 9 | Quality Assurance | What are the acceptance criteria for endotoxin testing on batch release? | I don't have information about the specific acceptance criteria for endotoxin testing... | Technical | Failed |
| Mar 8 | Regulatory Affairs | What is the shelf life for lyophilized drug product per the stability protocol? | I'm unable to find the shelf life specifications for lyophilized drug product... | Technical | Failed |
| Mar 7 | Clinical Research | How do we handle protocol deviations during Phase II trials per SOP-118? | I'm sorry, but I don't have the information on protocol deviation handling procedures... | Technical | Failed |

Automated Evaluation

Build Datasets, Generate Answers, Run Evaluations

PrecisionCare creates evaluation datasets from the failed conversations they identified, auto-generates expected answers from their knowledge base with a strong AI model, then runs automated evaluations to measure improvement.

1. Build & Enrich Evaluation Datasets

The 74 failed Technical questions are imported without expected answers. PrecisionCare uses RAG retrieval + Claude Sonnet 4.5 to auto-generate gold-standard expected answers from their knowledge base.
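The enrichment step can be thought of as a loop: retrieve passages for each question, then ask a strong model to write an answer grounded in them. In the sketch below, `retrieve` and `generate` are placeholder callables standing in for the knowledge-base search and the model call; no real Sharely or model-provider API is assumed. Skipping questions with no retrieved passages is a design choice of this sketch, not documented product behaviour.

```python
from typing import Callable


def generate_expected_answers(
    questions: list,
    retrieve: Callable[[str], list],
    generate: Callable[[str, list], str],
    overwrite: bool = False,
) -> int:
    """Fill in missing expected answers in place; return how many were filled."""
    filled = 0
    for item in questions:
        if item.get("expected_answer") and not overwrite:
            continue  # keep answers that already exist
        passages = retrieve(item["question"])
        if not passages:
            continue  # leave the gap rather than let the model guess
        item["expected_answer"] = generate(item["question"], passages)
        item["sources"] = passages  # keep the evidence alongside the answer
        filled += 1
    return filled
```

Storing the retrieved passages next to each generated answer lets a reviewer check what the answer was based on.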

Dataset: Technical - Failed (Mar 2026)

74 questions · Imported from Conversation Insights

74 of 74 questions are missing expected answers

Generate Expected Answers

Use RAG retrieval + a strong model to generate ideal expected answers for 74 questions.

Model: Claude Sonnet 4.5 (Recommended)
Generating answer (31/74)

What are the acceptance criteria for endotoxin testing on batch release?

| # | Question | Origin | Expected answer | Type |
|---|---|---|---|---|
| 1 | What is the required temperature range for storing reconstituted mAb samples per SOP-042? | From Insights | AI Answer | Technical |
| 2 | What are the acceptance criteria for endotoxin testing on batch release? | From Insights | AI Answer | Technical |
| 3 | What is the shelf life for lyophilized drug product per the stability protocol? | From Insights | No expected answer | Technical |
| 4 | How do we handle protocol deviations during Phase II trials per SOP-118? | From Insights | No expected answer | Technical |

2. Run Automated Evaluations

PrecisionCare runs the 74 Technical questions against their RAG agent. Each question is answered by the agent, then judged by an AI evaluator on relevance, groundedness, correctness, and similarity to the expected answer.
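Once each answer has been judged, the session-level numbers are simple aggregates. The sketch below assumes one `Judgement` record per question (a hypothetical structure). The numeric scores are the first four rows of the results table on this page; the `correct` flags are placeholders, since per-question correctness is not recoverable from the page.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Judgement:
    relevance: float
    groundedness: float
    correct: bool
    similarity: float
    latency_s: float


def summarize(judgements):
    """Average the graded scores and report correctness as a percentage."""
    if not judgements:
        raise ValueError("no judgements to summarize")
    return {
        "relevance": round(mean(j.relevance for j in judgements), 2),
        "groundedness": round(mean(j.groundedness for j in judgements), 2),
        "similarity": round(mean(j.similarity for j in judgements), 2),
        "correctness_pct": round(100 * sum(j.correct for j in judgements) / len(judgements)),
        "avg_latency_s": round(mean(j.latency_s for j in judgements), 1),
    }


# Scores from the table's first four rows; the correct flags are assumed.
judgements = [
    Judgement(0.85, 0.92, True, 0.78, 1.1),
    Judgement(0.45, 0.60, False, 0.32, 1.4),
    Judgement(0.72, 0.88, True, 0.71, 0.9),
    Judgement(0.90, 0.95, True, 0.85, 1.0),
]
summary = summarize(judgements)
```

Correctness is kept as a percentage of pass/fail verdicts while the other metrics are averaged scores, matching how the session summary reports them.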

Evaluation Session: Technical-Failed Regression Test

Dataset: Technical - Failed (Mar 2026) · 74 questions · PrecisionCare RAG Agent · Status: Completed

  • Relevance: 0.72
  • Groundedness: 0.81
  • Correctness: 68%
  • Similarity: 0.65
  • Avg Latency: 1.2s
  • Source Recall: 0.74

| # | Question | Relevance | Grounded | Correct | Similarity | Latency |
|---|---|---|---|---|---|---|
| 1 | Required temperature range for reconstituted mAb samples per SOP-042? | 0.85 | 0.92 | (not shown) | 0.78 | 1.1s |
| 2 | Acceptance criteria for endotoxin testing on batch release? | 0.45 | 0.60 | (not shown) | 0.32 | 1.4s |
| 3 | Shelf life for lyophilized drug product per stability protocol? | 0.72 | 0.88 | (not shown) | 0.71 | 0.9s |
| 4 | Protocol deviation handling during Phase II trials per SOP-118? | 0.90 | 0.95 | (not shown) | 0.85 | 1.0s |

Human Evaluation

Validate with Subject Matter Experts

Automated metrics show that the agent still answers 32% of the 74 previously failed Technical questions incorrectly (68% correctness). Before deploying fixes company-wide, PrecisionCare invites its scientists, regulatory specialists, and quality engineers to validate the AI's answers with human judgment.

1. Admin Launches an Evaluation Campaign

The quality team creates a human evaluation campaign, configuring how many evaluations each knowledge chunk should receive and which AI model to test.

Sharely Agents - Human Evaluation

Human Evaluation Campaign: PrecisionCare Biomedicals - Q1 Audit (Active)

Validate AI answers across regulatory and clinical documents

  • Total Reviewers: 12
  • Active Reviewers: 8
  • Flagged Chunks: 47
  • Answer Correctness: 84%

Progress: Evaluations Completed 454 / 1356 (33%)

Question Quality Distribution

  • High: 200
  • Medium: 94
  • Low: 160

2. Invite Subject Matter Experts

The admin invites SMEs — scientists, regulatory specialists, and quality engineers — to review the AI-generated content. Each reviewer gets a unique, tokenized link.
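The page does not describe how the tokenized links are built. One common approach, shown here purely as an assumption, is to sign a small payload with a server-side secret so the link itself proves who the reviewer is. Everything in this sketch (`make_review_link`, `verify_review_token`, the payload fields, the example URL) is hypothetical; it also omits expiry and revocation, which a real deployment would likely need.

```python
import base64
import hashlib
import hmac
import json
import secrets
from typing import Optional

SECRET = secrets.token_bytes(32)  # hypothetical per-deployment signing key


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    """Reverse of _b64: restore padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str, secret: bytes) -> str:
    return _b64(hmac.new(secret, payload.encode("ascii"), hashlib.sha256).digest())


def make_review_link(base_url, email, campaign, max_items, secret=SECRET) -> str:
    """Build a link whose token carries the reviewer's identity and item limit."""
    claims = {"email": email, "campaign": campaign, "max_items": max_items}
    payload = _b64(json.dumps(claims, sort_keys=True).encode("utf-8"))
    return f"{base_url}/{payload}.{_sign(payload, secret)}"


def verify_review_token(token: str, secret=SECRET) -> Optional[dict]:
    """Return the claims if the signature checks out, otherwise None."""
    try:
        payload, signature = token.split(".")
    except ValueError:
        return None
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature, _sign(payload, secret)):
        return None
    return json.loads(_unb64(payload))
```

Because the token is verified on every request, the reviewer needs no account or password, and a per-reviewer item limit can travel inside the signed payload.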

Invite Reviewer

  • Name: Dr. Sarah Chen
  • Email: s.chen@precisionbio.com
  • Item limit: 50 (Limit how many items this reviewer can evaluate)

This workspace has RBAC enabled. Select which roles this reviewer should have access to.

Reviewers (12)

| Reviewer | Email | Status | Evaluated | Role |
|---|---|---|---|---|
| Dr. Sarah Chen | s.chen@precisionbio.com | Active | 42/50 | Regulatory |
| James Rodriguez | j.rodriguez@precisionbio.com | Active | 38/50 | Clinical |
| Dr. Aisha Patel | a.patel@precisionbio.com | Active | 31/40 | Quality |
| Michael Torres | m.torres@precisionbio.com | Active | 15/50 | Regulatory |
| Lisa Wang | l.wang@precisionbio.com | Pending | 0/30 | Clinical |

3. Reviewer Receives a Personalized Invitation

Each SME receives a branded email with a unique, tokenized link. No login required — they click and start reviewing immediately.

Email - s.chen@precisionbio.com

From: noreply@sharely.ai
To: s.chen@precisionbio.com
Subject: You're invited to evaluate AI content for PrecisionCare Biomedicals

You're invited to evaluate AI content

Hi Dr. Sarah Chen,

The Quality Team has invited you to participate in the PrecisionCare Biomedicals - Q1 Audit evaluation campaign for PrecisionCare Biomedicals.

Your feedback will help improve the quality of AI-generated questions and answers based on the knowledge base.

Start Evaluation

Or copy and paste this link into your browser:

https://agents.sharely.ai/review/eyJhbGciOi...kT3dFz

This link is unique to you. Please do not share it with others.

Powered by Sharely.ai

4. Reviewers Evaluate Questions & Answers

Subject matter experts review AI-generated questions from the knowledge base, rate their quality, then evaluate the AI's answer for correctness. The mobile-first interface makes it easy to review from anywhere.

Welcome Screen

Welcome, Dr. Chen!

Thank you for helping us improve our AI knowledge base.

What you'll be doing

  • Review AI-generated questions
  • Rate AI answer correctness
  • Flag issues for review

Campaign

PrecisionCare Biomedicals - Q1 Audit

Organization: PrecisionCare Biomedicals

Rate the Question (12 completed)

Question: What is the required temperature range for storing reconstituted monoclonal antibody samples according to SOP-042?

Rate this question: Good / OK / Bad

Generating AI answer...

Rate the Answer (12 completed)

Question: What is the required temperature range for storing reconstituted monoclonal antibody samples according to SOP-042?

Rate this question: Good / OK / Bad

AI Answer: According to SOP-042 Section 4.3, reconstituted monoclonal antibody samples must be stored at 2-8°C (refrigerated) and used within 24 hours of reconstitution. Samples must not be frozen after reconstitution.

Rate this answer: Correct / Partial / Wrong

5. Visualize Knowledge Coverage

See at a glance which of your 300 knowledge resources have been evaluated. The heat map reveals coverage gaps, quality distribution, and flagged content across your entire knowledge base.
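The percentages on the coverage dashboard follow directly from the raw counts. The small sketch below, with hypothetical function names, recomputes them from the figures shown on this page (1356 questions, 454 evaluated, 200 rated high quality, 47 flagged).

```python
def pct(part: int, whole: int) -> int:
    """Whole-number percentage, or 0 when there is nothing to divide by."""
    return round(100 * part / whole) if whole else 0


def coverage_summary(total_questions, evaluated, high_quality, flagged):
    """Derive the dashboard percentages from raw counts."""
    return {
        "evaluated_pct": pct(evaluated, total_questions),
        "high_quality_pct_of_evaluated": pct(high_quality, evaluated),
        "flagged": flagged,
    }


# Counts taken from the coverage figures on this page.
summary = coverage_summary(total_questions=1356, evaluated=454, high_quality=200, flagged=47)
```

Note that the high-quality share is measured against evaluated questions, not against all questions, so it will shift as more evaluations come in.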

Knowledge Coverage - PrecisionCare Biomedicals

  • Total Resources: 300 (37% have questions)
  • Total Questions: 1356 (454 evaluated)
  • High Quality: 200 (44% of evaluated)
  • Flagged: 47 (Needs attention)

Resource Coverage by Category

Legend: High Quality · In Progress · Needs Work · Not Started

6. Drill Down into Resource-Level Detail

Filter and explore evaluation results at the individual resource level. See which documents have quality issues, review flagged content, and export results for regulatory audits.
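Filtering and exporting at the resource level can be sketched with plain dictionaries and the standard `csv` module. The record shape and the helpers `filter_resources` and `to_csv` are assumptions for illustration. The two SOP rows reuse counts from the table on this page (a flagged value of "-" is treated as 0 here); the third row is made up to show the filter excluding something.

```python
import csv
import io

COLUMNS = ["resource", "category", "questions", "evaluated", "flagged", "status"]


def filter_resources(resources, status=None, category=None):
    """Keep resources matching every filter that was supplied."""
    return [
        r for r in resources
        if (status is None or r["status"] == status)
        and (category is None or r["category"] == category)
    ]


def to_csv(rows, columns=COLUMNS) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


resources = [
    {"resource": "SOP-007 Cold Chain Management", "category": "SOP",
     "questions": 7, "evaluated": 2, "flagged": 3, "status": "In Progress"},
    {"resource": "SOP-011 Change Control", "category": "SOP",
     "questions": 13, "evaluated": 9, "flagged": 0, "status": "In Progress"},
    # Made-up row, included only so the filter has something to exclude.
    {"resource": "Example Clinical Summary", "category": "Clinical",
     "questions": 5, "evaluated": 0, "flagged": 0, "status": "Not Started"},
]

in_progress_sops = filter_resources(resources, status="In Progress", category="SOP")
audit_export = to_csv(in_progress_sops)
```

The resulting `audit_export` string can be written to a file and attached to an audit package.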

Knowledge Resources - Filtered View

Filters: Status: In Progress · Category: SOP

| Resource | Questions | Evaluated | Quality | Flagged | Status |
|---|---|---|---|---|---|
| SOP-007 Cold Chain Management | 7 | 2/7 | (not shown) | 3 | In Progress |
| SOP-009 Environmental Monitoring | 17 | 8/17 | (not shown) | - | In Progress |
| SOP-011 Change Control | 13 | 9/13 | (not shown) | - | In Progress |
| SOP-063 Bioassay Validation v2 | 4 | 2/4 | (not shown) | - | In Progress |
| SOP-069 Environmental Monitoring v2 | 17 | 13/17 | (not shown) | - | In Progress |
| SOP-070 Deviation Handling v2 | 20 | 13/20 | (not shown) | - | In Progress |

The PrecisionCare Result

By connecting conversation analytics, automated evaluation, and human expert review, PrecisionCare built a complete feedback loop that continuously improves their AI agent.

Composable Agents

PrecisionCare built a Regulatory Compliance Agent that orchestrates 3 subagents and 6 tools, handling complex compliance workflows without custom code.

Data-Driven Improvement

Conversation insights revealed that Technical and Troubleshooting questions had the highest failure rates. Targeted datasets let them systematically fix these gaps.

Regulatory Confidence

Full execution tracing plus human validation from 12 SMEs provided the evidence needed for compliance audits and company-wide deployment.

Continuous Quality Loop

PrecisionCare designs agents, traces executions, evaluates with automation and humans, then iterates — a complete lifecycle for AI improvement.

Ready to Build and Prove Your AI Agents?

Design agents visually, compose with enterprise tools, trace every decision, then validate with automated evaluation and human experts. Get started in minutes.

Get Started Free