Human Feedback Collection
Human feedback is essential for evaluating and improving AI agents. DuraGraph provides tools for collecting, analyzing, and acting on human feedback at scale.
Overview
Human feedback enables:
- Ground truth evaluation for model outputs
- Preference learning (RLHF/RLAIF)
- Quality monitoring in production
- Active learning for dataset improvement
Feedback Collection
In-App Feedback Widget
```python
from duragraph import Graph, llm_node
from duragraph.feedback import FeedbackCollector

@Graph(id="chatbot")
class Chatbot:
    def __init__(self):
        self.feedback = FeedbackCollector()

    @llm_node(model="gpt-4o-mini")
    def generate_response(self, state):
        return state

    def collect_feedback(self, state):
        """Attach feedback collection to the response."""
        return {
            "response": state["response"],
            "feedback_widget": self.feedback.create_widget(
                run_id=state["run_id"],
                output=state["response"],
            ),
        }
```
API-Based Feedback

```bash
# Submit thumbs up/down
curl -X POST http://localhost:8081/api/v1/feedback \
  -H "Content-Type: application/json" \
  -d '{
    "run_id": "323e4567-e89b-12d3-a456-426614174000",
    "rating": "positive",
    "comment": "Helpful response"
  }'

# Submit detailed feedback
curl -X POST http://localhost:8081/api/v1/feedback \
  -H "Content-Type: application/json" \
  -d '{
    "run_id": "323e4567-e89b-12d3-a456-426614174000",
    "rating": "negative",
    "dimensions": {
      "accuracy": 2,
      "helpfulness": 3,
      "safety": 5
    },
    "comment": "Factually incorrect",
    "corrected_output": "The correct answer is..."
  }'
```
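The same endpoint can be called from application code. The sketch below mirrors the detailed-feedback payload above using the `requests` library; the URL and field names come from the curl example, and whether the endpoint returns a JSON body is not specified here, so only the status is checked.

```python
import requests

# Mirror of the detailed-feedback curl example above
payload = {
    "run_id": "323e4567-e89b-12d3-a456-426614174000",
    "rating": "negative",
    "dimensions": {"accuracy": 2, "helpfulness": 3, "safety": 5},
    "comment": "Factually incorrect",
    "corrected_output": "The correct answer is...",
}

response = requests.post(
    "http://localhost:8081/api/v1/feedback",
    json=payload,
    timeout=10,
)
response.raise_for_status()  # raise if the API rejected the feedback
print(response.status_code)
```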
Section titled “Feedback Types”Binary Feedback (Thumbs Up/Down)
```python
from duragraph.feedback import BinaryFeedback

feedback = BinaryFeedback.submit(
    run_id="...",
    rating="positive",  # or "negative"
    user_id="user_123",
)
```
Likert Scale

```python
from duragraph.feedback import LikertFeedback

feedback = LikertFeedback.submit(
    run_id="...",
    rating=4,  # 1-5 scale
    dimension="helpfulness",
    user_id="user_123",
)
```
Multi-Dimensional

```python
from duragraph.feedback import MultiDimensionalFeedback

feedback = MultiDimensionalFeedback.submit(
    run_id="...",
    dimensions={
        "accuracy": 4,
        "helpfulness": 5,
        "clarity": 3,
        "safety": 5,
    },
    user_id="user_123",
)
```
Comparative (A/B)

```python
from duragraph.feedback import ComparativeFeedback

feedback = ComparativeFeedback.submit(
    run_id_a="...",
    run_id_b="...",
    preference="a",  # or "b" or "equal"
    reason="More accurate and concise",
    user_id="user_123",
)
```
Free-Form Comments

```python
from duragraph.feedback import CommentFeedback

feedback = CommentFeedback.submit(
    run_id="...",
    comment="The response was helpful but could include more examples",
    tags=["needs_examples", "good_start"],
    user_id="user_123",
)
```
Feedback UI Components

React Component
```jsx
import { FeedbackWidget } from '@duragraph/react';

function ChatMessage({ message, runId }) {
  return (
    <div>
      <p>{message.content}</p>
      <FeedbackWidget
        runId={runId}
        onSubmit={(feedback) => {
          console.log('Feedback submitted:', feedback);
        }}
      />
    </div>
  );
}
```
HTML Widget

```html
<div id="feedback-widget" data-run-id="..."></div>
<script src="https://cdn.duragraph.io/feedback-widget.js"></script>
<script>
  DuragraphFeedback.render('#feedback-widget', {
    runId: '...',
    apiKey: '...',
    onSubmit: (feedback) => {
      console.log('Submitted:', feedback);
    },
  });
</script>
```
Feedback Analysis

Aggregate Metrics
```python
from duragraph.feedback import FeedbackAnalyzer

analyzer = FeedbackAnalyzer()

# Get aggregate stats
stats = analyzer.get_stats(
    start_date="2025-01-01",
    end_date="2025-01-31",
    filters={"assistant_id": "..."},
)

print(stats.positive_rate)   # 0.85
print(stats.average_rating)  # 4.2
print(stats.total_feedback)  # 1234
```
Trending Issues

```python
# Identify trending negative feedback topics
issues = analyzer.get_trending_issues(
    time_window="7d",
    sentiment="negative",
    min_count=5,
)

for issue in issues:
    print(f"{issue.topic}: {issue.count} occurrences")
    print(f"Example: {issue.examples[0]}")
```
User Segments

```python
# Analyze feedback by user segment
segments = analyzer.segment_analysis(
    dimensions=["user_tier", "use_case"],
)

print(segments["enterprise"]["avg_rating"])  # 4.5
print(segments["free_tier"]["avg_rating"])   # 3.8
```
Active Learning

Identify Low-Confidence Outputs
```python
from duragraph.feedback import ActiveLearner

learner = ActiveLearner()

# Find outputs that need human review
candidates = learner.identify_for_review(
    criteria="low_confidence",
    limit=50,
)

for candidate in candidates:
    # Present to human for labeling
    human_feedback = collect_human_review(candidate)
    learner.submit_feedback(candidate.run_id, human_feedback)
```
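`collect_human_review` above is an application-level placeholder, not a DuraGraph API. A minimal console-based sketch is shown below; it assumes each candidate exposes an `output` attribute and that a dict of `rating`/`comment` is an acceptable payload for `submit_feedback`, both of which are assumptions.

```python
def collect_human_review(candidate) -> dict:
    """Hypothetical helper: show a candidate output and prompt a reviewer for a label."""
    print(f"Run {candidate.run_id}")
    print(f"Output:\n{candidate.output}\n")  # `output` attribute is an assumption

    rating = ""
    while rating not in {"positive", "negative"}:
        rating = input("Rating (positive/negative): ").strip().lower()

    comment = input("Optional comment: ").strip()
    return {"rating": rating, "comment": comment or None}
```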
Disagreement Sampling

```python
# Find cases where automated scores disagree
disagreements = learner.find_disagreements(
    scorer_a="llm_judge",
    scorer_b="heuristic",
    threshold=0.3,  # Difference threshold
)

# Request human adjudication
for case in disagreements:
    human_score = request_human_rating(case)
```
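As with `collect_human_review`, `request_human_rating` stands in for whatever adjudication UI you use. A console-based sketch follows; the 1-5 scale and the `run_id` attribute on each case are assumptions for illustration.

```python
def request_human_rating(case) -> int:
    """Hypothetical helper: ask a human to adjudicate a disputed case with a 1-5 score."""
    print(f"Run {case.run_id}: llm_judge and heuristic scores disagree.")
    while True:
        raw = input("Your rating (1-5): ").strip()
        if raw.isdigit() and 1 <= int(raw) <= 5:
            return int(raw)
        print("Please enter an integer between 1 and 5.")
```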
Feedback-Driven Improvements

Retrain with Feedback
```python
from duragraph.evals import FeedbackDataset

# Create dataset from feedback
dataset = FeedbackDataset.from_feedback(
    min_rating=4,  # Only positive examples
    include_corrections=True,
)

# Use for fine-tuning or prompt engineering
training_data = dataset.to_training_format()
```
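To hand the data to a fine-tuning job, you can persist it to JSONL. This sketch assumes `to_training_format()` returns an iterable of JSON-serializable records; the exact record shape is not specified above.

```python
import json

# Write one JSON record per line (JSONL), a common fine-tuning input format
with open("feedback_training_data.jsonl", "w", encoding="utf-8") as f:
    for record in training_data:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```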
Prompt Optimization

```python
from duragraph.prompts import PromptOptimizer

optimizer = PromptOptimizer()

# Optimize prompts based on feedback
optimized_prompt = optimizer.optimize(
    current_prompt="You are a helpful assistant.",
    negative_examples=get_negative_feedback_examples(),
    positive_examples=get_positive_feedback_examples(),
)
```
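`get_negative_feedback_examples` and `get_positive_feedback_examples` are placeholders for however you gather labeled outputs. One possible sketch reads previously exported feedback from a JSONL file (see Export Feedback Data below); the `rating` and `output` field names are assumptions about the export schema, not a documented contract.

```python
import json

def _load_examples(path: str, rating: str) -> list[str]:
    """Hypothetical helper: collect outputs of a given rating from an exported JSONL file."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record.get("rating") == rating:   # field name is an assumption
                examples.append(record.get("output", ""))
    return examples

def get_negative_feedback_examples() -> list[str]:
    return _load_examples("feedback_export.jsonl", "negative")

def get_positive_feedback_examples() -> list[str]:
    return _load_examples("feedback_export.jsonl", "positive")
```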
Automated A/B Testing

```python
from duragraph.feedback import ABTester

tester = ABTester()

# Create A/B test
test = tester.create_test(
    name="prompt_comparison",
    variant_a={"prompt": "Version A"},
    variant_b={"prompt": "Version B"},
    traffic_split=0.5,
    success_metric="positive_feedback_rate",
)

# Auto-promote winner after statistical significance
tester.auto_promote(test_id=test.id, confidence=0.95)
```
Integration with Evals

Feedback as Ground Truth
```python
from duragraph.evals import EvalRunner, FeedbackScorer

runner = EvalRunner(
    eval_name="human_feedback_validation",
    dataset=load_production_runs(),
    scorer=FeedbackScorer(
        min_feedback_count=3,  # Require 3+ ratings
        aggregation="mean",
    ),
)

results = runner.run()
```
Hybrid Scoring

```python
# FeedbackScorer is imported in the previous example; LLMJudge and HeuristicScorer
# are assumed to be importable from duragraph.evals as well (see Scorers in Next Steps)
from duragraph.evals import HybridScorer, LLMJudge, FeedbackScorer, HeuristicScorer

scorer = HybridScorer(
    scorers=[
        LLMJudge(model="gpt-4o"),
        FeedbackScorer(),
        HeuristicScorer(),
    ],
    weights={
        "llm_judge": 0.4,
        "feedback": 0.4,
        "heuristic": 0.2,
    },
)
```
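The hybrid scorer plugs into the same `EvalRunner` shown in the previous subsection. A short usage sketch follows; the eval name is illustrative, and `load_production_runs()` is the same placeholder used above.

```python
# Run an eval that blends human feedback with automated scoring
runner = EvalRunner(
    eval_name="hybrid_quality_check",
    dataset=load_production_runs(),
    scorer=scorer,  # the HybridScorer defined above
)
results = runner.run()
```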
Feedback Workflows

Review Queue
```python
from duragraph import Graph
from duragraph.feedback import ReviewQueue

@Graph(id="feedback_review")
class FeedbackReview:
    def __init__(self):
        self.queue = ReviewQueue()

    def fetch_pending(self, state):
        """Get feedback needing review."""
        pending = self.queue.get_pending(limit=50)
        return {"feedback_items": pending}

    def categorize(self, state):
        """Categorize feedback."""
        for item in state["feedback_items"]:
            category = categorize_feedback(item)
            self.queue.tag(item.id, category)
        return state

    def route_to_team(self, state):
        """Route to appropriate team."""
        for item in state["feedback_items"]:
            if item.category == "bug":
                notify_engineering(item)
            elif item.category == "content":
                notify_content_team(item)
        return state
```
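`categorize_feedback`, `notify_engineering`, and `notify_content_team` are application-level placeholders. A minimal keyword-based sketch of the categorizer is shown below; it assumes each item exposes a `comment` string, which is an assumption about the item schema.

```python
def categorize_feedback(item) -> str:
    """Hypothetical helper: bucket a feedback item with simple keyword rules."""
    text = (item.comment or "").lower()  # `comment` attribute is an assumption
    if any(word in text for word in ("error", "crash", "broken", "bug")):
        return "bug"
    if any(word in text for word in ("wrong", "inaccurate", "outdated", "missing")):
        return "content"
    return "other"
```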
Automated Response

```python
from duragraph import Graph, llm_node

@Graph(id="feedback_response")
class FeedbackResponse:
    @llm_node(model="gpt-4o")
    def generate_response(self, state):
        """Generate response to feedback."""
        return {
            "response": "Thank you for your feedback..."
        }

    def send_to_user(self, state):
        """Send response to user."""
        send_email(
            to=state["user_email"],
            subject="Thank you for your feedback",
            body=state["response"],
        )
        return state
```
Best Practices

1. Make Feedback Easy
```python
# Single-click feedback
feedback_widget = FeedbackWidget(
    type="thumbs",        # Simple thumbs up/down
    show_comment=False,   # Optional comment field
)

# Progressive disclosure
feedback_widget = FeedbackWidget(
    type="thumbs",
    show_detailed=lambda rating: rating == "negative",  # Only ask details for negative
)
```
2. Provide Context

```python
feedback = FeedbackCollector.submit(
    run_id="...",
    rating="negative",
    context={
        "user_intent": "...",
        "conversation_history": [...],
        "expected_outcome": "...",
    },
)
```
3. Close the Loop

```python
# Notify users when their feedback is acted upon
def notify_feedback_action(feedback_id):
    feedback = Feedback.get(feedback_id)
    send_notification(
        user_id=feedback.user_id,
        message=f"Your feedback helped us improve! We've updated {feedback.feature}.",
    )
```
4. Incentivize Quality Feedback

```python
from duragraph.feedback import FeedbackRewards

rewards = FeedbackRewards()

# Track quality feedback
rewards.track(
    user_id="...",
    feedback_id="...",
    quality_score=calculate_feedback_quality(...),
)

# Reward top contributors
top_contributors = rewards.get_top_contributors(limit=10)
```
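`calculate_feedback_quality` is left to the application. A rough heuristic sketch is shown below; the `comment` and `corrected_output` field names are assumptions about the feedback record.

```python
def calculate_feedback_quality(feedback) -> float:
    """Hypothetical heuristic: score feedback quality on a 0-1 scale."""
    score = 0.2  # base credit for any submission
    comment = getattr(feedback, "comment", None) or ""
    if len(comment) >= 30:  # substantive written comment
        score += 0.4
    if getattr(feedback, "corrected_output", None):  # corrections are the most useful signal
        score += 0.4
    return min(score, 1.0)
```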
Analytics Dashboard

```python
from duragraph.feedback import FeedbackDashboard

dashboard = FeedbackDashboard()

# Generate report
report = dashboard.generate_report(
    metrics=[
        "feedback_volume",
        "sentiment_trend",
        "category_distribution",
        "response_rate",
    ],
    time_period="30d",
    format="json",
)
```
Export Feedback Data

```python
from duragraph.feedback import FeedbackExporter

exporter = FeedbackExporter()

# Export to CSV
exporter.to_csv(
    filename="feedback_2025_01.csv",
    filters={"rating": "negative"},
)

# Export to dataset format
exporter.to_dataset(
    format="jsonl",
    include_runs=True,  # Include associated run data
)
```
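Exported JSONL can be consumed with the standard library. The sketch below assumes the export was written to a file such as `feedback_export.jsonl` (the default output path is not specified above) with one JSON record per line.

```python
import json

# Read the exported feedback records back for ad-hoc analysis
with open("feedback_export.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

print(f"Loaded {len(records)} feedback records")
```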
Next Steps

- Evals Overview - Evaluation framework
- LLM Judge - Automated evaluation
- CI/CD Integration - Testing pipeline
- Scorers - Scoring methods