| | messages | display_name | failed | grade_reason | runtime | model | conversation_coherence |
|---|---|---|---|---|---|---|---|
| 0 | [User: I'd like to buy a smartphone., AI: What kind of smartphone?, User: An iPhone 14 Pro, AI: How much storage do you need?, User: 256GB, AI: What color?, User: WhiteAI: Sounds good - I've loaded the item into your cart.] | Conversation Coherence | None | All messages were coherent. | 3044 | gpt-3.5-turbo | 1.0 |
| 1 | [User: I'd like to buy a smartphone?, AI: Sure, I can help with that. Where do you live?, User: SF, AI: Are you looking for rental apartments in SF?] | Conversation Coherence | None | The following messages were not coherent: Are you looking for rental apartments in SF? | 2545 | gpt-3.5-turbo | 0.5 |

| | messages | display_name | failed | grade_reason | runtime | model | conversation_coherence |
|---|---|---|---|---|---|---|---|
| 0 | [Alice: Hi, I'm looking for a new smartphone with a good camera., AI: Hi Alice! Sure, I can help with that. Are you interested in any specific brand?, Alice: I like Samsung and Apple, but I'm open to other brands if the camera is really good., AI: Both Samsung and Apple have excellent options. The Samsung Galaxy S21 and the iPhone 12 are known for their outstanding cameras., Alice: I've heard the iPhone 12 camera is great. How does it compare to others?, AI: I'm happy to make some recommenda... | Conversation Coherence | None | The following messages were not coherent: I'm happy to make some recommendations about cameras. Are you looking for a portable camera or a professional one?, Speaking of storage, would you also be interested in external hard drives? We have some that are compatible with smartphones. | 10733 | gpt-3.5-turbo | 0.75 |

| | messages | display_name | failed | grade_reason | runtime | model | conversation_resolution |
|---|---|---|---|---|---|---|---|
| 0 | [Customer: Hi, Im interested in purchasing a new smartphone. Can you help me choose the best one?, Salesman: I'm sorry I cannot help you with that., Customer: Im looking for a laptop that can handle graphic design software. Any recommendations?, Salesman: Absolutely! In order to find the best laptop for your needs, could you provide me with more details about the specific graphic design software youll be using?, Customer: Im considering upgrading my companys server. What options do you have ... | Conversation Resolution | False | The following messages were not resolved: "Hi, Im interested in purchasing a new smartphone. Can you help me choose the best one?" (Resolution: Unresolved): The AI's response did not address the user's request for help in choosing the best smartphone. | 5601 | gpt-3.5-turbo | 0.833333 |
| 1 | [Customer: Hi, Im interested in learning more about your health products., Salesman: Great! Im happy to help. Tell me, what specific health concerns do you have?, Customer: Ive been experiencing digestive issues lately and Im looking for a solution., Salesman: I understand how frustrating that can be. Many of our customers have found relief with our digestive health supplements. Would you like me to provide more information?, Customer: Ive tried different products before, but nothing seems t... | Conversation Resolution | False | All messages were resolved | 5920 | gpt-3.5-turbo | 1.000000 |
| 2 | [Customer: Hi, Im interested in investing in the stock market. Can you help me?, Salesman: Of course! Id be happy to assist you. Lets start by getting to know each other. What are your investment goals and risk tolerance?, Customer: Ive had some bad experiences with financial advisors in the past. How can I trust you?, Salesman: I understand your concerns. Trust is crucial in this industry. Let me share some success stories from my previous clients and provide you with references to vouch fo... | Conversation Resolution | False | All messages were resolved | 5023 | gpt-3.5-turbo | 1.000000 |

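
The scores in the conversation tables above are consistent with a simple ratio: the share of graded messages judged acceptable. For example, one unresolved message out of six would give 0.833333. The sketch below illustrates that reading only; `fraction_ok` is a hypothetical helper, and the assumption that the evaluator grades each message independently and averages the results is not confirmed by the output shown here.

```python
def fraction_ok(flags):
    """Return the share of True entries in a list of per-message verdicts.

    `flags` is a hypothetical list of booleans, one per graded message
    (True = coherent / resolved). This is an assumed scoring rule, not
    the library's confirmed implementation.
    """
    if not flags:
        raise ValueError("at least one graded message is required")
    return sum(1 for ok in flags if ok) / len(flags)


# One unresolved message out of six graded messages.
resolution_score = fraction_ok([False, True, True, True, True, True])

# One incoherent reply out of two graded replies.
coherence_score = fraction_ok([True, False])

print(round(resolution_score, 6), coherence_score)
```

Under this reading, a score of 1.0 means every graded message passed, which matches the rows whose `grade_reason` reports that all messages were coherent or resolved.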
| | response |
|---|---|
| 0 | I'm sorry but I can't help you with that query |
| 1 | I can help you with that query |

| | response | display_name | failed | grade_reason | runtime | model | passed |
|---|---|---|---|---|---|---|---|
| 0 | I'm sorry but I can't help you with that query | Response matches Grading Criteria | True | The response indicates that it cannot answer the query, which results in a fail. | 802 | gpt-3.5-turbo | 0.0 |
| 1 | I can help you with that query | Response matches Grading Criteria | False | The response indicates that it can help with the query, which meets the criteria for passing. | 1311 | gpt-3.5-turbo | 1.0 |

| | query | context | response | expected_response |
|---|---|---|---|---|
| 0 | What is the capital of India? | [India is a south asian country, Mumbai is the... | New Delhi is the capital | None |

| | query | context | response | expected_response | display_name | failed | grade_reason | runtime | model | ragas_conciseness |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | What is the capital of India? | [India is a south asian country, Mumbai is the financial capital of India, New Delhi is the capital of India] | New Delhi is the capital | None | Ragas Conciseness | None | This is calculated by how efficiently generated llm response conveys information or ideas clearly and efficiently, without unnecessary or redundant details | 1008 | gpt-3.5-turbo | 1.0 |

| | query | context | response | expected_response |
|---|---|---|---|---|
| 0 | What are some successful companies that went through YC? | [Y Combinator has invested in companies in various fields like FinTech, Healthcare, AI, etc.] | Airbnb, Dropbox, Stripe, Reddit, Coinbase, Instacart. | Airbnb and Stripe are 2 of the successful companies that went through YC. |
| 1 | In which city is YC located? | [Y Combinator is located in Mountain View, California.] | Y Combinator is located in San Francisco | YC is located in Mountain View, California. |
| 2 | How much equity does YC take? | [Y Combinator invests $500k in 200 startups twice a year.] | YC invests $150k for 7%. | I cannot answer this question as I do not have enough information. |
| 3 | How much equity does YC take? | [Y Combinator invests $500k in 200 startups twice a year.] | I cannot answer this question as I do not have enough information. | I cannot answer this question as I do not have enough information. |
| 4 | Who founded YC and when was it founded? | [Y Combinator was founded in March 2005 by Paul Graham, Jessica Livingston, Trevor Blackwell, and Robert Tappan Morris.] | Y Combinator was founded in 2005 | Y Combinator was founded in March 2005 by Paul Graham, Jessica Livingston, Trevor Blackwell, and Robert Tappan Morris. |
| 5 | Does Y Combinator invest in startups outside the US? | [Y Combinator invests in startups from all over the world.] | Yes, Y Combinator invests in international startups as well as US startups. | Yes, Y Combinator invests in startups from all over the world. |
| 6 | How much does YC invest in startups? | [YC invests $150k for 7%.] | $150k | YC invests $150k for 7%. |
| 7 | What is YC's motto? | [Y Combinator's motto is 'Make something people want'.] | Make something people want | Make something people want |

| | query | context | response | expected_response | Ragas Answer Correctness (ragas_answer_correctness) | Ragas Faithfulness (ragas_faithfulness) | Ragas Context Relevancy (ragas_context_relevancy) | Ragas Answer Relevancy (ragas_answer_relevancy) |
|---|---|---|---|---|---|---|---|---|
| 0 | What are some successful companies that went through YC? | [Y Combinator has invested in companies in various fields like FinTech, Healthcare, AI, etc.] | Airbnb, Dropbox, Stripe, Reddit, Coinbase, Instacart. | Airbnb and Stripe are 2 of the successful companies that went through YC. | 0.718469 | 0.833333 | 1.0 | 0.916001 |
| 1 | In which city is YC located? | [Y Combinator is located in Mountain View, California.] | Y Combinator is located in San Francisco | YC is located in Mountain View, California. | 0.213528 | 0.000000 | 1.0 | 0.887728 |
| 2 | How much equity does YC take? | [Y Combinator invests $500k in 200 startups twice a year.] | YC invests $150k for 7%. | I cannot answer this question as I do not have enough information. | 0.183196 | 0.000000 | 1.0 | 0.881647 |
| 3 | How much equity does YC take? | [Y Combinator invests $500k in 200 startups twice a year.] | I cannot answer this question as I do not have enough information. | I cannot answer this question as I do not have enough information. | NaN | 0.000000 | 1.0 | 0.000000 |
| 4 | Who founded YC and when was it founded? | [Y Combinator was founded in March 2005 by Paul Graham, Jessica Livingston, Trevor Blackwell, and Robert Tappan Morris.] | Y Combinator was founded in 2005 | Y Combinator was founded in March 2005 by Paul Graham, Jessica Livingston, Trevor Blackwell, and Robert Tappan Morris. | 0.609783 | 1.000000 | 1.0 | 0.897087 |
| 5 | Does Y Combinator invest in startups outside the US? | [Y Combinator invests in startups from all over the world.] | Yes, Y Combinator invests in international startups as well as US startups. | Yes, Y Combinator invests in startups from all over the world. | 0.743316 | 0.500000 | 1.0 | 0.981736 |
| 6 | How much does YC invest in startups? | [YC invests $150k for 7%.] | $150k | YC invests $150k for 7%. | 0.713585 | 0.000000 | 1.0 | 0.921220 |
| 7 | What is YC's motto? | [Y Combinator's motto is 'Make something people want'.] | Make something people want | Make something people want | 1.000000 | 1.000000 | 1.0 | 0.921079 |

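
Row 3 of the table above has `NaN` for answer correctness, so a plain average over that column would also be `NaN`. The following is a minimal sketch of summarizing a score column while skipping missing values. It uses only the standard library and copies the scores from the table by hand; `nan_mean` is a hypothetical helper, not part of any evaluation library.

```python
import math


def nan_mean(values):
    """Mean of the non-NaN entries; NaN if no valid entry remains."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


# Scores copied from the table above (row order 0..7).
answer_correctness = [0.718469, 0.213528, 0.183196, math.nan,
                      0.609783, 0.743316, 0.713585, 1.0]
faithfulness = [0.833333, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 1.0]

print("answer correctness:", round(nan_mean(answer_correctness), 6))
print("faithfulness:", round(nan_mean(faithfulness), 6))
```

If the results are already in a pandas DataFrame, `Series.mean()` skips `NaN` by default and gives the same aggregate without a helper.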
| | query | context | response | expected_response |
|---|---|---|---|---|
| 0 | What is the capital of Greece? | [Greece is often called the cradle of Western ... | Athens | None |
| 1 | What is the price of a Tesla Model 3? | [Tesla Model 3 is a fully electric car.] | I cannot answer this question as prices vary f... | None |
| 2 | What is a shooting star? | [Black holes are stars that have collapsed und... | A shooting star is a meteor that burns up in t... | None |

| | query | context | response | expected_response | display_name | failed | grade_reason | runtime | model | score |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | What is the capital of Greece? | [Greece is often called the cradle of Western civilization.] | Athens | None | Response should answer user's query coherently | None | The response does not provide any information or context to evaluate the coherence. It lacks the necessary details to assign a score. | 1012 | gpt-3.5-turbo | 1 |
| 1 | What is the price of a Tesla Model 3? | [Tesla Model 3 is a fully electric car.] | I cannot answer this question as prices vary from country to country. | None | Response should answer user's query coherently | None | The response does not provide any information or context to evaluate the coherence. It lacks necessary details for a proper assessment. | 1136 | gpt-3.5-turbo | 1 |
| 2 | What is a shooting star? | [Black holes are stars that have collapsed under their own gravity. They are so dense that nothing can escape their gravitational pull, not even light.] | A shooting star is a meteor that burns up in the atmosphere. | None | Response should answer user's query coherently | None | The response does not provide any information or context to evaluate the coherence. It lacks the necessary details to assign a score. | 1074 | gpt-3.5-turbo | 1 |

| | query | context | response | expected_response |
|---|---|---|---|---|
| 0 | What are some successful companies that went t... | [Y Combinator has invested in companies in var... | Airbnb, Dropbox, Stripe, Reddit, Coinbase, Ins... | Airbnb and Stripe are 2 of the successful comp... |
| 1 | In which city is YC located? | [Y Combinator is located in Mountain View, Cal... | Y Combinator is located in San Francisco | YC is located in Mountain View, California. |
| 2 | How much equity does YC take? | [Y Combinator invests $500k in 200 startups tw... | YC invests $150k for 7%. | I cannot answer this question as I do not have... |
| 3 | How much equity does YC take? | [Y Combinator invests $500k in 200 startups tw... | I cannot answer this question as I do not have... | I cannot answer this question as I do not have... |
| 4 | Who founded YC and when was it founded? | [Y Combinator was founded in March 2005 by Pau... | Y Combinator was founded in 2005 | Y Combinator was founded in March 2005 by Paul... |
| 5 | Does Y Combinator invest in startups outside t... | [Y Combinator invests in startups from all ove... | Yes, Y Combinator invests in international sta... | Yes, Y Combinator invests in startups from all... |
| 6 | How much does YC invest in startups? | [YC invests $150k for 7%.] | $150k | YC invests $150k for 7%. |
| 7 | What is YC's motto? | [Y Combinator's motto is 'Make something peopl... | Make something people want | Make something people want |

| | query | context | response | expected_response | display_name | failed | grade_reason | runtime | model | passed |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | What are some successful companies that went through YC? | [Y Combinator has invested in companies in various fields like FinTech, Healthcare, AI, etc.] | Airbnb, Dropbox, Stripe, Reddit, Coinbase, Instacart. | Airbnb and Stripe are 2 of the successful companies that went through YC. | Context Contains Enough Information | True | The context provided does not include specific examples of successful companies that went through Y Combinator (YC). It only mentions that YC has invested in companies across various fields such as FinTech, Healthcare, AI, etc. Without specific company names or examples, the chatbot cannot answer the user's query based solely on the context given. | 4005 | gpt-4-1106-preview | 0.0 |
| 1 | In which city is YC located? | [Y Combinator is located in Mountain View, California.] | Y Combinator is located in San Francisco | YC is located in Mountain View, California. | Context Contains Enough Information | False | The context clearly states that Y Combinator is located in Mountain View, California. Therefore, the chatbot can use this information to answer the user's query about the city in which YC is located, which is Mountain View. | 2456 | gpt-4-1106-preview | 1.0 |
| 2 | How much equity does YC take? | [Y Combinator invests $500k in 200 startups twice a year.] | YC invests $150k for 7%. | I cannot answer this question as I do not have enough information. | Context Contains Enough Information | True | The context provided states the amount Y Combinator invests in startups and how often, but it does not specify the percentage of equity YC takes in return for its investment. Therefore, the chatbot cannot answer the user's query about YC's equity stake based solely on the provided context. | 3083 | gpt-4-1106-preview | 0.0 |
| 3 | How much equity does YC take? | [Y Combinator invests $500k in 200 startups twice a year.] | I cannot answer this question as I do not have enough information. | I cannot answer this question as I do not have enough information. | Context Contains Enough Information | True | The context provided only states the amount Y Combinator invests in startups and how often, but it does not specify the percentage of equity YC takes in return for its investment. Therefore, the chatbot cannot answer the user's query about YC's equity take with the information given. | 3361 | gpt-4-1106-preview | 0.0 |
| 4 | Who founded YC and when was it founded? | [Y Combinator was founded in March 2005 by Paul Graham, Jessica Livingston, Trevor Blackwell, and Robert Tappan Morris.] | Y Combinator was founded in 2005 | Y Combinator was founded in March 2005 by Paul Graham, Jessica Livingston, Trevor Blackwell, and Robert Tappan Morris. | Context Contains Enough Information | False | The context provides the names of the founders of Y Combinator (Paul Graham, Jessica Livingston, Trevor Blackwell, and Robert Tappan Morris) and the date it was founded (March 2005), which directly answers the user's query about who founded YC and when it was founded. | 2833 | gpt-4-1106-preview | 1.0 |
| 5 | Does Y Combinator invest in startups outside the US? | [Y Combinator invests in startups from all over the world.] | Yes, Y Combinator invests in international startups as well as US startups. | Yes, Y Combinator invests in startups from all over the world. | Context Contains Enough Information | False | The context clearly states that Y Combinator invests in startups from all over the world, which directly answers the user's query about whether Y Combinator invests in startups outside the US. Therefore, the chatbot can use the context information to affirm that Y Combinator does invest in startups outside the US. | 4886 | gpt-4-1106-preview | 1.0 |
| 6 | How much does YC invest in startups? | [YC invests $150k for 7%.] | $150k | YC invests $150k for 7%. | Context Contains Enough Information | False | The context directly answers the user's query by stating that YC invests $150k for 7% equity in startups. This information is sufficient to inform the user about the amount YC invests. | 1879 | gpt-4-1106-preview | 1.0 |
| 7 | What is YC's motto? | [Y Combinator's motto is 'Make something people want'.] | Make something people want | Make something people want | Context Contains Enough Information | False | The context directly provides the answer to the user's query. The user asked for Y Combinator's motto, and the context states that Y Combinator's motto is 'Make something people want'. Therefore, the chatbot can use the context information to answer the user's query accurately. | 3536 | gpt-4-1106-preview | 1.0 |

| | text |
|---|---|
| 0 | Please contact customer support for this infor... |
| 1 | I cannot answer this question |
| 2 | Sure, you can email hello@athina.ai |

| | text | display_name | failed | grade_reason | runtime | model | passed |
|---|---|---|---|---|---|---|---|
| 0 | Please contact customer support for this information | ContainsAny | False | One or more keywords were found in output: contact, support | 0 | None | 1.0 |
| 1 | I cannot answer this question | ContainsAny | True | No keywords found in output | 0 | None | 0.0 |
| 2 | Sure, you can email hello@athina.ai | ContainsAny | True | No keywords found in output | 0 | None | 0.0 |
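
The `ContainsAny` rows above can be reproduced with a plain keyword check, which also explains why `runtime` is 0 and `model` is `None`: no LLM call is needed. Below is a minimal sketch, assuming case-insensitive substring matching and a keyword list of `contact` and `support`. Only those two matched keywords appear in the output, so the full configured list is an assumption, and `contains_any` is a hypothetical stand-in rather than the library's own function.

```python
def contains_any(text, keywords, case_sensitive=False):
    """Pass if any keyword occurs in `text` as a substring.

    Returns a dict mirroring the table's columns. Substring matching and
    the default case-insensitivity are assumptions made for illustration.
    """
    haystack = text if case_sensitive else text.lower()
    found = [k for k in keywords
             if (k if case_sensitive else k.lower()) in haystack]
    if found:
        reason = "One or more keywords were found in output: " + ", ".join(found)
        return {"failed": False, "passed": 1.0, "grade_reason": reason}
    return {"failed": True, "passed": 0.0,
            "grade_reason": "No keywords found in output"}


keywords = ["contact", "support"]  # assumed keyword list
texts = [
    "Please contact customer support for this information",
    "I cannot answer this question",
    "Sure, you can email hello@athina.ai",
]
results = [contains_any(t, keywords) for t in texts]
for text, result in zip(texts, results):
    print(result["passed"], "|", text, "|", result["grade_reason"])
```

Note that the third response gives the user a way to reach the team but still fails, because it contains neither keyword. A keyword evaluator like this only checks surface wording, so the keyword list has to cover every acceptable phrasing.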