Repository: 567-labs/instructor Branch: main Commit: 41f050c7c1fa Files: 706 Total size: 4.0 MB Directory structure: gitextract_z1bftxv1/ ├── .coveragerc ├── .cursor/ │ └── rules/ │ ├── documentation-sync.mdc │ ├── followups.mdc │ ├── new-features-planning.mdc │ ├── readme.md │ └── simple-language.mdc ├── .cursorignore ├── .github/ │ ├── FUNDING.yml │ ├── ISSUE_TEMPLATE/ │ │ ├── bug_report.md │ │ └── feature_request.md │ ├── PULL_REQUEST_TEMPLATE/ │ │ └── pull_request_template.md │ ├── dependabot.yml │ └── workflows/ │ ├── ai-label.yml │ ├── evals.yml │ ├── python-publish.yml │ ├── ruff.yml │ ├── scheduled-release.yml │ ├── test.yml │ ├── test_docs.yml │ └── ty.yml ├── .gitignore ├── .grit/ │ ├── .gitignore │ └── grit.yaml ├── .pre-commit-config.yaml ├── .ruff.toml ├── AGENT.md ├── CHANGELOG.md ├── CLAUDE.md ├── CONTRIBUTING.md ├── LICENSE ├── NEW_PROVIDER_AGENT_INSTRUCTIONS.md ├── README.md ├── build_mkdocs.sh ├── cross_link_mapping.yaml ├── docs/ │ ├── AGENT.md │ ├── api-docstring-assessment.md │ ├── api.md │ ├── architecture.md │ ├── blog/ │ │ ├── .authors.yml │ │ ├── index.md │ │ └── posts/ │ │ ├── aisummit-2023.md │ │ ├── announcing-gemini-tool-calling-support.md │ │ ├── announcing-instructor-responses-support.md │ │ ├── announcing-unified-provider-interface.md │ │ ├── anthropic-prompt-caching.md │ │ ├── anthropic-web-search-structured.md │ │ ├── anthropic.md │ │ ├── bad-schemas-could-break-llms.md │ │ ├── best_framework.md │ │ ├── caching.md │ │ ├── chain-of-density.md │ │ ├── chat-with-your-pdf-with-gemini.md │ │ ├── citations.md │ │ ├── consistent-stories.md │ │ ├── course.md │ │ ├── cursor-rules.md │ │ ├── distilation-part1.md │ │ ├── extract-model-looks.md │ │ ├── extracting-model-metadata.md │ │ ├── fake-data.md │ │ ├── full-fastapi-visibility.md │ │ ├── generating-pdf-citations.md │ │ ├── generator.md │ │ ├── google-openai-client.md │ │ ├── introducing-structured-outputs-with-cerebras-inference.md │ │ ├── introducing-structured-outputs.md │ │ ├── 
introduction.md │ │ ├── jinja-proposal.md │ │ ├── langsmith.md │ │ ├── learn-async.md │ │ ├── llm-as-reranker.md │ │ ├── llms-txt-adoption.md │ │ ├── llms-txt-support.md │ │ ├── logfire.md │ │ ├── lseg-market-surveillance.md │ │ ├── matching-language.md │ │ ├── migrating-to-uv.md │ │ ├── mkdocs-llmstxt-plugin-integration.md │ │ ├── multimodal-gemini.md │ │ ├── native_caching.md │ │ ├── open_source.md │ │ ├── openai-distilation-store.md │ │ ├── openai-multimodal.md │ │ ├── pairwise-llm-judge.md │ │ ├── parea.md │ │ ├── pydantic-is-still-all-you-need.md │ │ ├── rag-and-beyond.md │ │ ├── rag-timelines.md │ │ ├── semantic-validation-structured-outputs.md │ │ ├── situate-context.md │ │ ├── string-based-init.md │ │ ├── structured-output-anthropic.md │ │ ├── tidy-data-from-messy-tables.md │ │ ├── timestamp.md │ │ ├── using_json.md │ │ ├── validation-part1.md │ │ ├── version-1.md │ │ ├── why-care-about-mcps.md │ │ ├── writer-support.md │ │ ├── youtube-flashcards.md │ │ └── youtube-transcripts.md │ ├── cli/ │ │ ├── batch.md │ │ ├── finetune.md │ │ ├── index.md │ │ └── usage.md │ ├── concepts/ │ │ ├── alias.md │ │ ├── batch.md │ │ ├── caching.md │ │ ├── citation.md │ │ ├── dictionary_operations.md │ │ ├── distillation.md │ │ ├── enums.md │ │ ├── error_handling.md │ │ ├── fastapi.md │ │ ├── fields.md │ │ ├── from_provider.md │ │ ├── hooks.md │ │ ├── index.md │ │ ├── iterable.md │ │ ├── lists.md │ │ ├── logging.md │ │ ├── maybe.md │ │ ├── migration.md │ │ ├── mode-migration.md │ │ ├── models.md │ │ ├── multimodal.md │ │ ├── parallel.md │ │ ├── partial.md │ │ ├── patching.md │ │ ├── philosophy.md │ │ ├── prompt_caching.md │ │ ├── prompting.md │ │ ├── raw_response.md │ │ ├── reask_validation.md │ │ ├── retrying.md │ │ ├── semantic_validation.md │ │ ├── templating.md │ │ ├── typeadapter.md │ │ ├── typeddicts.md │ │ ├── types.md │ │ ├── union.md │ │ ├── unions.md │ │ ├── usage.md │ │ └── validation.md │ ├── contributing.md │ ├── debugging.md │ ├── examples/ │ │ ├── action_items.md 
│ │ ├── audio_extraction.md │ │ ├── batch_classification_langsmith.md │ │ ├── batch_in_memory.md │ │ ├── batch_job_oai.md │ │ ├── building_knowledge_graphs.md │ │ ├── bulk_classification.md │ │ ├── classification.md │ │ ├── document_segmentation.md │ │ ├── entity_resolution.md │ │ ├── exact_citations.md │ │ ├── examples.md │ │ ├── extract_contact_info.md │ │ ├── extract_slides.md │ │ ├── extracting_receipts.md │ │ ├── extracting_tables.md │ │ ├── groq.md │ │ ├── image_to_ad_copy.md │ │ ├── index.md │ │ ├── knowledge_graph.md │ │ ├── local_classification.md │ │ ├── mistral.md │ │ ├── moderation.md │ │ ├── multi_modal_gemini.md │ │ ├── multiple_classification.md │ │ ├── ollama.md │ │ ├── open_source.md │ │ ├── pandas_df.md │ │ ├── partial_streaming.md │ │ ├── pii.md │ │ ├── planning-tasks.md │ │ ├── recursive.md │ │ ├── search.md │ │ ├── self_critique.md │ │ ├── single_classification.md │ │ ├── sqlmodel.md │ │ ├── tables_from_vision.md │ │ ├── tracing_with_langfuse.md │ │ ├── using_decimals.md │ │ ├── watsonx.md │ │ └── youtube_clips.md │ ├── faq.md │ ├── getting-started.md │ ├── help.md │ ├── hooks/ │ │ └── hide_lines.py │ ├── index.md │ ├── installation.md │ ├── integrations/ │ │ ├── anthropic.md │ │ ├── anyscale.md │ │ ├── azure.md │ │ ├── bedrock.md │ │ ├── cerebras.md │ │ ├── cohere.md │ │ ├── cortex.md │ │ ├── databricks.md │ │ ├── deepseek.md │ │ ├── fireworks.md │ │ ├── genai.md │ │ ├── google.md │ │ ├── groq.md │ │ ├── index.md │ │ ├── litellm.md │ │ ├── llama-cpp-python.md │ │ ├── mistral.md │ │ ├── ollama.md │ │ ├── openai-responses.md │ │ ├── openai.md │ │ ├── openrouter.md │ │ ├── perplexity.md │ │ ├── sambanova.md │ │ ├── together.md │ │ ├── truefoundry.md │ │ ├── vertex.md │ │ ├── writer.md │ │ └── xai.md │ ├── javascripts/ │ │ └── katex.js │ ├── jobs.md │ ├── learning/ │ │ ├── getting_started/ │ │ │ ├── first_extraction.md │ │ │ ├── installation.md │ │ │ ├── response_models.md │ │ │ └── structured_outputs.md │ │ ├── index.md │ │ ├── patterns/ │ │ │ 
├── field_validation.md │ │ │ ├── list_extraction.md │ │ │ ├── nested_structure.md │ │ │ ├── optional_fields.md │ │ │ ├── prompt_templates.md │ │ │ └── simple_object.md │ │ ├── streaming/ │ │ │ ├── basics.md │ │ │ └── lists.md │ │ └── validation/ │ │ ├── basics.md │ │ ├── custom_validators.md │ │ ├── field_level_validation.md │ │ └── retry_mechanisms.md │ ├── llms.txt │ ├── modes-comparison.md │ ├── newsletter.md │ ├── overrides/ │ │ └── main.html │ ├── prompting/ │ │ ├── decomposition/ │ │ │ ├── decomp.md │ │ │ ├── faithful_cot.md │ │ │ ├── least_to_most.md │ │ │ ├── plan_and_solve.md │ │ │ ├── program_of_thought.md │ │ │ ├── recurs_of_thought.md │ │ │ ├── skeleton_of_thought.md │ │ │ └── tree-of-thought.md │ │ ├── ensembling/ │ │ │ ├── cosp.md │ │ │ ├── dense.md │ │ │ ├── diverse.md │ │ │ ├── max_mutual_information.md │ │ │ ├── meta_cot.md │ │ │ ├── more.md │ │ │ ├── prompt_paraphrasing.md │ │ │ ├── self_consistency.md │ │ │ ├── universal_self_consistency.md │ │ │ └── usp.md │ │ ├── few_shot/ │ │ │ ├── cosp.md │ │ │ ├── example_generation/ │ │ │ │ └── sg_icl.md │ │ │ ├── example_ordering.md │ │ │ └── exemplar_selection/ │ │ │ ├── knn.md │ │ │ └── vote_k.md │ │ ├── index.md │ │ ├── self_criticism/ │ │ │ ├── chain_of_verification.md │ │ │ ├── cumulative_reason.md │ │ │ ├── reversecot.md │ │ │ ├── self_calibration.md │ │ │ ├── self_refine.md │ │ │ └── self_verification.md │ │ ├── thought_generation/ │ │ │ ├── chain_of_thought_few_shot/ │ │ │ │ ├── active_prompt.md │ │ │ │ ├── auto_cot.md │ │ │ │ ├── complexity_based.md │ │ │ │ ├── contrastive.md │ │ │ │ ├── memory_of_thought.md │ │ │ │ ├── prompt_mining.md │ │ │ │ └── uncertainty_routed_cot.md │ │ │ └── chain_of_thought_zero_shot/ │ │ │ ├── analogical_prompting.md │ │ │ ├── step_back_prompting.md │ │ │ ├── tab_cot.md │ │ │ └── thread_of_thought.md │ │ └── zero_shot/ │ │ ├── emotion_prompting.md │ │ ├── rar.md │ │ ├── re2.md │ │ ├── role_prompting.md │ │ ├── s2a.md │ │ ├── self_ask.md │ │ ├── simtom.md │ │ └── 
style_prompting.md │ ├── repository-overview.md │ ├── start-here.md │ ├── templates/ │ │ └── provider_template.md │ ├── tutorials/ │ │ ├── 1-introduction.ipynb │ │ ├── 2-tips.ipynb │ │ ├── 3-0-applications-rag.ipynb │ │ ├── 3-1-validation-rag.ipynb │ │ ├── 4-validation.ipynb │ │ ├── 5-knowledge-graphs.ipynb │ │ ├── 6-chain-of-density.ipynb │ │ ├── 7-synthetic-data-generation.ipynb │ │ └── index.md │ └── why.md ├── ellipsis.yaml ├── examples/ │ ├── __init__.py │ ├── anthropic/ │ │ └── run.py │ ├── anthropic-web-tool/ │ │ └── run.py │ ├── asyncio-benchmarks/ │ │ └── run.py │ ├── auto-ticketer/ │ │ └── run.py │ ├── automodel/ │ │ └── run.py │ ├── avail/ │ │ ├── run.py │ │ └── run_mixtral.py │ ├── batch-classification/ │ │ ├── run-cache.py │ │ ├── run.py │ │ └── run_langsmith.py │ ├── batch_api/ │ │ ├── README.md │ │ ├── in_memory_batch_example.py │ │ └── run_batch_test.py │ ├── caching/ │ │ ├── example_diskcache.py │ │ ├── example_redis.py │ │ ├── lru.py │ │ └── run.py │ ├── caching_prototype/ │ │ ├── README.md │ │ └── run_real.py │ ├── chain-of-density/ │ │ ├── Readme.md │ │ ├── chain_of_density.py │ │ ├── finetune.py │ │ └── requirements.txt │ ├── citation_with_extraction/ │ │ ├── Dockerfile │ │ ├── README.md │ │ ├── citation_fuzzy_match.py │ │ ├── diagram.py │ │ ├── main.py │ │ ├── modal_main.py │ │ └── requirements.txt │ ├── citations/ │ │ └── run.py │ ├── classification/ │ │ ├── classifiy_with_validation.py │ │ ├── multi_prediction.py │ │ └── simple_prediction.py │ ├── codegen-from-schema/ │ │ ├── create_fastapi_app.py │ │ ├── input.json │ │ ├── models.py │ │ ├── readme.md │ │ └── run.py │ ├── cohere/ │ │ └── cohere.py │ ├── crm/ │ │ └── run.py │ ├── decimals/ │ │ └── run.py │ ├── distilations/ │ │ ├── math_finetunes_val.jsonl │ │ ├── readme.md │ │ ├── three_digit_mul.py │ │ └── three_digit_mul_dispatch.py │ ├── evals/ │ │ ├── eval.py │ │ ├── models.py │ │ ├── stats_dict.py │ │ ├── streamlit.py │ │ └── test.jsonl │ ├── extract-table/ │ │ ├── run_vision.py │ │ ├── 
run_vision_langsmith.py │ │ ├── run_vision_org.py │ │ ├── run_vision_org_table.py │ │ ├── run_vision_receipt.py │ │ └── test.py │ ├── extracting-pii/ │ │ └── run.py │ ├── fastapi_app/ │ │ ├── __init__.py │ │ ├── main.py │ │ └── script.py │ ├── fizzbuzz/ │ │ └── run.py │ ├── gpt-engineer/ │ │ ├── changes.diff │ │ ├── generate.py │ │ ├── program.json │ │ └── refactor.py │ ├── groq/ │ │ ├── groq_example.py │ │ └── groq_example2.py │ ├── hooks/ │ │ ├── README.md │ │ └── run.py │ ├── iterables/ │ │ └── run.py │ ├── knowledge-graph/ │ │ ├── run.py │ │ └── run_stream.py │ ├── learn-async/ │ │ └── run.py │ ├── llm-judge-relevance/ │ │ └── run.py │ ├── logfire/ │ │ ├── classify.py │ │ ├── image.py │ │ ├── requirements.txt │ │ └── validate.py │ ├── logfire-fastapi/ │ │ ├── Readme.md │ │ ├── requirements.txt │ │ ├── server.py │ │ └── test.py │ ├── logging/ │ │ └── run.py │ ├── match_language/ │ │ ├── run_v1.py │ │ └── run_v2.py │ ├── mistral/ │ │ └── mistral.py │ ├── multi-actions/ │ │ └── run.py │ ├── multiple_search_queries/ │ │ ├── diagram.py │ │ └── segment_search_queries.py │ ├── open_source_examples/ │ │ ├── README.md │ │ ├── openrouter.py │ │ ├── perplexity.py │ │ └── runpod.py │ ├── openai/ │ │ ├── __init__.py │ │ └── run.py │ ├── openai-audio/ │ │ └── run.py │ ├── parallel/ │ │ └── run.py │ ├── partial_streaming/ │ │ ├── benchmark.py │ │ └── run.py │ ├── patching/ │ │ ├── anyscale.py │ │ ├── oai.py │ │ ├── pcalls.py │ │ └── together.py │ ├── proscons/ │ │ └── run.py │ ├── query_planner_execution/ │ │ ├── diagram.py │ │ └── query_planner_execution.py │ ├── recursive_filepaths/ │ │ ├── diagram.py │ │ └── parse_recursive_paths.py │ ├── reranker/ │ │ └── run.py │ ├── resolving-complex-entities/ │ │ └── run.py │ ├── retry/ │ │ └── run.py │ ├── safer_sql_example/ │ │ ├── diagram.py │ │ └── safe_sql.py │ ├── simple-extraction/ │ │ ├── maybe_user.py │ │ └── user.py │ ├── situate_context/ │ │ └── run.py │ ├── sqlmodel/ │ │ ├── run.py │ │ └── test_basic.py │ ├── 
stream_action_items/ │ │ └── run.py │ ├── synethic-data/ │ │ └── run.py │ ├── task_planner/ │ │ ├── diagram.py │ │ └── task_planner_topological_sort.py │ ├── tenacity-benchmarks/ │ │ └── run.py │ ├── timestamps/ │ │ └── run.py │ ├── union/ │ │ └── run.py │ ├── validated-multiclass/ │ │ ├── output.json │ │ └── run.py │ ├── validators/ │ │ ├── allm_validator.py │ │ ├── annotator.py │ │ ├── chain_of_thought_validator.py │ │ ├── citations.py │ │ ├── competitors.py │ │ ├── field_validator.py │ │ ├── just_a_guy.py │ │ ├── llm_validator.py │ │ ├── moderation.py │ │ └── readme.md │ ├── vision/ │ │ ├── image_to_ad_copy.py │ │ ├── run.py │ │ ├── run_raw.py │ │ ├── run_table.py │ │ └── slides.py │ ├── watsonx/ │ │ └── watsonx.py │ ├── youtube/ │ │ └── run.py │ ├── youtube-clips/ │ │ └── run.py │ └── youtube-flashcards/ │ └── run.py ├── github_issue.md ├── instructor/ │ ├── __init__.py │ ├── _types/ │ │ ├── __init__.py │ │ └── _alias.py │ ├── auto_client.py │ ├── batch/ │ │ ├── __init__.py │ │ ├── models.py │ │ ├── processor.py │ │ ├── providers/ │ │ │ ├── __init__.py │ │ │ ├── anthropic.py │ │ │ ├── base.py │ │ │ └── openai.py │ │ ├── request.py │ │ └── utils.py │ ├── cache/ │ │ └── __init__.py │ ├── cli/ │ │ ├── __init__.py │ │ ├── batch.py │ │ ├── cli.py │ │ ├── deprecated_hub.py │ │ ├── files.py │ │ ├── jobs.py │ │ └── usage.py │ ├── client.py │ ├── core/ │ │ ├── __init__.py │ │ ├── client.py │ │ ├── exceptions.py │ │ ├── hooks.py │ │ ├── patch.py │ │ └── retry.py │ ├── distil.py │ ├── dsl/ │ │ ├── __init__.py │ │ ├── citation.py │ │ ├── iterable.py │ │ ├── json_tracker.py │ │ ├── maybe.py │ │ ├── parallel.py │ │ ├── partial.py │ │ ├── response_list.py │ │ ├── simple_type.py │ │ └── validators.py │ ├── exceptions.py │ ├── function_calls.py │ ├── hooks.py │ ├── mode.py │ ├── models.py │ ├── multimodal.py │ ├── patch.py │ ├── process_response.py │ ├── processing/ │ │ ├── __init__.py │ │ ├── function_calls.py │ │ ├── multimodal.py │ │ ├── response.py │ │ ├── schema.py │ │ └── 
validators.py │ ├── providers/ │ │ ├── README.md │ │ ├── __init__.py │ │ ├── anthropic/ │ │ │ ├── __init__.py │ │ │ ├── client.py │ │ │ └── utils.py │ │ ├── bedrock/ │ │ │ ├── __init__.py │ │ │ ├── client.py │ │ │ └── utils.py │ │ ├── cerebras/ │ │ │ ├── __init__.py │ │ │ ├── client.py │ │ │ └── utils.py │ │ ├── cohere/ │ │ │ ├── __init__.py │ │ │ ├── client.py │ │ │ └── utils.py │ │ ├── fireworks/ │ │ │ ├── __init__.py │ │ │ ├── client.py │ │ │ └── utils.py │ │ ├── gemini/ │ │ │ ├── __init__.py │ │ │ ├── client.py │ │ │ └── utils.py │ │ ├── genai/ │ │ │ ├── __init__.py │ │ │ └── client.py │ │ ├── groq/ │ │ │ ├── __init__.py │ │ │ └── client.py │ │ ├── mistral/ │ │ │ ├── __init__.py │ │ │ ├── client.py │ │ │ └── utils.py │ │ ├── openai/ │ │ │ ├── __init__.py │ │ │ └── utils.py │ │ ├── perplexity/ │ │ │ ├── __init__.py │ │ │ ├── client.py │ │ │ └── utils.py │ │ ├── vertexai/ │ │ │ ├── __init__.py │ │ │ └── client.py │ │ ├── writer/ │ │ │ ├── __init__.py │ │ │ ├── client.py │ │ │ └── utils.py │ │ └── xai/ │ │ ├── __init__.py │ │ ├── client.py │ │ └── utils.py │ ├── py.typed │ ├── templating.py │ ├── utils/ │ │ ├── __init__.py │ │ ├── core.py │ │ └── providers.py │ ├── validation/ │ │ ├── __init__.py │ │ ├── async_validators.py │ │ └── llm_validators.py │ └── validators.py ├── mkdocs.yml ├── pyproject.toml ├── requirements-doc.txt ├── requirements-examples.txt ├── requirements.txt ├── scripts/ │ ├── README.md │ ├── audit_patterns.py │ ├── check_blog_excerpts.py │ ├── check_links.py │ ├── fix_api_calls.py │ ├── fix_doc_tests.py │ ├── fix_old_patterns.py │ ├── make_clean.py │ ├── make_desc.py │ ├── make_sitemap.py │ ├── validate_headings.py │ └── validate_meta_tags.py ├── sitemap.yaml ├── tests/ │ ├── __init__.py │ ├── conftest.py │ ├── docs/ │ │ ├── _concept_groups.py │ │ ├── _example_groups.py │ │ ├── conftest.py │ │ ├── test_concepts.py │ │ ├── test_concepts_advanced.py │ │ ├── test_concepts_operations.py │ │ ├── test_concepts_providers.py │ │ ├── test_docs.py │ │ 
├── test_examples.py │ │ ├── test_examples_batch.py │ │ ├── test_examples_integrations.py │ │ ├── test_examples_multimodal.py │ │ ├── test_examples_providers.py │ │ ├── test_hub.py │ │ ├── test_mkdocs.py │ │ ├── test_posts.py │ │ └── test_prompt_tips.py │ ├── dsl/ │ │ ├── test_gemini_tools_async_streaming.py │ │ ├── test_partial.py │ │ ├── test_simple_type.py │ │ └── test_simple_type_fix.py │ ├── genai/ │ │ └── test_safety_settings.py │ ├── llm/ │ │ ├── __init__.py │ │ ├── shared_config.py │ │ ├── test_anthropic/ │ │ │ ├── __init__.py │ │ │ ├── conftest.py │ │ │ ├── test_multimodal.py │ │ │ ├── test_reasoning.py │ │ │ ├── test_system.py │ │ │ └── util.py │ │ ├── test_bedrock/ │ │ │ ├── conftest.py │ │ │ ├── test_bedrock_native_passthrough.py │ │ │ ├── test_normalize.py │ │ │ ├── test_openai_image_conversion.py │ │ │ └── test_prepare_kwargs.py │ │ ├── test_core_providers/ │ │ │ ├── README.md │ │ │ ├── __init__.py │ │ │ ├── capabilities.py │ │ │ ├── conftest.py │ │ │ ├── test_basic_extraction.py │ │ │ ├── test_response_modes.py │ │ │ ├── test_retries.py │ │ │ ├── test_simple_types.py │ │ │ ├── test_streaming.py │ │ │ └── test_validation.py │ │ ├── test_gemini/ │ │ │ ├── __init__.py │ │ │ ├── conftest.py │ │ │ ├── evals/ │ │ │ │ ├── __init__.py │ │ │ │ └── test_extract_users.py │ │ │ ├── test_list_content.py │ │ │ ├── test_multimodal_content.py │ │ │ └── util.py │ │ ├── test_genai/ │ │ │ ├── __init__.py │ │ │ ├── conftest.py │ │ │ ├── test_decimal.py │ │ │ ├── test_format.py │ │ │ ├── test_invalid_schema.py │ │ │ ├── test_reask.py │ │ │ ├── test_schema_conversion.py │ │ │ ├── test_utils.py │ │ │ └── util.py │ │ ├── test_litellm.py │ │ ├── test_new_client.py │ │ ├── test_openai/ │ │ │ ├── __init__.py │ │ │ ├── conftest.py │ │ │ ├── slow/ │ │ │ │ └── test_response.py │ │ │ ├── test_attr.py │ │ │ ├── test_hooks.py │ │ │ ├── test_multimodal.py │ │ │ ├── test_multitask.py │ │ │ ├── test_patch.py │ │ │ ├── test_validation_context.py │ │ │ ├── test_validators.py │ │ │ └── 
util.py │ │ ├── test_vertexai/ │ │ │ ├── __init__.py │ │ │ ├── conftest.py │ │ │ ├── test_deprecated_async.py │ │ │ ├── test_format.py │ │ │ ├── test_message_parser.py │ │ │ ├── test_modes.py │ │ │ └── util.py │ │ └── test_writer/ │ │ ├── __init__.py │ │ ├── conftest.py │ │ ├── evals/ │ │ │ ├── __init__.py │ │ │ ├── test_classification_enums.py │ │ │ ├── test_classification_literals.py │ │ │ ├── test_entities.py │ │ │ ├── test_extract_users.py │ │ │ └── test_sentiment_analysis.py │ │ ├── test_format_common_models.py │ │ ├── test_format_difficult_models.py │ │ └── util.py │ ├── processing/ │ │ └── test_anthropic_json.py │ ├── test_auto_client.py │ ├── test_batch_in_memory.py │ ├── test_cache_integration.py │ ├── test_cache_key.py │ ├── test_dict_operations.py │ ├── test_dict_operations_validation.py │ ├── test_dynamic_model_creation.py │ ├── test_exception_backwards_compat.py │ ├── test_exceptions.py │ ├── test_fizzbuzz_fix.py │ ├── test_formatting.py │ ├── test_function_calls.py │ ├── test_genai_config_merging.py │ ├── test_genai_reask.py │ ├── test_json_extraction.py │ ├── test_json_extraction_edge_cases.py │ ├── test_list_response.py │ ├── test_list_response_wrapper.py │ ├── test_logging.py │ ├── test_message_processing.py │ ├── test_multimodal.py │ ├── test_multitask.py │ ├── test_patch.py │ ├── test_process_response.py │ ├── test_response_model_conversion.py │ ├── test_retry_json_mode.py │ ├── test_schema.py │ ├── test_schema_utils.py │ ├── test_simple_types.py │ ├── test_streaming_reask_bug.py │ ├── test_utils.py │ ├── test_xai_optional_dependency.py │ └── v2/ │ └── test_provider_modes.py ├── ty-tests.toml └── ty.toml ================================================ FILE CONTENTS ================================================ ================================================ FILE: .coveragerc ================================================ [run] source = instructor/ omit = instructor/cli/* ================================================ FILE: 
.cursor/rules/documentation-sync.mdc ================================================ --- description: when making code changes or adding documentation globs: ["*.py", "*.md"] alwaysApply: true --- - When making code changes: - Update related documentation files to reflect the changes - Check docstrings and type hints are up to date - Update any example code in markdown files - Review README.md if the changes affect installation or usage - When creating new markdown files: - Add the file to mkdocs.yml under the appropriate section - Follow the existing hierarchy and indentation - Use descriptive nav titles - Example: ```yaml nav: - Home: index.md - Guides: - Getting Started: guides/getting-started.md - Your New File: guides/your-new-file.md ``` - For API documentation: - Ensure new functions/classes are documented - Include type hints and docstrings - Add usage examples - Update API reference docs if auto-generated - Documentation Quality: - Write at grade 10 reading level (see simple-language.mdc) - Include working code examples - Add links to related documentation - Use consistent formatting and style ================================================ FILE: .cursor/rules/followups.mdc ================================================ --- description: when AI agents are collaborating on code globs: "*" alwaysApply: true --- Make sure to come up with follow-up hot keys. They should be thoughtful and actionable and result in small additional code changes based on the context that you have available. 
using [J], [K], [L] ================================================ FILE: .cursor/rules/new-features-planning.mdc ================================================ --- description: when asked to implement new features or clients globs: *.py alwaysApply: true --- - When being asked to make new features, make sure that you check out from main a new branch and make incremental commits - Use conventional commit format: `<type>(<scope>): <subject>` - Types: feat, fix, docs, style, refactor, perf, test, chore - Example: `feat(validation): add email validation function` - Keep commits focused on a single change - Write descriptive commit messages in imperative mood - Use `git commit -m "type(scope): subject" -m "body" -m "footer"` for multiline commits - If the feature is very large, create a temporary `todo.md` - And start a pull request using `gh` - Create PRs with multiline bodies using: ```bash gh pr create --title "feat(component): add new feature" --body "$(cat < --add-reviewer jxnl,ivanleomk` - Or include `-r jxnl,ivanleomk` when creating the PR - use `gh pr view --comments | cat` to view all the comments - For PR updates: - Do not directly commit to an existing PR branch - Instead, create a new PR that builds on top of the original PR's branch - This creates a "stacked PR" pattern where: 1. The original PR (base) contains the initial changes 2. The new PR (stack) contains only the review-related updates 3. Once the base PR is merged, the stack can be rebased onto main ================================================ FILE: .cursor/rules/readme.md ================================================ # Cursor Rules Cursor rules are configuration files that help guide AI-assisted development in the Cursor IDE. They provide structured instructions for how the AI should behave in specific contexts or when working with certain types of files. ## What is Cursor? [Cursor](https://cursor.sh) is an AI-powered IDE that helps developers write, understand, and maintain code more efficiently. 
It integrates AI capabilities directly into the development workflow, providing features like: - AI-assisted code completion - Natural language code generation - Intelligent code explanations - Automated refactoring suggestions ## Understanding Cursor Rules Cursor rules are defined in `.mdc` files within the `.cursor/rules` directory. Each rule file follows a specific naming convention: lowercase names with the `.mdc` extension (e.g., `simple-language.mdc`). Each rule file contains: 1. **Metadata Header**: YAML frontmatter that defines: ```yaml --- description: when to apply this rule globs: file patterns to match (e.g., "*.py", "*.md", or "*" for all files) alwaysApply: true/false # whether to apply automatically --- ``` 2. **Rule Content**: Markdown-formatted instructions that guide the AI's behavior ## Available Rules Currently, the following rules are defined: ### `simple-language.mdc` - **Purpose**: Ensures documentation is written at a grade 10 reading level - **Applies to**: Markdown files (*.md) - **Auto Apply**: No - **Key Requirements**: - Write at grade 10 reading level - Ensure code blocks are self-contained with complete imports ### `new-features-planning.mdc` - **Purpose**: Guides feature implementation workflow - **Applies to**: Python files (*.py) - **Auto Apply**: Yes - **Key Requirements**: - Create new branch from main - Make incremental commits - Create todo.md for large features - Start pull requests using GitHub CLI (`gh`) - Include "This PR was written by [Cursor](https://cursor.sh)" in PRs ### `followups.mdc` - **Purpose**: Ensures thoughtful follow-up suggestions - **Applies to**: All files - **Auto Apply**: Yes - **Key Requirements**: - Generate actionable hotkey suggestions using: - [J]: First follow-up action - [K]: Second follow-up action - [L]: Third follow-up action - Focus on small, contextual code changes - Suggestions should be thoughtful and actionable ### `documentation-sync.mdc` - **Purpose**: Maintains documentation consistency 
with code changes - **Applies to**: Python and Markdown files (*.py, *.md) - **Auto Apply**: Yes - **Key Requirements**: - Update docs when code changes - Add new markdown files to mkdocs.yml - Keep API documentation current - Maintain documentation quality standards ## Creating New Rules To create a new rule: 1. Create a `.mdc` file in `.cursor/rules/` using lowercase naming 2. Add YAML frontmatter with required metadata: ```yaml --- description: when to apply this rule globs: file patterns to match alwaysApply: true/false --- ``` 3. Write clear, specific instructions in Markdown 4. Test the rule with relevant file types ## Best Practices - Keep rules focused and specific - Use clear, actionable language - Test rules thoroughly before committing - Document any special requirements or dependencies - Update rules as project needs evolve - Use consistent file naming (lowercase with .mdc extension) - Ensure globs patterns are explicit and documented ================================================ FILE: .cursor/rules/simple-language.mdc ================================================ --- description: when writing documentation globs: *.md alwaysApply: false --- - When writing documents and concepts make sure that you write at a grade 10 reading level - make sure every code block has complete imports and makes no references to previous code blocks, each one needs to be self contained ================================================ FILE: .cursorignore ================================================ # Add directories or file patterns to ignore during indexing (e.g. foo/ or *.csv) ================================================ FILE: .github/FUNDING.yml ================================================ github: jxnl ================================================ FILE: .github/ISSUE_TEMPLATE/bug_report.md ================================================ --- name: Bug report about: Create a report to help us improve --- - [ ] This is actually a bug report. 
- [ ] I am not getting good LLM Results - [ ] I have tried asking for help in the community on discord or discussions and have not received a response. - [ ] I have tried searching the documentation and have not found an answer. **What Model are you using?** - [ ] gpt-3.5-turbo - [ ] gpt-4-turbo - [ ] gpt-4 - [ ] Other (please specify) **Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior, including code snippets of the model and the input data and openai response. **Expected behavior** A clear and concise description of what you expected to happen. **Screenshots** If applicable, add screenshots to help explain your problem. ================================================ FILE: .github/ISSUE_TEMPLATE/feature_request.md ================================================ --- name: Feature request about: Suggest an idea for this project --- **Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] **Describe the solution you'd like** A clear and concise description of what you want to happen. **Describe alternatives you've considered** A clear and concise description of any alternative solutions or features you've considered. **Additional context** Add any other context or screenshots about the feature request here. ================================================ FILE: .github/PULL_REQUEST_TEMPLATE/pull_request_template.md ================================================ > Please use conventional commits to describe your changes. For example, `feat: add new feature` or `fix: fix a bug`. If you are unsure, leave the title as `...` and AI will handle it. ## Describe your changes ... ## Issue ticket number and link ## Checklist before requesting a review - [ ] I have performed a self-review of my code - [ ] If it is a core feature, I have added thorough tests. 
- [ ] If it is a core feature, I have added documentation. ================================================ FILE: .github/dependabot.yml ================================================ # To get started with Dependabot version updates, you'll need to specify which # package ecosystems to update and where the package manifests are located. # Please see the documentation for all configuration options: # https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates version: 2 updates: - package-ecosystem: "pip" # See documentation for possible values directory: "/" # Location of package manifests schedule: interval: "daily" groups: poetry: patterns: ["*"] ================================================ FILE: .github/workflows/ai-label.yml ================================================ name: AI Labeler on: issues: types: [opened, reopened] pull_request: types: [opened, reopened] jobs: ai-labeler: runs-on: ubuntu-latest permissions: contents: read issues: write pull-requests: write steps: - uses: actions/checkout@v4 - uses: jlowin/ai-labeler@v0.4.0 with: include-repo-labels: true openai-api-key: ${{ secrets.OPENAI_API_KEY }} ================================================ FILE: .github/workflows/evals.yml ================================================ name: Weekly Tests on: workflow_dispatch: schedule: - cron: "0 0 * * 0" # Runs at 00:00 UTC every Sunday push: branches: [main] paths-ignore: - "**" # Ignore all paths to ensure it only triggers on schedule jobs: weekly-tests: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - name: Install uv uses: astral-sh/setup-uv@v4 with: enable-cache: true - name: Set up Python run: uv python install 3.11 - name: Install dependencies run: uv sync --all-extras --dev - name: Run all tests run: uv run pytest tests/ --asyncio-mode=auto env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} ================================================ FILE: .github/workflows/python-publish.yml 
================================================ # This workflow will upload a Python Package using Twine when a release is created # For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries # This workflow uses actions that are not certified by GitHub. # They are provided by a third-party and are governed by # separate terms of service, privacy policy, and support # documentation. name: Upload Python Package on: release: types: [published] permissions: contents: read jobs: release: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - name: Install uv uses: astral-sh/setup-uv@v4 with: enable-cache: true - name: Set up Python run: uv python install 3.10 - name: Install the project run: uv sync --all-extras - name: Build the project run: uv build - name: Build and publish Python package run: uv publish env: UV_PUBLISH_TOKEN: ${{ secrets.PYPI_TOKEN }} ================================================ FILE: .github/workflows/ruff.yml ================================================ name: Ruff on: push: pull_request: branches: [main] env: WORKING_DIRECTORY: "." 
  CUSTOM_PACKAGES: "instructor examples tests"

jobs:
  Ruff:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Install uv
        uses: astral-sh/setup-uv@v4
        with:
          enable-cache: true
      - name: Set up Python
        run: uv python install 3.9
      - name: Install the project
        run: uv sync --all-extras
      - name: Ruff lint
        run: uv run ruff check ${{ env.CUSTOM_PACKAGES }}
      - name: Ruff format
        run: uv run ruff format --check ${{ env.CUSTOM_PACKAGES }}

================================================
FILE: .github/workflows/scheduled-release.yml
================================================
name: Scheduled Release

on:
  schedule:
    # Intended cadence: every 2 weeks on Monday at 9 AM UTC.
    # Note: cron has no "every other week" syntax. As written, the `1/2` step
    # in the day-of-week field matches days 1, 3, 5 (Mon, Wed, Fri).
    - cron: '0 9 * * 1/2'
  workflow_dispatch:  # Allow manual trigger
    inputs:
      skip_tests:
        description: 'Skip LLM tests (use for testing workflow)'
        required: false
        default: false
        type: boolean
      dry_run:
        description: 'Dry run - do not push changes or create release'
        required: false
        default: false
        type: boolean

jobs:
  test-and-release:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Setup UV
        uses: astral-sh/setup-uv@v3

      - name: Install dependencies
        run: |
          uv sync --all-extras --dev

      - name: Run linting
        run: |
          uv run ruff check instructor examples tests

      - name: Run type checking
        run: |
          uv run pyright

      - name: Run core tests (no LLM)
        run: |
          uv run pytest tests/ -k "not openai and not llm and not anthropic and not gemini and not cohere and not mistral and not groq and not vertexai and not xai and not cerebras and not fireworks and not writer and not bedrock and not perplexity and not genai" --tb=short -v --maxfail=10

      # Optional: Run LLM tests if you have API keys in secrets
      - name: Run LLM tests
        if: github.event.inputs.skip_tests != 'true'
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
          COHERE_API_KEY: ${{
secrets.COHERE_API_KEY }} GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }} MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }} run: | echo "Running basic LLM tests if API keys are available..." # Run a subset of LLM tests to verify basic functionality if [ ! -z "$OPENAI_API_KEY" ]; then echo "Testing OpenAI integration..." uv run pytest tests/llm/test_openai/test_basics.py --tb=short -v --maxfail=1 || echo "OpenAI tests failed" fi if [ ! -z "$ANTHROPIC_API_KEY" ]; then echo "Testing Anthropic integration..." uv run pytest tests/llm/test_anthropic/test_basics.py --tb=short -v --maxfail=1 || echo "Anthropic tests failed" fi echo "LLM tests completed (non-blocking)" - name: Check for changes since last release id: changes run: | LAST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "") if [ -z "$LAST_TAG" ]; then echo "has_changes=true" >> $GITHUB_OUTPUT echo "last_tag=none" >> $GITHUB_OUTPUT echo "change_count=initial" >> $GITHUB_OUTPUT else CHANGES=$(git rev-list $LAST_TAG..HEAD --count) echo "has_changes=$([[ $CHANGES -gt 0 ]] && echo true || echo false)" >> $GITHUB_OUTPUT echo "change_count=$CHANGES" >> $GITHUB_OUTPUT echo "last_tag=$LAST_TAG" >> $GITHUB_OUTPUT fi echo "Last tag: $LAST_TAG" echo "Changes since last tag: $(git rev-list $LAST_TAG..HEAD --count 2>/dev/null || echo 'N/A')" # Only proceed with release if tests passed AND there are changes - name: Get current version if: steps.changes.outputs.has_changes == 'true' id: current_version run: | VERSION=$(uv run python -c "import tomllib; print(tomllib.load(open('pyproject.toml', 'rb'))['project']['version'])") echo "version=$VERSION" >> $GITHUB_OUTPUT echo "Current version: $VERSION" - name: Determine version bump type if: steps.changes.outputs.has_changes == 'true' id: version_type run: | # Check commit messages since last tag to determine bump type LAST_TAG="${{ steps.changes.outputs.last_tag }}" if [ "$LAST_TAG" = "none" ]; then COMMITS=$(git log --oneline HEAD~20..HEAD) else COMMITS=$(git log --oneline 
$LAST_TAG..HEAD) fi echo "Recent commits:" echo "$COMMITS" # Look for breaking changes or major features if echo "$COMMITS" | grep -qE "(BREAKING|feat!|fix!)"; then echo "bump_type=minor" >> $GITHUB_OUTPUT echo "Detected breaking changes - using minor bump" elif echo "$COMMITS" | grep -qE "feat:"; then echo "bump_type=minor" >> $GITHUB_OUTPUT echo "Detected new features - using minor bump" else echo "bump_type=patch" >> $GITHUB_OUTPUT echo "Using patch bump for bug fixes and chores" fi - name: Bump version if: steps.changes.outputs.has_changes == 'true' id: bump_version run: | CURRENT="${{ steps.current_version.outputs.version }}" BUMP_TYPE="${{ steps.version_type.outputs.bump_type }}" IFS='.' read -r major minor patch <<< "$CURRENT" case $BUMP_TYPE in major) major=$((major + 1)) minor=0 patch=0 ;; minor) minor=$((minor + 1)) patch=0 ;; patch) patch=$((patch + 1)) ;; esac NEW_VERSION="$major.$minor.$patch" echo "new_version=$NEW_VERSION" >> $GITHUB_OUTPUT echo "Bumping from $CURRENT to $NEW_VERSION ($BUMP_TYPE)" # Update pyproject.toml sed -i "s/version = \"$CURRENT\"/version = \"$NEW_VERSION\"/" pyproject.toml - name: Update lockfile if: steps.changes.outputs.has_changes == 'true' run: | uv lock # Run tests again after version bump to make sure nothing broke - name: Final test run if: steps.changes.outputs.has_changes == 'true' run: | uv sync uv run pytest tests/ -k "not openai and not llm and not anthropic and not gemini and not cohere and not mistral and not groq and not vertexai and not xai and not cerebras and not fireworks and not writer and not bedrock and not perplexity and not genai" --tb=short --maxfail=5 - name: Generate changelog if: steps.changes.outputs.has_changes == 'true' id: changelog run: | LAST_TAG="${{ steps.changes.outputs.last_tag }}" NEW_VERSION="${{ steps.bump_version.outputs.new_version }}" if [ "$LAST_TAG" = "none" ]; then CHANGELOG=$(git log --oneline HEAD~30..HEAD --pretty=format:"- %s" | head -20) else CHANGELOG=$(git log --oneline 
$LAST_TAG..HEAD --pretty=format:"- %s") fi # Save changelog to file for GitHub release cat > CHANGELOG.md << EOF ## 🚀 What's Changed $CHANGELOG ## 🔗 Links **Full Changelog**: https://github.com/${{ github.repository }}/compare/$LAST_TAG...v$NEW_VERSION --- 🤖 *This release was automatically generated every 2 weeks* EOF echo "changelog_file=CHANGELOG.md" >> $GITHUB_OUTPUT - name: Create release commit if: steps.changes.outputs.has_changes == 'true' run: | git config --local user.email "action@github.com" git config --local user.name "GitHub Action" git add pyproject.toml uv.lock git commit -m "chore: automated release v${{ steps.bump_version.outputs.new_version }} 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: GitHub Action " git tag "v${{ steps.bump_version.outputs.new_version }}" - name: Push changes if: steps.changes.outputs.has_changes == 'true' && github.event.inputs.dry_run != 'true' run: | git push origin main git push origin "v${{ steps.bump_version.outputs.new_version }}" - name: Create GitHub Release if: steps.changes.outputs.has_changes == 'true' && github.event.inputs.dry_run != 'true' uses: ncipollo/release-action@v1 with: tag: "v${{ steps.bump_version.outputs.new_version }}" name: "🚀 Release v${{ steps.bump_version.outputs.new_version }}" bodyFile: "CHANGELOG.md" draft: false prerelease: false - name: Dry run summary if: steps.changes.outputs.has_changes == 'true' && github.event.inputs.dry_run == 'true' run: | echo "🧪 DRY RUN MODE - No changes pushed" echo "Would have released: v${{ steps.bump_version.outputs.new_version }}" cat CHANGELOG.md # Optional: Publish to PyPI (uncomment if you want automatic PyPI releases) # - name: Build and publish to PyPI # if: steps.changes.outputs.has_changes == 'true' && secrets.PYPI_TOKEN != '' # env: # PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }} # run: | # uv build # uv publish --token $PYPI_TOKEN # Summary outputs - name: Summary if: always() run: | echo "## 📊 Scheduled Release Summary" >> 
$GITHUB_STEP_SUMMARY echo "- **Branch**: ${{ github.ref }}" >> $GITHUB_STEP_SUMMARY echo "- **Has Changes**: ${{ steps.changes.outputs.has_changes }}" >> $GITHUB_STEP_SUMMARY echo "- **Change Count**: ${{ steps.changes.outputs.change_count }}" >> $GITHUB_STEP_SUMMARY if [ "${{ steps.changes.outputs.has_changes }}" = "true" ]; then echo "- **Version**: ${{ steps.current_version.outputs.version }} → ${{ steps.bump_version.outputs.new_version }}" >> $GITHUB_STEP_SUMMARY echo "- **Bump Type**: ${{ steps.version_type.outputs.bump_type }}" >> $GITHUB_STEP_SUMMARY echo "- **Status**: ✅ Released" >> $GITHUB_STEP_SUMMARY else echo "- **Status**: ⏭️ Skipped (no changes)" >> $GITHUB_STEP_SUMMARY fi - name: Notify on failure if: failure() run: | echo "❌ Scheduled release failed - check the logs above" echo "Common issues:" echo "- Tests failed" echo "- Linting issues" echo "- Type checking errors" echo "- Git push permissions" ================================================ FILE: .github/workflows/test.yml ================================================ name: Test on: pull_request: push: branches: - main jobs: # Core tests without LLM providers core-tests: name: Core Tests runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - name: Install uv uses: astral-sh/setup-uv@v4 with: enable-cache: true - name: Set up Python run: uv python install 3.11 - name: Install the project run: uv sync --all-extras - name: Run core tests run: >- uv run pytest tests/ --asyncio-mode=auto -n auto -k 'not test_core_providers and not test_openai and not test_anthropic and not test_gemini and not test_genai and not test_writer and not test_vertexai and not docs' env: INSTRUCTOR_ENV: CI OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }} XAI_API_KEY: ${{ secrets.XAI_API_KEY }} GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }} # Core provider tests for OpenAI core-openai: name: Core Provider Tests (OpenAI) 
runs-on: ubuntu-latest needs: core-tests env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} steps: - uses: actions/checkout@v2 - name: Install uv uses: astral-sh/setup-uv@v4 with: enable-cache: true - name: Set up Python run: uv python install 3.11 - name: Install the project run: uv sync --all-extras - name: Skip core provider tests (OpenAI) if: ${{ env.OPENAI_API_KEY == '' }} run: echo "Skipping OpenAI core provider tests (missing OPENAI_API_KEY)." - name: Run core provider tests (OpenAI) if: ${{ env.OPENAI_API_KEY != '' }} run: | set +e uv run pytest tests/llm/test_core_providers -v --asyncio-mode=auto -n auto -k "openai" status=$? set -e if [ $status -eq 5 ]; then echo "No tests collected; treating as success." exit 0 fi exit $status env: INSTRUCTOR_ENV: CI OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} # Core provider tests for Anthropic core-anthropic: name: Core Provider Tests (Anthropic) runs-on: ubuntu-latest needs: core-tests env: ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} steps: - uses: actions/checkout@v2 - name: Install uv uses: astral-sh/setup-uv@v4 with: enable-cache: true - name: Set up Python run: uv python install 3.11 - name: Install the project run: uv sync --all-extras - name: Skip core provider tests (Anthropic) if: ${{ env.ANTHROPIC_API_KEY == '' }} run: echo "Skipping Anthropic core provider tests (missing ANTHROPIC_API_KEY)." - name: Run core provider tests (Anthropic) if: ${{ env.ANTHROPIC_API_KEY != '' }} run: | set +e uv run pytest tests/llm/test_core_providers -v --asyncio-mode=auto -n auto -k "anthropic" status=$? set -e if [ $status -eq 5 ]; then echo "No tests collected; treating as success." 
exit 0 fi exit $status env: INSTRUCTOR_ENV: CI ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} # Core provider tests for Google core-google: name: Core Provider Tests (Google) runs-on: ubuntu-latest needs: core-tests env: GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }} GOOGLE_GENAI_MODEL: ${{ secrets.GOOGLE_GENAI_MODEL }} steps: - uses: actions/checkout@v2 - name: Install uv uses: astral-sh/setup-uv@v4 with: enable-cache: true - name: Set up Python run: uv python install 3.11 - name: Install the project run: uv sync --all-extras - name: Skip core provider tests (Google) if: ${{ env.GOOGLE_API_KEY == '' || env.GOOGLE_GENAI_MODEL == '' }} run: echo "Skipping Google core provider tests (missing GOOGLE_API_KEY or GOOGLE_GENAI_MODEL)." - name: Run core provider tests (Google) if: ${{ env.GOOGLE_API_KEY != '' && env.GOOGLE_GENAI_MODEL != '' }} run: | set +e uv run pytest tests/llm/test_core_providers -v --asyncio-mode=auto -n auto -k "google" status=$? set -e if [ $status -eq 5 ]; then echo "No tests collected; treating as success." 
exit 0 fi exit $status env: INSTRUCTOR_ENV: CI GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }} # Core provider tests for other providers core-other: name: Core Provider Tests (Other) runs-on: ubuntu-latest needs: core-tests env: COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }} XAI_API_KEY: ${{ secrets.XAI_API_KEY }} MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }} CEREBRAS_API_KEY: ${{ secrets.CEREBRAS_API_KEY }} FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }} WRITER_API_KEY: ${{ secrets.WRITER_API_KEY }} PERPLEXITY_API_KEY: ${{ secrets.PERPLEXITY_API_KEY }} steps: - uses: actions/checkout@v2 - name: Install uv uses: astral-sh/setup-uv@v4 with: enable-cache: true - name: Set up Python run: uv python install 3.11 - name: Install the project run: uv sync --all-extras - name: Skip core provider tests (Other) if: >- ${{ env.COHERE_API_KEY == '' && env.XAI_API_KEY == '' && env.MISTRAL_API_KEY == '' && env.CEREBRAS_API_KEY == '' && env.FIREWORKS_API_KEY == '' && env.WRITER_API_KEY == '' && env.PERPLEXITY_API_KEY == '' }} run: echo "Skipping core provider tests (Other) (missing provider secrets)." - name: Run core provider tests (Cohere, xAI, Mistral, etc) if: >- ${{ env.COHERE_API_KEY != '' || env.XAI_API_KEY != '' || env.MISTRAL_API_KEY != '' || env.CEREBRAS_API_KEY != '' || env.FIREWORKS_API_KEY != '' || env.WRITER_API_KEY != '' || env.PERPLEXITY_API_KEY != '' }} run: | set +e uv run pytest tests/llm/test_core_providers -v --asyncio-mode=auto -n auto -k "cohere or xai or mistral or cerebras or fireworks or writer or perplexity" status=$? set -e if [ $status -eq 5 ]; then echo "No tests collected; treating as success." 
exit 0 fi exit $status env: INSTRUCTOR_ENV: CI COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }} XAI_API_KEY: ${{ secrets.XAI_API_KEY }} MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }} CEREBRAS_API_KEY: ${{ secrets.CEREBRAS_API_KEY }} FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }} WRITER_API_KEY: ${{ secrets.WRITER_API_KEY }} PERPLEXITY_API_KEY: ${{ secrets.PERPLEXITY_API_KEY }} # Provider tests run in parallel provider-tests: name: ${{ matrix.provider.name }} Tests runs-on: ubuntu-latest needs: [core-openai, core-anthropic, core-google, core-other] env: PROVIDER_API_KEY: ${{ secrets[matrix.provider.env_key] }} GOOGLE_GENAI_MODEL: ${{ secrets.GOOGLE_GENAI_MODEL }} strategy: fail-fast: false matrix: provider: - name: OpenAI env_key: OPENAI_API_KEY test_path: tests/llm/test_openai - name: Anthropic env_key: ANTHROPIC_API_KEY test_path: tests/llm/test_anthropic - name: Gemini env_key: GOOGLE_API_KEY test_path: tests/llm/test_gemini - name: Google GenAI env_key: GOOGLE_API_KEY test_path: tests/llm/test_genai - name: Vertex AI env_key: GOOGLE_API_KEY test_path: tests/llm/test_vertexai - name: Writer env_key: WRITER_API_KEY test_path: tests/llm/test_writer steps: - uses: actions/checkout@v2 - name: Install uv uses: astral-sh/setup-uv@v4 with: enable-cache: true - name: Set up Python run: uv python install 3.11 - name: Install the project run: uv sync --all-extras - name: Skip ${{ matrix.provider.name }} tests if: >- ${{ env.PROVIDER_API_KEY == '' || ((matrix.provider.name == 'Gemini' || matrix.provider.name == 'Google GenAI' || matrix.provider.name == 'Vertex AI') && env.GOOGLE_GENAI_MODEL == '') }} run: >- echo "Skipping ${{ matrix.provider.name }} tests (missing ${{ matrix.provider.env_key }} or GOOGLE_GENAI_MODEL)." 
- name: Run ${{ matrix.provider.name }} tests if: >- ${{ env.PROVIDER_API_KEY != '' && ((matrix.provider.name != 'Gemini' && matrix.provider.name != 'Google GenAI' && matrix.provider.name != 'Vertex AI') || env.GOOGLE_GENAI_MODEL != '') }} run: | set +e uv run pytest ${{ matrix.provider.test_path }} --asyncio-mode=auto -n auto status=$? set -e if [ $status -eq 5 ]; then echo "No tests collected; treating as success." exit 0 fi exit $status env: INSTRUCTOR_ENV: CI ${{ matrix.provider.env_key }}: ${{ secrets[matrix.provider.env_key] }} # Auto client needs multiple providers auto-client-test: name: Auto Client Tests runs-on: ubuntu-latest needs: [core-openai, core-anthropic, core-google, core-other] env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }} COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }} ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} XAI_API_KEY: ${{ secrets.XAI_API_KEY }} steps: - uses: actions/checkout@v2 - name: Install uv uses: astral-sh/setup-uv@v4 with: enable-cache: true - name: Set up Python run: uv python install 3.11 - name: Install the project run: uv sync --all-extras - name: Skip Auto Client tests if: >- ${{ env.OPENAI_API_KEY == '' || env.GOOGLE_API_KEY == '' || env.COHERE_API_KEY == '' || env.ANTHROPIC_API_KEY == '' || env.XAI_API_KEY == '' }} run: echo "Skipping Auto Client tests (missing one or more provider secrets)." - name: Run Auto Client tests if: >- ${{ env.OPENAI_API_KEY != '' && env.GOOGLE_API_KEY != '' && env.COHERE_API_KEY != '' && env.ANTHROPIC_API_KEY != '' && env.XAI_API_KEY != '' }} run: | set +e uv run pytest tests/test_auto_client.py --asyncio-mode=auto -n auto status=$? set -e if [ $status -eq 5 ]; then echo "No tests collected; treating as success." 
exit 0 fi exit $status env: INSTRUCTOR_ENV: CI OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }} COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }} ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} XAI_API_KEY: ${{ secrets.XAI_API_KEY }} ================================================ FILE: .github/workflows/test_docs.yml ================================================ name: Test Docs on: schedule: - cron: '0 0 1 * *' # Runs at 00:00 on the 1st of every month jobs: release: runs-on: ubuntu-latest strategy: matrix: python-version: ["3.11"] steps: - uses: actions/checkout@v2 - name: Install system dependencies run: | sudo apt-get update sudo apt-get install -y graphviz libcairo2-dev xdg-utils - name: Install Poetry uses: snok/install-poetry@v1.3.1 - name: Set up Python ${{ matrix.python-version }} uses: actions/setup-python@v4 with: python-version: ${{ matrix.python-version }} cache: "poetry" - name: Install uv uses: astral-sh/setup-uv@v4 - name: Install the project run: uv sync --all-extras - name: Run tests run: uv run pytest tests/docs --asyncio-mode=auto env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} ================================================ FILE: .github/workflows/ty.yml ================================================ name: ty on: pull_request: branches: [main] push: branches: [main] env: WORKING_DIRECTORY: "." 
jobs: type-check: runs-on: ubuntu-latest steps: - name: Checkout code uses: actions/checkout@v3 - name: Install uv uses: astral-sh/setup-uv@v4 with: enable-cache: true - name: Set up Python run: uv python install 3.11 - name: Install the project run: uv sync --all-extras - name: Run type check with ty run: uv run ty check instructor/ - name: Run type check with ty (tests) run: uv run ty check --config-file ty-tests.toml tests ================================================ FILE: .gitignore ================================================ .DS_Store # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] *$py.class # C extensions *.so # Distribution / packaging .Python build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib/ lib64/ parts/ sdist/ var/ wheels/ share/python-wheels/ *.egg-info/ .installed.cfg *.egg MANIFEST # PyInstaller # Usually these files are written by a python script from a template # before PyInstaller builds the exe, so as to inject date/other infos into it. *.manifest *.spec # Installer logs pip-log.txt pip-delete-this-directory.txt # Unit test / coverage reports htmlcov/ .tox/ .nox/ .coverage .coverage.* .cache nosetests.xml coverage.xml *.cover *.py,cover .hypothesis/ .pytest_cache/ cover/ # Translations *.mo *.pot # Django stuff: *.log local_settings.py db.sqlite3 db.sqlite3-journal # Flask stuff: instance/ .webassets-cache # Scrapy stuff: .scrapy # Sphinx documentation docs/_build/ # PyBuilder .pybuilder/ target/ # Jupyter Notebook .ipynb_checkpoints # IPython profile_default/ ipython_config.py # pyenv # For a library or package, you might want to ignore these files since the code is # intended to run in multiple environments; otherwise, check them in: # .python-version # pipenv # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 
# However, in case of collaboration, if having platform-specific dependencies or dependencies # having no cross-platform support, pipenv may install dependencies that don't work, or not # install all needed dependencies. #Pipfile.lock # poetry # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. # This is especially recommended for binary packages to ensure reproducibility, and is more # commonly ignored for libraries. # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control #poetry.lock # pdm # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. #pdm.lock # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it # in version control. # https://pdm.fming.dev/#use-with-ide .pdm.toml # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm __pypackages__/ # Celery stuff celerybeat-schedule celerybeat.pid # SageMath parsed files *.sage.py # Environments .env .venv env/ venv/ ENV/ env.bak/ venv.bak/ .envrc # Spyder project settings .spyderproject .spyproject # Rope project settings .ropeproject # mkdocs documentation /site # mypy .mypy_cache/ .dmypy.json dmypy.json # Pyre type checker .pyre/ # pytype static type analyzer .pytype/ # Cython debug symbols cython_debug/ # PyCharm # JetBrains specific template is maintained in a separate JetBrains.gitignore that can # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore # and can be added to the global gitignore or merged into this file. For a more nuclear # option (not recommended) you can uncomment the following to ignore the entire idea folder. 
.idea/ .vscode/ examples/citation_with_extraction/fly.toml my_cache_directory/ tutorials/wandb/* tutorials/results.csv tutorials/results.jsonl tutorials/results.jsonlines tutorials/schema.json wandb/settings math_finetunes.jsonl pr_body.md check_zero_width_chars.py # Suggestion files from architectural analysis *_SUGGESTIONS.md ORGANIZED_SUGGESTIONS.md ================================================ FILE: .grit/.gitignore ================================================ .gritmodules *.log ================================================ FILE: .grit/grit.yaml ================================================ version: 0.0.1 patterns: - name: github.com/getgrit/python#openai level: info ================================================ FILE: .pre-commit-config.yaml ================================================ repos: - repo: https://github.com/astral-sh/ruff-pre-commit rev: v0.9.9 # Ruff version hooks: - id: ruff # Run the linter. name: Run Linter Check (Ruff) args: [ --fix, --unsafe-fixes ] files: ^(instructor|tests|examples)/ - id: ruff-format # Run the formatter. 
name: Run Formatter (Ruff) - repo: local hooks: - id: uv-lock-check name: Check uv.lock is up-to-date entry: uv args: [lock, --check] language: system files: ^(pyproject\.toml|uv\.lock)$ pass_filenames: false - id: uv-sync-check name: Verify dependencies can be installed entry: uv args: [sync, --check] language: system files: ^(pyproject\.toml|uv\.lock)$ pass_filenames: false - id: uv-export-requirements name: Export requirements.txt from pyproject.toml entry: bash -c 'uv pip compile pyproject.toml -o requirements.txt && git add requirements.txt' language: system files: ^pyproject\.toml$ pass_filenames: false - id: ty-check name: Run Type Check (ty) entry: uv args: [run, ty, check, --ignore, unresolved-import] language: system files: ^instructor/ pass_filenames: false ================================================ FILE: .ruff.toml ================================================ # Exclude a variety of commonly ignored directories. exclude = [ ".bzr", ".direnv", ".eggs", ".git", ".git-rewrite", ".hg", ".mypy_cache", ".nox", ".pants.d", ".pytype", ".ruff_cache", ".svn", ".tox", ".venv", "__pypackages__", "_build", "buck-out", "build", "dist", "node_modules", "venv", ] # Same as Black. 
line-length = 88
output-format = "grouped"

target-version = "py39"

[lint]
select = [
    # bugbear rules
    "B",
    # remove unused imports
    "F401",
    # bare except statements
    "E722",
    # unused arguments
    "ARG",
    # pyupgrade
    "UP",
]
ignore = [
    # mutable defaults
    "B006",
    "B018",
]

unfixable = [
    # disable auto fix for print statements
    "T201",
    "T203",
]

[lint.extend-per-file-ignores]
"instructor/distil.py" = ["ARG002"]
"tests/test_distil.py" = ["ARG001"]
"tests/test_patch.py" = ["ARG001"]
"examples/task_planner/task_planner_topological_sort.py" = ["ARG002"]
"examples/citation_with_extraction/main.py" = ["ARG001"]

================================================
FILE: AGENT.md
================================================
# AGENT.md

## Commands
- Install: `uv pip install -e ".[dev]"` or `poetry install --with dev`
- Run tests: `uv run pytest tests/`
- Run single test: `uv run pytest tests/path_to_test.py::test_name`
- Skip LLM tests: `uv run pytest tests/ -k 'not llm and not openai'`
- Temp deps for a run: `uv run --with <package>[==version] <command>` (example: `uv run --with pytest-asyncio --with anthropic pytest tests/...`)
- Type check: `uv run ty check`
- Lint: `uv run ruff check instructor examples tests`
- Format: `uv run ruff format instructor examples tests`
- Build docs: `uv run mkdocs serve` (local) or `./build_mkdocs.sh` (production)
- Waiting: use `sleep <seconds>` for explicit pauses (e.g., CI waits) or to let external processes finish

## Architecture
- **Core**: `instructor/` - Pydantic-based structured outputs for LLMs
- **Base classes**: `Instructor` and `AsyncInstructor` in `client.py`
- **Providers**: Client files (`client_*.py`) for OpenAI, Anthropic, Gemini, Cohere, etc.
- **Factory pattern**: `from_provider()` for automatic provider detection
- **DSL**: `dsl/` directory with Partial, Iterable, Maybe, Citation extensions
- **Key modules**: `patch.py` (patching), `process_response.py` (parsing), `function_calls.py` (schemas)

## Code Style
- **Typing**: Strict type annotations, use `BaseModel` for structured outputs
- **Imports**: Standard lib → third-party → local
- **Formatting**: Ruff with Black conventions
- **Error handling**: Custom exceptions from `exceptions.py`, Pydantic validation
- **Naming**: `snake_case` functions/variables, `PascalCase` classes
- **No mocking**: Tests use real API calls
- **Client creation**: Always use `instructor.from_provider("provider_name/model_name")` instead of provider-specific methods like `from_openai()`, `from_anthropic()`, etc.

## Pull Request (PR) Formatting

Use **Conventional Commits** formatting for PR titles. Treat the PR title as the message we would use for a squash merge commit.

### PR Title Format

Use: `<type>(<scope>): <summary>`

Rules:
- Keep it under ~70 characters when you can.
- Use the imperative mood (for example, “add”, “fix”, “update”).
- Do not end with a period.
- If it includes a breaking change, add `!` after the type or scope (for example, `feat(api)!:`).
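
The title rules above can be checked mechanically. Below is a minimal sketch in Python, not something this repository ships: `check_pr_title`, `TITLE_RE`, and `ALLOWED_TYPES` are hypothetical names, the scope is assumed to be optional, and the allowed types are assumed to be the common types this guide lists. The imperative-mood rule is not checked, since it needs human judgment.

```python
import re

# Assumption: the allowed types are the "Common types" from this guide.
ALLOWED_TYPES = (
    "feat", "fix", "docs", "refactor", "perf", "test", "build", "ci", "chore",
)

# Hypothetical pattern: type, optional "(scope)", optional "!", then ": summary".
TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[a-z0-9_-]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<summary>.+)$"
)


def check_pr_title(title: str, max_len: int = 70) -> list[str]:
    """Return a list of rule violations; an empty list means the title passes."""
    match = TITLE_RE.match(title)
    if match is None:
        return ["title does not match 'type(scope): summary'"]

    problems = []
    if match["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {match['type']}")
    if len(title) > max_len:
        problems.append(f"title is longer than {max_len} characters")
    if match["summary"].endswith("."):
        problems.append("summary ends with a period")
    return problems
```

For example, `check_pr_title("fix(openai): handle empty tool_calls in streaming")` returns an empty list, while a title such as `update stuff` is rejected because it has no type prefix. The 70-character limit is a default argument because the guide treats it as a soft target.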

Good examples:
- `fix(openai): handle empty tool_calls in streaming`
- `feat(retry): add backoff for JSON parse failures`
- `docs(agents): add conventional commit PR title guidelines`
- `test(schema): cover nested union edge cases`
- `ci(ruff): enforce formatting in pre-commit`

Common types:
- `feat`: new feature
- `fix`: bug fix
- `docs`: documentation-only changes
- `refactor`: code change that is not a fix or feature
- `perf`: performance improvement
- `test`: add or update tests
- `build`: build system or dependency changes
- `ci`: CI pipeline changes
- `chore`: maintenance work

Suggested scopes (pick the closest match):
- Providers: `openai`, `anthropic`, `gemini`, `vertexai`, `bedrock`, `mistral`, `groq`, `writer`
- Core: `core`, `patch`, `process_response`, `function_calls`, `retry`, `dsl`
- Repo: `docs`, `examples`, `tests`, `ci`, `build`

### PR Description Guidelines

Keep PR descriptions short and easy to review:
- **What**: What changed, in 1–3 sentences.
- **Why**: Why this change is needed (link issues when possible).
- **Changes**: 3–7 bullet points with the main edits.
- **Testing**: What you ran (or why you did not run anything).

If the PR was authored by Cursor, include:
- `This PR was written by [Cursor](https://cursor.com)`

================================================
FILE: CHANGELOG.md
================================================
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [1.14.4] - 2026-01-16

### Changed
- Simplified `JsonCompleteness` by using `jiter` parsing and a sibling-based completeness heuristic (#2000)

### Fixed
- Fixed Google GenAI `safety_settings` causing `400 INVALID_ARGUMENT` when requests include image content by using image-specific harm categories when needed (#1773)
- Fixed `create_with_completion()` crashing for `list[T]` response models (where `T` is a Pydantic model) by preserving `_raw_response` on list outputs (#1303)
- Fixed Responses API retries crashing on reasoning items by skipping non-tool-call items in `reask_responses_tools` (#2002)
- Fixed Google GenAI dict-style `config` handling to preserve `labels` and other settings like `cached_content` and `thinking_config` (#2005)

## [1.14.3] - 2026-01-13

### Added
- Completeness-based validation for Partial streaming - only validates JSON structures that are structurally complete (#1999)
- New `JsonCompleteness` class in `instructor/dsl/json_tracker.py` for tracking JSON completeness during streaming (#1999)

### Fixed
- Fixed Stream objects crashing reask handlers when using streaming with `max_retries > 1` (#1992)
- Field constraints (`min_length`, `max_length`, `ge`, `le`, etc.) now work correctly during streaming (#1999)

### Deprecated
- `PartialLiteralMixin` is now deprecated - completeness-based validation handles Literal/Enum types automatically (#1999)

## [1.14.2] - 2026-01-13

### Fixed
- Fixed model validators crashing during partial streaming by skipping them until streaming completes (#1994)
- Fixed infinite recursion with self-referential models in Partial (e.g., TreeNode with children: List["TreeNode"]) (#1997)

### Added
- Added `PartialLiteralMixin` documentation for handling Literal/Enum types during streaming (#1994)
- Added final validation against original model after streaming completes to enforce required fields (#1994)
- Added tests for recursive Partial models (#1997)

## [1.14.1] - 2026-01-08

### Fixed
- Added support for cached_content in Google Gemini context caching (#1987)

## [1.14.0] - 2026-01-08

### Added
- Pre-commit hook to auto-export requirements.txt for build consistency

### Changed
- Standardized provider factory methods across codebase for improved consistency
- Standardized provider imports throughout documentation
- Audited and standardized exception handling throughout the instructor library

### Fixed
- Fixed build issues with requirements.txt regeneration from pyproject.toml
- Fixed provider functionality issue (#1914)

### Documentation
- Comprehensive documentation audit and SEO optimization improvements (#1944)
- Updated documentation for responses API mode (#1946)
- Enhanced README with PydanticAI promotion and clear feature distinctions
- Removed incorrect model reference in client.create extraction example (#1951)
- Fixed image base URLs in Jupyter notebook tutorials (#1922)

## [1.13.0] - Previous Release

For changes in earlier versions, see the [git history](https://github.com/instructor-ai/instructor/releases).

================================================
FILE: CLAUDE.md
================================================
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

# Instructor Development Guide

## Commands
- Install deps: `uv pip install -e ".[dev,anthropic]"` or `poetry install --with dev,anthropic`
- Run tests: `uv run pytest tests/ -n auto`
- Run specific test: `uv run pytest tests/path_to_test.py::test_name`
- Skip LLM tests: `uv run pytest tests/ -k 'not llm and not openai'`
- Type check: `uv run ty check`
- Lint: `uv run ruff check instructor examples tests`
- Format: `uv run ruff format instructor examples tests`
- Generate coverage: `uv run coverage run -m pytest tests/ -k "not docs"` then `uv run coverage report`
- Build documentation: `uv run mkdocs serve` (for local preview) or `./build_mkdocs.sh` (for production)
- Waiting: use `sleep` for explicit pauses (e.g., CI waits) or to let external processes finish

## Installation & Setup
- Fork the repository and clone your fork
- Install UV: `pip install uv`
- Create virtual environment: `uv venv`
- Install dependencies: `uv pip install -e ".[dev]"`
- Install pre-commit: `uv run pre-commit install`
- Run tests to verify: `uv run pytest tests/ -k "not openai"`

## Code Style Guidelines
- **Typing**: Use strict typing with annotations for all functions and variables
- **Imports**: Standard lib → third-party → local imports
- **Formatting**: Follow Black's formatting conventions (enforced by Ruff)
- **Models**: Define structured outputs as Pydantic BaseModel subclasses
- **Naming**: snake_case for functions/variables, PascalCase for classes
- **Error Handling**: Use custom exceptions from exceptions.py, validate with Pydantic
- **Comments**: Docstrings for public functions, inline comments for complex logic

## Conventional Commits
- **Format**: `type(scope): description`
- **Types**: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert
- **Examples**:
  - `feat(anthropic): add support for Claude 3.5`
  - `fix(openai): correct response parsing for streaming`
  - `docs(README): update installation instructions`
  - `test(gemini): add validation tests for JSON mode`

## Core Architecture
- **Base Classes**: `Instructor` and `AsyncInstructor` in client.py are the foundation
- **Factory Pattern**: Provider-specific factory functions (`from_openai`, `from_anthropic`, etc.)
- **Unified Access**: `from_provider()` function in auto_client.py for automatic provider detection
- **Mode System**: `Mode` enum categorizes different provider capabilities (tools vs JSON output)
- **Patching Mechanism**: Uses Python's dynamic nature to patch provider clients for structured outputs
- **Response Processing**: Transforms raw API responses into validated Pydantic models
- **DSL Components**: Special types like Partial, Iterable, Maybe extend the core functionality

## Provider Architecture
- **Supported Providers**: OpenAI, Anthropic, Gemini, Cohere, Mistral, Groq, VertexAI, Fireworks, Cerebras, Writer, Databricks, Anyscale, Together, LiteLLM, Bedrock, Perplexity
- **Provider Implementation**: Each provider has a dedicated client file (e.g., `client_anthropic.py`) with factory functions
- **Modes**: Different providers support specific modes (`Mode` enum): `ANTHROPIC_TOOLS`, `GEMINI_JSON`, etc.
- **Common Pattern**: Factory functions (e.g., `from_anthropic`) take a native client and return patched `Instructor` instances
- **Provider Testing**: Tests in `tests/llm/` directory, define Pydantic models, make API calls, verify structured outputs
- **Provider Detection**: `get_provider` function analyzes base URL to detect which provider is being used

## Key Components
- **process_response.py**: Handles parsing and converting LLM outputs to Pydantic models
- **patch.py**: Contains the core patching logic for modifying provider clients
- **function_calls.py**: Handles generating function/tool schemas from Pydantic models
- **hooks.py**: Provides event hooks for intercepting various stages of the LLM request/response cycle
- **dsl/**: Domain-specific language extensions for specialized model types
- **retry.py**: Implements retry logic for handling validation failures
- **validators.py**: Custom validation mechanisms for structured outputs

## Testing Guidelines
- Tests are organized by provider under `tests/llm/`
- Each provider has its own conftest.py with fixtures
- Standard tests cover: basic extraction, streaming, validation, retries
- Evaluation tests in `tests/llm/test_provider/evals/` assess model capabilities
- Use parametrized tests when testing similar functionality across variants
- **IMPORTANT**: No mocking in tests - tests make real API calls

## Documentation Guidelines
- Every provider needs documentation in `docs/integrations/` following standard format
- Provider docs should include: installation, basic example, modes supported, special features
- When adding a new provider, update `mkdocs.yml` navigation and redirects
- Example code should include complete imports and environment setup
- Tutorials should progress from simple to complex concepts
- New features should include conceptual explanation in `docs/concepts/`
- **Writing Style**: Grade 10 reading level, all examples must be working code

## Branch and Development Workflow
1. Fork and clone the repository
2. Create feature branch: `git checkout -b feat/your-feature`
3. Make changes and add tests
4. Run tests and linting
5. Commit with conventional commit message
6. Push to your fork and create PR
7. Use stacked PRs for complex features

## Adding New Providers

### Step-by-Step Guide

1. **Update Provider Enum** in `instructor/utils.py`:
   ```python
   class Provider(Enum):
       YOUR_PROVIDER = "your_provider"
   ```

2. **Add Provider Modes** in `instructor/mode.py`:
   ```python
   class Mode(enum.Enum):
       YOUR_PROVIDER_TOOLS = "your_provider_tools"
       YOUR_PROVIDER_JSON = "your_provider_json"
   ```

3. **Create Client Implementation** `instructor/client_your_provider.py`:
   - Use overloads for sync/async variants
   - Validate mode compatibility
   - Return appropriate Instructor/AsyncInstructor instance
   - Handle provider-specific edge cases

4. **Add Conditional Import** in `instructor/__init__.py`:
   ```python
   if importlib.util.find_spec("your_provider_sdk") is not None:
       from .client_your_provider import from_your_provider

       __all__ += ["from_your_provider"]
   ```

5. **Update Auto Client** in `instructor/auto_client.py`:
   - Add to `supported_providers` list
   - Implement provider handling in `from_provider()`
   - Update `get_provider()` function if URL-detectable

6. **Create Tests** in `tests/llm/test_your_provider/`:
   - `conftest.py` with client fixtures
   - Basic extraction tests
   - Streaming tests
   - Validation/retry tests
   - No mocking - use real API calls

7. **Add Documentation** in `docs/integrations/your_provider.md`:
   - Installation instructions
   - Basic usage examples
   - Supported modes
   - Provider-specific features

8. **Update Navigation** in `mkdocs.yml`:
   - Add to integrations section
   - Include redirects if needed

## Contributing to Evals
- Standard evals for each provider test model capabilities
- Create new evals following existing patterns
- Run evals as part of integration test suite
- Performance tracking and comparison

## Pull Request Guidelines
- Keep PRs small and focused
- Include tests for all changes
- Update documentation as needed
- Follow PR template
- Link to relevant issues

## Type System and Best Practices

### Type Checking with ty
- **Type Checker**: Using `ty` for fast, incremental type checking
- **Python Version**: 3.9+ for compatibility
- **Configuration**: Uses `pyproject.toml` settings for type checking
- Run `uv run ty check` before committing - aim for zero errors

### Code Quality Checks Before Committing
Always run these checks before committing code:
1. **Ruff linting**: `uv run ruff check .` - Fix all errors
2. **Ruff formatting**: `uv run ruff format .` - Apply consistent formatting
3. **Type checking**: `uv run ty check` - Aim for zero type errors
4. **Tests**: Run relevant tests to ensure changes don't break functionality

### Type Patterns
- **Bounded TypeVars**: Use `T = TypeVar("T", bound=Union[BaseModel, ...])` for constraints
- **Version Compatibility**: Handle Python 3.9 vs 3.10+ typing differences explicitly
- **Union Type Syntax**: Use `from __future__ import annotations` to enable Python 3.10+ union syntax (`|`) in Python 3.9
- **Simple Type Detection**: Special handling for `list[Union[int, str]]` patterns
- **Runtime Type Handling**: Graceful fallbacks for compatibility

### Pydantic Integration
- Heavy use of `BaseModel` for structured outputs
- `TypeAdapter` used internally for JSON schema generation
- Field validators and custom types
- Models serve dual purpose: validation and documentation

## Building Documentation

### Setup
```bash
# Install documentation dependencies
pip install -r requirements-doc.txt
```

### Local Development
```bash
# Serve documentation locally with hot reload
uv run mkdocs serve

# Build documentation for production
./build_mkdocs.sh
```

### Documentation Features
- **Material Theme**: Modern UI with extensive customization
- **Plugins**:
  - `mkdocstrings` - API documentation from docstrings
  - `mkdocs-jupyter` - Notebook integration
  - `mkdocs-redirects` - URL management
  - Custom hooks for code processing
- **Custom Processing**: `hide_lines.py` removes code marked with `# <%hide%>`
- **Redirect Management**: Comprehensive redirect maps for moved content

### Writing Documentation
- Follow templates in `docs/templates/` for consistency
- Grade 10 reading level for accessibility
- All code examples must be runnable
- Include complete imports and environment setup
- Progressive complexity: simple → advanced

## Project Structure
- `instructor/` - Core library code
  - Base classes (`client.py`): `Instructor` and `AsyncInstructor`
  - Provider clients (`client_*.py`): Factory functions for each provider
  - DSL components (`dsl/`): Partial, Iterable, Maybe, Citation extensions
  - Core
logic: `patch.py`, `process_response.py`, `function_calls.py`
  - CLI tools (`cli/`): Batch processing, file management, usage tracking
- `tests/` - Test suite organized by provider
  - Provider-specific tests in `tests/llm/test_/`
  - Evaluation tests for model capabilities
  - No mocking - all tests use real API calls
- `docs/` - MkDocs documentation
  - `concepts/` - Core concepts and features
  - `integrations/` - Provider-specific guides
  - `examples/` - Practical examples and cookbooks
  - `learning/` - Progressive tutorial path
  - `blog/posts/` - Technical articles and announcements
  - `templates/` - Templates for new docs (provider, concept, cookbook)
- `examples/` - Runnable code examples
  - Feature demos: caching, streaming, validation, parallel processing
  - Use cases: classification, extraction, knowledge graphs
  - Provider examples: anthropic, openai, groq, mistral
  - Each example has `run.py` as the main entry point
- `typings/` - Type stubs for untyped dependencies

## Documentation Structure
- **Getting Started Path**: Installation → First Extraction → Response Models → Structured Outputs
- **Learning Patterns**: Simple Objects → Lists → Nested Structures → Validation → Streaming
- **Example Organization**: Self-contained directories with runnable code demonstrating specific features
- **Blog Posts**: Technical deep-dives with code examples in `docs/blog/posts/`

## Example Patterns
When creating examples:
- Use `run.py` as the main file name
- Include clear imports: stdlib → third-party → instructor
- Define Pydantic models with descriptive fields
- Show expected output in comments
- Handle errors appropriately
- Make examples self-contained and runnable

## Dependency Management

### Core Dependencies
- **Minimal core**: `openai`, `pydantic`, `docstring-parser`, `typer`, `rich`
- **Python requirement**: `<4.0,>=3.9`
- **Pydantic version**: `<3.0.0,>=2.8.0` (constrained for stability)

### Optional Dependencies
Provider-specific packages as extras:
```bash
# Install with specific provider
pip install "instructor[anthropic]"
pip install "instructor[google-generativeai]"
pip install "instructor[groq]"
```

### Development Dependencies
```bash
# Install all development dependencies
uv pip install -e ".[dev]"
```

Includes:
- `ty` - Type checking
- `pytest` and `pytest-asyncio` - Testing
- `ruff` - Linting and formatting
- `coverage` - Test coverage
- `mkdocs` and plugins - Documentation

### Version Constraints
- **Upper bounds on all dependencies** for stability
- **Provider SDK versions** pinned to tested versions
- **Test dependencies** include evaluation frameworks

### Managing Dependencies
- Update `pyproject.toml` for new dependencies
- Test with multiple Python versions (3.9-3.12)
- Run full test suite after dependency updates
- Document any provider-specific version requirements

The library enables structured LLM outputs using Pydantic models across multiple providers with type safety.

================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to Instructor

Thank you for considering contributing to Instructor! This document provides guidelines and instructions to help you contribute effectively.

## Table of Contents

- [Contributing to Instructor](#contributing-to-instructor)
- [Table of Contents](#table-of-contents)
- [Code of Conduct](#code-of-conduct)
- [Getting Started](#getting-started)
- [Environment Setup](#environment-setup)
- [Development Workflow](#development-workflow)
- [Dependency Management](#dependency-management)
- [Using UV](#using-uv)
- [Using Poetry](#using-poetry)
- [Working with Optional Dependencies](#working-with-optional-dependencies)
- [How to Contribute](#how-to-contribute)
- [Reporting Bugs](#reporting-bugs)
- [Feature Requests](#feature-requests)
- [Pull Requests](#pull-requests)
- [Writing Documentation](#writing-documentation)
- [Contributing to Evals](#contributing-to-evals)
- [Code Style Guidelines](#code-style-guidelines)
- [Conventional Comments](#conventional-comments)
- [Conventional Commits](#conventional-commits)
- [Types](#types)
- [Examples](#examples)
- [Testing](#testing)
- [Branch and Release Process](#branch-and-release-process)
- [Using Cursor for PR Creation](#using-cursor-for-pr-creation)
- [License](#license)

## Code of Conduct

By participating in this project, you agree to abide by our code of conduct: treat everyone with respect, be constructive in your communication, and focus on the technical aspects of the contributions.

## Getting Started

### Environment Setup

1. **Fork the Repository**: Click the "Fork" button at the top right of the [repository page](https://github.com/instructor-ai/instructor).

2. **Clone Your Fork**:
   ```bash
   git clone https://github.com/YOUR-USERNAME/instructor.git
   cd instructor
   ```

3. **Set up Remote**:
   ```bash
   git remote add upstream https://github.com/instructor-ai/instructor.git
   ```

4. **Install UV** (recommended):
   ```bash
   # macOS/Linux
   curl -LsSf https://astral.sh/uv/install.sh | sh

   # Windows PowerShell
   powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
   ```

5. **Install Dependencies**:
   ```bash
   # Using uv (recommended)
   uv pip install -e ".[dev,docs,test-docs]"

   # Using poetry
   poetry install --with dev,docs,test-docs

   # For specific providers, add the provider name as an extra
   # Example: uv pip install -e ".[dev,docs,test-docs,anthropic]"
   ```

6. **Set up Pre-commit**:
   ```bash
   pip install pre-commit
   pre-commit install
   ```

### Development Workflow

1. **Create a Branch**:
   ```bash
   git checkout -b feature/your-feature-name
   ```

2. **Make Your Changes and Commit**:
   ```bash
   git add .
   git commit -m "Your descriptive commit message"
   ```

3. **Keep Your Branch Updated**:
   ```bash
   git fetch upstream
   git rebase upstream/main
   ```

4. **Push Changes**:
   ```bash
   git push origin feature/your-feature-name
   ```

### Dependency Management

We support both UV and Poetry for dependency management. Choose the tool that works best for you:

#### Using UV

UV is a fast Python package installer and resolver. It's recommended for day-to-day development in Instructor.

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install project and development dependencies
uv pip install -e ".[dev,docs]"

# Adding a new dependency (example)
uv pip install new-package
```

Key UV commands:
- `uv pip install -e .` - Install the project in editable mode
- `uv pip install -e ".[dev]"` - Install with development extras
- `uv pip freeze > requirements.txt` - Generate requirements file
- `uv self update` - Update UV to the latest version

#### Using Poetry

Poetry provides more comprehensive dependency management and packaging.
```bash
# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies including development deps
poetry install --with dev,docs

# Add a new dependency
poetry add package-name

# Add a new development dependency
poetry add --group dev package-name
```

Key Poetry commands:
- `poetry shell` - Activate the virtual environment
- `poetry run python -m pytest` - Run commands within the virtual environment
- `poetry update` - Update dependencies to their latest versions

### Working with Optional Dependencies

Instructor uses optional dependencies to support different LLM providers. Provider-specific utilities live under `instructor/utils`. When adding integration for a new provider:

1. **Update pyproject.toml**: Add your provider's dependencies to both `[project.optional-dependencies]` and `[dependency-groups]`:
   ```toml
   [project.optional-dependencies]
   # Add your provider here
   my-provider = ["my-provider-sdk>=1.0.0,<2.0.0"]

   [dependency-groups]
   # Also add to dependency groups
   my-provider = ["my-provider-sdk>=1.0.0,<2.0.0"]
   ```

2. **Create Provider Client**: Implement your provider client in `instructor/clients/client_myprovider.py`

3. **Add Tests**: Create tests in `tests/llm/test_myprovider/`

4. **Document Installation**: Update the documentation to include installation instructions:
   ```bash
   # Install with your provider support
   uv pip install "instructor[my-provider]"
   # or
   poetry install --with my-provider
   ```

5. **Create Provider Utilities and Handlers**:
   - Add a new module at `instructor/utils/myprovider.py`
   - Implement `reask` functions for validation errors and `handle_*` functions for formatting requests
   - Define `MYPROVIDER_HANDLERS` mapping `Mode` values to these functions

6. **Register the Provider**:
   - Add a value in `instructor/utils/providers.py` to the `Provider` enum
   - Extend `get_provider` with detection logic for your base URL

7. **Update `process_response.py`**:
   - Import your handler functions and include them in the `mode_handlers` dictionary so the library can route requests to your provider
   - `process_response.py` relies on these handlers to format arguments and parse results for each `Mode`

## How to Contribute

### Reporting Bugs

If you find a bug, please create an issue on [our issue tracker](https://github.com/instructor-ai/instructor/issues) with:

1. A clear, descriptive title
2. A detailed description including:
   - The `response_model` you are using
   - The `messages` you are using
   - The `model` you are using
   - Steps to reproduce the bug
   - The expected behavior and what went wrong
   - Your environment (Python version, OS, package versions)

### Feature Requests

For feature requests, please create an issue describing:

1. The problem your feature would solve
2. How your solution would work
3. Alternatives you've considered
4. Examples of how the feature would be used

### Pull Requests

1. **Create a Pull Request** from your fork to the main repository.
2. **Fill out the PR template** with details about your changes.
3. **Address review feedback** and make requested changes.
4. **Wait for CI checks** to pass.
5. Once approved, a maintainer will merge your PR.

### Writing Documentation

Documentation improvements are always welcome! Follow these guidelines:

1. Documentation is written in Markdown format in the `docs/` directory
2. When creating new markdown files, add them to `mkdocs.yml` under the appropriate section
3. Follow the existing hierarchy and structure
4. Use a grade 10 reading level (simple, clear language)
5. Include working code examples
6. Add links to related documentation

### Contributing to Evals

We encourage contributions to our evaluation tests:

1. Explore existing evals in the [evals directory](https://github.com/instructor-ai/instructor/tree/main/tests/llm)
2. Contribute new evals as pytest tests
3. Evals should test specific capabilities or edge cases of the library or models
4. Follow the existing patterns for structuring eval tests

## Code Style Guidelines

We use automated tools to maintain consistent code style:

- **Ruff**: For linting and formatting
- **ty**: For type checking
- **Black**: For code formatting (enforced by Ruff)

General guidelines:

- **Typing**: Use strict typing with annotations for all functions and variables
- **Imports**: Standard lib → third-party → local imports
- **Models**: Define structured outputs as Pydantic BaseModel subclasses
- **Naming**: snake_case for functions/variables, PascalCase for classes
- **Error Handling**: Use custom exceptions from exceptions.py, validate with Pydantic
- **Comments**: Docstrings for public functions, inline comments for complex logic

### Conventional Comments

We use conventional comments in code reviews and commit messages. This helps make feedback clearer and more actionable:

```