Repository: quickwit-oss/quickwit Branch: main Commit: 06f0ef088a49 Files: 1206 Total size: 12.3 MB Directory structure: gitextract_hf8mrikh/ ├── .cargo/ │ └── config.toml ├── .claude/ │ └── skills/ │ ├── bump-tantivy/ │ │ └── SKILL.md │ ├── fix-clippy/ │ │ └── SKILL.md │ ├── fmt/ │ │ └── SKILL.md │ ├── rationalize-deps/ │ │ └── SKILL.md │ └── simple-pr/ │ └── SKILL.md ├── .devcontainer/ │ ├── devcontainer.json │ ├── post-create.sh │ └── welcome.txt ├── .dockerignore ├── .gitattributes ├── .github/ │ ├── ISSUE_TEMPLATE/ │ │ ├── bug_report.md │ │ ├── documentation_request.md │ │ ├── feature_request.md │ │ └── tutorial_request.md │ ├── PULL_REQUEST_TEMPLATE.md │ ├── actions/ │ │ ├── cargo-build-macos-binary/ │ │ │ └── action.yml │ │ └── cross-build-binary/ │ │ └── action.yml │ ├── dependabot.yml │ └── workflows/ │ ├── ci.yml │ ├── coverage.yml │ ├── dependency.yml │ ├── publish_cross_images.yml │ ├── publish_docker_images.yml │ ├── publish_lambda.yaml │ ├── publish_nightly_packages.yml │ ├── publish_release_packages.yml │ ├── requirements.txt │ ├── scorecard.yml │ └── ui-ci.yml ├── .gitignore ├── .localstack/ │ └── init.sh ├── CHANGELOG.md ├── CODE_OF_CONDUCT.md ├── CODE_STYLE.md ├── CONTRIBUTING.md ├── Dockerfile ├── LICENSE ├── LICENSE-3rdparty.csv ├── Makefile ├── README.md ├── SECURITY.md ├── _typos.toml ├── build/ │ └── cross-images/ │ ├── aarch64-unknown-linux-gnu.dockerfile │ ├── aarch64-unknown-linux-musl.dockerfile │ ├── x86_64-unknown-linux-gnu.dockerfile │ └── x86_64-unknown-linux-musl.dockerfile ├── config/ │ ├── quickwit.yaml │ ├── templates/ │ │ ├── gh-archive.yaml │ │ └── stackoverflow.yaml │ └── tutorials/ │ ├── fluentbit-logs/ │ │ └── index-config.yaml │ ├── gh-archive/ │ │ ├── index-config-for-clickhouse.yaml │ │ ├── index-config.yaml │ │ ├── kafka-source.yaml │ │ └── kinesis-source.yaml │ ├── grafana/ │ │ └── docker-compose.yml │ ├── hdfs-logs/ │ │ ├── index-config-partitioned.yaml │ │ ├── index-config-retention-policy.yaml │ │ ├── 
index-config.yaml │ │ ├── searcher-1.yaml │ │ ├── searcher-2.yaml │ │ └── searcher-3.yaml │ ├── otel-logs/ │ │ ├── index-config.yaml │ │ ├── kafka-source.yaml │ │ └── otel-values.yaml │ ├── otel-traces/ │ │ ├── index-config.yaml │ │ └── kafka-source.yaml │ ├── stackoverflow/ │ │ ├── index-config.yaml │ │ ├── pulsar-source.yaml │ │ └── send_messages_to_pulsar.py │ ├── vector-otel-logs/ │ │ └── vector.toml │ └── wikipedia/ │ ├── index-config.yaml │ └── multilang-index-config.yaml ├── distribution/ │ ├── docker/ │ │ └── ubuntu/ │ │ └── Dockerfile │ ├── ecs/ │ │ ├── .gitignore │ │ ├── README.md │ │ ├── example/ │ │ │ ├── .terraform.lock.hcl │ │ │ ├── bastion.tf │ │ │ ├── image.tf │ │ │ ├── kafka.tf │ │ │ ├── terraform.tf │ │ │ └── vpc.tf │ │ └── quickwit/ │ │ ├── cluster.tf │ │ ├── configs.tf │ │ ├── iam.tf │ │ ├── outputs.tf │ │ ├── quickwit-control-plane.tf │ │ ├── quickwit-indexer.tf │ │ ├── quickwit-janitor.tf │ │ ├── quickwit-metastore.tf │ │ ├── quickwit-searcher.tf │ │ ├── rds.tf │ │ ├── s3.tf │ │ ├── service/ │ │ │ ├── config.tf │ │ │ ├── ecs.tf │ │ │ └── variables.tf │ │ └── variables.tf │ └── kubernetes/ │ └── README.md ├── docker-compose.yml ├── docs/ │ ├── assets/ │ │ └── sqs-file-source.tf │ ├── configuration/ │ │ ├── _category_.yaml │ │ ├── index-config.md │ │ ├── index.md │ │ ├── lambda-config.md │ │ ├── metastore-config.md │ │ ├── node-config.md │ │ ├── ports-config.md │ │ ├── source-config.md │ │ ├── storage-config.md │ │ └── template-config.md │ ├── deployment/ │ │ ├── _category_.yaml │ │ ├── cluster-sizing.md │ │ ├── deployment-modes.md │ │ └── kubernetes/ │ │ ├── _category_.yaml │ │ ├── gke.md │ │ ├── glasskube.md │ │ └── helm.md │ ├── distributed-tracing/ │ │ ├── _category_.yaml │ │ ├── otel-service.md │ │ ├── overview.md │ │ ├── plug-quickwit-to-jaeger.md │ │ └── send-traces/ │ │ ├── _category_.yaml │ │ ├── using-otel-collector.md │ │ └── using-otel-sdk-python.md │ ├── get-started/ │ │ ├── _category_.yaml │ │ ├── installation.md │ │ ├── 
query-language-intro.md │ │ ├── quickstart.md │ │ └── tutorials/ │ │ ├── _category_.yaml │ │ ├── prometheus-metrics.md │ │ ├── trace-analytics-with-grafana.md │ │ ├── tutorial-hdfs-logs-distributed-search-aws-s3.md │ │ ├── tutorial-hdfs-logs.md │ │ └── tutorial-jaeger.md │ ├── guides/ │ │ ├── _category_.yaml │ │ ├── aws-setup.md │ │ ├── schemaless.md │ │ └── storage-setup/ │ │ ├── _category_.yaml │ │ └── aws-s3.md │ ├── ingest-data/ │ │ ├── _category_.yaml │ │ ├── index.md │ │ ├── ingest-api.md │ │ ├── ingest-local-file.md │ │ ├── kafka.md │ │ ├── kinesis.md │ │ ├── pulsar.md │ │ └── sqs-files.md │ ├── internals/ │ │ ├── backward-compatibility.md │ │ ├── date-time.md │ │ ├── ingest-v2.md │ │ ├── scroll.md │ │ ├── searcher-split-cache.md │ │ ├── sorting.md │ │ ├── split-format.md │ │ └── template-index.md │ ├── log-management/ │ │ ├── _category_.yaml │ │ ├── otel-service.md │ │ ├── overview.md │ │ ├── send-logs/ │ │ │ ├── _category_.yaml │ │ │ ├── send-docker-logs.md │ │ │ ├── using-fluentbit.md │ │ │ ├── using-otel-collector-with-helm.md │ │ │ ├── using-otel-collector.md │ │ │ └── using-vector.md │ │ └── supported-agents.md │ ├── operating/ │ │ ├── _category_.yaml │ │ ├── aws-costs.md │ │ ├── data-directory.md │ │ ├── monitoring.md │ │ └── upgrades.md │ ├── overview/ │ │ ├── _category_.yaml │ │ ├── architecture.md │ │ ├── concepts/ │ │ │ ├── _category_.yaml │ │ │ ├── deletes.md │ │ │ ├── indexing.md │ │ │ └── querying.md │ │ ├── index.md │ │ └── introduction.md │ ├── reference/ │ │ ├── _category_.yaml │ │ ├── aggregation.md │ │ ├── cli.md │ │ ├── es_compatible_api.md │ │ ├── metrics.md │ │ ├── query-language.md │ │ ├── rest-api.md │ │ └── updating-mapper.md │ └── telemetry.md ├── install.sh ├── monitoring/ │ ├── grafana/ │ │ ├── README.md │ │ ├── dashboards/ │ │ │ ├── indexers.json │ │ │ ├── ingesters.json │ │ │ ├── metastore.json │ │ │ └── searchers.json │ │ └── provisioning/ │ │ ├── dashboards/ │ │ │ └── default.yaml │ │ └── datasources/ │ │ └── default.yaml │ 
├── otel-collector-config.yaml │ └── prometheus.yaml └── quickwit/ ├── .cargo/ │ └── config.toml ├── .cargo-dev/ │ └── config.toml ├── .config/ │ └── nextest.toml ├── .license_header.txt ├── CLAUDE.md ├── Cargo.toml ├── Cross.toml ├── Makefile ├── NOTICE ├── clippy.toml ├── deny.toml ├── dependency-licenses.html ├── license-tool.toml ├── quickwit-actors/ │ ├── Cargo.toml │ ├── LICENSE │ ├── README.md │ ├── benches/ │ │ └── bench.rs │ ├── examples/ │ │ └── ping_actor.rs │ └── src/ │ ├── actor.rs │ ├── actor_context.rs │ ├── actor_handle.rs │ ├── actor_state.rs │ ├── channel_with_priority.rs │ ├── command.rs │ ├── envelope.rs │ ├── lib.rs │ ├── mailbox.rs │ ├── observation.rs │ ├── registry.rs │ ├── scheduler.rs │ ├── spawn_builder.rs │ ├── supervisor.rs │ ├── tests.rs │ └── universe.rs ├── quickwit-aws/ │ ├── Cargo.toml │ └── src/ │ ├── error.rs │ ├── lib.rs │ └── retry.rs ├── quickwit-cli/ │ ├── Cargo.toml │ ├── src/ │ │ ├── checklist.rs │ │ ├── cli.rs │ │ ├── cli_doc_ext.toml │ │ ├── generate_markdown.rs │ │ ├── index.rs │ │ ├── jemalloc.rs │ │ ├── lib.rs │ │ ├── logger.rs │ │ ├── main.rs │ │ ├── metrics.rs │ │ ├── service.rs │ │ ├── source.rs │ │ ├── split.rs │ │ ├── stats.rs │ │ └── tool.rs │ └── tests/ │ ├── Pipfile │ ├── cli.rs │ ├── helpers.rs │ └── prepare_tests.sh ├── quickwit-cluster/ │ ├── Cargo.toml │ └── src/ │ ├── change.rs │ ├── cluster.rs │ ├── grpc_gossip.rs │ ├── grpc_service.rs │ ├── lib.rs │ ├── member.rs │ ├── metrics.rs │ └── node.rs ├── quickwit-codegen/ │ ├── Cargo.toml │ ├── README.md │ ├── example/ │ │ ├── Cargo.toml │ │ ├── build.rs │ │ └── src/ │ │ ├── codegen/ │ │ │ └── hello.rs │ │ ├── error.rs │ │ ├── hello.proto │ │ └── lib.rs │ └── src/ │ ├── codegen.rs │ └── lib.rs ├── quickwit-common/ │ ├── Cargo.toml │ ├── build.rs │ └── src/ │ ├── alloc_tracker.rs │ ├── binary_heap.rs │ ├── coolid.rs │ ├── cpus.rs │ ├── fs.rs │ ├── io.rs │ ├── jemalloc_profiled.rs │ ├── kill_switch.rs │ ├── lib.rs │ ├── metrics.rs │ ├── net.rs │ ├── 
path_hasher.rs │ ├── pretty.rs │ ├── progress.rs │ ├── pubsub.rs │ ├── rand.rs │ ├── rate_limited_tracing.rs │ ├── rate_limiter.rs │ ├── rendezvous_hasher.rs │ ├── retry.rs │ ├── ring_buffer.rs │ ├── runtimes.rs │ ├── shared_consts.rs │ ├── socket_addr_legacy_hash.rs │ ├── sorted_iter.rs │ ├── stream_utils.rs │ ├── temp_dir.rs │ ├── test_utils.rs │ ├── thread_pool.rs │ ├── tower/ │ │ ├── box_layer.rs │ │ ├── box_service.rs │ │ ├── buffer.rs │ │ ├── change.rs │ │ ├── circuit_breaker.rs │ │ ├── delay.rs │ │ ├── estimate_rate.rs │ │ ├── event_listener.rs │ │ ├── load_shed.rs │ │ ├── metrics.rs │ │ ├── mod.rs │ │ ├── one_task_per_call_layer.rs │ │ ├── pool.rs │ │ ├── rate.rs │ │ ├── rate_estimator.rs │ │ ├── rate_limit.rs │ │ ├── retry.rs │ │ ├── timeout.rs │ │ └── transport.rs │ ├── type_map.rs │ └── uri.rs ├── quickwit-config/ │ ├── Cargo.toml │ ├── resources/ │ │ └── tests/ │ │ ├── index_config/ │ │ │ ├── hdfs-logs-create-config.yaml │ │ │ ├── hdfs-logs.json │ │ │ ├── hdfs-logs.toml │ │ │ ├── hdfs-logs.yaml │ │ │ ├── minimal-hdfs-logs.yaml │ │ │ └── partial-hdfs-logs.yaml │ │ ├── node_config/ │ │ │ ├── quickwit.json │ │ │ ├── quickwit.toml │ │ │ ├── quickwit.wrongkey.yaml │ │ │ └── quickwit.yaml │ │ └── source_config/ │ │ ├── ingest-api-source.json │ │ ├── kafka-source.json │ │ └── kinesis-source.yaml │ └── src/ │ ├── cluster_config/ │ │ └── mod.rs │ ├── config_value.rs │ ├── index_config/ │ │ ├── mod.rs │ │ └── serialize.rs │ ├── index_template/ │ │ ├── mod.rs │ │ └── serialize.rs │ ├── lib.rs │ ├── merge_policy_config.rs │ ├── metastore_config.rs │ ├── node_config/ │ │ ├── mod.rs │ │ └── serialize.rs │ ├── qw_env_vars.rs │ ├── serde_utils.rs │ ├── service.rs │ ├── source_config/ │ │ ├── mod.rs │ │ └── serialize.rs │ ├── storage_config.rs │ └── templating.rs ├── quickwit-control-plane/ │ ├── Cargo.toml │ ├── README.md │ └── src/ │ ├── control_plane.rs │ ├── cooldown_map.rs │ ├── debouncer.rs │ ├── indexing_plan.rs │ ├── indexing_scheduler/ │ │ ├── change_tracker.rs 
│ │ ├── mod.rs │ │ └── scheduling/ │ │ ├── README.md │ │ ├── mod.rs │ │ ├── scheduling_logic.rs │ │ └── scheduling_logic_model.rs │ ├── ingest/ │ │ ├── ingest_controller.rs │ │ ├── mod.rs │ │ ├── scaling_arbiter.rs │ │ └── wait_handle.rs │ ├── lib.rs │ ├── metrics.rs │ ├── model/ │ │ ├── mod.rs │ │ └── shard_table.rs │ └── tests.rs ├── quickwit-datetime/ │ ├── Cargo.toml │ ├── README.md │ └── src/ │ ├── date_time_format.rs │ ├── date_time_parsing.rs │ ├── java_date_time_format.rs │ └── lib.rs ├── quickwit-directories/ │ ├── Cargo.toml │ └── src/ │ ├── bundle_directory.rs │ ├── caching_directory.rs │ ├── debug_proxy_directory.rs │ ├── hot_directory.rs │ ├── lib.rs │ ├── storage_directory.rs │ └── union_directory.rs ├── quickwit-doc-mapper/ │ ├── Cargo.toml │ ├── benches/ │ │ ├── data/ │ │ │ ├── simple-parse-bench.json │ │ │ └── simple-routing-expression-bench.json │ │ ├── doc_to_json_bench.rs │ │ └── routing_expression_bench.rs │ └── src/ │ ├── doc_mapper/ │ │ ├── date_time_type.rs │ │ ├── doc_mapper_builder.rs │ │ ├── doc_mapper_impl.rs │ │ ├── field_mapping_entry.rs │ │ ├── field_mapping_type.rs │ │ ├── field_presence.rs │ │ ├── mapping_tree.rs │ │ ├── mod.rs │ │ ├── tantivy_val_to_json.rs │ │ └── tokenizer_entry.rs │ ├── doc_mapping.rs │ ├── error.rs │ ├── lib.rs │ ├── query_builder.rs │ ├── routing_expression/ │ │ └── mod.rs │ └── tag_pruning.rs ├── quickwit-index-management/ │ ├── Cargo.toml │ └── src/ │ ├── garbage_collection.rs │ ├── index.rs │ └── lib.rs ├── quickwit-indexing/ │ ├── Cargo.toml │ ├── README.md │ ├── benches/ │ │ ├── data/ │ │ │ ├── bench_data.json │ │ │ ├── bench_data_heavy_transform.json │ │ │ └── bench_data_light_transform.json │ │ └── doc_process_vrl_bench.rs │ ├── data/ │ │ └── test_corpus.json │ ├── failpoints/ │ │ └── mod.rs │ └── src/ │ ├── actors/ │ │ ├── cooperative_indexing.rs │ │ ├── doc_processor.rs │ │ ├── index_serializer.rs │ │ ├── indexer.rs │ │ ├── indexing_pipeline.rs │ │ ├── indexing_service.rs │ │ ├── merge_executor.rs │ │ 
├── merge_pipeline.rs │ │ ├── merge_planner.rs │ │ ├── merge_scheduler_service.rs │ │ ├── merge_split_downloader.rs │ │ ├── mod.rs │ │ ├── packager.rs │ │ ├── publisher.rs │ │ ├── sequencer.rs │ │ ├── uploader.rs │ │ └── vrl_processing.rs │ ├── controlled_directory.rs │ ├── lib.rs │ ├── merge_policy/ │ │ ├── const_write_amplification.rs │ │ ├── mod.rs │ │ ├── nop_merge_policy.rs │ │ └── stable_log_merge_policy.rs │ ├── metrics.rs │ ├── models/ │ │ ├── indexed_split.rs │ │ ├── indexing_service_message.rs │ │ ├── indexing_statistics.rs │ │ ├── merge_planner_message.rs │ │ ├── merge_scratch.rs │ │ ├── merge_statistics.rs │ │ ├── mod.rs │ │ ├── packaged_split.rs │ │ ├── processed_doc.rs │ │ ├── publish_lock.rs │ │ ├── publisher_message.rs │ │ ├── raw_doc_batch.rs │ │ ├── shard_positions.rs │ │ └── split_attrs.rs │ ├── source/ │ │ ├── doc_file_reader.rs │ │ ├── file_source.rs │ │ ├── gcp_pubsub_source.rs │ │ ├── ingest/ │ │ │ └── mod.rs │ │ ├── ingest_api_source.rs │ │ ├── kafka_source.rs │ │ ├── kinesis/ │ │ │ ├── api.rs │ │ │ ├── helpers.rs │ │ │ ├── kinesis_source.rs │ │ │ ├── mod.rs │ │ │ └── shard_consumer.rs │ │ ├── mod.rs │ │ ├── pulsar_source.rs │ │ ├── queue_sources/ │ │ │ ├── coordinator.rs │ │ │ ├── design.md │ │ │ ├── helpers.rs │ │ │ ├── local_state.rs │ │ │ ├── memory_queue.rs │ │ │ ├── message.rs │ │ │ ├── mod.rs │ │ │ ├── shared_state.rs │ │ │ ├── sqs_queue.rs │ │ │ └── visibility.rs │ │ ├── source_factory.rs │ │ ├── stdin_source.rs │ │ ├── vec_source.rs │ │ └── void_source.rs │ ├── split_store/ │ │ ├── indexing_split_cache.rs │ │ ├── indexing_split_store.rs │ │ ├── mod.rs │ │ └── split_store_quota.rs │ └── test_utils.rs ├── quickwit-ingest/ │ ├── Cargo.toml │ ├── build.rs │ └── src/ │ ├── codegen/ │ │ └── ingest_service.rs │ ├── doc_batch.rs │ ├── error.rs │ ├── ingest_api_service.rs │ ├── ingest_service.proto │ ├── ingest_v2/ │ │ ├── broadcast/ │ │ │ ├── capacity_score.rs │ │ │ ├── local_shards.rs │ │ │ └── mod.rs │ │ ├── debouncing.rs │ │ ├── 
doc_mapper.rs │ │ ├── fetch.rs │ │ ├── helpers.rs │ │ ├── idle.rs │ │ ├── ingest.md │ │ ├── ingester.rs │ │ ├── metrics.rs │ │ ├── mod.rs │ │ ├── models.rs │ │ ├── mrecord.rs │ │ ├── mrecordlog_utils.rs │ │ ├── publish_tracker.rs │ │ ├── rate_meter.rs │ │ ├── replication.md │ │ ├── replication.rs │ │ ├── router.rs │ │ ├── routing_table.rs │ │ ├── state.rs │ │ ├── wal_capacity_tracker.rs │ │ └── workbench.rs │ ├── lib.rs │ ├── memory_capacity.rs │ ├── metrics.rs │ ├── mrecordlog_async.rs │ ├── notifications.rs │ ├── position.rs │ └── queue.rs ├── quickwit-integration-tests/ │ ├── Cargo.toml │ ├── src/ │ │ ├── lib.rs │ │ ├── test_utils/ │ │ │ ├── cluster_sandbox.rs │ │ │ ├── mod.rs │ │ │ └── shutdown.rs │ │ └── tests/ │ │ ├── basic_tests.rs │ │ ├── ingest_v1_tests.rs │ │ ├── ingest_v2_tests.rs │ │ ├── mod.rs │ │ ├── no_cp_tests.rs │ │ ├── otlp_tests.rs │ │ ├── sqs_tests.rs │ │ ├── tls_tests.rs │ │ └── update_tests/ │ │ ├── create_on_update.rs │ │ ├── doc_mapping_tests.rs │ │ ├── mod.rs │ │ ├── restart_indexer_tests.rs │ │ └── search_settings_tests.rs │ └── test_data/ │ ├── README.md │ ├── ca.crt │ ├── ca.key │ ├── ca.srl │ ├── regenerate-certs.sh │ ├── server.crt │ ├── server.csr │ ├── server.key │ └── server.v3.ext ├── quickwit-jaeger/ │ ├── Cargo.toml │ └── src/ │ ├── lib.rs │ ├── metrics.rs │ ├── v1.rs │ └── v2.rs ├── quickwit-janitor/ │ ├── Cargo.toml │ └── src/ │ ├── actors/ │ │ ├── delete_task_pipeline.rs │ │ ├── delete_task_planner.rs │ │ ├── delete_task_service.rs │ │ ├── garbage_collector.rs │ │ ├── mod.rs │ │ └── retention_policy_executor.rs │ ├── error.rs │ ├── janitor_service.rs │ ├── lib.rs │ ├── metrics.rs │ └── retention_policy_execution.rs ├── quickwit-lambda-client/ │ ├── Cargo.toml │ ├── README.md │ ├── build.rs │ └── src/ │ ├── deploy.rs │ ├── invoker.rs │ ├── lib.rs │ └── metrics.rs ├── quickwit-lambda-server/ │ ├── Cargo.toml │ └── src/ │ ├── bin/ │ │ └── leaf_search.rs │ ├── context.rs │ ├── error.rs │ ├── handler.rs │ └── lib.rs ├── 
quickwit-macros/ │ ├── Cargo.toml │ └── src/ │ └── lib.rs ├── quickwit-metastore/ │ ├── Cargo.toml │ ├── README.md │ ├── build.rs │ ├── migrations/ │ │ └── postgresql/ │ │ ├── 10_add-split-incarnation-id.down.sql │ │ ├── 10_add-split-incarnation-id.up.sql │ │ ├── 11_add-split-maturity-timestamp-field.down.sql │ │ ├── 11_add-split-maturity-timestamp-field.up.sql │ │ ├── 12_create-shards.down.sql │ │ ├── 12_create-shards.up.sql │ │ ├── 13_migrate-otel-indexes-v0_6.down.sql │ │ ├── 13_migrate-otel-indexes-v0_6.up.sql │ │ ├── 14_update-shard-id.down.sql │ │ ├── 14_update-shard-id.up.sql │ │ ├── 15_create-templates.down.sql │ │ ├── 15_create-templates.up.sql │ │ ├── 16_create-index-split-uid.down.sql │ │ ├── 16_create-index-split-uid.up.sql │ │ ├── 17_create-index-split-timestamp.down.sql │ │ ├── 17_create-index-split-timestamp.up.sql │ │ ├── 18_create-index-shard-index-uid.down.sql │ │ ├── 18_create-index-shard-index-uid.up.sql │ │ ├── 19_add-split-node-id-field.down.sql │ │ ├── 19_add-split-node-id-field.up.sql │ │ ├── 1_create-indexes.down.sql │ │ ├── 1_create-indexes.up.sql │ │ ├── 20_add-shard-doc-mapping-uid-field.down.sql │ │ ├── 20_add-shard-doc-mapping-uid-field.up.sql │ │ ├── 21_add-shard-update-timestamp-field.down.sql │ │ ├── 21_add-shard-update-timestamp-field.up.sql │ │ ├── 22_change-splits-pkey.down.sql │ │ ├── 22_change-splits-pkey.up.sql │ │ ├── 23_change-indexes-unique-index.down.sql │ │ ├── 23_change-indexes-unique-index.up.sql │ │ ├── 24_add-arbitrary-kv.down.sql │ │ ├── 24_add-arbitrary-kv.up.sql │ │ ├── 25_add-split-size.down.sql │ │ ├── 25_add-split-size.up.sql │ │ ├── 2_create-splits.down.sql │ │ ├── 2_create-splits.up.sql │ │ ├── 3_add-split-publish-timestamp-field.down.sql │ │ ├── 3_add-split-publish-timestamp-field.up.sql │ │ ├── 4_create-delete_tasks.down.sql │ │ ├── 4_create-delete_tasks.up.sql │ │ ├── 5_add-delete-opstamp-splits.down.sql │ │ ├── 5_add-delete-opstamp-splits.up.sql │ │ ├── 
6_delete-update-index-update-timestamp-on-split-update-trigger.up.sql │ │ ├── 7_delete-split-table-triggers.up.sql │ │ ├── 8_delete-update-timestamp-on-indexes-table.up.sql │ │ ├── 9_add-split-incarnation-id.down.sql │ │ └── 9_add-split-incarnation-id.up.sql │ ├── src/ │ │ ├── backward_compatibility_tests/ │ │ │ ├── README.md │ │ │ └── mod.rs │ │ ├── checkpoint.rs │ │ ├── error.rs │ │ ├── lib.rs │ │ ├── metastore/ │ │ │ ├── control_plane_metastore.rs │ │ │ ├── file_backed/ │ │ │ │ ├── file_backed_index/ │ │ │ │ │ ├── mod.rs │ │ │ │ │ ├── serialize.rs │ │ │ │ │ └── shards.rs │ │ │ │ ├── file_backed_metastore_factory.rs │ │ │ │ ├── index_id_matcher.rs │ │ │ │ ├── index_template_matcher.rs │ │ │ │ ├── lazy_file_backed_index.rs │ │ │ │ ├── manifest.rs │ │ │ │ ├── mod.rs │ │ │ │ ├── state.rs │ │ │ │ └── store_operations.rs │ │ │ ├── index_metadata/ │ │ │ │ ├── mod.rs │ │ │ │ └── serialize.rs │ │ │ ├── mod.rs │ │ │ └── postgres/ │ │ │ ├── error.rs │ │ │ ├── factory.rs │ │ │ ├── metastore.rs │ │ │ ├── metrics.rs │ │ │ ├── migrator.rs │ │ │ ├── mod.rs │ │ │ ├── model.rs │ │ │ ├── pool.rs │ │ │ ├── queries/ │ │ │ │ ├── index_templates/ │ │ │ │ │ ├── find.sql │ │ │ │ │ ├── insert.sql │ │ │ │ │ └── upsert.sql │ │ │ │ ├── indexes_metadata.sql │ │ │ │ └── shards/ │ │ │ │ ├── acquire.sql │ │ │ │ ├── delete.sql │ │ │ │ ├── fetch.sql │ │ │ │ ├── find_not_deletable.sql │ │ │ │ ├── insert.sql │ │ │ │ ├── open.sql │ │ │ │ ├── prune_age.sql │ │ │ │ └── prune_count.sql │ │ │ ├── split_stream.rs │ │ │ ├── tags.rs │ │ │ └── utils.rs │ │ ├── metastore_factory.rs │ │ ├── metastore_resolver.rs │ │ ├── split_metadata.rs │ │ ├── split_metadata_version.rs │ │ └── tests/ │ │ ├── delete_task.rs │ │ ├── get_identity.rs │ │ ├── index.rs │ │ ├── list_splits.rs │ │ ├── mod.rs │ │ ├── shard.rs │ │ ├── source.rs │ │ ├── split.rs │ │ └── template.rs │ └── test-data/ │ ├── .gitignore │ ├── file-backed-index/ │ │ ├── v0.7.expected.json │ │ ├── v0.7.json │ │ ├── v0.8.expected.json │ │ ├── v0.8.json │ │ 
├── v0.9.expected.json │ │ └── v0.9.json │ ├── index-metadata/ │ │ ├── v0.7.expected.json │ │ ├── v0.7.json │ │ ├── v0.8.expected.json │ │ ├── v0.8.json │ │ ├── v0.9.expected.json │ │ └── v0.9.json │ ├── manifest/ │ │ ├── v0.7.expected.json │ │ ├── v0.7.json │ │ ├── v0.8.expected.json │ │ ├── v0.8.json │ │ ├── v0.9.expected.json │ │ └── v0.9.json │ └── split-metadata/ │ ├── v0.7.expected.json │ ├── v0.7.json │ ├── v0.8.expected.json │ ├── v0.8.json │ ├── v0.9.expected.json │ └── v0.9.json ├── quickwit-metastore-utils/ │ ├── Cargo.toml │ └── src/ │ ├── bin/ │ │ ├── README.md │ │ ├── proxy.rs │ │ └── replay.rs │ ├── grpc_request.rs │ └── lib.rs ├── quickwit-opentelemetry/ │ ├── Cargo.toml │ └── src/ │ ├── lib.rs │ └── otlp/ │ ├── logs.rs │ ├── metrics.rs │ ├── mod.rs │ ├── test_utils.rs │ └── traces.rs ├── quickwit-proto/ │ ├── .gitignore │ ├── Cargo.toml │ ├── build.rs │ ├── protos/ │ │ ├── quickwit/ │ │ │ ├── cluster.proto │ │ │ ├── common.proto │ │ │ ├── control_plane.proto │ │ │ ├── developer.proto │ │ │ ├── indexing.proto │ │ │ ├── ingest.proto │ │ │ ├── ingester.proto │ │ │ ├── metastore.proto │ │ │ ├── router.proto │ │ │ └── search.proto │ │ └── third-party/ │ │ ├── gogoproto/ │ │ │ └── gogo.proto │ │ ├── google/ │ │ │ └── protobuf/ │ │ │ ├── any.proto │ │ │ ├── api.proto │ │ │ ├── descriptor.proto │ │ │ ├── duration.proto │ │ │ ├── empty.proto │ │ │ ├── field_mask.proto │ │ │ ├── source_context.proto │ │ │ ├── struct.proto │ │ │ ├── timestamp.proto │ │ │ ├── type.proto │ │ │ └── wrappers.proto │ │ ├── jaeger/ │ │ │ ├── model.proto │ │ │ ├── storage/ │ │ │ │ └── v2/ │ │ │ │ └── trace_storage.proto │ │ │ └── storage.proto │ │ └── opentelemetry/ │ │ └── proto/ │ │ ├── collector/ │ │ │ ├── README.md │ │ │ ├── logs/ │ │ │ │ └── v1/ │ │ │ │ ├── logs_service.proto │ │ │ │ └── logs_service_http.yaml │ │ │ ├── metrics/ │ │ │ │ └── v1/ │ │ │ │ ├── metrics_service.proto │ │ │ │ └── metrics_service_http.yaml │ │ │ └── trace/ │ │ │ └── v1/ │ │ │ ├── trace_service.proto │ 
│ │ └── trace_service_http.yaml │ │ ├── common/ │ │ │ └── v1/ │ │ │ └── common.proto │ │ ├── logs/ │ │ │ └── v1/ │ │ │ └── logs.proto │ │ ├── metrics/ │ │ │ └── v1/ │ │ │ └── metrics.proto │ │ ├── resource/ │ │ │ └── v1/ │ │ │ └── resource.proto │ │ └── trace/ │ │ └── v1/ │ │ └── trace.proto │ └── src/ │ ├── cluster/ │ │ └── mod.rs │ ├── codegen/ │ │ ├── jaeger/ │ │ │ ├── jaeger.api_v2.rs │ │ │ ├── jaeger.storage.v1.rs │ │ │ ├── jaeger.storage.v2.rs │ │ │ ├── opentelemetry.proto.common.v1.rs │ │ │ ├── opentelemetry.proto.resource.v1.rs │ │ │ └── opentelemetry.proto.trace.v1.rs │ │ ├── opentelemetry/ │ │ │ ├── opentelemetry.proto.collector.logs.v1.rs │ │ │ ├── opentelemetry.proto.collector.metrics.v1.rs │ │ │ ├── opentelemetry.proto.collector.trace.v1.rs │ │ │ ├── opentelemetry.proto.common.v1.rs │ │ │ ├── opentelemetry.proto.logs.v1.rs │ │ │ ├── opentelemetry.proto.metrics.v1.rs │ │ │ ├── opentelemetry.proto.resource.v1.rs │ │ │ └── opentelemetry.proto.trace.v1.rs │ │ └── quickwit/ │ │ ├── quickwit.cluster.rs │ │ ├── quickwit.common.rs │ │ ├── quickwit.control_plane.rs │ │ ├── quickwit.developer.rs │ │ ├── quickwit.indexing.rs │ │ ├── quickwit.ingest.ingester.rs │ │ ├── quickwit.ingest.router.rs │ │ ├── quickwit.ingest.rs │ │ ├── quickwit.metastore.rs │ │ └── quickwit.search.rs │ ├── control_plane/ │ │ └── mod.rs │ ├── developer/ │ │ └── mod.rs │ ├── error.rs │ ├── getters.rs │ ├── indexing/ │ │ └── mod.rs │ ├── ingest/ │ │ ├── ingester.rs │ │ ├── mod.rs │ │ └── router.rs │ ├── lib.rs │ ├── metastore/ │ │ ├── events.rs │ │ └── mod.rs │ ├── search/ │ │ ├── mod.rs │ │ ├── span_id.rs │ │ └── trace_id.rs │ └── types/ │ ├── doc_mapping_uid.rs │ ├── doc_uid.rs │ ├── index_uid.rs │ ├── mod.rs │ ├── pipeline_uid.rs │ ├── position.rs │ └── shard_id.rs ├── quickwit-query/ │ ├── Cargo.toml │ ├── README.md │ ├── benches/ │ │ └── tokenizers_bench.rs │ └── src/ │ ├── aggregations.rs │ ├── elastic_query_dsl/ │ │ ├── bool_query.rs │ │ ├── exists_query.rs │ │ ├── 
match_bool_prefix.rs │ │ ├── match_phrase_query.rs │ │ ├── match_query.rs │ │ ├── mod.rs │ │ ├── multi_match.rs │ │ ├── one_field_map.rs │ │ ├── phrase_prefix_query.rs │ │ ├── prefix_query.rs │ │ ├── query_string_query.rs │ │ ├── range_query.rs │ │ ├── regex_query.rs │ │ ├── string_or_struct.rs │ │ ├── term_query.rs │ │ ├── terms_query.rs │ │ ├── visitor.rs │ │ └── wildcard_query.rs │ ├── error.rs │ ├── json_literal.rs │ ├── lib.rs │ ├── not_nan_f32.rs │ ├── query_ast/ │ │ ├── bool_query.rs │ │ ├── cache_node.rs │ │ ├── field_presence.rs │ │ ├── full_text_query.rs │ │ ├── mod.rs │ │ ├── phrase_prefix_query.rs │ │ ├── range_query.rs │ │ ├── regex_query.rs │ │ ├── tantivy_query_ast.rs │ │ ├── term_query.rs │ │ ├── term_set_query.rs │ │ ├── user_input_query.rs │ │ ├── utils.rs │ │ ├── visitor.rs │ │ └── wildcard_query.rs │ └── tokenizers/ │ ├── chinese_compatible.rs │ ├── code_tokenizer.rs │ ├── mod.rs │ └── tokenizer_manager.rs ├── quickwit-rest-client/ │ ├── Cargo.toml │ ├── README.md │ ├── resources/ │ │ └── tests/ │ │ └── documents_to_ingest.json │ └── src/ │ ├── error.rs │ ├── lib.rs │ ├── models.rs │ └── rest_client.rs ├── quickwit-search/ │ ├── Cargo.toml │ ├── README.md │ └── src/ │ ├── client.rs │ ├── cluster_client.rs │ ├── collector.rs │ ├── error.rs │ ├── fetch_docs.rs │ ├── find_trace_ids_collector.rs │ ├── invoker.rs │ ├── leaf.rs │ ├── leaf_cache.rs │ ├── lib.rs │ ├── list_fields.rs │ ├── list_fields_cache.rs │ ├── list_terms.rs │ ├── metrics.rs │ ├── metrics_trackers.rs │ ├── retry/ │ │ ├── mod.rs │ │ └── search.rs │ ├── root.rs │ ├── scroll_context.rs │ ├── search_job_placer.rs │ ├── search_permit_provider.rs │ ├── search_response_rest.rs │ ├── service.rs │ ├── tests.rs │ └── top_k_collector.rs ├── quickwit-serve/ │ ├── Cargo.toml │ ├── README.md │ ├── build.rs │ ├── resources/ │ │ └── tests/ │ │ └── jaeger_ui_trace.json │ └── src/ │ ├── build_info.rs │ ├── cluster_api/ │ │ ├── mod.rs │ │ └── rest_handler.rs │ ├── decompression.rs │ ├── 
delete_task_api/ │ │ ├── handler.rs │ │ └── mod.rs │ ├── developer_api/ │ │ ├── debug.rs │ │ ├── heap_prof.rs │ │ ├── heap_prof_disabled.rs │ │ ├── log_level.rs │ │ ├── mod.rs │ │ ├── pprof.rs │ │ ├── pprof_disabled.rs │ │ └── server.rs │ ├── elasticsearch_api/ │ │ ├── bulk.rs │ │ ├── bulk_v2.rs │ │ ├── filter.rs │ │ ├── mod.rs │ │ ├── model/ │ │ │ ├── bulk_body.rs │ │ │ ├── bulk_query_params.rs │ │ │ ├── cat_indices.rs │ │ │ ├── error.rs │ │ │ ├── field_capability.rs │ │ │ ├── mappings.rs │ │ │ ├── mod.rs │ │ │ ├── multi_search.rs │ │ │ ├── scroll.rs │ │ │ ├── search_body.rs │ │ │ ├── search_query_params.rs │ │ │ ├── search_response.rs │ │ │ └── stats.rs │ │ └── rest_handler.rs │ ├── format.rs │ ├── grpc.rs │ ├── health_check_api/ │ │ ├── handler.rs │ │ └── mod.rs │ ├── index_api/ │ │ ├── index_resource.rs │ │ ├── mod.rs │ │ ├── rest_handler.rs │ │ ├── source_resource.rs │ │ └── split_resource.rs │ ├── indexing_api/ │ │ ├── mod.rs │ │ └── rest_handler.rs │ ├── ingest_api/ │ │ ├── mod.rs │ │ ├── response.rs │ │ └── rest_handler.rs │ ├── jaeger_api/ │ │ ├── mod.rs │ │ ├── model.rs │ │ ├── parse_duration.rs │ │ └── rest_handler.rs │ ├── lib.rs │ ├── load_shield.rs │ ├── metrics.rs │ ├── metrics_api.rs │ ├── node_info_handler.rs │ ├── openapi.rs │ ├── otlp_api/ │ │ ├── mod.rs │ │ └── rest_handler.rs │ ├── rate_modulator.rs │ ├── rest.rs │ ├── rest_api_response.rs │ ├── search_api/ │ │ ├── grpc_adapter.rs │ │ ├── mod.rs │ │ └── rest_handler.rs │ ├── simple_list.rs │ ├── tcp_listener.rs │ ├── template_api/ │ │ ├── mod.rs │ │ └── rest_handler.rs │ └── ui_handler.rs ├── quickwit-storage/ │ ├── Cargo.toml │ ├── src/ │ │ ├── bundle_storage.rs │ │ ├── cache/ │ │ │ ├── base_cache.rs │ │ │ ├── byte_range_cache.rs │ │ │ ├── memory_sized_cache.rs │ │ │ ├── mod.rs │ │ │ ├── quickwit_cache.rs │ │ │ ├── slice_address.rs │ │ │ ├── storage_with_cache.rs │ │ │ └── stored_item.rs │ │ ├── debouncer.rs │ │ ├── error.rs │ │ ├── file_descriptor_cache.rs │ │ ├── lib.rs │ │ ├── 
local_file_storage.rs │ │ ├── metrics.rs │ │ ├── object_storage/ │ │ │ ├── azure_blob_storage.rs │ │ │ ├── error.rs │ │ │ ├── mod.rs │ │ │ ├── policy.rs │ │ │ ├── s3_compatible_storage.rs │ │ │ └── s3_compatible_storage_resolver.rs │ │ ├── opendal_storage/ │ │ │ ├── base.rs │ │ │ ├── google_cloud_storage.rs │ │ │ └── mod.rs │ │ ├── payload.rs │ │ ├── prefix_storage.rs │ │ ├── ram_storage.rs │ │ ├── split.rs │ │ ├── split_cache/ │ │ │ ├── download_task.rs │ │ │ ├── mod.rs │ │ │ ├── split_table.rs │ │ │ └── tests.rs │ │ ├── storage.rs │ │ ├── storage_factory.rs │ │ ├── storage_resolver.rs │ │ ├── timeout_and_retry_storage.rs │ │ └── versioned_component.rs │ └── tests/ │ ├── azure_storage.rs │ ├── google_cloud_storage.rs │ └── s3_storage.rs ├── quickwit-telemetry/ │ ├── Cargo.toml │ └── src/ │ ├── lib.rs │ ├── payload.rs │ ├── sender.rs │ └── sink.rs ├── quickwit-ui/ │ ├── .gitignore │ ├── .gitignore_for_build_directory │ ├── Makefile │ ├── README.md │ ├── biome.json │ ├── build/ │ │ └── .gitignore │ ├── e2e/ │ │ └── homepage.spec.ts │ ├── index.html │ ├── jest/ │ │ └── setup.js │ ├── jest.config.js │ ├── mocks/ │ │ ├── monacoMock.js │ │ ├── swaggerUIMock.js │ │ └── x-charts.js │ ├── package.json │ ├── playwright.config.ts │ ├── public/ │ │ ├── manifest.json │ │ └── robots.txt │ ├── src/ │ │ ├── components/ │ │ │ ├── ApiUrlFooter.tsx │ │ │ ├── IndexSideBar.tsx │ │ │ ├── IndexSummary.tsx │ │ │ ├── IndexesTable.tsx │ │ │ ├── JsonEditor.tsx │ │ │ ├── LayoutUtils.tsx │ │ │ ├── Loader.tsx │ │ │ ├── QueryActionBar.tsx │ │ │ ├── QueryEditor/ │ │ │ │ ├── AggregationEditor.tsx │ │ │ │ ├── QueryEditor.tsx │ │ │ │ └── config.ts │ │ │ ├── ResponseErrorDisplay.tsx │ │ │ ├── SearchResult/ │ │ │ │ ├── AggregationResult.tsx │ │ │ │ ├── ResultTable.tsx │ │ │ │ ├── Row.tsx │ │ │ │ └── SearchResult.tsx │ │ │ ├── SideBar.tsx │ │ │ ├── TimeRangeSelect.tsx │ │ │ └── TopBar.tsx │ │ ├── index.css │ │ ├── index.test.js │ │ ├── index.tsx │ │ ├── providers/ │ │ │ ├── EditorProvider.tsx │ │ │ 
└── LocalStorageProvider.tsx │ │ ├── services/ │ │ │ ├── client.test.ts │ │ │ └── client.ts │ │ ├── utils/ │ │ │ ├── SearchComponentProps.ts │ │ │ ├── models.ts │ │ │ ├── theme.ts │ │ │ └── urls.ts │ │ └── views/ │ │ ├── ApiView.tsx │ │ ├── App.tsx │ │ ├── ClusterView.test.jsx │ │ ├── ClusterView.tsx │ │ ├── IndexView.test.jsx │ │ ├── IndexView.tsx │ │ ├── IndexesView.test.jsx │ │ ├── IndexesView.tsx │ │ ├── NodeInfoView.test.jsx │ │ ├── NodeInfoView.tsx │ │ ├── SearchView.test.jsx │ │ └── SearchView.tsx │ ├── tsconfig.json │ └── vite.config.ts ├── rest-api-tests/ │ ├── Pipfile │ ├── README.md │ ├── docker-compose.yaml │ ├── run_tests.py │ └── scenarii/ │ ├── aggregations/ │ │ ├── 0001-aggregations.yaml │ │ ├── 0002-doc-len.yaml │ │ ├── _ctx.yaml │ │ ├── _setup.quickwit.yaml │ │ └── _teardown.quickwit.yaml │ ├── concat_fields/ │ │ ├── 0001_concat_field.yaml │ │ ├── _ctx.yaml │ │ ├── _setup.quickwit.yaml │ │ └── _teardown.quickwit.yaml │ ├── default_search_fields/ │ │ ├── 0001_default_fields.yaml │ │ ├── 0002_invalid_default_fields.yaml │ │ ├── _ctx.yaml │ │ ├── _setup.quickwit.yaml │ │ └── _teardown.quickwit.yaml │ ├── es_compatibility/ │ │ ├── 0001-noquery.yaml │ │ ├── 0002-query_string.yaml │ │ ├── 0003-match.yaml │ │ ├── 0004-term_aggregations.yaml │ │ ├── 0005-query_string_query.yaml │ │ ├── 0006-term_query.yaml │ │ ├── 0007-range_queries.yaml │ │ ├── 0008-sort_by.yaml │ │ ├── 0009-bool_query.yaml │ │ ├── 0010-match_phrase_prefix_query.yaml │ │ ├── 0011-exists-query.yaml │ │ ├── 0012-scroll-api.yaml │ │ ├── 0013-phrase-query.yaml │ │ ├── 0014-multi-match-query.yaml │ │ ├── 0015-terms-query.yaml │ │ ├── 0016-misc-query.yaml │ │ ├── 0017-match-bool-prefix-query.yaml │ │ ├── 0018-search_after.yaml │ │ ├── 0019-count.yaml │ │ ├── 0020-stats.yaml │ │ ├── 0021-cat-indices.yaml │ │ ├── 0022-source.yaml │ │ ├── 0023-extra_filters.yaml │ │ ├── 0024-delete_indices.yaml │ │ ├── 0025-msearch.yaml │ │ ├── 0026-resolve.yaml │ │ ├── 0027-cluster-health.yaml │ │ ├── 
0028-fast_only_field_query.yaml │ │ ├── 0029-wildcard.yaml │ │ ├── 0030-prefix.yaml │ │ ├── 0031-regex.yaml │ │ ├── 0032-mappings.yaml │ │ ├── _ctx.elasticsearch.yaml │ │ ├── _ctx.quickwit.yaml │ │ ├── _ctx.yaml │ │ ├── _setup.elasticsearch.yaml │ │ ├── _setup.quickwit.yaml │ │ ├── _teardown.elasticsearch.yaml │ │ ├── _teardown.quickwit.yaml │ │ ├── bulk/ │ │ │ ├── 0001-happy-path.yaml │ │ │ ├── 0002-malformed-action.yaml │ │ │ ├── 0003-validation-failed-index-missing.yaml │ │ │ ├── 0004-put-request.yaml │ │ │ ├── 0005-document-parsing-exception.yaml │ │ │ ├── 0006-partial-index-not-found.yaml │ │ │ ├── 0007-illegal-index-name.yaml │ │ │ ├── _ctx.elasticsearch.yaml │ │ │ ├── _ctx.quickwit.yaml │ │ │ ├── _ctx.yaml │ │ │ ├── _setup.elasticsearch.yaml │ │ │ ├── _setup.quickwit.yaml │ │ │ ├── _teardown.elasticsearch.yaml │ │ │ └── _teardown.quickwit.yaml │ │ └── multi-indices/ │ │ ├── 0001-muti_indices_query.yaml │ │ ├── 0002-muti_indices_scroll.yaml │ │ ├── 0003-multi_indices_aggs.yaml │ │ ├── 0004-missing_index_query.yaml │ │ ├── _ctx.yaml │ │ ├── _setup.elasticsearch.yaml │ │ ├── _setup.quickwit.yaml │ │ ├── _teardown.elasticsearch.yaml │ │ └── _teardown.quickwit.yaml │ ├── es_compatibility_info/ │ │ ├── 0001-info.yaml │ │ ├── _ctx.elasticsearch.yaml │ │ ├── _ctx.quickwit.yaml │ │ └── _ctx.yaml │ ├── es_field_capabilities/ │ │ ├── 0001-field-capabilities.yaml │ │ ├── _ctx.elasticsearch.yaml │ │ ├── _ctx.quickwit.yaml │ │ ├── _ctx.yaml │ │ ├── _setup.elasticsearch.yaml │ │ ├── _setup.quickwit.yaml │ │ ├── _teardown.elasticsearch.yaml │ │ └── _teardown.quickwit.yaml │ ├── multi_splits/ │ │ ├── 0001-request-optimizations.yaml │ │ ├── _ctx.yaml │ │ ├── _setup.quickwit.yaml │ │ └── _teardown.quickwit.yaml │ ├── qw_search_api/ │ │ ├── 0001_ts_range.yaml │ │ ├── 0002_negative_search.yaml │ │ ├── 0003_exists_search.yaml │ │ ├── 0004_exact_string.yaml │ │ ├── 0005_fast_field_search.yaml │ │ ├── _ctx.yaml │ │ ├── _setup.quickwit.yaml │ │ └── _teardown.quickwit.yaml │ ├── 
search_after/ │ │ ├── 0001-search_after_edge_case.yaml │ │ ├── _ctx.yaml │ │ ├── _setup.quickwit.yaml │ │ └── _teardown.quickwit.yaml │ ├── sort_orders/ │ │ ├── 0001-sort-elasticapi.yaml │ │ ├── _ctx.yaml │ │ ├── _setup.quickwit.yaml │ │ └── _teardown.quickwit.yaml │ └── tag_fields/ │ ├── 0001_allowed_types.yaml │ ├── 0002_negative_tags.yaml │ ├── _ctx.yaml │ ├── _setup.quickwit.yaml │ └── _teardown.quickwit.yaml ├── rust-toolchain.toml ├── rustfmt.toml └── scripts/ ├── about.hbs ├── about.toml ├── check_license_headers.sh ├── check_log_format.sh └── dep-tree.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .cargo/config.toml ================================================ [build] rustflags = ["--cfg", "tokio_unstable"] rustdocflags = ["--cfg", "tokio_unstable"] [target.x86_64-unknown-linux-gnu] # Targeting x86-64-v2 gives a ~2% performance boost while only # disallowing Intel CPUs older than 2008 and AMD CPUs older than 2011. # None of those very old CPUs are used in GCP # (https://cloud.google.com/compute/docs/cpu-platforms). Unfortunately, # AWS does not seem to disclose the exact CPUs they use. rustflags = ["-C", "target-cpu=x86-64-v2", "--cfg", "tokio_unstable"] ================================================ FILE: .claude/skills/bump-tantivy/SKILL.md ================================================ --- name: bump-tantivy description: Bump tantivy to the latest commit on main branch, fix compilation issues, and open a PR disable-model-invocation: true --- # Bump Tantivy Follow these steps to bump tantivy to its latest version: ## Step 1: Check that we are on the main branch Run: `git branch --show-current` If the current branch is not `main`, abort and ask the user to switch to the main branch first. ## Step 2: Ensure main is up to date Run: `git pull origin main` This ensures we're working from the latest code. 
## Step 3: Get the latest tantivy SHA Run: `gh api repos/quickwit-oss/tantivy/commits/main --jq '.sha'` Extract the first 7 characters as the short SHA. ## Step 4: Update Cargo.toml Edit `quickwit/Cargo.toml` and update the `rev` field in the tantivy dependency to the new short SHA. The line looks like: ```toml tantivy = { git = "https://github.com/quickwit-oss/tantivy/", rev = "XXXXXXX", ... } ``` ## Step 5: Run cargo check and fix compilation errors Run `cargo check` in the `quickwit` directory to verify compilation. If there are compilation errors: - If the fix is straightforward (simple API changes, renames, etc.), fix them without asking - If the fix is complex or unclear, ask the user before proceeding Repeat until cargo check passes. ## Step 6: Format code Run `make fmt` from the `quickwit/` directory to format the code. ## Step 7: Update licenses Run `make update-licenses` from the `quickwit/` directory, then move the generated file: ``` mv quickwit/LICENSE-3rdparty.csv ./LICENSE-3rdparty.csv ``` ## Step 8: Create a new branch Get the git username: `git config user.name | tr ' ' '-' | tr '[:upper:]' '[:lower:]'` Get today's date: `date +%Y-%m-%d` Create and checkout a new branch named: `{username}/bump-tantivy-{date}` Example: `paul/bump-tantivy-2024-03-15` ## Step 9: Commit changes Stage all modified files and create a commit with message: ``` Bump tantivy to {short-sha} ``` ## Step 10: Push and open a PR Push the branch and open a PR using: ``` gh pr create --title "Bump tantivy to {short-sha}" --body "Updates tantivy dependency to the latest commit on main." ``` Report the PR URL to the user when complete. ================================================ FILE: .claude/skills/fix-clippy/SKILL.md ================================================ --- name: fix-clippy description: Fix all clippy lint warnings in the project --- # Fix Clippy Clippy issues are **warnings**, not errors. Never grep for `error` when looking for clippy issues. 
## Step 1: Auto-fix Run `make fix` to automatically fix clippy warnings: ``` make fix ``` ## Step 2: Fix remaining warnings manually Check for remaining warnings that couldn't be auto-fixed: ``` cargo clippy --tests 2>&1 | grep "^warning:" | sort -u ``` For each remaining warning, find the exact location and fix it manually. ================================================ FILE: .claude/skills/fmt/SKILL.md ================================================ --- name: fmt description: Run `make fmt` to check the code format. --- # Format Check Run `make fmt` from the `quickwit/` subdirectory to check code formatting: ``` cd /Users/paul.masurel/git/quickwit/quickwit && make fmt ``` This command checks: 1. Rust code formatting 2. License headers 3. Log format policy (no trailing punctuation, no uppercase first character) If there are log format issues, fix them by: - Making the first character lowercase - Removing trailing punctuation (periods, exclamation marks, etc.) Fix any issues found and re-run until clean. ================================================ FILE: .claude/skills/rationalize-deps/SKILL.md ================================================ --- name: rationalize-deps description: Analyze Cargo.toml dependencies and attempt to remove unused features to reduce compile times and binary size --- # Rationalize Dependencies This skill analyzes Cargo.toml dependencies to identify and remove unused features. ## Overview Many crates enable features by default that may not be needed. This skill: 1. Identifies dependencies with default features enabled 2. Tests if `default-features = false` works 3. Identifies which specific features are actually needed 4. 
Verifies compilation after changes ## Step 1: Identify the target Ask the user which crate(s) to analyze: - A specific crate name (e.g., "tokio", "serde") - A specific workspace member (e.g., "quickwit-search") - "all" to scan the entire workspace ## Step 2: Analyze current dependencies For the workspace Cargo.toml (`quickwit/Cargo.toml`), list dependencies that: - Do NOT have `default-features = false` - Have default features that might be unnecessary Run: `cargo tree -p -f "{p} {f}" --edges features` to see what features are actually used. ## Step 3: For each candidate dependency ### 3a: Check the crate's default features Look up the crate on crates.io or check its Cargo.toml to understand: - What features are enabled by default - What each feature provides Use: `cargo metadata --format-version=1 | jq '.packages[] | select(.name == "") | .features'` ### 3b: Try disabling default features Modify the dependency in `quickwit/Cargo.toml`: From: ```toml some-crate = { version = "1.0" } ``` To: ```toml some-crate = { version = "1.0", default-features = false } ``` ### 3c: Run cargo check Run: `cargo check --workspace` (or target specific packages for faster feedback) If compilation fails: 1. Read the error messages to identify which features are needed 2. Add only the required features explicitly: ```toml some-crate = { version = "1.0", default-features = false, features = ["needed-feature"] } ``` 3. Re-run cargo check ### 3d: Binary search for minimal features If there are many default features, use binary search: 1. Start with no features 2. If it fails, add half the default features 3. 
Continue until you find the minimal set ## Step 4: Document findings For each dependency analyzed, report: - Original configuration - New configuration (if changed) - Features that were removed - Any features that are required ## Step 5: Verify full build After all changes, run: ```bash cargo check --workspace --all-targets cargo test --workspace --no-run ``` ## Common Patterns ### Serde Often only needs `derive`: ```toml serde = { version = "1.0", default-features = false, features = ["derive", "std"] } ``` ### Tokio Identify which runtime features are actually used: ```toml tokio = { version = "1.0", default-features = false, features = ["rt-multi-thread", "macros", "sync"] } ``` ### Reqwest Often doesn't need all TLS backends: ```toml reqwest = { version = "0.11", default-features = false, features = ["rustls-tls", "json"] } ``` ## Rollback If changes cause issues: ```bash git checkout quickwit/Cargo.toml cargo check --workspace ``` ## Tips - Start with large crates that have many default features (tokio, reqwest, hyper) - Use `cargo bloat --crates` to identify large dependencies - Check `cargo tree -d` for duplicate dependencies that might indicate feature conflicts - Some features are needed only for tests - consider using `[dev-dependencies]` features ================================================ FILE: .claude/skills/simple-pr/SKILL.md ================================================ --- name: simple-pr description: Create a simple PR from staged changes with an auto-generated commit message disable-model-invocation: true --- # Simple PR Follow these steps to create a simple PR from staged changes: ## Step 1: Check workspace state Run: `git status` Verify that all changes have been staged (no unstaged changes). If there are unstaged changes, abort and ask the user to stage their changes first with `git add`. Also verify that we are on the `main` branch. If not, abort and ask the user to switch to main first. 
## Step 2: Ensure main is up to date Run: `git pull origin main` This ensures we're working from the latest code. ## Step 3: Review staged changes Run: `git diff --cached` Review the staged changes to understand what the PR will contain. ## Step 4: Generate commit message Based on the staged changes, generate a concise commit message (1-2 sentences) that describes the "why" rather than the "what". Display the proposed commit message to the user and ask for confirmation before proceeding. ## Step 5: Create a new branch Get the git username: `git config user.name | tr ' ' '-' | tr '[:upper:]' '[:lower:]'` Create a short, descriptive branch name based on the changes (e.g., `fix-typo-in-readme`, `add-retry-logic`, `update-deps`). Create and checkout the branch: `git checkout -b {username}/{short-descriptive-name}` ## Step 6: Commit changes Commit with the message from Step 4: ``` git commit -m "{commit-message}" ``` ## Step 7: Push and open a PR Push the branch and open a PR: ``` git push -u origin {branch-name} gh pr create --title "{commit-message-title}" --body "{longer-description-if-needed}" ``` Report the PR URL to the user when complete.
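The branch naming in Step 5 can be sketched as a small shell helper. This is only an illustration, not part of the skill itself: the function name `branch_prefix`, the sample user name, and the sample branch description are hypothetical, while the `tr` pipeline is the one given in Step 5.

```shell
# Hypothetical helper: turn a git user name into a branch prefix
# using the same tr pipeline as Step 5 (spaces to dashes, then lowercase).
branch_prefix() {
  printf '%s\n' "$1" | tr ' ' '-' | tr '[:upper:]' '[:lower:]'
}

# Example with a made-up user name and branch description.
username="$(branch_prefix "Jane Doe")"
branch="${username}/fix-typo-in-readme"
echo "$branch"
```

In a real run the argument would come from `git config user.name` rather than a literal string.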
================================================ FILE: .devcontainer/devcontainer.json ================================================ { "name": "Quickwit", "image": "mcr.microsoft.com/devcontainers/rust:bookworm", "customizations": { "codespaces": { "openFiles": [ "CONTRIBUTING.md" ] }, "vscode": { "extensions": [ "rust-lang.rust-analyzer" ] } }, "hostRequirements": { "cpus": 4, "memory": "16gb" }, "runArgs": [ "--init" ], "mounts": [ { "source": "/var/run/docker.sock", "target": "/var/run/docker.sock", "type": "bind" } ], "features": { "docker-from-docker": { "version": "latest", "moby": true }, "ghcr.io/devcontainers/features/node:1": { "version": "24" }, "ghcr.io/devcontainers/features/aws-cli:1": {}, "ghcr.io/devcontainers-contrib/features/protoc:1": {} }, "postCreateCommand": ".devcontainer/post-create.sh" } ================================================ FILE: .devcontainer/post-create.sh ================================================ #!/bin/bash # Define success and error color codes SUCCESS_COLOR="\e[32m" ERROR_COLOR="\e[31m" RESET_COLOR="\e[0m" # Define success tracking variables rustupToolchainNightlyInstalled=false cmakeInstalled=false # Define installation functions #Installing manually for now until we figure out why "ghcr.io/devcontainers-community/features/cmake": {} is not working install_cmake() { echo -e "Installing CMake..." sudo apt-get update sudo apt-get install -y cmake > /dev/null 2>&1 if [[ "$(cmake --version)" =~ "cmake version" ]]; then echo -e "${SUCCESS_COLOR}CMake installed successfully.${RESET_COLOR}" cmakeInstalled=true else echo -e "${ERROR_COLOR}CMake installation failed. Please install it manually.${RESET_COLOR}" fi } install_rustup_toolchain_nightly() { echo -e "Installing Rustup nightly toolchain..." 
rustup toolchain install nightly > /dev/null 2>&1 rustup component add rustfmt --toolchain nightly > /dev/null 2>&1 if [[ "$(rustup toolchain list)" =~ "nightly" && "$(rustup component list --toolchain nightly | grep rustfmt)" =~ "installed" ]]; then echo -e "${SUCCESS_COLOR}Rustup nightly toolchain and rustfmt installed successfully.${RESET_COLOR}" rustupToolchainNightlyInstalled=true else echo -e "${ERROR_COLOR}Rustup nightly toolchain and/or rustfmt installation failed. Please install them manually.${RESET_COLOR}" fi } # Install tools install_cmake install_rustup_toolchain_nightly # Copy our custom welcome message to replace the default github welcome message sudo cp .devcontainer/welcome.txt /usr/local/etc/vscode-dev-containers/first-run-notice.txt # Check the success tracking variables if $rustupToolchainNightlyInstalled && $cmakeInstalled; then echo -e "${SUCCESS_COLOR}All tools installed successfully.${RESET_COLOR}" else echo -e "${ERROR_COLOR}One or more tools failed to install. Please check the output for errors and install the failed tools manually.${RESET_COLOR}" fi ================================================ FILE: .devcontainer/welcome.txt ================================================ 👋 Welcome to the project! All the necessary tools have already been installed for you 🎉. You can go ahead and start hacking! Happy coding💻. Here are some useful commands you can run: 🔧 `make test-all` - starts necessary Docker services and runs all tests. 🔧 `make -k test-all docker-compose-down` - the same as above, but tears down the Docker services after running all the tests. 🔧 `make fmt` - runs formatter, this command requires the nightly toolchain to be installed by running `rustup toolchain install nightly`. 🔧 `make fix` - runs formatter and clippy checks. 🔧 `make typos` - runs the spellcheck tool over the codebase. (Install by running `cargo install typos`) 🔧 `make build-docs` - builds docs. 🔧 `make docker-compose-up` - starts Docker services. 
🔧 `make docker-compose-down` - stops Docker services. 🔧 `make docker-compose-logs` - shows Docker logs. ================================================ FILE: .dockerignore ================================================ **/*.md **/*.txt **/.* **/build **/Dockerfile **/node_modules **/qwdata **/target docs examples !.git/ !quickwit-ui/build/.gitignore !quickwit-ui/.gitignore_for_build_directory ================================================ FILE: .gitattributes ================================================ **/codegen/** linguist-generated ================================================ FILE: .github/ISSUE_TEMPLATE/bug_report.md ================================================ --- name: Bug report about: Create a report to help us improve title: "" labels: bug assignees: "" --- **Describe the bug** A clear and concise description of what the bug is. **Steps to reproduce (if applicable)** Steps to reproduce the behavior: 1. 2. **Expected behavior** A clear and concise description of what you expected to happen. **Configuration:** Please provide: 1. Output of `quickwit --version` 2. The index_config.yaml ================================================ FILE: .github/ISSUE_TEMPLATE/documentation_request.md ================================================ --- name: Documentation request about: Suggest a documentation enhancement title: "[Documentation topic]" labels: documentation assignees: "" --- ## My documentation idea Use this section to give a description of what your enhancement is about. Examples: > I would like to add how to configure MinIO storage for Quickwit: > **What do you all think?** 👍 I would love to see it! 🚀 I would love to help! Thank you for your request! 
================================================ FILE: .github/ISSUE_TEMPLATE/feature_request.md ================================================ --- name: Feature request about: Suggest an idea for this project title: "" labels: enhancement assignees: "" --- **Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] **Describe the solution you'd like** A clear and concise description of what you want to happen. **Describe alternatives you've considered** A clear and concise description of any alternative solutions or features you've considered. **Additional context** Add any other context or information about the feature request here. ================================================ FILE: .github/ISSUE_TEMPLATE/tutorial_request.md ================================================ --- name: Tutorial request about: Suggest a Quickwit tutorial title: "[Tutorial topic]" labels: tutorial assignees: "" --- ## My tutorial idea Use this section to give a description of what your tutorial is about. Examples: > I would like to write a tutorial that shows how to use Quickwit: > > - "for storing traces..." > - "with Grafana/Jaeger/MinIO..." > - "for ingesting terabytes per day with Kafka..." Are there any particular tools, concepts, languages or platforms that readers will learn about? **What do you all think?** 👍 I would love to see it! 🚀 I would love to help! Thank you for your request! ================================================ FILE: .github/PULL_REQUEST_TEMPLATE.md ================================================ ### Description Describe the proposed changes made in this PR. ### How was this PR tested? Describe how you tested this PR. 
================================================ FILE: .github/actions/cargo-build-macos-binary/action.yml ================================================ name: "Build Quickwit binary for macOS" description: "Build React app and Rust binary for macOS with cargo build." inputs: target: description: "Target" required: true version: description: "Binary version" required: true token: description: "GitHub access token" required: true runs: using: "composite" steps: - run: echo "ASSET_FULL_NAME=quickwit-${{ inputs.version }}-${{ inputs.target }}" >> $GITHUB_ENV shell: bash - uses: actions/setup-node@v3 with: node-version: 24 cache: "yarn" cache-dependency-path: quickwit/quickwit-ui/yarn.lock - run: yarn global add node-gyp shell: bash - run: make build-ui shell: bash - name: Install protoc run: brew install protobuf shell: bash - name: Install rustup shell: bash run: curl https://sh.rustup.rs -sSf | sh -s -- --default-toolchain none -y - name: Add target ${{ inputs.target }} run: rustup target add ${{ inputs.target }} shell: bash working-directory: ./quickwit - name: Retrieve and export commit date, hash, and tags run: | echo "QW_COMMIT_DATE=$(TZ=UTC0 git log -1 --format=%cd --date=format-local:%Y-%m-%dT%H:%M:%SZ)" >> $GITHUB_ENV echo "QW_COMMIT_HASH=$(git rev-parse HEAD)" >> $GITHUB_ENV echo "QW_COMMIT_TAGS=$(git tag --points-at HEAD | tr '\n' ',')" >> $GITHUB_ENV shell: bash - name: Build binary run: cargo build --release --features release-macos-feature-vendored-set --target ${{ matrix.target }} --bin quickwit shell: bash working-directory: ./quickwit env: QW_COMMIT_DATE: ${{ env.QW_COMMIT_DATE }} QW_COMMIT_HASH: ${{ env.QW_COMMIT_HASH }} QW_COMMIT_TAGS: ${{ env.QW_COMMIT_TAGS }} - name: Bundle archive run: | make archive BINARY_FILE=quickwit/target/${{ inputs.target }}/release/quickwit \ BINARY_VERSION=${{ inputs.version }} ARCHIVE_NAME=${{ env.ASSET_FULL_NAME }} shell: bash - name: Save binary archive for three days uses: actions/upload-artifact@v4.4.0 with: name: 
${{ env.ASSET_FULL_NAME }}.tar.gz path: ./${{ env.ASSET_FULL_NAME }}.tar.gz retention-days: 3 - name: Deploy archive to GitHub release uses: quickwit-inc/upload-to-github-release@9b2c40fba23bf8dea05b7d2eece24cbc95d4a190 env: GITHUB_TOKEN: ${{ inputs.token }} with: file: ${{ env.ASSET_FULL_NAME }}.tar.gz overwrite: true draft: ${{ inputs.version != 'nightly' }} tag_name: ${{ inputs.version }} ================================================ FILE: .github/actions/cross-build-binary/action.yml ================================================ name: "Build Quickwit binary with cargo cross" description: "Build React app and Rust binary with cargo cross." inputs: target: description: "Target" required: true version: description: "Binary version" required: true token: description: "GitHub access token" required: true runs: using: "composite" steps: - run: echo "ASSET_FULL_NAME=quickwit-${{ inputs.version }}-${{ inputs.target }}" >> $GITHUB_ENV shell: bash - uses: actions/setup-node@v3 with: node-version: 24 cache: "yarn" cache-dependency-path: quickwit/quickwit-ui/yarn.lock - run: yarn global add node-gyp shell: bash - run: make build-ui shell: bash - name: Install rustup shell: bash run: curl https://sh.rustup.rs -sSf | sh -s -- --default-toolchain none -y - name: Install cross run: cargo install cross shell: bash - name: Retrieve and export commit date, hash, and tags run: | echo "QW_COMMIT_DATE=$(TZ=UTC0 git log -1 --format=%cd --date=format-local:%Y-%m-%dT%H:%M:%SZ)" >> $GITHUB_ENV echo "QW_COMMIT_HASH=$(git rev-parse HEAD)" >> $GITHUB_ENV echo "QW_COMMIT_TAGS=$(git tag --points-at HEAD | tr '\n' ',')" >> $GITHUB_ENV shell: bash - name: Build Quickwit run: cross build --release --features release-feature-vendored-set --target ${{ inputs.target }} --bin quickwit shell: bash env: QW_COMMIT_DATE: ${{ env.QW_COMMIT_DATE }} QW_COMMIT_HASH: ${{ env.QW_COMMIT_HASH }} QW_COMMIT_TAGS: ${{ env.QW_COMMIT_TAGS }} working-directory: ./quickwit - name: Bundle archive run: | make 
archive BINARY_FILE=quickwit/target/${{ inputs.target }}/release/quickwit \ BINARY_VERSION=${{ inputs.version }} ARCHIVE_NAME=${{ env.ASSET_FULL_NAME }} shell: bash - name: Save binary archive for three days uses: actions/upload-artifact@v4.4.0 with: name: ${{ env.ASSET_FULL_NAME }}.tar.gz path: ./${{ env.ASSET_FULL_NAME }}.tar.gz retention-days: 3 - name: Upload archive uses: quickwit-inc/upload-to-github-release@9b2c40fba23bf8dea05b7d2eece24cbc95d4a190 env: GITHUB_TOKEN: ${{ inputs.token }} with: file: ${{ env.ASSET_FULL_NAME }}.tar.gz overwrite: true draft: ${{ inputs.version != 'nightly' }} tag_name: ${{ inputs.version }} ================================================ FILE: .github/dependabot.yml ================================================ version: 2 updates: # Rust dependencies - package-ecosystem: cargo directory: "/quickwit" schedule: interval: "monthly" groups: rust-dependencies: patterns: - "*" open-pull-requests-limit: 10 ignore: - dependency-name: "*" update-types: ["version-update:semver-patch"] # Docker dependencies - package-ecosystem: docker directory: "/" schedule: interval: "monthly" open-pull-requests-limit: 10 # GitHub Actions - package-ecosystem: github-actions directory: "/" schedule: interval: "monthly" groups: github-actions: patterns: - "*" open-pull-requests-limit: 10 # NPM dependencies - package-ecosystem: npm directory: "/" schedule: interval: "monthly" groups: npm-dependencies: patterns: - "*" open-pull-requests-limit: 10 ================================================ FILE: .github/workflows/ci.yml ================================================ name: CI on: workflow_dispatch: pull_request: push: branches: - main - trigger-ci-workflow paths: - "quickwit/**" - "!quickwit/quickwit-ui/**" permissions: contents: read env: CARGO_INCREMENTAL: 0 QW_DISABLE_TELEMETRY: 1 QW_TEST_DATABASE_URL: postgres://quickwit-dev:quickwit-dev@localhost:5432/quickwit-metastore-dev RUST_BACKTRACE: 1 RUSTDOCFLAGS: -Dwarnings 
-Arustdoc::private_intra_doc_links RUSTFLAGS: -Dwarnings --cfg tokio_unstable # Ensures that we cancel running jobs for the same PR / same workflow. concurrency: group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }} cancel-in-progress: true jobs: tests: name: Unit tests runs-on: "ubuntu-latest" timeout-minutes: 60 permissions: contents: read actions: write services: # PostgreSQL service container postgres: image: postgres:latest ports: - 5432:5432 env: POSTGRES_USER: quickwit-dev POSTGRES_PASSWORD: quickwit-dev POSTGRES_DB: quickwit-metastore-dev # Set health checks to wait until postgres has started options: >- --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5 steps: - name: Cleanup Disk Space run: | df -h if [ "$(df -BG / | awk 'NR==2 {gsub("G","",$4); print $4}')" -lt 30 ]; then echo "Less than 30GiB available. Running cleanup..." sudo rm -rf /usr/share/dotnet sudo rm -rf /usr/local/lib/android sudo rm -rf /usr/share/swift sudo rm -rf /usr/local/.ghcup sudo rm -rf /opt/hostedtoolcache/CodeQL df -h else echo "30GiB or more available. Skipping cleanup." 
fi - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - name: Install Ubuntu packages run: | sudo apt-get update sudo apt-get -y install protobuf-compiler - uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548 # v.6.1.0 with: python-version: '3.11' - uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2 id: modified with: filters: | rust_src: - quickwit/**/*.rs - quickwit/**/*.toml - quickwit/**/*.proto - quickwit/rest-api-tests/** - .github/workflows/ci.yml - name: Setup stable Rust Toolchain if: steps.modified.outputs.rust_src == 'true' uses: dtolnay/rust-toolchain@f7ccc83f9ed1e5b9c81d8a67d7ad1a747e22a561 # master with: toolchain: stable - name: Setup cache uses: Swatinem/rust-cache@779680da715d629ac1d338a641029a2f4372abb5 # v2.8.2 if: steps.modified.outputs.rust_src == 'true' with: workspaces: "./quickwit -> target" shared-key: "quickwit-cargo" - name: Install nextest if: always() && steps.modified.outputs.rust_src == 'true' uses: taiki-e/install-action@aba36d755ec7ca22d38b12111787c26115943952 with: tool: cargo-nextest - name: cargo build if: always() && steps.modified.outputs.rust_src == 'true' run: cargo build --features=postgres --tests --bin quickwit working-directory: ./quickwit - name: cargo nextest if: always() && steps.modified.outputs.rust_src == 'true' run: cargo nextest run --features=postgres --retries 1 working-directory: ./quickwit - name: Install python packages if: always() && steps.modified.outputs.rust_src == 'true' run: | pip install --user --require-hashes -r ${{ github.workspace }}/.github/workflows/requirements.txt pipenv install --deploy --ignore-pipfile working-directory: ./quickwit/rest-api-tests - name: Run REST API tests if: always() && steps.modified.outputs.rust_src == 'true' run: pipenv run python3 ./run_tests.py --binary ../target/debug/quickwit working-directory: ./quickwit/rest-api-tests lints: name: Lints runs-on: "ubuntu-latest" timeout-minutes: 60 permissions: 
contents: read actions: write steps: - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2 id: modified with: filters: | rust_src: - quickwit/**/*.rs - quickwit/**/*.toml - quickwit/**/*.proto - .github/workflows/ci.yml - name: Install Ubuntu packages if: always() && steps.modified.outputs.rust_src == 'true' run: | sudo apt-get update sudo apt-get -y install protobuf-compiler - name: Setup nightly Rust Toolchain (for rustfmt) if: steps.modified.outputs.rust_src == 'true' uses: dtolnay/rust-toolchain@f7ccc83f9ed1e5b9c81d8a67d7ad1a747e22a561 # master with: toolchain: nightly components: rustfmt - name: Setup stable Rust Toolchain if: steps.modified.outputs.rust_src == 'true' uses: dtolnay/rust-toolchain@f7ccc83f9ed1e5b9c81d8a67d7ad1a747e22a561 # master with: toolchain: stable - name: Setup cache if: steps.modified.outputs.rust_src == 'true' uses: Swatinem/rust-cache@779680da715d629ac1d338a641029a2f4372abb5 # v2.8.2 with: workspaces: "./quickwit -> target" shared-key: "quickwit-cargo" - name: Install cargo deny if: always() && steps.modified.outputs.rust_src == 'true' uses: taiki-e/cache-cargo-install-action@34ce5120836e5f9f1508d8713d7fdea0e8facd6f # v3.0.1 with: # 0.18 requires rustc 1.85 tool: cargo-deny@0.17.0 - name: Install cargo machete if: always() && steps.modified.outputs.rust_src == 'true' uses: taiki-e/cache-cargo-install-action@34ce5120836e5f9f1508d8713d7fdea0e8facd6f # v3.0.1 with: tool: cargo-machete - name: cargo clippy if: always() && steps.modified.outputs.rust_src == 'true' run: cargo clippy --workspace --tests --all-features working-directory: ./quickwit - name: cargo deny if: always() && steps.modified.outputs.rust_src == 'true' run: cargo deny check licenses working-directory: ./quickwit - name: cargo machete if: always() && steps.modified.outputs.rust_src == 'true' run: cargo machete working-directory: ./quickwit - name: cargo doc if: always() && 
steps.modified.outputs.rust_src == 'true' run: cargo doc --no-deps working-directory: ./quickwit - name: License headers check if: always() run: bash scripts/check_license_headers.sh working-directory: ./quickwit - name: rustfmt if: always() && steps.modified.outputs.rust_src == 'true' run: cargo +nightly fmt --all -- --check working-directory: ./quickwit thirdparty-license: name: Check Datadog third-party license file runs-on: ubuntu-latest permissions: contents: read actions: write steps: - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - name: Install Rust toolchain uses: dtolnay/rust-toolchain@f7ccc83f9ed1e5b9c81d8a67d7ad1a747e22a561 # master with: toolchain: stable - name: Cache cargo tools uses: actions/cache@9255dc7a253b0ccc959486e2bca901246202afeb # v5.0.1 with: path: ~/.cargo/bin key: ${{ runner.os }}-cargo-tools-${{ hashFiles('**/Cargo.lock') }} - name: Install dd-rust-license-tool run: dd-rust-license-tool --help || cargo install --git https://github.com/DataDog/rust-license-tool.git --force - name: Check Datadog third-party license file run: dd-rust-license-tool --config quickwit/license-tool.toml --manifest-path quickwit/Cargo.toml check ================================================ FILE: .github/workflows/coverage.yml ================================================ name: Code coverage on: workflow_dispatch: push: branches: - main - trigger-coverage-workflow paths: - quickwit/Cargo.toml - quickwit/Cargo.lock - quickwit/quickwit-*/** permissions: contents: read env: AWS_REGION: us-east-1 AWS_ACCESS_KEY_ID: "placeholder" AWS_SECRET_ACCESS_KEY: "placeholder" CARGO_INCREMENTAL: 0 PUBSUB_EMULATOR_HOST: "localhost:8681" QW_DISABLE_TELEMETRY: 1 QW_S3_ENDPOINT: "http://localhost:4566" # Services are exposed as localhost because we are not running coverage in a container. 
QW_S3_FORCE_PATH_STYLE_ACCESS: 1 QW_TEST_DATABASE_URL: postgres://quickwit-dev:quickwit-dev@localhost:5432/quickwit-metastore-dev RUSTFLAGS: -Dwarnings --cfg tokio_unstable jobs: test: name: Coverage runs-on: gh-ubuntu-arm64 timeout-minutes: 40 permissions: contents: read actions: write # Setting a container will require changing QW_S3_ENDPOINT to http://localstack:4566 services: localstack: image: localstack/localstack:latest ports: - "4566:4566" - "4571:4571" - "8080:8080" env: SERVICES: kinesis,s3,sqs options: >- --health-cmd "curl -k https://localhost:4566" --health-interval 10s --health-timeout 5s --health-retries 5 postgres: image: postgres:latest ports: - "5432:5432" env: POSTGRES_USER: quickwit-dev POSTGRES_PASSWORD: quickwit-dev POSTGRES_DB: quickwit-metastore-dev options: >- --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5 kafka-broker: image: confluentinc/confluent-local:7.4.11 ports: - "9092:9092" - "9101:9101" env: # KRaft mode (single node) KAFKA_NODE_ID: 1 KAFKA_PROCESS_ROLES: 'broker,controller' KAFKA_CONTROLLER_QUORUM_VOTERS: '1@localhost:9093' KAFKA_LOG4J_LOGGERS: "org.apache.kafka.image.loader.MetadataLoader=WARN" # Listeners KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT' KAFKA_LISTENERS: 'EXTERNAL://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093' KAFKA_ADVERTISED_LISTENERS: 'EXTERNAL://localhost:9092' KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER' KAFKA_INTER_BROKER_LISTENER_NAME: 'EXTERNAL' # Simplified configuration KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1 KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1 # Cluster ID (required for KRaft) CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk' KAFKA_HEAP_OPTS: -Xms256M -Xmx256M options: >- --health-cmd "ub kafka-ready -b localhost:9092 1 5" --health-interval 10s --health-timeout 5s --health-retries 5 gcp-pubsub-emulator: image:
thekevjames/gcloud-pubsub-emulator:550.0.0 ports: - "8681:8681" env: PUBSUB_PROJECT1: "quickwit-emulator,emulator_topic:emulator_subscription" steps: - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - name: Install lib libsasl2 run: | sudo apt update sudo apt install libsasl2-dev sudo apt install libsasl2-2 - uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548 # v.6.1.0 with: python-version: '3.11' - uses: actions/cache@9255dc7a253b0ccc959486e2bca901246202afeb # v5.0.1 with: path: | ~/.cargo/git ~/.cargo/registry key: ${{ runner.os }}-cargo-test-${{ hashFiles('Cargo.lock') }} restore-keys: | ${{ runner.os }}-cargo-test-${{ hashFiles('Cargo.lock') }} ${{ runner.os }}-cargo-test - name: Install python packages run: | pip install --user --require-hashes -r ${{ github.workspace }}/.github/workflows/requirements.txt pipenv install --deploy --ignore-pipfile working-directory: ./quickwit/quickwit-cli/tests - name: Prepare LocalStack S3 run: pipenv run ./prepare_tests.sh working-directory: ./quickwit/quickwit-cli/tests # GitHub Actions does not allow services to be started with a custom command, # so we are running Azurite as a container manually. - name: Run Azurite service run: DOCKER_SERVICES=azurite make docker-compose-up # GitHub Actions does not allow services to be started with a custom command, # so we are running fake gcs server as a container manually. - name: Run Fake GCS Server service run: DOCKER_SERVICES=fake-gcs-server make docker-compose-up - name: Run Pulsar service run: DOCKER_SERVICES=pulsar make docker-compose-up - name: Install Rust run: rustup update stable - name: Install cargo-llvm-cov, cargo-nextest, and protoc uses: taiki-e/install-action@90558ad1e179036f31467972b00dec6cb80701fa # v2.66.3 with: tool: cargo-llvm-cov,nextest,protoc # We limit the number of jobs to 4 to avoid OOM errors when linking the binary. 
- name: Generate code coverage run: | cargo llvm-cov clean --workspace cargo llvm-cov nextest --no-report --test failpoints --features fail/failpoints --retries 4 # increase stack size for test_all_with_s3_localstack_cli, see quickwit#4963 RUST_MIN_STACK=67108864 CARGO_BUILD_JOBS=4 cargo llvm-cov nextest --no-report --all-features --retries 4 cargo llvm-cov report --lcov --output-path lcov.info working-directory: ./quickwit - name: Upload coverage to Codecov uses: codecov/codecov-action@671740ac38dd9b0130fbe1cec585b89eea48d3de # v5.5.2 with: token: ${{ secrets.CODECOV_TOKEN }} # not required for public repos files: ./quickwit/lcov.info on-failure: if: ${{ github.repository_owner == 'quickwit-oss' && failure() }} name: On Failure needs: [test] runs-on: ubuntu-latest steps: - name: Send Message uses: sarisia/actions-status-discord@eb045afee445dc055c18d3d90bd0f244fd062708 # v1.16.0 with: webhook: ${{ secrets.DISCORD_WEBHOOK }} nodetail: true color: "#FF0000" title: "" description: | ### ❌ [${{ github.event.pull_request.title }}](${{ github.event.pull_request.html_url }}) @${{ github.actor }} quickwit coverage CI failed on your PR. Coverage CI contains tests that are not running in the regular CI because they are too lengthy. For this reason it is possible for it to break even if the tests were passing on your PR. This is not a catastrophe, but you are responsible for fixing it! You can run the full test suite locally with `make test-all`. Please report in this channel whether you are working on it, have fixed it, think it is a flaky test, or need help. **[View logs](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }})** ================================================ FILE: .github/workflows/dependency.yml ================================================ name: "Dependency Review" on: [pull_request] permissions: contents: read # Ensures that we cancel running jobs for the same PR / same workflow.
concurrency: group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }} cancel-in-progress: true jobs: dependency-review: runs-on: ubuntu-latest steps: - name: "Checkout Repository" uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - name: "Dependency Review" uses: actions/dependency-review-action@98884d411b0f1c583e5ee579e7e897d4623019c2 # v4.8.1 with: # This is a minor vuln on the rsa crate, used for # google storage. allow-ghsas: GHSA-c38w-74pg-36hr,GHSA-4grx-2x9w-596c ================================================ FILE: .github/workflows/publish_cross_images.yml ================================================ name: Publish custom cross images on: workflow_dispatch: push: branches: - main paths: - "build/cross-images/**" permissions: contents: read jobs: build-cross-images: name: Publish cross images runs-on: ubuntu-latest environment: name: production steps: - name: Check out the repo uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - name: Log in to Docker Hub uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef # v3.6.0 with: username: ${{ secrets.DOCKERHUB_USERNAME }} password: ${{ secrets.DOCKERHUB_ACCESS_TOKEN }} - name: Build and push cross images run: make cross-images ================================================ FILE: .github/workflows/publish_docker_images.yml ================================================ name: Build and publish Docker images on: workflow_dispatch: push: branches: - main - release-0.9 paths: - "quickwit/**" tags: - airmail - happy-plazza - qw* - v* permissions: contents: read env: REGISTRY_IMAGE: quickwit/quickwit jobs: docker: strategy: matrix: include: - os: ubuntu-latest platform: linux/amd64 platform_suffix: amd64 - os: gh-ubuntu-arm64 platform: linux/arm64 platform_suffix: arm64 runs-on: ${{ matrix.os }} permissions: contents: read actions: write environment: name: production steps: - name: Cleanup Disk Space run: | df -h sudo rm -rf
/opt/hostedtoolcache/CodeQL sudo rm -rf /usr/local/.ghcup sudo rm -rf /usr/local/lib/android sudo rm -rf /usr/share/dotnet sudo rm -rf /usr/share/swift df -h - name: Checkout uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - name: Login to Docker Hub uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef # v3.6.0 with: username: ${{ secrets.DOCKERHUB_USERNAME }} password: ${{ secrets.DOCKERHUB_ACCESS_TOKEN }} - name: Set up QEMU uses: docker/setup-qemu-action@c7c53464625b32c7a7e944ae62b3e17d2b600130 # v3.7.0 - name: Set up Docker Buildx uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3.12.0 - name: Docker meta id: meta uses: docker/metadata-action@c299e40c65443455700f0fdfc63efafe5b349051 # v5.10.0 with: images: | ${{ env.REGISTRY_IMAGE }} labels: | org.opencontainers.image.title=Quickwit maintainer=Quickwit, Inc. org.opencontainers.image.vendor=Quickwit, Inc. org.opencontainers.image.licenses=Apache-2.0 - name: Retrieve commit date, hash, and tags run: | echo "QW_COMMIT_DATE=$(TZ=UTC0 git log -1 --format=%cd --date=format-local:%Y-%m-%dT%H:%M:%SZ)" >> $GITHUB_ENV echo "QW_COMMIT_HASH=$(git rev-parse HEAD)" >> $GITHUB_ENV echo "QW_COMMIT_TAGS=$(git tag --points-at HEAD | tr '\n' ',')" >> $GITHUB_ENV if [[ "${{ github.event_name }}" == "push" && "${{ github.ref_type }}" == "tag" && "${GITHUB_REF#refs/tags/}" == *"jemprof"* ]]; then echo "CARGO_FEATURES=release-jemalloc-profiled" >> $GITHUB_ENV else echo "CARGO_FEATURES=release-feature-set" >> $GITHUB_ENV fi - name: Build and push image uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0 id: build with: context: . 
platforms: ${{ matrix.platform }} build-args: | QW_COMMIT_DATE=${{ env.QW_COMMIT_DATE }} QW_COMMIT_HASH=${{ env.QW_COMMIT_HASH }} QW_COMMIT_TAGS=${{ env.QW_COMMIT_TAGS }} CARGO_FEATURES=${{ env.CARGO_FEATURES }} labels: ${{ steps.meta.outputs.labels }} outputs: type=image,name=${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true - name: Export digest run: | mkdir -p /tmp/digests digest="${{ steps.build.outputs.digest }}" touch "/tmp/digests/${digest#sha256:}" - name: Upload digest uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0 with: name: digest-${{ matrix.platform_suffix }} path: /tmp/digests/* if-no-files-found: error retention-days: 1 merge: runs-on: ubuntu-latest needs: [docker] permissions: contents: read actions: read environment: production steps: - name: Download digests uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0 with: pattern: digest-* path: /tmp/digests merge-multiple: true - name: Set up Docker Buildx uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3.12.0 - name: Docker meta id: meta uses: docker/metadata-action@c299e40c65443455700f0fdfc63efafe5b349051 # v5.10.0 with: images: ${{ env.REGISTRY_IMAGE }} flavor: | latest=false tags: | type=edge,branch=main type=edge,branch=main,suffix=-slim-bookworm type=semver,pattern={{version}} type=semver,pattern={{version}},value=latest type=semver,pattern={{version}},suffix=-slim-bookworm type=ref,event=tag type=raw,value=v0.9.0-rc,enable=${{ github.ref == 'refs/heads/release-0.9' }} - name: Login to Docker Hub uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef # v3.6.0 with: username: ${{ secrets.DOCKERHUB_USERNAME }} password: ${{ secrets.DOCKERHUB_ACCESS_TOKEN }} - name: Create manifest list and push tags working-directory: /tmp/digests run: | docker buildx imagetools create $(jq -cr '.tags | map("-t " + .) 
| join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON") \ $(printf '${{ env.REGISTRY_IMAGE }}@sha256:%s ' *) - name: Inspect image run: | docker buildx imagetools inspect ${{ env.REGISTRY_IMAGE }}:${{ steps.meta.outputs.version }} ================================================ FILE: .github/workflows/publish_lambda.yaml ================================================ # This workflow creates a new release for a quickwit search aws lambda. # The artifact is a zip file containing a binary for ARM 64, # ready to be deployed by the deployer. # # See quickwit-lambda-client/README.md name: Release Lambda binary on: push: tags: - 'lambda-*' workflow_dispatch: inputs: version: description: 'Version tag (e.g., v0.8.0)' required: false default: 'dev' permissions: contents: read jobs: build-lambda: name: Build Lambda ARM64 runs-on: ubuntu-latest permissions: contents: write actions: write steps: - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - name: Set version run: | if [ "${{ github.ref_type }}" = "tag" ]; then # Extract version from tag (e.g., lambda-v0.8.0 -> v0.8.0) echo "ASSET_VERSION=${GITHUB_REF_NAME#lambda-}" >> $GITHUB_ENV elif [ -n "${{ github.event.inputs.version }}" ] && [ "${{ github.event.inputs.version }}" != "dev" ]; then echo "ASSET_VERSION=${{ github.event.inputs.version }}" >> $GITHUB_ENV else echo "ASSET_VERSION=dev-$(git rev-parse --short HEAD)" >> $GITHUB_ENV fi - name: Install rustup run: curl https://sh.rustup.rs -sSf | sh -s -- --default-toolchain none -y - name: Install cross run: cargo install cross - name: Retrieve and export commit date, hash, and tags run: | echo "QW_COMMIT_DATE=$(TZ=UTC0 git log -1 --format=%cd --date=format-local:%Y-%m-%dT%H:%M:%SZ)" >> $GITHUB_ENV echo "QW_COMMIT_HASH=$(git rev-parse HEAD)" >> $GITHUB_ENV echo "QW_COMMIT_TAGS=$(git tag --points-at HEAD | tr '\n' ',')" >> $GITHUB_ENV - name: Build Lambda binary run: cross build --release --features lambda-release --target aarch64-unknown-linux-gnu -p 
quickwit-lambda-server --bin quickwit-aws-lambda-leaf-search env: QW_COMMIT_DATE: ${{ env.QW_COMMIT_DATE }} QW_COMMIT_HASH: ${{ env.QW_COMMIT_HASH }} QW_COMMIT_TAGS: ${{ env.QW_COMMIT_TAGS }} working-directory: ./quickwit - name: Create Lambda zip run: | cd quickwit/target/aarch64-unknown-linux-gnu/release cp quickwit-aws-lambda-leaf-search bootstrap zip quickwit-aws-lambda-${{ env.ASSET_VERSION }}-aarch64.zip bootstrap mv quickwit-aws-lambda-${{ env.ASSET_VERSION }}-aarch64.zip ../../../../ - name: Upload to GitHub release uses: quickwit-inc/upload-to-github-release@9b2c40fba23bf8dea05b7d2eece24cbc95d4a190 env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} with: file: quickwit-aws-lambda-${{ env.ASSET_VERSION }}-aarch64.zip overwrite: true draft: true tag_name: ${{ env.ASSET_VERSION }} ================================================ FILE: .github/workflows/publish_nightly_packages.yml ================================================ name: Build and publish nightly packages on: workflow_dispatch: schedule: - cron: "0 5 * * *" permissions: contents: read jobs: build-macos-binaries: name: Build ${{ matrix.target }} runs-on: macos-latest permissions: contents: write actions: write strategy: fail-fast: false matrix: target: [x86_64-apple-darwin, aarch64-apple-darwin] steps: - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - uses: ./.github/actions/cargo-build-macos-binary with: target: ${{ matrix.target }} version: nightly token: ${{ secrets.GITHUB_TOKEN }} build-linux-binaries: strategy: fail-fast: false matrix: target: [x86_64-unknown-linux-gnu, aarch64-unknown-linux-gnu] name: Build ${{ matrix.target }} runs-on: ubuntu-latest permissions: contents: write actions: write steps: - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - uses: ./.github/actions/cross-build-binary with: target: ${{ matrix.target }} version: nightly token: ${{ secrets.GITHUB_TOKEN }} ================================================ FILE: 
.github/workflows/publish_release_packages.yml ================================================ name: Build and publish release packages on: push: tags: - "v*" permissions: contents: read jobs: build-macos-binaries: name: Build ${{ matrix.target }} runs-on: macos-latest permissions: contents: write actions: write strategy: matrix: target: [x86_64-apple-darwin, aarch64-apple-darwin] steps: - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - name: Extract asset version run: echo "ASSET_VERSION=${GITHUB_REF/refs\/tags\//}" >> $GITHUB_ENV - uses: ./.github/actions/cargo-build-macos-binary with: target: ${{ matrix.target }} version: ${{ env.ASSET_VERSION }} token: ${{ secrets.GITHUB_TOKEN }} build-linux-binaries: strategy: matrix: target: [x86_64-unknown-linux-gnu, aarch64-unknown-linux-gnu] name: Build ${{ matrix.target }} runs-on: ubuntu-latest permissions: contents: write actions: write steps: - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - name: Extract asset version run: echo "ASSET_VERSION=${GITHUB_REF/refs\/tags\//}" >> $GITHUB_ENV - uses: ./.github/actions/cross-build-binary with: target: ${{ matrix.target }} version: ${{ env.ASSET_VERSION }} token: ${{ secrets.GITHUB_TOKEN }} ================================================ FILE: .github/workflows/requirements.txt ================================================ # contains pinned dependencies for installing pipenv to ensure repeatable builds in CI/CD workflows certifi==2025.10.5 \ --hash=sha256:0f212c2744a9bb6de0c56639a6f68afe01ecd92d91f14ae897c4fe7bbeeef0de \ --hash=sha256:47c09d31ccf2acf0be3f701ea53595ee7e0b8fa08801c6624be771df09ae7b43 distlib==0.4.0 \ --hash=sha256:9659f7d87e46584a30b5780e43ac7a2143098441670ff0a49d5f9034c54a6c16 \ --hash=sha256:feec40075be03a04501a973d81f633735b4b69f98b05450592310c0f401a4e0d filelock==3.20.3 \ --hash=sha256:18c57ee915c7ec61cff0ecf7f0f869936c7c30191bb0cf406f1341778d0834e1 \ 
--hash=sha256:4b0dda527ee31078689fc205ec4f1c1bf7d56cf88b6dc9426c4f230e46c2dce1 packaging==25.0 \ --hash=sha256:29572ef2b1f17581046b3a2227d5c611fb25ec70ca1ba8554b24b0e69331a484 \ --hash=sha256:d443872c98d677bf60f6a1f2f8c1cb748e8fe762d2bf9d3148b5599295b0fc4f pipenv==2025.0.4 \ --hash=sha256:36fc2a7841ccdb2f58a9f787b296c2e15dea3b5b79b84d4071812f28b7e8d7a2 \ --hash=sha256:e1fbe4cfd25ab179f123d1fbb1fa1cdc0b3ffcdb1f21c775dcaa12ccc356f2bb platformdirs==4.5.0 \ --hash=sha256:70ddccdd7c99fc5942e9fc25636a8b34d04c24b335100223152c2803e4063312 \ --hash=sha256:e578a81bb873cbb89a41fcc904c7ef523cc18284b7e3b3ccf06aca1403b7ebd3 virtualenv==20.36.1 \ --hash=sha256:575a8d6b124ef88f6f51d56d656132389f961062a9177016a50e4f507bbcc19f \ --hash=sha256:8befb5c81842c641f8ee658481e42641c68b5eab3521d8e092d18320902466ba ================================================ FILE: .github/workflows/scorecard.yml ================================================ name: OpenSSF Scorecard on: schedule: - cron: '0 0 * * 0' push: branches: - main permissions: contents: read jobs: analysis: name: Scorecards analysis runs-on: ubuntu-latest permissions: # Needed to upload the results to code-scanning dashboard. security-events: write # Needed to publish results id-token: write actions: read contents: read steps: - name: 'Checkout code' uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 with: persist-credentials: false - name: 'Run analysis' uses: ossf/scorecard-action@4eaacf0543bb3f2c246792bd56e8cdeffafb205a # v2.4.3 with: results_file: results.sarif results_format: sarif repo_token: ${{ secrets.GITHUB_TOKEN }} publish_results: true # Upload the results as artifacts. - name: 'Upload artifact' uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0 with: name: SARIF file path: results.sarif retention-days: 5 # Upload the results to GitHub's code scanning dashboard. 
- name: 'Upload to code-scanning' uses: github/codeql-action/upload-sarif@cdefb33c0f6224e58673d9004f47f7cb3e328b89 # v4.31.10 with: sarif_file: results.sarif ================================================ FILE: .github/workflows/ui-ci.yml ================================================ name: UI CI on: workflow_dispatch: pull_request: paths: - "quickwit/quickwit-ui/**" - ".github/workflows/ui-ci.yml" push: branches: - main - trigger-ci-workflow paths: - "quickwit/quickwit-ui/**" - ".github/workflows/ui-ci.yml" permissions: contents: read jobs: checks: name: Lint, type check & unit tests runs-on: ubuntu-latest steps: - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - uses: actions/setup-node@395ad3262231945c25e8478fd5baf05154b1d79f # v6.1.0 with: node-version: 24 cache: "yarn" cache-dependency-path: quickwit/quickwit-ui/yarn.lock - name: Install JS dependencies run: yarn --cwd quickwit-ui install working-directory: ./quickwit - name: Lint run: yarn --cwd quickwit-ui lint working-directory: ./quickwit - name: Type check run: yarn --cwd quickwit-ui type working-directory: ./quickwit - name: Unit tests run: yarn --cwd quickwit-ui test working-directory: ./quickwit e2e: name: Playwright e2e runs-on: ubuntu-latest permissions: contents: read actions: write services: postgres: image: postgres:latest ports: - 5432:5432 env: POSTGRES_USER: quickwit-dev POSTGRES_PASSWORD: quickwit-dev POSTGRES_DB: quickwit-metastore-dev options: >- --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5 env: CARGO_INCREMENTAL: 0 RUST_BACKTRACE: 1 RUSTFLAGS: -Dwarnings --cfg tokio_unstable RUSTDOCFLAGS: -Dwarnings -Arustdoc::private_intra_doc_links QW_TEST_DATABASE_URL: postgres://quickwit-dev:quickwit-dev@postgres:5432/quickwit-metastore-dev steps: - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 - uses: actions/setup-node@395ad3262231945c25e8478fd5baf05154b1d79f # v6.1.0 with: node-version: 24 cache: "yarn" 
cache-dependency-path: quickwit/quickwit-ui/yarn.lock - name: Setup stable Rust Toolchain uses: dtolnay/rust-toolchain@f7ccc83f9ed1e5b9c81d8a67d7ad1a747e22a561 # master with: toolchain: stable - name: Setup Rust cache uses: Swatinem/rust-cache@779680da715d629ac1d338a641029a2f4372abb5 # v2.8.2 with: workspaces: "./quickwit -> target" shared-key: "quickwit-cargo" - name: Install JS dependencies run: yarn --cwd quickwit-ui install working-directory: ./quickwit - name: Install Playwright browsers run: npx playwright install chromium --with-deps --only-shell working-directory: ./quickwit/quickwit-ui - name: Build UI run: CI=false yarn --cwd quickwit-ui build working-directory: ./quickwit - name: Build Quickwit run: | sudo apt-get update && sudo apt-get -y install protobuf-compiler cargo build --features=postgres working-directory: ./quickwit - name: Run e2e tests run: | mkdir -p qwdata cargo run --features=postgres -- run --service searcher --service metastore --config ../config/quickwit.yaml & yarn --cwd quickwit-ui e2e-test working-directory: ./quickwit ================================================ FILE: .gitignore ================================================ # Generated by Cargo # will have compiled files and executables **/target/** **/proptest-regressions **/perf.data* **/flamegraph.svg local/** quickwit/quickwit-ui/package-lock.json **/.DS_Store TODO.md QUESTIONS.txt # Remove Cargo.lock from gitignore if creating an executable, leave it for libraries # More information here https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html #Cargo.lock # These are backup files generated by rustfmt **/*.rs.bk .env .idea .vscode .vscode-license deps elastic-search-artifacts qwdata # Generated by prost/tonic build *_descriptor.bin ================================================ FILE: .localstack/init.sh ================================================ #!/usr/bin/env bash set -eu awslocal s3 mb s3://quickwit-dev awslocal s3 mb s3://quickwit-integration-tests 
&& awslocal s3 rm --recursive s3://quickwit-integration-tests

if ! awslocal kinesis list-streams | grep -q quickwit-dev-stream ; then
    awslocal kinesis create-stream --stream-name quickwit-dev-stream --shard-count 3
fi

================================================
FILE: CHANGELOG.md
================================================
# [0.9.0]

### Added

- Add Ingest V2 (#5600, #5566, #5463, #5375, #5350, #5252, #5202)
- Add SQS source (#5374, #5335, #5148)
- Disable control plane check for searcher (#5599, #5360)
- Partially implement `_elastic/_cluster/health` (#5595)
- Make Jaeger span attribute-to-tag conversion exhaustive (#5574)
- Use `content_length_limit` for ES bulk limit (#5573)
- Limit and monitor warmup memory usage (#5568)
- Add eviction metrics to caches (#5523)
- Record object storage request latencies (#5521)
- Add some kind of throttling on the janitor to prevent it from overloading (#5510)
- Prevent single split searches from different `leaf_search` from interleaving (#5509)
- Retry on S3 internal error (#5504)
- Allow specifying OTEL index ID in header (#5503)
- Add a metric to count storage errors and their error code (#5497)
- Add support for concatenated fields (#4773, #5369, #5331)
- Add number of splits per root/leaf search histograms (#5472)
- Introduce a searcher config option to timeout get requests (#5467)
- Add fingerprint to task in cluster state (#5464)
- Enrich root/leaf search spans with number of docs and splits (#5450)
- Add some additional search metrics (#5447)
- Improve GC resilience and add metrics (#5420)
- Enable force shutdown with 2nd Ctrl+C (#5414)
- Add request_timeout_secs config to searcher config (#5402)
- Memoize S3 client (#5377)
- Add more env var config for Postgres (#5365)
- Enable str fast field range queries (#5324)
- Allow querying non-existing fields (#5308)
- Support updating doc mapper through api (#5253)
- Add optional special handling for hex in code tokenizer (#5200)
- Added a circuit breaker layer (#5134)
- Various performance optimizations in Tantivy (https://github.com/quickwit-oss/tantivy/blob/main/CHANGELOG.md)

### Changed

- Parse datetimes and timestamps with leading and/or trailing whitespace (#5544)
- Restrict maturity period to retention (#5543)
- Wait for merge at end of local ingest (#5542)
- Log PostgreSQL metastore error (#5530)
- Update azure multipart policy (#5553)
- Stop relying on our own version of pulsar-rs (#5487)
- Handle nested OTLP values in attributes and log bodies (#5485)
- Improve merge pipeline finalization (#5475)
- Allow failed splits in root search (#5440)
- Batch delete from GC (#5404, #5380)
- Make some S3 errors retryable (#5384)
- Change default timestamps in OTEL logs (#5366)
- Only return root spans for Jaeger HTTP API (#5358)
- Share aggregation limit on node (#5357)

### Fixed

- Fix existence queries for nested fields (#5581)
- Fix lenient option with wildcard queries (#5575)
- Fix incompatible ES Java date format (#5462)
- Fix bulk api response order (#5434)
- Fix pulsar finalize (#5471)
- Fix pulsar URI scheme (#5470)
- Fix grafana searchers dashboard (#5455)
- Fix jaeger http endpoint (#5378)
- Fix file re-ingestion after EOF (#5330)
- Fix configuration interpolation (#5403)
- Fix jaeger duration parse error (#5518)
- Fix unit conversion in jaeger http search endpoint (#5519)

### Removed

- Remove support for 2-digit years in java datetime parser (#5596)
- Remove DocMapper trait (#5508)
- Remove support for AWS Lambda (#5884)
- Remove search stream endpoint (#5886)

# [0.8.1]

### Fixed

- Bug in the chitchat digest message serialization (chitchat#144)

## [0.8.0]

### Added

- Remove some noisy logs (#4447)
- Add `/{index}/_stats` and `/_stats` ES API (#4442)
- Use `search_after` in ES scroll API (#4280)
- Add support for wildcard exclusion in index patterns (#4458)
- Add `.` support in DSL identifiers (#3989)
- Add cat indices ES API (#4465)
- Limit concurrent merges (#4473)
- Add Index Template API and auto create index (#4456)
(only available with ingest V2)
- Add support for compressed ES `_bulk` requests (#4506)
- Add support for slash `/` character in field names (#4510)
- Handle SIGTERM shutdown signal (#4539)
- Add `start_timestamp` and `end_timestamp` filter to ES `_field_caps` API (#4547)
- Limit the number of merge pipelines that can be spawned concurrently (#4574)
- Add support for `_source_excludes` and `_source_includes` query parameters in ES API (#4572)
- Add gRPC metrics layer to clients and servers (#4591)
- Add additional cluster metrics (#4597)
- Add index patterns query param on GET `/indexes` endpoint (#4600)
- Add support for GCS file backed metastore (#4604)
- Add default search fields for OTEL traces index (#4602)
- Add support for delete index in ES API (#4606)
- Add a handler to dynamically change the log level (#4662)
- Add REST endpoint to parse a query into a query AST (#4652)
- Add postgresql index and use `IN` instead of many `OR` (#4670)
- Add support for `_source_excludes`, `_source_includes`, `extra_filters` in `_msearch` ES API (#4696)
- Handle `track_total_size` on request ES body (#4710)
- Add a metric for the number of indexes (#4711)
- Add various performance optimizations in Quickwit and Tantivy

More details in tantivy's [changelog](https://github.com/quickwit-oss/tantivy/blob/main/CHANGELOG.md).
### Fixed

- Fix aggregation result on empty index (#4449)
- Fix Gzip file source (#4457)
- Rate limit noisy logs (#4483)
- Prevent the exponential backoff from overflowing after 64 attempts (#4501)
- Remove field presence in ES `_field_caps` API (#4492)
- Remove `source` in ES parameter, remove unsupported field `fields` in response (#4590)
- Fix aggregation `split_size` parameter, add docs and test (#4627)
- Various fixes in chitchat (gossip): more details in [chitchat commit history](https://github.com/quickwit-oss/chitchat/commits/main/?since=2024-01-08&until=2024-03-13)
- Various fixes in mrecordlog (WAL): more details in [mrecordlog commit history](https://github.com/quickwit-oss/mrecordlog/commits/main/?since=2024-01-08&until=2024-03-13)

### Changed

- (Breaking) [Add ZSTD compression to chitchat's Deltas](https://github.com/quickwit-oss/chitchat/pull/112)

### Removed

### Migration from 0.7.x to 0.8.0

To deploy Quickwit 0.8.0, you must either:
- **shut down** your cluster **entirely** before deploying, or
- **restart all** the nodes of your cluster after deploying.

Because we made some breaking changes in the gossip protocol (chitchat), nodes running different versions of Quickwit cannot communicate with each other and crash upon receiving messages that do not match their release version. The new protocol is now versioned, and future updates of the gossip protocol will be backward compatible.
## [0.7.1]

### Added

- Add es _count API (#4410)
- Add _elastic/_field_caps API (#4350)
- Make gRPC message size configurable (#4388)
- Add API endpoint to get some control-plane internal info (#4339)
- Add Google Cloud Storage Implementation available for storage paths starting with `gs://` (#4344)

### Changed

- Return 404 on index not found in ES Bulk API (#4425)
- Allow $ and @ characters in field names (#4413)

### Fixed

- Assign all sources/shards, even if this requires exceeding the indexer (#4363)
- Fix traces doc mapping (service name set as fast) and update default otel logs index ID to `otel-logs-v0_7` (#4401)
- Fix parsing multi-line queries (#4409)
- Fix range query for optional fast field panics with Index out of bounds (#4362)

### Migration from 0.7.0 to 0.7.1

Quickwit 0.7.1 will create the new index `otel-logs-v0_7`, which is now used by default when ingesting data with the OTEL gRPC and HTTP API.

In the traces index `otel-traces-v0_7`, the `service_name` field is now fast. No migration is done if `otel-traces-v0_7` already exists. If you want the `service_name` field to be fast, you must first delete the existing `otel-traces-v0_7` index or create your own index.
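
As a rough illustration of the "delete the existing index first" step in the 0.7.0 to 0.7.1 migration note, the sketch below builds the HTTP request that would remove `otel-traces-v0_7`. It assumes that the node's REST API is reachable at `http://127.0.0.1:7280` and that index deletion is exposed as `DELETE /api/v1/indexes/<index_id>`; check both against your deployment. The helper name `build_delete_index_request` is hypothetical, and nothing is sent unless the last line is uncommented.

```python
from urllib.request import Request, urlopen

# Assumption: REST listen address of a local Quickwit node.
QUICKWIT_URL = "http://127.0.0.1:7280"


def build_delete_index_request(index_id: str, base_url: str = QUICKWIT_URL) -> Request:
    """Build (but do not send) a DELETE request for the given index.

    Assumes the index management endpoint is `DELETE /api/v1/indexes/<index_id>`.
    """
    url = f"{base_url.rstrip('/')}/api/v1/indexes/{index_id}"
    return Request(url, method="DELETE")


request = build_delete_index_request("otel-traces-v0_7")
print(request.get_method(), request.full_url)

# Sending the request permanently deletes the index and its data:
# urlopen(request)
```

Because deleting the index discards the data it already holds, creating a separate index with your own doc mapping may be the safer of the two options mentioned above.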
## [0.7.0]

### Added

- Elasticsearch-compatible API
  - Added scroll and search_after APIs and support for multi-index search queries
  - Added exists, multi-match, match phrase prefix, match bool prefix, bool queries
  - Added `_field_caps` API
- Added support for OTLP over HTTP API (Protobuf only) (#4335)
- Added Jaeger REST endpoints for Grafana tracing support (#4197)
- Added support for injecting custom HTTP headers and moved REST config parameters into REST config section (#4198)
- Added support for OTLP trace data in arbitrary sources
- Commit Kafka offsets on suggest truncate (#3638)
- Honor `auto.offset.reset` parameter in Kafka source (#4095)
- Added exact count optimization (#4019)
- Added stream splits gRPC (#4109)
- Adding a split cache in Searchers (#3857)
- Added `coerce` and `output_format` options for numeric fields (#3704)
- Added `PhraseMatchQuery` and `MultiMatchQuery` (#3727)
- Added Elasticsearch's `TermsQuery` (#3747)
- Added GCP PubSub source (#3720)
- Parse timestamp strings (#3639)
- Added Digital Ocean storage flavor (#3632)
- Added new tokenizers: `source_code_default`, `source_code`, `multilang` (#3647, #3655, #3608)

### Fixed

- Fixed dates in UI (#4277)
- Fixed duplicate splits planned on pipeline crash-respawn (#3854)
- Fixed sorting (#3799)

More details in tantivy's [changelog](https://github.com/quickwit-oss/tantivy/blob/main/CHANGELOG.md).
### Changed

- Improve OTEL traces index config (#4311)
- OTEL endpoints now use the indexes `otel-logs-v0_7` and `otel-traces-v0_7` by default instead of `otel-logs-v0_6` and `otel-traces-v0_6`
- OTEL indexes have more fields stored as "fast" and have Trace and Span ID bytes field in hex format
- Increased the gRPC payload limits from 10MiB to 20MiB (#4227)
- Reject malformed Elasticsearch API requests (#4175)
- Better logging when doc processing fails (#4323)
- Search performance improvements
- Indexing performance improvements

### Removed

### Migration from 0.6.x to 0.7

The format of the index and internal objects stored in the metastore of 0.7 is backward compatible with 0.6.

If you are using the OTEL indexes and ingesting data into the indexes `otel-logs-v0_6` and `otel-traces-v0_6`, you must stop indexing before upgrading. Indeed, the first time you start Quickwit 0.7, it will update the doc mapping fields of Trace ID and Span ID of those two indexes by changing their input/output formats from base64 to hex. This is automatic: you don't have to perform any manual operation.

Quickwit 0.7 will create new indexes `otel-logs-v0_7` and `otel-traces-v0_7`, which are now used by default when ingesting data with the OTEL gRPC and HTTP API. The Jaeger gRPC and HTTP APIs will query both `otel-traces-v0_6` and `otel-traces-v0_7` by default.

It's possible to define the index ID you want to use for OTEL gRPC endpoints and Jaeger gRPC API by setting the request header `qw-otel-logs-index` or `qw-otel-traces-index` to the index ID you want to target.

## [0.6.1]

### Added

- Support of phrase prefix queries in the query language.

### Fixed

- Fix timestamp field which was not allowed when defined in an object mapping.
- Fix querying of integer on a JSON field (no documents were returned).

## [0.6.0] - 2023-06-03

### Added

- Elasticsearch/Opensearch compatible API.
- New columnar format:
  - Fast fields can now have any cardinality (Optional, Multivalued, restricted).
    In fact, cardinality is now only used to format the output.
  - Dynamic Fields are now fast fields.
  - String fast fields now can be normalized.
- Various parameters of object storages can now be configured.
- The ingest API makes it possible to force a commit, or wait for a scheduled commit to occur.
- Ability to parse non-JSON data using VRL to extract some structure from documents.
- Object storage can now use the `virtual-hosted-style`.
- `date_histogram` aggregation.
- `percentiles` aggregation.
- Added support for Prefix Phrase query.
- Added support for range queries.
- The query language now supports different date formats.
- Added support for base16 input/output configuration for bytes field. You can search for bytes fields using base16 encoded values.
- Autotagging: fields used in the partition key are automatically added to tags.
- Added arm64 docker image.
- Added CORS configuration for the REST API.

### Fixed
- Fixed a major bug that required restarting Quickwit when deleting and recreating an index with the same name.
- The number of concurrent GET requests to object stores is now limited. This fixes a bug observed when requesting a lot of documents from MinIO.
- Quickwit now searches into resource attributes when receiving a Jaeger request carrying tags
- Object storage can be configured to:
  - Avoid Bulk delete API (workaround for Google Cloud Storage).
  - Use virtual-host style addresses (workaround for Alibaba Object Storage Service).
- Fix aggregation min doc_count empty merge bug.
- Fix: Sort order for term aggregations.
- Switch to ms in histogram for date type (aligning with ES).

### Improvements
- Search performance improvement.
- Aggregation performance improvement.
- Aggregation memory improvement.

More details in tantivy's [changelog](https://github.com/quickwit-oss/tantivy/blob/main/CHANGELOG.md).

### Changed
- Datetimes now have up to nanosecond precision.
- By default, Quickwit now uses the node's hostname as the default node ID.
- By default, Quickwit is in dynamic mode and all dynamic fields are marked as fast fields.
- JSON field uses by default the raw tokenizer and is set to fast field.
- Various performance/compression improvements.
- OTEL indexes Trace ID and Span ID are now bytes fields.
- OTEL indexes store timestamps with nanosecond precision.
- Span status is now indexed in the OTEL trace index.
- Default and raw tokenizers filter tokens longer than 255 bytes instead of 40 bytes.

## [0.5.0] - 2023-03-16

### Added
- gRPC OpenTelemetry Protocol support for traces
- gRPC OpenTelemetry Protocol support for logs
- Control plane (indexing tasks scheduling)
- Ingest API rate limiter
- Pulsar source
- VRL transform for data sources
- REST API enhanced to fully manage indexes, sources, and splits
- OpenAPI specification and swagger UI for all REST available endpoints
- Large responses from REST API can be compressed
- Add bulk stage splits method to metastore
- MacOS M1 binary
- Doc mapping field names starting with `_` are now valid

### Fixed
- Fix UI index completion on search page
- Fix CLI index describe command to show stats on published splits
- Fix REST API to always return on error a body formatted as `{"message": "error message"}`
- Fixed REST status code when deleting a nonexistent index or source and when fetching splits on a nonexistent index

### Changed
- Source config schema (breaking or not? use serde rename to be not breaking?)
- RocksDB replaced by [mrecordlog](https://github.com/quickwit-oss/mrecordlog) to store ingest API queues records - (Breaking) Indexing partition key new DSL - (Breaking) Helm chart updated with the new CLI - (Breaking) CLI indexes, sources, and splits commands use the REST API - (Breaking) Index new format: you need to reindex all your data ## [0.4.0] - 2022-12-03 ### Added - Boolean, datetime, and IP address fields - Chinese tokenizer - Distributed indexing (Kafka only) - gRPC metastore server - Index partitioning - Kubernetes - Node config templating - Prometheus metrics - Retention policies - REST API for CRUD operations on indexes/sources - Support for Azure Blob Storage - Support for BM25 document scoring - Support for deletions - Support for slop in phrase queries - Support for snippeting ### Fixed - Fixed cache misses during search fetch docs phase - Fixed credentials leak in metastore URI - Fixed GC scalability issues - Fixed support for multi-source ### Changed - Changed default docstore block size to 1 MiB and compression algorithm to ZSTD - Quickwit now relies on sqlx rather than Diesel for PostgreSQL interactions. Migrating from 0.3 should work as expected. Migrating from earlier version however is not supported. 
### Removed - Removed support for i64 as timestamp field - Removed support for sorting index by field ### Security - Forbid access to paths with `..` at storage level ## [0.3.1] - 2022-06-22 ### Added - Add support for Google Cloud Storage - Sort hits by timestamp desc by default in search UI - Add `description` attribute to field mappings - Display split state in output of `quickwit split list` command ### Fixed - Clean up local split cache after index deletion - Fix API URLs displayed for copy and paste in UI - Fix custom S3 endpoint with trailing `/` - Fix `quickwit index create` command with `--overwrite` option ## [0.3.0] - 2022-05-31 ### Added - Embedded UI for displaying search hits and cluster state - Schemaless indexing with JSON field - Ingest API (Elasticsearch-compatible) - Aggregation queries - Support for Amazon Kinesis ### Fixed - Switched cluster membership algorithm from S.W.I.M. to Chitchat ### Removed - u64 as date field ## [0.2.1] - 2022-02-28 ### Added - Query validation against index schema before dispatch to leaf nodes (#1109, @linxGnu) - Support for custom S3 endpoint (#1108) - Warm up terms and fastfields concurrently (#1147) ### Fixed - Minor bug in leaf search stream (#1110) - Default index root URI and metastore URI correctly default to data dir (#1140, @ddelemeny) ### Removed - QW_ENV environment variable ### Security - Compiled binaries with Rust 1.58.1, which fixes CVE-2022-21658 ## [0.2.0] - 2022-01-12 ## [0.1.0] - 2021-07-13 ================================================ FILE: CODE_OF_CONDUCT.md ================================================ # Contributor Covenant Code of Conduct ## Our Pledge We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal 
appearance, race, caste, color, religion, or sexual identity and orientation. We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community. ## Our Standards Examples of behavior that contributes to a positive environment for our community include: * Demonstrating empathy and kindness toward other people * Being respectful of differing opinions, viewpoints, and experiences * Giving and gracefully accepting constructive feedback * Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience * Focusing on what is best not just for us as individuals, but for the overall community Examples of unacceptable behavior include: * The use of sexualized language or imagery, and sexual attention or advances of any kind * Trolling, insulting or derogatory comments, and personal or political attacks * Public or private harassment * Publishing others' private information, such as a physical or email address, without their explicit permission * Other conduct which could reasonably be considered inappropriate in a professional setting ## Enforcement Responsibilities Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful. Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate. ## Scope This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. 
Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. ## Enforcement Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at adrien+cc at quickwit dot io. All complaints will be reviewed and investigated promptly and fairly. All community leaders are obligated to respect the privacy and security of the reporter of any incident. ## Enforcement Guidelines Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct: ### 1. Correction **Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community. **Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested. ### 2. Warning **Community Impact**: A violation through a single incident or series of actions. **Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban. ### 3. Temporary Ban **Community Impact**: A serious violation of community standards, including sustained inappropriate behavior. **Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. 
No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban. ### 4. Permanent Ban **Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals. **Consequence**: A permanent ban from any sort of public interaction within the community. ## Attribution This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0, available at [https://www.contributor-covenant.org/version/2/0/code_of_conduct.html][v2.0]. Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder][Mozilla CoC]. For answers to common questions about this code of conduct, see the FAQ at [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at [https://www.contributor-covenant.org/translations][translations]. [homepage]: https://www.contributor-covenant.org [v2.0]: https://www.contributor-covenant.org/version/2/0/code_of_conduct.html [Mozilla CoC]: https://github.com/mozilla/diversity [FAQ]: https://www.contributor-covenant.org/faq [translations]: https://www.contributor-covenant.org/translations ================================================ FILE: CODE_STYLE.md ================================================ # Quickwit Coding Style This document resumes a couple of points we try to embrace in our coding style. Some of these points take an opinionated side on a trade-off story. The description will try to make that clear. The driving motivation of this code style is to make your code more readable. Readable is one word that hides several dimensions: - the reader understands the intent very rapidly - the reader can proofread. It can become confident that the code is correct very easily. 
Noticing how the two are different should not require too much squinting.

Shoot for *proofreadability*.

## Code reviews

Do a pass on your own code before sending it for review to avoid wasting the reviewer's time. Also, trivial code style issues can get in the way and keep the reviewer from spotting deeper issues with the code.

As a reviewer, your first mission is proofreading. If you find a logical bug, feel good. You did an awesome job today.

Your second goal is to make sure the code quality stays high. You can express "nitpicks": suggestions about some local aspect of the code that do not matter too much. Just prepend "nitpick:" to your comment. You can also express an opinion/advice that you know is not universal. Make sure you make it clear to the reviewee that it is fine to ignore the comment.

Do not use rhetorical questions... If you are 95% sure of something, there is no need to express it as a question. Prefer `I believe this should be n+1` to `Shouldn't this be n+1?`. The issue with rhetorical questions is that when you do have a genuine question, reviewees may overinterpret it as an assertion.

As a reviewee, if you are not used to CRs, it can feel like an adversarial process. Relax. It is normal to end up with a lot of comments on your first few CRs. You might feel that the comments are unjustified; try as much as possible not to feel frustrated. If you want to discuss it, the best place is the chat, or maybe send a PR to modify this document. But remember to pick your battles... If you think it does not matter much but it takes 2 secs to fix, just consider doing what is suggested by the reviewer or this style guide.

## Rust gives us a lot of tools... this does not mean we need to abuse them.

Rust is an amazing language. It offers all kinds of tools to allow for zero-cost code reuse. Within these tools, however, generics and macros tend to hurt readability (and compile-time). Let's ONLY use them where necessary.

The same goes for the chaining iterator style.
When coupled with error handling, Rust's chaining iterator style can hurt readability. Using a good old procedural for-loop is fine and recommended in that case.

**example needed**

## Naming

Function and variable names are key for readability. A good function name is often sufficient for the reader to build reasonable expectations of what it does. If this implies long names, let's have very long names.

Trying to fit this rule has an interesting side effect. Nobody likes to type long function names. It just feels ugly. But long names are frequently a symptom of badly organized code, and they can help spot refactoring opportunities.

**example needed**

## Explanatory variables

One incredibly powerful and simple tool to help make your code more readable is to introduce explanatory variables. Explanatory variables are intermediary variables that are not strictly necessary, but make it possible, through their names, to convey their semantics to the reader.

**example needed**

## Shadowing

As much as possible, do not reuse the same variable name in a function. It is never necessary, very rarely helpful and can hurt.

## Types

Rust supports type inference. That's great. Chances are, your editor even automatically hints the type of your variables. Sometimes, however, it can be helpful for the reviewer to have the type of some very strategic variables.

**example needed**

## Early returns

We prefer early returns. Rather than chaining `else` statements, we prefer to isolate corner cases in short `if` statements to prevent nesting.

**example needed**

## Invariants

A good idea to help reviewers proofread your code is to identify invariants and express them as `debug_assert`. These assertions will not be part of the release binary and won't hurt the execution time.

**example needed**

## Errors and log messages

Error and log messages follow the same format. They should be concise, lowercase (except proper names), and without trailing punctuation.
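As a minimal sketch of this format, the snippet below defines a hypothetical `MetastoreError` enum (an assumption made for illustration, not a type taken from the Quickwit codebase) whose `Display` messages are concise, lowercase apart from proper names, and free of trailing punctuation. It relies only on the standard library.

```rust
use std::fmt;

/// Hypothetical error type used only to illustrate the message format.
/// It is not part of the Quickwit codebase.
#[derive(Debug)]
pub enum MetastoreError {
    SplitNotFound { split_id: String },
    InvalidUri { uri: String },
}

impl fmt::Display for MetastoreError {
    fn fmt(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Concise, lowercase, no trailing punctuation.
            MetastoreError::SplitNotFound { split_id } => {
                write!(formatter, "could not find split metadata for split `{split_id}`")
            }
            // Proper names such as PostgreSQL keep their capitalization.
            MetastoreError::InvalidUri { uri } => {
                write!(formatter, "cannot parse PostgreSQL URI `{uri}`")
            }
        }
    }
}

impl std::error::Error for MetastoreError {}

/// Hypothetical helper that renders the message for a missing split.
pub fn describe_missing_split(split_id: &str) -> String {
    MetastoreError::SplitNotFound { split_id: split_id.to_string() }.to_string()
}
```

Keeping the message free of capitalization and trailing punctuation makes it easy to embed in a longer error chain or in a log line without reformatting it.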
As a loose rule, where it does not hurt readability, log messages should rely on `tracing` structured logging instead of templating.

In other words, prefer:
`warn!(remaining=remaining_attempts, "trubulizor rpc plane retry failed")`
to
`warn!("trubulizor rpc plane retry failed ({remaining_attempts} attempts remaining)")`

### Error Examples
- "failed to start actor runtimes"
- "cannot join PostgreSQL URI {} with path {:?}"
- "could not find split metadata in Metastore {}"
- "unknown output format {:?}"

### Log examples

## Comments

We use the same code style as [rustc's doc comments](https://doc.rust-lang.org/1.0.0/style/style/comments.html).

In particular, the summary line should be written in third-person singular present indicative form.

No rustdoc in Quickwit or in private API is ok. No rustdoc on Tantivy public API is not ok.

We usually do not expect comments to contain any implementation details. To some extent, it is normal for the user to have to look at the code. When it is not clear, comments should convey:
- intent
- context (links to a Wikipedia page or a paper, link to the original issue can be helpful too)
- hidden contracts... but really you should avoid those.

Inline comments in the code can be very useful to help the reader understand the justification of a thorny piece of code.

**example needed**

## Hidden contracts

We call a hidden contract a precondition on the arguments that is not enforced by their types. Sometimes, hidden contracts are unavoidable. For instance, a binary search requires the array to be sorted.

Whenever possible, you should avoid having hidden contracts. To avoid hidden contracts, you should consider:
- changing your argument types to have the type system enforce the contract
- internalizing the contract enforcement.
For instance, the following function is not good because it hides a contract on values not being empty:

```
fn min(&self, values: &[usize]) -> usize {
    // Hidden contract: `values` must not be empty.
    let mut min_val = usize::MAX;
    for val in values {
        min_val = min_val.min(*val)
    }
    min_val
}
```

The contract can be internalized by changing the prototype to return a `Result` or an `Option`. In addition, while the author might have thought that the `usize::MAX` trick was a nice touch, it can easily backfire. Panicking is often better than returning a wrong result. The better approach here is of course an `Option` like `Iterator::min` does.

Another way to internalize the contract enforcement is to move some logic from the caller to within the function. For instance:

```
// The algorithm requires splits to be sorted by `end_time`
fn merge_candidates(splits: &mut Vec) -> Vec
```

It is tempting to rely on the fact that splits `Vec` is always sorted on the caller side and put this as a hidden contract. If it is not too much work, just redoing the sorting within merge candidates is a good idea. For the above function, that extra work is tiny. By the way, did you know Rust's std sort is inspired by timsort? It will perform in linear time if the array is already sorted...

When implementing a function with a hidden contract, as long as it does not hurt the overall performance, add an assert statement to your code to check the contract. (For instance, check that the array is sorted).

**example needed**

## Tests

Tests do not need to match the same quality as the original code. When a bug is encountered, it is ok to introduce a test that seems weirdly overfitted to the specific issue. A comment should then add a link to the issue.

Unit tests should run fast, and if possible they should not do any IO. Code should be structured to make unit testing possible.

Some of our unit tests would not be considered good unit tests in some companies, and that's ok.
Here are the controversial bits:

### Not just for spotting regression

Our unit tests are not here just to spot regressions. They are also here to check the correctness of our code.

### Not just testing public API

Unit tests do not only test the public API. Complex code often calls half a dozen smaller functions. The number of corner cases of the complex code can make it difficult to test all of them. On the other hand, the smaller functions could be tested exhaustively. For this reason, testing internal private functions is actually encouraged.

### Not always "unit" tests

Ideally, unit tests should be testing one thing and one thing only, but if they don't and it helps cover more ground, this is ok.

### Not necessarily deterministic.

Finally, unit tests are not necessarily deterministic. We really like proptests. When proptesting, make sure to reduce as much as possible the space of exploration to get the most out of it.

## async vs sync

Your async code should block for at most 500 microseconds. If you are unsure whether your code blocks for 500 microseconds, or if it is a non-trivial question, it should run via `tokio::task::spawn_blocking`.

================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to Quickwit

There are many ways to contribute to Quickwit. Code contributions are welcome of course, but bug reports, feature requests, and evangelizing are just as valuable.

# Submitting a PR

Check if your issue is already listed on [github](https://github.com/quickwit-oss/quickwit/issues). If it is not, create your own issue.

Please add the following phrase at the end of your commit message `Closes #`. It will automatically link your PR in the issue page. Also, once your PR is merged, it will close the issue. If your PR only partially addresses the issue and you would like to keep it open, just write `See #`.

Feel free to send your contribution in an unfinished state to get early feedback.
In that case, simply mark the PR with the tag [WIP] (standing for work in progress). ## PR verification checks When you submit a pull request to the project, the CI system runs several verification checks. After your PR is merged, a more exhaustive list of tests will be run. You will be notified by email from the CI system if any issues are discovered, but if you want to run these checks locally before submitting PR or in order to verify changes you can use the following commands in the root directory: 1. To verify that all tests are passing, run `make test-all`. 2. To fix code style and format as well as catch common mistakes run `make fix`. Alternatively, run `make -k test-all docker-compose-down` to tear down the Docker services after running all the tests. 3. To build docs run `make build-rustdoc`. # Development ## Setup & run tests ### Local Development 1. Install Rust, CMake, Docker (https://docs.docker.com/engine/install/) and Docker Compose (https://docs.docker.com/compose/install/) 2. Install node@24 and `npm install -g yarn` 3. Install awslocal https://github.com/localstack/awscli-local 4. Install protoc https://grpc.io/docs/protoc-installation/ (you may need to install the latest binaries rather than your distro's flavor) 5. Install nextest https://nexte.st/docs/installation/pre-built-binaries/ ### GitHub Codespaces [![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/quickwit-oss/quickwit?devcontainer_path=.devcontainer/devcontainer.json) GitHub Codespaces provides a fully configured development environment in the cloud, making it easy to get started with Quickwit development. By clicking the badge above, you can create a codespace with all the necessary tools installed and configured. ### Running tests Run `make test-all` to run all tests. ## Useful commands * `make test-all` - starts necessary Docker services and runs all tests. 
* `make -k test-all docker-compose-down` - the same as above, but tears down the Docker services after running all the tests. * `make fmt` - runs formatter, this command requires the nightly toolchain to be installed by running `rustup toolchain install nightly`. * `make fix` - runs formatter and clippy checks as well as removing unused dependencies (requires `cargo install cargo-machete`). * `make typos` - runs the spellcheck tool over the codebase. (Install by running `cargo install typos-cli`) * `make doc` - builds docs. * `make docker-compose-up` - starts Docker services. * `make docker-compose-down` - stops Docker services. * `make docker-compose-logs` - shows Docker logs. ## Start the UI 1. Switch to the `quickwit` subdirectory of the project and create a data directory `qwdata` there if it doesn't exist 2. Start a server `cargo r run --config ../config/quickwit.yaml` 3. `yarn --cwd quickwit-ui install` and `yarn --cwd quickwit-ui start` 4. Open your browser at `http://localhost:3000/ui` if it doesn't open automatically ## Running UI Tests 1. Run `yarn --cwd quickwit-ui install` and `yarn --cwd quickwit-ui test` in the `quickwit` directory ## Running UI e2e tests 1. Ensure to run a searcher `cargo r run --service searcher --config ../config/quickwit.yaml` 2. Run `yarn --cwd quickwit-ui e2e-test` ## Running services such as Amazon Kinesis or S3, Kafka, or PostgreSQL locally. 1. Ensure Docker and Docker Compose are correctly installed on your machine (see above) 2. Run `make docker-compose-up` to launch all the services or `make docker-compose-up DOCKER_SERVICES=kafka,postgres` to launch a subset of services. ## Tracing with Jaeger 1. Ensure Docker and Docker Compose are correctly installed on your machine (see above) 2. Start the Jaeger services (UI, collector, agent, ...) running the command `make docker-compose-up DOCKER_SERVICES=jaeger` 3. 
Start Quickwit with the following environment variables: ``` OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 QW_ENABLE_OPENTELEMETRY_OTLP_EXPORTER=true ``` 4. Open your browser and visit [localhost:16686](http://localhost:16686/) ## Using tokio console 1. Install tokio-console by running `cargo install tokio-console`. 2. Install the quickwit binary in the quickwit-cli folder `RUSTFLAGS="--cfg tokio_unstable" cargo install --path . --features tokio-console` 3. Launch a long running command such as index and activate tokio with the: `QW_ENABLE_TOKIO_CONSOLE=1 quickwit index ...` 4. Run `tokio-console`. ## Building binaries Currently, we use [cross](https://github.com/rust-embedded/cross) to build Quickwit binaries for different architectures. For this to work, we've had to customize the docker images cross uses. These customizations can be found in docker files located in the `./cross-images` folder. To make cross take into account any change on those docker files, you will need to build and push the images on Docker Hub by running `make cross-images`. We also have nightly builds that are pushed to Docker Hub. This helps continuously check that our binaries are still built even with external dependency updates. Successful builds let you access the artifacts for the next three days. Release builds always have their artifacts attached to the release. ## Docker images Each merge on the `main` branch triggers the build of a new Docker image available on DockerHub at `quickwit/quickwit:edge`. Tagging a commit also creates a new image `quickwit/quickwit:` if the tag name starts with `v*` or `qw*`. The Docker images are based on Debian. ### Notes on the embedded UI As the react UI is embedded in the rust binary, we need to build the react app before building the binary. Hence `make cross-image` depends on the command `build-ui`. 
## Testing release (alpha, beta, rc) The following Quickwit installation command `curl -L https://install.quickwit.io | sh` always installs the latest stable version of quickwit. To make it easier in installing and testing new (alpha, beta, rc) releases, you can manually pull and execute the script as `./install.sh --allow-any-latest-version`. This will force the script to install any latest available release package. ## Tracking licenses We keep track of the licenses used by the open source crates used by this project using [`rust-license-tool`](https://github.com/DataDog/rust-license-tool). The listing is checked every time CI is run. To update the listing, install the tool with `cargo install --git https://github.com/DataDog/rust-license-tool` and then run `dd-rust-license-tool write`. If there are any errors, you may need to update the listing of exceptions in `license-tool.toml`. # Documentation Quickwit documentation is located in the docs directory. ## Generating the CLI docs. The [CLI doc page](docs/reference/cli.md) is partly generated by a script. To update it, first run the script: ```bash cargo run --bin generate_markdown > ../docs/reference/cli_insert.md ``` Then manually edit the [doc page](docs/reference/cli.md) to update it and delete the generated file. There are two comments to indicate where you want to insert the new docs and where it ends: ```markdown [comment]: <> (Insert auto generated CLI docs from here.) ...docs to insert... [comment]: <> (End of auto generated CLI docs.) 
```

================================================
FILE: Dockerfile
================================================
FROM node:24@sha256:b2b2184ba9b78c022e1d6a7924ec6fba577adf28f15c9d9c457730cc4ad3807a AS ui-builder

COPY quickwit/quickwit-ui /quickwit/quickwit-ui

WORKDIR /quickwit/quickwit-ui

RUN touch .gitignore_for_build_directory \
    && NODE_ENV=production make install build

FROM rust:bookworm@sha256:b5efaabfd787a695d2e46b37d3d9c54040e11f4c10bc2e714bbadbfcc0cd6c39 AS bin-builder

ARG CARGO_FEATURES=release-feature-set
ARG CARGO_PROFILE=release
ARG QW_COMMIT_DATE
ARG QW_COMMIT_HASH
ARG QW_COMMIT_TAGS

ENV QW_COMMIT_DATE=$QW_COMMIT_DATE
ENV QW_COMMIT_HASH=$QW_COMMIT_HASH
ENV QW_COMMIT_TAGS=$QW_COMMIT_TAGS

RUN apt-get -y update \
    && apt-get -y install ca-certificates \
                          clang \
                          cmake \
                          libssl-dev \
                          llvm \
                          protobuf-compiler \
    && rm -rf /var/lib/apt/lists/*

COPY quickwit /quickwit
COPY config/quickwit.yaml /quickwit/config/quickwit.yaml
COPY --from=ui-builder /quickwit/quickwit-ui/build /quickwit/quickwit-ui/build

WORKDIR /quickwit

RUN rustup toolchain install

RUN echo "Building workspace with feature(s) '$CARGO_FEATURES' and profile '$CARGO_PROFILE'" \
    && RUSTFLAGS="--cfg tokio_unstable" \
        cargo build \
        -p quickwit-cli \
        --features $CARGO_FEATURES \
        --bin quickwit \
        $(test "$CARGO_PROFILE" = "release" && echo "--release") \
    && echo "Copying binaries to /quickwit/bin" \
    && mkdir -p /quickwit/bin \
    && find target/$CARGO_PROFILE -maxdepth 1 -perm /a+x -type f -exec mv {} /quickwit/bin \;

FROM debian:bookworm-slim@sha256:e899040a73d36e2b36fa33216943539d9957cba8172b858097c2cabcdb20a3e2 AS quickwit

LABEL org.opencontainers.image.title="Quickwit"
LABEL maintainer="Quickwit, Inc. "
LABEL org.opencontainers.image.vendor="Quickwit, Inc."
LABEL org.opencontainers.image.licenses="Apache-2.0"

RUN apt-get -y update \
    && apt-get -y install ca-certificates \
                          libssl3 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /quickwit
RUN mkdir config qwdata
COPY --from=bin-builder /quickwit/bin/quickwit /usr/local/bin/quickwit
COPY --from=bin-builder /quickwit/config/quickwit.yaml /quickwit/config/quickwit.yaml

ENV QW_CONFIG=/quickwit/config/quickwit.yaml
ENV QW_DATA_DIR=/quickwit/qwdata
ENV QW_LISTEN_ADDRESS=0.0.0.0

RUN quickwit --version

ENTRYPOINT ["quickwit"]

================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:

(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.

Copyright 2021-Present Datadog, Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

================================================
FILE: LICENSE-3rdparty.csv
================================================
Component,Origin,License,Copyright
adler2,https://github.com/oyvindln/adler2,0BSD OR MIT OR Apache-2.0,"Jonas Schievink , oyvindln "
advapi32-sys,https://github.com/retep998/winapi-rs,MIT,Peter Atashian
ahash,https://github.com/tkaitchuck/ahash,MIT OR Apache-2.0,Tom Kaitchuck
aho-corasick,https://github.com/BurntSushi/aho-corasick,Unlicense OR MIT,Andrew Gallant
aliasable,https://github.com/avitex/rust-aliasable,MIT,avitex
alloca,https://github.com/playXE/alloca-rs,MIT,"Adel Prokurov , StackOverflowExcept1on"
allocator-api2,https://github.com/zakarumych/allocator-api2,MIT OR Apache-2.0,Zakarum
android_system_properties,https://github.com/nical/android_system_properties,MIT OR Apache-2.0,Nicolas Silva
anes,https://github.com/zrzka/anes-rs,MIT OR Apache-2.0,Robert Vojta
ansi-str,https://github.com/zhiburt/ansi-str,MIT,Maxim Zhiburt
ansitok,https://gitlab.com/zhiburt/ansitok,MIT,Maxim Zhiburt
anstream,https://github.com/rust-cli/anstyle,MIT OR Apache-2.0,The anstream Authors
anstyle,https://github.com/rust-cli/anstyle,MIT OR Apache-2.0,The anstyle Authors
anstyle-parse,https://github.com/rust-cli/anstyle,MIT OR Apache-2.0,The anstyle-parse Authors
anstyle-query,https://github.com/rust-cli/anstyle,MIT OR Apache-2.0,The anstyle-query Authors
anstyle-wincon,https://github.com/rust-cli/anstyle,MIT OR Apache-2.0,The anstyle-wincon Authors
anyhow,https://github.com/dtolnay/anyhow,MIT OR Apache-2.0,David Tolnay arc-swap,https://github.com/vorner/arc-swap,MIT OR Apache-2.0,Michal 'vorner' Vaner arrayvec,https://github.com/bluss/arrayvec,MIT OR Apache-2.0,bluss assert-json-diff,https://github.com/davidpdrsn/assert-json-diff,MIT,David Pedersen async-compression,https://github.com/Nullus157/async-compression,MIT OR Apache-2.0,"Wim Looman , Allen Bui " async-speed-limit,https://github.com/tikv/async-speed-limit,MIT OR Apache-2.0,The TiKV Project Developers async-stream,https://github.com/tokio-rs/async-stream,MIT,Carl Lerche async-stream-impl,https://github.com/tokio-rs/async-stream,MIT,Carl Lerche async-trait,https://github.com/dtolnay/async-trait,MIT OR Apache-2.0,David Tolnay atomic-waker,https://github.com/smol-rs/atomic-waker,Apache-2.0 OR MIT,"Stjepan Glavina , Contributors to futures-rs" aws-config,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust SDK Team , Russell Cohen " aws-credential-types,https://github.com/smithy-lang/smithy-rs,Apache-2.0,AWS Rust SDK Team aws-lc-rs,https://github.com/aws/aws-lc-rs,ISC AND (Apache-2.0 OR ISC),AWS-LibCrypto aws-lc-sys,https://github.com/aws/aws-lc-rs,ISC AND (Apache-2.0 OR ISC) AND OpenSSL,AWS-LC aws-runtime,https://github.com/smithy-lang/smithy-rs,Apache-2.0,AWS Rust SDK Team aws-sdk-lambda,https://github.com/awslabs/aws-sdk-rust,Apache-2.0,"AWS Rust SDK Team , Russell Cohen " aws-sdk-s3,https://github.com/awslabs/aws-sdk-rust,Apache-2.0,"AWS Rust SDK Team , Russell Cohen " aws-sdk-sso,https://github.com/awslabs/aws-sdk-rust,Apache-2.0,"AWS Rust SDK Team , Russell Cohen " aws-sdk-ssooidc,https://github.com/awslabs/aws-sdk-rust,Apache-2.0,"AWS Rust SDK Team , Russell Cohen " aws-sdk-sts,https://github.com/awslabs/aws-sdk-rust,Apache-2.0,"AWS Rust SDK Team , Russell Cohen " aws-sigv4,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust SDK Team , David Barsky " aws-smithy-async,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust 
SDK Team , John DiSanti " aws-smithy-checksums,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust SDK Team , Zelda Hessler " aws-smithy-eventstream,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust SDK Team , John DiSanti " aws-smithy-http,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust SDK Team , Russell Cohen " aws-smithy-http-client,https://github.com/smithy-lang/smithy-rs,Apache-2.0,AWS Rust SDK Team aws-smithy-json,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust SDK Team , John DiSanti " aws-smithy-observability,https://github.com/awslabs/smithy-rs,Apache-2.0,AWS Rust SDK Team aws-smithy-protocol-test,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust SDK Team , Russell Cohen " aws-smithy-query,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust SDK Team , John DiSanti " aws-smithy-runtime,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust SDK Team , Zelda Hessler " aws-smithy-runtime-api,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust SDK Team , Zelda Hessler " aws-smithy-types,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust SDK Team , Russell Cohen " aws-smithy-xml,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust SDK Team , Russell Cohen " aws-types,https://github.com/smithy-lang/smithy-rs,Apache-2.0,"AWS Rust SDK Team , Russell Cohen " axum,https://github.com/tokio-rs/axum,MIT,The axum Authors axum-core,https://github.com/tokio-rs/axum,MIT,The axum-core Authors base16ct,https://github.com/RustCrypto/formats/tree/master/base16ct,Apache-2.0 OR MIT,RustCrypto Developers base64,https://github.com/marshallpierce/rust-base64,MIT OR Apache-2.0,Marshall Pierce base64-simd,https://github.com/Nugine/simd,MIT,The base64-simd Authors base64ct,https://github.com/RustCrypto/formats,Apache-2.0 OR MIT,RustCrypto Developers bit-set,https://github.com/contain-rs/bit-set,Apache-2.0 OR MIT,Alexis Beingessner 
bit-vec,https://github.com/contain-rs/bit-vec,Apache-2.0 OR MIT,Alexis Beingessner bitflags,https://github.com/bitflags/bitflags,MIT OR Apache-2.0,The Rust Project Developers bitpacking,https://github.com/quickwit-oss/bitpacking,MIT,Paul Masurel block-buffer,https://github.com/RustCrypto/utils,MIT OR Apache-2.0,RustCrypto Developers bon,https://github.com/elastio/bon,MIT OR Apache-2.0,The bon Authors bon-macros,https://github.com/elastio/bon,MIT OR Apache-2.0,The bon-macros Authors bpu_trasher,https://github.com/pseitz/bpu_trasher,MIT,Pascal Seitz bs58,https://github.com/Nullus157/bs58-rs,MIT OR Apache-2.0,The bs58 Authors bumpalo,https://github.com/fitzgen/bumpalo,MIT OR Apache-2.0,Nick Fitzgerald bytecount,https://github.com/llogiq/bytecount,Apache-2.0 OR MIT,"Andre Bogus , Joshua Landau " byteorder,https://github.com/BurntSushi/byteorder,Unlicense OR MIT,Andrew Gallant bytes,https://github.com/tokio-rs/bytes,MIT,"Carl Lerche , Sean McArthur " bytes-utils,https://github.com/vorner/bytes-utils,Apache-2.0 OR MIT,Michal 'vorner' Vaner bytesize,https://github.com/bytesize-rs/bytesize,Apache-2.0,"Hyunsik Choi , MrCroxx , Rob Ede " bytestring,https://github.com/actix/actix-net,MIT OR Apache-2.0,"Nikolay Kim , Rob Ede " camino,https://github.com/camino-rs/camino,MIT OR Apache-2.0,"Without Boats , Ashley Williams , Steve Klabnik , Rain " cargo-platform,https://github.com/rust-lang/cargo,MIT OR Apache-2.0,The cargo-platform Authors cargo_metadata,https://github.com/oli-obk/cargo_metadata,MIT,Oliver Schneider cast,https://github.com/japaric/cast.rs,MIT OR Apache-2.0,Jorge Aparicio cbor-diag,https://github.com/Nullus157/cbor-diag-rs,MIT OR Apache-2.0,The cbor-diag Authors cc,https://github.com/rust-lang/cc-rs,MIT OR Apache-2.0,Alex Crichton census,https://github.com/quickwit-inc/census,MIT,Paul Masurel cfg-if,https://github.com/rust-lang/cfg-if,MIT OR Apache-2.0,Alex Crichton chitchat,https://github.com/quickwit-oss/chitchat,MIT,"Quickwit, Inc. 
" chrono,https://github.com/chronotope/chrono,MIT OR Apache-2.0,The chrono Authors ciborium,https://github.com/enarx/ciborium,Apache-2.0,Nathaniel McCallum ciborium-io,https://github.com/enarx/ciborium,Apache-2.0,Nathaniel McCallum ciborium-ll,https://github.com/enarx/ciborium,Apache-2.0,Nathaniel McCallum clap,https://github.com/clap-rs/clap,MIT OR Apache-2.0,The clap Authors clap_builder,https://github.com/clap-rs/clap,MIT OR Apache-2.0,The clap_builder Authors clap_lex,https://github.com/clap-rs/clap,MIT OR Apache-2.0,The clap_lex Authors coarsetime,https://github.com/jedisct1/rust-coarsetime,ISC,Frank Denis cobs,https://github.com/jamesmunns/cobs.rs,MIT OR Apache-2.0,"Allen Welkie <>, James Munns " colorchoice,https://github.com/rust-cli/anstyle,MIT OR Apache-2.0,The colorchoice Authors colored,https://github.com/mackwic/colored,MPL-2.0,Thomas Wickham compression-codecs,https://github.com/Nullus157/async-compression,MIT OR Apache-2.0,"Wim Looman , Allen Bui " compression-core,https://github.com/Nullus157/async-compression,MIT OR Apache-2.0,"Wim Looman , Allen Bui " console,https://github.com/console-rs/console,MIT,The console Authors const-oid,https://github.com/RustCrypto/formats/tree/master/const-oid,Apache-2.0 OR MIT,RustCrypto Developers core-foundation,https://github.com/servo/core-foundation-rs,MIT OR Apache-2.0,The Servo Project Developers core-foundation-sys,https://github.com/servo/core-foundation-rs,MIT OR Apache-2.0,The Servo Project Developers cpufeatures,https://github.com/RustCrypto/utils,MIT OR Apache-2.0,RustCrypto Developers crc32c,https://github.com/zowens/crc32c,Apache-2.0 OR MIT,Zack Owens crc32fast,https://github.com/srijs/rust-crc32fast,MIT OR Apache-2.0,"Sam Rijs , Alex Crichton " criterion-plot,https://github.com/criterion-rs/criterion.rs,Apache-2.0 OR MIT,"Jorge Aparicio , Brook Heisler " cron,https://github.com/zslayton/cron,MIT OR Apache-2.0,Zack Slayton crossbeam-channel,https://github.com/crossbeam-rs/crossbeam,MIT OR Apache-2.0,The 
crossbeam-channel Authors crossbeam-deque,https://github.com/crossbeam-rs/crossbeam,MIT OR Apache-2.0,The crossbeam-deque Authors crossbeam-epoch,https://github.com/crossbeam-rs/crossbeam,MIT OR Apache-2.0,The crossbeam-epoch Authors crossbeam-utils,https://github.com/crossbeam-rs/crossbeam,MIT OR Apache-2.0,The crossbeam-utils Authors crunchy,https://github.com/eira-fransham/crunchy,MIT,Eira Fransham crypto-bigint,https://github.com/RustCrypto/crypto-bigint,Apache-2.0 OR MIT,RustCrypto Developers crypto-common,https://github.com/RustCrypto/traits,MIT OR Apache-2.0,RustCrypto Developers darling,https://github.com/TedDriggs/darling,MIT,Ted Driggs darling_core,https://github.com/TedDriggs/darling,MIT,Ted Driggs darling_macro,https://github.com/TedDriggs/darling,MIT,Ted Driggs dashmap,https://github.com/xacrimon/dashmap,MIT,Acrimon data-encoding,https://github.com/ia0/data-encoding,MIT,Julien Cretin deadpool,https://github.com/bikeshedder/deadpool,MIT OR Apache-2.0,Michael P. Jung deadpool-runtime,https://github.com/bikeshedder/deadpool,MIT OR Apache-2.0,Michael P. 
Jung der,https://github.com/RustCrypto/formats/tree/master/der,Apache-2.0 OR MIT,RustCrypto Developers deranged,https://github.com/jhpratt/deranged,MIT OR Apache-2.0,Jacob Pratt dialoguer,https://github.com/console-rs/dialoguer,MIT,The dialoguer Authors diff,https://github.com/utkarshkukreti/diff.rs,MIT OR Apache-2.0,Utkarsh Kukreti difflib,https://github.com/DimaKudosh/difflib,MIT,Dima Kudosh digest,https://github.com/RustCrypto/traits,MIT OR Apache-2.0,RustCrypto Developers displaydoc,https://github.com/yaahc/displaydoc,MIT OR Apache-2.0,Jane Lusby downcast,https://github.com/fkoep/downcast-rs,MIT,Felix Köpge downcast-rs,https://github.com/marcianx/downcast-rs,MIT OR Apache-2.0,The downcast-rs Authors dtoa,https://github.com/dtolnay/dtoa,MIT OR Apache-2.0,David Tolnay dyn-clone,https://github.com/dtolnay/dyn-clone,MIT OR Apache-2.0,David Tolnay ecdsa,https://github.com/RustCrypto/signatures/tree/master/ecdsa,Apache-2.0 OR MIT,RustCrypto Developers either,https://github.com/rayon-rs/either,MIT OR Apache-2.0,bluss elasticsearch-dsl,https://github.com/vinted/elasticsearch-dsl-rs,MIT OR Apache-2.0,"Evaldas Buinauskas , Search Platform " elliptic-curve,https://github.com/RustCrypto/traits/tree/master/elliptic-curve,Apache-2.0 OR MIT,RustCrypto Developers embedded-io,https://github.com/embassy-rs/embedded-io,MIT OR Apache-2.0,The embedded-io Authors embedded-io,https://github.com/rust-embedded/embedded-hal,MIT OR Apache-2.0,The embedded-io Authors encode_unicode,https://github.com/tormol/encode_unicode,Apache-2.0 OR MIT,Torbjørn Birch Moltu encoding_rs,https://github.com/hsivonen/encoding_rs,(Apache-2.0 OR MIT) AND BSD-3-Clause,Henri Sivonen enum-iterator,https://github.com/stephaneyfx/enum-iterator,0BSD,Stephane Raux enum-iterator-derive,https://github.com/stephaneyfx/enum-iterator,0BSD,Stephane Raux env_filter,https://github.com/rust-cli/env_logger,MIT OR Apache-2.0,The env_filter Authors env_logger,https://github.com/rust-cli/env_logger,MIT OR Apache-2.0,The 
env_logger Authors equivalent,https://github.com/indexmap-rs/equivalent,Apache-2.0 OR MIT,The equivalent Authors erased-serde,https://github.com/dtolnay/erased-serde,MIT OR Apache-2.0,David Tolnay errno,https://github.com/lambda-fairy/rust-errno,MIT OR Apache-2.0,"Chris Wong , Dan Gohman " error-chain,https://github.com/rust-lang-nursery/error-chain,MIT OR Apache-2.0,"Brian Anderson , Paul Colomiets , Colin Kiegel , Yamakaky , Andrew Gauger " fail,https://github.com/tikv/fail-rs,Apache-2.0,The TiKV Project Developers fastdivide,https://github.com/fulmicoton/fastdivide,zlib-acknowledgement OR MIT,Paul Masurel fastrand,https://github.com/smol-rs/fastrand,Apache-2.0 OR MIT,Stjepan Glavina ff,https://github.com/zkcrypto/ff,MIT OR Apache-2.0,"Sean Bowe , Jack Grigg " find-msvc-tools,https://github.com/rust-lang/cc-rs,MIT OR Apache-2.0,The find-msvc-tools Authors fixedbitset,https://github.com/petgraph/fixedbitset,MIT OR Apache-2.0,bluss flate2,https://github.com/rust-lang/flate2-rs,MIT OR Apache-2.0,"Alex Crichton , Josh Triplett " float-cmp,https://github.com/mikedilger/float-cmp,MIT,Mike Dilger flume,https://github.com/zesterer/flume,Apache-2.0 OR MIT,Joshua Barretto fnv,https://github.com/servo/rust-fnv,Apache-2.0 OR MIT,Alex Crichton foldhash,https://github.com/orlp/foldhash,Zlib,Orson Peters form_urlencoded,https://github.com/servo/rust-url,MIT OR Apache-2.0,The rust-url developers fragile,https://github.com/mitsuhiko/fragile,Apache-2.0,Armin Ronacher fs4,https://github.com/al8n/fs4-rs,MIT OR Apache-2.0,"Dan Burkert , Al Liu " fslock,https://github.com/brunoczim/fslock,MIT,The fslock Authors futures,https://github.com/rust-lang/futures-rs,MIT OR Apache-2.0,The futures Authors futures-channel,https://github.com/rust-lang/futures-rs,MIT OR Apache-2.0,The futures-channel Authors futures-core,https://github.com/rust-lang/futures-rs,MIT OR Apache-2.0,The futures-core Authors futures-executor,https://github.com/rust-lang/futures-rs,MIT OR Apache-2.0,The futures-executor 
Authors futures-io,https://github.com/rust-lang/futures-rs,MIT OR Apache-2.0,The futures-io Authors futures-macro,https://github.com/rust-lang/futures-rs,MIT OR Apache-2.0,The futures-macro Authors futures-sink,https://github.com/rust-lang/futures-rs,MIT OR Apache-2.0,The futures-sink Authors futures-task,https://github.com/rust-lang/futures-rs,MIT OR Apache-2.0,The futures-task Authors futures-timer,https://github.com/async-rs/futures-timer,MIT OR Apache-2.0,Alex Crichton futures-util,https://github.com/rust-lang/futures-rs,MIT OR Apache-2.0,The futures-util Authors generic-array,https://github.com/fizyk20/generic-array,MIT,"Bartłomiej Kamiński , Aaron Trent " getrandom,https://github.com/rust-random/getrandom,MIT OR Apache-2.0,The Rand Project Developers glob,https://github.com/rust-lang/glob,MIT OR Apache-2.0,The Rust Project Developers governor,https://github.com/boinkor-net/governor,MIT,Andreas Fuchs group,https://github.com/zkcrypto/group,MIT OR Apache-2.0,"Sean Bowe , Jack Grigg " h2,https://github.com/hyperium/h2,MIT,"Carl Lerche , Sean McArthur " half,https://github.com/VoidStarKat/half-rs,MIT OR Apache-2.0,Kathryn Long hashbrown,https://github.com/rust-lang/hashbrown,MIT OR Apache-2.0,Amanieu d'Antras headers,https://github.com/hyperium/headers,MIT,Sean McArthur headers-core,https://github.com/hyperium/headers,MIT,Sean McArthur heck,https://github.com/withoutboats/heck,MIT OR Apache-2.0,The heck Authors heck,https://github.com/withoutboats/heck,MIT OR Apache-2.0,Without Boats hermit-abi,https://github.com/hermit-os/hermit-rs,MIT OR Apache-2.0,Stefan Lankes hex,https://github.com/KokaKiwi/rust-hex,MIT OR Apache-2.0,KokaKiwi hmac,https://github.com/RustCrypto/MACs,MIT OR Apache-2.0,RustCrypto Developers home,https://github.com/rust-lang/cargo,MIT OR Apache-2.0,Brian Anderson hostname,https://github.com/djc/hostname,MIT,The hostname Authors htmlescape,https://github.com/veddan/rust-htmlescape,Apache-2.0 OR MIT OR MPL-2.0,Viktor Dahl 
http,https://github.com/hyperium/http,MIT OR Apache-2.0,"Alex Crichton , Carl Lerche , Sean McArthur " http-body,https://github.com/hyperium/http-body,MIT,"Carl Lerche , Lucio Franco , Sean McArthur " http-body-util,https://github.com/hyperium/http-body,MIT,"Carl Lerche , Lucio Franco , Sean McArthur " http-serde,https://gitlab.com/kornelski/http-serde,Apache-2.0 OR MIT,Kornel httparse,https://github.com/seanmonstar/httparse,MIT OR Apache-2.0,Sean McArthur httpdate,https://github.com/pyfisch/httpdate,MIT OR Apache-2.0,Pyfisch humantime,https://github.com/chronotope/humantime,MIT OR Apache-2.0,The humantime Authors hyper,https://github.com/hyperium/hyper,MIT,Sean McArthur hyper-rustls,https://github.com/rustls/hyper-rustls,Apache-2.0 OR ISC OR MIT,The hyper-rustls Authors hyper-timeout,https://github.com/hjr3/hyper-timeout,MIT OR Apache-2.0,Herman J. Radtke III hyper-util,https://github.com/hyperium/hyper-util,MIT,Sean McArthur hyperloglogplus,https://github.com/tabac/hyperloglog.rs,MIT,Tasos Bakogiannis iana-time-zone,https://github.com/strawlab/iana-time-zone,MIT OR Apache-2.0,"Andrew Straw , René Kijewski , Ryan Lopopolo " iana-time-zone-haiku,https://github.com/strawlab/iana-time-zone,MIT OR Apache-2.0,René Kijewski icu_collections,https://github.com/unicode-org/icu4x,Unicode-3.0,The ICU4X Project Developers icu_locale_core,https://github.com/unicode-org/icu4x,Unicode-3.0,The ICU4X Project Developers icu_normalizer,https://github.com/unicode-org/icu4x,Unicode-3.0,The ICU4X Project Developers icu_normalizer_data,https://github.com/unicode-org/icu4x,Unicode-3.0,The ICU4X Project Developers icu_properties,https://github.com/unicode-org/icu4x,Unicode-3.0,The ICU4X Project Developers icu_properties_data,https://github.com/unicode-org/icu4x,Unicode-3.0,The ICU4X Project Developers icu_provider,https://github.com/unicode-org/icu4x,Unicode-3.0,The ICU4X Project Developers ident_case,https://github.com/TedDriggs/ident_case,MIT OR Apache-2.0,Ted Driggs 
idna,https://github.com/servo/rust-url,MIT OR Apache-2.0,The rust-url developers idna_adapter,https://github.com/hsivonen/idna_adapter,Apache-2.0 OR MIT,The rust-url developers indexmap,https://github.com/bluss/indexmap,Apache-2.0 OR MIT,The indexmap Authors indexmap,https://github.com/indexmap-rs/indexmap,Apache-2.0 OR MIT,The indexmap Authors indicatif,https://github.com/console-rs/indicatif,MIT,The indicatif Authors inventory,https://github.com/dtolnay/inventory,MIT OR Apache-2.0,David Tolnay ipnet,https://github.com/krisprice/ipnet,MIT OR Apache-2.0,Kris Price ipnetwork,https://github.com/achanda/ipnetwork,MIT OR Apache-2.0,"Abhishek Chanda , Linus Färnstrand " iri-string,https://github.com/lo48576/iri-string,MIT OR Apache-2.0,YOSHIOKA Takuma is-terminal,https://github.com/sunfishcode/is-terminal,MIT,"softprops , Dan Gohman " is_terminal_polyfill,https://github.com/polyfill-rs/is_terminal_polyfill,MIT OR Apache-2.0,The is_terminal_polyfill Authors itertools,https://github.com/rust-itertools/itertools,MIT OR Apache-2.0,bluss itoa,https://github.com/dtolnay/itoa,MIT OR Apache-2.0,David Tolnay jobserver,https://github.com/rust-lang/jobserver-rs,MIT OR Apache-2.0,Alex Crichton js-sys,https://github.com/wasm-bindgen/wasm-bindgen/tree/master/crates/js-sys,MIT OR Apache-2.0,The wasm-bindgen Developers json_comments,https://github.com/tmccombs/json-comments-rs,Apache-2.0,Thayne McCombs lambda_runtime,https://github.com/awslabs/aws-lambda-rust-runtime,Apache-2.0,"David Calavera , Harold Sun " lambda_runtime_api_client,https://github.com/awslabs/aws-lambda-rust-runtime,Apache-2.0,"David Calavera , Harold Sun " lazy_static,https://github.com/rust-lang-nursery/lazy-static.rs,MIT OR Apache-2.0,Marvin Löbel levenshtein_automata,https://github.com/tantivy-search/levenshtein-automata,MIT,Paul Masurel libc,https://github.com/rust-lang/libc,MIT OR Apache-2.0,The Rust Project Developers libm,https://github.com/rust-lang/compiler-builtins,MIT,Jorge Aparicio 
linked-hash-map,https://github.com/contain-rs/linked-hash-map,MIT OR Apache-2.0,"Stepan Koltsov , Andrew Paseltiner " linux-raw-sys,https://github.com/sunfishcode/linux-raw-sys,Apache-2.0 WITH LLVM-exception OR Apache-2.0 OR MIT,Dan Gohman litemap,https://github.com/unicode-org/icu4x,Unicode-3.0,The ICU4X Project Developers lock_api,https://github.com/Amanieu/parking_lot,MIT OR Apache-2.0,Amanieu d'Antras log,https://github.com/rust-lang/log,MIT OR Apache-2.0,The Rust Project Developers lru,https://github.com/jeromefroe/lru-rs,MIT,Jerome Froelich lru-slab,https://github.com/Ralith/lru-slab,MIT OR Apache-2.0 OR Zlib,Benjamin Saunders lz4_flex,https://github.com/pseitz/lz4_flex,MIT,"Pascal Seitz , Arthur Silva , ticki " matchers,https://github.com/hawkw/matchers,MIT,Eliza Weisman matchit,https://github.com/ibraheemdev/matchit,MIT AND BSD-3-Clause,Ibraheem Ahmed md-5,https://github.com/RustCrypto/hashes,MIT OR Apache-2.0,RustCrypto Developers md5,https://github.com/stainless-steel/md5,Apache-2.0 OR MIT,"Ivan Ukhov , Kamal Ahmad , Konstantin Stepanov , Lukas Kalbertodt , Nathan Musoke , Scott Mabin , Tony Arcieri , Wim de With , Yosef Dinerstein " measure_time,https://github.com/PSeitz/rust_measure_time,MIT,Pascal Seitz memchr,https://github.com/BurntSushi/memchr,Unlicense OR MIT,"Andrew Gallant , bluss" memmap2,https://github.com/RazrFalcon/memmap2-rs,MIT OR Apache-2.0,"Dan Burkert , Yevhenii Reizner " mime,https://github.com/hyperium/mime,MIT OR Apache-2.0,Sean McArthur mime_guess,https://github.com/abonander/mime_guess,MIT,Austin Bonander mini-internal,https://github.com/dtolnay/miniserde,MIT OR Apache-2.0,David Tolnay mini-moka,https://github.com/moka-rs/mini-moka,MIT OR Apache-2.0,The mini-moka Authors minimal-lexical,https://github.com/Alexhuszagh/minimal-lexical,MIT OR Apache-2.0,Alex Huszagh miniserde,https://github.com/dtolnay/miniserde,MIT OR Apache-2.0,David Tolnay miniz_oxide,https://github.com/Frommi/miniz_oxide/tree/master/miniz_oxide,MIT OR Zlib OR 
Apache-2.0,"Frommi , oyvindln , Rich Geldreich richgel99@gmail.com" mio,https://github.com/tokio-rs/mio,MIT,"Carl Lerche , Thomas de Zeeuw , Tokio Contributors " mockall,https://github.com/asomers/mockall,MIT OR Apache-2.0,Alan Somers mockall_derive,https://github.com/asomers/mockall,MIT OR Apache-2.0,Alan Somers mrecordlog,https://github.com/quickwit-oss/mrecordlog,MIT,The mrecordlog Authors multimap,https://github.com/havarnov/multimap,MIT OR Apache-2.0,Håvar Nøvik murmurhash32,https://github.com/quickwit-inc/murmurhash32,MIT,Paul Masurel new_string_template,https://github.com/hasezoey/new_string_template,MIT,hasezoey no-std-net,https://github.com/dunmatt/no-std-net,MIT,M@ Dunlap nom,https://github.com/Geal/nom,MIT,contact@geoffroycouprie.com nom,https://github.com/rust-bakery/nom,MIT,contact@geoffroycouprie.com nonzero_ext,https://github.com/antifuchs/nonzero_ext,Apache-2.0,Andreas Fuchs normalize-line-endings,https://github.com/derekdreery/normalize-line-endings,Apache-2.0,Richard Dodd nu-ansi-term,https://github.com/nushell/nu-ansi-term,MIT,"ogham@bsago.me, Ryan Scheel (Havvy) , Josh Triplett , The Nushell Project Developers" num-bigint,https://github.com/rust-num/num-bigint,MIT OR Apache-2.0,The Rust Project Developers num-conv,https://github.com/jhpratt/num-conv,MIT OR Apache-2.0,Jacob Pratt num-integer,https://github.com/rust-num/num-integer,MIT OR Apache-2.0,The Rust Project Developers num-rational,https://github.com/rust-num/num-rational,MIT OR Apache-2.0,The Rust Project Developers num-traits,https://github.com/rust-num/num-traits,MIT OR Apache-2.0,The Rust Project Developers num_cpus,https://github.com/seanmonstar/num_cpus,MIT OR Apache-2.0,Sean McArthur numfmt,https://github.com/kurtlawrence/numfmt,MIT,Kurt Lawrence objc2-core-foundation,https://github.com/madsmtm/objc2,Zlib OR Apache-2.0 OR MIT,The objc2-core-foundation Authors objc2-io-kit,https://github.com/madsmtm/objc2,Zlib OR Apache-2.0 OR MIT,The objc2-io-kit Authors 
once_cell,https://github.com/matklad/once_cell,MIT OR Apache-2.0,Aleksey Kladov once_cell_polyfill,https://github.com/polyfill-rs/once_cell_polyfill,MIT OR Apache-2.0,The once_cell_polyfill Authors oneshot,https://github.com/faern/oneshot,MIT OR Apache-2.0,Linus Färnstrand oorandom,https://hg.sr.ht/~icefox/oorandom,MIT,Simon Heath openssl-probe,https://github.com/alexcrichton/openssl-probe,MIT OR Apache-2.0,Alex Crichton opentelemetry,https://github.com/open-telemetry/opentelemetry-rust/tree/main/opentelemetry,Apache-2.0,The opentelemetry Authors opentelemetry-appender-tracing,https://github.com/open-telemetry/opentelemetry-rust/tree/main/opentelemetry-appender-tracing,Apache-2.0,The opentelemetry-appender-tracing Authors opentelemetry-http,https://github.com/open-telemetry/opentelemetry-rust/tree/main/opentelemetry-http,Apache-2.0,The opentelemetry-http Authors opentelemetry-otlp,https://github.com/open-telemetry/opentelemetry-rust/tree/main/opentelemetry-otlp,Apache-2.0,The opentelemetry-otlp Authors opentelemetry-proto,https://github.com/open-telemetry/opentelemetry-rust/tree/main/opentelemetry-proto,Apache-2.0,The opentelemetry-proto Authors opentelemetry_sdk,https://github.com/open-telemetry/opentelemetry-rust/tree/main/opentelemetry-sdk,Apache-2.0,The opentelemetry_sdk Authors ordered-float,https://github.com/reem/rust-ordered-float,MIT,"Jonathan Reem , Matt Brubeck " ouroboros,https://github.com/someguynamedjosh/ouroboros,MIT OR Apache-2.0,Josh ouroboros_macro,https://github.com/someguynamedjosh/ouroboros,MIT OR Apache-2.0,Josh outref,https://github.com/Nugine/outref,MIT,The outref Authors ownedbytes,https://github.com/quickwit-oss/tantivy,MIT,"Paul Masurel , Pascal Seitz " p256,https://github.com/RustCrypto/elliptic-curves/tree/master/p256,Apache-2.0 OR MIT,RustCrypto Developers page_size,https://github.com/Elzair/page_size_rs,MIT OR Apache-2.0,Philip Woods papergrid,https://github.com/zhiburt/tabled,MIT,Maxim Zhiburt 
parking_lot,https://github.com/Amanieu/parking_lot,MIT OR Apache-2.0,Amanieu d'Antras parking_lot_core,https://github.com/Amanieu/parking_lot,MIT OR Apache-2.0,Amanieu d'Antras peakmem-alloc,https://github.com/PSeitz/peakmem-alloc,MIT,Pascal Seitz percent-encoding,https://github.com/servo/rust-url,MIT OR Apache-2.0,The rust-url developers perf-event,https://github.com/jimblandy/perf-event,MIT OR Apache-2.0,Jim Blandy perf-event-open-sys,https://github.com/jimblandy/perf-event-open-sys,MIT OR Apache-2.0,Jim Blandy petgraph,https://github.com/petgraph/petgraph,MIT OR Apache-2.0,"bluss, mitchmindtree" pin-project,https://github.com/taiki-e/pin-project,Apache-2.0 OR MIT,The pin-project Authors pin-project-internal,https://github.com/taiki-e/pin-project,Apache-2.0 OR MIT,The pin-project-internal Authors pin-project-lite,https://github.com/taiki-e/pin-project-lite,Apache-2.0 OR MIT,The pin-project-lite Authors pin-utils,https://github.com/rust-lang-nursery/pin-utils,MIT OR Apache-2.0,Josef Brandl pkcs8,https://github.com/RustCrypto/formats/tree/master/pkcs8,Apache-2.0 OR MIT,RustCrypto Developers plotters,https://github.com/plotters-rs/plotters,MIT,Hao Hou plotters-backend,https://github.com/plotters-rs/plotters,MIT,Hao Hou plotters-svg,https://github.com/plotters-rs/plotters,MIT,Hao Hou pnet,https://github.com/libpnet/libpnet,MIT OR Apache-2.0,Robert Clipsham pnet_base,https://github.com/libpnet/libpnet,MIT OR Apache-2.0,"Robert Clipsham , Linus Färnstrand " pnet_datalink,https://github.com/libpnet/libpnet,MIT OR Apache-2.0,"Robert Clipsham , Linus Färnstrand " pnet_macros,https://github.com/libpnet/libpnet,MIT OR Apache-2.0,"Robert Clipsham , Pierre Chifflier " pnet_macros_support,https://github.com/libpnet/libpnet,MIT OR Apache-2.0,Robert Clipsham pnet_packet,https://github.com/libpnet/libpnet,MIT OR Apache-2.0,Robert Clipsham pnet_sys,https://github.com/libpnet/libpnet,MIT OR Apache-2.0,"Robert Clipsham , Linus Färnstrand " 
pnet_transport,https://github.com/libpnet/libpnet,MIT OR Apache-2.0,Robert Clipsham portable-atomic,https://github.com/taiki-e/portable-atomic,Apache-2.0 OR MIT,The portable-atomic Authors postcard,https://github.com/jamesmunns/postcard,MIT OR Apache-2.0,James Munns potential_utf,https://github.com/unicode-org/icu4x,Unicode-3.0,The ICU4X Project Developers powerfmt,https://github.com/jhpratt/powerfmt,MIT OR Apache-2.0,Jacob Pratt ppv-lite86,https://github.com/cryptocorrosion/cryptocorrosion,MIT OR Apache-2.0,The CryptoCorrosion Contributors predicates,https://github.com/assert-rs/predicates-rs,MIT OR Apache-2.0,Nick Stevens predicates-core,https://github.com/assert-rs/predicates-rs/tree/master/crates/core,MIT OR Apache-2.0,Nick Stevens predicates-tree,https://github.com/assert-rs/predicates-rs/tree/master/crates/tree,MIT OR Apache-2.0,Nick Stevens pretty_assertions,https://github.com/rust-pretty-assertions/rust-pretty-assertions,MIT OR Apache-2.0,"Colin Kiegel , Florent Fayolle , Tom Milligan " prettyplease,https://github.com/dtolnay/prettyplease,MIT OR Apache-2.0,David Tolnay proc-macro-error,https://gitlab.com/CreepySkeleton/proc-macro-error,MIT OR Apache-2.0,CreepySkeleton proc-macro-error-attr,https://gitlab.com/CreepySkeleton/proc-macro-error,MIT OR Apache-2.0,CreepySkeleton proc-macro-error-attr2,https://github.com/GnomedDev/proc-macro-error-2,MIT OR Apache-2.0,"CreepySkeleton , GnomedDev " proc-macro-error2,https://github.com/GnomedDev/proc-macro-error-2,MIT OR Apache-2.0,"CreepySkeleton , GnomedDev " proc-macro2,https://github.com/dtolnay/proc-macro2,MIT OR Apache-2.0,"David Tolnay , Alex Crichton " proc-macro2-diagnostics,https://github.com/SergioBenitez/proc-macro2-diagnostics,MIT OR Apache-2.0,Sergio Benitez procfs,https://github.com/eminence/procfs,MIT OR Apache-2.0,Andrew Chin procfs-core,https://github.com/eminence/procfs,MIT OR Apache-2.0,Andrew Chin prometheus,https://github.com/tikv/rust-prometheus,Apache-2.0,"overvenus@gmail.com, 
siddontang@gmail.com, vistaswx@gmail.com" prost,https://github.com/tokio-rs/prost,Apache-2.0,"Dan Burkert , Lucio Franco , Casper Meijn , Tokio Contributors " prost-build,https://github.com/tokio-rs/prost,Apache-2.0,"Dan Burkert , Lucio Franco , Casper Meijn , Tokio Contributors " prost-derive,https://github.com/tokio-rs/prost,Apache-2.0,"Dan Burkert , Lucio Franco , Casper Meijn , Tokio Contributors " prost-types,https://github.com/tokio-rs/prost,Apache-2.0,"Dan Burkert , Lucio Franco , Casper Meijn , Tokio Contributors " pulldown-cmark,https://github.com/raphlinus/pulldown-cmark,MIT,"Raph Levien , Marcus Klaas de Vries " pulldown-cmark-to-cmark,https://github.com/Byron/pulldown-cmark-to-cmark,Apache-2.0,"Sebastian Thiel , Dylan Owen , Alessandro Ogier , Zixian Cai <2891235+caizixian@users.noreply.github.com>, Andrew Lyjak " quanta,https://github.com/metrics-rs/quanta,MIT,Toby Lawrence quick-error,http://github.com/tailhook/quick-error,MIT OR Apache-2.0,"Paul Colomiets , Colin Kiegel " quick_cache,https://github.com/arthurprs/quick-cache,MIT,Arthur Silva quinn,https://github.com/quinn-rs/quinn,MIT OR Apache-2.0,The quinn Authors quinn-proto,https://github.com/quinn-rs/quinn,MIT OR Apache-2.0,The quinn-proto Authors quinn-udp,https://github.com/quinn-rs/quinn,MIT OR Apache-2.0,The quinn-udp Authors quote,https://github.com/dtolnay/quote,MIT OR Apache-2.0,David Tolnay r-efi,https://github.com/r-efi/r-efi,MIT OR Apache-2.0 OR LGPL-2.1-or-later,The r-efi Authors rand,https://github.com/rust-random/rand,MIT OR Apache-2.0,"The Rand Project Developers, The Rust Project Developers" rand_chacha,https://github.com/rust-random/rand,MIT OR Apache-2.0,"The Rand Project Developers, The Rust Project Developers, The CryptoCorrosion Contributors" rand_core,https://github.com/rust-random/rand,MIT OR Apache-2.0,"The Rand Project Developers, The Rust Project Developers" rand_xorshift,https://github.com/rust-random/rngs,MIT OR Apache-2.0,"The Rand Project Developers, The Rust Project 
Developers" raw-cpuid,https://github.com/gz/rust-cpuid,MIT,Gerd Zellweger rayon,https://github.com/rayon-rs/rayon,MIT OR Apache-2.0,The rayon Authors rayon-core,https://github.com/rayon-rs/rayon,MIT OR Apache-2.0,The rayon-core Authors redox_syscall,https://gitlab.redox-os.org/redox-os/syscall,MIT,Jeremy Soller ref-cast,https://github.com/dtolnay/ref-cast,MIT OR Apache-2.0,David Tolnay ref-cast-impl,https://github.com/dtolnay/ref-cast,MIT OR Apache-2.0,David Tolnay regex,https://github.com/rust-lang/regex,MIT OR Apache-2.0,"The Rust Project Developers, Andrew Gallant " regex-automata,https://github.com/rust-lang/regex,MIT OR Apache-2.0,"The Rust Project Developers, Andrew Gallant " regex-lite,https://github.com/rust-lang/regex,MIT OR Apache-2.0,"The Rust Project Developers, Andrew Gallant " regex-syntax,https://github.com/rust-lang/regex,MIT OR Apache-2.0,"The Rust Project Developers, Andrew Gallant " reqwest,https://github.com/seanmonstar/reqwest,MIT OR Apache-2.0,Sean McArthur reqwest-middleware,https://github.com/TrueLayer/reqwest-middleware,MIT OR Apache-2.0,Rodrigo Gryzinski reqwest-retry,https://github.com/TrueLayer/reqwest-middleware,MIT OR Apache-2.0,Rodrigo Gryzinski retry-policies,https://github.com/TrueLayer/retry-policies,MIT OR Apache-2.0,Luca Palmieri rfc6979,https://github.com/RustCrypto/signatures/tree/master/rfc6979,Apache-2.0 OR MIT,RustCrypto Developers ring,https://github.com/briansmith/ring,Apache-2.0 AND ISC,The ring Authors roxmltree,https://github.com/RazrFalcon/roxmltree,MIT OR Apache-2.0,Evgeniy Reizner rust-embed,https://pyrossh.dev/repos/rust-embed,MIT,pyrossh rust-embed-impl,https://pyrossh.dev/repos/rust-embed,MIT,pyrossh rust-embed-utils,https://pyrossh.dev/repos/rust-embed,MIT,pyrossh rustc-hash,https://github.com/rust-lang/rustc-hash,Apache-2.0 OR MIT,The Rust Project Developers rustix,https://github.com/bytecodealliance/rustix,Apache-2.0 WITH LLVM-exception OR Apache-2.0 OR MIT,"Dan Gohman , Jakub Konka " 
rustls,https://github.com/rustls/rustls,Apache-2.0 OR ISC OR MIT,The rustls Authors rustls-native-certs,https://github.com/rustls/rustls-native-certs,Apache-2.0 OR ISC OR MIT,The rustls-native-certs Authors rustls-pemfile,https://github.com/rustls/pemfile,Apache-2.0 OR ISC OR MIT,The rustls-pemfile Authors rustls-pki-types,https://github.com/rustls/pki-types,MIT OR Apache-2.0,The rustls-pki-types Authors rustls-webpki,https://github.com/rustls/webpki,ISC,The rustls-webpki Authors rustop,https://chiselapp.com/user/fifr/repository/rustop,MIT,Frank Fischer rustversion,https://github.com/dtolnay/rustversion,MIT OR Apache-2.0,David Tolnay rusty-fork,https://github.com/altsysrq/rusty-fork,MIT OR Apache-2.0,Jason Lingle ryu,https://github.com/dtolnay/ryu,Apache-2.0 OR BSL-1.0,David Tolnay same-file,https://github.com/BurntSushi/same-file,Unlicense OR MIT,Andrew Gallant scc,https://github.com/wvwwvwwv/scalable-concurrent-containers,Apache-2.0,wvwwvwwv schannel,https://github.com/steffengy/schannel-rs,MIT,"Steven Fackler , Steffen Butzer " schemars,https://github.com/GREsau/schemars,MIT,Graham Esau scoped-tls,https://github.com/alexcrichton/scoped-tls,MIT OR Apache-2.0,Alex Crichton scopeguard,https://github.com/bluss/scopeguard,MIT OR Apache-2.0,bluss sct,https://github.com/rustls/sct.rs,Apache-2.0 OR ISC OR MIT,Joseph Birr-Pixton sdd,https://github.com/wvwwvwwv/scalable-delayed-dealloc,Apache-2.0,wvwwvwwv sec1,https://github.com/RustCrypto/formats/tree/master/sec1,Apache-2.0 OR MIT,RustCrypto Developers security-framework,https://github.com/kornelski/rust-security-framework,MIT OR Apache-2.0,"Steven Fackler , Kornel " security-framework-sys,https://github.com/kornelski/rust-security-framework,MIT OR Apache-2.0,"Steven Fackler , Kornel " semver,https://github.com/dtolnay/semver,MIT OR Apache-2.0,David Tolnay separator,https://github.com/saghm/rust-separator,MIT,Saghm Rossi serde,https://github.com/serde-rs/serde,MIT OR Apache-2.0,"Erick Tryzelaar , David Tolnay " 
serde_core,https://github.com/serde-rs/serde,MIT OR Apache-2.0,"Erick Tryzelaar , David Tolnay " serde_derive,https://github.com/serde-rs/serde,MIT OR Apache-2.0,"Erick Tryzelaar , David Tolnay " serde_json,https://github.com/serde-rs/json,MIT OR Apache-2.0,"Erick Tryzelaar , David Tolnay " serde_json_borrow,https://github.com/PSeitz/serde_json_borrow,MIT,Pascal Seitz serde_path_to_error,https://github.com/dtolnay/path-to-error,MIT OR Apache-2.0,David Tolnay serde_qs,https://github.com/samscott89/serde_qs,MIT OR Apache-2.0,Sam Scott serde_spanned,https://github.com/toml-rs/toml,MIT OR Apache-2.0,The serde_spanned Authors serde_urlencoded,https://github.com/nox/serde_urlencoded,MIT OR Apache-2.0,Anthony Ramine serde_with,https://github.com/jonasbb/serde_with,MIT OR Apache-2.0,"Jonas Bushart, Marcin Kaźmierczak" serde_with_macros,https://github.com/jonasbb/serde_with,MIT OR Apache-2.0,Jonas Bushart serde_yaml,https://github.com/dtolnay/serde-yaml,MIT OR Apache-2.0,David Tolnay serial_test_derive,https://github.com/palfrey/serial_test,MIT,Tom Parker-Shemilt sha1,https://github.com/RustCrypto/hashes,MIT OR Apache-2.0,RustCrypto Developers sha2,https://github.com/RustCrypto/hashes,MIT OR Apache-2.0,RustCrypto Developers sharded-slab,https://github.com/hawkw/sharded-slab,MIT,Eliza Weisman shell-words,https://github.com/tmiasko/shell-words,MIT OR Apache-2.0,Tomasz Miąsko shlex,https://github.com/comex/rust-shlex,MIT OR Apache-2.0,"comex , Fenhl , Adrian Taylor , Alex Touchet , Daniel Parks , Garrett Berg " signal-hook-registry,https://github.com/vorner/signal-hook,MIT OR Apache-2.0,"Michal 'vorner' Vaner , Masaki Hara " signature,https://github.com/RustCrypto/traits/tree/master/signature,Apache-2.0 OR MIT,RustCrypto Developers simd-adler32,https://github.com/mcountryman/simd-adler32,MIT,Marvin Countryman siphasher,https://github.com/jedisct1/rust-siphash,MIT OR Apache-2.0,Frank Denis sketches-ddsketch,https://github.com/mheffner/rust-sketches-ddsketch,Apache-2.0,Mike 
Heffner slab,https://github.com/tokio-rs/slab,MIT,Carl Lerche smallvec,https://github.com/servo/rust-smallvec,MIT OR Apache-2.0,The Servo Project Developers socket2,https://github.com/rust-lang/socket2,MIT OR Apache-2.0,"Alex Crichton , Thomas de Zeeuw " spin,https://github.com/mvdnes/spin-rs,MIT,"Mathijs van de Nes , John Ericson , Joshua Barretto " spinning_top,https://github.com/rust-osdev/spinning_top,MIT OR Apache-2.0,Philipp Oppermann spki,https://github.com/RustCrypto/formats/tree/master/spki,Apache-2.0 OR MIT,RustCrypto Developers stable_deref_trait,https://github.com/storyyeller/stable_deref_trait,MIT OR Apache-2.0,Robert Grosse static_assertions,https://github.com/nvzqz/static-assertions-rs,MIT OR Apache-2.0,Nikolai Vazquez strsim,https://github.com/rapidfuzz/strsim-rs,MIT,"Danny Guo , maxbachmann " subtle,https://github.com/dalek-cryptography/subtle,BSD-3-Clause,"Isis Lovecruft , Henry de Valence " syn,https://github.com/dtolnay/syn,MIT OR Apache-2.0,David Tolnay sync_wrapper,https://github.com/Actyx/sync_wrapper,Apache-2.0,Actyx AG synstructure,https://github.com/mystor/synstructure,MIT,Nika Layzell sysinfo,https://github.com/GuillaumeGomez/sysinfo,MIT,Guillaume Gomez tabled,https://github.com/zhiburt/tabled,MIT,Maxim Zhiburt tabled_derive,https://github.com/zhiburt/tabled,MIT,Maxim Zhiburt tagptr,https://github.com/oliver-giersch/tagptr,MIT OR Apache-2.0,Oliver Giersch tantivy,https://github.com/quickwit-oss/tantivy,MIT,Paul Masurel tantivy-bitpacker,https://github.com/quickwit-oss/tantivy,MIT,Paul Masurel tantivy-columnar,https://github.com/quickwit-oss/tantivy,MIT,The tantivy-columnar Authors tantivy-common,https://github.com/quickwit-oss/tantivy,MIT,"Paul Masurel , Pascal Seitz " tantivy-fst,https://github.com/quickwit-inc/fst,Unlicense OR MIT,Andrew Gallant tantivy-query-grammar,https://github.com/quickwit-oss/tantivy,MIT,Paul Masurel tantivy-sstable,https://github.com/quickwit-oss/tantivy,MIT,The tantivy-sstable Authors 
tantivy-stacker,https://github.com/quickwit-oss/tantivy,MIT,The tantivy-stacker Authors tantivy-tokenizer-api,https://github.com/quickwit-oss/tantivy,MIT,The tantivy-tokenizer-api Authors tempfile,https://github.com/Stebalien/tempfile,MIT OR Apache-2.0,"Steven Allen , The Rust Project Developers, Ashley Mannix , Jason White " termtree,https://github.com/rust-cli/termtree,MIT,The termtree Authors testing_table,https://github.com/zhiburt/tabled,MIT,Maxim Zhiburt thiserror,https://github.com/dtolnay/thiserror,MIT OR Apache-2.0,David Tolnay thiserror-impl,https://github.com/dtolnay/thiserror,MIT OR Apache-2.0,David Tolnay thousands,https://github.com/tov/thousands-rs,MIT OR Apache-2.0,Jesse A. Tov thread_local,https://github.com/Amanieu/thread_local-rs,MIT OR Apache-2.0,Amanieu d'Antras time,https://github.com/time-rs/time,MIT OR Apache-2.0,"Jacob Pratt , Time contributors" time-core,https://github.com/time-rs/time,MIT OR Apache-2.0,"Jacob Pratt , Time contributors" time-fmt,https://github.com/MiSawa/time-fmt,MIT OR Apache-2.0,mi_sawa time-macros,https://github.com/time-rs/time,MIT OR Apache-2.0,"Jacob Pratt , Time contributors" tinystr,https://github.com/unicode-org/icu4x,Unicode-3.0,The ICU4X Project Developers tinytemplate,https://github.com/bheisler/TinyTemplate,Apache-2.0 OR MIT,Brook Heisler tinyvec,https://github.com/Lokathor/tinyvec,Zlib OR Apache-2.0 OR MIT,Lokathor tinyvec_macros,https://github.com/Soveu/tinyvec_macros,MIT OR Apache-2.0 OR Zlib,Soveu tokio,https://github.com/tokio-rs/tokio,MIT,Tokio Contributors tokio-macros,https://github.com/tokio-rs/tokio,MIT,Tokio Contributors tokio-metrics,https://github.com/tokio-rs/tokio-metrics,MIT,Tokio Contributors tokio-rustls,https://github.com/rustls/tokio-rustls,MIT OR Apache-2.0,The tokio-rustls Authors tokio-stream,https://github.com/tokio-rs/tokio,MIT,Tokio Contributors tokio-util,https://github.com/tokio-rs/tokio,MIT,Tokio Contributors toml,https://github.com/toml-rs/toml,MIT OR Apache-2.0,The toml Authors 
toml_datetime,https://github.com/toml-rs/toml,MIT OR Apache-2.0,The toml_datetime Authors toml_parser,https://github.com/toml-rs/toml,MIT OR Apache-2.0,The toml_parser Authors toml_writer,https://github.com/toml-rs/toml,MIT OR Apache-2.0,The toml_writer Authors tonic,https://github.com/hyperium/tonic,MIT,Lucio Franco tonic-build,https://github.com/hyperium/tonic,MIT,Lucio Franco tonic-health,https://github.com/hyperium/tonic,MIT,James Nugent tonic-prost,https://github.com/hyperium/tonic,MIT,Lucio Franco tonic-prost-build,https://github.com/hyperium/tonic,MIT,Lucio Franco tonic-reflection,https://github.com/hyperium/tonic,MIT,"James Nugent , Samani G. Gikandi " tower,https://github.com/tower-rs/tower,MIT,Tower Maintainers tower-http,https://github.com/tower-rs/tower-http,MIT,Tower Maintainers tower-layer,https://github.com/tower-rs/tower,MIT,Tower Maintainers tower-service,https://github.com/tower-rs/tower,MIT,Tower Maintainers tracing,https://github.com/tokio-rs/tracing,MIT,"Eliza Weisman , Tokio Contributors " tracing-attributes,https://github.com/tokio-rs/tracing,MIT,"Tokio Contributors , Eliza Weisman , David Barsky " tracing-core,https://github.com/tokio-rs/tracing,MIT,Tokio Contributors tracing-log,https://github.com/tokio-rs/tracing,MIT,Tokio Contributors tracing-opentelemetry,https://github.com/tokio-rs/tracing-opentelemetry,MIT,The tracing-opentelemetry Authors tracing-serde,https://github.com/tokio-rs/tracing,MIT,Tokio Contributors tracing-subscriber,https://github.com/tokio-rs/tracing,MIT,"Eliza Weisman , David Barsky , Tokio Contributors " triomphe,https://github.com/Manishearth/triomphe,MIT OR Apache-2.0,"Manish Goregaokar , The Servo Project Developers" try-lock,https://github.com/seanmonstar/try-lock,MIT,Sean McArthur ttl_cache,https://github.com/stusmall/ttl_cache,MIT OR Apache-2.0,Stu Small typeid,https://github.com/dtolnay/typeid,MIT OR Apache-2.0,David Tolnay typenum,https://github.com/paholg/typenum,MIT OR Apache-2.0,"Paho Lurie-Gregg , Andre 
Bogus " typetag,https://github.com/dtolnay/typetag,MIT OR Apache-2.0,David Tolnay typetag-impl,https://github.com/dtolnay/typetag,MIT OR Apache-2.0,David Tolnay ulid,https://github.com/dylanhart/ulid-rs,MIT,dylanhart unarray,https://github.com/cameron1024/unarray,MIT OR Apache-2.0,The unarray Authors unicase,https://github.com/seanmonstar/unicase,MIT OR Apache-2.0,Sean McArthur unicode-ident,https://github.com/dtolnay/unicode-ident,(MIT OR Apache-2.0) AND Unicode-3.0,David Tolnay unicode-width,https://github.com/unicode-rs/unicode-width,MIT OR Apache-2.0,"kwantam , Manish Goregaokar " unit-prefix,https://codeberg.org/commons-rs/unit-prefix,MIT,"Fabio Valentini , Benjamin Sago " unsafe-libyaml,https://github.com/dtolnay/unsafe-libyaml,MIT,David Tolnay untrusted,https://github.com/briansmith/untrusted,ISC,Brian Smith ureq-proto,https://github.com/algesten/ureq-proto,MIT OR Apache-2.0,Martin Algesten url,https://github.com/servo/rust-url,MIT OR Apache-2.0,The rust-url developers urlencoding,https://github.com/kornelski/rust_urlencoding,MIT,"Kornel , Bertram Truong " username,https://pijul.org/darcs/user,MIT OR Apache-2.0,Pierre-Étienne Meunier utf-8,https://github.com/SimonSapin/rust-utf8,MIT OR Apache-2.0,Simon Sapin utf8-ranges,https://github.com/BurntSushi/utf8-ranges,Unlicense OR MIT,Andrew Gallant utf8_iter,https://github.com/hsivonen/utf8_iter,Apache-2.0 OR MIT,Henri Sivonen utf8parse,https://github.com/alacritty/vte,Apache-2.0 OR MIT,"Joe Wilm , Christian Duerr " utoipa,https://github.com/juhaku/utoipa,MIT OR Apache-2.0,Juha Kukkonen utoipa-gen,https://github.com/juhaku/utoipa,MIT OR Apache-2.0,Juha Kukkonen uuid,https://github.com/uuid-rs/uuid,Apache-2.0 OR MIT,"Ashley Mannix, Dylan DPC, Hunar Roop Kahlon" valuable,https://github.com/tokio-rs/valuable,MIT,The valuable Authors vsimd,https://github.com/Nugine/simd,MIT,The vsimd Authors vte,https://github.com/alacritty/vte,Apache-2.0 OR MIT,"Joe Wilm , Christian Duerr " 
wait-timeout,https://github.com/alexcrichton/wait-timeout,MIT OR Apache-2.0,Alex Crichton walkdir,https://github.com/BurntSushi/walkdir,Unlicense OR MIT,Andrew Gallant want,https://github.com/seanmonstar/want,MIT,Sean McArthur warp,https://github.com/seanmonstar/warp,MIT,Sean McArthur wasi,https://github.com/bytecodealliance/wasi,Apache-2.0 WITH LLVM-exception OR Apache-2.0 OR MIT,The Cranelift Project Developers wasip2,https://github.com/bytecodealliance/wasi-rs,Apache-2.0 WITH LLVM-exception OR Apache-2.0 OR MIT,The wasip2 Authors wasix,https://github.com/wasix-org/wasix-abi-rust,Apache-2.0 WITH LLVM-exception OR Apache-2.0 OR MIT,"The Cranelift Project Developers, john-sharratt" wasm-bindgen,https://github.com/wasm-bindgen/wasm-bindgen,MIT OR Apache-2.0,The wasm-bindgen Developers wasm-bindgen-futures,https://github.com/wasm-bindgen/wasm-bindgen/tree/master/crates/futures,MIT OR Apache-2.0,The wasm-bindgen Developers wasm-bindgen-macro,https://github.com/wasm-bindgen/wasm-bindgen/tree/master/crates/macro,MIT OR Apache-2.0,The wasm-bindgen Developers wasm-bindgen-macro-support,https://github.com/wasm-bindgen/wasm-bindgen/tree/master/crates/macro-support,MIT OR Apache-2.0,The wasm-bindgen Developers wasm-bindgen-shared,https://github.com/wasm-bindgen/wasm-bindgen/tree/master/crates/shared,MIT OR Apache-2.0,The wasm-bindgen Developers wasmtimer,https://github.com/whizsid/wasmtimer-rs,MIT,"WhizSid , Pierre Krieger " web-sys,https://github.com/wasm-bindgen/wasm-bindgen/tree/master/crates/web-sys,MIT OR Apache-2.0,The wasm-bindgen Developers web-time,https://github.com/daxpedda/web-time,MIT OR Apache-2.0,The web-time Authors webpki-roots,https://github.com/rustls/webpki-roots,CDLA-Permissive-2.0,The webpki-roots Authors winapi,https://github.com/retep998/winapi-rs,MIT,Peter Atashian winapi,https://github.com/retep998/winapi-rs,MIT OR Apache-2.0,Peter Atashian winapi-i686-pc-windows-gnu,https://github.com/retep998/winapi-rs,MIT OR Apache-2.0,Peter Atashian 
winapi-util,https://github.com/BurntSushi/winapi-util,Unlicense OR MIT,Andrew Gallant winapi-x86_64-pc-windows-gnu,https://github.com/retep998/winapi-rs,MIT OR Apache-2.0,Peter Atashian windows,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows-collections,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows-collections Authors windows-core,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows-core,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows-core Authors windows-future,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows-future Authors windows-implement,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows-implement Authors windows-interface,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows-interface Authors windows-link,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows-link,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows-link Authors windows-numerics,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows-numerics Authors windows-result,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows-result,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows-result Authors windows-strings,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows-strings,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows-strings Authors windows-sys,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows-sys,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows-sys Authors windows-targets,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows-targets,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows-targets Authors windows-threading,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft 
windows_aarch64_gnullvm,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows_aarch64_gnullvm,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows_aarch64_gnullvm Authors windows_aarch64_msvc,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows_aarch64_msvc,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows_aarch64_msvc Authors windows_i686_gnu,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows_i686_gnu,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows_i686_gnu Authors windows_i686_gnullvm,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows_i686_gnullvm,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows_i686_gnullvm Authors windows_i686_msvc,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows_i686_msvc,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows_i686_msvc Authors windows_x86_64_gnu,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows_x86_64_gnu,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows_x86_64_gnu Authors windows_x86_64_gnullvm,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows_x86_64_gnullvm,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows_x86_64_gnullvm Authors windows_x86_64_msvc,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,Microsoft windows_x86_64_msvc,https://github.com/microsoft/windows-rs,MIT OR Apache-2.0,The windows_x86_64_msvc Authors winnow,https://github.com/winnow-rs/winnow,MIT,The winnow Authors wit-bindgen,https://github.com/bytecodealliance/wit-bindgen,Apache-2.0 WITH LLVM-exception OR Apache-2.0 OR MIT,Alex Crichton writeable,https://github.com/unicode-org/icu4x,Unicode-3.0,The ICU4X Project Developers xmlparser,https://github.com/RazrFalcon/xmlparser,MIT OR Apache-2.0,Yevhenii Reizner 
yansi,https://github.com/SergioBenitez/yansi,MIT OR Apache-2.0,Sergio Benitez
yoke,https://github.com/unicode-org/icu4x,Unicode-3.0,Manish Goregaokar
yoke-derive,https://github.com/unicode-org/icu4x,Unicode-3.0,Manish Goregaokar
zerocopy,https://github.com/google/zerocopy,BSD-2-Clause OR Apache-2.0 OR MIT,"Joshua Liebow-Feeser, Jack Wrenn"
zerocopy-derive,https://github.com/google/zerocopy,BSD-2-Clause OR Apache-2.0 OR MIT,"Joshua Liebow-Feeser, Jack Wrenn"
zerofrom,https://github.com/unicode-org/icu4x,Unicode-3.0,Manish Goregaokar
zerofrom-derive,https://github.com/unicode-org/icu4x,Unicode-3.0,Manish Goregaokar
zeroize,https://github.com/RustCrypto/utils,Apache-2.0 OR MIT,The RustCrypto Project Developers
zerotrie,https://github.com/unicode-org/icu4x,Unicode-3.0,The ICU4X Project Developers
zerovec,https://github.com/unicode-org/icu4x,Unicode-3.0,The ICU4X Project Developers
zerovec-derive,https://github.com/unicode-org/icu4x,Unicode-3.0,Manish Goregaokar
zmij,https://github.com/dtolnay/zmij,MIT,David Tolnay
zstd,https://github.com/gyscos/zstd-rs,MIT,Alexandre Bury
zstd-safe,https://github.com/gyscos/zstd-rs,MIT OR Apache-2.0,Alexandre Bury
zstd-sys,https://github.com/gyscos/zstd-rs,MIT OR Apache-2.0,Alexandre Bury

================================================
FILE: Makefile
================================================
DOCKER_SERVICES ?= all

QUICKWIT_SRC = quickwit

help:
	@grep '^[^\.#[:space:]].*:' Makefile

IMAGE_TAG := $(shell git branch --show-current | tr '\#/' '-')

QW_COMMIT_DATE := $(shell TZ=UTC0 git log -1 --format=%cd --date=format-local:'%Y-%m-%dT%H:%M:%SZ')
QW_COMMIT_HASH := $(shell git rev-parse HEAD)
QW_COMMIT_TAGS := $(shell git tag --points-at HEAD | tr '\n' ',')

docker-build:
	@docker build \
		--build-arg QW_COMMIT_DATE=$(QW_COMMIT_DATE) \
		--build-arg QW_COMMIT_HASH=$(QW_COMMIT_HASH) \
		--build-arg QW_COMMIT_TAGS=$(QW_COMMIT_TAGS) \
		-t quickwit/quickwit:$(IMAGE_TAG) .

# Usage:
# `make docker-compose-up` starts all the services.
# `make docker-compose-up DOCKER_SERVICES='jaeger,localstack'` starts the subset of services matching the profiles.
docker-compose-up:
	@echo "Launching ${DOCKER_SERVICES} Docker service(s)"
	COMPOSE_PROFILES=$(DOCKER_SERVICES) docker compose -f docker-compose.yml up -d --remove-orphans --wait

docker-compose-down:
	docker compose -p quickwit down --remove-orphans

# `-f docker-compose.yml` selects the compose file and must precede the `logs`
# subcommand; after `logs`, `-f` means "follow" and takes no argument.
docker-compose-logs:
	docker compose -f docker-compose.yml logs -f -t

docker-compose-monitoring:
	COMPOSE_PROFILES=monitoring docker compose -f docker-compose.yml up -d --remove-orphans

docker-rm-postgres-volume:
	docker volume rm quickwit_postgres_data

docker-rm-volumes:
	docker volume rm quickwit_azurite_data quickwit_fake_gcs_server_data quickwit_grafana_conf quickwit_grafana_data quickwit_localstack_data quickwit_postgres_data

doc:
	@$(MAKE) -C $(QUICKWIT_SRC) doc

fmt:
	@$(MAKE) -C $(QUICKWIT_SRC) fmt

fix:
	@$(MAKE) -C $(QUICKWIT_SRC) fix

typos:
	typos

# Usage:
# `make test-all` starts the Docker services and runs all the tests.
# `make -k test-all docker-compose-down` tears down the Docker services after running all the tests.
test-all: docker-compose-up
	@$(MAKE) -C $(QUICKWIT_SRC) test-all

test-failpoints:
	@$(MAKE) -C $(QUICKWIT_SRC) test-failpoints

# This will build and push all custom cross images for cross-compilation.
# You will need to log in to Docker Hub with the `quickwit` account.
IMAGE_TAGS = x86_64-unknown-linux-gnu aarch64-unknown-linux-gnu x86_64-unknown-linux-musl aarch64-unknown-linux-musl

.PHONY: cross-images
cross-images:
	@for tag in ${IMAGE_TAGS}; do \
		docker build --tag quickwit/cross:$$tag --file ./build/cross-images/$$tag.dockerfile ./build/cross-images; \
		docker push quickwit/cross:$$tag; \
	done

# TODO: to be replaced by https://github.com/quickwit-oss/quickwit/issues/237
.PHONY: build
build: build-ui
	$(MAKE) -C $(QUICKWIT_SRC) build

# Usage:
# `BINARY_FILE=path/to/quickwit/binary BINARY_VERSION=0.1.0 ARCHIVE_NAME=quickwit make archive`
# - BINARY_FILE: Path of the quickwit binary file.
# - BINARY_VERSION: Version of the quickwit binary.
# - ARCHIVE_NAME: Name of the resulting archive file (without extension).
.PHONY: archive
archive:
	@echo "Archiving release binary & assets"
	@mkdir -p "./quickwit-${BINARY_VERSION}/config"
	@mkdir -p "./quickwit-${BINARY_VERSION}/qwdata"
	@cp ./config/quickwit.yaml "./quickwit-${BINARY_VERSION}/config"
	@cp ./LICENSE "./quickwit-${BINARY_VERSION}"
	@cp "${BINARY_FILE}" "./quickwit-${BINARY_VERSION}"
	@tar -czf "${ARCHIVE_NAME}.tar.gz" "./quickwit-${BINARY_VERSION}"
	@rm -rf "./quickwit-${BINARY_VERSION}"

workspace-deps-tree:
	$(MAKE) -C $(QUICKWIT_SRC) workspace-deps-tree

.PHONY: build-rustdoc
build-rustdoc:
	$(MAKE) -C $(QUICKWIT_SRC) build-rustdoc

.PHONY: build-ui
build-ui:
	$(MAKE) -C $(QUICKWIT_SRC) build-ui

================================================
FILE: README.md
================================================
[![CI](https://github.com/quickwit-oss/quickwit/actions/workflows/ci.yml/badge.svg)](https://github.com/quickwit-oss/quickwit/actions?query=workflow%3ACI+branch%3Amain)
[![codecov](https://codecov.io/gh/quickwit-oss/quickwit/branch/main/graph/badge.svg?token=06SRGAV5SS)](https://codecov.io/gh/quickwit-oss/quickwit)
[![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/quickwit-oss/quickwit/badge)](https://scorecard.dev/viewer/?uri=github.com/quickwit-oss/quickwit)
[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg)](CODE_OF_CONDUCT.md)
[![License: Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square)](LICENSE)
[![Twitter Follow](https://img.shields.io/twitter/follow/Quickwit_Inc?color=%231DA1F2&logo=Twitter&style=plastic)](https://twitter.com/Quickwit_Inc)
[![Discord](https://img.shields.io/discord/908281611840282624?logo=Discord&logoColor=%23FFFFFF&style=plastic)](https://discord.quickwit.io)


Quickwit Cloud-Native Search Engine

Cloud-native search engine for observability (logs, traces, and soon metrics!). An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.

Quickstart | Docs | Tutorials | Chat | Download


We just released Quickwit 0.8! Read the [blog post](https://quickwit.io/blog/quickwit-0.8) to learn about the latest powerful features!

### **Quickwit is the fastest search engine on cloud storage. It's the perfect fit for observability use cases**

- [Log management](https://quickwit.io/docs/log-management/overview)
- [Distributed tracing](https://quickwit.io/docs/distributed-tracing/overview)
- Metrics support is on the roadmap

### 🚀 Quickstart

- [Search and analytics on Stack Overflow dataset](https://quickwit.io/docs/get-started/quickstart)
- [Trace analytics with Grafana](https://quickwit.io/docs/get-started/tutorials/trace-analytics-with-grafana)
- [Distributed tracing with Jaeger](https://quickwit.io/docs/get-started/tutorials/tutorial-jaeger)

# 💡 Features

- Full-text search and aggregation queries
- Elasticsearch-compatible API, use Quickwit with any Elasticsearch or OpenSearch client
- [Jaeger-native](https://quickwit.io/docs/distributed-tracing/plug-quickwit-to-jaeger)
- OTEL-native for [logs](https://quickwit.io/docs/log-management/overview) and [traces](https://quickwit.io/docs/distributed-tracing/overview)
- [Schemaless](https://quickwit.io/docs/guides/schemaless) or strict schema indexing
- Schemaless analytics
- Sub-second search on cloud storage (Amazon S3, Azure Blob Storage, Google Cloud Storage, …)
- Decoupled compute and storage, stateless indexers & searchers
- [Grafana data source](https://github.com/quickwit-oss/quickwit-datasource)
- Kubernetes ready: see our [Helm chart](https://quickwit.io/docs/deployment/kubernetes/helm)
- RESTful API

## Enterprise ready

- Multiple native [data sources](https://quickwit.io/docs/ingest-data/): Kafka, Kinesis, Pulsar
- Multi-tenancy: indexing with many indexes and partitioning
- Retention policies
- Delete tasks (for GDPR use cases)
- Distributed and highly available* engine that scales out in seconds (*HA indexing only with Kafka)

# 📑 Architecture overview

![Quickwit Distributed Tracing](./docs/assets/images/quickwit-overview-light.svg#gh-light-mode-only)![Quickwit Distributed Tracing](./docs/assets/images/quickwit-overview-dark.svg#gh-dark-mode-only)

- [Architecture overview](https://quickwit.io/docs/overview/architecture)
- [Log management](https://quickwit.io/docs/log-management/overview)
- [Distributed traces](https://quickwit.io/docs/distributed-tracing/overview)

# 📕 Documentation

- [Installation](https://quickwit.io/docs/get-started/installation)
- [Log management with Quickwit](https://quickwit.io/docs/log-management/overview)
- [Distributed Tracing with Quickwit](https://quickwit.io/docs/distributed-tracing/overview)
- [Ingest data](https://quickwit.io/docs/ingest-data/)
- [REST API](https://quickwit.io/docs/reference/rest-api)

# 📚 Resources

- [Blog posts](https://quickwit.io/blog/)
- [YouTube channel](https://www.youtube.com/@quickwit8103)
- [Discord](https://discord.quickwit.io)

# 🔮 Roadmap

- Quickwit 0.9 (July 2024)
  - Indexing and search performance improvements
  - Index configuration updates (retention policy, indexing and search settings)
  - Concatenated field
- Quickwit 0.10 (October 2024)
  - Schema (doc mapping) updates
  - Native distributed ingestion
  - Index templates

# 🙋 FAQ

### How can I switch from Elasticsearch or OpenSearch to Quickwit?

Quickwit supports a large subset of the Elasticsearch/OpenSearch API.

For instance, it has an ES-compatible ingest API to make it easier to migrate your log shippers (Vector, Fluent Bit, Syslog, ...) to Quickwit.

On the search side, the most popular Elasticsearch endpoints, query DSL, and even aggregations are supported.

The list of available endpoints and queries is available [here](https://quickwit.io/docs/reference/es_compatible_api), while the list of supported aggregations is available [here](https://quickwit.io/docs/reference/aggregation).

Let us know if part of the API you are using is missing!

If the client you are using refuses to connect to Quickwit due to missing headers, you can use the `extra_headers` option in the [node configuration](https://quickwit.io/docs/configuration/node-config#rest-configuration) to impersonate any compatible version of Elasticsearch or OpenSearch.

### How is Quickwit different from traditional search engines like Elasticsearch or Solr?

The core difference and advantage of Quickwit is its architecture, built from the ground up to search on cloud storage. We optimized IO paths, revamped the index data structures, and made search stateless and sub-second on cloud storage.

### How does Quickwit compare to Elastic in terms of cost?

We estimate that Quickwit can be up to 10x cheaper on average than Elastic. To understand how, check out our [blog post](https://quickwit.io/blog/commoncrawl/) about searching the web on AWS S3.

### What license does Quickwit use?

Quickwit is open source under the Apache License, Version 2.0 (Apache-2.0).

### Is it possible to set up Quickwit for High Availability (HA)?

HA is available for search. For indexing, it is available only with a Kafka source.

# 🤝 Contribute and spread the word

We are always thrilled to receive contributions: code, documentation, issues, or feedback. Here's how you can help us build the future of log management:

- Start by checking out the [GitHub issues labeled "Good first issue"](https://github.com/quickwit-oss/quickwit/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22). These are a great place for newcomers to contribute.
- Read our [Contributor Covenant Code of Conduct](./CODE_OF_CONDUCT.md) to understand our community standards.
- [Create a fork of Quickwit](https://github.com/quickwit-oss/quickwit/fork) to have your own copy of the repository where you can make changes.
- To understand how to contribute, read our [contributing guide](./CONTRIBUTING.md).
- Set up your development environment following our [development setup guide](./CONTRIBUTING.md#development).
- Once you've made your changes and tested them, you can contribute by [submitting a pull request](./CONTRIBUTING.md#submitting-a-pr).

✨ After your contributions are accepted, don't forget to claim your swag by emailing us at hello@quickwit.io. Thank you for contributing!

# 💬 Join Our Community

We welcome everyone to our community! Whether you're contributing code or just saying hello, we'd love to hear from you. Here's how you can connect with us:

- Join the conversation on [Discord](https://discord.quickwit.io).
- Follow us on [Twitter](https://twitter.com/Quickwit_Inc).
- Check out our [website](https://quickwit.io/) and [blog](https://quickwit.io/blog) for the latest updates.
- Watch our [YouTube](https://www.youtube.com/channel/UCvZVuRm2FiDq1_ul0mY85wA) channel for video content.

================================================
FILE: SECURITY.md
================================================
# Security Policy

## Supported Versions

| Version | Supported          |
| ------- | ------------------ |
| 0.3.1   | :white_check_mark: |
| < 0.3.1 | :x:                |

## Reporting a Vulnerability

To disclose a vulnerability in our code, please notify us by email at security@quickwit.io or private message _@fulmicoton_ or _@guilload_ on our Discord server ([discord.quickwit.io](https://discord.quickwit.io)). We will open a draft security advisory on our repository and grant you access so you can share with us more details about the vulnerability. After releasing a fix, we will publish the security advisory to publicly disclose the security vulnerability to the project's community.

================================================
FILE: _typos.toml
================================================
[files]
extend-exclude = ["**/*.json"]

[default.extend-words]
# Don't correct the surname "Teh"
strat = "strat"

================================================
FILE: build/cross-images/aarch64-unknown-linux-gnu.dockerfile
================================================
FROM ghcr.io/cross-rs/aarch64-unknown-linux-gnu:0.2.4@sha256:3356619b020614effd22e83cec41236e69f17ce581ffe35e252898b0c693b4e2

ARG PBC_URL="https://github.com/protocolbuffers/protobuf/releases/download/v21.5/protoc-21.5-linux-x86_64.zip"

#TODO:
# We can switch to static linking (remove `libsasl2-dev:arm64`) using
# `rdkafka/gssapi-vendored` feature when there is a release including:
# https://github.com/MaterializeInc/rust-sasl/pull/48

RUN dpkg --add-architecture arm64 && \
    apt-get update && \
    apt-get install -y clang-3.9 \
        libclang-3.9-dev \
        binutils-aarch64-linux-gnu \
        libsasl2-dev:arm64 \
        unzip && \
    rm -rf /var/lib/apt/lists/*

RUN curl -fLO $PBC_URL && \
    unzip protoc-21.5-linux-x86_64.zip -d ./protobuf && \
mv ./protobuf/bin/protoc /usr/bin/ && \ rm -rf ./protobuf protoc-21.5-linux-x86_64.zip ENV LIBZ_SYS_STATIC=1 \ PKG_CONFIG_ALLOW_CROSS=true \ PKG_CONFIG_ALL_STATIC=true \ X86_64_UNKNOWN_LINUX_MUSL_OPENSSL_STATIC=1 \ X86_64_UNKNOWN_LINUX_MUSL_OPENSSL_DIR=/usr/local/musl/ ================================================ FILE: build/cross-images/aarch64-unknown-linux-musl.dockerfile ================================================ FROM rustembedded/cross:aarch64-unknown-linux-musl@sha256:22627e0ba533781062127b13601c37216fdca27123390b07dfabd3f31f3c84a0 # The Rust toolchain to use when building our image. Set by `hooks/build`. # ARG TOOLCHAIN=stable # The OpenSSL version to use. Here is the place to check for new releases: # # - https://www.openssl.org/source/ # # ALSO UPDATE hooks/build! ARG OPENSSL_VERSION=1.1.1i ARG ZLIB_VERSION=1.2.11 RUN echo "Building OpenSSL" && \ cd /tmp && \ short_version="$(echo "$OPENSSL_VERSION" | sed s'/[a-z]$//' )" && \ curl -fLO "https://www.openssl.org/source/openssl-$OPENSSL_VERSION.tar.gz" || \ curl -fLO "https://www.openssl.org/source/old/$short_version/openssl-$OPENSSL_VERSION.tar.gz" && \ tar xvzf "openssl-$OPENSSL_VERSION.tar.gz" && cd "openssl-$OPENSSL_VERSION" && \ AR=aarch64-linux-musl-ar CC=aarch64-linux-musl-gcc ./Configure no-zlib -fPIC --prefix=/usr/local/aarch64-linux-musl -DOPENSSL_NO_SECURE_MEMORY linux-aarch64 && \ env C_INCLUDE_PATH=/usr/local/aarch64-linux-musl/include/ make depend && \ env C_INCLUDE_PATH=/usr/local/aarch64-linux-musl/include/ make && \ make install && \ rm -r /tmp/* RUN echo "Building zlib" && \ cd /tmp && \ curl -fLO "https://zlib.net/fossils/zlib-$ZLIB_VERSION.tar.gz" && \ tar xzf "zlib-$ZLIB_VERSION.tar.gz" && cd "zlib-$ZLIB_VERSION" && \ AR=aarch64-linux-musl-ar CC=aarch64-linux-musl-gcc ./configure --static --prefix=/usr/local/aarch64-linux-musl && \ make && make install && \ rm -r /tmp/* ENV AARCH64_UNKNOWN_LINUX_MUSL_OPENSSL_STATIC=1 \ CC=aarch64-linux-musl-gcc \ 
CFLAGS=-I/usr/local/aarch64-linux-musl/include \ LIBZ_SYS_STATIC=1 \ LIB_LDFLAGS=-L/usr/local/aarch64-linux-musl/lib \ OPENSSL_INCLUDE_DIR=/usr/local/aarch64-linux-musl/include/openssl \ OPENSSL_LIB_DIR=/usr/local/aarch64-linux-musl/lib \ PKG_CONFIG_ALLOW_CROSS=true \ PKG_CONFIG_ALL_STATIC=true \ TARGET=aarch64-unknown-linux-musl \ AARCH64_UNKNOWN_LINUX_MUSL_OPENSSL_DIR=/usr/local/aarch64-linux-musl \ OPENSSL_ROOT_DIR=/usr/local/aarch64-linux-musl ================================================ FILE: build/cross-images/x86_64-unknown-linux-gnu.dockerfile ================================================ FROM ghcr.io/cross-rs/x86_64-unknown-linux-gnu:0.2.4@sha256:7c9067212c2283be2a1d5585af5ecebd4c4a2e18091e2a6aafd23f9b4b81d496 ARG PBC_URL="https://github.com/protocolbuffers/protobuf/releases/download/v21.5/protoc-21.5-linux-x86_64.zip" RUN apt-get update && \ apt-get install -y clang-3.9 \ libclang-3.9-dev \ libsasl2-dev \ unzip && \ rm -rf /var/lib/apt/lists/* RUN curl -fLO $PBC_URL && \ unzip protoc-21.5-linux-x86_64.zip -d ./protobuf && \ mv ./protobuf/bin/protoc /usr/bin/ && \ rm -rf ./protobuf protoc-21.5-linux-x86_64.zip ================================================ FILE: build/cross-images/x86_64-unknown-linux-musl.dockerfile ================================================ FROM quickwit/cross-base:x86_64-unknown-linux-musl@sha256:5bcc7843aab64f89bf85c464fa2c5a00ecc634a8b1ac88c84a864f60054450cb # See https://github.com/quickwit-inc/rust-musl-builder RUN echo "Upgrading CMake" && \ sudo apt-get remove cmake -y && \ curl -fLO https://www.cmake.org/files/v3.12/cmake-3.12.1.tar.gz && \ tar -xvzf cmake-3.12.1.tar.gz && \ cd cmake-3.12.1/ && ./configure && \ sudo make install ENV CC=musl-gcc \ CFLAGS=-I/usr/local/musl/include \ LIB_LDFLAGS=-L/usr/lib/x86_64-linux-gnu ================================================ FILE: config/quickwit.yaml ================================================ # ============================ Node Configuration 
============================== # # Website: https://quickwit.io # Docs: https://quickwit.io/docs/configuration/node-config # # Configure AWS credentials: https://quickwit.io/docs/guides/aws-setup#aws-credentials # # -------------------------------- General settings -------------------------------- # # Config file format version. # version: 0.8 # # Node ID. Must be unique within a cluster. If not set, a random node ID is generated on each startup. # # node_id: node-1 # # Quickwit opens three sockets. # - for its HTTP server, hosting the UI and the REST API (TCP) # - for its gRPC service (TCP) # - for its Gossip cluster membership service (UDP) # # All three services are bound to the same host and a different port. The host can be an IP address or a hostname. # # Default HTTP server host is `127.0.0.1` and default HTTP port is 7280. # The default host value was chosen to avoid exposing the node to the open-world without users' explicit consent. # This allows for testing Quickwit in single-node mode or with multiple nodes running on the same host and listening # on different ports. However, in cluster mode, using this value is never appropriate because it causes the node to # ignore incoming traffic. # There are two options to set up a node in cluster mode: # 1. specify the node's hostname or IP # 2. pass `0.0.0.0` and let Quickwit do its best to discover the node's IP (see `advertise_address`) # # listen_address: 127.0.0.1 # # rest: # listen_port: 7280 # cors_allow_origins: # - "http://localhost:3000" # extra_headers: # x-header-1: header-value-1 # x-header-2: header-value-2 # # grpc: # max_message_size: 10 MiB # # IP address advertised by the node, i.e. the IP address that peer nodes should use to connect to the node for RPCs. # The environment variable `QW_ADVERTISE_ADDRESS` can also be used to override this value. # The default advertise address is `listen_address`. 
If `listen_address` is unspecified (`0.0.0.0`), # Quickwit attempts to sniff the node's IP by scanning the available network interfaces. # advertise_address: 192.168.0.42 # # In order to join a cluster, one needs to specify a list of # seeds to connect to. If no port is specified, Quickwit will assume # the seeds are using the same port as the current node gossip port. # By default, the peer seed list is empty. # # peer_seeds: # - quickwit-searcher-0.local # - quickwit-searcher-1.local:10000 # # Path to directory where temporary data (caches, intermediate indexing data structures) # is stored. Defaults to `./qwdata`. # # data_dir: /path/to/data/dir # # Metastore URI. Defaults to `data_dir/indexes#polling_interval=30s`, # which is a file-backed metastore and mostly convenient for testing. A cluster would # require a metastore backed by Amazon S3 or PostgreSQL. # # metastore_uri: s3://your-bucket/indexes # metastore_uri: postgres://username:password@host:port/db # # When using a file-backed metastore, the state of the metastore will be cached forever. # If you are indexing and searching from different processes, it is possible to periodically # refresh the state of the metastore on the searcher using the `polling_interval` hashtag. # # metastore_uri: s3://your-bucket/indexes#polling_interval=30s # # Default index root URI, which defines where index data (splits) is stored, # following the scheme `{default_index_root_uri}/{index-id}`. Defaults to `{data_dir}/indexes`. # # default_index_root_uri: s3://your-bucket/indexes # # -------------------------------- Storage settings -------------------------------- # https://quickwit.io/docs/configuration/node-config#storage-configuration # # Hardcoding credentials into configuration files is not secure and strongly # discouraged. Prefer the alternative authentication methods that your storage # backend may provide.
# # storage: # azure: # account: ${QW_AZURE_STORAGE_ACCOUNT} # access_key: ${QW_AZURE_STORAGE_ACCESS_KEY} # # s3: # access_key_id: ${AWS_ACCESS_KEY_ID} # secret_access_key: ${AWS_SECRET_ACCESS_KEY} # region: ${AWS_REGION} # endpoint: ${QW_S3_ENDPOINT} # force_path_style_access: ${QW_S3_FORCE_PATH_STYLE_ACCESS:-false} # disable_multi_object_delete: false # disable_multipart_upload: false # # -------------------------------- Metastore settings -------------------------------- # https://quickwit.io/docs/configuration/node-config#metastore-configuration # # metastore: # postgres: # min_connections: 0 # max_connections: 10 # acquire_connection_timeout: 10s # idle_connection_timeout: 10min # max_connection_lifetime: 30min # # -------------------------------- Indexer settings -------------------------------- # https://quickwit.io/docs/configuration/node-config#indexer-configuration indexer: enable_otlp_endpoint: ${QW_ENABLE_OTLP_ENDPOINT:-true} # split_store_max_num_bytes: 100G # split_store_max_num_splits: 1000 # max_concurrent_split_uploads: 12 # # # -------------------------------- Ingest API settings ------------------------------ # https://quickwit.io/docs/configuration/node-config#ingest-api-configuration # # ingest_api: # max_queue_memory_usage: 2GiB # max_queue_disk_usage: 4GiB # content_length_limit: 10MiB # # -------------------------------- Searcher settings -------------------------------- # https://quickwit.io/docs/configuration/node-config#searcher-configuration # # searcher: # fast_field_cache_capacity: 1G # split_footer_cache_capacity: 500M # partial_request_cache_capacity: 64M # max_num_concurrent_split_streams: 100 # max_num_concurrent_split_searches: 100 # aggregation_memory_limit: 500M # aggregation_bucket_limit: 65000 # split_cache: # max_num_bytes: 1G # max_num_splits: 10000 # num_concurrent_downloads: 1 # -------------------------------- Jaeger settings -------------------------------- jaeger: enable_endpoint: ${QW_ENABLE_JAEGER_ENDPOINT:-true} 
================================================ FILE: config/templates/gh-archive.yaml ================================================ version: 0.8 template_id: gh-archive index_id_patterns: - gh-archive* description: Index config template for the GH Archive dataset (gharchive.org) priority: 0 doc_mapping: field_mappings: - name: id type: text tokenizer: raw - name: type type: text fast: true tokenizer: raw - name: public type: bool fast: true - name: payload type: json tokenizer: default - name: org type: json tokenizer: default - name: repo type: json tokenizer: default - name: actor type: json tokenizer: default - name: other type: json tokenizer: default - name: created_at type: datetime fast: true input_formats: - rfc3339 fast_precision: seconds timestamp_field: created_at indexing_settings: commit_timeout_secs: 10 ================================================ FILE: config/templates/stackoverflow.yaml ================================================ version: 0.8 template_id: stackoverflow index_id_patterns: - stackoverflow* description: Index config template for the Stackoverflow tutorial (quickwit.io/docs/get-started/quickstart) priority: 0 doc_mapping: field_mappings: - name: title type: text tokenizer: default record: position stored: true - name: body type: text tokenizer: default record: position stored: true - name: creationDate type: datetime fast: true input_formats: - rfc3339 fast_precision: seconds timestamp_field: creationDate search_settings: default_search_fields: [title, body] indexing_settings: commit_timeout_secs: 10 ================================================ FILE: config/tutorials/fluentbit-logs/index-config.yaml ================================================ version: 0.8 index_id: fluentbit-logs doc_mapping: mode: dynamic field_mappings: - name: timestamp type: datetime input_formats: - unix_timestamp output_format: unix_timestamp_secs fast: true timestamp_field: timestamp indexing_settings: commit_timeout_secs: 10 
================================================ FILE: config/tutorials/gh-archive/index-config-for-clickhouse.yaml ================================================ # # Index config file for gh-archive dataset. # version: 0.8 index_id: gh-archive doc_mapping: store_source: false field_mappings: - name: id type: u64 fast: true - name: created_at type: datetime input_formats: - unix_timestamp output_format: unix_timestamp_secs fast_precision: seconds fast: true - name: event_type type: text tokenizer: raw - name: title type: text tokenizer: default record: position - name: body type: text tokenizer: default record: position timestamp_field: created_at search_settings: default_search_fields: [title, body] ================================================ FILE: config/tutorials/gh-archive/index-config.yaml ================================================ # # Index config file for gh-archive dataset. # version: 0.8 index_id: gh-archive doc_mapping: field_mappings: - name: id type: text tokenizer: raw - name: type type: text fast: true tokenizer: raw - name: public type: bool fast: true - name: payload type: json tokenizer: default - name: org type: json tokenizer: default - name: repo type: json tokenizer: default - name: actor type: json tokenizer: default - name: other type: json tokenizer: default - name: created_at type: datetime fast: true input_formats: - rfc3339 fast_precision: seconds timestamp_field: created_at indexing_settings: commit_timeout_secs: 10 ================================================ FILE: config/tutorials/gh-archive/kafka-source.yaml ================================================ version: 0.8 source_id: kafka-source source_type: kafka num_pipelines: 2 params: topic: gh-archive client_params: bootstrap.servers: localhost:9092 ================================================ FILE: config/tutorials/gh-archive/kinesis-source.yaml ================================================ version: 0.8 source_id: kinesis-source source_type: kinesis params: 
stream_name: gh-archive ================================================ FILE: config/tutorials/grafana/docker-compose.yml ================================================ version: "3.9" networks: default: name: quickwit-grafana # ipam: # config: # - subnet: 172.16.7.0/24 # gateway: 172.16.7.1 services: quickwit: image: quickwit/quickwit:${QUICKWIT_VERSION:-0.7.1} grafana: image: grafana/grafana-oss:${GRAFANA_VERSION:-9.4.7} container_name: grafana ports: - "${MAP_HOST_GRAFANA:-127.0.0.1}:3000:3000" environment: GF_AUTH_DISABLE_LOGIN_FORM: "true" GF_AUTH_ANONYMOUS_ENABLED: "true" GF_AUTH_ANONYMOUS_ORG_ROLE: Admin volumes: - ./monitoring/grafana/dashboards:/var/lib/grafana/dashboards - ./monitoring/grafana/provisioning:/etc/grafana/provisioning jaeger: image: jaegertracing/all-in-one:${JAEGER_VERSION:-1.48.0} container_name: jaeger ports: - "${MAP_HOST_JAEGER:-127.0.0.1}:16686:16686" # Frontend profiles: - jaeger - monitoring otel-collector: image: otel/opentelemetry-collector:${OTEL_VERSION:-0.84.0} container_name: otel-collector ports: - "${MAP_HOST_OTEL:-127.0.0.1}:1888:1888" # pprof extension - "${MAP_HOST_OTEL:-127.0.0.1}:8888:8888" # Prometheus metrics exposed by the collector - "${MAP_HOST_OTEL:-127.0.0.1}:8889:8889" # Prometheus exporter metrics - "${MAP_HOST_OTEL:-127.0.0.1}:13133:13133" # health_check extension - "${MAP_HOST_OTEL:-127.0.0.1}:4317:4317" # OTLP gRPC receiver - "${MAP_HOST_OTEL:-127.0.0.1}:4318:4318" # OTLP http receiver - "${MAP_HOST_OTEL:-127.0.0.1}:55679:55679" # zpages extension profiles: - otel - monitoring volumes: - ./monitoring/otel-collector-config.yaml:/etc/otel-collector-config.yaml command: ["--config=/etc/otel-collector-config.yaml"] prometheus: image: prom/prometheus:${PROMETHEUS_VERSION:-v2.43.0} container_name: prometheus ports: - "${MAP_HOST_PROMETHEUS:-127.0.0.1}:9090:9090" profiles: - prometheus - monitoring volumes: - ./monitoring/prometheus.yaml:/etc/prometheus/prometheus.yml extra_hosts: - 
"host.docker.internal:host-gateway" gcp-pubsub-emulator: # It is not an official docker image # if we prefer we can build a docker from the official docker image (gcloud cli) # and install the pubsub emulator https://cloud.google.com/pubsub/docs/emulator image: thekevjames/gcloud-pubsub-emulator:${GCLOUD_EMULATOR:-455.0.0} container_name: gcp-pubsub-emulator ports: - "${MAP_HOST_GCLOUD_EMULATOR:-127.0.0.1}:8681:8681" environment: # create a fake gcp project and a topic / subscription - PUBSUB_PROJECT1=quickwit-emulator,emulator_topic:emulator_subscription profiles: - all - gcp-pubsub volumes: localstack_data: postgres_data: azurite_data: ================================================ FILE: config/tutorials/hdfs-logs/index-config-partitioned.yaml ================================================ # # Index config file for hdfs-logs dataset with partitioning configured. # version: 0.8 index_id: hdfs-logs-partitioned doc_mapping: field_mappings: - name: timestamp type: datetime input_formats: - unix_timestamp output_format: unix_timestamp_secs fast_precision: seconds fast: true - name: tenant_id type: u64 - name: severity_text type: text tokenizer: raw - name: body type: text tokenizer: default record: position - name: resource type: json tokenizer: raw tag_fields: [tenant_id] partition_key: tenant_id max_num_partitions: 1000 timestamp_field: timestamp search_settings: default_search_fields: [severity_text, body] indexing_settings: commit_timeout_secs: 30 split_num_docs_target: 10000000 merge_policy: type: "limit_merge" merge_factor: 10 max_merge_ops: 3 maturation_period: 48 hours ================================================ FILE: config/tutorials/hdfs-logs/index-config-retention-policy.yaml ================================================ # # Index config file for hdfs-logs dataset with a retention policy configured. 
# version: 0.8 index_id: hdfs-logs-retention-policy doc_mapping: field_mappings: - name: timestamp type: datetime input_formats: - unix_timestamp output_format: unix_timestamp_secs fast_precision: seconds fast: true - name: tenant_id type: u64 - name: severity_text type: text tokenizer: raw - name: body type: text tokenizer: default record: position - name: resource type: json tokenizer: raw tag_fields: [tenant_id] timestamp_field: timestamp search_settings: default_search_fields: [severity_text, body] retention: period: 90 days schedule: daily indexing_settings: commit_timeout_secs: 10 split_num_docs_target: 10000000 ================================================ FILE: config/tutorials/hdfs-logs/index-config.yaml ================================================ # # Index config file for hdfs-logs dataset. # version: 0.8 index_id: hdfs-logs doc_mapping: field_mappings: - name: timestamp type: datetime input_formats: - unix_timestamp output_format: unix_timestamp_secs fast_precision: seconds fast: true - name: tenant_id type: u64 - name: severity_text type: text tokenizer: raw - name: body type: text tokenizer: default record: position - name: resource type: json tokenizer: raw tag_fields: [tenant_id] timestamp_field: timestamp search_settings: default_search_fields: [severity_text, body] ================================================ FILE: config/tutorials/hdfs-logs/searcher-1.yaml ================================================ version: 0.8 node_id: searcher-1 listen_address: 127.0.0.1 rest: listen_port: 7280 ingest_api: max_queue_memory_usage: 4GiB max_queue_disk_usage: 8GiB peer_seeds: - 127.0.0.1:7290 # searcher-2 - 127.0.0.1:7300 # searcher-3 ================================================ FILE: config/tutorials/hdfs-logs/searcher-2.yaml ================================================ version: 0.8 node_id: searcher-2 listen_address: 127.0.0.1 rest: listen_port: 7290 peer_seeds: - 127.0.0.1:7280 # searcher-1 - 127.0.0.1:7300 # searcher-3 
================================================ FILE: config/tutorials/hdfs-logs/searcher-3.yaml ================================================ version: 0.8 node_id: searcher-3 listen_address: 127.0.0.1 rest: listen_port: 7300 peer_seeds: - 127.0.0.1:7280 # searcher-1 - 127.0.0.1:7290 # searcher-2 ================================================ FILE: config/tutorials/otel-logs/index-config.yaml ================================================ # # Index config file for receiving logs in OpenTelemetry format. # Link: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md # version: 0.8 index_id: otel-log-v0 doc_mapping: field_mappings: - name: timestamp type: datetime input_formats: - unix_timestamp output_format: unix_timestamp_secs fast: true - name: severity type: text tokenizer: raw fast: true - name: body type: text tokenizer: default record: position - name: attributes type: json - name: resource type: json timestamp_field: timestamp search_settings: default_search_fields: [severity, body] ================================================ FILE: config/tutorials/otel-logs/kafka-source.yaml ================================================ version: 0.8 source_id: kafka-source source_type: kafka input_format: otlp_logs_proto params: topic: otlp_logs client_params: bootstrap.servers: localhost:9092 ================================================ FILE: config/tutorials/otel-logs/otel-values.yaml ================================================ mode: "daemonset" presets: logsCollection: enabled: true kubernetesAttributes: enabled: true config: exporters: otlp: endpoint: quickwit-indexer.qw-tutorial.svc.cluster.local:7281 tls: insecure: true service: pipelines: logs: exporters: - otlp ================================================ FILE: config/tutorials/otel-traces/index-config.yaml ================================================ # # Index config file for receiving logs in OpenTelemetry format. 
# Link: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md # version: 0.8 index_id: otel-trace-v0 doc_mapping: mode: lenient field_mappings: - name: trace_id type: bytes - name: trace_state type: text indexed: false - name: resource_attributes type: json tokenizer: raw - name: resource_dropped_attributes_count type: u64 indexed: false - name: service_name type: text tokenizer: raw - name: span_id type: bytes - name: span_kind type: u64 - name: span_name type: text tokenizer: raw - name: span_start_timestamp_secs type: datetime indexed: true fast_precision: seconds fast: true input_formats: [unix_timestamp] output_format: unix_timestamp_secs - name: span_start_timestamp_nanos type: i64 indexed: false - name: span_end_timestamp_nanos type: i64 indexed: false - name: span_duration_secs type: i64 indexed: false - name: span_attributes type: json tokenizer: raw - name: span_dropped_attributes_count type: u64 indexed: false - name: span_dropped_events_count type: u64 indexed: false - name: span_dropped_links_count type: u64 indexed: false - name: span_status type: json indexed: false - name: parent_span_id type: bytes - name: events type: array tokenizer: raw - name: links type: array tokenizer: raw timestamp_field: span_start_timestamp_secs partition_key: service_name max_num_partitions: 100 indexing_settings: commit_timeout_secs: 30 search_settings: default_search_fields: [] ================================================ FILE: config/tutorials/otel-traces/kafka-source.yaml ================================================ version: 0.8 source_id: kafka-source source_type: kafka input_format: otlp_traces_proto params: topic: otlp_spans client_params: bootstrap.servers: localhost:9092 ================================================ FILE: config/tutorials/stackoverflow/index-config.yaml ================================================ # # Index config file for stackoverflow dataset. 
#

version: 0.8

index_id: stackoverflow

doc_mapping:
  field_mappings:
    - name: title
      type: text
      tokenizer: default
      record: position
      stored: true
    - name: body
      type: text
      tokenizer: default
      record: position
      stored: true
    - name: creationDate
      type: datetime
      fast: true
      input_formats:
        - rfc3339
      fast_precision: seconds
  timestamp_field: creationDate

search_settings:
  default_search_fields: [title, body]

indexing_settings:
  commit_timeout_secs: 10

================================================
FILE: config/tutorials/stackoverflow/pulsar-source.yaml
================================================
version: 0.8
source_id: pulsar-source
source_type: pulsar
params:
  topics:
    - quickwit/pulsar/stackoverflow
  address: pulsar://localhost:6650

================================================
FILE: config/tutorials/stackoverflow/send_messages_to_pulsar.py
================================================
import json
import pulsar

# Connect to the local Pulsar broker and open a producer on the topic.
client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('stackoverflow')

# Send one message per line of the dataset, reporting progress every 100 lines.
with open('stackoverflow.posts.transformed-10000.json', encoding='utf8') as file:
    for i, line in enumerate(file):
        producer.send(line.encode('utf-8'))
        if i % 100 == 0:
            print(f"{i}/10000 messages sent.")

client.close()

================================================
FILE: config/tutorials/vector-otel-logs/vector.toml
================================================
[sources.generate_syslog]
type = "demo_logs"
format = "syslog"
count = 100000
interval = 0.001

[transforms.remap_syslog]
inputs = [ "generate_syslog"]
type = "remap"
source = '''
  structured = parse_syslog!(.message)
  .timestamp_nanos, err = to_unix_timestamp(structured.timestamp, unit: "nanoseconds")
  .body = structured
  .service_name = structured.appname
  .resource_attributes.source_type = .source_type
  .resource_attributes.host.hostname = structured.hostname
  .resource_attributes.service.name = structured.appname
  .attributes.syslog.procid = structured.procid
  .attributes.syslog.facility =
structured.facility .attributes.syslog.version = structured.version .severity_text = if includes(["emerg", "err", "crit", "alert"], structured.severity) { "ERROR" } else if structured.severity == "warning" { "WARN" } else if structured.severity == "debug" { "DEBUG" } else if includes(["info", "notice"], structured.severity) { "INFO" } else { structured.severity } .scope_name = structured.msgid del(.message) del(.timestamp) del(.source_type) ''' [sinks.emit_syslog] inputs = ["remap_syslog"] type = "console" encoding.codec = "json" [sinks.quickwit_logs] type = "http" method = "post" inputs = ["remap_syslog"] encoding.codec = "json" framing.method = "newline_delimited" uri = "http://127.0.0.1:7280/api/v1/otel-logs-v0_7/ingest" ================================================ FILE: config/tutorials/wikipedia/index-config.yaml ================================================ # # Index config file for wikipedia dataset. # version: 0.8 index_id: wikipedia doc_mapping: field_mappings: - name: title type: text tokenizer: default record: position stored: true fieldnorms: true - name: body type: text tokenizer: default record: position stored: true fieldnorms: true - name: url type: text tokenizer: raw search_settings: default_search_fields: [title, body] indexing_settings: commit_timeout_secs: 10 ================================================ FILE: config/tutorials/wikipedia/multilang-index-config.yaml ================================================ # # Index config file for multilang wikipedia datasets. 
# version: 0.8 index_id: multilang-wikipedia doc_mapping: tokenizers: - name: multilang type: multilang field_mappings: - name: title type: text tokenizer: multilang record: position stored: true fieldnorms: true - name: body type: text tokenizer: multilang record: position stored: true fieldnorms: true - name: url type: text tokenizer: raw search_settings: default_search_fields: [title, body] indexing_settings: commit_timeout_secs: 10 ================================================ FILE: distribution/docker/ubuntu/Dockerfile ================================================ FROM ubuntu:noble@sha256:66460d557b25769b102175144d538d88219c077c678a49af4afca6fbfc1b5252 AS builder RUN apt-get update && apt-get install -y curl RUN curl -L https://install.quickwit.io | sh FROM ubuntu:noble@sha256:66460d557b25769b102175144d538d88219c077c678a49af4afca6fbfc1b5252 AS quickwit LABEL org.opencontainers.image.title="Quickwit" LABEL maintainer="Quickwit, Inc. " LABEL org.opencontainers.image.vendor="Quickwit, Inc." 
LABEL org.opencontainers.image.licenses="Apache-2.0"

RUN apt-get -y update \
    && apt-get -y install ca-certificates \
                          libssl3 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /quickwit
RUN mkdir config qwdata
COPY --from=builder /quickwit-v*/quickwit /usr/local/bin/quickwit
COPY --from=builder /quickwit-v*/config/quickwit.yaml /quickwit/config/quickwit.yaml

ENV QW_CONFIG=/quickwit/config/quickwit.yaml
ENV QW_DATA_DIR=/quickwit/qwdata
ENV QW_LISTEN_ADDRESS=0.0.0.0

RUN quickwit --version

ENTRYPOINT ["quickwit"]

================================================
FILE: distribution/ecs/.gitignore
================================================
.terraform
terraform.tfstate*
.terraform.tfstate*
terraform.tfvars

================================================
FILE: distribution/ecs/README.md
================================================
# ECS deployment for quickwit

## Run Quickwit in your infrastructure

Create a Quickwit module using:

```terraform
module "quickwit" {
  source = "github.com/quickwit-oss/quickwit/distribution/ecs/quickwit"

  vpc_id                       = # VPC in which all resources will be created
  subnet_ids                   = [...] # At least 2 private subnets must be specified
  quickwit_ingress_cidr_blocks = [...] # List of CIDR blocks allowed to access the Quickwit API
}
```

The Quickwit cluster runs on a private subnet. For ECS to pull the image:
- if using the default Docker Hub image `quickwit/quickwit`, the subnets specified must be configured with a NAT Gateway (no public IPs are attached to the tasks)
- if using an image hosted on ECR, a VPC endpoint for ECR can be used instead of a NAT Gateway

## Module configurations

To get the list of available configurations, check the `./quickwit/variables.tf` file.

### Tips

Metastore database backups are disabled as restoring one would lead to inconsistencies with the index store on S3. To ensure high availability, you should enable `rds_config.multi_az` instead.
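
As a sketch of the tip above, the fragment below enables Multi-AZ on the RDS instance that the module creates. It assumes `rds_config` accepts the `instance_class` and `multi_az` attributes shown in the commented example of `./example/terraform.tf`. The instance class is only an illustrative value, and the required networking arguments are left out.

```terraform
module "quickwit" {
  source = "github.com/quickwit-oss/quickwit/distribution/ecs/quickwit"

  # vpc_id, subnet_ids and quickwit_ingress_cidr_blocks omitted for brevity

  rds_config = {
    instance_class = "db.t4g.micro" # illustrative value, size it for your workload
    multi_az       = true           # keep a standby instance in a second AZ
  }
}
```

Multi-AZ roughly doubles the RDS cost, so it is a trade-off to weigh against the fact that backups are disabled.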

To use your own Postgres database instead of creating a new RDS instance, configure the `external_postgres_uri_secret_arn` variable (e.g. the ARN of an SSM parameter with the value `postgres://user:password@domain:port/db`).

Using NAT Gateways for the image registry is quite costly (approx. $0.05/hour/AZ). If you are not already using NAT Gateways in the AZs where Quickwit will be deployed, you should probably push the Quickwit image to ECR and use ECR interface VPC endpoints instead (approx. $0.01/hour/AZ).

When using the default image, you will quickly run into Docker Hub rate limiting. We recommend pushing the Quickwit image to ECR and configuring it as `quickwit_image`. Note that the architecture of the image that you push to ECR must match the `quickwit_cpu_architecture` variable (`ARM64` by default).

Sidecar containers and custom logging configurations can be set using the variables `sidecar_container_definitions`, `sidecar_container_dependencies`, `log_configuration`, `enable_cloudwatch_logging`. See [custom log routing](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_firelens.html).

You can use sidecars to inject additional secrets as files. This can be useful for configuring sources such as Kafka. See `./example/kafka.tf` for an example.

To access external AWS services like the Kinesis source, use the `quickwit_indexer.extra_task_policy_arns` variable to attach the necessary IAM policies to indexers.

## Running the example stack

We provide an example of a self-contained deployment with an ad-hoc VPC.

> [!IMPORTANT]
> This stack costs ~$200/month to run (Fargate tasks, NAT Gateways
> and RDS)

### Deploy the Quickwit module and connect through a bastion

To make it easy to access your Quickwit cluster, the example stack includes a bastion instance. Access is secured using an SSH key pair that you need to provide (e.g. generated with `ssh-keygen -t ed25519`).

In the `./example` directory, create a `terraform.tfvars` file with the public key of your SSH key pair:

```terraform
bastion_public_key = "ssh-ed25519 ..."
```

> [!NOTE]
> You can skip the creation of the bastion by not specifying the
> `bastion_public_key` variable, but that would make it hard to access and
> experiment with the created Quickwit cluster.

In the same directory (`./example`) run:

```bash
terraform init
terraform apply
```

A successful `apply` command should output the IP of the bastion EC2 instance. You can port forward Quickwit's search UI using:

```bash
ssh -N -L 7280:searcher.quickwit:7280 -i {your-private-key-file} ubuntu@{bastion_ip}
```

To ingest an example dataset, log into the bastion:

```bash
ssh -i {your-private-key-file} ubuntu@{bastion_ip}

# create the log index
wget https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/hdfs-logs/index-config.yaml
curl -X POST \
  -H "content-type: application/yaml" \
  --data-binary @index-config.yaml \
  http://indexer.quickwit:7280/api/v1/indexes

# import some data
wget https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants-10000.json
curl -X POST \
  -H "content-type: application/json" \
  --data-binary @hdfs-logs-multitenants-10000.json \
  "http://indexer.quickwit:7280/api/v1/hdfs-logs/ingest?commit=force"
```

If your SSH tunnel to the searcher is still running, you should be able to see the ingested data in the UI.

### Set up an ECR repository to avoid throttling from Docker Hub

By default, the example stack uses Docker Hub to pull the Quickwit image. This is convenient but it quickly runs into rate limiting.
To avoid this, in the `terraform.tfvars` file, set `dockerhub_pull_through_creds_secret_arn` to the ARN of an AWS secret with the following content:

```json
{"username":"...","accessToken":"..."}
```

This will:
- provision an ECR repository and a pull through cache rule
- configure the Quickwit module to use that repository

================================================
FILE: distribution/ecs/example/.terraform.lock.hcl
================================================
# This file is maintained automatically by "terraform init".
# Manual edits may be lost in future updates.

provider "registry.terraform.io/hashicorp/aws" {
  version     = "5.39.1"
  constraints = ">= 4.66.1, >= 5.36.0, ~> 5.39.1"
  hashes = [
    "h1:hQLlAd6O1LdQHy1GdWtgT5fcOlc3TWW+SaaFkpe+e8E=",
    "zh:05c50a5d8edb3ba4ebc4eb6e0d0b5e319142f5983b27821710ed7d475d335bdc",
    "zh:082986a5784dd21957e632371b289e549f051a4ea21d5c78c6d744c3537f03c5",
    "zh:192ae622ba562eacc4921ed549a794506179233d724fdd15a4f147f3400724a0",
    "zh:19a1d4637a62de90b0da174c0bf01000cd900488f7e8f709d8a37f082c59756b",
    "zh:1d7689a8583515f1705972d7ce57ccfab96215b19905530d2c78c02dcfaff583",
    "zh:22c446a21209a52ab74b4ba1ede0b220531e97ce479430047e493a2c45e1d8cb",
    "zh:4154de82290ab4e9f81bac1ea62342de8b3b7a608f99258c190d4dd1c6663e47",
    "zh:6bc4859ccdc54f28af9286b2fa090a31dcb345138d68c471510b737f6a052011",
    "zh:73c69e000e0b321e78a4a12fef60d37285f2afec0ea7be9e06163d985101cb59",
    "zh:890a3422f5e445b49bae30facf448d0ec9cd647e9155d0b685b5b39e9d331a94",
    "zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425",
    "zh:9cd88bec0f5205df9032e3126d4e57edd1c5cc8d45cda25626882dafc485a3b0",
    "zh:a3a8e3276d0fbf051bbafa192a2998b05745f2cf285ac8c36a9ad167a75c037f",
    "zh:d47e4dcf4c0ad71b9a7c720be4f3a89f6786a82e77bbe8d950794562792a1da5",
    "zh:f74e5b2af508c7de80a6ae5198df54a795eeba5058a0cd247828943f0c54f6e0",
  ]
}

provider "registry.terraform.io/hashicorp/random" {
  version     = "3.6.0"
  constraints = ">= 3.1.0"
  hashes = [
    "h1:R5Ucn26riKIEijcsiOMBR3uOAjuOMfI1x7XvH4P6B1w=",
"zh:03360ed3ecd31e8c5dac9c95fe0858be50f3e9a0d0c654b5e504109c2159287d", "zh:1c67ac51254ba2a2bb53a25e8ae7e4d076103483f55f39b426ec55e47d1fe211", "zh:24a17bba7f6d679538ff51b3a2f378cedadede97af8a1db7dad4fd8d6d50f829", "zh:30ffb297ffd1633175d6545d37c2217e2cef9545a6e03946e514c59c0859b77d", "zh:454ce4b3dbc73e6775f2f6605d45cee6e16c3872a2e66a2c97993d6e5cbd7055", "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3", "zh:91df0a9fab329aff2ff4cf26797592eb7a3a90b4a0c04d64ce186654e0cc6e17", "zh:aa57384b85622a9f7bfb5d4512ca88e61f22a9cea9f30febaa4c98c68ff0dc21", "zh:c4a3e329ba786ffb6f2b694e1fd41d413a7010f3a53c20b432325a94fa71e839", "zh:e2699bc9116447f96c53d55f2a00570f982e6f9935038c3810603572693712d0", "zh:e747c0fd5d7684e5bfad8aa0ca441903f15ae7a98a737ff6aca24ba223207e2c", "zh:f1ca75f417ce490368f047b63ec09fd003711ae48487fba90b4aba2ccf71920e", ] } ================================================ FILE: distribution/ecs/example/bastion.tf ================================================ variable "bastion_public_key" { description = "The public key used to connect to the bastion host. If empty, no bastion is created." default = "" } output "bastion_ip" { value = var.bastion_public_key != "" ? aws_instance.bastion[0].public_ip : null } data "aws_ami" "ubuntu" { most_recent = true filter { name = "name" values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"] } filter { name = "virtualization-type" values = ["hvm"] } owners = ["099720109477"] # Canonical } resource "aws_security_group" "allow_ssh" { count = var.bastion_public_key != "" ? 1 : 0 name = "qw_ecs_bastion_allow_ssh" description = "Allow SSH inbound traffic from everywhere" vpc_id = module.vpc.vpc_id ingress { from_port = 22 to_port = 22 protocol = "tcp" cidr_blocks = ["0.0.0.0/0"] } egress { from_port = 0 to_port = 0 protocol = "-1" cidr_blocks = ["0.0.0.0/0"] } } resource "aws_instance" "bastion" { count = var.bastion_public_key != "" ? 
1 : 0 ami = data.aws_ami.ubuntu.id instance_type = "t3.nano" key_name = aws_key_pair.bastion_key[0].key_name subnet_id = module.vpc.public_subnets[0] associate_public_ip_address = true vpc_security_group_ids = [aws_security_group.allow_ssh[0].id] tags = { Name = "quickwit-ecs-bastion" } } resource "aws_key_pair" "bastion_key" { count = var.bastion_public_key != "" ? 1 : 0 key_name = "quickwit-ecs-bastion-key" public_key = var.bastion_public_key } ================================================ FILE: distribution/ecs/example/image.tf ================================================ variable "dockerhub_pull_through_creds_secret_arn" { description = "If left empty, image is pulled directly from Docker Hub, which might be throttled." default = "" } locals { ecr_repository_prefix = "quickwit-ecs-example" } # This repo is populated by the pull through cache below resource "aws_ecr_repository" "quickwit" { count = var.dockerhub_pull_through_creds_secret_arn == "" ? 0 : 1 name = "${local.ecr_repository_prefix}/quickwit/quickwit" image_tag_mutability = "MUTABLE" force_delete = true image_scanning_configuration { scan_on_push = false } } resource "aws_ecr_pull_through_cache_rule" "docker_hub" { count = var.dockerhub_pull_through_creds_secret_arn == "" ? 0 : 1 ecr_repository_prefix = local.ecr_repository_prefix upstream_registry_url = "registry-1.docker.io" credential_arn = var.dockerhub_pull_through_creds_secret_arn } locals { ecr_domain = "${data.aws_caller_identity.current.account_id}.dkr.ecr.${data.aws_region.current.name}.amazonaws.com" image_prefix = var.dockerhub_pull_through_creds_secret_arn == "" ? 
"" : "${local.ecr_domain}/${local.ecr_repository_prefix}/"
  quickwit_image = "${local.image_prefix}quickwit/quickwit"
}

================================================
FILE: distribution/ecs/example/kafka.tf
================================================
# Example configuration for injecting SSL keys for securing a Kafka connection

# You can then create a secured Kafka source along these lines:
#
# version: 0.8
# source_id: kafka-source
# source_type: kafka
# num_pipelines: 2
# params:
#   topic: your-topic
#   client_params:
#     bootstrap.servers: "your-kafka-broker.com"
#     security.protocol: "SSL"
#     ssl.ca.location: "/quickwit/keys/ca.pem"
#     ssl.certificate.location: "/quickwit/keys/service.cert"
#     ssl.key.location: "/quickwit/keys/service.key"

locals {
  # Write each secret into the shared "quickwit-keys" volume, which the
  # sidecar mounts at /quickwit/keys (see mount_points below).
  ca_pem       = "echo \"$CA_PEM\" > /quickwit/keys/ca.pem"
  service_cert = "echo \"$SERVICE_CERT\" > /quickwit/keys/service.cert"
  service_key  = "echo \"$SERVICE_KEY\" > /quickwit/keys/service.key"

  example_kafka_sidecar_container_definitions = {
    kafka_key_init = {
      name                      = "kafka_key_init"
      essential                 = false
      image                     = "busybox"
      command                   = ["sh", "-c", "${local.ca_pem} && ${local.service_cert} && ${local.service_key}"]
      enable_cloudwatch_logging = true
      mount_points = [
        {
          sourceVolume  = "quickwit-keys"
          containerPath = "/quickwit/keys"
        }
      ]
      secrets = [
        {
          name      = "CA_PEM"
          valueFrom = "arn:aws:secretsmanager:eu-west-1:123456789:secret:your_kafka_ca_pem"
        },
        {
          name      = "SERVICE_CERT"
          valueFrom = "arn:aws:secretsmanager:eu-west-1:123456789:secret:your_kafka_service_cert"
        },
        {
          name      = "SERVICE_KEY"
          valueFrom = "arn:aws:secretsmanager:eu-west-1:123456789:secret:your_kafka_service_key"
        }
      ]
    }
  }

  example_kafka_sidecar_container_dependencies = [
    {
      condition     = "SUCCESS"
      containerName = "kafka_key_init"
    }
  ]
}

================================================
FILE: distribution/ecs/example/terraform.tf
================================================
terraform {
  backend "local" {}
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.39.1"
    }
  }
}
provider "aws" { region = "eu-west-1" default_tags { tags = { provisioner = "terraform" } } } data "aws_region" "current" {} data "aws_caller_identity" "current" {} module "quickwit" { source = "../quickwit" vpc_id = module.vpc.vpc_id subnet_ids = module.vpc.private_subnets quickwit_ingress_cidr_blocks = [module.vpc.vpc_cidr_block] ## Optional configurations: # - ECR if you provide the `dockerhub_pull_through_creds_secret_arn` variable # - Docker Hub otherwise (subject to throttling) quickwit_image = "${local.quickwit_image}:latest" # quickwit_index_s3_prefix = "my-bucket/my-prefix" # quickwit_domain = "quickwit" # quickwit_cpu_architecture = "ARM64" # quickwit_indexer = { # desired_count = 3 # memory = 8192 # cpu = 4096 # ephemeral_storage_gib = 50 # extra_task_policy_arns = ["arn:aws:iam::aws:policy/AmazonKinesisFullAccess"] # } # quickwit_metastore = { # desired_count = 1 # memory = 512 # cpu = 256 # } # quickwit_searcher = { # desired_count = 1 # memory = 2048 # cpu = 1024 # } # quickwit_control_plane = { # memory = 512 # cpu = 256 # } # quickwit_janitor = { # memory = 512 # cpu = 256 # } # rds_config = { # instance_class = "db.t4g.micro" # multi_az = false # } # external_postgres_uri_secret_arn = aws_ssm_parameter.postgres_uri.arn ## Example logging configuration # sidecar_container_definitions = { # my_sidecar_container = see http://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ContainerDefinition.html # } # sidecar_container_dependencies = [{condition = "START", containerName = "my_sidecar_container"}] # log_configuration = see https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ecs_service#log_configuration # enable_cloudwatch_logging = false ## Example Kafka key injection (see kafka.tf) # sidecar_container_definitions = local.example_kafka_sidecar_container_definitions # sidecar_container_dependencies = local.example_kafka_sidecar_container_dependencies } output "indexer_service_name" { value = 
module.quickwit.indexer_service_name } output "searcher_service_name" { value = module.quickwit.searcher_service_name } ================================================ FILE: distribution/ecs/example/vpc.tf ================================================ module "vpc" { source = "terraform-aws-modules/vpc/aws" version = "5.5.3" name = "quickwit-ecs" cidr = "10.0.0.0/16" azs = ["${data.aws_region.current.name}a", "${data.aws_region.current.name}b"] private_subnets = ["10.0.1.0/24", "10.0.2.0/24"] public_subnets = ["10.0.101.0/24", "10.0.102.0/24"] enable_nat_gateway = true } ================================================ FILE: distribution/ecs/quickwit/cluster.tf ================================================ module "ecs_cluster" { source = "terraform-aws-modules/ecs/aws//modules/cluster" version = "5.9.3" cluster_name = "quickwit-${local.module_id}" } resource "aws_service_discovery_private_dns_namespace" "quickwit_internal" { name = var.quickwit_domain description = "Internal quickwit domain" vpc = var.vpc_id } resource "aws_security_group" "quickwit_cluster_member_sg" { name = "quickwit-cluster-member-${local.module_id}" description = "Security group for members of the Quickwit cluster" vpc_id = var.vpc_id } ================================================ FILE: distribution/ecs/quickwit/configs.tf ================================================ locals { quickwit_peer_list = [ "${aws_service_discovery_service.metastore.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}", "${aws_service_discovery_service.control_plane.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}", "${aws_service_discovery_service.janitor.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}", "${aws_service_discovery_service.indexer.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}", 
"${aws_service_discovery_service.searcher.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}", ] # id to avoid conflicts when deploying this module multiple times (random by default) module_id = var.module_id == "" ? random_id.module.hex : var.module_id s3_id = var.module_id == "" ? random_id.module.hex : "${var.module_id}-${random_id.module.hex}" quickwit_index_s3_prefix = var.quickwit_index_s3_prefix == "" ? aws_s3_bucket.index[0].id : var.quickwit_index_s3_prefix use_external_rds = var.external_postgres_uri_secret_arn != "" postgres_uri_secret_arn = var.external_postgres_uri_secret_arn != "" ? var.external_postgres_uri_secret_arn : aws_ssm_parameter.postgres_credential[0].arn } resource "random_id" "module" { byte_length = 3 } ================================================ FILE: distribution/ecs/quickwit/iam.tf ================================================ data "aws_iam_policy_document" "quickwit_task_permission" { # Reference: https://quickwit.io/docs/guides/aws-setup#amazon-s3 statement { actions = [ "s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject" ] resources = [ "arn:aws:s3:::${local.quickwit_index_s3_prefix}*", ] } } resource "aws_iam_policy" "quickwit_task_permission" { name = "quickwit-task-policy-${local.module_id}" path = "/" policy = data.aws_iam_policy_document.quickwit_task_permission.json } data "aws_iam_policy_document" "quickwit_task_execution_permission" { statement { actions = [ "logs:PutLogEvents", "logs:CreateLogStream" ] resources = ["*"] } statement { actions = [ "ecr:GetDownloadUrlForLayer", "ecr:GetAuthorizationToken", "ecr:BatchGetImage", "ecr:BatchCheckLayerAvailability", "ecr:CreateRepository", "ecr:BatchImportUpstreamImage" ] resources = ["*"] } statement { actions = ["ssm:GetParameters"] resources = [local.postgres_uri_secret_arn] } statement { actions = ["secretsmanager:GetSecretValue"] resources = ["arn:aws:secretsmanager:*:*:secret:*"] } } resource "aws_iam_policy" 
"quickwit_task_execution_permission" { name = "quickwit-task-execution-policy-${local.module_id}" path = "/" policy = data.aws_iam_policy_document.quickwit_task_execution_permission.json } ================================================ FILE: distribution/ecs/quickwit/outputs.tf ================================================ output "indexer_service_name" { value = "${aws_service_discovery_service.indexer.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}" } output "searcher_service_name" { value = "${aws_service_discovery_service.searcher.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}" } output "janitor_service_name" { value = "${aws_service_discovery_service.janitor.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}" } output "control_plane_service_name" { value = "${aws_service_discovery_service.control_plane.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}" } output "metastore_service_name" { value = "${aws_service_discovery_service.metastore.name}.${aws_service_discovery_private_dns_namespace.quickwit_internal.name}" } ================================================ FILE: distribution/ecs/quickwit/quickwit-control-plane.tf ================================================ module "quickwit_control_plane" { source = "./service" service_name = "control_plane" service_discovery_registry_arn = aws_service_discovery_service.control_plane.arn cluster_arn = module.ecs_cluster.arn postgres_uri_secret_arn = local.postgres_uri_secret_arn quickwit_peer_list = local.quickwit_peer_list s3_access_policy_arn = aws_iam_policy.quickwit_task_permission.arn task_execution_policy_arn = aws_iam_policy.quickwit_task_execution_permission.arn module_id = local.module_id quickwit_cluster_member_sg_id = aws_security_group.quickwit_cluster_member_sg.id subnet_ids = var.subnet_ids ingress_cidr_blocks = var.quickwit_ingress_cidr_blocks quickwit_image = var.quickwit_image 
quickwit_cpu_architecture = var.quickwit_cpu_architecture sidecar_container_definitions = var.sidecar_container_definitions sidecar_container_dependencies = var.sidecar_container_dependencies log_configuration = var.log_configuration enable_cloudwatch_logging = var.enable_cloudwatch_logging service_config = var.quickwit_control_plane quickwit_index_s3_prefix = local.quickwit_index_s3_prefix } resource "aws_service_discovery_service" "control_plane" { name = "control-plane" dns_config { namespace_id = aws_service_discovery_private_dns_namespace.quickwit_internal.id dns_records { ttl = 10 type = "A" } routing_policy = "MULTIVALUE" } } ================================================ FILE: distribution/ecs/quickwit/quickwit-indexer.tf ================================================ module "quickwit_indexer" { source = "./service" service_name = "indexer" service_discovery_registry_arn = aws_service_discovery_service.indexer.arn cluster_arn = module.ecs_cluster.arn postgres_uri_secret_arn = local.postgres_uri_secret_arn quickwit_peer_list = local.quickwit_peer_list s3_access_policy_arn = aws_iam_policy.quickwit_task_permission.arn task_execution_policy_arn = aws_iam_policy.quickwit_task_execution_permission.arn module_id = local.module_id quickwit_cluster_member_sg_id = aws_security_group.quickwit_cluster_member_sg.id subnet_ids = var.subnet_ids ingress_cidr_blocks = var.quickwit_ingress_cidr_blocks quickwit_image = var.quickwit_image quickwit_cpu_architecture = var.quickwit_cpu_architecture sidecar_container_definitions = var.sidecar_container_definitions sidecar_container_dependencies = var.sidecar_container_dependencies log_configuration = var.log_configuration enable_cloudwatch_logging = var.enable_cloudwatch_logging service_config = var.quickwit_indexer quickwit_index_s3_prefix = local.quickwit_index_s3_prefix # Longer termination grace period for indexers because we are waiting for the # data persisted in the ingesters to be indexed and committed. 
Should be # larger than the largest commit timeout. stop_timeout = 120 } resource "aws_service_discovery_service" "indexer" { name = "indexer" dns_config { namespace_id = aws_service_discovery_private_dns_namespace.quickwit_internal.id dns_records { ttl = 10 type = "A" } routing_policy = "MULTIVALUE" } } ================================================ FILE: distribution/ecs/quickwit/quickwit-janitor.tf ================================================ module "quickwit_janitor" { source = "./service" service_name = "janitor" service_discovery_registry_arn = aws_service_discovery_service.janitor.arn cluster_arn = module.ecs_cluster.arn postgres_uri_secret_arn = local.postgres_uri_secret_arn quickwit_peer_list = local.quickwit_peer_list s3_access_policy_arn = aws_iam_policy.quickwit_task_permission.arn task_execution_policy_arn = aws_iam_policy.quickwit_task_execution_permission.arn module_id = local.module_id quickwit_cluster_member_sg_id = aws_security_group.quickwit_cluster_member_sg.id subnet_ids = var.subnet_ids ingress_cidr_blocks = var.quickwit_ingress_cidr_blocks quickwit_image = var.quickwit_image quickwit_cpu_architecture = var.quickwit_cpu_architecture sidecar_container_definitions = var.sidecar_container_definitions sidecar_container_dependencies = var.sidecar_container_dependencies log_configuration = var.log_configuration enable_cloudwatch_logging = var.enable_cloudwatch_logging service_config = var.quickwit_janitor quickwit_index_s3_prefix = local.quickwit_index_s3_prefix } resource "aws_service_discovery_service" "janitor" { name = "janitor" dns_config { namespace_id = aws_service_discovery_private_dns_namespace.quickwit_internal.id dns_records { ttl = 10 type = "A" } routing_policy = "MULTIVALUE" } } ================================================ FILE: distribution/ecs/quickwit/quickwit-metastore.tf ================================================ module "quickwit_metastore" { source = "./service" service_name = "metastore" 
service_discovery_registry_arn = aws_service_discovery_service.metastore.arn cluster_arn = module.ecs_cluster.arn postgres_uri_secret_arn = local.postgres_uri_secret_arn quickwit_peer_list = local.quickwit_peer_list s3_access_policy_arn = aws_iam_policy.quickwit_task_permission.arn task_execution_policy_arn = aws_iam_policy.quickwit_task_execution_permission.arn module_id = local.module_id quickwit_cluster_member_sg_id = aws_security_group.quickwit_cluster_member_sg.id subnet_ids = var.subnet_ids ingress_cidr_blocks = var.quickwit_ingress_cidr_blocks quickwit_image = var.quickwit_image quickwit_cpu_architecture = var.quickwit_cpu_architecture sidecar_container_definitions = var.sidecar_container_definitions sidecar_container_dependencies = var.sidecar_container_dependencies log_configuration = var.log_configuration enable_cloudwatch_logging = var.enable_cloudwatch_logging service_config = var.quickwit_metastore quickwit_index_s3_prefix = local.quickwit_index_s3_prefix } resource "aws_service_discovery_service" "metastore" { name = "metastore" dns_config { namespace_id = aws_service_discovery_private_dns_namespace.quickwit_internal.id dns_records { ttl = 10 type = "A" } routing_policy = "MULTIVALUE" } } ================================================ FILE: distribution/ecs/quickwit/quickwit-searcher.tf ================================================ module "quickwit_searcher" { source = "./service" service_name = "searcher" service_discovery_registry_arn = aws_service_discovery_service.searcher.arn cluster_arn = module.ecs_cluster.arn postgres_uri_secret_arn = local.postgres_uri_secret_arn quickwit_peer_list = local.quickwit_peer_list s3_access_policy_arn = aws_iam_policy.quickwit_task_permission.arn task_execution_policy_arn = aws_iam_policy.quickwit_task_execution_permission.arn module_id = local.module_id quickwit_cluster_member_sg_id = aws_security_group.quickwit_cluster_member_sg.id subnet_ids = var.subnet_ids ingress_cidr_blocks = 
var.quickwit_ingress_cidr_blocks quickwit_image = var.quickwit_image quickwit_cpu_architecture = var.quickwit_cpu_architecture sidecar_container_definitions = var.sidecar_container_definitions sidecar_container_dependencies = var.sidecar_container_dependencies log_configuration = var.log_configuration enable_cloudwatch_logging = var.enable_cloudwatch_logging service_config = var.quickwit_searcher quickwit_index_s3_prefix = local.quickwit_index_s3_prefix } resource "aws_service_discovery_service" "searcher" { name = "searcher" dns_config { namespace_id = aws_service_discovery_private_dns_namespace.quickwit_internal.id dns_records { ttl = 10 type = "A" } routing_policy = "MULTIVALUE" } } ================================================ FILE: distribution/ecs/quickwit/rds.tf ================================================ resource "random_password" "quickwit_db" { count = local.use_external_rds ? 0 : 1 length = 64 special = false } module "quickwit_db" { count = local.use_external_rds ? 0 : 1 source = "terraform-aws-modules/rds/aws" version = "6.5.2" identifier = "quickwit-metastore-${local.module_id}" engine = "postgres" engine_version = "16" family = "postgres16" # DB parameter group major_engine_version = "16" # DB option group instance_class = var.rds_config.instance_class multi_az = var.rds_config.multi_az allocated_storage = 5 db_name = "quickwit" username = "quickwit" password = random_password.quickwit_db[0].result port = "5432" publicly_accessible = false manage_master_user_password = false iam_database_authentication_enabled = true vpc_security_group_ids = [aws_security_group.quickwit_db[0].id] db_subnet_group_name = aws_db_subnet_group.quickwit[0].name maintenance_window = "Mon:00:00-Mon:03:00" create_monitoring_role = true monitoring_interval = "30" monitoring_role_name = "RDSQuickwitMonitoringRole-${local.module_id}" deletion_protection = false skip_final_snapshot = true } resource "aws_security_group" "quickwit_db" { count = local.use_external_rds ? 
0 : 1 name = "quickwit-db-${local.module_id}" description = "Security group for the Quickwit Metastore DB" vpc_id = var.vpc_id ingress { description = "Connection from explicitly allowed resources" from_port = 5432 to_port = 5432 protocol = "tcp" security_groups = [aws_security_group.quickwit_cluster_member_sg.id] } } resource "aws_db_subnet_group" "quickwit" { count = local.use_external_rds ? 0 : 1 name = "quickwit-${local.module_id}" description = "Quickwit metastore" subnet_ids = var.subnet_ids } resource "aws_ssm_parameter" "postgres_credential" { count = local.use_external_rds ? 0 : 1 name = "/quickwit/${local.module_id}/postgres" type = "SecureString" value = "postgres://${module.quickwit_db[0].db_instance_username}:${random_password.quickwit_db[0].result}@${module.quickwit_db[0].db_instance_address}:${module.quickwit_db[0].db_instance_port}/${module.quickwit_db[0].db_instance_name}" } ================================================ FILE: distribution/ecs/quickwit/s3.tf ================================================ data "aws_caller_identity" "current" {} resource "aws_s3_bucket" "index" { count = var.quickwit_index_s3_prefix == "" ? 
1 : 0 bucket = "quickwit-ecs-index-${data.aws_caller_identity.current.account_id}-${local.s3_id}" force_destroy = true } ================================================ FILE: distribution/ecs/quickwit/service/config.tf ================================================ locals { quickwit_data_dir = "/quickwit/qwdata" quickwit_common_environment = [ { name = "QW_ENABLED_SERVICES" value = var.service_name }, { name = "QW_PEER_SEEDS" value = join(",", var.quickwit_peer_list) }, { name = "NO_COLOR" value = "true" }, { name = "QW_CLUSTER_ID" value = "ecs-${var.module_id}" }, { name = "QW_LISTEN_ADDRESS" value = "0.0.0.0" }, { name = "QW_DATA_DIR" value = local.quickwit_data_dir }, { name = "QW_DEFAULT_INDEX_ROOT_URI" value = "s3://${var.quickwit_index_s3_prefix}" }, ] nb_extra_policies = length(var.service_config.extra_task_policy_arns) extra_tasks_iam_role_policies = { for i in range(local.nb_extra_policies) : "extra_policy_${i}" => var.service_config.extra_task_policy_arns[i] } tasks_iam_role_policies = merge({ s3_access = var.s3_access_policy_arn }, local.extra_tasks_iam_role_policies) } ================================================ FILE: distribution/ecs/quickwit/service/ecs.tf ================================================ module "quickwit_service" { source = "terraform-aws-modules/ecs/aws//modules/service" version = "5.9.3" name = "quickwit-${var.service_name}-${var.module_id}" cluster_arn = var.cluster_arn cpu = var.service_config.cpu memory = var.service_config.memory ephemeral_storage = { size_in_gib = var.service_config.ephemeral_storage_gib } container_definitions = merge(var.sidecar_container_definitions, { quickwit = { cpu = var.service_config.cpu memory = var.service_config.memory essential = true image = var.quickwit_image enable_cloudwatch_logging = var.enable_cloudwatch_logging command = ["run"] environment = local.quickwit_common_environment secrets = [ { name = "QW_METASTORE_URI" valueFrom = var.postgres_uri_secret_arn } ] port_mappings = [ { name 
= "rest" containerPort = 7280 protocol = "tcp" }, { name = "grpc" containerPort = 7281 protocol = "tcp" }, { name = "gossip" containerPort = 7280 protocol = "udp" } ] log_configuration = var.log_configuration mount_points = [ { sourceVolume = "quickwit-data-vol" containerPath = local.quickwit_data_dir }, # A volume that can be used to inject secrets as files. { sourceVolume = "quickwit-keys" containerPath = "/quickwit/keys" } ] stopTimeout = var.stop_timeout dependencies = var.sidecar_container_dependencies } }) requires_compatibilities = ["FARGATE"] runtime_platform = { operating_system_family = "LINUX" cpu_architecture = var.quickwit_cpu_architecture } service_registries = { registry_arn = var.service_discovery_registry_arn container_name = "quickwit" } subnet_ids = var.subnet_ids security_group_rules = { ingress_internal = { type = "ingress" from_port = 7280 to_port = 7281 protocol = "-1" source_security_group_id = var.quickwit_cluster_member_sg_id } ingress_external = { type = "ingress" from_port = 7280 to_port = 7281 protocol = "-1" cidr_blocks = var.ingress_cidr_blocks } egress_all = { type = "egress" from_port = 0 to_port = 0 protocol = "-1" cidr_blocks = ["0.0.0.0/0"] } } security_group_ids = [var.quickwit_cluster_member_sg_id] enable_autoscaling = false desired_count = var.service_config.desired_count volume = [ { name = "quickwit-data-vol" }, { name = "quickwit-keys" } ] tasks_iam_role_policies = local.tasks_iam_role_policies task_exec_iam_role_policies = { policy = var.task_execution_policy_arn } } ================================================ FILE: distribution/ecs/quickwit/service/variables.tf ================================================ variable "service_name" { description = "One of indexer, metastore, searcher, control_plane, janitor" } variable "service_discovery_registry_arn" {} variable "sidecar_container_definitions" {} variable "sidecar_container_dependencies" { type = list(object({ containerName = string condition = string })) default = 
[] } variable "log_configuration" {} variable "enable_cloudwatch_logging" { type = bool } variable "cluster_arn" {} variable "ingress_cidr_blocks" { type = list(string) } variable "quickwit_cluster_member_sg_id" {} variable "subnet_ids" { type = list(string) } variable "postgres_uri_secret_arn" { description = "ARN of the SSM parameter or Secret Manager secret containing the URI of a Postgres instance" } variable "quickwit_image" {} variable "service_config" { type = object({ desired_count = optional(number, 1) memory = number cpu = number ephemeral_storage_gib = optional(number, 21) extra_task_policy_arns = optional(list(string), []) }) } variable "quickwit_index_s3_prefix" {} variable "quickwit_peer_list" { type = list(string) } variable "s3_access_policy_arn" {} variable "task_execution_policy_arn" {} variable "quickwit_cpu_architecture" {} variable "module_id" {} variable "stop_timeout" { # between 1s and 120s on Fargate, 30s is the ECS default default = 30 } ================================================ FILE: distribution/ecs/quickwit/variables.tf ================================================ ## REQUIRED VARIABLES variable "vpc_id" { description = "VPC ID of the cluster" } variable "subnet_ids" { description = "Subnet(s) where quickwit will be deployed" type = list(string) } ## OPTIONAL VARIABLES variable "module_id" { description = "Identifier for the module, e.g the stage. If not specified, a random string is generated." default = "" } variable "quickwit_ingress_cidr_blocks" { description = "CIDR blocks (private) that should have access to the Quickwit cluster" type = list(string) default = [] } variable "quickwit_index_s3_prefix" { description = "S3 bucket name and prefix for the Quickwit data, e.g. my-bucket-name/my-prefix. Quickwit will only have access to this S3 location. Leave empty to create a new bucket." 
default = "" } variable "quickwit_domain" { description = "Local domain for quickwit service discovery" default = "quickwit" } variable "quickwit_image" { description = "Quickwit docker image" default = "quickwit/quickwit:latest" } variable "quickwit_cpu_architecture" { description = "One of X86_64 / ARM64. Must match the arch of the provided image (var.quickwit_image)." default = "ARM64" } variable "sidecar_container_definitions" { description = "Sidecar containers to be attached to Quickwit tasks" default = {} } variable "sidecar_container_dependencies" { description = "Specify the Quickwit container's dependencies on sidecars" type = list(object({ containerName = string condition = string })) default = [] } variable "enable_cloudwatch_logging" { description = "Cloudwatch logging for Quickwit tasks. Usually disabled when using a custom log configuration." default = true } variable "log_configuration" { description = "Custom log configuration for Quickwit tasks" default = {} } variable "quickwit_indexer" { description = "Indexer service sizing configurations" type = object({ desired_count = optional(number, 1) memory = optional(number, 8192) cpu = optional(number, 2048) ephemeral_storage_gib = optional(number, 21) extra_task_policy_arns = optional(list(string), []) }) default = {} } variable "quickwit_metastore" { description = "Metastore service sizing configurations" type = object({ desired_count = optional(number, 1) memory = optional(number, 512) cpu = optional(number, 256) }) default = {} } variable "quickwit_searcher" { description = "Searcher service sizing configurations" type = object({ desired_count = optional(number, 1) memory = optional(number, 4096) cpu = optional(number, 1024) ephemeral_storage_gib = optional(number, 21) }) default = {} } variable "quickwit_control_plane" { description = "Control plane service sizing configurations" type = object({ # only 1 task is necessary memory = optional(number, 512) cpu = optional(number, 256) }) default = {} } 
variable "quickwit_janitor" { description = "Janitor service sizing configurations" type = object({ # only 1 task is necessary memory = optional(number, 512) cpu = optional(number, 256) }) default = {} } variable "rds_config" { description = "Configurations of the metastore RDS database. Enable multi_az to ensure high availability." type = object({ instance_class = optional(string, "db.t4g.micro") multi_az = optional(bool, false) }) default = {} } variable "external_postgres_uri_secret_arn" { description = "ARN of the SSM parameter or Secret Manager secret containing the URI of a Postgres instance (postgres://{user}:{password}@{address}:{port}/{db_instance_name}). The Postgres instance should allow inbound connections from the subnets specified in `var.subnet_ids`. If provided, the internal RDS will not be created and `var.rds_config` is ignored." default = "" } ================================================ FILE: distribution/kubernetes/README.md ================================================ # Quickwit on Kubernetes To deploy Quickwit on Kubernetes, use the official Quickwit Helm chart available at [helm.quickwit.io](https://helm.quickwit.io/) and refer to our [documentation](https://quickwit.io/docs/deployment/kubernetes/helm) for more information. ================================================ FILE: docker-compose.yml ================================================ # By default, this docker compose script maps all services to localhost only. # If you need to make services available outside of your machine, add # appropriate service mappings to the .env file. See .env.example file for # configuration example. # # Notes on image versions: # - For the key services such as postgres and pulsar we are trying to run # against the oldest supported version # - For kafka we use the oldest version that supports KRaft. # - For everything else we are trying to run against the latest version. # # To run against the latest image versions update .env file. 
See .env.example # file for configuration examples. You might need to remove the old images # first if they are already tagged latest and volumes if their content is # incompatible with the latest version, as in case of postgres. name: quickwit networks: default: name: quickwit-network ipam: config: - subnet: 172.16.7.0/24 gateway: 172.16.7.1 services: localstack: image: localstack/localstack:${LOCALSTACK_VERSION:-3.5.0} container_name: localstack ports: - "${MAP_HOST_LOCALSTACK:-127.0.0.1}:4566:4566" - "${MAP_HOST_LOCALSTACK:-127.0.0.1}:4571:4571" - "${MAP_HOST_LOCALSTACK:-127.0.0.1}:8080:8080" profiles: - all - localstack environment: SERVICES: kinesis,s3,sqs PERSISTENCE: 1 volumes: - .localstack:/etc/localstack/init/ready.d - localstack_data:/var/lib/localstack healthcheck: test: ["CMD", "curl", "-k", "-f", "https://localhost:4566/quickwit-integration-tests"] interval: 1s timeout: 5s retries: 100 postgres: # The oldest supported version. EOL November 14, 2024 image: postgres:${POSTGRES_VERSION:-12.17-alpine} container_name: postgres ports: - "${MAP_HOST_POSTGRES:-127.0.0.1}:5432:5432" profiles: - all - postgres environment: PGDATA: /var/lib/postgresql/data/pgdata POSTGRES_USER: ${POSTGRES_USER:-quickwit-dev} POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-quickwit-dev} POSTGRES_DB: ${POSTGRES_DB:-quickwit-metastore-dev} volumes: - postgres_data:/var/lib/postgresql/data healthcheck: test: ["CMD", "pg_isready"] interval: 1s timeout: 5s retries: 100 pulsar-broker: # The oldest version with arm64 docker images. 
EOL May 2 2025 image: apachepulsar/pulsar:${PULSAR_VERSION:-3.0.0} container_name: pulsar-broker command: bin/pulsar standalone --no-functions-worker ports: - "${MAP_HOST_PULSAR:-127.0.0.1}:6650:6650" - "${MAP_HOST_PULSAR:-127.0.0.1}:8081:8080" environment: PULSAR_MEM: "-Xms384M -Xmx384M" # Disable functions worker to save memory/time PULSAR_PREFIX_functionsWorkerEnabled: "false" profiles: - all - pulsar kafka-broker: image: confluentinc/confluent-local:${CP_VERSION:-7.4.11} container_name: kafka-broker ports: - "${MAP_HOST_KAFKA:-127.0.0.1}:9092:9092" - "${MAP_HOST_KAFKA:-127.0.0.1}:9101:9101" profiles: - all - kafka environment: # KRaft mode (single node) KAFKA_NODE_ID: 1 KAFKA_PROCESS_ROLES: 'broker,controller' KAFKA_CONTROLLER_QUORUM_VOTERS: '1@localhost:9093' KAFKA_LOG4J_LOGGERS: "org.apache.kafka.image.loader.MetadataLoader=WARN" # Listeners KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT' KAFKA_LISTENERS: 'EXTERNAL://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093' KAFKA_ADVERTISED_LISTENERS: 'EXTERNAL://localhost:9092' KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER' KAFKA_INTER_BROKER_LISTENER_NAME: 'EXTERNAL' # Simplified configuration KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1 KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1 # Cluster ID (required for KRaft) CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk' KAFKA_HEAP_OPTS: -Xms256M -Xmx256M healthcheck: # test: ["CMD-SHELL", "nc -z localhost 9092 || exit 1"] test: ["CMD", "ub", "kafka-ready", "-b", "localhost:9092", "1", "5"] start_period: 5s interval: 5s timeout: 10s retries: 100 azurite: image: mcr.microsoft.com/azure-storage/azurite:${AZURITE_VERSION:-3.24.0} container_name: azurite ports: - "${MAP_HOST_AZURITE:-127.0.0.1}:10000:10000" # Blob store port profiles: - all - azurite volumes: - azurite_data:/data command: azurite --blobHost 0.0.0.0 --loose fake-gcs-server: image: 
fsouza/fake-gcs-server:${FAKE_GCS_SERVER_VERSION:-1.47.7} container_name: fake-gcs-server ports: - "${MAP_HOST_FAKE_GCS_SERVER:-127.0.0.1}:4443:4443" # Blob store port profiles: - all - fake-gcs-server volumes: - fake_gcs_server_data:/data command: -scheme http grafana: image: grafana/grafana-oss:${GRAFANA_VERSION:-10.4.1} container_name: grafana ports: - "${MAP_HOST_GRAFANA:-127.0.0.1}:3000:3000" profiles: - grafana - monitoring environment: GF_AUTH_DISABLE_LOGIN_FORM: "true" GF_AUTH_ANONYMOUS_ENABLED: "true" GF_AUTH_ANONYMOUS_ORG_ROLE: Admin volumes: - grafana_conf:/etc/grafana - grafana_data:/var/lib/grafana - ./monitoring/grafana/dashboards:/var/lib/grafana/dashboards - ./monitoring/grafana/provisioning:/etc/grafana/provisioning jaeger: image: jaegertracing/all-in-one:${JAEGER_VERSION:-1.48.0} container_name: jaeger ports: - "${MAP_HOST_JAEGER:-127.0.0.1}:16686:16686" # Frontend profiles: - jaeger - monitoring otel-collector: image: otel/opentelemetry-collector:${OTEL_VERSION:-0.84.0} container_name: otel-collector ports: - "${MAP_HOST_OTEL:-127.0.0.1}:1888:1888" # pprof extension - "${MAP_HOST_OTEL:-127.0.0.1}:8888:8888" # Prometheus metrics exposed by the collector - "${MAP_HOST_OTEL:-127.0.0.1}:8889:8889" # Prometheus exporter metrics - "${MAP_HOST_OTEL:-127.0.0.1}:13133:13133" # health_check extension - "${MAP_HOST_OTEL:-127.0.0.1}:4317:4317" # OTLP gRPC receiver - "${MAP_HOST_OTEL:-127.0.0.1}:4318:4318" # OTLP http receiver - "${MAP_HOST_OTEL:-127.0.0.1}:55679:55679" # zpages extension profiles: - otel - monitoring volumes: - ./monitoring/otel-collector-config.yaml:/etc/otel-collector-config.yaml command: ["--config=/etc/otel-collector-config.yaml"] prometheus: image: prom/prometheus:${PROMETHEUS_VERSION:-v2.43.0} container_name: prometheus ports: - "${MAP_HOST_PROMETHEUS:-127.0.0.1}:9090:9090" profiles: - prometheus - monitoring volumes: - ./monitoring/prometheus.yaml:/etc/prometheus/prometheus.yml extra_hosts: - "host.docker.internal:host-gateway" 
gcp-pubsub-emulator: # It is not an official docker image # if we prefer we can build a docker from the official docker image (gcloud cli) # and install the pubsub emulator https://cloud.google.com/pubsub/docs/emulator image: thekevjames/gcloud-pubsub-emulator:${GCLOUD_EMULATOR:-550.0.0} container_name: gcp-pubsub-emulator ports: - "${MAP_HOST_GCLOUD_EMULATOR:-127.0.0.1}:8681:8681" environment: # create a fake gcp project and a topic / subscription - PUBSUB_PROJECT1=quickwit-emulator,emulator_topic:emulator_subscription profiles: - all - gcp-pubsub volumes: azurite_data: fake_gcs_server_data: grafana_conf: grafana_data: localstack_data: postgres_data: ================================================ FILE: docs/assets/sqs-file-source.tf ================================================ terraform { required_version = "1.7.5" required_providers { aws = { source = "hashicorp/aws" version = "~> 5.39.1" } } } provider "aws" { region = "us-east-1" default_tags { tags = { provisioner = "terraform" author = "Quickwit" } } } locals { sqs_notification_queue_name = "qw-tuto-s3-event-notifications" source_bucket_name = "qw-tuto-source-bucket" } resource "aws_s3_bucket" "file_source" { bucket_prefix = local.source_bucket_name force_destroy = true } data "aws_iam_policy_document" "sqs_notification" { statement { effect = "Allow" principals { type = "*" identifiers = ["*"] } actions = ["sqs:SendMessage"] resources = ["arn:aws:sqs:*:*:${local.sqs_notification_queue_name}"] condition { test = "ArnEquals" variable = "aws:SourceArn" values = [aws_s3_bucket.file_source.arn] } } } resource "aws_sqs_queue" "s3_events" { name = local.sqs_notification_queue_name policy = data.aws_iam_policy_document.sqs_notification.json redrive_policy = jsonencode({ deadLetterTargetArn = aws_sqs_queue.s3_events_deadletter.arn maxReceiveCount = 5 }) } resource "aws_sqs_queue" "s3_events_deadletter" { name = "${local.sqs_notification_queue_name}-deadletter" } resource "aws_sqs_queue_redrive_allow_policy" 
"s3_events_deadletter" { queue_url = aws_sqs_queue.s3_events_deadletter.id redrive_allow_policy = jsonencode({ redrivePermission = "byQueue", sourceQueueArns = [aws_sqs_queue.s3_events.arn] }) } resource "aws_s3_bucket_notification" "bucket_notification" { bucket = aws_s3_bucket.file_source.id queue { queue_arn = aws_sqs_queue.s3_events.arn events = ["s3:ObjectCreated:*"] } } data "aws_iam_policy_document" "quickwit_node" { statement { effect = "Allow" actions = [ "sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:ChangeMessageVisibility", "sqs:GetQueueAttributes", ] resources = [aws_sqs_queue.s3_events.arn] } statement { effect = "Allow" actions = ["s3:GetObject"] resources = ["${aws_s3_bucket.file_source.arn}/*"] } } resource "aws_iam_user" "quickwit_node" { name = "quickwit-filesource-tutorial" path = "/system/" } resource "aws_iam_user_policy" "quickwit_node" { name = "quickwit-filesource-tutorial" user = aws_iam_user.quickwit_node.name policy = data.aws_iam_policy_document.quickwit_node.json } resource "aws_iam_access_key" "quickwit_node" { user = aws_iam_user.quickwit_node.name } output "source_bucket_name" { value = aws_s3_bucket.file_source.bucket } output "notification_queue_url" { value = aws_sqs_queue.s3_events.id } output "quickwit_node_access_key_id" { value = aws_iam_access_key.quickwit_node.id sensitive = true } output "quickwit_node_secret_access_key" { value = aws_iam_access_key.quickwit_node.secret sensitive = true } ================================================ FILE: docs/configuration/_category_.yaml ================================================ label: 'Configuration' position: 4 collapsed: true ================================================ FILE: docs/configuration/index-config.md ================================================ --- title: Index configuration sidebar_position: 3 toc_max_heading_level: 4 --- This page describes how to configure an index. 
In addition to the `index_id`, the index configuration lets you define five items: - The **index URI**: it defines where the index files should be stored. - The **doc mapping**: it defines how a document and the fields it contains are stored and indexed for a given index. - The **indexing settings**: they define the timestamp field used for sharding, and some more advanced parameters like the merge policy. - The **search settings**: they define the default search fields `default_search_fields`, a list of fields that Quickwit will search if the user query does not explicitly target a field. - The **retention policy**: it defines how long Quickwit should keep the indexed data. If not specified, the data is stored forever. Configuration is set at index creation and can be changed using the [update endpoint](../reference/rest-api.md) or the [CLI](../reference/cli.md). ## Config file format The index configuration format is YAML. When a key is absent from the configuration file, the default value is used. Here is a complete example suited for the HDFS logs dataset: ```yaml version: 0.7 # File format version. index_id: "hdfs" index_uri: "s3://my-bucket/hdfs" doc_mapping: mode: lenient field_mappings: - name: timestamp type: datetime input_formats: - unix_timestamp output_format: unix_timestamp_secs fast_precision: seconds fast: true - name: severity_text type: text tokenizer: raw fast: normalizer: lowercase - name: body type: text tokenizer: default record: position - name: resource type: object field_mappings: - name: service type: text tokenizer: raw tag_fields: ["resource.service"] timestamp_field: timestamp index_field_presence: true search_settings: default_search_fields: [severity_text, body] retention: period: 90 days schedule: daily ``` ## Index ID The index ID is a string that uniquely identifies the index within the metastore. It may only contain uppercase or lowercase ASCII letters, digits, hyphens (`-`), and underscores (`_`). 
Finally, it must start with a letter and contain at least 3 characters but no more than 255. ## Index uri The index URI defines where the index files (also called splits) should be stored. This parameter expects a [storage uri](storage-config#storage-uris). The `index_uri` parameter is optional. By default, the `index_uri` will be computed by concatenating the `index_id` with the `default_index_root_uri` defined in [Quickwit's config](node-config). :::caution The file storage will not work when running Quickwit in distributed mode. Instead, AWS S3, Azure Blob Storage, Google Cloud Storage (in S3 interoperability mode) or other S3-compatible storage systems including Scaleway Object Storage and Garage should be used as storage when running several searcher nodes. ::: ## Doc mapping The doc mapping defines how a document and the fields it contains are stored and indexed for a given index. A document is a collection of named fields, each having its own data type (text, bytes, datetime, bool, i64, u64, f64, ip, json). | Variable | Description | Default value | | ------------- | ------------- | ------------- | | `field_mappings` | Collection of field mappings, each having its own data type (text, bytes, datetime, bool, i64, u64, f64, ip, json). | `[]` | | `mode` | Defines how Quickwit should handle document fields that are not present in the `field_mappings`. In particular, the "dynamic" mode makes it possible to use Quickwit in a schemaless manner. (See [mode](#mode)) | `dynamic` | `dynamic_mapping` | This parameter is only allowed when `mode` is set to `dynamic`. It then defines whether dynamically mapped fields should be indexed, stored, etc. | (See [mode](#mode)) | `tag_fields` | Collection of fields* explicitly defined in `field_mappings` whose values will be stored as part of the `tags` metadata. Allowed types are: `text` (with raw tokenizer), `i64` and `u64`. [Learn more about tags](../overview/concepts/querying.md#tag-pruning). 
| `[]` | | `store_source` | Whether the original JSON document is stored in the index. | `false` | | `timestamp_field` | Timestamp field* used for sharding documents in splits. The field has to be of type `datetime`. [Learn more about time sharding](./../overview/architecture.md). | `None` | | `partition_key` | If set, Quickwit will route documents into different splits depending on the field name declared as the `partition_key`. | `null` | | `max_num_partitions` | Limits the number of splits created through partitioning. (See [Partitioning](../overview/concepts/querying.md#partitioning)) | `200` | | `index_field_presence` | `exists` queries are enabled automatically for fast fields. To enable them for all other fields, set this parameter to `true`. Enabling it can have a significant CPU cost on indexing. | `false` | *: tag fields and the timestamp field are expressed as a path from the root of the JSON object to the given field. If a field name contains a `.` character, it needs to be escaped with a `\` character. ### Field types Each field[^1] has a type that indicates the kind of data it contains, such as a 64-bit integer or text. Quickwit supports the following raw types: [`text`](#text-type), [`i64`](#numeric-types-i64-u64-and-f64-type), [`u64`](#numeric-types-i64-u64-and-f64-type), [`f64`](#numeric-types-i64-u64-and-f64-type), [`datetime`](#datetime-type), [`bool`](#bool-type), [`ip`](#ip-type), [`bytes`](#bytes-type), and [`json`](#json-type). It also supports composite types such as array and object. Behind the scenes, Quickwit uses tantivy field types; see the [tantivy documentation](https://github.com/tantivy-search/tantivy) if you want to go into the details. ### Raw types #### Text type This field is a text field that will be analyzed and split into tokens before indexing. This kind of field is tailored for full-text search. 
Example of a mapping for a text field: ```yaml name: body description: Body of the document type: text tokenizer: default record: position fieldnorms: true fast: normalizer: lowercase ``` **Parameters for text field** | Variable | Description | Default value | | ------------- | ------------- | ------------- | | `description` | Optional description for the field. | `None` | | `stored` | Whether value is stored in the document store | `true` | | `indexed` | Whether value should be indexed so it can be searched | `true` | | `tokenizer` | Name of the tokenizer. See [tokenizers](#description-of-available-tokenizers) for a list of available tokenizers. | `default` | | `record` | Describes the amount of information indexed. Possible values are `basic`, `freq`, and `position`. | `basic` | | `fieldnorms` | Whether to store fieldnorms for the field. Fieldnorms are required to calculate the BM25 Score of the document. | `false` | | `fast` | Whether value is stored in a fast field. The fast field will contain the term ids and the dictionary. The default behaviour for `true` is to store the original text unchanged. The normalizer applied to the fast field is configured separately, for example with `normalizer: lowercase`. See [normalizers](#description-of-available-normalizers) for a list of available normalizers. | `false` | ##### Description of available tokenizers | Tokenizer | Description | | ------------- | ------------- | | `raw` | Does not process nor tokenize text. Filters out tokens larger than 255 bytes. This is similar to the `keyword` type in Elasticsearch. | | `raw_lowercase` | Does not tokenize text, but lowercases it. Filters out tokens larger than 255 bytes. | | `default` | Chops the text on whitespace and punctuation, removes tokens that are too long, and converts to lowercase. Filters out tokens larger than 255 bytes. | | `en_stem` | Like `default`, but also applies stemming on the resulting tokens. Filters out tokens larger than 255 bytes. 
| | `whitespace` | Chops the text on whitespace only. Doesn't remove long tokens or convert to lowercase. | | `chinese_compatible` | Chops between each CJK character in addition to what `default` does. Should be used with `record: position` to be able to search properly | | `lowercase` | Applies a lowercase transformation on the text. It does not tokenize the text. | ##### Description of available normalizers | Normalizer | Description | | ------------- | ------------- | | `raw` | Does not process nor tokenize text. Filters out tokens larger than 255 bytes. | | `lowercase` | Applies a lowercase transformation on the text. Filters out tokens larger than 255 bytes. | **Description of record options** | Record option | Description | | ------------- | ------------- | | `basic` | Records only the `DocId`s | | `freq` | Records the document ids as well as the term frequency | | `position` | Records the document id, the term frequency and the positions of occurrences. | Indexing with position is required to run phrase queries. #### Numeric types: `i64`, `u64` and `f64` type Quickwit handles three numeric types: `i64`, `u64`, and `f64`. Numeric values can be stored in a fast field (the equivalent of Lucene's `DocValues`), which is a column-oriented storage used for range queries and aggregations. When querying negative numbers without specifying a field (using `default_search_fields`), you should single-quote the number (for instance '-5'), otherwise it will be interpreted as wanting to match anything but that number. Example of a mapping for a u64 field: ```yaml name: rating description: Score between 0 and 5 type: u64 stored: true indexed: true fast: true ``` **Parameters for i64, u64 and f64 field** | Variable | Description | Default value | | --------------- | ------------- | ------------- | | `description` | Optional description for the field. | `None` | | `stored` | Whether the field values are stored in the document store. 
| `true` | | `indexed` | Whether the field values are indexed. | `true` | | `fast` | Whether the field values are stored in a fast field. | `false` | | `coerce` | Whether to convert numbers passed as strings to integers or floats. | `true` | | `output_format` | JSON type used to return numbers in search results. Possible values are `number` or `string`. | `number` | #### `datetime` type The `datetime` type handles dates and datetimes. Since JSON doesn’t have a date type, the `datetime` field supports multiple input types and formats. The supported input types are: - floating-point or integer numbers representing a Unix timestamp - strings containing a formatted date, datetime, or Unix timestamp The `input_formats` field parameter specifies the accepted date formats. The following input formats are natively supported: - `iso8601` - `rfc2822` - `rfc3339` - `strptime` - `unix_timestamp` **Input formats** When specifying multiple input formats, the corresponding parsers are attempted in the order they are declared. The following formats are natively supported: - `iso8601`, `rfc2822`, `rfc3339`: parse dates using standard ISO and RFC formats. - `strptime`: parse dates using the Unix [strptime](https://man7.org/linux/man-pages/man3/strptime.3.html) format with some variations: - `strptime` format specifiers: `%C`, `%d`, `%D`, `%e`, `%F`, `%g`, `%G`, `%h`, `%H`, `%I`, `%j`, `%k`, `%l`, `%m`, `%M`, `%n`, `%R`, `%S`, `%t`, `%T`, `%u`, `%U`, `%V`, `%w`, `%W`, `%y`, `%Y`, `%%`. - `%f` for milliseconds precision support. - `%z` timezone offsets can be specified as `(+|-)hhmm` or `(+|-)hh:mm`. :::warning The timezone name format specifier (`%Z`) is not supported currently. ::: - `unix_timestamp`: parse float and integer numbers to Unix timestamps. Floating-point values are converted to timestamps expressed in seconds. 
Integer values are converted to Unix timestamps whose precision (`seconds`, `milliseconds`, `microseconds`, or `nanoseconds`) is inferred from the number of input digits. Internally, datetimes are converted to UTC (if the time zone is specified) and stored as *i64* integers. As a result, Quickwit only supports timestamp values ranging from `Apr 13, 1972 23:59:55` to `Mar 16, 2242 12:56:31`.

:::warning
Converting timestamps from float to integer values may result in a loss of precision.
:::

When a `datetime` field is stored as a fast field, the `fast_precision` parameter indicates the precision used to truncate the values before encoding, which improves compression (truncation here means zeroing). The `fast_precision` parameter can take the following values: `seconds`, `milliseconds`, `microseconds`, or `nanoseconds`. It only affects what is stored in fast fields when a `datetime` field is marked as "fast". Finally, operations on `datetime` fast fields, e.g. via aggregations, need to be done at the nanosecond level.

:::info
Internally `datetime` is stored in `nanoseconds` in fast fields and in the docstore, and in `seconds` in the term dictionary.
:::

In addition, Quickwit supports the `output_format` field parameter to specify the format in which datetimes are returned in search results. This parameter supports the same values as `input_formats`, except for `unix_timestamp`, which is replaced by the following formats:
- `unix_timestamp_secs`: displays timestamps in seconds.
- `unix_timestamp_millis`: displays timestamps in milliseconds.
- `unix_timestamp_micros`: displays timestamps in microseconds.
- `unix_timestamp_nanos`: displays timestamps in nanoseconds.
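
To make the effect of `fast_precision` and of the `unix_timestamp_*` output formats more concrete, here is a small Python sketch. It is not Quickwit's implementation: the helper names `truncate_to_precision` and `format_unix_timestamp` are made up for illustration, and the sketch only assumes what is stated above, namely that values are held as nanoseconds and that truncation zeroes everything below the chosen precision.

```python
# Illustrative sketch only: these helpers are hypothetical and not part of Quickwit.
NANOS_PER_UNIT = {
    "seconds": 1_000_000_000,
    "milliseconds": 1_000_000,
    "microseconds": 1_000,
    "nanoseconds": 1,
}

OUTPUT_FORMAT_UNIT = {
    "unix_timestamp_secs": "seconds",
    "unix_timestamp_millis": "milliseconds",
    "unix_timestamp_micros": "microseconds",
    "unix_timestamp_nanos": "nanoseconds",
}


def truncate_to_precision(timestamp_nanos: int, fast_precision: str) -> int:
    """Zero out the digits below `fast_precision`; the result is still in nanoseconds."""
    unit = NANOS_PER_UNIT[fast_precision]
    # Floor division: for negative (pre-1970) inputs this rounds toward negative
    # infinity, which is an assumption and may differ from Quickwit's behavior.
    return (timestamp_nanos // unit) * unit


def format_unix_timestamp(timestamp_nanos: int, output_format: str) -> int:
    """Express a nanosecond timestamp in the unit named by `output_format`."""
    return timestamp_nanos // NANOS_PER_UNIT[OUTPUT_FORMAT_UNIT[output_format]]


timestamp_nanos = 1_700_000_000_123_456_789
print(truncate_to_precision(timestamp_nanos, "milliseconds"))  # 1700000000123000000
print(format_unix_timestamp(timestamp_nanos, "unix_timestamp_secs"))  # 1700000000
```

With `fast_precision: milliseconds`, the microsecond and nanosecond digits are zeroed, but the value is still expressed in nanoseconds, which is why aggregations on the fast field operate at the nanosecond level.
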
Example of a mapping for a datetime field:

```yaml
name: timestamp
type: datetime
description: Time at which the event was emitted
input_formats:
  - rfc3339
  - unix_timestamp
  - "%Y %m %d %H:%M:%S.%f %z"
output_format: unix_timestamp_secs
stored: true
indexed: true
fast: true
fast_precision: milliseconds
```

**Parameters for datetime field**

| Variable | Description | Default value |
| ------------- | ------------- | ------------- |
| `input_formats` | Formats used to parse input dates | [`rfc3339`, `unix_timestamp`] |
| `output_format` | Format used to display dates in search results | `rfc3339` |
| `stored` | Whether the field values are stored in the document store | `true` |
| `indexed` | Whether the field values are indexed | `true` |
| `fast` | Whether the field values are stored in a fast field | `false` |
| `fast_precision` | The precision (`seconds`, `milliseconds`, `microseconds`, or `nanoseconds`) used to store the fast values. | `seconds` |

#### `bool` type

The `bool` type accepts boolean values.

Example of a mapping for a boolean field:

```yaml
name: is_active
description: Activation status
type: bool
stored: true
indexed: true
fast: true
```

**Parameters for bool field**

| Variable | Description | Default value |
| ------------- | ------------- | ------------- |
| `description` | Optional description for the field. | `None` |
| `stored` | Whether value is stored in the document store | `true` |
| `indexed` | Whether value is indexed | `true` |
| `fast` | Whether value is stored in a fast field | `false` |

#### `ip` type

The `ip` type accepts IP address values. Both IPv4 and IPv6 are supported. Internally, IPv4 addresses are converted to IPv6.

Example of a mapping for an IP field:

```yaml
name: host_ip
description: Host IP address
type: ip
fast: true
```

**Parameters for IP field**

| Variable | Description | Default value |
| ------------- | ------------- | ------------- |
| `description` | Optional description for the field. | `None` |
| `stored` | Whether value is stored in the document store | `true` |
| `indexed` | Whether value is indexed | `true` |
| `fast` | Whether value is stored in a fast field | `false` |

#### `bytes` type

The `bytes` type accepts a binary value as a `Base64` encoded string by default, or as a hex encoded string when `input_format` is set to `hex`.

Example of a mapping for a bytes field:

```yaml
name: binary
type: bytes
stored: true
indexed: true
fast: true
input_format: hex
output_format: hex
```

**Parameters for bytes field**

| Variable | Description | Default value |
| ------------- | ------------- | ------------- |
| `description` | Optional description for the field. | `None` |
| `stored` | Whether value is stored in the document store | `true` |
| `indexed` | Whether value is indexed | `true` |
| `fast` | Whether value is stored in a fast field. Only on 1:1 cardinality, not supported on `array` fields | `false` |
| `input_format` | Encoding used to represent input bytes, either `hex` or `base64` | `base64` |
| `output_format` | Encoding used to represent bytes in search results, either `hex` or `base64` | `base64` |

#### `json` type

The `json` type accepts a JSON object.

Example of a mapping for a JSON field:

```yaml
name: parameters
type: json
stored: true
indexed: true
tokenizer: raw
expand_dots: false
fast:
  normalizer: lowercase
```

Stored primitive types are inferred from the JSON value types using the following rules:
- a boolean value `true` or `false` is stored as `bool`
- numeric values are cast to the first compatible format between `i64`, `u64` or `f64` (in this order)
- for string values (surrounded with quotes), Tantivy attempts to parse a date in `rfc3339` format. If the parsing fails, the value is stored as `text` using the configured tokenization rules

**Parameters for JSON field**

| Variable | Description | Default value |
| ------------- | ------------- | ------------- |
| `description` | Optional description for the field. | `None` |
| `stored` | Whether value is stored in the document store | `true` |
| `indexed` | Whether value is indexed | `true` |
| `fast` | Whether value is stored in a fast field. The default behaviour for text in the JSON is to store the text unchanged. A normalizer can be configured via `normalizer: lowercase`. See [normalizers](#description-of-available-normalizers) for a list of available normalizers. | `false` |
| `tokenizer` | **Only affects strings in the json object**. Name of the `Tokenizer`, choices between `raw`, `default`, `en_stem` and `chinese_compatible` | `raw` |
| `record` | **Only affects strings in the json object**. Describes the amount of information indexed, choices between `basic`, `freq` and `position` | `basic` |
| `expand_dots` | If true, json keys containing a `.` are expanded. For instance, if `expand_dots` is set to true, `{"k8s.node.id": "node2"}` will be indexed as if it was `{"k8s": {"node": {"id": "node2"}}}`. The benefit is that escaping the `.` will not be required at query time. In other words, `k8s.node.id:node2` will match the document. This does not impact the way the document is stored. | `true` |

Note that the `tokenizer` and the `record` have the same definition and the same effect as for the text field.

To search into a json object, one then needs to extend the field name with the path that will lead to the target value.

For instance, when indexing the following object:

```json
{
    "product_name": "droopy t-shirt",
    "attributes": {
        "color": ["red", "green", "white"],
        "size": "L"
    }
}
```

Assuming `attributes` has been defined as a field mapping as follows:

```yaml
- type: json
  name: attributes
```

`attributes.color:red` is then a valid query.

If, in addition, `attributes` is set as a default search field, then `color:red` is a valid query.

### Composite types

#### array

Quickwit supports arrays for all raw types except for `object` types.
To declare an array type of `i64` in the index config, you just have to set the type to `array<i64>`.

#### object

Quickwit supports nested objects as long as they do not contain arrays of objects.

```yaml
name: resource
type: object
field_mappings:
  - name: service
    type: text
```

#### concatenate

Quickwit supports mapping the content of multiple fields to a single one. This can be more efficient at query time than searching through dozens of `default_search_fields`. It also allows querying inside a json field without knowing the path to the field being searched.

```yaml
name: my_default_field
type: concatenate
concatenate_fields:
  - text # things inside text, tokenized with the `default` tokenizer
  - resource.author # all fields in resource.author, assuming resource is an `object` field.
include_dynamic_fields: true
tokenizer: default
record: basic
```

Concatenate fields don't support fast fields, and are never stored. They use their own tokenizer, independently of the tokenizer configured on the individual fields.
At query time, concatenate fields don't support range queries.
Only the following types are supported inside a concatenate field: text, bool, i64, u64, f64, json. Other types are rejected at index creation, or silently discarded during indexing if they are found inside a json field.
Unlike regular JSON fields, JSON fields in a concatenate field don't store RFC3339 dates as Tantivy dates. This means you can still perform prefix queries, e.g. `my_default_field:"2025-12-12"*`, to work around the lack of support for range queries.
Adding an object field to a concatenate field doesn't automatically add its subfields (yet).
It isn't possible to add subfields from a json field to a concatenate field. For instance if `attributes` is a json field, it's not possible to add only `attributes.color` to a concatenate field.
For json fields and dynamic fields, the path is not indexed, only values are.
For instance, given the following document:

```json
{
  "421312": {
    "my-key": "my-value"
  }
}
```

It is possible to search for `my-value` despite not knowing the full path, but it isn't possible to search for all documents containing a key `my-key`.

### Mode

The `mode` describes how Quickwit should behave when it receives a field that is not defined in the field mapping.

Quickwit offers you three different modes:
- `dynamic` (default value): unmapped fields are gathered by Quickwit and handled as defined in the `dynamic_mapping` parameter.
- `lenient`: unmapped fields are dismissed by Quickwit.
- `strict`: if a document contains a field that is not mapped, Quickwit will dismiss the document and count it as an error.

#### Dynamic Mapping

`dynamic` mode makes it possible to operate Quickwit in a schemaless manner, or with a partial schema. The configuration of `dynamic` mode can be set via the `dynamic_mapping` parameter.
`dynamic_mapping` offers the same configuration options as when configuring a `json` field. It defaults to:

```yaml
version: 0.7
index_id: my-dynamic-index
doc_mapping:
  mode: dynamic
  dynamic_mapping:
    indexed: true
    stored: true
    tokenizer: raw
    record: basic
    expand_dots: true
    fast: true
```

When the `dynamic_mapping` is set as indexed (default), fields mapped through dynamic mode can be searched by targeting the path needed to access them from the root of the JSON object.

For instance, in an entirely schemaless setting, a minimal index configuration could be:

```yaml
version: 0.7
index_id: my-dynamic-index
doc_mapping:
  # If you have a timestamp field, it is important to tell quickwit about it.
  timestamp_field: unix_timestamp
  # mode: dynamic #< Commented out, as dynamic is the default mode.
```

With such a simple configuration, we can index a complex document like the following:

```json
{
  "endpoint": "/admin",
  "query_params": {
    "ctk": "e42bb897d",
    "page": "eeb"
  },
  "src": {
    "ip": "8.8.8.8",
    "port": 53
  },
  //...
}
```

The following queries are then valid, and match the document above.

```bash
// Fields can be searched simply.
endpoint:/admin

// Nested object can be queried by specifying a `.` separated
// path from the root of the json object to the given field.
query_params.ctk:e42bb897d

// numbers are searchable too
src.port:53

// and of course we can combine them with boolean operators.
src.port:53 AND query_params.ctk:e42bb897d
```

The stored primitive type inference is the [same as for JSON fields](#json-type).

### Field name validation rules

Currently Quickwit only accepts field names that match the following regular expression:
`^[@$_\-a-zA-Z][@$_/\.\-a-zA-Z0-9]{0,254}$`

In plain language:
- it needs to have at least one character.
- it can only contain uppercase and lowercase ASCII letters `[a-zA-Z]`, digits `[0-9]`, `.`, hyphens `-`, underscores `_`, slash `/`, at `@` and dollar `$` signs.
- it must not start with a dot, a slash, or a digit.
- it must be different from Quickwit's reserved field mapping names `_source`, `_dynamic`, `_field_presence`.

:::caution
For field names containing the `.` character, you will need to escape it when referencing them. Otherwise the `.` character will be interpreted as a JSON object property access. Because of this, it is recommended to avoid using field names containing the `.` character.
:::

### Behavior with null values or missing fields

Fields with `null` values or missing fields in your JSON document are silently ignored when indexing.

## Indexing settings

This section describes indexing settings for a given index.

| Variable | Description | Default value |
| ------------- | ------------- | ------------- |
| `commit_timeout_secs` | Maximum number of seconds before committing a split since its creation. | `60` |
| `split_num_docs_target` | Target number of docs per split. | `10000000` |
| `merge_policy` | Describes the strategy used to trigger split merge operations (see [Merge policies](#merge-policies) section below). | |
| `resources.heap_size` | Indexer heap size per source per index. | `2000000000` |
| `docstore_compression_level` | Level of compression used by zstd for the docstore. Lower values may increase ingest speed, at the cost of index size | `8` |
| `docstore_blocksize` | Size of blocks in the docstore, in bytes. Lower values may improve doc retrieval speed, at the cost of index size | `1000000` |

:::note

Choosing an appropriate commit timeout is critical. With a shorter commit timeout, ingested data is queryable faster. But the published splits will be smaller, increasing the overhead associated with [merges](#merge-policies).

When permanently decommissioning an indexer node that received data through the ingest API (including the [Elastic bulk API](/docs/reference/es_compatible_api) and the OTEL [log](/docs/log-management/otel-service.md) and [trace](/docs/distributed-tracing/otel-service.md) services), we need to make sure that all the data that was persisted locally (Write Ahead Log) is indexed and committed. After receiving the termination signal, the Quickwit process waits for the indexing pipelines to finish processing this local data. This can take as long as the longest commit timeout of all indexes. Make sure that the termination grace period of the infrastructure supporting the Quickwit indexer nodes is long enough (e.g. [`terminationGracePeriodSeconds`](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/) in Kubernetes or [`stopTimeout`](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html) on AWS ECS).

:::

### Merge policies

Quickwit makes it possible to define the strategy used to decide which splits should be merged together and when.

Quickwit offers three different merge policies, each with their own set of parameters.

#### "Stable log" merge policy

The stable log merge policy attempts to minimize write amplification AND keep time-pruning power as high as possible, by merging splits with a similar size, and with a close time span.

Quickwit's default merge policy is the `stable_log` merge policy with the following parameters:

```yaml
version: 0.7
index_id: "hdfs"
# ...
indexing_settings:
  merge_policy:
    type: "stable_log"
    min_level_num_docs: 100000
    merge_factor: 10
    max_merge_factor: 12
    maturation_period: 48h
```

| Variable | Description | Default value |
| ------------- | ------------- | ------------- |
| `merge_factor` | *(advanced)* Number of splits to merge together in a single merge operation. | `10` |
| `max_merge_factor` | *(advanced)* Maximum number of splits that can be merged together in a single merge operation. | `12` |
| `min_level_num_docs` | *(advanced)* Number of docs below which all splits are considered as belonging to the same level. | `100000` |
| `maturation_period` | Duration after which a split is considered mature, and won't be considered for merges anymore. May impact the completion time of pending delete tasks. | `48h` |

#### "Limit Merge" merge policy

*The limit merge policy is considered advanced*.

The limit merge policy simply limits write amplification by setting an upper bound on the number of merge operations a split should undergo.

```yaml
version: 0.7
index_id: "hdfs"
# ...
indexing_settings:
  merge_policy:
    type: "limit_merge"
    max_merge_ops: 5
    merge_factor: 10
    max_merge_factor: 12
    maturation_period: 48h
```

| Variable | Description | Default value |
| ------------- | ------------- | ------------- |
| `max_merge_ops` | Maximum number of merges that a given split should undergo. | `4` |
| `merge_factor` | *(advanced)* Number of splits to merge together in a single merge operation. | `10` |
| `max_merge_factor` | *(advanced)* Maximum number of splits that can be merged together in a single merge operation. | `12` |
| `maturation_period` | Duration after which a split is considered mature, and won't be considered for merges anymore. May impact the completion time of pending delete tasks. | `48h` |

#### No merge

The `no_merge` merge policy entirely disables merging.

:::caution
This setting is not recommended. Merges are necessary to reduce the number of splits, and hence improve search performance.
:::

```yaml
version: 0.7
index_id: "hdfs"
indexing_settings:
  merge_policy:
    type: "no_merge"
```

### Indexer memory usage

The indexer works with a default heap of 2 GB of memory (`resources.heap_size` defaults to `2000000000` bytes). This does not directly reflect the overall memory usage, but doubling this value should give a fair approximation.

## Search settings

This section describes search settings for a given index.

| Variable | Description | Default value |
| ------------- | ------------- | ------------- |
| `default_search_fields` | Default list of fields that will be used for search. The field names in this list may be declared explicitly in the schema, or may refer to a field captured by the dynamic mode. | `None` |

## Retention policy

This section describes how Quickwit manages data retention. In Quickwit, the retention policy manager drops data on a split basis as opposed to individually dropping documents. Splits are evaluated based on their `time_range`, which is derived from the index timestamp field specified in the `doc_mapping.timestamp_field` setting. Using this setting, the retention policy will delete a split when `now() - split.time_range.end >= retention_policy.period`

```yaml
version: 0.7
index_id: hdfs
# ...
retention:
  period: 90 days
  schedule: daily
```

| Variable | Description | Default value |
| ------------- | ------------- | ------------- |
| `period` | Duration after which splits are dropped, expressed in a human-readable way (`1 day`, `2 hours`, `a week`, ...). | required |
| `schedule` | Frequency at which the retention policy is evaluated and applied, expressed as a cron expression (`0 0 * * * *`) or human-readable form (`hourly`, `daily`, `weekly`, `monthly`, `yearly`). | `hourly` |

`period` is specified as a set of time spans. Each time span is an integer followed by a unit suffix like: `2 days 3h 24min`. The supported units are:
- `nsec`, `ns` -- nanoseconds
- `usec`, `us` -- microseconds
- `msec`, `ms` -- milliseconds
- `seconds`, `second`, `sec`, `s`
- `minutes`, `minute`, `min`, `m`
- `hours`, `hour`, `hr`, `h`
- `days`, `day`, `d`
- `weeks`, `week`, `w`
- `months`, `month`, `M` -- a month is defined as `30.44 days`
- `years`, `year`, `y` -- a year is defined as `365.25 days`

================================================
FILE: docs/configuration/index.md
================================================
---
title: Configuration Reference
---

import DocCardList from '@theme/DocCardList';

================================================
FILE: docs/configuration/lambda-config.md
================================================
---
title: Lambda configuration
sidebar_position: 6
---

Quickwit supports offloading leaf search operations to AWS Lambda for horizontal scaling. When the local search queue becomes saturated, overflow splits are automatically sent to Lambda functions for processing.

:::note
Lambda offloading is currently only supported on AWS.
:::

## How it works

Lambda offloading is **only active when a `lambda` configuration section is present** under `searcher` in your node configuration. When configured:

1. Quickwit monitors the local search queue depth
2. When pending searches exceed the `offload_threshold`, new splits are sent to Lambda instead of being queued locally
3. Lambda returns per-split search results that are cached and merged with local results

This allows Quickwit to handle traffic spikes without provisioning additional searcher nodes.
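
The offloading decision described above can be sketched in a few lines of Python. This is a simplified illustration, not Quickwit's scheduler: the function `plan_leaf_search` and its arguments are hypothetical. It assumes that a split is offloaded as soon as the number of pending local searches reaches `offload_threshold` (so that a threshold of `0` offloads everything), and that offloaded splits are grouped into batches of at most `max_splits_per_invocation`, the option listed in the configuration table of this page.

```python
from typing import List, Tuple


def plan_leaf_search(
    split_ids: List[str],
    pending_local_searches: int,
    offload_threshold: int = 100,
    max_splits_per_invocation: int = 10,
) -> Tuple[List[str], List[List[str]]]:
    """Return (splits searched locally, batches of splits sent to Lambda).

    Hypothetical helper: the exact comparison used by Quickwit is an assumption.
    """
    if max_splits_per_invocation < 1:
        raise ValueError("max_splits_per_invocation must be at least 1")

    local: List[str] = []
    offloaded: List[str] = []
    pending = pending_local_searches
    for split_id in split_ids:
        if pending >= offload_threshold:
            # The local queue is considered saturated: send the split to Lambda.
            offloaded.append(split_id)
        else:
            # There is still room locally: queue the split and grow the queue depth.
            local.append(split_id)
            pending += 1

    # Group offloaded splits so that no invocation exceeds the configured maximum.
    batches = [
        offloaded[start:start + max_splits_per_invocation]
        for start in range(0, len(offloaded), max_splits_per_invocation)
    ]
    return local, batches


splits = [f"split-{i}" for i in range(25)]
local, batches = plan_leaf_search(splits, pending_local_searches=98)
print(len(local), [len(batch) for batch in batches])  # 2 [10, 10, 3]
```

In this sketch, two splits fit in the local queue before the threshold of 100 is reached, and the remaining 23 are spread over three Lambda invocations.
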

## Startup validation

When a `lambda` configuration is defined, Quickwit performs a **dry run invocation** at startup to verify that:
- The Lambda function exists
- The function version matches the embedded binary
- The invoker has permission to call the function

If this validation fails, **Quickwit will fail to start**. This ensures that Lambda offloading works correctly before the node begins serving traffic.

## Configuration

Add a `lambda` section under `searcher` in your node configuration:

```yaml
searcher:
  lambda:
    offload_threshold: 100
    auto_deploy:
      execution_role_arn: arn:aws:iam::123456789012:role/quickwit-lambda-role
      memory_size: 5 GiB
      invocation_timeout_secs: 15
```

### Lambda configuration options

| Property | Description | Default value |
| --- | --- | --- |
| `function_name` | Name of the AWS Lambda function to invoke. | `quickwit-lambda-search` |
| `max_splits_per_invocation` | Maximum number of splits to send in a single Lambda invocation. Must be at least 1. | `10` |
| `offload_threshold` | Number of pending local searches before offloading to Lambda. A value of `0` offloads everything to Lambda. | `100` |
| `auto_deploy` | Auto-deployment configuration. If set, Quickwit automatically deploys or updates the Lambda function at startup. | (none) |

### Auto-deploy configuration options

| Property | Description | Default value |
| --- | --- | --- |
| `execution_role_arn` | **Required.** IAM role ARN for the Lambda function's execution role. | |
| `memory_size` | Memory allocated to the Lambda function. More memory provides more CPU. | `5 GiB` |
| `invocation_timeout_secs` | Timeout for Lambda invocations in seconds. | `15` |

## Deployment options

### Automatic deployment (recommended)

With `auto_deploy` configured, Quickwit automatically:
1. Creates the Lambda function if it doesn't exist
2. Updates the function code if the embedded binary has changed
3. Publishes a new version with a unique identifier
4. Garbage collects old versions (keeps current + 5 most recent)

This is the recommended approach as it ensures the Lambda function always matches the Quickwit binary version.

### Manual deployment

You can deploy the Lambda function manually without `auto_deploy`:
1. Download the Lambda zip from [GitHub releases](https://github.com/quickwit-oss/quickwit/releases)
2. Create or update the Lambda function using AWS CLI, Terraform, or the AWS Console
3. Publish a version with description format `quickwit_{version}_{sha256}_{timeout}_{deploy_config}` (e.g., `quickwit_0_8_0_fa940f44_5120_60s_6c3b2`)

The description must match the format Quickwit expects, or it won't find the function version.

## IAM permissions

### Permissions for the Quickwit node

The IAM role or user running Quickwit needs the following permissions to invoke Lambda:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "lambda:InvokeFunction"
      ],
      "Resource": "arn:aws:lambda:*:*:function:quickwit-lambda-search:*"
    }
  ]
}
```

If using `auto_deploy`, additional permissions are required for deployment:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "lambda:CreateFunction",
        "lambda:GetFunction",
        "lambda:UpdateFunctionCode",
        "lambda:PublishVersion",
        "lambda:ListVersionsByFunction",
        "lambda:DeleteFunction"
      ],
      "Resource": "arn:aws:lambda:*:*:function:quickwit-lambda-search"
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::*:role/quickwit-lambda-role",
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": "lambda.amazonaws.com"
        }
      }
    }
  ]
}
```

### Lambda execution role

The Lambda function requires an execution role with S3 read access to your index data.
Example policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::your-index-bucket/*"
    }
  ]
}
```

The execution role must also have a trust policy allowing Lambda to assume it:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

## CloudWatch logging

The Lambda function emits structured logs (JSON) to stdout. To have these logs captured by CloudWatch, add the following IAM permissions to the Lambda execution role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
```

No additional configuration is needed on the Quickwit side.

## Versioning

Quickwit uses content-based versioning for Lambda:
- A SHA256 hash of the Lambda binary is computed at build time
- This hash is embedded in the Lambda function description as `quickwit:{version}-{sha256_short}`
- When Quickwit starts, it searches for a version matching this description
- Different Quickwit builds with the same Lambda binary share the same Lambda version
- Updating the Lambda binary automatically triggers a new deployment

## Example configuration

Minimal configuration (with auto-deployment):

```yaml
searcher:
  lambda:
    auto_deploy:
      execution_role_arn: arn:aws:iam::123456789012:role/quickwit-lambda-role
```

Full configuration (auto-deployment):

```yaml
searcher:
  lambda:
    function_name: quickwit-lambda-search
    max_splits_per_invocation: 10
    offload_threshold: 10
    auto_deploy:
      execution_role_arn: arn:aws:iam::123456789012:role/quickwit-lambda-role
      memory_size: 5 GiB
      invocation_timeout_secs: 15
```

Aggressive offloading (send everything to Lambda):

```yaml
searcher:
  lambda:
    function_name: quickwit-lambda-search
    offload_threshold: 0
    auto_deploy:
      execution_role_arn: arn:aws:iam::123456789012:role/quickwit-lambda-role
```

================================================
FILE: docs/configuration/metastore-config.md
================================================
---
title: Metastore configuration
sidebar_position: 4
---

Quickwit needs a place to store meta-information about its indexes.

For instance:
- The index configuration.
- Meta-information about its splits. For instance, their IDs, the number of documents they contain, their sizes, their min/max timestamp, and the set of tags present in the split.
- The checkpoints of the different sources.
- Some extra information such as the index creation time.

The metastore is entirely defined by a single URI. One can set it by editing the `metastore_uri` parameter of the [node configuration file](./node-config.md) (often named `quickwit.yaml`).

Currently, Quickwit offers two implementations:
- **PostgreSQL**: recommended for distributed usage.
- **File-backed implementation**.

# PostgreSQL Metastore

We recommend the PostgreSQL metastore for any distributed usage.

The PostgreSQL metastore can be configured by setting a PostgreSQL URI in the `metastore_uri` parameter of the Quickwit configuration file. The URI takes the following format:

```
postgres://[user]:[password]@[host]:[port]/[dbname]
```

Some of those parameters can be omitted. The following PostgreSQL URIs are for instance valid:

```
postgres://localhost/mydb
postgres://user@localhost
postgres://user:secret@localhost
```

The database has to be created in advance. On its first execution, Quickwit will transparently create the necessary tables.

Likewise, if you upgrade Quickwit to a version that includes some changes in the PostgreSQL schema, Quickwit will transparently run the migration at startup.

# File-backed metastore

For convenience, Quickwit also makes it possible to store its metadata in files using a file-backed metastore. In that case, Quickwit will write one file per index.
The metastore is then configured by passing a [storage URI](storage-config#storage-uris) that will serve as the root of the metastore storage. The metadata file associated with a given index will then be stored under `[storage_uri]/[index_id]/metastore.json` For the moment, Quickwit supports two types of storage types: - a local file system URI (e.g., `file:///opt/toto`). It is also valid to pass a file path directly (without file://). `/var/quickwit`. Relative paths will be resolved with respect to the current working directory. - S3-compatible storage URI (e.g., `s3://my-bucket/some-path`). See the [storage config](storage-config) documentation to configure S3 or S3-compatible storage providers. ### Polling configuration By default, the File-Backed Metastore is only read once when you start a Quickwit process (searcher, indexer, ...). You can also configure it to poll the File-Backed Metastore periodically to keep a fresh view of it. This is useful for a Searcher instance that needs to be aware of new splits published by an Indexer running in parallel. To configure the polling interval (in seconds), add a URI fragment to the storage URI as follows: `s3://quickwit/my-indexes#polling_interval=30s` :::note The polling interval can be configured in seconds only; other units, such as minutes or hours, are not supported. ::: :::tip Amazon S3 charges $0.0004 per 1000 GET requests. Polling a metastore every 30 seconds costs $0.04 per month and index. ::: ### Examples The following file-backed metastore URIs for instance are valid: ```markdown s3://my-indexes s3://quickwit/my-indexes s3://quickwit/my-indexes#polling_interval=30s file:///local/indices file:///local/indices#polling_interval=30s /local/indices ./quickwit-metastores ``` :::caution The file-backed metastore does not support multiple instances running at the same time because it does not implement any locking mechanism to prevent concurrent writes from overwriting each other. 
Ensure that only one file-backed metastore instance is running at all times. ::: ================================================ FILE: docs/configuration/node-config.md ================================================ --- title: Node configuration sidebar_position: 1 --- The node configuration allows you to customize and optimize the settings for individual nodes in your cluster. It is divided into several sections: - Common configuration settings: shared top-level properties - Storage settings: defined in the [storage](#storage-configuration) section - Metastore settings: defined in the [metastore](#metastore-configuration) section - Ingest settings: defined in the [ingest_api](#ingest-api-configuration) section - Indexer settings: defined in the [indexer](#indexer-configuration) section - Searcher settings: defined in the [searcher](#searcher-configuration) section - Jaeger settings: defined in the [jaeger](#jaeger-configuration) section A commented example is available here: [quickwit.yaml](https://github.com/quickwit-oss/quickwit/blob/main/config/quickwit.yaml). ## Common configuration | Property | Description | Env variable | Default value | | --- | --- | --- | --- | | `version` | Config file version. `0.7` is the only available value with a retro compatibility on `0.5` and `0.4`. | | | | `cluster_id` | Unique identifier of the cluster the node will be joining. Clusters sharing the same network should use distinct cluster IDs.| `QW_CLUSTER_ID` | `quickwit-default-cluster` | | `node_id` | Unique identifier of the node. It must be distinct from the node IDs of its cluster peers. Defaults to the instance's short hostname if not set. | `QW_NODE_ID` | short hostname | | `enabled_services` | Enabled services (control_plane, indexer, janitor, metastore, searcher) | `QW_ENABLED_SERVICES` | all services | | `listen_address` | The IP address or hostname that Quickwit service binds to for starting REST and GRPC server and connecting this node to other nodes. 
By default, Quickwit binds itself to 127.0.0.1 (localhost). This default is not valid when trying to form a cluster. | `QW_LISTEN_ADDRESS` | `127.0.0.1` | | `advertise_address` | IP address advertised by the node, i.e. the IP address that peer nodes should use to connect to the node for RPCs. | `QW_ADVERTISE_ADDRESS` | `listen_address` | | `gossip_listen_port` | The port on which the gossip cluster membership service listens (UDP). | `QW_GOSSIP_LISTEN_PORT` | `rest.listen_port` | | `grpc_listen_port` | The port on which gRPC services listen for traffic. | `QW_GRPC_LISTEN_PORT` | `rest.listen_port + 1` | | `peer_seeds` | List of IP addresses or hostnames used to bootstrap the cluster and discover the complete set of nodes. This list may contain the current node address and does not need to be exhaustive. If the list of peer seeds contains a hostname, Quickwit will resolve it by querying the DNS every minute. On Kubernetes, for instance, it is good practice to set it to a [headless service](https://kubernetes.io/docs/concepts/services-networking/service/#headless-services). | `QW_PEER_SEEDS` | | | `data_dir` | Path to directory where data (tmp data, splits kept for caching purposes) is persisted. This is mostly used in indexing. | `QW_DATA_DIR` | `./qwdata` | | `metastore_uri` | Metastore URI. Can be a local directory or `s3://my-bucket/indexes` or `postgres://username:password@localhost:5432/metastore`. [Learn more about the metastore configuration](metastore-config.md). | `QW_METASTORE_URI` | `{data_dir}/indexes` | | `default_index_root_uri` | Default index root URI that defines the location where index data (splits) is stored. The index URI is built following the scheme: `{default_index_root_uri}/{index-id}` | `QW_DEFAULT_INDEX_ROOT_URI` | `{data_dir}/indexes` | | environment variable only | Log level of Quickwit.
Can be a direct log level, or a comma separated list of `module_name=level` | `RUST_LOG` | `info` | ## REST configuration This section contains the REST API configuration options. | Property | Description | Env variable | Default value | | --- | --- | --- | --- | | `listen_port` | The port on which the REST API listens for HTTP traffic. | `QW_REST_LISTEN_PORT` | `7280` | | `cors_allow_origins` | Configure the CORS origins which are allowed to access the API. [Read more](#configuring-cors-cross-origin-resource-sharing) | | | `extra_headers` | List of header names and values | | | ### Configuring CORS (Cross-origin resource sharing) CORS (Cross-origin resource sharing) describes which address or origins can access the REST API from the browser. By default, sharing resources cross-origin is not allowed. A wildcard, single origin, or multiple origins can be specified as part of the `cors_allow_origins` parameter: Example of a REST configuration: ```yaml rest: listen_port: 1789 extra_headers: x-header-1: header-value-1 x-header-2: header-value-2 cors_allow_origins: '*' # cors_allow_origins: https://my-hdfs-logs.domain.com # Optionally we can specify one domain # cors_allow_origins: # Or allow multiple origins # - https://my-hdfs-logs.domain.com # - https://my-hdfs.other-domain.com ``` ## gRPC configuration This section contains the configuration options for gRPC services and clients used for internal communication between nodes. | Property | Description | Env variable | Default value | | --- | --- | --- | --- | | `max_message_size` | The maximum size (in bytes) of messages exchanged by internal gRPC clients and services. 
| | `20 MiB` | Example of a gRPC configuration: ```yaml grpc: max_message_size: 30 MiB ``` :::warning We advise changing the default value of 20 MiB only if you encounter the following error: `Error, message length too large: found 24732228 bytes, the limit is: 20971520 bytes.` In that case, increase `max_message_size` by increments of 10 MiB until the issue disappears. This is a temporary fix: the next version of Quickwit will rely exclusively on gRPC streaming endpoints and handle messages of any length. ::: ## Storage configuration Please refer to the dedicated [storage configuration](storage-config) page to learn more about configuring Quickwit for various storage providers. Here are also some minimal examples of how to configure Quickwit with Amazon S3 or Alibaba OSS: ```bash AWS_ACCESS_KEY_ID= AWS_SECRET_ACCESS_KEY= ``` *Amazon S3* ```yaml storage: s3: region: us-east-1 ``` *Alibaba* ```yaml storage: s3: region: us-east-1 endpoint: https://oss-us-east-1.aliyuncs.com ``` ## Metastore configuration This section may contain one configuration subsection per available metastore implementation. The specific configuration parameters for each implementation may vary. Currently, the available metastore implementations are: - File-backed - PostgreSQL ### File-backed metastore configuration File-backed metastore doesn't have any node level configuration. You can configure the poll interval [at the index level](./metastore-config.md#polling-configuration). ### PostgreSQL metastore configuration | Property | Description | Default value | | --- | --- | --- | | `min_connections` | Minimum number of connections to maintain in the pool at all times. | `0` | | `max_connections` | Maximum number of connections to maintain in the pool. | `10` | | `acquire_connection_timeout` | Maximum amount of time to spend waiting for an available connection before aborting a query. | `10s` | | `idle_connection_timeout` | Maximum idle duration before closing individual connections. 
| `10min` | | `max_connection_lifetime` | Maximum lifetime of individual connections. | `30min` | Example of a metastore configuration for PostgreSQL in YAML format: ```yaml metastore: postgres: min_connections: 10 max_connections: 50 acquire_connection_timeout: 30s idle_connection_timeout: 1h max_connection_lifetime: 1d ``` ## Indexer configuration This section contains the configuration options for an indexer. The split store is documented in the [indexing document](../overview/concepts/indexing.md#split-store). | Property | Description | Default value | | --- | --- | --- | | `split_store_max_num_bytes` | Maximum size in bytes allowed in the split store. | `100G` | | `split_store_max_num_splits` | Maximum number of files allowed in the split store. | `1000` | | `max_concurrent_split_uploads` | Maximum number of concurrent split uploads allowed on the node. | `12` | | `merge_concurrency` | Maximum number of merge operations that can be executed on the node at one point in time. | `(2 x num threads available) / 3` | | `enable_otlp_endpoint` | If true, enables the OpenTelemetry exporter endpoint to ingest logs and traces via the OpenTelemetry Protocol (OTLP). | `false` | | `cpu_capacity` | Advisory parameter used by the control plane. The value can be expressed in threads (e.g. `2`) or in terms of millicpus (e.g. `2000m`). The control plane will attempt to schedule indexing pipelines on the different nodes proportionally to the CPU capacity advertised by the indexer. It is NOT used as a limit. All pipelines will be scheduled regardless of whether the cluster has sufficient capacity or not. The control plane does not attempt to spread the work equally when the load is well below the `cpu_capacity`. Users who need a balanced load on all of their indexer nodes can set the `cpu_capacity` to an arbitrarily low value as long as they keep it proportional to the number of threads available.
| `num threads available` | | `enable_cooperative_indexing` | Shares resources more efficiently when the number of indexes actively written to is significantly higher than the number of cores, at the possible cost of a lower overall indexing throughput. | `false` | Example: ```yaml indexer: split_store_max_num_bytes: 100G split_store_max_num_splits: 1000 max_concurrent_split_uploads: 12 enable_otlp_endpoint: true ``` ## Ingest API configuration | Property | Description | Default value | | --- | --- | --- | | `max_queue_memory_usage` | Maximum size in bytes of the in-memory Ingest queue. | `2GiB` | | `max_queue_disk_usage` | Maximum disk space in bytes taken by the Ingest queue. The value must be at least `256M` and at least `max_queue_memory_usage`. | `4GiB` | | `content_length_limit` | Maximum uncompressed payload size. Increasing this is discouraged; use a [file source](../ingest-data/sqs-files.md) instead. | `10MiB` | | `grpc_compression_algorithm` | Compression algorithm (`gzip` or `zstd`) to use for gRPC traffic between nodes for the ingest service. | `None` | Example: ```yaml ingest_api: max_queue_memory_usage: 2GiB max_queue_disk_usage: 4GiB content_length_limit: 10MiB grpc_compression_algorithm: zstd ``` ## Searcher configuration This section contains the configuration options for a Searcher. | Property | Description | Default value | | --- | --- | --- | | `aggregation_memory_limit` | Controls the maximum amount of memory that can be used for aggregations before aborting. This limit is per searcher node. A node may run concurrent queries, which share the limit. The first query that hits the limit is aborted and its memory is freed. It is used to prevent excessive memory usage during the aggregation phase, which can lead to performance degradation or crashes. | `500M`| | `aggregation_bucket_limit` | Determines the maximum number of buckets returned to the client.
| `65000` | | `fast_field_cache_capacity` | Fast field in-memory cache capacity on a Searcher. If you filter by dates, run aggregations or range queries, or use tracing, it might be worth increasing this parameter. The [metrics](../reference/metrics.md) starting with `quickwit_cache_fastfields_cache` can help you make an informed choice when setting this value. | `1G` | | `split_footer_cache_capacity` | Split footer in-memory cache (it is essentially the hotcache) capacity on a Searcher.| `500M` | | `partial_request_cache_capacity` | Partial request in-memory cache capacity on a Searcher. It caches the intermediate state of a request, possibly making subsequent requests faster. It can be disabled by setting the size to `0`. | `64M` | | `max_num_concurrent_split_searches` | Maximum number of concurrent split search requests running on a Searcher. | `100` | | `split_cache` | Searcher split cache configuration options defined in the section below. Cache disabled if unspecified. | | | `request_timeout_secs` | The time, in seconds, before a search request is cancelled. This should match the timeout of the stack calling into Quickwit if there is one set. | `30` | ### Searcher split cache configuration This section contains the configuration options for the on-disk searcher split cache. Files are stored in the data directory under `searcher-split-cache/`. | Property | Description | Default value | | --- | --- | --- | | `max_num_bytes` | Maximum disk size in bytes allowed in the split cache. Can be exceeded by the size of one split. | | | `max_num_splits` | Maximum number of splits allowed in the split cache. | `10000` | | `num_concurrent_downloads` | Maximum number of concurrent split downloads.
| `1` | Example: ```yaml searcher: fast_field_cache_capacity: 1G split_footer_cache_capacity: 500M partial_request_cache_capacity: 64M split_cache: max_num_bytes: 1G max_num_splits: 10000 num_concurrent_downloads: 1 ``` ## Jaeger configuration | Property | Description | Default value | | --- | --- | --- | | `enable_endpoint` | If true, enables the gRPC endpoint that allows the Jaeger Query Service to connect and retrieve traces. | `false` | Example: ```yaml jaeger: enable_endpoint: true ``` ## Using environment variables in the configuration You can use environment variable references in the config file to set values that need to be configurable during deployment. To do this, use: `${VAR_NAME}` where `VAR_NAME` is the name of the environment variable. Each variable reference is replaced at startup by the value of the environment variable. The replacement is case-sensitive and occurs before the configuration file is parsed. Referencing undefined variables throws an error unless you specify a default value or custom error text. To specify a default value, use: `${VAR_NAME:-default_value}` where `default_value` is the value to use if the environment variable is unset. ``` : ${VAR_NAME} or : ${VAR_NAME:-default value} ``` For example: ```bash export QW_LISTEN_ADDRESS=0.0.0.0 ``` ```yaml # config.yaml version: 0.7 cluster_id: quickwit-cluster node_id: my-unique-node-id listen_address: ${QW_LISTEN_ADDRESS} rest: listen_port: ${QW_LISTEN_PORT:-1111} ``` Will be interpreted by Quickwit as: ```yaml version: 0.7 cluster_id: quickwit-cluster node_id: my-unique-node-id listen_address: 0.0.0.0 rest: listen_port: 1111 ``` ================================================ FILE: docs/configuration/ports-config.md ================================================ --- title: Ports configuration sidebar_position: 6 --- When starting a quickwit search server, one important parameter that can be configured is the `rest.listen_port` (defaults to :7280). 
Internally, Quickwit will, in fact, use three sockets. The ports of these three sockets cannot be configured independently at the moment. The ports used are computed relative to the `rest.listen_port` port, as follows. | Service | Port used | Protocol | Default | |-------------------------------|---------------------------|----------|-----------| | HTTP server with the REST API | `${rest.listen_port}` | TCP | 7280 | | Cluster membership | `${rest.listen_port}` | UDP | 7280 | | gRPC service | `${rest.listen_port} + 1` | TCP | 7281 | In order to form a cluster, you will also need to define a `peer_seeds` parameter. The following addresses are valid peer seed addresses: | Type | Example without port | Example with port | |--------------|--------------|---------------------------| | IPv4 | 172.1.0.12 | 172.1.0.12:7180 | | IPv6 | 2001:0db8:85a3:0000:0000:8a2e:0370:7334 | [2001:0db8:85a3:0000:0000:8a2e:0370:7334]:7180 | | hostname | node3 | node3:7180 | If no port is specified in a peer node address, a Quickwit node will assume the peer is using the same port as itself. ================================================ FILE: docs/configuration/source-config.md ================================================ --- title: Source configuration sidebar_position: 5 --- Quickwit can insert data into an index from one or multiple sources. A source can be added after index creation using the [CLI command](../reference/cli.md#source) `quickwit source create`. It can also be enabled or disabled with the `quickwit source enable/disable` subcommands. A source is declared using an object called source config, which defines the source's settings.
It consists of multiple parameters: - source ID - source type - source parameters - input_format - maximum number of pipelines per indexer (optional) - desired number of pipelines (optional) - transform parameters (optional) ## Source ID The source ID is a string that uniquely identifies the source within an index. It may only contain uppercase or lowercase ASCII letters, digits, hyphens (`-`), and underscores (`_`). Finally, it must start with a letter and contain at least 3 characters but no more than 255. ## Source type The source type designates the kind of source being configured. As of version 0.5, available source types are `ingest-api`, `kafka`, `kinesis`, and `pulsar`. The `file` type is also supported but only for local ingestion from [the CLI](/docs/reference/cli.md#tool-local-ingest). ## Source parameters The source parameters indicate how to connect to a data store and are specific to the source type. ### File source A file source reads data from files containing JSON objects separated by newlines (NDJSON). Gzip compression is supported provided that the file name ends with the `.gz` suffix. #### Ingest a single file (CLI only) To ingest a specific file, run the indexing directly in an adhoc CLI process with: ```bash ./quickwit tool local-ingest --index --input-path ``` Both local and object files are supported, provided that the environment is configured with the appropriate permissions. A tutorial is available [here](/docs/ingest-data/ingest-local-file.md). #### Notification based file ingestion (beta) Quickwit can automatically ingest all new files that are uploaded to an S3 bucket. This requires creating and configuring an [SQS notification queue](https://docs.aws.amazon.com/AmazonS3/latest/userguide/ways-to-add-notification-config-to-bucket.html). A complete example can be found [in this tutorial](/docs/ingest-data/sqs-files.md). The `notifications` parameter takes an array of notification settings. 
Currently, only one notifier can be configured per source, and only the SQS notification `type` is supported. Required fields for the SQS `notifications` parameter items: - `type`: `sqs` - `queue_url`: complete URL of the SQS queue (e.g., `https://sqs.us-east-1.amazonaws.com/123456789012/queue-name`) - `message_type`: format of the message payload, either - `s3_notification`: an [S3 event notification](https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventNotifications.html) - `raw_uri`: a message containing just the file object URI (e.g., `s3://mybucket/mykey`) - `deduplication_window_duration_sec`: maximum duration for which ingested file checkpoints are kept (default 3600) - `deduplication_window_max_messages`: maximum number of ingested file checkpoints kept (default 100k) - `deduplication_cleanup_interval_secs`: frequency at which outdated file checkpoints are cleaned up *Adding a file source with SQS notifications to an index with the [CLI](../reference/cli.md#source)* ```bash cat << EOF > source-config.yaml version: 0.8 source_id: my-sqs-file-source source_type: file num_pipelines: 2 params: notifications: - type: sqs queue_url: https://sqs.us-east-1.amazonaws.com/123456789012/queue-name message_type: s3_notification EOF ./quickwit source create --index my-index --source-config source-config.yaml ``` :::note - Quickwit does not automatically delete the source files after a successful ingestion. You can use [S3 object expiration](https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-expire-general-considerations.html) to configure how long they should be retained in the bucket. - Configure the notification to only forward events of type `s3:ObjectCreated:*`. Other events are acknowledged by the source without further processing and a warning is logged.
- We strongly recommend using a [dead letter queue](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html) to receive all messages that couldn't be processed by the file source. A `maxReceiveCount` of 5 is a good default value. Here are some common situations where the notification message ends up in the dead letter queue: - the notification message could not be parsed (e.g., it is not a valid S3 notification) - the file was not found - the file is corrupted (e.g., unexpected compression) - AWS S3 notifications and AWS SQS provide "at least once" delivery guarantees. To avoid duplicates, the file source includes a mechanism that prevents the same file from being ingested twice. It works by storing checkpoints in the metastore that track the indexing progress for each file. You can decrease `deduplication_window_*` or increase `deduplication_cleanup_interval_secs` to reduce the load on the metastore. ::: ### Ingest API source An ingest API source reads data from the [Ingest API](/docs/reference/rest-api.md#ingest-data-into-an-index). This source is automatically created at index creation and can be neither deleted nor disabled. ### Kafka source A Kafka source reads data from a Kafka stream. Each message in the stream must hold a JSON object. A tutorial is available [here](/docs/ingest-data/kafka.md). #### Kafka source parameters The Kafka source consumes a `topic` using the client library [librdkafka](https://github.com/edenhill/librdkafka) and forwards the key-value pairs carried by the parameter `client_params` to the underlying librdkafka consumer. Common `client_params` options are bootstrap servers (`bootstrap.servers`) and security protocol (`security.protocol`). Please refer to the [Kafka](https://kafka.apache.org/documentation/#consumerconfigs) and [librdkafka](https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md) documentation pages for more advanced options.
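
As a rough illustration of how these pieces fit together, the Python sketch below assembles a Kafka source config as a plain dictionary and checks the source ID against the rules given in the Source ID section. The helper name `build_kafka_source_config` is hypothetical and not part of Quickwit; the field names are the ones documented on this page, and JSON is used here only to display the result.

```python
import json
import re

# Source ID rules from the "Source ID" section: ASCII letters, digits,
# hyphens and underscores; starts with a letter; 3 to 255 characters.
SOURCE_ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{2,254}")


def build_kafka_source_config(source_id, topic, client_params, num_pipelines=1):
    """Hypothetical helper: returns a Kafka source config as a dict."""
    if not SOURCE_ID_PATTERN.fullmatch(source_id):
        raise ValueError(f"invalid source ID: {source_id!r}")
    return {
        "version": "0.8",  # same value as the YAML example on this page
        "source_id": source_id,
        "source_type": "kafka",
        "num_pipelines": num_pipelines,
        "params": {
            "topic": topic,
            # Dotted librdkafka option names stay flat string keys;
            # they are not expanded into nested objects.
            "client_params": dict(client_params),
        },
    }


config = build_kafka_source_config(
    "my-kafka-source",
    "my-topic",
    {"bootstrap.servers": "localhost:9092", "security.protocol": "SSL"},
    num_pipelines=2,
)
print(json.dumps(config, indent=2))
```

Validating the ID before writing the file is only a convenience for catching typos early; Quickwit performs its own validation when the source is created.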
| Property | Description | Default value | | --- | --- | --- | | `topic` | Name of the topic to consume. | required | | `client_log_level` | librdkafka client log level. Possible values are: debug, info, warn, error. | `info` | | `client_params` | librdkafka client configuration parameters. | `{}` | | `enable_backfill_mode` | Backfill mode stops the source after reaching the end of the topic. | `false` | **Kafka client parameters** - `bootstrap.servers` Comma-separated list of host and port pairs that are the addresses of a subset of the Kafka brokers in the Kafka cluster. - `auto.offset.reset` Defines the behavior of the source when consuming a partition for which there is no initial offset saved in the checkpoint. `earliest` consumes from the beginning of the partition, whereas `latest` (default) consumes from the end. - `enable.auto.commit` This setting is ignored because the Kafka source manages commit offsets internally using the [checkpoint API](../overview/concepts/indexing.md#checkpoint) and forces auto-commits to be disabled. - `group.id` Kafka-based distributed indexing relies on consumer groups. Unless overridden in the client parameters, the default group ID assigned to each consumer managed by the source is `quickwit-{index_uid}-{source_id}`. - `max.poll.interval.ms` Short max poll interval durations may cause a source to crash when back pressure from the indexer occurs. Therefore, Quickwit recommends using the default value of `300000` (5 minutes). *Adding a Kafka source to an index with the [CLI](../reference/cli.md#source)* ```bash cat << EOF > source-config.yaml version: 0.8 source_id: my-kafka-source source_type: kafka num_pipelines: 2 params: topic: my-topic client_params: bootstrap.servers: localhost:9092 security.protocol: SSL EOF ./quickwit source create --index my-index --source-config source-config.yaml ``` ### Kinesis source A Kinesis source reads data from an [Amazon Kinesis](https://aws.amazon.com/kinesis/) stream. 
Each message in the stream must hold a JSON object. A tutorial is available [here](/docs/ingest-data/kinesis.md). **Kinesis source parameters** The Kinesis source consumes a stream identified by a `stream_name` and a `region`. | Property | Description | Default value | | --- | --- | --- | | `stream_name` | Name of the stream to consume. | required | | `region` | The AWS region of the stream. Mutually exclusive with `endpoint`. | `us-east-1` | | `endpoint` | Custom endpoint for use with AWS-compatible Kinesis service. Mutually exclusive with `region`. | optional | If no region is specified, Quickwit will attempt to find one in multiple other locations, in the following order of precedence: 1. Environment variables (`AWS_REGION` then `AWS_DEFAULT_REGION`) 2. Config file, typically located at `~/.aws/config` or otherwise specified by the `AWS_CONFIG_FILE` environment variable if set and not empty. 3. Amazon EC2 instance metadata service determining the region of the currently running Amazon EC2 instance. 4. Default value: `us-east-1` *Adding a Kinesis source to an index with the [CLI](../reference/cli.md#source)* ```bash cat << EOF > source-config.yaml version: 0.7 source_id: my-kinesis-source source_type: kinesis params: stream_name: my-stream EOF quickwit source create --index my-index --source-config source-config.yaml ``` ### Pulsar source A Pulsar source reads data from one or several Pulsar topics. Each message in the topic(s) must hold a JSON object. A tutorial is available [here](/docs/ingest-data/pulsar.md). **Pulsar source parameters** The Pulsar source consumes `topics` using the client library [pulsar-rs](https://github.com/streamnative/pulsar-rs). | Property | Description | Default value | | --- | --- | --- | | `topics` | List of topics to consume. | required | | `address` | Pulsar URL (pulsar:// and pulsar+ssl://). | required | | `consumer_name` | The consumer name to register with the Pulsar source.
| `quickwit` | *Adding a Pulsar source to an index with the [CLI](../reference/cli.md#source)* ```bash cat << EOF > source-config.yaml version: 0.7 source_id: my-pulsar-source source_type: pulsar params: topics: - my-topic address: pulsar://localhost:6650 EOF ./quickwit source create --index my-index --source-config source-config.yaml ``` ## Number of pipelines The `num_pipelines` parameter is only available for distributed sources like Kafka, GCP PubSub, and Pulsar. It defines the number of pipelines to run on a cluster for the source. The actual placement of these pipelines on the different indexer will be decided by the control plane. :::info Note that distributing the indexing load of partitioned sources like Kafka is done by assigning the different partitions to different pipelines. As a result, it is important to ensure that the number of partitions is a multiple of `num_pipelines`. Also, assuming you are only indexing a single Kafka source in your Quickwit cluster, you should set the number of pipelines to a multiple of the number of indexers. Finally, if your indexing throughput is high, you should provision between 2 and 4 vCPUs per pipeline. For instance, assume you want to index a 60-partition topic, with each partition receiving a throughput of 10 MB/s. If you measured that Quickwit can index your data at a pace of 40MB/s per pipeline, a possible setting could be: - 5 indexers with 8 vCPUs each - 15 pipelines Each indexer will then be in charge of 3 pipelines, and each pipeline will cover 4 partitions. ::: ## Transform parameters For all source types but the `ingest-api`, ingested documents can be transformed before being indexed using [Vector Remap Language (VRL)](https://vector.dev/docs/reference/vrl/) scripts. | Property | Description | Default value | | --- | --- | --- | | `script` | Source code of the VRL program executed to transform documents. | required | | `timezone` | Timezone used in the VRL program for date and time manipulations. 
It must be a valid name in the [TZ database](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) | `UTC` | ```yaml # Your source config here # ... transform: script: | .message = downcase(string!(.message)) .timestamp = now() del(.username) timezone: local ``` ## Input format The `input_format` parameter specifies the expected data format of the source. The formats currently supported are: - `json` (default) - `otlp_logs_json` - `otlp_logs_proto` - `otlp_traces_json` - `otlp_traces_proto` - `plain_text` *OTLP formats* When ingesting OTLP data into an OTLP logs or traces index with a source other than the native OTEL endpoints, use this parameter to specify whether the exported logs or traces will be serialized in JSON or Protobuf. When possible, prefer the latter, which is a more compact encoding. *Plain text format* Use this parameter for unstructured text data. Internally, Quickwit can only index JSON data. To allow the ingestion of plain text documents, Quickwit transforms them on the fly into JSON objects of the following form: `{"plain_text": ""}`. They can then optionally be transformed into more complex documents using a VRL script (see the [transform feature](#transform-parameters)). The following is an example of how one could parse and transform a CSV dataset containing a list of users described by 3 attributes: first name, last name, and age. ```yaml # Your source config here # ... input_format: plain_text transform: script: | user = parse_csv!(.plain_text) .first_name = user[0] .last_name = user[1] .age = to_int!(user[2]) del(.plain_text) ``` ## Enabling/disabling a source from an index A source can be enabled or disabled from an index using the [CLI command](../reference/cli.md) `quickwit source enable` or `quickwit source disable`: ```bash quickwit source disable --index my-index --source my-source ``` A source is enabled by default.
When disabling a source, the related indexing pipelines will be shut down on each relevant indexer and indexing for this source will be paused. ## Deleting a source from an index A source can be removed from an index using the [CLI command](../reference/cli.md) `quickwit source delete`: ```bash quickwit source delete --index my-index --source my-source ``` When deleting a source, the checkpoint associated with the source is also removed. ================================================ FILE: docs/configuration/storage-config.md ================================================ --- title: Storage configuration sidebar_position: 2 --- ## Supported Storage Providers Quickwit currently supports four types of storage providers: - Amazon S3 and S3-compatible (Garage, MinIO, ...) - Azure Blob Storage - Local file storage* - Google Cloud Storage (native API) ## Storage URIs Storage URIs refer to different storage providers identified by a URI "protocol" or "scheme". Quickwit supports the following storage URI protocols: - `s3://` for Amazon S3 and S3-compatible - `azure://` for Azure Blob Storage - `file://` for local file systems - `gs://` for Google Cloud Storage In general, you can use a storage URI or a file path anywhere you would intuitively expect a file path. For instance: - when setting the `index_uri` of an index to specify the storage provider and location; - when setting the `metastore_uri` in a node config to set up a file-backed metastore; - when passing a file path as a command line argument. ### Local file storage URIs Quickwit interprets regular file paths as local file system URIs. Relative file paths are allowed and are resolved relatively to the current working directory (CWD). `~` can be used as a shortcut to refer to the user’s home directory. 
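
The resolution rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior on a POSIX system, not Quickwit's actual implementation; the function name `to_file_uri` and its `cwd` and `home` parameters are assumptions introduced for the example.

```python
import os
from typing import Optional


def to_file_uri(path_or_uri: str, cwd: Optional[str] = None,
                home: Optional[str] = None) -> str:
    """Hypothetical helper: normalize a local path or file:// URI."""
    # Drop the scheme if present; what remains is treated as a path.
    if path_or_uri.startswith("file://"):
        raw = path_or_uri[len("file://"):]
    else:
        raw = path_or_uri
    # `~` is a shortcut for the user's home directory.
    if raw == "~" or raw.startswith("~/"):
        base = home if home is not None else os.path.expanduser("~")
        raw = base + raw[1:]
    # Relative paths are resolved against the current working directory.
    if not raw.startswith("/"):
        base = cwd if cwd is not None else os.getcwd()
        raw = os.path.normpath(os.path.join(base, raw))
    return "file://" + raw


print(to_file_uri("/var/quickwit"))
print(to_file_uri("./quickwit", cwd="/srv"))
print(to_file_uri("~/data", home="/home/quickwit"))
```

Passing `cwd` and `home` explicitly keeps the sketch deterministic; when they are omitted, the values are taken from the running process.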
The following are valid local file system URIs: ```markdown - /var/quickwit - file:///var/quickwit - /home/quickwit/data - ~/data - ./quickwit ``` :::caution When using the `file://` protocol, a third `/` is necessary to express an absolute path. For instance, the following URI `file://home/quickwit/` is interpreted as `./home/quickwit` ::: ## Storage configuration This section contains one configuration subsection per storage provider. If a storage configuration parameter is not explicitly set, Quickwit relies on the default values provided by the storage provider SDKs ([Azure SDK for Rust](https://github.com/Azure/azure-sdk-for-rust), [AWS SDK for Rust](https://github.com/awslabs/aws-sdk-rust)). ### S3 storage configuration | Property | Description | Default value | | --- | --- | --- | | `flavor` | The optional storage flavor to use. Available flavors are `digital_ocean`, `garage`, `gcs`, and `minio`. | | | `access_key_id` | The AWS access key ID. | | | `secret_access_key` | The AWS secret access key. | | | `region` | The AWS region to send requests to. | `us-east-1` (SDK default) | | `endpoint` | Custom endpoint for use with S3-compatible providers. | SDK default | | `force_path_style_access` | Disables [virtual-hosted–style](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html) requests. Required by some S3-compatible providers (Ceph, MinIO). | `false` | | `disable_multi_object_delete` | Disables [Multi-Object Delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) requests. Required by some S3-compatible providers (GCS). | `false` | | `disable_multipart_upload` | Disables [multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html) of objects. Required by some S3-compatible providers (GCS). | `false` | :::warning Hardcoding credentials into configuration files is not secure and strongly discouraged. Prefer the alternative authentication methods that your storage backend may provide. 
::: #### Environment variables | Env variable | Description | | --- | --- | | `QW_S3_ENDPOINT` | Custom S3 endpoint. | | `QW_S3_MAX_CONCURRENCY` | Limit the number of concurrent requests to S3 | #### Storage flavors Storage flavors ensure that Quickwit works correctly with storage providers that deviate from the S3 API by automatically configuring the appropriate settings. The available flavors are: - `digital_ocean` - `garage` - `gcs` - `minio` *Digital Ocean* The Digital Ocean flavor (`digital_ocean`) forces path-style access and turns off multi-object delete requests. *Garage flavor* The Garage flavor (`garage`) overrides the `region` parameter to `garage` and forces path-style access. *Google Cloud Storage* The Google Cloud Storage flavor (`gcs`) turns off multi-object delete requests and multipart uploads. *MinIO flavor* The MinIO flavor (`minio`) overrides the `region` parameter to `minio` and forces path-style access. Example of a storage configuration for Google Cloud Storage in YAML format: ```yaml storage: s3: flavor: gcs region: us-east1 endpoint: https://storage.googleapis.com ``` ### Azure storage configuration | Property | Description | Default value | | --- | --- | --- | | `account` | The Azure storage account name. | | | `access_key` | The Azure storage account access key. | | #### Environment variables | Env variable | Description | | --- | --- | | `QW_AZURE_STORAGE_ACCOUNT` | Azure Blob Storage account name. | | `QW_AZURE_STORAGE_ACCESS_KEY` | Azure Blob Storage account access key. | Example of a storage configuration for Azure in YAML format: ```yaml storage: azure: account: your-azure-account-name access_key: your-azure-access-key ``` ## Storage configuration examples for various object storage providers ### Garage [Garage](https://garagehq.deuxfleurs.fr/) is an open-source distributed object storage service tailored for self-hosting. 
```yaml storage: s3: flavor: garage endpoint: http://127.0.0.1:3900 ``` ### MinIO [MinIO](https://min.io/) is a high-performance object storage service. ```yaml storage: s3: flavor: minio endpoint: http://127.0.0.1:9000 ``` Note: `default_index_root_uri` and index URIs do not include the endpoint. Set them as typical S3 paths such as `s3://indexes`. ================================================ FILE: docs/configuration/template-config.md ================================================ --- title: Index template configuration sidebar_position: 7 toc_max_heading_level: 4 --- This page describes how to configure an index template. Index templates let you dynamically create indexes according to predefined rules. Templates are used automatically when documents are received on the ingest API for an index that doesn't exist. The index template configuration lets you define the following parameters: - `template_id` (required) - `description` - `index_id_patterns` (required) - `index_root_uri` - `priority` In addition, the following parameters can also be configured and are the same as those found in the [index configuration](../configuration/index-config.md): - doc mapping (required) - indexing settings - search settings - retention policy You can manage templates using the [index template API](../reference/rest-api.md#index-template-api). ## Config file format The index template configuration format is YAML or JSON. When a key is absent from the configuration file, the default value is used. Here is a complete example: ```yaml version: 0.9 # File format version.
template_id: "hdfs-dev" index_root_uri: "s3://my-bucket/hdfs-dev/" description: "HDFS log management dev" index_id_patterns: - hdfs-dev-* - hdfs-staging-* priority: 100 doc_mapping: mode: lenient field_mappings: - name: timestamp type: datetime input_formats: - unix_timestamp output_format: unix_timestamp_secs fast_precision: seconds fast: true - name: severity_text type: text tokenizer: raw fast: - tokenizer: lowercase - name: body type: text tokenizer: default record: position - name: resource type: object field_mappings: - name: service type: text tokenizer: raw tag_fields: ["resource.service"] timestamp_field: timestamp index_field_presence: true search_settings: default_search_fields: [severity_text, body] retention: period: 90 days schedule: daily ``` ## Template ID The `template_id` is a string that uniquely identifies the index template within the metastore. It may only contain uppercase or lowercase ASCII letters, digits, hyphens (`-`), and underscores (`_`). It must start with a letter and contain at least 3 characters but no more than 255. ## Description An optional string that describes what the index template is used for. ## Index root uri The `index_root_uri` defines where the index files (also called splits) should be stored. This parameter expects a [storage uri](storage-config#storage-uris). The actual URI of the index is the path concatenation of the `index_root_uri` with the index id. If `index_root_uri` is not defined, the `default_index_root_uri` from [Quickwit's node config](node-config) will be used. ## Index ID patterns `index_id_patterns` is a list of strings that define which indices should be created according to this template. Use [glob-like](https://en.wikipedia.org/wiki/Glob_(programming)) wildcard ( \* ) expressions to target indices that match a pattern: test\* or \*test or te\*t or \*test\*. You can also use negative patterns by prepending the hyphen `-` character. 
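To illustrate how positive and negative patterns can combine, here is a minimal Python sketch. It assumes that an index ID is selected when it matches at least one positive pattern and no negative pattern; that rule is an assumption made for illustration, not a description of Quickwit's implementation, and `matches_template` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def matches_template(index_id: str, patterns: list[str]) -> bool:
    """Hypothetical helper: evaluate glob-like index ID patterns.

    Assumption: an ID is selected if it matches at least one positive
    pattern and none of the negative (hyphen-prefixed) patterns.
    """
    positives = [p for p in patterns if not p.startswith("-")]
    # Strip the leading hyphen to recover the glob of each negative pattern.
    negatives = [p[1:] for p in patterns if p.startswith("-")]
    if any(fnmatchcase(index_id, p) for p in negatives):
        return False
    return any(fnmatchcase(index_id, p) for p in positives)


# The last pattern is a negative pattern excluding temporary dev indexes.
patterns = ["hdfs-dev-*", "hdfs-staging-*", "-hdfs-dev-tmp*"]
print(matches_template("hdfs-dev-2024", patterns))
print(matches_template("hdfs-dev-tmp-01", patterns))
print(matches_template("hdfs-prod-2024", patterns))
```

With these example patterns, only the first index ID is selected: the second is excluded by the negative pattern and the third matches no positive pattern.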
Each pattern must obey the following rules: - It must follow the regex `^-?[a-zA-Z\*][a-zA-Z0-9-_\.\*]{0,254}$`. - It cannot contain consecutive asterisks (`*`). - If it does not contain an asterisk (`*`), the length must be greater than or equal to 3 characters. ## Priority When multiple templates match a new index ID, the template with the highest `priority` is used to configure the index. ================================================ FILE: docs/deployment/_category_.yaml ================================================ label: 'Deployment' position: 7 collapsed: true ================================================ FILE: docs/deployment/cluster-sizing.md ================================================ --- title: Cluster sizing sidebar_position: 3 --- In this guide, we discuss how to size your Quickwit cluster and nodes. As shown in the [architecture section](../overview/architecture.md), a Quickwit cluster has 5 main components: the Indexers, Searchers, Control Plane, Metastore, and Janitor. Each component has different resource requirements and can be scaled independently. We also discuss how to size the metastore PostgreSQL database. :::note This guide provides general guidelines. The actual resource requirements depend strongly on your workload. We recommend monitoring the resource usage and adjusting the cluster size accordingly.
::: ## Quickwit services ### Indexers Here are some high-level guidelines to size your Indexer nodes: - Quickwit can index at around **7.5MB per second per core** - For the general use case, configure 4GB of RAM per core - Workloads with a large number of indexes or data sources consume more RAM - Don't use instances with less than 8GB of RAM - Mount the data directory to a volume of at least 120GB to store: - the [split cache](../configuration/node-config.md#Indexer-configuration) (default 100GB) - the [ingest queue](../configuration/node-config.md#ingest-api-configuration) (default 4GiB) - a little extra for the indexes that are being built (first generation and merges) - Local SSDs are preferred for deploying Indexers since they generally provide the best performance per dollar and save some network bandwidth. However, remote disks can also be used if they provide roughly 20 MB/s of write throughput per core when using the ingest API or 10 MB/s when relying on other sources. For Amazon EBS volumes, this is equivalent to 320 or 160 IOPS per core (assuming 64 KB per I/O operation). :::note To utilize all CPUs on Indexer nodes that have more than 4 cores, your indexing workload needs to be broken down into multiple indexing pipelines. This can be achieved by creating multiple indexes or by using a [partitioned data source](../configuration/source-config.md#number-of-pipelines) such as [Kafka](../configuration/source-config.md#kafka-source) or the [ingest API (v2)](../ingest-data/ingest-api.md#ingest-api-versions). ::: ### Searchers Search performance is highly dependent on the workload. For example, term queries are usually cheaper than aggregations. A good starting point for dimensioning Searcher nodes: - Configure 8GB of RAM per core when using a high latency / low bandwidth object store like AWS S3 - Decrease the RAM / CPU ratio (e.g., 4GB/core) when using a faster object store - Provision more RAM if you expect many concurrent aggregation requests.
By default, each request can use up to 500MB of RAM on each node. - Avoid instances with less than 4GB of RAM - Searcher nodes don't use disk unless the [split cache](../configuration/node-config.md#Searcher-split-cache-configuration) is explicitly enabled One strength of Quickwit is that its Searchers are stateless, which makes it easy to scale them up and down based on the workload. Scale the number of Searcher nodes based on: - the number of concurrent requests expected - aggregations that run on large amounts of data (without [time](../overview/concepts/querying.md#time-sharding) or [tag](../overview/concepts/querying.md#tag-pruning) pruning) ### Other services The Control Plane, Metastore, and Janitor are lightweight components. - **Control Plane**: A cluster must have only one Control Plane. It needs a single core and 2GB of RAM. It doesn't require any disk. - **Metastore**: A cluster must have exactly one Metastore when using the [file-backed metastore](../configuration/metastore-config.md#file-backed-metastore). When using the [PostgreSQL metastore](#postgres-metastore-backend), you can run one or several Metastore pods for high availability (HA). The Metastore requires a single core and 2GB of RAM. For clusters handling hundreds of indexes, you may increase the size to 2 cores and 4GB of RAM. It doesn't write to disk (when using PostgreSQL, the database handles persistence). - **Janitor**: A cluster must have only one Janitor. In general, it requires 1 core and 2GB of RAM and doesn't use the disk. If you use the [delete API](https://quickwit.io/docs/overview/concepts/deletes), the Janitor should be dimensioned like an Indexer. ### Single node deployments For experimentation and small-scale POCs, it is possible to deploy all the services on a single node (see [tutorial](../get-started/tutorials/tutorial-hdfs-logs.md)). We recommend at least 2 cores and 8GB of RAM.
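As a rough illustration of the Indexer guidelines in this guide, the sketch below turns a daily ingest volume into a core and RAM estimate. It reuses the 7.5MB per second per core, 4GB of RAM per core, and 8GB minimum figures quoted earlier; the function name, the even 24-hour ingest rate, and the decimal unit conversion are assumptions made for the example, so treat the result as a starting point rather than a recommendation.

```python
import math

# Figures taken from the Indexer guidelines in this guide.
INDEXING_MB_PER_SEC_PER_CORE = 7.5
RAM_GB_PER_CORE = 4
MIN_RAM_GB = 8


def estimate_indexer_resources(daily_ingest_tb: float) -> dict:
    """Hypothetical helper: estimate total Indexer cores and RAM.

    Assumes a perfectly even ingest rate over 24 hours and decimal units
    (1 TB = 1,000,000 MB). Real workloads need headroom for bursts.
    """
    mb_per_sec = daily_ingest_tb * 1_000_000 / 86_400
    cores = max(1, math.ceil(mb_per_sec / INDEXING_MB_PER_SEC_PER_CORE))
    ram_gb = max(MIN_RAM_GB, cores * RAM_GB_PER_CORE)
    return {"mb_per_sec": round(mb_per_sec, 1), "cores": cores, "ram_gb": ram_gb}


# 2 TB/day is about 23 MB/s, which rounds up to 4 cores and 16GB of RAM.
print(estimate_indexer_resources(2.0))
```

The same arithmetic can be repeated for Searchers with the 8GB (or 4GB) of RAM per core ratio, although search sizing depends far more on the query mix than on a single throughput figure.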
## Postgres Metastore backend For most use cases, a PostgreSQL instance with 4GB of RAM and 1 core is sufficient: - with the AWS RDS managed service, use the t4g.medium instance type. Enable multi-AZ with one standby for high availability. ================================================ FILE: docs/deployment/deployment-modes.md ================================================ --- title: Deployment modes sidebar_position: 1 --- As an application, Quickwit is built out of multiple services and is designed to run as a horizontally-scalable distributed system. Currently, Quickwit supports four core services (indexer, searcher, metastore, control plane) and one maintenance service (janitor): - Indexers ingest documents from data sources and build indexes. - Searchers execute search queries submitted via the REST API. - The Metastore stores index metadata in a PostgreSQL-compatible database or cloud-hosted file. - The Control Plane distributes and coordinates indexing workloads on indexers. - The Janitor performs periodic maintenance tasks. Quickwit is distributed as a single binary or Docker image. The behavior of that executable file or image is controlled with the `--service` option of the `quickwit run` command and defines which services run on a node. You may start one service, multiple, or all of them. Nodes always serve the REST API and the search and admin UI. In addition, they will redirect requests that they cannot satisfy to the appropriate nodes in the cluster. Finally, each service can run on one or several nodes depending on the expected load on the system. ## Standalone mode (single node) This deployment mode is the simplest way to get started with Quickwit. Launch all the services with the `quickwit run` [command](../reference/cli.md), and you are now ready to ingest data and search your indexes. ## Cluster mode (multi-node) You can deploy Quickwit on multiple nodes. 
We provide a [Helm chart](./kubernetes/helm.md) to help you deploy Quickwit on Kubernetes. In cluster mode, you must store your index data on a shared storage backend such as Amazon S3 or MinIO. ## One indexer, multiple searchers One indexer running on a small instance (4 vCPUs) can ingest documents at a throughput of 20-40MB/s (1-3+ TB/day). A deployment with one indexer is thus an excellent place to start. However, you may need several searchers to handle large datasets or serve many resource-intensive requests such as aggregation queries. ## Multiple indexers, multiple searchers Indexing a single [data source](../configuration/source-config.md) on several indexers is only possible with a [Kafka source](../configuration/source-config.md#kafka-source). Support for native distributed indexing was added with Quickwit 0.9. ## File-backed metastore limitations The file-backed metastore is a good fit for standalone and small deployments. However, it does not support multiple instances running at the same time. As long as you can guarantee that no more than one metastore is running at any given time, the file-backed metastore is safe to use. For heavy workloads, we recommend using a PostgreSQL metastore. ================================================ FILE: docs/deployment/kubernetes/_category_.yaml ================================================ label: 'Kubernetes' position: 2 collapsed: true ================================================ FILE: docs/deployment/kubernetes/gke.md ================================================ --- title: Install Quickwit on Google GKE sidebar_label: Google GKE sidebar_position: 2 --- This guide will help you set up a Quickwit cluster with the correct GCS permissions. ## Set up Before installing Quickwit with Helm, let's create a namespace for our playground. ``` export NS=quickwit-tutorial kubectl create ns ${NS} ``` Quickwit stores its index on an object storage. 
We will use GCS, which is natively supported since the 0.7 version (for versions < 0.7, you should use an S3 interoperability key). The following steps create a GCP and a GKE service account and bind them together. We are going to create them, set the right permissions and bind them. ```bash export PROJECT_ID={your-project-id} export GCP_SERVICE_ACCOUNT=quickwit-tutorial export GKE_SERVICE_ACCOUNT=quickwit-sa export BUCKET=your-bucket kubectl create serviceaccount ${GKE_SERVICE_ACCOUNT} -n ${NS} gcloud iam service-accounts create ${GCP_SERVICE_ACCOUNT} --project=${PROJECT_ID} gcloud storage buckets add-iam-policy-binding gs://${BUCKET} \ --member "serviceAccount:${GCP_SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" \ --role "roles/storage.objectAdmin" # Notice that the member is related to a namespace. gcloud iam service-accounts add-iam-policy-binding ${GCP_SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \ --role roles/iam.workloadIdentityUser \ --member "serviceAccount:${PROJECT_ID}.svc.id.goog[${NS}/${GKE_SERVICE_ACCOUNT}]" # Now we can annotate our service account! kubectl annotate serviceaccount ${GKE_SERVICE_ACCOUNT} \ iam.gke.io/gcp-service-account=${GCP_SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com \ -n ${NS} ``` ## Install Quickwit using Helm We are now ready to install Quickwit on GKE. If you'd like to know more about Helm, consult our [comprehensive guide](./helm.md) for installing Quickwit on Kubernetes. ```bash helm repo add quickwit https://helm.quickwit.io helm repo update quickwit ``` Let's set Quickwit `values.yaml`: ```yaml # We use the edge version here as we recently fixed # a bug which prevents the metastore from running on GCS. 
image: repository: quickwit/quickwit pullPolicy: Always tag: edge serviceAccount: create: false name: quickwit-sa config: default_index_root_uri: gs://{BUCKET}/qw-indexes metastore_uri: gs://{BUCKET}/qw-indexes ``` We're ready to deploy: ```bash helm install quickwit/quickwit -f values.yaml ``` ## Check that Quickwit is running It should take a few seconds for the cluster to start. During the startup process, individual pods might restart themselves several times. To access the UI, you can run the following command and then open your browser at [http://localhost:7280](http://localhost:7280): ``` kubectl port-forward svc/release-name-quickwit-searcher 7280:7280 ``` ## Uninstall the deployment Run the following Helm command to uninstall the deployment ```bash helm uninstall ``` And don't forget to clean your bucket, Quickwit should have stored 3 files in `gs://{BUCKET}/qw-indexes`. ================================================ FILE: docs/deployment/kubernetes/glasskube.md ================================================ --- title: Install Quickwit with Glasskube sidebar_label: Glasskube sidebar_position: 3 --- [Glasskube](https://glasskube.dev) is a package manager for Kubernetes that empowers you to effortlessly install, upgrade, configure, and manage your Kubernetes cluster packages, all while streamlining repetitive and cumbersome maintenance tasks. ## Requirements To deploy Quickwit on Kubernetes, you will need: - kubectl, compatible with your cluster (+/- 1 minor release from your cluster) (`kubectl version`) - A Kubernetes cluster 1. Install `kubectl` and `glasskube` cli. To install `kubectl` locally, you can refer to [this documentation](https://kubernetes.io/docs/tasks/tools/#install-kubectl). To install `glasskube` cli locally, you can refer to [this documentation](https://glasskube.dev/docs/getting-started/install) and choose the right installation options according to your operating system. 
For example, assuming that you're on macOS and use Homebrew and kind, this is what you'll have to do: ```shell brew install glasskube/tap/glasskube # install the glasskube cli kind create cluster # create a kind Kubernetes cluster ``` 2. Install glasskube in your Kubernetes cluster: ```shell glasskube bootstrap ``` 3. Start and access the Glasskube GUI: ```shell glasskube serve ``` You can then access the Glasskube GUI at http://localhost:8580 ## Install Quickwit using Glasskube `glasskube` will install Quickwit in the `quickwit` namespace. You can perform the Quickwit installation directly with the GUI: ![screenshot-glasskube-ui.png](../../assets/images/screenshot-glasskube-ui.png) Or use the CLI instead: ```shell glasskube install quickwit ``` In both cases, you'll have to set the following parameters: * `defaultIndexRootUri`: the default index URI is an S3-compliant bucket which usually looks like this: `s3:///` * `metastoreUri`: if you're not using PostgreSQL and object storage, you can pick the same bucket and value you used for the `defaultIndexRootUri` parameter * `s3Endpoint`: the http(s) URL of your object storage service, which should look like `https://s3.{region}.{your object storage domain}` * `s3Flavor`: which can be one of the following: `do`, `garage`, `gcp`, `minio`. You can leave it empty if your object storage is compliant with AWS S3 * `s3Region` * `s3AccessKeyId` * `s3SecretAccessKey` ## Uninstall Quickwit ```shell glasskube uninstall quickwit ``` ================================================ FILE: docs/deployment/kubernetes/helm.md ================================================ --- title: Install Quickwit with Helm sidebar_label: Helm sidebar_position: 1 --- [Helm](https://helm.sh) is a package manager for Kubernetes that allows you to configure, install, and upgrade containerized applications in a Kubernetes cluster in a version-controlled and reproducible way.
You can install Quickwit on Kubernetes with the official Quickwit Helm chart. If you encounter any problem with the chart, please open an issue in our [GitHub repository](https://github.com/quickwit-oss/helm-charts). ## Requirements To deploy Quickwit on Kubernetes, you will need: - kubectl, compatible with your cluster (+/- 1 minor release from your cluster) (`kubectl version`) - Helm v3 (`helm version`) - A Kubernetes cluster 1. Install `kubectl` and `helm` To install `kubectl` and `helm` locally, follow the [Kubernetes](https://kubernetes.io/docs/tasks/tools/#install-kubectl) and [Helm](https://helm.sh/docs/intro/install/) documentation pages. 2. Add the Quickwit Helm chart repository to Helm ```bash helm repo add quickwit https://helm.quickwit.io ``` 3. Update the repository ```bash helm repo update quickwit ``` 4. Create and customize your configuration file `values.yaml` You can inspect the default configuration values of the chart using the following command: ```bash helm show values quickwit/quickwit ``` Here is an example of a minimal configuration with a file-backed metastore: ```yaml environment: QW_METASTORE_URI: s3:///quickwit-indexes config: default_index_root_uri: s3:///quickwit-indexes storage: s3: region: eu-east-1 # We recommend using IAM roles and permissions to access Amazon S3 resources, # but you can specify a pair of access and secret keys if necessary. access_key_id: secret_access_key: ``` 5. Deploy Quickwit ```bash helm install quickwit/quickwit -f values.yaml ``` 6. Check that Quickwit is running It might take some time for the cluster to start. During the startup process, individual pods might restart themselves several times. The command in the previous step prints instructions on how to connect to the cluster. The resulting endpoint can be used to access the Quickwit search UI as well as to execute standard API requests. ## Using PostgreSQL as a metadata store The file-backed metastore is mainly useful for testing purposes.
Though a file-backed metastore might be easier to set up, we strongly encourage you to use a PostgreSQL metastore in production. For the Quickwit installation to work with a PostgreSQL metastore, you need to provide the PostgreSQL connection information instead of the metastore URI: ```yaml config: default_index_root_uri: s3:///quickwit-indexes postgres: host: port: 5432 database: quickwit-metastore username: quickwit password: # This password will be stored as a Kubernetes Secret storage: {} s3: region: eu-east-1 # We recommend using IAM roles and permissions to access Amazon S3 resources, # but you can specify a pair of access and secret keys if necessary. access_key_id: secret_access_key: ``` ## Uninstall the deployment Run the following Helm command to uninstall the deployment: ```bash helm uninstall ``` ================================================ FILE: docs/distributed-tracing/_category_.yaml ================================================ label: 'Distributed tracing' position: 6 collapsed: true ================================================ FILE: docs/distributed-tracing/otel-service.md ================================================ --- title: OTEL service sidebar_position: 5 --- Quickwit natively supports the [OpenTelemetry Protocol (OTLP)](https://opentelemetry.io/docs/reference/specification/protocol/otlp/) and provides a gRPC endpoint to receive spans from an OpenTelemetry collector, or from your application directly, via an exporter. This endpoint is enabled by default. When enabled, Quickwit will start the gRPC service ready to receive spans from an OpenTelemetry collector. The spans are indexed in the `otel-traces-v0_7` index by default, and this index will be automatically created if not present. The index doc mapping is described in the next [section](#trace-and-span-data-model). If, for any reason, you want to disable this endpoint, you can: - Set the `QW_ENABLE_OTLP_ENDPOINT` environment variable to `false` when starting Quickwit.
- Or set `enable_otlp_endpoint` to `false` in the `indexer` section of the [node config](/docs/configuration/node-config.md). ```yaml title=node-config.yaml # ... Indexer configuration ... indexer: enable_otlp_endpoint: false ``` ## Sending spans to your own index You can send spans to the index of your choice by setting the header `qw-otel-traces-index` of your gRPC request to the targeted index ID. ## Trace and span data model A trace is a collection of spans that represents a single request. A span represents a single operation within a trace. OpenTelemetry collectors send spans, and Quickwit indexes them by default in the `otel-traces-v0_7` index, whose doc mapping maps the OpenTelemetry span model to an indexed document in Quickwit. The span model is derived from the [OpenTelemetry specification](https://opentelemetry.io/docs/reference/specification/trace/api/). Below is the doc mapping of the `otel-traces-v0_7` index: ```yaml version: 0.7 index_id: otel-traces-v0_7 doc_mapping: mode: strict field_mappings: - name: trace_id type: bytes input_format: hex output_format: hex fast: true - name: trace_state type: text indexed: false - name: service_name type: text tokenizer: raw fast: true - name: resource_attributes type: json tokenizer: raw - name: resource_dropped_attributes_count type: u64 indexed: false - name: scope_name type: text indexed: false - name: scope_version type: text indexed: false - name: scope_attributes type: json indexed: false - name: scope_dropped_attributes_count type: u64 indexed: false - name: span_id type: bytes input_format: hex output_format: hex - name: span_kind type: u64 - name: span_name type: text tokenizer: raw fast: true - name: span_fingerprint type: text tokenizer: raw - name: span_start_timestamp_nanos type: datetime input_formats: [unix_timestamp] output_format: unix_timestamp_nanos indexed: false fast: true fast_precision: milliseconds - name: span_end_timestamp_nanos type: datetime input_formats: [unix_timestamp] output_format:
unix_timestamp_nanos indexed: false fast: false - name: span_duration_millis type: u64 indexed: false fast: true - name: span_attributes type: json tokenizer: raw fast: true - name: span_dropped_attributes_count type: u64 indexed: false - name: span_dropped_events_count type: u64 indexed: false - name: span_dropped_links_count type: u64 indexed: false - name: span_status type: json indexed: true - name: parent_span_id type: bytes input_format: hex output_format: hex indexed: false - name: events type: array<json> tokenizer: raw fast: true - name: event_names type: array<text> tokenizer: default record: position stored: false - name: links type: array<json> tokenizer: raw timestamp_field: span_start_timestamp_nanos indexing_settings: commit_timeout_secs: 10 search_settings: default_search_fields: [] ``` ## Known limitations There are a few limitations in the current distributed tracing setup in Quickwit 0.9: - The OTLP gRPC service does not provide high durability. This will be fixed in 0.10. - OTLP HTTP is only available with the binary Protobuf encoding. OTLP HTTP with JSON encoding is not planned yet, but it could easily be added in a future version. Please open an issue if you need this feature. If you are interested in new features or discovered other limitations, please open an issue on [GitHub](https://github.com/quickwit-oss/quickwit). ================================================ FILE: docs/distributed-tracing/overview.md ================================================ --- title: Distributed Tracing with Quickwit sidebar_label: Overview sidebar_position: 1 --- Distributed Tracing is a process that tracks your application requests flowing through your different services: frontend, backend, databases and more. It's a powerful tool to understand how your application works and to debug performance issues. Quickwit is a cloud-native engine to index and search unstructured data, which makes it a perfect fit for a traces backend.
Moreover, Quickwit natively supports the [OpenTelemetry gRPC and HTTP (protobuf only) protocol](https://opentelemetry.io/docs/reference/specification/protocol/otlp/) and the [Jaeger gRPC API (SpanReader only)](https://www.jaegertracing.io/). **This means that you can use Quickwit to store your traces and to query them with Jaeger UI**. ![Quickwit Distributed Tracing](../assets/images/distributed-tracing-overview-light.png#gh-light-mode-only)![Quickwit Distributed Tracing](../assets/images/distributed-tracing-overview-dark.png#gh-dark-mode-only) ## Plug Quickwit to Jaeger Quickwit implements a gRPC service compatible with Jaeger UI. All you need to do is configure Jaeger with the (span) storage type `grpc`[^1], and you will be able to visualize in Jaeger the traces stored in any Quickwit index matching the pattern `otel-traces-v0_*`. We made a tutorial on [how to plug Quickwit to Jaeger UI](plug-quickwit-to-jaeger.md) that will guide you through the process. [^1]: It was `grpc-plugin` until version 1.58 of Jaeger. ## Send traces to Quickwit - [Using OTEL collector](send-traces/using-otel-collector.md) - [Using Python OTEL SDK](send-traces/using-otel-sdk-python.md) ================================================ FILE: docs/distributed-tracing/plug-quickwit-to-jaeger.md ================================================ --- title: Plug Quickwit to Jaeger description: A simple tutorial to use Jaeger with Quickwit backend.
icon_url: /img/tutorials/quickwit-logo.png tags: [traces, ingestion] sidebar_position: 2 --- In this tutorial, we will show you how Quickwit can eat its own dog food: we will send Quickwit traces into Jaeger and analyze them, which will generate new traces to analyze :) ## Start Quickwit First, start a [Quickwit instance](../get-started/installation.md) with the OTLP service enabled: ```bash QW_ENABLE_OPENTELEMETRY_OTLP_EXPORTER=true \ OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:7281 \ ./quickwit run ``` We also set the `QW_ENABLE_OPENTELEMETRY_OTLP_EXPORTER` and `OTEL_EXPORTER_OTLP_ENDPOINT` environment variables so that Quickwit will send its own traces to itself. ## Start Jaeger UI Let's start a Jaeger UI instance with Docker. Here we need to tell Jaeger that it should use Quickwit as its backend. Because container networking behaves differently across platforms, we will have to use one approach on MacOS & Windows and another on Linux. ### MacOS & Windows We can rely on `host.docker.internal` to get the Docker bridge IP address, pointing to our Quickwit server. ```bash docker run --rm --name jaeger-qw \ -e SPAN_STORAGE_TYPE=grpc \ -e GRPC_STORAGE_SERVER=host.docker.internal:7281 \ -p 16686:16686 \ jaegertracing/jaeger-query:1.60 ``` ### Linux By default, Quickwit listens on `127.0.0.1` and will not respond to requests directed to the Docker bridge (`172.17.0.1`). There are different ways to solve this problem. The easiest is probably to use host network mode. ```bash docker run --rm --name jaeger-qw --network=host \ -e SPAN_STORAGE_TYPE=grpc \ -e GRPC_STORAGE_SERVER=127.0.0.1:7281 \ -p 16686:16686 \ jaegertracing/jaeger-query:1.60 ``` ## Search traces in Jaeger UI As Quickwit is indexing its own traces, you should be able to see them in Jaeger UI after 5 seconds (the time it takes for Quickwit to do its first commit). Open the Jaeger UI at [http://localhost:16686](http://localhost:16686) and search for traces!
By executing search queries, you will then see Quickwit's own traces: - `find_traces` is the endpoint called when you search for traces in Jaeger UI. It then calls `find_trace_ids`. - `find_trace_ids` runs an aggregation query on spans to get unique trace IDs. - `root_search` is Quickwit's search entry point. It calls search on each split (piece of index) in parallel, in a distributed manner, or just locally if there is only one node. - `leaf_search` is the search entry point on each node. It calls `leaf_search_single_split` on each split. - `leaf_search_single_split` is the search entry point on a split. It calls `warmup` and then `tantivy_search`. - `warmup` is the warmup phase of the search. It prefetches data needed to execute the search query. - `tantivy_search` is the search phase of the search. It executes the search query at full speed with [Tantivy](https://github.com/quickwit-oss/tantivy). ![Quickwit trace in Jaeger UI](../assets/images/jaeger-ui-quickwit-trace-analysis.png) ## Next steps You are now ready for the next step: instrumenting your application and sending its traces to Quickwit. You can do it: - In [Python](send-traces/using-otel-sdk-python.md). - And in any other language that OpenTelemetry supports.
================================================
FILE: docs/distributed-tracing/send-traces/_category_.yaml
================================================
label: 'Sending traces'
position: 3
collapsed: false

================================================
FILE: docs/distributed-tracing/send-traces/using-otel-collector.md
================================================
---
title: Using OTEL Collector
description: Using OTEL Collector
tags: [otel, collector, traces]
sidebar_position: 1
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

If you already have your own OpenTelemetry Collector and want to export your traces to Quickwit, you need to add an OTLP gRPC exporter to your `config.yaml`.

When the collector runs in Docker on macOS or Windows, point the exporter at `host.docker.internal`:

```yaml title="otel-collector-config.yaml"
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp/quickwit:
    endpoint: host.docker.internal:7281
    tls:
      insecure: true
    # By default, traces are sent to the otel-traces-v0_7 index.
    # You can customize the index ID by setting this header.
    # headers:
    #   qw-otel-traces-index: otel-traces-v0_7

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/quickwit]
```

On Linux, where the collector runs with host networking, use `127.0.0.1`:

```yaml title="otel-collector-config.yaml"
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp/quickwit:
    endpoint: 127.0.0.1:7281
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/quickwit]
```

## Test your OTEL configuration

1. [Install](../../get-started/installation.md) and start a Quickwit server:

```bash
./quickwit run
```

2. Start a collector with the previous config. On macOS and Windows:

```bash
docker run -v ${PWD}/otel-collector-config.yaml:/etc/otelcol/config.yaml -p 4317:4317 -p 4318:4318 -p 7281:7281 otel/opentelemetry-collector
```

On Linux:

```bash
docker run -v ${PWD}/otel-collector-config.yaml:/etc/otelcol/config.yaml --network=host -p 4317:4317 -p 4318:4318 -p 7281:7281 otel/opentelemetry-collector
```

3.
Send a trace to your collector with cURL: ```bash curl -XPOST "http://localhost:4318/v1/traces" -H "Content-Type: application/json" \ --data-binary @- << EOF { "resource_spans": [ { "resource": { "attributes": [ { "key": "service.name", "value": { "string_value": "test-with-curl" } } ] }, "scope_spans": [ { "scope": { "name": "manual-test" }, "spans": [ { "time_unix_nano": "1678974011000000000", "observed_time_unix_nano": "1678974011000000000", "start_time_unix_nano": "1678974011000000000", "end_time_unix_nano": "1678974021000000000", "trace_id": "3c191d03fa8be0653c191d03fa8be065", "span_id": "3c191d03fa8be065", "kind": 2, "events": [], "status": { "code": 1 } } ] } ] } ] } EOF ``` You should see a log on the Quickwit server similar to the following: ```bash 2023-03-16T13:44:09.369Z INFO quickwit_indexing::actors::indexer: new-split split_id="01GVNAKT5TQW0T2QGA245XCMTJ" partition_id=6444214793425557444 ``` This means that Quickwit has received the trace and created a new split. Wait for the split to be published before searching for traces. ## Next step Follow our tutorial on [how to send traces from your python app](using-otel-sdk-python.md). ================================================ FILE: docs/distributed-tracing/send-traces/using-otel-sdk-python.md ================================================ --- title: Using OTEL SDK - Python description: A simple tutorial to send traces to Quickwit from a Python Flask app. icon_url: /img/tutorials/python-logo.png tags: [python, traces, ingestion] sidebar_position: 2 --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; In this tutorial, we will show you how to instrument a Python [Flask](https://flask.palletsprojects.com/en/2.2.x/) app with OpenTelemetry and send traces to Quickwit. This tutorial was inspired by the [Python OpenTelemetry](https://opentelemetry.io/docs/instrumentation/python/getting-started/) documentation, huge thanks to the OpenTelemetry team! 
## Prerequisites - Python3 installed - Docker installed ## Start a Quickwit instance [Install Quickwit](/docs/get-started/installation.md) and start a Quickwit instance: ```bash ./quickwit run ``` ## Start Jaeger UI Let's start a Jaeger UI instance with docker. Here we need to inform jaeger that it should use quickwit as its backend. Due to some idiosyncrasy associated with networking with containers, we will have to use a different approach on MacOS & Windows on one side, and Linux on the other side. ### MacOS & Windows We can rely on `host.docker.internal` to get the docker bridge ip address, pointing to our quickwit server. ```bash docker run --rm --name jaeger-qw \ -e SPAN_STORAGE_TYPE=grpc \ -e GRPC_STORAGE_SERVER=host.docker.internal:7281 \ -p 16686:16686 \ jaegertracing/jaeger-query:1.60 ``` ### Linux By default, quickwit is listening to `127.0.0.1`, and will not respond to request directed to the docker bridge (`172.17.0.1`). There are different ways to solve this problem. The easiest is probably to use host network mode. ```bash docker run --rm --name jaeger-qw --network=host \ -e SPAN_STORAGE_TYPE=grpc \ -e GRPC_STORAGE_SERVER=127.0.0.1:7281 \ -p 16686:16686 \ jaegertracing/jaeger-query:1.60 ``` ## Run a simple Flask app We will start a flask application that is doing three things on each HTTP call `http://localhost:5000/process-ip`: - Fetching an IP address from [https://httpbin.org/ip](https://httpbin.org/ip). - Parsing it and fake processing it with a random sleep. - Displaying it with a random sleep. Let's first install the dependencies: ```bash pip install flask pip install opentelemetry-distro pip install opentelemetry-exporter-otlp ``` The opentelemetry-distro package installs the API, SDK, and the opentelemetry-bootstrap and opentelemetry-instrument tools that you’ll use. 
Here is the code of our app:

```python title=my_app.py
import random
import time

import requests
from flask import Flask

app = Flask(__name__)


@app.route("/process-ip")
def process_ip():
    body = fetch()
    ip = parse(body)
    display(ip)
    return ip


def fetch():
    resp = requests.get('https://httpbin.org/ip')
    body = resp.json()
    return body


def parse(body):
    # Sleep for a random amount of time to make the span more visible.
    secs = random.randint(1, 100) / 1000
    time.sleep(secs)
    return body["origin"]


def display(ip):
    # Sleep for a random amount of time to make the span more visible.
    secs = random.randint(1, 100) / 1000
    time.sleep(secs)
    message = f"Your IP address is `{ip}`."
    print(message)


if __name__ == "__main__":
    app.run(port=5000)
```

## Auto-instrumentation

OpenTelemetry provides a tool called `opentelemetry-bootstrap` that inspects the packages installed in your environment and installs the matching instrumentation libraries.

```bash
opentelemetry-bootstrap -a install
```

And that's it, we are now ready to run the app through `opentelemetry-instrument`, which activates the instrumentation at startup:

```bash
# We don't need metrics.
OTEL_METRICS_EXPORTER=none \
OTEL_TRACES_EXPORTER=console \
OTEL_SERVICE_NAME=my_app \
opentelemetry-instrument python my_app.py
```

By hitting [http://localhost:5000/process-ip](http://localhost:5000/process-ip) you should see the corresponding trace in the console.

This is nice, but it would be even better if we could see the time spent in each step, the status code of the HTTP request, and the content type of the response. Let's do that by manually instrumenting our app!
## Manual instrumentation ```python title=my_instrumented_app.py import random import time import requests from flask import Flask from opentelemetry import trace # Creates a tracer from the global tracer provider tracer = trace.get_tracer(__name__) app = Flask(__name__) @app.route("/process-ip") @tracer.start_as_current_span("process_ip") def process_ip(): body = fetch() ip = parse(body) display(ip) return ip @tracer.start_as_current_span("fetch") def fetch(): resp = requests.get('https://httpbin.org/ip') body = resp.json() headers = resp.headers current_span = trace.get_current_span() current_span.set_attribute("status_code", resp.status_code) current_span.set_attribute("content_type", headers["Content-Type"]) current_span.set_attribute("content_length", headers["Content-Length"]) return body @tracer.start_as_current_span("parse") def parse(body): # Sleep for a random amount of time to make the span more visible. secs = random.randint(1, 100) / 1000 time.sleep(secs) return body["origin"] @tracer.start_as_current_span("display") def display(ip): # Sleep for a random amount of time to make the span more visible. secs = random.randint(1, 100) / 1000 time.sleep(secs) message = f"Your IP address is `{ip}`." print(message) current_span = trace.get_current_span() current_span.add_event(message) if __name__ == "__main__": app.run(port=5000) ``` We can now start the new instrumented app: ```bash OTEL_METRICS_EXPORTER=none \ OTEL_TRACES_EXPORTER=console \ OTEL_SERVICE_NAME=my_app \ opentelemetry-instrument python my_instrumented_app.py ``` If you hit again [http://localhost:5000/process-ip](http://localhost:5000/process-ip), you should see new spans with name `fetch`, `parse`, and `display` and with the corresponding custom attributes! ## Sending traces to Quickwit To send traces to Quickwit, we need to use the OTLP exporter. 
It is as simple as this:

```bash
# We don't need metrics.
OTEL_METRICS_EXPORTER=none \
OTEL_SERVICE_NAME=my_app \
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:7281 \
opentelemetry-instrument python my_instrumented_app.py
```

Now, if you hit [http://localhost:5000/process-ip](http://localhost:5000/process-ip), traces will be sent to Quickwit. You just need to wait around 30 seconds before they are indexed. It's time for a coffee break!

Once 30 seconds have passed, let's query the traces from our service:

```bash
curl -XPOST http://localhost:7280/api/v1/otel-traces-v0_7/search -H 'Content-Type: application/json' -d '{
    "query": "resource_attributes.service.name:my_app"
}'
```

Then open the Jaeger UI at [localhost:16686](http://localhost:16686/) and play with it. You now have a Jaeger UI powered by a Quickwit storage backend!

![Flask trace analysis in Jaeger UI](../../assets/images/jaeger-ui-python-app-trace-analysis.png)

![Flask traces in Jaeger UI](../../assets/images/jaeger-ui-python-app-traces.png)

## Sending traces to your OpenTelemetry collector

Start a collector as described in the [OpenTelemetry collector tutorial](using-otel-collector.md) and execute the following command:

```bash
# We don't need metrics.
OTEL_METRICS_EXPORTER=none \
OTEL_SERVICE_NAME=my_app \
opentelemetry-instrument python my_instrumented_app.py
```

Traces will be sent to your collector, and then to Quickwit.

## Wrap up

In this tutorial, we have seen how to instrument a Python application with OpenTelemetry and send traces to Quickwit. We have also seen how to use the Jaeger UI to analyze traces.

All the code snippets are available in our [tutorial repository](https://github.com/quickwit-oss/tutorials).

Please let us know what you think about this tutorial, and if you have any questions, feel free to reach out to us on [Discord](https://discord.gg/7eNYX4d) or [Twitter](https://twitter.com/quickwit_inc).
================================================ FILE: docs/get-started/_category_.yaml ================================================ label: 'Get started' position: 2 collapsed: false ================================================ FILE: docs/get-started/installation.md ================================================ --- title: Installation sidebar_position: 2 --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import { useDocsVersion } from '@docusaurus/theme-common/internal'; export const RenderIf = ({children, condition}) => ( <> {condition && children} ); Quickwit compiles to a single binary and we provide different methods to install it: - Linux/MacOS binaries that you can [download manually](#download) or with the [install script](#install-script) - [Docker image](#use-the-docker-image) - [Helm chart](../deployment/kubernetes/helm.md) - [Glasskube](../deployment/kubernetes/glasskube.md) ## Prerequisites Quickwit is officially only supported for Linux. Freebsd and MacOS are not officially supported, but should work as well. Quickwit supplies binaries for x86-64 and aarch64. No special instruction set is required, but on x86-64 SSE3 is recommended. Support of aarch64 is currently experimental. 
## Download Version: nightly - License: [Apache 2.0](https://github.com/quickwit-oss/quickwit/blob/main/LICENSE) - Downloads `.tar.gz`: - [Linux ARM64](https://github.com/quickwit-oss/quickwit/releases/download/nightly/quickwit-nightly-aarch64-unknown-linux-gnu.tar.gz) - [Linux x86_64](https://github.com/quickwit-oss/quickwit/releases/download/nightly/quickwit-nightly-x86_64-unknown-linux-gnu.tar.gz) - [macOS aarch64](https://github.com/quickwit-oss/quickwit/releases/download/nightly/quickwit-nightly-aarch64-apple-darwin.tar.gz) - [macOS x86_64](https://github.com/quickwit-oss/quickwit/releases/download/nightly/quickwit-nightly-x86_64-apple-darwin.tar.gz) version: 0.8.1 - [Release notes](https://github.com/quickwit-oss/quickwit/releases/tag/v0.8.1) - [Changelog](https://github.com/quickwit-oss/quickwit/blob/main/CHANGELOG.md) License: [Apache 2.0](https://github.com/quickwit-oss/quickwit/blob/main/LICENSE) Downloads `.tar.gz`: - [Linux ARM64](https://github.com/quickwit-oss/quickwit/releases/download/v0.8.1/quickwit-v0.8.1-aarch64-unknown-linux-gnu.tar.gz) - [Linux x86_64](https://github.com/quickwit-oss/quickwit/releases/download/v0.8.1/quickwit-v0.8.1-x86_64-unknown-linux-gnu.tar.gz) - [macOS aarch64](https://github.com/quickwit-oss/quickwit/releases/download/v0.8.1/quickwit-v0.8.1-aarch64-apple-darwin.tar.gz) - [macOS x86_64](https://github.com/quickwit-oss/quickwit/releases/download/v0.8.1/quickwit-v0.8.1-x86_64-apple-darwin.tar.gz) Check out the available builds in greater detail on [GitHub](https://github.com/quickwit-oss/quickwit/releases) ### Note on external dependencies Quickwit depends on the following external libraries to work correctly: - `libssl`: the industry defacto cryptography library. These libraries can be installed on your system using the native package manager. 
You can install these dependencies using the following command: ```bash apt-get -y update && apt-get -y install libssl ``` ```bash yum -y update && yum -y install openssl ``` ```bash pacman -S openssl ``` Additionally it requires a few more dependencies to compile it. These dependencies are not required on production system: - `clang`: used to compile some dependencies. - `protobuf-compiler`: used to compile protobuf definitions. - `libssl-dev`: headers for libssl. - `pkg-config`: used to locate libssl. - `cmake`: used to build librdkafka, for kafka support. These dependencies can be installed on your system using the native package manager. You can install these dependencies using the following command: ```bash apt install -y clang protobuf-compiler libssl-dev pkg-config cmake ``` ```bash yum -y update && yum -y install clang openssl-devel pkgconfig cmake3 # amazonlinux only has protobuf-compiler 2.5, we need something much more up to date. wget https://github.com/protocolbuffers/protobuf/releases/download/v21.9/protoc-21.9-linux-x86_64.zip sudo unzip protoc-21.9-linux-x86_64.zip -d /usr/local # amazonlinux use cmake2 as cmake, we need cmake3 ln -s /usr/bin/cmake3 /usr/bin/cmake ``` ```bash pacman -S clang protobuf openssl pkg-config cmake make ``` ## Install script To easily install Quickwit on your machine, just run the command below from your preferred shell. The script detects the architecture and then downloads the correct binary archive for the machine. ```bash curl -L https://install.quickwit.io | sh ``` All this script does is download the correct binary archive for your machine and extracts it in the current working directory. This means you can download any desired archive from [github](https://github.com/quickwit-oss/quickwit/releases) that matches your OS architecture and manually extract it anywhere. 
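As a rough illustration of what picking the correct archive involves, the Python sketch below maps an OS and CPU architecture to the archive names listed in the Download section. The function name `archive_url` and the lookup table are hypothetical. This is not the install script's actual logic, only the naming pattern visible in the release links.

```python
import platform

# Target triples taken from the archive names listed in the Download section.
# The (system, machine) mapping is an assumption made for illustration.
TARGETS = {
    ("Linux", "x86_64"): "x86_64-unknown-linux-gnu",
    ("Linux", "aarch64"): "aarch64-unknown-linux-gnu",
    ("Darwin", "x86_64"): "x86_64-apple-darwin",
    ("Darwin", "arm64"): "aarch64-apple-darwin",
}

RELEASES = "https://github.com/quickwit-oss/quickwit/releases/download"


def archive_url(version, system=None, machine=None):
    """Build the release archive URL for a tag such as "v0.8.1" or "nightly"."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        target = TARGETS[(system, machine)]
    except KeyError:
        raise ValueError(f"no prebuilt archive for {system}/{machine}") from None
    return f"{RELEASES}/{version}/quickwit-{version}-{target}.tar.gz"


print(archive_url("v0.8.1", "Linux", "x86_64"))
```

Calling `archive_url("v0.8.1")` without the last two arguments falls back to the values reported by the current machine.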
Once installed or extracted, all of Quickwit's installation files can be found in a directory named `quickwit-{version}` where `version` is the corresponding version of Quickwit. This directory has the following layout: ```bash quickwit-{version} ├── config │ └── quickwit.yaml ├── LICENSE ├── quickwit └── qwdata ``` - `config/quickwit.yaml`: is the default configuration file. - `LICENSE`: the license file. - `quickwit`: the quickwit executable binary. - `qwdata/`: the default data directory. ## Use the Docker image If you use Docker, this might be one of the quickest way to get going. The following command will pull the image from [Docker Hub](https://hub.docker.com/r/quickwit/quickwit) and start a container ready to execute Quickwit commands. ```bash docker run --rm quickwit/quickwit --version # If you are using Apple silicon based macOS system you might need to specify the platform. # You can also safely ignore jemalloc warnings. docker run --rm --platform linux/amd64 quickwit/quickwit --version ``` To get the full gist of this, follow the [Quickstart guide](./quickstart.md). ================================================ FILE: docs/get-started/query-language-intro.md ================================================ --- title: Introduction to Quickwit's query language sidebar_position: 3 --- Quickwit allows you to search on your indexed documents using a simple query language. Here's a quick overview. ## Clauses The main concept of this language is a clause, which represents a simple condition that can be tested against documents. ### Querying fields A clause operates on fields of your document. It has the following syntax : ``` field:condition ``` For example, when searching documents where the field `app_name` contains the token `tantivy`, you would write the following clause: ``` app_name:tantivy ``` In many cases the field name can be omitted, quickwit will then use the `default_search_fields` configured for the index. 
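To make the `field:condition` shape concrete, here is a small Python sketch that assembles term clauses as plain strings. The helper name `term_clause` is hypothetical and not part of any Quickwit client. It only mirrors the syntax described above.

```python
def term_clause(token, field=None):
    """Return a term clause; without a field, the index's default search fields apply."""
    return f"{field}:{token}" if field else token


print(term_clause("tantivy", field="app_name"))  # app_name:tantivy
print(term_clause("tantivy"))                    # tantivy
```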
### Clauses Cheat Sheet

Quickwit supports various types of clauses to express different kinds of conditions. Here's a quick overview of them:

| type | syntax | examples | description | `default_search_fields` |
|------|--------|----------|-------------|-------------------------|
| term | `field:token` | `app_name:tantivy`<br/>`process_id:1234`<br/>`word` | A term clause tests the existence of a value in the field's tokens | yes |
| term prefix | `field:prefix*` | `app_name:tant*`<br/>`quick*` | A term prefix clause tests the existence of a token starting with the provided value | yes |
| term set | `field:IN [token token ..]` | `severity:IN [error warn]` | A term set clause tests the existence of any of the provided values in the field's tokens | yes |
| phrase | `field:"sequence of tokens"` | `full_name:"john doe"` | A phrase clause tests the existence of the provided sequence of tokens | yes |
| phrase prefix | `field:"sequence of tokens"*` | `title:"how to m"*` | A phrase prefix clause tests the existence of a sequence of tokens, the last one being used as in a prefix clause | yes |
| all | `*` | `*` | A match-all clause matches every document | no |
| exist | `field:*` | `error:*` | An exist clause tests the existence of any value for the field; it matches only if the field exists | no |
| range | `field:bounds` | `duration:[0 TO 1000}`<br/>`last_name:[banner TO miller]` | A range clause tests the existence of a token between the provided bounds | no |

## Queries

### Combining queries

Clauses can be combined using the boolean operators `AND` and `OR` to create more complex search expressions.

An `AND` query will match only if the conditions on both sides of the operator are met.

```
type:rose AND color:red
```

An `OR` query will match if either or both conditions on each side of the operator are met.

```
weekday:6 OR weekday:7
```

If no operator is provided, `AND` is implicitly assumed.

```
type:violet color:blue
```

### Grouping queries

You can build complex expressions by grouping clauses using parentheses.

```
(type:rose AND color:red) OR (type:violet AND color:blue)
```

When no parentheses are used, `AND` takes precedence over `OR`, meaning that the following query is equivalent to the one above.

```
type:rose AND color:red OR type:violet AND color:blue
```

### Negating queries

An expression can be negated either with the operator `NOT` or by prefixing the query with a dash `-`.

`NOT` and `-` take precedence over everything, such that `-a AND b` means `(-a) AND b`, not `-(a AND b)`.

```
NOT severity:debug
```

or

```
type:proposal -(status:rejected OR status:pending)
```

## Dive deeper

If you want to know more about the query language, head to the [Query Language Reference](../reference/query-language.md).

================================================
FILE: docs/get-started/quickstart.md
================================================
---
title: Quickstart
sidebar_position: 1
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

In this quick start guide, we will install Quickwit, create an index, add documents and finally execute search queries. All the Quickwit commands used in this guide are documented [in the CLI reference documentation](/docs/reference/cli.md).
## Install Quickwit using Quickwit installer The Quickwit installer automatically picks the correct binary archive for your environment and then downloads and unpacks it in your working directory. This method works only for [some OS/architectures](installation.md#download), and you will also need to install some [external dependencies](installation.md#note-on-external-dependencies). ```bash curl -L https://install.quickwit.io | sh ``` ```bash cd ./quickwit-v*/ ./quickwit --version ``` You can now move this executable directory wherever sensible for your environment and possibly add it to your `PATH` environment. ## Use Quickwit's Docker image You can also pull and run the Quickwit binary in an isolated Docker container. ```bash # Create first the data directory. mkdir qwdata docker run --rm quickwit/quickwit --version ``` If you are using Apple silicon based macOS system you might need to specify the platform. You can also safely ignore jemalloc warnings. ```bash docker run --rm --platform linux/amd64 quickwit/quickwit --version ``` ## Start Quickwit server ```bash ./quickwit run ``` ```bash docker run --rm -v $(pwd)/qwdata:/quickwit/qwdata -p 127.0.0.1:7280:7280 quickwit/quickwit run ``` Tips: you can use the environment variable `RUST_LOG` to control quickwit verbosity. Check it's working by browsing the [UI at http://localhost:7280](http://localhost:7280) or do a simple GET with cURL: ```bash curl http://localhost:7280/api/v1/version ``` ## Create your first index Before adding documents to Quickwit, you need to create an index configured with a YAML config file. This config file notably lets you define how to map your input documents to your index fields and whether these fields should be stored and indexed. See the [index config documentation](/docs/configuration/index-config.md). Let's create an index configured to receive Stackoverflow posts (questions and answers). ```bash # First, download the stackoverflow dataset config from Quickwit repository. 
curl -o stackoverflow-index-config.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/stackoverflow/index-config.yaml
```

The index config defines three fields: `title`, `body` and `creationDate`. `title` and `body` are [indexed and tokenized](../configuration/index-config.md#text-type), and they are also used as default search fields, which means they will be used for search if you do not target a specific field in your query. `creationDate` serves as the timestamp for each record. No other explicit field definitions are needed because we rely on the default dynamic [mode](/docs/configuration/index-config.md#mode): undeclared fields are still indexed, with fast fields enabled by default to support aggregation queries and the `raw` tokenizer used for text.

And here is the complete config:

```yaml title="stackoverflow-index-config.yaml"
#
# Index config file for stackoverflow dataset.
#
version: 0.7

index_id: stackoverflow

doc_mapping:
  field_mappings:
    - name: title
      type: text
      tokenizer: default
      record: position
      stored: true
    - name: body
      type: text
      tokenizer: default
      record: position
      stored: true
    - name: creationDate
      type: datetime
      fast: true
      input_formats:
        - rfc3339
      fast_precision: seconds
  timestamp_field: creationDate

search_settings:
  default_search_fields: [title, body]

indexing_settings:
  commit_timeout_secs: 30
```

Now we can create the index with the CLI:

```bash
./quickwit index create --index-config ./stackoverflow-index-config.yaml
```

Or with the REST API:

```bash
curl -XPOST http://127.0.0.1:7280/api/v1/indexes --header "content-type: application/yaml" --data-binary @./stackoverflow-index-config.yaml
```

Check that a directory `./qwdata/indexes/stackoverflow` has been created. Quickwit will write the index files there, along with a `metastore.json` file that contains the [index metadata](../overview/architecture.md#index). You're now ready to fill the index.
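If you prefer to drive the REST API from a script, the cURL call above can be sketched in Python with the standard library only. The function names `build_create_index_request` and `create_index` are hypothetical helpers written for this guide, and the endpoint and content type are simply the ones used in the cURL command.

```python
import urllib.request


def build_create_index_request(config_yaml, base_url="http://127.0.0.1:7280"):
    """Prepare (but do not send) the POST request that creates an index."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/indexes",
        data=config_yaml.encode("utf-8"),
        headers={"Content-Type": "application/yaml"},
        method="POST",
    )


def create_index(config_path, base_url="http://127.0.0.1:7280"):
    """Send the request and return the HTTP status; needs a running Quickwit server."""
    with open(config_path, encoding="utf-8") as f:
        request = build_create_index_request(f.read(), base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Placeholder payload, only used to show the shape of the request;
# a real call would pass the full index config file.
request = build_create_index_request("version: 0.7\nindex_id: stackoverflow\n")
print(request.get_method(), request.full_url)
```

With a server running locally, `create_index("./stackoverflow-index-config.yaml")` would send the same request as the cURL command.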
## Let's add some documents

Quickwit can index data from many [sources](/docs/configuration/source-config.md). We will use a newline-delimited JSON ([ndjson](http://ndjson.org/)) dataset as our data source.
Let's download [a bunch of Stackoverflow posts (10 000)](https://quickwit-datasets-public.s3.amazonaws.com/stackoverflow.posts.transformed-10000.json) in [ndjson](http://ndjson.org/) format and index them.

```bash
# Download the first 10_000 Stackoverflow posts.
curl -O https://quickwit-datasets-public.s3.amazonaws.com/stackoverflow.posts.transformed-10000.json
```

With the CLI:

```bash
# Index our 10k documents.
./quickwit index ingest --index stackoverflow --input-path stackoverflow.posts.transformed-10000.json --force
```

Or with the REST API:

```bash
# Index our 10k documents.
curl -XPOST "http://127.0.0.1:7280/api/v1/stackoverflow/ingest?commit=force" --data-binary @stackoverflow.posts.transformed-10000.json
```

As soon as the ingest command finishes, you can start querying data with the following `search` command:

```bash
./quickwit index search --index stackoverflow --query "search AND engine"
```

Or with the REST API:

```bash
curl "http://127.0.0.1:7280/api/v1/stackoverflow/search?query=search+AND+engine"
```

It should return 10 hits. Now you're ready to play with the search API.

## Execute search queries

Let's start with a query on the field `title`: `title:search AND engine`:

```bash
curl "http://127.0.0.1:7280/api/v1/stackoverflow/search?query=title:search+AND+engine"
```

The same request can be expressed as a JSON query:

```bash
curl -XPOST "http://localhost:7280/api/v1/stackoverflow/search" -H 'Content-Type: application/json' -d '{
    "query": "title:search AND engine"
}'
```

This format is more verbose, but it allows you to use more advanced features such as aggregations.
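The GET and POST forms above can also be prepared from Python. This is a minimal sketch using only the standard library. The helper names `search_url` and `search_body` are hypothetical, and `BASE_URL` assumes the local server address used throughout this guide. The GET helper percent-encodes the query, which avoids malformed query strings.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:7280/api/v1"  # assumed local server address


def search_url(index_id, query):
    """GET form: percent-encode the query so ':' and spaces stay valid in a URL."""
    return f"{BASE_URL}/{index_id}/search?{urlencode({'query': query})}"


def search_body(query, max_hits=None):
    """POST form: the same query expressed as a JSON document."""
    body = {"query": query}
    if max_hits is not None:
        body["max_hits"] = max_hits
    return json.dumps(body)


print(search_url("stackoverflow", "title:search AND engine"))
print(search_body("title:search AND engine"))
```

The first line printed is a URL that can be passed to cURL as is, and the second is the payload for the `-d` option of the POST request.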
The following query finds most popular tags used on the questions in this dataset: ```bash curl -XPOST "http://localhost:7280/api/v1/stackoverflow/search" -H 'Content-Type: application/json' -d '{ "query": "type:question", "max_hits": 0, "aggs": { "foo": { "terms":{ "field":"tags", "size": 10 } } } }' ``` As you are experimenting with different queries check out the server logs to see what's happening. :::note Don't forget to encode correctly the query params to avoid bad request (status 400). ::: ## Clean Let's do some cleanup by deleting the index: ```bash ./quickwit index delete --index stackoverflow ``` ```bash curl -XDELETE http://127.0.0.1:7280/api/v1/indexes/stackoverflow ``` Congrats! You can level up with the following tutorials to discover all Quickwit features. ## TLDR Run the following command from within Quickwit's installation directory. ```bash curl -o stackoverflow-index-config.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/stackoverflow/index-config.yaml ./quickwit index create --index-config ./stackoverflow-index-config.yaml curl -O https://quickwit-datasets-public.s3.amazonaws.com/stackoverflow.posts.transformed-10000.json ./quickwit index ingest --index stackoverflow --input-path ./stackoverflow.posts.transformed-10000.json --force ./quickwit index search --index stackoverflow --query "search AND engine" ./quickwit index delete --index stackoverflow ``` ## Next tutorials - [Search on logs with timestamp pruning](/docs/get-started/tutorials/tutorial-hdfs-logs) - [Setup a distributed search on AWS S3](/docs/get-started/tutorials/tutorial-hdfs-logs-distributed-search-aws-s3) ================================================ FILE: docs/get-started/tutorials/_category_.yaml ================================================ label: 'Tutorials' position: 2 collapsed: false ================================================ FILE: docs/get-started/tutorials/prometheus-metrics.md 
================================================ --- title: Metrics with Grafana and Prometheus description: A simple tutorial to display Quickwit metrics with Grafana. icon_url: /img/tutorials/quickwit-logo.png tags: [grafana, prometheus, integration] sidebar_position: 2 --- In this tutorial, you will learn how to set up Grafana to display Quickwit metrics using Prometheus. Grafana will visualize the metrics collected from Quickwit, allowing you to monitor its performance effectively. ## Step 1: Create a Docker Compose File First, create a `docker-compose.yml` file in your project directory. This file will configure and run Quickwit, Prometheus, and Grafana as Docker services. Here’s the complete Docker Compose configuration: ```yaml services: quickwit: image: quickwit/quickwit environment: QW_ENABLE_OPENTELEMETRY_OTLP_EXPORTER: "true" OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:7281" ports: - 7280:7280 command: ["run"] grafana: image: grafana/grafana-oss container_name: grafana ports: - "${MAP_HOST_GRAFANA:-127.0.0.1}:3000:3000" environment: GF_INSTALL_PLUGINS: https://github.com/quickwit-oss/quickwit-datasource/releases/download/v0.4.6/quickwit-quickwit-datasource-0.4.6.zip;quickwit-quickwit-datasource GF_AUTH_DISABLE_LOGIN_FORM: "true" GF_AUTH_ANONYMOUS_ENABLED: "true" GF_AUTH_ANONYMOUS_ORG_ROLE: Admin prometheus: image: prom/prometheus:latest container_name: prometheus volumes: - ./prometheus.yml:/etc/prometheus/prometheus.yml # Ensure prometheus.yml exists in the same directory ports: - 9090:9090 ``` ### Explanation of Services - **Quickwit**: Runs the Quickwit service on port `7280`. - **Grafana**: Queries and displays data from Prometheus. - **Prometheus**: Collects metrics from Quickwit using the `/metrics` endpoint. ## Step 2: Configure Prometheus Prometheus needs a configuration file to define how it scrapes metrics from Quickwit. 
Create a file named `prometheus.yml` in the same directory as your Docker Compose file with the following content: ```yaml global: scrape_interval: 1s scrape_timeout: 1s scrape_configs: - job_name: quickwit metrics_path: /metrics static_configs: - targets: - quickwit:7280 ``` ## Step 3: Start the Services Run the following command in your terminal to start all services defined in the Docker Compose file: ```bash docker compose up ``` This will launch Quickwit, Prometheus, and Grafana services. ## Step 4: Configure Grafana to Use Prometheus 1. Open Grafana in your browser at `http://localhost:3000`. 2. Navigate to **Configuration** > **Data Sources**. 3. Click **Add Data Source**, select **Prometheus**, and set the URL to `http://prometheus:9090`. 4. Click **Save & Test** to verify the connection. ## Step 5: Create or Use Pre-Configured Dashboards Now that Grafana is set up with Prometheus as a data source, you can create custom dashboards or use Quickwit's pre-configured dashboards: 1. Go to the **Dashboards** section in Grafana. 2. Import or create a new dashboard to visualize metrics. 3. Alternatively, use one of Quickwit’s [pre-configured dashboards](../../operating/monitoring). ================================================ FILE: docs/get-started/tutorials/trace-analytics-with-grafana.md ================================================ --- title: Logs and Traces with Grafana description: A simple tutorial to use Grafana with Quickwit's datasource plugin. icon_url: /img/tutorials/quickwit-logo.png tags: [grafana, integration] sidebar_position: 2 --- In this tutorial, we will set up a Grafana Dashboard showing Quickwit traces using Docker Compose. You only need a few minutes to get Grafana working with Quickwit and build meaningful dashboards. ## Create a Docker Compose recipe First, create a `docker-compose.yml` file. This file will define the services needed to run Quickwit with OpenTelemetry and Grafana with the Quickwit Datasource plugin. 
Below is the complete Docker Compose configuration: ```yaml version: '3.0' services: quickwit: image: quickwit/quickwit environment: QW_ENABLE_OPENTELEMETRY_OTLP_EXPORTER: "true" OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:7281" ports: - 7280:7280 command: ["run"] grafana: image: grafana/grafana-oss container_name: grafana ports: - "${MAP_HOST_GRAFANA:-127.0.0.1}:3000:3000" environment: GF_INSTALL_PLUGINS: https://github.com/quickwit-oss/quickwit-datasource/releases/download/v0.4.6/quickwit-quickwit-datasource-0.4.6.zip;quickwit-quickwit-datasource GF_AUTH_DISABLE_LOGIN_FORM: "true" GF_AUTH_ANONYMOUS_ENABLED: "true" GF_AUTH_ANONYMOUS_ORG_ROLE: Admin ``` The default Grafana port is 3000. If this port is already taken, you can modify the port mapping, for example, changing 3000:3000 to 3100:3000 or any other available port. Save and run the recipe: ```bash $ docker compose up ``` You should be able to access Quickwit's UI on `http://localhost:7280/` and Grafana's UI on `http://localhost:3000/`. ## Setting up the datasource In Grafana, head to [Data Sources](http://localhost:3000/connections/datasources). If the plugin is installed correctly you should be able to find Quickwit in the list. We're going to set up a new Quickwit data source looking at Quickwit's own OpenTelemetry traces, let's configure the datasource with the following parameters: - URL : `http://quickwit:7280/api/v1` _This uses the docker service name as the host_ - Index ID : `otel-traces-v0_7` Save and test, you should obtain a confirmation that the datasource is correctly set up. 
![Quickwit Plugin configuration success](../../assets/images/grafana-ui-quickwit-datasource-plugin-success.png)

You can also set up a second Quickwit data source that reads Quickwit's own OpenTelemetry logs (or your own logs index). Configure it with the following parameters:

- URL: `http://quickwit:7280/api/v1` _This uses the docker service name as the host_
- Index ID: `otel-logs-v0_7`

## Creating a dashboard

You can then [create a new dashboard](http://localhost:3000/dashboard/new) and add a visualization: you should be able to choose the traces Quickwit datasource here.

Quickwit sends itself its own traces, so you should already have data to display. Let's configure some panels!

- a Table counting span_names
  - **Panel type**: Table
  - **Query**: _empty_
  - **Metric**: Count
  - **Group by**: Terms : `span_name` : order by Count
- a chart showing the number of tantivy searches per hour:
  - **Panel type**: Time Series
  - **Query**: "span_name:tantivy_search"
  - **Metric**: Count
  - **Group by**: Date Histogram : `span_start_timestamp_nanos` : Interval 1h
- a Bar Chart showing the number of ERROR logs per hour for the last 6 hours:
  - **Panel type**: Bar Chart
  - **Query**: "service_name:quickwit AND events.event_attributes.level:ERROR"
  - **Metric**: Count
  - **Group by**: Date Histogram : `span_start_timestamp_nanos` : Interval 1h
- another query on the same Bar Chart for WARN logs

## The result

Here's what your first dashboard can look like:

![Quickwit Panel in Grafana Dashboard](../../assets/images/screenshot-grafana-tutorial-dashboard.png)

================================================
FILE: docs/get-started/tutorials/tutorial-hdfs-logs-distributed-search-aws-s3.md
================================================
---
title: Distributed search on AWS S3
description: Index log entries on AWS S3 using an EC2 instance and launch a distributed cluster.
tags: [aws, integration] icon_url: /img/tutorials/aws-logo.png sidebar_position: 6 --- In this guide, we will index about 40 million log entries (13 GB decompressed) on AWS S3 using an EC2 instance and launch a three-node distributed search cluster. Example of a log entry: ```json { "timestamp": 1460530013, "severity_text": "INFO", "body": "PacketResponder: BP-108841162-10.10.34.11-1440074360971:blk_1074072698_331874, type=HAS_DOWNSTREAM_IN_PIPELINE terminating", "resource": { "service": "datanode/01" }, "attributes": { "class": "org.apache.hadoop.hdfs.server.datanode.DataNode" } } ``` :::caution Before using Quickwit with an object storage, check out our [advice](../../operating/aws-costs) for deploying on AWS S3 to avoid some bad surprises at the end of the month. ::: First of all, let's create an EC2 instance, install a Quickwit binary, and [configure it](../../guides/aws-setup) to let Quickwit access your S3 buckets. This instance will be used for indexing our dataset (note that you can also index your dataset from your local machine if it has the rights to read/write on AWS S3). ## Install ```bash curl -L https://install.quickwit.io | sh cd quickwit-v*/ ``` ## Configure Quickwit with S3 Let's define the S3 path where we want to store our indexes. ```bash export S3_PATH=s3://{path/to/bucket}/indexes ``` :::note You'll want to include the necessary authorization for the given bucket, this can be done by setting the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, or via the AWS credentials file. Usually located at `~/.aws/credentials`. For more info check out [our AWS setup guide](https://quickwit.io/docs/guides/aws-setup) ::: Now we can create a Quickwit config file. ```bash # Create Quickwit config file. 
echo "version: 0.7 node_id: searcher-1 listen_address: 0.0.0.0 metastore_uri: ${S3_PATH} default_index_root_uri: ${S3_PATH} " > config.yaml ``` > You can also pass environment variables directly: > ```yaml > # config.yaml > node_id: searcher-1 > listen_address: 0.0.0.0 > version: 0.7 > metastore_uri: ${S3_PATH} > default_index_root_uri: ${S3_PATH} >``` We are now ready to start Quickwit. ```bash ./quickwit run --config config.yaml ``` ## Create your index ```bash # First, download the hdfs logs config from Quickwit repository. curl -o hdfs_logs_index_config.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/hdfs-logs/index-config.yaml ``` The index config defines five fields: `timestamp`, `tenant_id`, `severity_text`, `body`, and one JSON field for the nested values `resource.service`, we could use an object field here and maintain a fixed schema, but for convenience we're going to use a JSON field. It also sets the `default_search_fields`, the `tag_fields`, and the `timestamp_field`. The `timestamp_field` and `tag_fields` are used by Quickwit for [splits pruning](../../overview/architecture) at query time to boost search speed. Check out the [index config docs](../../configuration/index-config) for more details. ```yaml title="hdfs_logs_index_config.yaml" version: 0.7 index_id: hdfs-logs doc_mapping: field_mappings: - name: timestamp type: datetime input_formats: - unix_timestamp output_format: unix_timestamp_secs fast_precision: seconds fast: true - name: tenant_id type: u64 - name: severity_text type: text tokenizer: raw - name: body type: text tokenizer: default record: position - name: resource type: json tokenizer: raw tag_fields: [tenant_id] timestamp_field: timestamp search_settings: default_search_fields: [severity_text, body] ``` We can now create the index with the `create` subcommand. 
```bash ./quickwit index create --index-config hdfs_logs_index_config.yaml ``` :::note This step can also be executed on your local machine. The `create` command creates the index locally and then uploads a json file `metastore.json` to your bucket at `s3://path-to-your-bucket/hdfs-logs/metastore.json`. ::: ## Index logs The dataset is a compressed [NDJSON file](https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants.json.gz). Instead of downloading and indexing the data in separate steps, we will use pipes to send a decompressed stream to Quickwit directly. ```bash wget https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants.json.gz gunzip -c hdfs-logs-multitenants.json.gz | ./quickwit index ingest --index hdfs-logs ``` :::note 8GB of RAM is enough to index this dataset; an instance like `t4g.large` with 8GB and 2 vCPU indexed this dataset in less than 10 minutes (provided that you have some CPU credits). This step can also be done on your local machine. The `ingest` subcommand generates locally [splits](../../overview/architecture) of 10 million documents and will upload them on your bucket. Concretely, each split is a bundle of index files and metadata files. ::: You can check it's working by using `search` subcommand and look for `ERROR` in `severity_text` field: ```bash ./quickwit index search --index hdfs-logs --query "severity_text:ERROR" ``` which returns the json ```json { "num_hits": 345, "hits": [ { "attributes": { "class": "org.apache.hadoop.hdfs.server.datanode.DataNode" }, "body": "RECEIVED SIGNAL 15: SIGTERM", "resource": { "service": "datanode/16" }, "severity_text": "ERROR", "tenant_id": 51, "timestamp": 1469687755 }, ... ], "elapsed_time_micros": 522542 } ``` You can see that this query has 345 hits. In this case for the first run, the server responded in 523 milliseconds. Subsequent runs use the cached metastore and can be resolved in under 100 milliseconds. 
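
The 523 ms figure quoted above is simply `elapsed_time_micros` rounded to the nearest millisecond. As a minimal sketch of that conversion in plain shell arithmetic (the helper name `micros_to_ms` is hypothetical and is not part of the Quickwit CLI):

```bash
# Hypothetical helper (not part of Quickwit): convert a duration in
# microseconds to milliseconds, rounding to the nearest millisecond.
micros_to_ms() {
  echo $(( ($1 + 500) / 1000 ))
}

# Value taken from the sample search response above.
micros_to_ms 522542
```

With the sample value `522542`, this prints `523`. The same helper can be applied to the `elapsed_time_micros` field of any search response when comparing a cold first run against later, cached runs.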
Now that we have indexed the logs and can search from one instance, it's time to configure and start two other instances to form a cluster. ## Start two more instances Quickwit needs a port `rest.listen_port` for serving the HTTP rest API via TCP as well as maintaining the cluster formation via UDP. Also, it needs `{rest.listen_port} + 1` for gRPC communication between instances. In AWS, you can create a security group to group these inbound rules. Check out the [network section](../../guides/aws-setup) of our AWS setup guide. To make things easier, let's create a security group that opens the TCP/UDP port range [7200-7300]. Next, create three EC2 instances using the previously created security group. Take note of each instance's public IP address. Now ssh into the first EC2 instance, install Quickwit, and [configure the environment](../../guides/aws-setup) to let Quickwit access the index S3 buckets. Let's install Quickwit on the second and third EC2 instances. ```bash curl -L https://install.quickwit.io | sh cd quickwit-v*/ ``` And configure the environment so instances can form a cluster: ```bash export S3_PATH=s3://{path/to/bucket}/indexes export IP_NODE_1={first-ec2-instance-public-ip} ``` ```bash # configuration for our second node echo "version: 0.7 node_id: searcher-2 metastore_uri: ${S3_PATH} default_index_root_uri: ${S3_PATH} listen_address: 0.0.0.0 peer_seeds: - ${IP_NODE_1} # searcher-1 " > config.yaml # Start a Quickwit searcher. ./quickwit run --service searcher --config config.yaml ``` ```bash # configuration for our third node echo "version: 0.7 node_id: searcher-3 listen_address: 0.0.0.0 peer_seeds: - ${IP_NODE_1} # searcher-1 metastore_uri: ${S3_PATH} default_index_root_uri: ${S3_PATH} " > config.yaml # Start a Quickwit searcher. ./quickwit run --service searcher --config config.yaml ``` You will see in the terminal the confirmation that the instance has joined the existing cluster. 
Example of such a log:

```
2023-03-19T16:44:56.918Z INFO quickwit_cluster::cluster: Joining cluster. cluster_id=quickwit-default-cluster node_id=searcher-2 enabled_services={Searcher} gossip_listen_addr=0.0.0.0:7280 gossip_advertise_addr=172.31.30.168:7280 grpc_advertise_addr=172.31.30.168:7281 peer_seed_addrs=172.31.91.203:7280
```

Now we can query any of our instances directly by issuing HTTP requests to that node's REST API endpoint.

```
curl -v "http://0.0.0.0:7280/api/v1/hdfs-logs/search?query=severity_text:ERROR"
```

Check out the logs of all instances and you will see that all nodes are working.

## Load balancing incoming requests

Now that you have a search cluster, ideally, you will want to load balance external requests. This can quickly be done by adding an AWS load balancer to listen to incoming HTTP or HTTPS traffic and forward it to a target group.
You can now play with your cluster, kill processes randomly, add/remove new instances, and keep calm.

## Clean

Let's do some cleanup by deleting the index:

```bash
./quickwit index delete --index hdfs-logs
```

Also remember to remove the security group to protect your EC2 instances. You can just remove the instances if you don't need them.

Congrats! You finished this tutorial!

To continue your Quickwit journey, check out the [search REST API reference](/docs/reference/rest-api) or the [query language reference](/docs/reference/query-language).

================================================
FILE: docs/get-started/tutorials/tutorial-hdfs-logs.md
================================================
---
title: Index a logging dataset locally
description: Index log entries on a local machine.
tags: [self-hosted, setup]
icon_url: /img/quickwit-icon.svg
sidebar_position: 3
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

In this guide, we will index about 20 million log entries (7 GB decompressed) on a local machine.
If you want to start a server with indexes on AWS S3 with several search nodes, check out the [tutorial for distributed search](tutorial-hdfs-logs-distributed-search-aws-s3.md). Here is an example of a log entry: ```json { "timestamp": 1460530013, "severity_text": "INFO", "body": "PacketResponder: BP-108841162-10.10.34.11-1440074360971:blk_1074072698_331874, type=HAS_DOWNSTREAM_IN_PIPELINE terminating", "resource": { "service": "datanode/01" }, "attributes": { "class": "org.apache.hadoop.hdfs.server.datanode.DataNode" }, "tenant_id": 58 } ``` ## Install Let's download and install Quickwit. ```bash curl -L https://install.quickwit.io | sh cd quickwit-v*/ ``` Or pull and run the Quickwit binary in an isolated Docker container. ```bash docker run quickwit/quickwit --version ``` ## Start a Quickwit server ```bash ./quickwit run ``` ```bash docker run --rm -v $(pwd)/qwdata:/quickwit/qwdata -p 127.0.0.1:7280:7280 quickwit/quickwit run ``` You may need to specify the platform if you are using Apple silicon based macOS system with the `--platform linux/amd64` flag. You can also safely ignore jemalloc warnings. ## Create your index Let's create an index configured to receive these logs. ```bash # First, download the hdfs logs config from Quickwit repository. curl -o hdfs_logs_index_config.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/hdfs-logs/index-config.yaml ``` The index config defines five fields: `timestamp`, `tenant_id`, `severity_text`, `body`, and one JSON field for the nested values `resource.service`, we could use an object field here and maintain a fixed schema, but for convenience we're going to use a JSON field. It also sets the `default_search_fields`, the `tag_fields`, and the `timestamp_field`. The `timestamp_field` and `tag_fields` are used by Quickwit for [splits pruning](../../overview/concepts/querying.md#time-sharding) at query time to boost search speed. 
Check out the [index config docs](../../configuration/index-config) for more details. ```yaml title="hdfs-logs-index.yaml" version: 0.7 index_id: hdfs-logs doc_mapping: field_mappings: - name: timestamp type: datetime input_formats: - unix_timestamp output_format: unix_timestamp_secs fast_precision: seconds fast: true - name: tenant_id type: u64 - name: severity_text type: text tokenizer: raw - name: body type: text tokenizer: default record: position - name: resource type: json tokenizer: raw tag_fields: [tenant_id] timestamp_field: timestamp search_settings: default_search_fields: [severity_text, body] ``` Now let's create the index with the `create` subcommand (assuming you are inside Quickwit install directory): ```bash ./quickwit index create --index-config hdfs_logs_index_config.yaml ``` ```bash curl -XPOST http://localhost:7280/api/v1/indexes -H "content-type: application/yaml" --data-binary @hdfs_logs_index_config.yaml ``` You're now ready to fill the index. ## Index logs The dataset is a compressed [NDJSON file](https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants.json.gz). Instead of downloading it and then indexing the data, we will use pipes to directly send a decompressed stream to Quickwit. This can take up to 10 minutes on a modern machine, the perfect time for a coffee break. 
```bash curl https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants.json.gz | gunzip | ./quickwit index ingest --index hdfs-logs ``` ```bash curl https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants.json.gz | gunzip | docker run -v $(pwd)/qwdata:/quickwit/qwdata -i quickwit/quickwit index ingest --index hdfs-logs ``` If you are in a hurry, use the sample dataset that contains 10 000 documents, we will use this dataset for the example queries: ```bash curl https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants-10000.json | ./quickwit index ingest --index hdfs-logs ``` On macOS or Windows: ```bash curl https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants-10000.json | docker run -v $(pwd)/qwdata:/quickwit/qwdata -i quickwit/quickwit index ingest --index hdfs-logs --endpoint http://host.docker.internal:7280 ``` On linux: ```bash curl https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants-10000.json | docker run --network=host -v $(pwd)/qwdata:/quickwit/qwdata -i quickwit/quickwit index ingest --index hdfs-logs --endpoint http://127.0.0.1:7280 ``` ```bash wget https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants-10000.json curl -XPOST http://localhost:7280/api/v1/hdfs-logs/ingest -H "content-type: application/json" --data-binary @hdfs-logs-multitenants-10000.json ``` You can check it's working by searching for `INFO` in `severity_text` field: ```bash ./quickwit index search --index hdfs-logs --query "severity_text:INFO" ``` On macOS or Windows: ```bash docker run -v $(pwd)/qwdata:/quickwit/qwdata quickwit/quickwit index search --index hdfs-logs --query "severity_text:INFO" --endpoint http://host.docker.internal:7280 ``` On linux: ```bash docker run --network=host -v $(pwd)/qwdata:/quickwit/qwdata quickwit/quickwit index search --index hdfs-logs --query "severity_text:INFO" --endpoint http://127.0.0.1:7280 ``` :::note The `ingest` subcommand generates 
[splits](../../overview/architecture) of 5 million documents. Each split is a small piece of the index, stored as a file that bundles index files and metadata files.

:::

The query returns the following JSON:

```json
{
  "num_hits": 10000,
  "hits": [
    {
      "body": "Receiving BP-108841162-10.10.34.11-1440074360971:blk_1073836032_95208 src: /10.10.34.20:60300 dest: /10.10.34.13:50010",
      "resource": {
        "service": "datanode/03"
      },
      "severity_text": "INFO",
      "tenant_id": 58,
      "timestamp": 1440670490
    }
    ...
  ],
  "elapsed_time_micros": 2490
}
```

Because the index config declares a `timestamp_field`, we can pass the `start_timestamp` and `end_timestamp` search parameters and benefit from time pruning. Behind the scenes, Quickwit will only query [splits](../../overview/architecture) that have logs in this time range.

Let's use these parameters with the following query:

```bash
curl 'http://127.0.0.1:7280/api/v1/hdfs-logs/search?query=severity_text:INFO&start_timestamp=1440670490&end_timestamp=1450670490'
```

## Clean

Let's do some cleanup by deleting the index:

```bash
./quickwit index delete --index hdfs-logs
```

```bash
curl -XDELETE http://127.0.0.1:7280/api/v1/indexes/hdfs-logs
```

Congrats! You finished this tutorial!

To continue your Quickwit journey, check out the [tutorial for distributed search](tutorial-hdfs-logs-distributed-search-aws-s3.md) or dig into the [search REST API](/docs/reference/rest-api) or [query language](/docs/reference/query-language).

================================================
FILE: docs/get-started/tutorials/tutorial-jaeger.md
================================================
---
title: Traces with Jaeger
sidebar_position: 2
---

In this quick start guide, we will set up a Quickwit instance and analyze its own traces with Jaeger using Docker Compose.

You only need a minute to get Jaeger working with the Quickwit storage backend.
## Start Quickwit and Jaeger

Let's use `docker compose` with the following configuration:

```yaml title="docker-compose.yaml"
version: "3"

services:
  quickwit:
    image: quickwit/quickwit:${QW_VERSION:-0.8.1}
    volumes:
      - ./qwdata:/quickwit/qwdata
    ports:
      - 7280:7280
    environment:
      - QW_ENABLE_OPENTELEMETRY_OTLP_EXPORTER=true
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:7281
    command: ["run"]

  jaeger-query:
    image: jaegertracing/jaeger-query:1.60
    ports:
      - 16686:16686
    environment:
      - SPAN_STORAGE_TYPE=grpc
      - GRPC_STORAGE_SERVER=quickwit:7281
      - GRPC_STORAGE_TLS=false
```

As you can see in the docker compose file, Quickwit is configured to send its own traces to itself: `OTEL_EXPORTER_OTLP_ENDPOINT` is set to `http://localhost:7281`.
On the other side, Jaeger is configured to use the gRPC storage server `quickwit:7281`.

Save and run the recipe:

```bash
$ docker compose up
```

You should be able to access Quickwit's UI on `http://localhost:7280/` and Jaeger's UI on `http://localhost:16686/`.

## Searching and view traces in Jaeger

Quickwit generates many traces, let's take a look at some of them:

- `find_traces`: generated by the "Find traces" Jaeger button.
- `get_operations`: generated by Jaeger when it is fetching the list of operations.
- `get_services`: generated by Jaeger when it is fetching the list of services.
- `ingest-spans`: generated when Quickwit receives spans on the gRPC OTLP API.
- ...

Here are the screenshots of the search and trace view:

![Jaeger search view](../../assets/images/jaeger-ui-quickwit-search-traces.png)

![Jaeger trace view](../../assets/images/jaeger-ui-quickwit-trace-view.png)

## Searching traces with Quickwit UI

You can also use the Quickwit UI at [http://localhost:7280](http://localhost:7280) to search traces.

Here are a couple of query examples:

- `service_name:quickwit AND events.event_attributes.level:INFO`
- `span_duration_millis:>100`
- `resource_attributes.service.version:v0.8.1`
- `service_name:quickwit`

That's it!
You can level up with the following tutorials to discover all Quickwit features. ## Next tutorials - [Send traces using an OTEL collector](/docs/distributed-tracing/send-traces/using-otel-collector.md) - [Send traces from a python web server](/docs/distributed-tracing/send-traces/using-otel-sdk-python.md) ================================================ FILE: docs/guides/_category_.yaml ================================================ label: 'Guides' position: 8 collapsed: true ================================================ FILE: docs/guides/aws-setup.md ================================================ --- title: AWS cluster setup sidebar_position: 3 --- Setting up a Quickwit cluster on AWS requires the configuration of three elements: - AWS credentials - AWS region - Network configuration ## AWS credentials When starting a node, Quickwit attempts to find AWS credentials using the credential provider chain implemented by [rusoto_core::ChainProvider](https://docs.rs/rusoto_credential/latest/rusoto_credential/struct.ChainProvider.html) and looks for credentials in this order: 1. Environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, or `AWS_SESSION_TOKEN` (optional). 2. Credential profiles file, typically located at `~/.aws/credentials` or otherwise specified by the `AWS_SHARED_CREDENTIALS_FILE` and `AWS_PROFILE` environment variables if set and not empty. 3. Amazon ECS container credentials, loaded from the Amazon ECS container if the environment variable `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` is set. 4. Instance profile credentials, used on Amazon EC2 instances, and delivered through the Amazon EC2 metadata service. An error is returned if no credentials are found in the chain. ## AWS region Quickwit attempts to find an AWS region in multiple locations and with the following order of precedence: 1. Environment variables (`AWS_REGION` then `AWS_DEFAULT_REGION`) 2. 
Config file, typically located at `~/.aws/config` or otherwise specified by the `AWS_CONFIG_FILE` environment variable if set and not empty. 3. Amazon EC2 instance metadata service indicating the region of the currently running Amazon EC2 instance. 4. Default value: `us-east-1` :::note AWS credentials or region resolution may take a few seconds, especially if the Amazon EC2 instance metadata service is slow or unavailable. ::: ## IAM permissions ### Amazon S3 Required authorized actions: - `ListBucket` (on the bucket directly) - `GetObject` - `PutObject` - `DeleteObject` - `ListMultipartUploadParts` - `AbortMultipartUpload` Here is an example of a bucket policy: ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::my-bucket" ] }, { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListMultipartUploadParts", "s3:AbortMultipartUpload" ], "Resource": [ "arn:aws:s3:::my-bucket/*" ] } ] } ``` You can run the following commands to verify that AWS credentials, region, and IAM permissions are properly configured for Amazon S3: ```bash MY_BUCKET= aws s3 ls $MY_BUCKET echo "Hello, World!" | aws s3 cp - $MY_BUCKET/hello aws s3 ls $MY_BUCKET/hello aws s3 cp $MY_BUCKET/hello - aws s3 rm $MY_BUCKET/hello ``` ### Amazon Kinesis - `GetRecords` - `GetShardIterator` - `ListShards` You can run the following commands to verify that AWS credentials, region, and IAM permissions are properly configured for Amazon Kinesis: ```bash MY_STREAM= # List the shards in the stream and select the first one. SHARD_ID=$( aws kinesis list-shards --stream-name $MY_STREAM \ | jq -r .Shards[0].ShardId ) # Get a shard iterator for the selected shard. SHARD_ITERATOR=$( aws kinesis get-shard-iterator --stream-name $MY_STREAM \ --shard-id $SHARD_ID \ --shard-iterator-type TRIM_HORIZON \ | jq -r .ShardIterator ) # Fetch some records from the shard and display the first one. 
aws kinesis get-records --shard-iterator $SHARD_ITERATOR | jq -r .Records[0] ``` ## Network configuration ### Security groups To communicate with each other, nodes must reside in security groups that allow inbound and outbound traffic on one UDP port and two TCP ports. Please, refer to the [ports configuration](/configuration/ports-config.md) page for more details. ## Common errors If you set the wrong credentials, you will see this error message with `Unauthorized` in your terminal: ```bash Command failed: Another error occurred. `Metastore error`. Cause: `StorageError(kind=Unauthorized, source=failed to fetch object: s3://quickwit-dev/my-hdfs/metastore.json)` ``` If you put the wrong region, you will see this one: ```bash Command failed: Another error occurred. `Metastore error`. Cause: `StorageError(kind=Internal, source=failed to fetch object: s3://your-bucket/your-index/metastore.json)`. ``` ================================================ FILE: docs/guides/schemaless.md ================================================ --- title: Schemaless sidebar_position: 1 --- # Strict schema or schemaless? Quickwit lets you place the cursor on how strict you would like your schema to be. In other words, it is possible to operate Quickwit with a very strict mapping, in an entirely schemaless manner, and anywhere in between. Let's see how this works! :::note To execute the CLI commands throughout this guide, [install](/docs/get-started/installation.md) Quickwit and start a server in a terminal with the following command: ```bash ./quickwit run ``` ::: ## A strict mapping That's the most straightforward approach. As a user, you need to precisely define the list of fields to be ingested by Quickwit. 
For instance, a reasonable mapping for an application log could be:

```yaml title=my_strict_index.yaml
version: 0.7

index_id: my_strict_index

doc_mapping:
  mode: strict # <--- The mode attribute
  field_mappings:
    - name: timestamp
      type: datetime
      input_formats:
        - unix_timestamp
      output_format: unix_timestamp_secs
      fast_precision: seconds
      fast: true
    - name: server
      type: text
      tokenizer: raw
    - name: message
      type: text
      record: position
    - name: severity
      type: text
      tokenizer: raw
  timestamp_field: timestamp

search_settings:
  default_search_fields: [severity, message]

indexing_settings:
  commit_timeout_secs: 30
```

The `mode` attribute controls what should be done if an ingested document contains a field that is not defined in the document mapping. By default, your index is in the `dynamic` mode. In `dynamic` mode, the fields that do not appear in the document mapping will be indexed in a schemaless fashion. See details in the [dynamic mode section](#dynamic-mode).

If `mode` is set to `strict` on the other hand, documents containing fields that are not defined in the mapping will be entirely discarded.

Finally, the last possible value for `mode` is `lenient`. In `lenient` mode, fields that are not present in the field mapping will simply be ignored.

## The dynamic mode: schemaless with a partial schema {#dynamic-mode}

`mode` can take the value `dynamic`. When set to `dynamic`, all extra fields are mapped using a catch-all configuration.
By default, this catch-all configuration indexes and stores all of these fields, but this can be configured by setting the [`dynamic_mapping` attribute](../configuration/index-config#mode).
A minimalist, yet perfectly valid and useful index configuration is then:

```yaml title=my_dynamic_index.yaml
version: 0.7
index_id: my_dynamic_index
doc_mapping:
  mode: dynamic
```

This configuration makes it possible to ingest any JSON objects and search them.

However, the dynamic mode can also be used in conjunction with field mappings.
This combination is especially powerful for event logs which cannot be mapped to a single schema. For instance, let's consider the following user event log: ```json file title=my_logs.json { "timestamp": 1653021741, "user_id": "8705a7fak", "event_type": "login", "ab_groups": ["phoenix-red-ux"] } { "timestamp": 1653021746, "user_id": "7618fe06", "event_type": "order", "ab_groups": ["phoenix-red-ux", "new-ranker"], "cart": [ { "product_id": 120391, "product_description": "Cherry Pi: A single-board computer that is compatible..." } ] } { "timestamp": 1653021748, "user_id": "8705a7fak", "event_type": "login", "ab_groups": ["phoenix-red-ux"] } ``` Each event type comes with its own set of attributes. Declaring our mapping as the union of all of these event-specific mappings would be a tedious exercise. Instead, we can cherry-pick the fields that are common to all of the logs, and rely on dynamic mode to handle the rest. ```yaml title=my_dynamic_index.yaml version: 0.7 index_id: my_dynamic_index doc_mapping: mode: dynamic field_mappings: - name: timestamp type: datetime input_formats: - unix_timestamp output_format: unix_timestamp_secs fast_precision: seconds fast: true - name: user_id type: text tokenizer: raw - name: event_type type: text tokenizer: raw timestamp_field: timestamp indexing_settings: commit_timeout_secs: 30 # <--- Your document will be searchable ~30 seconds after you ingest them. 
```

Our index is now ready to handle queries like this:

```
event_type:order AND cart.product_id:120391
```

Execute the following commands to create the index, ingest a few documents and search through them:

```bash
cat << EOF > my_dynamic_index.yaml
version: 0.7

index_id: my_dynamic_index

doc_mapping:
  mode: dynamic
  field_mappings:
    - name: timestamp
      type: datetime
      input_formats:
        - unix_timestamp
      output_format: unix_timestamp_secs
      fast_precision: seconds
      fast: true
    - name: user_id
      type: text
      tokenizer: raw
    - name: event_type
      type: text
      tokenizer: raw
  timestamp_field: timestamp

indexing_settings:
  commit_timeout_secs: 30
EOF

# Create index.
./quickwit index create --index-config ./my_dynamic_index.yaml --overwrite --yes

cat << EOF > my_logs.json
{"timestamp":1653021741,"user_id":"8705a7fak","event_type":"login","ab_groups":["phoenix-red-ux"]}
{"timestamp":1653021746,"user_id":"7618fe06","event_type":"order","ab_groups":["phoenix-red-ux","new-ranker"],"cart":[{"product_id":120391,"product_description":"Cherry Pi: A single-board computer that is compatible..."}]}
{"timestamp":1653021748,"user_id":"8705a7fak","event_type":"login","ab_groups":["phoenix-red-ux"]}
EOF

# Ingest documents.
./quickwit index ingest --index my_dynamic_index --input-path my_logs.json --force

# Execute search query.
./quickwit index search --index my_dynamic_index --query "event_type:order AND cart.product_id:120391"
```

## A schema with schemaless pockets

Some logs isolate these event-specific attributes in a sub-field. For instance, let's have a look at an OpenTelemetry JSON log.
```json title=otel_logs.json { "Timestamp": 1653028151, "Attributes": { "split_id": "28f897f2-0419-4d88-8abc-ada72b4b5256" }, "Resource": { "service": "donut_shop", "k8s_pod_uid": "27413708-876b-4652-8ca4-50e8b4a5caa2" }, "TraceId": "f4dbb3edd765f620", "SpanId": "43222c2d51a7abe3", "SeverityText": "INFO", "SeverityNumber": 9, "Body": "merge ended" } ``` In this log, the `Attributes` and the `Resource` fields contain arbitrary key-values. Quickwit 0.3 introduced a JSON field type to handle this use case. A good index configuration here could be: ```yaml title=otel_logs.yaml version: 0.7 index_id: otel_logs doc_mapping: mode: dynamic field_mappings: - name: Timestamp type: datetime input_formats: - unix_timestamp output_format: unix_timestamp_secs fast_precision: seconds fast: true - name: Attributes type: json tokenizer: raw - name: Resource type: json tokenizer: raw - name: TraceId type: text tokenizer: raw - name: SpanId type: text tokenizer: raw - name: SeverityText type: text tokenizer: raw fast: true - name: Body type: text timestamp_field: Timestamp search_settings: default_search_fields: [SeverityText, Body, Attributes, Resource] indexing_settings: commit_timeout_secs: 10 ``` We can now naturally search our logs with the following query: ``` merge AND service:donut_shop ``` Let's execute the following commands to create the index, ingest a document and execute a search query: ```bash # Create index. ./quickwit index create --index-config ./otel_logs.yaml --overwrite --yes cat << EOF > otel_logs.json {"Timestamp":1653028151,"Attributes":{"split_id":"28f897f2-0419-4d88-8abc-ada72b4b5256"},"Resource":{"service":"donut_shop","k8s_pod_uid":"27413708-876b-4652-8ca4-50e8b4a5caa2"},"TraceId":"f4dbb3edd765f620","SpanId":"43222c2d51a7abe3","SeverityText":"INFO","SeverityNumber":9,"Body":"merge ended"} EOF # Ingest documents. ./quickwit index ingest --index otel_logs --input-path otel_logs.json --force # Execute search query.
./quickwit index search --index otel_logs --query "merge AND service:donut_shop" ``` ================================================ FILE: docs/guides/storage-setup/_category_.yaml ================================================ label: 'Storage Setup' position: 2 collapsed: true ================================================ FILE: docs/guides/storage-setup/aws-s3.md ================================================ --- title: AWS S3 sidebar_position: 1 --- In this guide, you will learn how to configure a Quickwit [storage](../../configuration/storage-config) for Amazon S3. ## Set your AWS credentials A simple way to do it is to declare the environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`. For more details, read our guide on [AWS setup](../aws-setup). ## Set the Metastore URI and default index URI Here is an example of how to set up your [node config file](../../configuration/node-config) with S3: ```yaml metastore_uri: s3://{my-bucket}/indexes default_index_uri: s3://{my-bucket}/indexes ``` ## Set the Index URI Here is an example of how to set up your index URI in the [index config file](../../configuration/index-config): ```yaml index_uri: s3://{my-bucket}/indexes/{my-index-id} ``` ================================================ FILE: docs/ingest-data/_category_.yaml ================================================ label: 'Ingest data' position: 4 collapsed: true ================================================ FILE: docs/ingest-data/index.md ================================================ --- title: Ingest data from multiple sources --- import DocCardList from '@theme/DocCardList'; It is possible to ingest data with log shippers like [OpenTelemetry](../log-management/overview.md#opentelemetry-agent), [Fluentbit](../log-management/send-logs/using-fluentbit.md), or [Vector](../log-management/send-logs/using-vector.md). 
It's also possible to send traces from your apps to the [OpenTelemetry Collector](../log-management/send-logs/using-otel-collector-with-helm.md) and then to Quickwit. ================================================ FILE: docs/ingest-data/ingest-api.md ================================================ --- title: Ingest API description: A short tutorial describing how to send data in Quickwit using the ingest API tags: [ingest-api, integration] icon_url: /img/tutorials/quickwit-logo.svg sidebar_position: 1 --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; In this tutorial, we will describe how to send data to Quickwit using the ingest API. You will need a [local Quickwit instance](../get-started/installation) up and running to follow this tutorial. To start it, run `./quickwit run` in a terminal. ## Create an index First, let's create a schemaless index. ```bash # Create the index config file. cat << EOF > stackoverflow-schemaless-config.yaml version: 0.7 index_id: stackoverflow-schemaless doc_mapping: mode: dynamic dynamic_mapping: tokenizer: default indexing_settings: commit_timeout_secs: 30 EOF # Use the CLI to create the index... ./quickwit index create --index-config stackoverflow-schemaless-config.yaml # Or with cURL. curl -XPOST -H 'Content-Type: application/yaml' 'http://localhost:7280/api/v1/indexes' --data-binary @stackoverflow-schemaless-config.yaml ``` Note that for this example, we configure the dynamic mapping to use the [default tokenizer](../configuration/index-config.md#description-of-available-tokenizers). This is necessary to enable full-text search on all text fields. ## Ingest data Let's first download a sample of the [StackOverflow dataset](https://www.kaggle.com/stackoverflow/stacksample). ```bash # Download the first 10_000 Stackoverflow posts articles. curl -O https://quickwit-datasets-public.s3.amazonaws.com/stackoverflow.posts.transformed-10000.json ``` You can ingest data either with the CLI or with cURL. 
The CLI is more convenient for ingesting several GB as Quickwit may return `429` responses if the ingest queue is full. The Quickwit CLI will automatically retry ingestion in this case. ```bash # Ingest the first 10_000 Stackoverflow posts articles with the CLI... ./quickwit index ingest --index stackoverflow-schemaless --input-path stackoverflow.posts.transformed-10000.json --force # OR with cURL. curl -XPOST -H 'Content-Type: application/json' 'http://localhost:7280/api/v1/stackoverflow-schemaless/ingest?commit=force' --data-binary @stackoverflow.posts.transformed-10000.json ``` ## Execute search queries You can now search the index. ```bash curl 'http://localhost:7280/api/v1/stackoverflow-schemaless/search?query=body:python' ``` ## Tear down resources (optional) ```bash curl -XDELETE 'http://localhost:7280/api/v1/indexes/stackoverflow-schemaless' ``` This concludes the tutorial. You can now move on to the [next tutorial](/docs/ingest-data/kafka.md) to learn how to ingest data from Kafka. ## Ingest API versions In 0.9, Quickwit introduced a new version of the ingest API that enables distributing the indexing in the cluster regardless of the node that received the ingest request. This new ingestion service is often referred to as "Ingest V2", as opposed to the legacy ingestion (V1). In upcoming versions, the new ingest API will also be capable of replicating the write-ahead log in order to achieve higher durability. By default, both ingestion services are enabled and ingest V2 is used. You can toggle this behavior with the following environment variables: | Variable | Description | Default value | | --------------------- | --------------|-------------- | | `QW_ENABLE_INGEST_V2` | Start the V2 ingest service and use it by default. | true | | `QW_DISABLE_INGEST_V1`| V1 ingest will be used by the APIs only if V2 is disabled. Running V1 alongside V2 is necessary to migrate to V2 without losing existing unindexed V1 logs.
| false | :::note These configurations drive the ingest service used both by the `api/v1/<index-id>/ingest` endpoint and the [bulk API](../reference/es_compatible_api.md#_bulk--batch-ingestion-endpoint). ::: ================================================ FILE: docs/ingest-data/ingest-local-file.md ================================================ --- title: Local file description: A short tutorial describing how to index a local file with the Quickwit CLI tags: [local-ingest, integration] icon_url: /img/tutorials/file-ndjson.svg sidebar_position: 2 --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; In this tutorial, we will describe how to index a local file with the Quickwit CLI. You will need the [Quickwit binary](/docs/get-started/installation.md) to follow this tutorial. ## Create an index First, let's create a schemaless index. We need a running Quickwit server only to create the index, so we will start it and shut it down afterwards. Start the Quickwit server. ```bash ./quickwit run ``` And create the index in a separate terminal. ```bash # Create the index config file. cat << EOF > stackoverflow-schemaless-config.yaml version: 0.7 index_id: stackoverflow-schemaless doc_mapping: mode: dynamic indexing_settings: commit_timeout_secs: 30 EOF ./quickwit index create --index-config stackoverflow-schemaless-config.yaml ``` You can now shut down the server by pressing `Ctrl+C` in the first terminal. ## Ingest the file To ingest a file, you just need to execute the following command: ```bash ./quickwit tool local-ingest --index stackoverflow-schemaless --input-path stackoverflow.posts.transformed-10000.json ``` After a few seconds you should see the following output: ```bash ❯ Ingesting documents locally...
--------------------------------------------------- Connectivity checklist ✔ metastore ✔ storage ✔ _ingest-cli-source Num docs 10000 Parse errs 0 PublSplits 1 Input size 6MB Thrghput 3.34MB/s Time 00:00:02 Num docs 10000 Parse errs 0 PublSplits 1 Input size 6MB Thrghput 2.23MB/s Time 00:00:03 Num docs 10000 Parse errs 0 PublSplits 1 Input size 6MB Thrghput 1.67MB/s Time 00:00:04 Indexed 10,000 documents in 4s. Now, you can query the index with the following command: quickwit index search --index stackoverflow-schemaless --config ./config/quickwit.yaml --query "my query" Clearing local cache directory... ✔ Local cache directory cleared. ✔ Documents successfully indexed. ``` :::tip Object store URIs like `s3://mybucket/mykey.json` are also supported as `--input-path`, provided that your environment is configured with the appropriate permissions. ::: ## Tear down resources (optional) That's it! You can now tear down the resources you created. You can do so by running the following command: ```bash ./quickwit run ``` And in a separate terminal: ```bash ./quickwit index delete --index-id stackoverflow-schemaless ``` This concludes the tutorial. You can now move on to the next tutorial. ================================================ FILE: docs/ingest-data/kafka.md ================================================ --- title: Kafka description: A short tutorial describing how to set up Quickwit to ingest data from Kafka in a few minutes tags: [kafka, integration] icon_url: /img/tutorials/kafka.svg sidebar_position: 2 --- In this tutorial, we will describe how to set up Quickwit to ingest data from Kafka in a few minutes. First, we will create an index and configure a Kafka source. Then, we will create a Kafka topic and load some events from the [GH Archive](https://www.gharchive.org/) into it. Finally, we will execute some search and aggregation queries to explore the freshly ingested data. 
## Prerequisites You will need the following to complete this tutorial: - A running Kafka cluster (see Kafka [quickstart](https://kafka.apache.org/quickstart)) - A local Quickwit [installation](/docs/get-started/installation.md) ## Create index First, let's create a new index. Here is the index config and doc mapping corresponding to the schema of the GH Archive events: ```yaml title="index-config.yaml" # # Index config file for gh-archive dataset. # version: 0.7 index_id: gh-archive doc_mapping: field_mappings: - name: id type: text tokenizer: raw - name: type type: text fast: true tokenizer: raw - name: public type: bool fast: true - name: payload type: json tokenizer: default - name: org type: json tokenizer: default - name: repo type: json tokenizer: default - name: actor type: json tokenizer: default - name: other type: json tokenizer: default - name: created_at type: datetime fast: true input_formats: - rfc3339 fast_precision: seconds timestamp_field: created_at indexing_settings: commit_timeout_secs: 10 ``` Execute these Bash commands to download the index config and create the `gh-archive` index: ```bash # Download GH Archive index config. wget -O gh-archive.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/gh-archive/index-config.yaml # Create index. ./quickwit index create --index-config gh-archive.yaml ``` ## Create and populate Kafka topic Now, let's create a Kafka topic and load some events into it. ```bash # Create a topic named `gh-archive` with 3 partitions. bin/kafka-topics.sh --create --topic gh-archive --partitions 3 --bootstrap-server localhost:9092 # Download a few GH Archive files. wget https://data.gharchive.org/2022-05-12-{10..15}.json.gz # Load the events into Kafka topic. 
gunzip -c 2022-05-12*.json.gz | \ bin/kafka-console-producer.sh --topic gh-archive --bootstrap-server localhost:9092 ``` ## Create Kafka source :::note This tutorial assumes that the Kafka cluster is available locally on the default port (9092). If it's not the case, please, update the `bootstrap.servers` parameter accordingly. ::: ```yaml title="kafka-source.yaml" # # Kafka source config file. # version: 0.8 source_id: kafka-source source_type: kafka num_pipelines: 2 params: topic: gh-archive client_params: bootstrap.servers: localhost:9092 ``` Run these commands to download the source config file and create the source. ```bash # Download Kafka source config. wget https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/gh-archive/kafka-source.yaml # Create source. ./quickwit source create --index gh-archive --source-config kafka-source.yaml ``` :::note If you get the following error: ``` Command failed: Topic `gh-archive` has no partitions.``` It means the Kafka topic `gh-archive` was not properly created in the previous step. ::: ## Launch indexing and search services Finally, execute this command to start Quickwit in server mode. ```bash # Launch Quickwit services. ./quickwit run ``` Under the hood, this command spawns an indexer and a searcher. On startup, the indexer will connect to the Kafka topic specified by the source and start streaming and indexing events from the partitions composing the topic. With the default commit timeout value (see [indexing settings](../configuration/index-config#indexing-settings)), the indexer should publish the first split after approximately 60 seconds. You can run this command (in another shell) to inspect the properties of the index and check the current number of published splits: ```bash # Display some general information about the index. ./quickwit index describe --index gh-archive ``` Once the first split is published, you can start running search queries. 
For instance, we can find all the events for the Kubernetes [repository](https://github.com/kubernetes/kubernetes): ```bash curl 'http://localhost:7280/api/v1/gh-archive/search?query=org.login:kubernetes%20AND%20repo.name:kubernetes' ``` It is also possible to access these results through the [Quickwit UI](http://localhost:7280/ui/search?query=org.login%3Akubernetes+AND+repo.name%3Akubernetes&index_id=gh-archive&max_hits=10). We can also group these events by type and count them: ``` curl -XPOST -H 'Content-Type: application/json' 'http://localhost:7280/api/v1/gh-archive/search' -d ' { "query":"org.login:kubernetes AND repo.name:kubernetes", "max_hits":0, "aggs":{ "count_by_event_type":{ "terms":{ "field":"type" } } } }' ``` ## Secured Kafka connection (optional) The Quickwit Kafka source supports SSL and SASL authentication. This is particularly useful when consuming data from an external Kafka service. :::tip The certificate and key files must be present on all Quickwit nodes for the Kafka source to be created and for the indexing pipelines to run successfully. ::: ### SSL configuration ```yaml version: 0.8 source_id: kafka-source-ssl source_type: kafka num_pipelines: 2 params: topic: gh-archive client_params: bootstrap.servers: your-kafka-broker.com security.protocol: SSL ssl.ca.location: /path/to/ca.pem ssl.certificate.location: /path/to/service.cert ssl.key.location: /path/to/service.key ``` ### SASL configuration ```yaml version: 0.8 source_id: kafka-source-sasl source_type: kafka num_pipelines: 2 params: topic: gh-archive client_params: bootstrap.servers: your-kafka-broker.com ssl.ca.location: /path/to/ca.pem security.protocol: SASL_SSL sasl.mechanisms: SCRAM-SHA-256 sasl.username: your_sasl_username sasl.password: your_sasl_password ``` :::note If you get the following error: ```Client creation error: ssl.ca.location failed: error:05880002:x509 certificate routines::system lib``` It usually means the path to the CA certificate is incorrect. 
Update the `ssl.ca.location` parameter accordingly. ::: ## Tear down resources (optional) Let's delete the files and resources created for the purpose of this tutorial. ```bash # Delete Kafka topic. bin/kafka-topics.sh --delete --topic gh-archive --bootstrap-server localhost:9092 # Delete index. ./quickwit index delete --index gh-archive # Delete source config. rm kafka-source.yaml ``` This concludes the tutorial. If you have any questions regarding Quickwit or encounter any issues, don't hesitate to ask a [question](https://github.com/quickwit-oss/quickwit/discussions) or open an [issue](https://github.com/quickwit-oss/quickwit/issues) on [GitHub](https://github.com/quickwit-oss/quickwit) or contact us directly on [Discord](https://discord.com/invite/MT27AG5EVE). ================================================ FILE: docs/ingest-data/kinesis.md ================================================ --- title: Kinesis description: A short tutorial describing how to set up Quickwit to ingest data from Kinesis in a few minutes tags: [aws, integration] icon_url: /img/tutorials/aws-kinesis.svg sidebar_position: 4 --- In this tutorial, we will describe how to set up Quickwit to ingest data from Kinesis in a few minutes. First, we will create an index and configure a Kinesis source. Then, we will create a Kinesis stream and load some events from the [GH Archive](https://www.gharchive.org/) into it. Finally, we will execute some search and aggregation queries to explore the freshly ingested data. :::caution You will incur some charges for using the Amazon Kinesis service during this tutorial. 
::: ## Prerequisites You will need the following to complete this tutorial: - The AWS CLI version 2 (see [Getting started with the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-prereqs.html) for prerequisites and installation) - A local Quickwit [installation](/docs/get-started/installation.md) - [jq](https://stedolan.github.io/jq/download/) - [GNU parallel](https://www.gnu.org/software/parallel/) :::note `jq` is required to reshape the events into records ingestable by the Amazon Kinesis API. ::: ### Create index First, let's create a new index. Here is the index config and doc mapping corresponding to the schema of the GH Archive events: ```yaml title="index-config.yaml" # # Index config file for gh-archive dataset. # version: 0.7 index_id: gh-archive doc_mapping: field_mappings: - name: id type: text tokenizer: raw - name: type type: text fast: true tokenizer: raw - name: public type: bool fast: true - name: payload type: json tokenizer: default - name: org type: json tokenizer: default - name: repo type: json tokenizer: default - name: actor type: json tokenizer: default - name: other type: json tokenizer: default - name: created_at type: datetime fast: true input_formats: - rfc3339 fast_precision: seconds timestamp_field: created_at indexing_settings: commit_timeout_secs: 10 ``` Execute these Bash commands to download the index config and create the `gh-archive` index. ```bash # Download GH Archive index config. wget -O gh-archive.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/gh-archive/index-config.yaml # Create index. ./quickwit index create --index-config gh-archive.yaml ``` ## Create and populate Kinesis stream Now, let's create a Kinesis stream and load some events into it. :::tip This step may be fairly slow depending on how much bandwidth is available. 
The current command limits the volume of data to ingest by keeping only the first 10 000 lines of the downloaded GH Archive files (`head -n 10000`). If you have enough bandwidth, you can remove the `head` command to ingest the whole set of files. You can also speed things up by increasing the number of shards and/or the number of jobs launched by `parallel` (`-j` option). ::: ```bash # Create a stream named `gh-archive` with 8 shards. aws kinesis create-stream --stream-name gh-archive --shard-count 8 # Download a few GH Archive files. wget https://data.gharchive.org/2022-05-12-{10..12}.json.gz # Load the events into Kinesis stream gunzip -c 2022-05-12*.json.gz | \ head -n 10000 | \ parallel --gnu -j8 -N 500 --pipe \ 'jq --slurp -c "{\"Records\": [.[] | {\"Data\": (. | tostring), \"PartitionKey\": .id }], \"StreamName\": \"gh-archive\"}" > records-{%}.json && \ aws kinesis put-records --cli-input-json file://records-{%}.json --cli-binary-format raw-in-base64-out >> out.log' ``` ## Create Kinesis source ```yaml title="kinesis-source.yaml" # # Kinesis source config file. # version: 0.7 source_id: kinesis-source source_type: kinesis params: stream_name: gh-archive ``` Run these commands to download the source config file and create the source. ```bash # Download Kinesis source config. wget https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/gh-archive/kinesis-source.yaml # Create source. ./quickwit source create --index gh-archive --source-config kinesis-source.yaml ``` :::note If this command fails with the following error message: ``` Command failed: Stream gh-archive under account XXXXXXXXX not found. Caused by: 0: Stream gh-archive under account XXXXXXXX not found. 1: Stream gh-archive under account XXXXXXXX not found. ``` it means the Kinesis stream was not properly created in the previous step. ::: ## Launch indexing and search services Finally, execute this command to start Quickwit in server mode. ```bash # Launch Quickwit services.
./quickwit run ``` Under the hood, this command spawns an indexer and a searcher. On startup, the indexer will connect to the Kinesis stream specified by the source and start streaming and indexing events from the shards composing the stream. With the default commit timeout value (see [indexing settings](../configuration/index-config#indexing-settings)), the indexer should publish the first split after approximately 60 seconds. You can run this command (in another shell) to inspect the properties of the index and check the current number of published splits: ```bash # Display some general information about the index. ./quickwit index describe --index gh-archive ``` It is also possible to get index information through the [Quickwit UI](http://localhost:7280/ui/indexes/gh-archive). Once the first split is published, you can start running search queries. For instance, we can find all the events for the Kubernetes [repository](https://github.com/kubernetes/kubernetes): ```bash curl 'http://localhost:7280/api/v1/gh-archive/search?query=org.login:kubernetes%20AND%20repo.name:kubernetes' ``` It is also possible to access these results through the [UI](http://localhost:7280/ui/search?query=org.login%3Akubernetes+AND+repo.name%3Akubernetes&index_id=gh-archive&max_hits=10). We can also group these events by type and count them: ``` curl -XPOST -H 'Content-Type: application/json' 'http://localhost:7280/api/v1/gh-archive/search' -d ' { "query":"org.login:kubernetes AND repo.name:kubernetes", "max_hits":0, "aggs":{ "count_by_event_type":{ "terms":{ "field":"type" } } } }' ``` ## Tear down resources (optional) Let's delete the files and resources created for the purpose of this tutorial. ```bash # Delete Kinesis stream. aws kinesis delete-stream --stream-name gh-archive # Delete index. ./quickwit index delete --index gh-archive # Delete source config. rm kinesis-source.yaml ``` This concludes the tutorial. 
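The `jq` reshaping step used earlier to load events into Kinesis can also be sketched in Python. This is only an illustration, not part of the tutorial's tooling: `build_put_records_payloads` and its parameters are hypothetical names. It groups NDJSON events into batches of at most 500 records (the batch size passed to `parallel -N 500`) and produces the same `Records` / `StreamName` structure that is handed to `aws kinesis put-records`.

```python
import json
from typing import Iterable, Iterator


def build_put_records_payloads(
    lines: Iterable[str],
    stream_name: str = "gh-archive",
    batch_size: int = 500,
) -> Iterator[dict]:
    """Yield one PutRecords-style payload per batch of NDJSON lines.

    Hypothetical helper mirroring the jq expression used in this tutorial.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        event = json.loads(line)
        records.append({
            # Like jq's `tostring`: the whole event serialized as compact JSON.
            "Data": json.dumps(event, separators=(",", ":"), ensure_ascii=False),
            # The event `id` is used as the partition key, as in the jq command.
            "PartitionKey": str(event["id"]),
        })
        if len(records) == batch_size:
            yield {"Records": records, "StreamName": stream_name}
            records = []
    if records:
        # Emit the last, possibly smaller, batch.
        yield {"Records": records, "StreamName": stream_name}


# Usage on a tiny in-memory sample (made-up events, two records per batch).
sample = [
    '{"id": "1", "type": "PushEvent"}',
    '{"id": "2", "type": "ForkEvent"}',
    '{"id": "3", "type": "PushEvent"}',
]
for payload in build_put_records_payloads(sample, batch_size=2):
    print(json.dumps(payload))
```

Each yielded dictionary can be written to a file and passed to `aws kinesis put-records --cli-input-json`, exactly like the `records-{%}.json` files generated by the shell pipeline.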
If you have any questions regarding Quickwit or encounter any issues, don't hesitate to ask a [question](https://github.com/quickwit-oss/quickwit/discussions) or open an [issue](https://github.com/quickwit-oss/quickwit/issues) on [GitHub](https://github.com/quickwit-oss/quickwit) or contact us directly on [Discord](https://discord.com/invite/MT27AG5EVE). ================================================ FILE: docs/ingest-data/pulsar.md ================================================ --- title: Pulsar description: A short tutorial describing how to set up Quickwit to ingest data from Pulsar in a few minutes tags: [pulsar, integration] icon_url: /img/tutorials/pulsar.svg sidebar_position: 3 --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; In this tutorial, we will describe how to set up Quickwit to ingest data from Pulsar in a few minutes. First, we will create an index and configure a Pulsar source. Then, we will create a Pulsar topic and load some events from the [Stack Overflow dataset](https://www.kaggle.com/stackoverflow/stacksample) into it. Finally, we will execute some searches. ## Prerequisites You will need the following to complete this tutorial: - A local running [Quickwit instance](/docs/get-started/installation.md) - A local running [Pulsar instance](https://pulsar.apache.org/docs/next/getting-started-standalone/) ### Quickwit setup [Download](/docs/get-started/installation.md) Quickwit and start a server. Then open a new terminal to execute CLI commands with the same binary. 
```bash ./quickwit run ``` Test that the cluster is running: ```bash ./quickwit index list ``` ### Pulsar setup ```bash wget https://archive.apache.org/dist/pulsar/pulsar-2.11.0/apache-pulsar-2.11.0-bin.tar.gz tar xvfz apache-pulsar-2.11.0-bin.tar.gz cd apache-pulsar-2.11.0 bin/pulsar standalone ``` ```bash docker run -it -p 6650:6650 -p 8080:8080 apachepulsar/pulsar:2.11.0 bin/pulsar standalone ``` See the details on the [official documentation](https://pulsar.apache.org/docs/next/getting-started-docker/). ## Prepare Quickwit First, let's create a new index. Here is the index config and doc mapping corresponding to the schema of Stack Overflow posts: ```yaml title="index-config.yaml" # # Index config file for Stack Overflow dataset. # version: 0.7 index_id: stackoverflow doc_mapping: field_mappings: - name: user type: text fast: true tokenizer: raw - name: tags type: array fast: true tokenizer: raw - name: type type: text fast: true tokenizer: raw - name: title type: text tokenizer: default record: position stored: true - name: body type: text tokenizer: default record: position stored: true - name: questionId type: u64 - name: answerId type: u64 - name: acceptedAnswerId type: u64 - name: creationDate type: datetime fast: true input_formats: - rfc3339 fast_precision: seconds timestamp_field: creationDate search_settings: default_search_fields: [title, body] indexing_settings: commit_timeout_secs: 10 ``` Execute these Bash commands to download the index config and create the `stackoverflow` index. ```bash # Download stackoverflow index config. wget -O stackoverflow.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/stackoverflow/index-config.yaml # Create index. ./quickwit index create --index-config stackoverflow.yaml ``` ## Create the Pulsar source A Pulsar source just needs to define the list of topics and the instance address. ```yaml title="pulsar-source.yaml" # # Pulsar source config file. 
# version: 0.7 source_id: pulsar-source source_type: pulsar params: topics: - stackoverflow address: pulsar://localhost:6650 ``` Run these commands to download the source config file and create the source. ```bash # Download Pulsar source config. wget -O stackoverflow-pulsar-source.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/stackoverflow/pulsar-source.yaml # Create source. ./quickwit source create --index stackoverflow --source-config stackoverflow-pulsar-source.yaml ``` As soon as the Pulsar source is created, the Quickwit control plane will ask an indexer to start a new indexing pipeline. You will see logs like the one below on the indexer: ```bash INFO spawn_pipeline{index=stackoverflow gen=0}:pulsar-consumer{subscription_name="quickwit-stackoverflow-pulsar-source" params=PulsarSourceParams { topics: ["stackoverflow"], address: "pulsar://localhost:6650", consumer_name: "quickwit", authentication: None } current_positions={}}: quickwit_indexing::source::pulsar_source: Seeking to last checkpoint positions. positions={} ``` ## Create and populate a Pulsar topic We will use Pulsar's default tenant/namespace `public/default`. To populate the topic, we will use a Python script: ```python title=send_messages_to_pulsar.py import json import pulsar client = pulsar.Client('pulsar://localhost:6650') producer = client.create_producer('public/default/stackoverflow') with open('stackoverflow.posts.transformed-10000.json', encoding='utf8') as file: for i, line in enumerate(file): producer.send(line.encode('utf-8')) if i % 100 == 0: print(f"{i}/10000 messages sent.") client.close() ``` Install the Python client locally (more details on the [documentation page](https://pulsar.apache.org/docs/2.11.x/client-libraries-python/)), then run the script: ```bash # Download the first 10_000 Stackoverflow posts articles. curl -O https://quickwit-datasets-public.s3.amazonaws.com/stackoverflow.posts.transformed-10000.json # Install pulsar python client.
# Requires a python version < 3.11 pip3 install 'pulsar-client==2.10.1' wget https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/stackoverflow/send_messages_to_pulsar.py python3 send_messages_to_pulsar.py ``` ## Time to search! You can run this command to inspect the properties of the index and check the current number of published splits and documents: ```bash # Display some general information about the index. ./quickwit index describe --index stackoverflow ``` You will notably see the number of published documents. You are now ready to execute some queries. ```bash curl 'http://localhost:7280/api/v1/stackoverflow/search?query=search+AND+engine' ``` If your Quickwit server is local, you can access the results through the Quickwit UI on [localhost:7280](http://localhost:7280/ui/search?query=&index_id=stackoverflow&max_hits=10). ## Tear down resources (optional) Let's delete the files and resources created for the purpose of this tutorial. ```bash # Delete quickwit index. ./quickwit index delete --index stackoverflow --yes # Delete Pulsar topic. bin/pulsar-admin topics delete stackoverflow ``` This concludes the tutorial. If you have any questions regarding Quickwit or encounter any issues, don't hesitate to ask a [question](https://github.com/quickwit-oss/quickwit/discussions) or open an [issue](https://github.com/quickwit-oss/quickwit/issues) on [GitHub](https://github.com/quickwit-oss/quickwit) or contact us directly on [Discord](https://discord.com/invite/MT27AG5EVE).
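The search request issued with cURL in this tutorial can also be sent from Python. The sketch below uses only the standard library; `build_search_url` and `search` are hypothetical helper names, and it assumes a local Quickwit server on port 7280 and that `max_hits` is accepted as a query-string parameter, as it appears in the UI link above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(index_id, query, max_hits=None, base_url="http://localhost:7280"):
    """Build the search endpoint URL for an index (hypothetical helper)."""
    params = {"query": query}
    if max_hits is not None:
        # Assumption: `max_hits` is passed as a query-string parameter.
        params["max_hits"] = max_hits
    # urlencode escapes spaces as `+`, like the cURL example in this tutorial.
    return f"{base_url}/api/v1/{index_id}/search?{urlencode(params)}"


def search(index_id, query, max_hits=None):
    """Send the search request and return the decoded JSON response."""
    with urlopen(build_search_url(index_id, query, max_hits)) as response:
        return json.load(response)


# Building the URL does not require a running server.
print(build_search_url("stackoverflow", "search AND engine"))
```

Calling `search("stackoverflow", "search AND engine")` against a running server sends the same request as the cURL command shown in the search section.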
================================================ FILE: docs/ingest-data/sqs-files.md ================================================ --- title: S3 with SQS notifications description: A short tutorial describing how to set up Quickwit to ingest data from S3 files using an SQS notifier tags: [s3, sqs, integration] icon_url: /img/tutorials/file-ndjson.svg sidebar_position: 5 --- In this tutorial, we describe how to set up Quickwit to ingest data from S3 with bucket notification events flowing through SQS. We will first create the AWS resources (S3 bucket, SQS queue, notifications) using terraform. We will then configure the Quickwit index and file source. Finally, we will send some data to the source bucket and verify that it gets indexed. ## AWS resources The complete terraform script can be downloaded [here](../assets/sqs-file-source.tf). First, create the bucket that will receive the source data files (NDJSON format): ``` resource "aws_s3_bucket" "file_source" { bucket_prefix = "qw-tuto-source-bucket" } ``` Then set up the SQS queue that will carry the notifications when files are added to the bucket. The queue is configured with a policy that allows the source bucket to write the S3 notification messages to it. Also create a dead letter queue (DLQ) to receive the messages that couldn't be processed by the file source (e.g. corrupted files). Messages are moved to the DLQ after 5 indexing attempts.
```
locals {
  sqs_notification_queue_name = "qw-tuto-s3-event-notifications"
}

data "aws_iam_policy_document" "sqs_notification" {
  statement {
    effect = "Allow"

    principals {
      type        = "*"
      identifiers = ["*"]
    }

    actions   = ["sqs:SendMessage"]
    resources = ["arn:aws:sqs:*:*:${local.sqs_notification_queue_name}"]

    condition {
      test     = "ArnEquals"
      variable = "aws:SourceArn"
      values   = [aws_s3_bucket.file_source.arn]
    }
  }
}

resource "aws_sqs_queue" "s3_events_deadletter" {
  name = "${local.sqs_notification_queue_name}-deadletter"
}

resource "aws_sqs_queue" "s3_events" {
  name   = local.sqs_notification_queue_name
  policy = data.aws_iam_policy_document.sqs_notification.json

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.s3_events_deadletter.arn
    maxReceiveCount     = 5
  })
}

resource "aws_sqs_queue_redrive_allow_policy" "s3_events_deadletter" {
  queue_url = aws_sqs_queue.s3_events_deadletter.id

  redrive_allow_policy = jsonencode({
    redrivePermission = "byQueue",
    sourceQueueArns   = [aws_sqs_queue.s3_events.arn]
  })
}
```

Configure the bucket notification that writes messages to SQS each time a new file is created in the source bucket:

```
resource "aws_s3_bucket_notification" "bucket_notification" {
  bucket = aws_s3_bucket.file_source.id

  queue {
    queue_arn = aws_sqs_queue.s3_events.arn
    events    = ["s3:ObjectCreated:*"]
  }
}
```

:::note
Only events of type `s3:ObjectCreated:*` are supported. Other types (e.g. `ObjectRemoved`) are acknowledged and a warning is logged.
:::

The source needs to have access to both the notification queue and the source bucket.
The following policy document contains the minimum permissions required by the source:

```
data "aws_iam_policy_document" "quickwit_node" {
  statement {
    effect = "Allow"
    actions = [
      "sqs:ReceiveMessage",
      "sqs:DeleteMessage",
      "sqs:ChangeMessageVisibility",
      "sqs:GetQueueAttributes",
    ]
    resources = [aws_sqs_queue.s3_events.arn]
  }
  statement {
    effect    = "Allow"
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.file_source.arn}/*"]
  }
}
```

Create the IAM user and credentials that will be used to associate this policy to your local Quickwit instance:

```
resource "aws_iam_user" "quickwit_node" {
  name = "quickwit-filesource-tutorial"
  path = "/system/"
}

resource "aws_iam_user_policy" "quickwit_node" {
  name   = "quickwit-filesource-tutorial"
  user   = aws_iam_user.quickwit_node.name
  policy = data.aws_iam_policy_document.quickwit_node.json
}

resource "aws_iam_access_key" "quickwit_node" {
  user = aws_iam_user.quickwit_node.name
}
```

:::warning
We don't recommend using IAM user credentials for running Quickwit nodes in production. This is just a simplified setup for the sake of the tutorial. When running on EC2/ECS, attach the policy document to an IAM role instead.
:::

Download the [complete Terraform script](../assets/sqs-file-source.tf) and deploy it using `terraform init` and `terraform apply`. After a successful execution, the outputs required to configure Quickwit will be listed.
You can display the values of the sensitive outputs (key id and secret key) with:

```bash
terraform output quickwit_node_access_key_id
terraform output quickwit_node_secret_access_key
```

## Run Quickwit

[Install Quickwit locally](/docs/get-started/installation), then in your install directory, run Quickwit with the necessary access rights by setting `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` to the matching Terraform output values:

```bash
AWS_ACCESS_KEY_ID= \
AWS_SECRET_ACCESS_KEY= \
AWS_REGION=us-east-1 \
./quickwit run
```

## Configure the index and the source

In another terminal, in the Quickwit install directory, create an index:

```bash
cat << EOF > tutorial-sqs-file-index.yaml
version: 0.7
index_id: tutorial-sqs-file
doc_mapping:
  mode: dynamic
indexing_settings:
  commit_timeout_secs: 30
EOF

./quickwit index create --index-config tutorial-sqs-file-index.yaml
```

Setting `queue_url` to the corresponding Terraform output value, create a file source for that index:

```bash
cat << EOF > tutorial-sqs-file-source.yaml
version: 0.8
source_id: sqs-filesource
source_type: file
num_pipelines: 2
params:
  notifications:
    - type: sqs
      queue_url:
      message_type: s3_notification
EOF

./quickwit source create --index tutorial-sqs-file --source-config tutorial-sqs-file-source.yaml
```

:::tip
The `num_pipelines` configuration controls how many consumers will poll from the queue in parallel. Choose the number according to the indexer compute resources you want to dedicate to this source. As a rule of thumb, configure 1 pipeline for every 2 cores.
:::

## Ingest data

We can now ingest data into Quickwit by uploading files to S3.
If you have the AWS CLI installed, run the following command, adding the source bucket name from the associated Terraform output to the `s3://` URL:

```bash
curl https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants-10000.json | \
    aws s3 cp - s3:///hdfs-logs-multitenants-10000.json
```

If you prefer not to use the AWS CLI, you can also download the file and upload it manually to the source bucket using the AWS console.

Wait approximately 1 minute and the data should appear in the index:

```bash
./quickwit index describe --index tutorial-sqs-file
```

## Tear down the resources

The AWS resources instantiated in this tutorial don't incur any fixed costs, but we still recommend deleting them when you are done. In the directory with the Terraform script, run `terraform destroy`.

================================================
FILE: docs/internals/backward-compatibility.md
================================================
# Backward compatibility in Quickwit.

If you are reading this, chances are you want to make a change to one of the resources in Quickwit's meta/config.

There are basically 3 types of configuration:

Edited by the user and read back from file on startup:
- QuickwitConfig

Edited by the user then stored in the metastore:
- IndexConfig
- SourceConfig
- VersionedIndexTemplate

Assembled by Quickwit then stored in the metastore:
- IndexMetadata
- SplitMetadata
- FileBackedIndex (file backed metastore only)
- Manifest (file backed metastore only)

Quickwit currently manages the backward compatibility of all of these resources except the `QuickwitConfig`. This document describes how to handle a change, how to test it, and how to spot potential regressions.

## How do I update `{IndexMetadata, SplitMetadata, FileBackedIndex, SourceConfig, IndexConfig, Manifest}`?
There are two types of upgrades:
- naturally backward compatible change
- change requiring a new version

### Naturally backward compatible change

Serde offers some attributes to make backward compatible changes to our model. For instance, it is possible to add a new field to a struct and slap a `serde(default)` attribute on it in order to handle older serialized versions of the struct.

If you want to avoid generating any diff on the non-regression JSON files, you can also use `#[serde(skip_serializing_if)]`, although by default it is recommended not to use it.

It is also possible to rename a field in a backward compatible manner by using the `#[serde(alias)]` attribute.

For this type of change it is not required to update the serialization version. Nevertheless, the regression tests will spot these changes. When that happens:
- modify your model with the help of the attributes above.
- modify the example for the model by editing its `TestableForRegression` trait implementation.
- run the backward compatibility tests (see below)
- check the diff between the `xxx.modified.json` files created and the matching `xxx.json` files. If the changes are acceptable, replace the content of the `xxx.json` files and commit them. Be particularly careful with changes to files corresponding to the most recent version. If the changes are not compatible, create a new configuration version.

### Change requiring a new version

For changes requiring a new version, you will have to increment the configuration version. You need to make sure that all of these resources share the same version number.
- update the resource struct you want to change.
- create a new item in the `VersionedXXXX` enum. It is usually located in a serialize.rs file
- `Serialize` is not needed for the previous serialized version. We just need `Deserialize`. We can remove the `Serialize` impl from the derive statement, and mark it as `skip_serializing` as follows, e.g.
``` #[serde(tag = "version")] pub(crate) enum VersionedXXXXXX { #[serde(rename = "0")] V0(#[serde(skip_serializing)] XXXX_V0), #[serde(rename = "1")] V1(XXXX_V1), } ``` - complete the conversion `From for XXXX` and `From for VersionedXXXX` - run the backward compatibility tests (see below) - for older versions, check the diff between the `xxx.expected.modified.json` files created and the matching `xxx.expected.json` files. If the changes are acceptable, replace the content of the `xxx.expected.json` files and commit them. - check the `yyyy.json` that was created for the new version and commit it along with the `yyyy.expected.json` file (identical). - possibly update the generation of the default XXXX instance used for regression. It is in the function `TestableForRegression::sample_for_regression`. ## Backward compatibility tests These tests are used to ensure the backward compatibility of Quickwit. Right now, `SplitMetadata`, `IndexMetadata`, `Manifest` and `FileBackedIndex` are tested. We want to be able to read all past versions of these files, but only write the most recent format. The tests consist of pairs of JSON files, `XXXX.json` and `XXXX.expected.json`: - `XXXX.json` is the first serialized value of a new version. - `XXXX.expected.json` is the result of `serialize_new_version(deserialize(XXXX.json))`. Format changes are automatically detected. There are two possible situations when a format changes. #### Updating expected.json We need to keep `*.expected.json` files up-to-date with the format changes. This is done in a semi-automatic fashion. Checks are performed in two steps: - first pass, `deserialize(original_json) == deserialize(expectation_json)` - second pass, `expectation_json = serialize(deserialize(expectation_json))` When changing the json format, it is expected to see this test fail. The unit test then updates automatically the `expected.json`. 
The developer just has to check the diff of the result (in particular no information should be lost) and commit the updated expected.json files.

Adding this update operation within the unit test is a tad unexpected, but it has the merit of integrating well with CI. If a developer forgets to update the expected.json file, the CI will catch it.

#### Adding a new test case.

If the serialization format changes, a new version should be created and the unit test will automatically add a new unit test generated from the sample tested objects. Concretely, it will just write two files `XXXX.json` and `XXXX.expected.json` for each model. The two files will be identical. This is expected as this is a unit test for the most recent version. The unit test will start making sense in future updates thanks to the update phase described in the previous section.

================================================
FILE: docs/internals/date-time.md
================================================
# Datetime format

Quickwit's DateTime is a wrapper around the DateTime type provided by Tantivy, which is internally represented as an `i64` microseconds value.

For optimization reasons, Tantivy stores the value differently at the following locations:
- DocStore: Dates are stored as they are received from the input document.
- TermDict: Dates are stored with `seconds` precision.
- FastField: Dates are stored using the precision configured for the DateTime type, which can take one of the following values: `seconds`, `milliseconds`, `microseconds`.

================================================
FILE: docs/internals/ingest-v2.md
================================================
# Ingest V2

Ingest V2 is the latest ingestion API. It is designed to be more efficient and scalable for thousands of indexes than the previous version. It is the default since 0.9.
## Architecture

Just like ingest V1, the new ingest uses [`mrecordlog`](https://github.com/quickwit-oss/mrecordlog) to persist ingested documents that are waiting to be indexed. But unlike V1, which always persists the documents locally on the node that receives them, ingest V2 can dynamically distribute them into WAL units called _shards_. The assigned shard can be local or on another indexer. The control plane is in charge of distributing the shards to balance the indexing work as well as possible across all indexer nodes. The progress within each shard is no longer tracked as an index metadata checkpoint but in a dedicated metastore `shards` table.

In the future, the shard-based ingest will also be capable of writing a replica for each shard, thus ensuring a high durability of the documents that are waiting to be indexed (durability of the indexed documents is guaranteed by the object store).

## Toggling between ingest V1 and V2

Variables driving the ingest configuration are documented [here](../ingest-data/ingest-api.md#ingest-api-versions).

With ingest V2, you can also activate the `enable_cooperative_indexing` option in the indexer configuration. This setting is useful for deployments with very large numbers (dozens) of actively written indexes, to limit the indexing workbench memory consumption. The indexer configuration is in the node configuration:

```yaml
version: 0.8
# [...]
indexer:
  enable_cooperative_indexing: true
```

See [full configuration example](https://github.com/quickwit-oss/quickwit/blob/main/config/quickwit.yaml).
## Differences between ingest V1 and V2

- V1 uses the `queues/` directory whereas V2 uses the `wal/` directory
- both V1 and V2 are configured with:
  - `ingest_api.max_queue_memory_usage`
  - `ingest_api.max_queue_disk_usage`
- but ingest V2 can also be configured with:
  - `ingest_api.replication_factor`, not working yet
- ingest V1 always writes to the WAL of the node receiving the request, V2 potentially forwards it to another node, dynamically assigned by the control plane to distribute the indexing work more evenly.
- ingest V2 parses and validates input documents synchronously. Schema and JSON formatting errors are returned in the ingest response (for ingest V1 those errors were available in the server logs only).

================================================
FILE: docs/internals/scroll.md
================================================
# Scroll API

The scroll API has been implemented to offer compatibility with Elasticsearch. The API and the implementation are quirky and are detailed in this document.

## API description

You can find information about the scroll API here:
- https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#scroll-search-results
- https://www.elastic.co/guide/en/elasticsearch/reference/current/scroll-api.html

The user runs a regular search request with a `scroll` param. The search result then contains the normal response, but a `_scroll_id` property is added to the search body.

That id is then meant to be sent to a scroll REST API. Successive calls to this API then return pages incrementally.

## Quirk and difficulty.

The scrolled results should be consistent with the state of the original index. For this reason we need to capture the state of the index at the point of the original request.

If a network error happens between the client and the server at page N, there is no way for the client to ask for page N to be re-emitted. Page N+1 will be returned on the next call.
## Implementation

Server side, we store a replicated scroll context. It contains:
- the detail about the original query (we need to be able to reemit paginated queries)
- the "point-in-time" list of split metadatas used for the query
- a cached list of partial docs (= not the doc content, just its address and its score) to avoid running the search again at every single scroll request.
- the total number of results, in order to append that information to our response.

We use a simple leaderless KV store to keep the state required to run the scroll API. We generate a scroll ULID and use it to get a list of the servers with the best affinity according to rendezvous hashing. We then go through them in order and attempt to put that key on up to 2 servers. Failures for these PUTs are silent.

For each call to scroll, one of two things can happen:
- the partial docs for the requested page are in the partial doc cache. We just run the fetch_docs phase, and update the context with the `start_offset`.
- the partial docs for the requested page are not in the partial doc cache. We then run a new search query. We attempt to fetch `SCROLL_BATCH_LEN` in order to fill the partial doc address cache for subsequent calls.

# A strange `scroll_id`.

The Elasticsearch API is needlessly broken as it returns the same scroll_id most of the time. The "page-change" mutation is something that happens on the server side.

In Quickwit on the other hand, the scroll id is the concatenation of:
- ULID: used as the address for the search context.
- the start_offset.
- the number of hits per page
- a search_after key

We only mutate the state server side to update the cache whenever needed. The idea here is that if the put request failed, we can still return the right results even if we have an obsolete version of the `ScrollContext`.
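The server selection step described in the implementation section above can be sketched as follows. This is a minimal illustration, not Quickwit's actual code: the function names, the `try_put` callback and the SHA-256 based affinity score are all assumptions made for the example. It ranks servers by rendezvous hashing on the scroll ULID, then attempts the PUT on servers in that order until two of them have accepted it, silently skipping failures.

```python
import hashlib
from typing import Callable


def rendezvous_order(key: str, servers: list[str]) -> list[str]:
    """Return the servers sorted by decreasing affinity for `key`.

    The affinity score (first 8 bytes of SHA-256 over "server|key") is an
    assumption for this sketch; any well-mixed hash would do.
    """
    def affinity(server: str) -> int:
        digest = hashlib.sha256(f"{server}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(servers, key=affinity, reverse=True)


def put_scroll_context(
    scroll_ulid: str,
    servers: list[str],
    try_put: Callable[[str], bool],  # hypothetical: returns False on failure
    max_replicas: int = 2,
) -> list[str]:
    """Try servers in affinity order until `max_replicas` PUTs succeeded."""
    stored: list[str] = []
    for server in rendezvous_order(scroll_ulid, servers):
        if len(stored) == max_replicas:
            break
        if try_put(server):  # failures are silent: just move on
            stored.append(server)
    return stored
```

A useful property of this scheme is that the ranking of a given server only depends on that server and the key, so removing one server does not reshuffle the relative order of the others. A later `get` can walk the same ordered list and is likely to hit a server holding the context.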
# Quickwit implementation (improvement, quirks and shortcuts)

We do not explicitly protect the splits listed in our stored point-in-time information from deletion. Instead we simply rely on the existing grace period mechanism (a split is only effectively garbage collected 32 minutes after it is marked as deleted). For this reason we limit the scroll period to 30 minutes and subsequent scroll calls do not extend the scroll period.

Also thanks to this period, we do not add any extra replication repair mechanism. Some scroll calls will end up being broken if we were to remove 2 servers within 30 minutes.

Quickwit caches partial hits in batches of 1000 results. Querying page N leverages `search_after`, so that accessing further pages isn't more costly than accessing the first ones.

================================================
FILE: docs/internals/searcher-split-cache.md
================================================
# Searcher split cache

Quickwit includes a split cache. It can be useful for specific workloads:
- to improve performance
- to reduce the cost associated with GET requests.

The split cache stores entire split files on disk. It works under the following configurable constraints:
- number of concurrent downloads
- amount of disk space
- number of on-disk files.

Searchers get tipped off by indexers about the existence of splits (for which they have the best affinity). They may also learn about the existence of a split upon read requests.

The searcher is then in charge of maintaining an in-memory data structure with a bounded list of splits it knows about and their score. The current strategy for admission/eviction is a simple LRU logic.

We consider downloading the most recently accessed split that is not already in the cache. If the limits have been reached, we only proceed to eviction if one of the splits currently in cache has been less recently accessed.
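The admission and eviction rule described above can be sketched as follows. This is a simplified model under stated assumptions, not the searcher's actual data structure: the class and method names are hypothetical, the only limit modeled is the number of on-disk files (disk space and concurrent downloads are left out), and the list of known splits is not bounded here.

```python
from typing import Optional


class SplitCacheSketch:
    """Toy model of the LRU admission/eviction decision (hypothetical names)."""

    def __init__(self, max_num_splits: int) -> None:
        self.max_num_splits = max_num_splits
        self.last_access: dict[str, int] = {}  # known split -> logical access time
        self.cached: set[str] = set()          # splits currently on disk
        self.clock = 0

    def touch(self, split_id: str) -> None:
        """Record an access (a tip from an indexer or a read request)."""
        self.clock += 1
        self.last_access[split_id] = self.clock

    def next_download(self) -> Optional[tuple[str, Optional[str]]]:
        """Return (split_to_download, split_to_evict) or None if nothing to do."""
        candidates = [s for s in self.last_access if s not in self.cached]
        if not candidates:
            return None
        # Most recently accessed split that is not in cache yet.
        candidate = max(candidates, key=self.last_access.__getitem__)
        if len(self.cached) < self.max_num_splits:
            return candidate, None
        # Limit reached: evict only if some cached split is older than the candidate.
        victim = min(self.cached, key=self.last_access.__getitem__)
        if self.last_access[victim] < self.last_access[candidate]:
            return candidate, victim
        return None

    def apply(self, decision: tuple[str, Optional[str]]) -> None:
        """Apply a decision returned by `next_download`."""
        candidate, victim = decision
        if victim is not None:
            self.cached.discard(victim)
        self.cached.add(candidate)
```

With a limit of two files, touching `a` then `b` downloads `b` first and `a` second. A later access to `c` evicts `a`, the least recently accessed cached split, and `a` is not downloaded again as long as every cached split has been accessed more recently than it.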
================================================
FILE: docs/internals/sorting.md
================================================
# Sorting

Quickwit can sort results based on fastfield values or score. This document discusses where and how it happens. It also tries to describe optimizations that may be enabled (but are not necessarily implemented) by this behavior.

## Behavior

Sorting is controlled by the `sort_by` query parameter. It accepts a comma-separated list of fields to use for sorting. Sorting is Descending by default. The sorting order can be reversed by prefixing a field name with a hyphen `-`. The special value `_score` means sorting by score; it is also Descending by default.

In case of equality between two documents, the GlobalDocId, composed of (SplitId, SegmentId, DocId), is used as a tie breaker. It is used to sort in the same order as the first field being sorted by. This means it is in Descending order by default.

If a document doesn't have a value for a sorting field, that document is considered to go after any document which has a value, independently of sort order. That is, when sorting the values 1, 2 and None, ascending sort would give `[1, 2, None]`, and descending sort would give `[2, 1, None]`.

If a client does not request sorting, documents are sorted using (SplitId, SegmentId, DocId), in Descending order. In other words, everything happens as if documents were sorted by a constant value.

# Code

A new structure TopK is introduced which is used both for in-split sorting and for merging of results. It reduces the risks of inconsistencies between in-split and between-split behavior.

`SortOrder` gets new `compare` and `compare_opt` methods which can be used to compare two values with respect to the particular sort order required, and with proper handling of the `None` special case.
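The `None` handling described above can be illustrated with a small sketch. It borrows the names `SortOrder`, `compare` and `compare_opt` from the text, but the Python signatures and the return convention (a negative result means the left value comes first in the output) are assumptions made for this example and do not mirror the Rust implementation.

```python
from enum import Enum
from functools import cmp_to_key


class SortOrder(Enum):
    ASC = "asc"
    DESC = "desc"

    def compare(self, left, right) -> int:
        """Compare two present values; negative means `left` comes first."""
        natural = (left > right) - (left < right)
        return natural if self is SortOrder.ASC else -natural

    def compare_opt(self, left, right) -> int:
        """Like `compare`, but a missing value goes last in both orders."""
        if left is None and right is None:
            return 0
        if left is None:
            return 1
        if right is None:
            return -1
        return self.compare(left, right)


def sort_values(values: list, order: SortOrder) -> list:
    """Sort values with missing entries last, whatever the order."""
    return sorted(values, key=cmp_to_key(order.compare_opt))


print(sort_values([None, 2, 1], SortOrder.ASC))   # [1, 2, None]
print(sort_values([None, 2, 1], SortOrder.DESC))  # [2, 1, None]
```

Keeping the missing-value rule in one place is what lets the same comparison be reused for in-split sorting and for merging results across splits.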
# Optimization permitted

Both orders allow an optimization when sorting by date (either direction), by leveraging split metadata to know in advance whether or not a split can contain better results.

Changing the sorting order for "not sorted" queries makes it possible to leverage SplitId as a way to know whether or not a split can contain better results (if its SplitId is more/less than the current worst best-hit, the split does not need to be searched).

These optimizations have limited to no impact if we give an exact count of matching documents. An option to request only a lower bound would be required for these optimizations to make sense.

================================================
FILE: docs/internals/split-format.md
================================================
# Split format

Quickwit's indexes are divided into small, independent, immutable pieces of index called splits.

For convenience, a split consists of a single file, with the extension `.split`.

In reality, this file hides an internal mini static filesystem, with:
- the Tantivy index files (`.idx`, `.pos`, `.term`...)
- a Quickwit specific file with the list of fields, including those indexed as part of a JSON type. It contains the field name, type and capabilities.

The split file data layout looks like this:
- concatenation of all of the files in the split
- a footer

The footer has the following format:
- a JSON object called `BundleStorageFileOffsets` containing the `[start, end)` byte-offsets of all files.
- the length of this JSON (8 bytes little endian)
- a hotcache, a small static cache that contains some important file sections.
- the length of this hotcache (8 bytes little endian)

This footer plays a very important role in Quickwit. It packs in one read all of the information required to open a split. Quickwit's metastore stores the byte offsets of this footer to make this single read possible when opening a file from a distant storage.
If this footer offset information is not available, for instance if the split is just a file on the filesystem, it is still possible to open it by reading the last 8 bytes of the split (encoding the length of the hotcache), deducing the position of the meta information and unpacking this in turn. ================================================ FILE: docs/internals/template-index.md ================================================ # Index template API Index templates are a way to create indexes automatically with some given configuration when Quickwit receives documents for an index that doesn't exist yet. Example of templates: [https://github.com/quickwit-oss/quickwit/tree/main/config/templates](https://github.com/quickwit-oss/quickwit/tree/main/config/templates). # Curl to run to use the REST API to create Stackoverflow template ```bash curl -XPOST -H 'Content-Type: application/yaml' 'http://localhost:7280/api/v1/templates' --data-binary @config/templates/stackoverflow.yaml # Lists templates. curl 'http://localhost:7280/api/v1/templates' # Update Stackoverflow template. curl -XPUT -H 'Content-Type: application/yaml' 'http://localhost:7280/api/v1/templates/stackoverflow' --data-binary @config/templates/stackoverflow.yaml # Download dataset. curl -O https://quickwit-datasets-public.s3.amazonaws.com/stackoverflow.posts.transformed-10000.json # Ingest 10k docs into `stackoverflow-foo` index. curl -XPOST "http://127.0.0.1:7280/api/v1/stackoverflow-foo/ingest" --data-binary @stackoverflow.posts.transformed-10000.json # Ingest 10k docs into `stackoverflow-bar` index. curl -XPOST "http://127.0.0.1:7280/api/v1/stackoverflow-bar/ingest" --data-binary @stackoverflow.posts.transformed-10000.json # Delete Stackoverflow template. 
curl -XDELETE 'http://localhost:7280/api/v1/templates/stackoverflow'
```

================================================
FILE: docs/log-management/_category_.yaml
================================================
label: 'Log management'
position: 5
collapsed: true

================================================
FILE: docs/log-management/otel-service.md
================================================
---
title: OTEL service
sidebar_position: 4
---

Quickwit natively supports the [OpenTelemetry Protocol (OTLP)](https://opentelemetry.io/docs/reference/specification/protocol/otlp/) and provides a gRPC endpoint to receive logs from an OpenTelemetry collector. This endpoint is enabled by default.

When enabled, Quickwit will start the gRPC service ready to receive logs from an OpenTelemetry collector. The logs are indexed in the `otel-logs-v0_7` index by default, and this index will be automatically created if not present. The index doc mapping is described in a later [section](#opentelemetry-logs-data-model).

If for any reason you want to disable this endpoint, you can:
- Set the `QW_ENABLE_OTLP_ENDPOINT` environment variable to `false` when starting Quickwit.
- Or [configure the node config](/docs/configuration/node-config.md) by setting the indexer setting `enable_otlp_endpoint` to `false`.

```yaml title=node-config.yaml
# ... Indexer configuration ...
indexer:
  enable_otlp_endpoint: false
```

## Sending logs to your own index

You can send logs to the index of your choice by setting the header `qw-otel-logs-index` of your gRPC request to the targeted index ID.

## OpenTelemetry logs data model

Quickwit sends OpenTelemetry logs into the `otel-logs-v0_7` index by default, which is automatically created if you enable the OpenTelemetry service. The doc mapping of this index, described below, is derived from the [OpenTelemetry logs data model](https://opentelemetry.io/docs/reference/specification/logs/data-model/).
```yaml version: 0.7 index_id: otel-logs-v0_7 doc_mapping: mode: strict field_mappings: - name: timestamp_nanos type: datetime input_formats: [unix_timestamp] output_format: unix_timestamp_nanos indexed: false fast: true fast_precision: milliseconds - name: observed_timestamp_nanos type: datetime input_formats: [unix_timestamp] output_format: unix_timestamp_nanos - name: service_name type: text tokenizer: raw fast: true - name: severity_text type: text tokenizer: raw fast: true - name: severity_number type: u64 fast: true - name: body type: json tokenizer: default - name: attributes type: json tokenizer: raw fast: true - name: dropped_attributes_count type: u64 indexed: false - name: trace_id type: bytes input_format: hex output_format: hex - name: span_id type: bytes input_format: hex output_format: hex - name: trace_flags type: u64 indexed: false - name: resource_attributes type: json tokenizer: raw fast: true - name: resource_dropped_attributes_count type: u64 indexed: false - name: scope_name type: text indexed: false - name: scope_version type: text indexed: false - name: scope_attributes type: json indexed: false - name: scope_dropped_attributes_count type: u64 indexed: false timestamp_field: timestamp_nanos indexing_settings: commit_timeout_secs: 10 search_settings: default_search_fields: [body.message] ``` ## UI Integration Currently, Quickwit provides a simplistic UI to get basic information from the cluster, indexes and search documents. If a simple UI is not sufficient for you and you need additional features, Grafana and Elasticsearch query API support are planned for Q2 2023, stay tuned! You can also send traces to Quickwit that you can visualize in Jaeger UI, as explained in the following [tutorial](../distributed-tracing/send-traces/using-otel-sdk-python.md). ## Known limitations There are a few limitations on the log management setup in Quickwit 0.9: - The ingest API does not provide High-Durability. This will be fixed in 0.10. 
- OTLP HTTP is only available with the Binary Protobuf Encoding. OTLP HTTP with JSON encoding is not planned yet, but this can be easily fixed in the next version. Please open an issue if you need this feature.

If you are interested in new features or discover other limitations, please open an issue on [GitHub](https://github.com/quickwit-oss/quickwit).

================================================
FILE: docs/log-management/overview.md
================================================
---
title: Log management with Quickwit
sidebar_label: Overview
sidebar_position: 1
---

Quickwit is built from the ground up to [efficiently index unstructured data](../guides/schemaless.md), and search it effortlessly on cloud storage. Moreover, Quickwit supports OpenTelemetry gRPC and HTTP (protobuf only) protocols out of the box and provides a REST API ready to ingest any JSON formatted logs. **This makes Quickwit a perfect fit for logs!**

![Quickwit Log Management](../assets/images/log-management-overview-light.svg#gh-light-mode-only)![Quickwit Log Management](../assets/images/log-management-overview-dark.svg#gh-dark-mode-only)

## Sending logs to Quickwit

- [Using OTEL collector](send-logs/using-otel-collector.md)
- [Using OTEL collector with Helm](send-logs/using-otel-collector-with-helm.md)
- [Using Fluentbit](send-logs/using-fluentbit.md)
- [Using Vector](send-logs/using-vector.md)

================================================
FILE: docs/log-management/send-logs/_category_.yaml
================================================
label: 'Sending logs'
position: 2
collapsed: false

================================================
FILE: docs/log-management/send-logs/send-docker-logs.md
================================================
---
title: Send docker logs into Quickwit
sidebar_label: Docker logs into Quickwit
description: Send docker logs into Quickwit
tags: [otel, docker, collector, log]
sidebar_position: 5
---

To send docker container logs into Quickwit, you just need to
setup an OpenTelemetry Collector with the file logs receiver. In this tutorial, we will use `docker compose` to start the collector and Quickwit. You only need a minute to get your Quickwit log UI! ![Quickwit UI Logs](../../assets/images/screenshot-quickwit-ui-docker-compose-logs.png) ## OTEL collector configuration The following collector configuration will collect docker logs in `/var/lib/docker/containers/*/*-json.log` (depending on your system, log files can be at a different location), add a few attributes and send them to Quickwit through gRPC at `http://quickwit:7281`. ```yaml title="otel-collector-config.yaml" receivers: filelog: include: - /var/lib/docker/containers/*/*-json.log operators: - id: parser-docker timestamp: layout: '%Y-%m-%dT%H:%M:%S.%LZ' parse_from: attributes.time type: json_parser - field: attributes.time type: remove - id: extract_metadata_from_docker_tag parse_from: attributes.attrs.tag regex: ^(?P[^\|]+)\|(?P[^\|]+)\|(?P[^$]+)$ type: regex_parser if: 'attributes?.attrs?.tag != nil' - from: attributes.name to: resource["docker.container.name"] type: move if: 'attributes?.name != nil' - from: attributes.image_name to: resource["docker.image.name"] type: move if: 'attributes?.image_name != nil' - from: attributes.id to: resource["docker.container.id"] type: move if: 'attributes?.id != nil' - from: attributes.log to: body type: move processors: batch: timeout: 5s exporters: otlp/qw: endpoint: quickwit:7281 tls: insecure: true service: pipelines: logs: receivers: [filelog] processors: [batch] exporters: [otlp/qw] ``` ## Start the OTEL collector and a Quickwit instance Let's use `docker compose` with the following configuration: ```yaml title="docker-compose.yaml" version: "3" x-default-logging: &logging driver: "json-file" options: max-size: "5m" max-file: "2" tag: "{{.Name}}|{{.ImageName}}|{{.ID}}" services: quickwit: image: quickwit/quickwit:${QW_VERSION:-0.8.1} volumes: - ./qwdata:/quickwit/qwdata ports: - 7280:7280 environment: - 
NO_COLOR=true command: ["run"] logging: *logging otel-collector: user: "0" # Needed to access the directory /var/lib/docker/containers/ image: otel/opentelemetry-collector-contrib:${OTEL_VERSION:-0.87.0} volumes: - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml - /var/lib/docker/containers:/var/lib/docker/containers:ro command: ["--config=/etc/otel-collector-config.yaml"] logging: *logging ``` You will notice the custom `logging`, the OTEL collector will use that additional information to enrich the logs. ## Run it and search Download the configuration files and start the containers: ```bash mkdir qwdata docker compose up ``` After a few seconds, you will see the logs in the Quickwit UI [http://localhost:7280](http://localhost:7280). Here is what it should look like: ```json { "attributes": { "log.file.name": "34ad1a84c71de1d29ad75f99b56d01205e2976440f2398734037151ba2bcde1a-json.log", "stream": "stdout" }, "body": { "message": "2023-10-23T16:39:57.892 INFO --- [ asgi_gw_1] localstack.request.aws : AWS s3.ListObjects => 200\n" }, "observed_timestamp_nanos": 1698079197979435000, "service_name": "unknown_service", "severity_number": 0, "timestamp_nanos": 1698079197892726000, "trace_flags": 0 } ``` ## Troubleshooting It's possible that you get no logs in the UI. In this case, check the `docker compose` logs. The problem can typically come from a wrong configuration of the OTEL collector. ================================================ FILE: docs/log-management/send-logs/using-fluentbit.md ================================================ --- title: Send logs using Fluentbit sidebar_label: Using Fluentbit description: A simple tutorial to send logs from Fluentbit to Quickwit in a few minutes. 
icon_url: /img/tutorials/fluentbit-logo.png tags: [logs, ingestion] sidebar_position: 4 --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; [Fluent Bit](https://fluentbit.io/) is an open-source logging and metrics processor and forwarder to multiple destinations. In this guide, we will show you how to connect it to Quickwit. ## Prerequisites - [Install Quickwit](/docs/get-started/installation.md) - Start a Quickwit instance with `./quickwit run` - [Install Fluentbit](https://docs.fluentbit.io/manual/installation/getting-started-with-fluent-bit) ## Create a simple index for Fluentbit logs Let's create a schemaless index with only one field `timestamp`. The mode `dynamic` indicates that Quickwit will index all fields even if they are not defined in the doc mapping. ```yaml title="index-config.yaml" version: 0.7 index_id: fluentbit-logs doc_mapping: mode: dynamic field_mappings: - name: timestamp type: datetime input_formats: - unix_timestamp output_format: unix_timestamp_secs fast: true timestamp_field: timestamp indexing_settings: commit_timeout_secs: 10 ``` ```bash curl -o fluentbit-logs.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/fluentbit-logs/index-config.yaml ``` And then create the index with `cURL` or the `CLI`: ```bash curl -XPOST http://localhost:7280/api/v1/indexes -H "content-type: application/yaml" --data-binary @fluentbit-logs.yaml ``` ```bash ./quickwit index create --index-config fluentbit-logs.yaml ``` ## Setup Fluentbit Fluentbit configuration file is made of inputs and outputs. For this tutorial, we will use a dummy configuration: ``` title=fluent-bit.conf [INPUT] Name dummy Tag dummy.log [OUTPUT] Name http Match * URI /api/v1/fluentbit-logs/ingest Host localhost Port 7280 tls Off Format json_lines Json_date_key timestamp Json_date_format epoch ``` Fluentbit will send `dummy` logs to Quickwit endpoint `/api/v1/fluentbit-logs/ingest`. Let's start Fluentbit. 
```bash fluent-bit -c fluent-bit.conf ``` ## Search logs Quickwit is now ingesting logs coming from Fluentbit and you can search them either with `cURL` or by using the UI: - `curl "http://127.0.0.1:7280/api/v1/fluentbit-logs/search?query=severity:DEBUG"` - Open your browser at `http://127.0.0.1:7280/ui/search?query=severity:DEBUG&index_id=fluentbit-logs&max_hits=10`. ## Further improvements You will soon be able to do aggregations on dynamic fields (planned for 0.7). ================================================ FILE: docs/log-management/send-logs/using-otel-collector-with-helm.md ================================================ --- title: Send K8s logs using OTEL collector sidebar_label: Using OTEL with Helm description: Send K8s logs with OTEL collectors and Helm to Quickwit in a few minutes. tags: [k8s, helm] icon_url: /img/tutorials/helm-otel-k8s-tutorial-illustation.jpg sidebar_position: 2 --- This guide will help you to unlock log search on your k8s cluster logs. We will first deploy Quickwit and OTEL collectors with [Helm](https://helm.sh/) and then see how to index and search them. ## Prerequisites You will need the following to complete this tutorial: - A Kubernetes cluster. - The command line tool [kubectl](https://kubernetes.io/docs/reference/kubectl/). - The command line tool [Helm](https://helm.sh/). - An access to an object storage like AWS S3, GCS, Azure blob storage, or Scaleway to store index data. ## Install with Helm Let's first create a namespace to isolate our experiment and set it as the default namespace. 
```bash kubectl create namespace qw-tutorial kubectl config set-context --current --namespace=qw-tutorial ``` Then let's add [Quickwit](https://github.com/quickwit-oss/helm-charts) and [Otel](https://github.com/open-telemetry/opentelemetry-helm-charts) helm repositories: ```bash helm repo add quickwit https://helm.quickwit.io helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts ``` You should now see the two repos in helm: ```bash helm repo list NAME URL quickwit https://helm.quickwit.io open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts ``` ### Deploy Quickwit Let's create a basic chart configuration: ```bash export AWS_REGION=us-east-1 export AWS_ACCESS_KEY_ID=XXXX export AWS_SECRET_ACCESS_KEY=XXXX export DEFAULT_INDEX_ROOT_URI=s3://your-bucket/indexes ``` ```bash # Create Quickwit config file. echo " searcher: replicaCount: 1 indexer: replicaCount: 1 metastore: replicaCount: 1 janitor: enabled: true control_plane: enabled: true environment: # Remove ANSI colors. NO_COLOR: 1 # Quickwit configuration config: storage: s3: region: ${AWS_REGION} access_key_id: ${AWS_ACCESS_KEY_ID} secret_access_key: ${AWS_SECRET_ACCESS_KEY} # If you are not on AWS S3, you can define a flavor (gcs, minio, garage...) # and additional variables for your object storage. # flavor: gcs # endpoint: https://storage.googleapis.com # Metastore on S3. metastore_uri: ${DEFAULT_INDEX_ROOT_URI} default_index_root_uri: ${DEFAULT_INDEX_ROOT_URI} # Indexer settings indexer: # By activating the OTEL service, Quickwit will be able # to receive gRPC requests from OTEL collectors. enable_otlp_endpoint: true " > qw-tutorial-values.yaml ``` Before installing Quickwit chart, make sure you have access to S3 and that you did not make a typo in the `default_index_root_uri`. 
This can be easily done with `aws` CLI with a simple `ls`: ```bash aws s3 ls ${DEFAULT_INDEX_ROOT_URI} ``` If the CLI did not return an error, you are ready to install the chart: ```bash helm install quickwit quickwit/quickwit -f qw-tutorial-values.yaml ``` In a few moments, you will see the pods running Quickwit services: ```bash kubectl get pods NAME READY STATUS RESTARTS AGE quickwit-control-plane-7fc495f4c4-slqv4 1/1 Running 2 (84s ago) 87s quickwit-indexer-0 1/1 Running 2 (84s ago) 87s quickwit-janitor-7f75f4bc8-jrfv6 1/1 Running 2 (84s ago) 87s quickwit-metastore-6989978fc-9s82j 1/1 Running 2 (85s ago) 87s quickwit-searcher-0 1/1 Running 2 (84s ago) 87s ``` Let's check Quickwit is working: ```bash kubectl port-forward svc/quickwit-searcher 7280 ``` Then open your browser `http://localhost:7280/ui/indexes`. You should see the list of indexes. If everything is fine, keep the kubectl command running and open a new terminal. ### Deploy OTEL collectors We need to configure a bit the collectors in order to: - collect logs from k8s - enrich the logs with k8s attributes - export the logs to Quickwit indexer. ```bash echo " mode: daemonset presets: logsCollection: enabled: true kubernetesAttributes: enabled: true config: exporters: otlp: endpoint: quickwit-indexer.qw-tutorial.svc.cluster.local:7281 tls: insecure: true # By default, logs are sent to the otel-logs-v0_7. # You can customize the index ID By setting this header. # headers: # qw-otel-logs-index: otel-logs-v0_7 service: pipelines: logs: exporters: - otlp " > otel-values.yaml ``` ``` helm install otel-collector open-telemetry/opentelemetry-collector -f otel-values.yaml ``` After a few seconds, you should see logs on your indexer that show indexing has started. It looks like this: ``` 2022-11-30T18:27:37.628Z INFO spawn_merge_pipeline{index=otel-log-v0 gen=0}: quickwit_indexing::actors::merge_pipeline: Spawning merge pipeline. 
index_id=otel-log-v0 source_id=_ingest-api-source pipeline_ord=0 root_dir=/quickwit/qwdata/indexing/otel-log-v0/_ingest-api-source merge_policy=StableLogMergePolicy { config: StableLogMergePolicyConfig { min_level_num_docs: 100000, merge_factor: 10, max_merge_factor: 12, maturation_period: 172800s }, split_num_docs_target: 10000000 } 2022-11-30T18:27:37.628Z INFO quickwit_serve::grpc: Starting gRPC server. enabled_grpc_services={"otlp-log", "otlp-trace"} grpc_listen_addr=0.0.0.0:7281 2022-11-30T18:27:37.628Z INFO quickwit_serve::rest: Starting REST server. rest_listen_addr=0.0.0.0:7280 2022-11-30T18:27:37.628Z INFO quickwit_serve::rest: Searcher ready to accept requests at http://0.0.0.0:7280/ 2022-11-30T18:27:42.654Z INFO quickwit_indexing::actors::indexer: new-split split_id="01GK4WPTXK8GH3AGTRNBN9A8YG" partition_id=0 2022-11-30T18:27:52.643Z INFO quickwit_indexing::actors::indexer: send-to-index-serializer commit_trigger=Timeout split_ids=01GK4WPTXK8GH3AGTRNBN9A8YG num_docs=22 2022-11-30T18:27:52.652Z INFO index_batch{index_id=otel-log-v0 source_id=_ingest-api-source pipeline_ord=0}:packager: quickwit_indexing::actors::packager: start-packaging-splits split_ids=["01GK4WPTXK8GH3AGTRNBN9A8YG"] 2022-11-30T18:27:52.652Z INFO index_batch{index_id=otel-log-v0 source_id=_ingest-api-source pipeline_ord=0}:packager: quickwit_indexing::actors::packager: create-packaged-split split_id="01GK4WPTXK8GH3AGTRNBN9A8YG" 2022-11-30T18:27:52.653Z INFO index_batch{index_id=otel-log-v0 source_id=_ingest-api-source pipeline_ord=0}:uploader: quickwit_indexing::actors::uploader: start-stage-and-store-splits split_ids=["01GK4WPTXK8GH3AGTRNBN9A8YG"] 2022-11-30T18:27:52.733Z INFO index_batch{index_id=otel-log-v0 source_id=_ingest-api-source pipeline_ord=0}:uploader:stage_and_upload{split=01GK4WPTXK8GH3AGTRNBN9A8YG}:store_split: quickwit_indexing::split_store::indexing_split_store: store-split-remote-success split_size_in_megabytes=0.018351 num_docs=22 elapsed_secs=0.07654519 
throughput_mb_s=0.23974074 is_mature=false ``` If you see errors there, they probably come from a misconfiguration of your object storage. If you need some help, please open an issue on [GitHub](https://github.com/quickwit-oss/quickwit) or join our [Discord server](https://discord.gg/MT27AG5EVE). ### Ready to search logs You are now ready to search. Wait 30 seconds and you will see the first indexed logs: just [open the UI](http://localhost:7280/ui/search?query=*&index_id=otel-logs-v0_7&max_hits=10&sort_by=-timestamp_secs) and play with it. Funnily enough, you will see Quickwit's own logs in it :). Example of queries: - [body.message:quickwit](http://localhost:7280/ui/search?query=body.message:quickwit&index_id=otel-logs-v0_7&max_hits=10&sort_by=-timestamp_secs) - [resource_attributes.k8s.container.name:quickwit](http://localhost:7280/ui/search?query=resource_attributes.k8s.container.name%3Aquickwit&index_id=otel-logs-v0_7&max_hits=10&sort_by=-timestamp_secs) - [resource_attributes.k8s.container.restart_count:1](http://localhost:7280/ui/search?query=resource_attributes.k8s.container.restart_count%3A1&index_id=otel-logs-v0_7&max_hits=10&sort_by=-timestamp_secs) ![UI screenshot](../../assets/screenshot-ui-otel-logs.png) That's all, folks! ### Clean up Let's first delete the index and then uninstall the charts. ```bash # Delete the index. The command will return the list of deleted split files.
curl -XDELETE http://127.0.0.1:7280/api/v1/indexes/otel-logs-v0_7 # Uninstall charts helm uninstall otel-collector helm uninstall quickwit # Delete namespace kubectl delete namespace qw-tutorial ``` Finally, you need to delete three JSON files created by Quickwit on your object storage: ```bash # if your version <= 0.7.1 aws s3 rm ${DEFAULT_INDEX_ROOT_URI}/indexes_states.json # if your version > 0.7.1 aws s3 rm ${DEFAULT_INDEX_ROOT_URI}/manifest.json # the metastore file of the logs index aws s3 rm ${DEFAULT_INDEX_ROOT_URI}/otel-logs-v0_7/metastore.json # the metastore file of the traces index aws s3 rm ${DEFAULT_INDEX_ROOT_URI}/otel-traces-v0_7/metastore.json ``` ## Next step Follow our [tutorial](../../get-started/tutorials/trace-analytics-with-grafana.md) to install Quickwit Grafana plugin to explore your logs, create dashboards and alerts. ================================================ FILE: docs/log-management/send-logs/using-otel-collector.md ================================================ --- title: Send logs from OTEL Collector sidebar_label: Using OTEL collector description: Using OTEL Collector tags: [otel, collector, log] sidebar_position: 1 --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; If you already have your own OpenTelemetry Collector and want to export your logs to Quickwit, you need a new OTLP gRPC exporter in your config.yaml: ```yaml title="otel-collector-config.yaml" receivers: otlp: protocols: grpc: http: processors: batch: exporters: otlp/quickwit: endpoint: host.docker.internal:7281 tls: insecure: true # By default, logs are sent to the otel-logs-v0_7. # You can customize the index ID by setting this header.
# headers: # qw-otel-logs-index: otel-logs-v0_7 service: pipelines: logs: receivers: [otlp] processors: [batch] exporters: [otlp/quickwit] ``` ```yaml title="otel-collector-config.yaml" receivers: otlp: protocols: grpc: http: processors: batch: exporters: otlp/quickwit: endpoint: 127.0.0.1:7281 tls: insecure: true # By default, logs are sent to the otel-logs-v0_7. # You can customize the index ID By setting this header. # headers: # qw-otel-logs-index: otel-logs-v0_7 service: pipelines: logs: receivers: [otlp] processors: [batch] exporters: [otlp/quickwit] ``` ## Test your OTEL configuration 1. [Install](../../get-started/installation.md) and start a Quickwit server: ```bash ./quickwit run ``` 2. Start a collector with the previous config: ```bash docker run -v ${PWD}/otel-collector-config.yaml:/etc/otelcol/config.yaml -p 4317:4317 -p 4318:4318 -p 7281:7281 otel/opentelemetry-collector ``` ```bash docker run -v ${PWD}/otel-collector-config.yaml:/etc/otelcol/config.yaml --network=host -p 4317:4317 -p 4318:4318 -p 7281:7281 otel/opentelemetry-collector ``` 3. Send a log to your collector with cURL: ```bash curl -XPOST "http://localhost:4318/v1/logs" -H "Content-Type: application/json" \ --data-binary @- << EOF { "resource_logs": [ { "resource": { "attributes": [ { "key": "service.name", "value": { "stringValue": "test-with-curl" } } ] }, "scope_logs": [ { "scope": { "name": "manual-test" }, "log_records": [ { "time_unix_nano": "1678974011000000000", "observed_time_unix_nano": "1678974011000000000", "name": "test", "severity_text": "INFO" } ] } ] } ] } EOF ``` You should see a log on the Quickwit server similar to the following: ```bash 2023-03-16T13:44:09.369Z INFO quickwit_indexing::actors::indexer: new-split split_id="01GVNAKT5TQW0T2QGA245XCMTJ" partition_id=6444214793425557444 ``` This means that Quickwit has received the log and created a new split. Wait for the split to be published before searching for logs. 
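Once the split is published, the test log can be fetched back through the search REST API. The snippet below is a minimal sketch rather than an official client: it assumes Quickwit listens on `http://127.0.0.1:7280`, that the log landed in the default `otel-logs-v0_7` index, and the helper names `build_search_url` and `search_logs` are made up for this illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: default REST address and default OTEL logs index.
QUICKWIT_URL = "http://127.0.0.1:7280"
INDEX_ID = "otel-logs-v0_7"


def build_search_url(query, max_hits=10, base_url=QUICKWIT_URL, index_id=INDEX_ID):
    """Build the search URL for an index (hypothetical helper)."""
    params = urllib.parse.urlencode({"query": query, "max_hits": max_hits})
    return f"{base_url}/api/v1/{index_id}/search?{params}"


def search_logs(query, max_hits=10):
    """Send the search request and return the decoded JSON response."""
    with urllib.request.urlopen(build_search_url(query, max_hits), timeout=10) as response:
        return json.load(response)
```

With a running server, `search_logs("severity_text:INFO")` should return the log sent with cURL above, since that record sets `severity_text` to `INFO`. If it returns no hit, the split has probably not been published yet.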
================================================ FILE: docs/log-management/send-logs/using-vector.md ================================================ --- title: Send logs from Vector sidebar_label: Using Vector description: A simple tutorial to send logs from Vector to Quickwit in a few minutes. icon_url: /img/tutorials/vector-logo.png tags: [logs, ingestion] sidebar_position: 3 --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; [Vector](https://vector.dev/) is an amazing piece of software (in Rust obviously) that brings a fresh wind to the observability space. It is well-known for collecting logs from every part of your infrastructure, transforming and aggregating them, and finally forwarding them to a sink. In this guide, we will show you how to connect it to Quickwit. ## Start Quickwit server ```bash # Create Quickwit data dir. mkdir qwdata ./quickwit run ``` ```bash # Create Quickwit data dir. mkdir qwdata docker run --rm -v $(pwd)/qwdata:/quickwit/qwdata -p 7280:7280 quickwit/quickwit run ``` ## Taking advantage of Quickwit's native support for logs Let's embrace the OpenTelemetry standard and take advantage of Quickwit features. With the native support for OpenTelemetry standards, Quickwit already comes with an index called `otel-logs-v0_7` that is compatible with the OpenTelemetry [logs data model](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md). This means we can start pushing log data without any prior index setup. The OpenTelemetry index configuration can be found in the [quickwit-opentelemetry/src/otlp/logs.rs](https://github.com/quickwit-oss/quickwit/blob/main/quickwit/quickwit-opentelemetry/src/otlp/logs.rs) source file. ## Setup Vector Our sink here will be Quickwit ingest API `http://127.0.0.1:7280/api/v1/otel-logs-v0_7/ingest`. To keep it simple in this tutorial, we will use a log source called `demo_logs` that generates logs in a given format.
Let's choose the common `syslog` format (Vector does not generate logs in the OpenTelemetry format directly!) and use the transform feature to map the `syslog` format into the OpenTelemetry format. ```toml title=vector.toml [sources.generate_syslog] type = "demo_logs" format = "syslog" count = 100000 interval = 0.001 [transforms.remap_syslog] inputs = [ "generate_syslog"] type = "remap" source = ''' structured = parse_syslog!(.message) .timestamp_nanos = to_unix_timestamp!(structured.timestamp, unit: "nanoseconds") .body = structured .service_name = structured.appname .resource_attributes.source_type = .source_type .resource_attributes.host.hostname = structured.hostname .resource_attributes.service.name = structured.appname .attributes.syslog.procid = structured.procid .attributes.syslog.facility = structured.facility .attributes.syslog.version = structured.version .severity_text = if includes(["emerg", "err", "crit", "alert"], structured.severity) { "ERROR" } else if structured.severity == "warning" { "WARN" } else if structured.severity == "debug" { "DEBUG" } else if includes(["info", "notice"], structured.severity) { "INFO" } else { structured.severity } .scope_name = structured.msgid del(.message) del(.timestamp) del(.service) del(.source_type) ''' # useful to see the logs in the terminal # [sinks.emit_syslog] # inputs = ["remap_syslog"] # type = "console" # encoding.codec = "json" [sinks.quickwit_logs] type = "http" method = "post" inputs = ["remap_syslog"] encoding.codec = "json" framing.method = "newline_delimited" uri = "http://127.0.0.1:7280/api/v1/otel-logs-v0_7/ingest" ``` Download the above Vector config file. ```bash curl -o vector.toml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/vector-otel-logs/vector.toml ``` Now let's start Vector so that we can start sending logs to Quickwit. 
```bash docker run -v $(pwd)/vector.toml:/etc/vector/vector.toml:ro -p 8383:8383 --net=host timberio/vector:0.25.0-distroless-libc ``` ## Search logs Quickwit is now ingesting logs coming from Vector and you can search them either with `curl` or by using the UI: - `curl -XGET http://127.0.0.1:7280/api/v1/otel-logs-v0_7/search?query=severity_text:ERROR` - Open your browser at `http://127.0.0.1:7280/ui/search?query=severity_text:ERROR&index_id=otel-logs-v0_7&max_hits=10` and play with it! ## Compute aggregation on severity_text For aggregations, we can't use yet Quickwit UI but we can use cURL. Let's craft a nice aggregation query to count how many `INFO`, `DEBUG`, `WARN`, and `ERROR` per minute (all datetime are stored in microseconds thus the interval of 60_000_000 microseconds) we have: ```json title=aggregation-query.json { "query": "*", "max_hits": 0, "aggs": { "count_per_minute": { "histogram": { "field": "timestamp_nanos", "interval": 60000000 }, "aggs": { "severity_text_count": { "terms": { "field": "severity_text" } } } } } } ``` ```bash curl -XPOST -H "Content-Type: application/json" http://127.0.0.1:7280/api/v1/otel-logs-v0_7/search --data @aggregation-query.json ``` ## Going further Now you can also deploy Grafana and connect to Quickwit as data source for query, dashboard, alerts and more! ================================================ FILE: docs/log-management/supported-agents.md ================================================ --- title: Supported agents sidebar_position: 3 --- Quickwit is compatible with the following agents: ## OpenTelemetry agent Before using an [OpenTelemetry collector](https://opentelemetry.io/docs/collector/), check that [Quickwit OpenTelemetry service](otel-service.md) is enabled. Once started, Quickwit is then ready to receive and ingest OpenTelemetry gRPC requests. 
Here is a configuration example of an OpenTelemetry agent that sends logs into Quickwit: ```yaml mode: daemonset presets: logsCollection: enabled: true kubernetesAttributes: enabled: true config: exporters: otlp: # Replace quickwit-host with the hostname of your Quickwit node/service. # On k8s, it should be of the form `{quickwit-indexer-service-name}.{namespace}.svc.cluster.local:7281` endpoint: quickwit-host:7281 tls: insecure: true service: pipelines: logs: exporters: - otlp ``` Find more configuration details on the [OpenTelemetry documentation](https://opentelemetry.io/docs/collector/configuration/). You can also check out our [tutorial to send logs with OTEL collector](send-logs/using-otel-collector.md) to Quickwit. ## HTTP-based agents It's also possible to use other agents that send HTTP requests to Quickwit Ingest API. Quickwit also partially supports the Elasticsearch `_bulk` API. Thus, there is a good chance that your agent is already compatible with Quickwit. Currently, we have tested the following HTTP-based agents: - [Vector](send-logs/using-vector.md) - [Fluentbit](send-logs/using-fluentbit.md) - FluentD (tutorial coming soon) - Logstash: Quickwit does not support the Elasticsearch output. However, it's possible to send logs with the HTTP output but with `json` [format](https://www.elastic.co/guide/en/logstash/current/plugins-outputs-http.html) only. Quickwit natively supports the [OpenTelemetry Protocol (OTLP)](https://opentelemetry.io/docs/reference/specification/protocol/otlp/) and provides a gRPC endpoint to receive logs from an OpenTelemetry collector by default. The logs received by this endpoint are indexed on the `otel-logs-v0_7` index. This index will be automatically created if not present. The index doc mapping is described in this [section](#opentelemetry-logs-data-model). You can also send your logs directly to this index by using the [ingest API](/docs/reference/rest-api.md#ingest-data-into-an-index).
## OpenTelemetry service Quickwit natively supports the [OpenTelemetry Protocol (OTLP)](https://opentelemetry.io/docs/reference/specification/protocol/otlp/) and provides a gRPC endpoint to receive spans from an OpenTelemetry collector. This endpoint is enabled by default. When enabled, Quickwit will start the gRPC service ready to receive spans from an OpenTelemetry collector. The spans are indexed in the `otel-traces-v0_7` index by default, and this index will be automatically created if not present. The index doc mapping is described in the next [section](#trace-and-span-data-model). If for any reason, you want to disable this endpoint, you can: - Set the `QW_ENABLE_OTLP_ENDPOINT` environment variable to `false` when starting Quickwit. - Or [configure the node config](/docs/configuration/node-config.md) by setting the indexer setting `enable_otlp_endpoint` to `false`. ```yaml title=node-config.yaml # ... Indexer configuration ... indexer: enable_otlp_endpoint: false ``` ================================================ FILE: docs/operating/_category_.yaml ================================================ label: 'Operating Quickwit' position: 7 collapsed: true ================================================ FILE: docs/operating/aws-costs.md ================================================ --- title: AWS Cost Optimization sidebar_position: 3 --- Quickwit has been tested on Amazon S3. This page sums up what we have learned from that experience. ## Real World Example In this [blog post](https://quickwit.io/blog/benchmarking-quickwit-engine-on-an-adversarial-dataset#indexing-costs), we indexed 23 TB of data and evaluated performance and costs. You may be able to deduce the costs of indexing and querying on your dataset. ## Data transfers costs and latency Cloud providers charge for data transfers in and out of their networks. In addition, querying an index from a remote machine adds some extra latency.
For those reasons, we recommend that you test and use Quickwit from an instance located within your cloud provider's network. ## Optimizing bandwidth with wisely chosen instances We recommend picking instances with high network performance to allow faster downloads from Amazon S3. In our experience, `c5n.2xlarge` instances offer the best bang for your buck. ## Requests cost A final note on object storage requests costs. These are [quite low](https://aws.amazon.com/s3/pricing/) actually, $0.0004 / 1000 requests for GET and $0.005 / 1000 requests for PUT on AWS S3. ### PUT requests During indexing, Quickwit uploads new splits on Amazon S3 and progressively merges them until they reach 10 million documents, at which point we call them “mature splits”. Such splits have a typical size between 1GB and 10GB and will usually require 2 PUT requests to be uploaded (1 PUT request / 5GB). With default indexing parameters `commit_timeout_secs` of 60 seconds and `merge_policy.merge_factor` of 10 and assuming you want to ingest 1 million documents every minute, this will cost you less than $1 / month. ### GET requests When querying, Quickwit needs to make multiple GET requests: ```jsx #num requests = #num splits * ((#num search fields * #num terms * 3) + (#num search fields with fieldnorms enabled) + 1 (timestamp fast field if present)) + #num docs returned ``` The above formula assumes that the hotcache is cached, which will be loaded after the first query for every split. `#num splits` can be reduced with [pruning](../overview/concepts/querying.md). When positions are not enabled, only 2 GET requests will be executed per term. These requests costs could add up quickly if you have a high number of splits or QPS > 10. Don't hesitate to [contact us](mailto:hello@quickwit.io) if this is the case :).
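To get a feel for the numbers, the GET formula above can be turned into a small estimator. This is a rough sketch, not something Quickwit ships: the function and parameter names are invented for the example, the price constant is the GET price quoted on this page, and the formula still assumes the hotcache is already cached.

```python
# GET price quoted above for AWS S3, in USD per 1000 requests.
GET_PRICE_PER_1000_REQUESTS_USD = 0.0004


def estimate_get_requests(
    num_splits,
    num_search_fields,
    num_terms,
    num_fieldnorm_fields=0,
    has_timestamp_fast_field=True,
    num_docs_returned=0,
    positions_enabled=True,
):
    """Apply the GET request formula for a single query (hypothetical helper)."""
    # 3 requests per term, or 2 when positions are not enabled.
    requests_per_term = 3 if positions_enabled else 2
    per_split = (
        num_search_fields * num_terms * requests_per_term
        + num_fieldnorm_fields
        + (1 if has_timestamp_fast_field else 0)
    )
    return num_splits * per_split + num_docs_returned


def estimate_get_cost_usd(num_requests):
    """Convert a number of GET requests into a cost in USD."""
    return num_requests / 1000 * GET_PRICE_PER_1000_REQUESTS_USD
```

For instance, a query with 2 terms over 2 search fields (one of them with fieldnorms) that hits 100 splits and returns 20 documents gives `estimate_get_requests(100, 2, 2, 1, True, 20)`, that is 1420 GET requests. Multiplying by your query rate shows why pruning splits matters at high QPS.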
================================================ FILE: docs/operating/data-directory.md ================================================ --- title: Data directory sidebar_position: 1 --- Quickwit operates on a local directory to store temporary files. By default, this working directory is named `qwdata` and placed right next to the Quickwit binary. Let's have a look at how Quickwit organizes the data directory. ## Data directory layout When operating Quickwit, you will end up with the following tree: ```bash qwdata ├── cache │ └── splits | ├── 03BSWV41QNGY5MZV9JYAE1DCGA.split │ └── 01GSWV41QNGY5MZV9JYAE1DCB7.split ├── delete_task_service │ └── wikipedia%01H13SVKDS03P%TpCfrA ├── indexing │ ├── wikipedia%01H13SVKDS03P%_ingest-api-source%RbaOAI │ └── wikipedia%01H13SVKDS03P%kafka-source%cNqQtI ├── wal │ ├── wal-00000000000000000056 │ └── wal-00000000000000000057 └── queues ├── partition_id ├── wal-00000000000000000028 └── wal-00000000000000000029 ``` ### `/queues` and `/wal` directories These directories are created only if the ingest API service is running on your node. They contain write ahead log files of the ingest API to guard against data loss. The `/queues` directory is used by the legacy version of the ingest (sometimes referred to as ingest V1). It is meant to be phased out in upcoming versions of Quickwit. Learn more about ingest API versions [here](../ingest-data/ingest-api.md#ingest-api-versions). The log file is truncated when Quickwit commits a split (piece of index), which means that the split is stored on the storage and its metadata are in the metastore. You can configure `max_queue_memory_usage` and `max_queue_disk_usage` in the [node config file](../configuration/node-config.md#ingest-api-configuration) to limit the max disk usage. ### `/indexing` directory This directory holds the local indexing directory of each indexing source of each index managed by Quickwit. 
In the above tree, you can see two directories corresponding to the `wikipedia` index, which means that index is currently handling two sources. ### `/delete_task_service` directory This directory is used by the Janitor service to apply deletes on indexes. During this process, splits are downloaded, and a new split is created while applying deletes and then uploaded to the target storage. This directory gets created only on nodes running the Janitor service. ### `/cache` directory This directory is used for caching splits that will undergo a merge operation to save disk IOPS. Splits are evicted if they are older than two days. If cache limits are reached, the oldest splits are evicted. You can [configure](../configuration/node-config#indexer-configuration) the number of splits the cache can hold with `split_store_max_num_splits` and limit the overall size in bytes of splits with `split_store_max_num_bytes`. ### `/searcher-split-cache` directory This directory is used by searcher nodes to cache entire splits and reduce calls to the object store. It won't be created unless you set the `split_cache` fields in the [searcher configuration](../configuration/node-config.md#searcher-configuration). ## Setting the right splits cache limits Caching splits saves disk IOPS when Quickwit needs to merge splits. Setting the right limits will depend on your [merge policy](../configuration/index-config.md#merge-policies) and the number of partitions you are using. The default splits cache limits should fit most use cases. ### Splits cache with the default configuration For a given index, Quickwit commits one split every minute and uses the "Stable log" [merge policy](../configuration/index-config.md#merge-policies). By default, this merge policy merges splits in groups of 10, 11, or 12 until splits have more than 10 million documents. A split will typically undergo 3 or 4 merges, after which it is considered mature and evicted from the cache.
The following table shows how many splits will be created after a given amount of time assuming a 20MB/s ingestion rate with a compression ratio of 0.5: | Time (minutes) | Number of splits | Splits size (GB) | | -------------- | -------------------------------------- | ----------- | | 1 | 1 | 0.6 GB | | 2 | 2 | 1.2 GB | | 10 | 10 | 6 GB | | 11 | 1 + 1 (merged once) | 6.6 GB | | 21 | 1 + 2 (merged once) | 12.6 GB | | 91 | 1 + 9 (merged once) | 54.6 GB | | 101 | 1 + 1 (merged twice) | 60.6 GB | | 111 | 2 + 1 (merged once) + 1 (merged twice) | 66.6 GB | | 201 | 1 + 0 (merged once) + 2 (merged twice) | 120.6 GB | | .. | ... | | In this case, the default cache limits of 1000 splits and 100GB are good enough to avoid downloading splits from the storage for the first two merges. This is perfectly fine for a production use case. You may want to increase the splits cache size to avoid any split download. You can monitor the download rate with our [indexers dashboard](monitoring.md). ### Splits cache with partitioning When using [partitions](../overview/concepts/querying.md#partitioning), Quickwit will create one split per partition and the number of splits can add up very quickly. Let's take a concrete example with the following assumptions: - A [commit timeout](../configuration/index-config.md#indexing-settings) of 1 minute. - A partitioning that has 100 partitions. Quickwit will create 100 splits per minute assuming a document of each partition is ingested in one minute. - A merge policy that merges splits of same partition as soon as there is 10 splits. 
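The evolution of the number of local splits under these assumptions can be reproduced with a short simulation. This is a simplified model written for this page, not Quickwit's actual merge planner: it assumes every partition receives one split per minute, that a merge of exactly 10 splits of the same level completes before the next commit, and it only tracks splits merged up to twice. The function name is hypothetical.

```python
def simulate_split_counts(minutes, num_partitions=100, merge_factor=10, max_merge_level=2):
    """Return [unmerged, merged once, merged twice] split counts after `minutes`."""
    # counts[level] is the number of splits merged `level` times, per partition.
    counts = [0] * (max_merge_level + 1)
    for _ in range(minutes):
        # Merges triggered by the previous commits complete first,
        # cascading from the lowest merge level to the highest one.
        for level in range(max_merge_level):
            if counts[level] >= merge_factor:
                counts[level] -= merge_factor
                counts[level + 1] += 1
        # Then each partition commits one new split.
        counts[0] += 1
    # All partitions behave identically in this model.
    return [count * num_partitions for count in counts]
```

For example, `simulate_split_counts(11)` returns `[100, 100, 0]` and `simulate_split_counts(201)` returns `[100, 0, 200]`. Changing `num_partitions` or `merge_factor` gives a quick way to size `split_store_max_num_splits` for another setup.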
The following table shows how many splits will be created after a given amount of time:

| Time (minutes) | Number of splits                              |
| -------------- | --------------------------------------------- |
| 1              | 100                                           |
| 2              | 200                                           |
| 10             | 1000                                          |
| 11             | 100 + 100 (merged once)                       |
| 21             | 100 + 200 (merged once)                       |
| 91             | 100 + 900 (merged once)                       |
| 100            | 1000 + 900 (merged once)                      |
| 101            | 100 + 0 (merged once) + 100 (merged twice)    |
| 200            | 1000 + 900 (merged once) + 100 (merged twice) |
| 201            | 100 + 0 (merged once) + 200 (merged twice)    |

With these assumptions, you have to set `split_store_max_num_splits` to at least 1000 to avoid downloading splits from the storage for the first merge operation. As merging can take some time, you should set `split_store_max_num_splits` to a value that can hold all the splits that are not yet merged plus the incoming splits; a value of 1100 splits should be enough.

If you want to store splits until the second merge, a limit of 2500 splits should be good enough.

## Troubleshooting with a huge number of local splits

When starting, Quickwit scans all the splits in the cache directory to know which splits are present locally. This can take a few minutes if you have tens of thousands of splits. On Kubernetes, as your pod can be restarted if it takes too long to start, you may want to clean up the data directory or increase the liveness probe timeout.
Also, please report such behavior on [GitHub](https://github.com/quickwit-oss/quickwit) as we can certainly optimize this start phase.

================================================
FILE: docs/operating/monitoring.md
================================================
---
title: Monitoring with Grafana
sidebar_position: 2
---

You can monitor your Quickwit cluster with Grafana.

Follow the tutorial at [Quickwit Monitoring with Grafana](../get-started/tutorials/prometheus-metrics) on how to set it up.
We provide three Grafana dashboards to help you monitor: - [indexers performance](https://github.com/quickwit-oss/quickwit/blob/main/monitoring/grafana/dashboards/indexers.json) - [searchers performance](https://github.com/quickwit-oss/quickwit/blob/main/monitoring/grafana/dashboards/searchers.json) - [metastore queries](https://github.com/quickwit-oss/quickwit/blob/main/monitoring/grafana/dashboards/metastore.json) Dashboards rely on a prometheus datasource fed with [Quickwit metrics](../reference/metrics.md). ## Screenshots ![Indexers Grafana Dashboard](../assets/images/screenshot-indexers-grafana-dashboard.png) ![Searchers Grafana Dashboard](../assets/images/screenshot-searchers-grafana-dashboard.png) ![Metastore Grafana Dashboard](../assets/images/screenshot-metastore-grafana-dashboard.png) ================================================ FILE: docs/operating/upgrades.md ================================================ --- title: Version upgrade sidebar_position: 4 --- ## Migration from 0.6.x to 0.7.0 The format of the index and internal objects stored in the metastore of 0.7 is backward compatible with 0.6. If you are using the OTEL indexes and ingesting data into indexes the `otel-logs-v0_6` and `otel-traces-v0_6`, you must stop indexing before upgrading. Indeed, the first time you start Quickwit 0.7, it will update the doc mapping fields of Trace ID and Span ID of those two indexes by changing their input/output formats from `base64` to `hex`. This is automatic: you don't have to perform any manual operation. Quickwit 0.7 will also create the new index `otel-traces-v0_7`, which is now used by default when ingesting data with the OTEL gRPC and HTTP API. The Jaeger gRPC and HTTP APIs will query both `otel-traces-v0_6` and `otel-traces-v0_7` by default. It's possible to define the index ID you want to use for OTEL gRPC endpoints and Jaeger gRPC API by setting the request header `qw-otel-logs-index` or `qw-otel-traces-index` to the index ID you want to target. 
## Migration from 0.7.0 to 0.7.1

Quickwit 0.7.1 will create the new index `otel-logs-v0_7` which is now used by default when ingesting data with the OTEL gRPC and HTTP API.

In the traces index `otel-traces-v0_7`, the `service_name` field is now `fast`.
No migration is done if `otel-traces-v0_7` already exists. If you want the `service_name` field to be `fast`, you first have to delete the existing `otel-traces-v0_7` index or create your own index.

## Migration from 0.8 to 0.9

Quickwit 0.9 introduces a new ingestion service to power the ingest and bulk APIs (v2). The new ingest is enabled and used by default, even though the legacy one (v1) remains enabled to finish indexing residual data in the legacy write ahead logs. Note that `ingest_api.max_queue_disk_usage` is enforced on both ingest versions separately, which means that the cumulated disk usage might be up to twice this limit.

When upgrading to 0.9, we recommend performing a full cluster restart.

Shutdown order:
1) indexers, searchers and janitor
2) control plane
3) metastores

Start up order:
1) metastores
2) control plane
3) indexers, searchers and janitor

================================================
FILE: docs/overview/_category_.yaml
================================================
label: 'Introduction'
position: 1
collapsed: true

================================================
FILE: docs/overview/architecture.md
================================================
---
title: Architecture
sidebar_position: 2
---

Quickwit distributed search engine relies on 4 major services and one maintenance service:

- The Searchers for executing search queries from the REST API.
- The Indexers that index data from data sources.
- The Metastore that stores the index metadata in a PostgreSQL-like database or in a cloud storage file.
- The Control plane that schedules indexing tasks to the indexers.
- The Janitor that executes periodic maintenance tasks.
Moreover, Quickwit leverages existing infrastructure by relying on battle-tested technologies for index storage, metadata storage, and ingestion:

- Cloud storage like AWS S3, Google Cloud Storage, Azure Blob Storage or other S3 compatible storage for index storage.
- PostgreSQL for metadata storage.
- Distributed queues like Kafka and Pulsar for ingestion.

## Architecture diagram

The following diagram shows a Quickwit cluster with its four major components and the janitor whose role is to execute periodic maintenance tasks. See the [Janitor section](#janitor) for more details.

![Quickwit Architecture](../assets/images/quickwit-architecture-light.svg#gh-light-mode-only)![Quickwit Log Management](../assets/images/quickwit-architecture-dark.svg#gh-dark-mode-only)

## Index & splits

A Quickwit index stores documents and makes it possible to query them efficiently. The index organizes documents into a collection of smaller independent indexes called **splits**.

A document is a collection of fields. Fields can be stored in different data structures:

- an inverted index, which enables fast full-text search.
- a columnar storage called `fast field`. It is the equivalent of doc values in [Lucene](https://lucene.apache.org/). Fast fields are required to compute aggregates over the documents matching a query. They can also allow some advanced types of filtering.
- a row-storage called the doc store. It makes it possible to get the content of the matching documents.

You can configure your index to control how to map your JSON object to a Quickwit document and, for each field, define whether it should be stored, indexed, or be a fast field. [Learn how to configure your index](../configuration/index-config.md)

### Splits

A split is a small piece of an index identified by a UUID. For each split, Quickwit adds a `hotcache` file alongside the index files. This **hotcache** is what makes it possible for Searchers to open a split in less than 60ms, even on high latency storage.
The Quickwit index is aware of its splits by keeping splits metadata, notably: - the split state which indicates if the split is ready for search - the min/max time range computed on the timestamp field if present. This timestamp metadata can be handy at query time. If the user specifies a time range filter to their query, Quickwit will use it to **prune irrelevant splits**. Index metadata needs to be accessible by every instance of the cluster. This is made possible thanks to the `metastore`. ### Index storage Quickwit stores the indexes data (splits files) on cloud storage (AWS S3, Google Cloud Storage, Azure Blob Storage or other S3 compatible storage) and also on local disk for single-server deployment. ## Metastore Quickwit gathers index metadata into a metastore to make them available across the cluster. On the write path, indexers push index data on the index storage and publish metadata to the metastore. On the read path, for a given query on a given index, a search node will ask the metastore for the index metadata and then use it to do the query planning and finally execute the plan. In a clustered deployment, the metastore is typically a traditional RDBMS like PostgreSQL which we only support today. In a single-server deployment, it’s also possible to rely on a local file or on Amazon S3. ## Quickwit cluster and services ### Cluster formation Quickwit uses [chitchat](https://github.com/quickwit-oss/chitchat), a cluster membership protocol with failure detection implemented by Quickwit. The protocol is inspired by Scuttlebutt reconciliation and phi-accrual detection, ideas borrowed from Cassandra and DynamoDB. [Learn more on chitchat](https://github.com/quickwit-oss/chitchat). ### Indexers See [dedicated indexing doc page](./concepts/indexing.md). ### Searchers Quickwit's search cluster has the following characteristics: - It is composed of stateless nodes: any node can answer any query about any splits. 
- A node can distribute search workload to other nodes.
- Load-balancing is made with rendezvous hashing to allow for efficient caching.

This design provides high availability while keeping the architecture simple.

**Workload distribution: root and leaf nodes**

Any search node can handle any search request. A node that receives a query will act as the root node for the span of the request. It will then process it in 3 steps:

- Get the index metadata from the metastore and identify the splits relevant to the query.
- Distribute the split workload among the nodes of the cluster. These nodes assume the role of leaf nodes.
- Wait for results from leaf nodes, merge them, and return the aggregated results.

**Stateless nodes**

Quickwit cluster distributes search workloads while keeping nodes stateless.

Thanks to the hotcache, opening a split on Amazon S3 only takes 60ms. It makes it possible to remain totally stateless: a node does not need to know anything about the indexes. Adding or removing nodes takes seconds and does not require moving data around.

**Rendezvous hashing**

The root node uses [Rendezvous hashing](https://en.wikipedia.org/wiki/Rendezvous_hashing) to distribute the workload among leaf nodes. Rendezvous hashing makes it possible to define a node/split affinity function with excellent stability properties when a node joins or leaves the cluster. This trick unlocks efficient caching.

Learn more about query internals on the [querying doc page](./concepts/querying.md).

### Control plane

The control plane service schedules indexing tasks to indexers. The scheduling is executed when the scheduler receives external or internal events and under certain conditions:

- The scheduler listens to metastore events: source create, delete, toggle, or index delete. On each of these events, it will schedule a new plan, named the `desired plan`, and send indexing tasks to the indexers.
- On every `HEARTBEAT` (3 seconds), the scheduler controls if the `desired plan` and the indexing tasks running on indexers are in sync. If not, it will reapply the desired plan to indexers. - Every minute, the scheduler rebuilds a plan with the latest metastore state, and if it differs from the last applied plan, it will apply the new one. This is necessary as the scheduler may have not received all metastore events due to network issues. ### Janitor The Janitor service runs maintenance tasks on indexes: garbage collection, delete query tasks, and retention policy tasks. ## Data sources Quickwit supports [multiple sources](../ingest-data/) to ingest data from. A file is ideal for a one-time ingestion like an initial load, the ingest API or a message queue are ideal to continuously feed data into the system. Quickwit indexers connect directly to external message queues like Kafka, Pulsar or Kinesis and guarantee the exactly-once semantics. If you need support for other distributed queues, please vote for yours [here](https://github.com/quickwit-oss/quickwit/issues/1000). ================================================ FILE: docs/overview/concepts/_category_.yaml ================================================ label: 'Advanced concepts' position: 3 collapsed: true ================================================ FILE: docs/overview/concepts/deletes.md ================================================ --- title: Deletes sidebar_position: 3 --- Quickwit supports deletes thanks to the [delete API](../../reference/rest-api.md#delete-api). It's important to note that this feature is mainly intended to comply with GDPR (General Data Protection Regulation) and should be used parsimoniously as deletes are expensive: typically a few queries per hour or day is recommended. ## Delete tasks A delete task on a given index is executed on all splits created before the delete task creation. 
This can be a long-running task that could last several hours if the delete query matches documents present in many splits. To track the progress of the execution, each delete task is given a unique and incremental identifier called "operation stamp" or `opstamp`. All existing splits will undergo a delete operation and, after its success, each split metadata will be updated with the corresponding operation stamp. All splits created after the creation of a delete task will have an `opstamp` greater than or equal to the `opstamp` of the delete task (greater if other delete tasks have been created at the same moment).

Quickwit batches delete operations on a given split: for example, if a split has a delete `opstamp = n` and the last created delete task has an `opstamp = n + 10`, ten (10) delete queries will be executed at once on the split.

## Delete API

Delete tasks are created through the [Delete REST API](../../reference/rest-api.md#delete-api).

## Pitfalls

### Immature splits

Delete operations are applied only to “mature” splits, that is, splits that will no longer undergo merges. Whether a split is mature depends on the [merge policy](../../configuration/index-config.md#merge-policies). It is possible to define a `maturation_period` after which a split will be mature. Thus, a delete request created at `t0` will first apply deletes to mature splits and, in the worst case, will wait until `t0 + maturation_period` for immature splits to become mature.

### Monitoring and dev XP

It's currently not possible to monitor delete operations. An [issue](https://github.com/quickwit-oss/quickwit/issues/2494) is open to improve the dev experience; don't hesitate to add your comments to it and follow its progress.
================================================ FILE: docs/overview/concepts/indexing.md ================================================ --- title: Indexing sidebar_position: 1 --- ## Supported data formats Quickwit ingests JSON records and refers to them as "documents" or "docs". Each document must be a JSON object. When ingesting files, documents must be separated by a newline. Quickwit does not yet support file formats such as `Avro` or `CSV`. Compression formats such as `bzip2` or `gzip` are also not supported yet. ## Data model Quickwit supports both schemaless indexes and fixed schemas. The "document mapping" of an index, also commonly called "doc mapping", is a list of field names and types that declares the schema of an index. For a schemaless or mixed fixed schema and schemaless indexing, follow our [guide on schemaless indexing](../../guides/schemaless.md). Additionally, a doc mapping specifies how documents are indexed (tokenizers) and stored (column-oriented vs. row-oriented). ## Merge process and merge policy An index is broken into immutable splits. The size of a split is defined by the number of documents it carries. A split is considered "mature" when its size reaches a threshold defined in the index config as `split_num_docs_target`. An indexer buffers incoming documents and produces a new split when the size of the buffer reaches `split_num_docs_target` or `commit_timeout_secs` seconds have passed since the first document has been enqueued, depending on which event occurs first. In the latter case, the indexer generates immature splits. The merge process designates the iterative procedure that groups and merges immature splits together to produce mature splits. The merge policy controls the merge algorithm, which is mainly driven by the two parameters `split_num_docs_target` and `merge_factor`. 
Each time a new split is published, the merge policy examines the list of immature splits and attempts to merge `merge_factor` splits together in order to produce larger splits. The merge policy may also decide to merge fewer or more splits together if deemed necessary. Finally, the merge algorithm never merges more than `max_merge_factor` splits together. ### Split store The split store is a cache that keeps recently published and immature splits on disk to speed up the merge process. After a successful merge phase, the split store evicts dangling splits. The disk space allocated to the split store is controlled by the config parameters `split_store_max_num_splits` and `split_store_max_num_bytes`. ## Data sources A data source designates the location and set of parameters that allow to connect to and ingest data from an external data store, which can be a file, a stream, or a database. Often, Quickwit simply refers to data sources as "sources". The indexing engine supports local adhoc file ingests using [the CLI](/docs/reference/cli#tool-local-ingest) and streaming sources (e.g. the Kafka source). Quickwit can insert data into an index from one or multiple sources. More details can be found [in the source configuration page](https://quickwit.io/docs/configuration/source-config). ## Checkpoint Quickwit achieves exactly-once processing using checkpoints. For each source, a "source checkpoint" records up to which point documents have been processed in the target file or stream. Checkpoints are stored in the metastore and updated atomically each time a new split is published. When an indexing error occurs, the indexing process is resumed right after the last successfully published checkpoint. Internally, a source checkpoint is represented as an object mapping from absolute paths or partition IDs to offsets or sequence numbers. 
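
The mapping described above can be pictured with a short sketch. This is an illustrative model only, not Quickwit's internal data structure: the class name `SourceCheckpoint`, the method `try_apply_delta`, and the `(from_offset, to_offset)` delta shape are assumptions made for the example.

```python
class SourceCheckpoint:
    """Hypothetical in-memory model of a source checkpoint.

    Maps a partition ID (or an absolute file path) to the offset up to
    which documents have been indexed and published.
    """

    def __init__(self):
        self.positions = {}

    def try_apply_delta(self, delta):
        """Advance the checkpoint with `delta`, or raise if it does not line up.

        `delta` maps a partition ID to a `(from_offset, to_offset)` pair.
        The whole delta is validated before anything is written, so the
        update is all-or-nothing.
        """
        for partition_id, (from_offset, _) in delta.items():
            current = self.positions.get(partition_id, 0)
            if from_offset != current:
                raise ValueError(
                    f"delta for {partition_id!r} starts at {from_offset}, "
                    f"but the checkpoint is at {current}"
                )
        for partition_id, (_, to_offset) in delta.items():
            self.positions[partition_id] = to_offset


if __name__ == "__main__":
    checkpoint = SourceCheckpoint()
    checkpoint.try_apply_delta({"topic-partition-0": (0, 100)})
    try:
        # Replaying the same batch is rejected, so documents are not indexed twice.
        checkpoint.try_apply_delta({"topic-partition-0": (0, 100)})
    except ValueError as error:
        print("rejected:", error)
    print(checkpoint.positions)
```

Rejecting a delta that does not start at the recorded position is what, in this sketch, prevents a replayed batch from being published a second time.
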
================================================ FILE: docs/overview/concepts/querying.md ================================================ --- title: Querying sidebar_position: 2 --- A search query received by a searcher will be executed using a map-reduce approach following these steps: 1. The Searcher identifies relevant splits based on the request’s [timestamp interval](#time-sharding) and [tags](#tag-pruning). 2. It distributes the splits workload among other searchers available in the cluster using *[rendez-vous hashing](https://en.wikipedia.org/wiki/Rendezvous_hashing)* to optimize caching and load. 3. It finally waits for all results, merges them, and returns them to the client. A search stream query follows the same execution path as for a search query except for the last step: instead of waiting for each Searcher's result, the searcher streams the results as soon as it starts receiving some from a searcher. ### **Time sharding** On datasets with a time component, Quickwit will shard data into timestamp-aware splits. With this feature, Quickwit is capable of filtering out most splits before they can make it to the query processing stage, thus reducing drastically the amount of data needed to process the query. The following query parameters are available to apply timestamped pruning to your query: - `startTimestamp`: restricts search to documents with a `timestamp >= start_timestamp` - `endTimestamp`: restricts search to documents with a `timestamp < end_timestamp` ### Tag pruning Quickwit also provides pruning on a second dimension called `tags`. By [setting a field as tagged](../../configuration/index-config.md) Quickwit will generate split metadata at indexing in order to filter splits that match requested tags at query time. Note that this metadata is only generated when the cardinality of the field is less than 1,000. Tag pruning is notably useful on multi-tenant datasets. 
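
Both pruning dimensions can be illustrated with a minimal sketch that filters split metadata before any split is opened. The `SplitMetadata` fields, the `prune_splits` function, and the `field:value` tag format are hypothetical names chosen for the example; they are not taken from Quickwit's code.

```python
from dataclasses import dataclass, field


@dataclass
class SplitMetadata:
    """Hypothetical subset of the metadata kept for each split."""
    split_id: str
    min_timestamp: int  # inclusive
    max_timestamp: int  # inclusive
    tags: set = field(default_factory=set)


def prune_splits(splits, start_timestamp=None, end_timestamp=None, required_tag=None):
    """Keep only the splits that may contain matching documents."""
    kept = []
    for split in splits:
        # Time pruning: the query interval is [start_timestamp, end_timestamp).
        if start_timestamp is not None and split.max_timestamp < start_timestamp:
            continue
        if end_timestamp is not None and split.min_timestamp >= end_timestamp:
            continue
        # Tag pruning: skip splits that never saw the requested tag value.
        if required_tag is not None and required_tag not in split.tags:
            continue
        kept.append(split)
    return kept


if __name__ == "__main__":
    splits = [
        SplitMetadata("a", 0, 99, {"tenant_id:1"}),
        SplitMetadata("b", 100, 199, {"tenant_id:1", "tenant_id:2"}),
        SplitMetadata("c", 200, 299, {"tenant_id:2"}),
    ]
    print([s.split_id for s in prune_splits(splits, 100, 200)])
    print([s.split_id for s in prune_splits(splits, required_tag="tenant_id:2")])
```

Note that pruning is conservative: a split that survives the filter may still contain no matching document, but a pruned split is guaranteed to contain none.
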
### Partitioning

Quickwit makes it possible to route documents into different splits based on a partitioning key.

This feature is especially useful in a context where documents with different tags are all mixed together in the same source (usually a Kafka topic).

In that case, simply marking the field as a tag will have no positive effect on search, as all produced splits will contain almost all tags.

The `partition_key` attribute (defined in the doc mapping) lets you configure the logic used by Quickwit to route documents into isolated splits. Quickwit will also enforce this isolation during merges. This functionality is, in a sense, similar to sharding.

Quickwit supports a simple DSL for partitioning described in the next section.

Partition & tags are often used to:
- separate `tenants` in a multi-tenant application
- separate `team` or `application` in an observability logging use case.

Emitting many splits can heavily stress an `indexer`. For this reason, another parameter of the doc mapping called `max_num_partitions` acts as a safety valve. If the number of partitions is about to exceed `max_num_partitions`, a single extra partition is created and all extra partitions will be grouped together into this special partition.

If you are expecting 20 partitions, we strongly recommend that you do not set `max_num_partitions` to 20, but instead use a larger value (200 for instance). Quickwit should handle that number of partitions smoothly, and it will prevent documents belonging to different partitions from being grouped together due to a few faulty documents.

### Partition key DSL

Quickwit allows you to configure how documents are routed with a simple DSL.
Here are some sample expressions with a short description of their result:

- `tenant_id`: create one partition per tenant\_id
- `tenant_id,app_id`: create one partition per unique combination of tenant\_id and app\_id
- `tenant_id,hash_mod(app_id, 8)`: for each tenant, create up to 8 partitions, each containing data related to some applications
- `hash_mod((tenant_id,app_id), 50)`: create 50 partitions in total, each containing some combinations of tenants and apps.

The partition key DSL is generated by this grammar:

```
RoutingExpr := RoutingSubExpr [ , RoutingExpr ]
RoutingSubExpr := Identifier [ \( Arguments \) ]
Identifier := FieldChar [ Identifier ]
FieldChar := { a..z | A..Z | 0..9 | _ }
Arguments := Argument [ , Arguments ]
Argument := { \( RoutingExpr \) | RoutingSubExpr | DirectValue }
# We may want other DirectValue in the future
DirectValue := Number
Number := { 0..9 } [ Number ]
```

Supported functions are currently:
- `hash_mod(RoutingExpr, Number)`: hash `RoutingExpr` and divide the result by `Number`, keeping only the remainder.

When using `hash_mod` with a tuple of keys like in `hash_mod((tenant_id,app_id), 50)`, beware that it might route together documents in a way that makes tags less effective.
For instance, if tenant\_1,app\_1 and tenant\_2,app\_2 are both sent to partition one, but tenant\_1,app\_2 is sent to partition two, a query for tenant\_1,app\_2 will still search inside the 1st partition as it will be tagged with tenant\_1,tenant\_2,app\_1 and app\_2. You should therefore prefer a partition key such as `hash_mod(tenant_id, 10),hash_mod(app_id, 5)` which will generate as many splits, but with better tags.

### Caching

Quickwit does caching in many places to deliver a highly performing query engine.

In memory:
- Hotcache caching: A static cache that holds information about a split file internal representation. It helps speed up the opening of a split file. Its size can be defined via the `split_footer_cache_capacity` configuration parameter.
- Fast field caching: Fast fields tend to be accessed very frequently by users, especially for stream requests. They are cached in RAM, in a cache whose size can be limited by the `fast_field_cache_capacity` configuration value.
- Partial request caching: In some cases, like when using dashboards, some very similar requests might be issued, with only timestamp bounds changing. Some partial results can be cached to make these requests faster and issue fewer requests to the storage. They are cached in RAM, in a cache whose size can be limited by the `partial_request_cache_capacity` configuration value.

On disk:
- The split cache stores entire splits on disk. It can be enabled by setting the `split_cache` configuration fields. This cache can help reduce object store costs and load. Searchers populate this cache when splits are created or queried and evict them with a simple LRU strategy.

Learn more about cache parameters in the [searcher configuration docs](../../configuration/node-config.md#searcher-configuration).

### Scoring

Quickwit supports sorting docs by their BM25 scores. In order to query by score, [fieldnorms](../../configuration/index-config.md#text-type) must be enabled for the field. By default, BM25 scoring is disabled to improve query latencies, but it can be enabled by setting the `sort_by` option to `_score` in queries.

### Document ID

Each document in Quickwit is assigned a unique document ID, which is a combination of the split ID and the Tantivy DocId within the split. This implies that you cannot assign a custom ID and that the ID changes when splits undergo merges. This ID is used for every search query as sort order (after the explicitly specified sort values) to make the results deterministic.
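
The tie-breaking role of the document ID can be shown with a small sketch. The tuple layout `(sort_value, split_id, doc_id)` and the ascending order are assumptions made for illustration; they do not describe Quickwit's actual hit representation.

```python
def sort_hits(hits):
    """Order hits by the explicit sort value, then by document ID.

    Each hit is a hypothetical `(sort_value, split_id, doc_id)` tuple. The
    `(split_id, doc_id)` pair acts as the tie-breaker, so hits with equal
    sort values always come back in the same order.
    """
    return sorted(hits, key=lambda hit: (hit[0], hit[1], hit[2]))


if __name__ == "__main__":
    hits = [
        (10, "split-b", 3),
        (10, "split-a", 7),
        (5, "split-b", 1),
        (10, "split-a", 2),
    ]
    # The order of the input does not influence the final ordering.
    print(sort_hits(hits))
    print(sort_hits(list(reversed(hits))))
```

Without the `(split_id, doc_id)` tie-breaker, the three hits sharing the sort value `10` could come back in any order, depending on which leaf node answers first.
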
================================================ FILE: docs/overview/index.md ================================================ --- title: Quickwit documentation slug: / sidebar_position: 1 --- import CallToAction from '@theme/CallToAction'; Quickwit is the first engine to execute complex search and analytics queries directly on cloud storage with sub-second latency. Powered by Rust and its decoupled compute and storage architecture, it is designed to be resource-efficient, easy to operate, and scale to petabytes of data. Quickwit is a great fit for log management, distributed tracing, and generally immutable data such as conversational data (emails, texts, messaging platforms) and event-based analytics. ## Use cases - [Log management](../log-management/overview.md) - [Distributed Tracing](../distributed-tracing/overview.md) ## Key concepts - [Architecture](architecture.md) - [Indexing](concepts/indexing.md) - [Querying](concepts/querying.md) ## Reference - [Configuration](../configuration/index.md) - [REST API](../reference/rest-api.md) - [CLI](../reference/cli.md) ================================================ FILE: docs/overview/introduction.md ================================================ --- title: What is Quickwit? sidebar_position: 1 --- Quickwit is the first engine to execute complex search and analytics queries directly on cloud storage with sub-second latency. Powered by Rust and its decoupled compute and storage architecture, it is designed to be resource-efficient, easy to operate, and scale to petabytes of data. Quickwit is a great fit for log management, distributed tracing, and generally immutable data such as conversational data (emails, texts, messaging platforms) and event-based analytics. ## Why Quickwit is different from other search engines? Quickwit is designed for sub-second search straight from object storage allowing true decoupled compute and storage. 
And it means a lot for your infrastructure: - You store once for all your data on cheap, safe and unlimited storage. - You scale out your cluster in seconds, no need to move data around. - Indexing and search workloads are decoupled, you can scale them independently. - Your tenants are easily isolated and you can charge them for their usage. Quickwit is also designed to index and search semi-structured data. Its schemaless indexing allows you to index JSON document with an arbitrary amount of field without heavily impacting your performance. Aggregation are not yet supported but we are working on it, stay tuned! ## When to use Quickwit Quickwit is a great fit for log management, distributed tracing, and generally immutable data such as conversational data (emails, texts, messaging platforms), event-based analytics, audit logs, security logs, and more. Check out our guides to see how you can use Quickwit: - [Log management](../log-management/overview.md) - [Distributed Tracing](../distributed-tracing/overview.md) ## Key features - Full-text search and aggregation queries - Elasticsearch query language support - Sub-second search on cloud storage (Amazon S3, Azure Blob Storage, …) - Decoupled compute and storage, stateless indexers & searchers - [Schemaless](https://quickwit.io/docs/guides/schemaless) or strict schema indexing - Schemaless analytics - [Grafana data source](https://github.com/quickwit-oss/quickwit-datasource) - [Jaeger-native](https://quickwit.io/docs/distributed-tracing/plug-quickwit-to-jaeger) - OTEL-native for [logs](https://quickwit.io/docs/log-management/overview) and [traces](https://quickwit.io/docs/distributed-tracing/overview) - Kubernetes ready - See our [helm-chart](https://quickwit.io/docs/deployment/kubernetes) - RESTful API ### Enterprise-grade features - Multiple [data sources](../ingest-data/index.md) Kafka / Kinesis / Pulsar native - Multi-tenancy: indexing with many indexes and partitioning - Retention policies - Delete tasks (for 
GDPR use cases)
- Distributed and highly available* engine that scales out in seconds (HA indexing only with Kafka)

## When not to use Quickwit

Use cases where you would likely *not* want to use Quickwit include:

- You need a low-latency search for e-commerce websites.
- Your data is mutable.

## Time to discover Quickwit

- [Quickstart](../get-started/quickstart.md)
- [Concepts](architecture.md)
- [Last release blog post](https://quickwit.io/blog/quickwit-0.7)

================================================
FILE: docs/reference/_category_.yaml
================================================
label: 'Reference'
position: 11
collapsed: true

================================================
FILE: docs/reference/aggregation.md
================================================
---
title: Aggregations API
sidebar_position: 30
---

An aggregation summarizes your data as statistics on buckets or metrics.

Aggregations can provide answers to questions like:

- What is the average price of all sold articles?
- How many errors with status code 500 do we have per day?
- What is the average listing price of cars grouped by color?

There are two categories: [Metrics](#metric-aggregations) and [Buckets](#bucket-aggregations).

#### Prerequisite

To be able to use aggregations on a field, the field needs to have a fast field index created. A fast field index is a columnar storage, where document values are extracted and stored.

Example of creating a fast field on text for term aggregations:

```yaml
name: category
type: text
tokenizer: raw
record: basic
fast: true
```

See the [index configuration](../configuration/index-config.md) page for more details and examples.

#### API Endpoint

The endpoints for aggregations are the search endpoints:
- Quickwit API: `api/v1//search`
- Elasticsearch API: `api/v1/_elastic//_search`.

#### Format

The aggregation request and result de/serialize into elasticsearch compatible JSON.
Unless documented otherwise, you should be able to drop in your Elasticsearch aggregation queries.

Some examples below show only the payload for `aggregations` rather than the full request.

#### Example

Request

```json skip
{
  "query": "*",
  "max_hits": 0,
  "aggs": {
    "sites_and_aqi": {
      "terms": {
        "field": "County",
        "size": 2,
        "order": { "average_aqi": "asc" }
      },
      "aggs": {
        "average_aqi": {
          "avg": { "field": "AQI" }
        }
      }
    }
  }
}
```

Response

```json
...
"aggregations": {
  "sites_and_aqi": {
    "buckets": [
      {
        "average_aqi": { "value": 32.62267569707098 },
        "doc_count": 56845,
        "key": "臺東縣"
      },
      {
        "average_aqi": { "value": 35.97893635571055 },
        "doc_count": 28675,
        "key": "花蓮縣"
      }
    ],
    "sum_other_doc_count": 1872055
  }
}
```

### Supported Aggregations

 - Bucket
    - [Histogram](#histogram)
    - [DateHistogram](#date-histogram)
    - [Range](#range)
    - [Terms](#terms)
 - Metric
    - [Average](#average)
    - [Count](#count)
    - [Max](#max)
    - [Min](#min)
    - [Stats](#stats)
    - [Sum](#sum)
    - [Percentiles](#percentiles)
    - [Cardinality](#cardinality)

## Bucket Aggregations

Bucket aggregations create buckets of documents. Each bucket is associated with a rule which determines whether or not a document falls into it. In other words, the buckets effectively define document sets. Buckets are not necessarily disjoint, therefore a document can fall into multiple buckets. In addition to the buckets themselves, the bucket aggregations also compute and return the number of documents for each bucket. Bucket aggregations, as opposed to metric aggregations, can hold sub-aggregations. These sub-aggregations will be aggregated for the buckets created by their “parent” bucket aggregation. There are different bucket aggregators, each with a different “bucketing” strategy. Some define a single bucket, some define a fixed number of multiple buckets, and others dynamically create the buckets during the aggregation process.
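
To make the relationship between a bucket aggregation and its sub-aggregation concrete, here is a minimal Python sketch. It is not Quickwit's implementation: the function `terms_with_avg` and the sample documents are hypothetical, and the output shape only loosely mirrors the response shown earlier. It groups documents into one bucket per distinct value, then computes an average inside each bucket.

```python
from collections import defaultdict


def terms_with_avg(docs, bucket_field, metric_field):
    """Illustrative only: one bucket per distinct value, plus an average per bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        if bucket_field in doc:
            buckets[doc[bucket_field]].append(doc)

    result = []
    for key, members in buckets.items():
        # The sub-aggregation only sees the documents of its parent bucket.
        values = [d[metric_field] for d in members if metric_field in d]
        avg = sum(values) / len(values) if values else None
        result.append({"key": key, "doc_count": len(members), "avg": {"value": avg}})
    return result


# Hypothetical sample documents.
docs = [
    {"County": "A", "AQI": 30},
    {"County": "A", "AQI": 40},
    {"County": "B", "AQI": 50},
]
print(terms_with_avg(docs, "County", "AQI"))
```

The point of the sketch is that the metric is computed per bucket, over the document set the parent bucket defines, rather than over all matching documents.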
### Histogram A histogram is a type of bucket aggregation where documents are grouped into buckets based on a fixed interval. Each document's value is "rounded down" to the nearest bucket boundary. E.g. if we have a price 18 and an interval of 5, the document will fall into the bucket with the key 15. The formula used for this is: `((val - offset) / interval).floor() * interval + offset`. #### Histogram on datetime fields See [`DateHistogram`](#date-histogram) for more convenient API for `datetime` fields. Fields of type `datetime` are handled the same way as any numeric field. However, all values in the requests such as intervals, offsets, bounds, and range boundaries need to be expressed in milliseconds. Histogram with one bucket per day on a `datetime` field. `interval` needs to be provided in milliseconds. In the following example, we grouped documents per day (`1 day = 86400000 milliseconds`). The returned format is currently fixed at `RFC3339`. ##### Request ```json skip { "query": "*", "max_hits": 0, "aggs": { "count_per_day":{ "histogram":{ "field": "datetime", "interval": 86400000 } } } } ``` ##### Response ```json skip { ... "aggregations": { "count_per_day": { "buckets": [ { "doc_count": 1, "key": 1546300800000000.0, "key_as_string": "2019-01-01T00:00:00Z" }, { "doc_count": 2, "key": 1546560000000000.0, "key_as_string": "2019-01-04T00:00:00Z" } ] } } } ``` #### Returned Buckets By default buckets are returned between the min and max value of the documents, including empty buckets. Setting `min_doc_count > 0` will filter empty buckets. The value range of the buckets can be extended via [`extended_bounds`](#extended_bounds) or limit the range via [`hard_bounds`](#hard_bounds). #### Example ```json { "query": "*", "max_hits": 0, "aggs": { "prices": { "histogram": { "field": "price", "interval": 10 } } } } ``` #### Parameters ###### **field** The field to aggregate on. 
Currently this aggregation only works on fast fields of type `u64`, `f64`, `i64`, and `datetime`.

###### **keyed**

Changes the response format from an array to a hashmap; the `key` in the bucket will be the `key` in the hashmap.

###### **interval**

The interval used to chunk your data range. Each bucket spans a value range of [0..interval). Must be larger than 0.

###### **offset**

Intervals implicitly define an absolute grid of buckets `[interval * k, interval * (k + 1))`. Offset makes it possible to shift this grid into `[offset + interval * k, offset + interval * (k + 1))`. Offset has to be in the range [0, interval).

As an example, if there are two documents with values 8 and 12 and interval 10.0, they would fall into the buckets with the keys 0 and 10. With offset 5 and interval 10, they would both fall into the bucket with the key 5 and the range [5..15).

```json
{
  "query": "*",
  "max_hits": 0,
  "aggs": {
    "prices": {
      "histogram": {
        "field": "price",
        "interval": 10,
        "offset": 2.5
      }
    }
  }
}
```

###### **min_doc_count**

The minimum number of documents in a bucket to be returned. Defaults to 0.

###### **hard_bounds**

Limits the data range to the closed interval [min, max]. This can be used to filter values if they are not in the data range.

`hard_bounds` only limits the buckets; to force a range, set both `extended_bounds` and `hard_bounds` to the same range.

```json
{
  "query": "*",
  "max_hits": 0,
  "aggs": {
    "prices": {
      "histogram": {
        "field": "price",
        "interval": 10,
        "hard_bounds": { "min": 0, "max": 100 }
      }
    }
  }
}
```

###### **extended_bounds**

Can be set to extend your bounds. The range of the buckets is by default defined by the data range of the values of the documents. As the name suggests, this can only be used to extend the value range. If the bounds for min or max are not extending the range, the value has no effect on the returned buckets.

Cannot be set in conjunction with `min_doc_count` > 0, since the empty buckets from extended bounds would not be returned.
```json
{
  "query": "*",
  "max_hits": 0,
  "aggs": {
    "prices": {
      "histogram": {
        "field": "price",
        "interval": 10,
        "extended_bounds": { "min": 0, "max": 100 }
      }
    }
  }
}
```

### Date Histogram

`DateHistogram` is similar to `Histogram`, but it can only be used with the [datetime type](../configuration/index-config#datetime-type) and provides a more convenient API to define intervals.

Like the histogram, values are rounded down to the closest bucket.

The returned format is currently fixed at `RFC3339`.

##### Limitations

Only fixed time intervals via the `fixed_interval` parameter are supported. The parameters `interval` and `calendar_interval` are unsupported.

##### Request

```json skip
{
  "query": "*",
  "max_hits": 0,
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "sold_at",
        "fixed_interval": "30d",
        "offset": "-4d"
      }
    }
  }
}
```

##### Response

```json skip
{
  ...
  "aggregations": {
    "sales_over_time" : {
      "buckets" : [{
        "key_as_string" : "2015-01-01T00:00:00Z",
        "key" : 1420070400000,
        "doc_count" : 4
      }]
    }
  }
}
```

#### Parameters

###### **field**

The field to aggregate on. Currently this aggregation only works on fast fields of type `datetime`.

###### **keyed**

Changes the response format from an array to a hashmap; the `key` in the bucket will be the `key` in the hashmap.

###### **interval**

The interval used to chunk your data range. Each bucket spans a value range of [0..interval). Must be larger than 0.

Fixed intervals are configured with the `fixed_interval` parameter. Fixed intervals are a fixed number of SI units and never deviate, regardless of where they fall on the calendar. One second is always composed of 1000ms. This allows fixed intervals to be specified in any multiple of the supported units. However, it means fixed intervals cannot express other units such as months, since the duration of a month is not a fixed quantity. Attempting to specify a calendar interval like month or quarter returns an error.
The accepted units for fixed intervals are:

* `ms`: milliseconds
* `s`: seconds. Defined as 1000 milliseconds each.
* `m`: minutes. Defined as 60 seconds each (60_000 milliseconds).
* `h`: hours. Defined as 60 minutes each (3_600_000 milliseconds).
* `d`: days. Defined as 24 hours (86_400_000 milliseconds).

Fractional time values are not supported, but this can be addressed by shifting to another time unit (e.g., `1.5h` could instead be specified as `90m`).

###### **offset**

Intervals implicitly define an absolute grid of buckets `[interval * k, interval * (k + 1))`. Offset makes it possible to shift this grid into `[offset + interval * k, offset + interval * (k + 1))`. Offset has to be in the range [0, interval).

This is especially useful when using `fixed_interval`, to shift the first bucket, e.g. to the start of the year.

The `offset` parameter has the same syntax as the `fixed_interval` parameter, but also allows for negative values.

###### **min_doc_count**

The minimum number of documents in a bucket to be returned. Defaults to 0.

###### **hard_bounds**

Same as in [`Histogram`](#hard_bounds), but the `min` and `max` parameters need to be set as timestamps with millisecond precision.

###### **extended_bounds**

Same as in [`Histogram`](#extended_bounds), but the `min` and `max` parameters need to be set as timestamps with millisecond precision.

### Range

Provide user-defined buckets to aggregate on. Two special buckets will automatically be created to cover the whole range of values. The provided buckets have to be contiguous. During the aggregation, the values extracted from the fast field will be checked against each bucket range. Note that this aggregation includes the `from` value and excludes the `to` value for each range.

#### Limitations/Compatibility

Overlapping ranges are not yet supported.
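
The range rules described above (`from` is inclusive, `to` is exclusive, and open-ended buckets are added so the whole value range is covered) can be sketched in a few lines of Python. This is an illustration under assumptions, not Quickwit's code: `build_ranges` and `count_ranges` are hypothetical helper names, and the input is assumed to be a non-empty, sorted, non-overlapping list of ranges.

```python
import math


def build_ranges(ranges):
    """Add an open-ended bucket before the first `from` and after the last `to` if needed."""
    full = list(ranges)
    if "from" in full[0]:
        full.insert(0, {"to": full[0]["from"]})
    if "to" in full[-1]:
        full.append({"from": full[-1]["to"]})
    return full


def count_ranges(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    full = build_ranges(ranges)
    counts = [0] * len(full)
    for value in values:
        for i, r in enumerate(full):
            low = r.get("from", -math.inf)
            high = r.get("to", math.inf)
            if low <= value < high:
                counts[i] += 1
                break
    return [dict(r, doc_count=c) for r, c in zip(full, counts)]


# Hypothetical ranges and values.
ranges = [{"from": 3.0, "to": 7.0}, {"from": 7.0, "to": 20.0}]
for bucket in count_ranges([1, 3.0, 6.9, 7.0, 20.0, 25], ranges):
    print(bucket)
```

Note how the boundary values behave: `7.0` lands in the bucket starting at 7.0, not the one ending at 7.0, and `20.0` lands in the automatically added last bucket.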
##### Request ```json skip { "query": "*", "max_hits": 0, "aggs": { "my_scores": { "range": { "field": "score", "ranges": [ { "to": 3.0, "key": "low" }, { "from": 3.0, "to": 7.0, "key": "medium-low" }, { "from": 7.0, "to": 20.0, "key": "medium-high" }, { "from": 20.0, "key": "high" } ] } } } } ``` ##### Response ```json skip { ... "aggregations": { "my_scores" : { "buckets": [ {"key": "low", "doc_count": 0, "to": 3.0}, {"key": "medium-low", "doc_count": 10, "from": 3.0, "to": 7.0}, {"key": "medium-high", "doc_count": 10, "from": 7.0, "to": 20.0}, {"key": "high", "doc_count": 80, "from": 20.0} ] } } } ``` #### Parameters ###### **keyed** Change response format from an array to a hashmap, the serialized range will be the `key` in the hashmap. If a custom `key` is provided, it will be used instead. ###### **field** The field to aggregate on. Currently this aggregation only works on fast fields of type `u64`, `f64`, `i64`, and `datetime`. ###### **ranges** The list of buckets, with `from` and `to` values. The `from` value is inclusive in the range. The `to` value is not inclusive in the range. `key` is optional, and will be used as the bucket key in the response. The first bucket can omit the `from` value, and the last bucket the `to` value. Note that this aggregation includes the `from` value and excludes the `to` value for each range. Extra buckets will be created until the first `to`, and last `from`, if necessary. ### Terms Creates a bucket for every unique term and counts the number of occurrences. Request ```json skip { "query": "*", "max_hits": 0, "aggs": { "genres": { "terms": { "field": "genre" } } } } ``` Response ```json ... "aggregations": { "genres": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "drumnbass", "doc_count": 6 }, { "key": "raggae", "doc_count": 4 }, { "key": "jazz", "doc_count": 2 } ] } } ``` #### Document count error In Quickwit, we have one segment per split. 
Therefore, the results returned from a split are equivalent to the results returned from a segment. To improve performance, results from one split are cut off at `shard_size`. When combining results of multiple splits, terms that don't make it into the top n of a result from a split increase the theoretical upper bound error by the lowest term count.

Even with a larger `shard_size` value, `doc_count` values for a terms aggregation may be approximate. As a result, any sub-aggregations on the terms aggregation may also be approximate. `sum_other_doc_count` is the number of documents that didn’t make it into the top `size` terms. If this is greater than 0, the terms aggregation had to throw away some buckets, either because they didn’t fit into `size` on the root node or they didn’t fit into `shard_size` on the leaf node.

#### Per bucket document count error

If you set the `show_term_doc_count_error` parameter to true, the terms aggregation will include `doc_count_error_upper_bound`, which is an upper bound to the error on the `doc_count` returned by each split. It’s the sum of the size of the largest bucket on each split that didn’t fit into `shard_size`.

#### Parameters

###### **field**

The field to aggregate on. Currently the terms aggregation only works on fast fields of type `text`, `f64`, `i64`, and `u64`.

###### **size**

By default, the top 10 terms with the most documents are returned. Larger values for `size` are more expensive.

###### **shard_size**

To obtain more accurate results, we fetch more than `size` terms from each segment/split.

Increasing this value will enhance accuracy but will also increase CPU/memory usage. Refer to the [`document count error`](#document-count-error) section for more information on how `shard_size` impacts accuracy.

`shard_size` represents the number of terms that are returned from one split. For example, if there are 100 splits and `shard_size` is set to 1000, the root node may receive up to 100_000 terms to merge.
Assuming an average cost of 50 bytes per term, this would require up to 5MB of memory. The actual number of terms sent to the root depends on the number of splits handled by one node and how the intermediate results can be merged (e.g., the cardinality of the terms). Note on differences between Quickwit and Elasticsearch: * Unlike Elasticsearch, Quickwit does not use global ordinals, so serialized terms need to be sent to the root node. * The concept of shards in Elasticsearch differs from splits in Quickwit. In Elasticsearch, a shard contains up to 200M documents and is a collection of segments. In contrast, a Quickwit split comprises a single segment, typically with 5M documents. Therefore, `shard_size` in Elasticsearch applies to a group of segments, whereas in Quickwit, it applies to a single segment. Defaults to `size * 10`. ###### **show_term_doc_count_error** If you set the show_term_doc_count_error parameter to true, the terms aggregation will include doc_count_error_upper_bound, which is an upper bound to the error on the doc_count returned by each split. It’s the sum of the size of the largest bucket on each split that didn’t fit into `shard_size`. Defaults to true when ordering by count desc. ###### **min_doc_count** Filter all terms that are lower than `min_doc_count`. Defaults to 1. _Expensive_ : When set to 0, this will return all terms in the field. ###### **missing** The `missing` parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value. ```json skip { "field": "genre", "missing": "NO_DATA" } ``` ###### **order** Set the order. String is here a target, which is either “_count”, “_key”, or the name of a metric sub_aggregation. Single value metrics like average can be addressed by its name. Multi value metrics like stats are required to address their field by name e.g. “stats.avg”. 
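
As a rough illustration of the three kinds of order targets, the following Python sketch sorts already-computed buckets. It is a hypothetical helper (`sort_buckets`), not Quickwit's implementation, and it assumes that single-value metrics are stored as `{"value": ...}` and that multi-value metrics are addressed with a dotted path such as `price_stats.avg`.

```python
def sort_buckets(buckets, order):
    """Sort buckets by "_count", "_key", or a (possibly dotted) metric name."""
    # Exactly one ordering property is expected, e.g. {"_key": "asc"}.
    (target, direction), = order.items()

    def sort_key(bucket):
        if target == "_count":
            return bucket["doc_count"]
        if target == "_key":
            return bucket["key"]
        node = bucket
        for part in target.split("."):
            node = node[part]
        # Single-value metrics are assumed to be wrapped as {"value": ...}.
        return node["value"] if isinstance(node, dict) else node

    return sorted(buckets, key=sort_key, reverse=(direction == "desc"))


# Hypothetical buckets with one single-value and one multi-value metric.
buckets = [
    {"key": "b", "doc_count": 2, "average_price": {"value": 5.0}, "price_stats": {"avg": 5.0}},
    {"key": "a", "doc_count": 7, "average_price": {"value": 9.0}, "price_stats": {"avg": 9.0}},
]
print([b["key"] for b in sort_buckets(buckets, {"average_price": "asc"})])
```
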
_Limitation_ : Ordering is only supported by one property currently. Passing an array for `order` is _not_ supported `"order": [{ "average_price": "asc" }, { "_key": "asc" }]`. Order alphabetically ```json skip { "query": "*", "max_hits": 0, "aggs": { "genres": { "terms": { "field": "genre", "order": { "_key": "asc" } } } } } ``` Order by sub_aggregation ```json skip { "query": "*", "max_hits": 0, "aggs": { "articles_by_price": { "terms": { "field": "article_name", "order": { "average_price": "asc" } }, "aggs": { "average_price": { "avg": { "field": "price" } } } } } } ``` ## Metric Aggregations The aggregations in this family compute metrics based on values extracted from the documents that are being aggregated. Values are extracted from the fast field of the document. Some aggregations output a single numeric metric (e.g. Average) and are called single-value numeric metrics aggregation, others generate multiple metrics (e.g. Stats) and are called multi-value numeric metrics aggregation. In contrast to bucket aggregations, metrics don't allow sub-aggregations, since there is no document set to aggregate on. ### Average A single-value metric aggregation that computes the average of numeric values that are extracted from the aggregated documents. Supported field types are `u64`, `f64`, `i64`, and `datetime`. **Request** ```json skip { "query": "*", "max_hits": 0, "aggs": { "average_price": { "avg": { "field": "price" } } } } ``` **Response** ```json { "num_hits": 9582098, "hits": [], "elapsed_time_micros": 101942, "errors": [], "aggregations": { "average_price": { "value": 133.7 } } } ``` #### Parameters ###### **missing** The `missing` parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value. 
```json skip
{
  "field": "price",
  "missing": "10.0"
}
```

### Count

A single-value metric aggregation that counts the number of values that are extracted from the aggregated documents. Supported field types are `u64`, `f64`, `i64`, and `datetime`.

**Request**

```json skip
{
  "query": "*",
  "max_hits": 0,
  "aggs": {
    "price_count": {
      "value_count": { "field": "price" }
    }
  }
}
```

**Response**

```json
{
  "num_hits": 9582098,
  "hits": [],
  "elapsed_time_micros": 102956,
  "errors": [],
  "aggregations": {
    "price_count": { "value": 9582098 }
  }
}
```

#### Parameters

###### **missing**

The `missing` parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value.

```json skip
{
  "field": "price",
  "missing": "10.0"
}
```

### Max

A single-value metric aggregation that computes the maximum of numeric values that are extracted from the aggregated documents. Supported field types are `u64`, `f64`, `i64`, and `datetime`.

**Request**

```json skip
{
  "query": "*",
  "max_hits": 0,
  "aggs": {
    "max_price": {
      "max": { "field": "price" }
    }
  }
}
```

**Response**

```json
{
  "num_hits": 9582098,
  "hits": [],
  "elapsed_time_micros": 101543,
  "errors": [],
  "aggregations": {
    "max_price": { "value": 1353.23 }
  }
}
```

#### Parameters

###### **missing**

The `missing` parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value.

```json skip
{
  "field": "price",
  "missing": "10.0"
}
```

### Min

A single-value metric aggregation that computes the minimum of numeric values that are extracted from the aggregated documents. Supported field types are `u64`, `f64`, `i64`, and `datetime`.
**Request** ```json skip { "query": "*", "max_hits": 0, "aggs": { "min_price": { "min": { "field": "price" } } } } ``` **Response** ```json { "num_hits": 9582098, "hits": [], "elapsed_time_micros": 102342, "errors": [], "aggregations": { "min_price": { "value": 0.01 } } } ``` #### Parameters ###### **missing** The `missing` parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value. ```json skip { "field": "price", "missing": "10.0" } ``` ### Stats A multi-value metric aggregation that computes stats (average, count, min, max, standard deviation, and sum) of numeric values that are extracted from the aggregated documents. Supported field types are `u64`, `f64`, `i64`, and `datetime`. **Request** ```json skip { "query": "*", "max_hits": 0, "aggs": { "timestamp_stats": { "stats": { "field": "timestamp" } } } } ``` **Response** ```json { "num_hits": 10000783, "hits": [], "elapsed_time_micros": 65297, "errors": [], "aggregations": { "timestamp_stats": { "avg": 1462320207.9803998, "count": 10000783, "max": 1475669670.0, "min": 1440670432.0, "standard_deviation": 11867304.28681695, "sum": 1.4624347076526848e16 } } } ``` #### Parameters ###### **missing** The `missing` parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value. ```json skip { "field": "price", "missing": "10.0" } ``` ### Extended Stats Extended stats is the same as `stats`, but with following additional metrics: `sum_of_squares`, `variance`, `std_deviation`, and `std_deviation_bounds`. Supported field types are `u64`, `f64`, `i64`, and `datetime`. **Request** ```json { "query": "*", "max_hits": 0, "aggs": { "response_extended_stats": { "extended_stats": { "field": "response" } } } } ``` **Response** ```json { .. 
"aggregations": { "response_extended_stats": { "avg": 65.55555555555556, "count": 9, "max": 130.0, "min": 20.0, "std_deviation": 42.97573245736381, "std_deviation_bounds": { "lower": -20.395909359172062, "lower_population": -20.395909359172062, "lower_sampling": -25.60973998562673, "upper": 151.50702047028318, "upper_population": 151.50702047028318, "upper_sampling": 156.72085109673785 }, "std_deviation_population": 42.97573245736381, "std_deviation_sampling": 45.582647770591144, "sum": 590.0, "sum_of_squares": 55300.0, "variance": 1846.9135802469136, "variance_population": 1846.9135802469136, "variance_sampling": 2077.777777777778 } } } ``` #### Parameters ###### **missing** The `missing` parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value. ```json skip { "field": "price", "missing": "10.0" } ``` ###### **sigma** The sigma parameter controls how many standard deviations +/- from the mean should be displayed. The default value is 2. ```json skip { "field": "price", "sigma": "3.0" } ``` ### Sum A single-value metric aggregation that sums up numeric values that are that are extracted from the aggregated documents. Supported field types are `u64`, `f64`, `i64`, and `datetime`. **Request** ```json skip { "query": "*", "max_hits": 0, "aggs": { "total_price": { "sum": { "field": "price" } } } } ``` **Response** ```json { "num_hits": 9582098, "hits": [], "elapsed_time_micros": 101142, "errors": [], "aggregations": { "total_price": { "value": 12966782476.54 } } } ``` #### Parameters ###### **missing** The `missing` parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value. ```json skip { "field": "price", "missing": "10.0" } ``` ### Percentiles The percentiles aggregation is a useful tool for understanding the distribution of a data set. 
It calculates the values below which a given percentage of the data falls. For instance, the 95th percentile indicates the value below which 95% of the data points can be found.

This aggregation can be particularly interesting for analyzing website or service response times. For example, if the 95th percentile website load time is significantly higher than the median, this indicates that a small percentage of users are experiencing much slower load times than the majority.

To use the percentiles aggregation, you'll need to provide a field to aggregate on. In the case of website load times, this would typically be a field containing the time it takes for the site to load.

**Request**

```json skip
{
  "query": "*",
  "max_hits": 0,
  "aggs": {
    "loading_times": {
      "percentiles": {
        "field": "load_time",
        "percents": [90, 95, 99]
      }
    }
  }
}
```

**Response**

```json
{
  "num_hits": 9582098,
  "hits": [],
  "elapsed_time_micros": 101142,
  "errors": [],
  "aggregations": {
    "loading_times": {
      "values": {
        "90.0": 33.4,
        "95.0": 83.4,
        "99.0": 230.3
      }
    }
  }
}
```

`percents` may be omitted; it defaults to `[1, 5, 25, 50, 75, 95, 99]`, where 50 is the median.

#### Estimating Percentiles

While percentiles provide valuable insights into the distribution of data, it's important to understand that they are often estimates. This is because calculating exact percentiles for large data sets can be computationally expensive and time-consuming.

#### Parameters

###### **missing**

The `missing` parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value.

```json skip
{
  "field": "price",
  "missing": "10.0"
}
```

### Cardinality

The cardinality aggregation is used to approximate the count of distinct values in a field. Cardinality aggregations are essential when working with large datasets where computing the exact count of distinct values would be computationally expensive.
The cardinality aggregation can be useful, for example, to count the number of unique users visiting a website or to determine the number of unique IP addresses that have logged into a server over a certain period.

The algorithm behind the cardinality aggregation is based on HyperLogLog++, which provides an approximate count over the hashed values.

To use the cardinality aggregation, you need to specify the field on which to perform the aggregation.

**Request**

```json
{
  "query": "*",
  "max_hits": 0,
  "aggs": {
    "unique_users": {
      "cardinality": { "field": "user_id" }
    }
  }
}
```

**Response**

```json
{
  "num_hits": 9582098,
  "hits": [],
  "elapsed_time_micros": 101142,
  "errors": [],
  "aggregations": {
    "unique_users": { "value": 345672 }
  }
}
```

#### Parameters

###### **missing**

The `missing` parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value.

```json skip
{
  "field": "price",
  "missing": "10.0"
}
```

#### Performance

The cardinality aggregation on text fields is computationally expensive for datasets with a large number of unique values. This is because the aggregation computes the hash for each unique term in the field. To do this, Quickwit first collects the term ids for each split and then fetches the compressed terms for those term ids from the dictionary. Decompressing the terms is comparatively expensive, and keeping the term ids increases memory usage.

For numeric fields, the cardinality aggregation is much more efficient as it directly computes the hash of the numeric values and adds them to HLL++.

##### Limitations

The parameter `precision_threshold` is currently ignored. Normally, it sets the threshold up to which the aggregation is exact.

================================================
FILE: docs/reference/cli.md
================================================
---
title: Command-line options
sidebar_position: 50
---

The Quickwit command-line tool lets you start a Quickwit server and manage indexes (create, delete, ingest), splits and sources (create, delete, toggle). To start a server, `quickwit` needs a [node config file path](../configuration/node-config.md) that you can specify with the `QW_CONFIG` environment variable: `export QW_CONFIG=./config/quickwit.yaml`.

This page documents all the available commands, related options, and environment variables.

### Common Options

To manage indexes, splits and sources on a remote cluster, you might need to specify the connection to a Quickwit node. The following options are supported:

| Option              | Description                 | Default                 |
|---------------------|-----------------------------|------------------------:|
| `--endpoint`        | The url of a Quickwit node. | `http://127.0.0.1:7280` |
| `--timeout`         | Command timeout.            | *See below*             |
| `--connect-timeout` | Connect timeout.            | `5s`                    |

The default timeouts are command specific:

- **search** - 1 minute
- **ingest** (without force or wait) - 1 minute
- **ingest** (with force or wait) - 30 minutes
- all other operations - 10 seconds

The timeout can be expressed in seconds, minutes, hours or days. For example:

- `10s` - 10 seconds timeout
- `1m` - 1 minute timeout
- `2h` - 2 hours timeout
- `1d` - 1 day timeout
- `none` - no timeout is applied.

:::caution

Before using Quickwit with object storage, consult our [guidelines](../operating/aws-costs.md) for deploying on AWS S3 to avoid surprises on your next bill.

:::

## Commands

[Command-line synopsis syntax](https://developers.google.com/style/code-syntax)

### Help

`quickwit` or `quickwit --help` displays the list of available commands.

`quickwit --help` displays the documentation for the command and a usage example.

### Version

`quickwit --version` displays the version.
It is helpful for reporting bugs.

### Syntax

The CLI is structured into high-level commands with subcommands.
`quickwit [command] [subcommand] [args]`.

* `command`: `run`, `index`, `split`, `source` and `tool`.

## run

Starts a Quickwit node with all services enabled by default: `indexer`, `searcher`, `metastore`, `control-plane`, and `janitor`.

### Indexer service

The indexer service runs indexing pipelines assigned by the control plane.

### Searcher service

Starts a web server at `rest_listing_address:rest_list_port` that exposes the [Quickwit REST API](rest-api.md), where `rest_listing_address` and `rest_list_port` are defined in the Quickwit config file (quickwit.yaml).
The node can optionally join a cluster using the `peer_seeds` parameter.
This list of node addresses is used to discover the remaining peer nodes in the cluster through a gossip protocol, see [chitchat](https://github.com/quickwit-oss/chitchat).

### Metastore service

The metastore service exposes the Quickwit metastore over the network. This is a core internal service that is needed to operate Quickwit. As such, at least one running instance of this service is required for other services to work.

### Control plane service

The control plane service schedules indexing tasks to indexers. It listens to metastore events such as a source create, delete, or toggle, or an index delete, and reacts accordingly to update the indexing plan.

### Janitor service

The janitor service runs maintenance tasks on indexes: garbage collection, documents delete, and retention policy tasks.

:::note
Quickwit needs to open the following ports for cluster formation and workload distribution:

    TCP port (default is 7280) for REST API
    TCP and UDP port (default is 7280) for cluster membership protocol
    TCP port + 1 (default is 7281) for gRPC address for the distributed search

If ports are already taken, the serve command will fail.
::: `quickwit run [args]` *Synopsis* ```bash quickwit run [--config ] [--service ] ``` *Options* | Option | Description | Default | |-----------------|-------------|--------:| | `--config` | Config file location | `config/quickwit.yaml` | | `--service` | Services (`indexer`, `searcher`, `metastore`, `control-plane`, or `janitor`) to run. If unspecified, all the supported services are started. | | *Examples* *Starts an indexer and a metastore services* ```bash quickwit run --service indexer --service metastore --endpoint=http://127.0.0.1:7280 ``` *Start a control plane, metastore and janitor services* ```bash quickwit run --service control_plane --service metastore --service janitor --config=./config/quickwit.yaml ``` *Make a search request on a wikipedia index* ```bash # To create wikipedia index and ingest data, go to our tutorials https://quickwit.io/docs/get-started/. # Start a searcher. quickwit run --service searcher --service metastore --config=./config/quickwit.yaml # Make a request. curl "http://127.0.0.1:7280/api/v1/wikipedia/search?query=barack+obama" ``` ## index Manages indexes: creates, updates, deletes, ingests, searches, describes... ### index create Creates an index of ID `index` at `index-uri` configured by a [YAML config file](../configuration/index-config.md) located at `index-config`. The index config lets you define the mapping of your document on the index and how each field is stored and indexed. If `index-uri` is omitted, `index-uri` will be set to `{default_index_root_uri}/{index}`, more info on [Quickwit config docs](../configuration/node-config.md). The command fails if an index already exists unless `overwrite` is passed. When `overwrite` is enabled, the command deletes all the files stored at `index-uri` before creating a new index. 
`quickwit index create [args]`

*Synopsis*

```bash
quickwit index create --index-config [--overwrite]
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index-config` | Location of the index config file. |
| `--overwrite` | Overwrites pre-existing index. This will delete all existing data stored at `index-uri` before creating a new index. |

*Examples*

*Create a new index.*
```bash
# Start a Quickwit server.
quickwit run --config=./config/quickwit.yaml

# Open a new terminal and run:
curl -o wikipedia_index_config.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/wikipedia/index-config.yaml
quickwit index create --endpoint=http://127.0.0.1:7280 --index-config wikipedia_index_config.yaml
```

### index update

Updates an index using an index config file.

`quickwit index update [args]`

*Synopsis*

```bash
quickwit index update --index --index-config [--create]
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |
| `--index-config` | Location of the index config file. |
| `--create` | Create the index if it doesn't exist. |

### index clear

Clears an index: deletes all splits and resets checkpoint.

`quickwit index clear [args]`
`quickwit index clr [args]`

*Synopsis*

```bash
quickwit index clear --index
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | Index ID |

### index delete

Deletes an index.

`quickwit index delete [args]`
`quickwit index del [args]`

*Synopsis*

```bash
quickwit index delete --index [--dry-run]
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |
| `--dry-run` | Executes the command in dry run mode and only displays the list of splits candidates for deletion. |

*Examples*

*Delete your index*
```bash
# Start a Quickwit server.
quickwit run --service metastore --config=./config/quickwit.yaml

# Open a new terminal and run:
quickwit index delete --index wikipedia --endpoint=http://127.0.0.1:7280
```

### index describe

Displays descriptive statistics of an index.

`quickwit index describe [args]`

*Synopsis*

```bash
quickwit index describe --index
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |

*Examples*

*Displays descriptive statistics of your index*
```bash
# Start a Quickwit server.
quickwit run --service metastore --config=./config/quickwit.yaml

# Open a new terminal and run:
quickwit index describe --endpoint=http://127.0.0.1:7280 --index wikipedia

1. General infos
===============================================================================
Index id:                           wikipedia
Index uri:                          file:///home/quickwit-indices/qwdata/indexes/wikipedia
Number of published splits:         1
Number of published documents:      300000
Size of published splits:           448 MB

2. Statistics on splits
===============================================================================
Document count stats:
Mean ± σ in [min … max]:            300000 ± 0 in [300000 … 300000]
Quantiles [1%, 25%, 50%, 75%, 99%]: [300000, 300000, 300000, 300000, 300000]

Size in MB stats:
Mean ± σ in [min … max]:            448 ± 0 in [448 … 448]
Quantiles [1%, 25%, 50%, 75%, 99%]: [448, 448, 448, 448, 448]
```

### index list

List indexes.

`quickwit index list [args]`
`quickwit index ls [args]`

*Examples*

*List indexes*
```bash
# Start a Quickwit server.
quickwit run --config=./config/quickwit.yaml

# Open a new terminal and run:
quickwit index list --endpoint=http://127.0.0.1:7280
# Or with alias.
quickwit index ls --endpoint=http://127.0.0.1:7280

Indexes
+-----------+--------------------------------------------------------+
| Index ID  | Index URI                                              |
+-----------+--------------------------------------------------------+
| hdfs-logs | file:///home/quickwit-indices/qwdata/indexes/hdfs-logs |
+-----------+--------------------------------------------------------+
| wikipedia | file:///home/quickwit-indices/qwdata/indexes/wikipedia |
+-----------+--------------------------------------------------------+
```

### index ingest

Indexes a dataset consisting of newline-delimited JSON objects located at `input-path` or read from *stdin*.
The data is appended to the target index of ID `index` unless `overwrite` is passed. `input-path` can be a file or another command output piped into stdin.
Currently, only local datasets are supported.
By default, Quickwit's indexer will work with a heap of 2 GiB of memory. Learn how to change `heap-size` in the [index config doc page](../configuration/index-config.md).

`quickwit index ingest [args]`

*Synopsis*

```bash
quickwit index ingest --index [--input-path ] [--batch-size-limit ] [--wait] [--detailed-response] [--force] [--commit-timeout ]
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |
| `--input-path` | Location of the input file. |
| `--batch-size-limit` | Size limit of each submitted document batch. |
| `--wait` | Wait for all documents to be committed and available for search before exiting. Applies only to the last batch, see [#5417](https://github.com/quickwit-oss/quickwit/issues/5417). |
| `--detailed-response` | Print detailed errors. Enabling might impact performance negatively. |
| `--force` | Force a commit after the last document is sent, and wait for all documents to be committed and available for search before exiting. Applies only to the last batch, see [#5417](https://github.com/quickwit-oss/quickwit/issues/5417). |
| `--commit-timeout` | Timeout for ingest operations that require waiting for the final commit (`--wait` or `--force`). This is different from the `commit_timeout_secs` indexing setting, which sets the maximum time before committing splits after their creation. |

*Examples*

*Indexing a dataset from a file*
```bash
# Start a Quickwit server.
quickwit run --config=./config/quickwit.yaml

# Open a new terminal and run:
curl -o wiki-articles-10000.json https://quickwit-datasets-public.s3.amazonaws.com/wiki-articles-10000.json
quickwit index ingest --endpoint=http://127.0.0.1:7280 --index wikipedia --input-path wiki-articles-10000.json
```

*Indexing a dataset from stdin*
```bash
# Start a Quickwit server.
quickwit run --config=./config/quickwit.yaml

# Open a new terminal and run:
cat wiki-articles-10000.json | quickwit index ingest --endpoint=http://127.0.0.1:7280 --index wikipedia
```

### index search

Searches an index with ID `--index` and returns the documents matching the query specified with `--query`.
More details on the [query language page](query-language.md).
The offset of the first hit returned and the number of hits returned can be set with the `start-offset` and `max-hits` options.
The `search-fields` option overrides the default search fields, that is, the list of fields that Quickwit searches when the user query does not explicitly target a field.
Quickwit will return snippets of the matching content when requested via the `snippet-fields` option.
Search can also be limited to a time range using the `start-timestamp` and `end-timestamp` options.
These timestamp options are useful for boosting query performance when using a time series dataset.

:::warning
The `start-timestamp` and `end-timestamp` values should be specified in seconds regardless of the timestamp field precision. The timestamp field precision only affects the way it's stored as fast-fields, whereas the document filtering is always performed in seconds.
:::

`quickwit index search [args]`

*Synopsis*

```bash
quickwit index search --index --query [--aggregation ] [--max-hits ] [--start-offset ] [--search-fields ] [--snippet-fields ] [--start-timestamp ] [--end-timestamp ] [--sort-by-score]
```

*Options*

| Option | Description | Default |
|-----------------|-------------|--------:|
| `--index` | ID of the target index | |
| `--query` | Query expressed in natural query language ((barack AND obama) OR "president of united states"). Learn more on https://quickwit.io/docs/reference/search-language. | |
| `--aggregation` | JSON serialized aggregation request in tantivy/elasticsearch format. | |
| `--max-hits` | Maximum number of hits returned. | `20` |
| `--start-offset` | Offset in the global result set of the first hit returned. | `0` |
| `--search-fields` | List of fields that Quickwit will search into if the user query does not explicitly target a field in the query. It overrides the default search fields defined in the index config. Space-separated list, e.g. "field1 field2". | |
| `--snippet-fields` | List of fields that Quickwit will return snippet highlight on. Space-separated list, e.g. "field1 field2". | |
| `--start-timestamp` | Filters out documents before that timestamp (time-series indexes only). | |
| `--end-timestamp` | Filters out documents after that timestamp (time-series indexes only). | |
| `--sort-by-score` | Sorts documents by their BM25 score. | |

*Examples*

*Searching an index*
```bash
# Start a Quickwit server.
quickwit run --config=./config/quickwit.yaml

# Open a new terminal and run:
quickwit index search --endpoint=http://127.0.0.1:7280 --index wikipedia --query "Barack Obama"
# If you have jq installed.
quickwit index search --endpoint=http://127.0.0.1:7280 --index wikipedia --query "Barack Obama" | jq '.hits[].title'
```

*Sorting documents by their BM25 score*
```bash
# Start a Quickwit server.
quickwit run --config=./config/quickwit.yaml

# Open a new terminal and run:
quickwit index search --endpoint=http://127.0.0.1:7280 --index wikipedia --query "obama" --sort-by-score
```

*Limiting the result set to 50 hits*
```bash
# Start a Quickwit server.
quickwit run --config=./config/quickwit.yaml

# Open a new terminal and run:
quickwit index search --endpoint=http://127.0.0.1:7280 --index wikipedia --query "Barack Obama" --max-hits 50
# If you have jq installed.
quickwit index search --endpoint=http://127.0.0.1:7280 --index wikipedia --query "Barack Obama" --max-hits 50 | jq '.num_hits'
```

*Looking for matches in the title only*
```bash
# Start a Quickwit server.
quickwit run --config=./config/quickwit.yaml

# Open a new terminal and run:
quickwit index search --endpoint=http://127.0.0.1:7280 --index wikipedia --query "obama" --search-fields title
# If you have jq installed.
quickwit index search --endpoint=http://127.0.0.1:7280 --index wikipedia --query "obama" --search-fields title | jq '.hits[].title'
```

## source

Manages sources: creates, updates, deletes sources...

### source create

Adds a new source to an index.

`quickwit source create [args]`

*Synopsis*

```bash
quickwit source create --index --source-config
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |
| `--source-config` | Path to source config file. Please, refer to the documentation for more details. |

### source update

Updates an existing source.

`quickwit source update [args]`

*Synopsis*

```bash
quickwit source update --index --source --source-config [--create]
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |
| `--source` | ID of the source |
| `--source-config` | Path to source config file. Please, refer to the documentation for more details. |
| `--create` | Create the source if it doesn't exist. |

### source enable

Enables a source for an index.
`quickwit source enable [args]`

*Synopsis*

```bash
quickwit source enable --index --source
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |
| `--source` | ID of the source. |

### source disable

Disables a source for an index.

`quickwit source disable [args]`

*Synopsis*

```bash
quickwit source disable --index --source
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |
| `--source` | ID of the source. |

### source ingest-api

Enables/disables the ingest API of an index.

`quickwit source ingest-api [args]`

*Synopsis*

```bash
quickwit source ingest-api --index [--enable] [--disable]
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |
| `--enable` | Enables the ingest API. |
| `--disable` | Disables the ingest API. |

### source delete

Deletes a source from an index.

`quickwit source delete [args]`
`quickwit source del [args]`

*Synopsis*

```bash
quickwit source delete --index --source
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |
| `--source` | ID of the source. |

*Examples*

*Delete a `wikipedia-source` source*
```bash
# Start a Quickwit server.
quickwit run --service metastore --config=./config/quickwit.yaml

# Open a new terminal and run:
quickwit source delete --endpoint=http://127.0.0.1:7280 --index wikipedia --source wikipedia-source
```

### source describe

Describes a source.

`quickwit source describe [args]`
`quickwit source desc [args]`

*Synopsis*

```bash
quickwit source describe --index --source
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |
| `--source` | ID of the source. |

### source list

Lists the sources of an index.
`quickwit source list [args]`
`quickwit source ls [args]`

*Synopsis*

```bash
quickwit source list --index
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |

*Examples*

*List `wikipedia` index sources*
```bash
# Start a Quickwit server.
quickwit run --service metastore --config=./config/quickwit.yaml

# Open a new terminal and run:
quickwit source list --endpoint=http://127.0.0.1:7280 --index wikipedia
```

### source reset-checkpoint

Resets a source checkpoint.

`quickwit source reset-checkpoint [args]`
`quickwit source reset [args]`

*Synopsis*

```bash
quickwit source reset-checkpoint --index --source
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | Index ID |
| `--source` | Source ID |

## split

Manages splits: lists, describes, marks for deletion...

### split list

Lists the splits of an index.

`quickwit split list [args]`
`quickwit split ls [args]`

*Synopsis*

```bash
quickwit split list --index [--offset ] [--limit ] [--states ] [--create-date ] [--start-date ] [--end-date ] [--output-format ]
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | Target index ID |
| `--offset` | Number of splits to skip. |
| `--limit` | Maximum number of splits to retrieve. |
| `--states` | Selects the splits whose states are included in this comma-separated list of states. Possible values are `staged`, `published`, and `marked`. |
| `--create-date` | Selects the splits whose creation dates are before this date. |
| `--start-date` | Selects the splits that contain documents after this date (time-series indexes only). |
| `--end-date` | Selects the splits that contain documents before this date (time-series indexes only). |
| `--output-format` | Output format. Possible values are `table`, `json`, and `pretty-json`. |

### split describe

Displays metadata about a split.
`quickwit split describe [args]`
`quickwit split desc [args]`

*Synopsis*

```bash
quickwit split describe --index --split [--verbose]
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |
| `--split` | ID of the target split |
| `--verbose` | Displays additional metadata about the hotcache. |

### split mark-for-deletion

Marks one or multiple splits of an index for deletion.

`quickwit split mark-for-deletion [args]`
`quickwit split mark [args]`

*Synopsis*

```bash
quickwit split mark-for-deletion --index --splits [--yes]
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | Target index ID |
| `--splits` | Comma-separated list of split IDs |
| `--yes` | Assume "yes" as an answer to all prompts and run non-interactively. |

## tool

Performs utility operations. Requires a node config.

### tool local-ingest

Indexes NDJSON documents locally.

`quickwit tool local-ingest [args]`

*Synopsis*

```bash
quickwit tool local-ingest --index [--input-path ] [--input-format ] [--overwrite] [--transform-script ] [--keep-cache]
```

*Options*

| Option | Description | Default |
|-----------------|-------------|--------:|
| `--index` | ID of the target index | |
| `--input-path` | Location of the input file. | |
| `--input-format` | Format of the input data. | `json` |
| `--overwrite` | Overwrites pre-existing index. | |
| `--transform-script` | VRL program to transform docs before ingesting. | |
| `--keep-cache` | Does not clear local cache directory upon completion. | |

### tool extract-split

Downloads and extracts a split to a directory.

`quickwit tool extract-split [args]`

*Synopsis*

```bash
quickwit tool extract-split --index --split [--target-dir ]
```

*Options*

| Option | Description |
|-----------------|-------------|
| `--index` | ID of the target index |
| `--split` | ID of the target split |
| `--target-dir` | Directory to extract the split to. |

### tool gc

Garbage collects stale staged splits and splits marked for deletion.

:::note
Intermediate files are created while executing Quickwit commands.
These intermediate files are always cleaned at the end of each successfully executed command.
However, failed or interrupted commands can leave behind intermediate files that need to be removed.
Also, note that using a very short grace period (like seconds) can cause the removal of intermediate files being operated on, especially when using Quickwit concurrently on the same index.
In practice, you can settle with the default value (1 hour) and only specify a lower value if you really know what you are doing.
:::

`quickwit tool gc [args]`

*Synopsis*

```bash
quickwit tool gc --index [--grace-period ] [--dry-run]
```

*Options*

| Option | Description | Default |
|-----------------|-------------|--------:|
| `--index` | ID of the target index | |
| `--grace-period` | Threshold period after which stale staged splits are garbage collected. | `1h` |
| `--dry-run` | Executes the command in dry run mode and only displays the list of splits candidates for garbage collection. | |

## Environment Variables

### QW_CLUSTER_ENDPOINT

Specifies the address of the cluster to connect to. Management commands `index`, `split` and `source` require the `cluster_endpoint`, which you can set once and for all with the `QW_CLUSTER_ENDPOINT` environment variable.

### QW_CONFIG

Specifies the path to the [quickwit config](../configuration/node-config.md). Commands `run` and `tool` require the `config`, which you can set once and for all with the `QW_CONFIG` environment variable.

*Example*

`export QW_CONFIG=config/quickwit.yaml`

### QW_DISABLE_TELEMETRY

Disables [telemetry](../telemetry.md) when set to any non-empty value.

*Example*

`QW_DISABLE_TELEMETRY=1 quickwit help`

### QW_POSTGRES_SKIP_MIGRATIONS

Don't run database migrations (but verify that migrations were run successfully before, and that no unknown migration was run).
### QW_POSTGRES_SKIP_MIGRATION_LOCKING

Don't lock the database during migration. This may increase compatibility with alternative databases using the PostgreSQL wire protocol. However, it is dangerous to use this if you can't guarantee that only one node will run the migrations.

### RUST_LOG

Configures the Quickwit log level.

*Examples*

```
# run with higher verbosity
RUST_LOG=debug quickwit run

# run with log level info, except for indexing related logs
RUST_LOG=info,quickwit_indexing=debug quickwit run
```

================================================
FILE: docs/reference/es_compatible_api.md
================================================
---
title: Elasticsearch compatible API
sidebar_position: 20
---

In order to facilitate migrations and integrations with existing tools, Quickwit offers an Elasticsearch/OpenSearch compatible API.
This API is incomplete. This page lists the available features and endpoints.

## Supported endpoints

All the API endpoints start with the `api/v1/_elastic/` prefix.

### `_bulk` &nbsp; Batch ingestion endpoint

```
POST api/v1/_elastic/_bulk
```

```
POST api/v1/_elastic//_bulk
```

The `_bulk` ingestion API makes it possible to index a batch of documents, possibly targeting several indices in the same request.

#### Request Body example

```json
{ "create" : { "_index" : "wikipedia", "_id" : "1" } }
{"url":"https://en.wikipedia.org/wiki?id=1","title":"foo","body":"foo"}
{ "create" : { "_index" : "wikipedia", "_id" : "2" } }
{"url":"https://en.wikipedia.org/wiki?id=2","title":"bar","body":"bar"}
{ "create" : { "_index" : "wikipedia", "_id" : "3" } }
{"url":"https://en.wikipedia.org/wiki?id=3","title":"baz","body":"baz"}
```

Ingest a batch of documents to make them searchable using the [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html) bulk API. This endpoint provides compatibility with tools or systems that already send data to Elasticsearch for indexing.
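As a minimal sketch of how such a request body can be assembled, the snippet below emits one action line followed by one document line per entry. The helper name `bulk_create_entry` is hypothetical, the `_id` field is simply left out, and the commented `curl` call assumes a node listening on the default `http://127.0.0.1:7280` address used elsewhere in these docs.

```bash
# Hypothetical helper: prints a `create` action line, then the document line.
bulk_create_entry() {
  local index_id="$1" doc_json="$2"
  printf '{ "create" : { "_index" : "%s" } }\n%s\n' "${index_id}" "${doc_json}"
}

# Build the NDJSON payload: every document is preceded by its action line.
payload="$(
  bulk_create_entry wikipedia '{"url":"https://en.wikipedia.org/wiki?id=1","title":"foo","body":"foo"}'
  bulk_create_entry wikipedia '{"url":"https://en.wikipedia.org/wiki?id=2","title":"bar","body":"bar"}'
)"
printf '%s\n' "${payload}"

# Sending the payload needs a running node (assumed address):
# printf '%s\n' "${payload}" | curl -XPOST "http://127.0.0.1:7280/api/v1/_elastic/_bulk" \
#   -H "Content-Type: application/json" --data-binary @-
```

Piping through `--data-binary @-` keeps the newlines intact, which matters because the bulk format is line-delimited.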
Currently, only the `create` action of the bulk API is supported; all other actions such as `delete` or `update` are ignored.

If an index is specified via the URL path, it will act as a default value for the `_index` properties.

The [`refresh`](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html) parameter is supported.

:::caution
The Quickwit API will not report errors; you need to check the server logs.

In Elasticsearch, the `create` action has a specific behavior when the ingested documents contain an identifier (the `_id` field). It only inserts such a document if it was not inserted before. This is extremely handy to achieve At-Most-Once indexing.
Quickwit does not have any notion of document id and does not support this feature.
:::

:::info
The payload size is limited to 10MB as this endpoint is intended to receive documents in batch.
:::

#### Query parameter

| Variable | Type | Description | Default value |
| --------- | -------- | ---------------------------------------------------------------- | ------------- |
| `refresh` | `String` | The commit behavior: blank string, `true`, `wait_for` or `false` | `false` |

#### Response

The response is a JSON object, and the content type is `application/json; charset=UTF-8`.

| Field | Description | Type |
| ------------------------- | ----------- | :------: |
| `num_docs_for_processing` | Total number of documents ingested for processing. The documents may not have been processed. The API will not return indexing errors, check the server logs for errors. | `number` |

### `_search` &nbsp; Index search endpoint

```
POST api/v1/_elastic//_search
```

```
GET api/v1/_elastic//_search
```

#### Request Body example

```json
{
  "size": 10,
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "bitpacking" } },
        { "term": { "actor.login": { "value": "fulmicoton" } } }
      ]
    }
  },
  "sort": [
    { "actor.id": { "order": null } }
  ],
  "aggs": {
    "event_types": {
      "terms": { "field": "type", "size": 5 }
    }
  }
}
```

Search into a specific index using the [Elasticsearch search API](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/search-search.html).

Some of the parameters can be passed as query string parameters, and some via the JSON payload.
If a parameter appears both as a query string parameter and in the JSON payload, the query string parameter value will take priority.

#### Supported Query string parameters

| Variable | Type | Description | Default value |
| ------------------ | ------------- | -------------------------------------------------------------------------------- | ------------- |
| `default_operator` | `AND` or `OR` | The default operator used to combine search terms. It should be `AND` or `OR`. | `OR` |
| `from` | `Integer` | The rank of the first hit to return. This is useful for pagination. | 0 |
| `q` | `String` | The search query. | (Optional) |
| `size` | `Integer` | Number of hits to return. | 10 |
| `sort` | `String` | Describes how documents should be ranked. See [Sort order](#sort-order) | (Optional) |
| `scroll` | `Duration` | Creates a scroll context for "time to live". See [Scroll](#_searchscroll--scroll-api). | (Optional) |
| `allow_partial_search_results` | `Boolean` | Returns a partial response if some (but not all) of the split searches were unsuccessful. | `true` |

#### Supported Request Body parameters

| Variable | Type | Description | Default value |
| ------------------ | ----------------- | ------------------------------------------------------------------------------ | ------------- |
| `default_operator` | `"AND"` or `"OR"` | The default operator used to combine search terms. It should be `AND` or `OR`. | `OR` |
| `from` | `Integer` | The rank of the first hit to return. This is useful for pagination. | 0 |
| `query` | `Json object` | Describe the search query. See [Query DSL](#query-dsl) | (Optional) |
| `size` | `Integer` | Number of hits to return. | 10 |
| `sort` | `JsonObject[]` | Describes how documents should be ranked. See [Sort order](#sort-order) | `[]` |
| `search_after` | `Any[]` | Ignore documents with a SortingValue preceding or equal to the parameter | (Optional) |
| `aggs` | `Json object` | Aggregation definition. See [Aggregations](aggregation.md). | `{}` |

#### Sort order

You can define up to two criteria on which to apply sort. The second criterion will only be used in presence of a tie for the first criterion.

A given criterion can either be

- the name of a fast field (explicitly defined in the schema or captured by the dynamic mode)
- `_score` to sort by BM25.

By default, the sort order is `ascending` for fast fields and descending for `_score`.

When sorting by a fast field and this field contains several values in a single document, only the first value is used for sorting.

The sort order can be set as descending/ascending using the following syntax.

```json
{
  // ...
  "sort" : [
    { "timestamp" : {"order" : "asc"}},
    { "serial_number" : "desc" }
  ]
  // ...
}
```

It is also possible to not supply an order and rely on the default order using the following syntax.

```json
{
  //...
  "sort" : ["_score", "timestamp"]
  // ...
}
```

If no format is provided for timestamps, timestamps are returned with milliseconds precision.

If you need nanosecond precision, you can use the `epoch_nanos_int` format.
Beware this means the resulting JSON may contain high numbers for which there is loss of precision when using languages where all numbers are floats, such as JavaScript.

```json
{
  // ...
  "sort" : [
    { "timestamp" : {"format": "epoch_nanos_int","order" : "asc"}},
    { "serial_number" : "desc" }
  ]
  // ...
}
```

#### Search after

When sorting results, the answer looks like the following

```json
{
  // ...
  "hits": {
    // ...
    "hits": [
      // ...
      {
        // ...
        "sort": [
          1701962929199
        ]
      }
    ]
  }
}
```

You can pass the `sort` value of the last hit in a subsequent request where other fields are kept unchanged:

```json
{
  // keep all fields from the original request
  "search_after": [
    1701962929199
  ]
}
```

This allows you to paginate your results.

### `_msearch` &nbsp; Multi search API

```
POST api/v1/_elastic/_msearch
```

#### Request Body example

```json
{"index": "gharchive" }
{"query" : {"match" : { "author.login": "fulmicoton"}}}
{"index": "gharchive"}
{"query" : {"match_all" : {}}}
```

[Multi search endpoint ES API reference](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/search-multi-search.html)

Runs several search requests at once.

The payload is expected to alternate:

- a `header` json object, containing the targeted index id.
- a `search request body` as defined in the [`_search` endpoint section](#_search--index-search-endpoint).

### `_search/scroll` &nbsp; Scroll API

```
GET api/v1/_elastic/_search/scroll
```

#### Supported Request Body parameters

| Variable | Type | Description | Default value |
| ----------- | -------- | ------------------------------------------- | ------------- |
| `scroll_id` | `String` | Scroll id (obtained from a search response) | (Required) |

The `_search/scroll` endpoint, in combination with the `_search` API, makes it possible to request successive pages of search results.
First, the client needs to call the `_search` API with a `scroll` query parameter, and then pass the `scroll_id` returned in the response payload to the `_search/scroll` endpoint.
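The two-step flow can be sketched with a pair of small helpers that only build the request URL and body, so they run without a server. The helper names, the `wikipedia` index, the `1m` time-to-live value, and the node address are assumptions made for this illustration.

```bash
# Assumed for illustration: node address and index ID.
endpoint="http://127.0.0.1:7280"
index_id="wikipedia"

# Step 1: URL of the initial `_search` call that opens a scroll context.
scroll_search_url() {
  printf '%s/api/v1/_elastic/%s/_search?scroll=%s\n' "$1" "$2" "$3"
}

# Step 2: request body sent to `_search/scroll` with the scroll ID from the previous response.
scroll_request_body() {
  printf '{"scroll_id": "%s"}\n' "$1"
}

scroll_search_url "${endpoint}" "${index_id}" "1m"
scroll_request_body "SCROLL_ID_FROM_PREVIOUS_RESPONSE"

# Against a running node, the calls would look roughly like this:
# curl -XPOST "$(scroll_search_url "${endpoint}" "${index_id}" 1m)" \
#   -H "Content-Type: application/json" -d '{"query": {"match_all": {}}}'
# curl -XGET "${endpoint}/api/v1/_elastic/_search/scroll" \
#   -H "Content-Type: application/json" -d "$(scroll_request_body "SCROLL_ID_FROM_PREVIOUS_RESPONSE")"
```

The placeholder scroll ID has to be replaced with the value read from the actual response of the preceding call.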
Each subsequent call to the `_search/scroll` endpoint will return a new `scroll_id` pointing to the next page.

:::tip
Using `_search` and then `_search/scroll` is somewhat similar to using `_search` with the `search_after` parameter, except that it creates a lightweight snapshot view of the dataset during the initial call to `_search`. Further calls to `_search/scroll` only return results from that view, thus ensuring more consistent results.
:::

### `_cat` &nbsp; Cat API

```
GET api/v1/_elastic/_cat/indices/
```

```
GET api/v1/_elastic/_cat/indices
```

#### Supported Query string parameters

| Variable | Type | Description | Default value |
|----------|------------|--------------------------------------------------------------------------------------------------------|---------------|
| `format` | `String` | Format for response. Only JSON supported for now. | |
| `h` | `String[]` | Comma-separated list of column names to display. | (Optional) |
| `health` | `String` | Filter for health: `green`, `yellow`, or `red`. | (Optional) |
| `bytes` | `String` | Unit used to display byte values. Unsupported for now. | (Optional) |
| `s` | `String` | Comma-separated list of column names or column aliases used to sort the response. Unsupported for now. | (Optional) |
| `v` | `Boolean` | If true, the response includes column headings. Unsupported for now. | (Optional) |

Use the [cat indices API](https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-indices.html) to get the following information for each index in a cluster:

* Shard count
* Document count
* Deleted document count
* Primary store size
* Total store size

#### Response

The response is a JSON object, and the content type is `application/json; charset=UTF-8`.

| Field | Description | Type |
|------------------|--------------------------------------------------|:--------:|
| `uuid` | Index uuid | `String` |
| `index` | Index name | `String` |
| `health` | Health of the index `green`, `yellow`, or `red`. | `String` |
| `status` | Status of the index `open`. | `String` |
| `rep` | Replication factor. | `Number` |
| `pri` | Number of primary shards | `Number` |
| `pri.store.size` | Stored size of primary shard. | `String` |
| `store.size` | Stored size of index. | `String` |
| `dataset.size` | Indexed data size. | `String` |
| `docs.count` | Number of records in index. | `Number` |
| `docs.deleted` | Number of deleted records in index. | `Number` |

Example response:

```json
[
  {
    "dataset.size": "0b",
    "docs.count": "0",
    "docs.deleted": "0",
    "health": "green",
    "index": "otel-traces-v0_7",
    "pri": "1",
    "pri.store.size": "0b",
    "rep": "1",
    "status": "open",
    "store.size": "0b",
    "uuid": "otel-traces-v0_7:01HTJC6TQDGM07KBDQZ2KDHW53"
  },
  {
    "dataset.size": "387.5gb",
    "docs.count": "224453081",
    "docs.deleted": "0",
    "health": "green",
    "index": "otel-logs-v0_7",
    "pri": "1",
    "pri.store.size": "37.5gb",
    "rep": "1",
    "status": "open",
    "store.size": "37.5gb",
    "uuid": "otel-logs-v0_7:01HTJC6TME1JGXBFERHZ0FJ860"
  }
]
```

[HTTP accept header]: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

### `_field_caps` &nbsp; Field capabilities API

```
GET api/v1/_elastic//_field_caps
```

```
POST api/v1/_elastic//_field_caps
```

```
GET api/v1/_elastic/_field_caps
```

```
POST api/v1/_elastic/_field_caps
```

The [field capabilities API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-field-caps.html) returns information about the capabilities of fields among multiple indices.

#### Supported Query string parameters

| Variable | Type | Description | Default value |
| --------------------- | ---------- | ------------------------------------------------------------------------------ | ------------- |
| `fields` | `String` | Comma-separated list of fields to retrieve capabilities for. Supports wildcards (`*`). | (Optional) |
| `allow_no_indices` | `Boolean` | If `true`, missing or closed indices are not an error. | (Optional) |
| `expand_wildcards` | `String` | Controls which kinds of indices wildcard patterns can match. | (Optional) |
| `ignore_unavailable` | `Boolean` | If `true`, unavailable indices are ignored. | (Optional) |
| `start_timestamp` | `Integer` | *(Quickwit-specific)* If set, restricts splits to documents with a timestamp range start >= `start_timestamp` (seconds since epoch). | (Optional) |
| `end_timestamp` | `Integer` | *(Quickwit-specific)* If set, restricts splits to documents with a timestamp range end < `end_timestamp` (seconds since epoch). | (Optional) |

#### Supported Request Body parameters

| Variable | Type | Description | Default value |
| ------------------ | ------------- | --------------------------------------------------------------------------- | ------------- |
| `index_filter` | `Json object` | A query to filter indices. If provided, only fields from indices that can potentially match the filter are returned. See [index_filter](#index_filter). | (Optional) |
| `runtime_mappings` | `Json object` | Accepted but not supported. | (Optional) |

#### `index_filter`

The `index_filter` parameter allows you to filter which indices contribute to the field capabilities response. When provided, Quickwit uses the filter query to prune indices (splits) that cannot match the filter, and only returns field capabilities for the remaining ones.

Like Elasticsearch, this is a **best-effort** approach: Quickwit may return field capabilities from indices that do not actually contain any matching documents. In Quickwit, the filtering is limited to the existing split-pruning based on metadata:

- **Time pruning**: Range queries on the timestamp field can eliminate splits whose time range does not overlap with the filter.
- **Tag pruning**: Term queries on [tag fields](../configuration/index-config.md#tag-fields) can eliminate splits that do not contain the requested tag value.

Other filter types (e.g. full-text queries or term queries on non-tag fields) are accepted but will not prune any splits: all indices will be returned as if no filter was specified. In particular, Quickwit does not check whether terms are present in the term dictionary.

#### Request Body example

```json
{
  "index_filter": {
    "range": {
      "timestamp": {
        "gte": "2024-01-01T00:00:00Z",
        "lt": "2024-02-01T00:00:00Z"
      }
    }
  }
}
```

```json
{
  "index_filter": {
    "term": {
      "status": "active"
    }
  }
}
```

## Query DSL

[Elasticsearch Query DSL reference](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl.html).

The following query types are supported.

### `query_string`

[Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl-query-string-query.html)

#### Example

```json
{
  "query": {
    "query_string": {
      "query": "bitpacking AND author.login:fulmicoton",
      "fields": [
        "payload.description"
      ]
    }
  }
}
```

#### Supported parameters

| Variable | Type | Description | Default value |
| ------------------ | --------------------- | --------------------------------------------------------------------------------------------------------------------------- | ------------- |
| `query` | `String` | Query meant to be parsed. | - |
| `fields` | `String[]` (Optional) | Default search target fields. | - |
| `default_operator` | `"AND"` or `"OR"` | In the absence of boolean operator defines whether terms should be combined as a conjunction (`AND`) or disjunction (`OR`). | `OR` |
| `boost` | `Number` | Multiplier boost for score computation. | 1.0 |
| `lenient` | `Boolean` | [See note](#about-the-lenient-argument).
| false | ### `bool` [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl-bool-query.html) #### Example ```json { "query": { "bool": { "must": [ { "query_string": { "query": "bitpacking" } } ], "must_not": { "term": { "type": { "value": "CommitEvent" } } } } } } ``` #### Supported parameters | Variable | Type | Description | Default value | | ---------- | ------------------------- | ----------------------------------------------------------------- | ------------- | | `must` | `JsonObject[]` (Optional) | Sub-queries required to match the document. | [] | | `must_not` | `JsonObject[]` (Optional) | Sub-queries required to not match the document. | [] | | `should` | `JsonObject[]` (Optional) | Sub-queries that should match the documents. | [] | | `filter` | `JsonObject[]` | Like must queries, but the match does not influence the `_score`. | [] | | `boost` | `Number` | Multiplier boost for score computation. | 1.0 | | `minimum_should_match` | `Number` or `String` | If present, Quickwit only matches documents for which at least `minimum_should_match` of the `should` clauses match. `2`, `-1`, `"10%"` and `"-10%"` are supported.
| | ### `range` [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl-range-query.html) #### Example ```json { "query": { "range": { "my_date_field": { "lt": "2015-02-01T00:00:13Z", "gte": "2015-02-01T00:00:10Z" } } } } ``` #### Supported parameters | Variable | Type | Description | Default value | | -------- | ------------------------------- | -------------------------------------- | ------------- | | `gt` | bool, string, Number (Optional) | Greater than | None | | `gte` | bool, string, Number (Optional) | Greater than or equal | None | | `lt` | bool, string, Number (Optional) | Less than | None | | `lte` | bool, string, Number (Optional) | Less than or equal | None | | `boost` | `Number` | Multiplier boost for score computation | 1.0 | ### `match` [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl-match-query.html) #### Example ```json { "query": { "match": { "type": { "query": "CommitEvent", "zero_terms_query": "all" } } } } ``` #### Supported Parameters | Variable | Type | Description | Default | | ------------------ | ----------------- | ------------------------------------------------------------------------------------------------------------------------------ | ------- | | `query` | String | Full-text search query. | - | | `operator` | `"AND"` or `"OR"` | Defines whether all terms should be present (`AND`) or if at least one term is sufficient to match (`OR`). | OR | | `zero_terms_query` | `all` or `none` | Defines if all (`all`) or no documents (`none`) should be returned if the query does not contain any terms after tokenization. | `none` | | `boost` | `Number` | Multiplier boost for score computation | 1.0 | | `lenient` | `Boolean` | [See note](#about-the-lenient-argument). 
| false | ### `match_phrase` [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl-match-query-phrase.html) #### Example ```json { "query": { "match_phrase": { "title": "search keywords", "analyzer": "default" } } } ``` ### `match_phrase_prefix` [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase-prefix.html) #### Example ```json { "query": { "match_phrase_prefix": { "payload.commits.message": { "query": "automated comm" // This will match "automated commit" for instance. } } } } ``` #### Supported Parameters | Variable | Type | Description | Default | | ------------------ | --------------- | ------------------------------------------------------------------------------------------------------------------------------ | --------------------------- | | `query` | String | Full-text search query. The last token will be prefix-matched. | - | | `zero_terms_query` | `all` or `none` | Defines if all (`all`) or no documents (`none`) should be returned if the query does not contain any terms after tokenization. | `none` | | `max_expansions` | `Integer` | Maximum number of terms matched by the prefix. | 50 | | `slop` | `Integer` | Allows extra tokens between the query tokens. | 0 | | `analyzer` | String | Analyzer meant to cut the query into terms. It is recommended to NOT use this parameter. | The actual field tokenizer. | ### `match_bool_prefix` [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-bool-prefix-query.html) #### Example ```json { "query": { "match_bool_prefix": { "payload.commits.message": { "query": "automated comm" // This will match "automated commit" for instance. } } } } ``` Contrary to Elasticsearch/OpenSearch, Quickwit considers at most 50 terms when expanding the last term of a `match_bool_prefix` query as a prefix.
#### Supported Parameters | Variable | Type | Description | Default | | ------------------ | ----------------- | ------------------------------------------------------------------------------------------------------------------------------ | ------- | | `query` | String | Full-text search query. The last token will be prefix-matched | - | | `operator` | `"AND"` or `"OR"` | Defines whether all terms should be present (`AND`) or if at least one term is sufficient to match (`OR`). | OR | | `zero_terms_query` | `all` or `none` | Defines if all (`all`) or no documents (`none`) should be returned if the query does not contain any terms after tokenization. | `none` | ### `Multi-match` [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl-multi-match-query.html) #### Example ```json { "query": { "multi_match": { "query": "search keywords", "fields": [ "title", "body" ] } } } ``` ```json { "query": { "multi_match": { "query": "search keywords", "type": "most_fields", "fields": [ "title", "body" ] } } } ``` ```json { "query": { "multi_match": { "query": "search keywords", "type": "phrase", "fields": [ "title", "body" ] } } } ``` ```json { "query": { "multi_match" : { "query": "search key", "type": "phrase_prefix", "fields": [ "title", "body" ] } } } ``` #### Supported parameters | Variable | Type | Description | Default value | | ------------------ | --------------------- | ---------------------------------------------| ------------- | | `type` | `String` | See supported types below | `most_fields` | | `fields` | `String[]` (Optional) | Default search target fields. | - | | `lenient` | `Boolean` | [See note](#about-the-lenient-argument). | false | Supported types: | `type` value | Description | | --------------- | ------------------------------------------------------------------------------------------- | | `most_fields` | Finds documents matching any field and combines the `_score` from each field (default). 
| | `phrase` | Runs a `match_phrase` query on each field. | | `phrase_prefix` | Runs a `match_phrase_prefix` query on each field. | | `bool_prefix` | Runs a `match_bool_prefix` query on each field. | :::warning In `phrase`, `phrase_prefix` and `bool_prefix` modes, Quickwit sums the score of the different fields instead of returning their max. Moreover, while Quickwit does not support `best_fields` or `cross_fields`, it will not return an error when presented with a `best_fields` or `cross_fields` type. For compatibility reasons, Quickwit silently accepts these parameters and interprets them as a `most_fields` type. ::: ### `term` [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl-term-query.html) :::note When working on text, it is recommended to only use `term` queries on fields configured with `tokenizer: raw`. This is the Quickwit equivalent of the Elasticsearch `keyword` type. ::: #### Example ```json { "query": { "term": { "payload.commits.message": { "value": "automated", "boost": 2.0 } } } } ``` #### Supported Parameters | Variable | Type | Description | Default | | ------------------ | ------- | ---------------------------------------------------------------------------- | ------- | | `value` | String | Term value. This is the string representation of a token after tokenization. | - | | `boost` | Number | Multiplier boost for score computation | 1.0 | | `case_insensitive` | Boolean | Allows ASCII case insensitive matching of the value. | false | ### `match_all` / `match_none` [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-all-query.html) #### Example ```json {"match_all": {}} ``` ```json {"match_none": {}} ``` ### `exists` [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl-exists-query.html) Query matching only documents containing a non-null value for a given field.
#### Example ```json { "query": { "exists": { "field": "author.login" } } } ``` #### Supported Parameters | Variable | Type | Description | Default | | -------- | ------ | ------------------------------------------------------- | ------- | | `field` | String | Only documents with a value for field will be returned. | - | ### `prefix` [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl-prefix-query.html) Returns documents that contain a specific prefix in a provided field. #### Example ```json { "query": { "prefix": { "author.login": { "value": "adm" } } } } ``` #### Supported Parameters | Variable | Type | Description | Default | | ------------------ | ------- | ---------------------------------------------------- | ------- | | `value` | String | Beginning characters of terms you wish to find. | - | | `case_insensitive` | Boolean | Allows ASCII case insensitive matching of the value. | false | ### `wildcard` [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl-wildcard-query.html) Returns documents that contain terms matching a wildcard pattern: * `?` replaces one and only one term character * `*` replaces any number of term characters or an empty string #### Example ```json { "query": { "wildcard": { "author.login": { "value": "adm?n*" } } } } ``` #### Supported Parameters | Variable | Type | Description | Default | | ------------------ | ------- | ---------------------------------------------------- | ------- | | `value` | String | Wildcard pattern for terms you wish to find. | - | | `boost` | Number | Multiplier boost for score computation. | 1.0 | | `case_insensitive` | Boolean | Allows ASCII case insensitive matching of the value. | false | ### `regexp` [Elasticsearch reference documentation](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl-regexp-query.html) Returns documents that contain terms matching a regular expression.
#### Example ```json { "query": { "regexp": { "author.login": { "value": "adm.*n" } } } } ``` #### Supported Parameters | Variable | Type | Description | Default | | ------------------ | ------- | ---------------------------------------------------- | ------- | | `value` | String | Regular expression for terms you wish to find. | - | | `case_insensitive` | Boolean | Allows ASCII case insensitive matching of the value. | false | ### About the `lenient` argument Quickwit and Elasticsearch have different interpretations of the `lenient` setting: - In Quickwit, lenient mode allows ignoring parts of the query that reference non-existing columns. This is a behavior that Elasticsearch supports by default. - In Elasticsearch, lenient mode primarily addresses type errors (such as searching for text in an integer field). Quickwit always supports this behavior, regardless of the `lenient` setting. ## Search multiple indices Search APIs that take an index path parameter also support multi-target syntax. ### Multi-target syntax In multi-target syntax, you can use a list separated by commas (or their URL-encoded form, `%2C`) to run a request on multiple indices: `test1,test2,test3`. You can also use [glob-like](https://en.wikipedia.org/wiki/Glob_(programming)) wildcard ( \* ) expressions to target indices that match a pattern: test\* or \*test or te\*t or \*test\*. The multi-target expression has the following constraints: - It must follow the regex `^[a-zA-Z\*][a-zA-Z0-9-_\.\*]{0,254}$`. - It cannot contain consecutive asterisks (`*`). - If it does not contain an asterisk (`*`), the length must be greater than or equal to 3 characters.
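
The constraints above can be checked on the client before a request is sent. The following Python sketch is illustrative only: the helper names `is_valid_index_pattern` and `split_multi_target` are hypothetical, and it assumes that the regex and length rules apply to each comma-separated pattern individually, which the text above does not state explicitly.

```python
import re
from typing import List

# Same character classes as the regex quoted above, with the hyphen escaped
# explicitly so that the character class is unambiguous in Python.
_PATTERN_RE = re.compile(r"[a-zA-Z*][a-zA-Z0-9\-_.*]{0,254}")


def is_valid_index_pattern(pattern: str) -> bool:
    """Check one index ID pattern against the three documented constraints."""
    if _PATTERN_RE.fullmatch(pattern) is None:
        return False
    if "**" in pattern:  # no consecutive asterisks
        return False
    if "*" not in pattern and len(pattern) < 3:  # plain IDs need 3+ characters
        return False
    return True


def split_multi_target(expression: str) -> List[str]:
    """Split a multi-target expression on ',' or '%2C' and validate each part.

    Assumption: every comma-separated part is validated on its own.
    """
    patterns = expression.replace("%2C", ",").split(",")
    invalid = [p for p in patterns if not is_valid_index_pattern(p)]
    if invalid:
        raise ValueError(f"invalid index pattern(s): {invalid}")
    return patterns
```

For instance, `split_multi_target("test1,test2,test*")` returns the three patterns, while an expression such as `te**t` is rejected because of the consecutive asterisks.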
### Examples ``` GET api/v1/_elastic/stackoverflow-000001,stackoverflow-000002/_search { "query": { "query_string": { "query": "search AND engine", "fields": [ "title", "body" ] } } } ``` ``` GET api/v1/_elastic/stackoverflow*/_search { "query": { "query_string": { "query": "search AND engine", "fields": [ "title", "body" ] } } } ``` ================================================ FILE: docs/reference/metrics.md ================================================ --- title: Metrics sidebar_position: 70 --- Quickwit exposes key metrics in the [Prometheus](https://prometheus.io/) format on the `/metrics` endpoint. You can use any front-end that supports Prometheus to examine the behavior of Quickwit visually. ## Cache Metrics Currently Quickwit exposes metrics for three caches: `fastfields`, `shortlived`, `splitfooter`. These metrics share the same structure. | Namespace | Metric Name | Description | Type | | --------- | ----------- | ----------- | ---- | | `quickwit_cache_{cache_name}` | `in_cache_count` | Count of {cache_name} in cache | `gauge` | | `quickwit_cache_{cache_name}` | `in_cache_num_bytes` | Number of {cache_name} bytes in cache | `gauge` | | `quickwit_cache_{cache_name}` | `cache_hit_total` | Number of {cache_name} cache hits | `counter` | | `quickwit_cache_{cache_name}` | `cache_hits_bytes` | Number of {cache_name} cache hits in bytes | `counter` | | `quickwit_cache_{cache_name}` | `cache_miss_total` | Number of {cache_name} cache misses | `counter` | ## CLI Metrics | Namespace | Metric Name | Description | Type | | --------- | ----------- | ----------- | ---- | | `quickwit` | `allocated_num_bytes` | Number of bytes of allocated memory, as reported by jemalloc.
| `gauge` | ## Common Metrics | Namespace | Metric Name | Description | Labels | Type | | --------- | ----------- | ----------- | ------ | ---- | | `quickwit` | `write_bytes`| Number of bytes written by a given component in [`indexer`, `merger`, `deleter`, `split_downloader_{merge,delete}`] | [`index`, `component`] | `counter` | ## Indexing Metrics | Namespace | Metric Name | Description | Labels | Type | | --------- | ----------- | ----------- | ------ | ---- | | `quickwit_indexing` | `processed_docs_total`| Number of processed docs by index, source and processed status in [`valid`, `schema_error`, `parse_error`, `transform_error`] | [`index`, `source`, `docs_processed_status`] | `counter` | | `quickwit_indexing` | `processed_bytes`| Number of processed bytes by index, source and processed status in [`valid`, `schema_error`, `parse_error`, `transform_error`] | [`index`, `source`, `docs_processed_status`] | `counter` | | `quickwit_indexing` | `available_concurrent_upload_permits`| Number of available concurrent upload permits by component in [`merger`, `indexer`] | [`component`] | `gauge` | | `quickwit_indexing` | `ongoing_merge_operations`| Number of ongoing merge operations by index and source.
| [`index`, `source`] | `gauge` | ## Ingest Metrics | Namespace | Metric Name | Description | Type | | --------- | ----------- | ----------- | ---- | | `quickwit_ingest` | `docs_bytes_total` | Total size of the docs ingested, measured in ingester's leader, after validation and before persistence/replication | `counter` | | `quickwit_ingest` | `docs_total` | Total number of the docs ingested, measured in ingester's leader, after validation and before persistence/replication | `counter` | | `quickwit_ingest` | `queue_count` | Number of queues currently active | `counter` | ## Metastore Metrics All metastore methods are monitored by the 3 metrics: | Namespace | Metric Name | Description | Labels | Type | | --------- | ----------- | ----------- | ------ | ---- | | `quickwit_metastore` | `requests_total` | Number of requests | [`operation`, `index`] | `counter` | | `quickwit_metastore` | `request_errors_total` | Number of failed requests | [`operation`, `index`] | `counter` | | `quickwit_metastore` | `request_duration_seconds` | Duration of requests | [`operation`, `index`, `error`] | `histogram` | Examples of operation names: `create_index`, `index_metadata`, `delete_index`, `stage_splits`, `publish_splits`, `list_splits`, `add_source`, ... ## Rest API Metrics | Namespace | Metric Name | Description | Type | | --------- | ----------- | ----------- | ---- | | `quickwit` | `http_requests_total` | Total number of HTTP requests received | `counter` | ## Search Metrics | Namespace | Metric Name | Description | Type | | --------- | ----------- | ----------- | ---- | | `quickwit_search` | `leaf_searches_splits_total` | Number of leaf searches (count of splits) started | `counter` | | `quickwit_search` | `leaf_search_split_duration_secs` | Number of seconds required to run a leaf search over a single split. 
The timer starts after the semaphore is obtained | `histogram` | | `quickwit_search` | `active_search_threads_count` | Number of threads in use in the CPU thread pool | `gauge` | ## Storage Metrics | Namespace | Metric Name | Description | Type | | --------- | ----------- | ----------- | ---- | | `quickwit_storage` | `object_storage_gets_total` | Number of objects fetched | `counter` | | `quickwit_storage` | `object_storage_puts_total` | Number of objects uploaded. May differ from `object_storage_puts_parts` due to multipart upload | `counter` | | `quickwit_storage` | `object_storage_puts_parts` | Number of object parts uploaded | `counter` | | `quickwit_storage` | `object_storage_download_num_bytes` | Amount of data downloaded from an object storage | `counter` | ================================================ FILE: docs/reference/query-language.md ================================================ --- title: Query Language Reference sidebar_position: 40 --- ## Pseudo-grammar ``` query = '(' query ')' | query operator query | unary_operator query | query query | clause operator = 'AND' | 'OR' unary_operator = 'NOT' | '-' clause = field_name ':' field_clause | defaultable_clause | '*' field_clause = term | term_prefix | term_set | phrase | phrase_prefix | range | '*' defaultable_clause = term | term_prefix | term_set | phrase | phrase_prefix ``` --- ## Writing Queries ### Escaping Special Characters Some characters must be escaped in unquoted terms because they are otherwise syntactically significant. The reserved characters are: `+` , `^`, `` ` ``, `:`, `{`, `}`, `"`, `[`, `]`, `(`, `)`, `~`, `!`, `\\`, `*`, `SPACE`. If such characters appear in query terms, they need to be escaped by prefixing them with a backslash `\`. In quoted terms, the quote character in use `'` or `"` needs to be escaped.
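
The escaping rule for unquoted terms can be sketched in a few lines of Python. This is a minimal illustration, not part of Quickwit: `escape_term` and `field_clause` are hypothetical helper names, and the set of characters is copied from the list of reserved characters above.

```python
# Reserved characters listed above; the final entry is the space character.
SPECIAL_CHARS = set('+^`:{}"[]()~!\\* ')


def escape_term(term: str) -> str:
    """Prefix every reserved character of an unquoted term with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


def field_clause(field: str, term: str) -> str:
    """Build a `field:term` clause with the term escaped (hypothetical helper)."""
    return f"{field}:{escape_term(term)}"
```

For example, `field_clause("title", "C++ guide")` produces `title:C\+\+\ guide`, which the query parser reads as a single term rather than two clauses.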
###### Allowed characters in field names See the [Field name validation rules](https://quickwit.io/docs/configuration/index-config#field-name-validation-rules) in the index config documentation. ### Addressing nested structures Data stored deep inside nested data structures like `object` or `json` fields can be addressed using dots as separators in the field name. For instance, the document `{"product": {"attributes": {"color": "red"}}}` is matched by ``` product.attributes.color:red ``` If the keys of your object contain dots, the above syntax has some ambiguity: by default `{"k8s.component.name": "quickwit"}` will be matched by ```k8s.component.name:quickwit``` It is possible to remove the ambiguity by setting `expand_dots` in the json field configuration. In that case, it will be necessary to escape the `.` in the query to match this document, like this: ``` k8s\.component\.name:quickwit ``` --- ## Structured data ### Datetime Datetime values must be provided in RFC 3339 format, such as `1970-01-01T00:00:00Z`. ### IP addresses IP addresses can be provided as IPv4 or IPv6. It is recommended to search with the format used when indexing documents. There is no support for searching for a range of IPs using CIDR notation, but you can use normal range queries. --- ## Types of clauses ### Term `field:term` ``` term = term_char+ ``` Matches documents if the targeted field contains a token equal to the provided term. `field:value` will match any document where the field 'field' has a token 'value'. ### Wildcard `field:wil?car*d` ``` wildcard = [term_char\*\?]+ ``` Matches documents if the targeted field contains a token that matches the wildcard: - `?` replaces one and only one term character - `*` replaces any number of term characters or an empty string Examples: - `field:quick*` will match any document where the field 'field' has a token like `quickwit` or `quickstart`, but not `qui` or `abcd`.
- `field:h?llo` will match any document where the field 'field' has a token like `hello` or `hallo`, but not `heillo` or `hllo`. Queries with prefixes (`field:qui*`) are much more efficient than queries starting with a wildcard (`field:*wit`). ### Term set `field:IN [a b c]` ``` term_set = 'IN' '[' term_list ']' term_list = term_list term | term ``` Matches if the document contains any of the tokens provided. ###### Examples `field:IN [ab cd]` will match 'ab' or 'cd', but nothing else. ###### Performance Note This is a lot like writing `field:ab OR field:cd`. When there are only a handful of terms to search for, using ORs is usually faster. When there are many values to match, a term set query can become more efficient. ### Phrase `field:"sequence of words"` ``` phrase = phrase_string | phrase_string slop phrase_string = '"' phrase_char '"' slop = '~' [01-9]+ ``` Matches if the field contains the sequence of tokens provided: - `field:"looks good to me"` will match any document containing that sequence of tokens. - `field:"look* good to me"` with the default tokenizer is equivalent to `field:"look good to me"`, i.e. the '*' character is pruned by the tokenizer and not interpreted as a wildcard. :::info The field must have been configured with `record: position` when indexing. ::: ###### Slop operator It is also possible to add a slop, which allows matching a sequence with some distance. For instance `"looks to me"~1` will match "looks good to me", but not "looks very good to me". Transposition costs 2, e.g. `"A B"~1` will not match `"B A"` but it would with `"A B"~2`. Transposition is not a special case: in the example above, A is moved 1 position and B is moved 1 position, so the slop is 2. ### Phrase Prefix `field:"finish this phr"*` ``` phrase_prefix = phrase '*' ``` Matches if the field contains the sequence of tokens provided, where the last token in the query may be only a prefix of the token in the document.
The field must have been configured with `record: position` when indexing. There is no slop for phrase prefix queries. ###### Examples `field:"thanks for your contrib"*` will match 'thanks for your contribution'. ###### Limitation Quickwit may trim some results matched by this clause in some cases. If you search for `"thanks for your co"*`, it will enumerate the first 50 tokens which start with "co" (in their storage order), and search for any documents where "thanks for your" is followed by any of these tokens. If there are many tokens starting with "co", "contribution" might not be one of the 50 selected tokens, and the query won't match a document containing "thanks for your contribution". Normal prefix queries don't suffer from this issue. ### Range `field:[low_bound TO high_bound}` ``` range = explicit_range | comparison_half_range explicit_range = left_bound_char bounds right_bound_char left_bound_char = '[' | '{' right_bound_char = '}' | ']' bounds = term TO term | term TO '*' | '*' TO term comparison_half_range = comparison_operator term comparison_operator = '<' | '>' | '<=' | '>=' ``` Matches if the document contains a token between the provided bounds for that field. For range queries, you must provide a field. Quickwit won't use `default_search_fields` automatically. ###### Order For text fields, the ranges are defined by lexicographic order on UTF-8 encoded byte arrays. It means that for a text field, 100 is between 1 and 2. When using ranges on integers, it behaves naturally. ###### Inclusive and exclusive bounds Inclusive bounds are represented by square brackets `[]`. They will match tokens equal to the bound term. Exclusive bounds are represented by curly brackets `{}`. They will not match tokens equal to the bound term. ###### Half-Open bounds You can make a half-open range by using `*` as one of the bounds. `field:[b TO *]` will match 'bb' and 'zz', but not 'ab'. You can also use a comparison-based syntax: `field:<b`, `field:>b`, `field:<=b` or `field:>=b`.
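
The bracket rules above can be summarized with a small Python sketch that assembles an explicit range clause. The function `range_clause` and its parameters are hypothetical, introduced only for illustration. Bound values are inserted verbatim, so any escaping of reserved characters is assumed to have been done by the caller.

```python
from typing import Optional


def range_clause(
    field: str,
    low: Optional[str] = None,
    high: Optional[str] = None,
    include_low: bool = True,
    include_high: bool = True,
) -> str:
    """Build `field:[low TO high]`, using `*` for a missing bound.

    Square brackets mark inclusive bounds, curly brackets exclusive ones.
    """
    if low is None and high is None:
        # The grammar has no `'*' TO '*'` form, so one bound is required.
        raise ValueError("at least one bound is required")
    left = "[" if include_low else "{"
    right = "]" if include_high else "}"
    low_text = "*" if low is None else str(low)
    high_text = "*" if high is None else str(high)
    return f"{field}:{left}{low_text} TO {high_text}{right}"
```

Calling `range_clause("ip", "127.0.0.1", "127.0.0.50")` yields `ip:[127.0.0.1 TO 127.0.0.50]`, and passing `include_low=False` with no upper bound yields the half-open form `ip:{127.0.0.1 TO *]`.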
###### Examples - Inclusive Range: `ip:[127.0.0.1 TO 127.0.0.50]` - Exclusive Range: `ip:{127.0.0.1 TO 127.0.0.50}` - Unbounded Inclusive Range: `ip:[127.0.0.1 TO *]` or `ip:>=127.0.0.1` - Unbounded Exclusive Range: `ip:{127.0.0.1 TO *]` or `ip:>127.0.0.1` ### Exists `field:*` Matches documents where the field is set. You have to specify a field for this query. Quickwit won't use `default_search_fields` automatically. ### Match All `*` Matches every document. You can't put a field in front. It is simply written as `*`. --- ## Building Queries Most queries are composed of more than one clause. When doing so, you may add operators between clauses. If no operator is provided, `AND` is assumed implicitly. ### Conjunction `AND` An `AND` query will match only if both sides match. ### Disjunction `OR` An `OR` query will match if either (or both) sides match. ### Negation `NOT` or `-` A `NOT` query will match if the clause it is applied to does not match. The `-` prefix is equivalent to the `NOT` operator. ### Grouping `()` Parentheses are used to force the order of evaluation of operators. For instance, if a query should match if 'field1' is 'one' or 'two', and 'field2' is 'three', you can use `(field1:one OR field1:two) AND field2:three`. ### Operator Precedence Without parentheses, `AND` takes precedence over `OR`. That is, `a AND b OR c` is interpreted as `(a AND b) OR c`. `NOT` and `-` take precedence over everything, such that `-a AND b` means `(-a) AND b`, not `-(a AND b)`. --- ## Other considerations ### Default Search Fields In many cases it is possible to omit the field you search if it was configured in the `default_search_fields` array of the index configuration. If more than one field is configured as default, the resulting implicit clauses are combined using a disjunction (`OR`). ### Tokenization Note that the result of a query can depend on the tokenizer used for the field getting searched.
Hence this document always speaks of tokens, which may be the exact value the document contains (in the case of the raw tokenizer), or a subset of it (for instance with any tokenizer cutting on spaces). ================================================ FILE: docs/reference/rest-api.md ================================================ --- title: REST API sidebar_position: 10 --- ## API version All the API endpoints start with the `api/v1/` prefix. `v1` indicates that we are currently using version 1 of the API. ## OpenAPI specification The OpenAPI specification of the REST API is available at `/openapi.json` and a Swagger UI version is available at `/ui/api-playground`. ## Parameters Parameters passed in the URL must be properly URL-encoded, using the UTF-8 encoding for non-ASCII characters. ``` GET [..]/search?query=barack%20obama ``` ## Error handling Successful requests return a 2xx HTTP status code. Failed requests return a 4xx HTTP status code. The response body of failed requests holds a JSON object containing a `message` field that describes the error. ```json { "message": "Failed to parse query" } ``` ## Search API ### Search in an index Search for documents matching a query in the given index `api/v1//search`. This endpoint is available as long as you have at least one node running a searcher service in the cluster. The search endpoint accepts `GET` and `POST` requests. The [parameters](#get-parameters) are URL parameters for `GET` requests or JSON key-value pairs for `POST` requests. ``` GET api/v1//search?query=searchterm ``` ``` POST api/v1//search { "query": "searchterm" } ``` #### Path variable | Variable | Description | | ------------- | ------------- | | `index id` | The index id | #### Parameters | Variable | Type | Description | Default value | |---------------------|------------|-----------------|-----------------| | `query` | `String` | Query text.
See the [query language doc](query-language.md) | _required_ | | `start_timestamp` | `i64` | If set, restrict search to documents with a `timestamp >= start_timestamp`, taking advantage of potential time pruning opportunities. The value must be in seconds. | | | `end_timestamp` | `i64` | If set, restrict search to documents with a `timestamp < end_timestamp`, taking advantage of potential time pruning opportunities. The value must be in seconds. | | | `start_offset` | `Integer` | Number of documents to skip | `0` | | `max_hits` | `Integer` | Maximum number of hits to return (by default 20) | `20` | | `search_field` | `[String]` | Fields to search on if no field name is specified in the query. Comma-separated list, e.g. "field1,field2" | index_config.search_settings.default_search_fields | | `snippet_fields` | `[String]` | Fields to extract snippet on. Comma-separated list, e.g. "field1,field2" | | | `sort_by` | `[String]` | Fields to sort the query results on. You can sort by one or two fast fields or by BM25 `_score` (requires fieldnorms). By default, hits are sorted in reverse order of their [document ID](/docs/overview/concepts/querying.md#document-id) (to show recent events first). | | | `format` | `Enum` | The output format. Allowed values are "json" or "pretty_json" | `pretty_json` | | `aggs` | `JSON` | The aggregations request. See the [aggregations doc](aggregation.md) for supported aggregations. | | :::info The `start_timestamp` and `end_timestamp` should be specified in seconds regardless of the timestamp field precision. 
::: #### Response The response is a JSON object, and the content type is `application/json; charset=UTF-8.` | Field | Description | Type | | -------------------- | ------------------------------ | :--------: | | `hits` | Results of the query | `[hit]` | | `num_hits` | Total number of matches | `number` | | `elapsed_time_micros` | Processing time of the query | `number` | ### Search multiple indices Search APIs that accept `index id` requests path parameter also support multi-target syntax. #### Multi-target syntax In multi-target syntax, you can use a comma or its URL encoded version '%2C' separated list to run a request on multiple indices: test1,test2,test3. You can also use [glob-like](https://en.wikipedia.org/wiki/Glob_(programming)) wildcard ( \* ) expressions to target indices that match a pattern: test\* or \*test or te\*t or \*test\*. The following are some constrains about the multi-target expression. - It must follow the regex `^[a-zA-Z\*][a-zA-Z0-9-_\.\*]{0,254}$`. - It cannot contain consecutive asterisks (`*`). - If it does not contain an asterisk (`*`), the length must be greater than or equal to 3 characters. #### Examples ``` GET api/v1/stackoverflow-000001,stackoverflow-000002/search { "query": "search AND engine", } ``` ``` GET api/v1/stackoverflow*/search { "query": "search AND engine", } ``` ## Ingest API ### Ingest data into an index ``` POST api/v1//ingest -d \ '{"url":"https://en.wikipedia.org/wiki?id=1","title":"foo","body":"foo"} {"url":"https://en.wikipedia.org/wiki?id=2","title":"bar","body":"bar"} {"url":"https://en.wikipedia.org/wiki?id=3","title":"baz","body":"baz"}' ``` Ingest a batch of documents to make them searchable in a given ``. Currently, NDJSON is the only accepted payload format. This endpoint is only available on a node that is running an indexer service. 
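
As a rough illustration of the NDJSON payload format, the sketch below serializes a list of documents into a newline-delimited body and builds the ingest URL. The helper names `to_ndjson` and `ingest_url`, the base URL `http://localhost:7280`, and the index id `wikipedia` are assumptions made for this example; only the ingest path and the `commit` query parameter come from this page.

```python
import json
from urllib.parse import quote, urlencode


def to_ndjson(docs):
    """Serialize documents as NDJSON: one compact JSON object per line."""
    # json.dumps escapes newlines inside strings, so each document
    # is guaranteed to occupy exactly one line of the payload.
    return "".join(json.dumps(doc, separators=(",", ":")) + "\n" for doc in docs)


def ingest_url(base_url, index_id, commit=None):
    """Build the ingest endpoint URL, URL-encoding the index id."""
    url = f"{base_url.rstrip('/')}/api/v1/{quote(index_id, safe='')}/ingest"
    if commit is not None:
        url += "?" + urlencode({"commit": commit})
    return url


docs = [
    {"url": "https://en.wikipedia.org/wiki?id=1", "title": "foo", "body": "foo"},
    {"url": "https://en.wikipedia.org/wiki?id=2", "title": "bar", "body": "bar"},
]
payload = to_ndjson(docs)
target = ingest_url("http://localhost:7280", "wikipedia", commit="wait_for")
```

The resulting `payload` string can then be sent as the body of a `POST` request to `target` with the HTTP client of your choice.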

#### Controlling when the indexed documents will be available for search

Newly added documents will not appear in the search results until they are added to a split and that split is committed. This process is automatic and is controlled by `split_num_docs_target` and `commit_timeout_secs` parameters. By default, the ingest command exits as soon as the records are added to the indexing queue, which means that the new documents will not appear in the search results at this moment. This behavior can be changed by adding `commit=wait_for` or `commit=force` parameters to the query. The `wait_for` parameter will cause the command to wait for the documents to be committed according to the standard time or number of documents rules. The `force` parameter will trigger a commit after all documents in the request are processed. It will also wait for this commit to finish before returning. Please note that the `force` option may have a significant performance cost, especially if it is used on small batches.

```
POST api/v1/<index id>/ingest?commit=wait_for -d \
'{"url":"https://en.wikipedia.org/wiki?id=1","title":"foo","body":"foo"}
{"url":"https://en.wikipedia.org/wiki?id=2","title":"bar","body":"bar"}
{"url":"https://en.wikipedia.org/wiki?id=3","title":"baz","body":"baz"}'
```

:::info
The payload size is limited to 10MB [by default](../configuration/node-config.md#ingest-api-configuration) since this endpoint is intended to receive documents in batches.
:::

#### Path variable

| Variable      | Description   |
| ------------- | ------------- |
| `index id`    | The index id  |

#### Query parameters

| Variable            | Type     | Description                                        | Default value |
|---------------------|----------|----------------------------------------------------|---------------|
| `commit`            | `String` | The commit behavior: `auto`, `wait_for` or `force` | `auto`        |
| `detailed_response` | `bool`   | Enable `parse_failures` in the response. Setting to `true` might impact performance negatively. | `false` |

#### Response

The response is a JSON object, and the content type is `application/json; charset=UTF-8`.

| Field                     | Description | Type |
|---------------------------|-------------|:----:|
| `num_docs_for_processing` | Total number of documents submitted for processing. The documents may not have been processed. | `number` |
| `num_ingested_docs`       | Number of documents successfully persisted in the write ahead log | `number` |
| `num_rejected_docs`       | Number of documents that couldn't be parsed (invalid JSON, bad schema...) | `number` |
| `parse_failures`          | List detailing parsing failures. Only available if `detailed_response` is set to `true`. | `list(object)` |

The parse failure objects contain the following fields:

- `message`: a detailed message explaining the error
- `reason`: one of `invalid_json`, `invalid_schema` or `unspecified`
- `document`: the UTF-8 decoded string of the document byte chunk that generated the error

## Index API

### Create an index

```
POST api/v1/indexes
```

Create an index by posting an `IndexConfig` payload. The API accepts JSON with `content-type: application/json` and YAML with `content-type: application/yaml`.

#### POST payload

| Variable            | Type               | Description | Default value |
|---------------------|--------------------|-------------|---------------|
| `version`           | `String`           | Config format version, use the same as your Quickwit version. | _required_ |
| `index_id`          | `String`           | Index ID, see its [validation rules](../configuration/index-config.md#index-id) on identifiers. | _required_ |
| `index_uri`         | `String`           | Defines where the index files are stored. This parameter expects a [storage URI](../configuration/storage-config.md#storage-uris). | `{default_index_root_uri}/{index_id}` |
| `doc_mapping`       | `DocMapping`       | Doc mapping object as specified in the [index config docs](../configuration/index-config.md#doc-mapping). | _required_ |
| `indexing_settings` | `IndexingSettings` | Indexing settings object as specified in the [index config docs](../configuration/index-config.md#indexing-settings). | |
| `search_settings`   | `SearchSettings`   | Search settings object as specified in the [index config docs](../configuration/index-config.md#search-settings). | |
| `retention`         | `Retention`        | Retention policy object as specified in the [index config docs](../configuration/index-config.md#retention-policy). | |

**Payload Example**

curl -XPOST http://localhost:7280/api/v1/indexes --data @index_config.json -H "Content-Type: application/json"

```json title="index_config.json"
{
  "version": "0.8",
  "index_id": "hdfs-logs",
  "doc_mapping": {
    "field_mappings": [
      { "name": "tenant_id", "type": "u64", "fast": true },
      { "name": "app_id", "type": "u64", "fast": true },
      {
        "name": "timestamp",
        "type": "datetime",
        "input_formats": ["unix_timestamp"],
        "fast_precision": "seconds",
        "fast": true
      },
      { "name": "body", "type": "text", "record": "position" }
    ],
    "partition_key": "tenant_id",
    "max_num_partitions": 200,
    "tag_fields": ["tenant_id"],
    "timestamp_field": "timestamp"
  },
  "search_settings": {
    "default_search_fields": ["body"]
  },
  "indexing_settings": {
    "merge_policy": {
      "type": "limit_merge",
      "max_merge_ops": 3,
      "merge_factor": 10,
      "max_merge_factor": 12
    }
  },
  "retention": {
    "period": "7 days",
    "schedule": "@daily"
  }
}
```

#### Response

The response is the index metadata of the created index, and the content type is `application/json; charset=UTF-8`.

| Field              | Description                                     | Type              |
|--------------------|-------------------------------------------------|:-----------------:|
| `version`          | The current index configuration format version. | `string`          |
| `index_uid`        | The server-generated index UID.                 | `string`          |
| `index_config`     | The posted index config.                        | `IndexConfig`     |
| `checkpoint`       | Map of checkpoints by source.                   | `IndexCheckpoint` |
| `create_timestamp` | Index creation timestamp                        | `number`          |
| `sources`          | List of the index sources configurations.       | `Array`           |

### Update an index

```
PUT api/v1/indexes/<index id>
```

#### Path variable

| Variable      | Description   |
| ------------- | ------------- |
| `index id`    | The index id  |

#### Query parameters

| Variable  | Type   | Description                                  | Default value |
|-----------|--------|----------------------------------------------|---------------|
| `create`  | `bool` | Create the index if it doesn't already exist | `false`       |

Update the configurations of an index. This endpoint follows PUT semantics, which means that all the fields of the current configuration are replaced by the values specified in this request or the associated defaults. In particular, if the field is optional (e.g. `retention`), omitting it will delete the associated configuration. If the new configuration file contains updates that cannot be applied, the request fails, and none of the updates are applied. The API accepts JSON with `content-type: application/json` and YAML with `content-type: application/yaml`.

- The retention policy update is automatically picked up by the janitor service on its next state refresh.
- The search settings update is automatically picked up by searcher nodes when the next query is executed.
- The indexing settings update is automatically picked up by the indexer nodes once the control plane emits a new indexing plan.
- The doc mapping update is automatically picked up by the indexer nodes once the control plane emits a new indexing plan.
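
Because the endpoint follows PUT semantics, a client that only wants to change one setting still has to send the whole configuration. The following is a minimal sketch of that pattern, assuming the current index config has already been obtained as a dictionary (for instance from the `index_config` field of the index metadata). The helper name `build_update_payload` is hypothetical and not part of Quickwit.

```python
import copy


def build_update_payload(current_config, changes):
    """Return a full index config with `changes` applied on top of a copy.

    Top-level keys in `changes` replace the corresponding keys; every other
    key is carried over, so optional sections are not dropped by accident.
    """
    # The index ID in the payload must match the index being updated.
    if "index_id" in changes and changes["index_id"] != current_config["index_id"]:
        raise ValueError("index_id must match the index being updated")
    payload = copy.deepcopy(current_config)
    payload.update(copy.deepcopy(changes))
    return payload


current = {
    "version": "0.8",
    "index_id": "hdfs-logs",
    "doc_mapping": {"field_mappings": [{"name": "body", "type": "text"}]},
    "search_settings": {"default_search_fields": ["body"]},
    "retention": {"period": "7 days", "schedule": "@daily"},
}
updated = build_update_payload(
    current, {"retention": {"period": "30 days", "schedule": "@daily"}}
)
```

Here `updated` keeps the existing `search_settings` and `doc_mapping` while carrying the new retention period, and `current` is left unmodified.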

:::warning
If you use the ingest or ES bulk API (V2), the old doc mapping will still be used to validate new documents that end up being persisted on existing shards (see [#5738](https://github.com/quickwit-oss/quickwit/issues/5738)).
:::

Updating the doc mapping doesn't reindex existing data. Queries and results are mapped on a best-effort basis when querying older splits. For more details, check [the reference](updating-mapper.md) out.

#### PUT payload

| Variable            | Type               | Description | Default value |
|---------------------|--------------------|-------------|---------------|
| `version`           | `String`           | Config format version, use the same as your Quickwit version. | _required_ |
| `index_id`          | `String`           | Index ID, must be the same index as in the request URI. | _required_ |
| `index_uri`         | `String`           | Defines where the index files are stored. Cannot be updated. | `{default_index_root_uri}/{index_id}` |
| `doc_mapping`       | `DocMapping`       | Doc mapping object as specified in the [index config docs](../configuration/index-config.md#doc-mapping). | _required_ |
| `indexing_settings` | `IndexingSettings` | Indexing settings object as specified in the [index config docs](../configuration/index-config.md#indexing-settings). | |
| `search_settings`   | `SearchSettings`   | Search settings object as specified in the [index config docs](../configuration/index-config.md#search-settings). | |
| `retention`         | `Retention`        | Retention policy object as specified in the [index config docs](../configuration/index-config.md#retention-policy). | |

**Payload Example**

curl -XPUT http://localhost:7280/api/v1/indexes/hdfs-logs --data @updated_index_update.json -H "Content-Type: application/json"

```json title="updated_index_update.json"
{
  "version": "0.8",
  "index_id": "hdfs-logs",
  "doc_mapping": {
    "field_mappings": [
      { "name": "tenant_id", "type": "u64", "fast": true },
      { "name": "app_id", "type": "u64", "fast": true },
      {
        "name": "timestamp",
        "type": "datetime",
        "input_formats": ["unix_timestamp"],
        "fast_precision": "seconds",
        "fast": true
      },
      { "name": "body", "type": "text", "record": "position" }
    ],
    "partition_key": "tenant_id",
    "max_num_partitions": 200,
    "tag_fields": ["tenant_id"],
    "timestamp_field": "timestamp"
  },
  "search_settings": {
    "default_search_fields": ["body"]
  },
  "indexing_settings": {
    "merge_policy": {
      "type": "limit_merge",
      "max_merge_ops": 3,
      "merge_factor": 10,
      "max_merge_factor": 12
    }
  },
  "retention": {
    "period": "30 days",
    "schedule": "@daily"
  }
}
```

#### Response

The response is the index metadata of the updated index, and the content type is `application/json; charset=UTF-8`.

| Field              | Description                               | Type              |
|--------------------|-------------------------------------------|:-----------------:|
| `version`          | The current server configuration version. | `string`          |
| `index_uid`        | The server-generated index UID.           | `string`          |
| `index_config`     | The posted index config.                  | `IndexConfig`     |
| `checkpoint`       | Map of checkpoints by source.             | `IndexCheckpoint` |
| `create_timestamp` | Index creation timestamp                  | `number`          |
| `sources`          | List of the index sources configurations. | `Array`           |

### Get an index metadata

```
GET api/v1/indexes/<index id>
```

Get the index metadata of ID `index id`.

#### Response

The response is the index metadata of the requested index, and the content type is `application/json; charset=UTF-8`.

| Field              | Description                               | Type              |
|--------------------|-------------------------------------------|:-----------------:|
| `version`          | The current server configuration version. | `string`          |
| `index_uid`        | The server-generated index UID.           | `string`          |
| `index_config`     | The posted index config.                  | `IndexConfig`     |
| `checkpoint`       | Map of checkpoints by source.             | `IndexCheckpoint` |
| `create_timestamp` | Index creation timestamp.                 | `number`          |
| `sources`          | List of the index sources configurations. | `Array`           |

### Describe an index

```
GET api/v1/indexes/<index id>/describe
```

Describes an index of ID `index id`.

#### Response

The response is the stats about the requested index, and the content type is `application/json; charset=UTF-8`.

| Field                              | Description                                              | Type     |
|------------------------------------|----------------------------------------------------------|:--------:|
| `index_id`                         | Index ID of the index.                                   | `String` |
| `index_uri`                        | URI of the index.                                        | `String` |
| `num_published_splits`             | Number of published splits.                              | `number` |
| `size_published_splits`            | Size of published splits.                                | `number` |
| `num_published_docs`               | Number of published documents.                           | `number` |
| `size_published_docs_uncompressed` | Size of the published documents in bytes (uncompressed). | `number` |
| `timestamp_field_name`             | Name of timestamp field.                                 | `String` |
| `min_timestamp`                    | Starting time of timestamp.                              | `number` |
| `max_timestamp`                    | Ending time of timestamp.                                | `number` |

### Get splits

```
GET api/v1/indexes/<index id>/splits
```

Get the splits belonging to the index of ID `index id`.

#### Path variable

| Variable      | Description   |
| ------------- | ------------- |
| `index id`    | The index id  |

#### Get parameters

| Variable               | Type     | Description |
|------------------------|----------|-------------|
| `offset`               | `number` | If set, number of splits to skip |
| `limit`                | `number` | If set, maximum number of splits to retrieve |
| `split_states`         | `usize`  | If set, specific split state(s) to filter by |
| `start_timestamp`      | `number` | If set, restrict splits to documents with a `timestamp >= start_timestamp` |
| `end_timestamp`        | `number` | If set, restrict splits to documents with a `timestamp < end_timestamp` |
| `end_create_timestamp` | `number` | If set, restrict to splits whose creation dates are before this date |

#### Response

The response lists the splits of the requested index, and the content type is `application/json; charset=UTF-8`.

| Field    | Description                 | Type     |
|----------|-----------------------------|:--------:|
| `offset` | Number of splits skipped.   | `number` |
| `size`   | Number of splits returned.  | `number` |
| `splits` | List of the returned splits. | `List`  |

#### Examples

```
GET /api/v1/indexes/stackoverflow/splits?offset=0&limit=10
```

```json
{
  "offset": 0,
  "size": 1,
  "splits": [
    {
      "split_state": "Published",
      "update_timestamp": 1695642901,
      "publish_timestamp": 1695642901,
      "version": "0.7",
      "split_id": "01HB632HD8W6WHNM7CZFH3KG1X",
      "index_uid": "stackoverflow:01HB6321TDT3SP58D4EZP14KSX",
      "partition_id": 0,
      "source_id": "_ingest-api-source",
      "node_id": "jerry",
      "num_docs": 10000,
      "uncompressed_docs_size_in_bytes": 6674940,
      "time_range": {
        "start": 1217540572,
        "end": 1219335682
      },
      "create_timestamp": 1695642900,
      "maturity": {
        "type": "immature",
        "maturation_period_millis": 172800000
      },
      "tags": [],
      "footer_offsets": {
        "start": 4714989,
        "end": 4719999
      },
      "delete_opstamp": 0,
      "num_merge_ops": 0
    }
  ]
}
```

### Clears an index

```
PUT api/v1/indexes/<index id>/clear
```

Clears index of ID `index id`: all splits will be deleted (metastore + storage) and all source checkpoints will be reset.

It returns an empty body.

### Delete an index

```
DELETE api/v1/indexes/<index id>
```

Delete index of ID `index id`.

#### Response

The response is the list of deleted split files; the content type is `application/json; charset=UTF-8`.

```json
[
  {
    "split_id": "01GK1XNAECH7P14850S9VV6P94",
    "num_docs": 1337,
    "uncompressed_docs_size_bytes": 23933408,
    "file_name": "01GK1XNAECH7P14850S9VV6P94.split",
    "file_size_bytes": 2991676
  }
]
```

### Get all indexes metadata

```
GET api/v1/indexes
```

Retrieve the metadata of all indexes present in the metastore.

#### Response

The response is an array of `IndexMetadata`, and the content type is `application/json; charset=UTF-8`.

### Create a source

```
POST api/v1/indexes/<index id>/sources
```

Create a source by posting a source config JSON payload.

#### POST payload

| Variable        | Type     | Description | Default value |
|-----------------|----------|-------------|---------------|
| `version`       | `String` | Config format version, put your current Quickwit version. | _required_ |
| `source_id`     | `String` | Source ID. See ID [validation rules](../configuration/source-config.md). | _required_ |
| `source_type`   | `String` | Source type: `kafka`, `kinesis` or `pulsar`. | _required_ |
| `num_pipelines` | `usize`  | Number of running indexing pipelines per node for this source. | `1` |
| `transform`     | `object` | A [VRL](https://vector.dev/docs/reference/vrl/) transformation applied to incoming documents, as defined in [source config docs](../configuration/source-config.md#transform-parameters). | `null` |
| `params`        | `object` | Source parameters as defined in [source config docs](../configuration/source-config.md). | _required_ |

**Payload Example**

curl -XPOST http://localhost:7280/api/v1/indexes/my-index/sources --data @source_config.json -H "Content-Type: application/json"

```json title="source_config.json"
{
  "version": "0.8",
  "source_id": "kafka-source",
  "source_type": "kafka",
  "params": {
    "topic": "quickwit-fts-staging",
    "client_params": {
      "bootstrap.servers": "kafka-quickwit-server:9092"
    }
  }
}
```

#### Response

The response is the created source config, and the content type is `application/json; charset=UTF-8`.

### Update a source

```
PUT api/v1/indexes/<index id>/sources/<source id>
```

#### Path variable

| Variable      | Description   |
| ------------- | ------------- |
| `index id`    | The index id  |
| `source id`   | The source id |

#### Query parameters

| Variable  | Type   | Description                                   | Default value |
|-----------|--------|-----------------------------------------------|---------------|
| `create`  | `bool` | Create the source if it doesn't already exist | `false`       |

Update a source by posting a source config JSON payload.

#### PUT payload

| Variable        | Type     | Description | Default value |
|-----------------|----------|-------------|---------------|
| `version`       | `String` | Config format version, put your current Quickwit version. | _required_ |
| `source_id`     | `String` | Source ID, must be the same source as in the request URL. | _required_ |
| `source_type`   | `String` | Source type: `kafka`, `kinesis` or `pulsar`. Cannot be updated. | _required_ |
| `num_pipelines` | `usize`  | Number of running indexing pipelines per node for this source. | `1` |
| `transform`     | `object` | A [VRL](https://vector.dev/docs/reference/vrl/) transformation applied to incoming documents, as defined in [source config docs](../configuration/source-config.md#transform-parameters). | `null` |
| `params`        | `object` | Source parameters as defined in [source config docs](../configuration/source-config.md). | _required_ |

:::warning
While updating `num_pipelines` and `transform` is generally safe and reversible, updating `params` has consequences specific to the source type and might have side effects such as losing the source's checkpoints. Perform such updates with great care.
:::

**Payload Example**

curl -XPUT http://localhost:7280/api/v1/indexes/my-index/sources/kafka-source --data @source_config.json -H "Content-Type: application/json"

```json title="source_config.json"
{
  "version": "0.8",
  "source_id": "kafka-source",
  "source_type": "kafka",
  "params": {
    "topic": "quickwit-fts-staging",
    "client_params": {
      "bootstrap.servers": "kafka-quickwit-server:9092"
    }
  }
}
```

#### Response

The response is the updated source config, and the content type is `application/json; charset=UTF-8`.

### Toggle source

```
PUT api/v1/indexes/<index id>/sources/<source id>/toggle
```

Toggle (enable/disable) source `source id` of index ID `index id`.

It returns an empty body.
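
As an illustration, a toggle request could be assembled with the Python standard library as sketched below. The helper name `toggle_source_request`, the base URL, the index and source ids, and the `Content-Type: application/json` header are assumptions made for this example; the `enable` field is the one documented in the PUT payload table of this section. The request is only built here, not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def toggle_source_request(base_url, index_id, source_id, enable):
    """Build (without sending) the PUT request that toggles a source."""
    url = (
        f"{base_url.rstrip('/')}/api/v1/indexes/{quote(index_id, safe='')}"
        f"/sources/{quote(source_id, safe='')}/toggle"
    )
    # The body is a JSON object holding the single `enable` boolean.
    body = json.dumps({"enable": bool(enable)}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical ids, chosen to match the payload examples on this page.
req = toggle_source_request("http://localhost:7280", "my-index", "kafka-source", False)
```

Passing `req` to `urllib.request.urlopen` would send it to a running node.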

#### PUT payload

| Variable | Type   | Description                                   |
|----------|--------|-----------------------------------------------|
| `enable` | `bool` | If `true` enable the source, else disable it. |

### Reset source checkpoint

```
PUT api/v1/indexes/<index id>/sources/<source id>/reset-checkpoint
```

Resets checkpoints of source `source id` of index ID `index id`.

It returns an empty body.

### Delete a source

```
DELETE api/v1/indexes/<index id>/sources/<source id>
```

Delete source of ID `<source id>`.

## Cluster API

This endpoint lets you check the state of the cluster from the point of view of the node handling the request.

```
GET api/v1/cluster?format=pretty_json
```

#### Parameters

Name | Type | Description | Default value
--- | --- | --- | ---
`format` | `String` | The output format requested for the response: `json` or `pretty_json` | `pretty_json`

## Delete API

The delete API lets you delete documents matching a query.

### Create a delete task

```
POST api/v1/<index id>/delete-tasks
```

Create a delete task that will delete all documents matching the provided query in the given index `<index id>`. The endpoint simply appends your delete task to the delete task queue in the metastore. The deletion will eventually be executed.

#### Path variable

| Variable      | Description   |
| ------------- | ------------- |
| `index id`    | The index id  |

#### POST payload `DeleteQuery`

| Variable          | Type       | Description | Default value |
|-------------------|------------|-------------|---------------|
| `query`           | `String`   | Query text. See the [query language doc](query-language.md) | _required_ |
| `search_field`    | `[String]` | Fields to search on. Comma-separated list, e.g. "field1,field2" | index_config.search_settings.default_search_fields |
| `start_timestamp` | `i64`      | If set, restrict search to documents with a `timestamp >= start_timestamp`. The value must be in seconds. | |
| `end_timestamp`   | `i64`      | If set, restrict search to documents with a `timestamp < end_timestamp`. The value must be in seconds. | |

**Example**

```json
{
  "query": "body:trash",
  "start_timestamp": 1669738645,
  "end_timestamp": 1669825046
}
```

#### Response

The response is the created delete task represented in JSON, `DeleteTask`; the content type is `application/json; charset=UTF-8`.

| Field              | Description                                            | Type          |
|--------------------|--------------------------------------------------------|:-------------:|
| `create_timestamp` | Create timestamp of the delete query in seconds        | `i64`         |
| `opstamp`          | Unique operation stamp associated with the delete task | `u64`         |
| `delete_query`     | The posted delete query                                | `DeleteQuery` |

### List delete queries

```
GET api/v1/<index id>/delete-tasks
```

Get the list of delete tasks for a given `index_id`.

#### Response

The response is an array of `DeleteTask`.

## Index template API

This API manages index template resources. Templates are higher level configuration objects used to automatically create indexes according to predefined rules. See [index template configuration](../configuration/template-config.md).

### Create a template

```
POST api/v1/templates
```

#### POST payload

Create an index template by posting a [template configuration](../configuration/template-config.md) payload. The API accepts JSON with the header `content-type: application/json` and YAML with `content-type: application/yaml`.

**Example**

```yaml
version: 0.9 # File format version.

template_id: "all-logs"

index_root_uri: "s3://my-bucket/logs/"

description: "All my logs"

index_id_patterns:
  - logs-*

priority: 100

doc_mapping:
  mode: dynamic
  field_mappings:
    - name: timestamp
      type: datetime
      input_formats:
        - unix_timestamp
      output_format: unix_timestamp_secs
      fast: true
  timestamp_field: timestamp
```

#### Response

The created index template configuration as JSON.

### Update a template

```
PUT api/v1/templates/