pycsw is Certified OGC Compliant and is an OGC Reference Implementation
================================================
FILE: docs/_templates/layout.html
================================================
{% extends "!layout.html" %}

{%- block extrahead %}
  {{ super() }}
{% endblock %}

{% block relbar1 %}
  {{ super() }}
{% endblock %}

{% block footer %}
{% endblock %}

================================================
FILE: docs/administration.rst
================================================
.. _administration:

Administration
==============

pycsw administration is handled by the ``pycsw-admin.py`` utility. ``pycsw-admin.py``
is installed as part of the pycsw install process and should be available in your
PATH.

.. note::

   Run ``pycsw-admin.py --help`` to see all administration operations and parameters

Metadata Repository Setup
-------------------------

pycsw supports the following databases:

- SQLite3
- PostgreSQL (without PostGIS)
- PostgreSQL with PostGIS enabled
- MySQL

.. note::

   The easiest and fastest way to deploy pycsw is to use SQLite3 as the backend.
   To use an SQLite in-memory database, set ``repository.database`` to ``sqlite://``
   in the pycsw configuration.

.. note::

   PostgreSQL support includes support for PostGIS functions if enabled

.. note::

   If PostGIS is activated before setting up the pycsw/PostgreSQL database, then
   native PostGIS geometries will be enabled.

To expose your geospatial metadata via pycsw, perform the following actions:

- set up the database
- import metadata
- publish the repository

Supported Information Models
----------------------------

By default, pycsw's API supports the core OGC API - Records and CSW Record information
models. From the database perspective, the pycsw metadata model is loosely based on
ISO 19115, and records can be transformed to other formats during OGC API - Records/CSW
requests.

.. note::

   See :ref:`profiles` for information on enabling profiles

.. note::

   See :ref:`metadata-model-reference` for detailed information on pycsw's internal metadata model

Setting up the Database
-----------------------

.. code-block:: bash

   pycsw-admin.py setup-repository --config default.yml

This will create the necessary tables and values for the repository.

The database created is an `OGC SFSQL`_ compliant database, and can be used with any
implementing software. For example, to use with `GDAL`_:

.. code-block:: bash

   ogrinfo /path/to/records.db
   INFO: Open of 'records.db' using driver 'SQLite' successful.
   1: records (Polygon)
   ogrinfo -al /path/to/records.db
   # lots of output

.. note::

   If PostGIS is detected, the ``pycsw-admin.py`` script does not create the SFSQL
   tables as they are already in the database.

Loading Records
---------------

.. code-block:: bash

   pycsw-admin.py load-records --config default.yml --path /path/to/records

This will import all ``*.xml`` records from ``/path/to/records`` into the database
specified in ``default.yml`` (``repository.database``). Passing ``-r`` to the script
will process ``/path/to/records`` recursively. Passing ``-y`` to the script will force
overwrite existing metadata with the same identifier. Note that ``-p`` accepts either a
directory path or a single file.

.. note::

   Records can also be imported using CSW-T (see :ref:`transactions`).

Exporting the Repository
------------------------

.. code-block:: bash

   pycsw-admin.py export-records --config default.yml --path /path/to/output_dir

This will write each record in the database specified in ``default.yml``
(``repository.database``) to an XML document on disk, in directory
``/path/to/output_dir``.

Optimizing the Database
-----------------------

.. code-block:: bash

   pycsw-admin.py optimize-db --config default.yml
   pycsw-admin.py rebuild-db-indexes --config default.yml

.. note::

   This feature is relevant only for PostgreSQL and MySQL

Deleting Records from the Repository
------------------------------------

.. code-block:: bash

   pycsw-admin.py delete-records --config default.yml

This will empty the repository of all records.

Database Specific Notes
-----------------------

PostgreSQL
^^^^^^^^^^

- To enable PostgreSQL support, the database user must be able to create functions
  within the database.
- `PostgreSQL Full Text Search`_ is supported for ``csw:AnyText`` based queries. pycsw
  creates a tsvector column based on the text from the anytext column, then creates a
  GIN index against the anytext_tsvector column. This is created automatically in
  ``pycsw.core.repository.setup``. Any query against the OGC API - Records ``q``
  parameter or CSW ``csw:AnyText`` or ``apiso:AnyText`` will be processed using
  PostgreSQL FTS handling

PostGIS
^^^^^^^

- pycsw makes use of PostGIS spatial functions and the native geometry data type.
- It is advised to install the PostGIS extension before setting up the pycsw database
- If PostGIS is detected, the ``pycsw-admin.py`` script will create both a native
  geometry column and a WKT column, as well as a trigger to keep both synchronized
- In case PostGIS gets disabled, pycsw will continue to work with the `WKT`_ column
- In case of migration from a plain PostgreSQL database to PostGIS, the spatial
  functions of PostGIS will be used automatically
- When migrating from a plain PostgreSQL database to PostGIS, in order to enable native
  geometry support, a "GEOMETRY" column named "wkb_geometry" needs to be created
  manually (along with the update trigger in ``pycsw.core.repository.setup``). The
  native geometries must also be filled manually from the `WKT`_ field. Future versions
  of pycsw will automate this process

.. _custom_repository:

Mapping to an Existing Repository
---------------------------------

pycsw supports publishing metadata from an existing repository.
To enable this functionality, the default database mappings must be modified to
represent the existing database columns mapping to the abstract core model (the
default mappings are in ``pycsw/core/config.py:StaticContext.md_core_model``).

To override the default settings:

- define a custom database mapping based on ``etc/mappings.py``
- in ``default.yml``, set ``repository.mappings`` to the location of the mappings.py
  file:

.. code-block:: yaml

   repository:
       ...
       mappings: path/to/mappings.py

You can also reference the mappings as a Python object via a dotted path:

.. code-block:: yaml

   repository:
       ...
       mappings: path.to.pycsw_mappings

See :ref:`geonode`, :ref:`hhypermap`, and :ref:`odc` for further examples.

.. _existing-repository-requirements:

Existing Repository Requirements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

pycsw requires certain repository attributes and semantics to exist in any repository
to operate as follows:

- ``pycsw:Identifier``: unique identifier
- ``pycsw:Typename``: typename for the metadata; typically the value of the root
  element tag (e.g. ``csw:Record``, ``gmd:MD_Metadata``)
- ``pycsw:Schema``: schema for the metadata; typically the target namespace (e.g.
  ``http://www.opengis.net/cat/csw/2.0.2``, ``http://www.isotc211.org/2005/gmd``)
- ``pycsw:InsertDate``: date of insertion
- ``pycsw:XML``: full XML representation (deprecated; will be removed in a future release)
- ``pycsw:Metadata``: full metadata representation
- ``pycsw:MetadataType``: media type of metadata representation
- ``pycsw:AnyText``: bag of XML element text values, used for full text search.
  Realized with the following design pattern:

  - capture all XML element and attribute values
  - store in repository

- ``pycsw:BoundingBox``: string of `WKT`_ or `EWKT`_ geometry

The following repository semantics exist if the attributes are specified:

- ``pycsw:Keywords``: comma delimited list of keywords
- ``pycsw:Themes``: Text field of JSON list of objects with properties ``concepts``,
  ``scheme``

  .. code-block:: json

     [
       {
         "concepts": [
           {"id": "atmosphericComposition"},
           {"id": "pollution"},
           {"id": "observationPlatform"},
           {"id": "rocketSounding"}
         ],
         "scheme": "https://wis.wmo.int/2012/codelists/WMOCodeLists.xml#WMO_CategoryCode"
       }
     ]

- ``pycsw:Contacts``: Text field of JSON list of objects with properties as per the
  OGC API - Records party definition

  .. code-block:: json

     [
       {
         "name": "contact",
         "individual": "Lastname, Firstname",
         "positionName": "Position Title",
         "contactInfo": {
           "phone": {
             "office": "+xx-xxx-xxx-xxxx"
           },
           "email": {
             "office": "you@example.org"
           },
           "address": {
             "office": {
               "deliveryPoint": "Mailing Address",
               "city": "City",
               "administrativeArea": "Administrative Area",
               "postalCode": "Zip or Postal Code",
               "country": "Country"
             },
             "onlineResource": {
               "href": "Contact URL"
             }
           },
           "hoursOfService": "Hours of Service",
           "contactInstructions": "During hours of service. Off on weekends",
           "url": {
             "rel": "canonical",
             "type": "text/html",
             "href": "https://example.org"
           }
         },
         "roles": [
           {
             "name": "pointOfContact"
           }
         ]
       }
     ]

- ``pycsw:Links``: Text field of JSON list of objects with properties ``name``,
  ``description``, ``protocol``, ``url``

  .. code-block:: json

     [
       {
         "name": "foo",
         "description": "bar",
         "protocol": "OGC:WMS",
         "url": "https://example.org/wms"
       }
     ]

  .. note::

     The ``pycsw:Links`` field should be a text type, not a JSON object type

- ``pycsw:Bands``: Text field of JSON list of dicts with properties: ``name``,
  ``units``, ``min``, ``max``

  .. code-block:: json

     [
       {
         "name": "B1",
         "units": "nm",
         "min": 0.1,
         "max": 0.333
       }
     ]

  .. note::

     The ``pycsw:Bands`` field should be a text type, not a JSON object type

Values of mappings can be derived from the following mechanisms:

- text fields
- Python datetime.datetime or datetime.date objects
- Python functions

Further information is provided in ``pycsw/core/config.py:StaticContext.md_core_model``.

.. note::

   See :ref:`metadata-model-reference` for detailed information on pycsw's internal metadata model

Using a SQL View as the repository table
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If your pre-existing database stores information in a normalized fashion, *i.e.*
distributed on multiple tables rather than on a single table (which is what pycsw
expects by default), you have the option to create a DB view and use that as pycsw's
repository.

As a practical example, let's say you have a `CKAN`_ project which you would like to
also provide pycsw integration with. CKAN stores dataset-related information over
multiple tables:

- ``package`` - has base metadata fields for each dataset;
- ``package_extra`` - additional custom metadata fields, depending on the user's
  metadata schema;
- ``package_tag`` - links datasets to their keywords;
- ``tag`` - dataset-related keywords;
- ``group`` - details about a dataset's owner organization;
- etc.

One way to adapt such a DB structure to be able to integrate with pycsw is to create a
`PostgreSQL Materialized View`_. For example:

.. code-block:: SQL

   CREATE MATERIALIZED VIEW IF NOT EXISTS my_pycsw_view AS
   WITH cte_extras AS (
       SELECT
           p.id,
           p.title,
           g.title AS org_name,
           json_object_agg(pe.key, pe.value) AS extras,
           array_agg(DISTINCT t.name) AS tags
           -- remaining columns omitted for brevity
       FROM package AS p
           JOIN package_extra AS pe ON p.id = pe.package_id
           JOIN "group" AS g ON p.owner_org = g.id
           JOIN package_tag AS pt ON p.id = pt.package_id
           JOIN tag AS t ON pt.tag_id = t.id
       WHERE p.state = 'active'
           AND p.private = false
       GROUP BY p.id, g.title
   )
   SELECT
       c.id AS identifier,
       c.title AS title,
       c.org_name AS organization,
       ST_GeomFromGeoJSON(c.extras->>'spatial')::geometry(Polygon, 4326) AS geom,
       c.extras->>'reference_date' AS date,
       concat_ws(', ', VARIADIC c.tags) AS keywords
       -- remaining columns omitted for brevity
   FROM cte_extras AS c
   WITH DATA;

Creating this SQL view in the database means that we now have all the CKAN dataset
information in a single flat table, ready for pycsw to integrate with.

A crucial step required in order for SQL views to be usable by pycsw is to include the
additional ``column_constraints`` property in your custom mappings. This property is
used to specify which column(s) should function as the primary key of the SQL view:

.. code-block:: python

   # contents of my_custom_pycsw_mappings.py
   from sqlalchemy.schema import PrimaryKeyConstraint

   MD_CORE_MODEL = {
       # treat the view's "identifier" column as its primary key
       "column_constraints": (PrimaryKeyConstraint("identifier"),),
       "typename": "pycsw:CoreMetadata",
       "outputschema": "http://pycsw.org/metadata",
       "mappings": {
           "pycsw:Identifier": "identifier",
           # remaining mappings omitted for brevity
       },
   }

The above code snippet demonstrates how you could instruct SQLAlchemy, which is what
pycsw uses to interface with the DB, that the ``identifier`` column of the SQL view
should be assumed to be the primary key of the table.

Finally, we can configure pycsw with the path to the custom mappings and the name of
the SQL view:

.. code-block:: yaml

   # file: pycsw.yml
   repository:
       database: postgresql://${DB_USERNAME}:${DB_PASSWORD}@${DB_HOST}/${DB_NAME}
       mappings: /path/to/my_custom_pycsw_mappings.py
       table: my_pycsw_view

.. _`GDAL`: https://www.gdal.org
.. _`OGC SFSQL`: https://www.ogc.org/standards/sfs
.. _`WKT`: https://en.wikipedia.org/wiki/Well-known_text
.. _`EWKT`: https://en.wikipedia.org/wiki/Well-known_text#Variations
.. _`PostgreSQL Full Text Search`: https://www.postgresql.org/docs/current/textsearch.html
.. _`CKAN`: https://ckan.org/
.. _`PostgreSQL Materialized View`: https://www.postgresql.org/docs/current/sql-creatematerializedview.html

================================================
FILE: docs/api.rst
================================================
.. _api:

API
===

Python applications can integrate pycsw into their custom workflows. This allows for
seamless integration within frameworks such as Flask and Django.

Below are examples of where the API (as opposed to the default WSGI/CGI services) could
be used:

- configuration based on a Python dict, or stored in a database
- downstream request environment / framework (Flask, Django)
- authentication or authorization logic
- forcing CSW version 2.0.2 as default

OGC API - Records Flask Example
-------------------------------

See https://github.com/geopython/pycsw/blob/master/pycsw/wsgi_flask.py for how to
implement a Flask wrapper atop all pycsw supported APIs. Note the use of Flask
blueprints to enable integration with downstream Flask applications.

Simple Flask blueprint example
------------------------------

.. code-block:: python

   from flask import Flask

   from pycsw.wsgi_flask import BLUEPRINT as pycsw_blueprint

   app = Flask(__name__, static_url_path='/static')

   app.url_map.strict_slashes = False
   # mount all pycsw endpoints under /oapi
   app.register_blueprint(pycsw_blueprint, url_prefix='/oapi')


   @app.route('/')
   def hello_world():
       return "Hello, World!"

In the above example, all pycsw endpoints are made available under
``http://localhost:8000/oapi`` (assuming the application is served on port 8000).
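
Once mounted, the blueprint can be queried by any HTTP client. As a minimal sketch using only the Python standard library, the hypothetical helper ``build_items_url`` below builds an OGC API - Records item search URL. It assumes the app above is served at ``http://localhost:8000``, that the collection identifier is ``metadata:main``, and that the server accepts the standard ``q``, ``bbox`` and ``limit`` query parameters.

.. code-block:: python

   from urllib.parse import quote, urlencode

   # Assumption: the Flask app above is served on port 8000 with url_prefix='/oapi'
   BASE_URL = 'http://localhost:8000/oapi'


   def build_items_url(collection, q=None, bbox=None, limit=10):
       """Build an OGC API - Records item search URL (hypothetical helper)."""
       params = {'limit': limit}
       if q is not None:
           params['q'] = q  # full text search
       if bbox is not None:
           # minx,miny,maxx,maxy as longitude/latitude
           params['bbox'] = ','.join(str(value) for value in bbox)
       # ':' is legal in a URL path segment; other reserved characters are escaped
       path = f"{BASE_URL}/collections/{quote(collection, safe=':')}/items"
       return f'{path}?{urlencode(params)}'


   print(build_items_url('metadata:main', q='lakes', bbox=(-142, 42, -52, 84)))

Note that ``urlencode`` percent-encodes the commas in ``bbox`` as ``%2C``; servers decode these before parsing the parameter.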

Simple CSW Flask Example
------------------------

.. code-block:: python

   import logging

   from flask import Flask, request

   from pycsw import __version__ as pycsw_version
   from pycsw.server import Csw

   LOGGER = logging.getLogger(__name__)
   APP = Flask(__name__)


   @APP.route('/csw', methods=['GET', 'POST'])
   def csw_wrapper():
       """CSW wrapper"""

       LOGGER.info('Running pycsw %s', pycsw_version)

       pycsw_config = some_dict  # placeholder: your pycsw configuration as a dict

       # initialize pycsw
       # pycsw_config: dict of the pycsw configuration
       #
       # env: dict of (HTTP) environment (defaults to os.environ)
       #
       # version: defaults to '3.0.0'
       my_csw = Csw(pycsw_config, request.environ, version='2.0.2')

       # dispatch the request
       http_status_code, response = my_csw.dispatch_wsgi()

       return response, http_status_code, {'Content-type': my_csw.contenttype}

================================================
FILE: docs/ckan.rst
================================================
.. _ckan:

CKAN Configuration
==================

CKAN (https://ckan.org) is a powerful data management system that makes data
accessible by providing tools to streamline publishing, sharing, finding and using
data. CKAN is aimed at data publishers (national and regional governments, companies
and organizations) wanting to make their data open and available.

`ckanext-spatial`_ is CKAN's geospatial extension. The extension adds a spatial field
to the default CKAN dataset schema, using PostGIS as the backend. This allows users to
perform spatial queries and display the dataset extent on the frontend. It also
provides harvesters to import geospatial metadata into CKAN from other sources, as well
as commands to support the CSW standard. Finally, it also includes plugins to preview
spatial formats such as GeoJSON.

CKAN Setup
----------

Installation and configuration instructions are provided as part of the ckanext-spatial
`documentation`_.

.. _`ckanext-spatial`: https://github.com/ckan/ckanext-spatial
.. _`documentation`: https://docs.ckan.org/projects/ckanext-spatial/en/latest/csw.html

================================================
FILE: docs/committers.rst
================================================
.. _committers:

Committers
==========

.. include:: ../COMMITTERS.txt

================================================
FILE: docs/conf.py
================================================
# -*- coding: utf-8 -*-
# =================================================================
#
# Authors: Tom Kralidis
.. _`Jinja`: https://palletsprojects.com/p/jinja/
.. _`Jinja documentation`: https://jinja.palletsprojects.com
.. _`Flask`: https://palletsprojects.com/p/flask/
.. _`Flask documentation`: https://flask.palletsprojects.com
================================================
FILE: docs/index.rst
================================================
.. _index:

=============================
pycsw |release| Documentation
=============================

.. image:: https://zenodo.org/badge/2367090.svg
   :target: https://zenodo.org/badge/latestdoi/2367090

:Author: Tom Kralidis
:Contact: tomkralidis at gmail.com
:Release: |release|
:Date: |today|

.. toctree::
   :maxdepth: 2

   introduction
   installation
   docker
   configuration
   administration
   metadata-model-reference
   oarec-support
   csw-support
   pubsub
   stac
   distributedsearching
   sru
   opensearch
   oaipmh
   json
   soap
   sitemaps
   transactions
   repofilters
   profiles
   repositories
   outputschemas
   xslt
   html-templating
   geonode
   hhypermap
   odc
   ckan
   api
   testing
   migration-guide
   tools
   support
   contributing
   license
   committers
================================================
FILE: docs/installation.rst
================================================
.. _installation:

Installation
============

System Requirements
-------------------

pycsw is written in `Python

This schema document describes the XML namespace, in a form suitable for import by other schema documents.
See http://www.w3.org/XML/1998/namespace.html and http://www.w3.org/TR/REC-xml for information about this namespace.
Note that local names in this namespace are intended to be defined only by the World Wide Web Consortium or its subgroups. The names currently defined in this namespace are listed below. They should not be used with conflicting semantics by any Working Group, specification, or document instance.
See further below in this document for more information about how to refer to this schema document from your own XSD schema documents and about the namespace-versioning policy governing this schema document.
denotes an attribute whose value is a language code for the natural language of the content of any element; its value is inherited. This name is reserved by virtue of its definition in the XML specification.
Attempting to install the relevant ISO 2- and 3-letter codes as the enumerated possible values is probably never going to be a realistic possibility.
See BCP 47 at http://www.rfc-editor.org/rfc/bcp/bcp47.txt and the IANA language subtag registry at http://www.iana.org/assignments/language-subtag-registry for further information.
The union allows for the 'un-declaration' of xml:lang with the empty string.
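
Because ``xml:lang`` is inherited and can be un-declared with the empty string, a consumer has to track the in-scope value while walking the tree. A minimal sketch with Python's ``xml.etree.ElementTree``, which reports the reserved ``xml:`` prefix under the ``{http://www.w3.org/XML/1998/namespace}`` namespace; the helper ``effective_langs`` and the sample document are hypothetical:

.. code-block:: python

   import xml.etree.ElementTree as ET

   # ElementTree expands the reserved xml: prefix to this namespace URI
   XML_NS = '{http://www.w3.org/XML/1998/namespace}'


   def effective_langs(elem, inherited=None):
       """Return (element, in-scope xml:lang) pairs in document order."""
       lang = elem.get(XML_NS + 'lang', inherited)  # inherit unless overridden
       if lang == '':
           lang = None  # an empty xml:lang un-declares the inherited value
       pairs = [(elem, lang)]
       for child in elem:
           pairs.extend(effective_langs(child, lang))
       return pairs


   doc = ET.fromstring(
       '<record xml:lang="en">'
       '<title>Lakes</title>'
       '<title xml:lang="fr">Lacs</title>'
       '<note xml:lang=""><p>n/a</p></note>'
       '</record>'
   )
   for elem, lang in effective_langs(doc):
       print(elem.tag, lang)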
denotes an attribute whose value is a keyword indicating what whitespace processing discipline is intended for the content of the element; its value is inherited. This name is reserved by virtue of its definition in the XML specification.
denotes an attribute whose value provides a URI to be used as the base for interpreting any relative URIs in the scope of the element on which it appears; its value is inherited. This name is reserved by virtue of its definition in the XML Base specification.
See http://www.w3.org/TR/xmlbase/ for information about this attribute.
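
Since ``xml:base`` is also inherited, a nested value is first resolved against the enclosing base, and relative links are then resolved against the result. A sketch using ``urllib.parse.urljoin``; the unqualified ``href`` attribute, the helper ``resolve_links`` and the sample document are assumptions for illustration:

.. code-block:: python

   import xml.etree.ElementTree as ET
   from urllib.parse import urljoin

   XML_NS = '{http://www.w3.org/XML/1998/namespace}'


   def resolve_links(elem, base, attr='href'):
       """Resolve relative ``attr`` values against the in-scope xml:base."""
       # a nested xml:base is itself relative to the parent's base;
       # urljoin(base, '') returns base unchanged when xml:base is absent
       base = urljoin(base, elem.get(XML_NS + 'base', ''))
       resolved = []
       if elem.get(attr) is not None:
           resolved.append(urljoin(base, elem.get(attr)))
       for child in elem:
           resolved.extend(resolve_links(child, base, attr))
       return resolved


   doc = ET.fromstring(
       '<links xml:base="https://example.org/data/">'
       '<link href="a.xml"/>'
       '<group xml:base="wms/"><link href="caps.xml"/></group>'
       '<link href="https://other.example.org/b.xml"/>'
       '</links>'
   )
   print(resolve_links(doc, 'https://example.org/'))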
denotes an attribute whose value should be interpreted as if declared to be of type ID. This name is reserved by virtue of its definition in the xml:id specification.
See http://www.w3.org/TR/xml-id/ for information about this attribute.
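
Interpreting ``xml:id`` as type ID implies its values must be unique within a document, which makes them usable as lookup keys. A minimal sketch that indexes elements by ``xml:id`` and rejects duplicates; the helper ``index_by_xml_id`` and the sample document are hypothetical:

.. code-block:: python

   import xml.etree.ElementTree as ET

   XML_NS = '{http://www.w3.org/XML/1998/namespace}'


   def index_by_xml_id(root):
       """Map xml:id values to elements, rejecting duplicate IDs."""
       index = {}
       for elem in root.iter():  # iter() includes the root element itself
           xml_id = elem.get(XML_NS + 'id')
           if xml_id is None:
               continue
           if xml_id in index:
               raise ValueError(f'duplicate xml:id {xml_id!r}')
           index[xml_id] = elem
       return index


   doc = ET.fromstring(
       '<records>'
       '<record xml:id="r1"><title>Lakes</title></record>'
       '<record xml:id="r2"><title>Rivers</title></record>'
       '</records>'
   )
   print(index_by_xml_id(doc)['r2'].find('title').text)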
denotes Jon Bosak, the chair of the original XML Working Group. This name is reserved by the following decision of the W3C XML Plenary and XML Coordination groups:
In appreciation for his vision, leadership and dedication the W3C XML Plenary on this 10th day of February, 2000, reserves for Jon Bosak in perpetuity the XML name "xml:Father".
This schema defines attributes and an attribute group suitable
for use by schemas wishing to allow xml:base,
xml:lang, xml:space or
xml:id attributes on elements they define.
To enable this, such a schema must import this schema for the XML namespace, e.g. as follows:
<schema . . .>
. . .
<import namespace="http://www.w3.org/XML/1998/namespace"
schemaLocation="http://www.w3.org/2001/xml.xsd"/>
or
<import namespace="http://www.w3.org/XML/1998/namespace"
schemaLocation="http://www.w3.org/2009/01/xml.xsd"/>
Subsequently, qualified reference to any of the attributes or the group defined below will have the desired effect, e.g.
<type . . .>
. . .
<attributeGroup ref="xml:specialAttrs"/>
will define a type which will schema-validate an instance element with any of those attributes.
In keeping with the XML Schema WG's standard versioning policy, this schema document will persist at http://www.w3.org/2009/01/xml.xsd.
At the date of issue it can also be found at http://www.w3.org/2001/xml.xsd.
The schema document at that URI may however change in the future, in order to remain compatible with the latest version of XML Schema itself, or with the XML namespace itself. In other words, if the XML Schema or XML namespaces change, the version of this document at http://www.w3.org/2001/xml.xsd will change accordingly; the version at http://www.w3.org/2009/01/xml.xsd will not change.
Previous dated (and unchanging) versions of this schema document are at: