Search Engine Optimization (SEO)
OGC API - Features adopted Spatial Data on the Web Best Practice 2 ("Make your spatial data indexable by search engines") by recommending HTML as an output format of any OGC API. This means that users can navigate an OGC API from within their browser and that search engines are able to crawl the content.
An aspect to consider is that, since the API becomes a webpage, common practices for web architecture and development become relevant:
- does the website have a clear navigation?
- are a company logo, branding, privacy statement and cookie warning included?
- is the webpage accessible according to WCAG?
Tip
Notice that the pygeoapi configuration also has an option to disable HTML output. In that scenario, only the JSON output is available.
On the Web, websites are typically visited by web crawlers of popular search engines. Crawlers are automated processes which aid in building the index of the search engine. Crawlers follow links on the Web to identify new or updated content. Cross linking your API to other resources therefore increases the visibility (and ranking) of your API.
The British Geo6 wrote an extensive best practice on SEO for data publishers which offers a good overview of SEO in the scope of data publications.
Tweaking Web Crawler behaviour
This section introduces some mechanisms which help or prevent web crawlers from indexing your content.
If you are not interested in having your content indexed by search engines, you can provide a robots.txt file at the root of your website, specifying which folders should not be indexed. A more drastic option is to block crawlers and bots from accessing your content altogether by filtering traffic to the website based on the HTTP User-Agent header. Such a rule can be added to a firewall or web server configuration.
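As a rough illustration of how a crawler interprets such a file, the sketch below feeds a hypothetical robots.txt to Python's standard `urllib.robotparser` module and asks whether two URLs may be fetched. The disallowed paths and the `example.org` host are assumptions made for the example, not pygeoapi defaults.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the disallowed paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /processes/
Disallow: /jobs/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A collection page is not covered by any Disallow rule.
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.org/collections/lakes"))
# Anything under /processes/ is excluded for all user agents.
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.org/processes/hello-world"))
```

Keep in mind that robots.txt is only a request: well-behaved crawlers honour it, but it does not technically prevent access.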
A robots.txt file can also include a link to a sitemap. Many search engines provide the option to submit a sitemap in order to speed up crawling and indexing. pygeoapi does not provide a sitemap of its content, but you can create your own sitemap (published as /sitemap.xml), specifying the URLs of the content you want to have indexed.
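A minimal sketch of generating such a sitemap with Python's standard library could look as follows. The base URL and collection names are hypothetical; in practice you would list the pages of your own deployment that you want indexed.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the XML declaration explicitly to keep the declared encoding predictable.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical deployment: base URL and collection ids are assumptions.
BASE_URL = "https://example.org"
COLLECTIONS = ["lakes", "obs"]

sitemap_xml = build_sitemap(
    [f"{BASE_URL}/collections/{name}" for name in COLLECTIONS]
)
print(sitemap_xml)
```

The resulting text can be saved as sitemap.xml at the root of the website and referenced from robots.txt with a `Sitemap:` line.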
Search engines provide tooling to evaluate how your website performs in search. These tools can provide valuable insight into the findability of your website and content (e.g. the keywords used to locate your website).
Schema.org/Dataset
Search engines cooperate in the Schema.org initiative. Schema.org enables you to annotate your website using the schema.org vocabulary, in order for search engines to index the content in a structured manner. Google was the first to employ these annotations to provide a dedicated search engine for datasets. pygeoapi adds schema.org/Dataset annotations to collection pages, so collections are automagically included in Google's dataset search.
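To give an impression of what a schema.org/Dataset annotation looks like, the sketch below builds a small JSON-LD document in Python and wraps it in a script element. This is an illustration only and not the exact markup pygeoapi emits; the function names and example values are hypothetical, while `name`, `description`, `url` and `keywords` are regular schema.org Dataset properties.

```python
import json


def dataset_jsonld(name, description, url, keywords=()):
    """Build a minimal schema.org/Dataset description as a dictionary."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    if keywords:
        doc["keywords"] = list(keywords)
    return doc


def as_script_tag(doc):
    """Serialize the document as a JSON-LD script element for an HTML page."""
    # Escape "</" so a string value can never close the script element early.
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical collection metadata, for illustration only.
doc = dataset_jsonld(
    name="Large Lakes",
    description="Lakes of the world, published as an OGC API collection.",
    url="https://example.org/collections/lakes",
    keywords=["lakes", "water bodies"],
)
print(as_script_tag(doc))
```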
Evaluate the schema.org annotations in collections
Google provides a tool to evaluate Schema.org annotations in websites. Try evaluating a collection endpoint of pygeoapi in the tool. If you run pygeoapi locally (not accessible to Google), you can copy the HTML source of a page into the <code> tab; otherwise you can paste the URL of the page in the URL tab.
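For a quick local check before using an online tool, you could also pull the annotations out of the page source yourself. The sketch below assumes the annotations are embedded as JSON-LD in `script` elements of type `application/ld+json`; the sample HTML is made up for the example, and the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every JSON-LD script element in a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return all JSON-LD documents found in the given HTML source."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Made-up page source standing in for a saved collection page.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Large Lakes"}
</script>
</head><body></body></html>"""

datasets = [
    d for d in extract_jsonld(SAMPLE_HTML)
    if isinstance(d, dict) and d.get("@type") == "Dataset"
]
print([d["name"] for d in datasets])
```

This only confirms that a Dataset annotation is present and parses as JSON; it does not replace the validation rules applied by the search engines' own tools.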
Note
A similar tool is made available by Yandex (note that registration is required).