Automation — Dynamic sitemap generation with Golang API

Automation: Write less and Get more
Mar 21 2022 · 4 min read

Introduction

A sitemap is a file where you provide information about your site's pages, videos, and other files, and the relationships between them.

A sitemap helps search engines like Google crawl a site more efficiently.

The sitemap is usually added to the website's root directory or sitemap directory. It’s just an XML file containing all possible website routes.

If you are unfamiliar with sitemaps, visit this article to learn more.

1. Background

At Canopas, we wanted a sitemap for our job posting website, where each job has its own page.

Initially, we tried several solutions, including plugins and open-source repositories, but they either had not been updated recently or did not meet our requirements, simple as those were.

So we decided to create an API that generates the sitemap and to call it from a shell script when building the web app. The script puts the sitemap.xml file in the website’s public directory.

Go's standard library can produce XML output from structs, so we wrote the API in Go.

2. Sitemap structure

Here is the XML structure of a simple sitemap. Basically, a sitemap is a set of URLs.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://canopas.com</loc>
    <changefreq>monthly</changefreq>
    <lastmod>2022-03-01T00:00:00.000Z</lastmod>
    <priority>1</priority>
  </url>
  <url>
    <loc>https://canopas.com/jobs</loc>
    <changefreq>monthly</changefreq>
    <lastmod>2022-03-01T00:00:00.000Z</lastmod>
    <priority>1</priority>
  </url>
  <url>
    <loc>https://canopas.com/contact</loc>
    <changefreq>monthly</changefreq>
    <lastmod>2022-03-01T00:00:00.000Z</lastmod>
    <priority>0.9</priority>
  </url>
  <url>
    <loc>https://blog.canopas.com</loc>
    <changefreq>monthly</changefreq>
    <lastmod>2022-03-01T00:00:00.000Z</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>

3. Generate sitemap dynamically

I’m assuming you have basic knowledge of gin, routers, databases, and CI/CD configuration.

To start, create a new Go project and add the database and gin engine configuration in the main.go file.

Now, let’s add the code that generates the XML sitemap through the API, in the following steps.

a. Create XML structs

Every sitemap URL has loc, changefreq, lastmod, and priority fields, so we will create a URL struct that contains them.

We also have to give the struct its XML element name, which is url. For that, we add an XMLName field to the URL struct.

package sitemap

import "encoding/xml"

// URL maps to one <url> entry in the sitemap.
type URL struct {
	XMLName    xml.Name `xml:"url"`
	Loc        string   `xml:"loc"`
	ChangeFreq string   `xml:"changefreq"`
	LastMod    string   `xml:"lastmod"`
	Priority   string   `xml:"priority"`
}

There is one more node, the parent urlset, which needs a struct as well.

type URLSet struct {
    XMLName xml.Name `xml:"urlset"`
    XMLNS   string   `xml:"xmlns,attr"`
    URL     []URL    `xml:"url"`
}

We can add attributes of an XML node, such as xmlns, using the attr option in the struct tag. The urlset holds a slice of the website’s URLs.

b. Retrieve all jobs from the database

We configured the database earlier. Now it’s time to get all jobs for the website.

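package jobs

import (
	// "db" is presumably the project's own package wrapping the database
	// connection; adjust the import path to match your module.
	"db"

	log "github.com/sirupsen/logrus"
)

// Job is one row of the jobs table.
type Job struct {
	Id          int    `json:"id"`
	Title       string `json:"title"`
	Description string `json:"description"`
}

// GetJobs returns all active jobs.
func GetJobs() (jobs []Job, err error) {
	err = db.Select(&jobs, `SELECT id, title, description FROM jobs WHERE is_active = 1`)
	if err != nil {
		log.Error(err)
		return nil, err
	}
	return jobs, nil
}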

c. Prepare URLs with all required fields

We will get the website's base URL from the baseUrl query parameter of the API.

baseUrl := c.Query("baseUrl")

Form all required static and dynamic URLs for the sitemap as shown below.

changefreq and lastmod are the same for all URLs, so we will add them after forming the slice of static and dynamic URLs.

jobsUrl := baseUrl + "/jobs"

// Init all static urls for sitemap
sitemapUrls := []URL{
	{Loc: baseUrl, Priority: "1"},
	{Loc: jobsUrl, Priority: "1"},
}

// Init all dynamic urls for sitemap
jobs, err := GetJobs()
if err != nil {
	c.AbortWithStatus(http.StatusInternalServerError)
	return
}
for i := range jobs {
	// Id is an int, so convert it with strconv.Itoa before concatenating
	sitemapUrls = append(sitemapUrls, URL{Loc: jobsUrl + "/" + strconv.Itoa(jobs[i].Id), Priority: "0.9"})
}
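The URL-building step can also be pulled into a small pure function, which makes it testable without a database or a running server. The following is only a sketch under assumptions: buildURLs, the trimmed-down URL struct, and the trailing-slash handling are hypothetical additions, not part of the original source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// URL holds only the fields needed for this step (XML tags omitted for brevity).
type URL struct {
	Loc      string
	Priority string
}

// buildURLs is a hypothetical helper that returns the static URLs followed
// by one URL per job ID.
func buildURLs(baseURL string, jobIDs []int) []URL {
	// Trim trailing slashes so "https://example.com/" does not yield "//jobs".
	baseURL = strings.TrimRight(baseURL, "/")
	jobsURL := baseURL + "/jobs"

	urls := []URL{
		{Loc: baseURL, Priority: "1"},
		{Loc: jobsURL, Priority: "1"},
	}
	for _, id := range jobIDs {
		// strconv.Itoa converts the integer ID to its decimal string form.
		urls = append(urls, URL{Loc: jobsURL + "/" + strconv.Itoa(id), Priority: "0.9"})
	}
	return urls
}

func main() {
	for _, u := range buildURLs("https://example.com/", []int{7, 42}) {
		fmt.Println(u.Loc, u.Priority)
	}
}
```

With the inputs above, the last entry printed is https://example.com/jobs/42 with priority 0.9.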

Now it’s time to add lastmod and changefreq.

We treat changefreq as monthly, so lastmod is set to the first day of the current month.

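// Add changefreq and lastmod to all urls
year, month, _ := time.Now().UTC().Date() // get current year and month
// Midnight UTC on the first of the month, e.g. 2022-03-01T00:00:00.000Z.
// The layout uses Go's reference time (15:04:05) rather than literal zeros.
lastmod := time.Date(year, month, 1, 0, 0, 0, 0, time.UTC).Format("2006-01-02T15:04:05.000Z")

for i := range sitemapUrls {
	sitemapUrls[i].ChangeFreq = "monthly"
	sitemapUrls[i].LastMod = lastmod
}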

d. Format API response in XML

Set the Content-Type header and return the XML response from the API.

urlset := URLSet{URL: sitemapUrls, XMLNS: "http://www.sitemaps.org/schemas/sitemap/0.9"}

c.Header("Content-Type", "application/xml")

c.XML(http.StatusOK, urlset)

Here is the full source code of sitemap generation in Go.

Expose this function as an API endpoint and call it when building the website.

router := gin.Default()
router.GET("/sitemap", GenerateSitemap)

4. Automate sitemap generation on CI

The next step is to create a sitemap.sh file in the website project and add the following commands to it.

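#!/bin/bash

set -e

BASE_URL=$1
BASE_API_URL=$2

API_URL="$BASE_API_URL/sitemap?baseUrl=$BASE_URL"

# --fail makes curl exit non-zero on HTTP errors, so set -e stops the build
xml=$(curl --fail --silent --show-error -X GET --header "Accept: */*" "$API_URL")

# Overwrite rather than append, so repeated builds leave a single XML document
echo "$xml" > public/sitemap.xml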

In the above snippet:

  1. BASE_URL is the website’s base URL (https://example.com).
  2. BASE_API_URL is the Go API's base URL (http://localhost:8080).
  3. API_URL is the full sitemap API URL.
  4. curl invokes the API.
  5. echo writes the API's XML response to sitemap.xml in the website’s public directory.

Run this shell script in your CI/CD configuration, just before npm run build.

sh sitemap.sh https://example.com http://localhost:8080

Cheers, we have automated dynamic sitemap generation with CI/CD.

Conclusion

We hope this helps you get started with automating sitemap generation. It is a simple way to generate a sitemap without a dedicated sitemap library. Once it is set up, the sitemap is regenerated on every build, so you no longer have to update it by hand.

We’re Grateful to have you with us on this journey!

Suggestions and feedback are more than welcome! 

Please reach us at Canopas Twitter handle @canopas_eng with your content or feedback. Your input enriches our content and fuels our motivation to create more valuable and informative articles for you.

Sumita Kevat
Sumita is an experienced software developer with 5+ years in web development. Proficient in front-end and back-end technologies for creating scalable and efficient web applications. Passionate about staying current with emerging technologies to deliver.

