When we create apps like blogs, articles or news, we often need to generate a nice, SEO-style path for the details page, and that path should contain the title. This looks easy, but I spent over a day on this "simple" challenge, and would like to share what I worked out…
Here's a super-short video explaining why this is important…
So basically the challenge is converting all kinds of titles to paths. Just replacing bad characters doesn't come close to delivering a useful solution. Let's look at some common issues:
- Umlauts - just dropping them wouldn't be good, as the word would get mangled. For example "große Küchengeräte" should result in a URL like "grosse-kuechengeraete"
- Multiple "bad" characters, like "We +/- love this" would result in something like "We-----love-this"
- Leading / trailing spaces or special characters like "Learn Grunt (200)" should NOT result in "Learn-Grunt-200-"
- …especially in combination with path-characters (if you allow them) like "catalog/-best-mixer-ever"
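To see how quickly a plain character replacement goes wrong, here is a minimal TypeScript sketch. The function name `naiveSlug` is made up for this illustration, and it assumes that only ASCII letters and digits count as "good" characters.

```typescript
// Naive approach: swap every character that is not an ASCII letter or digit for "-".
function naiveSlug(title: string): string {
  return title.replace(/[^A-Za-z0-9]/g, "-");
}

console.log(naiveSlug("We +/- love this"));   // We-----love-this
console.log(naiveSlug("Learn Grunt (200)"));  // Learn-Grunt--200-
console.log(naiveSlug("große Küchengeräte")); // gro-e-K-chenger-te
```

Every result is technically a valid path segment, but none of them is something you would want to show to a visitor or a search engine.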
I needed to get this worked out, because 2sxc 8.3.5 provides a new input-field called "string-url-path" which will auto-fill from one or more other fields. So the designer can specify it to fill from "[Title]" or more advanced cases like "[Category]/[Title]" and everything else must just happen. These are the steps I ended up with:
- Before even starting, get the fields like Category and Title, and remove slashes inside each. The reason is that the final result may have slashes (because category/name can have a slash), but if an inner piece also had a slash, this could cause trouble.
- Merge the result based on the mask (like [Category]/[Title])
- Lowercase everything
- Latinize everything - I created an Angular service which does this for me, converting over 1000 "bad" characters like "áűőú" or "ǽ" to simpler characters. If you want to use it, you can find my AngularJS latinize-text-service here.
- Neutralize apostrophe-s combinations like "Daniel's cat" to "Daniels cat", because I don't want it to end up as "daniel-s-cat" in the URL. At the same time I don't want to catch "she said 'super'" just because an apostrophe happens to be followed by an s in normal content
- Rotate all backslashes \ to forward slashes /
- Replace all unwanted characters including spaces with "-"
- Remove duplicate "-" and duplicate "/" in case they were created by previous conversions
- Replace all side-by-side combinations of "-" and "/" (like "/-" or "-/"), which can easily be generated by the previous conversions, with a simple "/". This prevents things like "(beta) Learn Gulp (200)" from resulting in a "blog/-beta-learn-gulp-200-" url
- And finally trim leading and trailing "-" characters
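The steps above could be sketched roughly like this in TypeScript. This is not the actual 2sxc code: the names `toUrlPath`, `fillMask`, `latinize` and `LATIN_MAP` are hypothetical, the tiny map only stands in for the real latinize service, and the set of allowed characters (a-z, 0-9, "-" and "/") is an assumption made for this sketch. Stripping backslashes from the fields as well as forward slashes is also an assumption, since they would turn into slashes later.

```typescript
// Tiny stand-in for the real latinize service, which covers 1000+ characters.
const LATIN_MAP: Record<string, string> = {
  "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
  "á": "a", "ű": "u", "ő": "o", "ú": "u", "ǽ": "ae",
};

// Replace each non-ASCII character that has a mapping; leave the rest untouched.
function latinize(text: string): string {
  return text.replace(/[^\x00-\x7f]/g, (ch: string) => LATIN_MAP[ch] || ch);
}

// Steps 1 + 2: remove slashes inside each field, then merge the fields into the mask.
function fillMask(mask: string, fields: Record<string, string>): string {
  return mask.replace(/\[([^\]]+)\]/g, (_match: string, name: string) =>
    (fields[name] || "").replace(/[\/\\]/g, ""));
}

function toUrlPath(mask: string, fields: Record<string, string>): string {
  let path = fillMask(mask, fields);
  path = path.toLowerCase();                          // 3: lowercase everything
  path = latinize(path);                              // 4: latinize
  path = path.replace(/([a-z])['\u2019]s\b/g, "$1s"); // 5: "daniel's" -> "daniels"
  path = path.replace(/\\/g, "/");                    // 6: rotate \ to /
  path = path.replace(/[^a-z0-9\/-]/g, "-");          // 7: unwanted characters -> "-"
  path = path.replace(/-+/g, "-").replace(/\/+/g, "/"); // 8: remove duplicates
  path = path.replace(/[-\/]*\/[-\/]*/g, "/");        // 9: "/-", "-/" etc. -> "/"
  return path.replace(/^-+|-+$/g, "");                // 10: trim leading/trailing "-"
}

console.log(toUrlPath("[Title]", { Title: "große Küchengeräte" }));
// grosse-kuechengeraete
console.log(toUrlPath("[Title]", { Title: "We +/- love this" }));
// we-love-this
console.log(toUrlPath("[Title]", { Title: "she said 'super'" }));
// she-said-super
console.log(toUrlPath("[Category]/[Title]",
  { Category: "Blog", Title: "(beta) Learn Gulp (200)" }));
// blog/beta-learn-gulp-200
```

The order matters here: lowercasing first keeps the latinize map small, and the apostrophe rule has to run before the unwanted characters are replaced, otherwise "daniel's" has already become "daniel-s".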
Love from Switzerland,