How I built Programming Podcasts

The end result in all its glory, Programming Podcasts

Although I've been aware of podcasts for as long as they've been around, it's only in the past six months that I've become a huge fan of the medium. It all started with The Laravel Podcast, a show that I started listening to after a conversation with Gordon Murray prompted me to go check it out. I was hooked almost instantly, and started listening to episodes in the car on my daily commute and eventually in the gym too.

I quickly ran into a pretty major problem - discovering new podcasts, particularly in the realm of software development, was a total pain in the ass. iTunes categories don't go beyond "Technology", and when you search on Google you're presented with BuzzFeed-style articles like "15 Top-Notch Podcasts", "11 podcasts I listen to" and "42 Podcasts for Developers and Programmers". These are great lists, but to figure out if I'd actually listen to any of the podcasts, I now had to go through each site one-by-one and check out an episode.

I knew that I could solve the problem by building a podcast directory that featured podcasts that focus on software development and related topics like devops, design and startups.

Deciding whether I should build it

I start a lot of projects, but I complete few. Often this is due to becoming bored with the concept after a short time. Sometimes it is because I'm not all that interested in the subject matter, or I'm not solving a problem that I have myself. Other times it's because it becomes clear after a while that nobody actually wants what I'm working on.

With this idea, I already knew going in that it would have one loyal user at a minimum - me. But I wanted to know if anyone else would find it useful - this would dictate if I should spend more than a couple of hours on it. I used Google AdWords' Keyword Planner to research how many monthly searches there were for related search terms.

Google AdWords Keyword Planner

With 10k-100k average monthly searches across the topic, I felt there was enough of an audience to justify building something for more than just myself.

Coming up with a name

Because the site was a directory, and because I felt that the best approach to getting traffic would be through organic search and SEO, it was important that the site would have a descriptive name. No made-up brandable words or something that would require tons of marketing - something that would be highly relevant to what people would search for.

I was dead set on registering a .com name for the site, and it was pure luck that programmingpodcasts.com was available. Given the strong performance of the keyword in Google search results, and its high relevance to the content of the site, I went ahead and bought it.

Basic site mechanics

I wanted to keep the site as simple as possible - the goal was to make it easy to find podcasts and episodes, so I decided to avoid any superfluous content right from the start. As a result, the structure of the site wouldn't run too deep. I mapped out the following structure:

Site Map

On the front page, the site would have a listing of the newest episodes, a signup form for the mailing list and a listing of the most popular podcasts on the site. One of the key benefits I wanted the site to offer was a solid categorisation of podcasts - to a much finer level than that offered by iTunes. I decided to define two taxonomies:

Categories - these are broad areas like General Programming, Web Development, DevOps and Design
Topics - these are specific programming languages, frameworks or products like JavaScript, Ruby, PHP, React, Amazon Web Services and Laravel

A list of categories and topics is on the front page, and drilling down into these will show just the podcasts or episodes in the selected taxonomy.

The "Explore" page would adapt based on the user's selection to show all items or a specific category, topic or search results. It would allow the results to be sorted based on date, popularity or alphabetically. It would also let the user flip the view between a listing of podcasts and a listing of episodes.

Exploring the directory

Drilling down to a podcast would display that podcast's information and a list of episodes. Drilling down to an episode would display individual episode information. Anywhere an episode is shown to the user, they would have the option to play the audio right there in the browser. At launch, this was just a simple HTML5 audio element using the native browser controls for playback, but I had plans to make this much better in the future.

The stack

In the past couple of years, I've built virtually every side project I've worked on with the Laravel PHP framework. When I first came across Laravel, I was truly blown away by it. I hadn't used PHP since I was in college, and didn't have too fond memories of it. Since then, I'd used everything from Oracle and PL/SQL to ASP.NET MVC and C#, from Node.js and Express to Ruby and Sinatra. PHP had come a long way since the dark ages of PHP4, but Laravel was by far the most impressive Web framework I had ever used. It is a batteries-included framework: it comes with pretty much everything you'll need when building a Web application, but is still flexible enough to get out of your way when you need it to.

Having spent several years working on heavy client-side JavaScript applications, in recent years I have fallen back in love with applications that don't require much JavaScript at all. Don't get me wrong, I strongly believe that there are times when using something like React, Vue or Angular is justified - but for many projects they are completely overkill. So at first, Programming Podcasts was completely JavaScript-free. After all, it's a content site, so why would I really need it? When I launched the site, the only JavaScript code was to prevent multiple HTML5 audio elements playing simultaneously.

For search, I wanted to offer proper full-text search, no crappy MySQL LIKE '%:query%' clauses that would miss tons of relevant results. Using Laravel Scout and Algolia, this took all of five minutes. Algolia is a little expensive for a project like this, so I might need to look at alternative solutions down the line, but to get up and running with almost no setup and configuration, nothing I've come across is faster.
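To illustrate why a plain substring match misses relevant results, here is a minimal Python sketch. It is not the site's code (the site uses Laravel Scout and Algolia). It uses an in-memory SQLite table as a stand-in for MySQL, made-up episode titles, and a naive "every query word must appear" matcher as a stand-in for a real full-text engine, which would also add ranking, stemming and typo tolerance.

```python
import re
import sqlite3

# Hypothetical episode titles, used only for this illustration.
TITLES = ["Testing in Laravel", "Laravel testing tips", "Deploying with Forge"]


def like_search(query):
    """Substring search, the LIKE '%query%' approach (SQLite standing in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE episodes (title TEXT)")
    conn.executemany("INSERT INTO episodes VALUES (?)", [(t,) for t in TITLES])
    rows = conn.execute(
        "SELECT title FROM episodes WHERE title LIKE ? ORDER BY rowid",
        (f"%{query}%",),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_search(query):
    """Naive word-based search: keep titles containing every query word, in any order."""
    wanted = tokens(query)
    return [title for title in TITLES if wanted <= tokens(title)]


if __name__ == "__main__":
    print(like_search("laravel testing"))
    print(token_search("laravel testing"))
```

With the query "laravel testing", the substring search only finds the title where those two words appear next to each other in that order, while the word-based search also finds "Testing in Laravel".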

As for deployment, I decided to keep things simple to start off. Using Laravel Forge, I deployed to a single Linode box. If the site took off, it would be straightforward to move the database to its own box, implement load balancing and add multiple Web and queue worker servers if needs be. Forge makes the entire deployment process a breeze. It will provision a Linode box, DigitalOcean droplet or AWS EC2 instance for you, install and configure the relevant software and then set it up so you can deploy just by pushing to a Git repository. It also makes automatically provisioning and renewing SSL certificates completely painless.

I also use S3 for storing podcast and episode images, and use CloudFront CDN to serve them up. More on why I needed to do this later.

Populating the database with feeds

One of the aspects of the project that attracted me the most was that given the nature of how podcasts are distributed, it should have been possible to have the entire site run on auto-pilot. Almost every podcast has an RSS feed, and to get listed on iTunes there are specific guidelines on what data needs to be present in the feed - including hi-res cover images. RSS is straightforward to parse, and I found that almost every podcast I looked at published consistent and reliable feeds.

Once I had a feed URL for a podcast, I could then fetch everything else I needed from the XML. So the title, description, name of the host and all of the details of every episode could easily be extracted from the feed and added to the database. A scheduled job would then periodically check for new episodes (and podcasts) and automatically add them to the database.
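As a rough illustration of how little is needed to pull this data out of a feed, here is a minimal Python sketch using only the standard library. It is not the site's actual code (that was written in Laravel). The sample feed, the field names in the returned dictionaries and the helper names parse_feed and new_episodes are all made up for this example, and it assumes episodes can be told apart by their guid.

```python
import xml.etree.ElementTree as ET

# XML namespace of the iTunes podcast extensions to RSS.
ITUNES = "{http://www.itunes.com/dtds/podcast-1.0.dtd}"

# A tiny made-up feed, used only for this sketch.
SAMPLE_FEED = """<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example Dev Show</title>
    <description>A made-up show used only for this sketch.</description>
    <itunes:author>Jane Host</itunes:author>
    <itunes:image href="https://example.com/cover.jpg"/>
    <item>
      <title>Episode 2</title>
      <guid>ep-2</guid>
      <pubDate>Tue, 21 Mar 2017 09:00:00 +0000</pubDate>
      <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="1"/>
    </item>
    <item>
      <title>Episode 1</title>
      <guid>ep-1</guid>
      <pubDate>Tue, 14 Mar 2017 09:00:00 +0000</pubDate>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="1"/>
    </item>
  </channel>
</rss>"""


def text(node, tag):
    """Return the stripped text of a child element, or None if it is missing or empty."""
    child = node.find(tag)
    return child.text.strip() if child is not None and child.text else None


def parse_feed(xml_text):
    """Extract podcast details and a list of episodes from an RSS document."""
    channel = ET.fromstring(xml_text).find("channel")
    image = channel.find(ITUNES + "image")
    podcast = {
        "title": text(channel, "title"),
        "description": text(channel, "description"),
        "host": text(channel, ITUNES + "author"),
        "image_url": image.get("href") if image is not None else None,
    }
    episodes = []
    for item in channel.findall("item"):
        enclosure = item.find("enclosure")
        episodes.append({
            "guid": text(item, "guid"),
            "title": text(item, "title"),
            "published": text(item, "pubDate"),
            "audio_url": enclosure.get("url") if enclosure is not None else None,
        })
    return podcast, episodes


def new_episodes(episodes, known_guids):
    """Keep only the episodes whose guid is not already stored."""
    return [episode for episode in episodes if episode["guid"] not in known_guids]


if __name__ == "__main__":
    podcast, episodes = parse_feed(SAMPLE_FEED)
    print(podcast["title"], "-", len(episodes), "episodes")
    print([e["title"] for e in new_episodes(episodes, {"ep-1"})])
```

A scheduled job along these lines would fetch each feed, call something like parse_feed, and insert whatever new_episodes returns.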

The only manual work left beyond collecting the RSS feed URLs would be:

Getting the iTunes URL for a podcast
Selecting a category, and if relevant, topic for the podcast

I set a target of 200 podcasts for launch - this would be enough to make the directory feel complete, but not so high that manually adding iTunes URLs and categories would be too time consuming. I did write a little script that would go to the website of the feed and look for an iTunes link and present it to me if it found one - but I still manually checked everything just to make sure everything was right before launch.
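A link-finding script like that can be quite small. The following is a hedged Python sketch and not the script I actually used: it assumes the page HTML has already been downloaded, that iTunes links are ordinary anchor tags, and that they point at the itunes.apple.com host. The sample page and the podcast ID in it are invented.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ITunesLinkFinder(HTMLParser):
    """Collect the href of every <a> tag that points at itunes.apple.com."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Relative links have no hostname, so they are skipped here.
        host = urlparse(href).hostname or ""
        if host == "itunes.apple.com":
            self.links.append(href)


def find_itunes_links(html):
    """Return all iTunes links found in an HTML string, in document order."""
    finder = ITunesLinkFinder()
    finder.feed(html)
    return finder.links


# A made-up page fragment, used only for this sketch.
SAMPLE_PAGE = """
<a href="/about">About</a>
<a href="https://itunes.apple.com/us/podcast/example-dev-show/id000000000">Subscribe on iTunes</a>
<a href="https://twitter.com/example">Twitter</a>
"""

if __name__ == "__main__":
    print(find_itunes_links(SAMPLE_PAGE))
```

Anything it finds is only a candidate, which is why a manual check before launch still made sense.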

With the 200 podcast target, I set off to fill the directory. Many of the lists I mentioned earlier came in handy. I also wrote a script that would search for podcasts on the Web and pull in their RSS feed data, which I then queried for relevant keywords. This gave me a shortlist of podcasts, and I reviewed them to see which would be suitable for the site. I ended up launching with 201 feeds.
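The keyword step can be sketched in a few lines of Python. This is an illustration under assumptions, not the query I ran: the keyword list, the feed dictionaries and the function names are hypothetical, and matching is done on whole lowercase words in the title and description.

```python
import re

# Hypothetical keyword list; the real one is not given in this post.
KEYWORDS = {"programming", "developer", "developers", "javascript", "php", "devops"}


def keyword_hits(feed, keywords=KEYWORDS):
    """Return the keywords that appear as whole words in a feed's title or description."""
    words = set(re.findall(r"[a-z0-9]+", (feed["title"] + " " + feed["description"]).lower()))
    return sorted(words & keywords)


def shortlist(feeds, keywords=KEYWORDS):
    """Keep the feeds that mention at least one keyword, ready for manual review."""
    return [feed for feed in feeds if keyword_hits(feed, keywords)]


# Made-up feed data, used only for this sketch.
FEEDS = [
    {"title": "Example Dev Show", "description": "Weekly chat about PHP and JavaScript."},
    {"title": "Cooking Daily", "description": "Recipes and kitchen tips."},
]

if __name__ == "__main__":
    for feed in shortlist(FEEDS):
        print(feed["title"], keyword_hits(feed))
```

A filter like this only narrows the field, so every shortlisted podcast still needs a human look before it goes on the site.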

Snags

When I set out to build the site, it looked as though it would be very straightforward. But quickly enough I hit a bump in the road. At first I was simply hotlinking to the podcast and episode image URLs that were included in the RSS feed, but when I started focusing on ensuring the site performed well, I noticed that this was leading to huge images being downloaded. Some of the images were over 3 megabytes, all down to iTunes' requirement that podcast artwork have minimum dimensions of 1400x1400 pixels.

Megabytes of images - no thank you!

The hotlinking of images also presented a second problem - mixed content security warnings for images hosted on an HTTP URL. As a result, I decided it would make more sense for me to download the images for each podcast and episode, resize them (I settled on 500x500) and, if they were JPEGs, compress them to 60% quality. I then uploaded the modified image to Amazon S3, and put Amazon CloudFront CDN in front of this bucket to optimize the delivery of the assets to visitors around the world.

Hotlinking mixed content warnings - yikes!

As the site contained over 15,000 episodes, the first run of this task took a while, but it drastically reduced the load time of the page and solved the mixed-content warning issue, so it was worth the effort.

A better audio listening experience

One of the aspects of the site that bothered me at launch was the fact that if you used the simple HTML5 audio player to listen to an episode, it would stop playing if you navigated away from the page, and would not remember where you left off if you went to play it again. So I decided early on that I would fix this post-launch with an audio player that remained static across the site and would not be interrupted as you browse.

I had actually previously solved this issue for music artist websites for my startup, Subwoofr. At the time, I used the PJAX library to load internal links over XMLHttpRequest and just replace the section of the document that needed to be replaced with new content. This basically removed the need for page loads when navigating between pages, and allowed audio to continue playing as you browse. A nice bonus was that the site feels much faster as links appear to load instantly without the page flash that normally occurs.

Turbolinks in action

The PJAX library is dependent on jQuery, and it hasn't been updated in some time. It also requires you to do some hackery to get things like page titles to update as you navigate from page to page. At some point in the past when working on a Ruby on Rails project, I had used Turbolinks for a similar purpose. I checked out Turbolinks again and found it was perfect for what I wanted to achieve and didn't require me to load in jQuery or third-party libraries.

I then built a custom audio player using the native HTMLAudioElement API. I did so for a few reasons. First, when using the browser's native controls, Turbolinks will pause the audio between page loads. Second, the styling of the player is quite different on various browsers. Finally, I wanted to allow users the option of changing the playback speed, as I know many people like to listen to podcasts at 1.5x playback rate.

This worked great, but there was an edge case where the audio would get cut off between page loads - if Turbolinks times out and does a hard page load. Also, I thought it would be nice if, when a user navigated away to another site and came back later, the last episode they were listening to would resume where they left off, with their playback rate and muting preferences intact. To achieve this, I used HTML5 localStorage. I listen for several audio events, and update localStorage to reflect the current playtime, playback rate and whether or not the volume is muted. When the page loads, I check if there is data in localStorage, and if so I restore it into the player and the user can jump right back in to the episode.

Launch

I launched the website on Wednesday, 22nd March 2017 at 4pm. I posted a Show HN to Hacker News, tweeted about it, posted on a couple of Subreddits and added it to BetaPage. I've never had a post do well on Hacker News, but this post performed even worse than anything I've put up in the past. This was very disappointing, as I thought the HN audience would really find it useful. A week later, someone posted an Ask HN seeking everyone's favorite podcasts, and this made the front page. Perhaps my timing was bad - I posted at a very busy time for HN submissions - I knew this was a risk, but felt that if it got traction, the heavy volume of traffic would be worth it. Perhaps nobody gave a shit. I guess I'll never know!

On the Sunday after launch, I dropped an email to Adam Wathan, asking him to be one of the first guests on my soon-to-be-launched podcast (this is something I'm planning in the coming weeks). I told him about the new site and asked for his feedback. He loved it and retweeted one of my tweets, which drove a nice stream of traffic to the site, especially for a Sunday! Someone posted the site on Product Hunt the next Tuesday, but I didn't know this had happened until the next day, and it had barely even registered. I ran into problems trying to add images to the Product Hunt listing; presumably they were victims of the chronic S3 outage that affected much of the Internet that day.

First Week Traffic Stats

Overall, I was pretty pleased with the first week's traffic. 786 unique users generated a little over 3,000 page views. Users spent an average of two and a half minutes on the site, which was decent, but I'm hopeful that the fixed audio player (deployed today, a week after the initial launch) will lead to an uptick in this particular statistic. As you can see, most of the traffic came after Adam's retweet on Sunday, and although it dropped significantly the following day, it was good to see traffic rise again the day after that.

What's next?

I'm hopeful that over time the site will start to rank well in Google's results for related keywords, and that this will drive decent organic traffic to the site. I also have some other ideas about how to take the project further:

A new blog on the site that showcases podcasts and offers tips on hosting podcasts
A podcast of my own, interviewing the hosts of podcasts featured on the site
User sign up - logged in users will be able to set their preferred categories and topics to tailor the site to their interests, upvote and comment on podcasts and episodes, add podcasts and episodes to their collection, get custom RSS feeds for their collection, queue up multiple episodes for playback and subscribe for email updates whenever new episodes are published in their favorite podcasts
Automatically download and process audio files to generate transcripts and waveform data and perform further analysis, extend searchability and more
Native mobile app on iOS/Android, native desktop app on Mac/Linux/Windows
Add sister sites for other verticals (design, startups, business, others)

I'd love to hear your thoughts on the above ideas, and if you have any features you'd like to see on the site I'd greatly appreciate any suggestions.

Whether or not I'll do all of this will depend on the success of the project. I haven't given too much thought to monetisation yet. Obviously there is the potential to run ads or sponsorship slots on the site, mailing list and podcast, but I'm going to hold off on pursuing this until the site has a much larger audience. I have some rough ideas about related products that could provide a revenue stream, but I'll keep those close to my chest for now.

In conclusion

Overall this has been a great project to work on, and I'm glad I did it. The reaction from those that I have reached so far has been almost exclusively positive. Not to mention that I now have an endless supply of podcasts to listen to every morning and evening.

I'll post a follow-up to this after the site's been live for one month - it will be interesting to see the difference between the progress in the first week and the first month. Here's hoping it'll be a positive one!