8 Steps to higher quality DNS systems
Episode 5 • 29th August 2022 • Modern Digital Business • Lee Atchison

Shownotes

DNS is a highly available, highly redundant, highly reliable service that is absolutely essential to your company's application and business operations. A failure in your DNS system can bring your company's business to a halt, jeopardizing your company's future.

DNS is essential to the operation of all aspects of the internet and modern digital businesses. The problem with DNS is that a very tiny mistake in a configuration file can cause ripples throughout the entire DNS system and impact all aspects of your company's operations, its customers' ability to use the company's products, and the company's ability to make money. All of it can be brought to its knees by a very tiny mistake in a single configuration entry. Without solid DNS configuration management in place, you make yourself vulnerable to simple but costly mistakes.

But how do you implement a high quality DNS hygiene solution? In this episode, I'll give you eight steps to higher quality DNS systems.

Today on Modern Digital Business.

Useful Links

About Lee

Lee Atchison is a software architect, author, public speaker, and recognized thought leader on cloud computing and application modernization. His most recent book, Architecting for Scale (O’Reilly Media), is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments. Lee has been widely quoted in multiple technology publications, including InfoWorld, Diginomica, IT Brief, Programmable Web, CIO Review, and DZone, and has been a featured speaker at events across the globe.

Take a look at Lee's many books, courses, and articles by going to leeatchison.com.

Looking to modernize your application organization?

Check out Architecting for Scale. Currently in its second edition, this book, written by Lee Atchison and published by O'Reilly Media, will help you build high scale, highly available web applications, or modernize your existing applications. Check it out! Available in paperback or on Kindle from Amazon.com or other retailers.



Mentioned in this episode:

O'Reilly Media - Building a Cloud Roadmap

Have you struggled with cloud migration? Then you'll appreciate my live training course, Building a Cloud Roadmap, presented by O'Reilly Media. Live on October 5th at 9:00 AM PDT. For more information, go to mdb.fm/roadmap or leeatchison.com/roadmap. But hurry, seats are limited.

Transcripts

Lee:

DNS is a highly available, highly redundant, highly reliable service that is absolutely essential to your company's applications and business operations. Yet DNS configurations are highly sensitive, and simple mistakes can cause catastrophic problems.

But how do you implement a high quality DNS hygiene solution? In this episode, I'll give you eight steps to higher quality DNS systems.

Are you ready? Let's go.

Lee:

DNS is a highly available, highly redundant, highly reliable service that is absolutely essential to your company's application and business operations. A failure in your DNS system can bring your company's business to a halt, jeopardizing your company's future.

DNS is essential to the operation of all aspects of the internet and modern digital businesses. The problem with DNS is that a very tiny mistake in a configuration file can cause ripples throughout the entire DNS system and impact all aspects of your company's operations, its customers' ability to use the company's products, and the company's ability to make money. All of it can be brought to its knees by a very tiny mistake in a single configuration entry. Without solid DNS configuration management in place, you make yourself vulnerable to simple but costly mistakes. That's where problems often occur.

Lee:

Why are DNS configurations so sensitive to mistakes? The root cause of this sensitivity is that DNS changes are so common and so simple that they are rarely considered risky business operations. For smaller organizations, the development team probably manages its own DNS servers or has some other way to make DNS changes on the fly. As organizations get larger and more complex, the number of DNS servers and the number of people who can make changes to them tend to multiply. With so many people making so many changes, it's not surprising that something goes wrong occasionally. In fact, it would be much more surprising if things didn't go wrong.

DNS outages can be caused by a variety of factors, including human error, software issues, and hardware failures, but the most common cause of DNS outages is incorrect configuration files being deployed to DNS servers.

What steps can smaller companies without quality DNS hygiene take in order to put a high quality DNS management process in place? Here are eight things any company can do to improve its overall DNS quality and keep its applications operational and healthy.

Lee:

Number one. Manage DNS configuration using revision control.

This is the simplest and most basic thing you can do to improve the quality of your DNS infrastructure. At the core, DNS configurations are simply flat text files. Many DNS providers do give you a front-end control panel to these configuration files in order to let you make changes more easily, and with less actual knowledge of the impact of the changes you are making. Don't use these control panels. Instead, manage your configuration files using the standard flat text file format.

Once you have moved to this flat file format, you can easily manage these configuration files using the same revision control program you use for managing your application source code. For most companies, this is some variation of Git. You undoubtedly have processes in place today in your company for managing your source code. Use the same or a similar process for managing your DNS configuration files as well.

This simple change will allow many other process improvements to come naturally, such as configuration reviews, approval workflows, and the ability to track when specific changes were made that may have impacted your application. This is an essential base necessary to keep your DNS service operating and error free.
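To make the benefit of flat text files concrete, here is a minimal sketch of the kind of diff a reviewer sees when a zone file lives in revision control. The zone contents, file names, and the `zone_diff` helper are hypothetical illustrations, and the standard-library `difflib` module stands in for the diff your revision control tool would produce.

```python
import difflib

# Two revisions of a hypothetical zone file for example.com.
# Names and addresses are illustrative only.
OLD_ZONE = """\
www    300  IN  A      192.0.2.10
api    300  IN  A      192.0.2.20
mail   300  IN  A      192.0.2.30
"""

NEW_ZONE = """\
www    300  IN  A      192.0.2.10
api    300  IN  A      192.0.2.25
mail   300  IN  A      192.0.2.30
"""


def zone_diff(old_text: str, new_text: str) -> list[str]:
    """Return unified-diff lines between two revisions of a zone file."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="example.com.zone (before)",
            tofile="example.com.zone (after)",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Print the diff so a one-line change is easy to spot and review.
    for line in zone_diff(OLD_ZONE, NEW_ZONE):
        print(line)
```

A change made through a control panel leaves no comparable record, which is why the flat file format matters here.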

Lee:

Number two. Review all needed DNS changes.

This falls right behind the first recommendation. Once you're managing your changes using a revision control program, make sure that all changes you make are reviewed and approved. This can be accomplished just like your application source code, using branches, pull requests, and merges.

Establish a process for approvals for all DNS changes. Make sure one or more people review all changes before they are incorporated into your production configuration. This review process should include checks for things like syntax errors, incorrect DNS settings, and other potential problems. Problems with DNS configurations can be subtle, so the review should be thorough and methodical, and done by a knowledgeable reviewer.
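Some of these review checks can be automated so that human reviewers can focus on the subtle problems. The following is a rough sketch, not a complete validator. The `Record` type and `lint_records` function are hypothetical names, and the sketch assumes records have already been parsed into name, type, and value. It checks only two things: that address records hold valid IP addresses, and that a CNAME does not share its name with other record types.

```python
import ipaddress
from typing import NamedTuple


class Record(NamedTuple):
    """A simplified DNS record (hypothetical structure for illustration)."""
    name: str
    rtype: str
    value: str


def lint_records(records: list[Record]) -> list[str]:
    """Return human-readable problems found in a list of records."""
    problems: list[str] = []
    types_by_name: dict[str, set[str]] = {}

    for rec in records:
        types_by_name.setdefault(rec.name, set()).add(rec.rtype)
        try:
            # Address records must hold a syntactically valid address.
            if rec.rtype == "A":
                ipaddress.IPv4Address(rec.value)
            elif rec.rtype == "AAAA":
                ipaddress.IPv6Address(rec.value)
        except ValueError:
            problems.append(f"{rec.name}: invalid {rec.rtype} address {rec.value!r}")

    # A name that has a CNAME record should not carry other record types.
    for name, types in types_by_name.items():
        if "CNAME" in types and len(types) > 1:
            others = sorted(types - {"CNAME"})
            problems.append(f"{name}: CNAME coexists with {', '.join(others)}")

    return problems


if __name__ == "__main__":
    sample = [
        Record("www", "A", "192.0.2.10"),
        Record("api", "A", "192.0.2.300"),       # not a valid IPv4 address
        Record("blog", "CNAME", "www.example.com."),
        Record("blog", "TXT", "hello"),          # conflicts with the CNAME
    ]
    for problem in lint_records(sample):
        print(problem)
```

A check like this could run on every pull request, but it supplements the knowledgeable reviewer rather than replacing one.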

Lee:

Number three. Document the intent of all changes.

Every change you make should be documented. If you are following the above steps, then this can naturally be accomplished using the code check-in, commit, and pull request process. This documentation will help you later if a problem exists or an incompatible change is proposed. Understanding why a previous change was made will help repair problems and help you avoid future problems.
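One way to make this documentation habit stick is to check commit messages for a stated reason. This is only a sketch of one possible team convention: the `Why:` and `Ticket:` fields and the `missing_intent_fields` helper are assumptions for illustration, not something prescribed in this episode.

```python
# Hypothetical team convention: every DNS commit message must state
# why the change is being made and which ticket requested it.
REQUIRED_PREFIXES = ("Why:", "Ticket:")


def missing_intent_fields(message: str) -> list[str]:
    """Return the required fields that are absent or empty in a commit message."""
    lines = [line.strip() for line in message.splitlines()]
    missing = []
    for prefix in REQUIRED_PREFIXES:
        # A field counts only if some text follows the prefix.
        has_field = any(
            line.startswith(prefix) and line[len(prefix):].strip()
            for line in lines
        )
        if not has_field:
            missing.append(prefix)
    return missing


if __name__ == "__main__":
    message = (
        "Point api.example.com at the new load balancer\n"
        "\n"
        "Why: old load balancer is being retired next week\n"
        "Ticket: OPS-1234\n"
    )
    print(missing_intent_fields(message))
```

A function like this could be called from a Git `commit-msg` hook or from a pull request check, rejecting changes that do not explain themselves.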

Lee:

Number four. Automate the configuration deploy process.

Once you have the process in place to manage your configuration files, establish a process to automate the deployment of those configuration file updates to your production DNS system. By automating this process, you reduce the likelihood of an incorrect change being pushed to production or a simple human error causing your DNS system to fail or produce bad results.

If you find yourself copying and pasting changes from one configuration file to another during a deployment process, you're much more likely to make a mistake and introduce a bug into the DNS system. Automatically deploying changes using scripts will make sure the changes are applied in a consistent and reliable manner.

Part of the automated system should include an automated rollback mechanism. This may be a natural extension of your revision control process or a separate deployment rollback process, but being able to quickly and effectively undo a change may make the difference between a mistake being a small inconvenience or a massive product outage.
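The deploy-then-rollback idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: `push` and `is_healthy` are hypothetical callables you would supply for your own DNS provider or servers, and a real deployment script would add logging, retries, and alerting.

```python
from typing import Callable


def deploy_with_rollback(
    new_config: str,
    previous_config: str,
    push: Callable[[str], None],
    is_healthy: Callable[[], bool],
) -> bool:
    """Push new_config and restore previous_config if anything goes wrong.

    Returns True if the new configuration stays live, False if it was
    rolled back. `push` and `is_healthy` are supplied by the caller.
    """
    try:
        push(new_config)
        if is_healthy():
            return True
        print("health check failed after deploy, rolling back")
    except Exception as exc:
        print(f"deploy failed ({exc}), rolling back")

    # Restore the last known-good configuration. If this push also
    # fails, the exception propagates so that a human is alerted.
    push(previous_config)
    return False


if __name__ == "__main__":
    pushed: list[str] = []
    # Simulate a deploy whose health check fails.
    ok = deploy_with_rollback("new zone", "old zone", pushed.append, lambda: False)
    print(ok, pushed)
```

Passing the push and health-check steps in as functions keeps the rollback logic the same no matter which provider or server software sits behind it.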

Lee:

Number five. Grow into a more sophisticated change management system.

As your DNS system grows in complexity, you may want to consider putting an entire change management system on top of the simple version control system that you've already established. This might include using change request forms, requests for authorization, multi-team sign-offs, and other such processes.

These changes may seem onerous, but DNS configuration is not a place for slacking off on process. A simple DNS change can impact many teams within your organization. Allowing those teams input before the change is made, or even before the proposal for the change is accepted, can save you many headaches later on. The size and complexity of your change management system will naturally be tied to the size and complexity of your organization and the other software management processes that you employ.

Lee:

Number six. Use an independent DNS provider.

A high quality DNS system requires more than configuration management. It requires a high quality operational environment as well. Many of your existing service providers may provide DNS services that you can easily and inexpensively leverage. In particular, most cloud providers naturally provide DNS services, and usually rather high quality DNS services. However, be careful using a DNS service that is provided by a company that provides you any other services, including other cloud services.

The reason why? Well, during a service outage, the most critical tool you need to be operating normally is your DNS system. You need it to help you diagnose and repair most other outages. If your DNS system is also down, the length of your outage will extend significantly. The reverse is also true. If you are dealing with a DNS issue, the last thing you also want to be dealing with is an outage caused by another service in your application ecosystem.

Avoid these problems by using a high quality DNS provider that only provides DNS services to you and nothing else. This allows you to isolate your DNS, and problems with your DNS system, from any other service in your application, reducing the likelihood of a DNS-related extended outage.

And be careful: make sure the provider you select isn't dependent on service providers, such as cloud providers, that you are already relying on. If AWS has an outage, you want your independent DNS provider to keep operating. That doesn't happen if that service provider is also depending on AWS.

Now, some people run their own DNS systems. If you decide to run your own DNS, make sure you operate it using resources that are independent from the rest of your application. This means operating it in different data centers, availability zones, and even cloud regions than the rest of your application.

Lee:

Number seven. Separate internal and external DNS.

Let's take that last point one step further. You have DNS needs that are internal to your company and external DNS needs that your customers depend on. Your internal DNS provides access to internal documentation, internal systems including email and communications tools, and other internal processes and systems. Your external DNS provides access to your company's applications, products, and services that your customers depend on.

Make sure these two DNS needs are handled by different providers. If your external DNS goes down, fixing that problem will be substantially harder if your internal DNS is also down. This is part of what took Facebook so long to fix their application when they went down in October of 2021. Their external DNS went down, and they couldn't diagnose and fix the problem easily because their internal DNS was also down.

And conversely, if your internal DNS goes down, you don't want that problem to bleed out to your external customers. Using different providers, along with different DNS configurations and configuration processes, is extremely valuable to avoid these sorts of problems.

Lee:

And lastly, number eight. Duplicate your DNS in another provider.

Let's go one final step further. Set up your production DNS using two different providers. Use one as a primary provider and the second as a backup provider. This way, if your primary provider goes down for some reason, you may be able to switch your production DNS over to your backup provider quickly. The backup provider should have a complete, operational, and fully tested copy of your DNS configuration set up and operating, so it can be put into play quickly if needed.

This process will be easier if you have implemented the automated deployment processes we talked about previously. This automated process can help assure that you keep your changes in sync between your primary and backup providers. The worst thing that can happen is for your primary provider to go down, you switch to your backup provider, but you end up with an incomplete or incorrect DNS configuration because you haven't tested your backup provider setup.
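Keeping the two providers in sync is easier to trust if you check for drift regularly. The sketch below assumes you can export each provider's records into a dictionary keyed by record name and type. How you obtain those exports depends on your providers, and the `RecordSet` alias and `find_drift` function are hypothetical names used for illustration.

```python
# Maps (record name, record type) to the set of values for that record.
RecordSet = dict[tuple[str, str], frozenset[str]]


def find_drift(primary: RecordSet, backup: RecordSet) -> dict[str, list[tuple[str, str]]]:
    """Report records that differ between the primary and backup providers."""
    shared = primary.keys() & backup.keys()
    return {
        "missing_in_backup": sorted(primary.keys() - backup.keys()),
        "extra_in_backup": sorted(backup.keys() - primary.keys()),
        "different_values": sorted(k for k in shared if primary[k] != backup[k]),
    }


if __name__ == "__main__":
    # Illustrative exports only; real data would come from each provider.
    primary = {
        ("www", "A"): frozenset({"192.0.2.10"}),
        ("api", "A"): frozenset({"192.0.2.25"}),
        ("mail", "MX"): frozenset({"10 mx.example.com."}),
    }
    backup = {
        ("www", "A"): frozenset({"192.0.2.10"}),
        ("api", "A"): frozenset({"192.0.2.20"}),  # stale value
    }
    for kind, keys in find_drift(primary, backup).items():
        print(kind, keys)
```

An empty report in all three categories is a reasonable precondition before you rely on the backup provider for a failover.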

Lee:

DNS is a critical system that should be designed for high availability and reliability from the start. You also need to think about security when designing your DNS infrastructure. Make sure you have redundant systems in place, and that access to your DNS system is tightly controlled.

Finally, monitoring DNS is critical to ensuring your system continues to run smoothly. You need tools that will alert you if problems occur so you can take steps to mitigate the impact as quickly as possible.
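As a rough illustration of what such a monitoring tool does, the sketch below resolves a list of hostnames and reports any that fail or return unexpected addresses. The hostnames, expected addresses, and function names are hypothetical. The default resolver goes through the operating system's resolver, so it sees cached answers rather than querying your authoritative servers directly; a production monitor would do more.

```python
import socket
from typing import Callable


def default_resolver(hostname: str) -> set[str]:
    """Resolve a hostname through the system resolver and return its addresses."""
    infos = socket.getaddrinfo(hostname, None)
    return {info[4][0] for info in infos}


def check_names(
    expected: dict[str, set[str]],
    resolve: Callable[[str], set[str]] = default_resolver,
) -> list[str]:
    """Return one alert message per hostname that fails or resolves unexpectedly."""
    alerts = []
    for hostname, expected_addrs in expected.items():
        try:
            actual = resolve(hostname)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            alerts.append(f"{hostname}: resolution failed ({exc})")
            continue
        if actual != expected_addrs:
            alerts.append(
                f"{hostname}: expected {sorted(expected_addrs)}, got {sorted(actual)}"
            )
    return alerts


if __name__ == "__main__":
    # Hypothetical expectations; replace with your own names and addresses.
    expected = {"www.example.com": {"192.0.2.10"}}
    for alert in check_names(expected):
        print("ALERT:", alert)
```

Run on a schedule and wired into your alerting system, a check like this can flag a bad record before customers report it.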

Lee:

DNS outages are common occurrences, but they don't have to bring your entire company to a standstill. By using the proper processes and tools, you can minimize the impact of any outages and keep your business running smoothly.