It’s Elementary, My Dear Coder

Seth Cohen
8 min read · Apr 22, 2021

Or: How I Learned to Stop Worrying and Love the DOM

Target audience: You’ve learned some basics about HTML and CSS and you are just beginning to learn the fundamentals of JavaScript. You’ve heard of the DOM, but don’t quite understand what it is, how it works, or why it is significant.

tl;dr: The DOM is a layer of abstraction, provided by the browser, that sits between HTML and JavaScript. It provides us with programmatic access to the HTML document. This gives us the ability to dynamically alter the HTML, and thereby create rich interactive experiences on the web.

The modern web offers us pages full of rich user interactivity. We go on Twitter, click like on a funny tweet, and watch our interaction change the state of the page right before our eyes. The heart changes from colorless to red and the likes count on the tweet in question increments by one. This is a very basic bit of interactivity, and at this point, one that is easy to take for granted. Some of us are old enough, however, to remember a time when interactions like this on the web were few and far between.

Indeed, at the web’s outset, the most interactive thing a web page could do was provide a link which, when clicked, would redirect the user to another, separate page. (As an aside, some really great websites are still like this. Think Wikipedia, arguably this author’s single favorite site on the whole web, which doesn’t offer much in the way of interactivity outside of links.)

To understand how interactions like the one described above work under the hood, we have to take a step back and get a firm understanding of a concept that makes it all possible: the “Document Object Model”, or DOM for short.

Before diving into the DOM, though, let’s do a quick review of HTML. As you are likely aware, the structure of a web page, at its most basic, is nothing more than text. Most of the time, this text is written using HTML (HyperText Markup Language). Browsers prefer HTML to plain old text because they are designed to be able to do things with HTML. Take a look at the following example of a browser-rendered HTML document:

Here, I have written a simple HTML document in a text editor and opened it up in Chrome, my default browser. Notice that “This is a header” is displayed differently from “This is a paragraph…”, and that there is also a button on the page with its own distinct stylistic features. If you take a look at the right-hand side of the photo, you can see the corresponding HTML that I wrote.

Everything that you see displayed on the above web page was written in between two HTML tags. These three things, the opening and closing HTML tags and the text in between them, together make up an HTML element. When the browser loads up the document, it sees these elements and renders them accordingly. Probably the most obvious example, in this case, is the button element. As you can see, the text in between the two button tags is not much different than what is in between the two “p” tags and the two “h1” tags, that is to say: it’s just text! But because the text is contained inside of those two “button” tags, the browser says to itself: “the author of this HTML must want this text to be rendered as a button, so that’s what I’m going to go ahead and do!” Side note: browsers don’t actually talk to themselves.

At this point, it may start becoming clearer why web pages were once upon a time so static. If all we are doing to create web content is providing the browser with marked up text, it stands to reason that there’s not much in our power to change that text once we send it over to a browser. Enter the Document Object Model!

Here’s the kicker: what you see displayed on a given webpage is not being directly rendered from the HTML document itself, but rather from a model of the document that the browser stores in local memory upon parsing the original document. If we were to represent this model graphically, it would look something like this:

If this looks an awful lot like a family tree to you, that’s a great thought! It is important to take note of the nested and hierarchical nature of an HTML document. Looking again at the simple example that we went over previously, note how the “h1”, “p”, and “button” elements are all contained inside of an opening and closing “body” tag. This makes those three elements children of the “body” element. The “body” element, in turn, is nested inside of an opening and closing “html” tag, making the “body” a child of the “html” element.

Humans are not great at organizing abstract hierarchies like this in their heads. As an HTML document becomes more and more complex, it becomes harder and harder to conceptualize which elements belong to which and how the whole structure is organized. Luckily, computers are very good at working with hierarchical structures, so when the browser parses the HTML document and creates a model of it in memory, that’s exactly how it organizes the data: as a hierarchical tree structure.
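To make the tree idea concrete, here is a minimal sketch of that hierarchy as plain JavaScript data. The node shape (a `tagName` plus a `children` array) and the `outline` helper are assumptions made for illustration, and anything other than the “html”, “body”, “h1”, “p”, and “button” elements is left out; the browser’s real nodes carry far more information than this.

```javascript
// Hypothetical, simplified node shape: a tag name plus a list of child nodes.
const tree = {
  tagName: "html",
  children: [
    {
      tagName: "body",
      children: [
        { tagName: "h1", children: [] },
        { tagName: "p", children: [] },
        { tagName: "button", children: [] },
      ],
    },
  ],
};

// Walk the tree depth-first, indenting each tag name by its depth.
function outline(node, depth = 0) {
  const line = "  ".repeat(depth) + node.tagName;
  const childLines = node.children.flatMap((child) => outline(child, depth + 1));
  return [line, ...childLines];
}

console.log(outline(tree).join("\n"));
// html
//   body
//     h1
//     p
//     button
```

The indentation in the output mirrors the nesting: each level of indentation is one generation down the family tree.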

If you’ve worked with JavaScript objects before (or Ruby hashes, or Python dictionaries, etc.), it might not be such a stretch to imagine how this hierarchical data structure might be represented as a series of nested key value pairs. Taking the example above, if we were to describe this as a JavaScript object, it might look something like this:
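As a rough sketch, such an object could be written like the one below. It is named `fakeDocument` so that it doesn’t clash with the browser’s real `document`, and the `text` and `style` property names, the button label, and the starting colors are all placeholders I’ve assumed for illustration.

```javascript
// A toy stand-in for the page, written as nested key-value pairs.
// This is NOT the structure a browser actually builds.
const fakeDocument = {
  html: {
    body: {
      h1: { text: "This is a header", style: { color: "black" } },
      p: { text: "This is a paragraph…", style: { color: "black" } },
      // The button label is a placeholder, not taken from the example page.
      button: { text: "Button text", style: { color: "black" } },
    },
  },
};

// Reading a deeply nested value means chaining one property after another.
console.log(fakeDocument.html.body.p.style.color); // black
```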

Given this, it is probably not so hard to imagine yourself accessing one of these properties and re-assigning its value, e.g.:

document.html.body.p.style.color = "green"

Disclaimer: the example above is not identical to the data structure that a browser creates upon parsing an HTML document. As you will see when you begin working with the DOM, the data structure that the browser creates is quite a bit more complex than this. The important point is to begin thinking about an HTML document as one big programmatically accessible object.

In this completely contrived example, we would be accessing the “color” property of the “style” node that is a child of the “p” element that is a child of the “body” element, and so on and so forth. Again, this is a very simple HTML document. Imagine having to do something like this with a significantly more complex hierarchical structure. The chaining of properties would be seemingly endless, and it would be enormously frustrating.

Fortunately, browsers provide us with an interface for working with the DOM, aptly called the DOM API (application programming interface), that gives us access to extremely powerful methods for traversing the object model and grabbing the specific parts of it that we want (or even creating new ones!). Methods like Document.getElementById(), Document.createElement(), and my personal favorite, Document.querySelector(), are just a few of the powerful tools at our disposal through the DOM API. In practice, you call them on the document object that the browser provides, e.g. document.querySelector(). Check out the MDN documentation to get a sense of what you can do!
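Here is a small sketch of how those three methods tend to get used. The helper names (`setTextById`, `appendParagraph`, `colorFirstMatch`) are hypothetical, and each helper takes the document as a `doc` parameter, which is a choice I’ve made for the sketch so that it isn’t tied to any particular page.

```javascript
// Hypothetical helper: change the text of the element with the given id.
function setTextById(doc, id, text) {
  const element = doc.getElementById(id); // null when no element has that id
  if (element !== null) {
    element.textContent = text;
  }
  return element;
}

// Hypothetical helper: create a new paragraph and add it to the end of the body.
function appendParagraph(doc, text) {
  const paragraph = doc.createElement("p");
  paragraph.textContent = text;
  doc.body.appendChild(paragraph);
  return paragraph;
}

// Hypothetical helper: color the first element that matches a CSS selector.
function colorFirstMatch(doc, selector, color) {
  const element = doc.querySelector(selector); // null when nothing matches
  if (element !== null) {
    element.style.color = color;
  }
  return element;
}
```

In a browser, you would pass in the real `document`, for example `colorFirstMatch(document, "p", "green")`. Notice that there is no long chain of properties anywhere: the DOM API does the searching for us.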

So how does this all tie back to web-page interactivity? For the final piece of the puzzle, we have to talk about JavaScript. JavaScript was originally created as a scripting language that runs in the browser, and all modern browsers are equipped to run it.

The word script is significant here. Think about what a script is in the context of a stage performance: a series of cues and programmed responses to those cues. For example: “when character_a says the following line to character_b, the band should be cued to strike up song_a.”

We can do something very similar to this by combining our knowledge of HTML, the DOM API, and JavaScript. Along with the HTML file that we send to our users’ browsers, we can send a JavaScript file. In that file, we can create a program for the browser to follow. The cues that we provide the browser are called events. We can use JavaScript in conjunction with our DOM traversal methods to find a specific element on the page, and attach something called an event listener that will, you guessed it, listen for an event, and fire off a given action when that event gets triggered. In other words: “when the user clicks on a given HTML element, make the following changes to the DOM.”
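The cue-and-response pattern can be sketched without any page at all. `EventTarget` is the built-in interface that DOM elements implement, so a bare `EventTarget` stands in for a button here; the names `fakeButton` and `clickCount` are my own, and dispatching the event by hand is an assumption that plays the role of a real user’s click.

```javascript
// A bare EventTarget stands in for an element such as a button.
const fakeButton = new EventTarget();
let clickCount = 0;

// The listener is the "programmed response" to the cue.
fakeButton.addEventListener("click", () => {
  clickCount += 1;
});

// Dispatching an event by hand plays the role of the user's click.
fakeButton.dispatchEvent(new Event("click"));
fakeButton.dispatchEvent(new Event("click"));

console.log(clickCount); // 2
```

Nothing happens when the listener is attached. The function only runs when the matching event arrives, just as the band only strikes up when it hears its cue.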

To show how this all works in practice, I have expanded on the simple HTML document that we went over before to include some JavaScript:

Everything in between the two “script” tags above is JavaScript. To break it down, this script directs the browser to:

  • Find the first button in the DOM
  • Attach an event listener which listens for a click event
  • When the event gets triggered (the button is clicked), look for the first “p” element in the DOM and pass it into a toggle function
  • The toggle function receives the “p” element and changes its properties depending on the element’s current state
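A minimal sketch of those four steps might look like the following. The function names `toggle` and `setUpToggle`, and the decision to flip the paragraph’s text color, are assumptions made for illustration; the script only needs to change some property of the “p” element based on its current state.

```javascript
// Hypothetical toggle: flips the paragraph's text color between green and
// the default. Which properties to change is an assumption of this sketch.
function toggle(paragraph) {
  paragraph.style.color = paragraph.style.color === "green" ? "" : "green";
}

function setUpToggle(doc) {
  // 1. Find the first button in the DOM.
  const button = doc.querySelector("button");
  // 2. Attach an event listener which listens for a click event.
  button.addEventListener("click", () => {
    // 3. On click, look for the first "p" element and pass it to toggle.
    const paragraph = doc.querySelector("p");
    // 4. toggle changes the element depending on its current state.
    toggle(paragraph);
  });
}

// In a browser, wire it up to the real page. The script should run after
// the button has been parsed, e.g. from a script tag at the end of the body.
if (typeof document !== "undefined") {
  setUpToggle(document);
}
```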

If you look closely at the HTML in the above animation, you will see that, when the button gets clicked, that HTML instantly changes. This is thanks to the event listener that was attached to the button, and ultimately, all thanks to the DOM itself. One of the key concepts to remember here is that when we attach these event listeners that make changes to the DOM, we are not altering the original HTML document. We are altering the state of the DOM as it currently exists in local memory.

Through the power of the DOM, HTML, and JavaScript, we provide our users with the ability to have rich, event-driven, interactive experiences with the document that we create. It is because of the DOM that we can send the same exact HTML document to millions of users, who can then each have a separate and distinct experience of that page.
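The “same document, separate experiences” idea can be sketched with a toy model. Everything here is an assumption for illustration: `loadPage` is a hypothetical stand-in for the browser’s parsing step, it only understands a single simple element, and real browsers do not parse HTML this way.

```javascript
// The "document" the server sends: plain text, identical for every visitor.
const htmlSource = "<p>This is a paragraph</p>";

// Toy stand-in for parsing: each call builds a fresh in-memory model from
// the same source text. It handles exactly one simple element.
function loadPage(source) {
  const match = /^<(\w+)>(.*)<\/\1>$/.exec(source);
  return { tagName: match[1], textContent: match[2], style: {} };
}

// Two visitors load the same page and each get their own model.
const firstVisitorModel = loadPage(htmlSource);
const secondVisitorModel = loadPage(htmlSource);

// One visitor's interaction changes only that visitor's model.
firstVisitorModel.style.color = "green";

console.log(firstVisitorModel.style.color);  // green
console.log(secondVisitorModel.style.color); // undefined
console.log(htmlSource);                     // <p>This is a paragraph</p>
```

The source text never changes, and neither does the other visitor’s model. Only the in-memory copy that was interacted with is altered.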

Thank you for reading! I hope that this was helpful and that you feel excited about going forth to see what you can do with this new-found understanding of the DOM (if indeed it helped you understand at all).
