Or: How I Learned to Stop Worrying and Love the DOM
The modern web offers us pages full of rich user interactivity. We go on Twitter, click like on a funny tweet, and watch our interaction change the state of the page right before our eyes. The heart changes from colorless to red and the likes count on the tweet in question increments by one. This is a very basic bit of interactivity, and at this point, one that is easy to take for granted. Some of us are old enough, however, to remember a time when interactions like this on the web were few and far between.
Indeed, at the web’s outset, about the most interactive thing a web page could do was provide a link which, when clicked, would redirect the user to another, separate page. (As an aside, some really great websites are still like this. Think Wikipedia, arguably this author’s single favorite site on the whole web, which doesn’t offer much in the way of interactivity outside of links.)
To understand how interactions like the one described above work under the hood, we have to take a step back and get a firm understanding of a concept that makes it all possible: the “Document Object Model”, or DOM for short.
Before diving into the DOM, though, let’s do a quick review of HTML. As you are likely aware, the structure of a web page, at its most basic, is nothing more than text. Most of the time, this text is written using HTML (HyperText Markup Language). Browsers prefer HTML to plain old text because they are designed to be able to do things with HTML. Take a look at the following example of a browser-rendered HTML document:
Here, I have written a simple HTML document in a text editor and opened it up in Chrome, my default browser. Notice that “This is a header” is displayed differently from “This is a paragraph…”, and that the button on the page has its own distinct stylistic features as well. If you take a look at the right-hand side of the photo, you can see the corresponding HTML that I wrote.
Everything that you see displayed on the above web page was written in between two HTML tags. These three things, the opening and closing HTML tags and the text in between them, together make up an HTML element. When the browser loads up the document, it sees these elements and renders them accordingly. Probably the most obvious example, in this case, is the button element. As you can see, the text in between the two button tags is not much different than what is in between the two “p” tags and the two “h1” tags, that is to say: it’s just text! But because the text is contained inside of those two “button” tags, the browser says to itself: “the author of this HTML must want this text to be rendered as a button, so that’s what I’m going to go ahead and do!” Side note: browsers don’t actually talk to themselves.
At this point, it may start becoming clearer why web pages were once upon a time so static. If all we are doing to create web content is providing the browser with marked-up text, it stands to reason that there’s not much in our power to change that text once we send it over to a browser. Enter the Document Object Model!
Here’s the kicker: what you see displayed on a given webpage is not being directly rendered from the HTML document itself, but rather from a model of the document that the browser stores in local memory upon parsing the original document. If we were to represent this model graphically, it would look something like this:
If this looks an awful lot like a family tree to you, that’s a great thought! It is important to take note of the nested and hierarchical nature of an HTML document. Looking again at the simple example that we went over previously, note how the “h1”, “p”, and “button” elements are all contained inside of an opening and closing “body” tag. This makes those three elements children of the “body” element. The “body” element, in turn, is nested inside of an opening and closing “html” tag, making the “body” a child of the “html” element.
Humans are not great at organizing abstract hierarchies like this in their heads. As an HTML document becomes more and more complex, it becomes harder and harder to conceptualize which elements belong to which and how the whole structure is organized. Luckily, computers are very good at thinking about hierarchical structures, so when the browser parses the HTML document and creates a model of it in memory, that’s exactly how it organizes the data: as a hierarchical tree structure.
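To make the tree idea a little more concrete, here is a minimal sketch of the example page written out as a plain nested JavaScript object. The names `pageModel` and `listElements` are made up for illustration, and this is a deliberately simplified stand-in, not the structure a browser actually builds.

```javascript
// A simplified, hypothetical stand-in for the browser's model of the page.
// Each element is an object, and nested elements are properties of their parent.
const pageModel = {
  html: {
    body: {
      h1: { style: {} },
      p: { style: {} },
      button: { style: {} },
    },
  },
};

// Walk the tree and list every element's path, parent before child.
function listElements(node, path = []) {
  const found = [];
  for (const [name, child] of Object.entries(node)) {
    if (name === "style") continue; // "style" holds properties, not a child element
    found.push([...path, name].join(" > "));
    found.push(...listElements(child, [...path, name]));
  }
  return found;
}

console.log(listElements(pageModel));
```

Each path in the resulting list reads like one branch of the family tree: “html”, then “html > body”, then the three children of “body”.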
Given this, it is probably not so hard to imagine the whole model as one big nested object, and to imagine yourself accessing one of its properties and reassigning its value, e.g.:
document.html.body.p.style.color = "green";
Disclaimer: the example above is not identical to the data structure that a browser creates upon parsing an HTML document. As you will see when you begin working with the DOM, the data structure that the browser creates is quite a bit more complex than this. The important point is to begin thinking about an HTML document as one big programmatically accessible object.
In this completely contrived example, we would be accessing the “color” property of the “style” node that is a child of the “p” element that is a child of the “body” element, and so on and so forth. Again, this is a very simple HTML document. Imagine having to do something like this with a significantly more complex hierarchical structure. The chaining of properties would be seemingly endless, and it would be enormously frustrating.
Fortunately, browsers provide us with an interface for working with the DOM, aptly called the DOM API (application programming interface), that gives us access to extremely powerful methods for traversing the object model and grabbing the specific parts of it that we want (or even creating new ones!). Methods like Document.getElementById(), Document.createElement(), and my personal favorite, Document.querySelector(), are just a few of the powerful tools at our disposal through the DOM API. Check out the MDN documentation to get a sense of what you can do!
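As a rough sketch of what a couple of these methods look like in use, the snippet below grabs the first “p” element with querySelector() and builds a brand-new one with createElement(). The function name `addNote` and the text it writes are my own inventions for illustration, and it assumes a page like the simple example above, with a “p” inside the “body”.

```javascript
// `doc` is the browser's global `document`, passed in so the function is easy to test.
function addNote(doc) {
  // querySelector returns the first element that matches a CSS selector.
  const paragraph = doc.querySelector("p");
  paragraph.textContent = "This paragraph was changed through the DOM API.";

  // createElement builds a new element that is not yet part of the page.
  const note = doc.createElement("p");
  note.textContent = "This paragraph was created through the DOM API.";

  // Appending the new element to the body makes it show up on the page.
  doc.body.appendChild(note);
  return note;
}

// Only runs where a DOM exists (a browser), not in plain Node.js.
if (typeof document !== "undefined") {
  addNote(document);
}
```

No chaining through “html” and “body” is needed here: the selector does the traversal for us. getElementById() works in a similar spirit, but looks an element up by its id attribute instead of by a selector.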
The word script is significant here. Think about what a script is in the context of a stage performance: a series of cues and programmed responses to those cues. For example: “when character_a says the following line to character_b, the band should be cued to strike up song_a.” A script on a web page works in much the same way. In this example, the script does the following:
- Find the first button in the DOM
- Attach an event listener which listens for a click event
- When the event gets triggered (the button is clicked), look for the first “p” element in the DOM and pass it into a toggle function
- The toggle function receives the “p” element and changes its properties depending on the element’s current state
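The steps above can be sketched roughly as follows. This is not the exact script from the example: the names `toggle` and `attachToggle` and the choice to flip the paragraph’s color between green and its default are assumptions made purely for illustration.

```javascript
// Change the paragraph's properties depending on its current state.
// Swapping between green and the default color is an illustrative choice.
function toggle(paragraph) {
  paragraph.style.color = paragraph.style.color === "green" ? "" : "green";
}

// `doc` is the browser's global `document`, passed in so the function is easy to test.
function attachToggle(doc) {
  // Find the first button in the DOM.
  const button = doc.querySelector("button");

  // Attach an event listener which listens for a click event.
  button.addEventListener("click", () => {
    // When the button is clicked, find the first "p" element and pass it to toggle.
    toggle(doc.querySelector("p"));
  });
}

// Only runs where a DOM exists (a browser), not in plain Node.js.
if (typeof document !== "undefined") {
  attachToggle(document);
}
```

One practical note: a script like this has to run after the browser has parsed the button, otherwise querySelector() returns null. Placing the script tag at the end of the “body” is one simple way to make sure of that.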
If you look closely at the HTML in the above animation, you will see that, when the button gets clicked, that HTML instantly changes. This is thanks to the event listener that was attached to the button, and ultimately, all thanks to the DOM itself. One of the key concepts to remember here is that when we attach these event listeners that make changes to the DOM, we are not altering the original HTML document. We are altering the state of the DOM as it currently exists in local memory.
Thank you for reading! I hope that this was helpful and that you feel excited about going forth to see what you can do with this newfound understanding of the DOM (if indeed it helped you understand at all).
For more helpful resources see the following: