Anatomy of an HTML Page
Creating a website like Facebook or Twitter is far more than coding in HTML, but you have to start somewhere and HTML is a good place to begin. Everything else you learn about web development builds on your understanding of HTML.
A matter of semantics
A web page contains multiple elements. The key point to remember is that HTML defines the meaning, or semantics, of the elements on the page rather than the look or layout of the content. Look and layout are the job of CSS, while JavaScript and other programming languages such as C#, Python, or PHP provide the interactivity.
A page can be divided into sections or articles, and each might have its own header and footer. You might have multiple levels of heading in a single page, but according to the HTML specification, you should have only one top-level heading (<h1>) on the page.
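As a rough sketch of that structure, a page with a single top-level heading and one article that has its own header and footer might look like the following. The headings and text are placeholders made up for illustration; they are not part of the page we built earlier.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>Semantic Structure Sketch</title>
</head>
<body>
  <!-- The one top-level heading for the whole page -->
  <h1>My Cooking Blog</h1>

  <article>
    <!-- This header belongs to the article, not to the page -->
    <header>
      <h2>Perfect Pancakes</h2>
    </header>
    <p>The article text goes here.</p>
    <!-- This footer also belongs to the article -->
    <footer>
      <p>Posted by the author.</p>
    </footer>
  </article>
</body>
</html>
```

Notice that nothing here says how the article should look. The tags only describe what each piece of content is.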
Examining your page
Remember that first web page you created? Let's examine it. Here is the code:
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>My First Web Page</title>
  </head>
  <body>
    <h1>Irene P. Smith</h1>
    <p>Welcome to my very first web page. It isn't fancy but it shows how easy it is to create a web page.</p>
    <h2>Contact me at:</h2>
    <address>
      <!-- Replace the following information with your own! -->
      Irene Smith
      irene@playingwithwebdesign.com
    </address>
  </body>
</html>
The first line, <!DOCTYPE html>, tells the web browser that this is an HTML page. Furthermore, it identifies the page as an HTML5 web page. Earlier versions of HTML required a more complicated DOCTYPE declaration, but we don’t have to worry about that. And that is a good thing.
Why is that good? It means that the browser can expect the file to comply with current standards, so it won't have to fall back on special handling meant for pages written for older web browsers. The MDN web docs article Quirks Mode and Standards Mode gives a good description of the difference.
The second and last lines of the document work together to define a container for the rest of the page. <html lang="en"> is the opening tag. The part that follows “html” is an attribute that tells the browser that the language for this page is English. Other possibilities include fr (French), de (German), it (Italian), and es (Spanish). If you want to know more about the lang attribute, you can check out the article on the W3C website, Specifying the language of content: the lang attribute.
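To see the attribute with a different value, here is a minimal sketch of a page written in French. The title and paragraph text are invented for illustration; the only part that matters is lang="fr" in the opening tag.

```html
<!DOCTYPE html>
<html lang="fr">
<head>
  <meta charset="utf-8" />
  <title>Ma première page web</title>
</head>
<body>
  <!-- The content is French, which matches lang="fr" above -->
  <p>Bienvenue sur ma première page web.</p>
</body>
</html>
```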
Next, in lines 3 to 6, we define the head section of the document. The head section, not to be confused with the header, contains metadata for the current page. (Metadata is data about the page, as opposed to data that will be displayed in the browser.) In our example, I have included two bits of information. The <meta> tag on line 4 tells the web browser how to interpret the text on the page. The character set used in this case is “utf-8”. If you include the character set information, it should be within the first 1,024 bytes of the page. The best way to ensure that is to place it first in the head section of the page, right after the opening <head> tag.
The title tag is the only required child tag for the head section of a web page. The text between <title> and </title> will usually be displayed either in the title bar of the web browser or the tab for the page.
In case you're curious, UTF stands for Unicode Transformation Format, and the 8 means that the encoding works in units of 8 bits, or one byte. Plain English letters, digits, and punctuation take one byte each, while other characters take between two and four bytes.
The next section, the <body>, is where all the visible content of the page is placed. In the case of our sample document, we have a few bits of information. We have a top-level heading (the text between <h1> and </h1>), a paragraph, a second-level heading, and finally an <address> section that contains my name and email address.
The content of the <address> tag is usually displayed as italic text, but that depends on how the browser chooses to interpret the tag and, of course, on any formatting the web designer may have applied using CSS.
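As a small illustration of the designer's side of that, the sketch below uses a one-line CSS rule to switch off the usual italics. Placing the rule in a <style> element inside the head is just a convenient assumption for this example, and the name and email address are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>Address Styling Sketch</title>
  <style>
    /* Override the browser's default italic style for address text */
    address { font-style: normal; }
  </style>
</head>
<body>
  <address>
    Your Name
    you@example.com
  </address>
</body>
</html>
```

Remove the rule and reload the page to see what your browser does by default.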
In case you're curious...
Browsers and whitespace -- When the web browser interprets an HTML page, it collapses whitespace. Any run of spaces, tabs, or carriage return/line feed pairs is replaced with a single space. You cannot format text using extra spaces to produce indentations, or extra blank lines to create new lines.
If you want a new line, you have to use the <br /> tag. We’ll talk about spaces later when we get to character entities.
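Here is a short sketch you can paste into a file to watch this happen. The verse is a made-up placeholder; the point is to compare how the two paragraphs are displayed.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>Whitespace Sketch</title>
</head>
<body>
  <!-- The extra spaces and line breaks below are collapsed,
       so this displays on one line: "Roses are red, violets are blue." -->
  <p>Roses    are red,
     violets

     are blue.</p>

  <!-- Here the br tag forces a new line after "red," -->
  <p>Roses are red,<br />
  violets are blue.</p>
</body>
</html>
```

Try adding more spaces or blank lines to the first paragraph. The result in the browser should not change.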
This would be a good time to play around a little bit. See what you can do with the few tags I’ve already shown you. It won’t be fancy, but the goal is to be able to type any one of the tags you learn from this book without having to look up what it means or how to use it.