UNIX SIG Notes

By Christopher J. Fearnley

WWW HTML and the Principle of Markup Languages

In these pages much has been written about HTML (HyperText Markup Language). I first wrote about HTML last March. In that article I described URLs (Uniform Resource Locators) and the basic concepts of the World Wide Web (WWW). I also cited several URLs with more detailed information about WWW and HTML. Then last month an article entitled "Hypertext" by Howard Rosen was published.

So why another article on HTML? Because I see a disturbing trend among Web publishers: writing documents that depend upon the formatting capabilities of certain browsers and displays. This violates one of the fundamental features and advantages of using a markup language: device-independent documents.

When writing in a markup language, the author is able to (and should) ignore the formatting of the output and concentrate on the content and logical structure of the document. The computer takes responsibility for translating that logical structure into a meaningful display. This is diametrically opposed to the concept of WYSIWYG used by most of today's word processors, which give the author too much flexibility in affecting the format of presentation (that's why it takes so long to format certain types of documents properly with these programs). Documents written in LaTeX (another markup language, one that produces publisher-quality output) or HTML should not care what type of paper (or what size screen) they are being displayed upon.

It is possible to design HTML (and even LaTeX) documents poorly. These markup languages give the author some flexibility to "tweak" the display formatting. It is my objective to convince you that these "tweaks" do more harm than good, in general. [Like any good rule of thumb there are times when this one should be broken --- but those times are few and far between.]

First, by concentrating on the logical structure and content of a document (vs.
getting side-tracked by formatting considerations) the author is able to focus on her purpose: writing a high-quality text. Clearly, this should increase the quality of the writing. Secondly, if display-dependent formatting is used, it becomes very time-consuming to find each occurrence and change it whenever the display style needs to change. Finally (and perhaps most apropos to HTML documents), different display devices may garble your device-specific code, rendering the document unintelligible.

On the Web there are many different browsers and different screen sizes, each with different display characteristics. I have used many Web browsers (Lynx, Netscape, Mosaic, tkWWW, Arena, and Chimera). Each has its strengths and weaknesses. Through using them, I have been repeatedly shocked by the terrible Web publishing job of some companies. Some documents won't display at all in some of these viewers. Certainly this is due to bugs in some of the browsers, but a well-written HTML document should be viewable in ANY browser!

How can you ensure that your HTML document will display in a reasonable fashion for all browsers? First, avoid the <B>TEXT</B> (boldface), <I>TEXT</I> (italics), <U>TEXT</U> (underline), and font tags (such as <TT>TEXT</TT> for typewriter font). They are device/display dependent. Fonts should be chosen by the user of the Web browser, not the author of the document. Instead use <EM>TEXT</EM> (to emphasize text), <STRONG>TEXT</STRONG> (for strong emphasis), <CITE>TEXT</CITE> (to cite the title of a book), <CODE>TEXT</CODE> (to display computer code), and <SAMP>TEXT</SAMP> (to display sample output). Notice how these HTML directives refer to the logical function of the text and not to the display characteristics of the monitor or printing device.

Secondly, be very judicious in your use of Netscape's supplements to HTML. Some of them are nice, but only Netscape can display them well.

Finally, use the ALT attribute when referring to an image. For example, use an <IMG> tag with ALT="25 frequency geodesic sphere" to reference an inline JPEG image. This way text-based browsers such as Lynx can display something meaningful to their users. [I find it annoying when I visit a Web site with Lynx and all I see is "[image] [image] [image]" in the document!] Although many people feel that the exciting things about the Web are its multimedia and graphical capabilities, there is a huge contingent of text-only users. While working for a local Internet service provider, I have spoken to two blind users who can only use text-based interfaces. When designing your Web pages, think of the millions of users who use text-only browsers.

For more information on HTML see the excellent introductory article in the July issue of Linux Journal, "HTML: A Gentle Introduction" by Eric Kasten.

Programming Perl

I started reading "Programming Perl" by Larry Wall (Perl's author) and Randal L. Schwartz in preparation for the two presentations on Perl for the Unix SIG given in April and May of 1994. Although I read the first two chapters of the book, I didn't really get into the language. I think Perl assumes more programming and Unix experience than I had at that time, so I didn't appreciate its advantages. Now that I understand the basics of C, awk and bash programming and have diligently reread the book, I find "Programming Perl" to be very good and the language to be quite useful.

Unix has the deeply ingrained concept of software tools. The Unix toolbox consists of dozens of small useful tools and the ability to combine them synergetically into powerful applications. This is accomplished in many ways: on the command line with pipes, using shell scripts, and using the C libraries in C development. Perl provides another way to access these tools: putting most of them into one integrated interface --- the Perl language. Perl has remnants of sh, C, awk, Lisp and many other popular Unix programming constructs.
It also provides access to many of the Unix system calls, system administration functions, and even the networking facilities. Because variables, arrays, and associative arrays interact with the Unix system features so smoothly in Perl, it is a very elegant way to do Unix system programming. But the richness of Perl's programming constructs makes it easy to write obfuscated scripts. And Perl's special variables (like $' $` $& $0 $! etc.) take some getting used to (though it's no worse than sh programming). All in all, I like Perl and will probably use it more and more.

Unix and Internet Security

Next month's topic will be Unix and Internet Security. The Unix SIG meets from 11 am to 1 pm in Room 242.