UNIX SIG Notes
By Christopher J. Fearnley
WWW HTML and the Principle of Markup Languages
In these pages much has been written about HTML (HyperText Markup
Language). I first wrote about HTML last March. In that article I
described URLs (Uniform Resource Locators) and the basic concepts of the
World Wide Web (WWW). I also cited several URLs with more detailed
information about WWW and HTML. Then last month an article entitled
"Hypertext" by Howard Rosen was published. So why another article on
HTML? Because I see a disturbing trend among Web publishers: writing
documents that depend upon the formatting capabilities of certain
browsers and displays. This violates one of the fundamental features
and advantages of using a markup language: device-independent documents.
When writing in a markup language the author is able to (and should)
ignore the formatting of the output and concentrate on the content and
logical structure of the document. The computer takes responsibility
for translating that logical structure into a meaningful display. This
is diametrically opposed to the concept of WYSIWYG used by most of
today's word processors, which give the author too much flexibility in
affecting the format of presentation (that's why it takes so long to
format certain types of documents properly with these programs).
Documents written in LaTeX (another markup language that produces
publisher quality output) or HTML should not care what type of paper (or
what size screen) they are being displayed upon.
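The division of labor described above can be sketched in a few lines of
Python. This is an illustration only, not how any particular browser or
LaTeX works: the names DOCUMENT and render, the two roles, and the sample
text are all assumptions made for the example. The author records only
the logical role of each piece of text, and a separate routine decides
how to lay it out for a given display width.

```python
import textwrap

# A document described only by its logical structure: (role, text) pairs.
# The roles and the sample text are hypothetical, chosen for illustration.
DOCUMENT = [
    ("heading", "Markup Languages"),
    ("paragraph", "The author describes structure and the display "
                  "device decides how to show it."),
]

def render(document, width):
    """Format the logical structure for a display `width` characters wide."""
    lines = []
    for role, text in document:
        if role == "heading":
            # One possible convention for headings on a text display.
            lines.append(text.upper())
            lines.append("=" * min(len(text), width))
        else:
            # Body text is simply re-wrapped to fit the device.
            lines.extend(textwrap.wrap(text, width=width))
        lines.append("")
    return "\n".join(lines)

narrow = render(DOCUMENT, 20)   # e.g. a small terminal window
wide = render(DOCUMENT, 72)     # e.g. a printed page
print(narrow)
print(wide)
```

The same DOCUMENT produces short lines on the 20-column display and long
lines on the 72-column one, and the author never had to touch the
formatting.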
It is possible to design HTML (and even LaTeX) documents poorly. These
markup languages give the author some flexibility to "tweak" the display
formatting. It is my objective to convince you that these "tweaks" do
more harm than good, in general. [Like any good rule of thumb, there are
times when this one should be broken --- but those times are few and far
between.] First, by concentrating on the logical structure and content
of a document (vs. getting side-tracked by formatting considerations)
the author is able to focus on her purpose: writing a high-quality text.
Clearly, this should increase the quality of the writing. Secondly, if
display-dependent formatting is used, it becomes very time-consuming to
find each occurrence and to change its display style. Finally (and
perhaps most apropos to HTML documents), different display devices may
garble your device-specific code, rendering the document unintelligible.
On the Web there are many different browsers and different screen sizes,
each with different display characteristics. I have used many web
browsers (Lynx, Netscape, Mosaic, tkWWW, Arena, and Chimera). Each has
its strengths and weaknesses. Through using them, I have been
repeatedly shocked by how poorly some companies publish on the Web.
Some documents won't display at all in some of these viewers. Certainly
some of this is due to bugs in the browsers, but a well-written HTML
document should be viewable in ANY browser!
How can you ensure that your HTML document will display in a reasonable
fashion for all browsers? First, avoid the <B>TEXT</B> (boldface),
<I>TEXT</I> (italics), <U>TEXT</U> (underline), and font tags (such as
<TT>TEXT</TT> for typewriter font). They are device/display dependent.
Fonts should be chosen by the user of the Web browser, not the author
of the document. Instead use <EM>TEXT</EM> (to emphasize text),
<STRONG>TEXT</STRONG> (for strong emphasis), <CITE>TEXT</CITE> (to cite
the title of a book), <CODE>TEXT</CODE> (to display computer code), and
<SAMP>TEXT</SAMP> (to display sample output). Notice how these HTML
directives refer to the logical function of the text and not to the
display characteristics of the monitor or printing device. Secondly, be
very judicious in your use of Netscape's supplements to HTML. Some of
them are nice, but only Netscape can display them well. Finally, use
the ALT attribute when referring to an image. For example, use
<IMG SRC="URL" ALT="TEXT"> to reference an inline JPEG image. This way text-based
browsers such as Lynx can display something meaningful to their users.
[I find it annoying when I visit a Web site with Lynx and all I see is
"[image] [image] [image]" in the document!]
Although many people feel that the exciting thing about the Web is its
multi-media and graphical capabilities, there is a huge contingent of
text-only users. While working for a local Internet service
provider, I have spoken to two blind users who can only use text-based
interfaces. When designing your Web pages, think of the millions of
users who use text-only browsers. For more information on HTML see the
excellent introductory article in the July issue of Linux Journal, "HTML:
A Gentle Introduction" by Eric Kasten.
Programming Perl
I started reading "Programming Perl" by Larry Wall (Perl's author) and
Randal L. Schwartz in preparation for the two presentations on Perl for
the Unix SIG given in April and May of 1994. Although I read the first
two chapters of the book, I didn't really get into the language. I
think Perl takes advantage of more programming and Unix experience than
I had at that time. So I didn't appreciate its advantages. Now that I
understand the basics of C, awk and bash programming and have diligently
reread the book, I find "Programming Perl" to be very good and the
language to be quite useful.
Unix has the deeply ingrained concept of software tools. The Unix
toolbox consists of dozens of small useful tools and the ability to
combine them synergetically into powerful applications. This is
accomplished in many ways: on the command line with pipes, using shell
scripts, and using the C libraries in C development. Perl provides
another way to access these tools: putting most of them into one
integrated interface --- the Perl language. Perl borrows constructs from
sh, C, awk, Lisp and many other popular Unix programming tools. It also
provides access to many of the Unix system calls, system administration
functions, and even the networking facilities. Because variables,
arrays, and associative arrays interact with the Unix system features so
smoothly in Perl, it is a very elegant way to do Unix system
programming. But the richness of Perl's programming constructs makes it
easy to write obfuscated scripts. And Perl's special variables (like $'
$` $& $0 $! etc.) take some getting used to (though it's no worse
than sh programming). All in all, I like Perl and will probably use it
more and more.
Unix and Internet Security
Next month's topic will be Unix and Internet Security. The Unix SIG
meets from 11 am to 1 pm in Room 242.