HP C
To write C programs using character sets that do not contain all of C's punctuation characters, ANSI C allows the use of nine trigraph sequences in the source file. These three-character sequences are replaced by a single character in the first phase of compilation. (See Section 2.16 for an explanation of compilation phases.) Table 1-1 lists the valid trigraph sequences and their character equivalents.
Trigraph Sequence | Character Equivalent |
---|---|
??= | # |
??( | [ |
??/ | \ |
??) | ] |
??' | ^ |
??< | { |
??! | \| |
??> | } |
??- | ~ |
No other trigraph sequences are recognized. A question mark (?) that does not begin a trigraph sequence remains unchanged during compilation. For example, consider the following source line:
    printf ("Any questions???/n");
After the ??/ sequence is replaced, this line is translated as follows:
    printf ("Any questions?\n");
Digraph processing is supported when compiling in ISO C 94 mode (/STANDARD=ISOC94 on OpenVMS systems).
Digraphs are pairs of characters that translate into a single character, much like trigraphs, except that trigraphs get replaced inside string literals, but digraphs do not. Table 1-2 lists the valid digraph sequences and their character equivalents.
Digraph Sequence | Character Represented |
---|---|
<: | [ |
:> | ] |
<% | { |
%> | } |
%: | # |
%:%: | ## |
An identifier is a sequence of characters that represents a name for the following:
The following rules apply to identifiers:
    struct employee { int number; char sex; } emp;
An identifier without external linkage has at most 32,767 significant characters. An identifier with external linkage has 1023 significant characters on Tru64 UNIX systems and 31 significant characters on OpenVMS systems. (Section 2.8 describes linkage in more detail.) Case is not significant in external identifiers on OpenVMS systems.
Identifiers that differ within their significant characters are different identifiers. If two or more identifiers differ in nonsignificant characters only, they are treated as the same identifier.
Universal character names provide a way to name other characters. They can be used in identifiers, character constants, and string literals to designate characters that are not in the basic character set.
A universal character name begins with a \u or \U and is followed by either four or eight hexadecimal digits.
The universal character name \Unnnnnnnn designates the character whose eight-digit short identifier (as specified by ISO/IEC 10646) is nnnnnnnn. Similarly, the universal character name \unnnn designates the character whose four-digit short identifier is nnnn (and whose eight-digit short identifier is 0000nnnn).
A universal character name cannot specify a character whose short identifier is less than 00A0, other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive.
See Appendix F for a list of valid universal character names.
Except within a character constant, string literal, or a comment, the /* character combination introduces a comment and the */ character combination ends a comment. The contents of such a comment are examined only to identify multibyte characters and to find the characters */ to terminate it.
Alternatively, the // character combination introduces a comment that includes all multibyte characters up to, but not including, the next new-line character. The contents of such a comment are examined only to identify multibyte characters and to find the terminating new-line character.
Comments cannot be nested; once a comment is started, the compiler treats the first occurrence of */ as the end of the comment.
To comment out sections of code, avoid using the /* and */ sequences. Using the /* and */ sequences works only for code sections containing no comments, because comments do not nest. A better method is to use the #if and #endif preprocessor directives, as in the following example:
    #if 0
    /* This code is excluded from execution because ... */
    code_to_be_excluded ();
    #endif
See Chapter 8 for more information on the preprocessing directives #if and #endif.
Comments cannot span source files. Within a source file, comments can be of any length and are interpreted as white space by both the compiler and the preprocessor.
Examples:
    "a//b"                 // four-character string literal
    #include "//e"         // undefined behavior
    // */                  // comment, not syntax error
    f = g/**//h;           // equivalent to f = g / h;
    //\
    i();                   // part of a two-line comment
    /\
    / j();                 // part of a two-line comment
    #define glue(x,y) x##y
    glue(/,/) k();         // syntax error, not comment
    /*//*/ l();            // equivalent to l();
    m = n//**/o
      + p;                 // equivalent to m = n + p;
C defines several keywords, each with special meaning to the compiler. Keywords identify statement constructs and specify basic types and storage classes. Keywords cannot be used as identifiers and cannot be declared.
Table 1-3 lists the C keywords.
In addition to the keywords listed in Table 1-3, the compiler reserves all identifiers that begin with two underscores (__) or with an underscore followed by an uppercase letter. User variable names must never begin with one of these sequences.
The following VAX C keywords are also sometimes recognized by the compiler (see footnote 1):
    _align globaldef globalref globalvalue noshare readonly variant_struct variant_union
The following C99 Standard keywords are also sometimes recognized by the compiler (see footnote 2):
    inline restrict
Using a keyword as a macro name is legal but not recommended. For example, the following definition changes the default size of a basic data type:
    #define int short
Here, the keyword int has been redefined as short, which causes all data objects declared with the int data type to be stored as short objects.
1. Recognized on OpenVMS systems when /STANDARD=RELAXED (the default), /STANDARD=VAXC or /ACCEPT=VAXC_KEYWORDS is specified on the compiler command line. Recognized on Tru64 UNIX systems when -vaxc or -accept vaxc_keywords is specified on the compiler command line.
2. Recognized on OpenVMS systems when /STANDARD=RELAXED (the default), /STANDARD=C99, or /ACCEPT=C99_KEYWORDS is specified on the compiler command line. Recognized on Tru64 UNIX systems when -std (the default), -c99, or -accept c99_keywords is specified on the compiler command line.
An operator is a token that specifies an operation on at least one operand, and yields some result (a value, designator, side effect, or some combination). Operands are expressions or constants (a form of expression). Operators with one operand are unary operators, and operators with two operands are binary operators. For example:
    x = -b;      /* Unary minus operator */
    y = a - c;   /* Binary minus operator */
Operators with three operands are called ternary operators.
All operators are ranked by precedence, a ranking system determining which operators are evaluated before others in a statement. See Chapter 6 for information on what each operator does and for the rules of operator precedence.
Some operators in C are composed of more than one character, while others are single characters. The single-character operators in C are:
    ! % ^ & * - + = ~ | . < > / ? : , [ ] ( ) #
The multiple-character operators in C are:
    ++ -- -> << >> <= >= == != *= /= %= += -= <<= >>= &= ^= |= ## && ||
The # and ## operators can only be used in preprocessor macro definitions. See Chapter 8 for more information on predefined macros and preprocessor directives.
The sizeof operator determines the size of a data type. See Chapter 6 for more information on the sizeof operator.
The old form for compound assignment operators (=+, =-, =*, =/, =%, =<<, =>>, =&, =^, and =|) is not supported by the ANSI C standard. Use of these operators in a program is unsupported, and will produce unpredictable results. For example:
    x =-3;
This construction means "x is assigned the value -3", not "x is assigned the value x - 3".
The error-checking compiler option provides a warning message when the old form of compound assignment operators is encountered.
Some characters in C are used as punctuators, which have their own syntactic and semantic significance. Punctuators are not operators or identifiers. Table 1-4 lists the C punctuators.
The following punctuators must be used in pairs:
< >
[ ]
( )
' '
" "
{ }
Some characters can be used either as a punctuator or as an operator, or as part of an operator. The context of the occurrence specifies the meaning. Punctuators usually delineate a specific type of C construct, as shown in Table 1-4.
Strings are sequences of zero or more characters. A character string literal is a sequence of zero or more multibyte characters enclosed in double quotation marks, as in "xyz". String literals can include any valid character, including white-space characters and character escape sequences. A wide string literal is the same, except prefixed by the letter L. Once a string is stored as a string literal, modification of the string leads to undefined results.
In the following example, ABC is the string literal. It is assigned to a character array where each character in the string literal is stored as one array element. Storing a string literal in a character array lets you modify the characters of the array.
    char x[] = "ABC";
String literals are typically stored as arrays of type char (or wchar_t if prefaced with an L), and have static storage duration.
The following declaration declares a character array to hold the string "Hello!":
    char s[] = "Hello!";
The character array s is initialized with the characters specified in the double quotation marks, and terminated with a null character (\0). The null character marks the end of each string, and the compiler automatically appends it to the end of the string literal. Adjacent string literals are automatically concatenated (with a single null character added at the end) to reduce the need for the line continuation character (the backslash at the end of a line).
Normal string literals and wide string literals can be concatenated, in which case the normal strings get promoted to wide strings, and a wide-string result is produced.
Following are some valid string literals:
    ""     /* Here's a string with only the null character */
    "You can have many characters in a string."
    "\"You can mix characters and escape sequences.\"\n"
    "Long lines of text can be continued on the next line \
    by using the backslash character at the end of a line."
    "Or, long lines of text can be continued by using "
    "ANSI's concatenation of adjacent string literals."
    "\'\n"     /* Only escape sequences are in this string */
To determine the length of a given string literal (not including the null character), use the strlen function. See Chapter 9 for more information on other library routines available for string manipulation.