General

Scope

This document describes a set of rules, which are not project-specific, for creating and maintaining documentation or other kinds of materials. This document itself also conforms to these rules.

Notation

Specific formats may be used. The visual output may depend on the method of rendering.

Texts intended to be handled differently (for example, portions of program source code) may be in some specific format. Other texts are considered normal.

Hyperlinks may be used in normal texts pointing to external references, local pages, or anchors in specific documents.

Normal text that is emphasized in general is in a specific format, usually (visually) bold.

Local terms in normal text are emphasized at their first appearance in a specific format, usually (visually) italic.

For terms used globally in this document and any other derivations, see below.

Terms and definitions

Resources

Contents of materials are split into resources (e.g. files) in possibly nested namespaces (e.g. directories). For convenience, a namespace is also considered a resource.

Paths and identifiers

A path is used to identify or locate a resource. It can take various forms (e.g. a filesystem path or a URL).

A path may have several components denoting different levels of namespace or the last level non-namespace resource.

An empty path is a path without any components.

A path with more than one component shall have syntactic separators (e.g. a slash (/) or whitespace) to split the components.

An identifier is a path with exactly one component and without any separators. It can be used to differentiate resources in the same namespace or to collectively name sets of resources in various namespaces.

A resource may be denoted by an identifier or path that is not necessarily unique. However, all resources discussed below in this document are named.
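The definitions above can be illustrated with a minimal sketch, assuming slash-separated paths. The function names `components` and `is_identifier` are hypothetical and serve only this illustration; other separators and path forms (e.g. URLs) would need their own handling.

```python
def components(path: str, sep: str = "/") -> list[str]:
    """Split a path into its components; the empty path has none."""
    return [part for part in path.split(sep) if part]


def is_identifier(path: str, sep: str = "/") -> bool:
    """An identifier is a path with exactly one component and no separator."""
    return sep not in path and len(components(path, sep)) == 1


print(components("doc/en/Readme.md"))
print(is_identifier("Readme.md"), is_identifier("doc/Readme.md"))
```

In this sketch, every component except possibly the last denotes a namespace level, and the empty string stands for the empty path.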

Languages

Rules of natural languages are specified in this subclause. They have effects on normal texts.

Noun phrases in normal text may be followed, at their first occurrence, by embedded translations into different natural languages or by more detailed descriptions, in parentheses (( and )).

Different letter cases (if appropriate) may be used for sentences, acronyms and words in the titles of clauses.

Editions in languages

A set of documentation may be in one (natural) language. In this case, the IETF language tag with at least one subtag, prefixed by an additional dot (.), shall be placed at the end of the identifier of the resource, before the dot and the extension name (if any). Otherwise, the documentation shall be in multiple languages or without text contents (e.g. containing only ideographic images), and no language code shall be in the identifier of the resource.

When the additional dot and tag are removed, all distinct resources with the same resulting name shall refer to the same set of contents, differing only in language; otherwise, at least one of them shall be incomplete, which means it is to be completed as in the former case. Each such resource is one edition of the documentation in the specific language.

Unless explicitly specified, when the meaning is in conflict among multiple editions in different languages, the complete one shall prevail over the others. If there is more than one complete edition, precedence is given in the following order:

  • en-US
  • en
  • zh-CN
  • zh

NOTE The form of these literals conforms to the recommendations for IETF language tags, specifically, the "language" and "region" syntax elements in RFC 5646.

If no edition in the above languages is complete, the documentation is defective.
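The precedence rule can be sketched as a simple lookup. This is only an illustration: the function name `prevailing_edition` and the idea of passing in the tags of the complete editions are assumptions, and tags are compared exactly as written (no case folding).

```python
# Order of precedence as listed above.
PRECEDENCE = ("en-US", "en", "zh-CN", "zh")


def prevailing_edition(complete_tags):
    """Return the tag of the complete edition that prevails.

    Returns None when no edition in the listed languages is complete,
    i.e. when the documentation is defective.
    """
    available = set(complete_tags)
    for tag in PRECEDENCE:
        if tag in available:
            return tag
    return None


print(prevailing_edition(["zh-CN", "en"]))
print(prevailing_edition(["ja"]))
```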

A language tag may be used to annotate one or more words in text. An annotation of such use is a language tag annotation, which consists of a tag combined with one pair of enclosing parentheses (namely, ( and )).

Hyperlinks in pages should preferably link to localized contents corresponding to the language, or one of the major languages, used in the page (if any) when suitable. If the contents of the linked target are in other languages (esp. when there is more than one semantically identical edition in multiple languages), at least one language tag for the majority of the contents should be noted immediately after the hyperlink; otherwise, the tag should be omitted.

For compatibility with client programs, each URI in a link should be encoded in the normalized percent-encoding form specified in RFC 3986.

Additionally, several hyperlinks are normalized to the same form for a specific language. Currently the rule consists of the following cases:

In English

Stylistic usage of letter cases shall be respected in the following precedence:

  1. All uppercase should not be used normally.
  2. Acronyms and other proper nouns (or proper noun phrases) shall be in the appropriate styles.
  3. The title case style shall be used for page or document titles.
  4. Either the title case or the sentence case shall be used in the titles in a page. This shall be consistent within a document.
  5. Either the title case or the sentence case shall be used in the detailed descriptions for acronyms in parentheses. This may vary in the same page.
  6. Detailed descriptions for acronyms in parentheses may use title case or sentence case.
  7. All lowercase style shall be used for words in the embedded translations or detailed descriptions in parentheses in other cases.
  8. Sentence case should be used otherwise.

The wording of English documentation is intended to conform to the ISO/IEC Directives, part 3.

NOTE The use of modal verbs is distinct from that of RFC 2119.

For wording referenced from RFC documents, RFC 2119 is preferred; for compatibility, the case clarification (i.e. RFC 8174) is not necessarily applied to documents published earlier than RFC 8174.

The following grammatical forms of English (with en or en-US tags) are considered idiomatic, and application of such forms may be preferred:

  • answer ellipsis to elide the subject in the summary of commit messages where a question for the topic of the log message is assumed
  • bare passive clause omitting the auxiliary verb for short descriptive notes (e.g. commit messages in repositories and assertion messages in programs)
  • null subject and pronoun dropping in imperative forms
  • zero article for singular form of a countable noun denoting a specialized term being referenced, usually used in a terse-style title or in a list term (like this line)

Informative notes: The tense and mood used in the logs in version control systems are opinion-based. However, the implied rules are chosen here to avoid imperative forms by default, because:

  • First, as in any information processing system, it should be made clear whom the messages in the logs serve.
    • Version control systems are capable of both reading and writing operations on the version history, with asymmetric operational frequency in general.
      • For most stakeholders of a repository in most cases, read-only accesses of the version history are more frequent than changing operations.
      • This is also consistent with the idiomatic pattern used in programming: do not abuse imperative updates with side effects.
    • For most users, commit logs are entries in a journal of the version history.
      • They do not and should not care about imperative changes in the logical perspective.
  • Unconstrained changes in the version history as effectful operations can make messes easily.
    • They are usually only well-behaved enough within some local context (e.g. in a single branch of a reliable instance of the version history).
    • They often cause trouble in other cases (e.g. when stripped out as patches that are possibly reordered).
  • Messages in the logs may need to work together with other instances of the version history.
    • No imperative mood can essentially assume that the changes described will always be applied in exactly the same way.
    • As mentioned above, out-of-order changes make messes. If the messages are precise, they also make messes like other changed contents.
  • In general, messages in the logs work for distributed repositories.
    • There is simply no standpoint for the global view of the universe of the version history by default.
    • Messages should be ready to be audited by random accesses, besides being applied subsequently in some replays.
    • These facts further undermine the necessity of imperative changes.

Format-specific rules

Text files

Unless otherwise specified, all text files should be encoded as UTF-8 with BOM enabled.

Any use of an encoding that may not be converted verbatim and losslessly in binary form to UTF-8 shall be explicitly specified in documentation.

BOM should be omitted for text files dedicated to tools that are not capable of handling it properly. Otherwise, BOM shall be used where possible when it can clarify the encoding being used.

Unless definitely intended and explicitly specified in documentation, newlines shall be consistent. The default newline is CR+LF.

Two consecutive newlines logically indicate an EOF. Consecutive newlines outside verbatim quoted text (including source code) should only be used at EOF.

There are also some default rules on typography implemented by ordinary characters in plain texts:

  • No space characters should be at EOL.
  • For text other than verbatim quoted text, no more than one whitespace character should be used to represent a single indent, unless there is a preferred combination in the language.
    • Rationale By default, no more than one whitespace character should be used to represent an indent, because there should be no chance to insert a character in the middle of an indent.
    • NOTE An example of a preferred exceptional case is the hanging indent (in the first line of a paragraph) in East Asian languages, where a dedicated combination of fullwidth whitespace characters is preferred. Typically, the sequence consists of two ideographic spaces (U+3000).
    • NOTE To keep the semantic rules clear, when possible (in horizontal texts and outside the context of making tables) and when no other form is preferred by the rules of the language, use the horizontal tab character (U+0009) instead of other spaces (i.e. U+0020) to indent.
  • For Western languages, except at the beginning of a line, each word that consists of alphanumeric characters should be separated from other words by a single space character (U+0020).
  • Space characters (U+0020) should be used for alignment when portability is required.
    • Rationale This makes the visual effect easy to predict in the usual settings with monospaced fonts, in contexts like source code of programs.
    • NOTE Other spaces like the non-breaking space (U+00A0) may be better in specific uses, but are not as portable as U+0020.
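Some of the default rules for text files lend themselves to automatic checking. The following is a minimal sketch, not a complete checker: it covers only the UTF-8 BOM, CR+LF consistency and spaces at EOL, and the function name `check_text` and the wording of the messages are hypothetical.

```python
import codecs
import re


def check_text(data: bytes) -> list[str]:
    """Return a list of rule violations found in the raw file contents."""
    issues = []
    if not data.startswith(codecs.BOM_UTF8):
        issues.append("missing UTF-8 BOM")
    try:
        # "utf-8-sig" drops a leading BOM if there is one.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return issues + ["not valid UTF-8"]
    # Every LF must be preceded by CR, and no CR may stand alone.
    crlf = text.count("\r\n")
    if crlf != text.count("\n") or crlf != text.count("\r"):
        issues.append("newlines are not consistently CR+LF")
    for number, line in enumerate(re.split(r"\r\n|\r|\n", text), start=1):
        if line.endswith(" "):
            issues.append(f"line {number}: space at end of line")
    return issues


print(check_text(codecs.BOM_UTF8 + b"first\r\nsecond\r\n"))
print(check_text(b"first \nsecond\r\n"))
```

Files dedicated to tools that cannot handle a BOM would need to be exempted from the first check.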

Markdown

Names of Markdown files should have the .md extension.

Dialects

Unless explicitly specified elsewhere, only common dialects are to be used. Currently this should be GFM (GitHub Flavored Markdown).

NOTE This is not GLFM (GitLab Flavored Markdown), which was also formerly abbreviated as GFM.

If the content may be presented on Bitbucket wiki, stricter rules apply, notably:

NOTE This repository is not currently intended to be deployed on Bitbucket wiki. The stricter rules on Bitbucket wiki above are not applicable here.

Syntactic restrictions

As text files, markdown files shall obey the same rules above. The indentation rule is necessary to avoid some compatibility issues, e.g. this.

As specified, reserved characters defined by RFC 3986 should be percent-encoded. Notably, the parentheses (( and )) in hyperlinks shall be encoded to make them more fault-tolerant in some editors.
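A minimal sketch of encoding a link target before placing it in a Markdown hyperlink, using `urllib.parse.quote` from the Python standard library. The helper names and the example URL are hypothetical. As an assumption of this sketch, reserved characters that act as URI delimiters (such as `:` and `/`) are kept, since the sketch cannot tell delimiters from data, while the parentheses are always encoded; `%` is kept so that already encoded input is not encoded twice.

```python
from urllib.parse import quote

# Characters left untouched; "(" and ")" are deliberately not included.
_KEEP = ":/?#[]@!$&'*+,;=%"


def encode_link_target(uri: str) -> str:
    """Percent-encode a URI for use as a Markdown link target."""
    return quote(uri, safe=_KEEP)


def markdown_link(text: str, uri: str) -> str:
    """Build an inline Markdown hyperlink with an encoded target."""
    return f"[{text}]({encode_link_target(uri)})"


print(markdown_link("Path", "https://example.com/wiki/Path_(computing)"))
```

With the parentheses encoded, the closing parenthesis of the Markdown link syntax is the only one left in the link, which avoids ambiguity for editors and renderers.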

Headers should be prefixed by #s.

No redundant characters (esp. whitespace characters) should be placed between the annotated words and the annotation, even if there is whitespace in the words. The annotation in this rule includes any language tag annotation defined in the previous subclause.

Rationale This is for the sake of compact annotation representations.

NOTE The whitespace rules for the language annotation are also applicable. Alternatively, a word combination may be used instead of the annotation when grammatically correct; in that case this rule does not apply.

Cross references

This document is used by the YSLib project. It may also be referenced by other repositories.

Except for the following list, do not edit unless ultimately necessary.

Known referenced by:

Annex (Informative)

Alternative incompatible rules

Usually the rules of documentation here are compatible with other rules in various specifications. However, some well-known rules are considered overspecified (albeit not rigorous) and of insufficient quality as specifications. Thus, the rules here are deliberately kept incompatible with them, and such rules are never accepted here:

  • The specification may be too vague because it fails to separate the conformance rules from the suggestions, so it is difficult to manually verify conformance just from the specification text.
    • Some confusion may come from the lack of rules on modal verbs.
    • Some rules may be underspecified for external resources. For example, the claim of "be a valid Markdown file" is unclear without further notes, because there is no unique standard to determine the definition of "valid", since there are multiple dialects of the Markdown language and no flavor is definitely more representative than others.
  • The rules of mandated letter cases (in particular, capitalization) may be too restrictive.
    • This may be generally too subjective. It can be good to stick to a well-known name to ease use cases with technical merits (such as machine verification), but fixing the letter case of the spelling may be overspecification.
      • The exception is when the name is standardized and machine-oriented by default.
      • As a notable instance, RFC 5646 recommends but does not mandate the capitalization for the codes from ISO 639-1 and ISO 3166-1, while the preferred capitalization diverges in the 2 standards.
    • On the other hand, mandating a name like "README" instead of "Readme" is too restrictive. Such a name is problematic to transfer between case-insensitive environments and case-sensitive environments (e.g. names in filesystems), where one environment may allow entries of README and Readme to coexist but another may not.
      • When it is technically feasible to have different cases coexisting, "README" and "Readme" are symmetric, i.e. neither is definitely preferred over the other for machines. It is then not intuitive to reason why "README" must be preferred to "Readme" instead of the exact opposite in the specification, in particular given that such an entry is mainly created for human readers rather than machines.
      • Instead, keeping an overridable, recommended default form of spelling (which need not be all capitalized) is better for both portability and other needs.
      • Further, names like "README.md" are less consistent than "README.MD". The latter is at least required in some ancient systems that do not support lowercase, and is hence even more preferable for portability (in extreme cases).
  • Prioritizing non-regional subtags for languages should not be recommended normally, because this is less accurate, and the confusion may even be offensive to specific cultures, since there may be no consensus that one subtag can override another without changing the meaning of the text (which is not the case for the relationship between tags and subtags).
  • It should be acknowledged that validation of hyperlinks is not always possible when the linked resource is out of the control of the document (i.e. external).
    • In any case, there is no persistence guarantee for most hyperlinks on the Web.
    • Unconditionally mandating the state of the resource referenced by a hyperlink makes any verification result one-time, because external links may break immediately after the verification. Conformance then becomes non-deterministic.
    • Such a mandate is applicable only to hyperlinks provable to be persistent. But this is infeasible with automatic methods, at least for external links on the Web, because a test of persistence may be unreliable until the link is broken.
    • So, unless external links are disallowed (which seems overkill), rules with impractical assumptions about the validation process should not be in the specification.

Examples of most of the bullets above can be found in the specification of standard-readme.