From 9e8afc68e554c5c45b8e9f387d4fcf4ac0174a6a Mon Sep 17 00:00:00 2001 From: Ryan Hendrickson Date: Wed, 31 Jul 2024 19:07:57 -0400 Subject: [PATCH 1/2] docs: add language/string-literals.md --- doc/manual/redirects.js | 1 + doc/manual/src/SUMMARY.md.in | 1 + doc/manual/src/language/identifiers.md | 2 +- .../src/language/string-interpolation.md | 4 + doc/manual/src/language/string-literals.md | 189 ++++++++++++++++++ doc/manual/src/language/syntax.md | 172 +--------------- doc/manual/src/language/types.md | 2 +- 7 files changed, 199 insertions(+), 172 deletions(-) create mode 100644 doc/manual/src/language/string-literals.md diff --git a/doc/manual/redirects.js b/doc/manual/redirects.js index beef6ef4a..cb8cd18fa 100644 --- a/doc/manual/redirects.js +++ b/doc/manual/redirects.js @@ -344,6 +344,7 @@ const redirects = { }, "language/syntax.html": { "scoping-rules": "scoping.html", + "string-literal": "string-literals.html", }, "installation/installing-binary.html": { "linux": "uninstall.html#linux", diff --git a/doc/manual/src/SUMMARY.md.in b/doc/manual/src/SUMMARY.md.in index 7661f5f62..eef7d189c 100644 --- a/doc/manual/src/SUMMARY.md.in +++ b/doc/manual/src/SUMMARY.md.in @@ -29,6 +29,7 @@ - [String context](language/string-context.md) - [Syntax and semantics](language/syntax.md) - [Variables](language/variables.md) + - [String literals](language/string-literals.md) - [Identifiers](language/identifiers.md) - [Scoping rules](language/scope.md) - [String interpolation](language/string-interpolation.md) diff --git a/doc/manual/src/language/identifiers.md b/doc/manual/src/language/identifiers.md index bd58a9b36..a4153b588 100644 --- a/doc/manual/src/language/identifiers.md +++ b/doc/manual/src/language/identifiers.md @@ -16,7 +16,7 @@ An *identifier* is an [ASCII](https://en.wikipedia.org/wiki/ASCII) character seq # Names -A name can be an [identifier](#identifier) or a [string literal](./syntax.md#string-literal). +A name can be an [identifier](#identifier) or a [string literal](string-literals.md). > **Syntax** > diff --git a/doc/manual/src/language/string-interpolation.md b/doc/manual/src/language/string-interpolation.md index 1778bdfa0..27780dcbb 100644 --- a/doc/manual/src/language/string-interpolation.md +++ b/doc/manual/src/language/string-interpolation.md @@ -8,6 +8,10 @@ Such a construct is called *interpolated string*, and the expression inside is a [path]: ./types.md#type-path [attribute set]: ./types.md#attribute-set +> **Syntax** +> +> *interpolation_element* → `${` *expression* `}` + ## Examples ### String diff --git a/doc/manual/src/language/string-literals.md b/doc/manual/src/language/string-literals.md new file mode 100644 index 000000000..7605e8c61 --- /dev/null +++ b/doc/manual/src/language/string-literals.md @@ -0,0 +1,189 @@ +# String literals + +A *string literal* represents a [string](types.md#type-string) value. + +> **Syntax** +> +> *expression* → *string* +> +> *string* → `"` ( *string_char*\* [*interpolation_element*][string interpolation] )* *string_char*\* `"` +> +> *string* → `''` ( *indented_string_char*\* [*interpolation_element*][string interpolation] )* *indented_string_char*\* `''` +> +> *string* → *uri* +> +> *string_char* ~ `[^"$\\]|\$(?!\{)|\\.` +> +> *indented_string_char* ~ `[^$']|\$\$|\$(?!\{)|''[$']|''\\.|'(?!')` +> +> *uri* ~ `[A-Za-z][+\-.0-9A-Za-z]*:[!$%&'*+,\-./0-9:=?@A-Z_a-z~]+` + +Strings can be written in three ways. + +The most common way is to enclose the string between double quotes, e.g., `"foo bar"`. +Strings can span multiple lines. +The results of other expressions can be included into a string by enclosing them in `${ }`, a feature known as [string interpolation]. + +[string interpolation]: ./string-interpolation.md + +The following must be escaped to represent them within a string, by prefixing with a backslash (`\`): + +- Double quote (`"`) + +> **Example** +> +> ```nix +> "\"" +> ``` +> +> "\"" + +- Backslash (`\`) + +> **Example** +> +> ```nix +> "\\" +> ``` +> +> "\\" + +- Dollar sign followed by an opening curly bracket (`${`) – "dollar-curly" + +> **Example** +> +> ```nix +> "\${" +> ``` +> +> "\${" + +The newline, carriage return, and tab characters can be written as `\n`, `\r` and `\t`, respectively. + +A "double-dollar-curly" (`$${`) can be written literally. + +> **Example** +> +> ```nix +> "$${" +> ``` +> +> "$\${" + +String values are output on the terminal with Nix-specific escaping. +Strings written to files will contain the characters encoded by the escaping. + +The second way to write string literals is as an *indented string*, which is enclosed between pairs of *double single-quotes* (`''`), like so: + +```nix +'' +This is the first line. +This is the second line. + This is the third line. +'' +``` + +This kind of string literal intelligently strips indentation from +the start of each line. To be precise, it strips from each line a +number of spaces equal to the minimal indentation of the string as a +whole (disregarding the indentation of empty lines). For instance, +the first and second line are indented two spaces, while the third +line is indented four spaces. Thus, two spaces are stripped from +each line, so the resulting string is + +```nix +"This is the first line.\nThis is the second line.\n This is the third line.\n" +``` + +> **Note** +> +> Whitespace and newline following the opening `''` is ignored if there is no non-whitespace text on the initial line. + +> **Warning** +> +> Prefixed tab characters are not stripped. +> +> > **Example** +> > +> > The following indented string is prefixed with tabs: +> > +> > '' +> > all: +> > @echo hello +> > '' +> > +> > "\tall:\n\t\t@echo hello\n" + +Indented strings support [string interpolation]. + +The following must be escaped to represent them in an indented string: + +- `$` is escaped by prefixing it with two single quotes (`''`) + +> **Example** +> +> ```nix +> '' +> ''$ +> '' +> ``` +> +> "$\n" + +- `''` is escaped by prefixing it with one single quote (`'`) + +> **Example** +> +> ```nix +> '' +> ''' +> '' +> ``` +> +> "''\n" + +These special characters are escaped as follows: +- Linefeed (`\n`): `''\n` +- Carriage return (`\r`): `''\r` +- Tab (`\t`): `''\t` + +`''\` escapes any other character. + +A "double-dollar-curly" (`$${`) can be written literally. + +> **Example** +> +> ```nix +> '' +> $${ +> '' +> ``` +> +> "$\${\n" + +Indented strings are primarily useful in that they allow multi-line +string literals to follow the indentation of the enclosing Nix +expression, and that less escaping is typically necessary for +strings representing languages such as shell scripts and +configuration files because `''` is much less common than `"`. +Example: + +```nix +stdenv.mkDerivation { +... +postInstall = + '' + mkdir $out/bin $out/etc + cp foo $out/bin + echo "Hello World" > $out/etc/foo.conf + ${if enableBar then "cp bar $out/bin" else ""} + ''; +... +} +``` + +Finally, as a convenience, *URIs* as defined in appendix B of +[RFC 2396](http://www.ietf.org/rfc/rfc2396.txt) can be written *as +is*, without quotes. For instance, the string +`"http://example.org/foo.tar.bz2"` can also be written as +`http://example.org/foo.tar.bz2`. diff --git a/doc/manual/src/language/syntax.md b/doc/manual/src/language/syntax.md index 6108bacd6..daf073aef 100644 --- a/doc/manual/src/language/syntax.md +++ b/doc/manual/src/language/syntax.md @@ -6,175 +6,7 @@ This section covers syntax and semantics of the Nix language. ### String {#string-literal} - *Strings* can be written in three ways. - - The most common way is to enclose the string between double quotes, e.g., `"foo bar"`. - Strings can span multiple lines. - The results of other expressions can be included into a string by enclosing them in `${ }`, a feature known as [string interpolation]. - - [string interpolation]: ./string-interpolation.md - - The following must be escaped to represent them within a string, by prefixing with a backslash (`\`): - - - Double quote (`"`) - - > **Example** - > - > ```nix - > "\"" - > ``` - > - > "\"" - - - Backslash (`\`) - - > **Example** - > - > ```nix - > "\\" - > ``` - > - > "\\" - - - Dollar sign followed by an opening curly bracket (`${`) – "dollar-curly" - - > **Example** - > - > ```nix - > "\${" - > ``` - > - > "\${" - - The newline, carriage return, and tab characters can be written as `\n`, `\r` and `\t`, respectively. - - A "double-dollar-curly" (`$${`) can be written literally. - - > **Example** - > - > ```nix - > "$${" - > ``` - > - > "$\${" - - String values are output on the terminal with Nix-specific escaping. - Strings written to files will contain the characters encoded by the escaping. - - The second way to write string literals is as an *indented string*, which is enclosed between pairs of *double single-quotes* (`''`), like so: - - ```nix - '' - This is the first line. - This is the second line. - This is the third line. - '' - ``` - - This kind of string literal intelligently strips indentation from - the start of each line. To be precise, it strips from each line a - number of spaces equal to the minimal indentation of the string as a - whole (disregarding the indentation of empty lines). For instance, - the first and second line are indented two spaces, while the third - line is indented four spaces. Thus, two spaces are stripped from - each line, so the resulting string is - - ```nix - "This is the first line.\nThis is the second line.\n This is the third line.\n" - ``` - - > **Note** - > - > Whitespace and newline following the opening `''` is ignored if there is no non-whitespace text on the initial line. - - > **Warning** - > - > Prefixed tab characters are not stripped. - > - > > **Example** - > > - > > The following indented string is prefixed with tabs: - > > - > > '' - > > all: - > > @echo hello - > > '' - > > - > > "\tall:\n\t\t@echo hello\n" - - Indented strings support [string interpolation]. - - The following must be escaped to represent them in an indented string: - - - `$` is escaped by prefixing it with two single quotes (`''`) - - > **Example** - > - > ```nix - > '' - > ''$ - > '' - > ``` - > - > "$\n" - - - `''` is escaped by prefixing it with one single quote (`'`) - - > **Example** - > - > ```nix - > '' - > ''' - > '' - > ``` - > - > "''\n" - - These special characters are escaped as follows: - - Linefeed (`\n`): `''\n` - - Carriage return (`\r`): `''\r` - - Tab (`\t`): `''\t` - - `''\` escapes any other character. - - A "double-dollar-curly" (`$${`) can be written literally. - - > **Example** - > - > ```nix - > '' - > $${ - > '' - > ``` - > - > "$\${\n" - - Indented strings are primarily useful in that they allow multi-line - string literals to follow the indentation of the enclosing Nix - expression, and that less escaping is typically necessary for - strings representing languages such as shell scripts and - configuration files because `''` is much less common than `"`. - Example: - - ```nix - stdenv.mkDerivation { - ... - postInstall = - '' - mkdir $out/bin $out/etc - cp foo $out/bin - echo "Hello World" > $out/etc/foo.conf - ${if enableBar then "cp bar $out/bin" else ""} - ''; - ... - } - ``` - - Finally, as a convenience, *URIs* as defined in appendix B of - [RFC 2396](http://www.ietf.org/rfc/rfc2396.txt) can be written *as - is*, without quotes. For instance, the string - `"http://example.org/foo.tar.bz2"` can also be written as - `http://example.org/foo.tar.bz2`. +See [String literals](string-literals.md). ### Number {#number-literal} @@ -253,7 +85,7 @@ Attribute sets are written enclosed in curly brackets (`{ }`). Attribute names and attribute values are separated by an equal sign (`=`). Each value can be an arbitrary expression, terminated by a semicolon (`;`) -An attribute name is a string without context, and is denoted by a [name] (an [identifier](./identifiers.md#identifiers) or [string literal](#string-literal)). +An attribute name is a string without context, and is denoted by a [name] (an [identifier](./identifiers.md#identifiers) or [string literal](string-literals.md)). [name]: ./identifiers.md#names diff --git a/doc/manual/src/language/types.md b/doc/manual/src/language/types.md index 229756e6b..82184a8b0 100644 --- a/doc/manual/src/language/types.md +++ b/doc/manual/src/language/types.md @@ -45,7 +45,7 @@ The function [`builtins.isBool`](builtins.md#builtins-isBool) can be used to det A _string_ in the Nix language is an immutable, finite-length sequence of bytes, along with a [string context](string-context.md). Nix does not assume or support working natively with character encodings. -String values without string context can be expressed as [string literals](syntax.md#string-literal). +String values without string context can be expressed as [string literals](string-literals.md). The function [`builtins.isString`](builtins.md#builtins-isString) can be used to determine if a value is a string. ### Path {#type-path} From 17318bc70dd0a3712d0a2c94a180dd9f614c97b0 Mon Sep 17 00:00:00 2001 From: Ryan Hendrickson Date: Wed, 31 Jul 2024 19:22:17 -0400 Subject: [PATCH 2/2] docs: fix string literal example formatting --- doc/manual/src/language/string-literals.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/doc/manual/src/language/string-literals.md b/doc/manual/src/language/string-literals.md index 7605e8c61..8f4b75f3e 100644 --- a/doc/manual/src/language/string-literals.md +++ b/doc/manual/src/language/string-literals.md @@ -36,7 +36,7 @@ The following must be escaped to represent them within a string, by prefixing wi > "\"" > ``` > -> "\"" +> "\"" - Backslash (`\`) @@ -46,7 +46,7 @@ The following must be escaped to represent them within a string, by prefixing wi > "\\" > ``` > -> "\\" +> "\\" - Dollar sign followed by an opening curly bracket (`${`) – "dollar-curly" @@ -56,7 +56,7 @@ The following must be escaped to represent them within a string, by prefixing wi > "\${" > ``` > -> "\${" +> "\${" The newline, carriage return, and tab characters can be written as `\n`, `\r` and `\t`, respectively. @@ -68,7 +68,7 @@ A "double-dollar-curly" (`$${`) can be written literally. > "$${" > ``` > -> "$\${" +> "$\${" String values are output on the terminal with Nix-specific escaping. Strings written to files will contain the characters encoded by the escaping. @@ -107,10 +107,11 @@ each line, so the resulting string is > > > > The following indented string is prefixed with tabs: > > -> > '' +> >
''
 > > 	all:
 > > 		@echo hello
 > > ''
+> > 
> > > > "\tall:\n\t\t@echo hello\n"