What's the point in adding a new line to the end of a file?
Some compilers (especially C or C++ ones) give you warnings about:
No new line at end of file
I thought this would be a C-programmers-only problem, but github displays a message in the commit view:
No newline at end of file
for a PHP file.
I understand the preprocessor thing explained in this thread, but what has this to do with PHP? Is it the same `include()` thing, or is it related to the `\r\n` vs `\n` topic?
What is the point in having a new line at the end of a file?
The `No newline at end of file` message you get from GitHub appears at the end of a patch (in `diff` format, see the note at the end of the “Unified Format” section).
Compilers don’t care whether there is a newline or not at the end of a file, but `git` (and the `diff`/`patch` utilities) have to take it into account. There are many reasons for that. For example, forgetting to add or remove a newline at the end of a file would change its checksum (`md5sum`/`sha1sum`). Also, files are not always programs, and a final `\n` might make a difference.
Note: As for the warning from C compilers, I guess they insist on a final newline for backward compatibility. Very old compilers might not accept the last line if it doesn’t end with `\n` (or another system-dependent end-of-line character sequence).
There are two aspects:
- There are/were some C compilers that cannot parse the last line if it does not end with a newline. The C standard specifies that a C file should end with a newline (C11, 5.1.1.2, 2.) and that a last line without a newline yields undefined behavior (C11, J.2, 2nd item), perhaps for historical reasons, because some vendor of such a compiler was part of the committee when the first standard was written. Hence the warning by GCC.
- `diff` programs (such as those used by `git diff`, GitHub, etc.) show line-by-line differences between files. They usually print a message when only one file ends with a newline, because otherwise you would not see this difference. For example, if the only difference between two files is the presence of the last newline character, then without the hint it would look as if both files were the same, even though `diff` and `cmp` return a non-zero exit status and the checksums of the files (e.g. via `md5sum`) don’t match.
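To make the second point concrete, here is a minimal sketch using two throwaway files that differ only in the final newline. The file names and the temporary directory are made up for illustration; the commands themselves (`cmp`, `diff`) are the standard utilities mentioned above.

```shell
# Force the C locale so diff's hint is printed in English.
LC_ALL=C
export LC_ALL

tmp=$(mktemp -d)
printf 'one\ntwo\n' > "$tmp/with_nl"      # ends with a newline
printf 'one\ntwo'   > "$tmp/without_nl"   # same text, no final newline

# cmp exits non-zero because the files differ by one byte.
cmp -s "$tmp/with_nl" "$tmp/without_nl" && cmp_status=0 || cmp_status=$?

# diff exits non-zero as well; its output carries the explanatory hint.
diff_status=0
diff_output=$(diff "$tmp/with_nl" "$tmp/without_nl") || diff_status=$?

printf 'cmp status: %s\ndiff status: %s\n%s\n' \
    "$cmp_status" "$diff_status" "$diff_output"

rm -rf "$tmp"
```

Both lines `two` look identical in the diff output; only the trailing `\ No newline at end of file` line tells you why the files are reported as different.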
It’s not about adding an extra newline at the end of a file, it’s about not removing the newline that should be there.
A text file, under unix, consists of a series of lines, each of which ends with a newline character (`\n`). A file that is not empty and does not end with a newline is therefore not a text file.
Utilities that are supposed to operate on text files may not cope well with files that don’t end with a newline; historical Unix utilities might ignore the text after the last newline, for example. GNU utilities have a policy of behaving decently with non-text files, and so do most other modern utilities, but you may still encounter odd behavior with files that are missing a final newline¹.
With GNU diff, if one of the files being compared ends with a newline but not the other, it is careful to note that fact. Since diff is line-oriented, it can’t indicate this by storing a newline for one of the files but not for the other: the newlines are necessary to indicate where each line in the diff file starts and ends. So diff uses the special text `\ No newline at end of file` to differentiate a file that didn’t end in a newline from a file that did.
By the way, in a C context, a source file similarly consists of a series of lines. More precisely, a translation unit is viewed in an implementation-defined way as a series of lines, each of which must end with a newline character (n1256 §5.1.1.1). On unix systems, the mapping is straightforward. On DOS and Windows, each CR LF sequence (`\r\n`) is mapped to a newline (`\n`; this is what always happens when reading a file opened as text on these OSes). There are a few OSes out there which don’t have a newline character, but instead have fixed- or variable-sized records; on these systems, the mapping from files to C source introduces a `\n` at the end of each record. While this isn’t directly relevant to unix, it does mean that if you copy a C source file that’s missing its final newline to a system with record-based text files, then copy it back, you’ll either end up with the incomplete last line truncated in the initial conversion, or an extra newline tacked onto it during the reverse conversion.
¹ Example: the output of GNU `sort` on non-empty files always ends with a newline. So if the file `foo` is missing its final newline, you’ll find that `sort foo | wc -c` reports one more byte than `cat foo | wc -c`. The `read` builtin of `sh` is required to return false if the end-of-file is reached before the end of the line is reached, so you’ll find that loops such as `while IFS= read -r line; do ...; done` skip an unterminated line altogether.
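Both effects from the footnote can be reproduced with a small sketch. The two-line content of `foo` and the temporary directory are assumptions made for the demonstration, and the byte counts in the comments assume a `sort` that terminates its output with a newline, as GNU sort does.

```shell
tmp=$(mktemp -d)
printf 'b\na' > "$tmp/foo"     # 3 bytes: "b", newline, "a" (no final newline)

# tr strips the padding that some wc implementations add.
raw_bytes=$(cat "$tmp/foo" | wc -c | tr -d ' ')      # 3
sorted_bytes=$(sort "$tmp/foo" | wc -c | tr -d ' ')  # 4: sort adds the newline

# read returns false on the unterminated last line, so the loop body
# runs for "b" only and never processes "a".
seen=0
while IFS= read -r line; do
    seen=$((seen + 1))
done < "$tmp/foo"

echo "cat: $raw_bytes bytes, sort: $sorted_bytes bytes, lines seen by loop: $seen"

rm -rf "$tmp"
```

A common workaround for the loop is `while IFS= read -r line || [ -n "$line" ]; do ...; done`, which also processes a final unterminated line.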
Not necessarily the reason, but a practical consequence of files not ending with a new line:
Consider what would happen if you wanted to process several files using `cat`. For instance, if you wanted to find the word `foo` at the start of a line across 3 files:
cat file1 file2 file3 | grep -e '^foo'
If the first line in file3 starts with `foo`, but file2 does not have a final `\n` after its last line, this occurrence would not be found by grep, because the last line in file2 and the first line in file3 would be seen by grep as a single line.
So, for consistency and to avoid surprises, I try to keep my files always ending with a newline.
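The scenario above can be tried out with three small files. Their contents here are invented for the demonstration; only the `cat ... | grep -e '^foo'` pipeline comes from the example itself.

```shell
tmp=$(mktemp -d)
printf 'bar\n'      > "$tmp/file1"
printf 'baz'        > "$tmp/file2"   # no final newline
printf 'foo here\n' > "$tmp/file3"

# file2's last line and file3's first line are glued into "bazfoo here",
# so no line starts with "foo". grep -c exits non-zero on zero matches,
# hence the "|| true".
glued=$(cat "$tmp/file1" "$tmp/file2" "$tmp/file3" | grep -c -e '^foo') || true

# Same files, but file2 now ends with a newline.
printf 'baz\n' > "$tmp/file2"
separate=$(cat "$tmp/file1" "$tmp/file2" "$tmp/file3" | grep -c -e '^foo') || true

echo "matches without final newline in file2: $glued"
echo "matches with final newline in file2:    $separate"

rm -rf "$tmp"
```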
There is also the point of keeping diff history. If a file ends without a newline character, then adding anything to the end of the file will be viewed by diff utilities as changing that last line (because `\n` is being added to it).
This could cause unwanted results with commands such as `git blame` and `hg annotate`.
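As a rough illustration of the diff-history point, the sketch below appends one line to a file with and without a final newline and counts how many existing lines a unified diff reports as removed. The file names and contents are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'alpha\nbeta'          > "$tmp/old_no_nl"   # last line unterminated
printf 'alpha\nbeta\n'        > "$tmp/old_nl"      # properly terminated
printf 'alpha\nbeta\ngamma\n' > "$tmp/new"         # "gamma" appended

# Count "-" lines in the unified diff, skipping the "--- file" header.
# grep -c exits non-zero when the count is 0, hence the "|| true".
removed_no_nl=$(diff -u "$tmp/old_no_nl" "$tmp/new" | grep -c '^-[^-]') || true
removed_nl=$(diff -u "$tmp/old_nl" "$tmp/new" | grep -c '^-[^-]') || true

echo "existing lines touched when old file lacked a final newline: $removed_no_nl"
echo "existing lines touched when old file ended with a newline:   $removed_nl"

rm -rf "$tmp"
```

In the first case the `beta` line is reported as removed and re-added, which is what makes annotation tools attribute it to the later change.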
POSIX is a set of standards specified by the IEEE to maintain compatibility between operating systems.
One of its definitions is that of a “line”: a sequence of zero or more non-<newline> characters plus a terminating <newline> character.
So for that last line to be recognised as an actual “line” it should have a terminating new line character.
This is important if you depend on OS tools to, say, count lines or split and parse your file. Given that PHP is a scripting language, it’s entirely possible (I have no idea; I’m only speculating) that, especially in its early days or even now, it had OS dependencies like that.
In reality, most operating systems are not fully POSIX compliant, and humans are not that machine-like, nor do they care much about terminating newlines. So in practice it’s a smorgasbord: some tools care about it, some warn, and some just decide that the last bit of text really is a line and include it.
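The line-count case is easy to check with `wc -l`, which counts newline characters rather than visible lines. A minimal sketch, with the two-line input chosen arbitrarily:

```shell
# Two visible lines, terminated properly: two newline characters.
with_nl=$(printf 'a\nb\n' | wc -l | tr -d ' ')

# Two visible lines, last one unterminated: only one newline character.
without_nl=$(printf 'a\nb' | wc -l | tr -d ' ')

echo "wc -l with final newline:    $with_nl"
echo "wc -l without final newline: $without_nl"
```

The unterminated `b` is not counted, because under the definition above it is not a complete line.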
Here is an additional reason.
Say you have a file `file.txt` containing a list of names, with one name per line (or consider a file such as a .gitignore file).
To add a new entry, it makes sense to call something like:
echo "John" >> file.txt
and you expect this to always work, right?
Well, it will actually only work if your file is empty or ends with a newline.
Otherwise, say your file contains:
Alice<no_newline>
then when you run:
echo "John" >> file.txt
you will end up with:
AliceJohn
<new_line_here>
which is definitely not what you expected.
So having all text files terminated with an EOL makes life much easier.
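If you cannot be sure that a file ends with a newline, one defensive option is to check its last byte before appending. The helper below is a hypothetical sketch (the name `append_line` is made up, not a standard command), and it assumes the file does not end in a NUL byte.

```shell
# append_line FILE TEXT (hypothetical helper)
# Appends TEXT as its own line, first terminating an unterminated last line.
append_line() {
    file=$1
    text=$2
    # $(tail -c 1 ...) is empty when the last byte is a newline, because
    # command substitution strips trailing newlines.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$text" >> "$file"
}

tmp=$(mktemp -d)
printf 'Alice' > "$tmp/file.txt"        # no final newline, as in the example
append_line "$tmp/file.txt" "John"
result=$(cat "$tmp/file.txt")
line_count=$(wc -l < "$tmp/file.txt" | tr -d ' ')
printf '%s\n' "$result"
rm -rf "$tmp"
```

With this guard, `Alice` and `John` end up on separate lines, and the file itself ends with a newline afterwards.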