
We brought this day closer as best we could: Notepad in Windows 10 now understands Unix line feeds

Notepad in Windows 10 now understands Unix line endings, not just the Windows format.

For decades, anyone who tried to open text documents prepared on other operating systems in Windows ran into the problem of “porridge” instead of readable text. Now that is changing all at once. The change is as small as it is epic in its practical results and ideological consequences: Microsoft is once again trying its hand at cross-platform integration and support for open standards.

For many years, Windows Notepad could correctly display only text documents whose lines ended with the Windows end-of-line (EOL) sequence: a carriage return (CR) followed by a line feed (LF). As a result, Notepad could not correctly display text files created on Unix, Linux and macOS, where a single LF character marks the end of a line.

For example, here is a screenshot of Notepad trying to display the contents of a Linux .bashrc text file, which contains only Unix LF EOL characters:

And here is a screenshot of the recently updated Notepad displaying the contents of the same UNIX/Linux .bashrc file, this time with correct line breaks:

Note that the status bar shows the detected EOL format of the currently open file.
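How such detection might work can be sketched in a few lines of Python. This is only an illustration, not Notepad's actual algorithm: the function name `detect_eol`, the labels it returns and the "most frequent style wins" rule are all assumptions made for the example.

```python
def detect_eol(data: bytes) -> str:
    """Guess the dominant line-ending style of raw file contents.

    Hypothetical helper: the labels and the majority rule are assumptions,
    not a description of what Notepad really does.
    """
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF characters not preceded by CR
    cr = data.count(b"\r") - crlf   # CR characters not followed by LF
    counts = {"Windows (CRLF)": crlf, "Unix (LF)": lf, "Macintosh (CR)": cr}
    label, n = max(counts.items(), key=lambda kv: kv[1])
    return label if n > 0 else "none"


print(detect_eol(b"export PATH=$PATH:~/bin\nalias ll='ls -l'\n"))
```

For a file read with `open(path, "rb").read()`, the same function can be applied to the returned bytes.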

In addition, for flexible control of the new behavior, two additional values have been introduced under the registry key [HKEY_CURRENT_USER\Software\Microsoft\Notepad].

In the heat of passion it stirs, the dispute over how to start a new line in electronic documents is comparable to the dispute over spaces versus tabs in program source code. This confrontation “over the line” had many causes, some lying in old standards and traditions, others rooted in the design of typewriters and teletypes. Just as important was the desire of some programmers to execute (interpret) commands and control characters literally, and of others to follow common sense.

What can we learn about the problem from Wikipedia?

Historically, mechanical typewriters had a lever that returned the carriage to the left edge of the page and rotated the platen, advancing the paper by one line. On teletypes and later line printers a print head took the place of the carriage, and in laser printers it ceased to be a physical part at all, but the term “carriage return” kept the word “carriage” unchanged. On teletypes the carriage return and the line feed were separate operations, which is where the tradition of representing a line break as CR + LF in text files comes from.

Systems based on ASCII or a compatible character set use either LF (line feed, 0x0A) or CR (carriage return, 0x0D) on its own, or the sequence CR + LF. The names come from printer commands: line feed means that the paper should be advanced by one line, and carriage return means that the print carriage should return to the beginning of the current line.

  • CR (ASCII 0x0D) was used on 8-bit Commodore machines, TRS-80 machines, the Apple II, Mac OS up to version 9, and OS-9;
  • LF (ASCII 0x0A) is used in Multics, UNIX and UNIX-like operating systems (GNU/Linux, AIX, Xenix, Mac OS X, FreeBSD), BeOS, Amiga UNIX, RISC OS and others;
  • CR + LF (ASCII 0x0D 0x0A) is used in DEC RT-11 and most other early non-UNIX, non-IBM systems, as well as in CP/M, MP/M, MS-DOS, OS/2, Microsoft Windows, Symbian OS and Internet protocols.
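The three conventions listed above differ only in which bytes end a line, so converting between them is mechanical. The following is a minimal sketch; the `EOL` table, its key names and the function `convert_eol` are made up for this example.

```python
import re

# Hypothetical lookup table: one entry per convention from the list above.
EOL = {
    "classic_mac": "\r",   # CR,    0x0D
    "unix": "\n",          # LF,    0x0A
    "windows": "\r\n",     # CR+LF, 0x0D 0x0A
}


def convert_eol(text: str, target: str) -> str:
    """Rewrite every line ending in text to the target convention."""
    # CR+LF is tried first so that the pair is replaced once, not twice.
    return re.sub(r"\r\n|\r|\n", EOL[target], text)


mixed = "one\ntwo\r\nthree\rfour"
print(repr(convert_eol(mixed, "windows")))
```

When reading real files, it is worth opening them with `newline=""` (or in binary mode) so that Python does not translate the line endings before the function sees them.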

According to the standard, any Unicode-conforming application should treat each of the following characters as a line break:

  • LF (U+000A): line feed;
  • CR (U+000D): carriage return;
  • NEL (U+0085): next line;
  • LS (U+2028): line separator;
  • PS (U+2029): paragraph separator.

Moreover, the sequence CR + LF (U+000D U+000A) should be treated as one line break, not two.
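As a rough illustration of that rule, here is a small Python sketch that splits text at exactly the five characters listed above and treats CR + LF as a single break. The name `split_lines` is hypothetical. Python's built-in `str.splitlines()` behaves similarly but also splits at several other control characters, which is why a regular expression is used here.

```python
import re

# CR+LF comes first in the alternation so the pair is consumed as one break.
_LINE_BREAK = re.compile("\r\n|[\n\r\u0085\u2028\u2029]")


def split_lines(text: str) -> list:
    """Split text at LF, CR, NEL, LS, PS, counting CR+LF as one break."""
    return _LINE_BREAK.split(text)


print(split_lines("alpha\r\nbeta\ngamma\u2028delta"))
# ['alpha', 'beta', 'gamma', 'delta']
```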

But, as we know, standards are one thing and implementations often all turn out differently. The need to correctly display legacy documents created before the Unicode era adds fuel to the fire. For a long time, the lack of a single generally accepted representation of line breaks across operating systems complicated the exchange of text data between them.

Unicode tries to reconcile this difference by treating CR, LF and CR + LF as equivalent, but in doing so it conflicts with the ASCII it inherits when interpreting an LF + CR sequence that is not preceded by a CR: according to ASCII this is one line break, and according to Unicode it is two.
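The discrepancy is easy to see by counting breaks under two different rules. In the sketch below, `UNICODE_RULE` follows the equivalence described above, while `LEGACY_RULE` is a hypothetical reader, invented for this example, that additionally accepts LF + CR as one break.

```python
import re

# CR+LF is one break; any other LF or CR is a break of its own.
UNICODE_RULE = re.compile("\r\n|[\n\r]")
# Hypothetical legacy reader: LF+CR is also accepted as a single break.
LEGACY_RULE = re.compile("\r\n|\n\r|[\n\r]")


def count_breaks(rule, text):
    """Count how many line breaks the given rule finds in text."""
    return len(rule.findall(text))


text = "first\n\rsecond"
print(count_breaks(UNICODE_RULE, text))  # 2
print(count_breaks(LEGACY_RULE, text))   # 1
```

The same file therefore gains an empty line when it moves from the second kind of reader to the first.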
