Notes for a Scientific Writing Workshop

Posted on Mon 09 March 2020 in 2020-scientific-writing

Lucia and I hold a scientific writing workshop for students. No credits, four days split into two blocks. These are (some of) the notes, mainly about LaTeX. This document is incomplete and subject to updates. We also uploaded the slides for the workshop.

Wikibooks has an excellent book on LaTeX for more information; the bottom of this page contains a small example document containing examples for most of the things I discuss here.

LaTeX is the de facto standard in our field (rightfully so). Use it. It is fun and works across machines. Because it is based on plain text, it is easy to version-control. Changes in style (e.g. for submitting to a different conference) are easy.

Use git for version control and commit often, even if you are the only one working on the text. Make backups, at least daily. Git helps here: pushing your commits to a remote repository gives you a copy that does not live on your own machine.

Try not to use a complicated template; it will make understanding LaTeX harder.

If you use Overleaf: do not ignore the errors. Overleaf tries to recover from errors for you, but you get no guarantee that everything works as you want.

There are two engines for LaTeX that you are likely to choose between: pdflatex and lualatex. They are largely compatible, but lualatex is the more modern one. It has full Unicode support and can use the OpenType and TrueType fonts installed on your system. Use it if you can; it will make your life (slightly) easier.
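A minimal sketch of what lualatex buys you, assuming the fontspec package and the TeX Gyre Pagella font are installed (both ship with common TeX distributions, but the font choice here is only an illustration):

```latex
% Compile with lualatex, not pdflatex.
\documentclass{article}

% fontspec lets lualatex load OpenType/TrueType fonts by name.
\usepackage{fontspec}
% Assumption: TeX Gyre Pagella is available; swap in any installed font.
\setmainfont{TeX Gyre Pagella}

\begin{document}
% Unicode characters can be typed directly, no inputenc needed.
Köhn, naïve, Łódź.
\end{document}
```

With pdflatex, the same document would fail at fontspec, which is a quick way to notice that you are using the wrong engine.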

The very basics

Paragraphs in LaTeX are made with an empty line. \\ marks a line break, not a paragraph. You rarely need \\. Section titles are inserted with \section{Foo}, \subsection{Bar}, \subsubsection{Baz}. Paragraphs can be named with \paragraph{Foo}. This is helpful if you e.g. want to describe different aspects in one paragraph each (à la “Our system consists of a Frobnicator, a Barsalafer and a Bazimoner. [Write a named paragraph for each]”).
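As a small illustration of the sectioning commands and named paragraphs, here is a sketch using the standard article class (the section titles and component names are placeholders):

```latex
\documentclass{article}
\begin{document}

\section{System Overview}

Our system consists of a Frobnicator and a Barsalafer.

% \paragraph produces a run-in heading; the text follows on the same line.
\paragraph{Frobnicator}
The Frobnicator is described in this named paragraph.

\paragraph{Barsalafer}
The Barsalafer gets a named paragraph of its own.

\subsection{Details}

The empty line above this sentence is what starts the paragraph.

\end{document}
```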

To refer to sections, add labels. Write \label{sec:Foo} after a section header. sec:Foo is the label that can be referenced later on. This can be any string, but using sec: as a prefix for sections, fig: for figures, tab: for tables and ex: for examples is standard and useful for quickly grasping what is referred to. References are made by \ref{labelname}. Either use a package such as cleveref or make sure to use ~ like this: Section~\ref{sec:Foo}. The ~ is a non-breaking space, meaning that LaTeX will make sure to not have a line break between “Section” and the section number.
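The two ways of referencing can be compared side by side. This is a sketch assuming the cleveref package is installed; the label names are made up for the example:

```latex
\documentclass{article}
\usepackage{hyperref}
% cleveref should be loaded after hyperref.
\usepackage{cleveref}

\begin{document}

\section{Method}
\label{sec:method}

Some text.

\section{Results}
\label{sec:results}

% Manual variant: write the word yourself and glue it with ~.
As described in Section~\ref{sec:method}, we do things.

% cleveref variant: the package adds the word for you.
% \cref is for mid-sentence use, \Cref for the start of a sentence.
As described in \cref{sec:method}, we do things.
\Cref{sec:method} describes how we do things.

\end{document}
```

The advantage of cleveref is that the word in front of the number always matches the kind of thing the label points to, even if you later turn a section into a subsection or an appendix.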

Tables and Figures

Tables and Figures are so-called float environments. They are moved by LaTeX to fitting locations on the page, either the top or bottom. This is by design and desired! Refrain from using [h] to force them to be placed in the middle of the text.
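A sketch of a float that is allowed to go to the top or bottom of a page. The file name example-image is an assumption: it is a placeholder graphic provided by the mwe package, which needs to be installed for this to compile.

```latex
\documentclass{article}
\usepackage{graphicx}

\begin{document}

The text refers to the float by number, see
Figure~\ref{fig:placeholder}, so it does not matter where it ends up.

% [tb] allows the top or the bottom of a page; no [h] needed.
\begin{figure}[tb]
  \centering
  % Assumption: example-image comes from the mwe package.
  \includegraphics[width=0.5\textwidth]{example-image}
  \caption{A placeholder image.}
  \label{fig:placeholder}
\end{figure}

\end{document}
```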

Tabulars are what we would call “tables” outside of LaTeX, i.e. the grid of cells. Do not use vertical lines. Do use booktabs. If you want to learn more about tables, read Tables in LaTeX2ε: Packages and Methods, which probably covers everything you will ever need.

Spaces

LaTeX handles spacing intelligently. It inserts more space after a full stop to mark the end of a sentence. If a full stop followed by a space occurs in the middle of a sentence (e.g. when writing “e.g.”), escape the space with a backslash to tell LaTeX that the sentence does not end there: Be funny, e.g.\ by telling a joke. Depending on preference, you might also want a thin space between the letters, inserted with \,: Be funny, e.\,g.\ by telling a joke.

Bibliography

BibTeX is nice and standardized. Use proper sources for your BibTeX entries (e.g. the ACL anthology) and add missing information. Use a recognizable key such as lastname-etal-2019-somekeyword instead of auto-generated random number sequences.

You can protect capitalization by enclosing words in braces: “The {German} Title”. Only do this for proper nouns! Lower-casing of words is a feature; do not force a different casing for all words in your bibliography.
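To see the protection at work, here is a self-contained sketch. The bibliography entry is entirely made up (author, title and venue are placeholders), and it is written to a file via the filecontents environment so that everything fits into one document. It assumes a reasonably recent LaTeX installation and standard BibTeX with the plain style.

```latex
\documentclass{article}

% Writes a small .bib file next to the document.
% The entry is hypothetical and only serves as an illustration.
\begin{filecontents}{protect-demo.bib}
@inproceedings{lastname-etal-2019-german,
  author    = {Lastname, Firstname and Other, Author},
  title     = {A Made-Up Study of {German} Word Order},
  booktitle = {Proceedings of a Hypothetical Workshop},
  year      = {2019},
}
\end{filecontents}

\begin{document}

A citation~\cite{lastname-etal-2019-german}.

% The plain style lower-cases titles except for the first letter
% and anything protected by braces.
\bibliographystyle{plain}
\bibliography{protect-demo}

\end{document}
```

Run pdflatex, then bibtex, then pdflatex twice. In the resulting bibliography the title is lower-cased, but “German” keeps its capital letter.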

BibLaTeX can do more than BibTeX and is fully Unicode-compatible. I would use it for theses; paper submissions usually require standard BibTeX. If you use BibLaTeX, the command to process the bibliography is biber. I.e., you compile your document as follows: lualatex mypaper && biber mypaper && lualatex mypaper && lualatex mypaper

Other packages

If you want to annotate remarks for others, use todonotes.
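A short sketch of how todonotes is typically used; the note texts are placeholders:

```latex
\documentclass{article}
% Use \usepackage[disable]{todonotes} to hide all notes in the final version.
\usepackage{todonotes}

\begin{document}

% Prints an overview of all open notes.
\listoftodos

\section{Evaluation}

% A note in the margin, attached to this position in the text.
We evaluate on two corpora.\todo{Which ones? Add citations.}

% A longer remark that is set as a box inside the text.
\todo[inline]{Co-author: please check the numbers in this section.}

% A placeholder for a figure that does not exist yet.
\missingfigure{Plot of accuracy over training time.}

\end{document}
```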

gb4e provides environments for (linguistic) examples and glosses. Examples for gb4e and other packages can be found in the LaTeX wikibook.
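For a first impression, a glossed example with gb4e could look like the following sketch. The sentence and the label name are my own illustration; compile with lualatex so that the umlaut can be typed directly.

```latex
\documentclass{article}
% gb4e changes how some characters are handled,
% so it is safest to load it after your other packages.
\usepackage{gb4e}

\begin{document}

\begin{exe}
  % \ex starts a numbered example; the label uses the ex: prefix.
  \ex\label{ex:hund}
  % \gll aligns the words of the two following lines.
  \gll Der Hund schläft\\
       the dog sleeps\\
  % \glt adds the free translation.
  \glt `The dog is sleeping.'
\end{exe}

Example~(\ref{ex:hund}) shows a glossed sentence.

\end{document}
```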

Just run texdoc [package_name] in a shell if you need information about a package, or search for the package on CTAN. CTAN is the Comprehensive TeX Archive Network, THE place to find packages.

Graphics

TikZ is a versatile and frustrating tool to create graphics of all kinds in LaTeX. Its biggest strength is the integration into LaTeX to create a common look. It can also be used to draw over text.
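A tiny TikZ sketch to show the flavor: two boxes connected by an arrow. The node names and labels are arbitrary.

```latex
\documentclass{article}
\usepackage{tikz}
% The positioning library provides the "right=of" syntax.
\usetikzlibrary{positioning}

\begin{document}

\begin{tikzpicture}
  % Two boxed nodes; the text inside uses the document font.
  \node[draw] (input) {Input};
  \node[draw, right=of input] (output) {Output};
  % An arrow from the first node to the second.
  \draw[->] (input) -- (output);
\end{tikzpicture}

\end{document}
```

Because the labels are typeset by LaTeX itself, they automatically match the font and size of the surrounding text.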

If you have graphics, use vector graphics. Scale graphics uniformly by a common factor or – better – use a TikZ export if available, e.g. when using gnuplot.
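One way to keep the scaling factor uniform is to define it once and reuse it. This is only a sketch: \plotscale is a hypothetical helper macro, and example-image-a and example-image-b are placeholder graphics from the mwe package, which is assumed to be installed.

```latex
\documentclass{article}
\usepackage{graphicx}

% Hypothetical helper: one scaling factor for all plots.
\newcommand{\plotscale}{0.4}

\begin{document}

% Both graphics are scaled by the same factor, so font sizes
% and line widths inside them stay comparable.
\includegraphics[scale=\plotscale]{example-image-a}
\includegraphics[scale=\plotscale]{example-image-b}

\end{document}
```

In contrast, width=\textwidth scales each graphic by whatever factor makes it fit, so text inside differently sized plots ends up with different sizes.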

If you need to copy a figure from a paper, open the paper in Inkscape and import it using the poppler/Cairo import. Ungroup everything (Ctrl-Shift-G), select the region you are interested in, invert the selection (press “!”), remove everything else (press delete), resize (Ctrl-Shift-D -> “resize page to drawing or selection”), save, done.

An example LaTeX file

There is an example LaTeX file with the corresponding BibTeX file. You can have a look at the resulting PDF. The LaTeX file is also shown below:

\documentclass{scrartcl}
% scrartcl is from the KOMA-Script classes.  They look better than the standard
% in my opinion. Other options: scrreprt (reports), scrbook (e.g. for theses)

% This package makes sure you can write Unicode.
% You only need this when using pdflatex.
% Use lualatex whenever you can, it has much better unicode support
% and many more desirable features such as good font handling.
% \usepackage[utf8]{inputenc}

%set the language
\usepackage[english]{babel}

% Bibliography settings


% Some nice extras you probably want
% Create hyperlinks in the PDF
\usepackage{hyperref}
% Use nicer tables
\usepackage{booktabs}
% microtype does some magic to optimize typography
\usepackage{microtype}
% needed to include graphics
\usepackage{graphicx}

% We use biblatex here, which is the better alternative
% to old-style bibtex.  It is very customizable.
\usepackage[%
% Sets the style to "Köhn (2015)".
style=authoryear
% otherwise, authors who share a last name are made unique
% in citations by adding initials or first names, even when
% the years already differ, e.g. A. Köhn (2018) and B. Köhn (2015)
% instead of just Köhn (2018) and Köhn (2015)
,uniquename=false
% Creates back-references ("Cited on page 12"), very
% helpful for long documents
,backref=true
% Entries in the bibliography are separated by vertical spaces
,block=space
% Print up to 999 authors for each publication in the bibliography 
,maxbibnames=999
% Do not show DOI and URL information
,doi=false
,url=false
% Use only the year when printing the date
,date=year
]{biblatex}

% Now define where the bibtex file(s) are
\addbibresource{bibliography.bib}


% define author etc
\author{Arne Köhn}
\title{Example Document}
\date{\today} % \today always prints the current date in the language set above.

% Start the document
\begin{document}

% Command to create the title.  Uses authors etc defined before.
\maketitle

% Creates the table of contents.
% Not needed for short papers such as this one.
% \tableofcontents
% If you have a TOC, add a pagebreak
% \pagebreak

\section{Introduction}
\label{sec:introduction}

A long sentence with an e.\,g.\ inserted.  Notice how spacing differs
for e.g. in contrast to e.g.\ in contrast to e.\,g.\ and never use the
first variant.

A new paragraph.  Use empty lines, not backslashes to create a new
paragraph.\\
Backslashes only introduce a new line.

\begin{itemize}
\item Items  
\item More items
\end{itemize}

\begin{enumerate}
\item Enumerations
\item even more
\end{enumerate}

\begin{figure}
  \centering
  This Figure could contain great images
  included by \verb+\includegraphics[width=\textwidth]{somefile.pdf}+.

  Instead, it only contains centered text.
  \caption{A Figure that actually only contains text.}
  \label{fig:examplefig}
\end{figure}

Some citations: cite gives you name and year without parentheses
\cite{Koehn2018-inc-nlp}, parencite is for giving a source for a
statement \parencite{Koehn2018-PLG} and textcite is used when you
write about a source such as ``\textcite{koehn-2015} proposed something
weird''.  Let's also refer to Table~\ref{tab:exampletable} and
Figure~\ref{fig:examplefig}.

\begin{table}
  \centering
  \begin{tabular}{rrr}
    % Use toprule on the top and bottomrule on the bottom
    \toprule
    Value & Foo & Bar \\
    % Only use midrules to separate blocks, not for every line
    \midrule
    X  & 0.57 & 1.50 \\
    Y  & 5.62 & 2.52 \\
    \midrule
    Sum & 6.19 & 4.02 \\
    \bottomrule
  \end{tabular}
  \caption{A pointless table.}
  \label{tab:exampletable}
\end{table}

\printbibliography

\end{document}