Fast and efficient computing in Python using generators

A generator doesn’t hold the entire result in memory; it yields one result at a time.

Ways of creating generators:

Using a function

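def squares_gen(num):  # generator version: yields one square at a time
    for i in num:
        yield i**2

def squares(num):  # list version: builds every square before returning
    results = []
    for i in num:
        results.append(i**2)
    return results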

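  • Elapsed time for list: 7.360722 seconds
  • Elapsed time for generator: about 6e-06 seconds (this measures only the creation of the generator object, because no squares are computed until values are requested)
  • Difference in time taken for the list and generator: 7.360716 seconds, for num = np.arange(1, 10000000)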

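The microsecond timing for the generator has a simple explanation: calling a generator function only creates a generator object, and none of its body runs until a value is requested. The sketch below makes this visible with a hypothetical instrumented variant, squares_gen_logged (not part of the original code), which records each input it actually processes. It uses a plain range of one million numbers instead of np.arange(1, 10000000) so that it runs quickly.

```python
def squares_gen_logged(num, log):
    """Variant of squares_gen that records every input it actually processes."""
    for i in num:
        log.append(i)  # executes only when a value is requested
        yield i**2

log = []
gen = squares_gen_logged(range(1, 1_000_001), log)
print(len(log))  # 0: creating the generator computed nothing

first_three = [next(gen) for _ in range(3)]
print(first_three, len(log))  # [1, 4, 9] 3

total = sum(gen)  # consuming the remaining values is where the real work happens
print(len(log))  # 1000000
```

So a fair comparison with the list version has to include the time spent consuming the generator, not just creating it.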
Using a comprehension

resl = [i**2 for i in num]

resg = (i**2 for i in num)

  • Elapsed time for list: 7.663468 seconds
  • Elapsed time for generator: about 1e-05 seconds (again, only the creation of the generator object)
  • Difference in time taken: 7.663458 seconds, for num = np.arange(1, 10000000)

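The memory side of the comparison can be checked with sys.getsizeof. This is a small sketch, assuming a range of about one million numbers rather than the original np.arange input; note that getsizeof reports only the list object itself (its array of references), not the integer objects it points to, so the real footprint of the list is even larger.

```python
import sys

num = range(1, 1_000_000)
resl = [i**2 for i in num]  # list comprehension: every value is built now
resg = (i**2 for i in num)  # generator expression: nothing is computed yet

list_bytes = sys.getsizeof(resl)  # grows with len(num)
gen_bytes = sys.getsizeof(resg)   # a small, fixed-size object
print(f"list: {list_bytes:,} bytes, generator: {gen_bytes:,} bytes")

# Once consumed, the generator produces exactly the same values as the list.
assert sum(resg) == sum(resl)
```

The exact byte counts depend on the Python version, but the list size scales with the number of elements while the generator size does not.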
Obtaining results from the generator object:

  1. Using next:
    resg = squares_gen(num)
    print('res of generators: ', next(resg))
    print('res of generators: ', next(resg))
    print('res of generators: ', next(resg))

  2. Using a loop:
    for n in resg:
        print(n)

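Both methods consume the same generator object, and a generator can be iterated only once. A small sketch, using range(1, 4) in place of the original num, shows that a loop resumes where next() stopped and that an exhausted generator yields nothing more:

```python
def squares_gen(num):
    for i in num:
        yield i**2

resg = squares_gen(range(1, 4))

first = next(resg)           # 1
rest = [n for n in resg]     # [4, 9]: the loop continues after the value taken by next()
leftover = next(resg, None)  # None: with a default, next() does not raise when exhausted

try:
    next(resg)               # without a default, next() raises StopIteration
except StopIteration:
    print("generator exhausted")
```

If the values are needed more than once, either store them in a list or create a new generator.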
Advantages of using generators:

  1. The generator code is more readable.
  2. Generators are much faster to create and use little memory, because each value is computed only when it is requested.

Results:

  1. In these measurements, using a function was slightly faster than using a comprehension, for both lists and generators.
  2. The difference between lists and generators is more pronounced when using a comprehension (though generators are still much faster).
  3. When we need the whole result at once, the time (and memory) taken to create a list or list(generator) is almost the same.

Overall, generators give a performance boost not only in execution time but also in memory.

Appendix

How I calculated the time taken by the process

  • I measured the sum of the system and user CPU time of the current process.
    • time.process_time returns the sum of the system and user CPU time of the current process, in seconds.
    • time.process_time_ns returns the same value in nanoseconds.

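A minimal sketch of this kind of measurement, assuming a smaller range(1, 1_000_000) input and a hypothetical helper cpu_seconds (not from the original study). It times building the list, creating the generator, and consuming the generator with list(), which separates the cost of creation from the cost of actually computing the values:

```python
import time

def squares(num):
    results = []
    for i in num:
        results.append(i**2)
    return results

def squares_gen(num):
    for i in num:
        yield i**2

def cpu_seconds(func, *args):
    """Call func(*args); return (result, user + system CPU seconds used)."""
    start = time.process_time()
    result = func(*args)
    return result, time.process_time() - start

num = range(1, 1_000_000)
lst, t_list = cpu_seconds(squares, num)            # builds the whole list
gen, t_create = cpu_seconds(squares_gen, num)      # only creates the generator object
lst_from_gen, t_consume = cpu_seconds(list, gen)   # computes every value from the generator

print(f"list: {t_list:.6f} s, generator creation: {t_create:.6f} s, "
      f"list(generator): {t_consume:.6f} s")
```

For integer nanoseconds, swap time.process_time for time.process_time_ns; the numbers will differ between machines and runs.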
NOTE: The “time taken” shown in this study depends on the computer and varies from run to run with the state of the CPU. But every time, the generators were much faster.

How to insert equation numbering in MS Word

It is essential to number your equations if you are working on a thesis or any scientific paper with a lot of equations. The easiest, and probably the best, way of doing this is to write your manuscript in LaTeX. Do consider that, but if it is not an option, you can follow these steps:

  1. I have a manuscript where I want to insert a number of equations in order.
    Screen Shot 2019-05-09 at 12.24.39 PM.png
    Let’s first start with one equation.
  2. We select the equation and go to the References tab.
    Screen Shot 2019-05-09 at 12.28.46 PM.png
  3. We click on `Insert Caption` and set the `Label` to `Equation`. We can exclude the label from the caption if desired.
    Screen Shot 2019-05-09 at 12.30.01 PM.png
  4. We can also edit the numbering format.
    Screen Shot 2019-05-09 at 12.32.14 PM.png
    We can choose to include the chapter number (for chapters that start with Heading 1 numbering) and use a period as the separator. Here, I chose to exclude the chapter number from the numbering.
    Screen Shot 2019-05-09 at 12.34.45 PM.png
  5. Now, we insert a table with three columns and adjust the cell sizes as required.
    Screen Shot 2019-05-09 at 12.37.00 PM.png
    Screen Shot 2019-05-09 at 12.37.20 PM.png
  6. Now, we cut and paste the equation and the equation number into the second and third columns, respectively.
    Screen Shot 2019-05-09 at 12.39.19 PM.png
  7. Now, we need to align everything. We select the table, go to the Layout tab, and choose `align center`.
    Screen Shot 2019-05-09 at 12.40.35 PM.png
  8. The table doesn’t need a border, so we remove it.
    Screen Shot 2019-05-09 at 12.42.00 PM.png
  9. Now, we have an equation and its number. We can write as many equations as we like by copying and pasting this format. To get the equations numbered in order, right-click a number and update the field.
    Screen Shot 2019-05-09 at 1.03.43 PM.png
    We can also edit the equation label and use `Eq.` instead of just a number.
    Screen Shot 2019-05-09 at 1.05.17 PM.png

I hope this article comes in handy when writing your thesis!

Hosting a website on the Heroku server

In the earlier post, we saw how to run a Python app on the Heroku server. Hosting a webpage on Heroku is even easier, and most of the steps are similar. Here, I deploy a site that contains only HTML, CSS, and JavaScript code.

Steps involved

First, navigate to the directory containing the index.html file and run the following commands (with the desired modifications) in the terminal.

Screen Shot 2019-05-08 at 5.52.57 PM.png

Initialize an empty Git repo

git init

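Create the composer.json and index.php files. Heroku treats a directory that contains a composer.json file as a PHP app, so an empty composer.json plus an index.php that includes index.html is enough for Heroku to serve the page.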

echo "{}" > composer.json
echo '<?php include_once("index.html"); ?>' > index.php

Now, create a Heroku app (the app name must still be available).

heroku create omni-food-app

Add all the files in the current directory to git and commit

git add .
git commit -m 'Initial app to omni-app'

Screen Shot 2019-05-08 at 6.27.18 PM.png
. . .

Next, we deploy the app to the Heroku server

git push heroku master

Screen Shot 2019-05-08 at 6.27.42 PM.png
. . .
Screen Shot 2019-05-08 at 6.27.57 PM.png

Now, follow the link and your webpage is running on the Heroku server.

Screen Shot 2019-05-08 at 6.32.40 PM.png

Using GitHub for Team Collaboration: Part II

Introduction to GitHub

We first need a GitHub account. You can sign up on the GitHub website.

Screen Shot 2018-10-17 at 2.47.08 AM.png

 

Steps for starting a repository on GitHub:

Screen Shot 2018-10-17 at 2.50.20 AM.png

Screen Shot 2018-10-17 at 2.51.39 AM.png

Screen Shot 2018-10-17 at 2.54.47 AM.png

Now, we push the code from our master branch to GitHub.

git status
git push https://github.com/utpalkumariesas/learn_git.git master

Screen Shot 2018-10-17 at 2.58.44 AM.png

Now, let’s add some more changes to the master branch and push those changes to GitHub. Before that, we can create an alias for the long URL of the online repository so that we don’t need to type it again and again. Here, we use “origin” as the alias.

git remote add origin https://github.com/utpalkumariesas/learn_git.git

Now, we can simply type

git push origin master

to push the repository to the remote location on GitHub.

Screen Shot 2018-10-17 at 3.07.58 AM.png

Cloning remote repository locally

We can also go the other way round: clone the online repository onto our local computer with git clone followed by the repository URL.

Screen Shot 2018-10-17 at 3.14.40 AM.png

Screen Shot 2018-10-17 at 3.20.27 AM.png

Collaborating on GitHub

The first thing is to make sure that we have the latest master code locally. We can pull it using the pull command in the cloned directory:

git pull origin master

Next, we create a new branch called complex_app, make some changes, and commit them.

Screen Shot 2018-10-17 at 3.29.51 AM.png
Now, we want to push this branch to the remote repository on GitHub. We do not want to merge it into master and then push to GitHub, as this could mess up the master branch there. Instead, we push the branch itself (for example, with git push origin complex_app). Later, all the collaborators can review the code and decide whether to merge it.

Screen Shot 2018-10-17 at 3.32.56 AM.png

Screen Shot 2018-10-17 at 3.33.38 AM.png

Screen Shot 2018-10-17 at 3.35.22 AM.png

Screen Shot 2018-10-17 at 3.38.25 AM.png

Forking

We can fork a repo on GitHub in order to contribute to an open source project. Forking copies the project from someone else’s account to our own account. After that, we can clone that repository to our local computer. Later, if we want to contribute to the project, we can open a pull request. If the original creator of the project accepts the pull request, they can merge it into the original project.

Using GitHub for Team Collaboration: Part I

Why do we use Git?

Git records changes to our files over time, so we can recall any specific version of a file later. It also lets many people collaborate easily on a project, each with their own copy of the project files on their local computer. Keeping the whole history of the project code is incredibly useful: if we want to go back to a previous version in the future, we don’t need to rewrite it; we can switch back to that version directly.

GitHub

Okay! So far we understand Git to some extent. But what is “GitHub”? GitHub is an online service that hosts our projects and makes it easy to share our code with the other collaborators on a project. The collaborators can download the code and work on it, then upload their edits and merge them into the main codebase.

Installing Git

The easiest way is to download Git from the link here. Select the download for your operating system, go through the installation steps, and that’s it: you have Git now! If you are a Windows user, I’d suggest installing a text editor for writing and running code. My personal favorite is “VSCode” by Microsoft. It is free, open source, and cross-platform (it provides a similar environment for Windows, Linux, and Mac users), and it has a built-in terminal.

After installing Git (and, optionally, VSCode), open a terminal and type

git --version

Screen Shot 2018-10-16 at 11.43.24 PM.png

Setting up Git

Now, after installing Git on your local computer, the first thing you want to do is set it up so that Git knows who you are. We tell Git our username and email:

git config --global user.name utpalkumar
git config --global user.email utpalkumar50@gmail.com

 

How does Git work?

We make a container for our project where we put all our code; it is called a repository (or repo, for short). We can have a repository on our local computer as well as on an online repository hosting service like GitHub. Git tracks the history of the repository’s contents through so-called “commits”. Commits are the points in the history of the repository where we told Git to save a version. We tell Git to save a version using the “commit” command, which we will see in detail later. If we have made, say, 5 commits to our repository, we can roll back to any previous commit smoothly.

Creating a repository

We first open an editor and a terminal for typing the commands. In VSCode, we can do it both in one window.

Screen Shot 2018-10-17 at 12.06.23 AM.png

Make sure the terminal and the editor point to the same directory.

  1. To initialize an empty Git repository in the directory, we type in the terminal:

git init

Screen Shot 2018-10-17 at 12.09.33 AM.png

The existence of the .git directory in our working directory shows that it is now a Git repository. We can initialize Git in the same way in a directory that already has contents.

After initializing Git in the working directory, we can create and modify any files in the current directory or its subdirectories. After modifying the code, we can check the status of the repository using the command

git status

It shows which files are tracked and which are untracked. We can add all the files in the current directory and its subdirectories for tracking using the command:

git add .

Alternatively, we can also add each file separately by their names.

Screen Shot 2018-10-17 at 12.47.46 AM.png

Sometimes a file that we don’t want Git to track gets added by mistake. To remove it from the staging area (the file itself stays on disk), we can use the command:

git rm --cached filename

If we modify the file “testApp.py” and then run the command

git status

Screen Shot 2018-10-17 at 12.57.01 AM.png

Making Commits

In simple words, a commit is a safe-point, a snapshot in time of our code at a particular point.

git commit -m "some message"

Screen Shot 2018-10-17 at 1.03.50 AM.png

Please make sure to add meaningful commit messages, so that if we want to go back to a previous version at some point, we can find it easily from its message.

Screen Shot 2018-10-17 at 1.06.58 AM.png

If we want to see the history of all our commits, we can use the command

git log

Screen Shot 2018-10-17 at 1.11.13 AM.png

If we have a lot of commits, we may not want to print everything. We can condense the output of the log using the command:

git log --oneline

Screen Shot 2018-10-17 at 1.14.23 AM.png

Undoing stuff

Undoing mistakes is one of the main reasons for using Git. Let’s see how we can do that. We can rewind to a previous commit in three ways, in order of increasing risk:

  1. Checkout commit: Very safe. The best option to go back to a past version without getting rid of any other versions.
  2. Revert commit: Undoes the changes of a chosen commit by adding a new commit; the history is kept.
  3. Reset commit: We need to be very sure before we do this. It removes all the commits after the point we move to from the branch history.

Screen Shot 2018-10-17 at 1.25.33 AM.png

Here, I have added two more commits, so the log now shows four commits in total.

Now, if we want to see the state of the code at the commit where we had only added the axis labels, we can check it out:

git checkout 016b638

Screen Shot 2018-10-17 at 1.28.40 AM.png

This takes us back immediately to the earlier version, before we added the title and changed the line styles. This is the best way to go back in time and inspect the past without changing anything. We can come back to the present with the command:

git checkout master

Screen Shot 2018-10-17 at 1.32.17 AM.png

 

Now, let’s say we want to remove the commit where we added the title to the plot. We can do that using the command:

git revert 60c62cb

When we execute this command, we get the following screen. Don’t be intimidated: this is the Vim text editor asking for a message for the new revert commit.

Screen Shot 2018-10-17 at 1.36.03 AM.png

We type “:wq” to save that file and quit.

Screen Shot 2018-10-17 at 1.40.42 AM.png

Now, we can see that this has removed the line in the code that added the title to the plot. But when we log the commits, we see that the commit was not actually deleted; instead, a new commit was added that reverts it.

Okay, if we want to permanently delete some commits and go back to a point in history, we can use the “reset” option:

git reset 016b638

Screen Shot 2018-10-17 at 1.46.34 AM.png

Now, we see that all the commits after that point have been removed from the history, but the code stays unchanged (the changes are simply uncommitted). This is a good way to merge several commits into one. But if we really want to change the code as well, we can use the “--hard” flag:

git reset --hard 016b638

Screen Shot 2018-10-17 at 1.49.15 AM.png

Beware that now there is no easy way to get back to the versions where we had the title and line styles (Git’s reflog can still recover them, but only for a limited time).

Branches

So far, we have been working on one branch, the “master” branch of the repository, and all our commits went to it. We usually use the master branch to represent the stable version of our code. For that reason, we don’t want to try new features or new code on this branch, as there is a risk of messing up the code. Instead, we can try out a new feature in an isolated environment and, if we like it, merge it into the master branch. This is especially useful when more than one person is working on a project: each person can make a branch of the code, add new features, and merge them into the master branch once they are satisfied.

Screen Shot 2018-10-17 at 2.02.45 AM.png

To create a branch at this point in the code, we type

git branch feature1

To see all the branches, we type

git branch -a

Screen Shot 2018-10-17 at 2.05.12 AM.png

The asterisk (*) in front of “master” shows that we are currently on the master branch. To switch the branch, we use

git checkout feature1

Screen Shot 2018-10-17 at 2.07.41 AM.png

Now, we can work on the branch “feature1” separately from the master branch.

Screen Shot 2018-10-17 at 2.10.06 AM.png

When we switch back to the master branch, we notice that the title we added on feature1 is not there.

Screen Shot 2018-10-17 at 2.11.16 AM.png

If things don’t work out as expected, we can even delete the branch:

git checkout master  # first we move to the master branch

git branch -d feature1  # this gives an error because the branch has not been merged into master

Instead, we can use

git branch -D feature1

to forcibly delete the branch.

Okay, now let’s see more about working with branches. A quick way to create a branch and switch to it is

git checkout -b feature-a

Now, we work on this branch.

Screen Shot 2018-10-17 at 2.21.02 AM.png

Now, we have two branches, “feature1” and “feature-a”, in progress at the same time, and neither affects the original code. One branch has some changes to the plotting of the data, and the other adds a title to the plot. How do we merge both changes into the master branch?

Merging Branches

To merge the branches, we first move to the branch into which we want to merge, which in our case is the master branch.

git checkout master

git merge feature1

Screen Shot 2018-10-17 at 2.29.55 AM.png

Now, let’s merge the other branch:

git merge feature-a

Screen Shot 2018-10-17 at 2.32.21 AM.png

This time, the merge runs into conflicts. We need to fix the conflicts manually and then add and commit the files using

git add .
git commit

Vim then opens with a default merge commit message; just save and quit with “:wq”.

Screen Shot 2018-10-17 at 2.41.40 AM.png

Deploying a Python app on Heroku Server

Heroku is a cloud platform supporting several programming languages. It allows a developer to build, run, and scale applications. Heroku hosts its services on Amazon’s EC2 cloud computing platform. Each Heroku application has a unique domain name, “appname.herokuapp.com”, which routes requests to the correct application containers, or “dynos”.

To deploy an app on the Heroku server, we first need to install the Heroku command-line tool on the local computer. On Mac, the Heroku and Git installers should do the job.

Here, I make the app using the Dash library by Plotly, which takes care of most of the complex web coding.

  1. Make a directory and “cd” to that directory
  2. Initialize the folder with git and a virtualenv. It is always a good idea to use a separate environment for each project.
git init  # initializes an empty git repo
virtualenv venv
source venv/bin/activate

Screen Shot 2018-10-16 at 1.17.00 PM
virtualenv creates a fresh Python instance, so we need to install the app’s dependencies, i.e., all the libraries required by the app we are making. Let’s install them:

pip install dash
pip install dash-renderer
pip install dash-core-components
pip install dash-html-components
pip install plotly
pip install gunicorn

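Now, we add the files needed for deployment to the folder: the app (app.py), a .gitignore file (tells git which files in the directory to ignore, such as the venv directory), requirements.txt (tells Heroku which packages to install), and a Procfile (the command Heroku runs to start the app). For a Dash app served with gunicorn, the Procfile typically contains the single line web: gunicorn app:server, which assumes app.py defines server = app.server.
To make the requirements.txt, we run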

pip freeze > requirements.txt

After making all the above files, we can create the app on Heroku. An application can be sent to Heroku via Git, GitHub, Dropbox, or an API. Here, we use Git.

heroku create utpal-dash-app
git add . #add all files to git
git commit -m 'Initial app to Heroku'
git push heroku master # deploy code to heroku
heroku ps:scale web=1  # run the app on one Heroku dyno
  3. Update the code and redeploy

When we modify app.py with our own code, we need to add the changes to git and push them to Heroku.

git status # view the changes
git add .  # add all the changes
git commit -m 'a description of the changes'
git push heroku master

 

Time Series Analysis: Filtering or Smoothing the Data

In this post, we will see how to use Python to low-pass filter the daily fluctuations of a 10-year-long GPS time series. We use the “Scipy” package of Python.

The only important thing to keep in mind is the Nyquist frequency. The Nyquist, or folding, frequency is half the sampling rate of a discrete signal. To understand the concepts of Nyquist frequency and aliasing, the reader is advised to visit this post. To filter the time series, we specify the cutoff frequency as a fraction of the Nyquist frequency.

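As a worked example with the parameters used in the code below (one sample per day, and a cutoff of 0.1 times the Nyquist frequency), the cutoff frequency and the corresponding period are:

```latex
f_s = 1~\text{day}^{-1}, \qquad
f_N = \frac{f_s}{2} = 0.5~\text{day}^{-1}, \qquad
f_c = 0.1\, f_N = 0.05~\text{day}^{-1}, \qquad
T_c = \frac{1}{f_c} = 20~\text{days}
```

So fluctuations with periods shorter than about 20 days are progressively attenuated, while slower variations pass through.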
Following is the code, with line-by-line explanations, for performing the filtering in a few steps:

import numpy as np #importing numpy module for efficiently executing numerical operations
import matplotlib.pyplot as plt #importing the pyplot from the matplotlib library
from scipy import signal

from matplotlib import rcParams
rcParams['figure.figsize'] = (10.0, 6.0) #predefine the size of the figure window
rcParams.update({'font.size': 14}) # setting the default fontsize for the figure
rcParams['axes.labelweight'] = 'bold' #Bold font style for axes labels
from matplotlib import style
style.use('ggplot')  # I personally like the "ggplot" style for my work, but it is a matter of preference

# `data` and `stn_info` are loaded beforehand from .mat files (e.g. with scipy.io.loadmat);
# the rest of the code works for any time series.
dN=np.array(data['dN'])
dE=np.array(data['dE'])
dU=np.array(data['dU'])
slat=np.array(data['slat'])[0]
slon=np.array(data['slon'])[0]
tdata=np.array(data['tdata'])[0]
stn_name=np.array(stn_info['stn_name'])[0]
stns=[stn_name[i][0] for i in range(len(stn_name))]

# Visualizing the original and the Filtered Time Series
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
indx = (tdata > 2012) & (tdata < 2016)  # boolean mask selecting 2012-2016
ax.plot(tdata[indx], dU[indx, 0], 'k-', lw=0.5)  # original (unfiltered) first column of dU
## Filtering of the time series
fs = 1/24/3600  # sampling frequency: one sample per day, in Hz

nyquist = fs / 2  # Nyquist frequency: 0.5 times the sampling frequency
cutoff = 0.1  # fraction of the Nyquist frequency: 0.1 * 0.5 cycles/day = 0.05 cycles/day
print('cutoff period = ', 1/(cutoff*nyquist)/(24*3600), ' days')  # about 20 days
b, a = signal.butter(5, cutoff, btype='lowpass')  # 5th-order Butterworth low-pass filter


dUfilt = signal.filtfilt(b, a, dU[:, 0])  # zero-phase (forward-backward) filtering of the first column

ax.plot(tdata[indx],dUfilt[indx],'b',linewidth=1)

ax.set_xlabel('Time in years',fontsize=18)
ax.set_ylabel('Stations',fontsize=18)
# ax.set_title('Vertical Component CGPS Data')
plt.savefig('test.png',dpi=150,bbox_inches='tight')

test.png
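Because the GPS data files are not included here, the following self-contained sketch applies the same 5th-order Butterworth low-pass filter and filtfilt to a synthetic daily series (an annual sine wave plus random noise, made up purely for illustration). It also checks that specifying the cutoff in physical units through the fs argument of signal.butter gives the same filter as passing a fraction of the Nyquist frequency.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
t = np.arange(3650)                                   # ten years of daily samples (days)
clean = np.sin(2 * np.pi * t / 365.25)                # synthetic annual signal
noisy = clean + rng.normal(scale=0.5, size=t.size)    # plus white noise

cutoff = 0.1                                          # fraction of the Nyquist frequency
b, a = signal.butter(5, cutoff, btype='lowpass')
filtered = signal.filtfilt(b, a, noisy)               # forward-backward: no phase shift

# Same filter in physical units: fs = 1 sample/day, cutoff = 1/20 cycles/day
b2, a2 = signal.butter(5, 1 / 20, btype='lowpass', fs=1.0)

print('error std before filtering:', np.std(noisy - clean))
print('error std after filtering: ', np.std(filtered - clean))
```

Passing fs avoids converting the cutoff by hand, which is where mistakes about the cutoff period tend to creep in.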

Utility program: add all images in your directory to make a video

 

If you want to add all the images in your directory sequentially to make a video for a presentation then you can use this utility program.

In the first step, we make a series of “.png” images by plotting a bar diagram of the GDP of the top 10 countries for each year (1960-2017). We then add all these images sequentially to make a video, using the cv2 package of Python. The video could also be created with the matplotlib package; cv2 is an alternative for the same task, and it is especially handy when we already have a collection of images that we want to join.

To install the cv2 package using Anaconda, type:

conda install -c menpo opencv

Now, we can obtain the data for GDP of all countries from 1960-2017 from the World Bank website. We then use “pandas” package of Python to read the csv data file and “matplotlib” package to plot the bar diagram.

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# function for formatting tick values in trillions of dollars
def trillions(x, pos):
    'The two args are the value and tick position'
    return '${:.1f}T'.format(x * 1e-12)
formatter = FuncFormatter(trillions)

df = pd.read_csv('API_NY.GDP.MKTP.CD_DS2_en_csv_v2_10203569.csv', skiprows=3)
df = df.set_index("Country Name")
# names of rows to drop (e.g. aggregate regions), one per line
with open('neglect_regions.txt', mode='r') as file:
    alist = [line.rstrip() for line in file]
# print(alist)
df = df.drop(alist)

years = np.arange(1960, 2018)
os.makedirs('figures', exist_ok=True)  # make sure the output directory exists
for year in years:
    num=10
    print(year)
    top_10=df.nlargest(num, str(year))
    # print(top_10[str(year)])
    fig, ax = plt.subplots(figsize=(10,6))
    ax.yaxis.set_major_formatter(formatter)
    plt.bar(np.arange(num),top_10[str(year)].values)
    plt.xticks(np.arange(num), top_10.index.values,fontsize=6)
    plt.title(str(year))
    plt.savefig('figures/gdprank_{}.png'.format(year),dpi=200,bbox_inches='tight')
    plt.close(fig)  # close the figure to free memory (a following plt.cla() would open a new empty figure)

#source:https://data.worldbank.org

 

gdprank_2017.png

The bar diagrams for each year from 1960-2017 are saved in the “figures” directory. Then we can run the command:

python conv_image2video.py -loc "figures" -o "world_countries_gdp.mp4"

to add all the “.png” images in the figures directory to make the video named “world_countries_gdp.mp4”.

Usage of the program “conv_image2video.py”

  1. To convert the “.png” images in the current directory into a video named “output.mp4” at a frame rate of 5 fps, simply type:

python conv_image2video.py

  2. If you want to change some options, add the following flags:
-loc "path_to_images": This will look for images in the directory "path_to_images"; default="."
-o "custom_name.mp4": This will output the video with the name "custom_name.mp4"; default: "output.mp4"
-ext "jpg": This will work on the jpeg images; default: "png"
-fr "1": This will change the speed/frame rate of the video; default: "5"

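The program itself is not reproduced in this post. As a rough idea of how such a tool can work, here is a hypothetical minimal sketch (not the author’s conv_image2video.py) that accepts the same four flags and writes the frames with OpenCV’s cv2.VideoWriter. It assumes that sorting the file names gives the right frame order, which holds for names such as gdprank_1960.png ... gdprank_2017.png, and it resizes any frame whose size differs from the first image.

```python
import argparse
import glob
import os


def parse_args(argv=None):
    """Parse the flags listed above, with the same defaults."""
    parser = argparse.ArgumentParser(description="Join images into a video.")
    parser.add_argument("-loc", default=".", help="directory containing the images")
    parser.add_argument("-o", default="output.mp4", help="output video file name")
    parser.add_argument("-ext", default="png", help="image file extension")
    parser.add_argument("-fr", type=float, default=5, help="frame rate (frames per second)")
    return parser.parse_args(argv)


def images_to_video(loc, out, ext, fps):
    """Write every *.ext image in loc, in sorted file-name order, as one frame.

    Returns the number of frames written.
    """
    files = sorted(glob.glob(os.path.join(loc, f"*.{ext}")))
    if not files:
        print(f"No .{ext} images found in {loc!r}; nothing to do.")
        return 0

    import cv2  # OpenCV; imported only when there is work to do

    height, width = cv2.imread(files[0]).shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # MPEG-4 codec for .mp4 output
    writer = cv2.VideoWriter(out, fourcc, fps, (width, height))
    try:
        for name in files:
            frame = cv2.imread(name)
            if frame.shape[:2] != (height, width):
                # VideoWriter expects every frame to have the size given above
                frame = cv2.resize(frame, (width, height))
            writer.write(frame)
    finally:
        writer.release()
    return len(files)


if __name__ == "__main__":
    args = parse_args()
    images_to_video(args.loc, args.o, args.ext, args.fr)
```

Because the flag names and defaults match the list above, a command such as python conv_image2video.py -loc "figures" -o "world_countries_gdp.mp4" is parsed the same way by this sketch.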
 

To download this program, click here.

If you want to download all the files (program +examples), click here.

The “ABC” of writing or editing scientific manuscript

If you’re doing science, you will inevitably need to write manuscripts. I have read and collected, from various sources, some of the important points about writing in science. The most important and most recommended source is the “Writing in Science” course by Stanford University! I would like to share my collection with my readers. Don’t worry, I will keep it short and concise so that you don’t need to read it like a novel; you can always refer to it while writing.

The writer should aim to make the manuscript clear, elegant, and stylish, with a streamlined flow of ideas, and this requires following a set of rules. After completing each sentence, writers should ask themselves whether it is readable, understandable, and enjoyable to read.

Editing the manuscript:

Sentence Level Editing:

Editing a manuscript always starts at the lowest level: the sentence. When editing at the sentence level, the writer should keep a few things in mind:

  1. Use the active voice (subject + verb + object). It is livelier and easier to read.
  2. Write with verbs (instead of nouns).
    • Use strong verbs
    • Avoid turning verbs into nouns
    • Don’t bury the main verb
    • Pick right verb
    • Use “to be” verbs purposefully and sparingly.
    • Don’t turn spunky verbs into clunky nouns.
    • “Compared to” to point out similarities between different things. “Compared with” to point out differences between similar things (often used in science).
    • Use “that” for defining clauses and “which” for non-defining clauses.
  3. Avoid using “his/her”. Instead, use “their”.
  4. Cut unnecessary words and phrases ruthlessly. Get rid of
    • Dead-weight words and phrases. E.g., as it is well known, as it has been shown.
    • Empty words and phrases. E.g., basic tenets of, methodologic, important.
    • Long words or phrases.
    • Unnecessary jargon and acronyms
    • Repetitive words or phrases
    • Adverbs. E.g., very, really, quickly, basically, generally, etc.
  5. Eliminate negatives and superfluous uses of “there is/there are”.
  6. Omit needless prepositions and conjunctions. Change “They agreed that it was true” to “They agreed it was true”.
  7. Experiment with punctuation (comma, colon, dash, parentheses, semicolon, period). Use them to vary sentence structure.
    • Semicolon connects two independent clauses.
    • Colon introduces a list, quote, explanation, conclusion, or amplification.
    • Parentheses to insert an afterthought/explanation.
    • Dash to add emphasis, or to insert an abrupt definition or description, join or condense. Don’t overuse it, or it loses its impact.
  8. Pairs of ideas joined by “and”, “or”, or “but” should be written in parallel form. E.g., “The velocity decreased by 50%, but the pressure decreased by only 10%.”

Paragraph level Editing:

  1. 1 paragraph = 1 idea
  2. Give away the punch line early.
  3. Logical flow of ideas: general → specific → take-home message. Logical arguments: if a, then b; a, therefore b.
  4. Parallel sentence structure.
  5. Use transition words where necessary.
  6. Put the emphasis at the end.
  7. Vary sentence length: long, short, long.
  8. Follow the pattern: arguments, counter-arguments, rebuttals.

Writing Process:

Many writers are not sure how to start and how to organize their work. Here are some tips.

  1. Prewriting: spend 70% of your time
    • Get organized first
      • Arrange key facts and citations from the literature into a crude road map; think in paragraphs and sections.
      • Like ideas should be grouped; like paragraphs should be grouped.
  2. Writing the first draft: spend 10% of your time
    • Don’t be a perfectionist: get the ideas down, in order.
    • Focus on logical organization
    • Write it quickly and efficiently
  3. Revision: spend 20% of your time
    • Read your work out loud: the brain processes the spoken word differently.
    • Do a verb check: underline the main verb in each sentence (look for lackluster verbs, passive verbs, buried verbs).
    • Cut clutter: watch out for dead-weight words, empty words, long words and phrases, unnecessary jargon and acronyms, repetitive words or phrases, and adverbs.
    • Do an organizational review: tag each paragraph with a phrase or sentence that sums up the main point.
    • Get feedback from others: ask someone outside your department to read your manuscript. They should be able to grasp: the main findings, take-home messages, and significance of your work. Ask them to point out particularly hard-to-read sentences and paragraphs.
    • Get editing help: find a good editor to edit your work.

 

Checklist for Final Draft

  1. Check for consistency: the values of any variable (such as the mean temperature of your data) used in different sentences, paragraphs or sections should be the same.
  2. Check for numerical consistency:
    • Numbers in your abstract should match the numbers in your tables/figures/text,
    • Numbers in the text should match those in tables/figures.
  3. Check for references:
    • Does that information really exist in that paper?
    • Always cite/go back to primary source.

The Original Manuscript:

Recommended order of writing:

  1. Tables and Figures: Very important
    • Should stand alone and tell a complete story. The reader should not need to refer back to the text.
    • Use as few figures/tables as possible.
    • Do not present the same data in both a table and a figure.
    • Figures: visual impact, show trends and patterns, tell a quick story, tell a whole story, highlight a particular result
      • Keep it simple (if it’s too complex, it may belong in a table)
      • Make the groups easy to distinguish.
    • Tables: give precise values.
      • Use superscript symbols to identify footnotes and give footnotes to explain experimental details.
      • Use three horizontal lines for table format.
      • Make sure everything lines up and looks professional
      • Use a reasonable number of significant figures
      • Give units
      • Omit unnecessary columns
  2. Results:
    • Summarize what the data show
    • Point out simple relationships
    • Describe big picture trends
    • Cite figures or tables that present supporting data.
    • Avoid repeating numbers that are already available in tables or figures.
    • Break into subsections with headings, if necessary.
    • Complement the information that is already in tables and figures
    • Give precise values that are not available in the figure
    • Report the percent change or percent difference if the absolute values are given in tables.
    • Don’t forget to talk about negative results.
    • Reserve information about what you did for the methods section
    • Reserve comments on the meaning of your results for the discussion section.
    • Use past tense for completed actions:
      • We found that…
      • Women were more likely to…
      • Men smoked more cigarettes than…
    • Use the present tense for assertions that continue to be true, such as what the tables show, what you believe, and what the data suggest:
      • Figure 1 shows…
      • The findings confirm….
      • The data suggest…
      • We believe that this shows…
    • Use the active voice.
  3. Methods:
    • Give a clear overview of what was done.
    • Give enough information to replicate the study.
    • Be complete but make life easy for your reader:
      • Break into smaller subsections with subheadings
      • Cite a reference for commonly used methods
      • Display a flow diagram or table where possible.
      • May use jargon and the passive voice more liberally
    • Use past tense to report methods (“we measured”)
    • Use present tense to describe how data are presented in the paper (“data are summarized as means ± SD”)
  4. Introduction:
    • Typically 3 paragraphs long (recommended range 2-5)
    • Should focus on the specific hypothesis/aim of your study
    • Information follows a cone format (from general to specific):
      • What’s known: Paragraph 1
      • What’s unknown: limitations and gaps in previous studies: paragraph 2
      • Your burning question/hypothesis/aim: paragraph 3
      • Experimental approach: paragraph 3
      • Why your experimental approach is new, different and important: paragraph 3
    • Keep paragraphs short
    • Write for the general audience (clear, concise, non-technical)
    • Do not answer the research questions.
    • Summarize at a high level
  5. Discussion:
    • Information follows an inverted cone format (from specific to general).
    • Answer the questions asked
    • Support your conclusion
    • Defend your conclusion
    • Give a big-picture take-home message: what do my results mean and why should anyone care. Make sure your take-home message is clear and consistent.
    • Use active voice
    • Tell it like a story
    • Don’t travel too far from your data
    • Focus on limitations that matter
    • Verb Tense:
      • Past tense when referring to study details, results, analyses, and background research. E.g., we found that…, subjects may have experienced…, Miller et al. found…
      • Present tense when talking about what the data suggest. E.g., the greater weight loss suggests…, the explanation for this difference is not clear…, potential explanations include…
  6. Abstract:
    • Overview of the main story
    • Gives highlights from each section of the paper
    • Limited length (100-300 words)
    • Stands on its own
      • Background
      • Question/aim/hypothesis
      • Experiment
      • Results
      • Conclusion
      • Implication, speculation or recommendation

Plagiarism:

In the end, it is profoundly important to stay away from plagiarism. Do not pass off other people’s writing (or tables and figures) as your own.

What is plagiarism:

  1. Cutting and pasting sentences or even phrases.
  2. Slightly rewriting or rearranging others’ words. It is unlikely that two people will independently come up with the exact same string of 7-8 words in a sentence.

-Utpal Kumar

Institute of Earth Sciences, Academia Sinica