Some pictures on preparing my MSCA proposal

While preparing the final report on the past MSCA project, I found some memorable pictures. Here, my wife, my nephew, and I are building a LEGO illustration for the project proposal. Yes, we had some fun while I was thinking about the concept.

The result looks pretty.

Still, for the concept illustration, I drew this figure. Today, I reused it for the report illustration.

Visualizing ASE atoms in Jupyter notebooks

For a long time, I have wanted to see ASE atoms in my Jupyter notebooks. My previous attempts were usually unsuccessful. Today I decided to try again. First, the ASE wiki suggests the x3d and ngl viewers:

view(atoms, viewer='x3d')
view(atoms, viewer='ngl')

Łukasz Mentel gives some useful tips in his blog post from 2017.

In my case, x3d works and ngl fails. The x3d picture is not enough for my needs, and I do not want to spend much time fixing ngl.

ASE-notebook is what works for me.

conda create -n "jupyter"
conda activate jupyter
conda install -c conda-forge ase-notebook
conda install -c conda-forge jupyterlab
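
Once the environment is ready, viewing atoms inline looks roughly like this. This is only a minimal sketch, assuming the AseView class and its make_svg method from the ase-notebook documentation; check the package docs for the exact options.

from ase.build import bulk
from ase_notebook import AseView  # assumed import name, as in the ase-notebook docs

atoms = bulk('Pt', 'fcc', a=3.92)   # any ASE Atoms object
ase_view = AseView()                # visualization settings can be tweaked here
svg = ase_view.make_svg(atoms)      # returns an SVG object that Jupyter renders inline
svg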

By the way, the model is from my “Surface Curvature Effect on Dual-Atom Site Oxygen Electrocatalysis” paper, which you can read on ChemRxiv until it turns Gold Open Access.

Creating scientific figures with versioning

While working on the “Potential of monolayer charge” letter, we learned that saving figure versions is essential. In particular, for the first figure (representing the concept), we started with this sketch: an intermediate between a cartoon from “Overscreening versus Crowding” and MD snapshots from “Interfaces between Charged Surfaces and Ionic Liquids.”

With ChatGPT, we created a wavy electrode and arranged ions in Matplotlib (a minimal sketch is shown at the end of this post). Then, we created over 30 versions in Inkscape, as shown in this animation (also made with guidance from ChatGPT):

Versions of Figure 1

It was crucial to keep versions instead of working on the same file, because we often reverted to previous versions after trying new visual elements. Our final figure illustrates that the Potential of Monolayer Charge (PMC) lies right between the overscreening and crowding regimes. Moreover, the PMC can fit into the electrochemical stability window by lowering its absolute value with varying ion area. Read more in the “Potential of monolayer charge” letter.
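
For the curious, here is a minimal sketch (not our actual script) of how one can draw a wavy electrode and place ions on it in Matplotlib before exporting to SVG for further editing in Inkscape:

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))

# Wavy electrode: a sine profile filled down to the bottom of the axes
x = np.linspace(0, 10, 400)
surface = 0.5 * np.sin(2 * np.pi * x / 5)
ax.fill_between(x, surface, -2, color='lightgray', zorder=0)

# Ions as circles following the electrode profile (positions are illustrative)
cation_x = np.arange(0.5, 10, 1.0)
cation_y = 0.5 * np.sin(2 * np.pi * cation_x / 5) + 0.4
anion_x = cation_x + 0.5
anion_y = 0.5 * np.sin(2 * np.pi * anion_x / 5) + 1.2
ax.scatter(cation_x, cation_y, s=300, color='tab:blue', label='cations')
ax.scatter(anion_x, anion_y, s=300, color='tab:red', label='anions')

ax.set_aspect('equal')
ax.axis('off')
fig.savefig('concept_sketch_v01.svg')  # export to SVG and keep editing in Inkscape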

A cappella by ChatGPT

Here are two examples of lyrics generated by ChatGPT. To make it even more fun, copy-paste the text into deepl.com (better than Google Translate) and make it read/sing the lyrics. melobytes is not good at all.

Verse 1:
Carbon in the air, causing so much harm
But we can make a change, with carbon capture
Trapping it away, before it can do more
We can reduce the harm, and protect our shore

Chorus:
Carbon capture, carbon capture
A solution for our planet’s future
Carbon capture, carbon capture
Together we can make it happen

Verse 2:
Technology is key, to making it a success
We can power our lives, while reducing the excess
Carbon in the air, will no longer be a threat
With carbon capture, we can make a new bet

Chorus:
Carbon capture, carbon capture
A solution for our planet’s future
Carbon capture, carbon capture
Together we can make it happen

Bridge:
We can make a difference, with every step we take
Towards a brighter future, for the sake of those who’ll come
Let’s work together, and make this dream come true
With carbon capture, we can make a better future for you

Chorus:
Carbon capture, carbon capture
A solution for our planet’s future
Carbon capture, carbon capture
Together we can make it happen.

and another one

Electricity, oh electricity
Flowing through a circuit so free
But when it comes to ionic liquids
There’s more to it, as you’ll see

The electric double layer
Is a fascinating thing
It forms between the electrodes
And the ionic liquids they bring

The ions in the liquid
Are attracted to the metal
They line up in a layer
It’s really quite essential

This double layer of charge
Controls the flow of electricity
It’s a key part of the circuit
That makes our technology so advanced, you see

So next time you flip a switch
Or plug in your phone to charge
Think of the electric double layer
Making it all possible, oh so large!

k-points with kplib and gpaw

Choosing optimal k-points is a tricky task. In GPAW, one can set them manually, using size or density and following a rule of thumb:

calc = GPAW(kpts={'size': (4, 4, 4), 'gamma': True})
# or
calc = GPAW(kpts={'density': 2.5, 'gamma': True})

A rule of thumb for choosing the initial k-point sampling is that the product ka of the number of k-points k in any direction and the length of the basis vector a in that direction should be:

  • ka ~ 30 Å, for d band metals
  • ka ~ 25 Å, for simple metals
  • ka ~ 20 Å, for semiconductors
  • ka ~ 15 Å, for insulators

Remember that convergence in this parameter should always be checked.

https://wiki.fysik.dtu.dk/gpaw/tutorialsexercises/structureoptimization/surface/surface.html

The corresponding densities (ka/2π) are:

  • ka/2π ~ 4.8 Å, for d band metals
  • ka/2π ~ 4.0 Å, for simple metals
  • ka/2π ~ 3.2 Å, for semiconductors
  • ka/2π ~ 2.4 Å, for insulators

With the recent update, I can start using kplib (see the paper) to choose optimal generalized k-point grids. The main variable in kplib is min_distance, which is analogous to density×2π. Read more about min_distance at muellergroup.jhu.edu/K-Points.html.
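
As a quick sanity check of the relations above (assuming min_distance ≈ density×2π ≈ ka, as stated), one can convert the rule-of-thumb values like this:

import math

ka = 30.0                             # rule-of-thumb product for d-band metals, in Å
density = ka / (2 * math.pi)          # GPAW 'density' value, about 4.8
min_distance = 2 * math.pi * density  # back to ~30, the value passed to kplib below
print(f"density ~ {density:.1f}, min_distance ~ {min_distance:.0f}")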

Here is an example of my conda environment:

conda create -n gpaw23 python=3.9
conda activate gpaw23
conda install -c conda-forge cxx-compiler
pip install kplib # from pypi.org/project/kpLib
conda install -c conda-forge gpaw
conda install -c conda-forge pymatgen  # needed for the AseAtomsAdaptor in the example below

Here is a working example:

from ase import Atoms
from ase.parallel import parprint
from gpaw import GPAW, PW
from kpLib import get_kpoints
from pymatgen.io.ase import AseAtomsAdaptor

atoms = Atoms(cell=[[1.608145, -2.785389, 0.0], [1.608145, 2.785389, 0.0], [0.0, 0.0, 5.239962]],
              symbols=['Ga', 'Ga', 'N', 'N'],
              positions=[[ 1.608145  , -0.92846486,  2.61536983],
                         [ 1.608145  ,  0.92846486,  5.23535083],
                         [ 1.608145  , -0.92846486,  4.58957792],
                         [ 1.608145  ,  0.92846486,  1.96959692]],
              pbc=True)
structure = AseAtomsAdaptor.get_structure(atoms)
kpts_data = get_kpoints(structure, minDistance=30, include_gamma=False)
    
parprint("Found lattice with kplib: ")
parprint(f"Nominal kpts: {kpts_data['num_total_kpts']}")
parprint(f"Distinct kpts: {kpts_data['num_distinct_kpts']}")

atoms.calc = GPAW(xc='PBE',
                  mode=PW(400),
                  kpts=kpts_data['coords'],
                  symmetry={'point_group': True,
                            'time_reversal': True,
                            'symmorphic': False,
                            'tolerance': 1e-4},
                  txt='gpaw-out.txt')
energy = atoms.get_total_energy()

parprint(f"Total energy: {energy}")
parprint(f"kpts passed to GPAW: {len(atoms.calc.get_bz_k_points())}")
parprint(f"kpts in GPAW IBZ: {len(atoms.calc.get_ibz_k_points())}")

Working with RSS

RSS (Really Simple Syndication) is a format used to create feeds with articles’ metadata, including the graphical abstract, title, publication details, authors, and abstract.

Here is my way of organizing RSS flows. Let us take as an example ACS journals. Their RSS feeds are all given on one page:

https://pubs.acs.org/page/follow.html

I copied them all by opening the HTML code and taking the URLs, which I then merged into a single OPML file at https://opml-gen.ovh.
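
If you prefer not to dig through the HTML by hand, a small script can extract the feed URLs. This is only a sketch and assumes the page still lists feedburner links in its HTML:

import re
import urllib.request

# Fetch the ACS "follow" page and pull out all feedburner feed URLs
url = "https://pubs.acs.org/page/follow.html"
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
html = urllib.request.urlopen(req).read().decode("utf-8", "ignore")

feeds = sorted(set(re.findall(r"https?://feeds\.feedburner\.com/acs/\w+", html)))
print("\n".join(feeds))  # paste the list into https://opml-gen.ovh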

Then I uploaded the opml-file to a very old but still working webpage:

http://www.feedrinse.com

feedrinse merges all feeds into one “channel” feed. Here is my merged feed:

http://www.feedrinse.com/services/channel/?chanurl=7bde3acd38bc31fc705118deb2300ca1

Using feedrinse’s interface is tricky. Check this blog post for step-by-step instructions:

https://www.journalism.co.uk/skills/how-to-tame-your-rss-sources-using-feed-rinse/s7/a53238/

In my case, feedrinse’s filters do not work. So I turned to https://siftrss.com/, where one can set up a regex filter. You can check your regex expression at https://regex101.com/. Here is my example:

/(electro)|(cataly)|(double)/

which matches all words containing “electro”, “cataly”, or “double”.
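
You can also test the pattern locally in Python before feeding it to siftrss, for example:

import re

pattern = re.compile(r"(electro)|(cataly)|(double)", re.IGNORECASE)
titles = ["Electrocatalysis at the double layer",   # matches
          "Hydrogen storage in metal hydrides"]     # no match
for title in titles:
    print(title, "->", bool(pattern.search(title)))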

From siftrss, I got a new feed that I entered into my RSS reader.

I am currently using online and mobile RSS readers, which are synced together. Namely, I use Nextcloud News, because I have a Nextcloud account.

In these RSS readers, one can see the essential info about each article and star articles. It is a pleasure to swipe through articles on the mobile phone and star the interesting ones. Later, one can open the starred articles in the online reader and go to the publisher’s webpage. At that stage, I also use Reader View (in Firefox) and listen to the abstract.

Nextcloud news
Nextcloud News (mobile)

P.S. Here are all ACS feeds (as of December 2022):

http://feeds.feedburner.com/acs/aabmcb
http://feeds.feedburner.com/acs/aaembp
http://feeds.feedburner.com/acs/aaemcq
http://feeds.feedburner.com/acs/aamick
http://feeds.feedburner.com/acs/aanmf6
http://feeds.feedburner.com/acs/aapmcd
http://feeds.feedburner.com/acs/aastgj
http://feeds.feedburner.com/acs/abmcb8
http://feeds.feedburner.com/acs/abseba
http://feeds.feedburner.com/acs/acbcct
http://feeds.feedburner.com/acs/accacs
http://feeds.feedburner.com/acs/achre4
http://feeds.feedburner.com/acs/achsc5
http://feeds.feedburner.com/acs/acncdm
http://feeds.feedburner.com/acs/acscii
http://feeds.feedburner.com/acs/acsodf
http://feeds.feedburner.com/acs/aeacb3
http://feeds.feedburner.com/acs/aeacc4
http://feeds.feedburner.com/acs/aeecco
http://feeds.feedburner.com/acs/aelccp
http://feeds.feedburner.com/acs/aesccq1
http://feeds.feedburner.com/acs/aewcaa
http://feeds.feedburner.com/acs/afsthl
http://feeds.feedburner.com/acs/aidcbc
http://feeds.feedburner.com/acs/amacgu
http://feeds.feedburner.com/acs/amachv
http://feeds.feedburner.com/acs/amclct
http://feeds.feedburner.com/acs/amlccd
http://feeds.feedburner.com/acs/amlcef
http://feeds.feedburner.com/acs/amrcda
http://feeds.feedburner.com/acs/anaccx
http://feeds.feedburner.com/acs/ancac3
http://feeds.feedburner.com/acs/ancham/
http://feeds.feedburner.com/acs/aoiab5
http://feeds.feedburner.com/acs/apaccd
http://feeds.feedburner.com/acs/apcach
http://feeds.feedburner.com/acs/apchd5
http://feeds.feedburner.com/acs/aptsfn
http://feeds.feedburner.com/acs/asbcd6
http://feeds.feedburner.com/acs/ascecg
http://feeds.feedburner.com/acs/ascefj
http://feeds.feedburner.com/acs/bcches
http://feeds.feedburner.com/acs/bichaw
http://feeds.feedburner.com/acs/bomaf6
http://feeds.feedburner.com/acs/cgdefu
http://feeds.feedburner.com/acs/chreay
http://feeds.feedburner.com/acs/cmatex
http://feeds.feedburner.com/acs/crtoec
http://feeds.feedburner.com/acs/enfuem
http://feeds.feedburner.com/acs/esthag
http://feeds.feedburner.com/acs/estlcu
http://feeds.feedburner.com/acs/iecred
http://feeds.feedburner.com/acs/inocaj
http://feeds.feedburner.com/acs/jaaucr
http://feeds.feedburner.com/acs/jacsat
http://feeds.feedburner.com/acs/jafcau
http://feeds.feedburner.com/acs/jamsef
http://feeds.feedburner.com/acs/jceaax
http://feeds.feedburner.com/acs/jceda8
http://feeds.feedburner.com/acs/jcisd8
http://feeds.feedburner.com/acs/jctcce
http://feeds.feedburner.com/acs/jmcmar
http://feeds.feedburner.com/acs/jnprdf
http://feeds.feedburner.com/acs/joceah
http://feeds.feedburner.com/acs/jpcafh
http://feeds.feedburner.com/acs/jpcbfk
http://feeds.feedburner.com/acs/jpccck
http://feeds.feedburner.com/acs/jpclcd
http://feeds.feedburner.com/acs/jprobs
http://feeds.feedburner.com/acs/langd5
http://feeds.feedburner.com/acs/mamobx
http://feeds.feedburner.com/acs/mpohbp
http://feeds.feedburner.com/acs/nalefd
http://feeds.feedburner.com/acs/oprdfk
http://feeds.feedburner.com/acs/orgnd7
http://feeds.feedburner.com/acs/orlef7

Playing with Galactica

Installation of Galactica is as easy as:

conda create -n papers python=3.8
conda activate papers
pip install galai transformers accelerate

Now you can work with the smallest Galactica models (125m, 1.3b, 6.7b) on CPUs. Here is my script:

from transformers import AutoTokenizer, OPTForCausalLM
import sys

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
tokenizer.pad_token_id = 1
tokenizer.padding_side = 'left'
tokenizer.model_max_length = 200
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", device_map="auto")

#input_text = '# Introduction \n\n The main idea of the paper "Supervised hashing for image retrieval via image representation learning" is'
#input_text = "# Review \n\n The main idea of the paper 'On the thickness of the double layer in ionic liquids'"
#input_text = "# Review High entropy alloys in electrocatalysis"
input_text = str(sys.argv[1])
input_ids = tokenizer(input_text, padding='max_length', return_tensors="pt").input_ids

outputs = model.generate(input_ids, max_new_tokens=200,
                         do_sample=True,
                         temperature=0.7,
                         top_k=25,
                         top_p=0.9,
                         no_repeat_ngram_size=10,
                         early_stopping=True)
print(tokenizer.decode(outputs[0]).lstrip('<pad>'))

Run it on your laptop as:

python script.py "YOUR QUERY"

For example, let us check what Galactica knows about HEAs:

python script.py "High entropy alloys in catalysis "

The 6.7b model gives:

High entropy alloy catalysis (HEAC) is a new concept for catalytic applications. A series of HEAs with a similar chemical composition (CoCrFeNiMn) were prepared by arc-melting and characterized by X-ray diffraction (XRD), X-ray photoelectron spectroscopy (XPS), and high resolution transmission electron microscopy (HRTEM). The catalytic performance of the HEAs was tested in the CO oxidation reaction. The catalytic activity of the HEAs is compared with that of the pure metals and the HEA-supported Pt catalysts. The results show that the HEAs are active in the CO oxidation reaction, and that the activity is comparable to that of the Pt catalysts. The HEAs have a much lower activity than the pure metals. XPS and HRTEM results show that the HEAs have a different surface structure than the pure metals, which is probably the reason for the high catalytic activity of the HEA.

Galactica

Also, let us review a paper by the CHEAC founders:

python script.py '# Introduction \n\n The main idea of the paper "Self-supported Pt–CoO networks combining high specific activity with high surface area for oxygen reduction" is'

“Self-supported Pt–CoO networks combining high specific activity with high surface area for oxygen reduction” is to report the synthesis of highly porous self-supported electrocatalysts, which combine high surface area with high specific activity for the oxygen reduction reaction (ORR). The synthesis is based on a self-supported network of Pt doped CoO (Pt-CoO) nanoparticles, which are prepared by a two-step process. In the first step, Pt-doped Co₃O₄ (Pt-Co₃O₄) nanoparticles are formed via the thermal decomposition of Co- and Pt-oleate complexes, followed by the oxidation of Pt-Co₃O₄ to Pt-CoO at 550 °C. The resulting porous self-supported network consists of Pt-CoO nanoparticles with diameters of 4–5 nm and a high surface area of 130 m2/g. The specific activity of the Pt-CoO network for the ORR is 2.6 times higher than that of the Pt/C catalyst, and the mass activity is 2.

Galactica

You can run the same code from Google Drive with Google Colab.

Here are some links:

https://huggingface.co/facebook/galactica-125m
https://huggingface.co/spaces/morenolq/galactica-base/blob/main/app.py
https://github.com/paperswithcode/galai
https://github.com/paperswithcode/galai/issues/39

P.S. https://chat.openai.com/chat seems to be much cooler!

Positive writing

Here are my notes and thoughts about positive writing.

https://twitter.com/grammarly/status/1457749263904133124

Positive writing helps to communicate better with readers. Naturally, positive writing is more concrete than negative writing. For instance, just removing “not” from “bananas are not vegetables” or “bananas are not blue” and turning it into the positive “bananas are yellow fruits” results in a clear, undeniable statement. Another aspect of positive writing is tuning the reader’s attitude towards your ideas. Psychologically, after going through easily agreeable sentences, like “bananas are sweet” and “bananas are colorful”, the reader will be more ready to agree with your conclusion that “a banana is a comforting and nutritious choice for a lunchbox”.

More text with examples is under editing 🙂

External XC libraries for GPAW

There are two libraries of XC functionals that can be used in GPAW: libxc and libvdwxc. The conda installation of GPAW picks them up automatically. You can check whether your GPAW links to libxc and libvdwxc with gpaw info.

libvdwxc is useful when you wish to run calculations with vdW functionals in GPAW, such as BEEF-vdW. The libvdwxc implementation of vdW functionals is better parallelized than the native GPAW implementation. For example, add xc={'name':'BEEF-vdW','backend':'libvdwxc'} to your GPAW calculator to run a calculation with the BEEF-vdW functional. BEEF-vdW calculations with libvdwxc can run as fast as PBE-like calculations if you use the proper parallelization settings, like parallel={'augment_grids':True,'sl_auto':True}. Here is a list of libvdwxc functionals: gitlab.com/libvdwxc/libvdwxc
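
Putting the pieces together, a BEEF-vdW calculation through libvdwxc can look roughly like this (a sketch only; the slab, cutoff, and k-point density are illustrative, not converged values):

from ase.build import fcc111
from gpaw import GPAW, PW

# Small Pt(111) slab as a placeholder model
atoms = fcc111('Pt', size=(2, 2, 3), vacuum=6.0)
atoms.calc = GPAW(mode=PW(400),
                  xc={'name': 'BEEF-vdW', 'backend': 'libvdwxc'},
                  kpts={'density': 4.8, 'gamma': True},
                  parallel={'augment_grids': True, 'sl_auto': True},
                  txt='beef-libvdwxc.txt')
energy = atoms.get_potential_energy()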

Note that the following GPAW page is somewhat outdated:
wiki.fysik.dtu.dk/gpaw/documentation/xc/vdw.html

libxc is useful when you wish to run calculations with functionals that are not implemented in GPAW. Note that the native GPAW implementations are more efficient. There are many ways to call libxc. For example, add xc='MGGA_X_SCAN+MGGA_C_SCAN' to your GPAW calculator to run a calculation with the SCAN functional. Note that the GPAW setups are generated for LDA, PBE, and RPBE. You can generate setups specifically for your functional if it is a GGA or HGGA. Here is a list of libxc functionals: tddft.org/programs/libxc/functionals/
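
For example, calling SCAN through libxc can look like this (a sketch; parameters are illustrative, and, as noted above, dedicated setups may need to be generated for such functionals):

from ase.build import bulk
from gpaw import GPAW, PW

atoms = bulk('Si', 'diamond', a=5.43)
atoms.calc = GPAW(mode=PW(500),
                  xc='MGGA_X_SCAN+MGGA_C_SCAN',  # exchange + correlation parts from libxc
                  kpts={'density': 3.2},
                  txt='scan-libxc.txt')
energy = atoms.get_potential_energy()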

Memory issues in GPAW

First of all, try the default parameters for the calculator. Simple and often useful.

Below you will find a list of suggestions to consider when you encounter a memory problem, that is, when a calculation does not fit within the allocated memory limit.

Note 1: You can use --dry-run to get a memory estimate and to check the parallelization over k-points, domains, and bands, as well as the use of symmetry.

gpaw python --dry-run=N script.py

Mind that the memory estimate from --dry-run is an underestimate. https://gitlab.com/gpaw/gpaw/-/issues/614

Note 2: You can use the monkey_patch_timer() function to write information about memory usage into mem.* files. Call the function before the actual work is started.

from gpaw.utilities.memory import monkey_patch_timer

monkey_patch_timer()

SUBMISSION OPTIONS

  1. Try increasing the total memory or the memory per task in the submission script, if you are sure that everything else (see below) is correct.
  2. Try increasing the number of tasks (CPUs×threading) and nodes, but only if you are sure that everything else (see below) is correct. Note that your calculation can access all of a node’s memory independent of the number of allocated tasks, but not all of that memory is actually available, because some is used by the OS and other running jobs. Also, increasing the number of tasks decreases parallelization efficiency and might decrease the queue priority (depending on the queuing system).

GEOMETRY

  1. Check the model geometry. Perhaps you can build a more compact model, for example with the orthorhombic=False option.
  2. In slab calculations, use just enough vacuum. Mind that the PW mode is free of the egg-box effect, so, with the dipole-layer correction, you can reduce the vacuum layer significantly. Just check for energy convergence.
    https://wiki.fysik.dtu.dk/gpaw/tutorialsexercises/electrostatics/dipole_correction/dipole.html
  3. Ensure that symmetry is used. Sometimes the calculator uses less symmetry than is present; in that case, recheck the geometry. Remember that you can preserve symmetry during optimization (see the sketch after this list). https://wiki.fysik.dtu.dk/ase/ase/constraints.html#the-fixsymmetry-class
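
As a sketch of the last point (EMT and a simple Cu crystal are used only to keep the example self-contained; spglib must be installed for FixSymmetry):

from ase.build import bulk
from ase.calculators.emt import EMT
from ase.optimize import BFGS
from ase.spacegroup.symmetrize import FixSymmetry  # documented on the ASE constraints page

atoms = bulk('Cu', 'fcc', a=3.7)
atoms.set_constraint(FixSymmetry(atoms))  # keep the initial space group during relaxation
atoms.calc = EMT()
BFGS(atoms, logfile=None).run(fmax=0.01)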

PARALLELIZATION

In general, parallelization over domains requires less memory than parallelization over bands and k-points, but the default order of parallelization is k-points, then domains, then bands. Remember the formula kpts×domain×bands = N, where N is the number of tasks (CPUs).

  1. In most cases, the default parallelization with symmetry is most efficient in terms of memory usage.
  2. Reprioritizing parallelization over domain can reduce memory consumption, but also slow down the calculation as parallelization over k-points is usually more time-efficient.
  3. Parallelization over any degree of freedom can be suppressed by setting, for example, parallel = {'domain': 1} for domains. In the LCAO mode, you should check whether parallelizing over bands, like parallel = {'band': 2}, helps with the memory issue (a sketch follows this list).
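
A minimal sketch of steering the parallelization (values are placeholders; remember that the product of the kpt, domain, and band group sizes must match the number of tasks):

from gpaw import GPAW, PW

calc = GPAW(mode=PW(400),
            xc='PBE',
            kpts={'density': 4.0},
            parallel={'kpt': None,   # let GPAW choose the k-point parallelization
                      'domain': 4,   # prioritize domain decomposition to save memory
                      'band': 1},    # no band parallelization
            txt='out.txt')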

CALCULATOR PARAMETERS

  1. Consider using a different mode. “With LCAO, you have fewer degrees of freedom, so memory usage is low. PW mode uses more memory and FD a lot more.” https://wiki.fysik.dtu.dk/gpaw/documentation/basic.html#manual-mode
  2. Change calculation parameters, like h, the plane-wave cutoff, setups (like setups={'Pt': '10'}), basis (like basis={'H': 'sz', 'C': 'dz'}), etc. (a combined sketch follows this list). Check for convergence of properties, as in this tutorial: wiki.fysik.dtu.dk/gpaw/summerschools/summerschool22/catalysis/catalysis.html#extra-material-convergence-test
  3. It is possible to reduce the memory by changing the mixer options.
    https://wiki.fysik.dtu.dk/gpaw/documentation/convergence.html
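
A sketch combining the parameters mentioned above (all values are illustrative placeholders and should be checked for convergence, as in the linked tutorial):

from gpaw import GPAW, Mixer

calc = GPAW(mode='lcao',
            xc='PBE',
            h=0.2,                         # real-space grid spacing
            basis={'H': 'sz', 'C': 'dz'},  # reduced LCAO basis sets
            setups={'Pt': '10'},           # alternative Pt setup with fewer valence electrons
            mixer=Mixer(beta=0.05, nmaxold=3, weight=50.0),  # mixer options can also affect memory
            txt='out.txt')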