Installation of Galactica is as easy as:
conda create -n papers python=3.8
conda activate papers
pip install galai transformers accelerate
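Before downloading any weights, you can check that the environment is in place. This is a sketch that uses only the standard library's importlib.metadata, which is available from Python 3.8 (the version created above). torch is not named in the pip command. I assume it is installed as a dependency (accelerate requires it), and the script needs it for return_tensors="pt".

```python
# check_install.py: report which of the packages installed above are present
from importlib import metadata  # standard library since Python 3.8
from typing import Dict, Optional, Sequence

# distribution names from the pip command above, plus torch (assumed to arrive as a dependency)
PACKAGES = ("galai", "transformers", "accelerate", "torch")


def installed_versions(names: Sequence[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:15s} {version or 'NOT INSTALLED'}")
```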
Now you can run the smaller Galactica models (125m, 1.3b, 6.7b) on a CPU. Here is my script:
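Which of these sizes is practical on a laptop depends mostly on RAM. A back-of-the-envelope estimate is parameters × bytes per parameter. The sketch below uses the nominal sizes from the model names and counts the weights only. Activations and Python overhead come on top, so treat the numbers as lower bounds.

```python
# Rough memory needed just to hold the weights: parameters x bytes per parameter
SIZES = {"125m": 125e6, "1.3b": 1.3e9, "6.7b": 6.7e9}  # nominal sizes from the model names
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_gib(n_params: float, dtype: str = "float32") -> float:
    """Size of the weights in GiB for the given parameter count and dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for name, n in SIZES.items():
    print(f"{name:>5}: {weight_gib(n):5.1f} GiB (float32), "
          f"{weight_gib(n, 'float16'):5.1f} GiB (float16)")
```

In float32, this comes to roughly 0.5, 4.8 and 25 GiB. That is why the 6.7b model needs a well-equipped machine even though it runs on a CPU.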
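from transformers import AutoTokenizer, OPTForCausalLM
import sys
MODEL = "facebook/galactica-6.7b"  # or facebook/galactica-125m / facebook/galactica-1.3b on smaller machines
tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token_id = 1          # Galactica's <pad> token
tokenizer.padding_side = 'left'     # decoder-only models should be padded on the left
tokenizer.model_max_length = 200    # length that padding='max_length' pads prompts to
model = OPTForCausalLM.from_pretrained(MODEL, device_map="auto")  # device_map="auto" needs accelerate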
#input_text = '# Introduction \n\n The main idea of the paper "Supervised hashing for image retrieval via image representation learning" is'
#input_text = "# Review \n\n The main idea of the paper 'On the thickness of the double layer in ionic liquids'"
#input_text = "# Review High entropy alloys in electrocatalysis"
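input_text = sys.argv[1].replace('\\n', '\n')  # a single-quoted shell '\n' arrives as two characters; make it a real newline
inputs = tokenizer(input_text, padding='max_length', return_tensors="pt")  # left-padded to model_max_length
outputs = model.generate(inputs.input_ids,
                         attention_mask=inputs.attention_mask,  # so the model ignores the <pad> tokens
                         max_new_tokens=200,
                         do_sample=True,
                         temperature=0.7,
                         top_k=25,
                         top_p=0.9,
                         no_repeat_ngram_size=10,
                         early_stopping=True)
# skip_special_tokens drops <pad> and </s>; str.lstrip('<pad>') would strip any leading <, p, a, d or > characters from the text too
print(tokenizer.decode(outputs[0], skip_special_tokens=True))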
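The generate call combines several sampling settings. To make them concrete, here is a pure-Python sketch of a single decoding step. It covers temperature scaling, top-k, top-p (nucleus) filtering, and the n-gram ban behind no_repeat_ngram_size. It is an illustration, not the transformers code. As far as I know, transformers applies temperature, then top-k, then top-p, and this sketch follows that order. The function names are my own.

```python
# Pure-Python sketch of one sampling step (illustration only, not the transformers implementation)
import math
import random
from typing import Dict, List, Sequence


def filtered_distribution(logits: Sequence[float], temperature: float = 0.7,
                          top_k: int = 25, top_p: float = 0.9) -> Dict[int, float]:
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    # 1. temperature: dividing the logits by a value < 1 sharpens the distribution
    scaled = [x / temperature for x in logits]
    # 2. top-k: keep only the k highest-scoring token ids (top_k <= 0 disables it)
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    # softmax over the survivors (subtract the max for numerical stability)
    m = scaled[ranked[0]]
    exps = {i: math.exp(scaled[i] - m) for i in ranked}
    total = sum(exps.values())
    probs = {i: e / total for i, e in exps.items()}
    # 3. top-p: keep the smallest set of most likely tokens whose mass reaches top_p
    kept: List[int] = []
    mass = 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # renormalise over the nucleus
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


def sample_next(logits: Sequence[float], rng: random.Random, **kwargs) -> int:
    """Draw one token id from the filtered distribution (what do_sample=True does)."""
    dist = filtered_distribution(logits, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


def banned_by_ngram(tokens: Sequence[int], n: int) -> List[int]:
    """Token ids that would repeat an n-gram already in `tokens` (no_repeat_ngram_size=n)."""
    if len(tokens) < n:
        return []
    prefix = tuple(tokens[-(n - 1):]) if n > 1 else ()
    banned = []
    for start in range(len(tokens) - n + 1):
        # an earlier n-gram whose first n-1 tokens match the current tail bans its last token
        if tuple(tokens[start:start + n - 1]) == prefix:
            banned.append(tokens[start + n - 1])
    return banned


if __name__ == "__main__":
    rng = random.Random(0)
    toy_logits = [2.0, 1.5, 0.3, -1.0, -3.0]  # made-up scores for a 5-token vocabulary
    print(filtered_distribution(toy_logits, top_k=3))
    print(sample_next(toy_logits, rng, top_k=3))
    print(banned_by_ngram([7, 8, 9, 7, 8], n=3))  # 9 would repeat the 3-gram (7, 8, 9)
```

With no_repeat_ngram_size=10, as in the script, only exact repeats of 10-token runs are blocked. This is a mild constraint that mainly stops the model from looping.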
Run it on your laptop as:
python script.py "YOUR QUERY"
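If you switch between model sizes often, a small argparse front end saves you from editing the script. This is a hypothetical sketch. parse_args, --size and --max-new-tokens are my own names, not part of Galactica or transformers. The returned values would be passed to from_pretrained and generate in the script.

```python
# cli_args.py: a hypothetical command-line front end for the script above
import argparse
from typing import List, Optional

MODEL_SIZES = ("125m", "1.3b", "6.7b")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sample text from a Galactica model")
    parser.add_argument("prompt", help="prompt text; a literal \\n becomes a newline")
    parser.add_argument("--size", choices=MODEL_SIZES, default="6.7b",
                        help="which facebook/galactica-* checkpoint to load")
    parser.add_argument("--max-new-tokens", type=int, default=200)
    parser.add_argument("--temperature", type=float, default=0.7)
    args = parser.parse_args(argv)
    # single-quoted shell strings keep \n as two characters; turn them into real newlines
    args.prompt = args.prompt.replace("\\n", "\n")
    # checkpoint name to hand to AutoTokenizer/OPTForCausalLM.from_pretrained
    args.model_name = f"facebook/galactica-{args.size}"
    return args
```

For example, parse_args(["High entropy alloys in catalysis", "--size", "125m"]).model_name is "facebook/galactica-125m".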
For example, let us check what Galactica knows about high-entropy alloys (HEAs):
python script.py "High entropy alloys in catalysis "
The 6.7b model gives the following (do_sample=True, so every run produces a different text):
High entropy alloy catalysis (HEAC) is a new concept for catalytic applications. A series of HEAs with a similar chemical composition (CoCrFeNiMn) were prepared by arc-melting and characterized by X-ray diffraction (XRD), X-ray photoelectron spectroscopy (XPS), and high resolution transmission electron microscopy (HRTEM). The catalytic performance of the HEAs was tested in the CO oxidation reaction. The catalytic activity of the HEAs is compared with that of the pure metals and the HEA-supported Pt catalysts. The results show that the HEAs are active in the CO oxidation reaction, and that the activity is comparable to that of the Pt catalysts. The HEAs have a much lower activity than the pure metals. XPS and HRTEM results show that the HEAs have a different surface structure than the pure metals, which is probably the reason for the high catalytic activity of the HEA.
Galactica
Let us also ask it about a paper by the CHEAC founders:
python script.py '# Introduction \n\n The main idea of the paper "Self-supported Pt–CoO networks combining high specific activity with high surface area for oxygen reduction" is'
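Prompts in this "main idea of the paper" shape are easy to generate for a list of titles. The helper below is a sketch, and its function names are hypothetical. It builds the prompt with real newlines and quotes it with shlex.join (Python 3.8+), so the shell passes actual line breaks to script.py instead of the two characters \n.

```python
# Build "main idea of the paper" prompts and shell-safe command lines for script.py
import shlex


def main_idea_prompt(title: str, section: str = "Introduction") -> str:
    """Prompt in the same shape as the example above, with real newlines."""
    return f'# {section} \n\n The main idea of the paper "{title}" is'


def script_command(title: str) -> str:
    """Shell-quoted command line that runs script.py on that prompt."""
    return shlex.join(["python", "script.py", main_idea_prompt(title)])


if __name__ == "__main__":
    print(script_command("Self-supported Pt–CoO networks combining high specific "
                         "activity with high surface area for oxygen reduction"))
```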
“Self-supported Pt–CoO networks combining high specific activity with high surface area for oxygen reduction” is to report the synthesis of highly porous self-supported electrocatalysts, which combine high surface area with high specific activity for the oxygen reduction reaction (ORR). The synthesis is based on a self-supported network of Pt doped CoO (Pt-CoO) nanoparticles, which are prepared by a two-step process. In the first step, Pt-doped Co₃O₄ (Pt-Co₃O₄) nanoparticles are formed via the thermal decomposition of Co- and Pt-oleate complexes, followed by the oxidation of Pt-Co₃O₄ to Pt-CoO at 550 °C. The resulting porous self-supported network consists of Pt-CoO nanoparticles with diameters of 4–5 nm and a high surface area of 130 m2/g. The specific activity of the Pt-CoO network for the ORR is 2.6 times higher than that of the Pt/C catalyst, and the mass activity is 2.
Galactica
You can also run the same code in a Google Colab notebook (saved to your Google Drive).
Here are some links:
https://huggingface.co/facebook/galactica-125m
https://huggingface.co/spaces/morenolq/galactica-base/blob/main/app.py
https://github.com/paperswithcode/galai
https://github.com/paperswithcode/galai/issues/39
P.S. https://chat.openai.com/chat seems to be much cooler!
