The Beginner’s Guide to Language Models with Python
By Iván Palomares Carrascosa on March 11, 2025 in Language Models
Introduction
Language models, often referred to by the acronym LLM (Large Language Model) in their large-scale form, fuel powerful AI applications like conversational chatbots, AI assistants, and other intelligent text and content generation apps. This article provides a concise, basic understanding of LLMs, followed by three code-based introductory examples that illustrate their use through well-known frameworks: Hugging Face, Ollama, and LangChain. Don’t worry if some of these terms sound unfamiliar at this point: by the end of this article, you will be acquainted with all of them.
What are Language Models?
At their essence, language models are natural language processing (NLP) systems that predict the next word in a sequence. They learn complex patterns of human language from exposure to vast text datasets, typically thousands, millions, or even billions of documents. Language models, and LLMs in particular, have since evolved to handle many language understanding tasks beyond next-word generation: they can answer questions, summarize or translate texts, classify them, and extract insights from them.
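To make next-word prediction concrete, here is a minimal sketch of a bigram model: it counts which word follows which in a tiny corpus and predicts the most frequent follower. The corpus and the function names `train_bigram_model` and `predict_next_word` are made up for illustration; real LLMs use neural networks trained on vastly more text, not simple counts.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the corpus."""
    follower_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follower_counts[current_word][next_word] += 1
    return follower_counts


def predict_next_word(follower_counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = follower_counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical three-sentence corpus, far too small for real use
corpus = [
    "tokyo is the capital of japan",
    "paris is the capital of france",
    "the capital is a large city",
]
model = train_bigram_model(corpus)
print(predict_next_word(model, "the"))      # capital
print(predict_next_word(model, "capital"))  # of
```

The same idea, predicting what comes next from what has been seen so far, underlies LLMs, which replace the count table with billions of learned parameters.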
Applications based on LLMs exist in different forms:
- API-based models like OpenAI’s GPT-4 (one of the models behind ChatGPT) and Anthropic’s Claude run on their providers’ servers and are accessed over the internet through an API, a website, or a downloadable app.
- Local models like LLaMA, Mistral, and Qwen are normally run on personal or on-premises hardware.
- Hybrid setups rely on frameworks like LangChain, which is not a model itself but integrates hosted or local models with other frameworks inside an application.
In the remainder of this article, we will stick to open-source models that you can try for free. The examples are also written to run in a Google Colab or Jupyter notebook, which simplifies or even bypasses the local configuration steps your machine would otherwise need. Feel free to adapt them to a Python IDE if you are familiar with one.
Using Hugging Face’s Transformers Library
Hugging Face is a repository of open-source pre-trained language models, ready to load and use for NLP tasks like text generation, translation, and sentiment analysis. Its centerpiece is the Transformers library, which integrates with popular Python deep learning libraries like PyTorch, JAX, and TensorFlow. Best of all, these models are free to use and require minimal setup, making AI development accessible to everyone.
Let’s start this practical tour by installing the transformers library in a new notebook of your own:
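In a Colab or Jupyter notebook, the usual shorthand is a single cell containing `!pip install transformers`. As a minimal sketch of the same step in plain Python, the helper below checks whether a package can be imported and only calls pip when it cannot. The name `ensure_installed` is hypothetical and not part of any library.

```python
import importlib
import importlib.util
import subprocess
import sys


def ensure_installed(package: str) -> bool:
    """Return True once `package` is importable, installing it via pip if needed.

    Assumes the pip package name and the import name are the same,
    which holds for `transformers` but not for every package.
    """
    if importlib.util.find_spec(package) is None:
        # Run pip with the same interpreter that runs this code
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        importlib.invalidate_caches()
    return importlib.util.find_spec(package) is not None
```

Calling `ensure_installed("transformers")` at the top of a notebook installs the library on the first run and does nothing on later runs.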
We will now load a model from Hugging Face, specifically GPT-2, for text generation. Loading a pre-trained model normally means loading two things: the model itself and a compatible tokenizer. The tokenizer splits input text into logical language units called tokens (whole words or pieces of words) before they are passed to the language model, which processes them and generates follow-up text.
The following code imports the necessary packages and initializes the language model and tokenizer.
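A minimal sketch of this step, assuming the `AutoTokenizer` and `AutoModelForCausalLM` classes from Transformers and the `gpt2` checkpoint name on the Hugging Face Hub. The wrapper `load_model_and_tokenizer` is a hypothetical helper; the import sits inside it so that nothing is downloaded until the function is called.

```python
MODEL_NAME = "gpt2"  # smallest GPT-2 checkpoint on the Hugging Face Hub


def load_model_and_tokenizer(model_name: str = MODEL_NAME):
    """Download (on first use) and return a tokenizer and a causal language model."""
    # Imported lazily: requires `transformers` plus a backend such as PyTorch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    return tokenizer, model
```

In a notebook cell, `tokenizer, model = load_model_and_tokenizer()` fetches the files the first time and reuses the local cache afterwards.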
Next, we define a prompt for the model: typically a query, question, or request to which the language model will try to generate a response. Some less sophisticated models simply generate follow-up words that continue the input sequence, for instance continuing a tale that starts with “Once upon a time”. In our case, we will ask GPT-2 what the capital of Japan is and … fingers crossed.
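The prompt, encoding, generation, and decoding can be sketched as one small function. This is an assumption about how the steps fit together rather than a definitive listing: `generate_reply` is a hypothetical helper, the prompt wording is illustrative, and `tokenizer` and `model` are expected to be a GPT-2 tokenizer and model loaded through Transformers (for example with `AutoTokenizer.from_pretrained("gpt2")` and `AutoModelForCausalLM.from_pretrained("gpt2")`). The default of 50 matches the maximum response length used in this article.

```python
def generate_reply(prompt, tokenizer, model, max_length=50):
    """Encode a prompt, let the model extend it, and decode the result to text."""
    # Encoding: text -> token IDs (PyTorch tensors, hence "pt")
    inputs = tokenizer(prompt, return_tensors="pt")

    # Generation: the model appends tokens until the sequence reaches
    # max_length (prompt included) or it emits its end-of-sequence token
    output_ids = model.generate(
        **inputs,
        max_length=max_length,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no padding token
    )

    # Decoding: token IDs -> text
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


prompt = "What is the capital of Japan?"
```

Calling `print(generate_reply(prompt, tokenizer, model))` in the notebook prints the prompt followed by the model’s continuation.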
Getting technical, the above code involves two processing steps: encoding and decoding. The input sequence (the prompt) must be encoded into a numerical representation, a sequence of token IDs, that the model can process. After the model generates its raw (numerical) response, that response is decoded back into text so that we can read it.
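To make the two steps concrete without downloading a model, here is a toy word-level tokenizer. It is only an illustration: the `ToyTokenizer` class is hypothetical, and GPT-2’s real tokenizer uses byte-pair encoding over subword units rather than a whitespace split.

```python
class ToyTokenizer:
    """Map each distinct word to an integer ID and back."""

    def __init__(self, texts):
        words = sorted({word for text in texts for word in text.split()})
        self.word_to_id = {word: idx for idx, word in enumerate(words)}
        self.id_to_word = {idx: word for word, idx in self.word_to_id.items()}

    def encode(self, text):
        """Text -> list of token IDs (raises KeyError on unknown words)."""
        return [self.word_to_id[word] for word in text.split()]

    def decode(self, token_ids):
        """List of token IDs -> text."""
        return " ".join(self.id_to_word[idx] for idx in token_ids)


tokenizer = ToyTokenizer(["what is the capital of japan"])
token_ids = tokenizer.encode("the capital of japan")
print(token_ids)                    # [4, 0, 3, 2]
print(tokenizer.decode(token_ids))  # the capital of japan
```

The model only ever sees the list of integers; the mapping back to words happens entirely in the tokenizer.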
This is the decoded response:
Well, at least it gave the right answer! Pre-trained models like this one are comparatively small and manageable for testing environments, but far less powerful than the models behind ChatGPT, so it is not surprising that their text generation is more limited. In particular, the maximum response length of 50 tokens is only an upper bound: the model keeps producing tokens until it reaches that limit or emits an end-of-sequence token, and GPT-2 tends to run on toward the limit rather than keeping its answer short, simple, and logical.
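The effect of a maximum length can be sketched as a plain loop. This is a simplified illustration, not the Transformers implementation: `next_token` stands for any function that picks the next token ID, and `EOS_ID` is a made-up end-of-sequence ID.

```python
EOS_ID = 0  # hypothetical end-of-sequence token ID


def generate(next_token, prompt_ids, max_length, eos_id=EOS_ID):
    """Append tokens until eos_id appears or the sequence holds max_length tokens."""
    sequence = list(prompt_ids)
    while len(sequence) < max_length:
        token = next_token(sequence)
        sequence.append(token)
        if token == eos_id:
            break
    return sequence


def never_stops(sequence):
    """A stand-in model that never emits the end-of-sequence token."""
    return 7


def stops_early(sequence):
    """A stand-in model that emits end-of-sequence once 4 tokens exist."""
    return EOS_ID if len(sequence) >= 4 else 7


print(len(generate(never_stops, [5, 6], max_length=50)))  # 50
print(generate(stops_early, [5, 6], max_length=50))       # [5, 6, 7, 7, 0]
```

A model that rarely produces the end-of-sequence token behaves like `never_stops`: its output length is decided by the limit, not by the content of the answer.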
Time to look at another example and introduce another framework.
Running Language Models Locally with Ollama
Ollama is a framework for running language models locally in a streamlined and efficient way. For example, one of its available models is Qwen, a versatile open-source LLM capable of generating human-like text.
Once a model has been downloaded, Ollama serves it from your own machine, so inference can run offline, without depending on external APIs or internet connectivity. The downside is the significant disk space needed to store the framework and its models on your machine. The usual way to run one of Ollama’s language models is to install and configure it locally and run our Python code against it, but there is a workaround that emulates the process from the comfort of a Google Colab notebook.
Let’s see how.
We first install Colab-xterm, a Google Colab extension that runs a command-line terminal inside the notebook.
Next, the following simple instruction will open a terminal inside the notebook:
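Installing the extension and opening the terminal can be sketched together in Python. In a notebook, the usual shorthand is `!pip install colab-xterm` followed by the `%load_ext colabxterm` and `%xterm` magics; the hypothetical helper `open_notebook_terminal` below wraps the same commands and does nothing when it is not running inside an IPython-based notebook.

```python
import subprocess
import sys


def open_notebook_terminal() -> bool:
    """Install colab-xterm and open a terminal; return False outside a notebook."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed, so this is not a notebook

    shell = get_ipython()
    if shell is None:
        return False  # plain Python interpreter, no notebook to attach to

    # Equivalent to the notebook line: !pip install colab-xterm
    subprocess.check_call([sys.executable, "-m", "pip", "install", "colab-xterm"])
    # Equivalent to the notebook magics: %load_ext colabxterm and %xterm
    shell.run_line_magic("load_ext", "colabxterm")
    shell.run_line_magic("xterm", "")
    return True
```

Run from a Colab cell, `open_notebook_terminal()` should display a terminal below the cell; run from a regular script, it simply returns `False`.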
machinelearningmastery.com