Practical Python and LLMs for FinTech


The Role of Python in Fintech

1. Overview

Python is a high-level programming language valued for its simplicity and readability. It’s widely used in data analysis, automation, machine learning, and backend development. Its strength comes from a large ecosystem of libraries such as pandas, numpy, matplotlib, and scikit-learn, which make complex data and AI tasks more efficient.

2. In Fintech

Python plays a central role in financial data processing and modeling. It supports:

  • Data Analysis & Visualization: Using pandas and matplotlib to clean and interpret large financial datasets.
  • Financial Forecasting: Predicting stock trends, credit risk, and market behavior with statsmodels or Prophet.
  • Automation: Powering trading bots, API connections, and routine financial operations.
  • AI Applications: Enabling contract summarization, fraud detection, and other intelligent tools through OpenAI and Transformers libraries.

Key Takeaway

Python bridges traditional finance and AI innovation, making it one of the most essential tools for modern Fintech professionals.


How to Install Python

1. Anaconda (Recommended for Beginners)

Anaconda is an all-in-one platform that includes Python, Jupyter Notebook, and a built-in package manager called conda. It offers a simple graphical interface for managing environments and installing libraries without using command lines.
You can download it directly from anaconda.com.

2. Native Python + pip (For Advanced Users)

For more control, you can install Python manually from python.org.
Then use pip install <package-name> to add libraries from PyPI.

  • Recommended IDEs:

VS Code (via code.visualstudio.com)

PyCharm (from JetBrains)

IDLE Shell (bundled with Python by default)


Python Library Categories for ML

THREE ESSENTIAL PYTHON LIBRARY CATEGORIES


Data Handling & Analysis

OS & Glob

What is OS?
The OS module is a built-in Python library that allows you to interact directly with your computer’s operating system. It’s commonly used for tasks such as navigating folders, managing file paths, renaming or deleting files, and creating new directories.
In data projects, OS helps automate repetitive file-handling tasks — for example, reading all files in a specific folder or organizing output directories for analysis results.


What is Glob?
The Glob module is another built-in Python tool that helps you find files using wildcard patterns (like *.csv or data_*.txt). It’s especially useful when you need to process multiple files at once — for instance, loading all CSV files from a folder for combined data analysis.

In short, OS manages your system’s file structure, while Glob helps you quickly locate and handle multiple files — together, they form the foundation of efficient data handling and automation in Python.
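A minimal sketch of the two modules working together. It uses a temporary scratch folder so it runs anywhere; the file names are made up for illustration:

```python
import glob
import os
import tempfile

# Create a scratch folder with a few sample files (stand-ins for real data exports).
workdir = tempfile.mkdtemp()
for name in ("jan.csv", "feb.csv", "notes.txt"):
    open(os.path.join(workdir, name), "w").close()

# os handles paths and directories; glob matches files against a wildcard pattern.
csv_files = sorted(glob.glob(os.path.join(workdir, "*.csv")))
print([os.path.basename(p) for p in csv_files])  # ['feb.csv', 'jan.csv']
```

The same pattern scales to real projects: point the wildcard at a data folder and loop over the matches.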


NumPy

NumPy is a powerful library for numerical and scientific computing in Python. It provides high-speed array and matrix operations, making it far more efficient than standard Python lists. NumPy forms the foundation for most data science and AI libraries, such as scikit-learn, TensorFlow, and PyTorch, by handling the underlying mathematical calculations.
In short, NumPy helps Python handle large-scale numerical data quickly and efficiently.

import numpy as np

a = np.array([1, 2, 3])
print(a.mean())  # 2.0


Pandas

Pandas is a library designed for data manipulation and analysis. It introduces data structures like the DataFrame, which works much like an Excel table but with far greater flexibility and power.
It’s widely used in financial modeling, transaction analysis, and time series forecasting, allowing users to clean, reshape, and summarize complex datasets easily.
In essence, Pandas is your go-to tool for turning raw data into structured, analyzable information.

import pandas as pd

df = pd.read_csv("sales.csv")
df.groupby("Region").sum()


Machine Learning & Modeling

SCIKIT-LEARN & PYTORCH

Scikit-learn (sklearn)

Scikit-learn is one of the most widely used libraries for classical machine learning in Python. It provides tools for both supervised and unsupervised learning — helping you train, test, and evaluate models with ease.

You can use Scikit-learn to build models like Support Vector Machines (SVM), Random Forests, Logistic Regression, and K-Means clustering — all through a clean, beginner-friendly interface.

Its greatest strength lies in its simplicity and consistency. Whether you’re trying out a quick experiment or building a small ML project, Scikit-learn lets you move from data to predictions with just a few lines of code.

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)


PyTorch

PyTorch is a powerful deep learning framework developed by Facebook (Meta). It’s designed for building flexible and dynamic neural networks, making it a favorite among researchers and AI developers.

Unlike traditional machine learning libraries, PyTorch allows you to experiment, customize, and visualize your models in real time — perfect for research-level AI development.

It’s widely used in cutting-edge fields such as Large Language Models (LLMs), Natural Language Processing (NLP), Computer Vision, and other custom AI applications, forming the foundation of many modern AI breakthroughs.

import torch
import torch.nn as nn

x = torch.tensor([[1.0], [2.0]])  # two samples, one feature each
linear = nn.Linear(1, 1)          # a single linear layer: y = w*x + b
y_pred = linear(x)

Feature | Scikit-learn (sklearn) | PyTorch
Focus | Traditional Machine Learning | Deep Learning / AI
Use Case | Classification, Regression, Clustering, Model Evaluation | Neural Networks, Large Language Models (LLMs), Computer Vision, NLP
Best For | Quick experiments and classical ML workflows | Research-level AI and custom deep learning architectures
Complexity Level | Beginner to Intermediate | Intermediate to Advanced
Learning Style | Simple, consistent APIs for fast implementation | Flexible, customizable framework for creative model design

Visualization & Insights

Matplotlib

Matplotlib is the most widely used Python library for data visualization. It allows you to turn numerical results into clear and insightful charts with just a few lines of code. Commonly used for showing trends over time through line plots and data distributions using histograms or scatter plots, Matplotlib helps transform raw data into visuals that are easy to interpret. It’s the perfect starting point for beginners who want to create quick, professional visuals for reports, presentations, or exploratory analysis.

import matplotlib.pyplot as plt

x = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
y = [100, 120, 90, 140, 110]

plt.plot(x, y, marker='o')
plt.title("Weekly Sales")
plt.xlabel("Day")
plt.ylabel("Revenue")
plt.grid(True)
plt.show()

Matplotlib chart displaying Weekly Sales from Monday to Friday.


Python Library Categories for LLM

THREE ESSENTIAL PYTHON LIBRARY CATEGORIES

Overview

The list below summarizes key Python libraries used in AI model development and evaluation, grouped by function — from loading models to processing data and evaluating results.


1. Model Loading & Inference

  • transformers → Used for loading and running pre-trained AI models (e.g., GPT, BERT).
  • bitsandbytes → Optimizes large model loading and inference with efficient memory usage (quantization).

2. Data & Embedding

  • Datasets → Helps manage and load large text or image datasets for training or testing.
  • sentence-transformers → Used to generate embeddings (vector representations) for text, essential for similarity search and NLP applications.

3. Evaluation & Utilities

  • sklearn.metrics → Provides tools for measuring model performance (e.g., accuracy, precision, recall, F1 score).
  • tqdm → Adds progress bars to loops for easier tracking during long training or evaluation runs.


Model Loading & Inference

Transformers

What is Transformers?
Transformers is a powerful open-source library developed by Hugging Face that makes it easy to use Large Language Models (LLMs) and other state-of-the-art AI models. It provides a simple interface to load and run models such as Flan-T5, LLaMA, Gemma, BERT, and GPT — all with just a few lines of code.

What can it do?

  • Quickly download pre-trained models and tokenizers from the Hugging Face Hub with one command.
  • Perform a wide range of Natural Language Processing (NLP) tasks such as:
    • Text summarization – condensing long content into concise summaries.
    • Text classification – identifying sentiment, intent, or topic.
    • Question answering, translation, text generation, and more.

In short, Transformers gives you ready-to-use AI power without the need to build or train models from scratch.
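A minimal sketch of the pipeline interface for summarization. The model name is just one example checkpoint, and the input text is invented; the model is downloaded from the Hugging Face Hub on first run:

```python
from transformers import pipeline

# Downloads a pre-trained summarization model from the Hugging Face Hub on
# first run. The checkpoint name is one example; any summarization model works.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

report = ("The company reported strong quarterly revenue growth driven by its "
          "payments division, while operating costs rose due to expanded hiring.")
print(summarizer(report, max_length=30, min_length=5)[0]["summary_text"])
```

The same three-line pattern works for other tasks by swapping the task name, e.g. pipeline("text-classification") or pipeline("question-answering").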

Bitsandbytes

What is Bitsandbytes?
Bitsandbytes is a lightweight Python library designed to make large language models (LLMs) more efficient by loading them in low-precision formats such as 8-bit or 4-bit. This drastically reduces the GPU memory usage needed to run massive models — allowing developers to use advanced models even on standard or limited hardware.

What can it do?

  • Load large models efficiently: Makes it possible to run models like LLaMA-7B or Gemma on mid-range GPUs or even consumer-grade devices.
  • Integrates seamlessly with Transformers: When used together, you can load models directly in 4-bit or 8-bit mode, striking a balance between performance and resource savings.

In short, Bitsandbytes helps bring big AI models to smaller machines, enabling faster, cheaper, and more accessible experimentation.
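A configuration sketch of the integration, assuming a CUDA GPU and the bitsandbytes package are available; the model name (google/gemma-2b) is just an example checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantization settings: load weights in 4-bit to cut GPU memory use sharply.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/stability
)

# Passing the config to from_pretrained loads the model directly in 4-bit.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b", quantization_config=bnb_config
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
```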

Used together: efficient LLM inference on real machines.


Data & Embedding

DATASETS & SENTENCE-TRANSFORMERS 

What is Datasets?
Datasets is a library developed by Hugging Face that makes it incredibly easy to load, manage, and process large NLP datasets with just a single line of code. It’s designed to handle text, images, and structured data efficiently — even when working with massive collections.

What can it do?

  • Instantly access 1,000+ ready-to-use datasets such as AG News, Financial PhraseBank, and many others from the Hugging Face Hub.
  • Seamlessly integrates with Transformers for model training and with Pandas for data analysis and preprocessing.

Datasets helps you get data


Sentence-transformers

What is Sentence-transformers?
Sentence-transformers is a library built on top of Transformers, designed to convert sentences or paragraphs into numerical embeddings — compact vector representations that capture meaning.

What can it do?

  • Measure similarity between two sentences or texts.
  • Perform semantic search, text clustering, and information retrieval tasks.
  • Ideal for real-world applications such as comparing financial summaries, matching similar legal clauses, or detecting duplicate content.

Sentence-transformers helps you understand data
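Under the hood, similarity between embeddings is usually measured with cosine similarity. This NumPy sketch uses toy three-dimensional vectors with made-up names (real sentence embeddings have hundreds of dimensions) to illustrate the operation:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" — in practice these would come from a sentence-transformers model.
quarterly_report = [0.9, 0.1, 0.3]
earnings_summary = [0.8, 0.2, 0.4]
cooking_recipe = [0.1, 0.9, 0.0]

print(cosine_similarity(quarterly_report, earnings_summary))  # high, ~0.98
print(cosine_similarity(quarterly_report, cooking_recipe))    # low, ~0.21
```

Semantic search is then just: embed all documents, embed the query, and rank documents by cosine similarity to the query.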


Evaluation & Utilities

tqdm

What is tqdm?
tqdm is a small yet powerful Python library that adds progress bars to your code. It’s especially useful when running loops or processes that take a while — helping you see how much work is done and how much time remains.

What can it do?

  • Display real-time progress bars during long-running tasks such as:
    • Model predictions or inference
    • Data loading and preprocessing
    • Batch processing or training loops
  • Works seamlessly with Python loops, Pandas, and even machine learning workflows.
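A minimal sketch: wrapping any iterable in tqdm() prints a live progress bar while the loop body (a stand-in for real work here) runs unchanged:

```python
import time
from tqdm import tqdm

total = 0
# tqdm wraps the iterable and renders a progress bar with rate and ETA.
for batch in tqdm(range(5), desc="Processing batches"):
    time.sleep(0.01)  # stand-in for real work (inference, preprocessing, ...)
    total += batch

print(total)  # 0 + 1 + 2 + 3 + 4 = 10
```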


sklearn.metrics

What is sklearn.metrics?
sklearn.metrics is a submodule of the Scikit-learn library that provides a wide range of evaluation metrics to measure how well your machine learning models perform. It helps you go beyond just accuracy to understand the true quality of your model’s predictions.

What can it do?

Commonly used in classification tasks and financial risk modeling, where evaluation accuracy directly impacts decision-making.

  • Calculate key performance metrics such as:
    • Accuracy – how often predictions are correct
    • Precision – how many of the predicted positives are actually correct
    • Recall – how well the model identifies true positives
    • F1-score – the balance between precision and recall
    • AUC (Area Under the Curve) – measures the model’s ability to distinguish between classes
  • Compare your model’s predictions against actual labels to assess reliability and fairness.
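A minimal sketch on made-up labels (1 = risky loan, 0 = safe loan), showing how the metrics tell different stories about the same predictions:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy data for illustration: 1 = risky loan, 0 = safe loan.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print(accuracy_score(y_true, y_pred))   # 5 of 6 correct -> ~0.833
print(precision_score(y_true, y_pred))  # no false positives -> 1.0
print(recall_score(y_true, y_pred))     # found 3 of 4 risky loans -> 0.75
print(f1_score(y_true, y_pred))         # harmonic mean of the two -> ~0.857
```

Here accuracy alone would hide that the model missed one of the four risky loans — exactly the kind of error that matters most in credit risk.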

tqdm improves your workflow experience; sklearn.metrics improves your model assessment.


Deploying LLMs

OPEN VS CLOSED

Aspect | Open-Source Models | Proprietary Models
Installation | Requires local setup with Python, dependencies, and sometimes GPU configuration. | Runs through hosted APIs (e.g., OpenAI, Anthropic, Google) — no local installation needed.
Inference | Executed locally or on self-managed cloud platforms. | Performed remotely via API calls to the provider’s servers.
Strengths | High customizability, data privacy, and full control over model parameters. | Ease of use, powerful performance, and instant scalability.
Drawbacks | Requires more hardware resources and setup effort. | Limited customization and recurring API fees.
Examples | LLaMA, Gemma, Mistral, Flan-T5, Falcon | GPT-4, GPT-3.5 (OpenAI), Claude (Anthropic), Gemini (Google), Amazon Titan

Conclusion:
What Have We Learned Today?

1. Python empowers FinTech innovation
Python’s flexibility and simplicity make it the ideal language for financial technology. By mastering libraries like pandas, numpy, scikit-learn, and matplotlib, you can efficiently handle financial data, build predictive models, and visualize insights that drive smarter business decisions.

2. Large Language Models (LLMs) expand AI capabilities
Modern AI tools such as transformers, bitsandbytes, and sentence-transformers make it possible to integrate LLMs into FinTech workflows — from automated summarization and text classification to semantic search and risk analysis.

3. Real-world deployment creates real impact
Understanding how to deploy models — from GPU setup to loading advanced models like Gemma-2B — bridges the gap between theory and practice. These skills enable you to bring AI solutions to production, delivering measurable value in real-world financial applications.

In essence: You’ve learned how Python and AI come together to transform FinTech — turning data, models, and automation into meaningful business outcomes.


Exercise Questions:

Q1. How is Python used in FinTech?
Name two real examples and mention which Python libraries might be used in each.

Q2. What’s the difference between installing Python through Anaconda and Native Python?
Who should use each option, and why?

Q3. What are the purposes of the os and glob modules in Python?
How can they make batch file processing easier?

Q4. What do NumPy and Pandas help you do?
Explain their main features and how they’re used in financial data analysis.

Q5. How do Scikit-learn and PyTorch differ?
When would you use each, and can you give one example model for both?

Q6. What are Transformers and Bitsandbytes, and how do they work together?
Explain how they help run large language models on limited hardware.

Q7. How would you use Sentence-transformers for semantic search?
Describe the basic workflow and the libraries involved.


THANK YOU