-
Understanding embeddings with EmbeddingGemma
There is a lot of talk about LLMs, the so-called Large Language Models such as ChatGPT, Gemini or Llama: models that can write text, answer questions, and summarize documents. In short, models trained to generate language. Alongside this family there is another kind of model, less well known but no less important: embedding models. Unlike LLMs, they do not produce sentences; their purpose is to take a text and turn it into a sequence of numbers, a vector that represents its meaning. An LLM, too, relies internally on an embedding system in order to work. Every word, every piece of a word, is turned into numbers before it can be processed. The difference is that in LLMs this step stays hidden; it only serves as the basis for generating language. In embedding models, on the other hand, this transformation is the goal itself. To put it concretely, if we ask an LLM "what is a signature... Read all →
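As a minimal sketch of what this looks like in practice, the snippet below turns a few sentences into vectors with the sentence-transformers library. The model id `google/embeddinggemma-300m` and the example sentences are assumptions for illustration, not taken from the post.

```python
# Minimal sketch: turning text into embedding vectors with EmbeddingGemma.
# Assumes sentence-transformers is installed and that the model id
# "google/embeddinggemma-300m" is available (an assumption; check the model card).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")

sentences = [
    "What is a digital signature?",
    "How do I sign a document electronically?",
    "Recipe for homemade bread",
]

# encode() returns one vector per sentence: the numeric representation of its meaning.
embeddings = model.encode(sentences)
print(embeddings.shape)  # e.g. (3, 768), depending on the model's embedding size

# Semantically similar sentences end up close to each other: cosine similarity
# between the two signature-related questions should be higher than with the recipe.
print(util.cos_sim(embeddings[0], embeddings[1]))
print(util.cos_sim(embeddings[0], embeddings[2]))
```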
-
Weights manipulation
Some time ago I asked myself: do we really need days of computation and powerful GPUs to understand how an open-weights language model manages its safety mechanisms? More importantly, is there a fast and reversible way, one that does not require building abliterated models, to make a model more compliant with specific requests? That question started a piece of research that was not easy. Getting the code to work across different models took time, with many adjustments needed to fix library issues and memory limits. Once I had a working (though not entirely stable) version, I tried a different approach: modifying the embedding weights of specific tokens at runtime, gradually reducing those linked to refusal (sorry, cannot, dangerous) and increasing those linked to compliance (sure, help, explain). With some models this worked well: small changes to refusal tokens slowly weakened the safety mechanisms, with... Read all →
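The excerpt does not show the code at this point, but the mechanism it describes, scaling the embedding rows of chosen tokens in place, can be sketched roughly as below. The model name, token lists, and scaling factors are illustrative assumptions, not the author's actual setup or values.

```python
# Rough sketch of runtime token-embedding manipulation as described in the post.
# Model name, token lists and scaling factors are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the post experiments with various open-weights models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

refusal_words = ["sorry", "cannot", "dangerous"]
compliance_words = ["sure", "help", "explain"]

def token_ids(words):
    # Collect every token id produced by the words (a word may split into sub-tokens).
    ids = set()
    for w in words:
        ids.update(tokenizer.encode(w, add_special_tokens=False))
        ids.update(tokenizer.encode(" " + w, add_special_tokens=False))
    return sorted(ids)

embed = model.get_input_embeddings()          # nn.Embedding: one row per token
original = embed.weight.detach().clone()      # keep a copy so the change stays reversible

with torch.no_grad():
    for i in token_ids(refusal_words):
        embed.weight[i] *= 0.8   # step-by-step reduction of refusal-linked rows
    for i in token_ids(compliance_words):
        embed.weight[i] *= 1.2   # step-by-step boost of compliance-linked rows

# Restoring the saved rows undoes the edit, which is the point of the approach:
# no new model files are created, only in-memory weights are touched.
```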
-
Sentenza
Splitting texts into chunks is an important step in building a good vector database. When embeddings are created for RAG systems, the size and semantic coherence of the segments directly affect the accuracy and relevance of search results. Chunks that are too short fragment the content, while chunks that are too long risk merging unrelated information, making queries less effective. The whole process also depends on the tokenizer used. Finding the right sentence boundaries is essential for applying good chunking strategies. The histogram in the figure, built from a literary text, shows the distribution of 1011 sentences, with an average length of 118.48 characters and a standard deviation of 94.49. This indicates that sentence lengths in the corpus vary widely. One limit of the current method comes from the asymmetric distribution of sentence lengths, with a long tail on the right (see graph). This means that... Read all →
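As a rough illustration of the analysis behind that histogram, the sketch below splits a text into sentences and computes the same length statistics. The regex-based splitter and the file name are assumptions; the post's actual sentence-boundary detection may differ.

```python
# Sketch: sentence-length statistics for chunking analysis.
# The naive regex splitter and the file name are assumptions for illustration.
import re
import statistics

with open("corpus.txt", encoding="utf-8") as f:
    text = f.read()

# Split on ., !, ? followed by whitespace: a crude stand-in for a real
# sentence-boundary detector.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

lengths = [len(s) for s in sentences]
print(f"{len(sentences)} sentences")
print(f"mean length:   {statistics.mean(lengths):.2f} characters")
print(f"std deviation: {statistics.stdev(lengths):.2f}")

# A long right tail in this distribution suggests merging very short sentences
# and splitting very long ones before building embedding chunks.
```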
-
Hello World
This is my first post on GitHub Pages. Read all →