A conceptual search engine discovers relationships between ideas using computational methods rather than keyword matching. MapOfLogic combines four algorithms: BFS pathfinding traces the shortest chain of hyperlinks between two Wikipedia articles, SPARQL ontological reasoning queries Wikidata's classification hierarchies to find common ancestors, TF-IDF cosine similarity measures statistical overlap between concept descriptions, and formal logic generates verifiable propositions from shared Wikidata properties. Each method reveals a different type of connection — editorial, ontological, statistical, and logical.
This page explains the technical mechanics behind each algorithm. Every claim is verifiable — the algorithms are standard computer science, and the data sources are open.
Four Algorithms, One Answer
Each algorithm discovers a different type of connection between two concepts:
- BFS — editorial connections (how Wikipedia editors linked the concepts)
- SPARQL — ontological connections (where the concepts sit in the tree of knowledge)
- TF-IDF — statistical connections (hidden vocabulary overlaps)
- Formal logic — property-based connections (shared attributes with verifiable premises)
Running all four simultaneously gives the most complete picture of how two ideas relate. A connection that appears across multiple methods is stronger than one found by a single method.
BFS Pathfinding: Following the Links
Breadth-First Search is a graph traversal algorithm. It starts from a source node, explores all neighbors at distance 1, then all neighbors at distance 2, and so on. It uses a queue (FIFO) data structure to track which nodes to visit next.
On Wikipedia's graph:
- Nodes = Wikipedia articles (60 million+)
- Edges = internal hyperlinks (hundreds of millions)
- Path = the shortest chain of article-to-article links between two concepts
Time complexity: O(V + E), where V is the number of vertices visited and E the number of edges traversed. In practice, most pairs of Wikipedia articles are connected within 3-6 hops, so the search space stays manageable.
Why BFS and not other algorithms? Wikipedia links are unweighted — a link is a link, with no inherent "cost." BFS is optimal for shortest-path problems in unweighted graphs. Dijkstra's algorithm is designed for weighted graphs and adds unnecessary complexity. DFS (Depth-First Search) does not guarantee finding the shortest path. A* requires a heuristic function, which is difficult to define for semantic connections.
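The queue-based traversal described above can be sketched in a few lines. This is an illustrative toy, not MapOfLogic's implementation: the link graph here is a hand-written dictionary standing in for Wikipedia's hyperlink structure, and the function name is made up for this example.

```python
from collections import deque

def bfs_shortest_path(graph, source, target):
    """Return the shortest chain of links from source to target, or None.

    graph: dict mapping each article title to the titles it links to.
    """
    if source == target:
        return [source]
    queue = deque([source])
    parent = {source: None}          # doubles as the visited set
    while queue:
        node = queue.popleft()       # FIFO: finish distance d before d+1
        for neighbor in graph.get(node, ()):
            if neighbor not in parent:
                parent[neighbor] = node
                if neighbor == target:
                    # Walk parent pointers back to rebuild the path
                    path = [target]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(neighbor)
    return None

# Toy link graph (illustrative, not real Wikipedia data)
links = {
    "Coffee": ["Caffeine", "Ethiopia"],
    "Caffeine": ["Adenosine", "Alkaloid"],
    "Adenosine": ["Sleep"],
    "Ethiopia": ["Africa"],
}
print(bfs_shortest_path(links, "Coffee", "Sleep"))
# ['Coffee', 'Caffeine', 'Adenosine', 'Sleep']
```

Because the queue is FIFO, the first time the target is reached is guaranteed to be along a shortest path, which is exactly why BFS suits an unweighted link graph.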
SPARQL Ontological Reasoning: Climbing the Tree
SPARQL (SPARQL Protocol and RDF Query Language) is a W3C standard for querying structured data in RDF format. Wikidata's public SPARQL endpoint allows anyone to query its 100 million+ entities.
MapOfLogic uses the P279 property ("subclass of") to trace classification hierarchies. Nearly every Wikidata entity sits in a chain of increasingly general classes. The algorithm climbs upward from both concepts until the chains converge at a common ancestor.
This is formal ontological reasoning — not keyword matching, not statistical inference. The connection is derived from the taxonomic structure of human knowledge, encoded by thousands of Wikidata contributors.
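The upward climb to a common ancestor can be illustrated locally. A minimal sketch, assuming a simplified hierarchy where each entity has a single parent (the real system queries Wikidata's P279 relation, which allows multiple parents); entity names and both function names are invented for this example:

```python
def ancestors(entity, subclass_of):
    """Follow P279-style 'subclass of' links upward, recording each
    ancestor with its distance from the starting entity."""
    seen = {}
    depth, current = 0, entity
    while current is not None and current not in seen:
        seen[current] = depth
        current = subclass_of.get(current)
        depth += 1
    return seen

def common_ancestor(a, b, subclass_of):
    """Return the shared ancestor minimizing the combined climb distance."""
    up_a, up_b = ancestors(a, subclass_of), ancestors(b, subclass_of)
    shared = set(up_a) & set(up_b)
    if not shared:
        return None
    return min(shared, key=lambda e: up_a[e] + up_b[e])

# Toy hierarchy (illustrative; real Wikidata entities carry Q-IDs)
subclass_of = {
    "house cat": "felid",
    "felid": "carnivore",
    "gray wolf": "canid",
    "canid": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
}
print(common_ancestor("house cat", "gray wolf", subclass_of))
# carnivore
```

Picking the shared ancestor with the smallest combined distance keeps the answer specific ("carnivore") rather than trivially general ("animal").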
TF-IDF Cosine Similarity: Measuring Overlap
TF-IDF (Term Frequency-Inverse Document Frequency) is an information retrieval technique that converts text into numerical vectors:
- Term Frequency (TF) — how often a word appears in a document
- Inverse Document Frequency (IDF) — how rare that word is across all documents
- TF-IDF weight = TF × IDF — words that are frequent in one document but rare overall get the highest weights
Cosine similarity then measures the angle between two TF-IDF vectors. A cosine of 1.0 means identical word distributions. A cosine of 0.0 means completely different vocabulary. Values between 0.0 and 1.0 indicate varying degrees of similarity.
When applied to Wikipedia articles, TF-IDF reveals that two concepts that seem unrelated may share surprisingly similar vocabulary — indicating a structural connection that neither link analysis nor ontological reasoning would detect.
Formal Logic: Verifiable Propositions
When two concepts share properties in Wikidata, the system constructs formal logical propositions:
Premise 1: Concept A has property X (verified in Wikidata)
Premise 2: Concept B has property X (verified in Wikidata)
Conclusion: A and B share property X, indicating a structural parallel
Each premise is traceable to a specific Wikidata property and entity ID. The conclusion is derived, not generated. This makes formal logic propositions the most auditable form of connection — every step can be independently verified by anyone with access to Wikidata.
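The premise/conclusion construction can be sketched as a simple comparison of property sets. A minimal illustration, assuming properties arrive as plain name-value pairs (the real system keys them by Wikidata property and entity IDs); the function name and the animal data are made up for this example:

```python
def shared_property_propositions(a_name, a_props, b_name, b_props):
    """For each property with the same value on both entities, emit two
    premises and a derived conclusion."""
    propositions = []
    for prop, value in a_props.items():
        if b_props.get(prop) == value:
            propositions.append({
                "premise_1": f"{a_name} has {prop} = {value}",
                "premise_2": f"{b_name} has {prop} = {value}",
                "conclusion": f"{a_name} and {b_name} share {prop} = {value}",
            })
    return propositions

# Illustrative property sets (real data would cite P-numbers and Q-IDs)
dolphin = {"class": "Mammalia", "locomotion": "swimming"}
bat = {"class": "Mammalia", "locomotion": "flight"}
for p in shared_property_propositions("dolphin", dolphin, "bat", bat):
    print(p["conclusion"])
# dolphin and bat share class = Mammalia
```

Because the conclusion is computed mechanically from the two premises, auditing it reduces to checking the two property claims against the source records.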
The Data Sources
Wikipedia
60 million+ articles across 300+ languages. Licensed under CC BY-SA 4.0. Updated continuously by volunteer editors. Used for: BFS pathfinding (link graph) and TF-IDF analysis (article text).
Wikidata
100 million+ structured entities. Licensed under CC0 (public domain). Queryable via SPARQL at query.wikidata.org. Used for: SPARQL ontological reasoning and formal logic propositions.
Both sources are open, free, and verifiable. MapOfLogic does not use proprietary datasets. Every result can be independently confirmed using the public APIs.
Find the hidden connection between any two ideas
LAUNCH MAPOFLOGIC →