# sem

> Entity-level semantic diff on top of Git. Functions, classes, methods instead of lines.

## Overview

sem extends Git with entity-level operations. Instead of tracking lines, sem tracks functions, classes, methods, and types. It parses code with tree-sitter and compares AST-normalized structural hashes to tell cosmetic changes from structural ones.

Git tracks lines. Developers think in functions. sem bridges the gap.

## Install

```
brew install sem-cli
```

Or build from source:
```
git clone https://github.com/Ataraxy-Labs/sem
cd sem/crates && cargo install --path sem-cli
```

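The release build lands at `crates/target/release/sem`. `cargo install` also copies it into Cargo's bin directory (`~/.cargo/bin` by default).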

## Commands

### sem diff
Entity-level diff. Shows which functions/classes were added, modified, deleted, or renamed. Distinguishes cosmetic changes (whitespace/formatting) from structural changes (logic).

```
sem diff                          # working changes
sem diff --staged                 # staged only
sem diff --commit abc1234         # specific commit
sem diff --from HEAD~5 --to HEAD  # commit range
sem diff file1.ts file2.ts        # compare two files (no git needed)
sem diff --format json            # JSON output for agents/CI
sem diff --format plain           # git-status style
sem diff --format markdown        # markdown tables
sem diff --stdin --format json    # read file changes from stdin
sem diff --file-exts .py .rs      # filter by extension
sem diff -v                       # verbose inline content diffs
```
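
To illustrate the cosmetic vs structural distinction, here is a minimal sketch of the idea in Python. It is not sem's implementation: sem uses tree-sitter and xxhash, while this sketch swaps in Python's built-in `ast` module and SHA-256 as stand-ins. The function names `structural_hash` and `classify_change` are hypothetical.

```python
import ast
import hashlib


def structural_hash(source: str) -> str:
    """Hash the parsed syntax tree instead of the raw text."""
    tree = ast.parse(source)
    # The parser has already dropped comments and whitespace, and ast.dump
    # omits line/column attributes by default, so formatting cannot leak in.
    canonical = ast.dump(tree)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify_change(before: str, after: str) -> str:
    """Return 'unchanged', 'cosmetic', or 'structural' for two versions."""
    if before == after:
        return "unchanged"
    if structural_hash(before) == structural_hash(after):
        return "cosmetic"
    return "structural"


original = "def add(a, b):\n    return a + b\n"
reformatted = "def add(a,b):\n    # sum two numbers\n    return a+b\n"
rewritten = "def add(a, b):\n    return a - b\n"

print(classify_change(original, reformatted))  # cosmetic
print(classify_change(original, rewritten))    # structural
```

The text of `reformatted` differs from `original` on every line, yet the two hash identically because they parse to the same tree.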

### sem blame
Entity-level blame. Who last modified each function/class, not each line.

```
sem blame src/auth.ts
sem blame src/auth.ts --json
```

### sem graph
Cross-file entity dependency graph. Shows what each function calls and what calls it.

```
sem graph
sem graph --entity validateToken
sem graph --file-exts .py
sem graph --format json
sem graph --no-default-excludes
```
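
A dependency graph of this kind can be approximated with a small sketch. The version below is an illustration under simplifying assumptions, not sem's algorithm: it handles a single Python source string, uses Python's `ast` module rather than tree-sitter, and only follows plain-name calls between top-level functions. `build_call_graph`, `invert_graph`, and the sample functions are hypothetical names.

```python
import ast


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the top-level functions it calls."""
    tree = ast.parse(source)
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    defined = {fn.name for fn in functions}
    calls: dict[str, set[str]] = {name: set() for name in defined}
    for fn in functions:
        for node in ast.walk(fn):
            # Only plain-name calls such as foo(); attribute calls like
            # obj.method() are skipped in this sketch.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    calls[fn.name].add(node.func.id)
    return calls


def invert_graph(calls: dict[str, set[str]]) -> dict[str, set[str]]:
    """Turn 'what each function calls' into 'what calls each function'."""
    callers: dict[str, set[str]] = {name: set() for name in calls}
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)
    return callers


SAMPLE = '''
def decode(token):
    return token.split(".")

def validate_token(token):
    return len(decode(token)) == 3

def login(token):
    return validate_token(token)
'''

calls = build_call_graph(SAMPLE)
callers = invert_graph(calls)
print(sorted(calls["validate_token"]))    # ['decode']
print(sorted(callers["validate_token"]))  # ['login']
```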

### sem impact
Transitive impact analysis. Answers the question: if this entity changes, what else is affected? Runs a breadth-first search (BFS) through the dependency graph.

```
sem impact validateToken
sem impact validateToken --json
sem impact validateToken --file-exts .py
sem impact validateToken --no-default-excludes
```
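
The BFS idea can be sketched in a few lines of Python. This is a minimal illustration rather than sem's code: the graph is a hypothetical hand-written dictionary of reverse edges (entity to the entities that call it), and every name other than `validateToken` is invented for the example.

```python
from collections import deque


def impacted_entities(callers: dict[str, set[str]], changed: str) -> list[str]:
    """Breadth-first walk over reverse edges: who depends on `changed`?"""
    seen = {changed}
    order: list[str] = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        # sorted() keeps the output deterministic for a given graph.
        for dependent in sorted(callers.get(current, ())):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


# Hypothetical reverse dependency graph: entity -> entities that call it.
CALLERS = {
    "validateToken": {"login", "refreshSession"},
    "login": {"handleRequest"},
    "refreshSession": {"handleRequest"},
    "handleRequest": set(),
}

print(impacted_entities(CALLERS, "validateToken"))
# ['login', 'refreshSession', 'handleRequest']
```

Direct callers come first, then their callers, and the `seen` set ensures `handleRequest` is reported once even though two paths reach it.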

## Key Features

- 27 languages with full entity extraction via tree-sitter
- Structural hashing: AST-normalized hashes that ignore whitespace, comments, formatting
- Cosmetic vs structural change detection in diff
- Entity-level blame (per function, not per line)
- Cross-file dependency graph via call/reference analysis
- Transitive impact analysis (BFS through dependency graph)
- Three-phase entity matching: exact ID, structural hash (rename detection), fuzzy similarity
- JSON output for AI agents and CI pipelines
- Stdin mode for non-git usage
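
The three-phase entity matching listed above can be sketched as follows. This is an illustration under stated assumptions, not sem's implementation: the `Entity` record, the `match_entities` function, the 0.8 similarity threshold, and the use of `difflib` for the fuzzy phase are all hypothetical, and the structural hashes are hand-written placeholders assumed to exclude the entity's own name.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str        # e.g. "src/auth.ts::function::validateToken"
    structural_hash: str  # placeholder for an AST-normalized hash
    body: str             # source text, used only in the fuzzy phase


def match_entities(old, new, threshold=0.8):
    """Return (matches, deleted, added); each match is (old, new, phase)."""
    matches = []
    old_left, new_left = list(old), list(new)

    # Phase 1: same entity ID on both sides.
    new_by_id = {e.entity_id: e for e in new_left}
    for o in list(old_left):
        n = new_by_id.pop(o.entity_id, None)
        if n is not None:
            matches.append((o, n, "exact_id"))
            old_left.remove(o)
            new_left.remove(n)

    # Phase 2: different ID but identical structural hash (a rename).
    for o in list(old_left):
        n = next((c for c in new_left
                  if c.structural_hash == o.structural_hash), None)
        if n is not None:
            matches.append((o, n, "structural_hash"))
            old_left.remove(o)
            new_left.remove(n)

    # Phase 3: best textual similarity at or above the assumed threshold.
    for o in list(old_left):
        scored = [(difflib.SequenceMatcher(None, o.body, c.body).ratio(), c)
                  for c in new_left]
        best = max(scored, key=lambda pair: pair[0], default=None)
        if best is not None and best[0] >= threshold:
            matches.append((o, best[1], "fuzzy"))
            old_left.remove(o)
            new_left.remove(best[1])

    # Whatever is left on the old side was deleted; on the new side, added.
    return matches, old_left, new_left


P = "src/auth.ts::function::"
OLD = [
    Entity(P + "validateToken", "h1", "return verify(token, key)"),
    Entity(P + "parseHeader", "h2", "return header.split(' ')[1]"),
    Entity(P + "hashPassword", "h3", "return bcrypt(password, rounds)"),
    Entity(P + "legacyLogin", "h4", "throw new Error('removed')"),
]
NEW = [
    Entity(P + "validateToken", "h1b", "return verify(token, key, opts)"),
    Entity(P + "extractBearer", "h2", "return header.split(' ')[1]"),
    Entity(P + "hashSecret", "h3b", "return bcrypt(password, rounds + 1)"),
    Entity(P + "auditLog", "h5", "console.log(event)"),
]

matches, deleted, added = match_entities(OLD, NEW)
for o, n, phase in matches:
    print(phase, o.entity_id, "->", n.entity_id)
```

Running the cheap, precise phases first means the fuzzy comparison only sees entities that neither an ID nor a hash could pair up.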

## Language Support

27 programming languages:

| Language | Extensions | Entity Types |
|----------|-----------|--------------|
| TypeScript | .ts .tsx .mts .cts | functions, classes, interfaces, types, enums, exports |
| JavaScript | .js .jsx .mjs .cjs | functions, classes, variables, exports |
| Python | .py | functions, classes, decorated definitions |
| Go | .go | functions, methods, types, vars, consts |
| Rust | .rs | functions, structs, enums, impls, traits, mods, consts |
| Java | .java | classes, methods, interfaces, enums, fields, constructors |
| C | .c .h | functions, structs, enums, unions, typedefs |
| C++ | .cpp .cc .hpp | functions, classes, structs, enums, namespaces, templates |
| C# | .cs | classes, methods, interfaces, enums, structs, properties |
| Ruby | .rb | methods, classes, modules |
| PHP | .php | functions, classes, methods, interfaces, traits, enums |
| Swift | .swift | functions, classes, protocols, structs, enums, properties |
| Elixir | .ex .exs | modules, functions, macros, guards, protocols |
| Bash | .sh | functions |
| HCL/Terraform | .hcl .tf .tfvars | blocks, attributes (qualified names) |
| Kotlin | .kt .kts | classes, interfaces, objects, functions, properties |
| Fortran | .f90 .f95 .f | functions, subroutines, modules, programs |
| Vue | .vue | template/script/style blocks + inner TS/JS entities |
| XML | .xml .plist .svg .csproj | elements (nested, tag-name identity) |
| ERB | .erb | blocks, expressions, code tags |
| Svelte | .svelte .svelte.js .svelte.ts | component blocks, rune modules + inner JS/TS entities |
| Dart | .dart | classes, mixins, extensions, enums, type aliases, functions |
| Perl | .pl .pm .t | subroutines, packages |
| OCaml | .ml .mli | values, modules, types, classes, externals |
| Scala | .scala .sc .sbt | classes, objects, traits, enums, functions, vals, extensions |
| Nix | .nix | bindings, inherit declarations |
| Zig | .zig | functions, tests, variables |

Plus structured data formats:

| Format | Extensions | Entity Types |
|--------|-----------|--------------|
| JSON | .json | properties, objects (RFC 6901 paths) |
| YAML | .yml .yaml | sections, properties (dot paths) |
| TOML | .toml | sections, properties |
| CSV | .csv .tsv | rows (first column as ID) |
| Markdown | .md .mdx | heading-based sections |

Everything else falls back to chunk-based diffing.
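
For the JSON row in the table above, RFC 6901 (JSON Pointer) paths look like the output of this small sketch. It only shows how such paths are formed, including the `~0` and `~1` escapes. Which nodes sem actually reports as entities is not specified here, and `pointer_paths` is a hypothetical helper.

```python
import json


def escape_token(token: str) -> str:
    # RFC 6901: "~" becomes "~0" first, then "/" becomes "~1".
    return token.replace("~", "~0").replace("/", "~1")


def pointer_paths(value, prefix=""):
    """Yield the JSON Pointer of every property and array element."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}/{escape_token(key)}"
            yield path
            yield from pointer_paths(child, path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            path = f"{prefix}/{index}"
            yield path
            yield from pointer_paths(child, path)


DOC = json.loads(
    '{"name": "sem", "scripts": {"build/all": "cargo build"},'
    ' "keywords": ["git", "diff"]}'
)

paths = list(pointer_paths(DOC))
for path in paths:
    print(path)
# /name
# /scripts
# /scripts/build~1all
# /keywords
# /keywords/0
# /keywords/1
```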

## JSON Output

```
sem diff --format json
```

Returns (with `changes` abbreviated to a single entry here):
```json
{
  "summary": { "fileCount": 2, "added": 1, "modified": 1, "deleted": 1, "moved": 0, "renamed": 0, "reordered": 0, "orphan": 0, "total": 3 },
  "changes": [
    {
      "entityId": "src/auth.ts::function::validateToken",
      "changeType": "added",
      "entityType": "function",
      "entityName": "validateToken",
      "startLine": 12,
      "endLine": 18,
      "oldStartLine": null,
      "oldEndLine": null,
      "filePath": "src/auth.ts"
    }
  ]
}
```

The named change-type buckets (`added`, `modified`, `deleted`, `moved`, `renamed`, `reordered`) always sum to `total`. `orphan` is metadata for module-level changes that are already counted in those buckets, so it is not added to `total` again.
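
A CI script or agent consuming this output might check that invariant before trusting the report. The sketch below is one way to do it in Python, using the field names from the example above. `check_summary` is a hypothetical helper, and the embedded sample is trimmed to the fields the script reads.

```python
import json

# Buckets that are documented to add up to "total"; "orphan" is excluded.
BUCKETS = ("added", "modified", "deleted", "moved", "renamed", "reordered")


def check_summary(report: dict) -> int:
    """Return the bucket sum, raising if it disagrees with summary.total."""
    summary = report["summary"]
    bucket_sum = sum(summary[name] for name in BUCKETS)
    if bucket_sum != summary["total"]:
        raise ValueError(
            f"buckets sum to {bucket_sum}, total says {summary['total']}"
        )
    return bucket_sum


SAMPLE = """
{
  "summary": { "fileCount": 2, "added": 1, "modified": 1, "deleted": 1,
               "moved": 0, "renamed": 0, "reordered": 0, "orphan": 0,
               "total": 3 },
  "changes": [
    { "entityId": "src/auth.ts::function::validateToken",
      "changeType": "added", "entityName": "validateToken",
      "startLine": 12, "endLine": 18, "filePath": "src/auth.ts" }
  ]
}
"""

report = json.loads(SAMPLE)
print(check_summary(report))  # 3
for change in report["changes"]:
    print(change["changeType"], change["entityId"])
# added src/auth.ts::function::validateToken
```

To run it against real output, replace `json.loads(SAMPLE)` with `json.load(sys.stdin)` and pipe `sem diff --format json` into the script.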

## Architecture

Cargo workspace: sem-core (library) + sem-cli (binary).

- tree-sitter for code parsing (native Rust, not WASM)
- git2 for in-process Git operations
- rayon for parallel file processing
- xxhash for structural hashing
- Plugin system for adding new languages and formats

## Performance

- Small commit (1 file): 5 ms
- Medium commit (5 files): 8 ms
- Large commit (13 files): 19 ms
- Range (8 commits, 30 files): 24 ms
- Faster than git diff for equivalent operations

## As a Library

sem-core can be used as a Rust library dependency:

```toml
[dependencies]
sem-core = { git = "https://github.com/Ataraxy-Labs/sem", version = "0.3" }
```

Used by weave (semantic merge driver) and inspect (entity-level code review).

## Links

- GitHub: https://github.com/Ataraxy-Labs/sem
- Website: https://ataraxy-labs.github.io/sem
- License: MIT OR Apache-2.0
