git-fast-import(1) ================== NAME ---- git-fast-import - Backend for fast Git data importers SYNOPSIS -------- [verse] frontend | 'git fast-import' [options] DESCRIPTION ----------- This program is usually not what the end user wants to run directly. Most end users want to use one of the existing frontend programs, which parses a specific type of foreign source and feeds the contents stored there to 'git fast-import'. fast-import reads a mixed command/data stream from standard input and writes one or more packfiles directly into the current repository. When EOF is received on standard input, fast import writes out updated branch and tag refs, fully updating the current repository with the newly imported data. The fast-import backend itself can import into an empty repository (one that has already been initialized by 'git init') or incrementally update an existing populated repository. Whether or not incremental imports are supported from a particular foreign source depends on the frontend program in use. OPTIONS ------- --force:: Force updating modified existing branches, even if doing so would cause commits to be lost (as the new commit does not contain the old commit). --quiet:: Disable all non-fatal output, making fast-import silent when it is successful. This option disables the output shown by \--stats. --stats:: Display some basic statistics about the objects fast-import has created, the packfiles they were stored into, and the memory used by fast-import during this run. Showing this output is currently the default, but can be disabled with \--quiet. Options for Frontends ~~~~~~~~~~~~~~~~~~~~~ --cat-blob-fd=:: Write responses to `cat-blob` and `ls` queries to the file descriptor instead of `stdout`. Allows `progress` output intended for the end-user to be separated from other output. --date-format=:: Specify the type of dates the frontend will supply to fast-import within `author`, `committer` and `tagger` commands. See ``Date Formats'' below for details about which formats are supported, and their syntax. --done:: Terminate with error if there is no `done` command at the end of the stream. This option might be useful for detecting errors that cause the frontend to terminate before it has started to write a stream. Locations of Marks Files ~~~~~~~~~~~~~~~~~~~~~~~~ --export-marks=:: Dumps the internal marks table to when complete. Marks are written one per line as `:markid SHA-1`. Frontends can use this file to validate imports after they have been completed, or to save the marks table across incremental runs. As is only opened and truncated at checkpoint (or completion) the same path can also be safely given to \--import-marks. --import-marks=:: Before processing any input, load the marks specified in . The input file must exist, must be readable, and must use the same format as produced by \--export-marks. Multiple options may be supplied to import more than one set of marks. If a mark is defined to different values, the last file wins. --import-marks-if-exists=:: Like --import-marks but instead of erroring out, silently skips the file if it does not exist. --[no-]relative-marks:: After specifying --relative-marks the paths specified with --import-marks= and --export-marks= are relative to an internal directory in the current repository. In git-fast-import this means that the paths are relative to the .git/info/fast-import directory. However, other importers may use a different location. + Relative and non-relative marks may be combined by interweaving --(no-)-relative-marks with the --(import|export)-marks= options. Performance and Compression Tuning ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ --active-branches=:: Maximum number of branches to maintain active at once. See ``Memory Utilization'' below for details. Default is 5. --big-file-threshold=:: Maximum size of a blob that fast-import will attempt to create a delta for, expressed in bytes. The default is 512m (512 MiB). Some importers may wish to lower this on systems with constrained memory. --depth=:: Maximum delta depth, for blob and tree deltification. Default is 10. --export-pack-edges=:: After creating a packfile, print a line of data to listing the filename of the packfile and the last commit on each branch that was written to that packfile. This information may be useful after importing projects whose total object set exceeds the 4 GiB packfile limit, as these commits can be used as edge points during calls to 'git pack-objects'. --max-pack-size=:: Maximum size of each output packfile. The default is unlimited. Performance ----------- The design of fast-import allows it to import large projects in a minimum amount of memory usage and processing time. Assuming the frontend is able to keep up with fast-import and feed it a constant stream of data, import times for projects holding 10+ years of history and containing 100,000+ individual commits are generally completed in just 1-2 hours on quite modest (~$2,000 USD) hardware. Most bottlenecks appear to be in foreign source data access (the source just cannot extract revisions fast enough) or disk IO (fast-import writes as fast as the disk will take the data). Imports will run faster if the source data is stored on a different drive than the destination Git repository (due to less IO contention). Development Cost ---------------- A typical frontend for fast-import tends to weigh in at approximately 200 lines of Perl/Python/Ruby code. Most developers have been able to create working importers in just a couple of hours, even though it is their first exposure to fast-import, and sometimes even to Git. This is an ideal situation, given that most conversion tools are throw-away (use once, and never look back). Parallel Operation ------------------ Like 'git push' or 'git fetch', imports handled by fast-import are safe to run alongside parallel `git repack -a -d` or `git gc` invocations, or any other Git operation (including 'git prune', as loose objects are never used by fast-import). fast-import does not lock the branch or tag refs it is actively importing. After the import, during its ref update phase, fast-import tests each existing branch ref to verify the update will be a fast-forward update (the commit stored in the ref is contained in the new history of the commit to be written). If the update is not a fast-forward update, fast-import will skip updating that ref and instead prints a warning message. fast-import will always attempt to update all branch refs, and does not stop on the first failure. Branch updates can be forced with \--force, but it's recommended that this only be used on an otherwise quiet repository. Using \--force is not necessary for an initial import into an empty repository. Technical Discussion -------------------- fast-import tracks a set of branches in memory. Any branch can be created or modified at any point during the import process by sending a `commit` command on the input stream. This design allows a frontend program to process an unlimited number of branches simultaneously, generating commits in the order they are available from the source data. It also simplifies the frontend programs considerably. fast-import does not use or alter the current working directory, or any file within it. (It does however update the current Git repository, as referenced by `GIT_DIR`.) Therefore an import frontend may use the working directory for its own purposes, such as extracting file revisions from the foreign source. This ignorance of the working directory also allows fast-import to run very quickly, as it does not need to perform any costly file update operations when switching between branches. Input Format ------------ With the exception of raw file data (which Git does not interpret) the fast-import input format is text (ASCII) based. This text based format simplifies development and debugging of frontend programs, especially when a higher level language such as Perl, Python or Ruby is being used. fast-import is very strict about its input. Where we say SP below we mean *exactly* one space. Likewise LF means one (and only one) linefeed and HT one (and only one) horizontal tab. Supplying additional whitespace characters will cause unexpected results, such as branch names or file names with leading or trailing spaces in their name, or early termination of fast-import when it encounters unexpected input. Stream Comments ~~~~~~~~~~~~~~~ To aid in debugging frontends fast-import ignores any line that begins with `#` (ASCII pound/hash) up to and including the line ending `LF`. A comment line may contain any sequence of bytes that does not contain an LF and therefore may be used to include any detailed debugging information that might be specific to the frontend and useful when inspecting a fast-import data stream. Date Formats ~~~~~~~~~~~~ The following date formats are supported. A frontend should select the format it will use for this import by passing the format name in the \--date-format= command line option. `raw`:: This is the Git native format and is `