Frequently Asked Questions

General

(1) I cannot find the program executable? I double-click the program icon but nothing happens?
(2) The program crashes! Your program has a bug!
(3) The program crashes with large but not with small data sets, what is wrong?

Data file related

(4) I need more input about how microsatellites are coded in migrate?
(5) How can I code haploid data for Migrate?
(6) Can I use haplotype frequencies as input?
(7) Can I use gene frequencies as input?

Options and how long to run

(8) It runs with the default number of chains etc. Has it run long enough?
(9) How long does it run?
(10) Can migrate run on multiple machines in parallel?
(11) I run migrate several times and get inconsistent estimates.
(12) I run migrate and the population sizes are strangely high.

Output file and interpretation

(13) I have haploid data, what is Θ?
(14) I have mtDNA sequence data, what is Θ?
(15) How should I interpret each of the 4Nm estimates for pair i,j?
(16) Why are the Likelihood values different between runs?
(17) Why do I have positive numbers in the Ln(L) column?
(18) I have problems understanding what the null hypothesis and the alternative hypothesis are in the likelihood ratio test section.

General

1. I cannot find the program executable? I double-click the program icon but nothing happens?

Some binary distributions contain migrate-n as a command-line tool that needs to be started from a Terminal program (or shell). A typical migrate run on Mac OS X involves starting Terminal.app (for tutorials about this see http://www.macdevcenter.com/pub/ct/51), changing to the directory where the data resides, and then starting migrate-n.

2. The program crashes! Your program has a bug!

Sure, this program most likely has some bugs, but it is more likely that the infile is not correct. Without more detail about what went wrong there is little hope for help.

3. The program crashes with large but not with small data sets, what is wrong? [System description... + part of log]

General: Most often, mistakes in the infile, such as a wrong number of loci, populations, individuals, or sites, or too few characters for the individual names, make the program crash almost immediately after the menu. Check the infile carefully and compare it with the data file specifications.

Answers

Data file related

4. I need more input on how to code microsatellites for migrate

Assume you analyze two individuals for two msat loci (locus 1 is a CA repeat, locus 2 a TCA repeat) that look like this:

Primer 1 msat locus 1

------------- ------

1A ATTAGACATTGTGCACACACACACACATTGGAC

1B ATTAGACATTGTGCACACACACACACATTGGAC

2A ATTAGACATTGTGCACACACACACACATTGGAC

2B ATTAGACATTGTGCACACACA------TTGGAC

Primer 2 msat locus 2

------------- ------

1A ATTAGACATTGTGTCATCATCATCATCATCATCATCATCATCATCATCATCATCATTGGAC

1B ATTAGACATTGTGTCATCATCATCATCATCATCATCATCATCATCATCATCA---TTGGAC

2A ATTAGACATTGTGTCATCATCATCATCATCATCATCATCATCA------------TTGGAC

2B ATTAGACATTGTGTCATCATCATCA------------------------------TTGGAC

In a migrate infile this would be coded as follows (the / is a freely chosen delimiter; you can choose any character, but I recommend something like / . , or similar):

1 2 / Example of how to code migrate

2 Example population with 2 diploid individuals

1A1B______ 7/7 14/13

2A2B______ 7/4 10/4
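
The repeat counts in the infile above can be reproduced with a small script. The following is a minimal Python sketch and not part of migrate: the function names count_repeats and genotype are hypothetical, and it assumes that the allele is the longest uninterrupted run of the repeat motif once the alignment gaps (-) are removed.

```python
import re

def count_repeats(aligned_seq, motif):
    """Return the length, in repeat units, of the longest tandem run of motif."""
    seq = aligned_seq.replace("-", "")  # drop alignment gaps
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)

def genotype(allele_a, allele_b, motif, delimiter="/"):
    """Format the two alleles of one individual as a delimiter-separated genotype."""
    return f"{count_repeats(allele_a, motif)}{delimiter}{count_repeats(allele_b, motif)}"

# Locus 1 (CA repeat) of individual 2, copied from the alignment above
seq_2a = "ATTAGACATTGTGCACACACACACACATTGGAC"
seq_2b = "ATTAGACATTGTGCACACACA------TTGGAC"
print(genotype(seq_2a, seq_2b, "CA"))
```

Taking the longest run keeps a stray CA inside the primer from being counted as an allele.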

5. I have haploid allelic data, how should I structure my infile?

Unfortunately, I was biased towards diploid data for microsatellite and enzyme electrophoretic data, and you need to fake diploids for the infile. Suppose your microsatellite example data look like this:

     Locus1 Locus2 Locus3 Locus4 Locus5
Ind1 11     45     14     15     89
Ind2 11     47     13     15     67
Ind3 11     43     13     15     67
Ind4 12     47     13     15     73
Ind5 11     45     13     15     89

Then your infile should look like this:

2 5 . Example input for haploid microsatellite data
5 Fake diploid population 1
Ind1 11.? 45.? 14.? 15.? 89.?
Ind2 11.? 47.? 13.? 15.? 67.?
Ind3 11.? 43.? 13.? 15.? 67.?
Ind4 12.? 47.? 13.? 15.? 73.?
Ind5 11.? 45.? 13.? 15.? 89.?
4 Fake diploid population 2
..data not shown..

Or, pairing two haploid individuals into one fake diploid:

2 5 . Example input for haploid microsatellite data
3 Fake diploid population 1
Ind1Ind2 11.11 45.47 14.13 15.15 89.67
Ind3Ind4 11.12 43.47 13.13 15.15 67.73
Ind5???? 11.? 45.? 13.? 15.? 89.?
4 Fake diploid population 2
..data not shown..

The “?” are removed for the analysis (but note that in sequence data the ? are not removed).
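
The first of the two layouts is easy to generate from a table of haploid alleles. This is a minimal Python sketch, not a tool shipped with migrate: the function name fake_diploid_line is hypothetical, and the 10-character name field is an assumption based on the 10-character names in the microsatellite example earlier in this FAQ. Check the data file specifications for the width your version expects.

```python
def fake_diploid_line(name, alleles, delimiter=".", name_width=10):
    """Build one infile line, pairing each haploid allele with a missing '?' allele."""
    padded = name[:name_width].ljust(name_width)  # assumed fixed-width name field
    genotypes = [f"{allele}{delimiter}?" for allele in alleles]
    return padded + " " + " ".join(genotypes)

# First two individuals of the example table above
haploid = {
    "Ind1": [11, 45, 14, 15, 89],
    "Ind2": [11, 47, 13, 15, 67],
}
for name, alleles in haploid.items():
    print(fake_diploid_line(name, alleles))
```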

6. Can I use haplotype frequencies as input?

No. Input formats are a rather arbitrary matter, and I decided that you need to input each single sequence or genotype. In principle it would be easy to add a “frequency” input mode, but currently I do not have time to do that. Keep asking for it if this is important to you.

7. Can I use gene frequencies as input?

No, not yet. This is on the to-do list but has a rather low priority. To circumvent the problem, you can create artificial genotypes for the infile. The genotypes themselves are not important. A simple script that assigns alleles to individuals will do; it can be written in almost any scripting language, from Excel (yikes!) and Word macros (yikes!) to Perl, C, C++, AppleScript, Mathematica, ... For throwaway programs I use Perl, Mathematica, or C.
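
Such a throwaway script could look roughly like the following. It is a minimal Python sketch under stated assumptions, not the author's script: the function name genotypes_from_frequencies is hypothetical, frequencies are turned into whole allele copies with a largest-remainder rounding (my choice), and alleles are paired at random because the genotypes themselves are not important.

```python
import math
import random

def genotypes_from_frequencies(freqs, n_individuals, delimiter=".", seed=1):
    """Create artificial diploid genotypes whose allele counts match freqs."""
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    n_copies = 2 * n_individuals
    exact = {allele: f * n_copies for allele, f in freqs.items()}
    counts = {allele: math.floor(x) for allele, x in exact.items()}
    # Hand out the copies lost to rounding, largest fractional part first.
    leftover = n_copies - sum(counts.values())
    by_remainder = sorted(exact, key=lambda a: exact[a] - counts[a], reverse=True)
    for allele in by_remainder[:leftover]:
        counts[allele] += 1
    pool = [allele for allele, c in counts.items() for _ in range(c)]
    random.Random(seed).shuffle(pool)  # the pairing itself is arbitrary
    return [f"{pool[i]}{delimiter}{pool[i + 1]}" for i in range(0, n_copies, 2)]

# Hypothetical locus with three alleles sampled in 4 diploid individuals
for g in genotypes_from_frequencies({"11": 0.5, "12": 0.25, "13": 0.25}, 4):
    print(g)
```

Each printed genotype still needs an individual name in front of it to form an infile line.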

About options and how to run

8. It runs with the default number of chains etc. Has it run long enough?

This depends on the number of populations you want to analyze. If you have one, it will almost certainly be enough. But if you try to analyze 6 or more, it almost certainly will not. You need to experiment a little with the length of the chains. See chapter 3 (Accuracy of results).

9. How long does it run?

With progress=Yes the program tries to estimate the length of a run from the work it has done so far. After the first short chain this estimate may be rather imprecise, but it tells you whether you need to wait minutes or days (imagine estimating your arrival time on a drive from Spokane to Seattle using only the distance and time you have covered so far). The calculated time is based only on the genealogy search and does not include the time to create the plots for each locus and population. Therefore, if you have many populations and many loci, expect to wait longer than the time stamp indicates. There is an additional time estimate for the profile likelihoods.
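
The road-trip analogy amounts to a linear extrapolation. The sketch below illustrates that idea in Python; it is an assumption about the general approach, not migrate's actual timing code, and the function name estimate_remaining and the numbers are hypothetical.

```python
def estimate_remaining(elapsed_seconds, steps_done, steps_total):
    """Linear extrapolation: assume future steps cost the same as past steps."""
    if steps_done <= 0:
        raise ValueError("need at least one finished step to extrapolate")
    seconds_per_step = elapsed_seconds / steps_done
    return seconds_per_step * (steps_total - steps_done)

# Hypothetical run: the first of 10 equally long chains took 90 seconds.
remaining = estimate_remaining(90.0, 1, 10)
print(f"about {remaining / 60:.1f} more minutes")
```

An estimate made after a single chain rests on one data point, which is why it can be far off when later chains are longer or slower.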

10. Can migrate run on multiple machines in parallel?

Short answer: YES.

Long answer 1: If you use the heating option, your machine is a symmetric multiprocessor machine, and you compiled with make thread, then the program will utilize n processors. This will improve the heated search by about a factor of n, although the performance degrades somewhat the more threads are running concurrently.

Long answer 2: Yes, on UNIX systems (including Mac OS X) you can use a parallel virtual machine, for example LAM-MPI (see their website: http://www.lam-mpi.org), and compile migrate with ”configure; make mpis” (or similar; type ”configure” to see the options). You need the MPI libraries that come with the above environment. Or you can do it yourself manually. See the file HOWTO-PARALLEL.

11. I run migrate several times and get inconsistent estimates.

If the profile confidence intervals of a run exclude the estimates of other runs, then you should run the program longer by increasing short-inc, long-inc, short-sample, and long-sample (for ML), or long-inc and long-sample (for Bayesian inference). In addition, you should try replicates (for example replicate=YES:10) and heating (heating=Static:1:1,1.5,3,10000). If you still have problems, I would like to hear about it. I have seen datasets where people tried to estimate several parameters from very short sequences; when run properly, these delivered rather unwelcome confidence intervals ranging from close to zero to very large values (~inf).

12. I run migrate and the population sizes are strangely high.

If the likelihood surfaces are very flat, then migrate might err into regions that deliver population sizes that are too high. If this happens in a short chain, then the program will rarely be able to return to more reasonable values. You need replication and heating (see the question about inconsistent estimates above). Starting in version 1.5, I am biasing towards the driving parameters (the parameters you use to run a chain), so that it will be harder for the program to climb to unreasonably high values, but it will go there if your data suggest such values. Although I do not believe that values of Θ > 10 are reasonable [remember that our Θ is per site and not per locus for sequence data], your data might violate assumptions of migrate (and also of FST) that make it hard to get correct estimates.

About reading the outfile

13. I have haploid data, do I have to multiply my Θ, M and 4Nm?

The Θ you get with haploid data is Θ = 2Neµ. Comparing it with other values for haploid data should be fine, but you need to multiply it by 2 when you compare it with a Θ from diploid data (Θ = 4Neµ).

14. I have mtDNA data, do I have to multiply my Θ, M and 4Nm?

See the question above. In most vertebrates mtDNA is passed on only through the maternal lineage and is haploid; for a comparison with diploid data you should multiply by 4.
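
The scaling factors in the two answers above follow from the standard definitions of Θ. The derivation below is a sketch that assumes an equal sex ratio, so that the number of females N_f is half the effective population size N_e; the subscripted symbols are introduced here for illustration and do not appear in the outfile.

```latex
\begin{align*}
\Theta_{\text{diploid}} &= 4 N_e \mu \\
\Theta_{\text{haploid}} &= 2 N_e \mu
  &&\Rightarrow\quad \Theta_{\text{diploid}} = 2\,\Theta_{\text{haploid}} \\
\Theta_{\text{mtDNA}} &= 2 N_f \mu = 2\,\frac{N_e}{2}\,\mu = N_e \mu
  &&\Rightarrow\quad \Theta_{\text{diploid}} = 4\,\Theta_{\text{mtDNA}}
\end{align*}
```

If the sex ratio in your species is strongly skewed, the factor of 4 for mtDNA is only an approximation.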

15. How should I interpret each of the 4Nm estimates for pair i,j?
<answer is in manual but will come here too, formulae are hard to move between systems>

16. Why do I have positive numbers in the Ln(L) column?

See also the question before. The Ln(L) is actually the logarithm of a likelihood ratio (see Beerli and Felsenstein 1999; we have a derivation of this ratio in the appendix, but it can also be found in statistics books that talk about ...). In our case we try to maximize. In fact, the Ln(L) should be rather close to 0.0, but this depends on the number of parameters (I think), which produce noise: with many parameters it will not be very close to 0.0. With just one parameter (single population) the value is more like 0.00x; with 16 parameters it seems more like 5-30. If you have more than one locus and the loci produce rather different results, the value will likely go negative.
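
For orientation, the ratio is commonly written as an importance-sampling average over the m sampled genealogies G_i. The notation below is a sketch, with Θ standing for the whole parameter set and Θ_0 for the driving values used to run the chain; see Beerli and Felsenstein 1999 for the exact form and notation.

```latex
\[
\frac{L(\Theta)}{L(\Theta_0)} \approx \frac{1}{m} \sum_{i=1}^{m}
  \frac{P(G_i \mid \Theta)}{P(G_i \mid \Theta_0)},
\qquad
\ln \frac{L(\Theta_0)}{L(\Theta_0)} = \ln 1 = 0 .
\]
```

Because the log ratio is exactly 0 at the driving values, its maximum over Θ cannot be below 0 for a single locus, which is why small positive numbers are expected.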

17. I have problems understanding what the null hypothesis and the alternative hypothesis are in the likelihood ratio test section.

You compare two models: the null hypothesis is that both models are equivalent; the alternative hypothesis is that they are different. So if you have a model that allows 4 parameters and one that allows only two, and you get a “nonsignificant” result, you should go with the simpler model. If you get a significant result, go with the full model because that was the better model (if not, then you did not run the program long enough!).
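
The decision rule can be sketched in a few lines of Python. This is an illustration of a textbook likelihood ratio test, not migrate's own implementation: the function name lrt_p_value is hypothetical, the log-likelihood values are made up, and the closed-form chi-square tail used here is only valid when the difference in the number of parameters (df) is even, as in the 4 versus 2 example above.

```python
import math

def lrt_p_value(lnl_full, lnl_restricted, df):
    """P-value of 2*(lnL_full - lnL_restricted) under a chi-square with even df."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles even, positive df")
    half = max(lnl_full - lnl_restricted, 0.0)  # equals the test statistic / 2
    # Chi-square survival function for even df:
    # exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)

# Full model with 4 parameters against a restricted model with 2 (df = 2);
# the log-likelihoods are made-up numbers for illustration.
p = lrt_p_value(-10.0, -13.0, df=2)
print("go with the full model" if p < 0.05 else "go with the simpler model")
```

The 0.05 threshold is a conventional choice, not something prescribed by the program.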

Eventually this page will turn into a WIKI so that all users can add questions and perhaps give answers.