Frequently Asked Questions
General
(1) I cannot find the program executable? I double-click the program icon but nothing happens?
(2) The program crashes! Your program has a bug!
(3) The program crashes with large but not with small data sets, what is wrong?
Data file related
(4) I need more input about how microsatellites are coded in migrate?
(5) How can I code haploid data for Migrate?
(6) Can I use haplotype frequencies as input?
(7) Can I use gene frequencies as input?
Options and how long to run
(8) It runs with the default number of chains etc. Has it run long enough?
(9) How long does it run?
(10) Can migrate run on multiple machines in parallel?
(11) I run migrate several times and get inconsistent estimates.
(12) I run migrate and the population sizes are strangely high.
Output file and interpretation
(13) I have haploid data, what is Θ?
(14) I have mtDNA sequence data, what is Θ?
(15) How should I interpret each of the 4Nm estimates for pair i,j?
(16) Why are the Likelihood values different between runs?
(17) Why do I have positive numbers in the Ln(L) column?
(18) I have problems to understand what are the Null-hypothesis and the alternative hypothesis in the likelihood ratio test section.
General
1. I cannot find the program executable? I double-click the program icon but nothing happens?
Some binary distributions contain migrate-n as a command-line tool that needs to be started from
a Terminal program [or shell]. A typical migrate run on Mac OS X involves starting
Terminal.app (for tutorials about this see http://www.macdevcenter.com/pub/ct/51),
changing to the directory where the data resides, and then starting migrate-n.
2. The program crashes! Your program has a bug!
Sure, this program most likely has some bugs, but it is more likely that the infile is not correct,
and without more detail about what went wrong there is little hope for help.
3. The program crashes with large but not with small data sets, what is wrong? [System description... + part of log]
General: Most often, mistakes in the infile (such as a wrong number of loci, populations,
individuals, or sites, or too few characters for the individual names) make the
program crash almost immediately after the menu. Check the infile carefully and compare it
with the data file specifications.
Answers
Data file related
4. I need more input on how to code microsatellites for migrate
Assume you analyze two individuals for two msat loci (locus 1 is a CA repeat, locus 2 a TCA repeat) that look like this:
Primer 1 msat locus 1
------------- ------
1A ATTAGACATTGTGCACACACACACACATTGGAC
1B ATTAGACATTGTGCACACACACACACATTGGAC
2A ATTAGACATTGTGCACACACACACACATTGGAC
2B ATTAGACATTGTGCACACACA------TTGGAC
Primer 2 msat locus 2
------------- ------
1A ATTAGACATTGTGTCATCATCATCATCATCATCATCATCATCATCATCATCATCATTGGAC
1B ATTAGACATTGTGTCATCATCATCATCATCATCATCATCATCATCATCATCA---TTGGAC
2A ATTAGACATTGTGTCATCATCATCATCATCATCATCATCATCA------------TTGGAC
2B ATTAGACATTGTGTCATCATCATCA------------------------------TTGGAC
In a migrate infile this would be coded as follows (where / is a freely chosen delimiter; you can choose any character,
but I recommend something like / . , or similar):
1 2 / Example of how to code migrate
2 Example population with 2 diploid individuals
1A1B______ 7/7 14/13
2A2B______ 7/4 10/4
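The counting step behind these numbers can be sketched in Python. This is only an illustration of the example above, not part of migrate: the function names count_repeats and infile_line are made up here, the flanking sequences are the ones shown in the alignment, and padding the name to 10 characters with underscores simply mirrors how the example lines are printed.

```python
FLANK5 = "ATTAGACATTGTG"  # sequence in front of the repeat in the example alignment
FLANK3 = "TTGGAC"         # sequence after the repeat in the example alignment


def count_repeats(aligned_allele, motif, flank5=FLANK5, flank3=FLANK3):
    """Count the copies of `motif` between the two flanks of one aligned allele."""
    core = aligned_allele.replace("-", "")  # drop alignment gaps
    if not (core.startswith(flank5) and core.endswith(flank3)):
        raise ValueError("flanking sequence not found")
    repeat_region = core[len(flank5):len(core) - len(flank3)]
    copies = len(repeat_region) // len(motif)
    if repeat_region != motif * copies:
        raise ValueError("repeat region is not a clean run of the motif")
    return copies


def infile_line(name, loci, delimiter="/", pad="_"):
    """Build one individual line; `loci` is a list of (allele_a, allele_b, motif)."""
    fields = [
        f"{count_repeats(a, motif)}{delimiter}{count_repeats(b, motif)}"
        for a, b, motif in loci
    ]
    # Assumption: the name is padded to 10 characters, as in the example lines.
    return name.ljust(10, pad) + " " + " ".join(fields)


# Individual 2 from the example: alleles 2A and 2B at both loci.
locus1 = (FLANK5 + "CA" * 7 + FLANK3, FLANK5 + "CA" * 4 + "-" * 6 + FLANK3, "CA")
locus2 = (FLANK5 + "TCA" * 10 + "-" * 12 + FLANK3,
          FLANK5 + "TCA" * 4 + "-" * 30 + FLANK3, "TCA")
print(infile_line("2A2B", [locus1, locus2]))  # 2A2B______ 7/4 10/4
```

The alleles are written as number of repeats, so the gaps in the alignment only matter for display and are removed before counting.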
5. I have haploid allelic data, how should I structure my infile?
Unfortunately, I was biased towards diploid data for microsatellite and enzyme electrophoretic data,
so you need to fake diploids for the infile. Suppose your microsatellite example data look like this:
Locus1 Locus2 Locus3 Locus4 Locus5
Ind1 11 45 14 15 89
Ind2 11 47 13 15 67
Ind3 11 43 13 15 67
Ind4 12 47 13 15 73
Ind5 11 45 13 15 89
And your infile should look like this
2 5 . Example input for haploid microsatellite data
5 Fake diploid population 1
Ind1 11.? 45.? 14.? 15.? 89.?
Ind2 11.? 47.? 13.? 15.? 67.?
Ind3 11.? 43.? 13.? 15.? 67.?
Ind4 12.? 47.? 13.? 15.? 73.?
Ind5 11.? 45.? 13.? 15.? 89.?
4 Fake diploid population 2
..data not shown..
Or
2 5 . Example input for haploid microsatellite data
3 Fake diploid population 1
Ind1Ind2 11.11 45.47 14.13 15.15 89.67
Ind3Ind4 11.12 43.47 13.13 15.15 67.73
Ind5???? 11.? 45.? 13.? 15.? 89.?
4 Fake diploid population 2
..data not shown..
The “?” are removed for the analysis (but note that in sequence data the ? are not removed).
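Both layouts can be produced with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, names are padded with spaces to 10 characters (an assumption about the name field), and the header lines with the number of populations, loci, and individuals still have to be written by hand.

```python
def _format(name, genotypes):
    # Assumption: the individual name occupies the first 10 characters.
    return name[:10].ljust(10) + " " + " ".join(genotypes)


def question_mark_lines(individuals, delimiter=".", missing="?"):
    """First layout: every haploid allele is paired with a missing allele."""
    return [
        _format(name, [f"{a}{delimiter}{missing}" for a in alleles])
        for name, alleles in individuals
    ]


def paired_lines(individuals, delimiter=".", missing="?"):
    """Second layout: two haploid individuals are merged into one fake diploid."""
    lines = []
    for i in range(0, len(individuals), 2):
        name_a, alleles_a = individuals[i]
        if i + 1 < len(individuals):
            name_b, alleles_b = individuals[i + 1]
        else:  # odd one out: pair it with missing data
            name_b = missing * len(name_a)
            alleles_b = [missing] * len(alleles_a)
        genotypes = [f"{a}{delimiter}{b}" for a, b in zip(alleles_a, alleles_b)]
        lines.append(_format(name_a + name_b, genotypes))
    return lines


population = [
    ("Ind1", ["11", "45", "14", "15", "89"]),
    ("Ind2", ["11", "47", "13", "15", "67"]),
    ("Ind3", ["11", "43", "13", "15", "67"]),
]
for line in paired_lines(population):
    print(line)
```

With the second layout the number of individuals in the population line is the number of fake diploids, not the number of haploid samples.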
6. Can I use haplotype frequencies as input?
No. Input formats are a rather arbitrary matter, and I decided that you need to input each single sequence or genotype. In principle it would be easy to add a “frequency” input mode, but currently I do not have time to do that. But keep asking for it if this is important to you.
7. Can I use gene frequencies as input?
No, not yet; this is on the to-do list, but has a rather low priority. To circumvent the problem, you can create artificial genotypes for the infile. The genotypes themselves are not important. A simple script that assigns alleles to individuals will do; it can be written in almost any scripting language, from Excel (yikes!) or Word macros (yikes!) to Perl, C, C++, AppleScript, Mathematica, ... For throw-away programs I use Perl, Mathematica, or C.
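Such a throw-away script could look like the following Python sketch. It assumes you have already turned the frequencies into allele counts (frequency times the number of sampled gene copies); the function name artificial_genotypes and the use of “?” for a leftover allele are choices made for this example only.

```python
import random


def artificial_genotypes(allele_counts, seed=0, missing="?"):
    """Assign alleles to fake diploid individuals.

    allele_counts maps an allele label to the number of sampled copies.
    Which alleles end up together is arbitrary, so they are just shuffled.
    """
    pool = [allele
            for allele, n in sorted(allele_counts.items())
            for _ in range(n)]
    random.Random(seed).shuffle(pool)  # fixed seed keeps the output reproducible
    if len(pool) % 2:                  # odd number of copies: pad with missing
        pool.append(missing)
    return [(pool[i], pool[i + 1]) for i in range(0, len(pool), 2)]


# Hypothetical counts for one locus: 5 copies of allele 11, 3 copies of allele 12.
for number, (a, b) in enumerate(artificial_genotypes({"11": 5, "12": 3}), start=1):
    print(f"Ind{number}".ljust(10), f"{a}.{b}")
```

The shuffle only decides which alleles share an individual; the allele counts, which are what matter here, are unchanged.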
About options and how to run
8. It runs with the default number of chains etc. Has it run long enough?
This depends on the number of populations you want to analyze. If you have one it will be almost
certainly enough. But if you try to analyze 6 or more it almost certainly will not. You need to
experiment a little with the length of chains. See chapter 3 (Accuracy of results).
9. How long does it run?
With progress=Yes the program tries to estimate the length of a run from the work it has done
so far. After the first short chain this estimate may be rather imprecise, but it will tell you whether you need
to wait minutes or days (just imagine estimating the time to travel from Spokane to Seattle
in a car, and when you will arrive, using only the distance and time you have covered
already). The time calculated is based only on the genealogy search and does not include the
time to create the plots for each locus and population. Therefore, if you have many populations
and many loci, you can expect to wait longer than the time stamp indicates. There is an additional
time estimate for the profile likelihoods.
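The car analogy is a plain linear extrapolation. The Python sketch below shows that idea only; it is not migrate's code, and the function name and the notion of "steps" are assumptions made for the illustration.

```python
def estimate_remaining_seconds(elapsed_seconds, steps_done, steps_total):
    """Extrapolate the remaining time from the work finished so far."""
    if steps_done <= 0:
        raise ValueError("need some finished work to extrapolate from")
    # Same pace assumed for the rest of the run: remaining work times
    # the time spent per finished step.
    return elapsed_seconds * (steps_total - steps_done) / steps_done


# One minute for the first 1000 of 10000 steps.
print(estimate_remaining_seconds(60.0, 1000, 10000))  # 540.0
```

An estimate made after a small fraction of the work is sensitive to any change of pace later on, which is why the first numbers reported can be far off.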
10. Can migrate run on multiple machines in parallel?
Short answer: YES. Long answer 1: If you use the heating option, your machine is
a symmetric multiprocessor machine, and you compiled with make thread, then the program
will utilize n processors. This will improve the heated search by about a factor of n, although
the performance degrades somewhat the more threads are running concurrently. Long answer
2: Yes, on UNIX systems (incl. Mac OS X) you can use a parallel virtual machine, for example
LAM-MPI (see their website: http://www.lam-mpi.org), and compile migrate with “configure;
make mpis” (or similar; see the options by typing “configure”). You need the MPI libraries that come with
the above environment (see HOWTO-PARALLEL). Or you can do it yourself manually; see the file
HOWTO-PARALLEL.
11. I run migrate several times and get inconsistent estimates.
If the profile confidence intervals of a run exclude the results of other runs,
then you should run the program longer by increasing short-inc
and long-inc and short-sample and long-sample (for ML), or by increasing long-inc and long-sample (for Bayesian inference). In addition, you should try to do replicates (for
example replicate=YES:10) and also use heating (heating=Static:1:1,1.5,3,10000). If you still
have problems I would like to hear about this. I have seen datasets where people tried to estimate
several parameters with very short sequences that, when run properly, delivered
rather unwelcome confidence intervals ranging from close to zero to very large values (~inf).
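A quick way to check several runs against each other is sketched below in Python. It reads "exclude" as "the point estimate of run j lies outside the confidence interval of run i", which is an interpretation made for this example; the function name and the Θ values are invented and do not come from a real analysis.

```python
def excluded_runs(runs):
    """Return pairs (i, j) where the interval of run i excludes the estimate of run j.

    runs is a list of (estimate, lower, upper) tuples for the same parameter.
    """
    return [
        (i, j)
        for i, (_, lower, upper) in enumerate(runs)
        for j, (estimate, _, _) in enumerate(runs)
        if i != j and not (lower <= estimate <= upper)
    ]


# Invented estimates of one parameter from three runs.
runs = [
    (0.010, 0.008, 0.013),
    (0.011, 0.009, 0.014),
    (0.030, 0.024, 0.041),
]
print(excluded_runs(runs))  # [(0, 2), (1, 2), (2, 0), (2, 1)]
```

An empty list means the runs agree in this sense; here the third run disagrees with the other two, which would call for longer runs, replicates, and heating.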
12. I run migrate and the population sizes are strangely high.
If the likelihood surfaces are very flat, then migrate might err into regions that deliver too high population sizes. If this happens in a short chain, then the program will rarely be able to return to more reasonable values. You need replication and heating (see the question about inconsistent estimates above). Starting in version 1.5, I bias the search towards the driving parameters (the parameters you use to run a chain), so that
it will be harder for the program to climb to unreasonably high values, but it will go there if your
data suggest such values. Although I do not believe that values of Θ > 10 are reasonable [remember, our
Θ is per site and not per locus for sequence data], your data might violate assumptions of migrate
(and also of FST) that make it hard to get correct estimates.
About reading the outfile
13. I have haploid data, do I have to multiply my Θ, M and 4Nm?
The Θ you get with haploid data is Θ = 2Neµ. Comparing with other values for haploid data
should be fine, but you need to multiply by 2 when you compare it with a Θ from diploid data (where Θ = 4Neµ).
14. I have mtDNA data, do I have to multiply my Θ, M and 4Nm?
See the question above, but in most vertebrates mtDNA is passed only through the maternal lineage
and is haploid; for a comparison with diploid data you should multiply by 4.
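The factors in the last two answers can be written out as follows. This is a sketch that covers Θ only; it uses the usual diploid definition Θ = 4Neµ, writes N_f for the female effective size (a symbol introduced here), and assumes an equal sex ratio (N_f = Ne/2) and the same µ on both sides of each comparison.

```latex
\begin{align*}
\Theta_{\text{diploid}} &= 4 N_e \mu \\
\Theta_{\text{haploid}} &= 2 N_e \mu
  &&\Rightarrow\quad \Theta_{\text{diploid}} = 2\,\Theta_{\text{haploid}} \\
\Theta_{\text{mtDNA}} &= 2 N_f \mu = N_e \mu \quad (N_f = N_e/2)
  &&\Rightarrow\quad \Theta_{\text{diploid}} = 4\,\Theta_{\text{mtDNA}}
\end{align*}
```

If the sex ratio is not equal, or the mutation rates of the compared markers differ, these factors are only a rough guide.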
15. How should I interpret each of the 4Nm estimates for pair i,j?
<answer is in manual but will come here too, formulae are hard to move between systems>
16. Why do I have positive numbers in the Ln(L) column?
See also the question before. The Ln(L) is actually a ratio (see Beerli and Felsenstein 1999; we have a
derivation of this ratio in the appendix, but this can also be found in statistics books that talk about [...]). In our case we try to maximize this ratio. In fact, the ln(L) should be rather close to 0.0, but this depends on the number of parameters (I think), which produce noise: with many parameters it will not be very close to 0.0, but with just
one parameter (single population) the value is more like 0.00x, and with 16 parameters it seems more like
5-30. If you have more than one locus and they produce rather different
results, then it is likely that the value will go negative.
17. I have problems understanding what the null hypothesis and the alternative hypothesis are
in the likelihood ratio test section.
You compare two models: the null hypothesis is that both models are equivalent. The alternative hypothesis is that they are different. So if you have a model that allows 4 parameters and one that allows only two, and you get a “nonsignificant” result, you should go with the simpler model; if you get a significant result, then go with the full model because that was the better model (if not, then you did not run the program long enough!).
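The decision rule can be sketched in Python as a textbook likelihood ratio test. This is an illustration under assumptions, not a description of what migrate computes internally: it takes the statistic as twice the difference of the log-likelihoods, compares it with a chi-square distribution whose degrees of freedom equal the difference in the number of parameters, and uses invented log-likelihood values. The function names are made up for this example.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed forms for df = 2 (even case) and df = 1 (odd case) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # ... then step up with Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_full, lnl_restricted, n_full, n_restricted, alpha=0.05):
    """Return (statistic, df, p_value, preferred_model)."""
    df = n_full - n_restricted
    if df < 1:
        raise ValueError("the full model must have more parameters")
    statistic = 2.0 * (lnl_full - lnl_restricted)
    p_value = chi2_sf(statistic, df)
    preferred = "full" if p_value < alpha else "restricted"
    return statistic, df, p_value, preferred


# Invented log-likelihoods for a 4-parameter and a 2-parameter model.
print(likelihood_ratio_test(-100.0, -104.0, 4, 2))
```

With these invented numbers the statistic is 8.0 on 2 degrees of freedom and the full model is preferred; a smaller difference in log-likelihood, such as 1.0, would give a nonsignificant result and favor the simpler model.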
Eventually this page will turn into a WIKI so that all users can add questions and perhaps give answers.