One misconception about proteomics (at least bottom-up proteomics) is that we can identify your protein sequence directly from the data our mass spectrometers generate. Although this is technically possible using an approach called de novo sequencing, in practice it still does not work very well.
What we actually do is match the MS/MS patterns generated in the mass spectrometer against theoretical MS/MS patterns predicted from existing sequences. These sequences are usually stored in a FASTA-formatted text file. People sometimes refer to this file as a database, which it really isn't.
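To make the matching idea concrete, here is a toy sketch in Python. It reads sequences from FASTA text, digests each protein in silico with a simplified trypsin rule, predicts singly charged b and y fragment ions for each peptide, and counts how many observed peaks fall near a predicted ion. Everything here is my own simplifying assumption, not how any particular search engine works: no missed cleavages, no modifications, charge 1+ only, a fixed 0.02 Da tolerance, and a plain shared-peak count instead of a real score. Real search engines also filter candidates by precursor mass and use much more sophisticated statistics. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses in Da for unmodified amino acids.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}; sequence lines are concatenated."""
    records: Dict[str, str] = {}
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def tryptic_peptides(protein: str) -> List[str]:
    """Cut after K or R unless the next residue is P (simplified rule, no missed cleavages)."""
    peptides: List[str] = []
    start = 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def fragment_ions(peptide: str) -> List[float]:
    """Theoretical singly charged b and y ion m/z values, one b/y pair per backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions: List[float] = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion: N-terminal fragment
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion: C-terminal fragment
    return ions


def shared_peaks(observed: List[float], theoretical: List[float], tol: float = 0.02) -> int:
    """Count observed peaks lying within tol (Da) of any theoretical ion."""
    return sum(1 for mz in observed if any(abs(mz - t) <= tol for t in theoretical))


def best_match(observed: List[float], fasta_text: str) -> Tuple[str, str, int]:
    """Score every tryptic peptide from the FASTA; return (header, peptide, score) of the best."""
    best: Tuple[str, str, int] = ("", "", -1)
    for header, seq in read_fasta(fasta_text).items():
        for pep in tryptic_peptides(seq):
            if len(pep) < 2 or any(aa not in RESIDUE_MASS for aa in pep):
                continue  # skip single residues and nonstandard letters
            score = shared_peaks(observed, fragment_ions(pep))
            if score > best[2]:
                best = (header, pep, score)
    return best


if __name__ == "__main__":
    fasta_text = ">toyA\nMASKAPEPTIDEKWLR\n>toyB\nMMMKHHHHR\n"
    # Simulated spectrum: the predicted ions of APEPTIDEK plus one noise peak.
    observed = fragment_ions("APEPTIDEK") + [500.0]
    print(best_match(observed, fasta_text))
```

The point of the sketch is that the answer can only ever be a peptide that is already in the FASTA file; if the right sequence isn't there, the best match is simply the least bad wrong one.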
So where do we get this FASTA file? There are a few good places to look.
There are a few others too.
Another option, if you cannot find any available protein sequences, is to generate your own transcriptome (for example by RNA-seq) and translate it into protein sequences.
Here is a good article that can get you started.
I hope to add more to this section soon.
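Once you have assembled transcripts, they are still nucleotide sequences, so they have to be translated before they can be searched. One very crude way to do this is a six-frame, stop-to-stop translation, sketched below in Python. My assumptions: the standard genetic code (NCBI table 1), an arbitrary 50-residue minimum length, and hypothetical function names. Dedicated ORF-prediction tools do this far more carefully (start codons, coding potential), so treat this only as an illustration.

```python
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate from offset `frame` (0, 1 or 2); codons with ambiguous bases become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))


def six_frame_orfs(transcript: str, min_len: int = 50) -> List[str]:
    """Stop-to-stop segments from all six reading frames that are at least min_len residues."""
    seq = transcript.upper().replace("U", "T")
    orfs: List[str] = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for segment in translate(strand, frame).split("*"):
                if len(segment) >= min_len:
                    orfs.append(segment)
    return orfs


def to_fasta(transcripts: Dict[str, str], min_len: int = 50) -> str:
    """Write every retained segment as a protein FASTA record named <transcript>_orf<n>."""
    lines: List[str] = []
    for name, nt in transcripts.items():
        for n, orf in enumerate(six_frame_orfs(nt, min_len), start=1):
            lines.append(f">{name}_orf{n}")
            lines.append(orf)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Tiny made-up transcript; min_len is lowered so the example produces output.
    print(to_fasta({"tx1": "ATGGCCTAA"}, min_len=2))
```

Keep in mind that six-frame translation makes the sequence file much bigger than the real proteome, which tends to cost sensitivity in the search, so a proper coding-region prediction is usually the better choice.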
If you can’t find a good database, please contact me at 530-754-5298. If one exists, we can usually find it.