As much as I like IA64, there are some things that I have found annoying. I've listed a few here, in the hope that future architecture designers will consider these issues. Feel free to add to this page.
* Nested interruptions do not write any state other than cr.isr. This makes tracking them down difficult. It also means that one cannot return to the interruption address without explicitly saving the return address beforehand.
* It is not possible to branch without trashing a branch register (other than brl and rfi). In some situations it would be useful to branch without coming back and without destroying registers, for instance to return from a nested interruption handler.
* cover has the side-effect of writing cr.ifs when psr.ic=0 but not otherwise. This is a problem for virtualisation, since one cannot easily intercept the (non-privileged) cover instruction and emulate this behaviour depending on the virtualised psr.
* Writing to r0 causes a fault instead of discarding the result. A discard register can be handy, for instance when the result of alloc is not required. It has since been pointed out to me that discarding writes would complicate register bypass, which would have to prevent a read-after-write (RAW) of r0 from returning the value just written.
* The default protection key is the region ID. It would make more sense to make the default protection key a constant, so that a default policy can easily be applied (e.g. allow if no protection key is specified). Otherwise, one needs to either make sure that a protection key is set each time, or waste a PKR for each active RID.
* The long format VHPT has a tag-invalid bit instead of a tag-valid bit. This means that zeroing it potentially makes tags valid. Also, one needs to specifically set the ti bit in some situations. It would be simpler to make ttag always return tags with a certain bit set, making zero an invalid tag.
* Readability of ar.k* from user mode cannot be disabled. For some applications these registers would be more useful simply as kernel temporaries. Also, a VMware-like hosted VMM needs to switch ar.k* whenever it switches between host and guest, because it cannot intercept read accesses to them.
* The Software Conventions specify that a callee extends its return value to 32 bits and not 64 bits. This interacts badly with C, which, in the absence of a prototype, assumes that a function returns int (32 bits). The caller then sign- or zero-extends the value to 64 bits, destroying the top 32 bits if the function was actually intended to return a 64-bit value.
* break.b does not fill cr.iim. If you write break in your assembly code, the assembler might put it into a B slot, in which case cr.iim is not filled in correctly.
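
The return-value problem from the Software Conventions item above can be modelled in portable C. This is only a sketch under stated assumptions: the helper names (`caller_sign_extend_int`, `caller_zero_extend_int`, `callee_returns_64`, `value_seen_without_prototype`) and the sample value are hypothetical, and the code simulates the extension a caller performs rather than reproducing what any particular compiler emits.

```c
#include <stdint.h>

/* Hypothetical model of a caller that believes the callee returned a
 * 32-bit int: it keeps the low 32 bits of the return register and
 * sign-extends them to 64 bits. */
uint64_t caller_sign_extend_int(uint64_t ret_reg)
{
    uint64_t low = ret_reg & UINT64_C(0xffffffff);
    if (low & UINT64_C(0x80000000))
        low |= UINT64_C(0xffffffff00000000);
    return low;
}

/* Same model for a caller that zero-extends instead. */
uint64_t caller_zero_extend_int(uint64_t ret_reg)
{
    return ret_reg & UINT64_C(0xffffffff);
}

/* Hypothetical callee that really returns a 64-bit value. */
uint64_t callee_returns_64(void)
{
    return UINT64_C(0x123456789abcdef0);
}

/* What the caller ends up with when no prototype was visible:
 * the upper 32 bits of the real return value are gone. */
uint64_t value_seen_without_prototype(void)
{
    return caller_sign_extend_int(callee_returns_64());
}
```

With the sample value, the sign-extending caller sees 0xffffffff9abcdef0 and the zero-extending caller sees 0x000000009abcdef0. In both cases the original upper half, 0x12345678, is lost.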