
Wednesday, 23 December 2015

Re: [RFC PATCH 00/14] Support timezone of ACPI TAD and EFI TIME (with Chinese translation comment)


Subject Re: [RFC PATCH 00/14] Support timezone of ACPI TAD and EFI TIME
From "H. Peter Anvin" <>
Date Fri, 20 Dec 2013 08:57:03 -0800

But we prefer the TAD for that.  The case where the EFI runtime is the only source of that info is problematic as they are known to not work at runtime.  We could collect it at boot and then never change it, although you end up in definitional issues between EFI and the hw RTC.

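/* But in that case we prefer the TAD. The case where the EFI runtime is the only source of this information is problematic, because EFI runtime services are known not to work at runtime. We could collect it at boot and never change it afterwards, but that ends in definitional conflicts between EFI and the hardware RTC. */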

Matthew Garrett <matthew.garrett@nebula.com> wrote:
>On Thu, 2013-12-19 at 20:22 -0800, H. Peter Anvin wrote:
>> On 12/19/2013 08:05 PM, joeyli wrote:
>> > Can we use EFI time services on x86_64 after Borislav's patches
>accepted
>> > to mainline?
/* Once Borislav's patches are accepted upstream, can we use the EFI time services on x86_64? */
>> > 
>> 
>> No.
/* No. */
>
>We will want to use them to (at minimum) obtain the clock timezone.
>Using them for general RTC access is less attractive.

/* We will want to use them, at minimum, to obtain the clock timezone. Using them for general RTC access is less attractive. */
-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


From Matthew Garrett <>
Subject Re: [RFC PATCH 00/14] Support timezone of ACPI TAD and EFI TIME
Date Fri, 20 Dec 2013 16:58:33 +0000

On Fri, 2013-12-20 at 08:57 -0800, H. Peter Anvin wrote:
> But we prefer the TAD for that.  The case where the EFI runtime is the only source of that info is problematic as they are known to not work at runtime.  We could collect it at boot and then never change it, although you end up in definitional issues between EFI and the hw RTC.

Most shipping UEFI hardware has no TAD.
/* Most shipping UEFI hardware has no TAD. */

-- 
Matthew Garrett <matthew.garrett@nebula.com>


Subject Re: [RFC PATCH 00/14] Support timezone of ACPI TAD and EFI TIME
From "H. Peter Anvin" <>
Date Fri, 20 Dec 2013 12:29:51 -0800

Yes, but the TZ isn't all that critical, either.  It certainly doesn't matter at all for a pure Linux system.

/* Yes, but the TZ is not that critical either. On a pure Linux system it does not matter at all. */

Matthew Garrett <matthew.garrett@nebula.com> wrote:
>On Fri, 2013-12-20 at 08:57 -0800, H. Peter Anvin wrote:
>> But we prefer the TAD for that.  The case where the EFI runtime is
>the only source of that info is problematic as they are known to not
>work at runtime.  We could collect it at boot and then never change it,
>although you end up in definitional issues between EFI and the hw RTC.
>
>Most shipping UEFI hardware has no TAD.

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


From Matthew Garrett <>
Subject Re: [RFC PATCH 00/14] Support timezone of ACPI TAD and EFI TIME
Date Fri, 20 Dec 2013 20:32:12 +0000

On Fri, 2013-12-20 at 12:29 -0800, H. Peter Anvin wrote:
> Yes, but the TZ isn't all that critical, either.  It certainly doesn't matter at all for a pure Linux system.

No, but it does matter for a great number of deployed Linux systems.
Dealing with the timezone over DST changes has been a perpetual problem,
and if we can make that work then life will be significantly better.

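/* No, but it does matter for a great many deployed Linux systems. Handling the timezone across DST changes has been a perpetual problem; if we can make it work, life gets significantly better. */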
-- 
Matthew Garrett <matthew.garrett@nebula.com>

Date Fri, 20 Dec 2013 13:14:25 -0800
From "H. Peter Anvin" <>
Subject Re: [RFC PATCH 00/14] Support timezone of ACPI TAD and EFI TIME

On 12/20/2013 12:32 PM, Matthew Garrett wrote:
> On Fri, 2013-12-20 at 12:29 -0800, H. Peter Anvin wrote:
>> Yes, but the TZ isn't all that critical, either.  It certainly doesn't matter at all for a pure Linux system.
> 
> No, but it does matter for a great number of deployed Linux systems.
> Dealing with the timezone over DST changes has been a perpetual problem,
> and if we can make that work then life will be significantly better.
> 

And as I pointed out, it can matter a lot for VMs, since the provider
doesn't want to provision the VMs differently for different types of guests.

/* As I pointed out, it can matter a lot for VMs, because the provider does not want to provision VMs differently for different types of guests. */

 -hpa


Date Fri, 20 Dec 2013 13:12:52 -0800
From "H. Peter Anvin" <>
Subject Re: [RFC PATCH 00/14] Support timezone of ACPI TAD and EFI TIME

On 12/20/2013 07:16 AM, Matthew Garrett wrote:
> On Thu, 2013-12-19 at 20:22 -0800, H. Peter Anvin wrote:
>> On 12/19/2013 08:05 PM, joeyli wrote:
>>> Can we use EFI time services on x86_64 after Borislav's patches accepted
>>> to mainline?
>>>
>>
>> No.
> 
> We will want to use them to (at minimum) obtain the clock timezone.
> Using them for general RTC access is less attractive.
> 

One option is to use the EFI runtime call to get and save the clock
timezone before we call ExitBootServices() in the EFI stub.  This
doesn't obviate the need for proper handling of the TAD, though,
especially since it is likely that future hardware will not have a RTC
in the current form (it is a way more complex device than is needed,
which wouldn't normally be a problem, but the fact that it has to
operate in the Vbat well makes it a major one.)

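/* One option is to use the EFI runtime call in the EFI stub, before calling ExitBootServices(), to read and save the clock timezone. This does not remove the need to handle the TAD properly, especially since future hardware will likely not have an RTC in its current form. (The RTC is far more complex than it needs to be; that would not normally be a problem, but because it has to operate in the Vbat (battery-backed) power well, it becomes a major one.) */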

 -hpa
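
The "read it once at boot and keep it" idea can be sketched as follows. This is only an illustration, not code from the patch set: the struct and function names are hypothetical. As I read the UEFI specification, the TimeZone field of EFI_TIME is a signed 16-bit offset in minutes in the range -1440..1440, with 2047 (EFI_UNSPECIFIED_TIMEZONE) meaning "unspecified local time", and local time is defined as UTC minus TimeZone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value defined by the UEFI specification for "timezone not specified". */
#define EFI_UNSPECIFIED_TIMEZONE 0x07FF

/* Hypothetical container for a timezone captured once during boot. */
struct saved_efi_tz {
    bool    valid;    /* false if the firmware gave no usable timezone */
    int16_t minutes;  /* offset in minutes, as reported in EFI_TIME    */
};

/* Validate and store the TimeZone value read before ExitBootServices(). */
static struct saved_efi_tz save_efi_timezone(int16_t tz)
{
    struct saved_efi_tz saved = { false, 0 };

    if (tz == EFI_UNSPECIFIED_TIMEZONE)
        return saved;
    if (tz < -1440 || tz > 1440)
        return saved;               /* out of range: treat as unknown */

    saved.valid = true;
    saved.minutes = tz;
    return saved;
}

/*
 * Convert an RTC reading (seconds, local time) to UTC.
 * Assumes the spec's convention Localtime = UTC - TimeZone.
 */
static int64_t rtc_to_utc_seconds(int64_t rtc_seconds, struct saved_efi_tz tz)
{
    if (!tz.valid)
        return rtc_seconds;         /* nothing known: leave untouched */
    return rtc_seconds + (int64_t)tz.minutes * 60;
}
```

Because the value is captured once, it goes stale if the firmware setting changes later, which is the definitional problem between EFI and the hardware RTC that hpa mentions.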

Tuesday, 17 November 2015

Re: [GIT PULL] x86/mm changes for v4.4 (with Chinese translation comment)

Prior read: Re: [PATCH v2] x86/mm: warn on W+x mappings

Date: Fri, 6 Nov 2015 11:39:43 +0000
From: Matt Fleming <matt@codeblueprint.co.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Jones <davej@codemonkey.org.uk>, Ingo Molnar <mingo@kernel.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>,
        Borislav Petkov <bp@alien8.de>, Andrew Morton <akpm@linux-foundation.org>, Andy Lutomirski <luto@kernel.org>, Denys Vlasenko <dvlasenk@redhat.com>, Stephen Smalley <sds@tycho.nsa.gov>,
        linux-efi@vger.kernel.org
Subject: Re: [GIT PULL] x86/mm changes for v4.4
User-Agent: Mutt/1.5.24 (2015-08-30)

We have separate page tables today, for a few reasons, but mainly it's
/* We have separate page tables today, for a few reasons. */
so that we can have an identity mapping of memory present in the
/* The main reason is so that we can have an identity (1:1) mapping of memory */
region usually used by user processes - broken firmware still uses
/* in the region normally used by user processes; broken firmware still uses those identity mappings */
those identity mappings even after the kernel tells it they're
/* even after the kernel has told it they are invalid. */
invalid.

Note that when I say "separate" I'm talking about trampoline_pgd[]
which is also used by the x86 suspend/resume code.
/* Note that by "separate" I mean trampoline_pgd[], which is also used by the x86 suspend/resume code. */

However, turns out that the issue with the current scheme is the fact
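/* However, the issue with the current scheme turns out to be that trampoline_pgd[] actually shares a couple of PGD entries with swapper_pg_dir, as seen in setup_real_mode(): */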
that trampoline_pgd[] actually shares a couple of PGD entries with
swapper_pg_dir as can be seen in setup_real_mode(),


        trampoline_pgd = (u64 *)__va(real_mode_header->trampoline_pgd);
        trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
        trampoline_pgd[511] = init_level4_pgt[511].pgd;

So when we map the EFI regions in efi_map_regions() we're inserting
/* So when we map the EFI regions in efi_map_regions(), we also insert them into swapper_pg_dir, which is why the warnings appear. */
them into swapper_pg_dir also, which is why you're seeing the
warnings.
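
To see why the sharing leaks EFI mappings into the kernel's tables, here is a small user-space sketch of the index arithmetic. It assumes 4-level paging on x86_64 (512 PGD entries, each covering 2^39 bytes) and uses 0xffff880000000000 as the value of __PAGE_OFFSET, which is my assumption about the layout at the time; `share_kernel_pgds()` is a hypothetical stand-in for the three lines of setup_real_mode() quoted above.

```c
#include <stdint.h>

#define PGDIR_SHIFT   39    /* each PGD entry covers 2^39 bytes (512 GB) */
#define PTRS_PER_PGD  512   /* entries in one top-level table            */

/* Top-level table index for a virtual address (4-level paging). */
static unsigned int pgd_index(uint64_t vaddr)
{
    return (unsigned int)((vaddr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/*
 * Model of the sharing done in setup_real_mode(): two slots of the
 * trampoline table are copies of kernel PGD entries, so both tables
 * point at the same lower-level page table pages.
 */
static void share_kernel_pgds(uint64_t *trampoline, const uint64_t *kernel,
                              uint64_t page_offset)
{
    trampoline[0]   = kernel[pgd_index(page_offset)];  /* direct mapping */
    trampoline[511] = kernel[511];                     /* kernel text    */
}
```

Since a copied PGD entry refers to the same lower-level tables, anything mapped beneath that slot through one table is also visible through the other, in either direction.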

If I remember correctly the rationale for using trampoline_pgd[] was
/* If I remember correctly, the rationale for using trampoline_pgd[] was that it already did what we wanted (it provided the identity mapping) */
that it already did what we wanted (provided the identity mapping) and
would save us the overhead of maintaining more page tables for no good
/* and would save us the overhead of maintaining more page tables for no good reason. This thread is obviously a good reason. */
reason. Obviously this entire thread is a good reason.

I suggest we stop using trampoline_pgd[] (since it has a good reason 
/* I suggest we stop using trampoline_pgd[] (which has a good reason to share the kernel-mapping PGD entries) */
for sharing the kernel mapping PGD entries) and create our own so that
/* and create our own PGD so that we can isolate EFI completely. */
we can isolate EFI completely.

For the immediate problem of the warnings spewing forth on all UEFI
machines, at the very least the config options needs to be disabled by
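/* As for the immediate problem of warnings on all UEFI machines: at the very least the config option must be disabled by default, if the patch is not reverted. */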
default, if not the patch reverted.



Date: Sat, 7 Nov 2015 08:05:54 +0100
From: Ingo Molnar <mingo@kernel.org>
To: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>, Dave Jones <davej@codemonkey.org.uk>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin"
        <hpa@zytor.com>, Borislav Petkov <bp@alien8.de>, Andrew Morton <akpm@linux-foundation.org>, Andy Lutomirski <luto@kernel.org>, Denys Vlasenko <dvlasenk@redhat.com>, Stephen Smalley <sds@tycho.nsa.gov>,
        linux-efi@vger.kernel.org
Subject: Re: [GIT PULL] x86/mm changes for v4.4
User-Agent: Mutt/1.5.23 (2014-03-12)


* Matt Fleming <matt@codeblueprint.co.uk> wrote:

> On Thu, 05 Nov, at 01:33:10PM, Linus Torvalds wrote:
[...]
> I suggest we stop using trampoline_pgd[] (since it has a good reason
> for sharing the kernel mapping PGD entries) and create our own so that
> we can isolate EFI completely.

Ok. Could you please make this fix a priority for upcoming EFI changes?

> For the immediate problem of the warnings spewing forth on all UEFI
> machines, at the very least the config options needs to be disabled by
> default, if not the patch reverted.

We'll certainly flip around the default, but reverting would be shooting
/* We will certainly flip the default, but reverting would be shooting the messenger. */
the messenger: the EFI code is endangering everyone else today, and for
/* The EFI code is endangering everyone else today, and apparently for no good reason. */
no good reason as it appears... so the warning very much served its
/* So the warning (CONFIG_DEBUG_WX) very much served its purpose by pointing out a valid problem. */
purpose in pointing out a valid problem.

Thanks,

        Ingo



Date: Fri, 6 Nov 2015 12:39:12 +0000
From: Matt Fleming <matt@codeblueprint.co.uk>
To: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>, Stephen Smalley <sds@tycho.nsa.gov>, Dave Jones <davej@codemonkey.org.uk>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Thomas Gleixner
        <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Borislav Petkov <bp@alien8.de>, Andrew Morton <akpm@linux-foundation.org>, Andy Lutomirski <luto@kernel.org>, Denys Vlasenko
        <dvlasenk@redhat.com>, Kees Cook <keescook@chromium.org>, linux-efi@vger.kernel.org, Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: Re: [GIT PULL] x86/mm changes for v4.4
User-Agent: Mutt/1.5.24 (2015-08-30)

On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote:
>
>  3) We should fix the EFI permission problem without relying on the firmware: it
/* We should fix the EFI permission problem without relying on the firmware. */
>     appears we could just mark everything R-X optimistically, and if a write fault
/* It appears we could optimistically mark everything R-X. */
>     happens (it's pretty rare in fact, only triggers when we write to an EFI
/* If a write fault happens (rare in practice; it triggers only when we write to an EFI variable and the like), we can mark the faulting page RW- on the fly, */
>     variable and so), we can mark the faulting page RW- on the fly, because it
>     appears that writable EFI sections, while not enumerated very well in 'old'
/* because writable EFI sections, although poorly enumerated by 'old' firmware, are apparently still supposed to be page granular. */
>     firmware, are still supposed to be page granular. (Even 'new' firmware I 
/* (Even with 'new' firmware I would not automatically trust the enumeration to be right...) */
>     wouldn't automatically trust to get the enumeration right...)

Sorry, this isn't true. I misled you with one of my earlier posts on
/* Sorry, this isn't true. I misled you with one of my earlier posts on this topic; let me try to clear things up. */
this topic. Let me try and clear things up...

Writing to EFI regions has to do with every invocation of the EFI
/* Writes to EFI regions can happen on every invocation of the EFI runtime services, not only when reading, writing, or deleting EFI variables. */
runtime services - it's not limited to when you read/write/delete EFI
variables. In fact, EFI variables really have nothing to do with this
/* In fact, EFI variables really have nothing to do with this discussion. */
discussion, they're a completely opaque concept to the OS, we have no
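/* They are a completely opaque concept to the OS: we have no idea how the firmware implements them, and everything goes through the EFI boot/runtime services. */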
idea how the firmware implements them. Everything is done via the EFI
boot/runtime services.

The firmware itself will attempt to write to EFI regions when we
/* The firmware itself writes to EFI regions when we invoke the EFI services, because that is where the PE/COFF .data and .bss sections live, along with the heap. */
invoke the EFI services because that's where the PE/COFF ".data" and
".bss" sections live along with the heap. There's even some relocation
/* There are even relocation fixups applied at SetVirtualAddressMap() time, so it writes to .text too. */
fixups that occur as SetVirtualAddressMap() time so it'll write to
".text" too.

Now, the above PE/COFF sections are usually (always?) contained within
/* The PE/COFF sections above are usually (always?) contained in EFI regions of type EfiRuntimeServicesCode. */
EFI regions of type EfiRuntimeServicesCode. We know this is true
/* We know this because the firmware developers have told us so, */
because the firmware folks have told us so, and because stopping that
/* and because stopping that practice is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI v2.5. */
is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI
V2.5.

The data sections within the region are also *not* guaranteed to be
/* The data sections within the region are also *not* guaranteed to be page granular, */
page granular because work was required in Tianocore for emitting
/* because Tianocore needed extra work to emit sections with 4k alignment as part of its EFI_PROPERTIES_TABLE support. */
sections with 4k alignment as part of the EFI_PROPERTIES_TABLE
support.

Ultimately, what this means is that if you were to attempt to
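/* Ultimately, this means that if you tried to dynamically fix up the regions that need write permission, you would have to modify the mappings for the majority of the EFI regions anyway. */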
dynamically fixup those regions that required write permission, you'd
have to modify the mappings for the majority of the EFI regions
anyway. And if you're blindly allowing write permission as a fixup,
/* And if you blindly grant write permission as a fixup, there is not much security to be had. */
there's not much security to be had.

>     If that 'supposed to be' turns out to be 'not true' (not unheard of in
/* If that 'supposed to be' turns out to be 'not true' (not unheard of in firmware land), */
>     firmware land), then plan B would be to mark pages that generate write faults
/* then plan B would be to mark pages that generate write faults RWX as well, so as not to break functionality. */
>     RWX as well, to not break functionality. (This 'mark it RWX' is not something
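/* (This 'mark it RWX' path is not something exploits would have easy access to, and we could also generate a warning, after the EFI call has finished, if it ever triggers.) */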
>     that exploits would have easy access to, and we could also generate a warning
>     [after the EFI call has finished] if it ever triggers.)
>
>     Admittedly this approach might not be without its own complications, but it
/* Admittedly, this approach might not be without its own complications, */
>     looks reasonably simple (I don't think we need per EFI call page tables,
/* but it looks reasonably simple (I don't think we need per-EFI-call page tables, etc.), */
>     etc.), and does not assume much about the firmware being able to enumerate its
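/* and it does not assume much about the firmware being able to enumerate its permissions properly. Had we been merging EFI support today, I would have insisted on trying this approach from day one. */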
>     permissions properly. Were we to merge EFI support today I'd have insisted on
>     trying such an approach from day 1 on.

We already have separate EFI page tables, though with the caveat that
/* We already have separate EFI page tables, with the caveat that */
we share some of swapper_pg_dir's PGD entries. The best solution would
/* we share some of swapper_pg_dir's PGD entries. */
be to stop sharing entries and isolate the EFI mappings from every
/* The best solution would be to stop sharing entries and isolate the EFI mappings from every other page table structure, */
other page table structure, so that they're only used during the EFI
/* so that they (the EFI page tables) are only used during EFI service calls. */
service calls.



Date: Sat, 7 Nov 2015 08:09:22 +0100
From: Ingo Molnar <mingo@kernel.org>
To: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>, Stephen Smalley <sds@tycho.nsa.gov>, Dave Jones <davej@codemonkey.org.uk>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Thomas Gleixner
        <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Borislav Petkov <bp@alien8.de>, Andrew Morton <akpm@linux-foundation.org>, Andy Lutomirski <luto@kernel.org>, Denys Vlasenko
        <dvlasenk@redhat.com>, Kees Cook <keescook@chromium.org>, linux-efi@vger.kernel.org, Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: Re: [GIT PULL] x86/mm changes for v4.4
User-Agent: Mutt/1.5.23 (2014-03-12)


* Matt Fleming <matt@codeblueprint.co.uk> wrote:

> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote:
> >
[...]
>
> Ultimately, what this means is that if you were to attempt to
> dynamically fixup those regions that required write permission, you'd
> have to modify the mappings for the majority of the EFI regions
> anyway. And if you're blindly allowing write permission as a fixup,
> there's not much security to be had.

I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X
/* I think you misunderstood my suggestion: the 'fixup' would change the page from R-X to RW-, i.e. add write permission but remove execute permission. */
to RW-, i.e. it would add 'write' permission but remove 'execute' permission.

Note that there would be no 'RWX' permission at any given moment - which is the
/* Note that RWX, the dangerous combination, would never be in effect at any given moment. */
dangerous combination.
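
Ingo's fixup can be expressed as a pure function on permission bits. This is a minimal sketch, not kernel code: the `PG_*` flags and both function names are hypothetical, and a real implementation would change page table entries and flush TLBs inside the fault handler.

```c
#include <stdbool.h>

/* Hypothetical permission bits for one page. */
enum { PG_R = 1, PG_W = 2, PG_X = 4 };

/* True if a page is both writable and executable (the case to avoid). */
static bool is_wx(unsigned int prot)
{
    return (prot & PG_W) && (prot & PG_X);
}

/*
 * Fixup on a write fault as described above: grant write permission and
 * drop execute permission in the same step, so R-X becomes RW- and the
 * page is never RWX.
 */
static unsigned int fixup_on_write_fault(unsigned int prot)
{
    return (prot | PG_W) & ~(unsigned int)PG_X;
}
```

The weakness pointed out in the replies is visible here too: once a page holding both code and data has been flipped to RW-, the next instruction fetch from that page will fault.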

> >     If that 'supposed to be' turns out to be 'not true' (not unheard of in
> >     firmware land), then plan B would be to mark pages that generate write faults
> >     RWX as well, to not break functionality. (This 'mark it RWX' is not something
> >     that exploits would have easy access to, and we could also generate a warning
> >     [after the EFI call has finished] if it ever triggers.)
> >
> >     Admittedly this approach might not be without its own complications, but it
> >     looks reasonably simple (I don't think we need per EFI call page tables,
> >     etc.), and does not assume much about the firmware being able to enumerate its
> >     permissions properly. Were we to merge EFI support today I'd have insisted on
> >     trying such an approach from day 1 on.
>
> We already have separate EFI page tables, though with the caveat that
> we share some of swapper_pg_dir's PGD entries. The best solution would
> be to stop sharing entries and isolate the EFI mappings from every
> other page table structure, so that they're only used during the EFI
> service calls.

Absolutely. Can you try to fix this for v4.3?

Thanks,

        Ingo



Date: Sat, 7 Nov 2015 08:39:35 +0100
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>, Linus Torvalds <torvalds@linux-foundation.org>, Stephen Smalley <sds@tycho.nsa.gov>, Dave Jones <davej@codemonkey.org.uk>, Linux Kernel Mailing List
        <linux-kernel@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Borislav Petkov <bp@alien8.de>, Andrew Morton <akpm@linux-foundation.org>, Andy Lutomirski
        <luto@kernel.org>, Denys Vlasenko <dvlasenk@redhat.com>, Kees Cook <keescook@chromium.org>, "linux-efi@vger.kernel.org" <linux-efi@vger.kernel.org>
Subject: Re: [GIT PULL] x86/mm changes for v4.4

On 7 November 2015 at 08:09, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Matt Fleming <matt@codeblueprint.co.uk> wrote:
>
[...]
>
> I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X
> to RW-, i.e. it would add 'write' permission but remove 'execute' permission.
>
> Note that there would be no 'RWX' permission at any given moment - which is the
> dangerous combination.
>

The problem with that is that /any/ page in the UEFI runtime region
/* The problem is that any page in the UEFI runtime region may intersect both .text and .data of any of the PE/COFF images that make up the runtime firmware, */
may intersect with both .text and .data of any of the PE/COFF images
that make up the runtime firmware (since the PE/COFF sections are not
/* since PE/COFF sections are not necessarily page aligned. */
necessarily page aligned). Such pages require RWX permissions. The
/* Such pages require RWX permissions. */
UEFI memory map does not provide the information to identify those
/* The UEFI memory map does not provide the information needed to identify those pages in advance */
pages a priori (the entire region containing several PE/COFF images
/* (an entire region containing several PE/COFF images could be covered by a single entry), */
could be covered by a single entry) so it is hard to guess which pages
/* so it is hard to guess which pages should be allowed RWX permissions. */
should be allowed these RWX permissions.
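
Ard's point can be illustrated with a few lines of arithmetic. The sketch below is illustrative only: the `struct section` layout, the `PG_*` flags and `page_prot()` are hypothetical, and the OS cannot actually run this computation because the memory map does not tell it where the sections are. It shows what the answer would be if that information were available.

```c
#include <stddef.h>
#include <stdint.h>

#define EFI_PAGE_SIZE 4096u

/* Hypothetical permission bits. */
enum { PG_R = 1, PG_W = 2, PG_X = 4 };

/* A loaded PE/COFF section: byte range plus the permissions it needs. */
struct section {
    uint64_t     start;  /* first byte of the section in memory */
    uint64_t     size;   /* length in bytes                     */
    unsigned int prot;   /* e.g. PG_R | PG_X for .text          */
};

/*
 * Union of the permissions needed by every section that overlaps the
 * page starting at page_start. With sub-page section alignment, one
 * page can contain the tail of .text and the head of .data, and the
 * result is then R, W and X together.
 */
static unsigned int page_prot(const struct section *sections, size_t count,
                              uint64_t page_start)
{
    uint64_t page_end = page_start + EFI_PAGE_SIZE;
    unsigned int prot = 0;

    for (size_t i = 0; i < count; i++) {
        uint64_t end = sections[i].start + sections[i].size;

        if (sections[i].size != 0 &&
            sections[i].start < page_end && end > page_start)
            prot |= sections[i].prot;
    }
    return prot;
}
```

For example, with .text occupying 0x1000..0x2820 and .data starting right after it at 0x2820, the page at 0x2000 needs all three permissions.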



Date: Sat, 7 Nov 2015 22:58:52 -0800
From: Kees Cook <keescook@chromium.org>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>, Matt Fleming <matt@codeblueprint.co.uk>, Linus Torvalds <torvalds@linux-foundation.org>, Stephen Smalley <sds@tycho.nsa.gov>, Dave Jones <davej@codemonkey.org.uk>, Linux
        Kernel Mailing List <linux-kernel@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Borislav Petkov <bp@alien8.de>, Andrew Morton <akpm@linux-foundation.org>,
        Andy Lutomirski <luto@kernel.org>, Denys Vlasenko <dvlasenk@redhat.com>, "linux-efi@vger.kernel.org" <linux-efi@vger.kernel.org>, Matthew Garrett <mjg59@coreos.com>
Subject: Re: [GIT PULL] x86/mm changes for v4.4

On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 7 November 2015 at 08:09, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Matt Fleming <matt@codeblueprint.co.uk> wrote:
>>
>>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote:
>>> >
[...]
>
> The problem with that is that /any/ page in the UEFI runtime region
> may intersect with both .text and .data of any of the PE/COFF images
> that make up the runtime firmware (since the PE/COFF sections are not
> necessarily page aligned). Such pages require RWX permissions. The
> UEFI memory map does not provide the information to identify those
> pages a priori (the entire region containing several PE/COFF images
> could be covered by a single entry) so it is hard to guess which pages
> should be allowed these RWX permissions.

I'm sad that UEFI was designed without even the most basic of memory            
/* I'm sad that UEFI was designed without even the most basic memory protections in mind. */
protections in mind. UEFI _itself_ should be setting up protective              
/* UEFI itself should be setting up protective page mappings. */
page mappings. :(

For a boot firmware, it seems to me that safe page table layout would           
/* For boot firmware, it seems to me that a safe page table layout would be a top-priority bug. */
be a top priority bug. The "reporting issues" page for TianoCore
doesn't actually seem to link to the "Project Tracker":
https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues

Does anyone know how to get this correctly reported so future UEFI
releases don't suffer from this?

-Kees



Date: Sun, 8 Nov 2015 08:55:24 +0100
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>, Matt Fleming <matt@codeblueprint.co.uk>, Linus Torvalds <torvalds@linux-foundation.org>, Stephen Smalley <sds@tycho.nsa.gov>, Dave Jones <davej@codemonkey.org.uk>, Linux
        Kernel Mailing List <linux-kernel@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Borislav Petkov <bp@alien8.de>, Andrew Morton <akpm@linux-foundation.org>,
        Andy Lutomirski <luto@kernel.org>, Denys Vlasenko <dvlasenk@redhat.com>, "linux-efi@vger.kernel.org" <linux-efi@vger.kernel.org>, Matthew Garrett <mjg59@coreos.com>
Subject: Re: [GIT PULL] x86/mm changes for v4.4

On 8 November 2015 at 07:58, Kees Cook <keescook@chromium.org> wrote:
> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> On 7 November 2015 at 08:09, Ingo Molnar <mingo@kernel.org> wrote:
>>>
>>> * Matt Fleming <matt@codeblueprint.co.uk> wrote:
>>>
[...]
>
> I'm sad that UEFI was designed without even the most basic of memory
> protections in mind. UEFI _itself_ should be setting up protective
> page mappings. :(
>

Well, the 4 KB alignment of sections was considered prohibitive at the
/* At the time, 4 KB section alignment was considered prohibitive from a code size point of view. That was a long time ago, though. */
time from code size pov. But this was a long time ago, obviously.

> For a boot firmware, it seems to me that safe page table layout would
> be a top priority bug. The "reporting issues" page for TianoCore
> doesn't actually seem to link to the "Project Tracker":
> https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues
>
> Does anyone know how to get this correctly reported so future UEFI
> releases don't suffer from this?
>

Ugh. Don't get me started on that topic. I have been working with the           
/* Don't get me started on that topic. */
UEFI forum since July to get a fundamentally broken implementation of           
/* I have been working with the UEFI Forum since July to get a fundamentally broken implementation of memory protections fixed. */
memory protections fixed. UEFI v2.5 defines a memory protection scheme          
/* UEFI v2.5 defines a memory protection scheme based on splitting PE/COFF images into separate memory regions */
that is based on splitting PE/COFF images into separate memory regions
so that R-X and RW- permissions can be applied. Unfortunately, that             
/* so that R-X and RW- permissions can be applied. */
broke every OS in existence (including Windows 8), since the OS is             
/* Unfortunately, that broke every OS in existence (including Windows 8), */
allowed to reorder memory regions when it lays out the virtual                  
/* because the OS is allowed to reorder memory regions when it lays out the virtual remapping of the UEFI regions, */
remapping of the UEFI regions, resulting in PE/COFF .data and .text             
/* so PE/COFF .data and .text can end up out of order. */
potentially appearing out of order.

The good news is that we fixed it for the upcoming release (v2.6). I            
/* The good news is that we fixed it for the upcoming release (v2.6). I can't disclose any specifics, though :-( */
can't disclose any specifics, though :-(



Date: Mon, 9 Nov 2015 13:08:01 -0800
From: Kees Cook <keescook@chromium.org>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>, Matt Fleming <matt@codeblueprint.co.uk>, Linus Torvalds <torvalds@linux-foundation.org>, Stephen Smalley <sds@tycho.nsa.gov>, Dave Jones <davej@codemonkey.org.uk>, Linux
        Kernel Mailing List <linux-kernel@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Borislav Petkov <bp@alien8.de>, Andrew Morton <akpm@linux-foundation.org>,
        Andy Lutomirski <luto@kernel.org>, Denys Vlasenko <dvlasenk@redhat.com>, "linux-efi@vger.kernel.org" <linux-efi@vger.kernel.org>, Matthew Garrett <mjg59@coreos.com>
Subject: Re: [GIT PULL] x86/mm changes for v4.4

On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 8 November 2015 at 07:58, Kees Cook <keescook@chromium.org> wrote:
>> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>>> On 7 November 2015 at 08:09, Ingo Molnar <mingo@kernel.org> wrote:
>>>>
[...]
>
> Well, the 4 KB alignment of sections was considered prohibitive at the
> time from code size pov. But this was a long time ago, obviously.

Heh, yeah, I'd expect max 4K padding to get code/data correctly
/* I'd expect at most 4K of padding, to get code and data correctly aligned in a 2MB binary, not to be an issue. */
aligned on a 2MB binary to not be an issue. :)

[...]
>
> Ugh. Don't get me started on that topic. I have been working with the
> UEFI forum since July to get a fundamentally broken implementation of
> memory protections fixed. UEFI v2.5 defines a memory protection scheme
> that is based on splitting PE/COFF images into separate memory regions
> so that R-X and RW- permissions can be applied. Unfortunately, that
> broke every OS in existence (including Windows 8), since the OS is
> allowed to reorder memory regions when it lays out the virtual
> remapping of the UEFI regions, resulting in PE/COFF .data and .text
> potentially appearing out of order.
>
> The good news is that we fixed it for the upcoming release (v2.6). I
> can't disclose any specifics, though :-(

As long as there's motion to getting it fixed, that makes me happy! :)
/* As long as there is movement toward getting it fixed, I'm happy! */
Does 2.6 get rid of the (AIUI) 2MB limit too?                           
/* Does 2.6 also get rid of the 2MB limit (as I understand it)? */

-Kees



Date: Tue, 10 Nov 2015 08:08:30 +0100
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>, Matt Fleming <matt@codeblueprint.co.uk>, Linus Torvalds <torvalds@linux-foundation.org>, Stephen Smalley <sds@tycho.nsa.gov>, Dave Jones <davej@codemonkey.org.uk>, Linux
        Kernel Mailing List <linux-kernel@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Borislav Petkov <bp@alien8.de>, Andrew Morton <akpm@linux-foundation.org>,
        Andy Lutomirski <luto@kernel.org>, Denys Vlasenko <dvlasenk@redhat.com>, "linux-efi@vger.kernel.org" <linux-efi@vger.kernel.org>, Matthew Garrett <mjg59@coreos.com>
Subject: Re: [GIT PULL] x86/mm changes for v4.4

On 9 November 2015 at 22:08, Kees Cook <keescook@chromium.org> wrote:
> On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
[...]
>
> Heh, yeah, I'd expect max 4K padding to get code/data correctly
> aligned on a 2MB binary to not be an issue. :)
>

This is not about section sizes on ARM. The PE/COFF format does not             
/* This is not about section sizes on ARM. */
use segments, like ELF, so the payload (the sections) needs to be               
/* Unlike ELF, the PE/COFF format does not use segments, */
completely disjoint from the header. This means, when using 4 KB                
/* so the payload (the sections) must be completely disjoint from the header. */
alignment, that every PE/COFF image wastes ~4 KB in the header and 4            
/* With 4 KB alignment, every PE/COFF image therefore wastes about 4 KB on the header and, on average, 4 KB on section padding */
KB on average in the section padding (assuming a .text/.data/.reloc             
/* (assuming a .text/.data/.reloc layout, as is common with PE/COFF). */
layout, as is common with PE/COFF)

Considering that a typical UEFI firmware image consists of numerous             
/* A typical UEFI firmware image consists of numerous PE/COFF images (around 50 on average, I think), */
(around 50 on average, I think) PE/COFF images, and some of them                
/* and some of them execute from NOR flash, so the Tianocore tooling (which is the reference implementation) */
execute from NOR flash, the Tianocore tooling (which is the reference           
/* has always been geared towards keeping the alignment as small as possible, */
implementation) has always been geared towards keeping the alignment
as small as possible, typically 32 bytes unless data objects need               
/* typically 32 bytes, unless data objects need more. */
more. Since the UEFI runtime services are typically implemented by              
/* Since the UEFI runtime services are typically implemented by several of these PE/COFF images, */
several of these PE/COFF images, and since the memory they occupy may           
/* and since the memory they occupy may be described by a single UEFI memory map entry, */
be described by a single UEFI memory map entry, there is simply no             
/* there is simply no easy way to decide which pages need R-X, RW- or RWX. */
easy way to decide which pages need R-X, RW- or RWX. Even looking for           
/* Even looking for PE/COFF headers in the memory region is not guaranteed to work, */
PE/COFF headers in the memory region is not guaranteed to work, since
the PE/COFF header is part of the file format, not the memory format            
/* since the PE/COFF header is part of the file format, not the memory format */
(i.e., since the header is disjoint from the payload, a PE/COFF loader          
/* (i.e. because the header is disjoint from the payload, a PE/COFF loader is not required to copy the header into memory). */
is not required to copy the header to memory)
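
The size cost Ard describes is easy to estimate. Taking his own rough figures, about 50 images at roughly 8 KB of waste each comes to around 400 KB of flash. The sketch below compares the padded size of a single image under 32-byte and 4 KB alignment; the helper names and the section sizes used in the example are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Round size up to the next multiple of align (align must be nonzero). */
static uint64_t align_up(uint64_t size, uint64_t align)
{
    return (size + align - 1) / align * align;
}

/*
 * Size of an image when the header and every section are each padded
 * to the given alignment, which is what a header that is disjoint from
 * the payload implies.
 */
static uint64_t padded_image_size(uint64_t header_size,
                                  const uint64_t *section_sizes,
                                  size_t count, uint64_t align)
{
    uint64_t total = align_up(header_size, align);

    for (size_t i = 0; i < count; i++)
        total += align_up(section_sizes[i], align);
    return total;
}
```

For a hypothetical image with a 0x240-byte header and .text/.data/.reloc sections of 0x3010, 0x420 and 0x80 bytes, this gives 14080 bytes at 32-byte alignment and 28672 bytes at 4 KB alignment, so this small image roughly doubles in size.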

>
> As long as there's motion to getting it fixed, that makes me happy! :)
> Does 2.6 get rid of the (AIUI) 2MB limit too?
>

No, there is no such limit in UEFI. If there is a limit like that, it          
/* No, there is no such limit in UEFI. If there is a limit like that, */
is an implementation detail of the UEFI support in the OS.                      
/* it is an implementation detail of the UEFI support in the OS. */

For arm64 (and the upcoming ARM support), the UEFI runtime services             
/* On arm64 (and in the upcoming ARM support), the UEFI runtime services regions */
regions are remapped into a virtual userland range that is only active          
/* are remapped into a virtual userland range */
during the time runtime services are being invoked. (x86 does                  
/* that is only active while runtime services are being invoked. */
something similar, but it shares the page tables with the                       
/* x86 does something similar but, as far as I understand, shares the page tables with the suspend/resume code. */
suspend/resume code afaiu) These mappings could be page granularity             
/* These mappings could be page granular (since they do not require splitting PUDs or PMDs in the linear region), */
(since they don't require splitting PUDs or PMDs in the linear
region), with the side note that arm64 mandates 64 KB alignment (to             
/* with the side note that arm64 mandates 64 KB alignment (to interoperate with OSes using 64 KB pages). */
interoperate with 64 KB pages OSes). This requirement has been added            
/* This requirement has been added to the UEFI spec, i.e. */
to the UEFI spec, i.e., a v2.5 compliant arm64 firmware should not              
/* a v2.5-compliant arm64 firmware should not expose UEFI runtime regions that are not 64 KB aligned. */
expose UEFI runtime regions that are not 64 KB aligned.
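
The 64 KB rule for arm64 can be checked per memory map entry. The sketch below is an illustration with a simplified, hypothetical `struct efi_region` rather than the real EFI memory descriptor. It relies on the fact that UEFI counts region sizes in 4 KB pages; checking the size as well as the start address is my assumption about what "64 KB aligned" should mean in practice.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT 12          /* UEFI memory map uses 4 KB pages */
#define SZ_64K         0x10000ULL

/* Simplified stand-in for one UEFI memory map entry. */
struct efi_region {
    uint64_t phys_addr;  /* physical start address        */
    uint64_t num_pages;  /* size in 4 KB EFI pages        */
};

/*
 * True if the region starts on a 64 KB boundary and its size is a
 * multiple of 64 KB, so it can be mapped by an OS using 64 KB pages
 * without sharing a page with a neighbouring region.
 */
static bool region_is_64k_aligned(const struct efi_region *region)
{
    uint64_t size = region->num_pages << EFI_PAGE_SHIFT;

    return (region->phys_addr % SZ_64K) == 0 && (size % SZ_64K) == 0;
}
```

An OS could log a warning for any runtime region that fails this check instead of silently mapping it with looser permissions.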



Date: Tue, 10 Nov 2015 12:11:18 -0800
From: Kees Cook <keescook@chromium.org>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>, Matt Fleming <matt@codeblueprint.co.uk>, Linus Torvalds <torvalds@linux-foundation.org>, Stephen Smalley <sds@tycho.nsa.gov>, Dave Jones <davej@codemonkey.org.uk>, Linux
        Kernel Mailing List <linux-kernel@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Borislav Petkov <bp@alien8.de>, Andrew Morton <akpm@linux-foundation.org>,
        Andy Lutomirski <luto@kernel.org>, Denys Vlasenko <dvlasenk@redhat.com>, "linux-efi@vger.kernel.org" <linux-efi@vger.kernel.org>, Matthew Garrett <mjg59@coreos.com>
Subject: Re: [GIT PULL] x86/mm changes for v4.4

On Mon, Nov 9, 2015 at 11:08 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 9 November 2015 at 22:08, Kees Cook <keescook@chromium.org> wrote:
>> On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
[...]
>
> This is not about section sizes on ARM. The PE/COFF format does not
> use segments, like ELF, so the payload (the sections) needs to be
> completely disjoint from the header. This means, when using 4 KB
> alignment, that every PE/COFF image wastes ~4 KB in the header and 4
> KB on average in the section padding (assuming a .text/.data/.reloc
> layout, as is common with PE/COFF)
>
> Considering that a typical UEFI firmware image consists of numerous
> (around 50 on average, I think) PE/COFF images, and some of them

Oooh, that's no fun. So the linker can't produce merged .text and
.data sections?
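
The padding cost quoted above can be estimated with a little arithmetic. This is only a sketch: the header and section sizes are made-up values, not taken from any real firmware image, and it assumes the header and every section are each rounded up to the same alignment.

```python
def align_up(size, alignment):
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def padding_overhead(header_size, section_sizes, alignment=4096):
    """Bytes of padding added when the header and each section are rounded up."""
    parts = [header_size, *section_sizes]
    return sum(align_up(part, alignment) - part for part in parts)


# Hypothetical image with a .text/.data/.reloc layout (sizes in bytes).
header_size = 0x280
sections = {".text": 0x2E00, ".data": 0x0900, ".reloc": 0x0800}

header_waste = align_up(header_size, 4096) - header_size
total_waste = padding_overhead(header_size, sections.values())
section_waste = total_waste - header_waste
```

With these invented sizes the header and the section padding each waste a few KB, which is the order of magnitude the quoted mail describes. Multiplying by the number of images in a firmware volume gives a feel for the total.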

[...]
>
> No, there is no such limit in UEFI. If there is a limit like that, it
> is an implementation detail of the UEFI support in the OS.
>
> For arm64 (and the upcoming ARM support), the UEFI runtime services
> regions are remapped into a virtual userland range that is only active
> during the time runtime services are being invoked. (x86 does
> something similar, but it shares the page tables with the
> suspend/resume code afaiu) These mappings could be page granularity
> (since they don't require splitting PUDs or PMDs in the linear
> region), with the side note that arm64 mandates 64 KB alignment (to
> interoperate with 64 KB pages OSes). This requirement has been added
> to the UEFI spec, i.e., a v2.5 compliant arm64 firmware should not
> expose UEFI runtime regions that are not 64 KB aligned.

Cool, thanks for the details!

-Kees



Date: Fri, 6 Nov 2015 13:09:48 +0000
From: Matt Fleming <matt@codeblueprint.co.uk>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@kernel.org>, Linus Torvalds <torvalds@linux-foundation.org>, Stephen Smalley <sds@tycho.nsa.gov>, Dave Jones <davej@codemonkey.org.uk>, Linux Kernel Mailing List
        <linux-kernel@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Borislav Petkov <bp@alien8.de>, Andrew Morton <akpm@linux-foundation.org>, Andy Lutomirski
        <luto@kernel.org>, Denys Vlasenko <dvlasenk@redhat.com>, Kees Cook <keescook@chromium.org>, linux-efi@vger.kernel.org
Subject: Re: [GIT PULL] x86/mm changes for v4.4
User-Agent: Mutt/1.5.24 (2015-08-30)

On Thu, 05 Nov, at 11:05:35PM, Andy Lutomirski wrote:
>
> Admittedly, we might need to use a certain amount of care to avoid
> interesting conflicts with the vmap mechanism.  We might need to vmap
> all of the EFI stuff, and possibly even all the top-level entries that
> contain EFI stuff (i.e. exactly one of them unless EFI ends up *huge*)
> as a blank not-present region to avoid overlaps, but that's not a big
> deal.

There shouldn't be any room for conflicting with vmap() because the VA
region where we map EFI regions is still carved out especially for us.

Right Boris?



Date: Fri, 6 Nov 2015 14:24:47 +0100
From: Borislav Petkov <bp@alien8.de>
To: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@amacapital.net>, Ingo Molnar <mingo@kernel.org>, Linus Torvalds <torvalds@linux-foundation.org>, Stephen Smalley <sds@tycho.nsa.gov>, Dave Jones <davej@codemonkey.org.uk>, Linux Kernel
        Mailing List <linux-kernel@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, Andrew Morton <akpm@linux-foundation.org>, Andy Lutomirski <luto@kernel.org>, Denys
        Vlasenko <dvlasenk@redhat.com>, Kees Cook <keescook@chromium.org>, linux-efi@vger.kernel.org
Subject: Re: [GIT PULL] x86/mm changes for v4.4
User-Agent: Mutt/1.5.23 (2014-03-12)

On Fri, Nov 06, 2015 at 01:09:48PM +0000, Matt Fleming wrote:
> On Thu, 05 Nov, at 11:05:35PM, Andy Lutomirski wrote:
> >
> > Admittedly, we might need to use a certain amount of care to avoid
> > interesting conflicts with the vmap mechanism.  We might need to vmap
> > all of the EFI stuff, and possibly even all the top-level entries that
> > contain EFI stuff (i.e. exactly one of them unless EFI ends up *huge*)
> > as a blank not-present region to avoid overlaps, but that's not a big
> > deal.
>
> There shouldn't be any room for conflicting with vmap() because the VA
> region where we map EFI regions is still carved out especially for us.
>
> Right Boris?

Yap:

ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)

vs

ffffffef00000000 - ffffffff00000000 EFI region in trampoline_pgd

the new pagetable will make that issue moot too.

--
Regards/Gruss,
    Boris.
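
The two address ranges listed above can be checked with a short sketch. It assumes x86-64 4-level paging (one top-level entry per 512 GB, i.e. a shift of 39 bits) and treats the listed end of the EFI range as exclusive. Both are assumptions of this example, and the helper names are invented for illustration.

```python
PGDIR_SHIFT = 39     # assumed: 4-level paging, 512 GB per top-level entry
PTRS_PER_PGD = 512

# Ranges stored as (first address, last address), both inclusive.
VMEMMAP = (0xFFFFEA0000000000, 0xFFFFEAFFFFFFFFFF)
EFI = (0xFFFFFFEF00000000, 0xFFFFFFFF00000000 - 1)  # listed end taken as exclusive


def size(r):
    """Number of bytes covered by an inclusive range."""
    return r[1] - r[0] + 1


def overlaps(a, b):
    """True if two inclusive ranges share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


def pgd_index(addr):
    """Index of the top-level page table entry that maps addr."""
    return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)


def pgd_entries(r):
    """Set of top-level entries touched by an inclusive range."""
    return set(range(pgd_index(r[0]), pgd_index(r[1]) + 1))


vmemmap_size = size(VMEMMAP)      # 1 TB, matching the "=40 bits" note
efi_size = size(EFI)              # 64 GB under the exclusive-end assumption
conflict = overlaps(VMEMMAP, EFI)
efi_pgds = pgd_entries(EFI)
```

Under these assumptions the ranges do not overlap, and the EFI region falls entirely within a single top-level entry, which is consistent with the "exactly one of them" remark quoted earlier in the thread.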